Repository of colleges and higher education institutions


Title: Razvoj specializiranega velikega jezikovnega modela za analizo medijskih besedil
Authors: Petek, Jon (Author)
Rončević, Borut (Mentor)
Mileva Boshkoska, Biljana (Co-mentor)
Files: 16434$$zakljucno_delo.pdf (1.78 MB)
MD5: 41D578FA9D4613852566D245E7F146DF
 
Language: Slovenian
Work type: Master's thesis/paper
Organization: FUDS - School of Advanced Social Studies
Abstract: This master's thesis addresses the development of a specialised large language model for Slovene, intended for the comprehensive analysis of media texts. The central challenge is managing disinformation, bias, and polarisation in the media landscape, which calls for advanced tools that support media literacy and fact-checking. The solution is built on a multi-task architecture that performs four key tasks simultaneously: thematic categorisation, sentiment analysis, detection of political bias, and assessment of source credibility. For the research, a large corpus of Slovene media texts was collected and annotated; it served as the training set for fine-tuning the existing transformer model GaMS-9B-Instruct. The resulting model, named »Klasifikacijski model za medijsko analizo« (KMMA, "Classification Model for Media Analysis"), uses a multi-head architecture in which a shared core layer shares representations across tasks, while the output layers are specialised for the individual analyses. This approach enables knowledge transfer between tasks and increases the robustness of classification. Evaluation on a separate test set of 518 articles showed a marked advantage for KMMA over the open-source model Qwen2.5-7B-Instruct and the base GaMS-9B-Instruct. The average accuracy of our model exceeds 90%, with balanced macro-F1 metrics across all classes. Accuracy reached 93.6% for thematic categorisation, 85.1% for sentiment analysis, nearly 89% for political bias detection, and 88% for credibility assessment. Unlike the two comparison models, KMMA also recognises rarer classes and does not favour majority ones. In conclusion, the thesis achieved its stated objectives. The developed model is the first multi-task solution in the Slovene language that enables reliable and explainable analysis of media content, and it contributes significantly to the development of digital tools for monitoring the quality and objectivity of the Slovene media landscape.
Keywords: large language models, artificial intelligence, analysis, classification, Slovene media landscape
Year of publishing: 2026
PID: 20.500.12556/ReVIS-12921
Publication date in ReVIS: 14.01.2026
Views: 180
Downloads: 1

Secondary language

Language: English
Title: Developing a Specialised Large Language Model for Media Text Analysis
Abstract: This master’s thesis addresses the development of a specialised large language model for Slovene, designed for the comprehensive analysis of media texts. At the forefront lies the challenge of combating disinformation, bias, and polarisation within the media landscape, which necessitates advanced tools to support media literacy and fact-checking. The developed solution integrates a multi-task architecture that enables the simultaneous execution of four key tasks: thematic categorisation, sentiment analysis, detection of political bias, and assessment of source credibility. In the course of the research, a large corpus of Slovene media texts was collected and annotated, serving as the training dataset for fine-tuning the existing transformer model GaMS-9B-Instruct. The resulting model, named the Classification Model for Media Analysis (CMMA), employs a multi-head architecture in which a shared core layer provides representations across tasks, while the output layers are specialised for individual analyses. This approach facilitates knowledge transfer between tasks and enhances classification robustness. Evaluation results on a separate test set of 518 articles demonstrated a clear advantage of CMMA compared with the open-source model Qwen2.5-7B-Instruct and the baseline GaMS-9B-Instruct. The average accuracy of the proposed model exceeds 90%, achieving balanced macro-F1 metrics across all classes. In thematic categorisation, accuracy reached 93.6%; in sentiment analysis, 85.1%; in political bias detection, nearly 89%; and in credibility assessment, 88%. Unlike the two comparison models, CMMA successfully identifies rarer classes and does not exhibit bias towards majority ones. In conclusion, the thesis successfully fulfils its objectives. The developed model represents the first multi-task solution in Slovene, enabling reliable and explainable analysis of media content and making a significant contribution to the development of digital tools for monitoring the quality and objectivity of the Slovene media landscape.
Keywords: large language models, artificial intelligence, analysis, classification, Slovene media landscape
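
The abstract reports macro-F1 alongside accuracy and notes that the model does not favour majority classes. The following is a minimal illustrative sketch of why the two metrics can diverge; it is not code from the thesis. The function names, the label names, and the toy data are all hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: 8 majority-class items, 2 rare-class items.
y_true = ["neutral"] * 8 + ["negative"] * 2

# A classifier that always predicts the majority class.
majority_only = ["neutral"] * 10

# A classifier that finds both rare items at the cost of one false alarm.
balanced = ["neutral"] * 7 + ["negative"] + ["negative"] * 2

for name, y_pred in [("majority_only", majority_only), ("balanced", balanced)]:
    print(name, round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
```

On this toy data the majority-only classifier still reaches 0.8 accuracy, but its macro-F1 drops to about 0.444 because the rare class scores zero. The second classifier scores 0.9 accuracy and about 0.867 macro-F1, which is the kind of balance the abstract describes.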

