Repository of colleges and higher education institutions

Title:Določanje ključnih besed in tematik besedil : magistrska naloga
Authors:ID Robida, Nika (Author)
ID Lužar, Borut (Mentor)
Files:MAG_2024_Nika_Robida.pdf (3.47 MB)
MD5: 791CA3DAA8AFCD0ADA8BB4BEB9DF4D8D
 
Language:Slovenian
Work type:Master's thesis/paper
Typology:2.09 - Master's Thesis
Organization:FIŠ - Faculty of Information Studies in Novo mesto
Abstract:Z vedno večjo količino besedilnih vsebin postajajo učinkovita obdelava, analiza in razumevanje teh besedil ključni za številne naloge, vključno z razvrščanjem besedil v kategorije, izboljšanjem iskalnih algoritmov, generiranjem povzetkov ter spremljanjem in analizo trendov. Poseben izziv predstavlja analiza kratkih in neformalnih besedil, kot so objave na družbenih omrežjih. Naša raziskava se osredotoča na dve ključni področji: ekstrakcijo ključnih besed in določanje tematik besedil. Za ekstrakcijo ključnih besed smo implementirali in analizirali štiri algoritme: RAKE, TextRank, YAKE in KeyBERT, za določanje tematik besedil pa smo preučili algoritme: LDA, prodLDA, NMF in BERTopic. Cilj naše raziskave je oceniti učinkovitost in zanesljivost teh algoritmov ter izbrati najprimernejšega za specifične potrebe, s posebnim poudarkom na boljši analizi in razumevanju kratkih, neformalnih besedil. Kot rezultat med drugim potrdimo, da se učinkovitost algoritmov spreminja glede na vrsto besedila.
Keywords:ekstrakcija ključnih besed, določanje tematik, koherenca, Twitter, predpriprava besedil
Place of publishing:Novo mesto
Place of performance:Novo mesto
Publisher:N. Robida
Year of publishing:2024
Year of performance:2024
Number of pages:XVII, 127 pp.
PID:20.500.12556/ReVIS-11070
COBISS.SI-ID:215810819
UDC:004.93(043.2)
Note:On cover: Master's thesis : second-cycle study programme
Publication date in ReVIS:03.12.2024
Views:161
Downloads:6

Licences

License:CC BY-NC-ND 4.0, Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Link:http://creativecommons.org/licenses/by-nc-nd/4.0/
Description:The most restrictive Creative Commons license. This only allows people to download and share the work for no commercial gain and for no other purposes.

Secondary language

Language:English
Abstract:With the increasing amount of textual content, effective processing, analysis, and understanding of texts are becoming crucial for various tasks, including text classification, improving search algorithms, generating summaries, and monitoring and analyzing trends. A particular challenge lies in the analysis of short and informal texts, such as social media posts. Our research focuses on two key areas: keyword extraction and topic modeling. For keyword extraction, we implemented and analyzed four algorithms: RAKE, TextRank, YAKE, and KeyBERT. For topic modeling, we studied the algorithms LDA, prodLDA, NMF, and BERTopic. The goal of our research is to evaluate the effectiveness and reliability of these algorithms and select the most suitable one for specific needs, with a particular emphasis on better analysis and understanding of short, informal texts. Among other findings, we confirm that the effectiveness of the algorithms varies depending on the type of text.
Keywords:keyword extraction, topic modeling, coherence, Twitter, text preprocessing

