Repository of colleges and higher education institutions


Title:AN EVALUATION OF PLANNING SKILLS IN LARGE LANGUAGE MODELS USING THE TOWER OF LONDON TEST
Authors:Žužek, Katarina (Author)
Gams, Matjaž (Mentor)
Files:1305$$mag_zuzek_katarina.pdf (1.40 MB)
MD5: ACAF49BD29FBA2FE6C0D0D6C985D1CCD
 
Language:Slovenian
Work type:Master's thesis/paper
Organization:UNM FEI - University of Novo mesto - Faculty of Economics and Informatics
Abstract:In contemporary artificial intelligence, understanding the strategic planning capabilities of large language models is of crucial importance, as these models are making remarkable progress in natural language processing and in solving complex intellectual challenges. Despite this progress, their strategic planning abilities, which are essential for complex reasoning, remain relatively underexplored. Planning, as a core executive function, involves setting goals, formulating strategies, and anticipating consequences, all of which require structured, long-term thinking. This study uses the Tower of London cognitive test, an established psychological instrument whose ball-rearrangement tasks on three pegs measure the logical reasoning and strategic decisions of five contemporary large language models – DeepSeek V3, Grok-3, Gemini 2.0 Flash, Qwen 235B-A22B, and Mistral 12B – to evaluate their ability to understand instructions and solve tasks of varying difficulty. Textual task descriptions enable a standardized assessment of cognitive abilities, while analysis in the Python programming environment reveals differences in performance depending on model architecture and task complexity. It is important that the models remain consistent in interpreting the instructions and movement rules, as this contributes to the reliability of the results. The empirical part of the research highlights the models' potential to simulate cognitive processes while pointing out their limitations in handling demanding tasks. The methodology includes repeated measurements, a binomial test, a Friedman test with post-hoc analysis, and a descriptive treatment of highly demanding tasks. Cognitive testing is particularly important in the context of evaluating large language models, as timely and accurate analyses can help in understanding the limits and possibilities of these technologies.
With this work we aim to connect cognitive science and artificial intelligence and to open opportunities for using standardized tests to assess the advanced cognitive functions of large language models.
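The Tower of London task described in the abstract (rearranging balls across three pegs of limited capacity) can be verified programmatically. A minimal sketch in Python, assuming the standard peg capacities of 3, 2, and 1 and states written as tuples of pegs, bottom ball first (an illustration, not the thesis code):

```python
from collections import deque

def legal_moves(state, capacities=(3, 2, 1)):
    """Yield successor states: move the top ball of one peg onto
    another peg that still has room (standard Tower of London rules)."""
    for src in range(3):
        if not state[src]:
            continue
        for dst in range(3):
            if dst != src and len(state[dst]) < capacities[dst]:
                new = [list(peg) for peg in state]
                ball = new[src].pop()
                new[dst].append(ball)
                yield tuple(tuple(peg) for peg in new)

def min_moves(start, goal, capacities=(3, 2, 1)):
    """Breadth-first search for the minimum number of legal moves from
    start to goal; returns None if the goal is unreachable."""
    start = tuple(tuple(peg) for peg in start)
    goal = tuple(tuple(peg) for peg in goal)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        if state == goal:
            return depth
        for nxt in legal_moves(state, capacities):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None
```

The optimal move count returned by `min_moves` gives the reference against which a model's proposed solution can be scored for each task.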
Keywords:artificial intelligence, large language models, Tower of London, executive functions, planning.
Year of publishing:2025
PID:20.500.12556/ReVIS-12860
Publication date in ReVIS:19.12.2025



Secondary language

Language:English
Title:AN EVALUATION OF PLANNING SKILLS IN LARGE LANGUAGE MODELS USING THE TOWER OF LONDON TEST
Abstract:In contemporary artificial intelligence, understanding the strategic planning capabilities of large language models is of crucial importance, as these models achieve remarkable progress in natural language processing and in solving complex intellectual challenges. Despite their advancements, their capacities for strategic planning – essential for sophisticated reasoning – remain relatively underexplored. Planning, as a core executive function, involves goal setting, strategy formulation, and anticipating consequences, all requiring structured and long-term thinking. This study employs the Tower of London cognitive test, an established psychological instrument that measures logical reasoning and strategic decision-making in five contemporary large language models – DeepSeek V3, Grok-3, Gemini 2.0 Flash, Qwen 235B-A22B, and Mistral 12B – to evaluate their ability to understand instructions and solve tasks of varying difficulty. Textual descriptions of tasks enable standardized assessment of cognitive abilities, while analysis in the Python programming environment reveals differences in performance based on model architecture and task complexity. It is important that models remain consistent in interpreting instructions and movement rules, as this contributes to the reliability of results. The empirical part of the research illuminates the potential of models to simulate cognitive processes while highlighting limitations in handling demanding tasks. The methodology includes the use of repeated measurements, a binomial test, a Friedman test with post-hoc analysis, and a descriptive treatment of highly demanding tasks. Cognitive testing is particularly important in the context of large language model evaluation, as timely and accurate analyses can help in understanding the limitations and possibilities of these technologies.
Through this work, the study aims to bridge cognitive science and artificial intelligence and to open opportunities for the application of standardized tests in evaluating advanced cognitive functions of large language models.
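The statistical pipeline named in the abstract (a binomial test and a Friedman test on repeated measurements) can be sketched from the standard formulas. The helper names and the chance level p0 = 0.5 below are illustrative assumptions, not the thesis code:

```python
from math import comb

def binom_p_above_chance(successes, n, p0=0.5):
    """Exact one-sided binomial test: probability of solving at least
    `successes` of n tasks if the model performed at chance level p0
    (p0 = 0.5 is an assumed chance level for illustration)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(successes, n + 1))

def friedman_stat(score_matrix):
    """Friedman chi-square statistic (without tie correction):
    rows = models, columns = the same repeated tasks. Scores are
    ranked within each task, then rank sums are compared."""
    k = len(score_matrix)             # number of models
    n = len(score_matrix[0])          # number of repeated tasks
    rank_sums = [0.0] * k
    for j in range(n):
        col = sorted((row[j], i) for i, row in enumerate(score_matrix))
        i = 0
        while i < k:
            h = i
            while h + 1 < k and col[h + 1][0] == col[i][0]:
                h += 1
            mid_rank = (i + h) / 2 + 1    # mid-rank for tied scores
            for t in range(i, h + 1):
                rank_sums[col[t][1]] += mid_rank
            i = h + 1
    return (12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3 * n * (k + 1))
```

A significant Friedman statistic (compared against the chi-square distribution with k − 1 degrees of freedom) would then be followed by pairwise post-hoc comparisons, as the abstract indicates.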
Keywords:artificial intelligence, large language models, Tower of London, executive functions, planning abilities.

