Rubric-Based Assessment of Narrative Texts via Human-AI Collaboration: A Specialized GPT Model Approach
Abstract
This study investigates whether narrative texts can be scored accurately and stably over time, and whether effective formative feedback can subsequently be provided for these texts, through human-AI collaboration. To this end, two models were employed: the default version of ChatGPT and the Text Assessment Tool (TAT), a GPT model specifically trained through a six-step process for the purposes of this study. A total of 114 narrative texts were scored three times by both models according to the criteria in a rubric. The agreement of the scores given by TAT and by default ChatGPT with the actual scores, as well as the stability of these scores over time, was examined. The results indicated that, in contrast to the performance of default ChatGPT, TAT's scores showed high agreement with the actual scores and remained stable over time across all rubric categories, consistently exceeding the acceptable reliability threshold and frequently reaching the level of high reliability. Additionally, the feedback provided by TAT exceeded an 83% success rate in meeting effective feedback criteria across all categories. The statistical evidence presented in this study underscores that large language models, when specifically trained, can score texts with a rubric and provide feedback at a high level of quality. This is particularly promising for fairer assessment, especially in large classes and in settings where evaluators are overburdened.
Keywords: educational assessment, human-AI collaboration, GPT training
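The abstract reports agreement with the actual scores and stability of repeated scorings but does not name the statistic used. As a minimal sketch only, the snippet below assumes quadratic-weighted Cohen's kappa and uses made-up rubric scores to illustrate how human-AI agreement and test-retest stability of this kind could be quantified; it is not the authors' actual analysis.

```python
# Illustrative sketch (hypothetical data, assumed statistic): agreement between
# human rubric scores and AI scores, plus stability of the AI across two
# scoring occasions, via quadratic-weighted Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

# Hypothetical scores on a 1-4 rubric scale for a handful of texts.
human_scores = [3, 2, 4, 3, 1, 4, 2, 3]
ai_round_1   = [3, 2, 4, 3, 2, 4, 2, 3]
ai_round_2   = [3, 2, 4, 2, 2, 4, 2, 3]

# Agreement of the AI's scores with the human (actual) scores.
agreement = cohen_kappa_score(human_scores, ai_round_1, weights="quadratic")

# Stability: agreement of the AI with itself across two scoring occasions.
stability = cohen_kappa_score(ai_round_1, ai_round_2, weights="quadratic")

print(f"Human-AI agreement (weighted kappa): {agreement:.2f}")
print(f"Test-retest stability (weighted kappa): {stability:.2f}")
```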