An Efficient Text-Mining Framework of Automatic Essay Grading Using Discourse Macrostructural and Statistical Lexical Features

Bibliographic Details
Main Authors: Husam M. Alawadh, Talha Meraj, Lama Aldosari, Hafiz Tayyab Rauf
Format: Article
Language: English
Published: SAGE Publishing 2024-12-01
Series: SAGE Open
Online Access: https://doi.org/10.1177/21582440241300548
collection DOAJ
description E-learning systems are transforming the educational sector and making education more affordable and accessible. Many e-learning systems have recently been equipped with advanced technologies that support educators and increase the efficiency of teaching and learning. One such technology is Automatic Essay Grading (AEG), also called Automatic Text Scoring (ATS). Educators need to use their time more efficiently so that they can stay focused on teaching, and automatic grading systems can help. These systems still face an ongoing challenge, however, because they must account for many complex aspects of student writing, such as creativity, novelty, context, subjectivity, coherence, cohesion, and homogeneity. To address this gap, the study uses the Kaggle dataset of the Hewlett Foundation competition, which contains eight essay sets of student-written essays, each scored on a different range. First, a score quantification method is applied to the domain scores. The study then covers four aspects of the essays: cohesion features extracted via sentence connectivity, coherence features via sentence relatedness, statistical lexical features via the Term Frequency (TF)-Inverse Document Frequency (IDF) method, and discourse macrostructural features via calculating the unique pattern of each essay. Three experiments are conducted on combinations of these features, with Linear Regression used for score prediction; the most effective combination is statistical lexical features together with discourse macrostructural features. The approach achieved an average Quadratic Weighted Kappa (QWK) score of 0.9339 and outperformed previous solutions in terms of time, computation, and performance.
id doaj-art-96ced7658aee475f9e7775c71c139d4b
institution Kabale University
issn 2158-2440
spelling doaj-art-96ced7658aee475f9e7775c71c139d4b 2025-01-20T13:03:48Z eng SAGE Publishing SAGE Open 2158-2440 2024-12-01 14 10.1177/21582440241300548
An Efficient Text-Mining Framework of Automatic Essay Grading Using Discourse Macrostructural and Statistical Lexical Features
Husam M. Alawadh (0), Talha Meraj (1), Lama Aldosari (2), Hafiz Tayyab Rauf (3)
(0) Department of English Language and Translation, College of Language Sciences, King Saud University, Riyadh, Saudi Arabia
(1) COMSATS University Islamabad—Wah Campus, Wah Cantt, Punjab, Pakistan
(2) Department of English Language and Translation, College of Language Sciences, King Saud University, Riyadh, Saudi Arabia
(3) Bool Mind Software Technologies, Mequon, WI, USA
https://doi.org/10.1177/21582440241300548
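
The description above reports results as an average Quadratic Weighted Kappa (QWK). As a reading aid, the following is a minimal sketch of the standard QWK definition for two lists of integer scores, for example human and predicted grades. It is not the authors' code: the function name `quadratic_weighted_kappa` and its parameters are hypothetical, and the record does not say how the study computed or averaged the metric across essay sets.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating=None, max_rating=None):
    """Standard QWK between two equal-length sequences of integer ratings.

    Hypothetical helper for illustration only; not taken from the article.
    """
    rater_a, rater_b = list(rater_a), list(rater_b)
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")

    # Rating scale bounds: inferred from the data unless given explicitly.
    lo = min(rater_a + rater_b) if min_rating is None else min_rating
    hi = max(rater_a + rater_b) if max_rating is None else max_rating
    n = hi - lo + 1
    if n == 1:
        return 1.0  # only one possible rating, so agreement is trivial

    # Observed confusion matrix: rows index rater_a, columns index rater_b.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - lo][b - lo] += 1

    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty grows with the squared distance between ratings.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if the two raters were statistically independent.
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # A zero denominator means both raters used one identical rating throughout.
    return 1.0 - numerator / denominator if denominator else 1.0


if __name__ == "__main__":
    human = [1, 2, 3, 4]
    predicted = [1, 2, 3, 3]
    print(quadratic_weighted_kappa(human, predicted))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Because the penalty is quadratic, a prediction two points off costs four times as much as one that is a single point off.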