Evaluation of early student performance prediction given concept drift

Bibliographic Details
Main Authors: Benedikt Sonnleitner, Tom Madou, Matthias Deceuninck, Filotas Theodosiou, Yves R. Sagaert
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Computers and Education: Artificial Intelligence
Online Access: http://www.sciencedirect.com/science/article/pii/S2666920X25000098
Collection: DOAJ
Description: Forecasting student performance can help identify students at risk and aid in recommending actions to improve their learning outcomes. This often involves elaborate machine learning pipelines, which tend to use large feature sets including behavioral data from learning management systems or demographic information. However, this complexity can lead to inaccurate predictions when concept drift occurs, or when a large number of features is used with a limited sample size. We investigate the performance of different machine learning pipelines on a data set that exhibits a change in study behavior during the Covid-19 period. We demonstrate that (i) LASSO, a shrinkage estimator that reduces complexity and overfitting, outperforms several machine learning models under these circumstances, and (ii) a linear regression relying on only two handcrafted features achieves higher accuracy and substantially less predictive bias than commonly used, more complex models with large feature sets. Due to their simplicity, these models can serve as a benchmark for future studies and as a fallback model when substantial concept or covariate drift is encountered.
Record ID: doaj-art-94efdd9ccf5d4409a471d6e6f22fe962
Institution: Kabale University
ISSN: 2666-920X
Volume: 8
Article Number: 100369
Author Affiliations:
Benedikt Sonnleitner (corresponding author): VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium; Fraunhofer IIS, Nordostpark 84, 90411 Nuremberg, Germany
Tom Madou: VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium
Matthias Deceuninck: VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium
Filotas Theodosiou: VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium
Yves R. Sagaert: VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium
Subjects: Data science applications in education; Distance education and online learning