Evaluation of early student performance prediction given concept drift

Bibliographic Details
Main Authors:	Benedikt Sonnleitner, Tom Madou, Matthias Deceuninck, Filotas Theodosiou, Yves R. Sagaert
Format:	Article
Language:	English
Published:	Elsevier 2025-06-01
Series:	Computers and Education: Artificial Intelligence
Subjects:	Data science applications in education; Distance education and online learning
Online Access:	http://www.sciencedirect.com/science/article/pii/S2666920X25000098

collection	DOAJ
description	Forecasting student performance can help to identify students at risk and aids in recommending actions to improve their learning outcomes. That often involves elaborate machine learning pipelines. These tend to use large feature sets including behavioral data from learning management systems or demographic information. However, this complexity can lead to inaccurate predictions when concept drift occurs, or when a large number of features are used with a limited sample size. We investigate the performance of different machine learning pipelines on a data set with change in study behavior during the Covid-19 period. We demonstrate that (i) LASSO, a shrinkage estimator that reduces complexity and overfitting, outperforms several machine learning models under these circumstances, (ii) a linear regression relying on only two handcrafted features achieves higher accuracy and substantially less predictive bias than commonly used, more complex models with large feature sets. Due to their simplicity, these models can serve as a benchmark for future studies and a fallback model when substantial concept or covariate drift is encountered.
institution	Kabale University
issn	2666-920X
Volume:	8, Article 100369
Author affiliations:
Benedikt Sonnleitner (corresponding author):	VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium; Fraunhofer IIS, Nordostpark 84, 90411 Nuremberg, Germany
Tom Madou:	VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium
Matthias Deceuninck:	VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium
Filotas Theodosiou:	VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium
Yves R. Sagaert:	VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium


Similar Items