Evaluation of early student performance prediction given concept drift
Forecasting student performance can help to identify students at risk and aids in recommending actions to improve their learning outcomes. That often involves elaborate machine learning pipelines. These tend to use large feature sets including behavioral data from learning management systems or demo...
Main Authors: | Benedikt Sonnleitner, Tom Madou, Matthias Deceuninck, Filotas Theodosiou, Yves R. Sagaert |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-06-01 |
Series: | Computers and Education: Artificial Intelligence |
Subjects: | Data science applications in education; Distance education and online learning |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666920X25000098 |
_version_ | 1832586213160124416 |
---|---|
author | Benedikt Sonnleitner Tom Madou Matthias Deceuninck Filotas Theodosiou Yves R. Sagaert |
author_sort | Benedikt Sonnleitner |
collection | DOAJ |
description | Forecasting student performance can help to identify students at risk and aids in recommending actions to improve their learning outcomes. That often involves elaborate machine learning pipelines. These tend to use large feature sets including behavioral data from learning management systems or demographic information. However, this complexity can lead to inaccurate predictions when concept drift occurs, or when a large number of features are used with a limited sample size. We investigate the performance of different machine learning pipelines on a data set with change in study behavior during the Covid-19 period. We demonstrate that (i) LASSO, a shrinkage estimator that reduces complexity and overfitting, outperforms several machine learning models under these circumstances, (ii) a linear regression relying on only two handcrafted features achieves higher accuracy and substantially less predictive bias than commonly used, more complex models with large feature sets. Due to their simplicity, these models can serve as a benchmark for future studies and a fallback model when substantial concept or covariate drift is encountered. |
format | Article |
id | doaj-art-94efdd9ccf5d4409a471d6e6f22fe962 |
institution | Kabale University |
issn | 2666-920X |
language | English |
publishDate | 2025-06-01 |
publisher | Elsevier |
record_format | Article |
series | Computers and Education: Artificial Intelligence |
spelling | doaj-art-94efdd9ccf5d4409a471d6e6f22fe962; 2025-01-26T05:05:10Z; eng; Elsevier; Computers and Education: Artificial Intelligence; 2666-920X; 2025-06-01; 8; 100369; Evaluation of early student performance prediction given concept drift; Benedikt Sonnleitner [0], Tom Madou [1], Matthias Deceuninck [2], Filotas Theodosiou [3], Yves R. Sagaert [4]; [0] VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium; Fraunhofer IIS, Nordostpark 84, 90411 Nuremberg, Germany; Corresponding author at: VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium. [1] VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium [2] VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium [3] VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium [4] VIVES University of Applied Sciences, Doorniksesteenweg 145, 8500 Kortrijk, Belgium; Forecasting student performance can help to identify students at risk and aids in recommending actions to improve their learning outcomes. That often involves elaborate machine learning pipelines. These tend to use large feature sets including behavioral data from learning management systems or demographic information. However, this complexity can lead to inaccurate predictions when concept drift occurs, or when a large number of features are used with a limited sample size. We investigate the performance of different machine learning pipelines on a data set with change in study behavior during the Covid-19 period. We demonstrate that (i) LASSO, a shrinkage estimator that reduces complexity and overfitting, outperforms several machine learning models under these circumstances, (ii) a linear regression relying on only two handcrafted features achieves higher accuracy and substantially less predictive bias than commonly used, more complex models with large feature sets. Due to their simplicity, these models can serve as a benchmark for future studies and a fallback model when substantial concept or covariate drift is encountered.; http://www.sciencedirect.com/science/article/pii/S2666920X25000098; Data science applications in education; Distance education and online learning |
title | Evaluation of early student performance prediction given concept drift |
topic | Data science applications in education Distance education and online learning |
url | http://www.sciencedirect.com/science/article/pii/S2666920X25000098 |