A Multi-Layer Attention Knowledge Tracking Method with Self-Supervised Noise Tolerance



Bibliographic Details
Main Authors: Haifeng Wang, Hao Liu, Yanling Ge, Zhihao Yu
Format: Article
Language: English
Published: MDPI AG 2025-08-01
Series: Applied Sciences
Online Access:https://www.mdpi.com/2076-3417/15/15/8717
Description
Summary: Deep-learning-based knowledge tracing methods are used to assess learners’ cognitive states, laying the foundation for personalized education. However, these methods are inefficient when processing long sequences and are prone to overfitting. To improve the accuracy of cognitive state prediction, we design a Multi-layer Attention Self-supervised Knowledge Tracing Method (MASKT) based on self-supervised learning and the Transformer. In the pre-training stage, MASKT uses a random forest to select positively and negatively correlated feature embeddings; it then applies restoration tasks to noise-processed inputs to extract more learnable features and strengthen the model’s learning ability. The Transformer in MASKT not only addresses long-term dependencies between input and output through its attention mechanism but also supports parallel computation, which effectively improves the training efficiency of the prediction model. Finally, a multidimensional attention mechanism is integrated into cross-attention to further improve prediction performance. Experimental results on multiple datasets show that MASKT’s prediction performance is 2 percentage points higher than that of various existing knowledge tracing models. Compared with a multidimensional attention mechanism based on graph neural networks, MASKT reduces running time by nearly 30%. Given these improvements in prediction accuracy and efficiency, the method has broad application prospects for cognitive diagnosis in intelligent education.
ISSN: 2076-3417
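The summary says MASKT builds on Transformer cross-attention but does not give its equations. For orientation only, the following is a minimal pure-Python sketch of standard scaled dot-product cross-attention with a causal mask, the generic building block used in attention-based knowledge tracing. It is not MASKT's multidimensional attention; the function names and toy embeddings are hypothetical, and the caller is assumed to align sequences so that key/value row t is the latest interaction query t may see.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul_t(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b^T for row-major lists sharing the same inner dimension."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries receive zero weight."""
    m = max(row)
    exps = [0.0 if v == float("-inf") else math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product cross-attention where query t attends only to keys 0..t.

    queries: hypothetical exercise embeddings (one row per prediction step)
    keys, values: hypothetical interaction embeddings (one row per past step)
    """
    d_k = len(keys[0])
    scores = matmul_t(queries, keys)
    out: Matrix = []
    for t, row in enumerate(scores):
        # Scale by sqrt(d_k) and mask future positions j > t with -inf.
        masked = [s / math.sqrt(d_k) if j <= t else float("-inf") for j, s in enumerate(row)]
        weights = softmax(masked)
        # Weighted sum of value rows, one output column at a time.
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


# Toy usage with two steps and 2-dimensional embeddings (illustrative values only).
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = causal_cross_attention(q, k, v)
```

Because of the mask, the first output row can only copy the first value row, while the second row is a softmax-weighted mix of both value rows that leans toward the second one, since its query matches the second key.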