Result: Analysis of Student Understanding in Short-Answer Explanations to Concept Questions Using a Human-Centered AI Approach

Title:
Analysis of Student Understanding in Short-Answer Explanations to Concept Questions Using a Human-Centered AI Approach
Language:
English
Authors:
Harpreet Auby (ORCID 0000-0002-0117-6097), Namrata Shivagunde, Vijeta Deshpande, Anna Rumshisky, Milo D. Koretsky (ORCID 0000-0002-6887-4527)
Source:
Journal of Engineering Education, 114(4), 2025.
Availability:
Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Peer Reviewed:
Y
Page Count:
34
Publication Date:
2025
Sponsoring Agency:
National Science Foundation (NSF), Division of Engineering Education and Centers (EEC)
Contract Number:
2226553
2226601
Document Type:
Journal Articles; Reports - Research
DOI:
10.1002/jee.70032
ISSN:
1069-4730
2168-9830
Entry Date:
2025
Accession Number:
EJ1487671
Database:
ERIC

Further Information

Background: Analyzing students' short-answer written justifications to conceptually challenging questions has proven helpful for understanding student thinking and improving conceptual understanding. However, qualitative analyses are limited by the burden of analyzing large amounts of text. Purpose: We apply dense and sparse Large Language Models (LLMs) to explore how machine learning can automate the coding of responses in engineering mechanics and thermodynamics. Design/Method: We first identify the cognitive resources students use through human coding of seven questions. We then compare the performance of four dense LLMs and a sparse Mixture of Experts (Mixtral) model in automating the coding. Finally, we investigate the extent to which domain-specific training is necessary for accurate coding. Findings: For a sample question, we analyze 904 responses to identify 48 unique cognitive resources, which we then organize into six themes. In contrast to recommendations in the literature, students who activated molecular resources were less likely to answer correctly. This example illustrates the usefulness of qualitatively analyzing large datasets. Of the LLMs, Mixtral and Llama-3 performed best on in-domain coding tasks within the same dataset, especially as the training set size increased. Phi-3.5-mini, while effective in mechanics, showed inconsistent improvements with additional data and struggled in thermodynamics. In contrast, GPT-4 and GPT-4o-mini stood out for their robust generalization across in- and cross-domain tasks. Conclusions: Open-source models like Mixtral have the potential to perform well when coding short-answer justifications to challenging concept questions. However, further fine-tuning is needed for them to be robust enough for use with a resources-based framing.

As Provided