In engineering education contexts, assessing socio-technical skills, such as systems thinking or risk assessment, is a challenging task. In many cases, self-report scales are the most common, and sometimes only, available assessment tools for evaluating students’ abilities. Scenario-based and case-based assessments have emerged as an alternative that counteracts the response shift biases afflicting many self-report pre/post tests. Unfortunately, while these scenario-based assessments may offer more reliable measures of students’ socio-technical skills, scoring scenario responses is time-intensive even with trained raters and detailed scoring guides. Additionally, the time to score each scenario response plateaus once raters reach high proficiency, limiting scenario-based assessments’ usefulness as formative assessment tools. Larger pools of textual data also create challenges for intrarater reliability, as raters’ interpretation of scoring guides may drift over time.
To address this barrier, we have created a natural language processing system that augments scoring by preprocessing textual responses and assigning preliminary categorizations in line with a developed scoring guide. Specifically, we take responses to a scenario-based assessment, along with the detailed scoring guide accompanying the assessment, and use term extraction to map common terms from the responses to categories from the scoring guide. Phrases that match these scoring categories are then identified, extracted from the raw text, and presented alongside that raw text to the human rater. Such a system can speed up the scoring process by performing a first pass over responses and identifying the presence of content specified by the accompanying scoring guide. The human rater can then briefly check the accuracy of the system’s categorization and assign a score. This accelerates scoring by highlighting salient content in the raw text and narrowing the range of prospective scores a rater must consider. The system also improves consistency by applying the same categorization across the entire collection of text at once, in contrast with a single person or team analyzing responses sequentially with the aid of the scoring guide, where inconsistencies can emerge over time as raters build familiarity with the guide.
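To illustrate the kind of first-pass categorization described above, the following minimal Python sketch flags sentences in a response against scoring-guide categories using simple phrase matching. The category names and term lists here are hypothetical placeholders, not the actual scoring guide or implementation used in this work.

```python
# Minimal sketch (not the authors' implementation): flag sentences in a scenario
# response that contain terms associated with hypothetical scoring-guide categories.
import re
from collections import defaultdict

# Hypothetical scoring-guide categories and associated extracted terms.
SCORING_CATEGORIES = {
    "stakeholders": ["community", "residents", "end users", "regulators"],
    "interconnections": ["feedback loop", "trade-off", "downstream", "ripple effect"],
    "unintended consequences": ["side effect", "unintended", "long-term risk"],
}

def categorize_response(response: str) -> dict:
    """Return, for each category, the sentences that contain a guide term."""
    hits = defaultdict(list)
    sentences = re.split(r"(?<=[.!?])\s+", response)
    for sentence in sentences:
        lowered = sentence.lower()
        for category, terms in SCORING_CATEGORIES.items():
            if any(term in lowered for term in terms):
                hits[category].append(sentence.strip())
    return dict(hits)

# Example: the rater would see the raw response plus the flagged excerpts per category.
response = ("The new dam would help farmers, but residents downstream may face "
            "unintended flooding. There is a trade-off between cost and safety.")
for category, excerpts in categorize_response(response).items():
    print(f"{category}: {excerpts}")
```

In a full pipeline, the term lists would come from term extraction over the response corpus and alignment with the scoring guide rather than being hand-specified as in this sketch.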
In this paper we describe the system’s architecture, data processing steps, and preliminary results. We demonstrate the utility of the system by applying it to an open-ended question from a scenario-based assessment targeting systems thinking in domain-general contexts. This instance of the scenario was administered to undergraduate students across disciplines as part of both a statistics course and an introductory humanities course. Because students came from many undergraduate disciplines and knowledge domains, the resulting dataset was varied and challenging.
Our preliminary results suggest that preprocessing of textual content can improve the speed and reliability of scoring compared to unassisted human scoring with the same scoring guide. As natural language processing methods continue to advance, applications that augment textually focused assessments, such as scenario-based and case-based assessments, should continue to be explored.