2024 ASEE Annual Conference & Exposition

Integrating Data-Driven and Career Development Theory-Driven Approaches to Study High School Student Persistence in STEM Career Aspirations

Presented at DSA Technical Session 6

High school students’ aspirations for STEM occupations can strongly influence their decisions to pursue a STEM track in college or a STEM career. Existing large-scale datasets, such as the Education Longitudinal Study of 2002 (ELS:2002), make it possible to investigate comprehensively the factors that contribute to high school students' persistence in STEM career aspirations. Prior research on this topic often relies on a theory-driven approach to identify predictors and form hypotheses for statistical tests. Commonly used theories explaining persistence in STEM career aspirations include Social Cognitive Career Theory (SCCT), Expectancy-Value Theory (EVT), and Expectation States Theory (EST). However, applying a theory-driven approach to a large-scale dataset poses challenges. Many studies rely on a single theory to identify predictors and may therefore miss much of the insight these datasets offer. Yet drawing on multiple theories to identify predictors can produce an overwhelming number of candidates. This is where a data-driven approach helps: a feature selection model can reduce the set of predictors identified from multiple theories. Notably, the predictors selected by this data-driven method remain interpretable, because each one originally comes from an established theory.
This study proposes a blended approach that integrates theory-driven and data-driven methods. We demonstrate it by analyzing the ELS:2002 dataset to build a model explaining high school students’ persistence in STEM career aspirations. First, to make full use of the data, we apply three theory-driven approaches, following the SCCT, EVT, and EST frameworks, to identify candidate predictors in ELS:2002. We then use the Boruta algorithm, a data-driven feature selection method built on random forest classification, to narrow this extensive list down to the predictors in the final model.
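The core idea behind Boruta can be sketched as follows, under clearly stated assumptions. Boruta adds "shadow" features, shuffled copies of each candidate predictor that cannot carry real signal, and records a "hit" whenever a real predictor's importance exceeds the largest shadow importance. The actual algorithm refits a random forest on the real and shadow features together in every iteration and uses its importance scores; to keep this sketch dependency-free, the hypothetical `abs_correlation` function stands in for random-forest importance, so each predictor is scored on its own rather than jointly. The names `count_hits` and `abs_correlation` are ours, not part of the Boruta package.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Hypothetical stand-in for random-forest importance: |Pearson r| with the outcome."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return abs(sxy) / (sxx * syy) ** 0.5


def count_hits(columns, y, n_runs=20, importance=abs_correlation, seed=0):
    """Count how often each real feature beats the best shadow feature.

    columns: dict mapping feature name -> list of values (one per student)
    y: list of 0/1 outcomes (e.g. 1 = still aspires to a STEM career)
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    # With a univariate stand-in importance, real scores never change, so
    # compute them once (a random forest would be refit every run instead).
    real = {name: importance(values, y) for name, values in columns.items()}
    for _ in range(n_runs):
        # Shadow features: shuffled copies that keep each column's values
        # but break any association with the outcome.
        shadow_scores = []
        for values in columns.values():
            shadow = values[:]
            rng.shuffle(shadow)
            shadow_scores.append(importance(shadow, y))
        threshold = max(shadow_scores)
        for name, score in real.items():
            if score > threshold:
                hits[name] += 1
    return hits


if __name__ == "__main__":
    rng = random.Random(1)
    y = [rng.randint(0, 1) for _ in range(200)]
    columns = {
        "signal": [v + 0.1 * rng.random() for v in y],  # tracks the outcome
        "noise": [rng.random() for _ in range(200)],    # unrelated to it
    }
    print(count_hits(columns, y))
```

A predictor that genuinely tracks the outcome beats the best shadow in nearly every run, while an unrelated one wins only by chance; the hit counts feed the decision step that sorts predictors into important, tentative, and unimportant.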
The analytic sample comprises 2,741 9th-graders from 361 high schools who reported aspiring to a STEM career at age 30. The binary outcome is whether, by 12th grade, these students still aspired to a STEM career at age 30. The approach is implemented in three steps: (a) applying three theory-driven approaches to identify potential predictor variables; (b) using the Boruta package in R 4.13 to classify variables as important, tentative, or unimportant; and (c) clustering the important variables into subgroups and fitting a series of multilevel models to select the best model, examine the relationship between students' persistence in STEM career aspirations and the predictors, and explore variability across schools.
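Step (b) turns hit counts into the three labels. A minimal sketch, assuming a feature's hit count after n iterations is compared with Binomial(n, 0.5), the count expected if the feature were no better than its shadows: a significantly high count marks it important, a significantly low count marks it unimportant, and anything in between stays tentative. The 0.01 threshold and the Bonferroni correction across features are assumptions chosen for illustration, and the real algorithm applies such tests repeatedly while it runs, setting decided features aside, rather than once at the end. `boruta_decision` and the tail helpers are hypothetical names.

```python
from math import comb


def binom_tail_upper(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(max(k, 0), n + 1)) / 2**n


def binom_tail_lower(k, n):
    """P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(0, min(k, n) + 1)) / 2**n


def boruta_decision(hits, n_runs, alpha=0.01):
    """Label each feature 'important', 'unimportant' or 'tentative'.

    hits: dict mapping feature name -> number of runs (out of n_runs) in which
    the feature's importance exceeded the best shadow feature.
    """
    n_features = len(hits)
    labels = {}
    for name, h in hits.items():
        # Bonferroni: scale each p-value by the number of features tested.
        p_high = min(1.0, binom_tail_upper(h, n_runs) * n_features)
        p_low = min(1.0, binom_tail_lower(h, n_runs) * n_features)
        if p_high < alpha:
            labels[name] = "important"      # beat the shadows more often than chance
        elif p_low < alpha:
            labels[name] = "unimportant"    # beat them less often than chance
        else:
            labels[name] = "tentative"      # not enough evidence either way
    return labels


if __name__ == "__main__":
    print(boruta_decision({"a": 20, "b": 0, "c": 10}, n_runs=20))
```

For example, with 20 iterations and three features, hit counts of 20, 0, and 10 are labeled important, unimportant, and tentative respectively: 10 hits out of 20 is exactly what chance predicts, so neither test rejects.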
Of the 81 candidate predictors chosen through the three theory-driven approaches, Boruta identified 17 as important. These predictors relate to parental expectations, math performance, student success expectations, student educational expectations, SES, self-efficacy, and gender. Of these, the statistically significant predictors were self-efficacy, student educational expectations, student success expectations, and gender. The odds ratios show that (a) students with strong math self-efficacy are more likely to persist in STEM career aspirations, whereas students with strong English self-efficacy are less likely to persist; (b) high educational expectations or high expectations of learning success are associated with greater persistence; and (c) female students are more likely than male students to persist in STEM career aspirations.
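The odds ratios above come from multilevel logistic models. The abstract does not give their exact specification, so the following is only a generic two-level random-intercept form, with notation of our own, showing how an odds ratio and between-school variability are read from such a model. For student i in school j, with binary persistence outcome Y_ij and predictors X_kij:

```latex
\begin{aligned}
\operatorname{logit} \Pr(Y_{ij} = 1) &= \gamma_{00} + \sum_{k=1}^{K} \gamma_{k0} X_{kij} + u_{0j},
  \qquad u_{0j} \sim \mathcal{N}(0, \tau_{00}) \\
\mathrm{OR}_k &= e^{\gamma_{k0}} \\
\rho &= \frac{\tau_{00}}{\tau_{00} + \pi^2 / 3}
\end{aligned}
```

An OR_k above 1 (as reported for math self-efficacy) means higher odds of persistence per one-unit increase in X_kij, holding the other predictors and the school effect fixed; an OR_k below 1 (as reported for English self-efficacy) means lower odds. The intraclass correlation rho, computed on the latent logistic scale whose student-level variance is pi^2/3, summarizes how much of the variability in persistence lies between schools.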

Authors
  1. Tonghui Xu, University of Massachusetts Lowell