Active learning strategies have become central to STEM education for promoting student engagement, critical thinking, and long-term retention. This study investigates the application of Peer Instruction (PI), a structured peer-to-peer discussion method, in an undergraduate data science classroom, specifically within the module "Problems Faced When Handling Large Data & General Techniques for Handling Large Volume of Data". The goal was to assess PI's impact on performance across different cognitive levels and to gauge how difficult students perceive complex data science topics to be.
Sixteen third-year undergraduate students participated in the structured activity. They were divided into four balanced groups. The intervention utilized the digital response system Poll Everywhere to administer ten multiple-choice questions (MCQs): five targeting recall (factual knowledge) and five targeting conceptual understanding (abstract models and trade-offs). Each question followed a three-step PI cycle: an initial individual vote (Pre-discussion Vote), a 5–7 minute peer discussion in groups of four, and a final individual vote (Post-discussion Vote). Learning gain was calculated by comparing pre- and post-discussion accuracy. A dedicated perception poll was also conducted to determine which question type students perceived as more difficult.
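The learning-gain calculation described above can be sketched as follows. This is a minimal illustration, assuming that gain is the post-discussion accuracy minus the pre-discussion accuracy, expressed in percentage points; the abstract does not state the exact formula. The function names and the vote data are hypothetical and are not taken from the study.

```python
from statistics import mean


def accuracy(votes, correct_option):
    """Percentage of votes that match the correct option."""
    return 100.0 * sum(v == correct_option for v in votes) / len(votes)


def learning_gain(pre_votes, post_votes, correct_option):
    """Assumed definition: post-discussion accuracy minus
    pre-discussion accuracy, in percentage points."""
    return accuracy(post_votes, correct_option) - accuracy(pre_votes, correct_option)


def mean_gain_by_type(gains):
    """Average the per-question gains within each question type.

    gains: list of (question_type, gain) pairs.
    """
    by_type = {}
    for qtype, gain in gains:
        by_type.setdefault(qtype, []).append(gain)
    return {qtype: mean(values) for qtype, values in by_type.items()}


# Hypothetical votes from 16 students on one MCQ whose correct option is "B".
pre_votes = ["B"] * 6 + ["A"] * 10    # 6 of 16 correct before discussion
post_votes = ["B"] * 10 + ["A"] * 6   # 10 of 16 correct after discussion

print(learning_gain(pre_votes, post_votes, "B"))
```

Applying `mean_gain_by_type` to the per-question gains, each tagged as recall or conceptual, would give one average per question type under the same assumed definition.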
Our results demonstrate that PI significantly enhances student performance, particularly on cognitively demanding material.
• Learning Gain (RQ1): While both question types improved, conceptual questions showed a greater average learning gain of +21.8%. Among the conceptual questions, the highest single improvement was a +50% gain for the abstract CAP theorem question, followed by substantial gains in questions related to MapReduce, caching, and batch processing. Recall-based questions saw an average gain of +17.6%. Among these, the question on the Parquet file format, an unfamiliar term for many, saw a notable +56% gain, highlighting the effectiveness of PI even for factual recall under ambiguity.
• Perception (RQ2): The perception poll revealed that 75% of students found conceptual questions more difficult, consistent with their higher cognitive load and the need for collaborative reasoning.
• Student Feedback (RQ3): Student feedback indicated increased conceptual clarity, boosted confidence, and a clear preference for the collaborative approach. Students particularly valued peer explanations and reported that discussion made abstract content easier to grasp, expressing strong interest in adopting this model for future sessions.
This study confirms that Peer Instruction, when integrated with real-time polling via Poll Everywhere, is a highly effective pedagogical strategy for deepening conceptual learning in data science education. The consistent performance improvement and positive student feedback reinforce that PI successfully bridges the gap between surface-level recall and the conceptual understanding required for technical, abstract subjects like distributed computing. These findings strongly support embedding PI into the data science curriculum to foster metacognitive engagement and conceptual clarity.
http://orcid.org/0000-0002-5988-3851
Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, India-603203
The full paper will be available to logged-in and registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.