2026 ASEE Annual Conference & Exposition

Assessing the Reliability of Large Language Models for Scientific Information Extraction

Presented at AI-Enhanced Learning Ecosystems in Engineering Education

Building on our prior work, "WIP: Leveraging AI for Literature Reviews" (Gong & Maitra, 2025), which introduced an AI-assisted, step-by-step guideline for new researchers, this study extends the framework to evaluate Retrieval-Augmented Generation (RAG) capabilities. We propose a systematic approach for assessing the RAG capabilities of a single large language model (LLM) using three quantitative metrics: correctness (factual accuracy), completeness (contextual coverage), and compliance (adherence to citation or formatting standards). These metrics are tested on literature-retrieval and summarization tasks drawn from perovskite solar-cell and additive-manufacturing datasets, allowing a comparison between AI-generated and human-verified reviews. By quantifying how reliably an LLM retrieves, attributes, and integrates external sources, the study establishes a reproducible method for benchmarking AI tools such as ChatGPT, Perplexity, and Elicit in research workflows.
The second phase explores multi-agent RAG collaboration, modeling interactions among multiple LLMs (e.g., GPT-4, Claude, Gemini) as agents with distinct “personalities.” Each agent is characterized by behavioral traits—willingness to change, cooperation, verification precedence, and acceptance of reasoning—that influence how information is exchanged and reconciled. Through multi-shot dialogue experiments, we visualize knowledge exchange networks that reveal patterns of consensus, contradiction, and reasoning acceptance across agents. Preliminary results show that inter-agent verification enhances completeness and factual alignment but may reduce efficiency when compliance constraints dominate. This multi-level RAG evaluation framework links algorithmic accuracy with collaborative reasoning behavior, offering new directions for teaching undergraduates how to critically evaluate, verify, and synthesize AI-generated research outputs within ethical and academically rigorous contexts.
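As an illustration of how the three metrics named in the abstract might be computed, the sketch below scores one AI-generated extraction against a human-verified reference. The abstract does not give formulas, so everything here is an assumption: correctness is taken as the share of extracted fields that match the reference, completeness as the share of reference fields recovered correctly, and compliance as the share of extracted fields that carry a citation. The function name `score_extraction`, the field names, and the sample values are hypothetical placeholders, not data or code from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    correctness: float   # assumed: matched fields / extracted fields
    completeness: float  # assumed: matched fields / reference fields
    compliance: float    # assumed: cited extracted fields / extracted fields


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def _normalize(value: str) -> str:
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return value.strip().lower()


def score_extraction(
    extracted: dict[str, str],
    reference: dict[str, str],
    cited: set[str],
) -> Scores:
    """Score one extraction against a human-verified reference (hypothetical definitions)."""
    matched = {
        field
        for field, value in extracted.items()
        if field in reference and _normalize(value) == _normalize(reference[field])
    }
    cited_extracted = cited & set(extracted)
    return Scores(
        correctness=_ratio(len(matched), len(extracted)),
        completeness=_ratio(len(matched), len(reference)),
        compliance=_ratio(len(cited_extracted), len(extracted)),
    )


# Made-up placeholder values for illustration only.
reference = {
    "absorber": "MAPbI3",
    "bandgap_eV": "1.55",
    "pce_percent": "22.1",
    "etl": "TiO2",
}
extracted = {
    "absorber": "mapbi3",
    "bandgap_eV": "1.55",
    "pce_percent": "21.0",
}
cited = {"absorber", "bandgap_eV"}

scores = score_extraction(extracted, reference, cited)
print(scores)
```

Exact string matching after normalization is the simplest possible comparison; a real evaluation would likely need numeric tolerances and a rubric for free-text summaries.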

Authors

Luke Schneider, Pennsylvania State University, Behrend College
Debalina Maitra, Kennesaw State University (Assistant Professor of Teacher Leadership)
Jing Zhao, Pennsylvania State University, Behrend College
Dr. Jiawei Gong (https://orcid.org/0000-0003-4318-9387), Pennsylvania State University, Behrend College

Dr. Jiawei Gong is an Associate Professor of Mechanical Engineering at Penn State Behrend. He earned his PhD (2017) and MS (2014) in Mechanical Engineering from North Dakota State University and a BE in Polymer Materials and Engineering from East China University of Science and Technology (2010). Dr. Gong's research interests include energy materials and devices, particularly dye-sensitized and perovskite solar cells. His recent research focuses on leveraging machine learning for data acquisition and analysis, aimed at advancing solar energy technologies.

Note

The full paper will be available to logged-in, registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.
