The admissions process at the University of Toronto requires staff to spend many hours manually reviewing images of student transcripts in order to make critical decisions about applicants' academic futures. Academic transcript images are tedious to read and transcribe because of their many visual features, such as colored backgrounds, watermarks, multi-column layouts, and small text. To streamline this process, this report investigates the development of an AI system designed specifically to transcribe grade data from academic transcript images into organized tables. While models for table extraction are not novel, existing methods perform poorly on academic transcripts because of these distinctive features and because transcripts are underrepresented in the datasets used for training. To our knowledge, this report presents the first labeled, open-source dataset consisting purely of academic transcript images for training computer-vision-based machine learning algorithms. Two primary approaches to image-to-text table reconstruction were explored. The first is a pipeline comprising a YOLOv8 object detection model, the Tesseract OCR engine, and a Mistral 7B large language model (LLM). The second is a fine-tuned multimodal language model (MiniCPM-Llama3-V-2_5). The multimodal LLM showed superior accuracy on a small test set, and a multi-stage prompting strategy further improved its recall on images with more complex multi-column layouts. Future work could substantially improve on this solution by using the trained YOLOv8 object detection model as a preprocessing step and by continuing to develop the dataset with a greater diversity of images and prompting formats. Additionally, because the number of transcript formats in circulation is finite, we hypothesize that a larger, more inclusive dataset could be used to train a high-precision model with near-universal applicability within the target domain.
This work lays the foundation for future analytics projects at the University of Toronto, providing a platform through which admissions data may be used to predict student success and to better track students' progress over their academic careers.
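As a rough illustration of the first approach, the sketch below chains three stages in the order the abstract names them: table detection, OCR, and LLM-based structuring. It is a minimal sketch under stated assumptions, not the authors' implementation. The names `transcribe`, `GradeRow`, `fake_detector`, `fake_ocr`, and `fake_structurer` are hypothetical, the stages are passed in as plain callables, and the stand-in stages return made-up sample data rather than calling YOLOv8, Tesseract, or Mistral 7B.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (left, top, right, bottom) pixel coordinates of a detected table region.
Box = Tuple[int, int, int, int]


@dataclass
class GradeRow:
    """One row of the reconstructed grade table (hypothetical schema)."""
    course: str
    grade: str


def transcribe(
    image: object,
    detect_tables: Callable[[object], List[Box]],
    read_text: Callable[[object, Box], str],
    structure_rows: Callable[[str], List[GradeRow]],
) -> List[GradeRow]:
    """Run detection, OCR, and structuring in sequence over one image."""
    rows: List[GradeRow] = []
    for box in detect_tables(image):          # stage 1: object detection
        raw_text = read_text(image, box)      # stage 2: OCR on the region
        rows.extend(structure_rows(raw_text)) # stage 3: text to table rows
    return rows


# Hypothetical stand-ins so the sketch runs without any model weights.
def fake_detector(image: object) -> List[Box]:
    return [(0, 0, 100, 50)]


def fake_ocr(image: object, box: Box) -> str:
    # Made-up sample text, not real transcript data.
    return "MATH101 85\nCHEM110 A+"


def fake_structurer(text: str) -> List[GradeRow]:
    rows = []
    for line in text.splitlines():
        # Treat the last whitespace-separated token as the grade.
        course, grade = line.rsplit(" ", 1)
        rows.append(GradeRow(course=course, grade=grade))
    return rows


table = transcribe(None, fake_detector, fake_ocr, fake_structurer)
for row in table:
    print(row.course, row.grade)
```

Passing the stages in as callables keeps the sketch independent of any particular library, so a real detector, OCR engine, or language model could be substituted for each stand-in without changing `transcribe`.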
The full paper will be available to logged-in, registered conference attendees once the conference starts on June 22, 2025, and to all visitors after the conference ends on June 25, 2025.