With the rapid development of machine learning and artificial intelligence, the volume of data to be processed is growing rapidly, and models are becoming increasingly complex. Single-core CPUs and traditional single-machine memory cannot support these growing demands. Consequently, there is a pressing need to explore the parallelization and implementation of machine learning and deep learning codes on heterogeneous (multi-CPU/multi-GPU) cluster architectures. In response, the course "CSYE7105: High Performance Parallel Machine Learning and AI" was developed for graduate students in 2020.
This course aims to equip students with an understanding of the principles of high-performance parallel computing, including architecture and parallel programming models, while providing practical exposure to emerging parallel machine learning and AI techniques. This is essential to meet the high demand for skilled professionals in fields that require accelerated computing, such as image classification, speech recognition, and natural language processing. Utilizing the university's High-Performance Computing (HPC) cluster, the course offers students the opportunity to apply these principles in practice.
The course is divided into four parts. The first part analyzes different types of parallel computing system architectures—shared memory systems, distributed memory systems, accelerator systems, and hybrid systems—as well as the specifications and standards of parallel programming (OpenMP, MPI, and CUDA). The second introduces the university's HPC cluster, facilitates student access, and teaches the use of Slurm, an open-source workload manager and job scheduler for clusters, while incorporating hands-on assessments of operations on the cluster. The third focuses on parallelism-based code development and the implementation of large-data processing, machine learning algorithms, and models on a multi-CPU architecture. The fourth covers code development and the implementation of data parallelism and model parallelism for large deep learning models on a multi-GPU architecture. These four parts culminate in student research projects on the HPC cluster.
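The multi-CPU data-parallel pattern covered in the third part can be sketched as follows. This is a minimal illustration, not the course's actual material: the dataset is split into chunks, each chunk is processed on a separate CPU core, and the results are merged. The function names (`normalize_chunk`, `parallel_normalize`) and the toy min-max scaling task are hypothetical, chosen only to make the pattern concrete.

```python
# Minimal sketch of multi-CPU data parallelism with Python's multiprocessing:
# split a large dataset into chunks, process chunks on worker processes,
# then flatten the per-chunk results back into one list.
from multiprocessing import Pool


def normalize_chunk(chunk):
    """Toy per-chunk feature scaling: rescale values in the chunk to [0, 1]."""
    lo, hi = min(chunk), max(chunk)
    return [(x - lo) / (hi - lo) for x in chunk]


def parallel_normalize(data, n_workers=4, chunk_size=1000):
    # Partition the data into fixed-size chunks (the parallel work units).
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Map one chunk per worker task across a pool of CPU processes.
    with Pool(processes=n_workers) as pool:
        results = pool.map(normalize_chunk, chunks)
    # Merge the per-chunk outputs in order.
    return [x for chunk in results for x in chunk]


if __name__ == "__main__":
    out = parallel_normalize(list(range(4000)))
    print(len(out))
```

On a real HPC cluster, the same decomposition carries over, with Slurm allocating the CPU cores and a framework such as MPI or Dask coordinating the workers across nodes.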
Five years of teaching this course and interacting with students have provided feedback demonstrating that the latest industry knowledge and technologies imparted in this course empower graduate students to explore and implement parallelism-based projects in machine learning and deep learning across various domains on high-performance clusters. The experience gained positions them favorably in both industrial job markets and academic doctoral program applications. Consequently, this unique course has attracted many students through its innovative treatment of emerging technologies and their challenges, and it remains in high demand. As a teaching professor in a rapidly evolving field, I remain committed to keeping abreast of innovative technologies, continuously updating and refining the course content each semester. This prepares students not only for current industry needs but also for the ever-evolving data-driven decision-making landscape.