
Data Science and Machine Learning Training

RCAC regularly offers training on a variety of data science and machine learning topics. These trainings take an applied view of data science, helping researchers use these techniques and the clusters effectively to further their work.

Users who are just getting started on the clusters are encouraged to check out our general computing training resources as well.

Foundational Data Science and Machine Learning Topics

Introduction to how data is used for machine learning, covering data collection, pre-processing, and exploratory data analysis.

Machine learning and deep learning techniques for analyzing and forecasting time series data in high-performance computing environments.

Introduction to core concepts of modern natural language processing (NLP) to help users get started incorporating NLP into their work.

Essential concepts and hands-on experience in data visualization.

Free full-day workshops offered regularly in partnership with the NVIDIA Deep Learning Institute.

Topics in generative AI including prompt engineering, architecture, and custom models.

Basics of R programming: data structures, control structures, data import/export, manipulation, visualization, and statistical analysis.

For additional training on technical concepts, we encourage users to explore trainings offered by Purdue Libraries.

Large-Scale Data Science and Related Topics

Intermediate-level discussion of how the Jupyter system (JupyterHub, Jupyter Notebook, etc.) functions on a distributed computing cluster.

How to install scientific Python applications on RCAC clusters.

RCAC storage resources and services, typical use cases, access methods, and data management strategies.

Coming Soon: Accelerated Machine Learning with GPUs