- Create a clean, organised, reproducible, end-to-end machine learning pipeline from scratch using MLflow;
- Clean and validate data using pytest;
- Track experiments, code and results using GitHub and Weights & Biases;
- Select the best-performing model for production;
- Deploy a model using MLflow.
You will learn how to be more efficient, effective and productive in modern, real-world machine learning (ML) projects by adopting best practices around reproducible workflows.
To complete the course, you will build a machine learning pipeline project that solves the following problem:
A property management company rents out rooms and properties for short stays on various rental platforms. It needs to estimate the typical price of a given property based on the prices of similar properties. The company receives new data in bulk every week, so the model must be retrained weekly, which calls for a reusable pipeline. You will write an end-to-end pipeline covering data fetching, data validation, data segregation, model training and validation, testing, and release. You will run it on an initial data sample, then re-run it on a new data sample that simulates a new weekly data delivery.
Before starting the course, you should be familiar with the following:
- Knowledge of data science and machine learning processes,
- Experience in Python programming,
- Using Jupyter Notebook to solve data science problems,
- Machine learning and deep learning fundamentals,
- Writing scripts to clean data, train machine learning models and evaluate their performance,
- Using the terminal, Git and GitHub.
This is a fully online, self-paced and self-directed course that runs in the web browser of a PC or laptop.
Depending on your experience and prior knowledge, you may need up to 15 hours per week to finish this 1-month course.
Certification
Learners who pass the project within the 1-month course duration will be awarded the Advanced Course certification.