
Data Engineering and ML Track


DE and ML Lessons
Topics | Sub-Topic | Lesson | Status | Notes

Introduction to Scalable Data Pipelines (6 lessons)
  What is a data pipeline? (3 lessons)
    - Definition and use cases of data pipelines [Open]
    - Importance in data-driven decision making [Open]
    - Real-time vs. batch processing [Open]
  Key components of a scalable data pipeline (2 lessons)
    - Data sources, data ingestion, data transformation, and data storage [Open]
    - Tools Overview: Apache Kafka, Apache NiFi, Apache Airflow [Open]
  Use Case (1 lesson)
    - Use Apache Airflow to build and schedule a basic ETL pipeline that reads data from a file and writes it to a database. [Open]
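
The Use Case's extract, transform, and load steps can be sketched as plain Python functions. This keeps the logic testable without an Airflow installation. The sketch makes three assumptions: a CSV input file with `id`, `name`, and `amount` columns, SQLite from the standard library as a stand-in for the target database, and a `sales` table. All three names are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile
from typing import Dict, List, Tuple

Row = Dict[str, str]
Record = Tuple[int, str, float]


def extract(path: str) -> List[Row]:
    """Extract: read every row of the CSV file into a dict keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def transform(rows: List[Row]) -> List[Record]:
    """Transform: skip rows without an amount, tidy names, and cast types."""
    records = []
    for row in rows:
        if not row["amount"].strip():
            continue  # incomplete record; a real pipeline might log or quarantine it
        records.append((int(row["id"]), row["name"].strip().title(), float(row["amount"])))
    return records


def load(records: List[Record], conn: sqlite3.Connection) -> int:
    """Load: insert or replace the records in the hypothetical sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    # INSERT OR REPLACE (SQLite syntax) makes reruns idempotent on the primary key.
    conn.executemany("INSERT OR REPLACE INTO sales (id, name, amount) VALUES (?, ?, ?)", records)
    conn.commit()
    return len(records)


if __name__ == "__main__":
    # Write a small sample file so the example runs on its own.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf-8") as f:
        f.write("id,name,amount\n1, alice ,10.5\n2,bob,\n3,carol,7\n")
        path = f.name
    conn = sqlite3.connect(":memory:")
    print("rows loaded:", load(transform(extract(path)), conn))
    print(conn.execute("SELECT * FROM sales ORDER BY id").fetchall())
    os.remove(path)
```

To schedule this in Airflow, each function would typically become a task inside a `DAG` that has a schedule, for example a `PythonOperator` with `python_callable=extract`. The tasks are then chained as `extract_task >> transform_task >> load_task`. Passing rows between tasks goes through XCom or intermediate storage, which this sketch does not cover.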

Data Integration and ETL Processes (4 lessons)
  ETL Overview (3 lessons)
    - What is ETL (Extract, Transform, Load)? [Open]
    - The role of ETL in data pipelines [Open]
    - Types of ETL (Batch vs. Real-Time) [Open]
  Practical (1 lesson)
    - Build an ETL pipeline using Apache Spark that extracts data from CSV files, transforms it (e.g., cleaning data), and loads it into a MySQL database. [Open]
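
The batch vs. real-time distinction in the ETL Overview can be illustrated without any framework. In this sketch the same transform runs in both modes. `run_batch` works on a complete list, as a nightly job over a file would. `run_streaming` is a generator that handles each event as it arrives; its input iterator stands in for a source such as a Kafka topic. The record shape (`id`, `email`) and the cleaning rule are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional

Record = Dict[str, str]


def transform(record: Record) -> Optional[Record]:
    """Shared transform: normalise the email field and drop records that lack one."""
    email = record.get("email", "").strip().lower()
    if not email:
        return None
    return {**record, "email": email}  # new dict; the input record is not mutated


def run_batch(records: List[Record]) -> List[Record]:
    """Batch ETL: the full extract is available up front and processed in one pass."""
    transformed = (transform(r) for r in records)
    return [r for r in transformed if r is not None]


def run_streaming(events: Iterable[Record]) -> Iterator[Record]:
    """Real-time ETL: each event is transformed and emitted as soon as it arrives."""
    for event in events:
        result = transform(event)
        if result is not None:
            yield result  # a real pipeline would write to its sink here


if __name__ == "__main__":
    data = [
        {"id": "1", "email": " Ann@Example.COM "},
        {"id": "2", "email": ""},
        {"id": "3", "email": "bo@example.org"},
    ]
    print(run_batch(data))
    for row in run_streaming(iter(data)):
        print("emitted", row)
```

Only the timing changes between the two functions. Batch waits for the whole data set and processes it at once. Streaming produces each result as soon as its event is read, which trades per-run efficiency for lower latency.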

Data Transformation and Cleaning (5 lessons)
  Practical (2 lessons)
    - Use Python and Pandas to clean and transform a messy dataset (e.g., removing NaNs, converting data types, and scaling numeric columns). [Open]
    - Perform data transformations (e.g., filtering, joining) on larger datasets using PySpark. [Open]
  Data Transformation Techniques (1 lesson)
    - Normalization, aggregation, filtering, and reshaping data [Open]
  Data Cleaning (1 lesson)
    - Handling missing data, removing duplicates, identifying outliers [Open]
  Tools Overview (1 lesson)
    - Pandas, NumPy, PySpark [Open]
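
The cleaning steps listed under this topic fit in one short Pandas sketch: removing duplicates, converting types, handling missing values, flagging outliers, and scaling a numeric column. Several choices here are illustrative rather than requirements of the Practical: the column names (`item`, `price`, `quantity`), the median fill, the 1.5 * IQR outlier rule, and min-max scaling.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a messy frame with hypothetical item/price/quantity columns."""
    out = df.drop_duplicates().copy()

    # Convert types: unparseable strings such as "n/a" become NaN instead of raising.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    out["quantity"] = pd.to_numeric(out["quantity"], errors="coerce")

    # Missing data: drop rows with no usable price, fill missing quantities with the median.
    out = out.dropna(subset=["price"]).copy()
    out["quantity"] = out["quantity"].fillna(out["quantity"].median())

    # Outliers: flag (rather than drop) prices more than 1.5 * IQR outside the quartiles.
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["price_outlier"] = (out["price"] < q1 - 1.5 * iqr) | (out["price"] > q3 + 1.5 * iqr)

    # Scaling: min-max scale price into [0, 1] (assumes prices are not all equal).
    low, high = out["price"].min(), out["price"].max()
    out["price_scaled"] = (out["price"] - low) / (high - low)
    return out.reset_index(drop=True)


if __name__ == "__main__":
    raw = pd.DataFrame({
        "item": ["a", "b", "b", "c", "d", "e", "f"],
        "price": ["10", "12", "12", "n/a", "11", "13", "500"],
        "quantity": [1, None, None, 3, 2, 4, 5],
    })
    print(clean(raw))
```

For larger datasets, the PySpark DataFrame API has counterparts for most of these steps, such as `dropDuplicates`, `dropna`, `fillna`, `filter`, and `join`.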

Database Schemas and Design (6 lessons)
  Practical (2 lessons)
    - Design a normalized database schema in MySQL or PostgreSQL for an e-commerce application (products, orders, customers). [Open]
    - Set up a NoSQL database (MongoDB) and design a flexible schema for a product catalog. [Open]
  Tools Overview (1 lesson)
    - MySQL, PostgreSQL, MongoDB, DBML for schema modeling [Open]
  Database Design (2 lessons)
    - Relational vs. NoSQL Databases [Open]
    - Key concepts: normalization, primary/foreign keys, indexing [Open]
  Database Schema Creation (1 lesson)
    - Designing efficient schemas for scalable systems [Open]
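
Here is one way to sketch the normalized e-commerce schema from the first Practical. It uses SQLite from the Python standard library so it runs anywhere, and the table and column names are hypothetical. Customers, products, and orders each get their own table. An `order_items` junction table resolves the many-to-many relationship between orders and products, so no product details are repeated inside `orders`. In MySQL or PostgreSQL the DDL is largely the same. The main differences are column types (e.g. `DECIMAL(10, 2)` for money, `TIMESTAMP` for dates) and auto-increment syntax (`AUTO_INCREMENT` in MySQL, identity or `SERIAL` columns in PostgreSQL).

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    full_name   TEXT NOT NULL
);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    sku         TEXT NOT NULL UNIQUE,
    name        TEXT NOT NULL,
    unit_price  NUMERIC NOT NULL CHECK (unit_price >= 0)
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    ordered_at  TEXT NOT NULL
);
-- Junction table: one row per product per order.
-- unit_price records the price at purchase time, since product prices can change later.
CREATE TABLE order_items (
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    unit_price  NUMERIC NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
-- Index the foreign key used to look up a customer's orders.
CREATE INDEX idx_orders_customer ON orders(customer_id);
"""


def create_schema() -> sqlite3.Connection:
    """Create the schema in an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = create_schema()
    print([r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")])
```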

Tableau (5 lessons)
  Working with Tableau (5 lessons)
    - Basic Charts: Bar, Line, Scatter [Open]
    - Filters & Sorting [Open]
    - Interactive Dashboards [Open]
    - Calculated Fields & Parameters [Open]
    - Best Practices [Open]

Introduction to Machine Learning (5 lessons)
  Machine Learning Basics (3 lessons)
    - What is machine learning? (Supervised vs. Unsupervised learning) [Open]
    - Common ML algorithms: Linear Regression, Decision Trees, K-Means Clustering [Open]
    - Tools Overview: Scikit-learn, TensorFlow, Keras [Open]
  Practicals (2 lessons)
    - Use Scikit-learn to train a simple linear regression model that predicts housing prices from features such as square footage and location. [Open]
    - Evaluate model performance with metrics suited to the task: accuracy, precision, recall, and F1-score for classification models; MAE, RMSE, or R² for regression models such as the housing-price model. [Open]
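
The two Practicals can be sketched together in Scikit-learn. The housing data is synthetic and noise-free: price is built from a made-up formula over square footage and bedroom count, so the sketch only demonstrates the API. Location is omitted because it would need encoding, which the Feature Engineering topic covers. The regression model is scored on held-out rows with regression metrics (MAE, R²). Accuracy, precision, recall, and F1 are shown on a small hand-made set of binary labels, since those metrics apply to classification.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             precision_score, r2_score, recall_score)
from sklearn.model_selection import train_test_split

# Synthetic, noise-free data for illustration only: [square_feet, bedrooms].
X = [[800, 1], [1000, 2], [1200, 2], [1500, 3], [1800, 3], [2100, 4]]
y = [150 * sqft + 20_000 * beds + 10_000 for sqft, beds in X]

# Hold out rows for evaluation so the score is not measured on training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print("coef:", model.coef_, "intercept:", model.intercept_)
print("MAE:", mean_absolute_error(y_test, y_pred), "R^2:", r2_score(y_test, y_pred))

# Classification metrics need class labels; these are hand-made for illustration.
labels = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 1, 1, 0]  # 3 true positives, 2 false positives, 1 false negative
print("accuracy:", accuracy_score(labels, predicted))    # 5 of 8 correct
print("precision:", precision_score(labels, predicted))  # 3 / (3 + 2)
print("recall:", recall_score(labels, predicted))        # 3 / (3 + 1)
print("F1:", f1_score(labels, predicted))                # harmonic mean of the two above
```

With real housing data, the same `fit`/`predict` calls apply, but the scores will reflect noise and missing features rather than an exact fit.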

Feature Engineering and Selection (5 lessons)
  Practical (2 lessons)
    - Use Python and Scikit-learn to engineer new features from raw data (e.g., extracting date parts, creating interaction terms). [Open]
    - Apply recursive feature elimination (RFE) to select important features for an ML model. [Open]
  Feature Engineering (2 lessons)
    - Creating new features from existing data (e.g., log transformations, polynomial features) [Open]
    - One-hot encoding, binning, and feature scaling (standardization, normalization) [Open]
  Feature Selection (1 lesson)
    - Methods for selecting important features (e.g., recursive feature elimination) [Open]
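
Both Practicals under this topic can be sketched in a few functions. `engineer_features` derives date parts, a log transform, an interaction term, and one-hot columns from a raw frame. Its column names (`order_date`, `income`, `price`, `quantity`, `region`) are hypothetical. `select_features` applies Scikit-learn's `RFE` with a linear model. The inputs are standardized first because RFE here ranks features by coefficient magnitude, which depends on feature scale. The demo target is synthetic and depends only on columns `a` and `c`, so those are the columns RFE should keep.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive new columns from hypothetical raw columns."""
    out = pd.DataFrame(index=df.index)
    out["order_month"] = df["order_date"].dt.month          # date part
    out["order_dayofweek"] = df["order_date"].dt.dayofweek  # date part, 0 = Monday
    out["log_income"] = np.log1p(df["income"])              # compresses a skewed scale
    out["price_x_qty"] = df["price"] * df["quantity"]       # interaction term
    # One-hot encode a categorical column into one 0/1 column per category.
    dummies = pd.get_dummies(df["region"], prefix="region", dtype=float)
    return out.join(dummies)


def select_features(X: pd.DataFrame, y, n_keep: int) -> list:
    """Recursive feature elimination with a linear model on standardized inputs."""
    X_scaled = StandardScaler().fit_transform(X)
    rfe = RFE(estimator=LinearRegression(), n_features_to_select=n_keep)
    rfe.fit(X_scaled, y)
    return [col for col, keep in zip(X.columns, rfe.support_) if keep]


if __name__ == "__main__":
    raw = pd.DataFrame({
        "order_date": pd.to_datetime(["2024-01-01", "2024-03-15"]),
        "income": [0.0, 99.0],
        "price": [2.0, 5.0],
        "quantity": [3, 4],
        "region": ["north", "south"],
    })
    print(engineer_features(raw))

    # Synthetic selection demo: the target depends only on columns a and c.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(100, 5)), columns=list("abcde"))
    y = 3 * X["a"] - 2 * X["c"]
    print("kept:", select_features(X, y, n_keep=2))
```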

Advanced Machine Learning Algorithms (2 lessons)
  Advanced Algorithms (2 lessons)
    - Decision Trees, Random Forests, Support Vector Machines (SVM) [Open]
    - Unsupervised Learning: K-Means Clustering, Hierarchical Clustering [Open]
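
The algorithms listed above can be compared briefly in Scikit-learn. The supervised models are scored with 5-fold cross-validation on the Iris dataset bundled with Scikit-learn. The clustering algorithms run on synthetic, well-separated blobs. Their output is compared with the generating labels using the adjusted Rand index, because cluster ids are arbitrary. Hyperparameters such as `max_depth=3` and `n_estimators=100` are illustrative choices, not tuned values.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris, make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers() -> dict:
    """Mean 5-fold cross-validated accuracy for each supervised model on Iris."""
    X, y = load_iris(return_X_y=True)
    models = {
        "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
        # SVMs are sensitive to feature scale, so standardize inside the pipeline.
        "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    }
    return {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}


def compare_clusterings() -> dict:
    """Cluster synthetic blobs; true labels are used only to score the result."""
    X, y_true = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                           cluster_std=0.5, random_state=0)
    kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
    return {
        "kmeans": adjusted_rand_score(y_true, kmeans_labels),
        "hierarchical": adjusted_rand_score(y_true, hier_labels),
    }


if __name__ == "__main__":
    print(compare_classifiers())
    print(compare_clusterings())
```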

Model Deployment (5 lessons)
  Practicals (2 lessons)
    - Create a Flask REST API that exposes your trained ML model as a web service. [Open]
    - Create a Docker image for the Flask application and run it in a container. [Open]
  Introduction to Model Deployment (3 lessons)
    - Deploying ML models using APIs (Flask) [Open]
    - Dockerizing applications for portability [Open]
    - Tools Overview: Flask, Docker [Open]
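
Here is a minimal Flask sketch of the REST API Practical. `predict_price` is a hypothetical stand-in for a trained model; in practice you might load one saved with `joblib` or `pickle` at startup. The `/predict` and `/health` routes and the `sqft`/`bedrooms` JSON fields are illustrative names. Invalid input returns HTTP 400 rather than an unhandled exception.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price(sqft: float, bedrooms: float) -> float:
    """Hypothetical stand-in for a trained model's predict call."""
    return 150.0 * sqft + 20_000.0 * bedrooms + 10_000.0


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True) or {}
    try:
        sqft = float(payload["sqft"])
        bedrooms = float(payload["bedrooms"])
    except (KeyError, TypeError, ValueError):
        return jsonify(error="expected JSON with numeric 'sqft' and 'bedrooms'"), 400
    return jsonify(prediction=predict_price(sqft, bedrooms))


@app.route("/health", methods=["GET"])
def health():
    # Simple liveness check, useful when the app runs in a container.
    return jsonify(status="ok")
```

Flask's test client exercises the API without starting a server, e.g. `app.test_client().post("/predict", json={"sqft": 1000, "bedrooms": 2})`. For the Docker Practical, assume this file is saved as `app.py`. The image would then typically install Flask and a WSGI server such as gunicorn and start it with something like `gunicorn --bind 0.0.0.0:5000 app:app`.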

Introduction to MLOps (4 lessons)
  Practicals (1 lesson)
    - Use Jenkins to automate training and deploying an ML model. [Open]
  MLOps Concepts (3 lessons)
    - Introduction to MLOps [Open]
    - Continuous Integration and Continuous Deployment (CI/CD) for machine learning [Open]
    - Model Monitoring, Retraining, and Versioning [Open]
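
This standard-library sketch covers two ideas from MLOps Concepts. For versioning, each model's parameters are stored under an id derived from a SHA-256 hash of their contents, so identical models get identical versions. For monitoring, a simple rule flags retraining when the mean absolute error on recent predictions exceeds the baseline by a chosen factor. The registry layout, the function names, and the `tolerance` default of 1.2 are assumptions. A Jenkins pipeline could call scripts like these from its train, evaluate, and deploy stages, but the Jenkinsfile itself is not shown.

```python
import hashlib
import json
import statistics
import tempfile
from pathlib import Path
from typing import Dict, List


def register_model(registry: Path, params: Dict[str, float], metrics: Dict[str, float]) -> str:
    """Version a model by hashing its parameters and store them with their metrics."""
    serialized = json.dumps(params, sort_keys=True)  # key order does not affect the hash
    version = hashlib.sha256(serialized.encode("utf-8")).hexdigest()[:12]
    target = registry / version
    target.mkdir(parents=True, exist_ok=True)
    (target / "model.json").write_text(serialized, encoding="utf-8")
    (target / "metrics.json").write_text(json.dumps(metrics, sort_keys=True), encoding="utf-8")
    return version


def needs_retraining(recent_abs_errors: List[float], baseline_mae: float,
                     tolerance: float = 1.2) -> bool:
    """Monitoring rule: retrain when live MAE exceeds baseline_mae * tolerance."""
    if not recent_abs_errors:
        return False  # no feedback yet, nothing to compare
    return statistics.mean(recent_abs_errors) > baseline_mae * tolerance


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        version = register_model(Path(tmp), {"weight": 1.5, "bias": 0.2}, {"mae": 3.0})
        print("registered version", version)
    print("retrain?", needs_retraining([4.1, 3.9, 4.4], baseline_mae=3.0))
```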
