PD176 - Gen AI
PD176 - Training in Gen AI and DE
Data Engineering and ML Track
DE and ML Lessons
Topics
Sub-Topic
Lesson
Status
Notes
Introduction to Scalable Data Pipelines
6
What is a data pipeline?
3
Definition and use cases of data pipelines
Done
Open
Importance in data-driven decision making
Done
Open
Real-time vs batch processing
Done
Open
Key components of a scalable data pipeline:
2
Data sources, data ingestion, data transformation, and data storage
In progress
Open
Tools Overview: Apache Kafka, Apache NiFi, Apache Airflow
In progress
Open
Use Case
1
Use Apache Airflow to build and schedule a basic ETL pipeline that reads data from a file and writes to a database.
In progress
Open
Data Integration and ETL Processes
4
ETL Overview
3
What is ETL (Extract, Transform, Load)
Done
Open
The role of ETL in data pipelines
Done
Open
Types of ETL (Batch vs. Real-Time)
Done
Open
Practical
1
Build an ETL pipeline using Apache Spark that extracts data from CSV, transforms it (e.g., cleaning data), and loads it into a MySQL database.
Done
Open
Data Transformation and Cleaning
5
Practical
2
Use Python and Pandas to clean and transform a messy dataset (e.g., removing NaNs, converting data types, and scaling numeric columns).
Done
Open
Perform data transformations (e.g., filtering, joining) using PySpark for larger datasets.
Done
Open
Data Transformation Techniques
1
Normalization, aggregation, filtering, and reshaping data
Done
Open
Data Cleaning
1
Handling missing data, removing duplicates, identifying outliers
Done
Open
Tools Overview
1
Pandas, NumPy, PySpark
Done
Open
Database Schemas and Design
6
Practical
2
Design a normalized database schema in MySQL or PostgreSQL for an e-commerce application (products, orders, customers).
Done
Open
Set up a NoSQL database (MongoDB) and design a flexible schema for a product catalog.
Done
Open
Tools Overview
1
MySQL, PostgreSQL, MongoDB, DBML for schema modeling
Done
Open
Database Design
2
Relational vs NoSQL Databases
Done
Open
Key concepts: normalization, primary/foreign keys, indexing
Done
Open
Database Schema Creation:
1
Designing efficient schemas for scalable systems
Done
Open
Tableau
5
Working with Tableau
5
Basic Charts: Bar, Line, Scatter
Done
Open
Filters & Sorting
Done
Open
Interactive Dashboards
Done
Open
Calculated Fields & Parameters
Done
Open
Best Practices
Done
Open
Introduction to Machine Learning
5
Machine Learning Basics
3
What is machine learning? (Supervised vs Unsupervised learning)
Done
Open
Common ML algorithms: Linear Regression, Decision Trees, K-Means Clustering
Done
Open
Tools Overview: Scikit-learn, TensorFlow, Keras
Done
Open
Practicals
2
Use Scikit-learn to train a simple linear regression model for predicting housing prices based on features like square footage, location, etc.
Done
Open
Evaluate model performance using metrics like accuracy, precision, recall, and F1-score.
Done
Open
Feature Engineering and Selection
5
Practical
2
Use Python and Scikit-learn to engineer new features from raw data (e.g., extracting date parts, creating interaction terms).
Done
Open
Apply recursive feature elimination (RFE) to select important features for an ML model.
Done
Open
Feature Engineering
2
Creating new features from existing data (e.g., log transformations, polynomial features)
Done
Open
One-hot encoding, binning, and feature scaling (standardization, normalization)
Done
Open
Feature Selection
1
Methods for selecting important features (e.g., recursive feature elimination)
Done
Open
Advanced Machine Learning Algorithms
2
Advanced Algorithms
2
Decision Trees, Random Forests, Support Vector Machines (SVM)
Done
Open
Unsupervised Learning: K-Means Clustering, Hierarchical Clustering
Done
Open
Model Deployment
5
Practicals
2
Create a Flask REST API to expose your trained ML model as a web service.
Done
Open
Create a Docker image for the Flask application and run it in a container.
Blocked
Open
Introduction to Model Deployment:
3
Deploying ML models using APIs (Flask)
Done
Open
Dockerizing applications for portability
Blocked
Open
Tools Overview: Flask, Docker
Done
Open
Introduction to MLOps
4
Practicals
1
Use Jenkins to automate the process of training and deploying an ML model.
Open
MLOps Concepts
3
Introduction to MLOps
Done
Open
Continuous Integration and Continuous Deployment (CI/CD) for machine learning
Demo
Open
Model Monitoring, Retraining, and Versioning
Demo
Open
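Worked example for the lesson "Evaluate model performance using metrics like accuracy, precision, recall, and F1-score": a minimal sketch in plain Python, assuming binary labels where 1 marks the positive class. The function name `classification_metrics` and the sample labels are hypothetical and used only for illustration. In practice, Scikit-learn's `sklearn.metrics` module would normally be used instead. Note that these four metrics score classification models. A regression model such as the housing-price example is usually scored with error measures like MAE or RMSE.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; `positive` names the label
    treated as the positive class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 3 true positives, 1 false positive,
# 1 false negative, 3 true negatives.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these sample labels, all four metrics come out to 0.75, because the counts of false positives and false negatives happen to be equal.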