Azure + Databricks + MLflow

This project simulates a real-world Machine Learning Engineer workflow. It covers the pipeline end to end: model training, cloud storage, batch scoring on Databricks, MLflow experiment tracking, and saving scored results back to Azure Blob Storage.

Workflow Summary


1. Local Development – Simulated Data + Model Training + MLflow Tracking

Created synthetic transaction data for fraud detection
(features such as amount, hour, country, device type, merchant category, international flag, etc.)
Built a complete scikit-learn Pipeline (sklearn.pipeline.Pipeline) with:
ColumnTransformer (OneHotEncoder + StandardScaler)
RandomForestClassifier
Tracked training using MLflow, including:
Parameters
Metrics (ROC AUC, accuracy, etc.)
Full model artifacts
Exported the final model as fraud_model.pkl using joblib. This file is later used for batch scoring in Databricks.
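The training steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, value distributions, labelling rule, hyperparameters, and run name are all assumptions made for the example. Only the overall shape (ColumnTransformer + RandomForestClassifier, MLflow tracking, joblib export to fraud_model.pkl) comes from the description.

```python
import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names, loosely based on the features listed above.
NUMERIC = ["amount", "hour", "is_international"]
CATEGORICAL = ["country", "device_type", "merchant_category"]


def make_transactions(n=2000, seed=42):
    """Generate synthetic transactions; every distribution here is invented."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "amount": rng.lognormal(mean=3.5, sigma=1.0, size=n).round(2),
        "hour": rng.integers(0, 24, size=n),
        "country": rng.choice(["US", "GB", "DE", "BR"], size=n),
        "device_type": rng.choice(["mobile", "desktop", "tablet"], size=n),
        "merchant_category": rng.choice(["grocery", "travel", "electronics"], size=n),
        "is_international": rng.integers(0, 2, size=n),
    })
    # Made-up labelling rule: large, international, night-time payments are riskier.
    risk = (0.02 + 0.25 * (df["amount"] > 100)
            + 0.25 * df["is_international"] + 0.20 * (df["hour"] < 6))
    df["is_fraud"] = (rng.random(n) < risk.to_numpy()).astype(int)
    return df


def build_pipeline():
    """Scale numeric columns, one-hot encode categoricals, then fit a random forest."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])


def train_and_log(df, model_path="fraud_model.pkl"):
    """Train, track the run in MLflow, and export the fitted pipeline with joblib."""
    import mlflow  # imported here so the helpers above work without MLflow installed
    import mlflow.sklearn

    X, y = df[NUMERIC + CATEGORICAL], df["is_fraud"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    with mlflow.start_run(run_name="fraud-training"):  # run name is an assumption
        pipeline = build_pipeline().fit(X_train, y_train)
        scores = pipeline.predict_proba(X_test)[:, 1]
        mlflow.log_params({"n_estimators": 100, "test_size": 0.2})
        mlflow.log_metrics({
            "roc_auc": roc_auc_score(y_test, scores),
            "accuracy": accuracy_score(y_test, pipeline.predict(X_test)),
        })
        mlflow.sklearn.log_model(pipeline, "model")
    joblib.dump(pipeline, model_path)
    return pipeline
```

Calling train_and_log(make_transactions()) would produce one MLflow run plus a local fraud_model.pkl. Exporting the whole pipeline, rather than only the classifier, means the encoder and scaler travel with the model into the scoring job.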

2. Azure Blob Storage – Raw & Processed Data

Created a dedicated Azure Storage Account with two containers:
raw/
Contains the incoming transaction data (fraud_new.csv)
processed/
Stores the final scored results after Databricks processing
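The write-up does not say how the containers were created (the Azure portal is a likely choice). As one programmatic alternative, here is a hedged sketch using the azure-storage-blob SDK. The helper names and the AZURE_STORAGE_CONNECTION_STRING environment variable are assumptions for illustration.

```python
import os


def connect_from_env(env_var="AZURE_STORAGE_CONNECTION_STRING"):
    """Build a BlobServiceClient from a connection string held in an env var (assumed name)."""
    from azure.storage.blob import BlobServiceClient  # requires azure-storage-blob

    return BlobServiceClient.from_connection_string(os.environ[env_var])


def ensure_containers(service_client, names=("raw", "processed")):
    """Create each container if it is missing; return the names that were created."""
    created = []
    for name in names:
        container = service_client.get_container_client(name)
        if not container.exists():
            container.create_container()
            created.append(name)
    return created


def upload_file(service_client, container, blob_name, local_path):
    """Upload a local file as a blob, replacing any existing blob of the same name."""
    blob = service_client.get_blob_client(container=container, blob=blob_name)
    with open(local_path, "rb") as fh:
        blob.upload_blob(fh, overwrite=True)
```

With a real client, ensure_containers(client) followed by upload_file(client, "raw", "fraud_new.csv", "fraud_new.csv") would reproduce the layout described above.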

3. Azure Databricks – Workspace & Model Setup

Deployed a new Azure Databricks Workspace
Created a compute cluster (fraud-cluster) using Databricks Runtime ML
Uploaded the trained model (fraud_model.pkl) to the workspace

4. Databricks Notebook – End-to-End Batch Scoring Pipeline

Connect Databricks to Azure Blob Storage
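The connection code itself was only shown as a screenshot. One common way to do this step is to set the storage account key in the Spark configuration and address blobs through wasbs:// URLs. The sketch below assumes that approach; the helper names, account name, and secret scope are hypothetical.

```python
def blob_account_key_conf(account):
    """Spark config key that holds the access key for a Blob Storage account."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


def wasbs_url(container, account, path=""):
    """Build a wasbs:// URL for a blob (or folder) inside a container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def configure_blob_access(spark, account, account_key):
    """Register the account key on the active Spark session."""
    spark.conf.set(blob_account_key_conf(account), account_key)


# In a Databricks notebook (spark and dbutils are provided by the runtime):
#   key = dbutils.secrets.get(scope="fraud-scope", key="storage-key")  # hypothetical names
#   configure_blob_access(spark, "mystorageacct", key)                 # hypothetical account
#   raw_url = wasbs_url("raw", "mystorageacct", "fraud_new.csv")
```

Reading the key from a secret scope keeps it out of the notebook source. If the storage account has hierarchical namespace enabled, abfss:// with the dfs endpoint is the usual alternative.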
Load the new data from Blob Storage and convert the Spark DataFrame to a pandas DataFrame
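A minimal sketch of this loading step, assuming the CSV has a header row and that the file is small enough to fit in driver memory. The function name and the reader options are assumptions.

```python
def load_new_transactions(spark, url):
    """Read a CSV from Blob Storage with Spark and return it as a pandas DataFrame."""
    # header=True uses the first row as column names; inferSchema=True asks Spark
    # to detect numeric columns instead of reading everything as strings.
    sdf = spark.read.csv(url, header=True, inferSchema=True)
    # toPandas() collects every row onto the driver, so it suits small batches only.
    return sdf.toPandas()
```

Converting to pandas lets the scikit-learn pipeline score the batch directly. For larger inputs, scoring in Spark (for example with a pandas UDF) would avoid collecting everything on the driver.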
Apply feature engineering inside Databricks and generate predictions
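The write-up does not list the engineered features, so the two below (log_amount, is_night) are purely hypothetical stand-ins. The sketch only illustrates the pattern: derive features, call predict_proba on the loaded model, and attach the scores to the original rows. The 0.5 threshold and the output column names are also assumptions.

```python
import numpy as np
import pandas as pd


def add_features(df):
    """Hypothetical engineered features; they must mirror whatever training used."""
    out = df.copy()
    out["log_amount"] = np.log1p(out["amount"])
    out["is_night"] = (out["hour"] < 6).astype(int)
    return out


def score_transactions(df, model, threshold=0.5):
    """Return a copy of df with fraud_score and fraud_pred columns added."""
    features = add_features(df)
    scored = df.copy()
    # Column 1 of predict_proba is the probability of the positive (fraud) class.
    scored["fraud_score"] = model.predict_proba(features)[:, 1]
    scored["fraud_pred"] = (scored["fraud_score"] >= threshold).astype(int)
    return scored


# In the notebook, the model would come from the uploaded pickle, for example:
#   import joblib
#   model = joblib.load(model_path)  # model_path: wherever fraud_model.pkl was uploaded
```

Any object exposing predict_proba works here, including a full scikit-learn Pipeline that does its own encoding and scaling.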
Log each scoring run to MLflow. Each scoring job logs:
input_file
model_source
Process metrics:
num_records
avg_fraud_score
max_fraud_score
fraud_rate_pred
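The logged values above could be computed and recorded along these lines. This is a sketch under assumptions: input_file and model_source are treated as MLflow parameters, fraud_rate_pred is taken to be the share of records scoring at or above a 0.5 threshold, and the run name is made up.

```python
def summarize_scores(scores, threshold=0.5):
    """Compute the per-run scoring metrics from an iterable of fraud scores."""
    scores = [float(s) for s in scores]
    if not scores:
        raise ValueError("no records were scored")
    n = len(scores)
    return {
        "num_records": n,
        "avg_fraud_score": sum(scores) / n,
        "max_fraud_score": max(scores),
        # Assumed definition: fraction of records flagged as fraud at the threshold.
        "fraud_rate_pred": sum(s >= threshold for s in scores) / n,
    }


def log_scoring_run(input_file, model_source, scores, threshold=0.5):
    """Record one batch-scoring job as an MLflow run and return its metrics."""
    import mlflow  # imported here so summarize_scores works without MLflow installed

    metrics = summarize_scores(scores, threshold)
    with mlflow.start_run(run_name="batch-scoring"):  # run name is an assumption
        mlflow.log_params({"input_file": input_file, "model_source": model_source})
        mlflow.log_metrics(metrics)
    return metrics
```

Logging these per run makes drift visible over time: a sudden jump in avg_fraud_score or fraud_rate_pred between batches is a cue to inspect the input file or the model.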

[Figure: example of a logged scoring run in MLflow]
Save the scored results back to Azure Blob Storage, where they can be downloaded as CSV files for exploration
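One way to write the results back, sketched under the assumption that the scored data is a pandas DataFrame and the target is a wasbs:// folder in the processed container. The function name and the overwrite mode are illustrative choices.

```python
def save_scored(spark, scored_pdf, url):
    """Write a scored pandas DataFrame to Blob Storage as CSV via Spark."""
    sdf = spark.createDataFrame(scored_pdf)
    # Spark writes a folder of part-*.csv files at `url`, not a single file;
    # mode("overwrite") replaces any previous output at the same location.
    sdf.write.mode("overwrite").csv(url, header=True)
```

Because Spark emits one part file per partition, the output appears as several CSV files in the target folder. Calling sdf.coalesce(1) before the write would yield a single part file for small batches.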