
🌿 Fermentation Insight Hub — Chemometrics for Lactic Acid Production

A Data Science & Digitalisation Project by Chadaporn Khaolumloet (Tangmo)

1. Theory Summary

2. Project Overview

Fermentation Insight Hub is an end-to-end data science project that simulates lactic acid fermentation and analyses it using:
Chemometrics (PCA, PLS)
Process Analytical Technology (PAT)
Machine Learning
Digital dashboards
AI-powered statistical interpretation

3. Business Context

Corbion specialises in fermentation-based biochemicals, especially lactic acid and its derivatives. Industrial fermentation processes are influenced by:
Raw sugar concentration
pH control (lime / sulfuric acid neutralisation)
Temperature profiles
Dissolved oxygen (DO) limitations
Agitation
Cell growth kinetics
NIR/Raman spectra from PAT sensors
Variability in these factors directly affects lactate yield (%) and final concentration (g/L).
This portfolio demonstrates how data science can be used to:
👉 apply chemometrics
👉 analyse fermentation health
👉 model process relationships
👉 detect deviations early
👉 build digital tools for R&D and Operations

[Figure: Project overview]

4. Project Deliverables

4.1 Project Workflow

Step | File | What this file does
1. Synthetic Data Generation | src/generate_synthetic_data.py | Creates fermentation batches and NIR spectra.
2. Utilities for Chemometrics | src/chemometrics_utils.py | Helper functions for PCA, PLS, scaling, spectral preprocessing, etc.
3. Model Training (Chemometrics) | src/train_models.py | Trains the PCA and PLS models and saves them as .joblib.
4. Dataset Storage | data/lactic_fermentation_synthetic.csv | Full synthetic fermentation dataset.
5. Saved Chemometrics Models | data/chemometrics_models.joblib | Contains the PCA model, scaler, PLS model, wavelength list, etc.
6. EDA Notebook (PCA + Visualisation) | notebooks/01_eda_chemometrics.ipynb | Exploratory data analysis and initial PCA.
7. Dashboard / Web App | app/app.py | Streamlit app for uploading data, PCA scoring, and PLS predictions.

4.2 Deliverables

1. Data Generation (generate_synthetic_data.py)

Simulates 300 fermentation batches with realistic process parameters
Process variables: sugar concentration, lime/H₂SO₄ doses, temperature, pH, DO, agitation, cell density, CO₂, duration
NIR spectra: 60 wavelengths (900 to 1700 nm) with sugar and lactate absorption bands
Outputs: Lactate yield (%), concentration (g/L), and batch quality flag
Creates: lactic_fermentation_synthetic.csv
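Synthetic spectra of this kind can be simulated by treating each spectrum as a drifting baseline plus absorption bands whose heights scale with sugar and lactate concentration (a Beer-Lambert-style assumption), plus noise. A minimal sketch under those assumptions; the band centres (1200 nm and 1690 nm), widths, scaling factors, concentration ranges, and the `simulate_spectra` helper are illustrative choices, not the values used in `generate_synthetic_data.py`.

```python
import numpy as np

# 60 wavelengths spanning 900 to 1700 nm, matching the description above
WAVELENGTHS = np.linspace(900.0, 1700.0, 60)


def gaussian_band(center_nm, width_nm):
    """Unit-height Gaussian absorption band evaluated on WAVELENGTHS."""
    return np.exp(-0.5 * ((WAVELENGTHS - center_nm) / width_nm) ** 2)


# Hypothetical band centres and widths in nm (illustrative, not measured values)
SUGAR_BAND = gaussian_band(1200.0, 40.0)
LACTATE_BAND = gaussian_band(1690.0, 30.0)


def simulate_spectra(sugar_g_l, lactate_g_l, noise_sd=0.002, seed=0):
    """Return an (n_batches, 60) array of synthetic NIR absorbances.

    Each row is a random linear baseline plus concentration-weighted
    absorption bands plus white noise.
    """
    rng = np.random.default_rng(seed)
    sugar = np.asarray(sugar_g_l, dtype=float)[:, None]
    lactate = np.asarray(lactate_g_l, dtype=float)[:, None]
    n = sugar.shape[0]
    # Baseline drift: random offset and slope per batch
    offset = rng.normal(0.10, 0.01, size=(n, 1))
    slope = rng.normal(1e-5, 2e-6, size=(n, 1))
    baseline = offset + slope * (WAVELENGTHS - WAVELENGTHS[0])
    # Absorbance contribution proportional to each concentration
    signal = 0.004 * sugar * SUGAR_BAND + 0.005 * lactate * LACTATE_BAND
    noise = rng.normal(0.0, noise_sd, size=(n, WAVELENGTHS.size))
    return baseline + signal + noise


rng = np.random.default_rng(42)
n_batches = 300
residual_sugar = rng.uniform(0.0, 20.0, n_batches)  # hypothetical residual sugar, g/L
lactate = rng.uniform(60.0, 140.0, n_batches)       # hypothetical final lactate, g/L
X = simulate_spectra(residual_sugar, lactate)
print(X.shape)  # (300, 60)
```

Writing the dataset to CSV would then be a matter of wrapping `X` in a `pandas.DataFrame` with one column per wavelength, next to the process variables and outputs.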

2. Model Training (train_models.py + chemometrics_utils.py)

PCA: 5 components for spectral dimensionality reduction and outlier detection (Hotelling's T² and Q-residual diagnostics)
PLS Regression: 5 latent variables to predict lactate concentration from NIR spectra
Preprocessing: StandardScaler to mean-centre each wavelength and scale it to unit variance
Saves: chemometrics_models.joblib
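The T²-Q diagnostics can be sketched with scikit-learn, assuming the spectra are standardised with `StandardScaler` before a 5-component `PCA`. Hotelling's T² measures how far a batch lies from the centre within the model plane (T² = Σ t_a² / λ_a, summing squared scores divided by each component's variance). Q, the squared prediction error, measures what the model cannot reconstruct (Q = ‖x − x̂‖²). The empirical 99th-percentile control limits, the helper names, and the random demo data are assumptions for illustration; `chemometrics_utils.py` may use analytical F- or χ²-based limits instead.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def t2_q_stats(scaler, pca, X):
    """Return Hotelling's T² and Q residual (SPE) for each row of X."""
    Xs = scaler.transform(X)
    scores = pca.transform(Xs)
    # T²: squared scores, each divided by the variance of its component
    t2 = np.sum(scores ** 2 / pca.explained_variance_, axis=1)
    # Q: squared reconstruction error left outside the PCA subspace
    residual = Xs - pca.inverse_transform(scores)
    q = np.sum(residual ** 2, axis=1)
    return t2, q


def fit_pca_monitor(X_train, n_components=5, percentile=99.0):
    """Fit StandardScaler + PCA and derive empirical T² and Q control limits."""
    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=n_components).fit(scaler.transform(X_train))
    t2, q = t2_q_stats(scaler, pca, X_train)
    limits = {"t2": np.percentile(t2, percentile), "q": np.percentile(q, percentile)}
    return scaler, pca, limits


# Placeholder "spectra": 3 underlying sources plus noise, 300 batches x 60 wavelengths
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 60)) + 0.05 * rng.normal(size=(300, 60))
scaler, pca, limits = fit_pca_monitor(X)

# Distort one batch over 10 adjacent wavelengths, breaking the usual correlation structure
odd = X[:1].copy()
odd[0, 20:30] += 3.0
t2_odd, q_odd = t2_q_stats(scaler, pca, odd)
print(q_odd[0] > limits["q"])
```

A batch above the T² limit is extreme but still "explained" by the model, whereas a batch above the Q limit shows a spectral pattern the model has never seen, which is usually the more direct sign of a process deviation or sensor fault.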

3. Exploratory Analysis (01_eda_chemometrics.ipynb)

Data exploration and visualization
PCA score plots colored by yield/concentration
T²-Q residual plots for batch quality assessment
PLS model performance evaluation (R², RMSE)
Prediction vs observed plots

4. Interactive Dashboard (app.py)

5 analytics tabs, including:
📈 Overview: key performance indicators (KPIs) and distribution plots for yield, concentration, quality, and more
🔬 Chemometrics (PCA): interactive selection of principal components (PC1 to PC3)

🤖 ML Model (PLS): Predicted vs Observed plot
💡 AI Insights: Google Gemini integration for AI-powered statistical interpretation of the data
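The KPI cards on the Overview tab come down to a few aggregations over the batch table. A minimal pandas sketch, assuming hypothetical column names `lactate_yield_pct`, `lactate_conc_g_l` and `quality_flag` (with values such as "OK" and "Deviation"); the real CSV schema and flag values may differ. In Streamlit, each value could then be displayed with `st.metric`.

```python
import pandas as pd


def overview_kpis(df, flag_col="quality_flag", good_value="OK"):
    """Summarise batch performance for a KPI panel (hypothetical column names)."""
    return {
        "n_batches": int(len(df)),
        "mean_yield_pct": float(df["lactate_yield_pct"].mean()),
        "mean_conc_g_l": float(df["lactate_conc_g_l"].mean()),
        # Share of batches carrying the "good" quality flag, in percent
        "pct_good_batches": float(100 * (df[flag_col] == good_value).mean()),
    }


demo = pd.DataFrame({
    "lactate_yield_pct": [88.0, 92.0, 79.0, 95.0],
    "lactate_conc_g_l": [110.0, 118.0, 97.0, 121.0],
    "quality_flag": ["OK", "OK", "Deviation", "OK"],
})
kpis = overview_kpis(demo)
print(kpis)
# {'n_batches': 4, 'mean_yield_pct': 88.5, 'mean_conc_g_l': 111.5, 'pct_good_batches': 75.0}
```

Keeping the KPI logic in a plain function, separate from the Streamlit layout code, makes it easy to test and to reuse in the EDA notebook.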
