A Data Science & Digitalisation Project by Chadaporn Khaolumloet (Tangmo)
1. Theory Summary
2. Project Overview
Fermentation Insight Hub is a full end-to-end data science solution designed to simulate lactic acid fermentation through:
Process Analytical Technology (PAT)
AI-powered statistical interpretation
3. Business Context
Corbion specialises in fermentation-based biochemicals, especially lactic acid and its derivatives.
Industrial fermentation processes are influenced by:
pH control (lime / sulfuric acid neutralisation)
NIR/Raman spectra from PAT sensors
Variability in these factors directly affects lactate yield (%) and final concentration (g/L).
This portfolio demonstrates how data science knowledge can:
👉 apply chemometrics
👉 analyse fermentation health
👉 model process relationships
👉 detect deviations early
👉 build digital tools for R&D + Operations
4. Project Deliverables
4.1 Project Workflow
4.2 Deliverables
1. Data Generation (generate_synthetic_data.py)
Simulates 300 fermentation batches with realistic process parameters
Process variables: sugar concentration, lime/H2SO4 doses, temperature, pH, DO, agitation, cell density, CO2, duration
NIR spectra: 60 wavelengths (900-1700 nm) with sugar and lactate absorption bands
Outputs: lactate yield (%), concentration (g/L), and batch quality flag
Creates: lactic_fermentation_synthetic.csv
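The data generation step described above could be sketched roughly as follows. This is a minimal illustration, not the contents of generate_synthetic_data.py: the function name `simulate_batches`, the value ranges, the band centres and widths, and the simple mass-balance link between sugar, yield and lactate are all assumptions chosen for readability. Only the batch count, the 60-point 900-1700 nm grid, and the presence of sugar and lactate bands come from the description.

```python
import numpy as np


def simulate_batches(n_batches=300, n_wavelengths=60, seed=0):
    """Simulate batches and NIR-like spectra (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    wavelengths = np.linspace(900, 1700, n_wavelengths)  # nm

    # Hypothetical process outcome: initial sugar (g/L) and lactate yield (%).
    sugar = rng.uniform(80, 150, n_batches)
    yield_pct = np.clip(rng.normal(85, 5, n_batches), 50, 100)

    # Simplified mass balance (assumption): lactate = sugar * yield.
    lactate = sugar * yield_pct / 100.0
    residual_sugar = sugar - lactate

    def band(center, width):
        # Gaussian-shaped absorption band on the wavelength grid.
        return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

    # Band positions are arbitrary placeholders, not real peak assignments.
    sugar_band = band(1450, 40)
    lactate_band = band(1680, 30)

    # Absorbance scales with concentration, plus Gaussian sensor noise.
    spectra = (
        0.004 * residual_sugar[:, None] * sugar_band
        + 0.004 * lactate[:, None] * lactate_band
        + rng.normal(0.0, 0.005, (n_batches, n_wavelengths))
    )
    return wavelengths, spectra, lactate, yield_pct
```

Calling `simulate_batches()` returns the wavelength grid, a 300 x 60 spectra matrix, and the lactate concentration and yield per batch. The remaining process variables (temperature, pH, DO and so on) could be drawn in the same way and written to a CSV file.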
2. Model Training ( + )
PCA: 5 components for spectral dimensionality reduction and outlier detection (T²-Q diagnostics)
PLS Regression: 5 latent variables to predict lactate concentration from NIR spectra
Preprocessing: StandardScaler for spectral data normalization
Saves: chemometrics_models.joblib
3. Exploratory Analysis (01_eda_chemometrics.ipynb)
Data exploration and visualization
PCA score plots colored by yield/concentration
T²-Q residual plots for batch quality assessment
PLS model performance evaluation (R², RMSE)
Prediction vs observed plots
4. Interactive Dashboard ()
5 Tabs with comprehensive analytics:
📈 Overview: Key performance indicators and distribution plots (yield, concentration, quality), etc.
🔬 Chemometrics (PCA): Interactive PC selection (PC1-3)
🤖 ML Model (PLS): Predicted vs observed plot
💡 AI Insights: Google Gemini integration for intelligent data analysis