From human health to climate change, many of our most pressing issues depend on our ability to understand and influence biology. It is one of the greatest challenges faced by humanity, both essential to our survival and highly complex.
Our inability to predict the behavior of biological processes seriously hinders progress in bioengineering and biomedical research. We still cannot precisely estimate the impact of a biological perturbation on its outcomes.
In bio-pharma, this means pre-clinical research is not a very strong predictor of clinical outcomes, in terms of adverse reactions and efficacy.
This results in a 90% attrition rate for hit molecules, 12 years from hit to market and an average cost of $2.6B per marketed therapy.
Recent advances in AI and bioengineering are, however, opening the door to a revolution in life science research.
Our goal is to push that revolution forward: we want to make research fast and predictive.
To achieve that goal we are building a new type of integrated platform, leveraging cutting edge deep learning and bioengineering methods to provide combined high-fidelity in-vitro and in-silico models of human biology.
HOW ARE WE BUILDING IT?
We believe true progress in our field can only be achieved with access to high quality self-generated data and continuous monitoring.
This is why, unlike many other AI-enabled drug discovery companies, we iterate through experimental biology (wet lab) and computational modeling (dry lab) to test our results in a continuous, confirmatory loop.
We develop bio-organisms in-house, from human tissue research models to therapeutic cells.
By combining dry and wet lab capabilities, we can more confidently generate actionable insights on biology for rapid discovery of novel treatments and innovative products.
Building a platform to make programming cells easier, means improving the process of,
Our platform builds upon Stem Cells (iPSC), Genetic Engineering and Monitoring technologies to perform and test the outcome of biological interventions.
Modeling relies on a specifically designed pipeline of deep learning and programming technologies.
We generate computer representations of the cells and tissues, designed to simulate biological processes and suggest genetic perturbations.
In silico modelling platform: data extraction and aggregation models = 20%, deep learning pipeline under early development.
Software application: visual reactive prototype = 20%, backend system = 5%, frontend under early development.
Biological disease models platform: the characteristics of cells and organoids, and the specific needs and improvements over existing models, are being evaluated through end-user interviews. Development starts in the coming months in partnership with two academic partners: the Institute for Regenerative Medicine and Biotherapy and the organ-on-chips lab at Institut Pasteur.
If there is one thing that recent events have taught us, it is that collaboration is an absolute necessity, a critical tool for human progress and survival.
We don’t plan on reaching our goals alone: our business model is designed with collaborative, distributed innovation at its core.
We believe this approach, bringing together biotechnology, computer analytics and large scale collaborations in a coherent system will enable researchers to develop the next generation of bio-organisms, gain predictive power and radically transform biology.
Our model revolves around:
Technology Platform access fees
Shared IP through R&D collaborations
Developing our own products
We focus our efforts on making sure we are building the perfect end-to-end platform for biology researchers by collaborating on R&D with labs, biotechs and big pharmas.
Predictive models in academic research
Predictive models in pre-clinical studies
Predictive models in clinical trials & precision medicine
We create complex engineered cells and tissues for a variety of applications, from academic research to clinical trials: a highly defensible product in terms of expertise and IP protection.
Our graph analytics and deep learning platform is built on in-house research and development: it leverages combined expertise in biology, in silico modelling, deep learning and graph analytics with the objective of pushing the boundaries of what can be achieved. Most parts of our pipeline are not shared and provide high technology defensibility.
We direct our efforts towards the development of an ecosystem business model. Part of that work goes into building a large network of academic and corporate partners. The platform itself is designed to benefit from economies of scale and sustain multiple research projects concurrently.
Jimmy worked on both scientific and commercial aspects of the Pharmaceutical Industry. He is a former member of the Market Access department at Bristol Myers Squibb and a former Life Science Strategy consultant.
He carried out strategic consulting, pharmaceutical pipeline management and market access missions with several of the world's largest pharmaceutical companies, including Sanofi, BMS, Pfizer and Lundbeck.
He graduated in Neurobiology (BSc) and in Therapeutic Bio-engineering (MSc) and worked on research projects at the CNRS, UK’s National Institute for Medical Research and College de France.
He holds degrees from the London Business School (PhD 2 years - MSc), ESSEC (MBA), KEIO University (MBA), and is a former member of the Sanofi Chair in Therapeutic Innovation.
He studied and applied Graph theory (at UCL) and Agent-Based Simulations (at LBS) in research contexts and applies deep unsupervised learning methods to biology.
Anthony is a cellular and molecular biologist with an expertise in Stem Cells and Senescence.
He contributed significantly to the understanding of the role of mitochondrial and telomere dysfunction during cell senescence. He worked in several world-renowned institutions, including research on induced pluripotent stem cells at the Institute of Regenerative Medicine, on skin tissue at Newcastle University and on aging biological processes with Dr. Passos, a pioneer in aging-related diseases, at the Mayo Clinic. Among other contributions, he is involved in the development of unique methodologies for analyzing telomere dysfunction using super-resolution microscopy and reporter systems that allow us to see the dynamics of telomere damage in living cells. He develops deep learning computer vision methods applied to microscopy.
He holds a Marie Curie PhD in Biology, a Post-Doctorate in Cellular Senescence and a Master in Biomedical Engineering.
Johan is an expert in genomics and bioinformatics, with a proven track record of delivering and sustaining revenue and profit gains in fast-changing markets.
He is a senior commercial executive with long experience applying commercial and scientific skills to the marketing and business development of several biotech companies, including Cephalon, APCure, Genostar and Congenica.
He worked on genomics business development for 11+ years, with a focus on Personalized Medicine, Bioremediation and Diagnostics markets and technologies.
He holds a PhD in Bioinformatics, an MSc in Therapeutic Bio-engineering and an MSc in Business Management.
Fig 1. The platform in a simplified form: Biology to Computer Model intervention-learning loop
We have a clear shared vision of what needs to be achieved. It is no longer a unique vision: a few companies have started to work with similar closed-loop approaches, namely Zymergen, Ginkgo Bioworks, Recursion and Insitro.
These companies have been tremendously successful. Unfortunately, they are only a handful in a sea of biotech and AI companies focusing on partial solutions.
We believe there is much more to be done before biology at large can benefit: better integration of computation and bioengineering technologies, improved deep learning algorithms, real progress towards interpretable AI, better predictivity of stem cell engineering and, maybe even more importantly, finally allowing academic researchers, biotechs and pharmas to access and participate in a large ecosystem based on these principles.
Our platform builds upon stem cell (iPSC) engineering and deep learning modeling to perform and test the outcome of biological interventions.
Deep Learning & In Silico Models
A. Web based application: Frontend (Visual Graph & Simulation, Tools), Backend, Distributed Database
B. Whole Cell Modeling: Ontology, Graphs and Deep Learning
C. Deep Learning Models I: Sequencing Data
D. Deep Learning Models II: Computer vision
E. Deep Learning Models III: Integration & state prediction
F. Deep Learning Models IV: State transformations
G. Deep Learning Models V: Application to Multi-Cellular models
H. Integration in a learning loop: Bio-Engineering, Data Generation, Learning and Model Building
I. Deep Learning Models VI: Cell & Gene Therapy
J. Deep Learning Models VII: Clinical predictive models
- Treatment outcome prediction
K. Deep Learning Models VIII: Precision medicine
III. Deep Learning and Computation
The core models we use are designed to build computer representations of cells and tissues. They are designed to simulate biological processes and suggest the biological perturbations needed to reach specific states.
Modelling relies on a specifically designed pipeline of deep learning and programming.
A. A pipeline designed to
between cell states
Deep learning applied to multi-omics expression data is still in its infancy, but the future is bright.
Many previously untestable hypotheses can now be interrogated as deep learning enables analysis of increasing amounts of data generated by new technologies.
For example, the effects of cellular heterogeneity on basic biology and disease etiology can now be explored by single-cell RNA-seq.
Given a set of observed cell types in control (i.e. real cells) and in simulation, we aim to:
Define the distance between different cells by segmenting over omics and visual expression
Predict the perturbation response of specific cells by training a model that learns to generalize the response of the cells in the training set
Reduce the distance between unwanted cell states and desired cell states
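The three aims above can be illustrated with a minimal NumPy sketch on synthetic data. It assumes cell states are already available as numeric feature vectors, takes the Euclidean distance between population centroids as the distance, and models a perturbation response as a constant mean shift learned from training cells. These simplifications and all names (`centroid_distance`, `learn_shift`, `predict_response`) are assumptions made for illustration, not the pipeline described here.

```python
import numpy as np

def centroid_distance(cells_a, cells_b):
    """Euclidean distance between the mean profiles of two cell populations."""
    return float(np.linalg.norm(cells_a.mean(axis=0) - cells_b.mean(axis=0)))

def learn_shift(control, perturbed):
    """Average response of the training cells, modeled as a mean shift (assumption)."""
    return perturbed.mean(axis=0) - control.mean(axis=0)

def predict_response(cells, shift, strength=1.0):
    """Apply the learned shift to cells that were not seen during training."""
    return cells + strength * shift

# Synthetic stand-ins for feature vectors of cells (rows) over 5 features (columns).
rng = np.random.default_rng(0)
n_features = 5
unwanted_train = rng.normal(0.0, 0.1, size=(40, n_features))
desired_train = rng.normal(1.0, 0.1, size=(40, n_features))
unwanted_new = rng.normal(0.0, 0.1, size=(10, n_features))

shift = learn_shift(unwanted_train, desired_train)
before = centroid_distance(unwanted_new, desired_train)
after = centroid_distance(predict_response(unwanted_new, shift), desired_train)
print(f"distance to desired state before: {before:.2f}, after: {after:.2f}")
```

A real model would replace the constant shift with a learned, cell-specific response, but the sketch shows how the three aims connect: a distance, a response model, and a check that the predicted response reduces the distance.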
Cell states are defined by multi-omics and microscopy depending on the type of modelling:
I. Proteome, transcriptome, methylome and metabolome can be used at the cell or tissue level
Our aim is to
cell & tissue states.
To achieve that goal we are developing a pipeline based on 3 stages:
Graph-based modeling of known and inferred biological interactions
Deep unsupervised learning to identify cell states and sub-states
Adversarial models to suggest predictive
B. DEEP LEARNING MODELS
I. Graph-based modeling of known and inferred biological interactions
a. Ontology generation : NLP methods
b. Equation based modeling
c. Deep Learning based approaches
1. Improving Equation based models
2. Intervention outcome prediction: efficacy, toxicity, cell fate (growth, apoptosis, differentiation, senescence)
II. Generative models to
visually interior features of cells
We are developing a deep learning pipeline to identify the interior features of cells, label them automatically, and represent them on screen. This pipeline has 3 objectives:
A. Learn a latent space of visual features to be fed to the
B. Perform automated segmentation
C. Simulate cells & tissues visually
III. Deep unsupervised learning to
identify cell states and sub-states
Deep unsupervised learning models are well adapted to the task of defining meaningful states from large amounts of data.
The models we use also allow for latent space analysis to uncover interpretable classification rules, which offers a number of advantages compared to black box models.
Here we describe how a specific unsupervised model, a variational autoencoder (VAE), can be used.
Learn a representation in the form of a latent space
Use vector arithmetic in the autoencoder’s latent space to identify meaningful dimensions
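As a rough illustration of the second step, the sketch below subtracts the mean latent codes of two cell states and ranks the latent dimensions by how much they differ. The latent codes are synthetic and stand in for the output of a trained encoder. The variable names and the choice of ranking by absolute mean difference are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim = 8

# Hypothetical latent codes for two cell states (rows are cells).
# Dimension 3 is shifted on purpose so that there is a known answer.
z_state_a = rng.normal(0.0, 0.2, size=(200, latent_dim))
z_state_b = rng.normal(0.0, 0.2, size=(200, latent_dim))
z_state_b[:, 3] += 2.0

# Vector arithmetic: the difference between the two state centroids.
delta = z_state_b.mean(axis=0) - z_state_a.mean(axis=0)

# Dimensions with the largest absolute difference are candidates for
# carrying the biological signal that separates the two states.
ranking = np.argsort(-np.abs(delta))
top_dimension = int(ranking[0])
print("most discriminative latent dimension:", top_dimension)
```

In practice the top-ranked dimensions would then be inspected against known markers to judge whether they are biologically meaningful.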
A variational autoencoder is a neural network consisting of an encoder and a decoder, similar to classical autoencoders. Unlike classical autoencoders, VAEs are able to generate new data points.
The mathematics behind VAEs differs from that of classical autoencoders such as sparse or denoising autoencoders.
The difference is that the model maximizes the likelihood of each sample $x_i$ in the training set under a generative process, as formulated in the equation:
$$P(x_i \mid \theta) = \int P(x_i \mid z_i; \theta)\, P(z_i \mid \theta)\, dz_i$$
where $\theta$ is the model parameter, which in our model corresponds to a neural network with its learnable parameters, and $z_i$ is a latent variable.
The most important idea of a VAE is to sample latent variables $z_i$ that have a certain probability of producing $x_i$ and to use them to approximate $P(x_i)$.
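Computing $P(x_i)$ exactly requires marginalizing over every possible $z_i$, which is intractable in general, so VAEs are usually trained by maximizing a lower bound on $\log P(x_i \mid \theta)$ instead. The standard form of that bound is sketched here. The approximate posterior $Q(z_i \mid x_i; \phi)$ and its parameters $\phi$ (the encoder) are notation introduced for this illustration.

```latex
\log P(x_i \mid \theta) \;\ge\;
\mathbb{E}_{Q(z_i \mid x_i; \phi)}\big[\log P(x_i \mid z_i; \theta)\big]
\;-\; \mathrm{KL}\big(Q(z_i \mid x_i; \phi)\,\|\,P(z_i \mid \theta)\big)
```

The first term rewards accurate reconstruction of $x_i$ from sampled latent variables. The second keeps the encoder's distribution close to the prior, which is what makes sampling new data points possible.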
The encoder compresses the input data (depicted as the sequencing expression of differentiating single cells) into fewer dimensions in the so-called bottleneck layer.
The decoder tries to reconstruct the original input from the compressed data in the bottleneck layer.
Reconstruction accuracy is quantified by the loss function between the original data and the reconstructed data.
The bottleneck layer is a low-dimensional representation of the original input revealing the cell differentiation process.
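The encoder, bottleneck, decoder and loss described above can be sketched as a single forward pass in NumPy. This is a minimal illustration with untrained random weights and a synthetic expression matrix. The layer sizes, the `tanh` activations, the squared-error reconstruction term and the standard normal prior are assumptions, and training (gradient updates) is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_hidden, n_latent = 32, 100, 16, 2

# Synthetic stand-in for a normalized cells-by-genes expression matrix.
x = rng.normal(size=(n_cells, n_genes))

def init(n_in, n_out):
    """Small random weights and zero biases for one linear layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

w_enc, b_enc = init(n_genes, n_hidden)
w_mu, b_mu = init(n_hidden, n_latent)
w_logvar, b_logvar = init(n_hidden, n_latent)
w_dec1, b_dec1 = init(n_latent, n_hidden)
w_dec2, b_dec2 = init(n_hidden, n_genes)

def encode(x):
    """Compress expression profiles into the mean and log-variance of q(z | x)."""
    h = np.tanh(x @ w_enc + b_enc)
    return h @ w_mu + b_mu, h @ w_logvar + b_logvar

def sample(mu, logvar):
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, I)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Reconstruct expression profiles from bottleneck coordinates."""
    h = np.tanh(z @ w_dec1 + b_dec1)
    return h @ w_dec2 + b_dec2

def vae_loss(x, x_hat, mu, logvar):
    """Reconstruction error plus KL divergence to a standard normal prior."""
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    return float(np.mean(recon + kl))

mu, logvar = encode(x)
z = sample(mu, logvar)          # bottleneck: one 2-D point per cell
x_hat = decode(z)
loss = vae_loss(x, x_hat, mu, logvar)
print(z.shape, x_hat.shape, round(loss, 2))
```

After training, plotting the rows of `z` would give the low-dimensional view of the cells that the bottleneck layer is meant to provide.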
IV. Adversarial models to suggest predictive
A first approach for state transformation is based on the adversarial features of GANs (generative adversarial networks).
The model consists of a generator and a discriminator, two neural networks that are trained jointly so that the first learns to simulate realistic data and the second learns to differentiate between different types of data patterns (see figure below).
Initially the generator is fed noise and aims to generate realistic sequencing data samples, thereby trying to deceive the discriminator into classifying synthetic samples as real. The discriminator classifies whether a given data point was drawn from the real data or was synthetically generated. The weights of both networks are updated through back-propagation depending on the loss function.
This method is used to generate high fidelity computer representations of the cells we study. We reach that point once the discriminator is not able to differentiate between data coming from the real cells and data coming from the model, and accordingly predicts a 50% chance for both types.
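The two competing objectives and the 50% stopping point can be made concrete with a small NumPy sketch of the loss functions only. It assumes the commonly used binary cross-entropy formulation with a non-saturating generator loss. The function names and the example probabilities are hypothetical, and the networks themselves are omitted.

```python
import numpy as np

def bce(pred, target, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

def discriminator_loss(d_real, d_fake):
    """The discriminator should output 1 on real samples and 0 on generated ones."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """Non-saturating loss: the generator wants the discriminator to say 1 on fakes."""
    return bce(d_fake, np.ones_like(d_fake))

# Early in training: the discriminator separates real from generated data easily.
early = discriminator_loss(np.full(64, 0.95), np.full(64, 0.05))

# Point described above: the discriminator outputs 0.5 for every sample.
balanced = discriminator_loss(np.full(64, 0.5), np.full(64, 0.5))

print(f"discriminator loss early: {early:.3f}, at 50/50: {balanced:.3f}")
```

With this formulation the discriminator loss at the 50/50 point equals 2·ln 2 (about 1.386), which gives a simple numerical signal to monitor during training.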
Adversarial models can be adapted to identify biological perturbations that are likely to minimize the distance between:
biologically engineered tissues, the goal being to improve the representativity of cells, organoids and organs-on-chips
(and tissues) to uncover treatment options and relevant pathways
Different cell (and tissue) states along specific trajectories, for example stem cells to differentiated cells
Below is an example of a generator being trained to generate a
constraints of previously learned biologically plausible rules
This approach, among others, allows us not only to differentiate between cell states, but also to suggest biologically coherent modifications of those states. Beyond health and disease, this is directly applicable to studying and simulating cell trajectories between a variety of types and subtypes of interest.
Models such as VAEs and GANs allow for the simulation of cell development or differentiation through arithmetic operations applied in their learned latent space representations.
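A minimal sketch of such latent arithmetic is given below. It assumes latent codes for stem cells and differentiated cells are already available (here they are synthetic) and simulates differentiation by moving one cell along the vector joining the two population centroids. The names and the linear path are assumptions. Decoding each step back to expression space would require a trained decoder and is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
latent_dim = 4

# Hypothetical latent codes, standing in for the output of a trained encoder.
z_stem = rng.normal(0.0, 0.1, size=(100, latent_dim))
z_diff = rng.normal(1.0, 0.1, size=(100, latent_dim))

# Direction of differentiation: difference between the two centroids.
direction = z_diff.mean(axis=0) - z_stem.mean(axis=0)

def simulate_trajectory(z_start, direction, n_steps=5):
    """Latent codes at evenly spaced points from z_start to z_start + direction."""
    alphas = np.linspace(0.0, 1.0, n_steps)
    return np.stack([z_start + a * direction for a in alphas])

trajectory = simulate_trajectory(z_stem[0], direction)

# Distance of each simulated step to the differentiated centroid.
distances = np.linalg.norm(trajectory - z_diff.mean(axis=0), axis=1)
print(np.round(distances, 2))
```

The end of the path lies much closer to the differentiated population than the start, which is the behavior one would want to verify against real intermediate cell states.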
This will be developed in a new chapter.