Leveraging the increased statistical value of flux sampling
The project will introduce new features for further statistical analysis of the samples and visualizations generated by dingo.
These will include at least the following features:
- inference of pairwise correlated reactions
- construction of a weighted graph of the model's reactions with the correlation coefficients as weights
- annotation of these weights to the metabolic model and extraction to an annotated SBML file
All methods will be implemented in Python and merged into the dingo library.
The contributor will also run experiments with several metabolic networks to investigate the scaling of their findings.
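As a minimal sketch of the first two features listed above, assume the flux samples are available as a NumPy array with one row per reaction and one column per sample (an assumption about the layout, not a statement about dingo's API). Pairwise Pearson correlations can then be computed with `numpy.corrcoef` and thresholded into a weighted edge list. The function name `correlated_pairs`, the reaction identifiers, the synthetic data, and the 0.8 cutoff are all hypothetical.

```python
import numpy as np


def correlated_pairs(samples, reaction_ids, threshold=0.8):
    """Return the correlation matrix and a weighted edge list.

    samples: array of shape (n_reactions, n_samples), an assumed layout.
    Edges map (reaction_a, reaction_b) to their Pearson coefficient and
    are kept only when |coefficient| >= threshold.
    """
    corr = np.corrcoef(samples)
    edges = {}
    for i in range(len(reaction_ids)):
        for j in range(i + 1, len(reaction_ids)):
            r = corr[i, j]
            # Skip undefined coefficients (e.g. a reaction with constant flux).
            if np.isfinite(r) and abs(r) >= threshold:
                edges[(reaction_ids[i], reaction_ids[j])] = float(r)
    return corr, edges


# Synthetic stand-in for flux samples (not real dingo output).
rng = np.random.default_rng(0)
v1 = rng.uniform(0.0, 10.0, size=1000)
samples = np.vstack([
    v1,                                      # R1
    2.0 * v1 + rng.normal(0.0, 0.1, 1000),   # R2: positively coupled to R1
    -v1 + rng.normal(0.0, 0.1, 1000),        # R3: negatively coupled to R1
    rng.uniform(0.0, 10.0, size=1000),       # R4: independent of the others
])
corr, edges = correlated_pairs(samples, ["R1", "R2", "R3", "R4"])
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight:+.3f}")
```

The resulting edge list could be handed to a graph library or written into model annotations; both steps are left out of this sketch.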
In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors.
Contrary to Flux Balance Analysis (FBA), flux sampling does not depend on an objective function [1]. Thus, sampling this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. Moreover, by drawing a sufficient number of samples, one can study the properties of individual components of the network and deduce biological insights, such as correlated reactions and pathways.
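To make the feasible set concrete: in the standard steady-state formulation, with a stoichiometric matrix S and flux bounds lb and ub, a flux vector v is feasible when S v = 0 and lb ≤ v ≤ ub. The sketch below tests membership for a toy three-reaction chain; the network, the bounds, and the helper name `is_feasible` are hypothetical.

```python
import numpy as np


def is_feasible(S, lb, ub, v, tol=1e-9):
    """Check steady state (S @ v == 0) and bounds (lb <= v <= ub)."""
    balanced = np.all(np.abs(S @ v) <= tol)
    bounded = np.all(v >= lb - tol) and np.all(v <= ub + tol)
    return bool(balanced and bounded)


# Hypothetical chain: R1 produces A, R2 converts A to B, R3 consumes B.
# Rows are metabolites (A, B); columns are reactions (R1, R2, R3).
S = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
lb = np.zeros(3)
ub = np.full(3, 10.0)

inside = is_feasible(S, lb, ub, np.array([2.0, 2.0, 2.0]))        # balanced, within bounds
unbalanced = is_feasible(S, lb, ub, np.array([2.0, 1.0, 2.0]))    # metabolite A accumulates
too_large = is_feasible(S, lb, ub, np.array([12.0, 12.0, 12.0]))  # exceeds the upper bound
print(inside, unbalanced, too_large)
```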
To sample uniformly from a convex polytope, dingo uses the Multiphase Monte Carlo Sampling method, based on the Billiard walk [2].
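For intuition only, a single step of a plain billiard walk in a full-dimensional polytope {x : Ax ≤ b} can be sketched as follows. This is not dingo's implementation: it leaves out the multiphase scheme entirely, and the function name `billiard_step`, the trajectory-length parameter `tau`, the reflection cap, and the unit-square example are assumptions made for illustration.

```python
import numpy as np


def billiard_step(A, b, x, tau, rng, max_reflections=100):
    """One billiard-walk step from an interior point x of {x : A x <= b}."""
    d = rng.standard_normal(x.size)
    d /= np.linalg.norm(d)                      # random unit direction
    remaining = -tau * np.log(1.0 - rng.random())  # random trajectory length
    p = x.copy()
    for _ in range(max_reflections):
        Ad = A @ d
        slack = b - A @ p
        # Distance to each facet the trajectory is moving towards.
        t = np.full(len(b), np.inf)
        moving_out = Ad > 1e-12
        t[moving_out] = np.maximum(slack[moving_out], 0.0) / Ad[moving_out]
        j = int(np.argmin(t))
        if t[j] >= remaining:
            return p + remaining * d            # trajectory ends inside
        # Move to the facet and reflect the direction about its normal.
        p = p + t[j] * d
        remaining -= t[j]
        normal = A[j] / np.linalg.norm(A[j])
        d = d - 2.0 * (d @ normal) * normal
    return x.copy()                             # too many reflections: stay put


# Unit square 0 <= x, y <= 1 written as A x <= b.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
rng = np.random.default_rng(1)
x = np.array([0.5, 0.5])
points = []
for _ in range(2000):
    x = billiard_step(A, b, x, tau=1.0, rng=rng)
    points.append(x)
points = np.array(points)
print(points.mean(axis=0))
```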
Metabolic models are usually in SBML format and can be visualized through Cytoscape.
For more information contact the mentors.
References:
[1] Apostolos Chalkis, Vissarion Fisikopoulos, Elias Tsigaridas, Haris Zafeiropoulos, Geometric algorithms for sampling the flux space of metabolic networks, 2021.
[2] Wedmark, Y. K., Vik, J. O., & Øyås, O. (2023). A hierarchy of metabolite exchanges in metabolic models of microbial species and communities. bioRxiv, 2023-09.
The contributor will first implement a post-processing statistical analysis of the returned samples.
Then, they will extend dingo's illustrations.py to visualize the reaction pairs found to be correlated.
Both Plotly and plotnine libraries will be considered.
Finally, they will run experiments on benchmark metabolic networks to assess how their methods scale on real-world models, and write a brief report of the results.
Difficulty: Medium
Project size: Large (350 hours)
- Required: Python, basic knowledge of mathematics (especially linear algebra and/or geometry)
- Preferred: Experience with mathematical software, C++, and/or biology is a plus
The project will greatly help in interpreting the sampling results. It benefits both the biology community, which would gain novel insights, and the geometry community, by highlighting the added value of random sampling methods. It also brings together GeomScale Org. and the NRNB community, which supports Cytoscape.
- Haris Zafeiropoulos <haris.zafeiropoulos at kuleuven.be> is working on metabolic modeling software development and applications as a post-doc in the Lab of Systems Biology at KU Leuven and has previous GSoC student experience (2021) and mentoring experience with GeomScale (2022) and NRNB (2023).
- Apostolos Chalkis <tolis.chal at gmail.com> is a Research Engineer at Quantagonia GmbH. He is an expert in statistical software, computational geometry, and optimization, and has previous GSoC student experience (2018 & 2019) and mentoring experience with GeomScale (from 2020 to 2023).
Students, please do one or more of the following tests before contacting the mentors above.
- Easy: Compile and run dingo. Use the documentation to sample from the flux space of the e_coli model.
- Medium: Compare the FBA solution with your samples. Choose random pairs of reactions and check whether they are correlated.
- Hard: For a pair of correlated reactions, show whether all the reactions of the metabolic pathway they belong to are also correlated.
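One possible starting point for the Medium and Hard tests is sketched below on synthetic data. It assumes the samples form an array with one row per reaction, that the FBA solution is a flux vector of matching length, and that a pathway is given as a list of row indices. The helper names `fba_percentile` and `pathway_all_correlated`, the 0.8 threshold, and the placeholder FBA vector are hypothetical and not part of dingo.

```python
import numpy as np


def fba_percentile(samples, v_fba, reaction):
    """Fraction of sampled fluxes of one reaction that are <= its FBA flux."""
    return float(np.mean(samples[reaction] <= v_fba[reaction]))


def pathway_all_correlated(corr, pathway, threshold=0.8):
    """True if every pair of reactions in `pathway` has |corr| >= threshold."""
    sub = np.abs(corr[np.ix_(pathway, pathway)])
    off_diagonal = ~np.eye(len(pathway), dtype=bool)
    return bool(np.all(sub[off_diagonal] >= threshold))


# Synthetic stand-in: rows 0-2 behave like a coupled pathway, row 3 is independent.
rng = np.random.default_rng(2)
base = rng.uniform(0.0, 10.0, size=500)
samples = np.vstack([
    base,
    base + rng.normal(0.0, 0.1, 500),
    base + rng.normal(0.0, 0.1, 500),
    rng.uniform(0.0, 10.0, size=500),
])
corr = np.corrcoef(samples)
v_fba = samples.max(axis=1)  # placeholder standing in for an FBA flux vector

percentile = fba_percentile(samples, v_fba, 0)
coupled = pathway_all_correlated(corr, [0, 1, 2])
mixed = pathway_all_correlated(corr, [0, 1, 3])
print(percentile, coupled, mixed)
```

A percentile close to 0 or 1 would indicate that the FBA flux of that reaction sits at the edge of the sampled distribution rather than near its bulk.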