refineGEMs package

Here is an overview of all functions. All imports are mocked via autodoc_mock_imports in the conf.py file to enable automatic building.

refineGEMs.biomass module

Most functions within this module were copied from the MEMOTE GitHub page and modified by Gwendolyn O. Gusak.

This module provides functions to assess the biomass weight and to normalise it.

refinegems.biomass.check_normalise_biomass(model: cobra.Model) → cobra.Model | None

Checks if at least one biomass reaction is present

For each found biomass reaction checks if it sums up to 1g[CDW]

Normalises the coefficients of each biomass reaction where the sum is not 1g[CDW] until the sum is 1g[CDW]

Returns model with adjusted biomass function(s)

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

cobraModel: COBRApy model with adjusted biomass functions

refinegems.biomass.normalise_biomass(biomass: cobra.Reaction, current_sum: float) → cobra.Reaction

Normalises the coefficients according to current biomass weight to one g[CDW]

Args:

biomass (Reaction): Biomass function/reaction
current_sum (float): Biomass weight calculated with sum_biomass_weight in g/mmol

Returns:

Reaction: Biomass function/reaction with updated coefficients
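
The normalisation step can be sketched as follows. This is a minimal illustration, not the package's implementation: it assumes the biomass reaction is represented as a plain dict mapping metabolite IDs to stoichiometric coefficients (the real function works on a cobra.Reaction), and the helper name scale_coefficients is hypothetical. Because the biomass weight is linear in the coefficients, dividing every coefficient by the current weight yields a weight of 1.

```python
def scale_coefficients(coefficients: dict[str, float], current_sum: float) -> dict[str, float]:
    """Divide every coefficient by the current biomass weight (hypothetical helper)."""
    if current_sum <= 0:
        raise ValueError("current_sum must be positive")
    return {met: coef / current_sum for met, coef in coefficients.items()}


# Toy biomass reaction with a current weight of 1.25 (made-up numbers)
biomass = {"ala__L_c": -0.5, "atp_c": -40.0, "adp_c": 40.0}
scaled = scale_coefficients(biomass, current_sum=1.25)
```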

refinegems.biomass.sum_biomass_weight(reaction: cobra.Reaction) → float

From MEMOTE: https://github.com/opencobra/memote/blob/81a55a163262a0e06bfcb036d98e8e551edc3873/src/memote/support/biomass.py#L95

Compute the sum of all reaction compounds.

This function expects all metabolites of the biomass reaction to have formula information assigned.

Parameters

reaction (cobra.core.reaction.Reaction): The biomass reaction of the model under investigation.

Returns

float: The molecular weight of the biomass reaction in units of g/mmol.

refinegems.biomass.test_biomass_consistency(model: cobra.Model, reaction_id: str) → float | str

Modified from MEMOTE: https://github.com/opencobra/memote/blob/81a55a163262a0e06bfcb036d98e8e551edc3873/src/memote/suite/tests/test_biomass.py#L89

Expect biomass components to sum up to 1 g[CDW].

This test only yields sensible results if all biomass precursor metabolites have chemical formulas assigned to them. The molecular weight of the biomass reaction in metabolic models is defined to be equal to 1 g/mmol. Conforming to this is essential in order to be able to reliably calculate growth yields, to cross-compare models, and to obtain valid predictions when simulating microbial consortia. A deviation from 1 - 1E-03 to 1 + 1E-06 is accepted.

Implementation: Multiplies the coefficient of each metabolite of the biomass reaction with its molecular weight calculated from the formula, then divides the overall sum of all the products by 1000.
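
The calculation described above can be sketched with plain Python data structures. This is an illustration under stated assumptions, not the MEMOTE or refineGEMs code: the names formula_weight, biomass_weight and is_consistent are hypothetical, the atomic mass table covers only a few elements, and the sketch negates the coefficients on the assumption that consumed precursors carry negative coefficients, as is usual in COBRA models.

```python
import re

# Approximate atomic masses in g/mol for a few common elements (illustrative subset)
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06}


def formula_weight(formula: str) -> float:
    """Molecular weight in g/mol of a simple formula such as 'C6H12O6'."""
    weight = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        weight += ATOMIC_MASS[element] * (int(count) if count else 1)
    return weight


def biomass_weight(coefficients: dict[str, float], formulas: dict[str, str]) -> float:
    """Sum of -coefficient * molecular weight over all metabolites, divided by 1000."""
    total = sum(-coef * formula_weight(formulas[met]) for met, coef in coefficients.items())
    return total / 1000.0


def is_consistent(weight: float) -> bool:
    """Accept a deviation from 1 - 1E-03 to 1 + 1E-06, as stated above."""
    return 1 - 1e-03 < weight < 1 + 1e-06


# Toy reaction consuming 5 mmol of glucose per gram dry weight
weight = biomass_weight({"glc__D_c": -5.0}, {"glc__D_c": "C6H12O6"})
```

In this toy case the weight is about 0.9 g/mmol, so the reaction would fail the check and be a candidate for normalisation.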

refinegems.biomass.test_biomass_presence(model: cobra.Model) → list[str] | None

Modified from MEMOTE: https://github.com/opencobra/memote/blob/81a55a163262a0e06bfcb036d98e8e551edc3873/src/memote/suite/tests/test_biomass.py#LL42C3-L42C3

Expect the model to contain at least one biomass reaction.

The biomass composition aka biomass formulation aka biomass reaction is a common pseudo-reaction accounting for biomass synthesis in constraints-based modelling. It describes the stoichiometry of intracellular compounds that are required for cell growth. While this reaction may not be relevant to modeling the metabolism of higher organisms, it is essential for single-cell modeling.

Implementation: Identifies possible biomass reactions using two principal steps:

1. Return reactions that include the SBO annotation “SBO:0000629” for biomass.

If no reactions can be identified this way:

1. Look for the buzzwords “biomass”, “growth” and “bof” in reaction IDs.

2. Look for metabolite IDs or names that contain the buzzword “biomass” and obtain the set of reactions they are involved in.

3. Remove boundary reactions from this set.

4. Return the union of reactions that match the buzzwords and of the reactions that metabolites are involved in that match the buzzword.

This test checks if at least one biomass reaction is present.

If no reaction can be identified return None.
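
The identification steps above can be sketched as follows. This is a simplified illustration, not the MEMOTE or refineGEMs code: reactions are assumed to be plain dicts with the hypothetical keys 'id', 'sbo', 'metabolites' and 'boundary', and only metabolite IDs (not names) are searched for the buzzword.

```python
BUZZWORDS = ("biomass", "growth", "bof")


def find_biomass_reactions(reactions: list[dict]):
    """Return IDs of candidate biomass reactions, or None if nothing matches."""
    # Step 1: the SBO annotation for biomass takes precedence
    by_sbo = [r["id"] for r in reactions if r.get("sbo") == "SBO:0000629"]
    if by_sbo:
        return by_sbo
    # Fallback: buzzwords in reaction IDs
    by_id = {r["id"] for r in reactions if any(w in r["id"].lower() for w in BUZZWORDS)}
    # Fallback: non-boundary reactions involving a metabolite whose ID contains "biomass"
    by_met = {
        r["id"]
        for r in reactions
        if not r.get("boundary", False)
        and any("biomass" in m.lower() for m in r["metabolites"])
    }
    found = sorted(by_id | by_met)
    return found or None


reactions = [
    {"id": "PGI", "sbo": "SBO:0000176", "metabolites": ["g6p_c", "f6p_c"], "boundary": False},
    {"id": "Growth", "sbo": None, "metabolites": ["atp_c", "biomass_c"], "boundary": False},
    {"id": "DM_bm", "sbo": None, "metabolites": ["biomass_c"], "boundary": True},
]
candidates = find_biomass_reactions(reactions)
```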

refineGEMs.charges module

Provides functions for adding charges to metabolites

When iterating through all metabolites present in a model, you will find several which have no defined charge (metab.getPlugin(‘fbc’).isSetCharge() returns false). This can lead to charge-imbalanced reactions. This script takes information on metabolite charges from the ModelSEED database. A charge is automatically added to a metabolite if it has no defined charge and if there is only one charge denoted in ModelSEED. When multiple charges are present, the metabolite and the possible charges are noted and later returned in a dictionary.

It is possible to use the correct_charges_from_db function with other databases. The user just needs to make sure that the compounds dataframe has a ‘BiGG’ and a ‘charge’ column.
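
The decision rule described above can be sketched without libSBML or pandas. This is a minimal illustration, not the package's implementation: the function name assign_charges is hypothetical, the model is reduced to a dict of metabolite IDs and charges (None meaning no defined charge), the compounds table is reduced to a list of rows with ‘BiGG’ and ‘charge’ keys, and metabolite IDs are assumed to match the BiGG IDs directly.

```python
from collections import defaultdict


def assign_charges(metabolites: dict, compounds: list[dict]):
    """Add a charge where exactly one is known; collect metabolites with several."""
    known = defaultdict(set)
    for row in compounds:
        known[row["BiGG"]].add(row["charge"])

    updated = dict(metabolites)
    ambiguous = {}
    for met_id, charge in metabolites.items():
        if charge is not None:
            continue  # an existing charge is left untouched
        candidates = sorted(known.get(met_id, ()))
        if len(candidates) == 1:
            updated[met_id] = candidates[0]
        elif len(candidates) > 1:
            ambiguous[met_id] = candidates
    return updated, ambiguous


metabolites = {"glc__D": None, "h": 1, "pi": None, "xyz": None}
compounds = [
    {"BiGG": "glc__D", "charge": 0},
    {"BiGG": "pi", "charge": -2},
    {"BiGG": "pi", "charge": -3},
    {"BiGG": "h", "charge": 1},
]
updated, ambiguous = assign_charges(metabolites, compounds)
```

Metabolites that are absent from the table ("xyz" in this toy input) stay without a charge and are not reported as ambiguous.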

refinegems.charges.correct_charges_from_db(model: libsbml.Model, compounds: pandas.DataFrame) → tuple[libsbml.Model, dict]

Adds charges taken from given database to metabolites which have no defined charge

Args:

model (libModel): Model loaded with libsbml
compounds (pd.DataFrame): Containing database data with ‘BiGG’ (BiGG-Ids) and ‘charge’ (float or int) as columns

Returns:

tuple: libSBML model (1) & dictionary ‘metabolite_id’: list(charges) (2)

libModel: Model with added charges
dict: Metabolites with respective multiple charges

refinegems.charges.correct_charges_modelseed(model: libsbml.Model) → tuple[libsbml.Model, dict]

Wrapper function which completes the steps to charge correction with the ModelSEED database

Args:

model (libModel): Model loaded with libsbml

Returns:

tuple: libSBML model (1) & dictionary ‘metabolite_id’: list(charges) (2)

libModel: Model with added charges
dict: Metabolites with respective multiple charges

refineGEMs.comparison module

Provides functions to compare and visualize multiple models

Can mainly be used to compare growth behaviour of multiple models. All other stats are shown in the memote report.

refinegems.comparison.get_sbo_mapping_multiple(models: list[libsbml.Model]) → pandas.DataFrame

Determines number of reactions per SBO Term and adds label of SBO Terms

Args:

models (list[libModel]): Models loaded with libSBML

Returns:

pd.DataFrame: SBO Terms, number of reactions per Model and SBO Label

refinegems.comparison.plot_heatmap_dt(growth: pandas.DataFrame)

Creates heatmap of simulated doubling times with additives

Args:

growth (pd.DataFrame): Containing growth data from simulate_all

Returns:

plot: Seaborn Heatmap

refinegems.comparison.plot_heatmap_native(growth: pandas.DataFrame)

Creates a plot in which growth without additives, where possible, is marked from yellow to green; cases without growth are marked black

Args:

growth (pd.DataFrame): Containing growth data from simulate_all

Returns:

plot: Seaborn Heatmap

refinegems.comparison.plot_initial_analysis(models: list[libsbml.Model])

Creates bar plot of number of entities per Model

Args:

models (list[libModel]): Models loaded with libSBML

Returns:

plot: Pandas Barchart

refinegems.comparison.plot_rea_sbo_multiple(models: list[libsbml.Model], rename=None)

Plots reactions per SBO Term in horizontal bar chart with stacked bars for the models

Args:

models (list[libModel]): Models loaded with libSBML
rename (dict, optional): Rename model ids to custom names. Defaults to None.

Returns:

plot: Pandas stacked barchart

refinegems.comparison.plot_venn(models: list[cobra.Model], entity: str, perc: bool = False, rename=None)

Creates Venn diagram to show the overlap of model entities

Args:

models (list[cobraModel]): Models loaded with cobrapy
entity (str): Compare on metabolite|reaction
perc (bool, optional): True if percentages should be used. Defaults to False.
rename (dict, optional): Rename model ids to custom names. Defaults to None.

Returns:

plot: Venn diagram

refinegems.comparison.simulate_all(models: list[cobra.Model], media: list[str], basis: str, anaerobic: bool) → pandas.DataFrame

Does a run of growth simulation for multiple models on different media

Args:

models (list[cobraModel]): Models loaded with cobrapy
media (list[str]): Media of interest (f.ex. LB, M9, …)
basis (str): Either default_uptake (adding metabs from default) or minimal_uptake (adding metabs from minimal medium)
anaerobic (bool): If True ‘EX_o2_e’ is set to 0.0 to simulate anaerobic conditions

Returns:

pd.DataFrame: table containing the results of the growth simulation

refineGEMs.curate module

Functions to enable annotation of entities using a manually curated table

While working on GEMs, the user might come across ill-annotated or missing metabolites, reactions and genes. This module aims to enable faster manual curation by letting the user edit an Excel table directly, which is then used to update the given model. This module also makes use of the cvterms module.

refinegems.curate.add_reactions_from_table(model: libsbml.Model, table: pandas.DataFrame, email: str) → libsbml.Model

Wrapper function to use with table format given in data/manual_curation.xlsx, sheet gapfill: Adds all reactions with their info given in the table to the given model

Args:

model (libModel): Model loaded with libSBML
table (pd.DataFrame): Table in format of sheet gapfill from manual_curation.xlsx located in the data folder
email (str): User Email to access the NCBI Entrez database

Returns:

libModel: Modified model with new reactions

refinegems.curate.update_annotations_from_others(model: libsbml.Model) → libsbml.Model

Synchronizes metabolite annotations for core, periplasm and extracellular

Args:

model (libModel): Model loaded with libSBML

Returns:

libModel: Modified model with synchronized annotations

refinegems.curate.update_annotations_from_table(model: libsbml.Model, table: pandas.DataFrame) → libsbml.Model

Wrapper function to use with table format given in data/manual_curation.xlsx, sheet metabs: Updates annotation of metabolites given in the table

Args:

model (libModel): Model loaded with libSBML
table (pd.DataFrame): Table in format of sheet metabs from manual_curation.xlsx located in the data folder

Returns:

libModel: Modified model with new annotations

refineGEMs.cvterms module

Helper module to work with annotations (CVTerms)

Stores dictionaries which hold information on the identifiers.org syntax, and provides functions to add CVTerms to different entities and to parse CVTerms.

refinegems.cvterms.add_cv_term_genes(entry: str, db_id: str, gene: libsbml.GeneProduct, lab_strain: bool = False)

Adds CVTerm to a gene

Args:

entry (str): Id to add as annotation
db_id (str): Database to which entry belongs. Must be in gene_db_dict.keys().
gene (GeneProduct): Gene to add CVTerm to
lab_strain (bool, optional): For locally sequenced strains the qualifiers are always HOMOLOG_TO. Defaults to False.

refinegems.cvterms.add_cv_term_metabolites(entry: str, db_id: str, metab: libsbml.Species)

Adds CVTerm to a metabolite

Args:

entry (str): Id to add as annotation
db_id (str): Database to which entry belongs. Must be in metabol_db_dict.keys().
metab (Species): Metabolite to add CVTerm to

refinegems.cvterms.add_cv_term_pathways(entry: str, db_id: str, path: libsbml.Group)

Add CVTerm to a groups pathway

Args:

entry (str): Id to add as annotation
db_id (str): Database to which entry belongs. Must be in pathway_db_dict.keys().
path (Group): Pathway to add CVTerm to

refinegems.cvterms.add_cv_term_pathways_to_entity(entry: str, db_id: str, reac: libsbml.Reaction)

Add CVTerm to a reaction as OCCURS IN pathway

Args:

entry (str): Id to add as annotation
db_id (str): Database to which entry belongs
reac (Reaction): Reaction to add CVTerm to

refinegems.cvterms.add_cv_term_reactions(entry: str, db_id: str, reac: libsbml.Reaction)

Adds CVTerm to a reaction

Args:

entry (str): Id to add as annotation
db_id (str): Database to which entry belongs. Must be in reaction_db_dict.keys().
reac (Reaction): Reaction to add CVTerm to

refinegems.cvterms.add_cv_term_units(unit_id: str, unit: libsbml.Unit, relation: int)

Adds CVTerm to a unit

Args:

unit_id (str): ID to add as URI to annotation
unit (Unit): Unit to add CVTerm to
relation (int): Provides model qualifier to be added

refinegems.cvterms.generate_cvterm(qt, b_m_qt) → libsbml.CVTerm

Generates a CVTerm with the provided qualifier & biological or model qualifier types

Args:

qt (libSBML qualifier type): BIOLOGICAL_QUALIFIER or MODEL_QUALIFIER
b_m_qt (libSBML qualifier): BQM_IS, BQB_IS_HOMOLOG_TO, etc.

Returns:

CVTerm: With provided qualifier & biological or model qualifier types

refinegems.cvterms.get_id_from_cv_term(entity: libsbml.SBase, db_id: str) → list[str]

Extract Id for a specific database from CVTerm

Args:

entity (SBase): Species, Reaction, Gene, Pathway
db_id (str): Database of interest

Returns:

list[str]: Ids of entity belonging to db_id
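
The idea of extracting database-specific IDs from annotation URIs can be sketched on plain strings. This is only an illustration under assumptions: the function name ids_for_database is hypothetical, the URIs are assumed to follow the pattern https://identifiers.org/<namespace>/<id>, and db_id is assumed here to be the identifiers.org namespace itself, which may differ from the keys the package uses.

```python
def ids_for_database(uris: list[str], db_id: str) -> list[str]:
    """Return the ID part of every URI that belongs to the given namespace."""
    prefix = f"identifiers.org/{db_id}/"
    return [uri.split(prefix, 1)[1] for uri in uris if prefix in uri]


uris = [
    "https://identifiers.org/bigg.metabolite/glc__D",
    "https://identifiers.org/kegg.compound/C00031",
    "https://identifiers.org/chebi/CHEBI:17634",
]
kegg_ids = ids_for_database(uris, "kegg.compound")
```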

refinegems.cvterms.print_cvterm(cvterm: libsbml.CVTerm)

Debug function: Prints the URIs contained in the provided CVTerm along with the provided qualifier & biological/model qualifier types

Args:

cvterm (CVTerm): A libSBML CVTerm

refineGEMs.gapfill module

The gapfill module can be used with KEGG, where you only need the KEGG organism ID, with BioCyc, or with both (Options: ‘KEGG’, ‘BioCyc’, ‘KEGG+BioCyc’). For how to obtain the BioCyc tables, look into the documentation under ‘Filling gaps with refineGEMs’ > ‘Automated gap filling’.

Run times:

‘KEGG’: ~ 2h

‘BioCyc’: ~ 45mins - 1h

‘KEGG+BioCyc’: ~ 3 - 4h

refinegems.gapfill.gap_analysis(model_libsbml: libsbml.Model, gapfill_params: dict[str, str], filename: str) → pandas.DataFrame | tuple

Main function to infer gaps in a model by comparing the locus tags of the GeneProducts to KEGG/BioCyc/both

Args:

model_libsbml (libModel): Model loaded with libSBML
gapfill_params (dict): Dictionary obtained from YAML file containing the parameter mappings
filename (str): Path to output file for gapfill analysis result

Returns:

Case ‘KEGG’:

pd.DataFrame: Table containing the columns ‘bigg_id’ ‘locus_tag’ ‘EC’ ‘KEGG’ ‘name’ ‘GPR’

Case ‘BioCyc’:

tuple: Four tables (1) - (4)

(1) pd.DataFrame: Gap fill statistics with the columns
‘Missing entity’ ‘Total’ ‘Have BiGG ID’ ‘Can be added’ ‘Notes’

(2) pd.DataFrame: Genes with the columns
‘locus_tag’ ‘protein_id’ ‘model_id’ ‘name’

(3) pd.DataFrame: Metabolites with the columns
‘bigg_id’ ‘name’ ‘BioCyc’ ‘compartment’ ‘Chemical Formula’ ‘InChI-Key’ ‘ChEBI’ ‘charge’

(4) pd.DataFrame: Reactions with the columns
‘bigg_id’ ‘name’ ‘BioCyc’ ‘locus_tag’ ‘Reactants’ ‘Products’ ‘EC’ ‘Fluxes’ ‘Spontaneous?’ ‘bigg_reaction’

Case ‘KEGG+BioCyc’:

tuple: Five tables: (1) - (4) from the output of ‘BioCyc’ & (5) from the output of ‘KEGG’
-> The reactions table additionally contains the column ‘KEGG’

refinegems.gapfill.gapfill(model_libsbml: libsbml.Model, gapfill_params: dict[str, str], filename: str) → tuple[pandas.DataFrame, libsbml.Model] | tuple[tuple, libsbml.Model]

Main function to fill gaps in a model by comparing the locus tags of the GeneProducts to KEGG/BioCyc/(Genbank) GFF file

Args:

model_libsbml (libModel): Model loaded with libSBML
gapfill_params (dict): Dictionary obtained from YAML file containing the parameter mappings
filename (str): Path to output file for gapfill analysis result
gapfill_model_out (str): Path where gapfilled model should be written to

Returns:

tuple: gap_analysis() table(s) (1) & libSBML model (2)

pd.DataFrame|tuple(pd.DataFrame): Result from function gap_analysis()
libModel: Gap filled model

refinegems.gapfill.gapfill_model(model_libsbml: libsbml.Model, gap_analysis_result: str | tuple) → libsbml.Model

Main function to fill gaps in a model from a table

Args:

model_libsbml (libModel): Model loaded with libSBML
gap_analysis_result (str|tuple): Path to Excel file from gap_analysis|Tuple of pd.DataFrames obtained from gap_analysis

Returns:

libModel: Gap filled model

refineGEMs.growth module

Provides functions to simulate growth on any medium

Tailored to work with the media denoted in the local db; it should work with any medium as long as it is defined in a csv with ; as delimiter and BiGG Ids for the compounds. Use refinegems.io.load_medium_custom and hand the result to the growth_one_medium_from_default or growth_one_medium_from_minimal function.

refinegems.growth.find_additives(model: cobra.Model, base_medium: dict) → pandas.DataFrame

Iterates through all exchanges to find metabolites that lead to a higher growth rate compared to the growth rate yielded on the base_medium

Args:

model (cobraModel): Model loaded with COBRApy
base_medium (dict): Exchanges as keys and their flux bound as value (f.ex {‘EX_glc__D_e’ : 10.0})

Returns:

pd.DataFrame: Exchanges sorted from highest to lowest growth rate improvement

refinegems.growth.find_minimum_essential(medium: pandas.DataFrame, essential: list[str]) → list[str]

Report metabolites necessary for growth and not in custom medium

Args:

medium (pd.DataFrame): Dataframe with medium definition
essential (list[str]): Ids of all metabolites which lead to zero growth if blocked. Output of find_missing_essential.

Returns:

list[str]: Ids of exchanges of metabolites not present in the medium but necessary for growth

refinegems.growth.find_missing_essential(model: cobra.Model, growth_medium: dict, default_uptake: list[str], anaerobic: bool) → list[str]

Report which exchange reactions are needed for growth, combines default uptake and valid new medium

Args:

model (cobraModel): Model loaded with COBRApy
growth_medium (dict): Growth medium definition that can be used with the model. Output of modify_medium.
default_uptake (list[str]): Metabolites consumed in standard medium
anaerobic (bool): If True ‘EX_o2_e’ is set to 0.0 to simulate anaerobic conditions

Returns:

list[str]: Ids of exchanges of all metabolites which lead to zero growth if blocked

refinegems.growth.get_all_minimum_essential(model: cobra.Model, media: list[str]) → pandas.DataFrame

Returns metabolites necessary for growth and not in media

Args:

model (cobraModel): Model loaded with COBRApy
media (list[str]): Containing the names of all media for which the growth essential metabolites not contained in the media should be returned

Returns:

pd.DataFrame: Information on which metabolites are missing in the different media

refinegems.growth.get_default_secretion(model: cobra.Model) → list[str]

Checks fluxes after FBA, if positive the metabolite is produced

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

list[str]: BiGG Ids of produced metabolites

refinegems.growth.get_default_uptake(model: cobra.Model) → list[str]

Determines which metabolites are used in the standard medium

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

list[str]: Metabolites consumed in standard medium

refinegems.growth.get_essential_reactions(model: cobra.Model) → list[str]

Knocks out each reaction, if no growth is detected the reaction is seen as essential

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

list[str]: BiGG Ids of essential reactions

refinegems.growth.get_essential_reactions_via_bounds(model: cobra.Model) → list[str]

Knocks out reactions by setting their bounds to 0, if no growth is detected the reaction is seen as essential

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

list[str]: BiGG Ids of essential reactions

refinegems.growth.get_growth_selected_media(model: cobra.Model, media: list[str], basis: str, anaerobic: bool) → pandas.DataFrame

Simulates growth on all given media

Args:

model (cobraModel): Model loaded with COBRApy
media (list[str]): Ids of media to simulate on
basis (str): Either default_uptake (adding metabs from default) or minimal_uptake (adding metabs from minimal medium)
anaerobic (bool): If True ‘EX_o2_e’ is set to 0.0 to simulate anaerobic conditions

Returns:

pd.DataFrame: Information on growth behaviour on given media

refinegems.growth.get_minimal_uptake(model: cobra.Model) → list[str]

Determines which metabolites are used in a minimal medium

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

list[str]: Metabolites consumed in minimal medium

refinegems.growth.get_missing_exchanges(model: cobra.Model, medium: pandas.DataFrame) → list[str]

Look for exchange reactions needed by the medium but not in the model

Args:

model (cobraModel): Model loaded with COBRApy
medium (pd.DataFrame): Dataframe with medium definition

Returns:

list[str]: Ids of all exchanges missing in the model but given in medium

refinegems.growth.growth_one_medium_from_default(model: cobra.Model, medium: pandas.DataFrame, anaerobic: bool) → pandas.DataFrame

Simulates growth on given medium, adding missing metabolites from the default uptake

Args:

model (cobraModel): Model loaded with COBRApy
medium (pd.DataFrame): Dataframe with medium definition
anaerobic (bool): If True ‘EX_o2_e’ is set to 0.0 to simulate anaerobic conditions

Returns:

pd.DataFrame: Information on growth behaviour on given medium

refinegems.growth.growth_one_medium_from_minimal(model: cobra.Model, medium: pandas.DataFrame, anaerobic: bool) → pandas.DataFrame

Simulates growth on given medium, adding missing metabolites from a minimal uptake

Args:

model (cobraModel): Model loaded with COBRApy
medium (pd.DataFrame): Dataframe with medium definition
anaerobic (bool): If True ‘EX_o2_e’ is set to 0.0 to simulate anaerobic conditions

Returns:

pd.DataFrame: Information on growth behaviour on given medium

refinegems.growth.modify_medium(medium: pandas.DataFrame, missing_exchanges: list[str]) → dict

Helper function: Remove exchanges from medium that are not in the model to avoid KeyError

Args:

medium (pd.DataFrame): Dataframe with medium definition
missing_exchanges (list): Ids of exchanges not in the model

Returns:

dict: Growth medium definition that can be used with the model (f.ex {‘EX_glc__D_e’ : 10.0})
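
The filtering step can be sketched as follows. This is a minimal illustration, not the package's implementation: the medium is reduced to a list of exchange reaction IDs instead of a DataFrame, the function name build_growth_medium is hypothetical, and the uniform flux bound of 10.0 is an assumption taken from the example value above.

```python
def build_growth_medium(medium_exchanges: list[str], missing_exchanges: list[str],
                        bound: float = 10.0) -> dict[str, float]:
    """Keep only exchanges that exist in the model and assign each a flux bound."""
    missing = set(missing_exchanges)
    return {ex: bound for ex in medium_exchanges if ex not in missing}


growth_medium = build_growth_medium(
    ["EX_glc__D_e", "EX_o2_e", "EX_fe3_e"],
    missing_exchanges=["EX_fe3_e"],
)
```

Dropping the missing exchanges first is what avoids the KeyError when the dictionary is later assigned to a model that lacks those reactions.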

refinegems.growth.set_fluxes_to_simulate(reaction: cobra.Reaction) → cobra.Reaction

Helper function: Set flux bounds to -1000.0 and 1000.0 to enable model simulation with growth_one_medium_from_minimal/default

Args:

reaction (Reaction): Reaction with unusable flux bounds

Returns:

Reaction: Reaction with usable flux bounds

refinegems.growth.simulate_minimum_essential(model: cobra.Model, growth_medium: dict, minimum: list[str], anaerobic: bool) → float

Simulate growth with custom medium plus necessary uptakes

Args:

model (cobraModel): Model loaded with COBRApy
growth_medium (dict): Growth medium definition that can be used with the model. Output of modify_medium.
minimum (list[str]): Ids of exchanges of metabolites not present in the medium but necessary for growth. Output of find_minimum_essential.
anaerobic (bool): If True ‘EX_o2_e’ is set to 0.0 to simulate anaerobic conditions

Returns:

float: Growth value in mmol per (gram dry weight) per hour

refineGEMs.investigate module

Provides functions to investigate the model and test with MEMOTE

These functions enable simple testing of any model using MEMOTE and access to its number of reactions, metabolites and genes.

refinegems.investigate.get_egc(model: cobra.Model) → pandas.DataFrame

Energy-generating cycles represent thermodynamically infeasible states. Charging of energy metabolites without any energy source causes such cycles. Detection method is based on (Fritzemeier et al., 2017)

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

pd.DataFrame: Table with possible EGCs

refinegems.investigate.get_mass_charge_unbalanced(model: cobra.Model) → tuple[list[str], list[str]]

Creates lists of mass and charge unbalanced reactions, without exchange reactions since they are unbalanced by definition

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

tuple: Lists of reactions that might cause errors (1) & (2)

list: List of mass unbalanced reactions
list: List of charge unbalanced reactions

refinegems.investigate.get_memote_score(memote_report: dict) → float

Extracts MEMOTE score from report

Args:

memote_report (dict): Output from run_memote.

Returns:

float: MEMOTE score

refinegems.investigate.get_metabs_with_one_cvterm(model: libsbml.Model) → list[str]

Reports metabolites which have only one annotation, can be used as basis for further annotation research

Args:

model (libModel): Model loaded with libSBML

Returns:

list: Metabolite Ids with only one annotation

refinegems.investigate.get_model_info(modelpath: str) → pandas.DataFrame

Reports core information of given model

Args:

modelpath (str): Path to model file

Returns:

pd.DataFrame: Overview on model parameters

refinegems.investigate.get_orphans_deadends_disconnected(model: cobra.Model) → tuple[list[str], list[str], list[str]]

Uses MEMOTE functions to extract orphans, deadends and disconnected metabolites

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

tuple: Lists of metabolites that might cause errors (1) - (3)

list: List of orphans
list: List of deadends
list: List of disconnected metabolites

refinegems.investigate.get_reactions_per_sbo(model: libsbml.Model) → dict

Counts number of reactions of all SBO Terms present

Args:

model (libModel): Model loaded with libSBML

Returns:

dict: SBO Term as keys and number of reactions as values
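
The counting itself can be sketched with the standard library. This is only an illustration: the function name count_reactions_per_sbo is hypothetical, and the SBO terms are assumed to have already been read from the reactions and are passed in as plain strings rather than extracted from a libSBML model.

```python
from collections import Counter


def count_reactions_per_sbo(sbo_terms: list[str]) -> dict[str, int]:
    """Map each SBO term to the number of reactions annotated with it."""
    return dict(Counter(sbo_terms))


# One entry per reaction (toy input)
counts = count_reactions_per_sbo(
    ["SBO:0000176", "SBO:0000627", "SBO:0000176", "SBO:0000629"]
)
```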

refinegems.investigate.initial_analysis(model: libsbml.Model) → tuple[str, int, int, int]

Extracts most important numbers of GEM

Args:

model (libModel): Model loaded with libSBML

Returns:

tuple: Model name (1) & corresponding amounts of entities (2) - (4)

str: Name of model
int: Number of reactions
int: Number of metabolites
int: Number of genes

refinegems.investigate.parse_reaction(eq: str, model: cobra.Model) → dict

Parses a reaction equation string to dictionary

Args:

eq (str): Equation of a reaction
model (cobraModel): Model loaded with COBRApy

Returns:

dict: Metabolite Ids as keys and their coefficients as values (negative = educts, positive = products)
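
The sign convention of the returned dictionary can be illustrated with a small parser. This is a sketch under assumptions, not the package's implementation: the function name parse_equation is hypothetical, it does not consult a model, and the equation syntax (terms separated by " + ", optional leading coefficients, and "-->" or "<=>" as arrow) is assumed for illustration.

```python
import re


def parse_equation(eq: str) -> dict[str, float]:
    """Turn 'a + 2 b --> c' into {'a': -1.0, 'b': -2.0, 'c': 1.0}."""
    left, right = re.split(r"\s*(?:<=>|-->)\s*", eq, maxsplit=1)
    coefficients: dict[str, float] = {}
    for side, sign in ((left, -1.0), (right, 1.0)):
        for term in filter(None, (t.strip() for t in side.split(" + "))):
            parts = term.split()
            if len(parts) == 2:
                factor, met_id = float(parts[0]), parts[1]
            else:
                factor, met_id = 1.0, parts[0]
            # Educts get negative, products positive coefficients
            coefficients[met_id] = coefficients.get(met_id, 0.0) + sign * factor
    return coefficients


parsed = parse_equation("2 h2_c + o2_c --> 2 h2o_c")
```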

refinegems.investigate.plot_rea_sbo_single(model: libsbml.Model)

Plots reactions per SBO Term in horizontal bar chart

Args:

model (libModel): Model loaded with libSBML

Returns:

plot: Pandas Barchart

refinegems.investigate.run_memote(model: cobra.Model) → dict

Runs MEMOTE to obtain report as dict

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

dict: MEMOTE report as json in dict format

refinegems.investigate.run_memote_sys(model: cobra.Model)

Run MEMOTE on the local linux machine

Args:

model (cobraModel): Model loaded with COBRApy

refineGEMs.io module

Provides functions to load and write models, media definitions and the manual annotation table

Depending on the application the model needs to be loaded with cobra (memote) or with libSBML (activation of groups). The media definitions are denoted in a csv within the data folder of this repository, thus the functions will only work if the user clones the repository. The manual_annotations table has to follow the specific layout given in the data folder in order to work with this module.

refinegems.io.load_a_table_from_database(table_name_or_query: str) → pandas.DataFrame

Loads the table for which the name is provided, or a table containing all rows for which the query evaluates to true, from the refineGEMs database (‘data/database/data.db’)

Args:

table_name_or_query (str): Name of a table contained in the database ‘data.db’/ a SQL query

Returns:

pd.DataFrame: Containing the table for which the name was provided from the database ‘data.db’

refinegems.io.load_all_media_from_db(mediumpath: str) → pandas.DataFrame

Helper function to extract media definitions from media_db.csv

Args:

mediumpath (str): Path to csv file with medium database

Returns:

pd.DataFrame: Table from csv with metabs added as BiGG_EX exchange reactions

refinegems.io.load_document_libsbml(modelpath: str) → libsbml.SBMLDocument

Loads model document using libSBML

Args:

modelpath (str): Path to GEM

Returns:

SBMLDocument: Loaded document by libSBML

refinegems.io.load_manual_annotations(tablepath: str = 'data/manual_curation.xlsx', sheet_name: str = 'metab') → pandas.DataFrame

Loads metabolite sheet from manual curation table

Args:

tablepath (str): Path to manual curation table. Defaults to ‘data/manual_curation.xlsx’.
sheet_name (str): Sheet name for metabolite annotations. Defaults to ‘metab’.

Returns:

pd.DataFrame: Table containing specified sheet from Excel file

refinegems.io.load_manual_gapfill(tablepath: str = 'data/manual_curation.xlsx', sheet_name: str = 'gapfill') → pandas.DataFrame

Loads gapfill sheet from manual curation table

Args:

tablepath (str): Path to manual curation table. Defaults to ‘data/manual_curation.xlsx’.
sheet_name (str): Sheet name for reaction gapfilling. Defaults to ‘gapfill’.

Returns:

pd.DataFrame: Table containing sheet with name ‘gapfill’|specified sheet_name from Excel file

refinegems.io.load_medium_custom(mediumpath: str) → pandas.DataFrame

Helper function to read medium csv

Args:

mediumpath (str): path to csv file with medium

Returns:

pd.DataFrame: Table of csv

refinegems.io.load_medium_from_db(mediumname: str) → pandas.DataFrame

Wrapper function to extract subtable for the requested medium from the database ‘data.db’

Args:

mediumname (str): Name of medium to test growth on

Returns:

pd.DataFrame: Table containing composition for one medium with metabs added as BiGG_EX exchange reactions

refinegems.io.load_model_cobra(modelpath: str) → cobra.Model

Loads model using COBRApy

Args:

modelpath (str): Path to GEM

Returns:

cobraModel: Loaded model by COBRApy

refinegems.io.load_model_libsbml(modelpath: str) → libsbml.Model

Loads model using libSBML

Args:

modelpath (str): Path to GEM

Returns:

libModel: loaded model by libSBML

refinegems.io.load_multiple_models(models: list[str], package: str) → list

Loads multiple models into a list

Args:

models (list): List of paths to models
package (str): COBRApy|libSBML

Returns:

list: List of model objects loaded with COBRApy|libSBML

refinegems.io.parse_dict_to_dataframe(str2list: dict) → pandas.DataFrame

Parses a dictionary of the form {str: list} and transforms it into a table with a column containing the strings and a column containing the lists

Args:

str2list (dict): Dictionary mapping strings to lists

Returns:

pd.DataFrame: Table with column containing the strings and column containing the lists

refinegems.io.parse_fasta_headers(filepath: str, id_for_model: bool = False) → pandas.DataFrame

Parses FASTA file headers to obtain:

the protein_id

and the model_id (like it is obtained from CarveMe)

corresponding to the locus_tag

Args:

filepath (str): Path to FASTA file
id_for_model (bool): True if model_id similar to autogenerated GeneProduct ID should be contained in resulting table

Returns:

pd.DataFrame: Table containing the columns locus_tag, Protein_id & Model_id
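
Header parsing of this kind can be sketched with regular expressions. This is an illustration under assumptions, not the package's implementation: the function name parse_headers is hypothetical, the headers are assumed to carry NCBI-style "[locus_tag=...]" and "[protein_id=...]" fields, the result is a list of dicts rather than a DataFrame, and the model_id column is omitted because its exact construction is not described here.

```python
import re


def parse_headers(lines: list[str]) -> list[dict]:
    """Collect locus_tag and protein_id from every FASTA header line."""
    rows = []
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        locus = re.search(r"\[locus_tag=([^\]]+)\]", line)
        protein = re.search(r"\[protein_id=([^\]]+)\]", line)
        rows.append({
            "locus_tag": locus.group(1) if locus else None,
            "protein_id": protein.group(1) if protein else None,
        })
    return rows


fasta = [
    ">lcl|NC_000913.3_prot_NP_414542.1_1 [gene=thrL] [locus_tag=b0001] [protein_id=NP_414542.1]",
    "MKRISTTITTTITITTGNGAG",
    ">lcl|example_2 [locus_tag=b0002]",
]
rows = parse_headers(fasta)
```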

refinegems.io.parse_gff_for_gp_info(gff_file: str) → pandas.DataFrame

Parses gff file of organism to find gene protein reactions based on locus tags

Args:

gff_file (str): Path to gff file of organism of interest

Returns:

pd.DataFrame: Table containing mapping from locus tag to GPR

refinegems.io.save_user_input(configpath: str) → dict[str, str]

Collects user input from the command line to create a config file; if no config was given, the user input is also saved to a config

Args:

configpath (str): Path to config file if present

Returns:

dict: Either loaded config file or created from user input

refinegems.io.search_ncbi_for_gpr(locus: str) → str

Fetches protein name from NCBI

Args:

locus (str): NCBI compatible locus_tag

Returns:

str: Protein name|description

refinegems.io.search_sbo_label(sbo_number: str) → str

Looks up the SBO label corresponding to a given SBO Term number

Args:

sbo_number (str): Last three digits of SBO-Term as str

Returns:

str: Denoted label for given SBO Term

refinegems.io.validate_libsbml_model(model: libsbml.Model) → int

Debug method: Validates a libSBML model with the libSBML validator

Args:

model (libModel): A libSBML model

Returns:

int: Integer specifying if validation was successful or not

refinegems.io.write_report(dataframe: pandas.DataFrame, filepath: str)

Writes reports stored in dataframes to xlsx file

Args:

dataframe (pd.DataFrame): Table containing output
filepath (str): Path to file with filename

refinegems.io.write_to_file(model: libsbml.Model, new_filename: str)

Writes modified model to new file

Args:

model (libModel): Model loaded with libSBML
new_filename (str): Filename|Path for modified model

refineGEMs.modelseed module

Reports mismatches in charges and formulae based on ModelSEED

Extracts ModelSEED data from a given tsv file, extracts all metabolites from a given model. Both lists of metabolites are compared by charge and formula.

refinegems.modelseed.compare_model_modelseed(model_charges: pandas.DataFrame, modelseed_charges: pandas.DataFrame) → pandas.DataFrame

Compares tables with charges / formulae from model & modelseed

Args:

model_charges (pd.DataFrame): Charges and formulae of model metabolites. Output of get_model_charges.
modelseed_charges (pd.DataFrame): Charges and formulae of ModelSEED metabolites. Output of get_modelseed_charges.

Returns:

pd.DataFrame: Table containing info whether charges / formulae match
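
The comparison can be pictured as a join on the metabolite identifier followed by two equality checks. The sketch below uses plain dictionaries instead of DataFrames. The data layout (id mapped to a (charge, formula) pair) and the function name are assumptions, not the refineGEMs table structure.

```python
def compare_charges_and_formulae(model_entries, modelseed_entries):
    """Compare (charge, formula) pairs for metabolite ids present in both inputs."""
    comparison = {}
    # Only ids found in both sources can be compared
    for met_id in sorted(model_entries.keys() & modelseed_entries.keys()):
        model_charge, model_formula = model_entries[met_id]
        seed_charge, seed_formula = modelseed_entries[met_id]
        comparison[met_id] = {
            "charge_match": model_charge == seed_charge,
            "formula_match": model_formula == seed_formula,
        }
    return comparison


# Illustrative values only, not taken from ModelSEED
model_entries = {"glc__D": (0, "C6H12O6"), "atp": (-4, "C10H12N5O13P3")}
modelseed_entries = {
    "glc__D": (0, "C6H12O6"),
    "atp": (-3, "C10H13N5O13P3"),
    "h2o": (0, "H2O"),
}
comparison = compare_charges_and_formulae(model_entries, modelseed_entries)
```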

refinegems.modelseed.compare_to_modelseed(model: cobra.Model) → tuple[pandas.DataFrame, pandas.DataFrame]

Executes all steps to compare model metabolites to ModelSEED metabolites

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

tuple: Tables with charge (1) & formula (2) mismatches

pd.DataFrame: Table with charge mismatches
pd.DataFrame: Table with formula mismatches

refinegems.modelseed.get_charge_mismatch(df_comp: pandas.DataFrame) → pandas.DataFrame

Extracts metabolites with charge mismatch of model & modelseed

Args:: df_comp (pd.DataFrame): Charge and formula mismatches. Output from compare_model_modelseed.
Returns:: pd.DataFrame: Table containing metabolites with charge mismatch

refinegems.modelseed.get_compared_formulae(formula_mismatch: pandas.DataFrame) → pandas.DataFrame

Compare formula by atom pattern

Args:: formula_mismatch (pd.DataFrame): Table with column containing atom comparison. Output from get_formula_mismatch.
Returns:: pd.DataFrame: table containing metabolites with formula mismatch
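
One way to compare formulae by atom pattern is to count the atoms per element and report the elements whose counts differ. The sketch below assumes simple formulae without parentheses or charge suffixes. The helper names count_atoms and differing_atoms are hypothetical and not part of refineGEMs.

```python
import re
from collections import Counter

# An element symbol (capital letter, optional lowercase letter) and an optional count
ATOM_PATTERN = re.compile(r"([A-Z][a-z]?)(\d*)")


def count_atoms(formula):
    """Count atoms per element in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in ATOM_PATTERN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def differing_atoms(formula_a, formula_b):
    """Return elements whose counts differ as {element: (count_a, count_b)}."""
    counts_a, counts_b = count_atoms(formula_a), count_atoms(formula_b)
    return {
        element: (counts_a[element], counts_b[element])
        for element in sorted(counts_a.keys() | counts_b.keys())
        if counts_a[element] != counts_b[element]
    }


atom_diff = differing_atoms("C10H12N5O13P3", "C10H13N5O13P3")
```

Counting atoms instead of comparing strings means that two formulae written in a different element order are still treated as equal.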

refinegems.modelseed.get_formula_mismatch(df_comp: pandas.DataFrame) → pandas.DataFrame

Extracts metabolites with formula mismatch of model & modelseed

Args:: df_comp (pd.DataFrame): Charge and formula mismatches. Output from compare_model_modelseed.
Returns:: pd.DataFrame: Table containing metabolites with formula mismatch

refinegems.modelseed.get_model_charges(model: cobra.Model) → pandas.DataFrame

Extracts all metabolites from model

Args:

model (cobraModel): Model loaded with COBRApy

Returns:

pd.DataFrame: Table containing charges and formulae of model metabolites

refinegems.modelseed.get_modelseed_charges(modelseed_compounds: pandas.DataFrame) → pandas.DataFrame

Extract table with BiGG, charges and formulae

Args:

modelseed_compounds (pd.DataFrame): ModelSEED data. Output from get_modelseed_compounds.

Returns:

pd.DataFrame: Table containing charges and formulae of ModelSEED metabolites

refinegems.modelseed.get_modelseed_compounds() → pandas.DataFrame

Extracts compounds from ModelSEED which have BiGG Ids

Returns:: pd.DataFrame: Table containing ModelSEED data

refineGEMs.pathways module

Provides functions for adding KEGG reactions as Group Pathways

If your organism occurs in the KEGG database, extract the KEGG reaction ID from the annotations of each reaction and identify the KEGG pathways in which that reaction occurs. Then add all KEGG pathways of a reaction as annotations with the biological qualifier ‘OCCURS_IN’ to that reaction.

refinegems.pathways.add_kegg_pathways(model, kegg_pathways)

Add KEGG reactions as BQB_OCCURS_IN

Args:

model (libModel): Model loaded with libSBML. Output of load_model_enable_groups.
kegg_pathways (dict): Reaction Id as key and Kegg Pathway Id as value. Output of extract_kegg_pathways.

Returns:

libsbml-model: modified model with Kegg pathways

refinegems.pathways.create_pathway_groups(model: libsbml.Model, pathway_groups)

Use group module to add reactions to Kegg pathway

Args:

model (libModel): Model loaded with libSBML. Output of load_model_enable_groups.
pathway_groups (dict): Kegg Pathway Id as key and reactions Ids as values. Output of get_pathway_groups.

Returns:

libModel: modified model with groups for pathways

refinegems.pathways.extract_kegg_pathways(kegg_reactions: dict) → dict

Finds pathways for reactions in the model with KEGG Ids, accesses the KEGG API, and uses tqdm to report progress to the user

Args:

kegg_reactions (dict): Reaction Id as key and Kegg Id as value. Output[0] from extract_kegg_reactions.

Returns:

dict: Reaction Id as key and Kegg Pathway Id as value

refinegems.pathways.extract_kegg_reactions(model: libsbml.Model) → tuple[dict, list]

Extract KEGG Ids from reactions

Args:

model (libModel): Model loaded with libSBML. Output of load_model_enable_groups.

Returns:

tuple: Dictionary ‘reaction_id’: ‘KEGG_id’ (1) & List of reactions without KEGG Id (2)

dict: Reaction Id as key and Kegg Id as value
list: Ids of reactions without KEGG annotation

refinegems.pathways.get_pathway_groups(kegg_pathways)

Group reaction into pathways

Args:

kegg_pathways (dict): Reaction Id as key and Kegg Pathway Id as value. Output of extract_kegg_pathways.

Returns:

dict: Kegg Pathway Id as key and reactions Ids as values
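
Grouping amounts to inverting the reaction-to-pathway mapping. The sketch below assumes that each reaction Id maps to a list of KEGG pathway Ids. The function name and the example identifiers are hypothetical.

```python
from collections import defaultdict


def group_reactions_by_pathway(reaction_to_pathways):
    """Invert {reaction_id: [pathway_id, ...]} into {pathway_id: [reaction_id, ...]}."""
    groups = defaultdict(list)
    for reaction_id, pathway_ids in reaction_to_pathways.items():
        for pathway_id in pathway_ids:
            groups[pathway_id].append(reaction_id)
    return dict(groups)


reaction_to_pathways = {
    "R_HEX1": ["map00010", "map00500"],
    "R_PGI": ["map00010"],
}
pathway_groups = group_reactions_by_pathway(reaction_to_pathways)
```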

refinegems.pathways.kegg_pathways(modelpath: str) → tuple[libsbml.Model, list[str]]

Executes all steps to add KEGG pathways as groups

Args:

modelpath (str): Path to GEM

Returns:

tuple: libSBML model (1) & List of reactions without KEGG Id (2)

libModel: Modified model with Pathways as groups
list: Ids of reactions without KEGG annotation

refinegems.pathways.load_model_enable_groups(modelpath: str) → libsbml.Model

Loads model as document using libSBML and enables groups extension

Args:

modelpath (str): Path to GEM

Returns:

libModel: Model loaded with libSBML

refineGEMs.polish module

Can be used to polish a model (created with CarveMe v.1.5.1)

The newer version of CarveMe introduces some irregularities into the model. These scripts enable, for example, the addition of BiGG Ids to the annotations as well as a correct formatting of the annotations.

refinegems.polish.add_compartment_structure_specs(model: libsbml.Model)

Adds the required specifications for the compartment structure if not set (size & spatial dimension)

Args:

model (libModel): Model loaded with libSBML

refinegems.polish.add_fba_units(model: libsbml.Model)

Args:

model (libModel): Model loaded with libSBML

refinegems.polish.add_metab(entity_list: list[libsbml.Species], id_db: str)

Adds the ID of metabolites as URI to the annotation field

For a VMH model, the corresponding BiGG IDs are added additionally!

(Currently, only BiGG & VMH IDs supported!)

Args:

entity_list (list): libSBML ListOfSpecies
id_db (str): Name of the database of the IDs contained in a model

refinegems.polish.add_reac(entity_list: list[libsbml.Reaction], id_db: str)

Adds the ID of reactions as URI to the annotation field

(Currently, only BiGG & VMH IDs supported!)

Args:

entity_list (list): libSBML ListOfReactions

id_db (str): Name of the database of the IDs contained in a model

refinegems.polish.add_uri_set(entity: libsbml.SBase, qt, b_m_qt, uri_set: sortedcontainers.SortedSet[str]) → list[str]

Add a complete URI set to the provided CVTerm

Args:

entity (SBase): A libSBML SBase object like model, GeneProduct, etc.
qt: A libSBML qualifier type: BIOLOGICAL_QUALIFIER|MODEL_QUALIFIER
b_m_qt: A libSBML biological or model qualifier type like BQB_IS|BQM_IS
uri_set (SortedSet[str]): SortedSet containing URIs

refinegems.polish.change_all_qualifiers(model: libsbml.Model, lab_strain: bool) → libsbml.Model

Wrapper function to change qualifiers of all entities at once

Args:

model (libModel): Model loaded with libSBML
lab_strain (bool): True if the strain was sequenced in a local lab

Returns:

libModel: Model with all qualifiers updated to be MIRIAM compliant

refinegems.polish.change_qualifier_per_entity(entity: libsbml.SBase, new_qt, new_b_m_qt, specific_db_prefix: str | None = None) → list

Updates Qualifiers to be MIRIAM compliant for an entity

Args:

entity (SBase): A libSBML SBase object like model, GeneProduct, etc.
new_qt (Qualifier): A libSBML qualifier type: BIOLOGICAL_QUALIFIER|MODEL_QUALIFIER
new_b_m_qt (QualifierType): A libSBML biological or model qualifier type like BQB_IS|BQM_IS
specific_db_prefix (str): Has to be set if only for a specific database the qualifier type should be changed. Can be ‘kegg.genes’, ‘biocyc’, etc.

Returns:

list: CURIEs that are not MIRIAM compliant

refinegems.polish.change_qualifiers(model: libsbml.Model, entity_type: str, new_qt, new_b_m_qt, specific_db_prefix: str | None = None) → libsbml.Model

Updates Qualifiers to be MIRIAM compliant for an entity type of a given model

Args:

model (libModel): Model loaded with libSBML
entity_type (str): Any string of the following: model|compartment|metabolite|parameter|reaction|unit definition|unit|gene product|group
new_qt (Qualifier): A libSBML qualifier type: BIOLOGICAL_QUALIFIER|MODEL_QUALIFIER
new_b_m_qt (QualifierType): A libSBML biological or model qualifier type like BQB_IS|BQM_IS
specific_db_prefix (str): Has to be set if only for a specific database the qualifier type should be changed. Can be ‘kegg.genes’, ‘biocyc’, etc.

Returns:

libModel: Model with changed qualifier for given entity type

refinegems.polish.create_fba_units(model: libsbml.Model) → list[libsbml.UnitDefinition]

Creates all fba units required for a constraint-based model

Args:

model (libModel): Model loaded with libSBML

Returns:

list: List of libSBML UnitDefinitions

refinegems.polish.create_unit(model_specs: tuple[int], meta_id: str, kind: str, e: int, m: int, s: int, uri_is: str = '', uri_idf: str = '') → libsbml.Unit

Creates unit for SBML model according to arguments

Args:

model_specs (tuple): Level & Version of SBML model
meta_id (str): Meta ID for unit (Necessary for URI)
kind (str): Unit kind constant (see libSBML for available constants)
e (int): Exponent of unit
m (int): Multiplier of unit
s (int): Scale of unit
uri_is (str): URI supporting the specified unit
uri_idf (str): URI supporting the derived from unit

Returns:

Unit: libSBML unit object

refinegems.polish.create_unit_definition(model_specs: tuple[int], identifier: str, name: str, units: list[libsbml.Unit]) → libsbml.UnitDefinition

Creates unit definition for SBML model according to arguments

Args:

model_specs (tuple): Level & Version of SBML model
identifier (str): Identifier for the defined unit
name (str): Full name of the defined unit
units (list): All units the defined unit consists of

Returns:

UnitDefinition: libSBML unit definition object

refinegems.polish.cv_ncbiprotein(gene_list, email, protein_fasta: str, lab_strain: bool = False)

Adds NCBI Id to genes as annotation

Args:

gene_list (list): libSBML ListOfGenes
email (str): User Email to access the Entrez database
protein_fasta (str): The path to the CarveMe protein.fasta input file
lab_strain (bool): Needs to be set to True if strain was self-annotated
and/or the locus tags in the CarveMe input file should be kept

refinegems.polish.cv_notes_metab(species_list: list[libsbml.Species])

Checks the notes field for information which should be in the annotation field, removes the entry from notes and adds it as URL to the CVTerms of a metabolite

Args:

species_list (list): libSBML ListOfSpecies

refinegems.polish.cv_notes_reac(reaction_list: list[libsbml.Reaction])

Checks the notes field for information which should be in the annotation field, removes the entry from notes and adds it as URL to the CVTerms of a reaction

Args:

reaction_list (list): libSBML ListOfReactions

refinegems.polish.generate_miriam_compliant_uri_set(prefix2id: sortedcontainers.SortedDict[str, sortedcontainers.SortedSet[str]]) → sortedcontainers.SortedSet[str]

Generate a set of complete MIRIAM compliant URIs from the provided prefix to identifier mapping

Args:

prefix2id (SortedDict[str: SortedSet[str]]): Dictionary containing a mapping from database prefixes to their respective identifier sets

Returns:

SortedSet: Sorted set containing complete URIs

refinegems.polish.generate_uri_set_with_specific_pattern(prefix2id: sortedcontainers.SortedDict[str, sortedcontainers.SortedSet[str]], new_pattern: bool) → sortedcontainers.SortedSet[str]

Generate a set of complete URIs from the provided prefix to identifier mapping

Args:

prefix2id (SortedDict[str: SortedSet[str]]): Dictionary containing a mapping from database prefixes to their respective identifier sets
new_pattern (bool): True if new pattern is wanted, otherwise False

Returns:

SortedSet: Sorted set containing complete URIs
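
To make the two patterns concrete, the sketch below builds identifiers.org URIs from a prefix-to-identifier mapping. It assumes that the new pattern is https://identifiers.org/prefix:id and the old pattern is https://identifiers.org/prefix/id. Whether refineGEMs uses exactly these forms is not stated in this reference. Plain dicts and a sorted list stand in for SortedDict and SortedSet, and build_uris is a hypothetical name.

```python
def build_uris(prefix_to_ids, new_pattern=True):
    """Build sorted, de-duplicated identifiers.org URIs from {prefix: identifiers}."""
    # Assumption: ':' separates prefix and id in the new pattern, '/' in the old one
    separator = ":" if new_pattern else "/"
    uris = set()
    for prefix, identifiers in prefix_to_ids.items():
        for identifier in identifiers:
            uris.add(f"https://identifiers.org/{prefix}{separator}{identifier}")
    return sorted(uris)


prefix_to_ids = {"bigg.metabolite": ["atp", "adp"], "kegg.compound": ["C00002"]}
new_style = build_uris(prefix_to_ids, new_pattern=True)
old_style = build_uris(prefix_to_ids, new_pattern=False)
```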

refinegems.polish.get_set_of_curies(uri_list: list[str]) → tuple[sortedcontainers.SortedDict[str, sortedcontainers.SortedSet[str]], list[str]]

Gets a list of URIs & maps the database prefixes to their respective identifier sets

Args:

uri_list (list[str]): List containing CURIEs

Returns:

tuple: Sorted dictionary (1) & list (2)

SortedDict: Sorted dictionary mapping database prefixes from the provided CURIEs to their respective identifier sets also provided by the CURIEs
list: List of CURIEs that are invalid according to bioregistry
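
The mapping step can be sketched with a regular expression that splits identifiers.org URIs into prefix and identifier. This example does not query bioregistry. It merely collects URIs that do not match the assumed pattern, and collect_curies is a hypothetical name. Plain dict and set replace the sorted containers.

```python
import re

# Assumed URI shape: http(s)://identifiers.org/<prefix>/<id> or <prefix>:<id>
URI_PATTERN = re.compile(r"^https?://identifiers\.org/([^/:]+)[/:](.+)$")


def collect_curies(uri_list):
    """Map database prefixes to identifier sets and collect unparsable URIs."""
    prefix_to_ids = {}
    unparsed = []
    for uri in uri_list:
        match = URI_PATTERN.match(uri)
        if match is None:
            unparsed.append(uri)
            continue
        prefix, identifier = match.groups()
        prefix_to_ids.setdefault(prefix, set()).add(identifier)
    return prefix_to_ids, unparsed


uris = [
    "https://identifiers.org/bigg.metabolite/atp",
    "https://identifiers.org/bigg.metabolite:adp",
    "https://identifiers.org/bigg.metabolite/atp",  # duplicate, collapsed by the set
    "not a uri",
]
prefix_to_ids, unparsed = collect_curies(uris)
```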

refinegems.polish.improve_uri_per_entity(entity: libsbml.SBase, bioregistry: bool, new_pattern: bool) → tuple[list[str], list[str]]

Helper function: Removes duplicates & changes pattern according to new_pattern

Args:

entity (SBase): A libSBML SBase object, either a model or an entity
bioregistry (bool): Specifies whether the URIs should be changed with the help of bioregistry to be MIRIAM compliant or changed according to new or old pattern
new_pattern (bool): True if new pattern is wanted, otherwise False

Returns:

tuple: Two lists (1) & (2)

list: List of all collected invalid annotations of one entity
list: List of all collected invalid CURIEs of one entity

refinegems.polish.improve_uris(entities: libsbml.SBase, bioregistry: bool, new_pattern: bool) → tuple[dict[str, list[str]], dict[str, list[str]]]

Removes duplicates & changes pattern according to bioregistry or new_pattern

Args:

entities (SBase): A libSBML SBase object, either a model or a list of entities
bioregistry (bool): Specifies whether the URIs should be changed with the help of bioregistry to be MIRIAM compliant or changed according to new or old pattern
new_pattern (bool): True if new pattern is wanted, otherwise False

Returns:

tuple: Two dictionaries (1) & (2)

dictionary: Mapping of entity identifier to list of corresponding not MIRIAM compliant annotations
dictionary: Mapping of entity identifier to list of corresponding invalid CURIEs

refinegems.polish.polish(model: libsbml.Model, email: str, id_db: str, protein_fasta: str, lab_strain: bool, path: str) → libsbml.Model

Completes all steps to polish a model

(Tested for models having either BiGG or VMH identifiers.)

Args:

model (libModel): model loaded with libSBML
email (str): E-mail for Entrez
id_db (str): Main database where identifiers in model come from
protein_fasta (str): File used as input for CarveMe
lab_strain (bool): True if the strain was sequenced in a local lab
path (str): Output path for incorrect annotations file(s)

Returns:

libModel: Polished libSBML model

refinegems.polish.polish_annotations(model: libsbml.Model, bioregistry: bool, new_pattern: bool, filename: str) → libsbml.Model

Polishes all annotations in a model such that no duplicates are present & the same pattern is used for all CURIEs

Args:

model (libModel): Model loaded with libSBML
bioregistry (bool): Specifies whether the URIs should be changed with the help of bioregistry to be MIRIAM compliant or changed according to new or old pattern
new_pattern (bool): True if new pattern is wanted, otherwise False
filename (str): Path to output file for invalid CURIEs detected by improve_uris

Returns:

libModel: libSBML model with polished annotations

refinegems.polish.polish_entities(entity_list: list, metabolite: bool)

Sets boundary condition and constant if not set for a metabolite

Args:

entity_list (list): libSBML ListOfSpecies or ListOfReactions
metabolite (boolean): flag to determine whether entity = metabolite

refinegems.polish.print_UnitDefinitions(contained_unit_defs: list[libsbml.UnitDefinition])

Prints a list of libSBML UnitDefinitions as XMLNodes

Args:

contained_unit_defs (list): List of libSBML UnitDefinition objects

refinegems.polish.print_remaining_UnitDefinitions(model: libsbml.Model, list_of_fba_units: list[libsbml.UnitDefinition])

Prints UnitDefinitions from the model that were removed as these were not contained in the list_of_fba_units

Args:

model (libModel): Model loaded with libSBML
list_of_fba_units (list): List of libSBML UnitDefinitions

refinegems.polish.set_default_units(model: libsbml.Model)

Sets default units of model

Args:

model (libModel): Model loaded with libSBML

refinegems.polish.set_initial_amount(model: libsbml.Model)

Sets initial amount to all metabolites if not already set or if initial concentration is not set

Args:

model (libModel): Model loaded with libSBML

refinegems.polish.set_units(model: libsbml.Model)

Sets units of parameters in model

Args:

model (libModel): Model loaded with libSBML

refineGEMs.sboann module

Provides functions to automate the addition of SBO terms to the model

Script written by Elisabeth Fritze in her bachelor thesis. Modified by Gwendolyn O. Gusak during her master thesis. Commented by Famke Bäuerle and extended by Nantia Leonidou.

It is split into many small functions, all of which are annotated. However, for SBO-Term annotation it only makes sense to run the “main” function, sbo_annotation(model_libsbml), if you want to continue with the model. The smaller functions might be useful if special information is needed for a reaction without the context of a bigger model or when the automated annotation fails for some reason.

refinegems.sboann.addSBOforCompartments(model)

refinegems.sboann.addSBOforGenes(model)

refinegems.sboann.addSBOforGroups(model)

refinegems.sboann.addSBOforMetabolites(model)

refinegems.sboann.addSBOforModel(model)

refinegems.sboann.addSBOforParameters(model)

refinegems.sboann.addSBOfromDB(reac: libsbml.Reaction, cur) → bool

Adds SBO term based on bigg id of a reaction

Args:

reac (Reaction): Reaction from sbml model
cur (sqlite3.connect.cursor): Used to access the sqlite3 database

Returns:

bool: True if SBO Term was changed

refinegems.sboann.addSBOviaEC(reac: libsbml.Reaction, cur)

Adds SBO terms based on EC numbers given in the annotations of a reaction

Args:

reac (Reaction): Reaction from sbml model
cur (sqlite3.connect.cursor): Used to access the sqlite3 database

refinegems.sboann.checkAcetylationViaEC(reac: libsbml.Reaction)

Tests if reac is acetylation by its EC-Code and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkActiveTransport(reac: libsbml.Reaction)

Tests if reac is active transport (uses atp/pep) and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkBiomass(reac: libsbml.Reaction)

Tests if reac is biomass / growth and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkCoTransport(reac: libsbml.Reaction)

Tests if reac is co-transport and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkDeamination(reac: libsbml.Reaction)

Tests if reac is deamination and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkDeaminationViaEC(reac: libsbml.Reaction)

Tests if reac is deamination by its EC-Code and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkDecarbonylation(reac: libsbml.Reaction)

Tests if reac is decarbonylation and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkDecarboxylation(reac: libsbml.Reaction)

Tests if reac is decarboxylation and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkDecarboxylationViaEC(reac: libsbml.Reaction)

Tests if reac is decarboxylation by its EC-Code and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkDemand(reac: libsbml.Reaction)

Tests if reac is demand and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkExchange(reac: libsbml.Reaction)

Tests if reac is exchange and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkGlycosylation(reac: libsbml.Reaction)

Tests if reac is glycosylation and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkGlycosylationViaEC(reac: libsbml.Reaction)

Tests if reac is glycosylation by its EC-Code and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkHydrolysisViaEC(reac: libsbml.Reaction)

Tests if reac is hydrolysis by its EC-Code and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkIsomerisationViaEC(reac: libsbml.Reaction)

Tests if reac is isomerisation by its EC-Code and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkMethylationViaEC(reac: libsbml.Reaction)

Tests if reac is methylation by its EC-Code and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkPassiveTransport(reac: libsbml.Reaction)

Tests if reac is passive transport and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkPhosphorylation(reac: libsbml.Reaction)

Tests if reac is phosphorylase / kinase and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkRedox(reac: libsbml.Reaction)

Tests if reac is redox and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkRedoxViaEC(reac: libsbml.Reaction)

Tests if reac is redox by its EC-Code and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkSink(reac: libsbml.Reaction)

Tests if reac is sink and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.checkTransaminationViaEC(reac: libsbml.Reaction)

Tests if reac is transamination by its EC-Code and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.getCompartmentDict(reac: libsbml.Reaction)

Sorts metabolites by compartment

Args:

reac (Reaction): Reaction from sbml model

Returns:

dict: compartment as key and metabolites as values
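
The returned structure can be pictured with the following sketch, which groups metabolite ids by compartment. It works on plain (metabolite_id, compartment) pairs instead of libSBML objects. The function name and the example ids are assumptions.

```python
from collections import defaultdict


def group_by_compartment(metabolites):
    """Group metabolite ids by compartment from (metabolite_id, compartment) pairs."""
    grouped = defaultdict(list)
    for metabolite_id, compartment in metabolites:
        grouped[compartment].append(metabolite_id)
    return dict(grouped)


metabolites = [
    ("M_glc__D_e", "e"),
    ("M_glc__D_c", "c"),
    ("M_atp_c", "c"),
]
by_compartment = group_by_compartment(metabolites)
```

The number of keys in the result equals the number of compartments the reaction touches.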

refinegems.sboann.getCompartmentFromSpeciesRef(speciesReference: libsbml.SpeciesReference) → libsbml.Compartment

Extracts compartment from a species by its reference

Args:

speciesReference (SpeciesReference): Reference to species

Returns:

Compartment: Compartment which the species lives in

refinegems.sboann.getCompartmentList(reac: libsbml.Reaction)

Extracts compartments of metabolites

Args:

reac (Reaction): Reaction from sbml model

Returns:

set: compartment information of all metabolites

refinegems.sboann.getCompartmentlessMetaboliteIds(reac: libsbml.Reaction)

Extracts metabolites which have no compartment information

Args:

reac (Reaction): Reaction from sbml model

Returns:

list: all metabolites which have no compartment

refinegems.sboann.getCompartmentlessProductIds(reac: libsbml.Reaction)

Extracts products which have no compartment information

Args:

reac (Reaction): Reaction from sbml model

Returns:

list: products (metabolites) without compartments

refinegems.sboann.getCompartmentlessReactantIds(reac: libsbml.Reaction)

Extracts reactants which have no compartment information

Args:

reac (Reaction): Reaction from sbml model

Returns:

list: reactants (metabolites) without compartments

refinegems.sboann.getCompartmentlessSpeciesId(speciesReference: libsbml.SpeciesReference) → str

Determines whether a species has a compartment by its reference

Args:

speciesReference (SpeciesReference): Reference to species

Returns:

libsbml-species-id: id of species without compartment

refinegems.sboann.getECNums(reac: libsbml.Reaction)

Extracts EC-Code from the reaction annotations

Args:

reac (Reaction): Reaction from sbml model

Returns:

list: all EC-Numbers of the reaction
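
As an illustration, EC numbers can be read from annotation URIs with a regular expression, and the first digit names the top-level enzyme class. The URI shape (an ec-code prefix followed by the number) and both helper names are assumptions. The sketch works on plain strings and does not touch libSBML.

```python
import re

# Assumed annotation shape: ".../ec-code/1.1.1.1" or ".../ec-code:1.1.1.1"
EC_URI_PATTERN = re.compile(r"ec-code[/:](\d+(?:\.(?:\d+|-)){3})")

# Top-level classes of the Enzyme Commission nomenclature
EC_CLASSES = {
    "1": "oxidoreductase",
    "2": "transferase",
    "3": "hydrolase",
    "4": "lyase",
    "5": "isomerase",
    "6": "ligase",
    "7": "translocase",
}


def extract_ec_numbers(annotation_uris):
    """Return all EC numbers found in a list of annotation URIs."""
    ec_numbers = []
    for uri in annotation_uris:
        match = EC_URI_PATTERN.search(uri)
        if match:
            ec_numbers.append(match.group(1))
    return ec_numbers


def ec_class(ec_number):
    """Name the top-level enzyme class of an EC number."""
    return EC_CLASSES.get(ec_number.split(".")[0], "unknown")


annotation_uris = [
    "https://identifiers.org/ec-code/1.1.1.1",
    "https://identifiers.org/kegg.reaction/R00754",
    "https://identifiers.org/ec-code:2.7.1.-",
]
ec_numbers = extract_ec_numbers(annotation_uris)
```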

refinegems.sboann.getListOfMetabolites(reac: libsbml.Reaction)

Extracts list of metabolites of the reaction

Args:

reac (Reaction): Reaction from sbml model

Returns:

list: metabolites that are part of the reaction

refinegems.sboann.getMetaboliteIds(reac: libsbml.Reaction)

Extracts list of metabolite ids of reaction

Args:

reac (Reaction): Reaction from sbml model

Returns:

list: metabolite ids

refinegems.sboann.getProductCompartmentList(reac: libsbml.Reaction)

Extracts compartments of products

Args:

reac (Reaction): Reaction from sbml model

Returns:

set: compartment information of all products (metabolites)

refinegems.sboann.getProductIds(reac: libsbml.Reaction)

Extracts products (metabolites) of reaction

Args:

reac (Reaction): Reaction from sbml model

Returns:

list: products (metabolites) ids

refinegems.sboann.getReactantCompartmentList(reac: libsbml.Reaction)

Extracts compartments of reactants

Args:

reac (Reaction): Reaction from sbml model

Returns:

set: compartment information of all reactants (metabolites)

refinegems.sboann.getReactantIds(reac: libsbml.Reaction) → list[str]

Extracts reactants (metabolites) of reaction

Args:

reac (Reaction): Reaction from sbml model

Returns:

list[str]: Reactants (metabolites) ids

refinegems.sboann.hasReactantPair(reac: libsbml.Reaction, met1: libsbml.Species, met2: libsbml.Species) → bool

Checks if a pair of metabolites is present in reaction | needed for special reactions like redox or deamination

Args:

reac (Reaction): Reaction from sbml model
met1 (Species): metabolite 1 of metabolite pair
met2 (Species): metabolite 2 of metabolite pair

Returns:

bool: True if one of the metabolites is in reactants and the other in products
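
The documented return condition can be written down as a small sketch on id lists. It takes reactant and product ids directly instead of a libSBML Reaction, which is a simplification for illustration. The function name and the example ids are hypothetical.

```python
def has_reactant_pair(reactant_ids, product_ids, met1, met2):
    """True if one metabolite is among the reactants and the other among the products."""
    forward = met1 in reactant_ids and met2 in product_ids
    backward = met2 in reactant_ids and met1 in product_ids
    return forward or backward


reactants = ["M_nad_c", "M_etoh_c"]
products = ["M_nadh_c", "M_acald_c", "M_h_c"]
pair_found = has_reactant_pair(reactants, products, "M_nad_c", "M_nadh_c")
```

Checking both directions keeps the test independent of how the reaction is written.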

refinegems.sboann.isProtonTransport(reac: libsbml.Reaction)

Checks if reaction is proton transport

Args:

reac (Reaction): Reaction from sbml model

Returns:

bool: True if reaction is proton transport

refinegems.sboann.moreThanTwoCompartmentTransport(reac: libsbml.Reaction)

Checks if reaction traverses more than 2 compartments

Args:

reac (Reaction): Reaction from sbml model

Returns:

bool: True if reaction traverses more than 2 compartments

refinegems.sboann.returnCompartment(id)

Helper to split compartment id

refinegems.sboann.sbo_annotation(model_libsbml: libsbml.Model) → libsbml.Model

Executes all steps to annotate SBO terms to a given model (former main function of original script by Elisabeth Fritze)

Args:

model_libsbml (libModel): Model loaded with libsbml

Returns:

libModel: Modified model with SBO terms

refinegems.sboann.soleProtonTransported(reac: libsbml.Reaction)

Checks if reaction is transport powered by one H

Args:

reac (Reaction): Reaction from sbml model

Returns:

bool: True if reaction is transport powered by one H

refinegems.sboann.splitSymAntiPorter(reac: libsbml.Reaction)

Tests if reac is sym- or antiporter and sets SBO Term if true

Args:

reac (Reaction): Reaction from sbml model

refinegems.sboann.splitTransportBiochem(reac: libsbml.Reaction)

Tests if reac traverses more than 1 compartment and sets SBO Term

Args:

reac (Reaction): Reaction from sbml model