refinegems.curation

biomass module

Most functions within this module were copied from the MEMOTE GitHub page and modified by Gwendolyn O. Döbel.

This module provides functions to normalise the biomass objective function(s).

refinegems.curation.biomass.check_normalise_biomass(model: cobra.Model, cycles: int = 10) cobra.Model | None[source]
  1. Checks if at least one biomass reaction is present

  2. For each found biomass reaction checks if it sums up to 1g[CDW]

  3. Normalises the coefficients of each biomass reaction where the sum is not 1g[CDW] until the sum is 1g[CDW]

  4. Returns model with adjusted biomass function(s)

Args:
  • model (cobraModel):

    Model loaded with COBRApy

  • cycles (int, optional):

    Maximum number of optimisation cycles to run. Used to avoid endless optimisation loops.

Returns:
cobraModel:

COBRApy model with adjusted biomass functions
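
The check-and-rescale loop described above can be sketched in plain Python. This is only an illustration of the idea, not the refineGEMs or MEMOTE implementation: the helper names `biomass_mass_g` and `normalise`, the tolerance, the metabolite IDs and the molar masses are all assumptions made for this example. It assumes coefficients are given in mmol per gDW and molar masses in g/mol.

```python
def biomass_mass_g(coefficients, molar_masses):
    """Mass in g represented by a biomass reaction (hypothetical helper).

    coefficients: metabolite ID -> stoichiometric coefficient in mmol/gDW
                  (negative for consumed precursors)
    molar_masses: metabolite ID -> molar mass in g/mol
    """
    # mmol * g/mol gives mg, so divide by 1000 to obtain g
    return sum(-coef * molar_masses[met] for met, coef in coefficients.items()) / 1000.0


def normalise(coefficients, molar_masses, cycles=10, tol=1e-6):
    """Rescale all coefficients until the reaction sums to 1 g (within tol)."""
    coeffs = dict(coefficients)
    for _ in range(cycles):  # upper bound on the number of rescaling rounds
        total = biomass_mass_g(coeffs, molar_masses)
        if total <= 0:
            raise ValueError("biomass reaction has no positive mass to normalise")
        if abs(total - 1.0) <= tol:
            break
        coeffs = {met: coef / total for met, coef in coeffs.items()}
    return coeffs


# Toy biomass reaction with two precursors (values are illustrative only)
toy_coeffs = {"ala__L_c": -2.0, "glc__D_c": -3.0}
toy_masses = {"ala__L_c": 89.09, "glc__D_c": 180.16}
toy_normalised = normalise(toy_coeffs, toy_masses)
print(biomass_mass_g(toy_coeffs, toy_masses), biomass_mass_g(toy_normalised, toy_masses))
```

Dividing every coefficient by the current mass keeps the ratios between precursors unchanged while bringing the total to 1 g; the `cycles` bound mirrors the parameter of the same name documented above.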

charges module

Provides functions for adding charges to metabolites

When iterating through all metabolites present in a model, you will find several that have no defined charge (metab.getPlugin('fbc').isSetCharge() returns False). This can lead to charge-imbalanced reactions. This module takes information on metabolite charges from the ModelSEED database. A charge is added automatically to a metabolite if it has no defined charge and if only one charge is listed for it in ModelSEED. When multiple charges are listed, the metabolite and its possible charges are recorded and later returned in a dictionary.

It is possible to use the correct_charges_from_db function with other databases. The user just needs to make sure that the compounds dataframe has a ‘BiGG’ and a ‘charge’ column.
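
The decision rule described above (assign a charge only when it is unambiguous, otherwise report the candidates) can be sketched with plain dictionaries. This is a simplified illustration and not the library code: `assign_charges` is a hypothetical helper, the metabolite IDs are made up, and leaving ambiguous metabolites unset is an assumption of this sketch.

```python
def assign_charges(model_charges, db_charges):
    """Fill in missing charges where the database is unambiguous.

    model_charges: metabolite ID -> charge, or None if no charge is set
    db_charges:    metabolite ID -> list of charges found in the database
    Returns the updated charges and a dict of metabolites with several candidates.
    """
    updated = dict(model_charges)
    multiple = {}
    for met_id, charge in model_charges.items():
        if charge is not None:
            continue  # charges that are already set are left untouched
        candidates = sorted(set(db_charges.get(met_id, [])))
        if len(candidates) == 1:
            updated[met_id] = candidates[0]  # exactly one charge: assign it
        elif len(candidates) > 1:
            multiple[met_id] = candidates  # ambiguous: only report it
    return updated, multiple


# Illustrative input: 'xyz' is unknown to the database, 'fe' is ambiguous
toy_model = {"atp": -4, "h2o": None, "fe": None, "xyz": None}
toy_db = {"atp": [-3], "h2o": [0], "fe": [2, 3]}
toy_updated, toy_multiple = assign_charges(toy_model, toy_db)
print(toy_updated, toy_multiple)
```

Returning the ambiguous cases separately leaves the final choice to the curator, which matches the dictionary returned by the functions documented below.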

refinegems.curation.charges.correct_charges_from_db(model: libsbml.Model, compounds: pandas.DataFrame) tuple[libsbml.Model, dict][source]

Adds charges taken from given database to metabolites which have no defined charge

Args:
  • model (libModel):

    Model loaded with libsbml

  • compounds (pd.DataFrame):

    Containing database data with ‘BiGG’ (BiGG-Ids) and ‘charge’ (float or int) as columns

Returns:
tuple:

libSBML model (1) & dictionary ‘metabolite_id’: list(charges) (2)

  1. libModel: Model with added charges

  2. dict: Metabolites with respective multiple charges

refinegems.curation.charges.correct_charges_modelseed(model: libsbml.Model) tuple[libsbml.Model, dict][source]

Wrapper function that completes all steps for charge correction with the ModelSEED database

Args:
  • model (libModel):

    Model loaded with libsbml

Returns:
tuple:

libSBML model (1) & dictionary ‘metabolite_id’: list(charges) (2)

  1. libModel: Model with added charges

  2. dict: Metabolites with respective multiple charges

curate module

General functions for curating a model

This module provides functionalities for curating models, including special functions for CarveMe models.

Since CarveMe version 1.5.1, the draft models from CarveMe contain pieces of information that are not correctly added to the annotations. To address this, this module includes the following functionalities:

  • Add URIs from the entity IDs to the annotation field for metabolites & reactions

  • Transfer URIs from the notes field to the annotations for metabolites & reactions

  • Add URIs from the GeneProduct IDs to the annotations

The functionalities for CarveMe models, along with some of the following further functionalities, are gathered in the main function polish_model().

Further functionalities:

  • Setting boundary condition & constant for metabolites & reactions

  • Unit handling to add units & UnitDefinitions & to set units for parameters

  • Addition of default settings for compartments & metabolites

  • Addition of URIs to GeneProducts

    • via a mapping from model IDs to valid database IDs

    • via the KEGG API

  • Changing the CURIE pattern/CVTerm qualifier & qualifier type

  • Directionality control

refinegems.curation.curate.NH_PATTERN = re.compile('nh[3-4]')
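
As a quick illustration of what this pattern matches, the following sketch applies the same regular expression to a few made-up identifiers. How the module itself applies the pattern (for example `search` versus `match`) is not stated here, so the use of `search` is an assumption.

```python
import re

# Same expression as the module-level constant documented above
NH_PATTERN = re.compile('nh[3-4]')

# Illustrative identifiers only; not taken from a real model
example_ids = ['nh3_c', 'nh4_e', 'nh2_c', 'NH4_c', 'M_nh4_p']

# The pattern is case-sensitive and accepts only the digits 3 and 4
nh_matches = [i for i in example_ids if NH_PATTERN.search(i)]
print(nh_matches)
```
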
refinegems.curation.curate.add_compartment_structure_specs(model: libsbml.Model)[source]
Adds the required specifications for the compartment structure (size & spatial dimension) if not set
Args:
  • model (libModel):

    Model loaded with libSBML

refinegems.curation.curate.check_direction(model: cobra.Model, data: pandas.DataFrame | str) cobra.Model[source]

Check the direction of reactions by searching for matching MetaCyc, KEGG and MetaNetX IDs as well as EC numbers in a downloaded BioCyc (MetaCyc) database table or DataFrame. The table needs to contain at least the following columns: Reactions (MetaCyc ID), EC-Number, KEGG reaction, METANETX, Reaction-Direction.

Args:
model (cobra.Model):

The model loaded with COBRApy.

data (pd.DataFrame | str):

Either a pandas DataFrame or a path to a CSV file containing the BioCyc smart table.

Raises:
  • TypeError: Unknown data type for parameter data

Returns:
cobra.Model:

The edited model.

refinegems.curation.curate.extend_gp_annots_via_KEGG(gene_list: list[libsbml.GeneProduct], kegg_organism_id: str)[source]

Adds KEGG gene & UniProt identifiers to the GeneProduct annotations

Args:
gene_list (list[GeneProduct]):

libSBML ListOfGenes

kegg_organism_id (str):

Organism identifier in the KEGG database

refinegems.curation.curate.extend_gp_annots_via_mapping_table(model: libsbml.Model, mapping_tbl_file: str | Path = None, gff_paths: list[str] = None, email: str = None, contains_locus_tags: bool = False, lab_strain: bool = False, outpath: str = None) libsbml.Model[source]
Extend GeneProduct annotations via mapping table.
If no mapping table is provided, a mapping table will be generated.
Args:
  • model (libModel):

    Model loaded with libSBML

  • mapping_tbl_file (str|Path, optional):

    Path to a file containing a mapping table with columns model_id | X... where X can be REFSEQ, NCBI, locus_tag or UNCLASSIFIED. The table can contain all of the X columns or at least one of them. Defaults to None.

  • gff_paths (list[str], optional):

    Path(s) to GFF file(s). Allowed GFF formats are: RefSeq, NCBI and Prokka. This is only used when mapping_tbl_file == None. Defaults to None.

  • email (str, optional):

    E-mail for NCBI queries. This is only used when mapping_tbl_file == None. Defaults to None.

  • contains_locus_tags (bool, optional):

    Specifies if provided model has locus tags within the label tag if set to True. This is only used when mapping_tbl_file == None. Defaults to False.

  • lab_strain (bool, optional):

    Specifies if a strain from no database was provided and thus has only homolog mappings if set to True. Defaults to False.

  • outpath (str, optional):

    Output path for location where the generated mapping table should be written to. This is only used when mapping_tbl_file == None. Defaults to None.

Returns:
libModel:

Modified model with extended annotations for the GeneProducts

refinegems.curation.curate.extend_metab_reac_annots_via_id(entity_list: libsbml.ListOfSpecies | libsbml.ListOfReactions, id_db: str) None[source]

Extends metabolite or reaction annotations by extracting the core ID from the entity ID and adding this ID as valid annotation if possible

Args:
  • entity_list (Union[ListOfSpecies, ListOfReactions]):

    List of entities, either species (metabolites, ListOfSpecies) or reactions (ListOfReactions)

  • id_db (str):

    The database prefix to validate IDs against. Must correspond to a valid prefix in the Bioregistry

Raises:
  • TypeError: Unsupported type for entity_list

refinegems.curation.curate.extend_metab_reac_annots_via_notes(entity_list: libsbml.ListOfSpecies | libsbml.ListOfReactions) None[source]

Extends metabolite or reaction annotations by extracting valid URIs from the notes section of the provided entities and cleans up the notes by removing processed elements

Args:
  • entity_list (Union[ListOfSpecies, ListOfReactions]):

    List of entities, either species (metabolites, ListOfSpecies) or reactions (ListOfReactions)

Raises:
  • TypeError: Unsupported type for entity_list

refinegems.curation.curate.polish_entity_conditions(entity_list: libsbml.ListOfSpecies | libsbml.ListOfReactions)[source]

Sets boundary condition and constant if not set for an entity

Args:
  • entity_list (Union[ListOfSpecies, ListOfReactions]):

    libSBML ListOfSpecies or ListOfReactions

refinegems.curation.curate.polish_model(model: libsbml.Model, id_db: str = 'BiGG', mapping_tbl_file: str = None, gff_paths: list[str] = None, email: str = None, contains_locus_tags: bool = False, lab_strain: bool = False, kegg_organism_id: str = None, reaction_direction: str = None, outpath: str = None) libsbml.Model[source]

Completes all steps to polish a model

Note

So far only tested for models having either BiGG or VMH identifiers.

Args:
  • model (libModel):

    Model loaded with libSBML

  • id_db (str, optional):

    Main database where identifiers in model come from. Defaults to ‘BiGG’.

  • mapping_tbl_file (str, optional):

    Path to a file containing a mapping table with columns model_id | X... where X can be REFSEQ, NCBI, locus_tag or UNCLASSIFIED. The table can contain all of the X columns or at least one of them. Defaults to None.

  • gff_paths (list[str], optional):

    Path(s) to GFF file(s). Allowed GFF formats are: RefSeq, NCBI and Prokka. This is only used when mapping_tbl_file == None. Defaults to None.

  • email (str, optional):

    E-mail for NCBI queries. This is only used when mapping_tbl_file == None. Defaults to None.

  • contains_locus_tags (bool, optional):

    Specifies if provided model has locus tags within the label tag if set to True. This is only used when mapping_tbl_file == None. Defaults to False.

  • lab_strain (bool, optional):

    Specifies if a strain from no database was provided and thus has only homolog mappings, if set to True. Defaults to False.

  • kegg_organism_id (str, optional):

    KEGG organism identifier if available. Defaults to None.

  • reaction_direction (str, optional):

    Path to a CSV file containing the BioCyc smart table with the columns Reactions (MetaCyc ID) | EC-Number | KEGG reaction | METANETX | Reaction-Direction. For more details, see check_direction(). Defaults to None.

  • outpath (str, optional):

    Output path for mapping table from model ID to valid database IDs (if mapping_tbl_file == None) & incorrect annotations file(s). Defaults to None.

Returns:
libModel:

Polished libSBML model

refinegems.curation.curate.polish_model_units(model: libsbml.Model) None[source]

Replaces the list of unit definitions with the unit definitions needed for FBA:

  • mmol per gDW per h

  • mmol per gDW

  • hour (h)

  • femto litre (fL)

Args:
  • model (libModel):

    Model loaded with libSBML
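
To make the four units concrete, the sketch below writes them down as SBML-style base units, where each unit stands for (multiplier * 10**scale * kind) ** exponent, and computes the factor that converts them to the underlying base kinds. The dictionary keys and the exact decomposition are assumptions made for illustration; they are not read from the refineGEMs source.

```python
from math import isclose

# Each entry: (kind, exponent, scale, multiplier), following the SBML unit model.
# The IDs used as keys are hypothetical.
UNIT_DEFS = {
    "mmol_per_gDW_per_h": [("mole", 1, -3, 1.0), ("gram", -1, 0, 1.0), ("second", -1, 0, 3600.0)],
    "mmol_per_gDW": [("mole", 1, -3, 1.0), ("gram", -1, 0, 1.0)],
    "h": [("second", 1, 0, 3600.0)],
    "fL": [("litre", 1, -15, 1.0)],
}


def factor_to_base(units):
    """Multiply out (multiplier * 10**scale) ** exponent over all base units."""
    factor = 1.0
    for _kind, exponent, scale, multiplier in units:
        factor *= (multiplier * 10.0 ** scale) ** exponent
    return factor


for unit_id, units in UNIT_DEFS.items():
    print(unit_id, factor_to_base(units))

# Sanity checks: one hour is 3600 s and one femtolitre is 1e-15 L
assert isclose(factor_to_base(UNIT_DEFS["h"]), 3600.0)
assert isclose(factor_to_base(UNIT_DEFS["fL"]), 1e-15)
```
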

refinegems.curation.curate.resolve_duplicate_metabolites(model: cobra.Model, based_on: str = 'metanetx.chemical', replace: bool = True) cobra.Model[source]

Resolve duplicate metabolites in a model. Metabolites are considered duplicate if they share the same annotations (same or nan).

Note

Depending on the starting database, the results might differ.

Args:
  • model (cobra.Model):

    The model loaded with COBRApy.

  • based_on (str, optional):

    Label to base the resolution process on. Can be any annotation label. Defaults to ‘metanetx.chemical’.

  • replace (bool, optional):

    Either report the duplicates (False) or replace them with one (True). Defaults to True.

Returns:
cobra.Model:

The model.
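
The core idea of detecting duplicates through a shared annotation can be sketched as a simple grouping step. This is a deliberately reduced illustration under several assumptions: metabolites are plain dictionaries rather than COBRApy objects, duplicates are only sought within the same compartment, metabolites lacking the annotation are skipped, and the helper name `find_duplicate_groups` is hypothetical. The actual function may compare further annotations and handle missing values differently.

```python
from collections import defaultdict


def find_duplicate_groups(metabolites, based_on="metanetx.chemical"):
    """Group metabolite IDs that share compartment and `based_on` annotation."""
    groups = defaultdict(list)
    for met in metabolites:
        key = met["annotation"].get(based_on)
        if key is None:
            continue  # nothing to compare on; skipped in this sketch
        groups[(met["compartment"], key)].append(met["id"])
    # Only groups with more than one member are duplicate candidates
    return {k: ids for k, ids in groups.items() if len(ids) > 1}


# Illustrative records; IDs and annotation values are placeholders
toy_mets = [
    {"id": "glc__D_c", "compartment": "c", "annotation": {"metanetx.chemical": "MNXM_A"}},
    {"id": "glucose_c", "compartment": "c", "annotation": {"metanetx.chemical": "MNXM_A"}},
    {"id": "glc__D_e", "compartment": "e", "annotation": {"metanetx.chemical": "MNXM_A"}},
    {"id": "unknown_c", "compartment": "c", "annotation": {}},
]
toy_duplicates = find_duplicate_groups(toy_mets)
print(toy_duplicates)
```

With replace=False the documented function only reports such findings; with replace=True it additionally merges each group into a single metabolite.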

refinegems.curation.curate.resolve_duplicate_reactions(model: cobra.Model, based_on: str = 'reaction', remove_reac: bool = True) cobra.Model[source]

Resolve and remove duplicate reactions based on their reaction equation and matching database identifiers. One of the reactions is removed only if all identifiers match or a comparison with nan occurs.

Args:
  • model (cobra.Model):

    A model loaded with COBRApy.

  • based_on (str, optional):

    Label to base the resolution process on. Can be ‘reaction’ or any other annotation label. Defaults to ‘reaction’.

  • remove_reac (bool, optional):

    When True, combines and removes duplicates. Otherwise, only reports the findings. Defaults to True.

Returns:
cobra.Model:

The model.

refinegems.curation.curate.resolve_duplicates(model: cobra.Model, check_reac: bool = True, check_meta: Literal['default', 'exhaustive', 'skip'] = 'default', replace_dupl_meta: bool = True, remove_unused_meta: bool = False, remove_dupl_reac: bool = True) cobra.Model[source]

Resolve and remove (optional) duplicate metabolites and reactions in the model.

Args:
  • model (cobra.Model):

    The model loaded with COBRApy.

  • check_reac (bool, optional):

    Whether to check reactions for duplicates. Defaults to True.

  • check_meta (Literal[‘default’,’exhaustive’,’skip’], optional):

    Whether to check for duplicate metabolites. Defaults to ‘default’.

  • replace_dupl_meta (bool, optional):

    Option to replace/remove duplicate metabolites. Defaults to True.

  • remove_unused_meta (bool, optional):

    Option to remove unused metabolites. Defaults to False.

  • remove_dupl_reac (bool, optional):

    Option to combine/remove duplicate reactions. Defaults to True.

Returns:
cobra.Model:

The (edited) model.

refinegems.curation.curate.set_initial_amount_metabs(model: libsbml.Model)[source]

Sets initial amount to all metabolites if not already set or if initial concentration is not set

Args:
  • model (libModel):

    Model loaded with libSBML

refinegems.curation.curate.set_model_default_units(model: libsbml.Model)[source]

Sets default units of model

Args:
  • model (libModel):

    Model loaded with libSBML

refinegems.curation.curate.set_units_of_parameters(model: libsbml.Model)[source]

Sets units of parameters in model

Args:
  • model (libModel):

    Model loaded with libSBML

refinegems.curation.curate.update_annotations_from_others(model: libsbml.Model) libsbml.Model[source]

Synchronizes metabolite annotations for core, periplasm and extracellular

Args:
  • model (libModel):

    Model loaded with libSBML

Returns:
libModel:

Modified model with synchronized annotations

pathways module

This module provides functions for adding, handling and analysing the KEGG pathways (or, more specifically, their annotations) contained in a model.

refinegems.curation.pathways.add_kegg_pathways(model, kegg_pathways) libsbml.Model[source]

Add KEGG reactions as BQB_OCCURS_IN.

Args:
Returns:

libModel: Modified model with KEGG pathways.

refinegems.curation.pathways.create_pathway_groups(model: libsbml.Model, pathway_groups) libsbml.Model[source]

Use group module to add reactions to KEGG pathway.

Args:
  • model (libModel):

    Model loaded with libSBML. Output of load_model_enable_groups().

  • pathway_groups (dict):

    KEGG pathway IDs as keys and reaction IDs as values, e.g. see the output of _invert_reac_pathway_dict().

Returns:
libModel:

Modified model with groups for pathways.
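
The pathway_groups argument maps each pathway to its reactions, whereas find_kegg_pathways() returns the opposite direction (reaction IDs mapped to lists of pathway IDs). A minimal sketch of such an inversion is shown below. It is not the private helper _invert_reac_pathway_dict() itself, whose exact behaviour is not documented here; the function name and the example IDs are chosen for illustration only.

```python
from collections import defaultdict


def invert_reaction_pathways(reac_to_pathways):
    """Turn {reaction ID: [pathway IDs]} into {pathway ID: [reaction IDs]}."""
    pathway_to_reacs = defaultdict(list)
    for reac_id, pathways in reac_to_pathways.items():
        for pathway_id in pathways:
            pathway_to_reacs[pathway_id].append(reac_id)
    return dict(pathway_to_reacs)


# Illustrative mapping; a reaction may belong to several pathways or to none
toy_reac_to_pathways = {
    "R_PGI": ["map00010", "map00500"],
    "R_PFK": ["map00010"],
    "R_EX_glc__D_e": [],
}
toy_pathway_groups = invert_reaction_pathways(toy_reac_to_pathways)
print(toy_pathway_groups)
```

Reactions without any pathway simply do not appear in the inverted dictionary, so they end up in no group.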

refinegems.curation.pathways.find_kegg_pathways(mapped_reacs: dict, viaEC: bool = False, viaRC: bool = False) dict[source]

Given a dictionary of reaction IDs mapped to KEGG reaction IDs and/or EC numbers, extract the KEGG pathways for each reaction based on the KEGG reaction ID.

Args:
  • mapped_reacs (dict):

    Dictionary containing the information about the reactions. For more information, see _extract_kegg_ec_from_reac().

  • viaEC (bool, optional):

    If True, also tries mapping to pathways via EC number, if via reaction ID is unsuccessful. Defaults to False.

  • viaRC (bool, optional):

    If True, also tries mapping to pathways via reaction class, if via reaction ID is unsuccessful. Defaults to False.

Returns:
dict:

Dictionary with the reaction IDs as keys and a list of KEGG pathway IDs as values.

refinegems.curation.pathways.kegg_pathway_analysis(model: cobra.Model) KEGGPathwayAnalysisReport[source]

Analyse the pathways that are covered by the model.

The analysis is based on the KEGG pathway classification and the available KEGG pathway identifiers present in the model.

Note: one reaction can have multiple pathway identifiers associated with it. This analysis focuses on the total number of IDs found within the model.

Args:
  • model (cobra.Model):

    A model loaded with COBRApy.

Returns:
KEGGPathwayAnalysisReport:

The KEGG pathway analysis report.
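
Because one reaction can carry several pathway identifiers, the total number of identifiers can exceed the number of reactions. The short sketch below illustrates this counting on a made-up mapping; it does not reproduce the contents of KEGGPathwayAnalysisReport, and the helper name `count_pathway_ids` is hypothetical.

```python
from collections import Counter


def count_pathway_ids(reac_to_pathways):
    """Count how often each pathway ID occurs across all reactions."""
    return Counter(p for pathways in reac_to_pathways.values() for p in pathways)


# Illustrative annotations for three reactions
toy_annotations = {
    "R_PGI": ["map00010", "map00500"],
    "R_PFK": ["map00010"],
    "R_EX_glc__D_e": [],
}
toy_counts = count_pathway_ids(toy_annotations)
toy_total_ids = sum(toy_counts.values())
print(toy_counts, toy_total_ids)
```
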

refinegems.curation.pathways.load_model_enable_groups(modelpath: str) libsbml.Model[source]

Loads model as document using libSBML and enables groups extension

Args:
  • modelpath (str):

    Path to GEM

Returns:
libModel:

Model loaded with libSBML

refinegems.curation.pathways.set_kegg_pathways(modelpath: str, viaEC: bool = False, viaRC: bool = False) tuple[libsbml.Model, list[str]][source]

Executes all steps to add KEGG pathways as groups

Args:
  • modelpath (str):

    Path to GEM.

Returns:
tuple:

libSBML model (1) & List of reactions without KEGG Id (2)

  1. libModel: Modified model with Pathways as groups

  2. list: Ids of reactions without KEGG annotation

polish module