refinegems.curation
biomass module
Most functions within this module were copied from the MEMOTE GitHub page and modified by Gwendolyn O. Döbel.
This module provides functions to normalise the biomass objective function(s).
- refinegems.curation.biomass.check_normalise_biomass(model: cobra.Model, cycles: int = 10) → cobra.Model | None
Checks if at least one biomass reaction is present
For each found biomass reaction checks if it sums up to 1g[CDW]
Normalises the coefficients of each biomass reaction where the sum is not 1g[CDW] until the sum is 1g[CDW]
Returns model with adjusted biomass function(s)
- Args:
- model (cobraModel):
Model loaded with COBRApy
- cycles (int, optional):
Maximal number of optimisation cycles that will be run. Used to avoid endless optimisation cycles.
- Returns:
- cobraModel:
COBRApy model with adjusted biomass functions
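The check-and-rescale loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the refineGEMs or MEMOTE implementation: the helper names `biomass_mass_g` and `normalise_biomass`, the tolerance and the toy values are hypothetical, coefficients are assumed to be in mmol/gDW and molecular weights in g/mol.

```python
def biomass_mass_g(coefficients, mol_weights):
    """Return the mass in g described by a biomass reaction.

    coefficients: metabolite id -> stoichiometric coefficient (mmol/gDW),
                  negative for consumed biomass precursors.
    mol_weights:  metabolite id -> molecular weight (g/mol).
    """
    return sum(-coef * mol_weights[met] for met, coef in coefficients.items()) / 1000.0


def normalise_biomass(coefficients, mol_weights, cycles=10, tol=1e-6):
    """Rescale the coefficients until the reaction sums to 1 g, at most `cycles` times."""
    current = dict(coefficients)
    for _ in range(cycles):
        total = biomass_mass_g(current, mol_weights)
        if abs(total - 1.0) <= tol:
            break  # already normalised
        if total == 0:
            raise ValueError("Biomass reaction has zero mass and cannot be rescaled")
        current = {met: coef / total for met, coef in current.items()}
    return current


# Hypothetical toy reaction with two precursors (values are illustrative only).
coeffs = {"ala__L_c": -0.5, "gly_c": -0.3}
weights = {"ala__L_c": 89.09, "gly_c": 75.07}
normalised = normalise_biomass(coeffs, weights)
```

The `cycles` cap mirrors the documented purpose of the parameter: it bounds the number of rescaling passes so the loop always terminates.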
charges module
Provides functions for adding charges to metabolites
When iterating through all metabolites present in a model, you will find several
which have no defined charge (metab.getPlugin('fbc').isSetCharge() returns False).
This can lead to charge imbalanced reactions. This script takes information on
metabolite charges from the ModelSEED database. A charge is automatically added to
a metabolite if it has no defined charge and if there is only one charge denoted in
ModelSEED. When multiple charges are present, the metabolite and the possible charges
are noted and later returned in a dictionary.
It is possible to use the correct_charges_from_db function with other databases. The user just needs to make sure that the compounds dataframe has a ‘BiGG’ and a ‘charge’ column.
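The decision rule described above (assign a charge when the database lists exactly one, otherwise report the candidates) can be sketched without libSBML or pandas. This is only an illustration: `assign_charges` is a hypothetical helper, the model is reduced to a plain dictionary, the compounds table is a list of rows carrying the documented 'BiGG' and 'charge' columns, and the charge values are made up.

```python
def assign_charges(metabolite_charges, compound_rows):
    """Fill in missing charges where the database is unambiguous.

    metabolite_charges: BiGG id -> charge, or None if no charge is set.
    compound_rows:      list of dicts with the keys 'BiGG' and 'charge'.
    Returns the updated mapping and a dict of ids with several candidate charges.
    """
    # Collect every charge the database lists per BiGG id.
    known = {}
    for row in compound_rows:
        known.setdefault(row["BiGG"], set()).add(row["charge"])

    updated = dict(metabolite_charges)
    ambiguous = {}
    for met_id, charge in metabolite_charges.items():
        if charge is not None:
            continue  # a charge is already defined, leave it untouched
        candidates = sorted(known.get(met_id, ()))
        if len(candidates) == 1:
            updated[met_id] = candidates[0]
        elif len(candidates) > 1:
            ambiguous[met_id] = candidates
    return updated, ambiguous


# Illustrative data only.
model_charges = {"h2o": 0, "atp": None, "pi": None, "xyz": None}
rows = [
    {"BiGG": "atp", "charge": -4},
    {"BiGG": "pi", "charge": -2},
    {"BiGG": "pi", "charge": -3},
]
updated, ambiguous = assign_charges(model_charges, rows)
```

Here `atp` receives its single listed charge, `pi` is reported with both candidates, and `xyz` stays unset because the table has no entry for it.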
- refinegems.curation.charges.correct_charges_from_db(model: libsbml.Model, compounds: pandas.DataFrame) → tuple[libsbml.Model, dict]
Adds charges taken from given database to metabolites which have no defined charge
- Args:
- model (libModel):
Model loaded with libsbml
- compounds (pd.DataFrame):
Containing database data with ‘BiGG’ (BiGG-Ids) and ‘charge’ (float or int) as columns
- Returns:
- tuple:
libSBML model (1) & dictionary ‘metabolite_id’: list(charges) (2)
libModel: Model with added charges
dict: Metabolites with respective multiple charges
- refinegems.curation.charges.correct_charges_modelseed(model: libsbml.Model) → tuple[libsbml.Model, dict]
Wrapper function which completes the steps to charge correction with the ModelSEED database
- Args:
- model (libModel):
Model loaded with libsbml
- Returns:
- tuple:
libSBML model (1) & dictionary ‘metabolite_id’: list(charges) (2)
libModel: Model with added charges
dict: Metabolites with respective multiple charges
curate module
General functions for curating a model
This module provides functionalities for curating models, including special functions for CarveMe models.
Since CarveMe version 1.5.1, the draft models from CarveMe contain pieces of information that are not correctly added to the annotations. To address this, this module includes the following functionalities:
Add URIs from the entity IDs to the annotation field for metabolites & reactions
Transfer URIs from the notes field to the annotations for metabolites & reactions
Add URIs from the GeneProduct IDs to the annotations
The functionalities for CarveMe models, along with some of the following further functionalities, are gathered in the
main function polish_model().
Further functionalities:
Setting boundary condition & constant for metabolites & reactions
Unit handling to add units & UnitDefinitions & to set units for parameters
Addition of default settings for compartments & metabolites
Addition of URIs to GeneProducts
via a mapping from model IDs to valid database IDs
via the KEGG API
Changing the CURIE pattern/CVTerm qualifier & qualifier type
Directionality control
- refinegems.curation.curate.NH_PATTERN = re.compile('nh[3-4]')
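For orientation, the pattern matches the literal `nh` followed by `3` or `4`. The snippet below only demonstrates the regular expression itself; the example identifiers are hypothetical, and how the module applies the pattern is not stated here.

```python
import re

NH_PATTERN = re.compile('nh[3-4]')

# Hypothetical metabolite identifiers, checked with a substring search.
examples = ("nh3_c", "nh4_e", "nh5_c", "h2o_c")
matches = [bool(NH_PATTERN.search(identifier)) for identifier in examples]
```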
- refinegems.curation.curate.add_compartment_structure_specs(model: libsbml.Model)
- Adds the required specifications for the compartment structure if not set (size & spatial dimension)
- Args:
- model (libModel):
Model loaded with libSBML
- refinegems.curation.curate.check_direction(model: cobra.Model, data: pandas.DataFrame | str) → cobra.Model
Checks the direction of reactions by searching for matching MetaCyc, KEGG and MetaNetX IDs as well as EC numbers in a downloaded BioCyc (MetaCyc) database table or DataFrame. The table needs to contain at least the following columns: Reactions (MetaCyc ID), EC-Number, KEGG reaction, METANETX, Reaction-Direction.
- Args:
- model (cobra.Model):
The model loaded with COBRApy.
- data (pd.DataFrame | str):
Either a pandas DataFrame or a path to a CSV file containing the BioCyc smart table.
- Raises:
TypeError: Unknown data type for parameter data
- Returns:
- cobra.Model:
The edited model.
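The input handling described for `data` (a table object or a path to a CSV file, with a TypeError otherwise) can be sketched with the standard library. This is an assumption-laden stand-in, not the library's code: `load_direction_rows` is a hypothetical helper, a list of dictionaries takes the place of the pandas DataFrame, and the explicit column check is added for illustration.

```python
import csv

# Column names as listed in the description above.
REQUIRED_COLUMNS = (
    "Reactions (MetaCyc ID)",
    "EC-Number",
    "KEGG reaction",
    "METANETX",
    "Reaction-Direction",
)


def load_direction_rows(data):
    """Return the table as a list of row dicts, from a CSV path or in-memory rows."""
    if isinstance(data, str):
        # A string is interpreted as a path to a CSV file.
        with open(data, newline="") as handle:
            rows = list(csv.DictReader(handle))
    elif isinstance(data, list):
        # Stand-in for an already loaded table.
        rows = data
    else:
        raise TypeError(f"Unknown data type for parameter data: {type(data).__name__}")

    for row in rows:
        missing = [column for column in REQUIRED_COLUMNS if column not in row]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
    return rows
```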
- refinegems.curation.curate.extend_gp_annots_via_KEGG(gene_list: list[libsbml.GeneProduct], kegg_organism_id: str)
Adds KEGG gene & UniProt identifiers to the GeneProduct annotations
- Args:
- gene_list (list[GeneProduct]):
libSBML ListOfGenes
- kegg_organism_id (str):
Organism identifier in the KEGG database
- refinegems.curation.curate.extend_gp_annots_via_mapping_table(model: libsbml.Model, mapping_tbl_file: str | Path = None, gff_paths: list[str] = None, email: str = None, contains_locus_tags: bool = False, lab_strain: bool = False, outpath: str = None) → libsbml.Model
- Extends GeneProduct annotations via a mapping table. If no mapping table is provided, one will be generated.
- Args:
- model (libModel):
Model loaded with libSBML
- mapping_tbl_file (str|Path, optional):
Path to a file containing a mapping table with columns
model_id | X... where X can be REFSEQ, NCBI, locus_tag or UNCLASSIFIED. The table can contain all of the X columns or at least one of them. Defaults to None.
- gff_paths (list[str], optional):
Path(s) to GFF file(s). Allowed GFF formats are: RefSeq, NCBI and Prokka. This is only used when mapping_tbl_file == None. Defaults to None.
- email (str, optional):
E-mail for NCBI queries. This is only used when mapping_tbl_file == None. Defaults to None.
- contains_locus_tags (bool, optional):
Specifies if provided model has locus tags within the label tag if set to True. This is only used when mapping_tbl_file == None. Defaults to False.
- lab_strain (bool, optional):
Specifies if a strain from no database was provided and thus has only homolog mappings if set to True. Defaults to False.
- outpath (str, optional):
Output path for location where the generated mapping table should be written to. This is only used when mapping_tbl_file == None. Defaults to None.
- Returns:
- libModel:
Modified model with extended annotations for the GeneProducts
- refinegems.curation.curate.extend_metab_reac_annots_via_id(entity_list: libsbml.ListOfSpecies | libsbml.ListOfReactions, id_db: str) → None
Extends metabolite or reaction annotations by extracting the core ID from the entity ID and adding this ID as valid annotation if possible
- Args:
- entity_list (Union[ListOfSpecies, ListOfReactions]):
List of entities, either species (metabolites, ListOfSpecies) or reactions (ListOfReactions)
- id_db (str):
The database prefix to validate IDs against. Must correspond to a valid prefix in the Bioregistry
- Raises:
TypeError: Unsupported type for entity_list
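What "extracting the core ID from the entity ID" could look like is sketched below for BiGG-style identifiers. The conventions used here (an `M_` or `R_` prefix and a single-letter compartment suffix on species) are assumptions made for illustration, `core_id` is a hypothetical helper, and the validation against the Bioregistry is not shown.

```python
import re


def core_id(entity_id, is_species):
    """Strip the assumed SBML prefix and, for species, the compartment suffix."""
    # Remove a leading 'M_' (metabolite) or 'R_' (reaction) prefix.
    stripped = re.sub(r"^[MR]_", "", entity_id)
    if is_species:
        # Remove a trailing single-letter compartment code such as '_c' or '_e'.
        stripped = re.sub(r"_[a-z]$", "", stripped)
    return stripped


glucose = core_id("M_glc__D_c", is_species=True)
isomerase = core_id("R_PGI", is_species=False)
```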
- refinegems.curation.curate.extend_metab_reac_annots_via_notes(entity_list: libsbml.ListOfSpecies | libsbml.ListOfReactions) → None
Extends metabolite or reaction annotations by extracting valid URIs from the notes section of the provided entities and cleans up the notes by removing processed elements
- Args:
- entity_list (Union[ListOfSpecies, ListOfReactions]):
List of entities, either species (metabolites, ListOfSpecies) or reactions (ListOfReactions)
- Raises:
TypeError: Unsupported type for entity_list
- refinegems.curation.curate.polish_entity_conditions(entity_list: libsbml.ListOfSpecies | libsbml.ListOfReactions)
Sets boundary condition and constant if not set for an entity
- Args:
- entity_list (Union[ListOfSpecies, ListOfReactions]):
libSBML ListOfSpecies or ListOfReactions
- refinegems.curation.curate.polish_model(model: libsbml.Model, id_db: str = 'BiGG', mapping_tbl_file: str = None, gff_paths: list[str] = None, email: str = None, contains_locus_tags: bool = False, lab_strain: bool = False, kegg_organism_id: str = None, reaction_direction: str = None, outpath: str = None) → libsbml.Model
Completes all steps to polish a model
Note
So far only tested for models having either BiGG or VMH identifiers.
- Args:
- model (libModel):
Model loaded with libSBML
- id_db (str, optional):
Main database where identifiers in model come from. Defaults to ‘BiGG’.
- mapping_tbl_file (str, optional):
Path to a file containing a mapping table with columns
model_id | X... where X can be REFSEQ, NCBI, locus_tag or UNCLASSIFIED. The table can contain all of the X columns or at least one of them. Defaults to None.
- gff_paths (list[str], optional):
Path(s) to GFF file(s). Allowed GFF formats are: RefSeq, NCBI and Prokka. This is only used when mapping_tbl_file == None. Defaults to None.
- email (str, optional):
E-mail for NCBI queries. This is only used when mapping_tbl_file == None. Defaults to None.
- contains_locus_tags (bool, optional):
Specifies if provided model has locus tags within the label tag if set to True. This is only used when mapping_tbl_file == None. Defaults to False.
- lab_strain (bool, optional):
Specifies if a strain from no database was provided and thus has only homolog mappings, if set to True. Defaults to False.
- kegg_organism_id (str, optional):
KEGG organism identifier if available. Defaults to None.
- reaction_direction (str, optional):
Path to a CSV file containing the BioCyc smart table with the columns
Reactions (MetaCyc ID) | EC-Number | KEGG reaction | METANETX | Reaction-Direction. For more details see check_direction(). Defaults to None.
- outpath (str, optional):
Output path for mapping table from model ID to valid database IDs (if mapping_tbl_file == None) & incorrect annotations file(s). Defaults to None.
- Returns:
- libModel:
Polished libSBML model
- refinegems.curation.curate.polish_model_units(model: libsbml.Model) → None
Replaces the list of unit definitions with the unit definitions needed for FBA:
mmol per gDW per h
mmol per gDW
hour (h)
femto litre (fL)
- Args:
- model (libModel):
Model loaded with libSBML
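The four units listed above can be written out in terms of SBML base units, where each component contributes (multiplier × 10^scale × kind)^exponent. The table below is a plain-Python sketch and not the libSBML calls the function performs; the unit identifiers and the helper `si_factor` are hypothetical.

```python
# Each component: (base unit kind, exponent, scale, multiplier).
UNIT_DEFINITIONS = {
    "mmol_per_gDW_per_h": [
        ("mole", 1, -3, 1.0),       # mmol
        ("gram", -1, 0, 1.0),       # per gDW
        ("second", -1, 0, 3600.0),  # per h
    ],
    "mmol_per_gDW": [
        ("mole", 1, -3, 1.0),
        ("gram", -1, 0, 1.0),
    ],
    "h": [("second", 1, 0, 3600.0)],
    "fL": [("litre", 1, -15, 1.0)],
}


def si_factor(components):
    """Numeric factor relating a composed unit to its base units."""
    factor = 1.0
    for _kind, exponent, scale, multiplier in components:
        factor *= (multiplier * 10 ** scale) ** exponent
    return factor


hour_in_seconds = si_factor(UNIT_DEFINITIONS["h"])
flux_factor = si_factor(UNIT_DEFINITIONS["mmol_per_gDW_per_h"])
```

Expressing the hour as a second with multiplier 3600 reflects that SBML has no dedicated base unit for hours.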
- refinegems.curation.curate.resolve_duplicate_metabolites(model: cobra.Model, based_on: str = 'metanetx.chemical', replace: bool = True) → cobra.Model
Resolve duplicate metabolites in a model. Metabolites are considered duplicate if they share the same annotations (same or nan).
Note
Depending on the starting database, the results might differ.
- Args:
- model (cobra.Model):
The model loaded with COBRApy.
- based_on (str, optional):
Label to base the resolution process on. Can be any annotation label. Defaults to ‘metanetx.chemical’.
- replace (bool, optional):
Either report the duplicates (False) or replace them with one (True). Defaults to True.
- Returns:
- cobra.Model:
The model.
- refinegems.curation.curate.resolve_duplicate_reactions(model: cobra.Model, based_on: str = 'reaction', remove_reac: bool = True) → cobra.Model
Resolve and remove duplicate reactions based on their reaction equation and matching database identifiers. Only if all match or a comparison with nan occurs will one of the reactions be removed.
- Args:
- model (cobra.Model):
A model loaded with COBRApy.
- based_on (str, optional):
Label to base the resolution process on. Can be ‘reaction’ or any other annotation label. Defaults to ‘reaction’.
- remove_reac (bool, optional):
When True, combines and removes duplicates. Otherwise only reports the findings. Defaults to True.
- Returns:
- cobra.Model:
The model.
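The matching rule stated above (identifiers must agree, and a comparison with a missing value counts as a match) can be sketched as follows. This is an illustration under assumptions rather than the library's code: reactions are reduced to plain dictionaries, `None` stands in for nan, and the helper names are hypothetical.

```python
def annotations_compatible(annot_a, annot_b):
    """True if every label agrees; a missing value on either side counts as a match."""
    for label in set(annot_a) | set(annot_b):
        value_a, value_b = annot_a.get(label), annot_b.get(label)
        if value_a is None or value_b is None:
            continue  # comparison with a missing value is treated as a match
        if value_a != value_b:
            return False
    return True


def is_duplicate(reac_a, reac_b):
    """Two reactions are duplicates if equation and identifiers are compatible."""
    return (
        reac_a["equation"] == reac_b["equation"]
        and annotations_compatible(reac_a["annotation"], reac_b["annotation"])
    )


# Illustrative reactions only.
r1 = {"equation": "g6p_c <=> f6p_c", "annotation": {"kegg.reaction": "R00771", "ec-code": "5.3.1.9"}}
r2 = {"equation": "g6p_c <=> f6p_c", "annotation": {"kegg.reaction": "R00771"}}
r3 = {"equation": "g6p_c <=> f6p_c", "annotation": {"kegg.reaction": "R99999"}}
```

In this toy data `r1` and `r2` count as duplicates because the missing EC number is not held against `r2`, while `r3` is kept apart by its conflicting identifier.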
- refinegems.curation.curate.resolve_duplicates(model: cobra.Model, check_reac: bool = True, check_meta: Literal['default', 'exhaustive', 'skip'] = 'default', replace_dupl_meta: bool = True, remove_unused_meta: bool = False, remove_dupl_reac: bool = True) → cobra.Model
Resolve and remove (optional) duplicate metabolites and reactions in the model.
- Args:
- model (cobra.Model):
The model loaded with COBRApy.
- check_reac (bool, optional):
Whether to check reactions for duplicates. Defaults to True.
- check_meta (Literal[‘default’, ‘exhaustive’, ‘skip’], optional):
Whether to check for duplicate metabolites. Defaults to ‘default’.
- replace_dupl_meta (bool, optional):
Option to replace/remove duplicate metabolites. Defaults to True.
- remove_unused_meta (bool, optional):
Option to remove unused metabolites. Defaults to False.
- remove_dupl_reac (bool, optional):
Option to combine/remove duplicate reactions. Defaults to True.
- Returns:
- cobra.Model:
The (edited) model.
- refinegems.curation.curate.set_initial_amount_metabs(model: libsbml.Model)
Sets initial amount to all metabolites if not already set or if initial concentration is not set
- Args:
- model (libModel):
Model loaded with libSBML
- refinegems.curation.curate.set_model_default_units(model: libsbml.Model)
Sets default units of model
- Args:
- model (libModel):
Model loaded with libSBML
pathways module
This module provides functions for adding, handling and analysing the KEGG pathways (or, more specifically, their annotations) contained in a model.
- refinegems.curation.pathways.add_kegg_pathways(model, kegg_pathways) → libsbml.Model
Add KEGG pathways to the reactions as BQB_OCCURS_IN annotations.
- Args:
- model (libModel):
Model loaded with libSBML. Output of
load_model_enable_groups().
- kegg_pathways (dict):
Reaction Id as key and KEGG Pathway Id as value, e.g. see output of
find_kegg_pathways().
- Returns:
libModel: Modified model with KEGG pathways.
- refinegems.curation.pathways.create_pathway_groups(model: libsbml.Model, pathway_groups) → libsbml.Model
Use group module to add reactions to KEGG pathway.
- Args:
- model (libModel):
Model loaded with libSBML. Output of
load_model_enable_groups().
- pathway_groups (dict):
KEGG Pathway Id as key and reactions Ids as values, e.g. see output of
_invert_reac_pathway_dict().
- Returns:
- libModel:
Modified model with groups for pathways.
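The `pathway_groups` argument is the inverse of the reaction-to-pathway mapping. A minimal sketch of such an inversion is shown below; it is a stand-in for the private helper `_invert_reac_pathway_dict()`, whose implementation is not shown here, and the function name and the example IDs are hypothetical.

```python
def invert_reaction_pathways(reaction_pathways):
    """Turn reaction id -> [pathway ids] into pathway id -> [reaction ids]."""
    pathway_groups = {}
    for reaction_id, pathway_ids in reaction_pathways.items():
        for pathway_id in pathway_ids:
            pathway_groups.setdefault(pathway_id, []).append(reaction_id)
    return pathway_groups


# Illustrative mapping only.
reaction_pathways = {
    "R_PGI": ["rn00010", "rn00030"],
    "R_PFK": ["rn00010"],
}
pathway_groups = invert_reaction_pathways(reaction_pathways)
```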
- refinegems.curation.pathways.find_kegg_pathways(mapped_reacs: dict, viaEC: bool = False, viaRC: bool = False) → dict
Given a dictionary of reaction IDs mapped to KEGG reaction IDs and/or EC numbers, extract the KEGG pathways for each reaction based on the KEGG reaction ID.
- Args:
- mapped_reacs (dict):
Dictionary containing the information about the reactions. For more information, see
_extract_kegg_ec_from_reac().
- viaEC (bool, optional):
If True, also tries mapping to pathways via EC number, if via reaction ID is unsuccessful. Defaults to False.
- viaRC (bool, optional):
If True, also tries mapping to pathways via reaction class, if via reaction ID is unsuccessful. Defaults to False.
- Returns:
- dict:
Dictionary with the reaction IDs as keys and a list of KEGG pathway IDs as values.
- refinegems.curation.pathways.kegg_pathway_analysis(model: cobra.Model) → KEGGPathwayAnalysisReport
Analyse the pathways that are covered by the model.
The analysis is based on the KEGG pathway classification and the available KEGG pathway identifiers present in the model.
Note: one reaction can have multiple pathway identifiers associated with it. This analysis focuses on the total number of IDs found within the model.
- Args:
- model (cobra.Model):
A model loaded with COBRApy.
- Returns:
- KEGGPathwayAnalysisReport:
The KEGG pathway analysis report.
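The note about counting the total number of pathway IDs can be made concrete with a small sketch. It only illustrates the counting principle, not the contents of `KEGGPathwayAnalysisReport`; the helper name and the example IDs are hypothetical.

```python
from collections import Counter


def count_pathway_ids(reaction_pathways):
    """Count how often each pathway id occurs across all reactions."""
    return Counter(
        pathway_id
        for pathway_ids in reaction_pathways.values()
        for pathway_id in pathway_ids
    )


# Two reactions, one of which carries two pathway identifiers.
annotated = {
    "R_PGI": ["rn00010", "rn00030"],
    "R_PFK": ["rn00010"],
}
counts = count_pathway_ids(annotated)
total_ids = sum(counts.values())
```

With two reactions and three pathway identifiers, the total counts identifiers rather than reactions, which is the distinction the note draws.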
- refinegems.curation.pathways.load_model_enable_groups(modelpath: str) → libsbml.Model
Loads model as document using libSBML and enables groups extension
- Args:
- modelpath (str):
Path to GEM
- Returns:
- libModel:
Model loaded with libSBML
- refinegems.curation.pathways.set_kegg_pathways(modelpath: str, viaEC: bool = False, viaRC: bool = False) → tuple[libsbml.Model, list[str]]
Executes all steps to add KEGG pathways as groups
- Args:
- modelpath (str):
Path to GEM.
- Returns:
- tuple:
libSBML model (1) & List of reactions without KEGG Id (2)
libModel: Modified model with Pathways as groups
list: Ids of reactions without KEGG annotation