refinegems.curation

biomass module

Biomass objective function helpers.

Most functions within this module were adapted from MEMOTE and modified by Gwendolyn O. Döbel. MEMOTE is distributed under the Apache License 2.0: https://github.com/opencobra/memote/blob/develop/LICENSE

This module provides functions to normalise the biomass objective functions.

refinegems.curation.biomass.check_normalise_biomass(model: cobra.Model, cycles: int = 10) → cobra.Model | None[source]

Checks if at least one biomass reaction is present

For each found biomass reaction checks if it sums up to 1g[CDW]

Normalises the coefficients of each biomass reaction where the sum is not 1g[CDW] until the sum is 1g[CDW]

Returns model with adjusted biomass function(s)

Args:

model (cobraModel):
Model loaded with COBRApy
cycles (int, optional):
Maximum number of optimisation cycles that will be run. Used to avoid endless optimisation cycles.

Returns:

cobraModel:: COBRApy model with adjusted biomass functions
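
The normalisation idea can be illustrated with a minimal, library-free sketch. It assumes that coefficients are given in mmol/gDW and molar masses in g/mol, so that the biomass weight in grams is the signed sum of coefficient times mass divided by 1000. The helper names `biomass_weight` and `normalise_coefficients`, the tolerance and the toy numbers are hypothetical; this is not the refineGEMS or MEMOTE implementation.

```python
def biomass_weight(coefficients, molar_masses):
    """Return the biomass weight in g from mmol/gDW coefficients and g/mol masses."""
    # Consumed metabolites carry negative coefficients, hence the sign flip.
    return sum(-coef * molar_masses[met] for met, coef in coefficients.items()) / 1000.0


def normalise_coefficients(coefficients, molar_masses, cycles=10, tol=1e-6):
    """Rescale all coefficients until the weight is 1 g or `cycles` is exhausted."""
    scaled = dict(coefficients)
    for _ in range(cycles):
        weight = biomass_weight(scaled, molar_masses)
        if abs(weight - 1.0) <= tol:
            break
        if weight == 0:
            raise ValueError("biomass weight is zero, cannot normalise")
        scaled = {met: coef / weight for met, coef in scaled.items()}
    return scaled


# Toy data: identifiers and masses are illustrative only.
toy_coefficients = {"ala__L_c": -0.5, "atp_c": -0.1, "adp_c": 0.1}
toy_masses = {"ala__L_c": 89.09, "atp_c": 503.15, "adp_c": 424.18}
normalised = normalise_coefficients(toy_coefficients, toy_masses)
```

The `cycles` argument in the sketch plays the same role as in the signature above: it caps the number of rescaling rounds so the loop always terminates.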

charges module

Provides functions for adding charges to metabolites

When iterating through all metabolites present in a model, you will find several which have no defined charge (metab.getPlugin('fbc').isSetCharge() = false). This can lead to charge imbalanced reactions. This script takes information on metabolite charges from the ModelSEED database. A charge is automatically added to a metabolite if it has no defined charge and if there is only one charge denoted in ModelSEED. When multiple charges are present, the metabolite and the possible charges are noted and later returned in a dictionary.

It is possible to use the correct_charges_from_db function with other databases. The user just needs to make sure that the compounds dataframe has a ‘BiGG’ and a ‘charge’ column.

refinegems.curation.charges.correct_charges_from_db(model: libsbml.Model, compounds: pandas.DataFrame) → tuple[libsbml.Model, dict][source]

Adds charges taken from given database to metabolites which have no defined charge

Args:

model (libModel):
Model loaded with libsbml
compounds (pd.DataFrame):
Containing database data with ‘BiGG’ (BiGG-Ids) and ‘charge’ (float or int) as columns

Returns:

tuple:

libSBML model (1) & dictionary ‘metabolite_id’: list(charges) (2)

libModel: Model with added charges
dict: Metabolites with respective multiple charges
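
The decision rule described in the module text (add a charge only when exactly one is known, otherwise report the candidates) can be sketched on plain dictionaries. The function `assign_missing_charges` and the toy data are hypothetical stand-ins; the real function works on a libSBML model and a pandas DataFrame.

```python
def assign_missing_charges(model_charges, db_charges):
    """model_charges: metabolite id -> charge or None; db_charges: id -> list of charges."""
    updated = dict(model_charges)
    ambiguous = {}
    for met_id, charge in model_charges.items():
        if charge is not None:
            continue  # a charge is already defined, leave it untouched
        candidates = sorted(set(db_charges.get(met_id, [])))
        if len(candidates) == 1:
            updated[met_id] = candidates[0]  # unambiguous: add the charge
        elif len(candidates) > 1:
            ambiguous[met_id] = candidates  # several charges: only report them
    return updated, ambiguous


# Toy data, not taken from ModelSEED.
toy_model = {"glc__D": 0, "pyr": None, "h2o": None, "foo": None}
toy_db = {"glc__D": [0], "pyr": [-1], "h2o": [0, 1]}
updated, ambiguous = assign_missing_charges(toy_model, toy_db)
```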

refinegems.curation.charges.correct_charges_modelseed(model: libsbml.Model) → tuple[libsbml.Model, dict][source]

Wrapper function which completes the steps to charge correction with the ModelSEED database

Args:

model (libModel):
Model loaded with libsbml

Returns:

tuple:

libSBML model (1) & dictionary ‘metabolite_id’: list(charges) (2)

libModel: Model with added charges
dict: Metabolites with respective multiple charges

curate module

General functions for curating a model

This module provides functionalities for curating models, including special functions for CarveMe models.

Since CarveMe version 1.5.1, the draft models from CarveMe contain pieces of information that are not correctly added to the annotations. To address this, this module includes the following functionalities:

Add URIs from the entity IDs to the annotation field for metabolites & reactions

Transfer URIs from the notes field to the annotations for metabolites & reactions

Add URIs from the GeneProduct IDs to the annotations

The functionalities for CarveMe models, along with some of the following further functionalities, are gathered in the main function polish_model().

Further functionalities:

Setting boundary condition & constant for metabolites & reactions

Unit handling to add units & UnitDefinitions & to set units for parameters

Addition of default settings for compartments & metabolites

Addition of URIs to GeneProducts

via a mapping from model IDs to valid database IDs

via the KEGG API

Changing the CURIE pattern/CVTerm qualifier & qualifier type

Directionality control

refinegems.curation.curate.NH_PATTERN = re.compile('nh[3-4]')
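
As a small illustration of what this pattern matches: it accepts the literal `nh` followed by `3` or `4`, and it is case-sensitive. The identifiers below are hypothetical, and how refineGEMS uses the pattern internally is not shown here.

```python
import re

NH_PATTERN = re.compile('nh[3-4]')

# Hypothetical metabolite identifiers.
ids = ["nh3_c", "nh4_e", "nh5_c", "NH4_c", "h2o_c"]
matching = [i for i in ids if NH_PATTERN.search(i)]
```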

refinegems.curation.curate.add_compartment_structure_specs(model: libsbml.Model) → None[source]

Adds the required specifications for the compartment structure (size & spatial dimension) if not set

Args:

model (libModel):
Model loaded with libSBML

refinegems.curation.curate.check_direction(model: cobra.Model, data: pandas.DataFrame | str, exclude: None | tuple[Literal['annotation', 'notes'], str, str] = None) → cobra.Model[source]

Checks the direction of reactions by searching for matching MetaCyc, KEGG and MetaNetX IDs as well as EC numbers in a downloaded BioCyc (MetaCyc) database table or DataFrame. The table needs to contain at least the following columns:

Reaction | EC-Number | KEGG reaction | METANETX | Reaction-Direction

The Reaction column should contain the BioCyc/MetaCyc ID (without the META: etc. prefix).

Args:

model (cobra.Model):: The model loaded with COBRApy.
data (pd.DataFrame | str):: Either a pandas DataFrame or a path to a CSV file containing the BioCyc smart table.
exclude (None | tuple(Literal[‘annotation’,’notes’], str, str), optional):: Tuple containing the type of exclusion (‘annotation’ or ‘notes’), the key to check, and the value that determines exclusion of the reaction. If no tuple is given (None), no reaction is excluded. Defaults to None.

Raises:

TypeError: Unknown data type for parameter data

Returns:

cobra.Model:: The edited model.
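
Before passing a table to this function, it can help to confirm that the required columns are present. The following is a minimal sketch using only the standard library; the helper `missing_columns`, the tab delimiter and the toy rows are assumptions, not part of refineGEMS.

```python
import csv
import io

REQUIRED_COLUMNS = (
    "Reaction", "EC-Number", "KEGG reaction", "METANETX", "Reaction-Direction",
)


def missing_columns(table_text, delimiter="\t"):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(table_text), delimiter=delimiter)
    header = next(reader, [])
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Toy table that lacks the METANETX column; all values are placeholders.
toy_table = (
    "Reaction\tEC-Number\tKEGG reaction\tReaction-Direction\n"
    "RXN-TOY\tEC-0.0.0.0\tR00000\tREVERSIBLE\n"
)
missing = missing_columns(toy_table)
```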

refinegems.curation.curate.extend_gp_annots_via_KEGG(gene_list: list[libsbml.GeneProduct], kegg_organism_id: str, prefixes2remove: str | list[str] = '') → None[source]

Adds KEGG gene & UniProt identifiers to the GeneProduct annotations

Note

This function infers the KEGG Gene ID based on the Genbank locus tag stored in the GeneProduct labels in the model and the KEGG Organism ID. If the locus tag from Genbank and the locus tag part from the KEGG Gene ID for your organism do not match, please provide a prefix or a list of prefixes to remove from the locus tag in the prefixes2remove argument.

Args:

gene_list (list[GeneProduct]):
libSBML ListOfGenes
kegg_organism_id (str):
Organism identifier in the KEGG database
prefixes2remove (Union[str,list[str]], optional):
Prefix(es) to remove from the locus tag to get a valid KEGG Gene ID. Defaults to empty string (‘’).
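
The note above can be made concrete with a small sketch. KEGG gene identifiers have the form `organism:gene`; the helper `infer_kegg_gene_id` and its prefix handling are hypothetical and only approximate what the function may do internally.

```python
def infer_kegg_gene_id(locus_tag, kegg_organism_id, prefixes2remove=""):
    """Build a KEGG gene ID candidate from a locus tag and an organism ID."""
    if isinstance(prefixes2remove, str):
        prefixes = [prefixes2remove]
    else:
        prefixes = list(prefixes2remove)
    for prefix in prefixes:
        # Strip the first matching, non-empty prefix only.
        if prefix and locus_tag.startswith(prefix):
            locus_tag = locus_tag[len(prefix):]
            break
    return f"{kegg_organism_id}:{locus_tag}"


# Toy inputs: 'XY_' is a made-up prefix.
plain = infer_kegg_gene_id("b0002", "eco")
stripped = infer_kegg_gene_id("XY_b0002", "eco", ["XY_"])
```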

refinegems.curation.curate.extend_gp_annots_via_mapping_table(model: libsbml.Model, mapping_tbl_file: str | Path = None, gff_paths: list[str] = None, email: str = None, contains_locus_tags: bool = False, lab_strain: bool = False, outpath: str = None) → libsbml.Model[source]

Extends GeneProduct annotations via a mapping table.

If no mapping table is provided, a mapping table will be generated.

Args:

model (libModel):
Model loaded with libSBML
mapping_tbl_file (str|Path, optional):
Path to a file containing a mapping table with columns model_id | X... where X can be REFSEQ, NCBI, locus_tag or UNCLASSIFIED. The table can contain all of the X columns or at least one of them. Defaults to None.
gff_paths (list[str], optional):
Path(s) to GFF file(s). Allowed GFF formats are: RefSeq, NCBI and Prokka. This is only used when mapping_tbl_file == None. Defaults to None.
email (str, optional):
E-mail for NCBI queries. This is only used when mapping_tbl_file == None. Defaults to None.
contains_locus_tags (bool, optional):
Set to True if the provided model has locus tags within the label tag. This is only used when mapping_tbl_file == None. Defaults to False.
lab_strain (bool, optional):
Set to True if the provided strain is not listed in any database and thus has only homolog mappings. Defaults to False.
outpath (str, optional):
Output path for location where the generated mapping table should be written to. This is only used when mapping_tbl_file == None. Defaults to None.

Returns:

libModel:: Modified model with extended annotations for the GeneProducts

refinegems.curation.curate.extend_metab_reac_annots_via_id(entity_list: libsbml.ListOfSpecies | libsbml.ListOfReactions, id_db: str) → None[source]

Extends metabolite or reaction annotations by extracting the core ID from the entity ID and adding this ID as valid annotation if possible

Args:

entity_list (Union[ListOfSpecies, ListOfReactions]):
List of entities, either species (metabolites, ListOfSpecies) or reactions (ListOfReactions)
id_db (str):
The database prefix to validate IDs against. Must correspond to a valid prefix in the Bioregistry

Raises:

TypeError: Unsupported type for entity_list

refinegems.curation.curate.extend_metab_reac_annots_via_notes(entity_list: libsbml.ListOfSpecies | libsbml.ListOfReactions) → None[source]

Extends metabolite or reaction annotations by extracting valid URIs from the notes section of the provided entities and cleans up the notes by removing processed elements

Args:

entity_list (Union[ListOfSpecies, ListOfReactions]):
List of entities, either species (metabolites, ListOfSpecies) or reactions (ListOfReactions)

Raises:

TypeError: Unsupported type for entity_list

refinegems.curation.curate.fix_compartments(model: libsbml.Model) → libsbml.Model[source]

Fixes compartments in a model

By adding missing compartments based on metabolite IDs if not set
By checking for valid compartment IDs & adjusting them if necessary
By setting the size and spatial dimension if not set

Args:

model (libModel):
Model loaded with libSBML

Returns:

libModel:: Model as libSBML model with adjusted compartments

refinegems.curation.curate.fix_reac_bounds(model: cobra.Model) → None[source]

Checks the model's reaction bounds and adjusts values if they are likely to cause problems.

If the lower bound is greater than 0.0 or the upper bound is less than 0.0, the respective bound is set to 0.0. If both cases appear at the same time, the values are switched, as it is assumed that the reaction direction got mixed up.

Args:

model (cobra.Model):
The model to check loaded with COBRApy.
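
One possible reading of this rule, sketched on plain (lower, upper) pairs rather than on COBRApy reactions. It assumes that the two problem cases are a positive lower bound and a negative upper bound; the helper `fix_bounds` is hypothetical and not the refineGEMS code.

```python
def fix_bounds(lower, upper):
    """Return adjusted (lower, upper) bounds for a single reaction."""
    if lower > 0.0 and upper < 0.0:
        # Both cases at once: assume the direction was inverted and swap.
        return upper, lower
    if lower > 0.0:
        return 0.0, upper
    if upper < 0.0:
        return lower, 0.0
    return lower, upper


swapped = fix_bounds(5.0, -3.0)
clipped_lower = fix_bounds(2.0, 1000.0)
clipped_upper = fix_bounds(-1000.0, -1.0)
unchanged = fix_bounds(-1000.0, 1000.0)
```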

refinegems.curation.curate.polish_entity_conditions(entity_list: libsbml.ListOfSpecies | libsbml.ListOfReactions) → None[source]

Sets boundary condition and constant if not set for an entity

Args:

entity_list (Union[ListOfSpecies, ListOfReactions]):
libSBML ListOfSpecies or ListOfReactions

refinegems.curation.curate.polish_model(model: libsbml.Model, id_db: str = 'BiGG', mapping_tbl_file: str = None, gff_paths: list[str] = None, email: str = None, contains_locus_tags: bool = False, lab_strain: bool = False, kegg_organism_id: str = None, prefixes2remove_kegg: list[str] | str = '', outpath: str = None) → libsbml.Model[source]

Completes all steps to polish a model

Note

So far only tested for models having either BiGG or VMH identifiers.

Args:

model (libModel):
Model loaded with libSBML
id_db (str, optional):
Main database where identifiers in model come from. Defaults to ‘BiGG’.
mapping_tbl_file (str, optional):
Path to a file containing a mapping table with columns model_id | X... where X can be REFSEQ, NCBI, locus_tag or UNCLASSIFIED. The table can contain all of the X columns or at least one of them. Defaults to None.
gff_paths (list[str], optional):
Path(s) to GFF file(s). Allowed GFF formats are: RefSeq, NCBI and Prokka. This is only used when mapping_tbl_file == None. Defaults to None.
email (str, optional):
E-mail for NCBI queries. This is only used when mapping_tbl_file == None. Defaults to None.
contains_locus_tags (bool, optional):
Set to True if the provided model has locus tags within the label tag. This is only used when mapping_tbl_file == None. Defaults to False.
lab_strain (bool, optional):
Set to True if the provided strain is not listed in any database and thus has only homolog mappings. Defaults to False.
kegg_organism_id (str, optional):
KEGG organism identifier if available. Defaults to None.
prefixes2remove_kegg (Union[str,list[str]], optional):
Prefix(es) to remove from the locus tag to get a valid KEGG Gene ID. Defaults to empty string (‘’).
outpath (str, optional):
Output path for mapping table from model ID to valid database IDs (if mapping_tbl_file == None) & incorrect annotations file(s). Defaults to None.

Returns:

libModel:: Polished libSBML model

refinegems.curation.curate.polish_model_metadata(model: libsbml.Model) → None[source]

refinegems.curation.curate.polish_model_units(model: libsbml.Model) → None[source]

Replaces the list of unit definitions with the unit definitions needed for FBA:

mmol per gDW per h

mmol per gDW

hour (h)

femto litre (fL)

Args:

model (libModel):
Model loaded with libSBML

refinegems.curation.curate.prune_mass_unbalanced_reacs(model: cobra.Model) → None[source]

Prune mass unbalanced reactions from a model. Reactions that are part of the biomass function or boundary reactions are not pruned, even if they are mass unbalanced. Metabolites and genes that become orphaned due to the pruning are also removed.

Args:

model (cobra.Model):
The input model, loaded with COBRApy.
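
What "mass unbalanced" means can be shown with a short sketch that sums element counts weighted by stoichiometric coefficients. The helper `mass_imbalance` and the toy reaction are hypothetical; on real models, COBRApy's `Reaction.check_mass_balance()` reports a comparable dictionary of imbalances.

```python
from collections import Counter


def mass_imbalance(stoichiometry, formulas):
    """stoichiometry: metabolite -> coefficient; formulas: metabolite -> {element: count}.

    Returns the net amount per element; an empty dict means the reaction is balanced.
    """
    totals = Counter()
    for met, coef in stoichiometry.items():
        for element, count in formulas[met].items():
            totals[element] += coef * count
    return {el: amount for el, amount in totals.items() if abs(amount) > 1e-9}


formulas = {"h2": {"H": 2}, "o2": {"O": 2}, "h2o": {"H": 2, "O": 1}}
balanced = mass_imbalance({"h2": -2, "o2": -1, "h2o": 2}, formulas)
unbalanced = mass_imbalance({"h2": -1, "o2": -1, "h2o": 2}, formulas)
```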

refinegems.curation.curate.resolve_duplicate_metabolites(model: cobra.Model, based_on: str = 'metanetx.chemical', replace: bool = True) → cobra.Model[source]

Resolve duplicate metabolites in a model. Metabolites are considered duplicate if they share the same annotations (same or nan).

Note

Depending on the starting database, the results might differ.

Args:

model (cobra.Model):
The model loaded with COBRApy.
based_on (str, optional):
Label to base the resolution process on. Can be any annotation label. Defaults to ‘metanetx.chemical’.
replace (bool, optional):
Either report the duplicates (False) or replace them with one (True). Defaults to True.

Returns:

cobra.Model:: The model.
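
The grouping step behind duplicate detection can be sketched without COBRApy. The sketch assumes that only metabolites in the same compartment can be duplicates and that entries lacking the chosen annotation are skipped; the helper `find_duplicate_metabolites` and all identifiers are hypothetical.

```python
from collections import defaultdict


def find_duplicate_metabolites(metabolites, based_on="metanetx.chemical"):
    """metabolites: id -> (compartment, annotation dict). Returns groups with more than one member."""
    groups = defaultdict(list)
    for met_id, (compartment, annotation) in metabolites.items():
        value = annotation.get(based_on)
        if value is None:
            continue  # nothing to compare on
        groups[(compartment, value)].append(met_id)
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


# Toy model content; 'MNXM_TOY' is a placeholder, not a real MetaNetX ID.
toy_metabolites = {
    "glc__D_c": ("c", {"metanetx.chemical": "MNXM_TOY"}),
    "glc_D_c": ("c", {"metanetx.chemical": "MNXM_TOY"}),
    "glc__D_e": ("e", {"metanetx.chemical": "MNXM_TOY"}),
    "pyr_c": ("c", {}),
}
duplicates = find_duplicate_metabolites(toy_metabolites)
```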

refinegems.curation.curate.resolve_duplicate_reactions(model: cobra.Model, based_on: str = 'reaction', remove_reac: bool = True) → cobra.Model[source]

Resolve and remove duplicate reactions based on their reaction equation and matching database identifiers. One of the reactions is removed only if all of these match or a comparison with nan occurs.

Args:

model (cobra.Model):
A model loaded with COBRApy.
based_on (str, optional):
Label to base the resolution process on. Can be ‘reaction’ or any other annotation label. Defaults to ‘reaction’.
remove_reac (bool, optional):
When True, combines and removes duplicates. Otherwise only reports the findings. Defaults to True.

Returns:

cobra.Model:: The model.

refinegems.curation.curate.resolve_duplicates(model: cobra.Model, check_reac: bool = True, check_meta: Literal['default', 'exhaustive', 'skip'] = 'default', replace_dupl_meta: bool = True, remove_unused_meta: bool = False, remove_dupl_reac: bool = True) → cobra.Model[source]

Resolve and remove (optional) duplicate metabolites and reactions in the model.

Args:

model (cobra.Model):
The model loaded with COBRApy.
check_reac (bool, optional):
Whether to check reactions for duplicates. Defaults to True.
check_meta (Literal[‘default’,’exhaustive’,’skip’], optional):
Whether to check for duplicate metabolites. Defaults to ‘default’.
replace_dupl_meta (bool, optional):
Option to replace/remove duplicate metabolites. Defaults to True.
remove_unused_meta (bool, optional):
Option to remove unused metabolites. Defaults to False.
remove_dupl_reac (bool, optional):
Option to combine/remove duplicate reactions. Defaults to True.

Returns:

cobra.Model:: The (edited) model.

refinegems.curation.curate.set_initial_amount_metabs(model: libsbml.Model) → None[source]

Sets initial amount to all metabolites if not already set or if initial concentration is not set

Args:

model (libModel):
Model loaded with libSBML

refinegems.curation.curate.set_model_default_units(model: libsbml.Model) → None[source]

Sets default units of model

Args:

model (libModel):
Model loaded with libSBML

refinegems.curation.curate.set_units_of_parameters(model: libsbml.Model) → None[source]

Sets units of parameters in model

Args:

model (libModel):
Model loaded with libSBML

refinegems.curation.curate.update_annotations_from_others(model: libsbml.Model) → libsbml.Model[source]

Synchronizes metabolite annotations for core, periplasm and extracellular

Args:

model (libModel):
Model loaded with libSBML

Returns:

libModel:: Modified model with synchronized annotations

miriam module

General functions to conform annotations to the MIRIAM standards

The functions can be used to change the CURIE pattern and to clean the CURIEs, CVTerm qualifiers and qualifier types up.

refinegems.curation.miriam._is_valid_bioregistry_uri(bioregistry, uri: str) → bool[source]: Return whether a URI parses to a valid bioregistry identifier.

refinegems.curation.miriam._parse_bioregistry_uri(bioregistry, uri: str) → tuple[str, str] | None[source]: Parse a URI while tolerating incompatible bioregistry/curies releases.

refinegems.curation.miriam.add_uri_set(entity: libsbml.SBase, qt, b_m_qt, uri_set: sortedcontainers.SortedSet.<class 'str'>) → list[str][source]

Add a complete URI set to the provided CVTerm

Args:

entity (SBase):
A libSBML SBase object like model, GeneProduct, etc.
qt:
A libSBML qualifier type: BIOLOGICAL_QUALIFIER|MODEL_QUALIFIER
b_m_qt:
A libSBML biological or model qualifier type like BQB_IS|BQM_IS
uri_set (SortedSet[str]):
SortedSet containing URIs

refinegems.curation.miriam.change_all_qualifiers(model: libsbml.Model, lab_strain: bool) → libsbml.Model[source]

Wrapper function to change qualifiers of all entities at once

Args:

model (libModel):
Model loaded with libSBML
lab_strain (bool):
True if the strain was sequenced in a local lab

Returns:

libModel:: Model with all qualifiers updated to be MIRIAM compliant

refinegems.curation.miriam.change_qualifier_per_entity(entity: libsbml.SBase, new_qt, new_b_m_qt, specific_db_prefix: str = None) → list[source]

Updates Qualifiers to be MIRIAM compliant for an entity

Args:

entity (SBase):
A libSBML SBase object like model, GeneProduct, etc.
new_qt (Qualifier):
A libSBML qualifier type: BIOLOGICAL_QUALIFIER|MODEL_QUALIFIER
new_b_m_qt (QualifierType):
A libSBML biological or model qualifier type like BQB_IS|BQM_IS
specific_db_prefix (str):
Has to be set if only for a specific database the qualifier type should be changed. Can be ‘kegg.genes’, ‘biocyc’, etc.

Returns:

list:: CURIEs that are not MIRIAM compliant

refinegems.curation.miriam.change_qualifiers(model: libsbml.Model, entity_type: str, new_qt, new_b_m_qt, specific_db_prefix: str = None) → libsbml.Model[source]

Updates Qualifiers to be MIRIAM compliant for an entity type of a given model

Args:

model (libModel):
Model loaded with libSBML
entity_type (str):
Any string of the following: model|compartment|metabolite|parameter|reaction|unit definition|unit|gene product|group
new_qt (Qualifier):
A libSBML qualifier type: BIOLOGICAL_QUALIFIER|MODEL_QUALIFIER
new_b_m_qt (QualifierType):
A libSBML biological or model qualifier type like BQB_IS|BQM_IS
specific_db_prefix (str):
Has to be set if only for a specific database the qualifier type should be changed. Can be ‘kegg.genes’, ‘biocyc’, etc.

Returns:

libModel:: Model with changed qualifier for given entity type

refinegems.curation.miriam.generate_miriam_compliant_uri_set(prefix2id: sortedcontainers.SortedDict.slice(<class 'str'>, sortedcontainers.SortedSet.<class 'str'>, None)) → sortedcontainers.SortedSet.<class 'str'>[source]

Generate a set of complete MIRIAM compliant URIs from the provided prefix to identifier mapping

Args:

prefix2id (SortedDict[str: SortedSet[str]]):
Dictionary containing a mapping from database prefixes to their respective identifier sets

Returns:

SortedSet:: Sorted set containing complete URIs

refinegems.curation.miriam.generate_uri_set_with_old_pattern(prefix2id: sortedcontainers.SortedDict.slice(<class 'str'>, sortedcontainers.SortedSet.<class 'str'>, None)) → sortedcontainers.SortedSet.<class 'str'>[source]

Generate a set of complete URIs from the provided prefix to identifier mapping with the old MIRIAM pattern.

Args:

prefix2id (SortedDict[str: SortedSet[str]]):
Dictionary containing a mapping from database prefixes to their respective identifier sets

Returns:

SortedSet:: Sorted set containing complete URIs
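
The difference between the two URI patterns can be sketched as follows. The sketch assumes the new pattern joins prefix and identifier with a colon and the old one with a slash, both on the `https://identifiers.org/` base, and it uses plain dicts, sets and a sorted list instead of `sortedcontainers`. The helper `build_uris` is hypothetical and ignores any special cases the library may handle.

```python
def build_uris(prefix2id, new_pattern=True):
    """prefix2id: database prefix -> set of identifiers. Returns a sorted list of URIs."""
    base = "https://identifiers.org/"
    separator = ":" if new_pattern else "/"
    uris = {
        f"{base}{prefix}{separator}{identifier}"
        for prefix, identifiers in prefix2id.items()
        for identifier in identifiers
    }
    return sorted(uris)


toy_mapping = {"bigg.metabolite": {"glc__D"}, "kegg.compound": {"C00031"}}
new_uris = build_uris(toy_mapping, new_pattern=True)
old_uris = build_uris(toy_mapping, new_pattern=False)
```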

refinegems.curation.miriam.get_set_of_curies(uri_list: list[str]) → tuple[sortedcontainers.SortedDict.slice(<class 'str'>, sortedcontainers.SortedSet.<class 'str'>, None), list[str]][source]

Gets a list of URIs & maps the database prefixes to their respective identifier sets

Args:

uri_list (list[str]):
List containing CURIEs

Returns:

tuple:

A sorted dictionary (1) & a list (2)

SortedDict: Sorted dictionary mapping database prefixes from the provided CURIEs to their respective identifier sets also provided by the CURIEs
list: List of CURIEs that are invalid according to bioregistry
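
A rough sketch of the mapping step, using string handling only. It treats anything without an `identifiers.org/` part and a prefix/identifier separator as invalid, which is far weaker than the bioregistry check described above; the helper `split_uris` is hypothetical.

```python
def split_uris(uri_list):
    """Return (prefix -> sorted identifiers, list of entries that could not be parsed)."""
    marker = "identifiers.org/"
    prefix2id = {}
    invalid = []
    for uri in uri_list:
        _, found, tail = uri.partition(marker)
        # The new pattern separates prefix and identifier with ':', the old one with '/'.
        cuts = [pos for pos in (tail.find(":"), tail.find("/")) if pos > 0]
        if not found or not cuts:
            invalid.append(uri)
            continue
        cut = min(cuts)
        prefix, identifier = tail[:cut], tail[cut + 1:]
        if not identifier:
            invalid.append(uri)
            continue
        prefix2id.setdefault(prefix, set()).add(identifier)
    return {prefix: sorted(ids) for prefix, ids in sorted(prefix2id.items())}, invalid


toy_uris = [
    "https://identifiers.org/kegg.compound:C00031",
    "http://identifiers.org/kegg.compound/C00022",
    "https://identifiers.org/bigg.metabolite:glc__D",
    "not a uri",
]
mapping, rejected = split_uris(toy_uris)
```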

refinegems.curation.miriam.improve_uri_per_entity(entity: libsbml.SBase, new_pattern: bool) → list[str][source]

Helper function: Removes duplicates & changes pattern according to new_pattern

Args:

entity (SBase):
A libSBML SBase object, either a model or an entity
new_pattern (bool):
True if new pattern is wanted, otherwise False

Returns:

list:: List of all collected invalid CURIEs of one entity

refinegems.curation.miriam.improve_uris(entities: libsbml.SBase, new_pattern: bool) → dict[slice(<class 'str'>, list[str], None)][source]

Removes duplicates & changes pattern according to bioregistry or new_pattern

Args:

entities (SBase):
A libSBML SBase object, either a model or a list of entities
bioregistry (bool):
Specifies whether the URIs should be changed with the help of bioregistry to be MIRIAM compliant or changed according to new or old pattern
new_pattern (bool):
True if new pattern is wanted, otherwise False

Returns:

dict:: Mapping of entity identifier to list of corresponding invalid CURIEs

refinegems.curation.miriam.polish_annotations(model: libsbml.Model, new_pattern: bool, outpath: str = None) → libsbml.Model[source]

Polishes all annotations in a model such that no duplicates are present & the same pattern is used for all CURIEs

Args:

model (libModel):
Model loaded with libSBML
new_pattern (bool):
True if new pattern is wanted, otherwise False. Note that bioregistry internally only uses the new pattern.
outpath (str, optional):
Path to output file for invalid CURIEs detected by improve_uris. Defaults to None.

Returns:

libModel:: libSBML model with polished annotations

pathways module

This module provides functions for adding, handling and analysing the KEGG pathways (or, more specifically, their annotations) contained in a model.

refinegems.curation.pathways.add_kegg_pathways(model, kegg_pathways) → libsbml.Model[source]

Adds KEGG pathways to reactions as BQB_OCCURS_IN annotations.

Args:

model (libModel):
Model loaded with libSBML and groups enabled. (To enable groups you can use enable_groups().)
kegg_pathways (dict):
Reaction Id as key and KEGG Pathway Id as value, e.g. see output of find_kegg_pathways().

Returns:

libModel: Modified model with KEGG pathways.

refinegems.curation.pathways.create_pathway_groups(model: libsbml.Model, pathway_groups) → libsbml.Model[source]

Use group module to add reactions to KEGG pathway.

Args:

model (libModel):
Model loaded with libSBML and groups enabled. (To enable groups you can use enable_groups().)
pathway_groups (dict):
KEGG Pathway Id as key and reactions Ids as values, e.g. see output of _invert_reac_pathway_dict().

Returns:

libModel:: Modified model with groups for pathways.
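
The two dictionary shapes used by add_kegg_pathways() and create_pathway_groups() are inverses of each other. A minimal sketch of that inversion, with a hypothetical helper `invert_mapping` standing in for the private function mentioned above and toy identifiers throughout:

```python
def invert_mapping(reac2pathways):
    """Turn reaction id -> list of pathway ids into pathway id -> list of reaction ids."""
    pathway2reacs = {}
    for reac_id, pathways in reac2pathways.items():
        for pathway in pathways:
            pathway2reacs.setdefault(pathway, []).append(reac_id)
    return pathway2reacs


# Toy assignment of reactions to pathway IDs.
toy_reac2pathways = {"R_PGI": ["map00010", "map00500"], "R_PFK": ["map00010"]}
pathway_groups = invert_mapping(toy_reac2pathways)
```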

refinegems.curation.pathways.enable_groups(model: libsbml.Model) → libsbml.Model[source]

Enables groups extension

Args:

model (libModel):
A model loaded with libSBML.

Returns:

libModel:: libSBML model with groups enabled

refinegems.curation.pathways.find_kegg_pathways(mapped_reacs: dict, viaEC: bool = False, viaRC: bool = False) → dict[source]

Given a dictionary of reaction IDs mapped to KEGG reaction IDs and/or EC numbers, extract the KEGG pathways for each reaction based on the KEGG reaction ID.

Args:

mapped_reacs (dict):
Dictionary containing the information about the reactions. For more information, see _extract_kegg_ec_from_reac().
viaEC (bool, optional):
If True, also tries mapping to pathways via EC number, if via reaction ID is unsuccessful. Defaults to False.
viaRC (bool, optional):
If True, also tries mapping to pathways via reaction class, if via reaction ID is unsuccessful. Defaults to False.

Returns:

dict:: Dictionary with the reaction IDs as keys and a list of KEGG pathway IDs as values.

refinegems.curation.pathways.kegg_pathway_analysis(model: cobra.Model) → KEGGPathwayAnalysisReport[source]

Analyse the pathways that are covered by the model.

The analysis is based on the KEGG pathway classification and the available KEGG pathway identifiers present in the model.

Note: one reaction can have multiple pathway identifiers associated with it. This analysis focuses on the total number of IDs found within the model.

Args:

model (cobra.Model):
A model loaded with COBRApy.

Returns:

KEGGPathwayAnalysisReport:: The KEGG pathway analysis report.
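
Counting the total number of pathway IDs, as opposed to the number of reactions, can be sketched with a `Counter`. The helper `count_pathway_ids` and the toy data are hypothetical and say nothing about the actual report structure.

```python
from collections import Counter


def count_pathway_ids(reac2pathways):
    """Count how often each pathway ID occurs across all reactions."""
    return Counter(
        pathway for pathways in reac2pathways.values() for pathway in pathways
    )


# Two reactions, but three pathway IDs in total.
toy_annotations = {"R_PGI": ["map00010", "map00500"], "R_PFK": ["map00010"]}
counts = count_pathway_ids(toy_annotations)
total_ids = sum(counts.values())
```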

refinegems.curation.pathways.logger = <Logger refinegems.curation.pathways (INFO)>: If your organism occurs in the KEGG database, extract the KEGG reaction ID from the annotations of your reactions and identify the KEGG pathways in which each reaction occurs. Then add all KEGG pathways for a reaction as annotations with the biological qualifier OCCURS_IN to the respective reaction.

refinegems.curation.pathways.set_kegg_pathways(model: libsbml.Model, viaEC: bool = False, viaRC: bool = False) → list[str][source]

Executes all steps to add KEGG pathways as groups to a given model.

Changes the model in-place.

Args:

model (libModel):
Model loaded with libSBML

Returns:

list: Ids of reactions without KEGG annotation