refinegems.utility

connection module

Provides functions / connections to other tools for easier access and usage.

refinegems.utility.connections.adjust_BOF(genome: str, model_file: str, model: cobra.Model, dna_weight_fraction: float, weight_frac: float) str[source]

Adjust the model’s BOF using BOFdat. Currently, step 1 (DNA coefficients) and step 2 are implemented.

Args:
  • genome (str):

    Path to the genome (e.g. .fna) FASTA file.

  • model_file (str):

    Path to the sbml (.xml) file of the model.

  • model (cobra.Model):

    The genome-scale metabolic model (from the model_file path above), loaded with COBRApy.

  • dna_weight_fraction (float):

    DNA weight fraction for BOF step 1.

  • weight_frac (float):

    Weight fraction for the second step of BOFdat (enzymes and ions)

Returns:
str:

The updated BOF reaction as a reaction string.

refinegems.utility.connections.filter_DIAMOND_blastp_results(blasttsv: str, pid_theshold: float = 90.0) pandas.DataFrame[source]

Filter the results of a DIAMOND BLASTp run (see run_DIAMOND_blastp()) by percentage identity value (PID) and extract the matching pairs of query and subject IDs.

Args:
  • blasttsv (str):

    Path to the DIAMOND BLASTp result file.

  • pid_theshold (float, optional):

    Threshold value for the PID. Given in percent. Defaults to 90.0.

Raises:
  • ValueError: PID threshold has to be between 0.0 and 100.0

Returns:
pd.DataFrame:

A table with the columns query_ID and subject_ID containing the hits from the BLAST run with a PID higher than the given threshold value.
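The filtering step can be pictured with a small standard-library sketch. It assumes the result file uses DIAMOND's default 12-column tabular layout (query ID, subject ID, percentage identity, and so on). The function name filter_hits and the column labels are illustrative and not part of refineGEMs, which returns a pandas.DataFrame rather than a list.

```python
import csv
import io

# Assumed column order: DIAMOND's default tabular output (BLAST outfmt 6).
DIAMOND_COLUMNS = [
    "query_ID", "subject_ID", "PID", "align_len", "mismatches", "gap_opens",
    "q_start", "q_end", "s_start", "s_end", "evalue", "bitscore",
]


def filter_hits(tsv_text, pid_threshold=90.0):
    """Return (query_ID, subject_ID) pairs whose PID exceeds the threshold."""
    if not 0.0 <= pid_threshold <= 100.0:
        raise ValueError("PID threshold has to be between 0.0 and 100.0")
    reader = csv.DictReader(
        io.StringIO(tsv_text), fieldnames=DIAMOND_COLUMNS, delimiter="\t"
    )
    return [
        (row["query_ID"], row["subject_ID"])
        for row in reader
        if float(row["PID"]) > pid_threshold
    ]


# Two made-up hits: one above and one below the default threshold of 90 %.
example = (
    "q1\tsp|P1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\n"
    "q2\tsp|P2\t75.0\t180\t45\t2\t1\t180\t5\t184\t1e-40\t150\n"
)
hits = filter_hits(example)
```

With the default threshold only the first hit is kept; lowering pid_threshold to 50.0 would keep both.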

refinegems.utility.connections.get_memote_score(memote_report: dict) float[source]

Extracts MEMOTE score from report

Args:
  • memote_report (dict):

    The MEMOTE report as a dictionary.

Returns:
float:

MEMOTE score

refinegems.utility.connections.perform_mcc(model: cobra.Model, dir: str, apply: bool = True) cobra.Model[source]

Run the MassChargeCuration tool on the model and optionally directly apply the solution.

Args:
  • model (cobra.Model):

    The model to use the tool on.

  • dir (str):

    Path of the directory to save MCC output in.

  • apply (bool, optional):

    If True, model is directly updated with the results. Defaults to True.

Returns:
cobra.Model:

The model (updated or not)

refinegems.utility.connections.run_DIAMOND_blastp(fasta: str, db: str, sensitivity: Literal['sensitive', 'more-sensitive', 'very-sensitive', 'ultra-sensitive'] = 'more-sensitive', coverage: float = 95.0, threads: int = 2, outdir: str = None, outname: str = 'DIAMOND_blastp_res.tsv') str[source]

Run DIAMOND in BLASTp mode.

Args:
  • fasta (str):

    The FASTA file to BLAST for.

  • db (str):

    The DIAMOND database file to BLAST against

  • sensitivity (Literal[‘sensitive’, ‘more-sensitive’, ‘very-sensitive’,’ultra-sensitive’], optional):

    Sensitivity mode for DIAMOND. Defaults to ‘more-sensitive’.

  • coverage (float, optional):

    A parameter for DIAMOND. Coverage threshold for the hits. Defaults to 95.0.

  • threads (int, optional):

    A parameter for DIAMOND. Number of threads to be used while BLASTing. Defaults to 2.

  • outdir (str, optional):

    Path to a directory to write the output files to. Defaults to None.

  • outname (str, optional):

    Name of the result file (name only, not a path). Defaults to ‘DIAMOND_blastp_res.tsv’.

Returns:
str:

Path to the results of the DIAMOND BLASTp run.
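As a rough illustration of how the parameters above could translate into a DIAMOND call, the sketch below only assembles the command line and does not run it. The helper build_diamond_command is hypothetical. Mapping coverage to DIAMOND's --query-cover option and writing to the working directory when outdir is None are both assumptions, not a description of the actual implementation.

```python
from pathlib import Path

SENSITIVITY_MODES = (
    "sensitive", "more-sensitive", "very-sensitive", "ultra-sensitive"
)


def build_diamond_command(fasta, db, sensitivity="more-sensitive",
                          coverage=95.0, threads=2, outdir=None,
                          outname="DIAMOND_blastp_res.tsv"):
    """Assemble (but do not execute) a DIAMOND blastp command line."""
    if sensitivity not in SENSITIVITY_MODES:
        raise ValueError(f"Unknown sensitivity mode: {sensitivity}")
    # Assumption: without an output directory, write to the working directory.
    outfile = Path(outdir or ".") / outname
    command = [
        "diamond", "blastp",
        "--db", db,
        "--query", fasta,
        "--out", str(outfile),
        f"--{sensitivity}",
        # Assumption: the coverage threshold refers to query coverage.
        "--query-cover", str(coverage),
        "--threads", str(threads),
    ]
    return command, str(outfile)


command, outfile = build_diamond_command(
    "proteins.faa", "swissprot.dmnd", outdir="results"
)
# subprocess.run(command, check=True) would execute it if DIAMOND is installed.
```

Returning the output path next to the command mirrors the documented return value, the path to the result file.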

refinegems.utility.connections.run_SBOannotator(model: libsbml.Model) libsbml.Model[source]

Run SBOannotator on a model to annotate the SBO terms.

Args:
  • model (libModel):

    The model loaded with libSBML.

Returns:
libModel:

The model with corrected / added SBO terms.

refinegems.utility.connections.run_memote(model: cobra.Model, type: Literal['json', 'html'] = 'html', return_res: bool = False, save_res: str | None = None, verbose: bool = False) dict | str | None[source]

Run the memote snapshot function on a given model loaded with COBRApy.

Args:
  • model (cobra.Model):

    The model loaded with COBRApy.

  • type (Literal[‘json’,’html’], optional):

    Type of report to produce. Can be ‘html’ or ‘json’. Defaults to ‘html’.

  • return_res (bool, optional):

    Option to return the result. Defaults to False.

  • save_res (str | None, optional):

    If given a path string, saves the report under the given path. Defaults to None.

  • verbose (bool, optional):

    Produce a more verbose output. Defaults to False.

Raises:
  • ValueError: Unknown input for parameter type

Returns:
  1. Case return_res = True and type = json:
    dict:

    The json dictionary.

  2. Case return_res = True and type = html:
    str:

    The html string.

  3. Case return_res = False:
    None:

    no return

cvterms module

Helper module to work with annotations (CVTerms)

Stores dictionaries which hold information on the identifiers.org syntax, and provides functions to add CVTerms to different entities and to parse CVTerms.

refinegems.utility.cvterms.DB2PREFIX_GENES
refinegems.utility.cvterms.DB2PREFIX_METABS
refinegems.utility.cvterms.DB2PREFIX_PATHWAYS
refinegems.utility.cvterms.DB2PREFIX_REACS
refinegems.utility.cvterms.MIRIAM
refinegems.utility.cvterms.OLD_MIRIAM
refinegems.utility.cvterms.PREFIX2DB_GENES
refinegems.utility.cvterms.PREFIX2DB_METABS
refinegems.utility.cvterms.PREFIX2DB_PATHWAYS
refinegems.utility.cvterms.PREFIX2DB_REACS
refinegems.utility.cvterms._add_annotations_from_dict_cobra(references: dict, entity: cobra.Reaction | cobra.Metabolite | cobra.Model) None[source]

Given a dictionary and a cobra object, add the former as annotations to the latter. The keys of the dictionary are used as the annotation labels, the values as the values. If the keys are already in the entity, the values will be combined (union).

Args:
  • references (dict):

    The dictionary with the references to add to the entity.

  • entity (cobra.Reaction | cobra.Metabolite | cobra.Model):

    The entity to add annotations to.
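The union behaviour described above can be sketched on plain dictionaries. This is a minimal illustration, assuming annotation values are either single strings or lists of strings. The helper names as_list and merge_annotations and the example values are hypothetical, and the real function modifies the COBRApy object in place instead of returning a new dictionary.

```python
def as_list(value):
    """Wrap a single annotation value in a list; copy collections as lists."""
    return list(value) if isinstance(value, (list, tuple, set)) else [value]


def merge_annotations(annotation, references):
    """Return a new dict holding, per key, the union of both value sets."""
    merged = {key: as_list(value) for key, value in annotation.items()}
    for key, value in references.items():
        current = merged.setdefault(key, [])
        for item in as_list(value):
            if item not in current:  # union: skip values already present
                current.append(item)
    return merged


# Illustrative annotation of a reaction and some new references to add.
existing = {"bigg.reaction": "PGI", "ec-code": ["5.3.1.9"]}
new = {"ec-code": ["5.3.1.9", "5.3.1.-"], "kegg.reaction": "R00771"}
merged = merge_annotations(existing, new)
```

Keys present on both sides end up with the combined values without duplicates, while keys that are only in references are added unchanged.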

refinegems.utility.cvterms.add_cv_term_genes(entry: str, db_id: str, gene: libsbml.GeneProduct, lab_strain: bool = False)[source]

Adds CVTerm to a gene

Args:
  • entry (str):

    Id to add as annotation.

  • db_id (str):

    Database to which entry belongs. Must be in DB2PREFIX_GENES.keys().

  • gene (GeneProduct):

    Gene to add CVTerm to.

  • lab_strain (bool, optional):

    For locally sequenced strains the qualifiers are always HOMOLOG_TO. Defaults to False.

refinegems.utility.cvterms.add_cv_term_metabolites(entry: str, db_id: str, metab: libsbml.Species)[source]

Adds CVTerm to a metabolite

Args:
  • entry (str):

    Id to add as annotation

  • db_id (str):

    Database to which entry belongs. Must be in DB2PREFIX_METABS.keys().

  • metab (Species):

    Metabolite to add CVTerm to

refinegems.utility.cvterms.add_cv_term_pathways(entry: str, db_id: str, path: libsbml.Group)[source]

Add CVTerm to a groups pathway

Args:
  • entry (str):

    Id to add as annotation

  • db_id (str):

    Database to which entry belongs. Must be in DB2PREFIX_PATHWAYS.keys().

  • path (Group):

    Pathway to add CVTerm to

refinegems.utility.cvterms.add_cv_term_pathways_to_entity(entry: str, db_id: str, reac: libsbml.Reaction)[source]

Add CVTerm to a reaction as OCCURS IN pathway

Args:
  • entry (str):

    Id to add as annotation

  • db_id (str):

    Database to which entry belongs

  • reac (Reaction):

    Reaction to add CVTerm to

refinegems.utility.cvterms.add_cv_term_reactions(entry: str, db_id: str, reac: libsbml.Reaction)[source]

Adds CVTerm to a reaction

Args:
  • entry (str):

    Id to add as annotation

  • db_id (str):

    Database to which entry belongs. Must be in DB2PREFIX_REACS.keys().

  • reac (Reaction):

    Reaction to add CVTerm to

refinegems.utility.cvterms.add_cv_term_units(unit_id: str, unit: libsbml.Unit, relation: int)[source]

Adds CVTerm to a unit

Args:
  • unit_id (str):

    ID to add as URI to annotation

  • unit (Unit):

    Unit to add CVTerm to

  • relation (int):

    Provides model qualifier to be added

refinegems.utility.cvterms.generate_cvterm(qt, b_m_qt) libsbml.CVTerm[source]

Generates a CVTerm with the provided qualifier & biological or model qualifier types

Args:
  • qt (libSBML qualifier type):

    BIOLOGICAL_QUALIFIER or MODEL_QUALIFIER

  • b_m_qt (libSBML qualifier):

    BQM_IS, BQM_IS_HOMOLOG_TO, etc.

Returns:
CVTerm:

With provided qualifier & biological or model qualifier types

refinegems.utility.cvterms.get_id_from_cv_term(entity: libsbml.SBase, db_id: str) list[str][source]

Extract Id for a specific database from CVTerm

Args:
  • entity (SBase):

    Species, Reaction, Gene, Pathway

  • db_id (str):

    Database of interest

Returns:
list[str]:

Ids of entity belonging to db_id
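To make the idea of pulling IDs out of annotation URIs concrete, here is a sketch that works on a plain list of URI strings rather than on a libSBML object. The two base URLs, the accepted separators and the function name ids_from_uris are assumptions made for illustration; the module's MIRIAM and OLD_MIRIAM constants presumably play a similar role, but their values are not documented here.

```python
# Assumed base URLs of the current and the older identifiers.org scheme.
NEW_BASE = "https://identifiers.org/"
OLD_BASE = "http://identifiers.org/"


def ids_from_uris(uris, prefix):
    """Collect the IDs of all URIs that point to the namespace `prefix`."""
    ids = []
    for uri in uris:
        for base in (NEW_BASE, OLD_BASE):
            if not uri.startswith(base):
                continue
            rest = uri[len(base):]
            # Accept both separators: "prefix:ID" and "prefix/ID".
            for separator in (":", "/"):
                if rest.startswith(prefix + separator):
                    ids.append(rest[len(prefix) + 1:])
                    break
            break
    return ids


# Illustrative URIs for D-glucose in three namespaces.
uris = [
    "https://identifiers.org/kegg.compound:C00031",
    "http://identifiers.org/bigg.metabolite/glc__D",
    "https://identifiers.org/chebi/CHEBI:17634",
]
kegg_ids = ids_from_uris(uris, "kegg.compound")
bigg_ids = ids_from_uris(uris, "bigg.metabolite")
```

With libSBML, the URI strings would first have to be read from the CVTerms of the entity; that step is left out to keep the sketch free of dependencies.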

refinegems.utility.cvterms.print_cvterm(cvterm: libsbml.CVTerm)[source]

Debug function: Prints the URIs contained in the provided CVTerm along with the provided qualifier & biological/model qualifier types

Args:
  • cvterm (CVTerm):

    A libSBML CVTerm

databases module

Variables, functions and more for the development, extension and maintenance of the in-built database.

Note

Some functionalities for handling and dealing with the media are in the medium module.

Hint

Further functions for accessing the database can be found in the io module, e.g. load_a_table_from_database()

refinegems.utility.databases.PATH_TO_DB
refinegems.utility.databases.PATH_TO_DB_FOLDER
refinegems.utility.databases.VERSION_FILE
refinegems.utility.databases.VERSION_URL = 'http://bigg.ucsd.edu/api/v2/database_version'
class refinegems.utility.databases.ValidationCodes(value, names=_not_given, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Validation codes for the database

Args:
  • Enum (Enum):

    Provided as input to get a number mapping for the codes

BIGG = (2,)
BIGG_MEDIA = (4,)
BIGG_MSEED_COMPOUNDS = (6,)
COMPLETE = (0,)
EMPTY = (1,)
MEDIA = (3,)
MEDIA_MSEED_COMPOUNDS = 7
MODELSEED_COMPOUNDS = (5,)
refinegems.utility.databases.create_media_database(db_cursor: Cursor)[source]

Creates the media database with 4 tables (‘medium’, ‘substance’, ‘substance2db’, ‘medium2substance’) from file ‘./data/database/media_db.sql’

Args:
  • db_cursor (sqlite3.Cursor):

    Cursor from open connection to the database (data.db)

refinegems.utility.databases.get_latest_bigg_databases(db_connection: Connection, is_missing: bool = True)[source]

Gets the latest BiGG tables for metabolites & reactions if:

  • No version file is locally available

  • The version in the local version file is NOT the latest

  • No BiGG tables currently exist in the database

Args:
  • db_connection (sqlite3.Connection):

    Open connection to the database (data.db)

  • is_missing (bool, optional):

    True if no BiGG tables are in the database. Defaults to True.

refinegems.utility.databases.get_modelseed_compounds_database(db_connection: Connection)[source]

Retrieves the compounds table from ModelSEED from the respective GitHub repository

Args:
  • db_connection (sqlite3.Connection):

    Open connection to the database (data.db)

refinegems.utility.databases.initialise_database()[source]

Initialises/updates the database (data.db)

After initialisation the database contains:

  • 2 tables with names ‘bigg_metabolites’ & ‘bigg_reactions’

  • 6 tables with names ‘medium’, ‘substance’, ‘medium2substance’, ‘substance2db’, ‘subset’ & ‘subset2substance’

  • 1 table with name ‘modelseed_compounds’

refinegems.utility.databases.is_valid_database(db_cursor: Cursor) int[source]

Verifies if database has:

  • 2 tables with names ‘bigg_metabolites’ & ‘bigg_reactions’

  • 6 tables with names ‘medium’, ‘substance’, ‘substance2db’, ‘medium2substance’, ‘subset’ & ‘subset2substance’

  • 1 table with name ‘modelseed_compounds’

Args:
  • db_cursor (sqlite3.Cursor):

    Cursor from open connection to the database (data.db)

Returns:
int:

Corresponding to one of the ValidationCodes
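A check of this kind can be sketched with the sqlite3 module alone. The table names are taken from the list above; the grouping into three blocks, the function name table_groups_present and the dictionary result are illustrative assumptions, whereas the real function condenses the outcome into a single validation code.

```python
import sqlite3

# Expected tables, grouped as in the description above.
EXPECTED_TABLES = {
    "bigg": {"bigg_metabolites", "bigg_reactions"},
    "media": {
        "medium", "substance", "medium2substance",
        "substance2db", "subset", "subset2substance",
    },
    "modelseed": {"modelseed_compounds"},
}


def table_groups_present(cursor):
    """Report for each group whether all of its tables exist."""
    cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    existing = {row[0] for row in cursor.fetchall()}
    return {
        group: tables <= existing  # subset test: every table is present
        for group, tables in EXPECTED_TABLES.items()
    }


# Demonstration on an in-memory database that lacks the media tables.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
for name in ("bigg_metabolites", "bigg_reactions", "modelseed_compounds"):
    cursor.execute(f"CREATE TABLE {name} (id TEXT)")
status = table_groups_present(cursor)
connection.close()
```

Here status marks the BiGG and ModelSEED groups as complete and the media group as missing, the kind of distinction that the validation codes encode as a single value.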

refinegems.utility.databases.reset_database(database: Path | str = PATH_TO_DB)[source]

Remove tables for certain databases to allow pushing of the database to GitHub (reduce size).

Args:
  • database (Path | str, optional):

    Path to the database. Defaults to PATH_TO_DB, the in-built database.

refinegems.utility.databases.update_bigg_db(latest_version: str, db_connection: Connection) dict[source]

Updates the BiGG tables ‘bigg_metabolites’ & ‘bigg_reactions’ within a database (data.db)

Args:
  • latest_version (str):

    String containing the Path to a file with the latest version of the BiGG database

  • db_connection (sqlite3.Connection):

    Open connection to the database (data.db)

refinegems.utility.databases.update_mnx_namespaces(db: Path | str = PATH_TO_DB, chunksize: int = 1)[source]

Add or update the MetaNetX namespace to/in a database.

Args:
  • db (Union[Path,str],optional):

    Path to a database to add the namespace to. Defaults to the in-built database.

  • chunksize (int, optional):

    Size of the chunk (in kB) to download at once. Defaults to 1.

db_access module

Access information from different databases or compare a model or model entities with them. This module provides variables and functions for accessing databases for better model curation and annotation.

The following databases have functionalities implemented:

  • BiGG

  • ChEBI

  • KEGG

  • ModelSEED

  • NCBI

  • UniProt

refinegems.utility.db_access.ALL_BIGG_COMPARTMENTS_ONE_LETTER = ('c', 'e', 'p', 'm', 'x', 'r', 'v', 'n', 'g', 'u', 'l', 'h', 'f', 's', 'i', 'w', 'y')
refinegems.utility.db_access.ALL_BIGG_COMPARTMENTS_TWO_LETTER = ('im', 'cx', 'um', 'cm', 'mm')
refinegems.utility.db_access.BIGG_METABOLITES_URL = 'http://bigg.ucsd.edu/api/v2/universal/metabolites/'
refinegems.utility.db_access.BIOCYC_TIER1_DATABASES_PREFIXES = ['META', 'ECO', 'ECOLI', 'HUMAN']

Map an ID to information in BiGG or compare model entities to BiGG.

refinegems.utility.db_access._add_annotations_from_bigg_reac_row(row: pandas.Series, reac: cobra.Reaction) None[source]

Given a row of the BiGG reaction database table and a cobra.Reaction object, extend the annotation of the latter with the information of the former.

Args:
  • row (pd.Series):

    The row of the database table.

  • reac (cobra.Reaction):

    The reaction object.

refinegems.utility.db_access._search_ncbi_for_gp(row: pandas.Series, id_type: Literal['refseq', 'ncbiprotein']) pandas.Series[source]

Fetches protein name and locus tag from NCBI

Args:
  • row (pd.Series):

    Row of a pandas DataFrame containing RefSeq/NCBI Protein IDs in columns

  • id_type (Literal[‘refseq’, ‘ncbiprotein’]):

    ID type of IDs in provided row. Can be one of [‘refseq’, ‘ncbiprotein’].

Returns:
pd.Series:

Modified input row

refinegems.utility.db_access.add_annotations_from_BiGG_metabs(metabolite: cobra.Metabolite) None[source]

Check a cobra.Metabolite for bigg.metabolite annotations. If they exist, search for more annotations in the BiGG database and add them to the metabolite.

Args:
  • metabolite (cobra.Metabolite):

    The metabolite object.

refinegems.utility.db_access.add_info_from_ChEBI_BiGG(missing_metabs: pandas.DataFrame, charge=True, formula=True, iupac=True) pandas.DataFrame[source]

Adds information from CHEBI/BiGG to the provided dataframe.

The following information can be added:

  • charge

  • formula

  • iupac (name)

Args:
  • missing_metabs (pd.DataFrame):

    Table containing metabolites & the respective ChEBI & BiGG IDs

Returns:
pd.DataFrame:

Input table extended with the charges & chemical formulas obtained from ChEBI/BiGG.

refinegems.utility.db_access.compare_model_modelseed(model_charges: pandas.DataFrame, modelseed_charges: pandas.DataFrame) pandas.DataFrame[source]

Compares tables with charges / formulae from model & modelseed

Args:
  • model_charges (pd.DataFrame):

    Charges and formulae of model metabolites. Output of get_model_charges().

  • modelseed_charges (pd.DataFrame):

    Charges and formulae of ModelSEED metabolites. Output of get_modelseed_charges().

Returns:
pd.DataFrame:

Table containing info whether charges / formulae match

refinegems.utility.db_access.compare_to_modelseed(model: cobra.Model) tuple[pandas.DataFrame, pandas.DataFrame][source]

Executes all steps to compare model metabolites to ModelSEED metabolites

Args:
  • model (cobraModel):

    Model loaded with COBRApy

Returns:
tuple:

Tables with charge (1) & formula (2) mismatches

  1. pd.DataFrame: Table with charge mismatches

  2. pd.DataFrame: Table with formula mismatches

refinegems.utility.db_access.get_BiGG_metabs_annot_via_dbid(metabolite: cobra.Metabolite, id: str, dbcol: str, compartment: str = 'c') None[source]

Search for a BiGG ID and add it to a metabolite annotation. The search is based on a column name of the BiGG metabolite table and an ID to search for. Additionally, using the given compartment name, the found IDs are filtered for matching compartments.

Args:
  • metabolite (cobra.Metabolite):

    The metabolite. Needs to be a COBRApy Metabolite object.

  • id (str):

    The ID to search for in the database.

  • dbcol (str):

    Name of the column of the database to check the ID against.

  • compartment (str, optional):

    The compartment name. Needs to be a valid BiGG compartment ID. Defaults to ‘c’.
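The compartment filtering mentioned above can be illustrated with the compartment tuples listed at the top of this module. The sketch assumes that a compartment-specific BiGG ID ends in an underscore followed by the compartment code (for example glc__D_c); the helper names split_compartment and filter_by_compartment are hypothetical.

```python
# Compartment codes copied from the module constants listed above.
ONE_LETTER = ("c", "e", "p", "m", "x", "r", "v", "n", "g",
              "u", "l", "h", "f", "s", "i", "w", "y")
TWO_LETTER = ("im", "cx", "um", "cm", "mm")


def split_compartment(bigg_id):
    """Split 'glc__D_c' into ('glc__D', 'c'); return (id, None) otherwise."""
    base, _, suffix = bigg_id.rpartition("_")
    if base and suffix in ONE_LETTER + TWO_LETTER:
        return base, suffix
    return bigg_id, None


def filter_by_compartment(bigg_ids, compartment="c"):
    """Keep only the IDs whose suffix matches the requested compartment."""
    return [i for i in bigg_ids if split_compartment(i)[1] == compartment]


candidates = ["glc__D_c", "glc__D_e", "glc__D_p"]
matching = filter_by_compartment(candidates)
```

An ID without a recognised suffix, such as the universal ID glc__D, is returned unchanged with None as its compartment.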

refinegems.utility.db_access.get_charge_mismatch(df_comp: pandas.DataFrame) pandas.DataFrame[source]

Extracts metabolites with charge mismatch of model & modelseed

Args:
  • df_comp (pd.DataFrame):

    Charge and formula mismatches. Output from compare_model_modelseed().

Returns:
pd.DataFrame:

Table containing metabolites with charge mismatch

refinegems.utility.db_access.get_compared_formulae(formula_mismatch: pandas.DataFrame) pandas.DataFrame[source]

Compare formula by atom pattern

Args:
  • formula_mismatch (pd.DataFrame):

    Table with column containing atom comparison. Output from get_formula_mismatch().

Returns:
pd.DataFrame:

Table containing metabolites with formula mismatch
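One plausible reading of comparing formulae "by atom pattern" is to break each formula into element counts and report the elements that differ. The sketch below follows that reading; it is not taken from the implementation, handles only flat formulae without brackets or charges, and uses the hypothetical helper names atom_counts and differing_atoms.

```python
import re
from collections import Counter

# An element symbol (capital letter, optional lowercase) and an optional count.
ATOM_PATTERN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Count the atoms per element in a flat formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in ATOM_PATTERN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def differing_atoms(formula_a, formula_b):
    """Map each element with unequal counts to its (count_a, count_b) pair."""
    a, b = atom_counts(formula_a), atom_counts(formula_b)
    return {
        element: (a[element], b[element])
        for element in sorted(set(a) | set(b))
        if a[element] != b[element]
    }


# The two formulae differ by a single hydrogen atom.
difference = differing_atoms("C6H12O6", "C6H11O6")
```

Comparing counts rather than strings means that formulae written in a different element order are still treated as equal, and a pure protonation difference shows up as a change in H only.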

refinegems.utility.db_access.get_ec_from_ncbi(mail: str, ncbiprot: str) str | None[source]

Based on a NCBI protein accession number, try and fetch the EC number from NCBI.

Args:
  • mail (str):

    User’s mail address for the NCBI Entrez tool.

  • ncbiprot (str):

    The NCBI protein accession number.

Returns:
  1. Case: fetching successful
    str:

    The EC number associated with the protein ID based on NCBI.

  2. Case: fetching unsuccessful
    None:

    Nothing to return

refinegems.utility.db_access.get_ec_via_swissprot(fasta: str, db: str, missing_genes: pandas.DataFrame, swissprot_mapping_file: str, outdir: str = None, sens: Literal['sensitive', 'more-sensitive', 'very-sensitive', 'ultra-sensitive'] = 'more-sensitive', cov: float = 95.0, t: int = 2, pid: float = 90.0) pandas.DataFrame[source]

Based on a protein FASTA file and a table of missing genes, map the genes to EC numbers using a SwissProt DIAMOND database and a SwissProt mapping file (see download_url() on how to download the needed files).

Args:
  • fasta (str):

    Path to the FASTA protein file.

  • db (str):

    Path to the DIAMOND database (SwissProt).

  • missing_genes (pd.DataFrame):

    The table of missing genes.

  • swissprot_mapping_file (str):

    Path to the SwissProt mapping file.

  • outdir (str, optional):

    Path to a directory to write the output to. Defaults to None.

  • sens (Literal[‘sensitive’, ‘more-sensitive’, ‘very-sensitive’,’ultra-sensitive’], optional):

    Sensitivity mode of DIAMOND blastp. Defaults to ‘more-sensitive’.

  • cov (float, optional):

    Coverage threshold for DIAMOND blastp. Defaults to 95.0.

  • t (int, optional):

    Number of threads to use for DIAMOND blastp. Defaults to 2.

  • pid (float, optional):

    Percentage identity value to use as a cutoff for the results of the DIAMOND blastp run. Defaults to 90.0.

Returns:
pd.DataFrame:

The missing genes table extended by the mapping to an EC number, if successful.

refinegems.utility.db_access.get_formula_mismatch(df_comp: pandas.DataFrame) pandas.DataFrame[source]

Extracts metabolites with formula mismatch of model & modelseed

Args:
  • df_comp (pd.DataFrame):

    Charge and formula mismatches. Output from compare_model_modelseed().

Returns:
pd.DataFrame:

Table containing metabolites with formula mismatch

refinegems.utility.db_access.get_kegg_genes(organismid: str) pandas.DataFrame[source]

Extracts list of genes from KEGG given an organism

Args:
  • organismid (str):

    KEGG ID of organism which the model is based on

Returns:
pd.DataFrame:

Table of all genes denoted in KEGG for the organism

refinegems.utility.db_access.get_model_charges(model: cobra.Model) pandas.DataFrame[source]

Extracts all metabolites from model

Args:
  • model (cobraModel):

    Model loaded with COBRApy

Returns:
pd.DataFrame:

Table containing charges and formulae of model metabolites

refinegems.utility.db_access.get_modelseed_charges(modelseed_compounds: pandas.DataFrame) pandas.DataFrame[source]

Extract table with BiGG, charges and formulae

Args:
  • modelseed_compounds (pd.DataFrame):

    Table containing ModelSEED compounds data.

Returns:
pd.DataFrame:

Table containing charges and formulae of ModelSEED metabolites

refinegems.utility.db_access.get_modelseed_compounds() pandas.DataFrame[source]

Extracts compounds from ModelSEED which have BiGG Ids

Returns:
pd.DataFrame:

Table containing ModelSEED data

refinegems.utility.db_access.kegg_reaction_parser(rn_id: str) dict[source]

Get the entry of a KEGG reaction ID and parse the information into a dictionary.

Args:
  • rn_id (str):

    A reaction ID existing in KEGG.

Returns:
dict:

The KEGG entry information as a dictionary.

refinegems.utility.db_access.map_dmnd_res_to_sp_ec_brenda(dmnd_results: pandas.DataFrame, swissprot_mapping_path: str) pandas.DataFrame[source]

Map the results of a DIAMOND BLASTp run (filtered, see filter_DIAMOND_blastp_results()) to EC numbers and BRENDA IDs using a SwissProt mapping file.

Args:
  • dmnd_results (pd.DataFrame):

    The results of the DIAMOND run.

  • swissprot_mapping_path (str):

    The path to the SwissProt mapping file (IDs against BRENDA and EC, for information on how to get them, refer to download_url())

Returns:
pd.DataFrame:

The resulting mapping (no duplicates).

refinegems.utility.db_access.map_to_homologs(fasta: str, db: str, missing_genes: pandas.DataFrame, mapping_file: str, outdir: str = None, sens: Literal['sensitive', 'more-sensitive', 'very-sensitive', 'ultra-sensitive'] = 'more-sensitive', cov: float = 95.0, t: int = 2, pid: float = 90.0, email=None) pandas.DataFrame[source]
refinegems.utility.db_access.parse_KEGG_ec(ec: str) dict[source]

Based on an EC number, fetch the corresponding KEGG entry and parse it into a dictionary containing the following information (if available):

  • ec-code

  • id (kegg.reference)

  • equation

  • reference

  • pathway

Args:
  • ec (str):

    The EC number in the format ‘x.x.x.x’

Returns:
dict:

The collected information about the KEGG entry.

refinegems.utility.db_access.parse_KEGG_gene(locus_tag: str) dict[source]

Based on a locus tag, fetch the corresponding KEGG entry and parse it into a dictionary containing the following information (if available):

  • ec-code

  • orthology

  • references

Args:
  • locus_tag (str):

    The locus in the format <organism_id>:<locus_tag>

Returns:
dict:

The collected information.

entities module

Collection of functions to access, handle and manipulate different entities of COBRApy and libsbml models.

refinegems.utility.entities.REF_COL_GF_GENE_MAP = {'GeneID': 'NCBIGENE', 'UniProt': 'UNIPROT'}
refinegems.utility.entities.are_compartment_names_valid(model: cobra.Model) bool[source]

Check if compartment names of model are considered valid based on VALID_COMPARTMENTS.

Args:
  • model (cobra.Model):

    The model, loaded with COBRApy.

Returns:
bool:

True if valid, else False.

refinegems.utility.entities.build_metabolite_bigg(id: str, model: cobra.Model, namespace: Literal['BiGG'] = 'BiGG', idprefix: str = 'refineGEMs') cobra.Metabolite | None[source]

Build a cobra.Metabolite object from a BiGG ID. This function will NOT directly add the metabolite to the model if the construction is successful.

Args:
  • id (str):

    A BiGG ID of a metabolite.

  • model (cobra.Model):

    The model the metabolite will be built for.

  • namespace (str, optional):

    Name to use for the model ID. If namespace cannot be matched, will use a random ID. Defaults to ‘BiGG’.

  • compartment (str, optional):

    Compartment of the metabolite. Defaults to ‘c’.

  • idprefix (str, optional):

    Prefix for the random ID. Defaults to ‘refineGEMs’.

Returns:
  1. Case construction successful or match found in model:
    cobra.Metabolite:

    The built metabolite object.

  2. Case construction failed:
    None:

    Nothing to return.

refinegems.utility.entities.build_metabolite_kegg(kegg_id: str, model: cobra.Model, namespace: Literal['BiGG'] = 'BiGG', compartment: str = 'c', idprefix='refineGEMs') cobra.Metabolite | None[source]

Build a cobra.Metabolite object from a KEGG ID. This function will NOT directly add the metabolite to the model if the construction is successful.

Args:
  • kegg_id (str):

    A KEGG ID of a metabolite.

  • model (cobra.Model):

    The model the metabolite will be built for.

  • namespace (str, optional):

    Name to use for the model ID. If namespace cannot be matched, will use a random ID. Defaults to ‘BiGG’.

  • compartment (str, optional):

    Compartment of the metabolite. Defaults to ‘c’.

  • idprefix (str, optional):

    Prefix for the random ID. Defaults to ‘refineGEMs’.

Returns:
  1. Case construction successful or match found in model:
    cobra.Metabolite:

    The built metabolite object.

  2. Case construction failed:
    None:

    Nothing to return.

refinegems.utility.entities.build_metabolite_mnx(id: str, model: cobra.Model, namespace: str = 'BiGG', compartment: str = 'c', idprefix: str = 'refineGEMs') cobra.Metabolite | None[source]

Build a cobra.Metabolite object from a MetaNetX ID. This function will NOT directly add the metabolite to the model if the construction is successful.

Args:
  • id (str):

    A MetaNetX ID of a metabolite.

  • model (cobra.Model):

    The model the metabolite will be built for.

  • namespace (str, optional):

    Name to use for the model ID. If namespace cannot be matched, will use a random ID. Defaults to ‘BiGG’.

  • compartment (str, optional):

    Compartment of the metabolite. Defaults to ‘c’.

  • idprefix (str, optional):

    Prefix for the random ID. Defaults to ‘refineGEMs’.

Returns:
  1. Case construction successful or match found in model:
    cobra.Metabolite:

    The metabolite object.

  2. Case construction failed:
    None:

    Nothing to return.

refinegems.utility.entities.build_metabolite_xxx(id: str, model: cobra.Model, namespace: str, compartment: str, idprefix: str) cobra.Metabolite[source]

Template function for building a cobra.Metabolite.

Note

This is a template function for developers. It cannot be executed.

Args:
  • id (str):

    _description_

  • model (cobra.Model):

    _description_

  • namespace (str):

    _description_

  • compartment (str):

    _description_

  • idprefix (str):

    _description_

Returns:
cobra.Metabolite:

_description_

refinegems.utility.entities.build_reaction()[source]

Note

Will be coming in a future release.

refinegems.utility.entities.build_reaction_bigg(model: cobra.Model, id: str, reac_str: str = None, references: dict = {}, idprefix: str = 'refineGEMs', namespace: Literal['BiGG'] = 'BiGG') cobra.Reaction | None | list[source]

Construct a new reaction for a model from a BiGG reaction ID. This function will NOT add the reaction directly to the model, if the construction process is successful.

Args:
  • model (cobra.Model):

    The model loaded with COBRApy.

  • id (str):

    A BiGG reaction ID.

  • reac_str (str, optional):

    The reaction equation string from the database. Currently, this param is not doing anything in this function. Defaults to None.

  • references (dict, optional):

    Additional annotations to add to the reaction (idtype:[value]). Defaults to {}.

  • idprefix (str, optional):

    Prefix for the pseudo-identifier. Defaults to ‘refineGEMs’.

  • namespace (Literal[‘BiGG’], optional):

    Namespace to use for the reaction ID. If namespace cannot be matched, uses the pseudo-ID. Defaults to ‘BiGG’.

Returns:
  1. Case successful construction:
    cobra.Reaction:

    The newly built reaction object.

  2. Case construction not possible:
    None:

    Nothing to return.

  3. Case reaction found in model:
    list:

    List of matching reaction IDs (in model).

refinegems.utility.entities.build_reaction_kegg(model: cobra.Model, id: str = None, reac_str: str = None, references: dict = {}, idprefix: str = 'refineGEMs', namespace: Literal['BiGG'] = 'BiGG') cobra.Reaction | None | list[source]

Construct a new reaction for a model from either a KEGG reaction ID or a KEGG equation string. This function will NOT add the reaction directly to the model, if the construction process is successful.

Args:
  • model (cobra.Model):

    The model loaded with COBRApy.

  • id (str,optional):

    A KEGG reaction ID.

  • reac_str (str, optional):

    The reaction equation string from the database. Defaults to None.

  • references (dict, optional):

    Additional annotations to add to the reaction (idtype:[value]). Defaults to {}.

  • idprefix (str, optional):

    Prefix for the pseudo-identifier. Defaults to ‘refineGEMs’.

  • namespace (Literal[‘BiGG’], optional):

    Namespace to use for the reaction ID. If namespace cannot be matched, uses the pseudo-ID. Defaults to ‘BiGG’.

Returns:
  1. Case successful construction:
    cobra.Reaction:

    The newly built reaction object.

  2. Case construction not possible:
    None:

    Nothing to return.

  3. Case reaction found in model:
    list:

    List of matching reaction IDs (in model).

refinegems.utility.entities.build_reaction_mnx(model: cobra.Model, id: str, reac_str: str = None, references: dict = {}, idprefix: str = 'refineGEMs', namespace: Literal['BiGG'] = 'BiGG') cobra.Reaction | None | list[source]

Construct a new reaction for a model from a MetaNetX reaction ID. This function will NOT add the reaction directly to the model, if the construction process is successful.

Args:
  • model (cobra.Model):

    The model loaded with COBRApy.

  • id (str):

    A MetaNetX reaction ID.

  • reac_str (str, optional):

    The reaction equation string from the database. Defaults to None.

  • references (dict, optional):

    Additional annotations to add to the reaction (idtype:[value]). Defaults to {}.

  • idprefix (str, optional):

    Prefix for the pseudo-identifier. Defaults to ‘refineGEMs’.

  • namespace (Literal[‘BiGG’], optional):

    Namespace to use for the reaction ID. If namespace cannot be matched, uses the pseudo-ID. Defaults to ‘BiGG’.

Returns:
  1. Case successful construction:
    cobra.Reaction:

    The newly built reaction object.

  2. Case construction not possible:
    None:

    Nothing to return.

  3. Case reaction found in model:
    list:

    List of matching reaction IDs (in model).

refinegems.utility.entities.build_reaction_xxx()[source]

Extend the build functions so that all of them can take either the ID or an equation as input for rebuilding the reaction (would also be beneficial for semi-manual curation).

Intended signature: (model: cobra.Model, id: str = None, reac_str: str = None, references: dict = {}, idprefix: str = ‘refineGEMs’, namespace: Literal[‘BiGG’] = ‘BiGG’) -> Union[cobra.Reaction, None, list]

refinegems.utility.entities.compare_gene_lists(gps_in_model: pandas.DataFrame, db_genes: pandas.DataFrame, kegg: bool = True) pandas.DataFrame[source]

Compares the provided tables according to column 0/’Locus_tag’

Args:
  • gps_in_model (pd.DataFrame):

    Table containing the KEGG Gene IDs/Locus tags in the model

  • db_genes (pd.DataFrame):

    Table containing the KEGG Gene IDs for the organism from KEGG/ locus tags (Accession-2) from BioCyc

  • kegg (bool):

    True if KEGG Genes should be extracted, otherwise False

Returns:
pd.DataFrame:

Table containing all missing genes

refinegems.utility.entities.create_fba_units(model: libsbml.Model) list[libsbml.UnitDefinition][source]

Creates all fba units required for a constraint-based model

Args:
  • model (libModel):

    Model loaded with libSBML

Returns:
list:

List of libSBML UnitDefinitions

refinegems.utility.entities.create_gp(model: libsbml.Model, protein_id: str, model_id: str = None, name: str = None, locus_tag: str = None, reference: dict[str, tuple[list | str, bool]] = dict(), sanity_check: bool = False) None[source]

Creates GeneProduct in the given libSBML model.

Args:
  • model (libModel):

    The model object, loaded with libSBML.

  • protein_id (str):

    (NCBI) Protein ID of the gene.

  • model_id (str, optional):

    If given, uses this string as the ID of the gene in the model. ID should be identical to ID that CarveMe adds from the NCBI FASTA input file. Defaults to None.

  • name (str, optional):

    Name of the GeneProduct. Defaults to None.

  • locus_tag (str, optional):

    Genome-specific locus tag. Will be used as label in the model. Defaults to None.

  • reference (dict, optional):

    Dictionary containing references for the gene product. The key is the database name, the value is a tuple with the first element being the ID(s) and the second element being a boolean indicating if the strain is a lab strain or not. Defaults to an empty dictionary.

  • sanity_check (bool, optional):

    Check, whether locus tag (label) or model ID (ID) already exist in model. Note, that setting this to True increases the runtime. Defaults to False.

refinegems.utility.entities.create_gpr(reaction: libsbml.Reaction, gene: str | list[str]) None[source]

For a given libSBML Reaction and a gene ID or a list of gene IDs, create a gene-protein-reaction (GPR) rule inside the reaction.

Currently only supports ‘OR’ causality.

Args:
  • reaction (libsbml.Reaction):

    The reaction object to add the GPR to.

  • gene (str | list[str]):

    Either a gene ID or a list of gene IDs, that will be added to the GPR (OR causality).

refinegems.utility.entities.create_random_id(model: cobra.Model, entity_type: Literal['reac', 'meta'] = 'reac', prefix: str = '') str[source]

Generate a unique, random ID for a model entity.

Args:
  • model (cobra.Model):

    A model loaded with COBRApy.

  • entity_type (Literal[‘reac’,’meta’], optional):

    Type of model entity. Can be ‘reac’ for Reaction or ‘meta’ for Metabolite. Defaults to ‘reac’.

  • prefix (str, optional):

    Prefix to set for the randomised part. Useful to identify the random IDs later on. Defaults to ‘’.

Raises:
  • ValueError: Unknown entity_type

Returns:
str:

The newly generated, unique ID.
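The idea of a prefixed random ID that must not clash with existing entities can be sketched as follows. This is a minimal illustration, not the refineGEMs implementation: the helper name `sketch_random_id`, the `length` parameter, the alphabet and the plain set of existing IDs (instead of a cobra.Model) are all assumptions made for the example.

```python
import secrets
import string


def sketch_random_id(existing_ids: set, prefix: str = "", length: int = 8) -> str:
    """Return prefix + random characters, retrying until the ID is unused.

    Hypothetical helper: the real function inspects a cobra.Model instead
    of a plain set of IDs.
    """
    alphabet = string.ascii_uppercase + string.digits
    while True:
        candidate = prefix + "".join(secrets.choice(alphabet) for _ in range(length))
        if candidate not in existing_ids:
            return candidate


# Toy set of IDs standing in for the reactions already in a model.
existing = {"PGI", "PFK", "refineGEMsAAAA0000"}
new_id = sketch_random_id(existing, prefix="refineGEMs")
```

The prefix makes the generated IDs easy to find again later, which is the use case named for the `prefix` argument.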

refinegems.utility.entities.create_reaction(model: libsbml.Model, reaction_id: str, name: str, reactants: dict[str, int], products: dict[str, int], fluxes: dict[str, str], reversible: bool = None, fast: bool = None, compartment: str = None, sbo: str = None, genes: str | list[str] = None) tuple[libsbml.Reaction, libsbml.Model][source]

Creates new reaction in the given model

Args:
  • model (libModel):

    Model loaded with libSBML

  • reaction_id (str):

    BiGG ID of the reaction to create

  • name (str):

    Human readable name of the reaction

  • reactants (dict):

    Metabolites as keys and their stoichiometry as values

  • products (dict):

    Metabolites as keys and their stoichiometry as values

  • fluxes (dict):

    Dictionary with lower_bound and upper_bound as keys

  • reversible (bool):

    True/False for the reaction

  • fast (bool):

    True/False for the reaction

  • compartment (str):

    BiGG compartment ID of the reaction (if available)

  • sbo (str):

    SBO term of the reaction

  • genes (str|list):

    List of genes belonging to reaction

Returns:
tuple:

libSBML reaction (1) & libSBML model (2)

  1. Reaction: Created reaction

  2. libModel: Model containing the created reaction

refinegems.utility.entities.create_species(model: libsbml.Model, metabolite_id: str, name: str, compartment_id: str, charge: int, chem_formula: str) tuple[libsbml.Species, libsbml.Model][source]

Creates Species/Metabolite in the given model

Args:
  • model (libModel):

    Model loaded with libSBML

  • metabolite_id (str):

    Metabolite ID within the model (if the model comes from CarveMe, preferably a BiGG ID)

  • name (str):

    Name of the metabolite

  • compartment_id (str):

    ID of the compartment where metabolite resides

  • charge (int):

    Charge for the metabolite

  • chem_formula (str):

    Chemical formula for the metabolite

Returns:
tuple:

libSBML Species (1) & libSBML model (2)

  1. Species: Created species/metabolite

  2. libModel: Model containing the created metabolite

refinegems.utility.entities.create_unit(model_specs: tuple[int], meta_id: str, kind: str, e: int, m: int, s: int, uri_is: str = '', uri_idf: str = '') libsbml.Unit[source]

Creates unit for SBML model according to arguments

Args:
  • model_specs (tuple):

    Level & Version of SBML model

  • meta_id (str):

    Meta ID for unit (necessary for URI)

  • kind (str):

    Unit kind constant (see libSBML for available constants)

  • e (int):

    Exponent of unit

  • m (int):

    Multiplier of unit

  • s (int):

    Scale of unit

  • uri_is (str):

    URI supporting the specified unit

  • uri_idf (str):

    URI supporting the derived from unit

Returns:
Unit:

libSBML unit object

refinegems.utility.entities.create_unit_definition(model_specs: tuple[int], identifier: str, name: str, units: list[libsbml.Unit]) libsbml.UnitDefinition[source]

Creates unit definition for SBML model according to arguments

Args:
  • model_specs (tuple):

    Level & Version of SBML model

  • identifier (str):

    Identifier for the defined unit

  • name (str):

    Full name of the defined unit

  • units (list):

    All units the defined unit consists of

Returns:
UnitDefinition:

libSBML unit definition object

refinegems.utility.entities.get_gpid_mapping(model: libsbml.Model, gff_paths: str | list[str] = None, email: str = None, contains_locus_tags: bool = False, outpath: str | Path = None) pandas.DataFrame[source]

Generate a mapping from model IDs to valid database IDs via model content, GFF files (optional) and NCBI requests (optional).

Args:
  • model (libModel):

    Model loaded with libSBML

  • gff_paths (str|list[str]):

    Path(s) to GFF file(s). Allowed GFF formats are: RefSeq, NCBI and Prokka. This is only used when mapping_tbl_file == None. Defaults to None.

  • email (str):

    E-mail for NCBI queries. This is only used when mapping_tbl_file == None. Defaults to None.

  • contains_locus_tags (bool, optional):

    Specifies if provided model has locus tags within the label tag if set to True. This is only used when mapping_tbl_file == None. Defaults to False.

  • outpath (str|Path, optional):

    Output path for location where the generated mapping table should be written to. This is only used when mapping_tbl_file == None. Defaults to None.

Returns:
pd.DataFrame:

Mapping from model IDs to valid database IDs

refinegems.utility.entities.get_model_reacs_or_metabs(model_libsbml: libsbml.Model, metabolites: bool = False, col_name: str = 'bigg_id') pandas.DataFrame[source]

Extracts table of reactions/metabolites with BiGG IDs from model

Args:
  • model_libsbml (libModel):

    Model loaded with libSBML

  • metabolites (bool):

    Set to True if metabolites from model should be extracted

  • col_name (str):

    Name to be used for column in table. Defaults to ‘bigg_id’.

Returns:
pd.DataFrame:

Table with model identifiers for either metabolites or reactions

refinegems.utility.entities.get_reaction_annotation_dict(model: cobra.Model, db: Literal['KEGG', 'BiGG']) dict[source]

Create a dictionary of a model’s reaction IDs and a chosen database ID as saved in the annotations of the model.

The database ID can be chosen based on the strings for the namespace options in other functions.

Args:
  • model (cobra.Model):

    A model loaded with COBRApy.

  • db (Literal[‘KEGG’,’BiGG’]):

    The string denoting the database to map to.

Raises:
  • ValueError: Unknown database string for parameter db

Returns:
dict:

The mapping of the reaction IDs to the database IDs found in the annotations

refinegems.utility.entities.get_reversible(fluxes: dict[str, str]) bool[source]

Infer if reaction is reversible from flux bounds

Args:
  • fluxes (dict):

    Dictionary containing the keys ‘lower_bound’ & ‘upper_bound’ with values in [‘cobra_default_lb’, ‘cobra_0_bound’, ‘cobra_default_ub’]

Returns:
bool:

True if reversible else False
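One plausible reading of this check is sketched below: a reaction counts as reversible only when both bounds are at their COBRApy defaults, so that flux can run in either direction. The helper name `sketch_get_reversible` and the exact rule are assumptions; only the key names and the three allowed values come from the description above.

```python
def sketch_get_reversible(fluxes: dict) -> bool:
    """Hypothetical rule: reversible only if both bounds are the defaults."""
    return (
        fluxes["lower_bound"] == "cobra_default_lb"
        and fluxes["upper_bound"] == "cobra_default_ub"
    )


both_ways = sketch_get_reversible(
    {"lower_bound": "cobra_default_lb", "upper_bound": "cobra_default_ub"}
)
forward_only = sketch_get_reversible(
    {"lower_bound": "cobra_0_bound", "upper_bound": "cobra_default_ub"}
)
```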

refinegems.utility.entities.isreaction_complete(reac: cobra.Reaction, name_check: bool = False, formula_check: Literal['none', 'existence', 'wildcard', 'strict'] = 'existence', exclude_dna: bool = True, exclude_rna: bool = True) bool[source]

Check, if a reaction object can be considered a complete reaction. The parameters can set the strictness of the checking. Useful for checking, if a reaction can be added to a model or if it might break it. For example, a missing model ID in either reaction or metabolites returns false.

Args:
  • reac (cobra.Reaction):

    The reaction object (COBRApy) to test.

  • name_check (bool, optional):

    Option to force reaction and metabolites to have the name attribute set. Defaults to False.

  • formula_check (Literal[‘none’,’existence’,’wildcard’,’strict’], optional):

    Option to check the formula. ‘none’ disables the check, ‘existence’ tests, if a formula is set, ‘wildcard’ additionally checks for wild cards (returns false if one found) and ‘strict’ also checks for the rest symbol ‘R’. Defaults to ‘existence’.

  • exclude_dna (bool, optional):

    Option to set DNA reactions to invalid. Defaults to True.

  • exclude_rna (bool, optional):

    Option to set RNA reaction to invalid. Defaults to True.

Returns:
bool:

True, if the checks are passed successfully, else False.

refinegems.utility.entities.match_id_to_namespace(model_entity: cobra.Reaction | cobra.Metabolite, namespace: Literal['BiGG']) None[source]

Based on a given namespace, change the ID of a given model entity to the set namespace.

Currently working namespaces:

  • BiGG

Args:
  • model_entity (cobra.Reaction | cobra.Metabolite):

    The model entity. Can be either a cobra.Reaction or cobra.Metabolite object.

  • namespace (Literal[‘BiGG’]):

    The chosen namespace.

Raises:
  • ValueError: Unknown input for namespace

  • TypeError: Unknown type for model_entity

refinegems.utility.entities.parse_reac_str(equation: str, type: Literal['BiGG', 'BioCyc', 'MetaNetX', 'KEGG'] = 'MetaNetX') tuple[dict, dict, list, bool][source]

Parse a reaction string.

Args:
  • equation (str):

    The equation of a reaction as a string (as saved in the database).

  • type (Literal[‘BiGG’,’BioCyc’,’MetaNetX’,’KEGG’], optional):

    The name of the database the equation was taken from. Can be ‘BiGG’,’BioCyc’,’MetaNetX’,’KEGG’. Defaults to ‘MetaNetX’.

Returns:
tuple:

Tuple of (1) dict, (2) dict, (3) list & (4) bool:

  1. Dictionary with the reactant IDs and their stoichiometric factors.

  2. Dictionary with the product IDs and their stoichiometric factors.

  3. List of compartment IDs or None, if they cannot be extracted from the equation.

  4. True, if the reaction is reversible, else False.
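The shape of the returned tuple can be illustrated with a small parser. This is a sketch under stated assumptions, not the refineGEMs parser: the token layout `factor id@compartment`, the arrows `<=>` and `-->`, and the helper name `sketch_parse_equation` are invented for the example and do not reproduce the equation syntax of any of the supported databases.

```python
def sketch_parse_equation(equation: str):
    """Parse 'factor id@comp + ... <=> ...' into a 4-tuple.

    Returns (reactants, products, compartments or None, reversible).
    The equation syntax is hypothetical.
    """
    if "<=>" in equation:
        left, right = equation.split("<=>")
        reversible = True
    elif "-->" in equation:
        left, right = equation.split("-->")
        reversible = False
    else:
        raise ValueError(f"No arrow found in equation: {equation!r}")

    compartments = []

    def parse_side(side: str) -> dict:
        parsed = {}
        for term in side.split(" + "):
            tokens = term.split()
            if len(tokens) == 2:
                # A leading number is the stoichiometric factor.
                factor, species = float(tokens[0]), tokens[1]
            elif len(tokens) == 1:
                # No explicit factor: assume 1.
                factor, species = 1.0, tokens[0]
            else:
                raise ValueError(f"Cannot parse term: {term!r}")
            if "@" in species:
                species, comp = species.split("@", 1)
                if comp not in compartments:
                    compartments.append(comp)
            parsed[species] = factor
        return parsed

    reactants = parse_side(left)
    products = parse_side(right)
    # Mirror the documented behaviour: None if no compartment was found.
    return reactants, products, compartments or None, reversible


result = sketch_parse_equation("2 h2o@c + atp@c --> adp@c + pi@c")
```

For the toy equation the reactants and products are separate dictionaries with positive factors, the compartment list contains the single compartment `c`, and the reversibility flag is False because of the one-directional arrow.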

refinegems.utility.entities.print_UnitDefinitions(unit_defs: libsbml.ListOfUnitDefinitions)[source]

Prints a list of libSBML UnitDefinitions as XMLNodes

Args:
  • unit_defs (ListOfUnitDefinitions):

    List of libSBML UnitDefinition objects

refinegems.utility.entities.reaction_equation_to_dict(eq: str, model: cobra.Model) dict[source]

Parses a reaction equation string to dictionary

Args:
  • eq (str):

    Equation of a reaction

  • model (cobra.Model):

    Model loaded with COBRApy

Returns:
dict:

Metabolite Ids as keys and their coefficients as values (negative = reactants, positive = products)

refinegems.utility.entities.remove_non_essential_genes(model: cobra.Model, genes_to_check: list[cobra.Gene] | None = None, remove_reactions: bool = True, min_growth_threshold: float = MIN_GROWTH_THRESHOLD) tuple[int, int][source]

Remove non-essential genes based on gene knock-out.

Args:
  • model (cobra.Model):

    The model to delete the genes from.

  • genes_to_check (Union[list[cobra.Gene],None], optional):

    List of genes to check. If None, all genes of the model are checked. Defaults to None.

  • remove_reactions (bool, optional):

    If set to True, also remove corresponding reactions. Defaults to True.

  • min_growth_threshold (float, optional):

    Minimal value for growth value. Defaults to MIN_GROWTH_THRESHOLD.

Returns:
tuple[int,int]:
Tuple of (1) int, (2) int:
  1. Number of deleted genes.

  2. Number of essential genes.
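The knock-out loop behind this kind of pruning can be sketched without a metabolic model. In the example below, `growth_of` is a hypothetical callable that stands in for a growth simulation, the gene set is a plain Python set, and the comparison `>=` against the threshold is an assumption; the real function works on a cobra.Model and may differ in all of these points.

```python
MIN_GROWTH = 1e-05  # same value as MIN_GROWTH_THRESHOLD in refinegems.utility.util


def sketch_remove_non_essential(genes: set, growth_of, threshold: float = MIN_GROWTH):
    """Knock out each gene in turn and drop it if growth is still possible.

    Returns (number of deleted genes, number of essential genes).
    """
    deleted, essential = 0, 0
    for gene in sorted(genes):  # sorted() copies, so the set can be modified
        remaining = genes - {gene}
        if growth_of(remaining) >= threshold:
            genes.discard(gene)  # non-essential: remove for good
            deleted += 1
        else:
            essential += 1  # essential: keep the gene
    return deleted, essential


def toy_growth(present: set) -> float:
    """Toy rule: growth needs g1 plus at least one of g2 / g3."""
    return 1.0 if "g1" in present and ({"g2", "g3"} & present) else 0.0


toy_genes = {"g1", "g2", "g3", "g4"}
counts = sketch_remove_non_essential(toy_genes, toy_growth)
```

Because genes are removed one after another, a gene can become essential once its redundant partner is gone: here g2 is deleted first, which makes g3 essential afterwards.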

refinegems.utility.entities.resolve_compartment_names(model: cobra.Model)[source]

Resolves compartment naming problems.

Args:
  • model (cobra.Model):

    A COBRApy model object.

Raises:
  • KeyError: Unknown compartment raises an error to add it to the mapping. Important for developers.
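A minimal sketch of such a renaming step is shown below. It assumes that the compartment names are translated with the COMP_MAPPING constant listed in refinegems.utility.util, which is a guess based on the constant's content; the helper name `sketch_resolve_compartments` and the list-based interface are hypothetical.

```python
# Copied from refinegems.utility.util.COMP_MAPPING as listed in this reference.
COMP_MAPPING = {
    "": "uc", "C_c": "c", "C_e": "e", "C_p": "p",
    "c": "c", "e": "e", "p": "p", "w": "uc",
}


def sketch_resolve_compartments(compartments: list) -> list:
    """Translate compartment names, failing loudly on unknown ones."""
    resolved = []
    for comp in compartments:
        if comp not in COMP_MAPPING:
            # Unknown names are surfaced so that the mapping can be extended.
            raise KeyError(f"Unknown compartment {comp!r}: add it to the mapping.")
        resolved.append(COMP_MAPPING[comp])
    return resolved


resolved = sketch_resolve_compartments(["C_c", "e", "w", ""])
```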

refinegems.utility.entities.validate_reaction_compartment_bigg(comps: list) bool | Literal['exchange'][source]

Retrieves and validates the compartment(s) of a reaction from the BiGG namespace

Args:
  • comps (list):

    List containing compartments from a BiGG reaction

Returns:

  1. Case compartment in VALID_COMPARTMENTS.keys()
    bool|’exchange’:

    Either

    • True if the provided compartments are valid

    • ‘exchange’ if reaction in multiple compartments

  2. Case not a valid compartment:
    bool:

    ‘False’ if one of the found compartments is not in VALID_COMPARTMENTS
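The three possible outcomes can be sketched as follows. The VALID_COMPARTMENTS dictionary is copied from refinegems.utility.util as listed in this reference; the helper name `sketch_validate_compartments` and the order of the checks are assumptions.

```python
# Copied from refinegems.utility.util.VALID_COMPARTMENTS as listed in this reference.
VALID_COMPARTMENTS = {
    "c": "cytosol",
    "e": "extracellular space",
    "p": "periplasm",
    "uc": "unknown compartment",
}


def sketch_validate_compartments(comps: list):
    """Return False, 'exchange' or True, following the cases described above."""
    if not all(comp in VALID_COMPARTMENTS for comp in comps):
        return False  # at least one compartment is not valid
    if len(set(comps)) > 1:
        return "exchange"  # reaction spans several valid compartments
    return True  # a single valid compartment


single = sketch_validate_compartments(["c", "c"])
spanning = sketch_validate_compartments(["c", "e"])
invalid = sketch_validate_compartments(["c", "x"])
```

Since `'exchange'` is a non-empty string and therefore truthy, callers that need to tell the cases apart should compare with `is True` or `== "exchange"` rather than rely on truthiness.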

io module

Provides functions to load and write models, media definitions and the manual annotation table

Depending on the application the model needs to be loaded with COBRApy (e.g. memote) or with libSBML (e.g. activation of groups). Some might even require both (e.g. gap filling). The manual_annotations table has to follow the specific layout given in the data folder in order to work with this module.

refinegems.utility.io.create_missing_genes_protein_fasta(fasta: str, missing_genes: pandas.DataFrame, outdir: str = None) str[source]

Creates a FASTA file containing proteins for missing_genes

Note

Please keep in mind that the input FASTA file has to have Genbank format.

Args:
  • fasta (str):

    Path to the FASTA protein file.

  • missing_genes (pd.DataFrame):

    The table of missing genes.

  • outdir (str, optional):

    Path to a directory to write the output to. Defaults to None.

Returns:
str:

Path to the FASTA protein file for the missing genes.

refinegems.utility.io.load_a_table_from_database(table_name_or_query: str, query: bool = True) pandas.DataFrame[source]

Loads the table for which the name is provided, or a table containing all rows for which the query evaluates to true, from the refineGEMs database (‘data/database/data.db’)

Args:
  • table_name_or_query (str):

    Name of a table contained in the database ‘data.db’/ a SQL query

  • query (bool):

    Specifies if a query or a table name is provided with table_name_or_query

Returns:
pd.DataFrame:

Containing the table for which the name was provided from the database ‘data.db’
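The role of the `query` flag can be sketched with the standard library alone. The example below uses an in-memory SQLite database with a hypothetical table `medium` and returns plain rows instead of a pd.DataFrame; the helper name `sketch_load_table`, the extra connection argument and the table layout are assumptions, not the layout of the actual ‘data.db’.

```python
import sqlite3


def sketch_load_table(con: sqlite3.Connection, table_name_or_query: str, query: bool = True):
    """Run the given SQL query, or select a whole table if query is False."""
    if query:
        sql = table_name_or_query
    else:
        sql = f'SELECT * FROM "{table_name_or_query}"'
    cursor = con.execute(sql)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


# Hypothetical toy database standing in for 'data.db'.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE medium (name TEXT, description TEXT)")
con.executemany(
    "INSERT INTO medium VALUES (?, ?)",
    [("LB", "Lysogeny broth"), ("M9", "Minimal medium")],
)

all_columns, all_rows = sketch_load_table(con, "medium", query=False)
_, m9_rows = sketch_load_table(con, "SELECT * FROM medium WHERE name = 'M9'")
```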

refinegems.utility.io.load_document_libsbml(modelpath: str) libsbml.SBMLDocument[source]

Loads model document using libSBML

Args:
  • modelpath (str):

    Path to GEM

Returns:
SBMLDocument:

Loaded document by libSBML

refinegems.utility.io.load_model(modelpath: str | list[str], package: Literal['cobra', 'libsbml']) cobra.Model | list[cobra.Model] | libsbml.Model | list[libsbml.Model][source]

Load a model.

Args:
  • modelpath (str | list[str]):

    Path to the model or list of paths to models (string format).

  • package (Literal[‘cobra’,’libsbml’]):

    Package to use to load the model.

Returns:
cobra.Model|list[cobra.Model]|libModel|list[libModel]:

The loaded model(s).

refinegems.utility.io.load_subset_from_db(subset_name: str) tuple[str, str, pandas.DataFrame][source]

Load a subset from the database.

Args:
  • subset_name(str):

    Name of the subset to be loaded.

Returns:
tuple of (1) str, (2) str and (3) pd.DataFrame
  1. name of the subset

  2. description of the subset

  3. substance table for the subset

refinegems.utility.io.load_substance_table_from_db(mediumname: str, database: str, type: Literal['testing', 'standard'] = 'standard') pandas.DataFrame[source]

Load a substance table from a database.

Currently available types:

  • ‘testing’: for debugging

  • ‘standard’: The standard format containing all information in long format.

Note: ‘documentation’ is currently subject to change

Args:
  • mediumname (str):

    The name (or identifier) of the medium.

  • database (str):

    Path to the database.

  • type (Literal[‘testing’,’standard’], optional):

    How to load the table. Defaults to ‘standard’.

Raises:
  • ValueError: Unknown type for loading the substance table.

Returns:
pd.DataFrame:

The substance table in the specified type retrieved from the database.

refinegems.utility.io.parse_dict_to_dataframe(str2list: dict) pandas.DataFrame[source]

Parses a dictionary of the form {str: list} and transforms it into a table with a column containing the strings and a column containing the lists

Args:
  • str2list (dict):

    Dictionary mapping strings to lists

Returns:
pd.DataFrame:

Table with column containing the strings and column containing the lists

refinegems.utility.io.parse_gbff_for_cds(file_path: str) pandas.DataFrame[source]

Retrieves a table containing information about the following qualifiers from a Genbank file: [‘protein_id’,’locus_tag’,’db_xref’,’old_locus_tag’,’EC_number’].

Args:
  • file_path (str):

    Path to the Genbank (.gbff) file.

Returns:
pd.DataFrame:

A table containing the information above. Has the following columns= [‘ncbi_accession_version’, ‘locus_tag_ref’,’old_locus_tag’,’GeneID’,’EC number’].

refinegems.utility.io.parse_gff_for_cds(gffpath: str, keep_attributes: dict[str, str] = None) pandas.DataFrame | tuple[pandas.DataFrame, str][source]

Parses a GFF file to obtain a mapping for the corresponding attributes listed in keep_attributes

Args:
  • gffpath (str):

    Path to the GFF file.

  • keep_attributes (dict, optional):

    Dictionary of attributes to be kept and the corresponding column for the table. Defaults to None.

Returns:
  1. Case: return_variety = False
    pd.DataFrame:

    Dataframe containing a mapping for the corresponding attributes listed in keep_attributes

  2. Case: return_variety = True
    tuple:

    tuple of (1) pd.DataFrame & (2) str:

    1. pd.DataFrame: Dataframe containing a mapping for the corresponding attributes listed in keep_attributes

    2. str: Found variety for provided GFF

refinegems.utility.io.search_sbo_label(sbo_number: str) str[source]

Looks up the SBO label corresponding to a given SBO Term number

Args:
  • sbo_number (str):

    Last three digits of SBO-Term as str

Returns:
str:

Denoted label for given SBO Term

refinegems.utility.io.validate_libsbml_model(model: libsbml.Model) int[source]

Debug method: Validates a libSBML model with the libSBML validator

Args:
  • model (libModel):

    A libSBML model

Returns:
int:

Integer specifying whether the validation was successful or not

refinegems.utility.io.write_model_to_file(model: libsbml.Model | cobra.Model, filename: str | Path)[source]

Save a model into a file.

Args:
  • model (libModel|cobra.Model):

    The model to be saved

  • filename (str|Path):

    The filename/path to save the model to.

Raises:
  • ValueError: Unknown file extension for model

  • TypeError: Unknown model type

set_up module

Collection of functions for setting up files and work environments.

refinegems.utility.set_up.PATH_MEDIA_CONFIG
refinegems.utility.set_up.download_config(filename: str = './my_config.yaml', type=Literal['media'])[source]

Load a configuration file from the package and save a copy of it for the user to edit.

Args:
  • filename (str, optional):

    Filename to write the config to/save it under as. Defaults to ‘./my_config.yaml’.

  • type (Literal[‘media’], optional):

    Type of configuration file to load. Can be ‘media’ for the media config file. Defaults to Literal[‘media’].

refinegems.utility.set_up.download_url(download_type: Literal['SwissProt gapfill'], directory: str = None, k: int = 10, t: int = 1)[source]

Download files necessary for certain functionalities of the toolbox from the internet.

Currently available:

  • ‘SwissProt gapfill’: download files needed for the GeneGapFiller

Args:
  • download_type (Literal[‘SwissProt gapfill’]):

    Type of files to download.

  • directory (str, optional):

    Path to a directory to save the downloaded files to. Defaults to None (So the current working directory is used).

  • k (int, optional):

    Chunksize in kB. Defaults to 10.

  • t (int, optional):

    Number of threads to use for some additional setups, e.g. DIAMOND database creation. Defaults to 1.

Raises:
  • ValueError: Unknown database or file

util module

Collection of utility functions.

refinegems.utility.util.COMP_MAPPING = {'': 'uc', 'C_c': 'c', 'C_e': 'e', 'C_p': 'p', 'c': 'c', 'e': 'e', 'p': 'p', 'w': 'uc'}
refinegems.utility.util.DB2REGEX
refinegems.utility.util.MIN_GROWTH_THRESHOLD = 1e-05
refinegems.utility.util.SBO_BIOCHEM_TERMS = ['SBO:0000377', 'SBO:0000399', 'SBO:0000402', 'SBO:0000403', 'SBO:0000660', 'SBO:0000178', 'SBO:0000200', 'SBO:0000214', 'SBO:0000215', 'SBO:0000217', 'SBO:0000218', 'SBO:0000219', 'SBO:0000220', 'SBO:0000222', 'SBO:0000223', 'SBO:0000233', 'SBO:0000376', 'SBO:0000401']
refinegems.utility.util.SBO_TRANSPORT_TERMS = ['SBO:0000658', 'SBO:0000657', 'SBO:0000654', 'SBO:0000659', 'SBO:0000660']
refinegems.utility.util.VALID_COMPARTMENTS = {'c': 'cytosol', 'e': 'extracellular space', 'p': 'periplasm', 'uc': 'unknown compartment'}
refinegems.utility.util.is_stoichiometric_factor(s: str) bool[source]

Check if a string could be used as a stoichiometric factor.

Args:
  • s (str):

    The string to check.
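One way such a check could look is sketched below. It assumes that a usable stoichiometric factor is any string that parses as a positive, finite number; this criterion and the helper name `sketch_is_stoichiometric_factor` are assumptions, since the exact rule is not stated above.

```python
import math


def sketch_is_stoichiometric_factor(s: str) -> bool:
    """Hypothetical rule: the string must parse as a positive, finite number."""
    try:
        value = float(s)
    except ValueError:
        return False
    return math.isfinite(value) and value > 0


checks = {s: sketch_is_stoichiometric_factor(s) for s in ["2", "0.5", "h2o", "", "-1", "inf"]}
```

A check like this helps when splitting an equation term such as `2 h2o` into a factor and a metabolite ID, because it separates the numeric token from the identifier.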

refinegems.utility.util.reannotate_sbo_memote(model: cobra.Model) cobra.Model[source]

Reannotate the SBO annotations (e.g. from SBOannotator) of a model into the SBO scheme accessible by memote.

Args:
  • model (cobra.Model):

    The cobra Model to be reannotated.

Returns:
cobra.Model:

The reannotated model

refinegems.utility.util.sum_biomass_weight(reaction: libsbml.Reaction) float[source]

From MEMOTE: https://github.com/opencobra/memote/blob/81a55a163262a0e06bfcb036d98e8e551edc3873/src/memote/support/biomass.py#L95

Compute the sum of all reaction compounds.

This function expects all metabolites of the biomass reaction to have formula information assigned.

Args:
  • reaction (Reaction):

    The biomass reaction of the model under investigation.

Returns:
float:

The molecular weight of the biomass reaction in units of g/mmol.

refinegems.utility.util.test_biomass_consistency(model: cobra.Model, reaction_id: str) float | str[source]

Modified from MEMOTE: https://github.com/opencobra/memote/blob/81a55a163262a0e06bfcb036d98e8e551edc3873/src/memote/suite/tests/test_biomass.py#L89

Expect biomass components to sum up to 1 g[CDW].

This test only yields sensible results if all biomass precursor metabolites have chemical formulas assigned to them. The molecular weight of the biomass reaction in metabolic models is defined to be equal to 1 g/mmol. Conforming to this is essential in order to be able to reliably calculate growth yields, to cross-compare models, and to obtain valid predictions when simulating microbial consortia. A deviation from 1 - 1E-03 to 1 + 1E-06 is accepted.

Implementation: Multiplies the coefficient of each metabolite of the biomass reaction with its molecular weight calculated from the formula, then divides the overall sum of all the products by 1000.

Args:
  • model(cobraModel):

    The model loaded with COBRApy.

  • reaction_id(str):

    Reaction ID of a BOF.

Returns:
  1. Case: problematic input
    str:

    an error message.

  2. Case: successful testing
    float:

    biomass weight
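The arithmetic described under "Implementation" can be sketched with plain Python. The example keys a toy stoichiometry by chemical formula, uses a four-element mass table and negates the coefficients so that consumed precursors count positively; the helper names, the toy data and the sign handling are assumptions for illustration, not the code used by refineGEMs or MEMOTE.

```python
import re

# Standard atomic masses in g/mol, restricted to the elements used below.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_weight(formula: str) -> float:
    """Molecular weight in g/mol of a simple formula such as 'C6H12O6'."""
    return sum(
        ATOMIC_MASS[element] * int(count or 1)
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    )


def sketch_biomass_weight(stoichiometry: dict) -> float:
    """Sum of -coefficient * molecular weight, divided by 1000.

    Precursors are consumed (negative coefficients), hence the negation.
    """
    return sum(-coef * formula_weight(f) for f, coef in stoichiometry.items()) / 1000.0


def within_tolerance(weight: float) -> bool:
    """Accepted deviation as stated above: from 1 - 1E-03 to 1 + 1E-06."""
    return 1 - 1e-03 < weight < 1 + 1e-06


# Toy biomass reaction consuming 5 mmol of glucose per gram of dry weight.
toy_weight = sketch_biomass_weight({"C6H12O6": -5.0})
balanced_weight = sketch_biomass_weight({"C6H12O6": -1000.0 / formula_weight("C6H12O6")})
```

The first toy reaction accounts for only about 0.9 g and would therefore fail the check, while the second is scaled so that its precursors sum to 1 g.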

refinegems.utility.util.test_biomass_presence(model: cobra.Model) list[str] | None[source]

Modified from MEMOTE: https://github.com/opencobra/memote/blob/81a55a163262a0e06bfcb036d98e8e551edc3873/src/memote/suite/tests/test_biomass.py#LL42C3-L42C3

Expect the model to contain at least one biomass reaction.

The biomass composition aka biomass formulation aka biomass reaction is a common pseudo-reaction accounting for biomass synthesis in constraints-based modelling. It describes the stoichiometry of intracellular compounds that are required for cell growth. While this reaction may not be relevant to modeling the metabolism of higher organisms, it is essential for single-cell modeling.

Implementation: Identifies possible biomass reactions using two principal steps:

  1. Return reactions that include the SBO annotation “SBO:0000629” for biomass.

  2. If no reactions can be identified this way:

    1. Look for the buzzwords “biomass”, “growth” and “bof” in reaction IDs.

    2. Look for metabolite IDs or names that contain the buzzword “biomass” and obtain the set of reactions they are involved in.

    3. Remove boundary reactions from this set.

    4. Return the union of reactions that match the buzzwords and of the reactions that metabolites are involved in that match the buzzword.

This test checks if at least one biomass reaction is present.

If no reaction can be identified return None.
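The two-step search described above can be sketched on plain dictionaries. The record layout (keys `id`, `sbo`, `metabolites`, `boundary`), the helper name `sketch_find_biomass_reactions` and the toy data are assumptions for illustration; the actual function operates on a cobra.Model.

```python
BUZZWORDS = ("biomass", "growth", "bof")


def sketch_find_biomass_reactions(reactions: list, metabolite_names: dict):
    """Return the IDs of candidate biomass reactions, or None if there are none."""
    # Step 1: reactions annotated with the biomass SBO term.
    by_sbo = [r["id"] for r in reactions if r.get("sbo") == "SBO:0000629"]
    if by_sbo:
        return by_sbo

    # Step 2.1: buzzwords in reaction IDs.
    by_id = {r["id"] for r in reactions if any(w in r["id"].lower() for w in BUZZWORDS)}

    # Step 2.2: reactions that involve a metabolite with "biomass" in ID or name.
    biomass_mets = {
        met_id
        for met_id, name in metabolite_names.items()
        if "biomass" in met_id.lower() or "biomass" in name.lower()
    }
    by_met = {r["id"] for r in reactions if biomass_mets & set(r["metabolites"])}

    # Step 2.3: remove boundary reactions from the metabolite-based set.
    by_met -= {r["id"] for r in reactions if r.get("boundary", False)}

    # Step 2.4: union of both sets; None if nothing was found.
    return sorted(by_id | by_met) or None


toy_reactions = [
    {"id": "PGI", "sbo": "SBO:0000176", "metabolites": ["g6p_c", "f6p_c"], "boundary": False},
    {"id": "Growth", "sbo": None, "metabolites": ["atp_c", "biomass_c"], "boundary": False},
    {"id": "DM_cell_c", "sbo": None, "metabolites": ["biomass_c"], "boundary": True},
]
toy_metabolites = {"g6p_c": "glucose 6-phosphate", "f6p_c": "fructose 6-phosphate",
                   "atp_c": "ATP", "biomass_c": "Biomass"}

candidates = sketch_find_biomass_reactions(toy_reactions, toy_metabolites)
```

In the toy data no reaction carries the biomass SBO term, so the fallback applies: `Growth` matches by ID and by metabolite, while the boundary reaction `DM_cell_c` is filtered out.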