refinegems.utility

connection module

Provides functions / connections to other tools for easier access and usage.

refinegems.utility.connections.adjust_BOF(genome: str, model: cobra.Model, dna_weight_fraction: float, weight_frac: float) → str | None[source]

Adjust the model’s BOF using BOFdat. Currently implemented are step 1 DNA coefficients and step 2.

Connection to the BOFdat tool as described in: Lachance JC, Lloyd CJ, Monk JM, Yang L, Sastry AV, et al. (2019) BOFdat: Generating biomass objective functions for genome-scale metabolic models from experimental data. PLOS Computational Biology 15(4): e1006971. https://doi.org/10.1371/journal.pcbi.1006971

Args:

genome (str):
Path to the genome (e.g. .fna) FASTA file.
model (cobra.Model):
The genome-scale metabolic model (from the string above), loaded with COBRApy.
dna_weight_fraction (float):
DNA weight fraction for BOF step 1.
weight_frac (float):
Weight fraction for the second step of BOFdat (coenzymes and ions)

Returns:

str | None:: The updated BOF reaction as a reaction string. Returns None if BOFdat is not installed.

refinegems.utility.connections.filter_DIAMOND_blastp_results(blasttsv: str, pid_theshold: float = 90.0) → pandas.DataFrame[source]

Filter the results of a DIAMOND BLASTp run (see run_DIAMOND_blastp()) by percentage identity value (PID) and extract the matching pairs of query and subject IDs.

Args:

blasttsv (str):
Path to the DIAMOND BLASTp result file.
pid_theshold (float, optional):
Threshold value for the PID. Given in percent. Defaults to 90.0.

Raises:

ValueError: PID threshold has to be between 0.0 and 100.0

Returns:

pd.DataFrame:: A table with the columns query_ID and subject_ID containing hits from the BLAST run with a PID higher than the given threshold value.
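
The filtering step described above can be sketched with the standard library alone. This is a minimal illustration, not the refineGEMs implementation: it assumes the result file uses the default 12-column BLAST tabular layout (format 6), and the helper name filter_hits_by_pid is hypothetical.

```python
import csv
import io

# Column order of DIAMOND/BLAST tabular output (format 6) when no custom
# fields are requested. Assumption: the result file uses this default layout.
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits_by_pid(handle, pid_threshold=90.0):
    """Return (query_ID, subject_ID) pairs whose PID exceeds the threshold."""
    if not 0.0 <= pid_threshold <= 100.0:
        raise ValueError("PID threshold has to be between 0.0 and 100.0")
    reader = csv.DictReader(handle, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    return [
        (row["qseqid"], row["sseqid"])
        for row in reader
        if float(row["pident"]) > pid_threshold
    ]


# Two made-up hits: only the first one passes a 90 % identity cutoff.
example = io.StringIO(
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\n"
    "q2\ts2\t75.0\t180\t45\t2\t1\t180\t5\t184\t1e-40\t150\n"
)
pairs = filter_hits_by_pid(example, pid_threshold=90.0)
```

The strict comparison mirrors the wording "higher than the given threshold"; whether the library treats hits exactly at the threshold the same way is not stated here.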

refinegems.utility.connections.get_memote_score(memote_report: dict) → float[source]

Extracts MEMOTE score from report

Args:

memote_report (dict):
Output from run_memote().

Returns:

float:: MEMOTE score

refinegems.utility.connections.perform_mcc(model: cobra.Model, dir: str, apply: bool = True) → cobra.Model[source]

Run the MassChargeCuration tool on the model and optionally directly apply the solution.

Connection to the MCC tool, preprint is available at: MCC: Automated Mass and Charge Curation at Genome-Scale Applied to C. tuberculostearicum Reihaneh Mostolizadeh, Finn Mier, Andreas Dräger bioRxiv 2024.11.19.624331; doi: https://doi.org/10.1101/2024.11.19.624331

Args:

model (cobra.Model):
The model to use the tool on.
dir (str):
Path of the directory to save MCC output in.
apply (bool, optional):
If True, model is directly updated with the results. Defaults to True.

Returns:

cobra.Model:: The model (updated or not)

refinegems.utility.connections.run_DIAMOND_blastp(fasta: str, db: str, sensitivity: Literal['sensitive', 'more-sensitive', 'very-sensitive', 'ultra-sensitive'] = 'more-sensitive', coverage: float = 95.0, threads: int = 2, outdir: str = None, outname: str = 'DIAMOND_blastp_res.tsv') → str[source]

Run DIAMOND in BLASTp mode.

Connection to the DIAMOND tool as described in: Buchfink B, Reuter K, Drost HG, “Sensitive protein alignments at tree-of-life scale using DIAMOND”, Nature Methods 18, 366-368 (2021). doi:10.1038/s41592-021-01101-x

Args:

fasta (str):
The FASTA file to BLAST for.
db (str):
The DIAMOND database file to BLAST against
sensitivity (Literal[‘sensitive’, ‘more-sensitive’, ‘very-sensitive’,’ultra-sensitive’], optional):
Sensitivity mode for DIAMOND. Defaults to ‘more-sensitive’.
coverage (float, optional):
A parameter for DIAMOND. Coverage threshold for the hits. Defaults to 95.0.
threads (int, optional):
A parameter for DIAMOND. Number of threads to be used while BLASTing. Defaults to 2.
outdir (str, optional):
Path to a directory to write the output files to. Defaults to None.
outname (str, optional):
Name of the result file (name only, not a path). Defaults to ‘DIAMOND_blastp_res.tsv’.

Returns:

str:: Path to the results of the DIAMOND BLASTp run.
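
To make the parameters above more concrete, the following sketch assembles a DIAMOND command line from them without executing it. It is an illustration under assumptions: the function name is hypothetical, and mapping coverage to DIAMOND's --query-cover option (rather than --subject-cover) is a guess about how the wrapper uses it.

```python
from pathlib import Path

SENSITIVITY_MODES = ("sensitive", "more-sensitive", "very-sensitive", "ultra-sensitive")


def build_diamond_blastp_command(fasta, db, sensitivity="more-sensitive",
                                 coverage=95.0, threads=2, outdir=None,
                                 outname="DIAMOND_blastp_res.tsv"):
    """Build (but do not run) a `diamond blastp` call; returns (command, outfile)."""
    if sensitivity not in SENSITIVITY_MODES:
        raise ValueError(f"Unknown sensitivity mode: {sensitivity}")
    # Without an output directory, the result lands in the working directory.
    outfile = Path(outdir, outname) if outdir else Path(outname)
    command = [
        "diamond", "blastp",
        "--db", db,
        "--query", fasta,
        f"--{sensitivity}",
        # Assumption: the coverage threshold refers to query coverage.
        "--query-cover", str(coverage),
        "--threads", str(threads),
        "--outfmt", "6",
        "--out", str(outfile),
    ]
    return command, str(outfile)


command, outfile = build_diamond_blastp_command(
    "proteins.faa", "swissprot.dmnd", outdir="results"
)
```

The resulting list could be passed to subprocess.run(command, check=True) on a system where DIAMOND is installed.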

refinegems.utility.connections.run_ModelPolisher(model_or_path: libsbml.Model | str, configuration: dict) → dict | None[source]

Wrapper around ModelPolisher

Warning

ModelPolisher is currently not maintained and might not work as expected.

Args:

model (libModel):
Model loaded with libSBML
configuration (dict):
Configuration file for ModelPolisher

Returns:

Union[dict, None]:: Result from ModelPolisher

refinegems.utility.connections.run_SBOannotator(model: libsbml.Model) → libsbml.Model[source]

Run SBOannotator on a model to annotate the SBO terms.

Connection to the SBOannotator tool as described in: Leonidou, N., Fritze, E., Renz, A., & Dräger, A. (2023). SBOannotator: a Python tool for the automated assignment of systems biology ontology terms. Bioinformatics, 39(7), btad437.

Args:

model (libModel):
The model loaded with libSBML.

Returns:

libModel:: The model with corrected / added SBO terms.

refinegems.utility.connections.run_memote(model: cobra.Model, type: Literal['json', 'html'] = 'html', return_res: bool = False, save_res: str | None = None, verbose: bool = False) → dict | str | None[source]

Run the memote snapshot function on a given model loaded with COBRApy.

Connection to the memote tool as described in: Lieven, C., Beber, M. E., Olivier, B. G., Bergmann, F. T., Ataman, M., Babaei, P., … & Zhang, C. (2020). MEMOTE for standardized genome-scale metabolic model testing. Nature biotechnology, 38(3), 272-276.

Args:

model (cobra.Model):
The model loaded with COBRApy.
type (Literal[‘json’,’html’], optional):
Type of report to produce. Can be ‘html’ or ‘json’. Defaults to ‘html’.
return_res (bool, optional):
Option to return the result. Defaults to False.
save_res (str | None, optional):
If given a path string, saves the report under the given path. Defaults to None.
verbose (bool, optional):
Produce a more verbose output. Defaults to False.

Raises:

ValueError: Unknown input for parameter type

Returns:

Case return_res = True and type = json:

dict:
The json dictionary.
Case return_res = True and type = html:

str:
The html string.
Case return_res = False:

None:
no return

cvterms module

Helper module to work with annotations (CVTerms)

Stores dictionaries which hold information on the identifiers.org syntax, and provides functions to add CVTerms to different entities and to parse CVTerms.

refinegems.utility.cvterms.DB2PREFIX_GENES

refinegems.utility.cvterms.DB2PREFIX_METABS

refinegems.utility.cvterms.DB2PREFIX_PATHWAYS

refinegems.utility.cvterms.DB2PREFIX_REACS

refinegems.utility.cvterms.MIRIAM

refinegems.utility.cvterms.OLD_MIRIAM

refinegems.utility.cvterms.PREFIX2DB_GENES

refinegems.utility.cvterms.PREFIX2DB_METABS

refinegems.utility.cvterms.PREFIX2DB_PATHWAYS

refinegems.utility.cvterms.PREFIX2DB_REACS

refinegems.utility.cvterms._add_annotations_from_dict_cobra(references: dict, entity: cobra.Reaction | cobra.Metabolite | cobra.Model) → None[source]

Given a dictionary and a cobra object, add the former as annotations to the latter. The keys of the dictionary are used as the annotation labels, the values as the values. If the keys are already in the entity, the values will be combined (union).

Args:

references (dict):
The dictionary with the references to add to the entity.
entity (cobra.Reaction | cobra.Metabolite | cobra.Model):
The entity to add annotations to.
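
The union behaviour described above can be illustrated on a plain dictionary standing in for an entity's annotation attribute. This is a sketch only; the helper name merge_annotations is hypothetical, and accepting both single strings and lists as values is an assumption about typical annotation content.

```python
def merge_annotations(existing, references):
    """Merge `references` into `existing` in place, combining values per key."""
    for key, new_values in references.items():
        # Accept a single string or a list of strings on either side.
        new_list = [new_values] if isinstance(new_values, str) else list(new_values)
        old = existing.get(key, [])
        old_list = [old] if isinstance(old, str) else list(old)
        # Union without duplicates, keeping first-seen order.
        existing[key] = list(dict.fromkeys(old_list + new_list))
    return existing


# Stand-in for `entity.annotation` of a reaction.
annotation = {"kegg.reaction": "R00200", "ec-code": ["2.7.1.40"]}
merge_annotations(
    annotation,
    {"kegg.reaction": ["R00200", "R00430"], "bigg.reaction": "PYK"},
)
```

Keys that are only present on one side are kept unchanged, so existing annotations are never dropped by the merge.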

refinegems.utility.cvterms._get_identifiers_org_iri(prefix: str, identifier: str)[source]

refinegems.utility.cvterms._parse_iri(uri: str)[source]
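
Neither of the two helpers above is documented, so the following sketch only shows the general identifiers.org convention they presumably work with: a compact IRI of the form base + prefix + ":" + identifier, next to the older slash-separated form. The base URL constant and both function names are assumptions for illustration; the actual values of MIRIAM and OLD_MIRIAM are not shown in this reference.

```python
# Assumed base URL; the module's own MIRIAM constants may differ.
IDENTIFIERS_ORG = "https://identifiers.org/"


def make_iri(prefix, identifier):
    """Compose a compact-identifier IRI of the form <base><prefix>:<identifier>."""
    return f"{IDENTIFIERS_ORG}{prefix}:{identifier}"


def split_iri(iri):
    """Split an identifiers.org IRI into (prefix, identifier)."""
    tail = iri.split("identifiers.org/", 1)[1]
    # The new style separates prefix and ID with ':', the older style with '/'.
    # Cut at whichever separator comes first so IDs such as 'CHEBI:15377'
    # survive in old-style IRIs.
    positions = [tail.find(sep) for sep in (":", "/") if sep in tail]
    cut = min(positions)
    return tail[:cut], tail[cut + 1:]


iri = make_iri("kegg.compound", "C00031")
new_style = split_iri(iri)
old_style = split_iri("http://identifiers.org/chebi/CHEBI:15377")
```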

refinegems.utility.cvterms.add_cv_term_genes(entry: str, db_id: str, gene: libsbml.GeneProduct, lab_strain: bool = False)[source]

Adds CVTerm to a gene

Args:

entry (str):
Id to add as annotation.
db_id (str):
Database to which entry belongs. Must be in DB2PREFIX_GENES.keys().
gene (GeneProduct):
Gene to add CVTerm to.
lab_strain (bool, optional):
For locally sequenced strains the qualifiers are always HOMOLOG_TO. Defaults to False.

refinegems.utility.cvterms.add_cv_term_metabolites(entry: str, db_id: str, metab: libsbml.Species)[source]

Adds CVTerm to a metabolite

Args:

entry (str):
Id to add as annotation
db_id (str):
Database to which entry belongs. Must be in DB2PREFIX_METABS.keys().
metab (Species):
Metabolite to add CVTerm to

refinegems.utility.cvterms.add_cv_term_pathways(entry: str, db_id: str, path: libsbml.Group)[source]

Add CVTerm to a groups pathway

Args:

entry (str):
Id to add as annotation
db_id (str):
Database to which entry belongs. Must be in DB2PREFIX_PATHWAYS.keys().
path (Group):
Pathway to add CVTerm to

refinegems.utility.cvterms.add_cv_term_pathways_to_entity(entry: str, db_id: str, reac: libsbml.Reaction)[source]

Add CVTerm to a reaction as OCCURS IN pathway

Args:

entry (str):
Id to add as annotation
db_id (str):
Database to which entry belongs
reac (Reaction):
Reaction to add CVTerm to

refinegems.utility.cvterms.add_cv_term_reactions(entry: str, db_id: str, reac: libsbml.Reaction)[source]

Adds CVTerm to a reaction

Args:

entry (str):
Id to add as annotation
db_id (str):
Database to which entry belongs. Must be in DB2PREFIX_REACS.keys().
reac (Reaction):
Reaction to add CVTerm to

refinegems.utility.cvterms.add_cv_term_units(unit_id: str, unit: libsbml.Unit, relation: int)[source]

Adds CVTerm to a unit

Args:

unit_id (str):
ID to add as URI to annotation
unit (Unit):
Unit to add CVTerm to
relation (int):
Provides model qualifier to be added

refinegems.utility.cvterms.generate_cvterm(qt, b_m_qt) → libsbml.CVTerm[source]

Generates a CVTerm with the provided qualifier & biological or model qualifier types

Args:

qt (libSBML qualifier type):
BIOLOGICAL_QUALIFIER or MODEL_QUALIFIER
b_m_qt (libSBML qualifier):
BQM_IS, BQM_IS_HOMOLOG_TO, etc.

Returns:

CVTerm:: With provided qualifier & biological or model qualifier types

refinegems.utility.cvterms.get_id_from_cv_term(entity: libsbml.SBase, db_id: str) → list[str][source]

Extract Id for a specific database from CVTerm

Args:

entity (SBase):
Species, Reaction, Gene, Pathway
db_id (str):
Database of interest

Returns:

list[str]:: Ids of entity belonging to db_id

refinegems.utility.cvterms.print_cvterm(cvterm: libsbml.CVTerm)[source]

Debug function: Prints the URIs contained in the provided CVTerm along with the provided qualifier & biological/model qualifier types

Args:

cvterm (CVTerm):: A libSBML CVTerm

databases module

Variables, functions and more for the development, extension and maintenance of the in-built database.

Note

Some functionalities for handling and dealing with the media are in the medium module.

Hint

Further functions for accessing the database can be found in the io module, e.g. load_a_table_from_database()

refinegems.utility.databases.PATH_TO_DB

refinegems.utility.databases.PATH_TO_DB_FOLDER

refinegems.utility.databases.VERSION_FILE

refinegems.utility.databases.VERSION_URL = 'http://bigg.ucsd.edu/api/v2/database_version'

class refinegems.utility.databases.ValidationCodes(value, names=_not_given, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Validation codes for the database

Args:

Enum (Enum):
Provided as input to get a number mapping for the codes

BIGG = (2,)

BIGG_MEDIA = (4,)

BIGG_MSEED_COMPOUNDS = (6,)

COMPLETE = (0,)

EMPTY = (1,)

MEDIA = (3,)

MEDIA_MSEED_COMPOUNDS = 7

MODELSEED_COMPOUNDS = (5,)

refinegems.utility.databases.create_media_database(db_cursor: Cursor)[source]

Creates the media database with 6 tables (‘medium’, ‘substance’, ‘substance2db’, ‘medium2substance’, ‘subset’ & ‘subset2substance’) from file ‘./data/database/media_db.sql’

Args:

db_cursor (sqlite3.Cursor):
Cursor from open connection to the database (data.db)

refinegems.utility.databases.get_latest_bigg_databases(db_connection: Connection, is_missing: bool = True)[source]

Gets the latest BiGG tables for metabolites & reactions if:

No version file is locally available

The version in the local version file is NOT the latest

No BiGG tables currently exist in the database

Args:

db_connection (sqlite3.Connection):
Open connection to the database (data.db)
is_missing (bool, optional):
True if no BiGG tables are in the database. Defaults to True.

refinegems.utility.databases.get_modelseed_compounds_database(db_connection: Connection)[source]

Retrieves the compounds table from ModelSEED from the respective GitHub repository

Args:

db_connection (sqlite3.Connection):
Open connection to the database (data.db)

refinegems.utility.databases.initialise_database()[source]

Initialises/updates the database (data.db)

After initialisation the database contains:

2 tables with names ‘bigg_metabolites’ & ‘bigg_reactions’
6 tables with names ‘medium’, ‘substance’, ‘medium2substance’, ‘substance2db’, ‘subset’ & ‘subset2substance’
1 table with name ‘modelseed_compounds’

refinegems.utility.databases.is_valid_database(db_cursor: Cursor) → int[source]

Verifies if database has:

2 tables with names ‘bigg_metabolites’ & ‘bigg_reactions’

6 tables with names ‘medium’, ‘substance’, ‘substance2db’, ‘medium2substance’, ‘subset’ & ‘subset2substance’

1 table with name ‘modelseed_compounds’

Args:

db_cursor (sqlite3.Cursor):
Cursor from open connection to the database (data.db)

Returns:

int:: Corresponding to one of the ValidationCodes
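
The kind of check described above can be sketched with sqlite3 by comparing the tables listed in sqlite_master against the expected names. This is an illustration, not the library's code: instead of a ValidationCodes value it returns the names of incomplete table groups, and the group labels and function name are hypothetical.

```python
import sqlite3

# Table names as listed in this reference; the group labels are made up.
EXPECTED_TABLES = {
    "bigg": {"bigg_metabolites", "bigg_reactions"},
    "media": {"medium", "substance", "substance2db",
              "medium2substance", "subset", "subset2substance"},
    "modelseed": {"modelseed_compounds"},
}


def missing_table_groups(db_cursor):
    """Return the sorted names of table groups that are not fully present."""
    db_cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    present = {row[0] for row in db_cursor.fetchall()}
    return sorted(
        group for group, tables in EXPECTED_TABLES.items()
        if not tables <= present
    )


# In-memory database that only contains the six media tables.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
for table in EXPECTED_TABLES["media"]:
    cursor.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY)')
missing = missing_table_groups(cursor)
connection.close()
```

A caller could translate the returned group names into whatever status code it needs, for example "complete" when the list is empty.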

refinegems.utility.databases.reset_database(database: Path | str = PATH_TO_DB)[source]

Remove tables for certain databases to allow pushing of the database to GitHub (reduce size).

Args:

database (Path | str, optional):
Path to the database. Defaults to PATH_TO_DB, the in-built database.

refinegems.utility.databases.update_bigg_db(latest_version: str, db_connection: Connection) → dict[source]

Updates the BiGG tables ‘bigg_metabolites’ & ‘bigg_reactions’ within a database (data.db)

Args:

latest_version (str):
String containing the Path to a file with the latest version of the BiGG database
db_connection (sqlite3.Connection):
Open connection to the database (data.db)

refinegems.utility.databases.update_mnx_namespaces(db: Path | str = PATH_TO_DB, chunksize: int = 1)[source]

Add or update the MetaNetX namespace to/in a database.

Args:

db (Union[Path,str],optional):
Path to a database to add the namespace to. Defaults to the in-built database.
chunksize (int, optional):
Size of the chunk (in kB) to download at once. Defaults to 1.

db_access module

Access information from different databases or compare a model or model entities with them. This module provides variables and function for accessing databases for better model curation and annotation.

The following databases have functionalities implemented:

BiGG
ChEBI
KEGG
ModelSEED
NCBI
UniProt

refinegems.utility.db_access.ALL_BIGG_COMPARTMENTS_ONE_LETTER = ('c', 'e', 'p', 'm', 'x', 'r', 'v', 'n', 'g', 'u', 'l', 'h', 'f', 's', 'i', 'w', 'y')

refinegems.utility.db_access.ALL_BIGG_COMPARTMENTS_TWO_LETTER = ('im', 'cx', 'um', 'cm', 'mm')

refinegems.utility.db_access.BIGG_METABOLITES_URL = 'http://bigg.ucsd.edu/api/v2/universal/metabolites/'

refinegems.utility.db_access.BIOCYC_TIER1_DATABASES_PREFIXES = ['META', 'ECO', 'ECOLI', 'HUMAN']: Map an ID to information in BiGG or compare model entities to BiGG.

refinegems.utility.db_access._add_annotations_from_bigg_reac_row(row: pandas.Series, reac: cobra.Reaction) → None[source]

Given a row of the BiGG reaction database table and a cobra.Reaction object, extend the annotation of the latter with the information of the former.

Args:

row (pd.Series):
The row of the database table.
reac (cobra.Reaction):
The reaction object.

refinegems.utility.db_access._search_ncbi_for_gp(row: pandas.Series, id_type: Literal['refseq', 'ncbiprotein']) → pandas.Series[source]

Fetches protein name and locus tag from NCBI

Args:

row (pd.Series):
Row of a pandas DataFrame containing RefSeq/NCBI Protein IDs in columns
id_type (Literal[‘refseq’, ‘ncbiprotein’]):
ID type of IDs in provided row. Can be one of [‘refseq’, ‘ncbiprotein’].

Returns:

pd.Series:: Modified input row

refinegems.utility.db_access.add_annotations_from_BiGG_metabs(metabolite: cobra.Metabolite) → None[source]

Check a cobra.Metabolite for bigg.metabolite annotations. If they exist, search for more annotations in the BiGG database and add them to the metabolite.

Args:

metabolite (cobra.Metabolite):
The metabolite object.

refinegems.utility.db_access.add_info_from_ChEBI_BiGG(missing_metabs: pandas.DataFrame, charge=True, formula=True, iupac=True) → pandas.DataFrame[source]

Adds information from ChEBI/BiGG to the provided dataframe.

The following information can be added:

charge
formula
iupac (name)

Args:

missing_metabs (pd.DataFrame):
Table containing metabolites & the respective ChEBI & BiGG IDs

Returns:

pd.DataFrame:: Input table extended with the charges & chemical formulas obtained from ChEBI/BiGG.

refinegems.utility.db_access.compare_model_modelseed(model_charges: pandas.DataFrame, modelseed_charges: pandas.DataFrame) → pandas.DataFrame[source]

Compares tables with charges / formulae from model & modelseed

Args:

model_charges (pd.DataFrame):
Charges and formulae of model metabolites. Output of get_model_charges().
modelseed_charges (pd.DataFrame):
Charges and formulae of ModelSEED metabolites. Output of get_modelseed_charges().

Returns:

pd.DataFrame:: Table containing info whether charges / formulae match

refinegems.utility.db_access.compare_to_modelseed(model: cobra.Model) → tuple[pandas.DataFrame, pandas.DataFrame][source]

Executes all steps to compare model metabolites to ModelSEED metabolites

Args:

model (cobraModel):
Model loaded with COBRApy

Returns:

tuple:

Tables with charge (1) & formula (2) mismatches

pd.DataFrame: Table with charge mismatches
pd.DataFrame: Table with formula mismatches

refinegems.utility.db_access.get_BiGG_metabs_annot_via_dbid(metabolite: cobra.Metabolite, id: str, dbcol: str, compartment: str = 'c') → None[source]

Search for a BiGG ID and add it to a metabolite annotation. The search is based on a column name of the BiGG metabolite table and an ID to search for. Additionally, using the given compartment name, the found IDs are filtered for matching compartments.

Args:

metabolite (cobra.Metabolite):
The metabolite. Needs to be a COBRApy Metabolite object.
id (str):
The ID to search for in the database.
dbcol (str):
Name of the column of the database to check the ID against.
compartment (str, optional):
The compartment name. Needs to be a valid BiGG compartment ID. Defaults to ‘c’.
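
The compartment filtering mentioned above can be pictured as a suffix check, since BiGG metabolite IDs carry their compartment as a trailing "_<compartment>". The sketch below is a simplified stand-in with a hypothetical function name; how the library performs the filtering internally is not documented here.

```python
def filter_by_compartment(bigg_ids, compartment="c"):
    """Keep only BiGG metabolite IDs that end in '_<compartment>'."""
    suffix = f"_{compartment}"
    return [bigg_id for bigg_id in bigg_ids if bigg_id.endswith(suffix)]


candidates = ["glc__D_c", "glc__D_e", "glc__D_p", "atp_c"]
cytosolic = filter_by_compartment(candidates, "c")
```

Because the underscore is part of the suffix, a one-letter compartment such as "m" does not accidentally match two-letter compartments such as "im".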

refinegems.utility.db_access.get_charge_mismatch(df_comp: pandas.DataFrame) → pandas.DataFrame[source]

Extracts metabolites with charge mismatch of model & modelseed

Args:

df_comp (pd.DataFrame):: Charge and formula mismatches. Output from compare_model_modelseed().

Returns:

pd.DataFrame:: Table containing metabolites with charge mismatch

refinegems.utility.db_access.get_compared_formulae(formula_mismatch: pandas.DataFrame) → pandas.DataFrame[source]

Compare formula by atom pattern

Args:

formula_mismatch (pd.DataFrame):: Table with column containing atom comparison. Output from get_formula_mismatch().

Returns:

pd.DataFrame:: table containing metabolites with formula mismatch
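
One way to picture a comparison "by atom pattern" is to break each formula into element counts and report the elements that differ. The sketch below is an assumption about the general idea, not the library's algorithm; it handles only flat formulae (no parentheses, charges or R groups), and both function names are hypothetical.

```python
import re
from collections import Counter

# An element symbol (one capital, optional lowercase letter) and its count.
ATOM_PATTERN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Count atoms per element in a flat formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in ATOM_PATTERN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def differing_atoms(formula_a, formula_b):
    """Map each element with unequal counts to its (count_a, count_b) pair."""
    a, b = atom_counts(formula_a), atom_counts(formula_b)
    return {
        element: (a[element], b[element])
        for element in sorted(set(a) | set(b))
        if a[element] != b[element]
    }


diff = differing_atoms("C6H12O6", "C6H11O6")
```

A mismatch that involves only hydrogen, as in this example, often points to a different protonation state rather than a different compound.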

refinegems.utility.db_access.get_ec_from_ncbi(mail: str, ncbiprot: str) → str | None[source]

Based on an NCBI protein accession number, try to fetch the EC number from NCBI.

Args:

mail (str):
User’s mail address for the NCBI Entrez tool.
ncbiprot (str):
The NCBI protein accession number.

Returns:

Case: fetching successful

str:
The EC number associated with the protein ID based on NCBI.
Case: fetching unsuccessful

None:
Nothing to return

refinegems.utility.db_access.get_ec_via_swissprot(fasta: str, db: str, missing_genes: pandas.DataFrame, swissprot_mapping_file: str, outdir: str = None, sens: Literal['sensitive', 'more-sensitive', 'very-sensitive', 'ultra-sensitive'] = 'more-sensitive', cov: float = 95.0, t: int = 2, pid: float = 90.0) → pandas.DataFrame[source]

Based on a protein FASTA file and a table of missing genes, map the genes to EC numbers using a SwissProt DIAMOND database and a SwissProt mapping file (see download_url() on how to download the needed files).

Args:

fasta (str):
Path to the FASTA protein file.
db (str):
Path to the DIAMOND database (SwissProt).
missing_genes (pd.DataFrame):
The table of missing genes.
swissprot_mapping_file (str):
Path to the SwissProt mapping file.
outdir (str, optional):
Path to a directory to write the output to. Defaults to None.
sens (Literal[‘sensitive’, ‘more-sensitive’, ‘very-sensitive’,’ultra-sensitive’], optional):
Sensitivity mode of DIAMOND blastp. Defaults to ‘more-sensitive’.
cov (float, optional):
Coverage threshold for DIAMOND blastp. Defaults to 95.0.
t (int, optional):
Number of threads to use for DIAMOND blastp. Defaults to 2.
pid (float, optional):
Percentage identity value to use as a cutoff for the results of the DIAMOND blastp run. Defaults to 90.0.

Returns:

pd.DataFrame:: The missing genes table extended by the mapping to an EC number, if successful.

refinegems.utility.db_access.get_formula_mismatch(df_comp: pandas.DataFrame) → pandas.DataFrame[source]

Extracts metabolites with formula mismatch of model & modelseed

Args:

df_comp (pd.DataFrame):: Charge and formula mismatches. Output from compare_model_modelseed().

Returns:

pd.DataFrame:: Table containing metabolites with formula mismatch

refinegems.utility.db_access.get_kegg_genes(organismid: str) → pandas.DataFrame[source]

Extracts list of genes from KEGG given an organism

Args:

organismid (str):
KEGG ID of organism which the model is based on

Returns:

pd.DataFrame:: Table of all genes denoted in KEGG for the organism

refinegems.utility.db_access.get_model_charges(model: cobra.Model) → pandas.DataFrame[source]

Extracts all metabolites from model

Args:

model (cobraModel):
Model loaded with COBRApy

Returns:

pd.DataFrame:: Table containing charges and formulae of model metabolites

refinegems.utility.db_access.get_modelseed_charges(modelseed_compounds: pandas.DataFrame) → pandas.DataFrame[source]

Extract table with BiGG, charges and formulae

Args:

modelseed_compounds (pd.DataFrame):
ModelSEED data. Output from get_modelseed_compounds().

Returns:

pd.DataFrame:: Table containing charges and formulae of ModelSEED metabolites

refinegems.utility.db_access.get_modelseed_compounds() → pandas.DataFrame[source]

Extracts compounds from ModelSEED which have BiGG Ids

Returns:

pd.DataFrame:: Table containing ModelSEED data

refinegems.utility.db_access.kegg_reaction_parser(rn_id: str) → dict[source]

Get the entry of a KEGG reaction ID and parse the information into a dictionary.

Args:

rn_id (str):
A reaction ID existing in KEGG.

Returns:

dict:: The KEGG entry information as a dictionary.

refinegems.utility.db_access.map_dmnd_res_to_sp_ec_brenda(dmnd_results: pandas.DataFrame, swissprot_mapping_path: str) → pandas.DataFrame[source]

Map the results of a DIAMOND BLASTp run (filtered, see filter_DIAMOND_blastp_results()) to EC numbers and BRENDA IDs via the SwissProt mapping file.

Args:

dmnd_results (pd.DataFrame):
The results of the DIAMOND run.
swissprot_mapping_path (str):
The path to the SwissProt mapping file (IDs against BRENDA and EC, for information on how to get them, refer to download_url())

Returns:

pd.DataFrame:: The resulting mapping (no duplicates).

refinegems.utility.db_access.map_to_homologs(fasta: str, db: str, missing_genes: pandas.DataFrame, mapping_file: str = None, outdir: str = None, sens: Literal['sensitive', 'more-sensitive', 'very-sensitive', 'ultra-sensitive'] = 'more-sensitive', cov: float = 95.0, t: int = 2, pid: float = 90.0, email: str = None) → pandas.DataFrame[source]

Based on a protein FASTA file and a table of missing genes, map the genes to EC numbers of homologous genes based on a DIAMOND BLASTp run.

Args:

fasta (str):
The protein FASTA file of the organism the model was built on.
db (str):
Path to the DIAMOND database.
missing_genes (pd.DataFrame):
Table containing information about the missing genes.
mapping_file (str, optional):
Precomputed mapping file to map the DIAMOND results to NCBI.
outdir (str, optional):
Path to the output directory. Defaults to None.
sens (Literal[‘sensitive’, ‘more-sensitive’, ‘very-sensitive’, ‘ultra-sensitive’], optional):
Sensitivity mode for DIAMOND. Defaults to “more-sensitive”.
cov (float, optional):
Threshold for the coverage (parameter for DIAMOND). Defaults to 95.0.
t (int, optional):
Number of threads to use (DIAMOND parameter). Defaults to 2.
pid (float, optional):
Percentage identity cutoff value (Mappings with lower values are not considered). Defaults to 90.0.
email (str, optional):
Email to use for the NCBI requests. If not given, skips the direct requests. Also skipped if mapping_file is provided. Defaults to None.

Returns:

pd.DataFrame:: Output table containing the following information (columns contain None, if mapping unsuccessful): “locus_tag”, “ncbiprotein”, “locus_tag_ref”, “old_locus_tag”, “GeneID”, “ec-code”

refinegems.utility.db_access.parse_KEGG_ec(ec: str) → dict[source]

Based on an EC number, fetch the corresponding KEGG entry and parse it into a dictionary containing the following information (if available):

ec-code
id (kegg.reference)
equation
reference
pathway

Args:

ec (str):
The EC number in the format ‘x.x.x.x’

Returns:

dict:: The collected information about the KEGG entry.
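
Since the EC number is expected in the format 'x.x.x.x', a caller may want to check the input before querying KEGG. The following is a small sketch of such a check; the function name is hypothetical, and rejecting partial numbers such as '2.7.1.-' is a design choice of this example, not documented behaviour of the function above.

```python
import re

# Four dot-separated numeric fields, e.g. '2.7.1.40'.
EC_PATTERN = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_complete_ec(ec):
    """Return True if `ec` consists of exactly four numeric fields."""
    return EC_PATTERN.fullmatch(ec) is not None


checks = {ec: is_complete_ec(ec) for ec in ("2.7.1.40", "2.7.1.-", "2.7.1")}
```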

refinegems.utility.db_access.parse_KEGG_gene(locus_tag: str) → dict[source]

Based on a locus tag, fetch the corresponding KEGG entry and parse it into a dictionary containing the following information (if available):

ec-code
orthology
references

Args:

locus_tag (str):
The locus in the format <organism_id>:<locus_tag>

Returns:

dict:: The collected information.

entities module

Collection of functions to access, handle and manipulate different entities of COBRApy and libsbml models.

refinegems.utility.entities.are_compartment_names_valid(model: cobra.Model | libsbml.Model) → bool[source]

Check if compartment names of model are considered valid based on VALID_COMPARTMENTS.

Args:

model (Union[cobra.Model, libModel]):
The model loaded with COBRApy or libSBML.

Returns:

bool:: True if valid, else False.

refinegems.utility.entities.build_metabolite_bigg(id: str, model: cobra.Model, namespace: Literal['BiGG'] = 'BiGG', idprefix: str = 'refineGEMs') → cobra.Metabolite | None[source]

Build a cobra.Metabolite object from a BiGG ID. This function will NOT directly add the metabolite to the model, if the construction is successful.

Args:

id (str):
A BiGG ID of a metabolite.
model (cobra.Model):
The model the metabolite will be built for.
namespace (str, optional):
Name to use for the model ID. If namespace cannot be matched, will use a random ID. Defaults to ‘BiGG’.
compartment (str, optional):
Compartment of the metabolite. Defaults to ‘c’.
idprefix (str, optional):
Prefix for the random ID. Defaults to ‘refineGEMs’.

Returns:

Case construction successful or match found in model:

cobra.Metabolite:
The built metabolite object.
Case construction failed:

None:
Nothing to return.

refinegems.utility.entities.build_metabolite_kegg(kegg_id: str, model: cobra.Model, namespace: Literal['BiGG'] = 'BiGG', compartment: str = 'c', idprefix='refineGEMs') → cobra.Metabolite | None[source]

Build a cobra.Metabolite object from a KEGG ID. This function will NOT directly add the metabolite to the model, if the construction is successful.

Args:

kegg_id (str):
A KEGG ID of a metabolite.
model (cobra.Model):
The model the metabolite will be built for.
namespace (str, optional):
Name to use for the model ID. If namespace cannot be matched, will use a random ID. Defaults to ‘BiGG’.
compartment (str, optional):
Compartment of the metabolite. Defaults to ‘c’.
idprefix (str, optional):
Prefix for the random ID. Defaults to ‘refineGEMs’.

Returns:

Case construction successful or match found in model:

cobra.Metabolite:
The built metabolite object.
Case construction failed:

None:
Nothing to return.

refinegems.utility.entities.build_metabolite_mnx(id: str, model: cobra.Model, namespace: str = 'BiGG', compartment: str = 'c', idprefix: str = 'refineGEMs') → cobra.Metabolite | None[source]

Build a cobra.Metabolite object from a MetaNetX ID. This function will NOT directly add the metabolite to the model, if the construction is successful.

Args:

id (str):
A MetaNetX ID of a metabolite.
model (cobra.Model):
The model the metabolite will be built for.
namespace (str, optional):
Name to use for the model ID. If namespace cannot be matched, will use a random ID. Defaults to ‘BiGG’.
compartment (str, optional):
Compartment of the metabolite. Defaults to ‘c’.
idprefix (str, optional):
Prefix for the random ID. Defaults to ‘refineGEMs’.

Returns:

Case construction successful or match found in model:

cobra.Metabolite:
The metabolite object.
Case construction failed:

None:
Nothing to return.

refinegems.utility.entities.build_metabolite_xxx(id: str, model: cobra.Model, namespace: str, compartment: str, idprefix: str) → cobra.Metabolite[source]

Template function for building a cobra.Metabolite.

Note

This is a template function for developers. It cannot be executed.

Args:

id (str):
_description_
model (cobra.Model):
_description_
namespace (str):
_description_
compartment (str):
_description_
idprefix (str):
_description_

Returns:

cobra.Metabolite:: _description_

refinegems.utility.entities.build_reaction()[source]: Note

Will be coming in a future release.

refinegems.utility.entities.build_reaction_bigg(model: cobra.Model, id: str, reac_str: str = None, references: dict = {}, idprefix: str = 'refineGEMs', namespace: Literal['BiGG'] = 'BiGG') → cobra.Reaction | None | list[source]

Construct a new reaction for a model from a BiGG reaction ID. This function will NOT add the reaction directly to the model, if the construction process is successful.

Args:

model (cobra.Model):
The model loaded with COBRApy.
id (str):
A BiGG reaction ID.
reac_str (str, optional):
The reaction equation string from the database. Currently, this param is not doing anything in this function. Defaults to None.
references (dict, optional):
Additional annotations to add to the reaction (idtype:[value]). Defaults to {}.
idprefix (str, optional):
Prefix for the pseudo-identifier. Defaults to ‘refineGEMs’.
namespace (Literal[‘BiGG’], optional):
Namespace to use for the reaction ID. If namespace cannot be matched, uses the pseudo-ID. Defaults to ‘BiGG’.

Returns:

Case successful construction:

cobra.Reaction:
The newly built reaction object.
Case construction not possible:

None:
Nothing to return.
Case reaction found in model:

list:
List of matching reaction IDs (in model).

refinegems.utility.entities.build_reaction_kegg(model: cobra.Model, id: str = None, reac_str: str = None, references: dict = {}, idprefix: str = 'refineGEMs', namespace: Literal['BiGG'] = 'BiGG') → cobra.Reaction | None | list[source]

Construct a new reaction for a model from either a KEGG reaction ID or a KEGG equation string. This function will NOT add the reaction directly to the model, if the construction process is successful.

Args:

model (cobra.Model):
The model loaded with COBRApy.
id (str,optional):
A KEGG reaction ID.
reac_str (str, optional):
The reaction equation string from the database. Defaults to None.
references (dict, optional):
Additional annotations to add to the reaction (idtype:[value]). Defaults to {}.
idprefix (str, optional):
Prefix for the pseudo-identifier. Defaults to ‘refineGEMs’.
namespace (Literal[‘BiGG’], optional):
Namespace to use for the reaction ID. If namespace cannot be matched, uses the pseudo-ID. Defaults to ‘BiGG’.

Returns:

Case successful construction:

cobra.Reaction:
The newly built reaction object.
Case construction not possible:

None:
Nothing to return.
Case reaction found in model.

list:
List of matching reaction IDs (in model).

refinegems.utility.entities.build_reaction_mnx(model: cobra.Model, id: str, reac_str: str = None, references: dict = {}, idprefix: str = 'refineGEMs', namespace: Literal['BiGG'] = 'BiGG') → cobra.Reaction | None | list[source]

Construct a new reaction for a model from a MetaNetX reaction ID. Even if the construction succeeds, this function does NOT add the reaction to the model; it only returns it.

Args:

model (cobra.Model):
The model loaded with COBRApy.
id (str):
A MetaNetX reaction ID.
reac_str (str, optional):
The reaction equation string from the database. Defaults to None.
references (dict, optional):
Additional annotations to add to the reaction (idtype:[value]). Defaults to {}.
idprefix (str, optional):
Prefix for the pseudo-identifier. Defaults to ‘refineGEMs’.
namespace (Literal[‘BiGG’], optional):
Namespace to use for the reaction ID. If the namespace cannot be matched, the pseudo-ID is used. Defaults to ‘BiGG’.

Returns:

Case successful construction:

cobra.Reaction:
The newly built reaction object.
Case construction not possible:

None:
Nothing to return.
Case reaction found in model:

list:
List of matching reaction IDs (in model).
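
Example (illustrative sketch): the build_reaction_* functions documented here share the same three-way return contract, so a caller has to branch on the type of the result. The snippet below shows one way to do that. `FakeReaction` and `describe_build_result` are hypothetical names introduced for illustration; `FakeReaction` only stands in for cobra.Reaction so that the sketch runs without COBRApy.

```python
class FakeReaction:
    """Hypothetical stand-in for cobra.Reaction (only the id attribute is used)."""

    def __init__(self, id: str):
        self.id = id


def describe_build_result(result) -> str:
    """Hypothetical helper: map the three documented return cases to a message."""
    if result is None:
        # Case: construction not possible
        return "construction not possible"
    if isinstance(result, list):
        # Case: reaction already found in the model
        return "already in model as: " + ", ".join(result)
    # Case: successful construction (a reaction object that is not yet in the model)
    return f"built new reaction {result.id}"


messages = [
    describe_build_result(None),
    describe_build_result(["PGI", "PGI_1"]),
    describe_build_result(FakeReaction("refineGEMs_0001")),
]
```

In the successful case the caller still has to add the returned reaction to the model explicitly.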

refinegems.utility.entities.build_reaction_xxx()[source]

Extend the build functions so that all of them can take either the ID or an equation as input for rebuilding the reaction (this would also be beneficial for semi-manual curation).

Planned signature: (model: cobra.Model, id: str = None, reac_str: str = None, references: dict = {}, idprefix: str = ‘refineGEMs’, namespace: Literal[‘BiGG’] = ‘BiGG’) -> Union[cobra.Reaction, None, list]

refinegems.utility.entities.compare_gene_lists(gps_in_model: pandas.DataFrame, db_genes: pandas.DataFrame, kegg: bool = True) → pandas.DataFrame[source]

Compares the provided tables according to column 0/’Locus_tag’

Args:

gps_in_model (pd.DataFrame):
Table containing the KEGG Gene IDs/Locus tags in the model
db_genes (pd.DataFrame):
Table containing the KEGG Gene IDs for the organism from KEGG/ locus tags (Accession-2) from BioCyc
kegg (bool):
True if KEGG Genes should be extracted, otherwise False

Returns:

pd.DataFrame:: Table containing all missing genes

refinegems.utility.entities.create_fba_units(model: libsbml.Model) → list[libsbml.UnitDefinition][source]

Creates all fba units required for a constraint-based model

Args:

model (libModel):
Model loaded with libSBML

Returns:

list:: List of libSBML UnitDefinitions

refinegems.utility.entities.create_gp(model: libsbml.Model, protein_id: str = None, model_id: str = None, name: str = None, locus_tag: str = None, reference: dict[str, tuple[list | str, bool]] = dict(), sanity_check: bool = True) → None[source]

Creates GeneProduct in the given libSBML model.

Args:

model (libModel):
The model object, loaded with libSBML.
protein_id (str, optional):
(NCBI) Protein ID of the gene. Defaults to None.
model_id (str, optional):
If given, uses this string as the ID of the gene in the model. ID should be identical to ID that CarveMe adds from the NCBI FASTA input file. Defaults to None.
name (str, optional):
Name of the GeneProduct. Defaults to None.
locus_tag (str, optional):
Genome-specific locus tag. Will be used as label in the model. Defaults to None.
reference (dict, optional):
Dictionary containing references for the gene product. The key is the database name, the value is either a set containing the ID(s) or a single ID string. Defaults to an empty dictionary.
sanity_check (bool, optional):
Check whether the locus tag (label) or model ID (ID) already exists in the model. Note that setting this to True increases the runtime. Defaults to True.

refinegems.utility.entities.create_gpr(reaction: libsbml.Reaction, gene: str | list[str]) → None[source]

For a given libSBML Reaction and a gene ID or a list of gene IDs, create a gene-protein-reaction (GPR) rule inside the reaction.

Currently only supports ‘OR’ causality.

Args:

reaction (libsbml.Reaction):
The reaction object to add the GPR to.
gene (str | list[str]):
Either a gene ID or a list of gene IDs, that will be added to the GPR (OR causality).

refinegems.utility.entities.create_random_id(model: cobra.Model, entity_type: Literal['reac', 'meta'] = 'reac', prefix: str = '') → str[source]

Generate a unique, random ID for a model entity.

Args:

model (cobra.Model):
A model loaded with COBRApy.
entity_type (Literal[‘reac’,’meta’], optional):
Type of model entity. Can be ‘reac’ for Reaction or ‘meta’ for Metabolite. Defaults to ‘reac’.
prefix (str, optional):
Prefix to set for the randomised part. Useful to identify the random IDs later on. Defaults to ‘’.

Raises:

ValueError: Unknown entity_type

Returns:

str:: The newly generated, unique ID.
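
Example (illustrative sketch): the general idea is to draw random suffixes until the resulting ID does not collide with an existing one. This is not the refineGEMs implementation. The helper name `make_unique_id`, the plain set of IDs standing in for the model, and the suffix length of 8 are assumptions made for illustration.

```python
import random
import string


def make_unique_id(existing_ids: set, prefix: str = "", length: int = 8) -> str:
    """Hypothetical helper: return prefix + random suffix that is not in existing_ids."""
    alphabet = string.ascii_uppercase + string.digits
    while True:
        candidate = prefix + "".join(random.choices(alphabet, k=length))
        # Retry on the (unlikely) event of a collision with an existing ID
        if candidate not in existing_ids:
            return candidate


# IDs already used in the (hypothetical) model
taken = {"PGI", "PFK", "refineGEMsAAAAAAAA"}
new_id = make_unique_id(taken, prefix="refineGEMs")
```

A recognisable prefix makes it easy to find all randomly named entities later, for example with a simple `startswith` filter.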

refinegems.utility.entities.create_reaction(model: libsbml.Model, reaction_id: str, name: str, reactants: dict[str, int], products: dict[str, int], fluxes: dict[str, str], reversible: bool = None, fast: bool = None, compartment: str = None, sbo: str = None, genes: str | list[str] = None) → tuple[libsbml.Reaction, libsbml.Model][source]

Creates new reaction in the given model

Args:

model (libModel):
Model loaded with libSBML
reaction_id (str):
BiGG ID of the reaction to create
name (str):
Human readable name of the reaction
reactants (dict):
Metabolites as keys and their stoichiometry as values
products (dict):
Metabolites as keys and their stoichiometry as values
fluxes (dict):
Dictionary with lower_bound and upper_bound as keys
reversible (bool):
True/False for the reaction
fast (bool):
True/False for the reaction
compartment (str):
BiGG compartment ID of the reaction (if available)
sbo (str):
SBO term of the reaction
genes (str|list):
List of genes belonging to reaction

Returns:

tuple:

libSBML reaction (1) & libSBML model (2)

Reaction: Created reaction
libModel: Model containing the created reaction

refinegems.utility.entities.create_species(model: libsbml.Model, metabolite_id: str, name: str, compartment_id: str, charge: int, chem_formula: str) → tuple[libsbml.Species, libsbml.Model][source]

Creates Species/Metabolite in the given model

Args:

model (libModel):
Model loaded with libSBML
metabolite_id (str):
Metabolite ID within model (if the model is from CarveMe, preferably a BiGG ID)
name (str):
Name of the metabolite
compartment_id (str):
ID of the compartment where metabolite resides
charge (int):
Charge for the metabolite
chem_formula (str):
Chemical formula for the metabolite

Returns:

tuple:

libSBML Species (1) & libSBML model (2)

Species: Created species/metabolite
libModel: Model containing the created metabolite

refinegems.utility.entities.create_unit(model_specs: tuple[int], meta_id: str, kind: str, e: int, m: int, s: int, uri_is: str = '', uri_idf: str = '') → libsbml.Unit[source]

Creates unit for SBML model according to arguments

Args:

model_specs (tuple):
Level & Version of SBML model
meta_id (str):
Meta ID for unit (necessary for URI)
kind (str):
Unit kind constant (see libSBML for available constants)
e (int):
Exponent of unit
m (int):
Multiplier of unit
s (int):
Scale of unit
uri_is (str):
URI supporting the specified unit
uri_idf (str):
URI supporting the derived from unit

Returns:

Unit:: libSBML unit object

refinegems.utility.entities.create_unit_definition(model_specs: tuple[int], identifier: str, name: str, units: list[libsbml.Unit]) → libsbml.UnitDefinition[source]

Creates unit definition for SBML model according to arguments

Args:

model_specs (tuple):
Level & Version of SBML model
identifier (str):
Identifier for the defined unit
name (str):
Full name of the defined unit
units (list):
All units the defined unit consists of

Returns:

UnitDefinition:: libSBML unit definition object

refinegems.utility.entities.get_compartment_ids(model: cobra.Model | libsbml.Model) → list[str][source]

Get compartment IDs based on model type

Args:

model (Union[cobra.Model, libModel]):
The model loaded with COBRApy or libSBML.

Returns:

list[str]:: List of compartment IDs

Raises:

TypeError: Unknown model object type

refinegems.utility.entities.get_gpid_mapping(model: libsbml.Model, gff_paths: str | list[str] = None, email: str = None, contains_locus_tags: bool = False, outpath: str | Path = None) → pandas.DataFrame[source]

Generate a mapping from model IDs to valid database IDs via model content, GFF files (optional) and NCBI requests (optional).

Warning

Mappings may be incomplete and require manual adjustment.

If locus tags from Genbank or RefSeq are present within the GeneProduct identifier and no GFF is provided, these will be added incorrectly to the annotations as NCBI Protein or RefSeq URIs.

Note

Mapping currently is only implemented for NCBI Protein and RefSeq IDs. Other databases may be added in the future.

Args:

model (libModel):
Model loaded with libSBML
gff_paths (str|list[str]):
Path(s) to GFF file(s). Allowed GFF formats are: RefSeq, NCBI and Prokka. Defaults to None.
email (str):
E-mail for NCBI queries. Defaults to None.
contains_locus_tags (bool, optional):
Specifies if provided model has locus tags within the label tag if set to True. Defaults to False.
outpath (str|Path, optional):
Output path for location where the generated mapping table should be written to. Defaults to None.

Returns:

pd.DataFrame:: Mapping from model IDs to valid database IDs

refinegems.utility.entities.get_model_reacs_or_metabs(model_libsbml: libsbml.Model, metabolites: bool = False, col_name: str = 'bigg_id') → pandas.DataFrame[source]

Extracts table of reactions/metabolites with BiGG IDs from model

Args:

model_libsbml (libModel):
Model loaded with libSBML
metabolites (bool):
Set to True if metabolites from model should be extracted
col_name (str):
Name to be used for column in table. Defaults to ‘bigg_id’.

Returns:

pd.DataFrame:: Table with model identifiers for either metabolites or reactions

refinegems.utility.entities.get_reaction_annotation_dict(model: cobra.Model, db: Literal['KEGG', 'BiGG']) → dict[source]

Create a dictionary of a model’s reaction IDs and a chosen database ID as saved in the annotations of the model.

The database ID can be chosen based on the strings for the namespace options in other functions.

Args:

model (cobra.Model):
A model loaded with COBRApy.
db (Literal[‘KEGG’,’BiGG’]):
The string denoting the database to map to.

Raises:

ValueError: Unknown database string for parameter db

Returns:

dict:: The mapping of the reaction IDs to the database IDs found in the annotations

refinegems.utility.entities.get_reversible(fluxes: dict[str, str]) → bool[source]

Infer if reaction is reversible from flux bounds

Args:

fluxes (dict):
Dictionary containing the keys ‘lower_bound’ & ‘upper_bound’ with values in [‘cobra_default_lb’, ‘cobra_0_bound’, ‘cobra_default_ub’]

Returns:

bool:: True if reversible else False
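
Example (illustrative sketch): one plausible reading of the rule is that a reaction counts as reversible only when both bounds are the COBRApy defaults, so that flux can run in either direction. The function name `infer_reversible` and this exact rule are assumptions; the bound labels are the ones listed above.

```python
def infer_reversible(fluxes: dict) -> bool:
    """Assumed rule: reversible only if the lower and upper default bounds are both used."""
    return (
        fluxes["lower_bound"] == "cobra_default_lb"
        and fluxes["upper_bound"] == "cobra_default_ub"
    )


# Flux may only run in the forward direction
forward_only = {"lower_bound": "cobra_0_bound", "upper_bound": "cobra_default_ub"}
# Flux may run in both directions
both_ways = {"lower_bound": "cobra_default_lb", "upper_bound": "cobra_default_ub"}
```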

refinegems.utility.entities.isreaction_complete(reac: cobra.Reaction, name_check: bool = False, formula_check: Literal['none', 'existence', 'wildcard', 'strict'] = 'existence', exclude_dna: bool = True, exclude_rna: bool = True) → bool[source]

Check whether a reaction object can be considered a complete reaction. The parameters set the strictness of the check. Useful for checking whether a reaction can be added to a model or whether it might break it. For example, a missing model ID in either the reaction or its metabolites returns False.

Args:

reac (cobra.Reaction):
The reaction object (COBRApy) to test.
name_check (bool, optional):
Option to force reaction and metabolites to have the name attribute set. Defaults to False.
formula_check (Literal[‘none’,’existence’,’wildcard’,’strict’], optional):
Option to check the formula. ‘none’ disables the check, ‘existence’ tests whether a formula is set, ‘wildcard’ additionally checks for wildcards (returns False if one is found) and ‘strict’ also checks for the rest symbol ‘R’. Defaults to ‘existence’.
exclude_dna (bool, optional):
Option to set DNA reactions to invalid. Defaults to True.
exclude_rna (bool, optional):
Option to set RNA reactions to invalid. Defaults to True.

Returns:

bool:: True, if the checks are passed successfully, else False.
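
Example (illustrative sketch): the four formula_check levels build on each other, which can be expressed as a cascade of tests on a single formula string. The helper `formula_passes` is hypothetical, and two details are assumptions: ‘*’ is taken as the wildcard character, and the rest symbol is detected as an ‘R’ that is not followed by a lowercase letter (so that elements such as Rb or Ru are not flagged).

```python
import re
from typing import Optional


def formula_passes(formula: Optional[str], mode: str = "existence") -> bool:
    """Hypothetical helper mirroring the documented strictness levels for one formula."""
    if mode not in ("none", "existence", "wildcard", "strict"):
        raise ValueError(f"Unknown mode: {mode}")
    if mode == "none":
        return True  # check disabled
    if not formula:
        return False  # 'existence' and stricter levels require a formula
    if mode in ("wildcard", "strict") and "*" in formula:
        return False  # assumed wildcard character
    if mode == "strict" and re.search(r"R(?![a-z])", formula):
        return False  # rest symbol 'R' (assumed detection rule)
    return True
```

For a whole reaction, such a test would be applied to every metabolite, and the reaction would fail as soon as one metabolite fails.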

refinegems.utility.entities.match_id_to_namespace(model_entity: cobra.Reaction | cobra.Metabolite, namespace: Literal['BiGG']) → None[source]

Based on a given namespace, change the ID of a given model entity to the set namespace.

Currently working namespaces:

BiGG

Args:

model_entity (cobra.Reaction | cobra.Metabolite):
The model entity. Can be either a cobra.Reaction or cobra.Metabolite object.
namespace (Literal[‘BiGG’]):
The chosen namespace.

Raises:

ValueError: Unknown input for namespace
TypeError: Unknown type for model_entity

refinegems.utility.entities.parse_reac_str(equation: str, type: Literal['BiGG', 'BioCyc', 'MetaNetX', 'KEGG'] = 'MetaNetX') → tuple[dict, dict, list, bool][source]

Parse a reaction string.

Args:

equation (str):
The equation of a reaction as a string (as saved in the database).
type (Literal[‘BiGG’,’BioCyc’,’MetaNetX’,’KEGG’], optional):
The name of the database the equation was taken from. Can be ‘BiGG’,’BioCyc’,’MetaNetX’,’KEGG’. Defaults to ‘MetaNetX’.

Returns:

tuple:

Tuple of (1) dict, (2) dict, (3) list & (4) bool:

Dictionary with the reactant IDs and their stoichiometric factors.
Dictionary with the product IDs and their stoichiometric factors.
List of compartment IDs or None, if they cannot be extracted from the equation.
True, if the reaction is reversible, else False.
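
Example (illustrative sketch): the shape of the returned tuple is easiest to see on a toy parser. The code below handles only a simplified, hypothetical equation syntax (terms joined by ‘ + ’, ‘<=>’ for reversible and ‘-->’ for irreversible reactions, no compartment information), not the database-specific formats that parse_reac_str supports. All function names are assumptions.

```python
from typing import Optional


def is_number(token: str) -> bool:
    """Return True if the token can be read as a stoichiometric factor."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_side(side: str) -> dict:
    """Parse one side of the equation into {metabolite ID: factor}."""
    result = {}
    for term in side.split(" + "):
        tokens = term.split()
        if not tokens:
            continue
        if len(tokens) == 2 and is_number(tokens[0]):
            factor, met_id = float(tokens[0]), tokens[1]
        else:
            factor, met_id = 1.0, tokens[-1]  # no explicit factor given
        result[met_id] = result.get(met_id, 0.0) + factor
    return result


def parse_equation(equation: str) -> tuple:
    """Return (reactants, products, compartments, reversible) for the toy syntax."""
    if "<=>" in equation:
        arrow, reversible = "<=>", True
    elif "-->" in equation:
        arrow, reversible = "-->", False
    else:
        raise ValueError("No reaction arrow found in equation")
    left, right = equation.split(arrow, 1)
    compartments: Optional[list] = None  # not encoded in the toy syntax
    return parse_side(left), parse_side(right), compartments, reversible


parsed = parse_equation("2 atp + h2o <=> adp + pi")
```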

refinegems.utility.entities.reaction_equation_to_dict(eq: str, model: cobra.Model) → dict[source]

Parses a reaction equation string to dictionary

Args:

eq (str):
Equation of a reaction
model (cobra.Model):
Model loaded with COBRApy

Returns:

dict:: Metabolite Ids as keys and their coefficients as values (negative = reactants, positive = products)
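
Example (illustrative sketch): the sign convention of the returned dictionary can be shown by merging separate reactant and product dictionaries into one signed mapping. `to_signed_coefficients` is a hypothetical helper that works on plain metabolite IDs; it does not look anything up in a model.

```python
def to_signed_coefficients(reactants: dict, products: dict) -> dict:
    """Hypothetical helper: negative coefficients for reactants, positive for products."""
    signed = {met: -abs(coef) for met, coef in reactants.items()}
    for met, coef in products.items():
        # A metabolite on both sides ends up with its net coefficient
        signed[met] = signed.get(met, 0.0) + abs(coef)
    return signed


coefficients = to_signed_coefficients(
    {"atp_c": 1.0, "h2o_c": 1.0},
    {"adp_c": 1.0, "pi_c": 1.0, "h_c": 1.0},
)
```

COBRApy's `Reaction.add_metabolites` accepts a dictionary in this signed form, which is presumably why this convention is used here.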

refinegems.utility.entities.remove_non_essential_genes(model: cobra.Model, genes_to_check: list[cobra.Gene] | None = None, keep_in_complex: bool = True, remove_reactions: bool = True, min_growth_threshold: float = MIN_GROWTH_THRESHOLD) → tuple[int, int, int][source]

Remove non-essential genes based on gene knock-out.

Args:

model (cobra.Model):
The model to delete the genes from.
genes_to_check (Union[list[cobra.Gene],None], optional):
List of genes to check. If None, all genes of the model are checked. Defaults to None.
keep_in_complex (bool, optional):
Check whether a gene is part of a complex (GPR connection with AND). If True, keep such genes, even if they are non-essential. Defaults to True.
remove_reactions (bool, optional):
If set to True, also remove corresponding reactions. Defaults to True.
min_growth_threshold (float, optional):
Minimal value for growth value. Defaults to MIN_GROWTH_THRESHOLD.

Returns:

tuple[int,int,int]:

Tuple of (1) int, (2) int & (3) int:

Number of deleted genes.
Number of essential genes.
Number of non-essential genes in a complex.

refinegems.utility.entities.resolve_compartment_names(model: cobra.Model | libsbml.Model) → None[source]

Resolves compartment naming problems.

Args:

model (Union[cobra.Model, libModel]):
The model loaded with COBRApy or libSBML.

Raises:

KeyError: Unknown compartment raises an error to add it to the mapping. Important for developers.
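
Example (illustrative sketch): assuming the function relies on a lookup table such as COMP_MAPPING from the util module, the core step is a dictionary lookup that fails loudly for unknown names. `normalise_compartments` is a hypothetical helper that works on a plain list of compartment IDs instead of a model; the mapping is copied from the util module documentation.

```python
COMP_MAPPING = {
    "": "uc", "C_c": "c", "C_e": "e", "C_p": "p", "c": "c",
    "cytosol": "c", "e": "e", "extracellular": "e", "p": "p", "w": "uc",
}


def normalise_compartments(compartment_ids: list) -> list:
    """Hypothetical helper: map each compartment ID to its standard short name."""
    try:
        return [COMP_MAPPING[comp] for comp in compartment_ids]
    except KeyError as err:
        # Unknown names are reported so that they can be added to the mapping
        raise KeyError(f"Unknown compartment {err.args[0]!r}: add it to the mapping.") from err


resolved = normalise_compartments(["C_c", "extracellular", "w"])
```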

refinegems.utility.entities.validate_reaction_compartment_bigg(comps: list) → bool | Literal['exchange'][source]

Retrieves and validates the compartment(s) of a reaction from the BiGG namespace

Args:

comps (list):
List containing compartments from a BiGG reaction

Returns:

Case compartment in VALID_COMPARTMENTS.keys()

bool|’exchange’:
Either

True if the provided compartments are valid

‘exchange’ if reaction in multiple compartments

Case not a valid compartment:

bool:
‘False’ if one of the found compartments is not in VALID_COMPARTMENTS
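
Example (illustrative sketch): the three documented outcomes can be reproduced with two checks on the compartment list. `validate_compartments` is a hypothetical name, the table is copied from VALID_COMPARTMENTS in the util module, and returning False for an empty list is an assumption.

```python
from typing import Literal, Union

VALID_COMPARTMENTS = {
    "c": "cytosol",
    "e": "extracellular space",
    "p": "periplasm",
    "uc": "unknown compartment",
}


def validate_compartments(comps: list) -> Union[bool, Literal["exchange"]]:
    """Hypothetical helper mirroring the documented return cases."""
    if not comps or any(comp not in VALID_COMPARTMENTS for comp in comps):
        return False  # at least one compartment is not valid (or none given: assumption)
    if len(set(comps)) > 1:
        return "exchange"  # reaction spans multiple compartments
    return True  # a single, valid compartment
```

Because ‘exchange’ is a non-empty string and therefore truthy, callers should compare the result explicitly (for example `result is True`) instead of relying on truthiness.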

io module

Provides functions to load and write models, media definitions and the manual annotation table

Depending on the application the model needs to be loaded with COBRApy (e.g. memote) or with libSBML (e.g. activation of groups). Some might even require both (e.g. gap filling). The manual_annotations table has to follow the specific layout given in the data folder in order to work with this module.

refinegems.utility.io.convert_cobra_to_libsbml(cmodel: cobra.Model, add_label_locus: None | Literal['notes', 'id'] = None) → libsbml.Model[source]

Convert a loaded COBRApy model to a libsbml model. If possible, also add the locus tags as labels to the libsbml model.

Args:

cmodel (cobra.Model):
The model loaded with COBRApy.
add_label_locus (None | Literal[‘notes’, ‘id’], optional):
Option to add locus tags as labels to the libsbml model. They can be added via ‘id’, if the model ID corresponds to the locus tag, or via ‘notes’, if the locus tag is saved under ‘notes’ - ‘locus_tag’. None skips adding locus tags as labels. Defaults to None.

Returns:

libsbml.Model:: The model loaded with libsbml.

refinegems.utility.io.create_missing_genes_protein_fasta(fasta: str, missing_genes: pandas.DataFrame, outdir: str = None) → str | None[source]

Creates a FASTA file containing proteins for missing_genes

Note

Please keep in mind that the input FASTA file has to have the Genbank format as described in mimic_genbank().

Args:

fasta (str):
Path to the FASTA protein file.
missing_genes (pd.DataFrame):
The table of missing genes.
outdir (str, optional):
Path to a directory to write the output to. Defaults to None.

Returns:

Case 1: FASTA created successfully

str: Path to the FASTA protein file for the missing genes.

Case 2: No missing genes or Error in mapping.

None

refinegems.utility.io.load_a_table_from_database(table_name_or_query: str, query: bool = True) → pandas.DataFrame[source]

Loads the table for which the name is provided or a table containing all rows for which the query evaluates to true from the refineGEMs database (‘data/database/data.db’)

Args:

table_name_or_query (str):
Name of a table contained in the database ‘data.db’/ a SQL query
query (bool):
Specifies if a query or a table name is provided with table_name_or_query

Returns:

pd.DataFrame:: Containing the table for which the name was provided from the database ‘data.db’

refinegems.utility.io.load_document_libsbml(modelpath: str) → libsbml.SBMLDocument[source]

Loads model document using libSBML

Args:

modelpath (str):
Path to GEM

Returns:

SBMLDocument:: Loaded document by libSBML

refinegems.utility.io.load_model(modelpath: str | list[str], package: Literal['cobra', 'libsbml']) → cobra.Model | list[cobra.Model] | libsbml.Model | list[libsbml.Model][source]

Load a model.

Args:

modelpath (str | list[str]):
Path to the model or list of paths to models (string format).
package (Literal[‘cobra’,’libsbml’]):
Package to use to load the model.

Returns:

cobra.Model|list[cobra.Model]|libModel|list[libModel]:: The loaded model(s).

refinegems.utility.io.load_subset_from_db(subset_name: str) → tuple[str, str, pandas.DataFrame][source]

Load a subset from the database.

Args:

subset_name(str):
Name of the subset to be loaded.

Returns:

tuple of (1) str, (2) str and (3) pd.DataFrame

name of the subset
description of the subset
substance table for the subset

refinegems.utility.io.load_substance_table_from_db(mediumname: str, database: str, type: Literal['testing', 'standard'] = 'standard') → pandas.DataFrame[source]

Load a substance table from a database.

Currently available types:

‘testing’: for debugging
‘standard’: The standard format containing all information in long format.

Note: ‘documentation’ is currently subject to change

Args:

mediumname (str):
The name (or identifier) of the medium.
database (str):
Path to the database.
type (Literal[‘testing’,’standard’], optional):
How to load the table. Defaults to ‘standard’.

Raises:

ValueError: Unknown type for loading the substance table.

Returns:

pd.DataFrame:: The substance table in the specified type retrieved from the database.

refinegems.utility.io.mimic_genbank(annot_genome: str | Path, gff: str | Path = None, dir: str = None) → Path[source]

Wrapper for mimic_genbank_fasta() and mimic_genbank_gbff(). Generate a protein FASTA file that looks similar to the extended GenBank format one can download from the FTP servers of NCBI with the “_translated_CDS” tag in their file name.

Mainly, this format contains additional information about the sequences in the header of each entry in the following format:

>name [protein_id=xxx] [locus_tag=yyy] …

Args:

annot_genome (Union[str,Path]):
An annotated genome file. Can be a FASTA (.faa, .fa) or GBFF file
gff (Union[str,Path], optional):
Path to a GFF file. Only necessary when annot_genome is a FASTA file. Defaults to None.
dir (str, optional):
Path to a directory for saving the output to. Defaults to None, which uses the current working directory.

Raises:

ValueError: Mimic of GenBank format with a FASTA-file requires also a GFF.
ValueError: Unknown file extension

Returns:

Path:: Path to the generated FASTA file mimicking the GenBank format.
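
Example (illustrative sketch): the header layout ‘>name [protein_id=xxx] [locus_tag=yyy] …’ is simple enough to build and parse with a few lines of standard-library code. `build_header` and `parse_header` are hypothetical helpers, and the identifiers used below are made-up placeholders, not real accessions.

```python
import re


def build_header(name: str, **qualifiers: str) -> str:
    """Hypothetical helper: build a GenBank-like FASTA header line."""
    fields = " ".join(f"[{key}={value}]" for key, value in qualifiers.items())
    return f">{name} {fields}".rstrip()


def parse_header(header: str) -> dict:
    """Hypothetical helper: extract all [key=value] qualifiers from a header line."""
    return dict(re.findall(r"\[(\w+)=([^\]]*)\]", header))


# Placeholder identifiers for illustration only
header = build_header("seq_1", protein_id="PROT_000001.1", locus_tag="ABC_0001")
qualifiers = parse_header(header)
```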

refinegems.utility.io.mimic_genbank_fasta(annot_genome: str | Path, gff_path: str | Path, dir: str = None) → Path[source]

Using a protein FASTA e.g. from Prokka and a GFF file, generate a FASTA-file, that mimics the GenBank-format required for e.g. GapFiller.

Args:

annot_genome (Union[str,Path]):
Path to the annotated genome FASTA (protein sequences).
gff_path (Union[str,Path]):
Path to the GFF file.
dir (str, optional):
Path to a directory used to write the output to. Defaults to None, which uses the current working directory.

Returns:

Path:: The path to the generated FASTA file mimicking the GenBank format.

refinegems.utility.io.mimic_genbank_gbff(gbff_path: str | Path, dir: str = None) → Path[source]

Using a GBFF-file, generate a FASTA that mimics the GenBank-format required for e.g. GapFiller.

Args:

gbff_path (Union[str,Path]):
Path to the input GBFF file.
dir (str, optional):
Path to a directory used to write the output to. Defaults to None, which uses the current working directory.

Returns:

Path:: Path to the generated file mimicking the GenBank format.

refinegems.utility.io.parse_dict_to_dataframe(str2list: dict) → pandas.DataFrame[source]

Parses a dictionary of the form {str: list} and transforms it into a table with a column containing the strings and a column containing the lists

Args:

str2list (dict):: Dictionary mapping strings to lists

Returns:

pd.DataFrame:: Table with column containing the strings and column containing the lists

refinegems.utility.io.parse_gbff_for_cds(file_path: str, extract_translation: bool = False) → pandas.DataFrame[source]

Retrieves a table containing information about the following qualifiers from a Genbank file: [‘protein_id’,’locus_tag’,’db_xref’,’old_locus_tag’,’EC_number’].

Args:

file_path (str):
Path to the Genbank (.gbff) file.

Returns:

pd.DataFrame:: A table containing the information above. Has the following columns= [‘ncbi_accession_version’, ‘locus_tag_ref’,’old_locus_tag’,’GeneID’,’EC number’].

refinegems.utility.io.parse_gff_for_cds(gffpath: str, keep_attributes: dict[str, str] = None) → pandas.DataFrame | tuple[pandas.DataFrame, str][source]

Parses a GFF file to obtain a mapping for the corresponding attributes listed in keep_attributes

Args:

gffpath (str):
Path to the GFF file.
keep_attributes (dict, optional):
Dictionary of attributes to be kept and the corresponding column for the table. Defaults to None.

Returns:

Case: return_variety = False

pd.DataFrame:
Dataframe containing a mapping for the corresponding attributes listed in keep_attributes
Case: return_variety = True
tuple:
tuple of (1) pd.DataFrame & (2) str:

pd.DataFrame: Dataframe containing a mapping for the corresponding attributes listed in keep_attributes

str: Found variety for provided GFF

refinegems.utility.io.search_sbo_label(sbo_number: str) → str[source]

Looks up the SBO label corresponding to a given SBO Term number

Args:

sbo_number (str):
Last three digits of SBO-Term as str

Returns:

str:: Denoted label for given SBO Term

refinegems.utility.io.validate_libsbml_model(model: libsbml.Model) → int[source]

Debug method: Validates a libSBML model with the libSBML validator

Args:

model (libModel):
A libSBML model

Returns:

int:: Integer specifying whether the validation was successful or not

refinegems.utility.io.write_model_to_file(model: libsbml.Model | cobra.Model, filename: str | Path)[source]

Save a model into a file.

Args:

model (libModel|cobra.Model):
The model to be saved
filename (str|Path):
The filename/path to save the model to.

Raises:

ValueError: Unknown file extension for model
TypeError: Unknown model type

set_up module

Collection of functions for setting up files and work environments.

refinegems.utility.set_up.PATH_MEDIA_CONFIG

refinegems.utility.set_up.download_config(filename: str = './my_config.yaml', type=Literal['media'])[source]

Load a configuration file from the package and save a copy of it for the user to edit.

Args:

filename (str, optional):
Filename to write the config to/save it under as. Defaults to ‘./my_config.yaml’.
type (Literal[‘media’], optional):
Type of configuration file to load. Can be ‘media’ for the media config file. Defaults to Literal[‘media’].

refinegems.utility.set_up.download_url(download_type: Literal['SwissProt gapfill'], directory: str = None, k: int = 10, t: int = 1)[source]

Download files necessary for certain functionalities of the toolbox from the internet.

Currently available:

‘SwissProt gapfill’: download files needed for the GeneGapFiller

Args:

download_type (Literal[‘SwissProt gapfill’]):
Type of files to download.
directory (str, optional):
Path to a directory to save the downloaded files to. Defaults to None (So the current working directory is used).
k (int, optional):
Chunksize in kB. Defaults to 10.
t (int, optional):
Number of threads to use for some additional setups, e.g. DIAMOND database creation. Defaults to 1.

Raises:

ValueError: Unknown database or file

util module

Collection of utility functions.

refinegems.utility.util.COMP_MAPPING = {'': 'uc', 'C_c': 'c', 'C_e': 'e', 'C_p': 'p', 'c': 'c', 'cytosol': 'c', 'e': 'e', 'extracellular': 'e', 'p': 'p', 'w': 'uc'}

refinegems.utility.util.DB2REGEX

refinegems.utility.util.MIN_GROWTH_THRESHOLD = 1e-05

refinegems.utility.util.SBO_BIOCHEM_TERMS = ['SBO:0000377', 'SBO:0000399', 'SBO:0000402', 'SBO:0000403', 'SBO:0000660', 'SBO:0000178', 'SBO:0000200', 'SBO:0000214', 'SBO:0000215', 'SBO:0000217', 'SBO:0000218', 'SBO:0000219', 'SBO:0000220', 'SBO:0000222', 'SBO:0000223', 'SBO:0000233', 'SBO:0000376', 'SBO:0000401']

refinegems.utility.util.SBO_TRANSPORT_TERMS = ['SBO:0000658', 'SBO:0000657', 'SBO:0000654', 'SBO:0000659', 'SBO:0000660']

refinegems.utility.util.VALID_COMPARTMENTS = {'c': 'cytosol', 'e': 'extracellular space', 'p': 'periplasm', 'uc': 'unknown compartment'}

class refinegems.utility.util._BioregistryPatternMap[source]

Bases: object

Lazy bioregistry pattern map for optional annotation features.

__getitem__(key: str) → str[source]

get(key: str, default=None)[source]
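
Example (illustrative sketch): a lazy map of this kind typically defers loading its data until the first lookup, so that an optional dependency is only needed when the feature is actually used. The class below shows that general pattern. It is not the refineGEMs implementation; `LazyPatternMap`, the loader callable and the single example pattern are assumptions.

```python
from typing import Callable, Optional


class LazyPatternMap:
    """Hypothetical lazy mapping: the loader runs once, on first access."""

    def __init__(self, loader: Callable[[], dict]):
        self._loader = loader
        self._patterns: Optional[dict] = None

    def _ensure_loaded(self) -> dict:
        if self._patterns is None:
            self._patterns = self._loader()
        return self._patterns

    def __getitem__(self, key: str) -> str:
        return self._ensure_loaded()[key]

    def get(self, key: str, default=None):
        return self._ensure_loaded().get(key, default)


calls = []


def load_patterns() -> dict:
    """Stand-in loader; a real one might import an optional package here."""
    calls.append(1)
    return {"kegg.reaction": r"^R\d+$"}  # illustrative pattern only


patterns = LazyPatternMap(load_patterns)
calls_before_access = len(calls)  # nothing loaded yet
kegg_pattern = patterns["kegg.reaction"]
fallback = patterns.get("missing", "n/a")
```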

refinegems.utility.util.insert_into_dict(dictionary: dict, new_key_vals: Tuple[str, int], target_key: str, mode: Literal['before', 'after'] = 'before') → dict[source]
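
Example (illustrative sketch): this function has no description, so the following is only a guess at its behaviour based on the signature, namely inserting a new key-value pair directly before or after an existing key while preserving the order of the remaining entries. `insert_relative` is a hypothetical name, and the error handling is an assumption.

```python
def insert_relative(dictionary: dict, new_key_val: tuple, target_key: str, mode: str = "before") -> dict:
    """Hypothetical helper: return a new dict with new_key_val placed next to target_key."""
    if mode not in ("before", "after"):
        raise ValueError(f"Unknown mode: {mode}")
    if target_key not in dictionary:
        raise KeyError(target_key)
    new_key, new_val = new_key_val
    result = {}
    for key, value in dictionary.items():
        if key == target_key and mode == "before":
            result[new_key] = new_val
        result[key] = value
        if key == target_key and mode == "after":
            result[new_key] = new_val
    return result


before = insert_relative({"a": 1, "c": 3}, ("b", 2), "c", mode="before")
after = insert_relative({"a": 1, "c": 3}, ("b", 2), "a", mode="after")
```

The sketch relies on Python dictionaries preserving insertion order, which is guaranteed since Python 3.7.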

refinegems.utility.util.is_stoichiometric_factor(s: str) → bool[source]

Check if a string could be used as a stoichiometric factor.

Args:

s (str):
The string to check.

refinegems.utility.util.reannotate_sbo_memote(model: cobra.Model)[source]

Reannotate the SBO annotations (e.g. from SBOannotator) of a model into the SBO scheme accessible by memote.

Args:

model (cobra.Model):
The cobra Model to be reannotated.

refinegems.utility.util.sum_biomass_weight(reaction: libsbml.Reaction) → float[source]

From MEMOTE: https://github.com/opencobra/memote/blob/81a55a163262a0e06bfcb036d98e8e551edc3873/src/memote/support/biomass.py#L95

Compute the sum of all reaction compounds.

This function expects all metabolites of the biomass reaction to have formula information assigned.

Note

If there is a rest symbolised by an “R”, its weight will be considered 0.

Args:

reaction (Reaction):
The biomass reaction of the model under investigation.

Returns:

float:: The molecular weight of the biomass reaction in units of g/mmol.
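
Example (illustrative sketch): the weight of a biomass reaction can be approximated from chemical formulas alone. The code below parses simple formulas (no brackets or charges), treats the rest symbol ‘R’ as weightless as stated in the note above, and sums coefficient times molecular weight. The helper names, the small table of rounded atomic masses, and the use of plain dictionaries instead of a libSBML reaction are assumptions; the sign flip (consumed metabolites have negative coefficients) and the division by 1000 follow the MEMOTE approach referenced above.

```python
import re

# Rounded standard atomic masses in g/mol; 'R' (rest) counts as 0
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06, "R": 0.0}


def formula_weight(formula: str) -> float:
    """Molecular weight in g/mol of a simple formula such as 'C6H12O6'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def biomass_weight(stoichiometry: dict, formulas: dict) -> float:
    """Sum of -coefficient * weight over all metabolites, divided by 1000."""
    return sum(
        -coef * formula_weight(formulas[met]) for met, coef in stoichiometry.items()
    ) / 1000.0


water = formula_weight("H2O")
with_rest = formula_weight("CH3R")
toy_weight = biomass_weight({"h2o_c": -1000.0}, {"h2o_c": "H2O"})
```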

refinegems.utility.util.test_biomass_consistency(model: cobra.Model, reaction_id: str) → float | str[source]

Modified from MEMOTE: https://github.com/opencobra/memote/blob/81a55a163262a0e06bfcb036d98e8e551edc3873/src/memote/suite/tests/test_biomass.py#L89

Expect biomass components to sum up to 1 g[CDW].

This test only yields sensible results if all biomass precursor metabolites have chemical formulas assigned to them. The molecular weight of the biomass reaction in metabolic models is defined to be equal to 1 g/mmol. Conforming to this is essential in order to be able to reliably calculate growth yields, to cross-compare models, and to obtain valid predictions when simulating microbial consortia. A deviation from 1 - 1E-03 to 1 + 1E-06 is accepted.

Implementation: Multiplies the coefficient of each metabolite of the biomass reaction with its molecular weight calculated from the formula, then divides the overall sum of all the products by 1000.

Args:

model(cobraModel):
The model loaded with COBRApy.
reaction_id(str):
Reaction ID of a BOF.

Returns:

Case: problematic input

str:
an error message.
Case: successful testing

float:
biomass weight
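
Example (illustrative sketch): the accepted deviation stated above corresponds to a simple interval check on the computed biomass weight. `weight_within_tolerance` is a hypothetical helper, and treating both bounds as exclusive is an assumption.

```python
def weight_within_tolerance(weight: float) -> bool:
    """Hypothetical helper: True if the biomass weight lies between 1 - 1E-03 and 1 + 1E-06."""
    return (1 - 1e-03) < weight < (1 + 1e-06)


checks = {
    "exact": weight_within_tolerance(1.0),
    "slightly_low": weight_within_tolerance(0.9995),
    "too_low": weight_within_tolerance(0.998),
    "too_high": weight_within_tolerance(1.00001),
}
```

The range is asymmetric: a biomass weight slightly below 1 is tolerated far more than one above 1.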

refinegems.utility.util.test_biomass_presence(model: cobra.Model) → list[str] | None[source]

Modified from MEMOTE: https://github.com/opencobra/memote/blob/81a55a163262a0e06bfcb036d98e8e551edc3873/src/memote/suite/tests/test_biomass.py#LL42C3-L42C3

Expect the model to contain at least one biomass reaction.

The biomass composition aka biomass formulation aka biomass reaction is a common pseudo-reaction accounting for biomass synthesis in constraints-based modelling. It describes the stoichiometry of intracellular compounds that are required for cell growth. While this reaction may not be relevant to modeling the metabolism of higher organisms, it is essential for single-cell modeling.

Implementation: Identifies possible biomass reactions using two principal steps:

1. Return reactions that include the SBO annotation “SBO:0000629” for biomass.

2. If no reactions can be identified this way:

a. Look for the buzzwords “biomass”, “growth” and “bof” in reaction IDs.

b. Look for metabolite IDs or names that contain the buzzword “biomass” and obtain the set of reactions they are involved in.

c. Remove boundary reactions from this set.

d. Return the union of the reactions that match the buzzwords and the reactions involving metabolites that match the buzzword.

This test checks if at least one biomass reaction is present.

If no reaction can be identified return None.
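
Example (illustrative sketch): the two-step search described above can be expressed on plain dictionaries. Each reaction is represented here by a dict with the hypothetical keys ‘id’, ‘sbo’, ‘metabolites’ (a list of metabolite IDs or names) and ‘boundary’; these keys and the function name are assumptions and do not reflect the COBRApy data model.

```python
from typing import Optional

BUZZWORDS = ("biomass", "growth", "bof")


def find_biomass_reactions(reactions: list) -> Optional[list]:
    """Hypothetical helper: return the IDs of candidate biomass reactions, or None."""
    # Step 1: reactions annotated with the biomass SBO term
    by_sbo = [r["id"] for r in reactions if r.get("sbo") == "SBO:0000629"]
    if by_sbo:
        return by_sbo
    # Step 2, fallback: buzzwords in reaction IDs ...
    by_id = {r["id"] for r in reactions if any(w in r["id"].lower() for w in BUZZWORDS)}
    # ... and non-boundary reactions that involve a "biomass" metabolite
    by_metabolite = {
        r["id"]
        for r in reactions
        if not r.get("boundary", False)
        and any("biomass" in m.lower() for m in r.get("metabolites", []))
    }
    found = sorted(by_id | by_metabolite)
    return found or None


toy_model = [
    {"id": "PGI", "sbo": "SBO:0000176", "metabolites": ["g6p_c", "f6p_c"]},
    {"id": "Growth", "metabolites": ["atp_c"]},
    {"id": "BM_PROD", "metabolites": ["biomass_c"]},
    {"id": "SK_bm", "metabolites": ["biomass_c"], "boundary": True},
]
candidates = find_biomass_reactions(toy_model)
```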