refinegems.classes subpackage

egcs module

Identify, report and solve energy generating cycles (EGCs).

refinegems.classes.egcs.DISSIPATION_RXNS = {'ACCOA': {'Acetate [Acetic acid]': 1, 'Acetyl-CoA': -1, 'Coenzyme A': 1, 'Hydrogen [H(+)]': 1, 'Water [H2O]': -1}, 'ATP': {'ADP [Adenosine diphosphate]': 1, 'ATP [Adenosine triphosphate]': -1, 'Hydrogen [H(+)]': 1, 'Phosphate [PO4(3-)]': 1, 'Water [H2O]': -1}, 'CTP': {'CDP [Cytidine diphosphate]': 1, 'CTP [Cytidine triphosphate]': -1, 'Hydrogen [H(+)]': 1, 'Phosphate [PO4(3-)]': 1, 'Water [H2O]': -1}, 'DMMQL8': {'2-Demethylmenaquinol-8': 1, '2-Demethylmenaquinone-8': -1, 'Hydrogen [H(+)]': 2}, 'FADH2': {'FAD [oxidized Flavin adenine dinucleotide]': 1, 'FADH2 [reduced Flavin adenine dinucleotide]': -1, 'Hydrogen [H(+)]': 2}, 'FMNH2': {'FMN [oxidized Flavin mononucleotide]': 1, 'FMNH2 [reduced Flavin mononucleotide]': -1, 'Hydrogen [H(+)]': 2}, 'GLU': {'2-Oxoglutarate [Oxoglutaric acid]': 1, 'Ammonia': 1, 'D-Glucose': -1, 'Hydrogen [H(+)]': 2, 'Water [H2O]': -1}, 'GTP': {'GDP [Guanosine diphosphate]': 1, 'GTP [Guanosine triphosphate]': -1, 'Hydrogen [H(+)]': 1, 'Phosphate [PO4(3-)]': 1, 'Water [H2O]': -1}, 'ITP': {'Hydrogen [H(+)]': 1, 'IDP [Inosine diphosphate]': 1, 'ITP [Inosine triphosphate]': -1, 'Phosphate [PO4(3-)]': 1, 'Water [H2O]': -1}, 'MQL8': {'Hydrogen [H(+)]': 2, 'Menaquinol-8': 1, 'Menaquinone-8': -1}, 'NADH': {'Hydrogen [H(+)]': 1, 'NAD [oxidized Nicotinamide adenine dinucleotide]': 1, 'NADH [reduced Nicotinamide adenine dinucleotide]': -1}, 'NADPH': {'Hydrogen [H(+)]': 1, 'NADP [oxidized Nicotinamide adenine dinucleotide phosphate]': 1, 'NADPH [reduced Nicotinamide adenine dinucleotide phosphate]': -1}, 'PROTON': {'Hydrogen [H(+)]': 1, 'Hydrogen [H(+)] transported': -1}, 'Q8H2': {'Hydrogen [H(+)]': 2, 'Ubiquinol-8': 1, 'Ubiquinone-8': -1}, 'UTP': {'Hydrogen [H(+)]': 1, 'Phosphate [PO4(3-)]': 1, 'UDP [Uridine diphosphate]': 1, 'UTP [Uridine triphosphate]': -1, 'Water [H2O]': -1}}
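Each entry maps metabolite names to stoichiometric factors. Following the usual COBRApy convention, negative factors mark consumed metabolites and positive factors mark produced ones. The following is a minimal sketch of how such an entry can be read, using a copy of the 'ATP' entry from above. The helper `format_dissipation` is hypothetical and not part of refineGEMs.

```python
# Copy of the 'ATP' entry of DISSIPATION_RXNS shown above.
ATP_DISSIPATION = {
    "ADP [Adenosine diphosphate]": 1,
    "ATP [Adenosine triphosphate]": -1,
    "Hydrogen [H(+)]": 1,
    "Phosphate [PO4(3-)]": 1,
    "Water [H2O]": -1,
}


def format_dissipation(stoichiometry: dict[str, int]) -> str:
    """Hypothetical helper: render a stoichiometry dict as 'substrates --> products'."""

    def side(items):
        # Show the factor only if it differs from 1.
        return " + ".join(
            name if abs(factor) == 1 else f"{abs(factor)} {name}"
            for name, factor in items
        )

    # Negative factors are consumed, positive factors are produced.
    substrates = sorted((n, f) for n, f in stoichiometry.items() if f < 0)
    products = sorted((n, f) for n, f in stoichiometry.items() if f > 0)
    return f"{side(substrates)} --> {side(products)}"


equation = format_dissipation(ATP_DISSIPATION)
print(equation)
```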

class refinegems.classes.egcs.EGCSolver(threshold: float = MIN_GROWTH_THRESHOLD, limit: int = 2, chunksize: int = 1)[source]

Bases: object

Parent class for the EGC solvers with generally useful functions and attributes. Can only be used to find, not solve EGCs directly.

Attributes:

threshold: Float describing the cutoff below which the model
is no longer considered to be growing. Defaults to the MIN_GROWTH_THRESHOLD set in the growth module.
limit: Sets the maximal number of cores to be used.
Defaults to 2.
chunksize: Chunksize to use for multiprocessing.
Defaults to 1.

__init__(threshold: float = MIN_GROWTH_THRESHOLD, limit: int = 2, chunksize: int = 1) → None[source]

add_DISSIPATIONRXNS(model: cobra.Model, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) → cobra.Model[source]

Add the dissipation reactions to a model.

Args:

model (cobra.Model):
A model loaded with COBRApy.
namespace (Literal[‘BiGG’], optional): Namespace of the model.
Defaults to ‘BiGG’.
compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to ['c', 'e'].

Returns:

cobra.Model:: The edited model.

check_metab_integration(metabolites: dict[str: int], model: cobra.Model, metab_info: Medium, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) → None | dict[source]

Check if the metabolites of a reaction are in the model. If yes, return a dictionary mapping the metabolites (their IDs in the model) to the factors. If no, return None.

Args:

metabolites (dict[str: int]):
Metabolites mapped to factors.
model (cobra.Model):
The model loaded with COBRApy.
metab_info (Medium):
Information about the metabolites from the database, in the form of a Medium object.
namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to ['c', 'e'].

Returns:

Case: metabolites not found

None:
nothing to return
Case: found

dict:
The mapping of IDs of the metabolites to the factors.
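The return contract described above (a mapping if every metabolite is present, otherwise None) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `map_metabolites`, the `name_to_id` lookup and the set of model IDs are hypothetical stand-ins for the Medium object and the COBRApy model.

```python
from typing import Optional


def map_metabolites(
    stoichiometry: dict[str, int],
    name_to_id: dict[str, str],
    model_ids: set[str],
) -> Optional[dict[str, int]]:
    """Hypothetical helper: map metabolite names to model IDs, or return None."""
    mapped = {}
    for name, factor in stoichiometry.items():
        met_id = name_to_id.get(name)
        # A single missing metabolite means the reaction cannot be integrated.
        if met_id is None or met_id not in model_ids:
            return None
        mapped[met_id] = factor
    return mapped


# Illustrative BiGG-style IDs for the cytosolic compartment.
name_to_id = {
    "ATP [Adenosine triphosphate]": "atp_c",
    "ADP [Adenosine diphosphate]": "adp_c",
    "Water [H2O]": "h2o_c",
}
stoichiometry = {
    "ATP [Adenosine triphosphate]": -1,
    "Water [H2O]": -1,
    "ADP [Adenosine diphosphate]": 1,
}

found = map_metabolites(stoichiometry, name_to_id, {"atp_c", "adp_c", "h2o_c"})
missing = map_metabolites(stoichiometry, name_to_id, {"atp_c", "h2o_c"})
print(found, missing)
```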

egcs_removed(model: cobra.Model, starting_egcs: dict, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) → list[source]

Compare a list of previously found EGCs to the current EGCs in the model.

Args:

model (cobra.Model):
The model loaded with COBRApy after an attempt to solve the EGCs.
starting_egcs (dict):
Dictionary of the EGCs found before trying to solve them.
namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to ['c', 'e'].

Returns:

list:: List of newly removed EGCs.
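Conceptually, the comparison boils down to the EGCs that were present before but are no longer found. A minimal sketch, assuming the EGC names are the keys of `starting_egcs` and that the current EGCs are available as a list of names (the function `newly_removed` is hypothetical):

```python
def newly_removed(starting_egcs: dict, current_egcs: list[str]) -> list[str]:
    """Hypothetical helper: names of EGCs present before but not anymore."""
    still_present = set(current_egcs)
    # Keep the order in which the EGCs were originally found.
    return [name for name in starting_egcs if name not in still_present]


before = {"ATP": {}, "NADH": {}, "GTP": {}}
after = ["NADH"]
removed = newly_removed(before, after)
print(removed)
```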

find_egcs(model: cobra.Model, with_reacs: bool = False, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) → list | tuple[source]

Find the EGCs in a model, if any exist.

Args:

model (cobra.Model):
The model loaded with COBRApy.
with_reacs (bool, optional):
Option to return either only the names of the EGCs found or, additionally, the reactions that show fluxes during testing. Defaults to False.
namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to ['c', 'e'].

Returns:

Case: with_reacs = False

list:
List of found EGC names.
Case: with_reacs = True
tuple:
tuple of (1) dictionary & (2) list:

dict: dictionary of the EGCs

list: their reactions that showed fluxes and the objective values of the test.

limit_bounds(model: cobra.Model)[source]

Limits upper and lower bounds of

exchange reactions to (0, 0)
reversible reactions to (-1, 1)
irreversible reactions to (0, 1)

Excludes dissipation reactions.

Args:

model (cobra.Model):
COBRApy model
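The bound rules listed above can be illustrated without a model. The sketch below uses a hypothetical `Rxn` dataclass in place of cobra.Reaction and assumes that a reaction counts as reversible when its lower bound is negative and its upper bound is positive. It returns the new bounds instead of editing a model in place.

```python
from dataclasses import dataclass


@dataclass
class Rxn:
    """Hypothetical stand-in for a reaction with bounds and two flags."""

    id: str
    lower: float
    upper: float
    is_exchange: bool = False
    is_dissipation: bool = False


def limited_bounds(rxn: Rxn) -> tuple[float, float]:
    """Return the bounds a reaction would get under the rules listed above."""
    if rxn.is_dissipation:
        # Dissipation reactions are excluded and keep their bounds.
        return (rxn.lower, rxn.upper)
    if rxn.is_exchange:
        return (0.0, 0.0)
    if rxn.lower < 0 < rxn.upper:
        # Assumption: reversible means flux is allowed in both directions.
        return (-1.0, 1.0)
    return (0.0, 1.0)


reactions = [
    Rxn("EX_glc__D_e", -10, 1000, is_exchange=True),
    Rxn("PGI", -1000, 1000),
    Rxn("PFK", 0, 1000),
    Rxn("DISSI_ATP", 0, 1000, is_dissipation=True),  # hypothetical ID
]
new_bounds = {rxn.id: limited_bounds(rxn) for rxn in reactions}
print(new_bounds)
```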

refinegems.classes.egcs.EGC_SCORING_MATRIX = {'MR': 1, 'RB': 3, 'RF': 3, 'RM': 6}

class refinegems.classes.egcs.GreedyEGCSolver(scoring_matrix: dict = EGC_SCORING_MATRIX, **kwargs)[source]

Bases: EGCSolver

EGC solver that finds a good solution (greedy) based on modifications to single reactions.

Workflow:

identify existing EGCs
test, if EGCs can be solved using single modifications of reactions
- possible modifications:
  deletion (RM)
  
  set reversible (MR)
  
  remove backward (forward only) (RB)
  
  remove forward (backward only) (RF)
find a good - not optimal - combination of reactions, that solve the maximum number of EGCs that can be solved this way
apply solution to the model
report remaining EGCs, score and reactions used for solution

Attributes:

all attributes of the base class refinegems.classes.egcs.EGCSolver
scoring_matrix:
Dictionary of the changes (RM, MR, RF, RB) against Integers describing the penalty scores.

__init__(scoring_matrix: dict = EGC_SCORING_MATRIX, **kwargs) → None[source]

apply_modifications(model: cobra.Model, solution: dict)[source]

Apply the modifications to reactions in solution to the model.

4 modifications are possible:

“RM” -> removes the reaction
“RB” -> removes the backwards reaction
“RF” -> removes the forward reaction
“MR” -> makes reaction reversible

Args:

model (cobra.Model):
Input model
solution (dict):
Best solution as calculated by find_solution_greedy().

check_egc_growth(reac: cobra.Reaction, model: cobra.Model, bounds: tuple, starting_egcs: dict, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) → list | None[source]

Check EGC removal and growth of a model when changing the bounds of a single reaction.

Args:

reac (cobra.Reaction):
The reaction to change
model (cobra.Model):
The model (COBRApy) to manipulate.
bounds (tuple):
The new reaction bounds.
starting_egcs (dict):
Dict of the original EGCs found in the model.
namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to ['c', 'e'].

Returns:

Case if EGCs removed

list:
List of EGCs that can be removed with the change
Case no removal possible

None:
no return

find_mods_resolve_egcs_greedy(model: cobra.Model, present_egcs: dict, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) → dict[source]

Find the (single) modifications to reactions in a cobra.Model that resolve EGCs and return them in a dictionary. The modification check is split across multiple processes.

Args:

model (cobra.Model):
The model loaded with COBRApy.
present_egcs (dict):
Dict of the original EGCs found in the model.
namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to ['c', 'e'].

Returns:

dict:: Dictionary of potential modifications to resolve EGCs {“egc”: {“MR”:[potential_solutions], “RB”:[potential_solutions], “RF”:[potential_solutions], “RM”:[potential_solutions]}}

find_solution_greedy(results: dict, egc_reactions: dict) → tuple[source]

Based on the originally found EGCs and the output of find_mods_resolve_egcs_greedy(), find a solution that solves all EGCs that can be solved with the results.

Args:

results (dict):
Output of find_mods_resolve_egcs_greedy().
egc_reactions (dict):
Output of find_egcs() with with_reacs=True. Should be the EGCs before calculating any solutions and applying them.
scoring_matrix (dict, optional):
Dictionary of the modification types (RM, MR, RF, RB) and their penalty scores. Defaults to the built-in scoring matrix.

Returns:

tuple:

Tuple of (1) dict & (2) int:

dictionary of reaction IDs and their mode of change
score of the solution
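To illustrate the idea of a good but not necessarily optimal solution, the following sketch applies a generic greedy weighted set cover to a results dictionary of the shape returned by find_mods_resolve_egcs_greedy(). It is an assumption-laden illustration, not the procedure used by the library: the function `greedy_cover`, the selection criterion (resolved EGCs per penalty point), the rule of one modification per reaction, and the score as the sum of penalties are all choices made for this example.

```python
# Copy of EGC_SCORING_MATRIX shown above.
SCORING = {"MR": 1, "RB": 3, "RF": 3, "RM": 6}


def greedy_cover(results: dict, scoring: dict = SCORING) -> tuple[dict, int]:
    """Hypothetical greedy selection of (reaction, modification) pairs."""
    # Invert the results: which EGCs does each (reaction, modification) resolve?
    covers: dict[tuple[str, str], set[str]] = {}
    for egc, mods in results.items():
        for mod, reaction_ids in mods.items():
            for rid in reaction_ids:
                covers.setdefault((rid, mod), set()).add(egc)

    # Only EGCs with at least one candidate modification can be resolved.
    unresolved = set().union(*covers.values())
    solution: dict[str, str] = {}
    score = 0
    while unresolved:
        # Each reaction receives at most one modification.
        candidates = sorted(c for c in covers if c[0] not in solution)
        if not candidates:
            break
        best = max(
            candidates,
            key=lambda c: len(covers[c] & unresolved) / scoring[c[1]],
        )
        gained = covers[best] & unresolved
        if not gained:
            break
        rid, mod = best
        solution[rid] = mod
        score += scoring[mod]
        unresolved -= gained
    return solution, score


# Illustrative input with hypothetical reaction IDs R1 to R4.
results = {
    "ATP": {"MR": [], "RB": ["R1"], "RF": [], "RM": ["R2"]},
    "NADH": {"MR": [], "RB": ["R1"], "RF": ["R3"], "RM": []},
    "GTP": {"MR": ["R4"], "RB": [], "RF": [], "RM": []},
}
solution, score = greedy_cover(results)
print(solution, score)
```

In this example a single RB modification of R1 resolves two EGCs at once, so it is preferred over the separate RM and RF candidates.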

solve_egcs(model: cobra.Model, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) → dict | None[source]

Run the complete greedy EGC solving process.

Note: The input model gets changed, if EGCs can be solved.

Args:

model (cobra.Model):
The model loaded with COBRApy.
namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to ['c', 'e'].

Returns:

dict:

Dictionary with the following entries.

‘solution’: List of reactions for the solution.
‘score’: Score of the solution.
‘remaining egcs’: List of EGCs that could not be solved.

test_modifications(reaction: cobra.Reaction, model: cobra.Model, present_egc: dict, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) → dict[source]

Tries four cases for a Reaction

if reaction is not reversible -> make reaction reversible (MR)
limit backward reaction (RB)
limit forward reaction (RF)
“delete” reaction by setting fluxes to 0 (RM)

-> for each case, it is checked whether the EGCs present in the model are removed

-> if EGCs are removed, it is checked whether the model still grows on the optimal medium

=> If both conditions hold, the reaction is saved to the corresponding dictionary

Args:

reaction (cobra.Reaction):
Reaction from a cobra.Model
model (cobra.Model):
The corresponding GEM loaded with cobrapy
present_egc (dict):
Dictionary of present EGCs {“egc”: {}} -> EGCs are keys
namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to ['c', 'e'].

Returns:

dict:: {“egc”: {“MR”:[potential_solutions], “RB”:[potential_solutions], “RF”:[potential_solutions], “RM”:[potential_solutions]}}
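The four cases can be thought of as candidate bounds for a single reaction. The sketch below is an illustration only: the function `candidate_bounds`, the default flux limit of 1000 and the exact bounds chosen for each case are assumptions and may differ from what the library sets.

```python
def candidate_bounds(
    lower: float, upper: float, max_flux: float = 1000.0
) -> dict[str, tuple[float, float]]:
    """Hypothetical helper: bounds to test for the cases MR, RB, RF and RM."""
    candidates = {}
    if not (lower < 0 < upper):
        # MR is only tested for reactions that are not reversible yet.
        candidates["MR"] = (-max_flux, max_flux)
    candidates["RB"] = (0.0, max(upper, 0.0))  # forward direction only
    candidates["RF"] = (min(lower, 0.0), 0.0)  # backward direction only
    candidates["RM"] = (0.0, 0.0)  # "deleted": no flux allowed
    return candidates


irreversible = candidate_bounds(0, 1000)
reversible = candidate_bounds(-1000, 1000)
print(irreversible)
print(reversible)
```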

gapfill module

Add reactions, genes and more to a model based on different gap-filling methods. All (current) algorithms are separated into three steps: finding missing genes, finding missing reactions and trying to add the entities found to be missing to the model.

Available gap filling methods:

KEGGapFiller

Mainly utilises information from the KEGG database. Needs a KEGG organism ID.

Estimated runtime: to be determined
BioCycGapFiller

Mainly utilises information from the BioCyc database. Requires access to BioCyc SmartTables.

Estimated runtime: to be determined
GeneGapFiller

Search for gaps using the GFF file and information from SwissProt.

Estimated runtime: to be determined

class refinegems.classes.gapfill.BioCycGapFiller(biocyc_gene_tbl_path: str, biocyc_reacs_tbl_path: str, gff: str)[source]

Bases: GapFiller

Based on a SmartTable with information on the genes and a SmartTable with
information on the reactions of the organism of the model, this class
finds missing genes in the model and maps them to reactions to try and
fill the gaps found with the BioCyc gene SmartTable.

For specifications on the SmartTables see the attributes biocyc_gene_tbl
& biocyc_reacs_tbl

Note

Please keep in mind that using this module requires a model containing the Genbank locus tags as labels. If your model does not conform to this you can use one of the functions polish_model() or extend_gp_annots_via_mapping_table().

Attributes:

GapFiller Attributes:
All attributes of the parent class GapFiller
biocyc_gene_tbl_path (str, required):
Path to organism-specific SmartTable for genes from BioCyc; Should contain the columns: Accession-2 | Reactions of gene
biocyc_reacs_tbl_path (str, required):
Path to organism-specific SmartTable for reactions from BioCyc; Should contain the columns: Reaction | Object ID | EC-Number | Spontaneous?
gff (str, required):
Path to organism-specific GFF file

__init__(biocyc_gene_tbl_path: str, biocyc_reacs_tbl_path: str, gff: str) → None[source]

property biocyc_rxn_tbl: Get or set the current BioCyc Reaction table.

When setting, the provided path to a TSV file from BioCyc with the columns 'Reaction' | 'Object ID' | 'EC-Number' | 'Spontaneous?' is parsed and the content of the file is stored in a DataFrame.

find_missing_genes(model: libsbml.Model)[source]

Retrieves the missing genes and reactions from the BioCyc table according to the ‘Accession-2’ identifiers (locus_tags)

Args:

model (libModel):
Model loaded with libSBML

find_missing_reactions(model: cobra.Model)[source]

Retrieves the missing reactions with more information like the equation, EC code, etc. according to the missing genes

Args:

model (cobra.Model):
Model loaded with COBRApy

property full_gene_list: Get or set the current BioCyc Gene table.

When setting, the provided path to a TSV file from BioCyc with the columns 'Accession-2' | 'Reactions of gene' is parsed and the content of the file is stored in a DataFrame containing all rows where a 'Reactions of gene' entry exists.

Hint

Please keep in mind that the column Accession-2 needs to contain Genbank locus tags. If that is not the case for your organism use the correct column from BioCyc and rename it accordingly.

refinegems.classes.gapfill.DBEQ2EQ

refinegems.classes.gapfill.DB_REFERENCE_COLS

class refinegems.classes.gapfill.GapFiller[source]

Bases: ABC

Abstract base class for the gap filling.

Already includes functions for the “filling” part of the gap-filling approach and some helper functions. Each subclass needs an implementation of find_missing_genes and find_missing_reactions to determine the entities, that are missing in the model.

Attributes:

full_gene_list (list):
List of all the genes.
missing_genes:
DataFrame containing all genes found as missing with additional information. Columns: ncbiprotein | locus_tag | <optional columns>
missing_reactions:
DataFrame containing all reactions found as missing with additional information. Columns: ec-code | ncbiprotein | id | equation | reference | <is_transport> | via | add_to_GPR
geneid_type (str):
What type of gene ID the model contains. Defaults to ‘ncbi’.
_statistics (dict):
Dictionary of statistical information of the gap-filling run. Includes e.g. the number of added genes and reactions.
manual_curation (dict):
Dictionary of reaction and gene IDs to be used for manual curation.

__init__() → None[source]

_find_reac_in_model(model: cobra.Model, eccode: str, id: str, idtype: Literal['MetaNetX', 'KEGG', 'BiGG', 'BioCyc'], include_ec_match: bool = False) → None | list[source]

Helper function of GapFiller. Search the model for an ID (and optionally EC number), to determine, if the reaction is in the model.

Args:

model (cobra.Model):
The model loaded with COBRApy.
eccode (str):
The EC number in the format: X.X.X.X
id (str):
The ID to search for.
idtype (Literal[‘MetaNetX’,’KEGG’,’BiGG’, ‘BioCyc’]):
Name of the database the ID belongs to.
include_ec_match (bool, optional):
Option to include a match if only the EC number matches. Defaults to False.

Returns:

Case one or more match found:

list:
List of the ID of the reactions in the model, that match the query.
Case no match:

None:
Nothing found.

add_gene_reac_associations_from_table(model: libsbml.Model, reac_table: pandas.DataFrame) → None[source]

Using a table with at least the columns ‘ncbiprotein’ (containing e.g. NCBI protein identifier (lists), should be gene IDs in the model) and ‘add_to_GPR’ (containing reactions identifier (lists)), add the gene IDs to the GPRs of the corresponding reactions.

Args:

model (libModel):
The model loaded with libSBML.
reac_table (pd.DataFrame):
The table containing at least the columns ‘ncbiprotein’ (gene IDs) and ‘add_to_GPR’ (reaction IDs)

add_genes_from_table(model: libsbml.Model, gene_table: pandas.DataFrame) → None[source]

Create new GeneProduct for a table of genes in the format:

ncbiprotein | locus_tag | reference | … |

The dots symbolise additional columns, that can be passed to the function, but will not be used by it. The other columns are required.

Args:

model (libModel):
The model loaded with libSBML.
gene_table (pd.DataFrame):
The table with the genes to add. At least needs the columns ncbiprotein, locus_tag and reference. More columns can be provided.

add_reactions_from_table(model: cobra.Model, missing_reac_table: pandas.DataFrame, formula_check: Literal['none', 'existence', 'wildcard', 'strict'] = 'existence', exclude_dna: bool = True, exclude_rna: bool = True, idprefix: str = 'refineGEMs', namespace: Literal['BiGG'] = 'BiGG') → pandas.DataFrame[source]

Helper function to add reactions to a model from the missing_reactions table (output of the chosen implementation of find_missing_reactions())

Args:

model (cobra.Model):
The model, loaded with COBRApy.
missing_reac_table (pd.DataFrame):
The missing reactions table.
formula_check (Literal[‘none’,’existence’,’wildcard’,’strict’], optional):
Parameter for checking the metabolite formulas before adding them to the model. For more information, refer to isreaction_complete(). Defaults to 'existence'.
exclude_dna (bool, optional):
Option to exclude reactions containing 'DNA' from being added to the model. Defaults to True.
exclude_rna (bool, optional):
Option to exclude reactions containing 'RNA' from being added to the model. Defaults to True.
idprefix (str, optional):
A prefix to use, if pseudo-IDs need to be created. Defaults to ‘refineGEMs’.
namespace (Literal[‘BiGG’], optional):
Namespace to use for the reactions and metabolites (and the model). Defaults to ‘BiGG’.

Raises:

TypeError: Unknown return type for reac param. Please contact the developers.

Returns:

pd.DataFrame:: Table containing the information about which genes can now be added to reactions (use for GPR curation).

fill_model(model: cobra.Model | libsbml.Model, **kwargs) → libsbml.Model[source]

Based on a table of missing genes and missing reactions, automatically fill the gaps in a model as well as possible.

Note

This method rewrites and reloads the input model. Only the returned model has all the edits.

Args:

model (Union[cobra.Model,libModel]):
The model, either a libSBML or COBRApy model entity.
kwargs:
Additional parameters to be passed to add_reactions_from_table().

Raises:

TypeError: Unknown type of model.

Returns:

libModel:: The gap-filled model.

abstract find_missing_genes(model)[source]

Find missing genes in the model. Parameters can be extended as needed.

Needs to save a table in the format

ncbiprotein | locus_tag | <optional columns>

to the attribute missing_genes.

abstract find_missing_reactions(model)[source]

Find missing reactions in the model. Parameters can be extended as needed.

Needs to save a table of the format

ec-code | ncbiprotein | id | equation | reference | <is_transport> | via | add_to_GPR

to the attribute missing_reactions.

Method specific information can be added to the reference column, which is expected to contain a dictionary. The ‘via’ column describes the way the database will be added to the model.

report(dir: str, hide_zeros: bool = False, no_title: bool = False) → None[source]

Based on the previous gap-filling, save statistics and missing genes/reactions for manual curation.

Args:

dir (str):
Path to a directory to save the report to.
hide_zeros (bool, optional):
Option to hide statistics with zero counts. Defaults to False.
no_title (bool, optional):
Option to get the figure without a title. Defaults to False.

class refinegems.classes.gapfill.GeneGapFiller[source]

Bases: GapFiller

Find gaps in the model using the GFF file of the underlying genome and a DMND database and optionally NCBI.

This gap-filling approach tries to identify missing genes from the GFF file and uses DIAMOND to run a blastp search for homologs against the DMND database.

Note

Please keep in mind that using this module requires a model containing the Genbank locus tags as labels. If your model does not conform to this you can use one of the functions polish_model() or extend_gp_annots_via_mapping_table().

Hint

Files required for the swissprot approach can be downloaded with download_url()

Attributes:

GapFiller Attributes:
All attributes of the parent class GapFiller

GFF_COLS = {'eC_number': 'ec-code', 'locus_tag': 'locus_tag', 'protein_id': 'ncbiprotein'}

__init__() → None[source]

find_missing_genes(gffpath: str | Path, model: libsbml.Model)[source]

Find missing genes by comparing the CDS regions written in the GFF with the GeneProduct entities in the model.

Args:

gffpath (Union[str, Path]):
Path to a GFF file (corresponding to the model).
model (libModel):
The model loaded with libSBML.

find_missing_reactions(model: cobra.Model, prefix: str = 'refinegems', type_db: Literal['swissprot', 'user'] = 'swissprot', fasta: str = None, dmnd_db: str = None, map_db: str = None, mail: str = None, check_NCBI: bool = False, threshold_add_reacs: int = 5, **kwargs) → None[source]

Find missing reactions in the model by blasting the missing genes against the SwissProt database and mapping the results to EC/BRENDA.

Optionally, query the protein accession numbers against NCBI (increases runtime significantly).

Note

When running this function with the default parameters, no mapping is performed. If you want to use the mapping, please provide the required parameters. The defaults are primarily to ensure smooth addition to workflows in pipelines.

Hint

For more information on how to get the SwissProt files, please see download_url().

Args:

model (cobra.Model):
The model loaded with COBRApy.
prefix (str, optional):
Prefix for gene IDs in the model that have to be generated randomly because no ID from the chosen namespace (usually NCBI protein) has been found. Defaults to 'refinegems'.
mail (str, optional):
Mail address for the query against NCBI. If not set, skips the mapping. Defaults to None.
check_NCBI (bool, optional):
If set to True, checking the gene IDs / NCBI protein IDs against the NCBI database is enabled. Else, this step is skipped to reduce runtime. Only usable with SwissProt as database. Defaults to False.
fasta (str, optional):
The protein FASTA file of the organism the model was built on. Required for the search against SwissProt. Defaults to None.
type_db (Literal[‘swissprot’,’user’], optional):
Database to search against. Choose ‘swissprot’ for SwissProt or ‘user’ for a user defined database. Defaults to ‘swissprot’.
dmnd_db (str, optional):
Path to the DIAMOND database containing the protein sequences of SwissProt. Required for the search against SwissProt or the user's own DIAMOND database. Defaults to None.
map_db (str, optional):
Path to the SwissProt or user's own mapping file. Required for the search against SwissProt or a user-defined database. Greatly decreases runtime for running the DIAMOND search. Note that the mapping depends on the chosen database. Defaults to None.
threshold_add_reacs (int, optional):
Threshold for the number of reactions to add to the model. Defaults to 5.
kwargs:
Further optional parameters for the mapping, e.g. outdir, sens, cov, t, pid, etc. For more information see refinegems.utility.db_access.map_to_homologs() in case of type_db = ‘user’ or refinegems.utility.db_access.get_ec_via_swissprot() in case of type_db = ‘swissprot’.

class refinegems.classes.gapfill.KEGGapFiller(organismid: str)[source]

Bases: GapFiller

Based on a KEGG organism ID (corresponding to the organism of the model), find missing genes in the model and map them to reactions to try and fill the gaps found with the KEGG database.

Note

Please keep in mind that using this module requires a model containing the Genbank locus tags as labels as these are used in combination with the organism ID to query KEGG. Usually, the KEGG Gene ID consists of the organism ID and the Genbank locus tag and looks like <organismid>:<locus_tag>. If your model does not conform to this you can either use the function polish_model() or the function extend_gp_annots_via_mapping_table() in combination with the function extend_gp_annots_via_KEGG().
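The usual layout of a KEGG Gene ID described above, <organismid>:<locus_tag>, can be expressed as two small helpers. Both functions are hypothetical illustrations and not part of refineGEMs. The example ID combines the KEGG organism code eco with the locus tag b0001.

```python
def kegg_gene_id(organismid: str, locus_tag: str) -> str:
    """Hypothetical helper: build a KEGG Gene ID from organism ID and locus tag."""
    return f"{organismid}:{locus_tag}"


def split_kegg_gene_id(gene_id: str) -> tuple[str, str]:
    """Hypothetical helper: split a KEGG Gene ID into organism ID and locus tag."""
    organismid, sep, locus_tag = gene_id.partition(":")
    if not sep or not organismid or not locus_tag:
        raise ValueError(f"Not of the form <organismid>:<locus_tag>: {gene_id!r}")
    return organismid, locus_tag


gene_id = kegg_gene_id("eco", "b0001")
parts = split_kegg_gene_id(gene_id)
print(gene_id, parts)
```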

Warning

If the locus tag from Genbank and the locus tag part from the KEGG Gene ID do not match and running the functions above does not solve the issue for your organism, please recheck if all GeneProducts in your model contain valid KEGG Gene IDs in the annotation bag. Otherwise, add these manually to the model.

Warning

If the Genbank locus tags are not part of the KEGG Gene ID, please recheck the locus tags added as labels to the newly created GeneProducts after running fill_model().

Note

Due to the KEGG REST API this is relatively slow.

Attention

A second version of this class is in development that will enable gap-filling via KEGG using orthologous strains instead of relying on a direct match in the KEGG database.

Attributes:

GapFiller Attributes:
All attributes of the parent class GapFiller
organismid (str, required):
Abbreviation of the organism in the KEGG database.

__init__(organismid: str) → None[source]

find_missing_genes(model: libsbml.Model)[source]

Get the missing genes in model in comparison to the KEGG entry of the organism. Saves a table containing the missing genes to the attribute missing_genes.

Format:

locus_tag | kegg.orthology | ec-code | ncbiprotein | reference

Args:

model (libModel):
The model loaded with libSBML.

find_missing_reactions(model: cobra.Model, threshold_add_reacs: int = 5)[source]

Find missing reactions in the model. Parameters can be extended as needed.

Needs to save a table of the format

ec-code | ncbiprotein | id | equation | reference | <is_transport> | via | add_to_GPR

to the attribute missing_reactions.

Method specific information can be added to the reference column, which is expected to contain a dictionary. The ‘via’ column describes the way the database will be added to the model.

refinegems.classes.gapfill._clean_table_after_mapping(mapped_table: pandas.DataFrame, entity_type: Literal['reaction', 'gene'] = 'reaction') → pandas.DataFrame[source]

Clean a table containing mapping results for different databases

Args:

mapped_table (pd.DataFrame):
Table containing a mapping for different databases
entity_type (Literal[‘reaction’, ‘gene’], optional):
Type of entity the mapping refers to. Defaults to ‘reaction’.

Returns:

pd.DataFrame:: The cleaned table.

refinegems.classes.gapfill.cobra_gapfill_wrapper(model: cobra.Model, universal: cobra.Model, medium: Medium, namespace: Literal['BiGG'] = 'BiGG', growth_threshold: float = 0.05, iterations: int = 3, chunk_size: int = 10000) → cobra.Model[source]

Wrapper for single_cobra_gapfill().

Either use the full set of reactions in the universal model by setting iterations to 0 or None, or use them in randomized chunks for faster runtime (useful on laptops). Note: the second option is a heuristic and does not test all reaction combinations exhaustively.

Args:

model (cobra.Model):
The model to perform gapfilling on.
universal (cobra.Model):
A model with reactions to be potentially used for the gapfilling.
medium (Medium):
A medium the model should grow on.
namespace (Literal[‘BiGG’], optional):
Namespace to use for the model. Options include ‘BiGG’. Defaults to ‘BiGG’.
growth_threshold (float, optional):
Growth threshold for the gapfilling. Defaults to 0.05.
iterations (int, optional):
Number of iterations for the heuristic version of the gapfilling. If 0 or None is given, uses full set of reactions. Defaults to 3.
chunk_size (int, optional):
Number of reactions to be used for gapfilling at the same time. If None or 0 is given, use full set, not heuristic. Defaults to 10000.

Returns:

cobra.Model:: The gapfilled model, if a solution was found.

refinegems.classes.gapfill.map_biocyc_to_reac(biocyc_reacs: pandas.DataFrame, use_MNX: bool = True, use_BiGG: bool = True) → pandas.DataFrame[source]

Based on a table containing BioCyc reactions, map them to reactions in other databases (if a mapping is possible)

Args:

biocyc_reacs (pd.DataFrame):
Table containing BioCyc reaction information. Should contain the columns: Reaction | Object ID | EC-Number | Spontaneous?. Can be downloaded as a SmartTable from BioCyc.
use_MNX (bool, optional):
Try mapping using the MetaNetX database. Defaults to True.
use_BiGG (bool, optional):
Try mapping using the BiGG database. Defaults to True.

Returns:

pd.DataFrame:: The extended table.

refinegems.classes.gapfill.map_ec_to_reac(table: pandas.DataFrame, use_MNX: bool = True, use_BiGG: bool = True, use_KEGG: bool = True, threshold_add_reacs: int = 5) → pandas.DataFrame[source]

Based on a table of NCBI protein IDs and EC numbers, map them to reactions via different databases (if a mapping is possible).

input table should have format:

ec-code | ncbiprotein

output table has the format:

Args:

table (pd.DataFrame):
The input table.
use_MNX (bool, optional):
Try mapping using the MetaNetX database. Defaults to True.
use_BiGG (bool, optional):
Try mapping using the BiGG database. Defaults to True.
use_KEGG (bool, optional):
Try mapping using the KEGG database. Defaults to True.
threshold_add_reacs (int, optional):
Maximum number of reactions allowed per EC number per ncbiprotein ID to be added. If exceeded, the addition of these reactions is skipped due to insufficient evidence. Defaults to 5.

Returns:

pd.DataFrame:: The extended table.
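The effect of threshold_add_reacs can be sketched on a plain dictionary. This is an illustration under assumptions: `filter_by_threshold` is hypothetical, the threshold is treated as inclusive, and the reaction IDs are made up.

```python
def filter_by_threshold(
    ec_to_reactions: dict[str, list[str]], threshold_add_reacs: int = 5
) -> tuple[dict[str, list[str]], list[str]]:
    """Hypothetical helper: keep EC numbers with few enough mapped reactions."""
    kept, skipped = {}, []
    for ec, reactions in ec_to_reactions.items():
        if len(reactions) <= threshold_add_reacs:
            kept[ec] = reactions
        else:
            # Too many candidates: evidence for any single one is weak.
            skipped.append(ec)
    return kept, skipped


mapping = {
    "2.7.1.1": ["RXN_A", "RXN_B"],
    "1.1.1.1": [f"RXN_{i}" for i in range(8)],
}
kept, skipped = filter_by_threshold(mapping, threshold_add_reacs=5)
print(kept, skipped)
```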

refinegems.classes.gapfill.multiple_cobra_gapfill(model: cobra.Model, universal: cobra.Model, media_list: list[Medium], namespace: Literal['BiGG'] = 'BiGG', growth_threshold: float = 0.05, iterations: int = 3, chunk_size: int = 10000) → cobra.Model[source]

Perform single_cobra_gapfill() on a list of media.

Args:

model (cobra.Model):
The model to be gapfilled.
universal (cobra.Model):
The model to use reactions for gapfilling from.
media_list (list[Medium]):
List of media the model is supposed to grow on.
growth_threshold (float, optional):
Growth threshold for the gapfilling. Defaults to 0.05.
iterations (int, optional):
Number of iterations for the heuristic version of the gapfilling. If 0 or None is given, uses full set of reactions. Defaults to 3.
chunk_size (int, optional):
Number of reactions to be used for gapfilling at the same time. If None or 0 is given, use full set, not heuristic. Defaults to 10000.

Returns:

cobra.Model:: The gapfilled model, if a solution was found.

refinegems.classes.gapfill.single_cobra_gapfill(model: cobra.Model, universal: cobra.Model, medium: Medium, namespace: Literal['BiGG'] = 'BiGG', growth_threshold: float = 0.05) → list[str] | bool[source]

Attempt gapfilling (with COBRApy) for a given model to allow growth on a given medium.

Args:

model (cobra.Model):
The model to perform gapfilling on.
universal (cobra.Model):
A model with reactions to be potentially used for the gapfilling.
medium (Medium):
A medium the model should grow on.
namespace (Literal[‘BiGG’], optional):
Namespace to use for the model. Defaults to ‘BiGG’.
growth_threshold (float, optional):
Minimal rate for the model to be considered growing. Defaults to 0.05.

Returns:

Union[list[str],True]:: List of reactions to be added to the model to allow growth or True, if the model already grows.

medium module

Provides the medium class and additional functions to handle media.

Functionalities include (amongst others):

loading a medium into a Medium object from a database, a file or a model
adding a medium to a model
adding media information to the database
extending, changing and manipulating various parts of a medium to create the desired medium

refinegems.classes.medium.ALLOWED_DATABASE_LINKS = ['BiGG', 'MetaNetX', 'SEED', 'VMH', 'ChEBI', 'KEGG']

refinegems.classes.medium.FLOAT_REGEX = re.compile('[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?')

refinegems.classes.medium.INTEGER_REGEX = re.compile('^[-+]?([1-9]\\d*|0)$')
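
What the two patterns accept is easiest to see by trying them. The sketch below copies both expressions from the constants above. The helper functions looks_like_float and looks_like_integer are hypothetical names for this illustration and are not part of the module.

```python
import re

# Patterns copied from the module constants documented above.
FLOAT_REGEX = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
INTEGER_REGEX = re.compile(r"^[-+]?([1-9]\d*|0)$")


def looks_like_float(text):
    # fullmatch is used here because FLOAT_REGEX itself is not anchored.
    return FLOAT_REGEX.fullmatch(text) is not None


def looks_like_integer(text):
    # INTEGER_REGEX is anchored and rejects leading zeros such as "007".
    return INTEGER_REGEX.match(text) is not None
```

Because FLOAT_REGEX is unanchored, a plain search would also find a number inside a longer string. Whether that is intended depends on how the module applies it, which this section does not state.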

class refinegems.classes.medium.Medium(name: str, substance_table: pandas.DataFrame = pd.DataFrame(columns=['name', 'formula', 'flux', 'source', 'db_id', 'db_type']), description: str = None, doi: str = None)[source]

Bases: object

Class describing a medium.

Attributes:

name (str):
The name or abbreviation of the medium.
substance_table (pd.DataFrame):
A table containing information about the medium in silico compounds. Long format.
description (str, optional):
Short description of the medium. Defaults to None.
doi (str):
Reference(s) to the original publication of the medium. Defaults to None.

__add__(other: Medium) → Medium[source]

__init__(name: str, substance_table: pandas.DataFrame = pd.DataFrame(columns=['name', 'formula', 'flux', 'source', 'db_id', 'db_type']), description: str = None, doi: str = None)[source]

Initialise a Medium object.

Args:

name (str):
The name or abbreviation of the medium.
substance_table (pd.DataFrame, optional):
A table containing information about the medium in silico compounds. Long format. Defaults to an empty table with the columns [‘name’,’formula’,’flux’,’source’,’db_id’,’db_type’].
description (str, optional):
Short description of the medium. Defaults to None.
doi (str, optional):
Reference(s) to the original publication of the medium. Defaults to None.

static _produce_medium_docs_table_row(row: pandas.Series, file: TextIOWrapper)[source]

Helper function for producing reStructuredText for medium definitions, e.g. with produce_medium_docs_table(). Transforms each row of the substance table into a row of the rst-file.

Args:

row (pd.Series):
The row of the Medium.substance_table.
file (io.TextIOWrapper):
The connection to the file to write the rows into.

add_subset(subset_name: str, default_flux: float = 10.0, inplace: bool = True) → Medium[source]

Add a subset of substances to the medium, returning a newly generated one.

Args:

subset_name (str):
The type of subset to be added. Name should be in database-subset-id.
default_flux (float, optional):
Default flux value to calculate fluxes from based on the percentages saved in the database. Defaults to 10.0.

Returns:

Medium:: A new medium that is the combination of the set subset and the old one. In the case that the given subset name is not found in the database, the original medium is returned.

add_substance_from_db(name: str, flux: float = 10.0)[source]

Add a substance from the database to the medium.

Args:

name (str):
Name of the substance. Should be part of the database substance.name column.
flux (float, optional):
Sets the flux value of the new substance. Defaults to 10.0.

combine(other: Medium, how: Literal['+'] | None | float | tuple[float] = '+', default_flux: float = 10.0, inplace: bool = False) → Medium[source]

Combine two media into a new one.

Modes to combine media (input for param how):

None -> combine media, remove all flux values (= set them to None). Sets sources to None as well.
‘+’ -> Add fluxes of the same substance together.
float -> Calculate flux * percentage (float) for the first medium and flux * (1.0 - percentage) for the second medium and add fluxes of the same substance together.
tuple(float,float) -> Same as above, except both percentages are given.

Args:

other (Medium):
The medium to combine with.
how (Union[Literal[‘+’],None,float,tuple[float]], optional):
How to combine the two media. Options listed in header. Defaults to ‘+’.
default_flux (float, optional):
Flux to use in combine-modes (except how=None) for NaN/None values. Defaults to 10.0.

Returns:

Medium:: The combined medium.
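
The arithmetic behind the four modes of how can be sketched on plain dictionaries. The function combine_fluxes and the mapping of substance name to flux are assumptions for illustration. The real method works on Medium objects and their substance tables, and how it treats substances that occur in only one medium is not stated here (the sketch simply counts them with the weight of their own medium).

```python
def combine_fluxes(first, second, how="+", default_flux=10.0):
    """Combine two {substance name: flux} mappings (hypothetical helper)."""
    names = list(first) + [n for n in second if n not in first]
    if how is None:
        # No flux information is kept in this mode.
        return {name: None for name in names}
    if how == "+":
        weights = (1.0, 1.0)
    elif isinstance(how, float):
        weights = (how, 1.0 - how)
    else:
        weights = tuple(how)  # (share of first, share of second)

    combined = {}
    for name in names:
        total = 0.0
        for medium, weight in ((first, weights[0]), (second, weights[1])):
            if name in medium:
                flux = medium[name]
                # Missing flux values fall back to the default flux.
                total += weight * (default_flux if flux is None else flux)
        combined[name] = total
    return combined


first = {"D-Glucose": 10.0, "Dioxygen": 20.0}
second = {"D-Glucose": 5.0}
summed = combine_fluxes(first, second, how="+")
weighted = combine_fluxes(first, second, how=0.25)
```

In this sketch how=0.25 weights the first medium with 25 % and the second with 75 %, and a tuple such as (0.5, 0.5) sets both shares explicitly.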

copy() → Medium[source]

Create a deep copy of the Medium object.

Returns:

Medium:: A new Medium object with the same data, independent of the original.

export_to_cobra(namespace: Literal['Name', 'BiGG', 'SEED'] = 'BiGG', default_flux: float = 10.0, replace: bool = False, double_o2: bool = True, ext_compartment: str = 'e') → dict[str, float][source]

Export a medium to the COBRApy format for a medium.

Args:

namespace (Literal[‘Name’, ‘BiGG’, ‘SEED’], optional):
Namespace to use. Defaults to ‘BiGG’.
default_flux (float, optional):
Default flux to substitute missing values. Defaults to 10.0.
replace (bool, optional):
Replace all values with the default flux. Defaults to False.
double_o2 (bool, optional):
Double the flux of oxygen. Defaults to True.
ext_compartment (str, optional):
The compartment suffix for external compounds. Mainly used for the SEED namespace. Defaults to ‘e’.

Raises:

ValueError: Unknown namespace.

Returns:

dict[str,float]:: The exported medium.

export_to_file(type: Literal['tsv', 'csv', 'docs', 'rst'] = 'tsv', flavour: Literal['substance_table', 'carveme_mimic'] = 'substance_table', no_flux: bool = False, dir: str = './', max_widths: int = 80)[source]

Export medium, especially substance table.

Args:

type (Literal[‘tsv’,’csv’,’docs’,’rst’], optional):
Type of file to export to. Defaults to ‘tsv’.
flavour (Literal[‘substance_table’,’carveme_mimic’], optional):
Flavour of file to export. Only viable for ‘tsv’ and ‘csv’. Defaults to ‘substance_table’.
no_flux (bool, optional):
If True, the flux column is removed in the exported file. Only viable for ‘tsv’ and ‘csv’. Defaults to False.
dir (str, optional):
Path to the directory to write the file(s) to. Defaults to ‘./’.
max_widths (int, optional):
Maximal table width for the documentation table (). Only viable for ‘rst’ and ‘docs’. Defaults to 80.

Raises:

ValueError: Unknown flavour if flavour not in [‘substance_table’,’carveme_mimic’].
ValueError: Unknown export type if type not in [‘tsv’,’csv’,’docs’,’rst’].

get_source(element: str) → list[str][source]

Get the source of a given element for the medium.

Search for the given element (elemental symbol, e.g. O), excluding pattern matches that are followed by other lower-case letters, and return the matching substances as a list of sources for the given element.

Args:

element (str):
The symbol of the element to search the sources of

Returns:

list[str]:: The list of the names of the sources (no duplicates).
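
The rule "symbol not followed by a lower-case letter" maps naturally onto a negative lookahead. The following sketch assumes a plain mapping of substance name to formula and a hypothetical helper sources_for_element. It shows the matching idea only, not the code used by Medium.get_source().

```python
import re


def sources_for_element(formulas, element):
    """Return substance names whose formula contains the element symbol.

    formulas: dict mapping substance name -> chemical formula
    (hypothetical layout for illustration).
    """
    # The negative lookahead rejects e.g. "Ca" or "Cl" when looking for "C".
    pattern = re.compile(re.escape(element) + r"(?![a-z])")
    found = []
    for name, formula in formulas.items():
        if pattern.search(formula) and name not in found:
            found.append(name)
    return found


formulas = {
    "D-Glucose": "C6H12O6",
    "Calcium chloride": "CaCl2",
    "Carbon dioxide": "CO2",
}
carbon_sources = sources_for_element(formulas, "C")
```

Calcium chloride is not reported as a carbon source, because every C in its formula is followed by a lower-case letter.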

is_aerobic() → bool[source]

Check if the medium contains O2 / dioxygen.

Returns:

bool:: Results of the test, True if pure oxygen is in the medium.

make_aerobic(flux: float = None)[source]

If the medium is currently anaerobic, add oxygen to the medium to make it aerobic.

Args:

flux(float,optional):
The flux value for the oxygen to be added. Defaults to None.

make_anaerobic()[source]: If the medium is currently aerobic, deletes the oxygen from it to make it anaerobic.

produce_carveme_mimic(no_flux: bool = False) → pandas.DataFrame[source]

Produces a pandas DataFrame for the substance table in the CarveMe format.

Args:

no_flux (bool, optional):
If True, the flux column is removed in the exported file. Defaults to False.

Returns:

pd.DataFrame:: Substance table in CarveMe format, with columns ‘medium’, ‘description’, ‘compound’, ‘name’ and optionally ‘flux’.

produce_medium_docs_table(folder: str | Path = './', max_width: int = 80) → str[source]

Produces a rst-file containing reStructuredText for the substance table for documentation.

Args:

folder (Union[str, Path], optional):
Path to folder/directory to save the rst file to. Defaults to ‘./’.
max_width (int, optional):
Maximal table width of the rst-table. Defaults to 80.

remove_substance(name: str)[source]

Remove a substance from the medium based on its name

Args:

name (str):
Name of the substance to remove.

set_default_flux(flux: float = 10.0, replace: bool = False, double_o2: bool = True)[source]

Set a default flux for the model.

Args:

flux (float, optional):
Default flux for the medium. Defaults to 10.0.
replace (bool, optional):
Replace all fluxes with the default. Defaults to False.
double_o2 (bool, optional):
Tag to double the flux for oxygen only. Works only with replace=True. Defaults to True.

set_oxygen_percentage(perc: float = 1.0)[source]

Set oxygen percentage of the medium.

Args:

perc (float, optional):
Percentage of oxygen. Defaults to 1.0 (= 100%)

set_source(element: str, new_source: str)[source]

Set the source for a given element to a specific substance by deleting all other sources of said element before adding the new source.

Args:

element (str):
The element to set the source for, e.g. ‘O’ for oxygen.
new_source (str):
The new source. Should be the name of a substance in the database, otherwise no new source will be set.

refinegems.classes.medium.REGEX_MEDIA_YML_TUPLE = re.compile('\$(\\d+\\.\\d+),(\\d+\\.\\d+)\$')
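
Read as a regular expression, the pattern above appears to describe two decimal numbers separated by a comma and enclosed in dollar signs. The sketch below rewrites it as a raw string, which is an assumed reading of the rendered constant, and extracts the two numbers. The input string is a made-up example.

```python
import re

# Assumed reading of the documented pattern, written as a raw string.
REGEX_MEDIA_YML_TUPLE = re.compile(r"\$(\d+\.\d+),(\d+\.\d+)\$")

match = REGEX_MEDIA_YML_TUPLE.fullmatch("$0.25,0.75$")
# Both capture groups are plain decimal strings and can be converted directly.
percentages = tuple(float(group) for group in match.groups())
```

Values without a decimal point, such as "$1,2$", are not matched by this pattern.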

refinegems.classes.medium.REQUIRED_SUBSTANCE_ATTRIBUTES = ['name', 'formula', 'flux', 'source']

refinegems.classes.medium.add_subset_to_db(name: str, desc: str, subs_dict: dict, database: str = PATH_TO_DB, default_perc: float = 1.0) → None[source]

Add a new subset to the database.

Args:

name (str):
Name (abbreviation) of the new subset. Needs to be unique for the database.
desc (str):
Description of the new subset.
subs_dict (dict):
Dictionary of the names and percentages for the substances to be included in the new subsets. The names should be part of the substance table.
database (str, optional):
Which database to connect to. Defaults to PATH_TO_DB.
default_perc (float, optional):
Default percentage to set if None is given in the dictionary. Defaults to 1.0.

refinegems.classes.medium.download_example_medium(filename: str = 'custom_medium_substance_table.tsv')[source]

Load the example external medium file from the package and save a copy for the user to edit.

Args:

filename (str, optional):
Filename to write the example medium to/save it under as. Defaults to ‘custom_medium_substance_table.tsv’.

refinegems.classes.medium.enter_db_single_entry(table: str, columns: list[str], values: list[Any], database: str = PATH_TO_DB)[source]

Enter a single entry into a database.

Args:

table (str):
Which table to enter information to.
columns (list[str]):
Name of the columns to add information to.
values (list[Any]):
List of new values for the columns.
database (str, optional):
Database to add a row to. Defaults to PATH_TO_DB.

refinegems.classes.medium.enter_m2s_row(row: pandas.Series, medium_id: int, connection: Connection, cursor: Cursor)[source]

Helper function for refinegems.classes.medium.enter_medium_into_db(). Enters a new entry in the medium2substance table.

Args:

row (pd.Series):
A row of the pd.DataFrame of the enter_medium_into_db() function.
medium_id (int):
The row id of the medium.
connection (sqlite3.Connection):
Connection to the database.
cursor (sqlite3.Cursor):
Cursor for the database.

refinegems.classes.medium.enter_medium_into_db(medium: Medium, database: str = PATH_TO_DB)[source]

Enter a new medium to an already existing database.

Args:

medium (Medium):
A medium object to be added to the database.
database (str, optional):
Path to the database. Defaults to the in-built database.

refinegems.classes.medium.enter_s2db_row(row: pandas.Series, db_type: str, connection: Connection, cursor: Cursor)[source]

Helper function for enter_medium_into_db(). Enters a new entry in the substance2db table after checking if it has yet to be added.

Args:

row (pd.Series):
A row of the pd.DataFrame of the enter_medium_into_db() function.
db_type (str):
Type of database identifier to be added.
connection (sqlite3.Connection):
Connection to the database.
cursor (sqlite3.Cursor):
Cursor for the database.

refinegems.classes.medium.enter_substance_row(row: pandas.Series, connection: Connection, cursor: Cursor) → int[source]

Helper function for enter_medium_into_db(). Enters a new entry in the medium2substance table.

Args:

row (pd.Series):
A row of the pd.DataFrame of the enter_medium_into_db() function.
connection (sqlite3.Connection):
Connection to the database.
cursor (sqlite3.Cursor):
Cursor for the database.

Returns:

int:: The substance ID in the database of the substance.

refinegems.classes.medium.export_media_from_db_to_file(media_names_or_config: str | list[str] | Literal['all'] = 'all', type: Literal['tsv', 'csv', 'docs', 'rst'] = 'tsv', flavour: Literal['substance_table', 'carveme_mimic'] = 'substance_table', single_file: bool = False, no_flux: bool = False, dir: str = './', max_widths: int = 80)[source]

Export media from the database to files/a single file.

Args:

media_names_or_config (Union[str, list[str], Literal[‘all’]], optional):
The name(s) of the medium/media to export or the path to a media configuration file. Defaults to ‘all’.
type (Literal[‘tsv’,’csv’,’docs’,’rst’], optional):
Type of file to export to. Defaults to ‘tsv’.
flavour (Literal[‘substance_table’,’carveme_mimic’], optional):
Flavour of file to export. Only viable for ‘tsv’ and ‘csv’. Defaults to ‘substance_table’.
single_file (bool, optional):
If True, export all media in one file. Only viable for ‘carveme_mimic’. Defaults to False.
no_flux (bool, optional):
If True, the flux column is removed in the exported file. Only viable for ‘tsv’ and ‘csv’. Defaults to False.
dir (str, optional):
Path to the directory to write the file(s) to. Defaults to ‘./’.
max_widths (int, optional):
Maximal table width for the documentation table (). Only viable for ‘rst’ and ‘docs’. Defaults to 80.

Raises:

TypeError: Invalid input for media_names_or_config.
ValueError: Unknown medium name(s)
ValueError: Unknown export type

refinegems.classes.medium.extract_medium_info_from_model_bigg(row, model: cobra.Model) → pandas.Series[source]

Helper function for read_from_cobra_model(). Extracts more information about the medium.

Args:

row (pd.Series):
A row of the datatable of read_from_cobra_model().
model (cobra.Model):
The cobra Model

Returns:

pd.Series:: One row for the substance table.

refinegems.classes.medium.generate_docs_for_subset(subset_name: str, folder: str | Path = './', max_width: int = 80)[source]

Generate documentation for a subset.

Args:

subset_name (str):
Name of the subset to be exported for the documentation. Name should be in database-subset-id.
folder (Union[str, Path], optional):
Folder to save the output to. Defaults to ‘./’.
max_width (int, optional):
Maximal table width for the documentation page. Defaults to 80.

refinegems.classes.medium.generate_insert_query(row: pandas.Series, cursor) → str[source]

Helper function for update_db_multi(). Generate the SQL string for inserting a new line into the database based on a row of the table.

Args:

row (pd.Series):
One row of the table of the parent function.

Returns:

str:: The constructed SQL string.

refinegems.classes.medium.generate_update_query(row: pandas.Series) → str[source]

Helper function for update_db_multi(). Generates an update SQL query for the provided table

Args:

row (pd.Series):
Series containing the row of a DataFrame to be used to update a table in a database with columns table | column | new_value | conditions

Returns:

str:: SQL query to be used to update a table in a database with the provided data

refinegems.classes.medium.get_last_idx_table(tablename: str, connection: Connection, cursor: Cursor) → int[source]

Helper function. Retrieves the last row id of a specified table of the database.

Args:

tablename (str):
The name of the table to retrieve the last row id from
connection (sqlite3.Connection):
Connection to the database.
cursor (sqlite3.Cursor):
Cursor for the database.

Returns:

int:: The last row ID of the specified table.

refinegems.classes.medium.load_external_medium(how: Literal['file', 'console'], **kwargs) → Medium[source]

Read in an external medium.

Currently available options for how

‘console’: read in medium via the console
‘file’: read in medium from a file, requires a ‘path=str’ argument to be passed.

About the format (console, file):

The substances have to be saved in a TSV file table (format see read_substances_from_file()). Further information for the ‘file’ option has to be added as comments in the format: # info_name: info. The information should, but does not need to, contain name, description and reference.

Args:

how (Literal[‘file’,’console’]):
How (or from where) the medium should be read in. Available options are given above.

Raises:

ValueError: Unknown description for how.

Returns:

Medium:: The read-in medium.
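
The comment convention "# info_name: info" can be read with a few lines of standard Python. The helper parse_info_comments and the example content are hypothetical. The sketch only illustrates the documented file layout and makes no claim about how load_external_medium() parses it internally.

```python
def parse_info_comments(lines):
    """Collect '# info_name: info' comment lines into a dict (hypothetical helper)."""
    info = {}
    for line in lines:
        if not line.startswith("#"):
            continue
        # Split at the first colon only, so values may contain colons themselves.
        key, sep, value = line.lstrip("#").partition(":")
        if sep:  # ignore comments without a 'name: value' structure
            info[key.strip()] = value.strip()
    return info


example = [
    "# name: MyMedium",
    "# description: Minimal medium for testing",
    "# reference: example only",
    "name\tformula\tflux\tsource",
]
info = parse_info_comments(example)
```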

refinegems.classes.medium.load_media(yaml_path: str) → tuple[list[Medium], list[str, None]][source]

Load the information from a media configuration file.

Args:

yaml_path (str):
The path to a media configuration file in YAML-format.

Returns:

tuple[list[Medium],list[str,None]]:

Tuple of two lists (1) & (2)

list: list of the loaded media and
list: list of supplement modes

refinegems.classes.medium.load_medium_from_db(name: str, database: str = PATH_TO_DB, type: str = 'standard') → Medium[source]

Load a medium from a database.

Args:

name (str):
The name (or identifier) of the medium.
database (str, optional):
Path to the database. Defaults to the in-built database.
type (str, optional):
How to load the medium. Defaults to ‘standard’.

Raises:

ValueError: Unknown medium name.

Returns:

Medium:: The medium retrieved from the database.

refinegems.classes.medium.medium_to_model(model: cobra.Model, medium: Medium, namespace: str = 'BiGG', default_flux: float = 10.0, replace: bool = False, double_o2: bool = True, add: bool = True) → None | dict[str, float][source]

Add a medium to a COBRApy model.

Args:

model (cobra.Model):
A model loaded with COBRApy.
medium (Medium):
A refinegems Medium object.
namespace (str, optional):
String to set the namespace to use for the model IDs. Defaults to ‘BiGG’.
default_flux (float, optional):
Set a default flux for NaN values or all. Defaults to 10.0.
replace (bool, optional):
Option to replace existing flux values with the default if set to True. Defaults to False.
double_o2 (bool, optional):
Double the oxygen amount in the medium. Defaults to True.
add (bool, optional):
If True, adds the medium to the model, else the exported medium is returned. Defaults to True.

Returns:

Union[None, dict[str, float]]:: Either none or the exported medium.

refinegems.classes.medium.read_from_cobra_model(model: cobra.Model, namespace: Literal['BiGG'] = 'BiGG') → Medium[source]

Read and import a medium from a cobra model into a Medium object.

Args:

model (cobra.Model):
An open cobra Model.

Returns:

Medium:: The imported medium.

refinegems.classes.medium.read_substances_from_file(path: str) → pandas.DataFrame[source]

Read in a TSV with substance information into a table.

Format of the TSV (with example):

name | formula | flux | source | X | X | …
water | H2O | 10.0 | …

X: placeholder for database names (columns filled with corresponding IDs of the substances) X = see ALLOWED_DATABASE_LINKS

Args:

path(str):
The path to the input file.

Returns:

pd.DataFrame:: The table of substance information read from the file
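
A file in this layout can be inspected with the standard csv module. The sketch below is not the library function (which returns a pandas DataFrame). The helper read_substance_rows and the example content, with BiGG and SEED as the database columns, are assumptions for illustration.

```python
import csv
import io

# Hypothetical file content following the documented column layout.
EXAMPLE_TSV = (
    "name\tformula\tflux\tsource\tBiGG\tSEED\n"
    "water\tH2O\t10.0\t\th2o\tcpd00001\n"
)


def read_substance_rows(handle):
    """Parse the TSV into a list of dicts with flux as float or None."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Empty flux cells become None instead of raising a ValueError.
        row["flux"] = float(row["flux"]) if row["flux"] else None
        rows.append(row)
    return rows


rows = read_substance_rows(io.StringIO(EXAMPLE_TSV))
```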

refinegems.classes.medium.update_db_entry_single(table: str, column: str, new_value: Any, conditions: dict, database: str = PATH_TO_DB)[source]

Update a single database entry.

Args:

table (str):
Name of the table to update.
column (str):
Name of the Attribute to change.
new_value (Any):
New value to be set.
conditions (dict):
Further conditions.
database (str, optional):
Which database to change. Defaults to PATH_TO_DB.

refinegems.classes.medium.update_db_multi(data: pandas.DataFrame, update_entries: bool, database: str = PATH_TO_DB)[source]

Updates/Inserts multiple entries in a table from the specified database. Given table should have the format:

row : table | column | new_value | conditions

Notes:

multiple columns and values are listed separated by “,” without whitespaces
conditions are listed like: a=x;b=y;…
- conditions separated by ‘;’
- column and value separated by ‘=’
- no whitespaces

Args:

data (pd.DataFrame):
DataFrame containing the columns table | column | new_value | conditions
update_entries (bool):
Boolean to determine whether entries should be inserted or updated. False means insert.
database (str, optional):
Path to a database. Defaults to PATH_TO_DB.
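
The condition syntax described in the notes (a=x;b=y) can be turned into a parameterised SQL statement as sketched below. The helpers parse_conditions and update_single, as well as the table layout of the in-memory database, are hypothetical. The sketch shows one possible way to process a single row of the input table, not the code behind update_db_multi().

```python
import sqlite3


def parse_conditions(conditions):
    """Turn 'a=x;b=y' into {'a': 'x', 'b': 'y'} (hypothetical helper)."""
    pairs = (item.split("=", 1) for item in conditions.split(";") if item)
    return {column: value for column, value in pairs}


def update_single(cursor, table, column, new_value, conditions):
    """Run one UPDATE; values are bound as parameters, identifiers are not."""
    parsed = parse_conditions(conditions)
    where = " AND ".join(f"{name} = ?" for name in parsed)
    query = f"UPDATE {table} SET {column} = ? WHERE {where}"
    cursor.execute(query, [new_value, *parsed.values()])


# Small in-memory database with an assumed table layout.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE substance (id INTEGER PRIMARY KEY, name TEXT, formula TEXT)")
cursor.execute("INSERT INTO substance (name, formula) VALUES ('water', 'H20')")
update_single(cursor, "substance", "formula", "H2O", "name=water")
connection.commit()
formula = cursor.execute("SELECT formula FROM substance").fetchone()[0]
```

Binding the values with placeholders avoids quoting problems. Table and column names cannot be bound this way and should therefore come from trusted input only.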

refinegems.classes.medium.updated_db_to_schema(directory: str = '../data/database', inplace: bool = False)[source]

Extracts the SQL schema from the database data.db and transfers it into an SQL file.

Args:

directory(str,optional):
Path to the directory of the updated DB. Defaults to ‘../data/database’.
inplace(bool, optional):
If True, uses the default sql-file name, otherwise extends it with the prefix updated.

reports module

Classes to generate, handle, manipulate and save reports.

class refinegems.classes.reports.AuxotrophySimulationReport(results)[source]

Bases: Report

Report for the auxotrophy simulation.

Attributes:

simulation_results:
The data of the simulation.

__init__(results) → None[source]: Initialise the report.

save(dir: str | Path, color_palette: str = 'YlGn')[source]

Save the report to a given directory.

Args:

dir (Union[str, Path]):
Path to a directory to save the output to.
color_palette (str, optional):
Name of a matplotlib colour palette. Defaults to ‘YlGn’.

visualise_auxotrophies(color_palette: str = 'YlGn', save: None | str = None) → None | matplotlib.figure.Figure[source]

Visualise and/or save the content of the report.

Args:

color_palette (str, optional):
A name of a seaborn gradient color palette. In case name is unknown, takes the default. Defaults to ‘YlGn’.
save (None | str, optional):
Path to a directory, if the output shall be saved. Defaults to None (returns the figure).

Returns:

Case: save = str

None:
No return, as the visualisation is directly saved.
Case: save = None

matplotlib.figure.Figure:
The plotted figure.

class refinegems.classes.reports.CorePanAnalysisReport(model: cobra.Model, core_reac: list[str] = None, pan_reac: list[str] = None, novel_reac: list[str] = None)[source]

Bases: Report

Report for the core-pan analysis.

Summarises the information and provides functions for visualisation.

Attributes:

model:
The model the report is based on.
core_reac:
List of reactions considered “core”.
pan_reac:
List of reactions considered “pan”.
novel_reac:
List of reactions considered “novel”.

__init__(model: cobra.Model, core_reac: list[str] = None, pan_reac: list[str] = None, novel_reac: list[str] = None)[source]: Initialise the report.

get_reac_counts()[source]: Return a dictionary of the counts of the reactions types (core, pan, novel).

isValid(check='reaction-count') → bool[source]

Check if a certain part of the analysis is valid.

Currently possible checks:

reaction-count :
check if the number of reactions in the model equals the sum of the novel, pan and core reactions

Args:

check (str, optional):
Describes which part to check. Options are listed above. Defaults to ‘reaction-count’.

Raises:

ValueError: Unknown string for parameter check.

Returns:

bool:: Result of the check.
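
The ‘reaction-count’ check boils down to a single comparison. The function below is a hypothetical stand-alone version that takes plain lists of reaction IDs instead of a report object.

```python
def reaction_count_is_valid(model_reactions, core_reac, pan_reac, novel_reac):
    """Check that core, pan and novel reactions add up to the model's total."""
    return len(model_reactions) == len(core_reac) + len(pan_reac) + len(novel_reac)


# Four reactions in the model, split into 2 core, 1 pan and 1 novel reaction.
is_valid = reaction_count_is_valid(["R1", "R2", "R3", "R4"], ["R1", "R2"], ["R3"], ["R4"])
```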

save(dir: str | Path)[source]

Save the results inside a CorePanAnalysisReport object.

The function creates a new folder ‘pan-core-analysis’ inside the given directory and creates the following documents:

table_reactions.tsv : reactions ID mapped to their labels
visualise_reactions : donut chart of the values above

Args:

dir (Union[str, Path]):
Path to a directory to save the output to.

visualise_reactions() → figure[source]

Visualise the results of the pan-core analysis for the reactions as a donut chart.

Returns:

matplotlib.figure:: The plot.

class refinegems.classes.reports.GapFillerReport(variety: str, statistics: dict, manual_curation: dict, hide_zeros: bool = False, no_title: bool = False)[source]

Bases: Report

Report for the gap-filling of the model.

Note

In rare cases, the statistics might not sum up perfectly, as they only count the main steps, apart from the total and total missing amounts.

Attributes:

variety:
The variety of the gap-filling method used.
statistics:
List of different counts for reactions, genes and metabolites.
manual_curation:
List of IDs for manual curation.
hide_zeros:
Option to hide all zero values in the statistics. Defaults to False.

__init__(variety: str, statistics: dict, manual_curation: dict, hide_zeros: bool = False, no_title: bool = False) → None[source]: Initialise the report.

save(dir=str, color_palette: str = 'YlGn') → None[source]

Save the report.

Args:

dir (str):
Path to a directory to save the report to.
color_palette (str, optional):
A colour gradient from the matplotlib library. If the name does not exist, uses the default. Defaults to YlGn.

property statistics: Get or set the current statistics dictionary.

While setting, all zeros are removed from the provided dictionary and, if the values behind the keys ‘unmappable’ and ‘missing (remaining)’ are the same, only ‘missing (remaining)’ is kept.
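
The clean-up performed when setting the statistics can be sketched for a single flat mapping of label to count. The function clean_statistics and the flat layout are assumptions. The actual statistics dictionary may be nested (e.g. per reactions, genes and metabolites).

```python
def clean_statistics(counts):
    """Apply the documented clean-up to one {label: count} mapping."""
    # Step 1: drop all entries with a count of zero.
    cleaned = {label: value for label, value in counts.items() if value != 0}
    # Step 2: if both keys carry the same value, keep 'missing (remaining)' only.
    if (
        "unmappable" in cleaned
        and cleaned.get("missing (remaining)") == cleaned["unmappable"]
    ):
        del cleaned["unmappable"]
    return cleaned


counts = {"added": 3, "duplicates": 0, "unmappable": 2, "missing (remaining)": 2}
cleaned = clean_statistics(counts)
```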

visualise(color_palette: str = 'YlGn') → matplotlib.figure.Figure[source]

Visualise the basic information of the report.

Args:

color_palette (str, optional):
Colour palette to use for the plots. Defaults to ‘YlGn’.

Returns:

matplotlib.figure.Figure:: The visualisation as a single figure.

class refinegems.classes.reports.GrowthSimulationReport(reports: list[SingleGrowthSimulationReport] = None)[source]

Bases: Report

Report for the growth simulation analysis.

Attributes:

reports:
List of the reports of the single growth analyses.
model:
List of the model names.
media:
List of the media names.
supplementation:
List of the supplementation varieties.

__init__(reports: list[SingleGrowthSimulationReport] = None)[source]: Initialise the report.

__str__() → str[source]: Return a string representation of the report.

Note

Will be coming in a future release.

add_sim_results(new_rep: SingleGrowthSimulationReport)[source]

Add a new single growth report to the reports list

Args:

new_rep (SingleGrowthSimulationReport):
The new simulation report.

plot_growth(unit: Literal['h', 'dt'] = 'dt', color_palette: str = 'YlGn') → matplotlib.figure.Figure | None[source]

Visualise the contents of the report.

Note

Please keep in mind that the figure does not show unrealistically high or minuscule values; these are set to zero in the plot. However, all values are contained within the table one can get via to_table().

Args:

unit (Literal[‘h’,’dt’], optional):
Set the unit to plot. Can be doubling time in minutes (‘dt’) or growth rates in mmol/gDWh (‘h’). Defaults to ‘dt’.
color_palette (str, optional):
A colour gradient from the matplotlib library. If the name does not exist, uses the default. Defaults to ‘YlGn’.
**kwargs:
Additional keyword arguments for the plotting functions. See the ax.bar and sns.heatmap documentation for possible arguments.

Returns:

If plotting possible: matplotlib.figure.Figure:: The plotted figure.

Else None
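
The two units are linked by the standard relation for exponential growth, doubling time = ln(2) / growth rate. The sketch below assumes a growth rate per hour and converts the result to minutes. The function name is hypothetical and the exact conversion used by refineGEMs is not shown in this section.

```python
import math


def doubling_time_minutes(growth_rate_per_hour):
    """Doubling time in minutes for exponential growth at the given rate."""
    if growth_rate_per_hour <= 0:
        return math.inf  # no growth, so the population never doubles
    return math.log(2) / growth_rate_per_hour * 60.0


# A growth rate of ln(2) per hour corresponds to one doubling per hour.
one_hour = doubling_time_minutes(math.log(2))
```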

save(dir: str | Path, check_overwrite: bool = True, color_palette: str = 'YlGn')[source]

Save the report.

Args:

dir (Union[str, Path]):
Path to a directory to save the output to.
check_overwrite (bool, optional):
Flag to choose to check for existing directory/files of same name or just to overwrite them. Defaults to True.
color_palette (str, optional):
A colour gradient from the matplotlib library. If the name does not exist, uses the default. Defaults to ‘YlGn’.

to_table() → pandas.DataFrame[source]

Return a table of the contents of the report.

Returns:

pd.DataFrame:: The table containing the information in the report.

class refinegems.classes.reports.KEGGPathwayAnalysisReport(total_reac=None, kegg_count=None, kegg_global=None, kegg_over=None, kegg_rest=None)[source]

Bases: Report

Report for the KEGG pathway analysis.

Attributes:

total_reac:
An integer for the total number of reactions in the model.
kegg_count:
An integer as a counter for the KEGG pathway annotations.
kegg_global:
Dictionary of global KEGG IDs and their counts.
kegg_over:
Dictionary of overview KEGG IDs and their counts.
kegg_rest:
Dictionary of the remaining KEGG IDs and their counts.

__init__(total_reac=None, kegg_count=None, kegg_global=None, kegg_over=None, kegg_rest=None) → None[source]: Initialise the report.

save(dir: str | Path, colors: str = 'YlGn') → None[source]

Save the content of the report as plots.

Args:

dir (Union[str, Path]):
Path to a directory to save the output to.
colors(str,optional):
Colour palette for the plots. Should be a valid name of a matplotlib sequential colour palette.

visualise_kegg_counts(colors: list[str] = ['lightgreen', 'darkgreen']) → matplotlib.pyplot.figure[source]

Visualise the amounts of reaction with and without KEGG pathway annotation.

Args:

colors (list[str], optional):
List of two colours used for the plotting. If a wrong number of colours or non-matplotlib colours are given, sets them to the default. Defaults to ‘lightgreen’ and ‘darkgreen’.

Returns:

plt.figure:: The resulting plot.

visualise_kegg_pathway(plot_type: Literal['global', 'overview', 'high', 'existing'] = 'global', label: Literal['id', 'name'] = 'id', color_palette: str = 'YlGn') → matplotlib.pyplot.figure[source]

Visualise the KEGG pathway identifiers present.

Depending on the :plot_type:, different levels of pathway identifiers are plotted:

global: check and plot only the global pathway identifiers
overview: check and plot only the overview pathway identifiers
high: check and plot all identifiers grouped by their high level pathway identifiers. This option uses label=name, independently of the input
all: check and plot all identifiers

Args:

plot_type (Literal[“global”,”overview”,”high”,”existing”], optional):
Type of plot, explanation see above. Defaults to ‘global’.
label (Literal[“id”,”name”], optional):
Type of the label. If ‘id’, uses the KEGG pathway IDs, if ‘name’, uses the pathway names. Defaults to ‘id’.
color_palette (str, optional):
A colour gradient from the matplotlib library. If the name does not exist, uses the default. Defaults to ‘YlGn’.

Returns:

plt.figure:: The plotted visualisation.

refinegems.classes.reports.KEGG_GLOBAL_PATHWAY = {'01100': 'Metabolic pathways', '01110': 'Biosynthesis of secondary metabolites', '01120': 'Microbial metabolism in diverse environments'}

refinegems.classes.reports.KEGG_METABOLISM_PATHWAY

refinegems.classes.reports.KEGG_METABOLISM_PATHWAY_DATE = '6. July 2023'

refinegems.classes.reports.KEGG_OVERVIEW_PATHWAY = {'01200': 'Carbon metabolism', '01210': '2-Oxocarboxylic acid metabolism', '01212': 'Fatty acid metabolism', '01220': 'Degradation of aromatic compounds', '01230': 'Biosynthesis of amino acids', '01232': 'Nucleotide metabolism', '01240': 'Biosynthesis of cofactors', '01250': 'Biosynthesis of nucleotide sugars'}

class refinegems.classes.reports.ModelInfoReport(model: cobra.Model)[source]

Bases: Report

Report about the basic information of a given model.

Attributes:

name:
A string for the name of the model.
reac:
List of the reactions in the model.
meta:
List of the metabolites in the model.
gene:
An int that describes the number of genes in the model.
orphans:
List of metabolite IDs that are considered orphans.
deadends:
List of metabolite IDs that are considered dead-ends.
disconnects:
List of metabolite IDs that are disconnected in the model.
mass_charge_unbalanced:
List of reaction IDs that are unbalanced regarding their mass and their charges.
mass_unbalanced:
List of reaction IDs that are unbalanced regarding their mass only.
charge_unbalanced:
List of reaction IDs that are unbalanced regarding their charges only.
pseudo:
List of pseudoreaction IDs (sinks, demands, exchanges) in the model.
normal_with_gpr:
List of reaction IDs that are normal reactions with a GPR.
pseudo_with_gpr:
List of reaction IDs that are pseudoreactions with a GPR.

__init__(model: cobra.Model) → None[source]: Initialise the report.

format_table(all_counts=True) → pandas.DataFrame[source]

Put the information of the report into a pandas DataFrame table.

Args:

all_counts (bool, optional):
If set to True, lists (e.g. of reactions) are converted into counts; otherwise they are saved as lists. Defaults to True.

Returns:

pd.DataFrame:: The data in table format
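
The effect of all_counts can be pictured with a plain dictionary. The following is a minimal sketch, assuming the report data is held as a mapping from attribute name to value; the helper summarise and the example values are hypothetical and only mimic the described behaviour.

```python
def summarise(info, all_counts=True):
    """Hypothetical helper: replace list values by their length if all_counts is True."""
    if not all_counts:
        # Keep the lists as they are.
        return dict(info)
    return {
        key: len(value) if isinstance(value, list) else value
        for key, value in info.items()
    }


# Made-up example values, not taken from a real model.
info = {"name": "toy_model", "reac": ["R1", "R2", "R3"], "orphans": ["m1"], "gene": 2}
counts = summarise(info)
print(counts)
```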

make_html()[source]: Note

Will be coming in a future release.

save(dir: str | Path, color_palette: str = 'YlGn') → None[source]

Save the report.

Args:

dir (Union[str, Path]):
Path to a directory to save the output to.
color_palette (str, optional):
Colour palette of matplotlib to plot figures in. Defaults to ‘YlGn’.

visualise(color_palette: str = 'YlGn') → matplotlib.figure.Figure[source]

Visualise the basic information of the report.

Args:

color_palette (str, optional):
Colour palette to use for the plots. Defaults to ‘YlGn’.

Returns:

matplotlib.figure.Figure:: The visualisation as a single figure.

class refinegems.classes.reports.MultiModelInfoReport[source]

Bases: Report

__add__(other)[source]

__init__() → None[source]: Initialise the report.

add_single_report(report: ModelInfoReport) → None[source]

save()[source]: Note

Will be coming in a future release.

visualise()[source]: Note

Will be coming in a future release.

class refinegems.classes.reports.MultiSBOTermReport(reports: list[SBOTermReport] | SBOTermReport)[source]

Bases: Report

A collection of SBO term reports.

Attributes:

model_reports:
List of SBOTermReports.

__init__(reports: list[SBOTermReport] | SBOTermReport)[source]

Args:

reports (Union[list[SBOTermReport] | SBOTermReport]):
Either a single SBOTermReport or a list of SBOTermReports.

Raises:

ValueError: Wrong input type
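
Accepting either a single report or a list is a common input-normalisation pattern. The sketch below shows one way it could look; the stand-in SBOTermReport class and the helper normalise_reports are assumptions for illustration and do not reproduce the actual constructor.

```python
class SBOTermReport:
    """Stand-in for the real class, used only to make this sketch runnable."""

    def __init__(self, name):
        self.name = name


def normalise_reports(reports):
    """Hypothetical helper: always return a list of SBOTermReport objects."""
    if isinstance(reports, SBOTermReport):
        return [reports]
    if isinstance(reports, list) and all(isinstance(r, SBOTermReport) for r in reports):
        return list(reports)
    raise ValueError(f"Wrong input type: {type(reports).__name__}")


single = normalise_reports(SBOTermReport("model_a"))
several = normalise_reports([SBOTermReport("model_a"), SBOTermReport("model_b")])
print(len(single), len(several))
```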

add_report(report: SBOTermReport)[source]

Add another SBOTermReport to the report collection.

Args:

report (SBOTermReport):
The report to add.

save(dir: str | Path, rename: dict = Union[None, dict], color_palette: str | list[str] = 'Paired', figsize: tuple = (10, 10))[source]

Save the information contained in the report.

Args:

dir (Union[str, Path]):
Path to a directory to save the output to.
rename (Union[None,dict], optional):
Takes a dictionary of model IDs and alternative names. When set, uses the dictionary to rename the models. Defaults to None.
color_palette (Union[str,list[str]], optional):
Color palette name or list of colours for the graphic. Defaults to ‘Paired’.
figsize (tuple, optional):
Size of the figure. Requires a tuple of two integers. Defaults to (10,10).

visualise(rename: None | dict = None, color_palette: str | list[str] = 'Paired', show_invalid: bool = False, show_overall_counts: bool = False, kwargs: dict = {'figsize': (10, 10), 'legend_loc': 'lower right'}) → matplotlib.figure.Figure[source]

Visualise the amount of SBO terms in the models.

Args:

rename (Union[None,dict], optional):
Takes a dictionary of model IDs and alternative names. When set, uses the dictionary to rename the models. Defaults to None.
color_palette (Union[str,list[str]], optional):
Color palette name or list of colours for the graphic. Defaults to ‘Paired’.
show_invalid (bool, optional):
Whether to include invalid SBO terms in the visualisation. Defaults to False.
show_overall_counts (bool, optional):
Whether to show overall counts of SBO terms in the visualisation. Defaults to False.
kwargs (dict, optional):
Dictionary containing details for plotting the MultiSBOTermReport. Defaults to {‘figsize’: (10, 10), ‘legend_loc’:’lower right’}.

Raises:

TypeError: Unknown type for color palette.

Returns:

matplotlib.figure.Figure:: The generated graphic

class refinegems.classes.reports.Report[source]

Bases: ABC

Abstract base class for the reports.

Each subclass needs an implementation of save.

__init__()[source]: Initialise the report.

__str__()[source]: Return a string representation of the report.

Note

Will be coming in a future release.

abstract save(dir: str | Path, *args, **kwargs)[source]

Abstract method to save the report. The base implementation only ensures that the provided directory exists.

Args:

dir (Union[str, Path]):
Path to a directory to save the output to.
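
An abstract method may still carry a small shared implementation that subclasses call through super(). The sketch below illustrates that pattern for the directory check; the Report class here is a simplified stand-in, and TextReport with its filename parameter is hypothetical.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class Report(ABC):
    """Simplified stand-in for the abstract base class."""

    @abstractmethod
    def save(self, dir, *args, **kwargs):
        # Shared behaviour: only make sure the directory exists.
        Path(dir).mkdir(parents=True, exist_ok=True)


class TextReport(Report):
    """Hypothetical subclass used for illustration."""

    def __init__(self, text):
        self.text = text

    def save(self, dir, filename="report.txt"):
        super().save(dir)  # create the directory if needed
        target = Path(dir) / filename
        target.write_text(self.text)
        return target


with tempfile.TemporaryDirectory() as tmp:
    out = TextReport("hello").save(Path(tmp) / "nested" / "out")
    saved_text = out.read_text()
print(saved_text)
```

Because save is abstract, the base class itself cannot be instantiated; only subclasses that implement save can.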

to_dict() → dict[source]: Return the contents of the report as a dictionary.

Note

Will be coming in a future release.

to_table() → pandas.DataFrame[source]: Return the contents of the report as a pandas.DataFrame.

Note

Will be coming in a future release.

visualise(*args, **kwargs)[source]: Visualise the report contents. Should return a matplotlib.figure.Figure.

Note

Will be coming in a future release.

class refinegems.classes.reports.SBOTermReport(model: libsbml.Model, name: str | None = None)[source]

Bases: Report

Report of the SBO terms of a model.

Attributes:

name:
Name (ID) of the model.
sbodata:
Dictionary containing the SBO terms and the corresponding counts of annotations found in the model. Only includes SBO terms that have at least one occurrence in the model.

__init__(model: libsbml.Model, name: str | None = None)[source]

Args:

model (libModel):
A model loaded with libSBML.
name (Union[str, None], optional):
An optional name for the model. If not provided, the ID of the model is used. Defaults to None.
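
The shape of the sbodata attribute described above (term mapped to count, with only occurring terms present) matches what collections.Counter produces. The following is a minimal sketch under that assumption; the annotation list is made up and the real class reads its data from a libSBML model instead.

```python
from collections import Counter

# Made-up SBO term annotations, e.g. one per annotated entity of a toy model.
annotations = ["SBO:0000176", "SBO:0000176", "SBO:0000627", "SBO:0000185"]

# Counting yields only terms that occur at least once.
sbodata = dict(Counter(annotations))
print(sbodata)
```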

save(dir: str | Path)[source]

Save the information inside the report.

Args:

dir (Union[str, Path]):
Path to a directory to save the output to.

visualise(color_palette: str = 'forestgreen', show_invalid: bool = False, show_overall_counts: bool = False) → matplotlib.figure.Figure[source]

Visualise the amount of SBO terms found in the model the report was created with.

Args:

color_palette (str, optional):
Name of a color. Defaults to ‘forestgreen’.
show_invalid (bool, optional):
Whether to include invalid SBO terms in the visualisation. Defaults to False.
show_overall_counts (bool, optional):
Whether to show overall counts of SBO terms in the visualisation. Defaults to False.

Returns:

matplotlib.figure.Figure:: The created graphic.

class refinegems.classes.reports.SingleGrowthSimulationReport(model_name=None, medium_name=None, supplementation_variety=None, growth_value=None, doubling_time=None, additives=None, no_exchange=None)[source]

Bases: Report

Report for a single growth simulation, one medium against one model.

Attributes:

model_name:
Name of the model.
medium_name:
Name of the medium.
supplementation_variety:
Variety of the supplementation. One of [‘min’, ‘std’, None].
growth_value:
Simulated growth value.
doubling_time:
Simulated doubling time.
additives:
List of substances, that were added.
no_exchange:
List of substances that normally would be found in the media but have been removed, as they are not part of the exchange reactions of the model.

__init__(model_name=None, medium_name=None, supplementation_variety=None, growth_value=None, doubling_time=None, additives=None, no_exchange=None)[source]: Initialise the report.

__str__()[source]: Return a string representation of the report.

Note

Will be coming in a future release.

save(dir: str | Path)[source]

Save the report.

Args:

dir (Union[str, Path]):
Path to a directory to save the output to.
check_overwrite (bool, optional):
Flag to choose to check for existing directory/files of same name or just to overwrite them. Defaults to True.

to_dict() → dict[source]

Transform the information into a dictionary.

Returns:

dict:: The information of the report as a dictionary.
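
For a report that is essentially a bundle of attributes, the dictionary form can be pictured with a dataclass. This is a sketch only: GrowthResult is a hypothetical stand-in that reuses the attribute names listed above, and the example values are invented.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GrowthResult:
    """Hypothetical stand-in with the attributes listed above."""

    model_name: Optional[str] = None
    medium_name: Optional[str] = None
    supplementation_variety: Optional[str] = None
    growth_value: Optional[float] = None
    doubling_time: Optional[float] = None
    additives: Optional[list] = None
    no_exchange: Optional[list] = None


# Invented example values.
result = GrowthResult(model_name="toy_model", medium_name="M9", growth_value=0.5)
as_dict = asdict(result)
print(as_dict["model_name"], as_dict["growth_value"])
```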

class refinegems.classes.reports.SourceTestReport(results: pandas.DataFrame = None, element: str = None, model_name: str = None)[source]

Bases: Report

Report for the source test (rg.growth.test_growth_with_source).

Attributes:

results:
A pd.DataFrame with the results (substances and growth values).
element:
The element the test was performed for.
model_name:
The name of the model, that was tested.

__init__(results: pandas.DataFrame = None, element: str = None, model_name: str = None)[source]: Initialise the report.

save(dir: str | Path, width: int = 12, color_palette: str = 'YlGn') → None[source]

Save the results of the source test.

Args:

dir (Union[str, Path]):
Path to a directory to save the output to.
width (int, optional):
Number of columns for the heatmap. Defaults to 12.
color_palette (str, optional):
Color palette (gradient) for the plot. Defaults to ‘YlGn’.

visualise(width: int = 12, color_palette: str = 'YlGn') → tuple[matplotlib.figure.Figure, pandas.DataFrame][source]

Visualise the results of the source test as a heatmap.

Args:

width (int, optional):
Number of columns to display for the heatmap. The number of rows is calculated accordingly to fit all values. Defaults to 12.
color_palette (str, optional):
Color palette (gradient) for the plot. Defaults to ‘YlGn’.

Returns:

tuple(matplotlib.Figure, pd.DataFrame):: The heatmap and the legend explaining the heatmap.
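
The relation between width and the number of rows can be sketched as a ceiling division followed by padding of the last row. The helper to_grid below is hypothetical and uses None as the padding value; how the library fills empty cells is not stated here.

```python
import math


def to_grid(values, width=12):
    """Hypothetical helper: arrange values row by row, padding the last row with None."""
    rows = math.ceil(len(values) / width)
    padded = list(values) + [None] * (rows * width - len(values))
    return [padded[i * width:(i + 1) * width] for i in range(rows)]


# 30 made-up growth values need 3 rows of 12 columns.
grid = to_grid(list(range(30)), width=12)
print(len(grid), len(grid[0]))
```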