refinegems.classes subpackage
egcs module
Identify, report and solve energy generating cycles (EGCs).
- refinegems.classes.egcs.DISSIPATION_RXNS = {'ACCOA': {'Acetate [Acetic acid]': 1, 'Acetyl-CoA': -1, 'Coenzyme A': 1, 'Hydrogen [H(+)]': 1, 'Water [H2O]': -1}, 'ATP': {'ADP [Adenosine diphosphate]': 1, 'ATP [Adenosine triphosphate]': -1, 'Hydrogen [H(+)]': 1, 'Phosphate [PO4(3-)]': 1, 'Water [H2O]': -1}, 'CTP': {'CDP [Cytidine diphosphate]': 1, 'CTP [Cytidine triphosphate]': -1, 'Hydrogen [H(+)]': 1, 'Phosphate [PO4(3-)]': 1, 'Water [H2O]': -1}, 'DMMQL8': {'2-Demethylmenaquinol-8': 1, '2-Demethylmenaquinone-8': -1, 'Hydrogen [H(+)]': 2}, 'FADH2': {'FAD [oxidized Flavin adenine dinucleotide]': 1, 'FADH2 [reduced Flavin adenine dinucleotide]': -1, 'Hydrogen [H(+)]': 2}, 'FMNH2': {'FMN [oxidized Flavin mononucleotide]': 1, 'FMNH2 [reduced Flavin mononucleotide]': -1, 'Hydrogen [H(+)]': 2}, 'GLU': {'2-Oxoglutarate [Oxoglutaric acid]': 1, 'Ammonia': 1, 'D-Glucose': -1, 'Hydrogen [H(+)]': 2, 'Water [H2O]': -1}, 'GTP': {'GDP [Guanosine diphosphate]': 1, 'GTP [Guanosine triphosphate]': -1, 'Hydrogen [H(+)]': 1, 'Phosphate [PO4(3-)]': 1, 'Water [H2O]': -1}, 'ITP': {'Hydrogen [H(+)]': 1, 'IDP [Inosine diphosphate]': 1, 'ITP [Inosine triphosphate]': -1, 'Phosphate [PO4(3-)]': 1, 'Water [H2O]': -1}, 'MQL8': {'Hydrogen [H(+)]': 2, 'Menaquinol-8': 1, 'Menaquinone-8': -1}, 'NADH': {'Hydrogen [H(+)]': 1, 'NAD [oxidized Nicotinamide adenine dinucleotide]': 1, 'NADH [reduced Nicotinamide adenine dinucleotide]': -1}, 'NADPH': {'Hydrogen [H(+)]': 1, 'NADP [oxidized Nicotinamide adenine dinucleotide phosphate]': 1, 'NADPH [reduced Nicotinamide adenine dinucleotide phosphate]': -1}, 'PROTON': {'Hydrogen [H(+)]': 1, 'Hydrogen [H(+)] transported': -1}, 'Q8H2': {'Hydrogen [H(+)]': 2, 'Ubiquinol-8': 1, 'Ubiquinone-8': -1}, 'UTP': {'Hydrogen [H(+)]': 1, 'Phosphate [PO4(3-)]': 1, 'UDP [Uridine diphosphate]': 1, 'UTP [Uridine triphosphate]': -1, 'Water [H2O]': -1}}
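Each entry maps metabolite names to stoichiometric coefficients: negative values are consumed, positive values are produced. The following is a minimal sketch of how one entry can be read as a reaction equation. The helper `to_equation` is hypothetical and not part of refineGEMs; the ATP entry is copied from the constant above.

```python
# ATP entry copied from DISSIPATION_RXNS above.
ATP = {
    "ADP [Adenosine diphosphate]": 1,
    "ATP [Adenosine triphosphate]": -1,
    "Hydrogen [H(+)]": 1,
    "Phosphate [PO4(3-)]": 1,
    "Water [H2O]": -1,
}


def to_equation(stoichiometry: dict) -> str:
    """Hypothetical helper: render a coefficient dict as 'reactants --> products'."""

    def side(items):
        # Coefficients of 1 are omitted, larger ones are written as a prefix.
        return " + ".join(
            name if abs(coef) == 1 else f"{abs(coef)} {name}"
            for name, coef in sorted(items)
        )

    reactants = [(n, c) for n, c in stoichiometry.items() if c < 0]
    products = [(n, c) for n, c in stoichiometry.items() if c > 0]
    return f"{side(reactants)} --> {side(products)}"


print(to_equation(ATP))
```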
- class refinegems.classes.egcs.EGCSolver(threshold: float = MIN_GROWTH_THRESHOLD, limit: int = 2, chunksize: int = 1)
Bases: object
Parent class for the EGC solvers with generally useful functions and attributes. Can only be used to find EGCs, not to solve them directly.
- Attributes:
- threshold: Float describing the cutoff below which the model
is no longer considered to be growing. Defaults to the MIN_GROWTH_THRESHOLD set in the growth module.
- limit: Sets the maximal number of cores to be used.
Defaults to 2.
- chunksize: Chunksize to use for multiprocessing.
Defaults to 1.
- add_DISSIPATIONRXNS(model: cobra.Model, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) -> cobra.Model
Add the dissipation reactions to a model.
- Args:
- model (cobra.Model):
A model loaded with COBRApy.
- namespace (Literal[‘BiGG’], optional): Namespace of the model.
Defaults to ‘BiGG’.
- compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to [‘c’,’e’].
- Returns:
- cobra.Model:
The edited model.
- check_metab_integration(metabolites: dict[str: int], model: cobra.Model, metab_info: Medium, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) -> None | dict
Check if the metabolites of a reaction are in the model. If yes, return a dictionary mapping the metabolites (by their IDs in the model) to the factors. If no, return None.
- Args:
- metabolites (dict[str: int]):
Metabolites mapped to factors.
- model (cobra.Model):
The model loaded with COBRApy.
- metab_info (Medium):
Information about the metabolites from the database, in the form of a Medium object.
- namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
- compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to [‘c’,’e’].
- Returns:
- Case: metabolites not found
- None:
nothing to return
- Case: found
- dict:
The mapping of IDs of the metabolites to the factors.
- egcs_removed(model: cobra.Model, starting_egcs: dict, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) -> list
Compare a list of previously found EGCs to the current EGCs in the model.
- Args:
- model (cobra.Model):
The model loaded with COBRApy after an attempt to solve the EGCs.
- starting_egcs (dict):
EGCs found before trying to solve them.
- namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
- compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to [‘c’,’e’].
- Returns:
- list:
List of newly removed EGCs.
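The comparison can be pictured as a simple difference between the EGCs found before and after a change. The sketch below assumes that `starting_egcs` is keyed by EGC name (as in DISSIPATION_RXNS) and that the current EGCs are available as a list of names. The helper `removed_egcs` and the example data are hypothetical and do not reproduce the library's implementation.

```python
def removed_egcs(starting_egcs: dict, current_egcs: list) -> list:
    """Hypothetical helper: names in starting_egcs that are no longer found."""
    still_present = set(current_egcs)
    return [name for name in starting_egcs if name not in still_present]


before = {"ATP": {}, "NADH": {}, "PROTON": {}}  # made-up EGCs before a fix
after = ["NADH"]                                # made-up EGCs still found afterwards
print(removed_egcs(before, after))
```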
- find_egcs(model: cobra.Model, with_reacs: bool = False, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) -> list | tuple
Find the EGCs in a model, if any exist.
- Args:
- model (cobra.Model):
The model loaded with COBRApy.
- with_reacs (bool, optional):
Option to return either only the names of the found EGCs or, additionally, the reactions that show fluxes during testing. Defaults to False.
- namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
- compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to [‘c’,’e’].
- Returns:
- Case: with_reacs = False
- list:
List of found EGC names.
- Case: with_reacs = True
- tuple:
Tuple of (1) dictionary & (2) list:
dict: dictionary of the EGCs
list: their reactions that showed fluxes and the objective values of the test.
- refinegems.classes.egcs.EGC_SCORING_MATRIX = {'MR': 1, 'RB': 3, 'RF': 3, 'RM': 6}
- class refinegems.classes.egcs.GreedyEGCSolver(scoring_matrix: dict = EGC_SCORING_MATRIX, **kwargs)
Bases: EGCSolver
EGC solver that finds a good solution (greedy) based on modifications to single reactions.
Workflow:
1. identify existing EGCs
2. test whether EGCs can be solved using single modifications of reactions; possible modifications:
   - deletion (RM)
   - set reversible (MR)
   - remove backward (forward only) (RB)
   - remove forward (backward only) (RF)
3. find a good (not optimal) combination of reactions that solves the maximum number of EGCs that can be solved this way
4. apply the solution to the model
5. report remaining EGCs, score and reactions used for the solution
- Attributes:
All attributes of the base class refinegems.classes.egcs.EGCSolver
- scoring_matrix:
Dictionary mapping the changes (RM, MR, RF, RB) to integers describing the penalty scores.
- apply_modifications(model: cobra.Model, solution: dict)
Apply the modifications to reactions in solution to the model.
4 modifications are possible:
“RM” -> removes the reaction
“RB” -> removes the backwards reaction
“RF” -> removes the forward reaction
“MR” -> makes reaction reversible
- Args:
- model (cobra.Model):
Input model
- solution (dict):
Best solution as calculated by find_solution_greedy().
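The four modification codes can be read as changes to a reaction's flux bounds. The sketch below is an assumption-laden illustration, not the refineGEMs implementation: `Rxn` is a stand-in for a reaction object, the default flux limit of 1000 is assumed, and "RM" is modelled by closing both bounds (as described for test_modifications) rather than by deleting the reaction.

```python
from dataclasses import dataclass


@dataclass
class Rxn:
    """Stand-in for a reaction with flux bounds (not cobra.Reaction)."""
    id: str
    lower_bound: float
    upper_bound: float


def new_bounds(rxn: Rxn, mod: str, max_flux: float = 1000.0) -> tuple:
    """Assumed translation of a modification code into (lower, upper) bounds."""
    if mod == "RM":  # "delete": no flux in either direction
        return (0.0, 0.0)
    if mod == "RB":  # remove backward: forward only
        return (0.0, max(rxn.upper_bound, 0.0))
    if mod == "RF":  # remove forward: backward only
        return (min(rxn.lower_bound, 0.0), 0.0)
    if mod == "MR":  # make reversible: open both directions
        return (-max_flux, max_flux)
    raise ValueError(f"unknown modification: {mod}")


def apply_solution(reactions: dict, solution: dict) -> None:
    """Apply a {reaction_id: modification} mapping in place."""
    for rid, mod in solution.items():
        rxn = reactions[rid]
        rxn.lower_bound, rxn.upper_bound = new_bounds(rxn, mod)


rxns = {"R1": Rxn("R1", -1000.0, 1000.0), "R2": Rxn("R2", 0.0, 1000.0)}
apply_solution(rxns, {"R1": "RB", "R2": "RM"})
print(rxns["R1"], rxns["R2"])
```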
- check_egc_growth(reac: cobra.Reaction, model: cobra.Model, bounds: tuple, starting_egcs: dict, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) -> list | None
Check EGC removal and growth of a model when changing the bounds of a single reaction.
- Args:
- reac (cobra.Reaction):
The reaction to change
- model (cobra.Model):
The model (COBRApy) to manipulate.
- bounds (tuple):
The new reaction bounds.
- starting_egcs (dict):
Dict of the original EGCs found in the model.
- namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
- compartment (list, optional): List of length 2 with the names of the
compartments for the dissipation reactions. Defaults to [‘c’,’e’].
- Returns:
- Case if EGCs removed
- list:
List of EGCs that can be removed with the change
- Case no removal possible
- None:
no return
- find_mods_resolve_egcs_greedy(model: cobra.Model, present_egcs: dict, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) -> dict
Find the (single) modifications to reactions in a cobra.Model that resolve EGCs and return them in a dictionary. Splits the modification check across multiple processes.
- Args:
- model (cobra.Model):
The model loaded with COBRApy.
- present_egcs (dict):
Dict of the original EGCs found in the model.
- namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
- compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to [‘c’,’e’].
- Returns:
- dict:
Dictionary of potential modifications to resolve EGCs {“egc”: {“MR”:[potential_solutions], “RB”:[potential_solutions], “RF”:[potential_solutions], “RM”:[potential_solutions]}}
- find_solution_greedy(results: dict, egc_reactions: dict) -> tuple
Based on the originally found EGCs and the output of find_mods_resolve_egcs_greedy(), find a solution that solves all EGCs that can be solved with the results.
- Args:
- results (dict):
Output of find_mods_resolve_egcs_greedy().
- egc_reactions (dict):
Output of find_egcs() with with_reacs=True. Should be the EGCs before calculating any solutions and applying them.
- scoring_matrix (dict, optional):
Dictionary of the modification types (RM, MR, RF, RB) and their penalty scores. Defaults to the built-in scoring matrix.
- Returns:
- tuple:
Tuple of (1) dict & (2) int:
dictionary of reaction IDs and their mode of change
score of the solution
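To illustrate what a greedy selection over such results can look like, here is a minimal sketch. It is not the refineGEMs algorithm: it assumes the potential solutions are reaction IDs, ignores `egc_reactions`, and uses one plausible rule (pick the modification that resolves the most open EGCs, break ties by the lowest penalty). The function `greedy_solution` and the example data are hypothetical.

```python
EGC_SCORING_MATRIX = {"MR": 1, "RB": 3, "RF": 3, "RM": 6}  # values from above


def greedy_solution(results: dict, scoring: dict = EGC_SCORING_MATRIX) -> tuple:
    """Illustrative greedy cover over {egc: {mod: [reaction_ids]}}."""
    # Invert results: (reaction_id, modification) -> set of EGCs it resolves.
    covers = {}
    for egc, mods in results.items():
        for mod, reaction_ids in mods.items():
            for rid in reaction_ids:
                covers.setdefault((rid, mod), set()).add(egc)

    unsolved = {egc for egc, mods in results.items() if any(mods.values())}
    solution, score = {}, 0
    while unsolved:
        candidates = [
            (cand, egcs & unsolved)
            for cand, egcs in covers.items()
            if cand[0] not in solution and egcs & unsolved
        ]
        if not candidates:  # the rest would need an already modified reaction
            break
        # Most newly resolved EGCs first, then the lowest penalty.
        (rid, mod), solved = max(
            candidates, key=lambda item: (len(item[1]), -scoring[item[0][1]])
        )
        solution[rid] = mod
        score += scoring[mod]
        unsolved -= solved
    return solution, score


results = {  # made-up example in the documented result format
    "ATP": {"MR": [], "RB": ["R1"], "RF": [], "RM": ["R2"]},
    "NADH": {"MR": [], "RB": ["R1"], "RF": [], "RM": []},
    "PROTON": {"MR": ["R3"], "RB": [], "RF": [], "RM": ["R3"]},
}
print(greedy_solution(results))
```

The returned tuple mirrors the documented return value: a dictionary of reaction IDs with their mode of change, and the summed penalty score.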
- solve_egcs(model: cobra.Model, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) -> dict | None
Run the complete greedy EGC solving process.
Note: The input model is modified if EGCs can be solved.
- Args:
- model (cobra.Model):
The model loaded with COBRApy.
- namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
- compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to [‘c’,’e’].
- Returns:
- dict:
Dictionary with the following entries.
‘solution’: List of reactions for the solution.
‘score’: Score of the solution.
‘remaining egcs’: List of EGCs that could not be solved.
- test_modifications(reaction: cobra.Reaction, model: cobra.Model, present_egc: dict, namespace: Literal['BiGG'] = 'BiGG', compartment: list = ['c', 'e']) -> dict
Tries four cases for a reaction:
- if the reaction is not reversible -> make the reaction reversible (MR)
- limit the backward reaction (RB)
- limit the forward reaction (RF)
- “delete” the reaction by setting its fluxes to 0 (RM)
For each case, the EGCs present in the model are checked for removal. If EGCs are removed, the model is checked for whether it still grows on the optimal medium. When both conditions are met, the reaction is saved to the corresponding dictionary.
- Args:
- reaction (cobra.Reaction):
Reaction from a cobra.Model
- model (cobra.Model):
The corresponding GEM loaded with COBRApy.
- present_egc (dict):
Dictionary of present EGCs {“egc”: {}} -> EGCs are keys
- namespace (Literal[‘BiGG’], optional):
String for the namespace used in the model. Current options include ‘BiGG’. Defaults to ‘BiGG’.
- compartment (list, optional):
List of length 2 with the names of the compartments for the dissipation reactions. Defaults to [‘c’,’e’].
- Returns:
- dict:
{“egc”: {“MR”:[potential_solutions], “RB”:[potential_solutions], “RF”:[potential_solutions], “RM”:[potential_solutions]}}
gapfill module
Add reactions, genes and more to a model based on different gap-filling methods. All (current) algorithms are separated into three steps: finding missing genes, finding missing reactions and trying to add the entities found to be missing to the model.
Available gap filling methods:
- KEGGapFiller: Mainly utilises information from the KEGG database. Needs a KEGG organism ID. Estimated runtime: to be determined
- BioCycGapFiller: Mainly utilises information from the BioCyc database. Requires access to BioCyc SmartTables. Estimated runtime: to be determined
- GeneGapFiller: Search for gaps using the GFF file and information from SwissProt. Estimated runtime: to be determined
- class refinegems.classes.gapfill.BioCycGapFiller(biocyc_gene_tbl_path: str, biocyc_reacs_tbl_path: str, gff: str)
Bases: GapFiller
Based on a SmartTable with information on the genes and a SmartTable with information on the reactions of the organism of the model, this class finds missing genes in the model and maps them to reactions to try and fill the gaps found with the BioCyc gene SmartTable. For specifications on the SmartTables see the attributes biocyc_gene_tbl & biocyc_reacs_tbl.
Note
Please keep in mind that using this module requires a model containing the Genbank locus tags as labels. If your model does not conform to this you can use one of the functions polish_model() or extend_gp_annots_via_mapping_table().
- Attributes:
- GapFiller Attributes:
All attributes of the parent class GapFiller
- biocyc_gene_tbl_path (str, required):
Path to organism-specific SmartTable for genes from BioCyc; Should contain the columns:
Accession-2 | Reactions of gene
- biocyc_reacs_tbl_path (str, required):
Path to organism-specific SmartTable for reactions from BioCyc; Should contain the columns:
Reaction | Object ID | EC-Number | Spontaneous?
- gff (str, required):
Path to organism-specific GFF file
- property biocyc_rxn_tbl
Get or set the current BioCyc reaction table. When setting, the provided path to a TSV file from BioCyc with the columns 'Reaction' | 'Object ID' | 'EC-Number' | 'Spontaneous?' is parsed and the content of the file is stored in a DataFrame.
- find_missing_genes(model: libsbml.Model)
Retrieves the missing genes and reactions from the BioCyc table according to the ‘Accession-2’ identifiers (locus_tags)
- Args:
- model (libModel):
Model loaded with libSBML
- find_missing_reactions(model: cobra.Model)
Retrieves the missing reactions with more information like the equation, EC code, etc. according to the missing genes
- Args:
- model (cobra.Model):
Model loaded with COBRApy
- property full_gene_list
Get or set the current BioCyc gene table. When setting, the provided path to a TSV file from BioCyc with the columns 'Accession-2' | 'Reactions of gene' is parsed and the content of the file is stored in a DataFrame containing all rows where a ‘Reactions of gene’ entry exists.
Hint
Please keep in mind that the column Accession-2 needs to contain Genbank locus tags. If that is not the case for your organism, use the correct column from BioCyc and rename it accordingly.
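The row filter described for the gene table can be sketched with the standard library alone. This is a stand-in that uses `csv` instead of a pandas DataFrame; the table excerpt, including the ` // ` separator between reactions, is made up for illustration, and `genes_with_reactions` is a hypothetical helper.

```python
import csv
import io

# Made-up two-column excerpt in the SmartTable layout described above.
TSV = (
    "Accession-2\tReactions of gene\n"
    "b0001\tRXN-1 // RXN-2\n"
    "b0002\t\n"
    "b0003\tRXN-3\n"
)


def genes_with_reactions(handle) -> list:
    """Keep only rows whose 'Reactions of gene' cell is non-empty."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if (row["Reactions of gene"] or "").strip()]


rows = genes_with_reactions(io.StringIO(TSV))
print([row["Accession-2"] for row in rows])
```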
- class refinegems.classes.gapfill.GapFiller
Bases: ABC
Abstract base class for the gap filling.
Already includes functions for the “filling” part of the gap-filling approach and some helper functions. Each subclass needs an implementation of find_missing_genes and find_missing_reactions to determine the entities that are missing in the model.
- Attributes:
- full_gene_list (list):
List of all the genes.
- missing_genes:
DataFrame containing all genes found as missing with additional information, in the format:
ncbiprotein | locus_tag | <optional columns>
- missing_reactions:
DataFrame containing all reactions found as missing with additional information, in the format:
ec-code | ncbiprotein | id | equation | reference | <is_transport> | via | add_to_GPR
- geneid_type (str):
What type of gene ID the model contains. Defaults to ‘ncbi’.
- _statistics (dict):
Dictionary of statistical information of the gap-filling run. Includes e.g. the number of added genes and reactions.
- manual_curation (dict):
Dictionary of reaction and gene IDs to be used for manual curation.
- _find_reac_in_model(model: cobra.Model, eccode: str, id: str, idtype: Literal['MetaNetX', 'KEGG', 'BiGG', 'BioCyc'], include_ec_match: bool = False) -> None | list
Helper function of GapFiller. Search the model for an ID (and optionally an EC number) to determine whether the reaction is in the model.
- Args:
- model (cobra.Model):
The model loaded with COBRApy.
- eccode (str):
The EC number in the format: X.X.X.X
- id (str):
The ID to search for.
- idtype (Literal[‘MetaNetX’,’KEGG’,’BiGG’, ‘BioCyc’]):
Name of the database the ID belongs to.
- include_ec_match (bool, optional):
Option to include a match if only the EC number matches. Defaults to False.
- Returns:
- Case one or more match found:
- list:
List of the IDs of the reactions in the model that match the query.
- Case no match:
- None:
Nothing found.
- add_gene_reac_associations_from_table(model: libsbml.Model, reac_table: pandas.DataFrame) -> None
Using a table with at least the columns ‘ncbiprotein’ (containing e.g. NCBI protein identifiers (lists), which should be gene IDs in the model) and ‘add_to_GPR’ (containing reaction identifiers (lists)), add the gene IDs to the GPRs of the corresponding reactions.
- Args:
- model (libModel):
The model loaded with libSBML.
- reac_table (pd.DataFrame):
The table containing at least the columns ‘ncbiprotein’ (gene IDs) and ‘add_to_GPR’ (reaction IDs)
- add_genes_from_table(model: libsbml.Model, gene_table: pandas.DataFrame) -> None
Create new GeneProducts for a table of genes in the format:
ncbiprotein | locus_tag | UniProt | … |
The dots symbolise additional columns that can be passed to the function but will not be used by it. The other columns, except UniProt, are required.
- Args:
- model (libModel):
The model loaded with libSBML.
- gene_table (pd.DataFrame):
The table with the genes to add. At least needs the columns ncbiprotein and locus_tag. Optional columns include UniProt, among others.
- add_reactions_from_table(model: cobra.Model, missing_reac_table: pandas.DataFrame, formula_check: Literal['none', 'existence', 'wildcard', 'strict'] = 'existence', exclude_dna: bool = True, exclude_rna: bool = True, idprefix: str = 'refineGEMs', namespace: Literal['BiGG'] = 'BiGG') -> pandas.DataFrame
Helper function to add reactions to a model from the missing_reactions table (output of the chosen implementation of find_missing_reactions()).
- Args:
- model (cobra.Model):
The model, loaded with COBRApy.
- missing_reac_table (pd.DataFrame):
The missing reactions table.
- formula_check (Literal[‘none’,’existence’,’wildcard’,’strict’], optional):
Parameter for checking metabolite formulas before adding them to the model. For more information, refer to isreaction_complete(). Defaults to ‘existence’.
- exclude_dna (bool, optional):
Option to exclude reactions containing ‘DNA’ from being added to the model. Defaults to True.
- exclude_rna (bool, optional):
Option to exclude reactions containing ‘RNA’ from being added to the model. Defaults to True.
- idprefix (str, optional):
A prefix to use, if pseudo-IDs need to be created. Defaults to ‘refineGEMs’.
- namespace (Literal[‘BiGG’], optional):
Namespace to use for the reactions and metabolites (and the model). Defaults to ‘BiGG’.
- Raises:
TypeError: Unknown return type for reac param. Please contact the developers.
- Returns:
- pd.DataFrame:
Table containing the information about which genes can now be added to reactions (use for GPR curation).
- fill_model(model: cobra.Model | libsbml.Model, **kwargs) -> libsbml.Model
Based on a table of missing genes and missing reactions, automatically fill the gaps in a model as well as possible.
Note
This method rewrites and reloads the input model. Only the returned model has all the edits.
- Args:
- model (Union[cobra.Model,libModel]):
The model, either a libSBML or COBRApy model entity.
- kwargs:
Additional parameters to be passed to add_reactions_from_table().
- Raises:
TypeError: Unknown type of model.
- Returns:
- libModel:
The gap-filled model.
- abstract find_missing_genes(model)
Find missing genes in the model. Parameters can be extended as needed.
Needs to save a table in the format
ncbiprotein | locus_tag | <optional columns>
to the attribute missing_genes.
- abstract find_missing_reactions(model)
Find missing reactions in the model. Parameters can be extended as needed.
Needs to save a table of the format
ec-code | ncbiprotein | id | equation | reference | <is_transport> | via | add_to_GPR
to the attribute missing_reactions.
Method specific information can be added to the reference column, which is expected to contain a dictionary. The ‘via’ column describes the way the database will be added to the model.
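The contract for subclasses can be illustrated with a stripped-down stand-in. The sketch below does not import refineGEMs: `GapFillerSketch` mirrors only the two abstract methods, `ToyGapFiller` is a hypothetical subclass, plain lists of dicts stand in for the DataFrames, and all identifiers in the rows are made-up example values.

```python
from abc import ABC, abstractmethod


class GapFillerSketch(ABC):
    """Stand-in mirroring only the two abstract methods of GapFiller."""

    def __init__(self):
        self.missing_genes = None
        self.missing_reactions = None

    @abstractmethod
    def find_missing_genes(self, model): ...

    @abstractmethod
    def find_missing_reactions(self, model): ...


class ToyGapFiller(GapFillerSketch):
    """Hypothetical subclass that stores fixed rows in both attributes."""

    def find_missing_genes(self, model):
        # Required columns: ncbiprotein | locus_tag
        self.missing_genes = [{"ncbiprotein": "WP_000001.1", "locus_tag": "b0001"}]

    def find_missing_reactions(self, model):
        # Columns follow the documented missing_reactions format.
        self.missing_reactions = [{
            "ec-code": "1.1.1.1", "ncbiprotein": ["WP_000001.1"], "id": "ALCD2x",
            "equation": None, "reference": {}, "via": "BiGG", "add_to_GPR": None,
        }]


filler = ToyGapFiller()
filler.find_missing_genes(model=None)
filler.find_missing_reactions(model=None)
print(filler.missing_genes, filler.missing_reactions)
```

Instantiating the base class directly raises a `TypeError`, which is how `ABC` enforces that both methods are implemented.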
- report(dir: str, hide_zeros: bool = False, no_title: bool = False) -> None
Based on the previous gap-filling, save statistics and missing genes/reactions for manual curation.
- Args:
- dir (str):
Path to a directory to save the report to.
- hide_zeros (bool, optional):
Option to hide statistics with zero counts. Defaults to False.
- no_title (bool, optional):
Option to get the figure without a title. Defaults to False.
- class refinegems.classes.gapfill.GeneGapFiller
Bases: GapFiller
Find gaps in the model using the GFF file of the underlying genome and a DMND database and optionally NCBI.
This gap filling approach tries to identify missing genes from the GFF file and uses DIAMOND to run a blastp search for homologs against the DMND database.
Note
Please keep in mind that using this module requires a model containing the Genbank locus tags as labels. If your model does not conform to this you can use one of the functions polish_model() or extend_gp_annots_via_mapping_table().
Hint
Files required for the SwissProt approach can be downloaded with download_url().
- Attributes:
- GapFiller Attributes:
All attributes of the parent class GapFiller
- GFF_COLS = {'eC_number': 'ec-code', 'locus_tag': 'locus_tag', 'protein_id': 'ncbiprotein'}
- find_missing_genes(gffpath: str | Path, model: libsbml.Model)
Find missing genes by comparing the CDS regions written in the GFF with the GeneProduct entities in the model.
- Args:
- gffpath (Union[str, Path]):
Path to a GFF file (corresponding to the model).
- model (libModel):
The model loaded with libSBML.
- find_missing_reactions(model: cobra.Model, prefix: str = 'refinegems', type_db: Literal['swissprot', 'user'] = 'swissprot', fasta: str = None, dmnd_db: str = None, map_db: str = None, mail: str = None, check_NCBI: bool = False, threshold_add_reacs: int = 5, **kwargs) -> None
Find missing reactions in the model by blasting the missing genes against the SwissProt database and mapping the results to EC/BRENDA.
Optionally, query the protein accession numbers against NCBI (increases runtime significantly).
Hint
For more information on how to get the SwissProt files, please see download_url().
- Args:
- model (cobra.Model):
The model loaded with COBRApy.
- prefix (str, optional):
Prefix for gene IDs in the model that have to be generated randomly because no ID from the chosen namespace (usually NCBI protein) has been found. Defaults to ‘refinegems’.
- mail (str, optional):
Mail address for the query against NCBI. If not set, skips the mapping. Defaults to None.
- check_NCBI (bool, optional):
If set to True, checking the gene IDs / NCBI protein IDs against the NCBI database is enabled. Else, this step is skipped to reduce runtime. Only usable with SwissProt as database. Defaults to False.
- fasta (str, optional):
The protein FASTA file of the organism the model was built on. Required for the search against SwissProt. Defaults to None.
- type_db (Literal[‘swissprot’,’user’], optional):
Database to search against. Choose ‘swissprot’ for SwissProt or ‘user’ for a user defined database. Defaults to ‘swissprot’.
- dmnd_db (str, optional):
Path to the DIAMOND database containing the protein sequences of SwissProt. Required for the search against SwissProt or the user's own DIAMOND database. Defaults to None.
- map_db (str, optional):
Path to the SwissProt mapping file. Required for the search against SwissProt. Greatly decreases runtime for running the DIAMOND search. Defaults to None.
Note
The mapping depends on the chosen database.
- threshold_add_reacs (int, optional):
Threshold for the amount of reactions to add to the model. Defaults to 5.
- **kwargs:
Further optional parameters for the mapping, e.g. outdir, sens, cov, t, pid, etc. For more information see refinegems.utility.db_access.map_to_homologs() in case of type_db = ‘user’ or refinegems.utility.db_access.get_ec_via_swissprot() in case of type_db = ‘swissprot’.
- class refinegems.classes.gapfill.KEGGapFiller(organismid)
Bases: GapFiller
Based on a KEGG organism ID (corresponding to the organism of the model), find missing genes in the model and map them to reactions to try and fill the gaps found with the KEGG database.
Note
Please keep in mind that using this module requires a model containing the Genbank locus tags as labels. If your model does not conform to this you can use one of the functions polish_model() or extend_gp_annots_via_mapping_table().
Hint
Due to the KEGG REST API this is relatively slow.
- Attributes:
- GapFiller Attributes:
All attributes of the parent class GapFiller
- organismid (str, required):
Abbreviation of the organism in the KEGG database.
- find_missing_genes(model: libsbml.Model)
Get the missing genes in model in comparison to the KEGG entry of the organism. Saves a table containing the missing genes to the attribute missing_genes.
Format:
orgid:locus | locus_tag | kegg.orthology | ec-code | ncbiprotein | uniprot
- Args:
- model (libModel):
The model loaded with libSBML.
- find_missing_reactions(model: cobra.Model, threshold_add_reacs: int = 5)
Find missing reactions in the model. Parameters can be extended as needed.
Needs to save a table of the format
ec-code | ncbiprotein | id | equation | reference | <is_transport> | via | add_to_GPR
to the attribute missing_reactions.
Method specific information can be added to the reference column, which is expected to contain a dictionary. The ‘via’ column describes the way the database will be added to the model.
- refinegems.classes.gapfill.cobra_gapfill_wrapper(model: cobra.Model, universal: cobra.Model, medium: Medium, namespace: Literal['BiGG'] = 'BiGG', growth_threshold: float = 0.05, iterations: int = 3, chunk_size: int = 10000) -> cobra.Model
Wrapper for single_cobra_gapfill().
Either use the full set of reactions in the universal model by setting iterations to 0 or None, or use them in randomized chunks for faster runtime (useful on laptops). Note: the second option does not test all reaction combinations exhaustively (heuristic approach).
- Args:
- model (cobra.Model):
The model to perform gapfilling on.
- universal (cobra.Model):
A model with reactions to be potentially used for the gapfilling.
- medium (Medium):
A medium the model should grow on.
- namespace (Literal[‘BiGG’], optional):
Namespace to use for the model. Options include ‘BiGG’. Defaults to ‘BiGG’.
- growth_threshold (float, optional):
Growth threshold for the gapfilling. Defaults to 0.05.
- iterations (int, optional):
Number of iterations for the heuristic version of the gapfilling. If 0 or None is given, uses full set of reactions. Defaults to 3.
- chunk_size (int, optional):
Number of reactions to be used for gapfilling at the same time. If None or 0 is given, use full set, not heuristic. Defaults to 10000.
- Returns:
- cobra.Model:
The gapfilled model, if a solution was found.
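The chunked, randomized mode can be sketched as follows. How refineGEMs combines iterations and chunks internally is not stated here, so this is one plausible reading: in every iteration the reaction IDs are reshuffled, cut into chunks of `chunk_size`, and each chunk is offered to a gap-filling attempt. `random_chunks`, `heuristic_gapfill` and the `try_gapfill` callable (a stand-in for a gap-filling call restricted to one chunk) are hypothetical.

```python
import random


def random_chunks(reaction_ids: list, chunk_size: int, seed=None) -> list:
    """Shuffle the IDs and cut them into consecutive chunks of chunk_size."""
    ids = list(reaction_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


def heuristic_gapfill(reaction_ids, try_gapfill, iterations=3, chunk_size=10000):
    """Return the first successful result of try_gapfill, or None."""
    if not iterations or not chunk_size:
        return try_gapfill(list(reaction_ids))  # full set, no heuristic
    for i in range(iterations):
        for chunk in random_chunks(reaction_ids, chunk_size, seed=i):
            result = try_gapfill(chunk)
            if result:
                return result
    return None


needed = {"R7"}  # made-up: the one reaction that restores growth
found = heuristic_gapfill(
    [f"R{i}" for i in range(10)],
    lambda chunk: sorted(needed) if needed <= set(chunk) else None,
    iterations=2,
    chunk_size=4,
)
print(found)
```

If a solution needs reactions that never land in the same chunk, this mode misses it, which is the non-exhaustive behaviour the note above warns about.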
- refinegems.classes.gapfill.map_biocyc_to_reac(biocyc_reacs: pandas.DataFrame, use_MNX: bool = True, use_BiGG: bool = True) -> pandas.DataFrame
Based on a table containing BioCyc reactions, map them to reactions in other databases (if a mapping is possible)
- Args:
- biocyc_reacs (pd.DataFrame):
Table containing BioCyc reaction information. Should contain the columns: Reaction | Object ID | EC-Number | Spontaneous? Can be downloaded as a SmartTable from BioCyc.
- use_MNX (bool, optional):
Try mapping using the MetaNetX database. Defaults to True.
- use_BiGG (bool, optional):
Try mapping using the BiGG database. Defaults to True.
- Returns:
- pd.DataFrame:
The extended table.
- refinegems.classes.gapfill.map_ec_to_reac(table: pandas.DataFrame, use_MNX: bool = True, use_BiGG: bool = True, use_KEGG: bool = True, threshold_add_reacs: int = 5) -> pandas.DataFrame
Based on a table of NCBI protein IDs and EC numbers, map them to reactions via different databases (if a mapping is possible).
- input table should have the format:
ec-code | ncbiprotein
- output table has the format:
ec-code | ncbiprotein | id | equation | reference | is_transport | via
- Args:
- table (pd.DataFrame):
The input table.
- use_MNX (bool, optional):
Try mapping using the MetaNetX database. Defaults to True.
- use_BiGG (bool, optional):
Try mapping using the BiGG database. Defaults to True.
- use_KEGG (bool, optional):
Try mapping using the KEGG database. Defaults to True.
- threshold_add_reacs (int, optional):
Maximum number of reactions allowed per EC number per ncbiprotein ID to be added. If exceeded, the reactions are not added due to insufficient evidence. Defaults to 5.
- Returns:
- pd.DataFrame:
The extended table.
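The effect of `threshold_add_reacs` can be pictured as a group-wise filter. The sketch below assumes the mapped table is available as a list of row dicts and that a group is one (ec-code, ncbiprotein) pair; `filter_by_threshold` and the rows are hypothetical and do not reproduce the library code.

```python
from collections import defaultdict


def filter_by_threshold(rows: list, threshold_add_reacs: int = 5) -> list:
    """Drop all rows of an (ec-code, ncbiprotein) pair with too many reactions."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["ec-code"], row["ncbiprotein"])].append(row)
    kept = []
    for group in groups.values():
        if len(group) <= threshold_add_reacs:  # enough evidence: keep the group
            kept.extend(group)
    return kept


rows = [  # made-up mapping results
    {"ec-code": "1.1.1.1", "ncbiprotein": "P1", "id": "RXN_A"},
    {"ec-code": "1.1.1.1", "ncbiprotein": "P1", "id": "RXN_B"},
    {"ec-code": "2.7.1.1", "ncbiprotein": "P2", "id": "RXN_C"},
    {"ec-code": "2.7.1.1", "ncbiprotein": "P2", "id": "RXN_D"},
    {"ec-code": "2.7.1.1", "ncbiprotein": "P2", "id": "RXN_E"},
]
print([r["id"] for r in filter_by_threshold(rows, threshold_add_reacs=2)])
```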
- refinegems.classes.gapfill.multiple_cobra_gapfill(model: cobra.Model, universal: cobra.Model, media_list: list[Medium], namespace: Literal['BiGG'] = 'BiGG', growth_threshold: float = 0.05, iterations: int = 3, chunk_size: int = 10000) -> cobra.Model
Perform single_cobra_gapfill() on a list of media.
- Args:
- model (cobra.Model):
The model to be gapfilled.
- universal (cobra.Model):
The model to use reactions for gapfilling from.
- media_list (list[Medium]):
List of media the model is supposed to grow on.
- growth_threshold (float, optional):
Growth threshold for the gapfilling. Defaults to 0.05.
- iterations (int, optional):
Number of iterations for the heuristic version of the gapfilling. If 0 or None is given, uses full set of reactions. Defaults to 3.
- chunk_size (int, optional):
Number of reactions to be used for gapfilling at the same time. If None or 0 is given, use full set, not heuristic. Defaults to 10000.
- Returns:
- cobra.Model:
The gapfilled model, if a solution was found.
- refinegems.classes.gapfill.single_cobra_gapfill(model: cobra.Model, universal: cobra.Model, medium: Medium, namespace: Literal['BiGG'] = 'BiGG', growth_threshold: float = 0.05) -> list[str] | bool
Attempt gapfilling (with COBRApy) for a given model to allow growth on a given medium.
- Args:
- model (cobra.Model):
The model to perform gapfilling on.
- universal (cobra.Model):
A model with reactions to be potentially used for the gapfilling.
- medium (Medium):
A medium the model should grow on.
- namespace (Literal[‘BiGG’], optional): Namespace to use for the model.
Defaults to ‘BiGG’.
- growth_threshold (float, optional): Minimal rate for the model to be considered growing.
Defaults to 0.05.
- Returns:
- Union[list[str],True]:
List of reactions to be added to the model to allow growth or True, if the model already grows.
medium module
Provides the medium class and additional functions to handle media.
Functionalities include (amongst others):
- loading a medium into a Medium object from a database, a file or a model
- adding a medium to a model
- adding media information to the database
- extending, changing and manipulating various parts of a medium to create the desired medium
- refinegems.classes.medium.ALLOWED_DATABASE_LINKS = ['BiGG', 'MetaNetX', 'SEED', 'VMH', 'ChEBI', 'KEGG']
- refinegems.classes.medium.FLOAT_REGEX = re.compile('[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?')
- refinegems.classes.medium.INTEGER_REGEX = re.compile('^[-+]?([1-9]\\d*|0)$')
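The two patterns above can be tried out directly with the standard `re` module. The sketch below copies them as raw strings; `classify_number` is a hypothetical helper for illustration and not part of the module.

```python
import re

# Patterns copied from the module constants listed above.
FLOAT_REGEX = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
INTEGER_REGEX = re.compile(r"^[-+]?([1-9]\d*|0)$")


def classify_number(text: str) -> str:
    """Return 'int', 'float' or 'other' for a string (hypothetical helper)."""
    if INTEGER_REGEX.match(text):
        return "int"
    # FLOAT_REGEX is not anchored, so fullmatch tests the whole string.
    if FLOAT_REGEX.fullmatch(text):
        return "float"
    return "other"


labels = [classify_number(s) for s in ["10", "-3", "007", "1.5e-3", ".5", "abc"]]
```

Note that the integer pattern rejects leading zeros, so a string such as "007" only matches the float pattern.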
- class refinegems.classes.medium.Medium(name: str, substance_table: pandas.DataFrame = pd.DataFrame(columns=['name', 'formula', 'flux', 'source', 'db_id', 'db_type']), description: str = None, doi: str = None)[source]
Bases: object
Class describing a medium.
- Attributes:
- name (str):
The name or abbreviation of the medium.
- substance_table (pd.DataFrame):
A table containing information about the medium in silico compounds. Long format.
- description (str, optional):
Short description of the medium. Defaults to None.
- doi (str):
Reference(s) to the original publication of the medium. Defaults to None.
- __firstlineno__ = 54
- __init__(name: str, substance_table: pandas.DataFrame = pd.DataFrame(columns=['name', 'formula', 'flux', 'source', 'db_id', 'db_type']), description: str = None, doi: str = None)[source]
Initialise a Medium object.
- Args:
- name (str):
The name or abbreviation of the medium.
- substance_table (pd.DataFrame, optional):
A table containing information about the medium in silico compounds. Long format. Defaults to an empty table with the columns [‘name’,’formula’,’flux’,’source’,’db_id’,’db_type’].
- description (str, optional):
Short description of the medium. Defaults to None.
- doi (str, optional):
Reference(s) to the original publication of the medium. Defaults to None.
- __static_attributes__ = ('description', 'doi', 'name', 'substance_table')
- add_subset(subset_name: str, default_flux: float = 10.0, inplace: bool = True) Medium[source]
Add a subset of substances to the medium, returning a newly generated one.
- Args:
- subset_name (str):
The type of subset to be added. Name should be in database-substset-id.
- default_flux (float, optional):
Default flux value to calculate fluxes from based on the percentages saved in the database. Defaults to 10.0.
- Returns:
- Medium:
A new medium that is the combination of the set subset and the old one. In the case that the given subset name is not found in the database, the original medium is returned.
- add_substance_from_db(name: str, flux: float = 10.0)[source]
Add a substance from the database to the medium.
- Args:
- name (str):
Name of the substance. Should be part of the database substance.name column.
- flux (float, optional):
Sets the flux value of the new substance. Defaults to 10.0.
- combine(other: Medium, how: Literal['+'] | None | float | tuple[float] = '+', default_flux: float = 10.0, inplace: bool = False) Medium[source]
Combine two media into a new one.
Modes to combine media (input for param how):
None -> combine media, remove all flux values (= set them to None). Sets sources to None as well.
‘+’ -> Add fluxes of the same substance together.
float -> Calculate flux * percentage (float) for the first medium and flux * (1.0 - percentage) for the second medium, then add the fluxes of the same substance together.
tuple(float,float) -> Same as above, except both percentages are given.
- Args:
- other (Medium):
The medium to combine with.
- how (Union[Literal[‘+’],None,float,tuple[float]], optional):
How to combine the two media. Options listed in header. Defaults to ‘+’.
- default_flux (float, optional):
Flux to use in combine-modes (except how=None) for NaN/None values. Defaults to 10.0.
- Returns:
- Medium:
The combined medium.
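The combine modes listed above amount to a weighted sum of fluxes per substance. The following is a minimal sketch of that arithmetic on plain name-to-flux dictionaries; `combine_fluxes` is a hypothetical function, the real method works on Medium objects and their substance tables, and NaN handling is left out here.

```python
from typing import Optional, Union

Fluxes = dict[str, Optional[float]]


def combine_fluxes(
    first: Fluxes,
    second: Fluxes,
    how: Union[str, None, float, tuple[float, float]] = "+",
    default_flux: float = 10.0,
) -> Fluxes:
    """Combine two name -> flux mappings following the modes listed above."""
    names = list(dict.fromkeys([*first, *second]))  # keep order, drop duplicates
    if how is None:
        # Mode None: keep the substances, discard all flux values.
        return {name: None for name in names}
    if how == "+":
        weights = (1.0, 1.0)
    elif isinstance(how, float):
        weights = (how, 1.0 - how)
    elif isinstance(how, tuple) and len(how) == 2:
        weights = how
    else:
        raise ValueError(f"Unknown combine mode: {how!r}")

    combined: Fluxes = {}
    for name in names:
        total = 0.0
        for fluxes, weight in ((first, weights[0]), (second, weights[1])):
            if name in fluxes:
                value = fluxes[name]
                # Missing flux values fall back to the default flux.
                total += weight * (default_flux if value is None else value)
        combined[name] = total
    return combined


a = {"glucose": 10.0, "oxygen": 20.0}
b = {"glucose": 5.0, "ammonium": None}
added = combine_fluxes(a, b, "+")
halved = combine_fluxes(a, b, 0.5)
```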
- export_to_cobra(namespace: Literal['Name', 'BiGG'] = 'BiGG', default_flux: float = 10.0, replace: bool = False, double_o2: bool = True) dict[str, float][source]
Export a medium to the COBRApy format for a medium.
- Args:
- namespace (Literal[‘Name’, ‘BiGG’], optional):
Namespace to use. Defaults to ‘BiGG’.
- default_flux (float, optional):
Default flux to substitute missing values. Defaults to 10.0.
- replace (bool, optional):
Replace all values with the default flux. Defaults to False.
- double_o2 (bool, optional):
Double the flux of oxygen. Defaults to True.
- Raises:
ValueError: Unknown namespace.
- Returns:
- dict[str,float]:
The exported medium.
- export_to_file(type: str = 'tsv', dir: str = './', max_widths: int = 80)[source]
Export medium, especially substance table.
- Args:
- type (str, optional):
Type of file to export to. Defaults to ‘tsv’. Further choices are ‘csv’, ‘docs’, ‘rst’.
- dir (str, optional):
Path to the directory to write the file to. Defaults to ‘./’.
- max_widths (int, optional):
Maximal table width for the documentation table. Only applies to ‘rst’ and ‘docs’. Defaults to 80.
- Raises:
ValueError: Unknown export type if type not in [‘tsv’,’csv’,’docs’,’rst’]
- get_source(element: str) list[str][source]
Get the source of a given element for the medium.
Search for the given element (elemental symbol, e.g. O), excluding pattern matches that are followed by lower-case letters, and return the matching substances as a list of sources for the given element.
- Args:
- element (str):
The symbol of the element to search the sources of
- Returns:
- list[str]:
The list of the names of the sources (no duplicates).
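The element search described above can be expressed with a negative lookahead. This is a sketch under assumptions: `sources_of` is a hypothetical function operating on (name, formula) pairs, whereas the real method works on the substance table of the medium.

```python
import re


def sources_of(element: str, substances: list[tuple[str, str]]) -> list[str]:
    """Return the names of substances whose formula contains the element."""
    # Negative lookahead: 'C' must not be followed by a lower-case letter,
    # so 'C' does not match inside 'Ca' or 'Cl'.
    pattern = re.compile(re.escape(element) + r"(?![a-z])")
    found: list[str] = []
    for name, formula in substances:
        if pattern.search(formula) and name not in found:
            found.append(name)
    return found


table = [
    ("D-Glucose", "C6H12O6"),
    ("Calcium chloride", "CaCl2"),
    ("Sodium bicarbonate", "NaHCO3"),
    ("D-Glucose", "C6H12O6"),
]
carbon_sources = sources_of("C", table)
```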
- is_aerobic() bool[source]
Check if the medium contains O2 / dioxygen.
- Returns:
- bool:
Results of the test, True if pure oxygen is in the medium.
- make_aerobic(flux: float = None)[source]
If the medium is currently anaerobic, add oxygen to the medium to make it aerobic.
- Args:
- flux (float, optional):
The flux value for the oxygen to be added. Defaults to None.
- make_anaerobic()[source]
If the medium is currently aerobic, deletes the oxygen from it to make it anaerobic.
- produce_medium_docs_table(folder: str = './', max_width: int = 80) str[source]
Produces a rst-file containing reStructuredText for the substance table for documentation.
- Args:
- folder (str, optional):
Path to folder/directory to save the rst-file to. Defaults to ‘./’.
- max_width (int, optional):
Maximal table width of the rst-table. Defaults to 80.
- remove_substance(name: str)[source]
Remove a substance from the medium based on its name
- Args:
- name (str):
Name of the substance to remove.
- set_default_flux(flux: float = 10.0, replace: bool = False, double_o2: bool = True)[source]
Set a default flux for the medium.
- Args:
- flux (float, optional):
Default flux for the medium. Defaults to 10.0.
- replace (bool, optional):
Replace all fluxes with the default. Defaults to False.
- double_o2 (bool, optional):
Tag to double the flux for oxygen only. Works only with replace=True. Defaults to True.
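How the three parameters interact can be sketched on a plain dictionary. The function `apply_default_flux` and the oxygen name used here are hypothetical; the sketch only mirrors the documented behaviour that missing values receive the default, that replace=True overwrites everything, and that doubling oxygen only takes effect together with replace=True.

```python
from typing import Optional


def apply_default_flux(
    fluxes: dict[str, Optional[float]],
    flux: float = 10.0,
    replace: bool = False,
    double_o2: bool = True,
    oxygen_name: str = "Dioxygen [O2]",  # hypothetical substance name
) -> dict[str, float]:
    """Fill or replace flux values with a default (illustrative sketch)."""
    result: dict[str, float] = {}
    for name, value in fluxes.items():
        if replace:
            # Every flux is overwritten; oxygen optionally gets twice the default.
            result[name] = 2 * flux if double_o2 and name == oxygen_name else flux
        else:
            # Only missing values are filled with the default.
            result[name] = flux if value is None else value
    return result


medium = {"Dioxygen [O2]": 5.0, "Water [H2O]": None}
filled = apply_default_flux(medium)
replaced = apply_default_flux(medium, replace=True)
```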
- set_oxygen_percentage(perc: float = 1.0)[source]
Set oxygen percentage of the medium.
- Args:
- perc (float, optional):
Percentage of oxygen. Defaults to 1.0 (= 100%)
- set_source(element: str, new_source: str)[source]
Set the source for a given element to a specific substance by deleting all other sources of said element before adding the new source.
- Args:
- element (str):
The element to set the source for, e.g. ‘O’ for oxygen.
- new_source (str):
The new source. Should be the name of a substance in the database, otherwise no new source will be set.
- refinegems.classes.medium.REQUIRED_SUBSTANCE_ATTRIBUTES = ['name', 'formula', 'flux', 'source']
- refinegems.classes.medium.add_subset_to_db(name: str, desc: str, subs_dict: dict, database: str = PATH_TO_DB, default_perc: float = 1.0) None[source]
Add a new subset to the database.
- Args:
- name (str):
Name (Abbreviation) of the new subset. Needs to be unique for the database.
- desc (str):
Description of the new subset.
- subs_dict (dict):
Dictionary of the names and percentages for the substances to be included in the new subsets. The names should be part of the substance table.
- database (str, optional):
Which database to connect to. Defaults to PATH_TO_DB.
- default_perc (float, optional):
Default percentage to set if None is given in the dictionary. Defaults to 1.0.
- refinegems.classes.medium.enter_db_single_entry(table: str, columns: list[str], values: list[Any], database: str = PATH_TO_DB)[source]
Enter a single entry into a database.
- Args:
- table (str):
Which table to enter information to.
- columns (list[str]):
Name of the columns to add information to.
- values (list[Any]):
List of new values for the columns.
- database (str, optional):
Database to add a row to. Defaults to PATH_TO_DB.
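Entering a single row into an SQLite table can be sketched with the standard `sqlite3` module. The function `insert_single_entry` and the table layout are assumptions made for illustration; the sketch takes an open connection instead of a database path and is not the library's implementation.

```python
import sqlite3
from typing import Any


def insert_single_entry(
    connection: sqlite3.Connection, table: str, columns: list[str], values: list[Any]
) -> None:
    """Insert one row, binding the values as parameters."""
    if len(columns) != len(values):
        raise ValueError("columns and values must have the same length")
    placeholders = ", ".join("?" for _ in values)
    # Identifiers cannot be bound as parameters, so they are quoted instead.
    column_list = ", ".join(f'"{c}"' for c in columns)
    connection.execute(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})', values
    )
    connection.commit()


# In-memory database with a hypothetical table layout.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE substance (id INTEGER PRIMARY KEY, name TEXT, formula TEXT)")
insert_single_entry(con, "substance", ["name", "formula"], ["Water", "H2O"])
rows = con.execute("SELECT name, formula FROM substance").fetchall()
```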
- refinegems.classes.medium.enter_m2s_row(row: pandas.Series, medium_id: int, connection: Connection, cursor: Cursor)[source]
Helper function for refinegems.classes.medium.enter_medium_into_db(). Enters a new entry in the medium2substance table.
- Args:
- row (pd.Series):
A row of the pd.DataFrame of the enter_medium_into_db() function.
- medium_id (int):
The row id of the medium.
- connection (sqlite3.Connection):
Connection to the database.
- cursor (sqlite3.Cursor):
Cursor for the database.
- refinegems.classes.medium.enter_medium_into_db(medium: Medium, database: str = PATH_TO_DB)[source]
Enter a new medium to an already existing database.
- Args:
- medium (Medium):
A medium object to be added to the database.
- database (str, optional):
Path to the database. Defaults to the in-built database.
- refinegems.classes.medium.enter_s2db_row(row: pandas.Series, db_type: str, connection: Connection, cursor: Cursor)[source]
Helper function for enter_medium_into_db(). Enters a new entry in the substance2db table after checking if it has yet to be added.
- Args:
- row (pd.Series):
A row of the pd.DataFrame of the enter_medium_into_db() function.
- db_type (str):
Type of database identifier to be added.
- connection (sqlite3.Connection):
Connection to the database.
- cursor (sqlite3.Cursor):
Cursor for the database.
- refinegems.classes.medium.enter_substance_row(row: pandas.Series, connection: Connection, cursor: Cursor) int[source]
Helper function for enter_medium_into_db(). Enters a new entry in the medium2substance table.
- Args:
- row (pd.Series):
A row of the pd.DataFrame of the enter_medium_into_db() function.
- connection (sqlite3.Connection):
Connection to the database.
- cursor (sqlite3.Cursor):
Cursor for the database.
- Returns:
- int:
The substance ID in the database of the substance.
- refinegems.classes.medium.extract_medium_info_from_model_bigg(row, model: cobra.Model) pandas.Series[source]
Helper function for read_from_cobra_model(). Extracts more information about the medium.
- Args:
- row (pd.Series):
A row of the datatable of read_from_cobra_model().
- model (cobra.Model):
The cobra Model
- Returns:
- pd.Series:
One row for the substance table.
- refinegems.classes.medium.generate_docs_for_subset(subset_name: str, folder: str = './', max_width: int = 80)[source]
Generate documentation for a subset.
- Args:
- subset_name (str):
Name of the subset.
- folder (str, optional):
Folder to save the output to. Defaults to ‘./’.
- max_width (int, optional):
Maximal table width for the documentation page. Defaults to 80.
- refinegems.classes.medium.generate_insert_query(row: pandas.Series, cursor) str[source]
Helper function for update_db_multi(). Generate the SQL string for inserting a new line into the database based on a row of the table.
- Args:
- row (pd.Series):
One row of the table of the parent function.
- Returns:
- str:
The constructed SQL string.
- refinegems.classes.medium.generate_update_query(row: pandas.Series) str[source]
Helper function for update_db_multi(). Generates an update SQL query for the provided table.
- Args:
- row (pd.Series):
Series containing the row of a DataFrame to be used to update a table in a database with columns table | column | new_value | conditions
- Returns:
- str:
SQL query to be used to update a table in a database with the provided data
- refinegems.classes.medium.get_last_idx_table(tablename: str, connection: Connection, cursor: Cursor) int[source]
Helper function. Retrieves the last row id of a specified table of the database.
- Args:
- tablename (str):
The name of the table to retrieve the last row id from
- connection (sqlite3.Connection):
Connection to the database.
- cursor (sqlite3.Cursor):
Cursor for the database.
- Returns:
- int:
The last row ID of the specified table.
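One way to retrieve the last row ID of an SQLite table is to query `MAX(rowid)`. Whether the library uses this query is an assumption; `last_row_id` and the table used below are hypothetical, and the sketch takes only a cursor.

```python
import sqlite3


def last_row_id(tablename: str, cursor: sqlite3.Cursor) -> int:
    """Return the highest rowid of a table, or 0 if the table is empty."""
    # MAX(rowid) yields NULL (None) for an empty table.
    cursor.execute(f'SELECT MAX(rowid) FROM "{tablename}"')
    (value,) = cursor.fetchone()
    return value if value is not None else 0


# In-memory database with a hypothetical table.
connection = sqlite3.connect(":memory:")
cur = connection.cursor()
cur.execute("CREATE TABLE medium (id INTEGER PRIMARY KEY, name TEXT)")
empty_id = last_row_id("medium", cur)
cur.executemany("INSERT INTO medium (name) VALUES (?)", [("LB",), ("M9",)])
filled_id = last_row_id("medium", cur)
```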
- refinegems.classes.medium.load_external_medium(how: Literal['file', 'console'], **kwargs) Medium[source]
Read in an external medium.
Currently available options for how
‘console’: read in medium via the console
‘file’: read in medium from a file, requires a ‘path=str’ argument to be passed.
About the format (console, file):
The substances have to be saved in a TSV file table (for the format, see read_substances_from_file()). Further information for the ‘file’ option has to be added as comments in the format: # info_name: info. The information should, but does not need to, contain name, description and reference.
- Args:
- how (Literal[‘file’,’console’]):
How (or from where) the medium should be read in. Available options are given above.
- Raises:
ValueError: Unknown description for how.
- Returns:
- Medium:
The read-in medium.
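The file layout described above (comment lines of the form `# info_name: info` followed by a TSV table) can be parsed with the standard library alone. This is a sketch: `parse_medium_text` is a hypothetical helper that returns plain dictionaries instead of a Medium object, and the example content is made up.

```python
import csv
import io


def parse_medium_text(text: str) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split a medium file into its comment metadata and its substance rows."""
    info: dict[str, str] = {}
    table_lines: list[str] = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Comment line in the form '# info_name: info'.
            key, _, value = line[1:].partition(":")
            info[key.strip()] = value.strip()
        elif line.strip():
            table_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    return info, list(reader)


example = (
    "# name: Toy\n"
    "# description: A toy medium\n"
    "name\tformula\tflux\tsource\n"
    "water\tH2O\t10.0\t\n"
)
info, substances = parse_medium_text(example)
```

Only the first colon of a comment line separates key and value, so a value may itself contain colons.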
- refinegems.classes.medium.load_media(yaml_path: str) tuple[list[Medium], list[str, None]][source]
Load the information from a media configuration file.
- Args:
- yaml_path (str):
The path to a media configuration file in YAML-format.
- Returns:
- tuple[list[Medium],list[str,None]]:
Tuple of two lists:
(1) list: list of the loaded media
(2) list: list of supplement modes
- refinegems.classes.medium.load_medium_from_db(name: str, database: str = PATH_TO_DB, type: str = 'standard') Medium[source]
Load a medium from a database.
- Args:
- name (str):
The name (or identifier) of the medium.
- database (str, optional):
Path to the database. Defaults to the in-built database.
- type (str, optional):
How to load the medium. Defaults to ‘standard’.
- Raises:
ValueError: Unknown medium name.
- Returns:
- Medium:
The medium retrieved from the database.
- refinegems.classes.medium.medium_to_model(model: cobra.Model, medium: Medium, namespace: str = 'BiGG', default_flux: float = 10.0, replace: bool = False, double_o2: bool = True, add: bool = True) None | dict[str, float][source]
Add a medium to a COBRApy model.
- Args:
- model (cobra.Model):
A model loaded with COBRApy.
- medium (Medium):
A refinegems Medium object.
- namespace (str, optional):
String to set the namespace to use for the model IDs. Defaults to ‘BiGG’.
- default_flux (float, optional):
Set a default flux for NaN values or all. Defaults to 10.0.
- replace (bool, optional):
Option to replace existing flux values with the default if set to True. Defaults to False.
- double_o2 (bool, optional):
Double the oxygen amount in the medium. Defaults to True.
- add (bool, optional):
If True, adds the medium to the model, else the exported medium is returned. Defaults to True.
- Returns:
- Union[None, dict[str, float]]:
Either none or the exported medium.
- refinegems.classes.medium.read_from_cobra_model(model: cobra.Model, namespace: Literal['BiGG'] = 'BiGG') Medium[source]
Read and import a medium from a cobra model into a Medium object.
- Args:
- model (cobra.Model):
An open cobra Model.
- Returns:
- Medium:
The imported medium.
- refinegems.classes.medium.read_substances_from_file(path: str) pandas.DataFrame[source]
Read in a TSV with substance information into a table.
Format of the TSV (with example):
name | formula | flux | source | X | X | …
water | H2O | 10.0 | …
X: placeholder for database names (columns filled with the corresponding IDs of the substances); for the allowed values of X, see ALLOWED_DATABASE_LINKS.
- Args:
- path(str):
The path to the input file.
- Returns:
- pd.DataFrame:
The table of substance information read from the file
- refinegems.classes.medium.update_db_entry_single(table: str, column: str, new_value: Any, conditions: dict, database: str = PATH_TO_DB)[source]
Update a single database entry.
- Args:
- table (str):
Name of the table to update.
- column (str):
Name of the Attribute to change.
- new_value (Any):
New value to be set.
- conditions (dict):
Further conditions.
- database (str, optional):
Which database to change. Defaults to PATH_TO_DB.
- refinegems.classes.medium.update_db_multi(data: pandas.DataFrame, update_entries: bool, database: str = PATH_TO_DB)[source]
Updates/Inserts multiple entries in a table from the specified database. Given table should have the format:
row : table | column | new_value | conditions
Notes:
multiple columns and values are lists with a “,” and no whitespaces
conditions are listed like: a=x;b=y;…
conditions separated by ‘;’
column and value separated by ‘=’
no whitespaces
- Args:
- data (pd.DataFrame):
DataFrame containing the columns table | column | new_value | conditions
- update_entries (bool):
Boolean to determine whether entries should be inserted or updated. False means insert.
- database (str, optional):
Path to a database. Defaults to PATH_TO_DB.
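The row format described above (comma-separated columns and values, conditions written as `a=x;b=y`) can be turned into a parameterised UPDATE statement. The following is a sketch, not the library's query builder: `parse_conditions`, `build_update` and the example table are hypothetical.

```python
import sqlite3


def parse_conditions(conditions: str) -> dict[str, str]:
    """Split 'a=x;b=y' into {'a': 'x', 'b': 'y'}."""
    return dict(part.split("=", 1) for part in conditions.split(";") if part)


def build_update(row: dict[str, str]) -> tuple[str, list[str]]:
    """Build an UPDATE query and its parameters from one table row."""
    columns = row["column"].split(",")
    values = row["new_value"].split(",")
    conditions = parse_conditions(row["conditions"])
    assignments = ", ".join(f'"{c}" = ?' for c in columns)
    query = f'UPDATE "{row["table"]}" SET {assignments}'
    if conditions:
        # Each condition becomes one equality test joined by AND.
        query += " WHERE " + " AND ".join(f'"{k}" = ?' for k in conditions)
    return query, values + list(conditions.values())


row = {"table": "substance", "column": "formula", "new_value": "H2O", "conditions": "name=water"}
query, params = build_update(row)

# Apply the query to an in-memory database with a hypothetical table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE substance (name TEXT, formula TEXT)")
con.execute("INSERT INTO substance VALUES ('water', 'H20')")
con.execute(query, params)
updated = con.execute("SELECT formula FROM substance WHERE name = 'water'").fetchone()
```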
- refinegems.classes.medium.updated_db_to_schema(directory: str = '../data/database', inplace: bool = False)[source]
Extracts the SQL schema from the database data.db and transfers it into an SQL file.
- Args:
- directory(str,optional):
Path to the directory of the updated DB. Defaults to ‘../data/database’.
- inplace(bool, optional):
If True, uses the default sql-file name, otherwise extends it with the prefix updated.
reports module
Classes to generate, handle, manipulate and save reports.
- class refinegems.classes.reports.AuxotrophySimulationReport(results)[source]
Bases: Report
Report for the auxotrophy simulation.
- Attributes:
- simulation_results:
The data of the simulation.
- __firstlineno__ = 837
- __static_attributes__ = ('simulation_results',)
- save(dir: str, color_palette: str = 'YlGn')[source]
Save the report to a given directory.
- Args:
- dir (str):
Path to a directory.
- color_palette (str, optional):
Name of a matplotlib colour palette. Defaults to ‘YlGn’.
- visualise_auxotrophies(color_palette: str = 'YlGn', save: None | str = None) None | matplotlib.figure.Figure[source]
Visualise and/or save the results of the test_auxotrophies() function.
- Args:
- res (pd.DataFrame):
The output of test_auxotrophies().
- color_palette (str, optional):
A name of a seaborn gradient color palette. In case name is unknown, takes the default. Defaults to ‘YlGn’.
- save (None | str, optional):
Path to a directory, if the output shall be saved. Defaults to None (returns the figure).
- Returns:
- Case save = str:
None: No return, as the visualisation is directly saved.
- Case save = None:
matplotlib.figure.Figure: The plotted figure.
- class refinegems.classes.reports.CorePanAnalysisReport(model: cobra.Model, core_reac: list[str] = None, pan_reac: list[str] = None, novel_reac: list[str] = None)[source]
Bases: Report
Report for the core-pan analysis.
Summarises the information and provides functions for visualisation.
- Attributes:
- model:
The model the report is based on.
- core_reac:
List of reactions considered “core”.
- pan_reac:
List of reactions considered “pan”.
- novel_reac:
List of reactions considered “novel”.
- __firstlineno__ = 1063
- __init__(model: cobra.Model, core_reac: list[str] = None, pan_reac: list[str] = None, novel_reac: list[str] = None)[source]
- __static_attributes__ = ('core_reac', 'model', 'novel_reac', 'pan_reac')
- get_reac_counts()[source]
Return a dictionary of the counts of the reactions types (core, pan, novel).
- isValid(check='reaction-count') bool[source]
Check if a certain part of the analysis is valid.
Currently possible checks:
- reaction-count :
check if the number of reactions in the model equals the sum of the novel, pan and core reactions
- Args:
- check (str, optional):
Describes which part to check. Options are listed above. Defaults to ‘reaction-count’.
- Raises:
ValueError: Unknown string for parameter check.
- Returns:
- bool:
Result of the check.
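The reaction-count check boils down to a single comparison. A minimal sketch, assuming the report holds the three lists of reaction IDs and the total number of model reactions is known; `is_valid` is a hypothetical free function, not the method itself.

```python
def is_valid(
    total_reactions: int,
    core_reac: list[str],
    pan_reac: list[str],
    novel_reac: list[str],
    check: str = "reaction-count",
) -> bool:
    """Check that core, pan and novel reactions add up to the model total."""
    if check == "reaction-count":
        return total_reactions == len(core_reac) + len(pan_reac) + len(novel_reac)
    raise ValueError(f"Unknown string for parameter check: {check}")


consistent = is_valid(4, ["R1", "R2"], ["R3"], ["R4"])
inconsistent = is_valid(5, ["R1", "R2"], ["R3"], ["R4"])
```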
- save(dir: str)[source]
Save the results stored in a CorePanAnalysisReport object.
The function creates a new folder ‘pan-core-analysis’ inside the given directory and creates the following documents:
table_reactions.tsv : reactions ID mapped to their labels
visualise_reactions : donut chart of the values above
- Args:
- dir (str):
Path to a directory to save the output to.
- class refinegems.classes.reports.GapFillerReport(variety: str, statistics: dict, manual_curation: dict, hide_zeros: bool = False, no_title: bool = False)[source]
Bases: Report
Report for the gap-filling of the model.
- Attributes:
- variety:
The variety of the gap-filling method used.
- statistics:
List of different counts for reactions, genes and metabolites.
- manual_curation:
List of IDs for manual curation.
- hide_zeros:
Option to hide all zero values in the statistics. Defaults to False.
- __firstlineno__ = 1646
- __init__(variety: str, statistics: dict, manual_curation: dict, hide_zeros: bool = False, no_title: bool = False) None[source]
- __static_attributes__ = ('_statistics', 'hide_zeros', 'manual_curation', 'no_title', 'statistics', 'variety')
- save(dir=str, color_palette: str = 'YlGn') None[source]
Save the report.
- Args:
- dir (str):
Path to a directory to save the report to.
- color_palette (str, optional):
A colour gradient from the matplotlib library. If the name does not exist, uses the default. Defaults to YlGn.
- property statistics
- Get or set the current statistics dictionary. While setting, all zeros are removed from the provided dictionary, and if the values behind the keys ‘unmappable’ and ‘missing (remaining)’ are the same, only ‘missing (remaining)’ is kept.
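The clean-up performed when setting the statistics can be sketched as follows. This assumes, for illustration only, a flat mapping from label to count; the actual dictionary may be nested, and `clean_statistics` is a hypothetical function, not the property setter.

```python
def clean_statistics(statistics: dict[str, int]) -> dict[str, int]:
    """Drop zero counts and a redundant 'unmappable' entry (illustrative)."""
    cleaned = dict(statistics)
    if (
        "unmappable" in cleaned
        and cleaned.get("missing (remaining)") == cleaned["unmappable"]
    ):
        # Both keys carry the same value, keep only 'missing (remaining)'.
        del cleaned["unmappable"]
    # Remove all entries with a count of zero.
    return {label: count for label, count in cleaned.items() if count != 0}


stats = {"added": 3, "unmappable": 2, "missing (remaining)": 2, "duplicates": 0}
cleaned_stats = clean_statistics(stats)
```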
- class refinegems.classes.reports.GrowthSimulationReport(reports: list[SingleGrowthSimulationReport] = None)[source]
Bases: Report
Report for the growth simulation analysis.
- Attributes:
- reports:
List of the reports for the single growth analyses.
- models:
List of the model names.
- media:
List of the media names.
- __firstlineno__ = 139
- __init__(reports: list[SingleGrowthSimulationReport] = None)[source]
- __static_attributes__ = ('media', 'models', 'reports')
- add_sim_results(new_rep: SingleGrowthSimulationReport)[source]
Add a new single growth report to the reports list
- Args:
- new_rep (SingleGrowthSimulationReport):
The new simulation report.
- plot_growth(unit: Literal['h', 'dt'] = 'dt', color_palette: str = 'YlGn') matplotlib.figure.Figure[source]
Visualise the contents of the report.
Note
Please keep in mind that, in the figure, unrealistically high and minuscule values are set to zero. However, all values are contained within the table one can get via to_table().
- Args:
- unit (Literal[‘h’,’dt’], optional):
Set the unit to plot. Can be doubling time in minutes (‘dt’) or growth rates in mmol/gDWh (‘h’). Defaults to ‘dt’.
- color_palette (str, optional):
A colour gradient from the matplotlib library. If the name does not exist, uses the default. Defaults to ‘YlGn’.
- Returns:
- matplotlib.figure.Figure:
The plotted figure.
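The two units are related by the standard formula for exponential growth, doubling time = ln(2) / growth rate. The sketch below assumes a growth rate per hour and converts to minutes; whether the library computes the doubling time exactly this way is an assumption, and `doubling_time_minutes` is a hypothetical helper.

```python
import math


def doubling_time_minutes(growth_rate_per_hour: float) -> float:
    """Convert a growth rate (1/h) into a doubling time in minutes."""
    if growth_rate_per_hour <= 0:
        # No growth means the population never doubles.
        return math.inf
    return math.log(2) / growth_rate_per_hour * 60


one_hour = doubling_time_minutes(math.log(2))
no_growth = doubling_time_minutes(0.0)
```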
- save(to: str, how: Literal['dir'] = 'dir', check_overwrite: bool = True, color_palette: str = 'YlGn')[source]
Save the report.
Current options include:
‘dir’: save the report to a directory, including a txt and two graphics
- Args:
- to (str):
Path to a directory to save the report to.
- how (Literal[‘dir’], optional):
How to save the report. For options see functions description. Defaults to ‘dir’.
- check_overwrite (bool, optional):
Flag to choose to check for existing directory/files of same name or just to overwrite them. Defaults to True.
- color_palette (str, optional):
A colour gradient from the matplotlib library. If the name does not exist, uses the default. Defaults to ‘YlGn’.
- Raises:
ValueError: If the parameter ‘how’ is given something unexpected.
- class refinegems.classes.reports.KEGGPathwayAnalysisReport(total_reac=None, kegg_count=None, kegg_global=None, kegg_over=None, kegg_rest=None)[source]
Bases: Report
Report for the KEGG pathway analysis.
- Attributes:
- total_reac:
An integer for the total number of reactions in the model.
- kegg_count:
An integer as a counter for the KEGG pathway annotations.
- kegg_global:
Dictionary of global KEGG IDs and their counts.
- kegg_over:
Dictionary of overview KEGG IDs and their counts.
- kegg_rest:
Dictionary of the remaining KEGG IDs and their counts.
- __firstlineno__ = 527
- __init__(total_reac=None, kegg_count=None, kegg_global=None, kegg_over=None, kegg_rest=None) None[source]
- __static_attributes__ = ('kegg_count', 'kegg_global', 'kegg_over', 'kegg_paths', 'total_reac')
- save(dir: str, colors: str = 'YlGn') None[source]
Save the content of the report as plots.
- Args:
- dir (str):
Path to a directory to save the output directory with all the plots in.
- colors(str,optional):
Colour palette for the plots. Should be a valid name of a matplotlib sequential colour palette.
- visualise_kegg_counts(colors: list[str] = ['lightgreen', 'darkgreen']) matplotlib.pyplot.figure[source]
Visualise the amounts of reaction with and without KEGG pathway annotation.
- Args:
- colors (list[str], optional):
List of two colours used for the plotting. If the wrong number of colours or non-matplotlib colours are given, sets them to the default. Defaults to ‘lightgreen’ and ‘darkgreen’.
- Returns:
- plt.figure:
The resulting plot.
- visualise_kegg_pathway(plot_type: Literal['global', 'overview', 'high', 'existing'] = 'global', label: Literal['id', 'name'] = 'id', color_palette: str = 'YlGn') matplotlib.pyplot.figure[source]
Visualise the KEGG pathway identifiers present.
Depending on plot_type, different levels of pathway identifiers are plotted:
global: check and plot only the global pathway identifiers
overview: check and plot only the overview pathway identifiers
high: check and plot all identifiers grouped by their high level pathway identifiers. This option uses label=name, independently of the input
existing: check and plot all identifiers
- Args:
- plot_type (Literal[“global”,”overview”,”high”,”existing”], optional):
Type of plot, for an explanation see above. Defaults to ‘global’.
- label (Literal[“id”,”name”], optional):
Type of the label. If ‘id’, uses the KEGG pathway IDs, if ‘name’, uses the pathway names. Defaults to ‘id’.
- color_palette (str, optional):
A colour gradient from the matplotlib library. If the name does not exist, uses the default. Defaults to ‘YlGn’.
- Returns:
- plt.figure:
The plotted visualisation.
- refinegems.classes.reports.KEGG_GLOBAL_PATHWAY = {'01100': 'Metabolic pathways', '01110': 'Biosynthesis of secondary metabolites', '01120': 'Microbial metabolism in diverse environments'}
- refinegems.classes.reports.KEGG_METABOLISM_PATHWAY
- refinegems.classes.reports.KEGG_METABOLISM_PATHWAY_DATE = '6. July 2023'
- refinegems.classes.reports.KEGG_OVERVIEW_PATHWAY = {'01200': 'Carbon metabolism', '01210': '2-Oxocarboxylic acid metabolism', '01212': 'Fatty acid metabolism', '01220': 'Degradation of aromatic compounds', '01230': 'Biosynthesis of amino acids', '01232': 'Nucleotide metabolism', '01240': 'Biosynthesis of cofactors', '01250': 'Biosynthesis of nucleotide sugars'}
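The two dictionaries above allow pathway identifiers to be sorted into global, overview and remaining pathways. The following sketch assumes identifiers are given as bare five-digit strings matching the dictionary keys; `split_pathway_counts` is a hypothetical helper and not part of the module.

```python
from collections import Counter

# Copied from the module constants listed above.
KEGG_GLOBAL_PATHWAY = {
    "01100": "Metabolic pathways",
    "01110": "Biosynthesis of secondary metabolites",
    "01120": "Microbial metabolism in diverse environments",
}
KEGG_OVERVIEW_PATHWAY = {
    "01200": "Carbon metabolism",
    "01210": "2-Oxocarboxylic acid metabolism",
    "01212": "Fatty acid metabolism",
    "01220": "Degradation of aromatic compounds",
    "01230": "Biosynthesis of amino acids",
    "01232": "Nucleotide metabolism",
    "01240": "Biosynthesis of cofactors",
    "01250": "Biosynthesis of nucleotide sugars",
}


def split_pathway_counts(pathway_ids: list[str]) -> dict[str, dict[str, int]]:
    """Count pathway IDs separately for global, overview and other pathways."""
    counts = {"global": Counter(), "overview": Counter(), "rest": Counter()}
    for pathway_id in pathway_ids:
        if pathway_id in KEGG_GLOBAL_PATHWAY:
            level = "global"
        elif pathway_id in KEGG_OVERVIEW_PATHWAY:
            level = "overview"
        else:
            level = "rest"
        counts[level][pathway_id] += 1
    return {level: dict(counter) for level, counter in counts.items()}


pathway_counts = split_pathway_counts(["01100", "01100", "01200", "00010"])
```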
- class refinegems.classes.reports.ModelInfoReport(model: cobra.Model)[source]
Bases: Report
Report about the basic information of a given model.
- Attributes:
- name:
A string for the name of the model.
- reac:
List of the reactions in the model.
- meta:
List of the metabolites in the model.
- gene:
An int that describes the number of genes in the model.
- orphans:
List of metabolite IDs that are considered orphans.
- deadends:
List of metabolite IDs that are considered dead-ends.
- disconnects:
List of metabolite IDs that are disconnected in the model.
- mass_charge_unbalanced:
List of reaction IDs that are unbalanced regarding their mass and their charges.
- mass_unbalanced:
List of reaction IDs that are unbalanced regarding their mass only.
- charge_unbalanced:
List of reaction IDs that are unbalanced regarding their charges only.
- pseudo:
List of pseudoreaction IDs (sinks, demands, exchanges) in the model.
- normal_with_gpr:
List of reaction IDs that are normal reactions with gpr.
- pseudo_with_gpr:
List of reaction IDs that are pseudoreactions with gpr.
- __firstlineno__ = 1250
- __static_attributes__ = ('charge_unbalanced', 'deadends', 'disconnects', 'gene', 'mass_charge_unbalanced', 'mass_unbalanced', 'meta', 'name', 'normal_with_gpr', 'orphans', 'pseudo', 'pseudo_with_gpr', 'reac')
- format_table(all_counts=True) pandas.DataFrame[source]
Put the information of the report into a pandas DataFrame table.
- Args:
- all_counts (bool, optional):
Option to save the list of e.g. reactions as such or to convert them into counts when set to True. Defaults to True.
- Returns:
- pd.DataFrame:
The data in table format
- class refinegems.classes.reports.MultiModelInfoReport[source]
Bases: Report
- __firstlineno__ = 1614
- __static_attributes__ = ('table',)
- add_single_report(report: ModelInfoReport) None[source]
- class refinegems.classes.reports.MultiSBOTermReport(reports: list[SBOTermReport] | SBOTermReport)[source]
Bases: object
A collection of SBO term reports.
- Attributes:
- model_reports:
List of SBOTermReports.
- __firstlineno__ = 1864
- __init__(reports: list[SBOTermReport] | SBOTermReport)[source]
- Args:
- reports (Union[list[SBOTermReport] | SBOTermReport]):
Either a single SBOTermReport or a list of SBOTermReports.
- Raises:
ValueError: Wrong input type
- __static_attributes__ = ('model_reports',)
- add_report(report: SBOTermReport)[source]
Add another SBOTermReport to the report collection.
- Args:
- report (SBOTermReport):
The report to add.
- save(dir: str, rename: dict = Union[None, dict], color_palette: str | list[str] = 'Paired', figsize: tuple = (10, 10))[source]
Save the information contained in the report.
- Args:
- dir (str):
String for the path of the output directory.
- rename (Union[None,dict], optional):
Takes a dictionary of model IDs and alternative names. When set, uses the dictionary to rename the models. Defaults to None.
- color_palette (Union[str,list[str]], optional):
Color palette name or list of colours for the graphic. Defaults to ‘Paired’.
- figsize (tuple, optional):
Size of the figure. Requires a tuple of two integers. Defaults to (10,10).
- visualise(rename: dict = Union[None, dict], color_palette: str | list[str] = 'Paired', figsize: tuple = (10, 10)) matplotlib.figure.Figure[source]
Visualise the amount of SBO terms in the models.
- Args:
- rename (Union[None,dict], optional):
Takes a dictionary of model IDs and alternative names. When set, uses the dictionary to rename the models. Defaults to None.
- color_palette (Union[str,list[str]], optional):
Color palette name or list of colours for the graphic. Defaults to ‘Paired’.
- figsize (tuple, optional):
Size of the figure. Requires a tuple of two integers. Defaults to (10,10).
- Raises:
TypeError: Unknown type for color palette.
- Returns:
- matplotlib.figure.Figure:
The generated graphic
- class refinegems.classes.reports.Report[source]
Bases: object
- __firstlineno__ = 71
- __static_attributes__ = ()
- class refinegems.classes.reports.SBOTermReport(model: libsbml.Model)[source]
Bases: Report
Report of the SBO terms of a model.
- Attributes:
- name:
Name (ID) of the model.
- sbodata:
Dictionary containing the SBO terms and the corresponding counts of annotations found in the model. Only includes SBO terms that have at least one occurrence in the model.
- __firstlineno__ = 1802
- __static_attributes__ = ('name', 'sbodata')
- class refinegems.classes.reports.SingleGrowthSimulationReport(model_name=None, medium_name=None, growth_value=None, doubling_time=None, additives=None, no_exchange=None)[source]
Bases: Report
Report for a single growth simulation, one medium against one model.
- Attributes:
- model_name:
Name of the model.
- medium_name:
Name of the medium.
- growth_value:
Simulated growth value.
- doubling_time:
Simulated doubling time.
- additives:
List of substances, that were added.
- no_exchange:
List of substances that normally would be found in the media but have been removed, as they are not part of the exchange reactions of the model.
- __firstlineno__ = 75
- __init__(model_name=None, medium_name=None, growth_value=None, doubling_time=None, additives=None, no_exchange=None)[source]
- __static_attributes__ = ('additives', 'doubling_time', 'growth_value', 'medium_name', 'model_name', 'no_exchange')
- class refinegems.classes.reports.SourceTestReport(results: pandas.DataFrame = None, element: str = None, model_name: str = None)[source]
Bases: Report
Report for the source test (see rg.growth.test_growth_with_source).
- Attributes:
- results:
A pd.DataFrame with the results (substances and growth values).
- element:
The element the test was performed for.
- model_name:
The name of the model, that was tested.
- __firstlineno__ = 926
- __static_attributes__ = ('element', 'model_name', 'results')
- save(dir: str, width: int = 12, color_palette: str = 'YlGn') None[source]
Save the results of the source test.
- Args:
- dir (str):
Path to a directory to save the results to.
- width (int, optional):
Number of columns for the heatmap. Defaults to 12.
- color_palette (str, optional):
Color palette (gradient) for the plot. Defaults to ‘YlGn’.
- visualise(width: int = 12, color_palette: str = 'YlGn') tuple[matplotlib.figure.Figure, pandas.DataFrame][source]
Visualise the results of the source test as a heatmap
- Args:
- width (int, optional):
Number of columns to display for the heatmap. The number of rows is calculated accordingly to fit all values. Defaults to 12.
- color_palette (str, optional):
Color palette (gradient) for the plot. Defaults to ‘YlGn’.
- Returns:
- tuple(matplotlib.Figure, pd.DataFrame):
The heatmap and the legend explaining the heatmap.
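Fitting a flat list of values into a heatmap of fixed width means computing the number of rows and padding the last one. A minimal sketch of that layout step, assuming empty cells are marked with None; `to_grid` is a hypothetical helper and the real method may pad differently.

```python
import math
from typing import Optional


def to_grid(values: list[float], width: int = 12) -> list[list[Optional[float]]]:
    """Arrange values row by row in a grid with the given number of columns."""
    if width <= 0:
        raise ValueError("width must be a positive integer")
    # Enough rows to hold all values; the last row is padded with None.
    rows = math.ceil(len(values) / width)
    padded = list(values) + [None] * (rows * width - len(values))
    return [padded[r * width:(r + 1) * width] for r in range(rows)]


grid = to_grid([0.1, 0.2, 0.3, 0.4, 0.5], width=3)
```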