Development

To maintain or extend the refineGEMs toolbox, install the package from GitHub.

Warning

refineGEMs requires at least Python 3.10 since version 2.0.0.
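
To confirm that the interpreter of the active environment meets this requirement, a short check along the following lines can be used. This is only a sketch: the helper name meets_minimum is made up for illustration and is not part of refineGEMs.

```python
import sys

# Minimum Python version stated above for refineGEMs 2.0.0 and later
MIN_PYTHON = (3, 10)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor, ...) version satisfies the minimum."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor version numbers
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old for refineGEMs 2.0.0 or later"
    print(f"Python {current}: {status}")
```

Running the script inside the activated environment reports whether that environment's Python is recent enough.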

Hint

For help and information about known bugs, refer to Help and FAQ.

Installation for developers

refineGEMs depends on the tools MCC and BOFdat, which cannot be installed directly via PyPI or the pyproject.toml. Install both tools into the corresponding environment before using refineGEMs:

pip install "masschargecuration@git+https://github.com/Biomathsys/MassChargeCuration@installation-fix"
pip install "bofdat@git+https://github.com/draeger-lab/BOFdat"

Into a Conda environment

Set up a conda virtual environment and use its pip to install refineGEMs into that environment.

# clone or pull the latest source code
git clone https://github.com/draeger-lab/refinegems.git
cd refinegems

conda create -n <EnvName> python=<Specific Python version >= 3.10>

conda activate <EnvName>

# check that pip comes from <EnvName>
which pip

pip install .

This will install all packages listed in pyproject.toml.

If which pip does not show the pip of the conda environment, you can instead create a local environment whose path you control and use its pip:

conda create --prefix ./<EnvName> python=<Specific Python version >= 3.10>

conda activate <path to EnvName>

<EnvName>/bin/pip install .

Into a Pipenv environment

You can use pipenv to keep all dependencies together. This requires pipenv to be installed first. To install refineGEMs locally, complete the following steps:

# install pipenv using pip
pip install pipenv

# clone or pull the latest source code
git clone https://github.com/draeger-lab/refinegems.git
cd refinegems

# install all dependencies from Pipfile
pipenv install .

# initiate a session in the virtual environment
pipenv shell

The pipenv package can also be installed via Anaconda (recommended if you are a Windows user).

Hint

If you want to be able to safely import the package from anywhere while still being able to edit the code, it is recommended to change the pip install line in the code blocks above to pip install -e . --config-settings editable_mode=strict.
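
To see which copy of the package Python would actually import after installation, the following sketch can help. It assumes that the import name of the package is refinegems; the helper package_origin is hypothetical and only wraps the standard library call importlib.util.find_spec.

```python
import importlib.util


def package_origin(name):
    """Return the file a top-level package would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # "refinegems" is assumed to be the import name of the package
    origin = package_origin("refinegems")
    if origin is None:
        print("refinegems is not installed in this environment")
    else:
        print(f"refinegems would be imported from: {origin}")
```

For a regular install the reported path typically lies in the site-packages directory of the environment, whereas an editable install usually resolves to a location tied to your clone of the repository.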

Additional packages required for development

Attention

The following packages need to be installed to be able to add content to the refineGEMs documentation.

  • accessible-pygments

  • ipython

  • nbsphinx

  • pandoc

  • sphinx

  • sphinx_copybutton

  • sphinx_rtd_theme

  • sphinxcontrib-bibtex

In addition, pip-compile should be installed to update the requirements.txt for the next release.

Installing the packages

You can install the packages via pip to your local environment:

pip install accessible-pygments sphinx nbsphinx sphinx_rtd_theme pandoc ipython sphinxcontrib-bibtex sphinx_copybutton
python -m pip install pip-tools

Alternatively, install the tool with the docs extra, e.g.:

pip install -e ".[docs]" --config-settings editable_mode=strict

Updating the requirements.txt

To create the requirements.txt, adjust the requirements.in file in the docs folder as needed.
Then navigate to the docs folder on the command line:

cd docs

and use the following command to automatically generate the new requirements.txt:

python3 -m piptools compile --strip-extras --output-file=requirements.txt requirements.in

To bump to the newest versions possible, use the following command in the docs directory:

pip-compile --upgrade

Todo Tree extension

If you are working with VS Code or a similar editor, you can install the Todo Tree extension and copy the content of the TodoTree_params.txt file in the dev folder into the corresponding place in your settings. This enables highlighting and tracing of the @KEYWORD labels for bugs, discussions and more.

Using these keywords is strongly recommended, as it makes communication between the developers and the tracing of issues and ideas much easier. The following labels are currently supported:

Table 15 Labels supported with the refineGEMs Todo Tree file

label          usage
@TODO          something needs to be implemented or changed
@BUG           something is not working as expected, the cause has yet to be determined
@FIXME         known error or issue that needs attention
@TEST          the following code needs to be tested
@DEPRECATE     the following code can be removed in the future
@NOTE          notes from one dev to another or just a reminder
@DISCUSSION    an issue, idea or feature that requires discussion
@DEBUG         label for a debugging switch or similar, see next section
@ASK           when something requires research or further input before it can be discussed
@IDEA          write down ideas for new features, better implementations and more
@WARNING       if something needs to be kept in mind or can easily break, without being a bug

Note

For the label to be recognised correctly, the following format is required: # @KEYWORD.
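
To illustrate the required format, the sketch below scans a piece of source code for comments of the form # @KEYWORD. The regular expression and the helper find_labels are hypothetical and only approximate the format described in the note; they are not the pattern used in TodoTree_params.txt.

```python
import re

# Labels from the table of supported Todo Tree labels
LABELS = ("TODO", "BUG", "FIXME", "TEST", "DEPRECATE", "NOTE",
          "DISCUSSION", "DEBUG", "ASK", "IDEA", "WARNING")

# Matches "# @KEYWORD" (hash, one whitespace character, @, label),
# followed by an optional message
LABEL_PATTERN = re.compile(r"#\s@(" + "|".join(LABELS) + r")\b[:\s]*(.*)")


def find_labels(source):
    """Return (line number, label, message) for every '# @KEYWORD' comment in source."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = LABEL_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1), match.group(2).strip()))
    return hits


example = '''
def load(path):
    # @TODO support gzipped input
    data = open(path).read()  # @FIXME file handle is never closed
    #@NOTE missing space, so this line is not matched
    return data
'''

for number, label, message in find_labels(example):
    print(f"line {number}: @{label} - {message}")
```

In this example only the @TODO and @FIXME comments are reported; the @NOTE comment lacks the space after # and is therefore skipped.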

Debugging switches

  • You can enable debug logging by replacing level=logging.INFO with level=logging.DEBUG.

  • If you want your print message to show in the log file, replace the print() statement with logging.info().

  • To debug pandas warnings or issues, pd.options.mode.chained_assignment = None needs to be commented out.

  • Additionally, some modules contain comment blocks in the format shown below.
    By enabling the code lines between the dotted lines, a debugging-mode is run, which e.g. subsets the data to shorten the runtime to make debugging faster.
# @DEBUG ...............
# some code
# ......................
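
The first two switches and the @DEBUG block pattern can be sketched with the standard logging module as follows. The function summarise, the logger name and the debug parameter are made up for illustration; in the refineGEMs modules the debugging lines are enabled by uncommenting them rather than by passing a flag. The sketch logs to the console; passing a filename to logging.basicConfig would write to a file instead.

```python
import logging

# Switch between logging.INFO and logging.DEBUG to control verbosity
logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s:%(name)s:%(message)s",
)
logger = logging.getLogger("example")


def summarise(values, debug=False):
    """Sum the values; in debug mode only the first three are used."""
    if debug:
        # @DEBUG ...............
        # subset the data to shorten the runtime while debugging
        values = values[:3]
        logger.debug("debug mode: using %d values", len(values))
        # ......................
    total = sum(values)
    # logger.info instead of print, so the message goes through the log handlers
    logger.info("sum of %d values is %s", len(values), total)
    return total


summarise(list(range(100)), debug=True)
```

With level=logging.INFO the debug message is suppressed while the info message is still emitted.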

Guidelines for code documentation

We use the autoDocstring extension (njpwerner.autodocstring) for VSCode with the google format to generate function docstrings. If you use VSCode, a mustache file for this documentation style is available in the dev directory of refineGEMs and can be integrated into VSCode.

To ensure a nice-looking Sphinx documentation, we prefix every variable listed under Args with -. Tuple returns and returns with several possible cases are written as follows:

# Tuple output & Single input
"""Description of the function...

Args:
    - input1 (type):
        this is what input1 does

Returns:
    tuple:
        Two tables (1) & (2)

        (1) pd.DataFrame: Table with charge mismatches
        (2) pd.DataFrame: Table with formula mismatches
"""

# Single output with multiple possibilities & multiple inputs
"""Description of the function...

Args:
    - input1 (type):
        this is what input1 does
    - input2 (type):
        this is what input2 does
    - input3 (type):
        this is what input3 does

Returns:
    (1) Case: str

        Return value 1

    (2) Case: np.nan

        Return value 2
"""

We are also trying to make input and return types explicit by declaring those in the function header:

def my_func(input1: int, input2: str, input3: Model) -> tuple[str, int]:
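
As a combined illustration of both conventions, the following sketch shows a complete function with type hints in the header and a docstring in the style described above. The function split_label is a made-up example and does not exist in refineGEMs.

```python
import math


def split_label(text: str, sep: str = ":") -> tuple[str, float]:
    """Split a 'name<sep>value' string into its name and numeric value.

    Args:
        - text (str):
            String of the form 'name<sep>value'
        - sep (str, optional):
            Separator between name and value.
            Defaults to ':'.

    Returns:
        tuple:
            Name and value (1) & (2)

            (1) str: Name with surrounding whitespace removed
            (2) float: Parsed value, or nan if it cannot be parsed
    """
    # partition returns an empty string for the value if sep is missing
    name, _, raw = text.partition(sep)
    try:
        value = float(raw)
    except ValueError:
        value = math.nan
    return name.strip(), value
```

Declaring the return type as tuple[str, float] in the header and repeating the individual elements in the Returns section keeps the signature and the rendered documentation consistent.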

More details for certain specifics can also be found here.

Information about working on the media database

Add or update information in the database

At the end of the medium module is a set of functions for automatic curation of the database.

More information about how to run these can be found in the db_extension.ipynb notebook in the dev folder inside the GitHub repository.

Create docs for media and subsets

After adding a new medium or subset to the database or updating existing information, the new documentation pages (.rst) can be generated automatically.

For the media definition, use export_to_file():

new_medium = load_medium_from_db(<name>)
new_medium.export_to_file(type='docs', dir=<path>)

To create the documentation page for a subset, use generate_docs_for_subset():

generate_docs_for_subset(<name>, folder='<path>')