Installation

The various PhenomeDB components can be installed separately, however to simplify the usage of these interacting components, they have been docker-ised.

Local/desktop Docker installation

  1. Install docker

  2. Download the repo

  3. cd into the repo directory

  4. Copy the .env-example file to a file called .env, and edit the parameters as required (see settings)

  5. cd into the directory and run docker-compose up

$ git clone git@github.com:phenomecentre/phenomedb.git
$ cd phenomedb
$ cp .env-example .env
$ vim .env # or whichever text editor
$ docker compose up

Python installatiion

Python installation is necessary for local IDE debugging of unit tests and building the docs.

Warning

Mac Mx or other ARM-based chips are not currently supported for local installation due to dependency hell. If you are running a Mac Mx chip, use one of the phenomedb-airflow containers to run the tests and build the docs

To install the phenomedb library locally:

  1. Checkout the repo

  2. install the pip requirements (inside a virtualenv or conda env)

  3. run setup.py install

  4. Either run the docker compose separately or install postgres and redis according to your OS instructions.

  5. Test the installation by running the phenomedb cli.py -h command

$ python setup.py install # this will fail the first time, run it twice
$ python setup.py install
$ docker compose up -d postgres redis
$ cd phenomedb
$ python cli.py -h

If it the cli.py -h command shows you a list of available tasks, the installation is working.

Running the tests

The tests can be run using pytest.

Using the local install:

$ docker compose up postgres redis
$ cd tests
$ pytest .

Using a phenomedb-airflow docker container:

$ docker compose up -d
$ docker exec -it phenomedb-scheduler-1 /bin/bash
$ cd /opt/phenomedb_app/phenomedb/
$ pytest tests/

Building the docs

The docs are hosted on readthedocs, but must be built locally before upload (due to the postgres and redis dependencies). The sphinx and sphinx-rtd-theme pip packages are required to build the docs. To upload them to readthedocs, simply push them to the repo.

$ cd docs
$ make clean && make html
$ cd ..
$ git add . -A
$ git commit -m 'updated docs'
$ git push

Settings

Settings in PhenomeDB are configured in different ways depending if PhenomeDB is being run via docker compose or not.

Local Python Installation

When running the phenomedb python library from a local host (instead of Docker), the configuration is controlled by the ./phenomedb/data/config/default-config.ini file. The configuration can be overriden by either copying this file to the same directory with the name config.ini, or by copying it a location on your machine and specifying the PHENOMEDB_CONFIG environment variable.

Docker installation

When running PhenomeDB from docker compose, you can edit the user-copied (during installation) env file ./.env. This file defines the environment variables inside the docker containers, and overrides the values in config.ini and default-config.ini..

Apache Airflow settings can be configured with the following syntax:

AIRFLOW__API__AUTH_BACKEND=airflow.api.auth.backend.basic_auth

PhenomeDB settings can be set in the same format:

PHENOMEDB__GROUP__SETTING=example

The .env-example file contains the recommended Airflow and ChemSpider settings, but they can be adjusted as required.

The config.ini file contains the following groups and settings:

To use the ImportCompoundTask compound lookup functionality the following setting must be configured to use chemspider by obtaining a chemspider api key:

PHENOMEDB__API_KEYS__CHEMSPIDER

The following settings are recommended to be changed however the defaults will work.

PHENOMEDB__REDIS__PASSWORD

PHENOMEDB__PIPELINES__PIPELINE_MANAGER_USER

PHENOMEDB__PIPELINES__PIPELINE_MANAGER_PASSWORD

POSTGRES_USER

POSTGRES_PASSWORD

AIRFLOW_ADMIN_USER

AIRFLOW_ADMIN_PASSWORD

AIRFLOW_ADMIN_EMAIL

AIRFLOW__DATABASE__SQL_ALCHEMY_CONN

AIRFLOW__CORE__FERNET_KEY

TEST

username = admin # The user account used during unit tests

DB

dir = /Library/PostgreSQL/12/data/ # The directory used for storing Postgres data
rdbms = postgresql # The RDBMS to use (only supports Postgres currently)
user = postgres # The production database username
password = testpass # The database password
host = 127.0.0.1 # The database host
name = phenomedb # The database name
test = phenomedb_test # The test database name
port = 5433 # The database port
pool_size = 10 # The database pool size (SQLAlchemy)
max_overflow = 20 # The database max overflow
create_script = ./sql/phenomedb_v0.9.5_postgres.sql # The database create script

WEBSERVER

url = http://localhost:8080/ # The URL of the webserver

API

custom_root = custom # The url root of the custom API

REDIS

port = 6380 # The port of the Redis server
host = 127.0.0.1 # The host of the Redis server
user = default # The user of the Redis server
password = password # The password of the Redis server
memory_expired_seconds = 86400 # The time to expire cache objects from Redis

R

exec_path = /usr/local/bin/R # The R executable path
script_directory = /full/path/to/appdata/r_scripts/ # The R script directory

SMTP

enabled = true # Whether SMTP is configured
host = host # SMTP host
port = 25 # SMTP port
user = user # SMTP user
password = password # SMTP password
from = Name <emailaddress> # SMTP from address

DATA

project_data_base_path = /path/to/projectdata/ # The base path to the project related data (if used)
app_data = /full/path/to/appdata/ # The directory to store the application data
test_data = /full/path/to/data/test/ # The directory containing the test data
compounds = /full/path/to/data/compounds/ # The directory containing the compound data
config = /full/path/to/data/config/ # The directory containing the configs
cache = /full/path/to/appdata/cache/ # The cache directory

API_KEYS

chemspider = api_key # The ChemSpider API key

LOGGING

dir = /tmp/phenomelog/ # The logging directory

PIPELINES

pipeline_manager = apache-airflow # Only Apache-Airflow currently supported
pipeline_folder = /full/path/to/dags # The path to the Airflow DAGs folder
pipeline_manager_user = admin # The Airflow user to trigger pipelines
pipeline_manager_password = testpass # The Airflow user password for triggering pipelines
pipeline_manager_api_host = localhost:8080 # The Airflow API host URL
task_spec_file = /full/path/to/data/config/task_typespec.json # The task_typespec.json file
docker = false # Whether using docker or not