Installation
The PhenomeDB components can be installed separately; however, to simplify running these interacting components together, they have been Dockerised.
Local/desktop Docker installation
Install docker
Download the repo
cd into the repo directory
Copy the .env-example file to a file called .env, and edit the parameters as required (see settings)
Run docker compose up
$ git clone git@github.com:phenomecentre/phenomedb.git
$ cd phenomedb
$ cp .env-example .env
$ vim .env # or whichever text editor
$ docker compose up
Python installation
Python installation is necessary for local IDE debugging of unit tests and building the docs.
Warning
Apple Silicon Macs (M-series chips) and other ARM-based machines are not currently supported for local installation because of dependency conflicts. If you are running on one of these chips, use one of the phenomedb-airflow containers to run the tests and build the docs.
To install the phenomedb library locally:
Check out the repo
Install the pip requirements (inside a virtualenv or conda env)
Run setup.py install
Either run the docker compose separately or install postgres and redis according to your OS instructions.
Test the installation by running the phenomedb cli.py -h command
$ python setup.py install # this will fail the first time, run it twice
$ python setup.py install
$ docker compose up -d postgres redis
$ cd phenomedb
$ python cli.py -h
If the cli.py -h command shows you a list of available tasks, the installation is working.
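If cli.py fails to start, it can help to confirm that Postgres and Redis are reachable first. The following is a minimal sketch, not part of PhenomeDB: the helper name is_port_open is hypothetical, and the host and ports are assumed from the default values listed under Settings (5433 for Postgres, 6380 for Redis). Your docker compose setup may map different ports.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed defaults; adjust to match your own configuration.
    services = {"postgres": ("127.0.0.1", 5433), "redis": ("127.0.0.1", 6380)}
    for name, (host, port) in services.items():
        state = "reachable" if is_port_open(host, port) else "NOT reachable"
        print(f"{name} on {host}:{port} is {state}")
```

This only checks that something is listening on the port; it does not verify credentials or that the service is the expected one.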
Running the tests
The tests can be run using pytest.
Using the local install:
$ docker compose up -d postgres redis
$ cd tests
$ pytest .
Using a phenomedb-airflow docker container:
$ docker compose up -d
$ docker exec -it phenomedb-scheduler-1 /bin/bash
$ cd /opt/phenomedb_app/phenomedb/
$ pytest tests/
Building the docs
The docs are hosted on readthedocs, but must be built locally before upload (due to the postgres and redis dependencies). The sphinx and sphinx-rtd-theme pip packages are required to build the docs. To upload them to readthedocs, simply push them to the repo.
$ cd docs
$ make clean && make html
$ cd ..
$ git add . -A
$ git commit -m 'updated docs'
$ git push
Settings
Settings in PhenomeDB are configured differently depending on whether PhenomeDB is run via docker compose or not.
Local Python Installation
When running the phenomedb Python library from a local host (instead of Docker), the configuration is controlled by the ./phenomedb/data/config/default-config.ini file. The configuration can be overridden either by copying this file to the same directory with the name config.ini, or by copying it to a location on your machine and specifying that location in the PHENOMEDB_CONFIG environment variable.
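The lookup described above can be sketched as follows. This is an illustration only, not PhenomeDB's actual loader: the function name resolve_config_path is hypothetical, and the order of precedence (PHENOMEDB_CONFIG first, then config.ini, then default-config.ini) is an assumption, since the text does not say which override wins when both are present.

```python
import os
from pathlib import Path


def resolve_config_path(config_dir, environ=None):
    """Return the first existing config file, in an ASSUMED order of precedence."""
    environ = os.environ if environ is None else environ
    config_dir = Path(config_dir)

    candidates = []
    # 1. An explicit path given through the PHENOMEDB_CONFIG environment variable.
    env_path = environ.get("PHENOMEDB_CONFIG")
    if env_path:
        candidates.append(Path(env_path))
    # 2. A user copy named config.ini next to the shipped defaults.
    candidates.append(config_dir / "config.ini")
    # 3. The shipped defaults.
    candidates.append(config_dir / "default-config.ini")

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no config file found in {config_dir}")


if __name__ == "__main__":
    print(resolve_config_path("./phenomedb/data/config"))
```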
Docker installation
When running PhenomeDB from docker compose, edit the ./.env file that you copied during installation. This file defines the environment variables inside the Docker containers, and these override the values in config.ini and default-config.ini.
Apache Airflow settings can be configured with the following syntax:
AIRFLOW__API__AUTH_BACKEND=airflow.api.auth.backend.basic_auth
PhenomeDB settings can be set in the same format:
PHENOMEDB__GROUP__SETTING=example
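To illustrate how a variable in this format maps onto a group and setting in config.ini, here is a minimal sketch using Python's configparser. It is not PhenomeDB's own code: the function name apply_env_overrides is hypothetical, and it assumes the first double underscore after the prefix separates the group from the setting.

```python
import configparser

PREFIX = "PHENOMEDB__"


def apply_env_overrides(config, environ):
    """Overlay PHENOMEDB__GROUP__SETTING variables onto a ConfigParser."""
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        # Split "GROUP__SETTING" at the first double underscore only, so that
        # single underscores inside either part are preserved.
        parts = name[len(PREFIX):].split("__", 1)
        if len(parts) != 2 or not all(parts):
            continue
        group, setting = parts
        if not config.has_section(group):
            config.add_section(group)
        config.set(group, setting.lower(), value)
    return config


if __name__ == "__main__":
    # interpolation=None lets values contain a literal "%".
    config = configparser.ConfigParser(interpolation=None)
    config.read_dict({"REDIS": {"password": "password", "port": "6380"}})
    apply_env_overrides(config, {"PHENOMEDB__REDIS__PASSWORD": "s3cret"})
    print(config["REDIS"]["password"])  # prints "s3cret"
```

Under this reading, PHENOMEDB__PIPELINES__PIPELINE_MANAGER_USER corresponds to the pipeline_manager_user setting in the PIPELINES group.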
The .env-example file contains the recommended Airflow and ChemSpider settings, but they can be adjusted as required.
To use the compound lookup functionality of ImportCompoundTask, obtain a ChemSpider API key and set it in the following setting:
PHENOMEDB__API_KEYS__CHEMSPIDER
Changing the following settings is recommended, although the defaults will work:
PHENOMEDB__REDIS__PASSWORD
PHENOMEDB__PIPELINES__PIPELINE_MANAGER_USER
PHENOMEDB__PIPELINES__PIPELINE_MANAGER_PASSWORD
POSTGRES_USER
POSTGRES_PASSWORD
AIRFLOW_ADMIN_USER
AIRFLOW_ADMIN_PASSWORD
AIRFLOW_ADMIN_EMAIL
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
AIRFLOW__CORE__FERNET_KEY
The config.ini file contains the following groups and settings:
TEST
username = admin # The user account used during unit tests
DB
dir = /Library/PostgreSQL/12/data/ # The directory used for storing Postgres data
rdbms = postgresql # The RDBMS to use (only supports Postgres currently)
user = postgres # The production database username
password = testpass # The database password
host = 127.0.0.1 # The database host
name = phenomedb # The database name
test = phenomedb_test # The test database name
port = 5433 # The database port
pool_size = 10 # The database pool size (SQLAlchemy)
max_overflow = 20 # The database max overflow
create_script = ./sql/phenomedb_v0.9.5_postgres.sql # The database create script
WEBSERVER
url = http://localhost:8080/ # The URL of the webserver
API
custom_root = custom # The url root of the custom API
REDIS
port = 6380 # The port of the Redis server
host = 127.0.0.1 # The host of the Redis server
user = default # The user of the Redis server
password = password # The password of the Redis server
memory_expired_seconds = 86400 # The time to expire cache objects from Redis
R
exec_path = /usr/local/bin/R # The R executable path
script_directory = /full/path/to/appdata/r_scripts/ # The R script directory
SMTP
enabled = true # Whether SMTP is configured
host = host # SMTP host
port = 25 # SMTP port
user = user # SMTP user
password = password # SMTP password
from = Name <emailaddress> # SMTP from address
DATA
project_data_base_path = /path/to/projectdata/ # The base path to the project related data (if used)
app_data = /full/path/to/appdata/ # The directory to store the application data
test_data = /full/path/to/data/test/ # The directory containing the test data
compounds = /full/path/to/data/compounds/ # The directory containing the compound data
config = /full/path/to/data/config/ # The directory containing the configs
cache = /full/path/to/appdata/cache/ # The cache directory
API_KEYS
chemspider = api_key # The ChemSpider API key
LOGGING
dir = /tmp/phenomelog/ # The logging directory
PIPELINES
pipeline_manager = apache-airflow # Only Apache-Airflow currently supported
pipeline_folder = /full/path/to/dags # The path to the Airflow DAGs folder
pipeline_manager_user = admin # The Airflow user to trigger pipelines
pipeline_manager_password = testpass # The Airflow user password for triggering pipelines
pipeline_manager_api_host = localhost:8080 # The Airflow API host URL
task_spec_file = /full/path/to/data/config/task_typespec.json # The task_typespec.json file
docker = false # Whether using docker or not
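As an example of how these groups and settings can be read, the sketch below parses a DB group with Python's configparser and assembles a database connection URL from it. It makes several assumptions: that the file uses standard bracketed INI section headers such as [DB], that inline # comments are permitted, and that the URL follows the usual SQLAlchemy form. The function names load_config and database_url are hypothetical, and PhenomeDB may build its connection differently.

```python
import configparser
from urllib.parse import quote

# A fragment mirroring the DB group above, with an assumed [DB] header.
SAMPLE = """
[DB]
rdbms = postgresql # The RDBMS to use
user = postgres # The production database username
password = testpass # The database password
host = 127.0.0.1 # The database host
name = phenomedb # The database name
test = phenomedb_test # The test database name
port = 5433 # The database port
"""


def load_config(text):
    """Parse INI text, stripping inline '#' comments from the values."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",), interpolation=None
    )
    parser.read_string(text)
    return parser


def database_url(config, use_test_db=False):
    """Build a connection URL from the DB group."""
    db = config["DB"]
    name = db["test"] if use_test_db else db["name"]
    # Percent-encode credentials so characters such as '@' or ':' are safe.
    user = quote(db["user"], safe="")
    password = quote(db["password"], safe="")
    return f"{db['rdbms']}://{user}:{password}@{db['host']}:{db.getint('port')}/{name}"


if __name__ == "__main__":
    print(database_url(load_config(SAMPLE)))
    # prints "postgresql://postgres:testpass@127.0.0.1:5433/phenomedb"
```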