phenomedb.metadata
Metadata in PhenomeDB generally refers to clinical or sample-level variables such as age, sex, and BMI.
Because PhenomeDB is designed to integrate and stratify data across multiple projects or studies, this metadata must be harmonised before it is used.
The process for harmonisation of metadata is:
Import the metadata using the ImportMetadata task, specifying any columns you wish to ignore (for example sensitive data).
The ImportMetadata task imports the data to the metadata_field and metadata_value tables.
Create or identify existing Harmonised Metadata Fields for the fields you wish to harmonise.
For each metadata field, run the CurateMetadata task, either selecting an in-built metadata curation function or defining a Python lambda that curates the raw value to the required curated/harmonised value.
Once harmonised, the fields can then be used in integration and stratification queries via the QueryFactory.
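The curation step above can be illustrated with a standalone sketch. It assumes the lambda is supplied as a string (as the `lambda_function_string` parameter documented below suggests) and turned into a callable with `eval`; how PhenomeDB evaluates the string internally is not stated on this page, and the raw values and category labels are invented for illustration.

```python
# Hypothetical raw values for a "sex" column collected from several projects.
raw_values = ["M", "male", " F ", "Female", "unknown"]

# A curation rule expressed as a lambda string.
lambda_function_string = (
    "lambda x: {'m': 'Male', 'male': 'Male', 'f': 'Female', 'female': 'Female'}"
    ".get(str(x).strip().lower())"
)

# Assumption: the string is converted to a callable with eval().
# Only evaluate lambda strings that come from a trusted source.
curate = eval(lambda_function_string)

# Values the rule does not recognise become None rather than raising.
harmonised_values = [curate(value) for value in raw_values]
print(harmonised_values)
```

Returning `None` for unrecognised values keeps the rule total over messy input; whether unmatched values should instead raise an error is a design choice left to the curator.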
Overview of the HarmoniseMetadataField task
The HarmoniseMetadataField architecture, with methods for harmonising types, names, and values
- class phenomedb.metadata.HarmoniseMetadataField(project_name=None, metadata_field_name=None, harmonised_metadata_field_name=None, inbuilt_transform_name=None, lambda_function_string='lambda x : x', allowed_decimal_places=None, allowed_data_range=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)
HarmoniseMetadataField class. Takes a project metadata field and a harmonised metadata field, and applies an inbuilt transform or a lambda function to transform the raw data into the harmonised data.
- Parameters:
metadata_field_name (str, optional) – The phenomedb.models.MetadataField name, defaults to None
harmonised_metadata_field_name (str, optional) – The phenomedb.models.HarmonisedMetadataField name, defaults to None
inbuilt_transform_name (str, optional) – The name of the inbuilt_transform method to use, ‘simple_assignment’, ‘transform_dob_and_sampling_date_to_age’, or ‘categorise_bmi’, defaults to None
lambda_function_string (str, optional) – The lambda function string to use, defaults to ‘lambda x : x’
allowed_decimal_places (int, optional) – How many decimal places the harmonised value can have, defaults to None
allowed_data_range (list, optional) – The allowed range of harmonised values, defaults to None
project_name (str, optional) – The name of the Project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
pipeline_run_id (str, optional) – The Pipeline run ID
- call_inbuilt_transform(metadata_value)
Calls the inbuilt transform function
- Parameters:
metadata_value (phenomedb.model.MetadataValue) – The phenomedb.model.MetadataValue
- Returns:
The phenomedb.model.MetadataValue
- Return type:
phenomedb.model.MetadataValue
- call_lambda(metadata_value)
Calls the lambda function
- Parameters:
metadata_value (phenomedb.model.MetadataValue) – The phenomedb.model.MetadataValue
- Returns:
The phenomedb.model.MetadataValue
- Return type:
phenomedb.model.MetadataValue
- check_functions()
Check the functions
- load_dataset()
Load the dataset
- map_and_add_dataset_data()
Map and add dataset data
- process()
Main method
- task_validation()
Validate the task - default method
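As a rough sketch of how the constructor arguments fit together, the snippet below assembles keyword arguments drawn from the signature documented above. The project and field names are hypothetical placeholders, and treating `allowed_data_range` as a `[minimum, maximum]` pair is an assumption. The import is deferred into a function because constructing the task requires a configured PhenomeDB installation.

```python
# Keyword arguments named after the documented constructor parameters.
# All values are hypothetical placeholders.
task_kwargs = {
    "project_name": "MY_PROJECT",
    "metadata_field_name": "Age at sampling",
    "harmonised_metadata_field_name": "Age",
    "inbuilt_transform_name": "simple_assignment",
    "allowed_decimal_places": 1,
    "allowed_data_range": [0, 120],  # assumed to mean [minimum, maximum]
    "db_env": "TEST",
}


def build_task(kwargs):
    """Construct the task; needs a working PhenomeDB installation."""
    # Imported lazily so this sketch can be read and loaded without PhenomeDB.
    from phenomedb.metadata import HarmoniseMetadataField

    return HarmoniseMetadataField(**kwargs)
```

How the constructed task is then executed (directly or as part of a pipeline) depends on the PhenomeDB task framework and is not shown here.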
- phenomedb.metadata.categorise_bmi(metadata_value, datatype, allowed_data_range, allowed_decimal_places, db_session=None, db_env=None)
Inbuilt function: Categorise numeric BMI values to string categories
- Parameters:
metadata_value (phenomedb.models.MetadataValue) – The MetadataValue object.
datatype (phenomedb.models.HarmonisedMetadataField.HarmonisedMetadataFieldDatatype) – The HarmonisedMetadataField.datatype.
allowed_data_range (list) – A constraint to prevent values outside of this allowed range.
allowed_decimal_places (int) – How many decimal places are allowed.
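The idea of mapping numeric BMI values to string categories can be sketched as follows. This is not the PhenomeDB implementation: the function name is hypothetical, and the thresholds are the widely used WHO adult cut-offs, which may differ from the categories and labels that `categorise_bmi` actually produces.

```python
def bmi_category(bmi):
    """Map a numeric BMI (kg/m^2) to a WHO-style category label."""
    value = float(bmi)  # accepts numeric strings such as "22.5"
    if value <= 0:
        raise ValueError(f"BMI must be positive, got {bmi!r}")
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal weight"
    if value < 30:
        return "Overweight"
    return "Obese"


categories = [bmi_category(v) for v in ("17.9", 22, 27.4, 31)]
print(categories)
```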
- phenomedb.metadata.getrandbits(k) → x. Generates an int with k random bits.
- phenomedb.metadata.random() → x in the interval [0, 1).
- phenomedb.metadata.simple_assignment(metadata_value, datatype, allowed_data_range, allowed_decimal_places, db_session=None, db_env=None)
Inbuilt function: Simple assignment of data from raw to harmonised. Uses the HarmonisedMetadataField.datatype to cast to correct harmonised value.
- Parameters:
metadata_value (phenomedb.models.MetadataValue) – The MetadataValue object.
datatype (phenomedb.models.HarmonisedMetadataField.HarmonisedMetadataFieldDatatype) – The HarmonisedMetadataField.datatype.
allowed_data_range (list) – A constraint to prevent values outside of this allowed range.
allowed_decimal_places (int) – How many decimal places are allowed.
db_session (sqlalchemy.orm.Session, optional) – The db_session to use, defaults to None.
db_env (str, optional) – The db_env to use, ‘PROD’, ‘BETA’, or ‘TEST’, defaults to None (‘PROD’).
- Raises:
MetadataHarmonisationError – If the transform cannot work, raise this Exception.
- Returns:
The transformed, harmonised MetadataValue object.
- Return type:
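A minimal sketch of the behaviour described for simple_assignment, restricted to numeric values: cast the raw value, apply the decimal-place constraint, and enforce the allowed range. It is not the library's code. `HarmonisationError` is a local stand-in for MetadataHarmonisationError, the function name is hypothetical, and both rounding (rather than rejecting) extra decimals and treating the range as `[minimum, maximum]` are assumptions.

```python
class HarmonisationError(Exception):
    """Local stand-in for MetadataHarmonisationError so the sketch runs alone."""


def assign_numeric(raw_value, allowed_data_range=None, allowed_decimal_places=None):
    """Cast a raw value to float and apply the optional constraints."""
    try:
        value = float(raw_value)
    except (TypeError, ValueError) as exc:
        raise HarmonisationError(f"Cannot cast {raw_value!r} to a number") from exc

    # Assumption: extra decimal places are rounded away, not rejected.
    if allowed_decimal_places is not None:
        value = round(value, allowed_decimal_places)

    # Assumption: the range is an inclusive [minimum, maximum] pair.
    if allowed_data_range is not None:
        lower, upper = allowed_data_range
        if not lower <= value <= upper:
            raise HarmonisationError(f"{value} is outside {allowed_data_range}")
    return value


print(assign_numeric("42.678", [0, 120], 1))
```

Raising a dedicated exception for both cast failures and range violations mirrors the documented contract that a transform which cannot work raises an error instead of storing a bad value.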
- phenomedb.metadata.transform_dob_and_sampling_date_to_age(metadata_value, datatype, allowed_data_range, allowed_decimal_places, db_session=None, db_env=None)
Inbuilt function: Transform date of birth and sampling date into a harmonised numeric age. Requires Sample.sample_date to exist.
- Parameters:
metadata_value (phenomedb.models.MetadataValue) – The MetadataValue object.
datatype (phenomedb.models.HarmonisedMetadataField.HarmonisedMetadataFieldDatatype) – The HarmonisedMetadataField.datatype.
allowed_data_range (list) – A constraint to prevent values outside of this allowed range.
allowed_decimal_places (int) – How many decimal places are allowed.
db_session (sqlalchemy.orm.Session, optional) – The db_session to use, defaults to None.
db_env (str, optional) – The db_env to use, ‘PROD’, ‘BETA’, or ‘TEST’, defaults to None (‘PROD’).
- Raises:
MetadataHarmonisationError – If the transform cannot work, raise this Exception.
- Returns:
The transformed, harmonised MetadataValue object.
- Return type:
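The date arithmetic behind such a transform can be sketched with the standard library. The function name is hypothetical, and the choice of a 365.25-day year and of rounding to `allowed_decimal_places` are assumptions; the actual transform reads the sampling date from Sample.sample_date and may compute age differently.

```python
from datetime import date


def age_in_years(date_of_birth, sample_date, allowed_decimal_places=None):
    """Return the age at sampling in years, using a 365.25-day year."""
    if sample_date < date_of_birth:
        raise ValueError("sample_date is earlier than date_of_birth")
    age = (sample_date - date_of_birth).days / 365.25
    if allowed_decimal_places is not None:
        age = round(age, allowed_decimal_places)
    return age


# Hypothetical participant born 15 June 1980 and sampled 15 June 2020.
print(age_in_years(date(1980, 6, 15), date(2020, 6, 15), 1))
```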