phenomedb.metadata

Metadata in PhenomeDB generally refers to clinical or sample-level variables, such as age, sex, BMI, etc.

As the purpose of PhenomeDB is to integrate and stratify data across multiple projects or studies, this metadata must be harmonised before it is used.

The process for harmonisation of metadata is:

  1. Import the metadata using the ImportMetadata task, specifying any columns you wish to ignore (for example sensitive data).

  2. The ImportMetadata task imports the data to the metadata_field and metadata_value tables.

  3. Create or identify existing Harmonised Metadata Fields for the fields you wish to harmonise.

  4. For each metadata field, run the CurateMetadata task, either selecting an in-built metadata curation function or defining a Python lambda that converts the raw value to the required curated/harmonised value.

Once harmonised, the fields can then be used in integration and stratification queries via the QueryFactory.
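As a rough illustration of step 4, the sketch below shows how a curation rule written as a lambda string could map inconsistent raw values onto one harmonised vocabulary. It is not PhenomeDB code: the raw values, the target labels, and the use of eval() to build the callable are all assumptions made for the example.

```python
# Illustrative only: mimics the idea of a curation lambda, not PhenomeDB internals.

# A curation rule supplied as a string, in the same form as the default 'lambda x : x'.
# The mapping to 'Male'/'Female' is a hypothetical harmonised vocabulary.
lambda_function_string = (
    "lambda x : {'m': 'Male', 'f': 'Female'}.get(str(x).strip().lower()[:1], None)"
)

# Turn the string into a callable. eval() executes arbitrary code, so it should
# only ever be given strings from a trusted source.
curate = eval(lambda_function_string)

# Hypothetical raw values for a 'sex' column collected across several projects.
raw_values = ["M", "female", " f ", "Male", "unknown"]

# Apply the rule to each raw value; values that cannot be mapped become None.
harmonised_values = [curate(value) for value in raw_values]
print(harmonised_values)
```

Returning None for unmapped values keeps unexpected entries visible for review instead of silently assigning them to a category.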

Overview of the HarmoniseMetadataField task

Figure: The HarmoniseMetadataField architecture, with methods for harmonising types, names, and values.

class phenomedb.metadata.HarmoniseMetadataField(project_name=None, metadata_field_name=None, harmonised_metadata_field_name=None, inbuilt_transform_name=None, lambda_function_string='lambda x : x', allowed_decimal_places=None, allowed_data_range=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)

HarmoniseMetadataField class. Takes a project metadata field and a harmonised metadata field, and applies an inbuilt transform or a lambda function to convert each raw value into its harmonised value.

Parameters:
  • metadata_field_name (str, optional) – The phenomedb.models.MetadataField name, defaults to None

  • harmonised_metadata_field_name (str, optional) – The phenomedb.models.HarmonisedMetadataField name, defaults to None

  • inbuilt_transform_name (str, optional) – The name of the inbuilt transform method to use: ‘simple_assignment’, ‘transform_dob_and_sampling_date_to_age’, or ‘categorise_bmi’; defaults to None

  • lambda_function_string (str, optional) – The lambda function string to use, defaults to ‘lambda x : x’

  • allowed_decimal_places (int, optional) – How many decimal places the harmonised value can have, defaults to None

  • allowed_data_range (list, optional) – The allowed range of harmonised values, defaults to None

  • project_name (str, optional) – The name of the Project, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • pipeline_run_id (str, optional) – The Pipeline run ID
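The parameters above could be collected as keyword arguments along the following lines. This is a sketch under stated assumptions: the project and field names are hypothetical, and the constructor call is left as a comment because it needs a configured PhenomeDB installation. It supplies a lambda and leaves inbuilt_transform_name at its default of None.

```python
# Parameter names taken from the documented constructor signature.
documented_parameters = {
    "project_name", "metadata_field_name", "harmonised_metadata_field_name",
    "inbuilt_transform_name", "lambda_function_string", "allowed_decimal_places",
    "allowed_data_range", "task_run_id", "username", "db_env",
    "execution_date", "db_session", "pipeline_run_id",
}

# All values below are hypothetical examples.
task_kwargs = {
    "project_name": "MY-PROJECT",
    "metadata_field_name": "Age (years)",
    "harmonised_metadata_field_name": "Age",
    "lambda_function_string": "lambda x : float(x)",
    "allowed_decimal_places": 1,
    "allowed_data_range": [0, 120],
    "db_env": "TEST",
}

# Catch misspelt keyword names before they reach the task.
unknown = set(task_kwargs) - documented_parameters
if unknown:
    raise ValueError(f"Unknown parameters: {sorted(unknown)}")

# With PhenomeDB installed and configured, the task would be constructed as:
# from phenomedb.metadata import HarmoniseMetadataField
# task = HarmoniseMetadataField(**task_kwargs)
```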

call_inbuilt_transform(metadata_value)

Calls the inbuilt transform function

Parameters:

metadata_value (phenomedb.models.MetadataValue) – The MetadataValue to transform

Returns:

The transformed MetadataValue

Return type:

phenomedb.models.MetadataValue

call_lambda(metadata_value)

Calls the lambda function

Parameters:

metadata_value (phenomedb.models.MetadataValue) – The MetadataValue to transform

Returns:

The transformed MetadataValue

Return type:

phenomedb.models.MetadataValue

check_functions()

Check the functions

load_dataset()

Load the dataset

map_and_add_dataset_data()

Map and add dataset data

process()

Main method

task_validation()

Validate the task - default method

phenomedb.metadata.categorise_bmi(metadata_value, datatype, allowed_data_range, allowed_decimal_places, db_session=None, db_env=None)

Inbuilt function: Categorise numeric BMI values to string categories

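Parameters:
  • metadata_value (phenomedb.models.MetadataValue) – The MetadataValue object.

  • datatype (phenomedb.models.HarmonisedMetadataField.HarmonisedMetadataFieldDatatype) – The HarmonisedMetadataField.datatype.

  • allowed_data_range (list) – A constraint to prevent values outside of this allowed range.

  • allowed_decimal_places (int) – How many decimal places are allowed.

  • db_session (sqlalchemy.orm.Session, optional) – The db_session to use, defaults to None.

  • db_env (str, optional) – The db_env to use, ‘PROD’, ‘BETA’, or ‘TEST’, defaults to None (‘PROD’).
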
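The categorisation that categorise_bmi performs can be pictured with the minimal sketch below. The thresholds are the standard WHO adult BMI cut-offs; the category labels and the function name are assumptions, since the strings PhenomeDB actually stores are not listed here.

```python
def categorise_bmi_sketch(bmi):
    """Map a numeric BMI (kg/m^2) to a category label.

    Thresholds follow the WHO adult cut-offs; the labels are assumed,
    not taken from PhenomeDB.
    """
    bmi = float(bmi)  # accept numeric strings such as "22.4"
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Normal weight"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"


# Example: categorise a few hypothetical raw values.
categories = [categorise_bmi_sketch(value) for value in ["17.9", 22.4, 27, 31.5]]
print(categories)
```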
phenomedb.metadata.simple_assignment(metadata_value, datatype, allowed_data_range, allowed_decimal_places, db_session=None, db_env=None)

Inbuilt function: Simple assignment of data from raw to harmonised. Uses the HarmonisedMetadataField.datatype to cast the raw value to the correct harmonised type.

Parameters:
  • metadata_value (phenomedb.models.MetadataValue) – The MetadataValue object.

  • datatype (phenomedb.models.HarmonisedMetadataField.HarmonisedMetadataFieldDatatype) – The HarmonisedMetadataField.datatype.

  • allowed_data_range (list) – A constraint to prevent values outside of this allowed range.

  • allowed_decimal_places (int) – How many decimal places are allowed.

  • db_session (sqlalchemy.orm.Session, optional) – The db_session to use, defaults to None.

  • db_env (str, optional) – The db_env to use, ‘PROD’, ‘BETA’, or ‘TEST’, defaults to None (‘PROD’).

Raises:

MetadataHarmonisationError – Raised if the transform cannot be applied.

Returns:

The transformed, harmonised MetadataValue object.

Return type:

phenomedb.models.MetadataValue
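The sketch below illustrates the kind of cast-and-constrain logic described for simple_assignment. It works on plain Python values, not MetadataValue objects, and makes several assumptions: the datatype names "numeric" and "text" are hypothetical, excess decimal places are rounded (the real function may reject them instead), and HarmonisationErrorSketch is a local stand-in for MetadataHarmonisationError.

```python
class HarmonisationErrorSketch(Exception):
    """Local stand-in for MetadataHarmonisationError (assumed behaviour)."""


def simple_assignment_sketch(raw_value, datatype,
                             allowed_data_range=None,
                             allowed_decimal_places=None):
    """Cast a raw value to the harmonised datatype and apply the constraints."""
    if datatype == "text":
        return str(raw_value).strip()
    if datatype != "numeric":
        raise HarmonisationErrorSketch(f"Unsupported datatype: {datatype!r}")

    # Cast to a number, reporting values that cannot be parsed.
    try:
        value = float(raw_value)
    except (TypeError, ValueError) as err:
        raise HarmonisationErrorSketch(
            f"Cannot cast {raw_value!r} to a number") from err

    # Reject values outside the inclusive [lower, upper] range.
    if allowed_data_range is not None:
        lower, upper = allowed_data_range
        if not lower <= value <= upper:
            raise HarmonisationErrorSketch(
                f"{value} is outside the allowed range {allowed_data_range}")

    # Assumption: extra decimal places are rounded away.
    if allowed_decimal_places is not None:
        value = round(value, allowed_decimal_places)
    return value


harmonised_age = simple_assignment_sketch("23.456", "numeric", [0, 120], 1)
print(harmonised_age)
```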

phenomedb.metadata.transform_dob_and_sampling_date_to_age(metadata_value, datatype, allowed_data_range, allowed_decimal_places, db_session=None, db_env=None)

Inbuilt function: Transform date of birth and sampling date into a harmonised numeric age. Requires Sample.sample_date to exist.

Parameters:
  • metadata_value (phenomedb.models.MetadataValue) – The MetadataValue object.

  • datatype (phenomedb.models.HarmonisedMetadataField.HarmonisedMetadataFieldDatatype) – The HarmonisedMetadataField.datatype.

  • allowed_data_range (list) – A constraint to prevent values outside of this allowed range.

  • allowed_decimal_places (int) – How many decimal places are allowed.

  • db_session (sqlalchemy.orm.Session, optional) – The db_session to use, defaults to None.

  • db_env (str, optional) – The db_env to use, ‘PROD’, ‘BETA’, or ‘TEST’, defaults to None (‘PROD’).

Raises:

MetadataHarmonisationError – Raised if the transform cannot be applied.

Returns:

The transformed, harmonised MetadataValue object.

Return type:

phenomedb.models.MetadataValue
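The date arithmetic behind this kind of transform can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: the function name is hypothetical, age is expressed in years using a 365.25-day year, and a sampling date earlier than the date of birth is treated as an error.

```python
from datetime import date


def age_at_sampling(date_of_birth, sample_date, allowed_decimal_places=1):
    """Return the age in years at the sampling date, rounded as requested."""
    if sample_date < date_of_birth:
        raise ValueError("sample_date precedes date_of_birth")
    # Assumption: convert elapsed days to years with an average 365.25-day year.
    age_years = (sample_date - date_of_birth).days / 365.25
    return round(age_years, allowed_decimal_places)


# Hypothetical participant born on 1 January 1980 and sampled on 1 January 2020.
age = age_at_sampling(date(1980, 1, 1), date(2020, 1, 1))
print(age)
```

Dividing by 365.25 is a common approximation; a calendar-exact calculation would compare year, month, and day directly.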