phenomedb.imports

Mappings between 3-file format at PhenomeDB

Mappings between a 3-file format metabolomics dataset and the PhenomeDB core data model

Main import tasks

  1. ImportMetadata - import sample metadata from a CSV where rows are samples and columns are metadata fields

  2. ImportBrukerIVDrAnnotations - import annotated metabolite measurements/abundances from a Bruker IVDr NMR dataset.

  3. ImportPeakPantheRAnnotation - import annotated metabolite measurements/abundances from a PeakPantheR LC-MS dataset.

  4. ImportMetabolights - import metabolite features and annotations from Metabolights format

ImportTask overview

Overview of the ImportTask class structure. All ImportTasks inherit this transactional-validation approach.

class phenomedb.imports.AnnotationImportTask(project_name=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, pipeline_run_id=None)

The AnnotationImportTask class. Used as the base class for the major annotation import methods.

Parameters:
  • project_name (str, optional) – The name of the project, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

add_or_update_annotated_feature_unified(sample_assay, sample_row_index, feature_index)

Add a annotated_feature

Parameters:
  • sample_assay (phenomedb.models.SampleAssay) – The sample_assay of the annotation

  • feature_metadata_row (dict) – The row the feature_metadata

  • intensity (float) – The intensity value

add_or_update_sample_assay(sample, sample_row_index, dataset)

Get or add a new sample_assay

Parameters:
  • sample (phenomedb.models.Sample) – The sample of the sample_assay

  • sample_row_index (pd.DataFrame) – The dataset row index of the sample

  • sample_row_index – The dataset

Returns:

sample_assay object phenomedb.models.sample_assay

Return type:

class:phenomedb.models.SampleAssay

check_sample_columns(dataset)

Check the sample columns in the dataset

Parameters:

dataset (pandas.DataFrame) – The imported dataset/CSV file

Raises:
  • Exception – No Sample ID or Sample ID column

  • Exception – Minimum column missing

create_saved_query()

Create a SavedQuery for the dataset for downstream analysis

get_or_add_annotation(cpd_name, cpd_id=None)

_summary_

Parameters:
  • cpd_name (str) – The cpd_name to add

  • cpd_id (str, optional) – The cpd_id to import, defaults to None

Returns:

A corresponding phenomedb.models.Annotation object

Return type:

phenomedb.models.Annotation

get_or_add_annotation_not_unified(feature_row_index)

Get or add an annotation from a 3-file dataset

Parameters:

feature_row_index (int) – The feature metadata row index

Returns:

A corresponding phenomedb.models.Annotation object

Return type:

phenomedb.models.Annotation

get_or_add_annotation_unified(feature_column_index)

Get or add an annotation from a combined CSV file

Parameters:

feature_column_index (int) – The feature column index

Returns:

A corresponding phenomedb.models.Annotation object

Return type:

phenomedb.models.Annotation

get_or_add_feature_dataset_unified()

Get or add a phenomedb.models.FeatureDataset

get_or_add_feature_metadata_unified()

Gets or adds the phenomedb.models.FeatureMetadata (where cpd_name == Feature Name)

process()

The annotation import process method

class phenomedb.imports.DownloadMetabolightsStudy(study_folder_path=None, study_id=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)

Download a Metabolights Study. If only study_id set, will download the study as well

Parameters:
  • study_folder_path (str, optional) – Path to the study folder, defaults to None

  • study_id (int, optional) – ID of the Study, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

class phenomedb.imports.ImportBrukerIVDRAnnotations(project_name=None, annotation_method=None, version=None, is_latest=True, unified_csv_path=None, pipeline_run_id=None, sample_matrix=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True)

Import Bruker IVDr Annotations

Parameters:
  • annotation_method (str, required) – name of the annotation method, BI-QUANT or BI-LISA

  • unified_csv_path (str, required) – the path to the unified csv file

  • sample_matrix (str, required) – the sample matrix, ie plasma, urine, serum, etc

  • project_name (str, optional) – The name of the project, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

add_or_update_feature_metadata(annotation_id, feature_column_index)

Get FeatureMetadata (column) based on FeatureDataset.id and the Feature Name

Parameters:

feature_column_index (int) – The column index from the feature file

Returns:

Annotation

Return type:

phenomedb.models.Annotation

get_or_add_metadata(sample, sample_row_index)

Add the raw metadata to the metadata_raw table

Parameters:
  • sample (phenomedb.models.sample) – The sampling event of metadata_raw

  • sample_row_index (int) – The dataset row index of the sample

get_or_add_unit(unit_name)

Gets or adds a unit to the database (by unit name)

Parameters:

unit_name (str) – The phenomedb.models.Unit name

Returns:

The phenomedb.models.Unit

Return type:

phenomedb.models.Unit

load_dataset()

Loads the task dataset

map_and_add_dataset_data()

Map the imported nPYc dataset to the phenomeDB models and add to db.

post_commit_actions()

Triggers the post-commit pipelines

task_validation()

Run the task validation to check number and values of imported data

Raises:
  • ValidationError – FeatureDataset does not exist

  • ValidationError – FeatureDataset.saved_query_id does not exist

  • ValidationError – SampleAssay does not exist

  • ValidationError – Annotation does not exist

  • ValidationError – AnnotatedFeature does not exist

  • ValidationError – Unit does not exist

  • ValidationError – FeatureMetadata does not exist

class phenomedb.imports.ImportDataLocations(project_name, assay_name, data_locations_path=None, assay_platform=None, sample_matrix=None, task_run_id=None, username=None, db_env=None)
get_or_add_sample(sample_row_index)

Get or add Sample

Parameters:

sample_row_index (int) – The index of the sample file

Returns:

Sample

Return type:

phenomedb.models.Sample

get_or_add_sample_assay(sample_row_index, sample)

Get or add a SampleAssay

Parameters:
  • sample_row_index (int) – The row index of the sample Dataframe

  • sample (phenomedb.models.Sample) – The Sample of the SampleAssay

Returns:

SampleAssay

Return type:

phenomedb.models.SampleAssay

get_or_add_subject(subject_name)

Get or add a new subject

Parameters:

sample_row_index – The row number of the UnifiedCSV dataset

Returns:

subject object phenomedb.models.subject

Return type:

class:phenomedb.models.subject

load_dataset()

Loads the task dataset

map_and_add_dataset_data()

Map the imported nPYc dataset to the phenomeDB models and add to db.

nmr_assay_regexes = {'blood': {'0$': 'NOESY', '1$': 'CPMG', '2$': 'JRES', '3$': 'DAS'}, 'urine': {'0$': 'NOESY', '1$': 'JRES', '2$': 'DAS'}}

Import Run Order Class. Imports a run order file. :param task_options: A dictionary containing the task options :type task_options: dict

task_validation()

Validate the task - default method

class phenomedb.imports.ImportMetabolightsStudy(study_id=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)

Import a Metabolights Study. If only study_id set, will download the study as well

Parameters:
  • study_folder_path (str, optional) – Path to the study folder, defaults to None

  • study_id (int, optional) – ID of the Study, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

add_annotated_features(assay, metabolite_assignment_file, sample_assay, default_unit)

Add Annotations and AnnotatedFeatures

Parameters:
  • assay (phenomedb.models.Assay) – The Assay

  • metabolite_assignment_file (str) – The maf file

  • sample_assay (phenomedb.models.SampleAssay) – The SampleAssay

add_annotation_compounds(assay, annotation_file)

Add the AnnotationCompounds.

1. Add the Compound 3. Add the HarmonisedAnnotation 4. Add the AnnotationCompound

Parameters:
get_or_add_data_repository(name, accession_number=None, submission_date=None, public_release_date=None)

Get or add repository

Parameters:
  • name (str) – Name of the repo

  • accession_number (str, optional) – accession number of repo, defaults to None

  • submission_date (datetime.datetime, optional) – Date of submission, defaults to None

  • public_release_date (datetime.datetime, optional) – Date of release, defaults to None

Returns:

DataRepository

Return type:

phenomedb.models.DataRepository

get_or_add_metabolights_compound(assay, annotation_method, the_row)

Get or add Metabolights Compound

Parameters:
Returns:

AnnotationCompound

Return type:

phenomedb.models.AnnotationCompound

get_or_add_metadata_field(field_name, field_value, sample)

Get or add metadata field

Parameters:
  • field_name (str) – The name of the metadata field

  • field_value (str) – The value of the metadata field

  • sample (phenomedb.models.Sample) – Sample

get_or_add_protocol(name=None, type=None, description=None, uri=None, version=None, parameters=None, components=None)

Get or add the protocol

Parameters:
  • name (str, optional) – protocol name, defaults to None

  • type (str, optional) – protocol type, defaults to None

  • description (str, optional) – description, defaults to None

  • uri (str, optional) – URI of the protocol, defaults to None

  • version (str, optional) – version of the protocol, defaults to None

  • parameters (dict, optional) – parameters of the protocol, defaults to None

  • components (dict, optional) – components of the protocol, defaults to None

get_or_add_publication(pubmed_id=None, doi=None, author_list=None, title=None, status=None)

Get or add publication

Parameters:
  • pubmed_id (int, optional) – pubmed id, defaults to None

  • doi (str, optional) – DOI, defaults to None

  • author_list (list, optional) – List of authors, defaults to None

  • title (str, optional) – publication title, defaults to None

  • status (str, optional) – Publication status, defaults to None

get_or_add_sample(subject, sample_name, sample_type_enum, sample_matrix)

Get or add Sample

Parameters:
  • subject (phenomedb.models.Subject) – Subject

  • sample_name (str) – Sample name

  • sample_type_enum (SampleType) – SampleType enum

  • sample_matrix (sample matrix) – Sample matrix

Returns:

Sample

Return type:

phenomedb.models.Sample

get_or_add_subject(sample_name, sample_type_enum)

Get or add subject

Parameters:
  • sample_name (str) – Sample name

  • sample_type_enum (str) – Sample Type

Returns:

Subject

Return type:

phenomedb.models.Subject

load_dataset()

Loads the dataset

load_study_description_file(filepath)
Takes study description file and builds a 2D dictionary of sections, and key -> values

Lists

ONTOLOGY SOURCE REFERENCE Term Source Name “OBI” “NCBITAXON” “CHMO” “CL” “EFO” “MS” “NCIT” INVESTIGATION Investigation Identifier “MTBLS1073”

study_description_dict[‘ONTOLOGY SOURCE REFERENCE’][‘Term Source Name’] = [“OBI”,”NCBITAXON”,”CHMO”,”CL”,”EFO”,”MS”,”NCIT”] study_description_dict[‘INVESTIGATION’][‘Investigation Identifier’] = “MTBLS1073”

Parameters:

filepath (str) – The path to the study description file

map_and_add_dataset_data()

Map and add dataset data

parse_assay_file(assay_file, data)

Parse an assay file

Parameters:
  • assay_file (str) – The name of the assay file

  • data (pd.Dataframe) – The assay data

Raises:

NotImplementedError – Assay platform not implemented

parse_assays()

Parse the assays

parse_persons()

Parse the persons

Returns:

person dict

Return type:

dict

parse_protocols()

Parse the protocols

parse_publications()

Parse the publications

parse_sample_information()

Parse sample information

parse_study_description()

Parse the study description

class phenomedb.imports.ImportMetabolightsXCMSAnnotations(study_id=None, xcms_file=None, assay_name=None, assay_name_order=None, sample_matrix=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)

Import MetabolightsXCMSAnnotations, e.g. to be run after DownloadMetabolightStudy and RunXCMS”

Parameters:
  • study_folder_path (str, optional) – Path to the study folder, defaults to None

  • study_id (int, optional) – ID of the Study, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

add_annotated_features(assay, metabolite_assignment_file, sample_assay, default_unit)

Add Annotations and AnnotatedFeatures

Parameters:
  • assay (phenomedb.models.Assay) – The Assay

  • metabolite_assignment_file (str) – The maf file

  • sample_assay (phenomedb.models.SampleAssay) – The SampleAssay

  • default_unit (phenomedb.models.Unit) – The default unit to use

add_annotation_compounds(assay, annotation_file, assay_file)

Add the AnnotationCompounds.

1. Add the Compound 3. Add the HarmonisedAnnotation 4. Add the AnnotationCompound

Parameters:
  • assay (phenomedb.models.Assay) – the Assay

  • annotation_file (str) – The annotation file

  • assay_file (str) – The name of the assay file

add_feature_metadata(assay, assay_file)

Add FeatureMetadata objects

Parameters:
  • assay (str) – The assay to import

  • assay_file (str) – The name of the assay file

Returns:

List of added FeatureMetadatas

Return type:

list

generate_feature_jsonb(sample_assay, feature_metadatas)

Build the features jsonb and add to SampleAssayFeatures

Parameters:
get_or_add_data_repository(name, accession_number=None, submission_date=None, public_release_date=None)

Get or add repository

Parameters:
  • name (str) – Name of the repo

  • accession_number (str, optional) – accession number of repo, defaults to None

  • submission_date (datetime.datetime, optional) – Date of submission, defaults to None

  • public_release_date (datetime.datetime, optional) – Date of release, defaults to None

Returns:

DataRepository

Return type:

phenomedb.models.DataRepository

get_or_add_metabolights_compound(assay, annotation_method, the_row)

Get or add Metabolights Compound

Parameters:
Returns:

AnnotationCompound

Return type:

phenomedb.models.AnnotationCompound

get_or_add_metadata_field(field_name, field_value, sample)

Get or add metadata field

Parameters:
  • field_name (str) – The name of the metadata field

  • field_value (str) – The value of the metadata field

  • sample (phenomedb.models.Sample) – Sample

get_or_add_protocol(name=None, type=None, description=None, uri=None, version=None, parameters=None, components=None)

Get or add the protocol

Parameters:
  • name (str, optional) – protocol name, defaults to None

  • type (str, optional) – protocol type, defaults to None

  • description (str, optional) – description, defaults to None

  • uri (str, optional) – URI of the protocol, defaults to None

  • version (str, optional) – version of the protocol, defaults to None

  • parameters (dict, optional) – parameters of the protocol, defaults to None

  • components (dict, optional) – components of the protocol, defaults to None

get_or_add_publication(pubmed_id=None, doi=None, author_list=None, title=None, status=None)

Get or add publication

Parameters:
  • pubmed_id (int, optional) – pubmed id, defaults to None

  • doi (str, optional) – DOI, defaults to None

  • author_list (list, optional) – List of authors, defaults to None

  • title (str, optional) – publication title, defaults to None

  • status (str, optional) – Publication status, defaults to None

get_or_add_sample(subject, sample_name, sample_type_enum, sample_matrix)

Get or add Sample

Parameters:
  • subject (phenomedb.models.Subject) – Subject

  • sample_name (str) – Sample name

  • sample_type_enum (SampleType) – SampleType enum

  • sample_matrix (sample matrix) – Sample matrix

Returns:

Sample

Return type:

phenomedb.models.Sample

get_or_add_subject(sample_name, sample_type_enum)

Get or add subject

Parameters:
  • sample_name (str) – Sample name

  • sample_type_enum (str) – Sample Type

Returns:

Subject

Return type:

phenomedb.models.Subject

load_dataset()

Loads the dataset

load_study_description_file(filepath)
Takes study description file and builds a 2D dictionary of sections, and key -> values

Lists

ONTOLOGY SOURCE REFERENCE Term Source Name “OBI” “NCBITAXON” “CHMO” “CL” “EFO” “MS” “NCIT” INVESTIGATION Investigation Identifier “MTBLS1073”

study_description_dict[‘ONTOLOGY SOURCE REFERENCE’][‘Term Source Name’] = [“OBI”,”NCBITAXON”,”CHMO”,”CL”,”EFO”,”MS”,”NCIT”] study_description_dict[‘INVESTIGATION’][‘Investigation Identifier’] = “MTBLS1073”

Parameters:

filepath (str) – The path to the study description file

map_and_add_dataset_data()

Map and add dataset data

parse_assay_file(assay_file, data)

Parse an assay file

Parameters:
  • assay_file (str) – The name of the assay file

  • data (pandas.Dataframe) – The assay data

Raises:

NotImplementedError – [description]

parse_assays()

Parse the assays

parse_persons()

Parse the persons

Returns:

person dict

Return type:

dict

parse_protocols()

Parse the protocols

parse_publications()

Parse the publications

parse_sample_information()

Parse sample information

parse_study_description()

Parse the study description

class phenomedb.imports.ImportMetadata(project_name=None, filepath=None, id_column=None, id_type='Sample', columns_to_import=None, username=None, task_run_id=None, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None)

Import Metadata from a CSV file where rows are samples and columns are metadata fields.

Parameters:
  • project_name (str, optional) – The name of the Project, defaults to None

  • filepath (str, optional) – The path to the CSV file, defaults to None

  • id_column (str, optional) – The column name of the ID field, defaults to None

  • id_type (str, optional) – Are the IDs for Subject or Sample?, defaults to ‘Sample’

  • columns_to_import (list, optional) – Which columns to import, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

load_dataset()

Load the dataset

map_and_add_dataset_data()

Parse the dataset

Raises:

Exception – Unknown id_type (must be Sample or Subject)

class phenomedb.imports.ImportNPYC(project_name=None, assay_name=None, sample_matrix=None, dataset_path=None, sop=None, sample_metadata_path=None, sample_metadata_format=None, task_run_id=None, username=None, db_env=None, db_session=None, pipeline_run_id=None, execution_date=None)

_summary_

Parameters:
  • dataset_path – The path to the dataset folder, defaults to None.

  • sample_metadata_path (str, optional) – The path to the sample_metadata file, defaults to None.

  • sample_metadata_format (str, optional) – The sample_metadata format, defaults to None.

  • sop (str, optional) – The SOP to use, defaults to None.

  • assay_name (str, optional) – The name of the assay (ie LC-QQQ Bile Acids), defaults to None.

  • sample_matrix (str, optional) – The sample matrix (ie urine, plasma), defaults to None.

  • sop_file_path (str, optional) – The path to the SOP file used, defaults to “”.

  • project_name (str, optional) – The name of the project, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

get_or_add_annotated_features(sample_assay, sample_index)

Get or add annotated features

Parameters:
get_or_add_metadata(sample, sample_row_index)

Get or add metadata

Parameters:
get_or_add_sample(subject, sample_row_index)

Get or add a new sample

Parameters:
  • subject (phenomedb.models.Subject) – The subject of the sample

  • sample_row_index (int) – The dataset row index the sample

Returns:

sample object phenomedb.models.Sample

Return type:

class:phenomedb.models.Sample

get_or_add_sample_assay(sample, sample_row_index)

Get or add a new sample

Parameters:
Returns:

phenomedb.models.SampleAssay object

Return type:

class:phenomedb.models.SampleAssay

get_or_add_subject(sample_row_index)

Get or add a new subject

Parameters:

index (int) – The row number of the sample_metadata

Returns:

subject object phenomedb.models.Subject

Return type:

class:phenomedb.models.subject

map_and_add_dataset_data()

Map and add the data

class phenomedb.imports.ImportPeakPantherAnnotations(project_name=None, feature_metadata_csv_path=None, sample_metadata_csv_path=None, intensity_data_csv_path=None, ppr_annotation_parameters_csv_path=None, sample_matrix=None, assay_name=None, roi_version=None, batch_corrected_data_csv_path=None, all_features_feature_metadata_csv_path=None, ppr_mz_csv_path=None, ppr_rt_csv_path=None, is_latest=True, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, run_batch_correction=False)

ImportPeakPantherAnnotations Class. Using the Basic CSV format, imports a peakPantheR Dataset, maps to phenomeDB.models, and commits to DB.

Parameters:
  • feature_metadata_csv_path (str, optional) – The path to the feature metadata csv file, defaults to None

  • sample_metadata_csv_path (str, optional) – The path to the sample metadata csv file, defaults to None

  • intensity_data_csv_path (str, optional) – The path to the intensity file, defaults to None

  • ppr_annotation_parameters_csv_path (str, optional) – The path to the PPR annotation parameters file, defaults to None

  • sample_matrix (str, optional) – The sample_matrix being imported, defaults to None

  • assay_name (str, optional) – The assay name being imported, defaults to None

  • roi_version (str, optional) – The version of the ROI file used for annotations, defaults to None

  • batch_corrected_data_csv_path (str, optional) – The path to the batch corrected intensity data, defaults to None

  • all_features_feature_metadata_csv_path (str, optional) – The path to the file containing all the searched features, defaults to None

  • ppr_mz_csv_path (str, optional) – The PPR MZ csv file path, defaults to None

  • ppr_rt_csv_path (str, optional) – The PPR RT csv file path, defaults to None

  • project_name (str, optional) – The name of the project, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

add_or_update_annotated_feature_not_unified(sample_assay, sample_row_index, feature_index, feature_name)

Add or update a phenomedb.models.AnnotatedFeature from a 3-file format file.

Parameters:
Raises:

Exception – Feature name not found

Returns:

The created/found phenomedb.models.AnnotatedFeature

Return type:

phenomedb.models.AnnotatedFeature

add_or_update_feature_metadata(annotation_id, feature_metadata_row, feature_name)

Get or update a phenomedb.models.FeatureMetadata object

Parameters:
  • annotation_id (int) – The phenomedb.models.Annotation ID

  • feature_metadata_row (pd.Series) – The row of the feature metadata dataset

  • feature_name (str) – The name of the feature

Returns:

The phenomedb.models.FeatureMetadata object

Return type:

phenomedb.models.FeatureMetadata

get_or_add_feature_dataset()

Get or add a phenomedb.models.FeatureDataset

get_or_add_feature_metadata()

Get or add feature metadata

get_or_add_unit()

Get or add phenomedb.models.Unit

load_dataset()

Loads the PeakPanther dataset, sets the name, and the loads the sampleInfo

map_and_add_dataset_data()

Map and add the intensity/abundances/annotated_features

post_commit_actions()

Triggers the post-commit pipelines

task_validation()

The task validation for the import, checking the number of entries and the values match.

Raises:
  • ValidationError – sample metadata file is not the same

  • ValidationError – feature metadata file is not the same

  • ValidationError – intensity data file is not the same

  • ValidationError – FeatureMetadata does not exist

  • ValidationError – FeatureDataset does not exist

  • ValidationError – AnnotatedFeature does not exist

  • ValidationError – SR corrected intensity does not match expected

  • ValidationError – Annotation does not exist

  • ValidationError – Expected FeatureDataset.sr_correction_parameters does not exist

  • ValidationError – FeatureDataset.saved_query_id does not exist

class phenomedb.imports.ImportSampleManifest(project_name=None, sample_manifest_path=None, columns_to_ignore=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, pipeline_run_id=None)

Import a Sample Manifest XLS file. The format of which is an excel file with 2 sheets, one called ‘samples’ with sample-level metadata, another called ‘subjects’, with subject-level metadata. Both sample-level and subject-level metadata are imported at the sample-level.

Parameters:
  • sample_manifest_path (str, optional) – _description_, defaults to None

  • columns_to_ignore (str, optional) – _description_, defaults to None

  • project_name (str, optional) – The name of the project, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

add_metadata(sample, sample_row_index)

Add the raw metadata to the metadata_raw table

Parameters:
  • sample (phenomedb.models.Sample) – The sampling event of metadata_raw

  • sample_row_index (int) – The dataset row index

Returns:

add_metadata_from_sample_worksheet(sample, sample_row_index, metadata_row={})

Adds metadata from the sample worksheet

Parameters:
add_metadata_from_subject_worksheet(sample, sample_row_index, metadata_row={})

Adds metadata fields from the subject worksheet

Parameters:
get_or_add_sample(subject, sample_row_index)

Get or add a sample

Parameters:
  • subject (phenomedb.models.subject) – the subject of the sample

  • sample_row_index (int) – The dataset row index of the sample

Returns:

sample object phenomedb.models.Sample

Return type:

phenomedb.models.sample

get_or_add_subject(subject_name)

Get or add the subject

Parameters:

subject_name (str) – the phenomedb.models.Subject name

Returns:

the matching phenomedb.models.Subject

Return type:

phenomedb.models.subject

load_dataset()

Loads the task dataset

Raises:

Exception – If the file extension is not .xlsx or .xlsx

map_and_add_dataset_data()

Map the imported nPYc dataset to the phenomeDB models and add to db.

task_validation()

Task validation, checks every entry exists in the database, and the values match.

Raises:
  • ValidationError – MetadataField incorrectly added

  • ValidationError – MetadataField missing

  • ValidationError – MetadataValue incorrectly added

  • ValidationError – MetadataValue missing

class phenomedb.imports.ImportTargetLynxAnnotations(project_name=None, unified_csv_path=None, sop=None, sop_version=None, assay_name=None, sample_matrix=None, sop_file_path='', is_latest=True, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True)

TargetLynx Task Class. Imports an nPYc TargetLynx Targeted Dataset, maps to phenomeDB.models, and commits to DB.

Parameters:
  • unified_csv_path – The path to the unified csv file, defaults to None.

  • sop (str, optional) – The SOP to use, defaults to None.

  • sop_version (str, optional) – The version of the SOP used, defaults to None.

  • assay_name (str, optional) – The name of the assay (ie LC-QQQ Bile Acids), defaults to None.

  • sample_matrix (str, optional) –  The sample matrix (ie urine, plasma), defaults to None.

  • sop_file_path (str, optional :param project_name: The name of the project, defaults to None) – The path to the SOP file used, defaults to “”.

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

add_or_update_feature_metadata(annotation_id, feature_column_index)

Get FeatureMetadata (column) based on FeatureDataset.id and the Feature Name

Parameters:

feature_column_index (int) – The column index from the feature file

Returns:

The phenomedb.models.Annotation object

Return type:

phenomedb.models.Annotation

load_dataset()

Loads the task datasets.

map_and_add_dataset_data()

Map the imported nPYc dataset to the phenomeDB models and add to db.

task_validation()

The task validation, checks the counts and values of imported data

Raises:
  • ValidationError – FeatureDataset does not exist

  • ValidationError – FeatureDataset.saved_query_id does not exist

  • ValidationError – SampleAssay does not exist

  • ValidationError – AnnotatedFeature does not exist

  • ValidationError – FeatureMetadata does not exist

  • ValidationError – Annotation does not exist

class phenomedb.imports.ImportTask(project_name=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, pipeline_run_id=None)

The ImportTask class. Used as the base class for the major import methods. Not used for compounds. Should not be instantiated itself, only from a child class.

Parameters:
  • project_name (str, optional) – The name of the project, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

add_metadata_field(field_name)

Add a phenomedb.models.MetadataField

Parameters:

field_name (str) – The phenomedb.models.MetadataField name

Returns:

The added phenomedb.models.MetadataField

Return type:

phenomedb.models.MetadataField

add_metadata_field_and_value(sample_id, field_name, field_value)

Adds and flushes the metadata field and values

Parameters:
  • sample_id (int) – The phenomedb.model.Sample ID

  • field_name (str) – The name of the metadata field

  • field_value (str) – The value of the metadata field for the sample

add_metadata_value(metadata_field, sample_id, field_value)

Add a phenomedb.models.MetadataValue.

Parameters:
get_annotated_feature(feature_metadata_id, sample_assay_id)

Get a annotated_feature by feature metadata id and sample_assay.id

Parameters:
  • feature_metadata_id (int) – The id of the annotation

  • sample_assay_id (int) – The id of the SampleAssay

Returns:

The AnnotatedFeature

Return type:

phenomedb.models.AnnotatedFeature

get_annotation_by_cpd_name_and_annotation_method(cpd_name)

Get the annotation by cpd_name

Parameters:

cpd_name (str) – The name of the compound as seen seen in the annotation datasets

Returns:

The found AnnotationCompound

Return type:

phenomedb.models.AnnotationCompound

get_or_add_sample(subject, sample_row_index)

Get or add a new sample

Parameters:
  • subject (phenomedb.models.subject) – The subject of the sample

  • sample_row_index (in) – The dataset row index the sample

Returns:

sample object phenomedb.models.Sample

Return type:

class:phenomedb.models.Sample

get_or_add_subject(sample_row_index)

Get or add a new subject

Parameters:

index (int) – The row number of the sample_metadata

Returns:

subject object phenomedb.models.subject

Return type:

class:phenomedb.models.subject

get_or_build_sample_name(sample_row_index, column=None)

Get or build the sample name

Parameters:
  • sample_row_index (str, optional) – The row index of the sample metadata

  • column – The column name to use, default None

Raises:

Exception – Sample ID not found

Returns:

The sample name

Return type:

str

get_or_build_subject_name(index, sample_metadata_row, sample_type_column_name='SampleType', subject_column_name='Subject ID')

Get or build subject name

Parameters:
  • index (int) – The row index

  • sample_metadata_row (numpy.Series) – The sample_metadata row

  • sample_type_column_name (str, optional) – The name of the column with SampleType, defaults to ‘SampleType’

  • subject_column_name (str, optional) – The name of the column with the Subject ID, defaults to ‘Subject ID’

Returns:

The subject name

Return type:

str

get_project()

Gets a project to the database (by project_name)

Raises:

Exception – If no project by that name exists

Returns:

sample_assay object phenomedb.models.Project

Return type:

class:phenomedb.models.Project

get_sample_assay(sample, sample_row_index)

Get or add a new sample_assay

Parameters:
  • sample (phenomedb.models.sample) – The sample of the sample_assay

  • sample_row_index (int) – The dataset row index of the sample

Returns:

sample_assay object phenomedb.models.SampleAssay

Return type:

class:phenomedb.models.sample_assay

abstract load_dataset()

Load that dataset. This method should be over-ridden in the task class. This method can call a python package or do a system call.

abstract map_and_add_dataset_data()

Map the dataset and import to phenomeDB

parse_value(value)

Parse a raw value to convert the necessary type

Parameters:

value (object) – The raw value to convert

Returns:

The parsed, converted value

Return type:

float

process()

Main method

send_user_failure_email(err)

Send a TaskRun failure email to the user

Parameters:

err (str) – The error message

send_user_success_email()

Send a TaskRun success email to the user

Parameters:

err (str) – The error message

simple_report()

Check the database contains entries for all records in the dataset

task_validation()

Validate the task - default method

class phenomedb.imports.ImportXCMSFeatures(project_name=None, unified_csv_path=None, sample_matrix=None, assay_name=None, task_run_id=None, username=None, db_env=None)

XCMSFeatureImportTaskUnifiedCSV Class. Using the Unified CSV format, imports an XCMS Dataset, maps to phenomeDB.models, and commits to DB.

TO DO: Finish the import code

Parameters:
  • unified_csv_path (str, optional) – The path to the unified csv file, defaults to None.

  • assay_name (str, optional) – The name of the assay (ie LC-QQQ Bile Acids), defaults to None.

  • sample_matrix (str, optional) –  The sample matrix (ie urine, plasma), defaults to None.

  • project_name (str, optional) – The name of the project, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • validate (boolean) – Whether to run validation, default True

  • pipeline_run_id (str, optional) – The Pipeline run ID

build_feature_dict(sample_row_index, feature_column_index)

Build a dictionary of features

Parameters:
  • sample_row_index (int) – The row index of the sample metadata dataset

  • feature_column_index (int) – The column index of the feature metadata dataset

Returns:

A dictionary of FeatureMetadata properties

Return type:

dict

get_or_add_assay()

Gets or adds an assay to the database (by assay_name)

load_dataset()

Loads the XCMS dataset, sets the name, and the loads the sampleInfo

map_and_add_dataset_data()

Map and add the dataset data

Raises:

Exception – No Sampling ID or Sample File Name in row

phenomedb.imports.getrandbits(k) x.  Generates an int with k random bits.
phenomedb.imports.random() x in the interval [0, 1).