phenomedb.imports

Mappings between 3-file format at PhenomeDB — Mappings between a 3-file format metabolomics dataset and the PhenomeDB core data model

Main import tasks

ImportMetadata - import sample metadata from a CSV where rows are samples and columns are metadata fields
ImportBrukerIVDrAnnotations - import annotated metabolite measurements/abundances from a Bruker IVDr NMR dataset.
ImportPeakPantheRAnnotation - import annotated metabolite measurements/abundances from a PeakPantheR LC-MS dataset.
ImportMetabolights - import metabolite features and annotations from Metabolights format

ImportTask overview — Overview of the ImportTask class structure. All ImportTasks inherit this transactional-validation approach.

class phenomedb.imports.AnnotationImportTask(project_name=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, pipeline_run_id=None)

The AnnotationImportTask class. Used as the base class for the major annotation import methods.

Parameters:

project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

add_or_update_annotated_feature_unified(sample_assay, sample_row_index, feature_index)

Add a annotated_feature

Parameters:

sample_assay (phenomedb.models.SampleAssay) – The sample_assay of the annotation
feature_metadata_row (dict) – The row the feature_metadata
intensity (float) – The intensity value

add_or_update_sample_assay(sample, sample_row_index, dataset)

Get or add a new sample_assay

Parameters:

sample (phenomedb.models.Sample) – The sample of the sample_assay
sample_row_index (pd.DataFrame) – The dataset row index of the sample
sample_row_index – The dataset

Returns:

sample_assay object phenomedb.models.sample_assay

Return type:

class:phenomedb.models.SampleAssay

check_sample_columns(dataset)

Check the sample columns in the dataset

Parameters:

dataset (pandas.DataFrame) – The imported dataset/CSV file

Raises:

Exception – No Sample ID or Sample ID column
Exception – Minimum column missing

create_saved_query(): Create a SavedQuery for the dataset for downstream analysis

get_or_add_annotation(cpd_name, cpd_id=None)

_summary_

Parameters:

cpd_name (str) – The cpd_name to add
cpd_id (str, optional) – The cpd_id to import, defaults to None

Returns:

A corresponding phenomedb.models.Annotation object

Return type:

phenomedb.models.Annotation

get_or_add_annotation_not_unified(feature_row_index)

Get or add an annotation from a 3-file dataset

Parameters:: feature_row_index (int) – The feature metadata row index
Returns:: A corresponding phenomedb.models.Annotation object
Return type:: phenomedb.models.Annotation

get_or_add_annotation_unified(feature_column_index)

Get or add an annotation from a combined CSV file

Parameters:: feature_column_index (int) – The feature column index
Returns:: A corresponding phenomedb.models.Annotation object
Return type:: phenomedb.models.Annotation

get_or_add_feature_dataset_unified(): Get or add a phenomedb.models.FeatureDataset

get_or_add_feature_metadata_unified(): Gets or adds the phenomedb.models.FeatureMetadata (where cpd_name == Feature Name)

process(): The annotation import process method

class phenomedb.imports.DownloadMetabolightsStudy(study_folder_path=None, study_id=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)

Download a Metabolights Study. If only study_id set, will download the study as well

Parameters:

study_folder_path (str, optional) – Path to the study folder, defaults to None
study_id (int, optional) – ID of the Study, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

class phenomedb.imports.ImportBrukerIVDRAnnotations(project_name=None, annotation_method=None, version=None, is_latest=True, unified_csv_path=None, pipeline_run_id=None, sample_matrix=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True)

Import Bruker IVDr Annotations

Parameters:

annotation_method (str, required) – name of the annotation method, BI-QUANT or BI-LISA
unified_csv_path (str, required) – the path to the unified csv file
sample_matrix (str, required) – the sample matrix, ie plasma, urine, serum, etc
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

add_or_update_feature_metadata(annotation_id, feature_column_index)

Get FeatureMetadata (column) based on FeatureDataset.id and the Feature Name

Parameters:: feature_column_index (int) – The column index from the feature file
Returns:: Annotation
Return type:: phenomedb.models.Annotation

get_or_add_metadata(sample, sample_row_index)

Add the raw metadata to the metadata_raw table

Parameters:

sample (phenomedb.models.sample) – The sampling event of metadata_raw
sample_row_index (int) – The dataset row index of the sample

get_or_add_unit(unit_name)

Gets or adds a unit to the database (by unit name)

Parameters:: unit_name (str) – The phenomedb.models.Unit name
Returns:: The phenomedb.models.Unit
Return type:: phenomedb.models.Unit

load_dataset(): Loads the task dataset

map_and_add_dataset_data(): Map the imported nPYc dataset to the phenomeDB models and add to db.

post_commit_actions(): Triggers the post-commit pipelines

task_validation()

Run the task validation to check number and values of imported data

Raises:

ValidationError – FeatureDataset does not exist
ValidationError – FeatureDataset.saved_query_id does not exist
ValidationError – SampleAssay does not exist
ValidationError – Annotation does not exist
ValidationError – AnnotatedFeature does not exist
ValidationError – Unit does not exist
ValidationError – FeatureMetadata does not exist

class phenomedb.imports.ImportDataLocations(project_name, assay_name, data_locations_path=None, assay_platform=None, sample_matrix=None, task_run_id=None, username=None, db_env=None)

get_or_add_sample(sample_row_index)

Get or add Sample

Parameters:: sample_row_index (int) – The index of the sample file
Returns:: Sample
Return type:: phenomedb.models.Sample

get_or_add_sample_assay(sample_row_index, sample)

Get or add a SampleAssay

Parameters:

sample_row_index (int) – The row index of the sample Dataframe
sample (phenomedb.models.Sample) – The Sample of the SampleAssay

Returns:

SampleAssay

Return type:

phenomedb.models.SampleAssay

get_or_add_subject(subject_name)

Get or add a new subject

Parameters:: sample_row_index – The row number of the UnifiedCSV dataset
Returns:: subject object phenomedb.models.subject
Return type:: class:phenomedb.models.subject

load_dataset(): Loads the task dataset

map_and_add_dataset_data(): Map the imported nPYc dataset to the phenomeDB models and add to db.

nmr_assay_regexes = {'blood': {'0$': 'NOESY', '1$': 'CPMG', '2$': 'JRES', '3$': 'DAS'}, 'urine': {'0$': 'NOESY', '1$': 'JRES', '2$': 'DAS'}}: Import Run Order Class. Imports a run order file. :param task_options: A dictionary containing the task options :type task_options: dict

task_validation(): Validate the task - default method

class phenomedb.imports.ImportMetabolightsStudy(study_id=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)

Import a Metabolights Study. If only study_id set, will download the study as well

Parameters:

study_folder_path (str, optional) – Path to the study folder, defaults to None
study_id (int, optional) – ID of the Study, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

add_annotated_features(assay, metabolite_assignment_file, sample_assay, default_unit)

Add Annotations and AnnotatedFeatures

Parameters:

assay (phenomedb.models.Assay) – The Assay
metabolite_assignment_file (str) – The maf file
sample_assay (phenomedb.models.SampleAssay) – The SampleAssay

add_annotation_compounds(assay, annotation_file)

Add the AnnotationCompounds.

1. Add the Compound 3. Add the HarmonisedAnnotation 4. Add the AnnotationCompound

Parameters:

assay (phenomedb.models.Assay) – the Assay
annotation_file (str) – The annotation file

get_or_add_data_repository(name, accession_number=None, submission_date=None, public_release_date=None)

Get or add repository

Parameters:

name (str) – Name of the repo
accession_number (str, optional) – accession number of repo, defaults to None
submission_date (datetime.datetime, optional) – Date of submission, defaults to None
public_release_date (datetime.datetime, optional) – Date of release, defaults to None

Returns:

DataRepository

Return type:

phenomedb.models.DataRepository

get_or_add_metabolights_compound(assay, annotation_method, the_row)

Get or add Metabolights Compound

Parameters:

assay (phenomedb.models.Assay) – Assay
annotation_method (phenomedb.models.AnnotationMethod) – AnnotationMethod
chebi_id (str) – ChEBI ID
chemical_formula (str) – Chemical formula
smiles (str) – SMILES
inchi (str) – InChI
cpd_name (str) – cpd_name

Returns:

AnnotationCompound

Return type:

phenomedb.models.AnnotationCompound

get_or_add_metadata_field(field_name, field_value, sample)

Get or add metadata field

Parameters:

field_name (str) – The name of the metadata field
field_value (str) – The value of the metadata field
sample (phenomedb.models.Sample) – Sample

get_or_add_protocol(name=None, type=None, description=None, uri=None, version=None, parameters=None, components=None)

Get or add the protocol

Parameters:

name (str, optional) – protocol name, defaults to None
type (str, optional) – protocol type, defaults to None
description (str, optional) – description, defaults to None
uri (str, optional) – URI of the protocol, defaults to None
version (str, optional) – version of the protocol, defaults to None
parameters (dict, optional) – parameters of the protocol, defaults to None
components (dict, optional) – components of the protocol, defaults to None

get_or_add_publication(pubmed_id=None, doi=None, author_list=None, title=None, status=None)

Get or add publication

Parameters:

pubmed_id (int, optional) – pubmed id, defaults to None
doi (str, optional) – DOI, defaults to None
author_list (list, optional) – List of authors, defaults to None
title (str, optional) – publication title, defaults to None
status (str, optional) – Publication status, defaults to None

get_or_add_sample(subject, sample_name, sample_type_enum, sample_matrix)

Get or add Sample

Parameters:

subject (phenomedb.models.Subject) – Subject
sample_name (str) – Sample name
sample_type_enum (SampleType) – SampleType enum
sample_matrix (sample matrix) – Sample matrix

Returns:

Sample

Return type:

phenomedb.models.Sample

get_or_add_subject(sample_name, sample_type_enum)

Get or add subject

Parameters:

sample_name (str) – Sample name
sample_type_enum (str) – Sample Type

Returns:

Subject

Return type:

phenomedb.models.Subject

load_dataset(): Loads the dataset

load_study_description_file(filepath)

Takes study description file and builds a 2D dictionary of sections, and key -> values: Lists

ONTOLOGY SOURCE REFERENCE Term Source Name “OBI” “NCBITAXON” “CHMO” “CL” “EFO” “MS” “NCIT” INVESTIGATION Investigation Identifier “MTBLS1073”

study_description_dict[‘ONTOLOGY SOURCE REFERENCE’][‘Term Source Name’] = [“OBI”,”NCBITAXON”,”CHMO”,”CL”,”EFO”,”MS”,”NCIT”] study_description_dict[‘INVESTIGATION’][‘Investigation Identifier’] = “MTBLS1073”

Parameters:: filepath (str) – The path to the study description file

map_and_add_dataset_data(): Map and add dataset data

parse_assay_file(assay_file, data)

Parse an assay file

Parameters:

assay_file (str) – The name of the assay file
data (pd.Dataframe) – The assay data

Raises:

NotImplementedError – Assay platform not implemented

parse_assays(): Parse the assays

parse_persons()

Parse the persons

Returns:: person dict
Return type:: dict

parse_protocols(): Parse the protocols

parse_publications(): Parse the publications

parse_sample_information(): Parse sample information

parse_study_description(): Parse the study description

class phenomedb.imports.ImportMetabolightsXCMSAnnotations(study_id=None, xcms_file=None, assay_name=None, assay_name_order=None, sample_matrix=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)

Import MetabolightsXCMSAnnotations, e.g. to be run after DownloadMetabolightStudy and RunXCMS”

Parameters:

study_folder_path (str, optional) – Path to the study folder, defaults to None
study_id (int, optional) – ID of the Study, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

add_annotated_features(assay, metabolite_assignment_file, sample_assay, default_unit)

Add Annotations and AnnotatedFeatures

Parameters:

assay (phenomedb.models.Assay) – The Assay
metabolite_assignment_file (str) – The maf file
sample_assay (phenomedb.models.SampleAssay) – The SampleAssay
default_unit (phenomedb.models.Unit) – The default unit to use

add_annotation_compounds(assay, annotation_file, assay_file)

Add the AnnotationCompounds.

1. Add the Compound 3. Add the HarmonisedAnnotation 4. Add the AnnotationCompound

Parameters:

assay (phenomedb.models.Assay) – the Assay
annotation_file (str) – The annotation file
assay_file (str) – The name of the assay file

add_feature_metadata(assay, assay_file)

Add FeatureMetadata objects

Parameters:

assay (str) – The assay to import
assay_file (str) – The name of the assay file

Returns:

List of added FeatureMetadatas

Return type:

list

generate_feature_jsonb(sample_assay, feature_metadatas)

Build the features jsonb and add to SampleAssayFeatures

Parameters:

sample_assay (phenomedb.models.SampleAssay) – The phenomedb.models.SampleAssay object
feature_metadatas (list) – The list of added FeatureMetadatas

get_or_add_data_repository(name, accession_number=None, submission_date=None, public_release_date=None)

Get or add repository

Parameters:

name (str) – Name of the repo
accession_number (str, optional) – accession number of repo, defaults to None
submission_date (datetime.datetime, optional) – Date of submission, defaults to None
public_release_date (datetime.datetime, optional) – Date of release, defaults to None

Returns:

DataRepository

Return type:

phenomedb.models.DataRepository

get_or_add_metabolights_compound(assay, annotation_method, the_row)

Get or add Metabolights Compound

Parameters:

assay (phenomedb.models.Assay) – The phenomedb.models.Assay
annotation_method (phenomedb.models.AnnotationMethod) – The phenomedb.models.AnnotationMethod
the_row (pd.Series) – The row from the dataframe

Returns:

AnnotationCompound

Return type:

phenomedb.models.AnnotationCompound

get_or_add_metadata_field(field_name, field_value, sample)

Get or add metadata field

Parameters:

field_name (str) – The name of the metadata field
field_value (str) – The value of the metadata field
sample (phenomedb.models.Sample) – Sample

get_or_add_protocol(name=None, type=None, description=None, uri=None, version=None, parameters=None, components=None)

Get or add the protocol

Parameters:

name (str, optional) – protocol name, defaults to None
type (str, optional) – protocol type, defaults to None
description (str, optional) – description, defaults to None
uri (str, optional) – URI of the protocol, defaults to None
version (str, optional) – version of the protocol, defaults to None
parameters (dict, optional) – parameters of the protocol, defaults to None
components (dict, optional) – components of the protocol, defaults to None

get_or_add_publication(pubmed_id=None, doi=None, author_list=None, title=None, status=None)

Get or add publication

Parameters:

pubmed_id (int, optional) – pubmed id, defaults to None
doi (str, optional) – DOI, defaults to None
author_list (list, optional) – List of authors, defaults to None
title (str, optional) – publication title, defaults to None
status (str, optional) – Publication status, defaults to None

get_or_add_sample(subject, sample_name, sample_type_enum, sample_matrix)

Get or add Sample

Parameters:

subject (phenomedb.models.Subject) – Subject
sample_name (str) – Sample name
sample_type_enum (SampleType) – SampleType enum
sample_matrix (sample matrix) – Sample matrix

Returns:

Sample

Return type:

phenomedb.models.Sample

get_or_add_subject(sample_name, sample_type_enum)

Get or add subject

Parameters:

sample_name (str) – Sample name
sample_type_enum (str) – Sample Type

Returns:

Subject

Return type:

phenomedb.models.Subject

load_dataset(): Loads the dataset

load_study_description_file(filepath)

Takes study description file and builds a 2D dictionary of sections, and key -> values: Lists

ONTOLOGY SOURCE REFERENCE Term Source Name “OBI” “NCBITAXON” “CHMO” “CL” “EFO” “MS” “NCIT” INVESTIGATION Investigation Identifier “MTBLS1073”

study_description_dict[‘ONTOLOGY SOURCE REFERENCE’][‘Term Source Name’] = [“OBI”,”NCBITAXON”,”CHMO”,”CL”,”EFO”,”MS”,”NCIT”] study_description_dict[‘INVESTIGATION’][‘Investigation Identifier’] = “MTBLS1073”

Parameters:: filepath (str) – The path to the study description file

map_and_add_dataset_data(): Map and add dataset data

parse_assay_file(assay_file, data)

Parse an assay file

Parameters:

assay_file (str) – The name of the assay file
data (pandas.Dataframe) – The assay data

Raises:

NotImplementedError – [description]

parse_assays(): Parse the assays

parse_persons()

Parse the persons

Returns:: person dict
Return type:: dict

parse_protocols(): Parse the protocols

parse_publications(): Parse the publications

parse_sample_information(): Parse sample information

parse_study_description(): Parse the study description

class phenomedb.imports.ImportMetadata(project_name=None, filepath=None, id_column=None, id_type='Sample', columns_to_import=None, username=None, task_run_id=None, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None)

Import Metadata from a CSV file where rows are samples and columns are metadata fields.

Parameters:

project_name (str, optional) – The name of the Project, defaults to None
filepath (str, optional) – The path to the CSV file, defaults to None
id_column (str, optional) – The column name of the ID field, defaults to None
id_type (str, optional) – Are the IDs for Subject or Sample?, defaults to ‘Sample’
columns_to_import (list, optional) – Which columns to import, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

load_dataset(): Load the dataset

map_and_add_dataset_data()

Parse the dataset

Raises:: Exception – Unknown id_type (must be Sample or Subject)

class phenomedb.imports.ImportNPYC(project_name=None, assay_name=None, sample_matrix=None, dataset_path=None, sop=None, sample_metadata_path=None, sample_metadata_format=None, task_run_id=None, username=None, db_env=None, db_session=None, pipeline_run_id=None, execution_date=None)

_summary_

Parameters:

dataset_path – The path to the dataset folder, defaults to None.
sample_metadata_path (str, optional) – The path to the sample_metadata file, defaults to None.
sample_metadata_format (str, optional) – The sample_metadata format, defaults to None.
sop (str, optional) – The SOP to use, defaults to None.
assay_name (str, optional) – The name of the assay (ie LC-QQQ Bile Acids), defaults to None.
sample_matrix (str, optional) – The sample matrix (ie urine, plasma), defaults to None.
sop_file_path (str, optional) – The path to the SOP file used, defaults to “”.
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

get_or_add_annotated_features(sample_assay, sample_index)

Get or add annotated features

Parameters:

sample_assay (phenomedb.models.SampleAssay) – The phenomedb.models.SampleAssay object
sample_index (int) – The sample metadata row index

get_or_add_metadata(sample, sample_row_index)

Get or add metadata

Parameters:

sample (phenomedb.models.Sample) – The phenomedb.models.Sample
sample_row_index (int) – The sample metadata dataset row index

get_or_add_sample(subject, sample_row_index)

Get or add a new sample

Parameters:

subject (phenomedb.models.Subject) – The subject of the sample
sample_row_index (int) – The dataset row index the sample

Returns:

sample object phenomedb.models.Sample

Return type:

class:phenomedb.models.Sample

get_or_add_sample_assay(sample, sample_row_index)

Get or add a new sample

Parameters:

subject (phenomedb.models.Subject) – The phenomedb.models.Subject of the phenomedb.models.Sample
sample_row_index (int) – The dataset row index the sample

Returns:

phenomedb.models.SampleAssay object

Return type:

class:phenomedb.models.SampleAssay

get_or_add_subject(sample_row_index)

Get or add a new subject

Parameters:: index (int) – The row number of the sample_metadata
Returns:: subject object phenomedb.models.Subject
Return type:: class:phenomedb.models.subject

map_and_add_dataset_data(): Map and add the data

class phenomedb.imports.ImportPeakPantherAnnotations(project_name=None, feature_metadata_csv_path=None, sample_metadata_csv_path=None, intensity_data_csv_path=None, ppr_annotation_parameters_csv_path=None, sample_matrix=None, assay_name=None, roi_version=None, batch_corrected_data_csv_path=None, all_features_feature_metadata_csv_path=None, ppr_mz_csv_path=None, ppr_rt_csv_path=None, is_latest=True, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, run_batch_correction=False)

ImportPeakPantherAnnotations Class. Using the Basic CSV format, imports a peakPantheR Dataset, maps to phenomeDB.models, and commits to DB.

Parameters:

feature_metadata_csv_path (str, optional) – The path to the feature metadata csv file, defaults to None
sample_metadata_csv_path (str, optional) – The path to the sample metadata csv file, defaults to None
intensity_data_csv_path (str, optional) – The path to the intensity file, defaults to None
ppr_annotation_parameters_csv_path (str, optional) – The path to the PPR annotation parameters file, defaults to None
sample_matrix (str, optional) – The sample_matrix being imported, defaults to None
assay_name (str, optional) – The assay name being imported, defaults to None
roi_version (str, optional) – The version of the ROI file used for annotations, defaults to None
batch_corrected_data_csv_path (str, optional) – The path to the batch corrected intensity data, defaults to None
all_features_feature_metadata_csv_path (str, optional) – The path to the file containing all the searched features, defaults to None
ppr_mz_csv_path (str, optional) – The PPR MZ csv file path, defaults to None
ppr_rt_csv_path (str, optional) – The PPR RT csv file path, defaults to None
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

add_or_update_annotated_feature_not_unified(sample_assay, sample_row_index, feature_index, feature_name)

Add or update a phenomedb.models.AnnotatedFeature from a 3-file format file.

Parameters:

sample_assay (phenomedb.models.SampleAssay) – The phenomedb.models.SampleAssay object
sample_row_index (int) – The row of the sample metadata dataset
feature_index (int) – The row of the feature metadata dataset
feature_name (str) – The name of the feature

Raises:

Exception – Feature name not found

Returns:

The created/found phenomedb.models.AnnotatedFeature

Return type:

phenomedb.models.AnnotatedFeature

add_or_update_feature_metadata(annotation_id, feature_metadata_row, feature_name)

Get or update a phenomedb.models.FeatureMetadata object

Parameters:

annotation_id (int) – The phenomedb.models.Annotation ID
feature_metadata_row (pd.Series) – The row of the feature metadata dataset
feature_name (str) – The name of the feature

Returns:

The phenomedb.models.FeatureMetadata object

Return type:

phenomedb.models.FeatureMetadata

get_or_add_feature_dataset(): Get or add a phenomedb.models.FeatureDataset

get_or_add_feature_metadata(): Get or add feature metadata

get_or_add_unit(): Get or add phenomedb.models.Unit

load_dataset(): Loads the PeakPanther dataset, sets the name, and the loads the sampleInfo

map_and_add_dataset_data(): Map and add the intensity/abundances/annotated_features

post_commit_actions(): Triggers the post-commit pipelines

task_validation()

The task validation for the import, checking the number of entries and the values match.

Raises:

ValidationError – sample metadata file is not the same
ValidationError – feature metadata file is not the same
ValidationError – intensity data file is not the same
ValidationError – FeatureMetadata does not exist
ValidationError – FeatureDataset does not exist
ValidationError – AnnotatedFeature does not exist
ValidationError – SR corrected intensity does not match expected
ValidationError – Annotation does not exist
ValidationError – Expected FeatureDataset.sr_correction_parameters does not exist
ValidationError – FeatureDataset.saved_query_id does not exist

class phenomedb.imports.ImportSampleManifest(project_name=None, sample_manifest_path=None, columns_to_ignore=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, pipeline_run_id=None)

Import a Sample Manifest XLS file. The format of which is an excel file with 2 sheets, one called ‘samples’ with sample-level metadata, another called ‘subjects’, with subject-level metadata. Both sample-level and subject-level metadata are imported at the sample-level.

Parameters:

sample_manifest_path (str, optional) – _description_, defaults to None
columns_to_ignore (str, optional) – _description_, defaults to None
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

add_metadata(sample, sample_row_index)

Add the raw metadata to the metadata_raw table

Parameters:

sample (phenomedb.models.Sample) – The sampling event of metadata_raw
sample_row_index (int) – The dataset row index

Returns:

add_metadata_from_sample_worksheet(sample, sample_row_index, metadata_row={})

Adds metadata from the sample worksheet

Parameters:

sample (phenomedb.models.Sample) – sample object phenomedb.models.Sample
sample_row_index (int) – The dataset row index

add_metadata_from_subject_worksheet(sample, sample_row_index, metadata_row={})

Adds metadata fields from the subject worksheet

Parameters:

sample (phenomedb.models.Sample) – sample object phenomedb.models.Sample
sample_row_index (int) – The dataset row index

get_or_add_sample(subject, sample_row_index)

Get or add a sample

Parameters:

subject (phenomedb.models.subject) – the subject of the sample
sample_row_index (int) – The dataset row index of the sample

Returns:

sample object phenomedb.models.Sample

Return type:

phenomedb.models.sample

get_or_add_subject(subject_name)

Get or add the subject

Parameters:: subject_name (str) – the phenomedb.models.Subject name
Returns:: the matching phenomedb.models.Subject
Return type:: phenomedb.models.subject

load_dataset()

Loads the task dataset

Raises:: Exception – If the file extension is not .xlsx or .xlsx

map_and_add_dataset_data(): Map the imported nPYc dataset to the phenomeDB models and add to db.

task_validation()

Task validation, checks every entry exists in the database, and the values match.

Raises:

ValidationError – MetadataField incorrectly added
ValidationError – MetadataField missing
ValidationError – MetadataValue incorrectly added
ValidationError – MetadataValue missing

class phenomedb.imports.ImportTargetLynxAnnotations(project_name=None, unified_csv_path=None, sop=None, sop_version=None, assay_name=None, sample_matrix=None, sop_file_path='', is_latest=True, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True)

TargetLynx Task Class. Imports an nPYc TargetLynx Targeted Dataset, maps to phenomeDB.models, and commits to DB.

Parameters:

unified_csv_path – The path to the unified csv file, defaults to None.
sop (str, optional) – The SOP to use, defaults to None.
sop_version (str, optional) – The version of the SOP used, defaults to None.
assay_name (str, optional) – The name of the assay (ie LC-QQQ Bile Acids), defaults to None.
sample_matrix (str, optional) – The sample matrix (ie urine, plasma), defaults to None.
sop_file_path (str, optional :param project_name: The name of the project, defaults to None) – The path to the SOP file used, defaults to “”.
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

add_or_update_feature_metadata(annotation_id, feature_column_index)

Get FeatureMetadata (column) based on FeatureDataset.id and the Feature Name

Parameters:: feature_column_index (int) – The column index from the feature file
Returns:: The phenomedb.models.Annotation object
Return type:: phenomedb.models.Annotation

load_dataset(): Loads the task datasets.

map_and_add_dataset_data(): Map the imported nPYc dataset to the phenomeDB models and add to db.

task_validation()

The task validation, checks the counts and values of imported data

Raises:

ValidationError – FeatureDataset does not exist
ValidationError – FeatureDataset.saved_query_id does not exist
ValidationError – SampleAssay does not exist
ValidationError – AnnotatedFeature does not exist
ValidationError – FeatureMetadata does not exist
ValidationError – Annotation does not exist

class phenomedb.imports.ImportTask(project_name=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, pipeline_run_id=None)

The ImportTask class. Used as the base class for the major import methods. Not used for compounds. Should not be instantiated itself, only from a child class.

Parameters:

project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

add_metadata_field(field_name)

Add a phenomedb.models.MetadataField

Parameters:: field_name (str) – The phenomedb.models.MetadataField name
Returns:: The added phenomedb.models.MetadataField
Return type:: phenomedb.models.MetadataField

add_metadata_field_and_value(sample_id, field_name, field_value)

Adds and flushes the metadata field and values

Parameters:

sample_id (int) – The phenomedb.model.Sample ID
field_name (str) – The name of the metadata field
field_value (str) – The value of the metadata field for the sample

add_metadata_value(metadata_field, sample_id, field_value)

Add a phenomedb.models.MetadataValue.

Parameters:

metadata_field (phenomedb.models.MetadataField) – The phenomedb.models.MetadataField
sample_id (int) – The phenomedb.models.Sample ID
field_value (str) – The value of the metadata field for the sample

get_annotated_feature(feature_metadata_id, sample_assay_id)

Get a annotated_feature by feature metadata id and sample_assay.id

Parameters:

feature_metadata_id (int) – The id of the annotation
sample_assay_id (int) – The id of the SampleAssay

Returns:

The AnnotatedFeature

Return type:

phenomedb.models.AnnotatedFeature

get_annotation_by_cpd_name_and_annotation_method(cpd_name)

Get the annotation by cpd_name

Parameters:: cpd_name (str) – The name of the compound as seen seen in the annotation datasets
Returns:: The found AnnotationCompound
Return type:: phenomedb.models.AnnotationCompound

get_or_add_sample(subject, sample_row_index)

Get or add a new sample

Parameters:

subject (phenomedb.models.subject) – The subject of the sample
sample_row_index (in) – The dataset row index the sample

Returns:

sample object phenomedb.models.Sample

Return type:

class:phenomedb.models.Sample

get_or_add_subject(sample_row_index)

Get or add a new subject

Parameters:: index (int) – The row number of the sample_metadata
Returns:: subject object phenomedb.models.subject
Return type:: class:phenomedb.models.subject

get_or_build_sample_name(sample_row_index, column=None)

Get or build the sample name

Parameters:

sample_row_index (str, optional) – The row index of the sample metadata
column – The column name to use, default None

Raises:

Exception – Sample ID not found

Returns:

The sample name

Return type:

str

get_or_build_subject_name(index, sample_metadata_row, sample_type_column_name='SampleType', subject_column_name='Subject ID')

Get or build subject name

Parameters:

index (int) – The row index
sample_metadata_row (numpy.Series) – The sample_metadata row
sample_type_column_name (str, optional) – The name of the column with SampleType, defaults to ‘SampleType’
subject_column_name (str, optional) – The name of the column with the Subject ID, defaults to ‘Subject ID’

Returns:

The subject name

Return type:

str

get_project()

Gets a project to the database (by project_name)

Raises:: Exception – If no project by that name exists
Returns:: sample_assay object phenomedb.models.Project
Return type:: class:phenomedb.models.Project

get_sample_assay(sample, sample_row_index)

Get or add a new sample_assay

Parameters:

sample (phenomedb.models.sample) – The sample of the sample_assay
sample_row_index (int) – The dataset row index of the sample

Returns:

sample_assay object phenomedb.models.SampleAssay

Return type:

class:phenomedb.models.sample_assay

abstract load_dataset(): Load that dataset. This method should be over-ridden in the task class. This method can call a python package or do a system call.

abstract map_and_add_dataset_data(): Map the dataset and import to phenomeDB

parse_value(value)

Parse a raw value to convert the necessary type

Parameters:: value (object) – The raw value to convert
Returns:: The parsed, converted value
Return type:: float

process(): Main method

send_user_failure_email(err)

Send a TaskRun failure email to the user

Parameters:: err (str) – The error message

send_user_success_email()

Send a TaskRun success email to the user

Parameters:: err (str) – The error message

simple_report(): Check the database contains entries for all records in the dataset

task_validation(): Validate the task - default method

class phenomedb.imports.ImportXCMSFeatures(project_name=None, unified_csv_path=None, sample_matrix=None, assay_name=None, task_run_id=None, username=None, db_env=None)

XCMSFeatureImportTaskUnifiedCSV Class. Using the Unified CSV format, imports an XCMS Dataset, maps to phenomeDB.models, and commits to DB.

TO DO: Finish the import code

Parameters:

unified_csv_path (str, optional) – The path to the unified csv file, defaults to None.
assay_name (str, optional) – The name of the assay (ie LC-QQQ Bile Acids), defaults to None.
sample_matrix (str, optional) – The sample matrix (ie urine, plasma), defaults to None.
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

build_feature_dict(sample_row_index, feature_column_index)

Build a dictionary of features

Parameters:

sample_row_index (int) – The row index of the sample metadata dataset
feature_column_index (int) – The column index of the feature metadata dataset

Returns:

A dictionary of FeatureMetadata properties

Return type:

dict

get_or_add_assay(): Gets or adds an assay to the database (by assay_name)

load_dataset(): Loads the XCMS dataset, sets the name, and the loads the sampleInfo

map_and_add_dataset_data()

Map and add the dataset data

Raises:: Exception – No Sampling ID or Sample File Name in row

phenomedb.imports.getrandbits(k) → x. Generates an int with k random bits.

phenomedb.imports.random() → x in the interval [0, 1).