phenomedb.imports
Mappings between a 3-file format metabolomics dataset and the PhenomeDB core data model
Main import tasks
ImportMetadata - import sample metadata from a CSV where rows are samples and columns are metadata fields
ImportBrukerIVDrAnnotations - import annotated metabolite measurements/abundances from a Bruker IVDr NMR dataset.
ImportPeakPantheRAnnotation - import annotated metabolite measurements/abundances from a PeakPantheR LC-MS dataset.
ImportMetabolights - import metabolite features and annotations from Metabolights format
Overview of the ImportTask class structure. All ImportTasks inherit this transactional-validation approach.
- class phenomedb.imports.AnnotationImportTask(project_name=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, pipeline_run_id=None)
The AnnotationImportTask class. Used as the base class for the major annotation import methods.
- Parameters:
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- add_or_update_annotated_feature_unified(sample_assay, sample_row_index, feature_index)
Add a annotated_feature
- Parameters:
sample_assay (
phenomedb.models.SampleAssay) – The sample_assay of the annotationfeature_metadata_row (dict) – The row the feature_metadata
intensity (float) – The intensity value
- add_or_update_sample_assay(sample, sample_row_index, dataset)
Get or add a new sample_assay
- Parameters:
sample (
phenomedb.models.Sample) – The sample of the sample_assaysample_row_index (
pd.DataFrame) – The dataset row index of the samplesample_row_index – The dataset
- Returns:
sample_assay object
phenomedb.models.sample_assay- Return type:
class:phenomedb.models.SampleAssay
- check_sample_columns(dataset)
Check the sample columns in the dataset
- Parameters:
dataset (
pandas.DataFrame) – The imported dataset/CSV file- Raises:
Exception – No Sample ID or Sample ID column
Exception – Minimum column missing
- create_saved_query()
Create a SavedQuery for the dataset for downstream analysis
- get_or_add_annotation(cpd_name, cpd_id=None)
_summary_
- Parameters:
cpd_name (str) – The cpd_name to add
cpd_id (str, optional) – The cpd_id to import, defaults to None
- Returns:
A corresponding
phenomedb.models.Annotationobject- Return type:
- get_or_add_annotation_not_unified(feature_row_index)
Get or add an annotation from a 3-file dataset
- Parameters:
feature_row_index (int) – The feature metadata row index
- Returns:
A corresponding
phenomedb.models.Annotationobject- Return type:
- get_or_add_annotation_unified(feature_column_index)
Get or add an annotation from a combined CSV file
- Parameters:
feature_column_index (int) – The feature column index
- Returns:
A corresponding
phenomedb.models.Annotationobject- Return type:
- get_or_add_feature_dataset_unified()
Get or add a
phenomedb.models.FeatureDataset
- get_or_add_feature_metadata_unified()
Gets or adds the
phenomedb.models.FeatureMetadata(where cpd_name == Feature Name)
- process()
The annotation import process method
- class phenomedb.imports.DownloadMetabolightsStudy(study_folder_path=None, study_id=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)
Download a Metabolights Study. If only study_id set, will download the study as well
- Parameters:
study_folder_path (str, optional) – Path to the study folder, defaults to None
study_id (int, optional) – ID of the Study, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- class phenomedb.imports.ImportBrukerIVDRAnnotations(project_name=None, annotation_method=None, version=None, is_latest=True, unified_csv_path=None, pipeline_run_id=None, sample_matrix=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True)
Import Bruker IVDr Annotations
- Parameters:
annotation_method (str, required) – name of the annotation method, BI-QUANT or BI-LISA
unified_csv_path (str, required) – the path to the unified csv file
sample_matrix (str, required) – the sample matrix, ie plasma, urine, serum, etc
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- add_or_update_feature_metadata(annotation_id, feature_column_index)
Get FeatureMetadata (column) based on FeatureDataset.id and the Feature Name
- Parameters:
feature_column_index (int) – The column index from the feature file
- Returns:
Annotation
- Return type:
phenomedb.models.Annotation
- get_or_add_metadata(sample, sample_row_index)
Add the raw metadata to the metadata_raw table
- Parameters:
sample (
phenomedb.models.sample) – The sampling event of metadata_rawsample_row_index (int) – The dataset row index of the sample
- get_or_add_unit(unit_name)
Gets or adds a unit to the database (by unit name)
- Parameters:
unit_name (str) – The
phenomedb.models.Unitname- Returns:
- Return type:
- load_dataset()
Loads the task dataset
- map_and_add_dataset_data()
Map the imported nPYc dataset to the phenomeDB models and add to db.
- post_commit_actions()
Triggers the post-commit pipelines
- task_validation()
Run the task validation to check number and values of imported data
- Raises:
ValidationError – FeatureDataset does not exist
ValidationError – FeatureDataset.saved_query_id does not exist
ValidationError – SampleAssay does not exist
ValidationError – Annotation does not exist
ValidationError – AnnotatedFeature does not exist
ValidationError – Unit does not exist
ValidationError – FeatureMetadata does not exist
- class phenomedb.imports.ImportDataLocations(project_name, assay_name, data_locations_path=None, assay_platform=None, sample_matrix=None, task_run_id=None, username=None, db_env=None)
- get_or_add_sample(sample_row_index)
Get or add Sample
- Parameters:
sample_row_index (int) – The index of the sample file
- Returns:
Sample
- Return type:
phenomedb.models.Sample
- get_or_add_sample_assay(sample_row_index, sample)
Get or add a SampleAssay
- Parameters:
sample_row_index (int) – The row index of the sample Dataframe
sample (phenomedb.models.Sample) – The Sample of the SampleAssay
- Returns:
SampleAssay
- Return type:
phenomedb.models.SampleAssay
- get_or_add_subject(subject_name)
Get or add a new subject
- Parameters:
sample_row_index – The row number of the UnifiedCSV dataset
- Returns:
subject object
phenomedb.models.subject- Return type:
class:phenomedb.models.subject
- load_dataset()
Loads the task dataset
- map_and_add_dataset_data()
Map the imported nPYc dataset to the phenomeDB models and add to db.
- nmr_assay_regexes = {'blood': {'0$': 'NOESY', '1$': 'CPMG', '2$': 'JRES', '3$': 'DAS'}, 'urine': {'0$': 'NOESY', '1$': 'JRES', '2$': 'DAS'}}
Import Run Order Class. Imports a run order file. :param task_options: A dictionary containing the task options :type task_options: dict
- task_validation()
Validate the task - default method
- class phenomedb.imports.ImportMetabolightsStudy(study_id=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)
Import a Metabolights Study. If only study_id set, will download the study as well
- Parameters:
study_folder_path (str, optional) – Path to the study folder, defaults to None
study_id (int, optional) – ID of the Study, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- add_annotated_features(assay, metabolite_assignment_file, sample_assay, default_unit)
Add Annotations and AnnotatedFeatures
- Parameters:
assay (phenomedb.models.Assay) – The Assay
metabolite_assignment_file (str) – The maf file
sample_assay (
phenomedb.models.SampleAssay) – The SampleAssay
- add_annotation_compounds(assay, annotation_file)
Add the AnnotationCompounds.
1. Add the Compound 3. Add the HarmonisedAnnotation 4. Add the AnnotationCompound
- Parameters:
assay (
phenomedb.models.Assay) – the Assayannotation_file (str) – The annotation file
- get_or_add_data_repository(name, accession_number=None, submission_date=None, public_release_date=None)
Get or add repository
- Parameters:
name (str) – Name of the repo
accession_number (str, optional) – accession number of repo, defaults to None
submission_date (datetime.datetime, optional) – Date of submission, defaults to None
public_release_date (datetime.datetime, optional) – Date of release, defaults to None
- Returns:
DataRepository
- Return type:
- get_or_add_metabolights_compound(assay, annotation_method, the_row)
Get or add Metabolights Compound
- Parameters:
assay (
phenomedb.models.Assay) – Assayannotation_method (
phenomedb.models.AnnotationMethod) – AnnotationMethodchebi_id (str) – ChEBI ID
chemical_formula (str) – Chemical formula
smiles (str) – SMILES
inchi (str) – InChI
cpd_name (str) – cpd_name
- Returns:
AnnotationCompound
- Return type:
- get_or_add_metadata_field(field_name, field_value, sample)
Get or add metadata field
- Parameters:
field_name (str) – The name of the metadata field
field_value (str) – The value of the metadata field
sample (
phenomedb.models.Sample) – Sample
- get_or_add_protocol(name=None, type=None, description=None, uri=None, version=None, parameters=None, components=None)
Get or add the protocol
- Parameters:
name (str, optional) – protocol name, defaults to None
type (str, optional) – protocol type, defaults to None
description (str, optional) – description, defaults to None
uri (str, optional) – URI of the protocol, defaults to None
version (str, optional) – version of the protocol, defaults to None
parameters (dict, optional) – parameters of the protocol, defaults to None
components (dict, optional) – components of the protocol, defaults to None
- get_or_add_publication(pubmed_id=None, doi=None, author_list=None, title=None, status=None)
Get or add publication
- Parameters:
pubmed_id (int, optional) – pubmed id, defaults to None
doi (str, optional) – DOI, defaults to None
author_list (list, optional) – List of authors, defaults to None
title (str, optional) – publication title, defaults to None
status (str, optional) – Publication status, defaults to None
- get_or_add_sample(subject, sample_name, sample_type_enum, sample_matrix)
Get or add Sample
- Parameters:
subject (
phenomedb.models.Subject) – Subjectsample_name (str) – Sample name
sample_type_enum (SampleType) – SampleType enum
sample_matrix (sample matrix) – Sample matrix
- Returns:
Sample
- Return type:
- get_or_add_subject(sample_name, sample_type_enum)
Get or add subject
- Parameters:
sample_name (str) – Sample name
sample_type_enum (str) – Sample Type
- Returns:
Subject
- Return type:
- load_dataset()
Loads the dataset
- load_study_description_file(filepath)
- Takes study description file and builds a 2D dictionary of sections, and key -> values
Lists
ONTOLOGY SOURCE REFERENCE Term Source Name “OBI” “NCBITAXON” “CHMO” “CL” “EFO” “MS” “NCIT” INVESTIGATION Investigation Identifier “MTBLS1073”
study_description_dict[‘ONTOLOGY SOURCE REFERENCE’][‘Term Source Name’] = [“OBI”,”NCBITAXON”,”CHMO”,”CL”,”EFO”,”MS”,”NCIT”] study_description_dict[‘INVESTIGATION’][‘Investigation Identifier’] = “MTBLS1073”
- Parameters:
filepath (str) – The path to the study description file
- map_and_add_dataset_data()
Map and add dataset data
- parse_assay_file(assay_file, data)
Parse an assay file
- Parameters:
assay_file (str) – The name of the assay file
data (
pd.Dataframe) – The assay data
- Raises:
NotImplementedError – Assay platform not implemented
- parse_assays()
Parse the assays
- parse_persons()
Parse the persons
- Returns:
person dict
- Return type:
dict
- parse_protocols()
Parse the protocols
- parse_publications()
Parse the publications
- parse_sample_information()
Parse sample information
- parse_study_description()
Parse the study description
- class phenomedb.imports.ImportMetabolightsXCMSAnnotations(study_id=None, xcms_file=None, assay_name=None, assay_name_order=None, sample_matrix=None, task_run_id=None, username=None, db_env=None, execution_date=None, db_session=None, pipeline_run_id=None)
Import MetabolightsXCMSAnnotations, e.g. to be run after DownloadMetabolightStudy and RunXCMS”
- Parameters:
study_folder_path (str, optional) – Path to the study folder, defaults to None
study_id (int, optional) – ID of the Study, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- add_annotated_features(assay, metabolite_assignment_file, sample_assay, default_unit)
Add Annotations and AnnotatedFeatures
- Parameters:
assay (
phenomedb.models.Assay) – The Assaymetabolite_assignment_file (str) – The maf file
sample_assay (phenomedb.models.SampleAssay) – The SampleAssay
default_unit (
phenomedb.models.Unit) – The default unit to use
- add_annotation_compounds(assay, annotation_file, assay_file)
Add the AnnotationCompounds.
1. Add the Compound 3. Add the HarmonisedAnnotation 4. Add the AnnotationCompound
- Parameters:
assay (phenomedb.models.Assay) – the Assay
annotation_file (str) – The annotation file
assay_file (str) – The name of the assay file
- add_feature_metadata(assay, assay_file)
Add FeatureMetadata objects
- Parameters:
assay (str) – The assay to import
assay_file (str) – The name of the assay file
- Returns:
List of added FeatureMetadatas
- Return type:
list
- generate_feature_jsonb(sample_assay, feature_metadatas)
Build the features jsonb and add to SampleAssayFeatures
- Parameters:
sample_assay (
phenomedb.models.SampleAssay) – Thephenomedb.models.SampleAssayobjectfeature_metadatas (list) – The list of added FeatureMetadatas
- get_or_add_data_repository(name, accession_number=None, submission_date=None, public_release_date=None)
Get or add repository
- Parameters:
name (str) – Name of the repo
accession_number (str, optional) – accession number of repo, defaults to None
submission_date (datetime.datetime, optional) – Date of submission, defaults to None
public_release_date (datetime.datetime, optional) – Date of release, defaults to None
- Returns:
DataRepository
- Return type:
phenomedb.models.DataRepository
- get_or_add_metabolights_compound(assay, annotation_method, the_row)
Get or add Metabolights Compound
- Parameters:
assay (
phenomedb.models.Assay) – Thephenomedb.models.Assayannotation_method (
phenomedb.models.AnnotationMethod) – Thephenomedb.models.AnnotationMethodthe_row (
pd.Series) – The row from the dataframe
- Returns:
AnnotationCompound
- Return type:
phenomedb.models.AnnotationCompound
- get_or_add_metadata_field(field_name, field_value, sample)
Get or add metadata field
- Parameters:
field_name (str) – The name of the metadata field
field_value (str) – The value of the metadata field
sample (
phenomedb.models.Sample) – Sample
- get_or_add_protocol(name=None, type=None, description=None, uri=None, version=None, parameters=None, components=None)
Get or add the protocol
- Parameters:
name (str, optional) – protocol name, defaults to None
type (str, optional) – protocol type, defaults to None
description (str, optional) – description, defaults to None
uri (str, optional) – URI of the protocol, defaults to None
version (str, optional) – version of the protocol, defaults to None
parameters (dict, optional) – parameters of the protocol, defaults to None
components (dict, optional) – components of the protocol, defaults to None
- get_or_add_publication(pubmed_id=None, doi=None, author_list=None, title=None, status=None)
Get or add publication
- Parameters:
pubmed_id (int, optional) – pubmed id, defaults to None
doi (str, optional) – DOI, defaults to None
author_list (list, optional) – List of authors, defaults to None
title (str, optional) – publication title, defaults to None
status (str, optional) – Publication status, defaults to None
- get_or_add_sample(subject, sample_name, sample_type_enum, sample_matrix)
Get or add Sample
- Parameters:
subject (
phenomedb.models.Subject) – Subjectsample_name (str) – Sample name
sample_type_enum (SampleType) – SampleType enum
sample_matrix (sample matrix) – Sample matrix
- Returns:
Sample
- Return type:
- get_or_add_subject(sample_name, sample_type_enum)
Get or add subject
- Parameters:
sample_name (str) – Sample name
sample_type_enum (str) – Sample Type
- Returns:
Subject
- Return type:
- load_dataset()
Loads the dataset
- load_study_description_file(filepath)
- Takes study description file and builds a 2D dictionary of sections, and key -> values
Lists
ONTOLOGY SOURCE REFERENCE Term Source Name “OBI” “NCBITAXON” “CHMO” “CL” “EFO” “MS” “NCIT” INVESTIGATION Investigation Identifier “MTBLS1073”
study_description_dict[‘ONTOLOGY SOURCE REFERENCE’][‘Term Source Name’] = [“OBI”,”NCBITAXON”,”CHMO”,”CL”,”EFO”,”MS”,”NCIT”] study_description_dict[‘INVESTIGATION’][‘Investigation Identifier’] = “MTBLS1073”
- Parameters:
filepath (str) – The path to the study description file
- map_and_add_dataset_data()
Map and add dataset data
- parse_assay_file(assay_file, data)
Parse an assay file
- Parameters:
assay_file (str) – The name of the assay file
data (pandas.Dataframe) – The assay data
- Raises:
NotImplementedError – [description]
- parse_assays()
Parse the assays
- parse_persons()
Parse the persons
- Returns:
person dict
- Return type:
dict
- parse_protocols()
Parse the protocols
- parse_publications()
Parse the publications
- parse_sample_information()
Parse sample information
- parse_study_description()
Parse the study description
- class phenomedb.imports.ImportMetadata(project_name=None, filepath=None, id_column=None, id_type='Sample', columns_to_import=None, username=None, task_run_id=None, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None)
Import Metadata from a CSV file where rows are samples and columns are metadata fields.
- Parameters:
project_name (str, optional) – The name of the Project, defaults to None
filepath (str, optional) – The path to the CSV file, defaults to None
id_column (str, optional) – The column name of the ID field, defaults to None
id_type (str, optional) – Are the IDs for Subject or Sample?, defaults to ‘Sample’
columns_to_import (list, optional) – Which columns to import, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- load_dataset()
Load the dataset
- map_and_add_dataset_data()
Parse the dataset
- Raises:
Exception – Unknown id_type (must be Sample or Subject)
- class phenomedb.imports.ImportNPYC(project_name=None, assay_name=None, sample_matrix=None, dataset_path=None, sop=None, sample_metadata_path=None, sample_metadata_format=None, task_run_id=None, username=None, db_env=None, db_session=None, pipeline_run_id=None, execution_date=None)
_summary_
- Parameters:
dataset_path – The path to the dataset folder, defaults to None.
sample_metadata_path (str, optional) – The path to the sample_metadata file, defaults to None.
sample_metadata_format (str, optional) – The sample_metadata format, defaults to None.
sop (str, optional) – The SOP to use, defaults to None.
assay_name (str, optional) – The name of the assay (ie LC-QQQ Bile Acids), defaults to None.
sample_matrix (str, optional) – The sample matrix (ie urine, plasma), defaults to None.
sop_file_path (str, optional) – The path to the SOP file used, defaults to “”.
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- get_or_add_annotated_features(sample_assay, sample_index)
Get or add annotated features
- Parameters:
sample_assay (
phenomedb.models.SampleAssay) – Thephenomedb.models.SampleAssayobjectsample_index (int) – The sample metadata row index
- get_or_add_metadata(sample, sample_row_index)
Get or add metadata
- Parameters:
sample (
phenomedb.models.Sample) – Thephenomedb.models.Samplesample_row_index (int) – The sample metadata dataset row index
- get_or_add_sample(subject, sample_row_index)
Get or add a new sample
- Parameters:
subject (
phenomedb.models.Subject) – The subject of the samplesample_row_index (int) – The dataset row index the sample
- Returns:
sample object
phenomedb.models.Sample- Return type:
class:phenomedb.models.Sample
- get_or_add_sample_assay(sample, sample_row_index)
Get or add a new sample
- Parameters:
subject (
phenomedb.models.Subject) – Thephenomedb.models.Subjectof thephenomedb.models.Samplesample_row_index (int) – The dataset row index the sample
- Returns:
phenomedb.models.SampleAssayobject- Return type:
class:phenomedb.models.SampleAssay
- get_or_add_subject(sample_row_index)
Get or add a new subject
- Parameters:
index (int) – The row number of the sample_metadata
- Returns:
subject object
phenomedb.models.Subject- Return type:
class:phenomedb.models.subject
- map_and_add_dataset_data()
Map and add the data
- class phenomedb.imports.ImportPeakPantherAnnotations(project_name=None, feature_metadata_csv_path=None, sample_metadata_csv_path=None, intensity_data_csv_path=None, ppr_annotation_parameters_csv_path=None, sample_matrix=None, assay_name=None, roi_version=None, batch_corrected_data_csv_path=None, all_features_feature_metadata_csv_path=None, ppr_mz_csv_path=None, ppr_rt_csv_path=None, is_latest=True, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, run_batch_correction=False)
ImportPeakPantherAnnotations Class. Using the Basic CSV format, imports a peakPantheR Dataset, maps to phenomeDB.models, and commits to DB.
- Parameters:
feature_metadata_csv_path (str, optional) – The path to the feature metadata csv file, defaults to None
sample_metadata_csv_path (str, optional) – The path to the sample metadata csv file, defaults to None
intensity_data_csv_path (str, optional) – The path to the intensity file, defaults to None
ppr_annotation_parameters_csv_path (str, optional) – The path to the PPR annotation parameters file, defaults to None
sample_matrix (str, optional) – The sample_matrix being imported, defaults to None
assay_name (str, optional) – The assay name being imported, defaults to None
roi_version (str, optional) – The version of the ROI file used for annotations, defaults to None
batch_corrected_data_csv_path (str, optional) – The path to the batch corrected intensity data, defaults to None
all_features_feature_metadata_csv_path (str, optional) – The path to the file containing all the searched features, defaults to None
ppr_mz_csv_path (str, optional) – The PPR MZ csv file path, defaults to None
ppr_rt_csv_path (str, optional) – The PPR RT csv file path, defaults to None
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- add_or_update_annotated_feature_not_unified(sample_assay, sample_row_index, feature_index, feature_name)
Add or update a
phenomedb.models.AnnotatedFeaturefrom a 3-file format file.- Parameters:
sample_assay (
phenomedb.models.SampleAssay) – Thephenomedb.models.SampleAssayobjectsample_row_index (int) – The row of the sample metadata dataset
feature_index (int) – The row of the feature metadata dataset
feature_name (str) – The name of the feature
- Raises:
Exception – Feature name not found
- Returns:
The created/found
phenomedb.models.AnnotatedFeature- Return type:
- add_or_update_feature_metadata(annotation_id, feature_metadata_row, feature_name)
Get or update a
phenomedb.models.FeatureMetadataobject- Parameters:
annotation_id (int) – The
phenomedb.models.AnnotationIDfeature_metadata_row (
pd.Series) – The row of the feature metadata datasetfeature_name (str) – The name of the feature
- Returns:
The
phenomedb.models.FeatureMetadataobject- Return type:
- get_or_add_feature_dataset()
Get or add a
phenomedb.models.FeatureDataset
- get_or_add_feature_metadata()
Get or add feature metadata
- get_or_add_unit()
Get or add
phenomedb.models.Unit
- load_dataset()
Loads the PeakPanther dataset, sets the name, and the loads the sampleInfo
- map_and_add_dataset_data()
Map and add the intensity/abundances/annotated_features
- post_commit_actions()
Triggers the post-commit pipelines
- task_validation()
The task validation for the import, checking the number of entries and the values match.
- Raises:
ValidationError – sample metadata file is not the same
ValidationError – feature metadata file is not the same
ValidationError – intensity data file is not the same
ValidationError – FeatureMetadata does not exist
ValidationError – FeatureDataset does not exist
ValidationError – AnnotatedFeature does not exist
ValidationError – SR corrected intensity does not match expected
ValidationError – Annotation does not exist
ValidationError – Expected FeatureDataset.sr_correction_parameters does not exist
ValidationError – FeatureDataset.saved_query_id does not exist
- class phenomedb.imports.ImportSampleManifest(project_name=None, sample_manifest_path=None, columns_to_ignore=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, pipeline_run_id=None)
Import a Sample Manifest XLS file. The format of which is an excel file with 2 sheets, one called ‘samples’ with sample-level metadata, another called ‘subjects’, with subject-level metadata. Both sample-level and subject-level metadata are imported at the sample-level.
- Parameters:
sample_manifest_path (str, optional) – _description_, defaults to None
columns_to_ignore (str, optional) – _description_, defaults to None
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- add_metadata(sample, sample_row_index)
Add the raw metadata to the metadata_raw table
- Parameters:
sample (
phenomedb.models.Sample) – The sampling event of metadata_rawsample_row_index (int) – The dataset row index
- Returns:
- add_metadata_from_sample_worksheet(sample, sample_row_index, metadata_row={})
Adds metadata from the sample worksheet
- Parameters:
sample (
phenomedb.models.Sample) – sample objectphenomedb.models.Samplesample_row_index (int) – The dataset row index
- add_metadata_from_subject_worksheet(sample, sample_row_index, metadata_row={})
Adds metadata fields from the subject worksheet
- Parameters:
sample (
phenomedb.models.Sample) – sample objectphenomedb.models.Samplesample_row_index (int) – The dataset row index
- get_or_add_sample(subject, sample_row_index)
Get or add a sample
- Parameters:
subject (
phenomedb.models.subject) – the subject of the samplesample_row_index (int) – The dataset row index of the sample
- Returns:
sample object
phenomedb.models.Sample- Return type:
phenomedb.models.sample
- get_or_add_subject(subject_name)
Get or add the subject
- Parameters:
subject_name (str) – the
phenomedb.models.Subjectname- Returns:
the matching
phenomedb.models.Subject- Return type:
phenomedb.models.subject
- load_dataset()
Loads the task dataset
- Raises:
Exception – If the file extension is not .xlsx or .xlsx
- map_and_add_dataset_data()
Map the imported nPYc dataset to the phenomeDB models and add to db.
- task_validation()
Task validation, checks every entry exists in the database, and the values match.
- Raises:
ValidationError – MetadataField incorrectly added
ValidationError – MetadataField missing
ValidationError – MetadataValue incorrectly added
ValidationError – MetadataValue missing
- class phenomedb.imports.ImportTargetLynxAnnotations(project_name=None, unified_csv_path=None, sop=None, sop_version=None, assay_name=None, sample_matrix=None, sop_file_path='', is_latest=True, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True)
TargetLynx Task Class. Imports an nPYc TargetLynx Targeted Dataset, maps to phenomeDB.models, and commits to DB.
- Parameters:
unified_csv_path – The path to the unified csv file, defaults to None.
sop (str, optional) – The SOP to use, defaults to None.
sop_version (str, optional) – The version of the SOP used, defaults to None.
assay_name (str, optional) – The name of the assay (ie LC-QQQ Bile Acids), defaults to None.
sample_matrix (str, optional) – The sample matrix (ie urine, plasma), defaults to None.
sop_file_path (str, optional :param project_name: The name of the project, defaults to None) – The path to the SOP file used, defaults to “”.
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- add_or_update_feature_metadata(annotation_id, feature_column_index)
Get FeatureMetadata (column) based on FeatureDataset.id and the Feature Name
- Parameters:
feature_column_index (int) – The column index from the feature file
- Returns:
The phenomedb.models.Annotation object
- Return type:
phenomedb.models.Annotation
- load_dataset()
Loads the task datasets.
- map_and_add_dataset_data()
Map the imported nPYc dataset to the phenomeDB models and add to db.
- task_validation()
The task validation, checks the counts and values of imported data
- Raises:
ValidationError – FeatureDataset does not exist
ValidationError – FeatureDataset.saved_query_id does not exist
ValidationError – SampleAssay does not exist
ValidationError – AnnotatedFeature does not exist
ValidationError – FeatureMetadata does not exist
ValidationError – Annotation does not exist
- class phenomedb.imports.ImportTask(project_name=None, task_run_id=None, username=None, db_env=None, db_session=None, execution_date=None, validate=True, pipeline_run_id=None)
The ImportTask class. Used as the base class for the major import methods. Not used for compounds. Should not be instantiated itself, only from a child class.
- Parameters:
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- add_metadata_field(field_name)
Add a
phenomedb.models.MetadataField- Parameters:
field_name (str) – The
phenomedb.models.MetadataFieldname- Returns:
The added
phenomedb.models.MetadataField- Return type:
- add_metadata_field_and_value(sample_id, field_name, field_value)
Adds and flushes the metadata field and values
- Parameters:
sample_id (int) – The
phenomedb.model.SampleIDfield_name (str) – The name of the metadata field
field_value (str) – The value of the metadata field for the sample
- add_metadata_value(metadata_field, sample_id, field_value)
Add a
phenomedb.models.MetadataValue.- Parameters:
metadata_field (
phenomedb.models.MetadataField) – Thephenomedb.models.MetadataFieldsample_id (int) – The
phenomedb.models.SampleIDfield_value (str) – The value of the metadata field for the sample
- get_annotated_feature(feature_metadata_id, sample_assay_id)
Get a annotated_feature by feature metadata id and sample_assay.id
- Parameters:
feature_metadata_id (int) – The id of the annotation
sample_assay_id (int) – The id of the SampleAssay
- Returns:
The AnnotatedFeature
- Return type:
- get_annotation_by_cpd_name_and_annotation_method(cpd_name)
Get the annotation by cpd_name
- Parameters:
cpd_name (str) – The name of the compound as seen seen in the annotation datasets
- Returns:
The found AnnotationCompound
- Return type:
- get_or_add_sample(subject, sample_row_index)
Get or add a new sample
- Parameters:
subject (
phenomedb.models.subject) – The subject of the samplesample_row_index (in) – The dataset row index the sample
- Returns:
sample object
phenomedb.models.Sample- Return type:
class:phenomedb.models.Sample
- get_or_add_subject(sample_row_index)
Get or add a new subject
- Parameters:
index (int) – The row number of the sample_metadata
- Returns:
subject object
phenomedb.models.subject- Return type:
class:phenomedb.models.subject
- get_or_build_sample_name(sample_row_index, column=None)
Get or build the sample name
- Parameters:
sample_row_index (str, optional) – The row index of the sample metadata
column – The column name to use, default None
- Raises:
Exception – Sample ID not found
- Returns:
The sample name
- Return type:
str
- get_or_build_subject_name(index, sample_metadata_row, sample_type_column_name='SampleType', subject_column_name='Subject ID')
Get or build subject name
- Parameters:
index (int) – The row index
sample_metadata_row (
numpy.Series) – The sample_metadata rowsample_type_column_name (str, optional) – The name of the column with SampleType, defaults to ‘SampleType’
subject_column_name (str, optional) – The name of the column with the Subject ID, defaults to ‘Subject ID’
- Returns:
The subject name
- Return type:
str
- get_project()
Gets a project to the database (by project_name)
- Raises:
Exception – If no project by that name exists
- Returns:
sample_assay object
phenomedb.models.Project- Return type:
class:phenomedb.models.Project
- get_sample_assay(sample, sample_row_index)
Get or add a new sample_assay
- Parameters:
sample (
phenomedb.models.sample) – The sample of the sample_assaysample_row_index (int) – The dataset row index of the sample
- Returns:
sample_assay object
phenomedb.models.SampleAssay- Return type:
class:phenomedb.models.sample_assay
- abstract load_dataset()
Load that dataset. This method should be over-ridden in the task class. This method can call a python package or do a system call.
- abstract map_and_add_dataset_data()
Map the dataset and import to phenomeDB
- parse_value(value)
Parse a raw value to convert the necessary type
- Parameters:
value (object) – The raw value to convert
- Returns:
The parsed, converted value
- Return type:
float
- process()
Main method
- send_user_failure_email(err)
Send a TaskRun failure email to the user
- Parameters:
err (str) – The error message
- send_user_success_email()
Send a TaskRun success email to the user
- Parameters:
err (str) – The error message
- simple_report()
Check the database contains entries for all records in the dataset
- task_validation()
Validate the task - default method
- class phenomedb.imports.ImportXCMSFeatures(project_name=None, unified_csv_path=None, sample_matrix=None, assay_name=None, task_run_id=None, username=None, db_env=None)
XCMSFeatureImportTaskUnifiedCSV Class. Using the Unified CSV format, imports an XCMS Dataset, maps to phenomeDB.models, and commits to DB.
TO DO: Finish the import code
- Parameters:
unified_csv_path (str, optional) – The path to the unified csv file, defaults to None.
assay_name (str, optional) – The name of the assay (ie LC-QQQ Bile Acids), defaults to None.
sample_matrix (str, optional) – The sample matrix (ie urine, plasma), defaults to None.
project_name (str, optional) – The name of the project, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID
- build_feature_dict(sample_row_index, feature_column_index)
Build a dictionary of features
- Parameters:
sample_row_index (int) – The row index of the sample metadata dataset
feature_column_index (int) – The column index of the feature metadata dataset
- Returns:
A dictionary of FeatureMetadata properties
- Return type:
dict
- get_or_add_assay()
Gets or adds an assay to the database (by assay_name)
- load_dataset()
Loads the XCMS dataset, sets the name, and the loads the sampleInfo
- map_and_add_dataset_data()
Map and add the dataset data
- Raises:
Exception – No Sampling ID or Sample File Name in row
- phenomedb.imports.getrandbits(k) x. Generates an int with k random bits.
- phenomedb.imports.random() x in the interval [0, 1).