phenomedb.analysis

class phenomedb.analysis.AnalysisTask(query_factory=None, saved_query_model='AnnotatedFeature', saved_query_id=None, task_run_id=None, username=None, correction_type=None, exclude_na_metadata_samples=False, exclude_na_metadata_columns=False, output_dir=None, db_env=None, db_session=None, execution_date=None, columns_to_exclude=None, exclude_one_factor_columns=False, columns_to_include=None, class_level=None, class_type=None, only_harmonised_metadata=False, only_metadata=False, scaling=None, transform=None, reload_cache=False, validate=True, aggregate_function=None, harmonise_annotations=False, upstream_task_run_id=None, exclude_samples_with_na_feature_values=False, include_metadata=False, exclude_features_with_na_feature_values=False, include_default_columns=True, include_harmonised_metadata=True, drop_sample_column=False, exclude_features_not_in_all_projects=False, sample_types=None, assay_roles=None, pipeline_run_id=None)

The base analysis task. Extend this class to create your own analysis methods.

Parameters:

query_factory (phenomedb.query_factory.QueryFactory, optional) – QueryFactory, a handle to the phenomedb.query_factory.QueryFactory object that defined the cohort, defaults to None
saved_query_model (str, optional) – The output model of the query, defaults to ‘AnnotatedFeature’
saved_query_id (int, optional) – SavedQuery.id of the query, (typical usage), defaults to None
task_run_id (int, optional) – The TaskRun.id, defaults to None
username (str, optional) – The username of the user running the task, defaults to None
correction_type (str, optional) – The CorrectionType to pass to the Query (e.g. SR, LTR), defaults to None
exclude_na_metadata_samples (bool, optional) – Whether to exclude samples that have na values for their metadata columns, defaults to False
exclude_na_metadata_columns (bool, optional) – Whether to exclude metadata columns that have na values, defaults to False
output_dir (str, optional) – Output directory for function, defaults to None
db_env (str, optional) – Database environment, ‘PROD’, ‘BETA’, ‘TEST’, defaults to None
db_session (object, optional) – Database session, defaults to None
execution_date (DateTime.DateTime, optional) – Datetime of execution, defaults to None
columns_to_exclude (list, optional) – Which columns to exclude, defaults to None
exclude_one_factor_columns (bool, optional) – Exclude columns with only one factor, defaults to False
columns_to_include (list, optional) – Which columns to include, defaults to None
class_level (str, optional) – Query aggregation class level (for Compounds), defaults to None
class_type (str, optional) – Query aggregation class type, defaults to None
only_harmonised_metadata (bool, optional) – Only include harmonised metadata fields, defaults to False
only_metadata (bool, optional) – Only include metadata fields, defaults to False
scaling (str, optional) – Which scaling to use, ‘pa’, ‘uv’, ‘med’, defaults to None
transform (str, optional) – Which transformation to use, ‘log’, ‘sqrt’, defaults to None
reload_cache (bool, optional) – Whether to reload the cache for the Query, defaults to False
validate (bool, optional) – Whether to validate the Task by running the validate() method, defaults to True
aggregate_function (str, optional) – Which Query aggregation function to use (mean, median, sum, avg), defaults to None
harmonise_annotations (bool, optional) – Whether to use harmonised annotations, defaults to False
upstream_task_run_id (int, optional) – The upstream TaskRun.id, defaults to None
exclude_samples_with_na_feature_values (bool, optional) – Exclude samples with na feature values, defaults to False
include_metadata (bool, optional) – Whether to include metadata or not, defaults to False
exclude_features_with_na_feature_values (bool, optional) – Exclude features with na feature values, defaults to False
include_default_columns (bool, optional) – Whether to include default columns, defaults to True
include_harmonised_metadata (bool, optional) – Whether to include harmonised metadata, defaults to True
drop_sample_column (bool, optional) – Drop the sample column, defaults to False
exclude_features_not_in_all_projects (bool, optional) – Exclude features not in all projects, defaults to False
sample_types (list, optional) – SampleTypes to include (StudySample, StudyReference, ExternalReference), defaults to None
assay_roles (list, optional) – AssayRoles to include (Assay, LinearityReference, PrecisionReference), defaults to None
pipeline_run_id (int, optional) – The TaskRun.pipeline_run_id, defaults to None
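
The scaling and transform options name standard preprocessing steps. The sketch below applies the textbook definitions of unit-variance (‘uv’) and Pareto (‘pa’) scaling, and the ‘log’ and ‘sqrt’ transforms, to a single feature column. It is an illustration only, not phenomedb's implementation: the function names scale_column and transform_column are hypothetical, the natural logarithm is an assumption, and ‘med’ is left out because this page does not define it.

```python
import math
import statistics


def scale_column(values, scaling=None):
    """Scale one feature column; scaling=None returns the values unchanged."""
    if scaling is None:
        return list(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    if scaling == "uv":
        divisor = std  # unit variance: centre, then divide by the std
    elif scaling == "pa":
        divisor = math.sqrt(std)  # Pareto: centre, then divide by sqrt(std)
    else:
        raise ValueError(f"unsupported scaling: {scaling!r}")
    # Note: a constant column has std == 0 and would raise ZeroDivisionError.
    return [(v - mean) / divisor for v in values]


def transform_column(values, transform=None):
    """Transform one feature column; transform=None returns it unchanged."""
    if transform is None:
        return list(values)
    if transform == "log":
        return [math.log(v) for v in values]  # natural log (assumption)
    if transform == "sqrt":
        return [math.sqrt(v) for v in values]
    raise ValueError(f"unsupported transform: {transform!r}")


column = [2.0, 4.0, 6.0]
print(scale_column(column, "uv"))
print(scale_column(column, "pa"))
print(transform_column(column, "sqrt"))
```

Pareto scaling divides by the square root of the standard deviation, so it shrinks large-variance features less aggressively than unit-variance scaling does.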

load_data()

Load data method. Takes the query factory or saved_query_id and loads the dataframes

Raises: Exception – If neither a QueryFactory nor a SavedQuery object is provided

process(): Main process method. Runs load_data(), run_analysis(), save_results()

run_analysis(): Runs the analysis. Override this method

save_results(): Save the results into AnalysisResult database table

class phenomedb.analysis.NPYCTask(query_factory=None, saved_query_model='AnnotatedFeature', saved_query_id=None, task_run_id=None, username=None, correction_type=None, exclude_na_metadata_samples=False, exclude_na_metadata_columns=False, output_dir=None, db_env=None, db_session=None, execution_date=None, columns_to_exclude=None, exclude_one_factor_columns=False, columns_to_include=None, class_level=None, class_type=None, only_harmonised_metadata=False, only_metadata=False, scaling=None, transform=None, reload_cache=False, validate=True, aggregate_function=None, harmonise_annotations=False, upstream_task_run_id=None, exclude_samples_with_na_feature_values=False, include_metadata=False, exclude_features_with_na_feature_values=False, include_default_columns=True, include_harmonised_metadata=True, drop_sample_column=False, exclude_features_not_in_all_projects=False, sample_types=None, assay_roles=None, pipeline_run_id=None)

process(): Main process method. Runs load_data(), run_analysis(), save_results()

class phenomedb.analysis.RAnalysisTask(query_factory=None, saved_query_id=None, username=None, task_run_id=None, scaling=None, transform=None, db_env=None, db_session=None, execution_date=None, exclude_na_metadata_samples=False, exclude_na_metadata_columns=False, reload_cache=False, columns_to_include=None, columns_to_exclude=None, exclude_one_factor_columns=True, only_harmonised_metadata=True, only_metadata=False, drop_sample_column=False, class_level=None, aggregate_function=None, class_type=None, saved_query_model='AnnotatedFeature', correction_type=None, harmonise_annotations=False, upstream_task_run_id=None, exclude_samples_with_na_feature_values=False, include_metadata=False, exclude_features_with_na_feature_values=False, include_default_columns=True, include_harmonised_metadata=True, exclude_features_not_in_all_projects=False, sample_types=None, assay_roles=None, pipeline_run_id=None)

run_analysis(): Runs the analysis. Override this method

class phenomedb.analysis.RunMWAS(query_factory=None, saved_query_id=None, username=None, task_run_id=None, comment=None, model_Y_variable=None, model_X_variables=None, reload_cache=False, method='linear', correction_type=None, scaling=None, transform=None, upstream_task_run_id=None, pipeline_run_id=None, include_harmonised_metadata=True, db_env=None, db_session=None, execution_date=None, multiple_correction='BH', features_to_include=None, bootstrap=False, save_models=False, exclude_features_not_in_all_projects=True, harmonise_annotations=True, model_Y_ci=None, model_Y_min=None, model_Y_max=None)

Run an MWAS analysis. Uses the R package MWASTools: https://www.bioconductor.org/packages/release/bioc/html/MWASTools.html

Parameters:

model_Y_variable (str) – The output variable to measure association against, for example h_metadata::Age for the harmonised age
model_X_variables (str) – Covariates for the test of association
method (str, optional) – Which association method to use, one of: linear, logistic, pearson, spearman, kendall, defaults to ‘linear’
multiple_correction (str, optional) – Which multiple-testing correction to use, such as Bonferroni or Benjamini-Hochberg, defaults to ‘BH’
model_Y_ci (float) – A confidence interval used to limit the range of Y values of the samples to include (e.g. 0.95)
model_Y_min (float) – The minimum value of Y to include (e.g. 20)
model_Y_max (float) – The maximum value of Y to include (e.g. 80)
bootstrap (bool) – Whether to run bootstrapping (takes a long time), defaults to False
save_models (bool) – Whether to save the models as well as the summary statistics/coefficients, defaults to False
query_factory (phenomedb.query_factory.QueryFactory, optional) – QueryFactory, a handle to the phenomedb.query_factory.QueryFactory object that defined the cohort, defaults to None
saved_query_model (str, optional) – The output model of the query, defaults to ‘AnnotatedFeature’
saved_query_id (int, optional) – SavedQuery.id of the query, (typical usage), defaults to None
task_run_id (int, optional) – The TaskRun.id, defaults to None
username (str, optional) – The username of the user running the task, defaults to None
correction_type (str, optional) – The CorrectionType to pass to the Query (e.g. SR, LTR), defaults to None
exclude_na_metadata_samples (bool, optional) – Whether to exclude samples that have na values for their metadata columns, defaults to False
exclude_na_metadata_columns (bool, optional) – Whether to exclude metadata columns that have na values, defaults to False
output_dir (str, optional) – Output directory for function, defaults to None
db_env (str, optional) – Database environment, ‘PROD’, ‘BETA’, ‘TEST’, defaults to None
db_session (object, optional) – Database session, defaults to None
execution_date (DateTime.DateTime, optional) – Datetime of execution, defaults to None
columns_to_exclude (list, optional) – Which columns to exclude, defaults to None
exclude_one_factor_columns (bool, optional) – Exclude columns with only one factor, defaults to False
columns_to_include (list, optional) – Which columns to include, defaults to None
class_level (str, optional) – Query aggregation class level (for Compounds), defaults to None
class_type (str, optional) – Query aggregation class type, defaults to None
only_harmonised_metadata (bool, optional) – Only include harmonised metadata fields, defaults to False
only_metadata (bool, optional) – Only include metadata fields, defaults to False
scaling (str, optional) – Which scaling to use, ‘pa’, ‘uv’, ‘med’, defaults to None
transform (str, optional) – Which transformation to use, ‘log’, ‘sqrt’, defaults to None
reload_cache (bool, optional) – Whether to reload the cache for the Query, defaults to False
validate (bool, optional) – Whether to validate the Task by running the validate() method, defaults to True
aggregate_function (str, optional) – Which Query aggregation function to use (mean, median, sum, avg), defaults to None
harmonise_annotations (bool, optional) – Whether to use harmonised annotations, defaults to True
upstream_task_run_id (int, optional) – The upstream TaskRun.id, defaults to None
exclude_samples_with_na_feature_values (bool, optional) – Exclude samples with na feature values, defaults to False
include_metadata (bool, optional) – Whether to include metadata or not, defaults to False
exclude_features_with_na_feature_values (bool, optional) – Exclude features with na feature values, defaults to False
include_default_columns (bool, optional) – Whether to include default columns, defaults to True
include_harmonised_metadata (bool, optional) – Whether to include harmonised metadata, defaults to True
drop_sample_column (bool, optional) – Drop the sample column, defaults to False
exclude_features_not_in_all_projects (bool, optional) – Exclude features not in all projects, defaults to True
sample_types (list, optional) – SampleTypes to include (StudySample, StudyReference, ExternalReference), defaults to None
assay_roles (list, optional) – AssayRoles to include (Assay, LinearityReference, PrecisionReference), defaults to None
pipeline_run_id (int, optional) – The TaskRun.pipeline_run_id, defaults to None
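
The signature default multiple_correction='BH' refers to the Benjamini-Hochberg procedure. The actual correction runs inside the R package, so the following is only a minimal pure-Python sketch of the standard step-up adjustment, included to show what the corrected p-values mean. The function name benjamini_hochberg is hypothetical and is not part of phenomedb or MWASTools.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p={p:.3f} adjusted={q:.3f}")
```

Each adjusted value is the smallest false discovery rate at which that feature would still be called significant, which is why BH is usually less conservative than Bonferroni when many features are tested.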

save_results(): Save the results into HarmonisedAnnotatedFeature database table

class phenomedb.analysis.RunNPYCReport(username=None, task_run_id=None, db_env=None, db_session=None, execution_date=None, saved_query_id=None, correction_type=None, report_name=None, comment=None, samples_to_exclude=None, exclude_on='Run Order', exclusion_comments=None, reload_cache=False, scaling=None, aggregate_function=None, class_level=None, transform=None, class_type=None, saved_query_model='AnnotatedFeature', harmonise_annotations=False, upstream_task_run_id=None, exclude_features_not_in_all_projects=False, sample_types=None, assay_roles=None, pipeline_run_id=None)

Run a nPYc report.

Parameters:

report_name (str) – The nPYc report to run, from https://npyc-toolbox.readthedocs.io/en/latest/reports.html
query_factory (phenomedb.query_factory.QueryFactory, optional) – QueryFactory, a handle to the phenomedb.query_factory.QueryFactory object that defined the cohort, defaults to None
saved_query_model (str, optional) – The output model of the query, defaults to ‘AnnotatedFeature’
saved_query_id (int, optional) – SavedQuery.id of the query, (typical usage), defaults to None
task_run_id (int, optional) – The TaskRun.id, defaults to None
username (str, optional) – The username of the user running the task, defaults to None
correction_type (str, optional) – The CorrectionType to pass to the Query (e.g. SR, LTR), defaults to None
exclude_na_metadata_samples (bool, optional) – Whether to exclude samples that have na values for their metadata columns, defaults to False
exclude_na_metadata_columns (bool, optional) – Whether to exclude metadata columns that have na values, defaults to False
output_dir (str, optional) – Output directory for function, defaults to None
db_env (str, optional) – Database environment, ‘PROD’, ‘BETA’, ‘TEST’, defaults to None
db_session (object, optional) – Database session, defaults to None
execution_date (DateTime.DateTime, optional) – Datetime of execution, defaults to None
columns_to_exclude (list, optional) – Which columns to exclude, defaults to None
exclude_one_factor_columns (bool, optional) – Exclude columns with only one factor, defaults to False
columns_to_include (list, optional) – Which columns to include, defaults to None
class_level (str, optional) – Query aggregation class level (for Compounds), defaults to None
class_type (str, optional) – Query aggregation class type, defaults to None
only_harmonised_metadata (bool, optional) – Only include harmonised metadata fields, defaults to False
only_metadata (bool, optional) – Only include metadata fields, defaults to False
scaling (str, optional) – Which scaling to use, ‘pa’, ‘uv’, ‘med’, defaults to None
transform (str, optional) – Which transformation to use, ‘log’, ‘sqrt’, defaults to None
reload_cache (bool, optional) – Whether to reload the cache for the Query, defaults to False
validate (bool, optional) – Whether to validate the Task by running the validate() method, defaults to True
aggregate_function (str, optional) – Which Query aggregation function to use (mean, median, sum, avg), defaults to None
harmonise_annotations (bool, optional) – Whether to use harmonised annotations, defaults to False
upstream_task_run_id (int, optional) – The upstream TaskRun.id, defaults to None
exclude_samples_with_na_feature_values (bool, optional) – Exclude samples with na feature values, defaults to False
include_metadata (bool, optional) – Whether to include metadata or not, defaults to False
exclude_features_with_na_feature_values (bool, optional) – Exclude features with na feature values, defaults to False
include_default_columns (bool, optional) – Whether to include default columns, defaults to True
include_harmonised_metadata (bool, optional) – Whether to include harmonised metadata, defaults to True
drop_sample_column (bool, optional) – Drop the sample column, defaults to False
exclude_features_not_in_all_projects (bool, optional) – Exclude features not in all projects, defaults to False
sample_types (list, optional) – SampleTypes to include (StudySample, StudyReference, ExternalReference), defaults to None
assay_roles (list, optional) – AssayRoles to include (Assay, LinearityReference, PrecisionReference), defaults to None
pipeline_run_id (int, optional) – The TaskRun.pipeline_run_id, defaults to None

load_data()

Load data method. Takes the query factory or saved_query_id and loads the dataframes

Raises: Exception – If neither a QueryFactory nor a SavedQuery object is provided

run_analysis(): Runs the analysis. Override this method

class phenomedb.analysis.RunPCA(max_components=10, scaling=None, transform=None, minQ2=0.05, username=None, task_run_id=None, db_env=None, harmonise_annotations=True, db_session=None, execution_date=None, query_factory=None, saved_query_id=None, correction_type=None, reload_cache=False, validate=True, saved_query_model='AnnotatedFeature', class_level=None, class_type=None, aggregate_function=None, upstream_task_run_id=None, include_harmonised_metadata=True, exclude_features_not_in_all_projects=True, sample_types=None, assay_roles=None, pipeline_run_id=None)

RunPCA. Run a PCA using the pyChemometrics PCA function.

Scaling is done by ChemometricsScaler() as part of the model, NOT the QueryFactory Scaler

Uses SampleType masks. Masks can be specified if required.

Parameters:

max_components (int, optional) – The maximum number of principal components, defaults to 10
scaling (str, optional) – Which kind of scaling to use, ‘mc’: mean-centred, ‘uv’: unit variance, ‘pa’: Pareto, defaults to ‘uv’
minQ2 (float, optional) – minQ2 for number of PC optimisation, defaults to 0.05
query_factory (phenomedb.query_factory.AnnotatedFeatureFactory, optional) – The AnnotatedFeatureFactory object to load results from, defaults to None
saved_query_id (int, optional) – The ID of the SavedQuery to load results from, defaults to None
annotations_only (bool, optional) – Use only those annotated_features with annotations, defaults to False
upstream_task_run_id (int) – The Upstream Task Run ID
task_run_id (int, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

run_analysis(): Run the PCA analysis using the specified options.

class phenomedb.analysis.RunPCPR2(query_factory=None, saved_query_id=None, username=None, task_run_id=None, pct_threshold=0.95, db_env=None, db_session=None, execution_date=None, exclude_na_metadata_samples=True, exclude_na_metadata_columns=True, columns_to_exclude=None, columns_to_include=None, scaling=None, transform=None, only_harmonised_metadata=True, reload_cache=False, only_metadata=False, include_metadata=False, assay_roles=None, correction_type=None, aggregate_function=None, class_level=None, class_type=None, saved_query_model='AnnotatedFeature', sample_types=None, pipeline_run_id=None, harmonise_annotations=True, upstream_task_run_id=None, include_harmonised_metadata=True, exclude_features_not_in_all_projects=True)

Run a PCPR2 analysis. Uses the R package PCPR2 under the hood: https://github.com/JoeRothwell/pcpr2

Parameters:

pct_threshold (float, optional) – The proportion of total variability that the retained principal components must explain, defaults to 0.95
query_factory (phenomedb.query_factory.QueryFactory, optional) – QueryFactory, a handle to the phenomedb.query_factory.QueryFactory object that defined the cohort, defaults to None
saved_query_model (str, optional) – The output model of the query, defaults to ‘AnnotatedFeature’
saved_query_id (int, optional) – SavedQuery.id of the query, (typical usage), defaults to None
task_run_id (int, optional) – The TaskRun.id, defaults to None
username (str, optional) – The username of the user running the task, defaults to None
correction_type (str, optional) – The CorrectionType to pass to the Query (e.g. SR, LTR), defaults to None
exclude_na_metadata_samples (bool, optional) – Whether to exclude samples that have na values for their metadata columns, defaults to True
exclude_na_metadata_columns (bool, optional) – Whether to exclude metadata columns that have na values, defaults to True
output_dir (str, optional) – Output directory for function, defaults to None
db_env (str, optional) – Database environment, ‘PROD’, ‘BETA’, ‘TEST’, defaults to None
db_session (object, optional) – Database session, defaults to None
execution_date (DateTime.DateTime, optional) – Datetime of execution, defaults to None
columns_to_exclude (list, optional) – Which columns to exclude, defaults to None
exclude_one_factor_columns (bool, optional) – Exclude columns with only one factor, defaults to False
columns_to_include (list, optional) – Which columns to include, defaults to None
class_level (str, optional) – Query aggregation class level (for Compounds), defaults to None
class_type (str, optional) – Query aggregation class type, defaults to None
only_harmonised_metadata (bool, optional) – Only include harmonised metadata fields, defaults to True
only_metadata (bool, optional) – Only include metadata fields, defaults to False
scaling (str, optional) – Which scaling to use, ‘pa’, ‘uv’, ‘med’, defaults to None
transform (str, optional) – Which transformation to use, ‘log’, ‘sqrt’, defaults to None
reload_cache (bool, optional) – Whether to reload the cache for the Query, defaults to False
validate (bool, optional) – Whether to validate the Task by running the validate() method, defaults to True
aggregate_function (str, optional) – Which Query aggregation function to use (mean, median, sum, avg), defaults to None
harmonise_annotations (bool, optional) – Whether to use harmonised annotations, defaults to True
upstream_task_run_id (int, optional) – The upstream TaskRun.id, defaults to None
exclude_samples_with_na_feature_values (bool, optional) – Exclude samples with na feature values, defaults to False
include_metadata (bool, optional) – Whether to include metadata or not, defaults to False
exclude_features_with_na_feature_values (bool, optional) – Exclude features with na feature values, defaults to False
include_default_columns (bool, optional) – Whether to include default columns, defaults to True
include_harmonised_metadata (bool, optional) – Whether to include harmonised metadata, defaults to True
drop_sample_column (bool, optional) – Drop the sample column, defaults to False
exclude_features_not_in_all_projects (bool, optional) – Exclude features not in all projects, defaults to True
sample_types (list, optional) – SampleTypes to include (StudySample, StudyReference, ExternalReference), defaults to None
assay_roles (list, optional) – AssayRoles to include (Assay, LinearityReference, PrecisionReference), defaults to None
pipeline_run_id (int, optional) – The TaskRun.pipeline_run_id, defaults to None
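
In the PC-PR2 method, a variance threshold decides how many principal components enter the regression step. Assuming pct_threshold plays that role here, the sketch below picks the smallest number of components whose eigenvalues account for at least that fraction of the total variance. The function n_components_for_threshold and the example eigenvalues are hypothetical; the real computation happens inside the R package.

```python
def n_components_for_threshold(eigenvalues, pct_threshold=0.95):
    """Smallest number of components explaining >= pct_threshold of variance."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    # Components are taken in order of decreasing explained variance.
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= pct_threshold:
            return count
    return len(ordered)


eigenvalues = [6.0, 3.0, 0.75, 0.25]
print(n_components_for_threshold(eigenvalues))       # 3
print(n_components_for_threshold(eigenvalues, 0.8))  # 2
```

With these values the first two components explain 90% of the variance and the first three explain 97.5%, so the 0.95 threshold selects three.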

class phenomedb.analysis.RunWilcoxonRankTest(saved_query_one=None, saved_query_two=None, username=None, task_run_id=None, reload_cache=False, scaling=None, transform=None, upstream_task_run_id=None, pipeline_run_id=None, include_harmonised_metadata=True, db_env=None, db_session=None, execution_date=None, correction_type=None, exclude_features_not_in_all_projects=True, harmonise_annotations=True)

Run a Wilcoxon Rank Test (not implemented)


class phenomedb.analysis.RunXCMS(username=None, task_run_id=None, db_env=None, db_session=None, execution_date=None, upstream_task_run_id=None, pipeline_run_id=None, chromatography=None, metabolights_study_id=None, lab=None, input_dir=None, sample_matrix=None, centwave_prefilter=None, centwave_peakwidth=None, centwave_mzdiff=None, centwave_snthresh=None, centwave_ppm=None, centwave_noise=None, centwave_mzCenterFun=None, centwave_integrate=None, peakdensity_minFraction=None, peakdensity_minSamples=None, peakdensity_bw=None, peakdensity_binSize=None)

Run XCMS

Parameters:

chromatography (_type_, optional) – _description_, defaults to None
metabolights_study_id (_type_, optional) – _description_, defaults to None
lab (_type_, optional) – _description_, defaults to None
input_dir (_type_, optional) – _description_, defaults to None
sample_matrix (_type_, optional) – _description_, defaults to None
centwave_prefilter (_type_, optional) – _description_, defaults to None
centwave_peakwidth (_type_, optional) – _description_, defaults to None
centwave_mzdiff (_type_, optional) – _description_, defaults to None
centwave_snthresh (_type_, optional) – _description_, defaults to None
centwave_ppm (_type_, optional) – _description_, defaults to None
centwave_noise (_type_, optional) – _description_, defaults to None
centwave_mzCenterFun (_type_, optional) – _description_, defaults to None
centwave_integrate (_type_, optional) – _description_, defaults to None
peakdensity_minFraction (_type_, optional) – _description_, defaults to None
peakdensity_minSamples (_type_, optional) – _description_, defaults to None
peakdensity_bw (_type_, optional) – _description_, defaults to None
peakdensity_binSize (_type_, optional) – _description_, defaults to None
upstream_task_run_id (int) – The Upstream Task Run ID
task_run_id (int, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
validate (boolean) – Whether to run validation, default True
pipeline_run_id (str, optional) – The Pipeline run ID

Raises:

Exception – _description_

load_data()

Load data method. Takes the query factory or saved_query_id and loads the dataframes

Raises: Exception – If neither a QueryFactory nor a SavedQuery object is provided