phenomedb.cache

class phenomedb.cache.Cache

The Cache object is an abstracted interface to the Redis cache and the file cache.

Items in the cache are stored in Redis for 24 hours, and on disk for 30 days.

This reduces the memory footprint while keeping the performance benefit of the cache (i.e. not having to reload from the database).

It provides methods to get, set, and expire objects.
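The two retention windows can be made concrete with a small sketch. This is not phenomedb code: `REDIS_TTL_SECONDS`, `FILE_TTL_SECONDS`, and `tier_for_age` are hypothetical names, and the sketch assumes both windows are measured from the moment the item was set.

```python
# Illustrative only: these names are hypothetical, not part of phenomedb.cache.
REDIS_TTL_SECONDS = 24 * 60 * 60        # 24 hours in Redis
FILE_TTL_SECONDS = 30 * 24 * 60 * 60    # 30 days on disk


def tier_for_age(age_seconds: float) -> str:
    """Return which tier would still hold an item of the given age."""
    if age_seconds < REDIS_TTL_SECONDS:
        return "redis"      # still in memory, fastest path
    if age_seconds < FILE_TTL_SECONDS:
        return "file"       # dropped from Redis, still on disk
    return "expired"        # gone from both tiers
```

Under these assumptions an item spends most of its lifetime on disk only, which is where the memory saving comes from.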

delete(key)

Delete a key from the cache

Parameters:

key (str) – The key of the item

delete_keys_by_regex(regex)

Delete any key that matches the regex

Parameters:

regex (str) – The regex to match on
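As a rough illustration of regex-based deletion, the sketch below applies a pattern to the keys of a plain dict. The function name `delete_keys_by_regex_sketch` is hypothetical, the dict stands in for the real cache, and the use of `re.search` (match anywhere in the key) is an assumption; the documentation above does not say whether the real method anchors the pattern.

```python
import re


def delete_keys_by_regex_sketch(store: dict, regex: str) -> list:
    """Delete every key in `store` that the pattern matches; return the deleted keys."""
    pattern = re.compile(regex)
    # Collect matches first so the dict is not mutated while iterating over it.
    matched = [key for key in store if pattern.search(key)]
    for key in matched:
        del store[key]
    return matched
```

With `re.search`, a pattern such as `TEST` would remove any key containing that substring, which is the behaviour described for `delete_test_keys()`.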

delete_test_keys()

Delete any key with TEST in the name

exists(key)

Check whether the key exists in the cache

Parameters:

key (str) – The key of the item to check

Returns:

Whether the key exists in the cache

Return type:

bool

flushall(include_task_cache=False)

Flush/delete all the data

Parameters:

include_task_cache (bool, optional) – Whether to flush the task cache, defaults to False

generate_file_cache_list()

Generate the file cache list and store in redis for quick reference

get(key)

Get an object from the cache. Checks Redis first, then the FileCache

Parameters:

key (str) – The key of the item to retrieve

Returns:

The object to return

Return type:

object
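The lookup order described for `get` (Redis first, then the file cache) can be sketched with stand-ins. Everything here is hypothetical: a dict plays the role of Redis, pickle files in a temporary directory play the role of the file cache, and returning `None` on a miss is an assumption rather than documented behaviour.

```python
import os
import pickle
import tempfile


class TwoTierLookupSketch:
    """Hypothetical stand-in: a dict plays Redis, a directory plays the file cache."""

    def __init__(self, directory: str):
        self.memory = {}
        self.directory = directory

    def _path(self, key: str) -> str:
        return os.path.join(self.directory, key + ".pkl")

    def set(self, key, value):
        # Write to both tiers so the value survives the fast tier expiring.
        self.memory[key] = value
        with open(self._path(key), "wb") as handle:
            pickle.dump(value, handle)

    def get(self, key):
        # 1. Check the fast in-memory tier first.
        if key in self.memory:
            return self.memory[key]
        # 2. Fall back to the file tier.
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as handle:
                return pickle.load(handle)
        # 3. Miss in both tiers (assumed to yield None).
        return None


with tempfile.TemporaryDirectory() as directory:
    cache = TwoTierLookupSketch(directory)
    cache.set("example", {"rows": 3})
    cache.memory.clear()               # simulate the Redis entry expiring
    recovered = cache.get("example")   # served from the file tier
```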

get_cache_keys_dataframe(include_task_cache=False, include_analysis_view_cache=False)

Get a dataframe of the keys in the cache (used to store a persistent record on disk)

Parameters:
  • include_task_cache (bool, optional) – Whether to include the task_cache, defaults to False

  • include_analysis_view_cache (bool, optional) – Whether to include the analysis_view_cache, defaults to False

Returns:

a dataframe of the keys

Return type:

pandas.DataFrame

get_keys_dict(include_task_cache=False, include_analysis_view_cache=False)

Builds a dictionary of the keys in the cache

Parameters:
  • include_task_cache (bool, optional) – Whether to include the task cache, defaults to False

  • include_analysis_view_cache (bool, optional) – Whether to include the analysis_view_cache, defaults to False

Returns:

a dictionary of the keys in the cache

Return type:

dict

key_filename(key)

Get the filename for the key

Parameters:

key (str) – The key of the item.

Returns:

The filename for the key

Return type:

str

load_cache_from_file(key)

Load the cache from the file

Parameters:

key (str) – The key of the item

Returns:

The object to return

Return type:

object

load_file_cache_list()

Loads the file cache list from redis

set(key, value, ex=None)

Set an object in the cache.

Parameters:
  • key (str) – The key of the item to set.

  • value (object) – The item to set
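The `ex` argument is not described above. In redis-py, `set(name, value, ex=...)` takes an expiry in seconds, so one plausible reading is that `ex` overrides the default 24 hour window. The sketch below illustrates that reading only; `ExpiringStoreSketch` and `DEFAULT_EX_SECONDS` are hypothetical names, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time

DEFAULT_EX_SECONDS = 24 * 60 * 60  # assumed default, taken from the 24 hour window


class ExpiringStoreSketch:
    """Hypothetical in-memory store with per-key expiry, not phenomedb code."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def set(self, key, value, ex=None):
        # ex=None falls back to the assumed default lifetime.
        ttl = DEFAULT_EX_SECONDS if ex is None else ex
        self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Lazily drop entries whose lifetime has passed.
            del self._items[key]
            return None
        return value
```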

class phenomedb.cache.CreateSavedQueryDataframeCache(username=None, task_run_id=None, saved_query_id=None, class_level=None, class_type=None, output_model='AnnotatedFeature', master_unit=None, correction_type=None, db_env=None, db_session=None, execution_date=None, reload_cache=True, pipeline_run_id=None, upstream_task_run_id=None)

Task to create a SavedQuery dataframe cache. Takes a SavedQuery and generates the cache for its dataframe.

Parameters:
  • saved_query_id (int, optional) – The ID of the SavedQuery, defaults to None

  • master_unit (str, optional) – The master unit to harmonise units against, documented default ‘mmol/L’ (the signature default is None)

  • class_level (str, optional) – Query aggregation class level (for Compounds), defaults to None

  • class_type (str, optional) – Query aggregation class type, defaults to None

  • output_model (str, optional) – The output model of the query, defaults to ‘AnnotatedFeature’

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • pipeline_run_id (str, optional) – The Pipeline run ID

process()

Process method. Loads the SavedQuery and QueryFactory, then generates the dataframe cache.

class phenomedb.cache.CreateSavedQuerySummaryStatsCache(username=None, task_run_id=None, saved_query_id=None, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None, upstream_task_run_id=None)

Task to create a SavedQuery summary stats cache. Takes a SavedQuery and generates the cache for its summary stats.

Parameters:
  • saved_query_id (int, optional) – The ID of the SavedQuery, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • pipeline_run_id (str, optional) – The Pipeline run ID

process()

Process method. Loads the summary statistics and saves them in the Cache.

class phenomedb.cache.CreateTaskViewCache(username=None, caching_task_run_id=None, task_run_id=None, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None, upstream_task_run_id=None)

process()

Process method

class phenomedb.cache.MoveTaskOutputToCache(username=None, task_run_id=None, highest_finished=None, update_db=False, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None, upstream_task_run_id=None)

Move the task output to the cache. This task was created to move phenomedb.models.TaskRun output into the cache, freeing up database space and simplifying data restores.

Raises:

Exception

class phenomedb.cache.RemoveUntransformedDataFromCache(username=None, task_run_id=None, lowest_finished=None, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None, upstream_task_run_id=None)

Goes through the entire task cache and removes untransformed data from the output cache, which was causing bloat.

Parameters:
  • lowest_finished (int, optional) – The lowest phenomedb.models.TaskRun ID to start from, defaults to None

  • task_run_id (float, optional) – The TaskRun ID

  • username (str, optional) – The username of the user running the job, defaults to None

  • db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’

  • db_session (object, optional) – The db_session to use

  • execution_date (str, optional) – The date of execution, str format.

  • pipeline_run_id (str, optional) – The Pipeline run ID