phenomedb.cache
- class phenomedb.cache.Cache
The Cache object is an abstracted interface to the Redis and file caches.
Items in the cache are stored in Redis for 24 hours and on disk for 30 days.
This reduces the memory footprint without losing the performance benefit of the cache (i.e. not having to load from the database).
Provides methods to get, set, and expire objects.
- delete(key)
Delete a key from the cache
- Parameters:
key (str) – The key of the item
- delete_keys_by_regex(regex)
Delete any key that matches the regex
- Parameters:
regex (str) – The regex to match on
- delete_test_keys()
Delete any key with TEST in the name
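A minimal sketch of how regex-based key deletion and the TEST-key cleanup could work together. It assumes a plain in-memory dict as a stand-in for the Redis keyspace, `re.search` matching semantics, and hypothetical key names. The real Cache works against Redis and the file cache, which this sketch does not model.

```python
import re

# Hypothetical in-memory stand-in for the Redis keyspace; the key names are
# illustrative only and the file cache is not modelled here.
store = {
    "SavedQuery::1::dataframe": b"...",
    "SavedQuery::2::dataframe": b"...",
    "TEST_SavedQuery::3::dataframe": b"...",
    "TaskRunOutput::17": b"...",
}


def delete_keys_by_regex(store: dict, regex: str) -> list:
    """Delete every key that matches `regex` (re.search semantics); return the deleted keys."""
    pattern = re.compile(regex)
    # Collect matches first so the dict is not mutated while iterating over it.
    matched = [key for key in store if pattern.search(key)]
    for key in matched:
        del store[key]
    return matched


def delete_test_keys(store: dict) -> list:
    """Delete any key containing 'TEST', reusing the regex deletion."""
    return delete_keys_by_regex(store, "TEST")


test_deleted = delete_test_keys(store)
query_deleted = delete_keys_by_regex(store, r"^SavedQuery::\d+::dataframe$")
print(test_deleted)   # ['TEST_SavedQuery::3::dataframe']
print(query_deleted)  # ['SavedQuery::1::dataframe', 'SavedQuery::2::dataframe']
print(list(store))    # ['TaskRunOutput::17']
```

Expressing `delete_test_keys` as a special case of the regex deletion keeps one code path for matching. Note that the match is case-sensitive, so a key containing "Test" would not be removed.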
- exists(key)
Check whether the key exists in the cache
- Parameters:
key (str) – The key of the item to check
- Returns:
Whether the key exists in the cache
- Return type:
bool
- flushall(include_task_cache=False)
Flush (delete) all data in the cache
- Parameters:
include_task_cache (bool, optional) – Whether to flush the task cache, defaults to False
- generate_file_cache_list()
Generate the file cache list and store it in Redis for quick reference
- get(key)
Get an object from the cache. Checks Redis first, then the FileCache
- Parameters:
key (str) – The key of the item to retrieve
- Returns:
The object stored under the key
- Return type:
object
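The two-tier behaviour described for the Cache class (Redis for 24 hours, disk for 30 days, with `get` checking Redis first and then the file cache) can be sketched as a read-through cache. This is a minimal illustration under stated assumptions, not the phenomedb implementation. A dict with expiry timestamps stands in for Redis, pickle files in a temporary directory stand in for the file cache, the hashed filename scheme is hypothetical, and returning `None` on a miss in both tiers is assumed.

```python
import hashlib
import os
import pickle
import tempfile
import time

REDIS_TTL = 24 * 60 * 60       # fast tier: 24 hours
FILE_TTL = 30 * 24 * 60 * 60   # file tier: 30 days


class TwoTierCache:
    """Read-through sketch: a dict with expiry times stands in for Redis,
    pickle files in `cache_dir` stand in for the file cache."""

    def __init__(self, cache_dir, clock=time.time):
        self.cache_dir = cache_dir
        self.clock = clock
        self.fast = {}  # key -> (expires_at, value)

    def _path(self, key):
        # Hypothetical key-to-filename mapping, used only in this sketch.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, name + ".pkl")

    def set(self, key, value):
        now = self.clock()
        self.fast[key] = (now + REDIS_TTL, value)
        with open(self._path(key), "wb") as f:
            pickle.dump((now + FILE_TTL, value), f)

    def get(self, key):
        now = self.clock()
        entry = self.fast.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fast-tier hit
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                expires_at, value = pickle.load(f)
            if expires_at > now:
                # Fast tier missed or expired: promote the file copy back into it.
                self.fast[key] = (now + REDIS_TTL, value)
                return value
        return None  # miss in both tiers (assumed behaviour)


fake_now = [0.0]
with tempfile.TemporaryDirectory() as cache_dir:
    cache = TwoTierCache(cache_dir, clock=lambda: fake_now[0])
    cache.set("SavedQuery::1", {"rows": 3})
    fake_now[0] = 2 * 24 * 60 * 60   # day 2: fast-tier entry has expired
    after_two_days = cache.get("SavedQuery::1")
    fake_now[0] = 31 * 24 * 60 * 60  # day 31: file entry has expired too
    after_31_days = cache.get("SavedQuery::1")

print(after_two_days)  # {'rows': 3}
print(after_31_days)   # None
```

Injecting the clock makes the two expiry windows testable without waiting. Promoting a disk hit back into the fast tier is what preserves most of the performance benefit while keeping only recently used items in memory.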
- get_cache_keys_dataframe(include_task_cache=False, include_analysis_view_cache=False)
Get a dataframe of the keys in the cache (used to store a persistent record on disk)
- Parameters:
include_task_cache (bool, optional) – Whether to include the task_cache, defaults to False
include_analysis_view_cache (bool, optional) – Whether to include the analysis_view_cache, defaults to False
- Returns:
a dataframe of the keys
- Return type:
pandas.DataFrame
- get_keys_dict(include_task_cache=False, include_analysis_view_cache=False)
Builds a dictionary of the keys in the cache
- Parameters:
include_task_cache (bool, optional) – Whether to include the task cache, defaults to False
include_analysis_view_cache (bool, optional) – Whether to include the analysis_view_cache, defaults to False
- Returns:
a dictionary of the keys in the cache
- Return type:
dict
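The `include_task_cache` and `include_analysis_view_cache` flags (also accepted by `get_cache_keys_dataframe`, with `include_task_cache` accepted by `flushall`) suggest the keys fall into categories that are excluded by default. The sketch below shows one way such filtering could work. The key prefixes and the shape of the returned dict are hypothetical, since the naming scheme is not documented here.

```python
# Hypothetical key prefixes: the real naming scheme is not documented here,
# so these are placeholders for illustration.
TASK_CACHE_PREFIX = "TaskRunOutput"
ANALYSIS_VIEW_PREFIX = "AnalysisView"


def get_keys_dict(keys, include_task_cache=False, include_analysis_view_cache=False):
    """Group cache keys by category, skipping the task and analysis-view caches unless asked."""
    result = {"task_cache": [], "analysis_view_cache": [], "other": []}
    for key in keys:
        if key.startswith(TASK_CACHE_PREFIX):
            if include_task_cache:
                result["task_cache"].append(key)
        elif key.startswith(ANALYSIS_VIEW_PREFIX):
            if include_analysis_view_cache:
                result["analysis_view_cache"].append(key)
        else:
            result["other"].append(key)
    return result


keys = ["SavedQuery::1", "TaskRunOutput::17", "AnalysisView::4"]
default_view = get_keys_dict(keys)
with_tasks = get_keys_dict(keys, include_task_cache=True)
print(default_view)  # {'task_cache': [], 'analysis_view_cache': [], 'other': ['SavedQuery::1']}
print(with_tasks["task_cache"])  # ['TaskRunOutput::17']
```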
- key_filename(key)
Get the filename for the key
- Parameters:
key (str) – The key of the item.
- Returns:
The filename for the key
- Return type:
str
- load_cache_from_file(key)
Load the cached object for the key from its file on disk
- Parameters:
key (str) – The key of the item
- Returns:
The object loaded from the file
- Return type:
object
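A minimal sketch of how `key_filename` and `load_cache_from_file` could fit together, assuming objects are pickled to one file per key. The SHA-256 filename scheme and the `cache_dir` argument are hypothetical choices for illustration. Hashing keeps arbitrary key characters (such as `:`) out of filenames. Returning `None` for a missing file is also an assumption.

```python
import hashlib
import os
import pickle
import tempfile


def key_filename(cache_dir: str, key: str) -> str:
    # Hypothetical mapping: hash the key so any characters are filesystem-safe.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".pkl")


def load_cache_from_file(cache_dir: str, key: str):
    """Return the unpickled object for `key`, or None if no file exists."""
    path = key_filename(cache_dir, key)
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as d:
    with open(key_filename(d, "SavedQuery::1"), "wb") as f:
        pickle.dump([1, 2, 3], f)
    loaded = load_cache_from_file(d, "SavedQuery::1")
    missing = load_cache_from_file(d, "SavedQuery::2")

print(loaded, missing)  # [1, 2, 3] None
```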
- load_file_cache_list()
Loads the file cache list from Redis
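`generate_file_cache_list` and `load_file_cache_list` pair up: one scans the on-disk cache and stores a listing in Redis, and the other reads that listing back without touching the filesystem. The sketch below assumes a dict as a stand-in for Redis, a hypothetical `file_cache_list` key, and JSON serialisation of a sorted filename list. None of these details are confirmed by the source.

```python
import json
import os
import tempfile

FILE_CACHE_LIST_KEY = "file_cache_list"  # hypothetical key name


def generate_file_cache_list(cache_dir: str, redis_like: dict) -> None:
    """Scan the cache directory and store the sorted filenames as JSON."""
    names = sorted(
        name for name in os.listdir(cache_dir)
        if os.path.isfile(os.path.join(cache_dir, name))
    )
    redis_like[FILE_CACHE_LIST_KEY] = json.dumps(names)


def load_file_cache_list(redis_like: dict) -> list:
    """Return the stored file list, or an empty list if it has not been generated."""
    raw = redis_like.get(FILE_CACHE_LIST_KEY)
    return json.loads(raw) if raw is not None else []


store = {}
with tempfile.TemporaryDirectory() as d:
    for name in ("b.pkl", "a.pkl"):
        open(os.path.join(d, name), "wb").close()  # create empty placeholder files
    generate_file_cache_list(d, store)

print(load_file_cache_list(store))  # ['a.pkl', 'b.pkl']
```

Storing the listing in the fast store means a lookup such as "is this key on disk?" avoids a directory scan. The trade-off is that the listing can go stale until it is regenerated.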
- set(key, value, ex=None)
Set an object in the cache.
- Parameters:
key (str) – The key of the item to set.
value (object) – The item to set
- class phenomedb.cache.CreateSavedQueryDataframeCache(username=None, task_run_id=None, saved_query_id=None, class_level=None, class_type=None, output_model='AnnotatedFeature', master_unit=None, correction_type=None, db_env=None, db_session=None, execution_date=None, reload_cache=True, pipeline_run_id=None, upstream_task_run_id=None)
Task to create a SavedQuery dataframe cache. Takes a SavedQuery and generates the cache for its dataframe.
- Parameters:
saved_query_id (int, optional) – The ID of the SavedQuery, defaults to None
master_unit (str, optional) – The master unit to harmonise units against, defaults to ‘mmol/L’
class_level (str, optional) – Query aggregation class level (for Compounds), defaults to None
class_type (str, optional) – Query aggregation class type, defaults to None
output_model (str, optional) – The output model of the query, defaults to ‘AnnotatedFeature’
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
pipeline_run_id (str, optional) – The Pipeline run ID
- process()
Process method. Loads the SavedQuery and its QueryFactory, then generates the dataframe cache.
- class phenomedb.cache.CreateSavedQuerySummaryStatsCache(username=None, task_run_id=None, saved_query_id=None, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None, upstream_task_run_id=None)
Task to create a SavedQuery summary stats cache. Takes a SavedQuery and generates the cache for its summary statistics.
- Parameters:
saved_query_id (int, optional) – The ID of the SavedQuery, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
pipeline_run_id (str, optional) – The Pipeline run ID
- process()
Process method. Loads the summary statistics and saves them in the Cache.
- class phenomedb.cache.CreateTaskViewCache(username=None, caching_task_run_id=None, task_run_id=None, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None, upstream_task_run_id=None)
- process()
Process method
- class phenomedb.cache.MoveTaskOutputToCache(username=None, task_run_id=None, highest_finished=None, update_db=False, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None, upstream_task_run_id=None)
Move the task output to the cache. This was created to move
phenomedb.models.TaskRun output to the cache, to free up database space and simplify data restore.
- Raises:
Exception
- class phenomedb.cache.RemoveUntransformedDataFromCache(username=None, task_run_id=None, lowest_finished=None, db_env=None, db_session=None, execution_date=None, pipeline_run_id=None, upstream_task_run_id=None)
Goes through the whole task cache and removes the untransformed data from the output cache, where it was causing bloat.
- Parameters:
lowest_finished (int, optional) – The lowest phenomedb.models.TaskRun ID to start from, defaults to None
task_run_id (float, optional) – The TaskRun ID
username (str, optional) – The username of the user running the job, defaults to None
db_env (str, optional) – The db_env to use, ‘PROD’ or ‘TEST’, default ‘PROD’
db_session (object, optional) – The db_session to use
execution_date (str, optional) – The date of execution, str format.
pipeline_run_id (str, optional) – The Pipeline run ID