pytesmo.validation_framework.data_manager module

class pytesmo.validation_framework.data_manager.DataManager(datasets, ref_name, period=None, read_ts_names='read', upscale_parms=None)

Bases: MixinReadTs

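Class to handle data management in the validation framework: it reads and prepares the reference dataset and the other datasets for a given grid point.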

Parameters:
  • datasets (dict of dicts) –

    Keys: string, dataset names.
    Values: dict, containing the following fields:

    ‘class’ : object

    Class containing the method read for reading the data.

    ‘columns’ : list

    List of columns which will be used in the validation process.

    ‘args’ : list, optional

    Args that are passed to the reading function.

    ‘kwargs’ : dict, optional

    Kwargs that are passed to the reading function.

    ‘grids_compatible’ : boolean, optional

    If set to True, the grid point index is used directly when reading the other dataset; if False, lon and lat are used and a nearest neighbour search is necessary. Default: False.

    ‘use_lut’ : boolean, optional

    If set to True, the grid point index obtained from a lookup table (lut) calculated between the reference and the other dataset is used when reading the other dataset; if False, lon and lat are used and a nearest neighbour search is necessary. Default: False.

    ‘max_dist’ : float, optional

    Maximum allowed distance in meters for the lut calculation. Default: None.

  • ref_name (string) – Name of the reference dataset. The reference dataset is used as the spatial reference, i.e. all other datasets will be interpolated to the locations of the reference dataset.

  • period (list, optional) – Of type [datetime start, datetime end]. If given, the input datasets will be truncated to start <= dates <= end.

  • read_ts_names (string or dict of strings, optional) – If a method other than ‘read’ should be used for reading the data, it can be specified here. If it is a dict, specify a method name for each dataset.

  • upscale_parms (dict, optional. Default is None.) –

    Dictionary with parameters for the upscaling methods. Keys:
    • ‘upscaling_method’: method for upscaling

    • ‘temporal_stability’: bool for using temporal stability

    • ‘upscaling_lut’: dict of shape

      {‘other_name’: {ref gpi: [other gpis]}}
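For illustration, a minimal sketch of a `datasets` dict using the fields listed above. The reader class `HypotheticalReader`, the dataset names "ISMN" and "ASCAT", the column names and the `max_dist` value are assumptions made for this example; any object exposing a `read` method (or the method named in `read_ts_names`) is assumed to be usable as ‘class’.

```python
from datetime import datetime

import pandas as pd


class HypotheticalReader:
    """Hypothetical reader: any object with a read method could fill 'class'."""

    def __init__(self, column):
        self.column = column

    def read(self, *args, **kwargs):
        # args is either (gpi,) or (lon, lat); this toy reader ignores them
        # and always returns the same five daily values.
        index = pd.date_range("2020-01-01", periods=5, freq="D")
        values = [0.10, 0.20, 0.30, 0.40, 0.50]
        return pd.DataFrame({self.column: values}, index=index)


datasets = {
    # Reference dataset: its locations serve as the spatial reference.
    "ISMN": {
        "class": HypotheticalReader("soil_moisture"),
        "columns": ["soil_moisture"],
    },
    # Other dataset: read by lon/lat with a nearest neighbour search.
    "ASCAT": {
        "class": HypotheticalReader("sm"),
        "columns": ["sm"],
        "args": [],                 # extra positional args for read
        "kwargs": {},               # extra keyword args for read
        "grids_compatible": False,
        "use_lut": False,
        "max_dist": 35000.0,        # metres; only relevant for the lut
    },
}

# Dates outside start <= date <= end would be dropped.
period = [datetime(2020, 1, 1), datetime(2020, 12, 31)]
```

With pytesmo installed, these objects would then be passed as `DataManager(datasets, "ISMN", period=period)`, following the signature documented above.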

Methods:

use_lut(other_name)

Returns the lut between the reference and the other dataset if use_lut was set to True for the other dataset.

get_result_names()

Return result names based on the names of the reference and other datasets.

read_reference(*args)

Function to read and prepare the reference dataset.

read_other(other_name, *args)

Function to read and prepare the other datasets.

property ds_dict

get_data(gpi, lon, lat)

Get all the data from this manager for a certain combination of grid point index, longitude and latitude.

Parameters:
  • gpi (int) – grid point index

  • lon (float) – grid point longitude

  • lat (float) – grid point latitude

Returns:

df_dict – Dictionary with dataset names as keys and pandas.DataFrames containing the data for the point as values. The dict will be empty if no data is available.

Return type:

dict of pandas.DataFrames
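The return value of get_data can be pictured with the following sketch. It is not pytesmo's source: the function `get_data_sketch` and its callable arguments are hypothetical, and treating both a missing reference and a complete absence of other data as "no data available" is an assumption.

```python
import pandas as pd


def get_data_sketch(read_reference, get_other_data, ref_name, gpi, lon, lat):
    """Collect the reference and other DataFrames for one grid point.

    read_reference(gpi) -> DataFrame or None
    get_other_data(gpi, lon, lat) -> dict of DataFrames (possibly empty)
    """
    ref_df = read_reference(gpi)
    if ref_df is None:
        return {}  # no reference data: nothing to validate here
    others = get_other_data(gpi, lon, lat)
    if not others:
        return {}  # no other data: nothing to compare against
    df_dict = dict(others)
    df_dict[ref_name] = ref_df
    return df_dict


# Toy inputs standing in for real readers.
idx = pd.date_range("2020-01-01", periods=3, freq="D")
ref = pd.DataFrame({"soil_moisture": [0.20, 0.30, 0.25]}, index=idx)
other = pd.DataFrame({"sm": [0.21, 0.28, 0.27]}, index=idx)

result = get_data_sketch(
    lambda gpi: ref,
    lambda gpi, lon, lat: {"ASCAT": other},
    "ISMN", 1, 16.37, 48.21,
)
print(sorted(result))  # ['ASCAT', 'ISMN']
```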

get_luts()

Returns the luts between the reference and the other datasets for which use_lut was set to True.

Returns:

luts – Keys: names of the other datasets. Values: lut between the reference and the other dataset, or None.

Return type:

dict
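A lut maps reference grid points to grid points of another dataset. To illustrate the idea only, the sketch below builds such a mapping with a brute-force nearest neighbour search on a spherical Earth and discards matches farther than `max_dist`. The function names, the dict-based point format and the use of None for unmatched points are assumptions of this example; this is not pytesmo's lut format or implementation.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius=6371000.0):
    """Great-circle distance in metres on a sphere (assumed Earth radius)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def calc_lut_sketch(ref_points, other_points, max_dist=None):
    """Map each reference gpi to the nearest other gpi.

    ref_points / other_points: dict gpi -> (lon, lat).
    Reference points farther than max_dist (metres) from every other
    point are mapped to None.
    """
    lut = {}
    for ref_gpi, (rlon, rlat) in ref_points.items():
        best_gpi, best_dist = None, math.inf
        for other_gpi, (olon, olat) in other_points.items():
            dist = haversine_m(rlon, rlat, olon, olat)
            if dist < best_dist:
                best_gpi, best_dist = other_gpi, dist
        if max_dist is not None and best_dist > max_dist:
            best_gpi = None
        lut[ref_gpi] = best_gpi
    return lut


ref_points = {0: (16.0, 48.0), 1: (0.0, 0.0)}
other_points = {10: (16.1, 48.0), 11: (16.5, 48.5)}
lut = calc_lut_sketch(ref_points, other_points, max_dist=50000.0)
print(lut)  # {0: 10, 1: None}
```

The quadratic search is only practical for small grids; it is chosen here for readability, not speed.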

get_other_data(gpi, lon, lat)

Get all the data for the non-reference datasets from this manager for a certain combination of grid point index, longitude and latitude.

Parameters:
  • gpi (int) – grid point indices

  • lon (float) – grid point longitude

  • lat (float) – grid point latitude

Returns:

other_dataframes – Dictionary with dataset names as keys and pandas.DataFrames containing the data for the point as values. The dict will be empty if no data is available.

Return type:

dict of pandas.DataFrames
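How the ‘grids_compatible’ and ‘use_lut’ options of the datasets dict could decide which arguments reach each reader is sketched below. Both functions are hypothetical; `read` stands for a callable `read(name, *args)` returning a DataFrame or None, and giving ‘grids_compatible’ precedence over ‘use_lut’ is an assumption of this sketch.

```python
def read_args_for_other(config, gpi, lon, lat, lut=None):
    """Pick the positional arguments used to read one other dataset.

    config: that dataset's entry in the `datasets` dict.
    lut: hypothetical mapping {reference gpi: other gpi}, needed only
         when config["use_lut"] is True.
    """
    if config.get("grids_compatible", False):
        return (gpi,)        # same grid: reuse the reference gpi directly
    if config.get("use_lut", False):
        return (lut[gpi],)   # translate the gpi through the lookup table
    return (lon, lat)        # otherwise the reader does a nearest neighbour search


def get_other_data_sketch(datasets, ref_name, read, gpi, lon, lat, luts=None):
    """Read every non-reference dataset; skip those returning None."""
    luts = luts or {}
    other_dataframes = {}
    for name, config in datasets.items():
        if name == ref_name:
            continue
        args = read_args_for_other(config, gpi, lon, lat, luts.get(name))
        data = read(name, *args)
        if data is not None:
            other_dataframes[name] = data
    return other_dataframes
```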

get_results_names(n=2)

read_other(name, *args)

Function to read and prepare non-reference datasets.

Calls the read method of the dataset.

Takes either 1 (gpi) or 2 (lon, lat) arguments.

Parameters:
  • name (string) – Name of the other dataset.

  • gpi (int) – Grid point index

  • lon (float) – Longitude of point

  • lat (float) – Latitude of point

Returns:

data_df – DataFrame containing the data of the dataset.

Return type:

pandas.DataFrame or None
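A minimal sketch of calling a dataset's reader with the ‘args’ and ‘kwargs’ fields and the `read_ts_names` option, returning None when nothing was read. Both helper functions and `EchoReader` are hypothetical, and appending the extra ‘args’ after the location arguments is an assumption, not documented pytesmo behaviour.

```python
def method_name_for(read_ts_names, name):
    """read_ts_names is one method name for all datasets or a dict per dataset."""
    if isinstance(read_ts_names, dict):
        return read_ts_names[name]
    return read_ts_names


def read_dataset_sketch(config, args, method_name="read"):
    """Call the configured read method; return None when nothing was read.

    config: one entry of the `datasets` dict ('class', optional 'args'/'kwargs').
    args: (gpi,) or (lon, lat).
    """
    read_func = getattr(config["class"], method_name)
    # Assumption: location arguments come first, then the extra 'args'.
    all_args = tuple(args) + tuple(config.get("args", []))
    data = read_func(*all_args, **config.get("kwargs", {}))
    if data is None or len(data) == 0:
        return None
    return data


class EchoReader:
    """Toy reader that echoes what it was called with."""

    def read(self, *args, **kwargs):
        return [args, kwargs]


config = {"class": EchoReader(), "args": ["x"], "kwargs": {"k": 1}}
print(read_dataset_sketch(config, (5,)))  # [(5, 'x'), {'k': 1}]
```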

read_reference(*args)

Function to read and prepare the reference dataset.

Calls the read method of the dataset. Takes either 1 (gpi) or 2 (lon, lat) arguments.

Parameters:
  • gpi (int) – Grid point index

  • lon (float) – Longitude of point

  • lat (float) – Latitude of point

Returns:

ref_df – Reference dataframe.

Return type:

pandas.DataFrame or None
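Assuming the preparation step applies the `period` passed to the constructor, the truncation to start <= dates <= end could look like the sketch below. The function `truncate_to_period` is hypothetical; a boolean mask is used so that both bounds are visibly inclusive.

```python
from datetime import datetime

import pandas as pd


def truncate_to_period(df, period):
    """Keep only rows whose timestamp satisfies start <= date <= end."""
    if period is None:
        return df
    start, end = period
    mask = (df.index >= start) & (df.index <= end)
    return df[mask]


idx = pd.date_range("2020-01-01", periods=10, freq="D")
ref_df = pd.DataFrame({"soil_moisture": list(range(10))}, index=idx)
truncated = truncate_to_period(ref_df, [datetime(2020, 1, 3), datetime(2020, 1, 5)])
print(truncated["soil_moisture"].tolist())  # [2, 3, 4]
```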

pytesmo.validation_framework.data_manager.flatten(seq)

pytesmo.validation_framework.data_manager.get_result_combinations(ds_dict, n=2)

Get all possible combinations of dataset columns.

Parameters:
  • ds_dict (dict) – Dict of lists containing the dataset names as keys and a list of the columns to read from the dataset as values.

  • n (int) – Number of datasets to combine with each other. If n=2, two datasets are always combined into one result; if n=3, three datasets are always combined into one result, and so on. n has to be <= the total number of datasets.

Returns:

results_names – Containing all possible combinations of (dataset_x.column, dataset_y.column) for all datasets in ds_dict

Return type:

list of tuples
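The combinations described above can be sketched with `itertools`: choose n distinct datasets, then take every pairing of their columns. `result_combinations_sketch` is a hypothetical stand-in, representing each entry as a (dataset, column) tuple and sorting dataset names for a reproducible order are choices of this sketch, and the dataset and column names are made up.

```python
import itertools


def result_combinations_sketch(ds_dict, n=2):
    """All n-tuples of (dataset, column) pairs drawn from n different datasets."""
    if n > len(ds_dict):
        raise ValueError("n has to be <= the number of datasets")
    results = []
    # Datasets are sorted only to make the output order reproducible.
    for names in itertools.combinations(sorted(ds_dict), n):
        columns_per_ds = [[(name, col) for col in ds_dict[name]] for name in names]
        results.extend(itertools.product(*columns_per_ds))
    return results


ds_dict = {"ref": ["sm"], "a": ["sm", "sm_noise"], "b": ["swvl1"]}
print(len(result_combinations_sketch(ds_dict)))       # 5
print(len(result_combinations_sketch(ds_dict, n=3)))  # 2
```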

pytesmo.validation_framework.data_manager.get_result_names(ds_dict, refkey, n=2)

Return result names for all possible combinations that include the reference dataset.

Parameters:
  • ds_dict (dict) – Dict of lists containing the dataset names as keys and a list of the columns to read from the dataset as values.

  • refkey (string) – dataset name to use as a reference

  • n (int) – Number of datasets to combine with each other. If n=2, two datasets are always combined into one result; if n=3, three datasets are always combined into one result, and so on. n has to be <= the total number of datasets.

Returns:

results_names – Containing all combinations of (referenceDataset.column, otherDataset.column)

Return type:

list of tuples
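Restricting the combinations to those containing the reference can be sketched by pairing every reference column with the columns of n-1 other datasets. `result_names_sketch` is hypothetical; placing the reference column first follows the (referenceDataset.column, otherDataset.column) pattern above, while the (dataset, column) tuple layout and the sorted order are assumptions.

```python
import itertools


def result_names_sketch(ds_dict, refkey, n=2):
    """n-tuples of (dataset, column) pairs that always include a reference column."""
    others = sorted(name for name in ds_dict if name != refkey)
    if n - 1 > len(others):
        raise ValueError("n has to be <= the number of datasets")
    ref_cols = [(refkey, col) for col in ds_dict[refkey]]
    results = []
    for names in itertools.combinations(others, n - 1):
        other_cols = [[(name, col) for col in ds_dict[name]] for name in names]
        # The reference column always comes first in each tuple.
        results.extend(itertools.product(ref_cols, *other_cols))
    return results


ds_dict = {"ref": ["sm"], "a": ["sm", "sm_noise"], "b": ["swvl1"]}
for names in result_names_sketch(ds_dict, "ref"):
    print(names)
```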