pytesmo.validation_framework.data_scalers module

Data scaler classes to be used together with the validation framework.

class pytesmo.validation_framework.data_scalers.CDFStoreParamsScaler(path, grid, percentiles=[0, 5, 10, 30, 50, 70, 90, 95, 100], **matcher_kwargs)[source]

Bases: object

CDF scaling using stored parameters if available. If stored parameters are not available they are calculated and written to disk.

Parameters:
  • path (string) – Path where the data is/should be stored

  • grid (pygeogrids.grids.CellGrid instance) – Grid on which the data is stored. Should be the same as the spatial reference grid of the validation framework instance in which this scaler is used.

  • percentiles (list or np.ndarray) – Percentiles to use for CDF matching

  • **matcher_kwargs (keyword arguments) – Passed on to pytesmo.cdf_matching.CDFMatching

calc_parameters(data, reference_index)[source]

Calculate the percentiles used for CDF matching.

Parameters:
  • data (pandas.DataFrame) – temporally matched dataset

  • reference_index (int) – Index of the reference column in the dataset.

Returns:

matchers – Keys are the names of the columns in the input data frame; values are nbins x 3 numpy.ndarrays with the columns x_perc, y_perc, and percentiles.

Return type:

dictionary
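
The return structure described above can be illustrated with a minimal sketch. This is not pytesmo's implementation: the helper name calc_percentile_table is hypothetical, a plain dict of arrays stands in for the pandas.DataFrame, and leaving the reference column out of the result is an assumption.

```python
import numpy as np

# Hypothetical helper, not part of pytesmo: builds one nbins x 3 table per
# non-reference column, with columns x_perc, y_perc, percentiles.
def calc_percentile_table(columns, reference_name,
                          percentiles=(0, 5, 10, 30, 50, 70, 90, 95, 100)):
    perc = np.asarray(percentiles, dtype=float)
    # Percentile values of the reference column (the scaling target).
    y_perc = np.percentile(columns[reference_name], perc)
    tables = {}
    for name, values in columns.items():
        if name == reference_name:
            continue  # assumption: the reference itself gets no entry
        # Percentile values of the column that is to be scaled.
        x_perc = np.percentile(values, perc)
        tables[name] = np.column_stack([x_perc, y_perc, perc])
    return tables

# Synthetic data standing in for a temporally matched dataset.
rng = np.random.default_rng(0)
columns = {
    "ref": rng.normal(0.30, 0.05, 500),
    "other": rng.normal(0.50, 0.10, 500),
}
tables = calc_percentile_table(columns, "ref")
print(tables["other"].shape)
```

Each row of a table pairs a value of the column to be scaled with the reference value at the same percentile, which is the information a CDF matcher needs.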

get_parameters(data, reference_index, gpi)[source]

Get the scaling parameters. Stored parameters are loaded if they exist; otherwise they are calculated and then stored.

Parameters:
  • data (pandas.DataFrame) – temporally matched dataset

  • reference_index (int) – Index of the reference column in the dataset.

  • gpi (int) – grid point index of self.grid

Returns:

params – Keys are the names of the columns in the input data frame; values are numpy.ndarrays with the percentiles.

Return type:

dictionary
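
The load-or-calculate behaviour can be sketched as follows. This is an illustration only: the class ParameterCache is hypothetical, an in-memory dict stands in for the netCDF file on disk, and signalling missing parameters by returning None is an assumption about load_parameters, not documented behaviour.

```python
# Hypothetical stand-in for the caching logic; not pytesmo code.
class ParameterCache:
    def __init__(self, calc_parameters):
        self._calc_parameters = calc_parameters
        self._store = {}       # stands in for the file on disk
        self.n_computed = 0    # counts how often parameters were calculated

    def load_parameters(self, gpi):
        # Assumption: a missing entry is reported as None.
        return self._store.get(gpi)

    def store_parameters(self, gpi, parameters):
        self._store[gpi] = parameters

    def get_parameters(self, data, reference_index, gpi):
        params = self.load_parameters(gpi)
        if params is None:
            # Nothing stored for this grid point yet: calculate and keep it.
            params = self._calc_parameters(data, reference_index)
            self.n_computed += 1
            self.store_parameters(gpi, params)
        return params


# Dummy calculation so that the example runs without real data.
cache = ParameterCache(lambda data, reference_index: {"other": [1.0, 2.0]})
first = cache.get_parameters(None, 0, gpi=42)
second = cache.get_parameters(None, 0, gpi=42)  # served from the store
```

With this pattern the calculation runs once per grid point, and later calls for the same gpi reuse the stored result.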

load_parameters(gpi)[source]

scale(data, reference_index, gpi_info)[source]

Scale all columns in data to the column at the reference_index.

Parameters:
  • data (pandas.DataFrame) – temporally matched dataset

  • reference_index (int) – Which column of the data contains the scaling reference.

  • gpi_info (tuple) – Tuple of at least (gpi, lon, lat), where gpi is the grid point index in the grid of this scaler.

Raises:

ValueError – if scaling is not successful
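
How stored percentile pairs might be applied to a column can be sketched with piecewise linear interpolation. This is a simplified illustration, not the method pytesmo.cdf_matching.CDFMatching uses: the function name is hypothetical, values outside the percentile range are clamped to the end points by np.interp, and the condition under which ValueError is raised is an assumption.

```python
import numpy as np

# Hypothetical helper: maps values of one column into the reference space
# using matching percentile values x_perc (source) and y_perc (reference).
def scale_with_percentiles(values, x_perc, y_perc):
    values = np.asarray(values, dtype=float)
    x_perc = np.asarray(x_perc, dtype=float)
    y_perc = np.asarray(y_perc, dtype=float)
    # np.interp needs increasing x coordinates; repeated percentile values
    # (e.g. from a nearly constant series) make the mapping ambiguous.
    if np.any(np.diff(x_perc) <= 0):
        raise ValueError("source percentiles must be strictly increasing")
    return np.interp(values, x_perc, y_perc)

scaled = scale_with_percentiles([0.5, 1.5], x_perc=[0, 1, 2], y_perc=[10, 20, 40])
```

A value halfway between two source percentiles is mapped halfway between the corresponding reference percentiles.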

store_parameters(gpi, parameters)[source]

Store parameters for gpi into netCDF file.

Parameters:
  • gpi (int) – grid point index of self.grid

  • parameters (dictionary) – Keys are the names of the columns in the input data frame; values are numpy.ndarrays with the percentiles.

class pytesmo.validation_framework.data_scalers.DefaultScaler(method)[source]

Bases: object

Scaling class that implements the scaling based on a given method from the pytesmo.scaling module.

Parameters:

method (string) – The data will be scaled into the reference space using the method specified by this string.

scale(data, reference_index, gpi_info)[source]

Scale all columns in data to the column at the reference_index.

Parameters:
  • data (pandas.DataFrame) – temporally matched dataset

  • reference_index (int) – Which column of the data contains the scaling reference.

  • gpi_info (tuple) – Tuple of at least (gpi, lon, lat), where gpi is the grid point index in the grid of this scaler.

Raises:

ValueError – if scaling is not successful
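
What scaling all columns to a reference column means can be sketched with a mean and standard deviation match. This is an illustration under stated assumptions, not the code of DefaultScaler: both function names are hypothetical, a dict of arrays stands in for the pandas.DataFrame, and raising ValueError for a constant series is only one plausible failure case.

```python
import numpy as np

# Hypothetical helper: shift and stretch src so that its mean and standard
# deviation equal those of ref.
def mean_std_scale(src, ref):
    src = np.asarray(src, dtype=float)
    ref = np.asarray(ref, dtype=float)
    src_std = src.std()
    if src_std == 0:
        raise ValueError("cannot scale a constant series")
    return (src - src.mean()) / src_std * ref.std() + ref.mean()

# Hypothetical helper: scale every column to the one at reference_index.
def scale_columns(columns, reference_index):
    names = list(columns)
    ref_name = names[reference_index]
    ref = np.asarray(columns[ref_name], dtype=float)
    scaled = {}
    for name in names:
        if name == ref_name:
            scaled[name] = ref  # the reference is left unchanged
        else:
            scaled[name] = mean_std_scale(columns[name], ref)
    return scaled

# Synthetic data standing in for a temporally matched dataset.
rng = np.random.default_rng(1)
columns = {
    "ref": rng.normal(0.30, 0.05, 200),
    "other": rng.normal(0.50, 0.10, 200),
}
scaled = scale_columns(columns, reference_index=0)
```

After scaling, the non-reference column has the same mean and standard deviation as the reference while keeping its own temporal pattern.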