pytesmo.validation_framework.data_scalers module

Data scaler classes to be used together with the validation framework.

class pytesmo.validation_framework.data_scalers.CDFStoreParamsScaler(path, grid, percentiles=[0, 5, 10, 30, 50, 70, 90, 95, 100], **matcher_kwargs)

Bases: object

CDF scaling using stored parameters if available. If stored parameters are not available they are calculated and written to disk.

Parameters:

path (string) – Path where the data is/should be stored
grid (pygeogrids.grids.CellGrid instance) – Grid on which the data is stored. Should be the same as the spatial reference grid of the validation framework instance in which this scaler is used.
percentiles (list or np.ndarray) – Percentiles to use for CDF matching
**matcher_kwargs (keyword arguments) – Passed on to pytesmo.cdf_matching.CDFMatching

calc_parameters(data, reference_index)

Calculate the percentiles used for CDF matching.

Parameters:

data (pandas.DataFrame) – temporally matched dataset
reference_index (int) – Index of the reference column in the dataset.

Returns:

matchers – Dictionary mapping the names of the columns in the input data frame to nbins x 3 numpy.ndarrays with the columns x_perc, y_perc, and percentiles.

Return type:

dictionary
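The structure of the returned dictionary can be illustrated with a minimal, standard-library-only sketch. This is not the pytesmo implementation: plain lists stand in for DataFrame columns and numpy arrays, and the helper names `percentile` and `calc_percentile_table` are hypothetical. It also assumes that x_perc holds the percentiles of the column being scaled, that y_perc holds the percentiles of the reference column, and that the reference column itself gets no entry.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sequence."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take a percentile of an empty sequence")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def calc_percentile_table(columns, reference_index, percentiles):
    """Build {column name: rows of (x_perc, y_perc, percentile)}.

    `columns` is a dict of lists standing in for a temporally matched
    DataFrame. Assumption: x_perc comes from the column to be scaled and
    y_perc from the reference column.
    """
    names = list(columns)
    ref_name = names[reference_index]
    ref_perc = [percentile(columns[ref_name], q) for q in percentiles]
    table = {}
    for name in names:
        if name == ref_name:
            continue  # assumption: the reference is not matched to itself
        src_perc = [percentile(columns[name], q) for q in percentiles]
        table[name] = list(zip(src_perc, ref_perc, percentiles))
    return table


columns = {"ref": [0, 1, 2, 3, 4], "sat": [0, 10, 20, 30, 40]}
table = calc_percentile_table(columns, reference_index=0, percentiles=[0, 50, 100])
```

Each value in `table` has one row per percentile and three entries per row, mirroring the nbins x 3 layout described above.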

get_parameters(data, reference_index, gpi)

Get the scaling parameters. The method first tries to load stored parameters; if none are found, they are calculated and then stored.

Parameters:

data (pandas.DataFrame) – temporally matched dataset
reference_index (int) – Index of the reference column in the dataset.
gpi (int) – grid point index of self.grid

Returns:

params – Dictionary mapping the names of the columns in the input data frame to numpy.ndarrays with the percentiles.

Return type:

dictionary
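The load-or-calculate-and-store behaviour can be sketched as follows. This is only an illustration of the pattern, not the pytesmo code: the class `CachedParams` and its method names are hypothetical, and one JSON file per grid point stands in for the netCDF storage that the real scaler writes.

```python
import json
import os
import tempfile


class CachedParams:
    """Hypothetical stand-in for the parameter cache of a scaler."""

    def __init__(self, path, calc):
        self.path = path  # directory where parameters are kept
        self.calc = calc  # callable(data, reference_index) -> dict

    def _filename(self, gpi):
        return os.path.join(self.path, f"params_{gpi}.json")

    def load(self, gpi):
        """Return stored parameters for gpi, or None if nothing is stored."""
        try:
            with open(self._filename(gpi)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def store(self, gpi, parameters):
        os.makedirs(self.path, exist_ok=True)
        with open(self._filename(gpi), "w") as f:
            json.dump(parameters, f)

    def get(self, data, reference_index, gpi):
        """Load parameters if present, otherwise calculate and store them."""
        params = self.load(gpi)
        if params is None:
            params = self.calc(data, reference_index)
            self.store(gpi, params)
        return params


calls = []  # records how often the expensive calculation runs


def fake_calc(data, reference_index):
    calls.append(reference_index)
    return {"sat": [0.0, 1.0]}


with tempfile.TemporaryDirectory() as tmp:
    cache = CachedParams(tmp, fake_calc)
    first = cache.get({}, 0, gpi=42)   # not stored yet: calculated and written
    second = cache.get({}, 0, gpi=42)  # found on disk: loaded, not recalculated
```

The second call returns the same parameters without invoking the calculation again, which is the point of keeping them on disk.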

load_parameters(gpi)

scale(data, reference_index, gpi_info)

Scale all columns in data to the column at the reference_index.

Parameters:

data (pandas.DataFrame) – temporally matched dataset
reference_index (int) – Which column of the data contains the scaling reference.
gpi_info (tuple) – Tuple of at least (gpi, lon, lat), where gpi is the grid point index in the grid of this scaler.

Raises:

ValueError – if scaling is not successful
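As a rough illustration of how percentile pairs can be applied to a column, the sketch below maps each value through a piecewise-linear function defined by the source and reference percentiles. It is a simplified stand-in, not the pytesmo algorithm: the function names `interp` and `scale_column` are hypothetical, values outside the percentile range are simply clamped, and a `ValueError` is raised when the percentiles do not define a usable mapping.

```python
from bisect import bisect_right


def interp(x, xp, fp):
    """Piecewise-linear interpolation of x, clamped to the ends of fp.

    xp must be strictly increasing and have the same length as fp.
    """
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    i = bisect_right(xp, x) - 1
    t = (x - xp[i]) / (xp[i + 1] - xp[i])
    return fp[i] + t * (fp[i + 1] - fp[i])


def scale_column(values, x_perc, y_perc):
    """Map values from the source distribution onto the reference one."""
    if len(x_perc) != len(y_perc) or len(x_perc) < 2:
        raise ValueError("need at least two matching percentile pairs")
    if any(b <= a for a, b in zip(x_perc, x_perc[1:])):
        raise ValueError("source percentiles must be strictly increasing")
    return [interp(v, x_perc, y_perc) for v in values]


gpi_info = (42, 16.37, 48.21)  # hypothetical (gpi, lon, lat) tuple
gpi = gpi_info[0]              # only the grid point index is needed here
scaled = scale_column([0, 10, 20, 30, 40], x_perc=[0, 20, 40], y_perc=[0, 2, 4])
```

Duplicate percentile values, which can occur for columns with many identical observations, make the mapping ambiguous; this sketch rejects them instead of attempting a repair.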

store_parameters(gpi, parameters)

Store parameters for gpi into netCDF file.

Parameters:

gpi (int) – grid point index of self.grid
parameters (dictionary) – Dictionary mapping the names of the columns in the input data frame to numpy.ndarrays with the percentiles.

class pytesmo.validation_framework.data_scalers.DefaultScaler(method)

Bases: object

Scaling class that implements the scaling based on a given method from the pytesmo.scaling module.

Parameters:

method (string) – The data will be scaled into the reference space using the method specified by this string.

scale(data, reference_index, gpi_info)

Scale all columns in data to the column at the reference_index.

Parameters:

data (pandas.DataFrame) – temporally matched dataset
reference_index (int) – Which column of the data contains the scaling reference.
gpi_info (tuple) – Tuple of at least (gpi, lon, lat), where gpi is the grid point index in the grid of this scaler.

Raises:

ValueError – if scaling is not successful
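The idea of selecting a scaling method by name and applying it to every non-reference column can be sketched with the standard library alone. This is an assumption-laden illustration rather than the pytesmo code: the class `SketchScaler`, the `METHODS` table, and the two scaling functions are hypothetical, dicts of lists stand in for DataFrames, and the choice to raise `ValueError` for an unknown method name is specific to this sketch.

```python
from statistics import mean, pstdev


def mean_std(src, ref):
    """Rescale src so that its mean and standard deviation match ref."""
    src_std = pstdev(src)
    if src_std == 0:
        raise ValueError("source column has zero variance")
    src_mean, ref_mean, ref_std = mean(src), mean(ref), pstdev(ref)
    return [(v - src_mean) / src_std * ref_std + ref_mean for v in src]


def min_max(src, ref):
    """Rescale src so that its minimum and maximum match ref."""
    src_min, src_max = min(src), max(src)
    if src_max == src_min:
        raise ValueError("source column has zero range")
    ref_min, ref_max = min(ref), max(ref)
    span = (ref_max - ref_min) / (src_max - src_min)
    return [(v - src_min) * span + ref_min for v in src]


METHODS = {"mean_std": mean_std, "min_max": min_max}


class SketchScaler:
    """Hypothetical scaler that dispatches on a method name."""

    def __init__(self, method):
        if method not in METHODS:
            raise ValueError(f"unknown scaling method: {method!r}")
        self.method = METHODS[method]

    def scale(self, columns, reference_index, gpi_info=None):
        # gpi_info is accepted for interface parity but unused in this sketch.
        names = list(columns)
        ref_name = names[reference_index]
        ref = columns[ref_name]
        return {
            name: list(col) if name == ref_name else self.method(col, ref)
            for name, col in columns.items()
        }


columns = {"ref": [0.0, 2.0, 4.0], "sat": [10.0, 20.0, 30.0]}
scaled_min_max = SketchScaler("min_max").scale(columns, 0, (42, 16.37, 48.21))
scaled_mean_std = SketchScaler("mean_std").scale(columns, 0)
```

Keeping the reference column unchanged in the output is a design choice of the sketch; it lets the scaled columns be compared directly against the reference afterwards.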