pytesmo.scaling module

Created on Apr 17, 2013

@author: Christoph Paulik christoph.paulik@geo.tuwien.ac.at

pytesmo.scaling.add_scaled(df, method='linreg', label_in=None, label_scale=None, **kwargs)[source]

Takes a DataFrame and appends a scaled time series to it. If no labels are given, the first column is scaled to the second column of the DataFrame.

Parameters:
  • df (pandas.DataFrame) – input dataframe

  • method (string) – scaling method

  • label_in (string, optional) – the column of the dataframe that should be scaled to the column given by label_scale. Default is the first column.

  • label_scale (string, optional) – the column of the dataframe that the label_in column should be scaled to. Default is the second column.

Returns:

df – input dataframe with a new column labeled label_in + '_scaled_' + method

Return type:

pandas.DataFrame
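
The default-column and labeling behaviour described above can be illustrated with a small pandas sketch. It does not call pytesmo: the helper `append_scaled` and the scaler `mean_std_sketch` are hypothetical stand-ins, and the column names are made up for the example.

```python
import numpy as np
import pandas as pd


def mean_std_sketch(src, ref):
    """Stand-in scaler (assumption): match the mean and standard deviation of ref."""
    return (src - np.mean(src)) / np.std(src) * np.std(ref) + np.mean(ref)


def append_scaled(df, scaler, method_name, label_in=None, label_scale=None):
    """Hypothetical helper that mirrors the documented behaviour of add_scaled."""
    # Documented defaults: the first column is scaled to the second column.
    if label_in is None:
        label_in = df.columns[0]
    if label_scale is None:
        label_scale = df.columns[1]
    out = df.copy()
    # Documented naming convention for the appended column.
    new_label = label_in + "_scaled_" + method_name
    out[new_label] = scaler(df[label_in].values, df[label_scale].values)
    return out


df = pd.DataFrame({"sat": [0.1, 0.2, 0.4, 0.3],
                   "insitu": [10.0, 14.0, 22.0, 18.0]})
result = append_scaled(df, mean_std_sketch, "mean_std")
print(list(result.columns))
```

With the actual library, the equivalent call would presumably be `add_scaled(df, method='mean_std')`, using the signature documented above.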

pytesmo.scaling.cdf_beta_match(*args, **kwargs)[source]
pytesmo.scaling.cdf_match(src, ref, nbins=100, minobs=20, linear_edge_scaling=True, percentiles=None, combine_invalid=True, max_val=None, min_val=None)[source]

Rescales by CDF matching.

This calculates the empirical CDFs for source and reference dataset using a specified number of bins. In case of non-unique percentile values, a beta distribution is fitted to the CDF. For more robust estimation of the lower and upper bins, linear edge scaling is used (see Moesinger et al., 2020 for details).

Parameters:
  • src (numpy.array) – input dataset which will be scaled

  • ref (numpy.array) – src will be scaled to this dataset

  • nbins (int, optional) – Number of bins to use for estimation of the CDF

  • percentiles (sequence, optional) – Percentile values to use. If this is given, nbins is ignored. The percentiles might still be changed if minobs is given and the number of data points per bin is lower. Default is None.

  • minobs (int, optional) – Minimum desired number of observations in a bin for bin resizing. If it is None, bins will not be resized. Default is 20.

  • linear_edge_scaling (bool, optional) – Whether to derive the edge parameters via linear regression (more robust, see Moesinger et al. (2020) for more info). Default is True. Note that this way only the outliers in the reference (y) CDF are handled. Outliers in the input data (x) will not be removed and will still show up in the data.

  • combine_invalid (bool, optional) – Optional feature to combine the masks of invalid data (NaN, Inf) of both source (X) and reference (y) data passed to fit. This only makes sense if X and y are both time series data corresponding to the same index. In this case, it ensures that data is only used if values for both X and y are available, so that seasonal patterns in missing values in one of them do not lead to distortions. (For example, if X is available the whole year, but y is only available during summer, the distribution of y should not be matched against the whole-year CDF of X, because that could introduce systematic seasonal biases.) Default is True.

  • max_val (float, optional) – Maximum value to enforce.

  • min_val (float, optional) – Minimum value to enforce.

Returns:

CDF matched values – dataset src with CDF as ref

Return type:

numpy.array
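
The core idea of CDF matching can be sketched in plain NumPy as follows. This is a simplified illustration and not pytesmo's implementation: it omits bin resizing (minobs), the beta-distribution fit for non-unique percentiles, linear edge scaling, and invalid-data handling. The function name `cdf_match_sketch` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def cdf_match_sketch(src, ref, nbins=100):
    """Simplified CDF matching by piecewise-linear interpolation (illustration only)."""
    percentiles = np.linspace(0, 100, nbins + 1)
    # Values of both datasets at the same percentiles (empirical CDF nodes).
    src_perc = np.percentile(src, percentiles)
    ref_perc = np.percentile(ref, percentiles)
    # Move each source value from the source CDF onto the reference CDF.
    # np.interp expects increasing src_perc, which holds for continuous data.
    return np.interp(src, src_perc, ref_perc)


rng = np.random.default_rng(0)
src = rng.normal(loc=0.0, scale=1.0, size=5000)
ref = rng.gamma(shape=2.0, scale=3.0, size=5000)
matched = cdf_match_sketch(src, ref)

# After matching, the percentiles of the result should be close to those of ref.
print(np.percentile(matched, [10, 50, 90]))
print(np.percentile(ref, [10, 50, 90]))
```

The mapping is monotonic, so the ranking of the source values is preserved while their distribution is reshaped to follow the reference.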

pytesmo.scaling.get_scaling_function(method)[source]

Get scaling function based on method name.

Parameters:

method (string) – method name as string

Returns:

scaling_func – function(src: numpy.ndarray, ref: numpy.ndarray) -> scaled_src: numpy.ndarray

Return type:

function

Raises:

KeyError – if method is not found

pytesmo.scaling.get_scaling_method_lut()[source]

Get all defined scaling methods and their function names.

Returns:

lut – key: scaling method name, value: function

Return type:

dictionary
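
A name-to-function lookup of this kind can be sketched as below. The table `SCALING_LUT`, the two scalers, and `get_function_sketch` are hypothetical names for illustration; the methods actually registered in pytesmo should be taken from the return value of get_scaling_method_lut().

```python
import numpy as np


def mean_std_sketch(src, ref):
    """Stand-in scaler: match mean and standard deviation of ref."""
    return (src - np.mean(src)) / np.std(src) * np.std(ref) + np.mean(ref)


def min_max_sketch(src, ref):
    """Stand-in scaler: match minimum and maximum of ref."""
    span = np.max(src) - np.min(src)
    return (src - np.min(src)) / span * (np.max(ref) - np.min(ref)) + np.min(ref)


# Hypothetical lookup table: method name -> function(src, ref) -> scaled src.
SCALING_LUT = {"mean_std": mean_std_sketch, "min_max": min_max_sketch}


def get_function_sketch(method):
    """Return the scaler registered under `method`; a missing name raises KeyError."""
    return SCALING_LUT[method]


func = get_function_sketch("min_max")
scaled = func(np.array([1.0, 2.0, 3.0]), np.array([10.0, 30.0]))
print(scaled)

try:
    get_function_sketch("unknown")
except KeyError as err:
    print("method not found:", err)
```

Resolving the method by name is what lets the DataFrame-level functions accept `method` as a plain string.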

pytesmo.scaling.linreg(src, ref, **kwargs)[source]

Scales the input dataset using linear regression.

Parameters:
  • src (numpy.array) – input dataset which will be scaled

  • ref (numpy.array) – src will be scaled to this dataset

Returns:

scaled dataset – dataset scaled using linear regression

Return type:

numpy.array

pytesmo.scaling.linreg_params(src, ref)[source]

Calculate additive and multiplicative correction parameters based on linear regression models.

Parameters:
  • src (numpy.array) – Candidate data (to which the corrections apply)

  • ref (numpy.array) – Reference data (which candidate is scaled to)

Returns:

  • slope (float) – Multiplicative correction value

  • intercept (float) – Additive correction value

pytesmo.scaling.linreg_stored_params(src, slope, intercept)[source]

Scale the input data with the passed correction values.

Parameters:
  • src (numpy.array) – Candidate values, that are scaled

  • slope (float) – Multiplicative correction value

  • intercept (float) – Additive correction value

Returns:

src_scaled – The scaled input values

Return type:

numpy.array
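
The split between fitting parameters and applying stored parameters can be illustrated with a short NumPy sketch. It assumes an ordinary least-squares fit of ref against src and a plain `slope * src + intercept` correction; whether pytesmo fits and applies the parameters in exactly this way is an assumption, and both function names are hypothetical.

```python
import numpy as np


def fit_params_sketch(src, ref):
    """Least-squares fit of ref ~ slope * src + intercept (illustration only)."""
    slope, intercept = np.polyfit(src, ref, deg=1)
    return slope, intercept


def apply_params_sketch(src, slope, intercept):
    """Apply previously fitted multiplicative and additive corrections."""
    return slope * src + intercept


src = np.array([0.10, 0.20, 0.30, 0.40])
ref = np.array([12.0, 17.0, 21.0, 28.0])
slope, intercept = fit_params_sketch(src, ref)

# Parameters fitted on one set of data can be stored and reused on new candidate data.
new_src = np.array([0.15, 0.35])
new_scaled = apply_params_sketch(new_src, slope, intercept)
print(slope, intercept)
print(new_scaled)
```

Separating the two steps is useful when the correction is calibrated on a reference period and later applied to data for which no reference is available.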

pytesmo.scaling.mean_std(src, ref, **kwargs)[source]

Scales the input dataset so that it has the same mean and standard deviation as the reference afterwards.

Parameters:
  • src (numpy.array) – input dataset which will be scaled

  • ref (numpy.array) – src will be scaled to this dataset

Returns:

scaled dataset – dataset src with same mean and standard deviation as ref

Return type:

numpy.array
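
Matching mean and standard deviation amounts to standardizing src and then rescaling it with the statistics of ref. A minimal NumPy sketch of that formula, with the hypothetical name `mean_std_sketch` and made-up data:

```python
import numpy as np


def mean_std_sketch(src, ref):
    """Standardize src, then rescale with the standard deviation and mean of ref."""
    return (src - np.mean(src)) / np.std(src) * np.std(ref) + np.mean(ref)


src = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ref = np.array([10.0, 30.0, 20.0, 50.0, 40.0])
scaled = mean_std_sketch(src, ref)

# Mean and standard deviation of the result agree with those of ref.
print(scaled.mean(), scaled.std())
print(ref.mean(), ref.std())
```

The sketch assumes that src is not constant; a zero standard deviation would lead to a division by zero.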

pytesmo.scaling.min_max(src, ref, **kwargs)[source]

Scales the input dataset so that it has the same minimum and maximum as the reference afterwards.

Parameters:
  • src (numpy.array) – input dataset which will be scaled

  • ref (numpy.array) – src will be scaled to this dataset

Returns:

scaled dataset – dataset src with same maximum and minimum as ref

Return type:

numpy.array
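
Min-max scaling maps the range of src linearly onto the range of ref. A minimal NumPy sketch of that mapping, with the hypothetical name `min_max_sketch` and made-up data:

```python
import numpy as np


def min_max_sketch(src, ref):
    """Map [min(src), max(src)] linearly onto [min(ref), max(ref)]."""
    src_min, src_max = np.min(src), np.max(src)
    ref_min, ref_max = np.min(ref), np.max(ref)
    # Normalize src to [0, 1], then stretch and shift to the reference range.
    return (src - src_min) / (src_max - src_min) * (ref_max - ref_min) + ref_min


src = np.array([0.2, 0.5, 0.3, 0.8])
ref = np.array([5.0, 25.0, 15.0, 35.0])
scaled = min_max_sketch(src, ref)
print(scaled.min(), scaled.max())
```

Because only the two extreme values determine the mapping, a single outlier in either dataset changes the whole result. The sketch also assumes that src is not constant.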

pytesmo.scaling.scale(df, method='linreg', reference_index=0, **kwargs)[source]

Takes a pandas.DataFrame and scales all columns to the column specified by reference_index with the chosen method.

Parameters:
  • df (pandas.DataFrame) – containing matched time series that should be scaled

  • method (string, optional) – method definition, has to be a function in globals() that takes two numpy.array objects as input and returns one numpy.array of the same length

  • reference_index (int, optional) – default 0, column index of reference dataset in dataframe

Returns:

scaled data – all time series of the input DataFrame scaled to the one specified by reference_index

Return type:

pandas.DataFrame
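
Scaling every column of a DataFrame to one reference column can be sketched with pandas as follows. This does not call pytesmo: `scale_all_sketch` and `mean_std_sketch` are hypothetical stand-ins, and applying the scaler to the reference column as well (which maps it onto itself) is an assumption of the sketch.

```python
import numpy as np
import pandas as pd


def mean_std_sketch(src, ref):
    """Stand-in scaler: match mean and standard deviation of ref."""
    return (src - np.mean(src)) / np.std(src) * np.std(ref) + np.mean(ref)


def scale_all_sketch(df, scaler, reference_index=0):
    """Scale every column to the column at position reference_index."""
    ref = df.iloc[:, reference_index].values
    # Scaling the reference column to itself leaves it unchanged (up to rounding).
    scaled = {col: scaler(df[col].values, ref) for col in df.columns}
    return pd.DataFrame(scaled, index=df.index)


df = pd.DataFrame({
    "ref": [10.0, 14.0, 22.0, 18.0],
    "sat_a": [0.1, 0.2, 0.4, 0.3],
    "sat_b": [55.0, 60.0, 72.0, 61.0],
})
scaled_df = scale_all_sketch(df, mean_std_sketch, reference_index=0)
print(scaled_df.mean())
```

The input is expected to hold temporally matched series, as stated for the df parameter, so that each row pairs observations from the same time.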