pytesmo.scaling module
Created on Apr 17, 2013
@author: Christoph Paulik christoph.paulik@geo.tuwien.ac.at
- pytesmo.scaling.add_scaled(df, method='linreg', label_in=None, label_scale=None, **kwargs)[source]
Takes a DataFrame and appends a scaled time series to it. If no labels are given, the first column is scaled to the second column of the DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
method (string) – scaling method
label_in (string, optional) – the column of the DataFrame that should be scaled to the label_scale column; default is the first column
label_scale (string, optional) – the column of the DataFrame that the label_in column should be scaled to; default is the second column
- Returns:
df – input DataFrame with a new column labeled label_in + '_scaled_' + method
- Return type:
pandas.DataFrame
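A minimal sketch of what add_scaled produces, reimplemented with plain pandas/numpy so it runs standalone. The function name add_scaled_sketch is hypothetical, and mean/std matching stands in for whichever scaling method is selected; this is not pytesmo's actual implementation.

```python
import numpy as np
import pandas as pd

def add_scaled_sketch(df, method="mean_std", label_in=None, label_scale=None):
    """Sketch of pytesmo.scaling.add_scaled (assumption: mean/std matching).

    Follows the documented defaults: if no labels are given, the first
    column is scaled to the second column of the DataFrame.
    """
    if label_in is None:
        label_in = df.columns[0]
    if label_scale is None:
        label_scale = df.columns[1]
    src = df[label_in].values
    ref = df[label_scale].values
    # mean/std matching used here as a stand-in for the chosen method
    scaled = (src - src.mean()) / src.std() * ref.std() + ref.mean()
    # appended column follows the documented naming scheme
    df[label_in + "_scaled_" + method] = scaled
    return df
```

The new column name follows the documented pattern, e.g. scaling column "a" with the default method yields "a_scaled_mean_std" here.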
- pytesmo.scaling.cdf_match(src, ref, nbins=100, minobs=20, linear_edge_scaling=True, percentiles=None, combine_invalid=True, max_val=None, min_val=None)[source]
Rescales by CDF matching.
This calculates the empirical CDFs for source and reference dataset using a specified number of bins. In case of non-unique percentile values, a beta distribution is fitted to the CDF. For more robust estimation of the lower and upper bins, linear edge scaling is used (see Moesinger et al., 2020 for details).
- Parameters:
src (numpy.array) – input dataset which will be scaled
ref (numpy.array) – src will be scaled to this dataset
nbins (int, optional) – Number of bins to use for estimation of the CDF
percentiles (sequence, optional) – Percentile values to use. If this is given, nbins is ignored. The percentiles might still be changed if minobs is given and the number of data points per bin is lower. Default is None.
minobs (int, optional) – Minimum desired number of observations in a bin for bin resizing. If it is None, bins will not be resized. Default is 20.
linear_edge_scaling (bool, optional) – Whether to derive the edge parameters via linear regression (more robust; see Moesinger et al. (2020) for more info). Default is True. Note that this way only the outliers in the reference (y) CDF are handled; outliers in the input data (X) will not be removed and will still show up in the data.
combine_invalid (bool, optional) – Optional feature to combine the masks of invalid data (NaN, Inf) of both source (X) and reference (y) data passed to fit. This only makes sense if X and y are both time series corresponding to the same index. In this case, this makes sure that data is only used if values for both X and y are available, so that seasonal patterns in missing values in one of them do not lead to distortions. (For example, if X is available the whole year but y only during summer, the distribution of y should not be matched against the whole-year CDF of X, because that could introduce systematic seasonal biases.) Default is True.
max_val (float, optional) – Maximum value to enforce on the scaled output.
min_val (float, optional) – Minimum value to enforce on the scaled output.
- Returns:
CDF matched values – dataset src with CDF as ref
- Return type:
numpy.array
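The core idea of CDF matching can be sketched with percentile interpolation alone. This deliberately omits the beta-distribution fitting, linear edge scaling, minobs-based bin resizing, and invalid-data handling that the real cdf_match applies; it only illustrates the quantile-mapping principle.

```python
import numpy as np

def cdf_match_sketch(src, ref, nbins=100):
    """Simplified CDF matching: map src through its empirical CDF onto ref's.

    Assumption: a plain piecewise-linear quantile mapping, without the
    robustness features of pytesmo.scaling.cdf_match.
    """
    percentiles = np.linspace(0, 100, nbins + 1)
    src_bins = np.percentile(src, percentiles)  # empirical quantiles of src
    ref_bins = np.percentile(ref, percentiles)  # empirical quantiles of ref
    # each src value is replaced by the ref value at the same quantile
    return np.interp(src, src_bins, ref_bins)
```

After matching, the distribution of the scaled data follows the reference distribution over the range covered by the bins.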
- pytesmo.scaling.get_scaling_function(method)[source]
Get scaling function based on method name.
- Parameters:
method (string) – method name as string
- Returns:
scaling_func – function(src: numpy.ndarray, ref: numpy.ndarray) -> scaled_src: numpy.ndarray
- Return type:
function
- Raises:
KeyError: – if method is not found
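The lookup pattern behind get_scaling_function can be sketched as a plain dictionary from method names to functions with the documented signature. The entries and the helper names here are simplified stand-ins, not pytesmo's actual table.

```python
import numpy as np

def _mean_std(src, ref):
    # stand-in scaling function with signature f(src, ref) -> scaled_src
    return (src - src.mean()) / src.std() * ref.std() + ref.mean()

# hypothetical lookup table; pytesmo's real table holds all its methods
_LUT = {"mean_std": _mean_std}

def get_scaling_function_sketch(method):
    """Sketch of the name-to-function lookup; unknown names raise KeyError,
    matching the documented behaviour."""
    try:
        return _LUT[method]
    except KeyError:
        raise KeyError(f"Scaling method {method} not found.")
```

A caller retrieves the function once and then applies it to any src/ref pair.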
- pytesmo.scaling.get_scaling_method_lut()[source]
Get all defined scaling methods and their function names.
- Returns:
lut – dictionary with scaling method names as keys and the corresponding functions as values
- Return type:
dictionary
- pytesmo.scaling.linreg(src, ref, **kwargs)[source]
Scales the input dataset to the reference using linear regression.
- Parameters:
src (numpy.array) – input dataset which will be scaled
ref (numpy.array) – src will be scaled to this dataset
- Returns:
scaled dataset – dataset scaled using linear regression
- Return type:
numpy.array
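Linear-regression scaling can be sketched as a least-squares fit of ref against src, followed by applying the fitted line to src. The use of np.polyfit here is an illustrative assumption about how the fit is obtained, not pytesmo's actual code.

```python
import numpy as np

def linreg_sketch(src, ref):
    """Sketch of linear-regression scaling: fit ref ≈ slope * src + intercept
    by least squares, then apply the fit to src."""
    slope, intercept = np.polyfit(src, ref, 1)
    return slope * src + intercept
```

For data that is exactly linearly related, the scaled result reproduces the reference.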
- pytesmo.scaling.linreg_params(src, ref)[source]
Calculate additive and multiplicative correction parameters based on linear regression models.
- Parameters:
src (numpy.array) – Candidate data (to which the corrections apply)
ref (numpy.array) – Reference data (which candidate is scaled to)
- Returns:
slope (float) – Multiplicative correction value
intercept (float) – Additive correction value
- pytesmo.scaling.linreg_stored_params(src, slope, intercept)[source]
Scale the input data with the passed correction values (slope and intercept).
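The split between fitting the correction parameters (linreg_params) and applying stored ones (linreg_stored_params) can be sketched as two small functions. The helper names are hypothetical and np.polyfit stands in for however pytesmo performs the fit.

```python
import numpy as np

def linreg_params_sketch(src, ref):
    """Fit ref ≈ slope * src + intercept by least squares; slope is the
    multiplicative and intercept the additive correction value."""
    slope, intercept = np.polyfit(src, ref, 1)
    return slope, intercept

def linreg_stored_params_sketch(src, slope, intercept):
    # apply previously computed correction values to new data
    return slope * src + intercept
```

Storing the parameters lets the same correction be reapplied later without refitting, e.g. to a longer time series from the same sensor.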
- pytesmo.scaling.mean_std(src, ref, **kwargs)[source]
Scales the input dataset so that it has the same mean and standard deviation as the reference afterwards.
- Parameters:
src (numpy.array) – input dataset which will be scaled
ref (numpy.array) – src will be scaled to this dataset
- Returns:
scaled dataset – dataset src with same mean and standard deviation as ref
- Return type:
numpy.array
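Mean/standard-deviation matching is a standard transformation: standardise src, then rescale to the reference's moments. A minimal sketch (hypothetical helper name):

```python
import numpy as np

def mean_std_sketch(src, ref):
    # remove src's mean and scale by its std, then impose ref's std and mean
    return (src - src.mean()) / src.std() * ref.std() + ref.mean()
```

After scaling, the result shares the reference's first two moments while preserving src's temporal dynamics.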
- pytesmo.scaling.min_max(src, ref, **kwargs)[source]
Scales the input dataset so that it has the same minimum and maximum as the reference afterwards.
- Parameters:
src (numpy.array) – input dataset which will be scaled
ref (numpy.array) – src will be scaled to this dataset
- Returns:
scaled dataset – dataset src with same maximum and minimum as ref
- Return type:
numpy.array
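Min/max matching maps src's value range linearly onto the reference's range. A minimal sketch (hypothetical helper name):

```python
import numpy as np

def min_max_sketch(src, ref):
    # normalise src to [0, 1], then stretch to ref's [min, max] range
    return ((src - src.min()) / (src.max() - src.min())
            * (ref.max() - ref.min()) + ref.min())
```

This pins the extremes to the reference but, unlike mean/std matching, says nothing about the moments in between.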
- pytesmo.scaling.scale(df, method='linreg', reference_index=0, **kwargs)[source]
Takes a pandas.DataFrame and scales all columns to the column specified by reference_index using the chosen method.
- Parameters:
df (pandas.DataFrame) – containing matched time series that should be scaled
method (string, optional) – method definition; has to be a function in globals() that takes two numpy.arrays as input and returns one numpy.array of the same length
reference_index (int, optional) – default 0, column index of reference dataset in dataframe
- Returns:
scaled data – all time series of the input DataFrame scaled to the one specified by reference_index
- Return type:
pandas.DataFrame
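The column-wise behaviour of scale can be sketched with plain pandas/numpy: every column except the reference is scaled to the reference column, which is left unchanged. The helper name is hypothetical, and mean/std matching stands in for the chosen method.

```python
import numpy as np
import pandas as pd

def scale_sketch(df, reference_index=0):
    """Sketch of pytesmo.scaling.scale (assumption: mean/std matching).

    All columns are scaled to the column at reference_index; the reference
    column itself is returned unchanged.
    """
    ref = df.iloc[:, reference_index].values
    out = df.copy()
    for i, col in enumerate(df.columns):
        if i == reference_index:
            continue  # never rescale the reference itself
        src = df[col].values
        out[col] = (src - src.mean()) / src.std() * ref.std() + ref.mean()
    return out
```

This is the typical last step after temporally matching several datasets, so that they share a common value range before comparison.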