pytesmo.validation_framework.metric_calculators module

Metric calculators implement combinations of metrics and structure the output.

class pytesmo.validation_framework.metric_calculators.BasicMetrics(other_name='k1', calc_tau=False, metadata_template=None)[source]

Bases: MetadataMetrics

This class computes the basic metrics:
  • Pearson’s R

  • Spearman’s rho

  • RMSD

  • BIAS

  • optionally Kendall’s tau

It also stores information about gpi, lat, lon and the number of observations.

Parameters:
  • other_name (string or tuple, optional) – Name of the column of the non-reference / other dataset in the pandas DataFrame

  • calc_tau (boolean, optional) – If True, Kendall's tau is also calculated. This is set to False by default, since the calculation of Kendall's tau is rather slow and can significantly impact the performance of e.g. global validation studies.

  • metadata_template (dictionary, optional) –

    A dictionary containing additional fields (and types) of the form dict = {'field': np.float32([np.nan])}. Allows users to specify information in the job tuple, i.e.:

    jobs.append((idx, metadata['longitude'], metadata['latitude'],
                 metadata_dict))

    which is then propagated to the final netCDF results file.

calc_metrics(data, gpi_info)[source]

Calculates the desired statistics.

Parameters:
  • data (pandas.DataFrame) – DataFrame with 2 columns; the first column is the reference dataset, named 'ref', the second column is the dataset to compare against, named 'other'.

  • gpi_info (tuple) – of (gpi, lon, lat)

Notes

Kendall's tau calculation is optional at the moment because the scipy implementation is very slow, which is problematic for global comparisons.
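
Example

A minimal sketch of using BasicMetrics directly on synthetic data. The second column is named to match other_name ('k1' by default); the data and the gpi_info values are placeholders.

    import numpy as np
    import pandas as pd
    from pytesmo.validation_framework.metric_calculators import BasicMetrics

    # Synthetic reference series and a noisy copy of it; the second column
    # name must match the other_name passed to the constructor.
    rng = np.random.default_rng(0)
    ref = rng.normal(0.3, 0.05, 100)
    data = pd.DataFrame({"ref": ref, "k1": ref + rng.normal(0.0, 0.02, 100)})

    calc = BasicMetrics(other_name="k1", calc_tau=False)
    # gpi_info is (gpi, lon, lat); arbitrary values here.
    results = calc.calc_metrics(data, gpi_info=(0, 15.0, 48.0))
    print(results)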

class pytesmo.validation_framework.metric_calculators.BasicMetricsPlusMSE(other_name='k1', metadata_template=None)[source]

Bases: BasicMetrics

Basic Metrics plus Mean squared Error and the decomposition of the MSE into correlation, bias and variance parts.

calc_metrics(data, gpi_info)[source]

Calculates the desired statistics.

Parameters:
  • data (pandas.DataFrame) – DataFrame with 2 columns; the first column is the reference dataset, named 'ref', the second column is the dataset to compare against, named 'other'.

  • gpi_info (tuple) – of (gpi, lon, lat)

Notes

Kendall's tau calculation is optional at the moment because the scipy implementation is very slow, which is problematic for global comparisons.

class pytesmo.validation_framework.metric_calculators.FTMetrics(frozen_flag=2, other_name='k1', metadata_template=None)[source]

Bases: MetadataMetrics

This class computes Freeze/Thaw metrics. The calculated metrics are:

  • SSF frozen/temp unfrozen

  • SSF unfrozen/temp frozen

  • SSF unfrozen/temp unfrozen

  • SSF frozen/temp frozen

It also stores information about gpi, lat, lon and the total number of observations.

calc_metrics(data, gpi_info)[source]

Calculates the desired statistics.

Parameters:
  • data (pandas.DataFrame) – DataFrame with 2 columns; the first column is the reference dataset, named 'ref', the second column is the dataset to compare against, named 'other'.

  • gpi_info (tuple) – of (gpi, lon, lat)

Notes

Kendall's tau is not calculated at the moment because the scipy implementation is very slow, which is problematic for global comparisons.

class pytesmo.validation_framework.metric_calculators.HSAF_Metrics(other_name1='k1', other_name2='k2', dataset_names=None, metadata_template=None)[source]

Bases: MetadataMetrics

This class computes metrics as defined by the H-SAF consortium in order to prove the operational readiness of a product. It also stores information about gpi, lat, lon and number of observations.

calc_metrics(data, gpi_info)[source]

Calculates the desired statistics.

Parameters:
  • data (pandas.DataFrame) – DataFrame with 3 columns; the first column is the reference dataset, named 'ref', the second and third columns are the datasets to compare against, named 'k1' and 'k2'.

  • gpi_info (tuple) – Grid point info (i.e. gpi, lon, lat)
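
Example

A minimal sketch of calling HSAF_Metrics directly on synthetic data. The column names follow the defaults ('ref', 'k1', 'k2'); a DatetimeIndex is used so that any time-based grouping of the metrics can be derived from the timestamps. All values and the gpi_info tuple are placeholders.

    import numpy as np
    import pandas as pd
    from pytesmo.validation_framework.metric_calculators import HSAF_Metrics

    # One year of synthetic daily data with default column names.
    index = pd.date_range("2020-01-01", periods=366, freq="D")
    rng = np.random.default_rng(1)
    ref = rng.normal(0.3, 0.05, len(index))
    data = pd.DataFrame(
        {
            "ref": ref,
            "k1": ref + rng.normal(0.0, 0.02, len(index)),
            "k2": ref + rng.normal(0.0, 0.03, len(index)),
        },
        index=index,
    )

    calc = HSAF_Metrics()
    results = calc.calc_metrics(data, gpi_info=(0, 15.0, 48.0))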

class pytesmo.validation_framework.metric_calculators.IntercomparisonMetrics(refname='ref', other_names=('k1', 'k2', 'k3'), calc_tau=False, metrics_between_nonref=False, calc_rho=True, dataset_names=None, metadata_template=None)[source]

Bases: MetadataMetrics

Compare Basic Metrics of multiple satellite data sets to one reference data set via:

  • Pearson’s R and p

  • Spearman’s rho and p

  • RMSD

  • BIAS

  • ubRMSD

  • mse

  • RSS

  • optionally Kendall’s tau

Parameters:
  • refname (str, optional) – Name of the reference column in the DataFrame.

  • other_names (tuple, optional (default: ('k1', 'k2', 'k3'))) – Names of the columns of the non-reference / other datasets in the DataFrame.

  • calc_rho (boolean, optional) – If True, Spearman's rho is also calculated. This is set to True by default.

  • calc_tau (boolean, optional) – If True, Kendall's tau is also calculated. This is set to False by default, since the calculation of Kendall's tau is rather slow and can significantly impact the performance of e.g. global validation studies.

  • metrics_between_nonref (bool, optional (default: False)) – Allow 2-dataset combinations where the ref is not included. Warning: can lead to many combinations.

  • dataset_names (list, optional (default: None)) – Names of the original datasets that are used to find the lookup table for the DataFrame columns.

  • metadata_template (dict, optional (default: None)) – See MetadataMetrics

calc_metrics(data, gpi_info)[source]

Calculates the desired statistics.

Parameters:
  • data (pd.DataFrame) – DataFrame with more than 2 columns; the first column is the reference dataset, named 'ref', the other columns are the datasets to compare against, named 'other_i'.

  • gpi_info (tuple) – of (gpi, lon, lat)

Notes

Kendall's tau calculation is optional at the moment because the scipy implementation is very slow, which is problematic for global comparisons.
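
Example

A minimal sketch of calling IntercomparisonMetrics directly. The column names follow the defaults (refname='ref' and other_names=('k1', 'k2', 'k3')); data and gpi_info are placeholders.

    import numpy as np
    import pandas as pd
    from pytesmo.validation_framework.metric_calculators import (
        IntercomparisonMetrics,
    )

    # Synthetic reference series and three noisy variants of it.
    rng = np.random.default_rng(7)
    ref = rng.normal(0.3, 0.05, 200)
    data = pd.DataFrame({
        "ref": ref,
        "k1": ref + rng.normal(0.0, 0.02, 200),
        "k2": ref + rng.normal(0.0, 0.03, 200),
        "k3": ref + rng.normal(0.0, 0.04, 200),
    })

    calc = IntercomparisonMetrics(other_names=("k1", "k2", "k3"), calc_tau=False)
    results = calc.calc_metrics(data, gpi_info=(0, 15.0, 48.0))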

class pytesmo.validation_framework.metric_calculators.MetadataMetrics(other_name='k1', metadata_template=None, min_obs=10)[source]

Bases: object

This class sets up the gpi info and metadata (if used) in the results template. This is used as the basis for all other metric calculators.

Parameters:
  • other_name (string or tuple, optional) – Name of the column of the non-reference / other dataset in the pandas DataFrame

  • metadata_template (dictionary, optional) – A dictionary containing additional fields (and types) of the form dict = {'field': np.float32([np.nan])}. Allows users to specify information in the job tuple, i.e. jobs.append((idx, metadata['longitude'], metadata['latitude'], metadata_dict)), which is then propagated to the final netCDF results file (see the example below).

  • min_obs (int, optional) – Minimum number of observations required to calculate metrics. Default is 10.

calc_metrics(data, gpi_info)[source]

Adds the gpi info and metadata to the results.

Parameters:
  • data (pandas.DataFrame) – See individual calculators for more information; not directly used here.

  • gpi_info (tuple) – of (gpi, lon, lat) or, optionally, (gpi, lon, lat, metadata) where metadata is a dictionary
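
Example

A hypothetical sketch of the metadata_template mechanism described above; the field names, values and coordinates are made up for illustration.

    import numpy as np

    # Each template entry defines a variable name, dtype and fill value
    # for the netCDF results file.
    metadata_template = {
        "network": np.array(["missing"], dtype="U20"),
        "clay_fraction": np.float32([np.nan]),
    }

    # Job tuple of the form (gpi, lon, lat, metadata_dict); in a real run
    # the values would come from the grid and its metadata.
    jobs = []
    metadata_dict = {"network": "SCAN", "clay_fraction": 22.5}
    jobs.append((0, 15.0, 48.0, metadata_dict))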

class pytesmo.validation_framework.metric_calculators.PairwiseIntercomparisonMetrics(min_obs=10, calc_spearman=True, calc_kendall=True, analytical_cis=True, bootstrap_cis=False, bootstrap_min_obs=100, bootstrap_alpha=0.05, metadata_template=None)[source]

Bases: MetadataMetrics, PairwiseMetricsMixin

Basic metrics for comparison of two datasets:

  • RMSD

  • BIAS

  • ubRMSD

  • mse and decomposition (mse_var, mse_corr, mse_bias)

  • RSS

  • Pearson’s R and p

  • Spearman’s rho and p (optional)

  • Kendall’s tau and p (optional)

Additionally, confidence intervals for these metrics can be calculated (optional).

NOTE: When using this within a pytesmo.validation_framework.validation.Validation, use temporal_matcher=make_combined_temporal_matcher(<window>) as keyword argument. make_combined_temporal_matcher can be imported from pytesmo.validation_framework.temporal_matchers.

Parameters:
  • min_obs (int, optional) – Minimum number of observations required to calculate metrics. Default is 10.

  • calc_spearman (bool, optional) – Whether to calculate Spearman’s rank correlation coefficient. Default is True.

  • calc_kendall (bool, optional) – Whether to calculate Kendall’s rank correlation coefficient. Default is True.

  • analytical_cis (bool, optional (default: True)) –

    Whether to calculate analytical confidence intervals for the following metrics:

    • BIAS

    • mse_bias

    • RMSD

    • urmsd

    • mse

    • R

    • rho (only if calc_spearman=True)

    • tau (only if calc_kendall=True)

  • bootstrap_cis (bool, optional (default: False)) –

    Whether to calculate bootstrap confidence intervals for the following metrics:

    • mse_corr

    • mse_var

    The default is False, since this can require considerable computational effort.

  • bootstrap_min_obs (int, optional (default: 100)) – Minimum number of observations to draw from the time series for bootstrapping.

  • bootstrap_alpha (float, optional (default: 0.05)) – Significance level for the confidence intervals; 0.05 corresponds to 95% intervals.

calc_metrics(data, gpi_info)[source]

Calculates pairwise metrics.

Parameters:
  • data (pd.DataFrame) – DataFrame with 2 columns between which metrics should be calculated.

  • gpi_info (tuple) – (gpi, lon, lat)
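
Example

A minimal sketch of calling PairwiseIntercomparisonMetrics directly on a 2-column DataFrame, outside of a Validation run. The column names, data and gpi_info are placeholders.

    import numpy as np
    import pandas as pd
    from pytesmo.validation_framework.metric_calculators import (
        PairwiseIntercomparisonMetrics,
    )

    # Two synthetic series between which the metrics are computed.
    rng = np.random.default_rng(42)
    ref = rng.normal(0.25, 0.05, 365)
    data = pd.DataFrame(
        {"insitu": ref, "sat": 0.9 * ref + rng.normal(0.0, 0.02, 365)}
    )

    calc = PairwiseIntercomparisonMetrics(
        calc_spearman=True, calc_kendall=False, analytical_cis=True
    )
    results = calc.calc_metrics(data, gpi_info=(0, 15.0, 48.0))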

class pytesmo.validation_framework.metric_calculators.PairwiseMetricsMixin[source]

Bases: object

class pytesmo.validation_framework.metric_calculators.RollingMetrics(other_name='k1', metadata_template=None)[source]

Bases: MetadataMetrics

This class computes rolling metrics for Pearson R and RMSD. It also stores information about gpi, lat, lon and number of observations.

Parameters:
  • other_name (string or tuple, optional) – Name of the column of the non-reference / other dataset in the pandas DataFrame

  • metadata_template (dictionary, optional) –

    A dictionary containing additional fields (and types) of the form dict = {'field': np.float32([np.nan])}. Allows users to specify information in the job tuple, i.e.:

    jobs.append((idx, metadata['longitude'], metadata['latitude'],
                 metadata_dict))

    which is then propagated to the final netCDF results file.

calc_metrics(data, gpi_info, window_size='30d', center=True, min_periods=2)[source]

Calculate the desired statistics.

Parameters:
  • data (pandas.DataFrame) – DataFrame with 2 columns; the first column is the reference dataset, named 'ref', the second column is the dataset to compare against, named 'other'.

  • gpi_info (tuple) – of (gpi, lon, lat)

  • window_size (string) – Window size defined as a string, e.g. '30d' (the default).

  • center (bool, optional) – If True, the window is centered on each timestamp.

  • min_periods (int, optional) – Minimum number of observations in window required for computation.
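
Example

A minimal sketch of calling RollingMetrics directly. The DataFrame needs a DatetimeIndex for the time-based window; the column names follow the defaults ('ref' and other_name='k1'), and all values are placeholders.

    import numpy as np
    import pandas as pd
    from pytesmo.validation_framework.metric_calculators import RollingMetrics

    # One year of synthetic daily data.
    index = pd.date_range("2020-01-01", periods=366, freq="D")
    rng = np.random.default_rng(1)
    ref = rng.normal(0.3, 0.05, len(index))
    data = pd.DataFrame(
        {"ref": ref, "k1": ref + rng.normal(0.0, 0.02, len(index))},
        index=index,
    )

    calc = RollingMetrics(other_name="k1")
    results = calc.calc_metrics(
        data, gpi_info=(0, 15.0, 48.0),
        window_size="30d", center=True, min_periods=2,
    )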

class pytesmo.validation_framework.metric_calculators.TCMetrics(other_names=('k1', 'k2'), calc_tau=False, dataset_names=None, tc_metrics_for_ref=True, metrics_between_nonref=False, metadata_template=None)[source]

Bases: MetadataMetrics

This class computes triple collocation metrics as defined in the QA4SM project. It uses 2 satellite data sets and 1 reference data set as inputs. It can be extended to perform an intercomparison between all possible triples when more than 3 datasets are given.

calc_metrics(data, gpi_info)[source]

Calculate Triple Collocation metrics

Parameters:
  • data (pd.DataFrame) – DataFrame with more than 2 columns; the first column is the reference dataset, named 'ref', the other columns are the datasets to compare against, named 'other_i'.

  • gpi_info (tuple) – of (gpi, lon, lat)

Notes

Kendall's tau calculation is optional at the moment because the scipy implementation is very slow, which is problematic for global comparisons.

class pytesmo.validation_framework.metric_calculators.TripleCollocationMetrics(refname, min_obs=10, bootstrap_cis=False, bootstrap_min_obs=100, bootstrap_alpha=0.05, metadata_template=None)[source]

Bases: MetadataMetrics, PairwiseMetricsMixin

Computes triple collocation metrics

The triple collocation metrics calculated are:

  • SNR

  • error standard deviation

  • linear scaling/multiplicative (first-order) bias

NOTE: When using this within a pytesmo.validation_framework.validation.Validation, use temporal_matcher=make_combined_temporal_matcher(<window>) as keyword argument. make_combined_temporal_matcher can be imported from pytesmo.validation_framework.temporal_matchers.

Parameters:
  • refname (str) – Name of the reference column that is passed to calc_metrics. This will also be used to name the results. Make sure that you set rename_cols=False in the call to Validation.calc, otherwise the names will be wrong.

  • min_obs (int, optional) – Minimum number of observations required to calculate metrics. Default is 10.

  • bootstrap_cis (bool, optional (default: False)) – Whether to calculate bootstrap confidence intervals for the triple collocation metrics. The default is False, since this can require considerable computational effort.

  • bootstrap_min_obs (int, optional (default: 100)) – Minimum number of observations to draw from the time series for bootstrapping.

  • bootstrap_alpha (float, optional (default: 0.05)) – Significance level for the confidence intervals; 0.05 corresponds to 95% intervals.

  • metadata_template (dict, optional (default: None)) – A dictionary containing additional fields (and types) of the form dict = {'field': np.float32([np.nan])}. Allows users to specify information in the job tuple, i.e. jobs.append((idx, metadata['longitude'], metadata['latitude'], metadata_dict)), which is then propagated to the final netCDF results file.

calc_metrics(data, gpi_info)[source]

Calculates triple collocation metrics.

Parameters:
  • data (pd.DataFrame) – DataFrame with one reference column and two other columns between which the metrics are calculated. The name of the reference column must be the same as used in the constructor. Make sure to use rename_cols=False for Validation.calc, so that the names are correct.

  • gpi_info (tuple) – (gpi, lon, lat)
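
Example

A minimal sketch of calling TripleCollocationMetrics directly on three synthetic columns. The column names ('insitu', 'sat_a', 'sat_b') are hypothetical; the reference column name must match refname, and gpi_info is a placeholder.

    import numpy as np
    import pandas as pd
    from pytesmo.validation_framework.metric_calculators import (
        TripleCollocationMetrics,
    )

    # Three noisy realisations of the same synthetic signal.
    rng = np.random.default_rng(3)
    truth = rng.normal(0.3, 0.05, 500)
    data = pd.DataFrame({
        "insitu": truth + rng.normal(0.0, 0.01, 500),
        "sat_a": truth + rng.normal(0.0, 0.02, 500),
        "sat_b": truth + rng.normal(0.0, 0.03, 500),
    })

    calc = TripleCollocationMetrics(refname="insitu", bootstrap_cis=False)
    results = calc.calc_metrics(data, gpi_info=(0, 15.0, 48.0))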

pytesmo.validation_framework.metric_calculators.get_dataset_names(ref_key, datasets, n=3)[source]

Get dataset names in correct order as used in the validation framework

  • reference dataset = ref

  • first other dataset = k1

  • second other dataset = k2

This is important to correctly iterate through the H-SAF metrics and to save each metric with the names of the used datasets.

Parameters:
  • ref_key (str) – Name of the reference dataset

  • datasets (dict) – Dictionary of dictionaries as provided to the validation framework in order to perform the validation process.

Returns:

dataset_names – List of the dataset names in correct order

Return type:

list
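
Example

A minimal sketch of get_dataset_names with a hypothetical dataset configuration; only the dictionary keys and the 'columns' entries matter here, the reader objects are omitted.

    from pytesmo.validation_framework.metric_calculators import get_dataset_names

    # Hypothetical configuration as it would be passed to the validation
    # framework; 'class' would normally hold a dataset reader object.
    datasets = {
        "ISMN": {"class": None, "columns": ["soil moisture"]},
        "ASCAT": {"class": None, "columns": ["sm"]},
        "SMAP": {"class": None, "columns": ["soil_moisture"]},
    }

    # Returns the dataset names with the reference first, i.e.
    # ['ISMN', <other 1>, <other 2>] -> ref, k1, k2
    names = get_dataset_names("ISMN", datasets, n=3)
    print(names)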