pytesmo.metrics.pairwise module

Pairwise metrics and analytical confidence intervals.

Metrics

The metric functions implemented here all have the signature:

def metric(x: np.ndarray, y: np.ndarray) -> float

Confidence intervals

Formulas for confidence intervals have in general been taken from Gilleland (2010), doi:10.5065/D6WD3XJM, https://opensky.ucar.edu/islandora/object/technotes:491

Other references are cited in the docstring of the respective function.

Analytical confidence interval functions implemented here are named <metric>_ci, e.g. for bias, the CI function is bias_ci. The signature is:

def metric_ci(x: np.ndarray, y: np.ndarray, m: float,
              alpha: float = 0.05) -> tuple[float, float]

where m is the metric value that has been calculated for x and y.

Typically, you should use pytesmo.metrics.confidence_intervals.with_analytical_ci() for calculating a metric CI.

pytesmo.metrics.pairwise.aad(x, y)[source]

Average (=mean) absolute deviation (AAD).

Parameters:
Returns:

d – Mean absolute deviation.

Return type:

float

pytesmo.metrics.pairwise.bias_ci(x, y, b, alpha=0.05)[source]

Confidence interval for bias.

The confidence interval is the same as the confidence interval for a mean.

Parameters:
Returns:

lower, upper – Lower and upper confidence interval bounds.

Return type:

float
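
To illustrate the statement that the bias CI is a CI for a mean, the sketch below treats the differences x - y as a single sample and builds an interval around their mean. It is not pytesmo's implementation: the helper name `bias_ci_sketch` is hypothetical, bias is assumed to be mean(x) - mean(y), and the standard normal quantile from Python's `statistics` module is used as a large-sample stand-in for the Student t quantile that a small-sample interval would normally use.

```python
import math
from statistics import NormalDist

import numpy as np


def bias_ci_sketch(x, y, b, alpha=0.05):
    """Large-sample CI for the mean of x - y (hypothetical helper)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = d.size
    # Standard error of the mean difference (sample std, ddof=1).
    se = d.std(ddof=1) / math.sqrt(n)
    # Two-sided normal quantile; an assumption standing in for Student's t.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return b - z * se, b + z * se


rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=500)
y = rng.normal(loc=0.0, scale=1.0, size=500)
b = float(np.mean(x - y))  # bias, assumed here to be mean(x) - mean(y)
lower, upper = bias_ci_sketch(x, y, b)
```

The interval is symmetric around b and narrows as the number of samples grows, since the standard error scales with 1/sqrt(n).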

pytesmo.metrics.pairwise.index_of_agreement(o, p)[source]

The index of agreement was proposed by Willmott (1981) to overcome the insensitivity of the Nash-Sutcliffe efficiency E and R^2 to differences in the observed and predicted means and variances (Legates and McCabe, 1999). It is based on the ratio of the mean square error to the potential error (Willmott, 1984): d equals one minus that ratio. The potential error in the denominator represents the largest value that the squared difference of each pair can attain. The range of d is similar to that of R^2 and lies between 0 (no correlation) and 1 (perfect fit).

Parameters:
Returns:

d – Index of agreement.

Return type:

float
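
The following sketch writes out Willmott's index of agreement as it is usually defined, with o as observations and p as predictions. The helper name `index_of_agreement_sketch` is hypothetical and the code is an illustration of the formula, not necessarily the exact pytesmo implementation.

```python
import numpy as np


def index_of_agreement_sketch(o, p):
    """Willmott's d for observations o and predictions p (hypothetical helper)."""
    o = np.asarray(o, dtype=float)
    p = np.asarray(p, dtype=float)
    o_mean = o.mean()
    # Numerator: sum of squared errors between observations and predictions.
    sse = np.sum((o - p) ** 2)
    # Denominator: "potential error", the largest attainable squared differences.
    potential = np.sum((np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2)
    return 1.0 - sse / potential


o = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d_perfect = index_of_agreement_sketch(o, o)        # identical series
d_shifted = index_of_agreement_sketch(o, o + 1.0)  # constant offset of 1
```

For the shifted series the squared errors sum to 5 and the potential error to 45, so d = 1 - 5/45, which is below the perfect-fit value of 1.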

pytesmo.metrics.pairwise.kendall_tau(x, y)[source]

Wrapper for scipy.stats.kendalltau

Parameters:
  • x (numpy.array) – First input vector.

  • y (numpy.array) – Second input vector.

Returns:

tau – Kendall’s tau statistic

Return type:

float

pytesmo.metrics.pairwise.kendall_tau_ci(x, y, tau, alpha=0.05)[source]

Confidence interval for Kendall’s rank coefficient.

Parameters:
  • x (numpy.ndarray) – First input vector

  • y (numpy.ndarray) – Second input vector

  • tau (float) – Kendall tau for this data

  • alpha (float, optional) – 1 - confidence level, default is 0.05

Returns:

lower, upper – Lower and upper confidence interval bounds.

Return type:

float

References

Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65(1), 23-28.

pytesmo.metrics.pairwise.mad(x, y)[source]

Median absolute deviation (MAD).

Parameters:
Returns:

d – Median absolute deviation.

Return type:

float
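
The difference between the median and the mean absolute deviation is easiest to see with an outlier. The sketch below assumes that both metrics are computed on the pairwise absolute differences |x - y| (an assumption, since the parameter descriptions are not shown here) and uses made-up data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 2.5, 2.5, 4.5, 15.0])  # the last pair is an outlier

abs_dev = np.abs(x - y)         # [0.5, 0.5, 0.5, 0.5, 10.0]
aad_value = np.mean(abs_dev)    # mean absolute deviation, pulled up by the outlier
mad_value = np.median(abs_dev)  # median absolute deviation, robust to the outlier
```

Here the mean of the absolute differences is 2.4 while the median stays at 0.5, which is why the median variant is often preferred when a few pairs disagree strongly.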

pytesmo.metrics.pairwise.msd(x, y)[source]

Mean square deviation/mean square error.

For validation, MSD (same as MSE) is defined as

.. math::

    MSD = \frac{1}{n}\sum\limits_{i=1}^n (x_i - y_i)^2

MSD can be decomposed into a term describing the deviation of x and y attributable to non-perfect correlation (r < 1), a term depending on the difference in variances between x and y, and the difference in means between x and y (bias).

.. math::

    MSD &= MSD_{corr} + MSD_{var} + MSD_{bias}\\
        &= 2\sigma_x\sigma_y (1-r) + (\sigma_x - \sigma_y)^2
           + (\mu_x - \mu_y)^2

This function calculates the full MSD. The functions msd_corr, msd_var, and msd_bias can be used to calculate the individual components.

Parameters:
Returns:

msd – Mean square deviation

Return type:

float
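
The decomposition of MSD into correlation, variance, and bias terms can be checked numerically. The sketch below uses plain NumPy on synthetic data rather than the pytesmo functions; the variable names are illustrative, and population standard deviations (ddof=0) are assumed, since the identity only holds exactly with that convention.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200) + 0.3

# Full mean square deviation.
msd_total = np.mean((x - y) ** 2)

# Population statistics (ddof=0) are needed for the identity to hold exactly.
sx, sy = x.std(), y.std()
r = np.corrcoef(x, y)[0, 1]

msd_corr_term = 2 * sx * sy * (1 - r)           # imperfect correlation
msd_var_term = (sx - sy) ** 2                   # difference in variability
msd_bias_term = (x.mean() - y.mean()) ** 2      # difference in means

components = msd_corr_term + msd_var_term + msd_bias_term
```

Up to floating-point error, `components` equals `msd_total`, and each of the three terms is non-negative.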

pytesmo.metrics.pairwise.nash_sutcliffe(o, p)[source]

Nash-Sutcliffe model efficiency coefficient E. The range of E lies between 1.0 (perfect fit) and -inf.

Parameters:
Returns:

E – Nash-Sutcliffe model efficiency coefficient E.

Return type:

float
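
To make the range of E concrete, the sketch below evaluates the usual Nash-Sutcliffe formula for three cases, with o as observations and p as predictions. The helper name `nash_sutcliffe_sketch` is hypothetical; it follows the textbook definition and is not copied from pytesmo.

```python
import numpy as np


def nash_sutcliffe_sketch(o, p):
    """Nash-Sutcliffe efficiency of predictions p against observations o."""
    o = np.asarray(o, dtype=float)
    p = np.asarray(p, dtype=float)
    # One minus the squared error relative to the variability of o around its mean.
    return 1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2)


o = np.array([2.0, 4.0, 6.0, 8.0])
e_perfect = nash_sutcliffe_sketch(o, o)                       # predictions match exactly
e_mean = nash_sutcliffe_sketch(o, np.full_like(o, o.mean()))  # always predict mean(o)
e_poor = nash_sutcliffe_sketch(o, o[::-1])                    # reversed series
```

A perfect prediction gives E = 1, predicting the observed mean gives E = 0, and the reversed series gives E = -3, so negative values indicate a prediction that performs worse than the observed mean.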

pytesmo.metrics.pairwise.nrmsd(x, y, ddof=0)[source]

Normalized root-mean-square deviation (nRMSD).

This normalizes RMSD by max(max(x), max(y)) - min(min(x), min(y)).

Parameters:
  • x (numpy.ndarray) – First input vector.

  • y (numpy.ndarray) – Second input vector.

  • ddof (int, optional) – Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero. DEPRECATED: ddof is deprecated and might be removed in future versions.

Returns:

nrmsd – Normalized root-mean-square deviation (nRMSD).

Return type:

float
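
A small worked example of the normalization, written with plain NumPy and the default ddof=0. The data and variable names are made up for illustration.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0, 4.0])

# RMSD with divisor N (ddof=0): every pair differs by exactly 1.
rmsd_value = np.sqrt(np.mean((x - y) ** 2))

# Joint value range of both inputs: max(max(x), max(y)) - min(min(x), min(y)).
value_range = max(x.max(), y.max()) - min(x.min(), y.min())

nrmsd_value = rmsd_value / value_range
```

Here RMSD is 1 and the joint range is 4, so nRMSD is 0.25. Because the range is taken over both inputs, the result does not depend on which series is passed as x.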

pytesmo.metrics.pairwise.pearson_r(x, y)[source]

Pearson’s linear correlation coefficient.

Parameters:
Returns:

r – Pearson’s correlation coefficient.

Return type:

float

pytesmo.metrics.pairwise.pearson_r_ci(x, y, r, alpha=0.05)[source]

Confidence interval for Pearson correlation coefficient.

Parameters:
  • x (numpy.ndarray) – First input vector

  • y (numpy.ndarray) – Second input vector

  • r (float) – Pearson r for this data

  • alpha (float, optional) – 1 - confidence level, default is 0.05

Returns:

lower, upper – Lower and upper confidence interval bounds.

Return type:

float

References

Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65(1), 23-28.

pytesmo.metrics.pairwise.rmsd(x, y, ddof=0)[source]

Root-mean-square deviation (RMSD).

This is the root of MSD (see pytesmo.metrics.msd()). If x and y have the same mean (i.e. mean(x - y) = 0), RMSD corresponds to the square root of the variance of x - y.

Parameters:
  • x (numpy.ndarray) – First input vector.

  • y (numpy.ndarray) – Second input vector.

  • ddof (int, optional, DEPRECATED) – Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero. DEPRECATED: ddof is deprecated and might be removed in future versions.

Returns:

rmsd – Root-mean-square deviation.

Return type:

float

pytesmo.metrics.pairwise.spearman_r(x, y)[source]

Spearman’s rank correlation coefficient.

Parameters:
  • x (numpy.array) – First input vector.

  • y (numpy.array) – Second input vector.

Returns:

rho – Spearman correlation coefficient

Return type:

float

See also

scipy.stats.spearmanr
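
Spearman's coefficient can be understood as Pearson's coefficient applied to ranks. The sketch below shows this with NumPy only; the helper `ranks_no_ties` is hypothetical and only valid for data without ties (tied values would need averaged ranks, which library implementations handle).

```python
import numpy as np


def ranks_no_ties(a):
    """Ranks 0..n-1 via double argsort; only valid when a has no ties."""
    return np.argsort(np.argsort(a))


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3  # monotonic but nonlinear in x

# Pearson measures linear association and is below 1 for this data.
pearson = np.corrcoef(x, y)[0, 1]
# Spearman is Pearson on the ranks, which are identical here.
spearman = np.corrcoef(ranks_no_ties(x), ranks_no_ties(y))[0, 1]
```

Because y increases strictly with x, the rank correlation reaches its maximum of 1 even though the relationship is not linear.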

pytesmo.metrics.pairwise.spearman_r_ci(x, y, r, alpha=0.05)[source]

Confidence interval for Spearman rank correlation coefficient.

Parameters:
  • x (numpy.ndarray) – First input vector

  • y (numpy.ndarray) – Second input vector

  • r (float) – Spearman’s r for this data

  • alpha (float, optional) – 1 - confidence level, default is 0.05

Returns:

lower, upper – Lower and upper confidence interval bounds.

Return type:

float

References

Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65(1), 23-28.
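
The approach in Bonett and Wright (2000) builds an interval on the Fisher z scale and transforms it back, with a different approximate variance for each coefficient. The sketch below follows those published approximations as commonly stated; the helper name `fisher_z_ci` and its `kind` argument are hypothetical, it takes the sample size n directly instead of x and y, and it may differ in detail from the pytesmo functions.

```python
import math
from statistics import NormalDist


def fisher_z_ci(r, n, kind="pearson", alpha=0.05):
    """Approximate CI for a correlation via Fisher's z (hypothetical helper)."""
    # Approximate standard error of atanh(r) for each coefficient.
    if kind == "pearson":
        se = math.sqrt(1.0 / (n - 3))
    elif kind == "spearman":
        se = math.sqrt((1.0 + r ** 2 / 2.0) / (n - 3))
    elif kind == "kendall":
        se = math.sqrt(0.437 / (n - 4))
    else:
        raise ValueError(f"unknown kind: {kind}")
    z = math.atanh(r)                        # Fisher z-transform
    q = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    # Back-transform the bounds so they stay inside (-1, 1).
    return math.tanh(z - q * se), math.tanh(z + q * se)


lower, upper = fisher_z_ci(0.6, n=50, kind="pearson")
```

Working on the z scale keeps the bounds within (-1, 1) and makes the interval asymmetric around r, which matters most for coefficients close to the limits.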

pytesmo.metrics.pairwise.ubrmsd(x, y, ddof=0)[source]

Unbiased root-mean-square deviation (ubRMSD).

This corresponds to RMSD with mean biases removed beforehand, that is

.. math::

    ubRMSD = \sqrt{\frac{1}{n}\sum\limits_{i=1}^n
                   \left((x_i - \bar{x}) - (y_i - \bar{y})\right)^2}

NOTE: If you are scaling the data beforehand to have zero mean bias, this is exactly the same as RMSD.

Parameters:
  • x (numpy.ndarray) – First input vector.

  • y (numpy.ndarray) – Second input vector.

  • ddof (int, optional) – Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero. DEPRECATED: ddof is deprecated and might be removed in future versions.

Returns:

ubrmsd – Unbiased root-mean-square deviation (ubRMSD).

Return type:

float
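
The relationship between RMSD, ubRMSD, and bias can be verified numerically. The sketch below uses plain NumPy on synthetic data with the default divisor N (ddof=0); the variable names are illustrative and bias is assumed to be mean(x) - mean(y).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, size=300)
y = x + rng.normal(loc=-0.7, scale=0.4, size=300)

bias = x.mean() - y.mean()
rmsd_value = np.sqrt(np.mean((x - y) ** 2))
# Remove each series' own mean before taking the differences.
ubrmsd_value = np.sqrt(np.mean(((x - x.mean()) - (y - y.mean())) ** 2))

# ubRMSD is the population standard deviation (ddof=0) of the differences,
std_of_diff = np.std(x - y)
# and the squared RMSD splits into squared ubRMSD plus squared bias.
recombined = np.sqrt(ubrmsd_value ** 2 + bias ** 2)
```

Up to floating-point error, `ubrmsd_value` equals `std_of_diff` and `recombined` equals `rmsd_value`, so the two metrics coincide whenever the mean bias is zero.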

pytesmo.metrics.pairwise.ubrmsd_ci(x, y, ubrmsd, alpha=0.05)[source]

Confidence interval for unbiased root-mean-square deviation (ubRMSD).

Parameters:
  • x (numpy.ndarray) – First input vector

  • y (numpy.ndarray) – Second input vector

  • ubrmsd (float) – ubRMSD for this data

  • alpha (float, optional) – 1 - confidence level, default is 0.05

Returns:

lower, upper – Lower and upper confidence interval bounds.

Return type:

float