pytesmo.metrics.tcol module

Triple collocation metrics.

To avoid recalculating the covariance matrix, all metrics are calculated inside tcol_metrics(). Bootstrapped confidence intervals (CIs) can be obtained with tcol_metrics_with_bootstrapped_ci().

Bootstrapping is currently not available for extended collocation analysis (ecol()).

pytesmo.metrics.tcol.check_if_biased(combs, correlated): Supporting function for extended collocation. Checks whether the estimators are biased, i.e. whether too many data sets are assumed to have cross-correlated errors.

pytesmo.metrics.tcol.ecol(data, correlated=None, err_cov=None, abs_est=True)

Extended collocation analysis to obtain estimates of:

signal variances

error variances

signal-to-noise ratios [dB]

error cross-covariances (and -correlations)

based on an arbitrary number N >= 3 of data sets.

EACH DATA SET MUST BE A MEMBER OF AT LEAST ONE TRIPLET THAT FULFILLS THE CLASSICAL TRIPLE COLLOCATION ASSUMPTIONS.
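The triplet requirement can be partly checked from the data set names alone: a triplet is usable only if none of its three pairs is listed in correlated. The sketch below checks just this combinatorial condition. The other classical triple collocation assumptions (e.g. linearly related signals, errors independent of the signal) cannot be verified from names. has_valid_triplet is a hypothetical helper, not part of pytesmo, and is not necessarily what check_if_biased() does internally.

```python
from itertools import combinations


def has_valid_triplet(names, correlated=None):
    """Hypothetical check: does every data set belong to at least one
    triplet in which no pair is flagged as having cross-correlated errors?"""
    # Store correlated pairs order-independently, e.g. {"ds1", "ds2"}.
    corr_pairs = {frozenset(pair) for pair in (correlated or [])}

    covered = set()
    for triplet in combinations(names, 3):
        # A triplet is usable only if none of its three pairs is correlated.
        if all(frozenset(p) not in corr_pairs for p in combinations(triplet, 2)):
            covered.update(triplet)
    # The requirement holds only if every data set is in a usable triplet.
    return covered == set(names)
```

For five data sets with correlated = [['ds1', 'ds2'], ['ds3', 'ds4']], usable triplets such as (ds1, ds3, ds5) exist for every data set. With only ds1 to ds4 and the same pairs, every triplet contains a correlated pair, so the check fails.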

Parameters:

data (pd.DataFrame) – Temporally matched input data sets in each column
correlated (tuple of tuples (string)) – A tuple containing tuples of data set names (column names) between which the error cross-correlation shall be estimated. E.g., [['AMSR-E', 'SMOS'], ['GLDAS', 'ERA']] estimates error cross-correlations between AMSR-E and SMOS, and between GLDAS and ERA.
err_cov – A priori known error cross-covariances that shall be included in the estimation (to obtain unbiased estimates)
abs_est – Force absolute values for signal and error variance estimates (to mitigate negative estimates caused by estimation uncertainties)

Returns:

A dictionary with the following entries, where <name> corresponds to a data set (DataFrame column) name:
- sig_<name> (signal variance of <name>)
- err_<name> (error variance of <name>)
- snr_<name> (SNR (in dB) of <name>)
- err_cov_<name1>_<name2> (error covariance between <name1> and <name2>)
- err_corr_<name1>_<name2> (error correlation between <name1> and <name2>)
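By the definition of correlation, each err_corr_<name1>_<name2> entry should relate to the corresponding error covariance and error variances as corr = cov / sqrt(var1 * var2). A minimal sketch, assuming only the key naming listed above; err_corr_from_cov is a hypothetical helper, not a pytesmo function.

```python
import numpy as np


def err_corr_from_cov(res, name1, name2):
    """Hypothetical helper: error correlation from an ecol()-style result
    dict, using err_cov_<name1>_<name2>, err_<name1> and err_<name2>."""
    cov = res[f"err_cov_{name1}_{name2}"]
    # Correlation = covariance / (product of the two error standard deviations)
    return cov / np.sqrt(res[f"err_{name1}"] * res[f"err_{name2}"])
```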

Notes

Rescaling parameters can be derived from the signal variances. E.g., to scale <src> against <ref>:

beta = np.sqrt(sig_<ref> / sig_<src>)
rescaled = (data[<src>] - data[<src>].mean()) * beta + data[<ref>].mean()
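The rescaling formula can be wrapped in a small function. This is a sketch assuming a result dictionary with sig_<name> entries as returned by ecol(), and data given either as a pandas DataFrame or as a dict of NumPy arrays; rescale_to_ref is a hypothetical helper, not part of pytesmo.

```python
import numpy as np


def rescale_to_ref(data, res, src, ref):
    """Hypothetical helper: rescale data[src] against data[ref] using the
    signal variances sig_<src> and sig_<ref> from an ecol()-style dict."""
    src_vals = np.asarray(data[src], dtype=float)
    ref_vals = np.asarray(data[ref], dtype=float)
    # beta = sqrt(sig_<ref> / sig_<src>)
    beta = np.sqrt(res[f"sig_{ref}"] / res[f"sig_{src}"])
    # Remove the mean of src, scale it, then shift it to the mean of ref.
    return (src_vals - src_vals.mean()) * beta + ref_vals.mean()
```

For example, with data = {'a': np.array([1.0, 2.0, 3.0]), 'b': np.array([10.0, 20.0, 30.0])} and res = {'sig_a': 1.0, 'sig_b': 100.0}, beta is 10 and rescale_to_ref(data, res, 'a', 'b') returns [10.0, 20.0, 30.0].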

Examples

import numpy as np
import pandas as pd
from pytesmo.metrics.tcol import ecol

# Just random numbers for demonstration
ds1 = np.random.normal(0,1,500)
ds2 = np.random.normal(0,1,500)
ds3 = np.random.normal(0,1,500)
ds4 = np.random.normal(0,1,500)
ds5 = np.random.normal(0,1,500)

# Three data sets without cross-correlated errors: This is equivalent
# to standard triple collocation.
df = pd.DataFrame({'ds1':ds1,'ds2':ds2,'ds3':ds3},
                  index=np.arange(500))
res = ecol(df)

# Five data sets, where data sets (1 and 2), and (3 and 4), are assumed
# to have cross-correlated errors.
df = pd.DataFrame({'ds1':ds1,'ds2':ds2,'ds3':ds3,'ds4':ds4,'ds5':ds5},
                  index=np.arange(500))
correlated = [['ds1', 'ds2'], ['ds3', 'ds4']]
res = ecol(df, correlated=correlated)

References

[Gruber2016]

Gruber, A., Su, C. H., Crow, W. T., Zwieback, S., Dorigo, W. A., & Wagner, W. (2016). Estimating error cross-correlations in soil moisture data sets using extended collocation analysis. Journal of Geophysical Research: Atmospheres, 121(3), 1208-1219.

pytesmo.metrics.tcol.tcol_metrics(x, y, z, ref_ind=0)

Triple collocation based estimation of signal-to-noise ratios, absolute error estimates (error standard deviations), and rescaling coefficients.

Parameters:

x (1D numpy.ndarray) – first input dataset
y (1D numpy.ndarray) – second input dataset
z (1D numpy.ndarray) – third input dataset
ref_ind (int) – Index of reference data set for estimating scaling coefficients. Default: 0 (x)

Returns:

snr (numpy.ndarray) – signal-to-noise (variance) ratio [dB]
err_std (numpy.ndarray) – SCALED error standard deviation
beta (numpy.ndarray) – scaling coefficients (i_scaled = i * beta_i)

Notes

This function estimates the triple collocation errors, the scaling parameter \(\beta\) and the signal-to-noise ratio directly from the covariances of the data sets. For a general overview, and for how this function relates to pytesmo.metrics.tcol_error(), see [Gruber2015].

Estimation of the error variances from the covariances of the datasets (e.g. \(\sigma_{XY}\) for the covariance between \(x\) and \(y\)) is done using the following formula:

\[\sigma_{\varepsilon_x}^2 = \sigma_{X}^2 - \frac{\sigma_{XY}\sigma_{XZ}}{\sigma_{YZ}}\]

\[\sigma_{\varepsilon_y}^2 = \sigma_{Y}^2 - \frac{\sigma_{YX}\sigma_{YZ}}{\sigma_{XZ}}\]

\[\sigma_{\varepsilon_z}^2 = \sigma_{Z}^2 - \frac{\sigma_{ZY}\sigma_{ZX}}{\sigma_{YX}}\]
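The three error-variance equations above translate almost line by line into NumPy. This is a minimal illustration, not pytesmo's own implementation; the function name tc_error_variances is hypothetical, and the sketch uses np.cov with its default (N-1) normalization.

```python
import numpy as np


def tc_error_variances(x, y, z):
    """Error variances of x, y and z from their sample covariances
    (classical triple collocation)."""
    c = np.cov(np.vstack((x, y, z)))  # 3x3 covariance matrix, rows = x, y, z
    err_x = c[0, 0] - c[0, 1] * c[0, 2] / c[1, 2]
    err_y = c[1, 1] - c[1, 0] * c[1, 2] / c[0, 2]
    err_z = c[2, 2] - c[2, 1] * c[2, 0] / c[1, 0]
    return np.array([err_x, err_y, err_z])
```

With synthetic data such as x = s + e_x, y = 2s + e_y and z = 0.5s + 1 + e_z (independent zero-mean errors), the estimates approach the true error variances as the sample grows; note that the estimates are in each data set's own units, not rescaled to a reference.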

With x as the reference data set (ref_ind=0), \(\beta\) can also be estimated from the covariances:

\[\beta_x = 1\]

\[\beta_y = \frac{\sigma_{XZ}}{\sigma_{YZ}}\]

\[\beta_z=\frac{\sigma_{XY}}{\sigma_{ZY}}\]
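The scaling coefficients can be sketched the same way. For ref_ind=0 the function below reproduces the three equations above; for other reference indices it assumes the same pattern, beta_i = sigma_{ref,k} / sigma_{i,k} with k the remaining data set. The name tc_beta is hypothetical and this is not the library's code.

```python
import numpy as np


def tc_beta(x, y, z, ref_ind=0):
    """Scaling coefficients beta_i (i_scaled = i * beta_i) that map each
    data set onto the data set with index ref_ind."""
    c = np.cov(np.vstack((x, y, z)))
    beta = np.ones(3)
    for i in range(3):
        if i == ref_ind:
            continue  # the reference keeps beta = 1
        # k is the third data set, neither the reference nor i
        k = ({0, 1, 2} - {i, ref_ind}).pop()
        beta[i] = c[ref_ind, k] / c[i, k]
    return beta
```

For ref_ind=0 this evaluates beta_y as c[0, 2] / c[1, 2] and beta_z as c[0, 1] / c[2, 1], i.e. exactly the formulas for \(\beta_y\) and \(\beta_z\).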

The signal-to-noise ratio (SNR) is also calculated from the variances and covariances:

\[\text{SNR}_X\,[\text{dB}] = -10\log_{10}\left(\frac{\sigma_{X}^2\sigma_{YZ}}{\sigma_{XY}\sigma_{XZ}}-1\right)\]

\[\text{SNR}_Y\,[\text{dB}] = -10\log_{10}\left(\frac{\sigma_{Y}^2\sigma_{XZ}}{\sigma_{YX}\sigma_{YZ}}-1\right)\]

\[\text{SNR}_Z\,[\text{dB}] = -10\log_{10}\left(\frac{\sigma_{Z}^2\sigma_{XY}}{\sigma_{ZX}\sigma_{ZY}}-1\right)\]
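The three SNR equations share one pattern: the ratio inside the logarithm is the total variance of a data set divided by its signal variance, so subtracting 1 leaves the noise-to-signal ratio. A minimal NumPy sketch, using the base-10 logarithm implied by dB; tc_snr_db is a hypothetical name, not pytesmo's implementation.

```python
import numpy as np


def tc_snr_db(x, y, z):
    """Signal-to-noise ratios [dB] of x, y and z from their covariances."""
    c = np.cov(np.vstack((x, y, z)))
    snr = []
    # (i, j, k): data set i and the two other data sets j, k
    for i, j, k in ((0, 1, 2), (1, 0, 2), (2, 0, 1)):
        # total variance / signal variance of data set i
        ratio = c[i, i] * c[j, k] / (c[i, j] * c[i, k])
        snr.append(-10 * np.log10(ratio - 1))
    return np.array(snr)
```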

It is given in dB to make it symmetric around zero. A value of 0 dB means that the signal variance and the noise variance are equal; +3 dB means that the signal variance is roughly twice the noise variance (10 log10(2) ≈ 3.01), and -3 dB means that the noise variance is roughly twice the signal variance.
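The dB convention is a plain logarithmic transform of the signal-to-noise variance ratio, which is what makes it symmetric around zero. The helper names below are illustrative only.

```python
import numpy as np


def snr_to_db(signal_var, noise_var):
    """SNR in dB: 10 * log10(signal variance / noise variance)."""
    return 10 * np.log10(signal_var / noise_var)


def db_to_ratio(snr_db):
    """Inverse conversion: signal-to-noise variance ratio from a dB value."""
    return 10 ** (snr_db / 10)
```

snr_to_db(1, 1) is 0, snr_to_db(2, 1) is about 3.01 and snr_to_db(1, 2) is about -3.01, so swapping signal and noise only flips the sign.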

References

[Gruber2015]

Gruber, A., Su, C., Zwieback, S., Crow, W., Dorigo, W., Wagner, W. (2015). Recent advances in (soil moisture) triple collocation analysis. International Journal of Applied Earth Observation and Geoinformation, in review