pytesmo.metrics.tcol module
Triple collocation metrics.
To avoid recalculation of the covariance matrix, all metrics are calculated inside tcol_metrics(). Bootstrapped confidence intervals (CIs) can be obtained with tcol_metrics_with_bootstrapped_ci().
Bootstrapping is currently not available for extended TCA.
- pytesmo.metrics.tcol.check_if_biased(combs, correlated)[source]
Supporting function for extended collocation. Checks whether the estimators are biased, i.e. whether too many data sets are assumed to have cross-correlated errors.
- pytesmo.metrics.tcol.ecol(data, correlated=None, err_cov=None, abs_est=True)[source]
Extended collocation analysis to obtain estimates of:
signal variances
error variances
signal-to-noise ratios [dB]
error cross-covariances (and -correlations)
based on an arbitrary number of N>3 data sets.
Each data set MUST be a member of at least one triplet that fulfills the classical triple collocation assumptions.
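The triplet requirement above can be checked before calling the function. The following is a minimal sketch, not part of pytesmo: the helper name `datasets_without_valid_triplet` is hypothetical, and it only tests the cross-correlation flags (it cannot verify the remaining classical assumptions, such as errors being independent of the signal).

```python
from itertools import combinations


def datasets_without_valid_triplet(names, correlated):
    """Return the data set names that belong to no triplet free of
    assumed error cross-correlations (hypothetical helper)."""
    # Store each flagged pair as an unordered set for easy lookup.
    corr_pairs = {frozenset(pair) for pair in correlated}

    covered = set()
    for triplet in combinations(names, 3):
        pairs = (frozenset(p) for p in combinations(triplet, 2))
        # A triplet qualifies if none of its three pairs is flagged.
        if not any(p in corr_pairs for p in pairs):
            covered.update(triplet)

    return [n for n in names if n not in covered]


names = ["ds1", "ds2", "ds3", "ds4", "ds5"]
correlated = [["ds1", "ds2"], ["ds3", "ds4"]]
missing = datasets_without_valid_triplet(names, correlated)
```

For the five data sets above `missing` is empty, because every data set can be combined with `ds5` and one data set from the other correlated pair. With only four data sets and the same two flagged pairs, every possible triplet contains a flagged pair, so no data set would qualify.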
- Parameters:
data (pd.DataFrame) – Temporally matched input data sets in each column
correlated (tuple of tuples (string)) – A tuple containing tuples of data set names (column names) between which the error cross-correlation shall be estimated. E.g. [['AMSR-E','SMOS'],['GLDAS','ERA']] estimates error cross-correlations between AMSR-E and SMOS, and between GLDAS and ERA, respectively.
err_cov – A priori known error cross-covariances that shall be included in the estimation (to obtain unbiased estimates)
abs_est – Force absolute values for signal and error variance estimates (to mitigate the issue of estimation uncertainties)
- Returns:
A dictionary with the following entries (<name> corresponds to the data set (DataFrame column) names):
- sig_<name> (signal variance of <name>)
- err_<name> (error variance of <name>)
- snr_<name> (SNR (in dB) of <name>)
- err_cov_<name1>_<name2> (error covariance between <name1> and <name2>)
- err_corr_<name1>_<name2> (error correlation between <name1> and <name2>)
Notes
Rescaling parameters can be derived from the signal variances, e.g., scaling <src> against <ref>:
    beta = np.sqrt(sig_<ref> / sig_<src>)
    rescaled = (data[<src>] - data[<src>].mean()) * beta + data[<ref>].mean()
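As an illustration of the rescaling note, here is a minimal dependency-free sketch. The function name `rescale` and all numbers are made up for the example; in practice the two signal variances would be read from the sig_<src> and sig_<ref> entries of the returned dictionary.

```python
import math


def rescale(src, ref_mean, sig_src, sig_ref):
    """Scale `src` to the signal variance of the reference (sketch)."""
    beta = math.sqrt(sig_ref / sig_src)
    src_mean = sum(src) / len(src)
    # Remove the source mean, scale the anomalies, add the reference mean.
    return [(v - src_mean) * beta + ref_mean for v in src]


# Hypothetical signal variances and data, for illustration only.
sig_src, sig_ref = 4.0, 1.0
src = [10.0, 12.0, 14.0, 16.0]
ref_mean = 0.25

rescaled = rescale(src, ref_mean, sig_src, sig_ref)
# beta = 0.5, so the anomalies are halved and centred on ref_mean:
# [-1.25, -0.25, 0.75, 1.75]
```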
Examples
import numpy as np
import pandas as pd
from pytesmo.metrics.tcol import ecol

# Just random numbers for demonstration
ds1 = np.random.normal(0, 1, 500)
ds2 = np.random.normal(0, 1, 500)
ds3 = np.random.normal(0, 1, 500)
ds4 = np.random.normal(0, 1, 500)
ds5 = np.random.normal(0, 1, 500)

# Three data sets without cross-correlated errors: This is equivalent
# to standard triple collocation.
df = pd.DataFrame({'ds1': ds1, 'ds2': ds2, 'ds3': ds3}, index=np.arange(500))
res = ecol(df)

# Five data sets, where data sets (1 and 2), and (3 and 4), are assumed
# to have cross-correlated errors.
df = pd.DataFrame({'ds1': ds1, 'ds2': ds2, 'ds3': ds3, 'ds4': ds4, 'ds5': ds5},
                  index=np.arange(500))
correlated = [['ds1', 'ds2'], ['ds3', 'ds4']]
res = ecol(df, correlated=correlated)
References
[Gruber2016] Gruber, A., Su, C. H., Crow, W. T., Zwieback, S., Dorigo, W. A., & Wagner, W. (2016). Estimating error cross-correlations in soil moisture data sets using extended collocation analysis. Journal of Geophysical Research: Atmospheres, 121(3), 1208-1219.
- pytesmo.metrics.tcol.tcol_metrics(x, y, z, ref_ind=0)[source]
Triple collocation based estimation of signal-to-noise ratio, absolute errors, and rescaling coefficients
- Parameters:
x (1D numpy.ndarray) – first input dataset
y (1D numpy.ndarray) – second input dataset
z (1D numpy.ndarray) – third input dataset
ref_ind (int) – Index of reference data set for estimating scaling coefficients. Default: 0 (x)
- Returns:
snr (numpy.ndarray) – signal-to-noise (variance) ratio [dB]
err_std (numpy.ndarray) – SCALED error standard deviation
beta (numpy.ndarray) – scaling coefficients (i_scaled = i * beta_i)
Notes
This function estimates the triple collocation errors, the scaling parameter \(\beta\) and the signal-to-noise ratio directly from the covariances of the datasets. For a general overview, and for how this function and pytesmo.metrics.tcol_error() are related, please see [Gruber2015].
Estimation of the error variances from the covariances of the datasets (e.g. \(\sigma_{XY}\) for the covariance between \(x\) and \(y\)) is done using the following formulas:
\[\sigma_{\varepsilon_x}^2 = \sigma_{X}^2 - \frac{\sigma_{XY}\sigma_{XZ}}{\sigma_{YZ}}\]
\[\sigma_{\varepsilon_y}^2 = \sigma_{Y}^2 - \frac{\sigma_{YX}\sigma_{YZ}}{\sigma_{XZ}}\]
\[\sigma_{\varepsilon_z}^2 = \sigma_{Z}^2 - \frac{\sigma_{ZY}\sigma_{ZX}}{\sigma_{YX}}\]
\(\beta\) can also be estimated from the covariances:
\[\beta_x = 1\]
\[\beta_y = \frac{\sigma_{XZ}}{\sigma_{YZ}}\]
\[\beta_z = \frac{\sigma_{XY}}{\sigma_{ZY}}\]
The signal-to-noise ratio (SNR) is also calculated from the variances and covariances:
\[\text{SNR}_X[dB] = -10\log_{10}\left(\frac{\sigma_{X}^2\sigma_{YZ}}{\sigma_{XY}\sigma_{XZ}}-1\right)\]
\[\text{SNR}_Y[dB] = -10\log_{10}\left(\frac{\sigma_{Y}^2\sigma_{XZ}}{\sigma_{YX}\sigma_{YZ}}-1\right)\]
\[\text{SNR}_Z[dB] = -10\log_{10}\left(\frac{\sigma_{Z}^2\sigma_{XY}}{\sigma_{ZX}\sigma_{ZY}}-1\right)\]
The SNR is given in dB to make it symmetric around zero. A value of zero means that the signal variance and the noise variance are equal; +3 dB means that the signal variance is twice as high as the noise variance.
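The formulas in these notes can be traced with a small worked example. The sketch below is not the pytesmo implementation: `cov` and `tc_sketch` are hypothetical helpers written in plain Python, and the input is constructed (not measured) so that the errors are exactly orthogonal to the signal and to each other, which makes the expected results easy to check by hand. It returns the error variances exactly as written in the formulas, without any further scaling.

```python
import math


def cov(a, b):
    """Sample covariance (n - 1 in the denominator)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def tc_sketch(x, y, z):
    """Error variances, scaling coefficients and SNR [dB], x as reference."""
    sxx, syy, szz = cov(x, x), cov(y, y), cov(z, z)
    sxy, sxz, syz = cov(x, y), cov(x, z), cov(y, z)

    err_var = (sxx - sxy * sxz / syz,
               syy - sxy * syz / sxz,
               szz - sxz * syz / sxy)
    beta = (1.0, sxz / syz, sxy / syz)
    snr_db = (-10 * math.log10(sxx * syz / (sxy * sxz) - 1),
              -10 * math.log10(syy * sxz / (sxy * syz) - 1),
              -10 * math.log10(szz * sxy / (sxz * syz) - 1))
    return err_var, beta, snr_db


# Zero-mean, mutually orthogonal patterns: one signal and three errors.
t = [1, 1, 1, 1, -1, -1, -1, -1]
ex = [1, 1, -1, -1, 1, 1, -1, -1]
ey = [1, -1, 1, -1, 1, -1, 1, -1]
ez = [1, -1, -1, 1, 1, -1, -1, 1]

x = [s + e for s, e in zip(t, ex)]              # signal gain 1
y = [2 * s + e for s, e in zip(t, ey)]          # signal gain 2
z = [0.5 * s + 0.5 * e for s, e in zip(t, ez)]  # signal gain 0.5

err_var, beta, snr_db = tc_sketch(x, y, z)
```

With these inputs the scaling coefficients come out as the inverse signal gains relative to x (1, 0.5 and 2). The SNR of x and z is 0 dB because signal and error variance are equal, and the SNR of y is about 6 dB because its signal variance is four times its error variance.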
References
[Gruber2015] Gruber, A., Su, C., Zwieback, S., Crow, W., Dorigo, W., & Wagner, W. (2015). Recent advances in (soil moisture) triple collocation analysis. International Journal of Applied Earth Observation and Geoinformation, in review.