pytesmo.temporal_matching module
Provides functions for temporally collocating data from multiple dataframes.
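Both functions documented in this module share the same core idea. For each reference timestamp, they look for data inside a window around it and then either take the nearest neighbour or average over the window. The following is a minimal sketch of that idea using plain pandas. It is not pytesmo's implementation. The helper names `window_to_timedelta`, `nearest_collocation` and `mean_collocation` are hypothetical. The sketch assumes the window is a half-width (± window around each reference time), omits timezone handling and flags, and relies on pandas' `Series.reindex(method="nearest", tolerance=...)` for the nearest-neighbour step.

```python
import pandas as pd


def window_to_timedelta(window):
    """Hypothetical helper: floats are interpreted as a number of days."""
    if isinstance(window, pd.Timedelta):
        return window
    return pd.Timedelta(days=float(window))


def nearest_collocation(ref_index, other, window):
    """Hypothetical sketch: nearest value of `other` within +/- window."""
    window = window_to_timedelta(window)
    # Like method="nearest": keep only the first of duplicated timestamps.
    other = other[~other.index.duplicated(keep="first")].sort_index()
    # reindex(method="nearest") needs a unique, sorted index; `tolerance`
    # leaves NaN where no neighbour lies inside the window.
    return other.reindex(ref_index, method="nearest", tolerance=window)


def mean_collocation(ref_index, other, window):
    """Hypothetical sketch: mean of all `other` values within +/- window."""
    window = window_to_timedelta(window)
    values = []
    for t in ref_index:
        mask = (other.index >= t - window) & (other.index <= t + window)
        values.append(other[mask].mean())  # empty selection gives NaN
    return pd.Series(values, index=ref_index, name=other.name)


ref = pd.date_range("2020-01-01", periods=3, freq="D")
other = pd.Series(
    [1.0, 9.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-01 01:00", "2020-01-01 01:00",
                          "2020-01-01 05:00", "2020-01-02 03:00"]),
    name="sm",
)
nearest = nearest_collocation(ref, other, 0.25)     # 1.0, 3.0, NaN
matched_mean = mean_collocation(ref, other, 0.25)   # 4.0, 3.0, NaN
```

Passing `window=0.25` means six hours on either side of each reference time, following the documented convention that floats are interpreted as days.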
- pytesmo.temporal_matching.combined_temporal_collocation(reference, others, window, method='nearest', dropduplicates=False, dropna=False, combined_dropna=False, flag=None, checkna=False, use_invalid=False, add_ref_data=False)
Temporally collocates multiple dataframes to reference times.
- Parameters:
reference (pd.DataFrame, pd.Series, or pd.DatetimeIndex) – The reference onto which others should be collocated. If this is a DataFrame or a Series, the index must be a DatetimeIndex. If the index is timezone-naive, UTC will be assumed.
others (list/tuple of pd.DataFrame or pd.Series) – DataFrames/Series to be collocated. Each entry must have a pd.DatetimeIndex as index. If the index is timezone-naive, the timezone of the reference data will be assumed.
window (pd.Timedelta or float) – Window around reference timestamps in which to look for data. Floats are interpreted as number of days.
method (str, optional) –
Which method to use for the temporal collocation:
"nearest" (default): Uses the nearest valid neighbour. When this method is used, entries with duplicate index values in others will be dropped, and only the first of the duplicates is kept.
"mean": Takes the mean over the given window around the reference times.
dropduplicates (bool, optional) – Whether to drop duplicated timestamps in others. Default is False, except when method="nearest", in which case this is enforced to be True.
dropna (bool, optional) – Whether to drop NaNs from the resulting dataframe (arising for example from duplicates with duplicates_nan=True or from missing values). Default is False.
combined_dropna (str or bool, optional) – Whether and how to drop NaNs from the resulting combined DataFrame. Can be "any", "all", True or False. "any" makes sure that the output dataframe only has values at times where all input frames had values, while "all" only drops rows where all values are NaN. True is the same as "any", and False (default) disables dropping NaNs.
checkna (bool, optional) – Whether to check if only NaNs are returned (i.e. no match has been found). If set to True, raises a UserWarning in case no match has been found. Default is False.
flag (np.ndarray or None, optional) – Flag column as array. If this is given, the column will be interpreted as validity indicator. Any nonzero values mark the row as invalid. Default is None.
use_invalid (bool, optional) – Whether to use invalid values marked by flag in case no valid values are available. Default is False.
add_ref_data (bool, optional) – If reference is a DataFrame or Series, add its data to the final collocated dataframe. Default is False.
- Returns:
collocated – Temporally collocated DataFrame with variables from all input frames merged together.
- Return type:
pd.DataFrame or pd.Series
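The effect of `combined_dropna` can be sketched by collocating each entry of `others` separately and then dropping rows of the merged frame. The helper `combined_nearest` below is hypothetical and is not pytesmo's code: it only covers `method="nearest"` with named Series, maps `True` to `"any"` as described above, and forwards the string to pandas' `DataFrame.dropna(how=...)`. The window is again assumed to be a half-width around each reference time.

```python
import pandas as pd


def combined_nearest(reference, others, window, combined_dropna=False):
    """Hypothetical sketch: nearest-neighbour collocation of several named
    Series onto one reference, merged column-wise."""
    # Floats are interpreted as a number of days.
    if not isinstance(window, pd.Timedelta):
        window = pd.Timedelta(days=window)
    # Accept a DatetimeIndex directly, or use the index of a Series/DataFrame.
    if isinstance(reference, pd.DatetimeIndex):
        ref_index = reference
    else:
        ref_index = reference.index
    columns = []
    for other in others:
        # Keep the first of duplicated timestamps and sort for reindex().
        other = other[~other.index.duplicated(keep="first")].sort_index()
        columns.append(other.reindex(ref_index, method="nearest", tolerance=window))
    combined = pd.concat(columns, axis=1)
    # True behaves like "any"; False disables dropping.
    if combined_dropna is True:
        combined_dropna = "any"
    if combined_dropna:
        # "any": keep only rows where every input has a value.
        # "all": drop only rows where every value is NaN.
        combined = combined.dropna(how=combined_dropna)
    return combined


ref = pd.date_range("2020-01-01", periods=3, freq="D")
a = pd.Series([1.0, 2.0, 3.0], index=ref + pd.Timedelta(hours=1), name="a")
b = pd.Series([10.0], index=pd.DatetimeIndex(["2020-01-02"]), name="b")
# One row remains, 2020-01-02, with a=2.0 and b=10.0.
only_complete = combined_nearest(ref, [a, b], 0.25, combined_dropna="any")
```

Here `b` has a single observation, so with `"any"` only the reference time at which both inputs matched survives, while `"all"` keeps all three rows because `a` matched everywhere.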
- pytesmo.temporal_matching.temporal_collocation(reference, other, window, method='nearest', return_index=False, return_distance=False, dropduplicates=False, dropna=False, checkna=False, flag=None, use_invalid=False)
Temporally collocates values to reference.
- Parameters:
reference (pd.DataFrame, pd.Series, or pd.DatetimeIndex) – The reference onto which other should be collocated. If this is a DataFrame or a Series, the index must be a DatetimeIndex. If the index is timezone-naive and other is not, the timezone of other will be assumed.
other (pd.DataFrame or pd.Series) – Data to be collocated. Must have a pd.DatetimeIndex as index. If the index is timezone-naive and reference is not, the timezone of the reference data will be assumed.
window (pd.Timedelta or float) – Window around reference timestamps in which to look for data. Floats are interpreted as number of days.
method (str, optional) –
Which method to use for the temporal collocation:
"nearest" (default): Uses the nearest valid neighbour. When this method is used, entries with duplicate index values in other will be dropped, and only the first of the duplicates is kept.
"mean": Takes the mean over the given window around the reference times.
return_index (boolean, optional) – Include the index of other in the matched dataframe (default: False). Only used with method="nearest". The index will be added as a separate column with the name "index_other".
return_distance (boolean, optional) – Include distance information between reference and other in the matched dataframe (default: False). This is only used with method="nearest", and implies return_index=True. The distance will be added as a separate column with the name "distance_other".
dropduplicates (bool, optional) – Whether to drop duplicated timestamps in other. Default is False, except when method="nearest", in which case this is enforced to be True.
dropna (bool, optional) – Whether to drop NaNs from the resulting dataframe (arising for example from duplicates with duplicates_nan=True or from missing values). This uses how="all", that is, only rows where all values are NaN are dropped. Default is False.
checkna (bool, optional) – Whether to check if only NaNs are returned (i.e. no match has been found). If set to True, raises a UserWarning in case no match has been found. Default is False.
flag (np.ndarray, str or None, optional) – Flag column as array or name of the flag column in other. If this is given, the column will be interpreted as validity indicator. Any nonzero values mark the row as invalid. Default is None.
use_invalid (bool, optional) – Whether to use invalid values marked by flag in case no valid values are available. Default is False.
- Returns:
collocated – Temporally collocated version of other.
- Return type:
pd.DataFrame or pd.Series
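The `flag`, `return_index` and `return_distance` options can be illustrated with a similar pandas sketch. The function `nearest_with_distance` is hypothetical and not pytesmo's implementation. It drops rows of a named Series `other` whose flag is nonzero before matching, then reports the matched timestamp in an "index_other" column and the offset in a "distance_other" column, reusing the column names documented above. The sign convention (matched time minus reference time), the half-width window and the tz-naive inputs are assumptions of this sketch, and `use_invalid` is not modelled.

```python
import numpy as np
import pandas as pd


def nearest_with_distance(ref_index, other, window, flag=None):
    """Hypothetical sketch: nearest-neighbour matching that also reports
    the matched timestamp and its offset from the reference time."""
    if not isinstance(window, pd.Timedelta):
        window = pd.Timedelta(days=window)  # floats are days
    if flag is not None:
        # Nonzero flag values mark rows as invalid; drop them before matching.
        other = other[np.asarray(flag) == 0]
    other = other[~other.index.duplicated(keep="first")].sort_index()
    # Reindexing the timestamps themselves yields the matched time (or NaT).
    times = pd.Series(other.index, index=other.index)
    result = pd.DataFrame({
        other.name: other.reindex(ref_index, method="nearest", tolerance=window),
        "index_other": times.reindex(ref_index, method="nearest", tolerance=window),
    })
    # Assumed sign convention: matched time minus reference time (NaT if none).
    result["distance_other"] = result["index_other"].to_numpy() - ref_index.to_numpy()
    return result


ref = pd.date_range("2020-01-01", periods=2, freq="D")
other = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-01 01:00", "2020-01-01 02:00",
                          "2020-01-02 12:00"]),
    name="sm",
)
# The 01:00 value is flagged, so the first reference time matches 02:00
# (value 2.0, distance 2 hours); the second finds nothing within 6 hours.
matched = nearest_with_distance(ref, other, 0.25, flag=[1, 0, 0])
```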