pytesmo.temporal_matching module

Provides functions for temporally collocating data from multiple dataframes.

pytesmo.temporal_matching.combined_temporal_collocation(reference, others, window, method='nearest', dropduplicates=False, dropna=False, combined_dropna=False, flag=None, checkna=False, use_invalid=False, add_ref_data=False)

Temporally collocates multiple dataframes to reference times.

Parameters:
  • reference (pd.DataFrame, pd.Series, or pd.DatetimeIndex) – The reference onto which the entries of others are collocated. If this is a DataFrame or a Series, the index must be a DatetimeIndex. If the index is timezone-naive, UTC will be assumed.

  • others (list/tuple of pd.DataFrame or pd.Series) – DataFrames/Series to be collocated. Each entry must have a pd.DatetimeIndex as index. If the index is timezone-naive, the timezone of the reference data will be assumed.

  • window (pd.Timedelta or float) – Window around reference timestamps in which to look for data. Floats are interpreted as number of days.

  • method (str, optional) –

    Which method to use for the temporal collocation:

    • "nearest" (default): Uses the nearest valid neighbour. When this method is used, entries with duplicate index values in each element of others are dropped, and only the first of the duplicates is kept.

    • "mean": Takes the mean over the given window around the reference times.

  • dropduplicates (bool, optional) – Whether to drop duplicated timestamps in others. Default is False, except when method="nearest", in which case this is enforced to be True.

  • dropna (bool, optional) – Whether to drop NaNs from the resulting dataframe (arising for example from duplicates with duplicates_nan=True or from missing values). Default is False.

  • combined_dropna (str or bool, optional) – Whether and how to drop NaNs from the resulting combined DataFrame. Can be "any", "all", True or False. "any" ensures that the output DataFrame only has values at times where all input frames had values, while "all" only drops rows where all values are NaN. True is the same as "any", and False (default) disables dropping NaNs.

  • checkna (bool, optional) – Whether to check if only NaNs are returned (i.e. no match has been found). If set to True, a UserWarning is issued in case no match has been found. Default is False.

  • flag (np.ndarray or None, optional) – Flag column as array. If this is given, the column will be interpreted as validity indicator. Any nonzero values mark the row as invalid. Default is None.

  • use_invalid (bool, optional) – Whether to use invalid values marked by flag in case no valid values are available. Default is False.

  • add_ref_data (bool, optional) – If reference is a DataFrame or Series, whether to add its data to the final collocated DataFrame. Default is False.

Returns:

collocated – Temporally collocated DataFrame with variables from all input frames merged together.

Return type:

pd.DataFrame or pd.Series
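
The effect of the combined_dropna options can be illustrated with a small plain-Python sketch. It only mirrors the semantics described above and is not pytesmo's implementation; the helper drop_nan_rows and the row layout (timestamp plus a dict of column values) are hypothetical and chosen for illustration.

```python
from datetime import datetime
from math import isnan, nan


def drop_nan_rows(rows, how):
    """Filter (timestamp, {column: value}) rows in the spirit of combined_dropna.

    Hypothetical helper: illustrates the documented semantics only.
    """
    if how is False:
        return list(rows)  # dropping disabled
    if how is True:
        how = "any"  # True is documented as an alias for "any"
    if how not in ("any", "all"):
        raise ValueError("how must be 'any', 'all', True or False")
    test = any if how == "any" else all
    # "any": drop a row if at least one value is NaN
    # "all": drop a row only if every value is NaN
    return [(t, vals) for t, vals in rows
            if not test(isnan(v) for v in vals.values())]


rows = [
    (datetime(2020, 1, 1), {"a": 1.0, "b": 2.0}),  # all frames matched
    (datetime(2020, 1, 2), {"a": nan, "b": 3.0}),  # one frame had no match
    (datetime(2020, 1, 3), {"a": nan, "b": nan}),  # no frame matched
]

kept_any = drop_nan_rows(rows, "any")
kept_all = drop_nan_rows(rows, "all")
```

With "any" only the first row survives, while "all" keeps the first two rows and removes only the row in which nothing matched.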

pytesmo.temporal_matching.temporal_collocation(reference, other, window, method='nearest', return_index=False, return_distance=False, dropduplicates=False, dropna=False, checkna=False, flag=None, use_invalid=False)

Temporally collocates values to reference.

Parameters:
  • reference (pd.DataFrame, pd.Series, or pd.DatetimeIndex) – The reference onto which other should be collocated. If this is a DataFrame or a Series, the index must be a DatetimeIndex. If the index is timezone-naive and other is not, the timezone of other will be assumed.

  • other (pd.DataFrame or pd.Series) – Data to be collocated. Must have a pd.DatetimeIndex as index. If the index is timezone-naive and reference is not, the timezone of the reference data will be assumed.

  • window (pd.Timedelta or float) – Window around reference timestamps in which to look for data. Floats are interpreted as number of days.

  • method (str, optional) –

    Which method to use for the temporal collocation:

    • "nearest" (default): Uses the nearest valid neighbour. When this method is used, entries with duplicate index values in other will be dropped, and only the first of the duplicates is kept.

    • "mean": Takes the mean over the given window around the reference times.

  • return_index (bool, optional) – Include index of other in matched dataframe (default: False). Only used with method="nearest". The index will be added as a separate column with the name "index_other".

  • return_distance (bool, optional) – Include distance information between reference and other in matched dataframe (default: False). This is only used with method="nearest", and implies return_index=True. The distance will be added as a separate column with the name "distance_other".

  • dropduplicates (bool, optional) – Whether to drop duplicated timestamps in other. Default is False, except when method="nearest", in which case this is enforced to be True.

  • dropna (bool, optional) – Whether to drop NaNs from the resulting dataframe (arising for example from duplicates with duplicates_nan=True or from missing values). This uses how="all", that is, only rows where all values are NaN are dropped. Default is False.

  • checkna (bool, optional) – Whether to check if only NaNs are returned (i.e. no match has been found). If set to True, a UserWarning is issued in case no match has been found. Default is False.

  • flag (np.ndarray, str or None, optional) – Flag column as array or name of the flag column in other. If this is given, the column will be interpreted as validity indicator. Any nonzero values mark the row as invalid. Default is None.

  • use_invalid (bool, optional) – Whether to use invalid values marked by flag in case no valid values are available. Default is False.

Returns:

collocated – Temporally collocated version of other.

Return type:

pd.DataFrame or pd.Series
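
To make the difference between the "nearest" and "mean" methods concrete, here is a minimal standard-library sketch of the two ideas. It is not how pytesmo implements them: the helpers as_timedelta, nearest_in_window and mean_in_window are hypothetical, the window is assumed to be inclusive at both ends, and the other timestamps are assumed to be sorted and free of duplicates.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from math import nan


def as_timedelta(window):
    # Floats are read as a number of days, as described for `window`.
    return window if isinstance(window, timedelta) else timedelta(days=window)


def nearest_in_window(ref_times, other_times, other_values, window):
    """Value at the closest `other` timestamp per reference time, else NaN.

    Hypothetical helper; assumes `other_times` is sorted and duplicate-free.
    """
    window = as_timedelta(window)
    matched = []
    for t in ref_times:
        pos = bisect_left(other_times, t)
        # Only the neighbours directly before and after t can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(other_times)]
        best = min(candidates, key=lambda i: abs(other_times[i] - t), default=None)
        if best is not None and abs(other_times[best] - t) <= window:
            matched.append(other_values[best])
        else:
            matched.append(nan)  # nothing inside the window
    return matched


def mean_in_window(ref_times, other_times, other_values, window):
    """Mean of all `other` values within +/- window of each reference time."""
    window = as_timedelta(window)
    means = []
    for t in ref_times:
        lo = bisect_left(other_times, t - window)
        hi = bisect_right(other_times, t + window)
        chunk = other_values[lo:hi]
        means.append(sum(chunk) / len(chunk) if chunk else nan)
    return means


ref = [datetime(2020, 1, d) for d in (1, 2, 3)]
other_t = [datetime(2020, 1, 1, 1), datetime(2020, 1, 1, 23), datetime(2020, 1, 3, 12)]
other_v = [1.0, 2.0, 3.0]

nearest = nearest_in_window(ref, other_t, other_v, window=0.25)  # 6 hours
means = mean_in_window(ref, other_t, other_v, window=1.0)        # 1 day
```

In this toy data the third reference time has no neighbour within six hours, so the "nearest" result is NaN there, whereas the one-day "mean" window still finds a value.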