pytesmo.time_series.anomaly module

Created on June 20, 2013

pytesmo.time_series.anomaly.calc_anomaly(Ser, window_size=35, climatology=None, respect_leap_years=True, return_clim=False)

Calculates the anomaly of a time series (pandas Series). Either climatology-based or moving-average-based anomalies can be calculated.

Parameters:
  • Ser (pandas.Series) – Input data (index must be a DatetimeIndex)

  • window_size (float, optional (default: 35)) – The size [days] of the moving-average window used to calculate the anomaly reference (only used if climatology is not provided)

  • climatology (pandas.Series (index: 1-366), optional (default: None)) – if provided, anomalies will be based on the climatology

  • timespan ([timespan_from, timespan_to], datetime.datetime(y,m,d), optional) – If set, only a subset

  • respect_leap_years (boolean, optional (default: True)) – If set, leap years will be respected when matching the climatology to the time series

  • return_clim (boolean, optional (default: False)) – If set to True, the return value will be a DataFrame which also contains the climatology time series. Only has an effect if a climatology is used.

Returns:

anomaly – Series containing the calculated anomalies. If return_clim is set to True, a DataFrame is returned instead, in which one column contains the anomalies and another the climatology broadcast over the whole index. If a climatology with a ‘std’ column was passed, that column is also included in the returned DataFrame.

Return type:

pandas.Series or pandas.DataFrame
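
The two anomaly types described above can be illustrated with plain pandas. This is a minimal sketch of the idea, not pytesmo's implementation: the centred rolling window, the simple day-of-year mean used as climatology, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustrative data only): a seasonal cycle that
# depends purely on the day of year, plus one artificial +1.0 event.
index = pd.date_range("2010-01-01", "2013-12-31", freq="D")
ser = pd.Series(np.sin(2 * np.pi * index.dayofyear / 365.25) + 0.5, index=index)
event_day = pd.Timestamp("2012-07-01")
ser.loc[event_day] += 1.0

# Moving-average based anomaly: subtract a 35-day rolling mean.
# Assumption: the window is centred on each observation.
window_size = 35
reference = ser.rolling(window=window_size, center=True, min_periods=1).mean()
anomaly_ma = ser - reference

# Climatology based anomaly: subtract the climatology value that belongs
# to each observation's day of year (climatology index runs from 1 to 366).
climatology = ser.groupby(ser.index.dayofyear).mean()
clim_on_index = pd.Series(
    climatology.reindex(ser.index.dayofyear).to_numpy(), index=ser.index
)
anomaly_clim = ser - clim_on_index
```

In this sketch the artificial event stands out in both anomaly series, while days that follow the seasonal cycle exactly have a climatology-based anomaly of zero.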

pytesmo.time_series.anomaly.calc_climatology(Ser, moving_avg_orig=5, moving_avg_clim=None, median=False, std=False, timespan=None, fill=nan, wraparound=True, respect_leap_years=False, interpolate_leapday=False, fillna=True, min_obs_orig=1, min_obs_clim=1, output_freq='day')

Calculates the climatology of a data set.

Parameters:
  • Ser (pandas.Series) – Time series to compute the climatology for (index must be a DatetimeIndex or Julian date)

  • moving_avg_orig (float, optional (default: 5)) – The size [days] of the moving-average window that will be applied to the input Series (gap filling, short-term rainfall correction)

  • moving_avg_clim (float, optional (default: None)) –

    The size [days] of the moving-average window that will be applied to the calculated climatology (long-term event correction). If None is passed, it will be derived from the ‘output_freq’ value:

    • ‘day’: 35

    • ‘month’: 3

  • median (boolean, optional (default: False)) – If set to True, the climatology will be based on the median instead of the mean

  • std (boolean, optional (default: False)) – If set to True, the output will have 2 columns: one for the median or mean and one for the standard deviation of the aggregated data points.

  • timespan ([timespan_from, timespan_to], datetime.datetime(y,m,d), optional) – Set this to calculate the climatology based on a subset of the input Series

  • fill (float or int, optional (default: np.nan)) – Fill value to use for days on which no climatology exists

  • wraparound (boolean, optional (default: True)) – If set, the climatology is wrapped around at the edges before the second running average (long-term event correction) is applied

  • respect_leap_years (boolean, optional (default: False)) – If set, leap years will be respected during the calculation of the climatology. Only valid if ‘output_freq’ is set to ‘day’.

  • interpolate_leapday (boolean, optional (default: False)) – <description>. Only valid if ‘output_freq’ is set to ‘day’.

  • fillna (boolean, optional (default: True)) – If set, NaN values in the moving average used for the calculation of the climatology will be filled

  • min_obs_orig (int (default: 1)) – Minimum number of observations required to give a valid output in the first moving average, which is applied to the input series

  • min_obs_clim (int (default: 1)) – Minimum number of observations required to give a valid output in the second moving average, which is applied to the calculated climatology

  • output_freq (str, optional (default: 'day')) – Determines the output frequency (time unit) of the climatology calculation, independently of the frequency of the ‘Ser’ input. Currently supported options are ‘day’ and ‘month’.

Returns:

climatology – Series containing the calculated climatology. The size of the series depends on the type of climatology being calculated, as set by ‘output_freq’:

  • 366 values for a daily climatology, behaving as a leap year

  • 12 values for a monthly climatology

If ‘std’ is set to True, the output will be a DataFrame with 2 columns: ‘climatology’ and ‘std’.

Return type:

pandas.Series or pandas.DataFrame
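
The two smoothing steps and the wraparound described in the parameters can be sketched with plain pandas. This is an illustrative approximation under stated assumptions, not pytesmo's implementation: the function name `sketch_climatology` is hypothetical, the windows are assumed to be centred, and days are grouped by the raw pandas day of year, which may differ from how pytesmo aligns leap and non-leap years.

```python
import numpy as np
import pandas as pd


def sketch_climatology(ser, moving_avg_orig=5, moving_avg_clim=35):
    """Illustrative daily climatology (hypothetical helper, not pytesmo code)."""
    # Step 1: smooth the input series (short-term correction).
    smoothed = ser.rolling(window=moving_avg_orig, center=True, min_periods=1).mean()

    # Step 2: average per day of year and make sure all 366 days are present.
    clim = smoothed.groupby(smoothed.index.dayofyear).mean()
    clim = clim.reindex(np.arange(1, 367))

    # Step 3: wrap the climatology around at both edges, so the second moving
    # average at the start of January also sees the end of December
    # (and the other way round).
    pad = moving_avg_clim // 2
    wrapped = pd.concat([clim.iloc[-pad:], clim, clim.iloc[:pad]])
    smoothed_clim = wrapped.rolling(
        window=moving_avg_clim, center=True, min_periods=1
    ).mean()

    # Step 4: drop the padding again, leaving 366 values indexed 1 to 366.
    return smoothed_clim.iloc[pad:-pad]


# Usage with synthetic data (illustrative only).
index = pd.date_range("2010-01-01", "2013-12-31", freq="D")
ser = pd.Series(np.sin(2 * np.pi * index.dayofyear / 365.25), index=index)
clim = sketch_climatology(ser)

# A constant input should yield a constant climatology.
const_clim = sketch_climatology(pd.Series(2.0, index=index))
```

Padding with half a window on each side is one simple way to implement the wraparound; without it, the values near day 1 and day 366 would be averaged over a truncated window only.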