pytesmo.time_series.grouping module
This module provides grouping functions that can be used together with pandas to create unusual time groupings, e.g. dekadal products, where there are three products per month with timestamps on the 10th, the 20th and the last day of the month.
- class pytesmo.time_series.grouping.TsDistributor(dates=None, date_ranges=None, yearless_dates=None, yearless_date_ranges=None)[source]
Bases:
object
- filter(idx: DatetimeIndex)[source]
Filter datetime index for a TimeSeriesDistributionSet
- Parameters:
idx (pd.DatetimeIndex) – Datetime index to split using the set
- Returns:
idx_filtered – Filtered Index that contains dates for the set
- Return type:
pd.DatetimeIndex
- select(df: DataFrame | Series | DatetimeIndex, set_nan=False)[source]
Select rows from a data frame or series with matching datetime indices.
- Parameters:
df (pd.DataFrame or pd.Series) – Must have a date time index, which is then filtered based on the dates.
set_nan (bool, optional (default: False)) – Instead of dropping rows that are not selected, set their values to nan.
- Returns:
df – The filtered input data
- Return type:
pd.DataFrame or pd.Series
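The effect of set_nan can be illustrated with a minimal, standard-library-only sketch. It does not use pytesmo or pandas: the helper name select_sketch, the dict-based series and the wanted set are all assumptions made for illustration, standing in for a time-indexed series and the dates held by a TsDistributor.

```python
import math
from datetime import datetime


def select_sketch(series, wanted, set_nan=False):
    """Hypothetical stand-in for the select() behaviour described above.

    series: dict mapping datetime -> float (a stand-in for a pd.Series)
    wanted: set of datetimes that count as "selected"
    """
    if set_nan:
        # Keep every timestamp, but replace unselected values with NaN.
        return {t: (v if t in wanted else math.nan) for t, v in series.items()}
    # Default: drop the entries that were not selected.
    return {t: v for t, v in series.items() if t in wanted}


series = {datetime(2020, 1, d): float(d) for d in (1, 2, 3)}
wanted = {datetime(2020, 1, 2)}

dropped = select_sketch(series, wanted)               # one entry left
masked = select_sketch(series, wanted, set_nan=True)  # three entries, two NaN
```

With set_nan=True the result keeps the original length, which is convenient when the output has to stay aligned with other series on the same index.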
- class pytesmo.time_series.grouping.YearlessDatetime(month: int, day: int = 1, hour: int = 0, minute: int = 0, second: int = 0)[source]
Bases:
object
Container class to store datetime information without a year. This is used to group data when the year is not relevant (e.g. seasonal analysis). The resolution is limited to seconds. Used by pytesmo.validation_framework.metric_calculators_adapters.TsDistributor.
- property doy: int
Get the day of year for this date, always assuming a leap year, i.e. 1 = Jan. 1st, 60 = Feb. 29th, 366 = Dec. 31st.
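The leap-year convention can be reproduced with the standard library alone. The following is a sketch of the numbering described above, not the class's actual implementation; the function name yearless_doy and the reference year 2000 are assumptions chosen for illustration.

```python
from datetime import date

# Any leap year works as a reference; 2000 is an arbitrary choice.
LEAP_YEAR = 2000


def yearless_doy(month, day=1):
    """Day of year for a month/day pair, always counted in a leap year."""
    return date(LEAP_YEAR, month, day).timetuple().tm_yday


jan_first = yearless_doy(1, 1)
feb_29th = yearless_doy(2, 29)
mar_first = yearless_doy(3, 1)
dec_31st = yearless_doy(12, 31)
```

Because a leap year is always assumed, every date from March 1st onward gets a day of year that is one higher than it would be in a common year.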
- pytesmo.time_series.grouping.group_by_day_bin(df, bins=[1, 11, 21, 32], start=False, dtindex=None)[source]
Calculates time groups for a given date range. With the default bins, the groups run from day 1-10, 11-20 and 21 to the last day of each month.
- Parameters:
df (pandas.DataFrame) – DataFrame with DateTimeIndex for which the grouping should be done
bins (list, optional) – bins in day of the month, default is for dekadal grouping
start (boolean, optional) – if set to True, the start of the bin will be the timestamp for each observation
dtindex (pandas.DatetimeIndex, optional) – precomputed DatetimeIndex that should be used for resulting groups, useful for processing of numerous datasets since it does not have to be computed for every call
- Returns:
grouped (pandas.core.groupby.DataFrameGroupBy) – DataFrame groupby object according to the day bins. On this object, functions like sum() or mean() can be called to get the desired aggregation.
dtindex (pandas.DatetimeIndex) – returned so that it can be reused if possible
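The dekadal grouping described here can be sketched without pandas to make the bin logic explicit. This is an illustration under stated assumptions, not the library's code: bin_label and group_means are hypothetical helpers, a plain dict stands in for the DataFrame, and the mean stands in for whatever aggregation would be called on the groupby object. The sketch also assumes the last bin edge lies beyond the largest day of the month, as it does in the default bins.

```python
from bisect import bisect_right
from calendar import monthrange
from collections import defaultdict
from datetime import date

BINS = [1, 11, 21, 32]  # default bin edges from the signature above


def bin_label(d, bins=BINS, start=False):
    """Representative date (bin start or bin end) for the bin containing d."""
    # 1-based bin number with bins[i - 1] <= d.day < bins[i]
    i = bisect_right(bins, d.day)
    if start:
        return date(d.year, d.month, bins[i - 1])
    # The bin end is capped at the real length of the month.
    last_day = monthrange(d.year, d.month)[1]
    return date(d.year, d.month, min(bins[i] - 1, last_day))


def group_means(values, bins=BINS, start=False):
    """Average all values that fall into the same day bin."""
    groups = defaultdict(list)
    for d, v in values.items():
        groups[bin_label(d, bins, start)].append(v)
    return {label: sum(vs) / len(vs) for label, vs in sorted(groups.items())}


# One value per day for February 2021 (28 days); the value is the day number.
daily = {date(2021, 2, day): float(day) for day in range(1, 29)}
means_by_end = group_means(daily)
means_by_start = group_means(daily, start=True)
```

In this sketch the third group of February 2021 is labelled with the 28th rather than the 31st, because the bin end is clipped to the last day of the month.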
- pytesmo.time_series.grouping.grouped_dates_between(start_date, end_date, bins=[1, 11, 21, 32], start=False)[source]
Gives all dates that represent a bin between a start and an end date. See the tests for an example.
- Parameters:
start_date (date) – start date
end_date (date) – end date
bins (list, optional) – bin start values as days in a month, e.g. [0,11,21] would be two bins: one with values 0<=x<11 and a second one with 11<=x<21
start (boolean, optional) – if True, the start of each bin is used as the representative date
- Returns:
tstamps – list of representative dates between start and end date
- Return type:
list of datetimes
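A standard-library sketch of one way to obtain such a list: walk the interval day by day and record the representative date of each bin that is touched. The helper name representative_dates is an assumption, and the sketch is not taken from the library; in particular, how the library treats an interval that begins or ends in the middle of a bin should be checked against its tests.

```python
from calendar import monthrange
from datetime import date, timedelta


def representative_dates(start_date, end_date, bins=(1, 11, 21, 32), start=False):
    """List the bin start or bin end dates touched by [start_date, end_date]."""
    labels = []
    day = start_date
    while day <= end_date:
        # Find the half-open bin [lo, hi) that contains this day of the month.
        for lo, hi in zip(bins[:-1], bins[1:]):
            if lo <= day.day < hi:
                if start:
                    label = date(day.year, day.month, lo)
                else:
                    last = monthrange(day.year, day.month)[1]
                    label = date(day.year, day.month, min(hi - 1, last))
                # Labels never decrease as the day advances, so comparing
                # with the previous label is enough to avoid duplicates.
                if not labels or labels[-1] != label:
                    labels.append(label)
                break
        day += timedelta(days=1)
    return labels


ends = representative_dates(date(2000, 1, 5), date(2000, 2, 25))
starts = representative_dates(date(2000, 1, 5), date(2000, 2, 25), start=True)
```

In this sketch a label can lie outside the requested interval: the interval above ends on February 25th, but the last bin end is February 29th, and the first bin start is January 1st although the interval begins on the 5th.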
- pytesmo.time_series.grouping.grp_to_datetimeindex(grps, bins, dtindex, start=False)[source]
Makes a DatetimeIndex that holds, for each entry, the timestamp of the beginning or end of the bin this entry belongs to.
- Parameters:
grps (numpy.array) – group numbers made by np.digitize(data, bins)
bins (list) – bin start values, e.g. [0,11,21] would be two bins: one with values 0<=x<11 and a second one with 11<=x<21
dtindex (pandas.DatetimeIndex) – same length as grps, gives the basis datetime for each group
start (boolean, optional) – if set to True, the start of the bin will be the timestamp for each observation
- Returns:
grpdt – DatetimeIndex where every date is the end of the bin (or its start, if start is True) that the datetime in the input dtindex belongs to
- Return type:
pd.DatetimeIndex
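The mapping from group numbers to bin timestamps can be sketched with the standard library. This is an illustration only: groups_to_dates is a hypothetical helper, plain lists stand in for the numpy array and the DatetimeIndex, and bisect_right is used in place of np.digitize because it yields the same 1-based bin numbers for increasing bin edges.

```python
from bisect import bisect_right
from calendar import monthrange
from datetime import datetime


def groups_to_dates(grps, bins, dates, start=False):
    """Turn 1-based group numbers into bin start or bin end datetimes."""
    out = []
    for g, d in zip(grps, dates):
        if start:
            day = bins[g - 1]  # left edge of the bin
        else:
            # Right edge minus one day, capped at the length of the month.
            day = min(bins[g] - 1, monthrange(d.year, d.month)[1])
        out.append(datetime(d.year, d.month, day))
    return out


bins = [1, 11, 21, 32]
dates = [datetime(2021, 4, 3), datetime(2021, 4, 15), datetime(2021, 4, 30)]

# Group number g satisfies bins[g - 1] <= day < bins[g].
grps = [bisect_right(bins, d.day) for d in dates]

bin_ends = groups_to_dates(grps, bins, dates)
bin_starts = groups_to_dates(grps, bins, dates, start=True)
```

April has 30 days, so the end of the third bin is April 30th in this sketch even though the last bin edge is 32.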