pytesmo.time_series.grouping module

This module provides grouping functions that can be used together with pandas to create unusual time groupings, e.g. dekadal products where there are three products per month, with timestamps on the 10th, the 20th and the last day of the month.

class pytesmo.time_series.grouping.TsDistributor(dates=None, date_ranges=None, yearless_dates=None, yearless_date_ranges=None)

Bases: object

filter(idx: DatetimeIndex)

Filter datetime index for a TimeSeriesDistributionSet

Parameters: idx (pd.DatetimeIndex) – Datetime index to split using the set
Returns: idx_filtered – Filtered index that contains dates for the set
Return type: pd.DatetimeIndex

select(df: DataFrame | Series | DatetimeIndex, set_nan=False)

Select rows from a data frame or series with matching datetime indices.

Parameters:

df (pd.DataFrame or pd.Series) – Must have a date time index, which is then filtered based on the dates.
set_nan (bool, optional (default: False)) – Instead of dropping rows that are not selected, set their values to nan.

Returns:

df – The filtered input data

Return type:

pd.DataFrame or pd.Series

class pytesmo.time_series.grouping.YearlessDatetime(month: int, day: int = 1, hour: int = 0, minute: int = 0, second: int = 0)

Bases: object

Container class to store datetime information without a year. This is used to group data when the year is not relevant (e.g. seasonal analysis). The resolution is limited to seconds. Used by pytesmo.validation_framework.metric_calculators_adapters.TsDistributor

day: int = 1

property doy: int: Get the day of year for this date, assuming a leap year, i.e. 1 = Jan 1st, 60 = Feb 29th, 366 = Dec 31st.

classmethod from_datetime(dt: datetime): Omit the year from the passed datetime to create a generic datetime.

hour: int = 0

minute: int = 0

month: int

second: int = 0

to_datetime(years: Tuple[int, ...] | int | None) → datetime | List | None: Convert the generic datetime to a datetime with a year. Feb 29th in a non-leap year returns None.
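The leap-year convention of doy and the None result for Feb 29th can be illustrated with a minimal standard-library sketch. It does not use pytesmo; the helper names doy_assuming_leap_year and with_year are hypothetical and only mirror the behaviour described above.

```python
import calendar
from datetime import datetime
from typing import Optional

LEAP_YEAR = 2000  # assumption: any leap year works as the reference year


def doy_assuming_leap_year(month: int, day: int) -> int:
    """Day of year counted in a leap year, so Feb 29th is always day 60."""
    return datetime(LEAP_YEAR, month, day).timetuple().tm_yday


def with_year(month: int, day: int, year: int) -> Optional[datetime]:
    """Attach a year; return None if the date does not exist in that year."""
    if month == 2 and day == 29 and not calendar.isleap(year):
        return None
    return datetime(year, month, day)


print(doy_assuming_leap_year(1, 1))    # 1
print(doy_assuming_leap_year(2, 29))   # 60
print(doy_assuming_leap_year(12, 31))  # 366
print(with_year(2, 29, 2020))          # 2020-02-29 00:00:00
print(with_year(2, 29, 2021))          # None
```

Counting in a leap year keeps the day of year stable across years: Mar 1st is always day 61, whether or not the actual year has a Feb 29th.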

pytesmo.time_series.grouping.group_by_day_bin(df, bins=[1, 11, 21, 32], start=False, dtindex=None)

Calculates time groups for a given date range. With the default bins, the groups cover days 1-10, 11-20 and 21 to the last day of each month.

Parameters:

df (pandas.DataFrame) – DataFrame with a DatetimeIndex for which the grouping should be done
bins (list, optional) – bin edges as days of the month; the default gives dekadal grouping
start (boolean, optional) – if set to True, the start of the bin is used as the timestamp for each observation
dtindex (pandas.DatetimeIndex, optional) – precomputed DatetimeIndex that should be used for the resulting groups; useful when processing numerous datasets, since it does not have to be computed for every call

Returns:

grouped (pandas.core.groupby.DataFrameGroupBy) – DataFrame groupby object according to the day bins. Functions like sum() or mean() can be called on this object to get the desired aggregation.
dtindex (pandas.DatetimeIndex) – returned so that it can be reused if possible
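The grouping described above can be sketched without pandas. The following is a minimal standard-library illustration, not the pytesmo implementation: every date is relabelled with the last day of its day-of-month bin, and the values are then averaged per label. The names bin_end and dekadal_mean are hypothetical.

```python
import calendar
from bisect import bisect_right
from collections import defaultdict
from datetime import date
from statistics import mean

BINS = [1, 11, 21, 32]  # dekadal default from the signature above


def bin_end(d: date, bins=BINS) -> date:
    """Return the last day of the day-of-month bin that contains d."""
    group = bisect_right(bins, d.day)  # 1, 2 or 3 for the default bins
    last_day = calendar.monthrange(d.year, d.month)[1]
    # The right bin edge is exclusive, and the last bin is capped at month end.
    return date(d.year, d.month, min(bins[group] - 1, last_day))


def dekadal_mean(series: dict) -> dict:
    """Average the values of a {date: value} mapping per day bin."""
    groups = defaultdict(list)
    for d, value in series.items():
        groups[bin_end(d)].append(value)
    return {stamp: mean(values) for stamp, values in sorted(groups.items())}


obs = {
    date(2021, 2, 3): 1.0,
    date(2021, 2, 9): 3.0,
    date(2021, 2, 15): 5.0,
    date(2021, 2, 25): 7.0,
}
print(dekadal_mean(obs))
# {datetime.date(2021, 2, 10): 2.0, datetime.date(2021, 2, 20): 5.0,
#  datetime.date(2021, 2, 28): 7.0}
```

Capping the last bin at the length of the month is what makes the third label fall on the 28th, 29th, 30th or 31st as appropriate.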

pytesmo.time_series.grouping.grouped_dates_between(start_date, end_date, bins=[1, 11, 21, 32], start=False)

Between a start and end date, give all dates that represent a bin. See the tests for an example.

Parameters:

start_date (date) – start date
end_date (date) – end date
bins (list, optional) – bin start values as days in a month, e.g. [0,11,21] gives two bins, the first with values 0<=x<11 and the second with 11<=x<21
start (boolean, optional) – if True the start of the bins is the representative date

Returns:

tstamps – list of representative dates between start and end date

Return type:

list of datetimes
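As a rough illustration of what "dates that represent a bin" means, the sketch below walks day by day through the range and records one representative date per bin it touches. It is a standard-library approximation under that reading of the description, not the library's code, and the name representative_dates is hypothetical.

```python
import calendar
from bisect import bisect_right
from datetime import date, timedelta


def representative_dates(start_date, end_date, bins=(1, 11, 21, 32), start=False):
    """List one representative date per day bin touched by the date range."""
    stamps = []
    current = start_date
    while current <= end_date:
        group = bisect_right(bins, current.day)
        if start:
            day = bins[group - 1]  # first day of the bin
        else:
            last_day = calendar.monthrange(current.year, current.month)[1]
            day = min(bins[group] - 1, last_day)  # last day of the bin
        stamp = date(current.year, current.month, day)
        # Days are visited in order, so a new bin always yields a new stamp.
        if not stamps or stamps[-1] != stamp:
            stamps.append(stamp)
        current += timedelta(days=1)
    return stamps


print(representative_dates(date(2007, 1, 1), date(2007, 2, 1)))
# [datetime.date(2007, 1, 10), datetime.date(2007, 1, 20),
#  datetime.date(2007, 1, 31), datetime.date(2007, 2, 10)]
print(representative_dates(date(2007, 1, 1), date(2007, 2, 1), start=True))
# [datetime.date(2007, 1, 1), datetime.date(2007, 1, 11),
#  datetime.date(2007, 1, 21), datetime.date(2007, 2, 1)]
```

In this sketch a representative date can lie outside the requested range: the range above ends on Feb 1st, but the end of its bin is Feb 10th.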

pytesmo.time_series.grouping.grp_to_datetimeindex(grps, bins, dtindex, start=False)

Makes a DatetimeIndex that holds, for each entry, the timestamp of the beginning or end of the bin this entry belongs to.

Parameters:

grps (numpy.array) – group numbers made by np.digitize(data, bins)
bins (list) – bin start values, e.g. [0,11,21] gives two bins, the first with values 0<=x<11 and the second with 11<=x<21
dtindex (pandas.DatetimeIndex) – same length as grps, gives the basis datetime for each group
start (boolean, optional) – if set to True, the start of the bin is used as the timestamp for each observation

Returns:

grpdt – DatetimeIndex where every date is the end (or, if start is True, the start) of the bin that the corresponding datetime in the input dtindex belongs to

Return type:

pd.DatetimeIndex
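The mapping from group numbers to bin timestamps can be sketched as follows. This is a standard-library illustration that returns a plain list instead of a pd.DatetimeIndex; the function name groups_to_timestamps is hypothetical, and bisect_right stands in for np.digitize with its default right=False, which yields the same group numbers for increasing bins.

```python
import calendar
from bisect import bisect_right
from datetime import datetime


def groups_to_timestamps(grps, bins, dates, start=False):
    """Map each date to the first or last day of its day-of-month bin."""
    stamps = []
    for grp, d in zip(grps, dates):
        if start:
            day = bins[grp - 1]  # left edge of the bin
        else:
            last_day = calendar.monthrange(d.year, d.month)[1]
            day = min(bins[grp] - 1, last_day)  # right edge, capped at month end
        stamps.append(datetime(d.year, d.month, day))
    return stamps


bins = [1, 11, 21, 32]
dates = [datetime(2021, 4, 5), datetime(2021, 4, 11), datetime(2021, 4, 30)]
grps = [bisect_right(bins, d.day) for d in dates]  # [1, 2, 3]

print(groups_to_timestamps(grps, bins, dates))
# April 10th, 20th and 30th
print(groups_to_timestamps(grps, bins, dates, start=True))
# April 1st, 11th and 21st
```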