mkfact {seas} | R Documentation |
Discretizes a date within a year into a bin (or factor) for analysis,
such as 11-day groups or by month.
# normal usage
mkfact(dat, width)

# dat is an integer Julian day and width is non-numeric
mkfact(dat, width, year)
dat | A data.frame with at least a date column (Date or POSIXct class). It can also be an integer specifying the Julian day (specify year if width is non-numeric). If it is omitted, the full number of days will be calculated for the year argument. |
width | One of many options; usually specifies the number of days in each bin (default is 11 days), but can also be "mon" for months; see details below. |
year | Required if dat is omitted, or if dat is a Julian day integer and width is non-numeric; used to determine whether the year is a leap year. |
This date function groups the days of a year into discrete bins (or
into a factor). Statistical and plotting functions can then be applied
to a variable contained within each bin. An example of this would be
to find the monthly temperature averages, where month is the bin.
If width is an integer, the width of each bin (except for the last)
will be exactly width days. Since the number of days in a year is not
constant, and is not always perfectly divisible by width, the number
of days in the last bin will vary. mkfact requires that the last bin
have at least 20% of the number of observations for a leap year;
otherwise it is merged into the second-to-last bin (which will then
have extra days). If width is a non-integer numeric (e.g. 366/12), the
width of each bin varies slightly. Using width = 366/12 is slightly
different from width = "mon". Leap years only affect the last bin.
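The integer-width rule described above can be sketched in base R. This is only an illustration, not the code used by mkfact: the helper name bin_days and its arguments are hypothetical, and it assumes that the 20% threshold is measured against the bin width, which the text does not state.

```r
# Hypothetical helper (not part of seas): bin Julian days into fixed-width groups.
bin_days <- function(jday, width, year_days = 366, min_frac = 0.2) {
  stopifnot(width >= 1, width == round(width))
  idx <- (jday - 1) %/% width        # 0-based bin index; days 1..width -> 0
  n_full <- year_days %/% width      # number of complete bins in a leap year
  leftover <- year_days %% width     # days left over for a partial last bin
  # ASSUMPTION: the 20% threshold is taken relative to the bin width
  if (leftover > 0 && leftover < min_frac * width) {
    idx <- pmin(idx, n_full - 1)     # fold the short tail into the previous bin
  }
  factor(idx + 1)
}

# width = 7: 2 leftover days (>= 20% of 7), so the short last bin is kept
weekly <- table(bin_days(1:365, 7))
# width = 60: 6 leftover days (< 20% of 60), so they are merged into bin 6
sixty <- table(bin_days(1:365, 60))
print(tail(weekly, 2))
print(sixty)
```

Under these assumptions, only the last bin changes size between a 365-day year and a leap year; every other bin keeps exactly width days.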
Other common classifications based on the Gregorian calendar can be
used if width is given as a character string. All of these systems are
arbitrary: they have different numbers of days in each bin, and leap
years affect the number of days in February. The most common, of
course, is by month ("mon"). Meteorological quarterly seasons ("DJF")
are based on grouping three months, starting with December. This style
of grouping is commonly used in climate literature, and is preferred
over the season names ‘winter’, ‘spring’, ‘summer’, and ‘autumn’,
which apply to only one hemisphere. The less common annual quarterly
divisions ("JFM") are similar, except that grouping begins with
January. Zodiac divisions ("zod") are included for demonstrative
purposes, and are based on the Tropical birth dates (common in
Western-culture horoscopes) starting with Aries (March 21).
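To make the "DJF" grouping concrete, here is a minimal base-R sketch that maps calendar months onto meteorological seasons. The helper djf_season is hypothetical and is not part of seas; it only illustrates the month arithmetic, and mkfact may build its factor differently.

```r
# Hypothetical helper (not part of seas): label dates by meteorological season.
djf_season <- function(date) {
  lev <- c("DJF", "MAM", "JJA", "SON")
  m <- as.integer(format(date, "%m"))  # month number 1..12, locale independent
  idx <- (m %% 12) %/% 3 + 1           # Dec, Jan, Feb -> 1; Sep, Oct, Nov -> 4
  factor(lev[idx], levels = lev)
}

days_2005 <- seq(as.Date("2005-01-01"), as.Date("2005-12-31"), by = "day")
# Number of days in each season for a non-leap year
print(table(djf_season(days_2005)))
```

The bins are unequal in length (90 to 92 days in 2005), which is the arbitrariness noted above; in a leap year only the DJF bin gains a day.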
Here is the complete list of options for the width argument:
numeric: the width of each bin (or group) in days
366/n: divide the year into n sections
"mon": month intervals (abbreviated month names)
"month": month intervals (full month names)
"DJF": meteorological quarterly divisions: DJF, MAM, JJA, SON
"JFM": annual quarterly divisions: JFM, AMJ, JAS, OND
"JF": annual six divisions: JF, MA, MJ, JA, SO, ND
"zod": zodiac intervals (abbreviated symbol names)
"zodiac": zodiac intervals (full zodiac names)
Returns an array of factors, one for each date given in dat.
See examples for its application.
Month names generated using "mon" or "month" are locale specific, and
depend on your operating system and system language settings.
Normally, abbreviated month names should have three characters or
fewer, with no trailing period. However, Microsoft-based operating
systems have an inconsistent set of abbreviated month names between
locales. For example, abbreviated month names in English locales have
three letters with no period at the end, while French locales have 3–4
letters with a period at the end. If your OS is POSIX, you should have
consistent month names in any locale.
To avoid any issues supporting locales, simply revert to a C locale
(i.e. Sys.setlocale(loc="C")).
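If changing the locale for the whole session is undesirable, one option is to switch only the time locale temporarily and restore it afterwards. The sketch below uses base R only; the wrapper name with_c_time_locale is hypothetical, and it assumes that English month abbreviations are acceptable labels.

```r
# Hypothetical wrapper (not part of seas): evaluate an expression under the
# C time locale, then restore the previous LC_TIME setting.
with_c_time_locale <- function(expr) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  force(expr)  # the lazily evaluated argument is forced here, under "C"
}

dates <- as.Date(c("2005-01-15", "2005-03-15", "2005-12-15"))
print(with_c_time_locale(format(dates, "%b")))

# Alternative that never touches the locale: index the built-in English
# abbreviations by month number.
mon <- factor(month.abb[as.integer(format(dates, "%m"))], levels = month.abb)
print(mon)
```

Both approaches give three-letter labels with no trailing period, regardless of the system language settings.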
The phase of the Gregorian solar year (begins Julian day 1, or January
1st) is not in sync with the phase of "DJF" (begins Julian day
335/336) or "zod" (begins Julian day 80/81). If either of these
systems is to be used, ensure that there are several years of data, or
that the data begin on the same Julian day as the system.
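The starting Julian days quoted above can be checked in base R with the "%j" (day of year) format. The helper name jday is hypothetical and is used here only for illustration.

```r
# Hypothetical helper: day of the year (1..366) for a date string
jday <- function(x) as.integer(format(as.Date(x), "%j"))

# "DJF" begins on December 1
print(jday("2005-12-01"))  # 335 in a non-leap year
print(jday("2004-12-01"))  # 336 in a leap year

# "zod" begins with Aries on March 21
print(jday("2005-03-21"))  # 80 in a non-leap year
print(jday("2004-03-21"))  # 81 in a leap year
```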
For instance, if one year's worth of data beginning on Julian day 1 is
factored into "DJF" bins, the first bin will mix data from the first
two months and from the last month. The last three bins will each have
a continuous set of data. If the values are not perfectly periodic,
the first bin will have higher variance, due to the mixing of data
separated by nearly a year.
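The effect can be illustrated with a deterministic synthetic series in base R. Everything in this sketch is an assumption made for illustration: the helpers season_idx, synth and djf_var are hypothetical, and the signal is a seasonal cosine plus an arbitrary linear drift of 0.05 units per day, so that it is not perfectly periodic.

```r
# Season index from the month number: 1 = DJF, 2 = MAM, 3 = JJA, 4 = SON
season_idx <- function(date) (as.integer(format(date, "%m")) %% 12) %/% 3 + 1

# Synthetic values: seasonal cycle plus a slow linear drift (assumed)
synth <- function(date) {
  doy <- as.integer(format(date, "%j"))
  drift <- 0.05 * as.numeric(date - as.Date("2005-01-01"))
  -cos(2 * pi * doy / 365) * 10 + drift
}

# Variance of the values that fall in the DJF bin
djf_var <- function(dates) var(synth(dates)[season_idx(dates) == 1])

# One year starting on Julian day 1: DJF mixes Jan-Feb with the following Dec
calendar_year <- seq(as.Date("2005-01-01"), as.Date("2005-12-31"), by = "day")
# One year starting on December 1: DJF is a single contiguous block
phased_year <- seq(as.Date("2004-12-01"), as.Date("2005-11-30"), by = "day")

v_cal <- djf_var(calendar_year)
v_phased <- djf_var(phased_year)
print(c(calendar = v_cal, phased = v_phased))
```

With this drift, the DJF variance for the calendar-year series is several times larger than for the series that starts on December 1, even though both cover 365 days of the same signal.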
M.W. Toews
http://en.wikipedia.org/wiki/Solar_calendar
# Demonstrate the number of days in each category
barplot(table(mkfact(width="mon", y=2005)),
        main="Number of days in each month")
barplot(table(mkfact(width="zod", y=2005)),
        main="Number of days in each zodiac sign")
barplot(table(mkfact(width="DJF", y=2005)),
        main="Number of days in each meteorological season")
barplot(table(mkfact(width=5, y=2005)),
        main="Number of days in 5-day categories")
barplot(table(mkfact(width=11, y=2005)),
        main="Number of days in 11-day categories")
barplot(table(mkfact(width=366/12, y=2005)),
        main="Number of days in 12-section year",
        sub="Note: not exactly the same as months")

# Application using synthetic data
dat <- data.frame(date=as.Date(paste(2005,1:365),"%Y %j"),
                  value=(-cos(1:365*2*pi/365)*10+rnorm(365)*3+10))
dat$d5 <- mkfact(dat,5)
dat$d11 <- mkfact(dat,11)
dat$month <- mkfact(dat,"mon")
dat$DJF <- mkfact(dat,"DJF")
plot(value ~ date, dat)
plot(value ~ d5, dat)
plot(value ~ d11, dat)
plot(value ~ month, dat)
plot(value ~ DJF, dat)
print(head(dat))
tapply(dat$value, dat$month, mean, na.rm=TRUE)
tapply(dat$value, dat$DJF, mean, na.rm=TRUE)