Emp.variog {ProbForecastGOP}    R Documentation
Calculates the empirical variogram of forecast errors, averaged over time.
Emp.variog(day, obs, forecast, id, coord1, coord2, cut.points=NULL, max.dist=NULL, nbins=300)
day: numeric vector containing the day of observation.
obs: numeric vector containing the observed weather quantity.
forecast: numeric vector containing the forecasted weather quantity.
id: vector with the ids of the meteorological stations in the dataset.
coord1: vector containing the longitudes of the meteorological stations.
coord2: vector containing the latitudes of the meteorological stations.
cut.points: numeric vector containing the cutpoints used for variogram binning.
max.dist: a numeric value giving the upper bound on the distance considered in the variogram computation.
nbins: a numeric value giving the number of bins for variogram binning. If both cut.points and nbins are supplied, nbins is ignored and cut.points is used for variogram binning.
The function includes a bias correction: it regresses the forecast on the observed weather quantity and computes the residuals. The empirical variogram of the residuals is then calculated as follows. For each day, the distances between all pairs of stations observed on that day are determined, and within each distance bin the squared differences of the residuals are summed. These daily sums are then averaged over time, with the weight for each bin given by the total number of station pairs falling in that bin, summed over all days.
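A minimal base-R sketch of the bias-correction step described above, assuming an ordinary least-squares fit of the forecast on the observed quantity via `lm()`, as the text states. The helper name `bias_corrected_residuals` and the toy values are hypothetical and are not part of ProbForecastGOP; the package's internal code may differ.

```r
# Hypothetical helper (not a ProbForecastGOP function): bias correction as
# described in the text, i.e. regress the forecast on the observed quantity
# by least squares and return the residuals.
bias_corrected_residuals <- function(obs, forecast) {
  fit <- lm(forecast ~ obs)
  unname(residuals(fit))
}

# Toy data, made up for illustration only
obs      <- c(1012, 1008, 1015, 1010, 1005, 1011)
forecast <- c(1013, 1007, 1017, 1009, 1006, 1012)
res <- bias_corrected_residuals(obs, forecast)
print(round(res, 3))
```

Because the fit includes an intercept, the residuals sum to zero and are uncorrelated with `obs`, which is what makes them usable as bias-free forecast errors.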
The formula used is:
$$\gamma(h) = \frac{1}{2 \sum_d N_{(h,d)}} \sum_d \sum_i \left( Y(x_i + h, d) - Y(x_i, d) \right)^2$$
where $\gamma(h)$ is the empirical variogram (semivariance) at distance $h$, $N_{(h,d)}$ is the number of pairs of stations recorded on day $d$ whose distance is equal to $h$, and $Y(x_i+h,d)$ and $Y(x_i,d)$ are, respectively, the bias-corrected residuals on day $d$ at the stations located at $x_i+h$ and $x_i$. Variogram binning is ignored in this formula.
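The following base-R sketch computes this time-averaged estimator for a given vector of cutpoints: for each day it forms all pairs of stations observed that day, bins their distances, accumulates squared residual differences and pair counts per bin, and finally divides the pooled sums by twice the pooled counts. It assumes Euclidean distance on the coordinate values and half-open bins (lower, upper]; the package's own distance metric and bin conventions may differ, and `pooled_variog` is a hypothetical name.

```r
# Hypothetical sketch (not the package implementation) of a time-averaged
# empirical variogram. resid: residuals; day: day labels; x, y: coordinates;
# cut.points: increasing bin boundaries.
pooled_variog <- function(resid, day, x, y, cut.points) {
  nb <- length(cut.points) - 1
  sq.sum  <- numeric(nb)  # squared residual differences, summed over days
  n.pairs <- numeric(nb)  # number of station pairs, summed over days
  for (d in unique(day)) {
    k <- which(day == d)
    if (length(k) < 2) next
    pr <- combn(k, 2)     # column j holds the indices of the j-th pair
    i1 <- pr[1, ]
    i2 <- pr[2, ]
    h   <- sqrt((x[i1] - x[i2])^2 + (y[i1] - y[i2])^2)  # Euclidean (assumption)
    sqd <- (resid[i1] - resid[i2])^2
    bin <- cut(h, breaks = cut.points, labels = FALSE)  # NA outside the cutpoints
    ok  <- !is.na(bin)
    n.pairs <- n.pairs + tabulate(bin[ok], nbins = nb)
    sq.sum  <- sq.sum + vapply(seq_len(nb),
                               function(b) sum(sqd[ok & bin == b]),
                               numeric(1))
  }
  list(bin.midpoints = (head(cut.points, -1) + tail(cut.points, -1)) / 2,
       number.pairs  = n.pairs,
       # Pooled over days: total squared differences / (2 * total pairs)
       empir.variog  = ifelse(n.pairs > 0, sq.sum / (2 * n.pairs), NA_real_))
}

# Toy data: three stations on a line; station 3 is missing on day 2
day   <- c(1, 1, 1, 2, 2)
x     <- c(0, 1, 3, 0, 1)
y     <- c(0, 0, 0, 0, 0)
resid <- c(0.5, -0.5, 1.0, 0.2, 0.0)
v <- pooled_variog(resid, day, x, y, cut.points = c(0, 1.5, 3.5))
print(v)
```

Pooling the counts before dividing means that days with many stations in a bin carry proportionally more weight than days with few, which matches the weighting described in the text.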
- Defaults -
If cut.points is not specified, the cutpoints are chosen so that there are nbins bins with approximately the same number of pairs per bin. If neither cut.points nor nbins is specified, the function by default chooses the cutpoints so that there are 300 bins with approximately the same number of pairs per bin. If both are provided, nbins is ignored and cut.points is used for variogram binning.
The default value for the maximum distance considered in the variogram computation is the 90th percentile of the distances between the stations.
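A hedged sketch of these default binning rules: pair distances are truncated at max.dist (by default the 90th percentile of the inter-station distances), and the cutpoints are placed at evenly spaced empirical quantiles so that each of the nbins bins holds roughly the same number of pairs. The helper name `default_cutpoints`, the use of `quantile()` with its default type, and starting the first bin at 0 are assumptions; the package may compute the percentile and the cutpoints differently. The bin midpoints, as in the bin.midpoints component, follow from consecutive cutpoints.

```r
# Hypothetical sketch of the default binning rules (not the package code).
# station.dist: distances between distinct stations
# pair.dist:    distances of all same-day station pairs, pooled over days
default_cutpoints <- function(station.dist, pair.dist, nbins = 300,
                              max.dist = NULL) {
  if (is.null(max.dist))
    max.dist <- unname(quantile(station.dist, 0.9))  # 90th percentile
  h <- pair.dist[pair.dist <= max.dist]              # truncate at max.dist
  # Equal-count bins: cutpoints at evenly spaced empirical quantiles of h
  cp <- unique(unname(quantile(h, probs = seq(0, 1, length.out = nbins + 1))))
  cp[1] <- 0  # start at 0 so the smallest distance falls in the first bin
  cp
}

set.seed(1)
station.dist <- runif(50, 0, 1000)    # toy distances, not real data
pair.dist    <- runif(2000, 0, 1000)
cp <- default_cutpoints(station.dist, pair.dist, nbins = 10)
counts <- table(cut(pair.dist, breaks = cp))  # pairs beyond max(cp) are dropped
midpoints <- (head(cp, -1) + tail(cp, -1)) / 2
print(counts)
```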
The function returns a list with the following components:
res.var: variance of the forecast errors.
bin.midpoints: numeric vector with the midpoints of the bins used in the empirical variogram computation.
number.pairs: numeric vector with the number of pairs per bin.
empir.variog: numeric vector with the empirical variogram values.
Depending on the data, the function may require substantial computing time. Consequently, if the interest is in producing probabilistic weather forecasts and generating ensemble members, it is advisable to save the output to a file and then use the Variog.fit and Field.sim functions.
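A minimal sketch of the save-then-reuse workflow suggested above, using base R's `saveRDS()` and `readRDS()`. The file name and the stand-in list (which only mimics the documented component names, with made-up values) are illustrative assumptions; in practice `variogram` would be the result of `Emp.variog()`.

```r
# Stand-in for the list returned by Emp.variog() (illustrative values only)
variogram <- list(res.var       = 1.2,
                  bin.midpoints = c(50, 150, 250),
                  number.pairs  = c(120, 118, 121),
                  empir.variog  = c(0.4, 0.7, 0.9))

# Save once after the (possibly slow) computation ...
path <- file.path(tempdir(), "variogram.rds")  # hypothetical file name
saveRDS(variogram, path)

# ... and reload it in a later session before fitting a variogram model
variogram2 <- readRDS(path)
str(variogram2)
```

`saveRDS()` stores a single R object, so the reloaded list can be assigned to any name, which is convenient when the variogram is computed once and reused across sessions.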
Gel, Y., Raftery, A. E., Gneiting, T., Berrocal, V. J. <veronica@stat.washington.edu>.
Gel, Y., Raftery, A. E., Gneiting, T. (2004). Calibrated probabilistic mesoscale weather field forecasting: The Geostatistical Output Perturbation (GOP) method (with discussion). Journal of the American Statistical Association, Vol. 99 (467), 575–583.
Cressie, N. A. C. (1993). Statistics for Spatial Data (revised ed.). Wiley: New York.
EmpDir.variog for the directional empirical variogram averaged over time, and Variog.fit for estimation of parameters in a parametric variogram model.
## Loading package and data
library(ProbForecastGOP)
data(slp)
day <- slp$date.obs
id <- slp$id.stat
coord1 <- slp$lon.stat
coord2 <- slp$lat.stat
obs <- slp$obs
forecast <- slp$forecast

## Computing variogram
## No specified cutpoints, no specified maximum distance
## Default number of bins
variogram <- Emp.variog(day=day, obs=obs, forecast=forecast, id=id,
    coord1=coord1, coord2=coord2, cut.points=NULL, max.dist=NULL, nbins=NULL)
## Plotting variogram
plot(variogram$bin.midpoints, variogram$empir.variog, xlab="Distance",
    ylab="Semi-variance", main="Empirical variogram")

## Computing variogram
## Specified cutpoints, specified maximum distance
## Unspecified number of bins
variogram <- Emp.variog(day=day, obs=obs, forecast=forecast, id=id, coord1=coord1,
    coord2=coord2, cut.points=seq(0, 1000, by=5), max.dist=800, nbins=NULL)
## Plotting variogram
plot(variogram$bin.midpoints, variogram$empir.variog, xlab="Distance",
    ylab="Semi-variance", main="Empirical variogram")