Why this post?

  1. Open-source software is constantly evolving and improving. Code related to some of my previous posts no longer works, because it relies on packages whose developers have since made changes and improvements, sometimes breaking the code I built on top of theirs. Even though it bothers me to some extent, I realize that this is a necessary and acceptable downside. The upside is of course the tremendous amount of time saved by reusing existing software instead of starting everything from scratch, an easily gratifying tradeoff. In this post I fix one of those broken code snippets.
  2. Another, somewhat related, issue is the way my posts are built: I first run the analysis, write the post, and paste the relevant clean code after removing everything unrelated. Every once in a while I need to mend the code after a reader reports, in the comments, some issue with running the pasted script. For example, the order in which I pasted the code on the site did not exactly match the order in which I ran it on my own console, so a variable needed at time \(t\) (and properly defined on my own console) only gets defined at time \(t+1\) in the pasted example. With an Rmarkdown solution this blunder simply cannot occur. The whole post is logically intertwined: no more uploading figures individually and pasting code. If something is wrong with the code, the html/pdf file will not be generated in the first place. Inspired by Petr Keil’s template suggestion for a scientific manuscript in R, I thought I would give it a go.
  3. I often praise all the hard work that is being done under Common Public Licenses. This post demonstrates how contributors can get their proper credit quite effortlessly, simply by incorporating the citations of their packages (see the sketch right after this list).
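
Just to show how little effort it takes, here is a minimal sketch; it assumes a knitr/Rmarkdown workflow, and the file name packages.bib and the exact package list are only for illustration:

# Each package ships its own citation entry, which R can print or convert to BibTeX:
citation("rugarch")
toBibtex(citation("quantmod"))
# knitr can dump BibTeX entries for a whole vector of packages,
# which the .Rmd header can then point to as its bibliography file:
knitr::write_bib(c("quantmod", "rugarch", "tseries", "forecast", "car"),
                 file = "packages.bib")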

Univariate Volatility Forecast Evaluation

In portfolio management, risk management and derivative pricing, volatility plays an important role. So important, in fact, that you can find more volatility models than you can handle (Wikipedia link). The question that follows is how well each model performs, in- and out-of-sample. I present here three ways to check. First let’s get the data and load the libraries needed for the post (written by Ryan (2013), Ghalanos (2014), Fox & Weisberg (2011), Trapletti & Hornik (2013) and Hyndman et al. (2014)):

library(quantmod) ; library(rugarch) ; library(tseries)
library(forecast) ; library(car)
end <- format(Sys.Date(), "%Y-%m-%d")
k <- 1 # most recent k years, change at will
start <- format(Sys.Date() - k*365, "%Y-%m-%d")
# Google Finance is no longer supported by quantmod, so we pull the data from Yahoo
dat0 <- as.matrix(getSymbols('SPY', src = "yahoo",
                             from = start, to = end,
                             auto.assign = FALSE, warnings = FALSE,
                             symbol.lookup = FALSE))
n <- NROW(dat0)
ret <- NULL
ret[2:n] <- dat0[2:n, 4] / dat0[1:(n-1), 4] - 1 # close-to-close daily returns
ret <- ret[-1]  # drop the leading NA
time <- as.Date(names(dat0[, 1])[-1])

Now we create two different models for the volatility: one based on a normal distribution assumption (Normal GARCH), the other allowing for an asymmetric volatility response and a Student-t distribution (GJR-GARCH). We use the squared returns as the volatility proxy. There are better proxies, but they require intra-day data which we do not have here.
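
For reference, a standard way to write the GJR-GARCH(1,1) conditional variance (up to notation, this is what rugarch fits) is

\[ \sigma^2_t = \omega + \left( \alpha + \gamma I_{t-1} \right) \varepsilon^2_{t-1} + \beta \sigma^2_{t-1}, \qquad I_{t-1} = \begin{cases} 1 & \text{if } \varepsilon_{t-1} < 0 \\ 0 & \text{otherwise} \end{cases} \]

The indicator term lets negative shocks push next-period volatility up by more than positive shocks of the same size; with \(\gamma = 0\) we are back to the plain GARCH(1,1).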

gjrtspec <- ugarchspec(mean.model = list(armaOrder = c(1,0)),
                       variance.model = list(model = "gjrGARCH"),
                       distribution = "std")
normspec <- ugarchspec(mean.model = list(armaOrder = c(1,0)), distribution = "norm")
Tgjrmodel <- ugarchfit(gjrtspec, ret)
Nmodel <- ugarchfit(normspec, ret)
# print(slotNames(Tgjrmodel))
Tgjrfit <- slot(Tgjrmodel, "fit")$sigma
Nfit <- slot(Nmodel, "fit")$sigma
ret.sq <- 250 * ret^2   # annualized squared returns (volatility proxy)
N.sq <- 250 * Nfit^2
Tgjr.sq <- 250 * Tgjrfit^2
par(las = 1)
plot(ret.sq ~ time, type = "l", bty = "n", xlab = "", ylab = "",
     main = "SPY annualized volatility and fitted values from the two models")
lines(Tgjr.sq ~ time, col = 2) ; lines(N.sq ~ time, col = 3)
legend("topright", c("Realized volatility", "GJR GARCH", "Normal GARCH"),
       lwd = 3, lty = 1, col = 1:3, bty = "n")

[Figure (chunk Volmodels): SPY annualized volatility and fitted values from the two models]

It seems that the GJR-GARCH adjusts more sharply to spikes in volatility and does a better job when volatility stays high for a while, but when volatility subsides the Normal GARCH has a closer fit. Now we want to test the two models to see which one is better. These are not forecasts per se, since we do not project forward, which is why the figure title reads fitted values; still, these fitted values are our guess for the volatility at each point in time. The peculiar nature of volatility is that we never really know what it is, not even in retrospect, don’t let anyone tell you differently; we can only decide on a proxy (more on this here). Here we use the very common squared returns as a proxy for the realized volatility, but there are another half dozen proxies to choose from, each indicating a different volatility number.
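
For example, since dat0 already holds the daily high and low prices, a range-based proxy such as the Parkinson high-low estimator can be computed directly. This is only an illustrative sketch (it assumes the usual OHLC column ordering returned by getSymbols, with High in column 2 and Low in column 3) and is not used in the rest of the post:

# Parkinson range-based proxy: (1 / (4 * log(2))) * log(High / Low)^2, annualized
park.sq <- 250 * log(dat0[2:n, 2] / dat0[2:n, 3])^2 / (4 * log(2))
lines(park.sq ~ time, col = 4, lty = 2)  # overlay on the plot above, if still open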

Regression-based test - Mincer-Zarnowitz regression

The idea is simple, regress actual (realized) values on forecasts:

\[ \sigma_{t+1} = \beta_0+ \beta_1 \widehat{\sigma}_{t+1} \]

Now we jointly test the hypothesis: \(\beta_0 = 0 ,\; \beta_1 = 1\). An intercept of zero means that your forecast is unbiased. Why? Say, by way of contradiction, that it is 0.02; this means that in order for the two sides to equate, we add 0.02 on average to the forecast, which means the forecast constantly underestimates the observed value. The slope should be 1, as to say, your forecast “explains” the observed value perfectly (on average). Here is the code (relies on Fox & Weisberg (2011)):

N.mz = lm(ret.sq~N.sq) ; summary(N.mz)$coef
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept) 0.003169   0.002384   1.329 1.850e-01
## N.sq        0.745490   0.140686   5.299 2.572e-07
T.mz = lm(ret.sq~Tgjr.sq) ; summary(T.mz)$coef 
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept) 0.005557   0.001894   2.935 3.652e-03
## Tgjr.sq     0.511468   0.085948   5.951 9.044e-09
linearHypothesis(N.mz, c("(Intercept) = 0", "N.sq = 1")) 
## Linear hypothesis test
## 
## Hypothesis:
## (Intercept) = 0
## N.sq = 1
## 
## Model 1: restricted model
## Model 2: ret.sq ~ N.sq
## 
##   Res.Df   RSS Df Sum of Sq    F Pr(>F)
## 1    250 0.113                         
## 2    248 0.112  2   0.00151 1.68   0.19
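
The same joint restriction can of course be tested for the GJR-GARCH fit; the output is omitted here, the call simply mirrors the one above:

linearHypothesis(T.mz, c("(Intercept) = 0", "Tgjr.sq = 1"))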

Pairwise comparison - Diebold-Mariano test

Proposed by Diebold & Mariano (2002), and treated under more general conditions by Giacomini & White (2006). Say you have two models which produce two sets of forecasts; you therefore have two sets of errors. Call these errors (Keil et al. 2012)

\[ e_j = \widehat{\sigma}_{model_j} - \sigma_{observed}, \qquad j \in \{ model_1, model_2 \} \]

In the case where the two methods are equally accurate, the difference between these two error vectors \(\{e_1 - e_2\}\) is zero on average (or a function of these vectors, e.g. \(e_1^2 - e_2^2\)). In the extreme case where you just replicate the forecasts using the same method, the difference is exactly zero. More importantly, we know how this difference is asymptotically distributed, so we can test whether it is indeed far from zero.

Students always have trouble understanding this point. The explanation I give is that it is impossible to judge the distance between 0 and 2 without knowing how likely a result of 2 really is. A result of 2 under a uniform distribution on \((-3, 3)\) is not as unlikely as a result of 2 under the standard normal distribution. I guess that is also why we call a probability a probability measure, since it measures.

The method that produces significantly smaller errors (typically squared errors or absolute errors) is preferred. You can easily extend this to more than one comparison (google “Tukey for all pairwise comparisons”, or look here for the multivariate extension of the Diebold-Mariano test).

dm.test((ret.sq - N.sq), (ret.sq - Tgjr.sq), alternative = c("two.sided"), h = 1, power = 2)
## 
##  Diebold-Mariano Test
## 
## data:  (ret.sq - N.sq)(ret.sq - Tgjr.sq)
## DM = -0.4354, Forecast horizon = 1, Loss function power = 2,
## p-value = 0.6636
## alternative hypothesis: two.sided

You can see that the p-value > 0.05, which means there is not much of a difference between the two models in this example.
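
If you want to see what is behind dm.test, here is a minimal hand computation of the h = 1, squared-error-loss statistic; it is only a rough sketch that skips the small-sample correction dm.test applies, so the numbers will differ slightly:

d <- (ret.sq - N.sq)^2 - (ret.sq - Tgjr.sq)^2  # loss differential (squared-error loss)
dm.stat <- mean(d) / sqrt(var(d) / length(d))  # asymptotically standard normal under H0
2 * pnorm(-abs(dm.stat))                       # two-sided p-value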

Jarque-Bera test

Say we have an accurate volatility forecast. We can then center the series and scale it using our forecasts for the standard deviation. To be precise, this guy:

\[\frac{original.series_t - mean(original.series)}{\widehat{\sigma}_t}\]

should have mean zero and standard deviation of one, IF we have a good estimate for the conditional volatility. The new standardized series will NOT, in general, be normally distributed, but you can use this test as a measure of how well the skewness and kurtosis of the original series are captured by your model. In fact, the idea for this post came to me when I read Pat’s post: garch-and-long-tails, where Pat was checking how kurtosis is (unconditionally) captured when we use a t-distribution instead of the normal in the GARCH model. The Jarque-Bera test is a natural extension since the higher moments, skewness and kurtosis, appear in the expression for the test statistic.
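
For reference, the Jarque-Bera statistic is

\[ JB = \frac{n}{6} \left( S^2 + \frac{(K-3)^2}{4} \right) \]

where \(n\) is the number of observations, \(S\) the sample skewness and \(K\) the sample kurtosis, so the closer the standardized series comes to zero skewness and a kurtosis of 3, the smaller the statistic.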

standardN <- scale(ret, scale = FALSE) / Nfit
standardTgjr <- scale(ret, scale = FALSE) / Tgjrfit
scaleret <- scale(ret) # no volatility model, just unconditional volatility
jarque.bera.test(scaleret)$stat
## X-squared 
##     7.322
jarque.bera.test(standardN)$stat
## X-squared 
##     25.62
jarque.bera.test(standardTgjr)$stat
## X-squared 
##     17.05

The statistic computed using the GJR-GARCH estimates is lower, indicating a better fit than that using the Normal-GARCH.
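
If you want to see which moment drives the difference, you can look at the sample skewness and kurtosis of each standardized series directly; here is a small sketch with hand-rolled helpers (rather than pulling in yet another package):

skew <- function(x) { x <- as.numeric(x); mean((x - mean(x))^3) / sd(x)^3 }
kurt <- function(x) { x <- as.numeric(x); mean((x - mean(x))^4) / sd(x)^4 } # not excess kurtosis
sapply(list(unconditional = scaleret, normal = standardN, gjr = standardTgjr),
       function(x) c(skewness = skew(x), kurtosis = kurt(x)))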

References and credits

Here are the references and packages used for the post:

Diebold, F.X. & Mariano, R.S. (2002). Comparing predictive accuracy. Journal of Business & Economic Statistics, 20, 134–144. Retrieved from http://dx.doi.org/10.1198/073500102753410444

Fox, J. & Weisberg, S. (2011). An R companion to applied regression, 2nd edn. Sage, Thousand Oaks, CA. Retrieved from http://socserv.socsci.mcmaster.ca/jfox/Books/Companion

Hyndman, R.J., with contributions from Athanasopoulos, G., Razbash, S., Schmidt, D., Zhou, Z., Khan, Y., Bergmeir, C. & Wang, E. (2014). Forecast: Forecasting functions for time series and linear models. Retrieved from http://CRAN.R-project.org/package=forecast

Ghalanos, A. (2014). Rugarch: Univariate GARCH models. Retrieved from http://CRAN.R-project.org/package=rugarch

Giacomini, R. & White, H. (2006). Tests of conditional predictive ability. Econometrica, 74, 1545–1578. Retrieved from http://dx.doi.org/10.1111/j.1468-0262.2006.00718.x

Keil, P., Belmaker, J., Wilson, A.M., Unitt, P. & Jetz, W. (2012). Downscaling of species distribution models: A hierarchical approach (R. Freckleton, Ed.). Methods Ecol Evol, 4, 82–94. Retrieved from http://dx.doi.org/10.1111/j.2041-210x.2012.00264.x

Ryan, J.A. (2013). Quantmod: Quantitative financial modelling framework. Retrieved from http://CRAN.R-project.org/package=quantmod

Trapletti, A. & Hornik, K. (2013). Tseries: Time series analysis and computational finance. Retrieved from http://CRAN.R-project.org/package=tseries