Generalized Regression

Linear regression is one of the most useful applications in the financial engineer’s tool-kit, but it suffers from a rather restrictive set of assumptions that limit its applicability in areas of research that are characterized by their focus on highly non-linear or correlated variables.  The latter problem, referred to as colinearity (or multicolinearity) arises very frequently in financial research, because asset processes are often somewhat (or even highly) correlated.  In a colinear system, one can test for the overall significant of the regression relationship, but one is unable to distinguish which of the explanatory variables is individually significant.  Furthermore, the estimates of the model paramaters, the weights applied to each explanatory variable, tend to be biased.

Over time, many attempts have been made to address this issue, one well-known example being ridge regression.  More recent attempts include lasso, elastic net and what I term generalized regression, which appear to offer significant advantages vs traditional regression techniques in situations where the variables are correlated.

In this note, I examine a variety of these technqiues and attempt to illustrate and compare their effectiveness.

You can downlaod a pdf here.

A Mathematica notebook is also available here.

 

Master’s in High Frequency Finance

I have been discussing with some potential academic partners the concept for a new graduate program in High Frequency Finance.  The idea is to take the concept of the Computational Finance program developed in the 1990s and update it to meet the needs of students in the 2010s.

The program will offer a thorough grounding in the modeling concepts, trading strategies and risk management procedures currently in use by leading investment banks, proprietary trading firms and hedge funds in US and international financial markets.  Students will also learn the necessary programming and systems design skills to enable them to make an effective contribution as quantitative analysts, traders, risk managers and developers.

I would be interested in feedback and suggestions as to the proposed content of the program.

Range-Based EGARCH Option Pricing Models (REGARCH)

The research in this post and the related paper on Range Based EGARCH Option pricing Models is focused on the innovative range-based volatility models introduced in Alizadeh, Brandt, and Diebold (2002) (hereafter ABD).  We develop new option pricing models using multi-factor diffusion approximations couched within this theoretical framework and examine their properties in comparison with the traditional Black-Scholes model.

The two-factor version of the model, which I have applied successfully in various option arbitrage strategies, encapsulates the intuively appealing idea of a trending long term mean volatility process, around which oscillates a mean-reverting, transient volatility process.  The option pricing model also incorporates asymmetry/leverage effects and well as correlation effects between the asset return and volatility processes, which results in a volatility skew. 

The core concept behind Range-Based Exponential GARCH model is Log-Range estimator discussed in an earlier post on volatility metrics, which contains a lengthy exposition of various volatility estimators and their properties. (Incidentally, for those of you who requested a copy of my paper on Estimating Historical Volatility, I have updated the post to include a link to the pdf).

We assume that the log stock price s follows a drift-less Brownian motion ds = sdW. The volatility of daily log returns, denoted h= s/sqrt(252), is assumed constant within each day, at ht from the beginning to the end of day t, but is allowed to change from one day to the next, from ht at the end of day t to ht+1 at the beginning of day t+1.  Under these assumptions, ABD show that the log range, defined as:

is to a very good approximation distributed as

where N[m; v] denotes a Gaussian distribution with mean m and variance v. The above equation demonstrates that the log range is a noisy linear proxy of log volatility ln ht.  By contrast, according to the results of Alizadeh, Brandt,and Diebold (2002), the log absolute return has a mean of 0.64 + ln ht and a variance of 1.11. However, the distribution of the log absolute return is far from Gaussian.  The fact that both the log range and the log absolute return are linear log volatility proxies (with the same loading of one), but that the standard deviation of the log range is about one-quarter of the standard deviation of the log absolute return, makes clear that the range is a much more informative volatility proxy. It also makes sense of the finding of Andersen and Bollerslev (1998) that the daily range has approximately the same informational content as sampling intra-daily returns every four hours.

Except for the model of Chou (2001), GARCH-type volatility models rely on squared or absolute returns (which have the same information content) to capture variation in the conditional volatility ht. Since the range is a more informative volatility proxy, it makes sense to consider range-based GARCH models, in which the range is used in place of squared or absolute returns to capture variation in the conditional volatility. This is particularly true for the EGARCH framework of Nelson (1990), which describes the dynamics of log volatility (of which the log range is a linear proxy).

ABD consider variants of the EGARCH framework introduced by Nelson (1990). In general, an EGARCH(1,1) model performs comparably to the GARCH(1,1) model of Bollerslev (1987).  However, for stock indices the in-sample evidence reported by Hentschel (1995) and the forecasting performance presented by Pagan and Schwert (1990) show a slight superiority of the EGARCH specification. One reason for this superiority is that EGARCH models can accommodate asymmetric volatility (often called the “leverage effect,” which refers to one of the explanations of asymmetric volatility), where increases in volatility are associated more often with large negative returns than with equally large positive returns.

The one-factor range-based model (REGARCH 1)  takes the form:

where the returns process Rt is conditionally Gaussian: Rt ~ N[0, ht2]

and the process innovation is defined as the standardized deviation of the log range from its expected value:

Following Engle and Lee (1999), ABD also consider multi-factor volatility models.  In particular, for a two-factor range-based EGARCH model (REGARCH2), the conditional volatility dynamics) are as follows:

and

where ln qt can be interpreted as a slowly-moving stochastic mean around which log volatility  ln ht makes large but transient deviations (with a process determined by the parameters kh, fh and dh).

The parameters q, kq, fq and dq determine the long-run mean, sensitivity of the long run mean to lagged absolute returns, and the asymmetry of absolute return sensitivity respectively.

The intuition is that when the lagged absolute return is large (small) relative to the lagged level of volatility, volatility is likely to have experienced a positive (negative) innovation. Unfortunately, as we explained above, the absolute return is a rather noisy proxy of volatility, suggesting that a substantial part of the volatility variation in GARCH-type models is driven by proxy noise as opposed to true information about volatility. In other words, the noise in the volatility proxy introduces noise in the implied volatility process. In a volatility forecasting context, this noise in the implied volatility process deteriorates the quality of the forecasts through less precise parameter estimates and, more importantly, through less precise estimates of the current level of volatility to which the forecasts are anchored.

read more

2-Factor REGARCH Model for the S&P500 Index

Yield Curve Construction Models – Tools & Techniques

Yield Curve

Yield curve models are used to price a wide variety of interest rate-contingent claims.  The existence of several different competing methods of curve construction available and there is no single standard method for constructing yield curves and alternate procedures are adopted in different business areas to suit local requirements and market conditions.  This fragmentation has often led to confusion amongst some users of the models as to their precise functionality and uncertainty as to which is the most appropriate modeling technique. In addition, recent market conditions, which inter-alia have seen elevated levels of LIBOR basis volatility, have served to heighten concerns amongst some risk managers and other model users about the output of the models and the validity of the underlying modeling methods.

The purpose of this review, which was carried out in conjunction with research analyst Xu Bai, now at Morgan Stanley, was to gain a thorough understanding of current methodologies, to validate their theoretical frameworks and implementation, identify any weaknesses in the current modeling methodologies, and to suggest improvements or alternative approaches that may enhance the accuracy, generality and robustness of modeling procedures.

Download paper here.

The Lognormal Mixture Variance Model

LNVM Voatility Surface

The LNVM model is a mixture of lognormal models and the model density is a linear combination of the underlying densities, for instance, log-normal densities. The resulting density of this mixture is no longer log-normal and the model can thereby better fit skew and smile observed in the market.  The model is becoming increasingly widely used for interest rate/commodity hybrids.

In this review of the model, I examine the mathematical framework of the model in order to gain an understanding of its key features and characteristics. 

Download article here.