Can Machine Learning Techniques Be Used To Predict Market Direction? The 1,000,000 Model Test.

During the 1990s the advent of Neural Networks unleashed a torrent of research on their applications in financial markets, accompanied by some rather extravagant claims about their predictive abilities.  Sadly, much of the research proved to be sub-standard and the results illusory, following which the topic was largely relegated to the bleachers, at least in the field of financial market research.

With the advent of new machine learning techniques such as Random Forests, Support Vector Machines and Nearest Neighbor Classification, there has been a resurgence of interest in non-linear modeling techniques and a flood of new research, a fair amount of it supportive of their potential for forecasting financial markets.  Once again, however, doubts about the quality of some of the research bring the results into question.

Against this background my co-researcher Dan Rico and I set out to address the question of whether these new techniques really do have predictive power, more specifically the ability to forecast market direction.  Using some excellent MATLAB toolboxes and a new software package, an Excel add-in called 11Ants, that makes large scale testing of multiple models a snap, we examined over 1,000,000 models and model-ensembles, covering just about every available non-linear technique.  The data set for our study comprised daily prices for a selection of US equity securities, together with a large selection of technical indicators for which some other researchers have claimed explanatory power.

In-Sample Equity Curve for Best Performing Nonlinear Model

The answer provided by our research was, without exception, in the negative: not one of the models tested showed any significant ability to predict the direction of any of the securities in our data set.  Furthermore, our study found that the best-performing models favored raw price data over technical indicator variables, suggesting that the latter have little explanatory power. 

As with Neural Networks, the principal difficulty with non-linear techniques appears to be curve-fitting and a failure to generalize:  while it is very easy to find models that provide an excellent fit to in-sample data, the forecasting performance out-of-sample is often very poor. 

Out-of-Sample Equity Curve for Best Performing Nonlinear Model
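The failure to generalize is easy to reproduce on pure noise. The sketch below is not the methodology of our study: it is a toy illustration in which the "models" are nothing more than random up/down calls, the market direction is a coin flip, and all names and sample sizes are hypothetical. Picking the best of many such models in-sample produces an impressive-looking hit rate that disappears out-of-sample.

```python
import numpy as np

def best_of_many(n_models=1000, n_in=250, n_out=1000, seed=0):
    """Pick the best of many random 'models' in-sample, then score it out-of-sample."""
    rng = np.random.default_rng(seed)
    # True market direction is a coin flip, so there is nothing to learn.
    actual_in = rng.integers(0, 2, size=n_in)
    actual_out = rng.integers(0, 2, size=n_out)
    # Each hypothetical "model" is a fixed sequence of random up/down calls.
    calls_in = rng.integers(0, 2, size=(n_models, n_in))
    calls_out = rng.integers(0, 2, size=(n_models, n_out))
    # In-sample hit rate of every model, then select the winner.
    acc_in = (calls_in == actual_in).mean(axis=1)
    best = int(np.argmax(acc_in))
    # Out-of-sample hit rate of the selected model only.
    acc_out = (calls_out[best] == actual_out).mean()
    return float(acc_in[best]), float(acc_out)

in_acc, out_acc = best_of_many()
print(f"best in-sample hit rate: {in_acc:.3f}, same model out-of-sample: {out_acc:.3f}")
```

The more models are searched, the better the winning in-sample hit rate looks, which is why a very large model search needs a genuinely untouched out-of-sample period.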

Some caveats about our own research apply.  First and foremost, it is of course impossible to prove a hypothesis in the negative.  Secondly, it is plausible that some markets are less efficient than others:  some studies have claimed success in developing predictive models due to the (relative) inefficiency of the F/X and futures markets, for example.  Thirdly, the choice of sample period may be criticized:  it could be that the models were over-conditioned on a too-lengthy in-sample data set, which in one case ran from 1993 to 2008, with just two years (2009-2010) of out-of-sample data.  The choice of sample was deliberate, however:  had we omitted the 2008 period from the “learning” data set, it would be very easy to criticize the study for failing to allow the algorithms to learn about the exceptional behavior of the markets during that turbulent year.

Despite these limitations, our research casts doubt on the findings of some less-extensive studies, which may be the result of sample-selection bias.  One characteristic of the most credible studies finding evidence in favor of market predictability, such as those by Pesaran and Timmermann, for instance (see paper for citations), is that the models they employ tend to incorporate independent explanatory variables, such as yield spreads, which do appear to have real explanatory power.  The findings of our study suggest that, absent such explanatory factors, the ability to predict markets using sophisticated non-linear techniques applied to price data alone may prove to be as illusory as it was in the 1990s.

Download paper here.

Range-Based EGARCH Option Pricing Models (REGARCH)

The research in this post and the related paper on Range-Based EGARCH Option Pricing Models is focused on the innovative range-based volatility models introduced in Alizadeh, Brandt, and Diebold (2002) (hereafter ABD).  We develop new option pricing models using multi-factor diffusion approximations couched within this theoretical framework and examine their properties in comparison with the traditional Black-Scholes model.

The two-factor version of the model, which I have applied successfully in various option arbitrage strategies, encapsulates the intuitively appealing idea of a trending long term mean volatility process, around which oscillates a mean-reverting, transient volatility process.  The option pricing model also incorporates asymmetry/leverage effects as well as correlation effects between the asset return and volatility processes, which results in a volatility skew.

The core concept behind the Range-Based Exponential GARCH model is the Log-Range estimator discussed in an earlier post on volatility metrics, which contains a lengthy exposition of various volatility estimators and their properties. (Incidentally, for those of you who requested a copy of my paper on Estimating Historical Volatility, I have updated the post to include a link to the pdf).

We assume that the log stock price $s$ follows a driftless Brownian motion $ds = \sigma\,dW$. The volatility of daily log returns, denoted $h = \sigma/\sqrt{252}$, is assumed constant within each day, at $h_t$ from the beginning to the end of day $t$, but is allowed to change from one day to the next, from $h_t$ at the end of day $t$ to $h_{t+1}$ at the beginning of day $t+1$.  Under these assumptions, ABD show that the log range, defined as:

$$D_t = \ln\left(\sup_{t < \tau \le t+1} s_\tau \;-\; \inf_{t < \tau \le t+1} s_\tau\right)$$

is to a very good approximation distributed as

$$D_t \sim N\left[0.43 + \ln h_t,\; 0.29^2\right]$$

where $N[m; v]$ denotes a Gaussian distribution with mean $m$ and variance $v$. The above equation demonstrates that the log range is a noisy linear proxy of log volatility $\ln h_t$.  By contrast, according to the results of Alizadeh, Brandt, and Diebold (2002), the log absolute return has a mean of $-0.64 + \ln h_t$ and a variance of $1.11^2$. However, the distribution of the log absolute return is far from Gaussian.  The fact that both the log range and the log absolute return are linear log volatility proxies (with the same loading of one), but that the standard deviation of the log range is about one-quarter of the standard deviation of the log absolute return, makes clear that the range is a much more informative volatility proxy. It also makes sense of the finding of Andersen and Bollerslev (1998) that the daily range has approximately the same informational content as sampling intra-daily returns every four hours.
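The comparison between the two proxies can be checked by simulation. The following is a minimal sketch, assuming a driftless Brownian log price with constant daily volatility and a discretized intraday path; the function name, the number of intraday steps and the daily volatility of 1% are all hypothetical choices. Because the path is sampled at discrete steps, the simulated range slightly understates the continuous-time range, so the log range mean comes out a little below its theoretical value.

```python
import numpy as np

def simulate_proxies(n_days=2000, n_steps=1000, h=0.01, seed=0):
    """Simulate daily log range and log absolute return, both net of ln h."""
    rng = np.random.default_rng(seed)
    # Intraday increments of the log price; each day has total volatility h.
    steps = rng.normal(0.0, h / np.sqrt(n_steps), size=(n_days, n_steps))
    # Log price path relative to the open, including the opening value of zero.
    paths = np.concatenate([np.zeros((n_days, 1)), np.cumsum(steps, axis=1)], axis=1)
    log_range = np.log(paths.max(axis=1) - paths.min(axis=1))
    log_abs_ret = np.log(np.abs(paths[:, -1]))
    # Subtract ln h so that only the proxy's own mean shift and noise remain.
    return log_range - np.log(h), log_abs_ret - np.log(h)

rng_proxy, ret_proxy = simulate_proxies()
print(f"log range:       mean {rng_proxy.mean():.2f}, sd {rng_proxy.std():.2f}")
print(f"log abs return:  mean {ret_proxy.mean():.2f}, sd {ret_proxy.std():.2f}")
```

Both proxies move one-for-one with log volatility, but the spread of the log range around it is far tighter, which is the sense in which the range is the more informative proxy.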

Except for the model of Chou (2001), GARCH-type volatility models rely on squared or absolute returns (which have the same information content) to capture variation in the conditional volatility $h_t$. Since the range is a more informative volatility proxy, it makes sense to consider range-based GARCH models, in which the range is used in place of squared or absolute returns to capture variation in the conditional volatility. This is particularly true for the EGARCH framework of Nelson (1990), which describes the dynamics of log volatility (of which the log range is a linear proxy).

ABD consider variants of the EGARCH framework introduced by Nelson (1990). In general, an EGARCH(1,1) model performs comparably to the GARCH(1,1) model of Bollerslev (1987).  However, for stock indices the in-sample evidence reported by Hentschel (1995) and the forecasting performance presented by Pagan and Schwert (1990) show a slight superiority of the EGARCH specification. One reason for this superiority is that EGARCH models can accommodate asymmetric volatility (often called the “leverage effect,” which refers to one of the explanations of asymmetric volatility), where increases in volatility are associated more often with large negative returns than with equally large positive returns.

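The one-factor range-based model (REGARCH 1) takes the form:

$$\ln h_t - \ln h_{t-1} = \kappa\,(\theta - \ln h_{t-1}) + \phi\,X_{t-1} + \delta\,\frac{R_{t-1}}{h_{t-1}}$$

where the returns process $R_t$ is conditionally Gaussian: $R_t \sim N[0, h_t^2]$

and the process innovation is defined as the standardized deviation of the log range $D_{t-1}$ from its expected value:

$$X_{t-1} = \frac{D_{t-1} - 0.43 - \ln h_{t-1}}{0.29}$$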

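Following Engle and Lee (1999), ABD also consider multi-factor volatility models.  In particular, for a two-factor range-based EGARCH model (REGARCH2), the conditional volatility dynamics are as follows:

$$\ln h_t - \ln h_{t-1} = \kappa_h\,(\ln q_{t-1} - \ln h_{t-1}) + \phi_h\,X_{t-1} + \delta_h\,\frac{R_{t-1}}{h_{t-1}}$$

and

$$\ln q_t - \ln q_{t-1} = \kappa_q\,(\theta - \ln q_{t-1}) + \phi_q\,X_{t-1} + \delta_q\,\frac{R_{t-1}}{h_{t-1}}$$

where $X_{t-1}$ is the standardized log range innovation, $R_{t-1}$ is the lagged return, and $\ln q_t$ can be interpreted as a slowly-moving stochastic mean around which log volatility $\ln h_t$ makes large but transient deviations (with a process determined by the parameters $\kappa_h$, $\phi_h$ and $\delta_h$).

The parameters $\theta$, $\kappa_q$, $\phi_q$ and $\delta_q$ determine, respectively, the long-run mean, the speed of reversion towards it, the sensitivity of the long-run mean to lagged absolute returns, and the asymmetry of absolute return sensitivity.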
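To make the two-factor recursion concrete, here is a minimal filtering sketch. It assumes the standard form of the model, in which each log volatility factor mean-reverts (log volatility towards the stochastic mean, the stochastic mean towards its long-run level) and is shocked by the standardized log range innovation and by the lagged standardized return. The constants 0.43 and 0.29 are the mean shift and standard deviation of the log range approximation; the function name, the starting values and the parameter values in the usage lines are hypothetical, not estimates from the paper.

```python
import numpy as np

def regarch2_filter(log_range, returns, theta, kappa_h, phi_h, delta_h,
                    kappa_q, phi_q, delta_q):
    """Run the two-factor log volatility recursion over observed data."""
    n = len(log_range)
    ln_h = np.empty(n)  # transient log volatility factor
    ln_q = np.empty(n)  # slowly-moving stochastic mean
    # Assumption: both factors start at the long-run mean.
    ln_h[0] = theta
    ln_q[0] = theta
    for t in range(1, n):
        # Standardized deviation of yesterday's log range from its expected value.
        x = (log_range[t - 1] - 0.43 - ln_h[t - 1]) / 0.29
        # Yesterday's return scaled by yesterday's volatility (asymmetry term).
        z = returns[t - 1] / np.exp(ln_h[t - 1])
        ln_h[t] = ln_h[t - 1] + kappa_h * (ln_q[t - 1] - ln_h[t - 1]) + phi_h * x + delta_h * z
        ln_q[t] = ln_q[t - 1] + kappa_q * (theta - ln_q[t - 1]) + phi_q * x + delta_q * z
    return ln_h, ln_q

# Usage on synthetic data with a constant true daily volatility of 1%.
rng = np.random.default_rng(0)
n_obs = 500
true_ln_h = np.full(n_obs, np.log(0.01))
sim_returns = rng.normal(0.0, np.exp(true_ln_h))
sim_log_range = 0.43 + true_ln_h + 0.29 * rng.normal(size=n_obs)
ln_h, ln_q = regarch2_filter(sim_log_range, sim_returns, theta=np.log(0.01),
                             kappa_h=0.2, phi_h=0.05, delta_h=-0.02,
                             kappa_q=0.01, phi_q=0.01, delta_q=-0.005)
print(f"mean filtered daily vol: {np.exp(ln_h).mean():.4f}")
```

Setting the two loadings on the stochastic mean to zero holds it fixed at the long-run level, in which case the recursion collapses to the one-factor model.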

The intuition is that when the lagged absolute return is large (small) relative to the lagged level of volatility, volatility is likely to have experienced a positive (negative) innovation. Unfortunately, as we explained above, the absolute return is a rather noisy proxy of volatility, suggesting that a substantial part of the volatility variation in GARCH-type models is driven by proxy noise as opposed to true information about volatility. In other words, the noise in the volatility proxy introduces noise in the implied volatility process. In a volatility forecasting context, this noise in the implied volatility process deteriorates the quality of the forecasts through less precise parameter estimates and, more importantly, through less precise estimates of the current level of volatility to which the forecasts are anchored.


2-Factor REGARCH Model for the S&P500 Index

On Testing Direction Prediction Accuracy

As regards the question of forecasting accuracy discussed in the paper on Forecasting Volatility in the S&P 500 Index, there are two possible misunderstandings here that need to be cleared up.  These arise from remarks by one commentator  as follows:

“An above 50% vol direction forecast looks good,.. but “direction” is biased when working with highly skewed distributions!   ..so it would be nice if you could benchmark it against a simple naive predictors to get a feel for significance, -or- benchmark it with a trading strategy and see how the risk/return performs.”

(i) The first point is simple, but needs saying: the phrase “skewed distributions” in the context of volatility modeling could easily be misconstrued as referring to the volatility skew. This, of course, is the term used to describe the higher implied vols seen in the Black-Scholes prices of OTM options. But in the Black-Scholes framework volatility is constant, not stochastic, and the “skew” referred to arises in the distribution of the asset return process, which has heavier tails than the Normal distribution (excess kurtosis and/or skewness). I realize that this is probably not what the commentator meant, but nonetheless it’s worth heading that possible misunderstanding off at the pass, before we go on.

(ii) I assume that the commentator was referring to the skewness in the volatility process, which is characterized by the LogNormal distribution. But the forecasting tests referenced in the paper are tests of the ability of the model to predict the direction of volatility, i.e. the sign of the change in the level of volatility from the current period to the next period. Thus we are looking at, not a LogNormal distribution, but the difference in two LogNormal distributions with equal mean – and this, of course, has an expectation of zero. In other words, the expected level of volatility for the next period is the same as the current period and the expected change in the level of volatility is zero. You can test this very easily for yourself by generating a large number of observations from a LogNormal process, taking the difference and counting the number of positive and negative changes in the level of volatility from one period to the next. You will find, on average, half the time the change of direction is positive and half the time it is negative.  

For instance, the following chart shows the distribution of the number of positive changes in the level of a LogNormally distributed random variable with mean and standard deviation of 0.5, for a sample of 1,000 simulations, each of 10,000 observations.  The sample mean (5,000.4) is very close to the expected value of 5,000.

Distribution Number of Positive Direction Changes
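A test along these lines takes only a few lines of code. The sketch below assumes that the mean and standard deviation of 0.5 refer to the parameters of the underlying Normal distribution, and the function name and seed are hypothetical. Note that 10,000 observations give 9,999 period-to-period changes, so the expected count of positive changes is 4,999.5 in this setup.

```python
import numpy as np

def count_positive_changes(n_sims=1000, n_obs=10_000, mu=0.5, sigma=0.5, seed=0):
    """Count the positive period-to-period changes in each simulated LogNormal series."""
    rng = np.random.default_rng(seed)
    # Independent LogNormal draws: each row is one simulated volatility series.
    levels = rng.lognormal(mean=mu, sigma=sigma, size=(n_sims, n_obs))
    # Change in level from one period to the next.
    changes = np.diff(levels, axis=1)
    return (changes > 0).sum(axis=1)

counts = count_positive_changes()
print(f"mean number of positive changes: {counts.mean():.1f}")
print(f"share of positive changes: {counts.mean() / 9999:.4f}")
```

However skewed the distribution of the level, the sign of the change between two independent draws from it is symmetric, which is why the share settles at one half.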

So, a naive predictor will forecast volatility to remain unchanged for the next period and by random chance approximately half the time volatility will turn out to be higher and half the time it will turn out to be lower than in the current period. Hence the default probability estimate for a positive change of direction is 50% and you would expect to be right approximately half of the time. In other words, the direction prediction accuracy of the naive predictor is 50%. This, then, is one of the key benchmarks you use to assess the ability of the model to predict market direction. That is what test statistics like Theil’s U do: they measure the performance relative to the naive predictor. The other benchmark we use is the change of direction predicted by the implied volatility of ATM options.

In this context, the model’s 61% or higher direction prediction accuracy is very significant (at the 4% level in fact) and this is reflected in the Theil’s U statistic of 0.82 (lower is better). By contrast, Theil’s U for the Implied Volatility forecast is 1.46, meaning that IV is a much worse predictor of 1-period-ahead changes in volatility than the naive predictor.
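Both benchmarks are straightforward to compute. The sketch below shows a one-sided binomial test of a hit rate against the 50% naive benchmark, and one common variant of Theil's U, defined here as the root mean squared error of the forecast divided by that of the no-change forecast. This may differ in detail from the definition used in the paper, and the sample size and volatility series in the usage lines are hypothetical, not the paper's data.

```python
import math

def hit_rate_p_value(hits, n):
    """One-sided P(X >= hits) for X ~ Binomial(n, 0.5), the naive benchmark."""
    return sum(math.comb(n, k) for k in range(hits, n + 1)) / 2 ** n

def theils_u(actual, forecast):
    """RMSE of the forecast relative to the RMSE of the no-change forecast.

    forecast[t] is the prediction of actual[t] made one period earlier.
    Values below 1 beat the naive predictor; values above 1 are worse.
    """
    model_sq = sum((f - a) ** 2 for f, a in zip(forecast[1:], actual[1:]))
    naive_sq = sum((prev - a) ** 2 for prev, a in zip(actual[:-1], actual[1:]))
    return math.sqrt(model_sq / naive_sq)

# Hypothetical example: 61 correct direction calls out of 100.
p_value = hit_rate_p_value(61, 100)
print(f"p-value for 61/100 correct: {p_value:.4f}")

# Hypothetical volatility levels and one-period-ahead forecasts of them.
vol = [0.20, 0.22, 0.19, 0.25, 0.23, 0.21]
pred = [0.20, 0.21, 0.20, 0.23, 0.24, 0.22]
print(f"Theil's U: {theils_u(vol, pred):.2f}")
```

The significance level attached to a given hit rate depends heavily on the number of forecasts, so the same 61% can be marginal on a short sample and overwhelming on a long one.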

On its face, it is because of this exceptional direction prediction accuracy that a simple strategy is able to generate what appear to be abnormal returns using the change of direction forecasts generated by the model, as described in the paper. In fact, the situation is more complicated than that, once you introduce the concept of a market price of volatility risk.

Market Timing in the S&P 500 Index Using Volatility Forecasts

There has been a good deal of interest in the market timing ideas discussed in my earlier blog post Using Volatility to Predict Market Direction, which discusses the research of Diebold and Christoffersen into the sign predictability induced by volatility dynamics.  The ideas are thoroughly explored in a QuantNotes article from 2006, which you can download here.

There is a follow-up article from 2006 in which Christoffersen, Diebold, Mariano and Tay develop the ideas further to consider the impact of higher moments of the asset return distribution on sign predictability and the potential for market timing in international markets (download here).

Trading Strategy
To illustrate some of the possibilities of this approach, we constructed a simple market timing strategy in which a position was taken in the S&P 500 index or in 90-Day T-Bills, depending on an ex-ante forecast of positive returns from the logit regression model (and using an expanding window to estimate the drift coefficient).  We assume that the position is held for 30 days and rebalanced at the end of each period.  In this test we make no allowance for market impact, or transaction costs.
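The switching rule can be sketched as follows. This is not the logit model used in the test: as a stand-in it uses the Gaussian sign-probability formula from the Christoffersen and Diebold framework, in which the probability of a positive return is the Normal CDF of the ratio of expected return to forecast volatility. The threshold of 0.55, the 12-period warm-up and all function and argument names are hypothetical.

```python
import math
from statistics import mean

def sign_probability(mu, sigma):
    """P(R > 0) for a return R ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))

def timing_returns(index_returns, tbill_returns, vol_forecasts,
                   threshold=0.55, min_obs=12):
    """Hold the index when the ex-ante sign probability exceeds the threshold.

    vol_forecasts[t] is the volatility forecast for period t made at the end
    of period t-1, so every decision uses only information available ex ante.
    """
    strategy = []
    for t in range(len(index_returns)):
        if t < min_obs:
            # Not enough history to estimate the drift: stay in T-Bills.
            strategy.append(tbill_returns[t])
            continue
        # Expanding-window drift estimate from returns up to period t-1.
        mu = mean(index_returns[:t])
        p_up = sign_probability(mu, vol_forecasts[t])
        strategy.append(index_returns[t] if p_up > threshold else tbill_returns[t])
    return strategy

# Usage with made-up per-period returns: calm markets versus turbulent ones.
idx = [0.01] * 20
bills = [0.002] * 20
calm = timing_returns(idx, bills, [0.02] * 20)
turbulent = timing_returns(idx, bills, [0.20] * 20)
print(sum(r == 0.01 for r in calm), sum(r == 0.01 for r in turbulent))
```

With the drift held fixed, the sign probability rises as forecast volatility falls, so the rule enters the market in calm periods and retreats to T-Bills in turbulent ones.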

Results
Annual returns for the strategy and for the benchmark S&P 500 Index are shown in the figure below.  The strategy performs exceptionally well in 1987, 1989 and 1995, when the ratio between expected returns and volatility remains close to optimum levels and the direction of the S&P 500 Index is highly predictable.  Of equal interest is that the strategy largely avoids the market downturn of 2000-2002 altogether, a period in which sign probabilities were exceptionally low.

In terms of overall performance, the model enters the market in 113 out of a total of 241 months (47%) and is profitable in 78 of them (69%).  The average gain is 7.5% vs. an average loss of –4.11% (ratio 1.83).  The compound annual return is 22.63%, with an annual volatility of 17.68%, alpha of 14.9% and Sharpe ratio of 1.10. 

The under-performance of the strategy in 2003 is explained by the fact that direction-of-change probabilities were rising from a very low base in Q4 2002 and did not reach trigger levels until the end of the year.  Even though the strategy out-performed the Index by a substantial margin of 6%, the performance in 2005 is of concern as market volatility was very low and probabilities overall were on a par with those seen in 1995.  Further tests are required to determine whether the failure of the strategy to produce an exceptional performance on par with 1995 was the result of normal statistical variation or due to changes in the underlying structure of the process requiring model recalibration.

Future Research & Development
The obvious next step is to develop the approach described above to formulate trading strategies based on sign forecasting in a universe of several assets, possibly trading binary options.  The approach also has potential for asset allocation, portfolio theory and risk management applications.

Market Timing in the S&P500 Index