Can Machine Learning Techniques Be Used To Predict Market Direction? The 1,000,000 Model Test.

During the 1990s the advent of Neural Networks unleashed a torrent of research on their applications in financial markets, accompanied by some rather extravagant claims about their predictive abilities.  Sadly, much of the research proved to be substandard and the results illusory, after which the topic was largely relegated to the bleachers, at least in the field of financial market research.

With the advent of new machine learning techniques such as Random Forests, Support Vector Machines and Nearest Neighbor Classification, there has been a resurgence of interest in non-linear modeling techniques and a flood of new research, a fair amount of it supportive of their potential for forecasting financial markets.  Once again, however, doubts about the quality of some of the research bring the results into question.

Against this background, my co-researcher Dan Rico and I set out to address the question of whether these new techniques really do have predictive power, more specifically the ability to forecast market direction.  Using some excellent MATLAB toolboxes and a new software package, an Excel add-in called 11Ants that makes large-scale testing of multiple models a snap, we examined over 1,000,000 models and model ensembles, covering just about every available non-linear technique.  The data set for our study comprised daily prices for a selection of US equity securities, together with a large selection of technical indicators for which some other researchers have claimed explanatory power.

In-Sample Equity Curve for Best Performing Nonlinear Model

The answer provided by our research was, without exception, in the negative: not one of the models tested showed any significant ability to predict the direction of any of the securities in our data set.  Furthermore, our study found that the best-performing models favored raw price data over technical indicator variables, suggesting that the latter have little explanatory power. 

As with Neural Networks, the principal difficulty with non-linear techniques appears to be curve-fitting and a failure to generalize:  while it is very easy to find models that provide an excellent fit to in-sample data, the forecasting performance out-of-sample is often very poor. 
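To make the curve-fitting problem concrete, here is a minimal sketch on a simulated random walk, where by construction there is nothing to predict. It is not code from our study: the 1-nearest-neighbor classifier, the 1,200-day sample, the 5 lags and the random seed are all arbitrary assumptions chosen for illustration. Because every training point is its own nearest neighbor, the in-sample hit rate is a perfect 100%, while the out-of-sample hit rate hovers around the 50% of a coin flip.

```python
import random

def make_dataset(n, lags, seed):
    """Simulate random-walk returns; features are the previous `lags` returns,
    the label is whether the next return is positive (hypothetical setup)."""
    rng = random.Random(seed)
    returns = [rng.gauss(0.0, 0.01) for _ in range(n + lags)]
    X = [returns[t - lags:t] for t in range(lags, n + lags)]
    y = [returns[t] > 0 for t in range(lags, n + lags)]
    return X, y

def predict_1nn(X_train, y_train, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    nearest = min(range(len(X_train)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    return y_train[nearest]

def nn_hit_rate(X_train, y_train, X_eval, y_eval):
    """Fraction of evaluation points whose direction the 1-NN model calls correctly."""
    hits = sum(predict_1nn(X_train, y_train, x) == label
               for x, label in zip(X_eval, y_eval))
    return hits / len(y_eval)

X, y = make_dataset(n=1200, lags=5, seed=42)
split = 600  # chronological split: first half in-sample, second half out-of-sample
X_in, y_in, X_out, y_out = X[:split], y[:split], X[split:], y[split:]

in_sample = nn_hit_rate(X_in, y_in, X_in, y_in)
out_of_sample = nn_hit_rate(X_in, y_in, X_out, y_out)
print(f"in-sample hit rate:     {in_sample:.3f}")
print(f"out-of-sample hit rate: {out_of_sample:.3f}")
```

The perfect in-sample fit says nothing about the model's ability to generalize, which is exactly the pattern described above.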

Out-of-Sample Equity Curve for Best Performing Nonlinear Model

Some caveats about our own research apply.  First and foremost, it is of course impossible to prove a negative.  Second, it is plausible that some markets are less efficient than others:  some studies have claimed success in developing predictive models due to the (relative) inefficiency of the FX and futures markets, for example.  Third, the choice of sample period may be criticized:  it could be that the models were over-conditioned on a too-lengthy in-sample data set, which in one case ran from 1993 to 2008, with just two years (2009-2010) of out-of-sample data.  The choice of sample was deliberate, however:  had we omitted 2008 from the “learning” data set, it would have been very easy to criticize the study for failing to allow the algorithms to learn about the exceptional behavior of the markets during that turbulent year.

Despite these limitations, our research casts doubt on the findings of some less extensive studies, which may be the result of sample-selection bias.  One characteristic of the most credible studies finding evidence in favor of market predictability, such as those by Pesaran and Timmermann (see paper for citations), is that the models they employ tend to incorporate independent explanatory variables, such as yield spreads, which do appear to have real explanatory power.  The findings of our study suggest that, absent such explanatory factors, the ability to predict markets using sophisticated non-linear techniques applied to price data alone may prove to be as illusory as it was in the 1990s.
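Sample-selection bias is easy to reproduce. The sketch below is a hypothetical illustration, not part of our study: it simulates a random walk, generates 500 random linear rules as stand-ins for "models" (every parameter is an arbitrary assumption), and keeps the rule with the best in-sample hit rate. That rule's in-sample score is typically well above 50% simply because it was selected from many candidates, while its out-of-sample score tends to fall back toward 50%.

```python
import random

def simulate_directions(n, lags, rng):
    """Random-walk returns turned into (previous `lags` returns, next-return-is-up) pairs."""
    r = [rng.gauss(0.0, 0.01) for _ in range(n + lags)]
    X = [r[t - lags:t] for t in range(lags, n + lags)]
    y = [r[t] > 0 for t in range(lags, n + lags)]
    return X, y

def rule_hit_rate(w, X, y):
    """Fraction of periods where the rule 'up if w . x > 0' calls the direction correctly."""
    hits = sum((sum(a * b for a, b in zip(w, x)) > 0) == up for x, up in zip(X, y))
    return hits / len(y)

rng = random.Random(7)
X, y = simulate_directions(500, 5, rng)
X_in, y_in, X_out, y_out = X[:250], y[:250], X[250:], y[250:]

# 500 hypothetical "models": random linear rules with no genuine predictive content
models = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(500)]
in_scores = [rule_hit_rate(w, X_in, y_in) for w in models]
best = max(range(len(models)), key=in_scores.__getitem__)

best_in_sample = in_scores[best]
best_out_of_sample = rule_hit_rate(models[best], X_out, y_out)
print(f"best of {len(models)} rules, in-sample hit rate: {best_in_sample:.3f}")
print(f"same rule, out-of-sample hit rate:   {best_out_of_sample:.3f}")
```

The more candidates a study searches over, the larger this in-sample illusion becomes, which is why the number of models tested matters when judging reported results.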

Download paper here.

Market Timing in the S&P 500 Index Using Volatility Forecasts

There has been a good deal of interest in the market timing ideas discussed in my earlier blog post Using Volatility to Predict Market Direction, which covers the research of Diebold and Christoffersen into the sign predictability induced by volatility dynamics.  The ideas are thoroughly explored in a QuantNotes article from 2006, which you can download here.
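The core of the sign-predictability argument fits in one line. If the return is R = μ + σε, where σ is the conditional volatility forecast and ε has zero mean, unit variance and CDF F, then P(R > 0) = 1 - F(-μ/σ). A changing volatility forecast therefore moves the probability of a positive return whenever μ is nonzero. The sketch below assumes ε is standard normal (so F is the normal CDF) and uses hypothetical values of μ and σ; it illustrates the formula rather than reproducing the model in the article.

```python
from statistics import NormalDist

def prob_positive_return(mu, sigma):
    """P(R > 0) when R = mu + sigma * eps and eps ~ N(0, 1) (normality is an assumption)."""
    return 1.0 - NormalDist().cdf(-mu / sigma)

mu = 0.01  # hypothetical expected monthly return (1%)
for sigma in (0.02, 0.04, 0.08):  # hypothetical monthly volatility forecasts
    print(f"sigma={sigma:.2f}  P(R>0)={prob_positive_return(mu, sigma):.3f}")
```

With μ = 1%, the probability falls from roughly 0.69 at σ = 2% to about 0.55 at σ = 8%. With μ = 0 it is exactly 0.5 whatever σ is, which is why volatility dynamics induce sign predictability only when the drift is nonzero.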

There is a follow-up article from 2006 in which Christoffersen, Diebold, Mariano and Tay develop the ideas further to consider the impact of higher moments of the asset return distribution on sign predictability and the potential for market timing in international markets (download here).
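One standard way to let skewness and kurtosis enter the sign probability is a Gram-Charlier expansion of the standardized return distribution. This sketch does not claim to match the specification used in the follow-up paper; the function names and parameter values are hypothetical. It keeps the setup R = μ + σε and replaces the normal CDF of ε with its Gram-Charlier approximation.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def gram_charlier_cdf(z, skew, excess_kurt):
    """Gram-Charlier (Type A) approximation to the CDF of a standardized variable:
    F(z) = Phi(z) - phi(z) * (skew/6 * He2(z) + excess_kurt/24 * He3(z)),
    where He2(z) = z^2 - 1 and He3(z) = z^3 - 3z are Hermite polynomials."""
    he2 = z * z - 1.0
    he3 = z ** 3 - 3.0 * z
    return _STD_NORMAL.cdf(z) - _STD_NORMAL.pdf(z) * (skew / 6.0 * he2 + excess_kurt / 24.0 * he3)

def prob_positive_return_gc(mu, sigma, skew=0.0, excess_kurt=0.0):
    """P(R > 0) for R = mu + sigma * eps, with eps standardized (mean 0, variance 1)."""
    return 1.0 - gram_charlier_cdf(-mu / sigma, skew, excess_kurt)

mu, sigma = 0.01, 0.03  # hypothetical monthly drift and volatility forecast
for skew in (0.0, -0.5):
    print(f"skew={skew:+.1f}  P(R>0)={prob_positive_return_gc(mu, sigma, skew):.3f}")
```

With zero skewness and excess kurtosis the result collapses to the normal case. At μ/σ = 1/3, a modest negative skew raises the probability of a positive return, consistent with a long left tail typically pushing the median above the mean. The expansion is only a valid density for moderate skewness and kurtosis, so large values should be avoided.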

Trading Strategy
To illustrate some of the possibilities of this approach, we constructed a simple market timing strategy in which a position was taken either in the S&P 500 Index or in 90-day T-Bills, depending on an ex-ante forecast of the probability of a positive return from the logit regression model (using an expanding window to estimate the drift coefficient).  We assume that the position is held for 30 days and rebalanced at the end of each period.  In this test we make no allowance for market impact or transaction costs.
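The switching rule can be sketched as follows. This is not the logit model used in the test: as a plainly swapped-in simplification, it computes the sign probability from the normal-distribution formula, with the drift estimated on an expanding window and a volatility forecast assumed to be supplied from elsewhere. The trigger threshold, minimum history and input series are hypothetical, and, like the test itself, the sketch ignores market impact and transaction costs.

```python
from statistics import NormalDist, fmean

def timing_strategy(index_ret, tbill_ret, vol_forecast, threshold=0.55, min_history=12):
    """Hold the index for a period when the forecast P(up) exceeds `threshold`,
    otherwise hold T-bills.

    index_ret[t], tbill_ret[t]: returns realized over holding period t.
    vol_forecast[t]: volatility forecast for period t, assumed known before t starts.
    """
    std_normal = NormalDist()
    strat_ret, in_market = [], []
    for t in range(min_history, len(index_ret)):
        mu_hat = fmean(index_ret[:t])  # expanding-window drift: uses only periods before t
        p_up = 1.0 - std_normal.cdf(-mu_hat / vol_forecast[t])
        hold_index = p_up > threshold
        strat_ret.append(index_ret[t] if hold_index else tbill_ret[t])
        in_market.append(hold_index)
    return strat_ret, in_market

# Hypothetical inputs: a steady 1% drift, with calm (4%) versus turbulent (10%) volatility.
index_ret, tbills = [0.01] * 24, [0.003] * 24
calm, calm_in = timing_strategy(index_ret, tbills, [0.04] * 24)
turbulent, turbulent_in = timing_strategy(index_ret, tbills, [0.10] * 24)
print(sum(calm_in), sum(turbulent_in))
```

Under the normal assumption, P(up) exceeds 0.5 exactly when the estimated drift is positive, so volatility forecasts only change the timing decision when the trigger level is set above 0.5; in the example, the same drift keeps the strategy invested when volatility is low and in T-bills when it is high.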

Results
Annual returns for the strategy and for the benchmark S&P 500 Index are shown in the figure below.  The strategy performs exceptionally well in 1987, 1989 and 1995, when the ratio between expected returns and volatility remained close to optimal levels and the direction of the S&P 500 Index was highly predictable.  Of equal interest is that the strategy largely avoids the market downturn of 2000-2002, a period in which sign probabilities were exceptionally low.

In terms of overall performance, the model enters the market in 113 out of a total of 241 months (47%) and is profitable in 78 of them (69%).  The average gain is 7.5% vs. an average loss of –4.11% (ratio 1.83).  The compound annual return is 22.63%, with an annual volatility of 17.68%, alpha of 14.9% and Sharpe ratio of 1.10. 
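Statistics like those above can be computed from the sequence of period returns along the following lines. This sketch rests on stated assumptions: each period is one month, a "profitable" period is one with a strictly positive return, and the Sharpe ratio uses the common convention of annualized mean excess return divided by annualized volatility. The post does not say which conventions were used, so the sketch's output need not match the reported figures exactly.

```python
import math
from statistics import fmean, stdev

def summary_stats(strategy_ret, in_market, rf_per_period=0.0, periods_per_year=12):
    """Performance summary for a timing strategy.

    strategy_ret: strategy return in each period (index or T-bills).
    in_market: True where the strategy held the index in that period.
    """
    market = [r for r, held in zip(strategy_ret, in_market) if held]
    gains = [r for r in market if r > 0]
    losses = [r for r in market if r < 0]
    avg_gain = fmean(gains) if gains else float("nan")
    avg_loss = fmean(losses) if losses else float("nan")
    years = len(strategy_ret) / periods_per_year
    growth = math.prod(1.0 + r for r in strategy_ret)
    ann_vol = stdev(strategy_ret) * math.sqrt(periods_per_year)
    excess = [r - rf_per_period for r in strategy_ret]
    return {
        "periods_in_market": len(market),
        "pct_in_market": len(market) / len(strategy_ret),
        "hit_rate": len(gains) / len(market) if market else float("nan"),
        "avg_gain": avg_gain,
        "avg_loss": avg_loss,
        "gain_loss_ratio": avg_gain / abs(avg_loss),
        "cagr": growth ** (1.0 / years) - 1.0,
        "ann_vol": ann_vol,
        "sharpe": fmean(excess) * periods_per_year / ann_vol,
    }

# Hypothetical six-period example: four periods in the index, two in T-bills.
rets = [0.02, -0.01, 0.003, 0.03, 0.003, -0.02]
held = [True, True, False, True, False, True]
stats = summary_stats(rets, held, rf_per_period=0.003)
for name, value in stats.items():
    print(f"{name:>18}: {value:.4f}")
```

As a quick arithmetic check on the reported counts, 113/241 ≈ 0.469 and 78/113 ≈ 0.690, matching the 47% and 69% quoted.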

The underperformance of the strategy in 2003 is explained by the fact that direction-of-change probabilities were rising from a very low base in Q4 2002 and did not reach trigger levels until the end of the year.  Even though the strategy outperformed the Index by a substantial margin of 6%, its performance in 2005 is of concern, as market volatility was very low and probabilities overall were on a par with those seen in 1995.  Further tests are required to determine whether the strategy's failure to produce an exceptional performance on a par with 1995 was the result of normal statistical variation or of changes in the underlying structure of the process requiring model recalibration.

Future Research & Development
The obvious next step is to develop the approach described above to formulate trading strategies based on sign forecasting in a universe of several assets, possibly trading binary options.  The approach also has potential for asset allocation, portfolio theory and risk management applications.
