Can Machine Learning Techniques Be Used To Predict Market Direction? The 1,000,000 Model Test.

During the 1990’s the advent of Neural Networks unleashed a torrent of research on their applications in financial markets, accompanied by some rather extravagant claims about their predicative abilities.  Sadly, much of the research proved to be sub-standard and the results illusionary, following which the topic was largely relegated to the bleachers, at least in the field of financial market research.

With the advent of new machine learning techniques such as Random Forests, Support Vector Machines and Nearest Neighbor Classification, there has been a resurgence of interest in non-linear modeling techniques and a flood of new research, a fair amount of it supportive of their potential for forecasting financial markets.  Once again, however, doubts about the quality of some of the research bring the results into question.

SSALGOTRADING AD

Against this background I and my co-researcher Dan Rico set out to address the question of whether these new techniques really do have predicative power, more specifically the ability to forecast market direction.  Using some excellent MatLab toolboxes and a new software package, an Excel Addin called 11Ants, that makes large scale testing of multiple models a snap, we examined over 1,000,000 models and model-ensembles, covering just about every available non-linear technique.  The data set for our study comprised daily prices for a selection of US equity securities, together with a large selection of technical indicators for which some other researchers have claimed explanatory power.

In-Sample Equity Curve for Best Performing Nonlinear Model
In-Sample Equity Curve for Best Performing Nonlinear Model

The answer provided by our research was, without exception, in the negative: not one of the models tested showed any significant ability to predict the direction of any of the securities in our data set.  Furthermore, our study found that the best-performing models favored raw price data over technical indicator variables, suggesting that the latter have little explanatory power.

As with Neural Networks, the principal difficulty with non-linear techniques appears to be curve-fitting and a failure to generalize:  while it is very easy to find models that provide an excellent fit to in-sample data, the forecasting performance out-of-sample is often very poor.

Out-of-Sample Equity Curve for Best Performing Nonlinear Model
Out-of-Sample Equity Curve for Best Performing Nonlinear Model

Some caveats about our own research apply.  First and foremost, it is of course impossible to prove a hypothesis in the negative.  Secondly, it is plausible that some markets are less efficient than others:  some studies have claimed success in developing predictive models due to the (relative) inefficiency of the F/X and futures markets, for example.  Thirdly, the choice of sample period may be criticized:  it could be that the models were over-conditioned on a too- lengthy in-sample data set, which in one case ran from 1993 to 2008, with just two years (2009-2010) of out-of-sample data.  The choice of sample was deliberate, however:  had we omitted the 2008 period from the “learning” data set, it would be very easy to criticize the study for failing to allow the algorithms to learn about the exceptional behavior of the markets during that turbulent year.

Despite these limitations, our research casts doubt on the findings of some less-extensive studies, that may be the result of sample-selection bias.  One characteristic of the most credible studies finding evidence in favor of market predictability, such as those by Pesaran and Timmermann, for instance (see paper for citations), is that the models they employ tend to incorporate independent explanatory variables, such as yield spreads, which do appear to have real explanatory power.  The finding of our study suggest that, absent such explanatory factors, the ability to predict markets using sophisticated non-linear techniques applied to price data alone may prove to be as illusionary as it was in the 1990’s.

 

ONE MILLION MODELS

Machine Learning Trading Systems

The SPDR S&P 500 ETF (SPY) is one of the widely traded ETF products on the market, with around $200Bn in assets and average turnover of just under 200M shares daily.  So the likelihood of being able to develop a money-making trading system using publicly available information might appear to be slim-to-none. So, to give ourselves a fighting chance, we will focus on an attempt to predict the overnight movement in SPY, using data from the prior day’s session.

In addition to the open/high/low and close prices of the preceding day session, we have selected a number of other plausible variables to build out the feature vector we are going to use in our machine learning model:

  • The daily volume
  • The previous day’s closing price
  • The 200-day, 50-day and 10-day moving averages of the closing price
  • The 252-day high and low prices of the SPY series

We will attempt to build a model that forecasts the overnight return in the ETF, i.e.  [O(t+1)-C(t)] / C(t)

SSALGOTRADING AD

In this exercise we use daily data from the beginning of the SPY series up until the end of 2014 to build the model, which we will then test on out-of-sample data running from Jan 2015-Aug 2016.  In a high frequency context a considerable amount of time would be spent evaluating, cleaning and normalizing the data.  Here we face far fewer problems of that kind.  Typically one would standardized the input data to equalize the influence of variables that may be measured on scales of very different orders of magnitude.  But in this example all of the input variables, with the exception of volume, are measured on the same scale and so standardization is arguably unnecessary.

First, the in-sample data is loaded and used to create a training set of rules that map the feature vector to the variable of interest, the overnight return:

 

fig1

 

In Mathematica 10 Wolfram introduced a suite of machine learning algorithms that include regression, nearest neighbor, neural networks and random forests, together with functionality to evaluate and select the best performing machine learning technique.  These facilities make it very straightfoward to create a classifier or prediction model using machine learning algorithms, such as this handwriting recognition example:

handwriting

We create a predictive model on the SPY trainingset, allowing Mathematica to pick the best machine learning algorithm:

fig3

There are a number of options for the Predict function that can be used to control the feature selection, algorithm type, performance type and goal, rather than simply accepting the defaults, as we have done here:

fig4

Having built our machine learning model, we load the out-of-sample data from Jan 2015 to Aug 2016, and create a test set:

fig5

 

We next create a PredictionMeasurement object,  using the Nearest Neighbor model , that can be used for further analysis:

 

fig6

fig7

fig8

 

There isn’t much dispersion in the model forecasts, which all have positive value.  A common technique in such cases is to subtract the mean from each of the forecasts (and we may also standardize them by dividing by the standard deviation).

The scatterplot of actual vs. forecast overnight returns in SPY now looks like this:

scatterplot

 

There’s still an obvious lack of dispersion in the forecast values, compared to the actual overnight returns, which we could rectify by standardization. In any event, there appears to be a small, nonlinear relationship between forecast and actual values, which holds out some hope that the model may yet prove useful.

From Forecasting to Trading

There are various methods of deploying a forecasting model in the context of creating a trading system.  The simplest route, which we  will take here, is to apply a threshold gate and convert the filtered forecasts directly into a trading signal. But other approaches are possible, for example:

  • Combining the forecasts from multiple models to create a prediction ensemble
  • Using the forecasts as inputs to a genetic programming model
  • Feeding the forecasts into the input layer of  a neural network model designed specifically to generate trading signals, rather than forecasts

In this example we will create a trading model by applying a simple filter to the forecasts, picking out only those values that exceed a specified threshold. This is a standard trick used to isolate the signal in the model from the background noise.  We will accept only the positive signals that exceed the threshold level, creating a long-only trading system.  i.e. we ignore forecasts that fall below the threshold level.  We buy SPY at the close when the forecast exceeds the threshold and exit any long position at the next day’s open.  This strategy produces the following pro-forma results:

 

Perf table

 

equity curve

 

Conclusion

The system has some quite attractive features, including a win rate of over 66%  and a CAGR of over 10% for the out-of-sample period.

Obviously, this is a very basic illustration: we would want to factor in trading commissions, and the slippage incurred entering and exiting positions in the post- and pre-market periods, which will negatively impact performance, of course.  On the other hand, we have barely begun to scratch the surface in terms of the variables that could be considered for inclusion in the feature vector, and which may increase the explanatory power of the model.

In other words, in reality, this is only the beginning of a lengthy and arduous research process. Nonetheless, this simple example should be enough to give the reader a taste of what’s involved in building a predictive trading model using machine learning algorithms.