Forecasting Market Indices Using Stacked Autoencoders & LSTM

Quality Research vs. Poor Research

The stem paper for this post is:

Bao W, Yue J, Rao Y (2017) A deep learning framework for financial time series using
stacked autoencoders and long-short term memory. PLoS ONE 12(7): e0180944. https://doi.org/10.1371/journal.pone.0180944

The chief claim by the researchers is that 90% to 95% 1-day ahead forecast accuracy can be achieved for a selection of market indices, including the S&P500 and Dow Jones Industrial Average, using a deep learning network of stacked autoencoders and LSTM layers, acting on data transformed using the Haar Discrete Wavelet Transform. The raw data comprises daily data for the index, around a dozen standard technical indicators, the US dollar index and an interest rate series.

Before we go into any detail let’s just step back and look at the larger picture. We have:

  • Unknown researchers
  • A journal from outside the field of finance
  • A paper replete with pretty colored images, but very skimpy detail on the methodology
  • A claimed result that lies far beyond the bounds of credibility

There are enough red flags here to start a stampede at Pamplona. Let’s go through them one by one:

  1. Everyone is unknown at some point in their career. But that’s precisely why you partner with a widely published author. It gives the reader confidence that the paper isn’t complete garbage.
  2. Not everyone gets to publish in the Journal of Finance. I get that. How many of us were regular readers of the Journal of Political Economy before Black and Scholes published their famous paper on option pricing in 1973? Nevertheless, a finance paper published in a general science journal better known for biomedical research does not inspire great confidence.
  3. Read almost any paper by a well known researcher and you will find copious detail on the methodology. These days, the paper is often accompanied by a Git repo (add 3 stars for this!). Academics producing quality research want readers to be able to replicate and validate their findings.
    In this paper there are lots of generic, pretty colored graphics of deep learning networks, but no code repo and very little detail on the methodology. If you don’t want to publish details because the methodology is proprietary and potentially valuable, then do what I do: don’t publish at all.
  4. One-day ahead forecasting accuracy of 53%-55% is good (52%-53% in HFT). 60% accuracy is outstanding. 90%-95% is unbelievable. It’s a license to print money. So what we are being asked to believe is that through a combination of data smoothing (which is all the DWT is), dimensionality reduction (stacked autoencoders) and long-memory modeling (LSTM), we can somehow improve forecasting accuracy over, say, a gradient-boosted tree baseline by something like 40 percentage points. It simply isn’t credible.

These simple considerations should be enough for any experienced quant to give the paper a wide berth.

Digging into the Methodology

1. Discrete Wavelet Transform

So we start from a raw dataset with variables that closely match those described in the paper (see headers for details). Of course, I don’t know the parameter values they used for most of the technical indicators, but it possibly doesn’t matter all that much.

Note that I am applying the DWT using the Haar wavelet twice: once to the original data and then again to the transformed data. This has the effect of filtering out higher-frequency “noise” in the data, which is the object of the exercise. If you follow this procedure you will also see that the DWT actually adds noisy fluctuations to the US Dollar index and 13-Week T-Bill series, so these should be excluded from the de-noising process. You can see how the DWT denoising process removes some of the higher-frequency fluctuations from the opening price, for instance.
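As a rough, illustrative sketch (not the code used to produce the results in this post), the two-pass Haar denoising might look like this in Python, using the PyWavelets package; the soft-thresholding rule is a standard choice and an assumption on my part:

```python
import numpy as np
import pywt

def haar_denoise(series, passes=2):
    """Single-level Haar DWT, soft-threshold the detail coefficients,
    reconstruct, and repeat `passes` times (two passes, as described above)."""
    y = np.asarray(series, dtype=float)
    for _ in range(passes):
        approx, detail = pywt.dwt(y, "haar")
        sigma = np.median(np.abs(detail)) / 0.6745          # rough noise-scale estimate
        thresh = sigma * np.sqrt(2.0 * np.log(len(y)))      # universal threshold (assumed rule)
        detail = pywt.threshold(detail, thresh, mode="soft")
        y = pywt.idwt(approx, detail, "haar")[: len(y)]
    return y

# Apply this separately to each training/test window to avoid look-ahead,
# and skip the US Dollar index and 13-Week T-Bill series, as noted above.
```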

2. Stacked Autoencoders

First up, we need to produce data for training, validation and testing. I am doing this for just the first batch of data. We would then move the window forward by three months, rinse and repeat.

Note that:

(1) The data is being standardized. If you don’t do this, the outputs from the autoencoders are mostly just 1s and 0s. The same happens if you use Min/Max scaling.

(2) We use the mean and standard deviation of the training dataset to normalize the test dataset. This is a trap that too many researchers fall into – standardizing the test dataset using its own mean and standard deviation feeds forward information.
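In code, note (2) amounts to nothing more than this (a minimal numpy sketch, with hypothetical placeholder arrays standing in for the de-noised feature matrices):

```python
import numpy as np

# X_train, X_test: de-noised feature matrices (hypothetical placeholders)
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(500, 19)), rng.normal(size=(60, 19))

mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma   # training-set statistics only: no look-ahead
```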

The Autoencoder stack uses a hidden layer of size 10 in each encoder. We strip the output layer from the first encoder and use the hidden layer as inputs to the second autoencoder, and so on:
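A minimal Keras sketch of this stacking procedure, assuming TensorFlow is available; the hidden-layer size of 10 comes from the setup above, while the number of stacked encoders and the training settings are illustrative assumptions:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def train_autoencoder(X, hidden=10, epochs=100):
    """Train a single autoencoder and return only its encoder half."""
    inp = keras.Input(shape=(X.shape[1],))
    code = layers.Dense(hidden, activation="sigmoid")(inp)
    out = layers.Dense(X.shape[1], activation="linear")(code)
    ae = keras.Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(X, X, epochs=epochs, batch_size=32, verbose=0)
    return keras.Model(inp, code)        # strip the output layer, keep the hidden layer

def stacked_encode(X, n_encoders=4, hidden=10):
    """Feed each encoder's hidden activations into the next autoencoder."""
    encoders, H = [], X
    for _ in range(n_encoders):
        enc = train_autoencoder(H, hidden)
        encoders.append(enc)
        H = enc.predict(H, verbose=0)
    return encoders, H

def apply_encoders(encoders, X):
    """Push new (e.g. test) data through the already-trained encoder stack."""
    H = X
    for enc in encoders:
        H = enc.predict(H, verbose=0)
    return H
```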

3. Benchmark Model

Before we plow on any further, let’s do a sanity check. We’ll use the Predict function to see if we’re able to get any promising-looking results. Here we are building a Gradient Boosted Trees predictor that maps the autoencoded training data to the corresponding closing prices of the index, one step ahead.

Next we use the predictor on the test dataset to produce 1-step-ahead forecasts for the closing price of the index.

Finally, we construct a trading model, as described in the paper, in which we go long or short the index depending on whether the forecast is above or below the current index level. The results do not look good (see below).
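The sanity check above uses the Predict function; a rough scikit-learn stand-in, with synthetic placeholder data in place of the autoencoded features, looks like this:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Z_train, Z_test: autoencoded features; close_train, close_test: index closes
# (synthetic placeholders standing in for the outputs of the previous steps)
rng = np.random.default_rng(1)
Z_train, Z_test = rng.normal(size=(500, 10)), rng.normal(size=(60, 10))
close_train = 2000.0 + np.cumsum(rng.normal(size=500))
close_test = close_train[-1] + np.cumsum(rng.normal(size=60))

# map today's features to tomorrow's closing price
gbt = GradientBoostingRegressor().fit(Z_train[:-1], close_train[1:])
forecast = gbt.predict(Z_test[:-1])

# trading rule from the paper: long if the forecast exceeds today's close, else short
position = np.sign(forecast - close_test[:-1])
pnl = position * np.diff(close_test)
hit_rate = np.mean(position == np.sign(np.diff(close_test)))
print(f"directional accuracy: {hit_rate:.1%}, cumulative P&L: {pnl.sum():.1f}")
```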

Now, admittedly, an argument can be made that a properly constructed LSTM model would outperform a simple gradient-boosted tree – but not by the amount that would be required to improve the prediction accuracy from around 50% to nearer 95%, the level claimed in the paper. At most I would expect to see a 1% to 5% improvement in forecast accuracy.

So what this suggests to me is that the researchers have got something wrong, by somehow allowing forward information to leak into the modeling process. The most likely culprits are:

  1. Applying the DWT to the entire dataset, instead of to the training and test sets individually
  2. Standardizing the test dataset using the mean and standard deviation of the test dataset, instead of those of the training dataset

A More Complete Attempt to Replicate the Research

There’s a much more complete attempt at replicating the research in this Git repo.

As the repo author writes:

My attempts haven’t been successful so far. Given the very limited comments regarding implementation in the article, it may be the case that I am missing something important, however the results seem too good to be true, so my assumption is that the authors have a bug in their own implementation. I would of course be happy to be proven wrong about this statement 😉

Conclusion

Over time, as your experience as a quant deepens, you learn to recognize the signs of shoddy research and save yourself the effort of trying to replicate it. It’s actually easier these days for researchers to fool themselves (and their readers) into believing they have uncovered something interesting, because of the facility with which complex algorithms can be deployed in an inappropriate way.

Postscript

This paper echoes my concerns about the incorrect use of wavelets in a forecasting context:

The incorrect development of these wavelet-based forecasting models occurs during wavelet decomposition (the process of extracting high- and low-frequency information into different sub-time series known as wavelet and scaling coefficients, respectively) and as a result introduces error into the forecast model inputs. The source of this error is due to the boundary condition that is associated with wavelet decomposition (and the wavelet and scaling coefficients) and is linked to three main issues: 1) using ‘future data’ (i.e., data from the future that is not available); 2) inappropriately selecting decomposition levels and wavelet filters; and 3) not carefully partitioning calibration and validation data.

Machine Learning Based Statistical Arbitrage

Previous Posts

I have written extensively about statistical arbitrage strategies in previous posts, for example:

Applying Machine Learning in Statistical Arbitrage

In this series of posts I want to focus on applications of machine learning in stat arb and pairs trading, including genetic algorithms, deep neural networks and reinforcement learning.

Pair Selection

Let’s begin with the subject of pairs selection, to set the scene. The way this is typically handled is by looking at historical correlations and cointegration in a large universe of pairs. But there are serious issues with this approach, as described in this post:

Instead I use a metric that I call the correlation signal, which I find to be a more reliable indicator of co-movement in the underlying asset processes. I won’t delve into the details here, but you can get the gist from the following:

The search algorithm considers pairs in the S&P 500 membership and ranks them in descending order of the correlation signal. Pairs with the highest values (typically of the order of 100, or greater) tend to be variants of the same underlying stock, such as GOOG vs GOOGL, which is an indication that the metric “works” (albeit that such pairs offer few opportunities at low frequency). The pair we are considering here has a correlation signal value of around 14, which is also very high indeed.

Trading Strategy Development

We begin by collecting five years of returns series for the two stocks:

The first approach we’ll consider is the unadjusted spread, being the difference in returns between the two series, from which we create a normalized spread “price”, as follows.

This methodology is frowned upon as the resultant spread is unlikely to be stationary, as you can see for this example in the above chart. But it does have one major advantage in terms of implementation: the same dollar value is invested in both long and short legs of the spread, making it the most efficient approach in terms of margin utilization and capital cost – other approaches entail incurring an imbalance in the dollar value of the two legs.
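A minimal sketch of the unadjusted spread construction (synthetic placeholder returns; compounding the return difference into a base-100 “price” is my interpretation of the normalization step):

```python
import numpy as np

# r1, r2: daily return series for the two stocks (synthetic placeholders)
rng = np.random.default_rng(2)
r1, r2 = rng.normal(0.0003, 0.012, 1250), rng.normal(0.0003, 0.011, 1250)

spread_ret = r1 - r2                                 # unadjusted return spread
spread_price = 100 * np.cumprod(1.0 + spread_ret)    # normalized spread "price", base 100
```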

But back to nonstationarity. The problem is that our spread price series looks like any other asset price process – it trends over long periods and tends to wander arbitrarily far from its starting point. This is NOT the outcome that most statistical arbitrageurs are looking to achieve. On the contrary, what they want to see is a stationary process that will tend to revert to its mean value whenever it moves too far in one direction.

Still, this doesn’t necessarily mean that the approach is without merit. Indeed, it is a very typical trading strategy amongst futures traders, for example, who are often looking for just such behavior in their trend-following strategies. Their argument would be that futures spreads (which are often constructed like this) exhibit clearer, longer-lasting and more durable trends than the underlying futures contracts, with lower volatility and market risk, due to the offsetting positions in the two legs. The argument has merit, no doubt. That said, spreads of this kind can nonetheless be extremely volatile.

So how do we trade such a spread? One idea is to add machine learning into the mix and build trading systems that will seek to capitalize on long term trends. We can do that in several ways, one of which is to apply genetic programming techniques to generate potential strategies that we can backtest and evaluate. For more detail on the methodology, see:

I built an entire hedge fund using this approach in the early 2000’s (when machine learning was entirely unknown to the general investing public). These days there are some excellent software applications for generating trading systems and I particularly like Mike Bryant’s Adaptrade Builder, which was used to create the strategies shown below:

Builder has no difficulty finding strategies that produce a smooth equity curve, with decent returns, low drawdowns and acceptable Sharpe Ratios and Profit Factors – at least in backtest! Of course, there is a way to go here in terms of evaluating such strategies and proving their robustness. But it’s an excellent starting point for further R&D.

But let’s move on to consider the “standard model” for pairs trading. The way this works is that we consider a linear model of the form

Y(t) = beta * X(t) + e(t)

where Y(t) is the returns series for stock 1, X(t) is the returns series for stock 2, e(t) is a stationary random error process and beta (in this model) is a constant that expresses the linear relationship between the two asset processes. The idea is that we can form a spread process that is stationary:

Y(t) – beta * X(t) = e(t)

In this case we estimate beta by linear regression to be 0.93. The residual spread process has a mean very close to zero, and the spread price process remains within a range, which means that we can buy it when it gets too low, or sell it when it becomes too high, in the expectation that it will revert to the mean:
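A sketch of the standard procedure with synthetic placeholder returns; the hedge ratio of roughly 0.93 and the two-standard-deviation entry bands discussed a little further below are the values quoted in the text:

```python
import numpy as np

# r1, r2: daily return series for the two stocks (synthetic placeholders)
rng = np.random.default_rng(3)
r2 = rng.normal(0.0, 0.010, 1250)
r1 = 0.93 * r2 + rng.normal(0.0, 0.005, 1250)

beta = np.polyfit(r2, r1, 1)[0]                # OLS slope estimate (~0.93 here)
spread = r1 - beta * r2                        # residual process e(t)
z = (spread - spread.mean()) / spread.std()

long_entries = z < -2.0      # buy the spread two standard deviations below the mean
short_entries = z > 2.0      # sell the spread two standard deviations above the mean
```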

In this approach, “buying the spread” means purchasing shares to the value of, say, $1M in stock 1, and selling beta * $1M of stock 2 (around $930,000). While there is a net imbalance in the dollar value of the two legs, the margin impact tends to be very small indeed, while the overall portfolio is much more stable, as we have seen.

The classical procedure is to buy the spread when the spread return falls 2 standard deviations below zero, and sell the spread when it exceeds 2 standard deviations to the upside. But that leaves a lot of unanswered questions, such as:

  • After you buy the spread, when should you sell it?
  • Should you use a profit target?
  • Where should you set a stop-loss?
  • Do you increase your position when you get repeated signals to go long (or short)?
  • Should you use a single, or multiple entry/exit levels?

And so on – there are a lot of strategy components to consider. Once again, we’ll let genetic programming do the heavy lifting for us:

What’s interesting here is that the strategy selected by the Builder application makes use of the Bollinger Band indicator, one of the most common tools used for trading spreads, especially when stationary (although note that it prefers to use the Opening price, rather than the usual close price):

Ok so far, but in fact I cheated! I used the entire data series to estimate the beta coefficient, which is effectively feeding forward information into our model. In reality, the data comes at us one day at a time and we are required to re-estimate the beta every day.

Let’s approximate the real-life situation by re-estimating beta, one day at a time. I am using an expanding window to do this (i.e. using the entire data series up to each day t), but it is also common to use a fixed window size to give a “rolling” estimate of beta, in which the latest data plays a more prominent part in the estimation. The process now looks like this:

Here we use OLS to produce a revised estimate of beta on each trading day. So our model now becomes:

Y(t) = beta(t) * X(t) + e(t)

i.e. beta is now time-varying, as can be seen from the chart above.
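An expanding-window re-estimation of beta might be sketched as follows (synthetic placeholder returns again; note that each day’s estimate here uses only the data available before that day, to avoid any look-ahead):

```python
import numpy as np

# r1, r2: daily return series for the two stocks (synthetic placeholders)
rng = np.random.default_rng(4)
r2 = rng.normal(0.0, 0.010, 1250)
r1 = 0.93 * r2 + rng.normal(0.0, 0.005, 1250)

def expanding_beta(r1, r2, min_obs=60):
    """Re-estimate the OLS hedge ratio each day from an expanding window."""
    betas = np.full(len(r1), np.nan)
    for t in range(min_obs, len(r1)):
        betas[t] = np.polyfit(r2[:t], r1[:t], 1)[0]   # data up to, not including, day t
    return betas

beta_t = expanding_beta(r1, r2)
spread_t = r1 - beta_t * r2          # spread with a time-varying hedge ratio
```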

The synthetic spread price appears to be stationary (we can test this), although perhaps not to the same degree as in the previous example, where we used the entire data series to estimate a single, constant beta. So we might anticipate that our ML algorithm would experience greater difficulty producing attractive trading models. But not a bit of it – it turns out that we are able to produce systems that are just as high-performing as before:

In fact this strategy has higher returns, a higher Sharpe Ratio and Sortino Ratio, and a lower drawdown than many of the earlier models.

Conclusion

The purpose of this post was to show how we can combine the standard approach to statistical arbitrage, which is based on classical econometric theory, with modern machine learning algorithms, such as genetic programming. This frees us to consider a very much wider range of possible trade entry and exit strategies, beyond the rather simplistic approach adopted when pairs trading was first developed. We can deploy multiple trade entry levels and stop loss levels to manage risk, dynamically size the trade according to current market conditions and give emphasis to alternative performance characteristics such as maximum drawdown, or Sharpe or Sortino ratio, in addition to strategy profitability.

The programmatic nature of the strategies developed in this way also makes them very amenable to optimization, Monte Carlo simulation and stress testing.

This is but one way of adding machine learning methodologies to the mix. In a series of follow-up posts I will be looking at the role that other machine learning techniques – such as deep learning and reinforcement learning – can play in improving the performance characteristics of the classical statistical arbitrage strategy.

Can Machine Learning Techniques Be Used To Predict Market Direction? The 1,000,000 Model Test.

During the 1990’s the advent of Neural Networks unleashed a torrent of research on their applications in financial markets, accompanied by some rather extravagant claims about their predictive abilities.  Sadly, much of the research proved to be sub-standard and the results illusory, following which the topic was largely relegated to the bleachers, at least in the field of financial market research.

With the advent of new machine learning techniques such as Random Forests, Support Vector Machines and Nearest Neighbor Classification, there has been a resurgence of interest in non-linear modeling techniques and a flood of new research, a fair amount of it supportive of their potential for forecasting financial markets.  Once again, however, doubts about the quality of some of the research bring the results into question.


Against this background I and my co-researcher Dan Rico set out to address the question of whether these new techniques really do have predictive power, more specifically the ability to forecast market direction.  Using some excellent MatLab toolboxes and a new software package, an Excel add-in called 11Ants that makes large scale testing of multiple models a snap, we examined over 1,000,000 models and model-ensembles, covering just about every available non-linear technique.  The data set for our study comprised daily prices for a selection of US equity securities, together with a large selection of technical indicators for which some other researchers have claimed explanatory power.

In-Sample Equity Curve for Best Performing Nonlinear Model

The answer provided by our research was, without exception, in the negative: not one of the models tested showed any significant ability to predict the direction of any of the securities in our data set.  Furthermore, our study found that the best-performing models favored raw price data over technical indicator variables, suggesting that the latter have little explanatory power.

As with Neural Networks, the principal difficulty with non-linear techniques appears to be curve-fitting and a failure to generalize:  while it is very easy to find models that provide an excellent fit to in-sample data, the forecasting performance out-of-sample is often very poor.

Out-of-Sample Equity Curve for Best Performing Nonlinear Model

Some caveats about our own research apply.  First and foremost, it is of course impossible to prove a hypothesis in the negative.  Secondly, it is plausible that some markets are less efficient than others:  some studies have claimed success in developing predictive models due to the (relative) inefficiency of the F/X and futures markets, for example.  Thirdly, the choice of sample period may be criticized:  it could be that the models were over-conditioned on a too-lengthy in-sample data set, which in one case ran from 1993 to 2008, with just two years (2009-2010) of out-of-sample data.  The choice of sample was deliberate, however:  had we omitted the 2008 period from the “learning” data set, it would be very easy to criticize the study for failing to allow the algorithms to learn about the exceptional behavior of the markets during that turbulent year.

Despite these limitations, our research casts doubt on the findings of some less-extensive studies, which may be the result of sample-selection bias.  One characteristic of the most credible studies finding evidence in favor of market predictability, such as those by Pesaran and Timmermann, for instance (see paper for citations), is that the models they employ tend to incorporate independent explanatory variables, such as yield spreads, which do appear to have real explanatory power.  The findings of our study suggest that, absent such explanatory factors, the ability to predict markets using sophisticated non-linear techniques applied to price data alone may prove to be as illusory as it was in the 1990’s.

 


Systematic Futures Trading

In its proprietary trading, Systematic Strategies’ primary focus is on equity and volatility strategies, both low and high frequency. In futures, the emphasis is on high frequency trading, although we also run one or two lower frequency strategies that have higher capacity, such as the Futures WealthBuilder. The version of WealthBuilder running on the Collective 2 site has performed very well in 2017, with net returns of 30% and a Sharpe Ratio of 3.4:

Futures C2 oct 2017

 

In the high frequency space, our focus is on strategies with very high Sharpe Ratios and low drawdowns. We trade a range of futures products, including equity, fixed income, metals and energy markets. Despite the current low levels of market volatility, these strategies have performed well in 2017:

HFT Futures Oct 2017 (NFA)

Building high frequency strategies with double-digit Sharpe Ratios requires a synergy of computational capability and modeling know-how. The microstructure of futures markets is, of course, substantially different to that of equity or forex markets and the components of the model that include microstructure effects vary widely from one product to another. There can be substantial variations too in the way that time is handled in the model – whether as discrete or continuous “wall time”, in trade time, or some other measure. But some of the simple technical indicators we use – moving averages, for example – are common to many models across different products and markets. Machine learning plays a role in most of our trading strategies, including high frequency.

Here are some relevant blog posts that you may find interesting:

http://jonathankinlay.com/2016/04/high-frequency-trading-equities-vs-futures/

 

http://jonathankinlay.com/2015/05/designing-scalable-futures-strategy/

 

http://jonathankinlay.com/2014/10/day-trading-system-in-vix-futures/

A Winer Process

No doubt many of you sharp-eyed readers will have spotted a spelling error, thinking I intended to refer to one of these:

Fig 1

 

But, in fact, I really did have in mind something more like this:

 

wine pour

 

We are following an example from the recently published Mathematica Beyond Mathematics by Jose Sanchez Leon, an up-to-date text that describes many of the latest features in Mathematica, illustrated with interesting applications. Sanchez Leon shows how Mathematica’s machine learning capabilities can be applied to the craft of wine-making.


We begin by loading a curated Wolfram dataset comprising measurements of the physical properties and quality of wines:

Fig 2

A Machine Learning Prediction Model for Wine Quality

We’re going to apply Mathematica’s built-in machine learning algorithms to train a predictor of wine quality, using the training dataset. Mathematica determines that the most effective machine learning technique in this case is Random Forest and after a few seconds produces the predictor function:

Fig 3
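The predictor above is built with Mathematica’s Predict; for readers without Mathematica, a rough scikit-learn analogue on the equivalent UCI wine-quality data might look like this (the file path is an assumption and the CSV must be downloaded separately):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# UCI wine-quality data (assumed downloaded locally; semicolon-separated CSV)
wines = pd.read_csv("winequality-red.csv", sep=";")
X, y = wines.drop(columns="quality"), wines["quality"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

predictor = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("R^2 on held-out wines:", predictor.score(X_test, y_test))

# predict the quality of an 'unknown' wine from its physical properties
print(predictor.predict(X_test.iloc[[0]]))
```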

 

Mathematica automatically selects what it considers to be the best performing model from several available machine learning algorithms:

machine learning methods

Let’s take a look at how well the predictor performs on the test dataset of 1,298 wines:

Fig 4

We can use the predictor function to predict the quality of an unknown wine, based on its physical properties:

Fig 5

Next we create a function to predict the quality of an unknown wine as a function of just two of its characteristics, its pH and alcohol level.  The analysis suggests that the quality of our unknown wine could be improved by increasing both its pH and alcohol content:

Fig 6

Applications and Examples

This simple toy example illustrates how straightforward it is to deploy machine learning techniques in Mathematica.  Machine Learning and Neural Networks became a major focus for Wolfram Research in version 10, and the software’s capabilities have been significantly enhanced in version 11, with several applications such as text and sentiment analysis that have direct relevance to trading system development:

Fig 7

For other detailed examples see:

http://jonathankinlay.com/2016/08/machine-learning-model-spy/

http://jonathankinlay.com/2016/11/trading-market-sentiment/

 

http://jonathankinlay.com/2016/08/dynamic-time-warping/

Correlation Copulas

Continuing a previous post, in which we modeled the relationship between the levels of the VIX Index and the Year 1 and Year 2 CBOE Correlation Indices, we next turn our attention to modeling changes in the VIX index.

In case you missed it, the post can be found here:

http://jonathankinlay.com/2017/08/correlation-cointegration/

We saw previously that the levels of the three indices are all highly correlated, and we were able to successfully account for approximately half the variation in the VIX index using either linear regression models or non-linear machine-learning models that incorporated the two correlation indices.  It turns out that the log-returns processes are also highly correlated:

Fig1 Fig2

A Linear Model of VIX Returns

We can create a simple linear regression model that relates log-returns in the VIX index to contemporaneous log-returns in the two correlation indices, as follows.  The derived model accounts for just under 40% of the variation in VIX index returns, with each correlation index contributing approximately one half of the total VIX return.


Fig3
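A sketch of the contemporaneous regression (the figures quoted above come from the actual index data; the series below are synthetic placeholders):

```python
import numpy as np
import statsmodels.api as sm

# vix_ret, cor1_ret, cor2_ret: daily log-returns of the VIX and the two CBOE
# correlation indices (synthetic placeholders for the actual series)
rng = np.random.default_rng(5)
cor1_ret, cor2_ret = rng.normal(0, 0.02, 750), rng.normal(0, 0.02, 750)
vix_ret = 0.5 * cor1_ret + 0.5 * cor2_ret + rng.normal(0, 0.025, 750)

X = sm.add_constant(np.column_stack([cor1_ret, cor2_ret]))
linear_model = sm.OLS(vix_ret, X).fit()
print(linear_model.rsquared)   # the post reports just under 40% for the real data
print(linear_model.params)     # roughly equal loadings on the two correlation indices
```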

Non-Linear Model of VIX Returns

Although the linear model is highly statistically significant, we see clear evidence of lack of fit in the model residuals, which indicates non-linearities in the relationship.  So next we use a nearest-neighbor algorithm, a machine learning technique that allows us to model non-linear components of the relationship.  The residual plot from the nearest-neighbor model clearly shows that it does a better job of capturing these non-linearities, with a lower standard deviation in the model residuals compared to the linear regression model:

Fig4
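And a nearest-neighbor fit of the same relationship, with scikit-learn’s KNeighborsRegressor standing in for the method used here and an arbitrary choice for the number of neighbors:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# same synthetic placeholder series as in the linear-model sketch above
rng = np.random.default_rng(5)
cor1_ret, cor2_ret = rng.normal(0, 0.02, 750), rng.normal(0, 0.02, 750)
vix_ret = 0.5 * cor1_ret + 0.5 * cor2_ret + rng.normal(0, 0.025, 750)

features = np.column_stack([cor1_ret, cor2_ret])
knn = KNeighborsRegressor(n_neighbors=10).fit(features, vix_ret)
residuals = vix_ret - knn.predict(features)
print(residuals.std())   # compare with the OLS residual standard deviation
```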

Correlation Copulas

Another approach entails the use of copulas to model the inter-dependency between the volatility and correlation indices.  For a fairly detailed exposition on copulas, see the following blog posts:

http://jonathankinlay.com/2017/01/copulas-risk-management/

 

http://jonathankinlay.com/2017/03/pairs-trading-copulas/

We begin by taking a smaller sample comprising around three years of daily returns in the indices.  This minimizes the impact of any long-term nonstationarity in the processes and enables us to fit marginal distributions relatively easily.  First, let’s look at the correlations in our sample data:

Fig5

We next proceed to fit marginal distributions to the VIX and Correlation Index processes.  It turns out that the VIX process is well represented by a Logistic distribution, while the two Correlation Index returns processes are better represented by a Student-t density.  In all three cases there is little evidence of lack of fit, either in the body or the tails of the estimated probability density functions:

Fig6 Fig7 Fig8

The final step is to fit a copula to model the joint density between the indices.  To keep it simple I have chosen to carry out the analysis for the combination of the VIX index with only the first of the correlation indices, although in principle there is no reason why a copula could not be estimated for all three indices.  The fitted model is a multinormal Gaussian copula with a correlation coefficient of 0.69.  Of course, other copulas are feasible (Clayton, Gumbel, etc.), but the Gaussian model appears to provide an adequate fit to the empirical copula, with approximate symmetry in the left and right tails.
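A sketch of the copula fit done by hand with scipy: fit the marginals, map each series to uniforms via its fitted CDF, and measure the correlation of the resulting normal scores (synthetic placeholder returns; the 0.69 figure quoted above comes from the actual data):

```python
import numpy as np
from scipy import stats

# vix_ret, cor1_ret: daily log-returns (synthetic placeholders)
rng = np.random.default_rng(6)
cor1_ret = 0.02 * rng.standard_t(5, 750)
vix_ret = 0.7 * cor1_ret + rng.logistic(0.0, 0.02, 750)

# 1. fit the marginals: Logistic for the VIX returns, Student-t for the correlation index
vix_params = stats.logistic.fit(vix_ret)
cor_params = stats.t.fit(cor1_ret)

# 2. map each series to uniforms via its fitted CDF, then to standard normal scores
u = stats.logistic.cdf(vix_ret, *vix_params).clip(1e-6, 1 - 1e-6)
v = stats.t.cdf(cor1_ret, *cor_params).clip(1e-6, 1 - 1e-6)
z = stats.norm.ppf(np.column_stack([u, v]))

# 3. the Gaussian copula's correlation parameter is the correlation of the normal scores
rho = np.corrcoef(z[:, 0], z[:, 1])[0, 1]
print(rho)   # the post reports ~0.69 for the VIX / first correlation index pair
```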

Fig9

Capitalizing on the Coming Market Crash

Long-Only Equity Investors

Recently I have been discussing possible areas of collaboration with an RIA contact on LinkedIn, who also happens to be very familiar with the hedge fund world.  He outlined the case of a high net worth investor in equities (long only), who wanted to remain invested, but was becoming increasingly concerned about the prospects for a significant market downturn, or even a market crash, similar to those of 2000 or 2008.

I am guessing he is not alone: hardly a day goes by without the publication of yet another article sounding a warning about stretched equity valuations and the dangerously elevated level of the market.

The question put to me was, what could be done to reduce the risk in the investor’s portfolio?

Typically, conservative investors would simply have moved more of their investment portfolio into fixed income securities, but with yields at such low levels this is hardly an attractive option today. Besides, many currently see the bond market as representing an even more extreme bubble than equities.


Hedging Strategies

The problem with traditional hedging mechanisms such as put options, for example, is that they are relatively expensive and can easily reduce annual returns from the overall portfolio by several hundred basis points.  Even at the current low level of volatility the performance drag is noticeable, since the potential upside in the equity portfolio is also lower than it has been for some time.  A further consideration is that many investors are not mandated – or are simply reluctant – to move beyond traditional equity investing into complex ETF products or derivatives.

An equity long/short hedge fund product is one possible solution, but many equity investors are reluctant to consider shorting stocks under any circumstances, even for hedging purposes. And while a short hedge may provide some downside protection it is unlikely to fully safeguard the investor in a crash scenario.  Furthermore, the cost of a hedge fund investment is typically greater than for a long-only product, entailing the payment of a performance fee in addition to management fees that are often higher than for standard investment products.

The Ideal Investment Strategy

Given this background, we can say that the ideal investment strategy is one that:

  • Invests long-only in equities
  • Is inexpensive to implement (reasonable management fees; no performance fees)
  • Does not require shorting stocks, or expensive hedging mechanisms such as options
  • Makes acceptable returns during both bull and bear markets
  • Is likely to produce positive returns in a market crash scenario

A typical buy-and-hold approach is likely to meet only the first three requirements, although an argument could be made that a judicious choice of defensive stocks might enable the investment portfolio to generate returns at an “acceptable” level during a downturn (without being prescriptive as to what the precise meaning of that term may be).  But no buy-and-hold strategy could ever be expected to prosper during times of severe market stress.  A more sophisticated approach is required.

Market Timing

Market timing is regarded as a “holy grail” by some quantitative strategists.  The idea, simply, is to increase or reduce risk exposure according to the prospects for the overall market.  For a very long time the concept has been dismissed as impossible, by definition, given that markets are mostly efficient.  But analysts have persisted in the attempt to develop market timing techniques, motivated by the enormous benefits that a viable market timing strategy would bring.  And gradually, over time, evidence has accumulated that the market can be timed successfully and profitably.  The rate of progress has accelerated in the last decade, thanks to the considerable advances in computing power, the development of machine learning algorithms and the application of artificial intelligence to investment finance.

I have written several articles on the subject of market timing that the reader might be interested to review (see below).  In this article, however, I want to focus firstly on the work of another investment strategist, Blair Hull.

http://jonathankinlay.com/2014/07/how-to-bulletproof-your-portfolio/

 

http://jonathankinlay.com/2014/07/enhancing-mutual-fund-returns-with-market-timing/

The Hull Tactical Fund

Blair Hull rose to prominence in the 1980’s and 1990’s as the founder of the highly successful quantitative option market making firm, the Hull Trading Company, which at one time moved nearly a quarter of the entire daily market volume on some markets, and executed over 7% of the index options traded in the US. The firm was sold to Goldman Sachs at the peak of the equity market in 1999, for a staggering $531 million.

Blair used the capital to establish the Hull family office, Hull Investments, and in 2013 founded an RIA, Hull Tactical Asset Allocation LLC.   The firm’s investment thesis is firmly grounded in the theory of market timing, as described in the paper “A Practitioner’s Defense of Return Predictability”,  authored by Blair Hull and Xiao Qiao, in which the issues and opportunities of market timing and return predictability are explored.

In 2015 the firm launched The Hull Tactical Fund (NYSE Arca: HTUS), an actively managed ETF that uses a quantitative trading model to take long and short positions in ETFs that seek to track the performance of the S&P 500, as well as leveraged or inverse ETFs that seek to deliver multiples, or the inverse, of the performance of the S&P 500.  The goal is to achieve long-term growth from investments in the U.S. equity and Treasury markets, independent of market direction.

How well has the Hull Tactical strategy performed? Since the fund takes the form of an ETF, its performance is a matter of public record and is published on the firm’s web site.  I reproduce the results here, which compare the performance of the HTUS ETF relative to the SPDR S&P 500 ETF (NYSE Arca: SPY):

 

Hull1

 

Hull3

 

Although the HTUS ETF has underperformed the benchmark SPY ETF since launching in 2015, it has produced a higher rate of return on a risk-adjusted basis, with a Sharpe ratio of 1.17 vs only 0.77 for SPY, as well as a lower drawdown (-3.94% vs. -13.01%).  This means that for the same “risk budget” as required to buy and hold SPY (i.e. an annual volatility of 13.23%), the investor could have achieved a total return of around 36% by using margin funds to leverage his investment in HTUS by a factor of 2.8x.

How does the Hull Tactical team achieve these results?  While the detailed specifics are proprietary, we know from the background description that market timing (and machine learning concepts) are central to the strategy and this is confirmed by the dynamic level of the fund’s equity exposure over time:


Hull2

 

A Long-Only, Crash-Resistant Equity Strategy

A couple of years ago I and my colleagues carried out an investigation of long-only equity strategies as part of a research project.  Our primary focus was on index replication, but in the course of our research we came up with a methodology for developing long-only strategies that are highly crash-resistant.

The performance of our Long-Only Market Timing strategy is summarized below and compared with the performance of the HTUS ETF and benchmark SPY ETF (all results are net of fees).  Over the period from inception of the HTUS ETF, our LOMT strategy produced a higher total return than HTUS (22.43% vs. 13.17%), higher CAGR (10.07% vs. 6.04%), higher risk adjusted returns (Sharpe Ratio 1.34 vs 1.21) and larger annual alpha (6.20% vs 4.25%).  In broad terms, over this period the LOMT strategy produced approximately the same overall return as the benchmark SPY ETF, but with a little over half the annual volatility.

 

Fig4

 

Fig5

Application of Artificial Intelligence to Market Timing

Like the HTUS ETF, our LOMT strategy operates with very low fees, comparable to an ETF product rather than a hedge fund (1% management fee, no performance fees).  Again, like the HTUS ETF, our LOMT product makes no use of leverage.  However, unlike HTUS it avoids complicated (and expensive) inverse or leveraged ETF products and instead invests only in two assets – the SPY ETF and 91-day US Treasury Bills.  In other words, the LOMT strategy is a pure market timing strategy, moving capital between the SPY ETF and Treasury Bills depending on its forecast of future market performance.  These forecasts are derived from machine learning algorithms that are specifically tuned to minimize the downside risk in the investment portfolio.  This not only makes strategy returns less volatile, but also ensures that the strategy is very robust to market downturns.

In fact, even better than that,  not only does the LOMT strategy tend to avoid large losses during periods of market stress, it is capable of capitalizing on the opportunities that more volatile market conditions offer.  Looking at the compounded returns (net of fees) over the period from 1994 (the inception of the SPY ETF) we see that the LOMT strategy produces almost double the total profit of the SPY ETF, despite several years in which it underperforms the benchmark.  The reason is clear from the charts:  during the periods 2000-2002 and again in 2008, when the market crashed and returns in the SPY ETF were substantially negative, the LOMT strategy managed to produce positive returns.  In fact, the banking crisis of 2008 provided an exceptional opportunity for the LOMT strategy, which in that year managed to produce a return nearing +40% at a time when the SPY ETF fell by almost the same amount!

 

Fig6

 

Fig7

 

Long Volatility Strategies

I recall having a conversation with Nassim Taleb, of Black Swan fame, about his Empirica fund around the time of its launch in the early 2000’s.  He explained that his analysis had shown that volatility was often underpriced due to an under-estimation of tail risk, which the fund would seek to exploit by purchasing cheap out-of-the-money options.  My response was that this struck me as a great idea for an insurance product, but not a hedge fund – his investors, I explained, were going to hate seeing month after month of negative returns and would flee the fund.  By the time the big event occurred there wouldn’t be sufficient AUM remaining to make up the shortfall.  And so it proved.

A similar problem arises from most long-volatility strategies, whether constructed using options, futures or volatility ETFs:  the combination of premium decay and/or negative carry typically produces continuing losses that are very difficult for the investor to endure.

Conclusion

What investors have been seeking is a strategy that can yield positive returns during normal market conditions while at the same time offering protection against the kind of market gyrations that typically decimate several years of returns from investment portfolios, such as we saw after the market crashes in 2000 and 2008.  With the new breed of long-only strategies now being developed using machine learning algorithms, it appears that investors finally have an opportunity to get what they always wanted, at a reasonable price.

And just in time, if the prognostications of the doom-mongers turn out to be correct.
