How Not to Develop Trading Strategies

In his post on Multi-Market Techniques for Robust Trading Strategies (http://www.adaptrade.com/Newsletter/NL-MultiMarket.htm) Michael Bryant of Adaptrade discusses some interesting approaches to improving model robustness. One is to use data from several correlated assets to build the model, on the basis that if the algorithm works for several assets with differing price levels, that would tend to corroborate the system’s robustness. The second approach he advocates is to use data from the same asset series at different bar lengths. The example he uses is @ES.D at 5, 7 and 9 minute bars. The argument in favor of this approach is the same as for the first, albeit in this case the underlying asset is the same.

I like Michael’s idea in principle, but I wanted to give you a sense of what can all too easily go wrong with GP modeling, even using techniques such as multi-time frame fitting and Monte Carlo simulation to improve robustness testing.

In the chart below I have extended the analysis back in time, beyond the 2011-2012 period that Michael used to build his original model. As you can see, most of the returns are generated in-sample, in the 2011-2012 period. As we look back over the period from 2007-2010, the results are distinctly unimpressive – the strategy basically trades sideways for four years.

How to Do It Right

In my view, there is only one safe way to use GP to develop strategies. Firstly, you need to use a very long span of data – as much as possible – to fit your model. Only in this way can you ensure that the model has encountered enough variation in market conditions to stand a reasonable chance of being able to adapt to changing market conditions in future.

Secondly, you need to use two OOS periods. The first OOS span of data, drawn from the start of the data series, is used in the normal way, to visually inspect the performance of the model. But the second span of OOS data, from more recent history, is NOT examined before the model is finalized. This is really important. Products like Adaptrade make it too easy for the system designer to “cheat”, by looking at the recent performance of his trading system “out of sample” and selecting models that do well in that period. But the very process of examining OOS performance introduces bias into the system. It would be like adding a line of code saying something like:

IF (model performance in OOS period > x) do the following….

I am quite sure if I posted a strategy with a line of code like that in it, it would immediately be shot down as being blatantly biased, and quite rightly so. But, if I look at the recent “OOS” performance and use it to select the model, I am effectively doing exactly the same thing.

That is why it is so important to have a second span of OOS data that is not only not used to build the model, but also is not used to assess performance, until after the final model selection is made. For that reason, the second OOS period is referred to as a “double blind” test.

That’s the procedure I followed to build my futures daytrading strategy: I used as much data as possible, dating from 2002. The first 20% of each data set was used for normal OOS testing. But the second set of data, from Jan 2012 onwards, was my double-blind data set. Only when I saw that the system maintained performance in BOTH OOS periods was I reasonably confident of the system’s robustness.
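
As a rough sketch of that split, the Python below divides a chronologically sorted list of bar dates into the first OOS span, the in-sample span and the double-blind span. The function name, its parameters and the choice to take the 20% from the pre-2012 data (rather than from the whole series) are assumptions made for illustration; this is not how Adaptrade or any other product does it.

```python
from datetime import date, timedelta

def split_double_blind(dates, first_oos_fraction=0.2, blind_start=date(2012, 1, 1)):
    """Return index lists (oos1, in_sample, blind) for sorted bar dates.

    Assumption: the first OOS span is the earliest `first_oos_fraction`
    of the data that precedes `blind_start`; everything on or after
    `blind_start` is held back as the double-blind set.
    """
    visible = [i for i, d in enumerate(dates) if d < blind_start]
    blind = [i for i, d in enumerate(dates) if d >= blind_start]
    n_oos1 = int(len(visible) * first_oos_fraction)
    oos1 = visible[:n_oos1]        # inspected in the normal way
    in_sample = visible[n_oos1:]   # used to fit the model
    return oos1, in_sample, blind  # blind: untouched until the model is final

# Hypothetical daily calendar starting in 2002 (weekends not removed).
dates = [date(2002, 1, 1) + timedelta(days=i) for i in range(4500)]
oos1, in_sample, blind = split_double_blind(dates)
print(len(oos1), len(in_sample), len(blind))
```

The point of returning the blind indices separately is discipline: nothing in the model-building or model-selection code should ever read them.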

This further explains why it is so challenging to develop higher frequency strategies using GP. Running even a very fast GP modeling system on a large span of high frequency data can take inordinate amounts of time.

The longest span of 5-min bar data that a GP system can handle would typically be around 5-7 years. This is probably not quite enough to build a truly robust system, although if you pick your time span carefully it might be (I generally like to use the 2006-2011 period, which has lots of market variation).

For 15 minute bar data, a well-designed GP system can usually handle all the available data you can throw at it – from 1999 in the case of the Emini, for instance.

Why I Don’t Like Fitting Models over Short Time Spans

The risks of fitting models to data in short time spans are intuitively obvious. If you happen to pick a data set in which the market is in a strong uptrend, then your model is going to focus on that kind of market behavior. Subsequently, when the trend changes, the strategy will typically break down.

Monte Carlo simulation isn’t going to change much in this situation: sure, it will help a bit, perhaps, but since the resampled data is all drawn from the same original data set, in most cases the simulated paths will also show a strong uptrend – all that will be shown is that there is some doubt about the strength of the trend. But a completely different scenario, in which, say, the market drops by 10%, is unlikely to appear.
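
To make that concrete, here is a minimal sketch of the effect. It uses a plain bootstrap of per-bar returns, and a commercial tool may well resample differently (for example, by trade). The return series is synthetic and purely hypothetical; the function names are mine.

```python
import random

def bootstrap_total_returns(returns, n_paths=1000, seed=0):
    """Resample per-bar returns with replacement and compound each path."""
    rng = random.Random(seed)
    n = len(returns)
    totals = []
    for _ in range(n_paths):
        growth = 1.0
        for _ in range(n):
            growth *= 1.0 + rng.choice(returns)
        totals.append(growth - 1.0)  # total return of this simulated path
    return totals

def fraction_below(totals, threshold):
    """Share of simulated paths whose total return is below threshold."""
    return sum(t < threshold for t in totals) / len(totals)

# Hypothetical uptrending sample: 250 bars averaging +0.2% per bar.
uptrend = [0.012, -0.008] * 125
totals = bootstrap_total_returns(uptrend)
print("paths ending down at all:", fraction_below(totals, 0.0))
print("paths ending down 10%+:  ", fraction_below(totals, -0.10))
```

Because every simulated path is built from the same upward-drifting returns, very few paths should end lower and a 10% decline should hardly ever show up, which is exactly the limitation described above.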

One possible answer to that problem, recommended by some system developers, is simply to rebuild the model when a breakdown is detected. While it’s true that a product like MSA can make detection easier, rebuilding the model is another question altogether. There is no guarantee that the kind of model that has worked hitherto can be re-tooled to work once again. In fact, there may be no viable trading system that can handle the new market dynamics.

Here is a case in point. We have a system that works well on 10 min bars in TF.D up until around May 2012, when MSA indicates a breakdown in strategy performance.

So now we try to fit a new model, along the pattern of the original model, taking account of some of the new data. But it turns out to be just a Band-Aid – after a few more data points the strategy breaks down again, irretrievably.

This is typical of what often happens when you use GP to build a model using a short span of data. That’s why I prefer to use a long time span, even at lower frequency. The chances of being able to build a robust system that will adapt well to changing market conditions are much higher.

A Robust Emini Trading System

Here, for example, is a GP system built on daily data in @ES.D from 1999 to 2011 (i.e. 2012 to 2014 is OOS).
