Statistical Arbitrage with Synthetic Data

In my last post I mapped out how one could test the reliability of a single stock strategy (for the S&P 500 Index) using synthetic data generated by the new algorithm I developed.

Developing Trading Strategies with Synthetic Data

As this piece of research follows a similar path, I won’t repeat all those details here. The key point addressed in this post is that not only are we able to generate consistent open/high/low/close prices for individual stocks, we can do so in a way that preserves the correlations between related securities. In other words, the algorithm not only replicates the time series properties of individual stocks, but also the cross-sectional relationships between them. This has important applications for the development of portfolio strategies and portfolio risk management.

KO-PEP Pair

To illustrate this I will use synthetic daily data to develop a pairs trading strategy for the KO-PEP pair.

The two price series are highly correlated, which potentially makes them a suitable candidate for a pairs trading strategy.

There are numerous ways to trade a pairs spread such as dollar neutral or beta neutral, but in this example I am simply going to look at trading the price difference. This is not a true market neutral approach, nor is the price difference reliably stationary. However, it will serve the purpose of illustrating the methodology.

Historical price differences between KO and PEP

Obviously it is crucial that the synthetic series we create behave in a way that replicates the relationship between the two stocks, so that we can use them for strategy development and testing. Ideally we would like to see high correlations between the synthetic and original price series, as well as between the pairs of synthetic price data.

We begin by using the algorithm to generate 100 synthetic daily price series for KO and PEP and examine their properties.

Correlations

As we saw previously, the algorithm is able to generate synthetic data with correlations to the real price series ranging from below zero to close to 1.0:

Distribution of correlations between synthetic and real price series for KO and PEP

The crucial point, however, is that the algorithm has been designed to also preserve the cross-sectional correlation between the pairs of synthetic KO-PEP data, just as in the real data series:

Distribution of correlations between synthetic KO and PEP price series

Some examples of highly correlated pairs of synthetic data are shown in the plots below:

In addition to correlation, we might also want to consider the price differences between the pairs of synthetic series, since in the simple approach adopted here the strategy will be trading that price difference. We could, for example, select synthetic pairs for which the divergence in the price difference does not become too large, on the assumption that the series difference is stationary. While that approach might well be reasonable in other situations, here an assumption of stationarity would be perhaps closer to wishful thinking than reality. Instead we can use a selection of synthetic pairs with high levels of cross-correlation, as well as high levels of correlation with the real price data. We can also select for high correlation between the price differences of the real and synthetic price series.
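To make the selection process concrete, here is a minimal sketch of the kind of filter described above, assuming the real and synthetic close price series are already available as NumPy arrays; the function name and the threshold values are illustrative assumptions, not the settings used to produce the results in this post.

```python
import numpy as np

def select_synthetic_pairs(real_ko, real_pep, synth_ko, synth_pep,
                           min_real_corr=0.8, min_cross_corr=0.8):
    """Return indices of synthetic KO/PEP pairs that satisfy the
    correlation criteria discussed above. real_ko and real_pep are 1-D
    arrays of closes; synth_ko and synth_pep are 2-D arrays with one
    synthetic series per row. Thresholds are illustrative only."""
    real_diff = real_ko - real_pep
    selected = []
    for i in range(synth_ko.shape[0]):
        ko_i, pep_i = synth_ko[i], synth_pep[i]
        c_ko = np.corrcoef(ko_i, real_ko)[0, 1]               # synthetic vs real KO
        c_pep = np.corrcoef(pep_i, real_pep)[0, 1]            # synthetic vs real PEP
        c_cross = np.corrcoef(ko_i, pep_i)[0, 1]              # cross-sectional correlation
        c_diff = np.corrcoef(ko_i - pep_i, real_diff)[0, 1]   # price-difference correlation
        if (c_ko >= min_real_corr and c_pep >= min_real_corr
                and c_cross >= min_cross_corr and c_diff >= min_real_corr):
            selected.append(i)
    return selected
```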

Strategy Development & WFO Testing

Once again we follow the procedure for strategy development outlined in the previous post, except that, in addition to a selection of synthetic price difference series, we also include 14-day correlations between the pairs. We use synthetic daily data from 1999 to 2012 to build the strategy and use the data from 2013 onwards for testing/validation. Eventually, after 50 generations, we arrive at the result shown in the figure below:

As before, the equity curves for the individual synthetic pairs are shown towards the bottom of the chart, while the aggregate equity curve, which is a composite of the results for all of the synthetic pairs, is shown above in green. Clearly the results appear encouraging.

As a final step we apply the WFO analysis procedure described in the previous post to test the performance of the strategy on the real data series, using a variable number of in-sample and out-of-sample periods of differing size. The results of the WFO cluster test are as follows:

The results are not so unequivocal as for the strategy developed for the S&P 500 index, but would nonetheless be regarded as acceptable, since the strategy passes the great majority of the tests (in addition to the tests on synthetic pairs data).

The final results appear as follows:

Conclusion

We have demonstrated how the algorithm can be used to generate synthetic price series that preserve not only the important time series properties, but also the cross-sectional properties between series for correlated securities. This important feature has applications in the development of statistical arbitrage strategies, in portfolio construction methodology and in portfolio risk management.

Developing Trading Strategies With Synthetic Data

One of the main criticisms levelled at systematic trading over the last few years is that the over-use of historical market data has tended to produce curve-fitted strategies that perform poorly out of sample in a live trading environment. This is indeed a valid criticism – given enough attempts one is bound to arrive eventually at a strategy that performs well in backtest, even on a holdout data sample. But that by no means guarantees that the strategy will continue to perform well going forward.

The solution to the problem has been clear for some time: what is required is a method of producing synthetic market data that can be used to build a strategy and test it under a wide variety of simulated market conditions. A strategy built in this way is more likely to survive the challenge of live trading than one that has been developed using only a single historical data path.

The problem, however, has been in implementation. Up until now all the attempts to produce credible synthetic price data have failed, for one reason or another, as I described in an earlier post:

I have been able to devise a completely new algorithm for generating artificial price series that meet all of the key requirements, as follows:

  • Computational simplicity & efficiency. Important if we are looking to mass-produce synthetic series for a large number of assets, for a variety of different applications. Some deep learning methods would struggle to meet this requirement, even supposing that transfer learning is possible.
  • The ability to produce price series that are internally consistent (i.e. High > Low, etc.) in every case.
  • The ability to produce a range of synthetic series that vary widely in their correspondence to the original price series. In some cases we want synthetic price series that are highly correlated to the original; in other cases we might want to test our investment portfolio or risk control systems under extreme conditions never before seen in the market.
  • The distribution of returns in the synthetic series should closely match the historical series, being non-Gaussian and with “fat-tails”.
  • The ability to incorporate long memory effects in the sequence of returns.
  • The ability to model GARCH effects in the returns process.

This means that we are now in a position to develop trading strategies without any direct reference to the underlying market data. Consequently we can then use all of the real market data for out-of-sample back-testing.

Developing a Trading Strategy for the S&P 500 Index Using Synthetic Market Data

To illustrate the procedure I am going to use daily synthetic price data for the S&P 500 Index over the period from Jan 1999 to July 2022. Details of the characteristics of the synthetic series are given in the post referred to above.

Fig3-12

Because we want to create a trading strategy that will perform under market conditions close to those currently prevailing, I will downsample the synthetic series to include only those that correlate quite closely with the real price data, i.e. with a minimum correlation of 0.75.

Why do this? Surely if we want to make a strategy as robust as possible we should use all of the synthetic data series for model development?

The reason is that I believe that some of the more extreme adverse scenarios generated by the algorithm may occur quite rarely, perhaps once in every few decades. However, I am principally interested in a strategy that I can apply under current market conditions and I am prepared to take my chances that the worst-case scenarios are unlikely to come about any time soon. This is a major design decision, one that you may disagree with. Of course, one could make use of every available synthetic data series in the development of the trading model and by doing so it is likely that you would produce a model that is more robust. But the training could take longer and the performance during normal market conditions may not be as good.

Having generated the price series, the process I am going to follow is to use genetic programming to develop trading strategies that will be evaluated on all of the synthetic data series simultaneously. I will then use the performance of the aggregate portfolio, i.e. the outcome of all of the trades generated by the strategy when applied to all of the synthetic series, to assess the overall performance. In order to be considered, candidate strategies have to perform well under all of the different market scenarios, or at least the great majority of them. This ensures that the strategy is likely to prove more robust across different types of market conditions, rather than on just the single type of market scenario observed in the real historical series.

As usual in these cases I will reserve a portion (10%) of each data series for testing each strategy, and a further 10% sample for out-of-sample validation. This isn’t strictly necessary: since the real data series has not been used directly in the development of the trading system, we can later test the strategy on all of the historical data and regard this as an out-of-sample backtest.

To implement the procedure I am going to use Mike Bryant’s excellent Adaptrade Builder software.

This is an exemplar of outstanding software engineering and provides a broad range of features for generating trading strategies of every kind. One feature of Builder that is particularly useful in this context is its ability to construct strategies and test them on up to 20 data series concurrently. This enables us to develop a strategy using all of the synthetic data series simultaneously, showing the performance of each individual strategy as well as for the aggregate portfolio.

After evolving strategies for 50 generations we arrive at the following outcome:

The equity curve for the aggregate portfolio is shown in blue, while the equity curves for the strategy applied to individual synthetic data series are shown towards the bottom of the chart. Of course, the performance of the aggregate portfolio appears much superior to any of the individual strategies, because it is effectively the arithmetic sum of the individual equity curves. And just because the aggregate portfolio appears to perform well both in-sample and out-of-sample, that doesn’t imply that the strategy works equally well for every individual market scenario. In some scenarios it performs better than in others, as can be observed from the individual equity curves.

But, in any case, our objective here is not to create a stock portfolio strategy, but rather to trade a single asset – the S&P 500 Index. The role of the aggregate portfolio is simply to suggest that we may have found a strategy that is sufficiently robust to work well across a variety of market conditions, as represented by the various synthetic price series.

Builder generates code for the strategies it evolves in a number of different languages and in this case we take the EasyLanguage code for the fittest strategy #77 and apply it to a daily chart for the S&P 500 Index – i.e. the real data series – in Tradestation, with the following results:

The strategy appears to work well “out-of-the-box”, i.e. without any further refinement. So our quest for a robust strategy appears to have been quite successful, given that none of the 23-year span of real market data on which the strategy was tested was used in the development process.

We can take the process a little further, however, by “optimizing” the strategy. Traditionally this would mean finding the optimal set of parameters that produces the highest net profit on the test data. But this would be curve fitting in the worst possible sense, and is not at all what I am suggesting.

Instead we use a procedure known as Walk Forward Optimization (WFO), as described in this post:

The goal of WFO is not to curve-fit the best parameters, which would entirely defeat the object of using synthetic data. Instead, its purpose is to test the robustness of the strategy. We accomplish this by using a sequence of overlapping in-sample and out-of-sample periods to evaluate how well the strategy stands up, assuming the parameters are optimized on in-sample periods of varying size and start date and tested on similarly varying out-of-sample periods. A strategy that fails a cluster of such tests is unlikely to prove robust in live trading. A strategy that passes a test cluster at least demonstrates some capability to perform well in different market regimes.
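For readers unfamiliar with the mechanics, the sketch below shows one way a rolling set of walk-forward splits, and a cluster of such tests, might be laid out. The windowing scheme and the grid of settings are illustrative assumptions, not the actual implementation used by the WFO software.

```python
def wfo_splits(n_bars, n_runs, oos_pct):
    """Yield (in-sample, out-of-sample) index ranges for a rolling
    walk-forward test: each window is split between IS and OOS in the
    ratio (1 - oos_pct) : oos_pct, and the window rolls forward by one
    OOS segment per run so the OOS segments are contiguous."""
    is_to_oos = (1.0 - oos_pct) / oos_pct          # in-sample bars per OOS bar
    oos_len = int(n_bars / (n_runs + is_to_oos))   # OOS segments tile the tail of the data
    is_len = int(oos_len * is_to_oos)
    for r in range(n_runs):
        is_start = r * oos_len
        is_end = is_start + is_len
        yield (is_start, is_end), (is_end, is_end + oos_len)

# A WFO "cluster" simply repeats the test over a grid of settings, e.g.:
cluster = [(runs, oos) for runs in (5, 8, 10) for oos in (0.15, 0.20, 0.30)]
```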

To some extent we might regard such a test as unnecessary, given that the strategy has already been observed to perform well under several different market conditions, encapsulated in the different synthetic price series, in addition to the real historical price series. Nonetheless, we conduct a WFO cluster test to further evaluate the robustness of the strategy.

As the goal of the procedure is not to maximize the theoretical profitability of the strategy, but rather to evaluate its robustness, we select a criterion other than net profit as the factor to optimize. Specifically, we select the sum of the areas of the strategy drawdowns as the quantity to minimize (by maximizing the inverse of the sum of drawdown areas, which amounts to the same thing). This requires a little explanation.

If we look at the strategy drawdown periods of the equity curve, we observe several periods (highlighted in red) in which the strategy was underwater:

The area of each drawdown represents the length and magnitude of the drawdown and our goal here is to minimize the sum of these areas, so that we reduce both the total duration and severity of strategy drawdowns.
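A minimal sketch of this objective, assuming the strategy equity curve is available as an array of values sampled per bar (or per trade); the fitness wrapper at the end is an illustrative assumption about how the quantity might be fed to the optimizer.

```python
import numpy as np

def sum_drawdown_area(equity):
    """Sum of drawdown 'areas': at each point, the depth of the equity
    curve below its running high. Summing these depths over time captures
    both the duration and the magnitude of every underwater period."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    underwater = running_peak - equity          # >= 0, zero at new equity highs
    return underwater.sum()

def wfo_fitness(equity):
    """Maximize the inverse of the summed drawdown area (equivalent to
    minimizing the sum itself)."""
    return 1.0 / (1.0 + sum_drawdown_area(equity))
```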

In each WFO test we use a different % of OOS data and a different number of runs, assessing the performance of the strategy on a battery of different criteria:


These criteria not only include overall profitability, but also factors such as parameter stability, profit consistency in each test, the ratio of in-sample to out-of-sample profits, etc. In other words, this WFO cluster analysis is not about profit maximization, but robustness evaluation, as assessed by these several different metrics. And in this case the strategy passes every test with flying colors:

Other than validating the robustness of the strategy’s performance, the overall effect of the procedure is to slightly improve the equity curve by diminishing the magnitude and duration of the drawdown periods:

Conclusion

We have shown how, by using synthetic price series, we can build a robust trading strategy that performs well under a variety of different market conditions, including on previously “unseen” historical market data. Further analysis using cluster WFO tests strengthens the assessment of the strategy’s robustness.

Backtest vs. Trading Reality

Kris Sidial, whose Twitter posts are often interesting, recently posted about the reality of trading profitability vs backtest performance, as follows:

While I certainly agree that the latter example is more representative of a typical trader’s P&L, I don’t concur that the first P&L curve is necessarily “99.9% garbage”. There are many strategies that have equity curves that are smoother and more monotonic than those of Kris’s Skeleton Case V2 strategy. Admittedly, most of these lie in the area of high frequency, which is not Kris’s domain expertise. But there are also lower frequency strategies that produce results which are not dissimilar to those shown in the first chart.

As a case in point, consider the following strategy for the S&P 500 E-Mini futures contract, described in more detail below. The strategy was developed using 15-minute bar data from 1999 to 2012, and traded live thereafter. The live and backtest performance characteristics are almost indistinguishable, not only in terms of rate of profit, but also in regard to strategy characteristics such as the no. of trades, % win rate and profit factor.

Just in case you think the picture is a little too rosy, I would point out that the average profit factor is 1.25, which means that the strategy is generating only 25% more in profits than losses. There will be big losing trades from time to time and long sequences of losses during which the strategy appears to have broken down. It takes discipline to resist the temptation to “fix” the strategy during extended drawdowns and instead rely on reversion to the mean rate of performance over the long haul. One source of comfort to the trader through such periods is that the 60% win rate means that the majority of trades are profitable.

As you read through the replies to Kris’s post, you will see that several of his readers make the point that strategies with highly attractive equity curves and performance characteristics are typically capital constrained. This is true in the case of this strategy, which I trade with a very modest amount of (my own) capital. Even trading one-lots in the E-Mini futures I occasionally experience missed trades, either on entry or exit, due to limit orders not being filled at the high or low of a bar. In scaling the strategy up to something more meaningful such as a 10-lot, there would be multiple partial fills to deal with. But I think it would be a mistake to abandon a high performing strategy such as this just because of an apparent capacity constraint. There are several approaches one can explore to address the issue, which may be enough to make the strategy scalable.

Where (as here) the issue of scalability relates to the strategy fill rate on limit orders, a good starting point is to compute the extreme hit rate, which is the proportion of trades that take place at the high or low of the bar. As a rule of thumb, for strategies running on typical low frequency infrastructure an extreme hit rate of 10% or less is manageable; anything above that level quickly becomes problematic. If the extreme hit rate is very high, e.g. 25% or more, then you are going to have to pay a great deal of attention to the issues of latency and order priority to make the strategy viable in practice. Ultimately, for a high frequency market making strategy, most orders are filled at the extreme of each “bar”, so almost all of the focus is on minimizing latency and maintaining a high queue priority, with all of the attendant concerns regarding trading hardware, software and infrastructure.
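As a rough illustration, the extreme hit rate can be estimated from a trade log along the following lines; the data layout and the one-tick tolerance are assumptions made for the sketch.

```python
def extreme_hit_rate(fills, bars, tick=0.25):
    """Proportion of fills occurring at (or within one tick of) the high
    or low of their bar. `fills` is a list of (bar_index, fill_price);
    `bars` is a list of (open, high, low, close) tuples."""
    hits = 0
    for bar_idx, price in fills:
        o, h, l, c = bars[bar_idx]
        if price >= h - tick or price <= l + tick:
            hits += 1
    return hits / len(fills) if fills else 0.0
```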

Next, you need a strategy for handling missed trades. You could, for example, decide to skip any entry trades that are missed, while manually entering unfilled exit trades at the market. Or you could post market orders for both entry and exit trades if they are not filled. An extreme solution would be to substitute market-if-touched orders for limit orders in your strategy code. But this would affect all orders generated by the system, not just the 10% at the high or low of the bar, and is likely to have a very adverse effect on overall profitability, especially if the average trade is small (because you are paying an extra tick on entry and exit of every trade).

The above suggests that you are monitoring the strategy manually, running simulation and live versions side by side, so that you can pick up any trades that the strategy should have taken, but which have been missed. This may be practical for a strategy that trades during regular market hours, but not for one that also trades the overnight session.

An alternative approach, one that is commonly applied by systematic traders, is to automate the handling of missed trades. Typically the trader will set a parameter that converts a limit order to a market order X seconds after a limit price has been traded but not filled. Of course, this will result in paying up an extra tick (or more) to enter trades that perhaps would have been filled if one had waited longer than X seconds. It will have some negative impact on strategy profitability, but not too much if the extreme hit rate is low. I tend to use this method for exit trades, preferring to skip any entry trades that don’t get filled at the limit price.

Beyond these simple measures, there are several other ways to extend the capacity of the strategy. An obvious place to start is by evaluating strategy performance on different session times and bar lengths. So, in this case, we might look at deploying the strategy on both the day and night sessions. We can also evaluate performance on bars of different length. This will give different entry and exit points for individual trades and trades that are at the extreme of a bar on one timeframe may not be at the high or low of a bar on the other timescale. For example, here is the (simulated) performance of the strategy on 13 minute bars:

There is a reason for choosing a bar interval such as 13 minutes, rather than the more commonplace 5- or 10 minutes, as explained in this post:

Finally, it is worth exploring whether the strategy can be applied to other related markets such as NQ futures, for example. Typically this will entail some change to the strategy code to reflect the difference in price levels, but the thrust of the strategy logic will be similar. Another approach is to use the signals from the current strategy as inputs – i.e. alpha generators – for a derivative strategy, such as trading the SPY ETF based on signals from the ES strategy. The performance of the derived strategy may not be as good, but in a product like SPY the capacity might be larger.

Tactical Mutual Fund Strategies

A recent blog post of mine was posted on Seeking Alpha (see summary below if you missed it).


The essence of the idea is simply that one can design long-only, tactical market timing strategies that perform robustly during market downturns, or which may even be positively correlated with volatility.  I used the example of a LOMT (“Long-Only Market-Timing”) strategy that switches between the SPY ETF and 91-Day T-Bills, depending on the current outlook for the market as characterized by machine learning algorithms.  As I indicated in the article, the LOMT handily outperforms the buy-and-hold strategy over the period from 1994 -2017 by several hundred basis points:

Fig6

 

Of particular note is the robustness of the LOMT strategy performance during the market crashes in 2000/01 and 2008, as well as the correction in 2015:

 

Fig7

 

The Pros and Cons of Market Timing (aka “Tactical”) Strategies

One of the popular choices for the investor concerned about downside risk is to use put options (or put spreads) to hedge some of the market exposure. The problem, of course, is that the cost of the hedge acts as a drag on performance, which may be reduced by several hundred basis points annually, depending on market volatility. Trying to decide when to use option insurance and when to maintain full market exposure is just another variation on the market timing problem.

The point of tactical strategies is that, unlike an option hedge, they will continue to produce positive returns – albeit at a lower rate than the market portfolio – during periods when markets are benign, while at the same time offering much superior returns during market declines, or crashes.   If the investor is concerned about the lower rate of return he is likely to achieve during normal years, the answer is to make use of leverage.


Market timing strategies like Hull Tactical or the LOMT have higher risk-adjusted rates of return (Sharpe Ratios) than the market portfolio.  So the investor can make use of margin money to scale up his investment to about the same level of risk as the market index.  In doing so he will expect to earn a much higher rate of return than the market.

This is easy to do with products like LOMT or Hull Tactical, because they make use of marginable securities such as ETFs.   As I point out in the sections following, one of the shortcomings of applying the market timing approach to mutual funds, however, is that they are not marginable (not initially, at least), so the possibilities for using leverage are severely restricted.

Market Timing with Mutual Funds

An interesting suggestion from one Seeking Alpha reader was to apply the LOMT approach to the Vanguard 500 Index Investor fund (VFINX), which has a rather longer history than the SPY ETF. Unfortunately, I only have ready access to data from 1994, but nonetheless applied the LOMT model over that time period. This is an interesting challenge, since none of the VFINX data was used in the actual construction of the LOMT model. The fact that the VFINX series is highly correlated with SPY is not the issue – it is typically the case that strategies developed for one asset will fail when applied to a second, correlated asset. So, while it is perhaps hard to argue that the entire VFINX series is out-of-sample, the performance of the strategy when applied to that series will serve to confirm (or otherwise) the robustness and general applicability of the algorithm.

The results turn out as follows:

 

Fig21

 

Fig22

 

Fig23

 

The LOMT strategy implemented for VFINX handily outperforms the buy-and-hold portfolios in the SPY ETF and VFINX mutual fund, both in terms of return (CAGR) and risk, since strategy volatility is less than half that of buy-and-hold. Consequently the risk-adjusted return (Sharpe Ratio) is around 3x higher.

That said, the VFINX variation of LOMT is distinctly inferior to the original version implemented in the SPY ETF, for which the trading algorithm was originally designed.   Of particular significance in this context is that the SPY version of the LOMT strategy produces substantial gains during the market crash of 2008, whereas the VFINX version of the market timing strategy results in a small loss for that year.  More generally, the SPY-LOMT strategy has a higher Sortino Ratio than the mutual fund timing strategy, a further indication of its superior ability to manage  downside risk.

Given that the objective is to design long-only strategies that perform well in market downturns, one need not pursue this particular example much further, since it is already clear that the LOMT strategy using SPY is superior in terms of risk and return characteristics to the mutual fund alternative.

Practical Limitations

There are other, practical issues with applying an algorithmic trading strategy to a mutual fund product like VFINX. To begin with, the mutual fund price series contains no open/high/low prices, or volume data, which are often used by trading algorithms. Then there are the execution issues: funds can only be purchased or sold at market prices, whereas many algorithmic trading systems use other order types to enter and exit positions (stop and limit orders being common alternatives). You can’t sell short, and there are restrictions on the frequency of trading of mutual funds and penalties for early redemption. And sales loads are often substantial (3% to 5% is not uncommon), so investors have to find a broker that lists the selected funds as no-load for the strategy to make economic sense. Finally, mutual funds are often treated by the broker as ineligible for margin for an initial period (30 days, typically), which prevents the investor from leveraging his investment in the way that he can do quite easily using ETFs.

For these reasons one typically does not expect a trading strategy formulated using a stock or ETF product to transfer easily to another asset class.  The fact that the SPY-LOMT strategy appears to work successfully on the VFINX mutual fund product  (on paper, at least) is highly unusual and speaks to the robustness of the methodology.  But one would be ill-advised to seek to implement the strategy in that way.  In almost all cases a better result will be produced by developing a strategy designed for the specific asset (class) one has in mind.

A Tactical Trading Strategy for the VFINX Mutual Fund

A better outcome can possibly be achieved by developing a market timing strategy designed specifically for the VFINX mutual fund.  This strategy uses only market orders to enter and exit positions and attempts to address the issue of frequent trading by applying a trading cost to simulate the fees that typically apply in such situations.  The results, net of imputed fees, for the period from 1994-2017 are summarized as follows:

 

Fig24

 

Fig18

Overall, the CAGR of the tactical strategy is around 88 basis points higher, per annum.  The risk-adjusted rate of return (Sharpe Ratio) is not as high as for the LOMT-SPY strategy, since the annual volatility is almost double.  But, as I have already pointed out, there are unanswered questions about the practicality of implementing the latter for the VFINX, given that it seeks to enter trades using limit orders, which do not exist in the mutual fund world.

The performance of the tactical-VFINX strategy relative to the VFINX fund falls into three distinct periods: under-performance in the period from 1994-2002, about equal performance in the period 2003-2008, and superior relative performance in the period from 2008-2017.

Only the data from 1/1994 to 3/2008 were used in the construction of the model. Data in the period from 3/2008 to 11/2012 were used for testing, while the results for 12/2012 to 8/2017 are entirely out-of-sample. In other words, the great majority of the period of superior performance for the tactical strategy was out-of-sample. The chief reason for the improved performance of the tactical-VFINX strategy is the lower drawdown suffered during the financial crisis of 2008, compared to the benchmark VFINX fund. Using market-timing algorithms, the tactical strategy was able to identify the downturn as it occurred and exit the market. This is quite impressive since, as previously indicated, none of the data from the 2008 financial crisis was used in the construction of the model.

In his Seeking Alpha article “Alpha-Winning Stars of the Bull Market“, Brad Zigler identifies the handful of funds that have outperformed the VFINX benchmark since 2009, generating positive alpha:

Fig20

 

What is notable is that the annual alpha of the tactical-VFINX strategy, at 1.69%, is higher than any of those identified by Zigler as being “exceptional”. Furthermore, the annual R-squared of the tactical strategy is higher than four of the seven funds on Zigler’s All-Star list. Based on Zigler’s performance metrics, the tactical VFINX strategy would be one of the top performing active funds.

But there is another element missing from the assessment. In the analysis so far we have assumed that in periods when the tactical strategy disinvests from the VFINX fund the proceeds are simply held in cash, at zero interest.  In practice, of course, we would invest any proceeds in risk-free assets such as Treasury Bills.   This would further boost the performance of the strategy, by several tens of basis points per annum, without any increase in volatility.  In other words, the annual CAGR and annual Alpha, are likely to be greater than indicated here.

Robustness Testing

One of the concerns with any backtest – even one with a lengthy out-of-sample period, as here – is that one is evaluating only a single sample path from the price process.  Different evolutions could have produced radically different outcomes in the past, or in future. To assess the robustness of the strategy we apply Monte Carlo simulation techniques to generate a large number of different sample paths for the price process and evaluate the performance of the strategy in each scenario.

Three different types of random variation are factored into this assessment, as sketched in the code following the list:

  1. We allow the observed prices to fluctuate by +/- 30% with a probability of about 1/3 (so, roughly, every three days the fund price will be adjusted up or down by up to that percentage).
  2. Strategy parameters are permitted to fluctuate by the same amount and with the same probability.  This ensures that we haven’t over-optimized the strategy with the selected parameters.
  3. Finally, we randomize the start date of the strategy by up to a year.  This reduces the risk of basing the assessment on the outcome from encountering a lucky (or unlucky) period, during which the market may be in a strong trend, for example.
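Here is a minimal sketch of the three randomizations, assuming the fund prices are held in a NumPy array and the strategy parameters in a dict of numeric values; the function name and implementation details are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def randomize_scenario(prices, params, prob=1/3, max_pct=0.30, max_shift_days=252):
    """Generate one randomized scenario: jitter prices and parameters by
    up to +/-30% with probability ~1/3, and shift the start date by up
    to a year, mirroring the three points listed above."""
    prices = np.asarray(prices, dtype=float).copy()
    # 1. random price perturbations
    mask = rng.random(prices.shape) < prob
    prices[mask] *= 1 + rng.uniform(-max_pct, max_pct, mask.sum())
    # 2. random parameter perturbations (numeric parameters assumed)
    new_params = {k: v * (1 + rng.uniform(-max_pct, max_pct))
                  if rng.random() < prob else v
                  for k, v in params.items()}
    # 3. randomized start date, up to a year of trading days
    start = rng.integers(0, max_shift_days)
    return prices[start:], new_params
```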

In the chart below we illustrate the outcome from around 1,000 such randomized sample paths, from which it can be seen that the strategy performance is robust and consistent.

Fig 19

 

Limitations to the Testing Procedure

We have identified one way in which this assessment understates the performance of the tactical-VFINX strategy:  by failing to take into account the uplift in returns from investing in interest-bearing Treasury securities, rather than cash, at times when the strategy is out of the market.  So it is only reasonable to point out other limitations to the test procedure that may paint a too-optimistic picture.

The key consideration here is the frequency of trading. On average, the tactical-VFINX strategy trades around twice a month, which is more than normally permitted for mutual funds. Certainly, we have factored in additional trading costs to account for early redemption charges. But the question is whether or not the strategy would be permitted to trade at such frequency, even with the payment of additional fees. If not, then the strategy would have to be re-tooled to work on longer average holding periods, no doubt adversely affecting its performance.

Conclusion

The purpose of this analysis was to assess whether, in principle, it is possible to construct a market timing strategy that is capable of outperforming a VFINX fund benchmark. The answer appears to be in the affirmative. However, several practical issues remain to be addressed before such a strategy could be put into production successfully. In general, mutual funds are not ideal vehicles for expressing trading strategies, including tactical market timing strategies. There are latent inefficiencies in mutual fund markets – the restrictions on trading and penalties for early redemption, to name but two – that create difficulties for active approaches to investing in such products; ETFs are much superior in this regard. Nonetheless, this study suggests that, in principle, tactical approaches to mutual fund investing may deliver worthwhile benefits to investors, despite the practical challenges.

Beta Convexity

What is a Stock Beta?

Around a quarter of a century ago I wrote a paper entitled “Equity Convexity” which – to my disappointment – was rejected as incomprehensible by the finance professor who reviewed it.  But perhaps I should not have expected more: novel theories are rarely well received first time around.  I remain convinced the idea has merit and may perhaps revisit it in these pages at some point in future.  For now, I would like to discuss a related, but simpler concept: beta convexity.  As far as I am aware this, too, is new.  At least, while I find it unlikely that it has not already been considered, I am not aware of any reference to it in the literature.


We begin by reviewing the elementary concept of an asset beta, which is the covariance of the return of an asset with the return of the benchmark market index, divided by the variance of the return of the benchmark over a certain period:

Beta formula
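In symbols, the formula pictured above is simply:

```latex
\beta_a = \frac{\operatorname{Cov}(R_a, R_m)}{\operatorname{Var}(R_m)}
```

where R_a is the return of the asset and R_m the return of the benchmark index over the measurement period.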

Asset betas typically exhibit time dependency and there are numerous methods that can be used to model this feature, including, for instance, the Kalman Filter:

 

http://jonathankinlay.com/2015/02/statistical-arbitrage-using-kalman-filter/

Beta Convexity

In the context discussed here we set such matters to one side.  Instead of considering how an asset beta may vary over time, we look into how it might change depending on the direction of the benchmark index.  To take an example, let’s consider the stock Advaxis, Inc. (Nasdaq: ADXS).  In the charts below we examine the relationship between the daily stock returns and the returns in the benchmark Russell 3000 Index when the latter are positive and negative.

 

ADXS - Up Beta
ADXS - Down Beta

 

The charts indicate that the stock beta tends to be higher during down periods in the benchmark index than during periods when the benchmark return is positive.  This can happen for two reasons: either the correlation between the asset and the index rises, or the volatility of the asset increases, (or perhaps both) when the overall market declines.  In fact, over the period from Jan 2012 to May 2017, the overall stock beta was 1.31, but the up-beta was only 0.44 while the down-beta was 1.53.  This is quite a marked difference and regardless of whether the change in beta arises from a change in the correlation or in the stock volatility, it could have a significant impact on the optimal weighting for this stock in an equity portfolio.

Ideally, what we would prefer to see is very little dependence in the relationship between the asset beta and the sign of the underlying benchmark.  One way to quantify such dependency is with what I have called Beta Convexity:

Beta Convexity = (Up-Beta – Down-Beta) ^2

A stock with a stable beta, i.e. one for which the difference between the up-beta and down-beta is negligibly small, will have a beta-convexity of zero. On the other hand, a stock that shows instability in its beta relationship with the benchmark will tend to have relatively large beta convexity.
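For concreteness, here is a minimal sketch of the calculation, assuming daily return series for the stock and the benchmark are available as arrays; the function names are illustrative.

```python
import numpy as np

def beta(asset, bench):
    """OLS beta of asset returns on benchmark returns."""
    return np.cov(asset, bench)[0, 1] / np.var(bench, ddof=1)

def beta_convexity(asset_returns, bench_returns):
    """Squared difference between up-beta and down-beta, conditioning on
    the sign of the benchmark return, as defined above."""
    asset = np.asarray(asset_returns, dtype=float)
    bench = np.asarray(bench_returns, dtype=float)
    up, down = bench > 0, bench < 0
    up_beta = beta(asset[up], bench[up])
    down_beta = beta(asset[down], bench[down])
    return (up_beta - down_beta) ** 2
```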

 

Index Replication using a Minimum Beta-Convexity Portfolio

One way to apply this concept is to use it as a means of stock selection. Regardless of whether a stock’s overall beta is large or small, ideally we want its dependency on the direction of the benchmark to be as close to zero as possible, i.e. with near-zero beta-convexity. This is likely to produce greater stability in the composition of the optimal portfolio and eliminate unnecessary and undesirable excess volatility in portfolio returns by reducing nonlinearities in the relationship between the portfolio and benchmark returns.

In the following illustration we construct a stock portfolio by choosing the 500 constituents of the benchmark Russell 3000 index that have the lowest beta convexity during the previous 90-day period, rebalancing every quarter (hence all of the results are out-of-sample). The minimum beta-convexity portfolio outperforms the benchmark by a total of 48.6% over the period from Jan 2012 to May 2017, with an annual active return of 5.32% and an Information Ratio of 1.36. The portfolio tracking error is rather large at 3.91%, but can perhaps be further reduced with the inclusion of additional stocks.
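The rebalancing procedure can be sketched along the following lines, reusing the beta_convexity function above; the pandas layout, the calendar handling and the implicit equal weighting are assumptions made purely for illustration, not the construction rules used to generate the results below.

```python
import pandas as pd

def quarterly_min_convexity_portfolios(returns, bench, n_stocks=500, lookback=90):
    """For each quarter end, rank constituents by beta convexity over the
    trailing `lookback` trading days and keep the n_stocks lowest.
    `returns` is a DataFrame of daily stock returns (DatetimeIndex),
    `bench` a Series of benchmark returns on the same index."""
    portfolios = {}
    quarter_ends = returns.resample("Q").last().index
    for date in quarter_ends:
        window = returns.loc[:date].tail(lookback)
        bwin = bench.loc[window.index]
        convexity = window.apply(
            lambda col: beta_convexity(col.values, bwin.values))
        portfolios[date] = convexity.nsmallest(n_stocks).index.tolist()
    return portfolios
```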

 

 

ResultsTable

 

Active Monthly

 

G1000

 

Active

Conclusion:  Beta Convexity as a New Factor

Beta convexity is a new concept that appears to have a useful role to play in identifying stocks that have stable long term dependency on the benchmark index and constructing index tracking portfolios capable of generating appreciable active returns.

The outperformance of the minimum-convexity portfolio is not the result of a momentum effect, or a systematic bias in the selection of high or low beta stocks.  The selection of the 500 lowest beta-convexity stocks in each period is somewhat arbitrary, but illustrates that the approach can scale to a size sufficient to deploy hundreds of millions of dollars of investment capital, or more.  A more sensible scheme might be, for example, to select a variable number of stocks based on a predefined tolerance limit on beta-convexity.

Obvious steps from here include experimenting with alternative weighting schemes such as value or beta convexity weighting and further refining the stock selection procedure to reduce the portfolio tracking error.

Further useful applications of the concept are likely to be found in the design of equity long/short and market neutral strategies. These I shall leave the reader to explore for now, but I will perhaps return to the topic in a future post.

Ethical Strategy Design

It isn’t often that you see an equity curve like the one shown below, which was produced by a systematic strategy built on 1-minute bars in the ProShares Ultra VIX Short-Term Futures ETF (UVXY):
Fig3

As the chart indicates, the strategy is very profitable, has a very high overall profit factor and a trade win rate in excess of 94%:

Fig4

 

FIG5

 

So, what’s not to like?  Well, arguably, one would like to see a strategy with a more balanced P&L, capable of producing profitable trades on the long as well as the short side. That would give some comfort that the strategy will continue to perform well regardless of whether the market tone is bullish or bearish. That said, it is understandable that the negative drift from carry in volatility futures, amplified by the leverage in the leveraged ETF product, makes it much easier to make money by selling short. This is analogous to the long bias in the great majority of equity strategies, which relies on the positive drift in stocks. My view would be that the short bias in the UVXY strategy is hardly a sufficient reason to overlook its many other very attractive features, any more than long bias is a reason to eschew equity strategies.


This example is similar to one we use in our training program for proprietary and hedge fund traders, to illustrate some of the pitfalls of strategy development.  We point out that the strategy performance has held up well out of sample – indeed, it matches the in-sample performance characteristics very closely.  When we ask trainees how they could test the strategy further, the suggestion is often made that we use Monte-Carlo simulation to evaluate the performance across a wider range of market scenarios than seen in the historical data.  We do this by introducing random fluctuations into the ETF prices, as well as in the strategy parameters, and by randomizing the start date of the test period.  The results are shown below. As you can see, while there is some variation in the strategy performance, even the worst simulated outcome appears very benign.

 

Fig2

Around this point trainees, at least those inexperienced in trading system development, tend to run out of ideas about what else could be done to evaluate the strategy.  One or two will mention drawdown risk, but the straight-line equity curve indicates that this has not been a problem for the strategy in the past, while the results of simulation testing suggest that drawdowns are unlikely to be a significant concern, across a broad spectrum of market conditions.  Most trainees simply want to start trading the strategy as soon as possible (although the more cautious of them will suggest trading in simulation mode for a while).

At this point I sometimes offer to let trainees see the strategy code, on condition that they agree to trade the strategy with their own capital. Being smart people, they realize something must be wrong, even if they are unable to pinpoint what the problem may be. So the discussion moves on to focus in more detail on the question of strategy risk.

A Deeper Dive into Strategy Risk

At this stage I point out to trainees that the equity curve shows the result from realized gains and losses. What it does not show are the fluctuations in equity that occurred before each trade was closed.

That information is revealed by the following report on the maximum adverse excursion (MAE), which plots the maximum drawdown in each trade vs. the final trade profit or loss. Once trainees understand the report, the lights begin to come on. We can see immediately that there were several trades which were underwater to the tune of $30,000, $50,000, or even $70,000 or more, before eventually recovering to produce a profit. In the most extreme case the trade was almost $80,000 underwater, before producing a profit of only a few hundred dollars. Furthermore, the drawdown period lasted for several weeks, which represents almost geological time for a strategy operating on 1-minute bars. It’s not hard to grasp the concept that risking $80,000 of your own money in order to make $250 is hardly an efficient use of capital, or an acceptable level of risk-reward.
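For reference, the MAE of a single trade can be computed along these lines, given the entry price, trade direction and the bar highs and lows over the life of the trade; the data layout is an assumption of the sketch.

```python
def trade_mae(direction, entry_price, bar_lows, bar_highs, point_value=1.0):
    """Maximum adverse excursion of one trade, in currency terms.
    direction: +1 for a long trade, -1 for a short trade."""
    if direction > 0:
        worst = entry_price - min(bar_lows)      # deepest dip below the long entry
    else:
        worst = max(bar_highs) - entry_price     # highest rally against the short
    return max(worst, 0.0) * point_value
```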


FIG6 FIG7

 

FIG8

 

Next, I ask for suggestions for how to tackle the problem of drawdown risk in the strategy.   Most trainees will suggest implementing a stop-loss strategy, similar to those employed by thousands of  trading firms.  Looking at the MAE chart, it appears that we can avert the worst outcomes with a stop loss limit of, say, $25,000.  However, when we implement a stop loss strategy at this level, here’s the outcome it produces:

 

FIG9

Now we see the difficulty.  Firstly, what a stop-loss strategy does is simply crystallize the previously unrealized drawdown losses.  Consequently, the equity curve looks a great deal less attractive than it did before.  The second problem is more subtle: the conditions that produced the loss-making trades tend to continue for some time, perhaps as long as several days, or weeks.  So, a strategy that has a stop loss risk overlay will tend to exit the existing position, only to reinstate a similar position more or less immediately.  In other words, a stop loss achieves very little, other than to force the trader to accept losses that the strategy would have made up if it had been allowed to continue.  This outcome is a difficult one to accept, even in the face of the argument that a stop loss serves the purpose of protecting the trader (and his firm) from an even more catastrophic loss.  Because if the strategy tends to re-enter exactly the same position shortly after being stopped out, very little has been gained in terms of catastrophic risk management.

Luck and the Ethics of Strategy Design

What are the learning points from this exercise in trading system development?  Firstly, one should resist being beguiled by stellar-looking equity curves: they may disguise the true risk characteristics of the strategy, which can only be understood by a close study of strategy drawdowns and  trade MAE.  Secondly, a lesson that many risk managers could usefully take away is that a stop loss is often counter-productive, serving only to cement losses that the strategy would otherwise have recovered from.

A more subtle point is that a Geometric Brownian Motion process has a long-term probability of reaching any price level with certainty.  Accordingly, in theory one has only to wait long enough to recover from any loss, no matter how severe.   Of course, in the meantime, the accumulated losses might be enough to decimate the trading account, or even bring down the entire firm (e.g. Barings).  The point is,  it is not hard to design a system with a very seductive-looking backtest performance record.

If the solution is not a stop loss, how do we avoid scenarios like this one?  Firstly, if you are trading someone else’s money, one answer is: be lucky!  If you happened to start trading this strategy some time in 2016, you would probably be collecting a large bonus.  On the other hand, if you were unlucky enough to start trading in early 2017, you might be collecting a pink slip very soon.  Although unethical, when you are gambling with other people’s money, it makes economic sense to take such risks, because the potential upside gain is so much greater than the downside risk (for you). When you are risking your own capital, however, the calculus is entirely different.  That is why we always trade strategies with our own capital before opening them to external investors (and why we insist that our prop traders do the same).

As a strategy designer, you know better, and should act accordingly.  Investors, who are relying on your skills and knowledge, can all too easily be seduced by the appearance of a strategy’s outstanding performance, overlooking the latent risks it hides.  We see this over and over again in option-selling strategies, which investors continue to pile into despite repeated demonstrations of their capital-destroying potential.  Incidentally, this is not a point about backtest vs. live trading performance:  the strategy illustrated here, as well as many option-selling strategies, are perfectly capable of producing live track records similar to those seen in backtest.  All you need is some luck and an uneventful period in which major drawdowns don’t arise.  At Systematic Strategies, our view is that the strategy designer is under an obligation to shield his investors from such latent risks, even if they may be unaware of them.  If you know that a strategy has such risk characteristics, you should avoid it, and design a better one.  The risk controls, including limitations on unrealized drawdowns (MAE) need to be baked into the strategy design from the outset, not fitted retrospectively (and often counter-productively, as we have seen here).

The acid test is this:  if you would not be prepared to risk your own capital in a strategy, don’t ask your investors to take the risk either.

The ethical principle of “do unto others as you would have them do unto you” applies no less in investment finance than it does in life.

Strategy Code

Code for UVXY Strategy

 

Improving Trading System Performance Using a Meta-Strategy

What is a Meta-Strategy?

In my previous post on identifying drivers of strategy performance I mentioned the possibility of developing a meta-strategy.

Fig0

A meta-strategy is a trading system that trades trading systems.  The idea is to develop a strategy that will make sensible decisions about when to trade a specific system, in a way that yields superior performance compared to simply following the underlying trading system.  Put another way, the simplest kind of meta-strategy is a long-only strategy that takes positions in some underlying trading system.  At times, it will follow the underlying system exactly; at other times it is out of the market and ignores the trading system’s recommendations.

More generally, a meta-strategy can determine the size in which one, or several, systems should be traded at any point in time, including periods where the size can be zero (i.e. the system is not currently traded).  Typically, a meta-strategy is long-only:  in theory there is nothing to stop you developing a meta-strategy that shorts your underlying strategy from time to time, but that is a little counter-intuitive to say the least!

A meta-strategy is something that could be very useful for a fund-of-funds, as a way of deciding how to allocate capital amongst managers.

Caissa Capital operated a meta-strategy in its option arbitrage hedge fund back in the early 2000’s.  The meta-strategy (we called it a “model management system”) selected from a half dozen different volatility models to be used for option pricing, depending on their performance, as measured by around 30 different criteria.  The criteria included both statistical metrics, such as the mean absolute percentage error in the forward volatility forecasts, as well as trading performance criteria such as the moving average of the trade PNL.  The model management system probably added 100 – 200 basis points per annum to the performance of the underlying strategy, so it was a valuable add-on.

Illustration of a Meta-Strategy in US Bond Futures

To illustrate the concept we will use an underlying system that trades US Bond futures at 15-minute bar intervals.  The performance of the system is summarized in the chart and table below.

Fig1A

 

FIG2A

 

Strategy performance has been very consistent over the last seven years, in terms of the annual returns, number of trades and % win rate.  Can it be improved further?

To assess this possibility we create a new data series comprising the points of the equity curve illustrated above.  More specifically, we form a series comprising the open, high, low and close values of the strategy equity, for each trade.  We will proceed to treat this as a new data series and apply a range of different modeling techniques to see if we can develop a trading strategy, in exactly the same way as we would if the underlying was a price series for a stock.

It is important to note here that, for the meta-strategy at least, we are working in trade-time, not calendar time. The x-axis will measure the trade number of the underlying strategy, rather than the date of entry (or exit) of the underlying trade.  Thus equally spaced points on the x-axis represent different lengths of calendar time, depending on the duration of each trade.

It is necessary to work in trade time rather than calendar time because, unlike a stock, it isn’t possible to trade the underlying strategy whenever we want to – we can only enter or exit the strategy at points in time when it is about to take a trade, by accepting that trade or passing on it (we ignore the other possibility which is sizing the underlying trade, for now).


Another question is what kinds of trading ideas do we want to consider for the meta-strategy?  In principle one could incorporate almost any trading concept, including the usual range of technical indicators such as RSI, or Bollinger bands.  One can go further and use machine learning techniques, including Neural Networks, Random Forest, or SVM.

In practice, one tends to gravitate towards the simpler kinds of trading algorithm, such as moving averages (or MA crossover techniques), although there is nothing to say that more complex trading rules should not be considered.  The development process follows a familiar path:  you create a hypothesis, for example, that the equity curve of the underlying bond futures strategy tends to be mean-reverting, and then proceed to test it using various signals – perhaps a moving average, in this case.  If the signal results in a potential improvement in the performance of the default meta-strategy (which is to take every trade in the underlying system), one includes it in the library of signals that may ultimately be combined to create the finished meta-strategy.
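As an illustration of this kind of hypothesis test, here is a minimal sketch of a moving-average filter applied to the trade-time equity curve; the lookback and the take/skip rule are illustrative assumptions, not the signals actually used in the meta-strategy developed below.

```python
import numpy as np

def ma_filter_meta_strategy(trade_pnl, lookback=20):
    """Trade-time meta-strategy sketch: take the next underlying trade only
    when the equity curve is above its moving average (a trend-following
    rule on equity; invert the comparison to test the mean-reversion
    hypothesis instead). Returns the filtered trade PNL series."""
    pnl = np.asarray(trade_pnl, dtype=float)
    equity = np.cumsum(pnl)
    filtered = []
    for i in range(len(pnl)):
        if i < lookback:
            filtered.append(pnl[i])          # take all trades until the MA exists
            continue
        ma = equity[i - lookback:i].mean()   # uses only information known before trade i
        take = equity[i - 1] > ma
        filtered.append(pnl[i] if take else 0.0)
    return np.array(filtered)
```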

As with any strategy development you should follow the usual procedure of separating the trade data into a set used for in-sample modeling and a set reserved for out-of-sample performance testing.

Following this general procedure I arrived at the following meta-strategy for the bond futures trading system.

FigB1

FigB2

The modeling procedure for the meta-strategy has succeeded in eliminating all of the losing trades in the underlying bond futures system, during both in-sample and out-of-sample periods (comprising the most recent 20% of trades).

In general, it is unlikely that one can hope to improve the performance of the underlying strategy quite as much as this, of course.  But it may well be possible to eliminate a sufficient proportion of losing trades to reduce the equity curve drawdown and/or increase the overall Sharpe ratio by a significant amount.

A Challenge / Opportunity

If you like the meta-strategy concept, but are unsure how to proceed, I may be able to help.

Send me the data for your existing strategy (see details below) and I will attempt to model a meta-strategy and send you the results.  We can together evaluate to what extent I have been successful in improving the performance of the underlying strategy.

Here are the details of what you need to do:

1. You must have an existing, profitable strategy, with sufficient performance history (either real, simulated, or a mixture of the two).  I don’t need to know the details of the underlying strategy, or even what it is trading, although it would be helpful to have that information.

2. You must send  the complete history of the equity curve of the underlying strategy,  in Excel format, with column headings Date, Open, High, Low, Close.  Each row represents consecutive trades of the underlying system and the O/H/L/C refers to the value of the equity curve for each trade.

3.  The history must comprise an absolute minimum of 500 trades, and preferably 1,000 trades or more.

4. At this stage I can only consider a single underlying strategy (i.e. a single equity curve).

5.  You should not include any software or algorithms of any kind.  Nothing proprietary, in other words.

6.  I will give preference to strategies that have a (partial) live track record.

As my time is very limited these days I will not be able to deal with any submissions that fail to meet these specifications, or to enter into general discussions about the trading strategy with you.

You can reach me at jkinlay@systematic-strategies.com

 

Identifying Drivers of Trading Strategy Performance

Building a winning strategy, like the one in the e-Mini S&P500 futures described here, is only half the challenge:  it remains for the strategy architect to gain an understanding of the sources of strategy alpha, and risk.  This means identifying the factors that drive strategy performance and, ideally, building a model so that their relative importance can be evaluated.  A more advanced step is the construction of a meta-model that will predict strategy performance and provide recommendations as to whether the strategy should be traded over the upcoming period.

Strategy Performance – Case Study

Let’s take a look at how this works in practice.  Our case study makes use of the following daytrading strategy in e-Mini futures.

Fig1

The overall performance of the strategy is quite good.  Average monthly PNL over the period from April to Oct 2015 is almost $8,000 per contract, after fees, with a standard deviation of only $5,500. That equates to an annual Sharpe Ratio in the region of 5.0.  On a decent execution platform the strategy should scale to around 10-15 contracts, with an annual PNL of around $1.0 to $1.5 million.
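A quick back-of-the-envelope check of that annualization (ignoring the risk-free rate): the annual Sharpe ratio is just the monthly mean-to-volatility ratio scaled by the square root of 12.

```python
import math

monthly_mean, monthly_std = 8_000.0, 5_500.0        # per contract, after fees
annual_sharpe = (monthly_mean / monthly_std) * math.sqrt(12)
print(round(annual_sharpe, 1))                      # -> 5.0
```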

Looking into the performance more closely we find that the win rate (56%) and profit factor (1.43) are typical for a profitable strategy of medium frequency, trading around 20 times per session (in this case from 9:30AM to 4PM EST).

fig2

Another attractive feature of the strategy risk profile is the Maximum Adverse Excursion (MAE), the drawdown experienced within individual trades (rather than the realized drawdown). In the chart below we see that the MAE increases steadily, without major outliers, to a maximum of only around $1,000 per contract.

Fig3

One concern is that the average trade PL is rather small – $20, just over 1.5 ticks. Strategies that enter and exit with limit orders and have a small average trade are generally highly dependent on the fill rate – i.e. the proportion of limit orders that are filled.  If the fill rate is too low, the strategy will be left with too many missed trades on entry or exit, or both.  This is likely to damage strategy performance, perhaps to a significant degree – see, for example, my post on High Frequency Trading Strategies.

The fill rate depends on the proportion of limit orders that are (assumed to be) filled at the extreme high or low of the bar, known as the extreme hit rate.  In this case the strategy has been designed specifically to operate at an extreme hit rate of only around 10%, which means that, on average, only around one trade in ten occurs at the high or low of the bar.  Consequently, the strategy is not highly fill-rate dependent and should execute satisfactorily even on a retail platform like Tradestation or Interactive Brokers.
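For readers who want to measure this in their own backtests, here is a rough sketch of the calculation; the trade-list column names are hypothetical, and the definition (entry price touching the bar’s high or low) is a simplification.

```python
import pandas as pd

def extreme_hit_rate(trades: pd.DataFrame) -> float:
    """Fraction of entries filled at the extreme of the entry bar.
    Assumed columns: 'entry_price', 'bar_high', 'bar_low'."""
    at_extreme = (trades["entry_price"] >= trades["bar_high"]) | (
        trades["entry_price"] <= trades["bar_low"]
    )
    return float(at_extreme.mean())

# Around 0.10 (10%) suggests limited fill-rate dependency; 0.20 or more
# would argue against running the strategy on a retail platform.
```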

Drivers of Strategy Performance

So far so good.  But before we put the strategy into production, let’s try to understand some of the key factors that determine its performance.  Hopefully that way we will be better placed to judge how profitable the strategy is likely to be as market conditions evolve.

In fact, we have already identified one potential key performance driver: the extreme hit rate (required fill rate) and determined that it is not a major concern in this case. However, in cases where the extreme hit rate rises to perhaps 20%, or more, the fill ratio is likely to become a major factor in determining the success of the strategy.  It would be highly inadvisable to attempt implementation of such a strategy on a retail platform.


What other factors might affect strategy performance?  The correct approach here is to apply the scientific method:  develop some theories about the drivers of performance and see if we can find evidence to support them.

For this case study we might conjecture that, since the strategy enters and exits using limit orders, it should exhibit characteristics of a mean reversion strategy, which will tend to do better when the market moves sideways and rather worse in a strongly trending market.

Another hypothesis is that, in common with most day-trading and high frequency strategies, this strategy will produce better results during periods of higher market volatility.  Empirically, HFT firms have always produced higher profits during volatile market conditions  – 2008 was a banner year for many of them, for example.  In broad terms, times when the market is whipsawing around create additional opportunities for strategies that seek to exploit temporary mis-pricings.  We shall attempt to qualify this general understanding shortly.  For now let’s try to gather some evidence that might support the hypotheses we have formulated.

I am going to take a very simple approach to this, using linear regression analysis.  It’s possible to do much more sophisticated analysis using nonlinear methods, including machine learning techniques. In our regression model the dependent variable will be the daily strategy returns.  In the first iteration, let’s use measures of market returns, trading volume and market volatility as the independent variables.
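A minimal version of that first regression might look like the following, using statsmodels; the DataFrame and its column names are assumptions standing in for the actual data.

```python
import pandas as pd
import statsmodels.api as sm

def factor_regression(daily: pd.DataFrame):
    """OLS of daily strategy returns on same-day market return, trading
    volume and volatility. Column names are illustrative assumptions."""
    X = sm.add_constant(daily[["spx_ret", "volume", "volatility"]])
    return sm.OLS(daily["strat_ret"], X).fit()

# fit = factor_regression(daily_data)
# print(fit.summary())   # inspect adjusted R-squared, coefficient signs and t-stats
```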

Fig4

The first surprise is the size of the (adjusted) R Square – at 28%, this far exceeds the typical 5% to 10% level achieved in most such regression models when applied to trading systems.  In other words, this model does a very good job of accounting for a large proportion of the variation in strategy returns.

Note that the returns in the underlying S&P500 index play no part (the coefficient is not statistically significant). We might expect this: ours is a trading strategy that is not specifically designed to be directional and has approximately equivalent performance characteristics on both the long and short side, as you can see from the performance report.

Now for the next surprise: the sign of the volatility coefficient.  Our ex-ante hypothesis was that the strategy would benefit from higher levels of market volatility.  In fact, the reverse appears to be true (the coefficient is negative).  How can this be?  On further reflection, the reason why most HFT strategies tend to benefit from higher market volatility is that they are momentum strategies.  A momentum strategy typically enters and exits using market orders and hence requires a major market move to overcome the drag of the bid-offer spread (assuming it calls the market direction correctly!).  This strategy, by contrast, is a mean-reversion strategy, since entries and exits are effected using limit orders.  The strategy wants the S&P500 index to revert to the mean – a large move that continues in the same direction is going to hurt, not help, this strategy.

Note, by contrast, that the coefficient for the volume factor is positive and statistically significant.  Again this makes sense:  as anyone who has traded the e-mini futures overnight can tell you, the market tends to make major moves when volume is light – simply because it is easier to push around.  Conversely, during a heavy trading day there is likely to be significant opposition to a move in any direction.  In other words, the market is more likely to trade sideways on days when trading volume is high, and this is beneficial for our strategy.

The final surprise, and perhaps the greatest of all, is that the strategy alpha appears to be negative (and statistically significant)!  How can this be?  What the regression analysis appears to be telling us is that the strategy’s performance is largely determined by two underlying factors, volume and volatility.

Let’s dig into this a little more deeply with another regression, this time relating the current day’s strategy return to the prior day’s volume, volatility and market return.
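The lagged version is essentially the same regression with the explanatory variables shifted back by one day; a sketch, using the same assumed column names as before:

```python
import pandas as pd
import statsmodels.api as sm

def lagged_factor_regression(daily: pd.DataFrame):
    """OLS of today's strategy return on yesterday's market return,
    volume and volatility (same assumed columns as before)."""
    lagged = daily[["spx_ret", "volume", "volatility"]].shift(1).dropna()
    y = daily["strat_ret"].loc[lagged.index]
    return sm.OLS(y, sm.add_constant(lagged)).fit()
```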

Fig5

In this regression model the strategy alpha is effectively zero and statistically insignificant, as is the case for lagged volume.  The strategy returns relate inversely to the prior day’s market return, which again appears to make sense for a mean reversion strategy:  our model anticipates that, on average, the market will reverse the prior day’s gain or loss.  The coefficient for the lagged volatility factor is once again negative and statistically significant.  This, too, makes sense:  volatility tends to be highly autocorrelated, so if the strategy’s performance is dependent on market volatility during the current session, it is likely to show dependency on volatility in the prior day’s session also.

So, in summary, we can provisionally conclude that:

This strategy has no market-directional predictive power: rather, it is a pure mean-reversion strategy that looks to make money by betting on a reversal of the prior session’s market direction.  It will do better during periods when trading volume is high, and when market volatility is low.

Conclusion

Now that we have some understanding of where the strategy performance comes from, where do we go from here?  The next steps might include some, or all, of the following:

(i) A more sophisticated econometric model bringing in additional lags of the explanatory variables and allowing for interaction effects between them.

(ii) Introducing additional exogenous variables that may have predictive power. Depending on the nature of the strategy, likely candidates might include related equity indices and futures contracts.

(iii) Constructing a predictive model and meta-strategy that would enable us to assess the likely future performance of the strategy, and which could then be used to determine position size.  Machine learning techniques can often be helpful in this context (a bare-bones sketch follows this list).
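By way of illustration of point (iii) only, a bare-bones meta-model might look something like the sketch below, using scikit-learn; the features, the classifier and the sizing rule are placeholders rather than a recommendation.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def fit_meta_model(daily: pd.DataFrame) -> RandomForestClassifier:
    """Classify whether tomorrow's strategy return will be positive using
    today's factor values. Column names are illustrative assumptions."""
    features = daily[["spx_ret", "volume", "volatility"]].shift(1).dropna()
    target = (daily["strat_ret"].loc[features.index] > 0).astype(int)
    return RandomForestClassifier(n_estimators=200, random_state=0).fit(features, target)

# Position-sizing sketch: scale tomorrow's size by the predicted probability
# of a profitable session (evaluate out-of-sample before trusting it).
# p_win = model.predict_proba(todays_features)[:, 1]
# size = round(base_size * p_win)
```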

I will give an example of the latter approach in my next post.

Trading Strategy Design

In this post I want to share some thoughts on how to design great automated trading strategies – what to look for, and what to avoid.

For illustrative purposes I am going to use a strategy I designed for the ever-popular S&P500 e-mini futures contract.

The overall equity curve for the strategy is shown below.

@ES Equity Curve

This is often the best place to start.  What you want to see, of course, is a smooth, upward-sloping curve, without too many sizable drawdowns, and one in which the strategy continues to make new highs.  This is especially important in the out-of-sample test period (Jan 2014- Jul 2015 in this case).  You will notice a flat period around 2013, which we will need to explore later.  Overall, however, this equity curve appears to fit the stereotypical pattern we hope to see when developing a new strategy.

Let’s move on to look at the overall strategy performance numbers.

STRATEGY PERFORMANCE CHARACTERISTICS

@ES Perf Summary

1. Net Profit
Clearly, the most important consideration.  Over the 17-year test period the strategy has produced a net profit averaging around $23,000 per annum, per contract.  As a rough guide, you would want to see a net profit per contract of around 10x the maintenance margin, or higher.

2. Profit Factor
The gross profit divided by the gross loss.  You want this to be as high as possible: if it is too low, the strategy will be difficult to trade, because you will see sustained periods of substantial losses.  I would suggest a minimum acceptable PF in the region of 1.25.  Many strategy developers aim for a PF of 1.5, or higher.  (A short sketch showing how these headline statistics can be computed from the trade record appears at the end of this list.)

3. Number of Trades
Generally, the more trades the better, at least from the point of view of building confidence in the robustness of strategy performance.  A strategy may show a great P&L, but if it only trades once a month it is going to take many, many years of performance data to ensure statistical significance.  This strategy, on the other hand, is designed to trade 2-3 times a day.  Given that, and the length of the test period, there is little doubt that the results are statistically significant.


Profit Factor and number of trades are opposing design criteria – increasing the # trades tends to reduce the PF.  That consideration sets an upper bound on the # trades that can be accommodated, before the profit factor deteriorates to unacceptably low levels.  Typically, 4-5 trades a day is about the maximum trading frequency one can expect to achieve.

4. Win Rate
Novice system designers tend to assume that you want this to be as high as possible, but that isn’t typically the case.  It is perfectly feasible to design systems that have a 90% win rate, or higher, but which produce highly undesirable performance characteristics, such as frequent, large drawdowns.  For a typical trading system the optimal range for the win rate is in the region of 40% to 66%.  Below this range, it becomes difficult to tolerate the long sequences of losses that will result, without losing faith in the system.

5. Average Trade
This is the average net profit per trade.  A typical range would be $10 to $100.  Many designers will only consider strategies that have a higher average trade than this one, perhaps $50-$75, or more.  The issue with systems that have a very small average trade is that the profits can quickly be eaten up by commissions. Even though, in this case, the results are net of commissions, one can see a significant deterioration in profits if the average trade is low and trade frequency is high, because of the risk of low fill rates (i.e. the % of limit orders that get filled).  To assess this risk one looks at the number of fills assumed to take place at the high or low of the bar.  If this exceeds 10% of the total # trades, one can expect to see some slippage in the P&L when the strategy is put into production.

6. Average Bars
The number of bars required to complete a trade, on average.  There is no hard limit one can suggest here – it depends entirely on the size of the bars.  Here we are working in 60 minute bars, so a typical trade is held for around 4.5 hours, on average.   That’s a time-frame that I am comfortable with.  Others may be prepared to hold positions for much longer – days, or even weeks.

Perhaps more important is the average length of losing trades. What you don’t want to see is the strategy taking far longer to exit losing trades than winning trades. Again, this is a matter of trader psychology – it is hard to sit there hour after hour, or day after day, in a losing position – the temptation to cut the position becomes hard to ignore.  But, in doing that you are changing the strategy characteristics in a fundamental way, one that rarely produces a performance improvement.

What the strategy designer needs to do is to figure out in advance what the limits are of the investor’s tolerance for pain, in terms of maximum drawdown, average losing trade, etc, and design the strategy to meet those specifications, rather than trying to fix the strategy afterwards.

7. Required Account Size
It’s good to know exactly how large an account you need per contract, so you can figure out how to scale the strategy.  In this case one could hope to scale the strategy up to a 10-lot in a $100,000 account.  That may or may not fit the trader’s requirements and, again, this needs to be considered at the outset.  For example, for a trader looking to utilize, say, $1,000,000 of capital, it is doubtful whether this strategy would fit his requirements without considerable work on the implementation issues that arise when trying to trade in anything approaching a 100-contract clip rate.

8. Commission
Always check to ensure that the strategy designer has made reasonable assumptions about slippage and commission.  Here we are assuming $5 per round turn.  There is no slippage, because the strategy executes using limit orders.

9. Drawdown
Drawdowns are, of course, every investor’s bugbear.  No-one likes drawdowns that are either large, or lengthy in relation to the annual profitability of the strategy, or the average trade duration.  A $10,000 max drawdown on a strategy producing over $23,000 a year is actually quite decent – I have seen many e-mini strategies with drawdowns at 2x-3x that level, or larger.  Again, this is one of the key criteria that needs to be baked into the strategy design at the outset, rather than fixed afterwards.
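As promised above, most of the headline statistics in this list can be computed directly from the per-trade net P&L series; a minimal sketch, with the inputs assumed rather than taken from the actual trade record:

```python
import pandas as pd

def headline_stats(pnl: pd.Series, years: float) -> dict:
    """Net profit per year, profit factor, win rate and average trade,
    from a per-trade net P&L series (per contract, after commission)."""
    wins, losses = pnl[pnl > 0], pnl[pnl < 0]
    return {
        "net_profit_per_year": pnl.sum() / years,
        "profit_factor": wins.sum() / abs(losses.sum()),
        "win_rate": (pnl > 0).mean(),
        "average_trade": pnl.mean(),
        "num_trades": len(pnl),
    }
```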

 ANNUAL PROFITABILITY

Let’s now take a look at how the strategy performs year-by-year, and some of the considerations and concerns that often arise.

@ES Annual

1. Performance During Downturns
One aspect I always pay attention to is how well the strategy performs during periods of high market stress, because I expect similar conditions to arise in the fairly near future, e.g. as the Fed begins to raise rates.

Here, as you can see, the strategy performed admirably during both the dot com bust of 1999/2000 and the financial crisis of 2008/09.

2. Consistency in the # Trades and % Win Rate
It is not uncommon with low-frequency strategies to see periods of substantial variation in the # trades or win rate.  Regardless of how good the overall performance statistics are, this makes me uncomfortable.  It could be, for instance, that the overall results are influenced by one or two exceptional years that are unlikely to be repeated.  Significant variation in the trading rate or win rate raises questions about the robustness of the strategy going forward.  On the other hand, as here, it is a comfort to see the strategy maintaining a very steady trading rate and % win rate, year after year.

3. Down Years
Every strategy shows variation in year-to-year performance, and one expects to see years in which the strategy performs less well, or even loses money. For me, when such losses arise matters as much as their size.  If a loss occurs in the out-of-sample period it raises serious questions about strategy robustness and, as a result, I am very unlikely to want to put such a strategy into production. If, as here, the period of poor performance occurs during the in-sample period I am less concerned – the strategy has other, favorable characteristics that make it attractive and I am willing to tolerate the risk of one modestly down year in over 17 years of testing.

INTRA-TRADE DRAWDOWNS

Many trades that end up being profitable go through a period of being under-water.  What matters here is how high those intra-trade losses may climb, before the trade is closed.  To take an extreme example, would you be willing to risk $10,000 to make an average profit of only $10 per trade?  How about $20,000? $50,000? Your entire equity?

The Maximum Adverse Excursion chart below shows these drawdowns on a trade-by-trade basis.  Here we can see that, over the 17-year test period, no trade has suffered a drawdown of much more than $5,000.  I am comfortable with that level. Others may prefer a lower limit, or be tolerant of a higher MAE.

MAE

Again, the point is that the problem of a too-high MAE is not something one can fix after the event.  Sure, a stop loss will prevent any losses above a specified size.  But a stop loss also has the unwanted effect of terminating trades that would have turned into money-makers. While psychologically comfortable, the effect of a stop loss is almost always negative  in terms of strategy profitability and other performance characteristics, including drawdown, the very thing that investors are looking to control.
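For completeness, here is a rough sketch of how the per-trade MAE can be computed from the bar data spanned by each (long) position; the column names are assumptions to be adapted to your own trade records, and the $50 point value applies to the e-mini S&P contract.

```python
import pandas as pd

def trade_mae(trade: pd.Series, bars: pd.DataFrame, point_value: float = 50.0) -> float:
    """Maximum adverse excursion of a single long trade, in dollars.
    `trade` needs 'entry_time', 'exit_time' and 'entry_price';
    `bars` is OHLC data indexed by timestamp ($50/point for the e-mini)."""
    window = bars.loc[trade["entry_time"]:trade["exit_time"]]
    worst_move = window["Low"].min() - trade["entry_price"]   # most negative excursion
    return max(0.0, -worst_move) * point_value

# mae = trades.apply(trade_mae, axis=1, bars=bar_data)   # one MAE figure per trade
```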

CONCLUSION

I have tried to give some general guidelines for factors that are of critical importance in strategy design.  There are, of course, no absolutes:  the “right” characteristics depend entirely on the risk preferences of the investor.

One point that strategy designers do need to take on board is the need to factor in all of the important design criteria at the outset, rather than trying (and usually failing) to repair the strategy shortcomings after the event.