Robustness in Quantitative Research and Trading

What is Strategy Robustness?  What is its relevance to Quantitative Research and Trading?

One of the most highly desired properties of any financial model or investment strategy, by investors and managers alike, is robustness.  I would define robustness as the ability of the strategy to deliver consistent results across a wide range of market conditions.  It is, of course, by no means the only desirable property – investing in Treasury bills is also a pretty robust strategy, although the returns are unlikely to set an investor’s pulse racing – but it does ensure that the investor, or manager, is unlikely to be on the receiving end of an ugly surprise when market conditions change.

Robustness is not the same thing as low volatility, which also tends to be a characteristic highly prized by many investors.  A strategy may operate consistently, with low volatility, in certain market conditions, but behave very differently in others – a delta-hedged short-volatility book containing exotic derivative positions is a case in point.  The point is that empirical researchers do not know the true data-generating process for the markets they are modeling. When specifying an empirical model they need to make arbitrary assumptions. An example is the common assumption that asset returns follow a Gaussian distribution.  In fact, the empirical distributions of the great majority of asset processes exhibit the characteristic “fat tails”, which can result from the interplay between multiple market states with random transitions.  See this post for details:

http://jonathankinlay.com/2014/05/a-quantitative-analysis-of-stationarity-and-fat-tails/

 

In statistical arbitrage, for example, quantitative researchers often make use of cointegration models to build pairs trading strategies.  However, the testing procedures used in current practice are not sufficiently powerful to distinguish between cointegrated processes and those whose evolution just happens to correlate temporarily, resulting in the frequent breakdown of cointegrating relationships.  For instance, see this post:

http://jonathankinlay.com/2017/06/statistical-arbitrage-breaks/

Modeling Assumptions are Often Wrong – and We Know It

We are, of course, not the first to suggest that empirical models are misspecified:

“All models are wrong, but some are useful” (Box 1976, Box and Draper 1987).

 

Martin Feldstein (1982: 829): “In practice all econometric specifications are necessarily false models.”

 

Luke Keele (2008: 1): “Statistical models are always simplifications, and even the most complicated model will be a pale imitation of reality.”

 

Peter Kennedy (2008: 71): “It is now generally acknowledged that econometric models are false and there is no hope, or pretense, that through them truth will be found.”

During the crash of 2008, quantitative analysts and risk managers found out the hard way that the assumptions underpinning the copula models used to price and hedge credit derivative products were highly sensitive to market conditions.  In other words, they were not robust.  See this post for more on the application of copula theory in risk management:

http://jonathankinlay.com/2017/01/copulas-risk-management/

 

Robustness Testing in Quantitative Research and Trading

We interpret model misspecification as model uncertainty. Robustness tests analyze model uncertainty by comparing a baseline model to plausible alternative model specifications.  Rather than trying to specify models correctly (an impossible task given causal complexity), researchers should test whether the results obtained by their baseline model, which is their best attempt at optimizing the specification of their empirical model, hold when they systematically replace the baseline model specification with plausible alternatives. This is the practice of robustness testing.


Robustness testing analyzes the uncertainty of models and tests whether estimated effects of interest are sensitive to changes in model specifications. The uncertainty about the baseline model’s estimated effect size shrinks if the robustness test model finds the same or similar point estimate with smaller standard errors, though with multiple robustness tests the uncertainty is likely to increase. The uncertainty about the baseline model’s estimated effect size increases if the robustness test model obtains different point estimates and/or larger standard errors. Either way, robustness tests can increase the validity of inferences.

Robustness testing replaces the scientific crowd by a systematic evaluation of model alternatives.

Robustness in Quantitative Research

In the literature, robustness has been defined in different ways:

  • as same sign and significance (Leamer)
  • as weighted average effect (Bayesian and Frequentist Model Averaging)
  • as effect stability

We define robustness as effect stability.

Parameter Stability and Properties of Robustness

Robustness is the share of the probability density function of the robustness test model that falls within the 95-percent confidence interval of the baseline model.  In formulaic terms (a short computational sketch follows the list of properties below):

$$\rho \;=\; \int_{\hat{\beta}_B - 1.96\,\sigma_B}^{\hat{\beta}_B + 1.96\,\sigma_B} f_R(\beta)\, d\beta$$

where $\hat{\beta}_B$ and $\sigma_B$ are the baseline model’s point estimate and standard error, and $f_R(\cdot)$ is the (normal) sampling density of the robustness test model’s estimate.

  • Robustness is left–right symmetric: identical positive and negative deviations of the robustness test compared to the baseline model give the same degree of robustness.
  • If the standard error of the robustness test is smaller than the one from the baseline model, ρ converges to 1 as long as the difference in point estimates is negligible.
  • For any given standard error of the robustness test, ρ is always and unambiguously smaller the larger the difference in point estimates.
  • Differences in point estimates have a strong influence on ρ if the standard error of the robustness test is small but a small influence if the standard errors are large.
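
To make the definition concrete, here is a minimal Python sketch (using SciPy and assuming approximately normal sampling distributions for both estimates); the function name and the numbers in the example calls are purely illustrative:

```python
from scipy.stats import norm

def robustness_rho(beta_base, se_base, beta_test, se_test, level=0.95):
    """Degree of robustness: share of the robustness test model's (normal)
    sampling distribution that falls inside the baseline model's
    confidence interval (95% by default)."""
    z = norm.ppf(0.5 + level / 2.0)              # ~1.96 for a 95% interval
    lower = beta_base - z * se_base              # baseline CI bounds
    upper = beta_base + z * se_base
    # probability mass of N(beta_test, se_test) lying inside [lower, upper]
    return norm.cdf(upper, beta_test, se_test) - norm.cdf(lower, beta_test, se_test)

# Same point estimate, smaller test standard error: rho close to 1
print(robustness_rho(0.50, 0.10, 0.50, 0.05))    # ~1.00
# A sizeable difference in point estimates pulls rho down
print(robustness_rho(0.50, 0.10, 0.40, 0.10))    # ~0.83
```

The two example calls mirror the properties listed above: an identical point estimate with a smaller standard error pushes ρ towards 1, while a larger gap between point estimates pulls it down.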

Robustness Testing in Four Steps

  1. Define the subjectively optimal specification for the data-generating process at hand. Call this model the baseline model.
  2. Identify assumptions made in the specification of the baseline model which are potentially arbitrary and that could be replaced with alternative plausible assumptions.
  3. Develop models that change one of the baseline model’s assumptions at a time. These alternatives are called robustness test models.
  4. Compare the estimated effects of each robustness test model to the baseline model and compute the estimated degree of robustness.

Model Variation Tests

Model variation tests change one (or sometimes more) of the model specification assumptions and replace it with an alternative assumption, such as:

  • change in set of regressors
  • change in functional form
  • change in operationalization
  • change in sample (adding or subtracting cases)

Example: Functional Form Test

The functional form test examines the baseline model’s functional form assumption against a higher-order polynomial model. The two models should be nested, so that the baseline functional form is a special case of the alternative. As an example, we analyze the ‘environmental Kuznets curve’ prediction, which suggests the existence of an inverted U-shaped relation between per capita income and emissions.

Emissions and per capita income

Note: grey-shaded area represents confidence interval of baseline model
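
A hedged sketch of how such a nested functional-form comparison might be set up in Python with statsmodels; the simulated data stand in for the emissions panel, and all numbers and variable names are invented for illustration only:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
income = rng.uniform(0.5, 50.0, 500)                     # per capita income (simulated)
emissions = 2 + 0.8 * income - 0.012 * income**2 + rng.normal(0, 2, 500)

def poly_design(x, order):
    """Design matrix with an intercept and powers 1..order of x."""
    return sm.add_constant(np.column_stack([x**k for k in range(1, order + 1)]))

baseline = sm.OLS(emissions, poly_design(income, 2)).fit()      # inverted-U baseline
robust_test = sm.OLS(emissions, poly_design(income, 4)).fit()   # higher-order alternative

# Compare the fitted income-emissions curves over the observed income range
grid = np.linspace(income.min(), income.max(), 100)
fit_base = poly_design(grid, 2) @ baseline.params
fit_test = poly_design(grid, 4) @ robust_test.params
print("Maximum divergence between fitted curves:", np.abs(fit_base - fit_test).max())
```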

Another example of functional form testing is given in this review of Yield Curve Models:

http://jonathankinlay.com/2018/08/modeling-the-yield-curve/

Random Permutation Tests

Random permutation tests change specification assumptions repeatedly. Usually, researchers specify a model space and randomly and repeatedly select models from this model space. Examples:

  • sensitivity tests (Leamer 1978)
  • artificial measurement error (Plümper and Neumayer 2009)
  • sample split – attribute aggregation (Traunmüller and Plümper 2017)
  • multiple imputation (King et al. 2001)

We use Monte Carlo simulation to test the sensitivity of the performance of our Quantitative Equity strategy to changes in the price generation process and also in model parameters:

http://jonathankinlay.com/2017/04/new-longshort-equity/

Structured Permutation Tests

Structured permutation tests change a model assumption within a model space in a systematic way. Changes in the assumption are based on a rule, rather than being made at random.  Possibilities here include:

  • sensitivity tests (Levine and Renelt)
  • jackknife test
  • partial demeaning test

Example: Jackknife Robustness Test

The jackknife robustness test is a structured permutation test that systematically excludes one or more observations from the estimation at a time until all observations have been excluded once. With a ‘group-wise jackknife’ robustness test, researchers systematically drop a set of cases that group together by satisfying a certain criterion – for example, countries within a certain per capita income range or all countries on a certain continent. In the example, we analyze the effect of earthquake propensity on quake mortality for countries with democratic governments, excluding one country at a time. We display the results using per capita income as information on the x-axis.

jackknife

Upper and lower bound mark the confidence interval of the baseline model.
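
A minimal sketch of a group-wise jackknife in Python (pandas and statsmodels), assuming a hypothetical DataFrame `df` with columns `mortality`, `quake_propensity`, `income` and `country`; the names are placeholders rather than the actual dataset used in the example above:

```python
import pandas as pd
import statsmodels.formula.api as smf

def jackknife_coefficients(df, formula, effect, group_col):
    """Re-estimate `formula` dropping one group at a time, collecting the
    coefficient on `effect` and its 95% confidence interval."""
    rows = []
    for grp in df[group_col].unique():
        fit = smf.ols(formula, data=df[df[group_col] != grp]).fit()
        ci_low, ci_high = fit.conf_int().loc[effect]
        rows.append({"excluded": grp, "coef": fit.params[effect],
                     "ci_low": ci_low, "ci_high": ci_high})
    return pd.DataFrame(rows)

# Hypothetical usage:
# results = jackknife_coefficients(df, "mortality ~ quake_propensity + income",
#                                  effect="quake_propensity", group_col="country")
```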

Robustness Limit Tests

Robustness limit tests provide a way of analyzing structured permutation tests. These tests ask how much a model specification has to change to render the effect of interest non-robust. Some examples of robustness limit testing approaches:

  • unobserved omitted variables (Rosenbaum 1991)
  • measurement error
  • under- and overrepresentation
  • omitted variable correlation

For an example of limit testing, see this post on a review of the Lognormal Mixture Model:

http://jonathankinlay.com/2018/08/the-lognormal-mixture-variance-model/

Summary on Robustness Testing

Robustness tests have become an integral part of research methodology. They allow researchers to study the influence of arbitrary specification assumptions on estimates, and can identify uncertainties that would otherwise escape the attention of empirical researchers. Robustness tests offer what is currently the most promising answer to model uncertainty.

Tactical Mutual Fund Strategies

A recent blog post of mine appeared on Seeking Alpha (see the summary below if you missed it).

Capital

The essence of the idea is simply that one can design long-only, tactical market timing strategies that perform robustly during market downturns, or which may even be positively correlated with volatility.  I used the example of a LOMT (“Long-Only Market-Timing”) strategy that switches between the SPY ETF and 91-Day T-Bills, depending on the current outlook for the market as characterized by machine learning algorithms.  As I indicated in the article, the LOMT handily outperforms the buy-and-hold strategy over the period 1994-2017 by several hundred basis points:

Fig6
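
The machine-learning signal behind LOMT is not described here, but the switching bookkeeping itself is straightforward. The sketch below is a simplified illustration only, assuming a hypothetical 0/1 `in_market` signal and daily return series for SPY and T-bills; it is not the LOMT model itself:

```python
import pandas as pd

def switching_returns(spy_ret: pd.Series, tbill_ret: pd.Series,
                      in_market: pd.Series) -> pd.Series:
    """Daily returns of a long-only switching strategy.

    in_market is a hypothetical 0/1 signal (1 = hold SPY, 0 = hold T-bills),
    lagged one day so that today's position is based on yesterday's signal."""
    position = in_market.shift(1).fillna(0)
    return position * spy_ret + (1 - position) * tbill_ret

# equity_curve = (1 + switching_returns(spy, tbills, signal)).cumprod()
```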

 

Of particular note is the robustness of the LOMT strategy performance during the market crashes in 2000/01 and 2008, as well as the correction in 2015:

 

Fig7

 

The Pros and Cons of Market Timing (aka “Tactical”) Strategies

One of the popular choices for the investor concerned about downside risk is to use put options (or put spreads) to hedge some of the market exposure.  The problem, of course, is that the cost of the hedge acts as a drag on performance, which may be reduced by several hundred basis points annually, depending on market volatility.  Trying to decide when to use option insurance and when to maintain full market exposure is just another variation on the market timing problem.

The point of tactical strategies is that, unlike an option hedge, they will continue to produce positive returns – albeit at a lower rate than the market portfolio – during periods when markets are benign, while at the same time offering much superior returns during market declines, or crashes.   If the investor is concerned about the lower rate of return he is likely to achieve during normal years, the answer is to make use of leverage.


Market timing strategies like Hull Tactical or the LOMT have higher risk-adjusted rates of return (Sharpe Ratios) than the market portfolio.  So the investor can make use of margin money to scale up his investment to about the same level of risk as the market index.  In doing so he will expect to earn a much higher rate of return than the market.
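
To illustrate with purely hypothetical numbers: suppose the timing strategy earns 8% a year with 7% volatility, while the index earns 9% with 15% volatility. Levering the strategy roughly 2x brings its volatility up to around 14%, close to that of the index, while the expected return, before financing costs, rises toward 16%, comfortably ahead of the market. The actual figures will of course depend on the strategy, prevailing margin rates and the investor’s risk tolerance.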

This is easy to do with products like LOMT or Hull Tactical, because they make use of marginable securities such as ETFs.   As I point out in the sections following, one of the shortcomings of applying the market timing approach to mutual funds, however, is that they are not marginable (not initially, at least), so the possibilities for using leverage are severely restricted.

Market Timing with Mutual Funds

An interesting suggestion from one Seeking Alpha reader was to apply the LOMT approach to the Vanguard 500 Index Investor fund (VFINX), which has a rather longer history than the SPY ETF.  Unfortunately, I only have ready access to data from 1994, but nonetheless applied the LOMT model over that time period.  This is an interesting challenge, since none of the VFINX data was used in the actual construction of the LOMT model.  The fact that the VFINX series is highly correlated with SPY is not the issue – it is typically the case that strategies developed for one asset will fail when applied to a second, correlated asset.  So, while it is perhaps hard to argue that the entire VFINX series is out-of-sample, the performance of the strategy when applied to that series will serve to confirm (or otherwise) the robustness and general applicability of the algorithm.

The results turn out as follows:

 

Fig21

 

Fig22

 

Fig23

 

The performance of the LOMT strategy implemented for VFINX handily outperforms the buy-and-hold portfolios in the SPY ETF and VFINX mutual fund, both in terms of return (CAGR) and risk, since strategy volatility is less than half that of buy-and-hold.  Consequently the risk-adjusted return (Sharpe Ratio) is around 3x higher.

That said, the VFINX variation of LOMT is distinctly inferior to the original version implemented in the SPY ETF, for which the trading algorithm was originally designed.   Of particular significance in this context is that the SPY version of the LOMT strategy produces substantial gains during the market crash of 2008, whereas the VFINX version of the market timing strategy results in a small loss for that year.  More generally, the SPY-LOMT strategy has a higher Sortino Ratio than the mutual fund timing strategy, a further indication of its superior ability to manage  downside risk.

Given that the objective is to design long-only strategies that perform well in market downturns, one need not pursue this particular example much further, since it is already clear that the LOMT strategy using SPY is superior in terms of risk and return characteristics to the mutual fund alternative.

Practical Limitations

There are other, practical issues with applying an algorithmic trading strategy to a mutual fund product like VFINX. To begin with, the mutual fund price series contains no open/high/low prices, or volume data, which are often used by trading algorithms.  Then there are the execution issues:  funds can only be purchased or sold at market prices, whereas many algorithmic trading systems use other order types to enter and exit positions (stop and limit orders being common alternatives). You can’t sell short, and there are restrictions on the frequency of trading of mutual funds and penalties for early redemption.  And sales loads are often substantial (3% to 5% is not uncommon), so investors have to find a broker that lists the selected funds as no-load for the strategy to make economic sense.  Finally, mutual funds are often treated by the broker as ineligible for margin for an initial period (30 days, typically), which prevents the investor from leveraging his investment in the way that he can quite easily do using ETFs.

For these reasons one typically does not expect a trading strategy formulated using a stock or ETF product to transfer easily to another asset class.  The fact that the SPY-LOMT strategy appears to work successfully on the VFINX mutual fund product  (on paper, at least) is highly unusual and speaks to the robustness of the methodology.  But one would be ill-advised to seek to implement the strategy in that way.  In almost all cases a better result will be produced by developing a strategy designed for the specific asset (class) one has in mind.

A Tactical Trading Strategy for the VFINX Mutual Fund

A better outcome can possibly be achieved by developing a market timing strategy designed specifically for the VFINX mutual fund.  This strategy uses only market orders to enter and exit positions and attempts to address the issue of frequent trading by applying a trading cost to simulate the fees that typically apply in such situations.  The results, net of imputed fees, for the period from 1994-2017 are summarized as follows:

 

Fig24

 

Fig18

Overall, the CAGR of the tactical strategy is around 88 basis points higher, per annum.  The risk-adjusted rate of return (Sharpe Ratio) is not as high as for the LOMT-SPY strategy, since the annual volatility is almost double.  But, as I have already pointed out, there are unanswered questions about the practicality of implementing the latter for the VFINX, given that it seeks to enter trades using limit orders, which do not exist in the mutual fund world.

The performance of the tactical-VFINX strategy relative to the VFINX fund falls into three distinct periods: under-performance in the period from 1994-2002, about equal performance in the period 2003-2008, and superior relative performance in the period from 2008-2017.

Only the data from 1/1994 to 3/2008 were used in the construction of the model.  Data in the period from 3/2008 to 11/2012 were used for testing, while the results for 12/2012 to 8/2017 are entirely out-of-sample. In other words, the great majority of the period of superior performance for the tactical strategy was out-of-sample.  The chief reason for the improved performance of the tactical-VFINX strategy is the lower drawdown suffered during the financial crisis of 2008, compared to the benchmark VFINX fund.  Using market-timing algorithms, the tactical strategy was able to identify the downturn as it occurred and exit the market.  This is quite impressive since, as previously indicated, none of the data from the 2008 financial crisis was used in the construction of the model.

In his Seeking Alpha article “Alpha-Winning Stars of the Bull Market“, Brad Zigler identifies the handful of funds that have outperformed the VFINX benchmark since 2009, generating positive alpha:

Fig20

 

What is notable is that the annual alpha of the tactical-VFINX strategy, at 1.69%, is higher than any of those identified by Zigler as being “exceptional”. Furthermore, the annual R-squared of the tactical strategy is higher than four of the seven funds on Zigler’s All-Star list.   Based on Zigler’s performance metrics, the tactical VFINX strategy would be one of the top performing active funds.

But there is another element missing from the assessment. In the analysis so far we have assumed that in periods when the tactical strategy disinvests from the VFINX fund the proceeds are simply held in cash, at zero interest.  In practice, of course, we would invest any proceeds in risk-free assets such as Treasury Bills.   This would further boost the performance of the strategy, by several tens of basis points per annum, without any increase in volatility.  In other words, the annual CAGR and annual Alpha, are likely to be greater than indicated here.

Robustness Testing

One of the concerns with any backtest – even one with a lengthy out-of-sample period, as here – is that one is evaluating only a single sample path from the price process.  Different evolutions could have produced radically different outcomes in the past, or in future. To assess the robustness of the strategy we apply Monte Carlo simulation techniques to generate a large number of different sample paths for the price process and evaluate the performance of the strategy in each scenario.

Three different types of random variation are factored into this assessment:

  1. We allow the observed prices to fluctuate by +/- 30% with a probability of about 1/3 (so that, roughly every three days, the fund price will be adjusted up or down by up to that percentage).
  2. Strategy parameters are permitted to fluctuate by the same amount and with the same probability.  This ensures that we haven’t over-optimized the strategy with the selected parameters.
  3. Finally, we randomize the start date of the strategy by up to a year.  This reduces the risk of basing the assessment on the outcome from encountering a lucky (or unlucky) period, during which the market may be in a strong trend, for example.  (A simplified sketch of this randomization scheme follows the list.)
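
The sketch below is a simplified, hypothetical version of such a randomization scheme in Python; `run_backtest`, the price array and the parameter dictionary are placeholders for the actual backtest machinery, which is not shown:

```python
import numpy as np

rng = np.random.default_rng(7)

def randomize_inputs(prices, params, max_move=0.30, prob=1/3, max_start_shift=252):
    """One Monte Carlo scenario: jitter a NumPy array of prices and a dict of
    strategy parameters by up to +/-30% (each with probability ~1/3), and
    shift the start date by up to roughly a year of trading days."""
    hit = rng.random(len(prices)) < prob
    shocked_prices = prices * (1 + rng.uniform(-max_move, max_move, len(prices)) * hit)
    shocked_params = {k: v * (1 + rng.uniform(-max_move, max_move))
                      if rng.random() < prob else v
                      for k, v in params.items()}
    start = rng.integers(0, max_start_shift)
    return shocked_prices[start:], shocked_params

# Hypothetical usage, repeated for ~1,000 scenarios:
# results = [run_backtest(*randomize_inputs(vfinx_prices, strategy_params))
#            for _ in range(1000)]
```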

In the chart below we illustrate the outcome from around 1,000 such randomized sample paths, from which it can be seen that the strategy performance is robust and consistent.

Fig 19

 

Limitations to the Testing Procedure

We have identified one way in which this assessment understates the performance of the tactical-VFINX strategy:  by failing to take into account the uplift in returns from investing in interest-bearing Treasury securities, rather than cash, at times when the strategy is out of the market.  So it is only reasonable to point out other limitations of the test procedure that may paint too optimistic a picture.

The key consideration here is the frequency of trading.  On average, the tactical-VFINX strategy trades around twice a month, which is more than normally permitted for mutual funds.  Certainly, we have factored in additional trading costs to account for early redemption charges. But the question is whether or not the strategy would be permitted to trade at such frequency, even with the payment of additional fees.  If not, then the strategy would have to be re-tooled to work on longer average holding periods, no doubt adversely affecting its performance.

Conclusion

The purpose of this analysis was to assess whether, in principle, it is possible to construct a market timing strategy that is capable of outperforming a VFINX fund benchmark.  The answer appears to be in the affirmative.  However, several practical issues remain to be addressed before such a strategy could be put into production successfully.  In general, mutual funds are not ideal vehicles for expressing trading strategies, including tactical market timing strategies.  There are latent inefficiencies in mutual fund markets – the restrictions on trading and penalties for early redemption, to name but two – that create difficulties for active approaches to investing in such products; ETFs are much superior in this regard.  Nonetheless, this study suggests that, in principle, tactical approaches to mutual fund investing may deliver worthwhile benefits to investors, despite the practical challenges.

Identifying Drivers of Trading Strategy Performance

Building a winning strategy, like the one in the e-Mini S&P500 futures described here, is only half the challenge:  it remains for the strategy architect to gain an understanding of the sources of strategy alpha, and risk.  This means identifying the factors that drive strategy performance and, ideally, building a model so that their relative importance can be evaluated.  A more advanced step is the construction of a meta-model that will predict strategy performance and provide recommendations as to whether the strategy should be traded over the upcoming period.

Strategy Performance – Case Study

Let’s take a look at how this works in practice.  Our case study makes use of the following daytrading strategy in e-Mini futures.

Fig1

The overall performance of the strategy is quite good.  Average monthly PNL over the period from April to Oct 2015 is almost $8,000 per contract, after fees, with a standard deviation of only $5,500. That equates to an annual Sharpe Ratio in the region of 5.0.  On a decent execution platform the strategy should scale to around 10-15 contracts, with an annual PNL of around $1.0 to $1.5 million.

Looking into the performance more closely we find that the win rate (56%) and profit factor (1.43) are typical for a profitable strategy of medium frequency, trading around 20 times per session (in this case from 9:30AM to 4PM EST).

fig2

Another attractive feature of the strategy risk profile is the Maximum Adverse Excursion (MAE), the drawdown experienced in individual trades (rather than the realized drawdown). In the chart below we see that the MAE increases steadily, without major outliers, to a maximum of only around $1,000 per contract.

Fig3

One concern is that the average trade PL is rather small – $20, just over 1.5 ticks. Strategies that enter and exit with limit orders and have a small average trade are generally highly dependent on the fill rate – i.e. the proportion of limit orders that are filled.  If the fill rate is too low, the strategy will be left with too many missed trades on entry or exit, or both.  This is likely to damage strategy performance, perhaps to a significant degree – see, for example, my post on High Frequency Trading Strategies.

The fill rate is dependent on the number of limit orders posted at the extreme high or low of the bar, known as the extreme hit rate.  In this case the strategy has been designed specifically to operate at an extreme hit rate of only around 10%, which means that, on average, only around one trade in ten occurs at the high or low of the bar.  Consequently, the strategy is not highly fill-rate dependent and should execute satisfactorily even on a retail platform like Tradestation or Interactive Brokers.
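
As a rough illustration of how the extreme hit rate might be measured from a trade log, the sketch below assumes a hypothetical DataFrame of limit-order fills with the high and low of the bar in which each fill occurred; the column names are assumptions, not output from any particular platform:

```python
import pandas as pd

def extreme_hit_rate(fills: pd.DataFrame) -> float:
    """Share of fills that occur at the bar's extreme: at or below the bar low
    for buys, at or above the bar high for sells. Assumed columns:
    'side' (+1 buy / -1 sell), 'fill_price', 'bar_high', 'bar_low'."""
    at_extreme = (((fills["side"] > 0) & (fills["fill_price"] <= fills["bar_low"])) |
                  ((fills["side"] < 0) & (fills["fill_price"] >= fills["bar_high"])))
    return float(at_extreme.mean())
```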

Drivers of Strategy Performance

So far so good.  But before we put the strategy into production, let’s try to understand some of the key factors that determine its performance.  Hopefully that way we will be better placed to judge how profitable the strategy is likely to be as market conditions evolve.

In fact, we have already identified one potential key performance driver: the extreme hit rate (required fill rate) and determined that it is not a major concern in this case. However, in cases where the extreme hit rate rises to perhaps 20%, or more, the fill ratio is likely to become a major factor in determining the success of the strategy.  It would be highly inadvisable to attempt implementation of such a strategy on a retail platform.


What other factors might affect strategy performance?  The correct approach here is to apply the scientific method:  develop some theories about the drivers of performance and see if we can find evidence to support them.

For this case study we might conjecture that, since the strategy enters and exits using limit orders, it should exhibit characteristics of a mean reversion strategy, which will tend to do better when the market moves sideways and rather worse in a strongly trending market.

Another hypothesis is that, in common with most day-trading and high frequency strategies, this strategy will produce better results during periods of higher market volatility.  Empirically, HFT firms have always produced higher profits during volatile market conditions  – 2008 was a banner year for many of them, for example.  In broad terms, times when the market is whipsawing around create additional opportunities for strategies that seek to exploit temporary mis-pricings.  We shall attempt to qualify this general understanding shortly.  For now let’s try to gather some evidence that might support the hypotheses we have formulated.

I am going to take a very simple approach to this, using linear regression analysis.  It’s possible to do much more sophisticated analysis using nonlinear methods, including machine learning techniques. In our regression model the dependent variable will be the daily strategy returns.  In the first iteration, let’s use measures of market returns, trading volume and market volatility as the independent variables.
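
A minimal sketch of this first regression in Python with statsmodels, assuming a daily DataFrame with hypothetical column names for the strategy returns and the three candidate drivers:

```python
import statsmodels.api as sm

def performance_regression(daily):
    """OLS of daily strategy returns on same-day market return, volume and
    volatility. Column names ('strategy_ret', 'spx_ret', 'volume', 'vol')
    are placeholders for whatever series are actually available."""
    X = sm.add_constant(daily[["spx_ret", "volume", "vol"]])
    return sm.OLS(daily["strategy_ret"], X, missing="drop").fit()

# print(performance_regression(daily).summary())
# The lagged version discussed below simply replaces the regressors with
# daily[["spx_ret", "volume", "vol"]].shift(1).
```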

Fig4

The first surprise is the size of the (adjusted) R-squared – at 28%, this far exceeds the typical 5% to 10% level achieved in most such regression models when applied to trading systems.  In other words, this model does a very good job of accounting for a large proportion of the variation in strategy returns.

Note that the returns in the underlying S&P 500 index play no part (the coefficient is not statistically significant). We might expect this: ours is a trading strategy that is not specifically designed to be directional and has approximately equivalent performance characteristics on both the long and short side, as you can see from the performance report.

Now for the next surprise: the sign of the volatility coefficient.  Our ex-ante hypothesis is that the strategy would benefit from higher levels of market volatility.  In fact, the reverse appears to be true (due to the  negative coefficient).  How can this be?  On further reflection, the reason why most HFT strategies tend to benefit from higher market volatility is that they are momentum strategies.  A momentum strategy typically enters and exits using market orders and hence requires  a major market move to overcome the drag of the bid-offer spread (assuming it calls the market direction correctly!).  This strategy, by contrast, is a mean-reversion strategy, since entry/exits are effected using limit orders.  The strategy wants the S&P500 index to revert to the mean – a large move that continues in the same direction is going to hurt, not help, this strategy.

Note, by contrast, that the coefficient for the volume factor is positive and statistically significant.  Again this makes sense:  as anyone who has traded the e-mini futures overnight can tell you, the market tends to make major moves when volume is light – simply because it is easier to push around.  Conversely, during a heavy trading day there is likely to be significant opposition to a move in any direction.  In other words, the market is more likely to trade sideways on days when trading volume is high, and this is beneficial for our strategy.

The final surprise, and perhaps the greatest of all, is that the strategy alpha appears to be negative (and statistically significant)!  How can this be?  What the regression analysis appears to be telling us is that the strategy’s performance is largely determined by two underlying factors, volume and volatility.

Let’s dig into this a little more deeply with another regression, this time relating the current day’s strategy return to the prior day’s volume, volatility and market return.

Fig5

In this regression model the strategy alpha is effectively zero and statistically insignificant, as is the case for lagged volume.  The strategy returns relate inversely to the prior day’s market return, which again appears to make sense for a mean reversion strategy:  our model anticipates that, on average, the market will reverse the prior day’s gain or loss.  The coefficient for the lagged volatility factor is once again negative and statistically significant.  This, too, makes sense:  volatility tends to be highly autocorrelated, so if the strategy performance is dependent on market volatility during the current session, it is likely to show dependency on volatility in the prior day’s session also.

So, in summary, we can provisionally conclude that:

This strategy has no market-directional predictive power: rather, it is a pure mean-reversion strategy that looks to make money by betting on a reversal of the prior session’s market direction.  It will do better during periods when trading volume is high, and when market volatility is low.

Conclusion

Now that we have some understanding of where the strategy performance comes from, where do we go from here?  The next steps might include some, or all, of the following:

(i) A more sophisticated econometric model bringing in additional lags of the explanatory variables and allowing for interaction effects between them.

(ii) Introducing additional exogenous variables that may have predictive power. Depending on the nature of the strategy, likely candidates might include related equity indices and futures contracts.

(iii) Constructing a predictive model and meta-strategy that would enable us to assess the likely future performance of the strategy, and which could then be used to determine position size.  Machine learning techniques can often be helpful in this context.

I will give an example of the latter approach in my next post.

Is Your Trading Strategy Still Working?

The Challenge of Validating Strategy Performance

One of the challenges faced by investment strategists is to assess whether a strategy is continuing to perform as it should.  This applies whether it is a new strategy that has been backtested and is now being traded in production, or a strategy that has been live for a while.
All strategies have a limited lifespan.  Markets change, and a trading strategy that can’t accommodate that change will get out of sync with the market and start to lose money. Unless you have a way to identify when a strategy is no longer in sync with the market, months of profitable trading can be undone very quickly.

The issue is particularly important for quantitative strategies.  Firstly, quantitative strategies are susceptible to the risk of over-fitting.  Secondly, unlike a strategy based on fundamental factors, it may be difficult for the analyst to verify that the drivers of strategy profitability remain intact.

Savvy investors are well aware of the risk of quantitative strategies breaking down and are likely to require reassurance that a period of underperformance is a purely temporary phenomenon.

It might be tempting to believe that you will simply stop trading when the strategy stops working.  But given the stochastic nature of investment returns, how do you distinguish a losing streak from a system breakdown?


Stochastic Process Control

One approach to the problem derives from the fields of Monte Carlo simulation and stochastic process control.  Here we draw random samples from the distribution of strategy returns and use these to construct a prediction envelope to forecast the range of future returns.  If the equity curve of the strategy over the forecast period falls outside of the envelope, it would raise serious concerns that the strategy may have broken down.  In those circumstances you would almost certainly want to trade the strategy in smaller size for a while to see if it recovers, or even exit the strategy altogether if it does not.
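
A minimal, hypothetical Python sketch of the idea, bootstrapping the historical trade returns to form a pointwise 95% envelope and flagging a breach; this is a simplified illustration, not the implementation used below:

```python
import numpy as np

def prediction_envelope(trade_returns, n_forward, n_sims=10_000,
                        lower_q=2.5, upper_q=97.5, seed=0):
    """Bootstrap an envelope of cumulative equity n_forward trades ahead."""
    rng = np.random.default_rng(seed)
    draws = rng.choice(np.asarray(trade_returns), size=(n_sims, n_forward), replace=True)
    paths = np.cumprod(1 + draws, axis=1)             # simulated equity paths
    return (np.percentile(paths, lower_q, axis=0),    # lower envelope
            np.percentile(paths, upper_q, axis=0))    # upper envelope

def envelope_breached(live_returns, lower_envelope):
    """True if the live equity curve falls below the lower envelope."""
    live_equity = np.cumprod(1 + np.asarray(live_returns))
    return bool((live_equity < lower_envelope[:len(live_equity)]).any())
```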

I will illustrate the procedure for the long/short ETF strategy that I described in an earlier post, making use of Michael Bryant’s excellent Market System Analyzer software.

To briefly refresh: the strategy is built using cointegration theory to construct long/short portfolios from a selection of ETFs that provide exposure to US and international equity, currency, real estate and fixed income markets.  The out-of-sample back-test performance of the strategy is very encouraging:

Fig 2

 

Fig 1

There was evidently a significant slowdown during 2014, with a reduction in the risk-adjusted returns and win rate for the strategy:

Fig 1

This period might itself have raised questions about the continuing effectiveness of the strategy.  However, we have the benefit of hindsight in seeing that, during the first two months of 2015, performance appeared to be recovering.

Consequently we put the strategy into production testing at the beginning of March 2015 and we now wish to evaluate whether the strategy is continuing on track.   The results indicate that strategy performance has been somewhat weaker than we might have hoped, although this is compensated for by a significant reduction in strategy volatility, so that the net risk-adjusted returns remain somewhat in line with recent back-test history.

Fig 3

Using the MSA software we sample the most recent back-test returns for the period to the end of Feb 2015, and create a 95% prediction envelope for the returns since the beginning of March, as follows:

Fig 2

As we surmised, during the production period the strategy has slightly underperformed the projected median of the forecast range, but overall the equity curve still falls within the prediction envelope.  At this stage we would tentatively conclude that the strategy is continuing to perform within expected tolerance.

Had we seen a pattern like the one shown in the chart below, our conclusion would have been very different.

Fig 4

As shown in the illustration, the equity curve lies below the lower boundary of the prediction envelope, suggesting that the strategy has failed. In statistical terms, the trades in the validation segment appear not to belong to the same statistical distribution of trades that preceded the validation segment.

This strategy failure can also be explained as follows: The equity curve prior to the validation segment displays relatively little volatility. The drawdowns are modest, and the equity curve follows a fairly straight trajectory. As a result, the prediction envelope is fairly narrow, and the drawdown at the start of the validation segment is so large that the equity curve is unable to rise back above the lower boundary of the envelope. If the history prior to the validation period had been more volatile, it’s possible that the envelope would have been large enough to encompass the equity curve in the validation period.

Conclusion

Systematic trading has the advantage of reducing emotion from trading because the trading system tells you when to buy or sell, eliminating the difficult decision of when to “pull the trigger.” However, when a trading system starts to fail a conflict arises between the need to follow the system without question and the need to stop following the system when it’s no longer working.

Stochastic process control provides a technical, objective method to determine when a trading strategy is no longer working and should be modified or taken offline. The prediction envelope method extrapolates the past trade history using Monte Carlo analysis and compares the actual equity curve to the range of probable equity curves based on the extrapolation.

Next we will look at nonparametric distribution tests as an alternative method for assessing strategy performance.