Robustness in Quantitative Research and Trading

What is Strategy Robustness?  What is its relevance to Quantitative Research and Trading?

One of the most highly desired properties of any financial model or investment strategy, by investors and managers alike, is robustness.  I would define robustness as the ability of the strategy to deliver consistent results across a wide range of market conditions.  It is, of course, by no means the only desirable property – investing in Treasury bills is also a pretty robust strategy, although the returns are unlikely to set an investor’s pulse racing – but it does ensure that the investor, or manager, is unlikely to be on the receiving end of an ugly surprise when market conditions adjust.

Robustness is not the same thing as low volatility, which also tends to be a characteristic highly prized by many investors.  A strategy may operate consistently, with low volatility, in certain market conditions, but behave very differently in others – for instance, a delta-hedged short-volatility book containing exotic derivative positions.   The point is that empirical researchers do not know the true data-generating process for the markets they are modeling. When specifying an empirical model they need to make arbitrary assumptions. An example is the common assumption that asset returns follow a Gaussian distribution.  In fact, the empirical distributions of the great majority of asset prices exhibit the characteristic of “fat tails”, which can result from the interplay between multiple market states with random transitions.  See this post for details:

http://jonathankinlay.com/2014/05/a-quantitative-analysis-of-stationarity-and-fat-tails/

 

In statistical arbitrage, for example, quantitative researchers often make use of cointegration models to build pairs trading strategies.  However, the testing procedures used in current practice are not sufficiently powerful to distinguish between cointegrated processes and those whose evolution just happens to correlate temporarily, resulting in frequent breakdowns in cointegrating relationships.  For instance, see this post:

http://jonathankinlay.com/2017/06/statistical-arbitrage-breaks/

Modeling Assumptions are Often Wrong – and We Know It

We are, of course, not the first to suggest that empirical models are misspecified:

“All models are wrong, but some are useful” (Box 1976, Box and Draper 1987).

 

Martin Feldstein (1982: 829): “In practice all econometric specifications are necessarily false models.”

 

Luke Keele (2008: 1): “Statistical models are always simplifications, and even the most complicated model will be a pale imitation of reality.”

 

Peter Kennedy (2008: 71): “It is now generally acknowledged that econometric models are false and there is no hope, or pretense, that through them truth will be found.”

During the crash of 2008, quantitative analysts and risk managers found out the hard way that the assumptions underpinning the copula models used to price and hedge credit derivative products were highly sensitive to market conditions.  In other words, they were not robust.  See this post for more on the application of copula theory in risk management:

http://jonathankinlay.com/2017/01/copulas-risk-management/

 

Robustness Testing in Quantitative Research and Trading

We interpret model misspecification as model uncertainty. Robustness tests analyze model uncertainty by comparing a baseline model to plausible alternative model specifications.  Rather than trying to specify models correctly (an impossible task given causal complexity), researchers should test whether the results obtained by their baseline model – their best attempt at optimizing the specification of their empirical model – hold when they systematically replace the baseline model specification with plausible alternatives. This is the practice of robustness testing.


Robustness testing analyzes the uncertainty of models and tests whether estimated effects of interest are sensitive to changes in model specifications. The uncertainty about the baseline model’s estimated effect size shrinks if the robustness test model finds the same or similar point estimate with smaller standard errors (though with multiple robustness tests the aggregate uncertainty is likely to increase). The uncertainty about the baseline model’s estimated effect size increases if the robustness test model obtains different point estimates and/or larger standard errors. Either way, robustness tests can increase the validity of inferences.

Robustness testing replaces the scientific crowd by a systematic evaluation of model alternatives.

Robustness in Quantitative Research

In the literature, robustness has been defined in different ways:

  • as same sign and significance (Leamer)
  • as weighted average effect (Bayesian and Frequentist Model Averaging)
  • as effect stability

We define robustness as effect stability.

Parameter Stability and Properties of Robustness

Robustness is the share of the probability density distribution of the robustness test model that falls within the 95-percent confidence interval of the baseline model.  In formal terms:

Formula
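The formula image does not reproduce here, but the verbal definition above implies something along the following lines (a reconstruction, treating the robustness test estimate as normally distributed):

\rho = \int_{\hat\beta_b - 1.96\,\hat\sigma_b}^{\hat\beta_b + 1.96\,\hat\sigma_b} \phi\big(x;\ \hat\beta_r,\ \hat\sigma_r\big)\, dx

where \hat\beta_b and \hat\sigma_b are the point estimate and standard error of the baseline model, \hat\beta_r and \hat\sigma_r those of the robustness test model, and \phi is the normal density.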

  • Robustness is left–right symmetric: identical positive and negative deviations of the robustness test compared to the baseline model give the same degree of robustness.
  • If the standard error of the robustness test is smaller than the one from the baseline model, ρ converges to 1 as long as the difference in point estimates is negligible.
  • For any given standard error of the robustness test, ρ is always and unambiguously smaller the larger the difference in point estimates.
  • Differences in point estimates have a strong influence on ρ if the standard error of the robustness test is small but a small influence if the standard errors are large.
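To make these properties concrete, here is a minimal Python sketch (my own, assuming normally distributed estimates as in the reconstruction above) that computes ρ for a pair of baseline and robustness test estimates:

from scipy.stats import norm

def robustness_rho(beta_b, se_b, beta_r, se_r, level=0.95):
    """Share of the robustness test model's estimated density that falls
    within the confidence interval of the baseline model."""
    z = norm.ppf(0.5 + level / 2)                    # 1.96 for a 95% interval
    lower, upper = beta_b - z * se_b, beta_b + z * se_b
    # Probability mass of N(beta_r, se_r^2) lying inside [lower, upper]
    return norm.cdf(upper, beta_r, se_r) - norm.cdf(lower, beta_r, se_r)

# Same point estimate, smaller robustness-test standard error: rho close to 1
print(robustness_rho(0.50, 0.10, 0.50, 0.02))   # ~1.00
# Different point estimate with a small standard error: rho drops sharply
print(robustness_rho(0.50, 0.10, 0.30, 0.02))   # ~0.42

The two example calls illustrate the second and fourth bullets: with identical point estimates and a smaller test standard error ρ approaches 1, while even a modest shift in the point estimate, combined with a small standard error, drives ρ down sharply.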

Robustness Testing in Four Steps

  1. Define the subjectively optimal specification for the data-generating process at hand. Call this model the baseline model.
  2. Identify assumptions made in the specification of the baseline model which are potentially arbitrary and that could be replaced with alternative plausible assumptions.
  3. Develop models that change one of the baseline model’s assumptions at a time. These alternatives are called robustness test models.
  4. Compare the estimated effects of each robustness test model to the baseline model and compute the estimated degree of robustness.

Model Variation Tests

Model variation tests change one (or sometimes more) of the model specification assumptions and replace it with an alternative assumption, such as:

  • change in set of regressors
  • change in functional form
  • change in operationalization
  • change in sample (adding or subtracting cases)

Example: Functional Form Test

The functional form test examines the baseline model’s functional form assumption against a higher-order polynomial model. The two models should be nested, so that the baseline functional form is a special case of the higher-order alternative. As an example, we analyze the ‘environmental Kuznets curve’ prediction, which suggests the existence of an inverted U-shaped relation between per capita income and emissions.

Emissions and per capita income

Note: grey-shaded area represents confidence interval of baseline model
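A minimal sketch of a nested functional form test in Python (using simulated data and the statsmodels library; the data-generating process and sample size are purely illustrative):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Simulated data with an inverted-U relationship between income and emissions
income = rng.uniform(1, 50, size=500)
emissions = 2.0 * income - 0.03 * income**2 + rng.normal(0, 5, size=500)

# Baseline model: linear in income
baseline = sm.OLS(emissions, sm.add_constant(income)).fit()

# Robustness test model: adds a quadratic term, nesting the baseline
X_quad = sm.add_constant(np.column_stack([income, income**2]))
robust = sm.OLS(emissions, X_quad).fit()

print(baseline.params)                      # linear effect only
print(robust.params)                        # linear + quadratic terms
print(robust.compare_f_test(baseline))      # F-test of the nested restriction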

Another example of functional form testing is given in this review of Yield Curve Models:

http://jonathankinlay.com/2018/08/modeling-the-yield-curve/

Random Permutation Tests

Random permutation tests change specification assumptions repeatedly. Usually, researchers specify a model space and randomly and repeatedly select models from this model space. Examples:

  • sensitivity tests (Leamer 1978)
  • artificial measurement error (Plümper and Neumayer 2009)
  • sample split – attribute aggregation (Traunmüller and Plümper 2017)
  • multiple imputation (King et al. 2001)

We use Monte Carlo simulation to test the sensitivity of the performance of our Quantitative Equity strategy to changes in the price generation process and also in model parameters:

http://jonathankinlay.com/2017/04/new-longshort-equity/

Structured Permutation Tests

Structured permutation tests change a model assumption within a model space in a systematic way. Changes in the assumption are based on a rule, rather than made at random.  Possibilities here include:

  • sensitivity tests (Levine and Renelt)
  • jackknife test
  • partial demeaning test

Example: Jackknife Robustness Test

The jackknife robustness test is a structured permutation test that systematically excludes one or more observations from the estimation at a time, until all observations have been excluded once. With a ‘group-wise jackknife’ robustness test, researchers systematically drop a set of cases that group together by satisfying a certain criterion – for example, countries within a certain per capita income range or all countries on a certain continent. In the example, we analyze the effect of earthquake propensity on quake mortality for countries with democratic governments, excluding one country at a time. We display the results using per capita income as information on the x-axis.

jackknife

Upper and lower bound mark the confidence interval of the baseline model.
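A group-wise jackknife loop is straightforward to sketch in Python; the regression formula and column names below are hypothetical placeholders, not the specification used in the example above:

import pandas as pd
import statsmodels.formula.api as smf

def jackknife_by_group(df, formula, group_col, effect):
    """Re-estimate the model dropping one group (e.g. one country) at a time
    and collect the coefficient of interest from each re-estimation."""
    rows = []
    for g in df[group_col].unique():
        fit = smf.ols(formula, data=df[df[group_col] != g]).fit()
        rows.append({group_col: g,
                     "estimate": fit.params[effect],
                     "se": fit.bse[effect]})
    return pd.DataFrame(rows)

# Hypothetical usage: effect of quake propensity on mortality, dropping one
# country at a time (column names are illustrative only)
# results = jackknife_by_group(quakes, "mortality ~ propensity + income",
#                              group_col="country", effect="propensity")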

Robustness Limit Tests

Robustness limit tests provide a way of analyzing structured permutation tests. These tests ask how much a model specification has to change to render the effect of interest non-robust. Some examples of robustness limit testing approaches:

  • unobserved omitted variables (Rosenbaum 1991)
  • measurement error
  • under- and overrepresentation
  • omitted variable correlation

For an example of limit testing, see this post on a review of the Lognormal Mixture Model:

http://jonathankinlay.com/2018/08/the-lognormal-mixture-variance-model/

Summary on Robustness Testing

Robustness tests have become an integral part of research methodology. They allow researchers to study the influence of arbitrary specification assumptions on estimates, and they can identify uncertainties that would otherwise escape the attention of empirical researchers. Robustness tests offer the most promising answer currently available to the problem of model uncertainty.

Futures WealthBuilder

We are launching a new product, the Futures WealthBuilder, a CTA system that trades futures contracts in several highly liquid financial and commodity markets, including S&P 500 E-Minis, Euros, VIX, Gold, US Bonds, 10-year and 5-year Notes, Corn, Natural Gas and Crude Oil.  Each component strategy uses a variety of machine learning algorithms to detect trends, seasonal effects and mean-reversion.  We develop several different types of model for each market, and deploy them according to their suitability for current market conditions.

Performance of the strategy (net of fees) since 2013 is detailed in the charts and tables below.  Notable features include a Sharpe Ratio of just over 2, an annual rate of return of 190% on an account size of $50,000, and a maximum drawdown of around 8% over the last three years.  It is worth mentioning, too, that the strategy produces approximately equal rates of return on both long and short trades, with an overall profit factor above 2.

 

Fig1

 

Fig2

 

 

Fig3

Fig4

 

Fig5

Low Correlation

Despite a high level of correlation between several of the underlying markets, the correlations between the component strategies of Futures WealthBuilder are, in the majority of cases, negligibly small (with a few exceptions, such as the high correlation between the 10-year and 5-year note strategies).  This accounts for the relatively high level of return in relation to portfolio risk, as measured by the Sharpe Ratio.   We include strategies in both note products chiefly as a means of providing additional liquidity, rather than for their diversification benefit.

Fig 6

Strategy Robustness

Strategy robustness is a key consideration in the design stage.  We use Monte Carlo simulation to evaluate scenarios not seen in historical price data, in order to ensure consistent performance across the widest possible range of market conditions.  Our methodology introduces random fluctuations to historical prices, increasing or decreasing them by as much as 30%.  We allow similar random fluctuations in the values of the strategy parameters, to ensure that our models perform consistently without being overly sensitive to the specific parameter values we have specified.  Finally, we allow the start date of each sub-system to vary randomly by up to a year.

The effect of these variations is to produce a wide range of outcomes in terms of strategy performance.  We focus on the 5% worst outcomes, ranked by profitability, and select only those strategies whose performance is acceptable under these adverse scenarios.  In this way we reduce the risk of overfitting the models while providing more realistic expectations of model performance going forward.  This procedure also has the effect of reducing portfolio tail risk, and the maximum peak-to-valley drawdown likely to be produced by the strategy in future.
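A stylized sketch of this kind of stress test is shown below (Python; the backtest function and parameter structure are placeholders of mine – only the ±30% price and parameter perturbations and the start-date shifts follow the description above):

import numpy as np

def stress_test(prices, params, backtest, n_runs=1000,
                price_shock=0.30, param_shock=0.30, seed=1):
    """Re-run a backtest under randomly perturbed prices and parameters,
    returning the distribution of outcomes."""
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    results = []
    for _ in range(n_runs):
        # Perturb each price by up to +/-30%
        shocked = prices * (1 + rng.uniform(-price_shock, price_shock, size=len(prices)))
        # Perturb each strategy parameter by up to +/-30%
        shocked_params = {k: v * (1 + rng.uniform(-param_shock, param_shock))
                          for k, v in params.items()}
        # Shift the start date randomly by up to one year (252 trading days)
        start = rng.integers(0, 252)
        results.append(backtest(shocked[start:], shocked_params))
    return np.array(results)

# Focus on the 5% worst outcomes, ranked by profitability
# outcomes = stress_test(price_series, {"lookback": 20}, my_backtest)
# worst_5pct_threshold = np.percentile(outcomes, 5)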

GC Daily Stress Test

Futures WealthBuilder on Collective 2

We will be running a variant of the Futures WealthBuilder strategy on the Collective 2 site, using a subset of the strategy models in several futures markets (see this page for details).  Subscribers will be able to link and auto-trade the strategy in their own account, assuming they make use of one of the approved brokerages, which include Interactive Brokers, MB Trading and several others.

Obviously the performance is unlikely to be as good as the complete strategy, since several component sub-strategies will not be traded on Collective 2.  However, this does give the subscriber the option to trial the strategy in simulation before plunging in with real money.

Fig7


The Internal Bar Strength Indicator

Internal Bar Strength (IBS) is an idea that has been around for some time.  IBS is based on the position of the day’s close in relation to the day’s range: it takes a value of 0 if the closing price is the lowest price of the day, and 1 if the closing price is the highest price of the day.

More formally:

IBS  =  (Close – Low) / (High – Low)
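For a daily OHLC DataFrame this is a one-liner in pandas (the column names here are assumptions):

import pandas as pd

def ibs(bars: pd.DataFrame) -> pd.Series:
    """Internal Bar Strength: position of the close within the day's range."""
    return (bars["Close"] - bars["Low"]) / (bars["High"] - bars["Low"])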

The IBS effect may be related to intraday over-reaction to news or market movements, which are then “corrected” the next day.  It serves as a measure of the tendency of a price series to mean-revert over daily horizons.  I use the term “daily” advisedly: so far as I am aware, there has been no research (including my own) demonstrating the existence of an IBS effect at time horizons shorter, or longer, than one day.  Indeed, there has been very little academic research of any kind into the concept, which is strange considering the compelling results it is capable of producing.  Practitioners have been happy enough with that state of affairs, content to deploy this neglected indicator in their trading strategies, where it has often proved to be extremely useful (we use IBS in one of our volatility strategies). Since 2013, however, the cat has been let out of the bag, thanks to an excellent research paper by Alexander Pagonidis, who writes an interesting quantitative finance blog.

The essence of the idea is that stocks that close in the lowest part of the daily range, with an IBS of below, say, 0.2, will tend to rally the next day, while stocks that close in the highest quintile will often decline in value in the following session.  In his paper “The IBS Effect: Mean Reversion in Equity ETFs” (2013), Pagonidis researches the IBS effect in equity index ETFs in the US and several international markets.  He confirms that low IBS values in these assets are associated with high returns in the following day session, while high IBS values are associated with low returns. Average returns when IBS is below 0.20 are 0.35%, while average returns when IBS is above 0.80 are -0.13%. According to his research, this effect has been present in equity ETFs since the early 90s and has been highly consistent through time.


IBS Strategy Performance

To give the reader some idea of the potential of the IBS effect, I have reproduced below equity curves for the IBS strategy for the SPDR S&P 500 ETF Trust (SPY) and iShares MSCI Singapore ETF (EWS) index ETFs over the period from 1999 to 2016.  The strategy buys at the close when IBS is below 0.2, and sells at the close when IBS exceeds 0.8, liquidating the position at the following market close. Strategy CAGR over the period has been of the order of 13% for SPY and as high as 40% for EWS, ignoring transaction costs.
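A bare-bones rendition of this rule in Python, for illustration only (it interprets “sells” as taking a short position, assumes fills at the closing price, uses column names of my choosing and ignores transaction costs):

import pandas as pd

def ibs_strategy_returns(bars: pd.DataFrame,
                         lower: float = 0.2, upper: float = 0.8) -> pd.Series:
    """Daily returns of the IBS rule: long the close when IBS < lower,
    short the close when IBS > upper, exit at the next close."""
    ibs = (bars["Close"] - bars["Low"]) / (bars["High"] - bars["Low"])
    position = pd.Series(0.0, index=bars.index)
    position[ibs < lower] = 1.0     # buy the weak close
    position[ibs > upper] = -1.0    # sell the strong close
    next_day_return = bars["Close"].pct_change().shift(-1)
    return (position * next_day_return).dropna()

# Hypothetical usage with a daily OHLC DataFrame for SPY:
# rets = ibs_strategy_returns(spy)
# cagr = (1 + rets).prod() ** (252 / len(rets)) - 1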

IBS Strategy Chart SPY EWS

 

Note that in both cases strategy returns for SPY and EWS have diminished in recent years, turning negative in 2015 and 2016 YTD, and this is true for ETFs in general.  It remains to be seen whether this deterioration in strategy performance is temporary or permanent.  There are some indications that the latter holds true, but the evidence is not quite definitive.  For example, the chart below shows the daily equity curve for the SPY IBS strategy, with 95% confidence intervals for the latest 100 trades (up to the end of May 2016), constructed using a Monte Carlo bootstrap.  The equity curve appears to have penetrated the lower bound, indicating a statistically significant deterioration in the performance of the IBS strategy for SPY over the last year or so (EWS is similar).  That said, the equity curve does fall inside the boundaries of the 99% confidence interval, so those looking for greater certainty about the possible breakdown of the effect will need to wait a little longer for confirmation.

 

SPY IBS MSA

 

Whatever the outcome may be for SPY and other ETFs going forward, it is certainly true that IBS effects persist strongly for some individual equities, Exxon-Mobil Corp. (XOM) being a case in point (see below).  It’s worth taking note of the exceptional performance of the XOM IBS strategy during the latter quarter of 2008.  I will have much more to say on the application of the IBS indicator for individual equities in a future blog post.

 

XOM IBS Strategy

 

The Role of Range, Volume, Bull/Bear Markets, Volatility and Seasonality

Pagonidis goes on to detail several further important findings in relation to IBS.  It is clear from his research that high volatility is related to increased predictability of returns and a more powerful IBS effect, in particular the high IBS-negative return aspect.  As might be expected, the effect is also larger after days with high range, both for high and low IBS extremes.

Volume turns out to be especially important for  U.S. index ETFs:  in fact, the IBS effect only appears to work on high-volume days.

Pagonidis also separates the data into bull and bear market environments, based on whether 200-day returns are positive or not.  The size of the effect is roughly similar in each environment (slightly larger in bear markets), but it is greater in the direction of the overall trend: high IBS readings are followed by larger negative returns during bear markets, and vice versa.

Day of Week Effect

The IBS effect is also strongly seasonal, having the greatest impact on returns from Monday’s close to Tuesday’s close, as illustrated for the SPY ETF in the chart below.  This accounts for the phenomenon known popularly as “Turnaround Tuesday”, i.e. the tendency for the market to recover strongly from losses on a Monday.  The day-of-week effect is weakest for Fridays.

 

SPY DOW

 

The mean of the returns distribution is not the only aspect that IBS can predict. Skewness also varies significantly between IBS buckets, with low IBS readings being followed by highly skewed returns, and vice versa. Close-to-close returns after a bottom-bucket IBS day have average skewness of 0.65 across Equity Index ETF products, while top-bucket IBS days are followed by returns with skewness of 0.03. This finding has very useful risk management applications for investors concerned with tail risk.

IBS as a Filter for a Swing Trading Strategy in QQQ

The returns to an IBS-only strategy are both statistically and economically significant. However, commissions will greatly decrease the returns and increase the maximum drawdowns, making such an approach challenging in the real world. One alternative is to combine the IBS effect with mean reversion on longer timescales and only take trades when they align.

Pagonidis offers a simple demonstration using Cutler’s RSI indicator that shows how the IBS effect can be used to boost the returns of a swing trading strategy while significantly decreasing the number of trades needed.

Cutler’s RSI at time t is calculated as follows:

 

RSI
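The formula image does not reproduce here. For reference, Cutler’s RSI replaces Wilder’s exponential smoothing with a simple moving average of up and down closes, which gives (my reconstruction, not the original image):

RSI(n) = 100 – 100 / (1 + SMA(U, n) / SMA(D, n))

where U = max(Close – Close[1], 0), D = max(Close[1] – Close, 0), and n is the lookback length (here n = 3).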

 

Pagonidis tests a simple, long-only strategy that trades the PowerShares QQQ Trust, Series 1 (QQQ) ETF using the Cutler’s RSI(3) indicator:

• Go long at the close if RSI(3) < 10

• Maintain the position while RSI(3) ≤ 40

He then filters these returns by adding an additional rule based on the value of IBS:

• Enter or maintain long position only if IBS ≤ 0.5

Pagonidis claims that the strategy produces rather promising results that “easily beats commissions”; however, my own rendition of the strategy, assuming commissions of $0.005 per share and slippage of a further $0.02 per share, produces results that are distinctly less encouraging:

EC0

 

Pef0

Strategy Code

For those interested, the code is as follows:

Inputs:
	RSILen(3),
	RSI_Entry(10),
	RSI_Exit(40),
	IBS_Threshold(0.5),
	Initial_Capital(100000);

Vars:
	nShares(100),
	RSIval(0),
	IBS(0);

{ RSI and Internal Bar Strength for the current bar }
RSIval = RSI(C, RSILen);
IBS = (C - L) / (H - L);

{ Size the position to the available capital }
nShares = Round(Initial_Capital / Close, 0);

{ Enter long when RSI is oversold and the IBS filter confirms a weak close }
If Marketposition = 0 and RSIval < RSI_Entry and IBS <= IBS_Threshold then begin
	Buy nShares contracts next bar at market;
end;

{ Exit when RSI recovers above the exit level or IBS no longer confirms }
If Marketposition > 0 and ((RSIval > RSI_Exit) or (IBS > IBS_Threshold)) then begin
	Sell next bar at market;
end;

Strategy Optimization and Robustness Testing

One can further improve performance by optimizing the trading system parameters, using Tradestation’s excellent Walk Forward Optimization (WFO) module.  This allows us to examine the effect of re-calibrating the strategy parameters at regular intervals, testing the optimized model on out-of-sample data sets of various sizes.  WFO can be used not only to optimize a strategy, but also to examine the sensitivity of its performance to changes in the levels of key parameters.  For example, in the case of the QQQ swing trading strategy, we find that profitability increases monotonically with the length of the RSI indicator, and this effect is especially marked when an IBS threshold level of 0.2 is used:
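Tradestation’s WFO module takes care of the mechanics, but the underlying idea can be sketched in a few lines of Python (the optimize and backtest functions, and the window sizes, are placeholders of mine):

import numpy as np

def walk_forward(prices, optimize, backtest, in_sample=504, out_of_sample=126):
    """Re-optimize on a rolling in-sample window, then evaluate the chosen
    parameters on the adjacent out-of-sample window."""
    oos_results = []
    start = 0
    while start + in_sample + out_of_sample <= len(prices):
        is_slice = prices[start:start + in_sample]
        oos_slice = prices[start + in_sample:start + in_sample + out_of_sample]
        best_params = optimize(is_slice)          # calibrate on in-sample data
        oos_results.append(backtest(oos_slice, best_params))
        start += out_of_sample                    # roll the window forward
    return np.array(oos_results)

# Walk-forward efficiency: out-of-sample rate of profit as a share of the
# in-sample rate (the ~55% figure discussed further below)
# efficiency = oos_pnl_rate / is_pnl_rate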

Sensitivity

 

Likewise, we can test the consistency of the day-of-the-week effect over several out-of-sample data sets of varying size.  These tests are consistent with the pattern seen earlier for the IBS indicator, confirming its role as a filter rule in enhancing system profitability:

Distribution Analysis

 

A model that is regularly re-calibrated using WFO is subjected to a series of tests designed to ensure its robustness and consistency in live trading.   The tests include the following:

 

WFO

 

In order to achieve an overall pass rating, the system is required to pass all five tests of its out-of-sample performance, from which Tradestation deems it likely that the system will continue to perform well in live trading.  The results from this procedure appear much more promising than the strategy in its original form, as can be seen from the performance table and equity curve chart shown below.

EC1

Perf1

 

However, these results include both in-sample and out-of-sample periods.  An examination of the results from the WFO indicate that the overall efficiency of the strategy is around 55%, meaning that the P&L produced by the system in out-of-sample periods amounts to a little over one half of the rate of profit produced during in-sample periods.  Going forward, therefore, we might expect the performance of the system in live trading to be only around half as good as shown here.  While this is still superior to the original system, it may not be considered good enough.  Nonetheless, for the purpose of illustrating the benefits of the IBS indicator as a trade filter, it makes the point.

Another interesting example of an IBS-based trading strategy in the QQQ and SPY ETFs can be found in the following blog post.

Conclusion

Internal Bar Strength is a powerful mean-reversion indicator for equity products traded at daily frequencies, with a consistent effect that has continued from the 1990s through to the current decade. IBS can be used on its own in mean-reversion strategies that have worked well for both US equities and US and International equity index ETFs, or used as a trade filter when combined with other alpha signals.

While there is evidence of a weakening of the IBS effect since around 2013 this is not yet confirmed statistically (at the 99% confidence level) and may simply be the result of normal statistical variation in its efficacy.

 

 

A New Approach to Equity Valuation

How Analysts Traditionally Value Equity

fig1

I learned the traditional method for producing equity valuations in the 1980s, from Chase bank’s excellent credit training program.  The standard technique was to develop several years of projected financial statements, and then discount the cash flows and terminal value to arrive at an NPV. I’m guessing the basic approach hasn’t changed all that much over the last 30-40 years and probably continues to serve as the fundamental building block for M&A transactions and PE deals.

Damodaran

Amongst several excellent texts on the topic I can recommend, for example, Aswath Damodaran’s book on valuation.

Arguably the weakest points in the methodology are the assumptions made about the long-term growth rate of the business and the rate used to discount the cash flows to produce the PV.  Since we are dealing with long-term projections, small variations in these rates can make a considerable difference to the outcome.

The Monte Carlo Approach

Around 20 years ago I wrote a paper titled “A New Approach to Equity Valuation”, in which I attempted to define a new methodology for equity valuation.  The idea was simple enough:  instead of guessing an appropriate rate to discount the projected cash flows generated by the company, you embed the riskiness into the cash flows themselves, using probability distributions.  That allows you to model the cash flows using Monte Carlo simulation and discount them using the risk-free rate, which is much easier to determine.  In a similar vein,  the model can allow for stochastic growth rates, perhaps also taking into account the arrival of potential new entrants, or disruptive technologies.

I recall taking the idea to an acquaintance of mine who at the time was head of M&A at a prestigious boutique bank in London.  About five minutes into the conversation I realized I had lost him at “Monte Carlo”.  It was yet another instance of the gulf between the fundamental and quantitative approach to investment finance, something I have always regarded as rather artificial.  The line has blurred in several places over the last few decades – option theory of the firm and factor models, to name but two examples – but remains largely intact.  I have met very few equity analysts who have the slightest clue about quantitative research and vice-versa, for that matter.  This is a pity in my view, as there is much to be gained by blending knowledge of the two disciplines.


The basic idea of the Monte Carlo approach is to formulate probability distributions for key variables that drive the business, such as sales, gross margin, cost of goods, etc., as well as related growth rates. You then determine the outcome in terms of P&L and cash flows over a large number of simulations, from which you can derive a probability distribution for the firm/equity value.
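A stripped-down sketch of the approach in Python (the distributions, horizon and terminal-value treatment are illustrative assumptions of mine, not a complete valuation model):

import numpy as np

rng = np.random.default_rng(7)
n_sims, years, rf = 100_000, 5, 0.04          # risk-free discount rate

sales0 = 100.0
# Stochastic drivers: sales growth and gross margin vary by simulation and year
growth = rng.normal(0.06, 0.03, size=(n_sims, years))
margin = rng.normal(0.35, 0.05, size=(n_sims, years))
fixed_costs = 20.0

sales = sales0 * np.cumprod(1 + growth, axis=1)
cash_flows = sales * margin - fixed_costs

# Discount each year's cash flow at the risk-free rate
discount = (1 + rf) ** -np.arange(1, years + 1)
pv_explicit = (cash_flows * discount).sum(axis=1)

# Simple terminal value: capitalize the final year's cash flow at rf plus a spread
terminal = cash_flows[:, -1] / (rf + 0.04) * discount[-1]
firm_value = pv_explicit + terminal

# A distribution of value, not a single point estimate
print(np.percentile(firm_value, [5, 50, 95]))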

npv

There are two potential sources of data one can use to build a Monte Carlo model: the historical distributions of the variables and information from line management. It is the latter that is likely to be especially useful, because you can embed management’s expertise and understanding of the business and its competitive environment directly into the model variables, rather than relying upon a single discount rate to account for all the possible sources of variation in the cash flows.

It can get a little complicated, of course: one cannot simply assume that all the variables evolve independently – COGS is likely to fall as a % of sales as sales increase, for example, due to economies of scale. Such interactive effects are critically important, and it is necessary to dig deep into the inner workings of the business to model them successfully.  But to those who may view such a task as overwhelmingly complicated I can offer several counterexamples.  For instance, in the 1970s I worked on large-scale simulation models of the North Sea oil fields that incorporated volumes of information from geology to engineering to financial markets.  Another large-scale simulation was built to assess how best to manage tanker traffic at one of the world’s busiest sea ports.

Creating a simulation model of the financials of a single firm is a simple task, by comparison. And, after you have built the model, it will typically remain fundamentally unchanged in basic form for many years, making the task of producing valuation estimates much easier in future.

Applications of Monte Carlo Methods in Equity Valuation

Ok, so what’s the point?  At the end of the day, don’t you just end up with the same result as from traditional methods, i.e. an estimate of the equity or firm value? Actually no – what you have instead is an estimate of the probability distribution of the value, something decidedly more useful.

For example:

Contract Negotiation

Monte Carlo methods have been applied successfully to model contract negotiation scenarios, for instance for management consulting projects, where several rounds of negotiation are often involved in reaching an agreed pricing structure.

Negotiation

 Stock Selection

You might build a portfolio of value stocks whose share price is below the median value, in the expectation that the majority of the universe will prove to be undervalued over the long term.  Or you might embed information about the expected value of the equities in your universe (and their cash flow volatilities) into your portfolio construction model.

Private Equity / Mergers & Acquisitions

In a PE or M&A negotiation your model provides a range of values to select from, each of which is associated with an estimated “probability of overpayment”.  For example, your opening bid might be a little below the median value, where it is likely that you are under-bidding for the projected cash flows.  That allows some headroom to increase the bid, if necessary, without incurring too great a risk of over-paying.

Recent Research

A survey of recent research in the field yields some interesting results, amongst them a paper by Magnus Pedersen entitled Monte Carlo Simulation in Financial Valuation (2014).  Pedersen takes a rather different approach to applying Monte Carlo methods to equity valuation.   Specifically, he uses the historical distribution of the price/book ratio to derive the empirical distribution of the equity value rather than modeling the individual cash flows.  This is a sensible compromise for someone who, unlike an analyst at a major sell-side firm, may not have access to management information necessary to build a more sophisticated model.  Nevertheless, Pedersen is able to demonstrate quite interesting results using MC methods to construct equity portfolios (weighted according to the Kelly criterion), in an accompanying paper Portfolio Optimization & Monte Carlo Simulation (2014).

For those who find the subject interesting, Pedersen offers several free books on his web site, which are worth reviewing.

cover_strategies-sp500

Is Your Trading Strategy Still Working?

The Challenge of Validating Strategy Performance

One of the challenges faced by investment strategists is to assess whether a strategy is continuing to perform as it should.  This applies whether it is a new strategy that has been backtested and is now being traded in production, or a strategy that has been live for a while.
All strategies have a limited lifespan.  Markets change, and a trading strategy that can’t accommodate that change will get out of sync with the market and start to lose money. Unless you have a way to identify when a strategy is no longer in sync with the market, months of profitable trading can be undone very quickly.

The issue is particularly important for quantitative strategies.  Firstly, quantitative strategies are susceptible to the risk of over-fitting.  Secondly, unlike a strategy based on fundamental factors, it may be difficult for the analyst to verify that the drivers of strategy profitability remain intact.

Savvy investors are well aware of the risk of quantitative strategies breaking down and are likely to require reassurance that a period of underperformance is a purely temporary phenomenon.

It might be tempting to believe that you will simply stop trading when the strategy stops working.  But given the stochastic nature of investment returns, how do you distinguish a losing streak from a system breakdown?


Stochastic Process Control

One approach to the problem derives from the field of Monte Carlo simulation and stochastic process control.  Here we randomly draw samples from the distribution of strategy returns and use these to construct a prediction envelope to forecast the range of future returns.  If the equity curve of the strategy over the forecast period falls outside of the envelope, it would raise serious concerns that the strategy may have broken down.  In those circumstances you would almost certainly want to trade the strategy in smaller size for a while to see if it recovers, or even exit the strategy altogether if it does not.
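A bare-bones version of the procedure in Python (Market System Analyzer handles this far more elegantly; the simple resampling of historical returns and the percentile envelope below are my own simplifying assumptions):

import numpy as np

def prediction_envelope(past_returns, n_future, n_sims=10_000,
                        level=0.95, seed=3):
    """Bootstrap future equity paths from historical strategy returns and
    return the lower/upper envelope for cumulative P&L."""
    rng = np.random.default_rng(seed)
    past_returns = np.asarray(past_returns, dtype=float)
    samples = rng.choice(past_returns, size=(n_sims, n_future), replace=True)
    equity_paths = np.cumsum(samples, axis=1)
    alpha = (1 - level) / 2
    lower = np.percentile(equity_paths, 100 * alpha, axis=0)
    upper = np.percentile(equity_paths, 100 * (1 - alpha), axis=0)
    return lower, upper

# Compare the realized equity curve to the envelope:
# lower, upper = prediction_envelope(backtest_returns, n_future=len(live_returns))
# breakdown = np.any(np.cumsum(live_returns) < lower)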

I will illustrate the procedure for the long/short ETF strategy that I described in an earlier post, making use of Michael Bryant’s excellent Market System Analyzer software.

To briefly refresh, the strategy is built using cointegration theory to construct long/short portfolios in a selection of ETFs that provide exposure to US and international equity, currency, real estate and fixed income markets.  The out-of-sample back-test performance of the strategy is very encouraging:

Fig 2

 

Fig 1

There was evidently a significant slowdown during 2014, with a reduction in the risk-adjusted returns and win rate for the strategy:

Fig 1

This period might itself have raised questions about the continuing effectiveness of the strategy.  However, we have the benefit of hindsight in seeing that, during the first two months of 2015, performance appeared to be recovering.

Consequently we put the strategy into production testing at the beginning of March 2015 and we now wish to evaluate whether the strategy is continuing on track.   The results indicate that strategy performance has been somewhat weaker than we might have hoped, although this is compensated for by a significant reduction in strategy volatility, so that the net risk-adjusted returns remain somewhat in line with recent back-test history.

Fig 3

Using the MSA software we sample the most recent back-test returns for the period to the end of Feb 2015, and create a 95% prediction envelope for the returns since the beginning of March, as follows:

Fig 2

As we surmised, during the production period the strategy has slightly underperformed the projected median of the forecast range, but overall the equity curve still falls within the prediction envelope.  At this stage we would tentatively conclude that the strategy is continuing to perform within expected tolerances.

Had we seen a pattern like the one shown in the chart below, our conclusion would have been very different.

Fig 4

As shown in the illustration, the equity curve lies below the lower boundary of the prediction envelope, suggesting that the strategy has failed. In statistical terms, the trades in the validation segment appear not to belong to the same statistical distribution of trades that preceded the validation segment.

This strategy failure can also be explained as follows: The equity curve prior to the validation segment displays relatively little volatility. The drawdowns are modest, and the equity curve follows a fairly straight trajectory. As a result, the prediction envelope is fairly narrow, and the drawdown at the start of the validation segment is so large that the equity curve is unable to rise back above the lower boundary of the envelope. If the history prior to the validation period had been more volatile, it’s possible that the envelope would have been large enough to encompass the equity curve in the validation period.

Conclusion

Systematic trading has the advantage of reducing emotion from trading because the trading system tells you when to buy or sell, eliminating the difficult decision of when to “pull the trigger.” However, when a trading system starts to fail a conflict arises between the need to follow the system without question and the need to stop following the system when it’s no longer working.

Stochastic process control provides a technical, objective method to determine when a trading strategy is no longer working and should be modified or taken offline. The prediction envelope method extrapolates the past trade history using Monte Carlo analysis and compares the actual equity curve to the range of probable equity curves based on the extrapolation.

Next we will look at nonparametric distribution tests as an alternative method for assessing strategy performance.