A Practical Application of Regime Switching Models to Pairs Trading

In the previous post I outlined some of the available techniques used for modeling market states.  The following is an illustration of how these techniques can be applied in practice.    You can download this post in pdf format here.

SSALGOTRADING AD

The chart below shows the daily compounded returns for a single pair in an ETF statistical arbitrage strategy, back-tested over a 1-year period from April 2010 to March 2011.

The idea is to examine the characteristics of the returns process and assess its predictability.

Pairs Trading

The initial impression given by the analytics plots of daily returns, shown in Fig 2 below, is that the process may be somewhat predictable, given what appears to be a significant 1-order lag in the autocorrelation spectrum.  We also see evidence of the
customary non-Gaussian “fat-tailed” distribution in the error process.

Regime Switching

An initial attempt to fit a standard Auto-Regressive Moving Average ARMA(1,0,1) model  yields disappointing results, with an unadjusted  model R-squared of only 7% (see model output in Appendix 1)

However, by fitting a 2-state Markov model we are able to explain as much as 65% in the variation in the returns process (see Appendix II).
The model estimates Markov Transition Probabilities as follows.

P(.|1)       P(.|2)

P(1|.)       0.93920      0.69781

P(2|.)     0.060802      0.30219

In other words, the process spends most of the time in State 1, switching to State 2 around once a month, as illustrated in Fig 3 below.

Markov model
In the first state, the  pairs model produces an expected daily return of around 65bp, with a standard deviation of similar magnitude.  In this state, the process also exhibits very significant auto-regressive and moving average features.

Regime 1:

Intercept                   0.00648     0.0009       7.2          0

AR1                            0.92569    0.01897   48.797        0

MA1                         -0.96264    0.02111   -45.601        0

Error Variance^(1/2)           0.00666     0.0007

In the second state, the pairs model  produces lower average returns, and with much greater variability, while the autoregressive and moving average terms are poorly determined.

Regime 2:

Intercept                    0.03554    0.04778    0.744    0.459

AR1                            0.79349    0.06418   12.364        0

MA1                         -0.76904    0.51601     -1.49   0.139

Error Variance^(1/2)           0.01819     0.0031

CONCLUSION
The analysis in Appendix II suggests that the residual process is stable and Gaussian.  In other words, the two-state Markov model is able to account for the non-Normality of the returns process and extract the salient autoregressive and moving average features in a way that makes economic sense.

How is this information useful?  Potentially in two ways:

(i)     If the market state can be forecast successfully, we can use that information to increase our capital allocation during periods when the process is predicted to be in State 1, and reduce the allocation at times when it is in State 2.

(ii)    By examining the timing of the Markov states and considering different features of the market during the contrasting periods, we might be able to identify additional explanatory factors that could be used to further enhance the trading model.

Markov model

Trading With Indices

In this post I want to discuss ways to make use of signals from relevant market indices in your trading.  These signals can add value regardless of whether you trade algorithmically or manually.  The techniques described here are one of the most widely applicable in the quantitative analyst’s arsenal.

Let’s motivate the discussion by looking an example of a simple trading system trading the VIX on weekly bars.  Performance results for the system are summarized in the chart and table below.  The system outperforms the buy and hold return by a substantial margin, with a profit factor of over 3 and a win rate exceeding 82%.  What’s not to like?

VIX EC

VIX Performance

Well, for one thing, this isn’t really a trading system – because the VIX Index itself isn’t tradable. So the performance results are purely notional (and, if you didn’t already notice, no slippage or commission is included).

It is very easy to build high-performing trading system in indices – because they are not traded products,  index prices are often stale and tend to “follow” the price action in the equivalent traded market.

This particular system for the VIX Index took me less than ten minutes to develop and comprises only a few lines of code.  The system makes use of a simple RSI indicator to decide when to buy or sell the index.  I optimized the indicator parameters (separately for long and short) over the period to 2012, and tested it out-of-sample on the data from 2013-2016.

inputs:
Price( Close ) ,
Length( 14 ) ,
OverSold( 30 ) ;

variables:
RSIValue( 0 );

RSIValue = RSI( Price, Length );
if CurrentBar > 1 and RSIValue crosses over OverSold then
Buy ( !( “RsiLE” ) ) next bar at market;

.

The daily system I built for the S&P 500 Index is a little more sophisticated than the VIX model, and produces the following results.

SP500 EC

SP500 Perf

 

Using Index Trading Systems

We have seen that its trivially easy to build profitable trading systems for index products.  But since they can’t be traded, what’s the point?

The analyst might be tempted by the idea of using the signals generated by an index trading system to trade a corresponding market, such as VIX or eMini futures.  However, this approach is certain to fail.  Index prices lag the prices of equivalent futures products, where traders first monetize their view on the market.  So using an index strategy directly to trade a cash or futures market would be like trying to trade using prices delayed by a few seconds, or minutes – a recipe for losing money.

SSALGOTRADING AD

Nor is it likely that a trading system developed for an index product will generalize to a traded market.  What I mean by this is that if you were to take an index strategy, such as the VIX RSI strategy, transfer it to VIX futures and tweak the parameters in the hope of producing a profitable system, you are likely to be disappointed. As I have shown, you can produce a profitable index trading system using the simplest and most antiquated trading concepts (such as the RSI index) that long ago ceased to offer any predictive value in actual traded markets.  Index markets are actually inefficient – the prices of index products often fail to fully reflect all relevant, available information in a timely way. Such simple inefficiencies are easily revealed by indicators such as moving averages.  Traded markets, by contrast, are highly efficient and, with the exception of HFT, it is going to take a great deal more than a simple moving average to provide insight into the few inefficiencies that do arise.

bullbear

Strategies in index products are best thought of, not as trading strategies, but rather as a means of providing broad guidance as to the general condition of the market and its likely direction over the longer term.  To take the VIX index strategy as an example, you can see that each “trade” spans several weeks.  So one might regard a “buy” signal from the VIX index system as an indication that volatility is expected to rise over the next month or two.  A trader might use that information to lean on the side of being long volatility, perhaps even avoiding any short volatility positions altogether for the next several weeks.  Following the model’s guidance in that way would would certainly have helped many equity and volatility traders during the market sell off during August 2015, for example:

 

Vix Example

The S&P 500 Index model is one I use to provide guidance as to market conditions for the current trading day.  It is a useful input to my thinking as to how aggressive I want my trading models to be during the upcoming session. If the index model suggests a positive tone to the market, with muted volatility, I might be inclined to take a more aggressive stance.  If the model starts trading to the short side, however, I am likely to want to be much more cautious.    Yesterday (May 16, 2016), for example, the index model took an early long trade, providing confirmation of the positive tenor to the market and encouraging me to trade volatility to the short side more aggressively.

 

SP500 Example

 

 

In general, I would tend to classify index trading systems as “decision support” tools that provide a means of shading opinion on the market, or perhaps providing a means of calibrating trading models to the anticipated market conditions. However, they can be used in a more direct way, short of actual trading.  For example, one of our volatility trading systems uses the trading signals from a trading system designed for the VVIX volatility-of-volatility index.  Another approach is to use the signals from an index trading system as an indicator of the market regime in a regime switching model.

Designing Index Trading Models

Whereas it is profitability that is typically the primary design criterion for an actual trading system, given the purpose of an index trading system there are other criteria that are at least as important.

It should be obvious from these few illustrations that you want to design your index model to trade less frequently than the system you are intending to trade live: if you are swing-trading the eminis on daily bars, it doesn’t help to see 50 trades a day from your index system.  What you want is an indication as to whether the market action over the next several days is likely to be positive or negative.  This means that, typically, you will design your index system using bar frequencies at least as long as for your live system.

Another way to slow down the signals coming from your index trading system is to design it for very high accuracy – a win rate of  70%, or higher.  It is actually quite easy to do this:  I have systems that trade the eminis on daily bars that have win rates of over 90%.  The trick is simply that you have to be prepared to wait a long time for the trade to come good.  For a live system that can often be a problem – no-one like to nurse an underwater position for days or weeks on end.  But for an index trading system it matters far less and, in fact, it helps:  because you want trading signals over longer horizons than the time intervals you are using in your live trading system.

Since the index system doesn’t have to trade live, it means of course that the usual trading costs and frictions do not apply.  The advantage here is that you can come up with concepts for trading systems that would be uneconomic in the real world, but which work perfectly well in the frictionless world of index trading.  The downside, however, is that this might lead you to develop index systems that trade far too frequently.  So, even though they should not apply, you might seek to introduce trading costs in order to penalize higher frequency trading systems and benefit systems that trade less frequently.

Designing index trading systems in an area in which genetic programming algorithms excel.  There are two main reasons for this.  Firstly, as I have previously discussed, simple technical indicators of the kind employed by GP modeling systems work well in index markets.  Secondly, and more importantly, you can use the GP system to tailor an index trading system to meet the precise criteria you have in mind, such as the % win rate, trading frequency, etc.

An outstanding product that I can highly recommend in this context is Mike Bryant’s Adaptrade Builder.  Builder is a superb piece of software whose power and ease of use reflects Mike’s engineering background and systems development expertise.


Adaptrade

 

 

Quantitative Analysis of Fat Tails – JonathanKinlay.com

In this quantitative analysis I explore how, starting from the assumption of a stable, Gaussian distribution in a returns process, we evolve to a system that displays all the characteristics of empirical market data, notably time-dependent moments, high levels of kurtosis and fat tails.  As it turns out, the only additional assumption one needs to make is that the market is periodically disturbed by the random arrival of news.

NOTE:  if you are unable to see the Mathematica models below, you can download the free Wolfram CDF player and you may also need this plug-in.

You can also download the complete Mathematica CDF file here.

Stationarity

A stationary process is one that evolves over time, but whose probability distribution does not vary with time. As the word implies, such a process is stable. More formally, the moments of the distribution are independent of time.

Let’s assume we are dealing with such a process that have constant mean μ and constant volatility (standard deviation) σ.

 Φ=NormalDistribution[μ,σ]

Here are some examples of Normal probability distributions, with constant mean μ = 0 and standard deviation σ ranging from 0.75 to 2

 Plot[Evaluate@Table[PDF[Φ,x],{σ,{.75,1,2}}]/.μ→0,{x,-6,6},Filling→Axis]

 

Chart 1

The moments of Φ are given by:

 Through[{Mean, StandardDeviation, Skewness, Kurtosis}[Φ]]

{μ,  σ,  0,   3}

They, too, are time – independent.

We can simulate some observations from such a process, with, say, mean μ = 0 and standard deviation σ = 1:

ListPlot[sampleData=RandomVariate[Φ /.{μ→0, σ→1},10^4]]

 

Chart 2

Histogram[sampleData]

Chart 3

If we assume for the moment that such a process is an adequate description of an asset returns process, we can simulate the evolution of a price process as follows :

ListPlot[prices=Accumulate[sampleData]]

Chart 4

 

SSALGOTRADING AD

An Empirical Distribution

Lets take a look at a real price series, comprising 1 – minute bar data in the June ‘ 14 E – Mini futures contract.

Chart 5

As with our simulated price process, it is clear that the real price process for Emini futures is also non – stationary.

What about the returns process?

ListPlot[returnsES]

Chart 6

Notice the banding effect in returns, which results from having a fixed, minimum price move of $12 .50, rather than a continuous scale.

Histogram[returnsES]

 

Chart 7

Through[{Min,Max,Mean,Median,StandardDeviation,Skewness,Kurtosis}[returnsES]]

{-0.00867214,  0.0112353,  2.75501×10-6,   0.,   0.000780895,   0.35467,   26.2376}

The empirical returns distribution doesn’ t appear to be Gaussian – the distribution is much more peaked than a standard Normal distribution with the same mean and standard deviation. And the higher moments don’t fit the Normal model either – the empirical distribution has positive skew and a kurtosis that is almost 9x greater than a Gaussian distribution. The latter signifies what is often referred to as “fat tails”: the distribution has much greater weight in the tails than a standard Normal distribution, indicating a much greater likelihood of an extreme value than a Normal distribution would predict.

A Quantitative Analysis of Non-Stationarity: Two States

Non – stationarity arises when one or more of the moments of a distribution vary over time. Let’s take a look at how that can arise, and its effects.Suppose we have a Gaussian returns process for which the mean, or drift, or trend, fluctuates over time.

Let’s consider a simple example where the process drift is  μ1 and volatility σ1 for most of the time and then for some proportion of time k, we get addition drift  μ2 and volatility σ2.  In other words we have:

 Φ1=NormalDistribution[μ1,σ1]

 Through[{Mean,StandardDeviation,Skewness,Kurtosis}[Φ1]]

{μ1,   σ1,   0,   3}

 Φ2=NormalDistribution[μ2,σ2]

 Through[{Mean,StandardDeviation,Skewness,Kurtosis}[Φ2]]

{μ2,   σ2,   0,   3}

This simple model fits a scenario in which we suppose that the returns process spends most of its time in State 1, in which is Normally distributed with  drift is  μ1 and volatility σ1, and suffers from the occasional “shock” which propels the systems into a second State 2, in which its distribution is a combination of its original distribution and a new Gaussian distribution with different mean and volatility.

Let’ s suppose that we sample the combined process y =  Φ1 + k  Φ2.   What distribution would it have?  We can represent this is follows :

 y=TransformedDistribution[(x1+k x2),{x11,x22}]


Eqn2
 

 Through[{Mean,StandardDeviation,Skewness,Kurtosis}[y]]

Stationarity_52

 Plot[PDF[y,x]/.{μ10,μ20,σ1 1,σ2 2, k0.5},{x,-6,6},FillingAxis]

Chart 8

The result is just another Normal distribution. Depending on the incidence k, y will follow a Gaussian distribution whose mean and variance depend on the mean and variance of the two Normal distributions being mixed. The resulting distribution in State 2 may have higher or lower drift and volatility, but it is still Gaussian, with constant kurtosis of 3.

In other words, the system y will be non-stationary, because the first and second moments change over time, depending on what state it is in. But the form of the distribution is unchanged – it is still Gaussian. There are no fat-tails.

Non – Stationarity : Random States

In the above example the system moved between states in a known, predictable way. The “shocks” to the system were not really shocks, but transitions. But that’s not how financial markets behave: markets move from one state to another in an unpredictable way, with the arrival of news.

We can simulate this situation as follows. Using the former model as a starting point, lets now relax the assumption that the incidence of the second state, k, is a constant. Instead, let’ s assume that k is itself a random variable. In other words we are going to now assume that our system changes state in a random way. How does this alter the distribution?

An appropriate model for λ might be a Poisson process, which is often used as a model for unpredictable, discrete events, ranging from bus arrivals to earthquakes.  PDFs of Poisson distributions with means  λ=5, 10 and 20 are shown in the chart below.  These represent probability distributions for processes that have mean  arrivals of 5, 10 or 20 events.

 DiscretePlot[Evaluate@Table[PDF[PoissonDistribution[λ],k],{λ,{5,10,20}}],{k,0,30},PlotRangeAll,PlotMarkersAutomatic]

Chart 9

Our new model now looks like this :

 y=TransformedDistribution[{x1+k*x2},{x1⎡Φ1,x2⎡Φ2,kPoissonDistribution[λ]}]

The first two moments of the distribution are as follows :

Through[{Mean,StandardDeviation}[y]]

Stationarity_60

As before, the mean and standard deviation of the distribution are going to vary, depending on the state of the system, and the mean arrival rate of shocks, . But what about kurtosis? Is it still constant?

Kurtosis[y]

Eqn1

Emphatically not!  The fourth moment of the distribution is now dependent on the drift in the second state, the volatilities of both states and the mean arrival rate of shocks, λ.

Let’ s look at a specific example.  Assume that in State 1 the process has volatility of 7.5 %, with zero drift, and that the shock distribution also has zero drift with volatility of 65 %. If the mean incidence rate of shocks λ = 10 %, the distribution kurtosis is close to that seen in the empirical distribution for the E-Mini.

 Kurtosis[y] /.{σ10.075,μ20,σ20.65,λ→0.1}

{35.3551}

More generally :

 ListLinePlot[Flatten[Kurtosis[y]/.Table[{σ10.075,μ20,σ20.65,λ→i/20},{i,1,20}]],PlotLabelStyle[“Kurtosis vs Mean Shock Arrival Rate”, FontSize18],AxesLabel->{“Incidence Rate (%)”, “Kurtosis”},FillingAxis, ImageSizeLarge]

 

Chart 10

Thus we can see how, even if the underlying returns distribution is Gaussian in form, the random arrival of news “shocks” to the system can induce non – stationarity in overall drift and volatility. It can also result in fat tails. More specifically, if the arrival of news is stochastic in nature, rather than deterministic, the process may exhibit far higher levels of kurtosis than in its original Gaussian state, in which the fourth moment was a constant level of 3.

Quantitative Analysis of a Jump Diffusion Process

Nobel – prize winning economist Robert Merton extended this basic concept to the realm of stochastic calculus.

In Merton’s jump diffusion model, the stock price follows the random process

∂St / St =μdt + σdWt+(J-1)dNt

The first two terms are familiar from the Black–Scholes model : drift rate μ, volatility σ, and random walk Wt (Wiener process).The last term represents the jumps :J is the jump size as a multiple of stock price, while Nt is the number of jump events that have occurred up to time t.is assumed to follow the Poisson process.

 PDF[PoissonDistribution[λt]]

where λ is the average frequency with which jumps occur.

The jump size J follows a log – normal distribution

 PDF[LogNormalDistribution[m, ν], s]

where m is the average jump size and v is the volatility of the jump size.

In the jump diffusion model, the stock price St follows the random process dSt/St=μ dt+σ dWt+(J-1) dN(t), which comprises, in order, drift, diffusive, and jump components. The jumps occur according to a Poisson distribution and their size follows a log-normal distribution. The model is characterized by the diffusive volatility σ, the average jump size J (expressed as a fraction of St), the frequency of jumps λ, and the volatility of jump size ν.

The Volatility Smile

The “implied volatility” corresponding to an option price is the value of the volatility parameter for which the Black-Scholes model gives the same price. A well-known phenomenon in market option prices is the “volatility smile”, in which the implied volatility increases for strike values away from the spot price. The jump diffusion model is a generalization of Black–Scholes in which the stock price has randomly occurring jumps in addition to the random walk behavior. One of the interesting properties of this model is that it displays the volatility smile effect. In this Demonstration, we explore the Black–Scholes implied volatility of option prices (equal for both put and call options) in the jump diffusion model. The implied volatility is modeled as a function of the ratio of option strike price to spot price.