Volatility Clustering Across Asset Classes: GARCH and EGARCH Analysis with Python (2015–2026)


Introduction

If you’ve been trading anything other than cash over the past eighteen months, you’ve noticed something peculiar: periods of calm tend to persist, but so do periods of chaos. A quiet Tuesday in January rarely suddenly explodes into volatility on Wednesday—market turbulence comes in clusters. This isn’t market inefficiency; it’s a fundamental stylized fact of financial markets, one that most quant models fail to properly account for.

The current volatility regime we’re navigating in early 2026 provides a perfect case study. Following the Federal Reserve’s policy pivot late in 2025, equity markets experienced a sharp correction, with the VIX spiking from around 15 to above 30 in a matter of weeks. But here’s what interests me as a researcher: that elevated volatility didn’t dissipate overnight. It lingered, exhibiting the characteristic “slow decay” that the GARCH framework was designed to capture.

In this article, I present an empirical analysis of volatility dynamics across five major asset classes—the S&P 500 (SPY), US Treasuries (TLT), Gold (GLD), Oil (USO), and Bitcoin (BTC-USD)—over the ten-year period from January 2015 to February 2026. Using both GARCH(1,1) and EGARCH(1,1,1) models, I characterize volatility persistence and leverage effects, revealing striking differences across asset classes that have direct implications for risk management and trading strategy design.

This extends my earlier work on VIX derivatives and correlation trading, where understanding the time-varying nature of volatility is essential for pricing complex derivatives and managing portfolio risk through volatile regimes.


Understanding Volatility Clustering

Before diving into the results, let’s build some intuition about what GARCH actually captures—and why it matters.

Volatility clustering refers to the empirical observation that large price changes tend to be followed by large price changes, and small changes tend to follow small changes. If the market experiences a turbulent day, don’t expect immediate tranquility the next day. Conversely, a period of quiet trading often continues uninterrupted.

This phenomenon was formally modeled by Robert Engle in his landmark 1982 paper, “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation,” which introduced the ARCH (Autoregressive Conditional Heteroskedasticity) model. Engle’s insight was revolutionary: rather than assuming constant variance (homoskedasticity), he modeled variance itself as a time-varying process that depends on past shocks.

Tim Bollerslev extended this work in 1986 with the GARCH (Generalized ARCH) model, which proved more parsimonious and flexible. Then, in 1991, Daniel Nelson introduced the EGARCH (Exponential GARCH) model, which could capture the asymmetric response of volatility to positive versus negative returns—the famous “leverage effect” where negative shocks tend to increase volatility more than positive shocks of equal magnitude.

The Mathematics

The standard GARCH(1,1) model specifies:

\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2

where:

  • σ_t² is the conditional variance at time t
  • r_{t-1}² is the squared return from the previous period (the “shock”)
  • σ_{t-1}² is the previous period’s conditional variance
  • α measures how quickly volatility responds to new shocks
  • β measures the persistence of volatility shocks
  • The sum α + β represents overall volatility persistence

The key parameter here is α + β. If this sum is close to 1 (as it typically is for financial assets), volatility shocks decay slowly—a phenomenon I observed firsthand during the 2025-2026 correction. We can calculate the “half-life” of a volatility shock as:

\text{Half-life} = \frac{\ln(0.5)}{\ln(\alpha + \beta)}

For example, with α + β = 0.97, a volatility shock takes approximately ln(0.5)/ln(0.97) ≈ 23 days to decay by half.
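In code, the same calculation is a one-liner. A trivial sketch (the persistence inputs are the α + β estimates reported in the results tables below):

import numpy as np

def garch_half_life(persistence: float) -> float:
    """Trading days for a volatility shock to decay by half, given alpha + beta."""
    return np.log(0.5) / np.log(persistence)

print(garch_half_life(0.97))     # ≈ 22.8 days, the "~23" quoted above
print(garch_half_life(0.9823))   # ≈ 38.8 days (US Treasuries, see the table below)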

The EGARCH model modifies this framework to capture asymmetry:

\ln(\sigma_t^2) = \omega + \alpha \left( \left|\frac{r_{t-1}}{\sigma_{t-1}}\right| - \sqrt{\tfrac{2}{\pi}} \right) + \gamma \left(\frac{r_{t-1}}{\sigma_{t-1}}\right) + \beta \ln(\sigma_{t-1}^2)

Here α multiplies the (centered) absolute standardized shock, while the parameter γ (gamma) multiplies the signed standardized shock and captures the leverage effect; √(2/π) is the expected absolute value of a standard normal innovation. A negative γ means that negative returns generate more volatility than positive returns of equal magnitude—which is precisely what we observe in equity markets and, as we’ll see, in Bitcoin.
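To see the asymmetry in numbers, the short sketch below evaluates the shock term of this recursion for standardized shocks of ±1, plugging in the S&P 500 parameters from the EGARCH results table further down (α = 0.2398, γ = -0.1654); this is an illustration of the formula, not part of the estimation code:

import numpy as np

alpha, gamma = 0.2398, -0.1654        # SPY EGARCH estimates from the table below
E_abs_z = np.sqrt(2 / np.pi)          # E|z| for a standard normal shock

def news_impact(z: float) -> float:
    """Contribution of a standardized shock z to ln(sigma_t^2)."""
    return alpha * (abs(z) - E_abs_z) + gamma * z

print(f"z = +1: {news_impact(+1):+.4f}")   # ≈ -0.117: vol falls after a +1σ day
print(f"z = -1: {news_impact(-1):+.4f}")   # ≈ +0.214: vol rises after a -1σ day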


Methodology

For each asset in the sample, I computed daily log returns as:

r_t = 100 \times \ln\left(\frac{P_t}{P_{t-1}}\right)

The multiplication by 100 converts returns to percentage terms, which improves numerical convergence when estimating the models.

I then fitted two volatility models to each asset’s return series:

  • GARCH(1,1): The workhorse model that captures volatility clustering through the autoregressive structure of conditional variance
  • EGARCH(1,1,1): The exponential GARCH model that additionally captures leverage effects through the asymmetric term

All models were estimated using Python’s arch package with normally distributed innovations. The sample period spans January 2015 to February 2026, encompassing multiple distinct volatility regimes including:

  • The 2015-2016 oil price collapse
  • The 2018 Q4 correction
  • The COVID-19 volatility spike of March 2020
  • The 2022 rate-hike cycle
  • The 2025-2026 post-pivot correction

This rich variety of regimes makes the sample ideal for studying volatility dynamics across different market conditions.


Results

GARCH(1,1) Estimates

The GARCH(1,1) model reveals substantial variation in volatility dynamics across asset classes:

Asset          α (alpha)   β (beta)   Persistence (α+β)   Half-life (days)   AIC
S&P 500        0.1810      0.7878     0.9688              ~23                7130.4
US Treasuries  0.0683      0.9140     0.9823              ~38                7062.7
Gold           0.0631      0.9110     0.9741              ~27                7171.9
Oil            0.1271      0.8305     0.9576              ~16                11999.4
Bitcoin        0.1228      0.8470     0.9699              ~24                20789.6

 

EGARCH(1,1,1) Estimates

The EGARCH model additionally captures leverage effects:

Asset          α (alpha)   β (beta)   γ (gamma)   α+β      AIC
S&P 500        0.2398      0.9484     -0.1654     1.1882   7022.6
US Treasuries  0.1501      0.9806     0.0084      1.1307   7063.5
Gold           0.1205      0.9721     0.0452      1.0926   7146.9
Oil            0.2171      0.9564     -0.0668     1.1735   12002.8
Bitcoin        0.2505      0.9377     -0.0383     1.1882   20773.9

Note that in the EGARCH log-variance recursion, stationarity is governed by β alone; the α+β column is reported only for comparability with the GARCH table.

 

Interpretation

Volatility Persistence

All five assets exhibit high volatility persistence, with α + β ranging from 0.9576 (Oil) to 0.9823 (US Treasuries). These values are remarkably consistent with the classic empirical findings from Engle (1982) and Bollerslev (1986), who first documented this phenomenon in inflation and stock market data respectively.

US Treasuries show the highest persistence (0.9823), meaning volatility shocks in the bond market take longer to decay—approximately 38 days to half-life. This makes intuitive sense: Federal Reserve policy changes, which are the primary drivers of Treasury volatility, tend to have lasting effects that persist through subsequent meetings and economic data releases.

Gold exhibits the second-highest persistence (0.9741), consistent with its role as a long-term store of value. Macroeconomic uncertainties—geopolitical tensions, currency debasement fears, inflation scares—don’t resolve quickly, and neither does the associated volatility.

S&P 500 and Bitcoin show similar persistence (~0.97), with half-lives of approximately 23-24 days. This suggests that equity market volatility shocks, despite their reputation for sudden spikes, actually decay at a moderate pace.

Oil has the lowest persistence (0.9576), which makes sense given the more mean-reverting nature of commodity prices. Oil markets can experience rapid shifts in sentiment based on supply disruptions or demand changes, but these shocks tend to resolve more quickly than in financial assets.

Leverage Effects

 

The EGARCH γ parameter reveals asymmetric volatility responses—the leverage effect that Nelson (1991) formalized:

S&P 500 (γ = -0.1654): The strongest negative leverage effect in the sample. A 1% drop in equities increases volatility significantly more than a 1% rise. This is the classic equity pattern: bad news is “stickier” than good news. For options traders, this means that protective puts are more expensive than equivalent out-of-the-money calls during volatile periods—a direct consequence of this asymmetry.

Bitcoin (γ = -0.0383): Mild negative leverage, noticeably weaker than in equities. The cryptocurrency market shows asymmetric reactions to price movements, with downside moves generating more volatility than upside moves. This is somewhat surprising given Bitcoin’s retail-dominated nature, but consistent with the hypothesis that large institutional players are increasingly active in crypto markets.

Oil (γ = -0.0668): Moderate negative leverage, somewhat stronger than Bitcoin’s. The energy market’s reaction to geopolitical events (which tend to be negative supply shocks) contributes to this asymmetry.

Gold (γ = +0.0452): Here’s where it gets interesting. Gold exhibits a slight positive gamma—the opposite of the equity pattern. Positive returns slightly increase volatility more than negative returns. This is consistent with gold’s safe-haven role: when risk assets sell off and investors flee to gold, the resulting price spike in gold can be accompanied by increased trading activity and volatility. Conversely, gradual gold price increases during calm markets occur with declining volatility.

US Treasuries (γ = +0.0084): Essentially symmetric. Treasury volatility doesn’t distinguish between positive and negative returns—which makes sense, since Treasuries are priced primarily on interest rate expectations rather than “good” or “bad” news in the equity sense.

Model Fit

The AIC (Akaike Information Criterion) comparison shows that EGARCH provides a materially better fit for the S&P 500 (7022.6 vs 7130.4), Bitcoin (20773.9 vs 20789.6) and Gold (7146.9 vs 7171.9). For Treasuries (7063.5 vs 7062.7) and Oil (12002.8 vs 11999.4), GARCH performs comparably or slightly better, so the extra asymmetry term earns its keep only where the leverage effect is pronounced or, in Gold’s case, reversed.


Practical Implications for Traders

1. Volatility Forecasting and Position Sizing

The high persistence values across all assets have direct implications for position sizing during volatile regimes. If you’re trading options or managing a portfolio, the GARCH framework tells you that elevated volatility will likely persist for weeks, not days (the short projection sketch after this list makes the point concrete). This suggests:

  • Don’t reduce risk too quickly after a volatility spike. The half-life analysis shows that it takes 2-4 weeks for half of a volatility shock to dissipate. Cutting exposure immediately after a correction means de-risking at what is typically the peak of the volatility cycle, locking in the worst of the move.
  • Expect re-leveraging opportunities. Once vol peaks and begins decaying, there’s a window of several weeks where volatility is still elevated but declining—potentially favorable for selling vol (e.g., writing covered calls or selling volatility swaps).
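A minimal sketch of this decay argument, using the standard GARCH(1,1) h-step-ahead variance forecast with illustrative parameters chosen near the S&P 500 estimates above (not the exact fitted values):

import numpy as np

# GARCH(1,1) h-step forecast: sigma2(t+h) = v + (alpha+beta)**h * (sigma2(t) - v),
# where v = omega / (1 - alpha - beta) is the long-run variance.
omega, alpha, beta = 0.05, 0.18, 0.79      # illustrative; persistence = 0.97
persistence = alpha + beta
v = omega / (1 - persistence)

spike = 4 * v                               # conditional variance right after a shock
h = np.arange(1, 91)                        # horizon in trading days
forecast = v + persistence**h * (spike - v)

# First horizon at which half the excess variance has decayed
half = h[(forecast - v) <= 0.5 * (spike - v)][0]
print(f"Half of the spike decays in ~{half} trading days")   # 23 with these params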

2. Options Pricing

The leverage effects have material implications for option pricing:

  • Equity options (S&P 500) should price in significant skew—put options are relatively more expensive than calls. If you’re buying protection (e.g., buying SPY puts for portfolio hedge), you’re paying a premium for this asymmetry.
  • Bitcoin options show similar but weaker asymmetry. The market is still relatively young, and the vol surface may not fully price in the leverage effect—potentially an edge for sophisticated options traders.
  • Gold options exhibit the opposite pattern. Call options may be relatively cheaper than puts, reflecting gold’s tendency to experience vol spikes on rallies (as opposed to selloffs).

3. Portfolio Construction

For multi-asset portfolios, the differing persistence and leverage characteristics suggest tactical allocation shifts:

  • During risk-on regimes: Low persistence in oil suggests faster mean reversion—commodity exposure might be appropriate for shorter time horizons.
  • During risk-off regimes: High persistence in Treasuries means bond market volatility decays slowly. Duration hedges need to account for this extended volatility window.
  • Diversification benefits: The low correlation between equity and Treasury volatility dynamics supports the case for mixed-asset portfolios—but the high persistence in both suggests that when one asset class enters a high-vol regime, it likely persists for weeks.

4. Trading Volatility Directly

For traders who express views on volatility itself (VIX futures, variance swaps, volatility ETFs):

  • The persistence framework suggests that VIX spikes should be traded as mean-reverting (which they are), but with the expectation that complete normalization takes 30-60 days.
  • The leverage effect in equities means that vol strategies should be positioned for asymmetric payoffs—long vol positions benefit more from downside moves than equivalent upside moves.

Reproducible Example

At the bottom of the post is the complete Python code used to generate these results. The code uses yfinance for data download and the arch package for model estimation. It’s designed to be easily extensible—you can add additional assets, change the date range, or experiment with different GARCH variants (GARCH-M, TGARCH, GJR-GARCH) to capture different aspects of the volatility dynamics.

 

Conclusion

This analysis confirms that volatility clustering is a universal phenomenon across asset classes, but the specific characteristics vary meaningfully:

  • Volatility persistence is universally high (α + β ≈ 0.95–0.98), meaning volatility shocks take weeks to months to decay. This has important implications for position sizing and risk management.
  • Leverage effects vary dramatically across asset classes. Equities show strong negative leverage (bad news increases vol more than good news), while gold shows slight positive leverage (opposite pattern), and Treasuries show no meaningful asymmetry.
  • The half-life of volatility shocks ranges from approximately 16 days (oil) to 38 days (Treasuries), providing a quantitative guide for expected duration of volatile regimes.

These findings extend naturally to my ongoing work on volatility derivatives and correlation trading. Understanding the persistence and asymmetry of volatility is essential for pricing VIX options, variance swaps, and other vol-sensitive products—as well as for managing the tail risk that inevitably accompanies high-volatility regimes like the one we’re navigating in early 2026.


References

  • Engle, R.F. (1982). “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica, 50(4), 987-1007.
  • Bollerslev, T. (1986). “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics, 31(3), 307-327.
  • Nelson, D.B. (1991). “Conditional Heteroskedasticity in Asset Returns: A New Approach.” Econometrica, 59(2), 347-370.

All models estimated using Python’s arch package with normal innovations. Data source: Yahoo Finance. The analysis covers the period January 2015 through February 2026, comprising approximately 2,800 trading days.


"""
GARCH Analysis: Volatility Clustering Across Asset Classes
============================================================
- Downloads daily adjusted close prices (2015–2026)
- Computes log returns (in percent)
- Fits GARCH(1,1) and EGARCH(1,1,1) models to each asset
- Reports key parameters: alpha, beta, persistence, gamma (leverage in EGARCH)
- Highlights potential leverage effects when |γ| > 0.05

Assets included: SPY, TLT, GLD, USO, BTC-USD
"""

import yfinance as yf
import pandas as pd
import numpy as np
from arch import arch_model
import warnings

# Suppress arch model convergence warnings for cleaner output
warnings.filterwarnings('ignore', category=UserWarning)

# ────────────────────────────────────────────────
# Configuration
# ────────────────────────────────────────────────
ASSETS = ['SPY', 'TLT', 'GLD', 'USO', 'BTC-USD']
START_DATE = '2015-01-01'
END_DATE = '2026-02-14'

# ────────────────────────────────────────────────
# 1. Download price data
# ────────────────────────────────────────────────
print("=" * 70)
print("GARCH(1,1) & EGARCH(1,1) Analysis – Volatility Clustering")
print("=" * 70)
print()

print("1. Downloading daily adjusted close prices...")
price_data = {}

for asset in ASSETS:
    try:
        df = yf.download(asset, start=START_DATE, end=END_DATE,
                         progress=False, auto_adjust=True)
        if df.empty:
            print(f"  {asset:6s} → No data retrieved")
            continue
        price_data[asset] = df['Close']
        print(f"  {asset:6s} → {len(df):5d} observations")
    except Exception as e:
        print(f"  {asset:6s} → Download failed: {e}")

# Combine into single DataFrame and drop rows with any missing values
prices = pd.DataFrame(price_data).dropna()
print(f"\nCombined clean dataset: {len(prices):,} trading days")

# ────────────────────────────────────────────────
# 2. Calculate log returns (in percent)
# ────────────────────────────────────────────────
print("\n2. Computing log returns...")
returns = np.log(prices / prices.shift(1)).dropna() * 100
print(f"Log returns ready: {len(returns):,} observations\n")

# ────────────────────────────────────────────────
# 3. Fit GARCH(1,1) and EGARCH(1,1) models
# ────────────────────────────────────────────────
print("3. Fitting models...")
print("-" * 70)

results = []

for asset in ASSETS:
    if asset not in returns.columns:
        print(f"{asset:6s} → Skipped (no data)")
        continue

    print(f"\n{asset}")
    print("─" * 40)

    asset_returns = returns[asset].dropna()

    # Default missing values
    row = {
        'Asset': asset,
        'Alpha_GARCH': np.nan, 'Beta_GARCH': np.nan, 'Persist_GARCH': np.nan,
        'LL_GARCH': np.nan, 'AIC_GARCH': np.nan,
        'Alpha_EGARCH': np.nan, 'Gamma_EGARCH': np.nan, 'Beta_EGARCH': np.nan,
        'Persist_EGARCH': np.nan
    }

    # ───── GARCH(1,1) ─────
    try:
        model_garch = arch_model(
            asset_returns,
            vol='Garch', p=1, q=1,
            dist='normal',
            mean='Zero'  # common choice for pure volatility models
        )
        res_garch = model_garch.fit(disp='off', options={'maxiter': 500})

        row['Alpha_GARCH'] = res_garch.params.get('alpha[1]', np.nan)
        row['Beta_GARCH'] = res_garch.params.get('beta[1]', np.nan)
        row['Persist_GARCH'] = row['Alpha_GARCH'] + row['Beta_GARCH']
        row['LL_GARCH'] = res_garch.loglikelihood
        row['AIC_GARCH'] = res_garch.aic

        print(f"GARCH(1,1)    α = {row['Alpha_GARCH']:8.4f} "
              f"β = {row['Beta_GARCH']:8.4f} "
              f"persistence = {row['Persist_GARCH']:6.4f}")
    except Exception as e:
        print(f"GARCH(1,1) failed: {e}")

    # ───── EGARCH(1,1,1) ─────
    try:
        model_egarch = arch_model(
            asset_returns,
            vol='EGARCH', p=1, o=1, q=1,
            dist='normal',
            mean='Zero'
        )
        res_egarch = model_egarch.fit(disp='off', options={'maxiter': 500})

        row['Alpha_EGARCH'] = res_egarch.params.get('alpha[1]', np.nan)
        row['Gamma_EGARCH'] = res_egarch.params.get('gamma[1]', np.nan)
        row['Beta_EGARCH'] = res_egarch.params.get('beta[1]', np.nan)
        # For EGARCH, stationarity of log-variance is governed by beta alone;
        # alpha + beta is reported only for comparability with GARCH(1,1).
        row['Persist_EGARCH'] = row['Alpha_EGARCH'] + row['Beta_EGARCH']

        print(f"EGARCH(1,1,1) α = {row['Alpha_EGARCH']:8.4f} "
              f"γ = {row['Gamma_EGARCH']:8.4f} "
              f"β = {row['Beta_EGARCH']:8.4f} "
              f"persistence = {row['Persist_EGARCH']:6.4f}")

        if abs(row['Gamma_EGARCH']) > 0.05:
            print("    → Notable leverage effect (|γ| > 0.05)")
    except Exception as e:
        print(f"EGARCH(1,1,1) failed: {e}")

    results.append(row)

# ────────────────────────────────────────────────
# 4. Summary table
# ────────────────────────────────────────────────
print("\n" + "=" * 70)
print("SUMMARY OF RESULTS")
print("=" * 70)

df_results = pd.DataFrame(results)
df_results = df_results.round(4)

# Reorder columns for readability
cols = [
 'Asset',
 'Alpha_GARCH', 'Beta_GARCH', 'Persist_GARCH',
 'Alpha_EGARCH', 'Gamma_EGARCH', 'Beta_EGARCH', 'Persist_EGARCH',
 #'LL_GARCH', 'AIC_GARCH' # uncomment if you want log-likelihood & AIC
]

print(df_results[cols].to_string(index=False))
print()

print("Done."). 

Strategy Backtesting in Mathematica

This is a snippet from a strategy backtesting system that I am currently building in Mathematica.

One of the challenges when building systems in WL is to avoid looping wherever possible. This can usually be accomplished with some thought, and the efficiency gains can be significant. But it can be challenging to get one’s head around the appropriate construct using functions like FoldList, etc., especially as there are often edge cases to take into consideration.

A case in point is the issue of calculating the profit and loss from individual trades in a trading strategy. The starting point is to come up with a FoldList-compatible function that does the necessary calculations:

CalculateRealizedTradePL[{totalQty_, totalValue_, avgPrice_, PL_, totalPL_}, {qprice_, qty_}] :=
 Module[{newTotalPL = totalPL, price = QuantityMagnitude[qprice],
   newTotalQty, tradeValue, newavgPrice, newTotalValue, newPL},
  newTotalQty = totalQty + qty;
  tradeValue =
   If[Sign[qty] == Sign[totalQty] || avgPrice == 0, price*qty,
    If[Sign[totalQty + qty] == Sign[totalQty], avgPrice*qty,
     price*(totalQty + qty)]];
  newTotalValue =
   If[Sign[totalQty] == Sign[newTotalQty], totalValue + tradeValue,
    newTotalQty*price];
  newavgPrice =
   If[Sign[totalQty + qty] == Sign[totalQty],
    (totalQty*avgPrice + tradeValue)/newTotalQty, price];
  newPL = If[(Sign[qty] == Sign[totalQty]) || totalQty == 0, 0,
    qty*(avgPrice - price)];
  newTotalPL = newTotalPL + newPL;
  {newTotalQty, newTotalValue, newavgPrice, newPL, newTotalPL}]

Trade P&L is calculated on an average cost basis, as opposed to FIFO or LIFO.

Note that the functions handle both regular long-only trading strategies and short-sale strategies, in which (in the case of equities) we have to borrow the underlying stock to sell it short. Also, the pointValue argument enables us to apply the functions to trades in instruments such as futures for which, unlike stocks, the value of a 1-point move is typically larger than $1 (e.g. $50 for the ES S&P 500 mini futures contract).
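For anyone who wants to sanity-check the average-cost logic outside WL, here is a minimal Python rendering of the same bookkeeping (my own sketch; the names and structure are not taken from the backtest system):

def realized_pl(position, avg_price, qty, price, point_value=1.0):
    """Update (position, avg_price) for one fill; return new state and realized P&L."""
    if position == 0 or (position > 0) == (qty > 0):
        # Opening or adding to a position: no realized P&L, update average price
        new_pos = position + qty
        new_avg = (position * avg_price + qty * price) / new_pos
        return new_pos, new_avg, 0.0
    # Closing (possibly flipping): P&L accrues on the quantity actually closed
    closed = min(abs(qty), abs(position)) * (1 if position > 0 else -1)
    pl = closed * (price - avg_price) * point_value
    new_pos = position + qty
    new_avg = price if new_pos * position < 0 else avg_price  # a flip opens at fill price
    if new_pos == 0:
        new_avg = 0.0
    return new_pos, new_avg, pl

Folding this over a trade list reproduces the per-trade P&L column of the WL function; for example, realized_pl(100, 10.0, -150, 12.0) returns (-50, 12.0, 200.0).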

We then apply the function in two flavors, to accommodate both standard numerical arrays and timeseries (associations would be another good alternative):

CalculateRealizedPLFromTrades[tradeList_?ArrayQ, pointValue_ : 1] :=
 Module[{tradePL =
    Rest@FoldList[CalculateRealizedTradePL, {0, 0, 0, 0, 0}, tradeList]},
  tradePL[[All, 4 ;; 5]] = tradePL[[All, 4 ;; 5]]*pointValue;
  tradePL]

CalculateRealizedPLFromTrades[tsTradeList_, pointValue_ : 1] :=
 Module[{tsTradePL =
    Rest@FoldList[CalculateRealizedTradePL, {0, 0, 0, 0, 0},
      QuantityMagnitude@tsTradeList["Values"]]},
  tsTradePL[[All, 4 ;; 5]] = tsTradePL[[All, 4 ;; 5]]*pointValue;
  tsTradePL[[All, 2 ;;]] = Quantity[tsTradePL[[All, 2 ;;]], "US Dollars"];
  tsTradePL =
   TimeSeries[
    Transpose@Join[Transpose@tsTradeList["Values"], Transpose@tsTradePL],
    tsTradeList["DateList"]]]

These functions run around 10x faster than the equivalent functions that use Do loops (without parallelization or compilation, admittedly).

Let’s see how they work with an example:

Trade Simulation

Next, we’ll generate a series of random trades using the AAPL time series, as follows (we also take the opportunity to convert the list of trades into a time series, tsTrades):

trades = Transpose@
Join[Transpose[
tsAAPL["DatePath"][[
Sort@RandomSample[Range[tsAAPL["PathLength"]],
20]]]], {RandomChoice[{-100, 100}, 20]}];
trades // TableForm

Trade P&L Calculation

We are now ready to apply our Trade P&L calculation function, first to the list of trades in array form:

TableForm[
Flatten[#] & /@ 
Partition[
Riffle[trades, 
CalculateRealizedPLFromTrades[trades[[All, 2 ;; 3]]]], 2], 
TableHeadings -> {{}, {"Date", "Price", "Quantity", "Total Qty", 
"Position Value", "Average Price", "P&L", "Total PL"}}]

The timeseries version of the function provides the output as a timeseries object in Quantity["US Dollars"] format and, of course, can be plotted immediately with DateListPlot (it is also convenient for other reasons, as the complete backtest system is built around timeseries objects):

tsTradePL = CalculateRealizedPLFromTrades[tsTrades]

Measuring Toxic Flow for Trading & Risk Management

A common theme of microstructure modeling is that trade flow is often predictive of market direction. One concept in particular that has gained traction is flow toxicity, i.e. flow where resting orders tend to be filled more quickly than expected, while aggressive orders rarely get filled at all, due to the participation of informed traders trading against uninformed traders. The fundamental insight from microstructure research is that the order arrival process is informative of subsequent price moves in general and toxic flow in particular. This in turn has led researchers to try to measure the probability of informed trading (PIN). One recent attempt to model flow toxicity, the Volume-Synchronized Probability of Informed Trading (VPIN) metric, seeks to estimate PIN based on volume imbalance and trade intensity. A major advantage of this approach is that it does not require the estimation of unobservable parameters; additionally, updating VPIN in trade time rather than clock time improves its predictive power. VPIN has potential applications not only in high frequency trading strategies, but also in risk management, since highly toxic flow is likely to lead to the withdrawal of liquidity providers, setting up the conditions for a “flash-crash” type of market breakdown.

The procedure for estimating VPIN is as follows. We begin by grouping sequential trades into equal volume buckets of size V. If the last trade needed to complete a bucket was for a size greater than needed, the excess size is given to the next bucket. Then we classify trades within each bucket into two volume groups: Buys (V_t^B) and Sells (V_t^S), with V = V_t^B + V_t^S.
The Volume-Synchronized Probability of Informed Trading is then derived as:

\text{VPIN} = \frac{\sum_{\tau=1}^{n} \left| V_\tau^B - V_\tau^S \right|}{nV}

Typically one might choose to estimate VPIN using a moving average over n buckets, with n being in the range of 50 to 100.
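As a rough sketch of the bucketing arithmetic, the Python fragment below computes a VPIN series from trades that have already been classified as buys or sells (e.g. by the tick rule). For simplicity it assigns whole trades to buckets rather than splitting the marginal fill between buckets, as the exact procedure would:

import numpy as np
import pandas as pd

def vpin(volume: np.ndarray, is_buy: np.ndarray, bucket_size: float, n: int = 50):
    """Rolling VPIN over the last n volume buckets."""
    bucket_id = (np.cumsum(volume) // bucket_size).astype(int)
    buys = pd.Series(np.where(is_buy, volume, 0.0)).groupby(bucket_id).sum()
    sells = pd.Series(np.where(is_buy, 0.0, volume)).groupby(bucket_id).sum()
    imbalance = (buys - sells).abs() / bucket_size   # |V_B - V_S| / V per bucket
    return imbalance.rolling(n).mean()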

Another related statistic of interest is the single-period signed VPIN. This takes a value between -1 and +1, depending on the proportion of buying to selling during a single period t.

Fig 1. Single-Period Signed VPIN for the ES Futures Contract

It turns out that quote revisions condition strongly on the signed VPIN. For example, in tests of the ES futures contract, we found that the change in the midprice from one volume bucket to the next was highly correlated with the prior bucket’s signed VPIN, with a coefficient of 0.5. In other words, market participants offering liquidity will adjust their quotes in a way that directly reflects the direction and intensity of toxic flow, which is perhaps hardly surprising.

Of greater interest is the finding that there is a small but statistically significant dependency of price changes, as measured by first buy (sell) trade price to last sell (buy) trade price, on the prior period’s signed VPIN. The correlation is positive, meaning that strongly toxic flow in one direction has a tendency to push prices in the same direction during the subsequent period. Moreover, the single-period signed VPIN turns out to be somewhat predictable, since its autocorrelations are statistically significant at two or more lags. A simple linear autoregressive ARMA(2,1) model produces an R-square of around 7%, which is small, but statistically significant.

A more useful model, however, can be constructed by introducing the idea of Markov states and allowing the regression model to assume different parameter values (and error variances) in each state. In the Markov-state framework, the system transitions from one state to another with conditional probabilities that are estimated in the model.


An example of such a model for the signed VPIN in ES is shown below. Note that the model R-square is over 27%, around 4x larger than for a standard linear ARMA model.

We can describe the regime-switching model in the following terms. In the regime 1 state the model has two significant autoregressive terms and one significant moving average term (ARMA(2,1)). The AR1 term is large and positive, suggesting that trends in VPIN tend to be reinforced from one period to the next. In other words, this is a momentum state. In the regime 2 state the AR2 term is not significant and the AR1 term is large and negative, suggesting that changes in VPIN in one period tend to be reversed in the following period, i.e. this is a mean-reversion state.

The state transition probabilities indicate that the system is in mean-reversion mode for the majority of the time, approximately 2 periods out of 3. During these periods, excessive flow in one direction during one period tends to be corrected in the ensuing period. But in the less frequently occurring state 1, excess flow in one direction tends to produce even more flow in the same direction in the following period. This first state, then, may be regarded as the regime characterized by toxic flow.

Markov State Regime-Switching Model

Markov Transition Probabilities

            P(·|1)     P(·|2)
P(1|·)      0.54916    0.27782
P(2|·)      0.45084    0.7221

Regime 1:

Term                   Coefficient   Std. Error   t-Stat    p-Value
AR1                        1.35502      0.02657   50.998      0.000
AR2                       -0.33687      0.02354  -14.311      0.000
MA1                        0.83662      0.01679   49.828      0.000
Error Variance^(1/2)       0.36294      0.0058

Regime 2:

Term                   Coefficient   Std. Error   t-Stat    p-Value
AR1                       -0.68268      0.08479   -8.051      0.000
AR2                        0.00548      0.01854    0.296      0.767
MA1                       -0.70513      0.08436   -8.359      0.000
Error Variance^(1/2)       0.42281      0.0016

Log Likelihood = -33390.6
Schwarz Criterion = -33445.7
Hannan-Quinn Criterion = -33414.6
Akaike Criterion = -33400.6
Sum of Squares = 8955.38
R-Squared = 0.2753
R-Bar-Squared = 0.2752
Residual SD = 0.3847
Residual Skewness = -0.0194
Residual Kurtosis = 2.5332
Jarque-Bera Test = 553.472 {0}
Box-Pierce (residuals): Q(9) = 13.9395 {0.124}
Box-Pierce (squared residuals): Q(12) = 743.161 {0}

 

A Simple Trading Strategy

One way to try to monetize the predictability of the VPIN model is to use the forecasts to take directional positions in the ES contract. In this simple simulation we assume that we enter a long (short) position at the first buy (sell) price if the forecast VPIN exceeds a threshold value of 0.1 (falls below -0.1). The simulation assumes that we exit the position at the end of the current volume bucket, at the last sell (buy) trade price in the bucket.

This simple strategy made 1024 trades over a 5-day period from 8/8 to 8/14, 90% of which were profitable, for a total of $7,675 – i.e. around ½ tick per trade.

The simulation is, of course, unrealistically simplistic, but it does give an indication of the prospects for a more realistic version of the strategy in which, for example, we might rest an order on one side of the book, depending on our VPIN forecast.
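For concreteness, the bucket-level accounting of the simulation can be expressed in a few lines (a sketch with hypothetical input arrays, not the code used in the study):

import numpy as np

def vpin_strategy_pl(vpin_forecast, first_buy, last_sell, first_sell, last_buy,
                     threshold=0.1):
    """Cumulative P&L of the threshold rule, one entry/exit per volume bucket."""
    pl = np.zeros_like(vpin_forecast)
    long_ = vpin_forecast > threshold           # enter long at first buy price...
    short = vpin_forecast < -threshold          # ...or short at first sell price
    pl[long_] = last_sell[long_] - first_buy[long_]    # exit long at last sell price
    pl[short] = first_sell[short] - last_buy[short]    # cover short at last buy price
    return pl.cumsum()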

Figure 2 – Cumulative Trade PL

References

Easley, D., López de Prado, M. and O’Hara, M. (2011). “Flow Toxicity and Volatility in a High Frequency World.” Johnson School Research Paper Series No. 09-2011.

Easley, D. and O’Hara, M. (1987). “Price, Trade Size, and Information in Securities Markets.” Journal of Financial Economics, 19(1), 69-90.

Easley, D. and O’Hara, M. (1992a). “Adverse Selection and Large Trade Volume: The Implications for Market Efficiency.” Journal of Financial and Quantitative Analysis, 27(2), 185-208.

Easley, D. and O’Hara, M. (1992b). “Time and the Process of Security Price Adjustment.” Journal of Finance, 47, 576-605.

 

Forecasting Financial Markets – Part 1: Time Series Analysis

The presentation in this post covers a number of important topics in forecasting, including:

  • Stationary processes and random walks
  • Unit roots and autocorrelation
  • ARMA models
  • Seasonality
  • Model testing
  • Forecasting
  • Dickey-Fuller and Phillips-Perron tests for unit roots

Also included are a number of detailed worked examples, including:

  1. ARMA Modeling
  2. Box Jenkins methodology
  3. Modeling the US Wholesale Price Index
  4. Pesaran & Timmermann study of excess equity returns
  5. Purchasing Power Parity

 

Forecasting 2011 - Time Series

 

Resources for Quantitative Analysts

Two of the smartest econometricians I know are Prof. Stephen Taylor of Lancaster University, and Prof. James Davidson of Exeter University.

I recall spending many profitable hours in the 1980s with Stephen’s book Modelling Financial Time Series, which I am pleased to see has now been reprinted in a second edition. For a long time this was the best available book on the topic and it remains a classic. It has been surpassed by very few books, one being Stephen’s later work Asset Price Dynamics, Volatility and Prediction. This is a superb exposition, one that will repay close study.

James Davidson is one of the smartest minds in econometrics. Not only is his research of the highest caliber, he has somehow managed (in his spare time!) to develop one of the most advanced econometrics packages available.  Based on Jurgen Doornik’s Ox programming system, the Time Series Modelling package covers almost every conceivable model type, including regression models, ARIMA, ARFIMA and other single equation models, systems of equations, panel data models, GARCH and other heteroscedastic models and regime switching models, accompanied by very comprehensive statistical testing capabilities.  Furthermore, TSM is very well documented and despite being arguably the most advanced system of its kind it is inexpensive relative to alternatives.  James’s research output is voluminous and often highly complex.  His book, Econometric Theory, is an excellent guide to the state of the art, but not for the novice (or the faint hearted!).

Those looking for a kinder, gentler introduction to econometrics would do well to acquire a copy of Prof. Chris Brooks’s Introductory Econometrics for Finance. This covers most of the key ideas, from regression, through ARMA, GARCH, panel data models, cointegration, regime switching and volatility modeling.  Not only is the coverage comprehensive, Chris’s explanation of the concepts is delightfully clear and illustrated with interesting case studies which he analyzes using the EViews econometrics package.    Although not as advanced as TSM, EViews has everything that most quantitative analysts are likely to require in a modeling system and is very well suited to Chris’s teaching style.  Chris’s research output is enormous and covers a great many topics of interest to financial market analysts, in the same lucid style.

Robustness in Quantitative Research and Trading

What is Strategy Robustness?  What is its relevance to Quantitative Research and Trading?

One of the most highly desired properties of any financial model or investment strategy, by investors and managers alike, is robustness. I would define robustness as the ability of a strategy to deliver consistent results across a wide range of market conditions. It is, of course, by no means the only desirable property – investing in Treasury bills is also a pretty robust strategy, although the returns are unlikely to set an investor’s pulse racing – but it does ensure that the investor, or manager, is unlikely to be on the receiving end of an ugly surprise when market conditions change.

Robustness is not the same thing as low volatility, which also tends to be a characteristic highly prized by many investors. A strategy may operate consistently, with low volatility, in certain market conditions, but behave very differently in others – a delta-hedged short-volatility book containing exotic derivative positions, for instance. The point is that empirical researchers do not know the true data-generating process for the markets they are modeling. When specifying an empirical model they need to make arbitrary assumptions. An example is the common assumption that asset returns follow a Gaussian distribution. In fact, the empirical distributions of the great majority of asset processes exhibit the characteristic of “fat tails”, which can result from the interplay between multiple market states with random transitions. See this post for details:

http://jonathankinlay.com/2014/05/a-quantitative-analysis-of-stationarity-and-fat-tails/

 

In statistical arbitrage, for example, quantitative researchers often make use of cointegration models to build pairs trading strategies. However, the testing procedures used in current practice are not sufficiently powerful to distinguish between cointegrated processes and those whose evolution just happens to correlate temporarily, resulting in the frequent breakdown of cointegrating relationships. For instance, see this post:

http://jonathankinlay.com/2017/06/statistical-arbitrage-breaks/

Modeling Assumptions are Often Wrong – and We Know It

We are, of course, not the first to suggest that empirical models are misspecified:

“All models are wrong, but some are useful” (Box 1976, Box and Draper 1987).

 

Martin Feldstein (1982: 829): “In practice all econometric specifications are necessarily false models.”

 

Luke Keele (2008: 1): “Statistical models are always simplifications, and even the most complicated model will be a pale imitation of reality.”

 

Peter Kennedy (2008: 71): “It is now generally acknowledged that econometric models are false and there is no hope, or pretense, that through them truth will be found.”

During the crash of 2008, quantitative analysts and risk managers found out the hard way that the assumptions underpinning the copula models used to price and hedge credit derivative products were highly sensitive to market conditions. In other words, they were not robust. See this post for more on the application of copula theory in risk management:

http://jonathankinlay.com/2017/01/copulas-risk-management/

 

Robustness Testing in Quantitative Research and Trading

We interpret model misspecification as model uncertainty. Robustness tests analyze model uncertainty by comparing a baseline model to plausible alternative model specifications. Rather than trying to specify models correctly (an impossible task given causal complexity), researchers should test whether the results obtained by their baseline model – their best attempt at optimizing the specification of their empirical model – hold when they systematically replace the baseline specification with plausible alternatives. This is the practice of robustness testing.


Robustness testing analyzes the uncertainty of models and tests whether estimated effects of interest are sensitive to changes in model specifications. The uncertainty about the baseline model’s estimated effect size shrinks if the robustness test model finds the same or similar point estimate with smaller standard errors, though with multiple robustness tests the uncertainty likely increases. The uncertainty about the baseline model’s estimated effect size increases if the robustness test model obtains different point estimates and/or larger standard errors. Either way, robustness tests can increase the validity of inferences.

Robustness testing replaces the judgment of the scientific crowd with a systematic evaluation of model alternatives.

Robustness in Quantitative Research

In the literature, robustness has been defined in different ways:

  • as same sign and significance (Leamer)
  • as weighted average effect (Bayesian and Frequentist Model Averaging)
  • as effect stability

We define robustness as effect stability.

Parameter Stability and Properties of Robustness

Robustness is the share of the probability density distribution of the robustness test model that falls within the 95-percent confidence interval of the baseline model. In formulaic terms (a numerical sketch follows the list below):

\rho = \int_{\hat{\beta} - 1.96\,\hat{\sigma}}^{\hat{\beta} + 1.96\,\hat{\sigma}} f_R(b)\, db

where \hat{\beta} and \hat{\sigma} are the baseline point estimate and standard error, and f_R is the estimated sampling density of the robustness test model.

  • Robustness is left–right symmetric: identical positive and negative deviations of the robustness test compared to the baseline model give the same degree of robustness.
  • If the standard error of the robustness test is smaller than the one from the baseline model, ρ converges to 1 as long as the difference in point estimates is negligible.
  • For any given standard error of the robustness test, ρ is always and unambiguously smaller the larger the difference in point estimates.
  • Differences in point estimates have a strong influence on ρ if the standard error of the robustness test is small but a small influence if the standard errors are large.
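Under the additional assumption that the robustness test estimator is approximately normally distributed, ρ can be computed directly from the two point estimates and standard errors. A minimal sketch (the 1.96 multiplier hard-codes the 95-percent interval):

from scipy.stats import norm

def robustness_rho(b, se, b_r, se_r):
    """Share of the robustness model's density inside the baseline 95% CI."""
    lower, upper = b - 1.96 * se, b + 1.96 * se
    return norm.cdf(upper, loc=b_r, scale=se_r) - norm.cdf(lower, loc=b_r, scale=se_r)

print(robustness_rho(1.0, 0.2, 1.0, 0.1))   # ≈ 1.0: same estimate, tighter SE
print(robustness_rho(1.0, 0.2, 1.6, 0.2))   # ≈ 0.15: diverging point estimates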

Robustness Testing in Four Steps

  1. Define the subjectively optimal specification for the data-generating process at hand. Call this model the baseline model.
  2. Identify assumptions made in the specification of the baseline model which are potentially arbitrary and that could be replaced with alternative plausible assumptions.
  3. Develop models that change one of the baseline model’s assumptions at a time. These alternatives are called robustness test models.
  4. Compare the estimated effects of each robustness test model to the baseline model and compute the estimated degree of robustness.

Model Variation Tests

Model variation tests change one or sometimes more model specification assumptions and replace with an alternative assumption, such as:

  • change in set of regressors
  • change in functional form
  • change in operationalization
  • change in sample (adding or subtracting cases)

Example: Functional Form Test

The functional form test examines the baseline model’s functional form assumption against a higher-order polynomial model. The two models should be nested, so that the baseline functional form is a special case of the more flexible alternative. As an example, we analyze the ‘environmental Kuznets curve’ prediction, which suggests the existence of an inverted U-shaped relation between per capita income and emissions.

Fig: Emissions and per capita income

Note: grey-shaded area represents confidence interval of baseline model

Another example of functional form testing is given in this review of Yield Curve Models:

http://jonathankinlay.com/2018/08/modeling-the-yield-curve/

Random Permutation Tests

Random permutation tests change specification assumptions repeatedly. Usually, researchers specify a model space and randomly and repeatedly select models from this model space. Examples:

  • sensitivity tests (Leamer 1978)
  • artificial measurement error (Plümper and Neumayer 2009)
  • sample split – attribute aggregation (Traunmüller and Plümper 2017)
  • multiple imputation (King et al. 2001)

We use Monte Carlo simulation to test the sensitivity of the performance of our Quantitative Equity strategy to changes in the price generation process and also in model parameters:

http://jonathankinlay.com/2017/04/new-longshort-equity/

Structured Permutation Tests

Structured permutation tests change a model assumption within a model space in a systematic way. Changes in the assumption are based on a rule, rather than random.  Possibilities here include:

  • sensitivity tests (Levine and Renelt)
  • jackknife test
  • partial demeaning test

Example: Jackknife Robustness Test

The jackknife robustness test is a structured permutation test that systematically excludes one or more observations from the estimation at a time until all observations have been excluded once. With a ‘group-wise jackknife’ robustness test, researchers systematically drop a set of cases that group together by satisfying a certain criterion – for example, countries within a certain per capita income range or all countries on a certain continent. In the example, we analyse the effect of earthquake propensity on quake mortality for countries with democratic governments, excluding one country at a time. We display the results using per capita income on the x-axis.

Fig: Jackknife robustness test

Upper and lower bound mark the confidence interval of the baseline model.
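The mechanics are simple enough to sketch in a few lines of generic code (plain OLS with numpy; this is not the earthquake study’s code):

import numpy as np

def jackknife_betas(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Leave-one-out estimates of the coefficient in column 0 of X."""
    n = len(y)
    betas = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                         # drop observation i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        betas[i] = coef[0]
    return betas   # compare the spread of these to the baseline estimate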

Robustness Limit Tests

Robustness limit tests provide a way of analyzing structured permutation tests. These tests ask how much a model specification has to change to render the effect of interest non-robust. Some examples of robustness limit testing approaches:

  • unobserved omitted variables (Rosenbaum 1991)
  • measurement error
  • under- and overrepresentation
  • omitted variable correlation

For an example of limit testing, see this post on a review of the Lognormal Mixture Model:

http://jonathankinlay.com/2018/08/the-lognormal-mixture-variance-model/

Summary on Robustness Testing

Robustness tests have become an integral part of research methodology. Robustness tests allow researchers to study the influence of arbitrary specification assumptions on estimates. They can identify uncertainties that would otherwise escape the attention of empirical researchers. Robustness tests offer the currently most promising answer to model uncertainty.

Conditional Value at Risk Models

One of the most widely used risk measures is the Value-at-Risk (VaR), defined as the loss on a portfolio that will not be exceeded at a specified confidence level. In other words, VaR is a percentile of a loss distribution.
But despite its popularity VaR suffers from well-known limitations: its tendency to underestimate the risk in the (left) tail of the loss distribution and its failure to capture the dynamics of correlation between portfolio components or nonlinearities in the risk characteristics of the underlying assets.


One method of seeking to address these shortcomings is discussed in a previous post, Copulas in Risk Management. Another approach, known as Conditional Value at Risk (CVaR), which seeks to focus on tail risk, is the subject of this post. We look at how to estimate Conditional Value at Risk in both Gaussian and non-Gaussian frameworks, incorporating loss distributions with heavy tails, and show how to apply the concept in the context of nonlinear time series models such as GARCH.
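As a point of reference before the presentation, the empirical (historical simulation) versions of the two measures take only a few lines; the Student-t draw below is just a stand-in for a heavy-tailed loss distribution:

import numpy as np

def var_cvar(returns: np.ndarray, alpha: float = 0.99):
    """Empirical VaR and CVaR at confidence level alpha (losses = -returns)."""
    losses = -np.asarray(returns)
    var = np.quantile(losses, alpha)           # the loss percentile: VaR
    cvar = losses[losses >= var].mean()        # expected loss beyond VaR: CVaR
    return var, cvar

rets = np.random.standard_t(df=4, size=100_000) * 0.01   # fat-tailed example
print(var_cvar(rets))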


 

VaR, CVaR and Heavy Tails

 

Algorithmic Trading

MOVING FROM RESEARCH TO TRADING

I have written recently about the comparative advantages of different programming languages in the context of research and trading (see here).  My sense of it is that there is no single “ideal” programming language – the best strategy is to pick an appropriate tool for the job and there are usually several reasonable choices one could make.

If you are engaged in econometrics research, you might choose a package like RATS, Eviews, Gauss, or Prof. James Davidson’s excellent and inexpensive TSM, which I have used for many years and can recommend highly. For a latency-sensitive high frequency trading application, you will probably want to use something like C++, or possibly a 3rd party algo system like Apama or Tethys. But for algorithmic trading systems of intermediate frequency the choice appears almost unlimited.

The problem with retail trading tools like TradeStation, Multicharts, or Amibroker, is that they are designed primarily for single-asset strategies. That may be OK for futures trading, where more often than not the focus is on a single underlying, but in equities the opposite is true. Using one of these products to develop and implement a pairs trading strategy is a stretch. As for portfolio analytics – forget it.

This is where more general, high-level languages like R, Matlab or Mathematica come in: their greater power and flexibility in handling large, multivariate data sets makes it much more straightforward to develop portfolio strategies. And they can often bridge the gap between R&D and implementation quite easily: code that was used in the research stage can often be quickly re-tooled to work in a production version of the system. As for production systems, there is now a significant cottage industry of traders who use Matlab in algo trading. R has a similar following (see here).

In addition to parallelizing the code (for use with the Parallel Computing Toolbox) to speed up the research phase, you might also want to implement a hybrid system by re-coding the slower routines in C++, to create a mex file (for details see here). Matlab’s Profiler is a useful tool for identifying code bottlenecks.  In a recent piece of research in which I was evaluating over 30,000,000 cointegrated portfolios, I discovered to my surprise that the main code bottleneck was the multiple calls to Matlab’s std function, a problem easily fixed with a few lines of C++ code.  The resulting hybrid program executed at more than twice the speed – important when your run time might be several hours, or even days.

HOOKING UP THE EXECUTION PLATFORM

The main challenge for developers using generic tools like Mathematica, Matlab or R is the implementation stage of the project. Providing connectivity to brokerage/execution platforms never seemed high on the list of priorities for Wolfram or Mathworks and things are similarly hit or miss with R.

Belatedly, Mathematica now offers a link to Bloomberg via its Finance Platform. Matlab, meanwhile, offers a Trading Toolbox, which supposedly offers connectivity not only to Bloomberg, but also to Interactive Brokers and Trading Technologies, amongst other platforms. Unfortunately, the toolbox interface to IB appears to rely on outdated 1990s ActiveX technology, which is flaky at best. In tests, I was unable to make progress past the ‘not connected’ error message.

At that point I turned to Yair Altman’s IB-Matlab product. Happily, this uses IB’s Java API, which is a great deal more robust than the ActiveX platform. It’s been some time since I last used IB-Matlab and I was pleased to see that Yair has been very busy over the intervening period, building out the capabilities of the system and providing very comprehensive documentation for it. With Yair’s help, it took me no time at all to get up and running and within a day or two the system was executing orders flawlessly in IB’s TWS. The relatively few snags I ran into were almost all due to IB’s extremely terse error messaging, which often gives almost no clue as to what the issue might be. Fortunately, Yair is very generous with his time in providing support to his users and his responses to my questions were fast and detailed.

EXECUTION ALGOS

With intermediate systems trading at frequencies of, say, 5 minutes to daily, one has a choice to make as regards execution. Given that the strategy is not very latency sensitive, it is certainly conceivable to develop one’s own execution algos in Matlab. However, platforms like TWS are equipped with native algos, not only from IB, but also from other providers like Credit Suisse and Jefferies.

Actually, I have found several of IB’s own algos such as Scaletrader and Accumulate/Distribute to be very effective. Certainly IB seems very proud of them – IB CEO Thomas Peterffy has patented at least one of them. Accumulate/Distribute, for instance, is quite sophisticated, allowing the user to randomize and slice the size and interval between individual orders, use passive or aggressive order types, and pause execution on a news alert, or when the price falls below a moving average, or outside a specified range.

There is much to be said for using algos native to the execution platform rather than reinventing the wheel, providing the cost is reasonable. So, while it is perfectly feasible to build execution algos in Matlab, it typically isn’t necessary – in most cases standard algos will suffice.

There are exceptions, of course. IB doesn’t offer the kind of basket-trading capabilities that are available in advanced algo platforms like Tethys or RediPlus. In those systems, for example, you can set the level of long/short imbalance in the portfolio that you are willing to tolerate and the algo will speed up or slow down execution of trades in individual components of the basket to maintain the dollar imbalance within that tolerance. You can also manage the sector risk dynamically during execution.

Those kinds of advanced capabilities don’t come cheap, and you won’t find them at IB, or any other retail platform. If you need that kind of functionality, for example because you are trading a long/short equity portfolio within a universe of 200-300 names, your best option is probably to switch to a different execution platform. Otherwise you will need to code a custom algo in your language of choice.

For many quantitative strategies, (at least the low frequency ones) IB’s standard algos are often good enough.  The Accumulate/Distribute algo, for instance, will show a visual representation of the progress of the execution of individuals legs of a pairs trade, and it is easy enough to identify a potential imbalance and adjust the algo parameters in real time. If you are only trading pairs, or small portfolios of cointegrated securities, it probably isn’t worthwhile to develop the sophisticated logic that would be required to handle the adjustment of the execution of individual legs of a trade in a fully automated way.  A large portfolio would be a different matter, however.

MATLAB EXAMPLE

I thought it might be instructive to take a look at how you might implement the execution of a strategy in Matlab, using IB algos. In the Matlab code fragment below, the (2 x nTickers) array tradeActions contains, in the first row, the action we wish to take (1 = BUY, -1 = SELL, -2 = SELL SHORT) and, in the second row, the (absolute value of) the number of shares we wish to trade for tickers i = 1:nTickers. We break each order up into round (100-share) lots and odd lots, routing the former via IB’s Accumulate/Distribute algo and the latter as passive REL orders (note that A/D will typically randomize the timing of each sub-order, while REL orders are posted directly into the market). The Matlab function AccumulateDistribute implements the most important features of IB’s A/D algo, including random size and time slicing of the order. Orders are submitted as passive REL orders with zero offset (so they will sit on the current bid or ask) – obviously you would typically want to consider allowing some non-zero offset for less liquid securities. It is not hard to envisage how one might further enhance the algo to monitor the progress of the execution and speed up or slow down certain orders accordingly.

A couple of IB API “gotchas” to be aware of:

(i) IB requires unique and monotonically increasing orderIds for each order. One way to do this, suggested by Yair, is to use orderId = round((now-735000)*3e5); This fails when you are submitting a number of orders sequentially at high speed (say in a for loop), where the time increments are sub-second, so you need to pass the orderId back and force a minimal increment, as I have in the code below (a sketch of the idea follows item (ii)).

(ii) It is very important to specify the primary exchange of each security:  securities with identical tickers can be found trading on different exchanges.  Failing to specify the primary exchange in such a case will result in IB rejecting the order with a typically cryptic api message.
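A minimal sketch of the increment-forcing idea (shown in Python for brevity; the clock-based candidate is a stand-in for Yair’s formula, not IB-Matlab code):

import time

_last_id = 0

def next_order_id() -> int:
    """Clock-based candidate ID, clamped so successive IDs strictly increase."""
    global _last_id
    candidate = int(time.time() * 10)      # coarse clock tick (illustrative)
    _last_id = max(candidate, _last_id + 1)
    return _last_id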


Enhancing Mutual Fund Returns With Market Timing

Summary

In this article, I will apply market timing techniques to several popular mutual funds.

The market timing approach produces annual rates of return that are 3% to 7% higher, with lower risk, than an equivalent buy and hold mutual fund investment.

Investors could in some cases have earned more than double the return achieved by holding a mutual fund investment over a 10-year period.

Hedging strategies that use market timing signals are able to sidestep market corrections, volatile conditions and the ensuing equity drawdowns.

Hedged portfolios typically employ around 12% less capital than the equivalent buy and hold strategy.

Background to the Market Timing Approach

In an earlier article, I discussed how to use market timing techniques to hedge an equity portfolio correlated to the broad market. I showed how, by using signals produced by a trading system modeled on the CBOE VIX index, we can smooth out volatility in an equity portfolio consisting of holdings in the SPDR S&P 500 ETF (NYSEARCA:SPY). An investor will typically reduce their equity holdings by a modest amount, say 20%, or step out of the market altogether during periods when the VIX index is forecast to rise, returning to the market when the VIX is likely to fall. An investment strategy based on this approach would have avoided most of the 2000-03 correction, as well as much of the market turmoil of 2008-09.

A more levered version of the hedging strategy, which I termed the MT aggressive portfolio, uses the VIX index signals to go to cash during high volatility periods, and then double the original equity portfolio holdings (using standard Reg-T leverage) during benign market conditions, as signaled by the model. The MT aggressive approach would have yielded net returns almost three times greater than that of a buy and hold portfolio in the SPY ETF, over the period from 1999-2014. Even though this version of the strategy makes use of leverage, the average holding in the portfolio would have been slightly lower than in the buy and hold portfolio because, in a majority of days, the strategy would have been 100% in cash. The result is illustrated in the chart in Fig. 1, which is reproduced below.

Fig. 1: Value of $1,000 – Long-Only Vs. MT Aggressive Portfolio

Source: Yahoo Finance.

Note that this approach does not entail shorting any stock. And for investors who prefer to buy and hold, I would make the point that the MT aggressive approach would have enabled you to buy almost three times as much stock in dollar terms by mid-2014 than would be the case if you had simply owned the SPY portfolio over the entire period.


Market Timing and Mutual Funds

With that background, we turn our attention to how we can use market timing techniques to improve returns from equity mutual funds. The funds selected for analysis are the Vanguard 500 Index Admiral (MUTF:VFIAX), Fidelity Spartan 500 Index Advtg (MUTF:FUSVX) and BlackRock S&P 500 Stock K (MUTF:WFSPX). This group of popular mutual funds is a representative sample of available funds that offer broad equity market exposure, with a high degree of correlation to the S&P 500 index. In what follows, we will focus attention on the MT aggressive approach, although other more conservative hedging strategies are equally valid.

We consider performance over the 10-year period from 2005, as at least one of the funds opened late in 2004. In each case, the MT aggressive portfolio is created by exiting the current mutual fund position and going 100% to cash whenever the VIX model issues a buy signal in the VIX index. Conversely, we double our original mutual fund investment when the model issues a sell signal in the VIX index. In calculating returns, we make an allowance for trading costs of 3 cents per share on all transactions.
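The accounting for the MT aggressive portfolio reduces to a one-line exposure rule. A sketch, where signal stands for the hypothetical VIX-model output (+1 = VIX buy signal) and trading costs are omitted:

import numpy as np

def mt_aggressive_returns(fund_returns: np.ndarray, signal: np.ndarray) -> np.ndarray:
    """0% exposure when the VIX model signals rising vol, 200% otherwise."""
    exposure = np.where(signal > 0, 0.0, 2.0)    # VIX buy signal => step out to cash
    return exposure * fund_returns

# equity_curve = 1000 * np.cumprod(1 + mt_aggressive_returns(r, s))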

Returns for each of the mutual funds, as well as for the SPY ETF and the corresponding MT aggressive hedge strategies, are illustrated in the charts in Fig. 2. The broad pattern is similar in each case – we see significant outperformance of the MT aggressive portfolios relative to their ETF or mutual fund benchmarks. Furthermore, in most cases the hedge strategy tends to exhibit lower volatility, with less prolonged drawdowns during critical periods such as 2008-09.

Fig. 2 – Value of $1,000: Mutual Fund Vs. MT Aggressive Portfolio January 2005 – June 2014

Source: Yahoo Finance.

Looking at the performance numbers in more detail, we can see from the tables shown in Fig. 3 that the MT aggressive strategies outperformed their mutual fund buy and hold benchmarks by a substantial margin. In the case of VFIAX and WFSPX, the hedge strategies produce a total net return more than double that of the corresponding mutual fund. With one exception, FUSVX, annual volatility of the MT aggressive portfolio was similar to, or lower than, that of the corresponding mutual fund, confirming our reading of the charts in Fig. 2. As a consequence, the MT aggressive strategies have higher Sharpe Ratios than any of the mutual funds. The improvement in risk adjusted returns is significant – more than double in the case of two of the funds, and about 40% higher in the case of the third.

Finally, we note that the MT aggressive strategies have an average holding that is around 12% lower than the equivalent long-only fund. That’s because of the periods in which investment proceeds are held in cash.

Fig. 3: Mutual Fund and MT Aggressive Portfolio Performance January 2005 – June 2014

Mutual Fund vs. MT Aggressive Portfolio Performance

Source: Yahoo Finance.

Conclusion

The aim of market timing is to smooth out the returns by hedging, and preferably avoiding altogether periods of market turmoil. In other words, the objective is to achieve the same, or better, rates of return, with lower volatility and drawdowns. We have demonstrated that this can be done, not only when the underlying investment is in an ETF such as SPY, but also where we hold an investment in one of several popular equity mutual funds. Over a 10-year period the hedge strategies produced consistently higher returns, with lower volatility and drawdown, while putting less capital at risk than their counterpart buy and hold mutual fund investments.