Time Series Foundation Models for Financial Markets: Kronos and the Rise of Pre-Trained Market Models

The quant finance industry has spent decades building specialized models for every conceivable forecasting task: GARCH variants for volatility, ARIMA for mean reversion, Kalman filters for state estimation, and countless proprietary approaches for statistical arbitrage. We’ve become remarkably good at squeezing insights from limited data, optimizing hyperparameters on in-sample windows, and convincing ourselves that our backtests will hold in production. Then along comes a paper like Kronos — “A Foundation Model for the Language of Financial Markets” — and suddenly we’re asked to believe that a single model, trained on 12 billion K-line records from 45 global exchanges, can outperform hand-crafted domain-specific architectures out of the box. That’s a bold claim. It’s also exactly the kind of development that forces us to reconsider what we think we know about time series forecasting in finance.

The Foundation Model Paradigm Comes to Finance

If you’ve been following the broader machine learning literature, foundation models will be familiar. The term refers to large-scale pre-trained models that serve as versatile starting points for diverse downstream tasks, such as GPT and BERT for language or CLIP for vision. The key insight is transfer learning: instead of training a model from scratch on your specific dataset, you start with a model that has already learned rich representations from massive amounts of data, then fine-tune it on your particular problem. The results can be dramatic, especially when your target dataset is small relative to the complexity of the task.

Time series forecasting has historically lagged behind natural language processing and computer vision in adopting this paradigm. Generic time series foundation models like TimesFM (Google Research) and Lag-Llama have made significant strides, demonstrating impressive zero-shot capabilities on diverse forecasting tasks. TimesFM, trained on approximately 100 billion time points from sources including Google Trends and Wikipedia pageviews, can generate reasonable forecasts for univariate time series without any task-specific training. Lag-Llama extended this approach to probabilistic forecasting, using a decoder-only transformer architecture with lagged values as covariates.

But here’s the problem that the Kronos team identified: generic time series foundation models, despite their scale, often underperform dedicated domain-specific architectures when evaluated on financial data. This shouldn’t be surprising. Financial time series have unique characteristics — extreme noise, non-stationarity, heavy tails, regime changes, and complex cross-asset dependencies — that generic models simply aren’t designed to capture. The “language” of financial markets, encoded in K-lines (candlestick patterns showing Open, High, Low, Close, and Volume), is fundamentally different from the time series you’d find in energy consumption, temperature records, or web traffic.

Enter Kronos: A Foundation Model Built for Finance

Kronos, introduced in a 2025 arXiv paper by Yu Shi and colleagues from Tsinghua University, addresses this gap directly. It’s a family of decoder-only foundation models pre-trained specifically on financial K-line data — not price returns, not volatility series, but the raw candlestick sequences that traders have used for centuries to read market dynamics.

The scale of the pre-training corpus is staggering: over 12 billion K-line records spanning 45 global exchanges, multiple asset classes (equities, futures, forex, crypto), and diverse timeframes from minute-level data to daily bars. This is not a model that has seen a few thousand time series. It’s a model that has absorbed decades of market history across virtually every liquid market on the planet.

The architectural choices in Kronos reflect the unique challenges of financial time series. Language models consume text that is already discrete; K-line data is continuous and multi-dimensional, so it must first be converted into discrete tokens in a way that preserves the relationships between price, volume, and time. Kronos uses a specialized tokenizer that discretizes each K-line, treating it as a multi-dimensional unit, so that the transformer can learn patterns across both price dimensions and temporal sequences.
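Kronos uses its own specialized tokenizer, described in the paper; the sketch below is not that tokenizer. It is a deliberately simple, hypothetical scheme that only illustrates the general idea of mapping a continuous OHLCV bar to a discrete token: each bar is expressed relative to the previous bar, each of the five features is bucketed into a few equal-width bins, and the bucket indices are combined into one integer ID. The function name `tokenize_klines`, the clipping ranges, and the bin count are all assumptions.

```python
import numpy as np

def tokenize_klines(ohlcv: np.ndarray, n_bins: int = 8,
                    price_clip: float = 0.02, volume_clip: float = 1.0) -> np.ndarray:
    """Map each OHLCV bar after the first to a single integer token.

    Hypothetical illustration, not Kronos's tokenizer. Features per bar:
    log(O/C_prev), log(H/C_prev), log(L/C_prev), log(C/C_prev), log(V/V_prev).
    Each feature is clipped to +/- its range, split into n_bins equal-width
    bins, and the five bin indices are combined in base n_bins.
    """
    op, hi, lo, cl, vol = ohlcv.T
    prev_close, prev_vol = cl[:-1], vol[:-1]
    feats = np.column_stack([
        np.log(op[1:] / prev_close),
        np.log(hi[1:] / prev_close),
        np.log(lo[1:] / prev_close),
        np.log(cl[1:] / prev_close),
        np.log(vol[1:] / prev_vol),
    ])
    scale = np.array([price_clip] * 4 + [volume_clip])
    z = np.clip(feats / scale, -1.0, 1.0)             # each feature now in [-1, 1]
    edges = np.linspace(-1.0, 1.0, n_bins + 1)[1:-1]  # interior bin edges
    buckets = np.digitize(z, edges)                   # ints in [0, n_bins - 1]
    weights = n_bins ** np.arange(feats.shape[1])     # base-n_bins place values
    return buckets @ weights                          # token IDs in [0, n_bins**5)

bars = np.array([
    [100.0, 101.0,  99.5, 100.5, 1000.0],
    [100.5, 101.5, 100.0, 101.0, 1200.0],
    [101.0, 101.2,  99.0,  99.5, 1500.0],
])
tokens = tokenize_klines(bars)
print(tokens)  # one token per bar after the first
```

A fixed grid like this throws away information at the bin boundaries and cannot adapt to each market's scale, which is part of why a learned tokenizer is attractive.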

What Makes Kronos Different: Architecture and Methodology

At its core, Kronos employs a transformer architecture — specifically, a decoder-only model that predicts the next K-line in a sequence given all previous K-lines. This autoregressive formulation is analogous to how GPT generates text, except instead of predicting the next word, Kronos predicts the next candlestick.

The mathematical formulation is worth understanding in detail. Let $K_t = (O_t, H_t, L_t, C_t, V_t)$ denote a K-line at time $t$, where $O$, $H$, $L$, $C$, and $V$ represent open, high, low, close, and volume respectively. The model learns a probability distribution $P(K_{t+1:T} \mid K_{1:t})$ over future candlesticks up to a horizon $T$, conditioned on the historical sequence. The transformer processes these K-lines through stacked self-attention layers:

h^{(l)} = \text{Attention}(Q^{(l)}, K^{(l)}, V^{(l)}) + h^{(l-1)}

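where $Q^{(l)}$, $K^{(l)}$, and $V^{(l)}$ are the query, key, and value matrices, obtained by learned linear projections of the previous layer's representation $h^{(l-1)}$ (these $K$ and $V$ are unrelated to the K-line $K_t$ and volume $V_t$ above). This is a simplified view that omits the feed-forward and normalization sublayers of a full transformer block. The attention mechanism computes: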

\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V

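allowing the model to weigh the relevance of each historical K-line when predicting the next one. Here $d_k$ is the key dimension; dividing by $\sqrt{d_k}$ keeps the dot products from growing with the dimension and saturating the softmax, which would otherwise leave it with vanishingly small gradients.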

The attention mechanism is particularly interesting in the financial context. Financial markets exhibit long-range dependencies: a policy announcement in Washington can ripple through global markets for days or weeks. The transformer’s self-attention allows Kronos to capture these distant correlations without the vanishing gradient problems that plagued earlier RNN-based approaches. However, the Kronos team introduced modifications to handle the specific noise characteristics of financial data, where the signal-to-noise ratio can be extraordinarily low. These include specialized positional encodings that account for the irregular temporal spacing of financial data. As in any decoder-only model, causal attention masking also ensures that each position attends only to earlier K-lines, so no information leaks from the future into a prediction.
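The scaled dot-product attention formula above, combined with the causal mask used in decoder-only models, fits in a few lines of NumPy. This is the textbook single-head mechanism, not Kronos's implementation; the random inputs stand in for embedded K-line tokens, and the function name and shapes are illustrative assumptions. The usage check confirms causality: perturbing the last token leaves every earlier output unchanged.

```python
import numpy as np

def causal_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with a causal mask.

    x: (T, d_model) embedded tokens; w_q, w_k: (d_model, d_k); w_v: (d_model, d_v).
    Returns (T, d_v); row t only mixes value vectors from positions <= t.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = k.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                          # (T, T) scaled dot products
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # True strictly above the diagonal
    scores = np.where(future, -np.inf, scores)               # forbid attending to the future
    scores = scores - scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d_model, d_k = 6, 8, 4
x = rng.normal(size=(T, d_model))            # stand-in for 6 embedded K-line tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = causal_attention(x, w_q, w_k, w_v)
x2 = x.copy()
x2[-1] += 10.0                               # perturb only the most recent token
out2 = causal_attention(x2, w_q, w_k, w_v)
print(np.allclose(out[:-1], out2[:-1]))      # → True: earlier positions never see the future
```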

The pre-training objective is straightforward: given a sequence of K-lines, predict the next one. Formally, this is maximum likelihood estimation: the parameters are chosen to maximize the log-likelihood of each next K-line given its history:

\mathcal{L}_{\text{ML}}(\theta) = \sum_t \log P(K_{t+1} \mid K_{1:t}; \theta)

where θ represents the model parameters. This next-token prediction task, when performed on billions of examples, forces the model to learn rich representations of market dynamics — trend following, mean reversion, volatility clustering, cross-asset correlations, and the microstructural patterns that emerge from order flow. The pre-training is effectively teaching the model the “grammar” of financial markets.
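The maximum-likelihood objective can be seen in its simplest form with a toy bigram model over discrete tokens: next-token probabilities conditioned only on the previous token are estimated from transition counts, which is the maximum-likelihood solution for that model class (with `alpha=0`; a positive `alpha` adds Laplace-style smoothing). A transformer replaces the count table with a learned function of the entire history, but the quantity being optimized, the average negative log-likelihood of the next token, is the same. The token sequence and function names are made up for illustration.

```python
import numpy as np

def fit_bigram(tokens: np.ndarray, vocab_size: int, alpha: float = 0.0) -> np.ndarray:
    """Estimate P(next token | current token) from transition counts.

    alpha=0 gives the maximum-likelihood estimate (each token id must appear
    at least once as a 'current' token, or its row is 0/0); alpha>0 adds
    pseudo-counts so unseen transitions keep non-zero probability.
    """
    counts = np.full((vocab_size, vocab_size), float(alpha))
    np.add.at(counts, (tokens[:-1], tokens[1:]), 1.0)  # count each (current, next) pair
    return counts / counts.sum(axis=1, keepdims=True)

def avg_nll(tokens: np.ndarray, probs: np.ndarray) -> float:
    """Average negative log-likelihood of each next token given the previous one."""
    return float(-np.mean(np.log(probs[tokens[:-1], tokens[1:]])))

seq = np.array([0, 1, 2, 0, 1, 2, 0, 1, 1, 2])
p_mle = fit_bigram(seq, vocab_size=3)
p_uniform = np.full((3, 3), 1 / 3)
print(avg_nll(seq, p_mle), avg_nll(seq, p_uniform))  # the fitted table has the lower NLL
```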

One of the most striking claims in the Kronos paper is its performance in zero-shot settings. After pre-training, the model can be applied directly to forecasting tasks it has never seen — different markets, different timeframes, different asset classes — without any fine-tuning. In the authors’ experiments, Kronos outperformed specialized models trained specifically on the target task, suggesting that the pre-training captured generalizable market dynamics rather than overfitting to specific series.

Beyond Price Forecasting: The Full Range of Applications

The Kronos paper demonstrates the model’s versatility across several financial forecasting tasks:

Price series forecasting is the most obvious application. Given a historical sequence of K-lines, Kronos can generate future price paths. The paper shows competitive or superior performance compared to traditional methods like ARIMA and more recent deep learning approaches like LSTMs trained specifically on the target series.

Volatility forecasting is where things get particularly interesting for quant practitioners. Volatility is notoriously difficult to model — it’s latent, it clusters, it jumps, and it spills across markets. Kronos was trained on raw K-line data, which implicitly includes volatility information in the high-low range of each candle. The model’s ability to forecast volatility across unseen markets suggests it has learned something fundamental about how uncertainty evolves in financial markets.
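The point that each candle's high-low range carries volatility information can be made concrete with the classical Parkinson (1980) range-based estimator, which sets the per-bar variance to $\mathbb{E}[(\ln(H/L))^2] / (4 \ln 2)$. This is a textbook estimator, not something taken from the Kronos paper; the function name is an assumption and the bars in the example are simulated.

```python
import numpy as np

def parkinson_vol(high: np.ndarray, low: np.ndarray, periods_per_year: int = 252) -> float:
    """Annualised Parkinson (range-based) volatility from per-bar highs and lows.

    Assumes driftless geometric Brownian motion within each bar, so the
    per-bar variance is mean(ln(H/L)^2) / (4 ln 2).
    """
    log_range_sq = np.log(high / low) ** 2
    var_per_bar = log_range_sq.mean() / (4.0 * np.log(2.0))
    return float(np.sqrt(var_per_bar * periods_per_year))

# Synthetic check: 500 days of 390 one-minute GBM steps with 1% daily volatility
rng = np.random.default_rng(42)
days, steps, daily_sigma = 500, 390, 0.01
increments = rng.normal(0.0, daily_sigma / np.sqrt(steps), size=(days, steps))
log_paths = np.hstack([np.zeros((days, 1)), np.cumsum(increments, axis=1)])  # include the open
prices = 100.0 * np.exp(log_paths)
est = parkinson_vol(prices.max(axis=1), prices.min(axis=1))
true = daily_sigma * np.sqrt(252)
print(f"Parkinson estimate: {est:.4f}, true annualised vol: {true:.4f}")
```

Because the simulated range is only observed at one-minute steps, the estimate tends to come out slightly below the true value; with real data, drift and overnight gaps add further biases.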

Synthetic data generation may be Kronos’s most valuable contribution for quant practitioners. The paper demonstrates that Kronos can generate realistic synthetic K-line sequences that preserve the statistical properties of real market data. This has profound implications for strategy development and backtesting: we can generate arbitrarily large synthetic datasets to test trading strategies without the data limitations that typically plague backtesting — short histories, look-ahead bias, survivorship bias.
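To make synthetic data generation concrete without access to Kronos, here is a deliberately simple classical baseline, not anything from the paper: simulating daily returns from a GARCH(1,1) process with Gaussian shocks. It reproduces volatility clustering and fat tails but none of the richer structure a learned generator might capture, which is the bar a model like Kronos would need to clear. The parameter values and function name are arbitrary assumptions, not estimates for any market.

```python
import numpy as np

def simulate_garch(n: int, omega: float, alpha: float, beta: float, seed: int = 0):
    """Simulate n returns from GARCH(1,1) with Gaussian shocks.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    rng = np.random.default_rng(seed)
    r = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1 - alpha - beta)  # start at the unconditional variance
    r[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return r, sigma2

# Ten "years" of synthetic daily returns (in %), illustrative parameters only
returns, cond_var = simulate_garch(2520, omega=0.02, alpha=0.08, beta=0.90)
excess_kurtosis = np.mean((returns - returns.mean()) ** 4) / returns.var() ** 2 - 3
print(f"sample std: {returns.std():.3f}%, excess kurtosis: {excess_kurtosis:.2f}")
```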

Cross-asset dependencies may also be captured in the pre-training. Because Kronos was trained on data from 45 exchanges spanning multiple asset classes, it has been exposed to how many different markets behave and may have learned patterns that are common across them (learning genuinely causal relationships between markets would be a much stronger claim). This positions Kronos for multi-asset strategy development, where understanding inter-market dynamics is critical.

Rather than Kronos itself, we can demonstrate the foundation model approach using Amazon’s Chronos, a comparable open-source time series foundation model. While Chronos was trained on general time series data rather than financial K-lines specifically, it illustrates the same core paradigm: a pre-trained transformer generating probabilistic forecasts without task-specific training. Note that Chronos is univariate, so it sees only the closing-price series rather than full OHLCV candles. Here’s a practical demo on real financial data:

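import yfinance as yf
import numpy as np
import torch
import matplotlib.pyplot as plt
from chronos import ChronosPipeline

# Load the pre-trained model (GPU if available, otherwise CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-large", device_map=device)

# Fetch ~6 months of daily E-mini S&P 500 futures prices
data = yf.download("ES=F", period="6mo", progress=False)
close = data["Close"].squeeze().dropna()  # recent yfinance versions return a one-column DataFrame
context = torch.tensor(close.values[-60:], dtype=torch.float32)  # last 60 trading days as context

# Generate sample paths: tensor of shape [num_series, num_samples, prediction_length]
forecast = pipeline.predict(context, prediction_length=20)
samples = forecast[0].float().cpu().numpy()  # drop the series dimension -> [num_samples, 20]
low, median, high = np.quantile(samples, [0.1, 0.5, 0.9], axis=0)

# Plot history, median forecast, and an 80% prediction interval
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(range(60), context.numpy(), label="Historical", color="steelblue")
ax.plot(range(60, 80), median, label="Median forecast", color="orange")
ax.fill_between(range(60, 80), low, high, color="orange", alpha=0.3, label="80% interval")
ax.axvline(x=59, color="gray", linestyle="--", alpha=0.5)
ax.set_title("Chronos Forecast: ES Futures (20-day)")
ax.legend()
plt.tight_layout()
plt.show()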

SPY Daily Returns — Volatility Clustering in Action

Zero-Shot vs. Fine-Tuned Performance: What the Evidence Shows

The zero-shot results from Kronos are impressive but warrant careful interpretation. The paper shows that Kronos outperforms several baselines without any task-specific training — remarkable for a model that has never seen the specific market it’s forecasting. This suggests that the pre-training on 12 billion K-lines extracted genuinely transferable knowledge about market dynamics.

However, fine-tuning consistently improves performance. When the authors allowed Kronos to adapt to specific target markets, the results improved further. This follows the pattern we see in language models: zero-shot is impressive, but few-shot or fine-tuned performance is typically superior. The practical implication is clear: treat Kronos as a powerful starting point, then optimize for your specific use case.

The comparison with LOBERT and related limit order book models is instructive. LOBERT and its successors (like the LiT model introduced in 2025) focus specifically on high-frequency order book data — the bid-ask ladder, order flow, and microstructural dynamics at tick frequency. These are fundamentally different from K-line models. Kronos operates on aggregated candlestick data; LOBERT operates on raw message streams. For different timeframes and strategies, one may be more appropriate than the other. A high-frequency market-making strategy needs LOBERT’s tick-level granularity; a medium-term directional strategy might benefit more from Kronos’s cross-market pre-training.

Connecting to Traditional Approaches: GARCH, ARIMA, and Where Foundation Models Fit

Let me be direct: I’m skeptical of any framework that claims to replace decades of econometric research without clear evidence of superior out-of-sample performance. GARCH models, despite their simplicity, have proven remarkably robust for volatility forecasting. ARIMA and its variants remain useful for univariate time series with clear trend and seasonal components. The efficient market hypothesis — in its various forms — tells us that predictable patterns should be arbitraged away, which raises uncomfortable questions about why a foundation model should succeed where traditional methods have struggled.

That said, there’s a nuanced way to think about this. Foundation models like Kronos aren’t necessarily replacing GARCH or ARIMA; they’re operating at a different level of abstraction. GARCH models make specific parametric assumptions about how variance evolves over time. Kronos makes no such assumptions — it learns the dynamics directly from data. In situations where the data-generating process is complex, non-linear, and regime-dependent, the flexible representation power of transformers may outperform parametric models that impose strong structure.

Consider volatility forecasting, traditionally the domain of GARCH. A GARCH(1,1) model assumes that today’s conditional variance is a linear function of a constant, yesterday’s conditional variance, and yesterday’s squared return shock. This is obviously a simplification. Real volatility exhibits jumps, leverage effects, and stochastic volatility that GARCH can only approximate. Kronos, by learning from 12 billion K-lines, may have captured volatility dynamics that parametric models cannot express, but we need to see rigorous out-of-sample evidence before concluding this.
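In standard notation (assumed here, with mean-zero daily returns $r_t$), the GARCH(1,1) recursion and its long-run variance $\bar{\sigma}^2$ are shown below, with $\omega > 0$, $\alpha, \beta \ge 0$, and $\alpha + \beta < 1$ for covariance stationarity:

\sigma_t^2 = \omega + \alpha\, r_{t-1}^2 + \beta\, \sigma_{t-1}^2, \qquad \bar{\sigma}^2 = \frac{\omega}{1 - \alpha - \beta}

Iterating the conditional expectation forward gives the multi-step forecast, which reverts geometrically toward $\bar{\sigma}^2$ at rate $\alpha + \beta$; that single persistence number is much of what makes the model transparent:

\mathbb{E}_t\left[\sigma_{t+h}^2\right] = \bar{\sigma}^2 + (\alpha + \beta)^{h-1}\left(\sigma_{t+1}^2 - \bar{\sigma}^2\right), \qquad h \ge 1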

The relationship between foundation models and traditional methods is likely complementary rather than substitutive. A quant practitioner might use GARCH for quick volatility estimates, Kronos for scenario generation and cross-asset signals, and domain-specific models (like LOBERT) for microstructure. The key is understanding each tool’s strengths and limitations.

Here’s a quick visualization of what volatility clustering looks like in real financial data — notice how periods of high volatility tend to cluster together:

import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt

# Fetch SPY daily data; squeeze() turns a one-column DataFrame (recent yfinance) into a Series
data = yf.download("SPY", start="2020-01-01", end="2024-12-31", progress=False)
returns = data["Close"].squeeze().pct_change().dropna() * 100  # daily returns in %
sd = returns.std()

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(returns.index, returns.values, color='steelblue', linewidth=0.8)
ax.axhline(y=sd, color='red', linestyle='--', alpha=0.5, label='±1 Std Dev')
ax.axhline(y=-sd, color='red', linestyle='--', alpha=0.5)
ax.set_title("Daily Returns (%): Volatility Clustering Visible", fontsize=12)
ax.set_ylabel("Return %")
ax.legend()
plt.tight_layout()
plt.show()

Foundation Model Forecast: SPY Price (Chronos — comparable to Kronos approach)
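The chart shows clustering by eye; a standard numerical check is that raw daily returns are typically close to uncorrelated while their absolute (or squared) values show positive, slowly decaying autocorrelation. Below is a plain NumPy sample autocorrelation function; the helper name is an assumption, and the self-contained example uses a toy series. Applying it to the SPY returns from the snippet above requires network access, so those calls are shown as comments.

```python
import numpy as np

def sample_acf(x, max_lag: int) -> np.ndarray:
    """Sample autocorrelations at lags 1..max_lag (standard biased estimator)."""
    x = np.asarray(x, dtype=float).ravel()
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

# Toy check: a strictly alternating series has lag-1 autocorrelation near -1
alt = np.array([1.0, -1.0] * 50)
print(sample_acf(alt, 2))

# With the SPY `returns` from the snippet above (requires network access):
# sample_acf(returns, 10)          # raw returns: typically near zero
# sample_acf(np.abs(returns), 10)  # absolute returns: typically positive, slowly decaying
```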

Practical Implications for Quant Practitioners

For those of us building trading systems, what does this actually mean? Several practical considerations emerge:

Data efficiency is perhaps the biggest win. Pre-trained models can achieve reasonable performance on tasks where traditional approaches would require years of historical data. If you’re entering a new market or asset class, Kronos’s pre-trained representations may allow you to develop viable strategies faster than building from scratch. Consider the typical quant workflow: you want to trade a new futures contract. Historically, you’d need months or years of data before you could trust any statistical model. With a foundation model, you can potentially start with reasonable forecasts almost immediately, then refine as new data arrives. This changes the economics of market entry.

Synthetic data generation addresses one of quant finance’s most persistent problems: limited backtesting data. Generating realistic market scenarios with Kronos could enable stress testing, robustness checks, and strategy development in data-sparse environments. Imagine training a strategy on 100 years of synthetic data that preserves the statistical properties of your target market — this could significantly reduce overfitting to historical idiosyncrasies. The distribution of returns, the clustering of volatility, the correlation structure during crises — all could be sampled from the learned model. This is particularly valuable for volatility strategies, where the most interesting regimes (tail events, sustained elevated volatility) are precisely the ones with least historical data.

Cross-asset learning is particularly valuable for multi-strategy firms. Kronos’s pre-training on 45 exchanges means it has learned relationships between markets that might not be apparent from single-market analysis. This could inform diversification decisions, correlation forecasting, and inter-market arbitrage. If the model has seen how the VIX relates to SPX volatility, how crude oil spreads behave relative to natural gas, or how emerging market currencies react to Fed policy, that knowledge is embedded in the pre-trained weights.

Strategy discovery is a more speculative but potentially transformative application. Foundation models can identify patterns that human intuition misses. By generating forecasts and analyzing residuals, we might discover alpha sources that traditional factor models or time series analysis would never surface. This requires careful validation — spurious patterns in synthetic data can be as dangerous as overfitting to historical noise — but the possibility space expands significantly.

Integration challenges should not be underestimated. Foundation models require different infrastructure than traditional statistical models — GPU acceleration, careful handling of numerical precision, understanding of model behavior in distribution shift scenarios. The operational overhead is non-trivial. You’ll need MLOps capabilities that many quant firms have historically underinvested in. Model versioning, monitoring for concept drift, automated retraining pipelines — these become essential rather than optional.

There’s also a workflow consideration. Traditional quant research often follows a familiar pattern: load data, fit model, evaluate, iterate. Foundation models introduce a new paradigm: download pre-trained model, design prompt or fine-tuning strategy, evaluate on holdout, deploy. The skills required are different. Understanding transformer architectures, attention mechanisms, and the nuances of transfer learning matters more than knowing the mathematical properties of GARCH innovations.

For teams considering adoption, I’d suggest a staged approach. Start with the zero-shot capabilities to establish baselines. Then explore fine-tuning on your specific datasets. Then investigate synthetic data generation for robustness testing. Each stage builds organizational capability while managing risk. Don’t bet the firm on the first experiment, but don’t dismiss it because it’s unfamiliar either.

Limitations and Open Questions

I want to be clear-eyed about what we don’t yet know. The Kronos paper, while impressive, represents early research. Several critical questions remain:

Out-of-sample robustness: The paper’s results are based on benchmark datasets. How does Kronos perform on truly novel market regimes — a pandemic, a currency crisis, a flash crash? Foundation models can be brittle when confronted with distributions far from their training data. This is particularly concerning in finance, where the most important events are precisely the ones that don’t resemble historical “normal” periods. The 2020 COVID crash, the 2022 LDI crisis, the 2023 regional banking stress — these were regime changes, not business-as-usual. We need evidence that Kronos handles these appropriately.

Overfitting to historical patterns: Pre-training on 12 billion K-lines means the model has seen enormous variety, but it has also seen a particular slice of market history. Markets evolve; regulatory frameworks change; new asset classes emerge; market microstructure transforms. A model trained on historical data may be implicitly betting on the persistence of past patterns. Even if the model has absorbed the footprints of strategies that were profitable in the historical data, there is no guarantee that those strategies still exist, let alone that they will work going forward.

Interpretability: GARCH models give us interpretable parameters: alpha measures sensitivity to new shocks, and alpha + beta measures the persistence of volatility. Kronos is a black box. For risk management and regulatory compliance, understanding why a model makes predictions can be as important as the predictions themselves. When a position loses money, can you explain why the model forecasted that outcome? Can you stress-test the model by understanding its failure modes? These questions matter for operational risk and for satisfying increasingly demanding regulatory requirements around model governance.

Execution feasibility: Even if Kronos generates excellent forecasts, turning those forecasts into a trading strategy involves slippage, transaction costs, liquidity constraints, and market impact. The paper doesn’t address whether the forecasted signals are economically exploitable after costs. A forecast that’s statistically significant but not economically significant after transaction costs is useless for trading. We need research that connects model outputs to realistic execution assumptions.
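The gap between statistical and economic significance can be made concrete with a simple net-of-cost calculation: every change in position incurs a proportional cost, so a weak signal that flips often can have positive gross PnL and negative net PnL. This is a generic sketch, not a result for Kronos or any real market; the function name, the 0.02 signal correlation, the 5 bp per-bar volatility, and the 0.5 bp cost are all assumptions.

```python
import numpy as np

def net_pnl(positions: np.ndarray, returns: np.ndarray, cost_per_unit: float) -> tuple[float, float]:
    """Gross and net PnL as a sum of per-bar returns (not compounded).

    positions[t] is held over returns[t]; moving from positions[t-1] to
    positions[t] costs cost_per_unit * |change|, starting from a flat book.
    """
    gross = float(np.sum(positions * returns))
    turnover = np.abs(np.diff(positions, prepend=0.0))
    costs = float(cost_per_unit * turnover.sum())
    return gross, gross - costs

# Hypothetical weak signal: correlation 0.02 with the next bar's return (~5 bp per-bar vol)
rng = np.random.default_rng(7)
n, sigma, rho = 100_000, 0.0005, 0.02
z = rng.standard_normal(n)
rets = sigma * z
signal = rho * z + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # assumed known before each bar
pos = np.sign(signal)                                            # long or short one unit
gross, net = net_pnl(pos, rets, cost_per_unit=0.5e-4)            # 0.5 bp per unit traded (assumed)
print(f"gross: {gross:.3f}, net: {net:.3f}")
```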

Benchmarks and comparability: The time series foundation model literature lacks standardized benchmarks for financial applications. Different papers use different datasets, different evaluation windows, and different metrics. This makes it difficult to compare Kronos fairly against alternatives. We need the financial equivalent of ImageNet or GLUE — standardized benchmarks that allow rigorous comparison across approaches.

Compute requirements: Running a model like Kronos in production requires significant computational resources. Not every quant firm has GPU clusters sitting idle. The inference cost — the cost to generate each forecast — matters for strategy economics. If each forecast costs $0.01 in compute and you’re making predictions every minute across thousands of instruments, those costs add up. We need to understand the cost-benefit tradeoff.
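The arithmetic behind "those costs add up" is worth writing down. Assuming the low end of "thousands of instruments" (1,000), one forecast per minute of a 390-minute US equity session, and 252 trading days a year, a $0.01 forecast cost compounds as follows; the helper name is hypothetical.

```python
def annual_inference_cost(cost_per_forecast: float, instruments: int,
                          forecasts_per_day: int, trading_days: int = 252) -> float:
    """Yearly inference spend for a fixed forecasting cadence."""
    return cost_per_forecast * instruments * forecasts_per_day * trading_days

# $0.01 per forecast, 1,000 instruments, one forecast per minute of a 390-minute session
yearly = annual_inference_cost(0.01, 1_000, 390)
print(f"${yearly:,.0f} per year (${yearly / 252:,.0f} per trading day)")
# → $982,800 per year ($3,900 per trading day)
```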

Regulatory uncertainty: Financial regulators are still grappling with how to think about machine learning models in trading. Foundation models add another layer of complexity. Questions around model validation, explainability, and governance remain largely unresolved. Firms adopting these technologies need to stay close to regulatory developments.

Finally, there’s a philosophical concern worth mentioning. Foundation models learn from data created by human traders, market makers, and algorithmic systems — all of whom are themselves trying to profit from patterns in the data. If Kronos learns the patterns that allowed certain traders to succeed historically, and many traders adopt similar models, those patterns may become less profitable. This is the standard arms race argument applied to a new context. Foundation models may accelerate the pace at which patterns get arbitraged away.

The Road Ahead: NeurIPS 2025 and Beyond

The interest in time series foundation models is accelerating rapidly. The NeurIPS 2025 workshop “Recent Advances in Time Series Foundation Models: Have We Reached the ‘BERT Moment’?” (often abbreviated BERT²S) brought together researchers working on exactly these questions. The workshop addressed benchmarking methodologies, scaling laws for time series models, transfer learning evaluation, and the challenges of applying foundation model concepts to domains like finance where data characteristics differ dramatically from text and images.

The academic momentum is clear. Google continues to develop TimesFM. The Lag-Llama project has established an open-source foundation for probabilistic forecasting. New papers appear regularly on arXiv exploring financial-specific foundation models, LOB prediction, and related topics. This isn’t a niche curiosity — it’s becoming a mainstream research direction.

For quant practitioners, the message is equally clear: pay attention. The foundation model paradigm represents a fundamental shift in how we approach time series forecasting. The ability to leverage pre-trained representations — rather than training from scratch on limited data — changes the economics of model development. It may also change which problems are tractable.

Conclusion

Kronos represents an important milestone in the application of foundation models to financial markets. Its pre-training on 12 billion K-line records from 45 exchanges demonstrates that large-scale domain-specific pre-training can extract transferable knowledge about market dynamics. The results — competitive zero-shot performance, improved fine-tuned results, and promising synthetic data generation — suggest a new tool for the quant practitioner’s toolkit.

But let’s not overheat. This is 2025, not the year AI solves markets. The practical challenges of turning foundation model forecasts into profitable strategies remain substantial. GARCH and ARIMA aren’t obsolete; they’re complementary. The key is understanding when each approach adds value. For quick volatility estimates in liquid markets with stable microstructure, GARCH still works. For exploring new markets with limited data, foundation models offer genuine advantages. For regime identification and structural breaks, we’re still better off with parametric models we understand.

What excites me most is the synthetic data generation capability. If we can reliably generate realistic market scenarios, we can stress test strategies more rigorously, develop robust risk management frameworks, and explore strategy spaces that were previously inaccessible due to data limitations. That’s genuinely new. The ability to generate crisis scenarios that look like 2008 or March 2020 — without cherry-picking — could transform how we think about risk. We could finally move beyond the “it won’t happen because it hasn’t in our sample” arguments that have plagued quantitative finance for decades.

But even here, caution is warranted. Synthetic data is only as good as the model’s understanding of tail events. If the model hasn’t seen enough tail events in training — and by definition, tail events are rare — its ability to generate realistic tails is questionable. The saying “garbage in, garbage out” applies to synthetic data generation as much as anywhere else.

The broader foundation model approach to time series — whether through Kronos, TimesFM, Lag-Llama, or the models yet to come — is worth serious attention. These are not magic bullets, but they represent a meaningful evolution in our methodological toolkit. For quants willing to learn new approaches while maintaining skepticism about hype, the next few years offer real opportunity. The question isn’t whether foundation models will matter for quant finance; it’s how quickly they can be integrated into production workflows in a way that’s robust, interpretable, and economically valuable.

I’m keeping an open mind while holding firm on skepticism. That’s served me well in 25 years of quantitative finance. It will serve us well here too.


Author’s Assessment: Bull Case vs. Bear Case

The Bull Case: Kronos demonstrates that large-scale domain-specific pre-training on financial data extracts genuinely transferable knowledge. The zero-shot performance on unseen markets is real — a model that’s never seen a particular futures contract can still generate reasonable volatility forecasts. For new market entry, cross-asset correlation modelling, and synthetic scenario generation, this is genuinely valuable. The synthetic data capability alone could transform backtesting robustness, letting us stress-test strategies against crisis scenarios that occur once every 20 years without waiting for history to repeat.

The Bear Case: The paper benchmarks on MSE and CRPS, which are statistical metrics, not economic ones. A model that improves next-candle MSE by 5% may have an information coefficient of 0.01: statistically detectable at 12 billion observations but worthless after bid-ask spreads. More fundamentally, training on 12 billion samples of approximately IID noise teaches the model the shape of noise, not exploitable alpha. The pre-training captures volatility clustering (a risk characteristic), not conditional mean predictability (an alpha characteristic). GARCH does the former with three parameters (ω, α, β) and full transparency; Kronos does it with millions of parameters and a black box. Show me a backtest with realistic execution costs before calling this a trading signal.
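The bear case's arithmetic can be checked with two standard approximations: for N independent forecast/outcome pairs, the t-statistic of an information coefficient is roughly IC·√N, and if signal and return are jointly normal, the expected return given a standardized signal z is IC·σ·z, where σ is the per-period return volatility. The 5 bp one-minute volatility below is an illustrative assumption, not a figure from the Kronos paper, and treating 12 billion bars as independent overstates the effective sample size.

```python
import math

def ic_t_stat(ic: float, n_obs: int) -> float:
    """Approximate t-statistic of a correlation ic estimated from n_obs independent pairs."""
    return ic * math.sqrt(n_obs)

def expected_edge(ic: float, sigma: float, z: float = 1.0) -> float:
    """E[return | standardised signal z] for jointly normal signal and return."""
    return ic * sigma * z

# Assumed figures: IC of 0.01 on 12 billion bars, one-minute return volatility of ~5 bp
print(f"t-stat: {ic_t_stat(0.01, 12_000_000_000):,.0f}")
edge_bp = expected_edge(0.01, 5.0)  # sigma given in basis points, so the edge is in bp too
print(f"expected edge for a 1-sigma signal: {edge_bp:.2f} bp per trade")
```

A t-statistic in the thousands alongside an expected edge of a few hundredths of a basis point is the statistical-versus-economic gap in miniature: compare the edge with your own round-trip cost.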

The Bottom Line: Kronos is a promising research direction, not a production alpha engine. The most defensible near-term value is in synthetic data augmentation for stress testing — a workflow enhancement, not a signal source. Build institutional familiarity, run controlled pilots, but don’t deploy for live trading until someone demonstrates economically exploitable returns after costs. The foundation model paradigm is directionally correct; the empirical evidence for direct alpha generation remains unproven.

Hands-On: Chronos vs GARCH

Let’s test the sidebar’s claim directly. We’ll fit a GARCH(1,1) to the same futures data and compare its volatility forecast to what Chronos produces:

import yfinance as yf
import numpy as np
import torch
import matplotlib.pyplot as plt
from arch import arch_model
from chronos import ChronosPipeline

# Fetch one year of daily ES futures prices and compute % returns
data = yf.download("ES=F", period="1y", progress=False)
returns = data["Close"].squeeze().pct_change().dropna() * 100

# Split: first 80% for fitting, last 20% for out-of-sample evaluation
split = int(len(returns) * 0.8)
train, test = returns.iloc[:split], returns.iloc[split:]

# GARCH(1,1): multi-step variance forecasts (%^2) from the end of the training window
garch = arch_model(train, vol="Garch", p=1, q=1, dist="normal")
garch_fit = garch.fit(disp="off")
garch_var = garch_fit.forecast(horizon=len(test)).variance.iloc[-1].values

# Chronos: sample return paths, then take the cross-sample variance at each step
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-large", device_map=device)
context = torch.tensor(train.values, dtype=torch.float32)
samples = pipeline.predict(context, prediction_length=len(test), num_samples=100)[0].float().cpu().numpy()
chronos_var = samples.var(axis=0)  # variance (%^2), same units as the GARCH forecast

# MSE against squared returns, a noisy proxy for realized variance
realized = test.values ** 2
garch_mse = np.mean((garch_var - realized) ** 2)
chronos_mse = np.mean((chronos_var - realized) ** 2)

print(f"GARCH MSE:   {garch_mse:.4f}")
print(f"Chronos MSE: {chronos_mse:.4f}")

# Plot
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(test.index, realized, label="Squared returns (realized proxy)", color="black", alpha=0.7)
ax.plot(test.index, garch_var, label="GARCH", color="blue")
ax.plot(test.index, chronos_var, label="Chronos", color="orange")
ax.set_title("Variance Forecast (%²): GARCH vs Foundation Model")
ax.set_ylabel("Variance (%²)")
ax.legend()
plt.tight_layout()
plt.show()

Volatility Forecast Comparison: GARCH(1,1) vs Chronos Foundation Model

The bear case isn’t wrong: GARCH does volatility with three interpretable parameters and transparent assumptions, while the foundation model uses hundreds of millions. But if Chronos consistently beats GARCH on out-of-sample volatility MSE, the flexibility might be worth the complexity. A single 80/20 split is only one draw, so try running this yourself over several windows; the answer depends on the regime.

Intraday Stock Index Forecasting

In a previous post I discussed modelling stock price processes as Geometric Brownian Motion processes:

Understanding Stock Price Range Forecasts

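To recap briefly, we assume a process of the form:

$$S_t = S_0 \exp\left[\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t\right]$$

where $S_0$ is the initial stock price at time $t = 0$, $\mu$ is the drift, $\sigma$ is the volatility and $W_t$ is a standard Brownian motion.

The mean of such a process is:

$$E[S_t] = S_0 e^{\mu t}$$

and standard deviation:

$$\mathrm{SD}[S_t] = S_0 e^{\mu t} \sqrt{e^{\sigma^2 t} - 1}$$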

In the post I showed how to estimate such a process with daily stock prices, using these to provide a forecast range of prices over a one-month horizon. This is potentially useful, for example, in choosing which strikes to select in an option hedge.

Of course, there is nothing to prevent you from using the same technique over different timescales. Here I use the MATH-TWS package to connect Mathematica to the IB TWS platform via the C++ API, to extract intraday prices for the S&P 500 Index at 1-minute intervals. These are used to estimate a short-term GBM process, which provides forecasts of the mean and variance of the index at the 4 PM close.

We capture the data using:

then create a time series of the intraday prices and plot them:

If we want something a little fancier we can create a trading chart, including technical indicators of our choice, for instance:

The charts can be updated in real time from IB, using MATH-TWS.

From there we estimate a GBM process using 1-minute close prices:

and then simulate a number of price paths towards the 4 PM close (the mean price path is shown in black):

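This indicates that the expected value of the SPX index at the close will be around 4450, which we could estimate directly from:

$$E[S_T] = S_0 e^{\mu T}$$

where $S_0$ is the current index level, $T$ is the time remaining to the 4 PM close and $\mu$ is the estimated drift of the GBM process.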

Similarly we can look at the projected terminal distribution of the index at 4pm to get a sense of the likely range of closing prices, which may assist a decision to open or close certain option (hedge) positions:
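The estimation and simulation above were done in Mathematica via MATH-TWS; what follows is a minimal Python sketch of the same idea, not the original code. It assumes a plain array of 1-minute closes (the `prices` series below is synthetic, made up for illustration), measures drift and volatility per minute, and uses the GBM property that the mean of the log returns equals μ − σ²/2. The function names are hypothetical.

```python
import numpy as np

def fit_gbm(prices):
    """Estimate per-minute GBM drift mu and volatility sigma from 1-minute closes."""
    log_ret = np.diff(np.log(np.asarray(prices, dtype=float)))
    sigma = log_ret.std(ddof=1)
    mu = log_ret.mean() + 0.5 * sigma ** 2  # E[log return] = mu - sigma^2 / 2
    return mu, sigma

def simulate_to_close(s0, mu, sigma, minutes_left, n_paths=10_000, seed=0):
    """Simulate GBM price paths minute by minute; shape (n_paths, minutes_left + 1)."""
    rng = np.random.default_rng(seed)
    steps = (mu - 0.5 * sigma ** 2) + sigma * rng.standard_normal((n_paths, minutes_left))
    log_paths = np.hstack([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)])
    return s0 * np.exp(log_paths)

# Synthetic stand-in for a morning of 1-minute SPX closes (hypothetical data)
rng = np.random.default_rng(1)
prices = 4400 * np.exp(np.cumsum(1e-5 + 5e-4 * rng.standard_normal(240)))
mu, sigma = fit_gbm(prices)
paths = simulate_to_close(prices[-1], mu, sigma, minutes_left=120)
terminal = paths[:, -1]
print("closed-form E[S_T]:", prices[-1] * np.exp(mu * 120))
print("simulated mean:    ", terminal.mean())
print("5%-95% range:      ", np.percentile(terminal, [5, 95]))
```

The simulated mean should sit close to the closed-form expectation; the percentile range is the kind of terminal-distribution summary that can inform an option hedge.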

Of course, all this is predicated on the underlying process continuing on its current trajectory, with drift and standard deviation close to those seen in the process in the preceding time interval. But trends change, as do volatilities, which means that our forecasts may be inaccurate. Furthermore, the drift in asset processes tends to be dominated by volatility, especially at short time horizons.

So the best way to think of this is as a conditional expectation, i.e. “If the stock price continues on its current trajectory, then our expectation is that the closing price will be in the following range…”.

For more on MATH-TWS see:

MATH-TWS: Connecting Wolfram Mathematica to IB TWS

Measuring Toxic Flow for Trading & Risk Management

A common theme of microstructure modeling is that trade flow is often predictive of market direction.  One concept in particular that has gained traction is flow toxicity, i.e. flow where resting orders tend to be filled more quickly than expected, while aggressive orders rarely get filled at all, due to the participation of informed traders trading against uninformed traders.  The fundamental insight from microstructure research is that the order arrival process is informative of subsequent price moves in general and toxic flow in particular.  This in turn has led researchers to try to measure the probability of informed trading (PIN).  One recent attempt to model flow toxicity, the Volume-Synchronized Probability of Informed Trading (VPIN) metric, seeks to estimate PIN based on volume imbalance and trade intensity.  A major advantage of this approach is that it does not require the estimation of unobservable parameters; additionally, updating VPIN in trade time rather than clock time improves its predictive power.  VPIN has potential applications both in high-frequency trading strategies and in risk management, since highly toxic flow is likely to lead to the withdrawal of liquidity providers, setting up the conditions for a “flash-crash” type of market breakdown.

The procedure for estimating VPIN is as follows.  We begin by grouping sequential trades into equal-volume buckets of size $V$.  If the last trade needed to complete a bucket was for a size greater than needed, the excess size is given to the next bucket.  Then we classify trades within each bucket into two volume groups: buys ($V_t^B$) and sells ($V_t^S$), with $V = V_t^B + V_t^S$.

The Volume-Synchronized Probability of Informed Trading is then derived as:

$$\mathrm{VPIN} = \frac{\sum_{t=1}^{n} \left| V_t^S - V_t^B \right|}{nV}$$

Typically one might choose to estimate VPIN using a moving average over n buckets, with n being in the range of 50 to 100.

Another related statistic of interest is the single-period signed VPIN. This takes a value between -1 and +1, depending on the proportion of buying to selling during a single period t.
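A minimal Python sketch of the bucketing and VPIN calculation described above follows. Two assumptions are worth flagging: trades arrive already signed as buyer- or seller-initiated (the trade-classification step is not shown, and in practice it is a modelling choice in its own right), and the single-period signed VPIN is taken to be (V_t^B − V_t^S)/V, which gives the −1 to +1 range described above. The function names are hypothetical.

```python
import numpy as np

def volume_buckets(sizes, sides, bucket_size):
    """Group a trade stream into equal-volume buckets of size V.

    sizes: trade sizes; sides: +1 buyer-initiated, -1 seller-initiated.
    When a trade overfills a bucket, the excess is carried into the next bucket.
    Returns an (n_buckets, 2) array of [buy_volume, sell_volume]; a final
    incomplete bucket is discarded.
    """
    buckets, buy, sell, filled = [], 0.0, 0.0, 0.0
    for size, side in zip(sizes, sides):
        remaining = float(size)
        while remaining > 0:
            take = min(remaining, bucket_size - filled)
            if side > 0:
                buy += take
            else:
                sell += take
            filled += take
            remaining -= take
            if filled >= bucket_size:
                buckets.append((buy, sell))
                buy, sell, filled = 0.0, 0.0, 0.0
    return np.array(buckets, dtype=float).reshape(-1, 2)

def vpin(buckets, n=50):
    """Moving average of |V_S - V_B| / V over the last n buckets."""
    buy, sell = buckets[:, 0], buckets[:, 1]
    imbalance = np.abs(sell - buy) / (buy + sell)
    return np.convolve(imbalance, np.ones(n) / n, mode="valid")

def signed_vpin(buckets):
    """Single-bucket signed VPIN in [-1, +1], assumed to be (V_B - V_S) / V."""
    buy, sell = buckets[:, 0], buckets[:, 1]
    return (buy - sell) / (buy + sell)

# Toy example: five trades, buckets of 5 contracts
b = volume_buckets([3, 5, 2, 4, 6], [1, -1, 1, 1, -1], bucket_size=5)
print(b)               # buckets (buy, sell): (3, 2), (2, 3), (4, 1), (0, 5)
print(signed_vpin(b))  # approximately 0.2, -0.2, 0.6, -1.0
print(vpin(b, n=2))    # approximately 0.2, 0.4, 0.8
```

With real data n would sit in the 50 to 100 range mentioned above; n=2 here just keeps the toy example readable.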


Fig 1. Single-Period Signed VPIN for the ES Futures Contract

It turns out that quote revisions condition strongly on the signed VPIN. For example, in tests of the ES futures contract, we found that the change in the midprice from one volume bucket to the next was highly correlated with the prior bucket’s signed VPIN, with a coefficient of 0.5.  In other words, market participants offering liquidity adjust their quotes in a way that directly reflects the direction and intensity of toxic flow, which is perhaps hardly surprising.

Of greater interest is the finding that there is a small but statistically significant dependency of price changes, as measured by first buy (sell) trade price to last sell (buy) trade price, on the prior period’s signed VPIN.  The correlation is positive, meaning that strongly toxic flow in one direction has a tendency to push prices in the same direction during the subsequent period. Moreover, the single-period signed VPIN turns out to be somewhat predictable, since its autocorrelations are statistically significant at two or more lags.  A simple linear ARMA(2,1) model produces an R-square of around 7%, which is small, but statistically significant.

A more useful model, however, can be constructed by introducing the idea of Markov states and allowing the regression model to assume different parameter values (and error variances) in each state.  In the Markov-state framework, the system transitions from one state to another with conditional probabilities that are estimated in the model.


An example of such a model for the signed VPIN in ES is shown below. Note that the model R-square is over 27%, around 4x larger than for a standard linear ARMA model.

We can describe the regime-switching model in the following terms.  In the regime 1 state the model has two significant autoregressive terms and one significant moving average term (ARMA(2,1)).  The AR1 term is large and positive, suggesting that trends in VPIN tend to be reinforced from one period to the next. In other words, this is a momentum state. In the regime 2 state the AR2 term is not significant and the AR1 term is large and negative, suggesting that changes in VPIN in one period tend to be reversed in the following period, i.e. this is a mean-reversion state.

The state transition probabilities indicate that the system is in mean-reversion mode for the majority of the time, approximately 2 periods out of 3.  During these periods, excessive flow in one direction during one period tends to be corrected in the ensuing period.  But in the less frequently occurring state 1, excess flow in one direction tends to produce even more flow in the same direction in the following period.  This first state, then, may be regarded as the regime characterized by toxic flow.

Markov State Regime-Switching Model

Markov Transition Probabilities

           P(.|1)     P(.|2)
P(1|.)     0.54916    0.27782
P(2|.)     0.45084    0.7221

                        Estimate   Std. Err.   t Ratio   p-Value
Regime 1:
AR1                      1.35502    0.02657     50.998    0
AR2                     -0.33687    0.02354    -14.311    0
MA1                      0.83662    0.01679     49.828    0
Error Variance^(1/2)     0.36294    0.0058

Regime 2:
AR1                     -0.68268    0.08479     -8.051    0
AR2                      0.00548    0.01854      0.296    0.767
MA1                     -0.70513    0.08436     -8.359    0
Error Variance^(1/2)     0.42281    0.0016

Log Likelihood = -33390.6
Schwarz Criterion = -33445.7
Hannan-Quinn Criterion = -33414.6
Akaike Criterion = -33400.6
Sum of Squares = 8955.38
R-Squared = 0.2753
R-Bar-Squared = 0.2752
Residual SD = 0.3847
Residual Skewness = -0.0194
Residual Kurtosis = 2.5332
Jarque-Bera Test = 553.472 {0}
Box-Pierce (residuals): Q(9) = 13.9395 {0.124}
Box-Pierce (squared residuals): Q(12) = 743.161 {0}
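The claim that the system spends roughly two periods out of three in the mean-reversion regime can be checked from the transition probabilities in the output above. The sketch below (plain NumPy, function names hypothetical) computes the long-run share of time in each regime and the expected run length 1/(1 − p_ii). Because the printed P(2|2) of 0.7221 is rounded, that entry is taken as 1 − 0.27782 so each row sums to one.

```python
import numpy as np

# P[i, j] = Pr(next regime = j | current regime = i), from the output above.
# Regime 1 = momentum ("toxic flow"), regime 2 = mean reversion.
P = np.array([
    [0.54916, 0.45084],
    [0.27782, 1 - 0.27782],  # printed as 0.7221 after rounding
])

def stationary_distribution(P):
    """Long-run regime probabilities: left eigenvector of P with eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()

def expected_durations(P):
    """Expected number of consecutive periods spent in each regime: 1 / (1 - p_ii)."""
    return 1.0 / (1.0 - np.diag(P))

pi = stationary_distribution(P)
print("long-run share (regime 1, regime 2):", pi.round(3))            # about 0.381, 0.619
print("expected run length in buckets:", expected_durations(P).round(2))  # about 2.22, 3.6
```

So the fitted model implies roughly 62% of buckets in the mean-reversion regime, a little under the two-in-three quoted above, with toxic-flow episodes lasting a little over two buckets on average.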

 

A Simple Trading Strategy

One way to try to monetize the predictability of the VPIN model is to use the forecasts to take directional positions in the ES contract.  In this simple simulation we assume that we enter a long (short) position at the first buy (sell) price if the forecast signed VPIN exceeds a threshold of 0.1 (falls below -0.1).  The simulation assumes that we exit the position at the end of the current volume bucket, at the last sell (buy) trade price in the bucket.

This simple strategy made 1024 trades over a 5-day period from 8/8 to 8/14, 90% of which were profitable, for a total of $7,675 – i.e. around ½ tick per trade.
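The entry and exit rule described above can be sketched as follows. This is an illustrative reconstruction, not the original simulation code: it assumes per-bucket arrays of forecast signed VPIN and of the first and last buy and sell trade prices, trades one contract with no costs, and uses the standard ES contract terms of $50 per index point and a 0.25-point ($12.50) tick. The function name and toy prices are hypothetical; the last lines check the per-trade figure quoted above.

```python
import numpy as np

ES_POINT_VALUE = 50.0  # dollars per index point
ES_TICK = 0.25         # index points per tick

def threshold_strategy_pnl(forecast, first_buy, first_sell, last_buy, last_sell, threshold=0.1):
    """Per-bucket dollar P&L of the signed-VPIN threshold rule (one contract, no costs).

    Long when forecast > threshold: buy at the first buy price, exit at the last sell price.
    Short when forecast < -threshold: sell at the first sell price, exit at the last buy price.
    """
    forecast = np.asarray(forecast, dtype=float)
    long_pnl = np.asarray(last_sell) - np.asarray(first_buy)
    short_pnl = np.asarray(first_sell) - np.asarray(last_buy)
    points = np.where(forecast > threshold, long_pnl,
                      np.where(forecast < -threshold, short_pnl, 0.0))
    return points * ES_POINT_VALUE

# Toy buckets (made-up prices): a long signal, a short signal and no signal
pnl = threshold_strategy_pnl(
    forecast=[0.2, -0.3, 0.05],
    first_buy=[100.00, 101.00, 102.00], first_sell=[99.75, 100.75, 101.75],
    last_buy=[100.50, 100.25, 102.00], last_sell=[100.25, 100.00, 101.75],
)
print(pnl)  # long gains 0.25 pt, short gains 0.5 pt, third bucket is flat

# Check on the reported result: $7,675 over 1,024 trades
per_trade = 7675 / 1024
print(per_trade, per_trade / (ES_TICK * ES_POINT_VALUE))  # about $7.50, or 0.6 of a tick
```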

The simulation is, of course, unrealistically simplistic, but it does give an indication of the prospects for a more realistic version of the strategy in which, for example, we might rest an order on one side of the book, depending on our VPIN forecast.


Figure 2 – Cumulative Trade PL

References

Easley, D., López de Prado, M. and O’Hara, M. (2011), “Flow Toxicity and Volatility in a High Frequency World”, Johnson School Research Paper Series No. 09-2011.

Easley, D. and O’Hara, M. (1987), “Price, Trade Size, and Information in Securities Markets”, Journal of Financial Economics, 19.

Easley, D. and O’Hara, M. (1992a), “Adverse Selection and Large Trade Volume: The Implications for Market Efficiency”, Journal of Financial and Quantitative Analysis, 27(2), June, 185-208.

Easley, D. and O’Hara, M. (1992b), “Time and the Process of Security Price Adjustment”, Journal of Finance, 47, 576-605.

 

Forecasting Financial Markets – Part 1: Time Series Analysis

The presentation in this post covers a number of important topics in forecasting, including:

  • Stationary processes and random walks
  • Unit roots and autocorrelation
  • ARMA models
  • Seasonality
  • Model testing
  • Forecasting
  • Dickey-Fuller and Phillips-Perron tests for unit roots

Also included are a number of detailed worked examples, including:

  1. ARMA Modeling
  2. Box Jenkins methodology
  3. Modeling the US Wholesale Price Index
  4. Pesaran & Timmermann study of excess equity returns
  5. Purchasing Power Parity

 

Forecasting 2011 - Time Series

 

Regime-Switching & Market State Modeling

The Excel workbook referred to in this post can be downloaded here.

Market state models are amongst the most useful analytical techniques for developing alpha-signal generators.  That term covers a great deal of ground, with ideas drawn from statistics, econometrics, physics and bioinformatics.  The purpose of this short note is to provide an introduction to some of the key ideas and suggest ways in which they might usefully be applied in the context of researching and developing trading systems.

Although they come from different origins, the concepts presented here share common foundational principles:

  1. Markets operate in different states that may be characterized by various measures (volatility, correlation, microstructure, etc);
  2. Alpha signals can be generated more effectively by developing models that are adapted to take account of different market regimes;
  3. Alpha signals may be combined together effectively by taking account of the various states that a market may be in.

Market state models have shown great promise in a variety of applications within the field of applied econometrics in finance, not only for price and market direction forecasting, but also basis trading, index arbitrage, statistical arbitrage, portfolio construction, capital allocation and risk management.

REGIME SWITCHING MODELS

These are econometric models which seek to use statistical techniques to characterize market states in terms of different estimates of the parameters of some underlying linear model.  This is accompanied by a transition matrix which estimates the probability of moving from one state to another.

To illustrate this approach I have constructed a simple example, given in the accompanying Excel workbook.  In this model the market operates as follows:

$$Y_t = a_0^{(S)} + a_1^{(S)} Y_{t-1} + e_t + b_1^{(S)} e_{t-1}, \qquad e_t \sim N\left(0, \sigma_S^2\right)$$

where

$Y_t$ is a variable of interest (e.g. the return in an asset over the next period $t$)

$e_t$ is an error process with constant variance $\sigma_S^2$ within each state

$S$ is the market state, with two regimes ($S = 1$ or $S = 2$)

$a_0$ is the drift in the asset process

$a_1$ is an autoregressive term, by which the return in the current period is dependent on the prior period return

$b_1$ is a moving average term, which smooths the error process

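This is one of the simplest possible structures, which in more general form can include multiple states and independent regressors $X_i$ as explanatory variables (such as book pressure, order flow, etc.):

$$Y_t = a_0^{(S)} + \sum_{j=1}^{p} a_j^{(S)} Y_{t-j} + e_t + \sum_{l=1}^{q} b_l^{(S)} e_{t-l} + \sum_{i} c_i^{(S)} X_{i,t}, \qquad S \in \{1, \dots, K\}$$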

 


The form of the error process $e_t$ may also be dependent on the market state.  It may simply be that, as in this example, the standard deviation of the error process changes from state to state.  But the changes can also be much more complex: for instance, the error process may be non-Gaussian, or it may follow a formulation from the GARCH framework.

In this example the state parameters are as follows:

       Reg 1    Reg 2
σ      0.01     0.02
a0     0.005   -0.015
a1     0.40     0.70
b1     0.10     0.20

In other words, in the first state the market tends to trend upwards with relatively low volatility.  In the second state, not only is volatility twice as high, but the drift is also three times as large in the negative direction.

I have specified the following state transition matrix (rows give the current state, columns the next state):

        Reg 1   Reg 2
Reg 1   0.85    0.15
Reg 2   0.90    0.10

This is interpreted as follows: if the market is in State 1, it will tend to remain in that state 85% of the time, transitioning to State 2 15% of the time.  Once in State 2, the market tends to revert to State 1 very quickly, with 90% probability.  So the system is in State 1 most of the time (in the long run about 0.90/(0.15 + 0.90) ≈ 86% of periods), trending slowly upwards with low volatility and occasionally flipping into an aggressively downward-trending phase with much higher volatility.

The Generate sheet in the Excel workbook shows how observations are generated from this process, from which we select a single instance of 3,000 observations, shown in the sheet named Sample.

The sample looks like this:

 

 
 

 As anticipated, the market is in State 1 most of the time, occasionally flipping into State 2 for brief periods.
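For readers without Excel, here is a minimal Python sketch of what the Generate sheet does, under one natural reading of the model: the regime follows the transition matrix above, and each observation uses the current regime's drift, AR, MA and error-SD parameters. It is not a transcription of the workbook, and the function name is hypothetical. It also lets you check two points made in this section: the process sits in State 1 about 86% of the time, and the combined series is fat-tailed even though each regime is Gaussian.

```python
import numpy as np

# State parameters (index 0 = State 1, index 1 = State 2)
SIGMA = np.array([0.01, 0.02])   # error standard deviation
A0 = np.array([0.005, -0.015])   # drift
A1 = np.array([0.40, 0.70])      # AR(1) coefficient
B1 = np.array([0.10, 0.20])      # MA(1) coefficient
# TRANS[i, j] = Pr(next state = j | current state = i)
TRANS = np.array([[0.85, 0.15],
                  [0.90, 0.10]])

def simulate_regime_arma(n=3000, seed=42):
    """Simulate Y_t = a0[S_t] + a1[S_t] * Y_{t-1} + e_t + b1[S_t] * e_{t-1}."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    states = np.zeros(n, dtype=int)
    s, y_prev, e_prev = 0, 0.0, 0.0   # start in State 1 with zero history
    for t in range(n):
        s = rng.choice(2, p=TRANS[s])   # Markov regime transition
        e = rng.normal(0.0, SIGMA[s])   # Gaussian shock with regime-specific SD
        y[t] = A0[s] + A1[s] * y_prev + e + B1[s] * e_prev
        states[t] = s
        y_prev, e_prev = y[t], e
    return y, states

y, states = simulate_regime_arma()
z = (y - y.mean()) / y.std()
print("share of periods in State 1:", np.mean(states == 0))
print("kurtosis (3 for a Gaussian):", np.mean(z ** 4))
```

The kurtosis well above 3 is the mixing effect described next: two Gaussian regimes with different means and variances combine into a fat-tailed, negatively skewed distribution.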


It is well known that in financial markets we are typically dealing with highly non-Gaussian distributions.  Non-normality can arise for a number of reasons, including changes in regime, as illustrated here.  It is worth noting that, even though in this example the process in either market state follows a Gaussian distribution, the combined process is distinctly non-Gaussian in form, having (extremely) fat tails, as shown by the QQ-plot below.

 


If we attempt to fit a standard ARMA model to the process, the outcome is very disappointing in terms of the model’s poor explanatory power (R² of 0.5%) and lack of fit in the squared residuals:

 

 

ARIMA(1,0,1)

                        Estimate   Std. Err.   t Ratio   p-Value
Intercept                0.00037    0.00032      1.164    0.244
AR1                      0.57261    0.1697       3.374    0.001
MA1                     -0.63292    0.16163     -3.916    0
Error Variance^(1/2)     0.02015    0.0004       --       --

Log Likelihood = 7451.96
Schwarz Criterion = 7435.95
Hannan-Quinn Criterion = 7443.64
Akaike Criterion = 7447.96
Sum of Squares = 1.2172
R-Squared = 0.0054
R-Bar-Squared = 0.0044
Residual SD = 0.0202
Residual Skewness = -2.1345
Residual Kurtosis = 5.7279
Jarque-Bera Test = 3206.15 {0}
Box-Pierce (residuals): Q(48) = 59.9785 {0.115}
Box-Pierce (squared residuals): Q(50) = 78.2253 {0.007}
Durbin Watson Statistic = 2.01392
KPSS test of I(0) = 0.2001 {<1} *
Lo’s RS test of I(0) = 1.2259 {<0.5} *
Nyblom-Hansen Stability Test: NH(4) = 0.5275 {<1}

MA form is 1 + a_1 L +…+ a_q L^q.
Covariance matrix from robust formula.
* KPSS, RS bandwidth = 0.
Parzen HAC kernel with Newey-West plug-in bandwidth.

 

 

However, if we keep the same simple form of ARMA(1,1) model, but allow for the possibility of a two-state Markov process, the picture alters dramatically:  now the model is able to account for 98% of the variation in the process, as shown below.

 

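Notice that we have succeeded in estimating the correct underlying transition probabilities, once we allow for the fact that the estimator has labeled the regimes in the opposite order: the estimated Regime 1 (negative intercept) corresponds to the specified State 2, so P(1|1) ≈ 0.08 and P(2|1) ≈ 0.92 match the specified 0.10 and 0.90, while P(1|2) ≈ 0.15 and P(2|2) ≈ 0.85 match 0.15 and 0.85.  Notice, too, how the ARMA model parameters change from regime to regime much as they should (small positive drift in one regime, large negative drift in the other, etc.).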

 

Markov Transition Probabilities

           P(.|1)      P(.|2)
P(1|.)     0.080265    0.14613
P(2|.)     0.91973     0.85387

                        Estimate   Std. Err.   t Ratio   p-Value
Logistic, t(1,1)        -2.43875    0.1821       --       --
Logistic, t(1,2)        -1.76531    0.0558       --       --

Non-switching parameters shown as Regime 1.

Regime 1:
Intercept               -0.05615    0.00315    -17.826    0
AR1                      0.70864    0.16008      4.427    0
MA1                     -0.67382    0.16787     -4.014    0
Error Variance^(1/2)     0.00244    0.0001       --       --

Regime 2:
Intercept                0.00838    2e-005     419.246    0
AR1                      0.26716    0.08347      3.201    0.001
MA1                     -0.26592    0.08339     -3.189    0.001

Log Likelihood = 12593.3
Schwarz Criterion = 12557.2
Hannan-Quinn Criterion = 12574.5
Akaike Criterion = 12584.3
Sum of Squares = 0.0178
R-Squared = 0.9854
R-Bar-Squared = 0.9854
Residual SD = 0.002
Residual Skewness = -0.0483
Residual Kurtosis = 13.8765
Jarque-Bera Test = 14778.5 {0}
Box-Pierce (residuals): Q(48) = 379.511 {0}
Box-Pierce (squared residuals): Q(50) = 36.8248 {0.917}
Durbin Watson Statistic = 1.50589
KPSS test of I(0) = 0.2332 {<1} *
Lo’s RS test of I(0) = 2.1352 {<0.005} *
Nyblom-Hansen Stability Test: NH(9) = 0.8396 {<1}

MA form is 1 + a_1 L +…+ a_q L^q.
Covariance matrix from robust formula.
* KPSS, RS bandwidth = 0.
Parzen HAC kernel with Newey-West plug-in bandwidth.


There are several types of regime-switching mechanism we can use in state models:

 

Hamiltonian – the simplest, where the process mean and variance vary from state to state

Markovian – the approach used here, with state transition matrix

Explained Switching – where the process changes state as a result of the influence of some underlying variable (such as interest rate volatility, for example)

Smooth Transition – comparable to explained Markov switching, but without an explicitly probabilistic interpretation.

 

 

This example is both rather simplistic and pathological at the same time: the states are well separated by design, whereas for real processes they tend to be much harder to distinguish.  A further drawback of this methodology is that the models can be very difficult to estimate.  The likelihood function tends to be very flat and there are a great many local maxima that give similar fit, but with widely varying model forms and parameter estimates.  That said, this is a very rich class of models with a great many potential applications.

Resources for Quantitative Analysts

Two of the smartest econometricians I know are Prof. Stephen Taylor of Lancaster University, and Prof. James Davidson of Exeter University.

I recall spending many profitable hours in the 1980’s with Stephen’s book Modelling Financial Time Series, which I am pleased to see has now been reprinted in a second edition.  For a long time this was the best available book on the topic and it remains a classic. It has been surpassed by very few books, one being Stephen’s later work Asset Price Dynamics, Volatility and Prediction.  This is a superb exposition, one that will repay close study.

James Davidson is one of the smartest minds in econometrics. Not only is his research of the highest caliber, he has somehow managed (in his spare time!) to develop one of the most advanced econometrics packages available.  Based on Jurgen Doornik’s Ox programming system, the Time Series Modelling package covers almost every conceivable model type, including regression models, ARIMA, ARFIMA and other single equation models, systems of equations, panel data models, GARCH and other heteroscedastic models and regime switching models, accompanied by very comprehensive statistical testing capabilities.  Furthermore, TSM is very well documented and despite being arguably the most advanced system of its kind it is inexpensive relative to alternatives.  James’s research output is voluminous and often highly complex.  His book, Econometric Theory, is an excellent guide to the state of the art, but not for the novice (or the faint hearted!).

Those looking for a kinder, gentler introduction to econometrics would do well to acquire a copy of Prof. Chris Brooks’s Introductory Econometrics for Finance. This covers most of the key ideas, from regression, through ARMA, GARCH, panel data models, cointegration, regime switching and volatility modeling.  Not only is the coverage comprehensive, Chris’s explanation of the concepts is delightfully clear and illustrated with interesting case studies which he analyzes using the EViews econometrics package.    Although not as advanced as TSM, EViews has everything that most quantitative analysts are likely to require in a modeling system and is very well suited to Chris’s teaching style.  Chris’s research output is enormous and covers a great many topics of interest to financial market analysts, in the same lucid style.

Can Machine Learning Techniques Be Used To Predict Market Direction? The 1,000,000 Model Test.

During the 1990’s the advent of Neural Networks unleashed a torrent of research on their applications in financial markets, accompanied by some rather extravagant claims about their predictive abilities.  Sadly, much of the research proved to be sub-standard and the results illusory, following which the topic was largely relegated to the bleachers, at least in the field of financial market research.

With the advent of new machine learning techniques such as Random Forests, Support Vector Machines and Nearest Neighbor Classification, there has been a resurgence of interest in non-linear modeling techniques and a flood of new research, a fair amount of it supportive of their potential for forecasting financial markets.  Once again, however, doubts about the quality of some of the research bring the results into question.


Against this background my co-researcher Dan Rico and I set out to address the question of whether these new techniques really do have predictive power, more specifically the ability to forecast market direction.  Using some excellent MATLAB toolboxes and a new software package, an Excel add-in called 11Ants that makes large-scale testing of multiple models a snap, we examined over 1,000,000 models and model ensembles, covering just about every available non-linear technique.  The data set for our study comprised daily prices for a selection of US equity securities, together with a large selection of technical indicators for which some other researchers have claimed explanatory power.

In-Sample Equity Curve for Best Performing Nonlinear Model

The answer provided by our research was, without exception, in the negative: not one of the models tested showed any significant ability to predict the direction of any of the securities in our data set.  Furthermore, our study found that the best-performing models favored raw price data over technical indicator variables, suggesting that the latter have little explanatory power.

As with Neural Networks, the principal difficulty with non-linear techniques appears to be curve-fitting and a failure to generalize:  while it is very easy to find models that provide an excellent fit to in-sample data, the forecasting performance out-of-sample is often very poor.

Out-of-Sample Equity Curve for Best Performing Nonlinear Model

Some caveats about our own research apply.  First and foremost, it is of course impossible to prove a hypothesis in the negative.  Secondly, it is plausible that some markets are less efficient than others: some studies have claimed success in developing predictive models due to the (relative) inefficiency of the F/X and futures markets, for example.  Thirdly, the choice of sample period may be criticized: it could be that the models were over-conditioned on a too-lengthy in-sample data set, which in one case ran from 1993 to 2008, with just two years (2009-2010) of out-of-sample data.  The choice of sample was deliberate, however: had we omitted the 2008 period from the “learning” data set, it would be very easy to criticize the study for failing to allow the algorithms to learn about the exceptional behavior of the markets during that turbulent year.

Despite these limitations, our research casts doubt on the findings of some less extensive studies, which may be the result of sample-selection bias.  One characteristic of the most credible studies finding evidence in favor of market predictability, such as those by Pesaran and Timmermann (see paper for citations), is that the models they employ tend to incorporate independent explanatory variables, such as yield spreads, which do appear to have real explanatory power.  The findings of our study suggest that, absent such explanatory factors, the ability to predict markets using sophisticated non-linear techniques applied to price data alone may prove to be as illusory as it was in the 1990’s.

 


Long Memory and Regime Shifts in Asset Volatility

This post covers quite a wide range of concepts in volatility modeling relating to long memory and regime shifts and is based on an article that was published in Wilmott magazine and republished in The Best of Wilmott Vol 1 in 2005.  A copy of the article can be downloaded here.

One of the defining characteristics of volatility processes in general (not just financial assets) is the tendency for the serial autocorrelations to decline very slowly.  This effect is illustrated quite clearly in the chart below, which maps the autocorrelations in the volatility processes of several financial assets.

Thus we can say that events in the volatility process for IBM, for instance, continue to exert influence on the process almost two years later.

This feature is one that is typical of a black noise process – not some kind of rap music variant, but rather:

“a process with a 1/f^β spectrum, where β > 2 (Manfred Schroeder, “Fractals, Chaos, Power Laws”). Used in modeling various environmental processes. Is said to be a characteristic of “natural and unnatural catastrophes like floods, droughts, bear markets, and various outrageous outages, such as those of electrical power.” Further, “because of their black spectra, such disasters often come in clusters.”” [Wikipedia].

Because of these autocorrelations, black noise processes tend to reinforce or trend, and hence (to some degree) may be forecastable.  This contrasts with a white noise process, such as an asset return process, which has a uniform power spectrum, insignificant serial autocorrelations and no discernible trending behavior:

White Noise Power Spectrum

An econometrician might describe this situation by saying that a black noise process is fractionally integrated of order d, where d = H − 0.5, H being the Hurst exponent.  A way to appreciate the difference in the behavior of a black noise process vs. a white noise process is by comparing two fractionally integrated random walks generated using the same set of quasi-random numbers by Feder’s (1988) algorithm (see p 32 of the presentation on Modeling Asset Volatility).

Fractal Random Walk - White Noise

Fractal Random Walk - Black Noise Process

As you can see, both random walks follow a similar pattern, but the black noise random walk is much smoother, and the downward trend is more clearly discernible.  You can play around with the Feder algorithm, which is coded in the accompanying Excel Workbook on Volatility and Nonlinear Dynamics.  Changing the Hurst Exponent parameter H in the worksheet will rerun the algorithm and illustrate a fractal random walk for a black noise (H > 0.5), white noise (H = 0.5) and mean-reverting, pink noise (H < 0.5) process.
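The workbook uses Feder's algorithm; as a swapped-in alternative (not Feder's method), the sketch below builds fractionally integrated random walks from ARFIMA(0, d, 0) increments with d = H − 0.5, using the MA(∞) expansion of (1 − L)^(−d) truncated at the start of the sample. Feeding the same Gaussian shocks through different values of H mirrors the comparison described above: H = 0.5 gives an ordinary random walk, H > 0.5 a smoother, more persistent path, and H < 0.5 a more jagged, mean-reverting one. Function names are hypothetical.

```python
import numpy as np

def frac_walk(shocks, hurst):
    """Random walk whose increments are ARFIMA(0, d, 0) with d = H - 0.5.

    Increments are x_t = sum_k psi_k * shock_{t-k}, where psi_0 = 1 and
    psi_k = psi_{k-1} * (k - 1 + d) / k are the MA weights of (1 - L)^(-d),
    truncated at the start of the sample.
    """
    shocks = np.asarray(shocks, dtype=float)
    n, d = len(shocks), hurst - 0.5
    psi = np.empty(n)
    psi[0] = 1.0
    for k in range(1, n):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    increments = np.convolve(shocks, psi)[:n]
    return np.cumsum(increments), increments

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

shocks = np.random.default_rng(7).standard_normal(2000)  # one shared set of shocks
for h in (0.3, 0.5, 0.8):
    walk, inc = frac_walk(shocks, h)
    print(f"H={h}: lag-1 autocorrelation of increments = {lag1_autocorr(inc):+.3f}")
```

For an ARFIMA(0, d, 0) process the theoretical lag-1 autocorrelation is d/(1 − d), about +0.43 for H = 0.8 and −0.17 for H = 0.3, so the sample values should land in that neighbourhood (somewhat lower for H = 0.8, since sample autocorrelations of long-memory series are biased downward).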

One way of modeling the kind of behavior demonstrated by volatility processes is by using long memory models such as ARFIMA and FIGARCH (see pp 47-62 of the Modeling Asset Volatility presentation for a discussion and comparison of various long memory models).  The article reviews research into long memory behavior and various techniques for estimating long memory models and the coefficient of fractional integration d for a process.


But long memory is not the only possible cause of long-term serial correlation.  The same effect can result from structural breaks in the process, which can produce spurious autocorrelations.  The article goes on to review some of the statistical procedures that have been developed to detect regime shifts, due to Bai (1997), Bai and Perron (1998) and the Iterative Cumulative Sums of Squares methodology due to Aggarwal, Inclan and Leal (1999).  The article illustrates how the ICSS technique accurately identifies two regime changes in a synthetic GBM process.
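The building block of the ICSS procedure is the centred cumulative sum of squares D_k = C_k/C_T − k/T, where C_k is the cumulative sum of squared (zero-mean) returns up to observation k; a variance break shows up as a large value of √(T/2)·max|D_k|, conventionally compared with an asymptotic 5% critical value of 1.358. The sketch below runs only a single pass over the whole sample; the full iterative algorithm reapplies the test to sub-segments to locate multiple breaks, which is omitted here. The function name and synthetic data are hypothetical.

```python
import numpy as np

def icss_single_pass(x, critical_value=1.358):
    """One pass of the centred cumulative-sum-of-squares test for a variance break.

    x: zero-mean series (e.g. demeaned returns). Returns (k, stat, is_break), where
    k is the number of observations before the most likely break point.
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    C = np.cumsum(x ** 2)              # C_k
    k = np.arange(1, T + 1)
    D = C / C[-1] - k / T              # D_k = C_k / C_T - k / T
    k_star = int(np.argmax(np.abs(D)))
    stat = np.sqrt(T / 2.0) * abs(D[k_star])
    return k_star + 1, stat, stat > critical_value

# Synthetic example: volatility triples after observation 500
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(0, 3, 500)])
k, stat, is_break = icss_single_pass(x)
print(k, round(stat, 2), is_break)  # k should be near 500, stat far above 1.358
```

Splitting the series at the detected break and re-testing each segment is the step that turns this single pass into the iterative procedure, and it is also how one would examine pre- and post-shift sub-series for long memory, as described next.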

In general, I have found the ICSS test to be a simple and highly informative means of gaining insight about a process representing an individual asset, or indeed an entire market.  For example, ICSS detects regime shifts in the process for IBM around 1984 (the time of the introduction of the IBM PC), the automotive industry in the early 1980’s (Chrysler bailout), the banking sector in the late 1980’s (Latin American debt crisis), Asian sector indices in Q3 1997, the S&P 500 index in April 2000 and just about every market imaginable during the 2008 credit crisis.  By splitting a series into pre- and post-regime shift sub-series and examining each segment for long memory effects, one can determine the cause of autocorrelations in the process.  In some cases, Asian equity indices being one example, long memory effects disappear from the series, indicating that spurious autocorrelations were induced by a major regime shift during the 1997 Asian crisis. In most cases, however, long memory effects persist.

Excel Workbook on Volatility and Nonlinear Dynamics 

There are several other topics from chaos theory and nonlinear dynamics covered in the workbook, including:

More on these issues in due course.

Market Timing in the S&P 500 Index Using Volatility Forecasts

There has been a good deal of interest in the market timing ideas discussed in my earlier blog post Using Volatility to Predict Market Direction, which discusses the research of Diebold and Christoffersen into the sign predictability induced by volatility dynamics.  The ideas are thoroughly explored in a QuantNotes article from 2006, which you can download here.
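The core of the volatility-driven sign predictability argument can be stated in one line under a Gaussian assumption: if R ~ N(μ, σ²) then P(R > 0) = Φ(μ / σ). With a non-zero drift μ, a time-varying volatility forecast σ makes the probability of a positive return time-varying, and it drifts towards 0.5 as volatility rises. The sketch below assumes normality and uses a hypothetical 1% monthly drift; the research discussed above is more general than this.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_positive_return(mu, sigma):
    """P(R > 0) when R ~ N(mu, sigma^2), i.e. Phi(mu / sigma)."""
    return normal_cdf(mu / sigma)


# Same hypothetical drift, different volatility forecasts:
# sign predictability weakens towards 0.5 as forecast volatility rises.
for sigma in (0.02, 0.04, 0.08):
    print(sigma, round(prob_positive_return(0.01, sigma), 3))
```

If μ = 0 the probability is exactly 0.5 regardless of volatility, which is why a reliable estimate of the drift matters as much as the volatility forecast.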

There is a follow-up article from 2006 in which Christoffersen, Diebold, Mariano and Tay develop the ideas further to consider the impact of higher moments of the asset return distribution on sign predictability and the potential for market timing in international markets (download here).
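One common way to bring skewness and excess kurtosis into the sign probability is a Gram-Charlier expansion around the normal density. The sketch below uses it purely for illustration and is not necessarily the specification used by Christoffersen, Diebold, Mariano and Tay. Note that a Gram-Charlier density can go negative for large skewness or kurtosis, so the approximation is only sensible for moderate values.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gram_charlier_cdf(z, skew, exkurt):
    """CDF of a standardised Gram-Charlier density with given skewness and excess kurtosis."""
    he2 = z * z - 1.0       # Hermite polynomial He_2
    he3 = z ** 3 - 3.0 * z  # Hermite polynomial He_3
    return normal_cdf(z) - normal_pdf(z) * (skew / 6.0 * he2 + exkurt / 24.0 * he3)


def prob_positive_gc(mu, sigma, skew, exkurt):
    """P(R > 0) = 1 - F(-mu / sigma) under the Gram-Charlier approximation."""
    return 1.0 - gram_charlier_cdf(-mu / sigma, skew, exkurt)


# Hypothetical inputs: zero drift, negative skew shifts probability mass above the mean
print(round(prob_positive_gc(0.0, 0.04, -0.5, 1.0), 4))
```

With zero skewness and excess kurtosis the function reduces to the Gaussian Φ(μ / σ); negative skewness (a long left tail) raises the probability of a positive return even when the mean is zero, which is one way higher moments can generate sign predictability.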

Trading Strategy
To illustrate some of the possibilities of this approach, we constructed a simple market timing strategy in which a position was taken in the S&P 500 index or in 90-Day T-Bills, depending on an ex-ante forecast of positive returns from the logit regression model, using an expanding window to estimate the drift coefficient. We assume that the position is held for 30 days and rebalanced at the end of each period. In this test we make no allowance for market impact or transaction costs.
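The switching rule itself is simple once the forecast probabilities exist. The sketch below assumes those probabilities are supplied (the logit model is not reproduced here) and that each `prob_up[t]` uses only information available before period t. The 0.5 threshold and the function name are hypothetical; as in the test above, there are no transaction costs or market impact.

```python
import numpy as np


def switching_returns(index_ret, tbill_ret, prob_up, threshold=0.5):
    """Per-period returns of a rule that holds the index when the ex-ante
    probability of a positive return exceeds `threshold`, otherwise T-bills.
    No transaction costs or market impact are charged."""
    index_ret = np.asarray(index_ret, dtype=float)
    tbill_ret = np.asarray(tbill_ret, dtype=float)
    prob_up = np.asarray(prob_up, dtype=float)
    in_market = prob_up > threshold  # decided at the start of each holding period
    strat = np.where(in_market, index_ret, tbill_ret)
    return strat, in_market


# Hypothetical three-month example
strat, in_market = switching_returns([0.05, -0.03, 0.02], [0.004] * 3, [0.6, 0.4, 0.55])
print(strat, in_market)
```

Keeping the in-market mask alongside the returns makes it straightforward to report how many periods the strategy was invested and how often those periods were profitable.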

Results
Annual returns for the strategy and for the benchmark S&P 500 Index are shown in the figure below. The strategy performs exceptionally well in 1987, 1989 and 1995, when the ratio between expected returns and volatility remains close to optimum levels and the direction of the S&P 500 Index is highly predictable. Of equal interest is that the strategy largely avoids the market downturn of 2000-2002, a period in which sign probabilities were exceptionally low.


In terms of overall performance, the model enters the market in 113 out of a total of 241 months (47%) and is profitable in 78 of them (69%).  The average gain is 7.5% vs. an average loss of –4.11% (ratio 1.83).  The compound annual return is 22.63%, with an annual volatility of 17.68%, alpha of 14.9% and Sharpe ratio of 1.10.
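Statistics of the kind quoted above can be computed from a monthly return series and an in-market mask. The sketch below is one set of conventions, assumed rather than taken from the original test: hit rate and average gain/loss are measured over in-market months only, CAGR compounds all months, and the Sharpe ratio is the annualised mean excess return over the annualised standard deviation of excess returns. Alpha would require a regression on the benchmark and is omitted.

```python
import numpy as np


def performance_summary(strat_ret, in_market, rf_ret, periods_per_year=12):
    """Summary statistics for a monthly switching strategy (sketch, assumed conventions)."""
    r = np.asarray(strat_ret, dtype=float)
    mask = np.asarray(in_market, dtype=bool)
    rf = np.asarray(rf_ret, dtype=float)
    active = r[mask]  # months in which the index was held
    gains, losses = active[active > 0], active[active < 0]
    avg_gain = gains.mean() if gains.size else float("nan")
    avg_loss = losses.mean() if losses.size else float("nan")
    excess = r - rf
    return {
        "months_in_market": int(mask.sum()),
        "pct_in_market": float(mask.mean()),
        "hit_rate": gains.size / active.size if active.size else float("nan"),
        "avg_gain": avg_gain,
        "avg_loss": avg_loss,
        "gain_loss_ratio": avg_gain / abs(avg_loss),
        "cagr": float(np.prod(1.0 + r) ** (periods_per_year / r.size) - 1.0),
        "ann_vol": float(r.std(ddof=1) * np.sqrt(periods_per_year)),
        "sharpe": float(excess.mean() / excess.std(ddof=1) * np.sqrt(periods_per_year)),
    }


# Hypothetical four-month example
print(performance_summary([0.1, -0.05, 0.1, -0.05], [True] * 4, [0.0] * 4))
```

As a simple arithmetic check on the reported counts, 113 / 241 ≈ 0.469 and 78 / 113 ≈ 0.690, consistent with the 47% and 69% figures.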

The under-performance of the strategy in 2003 is explained by the fact that direction-of-change probabilities were rising from a very low base in Q4 2002 and did not reach trigger levels until the end of the year. Even though the strategy out-performed the Index by a substantial margin of 6%, its performance in 2005 is of concern: market volatility was very low and probabilities overall were on a par with those seen in 1995. Further tests are required to determine whether the failure of the strategy to produce an exceptional performance on a par with 1995 was the result of normal statistical variation or of changes in the underlying structure of the process that require model recalibration.

Future Research & Development
The obvious next step is to extend the approach described above to trading strategies based on sign forecasting in a universe of several assets, possibly trading binary options. The approach also has potential for asset allocation, portfolio theory and risk management applications.

Market Timing in the S&P500 Index