From Entities to Alphas: Launching the Python Version of the Equities Entity Store

Introduction

When we launched the Equities Entity Store in Mathematica, it revolutionized how financial professionals interact with market data by bringing semantic structure, rich metadata, and analysis-ready information into a unified framework. Mathematica’s EntityStore provided an elegant way to explore equities, ETFs, indices, and factor models through a symbolic interface. However, the industry landscape has evolved—the majority of quantitative finance, data science, and machine learning now thrives in Python.

While platforms like FactSet, WRDS, and Bloomberg provide extensive financial data, quantitative researchers still spend up to 80% of their time wrangling data rather than building models. Current workflows often involve downloading CSV files, manually cleaning them in pandas, and stitching together inconsistent time series—all while attempting to avoid subtle lookahead bias that invalidates backtests.

Recognizing these challenges, we’ve reimagined the Equities Entity Store for Python, focusing first on what the Python ecosystem does best: scalable machine learning and robust data analysis.

The Python Version: What’s New

Rather than beginning with metadata-rich entity hierarchies, the Python Equities Entity Store prioritizes the intersection of high-quality data and predictive modeling capabilities. At its foundation lies a comprehensive HDF5 dataset containing over 1,400 features for 7,500 stocks, measured monthly from 1995 to 2025—creating an extensive cross-sectional dataset optimized for sophisticated ML applications.

Our lightweight, purpose-built package includes specialized modules for:

Feature loading: Efficient extraction and manipulation of data from the HDF5 store
Feature preprocessing: Comprehensive tools for winsorization, z-scoring, neutralization, and other essential transformations
Label construction: Flexible creation of target variables, including 1-month forward information ratio
Ranking models: Advanced implementations including LambdaMART and other gradient-boosted tree approaches
Portfolio construction: Sophisticated tools for converting model outputs into actionable investment strategies
Backtesting and evaluation: Rigorous performance assessment across multiple metrics

Guaranteed Protection Against Lookahead Bias

A critical advantage of our Python Equities Entity Store implementation is its robust safeguards against lookahead bias—a common pitfall that compromises the validity of backtests and predictive models. Modern ML preprocessing pipelines often inadvertently introduce information from the future into training data, leading to unrealistic performance expectations.

Unlike platforms such as QuantConnect, Zipline, or even custom research environments that require careful manual controls, our system integrates lookahead protection at the architectural level:

# Example: Time-aware feature standardization with strict temporal boundaries
from equityentity.features.preprocess import TimeAwareStandardizer

# This standardizer only uses data available up to each point in time
standardizer = TimeAwareStandardizer(lookback_window='60M')
zscore_features = standardizer.fit_transform(raw_features)

# Instead of the typical approach that inadvertently leaks future data:

# DON'T DO THIS: sklearn.preprocessing.StandardScaler().fit_transform(raw_features)

Multiple safeguards are integrated throughout the system:

Time-aware preprocessing: All transformations (normalization, imputation, feature engineering) strictly respect temporal boundaries
Point-in-time data snapshots: Features reflect only information available at the decision point
New listing delay: Stocks are only included after a customizable delay period from their first trading date

# From our data_loader.py - IPO bias protection through months_delay

for i, symbol in enumerate(symbols):
    first_date = universe_df[universe_df["Symbol"] == symbol]["FirstDate"].iloc[0]
    delay_end = first_date + pd.offsets.MonthEnd(self.months_delay)
    valid_mask[:, i] = dates_pd > delay_end

Versioned historical data: Our HDF5 store maintains proper vintages to reflect real-world information availability
Pipeline validation tools: Built-in checks flag potential lookahead violations during model development

While platforms like Numerai provide pre-processed features to prevent lookahead, they limit you to their feature set. EES gives you the same guarantees while allowing complete flexibility in feature engineering—all with verification tools to validate your pipeline’s temporal integrity.

Application: Alpha from Feature Ranking

As a proof of concept, we’ve implemented a sophisticated stock ranking system using the LambdaMART algorithm, applied to a universe of current and former components of the S&P 500 Index.. The target label is the 1-month information ratio (IR_1m), constructed as:

IR_1m = (r_i,t+1 – r_benchmark,t+1) / σ(r_i – r_benchmark)

Where r_i,t+1 is the forward 1-month return of stock i, r_benchmark is the corresponding sector benchmark return, and σ is the tracking error.

Using the model’s predicted rank scores, we form decile portfolios rebalanced monthly over a 25-year period (2000-2025), with an average turnover of 66% per month.

The top decile (Decile 10) portfolio demonstrates a Sharpe Ratio of approximately 0.8 with an annualized return of 17.8%—impressive performance that validates our approach. As shown in the cumulative return chart, performance remained consistent across different market regimes, including the 2008 financial crisis, the 2020 pandemic crash, and subsequent recovery periods.

Risk-adjusted performance increases across the decile portfolios, indicating that the selected factors appear to provide real explanatory power:

Looking at the feature importance chart, the most significant features include:

Technical features:
- Volatility metrics dominate with “Volatility_ZScore” being the most important feature by a wide margin
- “Mu_1m_ZScore” (1-month return z-score)
- “relPriceAverage_3m_ZScore” (3-month relative price average)
- “Convexity_3m_ZScore” (price path convexity over 3 months)
Fundamental features:
- “PB_RMW_60m” (Price-to-Book adjusted for profitability over 60 months)
Interaction terms
- “CAGR_60m_ROCE” (compound annual growth rate combined with return on capital employed)
- ProfitFactor_60m_CAGR_60m” (interaction between profit factor and growth)
Cross-sectional features:
- “CalmarRatio_6m_ZScore” (risk-adjusted return metric)
- “Volatility_GICSSectorPctRank” (sector-normalized volatility percentile rank)

Our model was trained on data from 1995-1999 and validated on an independent holdout set before final out-of-sample testing from 2000-2025, in which the model is updated every 60 months.

This rigorous approach to validation ensures that our performance metrics reflect realistic expectations rather than in-sample overfitting.

This diverse feature set confirms that durable alpha generation requires the integration of multiple orthogonal signals unified under a common ranking framework—precisely what our Python Equities Entity Store facilitates. The dominance of volatility-related features suggests that risk management is a critical component of the model’s predictive power.

Package Structure and Implementation

The Python EES is organized as follows:

equityentity/

├── __init__.py

├── features/

│ ├── loader.py # Load features from HDF5

│ ├── preprocess.py # Standardization, neutralization, filtering

│ └── labels.py # Target generation (e.g., IR@1m)

├── models/

│ └── ranker.py # LambdaMART, LightGBM ranking models

├── portfolio/

│ └── constructor.py # Create portfolios from rank scores

├── backtest/

│ └── evaluator.py # Sharpe, IR, turnover, hit rate

└── entity/ # Optional metadata (JSON to dataclass)

├── equity.py

├── etf.py

└── index.py

Code Example: Ranking Model Training

Here’s how the ranking model module works, leveraging LightGBM’s LambdaMART implementation:

class RankModel:

    def __init__(self, max_depth=4, num_leaves=32, learning_rate=0.1, n_estimators=500,
use_gpu=True, feature_names=None):

        self.params = {
            "objective": "lambdarank",
            "max_depth": max_depth,
            "num_leaves": num_leaves,
            "learning_rate": learning_rate,
            "n_estimators": n_estimators,
            "device": "gpu" if use_gpu else "cpu",
            "verbose": -1,
            "max_position": 50
        }

        self.model = None
        self.feature_names = feature_names if feature_names is not None else []

    def train(self, features, labels):

        # Reshape features and labels for LambdaMART format
        n_months, n_stocks, n_features = features.shape
        X = features.reshape(-1, n_features)
        y = labels.reshape(-1)
        group = [n_stocks] * n_months
        train_data = lgb.Dataset(X, label=y, group=group, feature_name=self.feature_names)
        self.model = lgb.train(self.params, train_data)

Portfolio Construction

The system seamlessly transitions from predictive scores to portfolio allocation with built-in transaction cost modeling:

# Portfolio construction with transaction cost awareness

def construct_portfolios(self):

    n_months, n_stocks = self.pred_scores.shape

    for t in range(n_months):

        # Get predictions and forward returns
        scores = self.pred_scores[t]
        returns_t = self.returns[min(t + 1, n_months - 1)]

        # Select top and bottom deciles
        sorted_idx = np.argsort(scores)
        long_idx = sorted_idx[-n_decile:]
        short_idx = sorted_idx[:n_decile]

        # Calculate transaction costs from portfolio turnover
        curr_long_symbols = set(symbols_t[long_idx])
        curr_short_symbols = set(symbols_t[short_idx])
        long_trades = len(curr_long_symbols.symmetric_difference(self.prev_long_symbols))
        short_trades = len(curr_short_symbols.symmetric_difference(self.prev_short_symbols))

        tx_cost_long = self.tx_cost * long_trades
        tx_cost_short = self.tx_cost * short_trades

        # Calculate net returns with costs
        long_ret = long_raw - tx_cost_long
        short_ret = -short_raw - tx_cost_short - self.loan_cost

Complete Workflow Example

The package is designed for intuitive workflows with minimal boilerplate. Here’s how simple it is to get started:

from equityentity.features import FeatureLoader, LabelGenerator
from equityentity.models import LambdaMARTRanker
from equityentity.portfolio import DecilePortfolioConstructor

# Load features with point-in-time awareness
loader = FeatureLoader(hdf5_path='equity_features.h5')
features = loader.load_features(start_date='2010-01-01', end_date='2025-01-01')

# Generate IR_1m labels
label_gen = LabelGenerator(benchmark='sector_returns')
labels = label_gen.create_information_ratio(forward_period='1M')

# Train a ranking model
ranker = LambdaMARTRanker(n_estimators=500, learning_rate=0.05)
ranker.fit(features, labels)

# Create portfolios from predictions
constructor = DecilePortfolioConstructor(rebalance_freq='M')
portfolios = constructor.create_from_scores(ranker.predict(features))

# Evaluate performance
performance = portfolios['decile_10'].evaluate()
print(f"Sharpe Ratio: {performance['sharpe_ratio']:.2f}")
print(f"Information Ratio: {performance['information_ratio']:.2f}")
print(f"Annualized Return: {performance['annualized_return']*100:.1f}%")

The package supports both configuration file-based workflows for production use and interactive Jupyter notebook exploration. Output formats include pandas DataFrames, JSON for web applications, and HDF5 for efficient storage of results.

Why Start with Cross-Sectional ML?

While Mathematica’s EntityStore emphasized symbolic navigation and knowledge representation, Python excels at algorithmic learning and numerical computation at scale. Beginning with the HDF5 dataset enables immediate application by quantitative researchers, ML specialists, and strategy developers interested in:

Exploring sophisticated feature engineering across time horizons and market sectors
Building powerful predictive ranking models with state-of-the-art ML techniques
Constructing long-short portfolios with dynamic scoring mechanisms
Developing robust factor models and alpha signals

And because we’ve already created metadata-rich JSON files for each entity, we can progressively integrate the symbolic structure—creating a hybrid system where machine learning capabilities complement knowledge representation.

Increasingly, quantitative researchers are integrating tools like LangChain, GPT-based agents, and autonomous research pipelines to automate idea generation, feature testing, and code execution. The structured design of the Python Equities Entity Store—with its modularity, metadata integration, and time-consistent features—makes it ideally suited for use as a foundation in LLM-driven quantitative workflows.

Competitive Pricing and Value

While alternative platforms in this space typically come with significant cost barriers, we’ve positioned the Python Equities Entity Store to be accessible to firms of all sizes:

While open-source platforms like QuantConnect, Zipline, and Backtrader provide accessible backtesting environments, they often lack the scale, granularity, and point-in-time feature control required for advanced cross-sectional ML strategies. The Python Equities Entity Store fills this gap—offering industrial-strength data infrastructure, lookahead protection, and extensibility without the steep cost of commercial platforms.

Unlike these competitors that often require multiple subscriptions to achieve similar functionality, Python Equities Entity Store provides an integrated solution at a fraction of the cost. This pricing strategy reflects our commitment to democratizing access to institutional-grade quantitative tools.

Next Steps

We’re excited to announce our roadmap for the Python Equities Entity Store:

July 2025 Release: The official launch of our HDF5-compatible package, complete with:
- Comprehensive documentation and API reference
- Jupyter notebooks demonstrating key workflows from data loading to portfolio construction
- Example strategies showcasing the system’s capabilities across different market regimes
- Performance benchmarks and baseline models with full backtest history
- Python package available via PyPI (pip install equityentity)
- Docker container with pre-loaded example datasets
Q3 2025: Integration of the symbolic entity framework, allowing seamless navigation between quantitative features and qualitative metadata
Q4 2025: Extension to additional asset classes and alternative data sources, expanding the system’s analytical scope
Early 2026: Launch of a cloud-based computational environment for collaboration and strategy sharing

Accessing the Python Equities Entity Store

As a special promotion, existing users of the current Mathematica Equities Entity Store Enterprise Edition will be given free access to the Python version on launch.

So, if you sign up now for the Enterprise Edition you will receive access to both the existing Mathematica version and the new Python version as soon as it is released.

After the launch of the Python Equities Entity Store, each product will be charged individually. So this limited time offer represents a 50% discount.

See our web site for pricing details: https://store.equityanalytics.store/equities-entity-store

Conclusion

By prioritizing scalable feature datasets and sophisticated ranking models, the Python version of the Equities Entity Store positions itself as an indispensable tool for modern equity research. It bridges the gap between raw data and actionable insights, combining the power of machine learning with the structure of knowledge representation.

The Python Equities Entity Store represents a significant step forward in quantitative finance tooling—enabling faster iteration, more robust models, and ultimately, better investment decisions.

April 3, 2025May 2, 2025

Outperforming in Chaos: How Strategic Scenario Portfolios Are Beating the Market in 2025’s Geopolitical Storm

“The first rule of investing isn’t ‘Don’t lose money.’ It’s ‘Recognize when the rules are changing.'”

UPDATE: MAY 1 2025

The February 2025 European semiconductor export restrictions sent markets into a two-day tailspin, wiping $1.3 trillion from global equities. For most investors, it was another stomach-churning reminder of how traditional portfolios falter when geopolitics overwhelms fundamentals.

But for a growing cohort of forward-thinking portfolio managers, it was validation. Their Strategic Scenario Portfolios—deliberately constructed to thrive during specific geopolitical events—delivered positive returns amid the chaos.

I’m not talking about theoretical models. I’m talking about real money, real returns, and a methodology you can implement right now.

What Exactly Is a Strategic Scenario Portfolio?

A Strategic Scenario Portfolio (SSP) is an investment allocation designed to perform robustly during specific high-impact events—like trade wars, sanctions, regional conflicts, or supply chain disruptions.

Unlike conventional approaches that react to crises, SSPs anticipate them. They’re narrative-driven, built around specific, plausible scenarios that could reshape markets. They’re thematically concentrated, focusing on sectors positioned to benefit from that scenario rather than broad diversification. They maintain asymmetric balance, incorporating both downside protection and upside potential. And perhaps most importantly, they’re ready for deployment before markets fully price in the scenario.

Think of SSPs as portfolio “insurance policies” that also have the potential to deliver substantial alpha.

“Why didn’t I know about this before now?” SSPs aren’t new—institutional investors have quietly used similar approaches for decades. What’s new is systematizing this approach for broader application.

Real-World Proof: Two Case Studies That Speak for Themselves

Case Study #1: The 2018-2019 US-China Trade War

When trade tensions escalated in 2018, we constructed the “USChinaTradeWar2018” portfolio with a straightforward mandate: protect capital while capitalizing on trade-induced dislocations.

The portfolio allocated 25% to SPDR Gold Shares (GLD) as a core risk-off hedge. Another 20% went to Consumer Staples (VDC) for defensive positioning, while 15% was invested in Utilities (XLU) for stable returns and low volatility. The remaining 40% was distributed equally among Walmart (WMT), Newmont Mining (NEM), Procter & Gamble (PG), and Industrials (XLI), creating a balanced mix of defensive positioning with selective tactical exposure.

The results were remarkable. From May 2018 to December 2019, this portfolio delivered a total return of 30.2%, substantially outperforming the S&P 500’s 22.0%. More impressive than the returns, however, was the risk profile. The portfolio achieved a Sharpe ratio of 1.8 (compared to the S&P 500’s 0.6), demonstrating superior risk-adjusted performance. Its maximum drawdown was a mere 2.2%, while the S&P 500 experienced a 14.0% drawdown during the same period. With a beta of just 0.26 and alpha of 11.7%, this portfolio demonstrated precisely what SSPs are designed to deliver: outperformance with dramatically reduced correlation to broader market movements.

Note: Past performance is not indicative of future results. Performance calculated using total return with dividends reinvested, compared against S&P 500 total return.

Case Study #2: The 2025 Tariff War Portfolio

Fast forward to January 2025. With new tariffs threatening global trade, we developed the “TariffWar2025” portfolio using a similar strategic framework but adapted to the current environment.

The core of the portfolio (50%) established a defensive foundation across Utilities (XLU), Consumer Staples (XLP), Healthcare (XLV), and Gold (GLD). We allocated 20% toward domestic industrial strength through Industrials (XLI) and Energy (XLE) to capture reshoring benefits and energy independence trends. Another 20% targeted strategic positioning with Lockheed Martin (LMT) benefiting from increased defense spending and Cisco (CSCO) offering exposure to domestic technology infrastructure with limited Chinese supply chain dependencies. The remaining 10% created balanced treasury exposure across long-term (TLT) and short-term (VGSH) treasuries to hedge against both economic slowdown and rising rates.

The results through Q1 2025 have been equally impressive. While the S&P 500 declined 4.6%, the TariffWar2025 portfolio generated a positive 4.3% return. Its Sharpe ratio of 8.4 indicates exceptional risk-adjusted performance, and remarkably, the portfolio experienced zero drawdown during a period when the S&P 500 fell by as much as 7.1%. With a beta of 0.20 and alpha of 31.9%, the portfolio again demonstrated the power of scenario-based investing in navigating geopolitical turbulence.

Note: Past performance is not indicative of future results. Performance calculated using total return with dividends reinvested, compared against S&P 500 total return.

Why Traditional Portfolios Fail When You Need Them Most

Traditional portfolio construction relies heavily on assumptions that often crumble during times of geopolitical stress. Historical correlations, which form the backbone of most diversification strategies, routinely break during crises. Mean-variance optimization, a staple of modern portfolio theory, falters dramatically when markets exhibit non-normal distributions, which is precisely what happens during geopolitical events. And the broad diversification that works so well in normal times often converges in stressed markets, leaving investors exposed just when protection is most needed.

When markets fracture along geopolitical lines, these assumptions collapse spectacularly. Consider the March 2023 banking crisis: correlations between tech stocks and regional banks—historically near zero—suddenly jumped to 0.75. Or recall how in 2022, both stocks AND bonds declined simultaneously, shattering the foundation of 60/40 portfolios.

What geopolitical scenario concerns you most right now, and how is your portfolio positioned for it? This question reveals the central value proposition of Strategic Scenario Portfolios.

Building Your Own Strategic Scenario Portfolio: A Framework for Success

You don’t need a quant team to implement this approach. The framework begins with defining a clear scenario. Rather than vague concerns about “volatility” or “recession,” an effective SSP requires a specific narrative. For example: “Europe imposes carbon border taxes, triggering retaliatory measures from major trading partners.”

From this narrative foundation, you can map the macro implications. Which regions would face the greatest impact? What sectors would benefit or suffer? How might interest rates, currencies, and commodities respond? This mapping process translates your scenario into investment implications.

The next step involves identifying asymmetric opportunities—situations where the market is underpricing both risks and potential benefits related to your scenario. These asymmetries create the potential for alpha generation within your protective framework.

Structure becomes critical at this stage. A typical SSP balances defensive positions (usually 60-75% of the allocation) with opportunity capture (25-40%). This balance ensures capital preservation while maintaining upside potential if your scenario unfolds as anticipated.

Finally, establish monitoring criteria. Define what developments would strengthen or weaken your scenario’s probability, and set clear guidelines for when to increase exposure, reduce positions, or exit entirely.

For those new to this approach, start with a small allocation—perhaps 5-10% of your portfolio—as a satellite to your core holdings. As your confidence or the scenario probability increases, you can scale up exposure accordingly.

Common Questions About Strategic Scenario Portfolios

“Isn’t this just market timing in disguise?” This question arises frequently, but the distinction is important. Market timing attempts to predict overall market movements—when the market will rise or fall. SSPs are fundamentally different. They’re about identifying specific scenarios and their sectoral impacts, regardless of broad market direction. The focus is on relative performance within a defined context, not on predicting market tops and bottoms.

“How do I know when to exit an SSP position?” The key is defining exit criteria in advance. This might include scenario resolution (like a trade agreement being signed), time limits (reviewing the position after a predefined period), or performance thresholds (taking profits or cutting losses at certain levels). Clear exit strategies prevent emotional decision-making when markets become volatile.

“Do SSPs work in all market environments?” This question reveals a misconception about their purpose. SSPs aren’t designed to outperform in all environments. They’re specifically built to excel during their target scenarios, while potentially underperforming in others. That’s why they work best as tactical overlays to core portfolios, rather than as stand-alone investment approaches.

“How many scenarios should I plan for simultaneously?” Start with one or two high-probability, high-impact scenarios. Too many simultaneous SSPs can dilute your strategic focus and create unintended exposures. As you gain comfort with the approach, you can expand your scenario coverage while maintaining portfolio coherence.

Tools for the Forward-Thinking Investor

Implementing SSPs effectively requires both qualitative and quantitative tools. Systems like the Equities Entity Store for MATLAB provide institutional-grade capabilities for modeling multi-asset correlations across different regimes. They enable stress-testing portfolios against specific geopolitical scenarios, optimizing allocations based on scenario probabilities, and tracking exposures to factors that become relevant primarily in crisis periods.

These tools help translate scenario narratives into precise portfolio allocations with targeted risk exposures. While sophisticated analytics enhance the process, the core methodology remains accessible even to investors without advanced quantitative resources.

The Path Forward in a Fractured World

The investment landscape of 2025 is being shaped by forces that traditional models struggle to capture. Deglobalization and reshoring are restructuring supply chains and changing regional economic dependencies. Resource nationalism and energy security concerns are creating new commodity dynamics. Strategic competition between major powers is manifesting in investment restrictions, export controls, and targeted sanctions. Technology fragmentation along geopolitical lines is creating parallel innovation systems with different winners and losers.

In this environment, passive diversification is necessary but insufficient. Strategic Scenario Portfolios provide a disciplined framework for navigating these challenges, protecting capital, and potentially generating significant alpha when markets are most volatile.

The question isn’t whether geopolitical disruptions will continue—they will. The question is whether your portfolio is deliberately designed to withstand them.

Next Steps: Getting Started With SSPs

The journey toward implementing Strategic Scenario Portfolios begins with identifying your most concerning scenario. What geopolitical or policy risk keeps you up at night? Is it escalation in the South China Sea? New climate regulations? Central bank digital currencies upending traditional banking?

Once you’ve identified your scenario, assess your current portfolio’s exposure. Would your existing allocations benefit, suffer, or remain neutral if this scenario materialized? This honest assessment often reveals vulnerabilities that weren’t apparent through traditional risk measures.

Design a prototype SSP focused on your scenario. Start small, perhaps with a paper portfolio that you can monitor without committing capital immediately. Track both the portfolio’s performance and developments related to your scenario, refining your approach as you gain insights.

For many investors, this process benefits from professional guidance. Complex scenario mapping requires a blend of geopolitical insight, economic analysis, and portfolio construction expertise that often exceeds the resources of individual investors or even smaller investment teams.

About the Author: Jonathan Kinlay, PhD is Principal Partner at Golden Bough Partners LLC, a quantitative proprietary trading firm, and managing partner of Intelligent Technologies. With experience as a finance professor at NYU Stern and Carnegie Mellon, he specializes in advanced portfolio construction, algorithmic trading systems, and quantitative risk management. His latest book, “Equity Analytics” (2024), explores modern approaches to market resilience. Jonathan works with select institutional clients and fintech ventures as a strategic advisor, helping them develop robust quantitative frameworks that deliver exceptional risk-adjusted returns. His proprietary trading systems have consistently achieved Sharpe ratios 2-3× industry benchmarks.

📬 Let’s Connect: Have you implemented scenario-based approaches in your investment process? What geopolitical risks are you positioning for? Share your thoughts in the comments or connect with me directly.

Disclaimer: This article is for informational purposes only and does not constitute investment advice. The performance figures presented are based on actual portfolios but may not be achievable for all investors. Always conduct your own research and consider your financial situation before making investment decisions.

July 17, 2024July 17, 2024

Night Trading

Key takeaways from “Night Trading: Higher Returns with Lower Risk”:

• Overnight returns show strong long-term persistence (up to 5 years!)
• Some stocks consistently outperform overnight
• Overnight trading strategies can be profitable even after costs
• Potential for lower risk AND higher returns for select stocks

The Overnight Bias Parameter (OBP) model, integrated into the Equities Entity Store, offers a powerful tool for identifying prime night trading opportunities.

The results? An OBP-based portfolio achieved an impressive 24.68% annual return from 1995-2021, with strong risk-adjusted performance (18.58%) and low market correlation (beta of 0.35).

This research challenges traditional risk-return paradigms and opens up exciting possibilities for savvy investors.

Want to learn more about leveraging these overnight return patterns? Check out the full presentation or visit the Equities Entity Store for real-time OBP data and analysis tools.

Night-Trading-Higher-Returns-with-Lower-Risk

June 21, 2024June 21, 2024

A Proposed Double Slit Experiment Using Momentum Entangled Photon Pairs

Double-Slit-Experiment-Using-Momentum-Entangled-Photon-Pairs

May 17, 2024

Advanced Course on Equity Analytics

Advanced-Course-on-Equity-Analytics

March 28, 2024

Optimal Mean-Reversion Strategies

Scenario Description

Consider a financial asset whose price, Xt, follows a mean-reverting stochastic process. A common model for mean reversion is the Ornstein-Uhlenbeck (OU) process, defined by the stochastic differential equation (SDE):

Objective

The trader aims to maximize the expected cumulative profit from trading this asset over a finite horizon, subject to transaction costs. The trader’s control is the rate of buying or selling the asset, denoted by ut, at time t.

Hamilton-Jacobi-Bellman (HJB) Equation

To find the optimal trading strategy, we frame this as a stochastic control problem. The value function,V(t,Xt), represents the maximum expected profit from time t to the end of the trading horizon, given the current price level Xt. The HJB equation for this problem is:

where C(ut) represents the cost of trading, which can depend on the rate of trading ut. The term ut(Xt−C(ut)) captures the profit from trading, adjusted for transaction costs.

Solution Approach

Boundary and Terminal Conditions: Specify terminal conditions for V(T,XT), where T is the end of the trading horizon, and boundary conditions for V(t,Xt) based on the problem setup.

Solve the HJB Equation: The solution involves finding the function V(t,Xt) and the control policy ut∗ that maximizes the HJB equation. This typically requires numerical methods, especially for complex cost functions or when closed-form solutions are not feasible.

Interpret the Optimal Policy: The optimal control ut∗ derived from solving the HJB equation indicates the optimal rate of trading (buying or selling) at any time t and price level Xt, considering the mean-reverting nature of the price and the impact of transaction costs.

Key Insights

No-Trade Zones: The presence of transaction costs often leads to the creation of no-trade zones in the optimal policy, where the expected benefit from trading does not outweigh the costs.

Mean-Reversion Exploitation: The optimal strategy exploits mean reversion by adjusting the trading rate based on the deviation of the current price from the mean level, μ.

The Lipton & Lopez de Marcos Paper

“A Closed-form Solution for Optimal Mean-reverting Trading Strategies” contributes significantly to the literature on optimal trading strategies for mean-reverting instruments. The paper focuses on deriving optimal trading strategies that maximize the Sharpe Ratio by solving the Hamilton-Jacobi-Bellman equation associated with the problem. It outlines a method that relies on solving a Fredholm integral equation to determine the optimal trading levels, taking into account transaction costs.

The paper begins by discussing the relevance of mean-reverting trading strategies across various markets, particularly emphasizing the energy market’s suitability for such strategies. It acknowledges the practical challenges and limitations of previous analytical results, mainly asymptotic and applicable to perpetual trading strategies, and highlights the novelty of addressing finite maturity strategies.

A key contribution of the paper is the development of an explicit formula for the Sharpe ratio in terms of stop-loss and take-profit levels, which allows traders to deploy tactical execution algorithms for optimal strategy performance under different market regimes. The methodology involves calibrating the Ornstein-Uhlenbeck process to market prices and optimizing the Sharpe ratio with respect to the defined levels. The authors present numerical results that illustrate the Sharpe ratio as a function of these levels for various parameters and discuss the implications of their findings for liquidity providers and statistical arbitrage traders.

The paper also reviews traditional approaches to similar problems, including the use of renewal theory and linear transaction costs, and compares these with its analytical framework. It concludes that its method provides a valuable tool for liquidity providers and traders to optimally execute their strategies, with practical applications beyond theoretical interest.

The authors use the path integral method to understand the behavior of their solutions, providing an alternative treatment to linear transaction costs that results in a determination of critical boundaries for trading. This approach is distinct in its use of direct solving methods for the Fredholm equation and adjusting the trading thresholds through a numerical method until a matching condition is met.

This research not only advances the understanding of optimal trading rules for mean-reverting strategies but also offers practical guidance for traders and liquidity providers in implementing these strategies effectively.

March 13, 2024March 13, 2024

The Misunderstood Art of Market Timing:

How to Beat Buy-and-Hold with Less Risk

What is Market Timing? – Common Misconceptions

Market timing has a very bad press and for good reason: the inherent randomness of markets makes reliable forecasting virtually impossible. So why even bother to write about it? The answer is, because market timing has been mischaracterized and misunderstood. It isn’t about forecasting. If fact, with notable exceptions, most of trading isn’t about forecasting. It’s about conditional expectations.

Conditional expectations refer to the expected value of a random variable (such as future stock returns) given certain known information or conditions.

In the context of trading and market timing, it means that rather than attempting to forecast absolute price levels, we base our expectations for future returns on current observable market conditions.

For example, let’s say historical data shows that when the market has declined a certain percentage from its recent highs (condition), forward returns over the next several days tend to be positive on average (expectation). A trading strategy could use this information to buy the dip when that condition is met, not because it is predicting that the market will rally, but because history suggests a favorable risk/reward ratio for that trade under those specific circumstances.

The key insight is that by focusing on conditional expectations, we don’t need to make absolute predictions about where the market is heading. We simply assess whether the present conditions have historically been associated with positive expected returns, and use that probabilistic edge to inform our trading decisions.

This is a more nuanced and realistic approach than binary forecasting, as it acknowledges the inherent uncertainty of markets while still allowing us to make intelligent, data-driven decisions. By aligning our trades with conditional expectations, we can put the odds in our favor without needing a crystal ball.

So, when a market timing algorithm suggests buying the market, it isn’t making a forecast about what the market is going to do next. Rather, what it is saying is, if the market behaves like this then, on past experience, the following trade is likely to be profitable. That is a very different thing from forecasting the market.

A good example of a simple market-timing algorithm is “buying the dips”. It’s so simple that you don’t need a computer algorithm to do it. But a computer algorithm helps by determining what comprises a dip and the level at which profits should be taken.

An Effective Market Timing Strategy

One of my favorites market timing strategies is the following algorithm, which I originally developed to trade the SPY ETF. The equity curve from inception of the ETF in 1993 looks like this:

The algorithm combines a few simple technical indicators to determine what constitutes a dip and the level at which profits should be taken. The entry and exit orders are also very straightforward, buying and selling at the market open, which can be achieved by participating in the opening auction. This is very convenient: a signal is generated after the close on day 1 and is then executed as a MOA (market opening auction) order in the opening auction on day 2. The opening auction is by no means the most liquid period of the trading session, but in an ETF like SPY the volumes are such that the market impact is likely to be negligible for the great majority of investors. This is not something you would attempt to do in an illiquid small-cap stock, however, where entries and exits are more reliably handled using a VWAP algorithm; but for any liquid ETF or large-cap stock the opening auction will typically be fine.

Adapting the Strategy to Other Assets and Markets

Another aspect that gives me confidence in the algorithm is that it generalizes well to other assets and even other markets. Here, for example, is the equity curve for the exact same algorithm implemented in the XLG ETF in the period from 2010:

And here is the equity curve for the same strategy (with the same parameters) in AAPL, over the same period:

Remarkably, the strategy also works in E-mini futures too, which is highly unusual: typically the market dynamics of the futures market are so different from the spot market that strategies don’t transfer well. But in this case, it simply works:

Understanding Why the Strategy Works

The reason the strategy is effective is due to the upward drift in equities and related derivatives. If you tried to apply a similar strategy to energy or currency markets, it would fail. The strategy’s “secret sauce” is the combination of indicators it uses to determine the short-term low in the ETF that constitutes a good buying opportunity, and then figure out the right level at which to sell.

Does the algorithm always work? If by that you mean “is every trade profitable?” the answer is no. Around 61% of trades are profitable, so there are many instances where trades are closed at a loss. But the net impact of using the market-timing algorithm is very positive, when compared to the buy-and-hold benchmark, as we shall see shortly.

Because the underlying thesis is so simple (i.e. equity markets have positive drift), we can say something about the long-term prospects for the strategy. Equity markets haven’t changed their fundamental tendency to appreciate over the 31-year period from inception of the SPY ETF in 1993, which is why the strategy has performed well throughout that time. Could one envisage market conditions in which the strategy will perform poorly? Yes – any prolonged period of flat to downward trending prices in equities will result in poor performance. But we haven’t seen those conditions since the early 1970’s and, arguably, they are unlikely to return, since the fundamental change brought about by abandonment of the gold standard in 1973.

The abandonment of the gold standard and the subsequent shift to fiat currencies has given central banks, particularly the U.S. Federal Reserve, unprecedented power to expand the money supply and support asset prices during times of crisis. This ‘Fed Put’ has been a major factor underpinning the multi-decade bull market in stocks.

In addition, the increasing dominance of the U.S. as the world’s primary economic and military superpower since the end of the Cold War has made U.S. financial assets a uniquely attractive destination for global capital, creating sustained demand for U.S. equities.

Technological innovation, particularly with respect to the internet and advances in computing, has also unleashed a wave of productivity and wealth creation that has disproportionately benefited the corporate sector and equity holders. This trend shows no signs of abating and may even be accelerating with the advent of artificial intelligence.

While risks certainly remain and occasional cyclical bear markets are inevitable, the combination of accommodative monetary policy, the U.S.’s global hegemony, and technological progress create a powerful set of economic forces that are likely to continue propelling equity prices higher over the long-term, albeit with significant volatility along the way.Strategy Performance in Bear Markets

Note that the conditions I am referring to are something unlike anything we have seen in the last 50 years, not just a (serious) market pullback. If we look at the returns in the period from 2000-2002, for example, we see that the strategy held up very well, out-performing the benchmark by 54% over the three-year period of the market crash. Likewise, in 2008 credit crisis, the strategy was able to eke out a small gain, beating the benchmark by over 38%. In fact, the strategy is positive in all but one of the 31 years from inception.

Comparing Performance to Buy-and-Hold

Let’s take a look at the compound returns from the strategy vs. the buy-and-hold benchmark:

At first sight, it appears that the benchmark significantly out-performs the strategy, albeit suffering from much larger drawdowns. But that doesn’t give an accurate picture of relative performance. To see why, let’s look at the overall performance characteristics:

Now we see that, while the strategy CAGR is 3.50% below the buy-and-hold return, its annual volatility is less than half that of the benchmark, giving the strategy a superior Sharpe Ratio.

Leveraging the Strategy to Enhance Risk-Adjusted Returns

To make a valid comparison between the strategy and its benchmark we therefore need to equalize the annual volatility of both, and we can achieve this by leveraging the strategy by a factor of approximately 2.32. When we do that, we obtain the following results:

Now that the strategy and benchmark volatilities have been approximately equalized through leverage, we see that the strategy substantially outperforms buy-and-hold by around 355 basis points per year and with far smaller drawdowns.

In general, we see that the strategy outperformed the benchmark in fewer than 50% of annual periods since 1993. However, the size of the outperformance in years when it beat the benchmark was frequently very substantial:

Conclusion

Market timing can work. To understand why, we need to stop thinking in terms of forecasting and think instead about conditional returns. When we do that, we arrive at the insight that market timing works because it relies on the positive drift in equity markets, which has been one of the central features of that market over the last 50 years and is likely to remain so in the foreseeable future. We have confidence in that prediction, because we understand the economic factors that have continued to drive the upward drift in equities over the last half-century.

After that, it is simply a question of the mechanics – how to time the entries and exits. This article describes just one approach amongst a great number of possibilities.

One of the many benefits of market timing is that it has a tendency to side-step the worst market conditions and can produce positive returns even in the most hostile environments: periods such as 2000-2002 and 2008, for example, as we have seen.

Finally, don’t forget that, as we are sitting out of the market approximately 40% of the time our overall risk is much lower – less than half that of the benchmark. So, we can afford to leverage our positions without taking on more overall risk than when we buy and hold. This clearly demonstrates the ability of the strategy to produce higher rates of risk-adjusted return.

March 11, 2024

A Two-Factor Model for Capturing Momentum and Mean Reversion in Stock Returns

Introduction:

Financial modeling has long sought to develop frameworks that accurately capture the complex dynamics of asset prices. Traditional models often focus on either momentum or mean reversion effects, struggling to incorporate both simultaneously. In this blog post, we introduce a two-factor model that aims to address this issue by integrating both momentum and mean reversion effects within the stochastic processes governing stock prices.

The Motivation Behind the Two-Factor Model:

The development of the two-factor model is motivated by the empirical observation that financial markets exhibit periods of persistent trends (momentum) and reversion to historical means or intrinsic values (mean reversion). Capturing both effects within a single framework has been a challenge in financial econometrics. The proposed model seeks to tackle this challenge by incorporating momentum and mean reversion effects within a unified framework.

The Building Blocks of the Two-Factor Model:

The two-factor model consists of two main components: a drift factor and a mean-reverting factor. The drift factor, denoted as d μ(t), represents the long-term trend or momentum of a stock’s price. It incorporates a constant drift parameter θ, reflecting the underlying direction driven by broader market forces or fundamental changes. The mean-reverting factor, denoted as d θt, captures the short-term deviations from the drift. It is characterized by a mean-reversion speed κ, which determines the rate at which prices revert to their long-term equilibrium following temporary fluctuations. These factors are influenced by their respective volatilities (σμ, σθ) and driven by correlated Wiener processes, allowing the model to reflect the interaction between momentum and mean reversion observed in markets

Empirical Application and Parameter Estimation:

To demonstrate the model’s application, the research applies the two-factor framework to daily returns data of Coca-Cola (KO) and PepsiCo (PEP) over a twenty-year period. This empirical analysis explores the model’s potential for informing pairs trading strategies. The parameter estimation process employs a maximum likelihood estimation (MLE) technique, adapted to handle the specifics of fitting a two-factor model to real-world data. This approach aims to ensure accuracy and adaptability, enabling the model to capture the evolving dynamics of the market.

Implications for Financial Modeling and Trading Strategies:

The introduction of the two-factor model contributes to the field of quantitative finance by providing a framework that incorporates both momentum and mean reversion effects. This approach can lead to a more comprehensive understanding of asset price dynamics, potentially benefiting risk management, asset allocation, and the development of trading strategies. The model’s insights may be particularly relevant for pairs trading, where identifying relative mispricings between related assets is important.

Conclusion:

The two-factor model presented in this blog post offers a new approach to financial modeling by integrating momentum and mean reversion effects. The model’s empirical application to Coca-Cola and PepsiCo demonstrates its potential for informing trading strategies. As quantitative finance continues to evolve, the two-factor model may prove to be a useful tool for researchers, practitioners, and investors seeking to understand the dynamics of financial markets.

Two-Factor-Model-of-Stock-Returns-ver_1_1

March 7, 2024March 7, 2024

Intelligent Technologies

Intelligent-Technologies-1

March 4, 2024March 4, 2024

High Frequency Statistical Arbitrage

High-frequency statistical arbitrage leverages sophisticated quantitative models and cutting-edge technology to exploit fleeting inefficiencies in global markets. Pioneered by hedge funds and proprietary trading firms over the last decade, the strategy identifies and capitalizes on sub-second price discrepancies across assets ranging from public equities to foreign exchange.

At its core, statistical arbitrage aims to predict short-term price movements based on probability theory and historical relationships. When implemented at high frequencies—microseconds or milliseconds—the quantitative models uncover trading opportunities unavailable to human traders. The predictive signals are then executable via automated, low-latency infrastructure.

These strategies thrive on speed. By getting pricing data faster, determining anomalies faster, and executing orders faster than the rest of the market, you expand the momentary windows to trade profitably.

Seminal papers have delved into the mathematical and technical nuances underpinning high-frequency statistical arbitrage. Zhaodong Zhong and Jian Wang’s 2014 paper develops stochastic models to quantify how market microstructure and randomness influence high-frequency trading outcomes. Samuel Wong’s 2018 research explores adapting statistical arbitrage for the nascent cryptocurrency markets.

Yet maximizing the strategy’s profitability poses an ongoing challenge. Changing market dynamics necessitate regular algorithm tweaking and infrastructure upgrades. It’s an arms race for lower latency and better predictive signals. Any edge gained disappears quickly as new firms implement similar systems. Regulatory attention also persists due to concerns over unintended impacts on market stability.

Nonetheless, high-frequency statistical arbitrage retains a crucial role for leading quant funds. Ongoing advances in machine learning, cloud computing, and execution technology promise to further empower the strategy. Though the competitive landscape grows more challenging, the cutting edge continues advancing profitably. Where human perception fails, automated high-frequency strategies recognize and seize value.

Implementing an Intraday Statistical Arbitrage Model

While HFT infrastructure and know-how are beyond the reach of most traders, it is possible to conceive of a system for pairs trading at moderate frequency, say 1-minute intervals.

We illustrate the approach with an algorithm that was originally showcased by Mathworks some years ago (but which has since slipped off the radar and is no longer available to download). I’ve amended the code to improve its efficiency, but the core idea remains the same: we conduct a rolling backtest in which data on a pair of assets, in this case spot prices of Brent Crude (LCO) and West Texas Intermediate (WTI), is subdivided into in-sample and out-of-sample periods of varying lengths. We seek to identify windows in which the price series are cointegrated in the sense of Engle-Granger and then apply the regression parameters to take long and short positions in the pair during the corresponding out-of-sample period. The idea is to trade only when there is compelling evidence of cointegration between the two series and to avoid trading at other times.

The critical part of the walk-forward analysis code is as shown below. Note we are using a function parametersweep to conduct a grid search across a range of in-sample dataset sizes to determine if the series are cointegrated (according to the Engle-Granger test) in that sub-period and, if so, determine the position size according to the regression parameters. The optimal in-sample parameters are then applied in the out-of-sample period and the performance results are recorded.

Here we are making use of Matlab’s parallelization capabilities, which work seamlessly to spread the processing load across available CPUs, handling the distribution of variables, function definitions and dependencies with ease. My experience with trying to parallelize Python, by contrast, is often a frustrating one that frequently fails at the first several attempts.

The results appear promising; however, the data is out-of-date, comes from a source that can be less than 100% reliable and may represent price quotes rather than traded prices. If we switch to 1-minute traded prices in a pair of stocks such as PEP and KO that are known to be cointegrated over long horizons, the outcome is very different:

Conclusion

High-frequency statistical arbitrage represents the convergence of cutting-edge technology and quantitative modeling to uncover fleeting trading advantages invisible to human market participants. This strategy has proven profitable for sophisticated hedge funds and prop shops, but also raises broader questions around fairness, regulation, and the future of finance.

However, the competitive edge gained from high-frequency strategies diminishes quickly as the technology diffuses across the industry. Firms must run faster just to stand still.

Continued advancement in machine learning, cloud computing, and execution infrastructure promises to expand the frontier. But practitioners and policymakers alike share responsibility for ensuring market integrity and stability amidst this technology arms race.

In conclusion, high-frequency statistical arbitrage remains essential to many leading quantitative firms, with the competitive landscape growing ever more challenging. Realizing the potential of emerging innovations, while promoting healthy markets that benefit all participants, will require both vision and wisdom. The path ahead lies between cooperation and competition, ethics and incentives. By bridging these domains, high-frequency strategies can contribute positively to financial evolution while capturing sustainable edge.

References:

Zhong, Zhaodong, and Jian Wang. “High-Frequency Trading and Probability Theory.” (2014).

Wong, Samuel S. Y. “A High-Frequency Algorithmic Trading Strategy for Cryptocurrency.” (2018).

Glossary

For those unfamiliar with the topic of statistical arbitrage and its commonly used terms and concepts, check out my book Equity Analytics, which covers the subject matter in considerable detail.

QUANTITATIVE RESEARCH AND TRADING

Posts

From Entities to Alphas: Launching the Python Version of the Equities Entity Store

Introduction

The Python Version: What’s New

Guaranteed Protection Against Lookahead Bias

Package Structure and Implementation

Why Start with Cross-Sectional ML?

Competitive Pricing and Value

Next Steps

Accessing the Python Equities Entity Store

Conclusion

Outperforming in Chaos: How Strategic Scenario Portfolios Are Beating the Market in 2025’s Geopolitical Storm

Night Trading

A Proposed Double Slit Experiment Using Momentum Entangled Photon Pairs

Advanced Course on Equity Analytics

Optimal Mean-Reversion Strategies

Scenario Description

Objective

Solution Approach

Key Insights

The Lipton & Lopez de Marcos Paper

The Misunderstood Art of Market Timing:

What is Market Timing? – Common Misconceptions

An Effective Market Timing Strategy

Adapting the Strategy to Other Assets and Markets

Understanding Why the Strategy Works

Comparing Performance to Buy-and-Hold

Leveraging the Strategy to Enhance Risk-Adjusted Returns

Conclusion

A Two-Factor Model for Capturing Momentum and Mean Reversion in Stock Returns

Intelligent Technologies

High Frequency Statistical Arbitrage

Implementing an Intraday Statistical Arbitrage Model

Conclusion

References:

Glossary