State-Space Models for Market Microstructure: Can Mamba Replace Transformers in High-Frequency Finance?

In my recent piece on Kronos, I explored how foundation models trained on K-line data are reshaping time series forecasting in finance. That discussion naturally raises a follow-up question that several readers have asked: what about the architecture itself? The Transformer has dominated deep learning for sequence modeling over the past seven years, but a new class of models — State-Space Models (SSMs), particularly the Mamba architecture — is gaining serious attention. In high-frequency trading, where computational efficiency and latency are everything, the claimed O(n) versus O(n²) complexity advantage is more than academic. It’s a potential competitive edge.

Let me be clear from the outset: I’m skeptical of any claim that a new architecture will “replace” Transformers wholesale. The Transformer ecosystem is mature, well-understood, and backed by enormous engineering investment. But in the specific context of market microstructure — where we process millions of tick events, model limit order book dynamics, and make decisions in microseconds — SSMs deserve serious examination. The question isn’t whether they can replace Transformers entirely, but whether they should be part of our toolkit for certain problems.

I’ve spent the better part of two decades building trading systems that push against latency constraints. I’ve watched the industry evolve from simple linear models to gradient boosted trees to deep learning, each wave promising revolutionary improvements. Most delivered incremental gains; some fizzled entirely. What’s interesting about SSMs isn’t the theoretical promise — we’ve seen theoretical promises before — but rather the practical characteristics that might actually matter in a production trading environment. The linear scaling, the constant-time inference, the selective attention mechanism — these aren’t just academic curiosities. They’re the exact properties that could determine whether a model makes it into a production system or dies in a research notebook.

What Are State-Space Models?

To understand why SSMs have suddenly become interesting, we need to go back to the mathematical foundations — and they’re older than you might think. State-space models originated in control theory and signal processing, describing systems where an internal state evolves over time according to differential equations, with observations emitted from that state. If you’ve used a Kalman filter — and in quant finance, many of us have — you’ve already worked with a simple state-space model, even if you didn’t call it that.

The canonical continuous-time formulation is:

\[x'(t) = Ax(t) + Bu(t)\]

\[y(t) = Cx(t) + Du(t)\]

where \(x(t)\) is the latent state vector, \(u(t)\) is the input, \(y(t)\) is the output, and \(A\), \(B\), \(C\), \(D\) are learned matrices. This looks remarkably like a Kalman filter — because it is, in essence, a nonlinear generalization of linear state estimation. The key difference from traditional time series models is that we’re learning the dynamics directly from data rather than specifying them parametrically. Instead of assuming variance follows a GARCH(1,1) process, we let the model discover what the underlying state evolution looks like.

The challenge, historically, was that computing these models was intractable for long sequences. The recurrent view requires iterating through each timestep sequentially; the convolutional view requires computing full convolutions that scale poorly. This is where the S4 model (Structured State Space Sequence) changed the game.

S4, introduced by Gu, Dao et al. (2022), brought three critical innovations. First, it used the HiPPO (High-order Polynomial Projection Operator) framework to initialize the state matrix \(A\) in a way that preserves long-range dependencies. Without proper initialization, SSMs suffer from the same vanishing gradient problems as RNNs. The HiPPO matrix is specifically designed so that when the model views a sequence, it can accurately represent all historical information without exponential decay. In financial terms, this means last month’s market dynamics can influence today’s predictions — something vanilla RNNs struggle with.

Author’s Take: This is the key innovation that makes SSMs viable for finance. Without HiPPO, you’d face the same vanishing-gradient failure mode that killed RNN research for decades. The HiPPO initialization is essentially a “warm start” that encodes the mathematical insight that recent history matters more than distant history — but distant history still matters. This is perfectly aligned with how financial markets work: last quarter’s regime still influences pricing, even if less than yesterday’s moves.

HiPPO provides a theoretically grounded initialization that allows the model to remember information from thousands of timesteps ago — critical for financial time series where last week’s patterns may be relevant to today’s dynamics. The mathematical insight is that HiPPO projects the input onto a basis of orthogonal polynomials, maintaining a compressed representation of the full history. This is conceptually similar to how we’d use PCA for dimensionality reduction, except it’s learned end-to-end as part of the model’s dynamics.

Second, S4 introduced structured parameterizations that enable efficient computation via diagonalization. Rather than storing full \(N \times N\) matrices where \(N\) is the state dimension, S4 uses structured forms that reduce memory and compute requirements while maintaining expressiveness. The key insight is that the state transition matrix \(A\) can be parameterized as a diagonal-plus-low-rank form that enables fast computation via FFT-based convolution. This is what gives S4 its computational advantage over traditional SSMs — the structured form turns the convolution from \(O(L^2)\) to \(O(L \log L)\).

Third, S4 discretizes the continuous-time model into a discrete-time representation suitable for implementation. The standard approach is zero-order hold (ZOH), which treats the input as constant between timesteps:

\[x_{k} = \bar{A}x_{k-1} + \bar{B}u_k\]

\[y_k = \bar{C}x_k + \bar{D}u_k\]

where \(\bar{A} = e^{A\Delta t}\), \(\bar{B} = (e^{A\Delta t} – I)A^{-1}B\), and similarly for \(\bar{C}\) and \(\bar{D}\). The bilinear transform is an alternative that can offer better frequency response in some settings:

Author’s Take: In practice, I’ve found ZOH (zero-order hold) works well for most tick-level data — it’s robust to the high-frequency microstructure noise that dominates at sub-second horizons. Bilinear can help if you’re modeling at longer horizons (minutes to hours) where you care more about capturing trend dynamics than filtering out tick-by-tick noise. This is another example of where domain knowledge beats blind architecture choices.

\[\bar{A} = (I + A\Delta t/2)(I – A\Delta t/2)^{-1}\]

Either way, the discretization bridges continuous-time system theory with discrete-time sequence modeling. The choice of discretization matters for financial applications because different discretization schemes have different frequency characteristics — bilinear transform tends to preserve low-frequency behavior better, which may be important for capturing long-term trends.

Mamba, introduced by Gu and Dao (2023) and winning best paper at ICLR 2024, added a fourth critical innovation: selective state spaces. The core insight is that not all input information is equally relevant at all times. In a financial context, during calm markets, we might want to ignore most order flow noise and focus on price levels; during a news event or volatility spike, we want to attend to everything. Mamba introduces a selection mechanism that allows the model to dynamically weigh which inputs matter:

\[s_t = \text{select}(u_t)\]

\[\bar{B}_t = \text{Linear}_B(s_t)\]

\[\bar{C}_t = \text{Linear}_C(s_t)\]

The select operation is implemented as a learned projection that determines which elements of the input to filter. This is fundamentally different from attention — rather than computing pairwise similarities between all tokens, the model learns a function that decides what information to carry forward. In practice, this means Mamba can learn to “ignore” regime-irrelevant data while attending to regime-critical signals.

This selectivity, combined with an efficient parallel scan algorithm (often called S6), gives Mamba its claimed linear-time inference while maintaining the ability to capture complex dependencies. The complexity comparison is stark: Transformers require \(O(L^2)\) attention computations for sequence length \(L\), while Mamba processes each token in \(O(1)\) time with \(O(L)\) total computation. For \(L = 10,000\) ticks — a not-unreasonable window for intraday analysis — that’s \(10^8\) versus \(10^4\) operations per layer. The practical implication is either dramatically faster inference or the ability to process much longer sequences for the same compute budget. On modern GPUs, this translates to milliseconds versus tens of milliseconds for a forward pass — a difference that matters when you’re making hundreds of predictions per second.

Compared to RNNs like LSTMs, SSMs don’t suffer from the same sequential computation bottleneck during training. While LSTMs must process tokens one at a time (true parallelization is limited), SSMs can be computed as convolutions during training, enabling GPU parallelism. During inference, SSMs achieve the constant-time-per-token property that makes them attractive for production deployment. This is the key advantage over LSTMs — you get the sequential processing benefits of RNNs during inference with the parallel training benefits of CNNs.

Why HFT and Market Microstructure?

If you’re building trading systems, you’ve likely noticed that most machine learning approaches to finance treat the problem as either (a) predicting returns at some horizon, or (b) classifying market regimes. Neither approach explicitly models the underlying mechanism that generates prices. Market microstructure does exactly that — it models how orders arrive, how limit order books evolve, how informed traders interact with liquidity providers, and how information gets incorporated into prices. Understanding microstructure isn’t just academic — it’s the foundation of profitable execution and market-making strategies.

The data characteristics of market microstructure create unique challenges that make SSMs potentially attractive:

Scale: A single liquid equity can generate millions of messages per day across bid, ask, and depth levels. Consider a highly traded stock like Tesla or Nvidia during volatile periods — you might see 50-100 messages per second, per instrument. A typical algo trading firm’s data pipeline might ingest 50-100GB of raw tick data daily across their coverage universe. Processing this with Transformer models is expensive. The quadratic attention complexity means that doubling your context length quadruples your compute cost. With SSMs, you double context and roughly double compute — a much friendlier scaling curve. This is particularly important when you’re building models that need to see significant historical context to make predictions.

Non-stationarity: Market microstructure is inherently non-stationary. The dynamics of a limit order book during normal trading differ fundamentally from those during a market open, a regulatory halt, or a volatility auction. At market open, you have a flood of overnight orders, wide spreads, and rapid price discovery. During a halt, trading stops entirely and the book freezes. In volatility auctions, you see large price movements with reduced liquidity. Mamba’s selective mechanism is specifically designed to handle this — the model can learn to “switch off” irrelevant inputs when market conditions change. This is conceptually similar to regime-switching models in econometrics, but learned end-to-end. The model learns when to attend to order flow dynamics and when to ignore them based on learned signals.

Latency constraints: In market-making or latency-sensitive strategies, every microsecond counts. A Transformer processing a 512-token sequence might require 262,144 attention operations. Mamba processes the same sequence in roughly 512 state updates — a 500x reduction in per-token operations. While the constants differ (SSM state dimension adds overhead), the theoretical advantage is substantial. Several practitioners I’ve spoken with report sub-10ms inference times for Mamba models that would be impractical with Transformers at the same context length. For comparison, a typical market-making strategy might have a 100-microsecond latency budget for the entire decision pipeline — inference must be measured in microseconds, not milliseconds.

Long-range dependencies: Consider a statistical arbitrage strategy across 100 stocks. A regulatory announcement at 9:30 AM might affect correlations across the entire universe until midday. Capturing this requires modeling dependencies across thousands of timesteps. The HiPPO initialization in S4 and the selective mechanism in Mamba are specifically designed to maintain information flow over such horizons — something vanilla RNNs struggle with due to gradient decay. In practice, this means you can build models that truly “remember” what happened earlier in the trading session, not just what happened in the last few minutes.

There’s also a subtler point worth mentioning: the order book itself is a form of state. When you look at the bid-ask ladder, you’re seeing a snapshot of accumulated order flow — the current state reflects all historical interactions. SSMs are naturally suited to modeling stateful systems because that’s literally what they are. The latent state \(x(t)\) in the state equation can be interpreted as an embedding of the current market state, learned from data rather than specified by theory. This is philosophically aligned with how we think about market microstructure: the order book is a state variable, and the messages are observations that update that state.

Recent Research and Results

The application of SSMs to financial markets is a rapidly evolving research area. Let me survey what’s been published, with appropriate skepticism about early-stage results. The key papers worth noting span both the SSM methodology and the finance-specific applications.

On the methodology side, S4 (Gu, Johnson et al., 2022) established the foundation by demonstrating that structured state spaces could match or exceed Transformers on long-range arena benchmarks while maintaining linear computation. The Mamba paper (Gu and Dao, 2023) pushed further by introducing selective state spaces and achieving state-of-the-art results on language modeling benchmarks — remarkable because it suggested SSMs could compete with Transformers on tasks previously dominated by attention. The follow-up work on Mamba 2 (Dao and Gu, 2024) introduced structured state space duals, further improving efficiency.

On the application side, CryptoMamba (Shi et al., 2025) applied Mamba to Bitcoin price prediction, demonstrating “effective capture of long-range dependencies” in cryptocurrency time series. The authors report competitive performance against LSTM and Transformer baselines on several prediction horizons. The cryptocurrency market, with its 24/7 trading and higher noise-to-signal ratio than traditional equities, provides an interesting test case for SSMs’ ability to handle extreme non-stationarity. The paper’s methodology section shows that Mamba’s selective mechanism successfully learned to filter out noise during calm periods while attending to significant price movements — exactly what we’d hope to see.

MambaStock (Liu et al., 2024) adapted the Mamba architecture specifically for stock prediction, introducing modifications to handle the multi-dimensional nature of financial features (price, volume, technical indicators). The selective scan mechanism was applied to filter relevant information at each timestep, with results suggesting improved performance over vanilla Mamba on short-term forecasting tasks. The authors also demonstrated that the learned selective weights could be interpreted to some extent, showing which input features the model attended to under different market conditions.

Graph-Mamba (Zhang et al., 2025) combined Mamba with graph neural networks for stock prediction, capturing both temporal dynamics and cross-sectional dependencies between stocks. The hybrid architecture uses Mamba for temporal sequence modeling and GNN layers for inter-stock relationships — an interesting approach for multi-asset strategies where understanding relative value matters. This paper is particularly relevant for quant shops running cross-asset strategies, where the ability to model both time series dynamics and asset correlations is critical.

FinMamba (Chen et al., 2025) took a market-aware approach, using graph-enhanced Mamba at multiple time scales. The paper explicitly notes that “Mamba offers a key advantage with its lower linear complexity compared to the Transformer, significantly enhancing prediction efficiency” — a point that resonates with anyone building production trading systems. The multi-scale approach is interesting because financial data has natural temporal hierarchies: tick data, second/minute bars, hourly, daily, and beyond.

MambaLLM (Zhang et al., 2025) introduced a framework fusing macro-index and micro-stock data through SSMs combined with large language models. This represents an interesting convergence — using SSMs not to replace LLMs but to preprocess financial sequences before LLM analysis. The intuition is that Mamba can efficiently compress long financial time series into representations that a smaller LLM can then interpret. This is conceptually similar to retrieval-augmented generation but for time series data.

Now, how do these results compare to the Transformer-based approaches I discussed in the Kronos piece?

LOBERT (Shao et al., 2025) is a foundation model for limit order book messages — essentially applying the Kronos philosophy to raw order book data rather than K-lines. Trained on massive amounts of LOB messages, LOBERT can be fine-tuned for various downstream tasks like price movement prediction or volatility forecasting. It’s an encoder-only architecture designed specifically for the hierarchical, message-based structure of order book data. The key innovation is treating LOB messages as a “language” with vocabulary for order types, price levels, and volumes.

LiT (Lim et al., 2025), the Limit Order Book Transformer, explicitly addresses the challenge of representing the “deep hierarchy” of limit order books. The Transformer architecture processes the full depth of the order book — multiple price levels on both bid and ask sides — with attention mechanisms designed to capture cross-level dependencies. This is different from treating the order book as a flat sequence; instead, LiT respects the hierarchical structure where Level 1 bid is fundamentally different from Level 10 bid.

The comparison is instructive. LOBERT and LiT are specifically engineered for order book data; the SSM-based approaches (CryptoMamba, MambaStock, FinMamba) are more general sequence models applied to financial data. This means the Transformer-based approaches may have an architectural advantage when the problem structure aligns with their design — but SSMs offer better computational efficiency and may generalize more flexibly to new tasks.

What about direct head-to-head comparisons? The evidence is still thin. Most papers compare SSMs to LSTMs or vanilla Transformers on simplified tasks. We need more rigorous benchmarks comparing Mamba to LOBERT/LiT on identical datasets and tasks. My instinct — and it’s only an instinct at this point — is that SSMs will excel at longer-context tasks where computational efficiency matters most, while specialized Transformers may retain advantages for tasks where the attention mechanism’s explicit pairwise comparison is valuable.

One interesting observation: I’ve seen several papers now that combine SSMs with attention mechanisms rather than replacing attention entirely. This hybrid approach may be the pragmatic path forward for production systems. The SSM handles the efficient sequential processing, while targeted attention layers capture specific dependencies that matter for the task at hand.

Practical Implementation Considerations

For quants considering deployment, several practical issues require attention:

Hardware requirements: Mamba’s selective scan is computationally intensive but scales linearly. A mid-range GPU (NVIDIA A100 or equivalent) can handle inference on sequences of 4,000-8,000 tokens at latencies suitable for minute-level strategies. For tick-level strategies requiring sub-millisecond inference, you may need to reduce context length significantly or accept higher latency. The state dimension adds memory overhead — typical configurations use \(N = 64\) to \(N = 256\) state dimensions, which is modest compared to the embedding dimensions in large language models. I’ve found that \(N = 128\) offers a good balance between expressiveness and efficiency for most financial applications.

Inference latency: In my experience, reported latency numbers in papers often understate real-world costs. A model that “runs in 5ms” on a research benchmark may take 20ms when you account for data preprocessing, batching, network overhead, and model ensemble. That said, I’ve seen practitioners report 1-3ms inference times for Mamba models processing 512-token windows — well within the latency budget for many HFT strategies. Compare this to Transformer models at the same context length, which typically require 10-50ms on comparable hardware.

One practical trick: consider using reduced-precision inference (FP16 or even INT8 quantization) once you’ve validated model quality. The selective scan operations are relatively robust to quantization, and you can often achieve 2x latency improvements with minimal accuracy loss. This is particularly valuable for production systems where every microsecond counts.

Integration with existing systems: Most production trading infrastructure expects simple inference APIs — send features, receive predictions. Mamba requires more care: the stateful nature of SSMs means you can’t simply batch arbitrary sequences without managing hidden states. This is manageable but requires engineering effort. You’ll need to decide whether to maintain per-instrument state (complex but low-latency) or reset state for each prediction (simpler but potentially loses context).

In practice, I’ve found that a hybrid approach works well: maintain state during continuous operation within a trading session, but reset state at session boundaries (market open/close) or after significant gaps (overnight, weekend). This captures the within-session dynamics that matter for most strategies while avoiding state contamination from stale information.

Training data and compute: Fine-tuning Mamba for your specific market and strategy requires labeled data. Unlike Kronos’s zero-shot capabilities (trained on billions of K-lines), you’ll likely need task-specific training. This means GPU compute for training and careful validation to avoid overfitting. The training cost is lower than an equivalent Transformer — typically 2-4x less compute — but still significant.

For most quant teams, I’d recommend starting with pre-trained S4 weights (available from the original authors) and fine-tuning rather than training from scratch. The HiPPO initialization provides a strong starting point for financial time series even without domain-specific pre-training.

Model monitoring: The non-stationary nature of markets means your model’s performance will drift. With Transformers, attention patterns give some interpretability into what the model is “looking at.” With Mamba, the selective mechanism is less transparent. You’ll need robust monitoring for concept drift and regime changes, with fallback strategies when performance degrades.

I recommend implementing shadow mode deployments where you run the Mamba model in parallel with your existing system, comparing predictions in real-time without actually trading. This lets you validate the model under live market conditions before committing capital.

Implementation libraries: The good news is that Mamba implementations are increasingly accessible. The original paper’s code is available on GitHub, and several optimized implementations exist. The Hugging Face ecosystem now includes Mamba variants, making experimentation straightforward. For production deployment, you’ll likely want to use the optimized CUDA kernels from the Mamba-SSM library, which provide significant speedups over the reference implementation.

Limitations and Open Questions

Let me be direct about what we don’t yet know:

The Quant’s Reality Check: Critical Questions for Production

Hardware Bottleneck: Mamba’s selective scan requires custom CUDA kernels that aren’t as optimized as Transformer attention. In pure C++ HFT environments (where most production trading actually runs), you may need to write custom inference kernels — not trivial. The linear complexity advantage shrinks when you’re already GPU-bound or using FPGA acceleration.

Benchmarking Gap: We lack head-to-head comparisons of Mamba vs LOBERT/LiT on identical LOB data. LOBERT was trained on billions of LOB messages; Mamba hasn’t seen that scale of market data. The “fair fight” comparison hasn’t been run yet.

Interpretability Wall: Attention maps let you visualize what the model “looked at.” Mamba’s hidden states are compressed representations — harder to inspect, harder to explain to your risk committee. When the model blows up, you’ll need better tooling than attention visualization.

Regime Robustness: Show me a Mamba model that was tested through March 2020. I haven’t seen it. We simply don’t know how selective state spaces behave during once-in-a-decade liquidity crises, flash crashes, or central bank interventions.

Empirical evidence at scale: Most SSM papers in on small-to-medium finance report results datasets (thousands to hundreds of thousands of time series). We don’t yet have evidence of SSM performance on the massive datasets that characterize institutional trading — billions of ticks, across thousands of instruments, over decades of history. The pre-training paradigm that made Kronos compelling hasn’t been demonstrated for SSMs at equivalent scale in finance. This is probably the biggest gap in the current research landscape.

Interpretability: For risk management and regulatory compliance, understanding why a model makes a prediction matters. Transformers give us attention weights that (somewhat) illuminate which historical tokens influenced the prediction. Mamba’s hidden states are less interpretable. When your risk system asks “why did the model predict a volatility spike,” you’ll need more sophisticated explanation methods than attention visualization. Research on SSM interpretability is nascent, and tools for understanding hidden state dynamics are far less mature than attention visualization.

Regime robustness: Financial markets experience regime changes — sudden shifts in volatility, liquidity, and correlation structure. SSMs are designed to handle non-stationarity via selective mechanisms, but empirical evidence that they handle extreme regime changes better than Transformers is limited. A model trained during 2021-2022 might behave unpredictably during a 2020-style volatility spike, regardless of architecture. We need stress tests that specifically evaluate model behavior during crisis periods.

Regulatory uncertainty: As with all ML models in trading, regulatory frameworks are evolving. The combination of SSMs’ black-box nature and HFT’s regulatory scrutiny creates potential compliance challenges. Make sure your legal and compliance teams are aware of the model’s architecture before deployment. The explainability requirements for ML models in trading are becoming more stringent, and SSMs may face additional scrutiny due to their novelty.

Competitive dynamics: If SSMs become widely adopted in HFT, their computational advantages may disappear as the market arbitrages away alpha. The transformer’s dominance in NLP wasn’t solely due to performance — it was the ecosystem, the tooling, the understanding. SSMs are early in this curve. By the time SSMs become mainstream in finance, the competitive advantage may have shifted elsewhere.

Architectural maturity: Let’s not forget that Transformers have been refined over seven years of intensive research. Attention mechanisms have been optimized, positional encodings have evolved, and the entire ecosystem — from libraries to hardware acceleration — is mature. SSMs are at version 1.0. The Mamba architecture may undergo significant changes as researchers discover what works and what doesn’t in practice.

Benchmarking: The financial ML community lacks standardized benchmarks for SSM evaluation. Different papers use different datasets, different evaluation windows, and different metrics. This makes comparison difficult. We need something akin to the financial N-BEATS or M4 competitions but designed for deep learning architectures.

Conclusion: A Pragmatic Hybrid View

The question “Can Mamba replace Transformers?” is the wrong frame. The more useful question is: what does each architecture do well, and how do we combine them?

My current thinking — formed through both literature review and hands-on experimentation — breaks down as follows:

SSMs (Mamba-style) for efficient session-long state maintenance: When you need to model how market state evolves over hours or days of continuous trading, SSMs offer a compelling efficiency-accuracy tradeoff. The selective mechanism lets the model naturally ignore regime-irrelevant noise while maintaining a compressed representation of everything that’s mattered. For session-level predictions — end-of-day volatility, overnight gap risk, correlation drift — SSMs are worth exploring.

Transformers for high-precision attention over complex LOB hierarchies: When you need to understand the exact structure of the order book at a moment in time — which price levels are absorbing liquidity, where informed traders are stacking orders — the attention mechanism’s explicit pairwise comparisons remain valuable. Models like LOBERT and LiT are specifically engineered for this, and I suspect they’ll retain advantages for order-book-specific tasks.

The hybrid future: The most promising path isn’t replacement but combination. Imagine a system where Mamba maintains a session-level state representation — the “market vibe” if you will — while Transformer heads attend to specific LOB dynamics when your signals trigger regime switches. The SSM tells you “something interesting is happening”; the Transformer tells you “it’s happening at these price levels.”

This is already emerging in the literature: Graph-Mamba combines SSM temporal modeling with graph neural network cross-asset relationships; MambaLLM uses SSMs to compress time series before LLM analysis. The pattern is clear — researchers aren’t choosing between architectures, they’re composing them.

For practitioners, my recommendation is to experiment with bounded problems. Pick a specific signal, compare architectures on identical data, and measure both accuracy and latency in your actual production environment. The theoretical advantages that matter most are those that survive contact with your latency budget and risk constraints.

The post-Transformer era isn’t about replacement — it’s about selection. Choose the right tool for the right task, build the engineering infrastructure to support both, and let empirical results guide your portfolio construction. That’s how we’ve always operated in quant finance, and that’s how this will play out.

I’m continuing to experiment. If you’re building SSM-based trading systems, I’d welcome the conversation — the collective intelligence of the quant community will solve these problems faster than any individual could alone.

References

  1. Gu, A., & Dao, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv preprint arXiv:2312.00752. https://arxiv.org/abs/2312.00752
  2. Gu, A., Goel, K., & Ré, C. (2022). Efficiently Modeling Long Sequences with Structured State Spaces. In International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=uYLFoz1vlAC
  3. Linna, E., et al. (2025). LOBERT: Generative AI Foundation Model for Limit Order Book Messages. arXiv preprint arXiv:2511.12563. https://arxiv.org/abs/2511.12563
  4. (2025). LiT: Limit Order Book Transformer. Frontiers in Artificial Intelligence. https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1616485/full
  5. Avellaneda, M., & Stoikov, S. (2008). High-frequency trading in a limit order book. Quantitative Finance, 8(3), 217–224. (Manuscript PDF) https://people.orie.cornell.edu/sfs33/LimitOrderBook.pdf

High Frequency Trading with ADL – JonathanKinlay.com

Trading Technologies’ ADL is a visual programming language designed specifically for trading strategy development that is integrated in the company’s flagship XTrader product. ADL Extract2 Despite the radically different programming philosophy, my experience of working with ADL has been delightfully easy and strategies that would typically take many months of coding in C++ have been up and running in a matter of days or weeks.  An extract of one such strategy, a high frequency scalping trade in the E-Mini S&P 500 futures, is shown in the graphic above.  The interface and visual language is so intuitive to a trading system developer that even someone who has never seen ADL before can quickly grasp at least some of what it happening in the code.

Strategy Development in Low vs. High-Level Languages
What are the benefits of using a high level language like ADL compared to programming languages like C++/C# or Java that are traditionally used for trading system development?  The chief advantage is speed of development:  I would say that ADL offers the potential up the development process by at least one order of magnitude.  A complex trading system would otherwise take months or even years to code and test in C++ or Java, can be implemented successfully and put into production in a matter of weeks in ADL. In this regard, the advantage of speed of development is one shared by many high level languages, including, for example, Matlab, R and Mathematica.  But in ADL’s case the advantage in terms of time to implementation is aided by the fact that, unlike generalist tools such as MatLab, etc, ADL is designed specifically for trading system development.  The ADL development environment comes equipped with compiled pre-built blocks designed to accomplish many of the common tasks associated with any trading system such as acquiring market data and handling orders.  Even complex spread trades can be developed extremely quickly due to the very comprehensive library of pre-built blocks.

SSALGOTRADING AD

Integrating Research and Development
One of the drawbacks of using a higher  level language for building trading systems is that, being interpreted rather than compiled, they are simply too slow – one or more orders of magnitude, typically – to be suitable for high frequency trading.  I will come on to discuss the execution speed issue a little later.  For now, let me bring up a second major advantage of ADL relative to other high level languages, as I see it.  One of the issues that plagues trading system development is the difficulty of communication between researchers, who understand financial markets well, but systems architecture and design rather less so, and developers, whose skill set lies in design and programming, but whose knowledge of markets can often be sketchy.  These difficulties are heightened where researchers might be using a high level language and relying on developers to re-code their prototype system  to get it into production.  Developers  typically (and understandably) demand a high degree of specificity about the requirement and if it’s not included in the spec it won’t be in the final deliverable.  Unfortunately, developing a successful trading system is a highly non-linear process and a researcher will typically have to iterate around the core idea repeatedly until they find a combination of alpha signal and entry/exit logic that works.  In other words, researchers need flexibility, whereas developers require specificity. ADL helps address this issue by providing a development environment that is at once highly flexible and at the same time powerful enough to meet the demands of high frequency trading in a production environment.  It means that, in theory, researchers and developers can speak a common language and use a common tool throughout the R&D development cycle.  This is likely to reduce the kind of misunderstanding between researchers and developers that commonly arise (often setting back the implementation schedule significantly when they do).

Latency
Of course,  at least some of the theoretical benefit of using ADL depends on execution speed.  The way the problem is typically addressed with systems developed in high level languages like Matlab or R is to recode the entire system in something like C++, or to recode some of the most critical elements and plug those back into the main Matlab program as dlls.  The latter approach works, and preserves the most important benefits of working in both high and low level languages, but the resulting system is likely to be sub-optimal and can be difficult to maintain. The approach taken by Trading Technologies with ADL is very different.  Firstly,  the component blocks are written in  C# and in compiled form should run about as fast as native code.  Secondly, systems written in ADL can be deployed immediately on a co-located algo server that is plugged directly into the exchange, thereby reducing latency to an acceptable level.  While this is unlikely to sufficient for an ultra-high frequency system operating on the sub-millisecond level, it will probably suffice for high frequency systems that operate at speeds above above a few millisecs, trading up to say, around 100 times a day.

Fill Rate and Toxic Flow
For those not familiar with the HFT territory, let me provide an example of why the issues of execution speed and latency are so important.  Below is a simulated performance record for a HFT system in ES futures.  The system is designed to enter and exit using limit orders and trades around 120 times a day, with over 98% profitability, if we assume a 100% fill rate. Monthly PNL 1 Perf Summary 1  So far so good.  But  a 100% fill rate  is clearly unrealistic.  Let’s look at a pessimistic scenario: what if we  got filled on orders only when the limit price was exceeded?  (For those familiar with the jargon, we are assuming a high level of flow toxicity)  The outcome is rather different: Perf Summary 2 Neither scenario is particularly realistic, but the outcome is much more likely to be closer to the second scenario rather than the first if we our execution speed is slow, or if we are using a retail platform such as Interactive Brokers or Tradestation, with long latency wait times.  The reason is simple: our orders will always arrive late and join the limit order book at the back of the queue.  In most cases the orders ahead of ours will exhaust demand at the specified limit price and the market will trade away without filling our order.  At other times the market will fill our order whenever there is a large flow against us (i.e. a surge of sell orders into our limit buy), i.e. when there is significant toxic flow. The proposition is that, using ADL and the its high-speed trading infrastructure, we can hope to avoid the latter outcome.  While we will never come close to achieving a 100% fill rate, we may come close enough to offset the inevitable losses from toxic flow and produce a decent return.  Whether ADL is capable of fulfilling that potential remains to be seen.

More on ADL
For more information on ADL go here.

The Mathematics of Scalping

NOTE:  if you are unable to see the Mathematica models below, you can download the free Wolfram CDF player and you may also need this plug-in.

You can also download the complete Mathematica CDF file here.

In this post I want to explore aspects of scalping, a type of strategy widely utilized by high frequency trading firms.

I will define a scalping strategy as one in which we seek to take small profits by posting limit orders on alternate side of the book. Scalping, as I define it, is a strategy rather like market making, except that we “lean” on one side of the book. So, at any given time, we may have a long bias and so look to enter with a limit buy order. If this is filled, we will then look to exit with a subsequent limit sell order, taking a profit of a few ticks. Conversely, we may enter with a limit sell order and look to exit with a limit buy order.
The strategy relies on two critical factors:

(i) the alpha signal which tells us from moment to moment whether we should prefer to be long or short
(ii) the execution strategy, or “trade expression”

In this article I want to focus on the latter, making the assumption that we have some kind of alpha generation model already in place (more about this in later posts).

There are several means that a trader can use to enter a position. The simplest approach, the one we will be considering here, is simply to place a single limit order at or just outside the inside bid/ask prices – so in other words we will be looking to buy on the bid and sell on the ask (and hoping to earn the bid-ask spread, at least).

SSALGOTRADING AD

One of the problems with this approach is that it is highly latency sensitive. Limit orders join the limit order book at the back of the queue and slowly works their way towards the front, as earlier orders get filled. Buy the time the market gets around to your limit buy order, there may be no more sellers at that price. In that case the market trades away, a higher bid comes in and supersedes your order, and you don’t get filled. Conversely, yours may be one of the last orders to get filled, after which the market trades down to a lower bid and your position is immediately under water.

This simplistic model explains why latency is such a concern – you want to get as near to the front of the queue as you can, as quickly as possible. You do this by minimizing the time it takes to issue and order and get it into the limit order book. That entails both hardware (co-located servers, fiber-optic connections) and software optimization and typically also involves the use of Immediate or Cancel (IOC) orders. The use of IOC orders by HFT firms to gain order priority is highly controversial and is seen as gaming the system by traditional investors, who may end up paying higher prices as a result.

Another approach is to layer limit orders at price points up and down the order book, establishing priority long before the market trades there. Order layering is a highly complex execution strategy that brings addition complications.

Let’s confine ourselves to considering the single limit order, the type of order available to any trader using a standard retail platform.

As I have explained, we are assuming here that, at any point in time, you know whether you prefer to be long or short, and therefore whether you want to place a bid or an offer. The issue is, at what price do you place your order, and what do you do about limiting your risk? In other words, we are discussing profit targets and stop losses, which, of course, are all about risk and return.

Risk and Return in Scalping

Lets start by considering risk. The biggest risk to a scalper is that, once filled, the market goes against his position until he is obliged to trigger his stop loss. If he sets his stop loss too tight, he may be forced to exit positions that are initially unprofitable, but which would have recovered and shown a profit if he had not exited. Conversely,  if he sets the stop loss too loose, the risk reward ratio is very low – a single loss-making trade could eradicate the profit from a large number of smaller, profitable trades.

Now lets think about reward. If the trader is too ambitious in setting his profit target he may never get to realize the gains his position is showing – the market could reverse, leaving him with a loss on a position that was, initially, profitable. Conversely, if he sets the target too tight, the trader may give up too much potential in a winning trade to overcome the effects of the occasional, large loss.

It’s clear that these are critical concerns for a scalper: indeed the trade exit rules are just as important, or even more important, than the entry rules. So how should he proceed?

Theoretical Framework for Scalping

Let’s make the rather heroic assumption that market returns are Normally distributed (in fact, we know from empirical research that they are not – but this is a starting point, at least). And let’s assume for the moment that our trader has been filled on a limit buy order and is looking to decide where to place his profit target and stop loss limit orders. Given a current price of the underlying security of X, the scalper is seeking to determine the profit target of p ticks and the stop loss level of q ticks that will determine the prices at which he should post his limit orders to exit the trade. We can translate these into returns, as follows:

to the upside: Ru = Ln[X+p] – Ln[X]

and to the downside: Rd = Ln[X-q] – Ln[X]

This situation is illustrated in the chart below.

Normal Distn Shaded

The profitable area is the shaded region on the RHS of the distribution. If the market trades at this price or higher, we will make money: p ticks, less trading fees and commissions, to be precise. Conversely we lose q ticks (plus commissions) if the market trades in the region shaded on the LHS of the distribution.

Under our assumptions, the probability of ending up in the RHS shaded region is:

probWin = 1 – NormalCDF(Ru, mu, sigma),

where mu and sigma are the mean and standard deviation of the distribution.

The probability of losing money, i.e. the shaded area in the LHS of the distribution, is given by:

probLoss = NormalCDF(Rd, mu, sigma),

where NormalCDF is the cumulative distribution function of the Gaussian distribution.

The expected profit from the trade is therefore:

Expected profit = p * probWin – q * probLoss

And the expected win rate, the proportion of profitable trades, is given by:

WinRate = probWin / (probWin + probLoss)

If we set a stretch profit target, then p will be large, and probWin, the shaded region on the RHS of the distribution, will be small, so our winRate will be low. Under this scenario we would have a low probability of a large gain. Conversely, if we set p to, say, 1 tick, and our stop loss q to, say, 20 ticks, the shaded region on the RHS will represent close to half of the probability density, while the shaded LHS will encompass only around 5%. Our win rate in that case would be of the order of 91%:

WinRate = 50% / (50% + 5%) = 91%

Under this scenario, we make frequent, small profits  and suffer the occasional large loss.

So the critical question is: how do we pick p and q, our profit target and stop loss?  Does it matter?  What should the decision depend on?

Modeling Scalping Strategies

We can begin to address these questions by noticing, as we have already seen, that there is a trade-off between the size of profit we are hoping to make, and the size of loss we are willing to tolerate, and the probability of that gain or loss arising.  Those probabilities in turn depend on the underlying probability distribution, assumed here to be Gaussian.

Now, the Normal or Gaussian distribution which determines the probabilities of winning or losing at different price levels has two parameters – the mean, mu, or drift of the returns process and sigma, its volatility.

Over short time intervals the effect of volatility outweigh any impact from drift by orders of magnitude.  The reason for this is simple:  volatility scales with the square root of time, while the drift scales linearly.  Over small time intervals, the drift becomes un-noticeably small, compared to the process volatility.  Hence we can assume that mu, the process mean is zero, without concern, and focus exclusively on sigma, the volatility.

What other factors do we need to consider?  Well there is a minimum price move, which might be 1 tick, and the dollar value of that tick, from which we can derive our upside and downside returns, Ru and Rd.  And, finally, we need to factor in commissions and exchange fees into our net trade P&L.

Here’s a simple formulation of the model, in which I am using the E-mini futures contract as an exemplar.

 WinRate[currentPrice_,annualVolatility_,BarSizeMins_, nTicksPT_, nTicksSL_,minMove_, tickValue_, costContract_]:=Module[{ nMinsPerDay, periodVolatility, tgtReturn, slReturn,tgtDollar, slDollar, probWin, probLoss, winRate, expWinDollar, expLossDollar, expProfit},
nMinsPerDay = 250*6.5*60;
periodVolatility = annualVolatility / Sqrt[nMinsPerDay/BarSizeMins];
tgtReturn=nTicksPT*minMove/currentPrice;tgtDollar = nTicksPT * tickValue;
slReturn = nTicksSL*minMove/currentPrice;
slDollar=nTicksSL*tickValue;
probWin=1-CDF[NormalDistribution[0, periodVolatility],tgtReturn];
probLoss=CDF[NormalDistribution[0, periodVolatility],slReturn];
winRate=probWin/(probWin+probLoss);
expWinDollar=tgtDollar*probWin;
expLossDollar=slDollar*probLoss;
expProfit=expWinDollar+expLossDollar-costContract;
{expProfit, winRate}]

For the ES contract we have a min price move of 0.25 and the tick value is $12.50.  Notice that we scale annual volatility to the size of the period we are trading (15 minute bars, in the following example).

Scenario Analysis

Let’s take a look at how the expected profit and win rate vary with the profit target and stop loss limits we set.  In the following interactive graphics, we can assess the impact of different levels of volatility on the outcome.

Expected Profit by Bar Size and Volatility

[WolframCDF source=”http://www.jonathankinlay.com/Articles/Mathematics%20of%20Scalping/expectedProfitEVD.cdf” width=”800″ height=”450″ altimage=”” altimagewidth=”” altimageheight=””]

Expected Win Rate by Volatility

[WolframCDF source=”http://www.jonathankinlay.com/Articles/Mathematics%20of%20Scalping/expectedWinRate.cdf” width=”800″ height=”450″ altimage=”” altimagewidth=”” altimageheight=””]

Notice to begin with that the win rate (and expected profit) are very far from being Normally distributed – not least because they change radically with volatility, which is itself time-varying.

For very low levels of volatility, around 5%, we appear to do best in terms of maximizing our expected P&L by setting a tight profit target of a couple of ticks, and a stop loss of around 10 ticks.  Our win rate is very high at these levels – around 90% or more.  In other words, at low levels of volatility, our aim should be to try to make a large number of small gains.

But as volatility increases to around 15%, it becomes evident that we need to increase our profit target, to around 10 or 11 ticks.  The distribution of the expected P&L suggests we have a couple of different strategy options: either we can set a larger stop loss, of around 30 ticks, or we can head in the other direction, and set a very low stop loss of perhaps just 1-2 ticks.  This later strategy is, in fact, the mirror image of our low-volatility strategy:  at higher levels of volatility, we are aiming to make occasional, large gains and we are willing to pay the price of sustaining repeated small stop-losses.  Our win rate, although still well above 50%, naturally declines.

As volatility rises still further, to 20% or 30%, or more, it becomes apparent that we really have no alternative but to aim for occasional large gains, by increasing our profit target and tightening stop loss limits.   Our win rate under this strategy scenario will be much lower – around 30% or less.

Non – Gaussian Model

Now let’s address the concern that asset returns are not typically distributed Normally. In particular, the empirical distribution of returns tends to have “fat tails”, i.e. the probability of an extreme event is much higher than in an equivalent Normal distribution.

A widely used model for fat-tailed distributions in the Extreme Value Distribution. This has pdf:

PDF[ExtremeValueDistribution[,],x]

 EVD

Plot[Evaluate@Table[PDF[ExtremeValueDistribution[,2],x],{,{-3,0,4}}],{x,-8,12},FillingAxis]

EVD pdf

Mean[ExtremeValueDistribution[,]]

+EulerGamma

Variance[ExtremeValueDistribution[,]]

EVD Variance

In order to set the parameters of the EVD, we need to arrange them so that the mean and variance match those of the equivalent Gaussian distribution with mean = 0 and standard deviation . hence:

EVD params

The code for a version of the model using the GED is given as follows

WinRateExtreme[currentPrice_,annualVolatility_,BarSizeMins_, nTicksPT_, nTicksSL_,minMove_, tickValue_, costContract_]:=Module[{ nMinsPerDay, periodVolatility, alpha, beta,tgtReturn, slReturn,tgtDollar, slDollar, probWin, probLoss, winRate, expWinDollar, expLossDollar, expProfit},
nMinsPerDay = 250*6.5*60;
periodVolatility = annualVolatility / Sqrt[nMinsPerDay/BarSizeMins];
beta = Sqrt[6]*periodVolatility / Pi;
alpha=-EulerGamma*beta;
tgtReturn=nTicksPT*minMove/currentPrice;tgtDollar = nTicksPT * tickValue;
slReturn = nTicksSL*minMove/currentPrice;
slDollar=nTicksSL*tickValue;
probWin=1-CDF[ExtremeValueDistribution[alpha, beta],tgtReturn];
probLoss=CDF[ExtremeValueDistribution[alpha, beta],slReturn];
winRate=probWin/(probWin+probLoss);
expWinDollar=tgtDollar*probWin;
expLossDollar=slDollar*probLoss;
expProfit=expWinDollar+expLossDollar-costContract;
{expProfit, winRate}]

WinRateExtreme[1900,0.05,15,2,30,0.25,12.50,3][[2]]

0.21759

We can now produce the same plots for the EVD version of the model that we plotted for the Gaussian versions :

Expected Profit by Bar Size and Volatility – Extreme Value Distribution

[WolframCDF source=”http://www.jonathankinlay.com/Articles/Mathematics%20of%20Scalping/expectedProfitEVD.cdf” width=”600″ height=”450″ altimage=”” altimagewidth=”” altimageheight=””]

Expected Win Rate by Volatility – Extreme Value Distribution

[WolframCDF source=”http://www.jonathankinlay.com/Articles/Mathematics%20of%20Scalping/expectedWinRateEVD.cdf” width=”600″ height=”450″ altimage=”” altimagewidth=”” altimageheight=””]

Next we compare the Gaussian and EVD versions of the model, to gain an understanding of how the differing assumptions impact the expected Win Rate.

Expected Win Rate by Stop Loss and Profit Target

[WolframCDF source=”http://www.jonathankinlay.com/Articles/Mathematics%20of%20Scalping/WinRatebyVolatility.cdf” width=”600″ height=”450″ altimage=”” altimagewidth=”” altimageheight=””]

As you can see, for moderate levels of volatility, up to around 18 % annually, the expected Win Rate is actually higher if we assume an Extreme Value distribution of returns, rather than a Normal distribution.If we use a Normal distribution we will actually underestimate the Win Rate, if the actual return distribution is closer to Extreme Value.In other words, the assumption of a Gaussian distribution for returns is actually conservative.

Now, on the other hand, it is also the case that at higher levels of volatility the assumption of Normality will tend to over – estimate the expected Win Rate, if returns actually follow an extreme value distribution. But, as indicated before, for high levels of volatility we need to consider amending the scalping strategy very substantially. Either we need to reverse it, setting larger Profit Targets and tighter Stops, or we need to stop trading altogether, until volatility declines to normal levels.Many scalpers would prefer the second option, as the first alternative doesn’t strike them as being close enough to scalping to justify the name.If you take that approach, i.e.stop trying to scalp in periods when volatility is elevated, then the differences in estimated Win Rate resulting from alternative assumptions of return distribution are irrelevant.

If you only try to scalp when volatility is under, say, 20 % and you use a Gaussian distribution in your scalping model, you will only ever typically under – estimate your actual expected Win Rate.In other words, the assumption of Normality helps, not hurts, your strategy, by being conservative in its estimate of the expected Win Rate.

If, in the alternative, you want to trade the strategy regardless of the level of volatility, then by all means use something like an Extreme Value distribution in your model, as I have done here.That changes the estimates of expected Win Rate that the model produces, but it in no way changes the structure of the model, or invalidates it.It’ s just a different, arguably more realistic set of assumptions pertaining to situations of elevated volatility.

Monte-Carlo Simulation Analysis

Let’ s move on to do some simulation analysis so we can get an understanding of the distribution of the expected Win Rate and Avg Trade PL for our two alternative models. We begin by coding a generator that produces a sample of 1,000 trades and calculates the Avg Trade PL and Win Rate.

Gaussian Model

GenWinRate[currentPrice_,annualVolatility_,BarSizeMins_, nTicksPT_, nTicksSL_,minMove_, tickValue_, costContract_]:=Module[{ nMinsPerDay, periodVolatility, randObs, tgtReturn, slReturn,tgtDollar, slDollar, nWins,nLosses, perTradePL, probWin, probLoss, winRate, expWinDollar, expLossDollar, expProfit},
nMinsPerDay = 250*6.5*60;
periodVolatility = annualVolatility / Sqrt[nMinsPerDay/BarSizeMins];
tgtReturn=nTicksPT*minMove/currentPrice;tgtDollar = nTicksPT * tickValue;
slReturn = nTicksSL*minMove/currentPrice;
slDollar=nTicksSL*tickValue;
randObs=RandomVariate[NormalDistribution[0,periodVolatility],10^3];
nWins=Count[randObs,x_/;x>=tgtReturn];
nLosses=Count[randObs,x_/;xslReturn];
winRate=nWins/(nWins+nLosses)//N;
perTradePL=(nWins*tgtDollar+nLosses*slDollar)/(nWins+nLosses);{perTradePL,winRate}]

GenWinRate[1900,0.1,15,1,-24,0.25,12.50,3]

{7.69231,0.984615}

Now we can generate a random sample of 10, 000 simulation runs and plot a histogram of the Win Rates, using, for example, ES on 5-min bars, with a PT of 2 ticks and SL of – 20 ticks, assuming annual volatility of 15 %.

Histogram[Table[GenWinRate[1900,0.15,5,2,-20,0.25,12.50,3][[2]],{i,10000}],10,AxesLabel{“Exp. Win Rate (%)”}]

WinRateHist

Histogram[Table[GenWinRate[1900,0.15,5,2,-20,0.25,12.50,3][[1]],{i,10000}],10,AxesLabel{“Exp. PL/Trade ($)”}]

PLHist

Extreme Value Distribution Model

Next we can do the same for the Extreme Value Distribution version of the model.

GenWinRateExtreme[currentPrice_,annualVolatility_,BarSizeMins_, nTicksPT_, nTicksSL_,minMove_, tickValue_, costContract_]:=Module[{ nMinsPerDay, periodVolatility, randObs, tgtReturn, slReturn,tgtDollar, slDollar, alpha, beta,nWins,nLosses, perTradePL, probWin, probLoss, winRate, expWinDollar, expLossDollar, expProfit},
nMinsPerDay = 250*6.5*60;
periodVolatility = annualVolatility / Sqrt[nMinsPerDay/BarSizeMins];
beta = Sqrt[6]*periodVolatility / Pi;
alpha=-EulerGamma*beta;
tgtReturn=nTicksPT*minMove/currentPrice;tgtDollar = nTicksPT * tickValue;
slReturn = nTicksSL*minMove/currentPrice;
slDollar=nTicksSL*tickValue;
randObs=RandomVariate[ExtremeValueDistribution[alpha, beta],10^3];
nWins=Count[randObs,x_/;x>=tgtReturn];
nLosses=Count[randObs,x_/;xslReturn];
winRate=nWins/(nWins+nLosses)//N;
perTradePL=(nWins*tgtDollar+nLosses*slDollar)/(nWins+nLosses);{perTradePL,winRate}]

Histogram[Table[GenWinRateExtreme[1900,0.15,5,2,-10,0.25,12.50,3][[2]],{i,10000}],10,AxesLabel{“Exp. Win Rate (%)”}]

WinRateEVDHist

Histogram[Table[GenWinRateExtreme[1900,0.15,5,2,-10,0.25,12.50,3][[1]],{i,10000}],10,AxesLabel{“Exp. PL/Trade ($)”}]

PLEVDHist

 

 

Conclusions

The key conclusions from this analysis are:

  1. Scalping is essentially a volatility trade
  2. The setting of optimal profit targets are stop loss limits depend critically on the volatility of the underlying, and needs to be handled dynamically, depending on current levels of market volatility
  3. At low levels of volatility we should set tight profit targets and wide stop loss limits, looking to make a high percentage of small gains, of perhaps 2-3 ticks.
  4. As volatility rises, we need to reverse that position, setting more ambitious profit targets and tight stops, aiming for the occasional big win.