Statistical Arbitrage with Synthetic Data

In my last post I mapped out how one could test the reliability of a single stock strategy (for the S&P 500 Index) using synthetic data generated by the new algorithm I developed.

Developing Trading Strategies with Synthetic Data

As this piece of research follows a similar path, I won’t repeat all those details here. The key point addressed in this post is that not only are we able to generate consistent open/high/low/close prices for individual stocks, we can do so in a way that preserves the correlations between related securities. In other words, the algorithm not only replicates the time series properties of individual stocks, but also the cross-sectional relationships between them. This has important applications for the development of portfolio strategies and portfolio risk management.

KO-PEP Pair

To illustrate this I will use synthetic daily data to develop a pairs trading strategy for the KO-PEP pair.

The two price series are highly correlated, which potentially makes them a suitable candidate for a pairs trading strategy.

There are numerous ways to trade a pairs spread such as dollar neutral or beta neutral, but in this example I am simply going to look at trading the price difference. This is not a true market neutral approach, nor is the price difference reliably stationary. However, it will serve the purpose of illustrating the methodology.

Historical price differences between KO and PEP

Obviously it is crucial that the synthetic series we create behave in a way that replicates the relationship between the two stocks, so that we can use it for strategy development and testing. Ideally we would like to see high correlations between the synthetic and original price series as well as between the pairs of synthetic price data.

We begin by using the algorithm to generate 100 synthetic daily price series for KO and PEP and examine their properties.

Correlations

As we saw previously, the algorithm is able to generate synthetic data with correlations to the real price series ranging from below zero to close to 1.0:

Distribution of correlations between synthetic and real price series for KO and PEP

The crucial point, however, is that the algorithm has been designed to also preserve the cross-sectional correlation between the pairs of synthetic KO-PEP data, just as in the real data series:

Distribution of correlations between synthetic KO and PEP price series

Some examples of highly correlated pairs of synthetic data are shown in the plots below:

In addition to correlation, we might also want to consider the price differences between the pairs of synthetic series, since the strategy will be trading that price difference, in the simple approach adopted here. We could, for example, select synthetic pairs for which the divergence in the price difference does not become too large, on the assumption that the series difference is stationary. While that approach might well be reasonable in other situations, here an assumption of stationarity would be perhaps closer to wishful thinking than reality. Instead we can use of selection of synthetic pairs with high levels of cross-correlation, as we all high levels of correlation with the real price data. We can also select for high correlation between the price differences for the real and synthetic price series.

Strategy Development & WFO Testing

Once again we follow the procedure for strategy development outline in the previous post, except that, in addition to a selection of synthetic price difference series we also include 14-day correlations between the pairs. We use synthetic daily synthetic data from 1999 to 2012 to build the strategy and use the data from 2013 onwards for testing/validation. Eventually, after 50 generations we arrive at the result shown in the figure below:

As before, the equity curve for the individual synthetic pairs are shown towards the bottom of the chart, while the aggregate equity curve, which is a composition of the results for all none synthetic pairs is shown above in green. Clearly the results appear encouraging.

As a final step we apply the WFO analysis procedure described in the previous post to test the performance of the strategy on the real data series, using a variable number in-sample and out-of-sample periods of differing size. The results of the WFO cluster test are as follows:

The results are no so unequivocal as for the strategy developed for the S&P 500 index, but would nonethless be regarded as acceptable, since the strategy passes the great majority of the tests (in addition to the tests on synthetic pairs data).

The final results appear as follows:

Conclusion

We have demonstrated how the algorithm can be used to generate synthetic price series the preserve not only the important time series properties, but also the cross-sectional properties between series for correlated securities. This important feature has applications in the development of statistical arbitrage strategies, portfolio construction methodology and in portfolio risk management.

Developing Trading Strategies With Synthetic Data

One of the main criticisms levelled at systematic trading over the last few years is that the over-use of historical market data has tended to produce curve-fitted strategies that perform poorly out of sample in a live trading environment. This is indeed a valid criticism – given enough attempts one is bound to arrive eventually at a strategy that performs well in backtest, even on a holdout data sample. But that by no means guarantees that the strategy will continue to perform well going forward.

The solution to the problem has been clear for some time: what is required is a method of producing synthetic market data that can be used to build a strategy and test it under a wide variety of simulated market conditions. A strategy built in this way is more likely to survive the challenge of live trading than one that has been developed using only a single historical data path.

The problem, however, has been in implementation. Up until now all the attempts to produce credible synthetic price data have failed, for one reason or another, as I described in an earlier post:

I have been able to devise a completely new algorithm for generating artificial price series that meet all of the key requirements, as follows:

  • Computational simplicity & efficiency. Important if we are looking to mass-produce synthetic series for a large number of assets, for a variety of different applications. Some deep learning methods would struggle to meet this requirement, even supposing that transfer learning is possible.
  • The ability to produce price series that are internally consistent (i.e High > Low, etc) in every case .
  • Should be able to produce a range of synthetic series that vary widely in their correspondence to the original price series. In some case we want synthetic price series that are highly correlated to the original; in other cases we might want to test our investment portfolio or risk control systems under extreme conditions never before seen in the market.
  • The distribution of returns in the synthetic series should closely match the historical series, being non-Gaussian and with “fat-tails”.
  • The ability to incorporate long memory effects in the sequence of returns.
  • The ability to model GARCH effects in the returns process.

This means that we are now in a position to develop trading strategies without any direct reference to the underlying market data. Consequently we can then use all of the real market data for out-of-sample back-testing.

Developing a Trading Strategy for the S&P 500 Index Using Synthetic Market Data

To illustrate the procedure I am going to use daily synthetic price data for the S&P 500 Index over the period from Jan 1999 to July 2022. Details of the the characteristics of the synthetic series are given in the post referred to above.

This image has an empty alt attribute; its file name is Fig3-12.png

Because we want to create a trading strategy that will perform under market conditions close to those currently prevailing, I will downsample the synthetic series to include only those that correlate quite closely, i.e. with a minimum correlation of 0.75, with the real price data.

Why do this? Surely if we want to make a strategy as robust as possible we should use all of the synthetic data series for model development?

The reason is that I believe that some of the more extreme adverse scenarios generated by the algorithm may occur quite rarely, perhaps once in every few decades. However, I am principally interested in a strategy that I can apply under current market conditions and I am prepared to take my chances that the worst-case scenarios are unlikely to come about any time soon. This is a major design decision, one that you may disagree with. Of course, one could make use of every available synthetic data series in the development of the trading model and by doing so it is likely that you would produce a model that is more robust. But the training could take longer and the performance during normal market conditions may not be as good.

Having generated the price series, the process I am going to follow is to use genetic programming to develop trading strategies that will be evaluated on all of the synthetic data series simultaneously. I will then use the performance of the aggregate portfolio, i.e. the outcome of all of the trades generated by the strategy when applied to all of the synthetic series, to assess the overall performance. In order to be considered, candidate strategies have to perform well under all of the different market scenarios, or at least the great majority of them. This ensures that the strategy is likely to prove more robust across different types of market conditions, rather than on just the single type of market scenario observed in the real historical series.

As usual in these cases I will reserve a portion (10%) of each data series for testing each strategy, and a further 10% sample for out-of-sample validation. This isn’t strictly necessary: since the real data series has not be used directly in the development of the trading system, we can later test the strategy on all of the historical data and regard this as an out-of-sample backtest.

To implement the procedure I am going to use Mike Bryant’s excellent Adaptrade Builder software.

This is an exemplar of outstanding software engineering and provides a broad range of features for generating trading strategies of every kind. One feature of Builder that is particularly useful in this context is its ability to construct strategies and test them on up to 20 data series concurrently. This enables us to develop a strategy using all of the synthetic data series simultaneously, showing the performance of each individual strategy as well for as the aggregate portfolio.

After evolving strategies for 50 generations we arrive at the following outcome:

The equity curve for the aggregate portfolio is shown in blue, while the equity curves for the strategy applied to individual synthetic data series are shown towards the bottom of the chart. Of course, the performance of the aggregate portfolio appears much superior to any of the individual strategies, because it is effectively the arithmetic sum of the individual equity curves. And just because the aggregate portfolio appears to perform well both in-sample and out-of-sample, that doesn’t imply that the strategy works equally well for every individual market scenario. In some scenarios it performs better than in others, as can be observed from the individual equity curves.

But, in any case, our objective here is not to create a stock portfolio strategy, but rather to trade a single asset – the S&P 500 Index. The role of the aggregate portfolio is simply to suggest that we may have found a strategy that is sufficiently robust to work well across a variety of market conditions, as represented by the various synthetic price series.

Builder generates code for the strategies it evolves in a number of different languages and in this case we take the EasyLanguage code for the fittest strategy #77 and apply it to a daily chart for the S&P 500 Index – i.e. the real data series – in Tradestation, with the following results:

The strategy appears to work well “out-of-the-box”, i,e, without any further refinement. So our quest for a robust strategy appears to have been quite successful, given that none of the 23-year span of real market data on which the strategy was tested was used in the development process.

We can take the process a little further, however, by “optimizing” the strategy. Traditionally this would mean finding the optimal set of parameters that produces the highest net profit on the test data. But this would be curve fitting in the worst possible sense, and is not at all what I am suggesting.

Instead we use a procedure known as Walk Forward Optimization (WFO), as described in this post:

The goal of WFO is not to curve-fit the best parameters, which would entirely defeat the object of using synthetic data. Instead, its purpose is to test the robustness of the strategy. We accomplish this by using a sequence of overlapping in-sample and out-of-sample periods to evaluate how well the strategy stands up, assuming the parameters are optimized on in-sample periods of varying size and start date and tested of similarly varying out-of-sample periods. A strategy that fails a cluster of such tests is unlikely to prove robust in live trading. A strategy that passes a test cluster at least demonstrates some capability to perform well in different market regimes.

To some extent we might regard such a test as unnecessary, given that the strategy has already been observed to perform well under several different market conditions, encapsulated in the different synthetic price series, in addition to the real historical price series. Nonetheless, we conduct a WFO cluster test to further evaluate the robustness of the strategy.

As the goal of the procedure is not to maximize the theoretical profitability of the strategy, but rather to evaluate its robustness, we select a criterion other than net profit as the factor to optimize. Specifically, we select the sum of the areas of the strategy drawdowns as the quantity to minimize (by maximizing the inverse of the sum of drawdown areas, which amounts to the same thing). This requires a little explanation.

If we look at the strategy drawdown periods of the equity curve, we observe several periods (highlighted in red) in which the strategy was underwater:

The area of each drawdown represents the length and magnitude of the drawdown and our goal here is to minimize the sum of these areas, so that we reduce both the total duration and severity of strategy drawdowns.

In each WFO test we use different % of OOS data and a different number of runs, assessing the performance of the strategy on a battery of different criteria:

x

These criteria not only include overall profitability, but also factors such as parameter stability, profit consistency in each test, the ratio of in-sample to out-of-sample profits, etc. In other words, this WFO cluster analysis is not about profit maximization, but robustness evaluation, as assessed by these several different metrics. And in this case the strategy passes every test with flying colors:

Other than validating the robustness of the strategy’s performance, the overall effect of the procedure is to slightly improve the equity curve by diminishing the magnitude and duration of the drawdown periods:

Conclusion

We have shown how, by using synthetic price series, we can build a robust trading strategy that performs well under a variety of different market conditions, including on previously “unseen” historical market data. Further analysis using cluster WFO tests strengthens the assessment of the strategy’s robustness.

Backtest vs. Trading Reality

Kris Sidial, whose Twitter posts are often interesting, recently posted about the reality of trading profitability vs backtest performance, as follows:

While I certainly agree that the latter example is more representative of a typical trader’s P&L, I don’t concur that the first P&L curve is necessarily “99.9% garbage”. There are many strategies that have equity curves that are smoother and more monotonic than those of Kris’s Skeleton Case V2 strategy. Admittedly, most of these lie in the area of high frequency, which is not Kris’s domain expertise. But there are also lower frequency strategies that produce results which are not dissimilar to those shown the first chart.

As a case in point, consider the following strategy for the S&P 500 E-Mini futures contract, described in more detail below. The strategy was developed using 15-minute bar data from 1999 to 2012, and traded live thereafter. The live and backtest performance characteristics are almost indistinguishable, not only in terms of rate of profit, but also in regard to strategy characteristics such as the no. of trades, % win rate and profit factor.

Just in case you think the picture is a little too rosy, I would point out that the average profit factor is 1.25, which means that the strategy is generating only 25% more in profits than losses. There will be big losing trades from time to time and long sequences of losses during which the strategy appears to have broken down. It takes discipline to resist the temptation to “fix” the strategy during extended drawdowns and instead rely on reversion to the mean rate of performance over the long haul. One source of comfort to the trader through such periods is that the 60% win rate means that the majority of trades are profitable.

As you read through the replies to Kris’s post, you will see that several of his readers make the point that strategies with highly attractive equity curves and performance characteristics are typically capital constrained. This is true in the case of this strategy, which I trade with a very modest amount of (my own) capital. Even trading one-lots in the E-Mini futures I occasionally experience missed trades, either on entry or exit, due to limit orders not being filled at the high or low of a bar. In scaling the strategy up to something more meaningful such as a 10-lot, there would be multiple partial fills to deal with. But I think it would be a mistake to abandon a high performing strategy such as this just because of an apparent capacity constraint. There are several approaches one can explore to address the issue, which may be enough to make the strategy scalable.

Where (as here) the issue of scalability relates to the strategy fill rate on limit orders, a good starting point is to compute the extreme hit rate, which is the proportion of trades that take place at the high or low of the bar. As a rule of thumb, for strategies running on typical low frequency infrastructure an extreme hit rate of 10% or less is manageable; anything above that level quickly becomes problematic. If the extreme hit rate is very high, e.g. 25% or more, then you are going to have to pay a great deal of attention to the issues of latency and order priority to make the strategy viable in practise. Ultimately, for a high frequency market making strategy, most orders are filled at the extreme of each “bar”, so almost all of the focus in on minimizing latency and maintaining a high queue priority, with all of the attendant concerns regarding trading hardware, software and infrastructure.

Next, you need a strategy for handling missed trades. You could, for example, decide to skip any entry trades that are missed, while manually entering unfilled exit trades at the market. Or you could post market orders for both entry and exit trades if they are not filled. An extreme solution would be to substitute market-if-touched orders for limit orders in your strategy code. But this would affect all orders generated by the system, not just the 10% at the high or low of the bar and is likely to have a very adverse affect on overall profitability, especially if the average trade is low (because you are paying an extra tick on entry and exit of every trade).

The above suggests that you are monitoring the strategy manually, running simulation and live versions side by side, so that you can pick up any trades that the strategy should have taken, but which have been missed. This may be practical for a strategy that trades during regular market hours, but not for one that also trades the overnight session.

An alternative approach, one that is commonly applied by systematic traders, is to automate the handling of missed trades. Typically the trader will set a parameter that converts a limit order to a market order X seconds after a limit price has been traded but not filled. Of course, this will result in paying up an extra tick (or more) to enter trades that perhaps would have been filled if one had waited longer than X seconds. It will have some negative impact on strategy profitability, but not too much if the extreme hit rate is low. I tend to use this method for exit trades, preferring to skip any entry trades that don’t get filled at the limit price.

Beyond these simple measures, there are several other ways to extend the capacity of the strategy. An obvious place to start is by evaluating strategy performance on different session times and bar lengths. So, in this case, we might look at deploying the strategy on both the day and night sessions. We can also evaluate performance on bars of different length. This will give different entry and exit points for individual trades and trades that are at the extreme of a bar on one timeframe may not be at the high or low of a bar on the other timescale. For example, here is the (simulated) performance of the strategy on 13 minute bars:

There is a reason for choosing a bar interval such as 13 minutes, rather than the more commonplace 5- or 10 minutes, as explained in this post:

Finally, it is worth exploring whether the strategy can be applied to other related markets such as NQ futures, for example. Typically this will entail some change to the strategy code to reflect the difference in price levels, but the thrust of the strategy logic will be similar. Another approach is to use the signals from the current strategy as inputs – i.e. alpha generators – for a derivative strategy, such as trading the SPY ETF based on signals from the ES strategy. The performance of the derived strategy may not be as good, but in a product like SPY the capacity might be larger.

Strategy Backtesting in Mathematica

This is a snippet from a strategy backtesting system that I am currently building in Mathematica.

One of the challenges when building systems in WL is to avoid looping wherever possible. This can usually be accomplished with some thought, and the efficiency gains can be significant. But it can be challenging to get one’s head around the appropriate construct using functions like FoldList, etc, especially as there are often edge cases to be taken into consideration.

A case in point is the issue of calculating the profit and loss from individual trades in a trading strategy. The starting point is to come up with a FoldList compatible function that does the necessary calculations:

CalculateRealizedTradePL[{totalQty_, totalValue_, avgPrice_, PL_,
totalPL_}, {qprice_, qty_}] :=
Module[{newTotalPL = totalPL, price = QuantityMagnitude[qprice],
newTotalQty, tradeValue, newavgPrice, newTotalValue, newPL},
newTotalQty = totalQty + qty;
tradeValue =
If[Sign[qty] == Sign[totalQty] || avgPrice == 0, priceqty, If[Sign[totalQty + qty] == Sign[totalQty], avgPriceqty,
price(totalQty + qty)]]; newTotalValue = If[Sign[totalQty] == Sign[newTotalQty], totalValue + tradeValue, newTotalQtyprice];
newavgPrice =
If[Sign[totalQty + qty] ==
Sign[totalQty], (totalQtyavgPrice + tradeValue)/newTotalQty, price]; newPL = If[(Sign[qty] == Sign[totalQty] ) || totalQty == 0, 0, qty(avgPrice - price)];
newTotalPL = newTotalPL + newPL;
{newTotalQty, newTotalValue, newavgPrice, newPL, newTotalPL}]

Trade P&L is calculated on an average cost basis, as opposed to FIFO or LIFO.

Note that the functions handle both regular long-only trading strategies and short-sale strategies, in which (in the case of equities), we have to borrow the underlying stock to sell it short. Also, the pointValue argument enables us to apply the functions to trades in instruments such as futures for which, unlike stocks, the value of a 1 point move is typically larger than 1(e.g.50 for the ES S&P 500 mini futures contract).

We then apply the function in two flavors, to accommodate both standard numerical arrays and timeseries (associations would be another good alternative):

CalculateRealizedPLFromTrades[tradeList_?ArrayQ, pointValue_ : 1] :=
Module[{tradePL =
Rest@FoldList[CalculateRealizedTradePL, {0, 0, 0, 0, 0},
tradeList]},
tradePL[[All, 4 ;; 5]] = tradePL[[All, 4 ;; 5]]pointValue; tradePL] CalculateRealizedPLFromTrades[tsTradeList_, pointValue_ : 1] := Module[{tsTradePL = Rest@FoldList[CalculateRealizedTradePL, {0, 0, 0, 0, 0}, QuantityMagnitude@tsTradeList["Values"]]}, tsTradePL[[All, 4 ;; 5]] = tsTradePL[[All, 4 ;; 5]]pointValue;
tsTradePL[[All, 2 ;;]] =
Quantity[tsTradePL[[All, 2 ;;]], "US Dollars"];
tsTradePL =
TimeSeries[
Transpose@
Join[Transpose@tsTradeList["Values"], Transpose@tsTradePL],
tsTradeList["DateList"]]]

These functions run around 10x faster that the equivalent functions that use Do loops (without parallelization or compilation, admittedly).

Let’s see how they work with an example:

Trade Simulation

Next, we’ll generate a series of random trades using the AAPL time series, as follows (we also take the opportunity to convert the list of trades into a time series, tsTrades):

trades = Transpose@
Join[Transpose[
tsAAPL["DatePath"][[
Sort@RandomSample[Range[tsAAPL["PathLength"]],
20]]]], {RandomChoice[{-100, 100}, 20]}];
trades // TableForm

Trade P&L Calculation

We are now ready to apply our Trade P&L calculation function, first to the list of trades in array form:

TableForm[
Flatten[#] & /@ 
Partition[
Riffle[trades, 
CalculateRealizedPLFromTrades[trades[[All, 2 ;; 3]]]], 2], 
TableHeadings -> {{}, {"Date", "Price", "Quantity", "Total Qty", 
"Position Value", "Average Price", "P&L", "Total PL"}}]

The timeseries version of the function provides the output as a timeseries object in Quantity[“US Dollars”] format and, of course, can be plotted immediately with DateListPlot (it is also convenient for other reasons, as the complete backtest system is built around timeseries objects):

tsTradePL = CalculateRealizedPLFromTrades[tsTrades]