Survivorship Bias

From my forthcoming book Equity Analytics:

Detecting Survivorship Bias

The relPrice Index in the Performance Data table shows the price of the stock relative to the S&P 500 index over a specified period.

Let’s look at the median relPrice for all stocks that are currently members of the S&P 500 index, eliminating any for which the relevant Performance Data is missing:

currentSP500 = Select[allStocks, #["Symbol Information"]["SP500"] && Length[#["Performance"][[All, "relPrice Index"]]] == 7 &] // Quiet;

Sort@RandomSample[currentSP500, 10]

We can then obtain the median relPrice for this universe of stocks:

#["Performance"][[All, "relPrice Index"]] & /@ currentSP500 // Median
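Median applied to a list of equal-length vectors returns one median per column, which is why stocks with incomplete Performance Data are filtered out first. As a rough illustration of the same filter-then-median step outside the book's dataset, here is a minimal Python sketch. The tickers, the values and the three-period layout are all made up for the example (the data above uses seven periods).

```python
from statistics import median

# Hypothetical relPrice vectors: one per stock, one entry per lookback period.
rel_price = {
    "AAA": [1.02, 0.97, 1.40],
    "BBB": [0.99, 1.05, 0.80],
    "CCC": [1.01, 1.10, 2.50],
    "DDD": [0.95, 1.00],  # incomplete history, dropped by the length filter
}

N_PERIODS = 3  # assumption for this toy example

# Keep only stocks with a complete set of periods.
valid = [v for v in rel_price.values() if len(v) == N_PERIODS]

# zip(*valid) transposes rows (stocks) into columns (periods),
# so each median is taken across stocks for one period.
median_by_period = [median(col) for col in zip(*valid)]
print(median_by_period)  # → [1.01, 1.05, 1.4]
```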

We would expect roughly half of the S&P 500 index membership to outperform the index over any given period, and consequently the median relPrice to be close to 1. Indeed, this is the case for periods of up to 60 months. But over the period from inception, the median relPrice is 3.46 times this level, indicating very significant outperformance by the current S&P 500 membership relative to the index.

How does this arise? The composition of the index changes over time, and many stocks that were once index members have been removed for various reasons. In a small number of cases this occurs when a stock is acquired after a period of exceptional performance. More typically, a stock is removed after a period of poor performance, following which the firm’s capital structure no longer meets the criteria for inclusion in the index, or because the stock is delisted after the acquisition or bankruptcy of the company. None of these stocks is currently included in the index; they have been replaced by the stocks of more successful companies, firms that have “survived”. Consequently, when looking at the current membership of the index we are considering only these “survivors” and neglecting the stocks that were once index members but have since been removed. As a result, the aggregate performance of the current members, the survivors, far exceeds the historical performance of the index, which reflects the impact of the stocks removed from membership, mostly for reasons of underperformance.
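The mechanism can be reproduced with purely synthetic data. The sketch below is a toy model, not the book's method: every stock follows a driftless random walk in log relative price, and a stock is treated as removed once its relPrice since inception falls below a threshold. The volatility, the threshold and the removal rule itself are assumptions chosen for illustration; real index deletions follow committee criteria.

```python
import random
from statistics import median

random.seed(7)

N_STOCKS = 3000
N_YEARS = 30
ANNUAL_VOL = 0.25     # assumed volatility of the annual log relative return
DROP_THRESHOLD = 0.5  # assumed rule: removed once relPrice falls below 0.5

def simulate_path():
    """relPrice path since inception for one synthetic stock (starts at 1.0)."""
    path = [1.0]
    for _ in range(N_YEARS):
        # Median annual multiplier is 1.0: no stock has an edge over the index.
        path.append(path[-1] * random.lognormvariate(0.0, ANNUAL_VOL))
    return path

paths = [simulate_path() for _ in range(N_STOCKS)]

# "Current members": stocks that never breached the removal threshold.
survivors = [p for p in paths if min(p) >= DROP_THRESHOLD]

def median_rel_price(universe, lookback_years):
    """Median relPrice over the trailing lookback window."""
    return median(p[-1] / p[-1 - lookback_years] for p in universe)

print("lookback  all stocks  survivors")
for years in (1, 5, N_YEARS):
    print(years,
          round(median_rel_price(paths, years), 2),
          round(median_rel_price(survivors, years), 2))
```

Under these assumptions the full universe should show a median close to 1 at every lookback. The survivor-only universe should sit close to 1 for the one-year window and well above 1 for the window from inception, even though no stock was given any performance advantage.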

The outcome is that if you design equity portfolio strategies using a universe comprising only the current index membership, or indeed only stocks that are currently listed, the resulting portfolio is subject to this kind of “survivorship bias”, which will tend to inflate its performance. This probably won’t make much difference over shorter periods of up to five years, but if you backtest the strategy over longer periods the results are likely to be subject to a significant upward bias that overstates the expected future performance of the strategy. You may find evidence of this bias in the form of strategy performance that deteriorates over time, toward the more recent periods covered in the backtest.

A secondary effect of using a survivorship-biased universe, also very important, is that it will prove difficult to identify enough short candidates to design long/short or market-neutral strategies. The long-term performance of even the worst-performing survivors is such that shorting them will almost always detract from portfolio performance without reducing portfolio risk, owing to the highly correlated performance among survivors. In order to design such strategies, it is essential that your universe contains stocks that are no longer listed, as many of these will have been delisted for reasons of underperformance. These are the ideal short candidates for your long/short or market-neutral strategy.
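To get a feel for how thin the pool of short candidates becomes, the toy sketch below compares the 10th-percentile relPrice in a survivor-only universe with that of the full universe. It uses synthetic random-walk data, and the volatility and removal threshold are assumptions. The survivor floor is imposed by construction here, so this illustrates the size of the gap rather than proving the point.

```python
import random
from statistics import quantiles

random.seed(11)

N_STOCKS, N_YEARS = 3000, 30
ANNUAL_VOL = 0.25     # assumed volatility of the annual log relative return
DROP_THRESHOLD = 0.5  # assumed rule: removed once relPrice falls below 0.5

def final_and_survived():
    """Return (relPrice since inception, never breached the threshold?)."""
    rel, alive = 1.0, True
    for _ in range(N_YEARS):
        rel *= random.lognormvariate(0.0, ANNUAL_VOL)
        alive = alive and rel >= DROP_THRESHOLD
    return rel, alive

universe = [final_and_survived() for _ in range(N_STOCKS)]
all_rel = [r for r, _ in universe]
surv_rel = [r for r, ok in universe if ok]

# First decile cut point: roughly the best outcome among the worst 10% of names.
p10_all = quantiles(all_rel, n=10)[0]
p10_surv = quantiles(surv_rel, n=10)[0]

print("10th percentile relPrice, all stocks:", round(p10_all, 2))
print("10th percentile relPrice, survivors: ", round(p10_surv, 2))
```

In this setup the weakest decile of the full universe has lost most of its value relative to the index, while the weakest decile of survivors has not, so the names that would have rewarded a short position are largely absent from the survivor-only universe.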

In summary, it is vital that the stock universe includes both currently listed and delisted stocks in order to mitigate the impact of survivorship bias.

Let’s take a look at the median relPrice once again, this time including both listed and delisted stocks:

allValidStocks = Select[allStocks, Length[#["Performance"][[All, "relPrice Index"]]] == 7 &] // Quiet;

#["Performance"][[All, "relPrice Index"]] & /@ allValidStocks // Median