Measuring Toxic Flow for Trading & Risk Management

A common theme of microstructure modeling is that trade flow is often predictive of market direction. One concept in particular that has gained traction is flow toxicity, i.e. flow where resting orders tend to be filled more quickly than expected, while aggressive orders rarely get filled at all, due to the participation of informed traders trading against uninformed traders. The fundamental insight from microstructure research is that the order arrival process is informative of subsequent price moves in general and toxic flow in particular. This in turn has led researchers to try to measure the probability of informed trading (PIN). One recent attempt to model flow toxicity, the Volume-Synchronized Probability of Informed Trading (VPIN) metric, seeks to estimate PIN based on volume imbalance and trade intensity. A major advantage of this approach is that it does not require the estimation of unobservable parameters; additionally, updating VPIN in trade time rather than clock time improves its predictive power. VPIN has potential applications both in high-frequency trading strategies and in risk management, since highly toxic flow is likely to lead to the withdrawal of liquidity providers, setting up the conditions for a "flash crash" type of market breakdown.

The procedure for estimating VPIN is as follows. We begin by grouping sequential trades into equal-volume buckets of size V. If the last trade needed to complete a bucket is for a size greater than needed, the excess size is given to the next bucket. Then we classify the trades within each bucket into two volume groups, Buys (V(t)B) and Sells (V(t)S), with V = V(t)B + V(t)S.
The Volume-Synchronized Probability of Informed Trading is then derived as:

VPIN = Σ |V(τ)B - V(τ)S| / (n V), where the sum is taken over the last n volume buckets.

Typically one might choose to estimate VPIN using a moving average over n buckets, with n being in the range of 50 to 100.
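A minimal sketch of the calculation in Wolfram Language (the function, its arguments and the trade representation are illustrative assumptions; trades are signed positive for buyer-initiated and negative for seller-initiated volume, with the buy/sell classification assumed already done, e.g. via a tick rule):

(* Sketch of VPIN estimation. trades is a list of signed integer volumes. *)
vpin[trades_List, V_Integer, n_Integer] := Module[{units, buckets, imbalances},
  (* expand each trade into unit lots, so that excess size at the end of
     one bucket automatically spills over into the next *)
  units = Flatten[ConstantArray[Sign[#], Abs[#]] & /@ trades];
  buckets = Partition[units, V];        (* equal-volume buckets of size V *)
  imbalances = Abs[Total /@ buckets];   (* |V(τ)B - V(τ)S| for each bucket *)
  MovingAverage[imbalances, n]/V        (* moving average over n buckets *)
  ]

For example, vpin[trades, 1000, 50] returns the rolling toxicity estimate, a series of values in [0, 1].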

Another related statistic of interest is the single-period signed VPIN, defined as (V(t)B - V(t)S) / V. It takes a value between -1 and +1, depending on the proportion of buying to selling during a single period t.


Fig 1. Single-Period Signed VPIN for the ES Futures Contract

It turns out that quote revisions condition strongly on the signed VPIN. For example, in tests of the ES futures contract, we found that the change in the midprice from one volume bucket to the next was highly correlated with the prior bucket's signed VPIN, with a coefficient of 0.5. In other words, market participants offering liquidity adjust their quotes in a way that directly reflects the direction and intensity of toxic flow, which is perhaps hardly surprising.

Of greater interest is the finding that there is a small but statistically significant dependency of price changes, as measured from the first buy (sell) trade price to the last sell (buy) trade price, on the prior period's signed VPIN. The correlation is positive, meaning that strongly toxic flow in one direction tends to push prices in the same direction during the subsequent period. Moreover, the single-period signed VPIN turns out to be somewhat predictable, since its autocorrelations are statistically significant at two or more lags. A simple linear ARMA(2,1) model produces an R-squared of around 7%, which is small but statistically significant.
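A sketch of such a fit, assuming the bucket-by-bucket series is held in a variable signedVPIN (a hypothetical name):

(* Fit an ARMA(2,1) model to the single-period signed VPIN series. *)
model = TimeSeriesModelFit[signedVPIN, {"ARMA", {2, 1}}];
model["ParameterTable"]   (* coefficient estimates, t-ratios and p-values *)
model["AIC"]              (* information criterion for model comparison *)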

A more useful model, however, can be constructed by introducing the idea of Markov states and allowing the regression model to assume different parameter values (and error variances) in each state. In the Markov-state framework, the system transitions from one state to another with conditional probabilities that are estimated in the model.


An example of such a model for the signed VPIN in ES is shown below. Note that the model R-squared is over 27%, around 4x larger than for a standard linear ARMA model.

We can describe the regime-switching model in the following terms.  In the regime 1 state  the model has two significant autoregressive terms and one significant moving average term (ARMA(2,1)).  The AR1 term is large and positive, suggesting that trends in VPIN tend to be reinforced from one period to the next. In other words, this is a momentum state. In the regime 2 state the AR2 term is not significant and the AR1 term is large and negative, suggesting that changes in VPIN in one period tend to be reversed in the following period, i.e. this is a mean-reversion state.

The state transition probabilities indicate that the system is in mean-reversion mode the majority of the time, around two periods out of three. During these periods, excessive flow in one direction during one period tends to be corrected in the ensuing period. But in the less frequently occurring state 1, excess flow in one direction tends to produce even more flow in the same direction in the following period. This first state, then, may be regarded as the regime characterized by toxic flow.

Markov State Regime-Switching Model

Markov Transition Probabilities

                    P(.|1)       P(.|2)

P(1|.)             0.54916      0.27782

P(2|.)             0.45084      0.72218

Regime 1:

                              Estimate  Std. Err.   t Ratio  p-Value

AR1                            1.35502    0.02657    50.998        0

AR2                           -0.33687    0.02354   -14.311        0

MA1                            0.83662    0.01679    49.828        0

Error Variance^(1/2)           0.36294     0.0058

Regime 2:

                              Estimate  Std. Err.   t Ratio  p-Value

AR1                           -0.68268    0.08479    -8.051        0

AR2                            0.00548    0.01854     0.296    0.767

MA1                           -0.70513    0.08436    -8.359        0

Error Variance^(1/2)           0.42281     0.0016

Log Likelihood = -33390.6

Schwarz Criterion = -33445.7

Hannan-Quinn Criterion = -33414.6

Akaike Criterion = -33400.6

Sum of Squares = 8955.38

R-Squared =  0.2753

R-Bar-Squared =  0.2752

Residual SD =  0.3847

Residual Skewness = -0.0194

Residual Kurtosis =  2.5332

Jarque-Bera Test = 553.472     {0}

Box-Pierce (residuals):         Q(9) = 13.9395 {0.124}

Box-Pierce (squared residuals): Q(12) = 743.161     {0}

 

A Simple Trading Strategy

One way to try to monetize the predictability of the VPIN model is to use the forecasts to take directional positions in the ES contract. In this simple simulation we assume that we enter a long (short) position at the first buy (sell) price if the forecast VPIN exceeds a threshold value of 0.1 (falls below -0.1). The simulation assumes that we exit the position at the end of the current volume bucket, at the last sell (buy) trade price in the bucket.
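The bucket-level accounting is straightforward. A sketch, assuming each volume bucket has been summarized as an association with hypothetical keys "forecast", "firstBuy", "firstSell", "lastBuy" and "lastSell" (prices in index points), and using the standard $50 point multiplier for ES:

(* P&L for one volume bucket under the threshold rule; keys are hypothetical. *)
bucketPL[b_Association, threshold_ : 0.1, multiplier_ : 50] := Which[
  b["forecast"] > threshold, multiplier (b["lastSell"] - b["firstBuy"]),  (* long *)
  b["forecast"] < -threshold, multiplier (b["firstSell"] - b["lastBuy"]), (* short *)
  True, 0.]

cumulativePL[buckets_List] := Accumulate[bucketPL /@ buckets]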

This simple strategy made 1024 trades over a 5-day period from 8/8 to 8/14, 90% of which were profitable, for a total of $7,675 – i.e. around ½ tick per trade.

The simulation is, of course, unrealistically simplistic, but it does give an indication of the prospects for a more realistic version of the strategy in which, for example, we might rest an order on one side of the book, depending on our VPIN forecast.


Figure 2 – Cumulative Trade PL

References

Easley, D., M. López de Prado and M. O'Hara (2011), "Flow Toxicity and Volatility in a High Frequency World", Johnson School Research Paper Series No. 09-2011.

Easley, D. and M. O'Hara (1987), "Price, Trade Size, and Information in Securities Markets", Journal of Financial Economics, 19.

Easley, D. and M. O'Hara (1992a), "Adverse Selection and Large Trade Volume: The Implications for Market Efficiency", Journal of Financial and Quantitative Analysis, 27(2), June, 185-208.

Easley, D. and M. O'Hara (1992b), "Time and the Process of Security Price Adjustment", Journal of Finance, 47, 576-605.

 

Forecasting Financial Markets – Part 1: Time Series Analysis

The presentation in this post covers a number of important topics in forecasting, including:

  • Stationary processes and random walks
  • Unit roots and autocorrelation
  • ARMA models
  • Seasonality
  • Model testing
  • Forecasting
  • Dickey-Fuller and Phillips-Perron tests for unit roots

Also included are a number of detailed worked examples, including:

  1. ARMA Modeling
  2. Box Jenkins methodology
  3. Modeling the US Wholesale Price Index
  4. Pesaran & Timmermann study of excess equity returns
  5. Purchasing Power Parity

 

Forecasting 2011 - Time Series

 

A Practical Application of Regime Switching Models to Pairs Trading

In the previous post I outlined some of the available techniques used for modeling market states. The following is an illustration of how these techniques can be applied in practice. You can download this post in pdf format here.

SSALGOTRADING AD

The chart below shows the daily compounded returns for a single pair in an ETF statistical arbitrage strategy, back-tested over a 1-year period from April 2010 to March 2011.

The idea is to examine the characteristics of the returns process and assess its predictability.

Fig 1. Daily compounded returns for an ETF pair

The initial impression given by the analytics plots of daily returns, shown in Fig 2 below, is that the process may be somewhat predictable, given what appears to be a significant lag-1 coefficient in the autocorrelation spectrum. We also see evidence of the customary non-Gaussian, "fat-tailed" distribution in the error process.

Fig 2. Analytics plots of daily returns

An initial attempt to fit a standard autoregressive moving average ARMA(1,0,1) model yields disappointing results, with an unadjusted model R-squared of only 7% (see model output in Appendix 1).

However, by fitting a 2-state Markov model we are able to explain as much as 65% of the variation in the returns process (see Appendix II). The model estimates the Markov transition probabilities as follows.

                    P(.|1)       P(.|2)

P(1|.)             0.93920      0.69781

P(2|.)             0.060802     0.30219

In other words, the process spends most of the time in State 1, switching to State 2 around once a month, as illustrated in Fig 3 below.
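As a quick check on that interpretation, the expected duration of a Markov state with self-transition probability p is 1/(1 - p):

(* Expected number of periods spent in a state per visit. *)
expectedDuration[p_] := 1/(1 - p)

expectedDuration[0.93920]   (* State 1: about 16.4 trading days *)
expectedDuration[0.30219]   (* State 2: about 1.4 trading days *)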

Fig 3. Estimated market states
In the first state, the  pairs model produces an expected daily return of around 65bp, with a standard deviation of similar magnitude.  In this state, the process also exhibits very significant auto-regressive and moving average features.

Regime 1:

                              Estimate  Std. Err.   t Ratio  p-Value

Intercept                      0.00648     0.0009       7.2        0

AR1                            0.92569    0.01897    48.797        0

MA1                           -0.96264    0.02111   -45.601        0

Error Variance^(1/2)           0.00666     0.0007

In the second state, the pairs model produces lower average returns, with much greater variability, while the autoregressive and moving average terms are poorly determined.

Regime 2:

                              Estimate  Std. Err.   t Ratio  p-Value

Intercept                      0.03554    0.04778     0.744    0.459

AR1                            0.79349    0.06418    12.364        0

MA1                           -0.76904    0.51601     -1.49    0.139

Error Variance^(1/2)           0.01819     0.0031

CONCLUSION
The analysis in Appendix II suggests that the residual process is stable and Gaussian.  In other words, the two-state Markov model is able to account for the non-Normality of the returns process and extract the salient autoregressive and moving average features in a way that makes economic sense.

How is this information useful?  Potentially in two ways:

(i)     If the market state can be forecast successfully, we can use that information to increase our capital allocation during periods when the process is predicted to be in State 1, and reduce the allocation at times when it is in State 2.

(ii)    By examining the timing of the Markov states and considering different features of the market during the contrasting periods, we might be able to identify additional explanatory factors that could be used to further enhance the trading model.


Regime-Switching & Market State Modeling

The Excel workbook referred to in this post can be downloaded here.

Market state models are amongst the most useful analytical techniques for developing alpha-signal generators. That term covers a great deal of ground, with ideas drawn from statistics, econometrics, physics and bioinformatics. The purpose of this short note is to provide an introduction to some of the key ideas and suggest ways in which they might usefully be applied in the context of researching and developing trading systems.

Although they come from different origins, the concepts presented here share common foundational principles:

  1. Markets operate in different states that may be characterized by various measures (volatility, correlation, microstructure, etc);
  2. Alpha signals can be generated more effectively by developing models that are adapted to take account of different market regimes;
  3. Alpha signals may be combined together effectively by taking account of the various states that a market may be in.

Market state models have shown great promise in a variety of applications within the field of applied econometrics in finance, not only for price and market direction forecasting, but also in basis trading, index arbitrage, statistical arbitrage, portfolio construction, capital allocation and risk management.

REGIME SWITCHING MODELS

These are econometric models which seek to use statistical techniques to characterize market states in terms of different estimates of the parameters of some underlying linear model.  This is accompanied by a transition matrix which estimates the probability of moving from one state to another.

To illustrate this approach I have constructed a simple example, given in the accompanying Excel workbook.  In this model the market operates as follows:

Y(t) = a0(S) + a1(S) Y(t-1) + b1(S) e(t-1) + e(t)

where

  • Y(t) is a variable of interest (e.g. the return in an asset over the next period t)
  • e(t) is an error process with constant variance s^2
  • S is the market state, with two regimes (S=1 or S=2)
  • a0 is the drift in the asset process
  • a1 is an autoregressive term, by which the return in the current period is dependent on the prior period return
  • b1 is a moving average term, which smooths the error process

This is one of the simplest possible structures, which in more general form can include multiple states and independent regressors Xi as explanatory variables (such as book pressure, order flow, etc.):

Y(t) = a0(S) + a1(S) Y(t-1) + ... + ap(S) Y(t-p) + b1(S) e(t-1) + ... + bq(S) e(t-q) + c1(S) X1(t) + ... + ck(S) Xk(t) + e(t)

 


The form of the error process et may also be dependent on the market state.  It may simply be that, as in this example, the standard deviation of the error process changes from state to state.  But the changes can also be much more complex:  for instance, the error process may be non-Gaussian, or it may follow a formulation from the GARCH framework.

In this example the state parameters are as follows:

          Regime 1    Regime 2
s           0.01        0.02
a0          0.005      -0.015
a1          0.40        0.70
b1          0.10        0.20

What this means is that, in the first state the market tends to trend upwards with relatively low volatility.  In the second state, not only is market volatility much higher, but also the trend is 3x as large in the negative direction.

I have specified the following state transition matrix:

          Reg 1    Reg 2
Reg 1      0.85     0.15
Reg 2      0.90     0.10

This is interpreted as follows:  if the market is in State 1, it will tend to remain in that state 85% of the time, transitioning to State 2 15% of the time.  Once in State 2, the market tends to revert to State 1 very quickly, with 90% probability.  So the system is in State 1 most of the time, trending slowly upwards with low volatility and occasionally flipping into an aggressively downward trending phase with much higher volatility.

The Generate sheet in the Excel workbook shows how observations are generated from this process; from it we select a single instance of 3,000 observations, shown in the sheet named Sample.
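For readers without the workbook, here is a minimal Wolfram Language sketch of the same generating process, with the parameter values and transition matrix from the tables above:

(* Two-state Markov-switching ARMA(1,1) simulator.
   Per-regime parameters are {a0, a1, b1, s}; P is the transition matrix. *)
params = <|1 -> {0.005, 0.40, 0.10, 0.01}, 2 -> {-0.015, 0.70, 0.20, 0.02}|>;
P = {{0.85, 0.15}, {0.90, 0.10}};   (* row = current state *)

simulate[n_Integer] := Module[{s = 1, y = 0., e = 0., a0, a1, b1, sig, eNew},
  Table[
   s = RandomChoice[P[[s]] -> {1, 2}];   (* draw the next state *)
   {a0, a1, b1, sig} = params[s];
   eNew = RandomVariate[NormalDistribution[0, sig]];
   y = a0 + a1 y + b1 e + eNew;          (* ARMA(1,1) step in the current state *)
   e = eNew;
   y, {n}]]

sample = simulate[3000];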

The sample looks like this:

 

Simulated sample of 3,000 observations
 
 

 As anticipated, the market is in State 1 most of the time, occasionally flipping into State 2 for brief periods.


 It is well-known that in financial markets we are typically dealing with highly non-Gaussian distributions.  Non-Normality can arise for a number of reasons, including changes in regimes, as illustrated here.  It is worth noting that, even though in this example the process in either market state follows a Gaussian distribution, the combined process is distinctly non-Gaussian in form, having (extremely) fat tails, as shown by the QQ-plot below.

 

QQ-plot of the simulated process

If we attempt to fit a standard ARMA model to the process, the outcome is very disappointing in terms of the model's poor explanatory power (R-squared of just 0.5%) and lack of fit in the squared residuals:

 

 

ARIMA(1,0,1)

         Estimate  Std. Err.   t Ratio  p-Value

Intercept                      0.00037    0.00032     1.164    0.244

AR1                            0.57261     0.1697     3.374    0.001

MA1                           -0.63292    0.16163    -3.916        0

Error Variance^(1/2)           0.02015     0.0004    ——   ——

                       Log Likelihood = 7451.96

                    Schwarz Criterion = 7435.95

               Hannan-Quinn Criterion = 7443.64

                     Akaike Criterion = 7447.96

                       Sum of Squares =  1.2172

                            R-Squared =  0.0054

                        R-Bar-Squared =  0.0044

                          Residual SD =  0.0202

                    Residual Skewness = -2.1345

                    Residual Kurtosis =  5.7279

                     Jarque-Bera Test = 3206.15     {0}

Box-Pierce (residuals):         Q(48) = 59.9785 {0.115}

Box-Pierce (squared residuals): Q(50) = 78.2253 {0.007}

              Durbin Watson Statistic = 2.01392

                    KPSS test of I(0) =  0.2001    {<1} *

                 Lo’s RS test of I(0) =  1.2259  {<0.5} *

Nyblom-Hansen Stability Test:  NH(4)  =  0.5275    {<1}

MA form is 1 + a_1 L +…+ a_q L^q.

Covariance matrix from robust formula.

* KPSS, RS bandwidth = 0.

Parzen HAC kernel with Newey-West plug-in bandwidth.

 

 

However, if we keep the same simple form of ARMA(1,1) model, but allow for the possibility of a two-state Markov process, the picture alters dramatically:  now the model is able to account for 98% of the variation in the process, as shown below.

 

Notice that we have succeeded in estimating the correct underlying transition probabilities, and how the ARMA model parameters change from regime to regime much as they should (small positive drift in one regime, large negative drift in the second, etc).

 

Markov Transition Probabilities

                    P(.|1)       P(.|2)

P(1|.)            0.080265      0.14613

P(2|.)             0.91973      0.85387

 

                              Estimate  Std. Err.   t Ratio  p-Value

Logistic, t(1,1)              -2.43875     0.1821    ——   ——

Logistic, t(1,2)              -1.76531     0.0558    ——   ——

Non-switching parameters shown as Regime 1.

 

Regime 1:

Intercept                     -0.05615    0.00315   -17.826        0

AR1                            0.70864    0.16008     4.427        0

MA1                           -0.67382    0.16787    -4.014        0

Error Variance^(1/2)           0.00244     0.0001    ——   ——

 

Regime 2:

Intercept                      0.00838     2e-005   419.246        0

AR1                            0.26716    0.08347     3.201    0.001

MA1                           -0.26592    0.08339    -3.189    0.001

 

                       Log Likelihood = 12593.3

                    Schwarz Criterion = 12557.2

               Hannan-Quinn Criterion = 12574.5

                     Akaike Criterion = 12584.3

                       Sum of Squares =  0.0178

                            R-Squared =  0.9854

                        R-Bar-Squared =  0.9854

                          Residual SD =  0.002

                    Residual Skewness = -0.0483

                    Residual Kurtosis = 13.8765

                     Jarque-Bera Test = 14778.5     {0}

Box-Pierce (residuals):         Q(48) = 379.511     {0}

Box-Pierce (squared residuals): Q(50) = 36.8248 {0.917}

              Durbin Watson Statistic = 1.50589

                    KPSS test of I(0) =  0.2332    {<1} *

                 Lo’s RS test of I(0) =  2.1352 {<0.005} *

Nyblom-Hansen Stability Test:  NH(9)  =  0.8396    {<1}

MA form is 1 + a_1 L +…+ a_q L^q.

Covariance matrix from robust formula.

* KPSS, RS bandwidth = 0.

Parzen HAC kernel with Newey-West plug-in bandwidth.


There are a variety of types of regime-switching mechanisms we can use in state models:

  • Hamiltonian – the simplest, where the process mean and variance vary from state to state
  • Markovian – the approach used here, with a state transition matrix
  • Explained Switching – where the process changes state as a result of the influence of some underlying variable (such as interest rate volatility, for example)
  • Smooth Transition – comparable to explained switching, but without an explicitly probabilistic interpretation

 

 

This example is both rather simplistic and pathological at the same time: the states are well-separated, by design, whereas for real processes they tend to be much harder to distinguish. A difficulty of this methodology is that the models can be very difficult to estimate: the likelihood function tends to be very flat, and there are a great many local maxima that give similar fit but widely varying model forms and parameter estimates. That said, this is a very rich class of models with a great many potential applications.

Falling Water

The current 15-year drought in the South West is the most severe since recordkeeping for the Colorado River began in 1906.  Lake Mead, which supplies much of the water to Colorado Basin communities, is now more than half empty.


A 120-foot-high band of rock, bleached white by the water and known as the "bathtub ring", encircles the lake, a stark reminder of the water crisis that has enveloped the surrounding region. The Colorado River makes a 1,400-mile journey from the Rockies to Mexico, irrigating over 5 million acres of farmland in the Basin states of Wyoming, Utah, Colorado, New Mexico, Nevada, Arizona, and California.

 


The Colorado River Compact, signed in 1922, enshrined the states' water rights in law, and Mexico was added to the roster in 1994, taking the total allocation to over 16.5 million acre-feet per year. But the average freshwater input to the lake over the century from 1906 to 2005 reached only 15 million acre-feet. The river can't come close to meeting current demand, and the problem is only likely to get worse: a 2009 study found that rainfall in the Colorado Basin could fall as much as 15% over the next 50 years, and that shortfalls in deliveries could occur 60% to 90% of the time.

 

 

Impact on Las Vegas

With an average of only 4 inches of rain a year, and daily high temperatures of 103°F during the summer, Las Vegas is perhaps the hardest pressed to meet the demands of its 2 million residents and 40 million annual visitors.

Despite its conspicuous consumption, from the tumbling fountains of the Bellagio to the Venetian's canals, Las Vegas has been obliged since 2002 to cut its water use by a third, from 314 gallons per capita a day to 212. The region recycles around half of its wastewater, which is piped back into Lake Mead after cleaning and treatment. Residents are allowed to water their gardens no more than one day a week in peak season, and there are stiff fines for noncompliance.

SSALGOTRADING AD

The Third Straw


Historically, two intake pipes carried water from Lake Mead to Las Vegas, about 25 miles to the west. In 2012, realizing that the higher of these, at 1,050 feet, would soon be sucking air, the Southern Nevada Water Authority began construction of a new pipeline. Known as the Third Straw, Intake No. 3 reaches 200 feet deeper into the lake, to keep water flowing for as long as there is water to pump. The new pipeline, which commenced operations in 2015, doesn't draw more water from the lake than before, or make the surface level drop any faster. But it will keep taps flowing in Las Vegas homes and casinos even if drought-stricken Lake Mead drops to its lowest levels.


Modeling Water Levels in Lake Mead

The monthly reported water levels in Lake Mead from Feb 1935 to June 2016 are shown in the chart below. The reference line is the drought level, historically defined as 1,125 feet.

Monthly water levels in Lake Mead, Feb 1935 to June 2016, with the 1,125-foot drought level marked

 

One statistical technique widely applied in hydrology involves fitting a Kumaraswamy distribution to the relative water level. According to the Arizona Game and Fish Department, the maximum lake level is 1,229 feet. We model the water level relative to this maximum, as follows.

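The code for this step survives only as an image; the following is a plausible reconstruction, assuming the monthly series is held in a variable levels, and using the names (relativeLevels, edist) that appear in the surviving fragments:

maxLevel = 1229.;                      (* maximum lake level, in feet *)
relativeLevels = levels/maxLevel;      (* water levels relative to the maximum *)
edist = EstimatedDistribution[relativeLevels,
  KumaraswamyDistribution[a, b]]       (* maximum-likelihood Kumaraswamy fit *)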

 


The fit of the distribution appears quite good, even in the tails:

ProbabilityPlot[relativeLevels, edist]


 

Since water levels have been below the drought level for some time, let's instead consider the "emergency" level of 1,075 feet. According to this model, there is just over a 6% chance of Lake Mead being at or below the emergency level in any given month and, consequently, a high probability of breaching the emergency threshold at some point before the end of 2017.

 

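Again, the original code is preserved only as an image; a sketch of the calculation it evidently performs, using the fitted distribution from above (the 18-month horizon and the independence assumption are illustrative):

pEmergency = CDF[edist, 1075./1229.]   (* probability of a reading at or below 1,075 ft *)

1 - (1 - pEmergency)^18                (* chance of at least one breach over 18 months,
                                          if monthly levels were independent draws *)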


 

One problem with this approach is that it assumes that each observation is drawn independently from a random variable with the estimated distribution.  In reality, there are high levels of autocorrelation in the series, as might be expected:  lower levels last month typically increase the likelihood of lower levels this month.  The chart of the autocorrelation coefficients makes this pattern clear, with statistically significant coefficients at lags of up to 36 months.

ts["ACFPlot"]


 

 

An alternative methodology that enables us to take account of the autocorrelation in the process is time series analysis.  We proceed to fit an autoregressive moving average (ARMA) model as follows:

tsm = TimeSeriesModelFit[ts, "ARMA"]


The best-fitting model is an ARMA(1,1) model, according to the AIC criterion:

ARMA(1,1) model summary

Applying the fitted ARMA model, we forecast the water level in Lake Mead over the next ten years as shown in the chart below.  Given the mean-reverting moving average component of the model, it is not surprising to see the model forecasting a return to normal levels.
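A sketch of the forecasting step, reusing the tsm fit object from above (120 months corresponds to the ten-year horizon):

tsm["BestFit"]                        (* the selected ARMA(1,1) process *)
forecast = TimeSeriesForecast[tsm, {120}];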

Ten-year forecast of Lake Mead water levels

 

There is some evidence of lack of fit in the ARMA model, as shown in the autocorrelations of the model residuals:

Autocorrelations of the ARMA model residuals

A formal test reveals that residual autocorrelations at lags 4 and higher are jointly statistically significant:

Test of joint significance of residual autocorrelations

The slowly decaying pattern of autocorrelations in the water level series suggests a possible “long memory” effect, which can be better modelled as a fractionally integrated process.  The forecasts from such a model, like the ARMA model forecasts, display a tendency to revert to a long term mean; but the reversion process is dampened by the reinforcing, long-memory effect captured in the FARIMA model.
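A sketch of the long-memory fit, assuming the "FARIMA" model family in TimeSeriesModelFit and the ts series from above:

tsmLong = TimeSeriesModelFit[ts, "FARIMA"];   (* fractionally integrated fit *)
TimeSeriesForecast[tsmLong, {120}]            (* ten-year forecast *)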

Forecasts from the fractionally integrated (FARIMA) model

The Prospects for the Next Decade

Taking the view that the water level in Lake Mead forms a stationary statistical process, the likelihood is that water levels will rise to 1,125 feet or more over the next ten years, easing the current water shortage in the region.

On the other hand, there are good reasons to believe that there are exogenous (deterministic) factors in play, specifically the over-consumption of water at a rate greater than the replenishment rate from average rainfall levels.  Added to this, plausible studies suggest that average rainfall in the Colorado Basin is expected to decline over the next fifty years.  Under this scenario, the water level in Lake Mead will likely continue to deteriorate, unless more stringent measures are introduced to regulate consumption.

Economic Impact

The drought in the South West affects far more than just the water levels in Lake Mead, of course. One study found that California's agriculture sector alone lost $2.2Bn and some 17,000 seasonal and part-time jobs in 2014, due to drought. Agriculture uses more than 80% of the state's water, according to Fortune magazine, which goes on to identify the key industries most affected, including agriculture, food processing, semiconductors, energy, utilities and tourism.

Dry fields and bare trees at Panoche Road, looking west, on Wednesday February 5, 2014, near San Joaquin, CA. California drought has hit the Central Valley hard.

In the energy sector, for example, the loss of hydroelectric power cost California around $1.4Bn in 2014, according to the non-profit research group Pacific Institute. Although Intel pulled its last fabrication plant from California in 2009, semiconductor manufacturing is still a going concern in the state: Maxim Integrated, TowerJazz, and TSI Semiconductors all still operate fabrication plants there, and they need a lot of water. A single semiconductor fabrication plant can use as much water as a small city, which means the current plants represent roughly three cities' worth of consumption.

The drought is also bad news for water utilities, of course. The need to conserve water raises the priority on repair and maintenance, and that means higher costs and lower profits. Complicating the problem, California lacks any kind of management system for its water supply and can't measure the inflows and outflows to groundwater levels at any particular time.

The Bureau of Reclamation has studied more than two dozen options for conserving and increasing water supply, including importation, desalination and reuse. While some were disregarded for being too costly or difficult, the bureau found that the remaining options, if instituted, could yield 3.7 million acre feet per year in savings and new supplies, increasing to 7 million acre feet per year by 2060.  Agriculture is the biggest user by far and has to be part of any solution. In the near term, the agriculture industry could reduce its use by 10 to 15 percent without changing the types of crops it grows by using new technology, such as using drip irrigation instead of flood irrigation and monitoring soil moisture to prevent overwatering, the Pacific Institute found.

Conclusion

We can anticipate that a series of short-term fixes, like the Third Straw, will be employed to kick the can down the road as far as possible, but research now appears almost unanimous in finding that drought is having a deleterious, long-term effect on the economies of the South Western states. Agriculture is likely to bear the brunt of the impact, but adverse consequences will also be felt in industries as disparate as food processing, semiconductors and utilities. California, with by far the largest agricultural industry, is likely to be hardest hit. The Las Vegas region may be far less vulnerable, having already taken aggressive steps to conserve and reuse water supply and to charge economic rents for water usage.


Profit Margins – Are they Predicting a Crash?

Jeremy Grantham: A Bullish Bear

Is Jeremy Grantham, co-founder and CIO of GMO, bullish or bearish these days?  According to Myles Udland at Business Insider, he’s both.  He quotes Grantham:

“I think the global economy and the U.S. in particular will do better than the bears believe it will because they appear to underestimate the slow-burning but huge positive of much-reduced resource prices in the U.S. and the availability of capacity both in labor and machinery.”


Udland continues:

"On top of all this is the decline in profit margins, which Grantham has called the 'most mean-reverting series in finance,' implying that the long period of elevated margins we've seen from American corporations is most certainly going to come to an end. And soon."


Corporate Profit Margins as a Leading Indicator

The claim is an interesting one.  It certainly looks as if corporate profit margins are mean-reverting and, possibly, predictive of recessionary periods. And there is an economic argument why this should be so, articulated by Grantham as quoted in an earlier Business Insider article by Sam Ro:

“Profit margins are probably the most mean-reverting series in finance, and if profit margins do not mean-revert, then something has gone badly wrong with capitalism.

If high profits do not attract competition, there is something wrong with the system and it is not functioning properly.”

Thomson Research / Barclays Research’s take on the same theme echoes Grantham:

“The link between profit margins and recessions is strong,” Barclays’ Jonathan Glionna writes in a new note to clients. “We analyze the link between profit margins and recessions for the last seven business cycles, dating back to 1973. The results are not encouraging for the economy or the market. In every period except one, a 0.6% decline in margins in 12 months coincided with a recession.”


Buffett Weighs in

Even Warren Buffett gets in on the act (from 1999):

“In my opinion, you have to be wildly optimistic to believe that corporate profits as a percent of GDP can, for any sustained period, hold much above 6%.”


With the Illuminati chorusing as one on the perils of elevated rates of corporate profits, one would be foolish to take a contrarian view, perhaps.  And yet, that claim of Grantham’s (“probably the most mean-reverting series in finance”) poses a challenge worthy of some analysis.  Let’s take a look.

The Predictive Value of Corporate Profit Margins

First, let’s reproduce the St Louis Fed chart:

Corporate Profit Margins

A plot of the series autocorrelations strongly suggests that the series is not mean-reverting at all, but non-stationary, integrated of order 1:

Autocorrelations

 

Next, we conduct an exhaustive evaluation of a wide range of time series models, including seasonal and non-seasonal ARIMA and GARCH:

Time series model evaluation results
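A sketch of how such an evaluation might be run in Wolfram Language, assuming the quarterly profit/GDP series is held in a variable cpGDP (a hypothetical name); called without a model family, TimeSeriesModelFit searches across candidate classes and orders automatically:

tsmCP = TimeSeriesModelFit[cpGDP];   (* automatic family and order selection *)
tsmCP["BestFit"]                     (* the selected process *)
TimeSeriesForecast[tsmCP, {8}]       (* forecast 8 quarters ahead *)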

The best-fitting model (using the AIC criterion) is a simple ARIMA(0,1,0) model, i.e. a random walk, integrated of order 1, as anticipated. The series is apparently difference-stationary, with no mean-reversion characteristics at all. Diagnostic tests indicate no significant patterning in the model residuals:

Residual Autocorrelations

Ljung-Box Test Probabilities

Using the model to forecast a range of possible values of the Corporate Profit to GDP ratio over the next 8 quarters suggests a very wide range, from as low as 6% to as high as 13%!

Forecast

 

CONCLUSION

Contrary to the opinion of investment celebrities like Grantham and Buffett, there really isn't any evidence in the data to support the suggestion that corporate profit margins are mean-reverting, even though common-sense economics suggests they should be.

The best-available econometric model produces a very wide range of forecasts of corporate profit rates over the next two years, some even higher than they are today.

If a recession is just around the corner,  corporate profit margins aren’t going to call it for us.