Learning the Kalman Filter

Michael Kleder’s “Learning the Kalman Filter” mini-tutorial, along with the great feedback it has garnered (73 comments and 67 ratings, averaging 4.5 out of 5 stars), is one of the most popular downloads from Matlab Central, and for good reason.

In his in-file example, Michael steps through a Kalman filter example in which a voltmeter is used to measure the output of a 12-volt automobile battery. The model simulates both randomness in the output of the battery and error in the voltmeter readings. Then, even without defining an initial state for the true battery voltage, Michael demonstrates that, with only 5 lines of code, the Kalman filter can be implemented to estimate the true output from noisy, uniformly spaced measurements:
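For readers who would like to experiment before downloading the Matlab code, here is a minimal scalar Kalman filter sketch in Python for the same voltmeter scenario (an illustrative paraphrase, not Kleder’s implementation; the noise settings and variable names are assumptions):

import numpy as np

np.random.seed(0)
true_voltage = 12.0      # "true" battery output
meas_noise_sd = 0.5      # assumed voltmeter error (R is its variance)
process_noise = 1e-5     # assumed randomness in the battery output (Q)

z = true_voltage + meas_noise_sd * np.random.randn(100)   # noisy readings

x, P = 0.0, 1.0          # arbitrary initial estimate and its variance
R, Q = meas_noise_sd**2, process_noise
for zk in z:
    P = P + Q             # predict: the state is modeled as (nearly) constant
    K = P / (P + R)       # Kalman gain
    x = x + K * (zk - x)  # update the estimate with the new measurement
    P = (1 - K) * P       # update the estimate variance

print(f"Final estimate: {x:.3f} V")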

 

This is a simple but powerful example that shows the utility and potential of Kalman filters. It’s sure to help those who are hesitant about delving into the world of Kalman filtering.

Using Volatility to Predict Market Direction

Decomposing Asset Returns

 

We can decompose the returns process R_t as follows:
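The decomposition in question is presumably the standard identity splitting each return into its sign and its magnitude:

$$ R_t = \operatorname{sign}(R_t)\,\lvert R_t \rvert $$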

While the left-hand side of the equation is essentially unforecastable, both of the right-hand-side components of returns display persistent dynamics and hence are forecastable. Both the signs of returns and magnitude of returns are conditional mean dependent and hence forecastable, but their product is conditional mean independent and hence unforecastable. This is an example of a nonlinear “common feature” in the sense of Engle and Kozicki (1993).

Although asset returns are essentially unforecastable, the same is not true for asset return signs (i.e. the direction-of-change). As long as expected returns are nonzero, one should expect sign dependence, given the overwhelming evidence of volatility dependence. Even in assets where expected returns are zero, sign dependence may be induced by skewness in the asset returns process.  Hence market timing ability is a very real possibility, depending on the relationship between the mean of the asset returns process and its higher moments. The highly nonlinear nature of the relationship means that conditional sign dependence is not likely to be found by traditional measures such as sign autocorrelations, runs tests or traditional market timing tests. Sign dependence is likely to be strongest at intermediate horizons of 1-3 months, and unlikely to be important at very low or high frequencies. Empirical tests demonstrate that sign dependence is very much present in actual US equity returns, with probabilities of positive returns rising to 65% or higher at various points over the last 20 years. A simple logit regression model captures the essentials of the relationship very successfully.

Now consider the implications of dependence and hence forecastability in the sign of asset returns, or, equivalently, the direction-of-change. It may be possible to develop profitable trading strategies if one can successfully time the market, regardless of whether or not one is able to forecast the returns themselves.  

There is substantial evidence that sign forecasting can often be done successfully. Relevant research on this topic includes Breen, Glosten and Jagannathan (1989), Leitch and Tanner (1991), Wagner, Shellans and Paul (1992), Pesaran and Timmerman (1995), Kuan and Liu (1995), Larsen and Wozniak (1995), Womack (1996), Gencay (1998), Leung, Daouk and Chen (1999), Elliott and Ito (1999), White (2000), Pesaran and Timmerman (2000), and Cheung, Chinn and Pascual (2003).

There is also a huge body of empirical research pointing to the conditional dependence and forecastability of asset volatility. Bollerslev, Chou and Kroner (1992) review evidence in the GARCH framework, Ghysels, Harvey and Renault (1996) survey results from stochastic volatility modeling, while Andersen, Bollerslev and Diebold (2003) survey results from realized volatility modeling.

Sign Dynamics Driven By Volatility Dynamics

Let the returns process R_t be Normally distributed with mean μ and conditional volatility σ_t.

The probability of a positive return, Pr[R_t+1 > 0], is then given by

Pr[R_t+1 > 0] = 1 − F(0) = Φ(μ/σ_t+1)

where F is the CDF of a Normal distribution with mean μ and standard deviation σ_t+1, and Φ is the standard Normal CDF.

For a given mean return μ, the probability of a positive return is a function of the conditional volatility σ_t. As the conditional volatility increases, the probability of a positive return falls, as illustrated in Figure 1 below with μ = 10% and σ_t = 5% and 15%.

In the former case, the probability of a positive return is greater because more of the probability mass lies to the right of the origin. Despite having the same, constant expected return of 10%, the process has a greater chance of generating a positive return in the first case than in the second. Thus volatility dynamics drive sign dynamics.  

 Figure 1
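To make the two cases in Figure 1 concrete, here is a quick check in Python (a sketch that simply evaluates the Normal probability above for the stated parameter values):

from scipy.stats import norm

mu = 0.10                      # mean return, 10%
for sigma in (0.05, 0.15):     # conditional volatility scenarios
    p_positive = norm.cdf(mu / sigma)   # Pr[R > 0] = Phi(mu/sigma)
    print(f"sigma = {sigma:.0%}: Pr[R > 0] = {p_positive:.1%}")
# sigma = 5%:  Pr[R > 0] is roughly 97.7%
# sigma = 15%: Pr[R > 0] is roughly 74.8%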

Email me at jkinlay@investment-analytics.com for a copy of the complete article.



Understanding Stock Price Range Forecasts

Stock Price Range Forecasts

Range forecasts are produced by estimating the parameters of a Geometric Brownian Motion process from historical data and using the model to project a large number of sample paths for the stock price over the coming month and year.
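In outline, the procedure can be sketched as follows in Python (an illustration of the method rather than the production code; the price series and horizons are assumptions):

import numpy as np

def gbm_range_forecast(prices, horizon_days, n_paths=10000):
    # Estimate daily log-drift and volatility from the historical closing prices
    log_ret = np.diff(np.log(prices))
    mu, sigma = log_ret.mean(), log_ret.std(ddof=1)
    s0 = prices[-1]
    # log S_T = log S_0 + sum of T daily log-returns ~ Normal(mu*T, sigma^2*T)
    z = np.random.randn(n_paths)
    s_T = s0 * np.exp(mu * horizon_days + sigma * np.sqrt(horizon_days) * z)
    return {p: np.percentile(s_T, p) for p in (2.5, 25, 75, 97.5)}

# Hypothetical usage with a daily closing-price array nflx_close:
# print(gbm_range_forecast(nflx_close, horizon_days=21))    # roughly one month
# print(gbm_range_forecast(nflx_close, horizon_days=252))   # roughly one year

The 25th and 75th percentiles of the simulated terminal prices give the 50% range, while the 2.5th and 97.5th percentiles give the 95% range.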

For example, this is a range forecast for Netflix, Inc. (NFLX) as at 7/27/2018 when the price of the stock stood at $355.21:

$NFLX

As you can see, the great majority of the simulated price paths trend upwards.  This is typical for most stocks on account of their upward drift, a tendency to move higher over time.  The statistical table below the chart tells you that in 50% of cases the ending stock price 1 month from the date of forecast was in the range $352.15 to $402.49. Similarly, around 50% of the time the price of the stock in one year’s time was found to be in the range $565.01 to $896.69.  Notice that the end points of the one-year range far exceed the end points of the one-month range forecast – again, this is a feature of the upward drift in stocks.

If you want much greater certainty about the outcome, you should look at the 95% ranges.  So, for NFLX, the one month 95% range was projected to be $310.06 to $457.13.  Here, only 1 in 20 of the simulated price paths ended the month outside this range, i.e. either above $457.13 or below $310.06.

Notice that the spread of the one month and one year 95% ranges is much larger than that of the corresponding 50% ranges.  This demonstrates the fundamental tradeoff between “accuracy” (the spread of the range) and “certainty” (the probability of the outcome being within the projected range).  If you want greater certainty of the outcome, you have to allow for a broader span of possibilities, i.e. a wider range.


Uses of Range Forecasts

Most stock analysts tend to produce single price “targets”, rather than a range – these are known as “point forecasts” by econometricians.  So what’s the thinking behind range forecasts?

Range forecasts are arguably more useful than simple point forecasts.  Point forecasts make no guarantee as to the likelihood of the projected price – the only thing we know for sure about such forecasts is that they will be wrong!  Is the forecast target price optimistic or pessimistic?  We have no way to tell.

With range forecasts the situation is very different.  We can talk about the likelihood of a stock being within a specified range at a certain point in time.  If we want to provide a pessimistic forecast for the price of NFLX in one month’s time, for example, we could quote the value $352.15, the lower end of the 50% range forecast.  If we wanted to provide a very pessimistic forecast, one that is very likely to be exceeded, we could quote the bottom of the 95% range: $310.06.

The range also tells us about the future growth prospects for the firm.  So, for example, with NFLX, based on past performance, it is highly likely that the stock price will grow at a rate of more than 2.4% and, optimistically, might increase by almost 3x in the coming year (see the growth rates calculated for the 95% range values).

One specific use of range forecasts is in options trading.  If a trader is bullish on NFLX, instead of buying the stock, he might instead choose to sell one-month put options with a strike price below $352 (the lower end of the 50% one-month range).  If the trader wanted to be more conservative, he might look for put options struck at around $310, the bottom of the 95% range.  A more complex strategy might be to buy calls struck near the top of the 50% range, and sell more calls struck near the top of the 95% range (the theory being that the stock is quite likely to exceed the top of the 50% one-month range, but much less likely to reach the high end of the 95% range).

Limitations of Range Forecasts

Range forecasts are produced by using historical data to estimate the parameters of a particular type of mathematical model, known as a Geometric Brownian Motion process.  For those who are interested in the mechanics of how the forecasts are produced, I have summarized the relevant background theory below.

While there are grounds for challenging the use of such models in this context, it has to be acknowledged that the GBM process is one of the most successful mathematical models in finance today.  The problem lies not so much in the model as in one of the key assumptions underpinning the approach:  specifically, that the characteristics of the stock process will remain as they are today (and as they have been in the historical past).  This assumption is manifestly untenable when applied to many stocks:  a company that was a high-growth $100M start-up is unlikely to demonstrate the same rate of growth ten years later, as a $10Bn enterprise.  A company like Amazon that started out as an online book seller has fundamentally different characteristics today, as an online retail empire.  In such cases, forecasts about the future stock price – whether point or range forecasts – based on outdated historical information are likely to be wrong, sometimes wildly so.

Having said that, there are a great many companies that have evolved to a point of relative stability over a period of perhaps several decades: for example, a company like Caterpillar Inc. (CAT).  In such cases the parameters of the GBM process underpinning the stock price are unlikely to fluctuate widely in the short term, so range forecasts are consequently more likely to be useful.

Other factors to consider are quarterly earnings reports, which can influence stock prices considerably in the short term, and corporate actions (mergers, takeovers, etc.) that can change the long term characteristics of a firm and its stock price process in a fundamental way.  In these situations any forecast methodology is likely to be unreliable, at least for a while, until the event has passed.  It’s best to avoid taking positions based on projections from historical data at times like this.

Review of Background Theory
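For reference, the Geometric Brownian Motion model underpinning the forecasts can be summarized as follows (a textbook statement of the standard theory, which may differ in detail from the original exposition):

$$ dS_t = \mu S_t\,dt + \sigma S_t\,dW_t, \qquad S_t = S_0 \exp\!\left[\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t\right] $$

so that log-returns over any interval Δt are i.i.d. Normal with mean (μ − σ²/2)Δt and variance σ²Δt. The drift μ and volatility σ are estimated from historical log-returns, and the fitted process is simulated forward to generate the monthly and annual range forecasts.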

 


Correlation Cointegration

In a previous post I looked at ways of modeling the relationship between the CBOE VIX Index and the Year 1 and Year 2 CBOE Correlation Indices:

http://jonathankinlay.com/2017/08/modeling-volatility-correlation/

 

The question was put to me whether the VIX and correlation indices might be cointegrated.

Let’s begin by looking at the pattern of correlation between the three indices:

VIX-Correlation1 VIX-Correlation2 VIX-Correlation3

If you recall from my previous post, we were able to fit a linear regression model with the Year 1 and Year 2 Correlation Indices that accounts for around 50% of the variation in the VIX index.  While the model certainly has its shortcomings, as explained in the post, it will serve the purpose of demonstrating that the three series are cointegrated.  The standard Dickey-Fuller test rejects the null hypothesis of a unit root in the residuals of the linear model, confirming that the three series are cointegrated of order one.
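A sketch of the equivalent two-step residual-based test in Python, assuming aligned series vix, cor1 and cor2 (hypothetical variable names):

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

# Step 1: regress the VIX on the Year 1 and Year 2 Correlation Indices
X = sm.add_constant(np.column_stack([cor1, cor2]))
ols = sm.OLS(vix, X).fit()

# Step 2: test the regression residuals for a unit root
adf_stat, p_value, *_ = adfuller(ols.resid)
print(f"ADF statistic {adf_stat:.2f}, p-value {p_value:.4f}")
# A small p-value rejects the unit-root null, consistent with cointegration.
# (Strictly, this residual-based test calls for Engle-Granger critical values
# rather than the standard Dickey-Fuller ones.)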


UnitRootTest

 

Vector Autoregression

We can attempt to take the modeling a little further by fitting a VAR model.  We begin by splitting the data into an in-sample period from Jan 2007 to Dec 2015 and an out-of-sample test period from Jan 2016  to Aug 2017.  We then fit a vector autoregression model to the in-sample data:

VAR Model
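For readers who want to replicate the exercise, a comparable workflow in Python might look like the sketch below, assuming a DataFrame df with hypothetical columns 'VIX', 'COR1' and 'COR2' indexed by date:

from statsmodels.tsa.api import VAR

in_sample = df.loc["2007-01":"2015-12"]
out_sample = df.loc["2016-01":"2017-08"]

model = VAR(in_sample)
results = model.fit(maxlags=12, ic="aic")    # lag order chosen by AIC

# Forecast over the out-of-sample horizon from the end of the fit period
fcast = results.forecast(in_sample.values[-results.k_ar:], steps=len(out_sample))
# Compare fcast with out_sample.values to assess out-of-sample accuracy.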

When we examine how the model performs on the out-of-sample data, we find that it fails to pick up on much of the variation in the series – the forecasts are fairly flat and provide quite poor predictions of the trends in the three series over the period from 2016-2017:

VIX-CorrelationForecast

Conclusion

The VIX and Correlation Indices are not only highly correlated, but also cointegrated, in the sense that a linear combination of the series is stationary.

One can fit a weakly stationary VAR process model to the three series, but the fit is quite poor and forecasts from the model don’t appear to add much value.  It is conceivable that a more comprehensive model involving longer lags would improve forecasting performance.

 

 

Conditional Value at Risk Models

One of the most widely used risk measures is Value-at-Risk, defined as the loss on a portfolio that will not be exceeded at a specified confidence level. In other words, VaR is a percentile of the loss distribution.
But despite its popularity, VaR suffers from well-known limitations: its tendency to underestimate the risk in the (left) tail of the loss distribution and its failure to capture the dynamics of correlation between portfolio components or nonlinearities in the risk characteristics of the underlying assets.


One method of seeking to address these shortcomings is discussed in a previous post, Copulas in Risk Management. Another approach, known as Conditional Value at Risk (CVaR), which seeks to focus on tail risk, is the subject of this post.  We look at how to estimate Conditional Value at Risk in both Gaussian and non-Gaussian frameworks, incorporating loss distributions with heavy tails, and show how to apply the concept in the context of nonlinear time series models such as GARCH.
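As a simple illustration of the two measures (a parametric sketch, not the GARCH-based treatment discussed below), here is how 95% VaR and CVaR might be computed under a Normal and a heavy-tailed Student-t assumption; the return mean, volatility and degrees of freedom are illustrative:

import numpy as np
from scipy.stats import norm, t

alpha = 0.95
mu, sigma = 0.0, 0.02          # illustrative daily return mean and volatility

# Gaussian case: VaR is the loss quantile, CVaR the expected loss beyond it
z = norm.ppf(1 - alpha)
var_n = -(mu + sigma * z)
cvar_n = -(mu - sigma * norm.pdf(z) / (1 - alpha))

# Heavy-tailed case: simulate Student-t returns (scaled to the same volatility)
# and take the empirical tail quantile and tail average
nu = 4
r = mu + sigma * np.sqrt((nu - 2) / nu) * t.rvs(nu, size=500_000)
var_t = -np.percentile(r, 100 * (1 - alpha))
cvar_t = -r[r <= -var_t].mean()

print(f"Normal:    VaR {var_n:.4f}, CVaR {cvar_n:.4f}")
print(f"Student-t: VaR {var_t:.4f}, CVaR {cvar_t:.4f}")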


 

VaR, CVaR and Heavy Tails

 

Metal Logic

Picture1

Precious metals have been in free-fall for several years, as a consequence of the Fed’s actions to stimulate the economy, which have also had the effect of goosing the equity and fixed income markets.  All that changed towards the end of 2015, as the Fed moved to a tightening posture.  So far, 2016 has been a banner year for metals, with spot prices for gold, platinum and silver up 26%, 28% and 44% respectively.

So what are the prospects for metals through the end of the year?  We take a shot at predicting the outcome, from a quantitative perspective.

Picture2

Picture23

  Source: Wolfram Alpha. Spot silver prices are scaled x100

 

Metals as Correlated Processes

One of the key characteristics of metals is the very high levels of price-correlation between them.  Over the period under investigation, Jan 2012 to Aug 2016, the estimated correlation coefficients are as follows:

Picture4

 

A plot of the joint density of spot gold and silver prices indicates low- and high-price regimes in which the metals display similar levels of linear correlation.

Picture24

 

 

Picture5

Simple Metal Trading Models

Levels of correlation that are consistently as high as this over extended periods of time are fairly unusual in financial markets and this presents a potential trading opportunity. One common approach is to use the ratios of metal prices as a trading signal.  However, taking the ratio of gold to silver spot prices as an example, a plot of the series demonstrates that it is highly unstable and susceptible to long term trends.

A more formal statistical test fails to reject the null hypothesis of a unit root.  In simple terms, this means we cannot reliably distinguish between the gold/silver price ratio and a random walk.

Picture6

 

Along similar lines, we might consider the difference in log prices of the series.  If this proved to be stationary, then the log-price series would be cointegrated of order one and we could build a standard pairs trading model to buy or sell the spread when prices move too far out of line.  However, we find once again that the log-price difference can wander arbitrarily far from its mean, and we are unable to reject the null hypothesis that the series contains a unit root.
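Both tests can be sketched in Python as follows, assuming aligned daily spot price series gold and silver (hypothetical pandas Series):

import numpy as np
from statsmodels.tsa.stattools import adfuller

ratio = gold / silver                        # gold/silver price ratio
log_spread = np.log(gold) - np.log(silver)   # difference in log prices

for name, series in [("ratio", ratio), ("log spread", log_spread)]:
    stat, pval, *_ = adfuller(series.dropna())
    print(f"{name}: ADF = {stat:.2f}, p-value = {pval:.3f}")
# Large p-values mean the unit-root null cannot be rejected: neither series
# is reliably distinguishable from a random walk.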


 

Picture7

 

Linear Models

We can hope to do better with a standard linear model, regressing spot silver prices against spot gold prices.  The fit of the best linear model is very good, with an R-sq of over 96%:

Picture8
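In outline, the regression and the two-standard-deviation entry bands described below might be constructed as follows (a sketch rather than the exact specification used here; gold and silver are hypothetical aligned price series):

import statsmodels.api as sm

X = sm.add_constant(gold)          # spot gold prices
fit = sm.OLS(silver, X).fit()      # regress spot silver on spot gold
resid = fit.resid                  # market price minus model price

band = 2 * resid.std()
sell_silver = resid > band         # silver rich relative to the model
buy_silver = resid < -band         # silver cheap relative to the model
print(fit.rsquared, band)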

 

A trader might look to exploit the correlation relationship by selling silver when its market price is greater than the value estimated by the model (and buying when the model price exceeds the market price).  Typically the spread is bought or sold when the log-price differential exceeds a threshold level that is set at twice the standard deviation of the price-difference series.  The threshold levels derive from the assumption of Normality, which in fact does not apply here, as we can see from an examination of the residuals of the linear model:

Picture9 Picture10

 

 

Given the evident lack of fit, especially in the left tail of the distribution, it is unsurprising that all of the formal statistical tests for Normality easily reject the null hypothesis:

Picture11

 

However, Normality, or the lack of it, is not the issue here:  one could just as easily set the 2.5% and 97.5% percentiles of the empirical distribution as trade entry points.  The real problem with the linear model is that it fails to take into account the time dependency in the price series.  An examination of the residual autocorrelations reveals significant patterning, indicating that the model tends to under- or over-estimate the spot price of silver for long periods of time:

Picture12

 

As the following chart shows, the cumulative difference between model and market prices can become very large indeed.  A trader risks going bust waiting for the market to revert to model prices.

Picture13

 

How does one remedy this?  The shortcoming of the simple linear model is that, while it captures the interdependency between the price series very well, it fails to factor in the time dependency of the series. What is required is a model that will account for both features.

 

Multivariate Vector Autoregression Model

Rather than modeling the metal prices individually, or in pairs, we instead adopt a multivariate vector autoregression approach, modeling all three spot price processes together.  The essence of the idea is that spot prices in each metal may be influenced, not only by historical values of the series, but also potentially by current and lagged prices of the other two metals.

Before proceeding we divide the data into two parts: an in-sample data set comprising data from 2012 to the end of 2015 and an out-of-sample period running from Jan-Aug 2016, which we use for model testing purposes.  In what follows, I make the simplifying assumption that a vector autoregressive moving average process of order (1, 1) will suffice for modeling purposes, although in practice one would go through a procedure to test a wide spectrum of possible models incorporating moving average and autoregressive terms of varying dimensions.
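As an illustration of the approach (not the estimation code used for the results that follow), a VARMA(1,1) model can be fitted in Python roughly as follows, assuming a DataFrame metals with daily columns 'gold', 'silver' and 'platinum':

from statsmodels.tsa.statespace.varmax import VARMAX

train = metals.loc["2012":"2015"]          # in-sample period
test = metals.loc["2016-01":"2016-08"]     # out-of-sample period

model = VARMAX(train, order=(1, 1))        # one autoregressive and one moving average lag
results = model.fit(disp=False)
forecast = results.forecast(steps=len(test))
# Compare forecast with test to assess out-of-sample accuracy.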

In any event, our simplified VAR model is estimated as follows:

Picture22

 

The chart below combines the actual, in-sample data from 2012-2015, together with the out-of-sample forecasts for each spot metal from January 2016.

 

 

Pic45

Picture23

It is clear that the model projects a recovery in spot metal prices from the end of 2015.  How did the forecasts turn out?  In the chart below we compare the actual spot prices with the model forecasts, over the period from Jan to Aug 2016.

Picture16

Picture23

 

The actual and forecast percentage change in the spot metal prices over the out-of-sample period are as follows:

Picture18

The VAR model does a good job of forecasting the strong upward trend in metal prices over the first eight months of 2016.  It performs exceptionally well in its forecast of gold prices, although its forecasts for silver and platinum are somewhat over-optimistic.  Nevertheless, investors would have made money taking a long position in any of the metals on the basis of the model projections.

 

Relative Value Trade

Another way to apply the model would be to implement a relative value trade, based on the model’s forecast that silver would outperform gold and platinum.  Indeed, despite the model’s forecast of silver prices turning out to be over-optimistic, a relative value trade in silver vs. gold or platinum would have performed well:  silver gained 44% in the period from Jan-Aug 2016, compared to only 26% for gold and 28% for platinum.  A relative value trade entailing a purchase of silver and simultaneous sale of gold or platinum would have produced a gross return of 17% and 15% respectively.

A second relative value trade indicated by the model forecasts, buying platinum and selling gold, would have turned out less successfully, producing a gross return of less than 2%.  We will examine the reasons for this in the next section.

 

Forecasts and Trading Opportunities Through 2016

If we re-estimate the VAR model using all of the available data through mid-Aug 2016 and project metal prices through the end of the year, the outcome is as follows:

Picture19

Picture23

While the positive trend in all three metals is forecast to continue, the new model (which incorporates the latest data) anticipates lower percentage rates of appreciation going forward:

Picture21

Once again, the model predicts higher rates of appreciation for both silver and platinum relative to gold.  So investors have the option to take a relative value trade, hedging a long position in silver or platinum with a short position in gold.  While the forecasts for all three metals appear reasonable, the projections for platinum strike me as the least plausible.

The reason is that the major applications of platinum are industrial, most often as a catalyst: the metal is used in catalytic converters in automobiles and in the chemical process of converting naphthas into higher-octane gasolines. Although gold is also used in some industrial applications, its demand is not so driven by industrial uses. Consequently, during periods of sustained economic stability and growth, the price of platinum tends to be as much as twice the price of gold, whereas during periods of economic uncertainty, the price of platinum tends to decrease due to reduced industrial demand, falling below the price of gold. Gold prices are more stable in slow economic times, as gold is considered a safe haven.

This is the most likely explanation of why the gold-platinum relative value trade has not worked out as expected hitherto and is perhaps unlikely to do so in the months ahead, as the slowdown in the global economy continues.

Conclusion

We have shown that simple models of the ratio or differential in the prices of precious metals are unlikely to provide a sound basis for forecasting or trading, due to non-stationarity and/or temporal dependencies in the residuals from such models.

On the other hand, a vector autoregression model that models all three price processes simultaneously, allowing both cross correlations and autocorrelations to be captured, performs extremely well in terms of forecast accuracy in out-of-sample tests over the period from Jan-Aug 2016.

Looking ahead over the remainder of the year, our updated VAR model predicts a continuation of the price appreciation, albeit at a slower rate, with silver and platinum expected to continue outpacing gold.  There are reasons to doubt whether the appreciation of platinum relative to gold will materialize, however, due to falling industrial demand as the global economy cools.

 

 

Falling Water

The current 15-year drought in the South West is the most severe since recordkeeping for the Colorado River began in 1906.  Lake Mead, which supplies much of the water to Colorado Basin communities, is now more than half empty.

bathrub

A 120-foot-high band of rock, bleached white by the water and known as the “bathtub ring”, encircles the lake, a stark reminder of the water crisis that has enveloped the surrounding region.  The Colorado River takes a 1,400 mile journey from the Rockies to Mexico, irrigating over 5 million acres of farmland in the Basin states of Wyoming, Utah, Colorado, New Mexico, Nevada, Arizona, and California.

 

map

The Colorado River Compact signed in 1922 enshrined the States’ water rights in law, and Mexico was added to the roster in 1944, taking the total allocation to over 16.5 million acre-feet per year.  But the average freshwater input to the lake over the century from 1906 to 2005 reached only 15 million acre-feet.  The river can’t come close to meeting current demand and the problem is only likely to get worse. A 2009 study found that rainfall in the Colorado Basin could fall by as much as 15% over the next 50 years, and that delivery shortfalls could occur 60% to 90% of the time.

 

 

Impact on Las Vegas

With an average of only 4 inches of rain a year, and daily high temperatures of 103°F during the summer, Las Vegas is perhaps the hardest pressed to meet the demands of its 2 million residents and 40 million annual visitors.

Despite its conspicuous consumption, from the tumbling fountains of the Bellagio to the Venetian’s canals, Las Vegas has been obliged to cut its water use by a third since 2002, from 314 gallons per capita a day to 212. The region recycles around half of its wastewater, which is piped back into Lake Mead after cleaning and treatment.  Residents are allowed to water their gardens no more than one day a week in peak season, and there are stiff fines for noncompliance.


The Third Straw

WATER_Last straw_sidebar 02_1

Historically, two intake pipes carried water from Lake Mead to Las Vegas, about 25 miles to the west. In 2012, realizing that the highest of these, at 1050 feet, would soon be sucking air, the Southern Nevada Water Authority began construction of a new pipeline. Known as the Third Straw, Intake No. 3 reaches 200 feet deeper into the lake—to keep water flowing for as long as there’s water to pump.  The new pipeline, which commenced operations in 2015, doesn’t draw more water from the lake than before, or make the surface level drop any faster. But it will keep taps flowing in Las Vegas homes and casinos even if drought-stricken Lake Mead drops to its lowest levels.

WATER_three images


Modeling Water Levels in Lake Mead

The monthly reported water levels in Lake Mead from Feb 1935 to June 2016 are shown in the chart below. The reference line is the drought level, historically defined as 1,125 feet.

chart1

 

One statistical technique widely applied in hydrology involves fitting a Kumaraswamy distribution to the relative water level.  According to the Arizona Game and Fish Department, the maximum lake level is 1229 feet.  We model the water level relative to the maximum level, as follows.

code1
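For those working in Python rather than Mathematica, an equivalent sketch fits the Kumaraswamy distribution, with density f(x) = a·b·x^(a−1)·(1−x^a)^(b−1) on (0,1), by maximum likelihood; relative_levels is a hypothetical array of monthly water levels divided by the 1,229-foot maximum:

import numpy as np
from scipy.optimize import minimize

def kumaraswamy_negloglik(params, x):
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf
    # log f(x) = log a + log b + (a-1) log x + (b-1) log(1 - x^a)
    return -np.sum(np.log(a) + np.log(b) + (a - 1) * np.log(x)
                   + (b - 1) * np.log1p(-x**a))

res = minimize(kumaraswamy_negloglik, x0=[2.0, 2.0], args=(relative_levels,),
               method="Nelder-Mead")
a_hat, b_hat = res.x

# Probability of a month at or below the 1,075-foot "emergency" level,
# using the Kumaraswamy CDF F(x) = 1 - (1 - x^a)^b
emergency = 1075 / 1229
print(1 - (1 - emergency**a_hat) ** b_hat)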

 

chart2

The fit of the distribution appears quite good, even in the tails:

ProbabilityPlot[relativeLevels, edist]

chart3

 

Since water levels have been below the drought level for some time, let’s instead consider the “emergency” level, 1,075 feet.  According to this model, there is just over a 6% chance of Lake Mead falling to the emergency level in any given month and, consequently, a high probability of breaching the emergency threshold at some point before the end of 2017.

 

code2 code4

chart4

 

One problem with this approach is that it assumes that each observation is drawn independently from a random variable with the estimated distribution.  In reality, there are high levels of autocorrelation in the series, as might be expected:  lower levels last month typically increase the likelihood of lower levels this month.  The chart of the autocorrelation coefficients makes this pattern clear, with statistically significant coefficients at lags of up to 36 months.

ts["ACFPlot"]

chart5

 

 

An alternative methodology that enables us to take account of the autocorrelation in the process is time series analysis.  We proceed to fit an autoregressive moving average (ARMA) model as follows:

tsm = TimeSeriesModelFit[ts, "ARMA"]

arma

The best-fitting model is an ARMA(1,1) model, according to the AIC criterion:

armatable

Applying the fitted ARMA model, we forecast the water level in Lake Mead over the next ten years as shown in the chart below.  Given the mean-reverting moving average component of the model, it is not surprising to see the model forecasting a return to normal levels.

chart6
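A comparable fit and ten-year forecast in Python might look like this (a sketch; levels is a hypothetical monthly series of Lake Mead water levels):

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(levels, order=(1, 0, 1))      # ARMA(1,1) on the undifferenced levels
results = model.fit()
print(results.aic)

forecast = results.get_forecast(steps=120)  # 120 months = 10 years
mean_path = forecast.predicted_mean
conf_int = forecast.conf_int()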

 

There is some evidence of lack of fit in the ARMA model, as shown in the autocorrelations of the model residuals:

chart7

A formal test reveals that residual autocorrelations at lags 4 and higher are jointly statistically significant:

chart8

The slowly decaying pattern of autocorrelations in the water level series suggests a possible “long memory” effect, which can be better modelled as a fractionally integrated process.  The forecasts from such a model, like the ARMA model forecasts, display a tendency to revert to a long term mean; but the reversion process is dampened by the reinforcing, long-memory effect captured in the FARIMA model.

chart9

The Prospects for the Next Decade

Taking the view that the water level in Lake Mead forms a stationary statistical process, the likelihood is that water levels will rise to 1,125 feet or more over the next ten years, easing the current water shortage in the region.

On the other hand, there are good reasons to believe that there are exogenous (deterministic) factors in play, specifically the over-consumption of water at a rate greater than the replenishment rate from average rainfall levels.  Added to this, plausible studies suggest that average rainfall in the Colorado Basin is expected to decline over the next fifty years.  Under this scenario, the water level in Lake Mead will likely continue to deteriorate, unless more stringent measures are introduced to regulate consumption.

 Economic Impact

The drought in the South West affects far more than just the water levels in Lake Mead, of course.  One study found that California’s agriculture sector alone had lost $2.2Bn and some 17,000 seasonal and part-time jobs in 2014, due to drought.  Agriculture uses more than 80% of the State’s water, according to Fortune magazine, which goes on to identify the key industries most affected, including agriculture, food processing, semiconductors, energy, utilities and tourism.

Dry fields and bare trees at Panoche Road, looking west, on Wednesday February 5, 2014, near San Joaquin, CA. California drought has hit the Central Valley hard.

In the energy sector, for example, the loss of hydroelectric power cost California around $1.4Bn in 2014, according to non-profit research group Pacific Institute.  Although Intel pulled its last fabrication plant from California in 2009, semiconductor manufacturing is still a going concern in the state. Maxim Integrated, TowerJazz, and TSI Semiconductors all still have fabrication plants in the state. And they need a lot of water. A single semiconductor fabrication plant can use as much water as a small city. That means the current plants could represent three cities’ worth of consumption.

The drought is also bad news for water utilities, of course. The need to conserve water raises the priority on repair and maintenance, and that means higher costs and lower profits. Complicating the problem, California lacks any kind of management system for its water supply and can’t measure the inflows to and outflows from its groundwater at any particular time.

The Bureau of Reclamation has studied more than two dozen options for conserving and increasing water supply, including importation, desalination and reuse. While some were disregarded for being too costly or difficult, the bureau found that the remaining options, if instituted, could yield 3.7 million acre feet per year in savings and new supplies, increasing to 7 million acre feet per year by 2060.  Agriculture is the biggest user by far and has to be part of any solution. In the near term, the agriculture industry could reduce its use by 10 to 15 percent without changing the types of crops it grows by using new technology, such as using drip irrigation instead of flood irrigation and monitoring soil moisture to prevent overwatering, the Pacific Institute found.

Conclusion

We can anticipate that a series of short term fixes, like the “Third Straw”, will be employed to kick the can down the road as far as possible, but research now appears almost unanimous in finding that drought is having a deleterious, long-term effect on the economies of the South Western states.  Agriculture is likely to have to bear the brunt of the impact, but adverse consequences will also be felt in industries as disparate as food processing, semiconductors and utilities.  California, with by far the largest agricultural industry, is likely to be hardest hit.  The Las Vegas region may be far less vulnerable, having already taken aggressive steps to conserve and reuse water supply and charge economic rents for water usage.

Profit Margins – Are they Predicting a Crash?

Jeremy Grantham: A Bullish Bear

Is Jeremy Grantham, co-founder and CIO of GMO, bullish or bearish these days?  According to Myles Udland at Business Insider, he’s both.  He quotes Grantham:

“I think the global economy and the U.S. in particular will do better than the bears believe it will because they appear to underestimate the slow-burning but huge positive of much-reduced resource prices in the U.S. and the availability of capacity both in labor and machinery.”

Grantham

Udland continues:

“On top of all this is the decline in profit margins, which Grantham has called the “most mean-reverting series in finance,” implying that the long period of elevated margins we’ve seen from American corporations is most certainly going to come to an end. And soon.”

fredgraph

Corporate Profit Margins as a Leading Indicator

The claim is an interesting one.  It certainly looks as if corporate profit margins are mean-reverting and, possibly, predictive of recessionary periods. And there is an economic argument why this should be so, articulated by Grantham as quoted in an earlier Business Insider article by Sam Ro:

“Profit margins are probably the most mean-reverting series in finance, and if profit margins do not mean-revert, then something has gone badly wrong with capitalism.

If high profits do not attract competition, there is something wrong with the system and it is not functioning properly.”

Thomson Research / Barclays Research’s take on the same theme echoes Grantham:

“The link between profit margins and recessions is strong,” Barclays’ Jonathan Glionna writes in a new note to clients. “We analyze the link between profit margins and recessions for the last seven business cycles, dating back to 1973. The results are not encouraging for the economy or the market. In every period except one, a 0.6% decline in margins in 12 months coincided with a recession.”

barclays-margin

Buffett Weighs in

Even Warren Buffett gets in on the act (from 1999):

“In my opinion, you have to be wildly optimistic to believe that corporate profits as a percent of GDP can, for any sustained period, hold much above 6%.”

warren-buffett-477

With the Illuminati chorusing as one on the perils of elevated rates of corporate profits, one would be foolish to take a contrarian view, perhaps.  And yet, that claim of Grantham’s (“probably the most mean-reverting series in finance”) poses a challenge worthy of some analysis.  Let’s take a look.

The Predictive Value of Corporate Profit Margins

First, let’s reproduce the St Louis Fed chart:

CPGDP
Corporate Profit Margins

A plot of the series autocorrelations strongly suggests that the series is not at all mean-reverting, but non-stationary, integrated of order 1:

CPGDPACF
Autocorrelations

 

Next, we conduct an exhaustive evaluation of a wide range of time series models, including seasonal and non-seasonal ARIMA and GARCH:

ModelFit ModelFitResults

The best fitting model (using the AIC criterion) is a simple ARIMA(0,1,0) model, i.e. a random walk, integrated of order 1, as anticipated.  The series is apparently difference-stationary, with no mean-reversion characteristics at all.  Diagnostic tests indicate no significant patterning in the model residuals:

ModelACF
Residual Autocorrelations
LjungPlot
Ljung-Box Test Probabilities

Using the model to forecast a range of possible values of the Corporate Profit to GDP ratio over the next 8 quarters suggests a very wide range, from as low as 6% to as high as 13%!

Forecast
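The exercise can be sketched in Python as follows, assuming cp_gdp is a quarterly series of the corporate-profits-to-GDP ratio (for example, downloaded from FRED):

from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

print(adfuller(cp_gdp)[1])                  # a large p-value: cannot reject a unit root

model = ARIMA(cp_gdp, order=(0, 1, 0))      # a random walk in levels, as selected by AIC
results = model.fit()
forecast = results.get_forecast(steps=8)    # the next 8 quarters
print(forecast.conf_int(alpha=0.05))        # a wide 95% interval, as in the chart above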

 

CONCLUSION

The opinion of investment celebrities like Grantham and Buffett to the contrary, there really isn’t any evidence in the data to support the suggestion that corporate profit margins are mean reverting, even though common-sense economics suggests they should be.

The best-available econometric model produces a very wide range of forecasts of corporate profit rates over the next two years, some even higher than they are today.

If a recession is just around the corner,  corporate profit margins aren’t going to call it for us.

Alpha Extraction and Trading Under Different Market Regimes

Market Noise and Alpha Signals

One of the perennial problems in designing trading systems is noise in the data, which can often drown out an alpha signal.  This in turn creates difficulties for a trading system that relies on reading the signal, resulting in greater uncertainty about the trading outcome (i.e. greater volatility in system performance).  According to academic research, a great deal of market noise is caused by trading itself.  There is apparently not much that can be done about that problem:  sure, you can trade after hours or overnight, but the benefit of lower signal contamination from noise traders is offset by the disadvantage of poor liquidity.  Hence the thrust of most of the analysis in this area lies in the direction of trying to amplify the signal, often using techniques borrowed from signal processing and related engineering disciplines.

There is, however, one trick that I wanted to share with readers that is worth considering.  It allows you to trade during normal market hours, when liquidity is greatest, but at the same time limits the impact of market noise.


Quantifying Market Noise

How do you measure market noise?  One simple approach is to start by measuring market volatility, making the not-unreasonable assumption that higher levels of volatility are associated with greater amounts of random movement (i.e. noise). Conversely, when markets are relatively calm, a greater proportion of the variation is caused by alpha factors.  During the latter periods, there is greater information content in market data – the signal:noise ratio is larger and hence the alpha signal can be quantified and captured more accurately.

For a market like the E-Mini futures, the variation in daily volatility is considerable, as illustrated in the chart below.  The median daily volatility is 1.2%, while the maximum value (in 2008) was 14.7%!

Fig1

The extremely long tail of the distribution stands out clearly in the following histogram plot.

Fig 2

Obviously there are times when the noise in the process is going to drown out almost any alpha signal. What if we could avoid such periods?

Noise Reduction and Model Fitting

Let’s divide our data into two subsets of equal size, comprising days on which volatility was lower, or higher, than the median value.  Then let’s go ahead and use our alpha signal(s) to fit a trading model, using only data drawn from the lower volatility segment.

This is actually a little tricky to achieve in practice:  most software packages for time series analysis or charting are geared towards data occurring at equally spaced points in time.  One useful trick here is to replace the actual date and time values of the observations with sequential date and time values, in order to fool the software into accepting the data, since there are no longer any gaps in the timestamps.  Of course, the dates on our time series plot or chart will be incorrect, but that doesn’t matter, as long as we keep track of what the correct timestamps are.
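Here is a sketch of that data-preparation step in Python (illustrative column names and frequencies): compute a daily volatility figure, keep only the quieter half of the days, and replace the timestamps with a sequential index so the software sees no gaps:

import pandas as pd

# bars: 3-minute OHLC DataFrame with a DatetimeIndex and a 'close' column (assumed)
ret = bars["close"].pct_change()
grouped = ret.groupby(ret.index.date)
daily_vol = grouped.std() * grouped.count() ** 0.5   # approximate daily volatility from bar returns

quiet_days = daily_vol[daily_vol <= daily_vol.median()].index
quiet_bars = bars[pd.Series(bars.index.date, index=bars.index).isin(quiet_days)].copy()

# Replace real timestamps with a sequential index so charting/modeling software
# sees no gaps; keep the originals so trades can be mapped back to actual dates.
quiet_bars["actual_time"] = quiet_bars.index
quiet_bars.index = pd.date_range("2004-01-01", periods=len(quiet_bars), freq="3min")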

An example of such a system is illustrated below.  The model was fitted to 3-minute bar data in E-Mini futures, but only on days with market volatility below the median value, in the period from 2004 to 2015.  The strategy equity curve is exceptionally smooth, as might be expected, and the performance characteristics of the strategy are highly attractive, with a 27% annual rate of return, profit factor of 1.58 and Sharpe Ratio approaching double digits.

Fig 3

Fig 4

Dealing with the Noisy Trading Days

Let’s say you have developed a trading system that works well on quiet days.  What next?  There are a couple of ways to go:

(i) Deploy the model only on quiet trading days; stay out of the market on volatile days; or

(ii) Develop a separate trading system to handle volatile market conditions.

Which approach is better?  It is likely that the system you develop for trading quiet days will outperform any system you manage to develop for volatile market conditions.  So, arguably, you should simply trade your best model when volatility is muted and avoid trading at other times.  Any other solution may reduce the overall risk-adjusted return.  But that isn’t guaranteed to be the case – and, in fact, I will give an example of systems that, when combined, will in practice yield a higher information ratio than any of the component systems.

Deploying the Trading Systems

The astute reader is likely to have noticed that I have “cheated” by using forward information in the model development process.  In building a trading system based only on data drawn from low-volatility days, I have assumed that I can somehow know in advance whether the market is going to be volatile or not, on any given day.  Of course, I don’t know for sure whether the upcoming session is going to be volatile and hence whether to deploy my trading system, or stand aside.  So is this just a purely theoretical exercise?  No, it’s not, for the following reasons.

The first reason is that, unlike the underlying asset market, the market volatility process is, by comparison, highly predictable.  This is due to a phenomenon known as “long memory”, i.e. very slow decay in the serial autocorrelations of the volatility process.  What that means is that the history of the volatility process contains useful information about its likely future behavior.  [There are several posts on this topic in this blog – just search for “long memory”].  So, in principle, one can develop an effective system to forecast market volatility in advance and hence make an informed decision about whether or not to deploy a specific model.

But let’s say you are unpersuaded by this argument and take the view that market volatility is intrinsically unpredictable.  Does that make this approach impractical?  Not at all.  You have a couple of options:

You can test the model built for quiet days on all the market data, including volatile days.  It may perform acceptably well across both market regimes.

For example, here are the results of a backtest of the model described above on all the market data, including volatile and quiet periods, from 2004-2015.  While the performance characteristics are not quite as good, overall the strategy remains very attractive.

Fig 5

Fig 6

 

Another approach is to develop a second model for volatile days and deploy both low- and high-volatility regime models simultaneously.  The trading systems will interact (if you allow them to) in a highly nonlinear and unpredictable way.  It might turn out badly – but on the other hand, it might not!  Here, for instance, is the result of combining low- and high-volatility models simultaneously for the Emini futures and running them in parallel.  The result is an improvement (relative to the low volatility model alone), not only in the annual rate of return (21% vs 17.8%), but also in the risk-adjusted performance, profit factor and average trade.

Fig 7

Fig 8

 

CONCLUSION

Separating the data into multiple subsets representing different market regimes allows the system developer to amplify the signal:noise ratio, increasing the effectiveness of his alpha factors. Potentially, this allows important features of the underlying market dynamics to be captured in the model more easily, which can lead to improved trading performance.

Models developed for different market regimes can be tested across all market conditions and deployed on an everyday basis if shown to be sufficiently robust.  Alternatively, a meta-strategy can be developed to forecast the market regime and select the appropriate trading system accordingly.

Finally, it is possible to achieve acceptable, or even very good results, by deploying several different models simultaneously and allowing them to interact, as the market moves from regime to regime.