Marco Avellaneda∗† and Jeong-Hyun Lee∗
First draft: July 11, 2008
This version: June 15, 2009
Abstract

We study model-driven statistical arbitrage in U.S. equities. The trading signals are generated in two ways: using Principal Component Analysis and using sector ETFs. In both cases, we consider the residuals, or idiosyncratic components of stock returns, and model them as mean-reverting processes. This leads naturally to “contrarian” trading signals. The main contribution of the paper is the construction, back-testing and comparison of market-neutral PCA- and ETF-based strategies applied to the broad universe of U.S. stocks. Back-testing shows that, after accounting for transaction costs, PCA-based strategies have an average annual Sharpe ratio of 1.44 over the period 1997 to 2007, with much stronger performances prior to 2003. During 2003-2007, the average Sharpe ratio of PCA-based strategies was only 0.9. Strategies based on ETFs achieved a Sharpe ratio of 1.1 from 1997 to 2007, experiencing a similar degradation after 2002. We also introduce a method to account for daily trading volume information in the signals (which is akin to using “trading time” as opposed to calendar time), and observe significant improvement in performance in the case of ETF-based signals. ETF strategies which use volume information achieve a Sharpe ratio of 1.51 from 2003 to 2007. The paper also relates the performance of mean-reversion statistical arbitrage strategies with the stock market cycle. In particular, we study in detail the performance of the strategies during the liquidity crisis of the summer of 2007. We obtain results which are consistent with Khandani and Lo (2007) and validate their “unwinding” theory for the quant fund drawdown of August 2007.

∗ Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, N.Y. 10012 USA
† Finance Concepts, 49-51 Avenue Victor-Hugo, 75116 Paris, France.
The term statistical arbitrage encompasses a variety of strategies and investment programs. Their common features are: (i) trading signals are systematic, or rules-based, as opposed to driven by fundamentals, (ii) the trading book is market-neutral, in the sense that it has zero beta with the market, and (iii) the mechanism for generating excess returns is statistical. The idea is to make many bets with positive expected returns, taking advantage of diversification across stocks, to produce a low-volatility investment strategy which is uncorrelated with the market. Holding periods range from a few seconds to days, weeks or even longer.

Pairs-trading is widely assumed to be the “ancestor” of statistical arbitrage. If stocks P and Q are in the same industry or have similar characteristics (e.g. Exxon Mobil and ConocoPhillips), one expects the returns of the two stocks to track each other after controlling for beta. Accordingly, if $P_t$ and $Q_t$ denote the corresponding price time series, then we can model the system as

$$\ln(P_t/P_{t_0}) = \alpha (t - t_0) + \beta \ln(Q_t/Q_{t_0}) + X_t \tag{1}$$

or, in its differential version,

$$\frac{dP_t}{P_t} = \alpha\,dt + \beta\,\frac{dQ_t}{Q_t} + dX_t, \tag{2}$$
where $X_t$ is a stationary, or mean-reverting, process. This process will be referred to as the cointegration residual, or residual, for short, in the rest of the paper. In many cases of interest, the drift α is small compared to the fluctuations of $X_t$ and can therefore be neglected. This means that, after controlling for beta, the long-short portfolio oscillates near some statistical equilibrium. The model suggests a contrarian investment strategy in which we go long 1 dollar of stock P and short β dollars of stock Q if $X_t$ is small and, conversely, go short P and long Q if $X_t$ is large. The portfolio is expected to produce a positive return as valuations converge (see Pole (2007) for a comprehensive review of statistical arbitrage and co-integration). The mean-reversion paradigm is typically associated with market...
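
As an illustration of the pairs model above, the following minimal sketch simulates two price series linked by a mean-reverting residual, estimates α and β by ordinary least squares on the log-price relation, and turns the fitted residual into a contrarian signal. Everything specific here is an assumption made for the example and is not taken from the paper: the AR(1) form of the residual, the simulation parameters, the z-score normalization, the entry threshold of 1.0, and the helper names `fit_pair` and `contrarian_signal`.

```python
import numpy as np


def fit_pair(p, q, dt=1.0):
    """OLS fit of ln(P_t/P_t0) = alpha*(t - t0) + beta*ln(Q_t/Q_t0) + X_t.

    Returns the estimated (alpha, beta) and the fitted residual series X_t.
    """
    y = np.log(p / p[0])
    x = np.log(q / q[0])
    t = dt * np.arange(len(p))
    design = np.column_stack([t, x])  # no intercept: both sides are 0 at t0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    alpha, beta = coef
    resid = y - design @ coef
    return alpha, beta, resid


def contrarian_signal(resid, threshold=1.0):
    """Return +1 (long 1 dollar of P, short beta dollars of Q),
    -1 (short P, long Q) or 0 (no position).

    The z-score and the threshold are illustrative assumptions.
    """
    z = (resid[-1] - resid.mean()) / resid.std()
    if z < -threshold:
        return 1   # residual is small: buy P, sell Q
    if z > threshold:
        return -1  # residual is large: sell P, buy Q
    return 0


# Synthetic data (all parameters hypothetical).
rng = np.random.default_rng(0)
n, true_beta = 2000, 1.3

dq = 0.01 * rng.standard_normal(n)
dq[0] = 0.0
q_log = np.cumsum(dq)                 # log-price of Q: random walk

eps = 0.002 * rng.standard_normal(n)
x_res = np.zeros(n)
for i in range(1, n):                 # mean-reverting residual, AR(1)
    x_res[i] = 0.95 * x_res[i - 1] + eps[i]

p_log = true_beta * q_log + x_res     # drift alpha set to zero
p = 50.0 * np.exp(p_log)
q = 40.0 * np.exp(q_log)

alpha_hat, beta_hat, resid = fit_pair(p, q)
print("alpha_hat:", alpha_hat, "beta_hat:", beta_hat)
print("signal:", contrarian_signal(resid))
```

The regression omits an intercept because both sides of the log-price relation vanish at t0. In practice the estimation window, the threshold, and the treatment of the drift would all need to be chosen and validated on real data.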