# Deriving the Linear Regression Solution

In deriving the linear regression solution, we will be taking a closer look at how we “solve” the common linear regression, i.e., finding $\beta$ in $y = X\beta + \epsilon$.

I mention “common” because there are actually several ways to get an estimate for $\beta$, depending on the assumptions you make about your data and how you correct for various anomalies. “Common” in this case refers specifically to ordinary least squares. For this case, I assume you already know the punch line: $\beta = (X^{T}X)^{-1}X^{T}y$. What we’re really interested in is how to get to that point.

The crux is that you’re trying to find a solution $\beta$ that minimizes the sum of the squared errors, i.e., $\min\limits_{\beta} \: \epsilon^{T}\epsilon$. We can find the minimum by taking the derivative and setting it to zero, i.e., $\frac{d}{d\beta} \epsilon^{T}\epsilon = 0$.

In deriving the linear regression solution, it helps to remember two identities. For the derivative of a product of two vectors, the product rule states that $\frac{d}{dx}u^{T}v = u^{T}\frac{d}{dx}v + v^{T}\frac{d}{dx}u$. And for the matrix transpose, $(AB)^{T} = B^{T}A^{T}$.

Observe that $y = X\beta + \epsilon \implies \epsilon = y - X\beta$. As such, $\frac{d}{d\beta} \epsilon^{T}\epsilon = \frac{d}{d\beta} (y-X\beta)^{T}(y-X\beta)$.

Working it out,
$$
\begin{aligned}
\frac{d}{d\beta} \epsilon^{T}\epsilon
&= \frac{d}{d\beta} (y-X\beta)^{T}(y-X\beta) \\
&= (y-X\beta)^{T} \frac{d}{d\beta}(y-X\beta) + (y-X\beta)^{T}\frac{d}{d\beta}(y-X\beta) \\
&= (y-X\beta)^{T}(-X) + (y-X\beta)^{T}(-X) \\
&= -2(y-X\beta)^{T}X \\
&= -2(y^{T} - \beta^{T}X^{T})X \\
&= -2(y^{T}X - \beta^{T}X^{T}X)
\end{aligned}
$$

By setting the derivative to zero and solving for $\beta$, we can find the $\beta$ that minimizes the sum of squared errors.
$$
\begin{aligned}
\frac{d}{d\beta} \epsilon^{T}\epsilon = 0
&\implies -2(y^{T}X - \beta^{T}X^{T}X) = 0 \\
&\implies y^{T}X - \beta^{T}X^{T}X = 0 \\
&\implies y^{T}X = \beta^{T}X^{T}X \\
&\implies (y^{T}X)^{T} = (\beta^{T}X^{T}X)^{T} \\
&\implies X^{T}y = X^{T}X\beta \\
&\implies (X^{T}X)^{-1}X^{T}y = (X^{T}X)^{-1}(X^{T}X)\beta \\
&\implies \beta = (X^{T}X)^{-1}X^{T}y
\end{aligned}
$$

Without too much difficulty, we saw how we arrived at the linear regression solution of $\beta = (X^{T}X)^{-1}X^{T}y$. The general path to that derivation is to recognize that you’re trying to minimize the sum of squared errors ($\epsilon^{T}\epsilon$), which can be done by finding the derivative of $\epsilon^{T}\epsilon$, setting it to zero, and then solving for $\beta$.
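
As a sanity check, the closed-form solution can be verified numerically against R’s built-in lm() on simulated data. This is a minimal sketch; the variable names and simulated values are mine:

```r
# Simulate a small regression problem: y = X %*% beta + noise
set.seed(42)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))      # design matrix with an intercept column
beta.true <- c(2, -1, 0.5)
y <- X %*% beta.true + rnorm(n, sd=0.1)

# Closed-form OLS: beta = (X'X)^{-1} X'y
beta.hat <- drop(solve(t(X) %*% X) %*% t(X) %*% y)

# lm() should agree (intercept suppressed since X already carries one)
beta.lm <- unname(coef(lm(y ~ X - 1)))
max(abs(beta.hat - beta.lm))           # numerically zero
```

The two estimates agree to machine precision, which is exactly what the derivation above promises.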

# Mean-Variance Portfolio Optimization with R and Quadratic Programming

The following is a demonstration of how to use R to do quadratic programming in order to do mean-variance portfolio optimization under different constraints, e.g., no leverage, no shorting, max concentration, etc.

Taking a step back, it’s probably helpful to realize the point of all of this. In the 1950s, Harry Markowitz introduced what we now call Modern Portfolio Theory (MPT), which is a mathematical formulation for diversification. Intuitively, because some stocks zig when others zag, when we hold a portfolio of these stocks, our portfolio can achieve some notional return at a lower variance than holding the stocks outright. More specifically, given a basket of stocks, there exists a notion of an efficient frontier. I.e., for any return you choose, there exists a portfolio with the lowest variance, and for any variance you fix, there exists a portfolio with the greatest return. Any portfolio you choose that is not on this efficient frontier is considered sub-optimal (for a given return, why would you choose a higher-variance portfolio when a lower-variance one exists?).

The question becomes: given a selection of stocks to choose from, how much do we invest in each stock, if at all?

In an investments course I took a while back, we worked out the solution in Excel for the case of a basket of three stocks. Obviously, that solution isn’t really scalable beyond the N=3 case. When asked about extending N to an arbitrary number, the behind-schedule professor did some handwaving about matrix math. Looking into this later, there does exist a closed-form equation for determining the holdings for an arbitrary basket of stocks. However, the math gets more complicated with each constraint you decide to tack on (e.g., no leverage).

The happy medium between “portfolio optimizer in Excel for three stocks” and “hardcore matrix math for an arbitrary number of stocks” is to use a quadratic programming solver. Some context is needed to see why this is the case.

According to Wikipedia, quadratic programming attempts to minimize a function of the form $\frac{1}{2}x^{T}Qx + c^{T}x$ subject to one or more constraints of the form $Ax \le b$ (inequality) or $Ex = d$ (equality).

## Modern Portfolio Theory
The mathematical formulation of MPT is that for a given risk tolerance $q \in [0,\infty)$, we can trace out the efficient frontier by minimizing $w^{T} \Sigma w - qR^{T}w$.

Where,

• $w$ is a vector of holding weights such that $\sum w_i = 1$
• $\Sigma$ is the covariance matrix of the returns of the assets
• $q \ge 0$ is the “risk tolerance”: $q = 0$ works to minimize portfolio variance and $q = \infty$ works to maximize portfolio return
• $R$ is the vector of expected returns
• $w^{T} \Sigma w$ is the variance of portfolio returns
• $R^{T} w$ is the expected return on the portfolio

Introducing quadratic programming before mean-variance optimization was clearly a setup: note the equivalence between $\frac{1}{2}x^{T}Qx + c^{T}x$ and $w^{T} \Sigma w - qR^{T}w$.
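
To make the mapping explicit (a small bookkeeping step, using the symbols from both formulations):

$$w^{T}\Sigma w - qR^{T}w = \frac{1}{2}w^{T}(2\Sigma)w + (-qR)^{T}w$$

so $Q = 2\Sigma$ and $c = -qR$. And since scaling the objective by a positive constant doesn’t change the minimizer, feeding a solver $\Sigma$ together with a suitably rescaled return vector yields the same weights.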

## Quadratic Programming in R
solve.QP, from the quadprog package, is a good choice for a quadratic programming solver. From the documentation, it minimizes problems of the form $-d^{T}b + \frac{1}{2} b^{T}Db$ subject to the constraints $A^{T}b \ge b_0$. Pedantically, note the variable mapping: $D$ plays the role of $\Sigma$ (up to a factor of two) and $d$ plays the role of $qR$.

The fun begins when we have to modify $A^{T}b \ge b_0$ to impose the constraints we’re interested in.

I went to Google Finance and downloaded historical data for all of the sector SPDRs, e.g., XLY, XLP, XLE, XLF. I’ve named the files in the format dat.{SYMBOL}.csv. The R code loads the files, formats them, and ultimately creates a data frame where each column is a symbol and each row is an observation (close-to-close log return).

The data is straightforward enough, with approximately 13 years’ worth:

> dim(dat.ret)
[1] 3399    9
XLB         XLE          XLF         XLI          XLK
[1,]  0.010506305  0.02041755  0.014903406 0.017458395  0.023436164
[2,]  0.022546751 -0.00548872  0.006319802 0.013000812 -0.003664126
[3,] -0.008864066 -0.00509339 -0.013105239 0.004987542  0.002749353
XLP          XLU          XLV          XLY
[1,]  0.023863921 -0.004367553  0.022126545  0.004309507
[2,] -0.001843998  0.018349139  0.006232977  0.018206972
[3,] -0.005552485 -0.005303294 -0.014473165 -0.009255754
>


## Mean-Variance Optimization with Sum of Weights Equal to One
If it wasn’t clear before, we typically fix the $q$ in $w^{T} \Sigma w - qR^{T}w$ before optimizing. By permuting the value of $q$, we then generate the efficient frontier. For these examples, we’ll set $q = 0.5$.

solve.QP’s arguments are:

solve.QP(Dmat, dvec, Amat, bvec, meq=0, factorized=FALSE)

Dmat (covariance) and dvec (penalized returns) are generated easily enough:

risk.param <- 0.5
Dmat <- cov(dat.ret)
dvec <- colMeans(dat.ret) * risk.param


Amat and bvec are part of the inequality (or equality) you can impose, i.e., $A^{T}b \ge b_0$. meq is an integer argument that specifies "how many of the first meq constraints are equality statements instead of inequality statements." The default for meq is zero.

By construction, you need to think of the constraints in terms of matrix math. E.g., to have all the weights sum to one, Amat needs to contain a column of ones and bvec needs to contain a single value of one. Additionally, since it's an equality constraint, meq needs to be one.

In R code:

# Constraints: sum(x_i) = 1
Amat <- matrix(1, nrow=nrow(Dmat))
bvec <- 1
meq <- 1
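
To convince yourself that this encoding does what we want, you can multiply it out on a toy weight vector. This is a minimal sketch in base R; n.assets and the equal-weight vector are made up and independent of the data above:

```r
# Full-investment constraint for, say, 9 assets: t(Amat) %*% w must equal 1
n.assets <- 9
Amat <- matrix(1, nrow=n.assets)   # a single column of ones
bvec <- 1

w <- rep(1/n.assets, n.assets)     # an equal-weight vector satisfying the constraint
drop(t(Amat) %*% w)                # evaluates to sum(w), i.e., 1
```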


Having instantiated all the arguments for solve.QP, it's relatively straightforward to invoke it. Multiple things are output, e.g., the constrained solution, the unconstrained solution, the number of iterations taken to solve, etc. For our purposes, we're primarily interested in the solution.

> qp <- solve.QP(Dmat, dvec, Amat, bvec, meq)
> qp$solution
[1] -0.1489193  0.6463653 -1.0117976  0.4107733 -0.4897956  0.2612327 -0.1094819
[8]  0.5496478  0.8919753


Things to note in the solution are that we have negative values (shorting is allowed) and that there exists at least one weight whose absolute value is greater than one (leverage is allowed).

## Mean-Variance Optimization with Sum of Weights Equal to One and No Shorting
We need to modify Amat and bvec to add the constraint of no shorting. In matrix terms, we append a diagonal matrix of ones to Amat and a vector of zeros to bvec; when the matrix multiplication is carried out, each weight is required to be greater than or equal to zero.

# Constraints: sum(x_i) = 1 & x_i >= 0
Amat <- cbind(1, diag(nrow(Dmat)))
bvec <- c(1, rep(0, nrow(Dmat)))
meq <- 1
qp <- solve.QP(Dmat, dvec, Amat, bvec, meq)
qp$solution[abs(qp$solution) <= 1e-7] <- 0

> qp$solution
[1] 0.0000000 0.4100454 0.0000000 0.0000000 0.0000000 0.3075880 0.0000000
[8] 0.2823666 0.0000000


Note that with the constraints that all the weights sum up to one and that the weights are positive, we've implicitly also constrained the solution to have no leverage.
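
Given a solved weight vector like the one above, turning it into portfolio statistics is a one-liner each. The sketch below uses a made-up two-asset covariance matrix and return vector rather than the SPDR data, and annualizes by the usual 252 trading days:

```r
# Hypothetical daily inputs for a two-asset portfolio (illustrative values only)
w     <- c(0.6, 0.4)                               # holding weights, sum to 1
mu    <- c(0.0004, 0.0002)                         # expected daily returns, R
Sigma <- matrix(c(1e-4, 2e-5, 2e-5, 8e-5), 2, 2)   # daily covariance matrix

# Daily portfolio return R'w and volatility sqrt(w' Sigma w)
port.ret <- sum(mu * w)
port.sd  <- sqrt(drop(t(w) %*% Sigma %*% w))

# Annualized figures, assuming 252 trading days
port.ret.ann <- port.ret * 252
port.sd.ann  <- port.sd * sqrt(252)
```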

## Mean-Variance Optimization with Sum of Weights Equal to One, No Shorting, and No Heavy Concentration
Looking at the previous solution, note that one of the weights suggests putting 41% of the portfolio into a single asset. We may not be comfortable with such a heavy allocation, so we might impose the additional constraint that no single asset takes up more than 15% of the portfolio. In math, and alongside our existing constraints, that's the same as saying $-x_i \ge -0.15$, which is equivalent to $x_i \le 0.15$.

# Constraints: sum(x_i) = 1 & x_i >= 0 & x_i <= 0.15
Amat <- cbind(1, diag(nrow(Dmat)), -1*diag(nrow(Dmat)))
bvec <- c(1, rep(0, nrow(Dmat)), rep(-0.15, nrow(Dmat)))
meq <- 1
qp <- solve.QP(Dmat, dvec, Amat, bvec, meq)
qp$solution[abs(qp$solution) <= 1e-7] <- 0

> qp$solution
[1] 0.1092174 0.1500000 0.0000000 0.1407826 0.0000000 0.1500000 0.1500000
[8] 0.1500000 0.1500000


## Turning the Weights into Expected Portfolio Return and Expected Portfolio Volatility
With our weights, we can now calculate the portfolio return as $R^{T}w$ and the portfolio volatility as $\sqrt{w^{T} \Sigma w}$. Doing this, we might note that the values look "small" and not what we expected. Keep in mind that our observations are in daily space, and thus our expected return is an expected daily return and our expected volatility is an expected daily volatility. You will need to annualize them, i.e., $R^{T}w \times 252$ and $\sqrt{w^{T} \Sigma w \times 252}$.

The following is an example of the values of the weights and the portfolio statistics while permuting the risk parameter and solving the quadratic programming problem with the constraints that the weights sum to one and that there's no shorting.

> head(ef.w)
      XLB       XLE XLF XLI XLK XLP XLU       XLV        XLY
1       0 0.7943524   0   0   0   0   0 0.1244543 0.08119329
1.005   0 0.7977194   0   0   0   0   0 0.1210635 0.08121713
1.01    0 0.8010863   0   0   0   0   0 0.1176727 0.08124097
1.015   0 0.8044533   0   0   0   0   0 0.1142819 0.08126480
1.02    0 0.8078203   0   0   0   0   0 0.1108911 0.08128864
1.025   0 0.8111873   0   0   0   0   0 0.1075003 0.08131248
> head(ef.stat)
             ret        sd
1     0.06663665 0.2617945
1.005 0.06679809 0.2624120
1.01  0.06695954 0.2630311
1.015 0.06712098 0.2636519
1.02  0.06728243 0.2642742
1.025 0.06744387 0.2648981
>

Note that as we increase the risk parameter, we're working to maximize return at the expense of risk. While obvious, it's worth stating that we're looking at the efficient frontier: if you plot ef.stat in its entirety with return on one axis and risk on the other, you get the efficient frontier.

## Wrap Up
I've demonstrated how to use R and the quadprog package to do quadratic programming. It also happens that the mean-variance portfolio optimization problem lends itself naturally to quadratic programming.
It's relatively straightforward to do the variable mapping between the two problems. The only potential gotcha is stating your desired constraints in the form $A^{T}b \ge b_{0}$, but several examples of constraints were given above, from which you can hopefully extrapolate.

Getting away from the mechanics and talking about the theory, I'll also offer that there are some serious flaws with the approach demonstrated here if you attempt to implement it for your own trading. Specifically, you will most likely want to create return forecasts and risk forecasts instead of using historical values only. You might also want to impose constraints that induce sparsity in what you actually hold, in order to minimize transaction costs. And in saying that your portfolio is mean-variance optimal, there's the assumption that the returns you're working with are normal, which is definitely not the case. These and additional considerations will need to be handled before you let this run in "production."

All that being said, Markowitz's mean-variance optimization is the building block for whatever more robust solution you might end up coming up with. An understanding of both the theory and the implementation of mean-variance optimization is needed before you can progress.

# You are Horrible at Market Timing

You are horrible at market timing. Don't even attempt it. I probably can't convince you of how horrible you are, but hopefully some empirical data analysis will show that you, like the majority of people, are no good at market timing.

Recently a friend came to me lamenting a recent stock purchase: the stock had gone down since he bought it, and he should have waited to buy it even cheaper. This reminded me of an anecdote from a professor in my econometrics class. I was taking the class in late 2008, which, if you don't remember, was right in the midst of the major financial collapse, with all the major indices taking a huge nose dive.
Students being students, somebody asked the professor what he thought about the collapse and what he was doing in his own personal account. Keep in mind the tone was what a "normal" person does, not what a 1000-person hedge fund does. He referred to a past study which showed that most recoveries in the equities space didn't come from steady returns but instead were concentrated in a few, infrequently-spaced days. That is, there was no way for you to catch the recoveries unless you were already invested the day before. And if you were sitting on cash, saw the move happen, and attempted to then get into the market, it would have been too late for you.

I decided to (very) roughly replicate this purported study for my friend. I first went to Google to download daily prices for SPY. They provide a nice facility for exporting the data to csv format. The data is relatively straightforward.

Date,Open,High,Low,Close,Volume
29-May-12,133.16,133.92,132.75,133.70,32727593
25-May-12,132.48,132.85,131.78,132.10,28450945
24-May-12,132.67,132.84,131.42,132.53,31309748
...

I wrote some R code to read in this data and to trim out days that didn't have an open, which left me with observations starting on 2000/01/03 and ~3100 data points. Additionally, I created log returns for each day's open to close, i.e., $\log(p_{close}) - \log(p_{open})$.

# Get the data
xx <- read.table(file="~/tmp/spy.data.csv", header=T, sep=",", as.is=T)
names(xx) <- c("date", "open", "high", "low", "close", "vlm")

# Get date in ymd format
xx$ymd <- as.numeric(strftime(as.Date(xx$date, "%d-%b-%y"), "%Y%m%d"))
xx <- xx[, names(xx)[-1]]
xx <- xx[, c(ncol(xx), 1:(ncol(xx)-1))]

# We want to work with complete data
xx <- xx[xx$open != 0,]

# I prefer low dates first rather than high dates
xx <- xx[order(xx$ymd),]
rownames(xx) <- 1:nrow(xx)

# Getting open to close
xx$o2c <- log(xx$close) - log(xx$open)
xx <- xx[!is.infinite(xx$o2c),]

Getting the top 10 return days is relatively straightforward. Note that, finger in the wind, a lot of the top 10 return days came at the end of 2008, when presumably a lot of people had decided to move their money into cash out of fear.

> head(xx[order(-xx$o2c),], n=10)
ymd   open   high    low  close       vlm        o2c
635  20020724  78.14  85.12  77.68  84.72    671400 0.08084961
2202 20081013  93.87 101.35  89.95 101.35   2821800 0.07666903
2213 20081028  87.34  94.24  84.53  93.76  81089900 0.07092978
2225 20081113  86.13  91.73  82.09  91.17 753800996 0.05686811
2234 20081126  84.30  89.19  84.24  88.97 370320441 0.05391737
2019 20080123 127.09 134.19 126.84 133.86  53861000 0.05189898
248  20010103 128.31 136.00 127.66 135.00  17523900 0.05082557
2241 20081205  83.65  88.42  82.24  87.93 471947872 0.04989962
2239 20081203  83.40  87.83  83.14  87.32 520103726 0.04593122
2315 20090323  78.74  82.29  78.31  82.22 420247245 0.04324730


Emphasizing this point further: if you didn't have your cash in equities at the beginning of the day, you would have missed out on the recovery. An additional check we can do is to look at what the returns were on the prior day. In other words, was there some in-your-face behavior on the prior day that would lead you to believe that huge returns would come the next day?
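
One way to run that check, sketched here on synthetic returns standing in for the SPY series above (the column names match the xx data frame; the o2c.prev name is mine):

```r
# Synthetic stand-in for the SPY open-to-close return series
set.seed(1)
xx <- data.frame(ymd=20120101 + 0:99, o2c=rnorm(100, 0, 0.01))

# Lag the return series by one row to get the prior day's return
xx$o2c.prev <- c(NA, head(xx$o2c, -1))

# Top-10 return days, with the prior day's return alongside
top10 <- head(xx[order(-xx$o2c), ], n=10)
top10[, c("ymd", "o2c", "o2c.prev")]
```

On the real data, you would eyeball the o2c.prev column for any obvious signal; on the days in question, there generally isn't one.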

Having gone through this process (I anticipate it taking at minimum five hours and more likely a couple of days), you will get a much better idea of how your finances actually are. You will notice, for example, that you're spending upwards of $500 on just going out for drinks in a given month. You most likely didn't think you were spending this much, since $10 here and $14 there don't really seem to add up to $500. But $10 here and $14 there, extrapolated to the rest of the month, do add up to $500.

At this point, I would take a deep breath. When I did this exercise and got to the "how much I spent in each bucket" stage, I was having mini-anxiety attacks about how my money was just bleeding out. That being said, you did a hard step and deserve some congratulations. As the PSA from G.I. Joe would say, "Now you know, and knowing is half the battle." With this knowledge, you can now take proactive steps to curtail your spending on the little items, e.g., you will repeatedly tell yourself "I don't need this {random $10 item}," knowing that those $10 items easily add up to a much larger tab in the end.

Bonus for reading this far: my personal workflow is to use mint.com, which is effectively an aggregator for all your financial accounts. I won't get into the details of the security of their approach, but suffice it to say that I feel comfortable enough to use it. It will download all the transactions you have and attempt to categorize them for you. You can easily go in and change the categories, add new categories, etc., saving you a lot more time than doing the spreadsheet approach by hand. Even more so, it can show you how your budget has evolved month by month. It's a great time-saving tool for keeping track of your finances, especially the little things.

# Random Readings 0001 – Investment Related

Kiplinger provided a list of four companies that are similar to Berkshire Hathaway and its chairman Warren Buffett.
Specifically, they highlighted Markel, Fairfax Financial, Loews, and Leucadia National. The common thread among these companies is that they are cash-rich businesses from underwriting insurance and need to do something with the cash. At least for "Berkshire Hathaway"-like companies, they leverage the cash by building large stock portfolios and/or acquiring value/distressed companies.

Continuing the theme of taking insurance premiums and investing them, Greenlight Capital Re is a reinsurer that takes its premiums and invests them in David Einhorn's hedge fund, Greenlight Capital.

Jeffrey was profiling Annaly Capital Management and incidentally highlighted the downside risks of all the high-dividend-yield REITs we see. Specifically, the strategies typically involve borrowing at low interest rates and investing in various sorts of mortgage securities, which typically earn a higher rate of return. The risks come from 1) interest rates increasing going forward, relative to the all-time-low rates we have now, which would shrink the obtainable yields, and 2) homeowners becoming able to refinance at the current lower rates (although if you're underwater, it will be difficult to refinance), which would also shrink the yields.

# Why a 529 Savings Plan is Superior

This is an article demonstrating the benefit of the 529 savings plan compared against other potential choices for your child's college savings.

Having a newborn, I recently began investigating options for saving money for the kid's college fund. Of course, there are many options, including (but not limited to) investing directly yourself, UGMA/UTMA, and the 529 plan. I think the natural choice arises when you begin with the right set of questions.
E.g., “Is your kid going to turn into a twat at 18?” or “How do you think your kid will handle having a sudden influx of money?” Not that this is directly correlated, but more than 50% of NBA and NFL players experience bankruptcy or financial duress after retirement, leading me to believe that if you don’t have a good handle on how to use money and debt, a sudden influx of money isn’t going to fix that.

Of the options I mentioned, only “investing directly yourself” and the 529 plan allow you to be in control of the account at all times (including how you handle distributions). With UGMA/UTMA, the kid inherits all of the money the moment they turn 18 and can spend it all on baseball cards if they so desire. And of “investing directly yourself” and the 529 plan, only the latter is tax-advantaged. With a 529 plan, when applying for financial aid, you’re also better off, since the plan counts much less toward the total expected family contribution. Additionally, you’re able to transfer the beneficiary to other people (including yourself) if there’s unused money. Really, I’m not seeing any downsides here with the 529 plan.

In short, even though we hope our kids don’t turn out to be financially irresponsible, they might anyway due to inexperience. We need to remember that we’ve had the hard lessons already, in addition to having years on them, and they haven’t had the chance to learn these things on their own. We want to help our kids through college, but let’s not give them enough rope to hang themselves. Personally, I’m not going to tell my kid(s) about the 529’s existence, and I’ll have them work under the assumption that they had better do well in school to attain merit scholarships. And when they ask for residual money not covered by scholarships for additional school materials, I’ll just tell them, “Ugh. I’ll dig in my wallet and see what I can come up with,” when I’m really digging into their 529.
Note that when I reference the 529 plan, I’m referring to self-directed investing (instead of, say, locking in a state tuition rate). Of all the 529 plans I looked at, I ended up going with Vanguard (the 529 plan being “based” in Nevada), since Vanguard funds had the lowest expenses I’ve seen out of all the plans.

# Cost of Replacing a 2012 Ford Edge Key

The cost of replacing a 2012 Ford Edge key can be very high if you’re not prepared. Recently, I misplaced (read: placed on the roof of my car) my 2012 Ford Edge key. Not knowing anything about keys, I assumed it would be a relatively straightforward process to get a new copy. Not so much.

First and foremost, the key is a smart key, which means there’s a chip embedded in the key that effectively talks to the car. I.e., if the key isn’t programmed to your car specifically, it’s useless. This also means that your local Walmart most likely won’t be able to help you.

Feeling panicky, I got some quotes from the Ford dealer, with all sorts of prices that were effectively at least $500. I ended up finding a parts dealer and ordering the fob ($150) and the key ($25), and felt pretty proud of myself, since I was under the assumption that I could program the second key myself with at least one working key. That came crashing down once I actually pulled out the manual and realized that I needed at least two already-programmed keys in order to program another key. So my second key really wasn’t a spare at all; a spare would have been the third key, had the dealer given me three keys, which obviously they did not.
I ended up calling around to various Ford dealers to get quotes for “I have one programmed key and one unprogrammed key and need the unprogrammed one programmed.” I got prices ranging from $100 upwards, but eventually one dealer quoted me $50 to program that one key.
During all of this, I did more research on keys. Apparently, I don’t really need the $175 magical key from Ford. On eBay, I found various sellers offering blank, uncut transponder keys for around $15. As such, I would be able to take my two programmed keys and start programming the cheapo-deapo keys as backups. It didn’t matter that the keys were uncut, since the Ford Edge I had couldn’t be physically started anyway; the most the mechanical key could do was lock and unlock the door, and even once in, you couldn’t start the car until a programmed key was within range.
Cost of replacing a “spare”: $250
Cost of replacing a true spare: $15