{"id":456,"date":"2013-09-02T18:20:26","date_gmt":"2013-09-02T18:20:26","guid":{"rendered":"https:\/\/www.wdiam.com\/?p=456"},"modified":"2020-08-06T20:13:36","modified_gmt":"2020-08-06T20:13:36","slug":"deriving-the-linear-regression-solution","status":"publish","type":"post","link":"https:\/\/www.wdiam.com\/b\/2013\/09\/02\/deriving-the-linear-regression-solution\/","title":{"rendered":"Deriving the Linear Regression Solution"},"content":{"rendered":"<p>In deriving the linear regression solution, we will take a closer look at how we &#8220;solve&#8221; the common <a href=\"http:\/\/en.wikipedia.org\/wiki\/Linear_regression\">linear regression<\/a>, i.e., finding <span class=\"katex-eq\" data-katex-display=\"false\">\\beta<\/span> in <span class=\"katex-eq\" data-katex-display=\"false\">y = X\\beta + \\epsilon<\/span>.<\/p>\n<p>I say &#8220;common&#8221; because there are actually several ways to estimate <span class=\"katex-eq\" data-katex-display=\"false\">\\beta<\/span>, depending on the assumptions you make about your data and how you correct for various anomalies. &#8220;Common&#8221; in this case specifically refers to <a href=\"http:\/\/en.wikipedia.org\/wiki\/Ordinary_least_squares\">ordinary least squares<\/a>. Here, I assume you already know the punch line: <span class=\"katex-eq\" data-katex-display=\"false\">\\beta = (X^{T}X)^{-1}X^{T}y<\/span>. But what we&#8217;re really interested in is how to get there.<\/p>\n<p>The crux is that you&#8217;re trying to find a solution <span class=\"katex-eq\" data-katex-display=\"false\">\\beta<\/span> that minimizes the sum of the squared errors, i.e., <span class=\"katex-eq\" data-katex-display=\"false\">\\min\\limits_{\\beta} \\: \\epsilon^{T}\\epsilon<\/span>. 
We can find the minimum by taking the derivative and setting it to zero, i.e., <span class=\"katex-eq\" data-katex-display=\"false\">\\frac{d}{d\\beta} \\epsilon^{T}\\epsilon = 0<\/span>.<\/p>\n<p>For the derivation, it helps to remember two things. First, for the derivative of a dot product of two vectors, the product rule states that <span class=\"katex-eq\" data-katex-display=\"false\">\\frac{d}{dx}u^{T}v = u^{T}\\frac{d}{dx}v + v^{T}\\frac{d}{dx}u<\/span>, which holds because <span class=\"katex-eq\" data-katex-display=\"false\">u^{T}v<\/span> is a scalar; see <a href=\"http:\/\/mathworld.wolfram.com\/DotProduct.html\">this<\/a> and <a href=\"http:\/\/en.wikipedia.org\/wiki\/Matrix_calculus#Scalar-by-vector_identities\">that<\/a>. Second, for the matrix transpose, <span class=\"katex-eq\" data-katex-display=\"false\">(AB)^{T} = B^{T}A^{T}<\/span>.<\/p>\n<p>Observe that <span class=\"katex-eq\" data-katex-display=\"false\">y = X\\beta + \\epsilon \\implies \\epsilon = y - X\\beta<\/span>. As such, <span class=\"katex-eq\" data-katex-display=\"false\">\\frac{d}{d\\beta} \\epsilon^{T}\\epsilon = \\frac{d}{d\\beta} (y-X\\beta)^{T}(y-X\\beta)<\/span>.<\/p>\n<p>Working it out (note that the two product-rule terms coincide here, since <span class=\"katex-eq\" data-katex-display=\"false\">u = v = y-X\\beta<\/span>),<br \/>\n<span class=\"katex-eq\" data-katex-display=\"false\">\\frac{d}{d\\beta} \\epsilon^{T}\\epsilon \\\\= \\frac{d}{d\\beta} (y-X\\beta)^{T}(y-X\\beta) \\\\= (y-X\\beta)^{T} \\frac{d}{d\\beta}(y-X\\beta) + (y-X\\beta)^{T}\\frac{d}{d\\beta}(y-X\\beta) \\\\= (y-X\\beta)^{T}(-X) + (y-X\\beta)^{T}(-X) \\\\= -2(y-X\\beta)^{T}X \\\\= -2(y^{T} - \\beta^{T}X^{T})X \\\\= -2(y^{T}X - \\beta^{T}X^{T}X)<\/span><\/p>\n<p>Because <span class=\"katex-eq\" data-katex-display=\"false\">\\epsilon^{T}\\epsilon<\/span> is convex in <span class=\"katex-eq\" data-katex-display=\"false\">\\beta<\/span>, setting the derivative to zero and solving for <span class=\"katex-eq\" data-katex-display=\"false\">\\beta<\/span> yields the <span class=\"katex-eq\" data-katex-display=\"false\">\\beta<\/span> that minimizes the sum of squared errors.<br \/>\n<span class=\"katex-eq\" data-katex-display=\"false\">\\frac{d}{d\\beta} \\epsilon^{T}\\epsilon = 0 \\\\ \\implies -2(y^{T}X - \\beta^{T}X^{T}X) = 0 \\\\ \\implies y^{T}X - \\beta^{T}X^{T}X = 0 \\\\ \\implies y^{T}X = \\beta^{T}X^{T}X \\\\ 
\\implies (y^{T}X)^{T} = (\\beta^{T}X^{T}X)^{T} \\\\ \\implies X^{T}y = X^{T}X\\beta \\\\ \\implies (X^{T}X)^{-1}X^{T}y = (X^{T}X)^{-1}(X^{T}X)\\beta \\\\ \\implies \\beta = (X^{T}X)^{-1}X^{T}y<\/span><\/p>\n<p>Note that the last step assumes <span class=\"katex-eq\" data-katex-display=\"false\">X^{T}X<\/span> is invertible, which holds whenever <span class=\"katex-eq\" data-katex-display=\"false\">X<\/span> has full column rank.<\/p>\n<p>Without too much difficulty, we arrived at the linear regression solution of <span class=\"katex-eq\" data-katex-display=\"false\">\\beta = (X^{T}X)^{-1}X^{T}y<\/span>. The general path to the derivation is to recognize that you&#8217;re trying to minimize the sum of squared errors (<span class=\"katex-eq\" data-katex-display=\"false\">\\epsilon^{T}\\epsilon<\/span>): find the derivative of <span class=\"katex-eq\" data-katex-display=\"false\">\\epsilon^{T}\\epsilon<\/span>, set it to zero, and solve for <span class=\"katex-eq\" data-katex-display=\"false\">\\beta<\/span>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In deriving the linear regression solution, we will be taking a closer look at how we &#8220;solve&#8221; the common linear regression, i.e., finding in . I mention &#8220;common,&#8221; because there are actually several ways you can get an estimate for based on assumptions of your data and how you can correct for various anomalies. 
&#8220;Common&#8221; &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.wdiam.com\/b\/2013\/09\/02\/deriving-the-linear-regression-solution\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Deriving the Linear Regression Solution&#8221;<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,6],"tags":[16,22,40],"class_list":["post-456","post","type-post","status-publish","format-standard","hentry","category-lesson","category-statistics","tag-derivation","tag-linear-regression","tag-statistics-2"],"_links":{"self":[{"href":"https:\/\/www.wdiam.com\/b\/wp-json\/wp\/v2\/posts\/456","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wdiam.com\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wdiam.com\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wdiam.com\/b\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wdiam.com\/b\/wp-json\/wp\/v2\/comments?post=456"}],"version-history":[{"count":6,"href":"https:\/\/www.wdiam.com\/b\/wp-json\/wp\/v2\/posts\/456\/revisions"}],"predecessor-version":[{"id":597,"href":"https:\/\/www.wdiam.com\/b\/wp-json\/wp\/v2\/posts\/456\/revisions\/597"}],"wp:attachment":[{"href":"https:\/\/www.wdiam.com\/b\/wp-json\/wp\/v2\/media?parent=456"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wdiam.com\/b\/wp-json\/wp\/v2\/categories?post=456"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wdiam.com\/b\/wp-json\/wp\/v2\/tags?post=456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}