Interactive demonstration of linear regression via the matrix pseudo-inverse, viewing the fitted line as a linear combination of basis vectors.
Each observation is modeled as a weighted sum of basis-vector entries plus noise:
\[ y_i = \beta_0 \cdot 1 + \beta_1 \cdot x_i + \epsilon_i \]
where \(\epsilon_i\) is zero-mean Gaussian noise with variance \(\sigma^2\).
In matrix form, for all points:
\[ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \]
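As a concrete sketch of this setup, the following NumPy snippet simulates data from the model and assembles the design matrix. The true parameter values and seed here are illustrative assumptions, not values taken from the demo itself.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Illustrative true parameters (assumed, not from the demo)
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 50

x = np.linspace(0.0, 10.0, n)
eps = rng.normal(0.0, sigma, size=n)   # Gaussian noise with variance sigma^2
y = beta0 + beta1 * x + eps            # y_i = beta0 * 1 + beta1 * x_i + eps_i

# Design matrix X: intercept column of ones next to the observed x column
X = np.column_stack([np.ones(n), x])
```

Each row of `X` holds the pair \((1, x_i)\), so the matrix product \(\mathbf{X}\boldsymbol{\beta}\) reproduces the componentwise equations above in one step.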
The design matrix \(\mathbf{X}\) consists of two basis vectors:
\[ \mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \quad \text{(intercept)} \]
\[ \mathbf{x}_1 = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \text{(slope direction)} \]
The fitted line lies in the span of these vectors.
We find weights \(\beta_0, \beta_1\) that minimize:
\[ \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2 = \sum_{i=1}^n (y_i - (\beta_0 + \beta_1x_i))^2 \]
The minimizer, given by the normal equations, holds the coordinates of the orthogonal projection of \(\mathbf{y}\) onto span\((\mathbf{X})\):
\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \]
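A minimal sketch of solving for \(\hat{\boldsymbol{\beta}}\), reusing the simulated data from above (same assumed parameters). It computes the normal-equations solution directly and checks it against NumPy's Moore-Penrose pseudo-inverse, which is the numerically safer route when \(\mathbf{X}^T\mathbf{X}\) is ill-conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)  # assumed true line: 2 + 0.5x
X = np.column_stack([np.ones(n), x])

# Normal-equations solution: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same solution via the pseudo-inverse X^+ = (X^T X)^{-1} X^T
beta_pinv = np.linalg.pinv(X) @ y

print(beta_hat)  # should land near the assumed true (2.0, 0.5)
```

With moderate noise the recovered intercept and slope sit close to the values used to generate the data, which is the behavior the interactive demo lets you explore.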
The fitted line \(\hat{\mathbf{y}} = \hat{\beta}_0\mathbf{x}_0 + \hat{\beta}_1\mathbf{x}_1\) is the linear combination of the basis vectors that lies closest to \(\mathbf{y}\): the orthogonal projection of \(\mathbf{y}\) onto span\((\mathbf{X})\).
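The projection interpretation can be checked numerically: the residual \(\mathbf{y} - \hat{\mathbf{y}}\) of a least-squares fit must be orthogonal to every column of the design matrix. A sketch under the same assumed simulation as above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])

beta_hat = np.linalg.pinv(X) @ y
y_hat = X @ beta_hat        # fitted line: combination of the two basis vectors
residual = y - y_hat

# Projection property: X^T (y - X beta_hat) = 0 up to floating-point error,
# i.e. the residual is orthogonal to both the intercept and slope directions.
print(X.T @ residual)
```

The near-zero dot products confirm that \(\hat{\mathbf{y}}\) is exactly the point in span\((\mathbf{X})\) nearest to \(\mathbf{y}\).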