Regression Analysis with Matlab

Michael Bar, San Francisco State University
Table of Contents

Introduction

In this note we demonstrate the use of matrices for regression analysis. The multivariate regression model is:
Thus, the outcome of observation i, , is a linear function of regressors , plus an error term . Let be some estimators of the unknown coefficients . We define the fitted eqaution, which gives the predicted values of the outcome for given estimates:
The residual of observation i is . The OLS (Ordinary Least Square) estimator of is the vector which minimizes the sum of squared residuals, i.e. solves:
The first order necessary conditions for this this quadratic optimization problem is a system of k linear equations with k unknowns, .
clear
close all

Statistics and Machine Learning Toolbox

Matlab's Statistics and Machine Learning Toolbox can estimate the multivariate linear regression models using the command fitlm, which is similar to the lm() in R.

Reading the data from the web

wage = webread("https://dl.dropboxusercontent.com/s/32m1t3iyno319j6/wage21.csv?dl=0");

Estimating the model

The next command generates output very similar to R.
m = fitlm(wage,'EARNINGS ~ S + EXP');
m
m =
Linear regression model: EARNINGS ~ 1 + S + EXP Estimated Coefficients: Estimate SE tStat pValue ________ _______ _______ __________ (Intercept) -40.984 7.4496 -5.5015 5.837e-08 S 4.1828 0.39843 10.498 1.4122e-23 EXP 0.73835 0.2243 3.2919 0.0010607 Number of observations: 540, Error degrees of freedom: 537 Root Mean Squared Error: 22.8 R-squared: 0.171, Adjusted R-Squared: 0.168 F-statistic vs. constant model: 55.5, p-value = 1.21e-22

Matrix Algebra

In this section we demonstrate what statistical packages are doing when they are estimating the multivariate regression model.

OLS problem in matrix notation

The model in the introduction can be presented in matrix form:
the model becomes
The fitted equation and residuals of vecotr of estimates are
The OLS problem in matrix form is therefore:
The solution is

Solving the OLS problem with matrix algebra

We prepare the vector of dependent variable Y and the design matrix X, containing column of 1s and the regressors .
Y = wage(:,{'EARNINGS'}); %Extracting the dependent variable
X = wage(:,{'S','EXP'}); %Extracting the needed regressors
Y = table2array(Y); %Converting to matrix
X = table2array(X); %Converting to matrix
X = [ones(length(X),1), X]; %Adding vector of 1s for X
Next we use the analytical solution for OLS coefficients, .
b = (X'*X)\X'*Y %Gaussian elimination (faster, more accurate)
b = 3×1
-40.9836 4.1828 0.7383
b_inv = inv(X'*X)*X'*Y %Matrix inversion
b_inv = 3×1
-40.9836 4.1828 0.7383
Matrix algebra is used to obtain elegan formulas for decomposition of sum of squares:
In matrix form, these become:
The estimated covariance matrix of OLS estimators:
where k is the number of number of estimated coefficients.
Thus, the standard errors of OLS estimators is the the square root of the diagonal of the covariance matrix.
Implementing the above in Matlab is straightforward.
Y_hat = X*b; %Fitted (predicted) values
e = Y - Y_hat; %Residuals
n = size(X,1); %Number of observations
k = size(X,2); %Number of regressors
C = (X'*X)\eye(k)*(e'*e)/(n-k); % cov(b)
se = sqrt(diag(C)) %Standard errors of b
se = 3×1
7.4496 0.3984 0.2243
TSS = Y'*Y - n*mean(Y)^2; %Total Sum of Squares
ESS = b'*(X'*X)*b - n*mean(Y)^2; %Explained Sum of Squares
RSS = e'*e; %Residuals Sum of Squares
R2 = ESS/TSS; %Ordinary R-squared
R2_adj = 1 - (1-R2)*(n-1)/(n-k); %Adjusted R-squared
% Presenting the results
disp(['TSS = ', num2str(TSS)])
TSS = 336803.0345
disp(['ESS = ', num2str(ESS)])
ESS = 57708.8678
disp(['RSS = ', num2str(RSS)])
RSS = 279094.1667
disp(['R^2 = ', num2str(R2)])
R^2 = 0.17134
disp(['R^2_adj = ', num2str(R2_adj)])
R^2_adj = 0.16826
Comparing our results to the ones obtained by Matlab's fitlm() function:
disp(['TSS = ', num2str(m.SST)]) %Total Sum of Squares
TSS = 336803.0345
disp(['ESS = ', num2str(m.SSR)]) %Explained Sum of Squares
ESS = 57708.8678
disp(['RSS = ', num2str(m.SSE)]) %Residuals Sum of Squares
RSS = 279094.1667
disp('R^2 = ') %R-squared (ordinary and adjusted)
R^2 =
disp(m.Rsquared)
Ordinary: 0.1713 Adjusted: 0.1683