- Training error is the error we get when applying the model to the same data it was trained on.
- Test error is the error we incur on new, unseen data.


When the test set is not available, cross-validation methods estimate the test error rate by holding out a subset of the training observations from the fitting process, and then applying the method to those held out observations.
- Cross-validation approaches:
  - The validation set approach
  - Leave-One-Out Cross-Validation (LOOCV)
  - k-Fold Cross-Validation
## The Validation Set Approach
- Randomly divide the available set of samples into two parts: a training set and a validation or hold-out set.
- The model is fit on the training set, and the fitted model is used to predict the responses for the observations in the validation set.
- The resulting validation-set error provides an estimate of the test error.

## An example of the validation set approach
library(ISLR)
dim(Auto)
set.seed(1)
train = sample(1:392,196) ## create the index to randomly split the data set into two halves
degrees = 1:10
errate = c()
# fit the simple linear regression
lm.fit = lm(mpg~ horsepower,data = Auto, subset = train)
mse = mean((Auto$mpg-predict(lm.fit, newdata = Auto))[-train]^2)
errate = c(errate, mse)
# fit the regression with poly terms with degrees from 2 to 10
for (d in 2:10){
  lm.fit = lm(mpg~poly(horsepower,d),data = Auto, subset = train)
  mse = mean((Auto$mpg-predict(lm.fit, newdata = Auto))[-train]^2)
  errate = c(errate, mse)
}
#par(new=T) # add new lines on the same graph
plot(degrees,errate,type="b", ylim=c(15,30), ylab = "Mean squared error")
Advantages:
- Conceptually simple and easy to implement.
- Computationally cheap: the model is fit only once.
Drawbacks:
- Highly variable: the validation estimate of the test error rate can be highly variable, depending on which observations happen to fall in the training set and which in the validation set (see the sketch below).
- Wastes data: only half of the observations are used to fit the model, so the validation-set error tends to overestimate the test error for a model fit on the full data set.
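As a small illustrative sketch of the first drawback (the seeds 1 to 10 are an arbitrary choice), we can recompute the validation-set MSE of the degree-2 polynomial fit over ten different random splits; the ten estimates of the same test error differ noticeably.
val_mse = c()
for (s in 1:10){
  set.seed(s)                    # a different random split each time
  train = sample(1:392, 196)
  fit = lm(mpg ~ poly(horsepower, 2), data = Auto, subset = train)
  val_mse = c(val_mse, mean((Auto$mpg - predict(fit, newdata = Auto))[-train]^2))
}
val_mse                          # the estimates vary from split to split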
## Leave-One-Out Cross-Validation (LOOCV)
Instead of splitting the data into two subsets as in the validation set approach, a single observation is used as the validation set, and the remaining n - 1 observations make up the training set.
- Repeat the procedure n times, leaving out each observation in turn.
- Compute the average MSE over all n held-out test estimates.

The LOOCV estimate for the test MSE is the average of these n test error estimates:
\(CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n}MSE_{i}\)
### An example of LOOCV
degrees = 1:10
errate = c()
for (d in 1:10){
  mse = c()
  for (i in 1:nrow(Auto)){
    lm.fit = lm(mpg~poly(horsepower,d),data = Auto[-i,])
    mse0 = (Auto$mpg[i]-predict(lm.fit, newdata = Auto[i,]))^2
    mse = c(mse,mse0)
  }
  errate = c(errate, mean(mse))
}
plot(degrees,errate,type="b", ylim=c(15,30), ylab = "Mean squared error")
### Use the cv.glm function in the boot package
## leaving the family argument at its default (gaussian) makes glm() fit a linear regression
library(boot)
glm.fit = glm(mpg~horsepower,data=Auto)
cv.glm(Auto, glm.fit)$delta
# tm = proc.time() # record the running time
errate2 = c()
for (d in degrees){
  glm.fit = glm(mpg~poly(horsepower,d), data = Auto)
  err0 = cv.glm(Auto,glm.fit)$delta[1]
  errate2 = c(errate2,err0)
}
# proc.time() - tm
plot(errate2,type="b",col="red")
### A leave-one-out shortcut for least squares linear regression
For least squares linear (or polynomial) regression, we can obtain all n leave-one-out prediction errors from a single fit on the full data set:
\(CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i-\hat{y}_i}{1-h_i}\right)^2\)
where \(h_i\) is the leverage statistic of the i-th observation.
### create the loocv shortcut function
loocv = function(fit){
  h = hatvalues(fit)
  val = mean((residuals(fit)/(1-h))^2)
  return(val)
}
errate3 = c()
for (d in degrees){
  glm.fit = glm(mpg~poly(horsepower,d), data = Auto)
  err0 = loocv(glm.fit)
  errate3 = c(errate3,err0)
}
### This shortcut is much faster!
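As a quick sanity check (a sketch, reusing errate2 and errate3 computed above), the shortcut values should agree with the cv.glm LOOCV estimates up to floating-point error while taking far less time; the system.time() comparison below is only illustrative.
all.equal(errate2, errate3)   # the two sets of LOOCV estimates should agree
system.time(for (d in degrees) loocv(glm(mpg~poly(horsepower,d), data = Auto)))
system.time(for (d in degrees) cv.glm(Auto, glm(mpg~poly(horsepower,d), data = Auto))$delta[1])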
Advantages:
- Much less bias, since each training set contains n - 1 observations, almost the entire data set.
- No randomness: performing LOOCV repeatedly always yields exactly the same result.
Drawbacks:
- Computationally expensive: the model must be refit n times (unless a shortcut like the one above applies).
## k-Fold Cross-Validation
- Randomly divide the set of observations into k groups, or folds, of approximately equal size (commonly k = 5 or 10).
- Repeat k times: each time, one of the k folds is used as the validation set and the model is fit on the remaining k - 1 folds.
- The test error is estimated by averaging the resulting k estimates.
The k-fold CV estimate:
\(CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k}MSE_i\)

When k = n, it is actually the leave-one-out cross-validation, since we leave out one data point at a time.
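To make the fold mechanics concrete before calling cv.glm below, here is a minimal hand-rolled 10-fold CV sketch for the simple linear model (the seed and the fold-assignment scheme are arbitrary choices for illustration).
set.seed(1)
k = 10
folds = sample(rep(1:k, length.out = nrow(Auto)))   # random fold label for each observation
fold_mse = c()
for (j in 1:k){
  fit = lm(mpg ~ horsepower, data = Auto[folds != j, ])   # fit on the other k-1 folds
  pred = predict(fit, newdata = Auto[folds == j, ])       # predict the held-out fold
  fold_mse = c(fold_mse, mean((Auto$mpg[folds == j] - pred)^2))
}
mean(fold_mse)   # CV_(k): the average of the k fold MSEs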
glm.fit = glm(mpg~horsepower,data=Auto)
cv.glm(Auto, glm.fit, K = 10)$delta
erratek = c()
for (d in degrees){
  glm.fit = glm(mpg~poly(horsepower,d), data = Auto)
  err0 = cv.glm(Auto,glm.fit, K=10)$delta[1]
  erratek = c(erratek,err0)
}
plot(degrees,erratek, type="l", ylim=c(15,30), ylab = "Mean squared error")
- The variability of the k-fold estimates is much lower than the variability of the test error estimates from the validation set approach.
- Much faster than LOOCV, since the model is fit only k times.
- Bias-variance trade-off: with k = 5 or 10, k-fold CV has somewhat more bias than LOOCV (smaller training sets) but lower variance, so its test error estimates are neither excessively biased nor highly variable (see the short sketch below).
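A small illustrative sketch of the variance point (reusing the loocv() shortcut defined earlier): the 10-fold estimate changes slightly with the random fold assignment, while LOOCV is deterministic.
fit0 = glm(mpg ~ horsepower, data = Auto)
replicate(5, cv.glm(Auto, fit0, K = 10)$delta[1])   # five slightly different 10-fold estimates
loocv(fit0)                                         # LOOCV gives the same value on every run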
## Extending to Classification
- Instead of the MSE, the classification error is used: \(Err_i = I(y_i \neq \hat{y}_i)\).
- The algorithm otherwise remains the same.
- In cv.glm(), supply the cost argument: a function of two vector arguments (the observed responses and the cross-validated predictions) specifying the cost function for the cross-validation.
library(boot)
head(mtcars)
### create the cost function: the misclassification rate
misRate = function(x,y){
  # x: observed 0/1 responses; y: predicted probabilities
  mean(x != ifelse(y > 0.5, 1, 0))
  # equivalently: mean(abs(x - y) > 0.5)
}
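A quick check of the cost function on made-up toy vectors (values chosen only for illustration): with observed classes 1, 0, 1 and predicted probabilities 0.9, 0.2, 0.3, only the third observation is misclassified, so the rate is 1/3.
misRate(c(1, 0, 1), c(0.9, 0.2, 0.3))   # 0.333...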
### LOOCV
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)
cv.glm(mtcars,fit,cost = misRate)$delta[1]
pred = c()
for (i in 1:nrow(mtcars)){
  fit <- glm(vs ~ mpg, data = mtcars[-i,], family = binomial)
  # record whether the held-out observation is misclassified
  pred = c(pred, mtcars$vs[i] != ifelse(predict(fit,mtcars[i,],type="response")>0.5,1,0))
}
mean(pred)   # LOOCV misclassification rate; should match the cv.glm estimate above
### k-fold
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)
cv.glm(mtcars,fit,cost = misRate,K = 5)$delta[1]
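Because the fold assignment is random, the 5-fold estimate varies a little from run to run; setting a seed (the particular seeds here are arbitrary) makes a run reproducible.
set.seed(10); cv.glm(mtcars, fit, cost = misRate, K = 5)$delta[1]
set.seed(10); cv.glm(mtcars, fit, cost = misRate, K = 5)$delta[1]   # same seed, identical folds and estimate
set.seed(11); cv.glm(mtcars, fit, cost = misRate, K = 5)$delta[1]   # a different split may give a slightly different estimate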