Theme of activity

Bootstrapping with boot

In assignment 6, you will need to bootstrap with boot

Here are the bootiful steps of the process:

Prepare data

We saw the hardwood data in Lecture:

rm(list=ls())
x = c(1,1.5,2,3,4,4.5,5,5.5,6,6.5,7,8,9,10,11,12,13,14,15)
y = c(6.3,11.1,20,24,26.1,30,33.8,34.0,38.1,39.9,42,46.1,53.1,52,52.5,48,42.8,27.8,21.9)
n=length(y)
plot(x,y,xlab="hardwood %",ylab="tensile strength")

data=data.frame(x=x,y=y)

Prepare the extractor function

# what should be the arguments of this function?
#myextractor=function(data, what else){
# which model should I run ?
# what output of the model should I keep?
#what should the function return?
#  return(whatreturned)
#}
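For reference, here is one way the skeleton could be filled in (a sketch, not the only acceptable answer): the extractor receives the data plus a vector of resampled row indices, refits the simple linear regression on those rows, and returns the slope.

myextractor = function(data, index){            # a possible extractor (a sketch)
  d = data[index, ]                             # the rows selected by boot for this resample
  lmout = lm(y ~ x, data = d)                   # run the simple linear regression
  return(coef(lmout)[2])                        # keep and return the slope
}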

Produce the bootstrap distribution

library(boot)
Nboot = 20
# what am I doing here?
# mybdone=boot(data,myextractor,R=Nboot)
# now append mybdone to the bootstrap distribution mybd
#  mybd=c()  
#  attributes()
#  mybd$t
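A hedged completion of this chunk, assuming the myextractor sketched above: boot is called once with R = Nboot resamples, and the replicates it returns in $t are appended to the accumulated bootstrap distribution mybd.

mybdone = boot(data, myextractor, R = Nboot)    # run Nboot bootstrap resamples
attributes(mybdone)                             # see what the boot object contains
mybd = c()                                      # start an empty bootstrap distribution
mybd = c(mybd, mybdone$t)                       # append the Nboot replicates of the statistic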

How to use the boot library (summary):

# You need to create your own function: give it a name; here it is called myextractor.
# It takes the data (plus a vector of resampled indices), runs the regression model and returns
# whatever statistic of interest, say t, you are interested in. For example, t could be one of the
# regression coefficients produced by your regression model. In the case of linear regression we
# have two coefficients, the intercept and the slope.
# mynewt = boot(data, myextractor, R = Nboot)
# when boot is finished, mynewt$t is a vector of Nboot bootstrap replicates of t
# then you accumulate a measure of the quality of the statistic across repeated boot runs,
# for example
# temp = c(temp, var(mynewt$t))  # accumulate the variance of the bootstrap replicates of t

The linear regression model

The following data consists of pairs \((x_i,y_i), i=1, 2, \ldots , n\), where \(x\) is the proportion of hardwood in a piece of lumber, and \(y\) is a measure of tensile strength. A plot of \(x\) vs \(y\) indicates that the relationship is non-linear.

rm(list=ls())
x = c(1,1.5,2,3,4,4.5,5,5.5,6,6.5,7,8,9,10,11,12,13,14,15)
y = c(6.3,11.1,20,24,26.1,30,33.8,34.0,38.1,39.9,42,46.1,53.1,52,52.5,48,42.8,27.8,21.9)
n=length(y)
data=data.frame(x=x,y=y)
plot(x,y,xlab="hardwood %",ylab="tensile strength")

We would like to use \(x\) to predict \(y\).

  • Let’s start with a simple linear regression model

\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i\]
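As a warm-up, here is a quick sketch of the ordinary least squares fit that the bootstrap below will resample:

lmout = lm(y ~ x, data = data)   # simple linear regression fit
coef(lmout)                      # estimated intercept and slope
abline(lmout, col = "red")       # add the fitted line to the scatterplot above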

Bootstrapping the standard error of the slope.

There is a built-in procedure “boot” which can be used to estimate the standard error of any statistic. In order to use it, you need to create a function which returns the statistic you’re interested in. For the regression slope,

b1=function(data,index){
  x=data$x[index]
  y=data$y[index]
  lmout=lm(y~x)
  b1=coef(lmout)[2]
  return(b1)}

“index” is a vector of row indices sampled from 1:n with replacement; it is passed in IMPLICITLY by the boot command.
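To build intuition for what boot passes in, you can call b1 yourself with one resampled index vector (boot normally generates these for you):

set.seed(1)                              # only to make this illustration reproducible
index = sample(1:n, n, replace = TRUE)   # one bootstrap resample of the row indices
b1(data, index)                          # slope estimate on that resample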

USING BOOT for building the boodist

library(boot)
data=data.frame(x=x,y=y)
boot(data,b1,R=1000)
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = data, statistic = b1, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original    bias    std. error
## t1* 1.770986 0.1048977   0.9600731

MANUAL computation of boodist

   nd <- nrow(data)
   Nboot=1000
   boodis=NULL
   for (i in 1:Nboot){
    index=sample(1:nd,nd,replace=T)
    datai=data.frame(data[index,])
    lmout=lm(y~x,data=datai)
    t = coef(lmout)[2]
    boodis=c(boodis,t)
   }

Exploitation (analysis) of the boodist:

#   hist(boodis,main="Bootstrap distribution of estimated slope",xlab="Slope estimate")
#   bi=quantile(boodis,c(.05,.95))
#   vbhat=var(boodis)
#   sbhat=sd(boodis)
hist(boodis,main="Bootstrap distribution of estimated slope",xlab="Slope estimate")

library(boot)
data=data.frame(x=x,y=y)
b1=function(data,index){
  x=data$x[index]
  y=data$y[index]
  lmout=lm(y~x)
  b1=coef(lmout)[2]
  return(b1)} 
data=data.frame(x=x,y=y)
Nboot = 100
bout=boot(data,b1,R=Nboot)
bout$t
##               [,1]
##   [1,]  1.36957449
##   [2,]  1.70886470
##   [3,]  2.16987323
##   [4,]  1.51027900
##   [5,]  1.81088475
##   [6,]  2.04072267
##   [7,]  1.01784471
##   [8,]  3.30262993
##   [9,]  2.16350250
##  [10,]  1.36802488
##  [11,]  0.77092476
##  [12,]  1.79919355
##  [13,]  3.13036718
##  [14,]  1.83186488
##  [15,]  3.16009047
##  [16,]  2.29977850
##  [17,]  1.50602386
##  [18,]  3.64288123
##  [19,]  1.63521042
##  [20,]  1.67236391
##  [21,]  1.04596273
##  [22,]  1.94589345
##  [23,]  2.62576749
##  [24,]  1.15503725
##  [25,] -0.46778711
##  [26,]  0.41052811
##  [27,]  1.51006273
##  [28,]  0.61215560
##  [29,]  2.25430545
##  [30,]  1.42966419
##  [31,]  1.91986155
##  [32,]  1.71427701
##  [33,]  1.60083892
##  [34,]  1.86588672
##  [35,]  0.90137823
##  [36,]  1.64983919
##  [37,]  1.67246757
##  [38,]  3.51654728
##  [39,]  2.35135574
##  [40,]  0.64773437
##  [41,]  1.65859449
##  [42,]  1.30414511
##  [43,]  1.05480648
##  [44,]  1.22772847
##  [45,]  0.94494959
##  [46,]  1.20936868
##  [47,]  2.57591627
##  [48,]  2.58037241
##  [49,]  0.75120091
##  [50,]  2.91964906
##  [51,]  2.30264820
##  [52,]  4.47456752
##  [53,]  4.35696185
##  [54,]  0.86781373
##  [55,]  2.66331267
##  [56,]  0.32747069
##  [57,]  1.97380860
##  [58,]  1.41225014
##  [59,]  1.58455657
##  [60,]  1.83177795
##  [61,]  0.77575230
##  [62,]  2.37379725
##  [63,]  1.92399629
##  [64,]  1.16185286
##  [65,]  0.96448087
##  [66,]  0.05134751
##  [67,]  1.16938250
##  [68,]  4.65164835
##  [69,]  1.40602582
##  [70,]  2.44036939
##  [71,]  3.11130485
##  [72,]  2.36394821
##  [73,]  1.85011753
##  [74,]  1.24532084
##  [75,]  2.07025588
##  [76,]  2.76835907
##  [77,]  2.29261620
##  [78,]  2.55613232
##  [79,]  3.05584208
##  [80,]  3.36788525
##  [81,]  1.13809615
##  [82,]  0.50746143
##  [83,]  2.14938160
##  [84,]  0.69231901
##  [85,]  0.93984002
##  [86,]  2.78336133
##  [87,]  2.16159605
##  [88,]  1.26023198
##  [89,]  2.13133428
##  [90,]  0.69672841
##  [91,]  2.83755579
##  [92,]  0.33021915
##  [93,]  2.10548024
##  [94,]  3.30046512
##  [95,]  1.54213208
##  [96,]  0.09642482
##  [97,]  1.22407180
##  [98,]  2.08538813
##  [99,]  1.41372566
## [100,]  1.40473870
#hist(temp)
Nboot = 100
temp=NULL
for (i in 1:10) {
  bout=boot(data,b1,R=Nboot)
  temp=c(temp,sd(bout$t)) #accumulate the standard errors
}
temp
##  [1] 0.8963718 0.8601467 0.8102189 0.9165147 1.0112139 1.1051976 1.0143718
##  [8] 1.0335791 0.8934850 0.9951823
#hist(temp)
  • A MANUAL EQUIVALENT TO BOOT:
serr <-function(Nboot,data){
   nd <- nrow(data)
   boodis=NULL
   for (i in 1:Nboot){
    index=sample(1:nd,nd,replace=T)
    datai=data.frame(data[index,])
    lmout=lm(y~x,data=datai)
    t = coef(lmout)[2]
    boodis=c(boodis,t)
   }
sbhat=sd(boodis)  
return(sbhat)
}

Now try to run this several times. This is the crux of question 2 in assignment 6. Each run gives a different answer!

serr(10,data)
## [1] 1.004119
  • WHAT IS THE IMPACT OF NBOOT?

So we can: a. try to run serr several times with a fixed value of 10 bootstrap samples

# your loop here
serr(10,data)
## [1] 0.850797
# end of loop
  b. try other values, for example 5 here, for the number of bootstrap samples:
# your loop here
serr(5,data)
## [1] 0.8193444
# end of loop

c. and so on.

We can wrap a, b, c, etc. in another loop. For example, try 4 values for Nboot in an external loop:

#allstats=NULL
#for (Nboot in c(10,20,50,100)){

# call boot with Nboot

# accumulate in allstats
#}
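One hedged way to fill in this outer loop (a sketch, reusing the serr function defined above):

allstats = NULL
for (Nboot in c(10, 20, 50, 100)){
  allstats = c(allstats, serr(Nboot, data))   # one bootstrap standard error per value of Nboot
}
allstats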

Puzzle: A mysterious table

O noble readers, gentlemen and gentlewomen,

I hereby give thee a great table of real data.

I ask thee to apply the power (?) of 21st-century technology to unravel what TRUTH lies behind this magical table.

Canst thou do this?

# the mysterious table
x = c(84.01,248.54,0.24,11.86,1.88,0.61,164.79,1.00,29.46)
y = c(19.19,39.53,0.39,5.20,1.52,0.72,30.06,1.,9.54)
plot(x,y)

Hint: think about this guy:

Hints for solving this puzzle:

The solution of the puzzle is given at the end of this lecture

Polynomial regression: real examples

Example 1: Flight of Space Shuttle

rm(list=ls())
x = c(0,31,45,54,62,71,77,83,89,95,99,123)
y = c(0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,150000)

n=length(y)
plot(x,y,xlab="Altitude(m)",ylab="time",main="Space Shuttle Flight")

 lmout=lm(y~x)
  b=coef(lmout)
  b
## (Intercept)           x 
##  -26255.809    1224.451
x2=x^2
lm2.out=lm(y~x+x2)
coef(lm2.out) #least squares estimates
## (Intercept)           x          x2 
## -354.746270   38.359597    9.678284

\[z = -355 + 38.36\,t + 9.678\,t^2\]

What is the acceleration of the Space shuttle?

https://www.nasa.gov/pdf/585034main_ALG_ED_SSA-Altitude.pdf
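A hedged way to read the fitted quadratic (treating \(t\) as time and \(z\) as altitude, as in the NASA activity): the acceleration is the second derivative of the fitted curve,

\[\frac{d^2 z}{dt^2} = 2 \times 9.678 \approx 19.4,\]

i.e. roughly 19.4 altitude units per unit time squared over this phase of the flight; the physical units depend on how altitude and time are recorded in the NASA data.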

Example 2: Flight of the ISON Comet (2013)

(src: Nasa activity)

Let us analyze the orbit of a comet.

In November 2013, Comet ISON made a close perihelion passage to the Sun, resulting in its (believed) destruction. Since the orbit of Comet ISON was very nearly parabolic (eccentricity of 1), the data from this event provided an excellent real-world example of parabolas.

The perihelion, or point where the comet is closest to the sun, came very close to the surface of the sun.

http://gk12.ciera.northwestern.edu/classroom/2014/Comet%20ISON.pdf

Observations (date and time UT, X in MKm, Y in MKm):

November 26, 18:00 UT -10.5 +0.7

November 27, 13:00 UT -8.8 +5.6

November 28, 01:00 UT -7.7 +8.7

November 28, 08:00 UT -6.3 +10.9

November 28, 14:00 UT -4.6 +13.7

November 28, 23:00 UT +3.2 +14.5

November 29, 10:00 UT +6.6 +9.0

November 29, 19:00 UT +8.0 +5.6

November 30, 10:00 UT +9.3 +1.7

rm(list=ls())
x = c(-10.5,-8.8,-7.7,-6.3,-4.5,3.2,6.6,8.0,9.3)
y = c(0.7,5.6,8.7,10.9,13.7,14.5,9.0,5.6,1.7)

n=length(y)
plot(x,y,ylim=c(0,16),xlab="X (MKm)",ylab="Y (MKm)",main="Comet ISON 2013 Flight")

  x2=x^2
lm2.out=lm(y~x+x2)
coef(lm2.out) #least squares estimates
## (Intercept)           x          x2 
##  16.4504252  -0.1293965  -0.1546104
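Since the fitted curve is a downward-opening parabola, its vertex gives the point where the fitted track reaches its maximum \(y\) in these coordinates. A short sketch using the coefficients just estimated (the variable names here are ours):

bvert = coef(lm2.out)
xvert = -bvert[2]/(2*bvert[3])                        # x-coordinate of the vertex of the fitted parabola
yvert = bvert[1] + bvert[2]*xvert + bvert[3]*xvert^2  # fitted y at the vertex
c(xvert, yvert)                                       # vertex location, in the plot's MKm coordinates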

Question for the curious: A famous astronomer born in Nova Scotia did his first research work on comets. Who is he? What did he prove? Did he use regression? Why or why not?

Example 3: Nighttime light and brightness gradient

Nighttime light pollution

The ecological impacts of nighttime light pollution

The presence of artificial lights increases the average intensity of night-time sky brightness, masks natural regimes resulting from lunar periodicity and increases the number of full moon equivalent hours of sky brightness throughout the night.

src

Connecting nighttime light to human activities and to the pace of urban and economic development across space:

(src:https://www.sciencedirect.com/science/article/pii/S003442571400474X)

Based upon statistically fitted yearly quadratic curves at the urban scale, we spatially subdivided the DMSP/OLS image of a given city into five sub-areas with various magnitudes of artificial night-time light emissions: low, medium-low, medium, medium-high and high night-time lighting areas which could be typically connected to the urban sub-regions experiencing different demographic and socioeconomic activities. Particularly, urban-level intense night-time lighting areas (including high and the sum of medium-high and high) are very close to the corresponding statistical urban built-up areas with intensified human activity.

Example 4: Impact of radiation on chromosomes (COMET assay)

(src:https://www.sciencedirect.com/science/article/pii/S0027510701001002)

Chromosome aberrations under radiation impact

https://www.jstor.org/stable/pdf/3577123.pdf

LESSON?

Piecewise regression?

Spline regression?

Piecewise polynomial regression

Breakpoint analysis and segmented regression

Quantity <- c(25,39,45,57,70,85,89,100,110,124,137,150,177)
Sales <- c(1000,1250,2600,3000,3500,4500,5000,4700,4405,4000,3730,3400,3300)

plot(Quantity,Sales)

data <- data.frame(Quantity,Sales)
data
##    Quantity Sales
## 1        25  1000
## 2        39  1250
## 3        45  2600
## 4        57  3000
## 5        70  3500
## 6        85  4500
## 7        89  5000
## 8       100  4700
## 9       110  4405
## 10      124  4000
## 11      137  3730
## 12      150  3400
## 13      177  3300
set.seed(12)
xx<-1:100
zz<-runif(100)
yy<-2+1.5*pmax(xx-35,0)-1.5*pmax(xx-70,0)+15*pmax(zz-.5,0)+rnorm(100,0,2)
dati<-data.frame(x=xx,y=yy,z=zz)
plot(dati$x,dati$y)

(src:https://rpubs.com/MarkusLoew/12164)

#x <- c(25,39,45,57,70,85,89,100,110,124,137,150,177)
#y <- c(1000,1250,2600,3000,3500,4500,5000,4700,4405,4000,3730,3400,3300)


x <- 0:39
y <- c(  8.8500,  32.0775,  74.7375, 107.6775, 132.0975, 156.6675,
       169.0650, 187.5375, 202.2575, 198.0750, 225.9600, 204.3550,
       233.8125, 204.5925, 232.3625, 204.7550, 220.1925, 199.5875,
       197.3025, 175.3050, 218.6325, 163.0775, 170.6625, 148.2850,
       154.5950, 135.4050, 138.8600, 125.6750, 118.8450,  99.2675,
       129.1675,  91.1925,  89.7000,  76.8825,  83.6625,  74.1950,
        73.9125,  55.8750,  59.8675,  48.1900)


plot(x,y)

df <- data.frame(x,y)
df
##     x        y
## 1   0   8.8500
## 2   1  32.0775
## 3   2  74.7375
## 4   3 107.6775
## 5   4 132.0975
## 6   5 156.6675
## 7   6 169.0650
## 8   7 187.5375
## 9   8 202.2575
## 10  9 198.0750
## 11 10 225.9600
## 12 11 204.3550
## 13 12 233.8125
## 14 13 204.5925
## 15 14 232.3625
## 16 15 204.7550
## 17 16 220.1925
## 18 17 199.5875
## 19 18 197.3025
## 20 19 175.3050
## 21 20 218.6325
## 22 21 163.0775
## 23 22 170.6625
## 24 23 148.2850
## 25 24 154.5950
## 26 25 135.4050
## 27 26 138.8600
## 28 27 125.6750
## 29 28 118.8450
## 30 29  99.2675
## 31 30 129.1675
## 32 31  91.1925
## 33 32  89.7000
## 34 33  76.8825
## 35 34  83.6625
## 36 35  74.1950
## 37 36  73.9125
## 38 37  55.8750
## 39 38  59.8675
## 40 39  48.1900
my.lm <- lm(y ~ x, data = df)
summary(my.lm)
## 
## Call:
## lm(formula = y ~ x, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -169.456  -35.160    7.183   41.404   81.142 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 178.3058    18.1340   9.833 5.45e-12 ***
## x            -1.9346     0.8002  -2.418   0.0205 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 58.42 on 38 degrees of freedom
## Multiple R-squared:  0.1333, Adjusted R-squared:  0.1105 
## F-statistic: 5.845 on 1 and 38 DF,  p-value: 0.02053
# -------------------
# analyse breakpoints
# -------------------
# http://cran.r-project.org/doc/Rnews/Rnews_2008-1.pdf
library(segmented)
# we have to provide estimates for the breakpoints;
# after looking at the data,
my.seg <- segmented(my.lm,  seg.Z = ~ x,  psi = list(x = c(8)))

# When not providing estimates for the breakpoints "psi = NA" can be used.
# The number of breakpoints that will show up is not defined
#my.seg <- segmented(my.lm, 
#                    seg.Z = ~ DistanceMeters, 
#                    psi = NA)

# display the summary
summary(my.seg)
## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = my.lm, seg.Z = ~x, psi = list(x = c(8)))
## 
## Estimated Break-Point(s):
##          Est. St.Err
## psi1.x 9.723  0.385
## 
## Meaningful coefficients of the linear terms:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   26.895      8.703    3.09  0.00384 ** 
## x             22.224      1.630   13.63 8.72e-16 ***
## U1.x         -28.845      1.660  -17.38       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.81 on 36 degrees of freedom
## Multiple R-Squared: 0.9473,  Adjusted R-squared: 0.9429 
## 
## Convergence attained in 2 iter. (rel. change 0)
# get the breakpoints
my.seg$psi
##        Initial     Est.    St.Err
## psi1.x       8 9.722769 0.3845108
# get the slopes
slope(my.seg)
## $x
##           Est. St.Err. t value CI(95%).l CI(95%).u
## slope1 22.2240 1.63010  13.633   18.9180   25.5300
## slope2 -6.6208 0.31232 -21.199   -7.2543   -5.9874
library(ggplot2)
# get the fitted data
my.fitted <- fitted(my.seg)
my.model <- data.frame(Distance = df$x, Elevation = my.fitted)

# plot the fitted model
ggplot(my.model, aes(x = Distance, y = Elevation)) + geom_line() 

# add the fitted data to the existing plot
#p + geom_line(data = my.model, aes(x = Distance, y = Elevation), colour = "tomato")

Higher-degree piecewise polynomial regression

library(pracma)
## 
## Attaching package: 'pracma'
## The following object is masked from 'package:boot':
## 
##     logit
x <- 0:39
y <- c(  8.8500,  32.0775,  74.7375, 107.6775, 132.0975, 156.6675,
       169.0650, 187.5375, 202.2575, 198.0750, 225.9600, 204.3550,
       233.8125, 204.5925, 232.3625, 204.7550, 220.1925, 199.5875,
       197.3025, 175.3050, 218.6325, 163.0775, 170.6625, 148.2850,
       154.5950, 135.4050, 138.8600, 125.6750, 118.8450,  99.2675,
       129.1675,  91.1925,  89.7000,  76.8825,  83.6625,  74.1950,
        73.9125,  55.8750,  59.8675,  48.1900)
plot(x,y)

xi <- linspace(0, 39, 8)
pplin <- ppfit(x, y, xi)  # method = "linear"
ppcub <- ppfit(x, y, xi, method = "cubic")

 plot(x, y, type = "b", main = "Piecewise polynomial approximation")
 xs <- linspace(0, 39, 100)
 yslin <- ppval(pplin, xs)
 yscub <- ppval(ppcub, xs)
 lines(xs, yscub, col="red",lwd = 2)
 lines(xs, yslin, col="blue")

# grid()

GLM : logistic regression

GLM: generalized linear models

Data collected by Jones (BSc dissertation, University of Southampton, 1975).

Estimate how the probability of having bronchitis is influenced by smoking and/or pollution.

212 participants.

Bronchitis = read.csv("https://raw.githubusercontent.com/bioinformatics-core-shared-training/linear-models-r/master/data/Bronchitis.csv",header=TRUE)
plot(Bronchitis$cigs,Bronchitis$bron,col="blue4",
        ylab = "Absence/Presense of Bronchitis", xlab = "Daily number of cigarettes")
abline(h=c(0,1),col="light blue")

\(\log \frac{p}{1-p} = b_0 + b_1 x_1 + b_2 x_2 + \ldots\)

dd <- Bronchitis
fit.glm = glm(bron~cigs,data=Bronchitis,family=binomial)  # fit on the original data

bd <- dd[sample(nrow(dd),replace=TRUE),]                  # one bootstrap resample of the rows
gb <- glm(bron~cigs,data=bd,family=binomial)              # refit on the resample
coef(gb) 
## (Intercept)        cigs 
##  -1.7107371   0.1024861

How would you use boot to estimate a bootstrap CI for the first regression coefficient?
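One hedged way to do it (a sketch following the same pattern as the slope example earlier; the function name bcig is ours): write an extractor that refits the GLM on the resampled rows and returns the coefficient of interest, hand it to boot, then ask boot.ci for a percentile interval.

bcig = function(data, index){                   # extractor: refit the GLM on a resample
  bd = data[index, ]
  gb = glm(bron ~ cigs, data = bd, family = binomial)
  return(coef(gb)[2])                           # here, the cigs coefficient
}
bout = boot(Bronchitis, bcig, R = 1000)
boot.ci(bout, type = "perc")                    # percentile bootstrap confidence interval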

Not required: Regression (more advanced topics)

src:https://github.com/jpneto/Markdowns/blob/master/regression/regression.Rmd

Regression

Regression analysis is the statistical method you use when both the response variable and the explanatory variable are continuous variables. Perhaps the easiest way of knowing when regression is the appropriate analysis is to see that a scatterplot is the appropriate graphic. [The R Book, Crawley]

Linear Regression

The essence of regression analysis is using sample data to estimate parameter values and their standard errors. First, however, we need to select a model which describes the relationship between the response variable and the explanatory variable(s). The simplest of all is the linear model \[y = a + b.x\]

There are two variables and two parameters. The response variable is y, and x is a single continuous explanatory variable. The parameters are a and b: the intercept is a (the value of y when x = 0); and the slope is b (the change in y divided by the change in x which brought it about).

reg.data<-read.table("https://raw.githubusercontent.com/jpneto/Markdowns/master/regression/regression.txt",header=TRUE)
reg.data
##   growth tannin
## 1     12      0
## 2     10      1
## 3      8      2
## 4     11      3
## 5      6      4
## 6      7      5
## 7      2      6
## 8      3      7
## 9      3      8
attach(reg.data)
plot(tannin,growth,pch=16)
# To find the linear regression fit use 'lm'
# The response variable comes first (growth in our example), then the tilde ~, 
# then the name of the continuous explanatory variable (tannin).
fit <- lm(growth~tannin)
fit$coefficients
## (Intercept)      tannin 
##   11.755556   -1.216667
# draw the best fit line
abline(fit,col="red")

The values of the coefficients mean that the maximum likelihood estimate of the equation that minimizes the overall error is \[\text{growth} = 11.756 - 1.217 \times \text{tannin}\] i.e., the line which is the best fit for the given dataset in the sense that it is the most probable line among all options.

The difference between a measured value of y and the value predicted by the model for the same value of x is called a residual.

Let’s see the residuals:

predt <- function(fit, x) {  # hand-made prediction function
  return (fit$coefficients[[1]] + x * fit$coefficients[[2]])
}
plot(tannin,growth,pch=16)
abline(fit,col="red")
segments(tannin,predt(fit,tannin),tannin,growth,col="red",lty=2)

# their specific values can also be accessed by this component:
fit$residuals
##          1          2          3          4          5          6          7 
##  0.2444444 -0.5388889 -1.3222222  2.8944444 -0.8888889  1.3277778 -2.4555556 
##          8          9 
## -0.2388889  0.9777778

For these regression models there are several important assumptions:

  • The variance in y is constant (i.e. the variance does not change as y gets bigger).
  • The explanatory variable, x, is measured without error.
  • Residuals are measured on the scale of y (i.e. parallel to the y axis).
  • The residuals are normally distributed.

Under these assumptions, the maximum likelihood is given by the method of least squares, ie, minimizing the sum of the squared errors (the residuals).

Confidence Intervals

In statistics, a confidence interval (CI) is a measure of the reliability of an estimate. It is a type of interval estimate of a population parameter. It is an observed interval (i.e. it is calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level. [Wikipedia]

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
pl <- iris$Petal.Length
sl <- iris$Sepal.Length
model <- lm(pl ~ sl)
model
## 
## Call:
## lm(formula = pl ~ sl)
## 
## Coefficients:
## (Intercept)           sl  
##      -7.101        1.858
pred <- predict(model, data.frame(sl = sort(sl)), level = 0.95, interval = "confidence")
head(pred)
##         fit       lwr      upr
## 1 0.8898184 0.5928870 1.186750
## 2 1.0756617 0.7935781 1.357745
## 3 1.0756617 0.7935781 1.357745
## 4 1.0756617 0.7935781 1.357745
## 5 1.2615050 0.9940171 1.528993
## 6 1.4473483 1.1941605 1.700536
lower <- pred[,2]
upper <- pred[,3]
# make plot 
plot(sl, pl, pch=19, type="n")
grid()
polygon(c(sort(sl), rev(sort(sl))), c(upper, rev(lower)), col = "gray", border = "gray")
points(sl, pl, pch=19, cex=0.6)
abline(model, col="red", lwd=2)

Multi-linear Regression

Most of the time, we have a dataset with more than one predictor \(x_i\). One way is to apply simple linear regression to each predictor separately, but this treats each predictor as if the others did not exist, which is not realistic. A better way is to fit all of them together:

\[y = \beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n + \epsilon\]

Function lm is able to do it:

marks.set <- read.csv(file="https://raw.githubusercontent.com/jpneto/Markdowns/master/regression/Advertising.csv", head=TRUE, sep=',')
head(marks.set)
##      TV Radio Newspaper Sales
## 1 230.1  37.8      69.2  22.1
## 2  44.5  39.3      45.1  10.4
## 3  17.2  45.9      69.3   9.3
## 4 151.5  41.3      58.5  18.5
## 5 180.8  10.8      58.4  12.9
## 6   8.7  48.9      75.0   7.2
fit <- lm(Sales ~ TV+Radio, data=marks.set) # fitting TV and Radio
summary(fit)
## 
## Call:
## lm(formula = Sales ~ TV + Radio, data = marks.set)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.7977 -0.8752  0.2422  1.1708  2.8328 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.92110    0.29449   9.919   <2e-16 ***
## TV           0.04575    0.00139  32.909   <2e-16 ***
## Radio        0.18799    0.00804  23.382   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.681 on 197 degrees of freedom
## Multiple R-squared:  0.8972, Adjusted R-squared:  0.8962 
## F-statistic: 859.6 on 2 and 197 DF,  p-value: < 2.2e-16
fit2 <- lm(Sales ~ ., data=marks.set) # all features are predictors
summary(fit2)
## 
## Call:
## lm(formula = Sales ~ ., data = marks.set)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8277 -0.8908  0.2418  1.1893  2.8292 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.938889   0.311908   9.422   <2e-16 ***
## TV           0.045765   0.001395  32.809   <2e-16 ***
## Radio        0.188530   0.008611  21.893   <2e-16 ***
## Newspaper   -0.001037   0.005871  -0.177     0.86    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.686 on 196 degrees of freedom
## Multiple R-squared:  0.8972, Adjusted R-squared:  0.8956 
## F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16

Some predictors might have stronger correlations than others:

cor(marks.set)
##                   TV      Radio  Newspaper     Sales
## TV        1.00000000 0.05480866 0.05664787 0.7822244
## Radio     0.05480866 1.00000000 0.35410375 0.5762226
## Newspaper 0.05664787 0.35410375 1.00000000 0.2282990
## Sales     0.78222442 0.57622257 0.22829903 1.0000000

In multi-linear regression a coefficient may not produce an effect because it is considered in the context of the other coefficients. In this dataset, Newspaper seems to be an example of this.

We can check for interactions, i.e. include an interaction term:

fit3 <- lm(Sales ~ TV*Radio, data=marks.set) # use TV, Radio and its interaction
summary(fit3)
## 
## Call:
## lm(formula = Sales ~ TV * Radio, data = marks.set)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3366 -0.4028  0.1831  0.5948  1.5246 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.750e+00  2.479e-01  27.233   <2e-16 ***
## TV          1.910e-02  1.504e-03  12.699   <2e-16 ***
## Radio       2.886e-02  8.905e-03   3.241   0.0014 ** 
## TV:Radio    1.086e-03  5.242e-05  20.727   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9435 on 196 degrees of freedom
## Multiple R-squared:  0.9678, Adjusted R-squared:  0.9673 
## F-statistic:  1963 on 3 and 196 DF,  p-value: < 2.2e-16

Notice that this is no longer a simple linear relationship.

We should check the correlations between relevant coefficients (herein TV and Radio) to see if they complement each other in the estimation (low correlation) or just mirror the same effect (high correlation). In this case, since Radio and TV have low correlation, it seems that joint marketing in TV and Radio has some synergistic effect.

Another note: we should be careful when deciding to remove coefficients because of their perceived contributions. The hierarchy principle says the following:

If we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant.

Polynomial Regression

Instead of using a line, we can generalize to polynomials. Polynomials are extremely flexible, as the next example of approximating a sine wave (using just calculus) shows:

Using the Maclaurin expansion, we know that sin(x) is given by the expression

\[x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + ...\]

Let’s do this in R:

x<-seq(-pi,pi,0.01)
y<-sin(x)
plot(x,y,type="l",ylab="sin(x)",lwd=2)
a1<-x-x^3/factorial(3)
lines(x,a1,col="green")  # 1st approximation
a2<-x-x^3/factorial(3)+x^5/factorial(5)
lines(x,a2,col="red")    # 2nd approximation
a3<-x-x^3/factorial(3)+x^5/factorial(5)-x^7/factorial(7)
lines(x,a3,col="blue")   # 3rd approximation

This is both a strength and a problem. If we use high-degree polynomials we risk overfitting, which is something we wish to avoid.

poly<-read.table("https://raw.githubusercontent.com/jpneto/Markdowns/master/regression/diminish.txt",header=TRUE)
attach(poly)
head(poly)
##   xv yv
## 1  5 26
## 2 10 29
## 3 15 31
## 4 20 30
## 5 25 35
## 6 30 37
plot(xv,yv,pch=16)
model1<-lm(yv~xv)  # fitting a linear model
abline(model1,col="red")

summary(model1)
## 
## Call:
## lm(formula = yv ~ xv)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3584 -2.1206 -0.3218  1.8763  4.1782 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 27.61438    1.17315   23.54 7.66e-14 ***
## xv           0.22683    0.02168   10.46 1.45e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.386 on 16 degrees of freedom
## Multiple R-squared:  0.8725, Adjusted R-squared:  0.8646 
## F-statistic: 109.5 on 1 and 16 DF,  p-value: 1.455e-08
model2<-lm(yv~xv+I(xv^2)) # Fitting a quadratic polynomial
x<-0:90
y<-predict(model2,list(xv=x))
plot(xv,yv,pch=16)
lines(x,y,col="red")

model3<-lm(yv~xv+I(xv^2)+I(xv^3)+I(xv^4)+I(xv^5)) # Fitting a quintic polynomial
x<-0:90
y<-predict(model3,list(xv=x))
plot(xv,yv,pch=16)
lines(x,y,col="red")

Another R tool is the function poly() (eg from Conway’s “Machine Learning for Hackers”, chapter 6):

library(ggplot2)
set.seed(1)
x <- seq(0, 1, by = 0.01)
y <- sin(2 * pi * x) + rnorm(length(x), 0, 0.1)
df <- data.frame(X = x, Y = y)
ggplot(df, aes(x = X, y = Y)) + 
  geom_point()

# using a linear regression:
model <- lm(Y ~ X, data = df)
summary(model)
## 
## Call:
## lm(formula = Y ~ X, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.00376 -0.41253 -0.00409  0.40664  0.85874 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.94111    0.09057   10.39   <2e-16 ***
## X           -1.86189    0.15648  -11.90   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4585 on 99 degrees of freedom
## Multiple R-squared:  0.5885, Adjusted R-squared:  0.5843 
## F-statistic: 141.6 on 1 and 99 DF,  p-value: < 2.2e-16
# it's possible to explain 58.9% of the variance
ggplot(data.frame(X = x, Y = y), aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)

# here's the polynomial regression with poly()
model2 <- lm(Y ~ poly(X, degree = 3), data = df)
summary(model2)
## 
## Call:
## lm(formula = Y ~ poly(X, degree = 3), data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32331 -0.08538  0.00652  0.08320  0.20239 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           0.01017    0.01148   0.886    0.378    
## poly(X, degree = 3)1 -5.45536    0.11533 -47.302   <2e-16 ***
## poly(X, degree = 3)2 -0.03939    0.11533  -0.342    0.733    
## poly(X, degree = 3)3  4.41805    0.11533  38.308   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1153 on 97 degrees of freedom
## Multiple R-squared:  0.9745, Adjusted R-squared:  0.9737 
## F-statistic:  1235 on 3 and 97 DF,  p-value: < 2.2e-16
df <- transform(df, PredictedY = predict(model2))
ggplot(df, aes(x = X, y = PredictedY)) +
  geom_point() +
  geom_line()

# if we exaggerate the polynomial degree, things get overfitted:
model3 <- lm(Y ~ poly(X, degree = 25), data = df)
summary(model3)
## 
## Call:
## lm(formula = Y ~ poly(X, degree = 25), data = df)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.211435 -0.040653  0.005966  0.040225  0.221436 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             1.017e-02  9.114e-03   1.116   0.2682    
## poly(X, degree = 25)1  -5.455e+00  9.159e-02 -59.561  < 2e-16 ***
## poly(X, degree = 25)2  -3.939e-02  9.159e-02  -0.430   0.6684    
## poly(X, degree = 25)3   4.418e+00  9.159e-02  48.236  < 2e-16 ***
## poly(X, degree = 25)4  -4.797e-02  9.159e-02  -0.524   0.6020    
## poly(X, degree = 25)5  -7.065e-01  9.159e-02  -7.713 4.19e-11 ***
## poly(X, degree = 25)6  -2.042e-01  9.159e-02  -2.230   0.0288 *  
## poly(X, degree = 25)7  -5.134e-02  9.159e-02  -0.561   0.5768    
## poly(X, degree = 25)8  -3.100e-02  9.159e-02  -0.338   0.7360    
## poly(X, degree = 25)9   7.723e-02  9.159e-02   0.843   0.4018    
## poly(X, degree = 25)10  4.809e-02  9.159e-02   0.525   0.6011    
## poly(X, degree = 25)11  1.300e-01  9.159e-02   1.419   0.1600    
## poly(X, degree = 25)12  2.473e-02  9.159e-02   0.270   0.7879    
## poly(X, degree = 25)13  2.371e-02  9.159e-02   0.259   0.7965    
## poly(X, degree = 25)14  8.791e-02  9.159e-02   0.960   0.3403    
## poly(X, degree = 25)15 -1.346e-02  9.159e-02  -0.147   0.8836    
## poly(X, degree = 25)16 -2.613e-05  9.159e-02   0.000   0.9998    
## poly(X, degree = 25)17  3.501e-02  9.159e-02   0.382   0.7034    
## poly(X, degree = 25)18 -1.372e-01  9.159e-02  -1.498   0.1384    
## poly(X, degree = 25)19 -6.913e-02  9.159e-02  -0.755   0.4528    
## poly(X, degree = 25)20 -6.793e-02  9.159e-02  -0.742   0.4606    
## poly(X, degree = 25)21 -1.093e-01  9.159e-02  -1.193   0.2365    
## poly(X, degree = 25)22  1.367e-01  9.159e-02   1.493   0.1397    
## poly(X, degree = 25)23 -3.921e-02  9.159e-02  -0.428   0.6698    
## poly(X, degree = 25)24 -5.032e-02  9.159e-02  -0.549   0.5844    
## poly(X, degree = 25)25  1.262e-01  9.159e-02   1.378   0.1722    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09159 on 75 degrees of freedom
## Multiple R-squared:  0.9876, Adjusted R-squared:  0.9834 
## F-statistic: 238.1 on 25 and 75 DF,  p-value: < 2.2e-16
df <- transform(df, PredictedY = predict(model3))
ggplot(df, aes(x = X, y = PredictedY)) +
  geom_point() +
  geom_line()

Fitting a mechanistic model to the available data

Rather than fitting some arbitrary model for curvature (as above, with a quadratic term for inputs), we sometimes have a mechanistic model relating the value of the response variable to the explanatory variable (e.g. a mathematical model of a physical process). In the following example we are interested in the decay of organic material in soil, and our mechanistic model is based on the assumption that the fraction of dry matter lost per year is a constant. This leads to a two-parameter model of exponential decay in which the amount of material remaining (y) is a function of time (t) \[y = y_0 e^{-bt}\]

Taking logs on both sides: \[log(y) = log(y_0) - bt\] we can estimate the parameter of interest, \(b\), as the slope of a linear regression of \(log(y)\) on \(t\) (i.e. we log-transform the \(y\) axis but not the \(x\) axis) and the value of \(y_0\) as the antilog of the intercept.

# get some data (y,t)
data<-read.table("https://raw.githubusercontent.com/jpneto/Markdowns/master/regression/Decay.txt",header=TRUE)
head(data)
##   time    amount
## 1    0 125.00000
## 2    1 100.24886
## 3    2  70.00000
## 4    3  83.47080
## 5    4 100.00000
## 6    5  65.90787
attach(data)
plot(time,amount,pch=16)
# if we apply log to both side of the equation:
# log(y) = log(y0) - bt
# which gives us a linear model, that can be fitted to the data:
model<-lm( log(amount) ~ time )
summary(model)
## 
## Call:
## lm(formula = log(amount) ~ time)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5935 -0.2043  0.0067  0.2198  0.6297 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.547386   0.100295   45.34  < 2e-16 ***
## time        -0.068528   0.005743  -11.93 1.04e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.286 on 29 degrees of freedom
## Multiple R-squared:  0.8308, Adjusted R-squared:  0.825 
## F-statistic: 142.4 on 1 and 29 DF,  p-value: 1.038e-12
# in this case, the slope b = -0.068 and the intercept is 4.547 = log(y0) <=> y0 = 94.38
# which, without the errors, turns out to be: y = 94.38 exp(-0.068t)
# let's include it in the plot:
xv<-seq(0,30,0.2)
yv<-exp(predict(model,list(time=xv)))
lines(xv,yv,col="red")

Linear Regression after Transformation (Power Laws)

Many mathematical functions that are non-linear in their parameters can be linearized by transformation. The most frequent transformations (in order of frequency of use), are logarithms, antilogs and reciprocals. Here is an example of linear regression associated with a power law: \[y = a x^b\] with two parameters, where the parameter \(a\) describes the slope of the function for low values of \(x\) and \(b\) is the shape parameter.

power<-read.table("https://raw.githubusercontent.com/jpneto/Markdowns/master/regression/power.txt",header=TRUE)
attach(power)
head(power)
##       area response
## 1 2.321550 2.367588
## 2 2.525543 2.881607
## 3 2.627449 2.641165
## 4 2.195558 2.592812
## 5 1.088055 2.159380
## 6 2.044432 2.365786
par(mfrow=c(1,2))
plot(area,response,pch=16)
model1 <- lm(response~area)
abline(model1)
plot(log(area),log(response),pch=16)
model2 <- lm(log(response)~log(area))
abline(model2)

The two plots look very similar in this case (they don’t always), but we need to compare the two models.

par(mfrow=c(1,1))
plot(area,response)
abline(lm(response~area))    # model1 fit
xv<-seq(1,2.7,0.01)
yv<-exp(0.75378)*xv^0.24818  # model2 fit: intercept&slope from model2
lines(xv,yv)

They seem quite close, but if we extend the axis, we see how different they judge outliers:

xv<-seq(0,5,0.01)
yv<-exp(predict(model2,list(area=xv))) # same as yv<-exp(0.75378)*xv^0.24818
plot(area,response,xlim=c(0,5),ylim=c(0,4),pch=16)
abline(model1, col="red")
lines(xv,yv, col="blue")

Both models are ok for interpolation (predicting within the range of known data), but offer quite different responses for extrapolation (predicting outside the range of known data).

Another example: data taken from the inclined plane experiment made with Sofia (Feb 2, 2013)

height <- seq(4,32,4)
time <- c(144,115,103,88,76,72,60,54)
plot(time,height,xlim=c(20,180),ylim=c(0,60),pch=16) # wider 
plot(time,height,xlim=c(20,180),ylim=c(0,60),pch=16)
model<-lm(time~height+I(height^2)) # Fitting a quadratic polynomial
x<-0:60
y<-predict(model,list(height=x))
lines(y,x,col="blue",lwd=2.5)
model2 <- lm(log(time)~log(height)) # Fitting a power law
x<-0:60
y<-exp(predict(model2,list(height=x)))
lines(y,x,col="red",lwd=2.5)
model3 <- lm(time~log(height)) # Fitting a (semi?) power law (seems the best fit 
x<-0:60                        # for interpolation, all are bad for extrapolation)
y<-predict(model3,list(height=x))
lines(y,x,col="green",lwd=2.5)
legend("topright",  
       c("quadratic poly","power law","semi power law"),
       lty=c(1,1,1), # gives the legend appropriate symbols (lines)
       lwd=c(2.5,2.5,2.5),col=c("blue","red","green"))

Before moving on, let’s just make a list of some possible symbols that can be used in the regression formulas.

  • ‘+’ means include this variable (eg: y ~ x1 + x2)
  • ‘.’ means all variables (y ~ .)
  • ‘-’ means delete this variable (y ~ . - x2)
  • ‘*’ means include both variables plus their interaction (y ~ x1*x2)
  • ‘:’ means just the interaction between the variables (y ~ x1:x2)
  • ‘|’ means conditioning, y ~ x|z include x given z
  • ‘^n’ means include the variables plus all the interactions up to n ways ( y~(x1+x2+x3)^3 is equal to y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3 + x1:x2:x3)
  • ‘I’ means as is, includes a new variable given the expression ( y ~ I(x^2) )
  • ‘1’ means the intercept, -1 deletes the intercept (regress thru the origin)

The nature of the variables–binary, categorial (factors), numerical–will determine the nature of the analysis. For example, if “u” and “v” are factors, \[y \sim u + v\] dictates an analysis of variance (without the interaction term). If “u” and “v” are numerical, the same formula would dictate a multiple regression. If “u” is numerical and “v” is a factor, then an analysis of covariance is dictated.

Local regression

Instead of using the entire set to define a fit, at each point in the data set a low-degree polynomial is fitted to a subset of the data. Local regression (LOESS) uses a nearest neighbors algorithm that determines which data is going to be used to fit the local polynomial. The polynomial is fitted using weighted least squares, giving more weight to points near the point whose response is being estimated and less weight to points further away.

Here’s an animated gif (from SimplyStatistics.org) showing the process:

<img src="loess.gif" height="50%" width="50%">

It has two main parameters

  • \(\lambda\) - the degree of the local polynomial (usually linear or quadratic). If \(\lambda=0\) then LOESS turns into a moving average.
  • \(\alpha\) - the smoothing parameter (or span), i.e., the proportion of data used in each fit. It must be between \((\lambda+1)/n\) and \(1\) (for \(n\) data points). The larger the \(\alpha\), the smoother the fitted curve. Small values are bad since they tend to overfit the noise in the data (good values are 25%, 50%, 75%).
set.seed(101)
xs <- seq(0,2*pi,len=100)
df <- data.frame(x=xs, y=sin(xs)+rnorm(100,0,0.5))  # noisy dataset
plot(df$x, df$y, pch=19)
lines(xs, sin(xs), col="red", lwd=1) # target function
fit <- loess(y~x, data=df, span=0.75, degree=2)
pred <- predict(fit, df$x)
lines(df$x, pred, col="blue", lwd=2) # prediction
# draw confidence bands
library(msir)
## Package 'msir' version 1.3.2
## Type 'citation("msir")' for citing this R package in publications.
fit1 <- msir::loess.sd(df$x, df$y, span=0.75, degree=2)
# lines(xs, fit1$y) # the same as the prediction line above
lines(xs, fit1$upper, lty=2, col="cyan")  # one-sigma lines
lines(xs, fit1$lower, lty=2, col="cyan")
fit2 <- msir::loess.sd(df$x, df$y, span=0.75, degree=2, nsigma=2)
lines(xs, fit2$upper, lty=2, col="blue")  # two-sigma lines
lines(xs, fit2$lower, lty=2, col="blue")

Logistic Regression

The logistic function, or sigmoid function, has a real domain and a [0,1] range, which is appropriate to represent probability at some adequate contexts. Its formula is \[p(x) = \frac{1}{1+e^{-x}}\]

logit <- function(x) {   # note: this is the logistic (inverse-logit) function 1/(1+e^-x)
  return(1/(1+exp(-x)))
}
xs <- seq(-10,10,0.5)
plot(xs,logit(xs), type="l")
abline(v=0,lty=2)

One popular use of the logistic function is in demographic models, where \(p(t)\) is the population at time \(t\), modeling a saturation limit. Another place it is used is in neural networks, as an activation function.

Logistic Regression fits a logistic function to a dataset:

marks.set <- read.csv(file="https://raw.githubusercontent.com/jpneto/Markdowns/master/regression/Marks.csv", head=TRUE, sep=',')
head(marks.set)
##     exam_1   exam_2 admitted
## 1 34.62366 78.02469        0
## 2 30.28671 43.89500        0
## 3 35.84741 72.90220        0
## 4 60.18260 86.30855        1
## 5 79.03274 75.34438        1
## 6 45.08328 56.31637        0
sapply(marks.set,class)
##    exam_1    exam_2  admitted 
## "numeric" "numeric" "integer"
# apply the logistic regression
model <- glm(admitted ~ exam_1 + exam_2, family = binomial("logit"), data=marks.set)
new.sample <- data.frame(exam_1=60, exam_2=86)       # a new sample 
predt <- predict(model, new.sample, type="response") # let's predict its class (admitted or not)
predt
##         1 
## 0.9894302
if (round(predt)==1) {   # we can round the result to get a binary response
  print("admitted")
} else {
  print("not admitted")
}
## [1] "admitted"
# Let's check the model's summary:
summary(model)
## 
## Call:
## glm(formula = admitted ~ exam_1 + exam_2, family = binomial("logit"), 
##     data = marks.set)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.19287  -0.18009   0.01577   0.19578   1.78527  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -25.16133    5.79836  -4.339 1.43e-05 ***
## exam_1        0.20623    0.04800   4.297 1.73e-05 ***
## exam_2        0.20147    0.04862   4.144 3.42e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 134.6  on 99  degrees of freedom
## Residual deviance:  40.7  on 97  degrees of freedom
## AIC: 46.7
## 
## Number of Fisher Scoring iterations: 7

The same prediction value could be computed by the formula \[\text{logit}(\theta^T X)\] where \(\theta\) is the vector of coefficients, and \(X\) is the vector with the sample data. We assume that \(X_0=1\) and \(\theta_0\) is the intercept. So:

predictLogitModel <- function(theta,X) {  # compute logit(theta^T %*% X)
  return ( logit( t(theta) %*% X ) )
}
X <- as.matrix(c(1,60,86),nrow=3)         # the sample data
C <- as.matrix(model$coefficients,nrow=3) # the model's coefficients
predictLogitModel(C,X)                    # and, as we see, it's the same value as above
##           [,1]
## [1,] 0.9894302

Logistic Regression is used in categorization (despite its name…), ie, the output is categorical, usually binary. The prediction output \(y = h(\theta,x)\) is within 0 and 1 and can be interpreted has the estimated probability of \(p(y=1|x,\theta)\).

So, as we did before, if there’s no different cost associated to each decision, we decide \(y=1\) when \(p(y=1|x,\theta) \gt 0.5\), or decide \(y=0\) otherwise.

This is the same as deciding \(y=1\) when \(\theta^T \times X \geq 0\), and \(y=0\) otherwise. The equation \[\theta^T \times X = 0\] is called the decision boundary.

There is nothing to prevent us from using non-linear decision boundaries:

library(plotrix)
set.seed(222)  
round.data <- data.frame(x=rnorm(40,0,1),
                         y=rnorm(40,0,1))
data.class <- round.data$x^2 + round.data$y^2 > 1 # make a non-linear separable dataset: the decision boundary is the unit circle
round.data$class <- data.class*1
head(round.data)
##              x          y class
## 1  1.487757090  0.2023869     1
## 2 -0.001891901  0.4951010     0
## 3  1.381020790 -0.5693554     1
## 4 -0.380213631  1.1192945     1
## 5  0.184136230  2.2090779     1
## 6 -0.246895883  0.3171826     0
plot(x~y,data=round.data,col=round.data$class+1,xlim=c(-3,3),ylim=c(-3,3),pch=19)
draw.circle(0,0,1,lty=2,border="red")

model <- glm(data.class ~ x+y+I(x^2)+I(y^2), 
             family = binomial("logit"), data=round.data)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
theta <- model$coefficients  # get the coefficients
# if we scale these values we notice that the regression approximates the unit circle equation x^2 + y^2 = 1
theta/max(theta)
## (Intercept)           x           y      I(x^2)      I(y^2) 
## -0.93797140 -0.07723953  0.13032606  1.00000000  0.88413739
predt <- round( predict(model, round.data[-3], type="response") )
table(round.data$class, predt)
##    predt
##      0  1
##   0 14  0
##   1  0 26

Let’s draw the estimated boundary decision in comparisation with the unit circle (the real boundary):

predModel <- function (x,y) {
  return (theta[1] + theta[2]*x + theta[3]*y + theta[4]*x^2 + theta[5]*y^2)
}
xs <- seq(-4,4,0.1)
ys <- seq(-4,4,0.1)
zs <- outer(xs,ys,predModel)
plot(x~y,data=round.data,col=round.data$class+1,xlim=c(-3,3),ylim=c(-3,3),pch=19)
draw.circle(0,0,1,lty=2,border="red")
contour(xs,ys,zs,nlev=0,add=TRUE,col="blue",lty=2)

Spline Regression

ref: James et al. - Intr to Statistical Learning with R Applications (2013)

Instead of fitting a high-degree polynomial over the entire range of \(X\), piecewise polynomial regression involves fitting separate low-degree polynomials over different regions of \(X\). The points where the coefficients change are called knots. Using more knots leads to a more flexible piecewise polynomial. In general, if we place \(K\) different knots throughout the range of \(X\), then we will end up fitting \(K+1\) different polynomials.

To prevent jumps around the knots, the process adds restrictions on which polynomials can be chosen for each region. Usually, for polynomials of degree \(d\) the constraints demand that each pair of polynomials must be continuous up to the \(d-1\) derivative. Usually \(d=3\), so the constraint goes up to the second derivative. These are called cubic splines.

library(splines)
library(ISLR) # includes Wage dataset
data(Wage) 
agelims  <- range(Wage$age)
age.grid <- seq(from=agelims[1],to=agelims[2])

In the next code we prespecify knots at ages 25, 40 and 60. This produces a spline with six basis functions. (Recall that a cubic spline with three knots has seven degrees of freedom; these degrees of freedom are used up by an intercept plus six basis functions.)

fit <- lm(wage ~ bs(age, knots=c(25,40,60) ), # prespecified knots at ages 25, 40, and 60
          data=Wage)
pred <- predict(fit, newdata=list(age=age.grid), se.fit=TRUE)
plot(Wage$age, Wage$wage, col="gray")
lines(age.grid, pred$fit, lwd=2)
lines(age.grid, pred$fit+2*pred$se.fit, lty="dashed")
lines(age.grid, pred$fit-2*pred$se.fit, lty="dashed")
for (i in c(25,40,60))
  abline(v=i, col="red", lty=2)  # draw position of knots

We could also use the df option to produce a spline with knots at uniform quantiles of the data.

dim(bs(Wage$age, knots=c(25,40,60)))
## [1] 3000    6
dim(bs(Wage$age, df=6))
## [1] 3000    6
my.knots <- attr(bs(Wage$age, df=6), "knots")

In this case R chooses knots at ages 33.8, 42.0, and 51.0, which correspond to the 25th, 50th, and 75th percentiles of age.

fit.2 <- lm(wage ~ bs(age, knots= my.knots), data=Wage)
pred.2 <- predict(fit.2, newdata=list(age=age.grid), se.fit=TRUE)
plot(Wage$age, Wage$wage, col="gray")
lines(age.grid, pred.2$fit, lwd=2)
lines(age.grid, pred.2$fit+2*pred$se.fit, lty="dashed")
lines(age.grid, pred.2$fit-2*pred$se.fit, lty="dashed")
for (i in 1:length(my.knots))
  abline(v=my.knots[[i]], col="red", lty=2)  # draw position of knots

The function bs() also has a degree argument, so we can fit splines of any degree, rather than the default degree of 3 (which yields a cubic spline).

Multiclass classification

If there’s the need for multiclass classification the typical method is called one vs. all. For \(n\) classes, we create \(n\) predictors. For predictor \(i\) we create a binary classification where class \(i\) is considered as \(1\) and all the others are \(0\). With this process we get \(n\) models \(h^{(i)}(x)\), each one predicting that \(y=i\).

So, when we get a new input \(x\), we apply it to the \(n\) predictors and choose the one with the highest value. If the \(k^{th}\) predictor outputs the highest value, then our prediction will be \(y=k\). A minimal sketch is given below.
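A minimal sketch of one-vs-all with logistic regression on the iris data (illustrative only; glm may warn about perfect separation for the setosa-vs-rest model, which is harmless for this illustration):

classes = levels(iris$Species)
models = lapply(classes, function(cl) {
  ybin = as.numeric(iris$Species == cl)                  # class cl coded 1, all the others 0
  glm(ybin ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
      data = iris, family = binomial)
})
scores = sapply(models, function(m) predict(m, iris, type = "response"))
pred = classes[apply(scores, 1, which.max)]              # pick the class with the highest score
table(pred, iris$Species)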

Regularization

From http://horicky.blogspot.pt/2012/05/predictive-analytics-generalized-linear.html

With a large number of input variables but a moderate amount of training data, we are subject to the overfitting problem: the model fits too specifically to the training data and is not generalized enough for data we haven't seen. Regularization is the technique of preventing the model from fitting too specifically to the training data.

In linear regression, it is found that overfitting happens when \(\theta\) has a large value. So we can add a penalty that is proportional to the magnitude of \(\theta\). In L2 regularization (also known as Ridge regression), \(\sum_i \theta_i^2\) will be added to the cost function, while In L1 regularization (also known as Lasso regression), \(\sum_i |\theta_i|\) will be added to the cost function.

Both L1 and L2 will shrink the magnitude of \(\theta_i\); L2 tends to give dependent input variables similar coefficients, while L1 tends to pick one variable's coefficient to be non-zero and set the others to zero. In other words, L1 regression will penalize the coefficients of redundant variables that are linearly dependent and is frequently used to remove redundant features.

Combining L1 and L2, the general form of the cost function becomes \[\text{Cost} = \text{non-regularized cost} + \lambda \left( \alpha \sum_i |\theta_i| + (1-\alpha) \sum_i \theta_i^2 \right)\]

Notice there are two tunable parameters, \(\lambda\) and \(\alpha\). Lambda controls the degree of regularization (0 means no-regularization, infinity means ignoring all input variables because all coefficients of them will be zero). Alpha controls the degree of mix between L1 and L2. (0 means pure L2 and 1 means pure L1).

library(glmnet)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:pracma':
## 
##     expm, lu, tril, triu
## Loaded glmnet 3.0-2
# the cross validation selects the best lambda
cv.fit <- cv.glmnet(as.matrix(iris[,-5]), 
                    iris[,5],
                    nlambda=100, alpha=0.8, 
                    family="multinomial")
plot(cv.fit)

cv.fit$lambda.min # best lambda, ie, that gives minimum mean cross-validated error
## [1] 0.0005070981
prediction <- predict(cv.fit, newx=as.matrix(iris[,-5]), type="class")
table(prediction, iris$Species)
##             
## prediction   setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         48         1
##   virginica       0          2        49

Instead of picking lambda (the degree of regularization) based on cross validation, we can also base it on the number of input variables that we want to retain. So we can plot the “regularization path”, which shows how the coefficient of each input variable changes as lambda changes, and pick the lambda that filters out the desired number of input variables.

# try alpha = 0, Ridge regression
fit <- glmnet(as.matrix(iris[,-5]), 
              iris[,5], alpha=0, 
              family="multinomial")
plot(fit)

legend("topleft", names(iris)[-5], col = 1:4, lwd=1)

There is another penalty function, the Huber loss function, which is robust to outliers and provides a way to do robust regression.

The penalty function is (\(a\) being the residual equal to \(y-\hat{y}\))

\[L_{\delta}(a) = \begin{cases} \frac{1}{2} a^2 & \text{for } |a| < \delta \\ \delta \left( |a| - \delta/2 \right) & \text{otherwise} \end{cases}\]

making it quadratic for small values of \(a\), and linear for larger values.

[…] the Huber loss function is convex in a uniform neighborhood of its minimum a=0; at the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at the points \(a=-\delta\) and \(a = \delta\). These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the L2 quadratic loss function) and the robustness of the median-unbiased estimator (using the L1 absolute value function). [Wikipedia] The function \(L(a) = \log(\cosh(a))\) has a similar behavior.

The pseudo-Huber loss function \(L_{\delta}(a) = \delta^2 (\sqrt{1+(a/\delta)^2} - 1)\) is a smooth approximation of the Huber loss, which has continuous derivatives of all degrees.

delta = 1
xs <- seq(-2,2,len=100)
huber <- function(a,delta) {
  ifelse(abs(a)<delta,a^2/2,delta*(abs(a)-delta/2))
}
plot(xs,huber(xs,delta),type="l",col="blue",lwd=2)
lines(xs,log(cosh(xs)),col="red")                      # log.cosh  curve
lines(xs,delta^2*(sqrt(1+(xs/delta)^2)-1),col="green") # pseudo huber curve

Non-linear Regression

A nonlinear regression classical example is a model like this:

\[y = f(x, \theta) + \epsilon\]

with error \(\epsilon \sim \mathcal{N}(0,\sigma^2)\), response \(y\), covariates \(x\) and model parameters \(\theta\). The main difference from linear regression is that \(f\) does not have restrictions on its form.

In base R we have access to nls, which performs non-linear regression. The next examples are from its help file:

x <- -(1:100)/10
y <- 100 + 10 * exp(x / 2) + rnorm(x)/5  # some non-linear, exponential relationship
fit <- nls(y ~ Const + A * exp(B * x),   # make the non-linear regression fit
           start = list(Const=1, A = 1, B = 1)) 
fit$m$getPars()                          # check estimated parameters
##      Const          A          B 
## 99.9787966 10.1628654  0.5073087
plot(x, y, main = "", pch=20)
curve(100 + 10 * exp(x / 2), col = "blue", lwd=4, add = TRUE)
lines(x, predict(fit), col = "red", lwd=2)

If we have some information about where to search for the parameters, we can use start, and if we need to tweak the number of iterations or the convergence tolerance we use control:

fit <- nls(y ~ Const + A * exp(B * x), 
           start   = list(Const=5, A = 1, B = 0.5),
           control = nls.control(maxiter = 100, tol = 1e-05))
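
Once the fit has converged we can inspect it further; a short sketch, assuming the fit object from above:

summary(fit)$coefficients                              # estimates, standard errors, t values
vcov(fit)                                              # asymptotic covariance matrix of the estimates
predict(fit, newdata = data.frame(x = c(-8, -4, -1)))  # predictions at new x values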

Another example:

my_data <- subset(DNase, Run == 1)
head(my_data)
##   Run       conc density
## 1   1 0.04882812   0.017
## 2   1 0.04882812   0.018
## 3   1 0.19531250   0.121
## 4   1 0.19531250   0.124
## 5   1 0.39062500   0.206
## 6   1 0.39062500   0.215
fit1 <- nls(density ~ 1/(1 + exp((xmid - log(conc))/scal)),
            data  = my_data,
            start = list(xmid = 0, scal = 1),
            algorithm = "plinear")
summary(fit1)
## 
## Formula: density ~ 1/(1 + exp((xmid - log(conc))/scal))
## 
## Parameters:
##      Estimate Std. Error t value Pr(>|t|)    
## xmid  1.48309    0.08135   18.23 1.22e-10 ***
## scal  1.04145    0.03227   32.27 8.51e-14 ***
## .lin  2.34518    0.07815   30.01 2.17e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01919 on 13 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 1.032e-06
# plot it:
xs <- data.frame(conc=seq(0,13,len=100))
plot(my_data$conc, my_data$density, pch=20)
lines(xs$conc, predict(fit1,newdata=xs), col="red", lwd=2)
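
As an aside, the same logistic curve can be fitted without the plinear algorithm by using the self-starting model SSlogis, in which case the .lin coefficient above shows up as the explicit asymptote Asym (a sketch along the lines of the nls help file):

fit2 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), data = my_data)
coef(fit2)   # Asym should essentially match the .lin coefficient above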

A more complex example with indexing:

data(muscle, package = "MASS")
## The non linear model considered is
##       Length = alpha + beta*exp(-Conc/theta) + error
## where theta is constant but alpha and beta may vary with Strip.
with(muscle, table(Strip)) # 2, 3 or 4 obs per strip
## Strip
## S01 S02 S03 S04 S05 S06 S07 S08 S09 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20 
##   4   4   4   3   3   3   2   2   2   2   3   2   2   2   2   4   4   3   3   3 
## S21 
##   3
## We first use the plinear algorithm to fit an overall model,
## ignoring that alpha and beta might vary with Strip.
musc.1 <- nls(Length ~ cbind(1, exp(-Conc/th)), muscle,
              start = list(th = 1), algorithm = "plinear")
summary(musc.1)
## 
## Formula: Length ~ cbind(1, exp(-Conc/th))
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## th      0.6082     0.1146   5.309 1.89e-06 ***
## .lin1  28.9633     1.2298  23.552  < 2e-16 ***
## .lin2 -34.2274     3.7926  -9.025 1.41e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.666 on 57 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 9.319e-07
## Then we use nls' indexing feature for parameters in non-linear
## models to use the conventional algorithm to fit a model in which
## alpha and beta vary with Strip.  The starting values are provided
## by the previously fitted model.
## Note that with indexed parameters, the starting values must be
## given in a list (with names):
b <- coef(musc.1)
musc.2 <- nls(Length ~ a[Strip] + b[Strip]*exp(-Conc/th), muscle,
              start = list(a = rep(b[2], 21), b = rep(b[3], 21), th = b[1]))
summary(musc.2)
## 
## Formula: Length ~ a[Strip] + b[Strip] * exp(-Conc/th)
## 
## Parameters:
##     Estimate Std. Error t value Pr(>|t|)    
## a1   23.4541     0.7962  29.456 4.96e-16 ***
## a2   28.3020     0.7927  35.703  < 2e-16 ***
## a3   30.8007     1.7156  17.953 1.73e-12 ***
## a4   25.9211     3.0158   8.595 1.36e-07 ***
## a5   23.2008     2.8912   8.025 3.50e-07 ***
## a6   20.1200     2.4354   8.261 2.35e-07 ***
## a7   33.5953     1.6815  19.979 3.04e-13 ***
## a8   39.0527     3.7533  10.405 8.63e-09 ***
## a9   32.1369     3.3175   9.687 2.46e-08 ***
## a10  40.0052     3.3358  11.993 1.02e-09 ***
## a11  36.1904     3.1095  11.639 1.60e-09 ***
## a12  36.9109     1.8390  20.071 2.82e-13 ***
## a13  30.6346     1.7004  18.016 1.64e-12 ***
## a14  34.3118     3.4951   9.817 2.03e-08 ***
## a15  38.3952     3.3749  11.377 2.27e-09 ***
## a16  31.2258     0.8857  35.257  < 2e-16 ***
## a17  31.2303     0.8214  38.019  < 2e-16 ***
## a18  19.9977     1.0108  19.784 3.58e-13 ***
## a19  37.0953     1.0706  34.650  < 2e-16 ***
## a20  32.5942     1.1212  29.070 6.18e-16 ***
## a21  30.3757     1.0570  28.738 7.48e-16 ***
## b1  -27.3004     6.8732  -3.972 0.000985 ***
## b2  -26.2702     6.7537  -3.890 0.001178 ** 
## b3  -30.9011     2.2700 -13.613 1.43e-10 ***
## b4  -32.2384     3.8100  -8.461 1.69e-07 ***
## b5  -29.9406     3.7728  -7.936 4.07e-07 ***
## b6  -20.6219     3.6473  -5.654 2.86e-05 ***
## b7  -19.6246     8.0848  -2.427 0.026610 *  
## b8  -45.7799     4.1131 -11.130 3.15e-09 ***
## b9  -31.3446     6.3522  -4.934 0.000126 ***
## b10 -38.5987     3.9555  -9.758 2.21e-08 ***
## b11 -33.9211     3.8388  -8.836 9.19e-08 ***
## b12 -38.2680     8.9920  -4.256 0.000533 ***
## b13 -22.5683     8.1943  -2.754 0.013550 *  
## b14 -36.1669     6.3576  -5.689 2.66e-05 ***
## b15 -32.9521     6.3539  -5.186 7.44e-05 ***
## b16 -47.2068     9.5403  -4.948 0.000122 ***
## b17 -33.8746     7.6884  -4.406 0.000386 ***
## b18 -15.8962     6.2222  -2.555 0.020508 *  
## b19 -28.9690     7.2353  -4.004 0.000919 ***
## b20 -36.9171     8.0325  -4.596 0.000257 ***
## b21 -26.5075     7.0125  -3.780 0.001494 ** 
## th    0.7969     0.1266   6.296 8.04e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.115 on 17 degrees of freedom
## 
## Number of iterations to convergence: 8 
## Achieved convergence tolerance: 2.171e-06
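
Since musc.1 is nested in musc.2 (it forces a common alpha and beta across strips), we can check whether the per-strip parameters are worthwhile with an F-test; a sketch, using the anova method for nls fits:

anova(musc.1, musc.2)   # compares the pooled model against the per-strip model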

Package nlstools

Package nlstools is a unified diagnostic framework for non-linear regression. It deals with the following problems:

  • The iterative estimation usually requires initial values of the model parameters. These values should be relatively close to the ‘real’ values for convergence.

  • Assessing the validity of the model fit using diagnostic and visual tools

  • The construction of confidence intervals with non-parametric methods

library(nlstools)

The next example shows a model for oxygen kinetics during a 6-minute walk test. There are two distinct phases: a rest phase up to time \(t = 5.883\) minutes, followed by a walking stage modelled by an exponential rise towards a peak. The formula is shown in the code:

head(O2K)
##           t      VO2
## 1 0.0000000 377.1111
## 2 0.3333333 333.3333
## 3 0.6666667 352.1429
## 4 1.0000000 328.7500
## 5 1.3333333 369.8750
## 6 1.6666667 394.4000
plot(O2K, pch=20)
formulaExp <- 
  as.formula(VO2 ~ (t <= 5.883) * VO2rest + 
                   (t >  5.883) *(VO2rest + (VO2peak-VO2rest)*(1-exp(-(t-5.883)/mu))))
fit_O2K <- nls(formulaExp, 
               start = list(VO2rest=400, VO2peak=1600, mu=1), data = O2K)
overview(fit_O2K)
## 
## ------
## Formula: VO2 ~ (t <= 5.883) * VO2rest + (t > 5.883) * (VO2rest + (VO2peak - 
##     VO2rest) * (1 - exp(-(t - 5.883)/mu)))
## 
## Parameters:
##          Estimate Std. Error t value Pr(>|t|)    
## VO2rest 3.568e+02  1.141e+01   31.26   <2e-16 ***
## VO2peak 1.631e+03  2.149e+01   75.88   <2e-16 ***
## mu      1.186e+00  7.661e-02   15.48   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 49.59 on 33 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 7.598e-06
## 
## ------
## Residual sum of squares: 81200 
## 
## ------
## t-based confidence interval:
##                2.5%       97.5%
## VO2rest  333.537401  379.980302
## VO2peak 1587.155300 1674.611703
## mu         1.030255    1.342002
## 
## ------
## Correlation matrix:
##            VO2rest    VO2peak        mu
## VO2rest 1.00000000 0.07907046 0.1995377
## VO2peak 0.07907046 1.00000000 0.7554924
## mu      0.19953773 0.75549241 1.0000000
plotfit(fit_O2K, smooth = TRUE, lwd=2, pch.obs=20)

Assessing Goodness of fit

Given the parameter estimates \(\hat{\theta}\), we can use the residuals \(\hat{\epsilon}\) to check the quality of the regression

\[\hat{\epsilon} = y - f(x, \hat{\theta})\]

To get the residuals:

e_hat <- nlsResiduals(fit_O2K)
plot(e_hat)

We see that the residuals seem approximately normally distributed, and there’s no evidence of autocorrelation.

test.nlsResiduals(e_hat)
## 
## ------
##  Shapiro-Wilk normality test
## 
## data:  stdres
## W = 0.95205, p-value = 0.1214
## 
## 
## ------
## 
##  Runs Test
## 
## data:  as.factor(run)
## Standard Normal = 0.76123, p-value = 0.4465
## alternative hypothesis: two.sided

which means the null hypothesis that the residuals are normally distributed cannot be rejected. The runs test likewise cannot reject the hypothesis of no autocorrelation.

The next functions compute confidence regions for the parameters:

O2K.cont1 <- nlsContourRSS(fit_O2K)
##  RSS contour surface array returned
plot(O2K.cont1, col = FALSE, nlev = 5)

O2K.conf1 <- nlsConfRegions(fit_O2K, exp = 2, length = 2000)
##  Confidence regions array returned
plot(O2K.conf1, bounds = TRUE, pch=20)
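
The non-parametric confidence intervals mentioned earlier are available through nlsBoot, which resamples the model residuals; a sketch, reusing fit_O2K (999 replicates is just the default choice):

O2K.boot <- nlsBoot(fit_O2K, niter = 999)  # non-parametric bootstrap of the fitted model
summary(O2K.boot)                          # bootstrap medians and percentile confidence intervals
plot(O2K.boot, type = "boxplot")           # bootstrap distribution of each parameter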

Solution of the puzzle

The portrait is of John Napier, Scottish mathematician and inventor of the logarithm.

The astronomer Kepler, who was in Praha, was a real genius. He had the fantastic insight of applying this new and then very extraordinary function, discovered by Napier, to the observations of planet positions made by Tycho Brahe. And this is how he discovered his third law.

So the idea of the puzzle was: