#vectors matrices tables
what is v?
a=3
b=9
v = a+c(-1,2,4)*b
what does this do?
x=c(T,T,F,F)
y=c(T,F,T,F)
M=cbind(x,y,x&y,x|y,!x)
dimnames(M)[[2]]=c("x","y","x&y","x|y","!x")
M
## x y x&y x|y !x
## [1,] TRUE TRUE TRUE TRUE FALSE
## [2,] TRUE FALSE FALSE TRUE FALSE
## [3,] FALSE TRUE FALSE TRUE TRUE
## [4,] FALSE FALSE FALSE FALSE TRUE
fulltime=NA
nclasses=1
if (nclasses >= 3) {fulltime=TRUE
} else {fulltime=FALSE}
fulltime
## [1] FALSE
#ifelse(condition, statement1, statement2)
There is a subtle difference in that ifelse works on vectors.
#find the component by component minimum of vectors a and b
a=rnorm(10);
b=rnorm(10);
vecminab=ifelse(a<b,a,b);
print(cbind(a,b,vecminab))
## a b vecminab
## [1,] 0.5435615 -0.82403858 -0.8240386
## [2,] -0.6385618 1.73761614 -0.6385618
## [3,] -0.3986438 0.77237730 -0.3986438
## [4,] 0.7630046 -1.19751288 -1.1975129
## [5,] 0.2071592 -0.73389851 -0.7338985
## [6,] -2.4247580 -1.01896389 -2.4247580
## [7,] 0.1068852 0.69329282 0.1068852
## [8,] -1.0462237 0.43096752 -1.0462237
## [9,] -0.2292156 0.02872836 -0.2292156
## [10,] -0.3637466 0.69651526 -0.3637466
if number if negative write ‘negative’
if number is positive write ‘>0’
W=matrix(1:16,byrow=T,ncol=4)
print(W)
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12
## [4,] 13 14 15 16
fmat=function(W){
n=nrow(W)
# these loops will generate 16=4x4 states for the matrix
#
for (i in 1:n){
for (j in 1:n){
W[j,i]=W[i,j]+W[j,i]
# pass i=1
#11 = 11+11 # 11 will never be seen again
#21 = 12+21 # 21 will be seen again
#31 = 13+31
#41 = 14+41
# pass i=2
#12 = 21+12 # will never be seen again
#22 = 22+22
#32 = 23+32
#42 = 24+42
# pass i=3
#13 = 31+13 #
#23 = 32+23
#33 = 33+33
#43 = 34+43
# pass i=4
#14 = 41+14 #
#24 = 42+24
#34 = 43+34
#44 = 44+44
}
}
return(W)
}
print(fmat(W))
## [,1] [,2] [,3] [,4]
## [1,] 2 9 15 21
## [2,] 7 12 24 30
## [3,] 12 17 22 39
## [4,] 17 22 27 32
what does this code do?
print(sample(c(2,5,3), size=3, replace=FALSE))
## [1] 5 3 2
print(sample(c(2,5,3), size=3, replace=TRUE))
## [1] 3 5 5
what does this code do?
pbinom(2, size=10, prob=.2)
## [1] 0.6777995
what does this code do?
data=rbinom(2, size=10, prob=.2)
data
## [1] 4 4
\(t_{\alpha,k}\)
quantiles
qt(.975,23)
## [1] 2.068658
what does this code do?
set.seed(87612345)
mydata=rnorm(20,mean=4,sd=.75)
t.test(mydata,conf.level=.95)
##
## One Sample t-test
##
## data: mydata
## t = 24.626, df = 19, p-value = 7.037e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 3.740388 4.435262
## sample estimates:
## mean of x
## 4.087825
what is the assumption made for data?
are these assumptions justified?
what is a confidence interval?
I used the expression: ‘is the rat in the box’. What did I mean?
what is the coverage of a confidence interval?
what does empirical coverage of a CI mean?
what is alpha? what is dof?
what is the formula that R is using in the function t.test?
where is a quantile used? quantile of what? what does a quantile mean?
how could I have a smaller confidence interval?
what would happen if the data does not come from a normal distribution?
Can you give me examples?
but how do I know the distribution of the data?
what does this code do?
set.seed(117426)
data=rchisq(42,7)
what is the assumption made for data?
data is i.i.d normally distributed.
Each data point is a random number
\[x_i \sim N(\mu=4,\sigma^2 = (.75)^2)\]
I will use the notation
\[x_i \sim f\] to say the data is i.i.d. fro, distribution \(f\)
Please note that in general, distribution have ‘knobs’ (controls or parameters), so a better notation could be:
\[x_i \sim f(.,\theta)\]
where \(theta\) is a vector of parameters for the distrubution.
For example, a multivariate normal distribution in 2 dimensions requires 5 scalar parameters.
what is a confidence interval?
A confidence interval for a parameter of a distribution (for example the mean of a normal) is an interval in which we believe that the parameter will be with some given probability.
For example, a \(90\%\) confidence interval for the mean \(\mu\) of a random sample (data) is an interval \([a,b}\) such that
\[ \mathbb{P}( \mu \in [a,b]) = 0.9\]
I used the expression: ‘is the rat in the box’. What did I mean?
The rat is the unknown parameter. The box is the confidence interval.
what is the coverage of a confidence interval?
It is the real probability that the interval contains the parameter. This may differ from the nominal or announced probability.
what does empirical coverage of a CI mean?
It is an estimation of the real coverage probability of the CI. We get it by simulating data and counting how many times the rat is in the box.
what is alpha? what is dof?
for a \(90\%\)-CI, alpha is \(10\%\). The number of degrees of freedoms depends on the rat. It will not be the same if you try to catch a ‘mean’ or ‘a variance’ say. For a mean, the dof is the number of data points minus 1.
what is the formula that R is using in the function t.test?
The formula of Student
When \(x_1, x_2, \ldots , x_n\) is a sample from a normal distribution with unknown mean \(\mu\) and unknown variance \(\sigma^2\), the level \(100(1-\alpha)\)% confidence interval for \(\mu\) is given by
\[\bar x \pm t_{1-\alpha/2, n-1} \frac{s}{\sqrt n}\]
where \(\bar x\) and \(s\) are the sample mean and sample standard deviation of the data, and \(t_{1-\alpha/2, n-1}\) cuts off an area \(1-\alpha/2\) to its left under the \(t\) curve with \(n-1\) degrees of freedom.
where is a quantile used? quantile of what? what does a quantile mean?
see below
how could I have a smaller confidence interval?
Get more data points
what would happen if the data does not come from a normal distribution?
The coverage of your t-CI will not be the nominal coverage
Can you give me examples?
Draw data from any other distribution : poisson, chisquare, binomial, uniform etc.
but how do I know the distribution of the data?
You try fitting several distributions and keep the distribution with the best fit. BUT…
STEPS for writing functions:
step 1: formulate the problem
step 2: solve the problem
step 3: implement your algorithm in R
NOTA-BENE: You can build your function on top of powerful functions available in base R or in libraries!
write a function that calculates
\[ f(a,n) = \sum_{i=1}^n \frac{1}{i^a}\]
for \(a=2\), this is:
\[(1/1)^2 + (1/2)^2 + (1/3)^2 + (1/4)^2 + \ldots\]
compute $ $
does this remind you of any well-known number?
My solution
sa=function(a,n){
return(sum(seq(1,n)^(-2)))
}
n=20
print(sqrt( 6.*sa(2,n)))
## [1] 3.09467
n=100000
print(sqrt( 6.*sa(2,n)))
## [1] 3.141583
pi
## [1] 3.141593
some amazing speed effects
#for (n in (20:100000)){
#s = sqrt( 6.*sa(2,n))
#if(n%%10000 == 0){print(c(s,s-pi))}
#}
for (n in (20:100000)){
if(n%%10000 == 0){print(sqrt( 6.*sa(2,n)))}
}
## [1] 3.141497
## [1] 3.141545
## [1] 3.141561
## [1] 3.141569
## [1] 3.141574
## [1] 3.141577
## [1] 3.141579
## [1] 3.141581
## [1] 3.141582
## [1] 3.141583
You all know that given two numbers, \(x,y\), we can define several types of averages:
\[A(x,y)=1/2(x+y)\] \[G(x,y)=\sqrt{x y}= (xy)^{1/2}\] Note that …
\[H(x,y)=\frac{1}{\frac{1}{x} + \frac{1}{y}}\]
Suppose I start with two real numbers
\[a=a_0,b=b_0\]
Now I define a sequence of points
\[a_{n+1}=A(a_n,b_n), b_{n+1}=G(a_n,b_n)\]
Can you write a function that calculates \(a_N,b_N\) for any given rank \(N\)?
#agmf=function(arguments here){
#
#return()
#}
The arithmetic-geometric mean is defined as the common limit of these two sequences
FUN: Here is a code found online that plays with the AGM function.
Can you guess what it does?
library(numbers)
## Gauss constant
1 / agm(1, sqrt(2)) # 0.834626841674073
## [1] 0.8346268
## Calculate the (elliptic) integral 2/pi \int_0^1 dt / sqrt(1 - t^4)
f <- function(t) 1 / sqrt(1-t^4)
2 / pi * integrate(f, 0, 1)$value
## [1] 0.8346268
1 / agm(1, sqrt(2))
## [1] 0.8346268
## Calculate pi with quadratic convergence (modified AGM)
# See algorithm 2.1 in Borwein and Borwein
y <- sqrt(sqrt(2))
x <- (y+1/y)/2
p <- 2+sqrt(2)
for (i in 1:6){
cat(format(p, digits=16), "\n")
p <- p * (1+x) / (1+y)
s <- sqrt(x)
y <- (y*s + 1/s) / (1+y)
x <- (s+1/s)/2
}
## 3.414213562373095
## 3.142606753941623
## 3.141592660966044
## 3.141592653589793
## 3.141592653589793
## 3.141592653589793
## Not run:
## Calculate pi with arbitrary precision using the Rmpfr package
require("Rmpfr")
## Loading required package: Rmpfr
## Loading required package: gmp
##
## Attaching package: 'gmp'
## The following objects are masked from 'package:base':
##
## %*%, apply, crossprod, matrix, tcrossprod
## C code of R package 'Rmpfr': GMP using 64 bits per limb
##
## Attaching package: 'Rmpfr'
## The following object is masked from 'package:gmp':
##
## outer
## The following objects are masked from 'package:stats':
##
## dbinom, dgamma, dnorm, dpois, pnorm
## The following objects are masked from 'package:base':
##
## cbind, pmax, pmin, rbind
vpa <- function(., d = 32) mpfr(., precBits = 4*d)
# Function to compute \pi to d decimal digits accuracy, based on the
# algebraic-geometric mean, correct digits are doubled in each step.
agm_pi <- function(d) {
a <- vpa(1, d)
b <- 1/sqrt(vpa(2, d))
s <- 1/vpa(4, d)
p <- 1
n <- ceiling(log2(d));
for (k in 1:n) {
c <- (a+b)/2
b <- sqrt(a*b)
s <- s - p * (c-a)^2
p <- 2 * p
a <- c
}
return(a^2/s)
}
d <- 64
pia <- agm_pi(d)
print(pia, digits = d)
## 1 'mpfr' number of precision 256 bits
## [1] 3.141592653589793238462643383279502884197169399375105820974944592
# 3.141592653589793238462643383279502884197169399375105820974944592
# 3.1415926535897932384626433832795028841971693993751058209749445923 exact
## End(Not run)
Write a function that prints one root f a cubic polynomial
\[x^{3}+px+q=0\]
\[t=x+{\frac {b}{3a}}\] \[p={\frac {3ac-b^{2}}{3a^{2}}}\] \[q= {\frac {2b^{3}-9abc+27a^{2}d}{27a^{3}}}\] \[4p^{3}+27q^{2}>0\]
\[{\sqrt[{3}]{-{\frac {q}{2}}+{\sqrt {{\frac {q^{2}}{4}}+{\frac {p^{3}}{27}}}}}}+{\sqrt[{3}]{-{\frac{q}{2}}-{\sqrt {{\frac {q^{2}}{4}}+{\frac {p^{3}}{27}}}}}}\]
\[ax^3+bx^2+cx+d=0\]
The discriminant of the general cubic
\[18\,abcd-4\,b^{3}d+b^{2}c^{2}-4\,ac^{3}-27\,a^{2}d^{2}\]
It is the product of \[a^{4}\] and the discriminant of the corresponding depressed cubic.
\[\Delta _{0} = b^{2}-3ac \] \[\Delta _{1} =2b^{3}-9abc+27a^{2}d\]
\[C={\sqrt[{3}]{\frac {\Delta _{1}\pm {\sqrt {\Delta _{1}^{2}-4\Delta _{0}^{3}}}}{2}}}\]
\[x=-{\frac {1}{3a}} (b+C+{\frac {\Delta _{0}}{C}} )\]
Can you write a function that accepts a depressed cubic as input and returns three things:
a message
one real root of a cubic polynomial?
the discriminant
#cubicroot=function(arguments here){
#
#return()
#}
#discriminant=
#return(NULL)}
The integral of a function \(f(x)\) can be approximated by a Riemann sum
\[\int_{a}^b f(x)dx \approx \sum_{i=1}^n f(x_i) dx_i\]
How would you write a function that calculates the riemann sm at any gvn anl f a fnctn?
\[\int_a^b g'(x)dx = g(b)-g(a)\]
if \(f(x)=g'(x)\), \(g\) is called a primitive of \(f\)
What if \(g=sin\), what is \(f\)?
Hence
\[\int_a^b sin(x)dx = \cos(b)-\cos(a)\]
dumm=function(x,f){
output=f(x)+2
return(output)
}
dumm(pi/2,sin)
## [1] 3
Implement a function that calculates the riemann sum at rank n of a function of your choice
What output might you want to return?
rsum=function(f,n){
# output=f(x)+2
# return(output)
}
# how would you test your function?
bread=function(x){
# implement any function of your choice
# return()
}
# how would you test your function?
butter=function(x){
if(x == 0 ){return(1)} else{
return(sin(x)/x)
}
}
# how would you test your function?
butter(0.01)
## [1] 0.9999833
what is the integral of butter?
rsum=function(f,n){
# output=f(x)+2
# return(output)
}
# how would you test your function?
How about computing the area under an empirical (not analytical) curve?
x <- seq(-1., 1., by = 0.05)
#set.seed(100)
y = exp(-x^2/2) / sqrt(2*pi)
library(pracma)
##
## Attaching package: 'pracma'
## The following objects are masked from 'package:Rmpfr':
##
## erf, erfc, hypot, zeta
## The following objects are masked from 'package:gmp':
##
## gcd, isprime
## The following objects are masked from 'package:numbers':
##
## mod, rem
area <- trapz(x, y)
area
## [1] 0.6825887
z <- seq(-10., 10., by = 0.05)
xs <- sample(z,20,replace=F)
x <- sort(xs)
#set.seed(100)
y = exp(-x^2/2) / sqrt(2*pi)
library(pracma)
area <- trapz(x, y)
area
## [1] 1.064928