I write this document as a complement to today’s lecture.
I hope that this will help everybody in the class, except maybe for more advanced students who are already familiar with this material.
The goal is to help you understand a few basic distributions (uniform, exponential, normal, chi-square, Fisher, binomial, Poisson) and especially:
After reading this document, and repeating each exercise yourselves, you should become more familiar with these functions and they will appear less scary.
‘Distribution’ is shorthand for ‘probability distribution’.
A distribution specifies the chance or importance of each admissible random outcome of a random variable \(X\).
\(X \sim f\)
The value \(h(x)\) of the pdf of the distribution \(f\) at \(x\) is the height of a thin vertical rectangle of horizontal width \(dx\) such that the area of this rectangle represents the probability that the outcome/random variable \(X\) gets a value close to (within \(dx\) of) the numerical value \(x\). We can represent this relation by:
\(AREA = HEIGHT * WIDTH\) or:
\[P(X \in [x,x+dx]) = h(x) dx \]
Please note that \(x\) is a given non random value (a real number), whereas \(X\) is a random real-valued variable.
I call this the ‘thin slice’ interpretation of the pdf function.
Each red rectangle corresponds to a different value of \(x\) (i.e. various values of the outcome). The area of a thin red rectangle is the probability that the outcome will be in the interval represented by the base of the rectangle. The sum of the areas of all red rectangles is 1. The red rectangles should be even thinner (infinitesimally small in width) and there should be an infinite number of them (so that the white part under the curve disappears). Here we have represented a simple single-bump distribution, but you can redraw the same picture for any arbitrary curve, as long as the curve is never negative (no negative y value) and the total area under the curve (red area) is 1, which means that the probability of ANY possible value \(-\infty < X < \infty\) is 1.
\[1=P(-\infty < X < \infty) = \int_{-\infty}^{+\infty} h(x) dx \]
The cumulative distribution function (cdf) of the distribution is defined by
\[F(x) = P(X \le x) = \int_{-\infty}^x h(u) du \] This means that you collect all the red rectangles that are at the left of the value \(x\) and wipe out the others.
This cdf function varies monotonically from 0 to 1. It represents the cumulative probabilities of all outcomes less than a given value.
Sometimes, it is convenient to look at the complement of the cdf, meaning:
\[F^c(x) = P(X > x) = 1- P(X \le x) = \int_{x}^{+\infty} h(u) du \] This corresponds to wiping out all red rectangles LEFT of x and keeping the right tail of the distribution.
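The three integrals above can be checked numerically. Here is a minimal sketch that uses the standard normal purely as an illustration; the value x = 1.3 is my arbitrary choice, and integrate() is R's general-purpose numerical integrator.

```r
# Standard normal used only as an illustration; x = 1.3 is arbitrary
x <- 1.3

# total area under the pdf: should be very close to 1
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# cdf obtained by collecting all thin slices to the left of x
cdf_by_integration <- integrate(dnorm, lower = -Inf, upper = x)$value

# the same cdf, and its complement, from the built-in function
cdf_builtin <- pnorm(x)                     # F(x)   = P(X <= x)
right_tail  <- pnorm(x, lower.tail = FALSE) # F^c(x) = P(X > x)

print(c(total_area, cdf_by_integration, cdf_builtin, right_tail))
```

The left area and the right tail always add up to 1, whatever value of x you pick.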
We might want to know what cutoff value \(x_q\) makes the cdf area equal to, say, \(q=0.85\). This cutoff value (a position on the x-axis, not a height) is called the 85 percent quantile of the distribution.
\[F(x_q) = P(X \le x_q) =q\] Note that in this formula, we give \(q\) (input) and request \(x_q\) (output). This is the opposite of the cdf where we give the value \(x_q\) and want to know the value of \(q\).
The value of \(x_q\) is obtained in R by using the q letter in front of the name of the distribution.
For example:
mu = 0.
sigma=1.
variance=sigma^2
qnorm(0.95)
## [1] 1.644854
See the section on quantiles for more details.
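To see that the quantile function really is the opposite of the cdf, here is a small sketch for the standard normal (q = 0.95 as in the example above; the variable names are mine).

```r
q <- 0.95

# quantile: we give q and request x_q
x_q <- qnorm(q, mean = 0, sd = 1)

# cdf: we give x_q and get q back
q_back <- pnorm(x_q, mean = 0, sd = 1)

print(c(x_q, q_back))
```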
Normal distributions are used to represent outcomes that vary continuously and can take any negative or positive real number.
The support of the normal distribution i.e. the set of all possible outcome values, is the entire set of real numbers: \(\mathbb{R}\).
A random variable \(X\) that is drawn from a normal distribution of mean \(\mu\) and variance \(v=\sigma^2\) is noted:
\(X \sim N(\mu,\sigma^2)\)
A normal distribution is completely characterized by two parameters: -its mean (or average) \(\mu\) -its variance \(\sigma^2\)
The variance is the square of the standard deviation \(\sigma\). The standard deviation has the same units as \(X\) (e.g. meters if \(X\) is measured in meters).
Let us assume that the height of Canadian men is normally distributed with mean \(\mu=1.78\) and standard deviation \(\sigma=0.15\) (in meters).
Here is the meaning of the parameters: [Figure: illustration of the normal distribution with its mean and standard deviation. Source: https://www.researchgate.net/profile/Sara_Johnson9/publication/47300259/figure/fig1/AS:394309422600196@1471022101190/llustration-of-the-normal-distribution-mean-standard-deviation.png]
mu = 1.78
sigma=0.15
variance=sigma^2
The distribution is now fully specified and we can work with it.
Let us ask R to draw \(13\) independent samples from this distribution
# please note how to specify the parameters of the normal
# name the arguments mean and sd explicitly (if omitted, R uses mean=0 and sd=1)
nsamples=13
X <- rnorm(nsamples,mean=mu,sd=sigma)
X
## [1] 2.001617 1.809157 1.854926 1.766812 1.693316 1.944862 1.759418 1.932956
## [9] 1.976868 1.793250 1.876765 1.880565 1.774968
Exercise: can you write some R code to replicate the figure above as closely as possible?
Binomial distributions are used to represent outcomes that count the number of successes in a finite series of independent experiments, each of which is a Bernoulli trial, i.e. has two possible outcomes (success, noted T, or failure, noted F).
The support of the binomial distribution, i.e. the set of all possible outcome values, is an interval of integers. If we have 5 trials, the support is \(\{0,1,2,3,4,5\}\) because \(X\) can only take these values with non-zero probability.
A random variable \(X\) that is drawn from a binomial distribution of probability \(p\) and size \(N\) is noted:
\(X \sim Binom(p,N)\)
Think of ONE draw from the binomial distribution as equivalent to \(N\) independent Bernoulli tosses of a coin. Each toss will give HEAD (success) with probability \(p\).
A binomial distribution is completely characterized by two parameters: -the probability of success in one trial \(p\) -the number of trials \(N\)
Let us assume that in a population of clover, there is a probability \(p=0.0001\) to have four leaves (success). And let’s assume that we collect \(N=2000\) clover plants.
Here is a plot of the pdf of a binomial distribution. On the x-axis, we represent the outcomes, and on the y-axis we represent the probability of each outcome. This is also called the probability mass function or pmf. Can you tell which parameters were used to create this plot? Can you read on this plot the probability of 6 successes? How many trials were used? What was the probability of success per trial? Why do we have a zero probability for 40 successes?
N = 2000
p = 0.0001
The distribution is now fully specified and we can work with it.
Let us ask R to draw \(50\) independent samples from this distribution
# please note how to specify the parameters of the binomial
# the keywords size and prob are compulsory
nsamples=50
X <- rbinom(nsamples,size=N,prob=p)
X
## [1] 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0
## [39] 0 0 0 0 0 0 0 0 1 1 0 1
The meaning of this is the following: each replicate or sample is say a person, and each person reports the number of clover plants with 4 leaves (successes) that they have found among the 2000 trials (plants) that they have each collected.
So this experiment supposes that a total of 100000 plants have been collected.
Given the small probability of finding 4 leaves, many outcomes are 0, there are some 1’s and a few 2’s.
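To see why the sample is dominated by zeros, we can ask R for the exact probabilities of the first few outcomes. This is a minimal sketch with the same N and p as above; it uses dbinom, the pdf function that is introduced later in this document.

```r
N <- 2000
p <- 0.0001

# exact probabilities of finding 0, 1 or 2 four-leaf clovers among N plants
probs <- dbinom(0:2, size = N, prob = p)
names(probs) <- 0:2
print(round(probs, 4))

# mean of the binomial: average number of successes per collector
expected_count <- N * p
print(expected_count)
```

The three probabilities together account for almost all of the total probability, which is why outcomes of 3 or more are so rare in a sample of 50.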
Please note that, each time you rerun this code, you will get a different sequence of samples. If you want to replicate the exact same random sample, you need to start with the same random seed at each execution of the code. You can do this by calling set.seed() with an integer argument, for example set.seed(101), before drawing the samples.
The integer value that you pass is arbitrary (e.g. 101 or 200 or anything you want) as long as you are aware that the same seed will generate the same sequence, and different seeds will generate different sequences.
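Here is a short sketch of this behaviour. The seeds 101 and 200 are arbitrary, and I draw from a normal distribution (rather than the clover binomial) only because continuous values make the differences between sequences easy to see.

```r
set.seed(101)                              # arbitrary seed
first_run <- rnorm(5, mean = 1.78, sd = 0.15)

set.seed(101)                              # same seed: same sequence
second_run <- rnorm(5, mean = 1.78, sd = 0.15)

set.seed(200)                              # different seed: different sequence
third_run <- rnorm(5, mean = 1.78, sd = 0.15)

print(identical(first_run, second_run))
print(identical(first_run, third_run))
```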
Uniform distributions are used to represent any random number in an interval, and assume that any number within the specified interval, say \([a,b]\) has an equal chance to be drawn.
The support of the uniform distribution is an interval \([a,b]\).
A random variable \(X\) that is drawn from a uniform distribution on the interval \([a,b]\) is noted:
\(X \sim U(a,b)\)
Hence the uniform distribution is completely characterized by two parameters: -\(a\), the minimum value -\(b\), the maximum value
Let us draw 19 samples from \(U(1,5)\):
a=1
b=5
# please note how to use the min and max arguments to specify
# the parameters of the uniform distribution (if omitted, R uses min=0 and max=1)
runif(19,min=a,max=b)
## [1] 2.732964 2.440328 4.401623 2.540395 1.086079 3.013539 1.896478 2.291761
## [9] 2.380972 3.719870 3.883208 4.460315 2.395409 1.845875 1.065350 2.291618
## [17] 1.687740 3.771335 4.570096
The Poisson distribution is often used to represent the counts of certain events (e.g. disease diagnosis, disintegration of an atom, explosion of a star, mutation of a gene) in a period of time.
The support of the Poisson distribution is the set of all non-negative integers \(0,1,2,\ldots\).
A random variable \(X\) that is drawn from a Poisson distribution with rate \(\lambda\) is noted:
\(X \sim Pois(\lambda)\)
Hence the Poisson distribution is completely characterized by one parameter: -\(\lambda\), the rate value
Please note that the numerical value of the rate parameter coincides exactly with both the mean and the variance of the Poisson distribution. That is, if we choose a Poisson distribution with rate \(\lambda=8.4\), we immediately know that its mean is \(8.4\) and its variance is also \(8.4\); the mean and variance of a large random sample from this distribution will be close to these values.
Let us draw 23 samples from \(Pois(7.2)\):
nsamples=23
rate=7.2
#please note the use of the R keyword lambda to specify the rate parameter of the
#Poisson distribution:
rpois(nsamples,lambda=rate)
## [1] 13 11 3 7 5 5 3 7 3 7 5 5 8 7 4 8 12 11 6 10 3 12 7
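With only 23 samples the empirical mean and variance can be quite far from the rate. The sketch below draws a much larger sample (the sample size of 100000 and the seed are my arbitrary choices) to show both quantities approaching the rate 7.2.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
rate <- 7.2

big_sample  <- rpois(100000, lambda = rate)
sample_mean <- mean(big_sample)   # should be close to the rate
sample_var  <- var(big_sample)    # should also be close to the rate

print(c(sample_mean, sample_var))
```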
The exponential distribution is closely related to the Poisson distribution. A Poisson process that develops in time (for example radioactive decay of atoms) can be analyzed in terms of counts in given time intervals, in which case counts of disintegrations per equal interval (e.g. a second) are Poisson distributed. The time separating two consecutive disintegrations is also random, and it obeys an exponential distribution.
The support of the exponential distribution is the set of positive reals.
A random variable \(X\) that is drawn from an exponential distribution with rate \(\lambda\) is noted:
\(X \sim Exp(\lambda)\)
Hence the exponential distribution is completely characterized by one parameter: -\(\lambda\) , the rate of the exponential (decay)
The value of \(\lambda\) coincides with the inverse of the mean of the distribution. Hence if we draw samples from \(Exp(0.25)\) we know that the mean of the distribution is 4, and the mean of a large sample will be close to 4.
Let us draw 23 samples from \(Exp(3.25)\):
nsamples=23
lambda=3.25
#please note the use of the R keyword 'rate' to specify the parameter
#of the exponential distribution (if omitted, R uses rate=1)
#
rexp(nsamples,rate=lambda)
## [1] 0.802771289 1.495487860 0.183553844 0.066618832 0.287712884 0.553792094
## [7] 0.338004993 0.172330209 0.073925281 0.051999049 0.127702568 0.041008146
## [13] 0.309629423 0.219832570 0.159727035 0.504180785 0.369560668 0.337964318
## [19] 0.047764481 0.472216626 0.019471597 0.068688480 0.006470877
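The link between the rate and the mean can be checked on a large sample. This is a minimal sketch with the same rate 3.25; the sample size and the seed are my arbitrary choices.

```r
set.seed(2)          # arbitrary seed, only for reproducibility
lambda <- 3.25

waiting_times <- rexp(100000, rate = lambda)
sample_mean   <- mean(waiting_times)

# the empirical mean should be close to the inverse of the rate
print(c(sample_mean, 1 / lambda))
```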
The chi-square distribution is used to represent the total area of squares whose sides are random and drawn independently from a standard normal distribution \(N(0,1)\).
The support of the chi-square distribution is the set of all positive reals \(\mathbb{R}^{+}\).
A random variable \(X\) that is drawn from a chi-square distribution with \(k\) degrees of freedom is noted:
\(X \sim \chi^2(k)\)
Hence the chi-square distribution is completely characterized by one parameter: -\(k\), the number of degrees of freedom
The intuitive interpretation of \(k\) is the number of random squares used to generate one sample outcome.
Let us draw 9 samples from \(\chi^2(4)\):
kdof=4
nsamples=9
#please note the use of the R keyword df to specify the number of degrees of freedom
rchisq(nsamples,df=kdof)
## [1] 1.671761 1.415809 3.548919 2.914980 3.249824 3.067616 5.040069 9.409600
## [9] 1.748947
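The 'random squares' interpretation can be tested by simulation: add up k squared standard normal draws and compare the result with the built-in chi-square cdf. This is only a sketch; the number of simulations, the seed and the cutoff value 3 are my arbitrary choices.

```r
set.seed(3)          # arbitrary seed, only for reproducibility
kdof <- 4
nsim <- 100000

# each row holds kdof independent standard normal "sides"
Z <- matrix(rnorm(nsim * kdof), nrow = nsim, ncol = kdof)

# one chi-square outcome per row: the sum of the kdof squared sides
sum_of_squares <- rowSums(Z^2)

# proportion of simulated outcomes below 3, versus the theoretical cdf at 3
empirical   <- mean(sum_of_squares <= 3)
theoretical <- pchisq(3, df = kdof)
print(c(empirical, theoretical))
```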
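The t-distribution is used to model the ratio of the deviation of a sample mean from the true mean to its empirical standard error (the square root of the empirical variance divided by the sample size).
The support of the t-distribution is the entire set of real numbers \(\mathbb{R}\).
A random variable \(X\) that is drawn from a t-distribution with \(k\) degrees of freedom is noted: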
\(X \sim t_k\)
Hence the t-distribution is completely characterized by one parameter: -\(k\) , the number of degrees of freedom
Let us draw 11 samples from \(t_4\):
nsamples=11
ndof=4
#please note the use of the compulsory R keyword 'df' to specify the parameter
#of the t-distribution
#
rt(nsamples,df=ndof)
## [1] 0.2845803 1.1902902 2.1406185 0.1452952 0.8053436 0.9839946
## [7] 2.3945917 -0.2385998 -3.4818857 0.1282723 -1.0776839
The Fisher distribution is used to model the ratio of two sums of squares (each distributed as a chi-square and divided by its number of degrees of freedom) and is especially convenient in the analysis of variance (ANOVA).
The support of the Fisher distribution is the set of positive reals \(\mathbb{R}^{+}\).
A random variable \(X\) that is drawn from a Fisher distribution is noted:
\(X \sim F_{k,k'}\)
Hence the Fisher distribution is completely characterized by two parameters: -\(k\) , the number of dofs of the numerator -\(k'\), the number of dofs of the denominator
Let us draw 4 samples from \(F_{3,7}\):
nsamples=4
k1=3
k2=7
#please note the use of the compulsory R keyword 'df1' and 'df2'
# used to specify the parameters of the Fisher distribution
#
rf(nsamples,df1=k1,df2=k2)
## [1] 1.146582 2.376979 1.518826 5.177511
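The 'ratio of two sums of squares' description can also be tested by simulation. The sketch below builds the ratio from two independent chi-square samples, each divided by its number of degrees of freedom, and compares it with the built-in cdf pf. The number of simulations, the seed and the cutoff value 1 are my arbitrary choices.

```r
set.seed(4)          # arbitrary seed, only for reproducibility
k1 <- 3
k2 <- 7
nsim <- 100000

# each chi-square is divided by its own number of degrees of freedom
numerator   <- rchisq(nsim, df = k1) / k1
denominator <- rchisq(nsim, df = k2) / k2
ratio <- numerator / denominator

# proportion of simulated ratios below 1, versus the theoretical cdf at 1
empirical   <- mean(ratio <= 1)
theoretical <- pf(1, df1 = k1, df2 = k2)
print(c(empirical, theoretical))
```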
In all the exercises of this section, we are going to evaluate the pdf (probability density function) of a given distribution (for example the binomial) at a given value \(x\).
If the distribution models outcomes that vary on a discrete scale (i.e. each sample is an integer; for example, the binomial and the Poisson model the probabilities of counts, such as the number of successes in N attempts), then the pdf at x is simply the probability of occurrence of x.
Example:
The probability of getting 3 successes over 11 tosses of a fair coin is:
# type the value of the outcome here the number of success
x=3
# and call the R function to calculate the pdf at x
dbinom(x,size=11,prob=0.5)
## [1] 0.08056641
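The same number can be obtained by hand from the usual binomial formula, (number of ways to place the successes) times (probability of each such sequence). This is a small sketch; the variable names are mine.

```r
x <- 3          # number of successes
n_tosses <- 11  # number of tosses
p <- 0.5        # fair coin

# choose(n, x) counts the sequences with exactly x successes
by_hand <- choose(n_tosses, x) * p^x * (1 - p)^(n_tosses - x)
builtin <- dbinom(x, size = n_tosses, prob = p)

print(c(by_hand, builtin))
```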
For an outcome that varies on a continuous scale (e.g. the height of a person), the meaning of the pdf is slightly different (see above). For example, assume that a random outcome (variable X) is distributed according to a normal distribution (mean zero and variance 1). What is the meaning of the value of the pdf function evaluated at \(x=2.2\)? Unlike the case of the binomial, this value is NOT the probability of getting exactly 2.2. But, choosing a little interval, say \(dx=0.1\), the probability that a random outcome X lands in \([2.2,2.3]\) is approximately 0.1 times the pdf evaluated at 2.2.
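How good is the ‘thin slice’ approximation? The sketch below compares height times width with the exact probability obtained from the cdf (pnorm), first for \(dx=0.1\) and then for a much thinner slice. The thinner width 0.001 is my arbitrary choice.

```r
x <- 2.2

# thin slice with dx = 0.1: height * width versus exact area from the cdf
dx <- 0.1
thin_slice <- dnorm(x, mean = 0, sd = 1) * dx
exact      <- pnorm(x + dx) - pnorm(x)

# the same comparison with a much thinner slice
dx_small         <- 0.001
thin_slice_small <- dnorm(x, mean = 0, sd = 1) * dx_small
exact_small      <- pnorm(x + dx_small) - pnorm(x)

# relative errors: the thinner the slice, the better the approximation
rel_error       <- abs(thin_slice - exact) / exact
rel_error_small <- abs(thin_slice_small - exact_small) / exact_small
print(c(rel_error, rel_error_small))
```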
When we draw the curve of a distribution, for example the normal distribution, we plot the value of the pdf (y) as a function of x.
The analytical formula that gives the pdf of \(N(\mu,\sigma^2)\) at \(x\) is known to be:
\[h(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]
Therefore, you could apply this formula to calculate the pdf of any normal distribution at any value \(x\).
Try this here
# say you want to calculate the formula above for x=1.4
x=1.4
# and now do the calculation pdfx = ? (use the formula above)
# type your code
But you can also use a shortcut:
x=1.4
dnorm(x,mean=mu,sd=sigma)
## [1] 0.1074524
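If you want to check your own answer to the formula exercise, here is one possible way to write it, assuming the values mu = 1.78 and sigma = 0.15 defined earlier for the height example. Both routes should give the same number.

```r
mu    <- 1.78
sigma <- 0.15
x     <- 1.4

# pdf of N(mu, sigma^2) typed by hand from the analytical formula
pdfx <- 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))

# the same value from the built-in shortcut
shortcut <- dnorm(x, mean = mu, sd = sigma)

print(c(pdfx, shortcut))
```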
To understand what this means, can you apply the ‘thin slice’ interpretation of the pdf?
Hint: just cut a small rectangle of size \(dx=0.1\) near \(x\).
You can now plot this distribution, simply by calculating the value of the pdf at various values of x
x <- seq(from=-5,to=5,by=0.01)
y <- dnorm(x,mean=mu,sd=sigma)
plot(x,y,main="hey this is a plot of the normal distribution")
Now, all the basic distributions in R have similar formulas for their pdf and cdf functions. These formulas are evaluated ‘under the hood’ when you call the appropriate R functions (dbinom for the binomial, dpois for the Poisson, dt for the t distribution etc.)
So, for any of these distributions you can use one of two ways to calculate their pdf. Way one: go to the Wikipedia page on the distribution (e.g. type wikipedia binomial distribution in your preferred search engine), grab the exact analytical formula for the pdf in the right margin of the article, and type the formula in R to evaluate the pdf at any value of x that is admissible. The great thing about this is that you can explore many distributions, not only the basic (but most common) ones that I have listed here. Way two: call the corresponding R function directly, as we did with dnorm above.
Can thou calculate the pdf of this distribution yourself?
You just have to call the proper R distribution. How many parameters do you need to specify? What are these parameters called? What is the interpretation of these parameters? What are the defaults that R uses if you do not specify these parameters? How do you specify these parameters in the R function?
Choose the same parameters as in the section ‘how to draw sample’. Or experiment with your own choice of parameters.
Bonus question: could you retrieve the exact mathematical formula for this pdf function?
Practical example: What is the value of the pdf of the binomial distribution \(Binom(0.3,5)\) at \(x=2\)?
# type your code
VISUAL: can you plot this distribution?
Just mimic what we did for the normal. But exercise caution and think about this: what are the admissible values of the outcome? (This will limit the x values!)
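Once you have tried it yourself, here is one possible answer to compare with, for \(Binom(0.3,5)\). The only admissible outcomes are the integers 0 to 5, so the pdf is evaluated on those values only.

```r
N <- 5
p <- 0.3

# pdf (pmf) at x = 2
pdf_at_2 <- dbinom(2, size = N, prob = p)
print(pdf_at_2)

# admissible outcomes and their probabilities
x   <- 0:N
pmf <- dbinom(x, size = N, prob = p)
print(round(pmf, 4))
```

For the visual part, something like plot(x, pmf, type = "h") draws one vertical bar per admissible outcome.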
Can thou calculate the pdf of this distribution yourself?
You just have to call the proper R distribution. How many parameters do you need to specify? What are these parameters called? What is the interpretation of these parameters? What are the defaults that R uses if you do not specify these parameters? How do you specify these parameters in the R function?
Choose the same parameters as in the section ‘how to draw sample’. Or experiment with your own choice of parameters.
Bonus question: could you retrieve the exact mathematical formula for this pdf function?
Practical example: What is the value of the pdf of the distribution \(U(2.2,4.8)\) at \(x=3.1\)?
# type your code
VISUAL: can you plot this distribution?
Just mimic what we did for the normal. But exercise caution and think about this: what are the admissible values of the outcome? (This will limit the x values!)
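One possible answer for \(U(2.2,4.8)\), to compare with your own: inside the interval the pdf is a constant height \(1/(b-a)\), and outside the interval it is zero. The outside value 5.5 is my arbitrary choice.

```r
a <- 2.2
b <- 4.8

# pdf inside the interval: constant height 1 / (b - a)
pdf_inside <- dunif(3.1, min = a, max = b)

# pdf outside the interval: zero
pdf_outside <- dunif(5.5, min = a, max = b)

print(c(pdf_inside, 1 / (b - a), pdf_outside))
```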
Can thou calculate the pdf of this distribution yourself?
You just have to call the proper R distribution. How many parameters do you need to specify? What are these parameters called? What is the interpretation of these parameters? What are the defaults that R uses if you do not specify these parameters? How do you specify these parameters in the R function?
Choose the same parameters as in the section ‘how to draw sample’. Or experiment with your own choice of parameters.
Bonus question: could you retrieve the exact mathematical formula for this pdf function?
Practical example: What is the value of the pdf of the distribution \(Exp(4.8)\) at \(x=1.2\)?
# type your code
VISUAL: can you plot this distribution?
Just mimic what we did for the normal. But exercise caution and think about this: what are the admissible values of the outcome? (This will limit the x values!)
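One possible answer for \(Exp(4.8)\), to compare with your own. The pdf of the exponential is \(\lambda e^{-\lambda x}\) for non-negative \(x\), and zero for negative \(x\). The negative test value -1 is my arbitrary choice.

```r
rate <- 4.8
x    <- 1.2

builtin <- dexp(x, rate = rate)
by_hand <- rate * exp(-rate * x)   # lambda * exp(-lambda * x)

# negative values are not admissible: the pdf is zero there
below_support <- dexp(-1, rate = rate)

print(c(builtin, by_hand, below_support))
```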
Can thou calculate the pdf of this distribution yourself?
You just have to call the proper R distribution. How many parameters do you need to specify? What are these parameters called? What is the interpretation of these parameters? What are the defaults that R uses if you do not specify these parameters? How do you specify these parameters in the R function?
Choose the same parameters as in the section ‘how to draw sample’. Or experiment with your own choice of parameters.
Bonus question: could you retrieve the exact mathematical formula for this pdf function?
Practical example: What is the value of the pdf of the distribution \(\chi^2(3)\) at \(x=1.29\)?
# type your code
VISUAL: can you plot this distribution?
Just mimic what we did for the normal. But exercise caution and think about this: what are the admissible values of the outcome? (This will limit the x values!)
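One possible answer for \(\chi^2(3)\), to compare with your own. The by-hand line uses the standard chi-square pdf, \(x^{k/2-1} e^{-x/2} / (2^{k/2}\,\Gamma(k/2))\), where gamma() is R's gamma function.

```r
k <- 3
x <- 1.29

builtin <- dchisq(x, df = k)
by_hand <- x^(k / 2 - 1) * exp(-x / 2) / (2^(k / 2) * gamma(k / 2))

print(c(builtin, by_hand))
```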
Can thou calculate the pdf of this distribution yourself?
You just have to call the proper R distribution. How many parameters do you need to specify? What are these parameters called? What is the interpretation of these parameters? What are the defaults that R uses if you do not specify these parameters? How do you specify these parameters in the R function?
Choose the same parameters as in the section ‘how to draw sample’. Or experiment with your own choice of parameters.
Bonus question: could you retrieve the exact mathematical formula for this pdf function?
Practical example: What is the value of the pdf of the distribution \(F_{2,5}\) at \(x=0.2\)?
# type your code
VISUAL: can you plot this distribution?
Just mimic what we did for the normal. But exercise caution and think about this: what are the admissible values of the outcome? (This will limit the x values!)
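One possible answer for \(F_{2,5}\), to compare with your own. The by-hand line is a simplification of the general Fisher pdf that holds only when the numerator has 2 degrees of freedom, \((1 + 2x/k')^{-(k'+2)/2}\); for other values you need the full formula from the Wikipedia page.

```r
k1 <- 2
k2 <- 5
x  <- 0.2

# df() is the R name of the pdf of the Fisher distribution
builtin <- df(x, df1 = k1, df2 = k2)

# closed form valid only when df1 = 2
by_hand <- (1 + 2 * x / k2)^(-(k2 + 2) / 2)

print(c(builtin, by_hand))
```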
Can thou calculate the pdf of this distribution yourself?
You just have to call the proper R distribution. How many parameters do you need to specify? What are these parameters called? What is the interpretation of these parameters? What are the defaults that R uses if you do not specify these parameters? How do you specify these parameters in the R function?
Choose the same parameters as in the section ‘how to draw sample’. Or experiment with your own choice of parameters.
Bonus question: could you retrieve the exact mathematical formula for this pdf function?
Practical example: What is the value of the pdf of the distribution \(t_{2}\) at \(x=0.21\)?
# type your code
VISUAL: can you plot this distribution?
Just mimic what we did for the normal. But exercise caution and think about this: what are the admissible values of the outcome? (This will limit the x values!)
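One possible answer for \(t_2\), to compare with your own. The by-hand line is a simplification of the general t pdf that holds only for 2 degrees of freedom, \((2 + x^2)^{-3/2}\); for other values you need the full formula from the Wikipedia page.

```r
k <- 2
x <- 0.21

builtin <- dt(x, df = k)

# closed form valid only for 2 degrees of freedom
by_hand <- (2 + x^2)^(-3 / 2)

print(c(builtin, by_hand))
```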
And now try to repeat all the exercises of the (d) section, but with the letter p in place of the letter d (for example pbinom instead of dbinom).
We are calculating the CUMULATIVE PROBABILITY of x, which means the probability that the random variable produces ANY output SMALLER than (or equal to) x.
For example, the cdf at \(x=3\) for a binomial using 7 trials would be the probability of getting 0, 1, 2 or 3 successes.
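This can be checked by adding up the pdf values of the outcomes 0 to 3 and comparing the sum with pbinom. The text does not give a success probability for this example, so the value p = 0.5 below is purely my assumption.

```r
n_trials <- 7
p <- 0.5   # assumed value, not given in the example
x <- 3

# cdf from the built-in function
cdf_builtin <- pbinom(x, size = n_trials, prob = p)

# cdf as the sum of the probabilities of 0, 1, 2 and 3 successes
cdf_by_sum <- sum(dbinom(0:x, size = n_trials, prob = p))

print(c(cdf_builtin, cdf_by_sum))
```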
What is the value of the cdf of the uniform distribution \(U(1,5)\) at \(x=4.2\)? What is its meaning? (Draw something on a sheet of paper.)
a=1
b=5
x=4.2
punif(x,min=a,max=b)
## [1] 0.8
Can thou do the other examples yourself?
What is the 70th percentile of \(U(1,5)\)?
a=1
b=5
q=0.7
qunif(q,min=a,max=b)
## [1] 3.8
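Keep in mind that the first argument of a q function must be a probability between 0 and 1, not an outcome value. The sketch below passes q = 0.7 to qunif and then feeds the resulting quantile back into punif, which should return the same probability (the variable names are mine).

```r
a <- 1
b <- 5
q <- 0.7

# quantile: we give the probability q and request the cutoff x_q
x_q <- qunif(q, min = a, max = b)

# cdf: we give the cutoff x_q and recover the probability q
q_back <- punif(x_q, min = a, max = b)

print(c(x_q, q_back))
```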
Can thou do the other examples yourself?
Can you please rate this document on a scale of 1 to 4?
1: not useful at all
2: not useful
3: useful
4: very useful
There might still be some errors or typos in this document.
If you have found some, it means that you are doing a good job.
Thanks for sending a copy of the sections of the document that contain a mistake.