Statistics 2450, Winter 2020

  1. Outline
  2. Evaluation
  3. Learning center hours
  4. Calendar
  5. Lectures
  6. Assignments
  7. Midterm Exam
  8. Final Exam
  9. Recommended books
  10. Supplementary Material
  11. Waiting list for registration

Current announcements

Outline


  • Office Hours: Monday, 1:50-2:15, 4:00-5:00 | Tuesday, 11:15-12:15 | Wednesday, 1:50-2:15
  • Evaluation

    8 assignments + midterm + final. See the course outline for details. Due to COVID-19, the Final Exam is cancelled. The final grade will be based on the 8 assignments and the midterm. I have sent a revised syllabus with all relevant details to the class.

    The graders for this course are:

    Please note that Mengyao is also in charge of helping STAT 2450 students at the Learning Centre (see that section).

    Learning center hours

    Math/Stat Learning Centre hours for STAT 2450
  • Location: Chase 119
  • First day: Tuesday, Jan. 6, 2020
  • Last day: Friday, April 24, 2020
  • LC Hours for STAT 2450:

    Due to the COVID-19 situation, the hours below no longer apply.

  • Mondays: 2-5pm
  • Tuesdays: 3-4pm (Mengyao Wang, email: mn427134@dal.ca)
  • Wednesdays: 2-5pm
  • Thursdays: 3-5pm (Mengyao Wang, email: mn427134@dal.ca)
  • Fridays: 12-5pm
  • Closed on holidays: Feb. 7 (Munro Day), Feb. 17 (Nova Scotia Heritage Day), Apr. 10 (Good Friday)
  • No classes during the study break (Feb. 18-21); Monday, Feb. 17 is off for Heritage Day


  • Important dates for 2020
  • (Dal URL)
  • Classes Begin January 6, 2020
  • Munro Day (university closed) February 7, 2020 (Friday; not a STAT 2450 class day)
  • Nova Scotia Heritage Day (university closed) February 17, 2020 (Monday, first day of study break week)
  • Study Break (no classes – university remains open) February 18 – 21, 2020
  • Classes End April 3, 2020 (Friday)
  • “In Lieu” Class Day #1 (follows a Friday class schedule) April 6, 2020
  • Good Friday (university closed) April 10, 2020
  • Exam Period April 8 – 24, 2020
  • Spring Convocation (TBC)

    Course Calendar

    There will be 24 sessions in total (two lectures per week for 12 weeks).

    Lecture Notes (under construction)

    Preliminary lecture (self-learning)
    Notes on installing R, RStudio, and R Markdown, with a quick introductory tutorial on R:
    (html)
    RMarkdown code used to produce this html file:
    (Rmd)

    Lecture #1: Programming with R - part 1
    pdf /Rmd
    • R as calculator
    • Data structures
    • IO
    • Get help/doc
    • Workspace
    • working directory (wd)
    • comments
    • packages
    Lecture #2: Programming with R - part 2
    pdf /Rmd
    • Flow controls
    • Functions
    • R for statistics
    Lecture #3: Programming with R - part 3
    pdf /Rmd
    • User-defined functions
    • General syntax
    • call a function within a function
    • argument
    • recursion
    Lecture #4: Programming with R - part 4
    pdf /Rmd
    • vectorized arithmetic
    • pdf, auc, percentiles, random sampling
    • functions of matrices
    • binom, pois, exp, unif, ...
    • d, p, q, r
    Lecture #5: Simulation study of the coverage of the t-interval
    html Rmd
    Details: html Rmd
    • empirical coverage
    • simulation
    • t-interval
    • plot
    Lecture #6: Bootstrap sampling
    html/Rmd
    • bootstrap sample
    • bootstrap distribution
    • bootstrap CI
    • Assess true coverage of CI
    Lecture #7: Polynomial regression
    html/Rmd
    • linear regression
    • adding higher-order terms
    • bootstrap standard error of slope
    Lecture #8: Section 8.1 Fitting a regression tree
    html/Rmd
    • Regression trees
    • Baseball players salary
    • cutpoints
    • bagging
    • random forests
    Lecture #9: Fitting a classification tree
    html/Rmd
    • Classification trees
    • Predict AHD
    • Pruning
    • Assess misclassification rate
    Lecture #10: Practice: R programming-1
    html/Rmd
    • matrices
    • loops
    • functions
    • statistical functions
    • confidence intervals
    Lecture #11: Practice: R programming-2
    html/Rmd
    • matrices
    • loops
    • functions
    Lecture #12: Practice: bootstrapping
    html/Rmd
    • Review of bootstrap
    • bootstrap sample
    • bootstrap distribution
    • bootstrap CI
    • Assess true coverage of CI
    Lecture #13: Classification tree (2)
    Rmd
    • Classification trees
    • Predict AHD
    • Pruning
    • Assess misclassification rate
    Lecture #14: Combinatorics of the bootstrap
    Rmd
    • Exchangeable statistics
    • Counting unique bootstrap patterns
    • Average fraction missing
    • Stirling's formula
    Lecture #15: Cross-validation
    html/Rmd
    • Classification trees
    • Cross-validation
    • Pruning
    • Assess misclassification rate
    Lecture #16: Polynomial regression and bootstrapping
    html/Rmd
    • Bootstrapping with boot
    • Prepare for assignment 6
    • Polynomial regression and variants
    • Review real data examples
    Lecture #17:
    html/Rmd
    • Bagging
    • Random Forest
    • Regression
    • MSE
    Lecture #18:
    html/Rmd
    • Bagging
    • Random Forest
    • Out-of-bag
    • Titanic
    Lecture #19:
    html/Rmd
    Lecture #20: Cancelled
    html/Rmd
    • Chat on Brightspace
    Lecture #21:
    html/Rmd
    Lecture #22: Preparation for Assignment 8
    Wednesday: html
    • Properties of the covariance
    • Review of validation procedures
    • Chat on Brightspace
    • Boosting (optional but interesting supplementary material)
    Lecture #23: Preparation for Assignment 8
    Monday: html
    • Tests of hypotheses and p-values
    • Plotting details
    • CART and random forests
    • Chat on Brightspace
    Lecture #24: Preparation for Assignment 8
    Wednesday: html
    • Final revisions
    • Chat on Brightspace

    Assignments

    Assignment # 1, due January 20th
    Rmd
    • R calculator
    • IO with scan
    • shaping arrays
    • read csv
    • boxplot
    Assignment # 2, due January 27th
    Rmd help /sol
    • read csv
    • edit data
    • binning
    • scatterplot
    • manipulate booleans
    • create dataframe
    • boxplot
    Assignment # 3, due February 3rd
    Rmd
    • For loop
    • If then
    • Modify code
    • Random matrix
    Assignment # 4, due February 10th
    Rmd
    • write functions
    • obeying goals
    • specify arguments
    • specify returned values
    • t-Confidence interval
    Assignment # 5, due February 24th
    Rmd (ONLINE)
    • generate random observations
    • CI for mean
    • bootstrap sampling
    • bootstrap medians
    • recursive function
    Assignment # 6, due March 9th
    Rmd
    • bootstrap and linear regression
    • bootstrap of slope estimate
    • histogram
    Assignment # 7, due March 23rd
    Rmd
    • fit tree to training data
    • cross-validation
    • pruning
    • testing error
    Assignment # 8, due April 6th
    Rmd
    • 6 problems (total 50 points)

    Midterm Exam


  • Midterm is February 10th, 6:00-8:00 PM
  • Location: D420 (auditorium 1), McCain Building, University Avenue

    Final Exam


  • Cancelled due to COVID-19-related measures.

    Recommended book

  • An Introduction to Statistical Learning by Gareth James et al.

    Supplementary material

    Waiting list for registration

    The number of students that can attend the course is limited by room capacity (120).

    I try my best to accommodate all students who want to take this class. Please note that I made a first set of changes on Jan. 7, 2020 to admit students who were then on the waiting list.

    I have made further changes today (Jan. 14) to allow more students in.
    Please note that you must register yourself once I have allowed you into the course.
