Introduction to Statistical Modeling
2022-01-18
Preface
This online text has been designed for STA 363 - Introduction to Statistical Modeling at Miami University. It began as a set of notes written by Mr. Mike Hughes and has since been reformulated into this bookdown version. The original version of this document was not intended for publication but, as is common, has evolved into a textbook of sorts. Future iterations will improve some coding examples and provide more references.
The bulletin description for the course states:
Applications of statistics using regression and design of experiments techniques. Regression topics include simple linear regression, correlation, multiple regression and selection of the best model. Design topics include the completely randomized design, multiple comparisons, blocking and factorials.
The book and course have been designed to be a follow-up to a standard introductory statistics course (in many ways, this course can be considered “Intro Stat 2”). The course and text assume the reader has a solid foundation in two-sample inference and some basic computing skills.
The book mixes statistical background with applications using the R Project for Statistical Computing (R Core Team, 2019).
We have attempted to perform all data processing and analysis in this text using the tidyverse; thus all plots and data manipulation should follow the tidyverse grammar in R (ggplot2 plots and code chained with the %>% pipe).
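As a rough illustration of the coding style referred to above, the following minimal sketch chains a grouped summary with %>% and builds a scatterplot with ggplot2. It assumes the dplyr and ggplot2 packages are installed and uses R's built-in mtcars data set purely as a stand-in; neither the data set nor the object names (mpg_by_cyl, p) come from this text.

```r
# Illustrative sketch only: mtcars and the object names are assumptions,
# chosen because mtcars ships with every R installation.
library(dplyr)
library(ggplot2)

# Pipe-style data processing: the result of each step feeds the next.
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%                      # one group per cylinder count
  summarize(mean_mpg = mean(mpg),        # average fuel economy per group
            n = n())                     # number of cars per group

print(mpg_by_cyl)

# Layered plotting: start from data and aesthetics, then add layers with +.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```

Note that dplyr steps are chained with %>% while ggplot2 layers are combined with +; mixing up the two operators is a common early mistake.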
The book was designed with 13 chapters, intended for use during the 14-week semester at Miami University (one week is reserved for the midterm examination and end-of-semester review).
The first chapter is comprehensive.
It reviews introductory statistics material and provides a crash course in the basics of R and making plots with ggplot2 (Wickham, 2016).
Chapter 2 extends the idea of two-sample inference into multiple samples and introduces key concepts of experimental design.
Chapters 3 and 4 cover some advanced design ideas (multiple factors, blocking and repeated measures), presenting all inference in the form of a linear model.
Chapters 5–10 cover the classic topics around multiple linear regression as well as some data science concepts (prediction, variable selection and model validation).
Chapters 11 and 12 provide an overview of statistical odds and logistic regression.
Chapter 13 provides an overview of Generalized Linear Modeling with examples of both Poisson and Negative Binomial regression.