Introduction to Statistical Modeling
2025-06-26
Preface
![A 'fun' plot: A scatterplot resulting in the shape of a Tyrannosaurus [@R-datasauRus] with shaded Voronoi cells using the `ggvoronoi` package [@R-ggvoronoi]](introStatModeling_files/figure-html/dino-1.png)
Figure 0.1: A ‘fun’ plot: A scatterplot resulting in the shape of a Tyrannosaurus (Locke et al., 2018) with shaded Voronoi cells using the `ggvoronoi` package (Garrett et al., 2019)
This online text was designed for STA 363 - Introduction to Statistical Modeling at Miami University and has been used as a resource in other courses at Miami University (e.g., STA 672 - Statistical Modeling and Study Design, a service course for graduate students in the life sciences). The original version of this document was not intended for broad publication but, as is common, has evolved into a textbook of sorts. What is now this text was originally a set of class notes written by Mr. Mike Hughes for use during class. Many of the data and coding examples have been updated and everything has been structured such that this work can largely stand alone.
The bulletin description for the STA 363 course states:
Applications of statistics using regression and design of experiments techniques. Regression topics include simple linear regression, correlation, multiple regression and selection of the best model. Design topics include the completely randomized design, multiple comparisons, blocking and factorials.
The book and course have been designed to be a follow-up to a standard introductory statistics course (in many ways, this course can be considered “Intro Stat 2”). The course and text assume the reader has a solid foundation in two-sample inference and some basic computing skills.
The book mixes statistical background with applications using the R Project for Statistical Computing (R Core Team, 2019).
We have attempted to perform all data processing and graphics in this text using the `tidyverse` (Wickham, 2017); thus all plots and data processing should follow the tidy grammar in R (`ggplot`, piping with `|>`, and `dplyr` function calls).
All statistical modeling uses base R function calls (e.g., `lm()` and `aov()`), and the `emmeans` package (Lenth, 2019) is used for follow-up comparisons.
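As a rough illustration of these conventions, the short sketch below runs a tidy summary, a `ggplot2` plot, a base R model fit and an `emmeans` follow-up on R's built-in `PlantGrowth` data. The dataset and the object names (`group_means`, `p`, `fit`) are our own choices for illustration and are not taken from a specific chapter; the native pipe `|>` assumes R 4.1 or later.

```r
library(dplyr)
library(ggplot2)
library(emmeans)

# Tidy data processing: per-group sample sizes and mean weights
# (PlantGrowth is a built-in dataset, chosen here only for illustration)
group_means <- PlantGrowth |>
  group_by(group) |>
  summarize(n = n(), mean_weight = mean(weight))
group_means

# Tidy graphics: side-by-side boxplots of weight by treatment group
p <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()
p

# Statistical modeling with a base R call: One-Way ANOVA
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Follow-up pairwise comparisons of the group means with emmeans
emmeans(fit, pairwise ~ group)
```

The same pattern (process and plot with the tidyverse, fit with a base R function, follow up with `emmeans`) would apply with `lm()` in place of `aov()`.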
The book has 13 chapters and is intended for use during the 14-week semester at Miami University, with one week reserved for the midterm examination and end-of-semester review.
The first two chapters are very comprehensive.
Chapter 1 reviews introductory statistics material and provides a crash course in the basics of R, some elementary data handling with `dplyr` (Wickham, François, et al., 2021) and making plots with `ggplot2` (Wickham, 2016). The development of the two-sample \(t\)-test is reviewed.
Chapter 2 discusses the difference between designed experiments and observational studies, and extends the idea of two-sample inference to multiple samples. The key concepts of experimental design and some mathematical foundation for statistical models are introduced, and One-Way ANOVA is fully covered, including follow-up multiple comparisons.
Chapters 3 and 4 cover some advanced design ideas (multiple factors, blocking and repeated measures), presenting all inference in the form of a linear model.
Chapters 5–10 cover the classic topics around multiple linear regression as well as some data science concepts (prediction, variable selection and model validation).
Chapters 11 and 12 provide an overview of statistical odds and logistic regression.
Chapter 13 provides an overview of Generalized Linear Modeling with examples of both Poisson and Negative Binomial regression.