# Probability and statistics

Educational level: this is a tertiary (university) resource.

This curriculum reflects a hybrid between the typical undergraduate and graduate programs in Statistics. It aspires to provide a strong foundation in **both** the applied and theoretical branches of Statistics. Generally an "undergraduate statistics program" is functionally a math major with an emphasis in some statistical topics. (Rarely will an undergraduate student have the desire or foresight to focus on the field of Statistics quite this much.)

That's okay! This curriculum also allows for a "statistics minor." That might be a statistics emphasis within a math degree, *or* it might suit someone in the applied sciences (physics, chemistry, biology, geology, or even psychology) who wants a strong foundation in experimental design to supplement a research-oriented career. In these cases the student would want to tailor her curriculum with classes up through the fourth semester.

If the student wishes, however, to pursue a professional career in Statistics, or is considering graduate school, the fifth semester and beyond will provide excellent preparation. Anyone who actually mastered this entire curriculum would be on par with any modern graduate student. A full-fledged thesis is expected, and the student will be expected to prepare *well in advance*, starting in the sixth semester, so the thesis does not fall under that *hurried, last-minute* curse. Additionally, the student will be expected to write a shorter summary paper for submission to two academic journals.

## Semester 0: Preparations and Review

*Semester 0* is conceptually a collection of preparatory pre-classes intended to give the student a good working foundation before he or she starts the study of Statistics in earnest. These would probably qualify as "half-credit" classes at most. Some students already possessing a strong background in mathematics may opt to test out of these classes and proceed directly to the first semester.

- Fundamentals of Probability, Statistics, Experiments and Data: This class discusses the history and nature of the question "What are Probability and Statistics?" The nature of data (types of data: discrete versus continuous, categorical, etc.) is discussed, as well as problems in data gathering, such as unintended outside forces getting confounded with the data.
- Combinatorics: This class doesn't require any "fancy high-level math" but it *is* computationally intensive. Lots of problems will exercise the student's brain, developing that muscle that is expected to understand and visualize issues in probability in later semesters. "Pump it up and feel the burn!"
- Pre-Statistics Topics in Math, Introduction to Computing with R: This is both a review of all the math that will be expected in the first semester (don't worry, it's pre-calculus at most) and an introduction to using R, the open-source statistical package and programming language that is used in most academic institutions at the graduate level!
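
As a flavor of the counting problems a combinatorics class drills, here is a minimal sketch. It is written in Python purely for illustration (the curriculum itself uses R), and the function name `shared_birthday_probability` is hypothetical, not part of any course material. It assumes 365 equally likely birthdays.

```python
from math import comb, perm

def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    # Count the ordered ways to give n people distinct birthdays,
    # then divide by the number of all possible birthday assignments.
    all_distinct = perm(days, n) / days ** n
    return 1.0 - all_distinct

# Number of distinct 5-card hands from a 52-card deck.
print(comb(52, 5))  # 2598960

# The classic result: 23 people are enough for a better-than-even chance.
print(round(shared_birthday_probability(23), 4))  # 0.5073
```

Both answers come from pure counting, with no calculus required, which is the point of the class.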

## Semester 1: Introducing Probability

The first semester provides an introduction to Probability in its true mathematical context. *Introduction to Probability* attempts to lay strong foundations while avoiding too much mathematical rigor. (The *Probability 2* and *Statistical Analysis 2* classes will pick up with material that assumes a strong understanding of Calculus.)

- Introduction to Probability: This class defines the concepts of probability spaces, random variables, distributions, and density and distribution functions. Independent and IID (independent and identically distributed) random variables are also discussed. It ends with the introduction of expected values, presented in a way that is accessible to a student who has not yet had a course in calculus.
- Research Methods and Library Science: These days, many students don't know how to use a library. The student will be introduced to some of the primary statistical journals and will learn how to conduct research, including modern "interlibrary loan" systems and other means of accessing materials when there is no major research library nearby! This class is really intended for the serious student who plans to complete the full major program, and it attempts to prepare her for the research that will be necessary for thesis work. Fact: you can't just look it all up on the Internet!

## Semester 2: Introducing Statistics and Linear Models

The second semester continues at the same pace as the previous semester, with an attempt to establish some foundations and familiarity with statistical and mathematical notation, while avoiding the rigor that requires calculus. (That will come soon enough!) The second half of *Regression and Linear Models* will attempt to reinforce and restate some of the theory from *Introduction to Statistical Analysis* in the context of linear algebra.

- Introduction to Statistical Analysis: This class reviews some fundamental summary statistics and then begins to relate sample statistics to their parallel components in probability (sample mean to expected value, sample variance to variance, etc.). Concepts of confidence intervals and hypothesis testing are introduced with simple examples of each.
- Regression and Linear Models: Theory of regression is presented in the context of Linear Models with a strong emphasis on examples. Optional modules that give a solid proficiency in the R computer package are encouraged. Topics from the introductory Probability and Statistics courses will be reiterated in the context of Linear Models.
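
The link between a sample statistic and its probability counterpart, and the idea of a confidence interval, can be sketched as follows. This is an illustration only, written in Python rather than the R used in the course. The helper name `mean_confidence_interval`, the fair-die example, and the normal-approximation multiplier `z = 1.96` (roughly a 95% interval) are all assumptions made for this sketch.

```python
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return the sample mean and an approximate confidence interval for it.

    Uses the normal approximation: mean +/- z * (sample std dev / sqrt(n)).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5
    return mean, (mean - z * std_err, mean + z * std_err)

# Simulate rolls of a fair die, whose expected value is 3.5.
rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(10_000)]

mean, (low, high) = mean_confidence_interval(rolls)
print(f"sample mean = {mean:.3f}, approx. 95% CI = ({low:.3f}, {high:.3f})")
```

With many rolls the sample mean settles near the expected value of 3.5, and the interval narrows as the sample grows.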

## Semester 3: Completing the Basics

These courses complement each other, each focusing on a different aspect of the field of Statistics. Whereas *Statistics for Experimenters* is very "applied" and data-oriented for the experimental or "laboratory" statistician, the time series course will appeal to the pure mathematician and theorist. These classes can be taken concurrently.

- Statistics for Experimenters: This is mostly a review course, discussing all previous material in the context of real-world scientific research. Additional topics will include best practices for using statistics in "thesis work" in any of the other disciplines, effective writing and presentation in papers and journals, application of the R package, and best practices in data gathering. A preview of Sampling and Design of Experiments will hopefully encourage serious researchers to also take those Semester 4 classes.
- Stochastic Processes and Intro to Time Series: Not for the faint of heart. A Statistics major will now be expected to understand material presented with heavy mathematical notation. Topics in probability are extended into the time dimension. Concepts of stationarity, ARMA and ARIMA are introduced. The semester ends with an introduction to the Kalman filter.
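
To make the idea of stationarity concrete, here is a minimal sketch of the simplest autoregressive model, AR(1), written in Python for illustration only. The function name `simulate_ar1` and the parameter values are assumptions of this sketch. For a coefficient with absolute value below 1 and unit-variance noise, the process is stationary with variance 1 / (1 - phi^2), which the simulation should approximately reproduce.

```python
import random
import statistics

def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal noise e_t."""
    rng = random.Random(seed)
    x = 0.0  # starting value; its influence dies out quickly when |phi| < 1
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series

phi = 0.5
series = simulate_ar1(phi, n=20_000)

# For a stationary AR(1) with unit noise variance, Var(x_t) = 1 / (1 - phi^2).
theoretical_variance = 1.0 / (1.0 - phi ** 2)
print(f"sample variance      = {statistics.variance(series):.3f}")
print(f"theoretical variance = {theoretical_variance:.3f}")
```

Setting `phi` to 1 or more breaks stationarity: the variance no longer settles and the series wanders without bound.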

## Semester 4: Applied Statistics Topics for Researchers

This semester is very heavy in the "applied statistics" arena. A student pursuing a *minor* in Statistics would conclude with one of these classes as his or her final class. (One of these classes will be a more natural fit depending on what the specific major program is.)

- Design of Experiments: This class uses the foundations built in *Statistics 103: Regression and Linear Models* to explore experimental design. This is an example-heavy class with a large amount of work done in R with pre-designed data sets.
- Sampling: All main sampling designs are explored, from Simple Random Sampling to Stratified and Importance Sampling.
- Categorical Methods and GLIM: This class is most useful to students in the social sciences, psychology and pre-med. Contingency tables, Logistic Regression and GLIM will be introduced. This class is also very computer-intensive, with sample analyses being performed using R.

## Semester 5: Advanced Probability

These classes are not considered for a *minor* study in Statistics. *Probability 2* is a required course for a Statistics major and approaches the level of a first-semester graduate course. From this point all students are expected to have a solid grasp of Calculus.

- Probability 2: "The rest of probability." This class looks at cumulative distribution functions, moment generating functions, calculation of expectation, conditional probability and random vectors.
- Computational Statistics: Computer modeling, Monte Carlo design, kernel density estimation, and other advanced topics.
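
As a taste of the computational topics named above, the following sketch draws a Monte Carlo sample from a standard normal distribution and then recovers its density with a Gaussian kernel density estimate. It is an illustration in Python, not course material. The function name `gaussian_kde` is hypothetical, and the default bandwidth uses the common normal-reference rule of thumb (1.06 times the sample standard deviation times n to the power -1/5), which is one choice among many.

```python
import math
import random
import statistics

def gaussian_kde(sample, bandwidth=None):
    """Return a function estimating the density of `sample` with Gaussian kernels."""
    n = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption of this sketch).
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample) / norm

    return density

# Monte Carlo draws from a standard normal distribution.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(2_000)]

density = gaussian_kde(sample)
true_density_at_zero = 1 / math.sqrt(2 * math.pi)
print(f"estimated density at 0 = {density(0.0):.3f}")
print(f"true density at 0      = {true_density_at_zero:.3f}")
```

The estimate is slightly smoothed relative to the true density; shrinking the bandwidth reduces that bias at the cost of a noisier curve.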

## Semester 6: Advanced Statistics and Thesis Preparation

More of the same: *Statistical Analysis 2* has plenty of mathematical rigor. The *Advanced Topics* classes are considered purely elective. The student at this point is expected to prepare for a final project, consulting with an advisor and fully documenting the experimental or sampling design.

- Statistical Analysis 2: If this class doesn't kill you, nothing will. All the theory leading up to this point will be brought to bear on topics of minimal sufficient statistics, maximum likelihood, confidence intervals and hypothesis testing.
- Advanced Topics: Clinical Trials. An introduction to the application of statistics to clinical trials in the pharmaceutical industry. Includes discussions of Phase I-IV clinical trials. Some independent reading of literature (i.e., statistical journals) will be expected.
- Advanced Topics: Consulting Lab. This class is geared to prepare a statistical consultant to help outside experimenters seeking help. Case studies will examine common problems and misconceptions, communication issues, and the differences between being brought in early (for planning and design), being brought in late (a.k.a. "help bring meaning to my worthless data"), and everything in between.
- Thesis Design and Preparation: An idea, a design, and an acceptable level of background research must be presented to the advisor. Bibliographies of all "background research" materials must be prepared in advance. A detailed calendar must be presented and checked off by the advisor.

## Semester 7: Advanced Time Series and Thesis Work

*Time Series 2* returns to time-oriented data, applying all the Probability and Statistics acquired so far. This semester is generally where the student is supposed to perform the thesis work that was planned out in the previous semester, including any mid-experiment analysis.

- Time Series 2: More math analysis and proofs of theorems. Applied examples and outside readings will show how the theory can be applied and developed for various "real world" problems.
- Advanced Topics: Bayesian Statistics. Examination of Bayesian theory and its controversies. The class will discuss instances where a Bayesian approach has the most advantages.
- Advanced Topics: Spatial Analysis. Theories and testing for regularity and clustering, etc.
- Thesis Work: The outline, the introduction and background material, and most of the bibliography must be completed before the end of the semester. A draft of the journal paper must be in a similar state of preparedness.

## Semester 8: Multivariate and Thesis Defense

The student will defend a thesis during this semester, based on the (guided) planning and execution over the previous two semesters.

- Multivariate Statistics: Everything we've done in *Statistical Analysis 2* is taken to n dimensions, with an extension from random variables to random vectors. Techniques for analyzing eigenvalues and eigenvectors, transforming into orthogonal axes, etc.
- Mathematical Statistical Theory: A purely optional class extending "Probability and Statistics 2" to the next step. The proofs of advanced concepts like the Weak and Strong Laws of Large Numbers are approached, and Measure Theory is introduced. This class can really only aspire to set the foundation for a real class in this arguably doctoral-level material.
- Thesis Defense: The student will defend the thesis *by mid-semester*, perform revisions, and submit a condensed review of the work to at least two industry journals. (They just have to be submitted, not accepted.)