KnitR/dice.Rmd

---
title: "Descriptive Statistics of 10000 dice rolls - a simple KnitR example"
author: "Martin Papke"
date: "22 August 2018"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(readr)
library(dplyr)
library(ggplot2)
```

# A simple KnitR example

## Data import

In this document we show how KnitR can be used to generate a report or an article 
containing statistical data, and how R code can be integrated into the document.
As example data, we use 10000 dice rolls contained in the file *dice.csv*. As usual in R, 
we could load the data with 
```{r loaddata}
  # data <- read.csv('dice.csv', stringsAsFactors=FALSE)
  # dice <- as.numeric(data$X3)  
```
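
The loading step can be tried without the original file. The following is a minimal sketch, assuming that *dice.csv* holds one roll per line and has no header row; the temporary file, the column name `roll` and the four sample values are purely illustrative and not taken from the real data.

```{r loaddata-sketch}
# Hypothetical stand-in for dice.csv: one roll per line, no header row (assumption).
tmpfile <- tempfile(fileext = ".csv")
writeLines(c("3", "5", "1", "6"), tmpfile)

# header = FALSE keeps the first roll as data instead of using it as a column name.
rolls_df <- read.csv(tmpfile, header = FALSE, col.names = "roll")
rolls_from_file <- as.numeric(rolls_df$roll)
rolls_from_file

# Remove the temporary file again.
unlink(tmpfile)
```

If the real file does have a header row, `header = TRUE` (the default of `read.csv`) would be the appropriate choice.
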
To keep this example standalone, we instead generate the rolls with R's random number generator: 
```{r simulate}
  dice <- sample(1:6, 10000, replace=TRUE)
```
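
Because the rolls are random, every run of the document produces different numbers. If a reproducible report is wanted, one option is to fix the seed of the random number generator first. A minimal sketch; the seed value 2018 and the variable names are arbitrary choices for illustration.

```{r seed-sketch}
# The seed value is an arbitrary illustrative choice.
set.seed(2018)
first_draw <- sample(1:6, 10, replace = TRUE)

# Resetting the same seed reproduces exactly the same draws.
set.seed(2018)
second_draw <- sample(1:6, 10, replace = TRUE)

identical(first_draw, second_draw)
```
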

## Statistics

Now we can do some statistics 
```{r statistics}
  dicemean <- mean(dice)
  dicemedian <- median(dice)
```
So, the mean of our dice throws is $\bar x = `r dicemean`$ and the median is `r dicemedian`. We 
now count the absolute frequencies of the dice results: 
```{r statistics2}
  dicetable <- table(dice)
```
We obtain the following results: 
```{r table1, echo=FALSE}
  kable(dicetable, caption='Dice results')
```
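
Absolute frequencies are easier to compare with the expected value of $1/6 \approx 0.167$ per face once they are converted to relative frequencies. A minimal sketch of that step, which draws its own sample so that it runs independently of the chunks above; the names prefixed with `sketch_` are illustrative.

```{r relfreq-sketch}
# Self-contained sample, independent of the variables used elsewhere.
sketch_rolls <- sample(1:6, 10000, replace = TRUE)

# factor() with fixed levels guarantees one table entry per face.
sketch_table <- table(factor(sketch_rolls, levels = 1:6))

# prop.table() turns absolute counts into relative frequencies.
sketch_rel <- prop.table(sketch_table)
round(sketch_rel, 3)

# Theoretical mean of a single fair die, for comparison with the sample mean.
sum(1:6) / 6
```
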

## Plots
In KnitR, plots are embedded directly in the document; just call the usual R plotting command: 
```{r plot}
  xy <- data.frame(dicetable)
  ggplot(data=xy, aes(x=dice, y=Freq)) + geom_bar(stat="identity")
```
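
The size and caption of a figure can be set through chunk options. The sketch below assumes the knitr options `fig.width`, `fig.height` and `fig.cap`, and uses `geom_col()` as a shorthand for `geom_bar(stat="identity")`; the data frame `plot_df`, the sample size and the caption text are illustrative.

```{r plot-options-sketch, fig.width=5, fig.height=3, fig.cap="Simulated dice rolls (illustrative)"}
library(ggplot2)

# Illustrative data, generated here so that the chunk runs on its own.
plot_rolls <- factor(sample(1:6, 600, replace = TRUE), levels = 1:6)
plot_df <- data.frame(table(dice = plot_rolls))

# geom_col() draws bars whose heights are taken from the y aesthetic.
ggplot(data = plot_df, aes(x = dice, y = Freq)) + geom_col()
```
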

## Some data manipulation
We now combine each pair of consecutive throws into one sum, which gives 5000 samples of the sum of two dice. 
```{r combine}
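  # Odd-indexed throws plus even-indexed throws; R indices start at 1, an index of 0 is silently dropped
  dicetwo  <- dice[seq.int(1,10000,2)] + dice[seq.int(2,10000,2)]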
  twotable <- table(dicetwo)
```
Since the first two throws were $(`r dice[1]`,`r dice[2]`)$, the first entry 
of *dicetwo* is $\texttt{dicetwo[1]} = `r dicetwo[1]`$.
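
The pairing step can be checked by hand on a short vector. A minimal sketch with six hypothetical throws, i.e. three pairs; it also shows a matrix-based alternative that is not used in this document but gives the same sums.

```{r pairing-sketch}
# Six hypothetical throws forming the pairs (3,5), (1,6) and (2,2).
throws <- c(3, 5, 1, 6, 2, 2)

# Index-based pairing: odd positions plus even positions.
odd  <- throws[seq.int(1, length(throws), by = 2)]
even <- throws[seq.int(2, length(throws), by = 2)]
pair_sums <- odd + even

# Alternative: fill a two-row matrix column by column, so each column is one pair.
pair_sums_matrix <- colSums(matrix(throws, nrow = 2))

pair_sums
```
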

Finally, we plot the frequencies of the sums: 
```{r plot2}
  xy <- data.frame(twotable)
  ggplot(data=xy, aes(x=dicetwo, y=Freq)) + geom_bar(stat="identity")
```
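
For comparison, the theoretical distribution of the sum of two fair dice can be computed by enumerating all 36 equally likely outcomes. A minimal sketch; the names `all_sums` and `theory_two` are illustrative.

```{r two-dice-theory}
# All 36 combinations of two fair dice, as a 6 x 6 matrix of sums.
all_sums <- outer(1:6, 1:6, "+")

# Each combination has probability 1/36, so counts divided by 36 give probabilities.
theory_two <- table(all_sums) / 36
round(theory_two, 3)
```

The probabilities rise from $1/36$ for a sum of 2 to $6/36$ for a sum of 7 and fall again to $1/36$ for a sum of 12, which is the shape the simulated frequencies should approximate.
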