KnitR

From Wikiversity

KnitR is a package for the statistics software R and is commonly used from RStudio, a graphical user interface for calling commands and scripts of the underlying R software (see Wikipedia:Knitr for details). If learners can see the R code in a learning document, they can carry out the statistical activities in the software on their own. Furthermore, for research publications in Wikiversity[1], readers can

  • reproduce the results,
  • learn from the methodology,
  • apply the R-code on their own data,
  • check whether the algorithms are appropriate for the experimental design.
[Figure: Knitr integration.png]

Workflow of KnitR

A KnitR source file is a standard document (e.g. in Markdown) with R code chunks integrated into it. The code chunks can be regarded as R scripts that

  • load data,
  • perform data processing, and
  • create output data (e.g. a descriptive analysis) or output graphics (e.g. a boxplot diagram).

Logical conditions implemented in R can provide text elements for the dynamic report that depend on the statistical analysis. The following text is an example of such a fragment:

   The Wilcoxon signed-rank test was applied as a statistical comparison of the averages of two dependent samples data1 and data2. 
   In this case the calculated P-value was 0.056 and hence greater than the significance level (0.05 by default).
   This implies that "H0: there is no difference between the    
   results in data1 and data2" cannot be rejected. 

Depending on the R results (here 0.056), the text fragments are determined by logical conditions in the R script. If the P-value had been 0.045, which is lower than the significance level (0.05 by default), another appropriate text fragment would be inserted in the dynamic report:

   The Wilcoxon signed-rank test was applied as a statistical comparison of the averages of two dependent samples data1 and data2. 
   In this case the calculated P-value was 0.045 and hence smaller than the significance level (0.05 by default).
   This implies that "H0: there is no difference between the    
   results in data1 and data2" is rejected. 

With this workflow, replacing the input data of the statistical or numerical analysis in R creates a reproducible report with the same methodology. The key to this basic example of dynamic text generation in KnitR is the use of text and numeric variables that are populated by R code which is part of the source document.
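The selection of a text fragment by a logical condition can be sketched in plain R as follows. This is only a minimal illustration under stated assumptions: the samples data1 and data2 are simulated here (they are not the data behind the P-values quoted above), and the variable names alpha, comparison, decision and report_text are hypothetical. The test itself uses wilcox.test() from base R with paired = TRUE.

```r
# Simulated paired samples (assumption: stand-ins for real measurements)
set.seed(42)
data1 <- rnorm(20, mean = 10, sd = 2)
data2 <- data1 + rnorm(20, mean = 0.5, sd = 1)

alpha <- 0.05  # significance level

# Wilcoxon signed-rank test for two dependent samples
result  <- wilcox.test(data1, data2, paired = TRUE)
p_value <- result$p.value

# Logical conditions choose the text fragments for the report
if (p_value < alpha) {
  comparison <- "smaller than"
  decision   <- "is rejected"
} else {
  comparison <- "not smaller than"
  decision   <- "cannot be rejected"
}

report_text <- sprintf(
  paste0("The calculated P-value was %.3f and hence %s the significance ",
         "level (%.2f). This implies that H0 %s."),
  p_value, comparison, alpha, decision
)
cat(report_text, "\n")
```

In a KnitR document the same variables would typically be computed in a chunk and then inserted into the running text with inline R code.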

Installation of KnitR

KnitR is distributed as a CRAN package. It can therefore be installed like any other CRAN package, either with your R package manager (for example the one that comes with your IDE) or simply from the R command line by typing

install.packages('knitr', dependencies=TRUE)

where dependencies=TRUE makes R automatically install the packages that are needed for the use of knitR. Note that package names are case-sensitive: the package is registered on CRAN as knitr.
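Before working through the examples, it can help to check which packages are already available. The sketch below only reports missing packages and does not install anything. It assumes that the rmarkdown package is wanted as well, because the examples on this page call rmarkdown::render(); the variable names needed, installed and missing_pkgs are illustrative.

```r
# Packages used by the examples on this page
needed <- c("knitr", "rmarkdown")

# requireNamespace() returns TRUE if a package can be loaded
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "),
          ". They can be installed with install.packages().")
} else {
  message("All required packages are available.")
}
```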

A "Hello World!" example

As our simplest KnitR document, we start with the classical "Hello world!". We will create a document that uses the features of knitR to print the value of an R variable into the text. It looks like the following:

---
title: "Hello world! - 1"
author: "Martin Papke"
date: "23 August 2018"
output: pdf_document
---
```{r definition, include=FALSE}
helloworld <- "Hello World!"
```
I always like to say `r helloworld`

We first specified the title and author of the document and the output we want to generate. Then we created an R variable containing the text "Hello World!", which we printed into our document. To test this document, copy it into a text file named 'hello.Rmd', start R and compile it with rmarkdown::render('hello.Rmd') from the R console. The knitR features used here are explained below.
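To see what knitR does with the chunk and the inline code without producing a PDF, the document body can also be knitted from a character string. The following is a sketch, assuming the knitr package is installed. It uses knitr::knit() with its text argument, leaves out the YAML header for brevity, and builds the three-backtick chunk delimiter in a helper variable fence (a hypothetical name) only to keep the listing readable.

```r
fence <- strrep("`", 3)  # the three-backtick chunk delimiter

# Body of the "Hello world!" document as a single string
rmd <- paste(
  paste0(fence, "{r definition, include=FALSE}"),
  'helloworld <- "Hello World!"',
  fence,
  "I always like to say `r helloworld`",
  sep = "\n"
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() evaluates the chunk and replaces the inline code by its value
  md <- knitr::knit(text = rmd, quiet = TRUE)
  cat(md, "\n")
} else {
  md <- NA_character_
  message("knitr is not installed; nothing was knitted.")
}
```

The resulting Markdown text no longer contains R code; converting it to PDF, Word or HTML is the step that rmarkdown::render() adds on top.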

Extension of the "Hello world!" example

As a next step, we use R's ability to generate random numbers to print "Hello World!" a random number of times. We extend our example as follows:

---
title: "Hello world! - 2"
author: "Martin Papke"
date: "23 August 2018"
output: pdf_document
---
```{r definition, include=FALSE}
helloworld <- "Hello World!"
number <- sample(1:10, 1) # generate a random number between 1 and 10
```
It's better to say "Hello" more often, so we will do it `r number` times:
```{r loop, echo=FALSE}
for (i in seq_len(number)){
  print(helloworld)  # values are not printed automatically inside a loop
}
```

Here we generated a random number and printed "Hello World!" that many times.
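One detail is worth keeping in mind for loops inside code chunks: R prints the value of an expression automatically only at top level, so a bare variable name inside a for loop produces no output. The following small base-R sketch makes the difference visible with capture.output(); the names silent and printed are illustrative.

```r
helloworld <- "Hello World!"

# A bare variable name inside a loop is evaluated but not printed
silent <- capture.output(for (i in 1:3) helloworld)

# An explicit print() call produces one output line per iteration
printed <- capture.output(for (i in 1:3) print(helloworld))

length(silent)   # number of output lines without print()
length(printed)  # number of output lines with print()
```

In addition, a chunk with include=FALSE is run but contributes nothing to the output document, so a chunk whose output should be visible must not use that option.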

Use of KnitR

We will explain the creation of a knitR document with a simple example document. We will look at each part of this simple document and explain the new features used here. A knitR document basically consists of two types of text:

  • Text, which can be formatted with the R Markdown markup language, quite similar to the MediaWiki markup used to format Wikiversity articles
  • R code snippets, which consist of R code that is executed when the document is rendered

The first part of our sample document looks like this

---
title: "Descriptive Statistics of 10000 dice rolls - a simple KnitR example"
author: "Martin Papke"
date: "22 August 2018"
output: pdf_document
---

At the start of every knitR document, we specify a title and an author. Moreover, we have to tell the interpreter in what form we want knitR to produce the output; here we create a PDF document. Other possible outputs include Word and HTML documents.

Next, we load the packages we need in an R code snippet. Code snippets are separated from the text parts by ```.

 
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(readr)
library(dplyr)
library(ggplot2)
```

At the beginning of a code snippet, we have to specify in curly brackets the language used (here: R) and a name for the code snippet. We can give extra options such as include=FALSE here, which prevents knitR from including this code snippet and its results in the output document (the code is still run). Note that by default code snippets are included.

# A simple KnitR example

## Data import
[...]

Headings are marked with # and ## for level 1 and level 2 headings.

## Statistics

Now we can do some statistics 
```{r statistics}
  dicemean <- mean(dice)
  dicemedian <- median(dice)
```
So, the mean of our dice throws is $\bar x = `r dicemean`$ and the median is `r dicemedian`. We 
now count the absolute frequencies of the dice results: 
```{r statistics2}
  dicetable <- table(dice)
```

What we see in this part is that LaTeX markup can be used in the text to present mathematical formulae. We also see how to include the result of an R command in the document, namely with `r command`. In this way, e.g. the results of calculations can become part of our document, as shown with the mean and the median above.
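The statistics chunks can be tried out in a plain R session. Because the data import is not shown in this excerpt, the sketch below simulates the 10000 dice rolls with sample(); this is an assumption made only for illustration and is not the data set of the original document. The variable sentence is a hypothetical stand-in for the text that the inline code would produce.

```r
# Simulated stand-in for the dice data (assumption: fair six-sided die)
set.seed(2018)
dice <- sample(1:6, size = 10000, replace = TRUE)

# The same calculations as in the chunks "statistics" and "statistics2"
dicemean   <- mean(dice)
dicemedian <- median(dice)
dicetable  <- table(dice)  # absolute frequencies of the results 1 to 6

# Text as it would appear after the inline code has been replaced
sentence <- sprintf("The mean of our dice throws is %.3f and the median is %g.",
                    dicemean, dicemedian)
cat(sentence, "\n")
print(dicetable)
```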

## Plots
In KnitR, plots can be placed directly into the document: just call the usual R plot command 
```{r plot}
  xy <- data.frame(dicetable)
  ggplot(data=xy, aes(x=dice, y=Freq)) + geom_bar(stat="identity")
```

Plots are simply put into the document, created by usual R code, as shown above.

A statistical example - checking for independence

We will reuse our dice data to check whether the even-numbered and the odd-numbered throws are independent. After loading the data as above, we use R's built-in $\chi^2$-test to check for independence by invoking

  # test the contingency table 
  chi <- chisq.test(tbl)

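Now we can check whether the deviation from independence is (highly) significant by looking at the $p$-value. The null hypothesis of the test is independence, so a small $p$-value is evidence against independence:

 
  p <- chi$p.value
  if (p < 0.01) {
    "highly significant deviation from independence"
  } else if (p < 0.05) {
    "significant deviation from independence"
  } else {
    "no significant deviation from independence"
  }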
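A complete version of this example can be sketched as follows. The construction of the contingency table tbl is not shown above, so the sketch makes two assumptions that are labeled in the comments: the dice data are simulated, and "odd" and "even" refer to the position of a throw in the sequence, so that each odd-numbered throw is paired with the throw that follows it.

```r
# Simulated stand-in for the dice data (assumption)
set.seed(1)
dice <- sample(1:6, size = 10000, replace = TRUE)

# Assumption: pair throw 1 with throw 2, throw 3 with throw 4, and so on
odd_throws  <- dice[seq(1, length(dice), by = 2)]
even_throws <- dice[seq(2, length(dice), by = 2)]

# 6 x 6 contingency table of the paired results
tbl <- table(odd_throws, even_throws)

# Chi-squared test; the null hypothesis is independence
chi <- chisq.test(tbl)
p <- chi$p.value

verdict <- if (p < 0.01) {
  "highly significant deviation from independence"
} else if (p < 0.05) {
  "significant deviation from independence"
} else {
  "no significant deviation from independence"
}
cat("p-value:", format(p, digits = 3), "-", verdict, "\n")
```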

Compilation of a knitR source file

A knitR file can be compiled into the specified output formats by running rmarkdown::render(<FILENAME>) from the R console.
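The output format given in the document header can also be overridden when rendering. The small wrapper below is a sketch, not part of knitR or rmarkdown: render_as is a hypothetical helper name, and it relies on the output_format argument of rmarkdown::render(). Rendering requires the rmarkdown package and Pandoc, and PDF output additionally requires a LaTeX installation.

```r
# Hypothetical helper: render an R Markdown file in one of three formats
render_as <- function(rmd_file,
                      format = c("pdf_document", "html_document", "word_document")) {
  format <- match.arg(format)  # rejects unknown format names early
  rmarkdown::render(rmd_file, output_format = format)
}

# Usage (not run here, because it needs an existing file):
# render_as("hello.Rmd", "html_document")
```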

Some external knitR tutorials

Wiki to Markdown Conversion with PanDoc

The open-source tool PanDoc is called the "Swiss army knife" of document conversion. Assume we have a KnitR document of a scientific paper that contains the KnitR code chunks for processing the data that was analysed.

  • Convert the Markdown document of the paper into a MediaWiki document with the PanDoc Online Converter:
    • Create a sample document with the knitr-package in RStudio and save the R-Markdown file with the extension Rmd to your harddrive.
    • Copy the content of your R-Markdown document to PanDoc-Online Converter,
    • select Markdown (pandoc) as input format,
    • select MediaWiki as output format,
    • press Convert-button and analyze the generated MediaWiki syntax of the text.
  • The R code chunks for the analysis of the data (e.g. loaded from a CSV file or a spreadsheet document) are converted into a <code>-environment.
  • This converted KnitR document is stored together with the scientific papers in the WikiJournal (e.g. WikiJournal of Medicine). If new data was sampled in the same way, applying the KnitR document to the new data performs the analysis in the same algorithmic way. This KnitR approach contributes to a workflow for Reproducible Science.
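The same conversion can also be done locally instead of with the online converter. The sketch below assumes that the pandoc command-line tool is installed and on the search path; it uses pandoc's markdown reader and mediawiki writer. The helper pandoc_args and the file names paper.md and paper.wiki are hypothetical.

```r
# Hypothetical helper: command-line arguments for a Markdown -> MediaWiki run
pandoc_args <- function(input, output) {
  c("--from=markdown", "--to=mediawiki", paste0("--output=", output), input)
}

args <- pandoc_args("paper.md", "paper.wiki")

# Only call pandoc if it is installed and the input file exists
if (nzchar(Sys.which("pandoc")) && file.exists("paper.md")) {
  system2("pandoc", args)
} else {
  message("Would run: pandoc ", paste(args, collapse = " "))
}
```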

Workflow for KnitR-Backend connected to Wikiversity/Wikipedia

The following workflow is currently not implemented in Wikiversity. With this wiki resource you learn about the workflow by analogy with KnitR, and about the future benefits of a KnitR-like implementation for the scientific publishing of dynamically generated learning resources in Wikiversity:

==My Section==
This is text in Wikiversity. Now we calculate the correlation of two vectors x and y dynamically. The correlation is 
<dyncalc language="R" src="https://r-backend.example.com">
   x <- rnorm(100)
   y <- 3*x + rnorm(100)
   cor(x, y) 
</dyncalc>
Now we create a scatter plot of the data 
<dyncalc language="R" src="https://r-backend.example.com">
   {r scatterplot, fig.width=8, fig.height=6}
   plot(x,y)
</dyncalc>
Now we create the text output depending on the analysis of the data.
<dyncalc language="R" src="https://r-backend.example.com">
  if (cor(x, y) > 0) {
    print("The correlation of x and y is a positive number.")
  }
</dyncalc>

After processing the Wikiversity document with the R backend specified in the src attribute, the result could be:

==My Section==
This is text in Wikiversity. Now we calculate the correlation of two vectors x and y dynamically. The correlation is 0.93612
Now we create a scatter plot of the data 
[[File:Scatterplot9923400384204.png]]
Now we create the text output depending on the analysis of the data.
The correlation of x and y is a positive number.

The proposed environment for dynamic calculations is placed in a tag environment, just like the math tag for mathematical expressions.

  • The content enclosed in the math-tag is rendered for the output.
  • The content enclosed in the dyncalc-tag is submitted to a backend for calculation or for creating a scatterplot.

The enclosed code is real R code that works in R or RStudio. A workflow for R/KnitR can be found in the R tutorial by K. Broman[2]. The R code creates two vectors with random numbers and calculates their correlation with cor(). By default, the R code should be processed for Wikiversity pages only after the code has been modified, in order to limit the server load on the R backend. Another option is that learners download the source via an API and create the KnitR output offline on their own mobile device. The Wikiversity community will decide whether this is an option to include in a learning environment for exploring the analysis of data and its interpretation.

  • The document language will be standard wiki markup, also known as wikitext or wikicode, which consists of the syntax and keywords used by the MediaWiki software to format a page (e.g. used in Wikipedia, Wikiversity, OLAT,...).
  • R code chunks will be recognized and interpreted by an R backend, or a reference to a versioned R script is inserted in the wiki document/article, and any such reference leads to an update of the wiki article whenever the data, the script or the document is updated. Diagrams, for example, are still PNG files in Wikimedia that are imported in the standard way most authors of the wiki community know. The difference between a standard wiki document and a wiki document with R code chunks is that any update of the data or of the script calls the R script again and creates a new version of the output (diagrams as PNG files, numbers, dynamic text elements). Basically the same concept is used for mathematical formulas in MediaWiki by TexVC[3] and by the Math Extension for MediaWiki[4]: the LaTeX sources are parsed and converted into images or MathML that can be displayed in a browser.
  • SageMath is another potential candidate for a backend that performs the numerical and statistical analysis "on the fly" in a learning resource. The benefits are considerable for learners and authors alike, because a learning task can be performed in SageMath by the learner, and diagrams and maps for recent events can be shown in the document without the need to update diagrams, figures and statistical results with the most recent data again and again. The range of software packages available within SageMath is huge, and R is one package in the SageMath package list.
  • PanDoc could be used to convert wiki code to Markdown for processing with the KnitR package.
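Since the content of the proposed tags is ordinary R code, the three calculations from the example section can be tried in a local R session today. The sketch below adds a fixed seed for repeatability and stores the result in variables (r, message_text) that are hypothetical names; the tag syntax and the backend themselves remain a proposal and are not used here.

```r
set.seed(123)            # fixed seed so that the run is repeatable
x <- rnorm(100)
y <- 3 * x + rnorm(100)  # y depends linearly on x plus random noise

r <- cor(x, y)           # correlation coefficient of x and y
cat("The correlation is", round(r, 5), "\n")

plot(x, y)               # scatter plot of the data

# Text output that depends on the analysis of the data
message_text <- if (r > 0) {
  "The correlation of x and y is a positive number."
} else {
  "The correlation of x and y is not a positive number."
}
cat(message_text, "\n")
```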

Learning Task

In the previous section the workflow of an integrated KnitR approach was described. Because this concept is not yet implemented as an extension in MediaWiki, the workflow cannot be performed directly with code chunks for mathematical calculations in the MediaWiki of Wikiversity. But it is possible to learn about the workflow in general:

See also

References

  1. WikiJournal of Medicine - An open access journal with no publication costs. ISSN: 2002-4436. www.WikiJMed.org. Frequency: continuous. Since: March 2014. Publisher: Wikimedia Foundation.
  2. Karl Broman, KnitR in a Nutshell - (accessed 2017/08/14) - http://kbroman.org/knitr_knutshell/pages/Rmarkdown.html
  3. Schubotz, M. (2013). Making math searchable in Wikipedia. arXiv preprint arXiv:1304.5475.
  4. Schubotz, M., & Wicke, G. (2014). Mathoid: Robust, scalable, fast and accessible math rendering for wikipedia. In Intelligent Computer Mathematics (pp. 224-235). Springer, Cham.
  5. Quantum Geographic Information System (QGIS) - Open Source Software Package for Linux, Windows, Mac (2017) - LTR 2.18.11 access 2017/08/14 - https://www.qgis.org/en/site/forusers/download.html

External links