Introduction to RMarkdown (R Notebook)


RStudio Project and R Markdown

R as a statistical computing environment packs a generous number of tools that allow us to reshape, clean and visualize our data through its built-in capabilities. In this lesson, we’ll take a look at many of these capabilities and learn how to incorporate them into our day-to-day data science work.

In the next lesson, we’ll shift our focus onto ggplot, a plotting system by Hadley Wickham, inspired by the Grammar of Graphics (Wilkinson, 2005). As you’ll soon learn, this plotting system is among the most popular visualization tools today because of its power, extensibility and simplicity.

To get started with plotting in R, let’s create a new Project in RStudio. If you haven’t downloaded the materials, go to the Course Materials tab and follow the instructions to download the files.

  1. Launch RStudio and create a new RStudio Project. I shall name it covidRT (covid real-time dashboard), but you’re free to name it anything you like.
    • Click on the R project icon to Create Project > New Directory > Project Type: New Project > Directory Name: covidRT (place it under Desktop or any location you wish) > hit Create Project
  2. Once the project is created, if you peek into the directory you’ll see that a file named covidRT.Rproj has been created for you.
  3. Create a new RMarkdown file
    • File > New File > R Markdown > Title: Learn plotting > OK. Save this file immediately and give it a name, like learnPlotting for example. If you peek at the Files tab (bottom-right pane, by default), you now have two files: covidRT.Rproj and learnPlotting.Rmd.
    • The first time you create an R Markdown document, RStudio will prompt for the installation of a package named rmarkdown. Make sure you’re connected to the internet and accept it. Alternatively, run the command install.packages("rmarkdown") in the Console in RStudio. You only need to do this once. Any R Markdown documents you create afterwards will use the package you have installed on your system.
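If you would rather not reinstall a package you already have, the install step can be wrapped in a small check. This is only a sketch: ensure_package is a hypothetical helper name of my own, not something RStudio or rmarkdown provides.

```r
# Hypothetical helper: install a package only when it is not already available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers a download
ok <- ensure_package("stats")
print(ok)

# For this lesson you would run: ensure_package("rmarkdown")
```

Running it a second time costs nothing, because the check finds the package and skips the download.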

But what’s the point of an RStudio Project anyway?

An RStudio Project creates a “context” that you can work within. This context remembers the working directory in which the project was initialized, its workspace, its history and any project-specific temporary files. This saves you time in the long run, especially if you use the same computer to work on multiple projects (each in its own directory and each with its own project-specific settings).

When you double click on covidRT.Rproj, a new R session is started, your working directory is automatically set to the project directory and other project-specific settings will be restored.
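A quick way to confirm that the project context is active is to ask R where it is working. The sketch below uses only base R functions; the folder name you see depends on where you created the project, so covidRT is an assumption carried over from the steps above.

```r
# Where is R working right now? Inside a project this is the project folder.
project_dir <- getwd()
print(project_dir)

# The last part of the path should be your project name (covidRT, if you followed along)
print(basename(project_dir))

# List any .Rproj files in the working directory
rproj_files <- list.files(project_dir, pattern = "\\.Rproj$")
print(rproj_files)
```

If the last line prints an empty result, RStudio was probably opened without the project, and double-clicking the .Rproj file should fix that.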

Whenever you wish to take a break from this project, simply exit RStudio. To resume working on the project, double click on covidRT.Rproj, which is located in the covidRT folder you created either on your Desktop or elsewhere.

What is the R Markdown document I just created?


From the developers of RStudio comes R Markdown, a format similar to Markdown (learn more about Markdown) but with the form and functionality of a code notebook such as a JupyterLab Notebook (learn more about JupyterLab Notebooks). If you’ve never worked with a notebook interface like Jupyter Notebooks, these are documents that allow you to “weave together narrative text and code”.

The output is elegantly formatted by R Markdown, and can be exported to PDF, HTML, PowerPoint, Microsoft Word, LaTeX, scientific article formats and many others via third-party extensions.

Using R Markdown: A Practical Tutorial

With learnPlotting.Rmd open, let’s look at how to use an R Markdown document in more detail.

When you first create an R Markdown document (e.g. File > New File > R Markdown), the document is populated with some filler content. Your document looks roughly like the following:

---
title: "Learn Plotting"
author: "Samuel"
date: "4/20/2020"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
summary(cars)
```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

You will observe that the R Markdown file contains three types of content:

  • Header information (YAML) surrounded by --- lines. In our document, this is the header that spans lines 1 to 6.
  • R code chunks surrounded by ``` lines. There are three chunks in our document, each containing one line of R code. A chunk can be run by hitting the chunk’s play button, which instructs RStudio to execute the code in that chunk and display the results in your notebook. Each chunk can have additional parameters, such as a name for that chunk and whether it should be included in the output when it is rendered later.
    • The first chunk is named setup, and the additional parameter include=FALSE tells R Markdown not to include this chunk in our output later, since it’s only used for configuration and is not part of our main thesis.
    • The second chunk is named cars, and the third chunk is named pressure. These names are arbitrary; you can name a chunk anything you want as long as it provides some identification. At the bottom of your code editor pane, there is a simple table of contents (TOC) that you can use to navigate to the different chunks, which can be helpful as your document starts to grow.
  • Narrative text with simple Markdown formatting. This is the text outside of your header and code chunks. It is not meant to be executed as code; it is descriptive text that helps narrate your work as you develop your research.

Now render the document:

  • At the top of your editor pane there’s the Knit button. Select Knit to HTML.
  • An HTML file is now created and you will see a preview. Notice how the output is nicely formatted. The first chunk, since it contains include=FALSE, is nowhere to be seen in this HTML. It is simply not included.
  • The third chunk contains echo=FALSE (echo is the computer science equivalent of “print”), so its code is not printed but is still evaluated. Hence, the plot is rendered in the HTML output.
  • If you’re like me, you may prefer the HTML document to be rendered in RStudio’s built-in Viewer pane, rather than in a separate window. Click on the settings icon and select Preview in Viewer Pane. Now click on Knit again. Notice that the HTML is rendered and the preview is now in RStudio’s Viewer pane.
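Knitting can also be done from the Console with the rmarkdown package’s render() function, which can be handy once you want to script your reports. The sketch below assumes your file is called learnPlotting.Rmd and sits in the working directory; knit_to_html is a hypothetical wrapper name, and it skips quietly when the file or the package is missing.

```r
# Hypothetical wrapper around rmarkdown::render(); returns NULL when nothing was rendered
knit_to_html <- function(input = "learnPlotting.Rmd") {
  if (!file.exists(input) || !requireNamespace("rmarkdown", quietly = TRUE)) {
    message("Skipping: '", input, "' or the rmarkdown package is missing.")
    return(invisible(NULL))
  }
  # Same result as choosing Knit to HTML in the editor
  rmarkdown::render(input, output_format = "html_document")
}

out <- knit_to_html("learnPlotting.Rmd")
print(out)
```

Swapping "html_document" for "word_document" or "pdf_document" targets the other built-in formats, although PDF output additionally needs a LaTeX installation.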

Go ahead now and add a new R chunk to your notebook. This can be anywhere you wish.

  • You can add a new chunk by surrounding your code with ``` manually (essentially, finding the backtick character on your keyboard and hitting it 3 times). Then, at the opening ```, include {r} to denote that the following chunk is an R chunk. Put max(cars) inside to obtain the maximum value in the cars object.
```{r}
max(cars)
```
  • This could become tedious real fast, so there is a shortcut. You can hit the Insert Chunk button at the top of your editor pane to quickly insert an R chunk wherever your cursor is.
  • You can, of course, use a keyboard shortcut to quickly insert an R chunk.
    • PC: Ctrl + Alt + I
    • macOS: ⌘ (command) + option + I

For the remainder of this course, every time you read an instruction that says “insert an R chunk”, you can do any of the above. Take 30 seconds to practice the keyboard shortcut and try to commit it to memory, since it can be a great time-saver in the long run.

Give a name to the R chunk you’ve just created, be it maxvalue or anything else. Your table of contents (TOC) should now reflect the new structure of your R Markdown file. Make sure that within the chunk you have max(cars) so your HTML output will contain the value when this line of code is evaluated.

Your R Markdown document is evaluated from top to bottom, so having the following two chunks (maxvalue and experiment, respectively) in the right order is important. First, the maxv object has to be created in the environment, and only then can it be used or referenced. The second code chunk randomly samples 5 different numbers from the range 1 to maxv, so if we mistakenly place this chunk above the other, we will get an error that reads Error: object 'maxv' not found.

```{r maxvalue}
maxv <- max(cars)
print(maxv)
```

```{r experiment}
# sample 5 numbers from 1 to 120
sample(1:maxv, 5)
```
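To see the ordering problem for yourself without rearranging your document, you can reproduce it in the Console. This is a sketch under one assumption: it removes any maxv you defined earlier so that the first attempt behaves like a fresh knit. The fixed seed is my own addition, there only to make the draw repeatable.

```r
# Start from a clean slate, as a fresh knit would (removes maxv if you defined it earlier)
if (exists("maxv", inherits = FALSE)) rm(maxv)

# Wrong order: the "experiment" step runs before maxv exists, so R raises an error
attempt <- tryCatch(
  sample(1:maxv, 5),
  error = function(e) conditionMessage(e)
)
print(attempt)

# Right order: create maxv first, then use it
maxv <- max(cars)
set.seed(42)  # assumption: a fixed seed, only so the draw is repeatable
picked <- sample(1:maxv, 5)
print(picked)
```

The first print shows the captured error message, and the second shows five numbers between 1 and 120.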

Because a code chunk is expected to contain executable code, any commentary we want to add (perhaps to explain what the following line does) goes in a comment instead. In R and Python, any line that begins with a # character is treated as a comment and is simply ignored (it won’t be executed).

Inline Code

You don’t always have to print code results using chunks. You can insert the code output directly into your narrative text by enclosing the code with `r `. R Markdown will display the results of inline code in place of the code itself. To see an example of this, edit your R Markdown document to include the following and then knit the document again:

```{r maxvalue}
maxv <- max(cars)
print(maxv)
```

The maximum value in our data is `r maxv`.

```{r experiment}
# sample 5 numbers from 1 to 120
sample(1:maxv, 5)
```

Notice, of course, that the knitted HTML actually displays the result of the inline code:

The maximum value in our data is 120.

In fact, the inline output is indistinguishable from the surrounding text.
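If you want to preview what an inline expression will produce before knitting, you can build the same sentence in the Console. This is a rough sketch of the idea rather than how R Markdown assembles the text internally; avg_dist is a name of my own, added to show that any R expression can go inline.

```r
maxv <- max(cars)

# What the inline expression contributes to the sentence
sentence <- paste0("The maximum value in our data is ", maxv, ".")
print(sentence)

# Inline code can hold any R expression, for example a rounded mean
avg_dist <- round(mean(cars$dist), 1)
print(avg_dist)
```

Rounding inside the inline expression keeps long decimals out of your narrative text.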

Code Languages in R Markdown

To process a code chunk using an alternate language engine, replace the r at the start of your chunk declaration with the name of the language:

```{r}
```

```{bash}
```

```{python}
```

Some of the available language engines include:

  • Python
  • SQL
  • Bash
  • Rcpp
  • Stan
  • JavaScript
  • CSS
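The knitr package, which does the chunk processing behind R Markdown, can tell you which engines your installation knows about. The sketch below assumes knitr is installed (it comes along with rmarkdown) and simply skips otherwise.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Names of every language engine registered with knitr
  engines <- names(knitr::knit_engines$get())
  print(sort(engines))

  # Check a few of the engines mentioned in this lesson
  print(c("python", "bash", "sql") %in% engines)
} else {
  message("knitr is not installed, so there are no engines to list.")
}
```

An engine appearing in this list only means knitr can hand code to it; the language itself still has to be installed on your system.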

Assuming you have Python installed on your system (learn how to install Python), you can add the following chunk to your R Markdown document and hit the chunk’s play button:

```{python lists}
# create a list
list(i*2 for i in [2,4,6])
```
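If the Python chunk fails to run, a first thing to check is whether R can find a Python interpreter at all. The sketch below uses base R’s Sys.which() for that, and then shows how the same doubling could be written in R for comparison. The executable names python and python3 are assumptions; your system may use a different name or location.

```r
# Look for a Python interpreter on the PATH; an empty string means "not found"
python_paths <- Sys.which(c("python", "python3"))
print(python_paths)

# R counterpart of the Python chunk above: double each value
doubled <- c(2, 4, 6) * 2
print(doubled)
```

R’s arithmetic is vectorized, so no loop or comprehension is needed to double every element.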

You do not have to worry about remembering all the different options and syntax right now. As you continue building up your project, you will have plenty of opportunities to practice R and Python, and much of this code will become familiar to you in no time.

Practice: R Markdown

Now it’s your turn to practice. Spend 30 to 60 minutes working on your R Markdown document. Use Markdown (tips on Markdown) to add descriptive, narrative text, and change a few values from the sample code I gave you above. Try to get yourself familiar with the R Markdown format.

If you need guidance, here’s a sample learnPlotting.Rmd:

---
title: "Learn Plotting"
author: "Samuel"
date: "4/20/2020"
output: html_document
---

## My Exercise

`summary` prints a summary of the dataset, which gives me information on the minimum, maximum, the different quantiles and the average (mean): 

```{r cars}
summary(cars)
```

I can use `max()` to obtain the maximum value in my data:

```{r maxvalue}
maxv <- max(cars)
print(maxv)
```

The maximum value in our data is `r maxv`.

```{r experiment}
# sample 5 numbers from 1 to 120
sample(1:maxv, 5)
```

```{python lists}
# create a list
list(i*2 for i in [2,4,6])
```


## Including Plots

`volcano` is another built-in dataset, and we can use the `image()` function to create a colored grid corresponding to the values in this dataset.

```{r}
image(volcano)
```

You can also render a plot with a main title, using `main`:

```{r pressure, echo=FALSE}
plot(pressure, main="A non-descriptive title")
```

- Useful **bookmark 1**: Find more tips on using R Markdown on the [full tutorial](https://finetut.com/lessons/introduction-to-rmarkdown-r-notebook)!  
- Useful **bookmark 2**: Find more tips on working with Markdown on the [full tutorial](https://finetut.com/lessons/working-with-markdown-document/)!  

I believe you learn best by resisting the temptation to just copy and paste from above. Instead, modify and adapt the reference above, and manually type the commands into RStudio. You can change the values, rename the code chunks, add a few more descriptive paragraphs and so on.

When you are done, Knit to HTML and verify that the output conforms to your expectations.

This HTML file is no different from any other HTML file. You can locate it on your computer, right-click and open it in any browser of your choice. You can send your report (the HTML file) to your manager, and they can open it in any browser without having to install R, RStudio or anything else for that matter. It is a nicely formatted, standalone file that has no dependency on R or RStudio.

Summary: R Markdown

  • Add an R chunk using the delimiters ```{r} and ```, the Insert Chunk icon, or the Ctrl + Alt + I keyboard shortcut.
  • It’s good practice to run your chunk interactively using the chunk’s play icon to confirm that your code is indeed working as intended. Knit the document when you’re ready to export it to an output format.
  • include = FALSE prevents code and results from appearing in the finished file. R Markdown still runs the code in the chunk, and the results can be used by other chunks.
  • echo = FALSE prevents the code, but not the results, from appearing in the finished file. This is a useful way to embed figures.
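Options such as echo and include can also be given a document-wide default, which is what the generated setup chunk does with knitr::opts_chunk$set(). The sketch below reads those defaults back in the Console; it assumes knitr is installed and skips otherwise.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Same call as in the generated setup chunk: show code by default
  knitr::opts_chunk$set(echo = TRUE)

  # Read the current defaults back
  default_echo <- knitr::opts_chunk$get("echo")
  default_include <- knitr::opts_chunk$get("include")
  print(default_echo)
  print(default_include)
} else {
  message("knitr is not installed, so there are no chunk defaults to inspect.")
}
```

An option written in a chunk header, such as echo=FALSE on the pressure chunk, overrides the default for that chunk only.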

Optional Video

 

Despite its name, RStudio is also a very capable IDE for programming languages other than R (Stan, SQL, C/C++, even Python). You may not have heard of them yet, but you will learn about Python in this course.

You can bookmark this video for the future and come back to it when you're more familiar with RStudio and working with Python: