Data Visualization in R and ggplot2
Starting Point: Data Visualization
An introduction to plotting in R, and how data scientists can benefit from a rich “grammar” for data visualization.
COVID-19 is a global pandemic that has led to severe global socioeconomic disruption and, subsequently, one of the largest global recessions in history. Countries were put under full lockdown, curfew, and nationwide quarantine, and some have declared a state of emergency.
More than a third of the planet’s population is under some form of restriction.
Business Insider
Governments around the world temporarily closed educational institutions in their attempt to contain the spread, along with other social distancing measures.
These nationwide closures are impacting over 91% of the world’s student population.
UNESCO
While public health, commercial and clinical laboratories work around the clock to test for new cases of COVID-19, analysts, policymakers and the media rely on these data (lab-confirmed infections) to report the number and monitor the growth of “confirmed cases”.
These data are important; as one researcher put it, they are “our window onto the pandemic and how it is spreading”. Without data, we have no way of understanding the spread of the pandemic and, consequently, no way of responding to this threat appropriately.
The Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), with support from the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL), started to gather the data from a list of sources that includes the WHO, independent organizations, and government-released statistics.
The data are collected using an automated script and pushed to a GitHub repository that they maintain.
For the first half of this course, we will use a “cleaned” version of the dataset, extracted from JHU CSSE. It is provided to you as a convenient CSV in the Materials tab of the course. We will learn about various data visualization and plotting techniques in R using this dataset. The advantage of working with this CSV is that it has already been wrangled into the shape we need for the ggplot visualization library and for our exercise.
As we develop our COVID-19 web dashboard, we will replace this with a direct call to JHU CSSE’s repository (the “remote source”). This means some extra “overhead”, since our R script needs to include the preprocessing steps. However, it has the benefit of pulling directly from the repository that JHU CSSE maintains, rather than from a static file.
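To give a feel for what the “right shape” means, here is a minimal sketch. ggplot generally works best with one row per observation (“long” format), while time-series tables are often published with one column per date (“wide” format). Everything in the example is an assumption made for illustration: the column names `country`, `date` and `confirmed`, the country labels, and all counts are invented placeholders, not the course CSV and not JHU CSSE figures.

```r
library(ggplot2)

# Toy "wide" table: one row per country, one column per date.
# Country labels, column names, and counts are invented placeholders.
wide <- data.frame(
  country = c("A", "B"),
  `2020-03-01` = c(10, 4),
  `2020-03-02` = c(15, 9),
  `2020-03-03` = c(27, 12),
  check.names = FALSE  # keep the date strings as column names
)

# Preprocessing step: stack the date columns into "long" form,
# giving one row per (country, date) pair.
date_cols <- setdiff(names(wide), "country")
long <- data.frame(
  country   = rep(wide$country, times = length(date_cols)),
  date      = as.Date(rep(date_cols, each = nrow(wide))),
  confirmed = unlist(wide[date_cols], use.names = FALSE)
)

# With long data, each column maps onto one aesthetic of the plot.
p <- ggplot(long, aes(x = date, y = confirmed, colour = country)) +
  geom_line() +
  geom_point() +
  labs(x = "Date", y = "Confirmed cases (toy values)", colour = "Country")

# In an interactive session, print(p) draws the chart.
```

The reshaping here uses base R only to keep the sketch dependency-light; the cleaned CSV lets us skip this step for now and go straight to the plotting call.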
High level overview: developing a web analytics app in R using Shiny
An introduction to plotting in R, and how data scientists can benefit from a rich “grammar” for data visualization.
A series of techniques for practical data manipulation, data cleansing, and data transformation in R (collectively called “data preprocessing”)
Learn the design principles behind Shiny, a popular web app framework among R developers and data scientists
Take your web dashboard “Live” after adding some CSS and JavaScript polish, courtesy of external libraries that integrate well with Shiny
Our dashboard will be web-based, so it is accessible to anyone with an internet connection. We want this dashboard to be responsive so it scales up to wide screen monitors but also “rearranges” its elements to fit nicely on a mobile phone.
Usually, this requires a developer to learn HTML, CSS, and JavaScript, along with a server-side language like Python or R. However, using the Shiny web app framework, we will write our code only in R; the Shiny framework “translates” that R code into the HTML, CSS, and JavaScript required for all the front-end action.
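As a rough illustration of that “translation”, the sketch below describes a small page purely in R and then inspects the HTML that Shiny generates for it. The layout, the output id `trend`, and the placeholder plot are assumptions made for this example; they are not the dashboard we will build in the course.

```r
library(shiny)

# UI described entirely in R. fluidRow() and column() sit on a
# Bootstrap grid, which is what lets the layout adapt to screen width.
ui <- fluidPage(
  titlePanel("COVID-19 dashboard (sketch)"),
  fluidRow(
    column(6, plotOutput("trend")),  # "trend" is a hypothetical output id
    column(6, p("Notes about the chart would go here."))
  )
)

# Server logic, also in R: fills the plot slot declared in the UI.
server <- function(input, output) {
  output$trend <- renderPlot({
    plot(1:10, (1:10)^2, type = "l", main = "Placeholder curve")
  })
}

# The R description above is rendered to HTML for the browser.
html <- as.character(ui)
cat(html)

# Bundle UI and server into an app object.
# In an interactive session, runApp(app) would start it.
app <- shinyApp(ui = ui, server = server)
```

Printing `html` shows ordinary tags such as `div` and `h2`, which is all the browser ever sees; we never write them by hand.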
But what use is a web dashboard if we do not have a way to communicate our message? Data visualization, at its very essence, is about communication, so that’s where we’ll start this course.
If you’re ready, head over to the course page, download the Cleaned CSV dataset and we’ll get started!