A quick look at ggplot2

October 31, 2018

Sumit

ggplot2 is an R package for creating graphs. Unlike the other graphics packages in R, ggplot2 has an underlying grammar, based on Wilkinson’s book ‘The Grammar of Graphics’. Because the components of ggplot2 can be combined in many different ways, the user can move beyond a set of predefined options and create a huge variety of graphs and charts.

 

To start working with ggplot2, let us first install and load it.

 

install.packages("ggplot2")

 

library(ggplot2)

 

We will use ggplot2 on the mpg dataset that is part of the ggplot2 package. We can examine the structure of this dataset with the str() command.

 

str(mpg)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 234 obs. of 11 variables:

$ manufacturer: chr "audi" "audi" "audi" "audi" ...

$ model : chr "a4" "a4" "a4" "a4" ...

$ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...

$ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...

$ cyl : int 4 4 4 4 6 6 6 4 4 4 ...

$ trans : chr "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...

$ drv : chr "f" "f" "f" "f" ...

$ cty : int 18 21 20 21 16 18 18 18 16 20 ...

$ hwy : int 29 29 31 30 26 26 27 26 25 28 ...

$ fl : chr "p" "p" "p" "p" ...

$ class : chr "compact" "compact" "compact" "compact" ...

 

 

For any ggplot2 graph, there are three major components:

  1. The data.

  2. The aesthetic mappings or aes() that define the relationship between the data and the visual properties.

  3. The geometry, added with a geom function such as geom_bar() or geom_point(), which defines how to render the observations.

 

To understand how these components work together, let’s create a bar chart.

 

ggplot(mpg, aes(x=class)) + geom_bar()
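As a rough sketch of what this bar chart shows: with its default settings, geom_bar() counts the rows in each class and uses the count as the bar height, so the bars should match a plain tabulation of the class column. The names class_counts and p_bar below are illustrative only.

```r
library(ggplot2)

# Count the cars in each vehicle class; these counts are the bar heights
# that geom_bar() draws when it is given only an x aesthetic.
class_counts <- table(mpg$class)
print(class_counts)

# The same bar chart, with the three components spelled out by name:
# the data, the aesthetic mapping, and the geometry.
p_bar <- ggplot(data = mpg, mapping = aes(x = class)) + geom_bar()
print(p_bar)
```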

 

Let’s now create a scatterplot from the same dataset. Note how the geometry has been changed here from bar to point, and how aes() now maps two variables, one to each axis.

 

ggplot(mpg, aes(x=displ, y=cty)) + geom_point()

 

We shall now try to enrich this graph by adding more information to aes(). Let us color the points of this graph according to the variable class.

 

ggplot(mpg, aes(x=displ, y=cty, colour=class)) + geom_point()

 

 

 

Next, let’s resize the points of the graph based on the variable cyl.

 

ggplot(mpg, aes(x=displ, y=cty, colour=class, size=cyl)) + geom_point()

 

 

ggplot2 also allows the use of multiple geometries in the same graph. Let’s add a smoothing line to our first scatterplot.

 

ggplot(mpg, aes(x=displ, y=cty)) + geom_point() + geom_smooth()
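
When geom_smooth() is called with no arguments, it picks a smoothing method on its own and reports the choice in a message. The sketch below states the method explicitly instead, assuming a straight-line fit is wanted (method = "lm"). That choice, the se = FALSE setting that hides the confidence band, and the name p_lm are illustrative assumptions, not part of the example above.

```r
library(ggplot2)

# Scatterplot of engine displacement against city mileage, with a
# linear-model fit drawn on top of the points.
p_lm <- ggplot(mpg, aes(x = displ, y = cty)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p_lm)
```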

Two things are now clear from the examples above. First, ggplot2 takes care of a lot of messy details, such as drawing legends and formatting, which keeps the graphing code short. Second, it allows the user to build graphs iteratively, going step by step from simple to more complex graphs.
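
One way to see the step-by-step style is to store a plot in a variable and add to it one layer at a time. This is a minimal sketch. The variable names base, with_points and with_smooth are made up for illustration, and the smoothing method is stated explicitly only to avoid the message that geom_smooth() otherwise prints.

```r
library(ggplot2)

# Step 1: data and aesthetic mappings only; nothing is drawn yet.
base <- ggplot(mpg, aes(x = displ, y = cty))

# Step 2: add a point layer to get the scatterplot.
with_points <- base + geom_point()

# Step 3: add a smoothing layer on top of the points.
with_smooth <- with_points +
  geom_smooth(method = "loess", formula = y ~ x)

# Each intermediate object is a complete plot that can be printed.
print(with_points)
print(with_smooth)
```

Because each step returns a new plot object, the earlier versions stay available for comparison.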

 

Give ggplot2 a try. Install it and try building a few graphs and charts yourself. I am sure that you will not be disappointed with the results.