Lukasz Piwek

Updates

16th of February 2016: (1) Added stem-and-leaf displays; (2) added sparklines in ggplot2 (thanks to Wouter Van Der Bijl for answering my StackOverflow question); (3) went back to the all-in-one-page format and simplified the sparklines; (4) split range-frame and dot-dash plots into separate sections for clarity; (5) overlong lines in the ggplot2 version of dot-dash plots are back, because recent updates in ggplot2 break the previous solution; (6) changed the data set for the slopegraph so that it matches across graphical systems.


17th of October 2015: Thank you for the kind words regarding this project! Major changes in this update: (1) added the first two methods to create sparklines, using base graphics and lattice; (2) the document has become too large to keep on one page. It's now split into two separate pages (chapters): Chapter 1: Line plot, Boxplot, Barchart and Slopegraph and Chapter 2: Sparklines; (3) corrected overlong lines produced on the y-axis when making dot-dash plots with panel.rug() for lattice (thanks to Josh O’Brien) and with geom_rug() for ggplot2 (thanks to BondedDust); (4) added this list of updates to keep reasonable track of changes.


28th of July 2015: Tufte in R is live!

Introduction

Motivation

The idea behind Tufte in R is to use R - the most powerful open-source statistical programming language - to replicate excellent visualisation practices developed by Edward Tufte. It’s not a novel approach - there are plenty of excellent R functions and related packages written by people who have much more expertise in programming than I do. I simply collect those resources in one place in an accessible and replicable format, adding a few bits of my own coding discoveries.

Format

Each visualisation is provided in three graphical systems used in R: base graphics, lattice and ggplot2. As example data I mainly use basic data sets that are easily accessible within R. Occasionally I use data from the psych package developed by William Revelle, the MASS package developed by Brian Ripley with colleagues, and various custom data sets that I link via my Gist profile.

This page was produced in RMarkdown using Michael Sachs’s tuftehandout, but with a modified CSS inspired by Dave Liepmann’s Tufte CSS. It’s best to view this page on a desktop computer rather than on a mobile device.

Requirements

You need the most recent version of R installed on your computer. You also need a basic understanding of R, and there are some great online tutorials to get you started. I also recommend RStudio as an integrated development environment for R.

I use resources from a number of R packages. You can install all those packages at once via R console using the command below:

install.packages(c("CarletonStats", "devtools", "fmsb", "ggplot2", "ggthemes", 
                   "latticeExtra", "MASS", "PerformanceAnalytics", "psych", 
                   "plyr", "proto", "RCurl", "reshape", "reshape2"))
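
If some of these packages are already on your machine, a sketch like the one below lists only the missing ones. It uses base R only. The helper name missing_packages is made up for this example and is not part of any package, and the install call is left commented out so that nothing is downloaded by accident.

```r
# Packages used throughout this document
pkgs <- c("CarletonStats", "devtools", "fmsb", "ggplot2", "ggthemes",
          "latticeExtra", "MASS", "PerformanceAnalytics", "psych",
          "plyr", "proto", "RCurl", "reshape", "reshape2")

# Hypothetical helper: return the packages that are not yet installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```

An empty result means everything in the list is already installed.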

Minimal line plot

We start by plotting the most basic graph from page 65 of The Visual Display of Quantitative Information: a minimal line plot. This one is important because it illustrates the most elemental principle, that of minimalism through a high ‘data-ink’ ratio. As Tufte explains, the ‘data-ink’ ratio is the data-ink divided by the total ink used to print the graphic, which equals ‘1 - proportion of a graphic that can be erased without loss of data-information’. The primary challenge is therefore to modify the default graphs produced by R so that we remove as much ‘non-data ink’ as possible. As you will soon see, this is done by subtracting from and deconstructing existing R graphs.
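
To make the arithmetic concrete, here is a minimal sketch of the ratio in R. The function name data_ink_ratio and the ink amounts are hypothetical and chosen only for illustration; in practice nobody measures ink this way, and the point is only that the two forms of the definition agree.

```r
# Data-ink ratio: data-ink divided by total ink,
# which is the same as 1 - the erasable proportion of the graphic.
data_ink_ratio <- function(data_ink, total_ink) {
  stopifnot(total_ink > 0, data_ink >= 0, data_ink <= total_ink)
  data_ink / total_ink
}

# Hypothetical ink amounts in arbitrary units, for illustration only
total_ink <- 100
data_ink  <- 40

# Proportion of the graphic that could be erased without losing data
erasable <- (total_ink - data_ink) / total_ink

ratio <- data_ink_ratio(data_ink, total_ink)
print(ratio)
print(1 - erasable)
```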

Minimal line plot in base graphics

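x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
# With no file argument, pdf() writes to "Rplots.pdf" in the working directory
pdf(width=10, height=6)
# Draw points and connecting lines only; the axes are added by hand below
plot(y ~ x, axes=FALSE, xlab="", ylab="", pch=16, type="b")
axis(1, at=x, labels=x, tick=FALSE, family="serif")
# Relabel the y positions 1 to 6 as dollar amounts
axis(2, at=seq(1,6,1), labels=sprintf("$%s", seq(300,400,20)), tick=FALSE, las=2, family="serif")
# Two dashed reference lines, one unit apart
abline(h=6,lty=2)
abline(h=5,lty=2)
text(max(x), min(y)*2.5,"Per capita\nbudget expenditures\nin constant dollars", adj=1, 
     family="serif")
text(max(x), max(y)/1.08, labels="5%", family="serif")
dev.off()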
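
The y-axis in this example is drawn at positions 1 to 6 but labelled with dollar amounts. The snippet below is a small illustration, using base R only, of the mapping between those positions and their labels. The same pair of vectors is reused in the lattice and ggplot2 versions. The names positions and tick_map are made up for this example.

```r
# Positions on the plotted scale and the labels shown at those positions
positions <- seq(1, 6, 1)
labels    <- sprintf("$%s", seq(300, 400, 20))

# One row per tick: where it sits and what it says
tick_map <- data.frame(position = positions, label = labels,
                       stringsAsFactors = FALSE)
print(tick_map)
```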

Minimal line plot in lattice

library(lattice)
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
xyplot(y~x, xlab="", ylab="", pch=16, col=1, border = "transparent", type="o",
       abline=list(h = c(max(y),max(y)-1), lty = 2),
       scales=list(x=list(at=x,labels=x, fontfamily="serif", cex=1),
                   y=list(at=seq(1,6,1), fontfamily="serif", cex=1,
                          labels=sprintf("$%s",seq(300,400,20)))),
       par.settings = list(axis.line = list(col = "transparent"), dot.line=list(lwd=0)),
       axis = function(side, line.col = "black", ...) {
         if(side %in% c("left","bottom")) {axis.default(side = side, line.col = "black", ...)}})
ltext(current.panel.limits()$xlim[2]/1.1, adj=1, fontfamily="serif", 
      current.panel.limits()$ylim[1]/1.3, cex=1,
      "Per capita\nbudget expenditures\nin constant dollars")
ltext(current.panel.limits()$xlim[2]/1.1, adj=1, fontfamily="serif", 
      current.panel.limits()$ylim[1]/5.5, cex=1, "5%")

Minimal line plot in ggplot2

library(ggplot2)
library(ggthemes)
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
d <- data.frame(x, y)
ggplot(d, aes(x,y)) + geom_line() + geom_point(size=3) + theme_tufte(base_size = 15) +
  theme(axis.title=element_blank()) + geom_hline(yintercept = c(5,6), lty=2) + 
  scale_y_continuous(breaks=seq(1, 6, 1), labels=sprintf("$%s",seq(300,400,20))) + 
  scale_x_continuous(breaks=x, labels=x) +
  annotate("text", x = c(1977,1977.2), y = c(1.5,5.5), hjust=1,  family="serif",
           label = c("Per capita\nbudget expenditures\nin constant dollars", "5%"))

Range-frame (or quartile-frame) scatterplot

Range-frame plot in base graphics

x <- mtcars$wt
y <- mtcars$mpg
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8, family="serif",
     xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")
axis(1, at=summary(x), labels=round(summary(x),1), tick=FALSE, family="serif")
axis(2, at=summary(y), labels=round(summary(y),1), tick=FALSE, las=2, family="serif")
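
The two axis() calls above put a label at each value returned by summary(), so the axis itself reports the minimum, quartiles, mean and maximum of the data. The sketch below prints those positions for the car weights. The quantile() variant, which drops the mean and keeps only the five quartile positions, is an assumed alternative for a strict quartile-frame and is not taken from the code above.

```r
x <- mtcars$wt

# summary() returns six values: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s <- summary(x)
ticks <- setNames(as.numeric(s), names(s))
print(round(ticks, 1))

# Assumed alternative: quartile positions only (no mean)
quartile_ticks <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
print(round(quartile_ticks, 1))
```

Either vector can be passed to the at argument of axis(), with its rounded values as the labels.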

Range-frame plot in base graphics with fancyaxis

library(devtools)
source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
x <- mtcars$wt
y <- mtcars$mpg
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
     xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")
fancyaxis(1, summary(x), digits=1)
fancyaxis(2, summary(y), digits=1)

Range-frame plot in lattice

library(lattice)
x <- mtcars$wt
y <- mtcars$mpg
xyplot(y ~ x, mtcars, col=1, pch=16, fontfamily="serif",
       xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel",
       par.settings = list(axis.line = list(col="transparent"),
                           par.xlab.text=list(fontfamily="serif"),
                           par.ylab.text=list(fontfamily="serif")),
       scales = list(x=list(at=summary(mtcars$wt),labels=round(summary(mtcars$wt),1),
                            fontfamily="serif"),
                     y=list(at=summary(mtcars$mpg),labels=round(summary(mtcars$mpg),1),
                            fontfamily="serif")),
       axis = function(side, line.col = "black", ...) {
         if(side %in% c("left","bottom")) {axis.default(side = side, line.col = "black", ...)}})

Range-frame plot in ggplot2

library(ggplot2)
library(ggthemes)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_rangeframe() + theme_tufte() +
  xlab("Car weight (lb/1000)") + ylab("Miles per gallon of fuel") + 
  theme(axis.title.x = element_text(vjust=-0.5), axis.title.y = element_text(vjust=1.5))

Dot-dash (or rug) scatterplot

Dot-dash plot in base graphics with fancyaxis

library(devtools)
source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
x <- mtcars$wt
y <- mtcars$mpg
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
     xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel", 
     xlim=c(min(x)-0.2, max(x)+0.2),
     ylim=c(min(y)-1.5, max(y)+1.5))
axis(1, tick=FALSE)
axis(2, tick=FALSE, las=2)
minimalrug(x, side=1, line=-0.8)
minimalrug(y, side=2, line=-0.8)

Dot-dash plot in lattice

library(lattice)
x <- mtcars$wt
y <- mtcars$mpg
xyplot(y ~ x, xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel",
       par.settings = list(axis.line = list(col="transparent")),
       panel = function(x, y,...) { 
         panel.xyplot(x, y, col=1, pch=16)
         panel.rug(x, y, col=1, x.units = rep("snpc", 2), y.units = rep("snpc", 2), ...)})

Dot-dash plot in ggplot2

library(ggplot2)
library(ggthemes)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_rug() + theme_tufte(ticks=F) + 
  xlab("Car weight (lb/1000)") + ylab("Miles per gallon of fuel") + 
  theme(axis.title.x = element_text(vjust=-0.5), axis.title.y = element_text(vjust=1))

Minimal boxplot

Minimal boxplot in base graphics

x <- quakes$mag
y <- quakes$stations
boxplot(y ~ x, main = "", axes = FALSE, xlab=" ", ylab=" ",
        pars = list(boxcol = "transparent", medlty = "blank", medpch=16, whisklty = c(1, 1),
                    medcex = 0.7,  outcex = 0, staplelty = "blank"))
axis(1, at=1:length(unique(x)), labels=sort(unique(x)), tick=FALSE, family="serif")
axis(2, las=2, tick=FALSE, family="serif")
text(min(x)/3, max(y)/1.1, pos = 4, family="serif",
     "Number of stations \nreporting Richter Magnitude\nof Fiji earthquakes (n=1000)")
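
In the minimal boxplot above the box outline is transparent, the median is drawn as a dot and only the whisker lines remain. To see the numbers behind each column, the sketch below asks boxplot() for its statistics without drawing anything (plot = FALSE is a documented argument of boxplot()). The variable names b and stats are made up for this example.

```r
# Compute, but do not draw, the boxplot statistics for each magnitude
b <- boxplot(stations ~ mag, data = quakes, plot = FALSE)

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
stats <- b$stats
colnames(stats) <- b$names
rownames(stats) <- c("lower whisker", "lower hinge", "median",
                     "upper hinge", "upper whisker")

# The medians are what the plot shows as dots
print(stats["median", ])
```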

Minimal boxplot in base graphics with chart.Boxplot

library(PerformanceAnalytics)
library(psych)
d <- msq[,80:84]
chart.Boxplot(d, main = "", xlab="average personality rating (based on n=3896)", ylab="", 
              element.color = "transparent", as.Tufte=TRUE)

Minimal boxplot in lattice

library(lattice)
x <- quakes$mag
y <- quakes$stations
bwplot(y ~ x, horizontal=F, xlab="", ylab="", do.out = FALSE, box.ratio = 0,
       scales=list(x=list(labels=sort(unique(x)), fontfamily="serif"),
                   y=list(fontfamily="serif")),
       par.settings = list(axis.line = list(col = "transparent"), box.umbrella=list(lty=1, col= 1),
                           box.dot=list(col= 1), box.rectangle = list(col= c("transparent"))))
ltext(current.panel.limits()$xlim[1]+250, adj=1,
      current.panel.limits()$ylim[2]+50, fontfamily="serif",
      "Number of stations \nreporting Richter Magnitude\nof Fiji earthquakes (n=1000)")