Lukasz Piwek

Updates

6th of July 2017: (1) New category - interactive plots made in Tufte-style with R - the first additions are a basic line plot and a basic barchart made with the package highcharter; (2) Revised slopegraph in base graphics - Thomas Leeper has implemented his slopegraph functions in a development version of his package on GitHub; (3) Revised sparklines in lattice get gray bands (thanks to Bryan Urban for sharing his code on this); (4) Improved quality of all example figures; (5) Commented out code and removed plots for range frame plot and dot-dash plot in base graphics produced with Steven Murdoch’s custom GitHub fancyaxis function - it doesn’t work anymore; (6) Commented out code and removed plot for slopegraph produced with James Keirstead’s GitHub function - it doesn’t work anymore.


30th of May 2016: (1) Added a new category - marginal histogram scatterplots for base graphics and ggplot2; (2) added new sparklines in base graphics with plotSparklineTable function from epanetReader package; (3) added new slopegraph in base graphics with bumpchart function from plotrix package (thanks to Jim Lemon for this suggestion); (4) added range-frame scatterplot in ggplot2 with function qfplot by Mikhail Popov.


16th of February 2016: (1) Added stem-and-leaf displays; (2) added sparklines in ggplot2 (thanks to Wouter Van Der Bijl for addressing my StackOverflow question); (3) back to all-in-one-page format and revised sparklines to simplify; (4) range-frame and dot-dash plots split into separate sections for clarity; (5) overlong lines in the ggplot2 version of dot-dash plots are back because recent updates in ggplot2 broke the previous solution; (6) changed data set for slopegraph to match it between graphical systems.


17th of October 2015: Thank you for the kind words regarding this project! Major changes in this update: (1) added the first two methods to create Sparklines using Base Graphics and Lattice; (2) the document has become too large to keep on one page. It’s now split into two separate pages-chapters: Chapter 1: Line plot, Boxplot, Barchart and Slopegraph and Chapter 2: Sparklines; (3) corrected overlong lines produced on the y-axis when making dot-dash plots with panel.rug() for Lattice (thanks to Josh O’Brien) and with geom_rug() for ggplot2 (thanks to BondedDust); (4) added this list of updates to keep a reasonable track of changes.


28th of July 2015: Tufte in R is live!

Introduction

Motivation

The idea behind Tufte in R is to use R - the most powerful open-source statistical programming language - to replicate excellent visualisation practices developed by Edward Tufte. It’s not a novel approach - there are plenty of excellent R functions and related packages written by people who have much more expertise in programming than myself. I simply collect those resources in one place in an accessible and replicable format, adding a few bits of my own coding discoveries.

Format

Each visualisation is provided in three graphical systems used in R: base graphics, lattice and ggplot2. As example data I mainly use basic data sets easily accessible within R. Occasionally I use data from the package psych developed by William Revelle, the package MASS developed by Brian Ripley with colleagues, and various custom data I link via my Gist profile.

This page was produced in RMarkdown using Michael Sachs’s tuftehandout, but with a modified CSS inspired by Dave Liepmann’s Tufte CSS. It’s best if you view this page on a desktop computer rather than a mobile device.

Requirements

You need the most recent version of R installed on your computer. You also need a basic understanding of R, and there are some great online tutorials to get you started. I also recommend RStudio as an integrated development environment for R.

I use resources from a number of R packages. You can install all those packages at once via R console using the command below:

install.packages(c("CarletonStats", "devtools", "epanetReader", "fmsb", "ggExtra", "ggplot2", 
                   "ggthemes", "highcharter", "latticeExtra", "MASS", "PerformanceAnalytics", 
                   "psych", "plyr", "prettyR", "plotrix", "proto", "RCurl", "reshape", "reshape2"))
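
If you already have some of these packages, you may prefer to install only the ones that are missing. Below is a minimal sketch of that idea. Note that missing_packages is a hypothetical helper I made up for illustration (it is not part of any package), and the short pkgs vector is just an example subset of the list above.

```r
# Hypothetical helper (not from any package): return the subset of
# `pkgs` that is not found among the installed packages.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Example subset of the packages used on this page
pkgs <- c("ggplot2", "ggthemes", "latticeExtra", "plotrix")
to_install <- missing_packages(pkgs)
to_install  # character vector of packages still to be installed

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so that running the snippet only reports what is missing and changes nothing on your system.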
Minimal line plot

We start by plotting the most basic graph from page 65 of The Visual Display of Quantitative Information - a minimal line plot. This one is important because it illustrates the most elemental principle - that of minimalism with reduced ‘non-data ink’. As Tufte explains, the ‘data-ink ratio’ (data-ink divided by the total ink used to print the graphic) should equal ‘1 - proportion of graphic that can be erased without loss of data-information’. The primary challenge is therefore to modify the default graphs produced with R so that we remove as much ‘non-data ink’ as possible. As you will soon see, this is done by subtracting from and deconstructing existing R graphs.
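
To make the data-ink ratio concrete, here is a small arithmetic sketch. The ink amounts are made-up numbers in arbitrary units, chosen only for illustration - they are not measured from any real figure.

```r
# Made-up ink amounts in arbitrary units (illustration only)
data_ink  <- 30   # ink that carries data-information
total_ink <- 120  # all ink used to print the graphic

# Data-ink ratio: share of the ink devoted to data
data_ink_ratio <- data_ink / total_ink

# Share of the graphic that could be erased without losing data-information
erasable_share <- (total_ink - data_ink) / total_ink

data_ink_ratio      # 0.25
1 - erasable_share  # 0.25, the same quantity
```

Both expressions give the same value, which is why removing non-data ink (a smaller erasable share) pushes the ratio towards 1.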

Minimal line plot in base graphics

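x <- 1967:1977
# y is on an arbitrary 1-6 scale; the axis labels below map it to dollars
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
pdf(width=10, height=6)  # no file name given, so R writes to Rplots.pdf
# points joined by lines, with default axes and axis titles suppressed
plot(y ~ x, axes=FALSE, xlab="", ylab="", pch=16, type="b")
# tick-free axes in a serif font
axis(1, at=x, labels=x, tick=FALSE, family="serif")
axis(2, at=seq(1,6,1), labels=sprintf("$%s", seq(300,400,20)), tick=FALSE, las=2, family="serif")
# dashed reference lines at y = 5 and y = 6
abline(h=6,lty=2)
abline(h=5,lty=2)
text(max(x), min(y)*2.5,"Per capita\nbudget expenditures\nin constant dollars", adj=1, 
     family="serif")
text(max(x), max(y)/1.08, labels="5%", family="serif")
dev.off()  # close the pdf device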

Minimal line plot in lattice

library(lattice)
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
xyplot(y~x, xlab="", ylab="", pch=16, col=1, border = "transparent", type="o",
       abline=list(h = c(max(y),max(y)-1), lty = 2),
       scales=list(x=list(at=x,labels=x, fontfamily="serif", cex=1),
                   y=list(at=seq(1,6,1), fontfamily="serif", cex=1,
                          label=sprintf("$%s",seq(300,400,20)))),
       par.settings = list(axis.line = list(col = "transparent"), dot.line=list(lwd=0)),
       axis = function(side, line.col = "black", ...) {
         if(side %in% c("left","bottom")) {axis.default(side = side, line.col = "black", ...)}})
ltext(current.panel.limits()$xlim[2]/1.1, adj=1, fontfamily="serif", 
      current.panel.limits()$ylim[1]/1.3, cex=1,
      "Per capita\nbudget expenditures\nin constant dollars")
ltext(current.panel.limits()$xlim[2]/1.1, adj=1, fontfamily="serif", 
      current.panel.limits()$ylim[1]/5.5, cex=1, "5%")

Minimal line plot in ggplot2

library(ggplot2)
library(ggthemes)
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
d <- data.frame(x, y)
ggplot(d, aes(x,y)) + geom_line() + geom_point(size=3) + theme_tufte(base_size = 15) +
  theme(axis.title=element_blank()) + geom_hline(yintercept = c(5,6), lty=2) + 
  scale_y_continuous(breaks=seq(1, 6, 1), label=sprintf("$%s",seq(300,400,20))) + 
  scale_x_continuous(breaks=x,label=x) +
  annotate("text", x = c(1977,1977.2), y = c(1.5,5.5), adj=1,  family="serif",
           label = c("Per capita\nbudget expenditures\nin constant dollars", "5%"))

Minimal line plot - interactive plot with highcharter

A new approach is to create dynamic plots with the package highcharter. It is an R wrapper around the Highcharts JavaScript library and includes a dedicated hc_theme_tufte().

library(highcharter)
x <- 1967:1977
y <- c(290,318,372,385,385,372,386,380,390,400,380)
d <- data.frame(x, y)
highchart() %>%
  hc_chart(type = "scatter") %>% 
  hc_subtitle(text = "Per capita budget expenditures in constant dollars") %>%
  hc_yAxis(labels = list(format = "${value}")) %>%
  hc_add_series(data = d) %>% 
  hc_add_theme(hc_theme_tufte())
Range-frame (or quartile-frame) scatterplot

Range frame plot in base graphics

x <- mtcars$wt
y <- mtcars$mpg
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8, family="serif",
xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")
axis(1,at=summary(x),labels=round(summary(x),1), tick=F, family="serif")
axis(2,at=summary(y),labels=round(summary(y),1), tick=F, las=2, family="serif")

Range frame plot in base graphics with fancyaxis

# library(devtools)
# source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
# x <- mtcars$wt
# y <- mtcars$mpg
# plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
#      xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")
# fancyaxis(1, summary(x), digits=1)
# fancyaxis(2, summary(y), digits=1)

Range frame plot in lattice

library(lattice)
x <- mtcars$wt
y <- mtcars$mpg
xyplot(y ~ x, mtcars, col=1, pch=16, fontfamily="serif",
       xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel",
       par.settings = list(axis.line = list(col="transparent"),
                           par.xlab.text=list(fontfamily="serif"),
                           par.ylab.text=list(fontfamily="serif")),
       scales = list(x=list(at=summary(mtcars$wt),labels=round(summary(mtcars$wt),1),
                            fontfamily="serif"),
                     y=list(at=summary(mtcars$mpg),labels=round(summary(mtcars$mpg),1),
                            fontfamily="serif")),
       axis = function(side, line.col = "black", ...) {
         if(side %in% c("left","bottom")) {axis.default(side = side, line.col = "black", ...)}})

Range-frame plot in ggplot2

library(ggplot2)
library(ggthemes)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_rangeframe() + theme_tufte() +
xlab("Car weight (lb/1000)") + ylab("Miles per gallon of fuel") + 
theme(axis.title.x = element_text(vjust=-0.5), axis.title.y = element_text(vjust=1.5))

Range-frame plot in ggplot2 with qfplot

library(devtools)
source_url('https://raw.githubusercontent.com/bearloga/Quartile-frame-Scatterplot/master/qfplot.R')
qfplot(x=mtcars$wt, y=mtcars$mpg, xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")

Dot-dash (or rug) scatterplot

Dot-dash plot in base graphics with fancyaxis

# library(devtools)
# source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
# x <- mtcars$wt
# y <- mtcars$mpg
# plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
# xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel", 
# xlim=c(min(x)-0.2, max(x)+0.2),
# ylim=c(min(y)-1.5, max(y)+1.5))
# axis(1, tick=F)
# axis(2, tick=F, las=2)
# minimalrug(x, side=1, line=-0.8)
# minimalrug(y, side=2, line=-0.8)

Dot-dash plot in lattice

library(lattice)
x <- mtcars$wt
y <- mtcars$mpg
xyplot(y ~ x, xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel",
       par.settings = list(axis.line = list(col="transparent")),
       panel = function(x, y,...) { 
         panel.xyplot(x, y, col=1, pch=16)
         panel.rug(x, y, col=1, x.units = rep("snpc", 2), y.units = rep("snpc", 2), ...)})

Dot-dash plot in ggplot2

library(ggplot2)
library(ggthemes)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_rug() + theme_tufte(ticks=F) + 
  xlab("Car weight (lb/1000)") + ylab("Miles per gallon of fuel") + 
  theme(axis.title.x = element_text(vjust=-0.5), axis.title.y = element_text(vjust=1))

Marginal histogram scatterplot

Marginal histogram scatterplot in base graphics with fancyaxis

library(devtools)
source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
x <- faithful$waiting
y <- faithful$eruptions
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
     xlab="Time till next eruption (min)", ylab="Duration (min)", 
     xlim=c(min(x)/1.1, max(x)), ylim=c(min(y)/1.5, max(y)))
axis(1, tick=F)
axis(2, tick=F, las=2)
axisstripchart(faithful$waiting, 1)
axisstripchart(faithful$eruptions, 2)

Marginal histogram scatterplot in lattice (in preparation)

Marginal histogram scatterplot in ggplot2 with ggMarginal

library(ggplot2)
library(ggExtra)
library(ggthemes)
p <- ggplot(faithful, aes(waiting, eruptions)) + geom_point() + theme_tufte(ticks=F)
ggMarginal(p, type = "histogram", fill="transparent")

ggMarginal can also be used to quickly create marginal density plots using the same function:

library(ggplot2)
library(ggExtra)
library(ggthemes)
p <- ggplot(faithful, aes(waiting, eruptions)) + geom_point() + theme_tufte(ticks=F) +
  theme(axis.title=element_blank(), axis.text=element_blank())
ggMarginal(p, type = "density")

…and it can also be used to create marginal boxplots:

library(ggplot2)
library(ggExtra)
library(ggthemes)
p <- ggplot(faithful, aes(waiting, eruptions)) + geom_point() + theme_tufte(ticks=F) +
  theme(axis.title=element_blank(), axis.text=element_blank())
ggMarginal(p, type = "boxplot", size=10, fill="transparent")
Minimal boxplot

Minimal boxplot in base graphics

x <- quakes$mag
y <- quakes$stations
boxplot(y ~ x, main = "", axes = FALSE, xlab=" ", ylab=" ",
        pars = list(boxcol = "transparent", medlty = "blank", medpch=16, whisklty = c(1, 1),
                    medcex = 0.7,  outcex = 0, staplelty = "blank"))
axis(1, at=1:length(unique(x)), label=sort(unique(x)), tick=F, family="serif")
axis(2, las=2, tick=F, family="serif")
text(min(x)/3, max(y)/1.1, pos = 4, family="serif",
     "Number of stations \nreporting Richter Magnitude\nof Fiji earthquakes (n=1000)")

Minimal boxplot in base graphics with chart.Boxplot

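library(PerformanceAnalytics)
library(psych)
# Note: in recent psych releases the msq data set is shipped in the psychTools package,
# so library(psychTools) may also be needed here.
d <- msq[,80:84]
chart.Boxplot(d, main = "", xlab="average personality rating (based on n=3896)", ylab="", 
              element.color = "transparent", as.Tufte=TRUE)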

Minimal boxplot in lattice

library(lattice)
x <- quakes$mag
y <- quakes$stations
bwplot(y ~ x, horizontal=F, xlab="", ylab="", do.out = FALSE, box.ratio = 0,
       scales=list(x=list(labels=sort(unique(x)), fontfamily="serif"),
                   y=list(fontfamily="serif")),
       par.settings = list(axis.line = list(col = "transparent"), box.umbrella=list(lty=1, col= 1),
                           box.dot=list(col= 1), box.rectangle = list(col= c("transparent"))))
ltext(current.panel.limits()$xlim[1]+250, adj=1,
current.panel.limits()$ylim[2]+50, fontfamily="serif",
"Number of stations \nreporting Richter Magnitude\nof Fiji earthquakes (n=1000)")

Minimal boxplot in ggplot2

library(ggplot2)
library(ggthemes)
ggplot(quakes, aes(factor(mag),stations)) + theme_tufte() +
geom_tufteboxplot(outlier.colour="transparent") + theme(axis.title=element_blank()) +
annotate("text", x = 8, y = 120, adj=1,  family="serif",
label = c("Number of stations \nreporting Richter Magnitude\nof Fiji earthquakes (n=1000)"))

Minimal barchart

Minimal barchart in base graphics

library(psych)
d <- colMeans(msq[,c(2,7,34,36,42,43,46,55,68)], na.rm = T)*10
barplot(d, xaxt="n", yaxt="n", ylab="", border=F, width=c(.35), space=1.8)
axis(1, at=(1:length(d))-.26, labels=names(d), tick=F, family="serif")
axis(2, at=seq(1, 5, 1), las=2, tick=F, family="serif")
abline(h=seq(1, 5, 1), col="white", lwd=3)
abline(h=0, col="gray", lwd=2)
text(min(d)/2, max(d)/1.2, pos = 4, family="serif",
"Average scores\non negative emotion traits\nfrom 3896 participants\n(Watson et al., 1988)")

Minimal barchart in lattice

library(lattice)
library(psych)
d <- colMeans(msq[,c(2,7,34,36,42,43,46,55,68)],na.rm = T)*10
barchart(sort(d), xlab="", ylab="", col = "grey", origin=1,  
         border = "transparent", box.ratio=0.5, 
         panel = function(x,y,...) {
           panel.barchart(x,y,...)
           panel.abline(v=seq(1,6,1), col="white", lwd=3)},
         par.settings = list(axis.line = list(col = "transparent")))
ltext(current.panel.limits()$xlim[2]-50, adj=1,  
      current.panel.limits()$ylim[1]-100,
      "Average scores\non negative emotion traits\nfrom 3896 participants\n(Watson et al., 1988)")

Minimal barchart in ggplot2

library(ggplot2)
library(ggthemes)
library(psych)
library(reshape2)
d <- melt(colMeans(msq[,c(2,7,34,36,42,43,46,55,68)],na.rm = T)*10)
d$trait <- rownames(d)
ggplot(d, aes(x=trait, y=value)) + theme_tufte(base_size=14, ticks=F) +
  geom_bar(width=0.25, fill="gray", stat = "identity") +  theme(axis.title=element_blank()) +
  scale_y_continuous(breaks=seq(1, 5, 1)) + 
  geom_hline(yintercept=seq(1, 5, 1), col="white", lwd=1) +
  annotate("text", x = 3.5, y = 5, adj=1,  family="serif",
           label = c("Average scores\non negative emotion traits\nfrom 3896 participants\n(Watson et al., 1988)"))

Minimal barchart - interactive with highcharter

library(psych)
library(reshape)
library(highcharter)
d <- melt(colMeans(msq[,c(2,7,34,36,42,43,46,55,68)], na.rm = T)*10)
trait <- row.names(d)       # trait names for the x-axis categories
value <- as.vector(d[,1])   # mean scores as a plain numeric vector
highchart() %>%
  hc_chart(type = "column") %>%
  hc_add_series(data = value) %>%
  hc_xAxis(categories = trait) %>%
  hc_add_theme(hc_theme_tufte2())
Slopegraph

Slopegraph in base graphics

The most promising slopegraph functions for base graphics and ggplot2 come from Thomas Leeper’s slopegraph package. Thomas’s solutions have evolved gradually and the package is now the most efficient method to create slopegraphs in R. However, a major limitation is the inability to efficiently offset left and right side labels so that they don’t overlap (as seen below).

library(devtools)
# install_github("leeper/slopegraph")  # install Leeper's package from GitHub
library(slopegraph)
data(cancer)
slopegraph(cancer, col.lines = 'gray', col.lab = 1, col.num = 1,
           xlim = c(-.2,5),
           main = "Estimate of % survival rates",
           xlabels = c('5 Year','10 Year','15 Year','20 Year'))
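
The label overlap mentioned above is a general positioning problem, so here is a rough sketch of one way to think about it. This is not how the slopegraph package handles labels, and spread_labels is a hypothetical helper of my own: it takes label positions and a minimum gap, and pushes labels upwards until neighbours are at least that gap apart. The example positions are made up.

```r
# Hypothetical helper (not part of the slopegraph package):
# nudge label positions upwards so that neighbouring labels
# are at least `min_gap` apart, keeping their original order.
spread_labels <- function(y, min_gap) {
  ord <- order(y)          # process labels from lowest to highest
  adjusted <- y[ord]
  for (i in seq_along(adjusted)[-1]) {
    if (adjusted[i] - adjusted[i - 1] < min_gap) {
      adjusted[i] <- adjusted[i - 1] + min_gap
    }
  }
  out <- numeric(length(y))
  out[ord] <- adjusted     # return positions in the input order
  out
}

# Made-up label positions: three of them crowd around 10-11
spread_labels(c(10, 10.5, 30, 11), min_gap = 2)
# returns 10 12 30 14
```

This greedy approach only ever moves labels upwards, so a dense cluster can drift away from its data points. Treat it as a starting point rather than a finished solution.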