6th of July 2017: (1) New category - interactive plots made in Tufte style with R - the first additions are a basic line plot and a basic barchart made with the package highcharter; (2) revised slopegraph in base graphics - Thomas Leeper has implemented his slopegraph functions in the development version of his package on GitHub; (3) revised sparklines in lattice get gray bands (thanks to Bryan Urban for sharing his code on this); (4) improved quality of all example figures; (5) commented out code and removed plots for the range-frame plot and dot-dash plot in base graphics produced with Steven Murdoch’s custom GitHub fancyaxis function - it doesn’t work anymore; (6) commented out code and removed plot for the slopegraph produced with James Keirstead’s GitHub function - it doesn’t work anymore.
30th of May 2016: (1) Added a new category - marginal histogram scatterplots for base graphics and ggplot2; (2) added new sparklines in base graphics with the plotSparklineTable function from the epanetReader package; (3) added a new slopegraph in base graphics with the bumpchart function from the plotrix package (thanks to Jim Lemon for this suggestion); (4) added a range-frame scatterplot in ggplot2 with the function qfplot by Mikhail Popov.
16th of February 2016: (1) Added stem-and-leaf displays; (2) added sparklines in ggplot2 (thanks to Wouter Van Der Bijl for addressing my StackOverflow question); (3) back to all-in-one-page format and revised sparklines to simplify; (4) range-frame and dot-dash plots split into separate sections for clarity; (5) overlong lines in the ggplot2 version of dot-dash plots are back due to recent updates in ggplot2 that broke the previous solution; (6) changed the data set for the slopegraph to match it between graphical systems.
17th of October 2015: Thank you for the kind words regarding this project! Major changes in this update: (1) added the first two methods to create Sparklines using Base Graphics and Lattice; (2) the document has become too large to keep on one page. It’s now split into two separate pages (chapters): Chapter 1: Line plot, Boxplot, Barchart and Slopegraph and Chapter 2: Sparklines; (3) corrected overlong lines produced on the y-axis when making dot-dash plots with panel.rug() for Lattice (thanks to Josh O’Brien) and with geom_rug() for ggplot2 (thanks to BondedDust); (4) added this list of updates to keep reasonable track of changes.
28th of July 2015: Tufte in R is live!
The idea behind Tufte in R is to use R - the most powerful open-source statistical programming language - to replicate excellent visualisation practices developed by Edward Tufte. It’s not a novel approach - there are plenty of excellent R functions and related packages written by people who have much more expertise in programming than myself. I simply collect those resources in one place in an accessible and replicable format, adding a few bits of my own coding discoveries.
Each visualisation is provided in three graphical systems used in R: base graphics, lattice and ggplot2. As example data I mainly use basic data sets easily accessible within R. Occasionally I use data from the package psych developed by William Revelle, the package MASS developed by Brian Ripley with colleagues, and various custom data I link via my Gist profile.
This page was produced in RMarkdown using Michael Sachs’s tuftehandout, but with a modified CSS inspired by Dave Liepmann’s Tufte CSS. It’s best if you view this page on a desktop computer rather than on a mobile device.
You need the most recent version of R installed on your computer. You also need a basic understanding of R, and there are some great online tutorials to get you started. I also recommend RStudio as an integrated development environment for R.
I use resources from a number of R packages. You can install all those packages at once via the R console using the command below:
install.packages(c("CarletonStats", "devtools", "epanetReader", "fmsb", "ggExtra", "ggplot2", "ggthemes",
"highcharter", "latticeExtra", "MASS", "PerformanceAnalytics", "psych",
"plyr", "prettyR", "plotrix", "proto", "RCurl", "reshape", "reshape2"))
We start by plotting the most basic graph from page 65 of The Visual Display of Quantitative Information - a minimal line plot. This one is important because it illustrates the most elemental principle - that of minimalism with reduced ‘non-data ink’. As Tufte explains, the ‘data-ink’ ratio is the data-ink divided by the total ink used to print the graphic, which equals ‘1 - proportion of a graphic that can be erased without loss of data-information’. The primary challenge is therefore to modify the default graphs produced with R so that we remove as much ‘non-data ink’ as possible. As you will soon see, this is done by subtracting from and deconstructing existing R graphs.
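To make the arithmetic behind the ratio concrete, here is a minimal sketch in base R. The helper data_ink_ratio() and the ink amounts are hypothetical and made up purely for illustration; they are not measurements of any real plot and do not come from Tufte's book.

```r
# Hypothetical helper: share of a graphic's ink that presents data.
data_ink_ratio <- function(data_ink, total_ink) {
  stopifnot(total_ink > 0, data_ink >= 0, data_ink <= total_ink)
  data_ink / total_ink
}

# Made-up ink amounts: the same data drawn with default decoration
# (box, ticks, grid) and again with most decoration removed.
default_plot <- data_ink_ratio(data_ink = 30, total_ink = 100)
minimal_plot <- data_ink_ratio(data_ink = 30, total_ink = 40)

# Proportion of each graphic that could be erased without losing data.
erasable <- 1 - c(default = default_plot, minimal = minimal_plot)
print(erasable)
```

With these assumed numbers, removing decoration leaves the data-ink unchanged but shrinks the total, so the ratio rises and the erasable share falls.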
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
pdf(width=10, height=6)
plot(y ~ x, axes=F, xlab="", ylab="", pch=16, type="b")
axis(1, at=x, label=x, tick=F, family="serif")
axis(2, at=seq(1,6,1), label=sprintf("$%s", seq(300,400,20)), tick=F, las=2, family="serif")
abline(h=6,lty=2)
abline(h=5,lty=2)
text(max(x), min(y)*2.5,"Per capita\nbudget expenditures\nin constant dollars", adj=1,
family="serif")
text(max(x), max(y)/1.08, labels="5%", family="serif")
dev.off()
library(lattice)
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
xyplot(y~x, xlab="", ylab="", pch=16, col=1, border = "transparent", type="o",
abline=list(h = c(max(y),max(y)-1), lty = 2),
scales=list(x=list(at=x,labels=x, fontfamily="serif", cex=1),
y=list(at=seq(1,6,1), fontfamily="serif", cex=1,
label=sprintf("$%s",seq(300,400,20)))),
par.settings = list(axis.line = list(col = "transparent"), dot.line=list(lwd=0)),
axis = function(side, line.col = "black", ...) {
if(side %in% c("left","bottom")) {axis.default(side = side, line.col = "black", ...)}})
ltext(current.panel.limits()$xlim[2]/1.1, adj=1, fontfamily="serif",
current.panel.limits()$ylim[1]/1.3, cex=1,
"Per capita\nbudget expenditures\nin constant dollars")
ltext(current.panel.limits()$xlim[2]/1.1, adj=1, fontfamily="serif",
current.panel.limits()$ylim[1]/5.5, cex=1, "5%")
library(ggplot2)
library(ggthemes)
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
d <- data.frame(x, y)
ggplot(d, aes(x,y)) + geom_line() + geom_point(size=3) + theme_tufte(base_size = 15) +
theme(axis.title=element_blank()) + geom_hline(yintercept = c(5,6), lty=2) +
scale_y_continuous(breaks=seq(1, 6, 1), label=sprintf("$%s",seq(300,400,20))) +
scale_x_continuous(breaks=x,label=x) +
annotate("text", x = c(1977,1977.2), y = c(1.5,5.5), adj=1, family="serif",
label = c("Per capita\nbudget expenditures\nin constant dollars", "5%"))
highcharter
A new approach to creating dynamic plots uses the package highcharter - it is an R wrapper for the Highcharts JavaScript library and includes a dedicated hc_theme_tufte().
library(highcharter)
x <- 1967:1977
y <- c(290,318,372,385,385,372,386,380,390,400,380)
d <- data.frame(x, y)
highchart() %>%
hc_chart(type = "scatter") %>%
hc_subtitle(text = "Per capita budget expenditures in constant dollars") %>%
hc_yAxis(labels = list(format = "${value}")) %>%
hc_add_series(data = d) %>%
hc_add_theme(hc_theme_tufte())
x <- mtcars$wt
y <- mtcars$mpg
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8, family="serif",
xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")
axis(1,at=summary(x),labels=round(summary(x),1), tick=F, family="serif")
axis(2,at=summary(y),labels=round(summary(y),1), tick=F, las=2, family="serif")
# library(devtools)
# source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
# x <- mtcars$wt
# y <- mtcars$mpg
# plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
# xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")
# fancyaxis(1, summary(x), digits=1)
# fancyaxis(2, summary(y), digits=1)
library(lattice)
x <- mtcars$wt
y <- mtcars$mpg
xyplot(y ~ x, mtcars, col=1, pch=16, fontfamily="serif",
xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel",
par.settings = list(axis.line = list(col="transparent"),
par.xlab.text=list(fontfamily="serif"),
par.ylab.text=list(fontfamily="serif")),
scales = list(x=list(at=summary(mtcars$wt),labels=round(summary(mtcars$wt),1),
fontfamily="serif"),
y=list(at=summary(mtcars$mpg),labels=round(summary(mtcars$mpg),1),
fontfamily="serif")),
axis = function(side, line.col = "black", ...) {
if(side %in% c("left","bottom")) {axis.default(side = side, line.col = "black", ...)}})
library(ggplot2)
library(ggthemes)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_rangeframe() + theme_tufte() +
xlab("Car weight (lb/1000)") + ylab("Miles per gallon of fuel") +
theme(axis.title.x = element_text(vjust=-0.5), axis.title.y = element_text(vjust=1.5))
qfplot
library(devtools)
source_url('https://raw.githubusercontent.com/bearloga/Quartile-frame-Scatterplot/master/qfplot.R')
qfplot(x=mtcars$wt, y=mtcars$mpg, xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")
# library(devtools)
# source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
# x <- mtcars$wt
# y <- mtcars$mpg
# plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
# xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel",
# xlim=c(min(x)-0.2, max(x)+0.2),
# ylim=c(min(y)-1.5, max(y)+1.5))
# axis(1, tick=F)
# axis(2, tick=F, las=2)
# minimalrug(x, side=1, line=-0.8)
# minimalrug(y, side=2, line=-0.8)
library(lattice)
x <- mtcars$wt
y <- mtcars$mpg
xyplot(y ~ x, xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel",
par.settings = list(axis.line = list(col="transparent")),
panel = function(x, y,...) {
panel.xyplot(x, y, col=1, pch=16)
panel.rug(x, y, col=1, x.units = rep("snpc", 2), y.units = rep("snpc", 2), ...)})
library(ggplot2)
library(ggthemes)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_rug() + theme_tufte(ticks=F) +
xlab("Car weight (lb/1000)") + ylab("Miles per gallon of fuel") +
theme(axis.title.x = element_text(vjust=-0.5), axis.title.y = element_text(vjust=1))
library(devtools)
source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
x <- faithful$waiting
y <- faithful$eruptions
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
xlab="Time till next eruption (min)", ylab="Duration (min)",
xlim=c(min(x)/1.1, max(x)), ylim=c(min(y)/1.5, max(y)))
axis(1, tick=F)
axis(2, tick=F, las=2)
axisstripchart(faithful$waiting, 1)
axisstripchart(faithful$eruptions, 2)
library(ggplot2)
library(ggExtra)
library(ggthemes)
p <- ggplot(faithful, aes(waiting, eruptions)) + geom_point() + theme_tufte(ticks=F)
ggMarginal(p, type = "histogram", fill="transparent")
ggMarginal can also be used to quickly create marginal density plots:
library(ggplot2)
library(ggExtra)
library(ggthemes)
p <- ggplot(faithful, aes(waiting, eruptions)) + geom_point() + theme_tufte(ticks=F) +
theme(axis.title=element_blank(), axis.text=element_blank())
ggMarginal(p, type = "density")
…and it can also be used to create marginal boxplots:
library(ggplot2)
library(ggExtra)
library(ggthemes)
p <- ggplot(faithful, aes(waiting, eruptions)) + geom_point() + theme_tufte(ticks=F) +
theme(axis.title=element_blank(), axis.text=element_blank())
ggMarginal(p, type = "boxplot", size=10, fill="transparent")
x <- quakes$mag
y <- quakes$stations
boxplot(y ~ x, main = "", axes = FALSE, xlab=" ", ylab=" ",
pars = list(boxcol = "transparent", medlty = "blank", medpch=16, whisklty = c(1, 1),
medcex = 0.7, outcex = 0, staplelty = "blank"))
axis(1, at=1:length(unique(x)), label=sort(unique(x)), tick=F, family="serif")
axis(2, las=2, tick=F, family="serif")
text(min(x)/3, max(y)/1.1, pos = 4, family="serif",
"Number of stations \nreporting Richter Magnitude\nof Fiji earthquakes (n=1000)")
chart.Boxplot
library(PerformanceAnalytics)
library(psych)
d <- msq[,80:84]
chart.Boxplot(d, main = "", xlab="average personality rating (based on n=3896)", ylab="",
element.color = "transparent", as.Tufte=TRUE)
library(lattice)
x <- quakes$mag
y <- quakes$stations
bwplot(y ~ x, horizontal=F, xlab="", ylab="", do.out = FALSE, box.ratio = 0,
scales=list(x=list(labels=sort(unique(x)), fontfamily="serif"),
y=list(fontfamily="serif")),
par.settings = list(axis.line = list(col = "transparent"), box.umbrella=list(lty=1, col= 1),
box.dot=list(col= 1), box.rectangle = list(col= c("transparent"))))
ltext(current.panel.limits()$xlim[1]+250, adj=1,
current.panel.limits()$ylim[2]+50, fontfamily="serif",
"Number of stations \nreporting Richter Magnitude\nof Fiji earthquakes (n=1000)")
library(ggplot2)
library(ggthemes)
ggplot(quakes, aes(factor(mag),stations)) + theme_tufte() +
geom_tufteboxplot(outlier.colour="transparent") + theme(axis.title=element_blank()) +
annotate("text", x = 8, y = 120, adj=1, family="serif",
label = c("Number of stations \nreporting Richter Magnitude\nof Fiji earthquakes (n=1000)"))
library(psych)
d <- colMeans(msq[,c(2,7,34,36,42,43,46,55,68)], na.rm = T)*10
barplot(d, xaxt="n", yaxt="n", ylab="", border=F, width=c(.35), space=1.8)
axis(1, at=(1:length(d))-.26, labels=names(d), tick=F, family="serif")
axis(2, at=seq(1, 5, 1), las=2, tick=F, family="serif")
abline(h=seq(1, 5, 1), col="white", lwd=3)
abline(h=0, col="gray", lwd=2)
text(min(d)/2, max(d)/1.2, pos = 4, family="serif",
"Average scores\non negative emotion traits\nfrom 3896 participants\n(Watson et al., 1988)")
library(lattice)
library(psych)
d <- colMeans(msq[,c(2,7,34,36,42,43,46,55,68)],na.rm = T)*10
barchart(sort(d), xlab="", ylab="", col = "grey", origin=1,
border = "transparent", box.ratio=0.5,
panel = function(x,y,...) {
panel.barchart(x,y,...)
panel.abline(v=seq(1,6,1), col="white", lwd=3)},
par.settings = list(axis.line = list(col = "transparent")))
ltext(current.panel.limits()$xlim[2]-50, adj=1,
current.panel.limits()$ylim[1]-100,
"Average scores\non negative emotion traits\nfrom 3896 participants\n(Watson et al., 1988)")
library(ggplot2)
library(ggthemes)
library(psych)
library(reshape2)
d <- melt(colMeans(msq[,c(2,7,34,36,42,43,46,55,68)],na.rm = T)*10)
d$trait <- rownames(d)
ggplot(d, aes(x=trait, y=value)) + theme_tufte(base_size=14, ticks=F) +
geom_bar(width=0.25, fill="gray", stat = "identity") + theme(axis.title=element_blank()) +
scale_y_continuous(breaks=seq(1, 5, 1)) +
geom_hline(yintercept=seq(1, 5, 1), col="white", lwd=1) +
annotate("text", x = 3.5, y = 5, adj=1, family="serif",
label = c("Average scores\non negative emotion traits\nfrom 3896 participants\n(Watson et al., 1988)"))
highcharter
library(psych)
library(reshape)
library(highcharter)
d <- melt(colMeans(msq[,c(2,7,34,36,42,43,46,55,68)], na.rm = T)*10)
trait <- row.names(d)
value <- as.vector(d[,1])
highchart() %>%
hc_chart(type = "column") %>%
hc_add_series(data = value) %>%
hc_xAxis(categories = row.names(d)) %>%
hc_add_theme(hc_theme_tufte2())
The most promising slopegraph functions for base graphics and ggplot2 come from Thomas Leeper’s slopegraph package. Thomas’s solutions have evolved gradually and are now the most efficient method to create slopegraphs in R. However, a major limitation is the inability to efficiently offset left and right side labels so that they don’t overlap (as seen below).
library(devtools)
# install_github("leeper/slopegraph")  # install Leeper's package from GitHub
library(slopegraph)
data(cancer)
slopegraph(cancer, col.lines = 'gray', col.lab = 1, col.num = 1,
xlim = c(-.2,5),
main = "Estimate of % survival rates",
xlabels = c('5 Year','10 Year','15 Year','20 Year'))
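One possible way around the overlapping labels is to nudge the label positions apart before drawing them. The following is only a rough sketch in base R and is not part of the slopegraph package: spread_labels() and its min_gap argument are hypothetical names introduced here, and the approach is a simple greedy pass that only pushes labels upward.

```r
# Hypothetical helper: push label positions apart so that neighbouring
# labels are at least `min_gap` units away from each other.
spread_labels <- function(y, min_gap) {
  ord <- order(y)            # work from the lowest label upwards
  pos <- y[ord]
  for (i in seq_along(pos)[-1]) {
    # If a label sits too close to the one below it, move it up.
    if (pos[i] - pos[i - 1] < min_gap) {
      pos[i] <- pos[i - 1] + min_gap
    }
  }
  out <- numeric(length(y))
  out[ord] <- pos            # restore the original ordering
  out
}

# Made-up label positions with three values crowded around 10.
crowded <- c(10, 10.5, 30, 11)
adjusted <- spread_labels(crowded, min_gap = 2)
print(adjusted)
```

The adjusted positions could then be used as the y coordinates of text() calls in a hand-built base graphics slopegraph, while the line end points keep their true values.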