- Updates
- Introduction
- Minimal line plot
- Range-frame (or quartile-frame) scatterplot
- Dot-dash (or rug) scatterplot
- Marginal histogram scatterplot
- Minimal boxplot
- Minimal barchart
- Slopegraph
- Sparklines
- Stem-and-leaf display
- Discussion

Updates

30th of May 2016: (1) Added a new category - marginal histogram scatterplots for *base graphics* and *ggplot2*; (2) added new sparklines in base graphics with the `plotSparklineTable` function from the epanetReader package; (3) added a new slopegraph in base graphics with the `bumpchart` function from the plotrix package (thanks to Jim Lemon for this suggestion); (4) added a range-frame scatterplot in *ggplot2* with the function `qfplot` by Mikhail Popov.

16th of February 2016: (1) Added stem-and-leaf displays; (2) added sparklines in *ggplot2* (thanks to Wouter Van Der Bijl for addressing my StackOverflow question); (3) went back to the all-in-one-page format and simplified the sparklines; (4) split range-frame and dot-dash plots into separate sections for clarity; (5) overlong lines in the *ggplot2* version of *dot-dash plots* are back, because recent updates in *ggplot2* break the previous solution; (6) changed the data set for the slopegraph so that it matches across graphical systems.

17th of October 2015: Thank you for the kind words regarding this project! Major changes in this update: (1) added the first two methods to create sparklines using *Base Graphics* and *Lattice*; (2) the document has become too large to keep on one page. It's now split into two separate pages (chapters): *Chapter 1: Line plot, Boxplot, Barchart and Slopegraph* and *Chapter 2: Sparklines*; (3) corrected the overlong lines produced on the y-axis when making *dot-dash plots* with `panel.rug()` for *Lattice* (thanks to Josh O’Brien) and with `geom_rug()` for *ggplot2* (thanks to BondedDust); (4) added this list of updates to keep a reasonable track of changes.

28th of July 2015: *Tufte in R* is live!

Introduction

The idea behind *Tufte in R* is to use *R* - the most powerful open-source statistical programming language - to replicate excellent visualisation practices developed by Edward Tufte. It’s not a novel approach - there are plenty of excellent *R* functions and related packages written by people who have much more expertise in programming than myself. I simply collect those resources in one place in an accessible and replicable format, adding a few bits of my own coding discoveries.

Each visualisation is provided in three graphical systems used in *R*: base graphics, lattice and ggplot2. As example data I mainly use basic data sets easily accessible within *R*. Occasionally I use data from the package `psych` developed by William Revelle, the package `MASS` developed by Brian Ripley with colleagues, and various custom data I link via my *Gist* profile.

This page was produced in *RMarkdown* using Michael Sachs’s `tuftehandout`, but with a modified CSS inspired by Dave Liepmann’s *Tufte CSS*. It’s best to view this page on a desktop computer rather than on a mobile device.

You need the most recent version of *R* installed on your computer. You also need a basic understanding of *R*, and there are some great online tutorials to get you started. I also recommend *R Studio* as an integrated development environment for *R*.

I use resources from a number of *R* packages. You can install all those packages at once via *R* console using the command below:

```
install.packages(c("CarletonStats", "devtools", "epanetReader", "fmsb", "ggplot2", "ggthemes",
"latticeExtra", "MASS", "PerformanceAnalytics", "psych",
"plyr", "prettyR", "plotrix", "proto", "RCurl", "reshape", "reshape2"))
```
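If some of these packages are already installed, a small helper can limit the call to the ones that are missing. This is only a sketch: `missing_packages` is a hypothetical helper name, not part of any package, and it relies on base R's `installed.packages()`.

```r
# Hypothetical helper (not from any package): return the names in `pkgs`
# that are not found in the current R library paths.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

pkgs <- c("CarletonStats", "devtools", "epanetReader", "fmsb", "ggplot2", "ggthemes",
          "latticeExtra", "MASS", "PerformanceAnalytics", "psych",
          "plyr", "prettyR", "plotrix", "proto", "RCurl", "reshape", "reshape2")

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so that running the snippet only reports which packages are missing.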

Minimal line plot

We start by plotting the most basic graph from page 65 of *The Visual Display of Quantitative Information* - a minimal line plot. This one is important because it illustrates the most elemental principle - that of minimalism with reduced ‘non-data ink’. As Tufte explains, the ‘data-ink’ ratio (the data-ink divided by the total ink used to print the graphic) is equal to ‘1 - proportion of graphic that can be erased without loss of data-information’. The primary challenge is therefore to modify the default graphs produced with *R* so that we remove as much ‘non-data ink’ as possible. As you will soon see, this is done by subtracting from and deconstructing existing *R* graphs.
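Written as a formula, Tufte's definition of the data-ink ratio reads roughly as follows (the layout is only a restatement of the definition quoted above):

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
```

A ratio close to 1 means that almost everything drawn on the page carries data.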

```
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
pdf(width=10, height=6)
plot(y ~ x, axes=F, xlab="", ylab="", pch=16, type="b")
axis(1, at=x, label=x, tick=F, family="serif")
axis(2, at=seq(1,6,1), label=sprintf("$%s", seq(300,400,20)), tick=F, las=2, family="serif")
abline(h=6,lty=2)
abline(h=5,lty=2)
text(max(x), min(y)*2.5,"Per capita\nbudget expenditures\nin constant dollars", adj=1,
family="serif")
text(max(x), max(y)/1.08, labels="5%", family="serif")
dev.off()
```

```
library(lattice)
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
xyplot(y~x, xlab="", ylab="", pch=16, col=1, border = "transparent", type="o",
abline=list(h = c(max(y),max(y)-1), lty = 2),
scales=list(x=list(at=x,labels=x, fontfamily="serif", cex=1),
y=list(at=seq(1,6,1), fontfamily="serif", cex=1,
label=sprintf("$%s",seq(300,400,20)))),
par.settings = list(axis.line = list(col = "transparent"), dot.line=list(lwd=0)),
axis = function(side, line.col = "black", ...) {
if(side %in% c("left","bottom")) {axis.default(side = side, line.col = "black", ...)}})
ltext(current.panel.limits()$xlim[2]/1.1, adj=1, fontfamily="serif",
current.panel.limits()$ylim[1]/1.3, cex=1,
"Per capita\nbudget expenditures\nin constant dollars")
ltext(current.panel.limits()$xlim[2]/1.1, adj=1, fontfamily="serif",
current.panel.limits()$ylim[1]/5.5, cex=1, "5%")
```

```
library(ggplot2)
library(ggthemes)
x <- 1967:1977
y <- c(0.5,1.8,4.6,5.3,5.3,5.7,5.4,5,5.5,6,5)
d <- data.frame(x, y)
ggplot(d, aes(x,y)) + geom_line() + geom_point(size=3) + theme_tufte(base_size = 15) +
theme(axis.title=element_blank()) + geom_hline(yintercept = c(5,6), lty=2) +
scale_y_continuous(breaks=seq(1, 6, 1), label=sprintf("$%s",seq(300,400,20))) +
scale_x_continuous(breaks=x,label=x) +
annotate("text", x = c(1977,1977.2), y = c(1.5,5.5), adj=1, family="serif",
label = c("Per capita\nbudget expenditures\nin constant dollars", "5%"))
```

Range-frame (or quartile-frame) scatterplot

```
x <- mtcars$wt
y <- mtcars$mpg
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8, family="serif",
xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")
axis(1,at=summary(x),labels=round(summary(x),1), tick=F, family="serif")
axis(2,at=summary(y),labels=round(summary(y),1), tick=F, las=2, family="serif")
```
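The base graphics example above places the axis labels at `summary(x)` and `summary(y)`, which is what turns the axes into a quartile frame. As a quick check, assuming only base R and the built-in `mtcars` data, the values behind those labels can be inspected directly (`tick_positions` is just an illustrative variable name):

```r
x <- mtcars$wt

# summary() on a numeric vector returns six values:
# minimum, first quartile, median, mean, third quartile and maximum
tick_positions <- summary(x)
print(names(tick_positions))

# The axis labels in the plot are these values rounded to one decimal place
print(round(tick_positions, 1))
```

So each axis is labelled only at the minimum, the quartiles, the mean and the maximum, rather than at evenly spaced ticks.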

```
library(devtools)
source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
x <- mtcars$wt
y <- mtcars$mpg
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")
fancyaxis(1, summary(x), digits=1)
fancyaxis(2, summary(y), digits=1)
```

```
library(lattice)
x <- mtcars$wt
y <- mtcars$mpg
xyplot(y ~ x, mtcars, col=1, pch=16, fontfamily="serif",
xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel",
par.settings = list(axis.line = list(col="transparent"),
par.xlab.text=list(fontfamily="serif"),
par.ylab.text=list(fontfamily="serif")),
scales = list(x=list(at=summary(mtcars$wt),labels=round(summary(mtcars$wt),1),
fontfamily="serif"),
y=list(at=summary(mtcars$mpg),labels=round(summary(mtcars$mpg),1),
fontfamily="serif")),
axis = function(side, line.col = "black", ...) {
if(side %in% c("left","bottom")) {axis.default(side = side, line.col = "black", ...)}})
```

```
library(ggplot2)
library(ggthemes)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_rangeframe() + theme_tufte() +
xlab("Car weight (lb/1000)") + ylab("Miles per gallon of fuel") +
theme(axis.title.x = element_text(vjust=-0.5), axis.title.y = element_text(vjust=1.5))
```

A range-frame scatterplot in *ggplot2* can also be drawn with the function `qfplot` by Mikhail Popov:

```
library(devtools)
source_url('https://raw.githubusercontent.com/bearloga/Quartile-frame-Scatterplot/master/qfplot.R')
qfplot(x=mtcars$wt, y=mtcars$mpg, xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel")
```

Dot-dash (or rug) scatterplot

```
library(devtools)
source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
x <- mtcars$wt
y <- mtcars$mpg
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel",
xlim=c(min(x)-0.2, max(x)+0.2),
ylim=c(min(y)-1.5, max(y)+1.5))
axis(1, tick=F)
axis(2, tick=F, las=2)
minimalrug(x, side=1, line=-0.8)
minimalrug(y, side=2, line=-0.8)
```

```
library(lattice)
x <- mtcars$wt
y <- mtcars$mpg
xyplot(y ~ x, xlab="Car weight (lb/1000)", ylab="Miles per gallon of fuel",
par.settings = list(axis.line = list(col="transparent")),
panel = function(x, y,...) {
panel.xyplot(x, y, col=1, pch=16)
panel.rug(x, y, col=1, x.units = rep("snpc", 2), y.units = rep("snpc", 2), ...)})
```

```
library(ggplot2)
library(ggthemes)
ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_rug() + theme_tufte(ticks=F) +
xlab("Car weight (lb/1000)") + ylab("Miles per gallon of fuel") +
theme(axis.title.x = element_text(vjust=-0.5), axis.title.y = element_text(vjust=1))
```

Marginal histogram scatterplot

```
library(devtools)
source_url("https://raw.githubusercontent.com/sjmurdoch/fancyaxis/master/fancyaxis.R")
x <- faithful$waiting
y <- faithful$eruptions
plot(x, y, main="", axes=FALSE, pch=16, cex=0.8,
xlab="Time till next eruption (min)", ylab="Duration (min)",
xlim=c(min(x)/1.1, max(x)), ylim=c(min(y)/1.5, max(y)))
axis(1, tick=F)
axis(2, tick=F, las=2)
axisstripchart(faithful$waiting, 1)
axisstripchart(faithful$eruptions, 2)
```