厚缊

诹图——轮廓图

厚缊 / 2019-04-10


转眼之间,差不多两周的时间没有更新了,不是因为忙,也不是因为时间久了就没有热情了,是我重新审视了前面的绘图代码,十分凌乱,没有成型的章法,可重用性很低。为了解决这个问题,头脑发热的想把绘图语句封装成可重用的函数,一来方便自己,二来还抱着能帮助到其他人解决问题的幻想。从上周末开始折腾“profile plot”(轮廓图)函数,持续折腾了三天才有个样子,但是当我用不同数据集测试时,总是发现各种各样的bug,问题主要集中在我希望封装的函数能像ggplot2一样,能自动化处理标题、副标题、注释、图例等问题,可惜水平不够,总是在控制距离上失手。后来放弃了做这样复杂的函数,把标题、图例等问题留在调用时手动去添加,这样省了写函数的时间,增加了绘图时间,算是一种权衡。

为了写这个函数,反反复复翻阅了无数次par()帮助文件,查看了很多遍graphics包的函数清单,才算真正入了base 绘图的大门。总结一点,真正自己动手写绘图函数,最难的是距离控制,你得知道,每个图形元素在哪里、宽度(height)多少、高度(height)多少,留多了空白太大,图形不美观,留少了图形元素可能溢出或相互覆盖。今天能大致回顾下可能遇到什么问题,怎么避免踏入雷区。

基础绘图系统的位置和距离

基础绘图系统的位置控制主要有绝对距离(英寸)和相对距离(行数),绝对距离定位比较精准,但当缩放尺寸(cex)、行高(lheight)等变化时容易导致绘图单元相互重叠,相对距离能跟着其它参数一起变化,但这些距离关系往往在指定相关的参数之后就已经确定,不能后续添加的图形元素改变了这些参数,绘图距离会失去控制。

绘图区域

要理解基础绘图系统的位置和距离,首先要知道基础绘图区域的关系。如下示例图所示,基础绘图包含绘图区(plot region)、图像区(figure region)和图像区外边距(out figure margin),其中图像区包含包含绘图区。全局绘图参数marmai用来设置绘图区到图像区外边距之间的距离(我一般称为内边距,与外边距相对应),omromi用来设置图像区外边距,区别是maromr的单位是行数,maiomi的单位是英寸(一英寸约等于2.54厘米)。这四个参数均是长度为4的向量,依次分别表示下、左、上、右的边距大小。例如mai = c(0.5, 0.5, 0.5, 0.5)表示内边距从下、左、上、右均是0.5英寸。图形设备绘制时,先预留内、外边距,余下部分分配给绘图区。

par(
  mai = c(0.5, 0.5, 0.5, 0.5),
  omi = c(0.5, 0.5, 0.5, 0.5),
  family = 'Arial'
)
plot(1:3, type = 'n', axes = FALSE)
xy <- par('usr') #获取绘图区坐标轴范围
fin <- par('fin') #获取图像区尺寸(英寸)
#计算横坐标每坐标单位的绝对长度(英寸)
x_units <- (xy[2] - xy[1]) / (fin[1] - 1) 
#计算纵坐标每坐标单位的绝对长度(英寸)
y_units <- (xy[4] - xy[3]) / (fin[2] - 1)

#计算图像区的坐标范围
x_min_xpd <- xy[1] - 0.5 * x_units
x_max_xpd <- xy[2] + 0.5 * x_units
y_min_xpd <- xy[3] - 0.5 * y_units
y_max_xpd <- xy[4] + 0.5 * y_units

#分别添加绘图区、图像区阴影矩形
rect(xy[1], xy[3], xy[2], xy[4], col = '#66C2A544', xpd = TRUE)
rect(x_min_xpd, y_min_xpd, x_max_xpd, y_max_xpd, 
     col = '#FC8D6244', xpd = TRUE)

mtext('out figure margin', side = 1, line = 0.8, adj = 0, col = 'red', 
      cex = 1.2, outer = TRUE)
mtext('figure region', side = 1, line = 0.8, adj = 0, col = 'red', 
      cex = 1.2)
text(2, 2, labels = 'plot region',col = 'red')

mtext('mai[1]  mar[1]', side = 1, line = 0.7, adj = 0.5)
mtext('omi[1]  omr[1]', side = 1, line = 0.7, adj = 0.5, outer = TRUE)
mtext('mai[2]  mar[2]', side = 2, line = 0.7, adj = 0.5)
mtext('omi[2]  omr[2]', side = 2, line = 0.7, adj = 0.5, outer = TRUE)
mtext('mai[3]  mar[3]', side = 3, line = 0.7, adj = 0.5)
mtext('omi[3]  omr[3]', side = 3, line = 0.7, adj = 0.5, outer = TRUE)
mtext('mai[4]  mar[4]', side = 4, line = 0.7, adj = 0.5)
mtext('omi[4]  omr[4]', side = 4, line = 0.7, adj = 0.5, outer = TRUE)

mtext('line = 0, adj = 0, outer = FALSE', side = 3, line = 0,
      adj = 0, cex = 0.8, outer = FALSE)
mtext('line = 1, adj = 0, outer = FALSE', side = 3, line = 1,
      adj = 0, cex = 0.8, outer = FALSE)
mtext('line = 0, adj = 1, outer = FALSE', side = 3, line = 0,
      adj = 1, cex = 0.8, outer = FALSE)
mtext('line = 1, adj = 1, outer = FALSE', side = 3, line = 1,
      adj = 1, cex = 0.8, outer = FALSE)

mtext('line = 0, adj = 0, outer = TRUE', side = 3, line = 0,
      adj = 0, cex = 0.8, outer = TRUE)
mtext('line = 1, adj = 0, outer = TRUE', side = 3, line = 1,
      adj = 0, cex = 0.8, outer = TRUE)
mtext('line = 0, adj = 1, outer = TRUE', side = 3, line = 0,
      adj = 1, cex = 0.8, outer = TRUE)
mtext('line = 1, adj = 1, outer = TRUE', side = 3, line = 1,
      adj = 1, cex = 0.8, outer = TRUE)

绘图区,即坐标轴xlimylim所围成的区域,主要是基础绘图函数添加点、线、面、文本等绘图元素的区域。内边距一般是添加坐标轴、坐标轴标签、坐标轴标题、文本等元素的,当然,可以通过xpd = TRUE将坐标轴范围延伸到图像区。外边距只能通过mtext()添加文本(需要设置参数outer = TRUE)。下面代码展示了如何利用内边距在散点图顶端和和右端分别添加A和B的密度曲线。

library(scales)
par(
  mai = c(0.5, 0.5, 1.2, 1.2),
  omi = c(0.2, 0.2, 0.2, 0.2),
  family = 'Arial'
)
set.seed(20190412) ##设置随机数种子
A <- 0.5 * rnorm(100) + 1.2 * rt(100, df = 21)
B <- 1.7 * rt(100, df = 13) + 0.4 * rnorm(100, mean = 3, sd = 2.3)

dA <- unclass(density(A))
dB <- unclass(density(B))
plot(A, B, pch = 19, col = 'lightblue', cex.axis = 0.9, tcl = 0.2,
                  mgp = c(1.6, 0.3, 0))

xy <- par('usr')
fin <- par('fin')
x_units <- (xy[2] - xy[1]) / (fin[1] - 1.7)
y_units <- (xy[4] - xy[3]) / (fin[2] - 1.7)
top_range <- c(xy[4], xy[4] + 1.1 * y_units)
right_range <- c(xy[2], xy[2] + 1.1 * x_units)

A_top_x <- rescale(dA$x, to = range(A))
A_top_y <- rescale(dA$y, to = top_range) + yinch(0.05)

B_right_x <- rescale(dB$y, right_range) + xinch(0.05)
B_right_y <- rescale(dB$x, range(B))

lines(A_top_x, A_top_y, lty = 'solid', lwd = 1.5, 
      col = 'blue', xpd = TRUE)
lines(B_right_x, B_right_y, lty = 'solid', lwd = 1.5, 
      col = 'blue', xpd = TRUE)

绘图距离

第一类距离是依据绘图元素(主要是文本)计算的距离,例如文本的宽度、高度。graphics包提供了strwidth()strheight()函数计算文本的宽度和高度,结果依赖于文本缩放比例(cex)和字体(font)和返回值单位(units)等参数影响。当units = 'usr'units = 'inches'时返回值单位是英寸,前者受图形设备的影响;当units = 'figure'时返回值是图像区的比例。

第二类距离是文本行数控制的距离,这个在mtext()、坐标轴标题、标签、刻度控制上使用,如par(mgp = c(1.6, 0.3, 0))表示坐标轴标题和绘图区相距1.6行,坐标轴标签和坐标轴相距0.3行,坐标轴和绘图区相距0行。

基础绘图系统中有几个有用的转换函数,用的比较多的是xinch()yinch()函数,能把单位为英寸的绝对距离转换为坐标系统距离。下面的例子展示了如何利用距离函数精确控制绘图位置,代码有点多,不想一个个解释了。

par(
  mai = c(0, 0, 0, 0),
  omi = c(0, 0, 0, 0),
  family = 'Arial'
  )
plot.new()
plot.window(c(1, 3), c(1, 3))
width <- xinch(2.4)
height <- yinch(1.8)
box_x_min <- 2 - 1.2 * width
box_x_max <- 2 + 1.2 * width
box_y_min <- 2 - 0.9 * height
box_y_max <- 2 + 0.9 * height
rect(box_x_min, box_y_min, box_x_max, box_y_max, 
     col = '#FC8D6244', lty = 'dashed', lwd = 1.6)
x_width <- strwidth('1.8 inches width', units = 'inches')
y_width <- strwidth('1.2 inches height', units = 'inches')
x_lheight <- xinch(strheight('height', units = 'inches') + 0.2)
y_lheight <- yinch(strheight('height', units = 'inches') + 0.2)
x_str_width <- xinch(strwidth('2.4 inches width', units = 'inches') + 0.2)
y_str_width <- yinch(strwidth('1.8 inches height', units = 'inches') + 0.2)
segments(c(box_x_min, rep(box_x_max, 3)),
         c(rep(box_y_min, 2), box_y_min, box_y_max),
         c(box_x_min, box_x_max, rep(box_x_max + x_lheight, 2)),
         c(rep(box_y_min - y_lheight, 2), box_y_min, box_y_max), lwd = 1.6)
arrow_x_min <- c(2 - 0.5 * x_str_width, 2 + 0.5 * x_str_width,
                 rep(box_x_max + 0.5 * x_lheight, 2))
arrow_x_max <- c(box_x_min, box_x_max, rep(box_x_max + 0.5 * x_lheight, 2))
arrow_y_min <- c(rep(box_y_min - 0.5 * y_lheight, 2), 
                2 - 0.5 * y_str_width , 2 + 0.5 * y_str_width)
arrow_y_max <- c(rep(box_y_min - 0.5 * y_lheight, 2),
                 box_y_min, box_y_max)
arrows(arrow_x_min, arrow_y_min, arrow_x_max, arrow_y_max, length = 0.12,
       lwd = 1.6)
text(x = 2, y = box_y_min - 0.5 * y_lheight, labels = '2.4 inches width',
     adj = c(0.5, 0.5))
text(x = box_x_max + 0.5 * x_lheight, y = 2, labels = '1.8 inches width',
     adj = c(0.5, 0.5), srt = 90)

框线图绘图函数

今天的框线图,本质上就是带有数据标记点的线图,在社会调查中某个模块有一系列调查问题时,用框线图能直观呈现数据的分布情况。先造个适合画框线图的轮子。

函数总体结构不是很复杂,核心部分是需要根据左右标签长度计算左右的内边距,然后调用points()lines()函数添加数据点和线条,之后调用mtext()添加左右标签。注意,pchltyltwcolbg参数不支持自动循环补齐,当参数长度不为1且小于数据系列数(数据框的列数)时,会报错;默认颜色是当Set2系列,最大数量是8个,当数据系列高于8个时,随机从colors()函数抽取;pch在21-25之间时,背景色默认等于col参数。

profile.plot <- function(df, 
                         left.lab, # left labels
                         right.lab, # right lables
                         x.ticks.n = 5, # x axis ticks number
                         label.offset = 0.1, # space(inches) of labels and plot region
                         pch = 19, # points symbols
                         size = 1.5, # points size
                         lty = 'solid', # lines type
                         lwd = 2.5, # lines width
                         label.cex = 0.9, # label cex
                         reverse = FALSE, #if FALSE(default), start at bottom
                         axes = TRUE, # if TRUE(default), make x axis
                         grid = TRUE, # if TRUE(default), make vertical grid line
                         range.x = NULL, # the xlim
                         col = NULL, # points color
                                     # if pch in 21:25, points border color
                         bg = NULL,  # if pch in 21:25, points background color
                         midline = NULL, # NULL -> mid; number -> x; FALSE -> not draw  
                         title = NULL, # plot title
                         subtitle = NULL, # plot sub title
                         family = '', # font family
                         ... # any other par() params
)
{
  opar <- par('mai')
  on.exit(par(opar))
  n.col <- ncol(df)
  n.row <- nrow(df)
  if(!is.data.frame(df)) stop('df not a data.frame', call. = FALSE)
  if(n.row == 0) stop('df seems to a zero length data.frame', call. = FALSE)
  if(!is.null(pch)  && !any(length(pch) == 1, length(pch) == n.col)) {
    stop('params pch must be 1 or equeal the number of series', call. = FALSE)
  }
  if(length(pch) == 1) pch <- rep(pch, n.col)
  if(!is.null(lty)  && !any(length(lty) == 1, length(lty) == n.col)) {
    stop('params lty must be 1 or equeal the number of series', call. = FALSE)
  }
  if(length(lty) == 1) lty <- rep(lty, n.col)
  if(!is.null(col) && !any(length(col) == 1, length(col) == n.col)) {
    stop('params col must be 1 or equeal the number of series', call. = FALSE)
  }
  if(is.null(col)){
    if(n.col <= 8){
      col <- RColorBrewer::brewer.pal(8,'Set2')[1:n.col]
    }else{
      col <- sample(colors(), n.col)
    }
  if(length(col) == 1) col <- rep(col, n.col)
  }
  if(!is.null(bg) && !any(length(bg) == 1, length(bg) == n.col)) {
    stop('params bg must be 1 or equeal the number of series', call. = FALSE)
  }
  if(is.null(bg)) bg <- col
  if(length(bg) == 1) bg <- rep(bg, n.col)
  
  if(is.null(range.x)){
    min.x <- floor(min(df))
    max.x <- ceiling(max(df))
    mid.x <- (max.x + min.x) / 2
  }else{
    min.x <- floor(range.x[1])
    max.x <- ceiling(range.x[2])
    mid.x <- (max.x + min.x) / 2
  }
  x.ticks <- round(seq(min.x, max.x, length.out = x.ticks.n), digits = 3)
  plot.new()
  left.label.width <- max(strwidth(left.lab, units = 'inches', 
                                   cex = label.cex), na.rm = TRUE) + label.offset
  right.label.width <- max(strwidth(right.lab, units = 'inches', 
                                    cex = label.cex), na.rm = TRUE) + label.offset
  nmai <- par('mai')
  nmai[2L] <- left.label.width
  nmai[4L] <- right.label.width
  par(mai = nmai)
  plot.window(xlim = c(min.x, max.x), ylim = c(0.9, n.row))
  y <- 1:n.row
  if(grid){
    segments(min.x, y, max.x, y, lwd = lwd * 0.9, 
             lty="dotted", col="gray", ...)
    if(is.null(midline)){
      segments(mid.x, min(y), mid.x, max(y), 
             lwd = lwd * 0.9, lty="solid", col="gray", ...)  
    } 
    if(is.atomic(midline) && is.numeric(midline)){
      segments(midline, min(y), midline, max(y), 
               lwd = lwd * 0.9, lty="solid", col="gray", ...)  
    }
  }
  for (i in 1:n.col) {
    if(reverse){
      x <- rev(df[[i]])
    }else{
      x <- df[[i]]
    }
    points(x, y, cex = size, pch = pch[i], col = col[i], bg = bg[i], ...)
    lines(x, y, lty = lty[i], lwd = lwd, col = col[i], ...)
  }
  if(reverse){
    left.lab <- rev(left.lab)
    right.lab <- rev(right.lab)
  }
  mtext(left.lab, at = y, line = label.offset, cex = label.cex, col = 'grey30', 
        adj = 1, side = 2, las=2, family = family, ...)
  mtext(right.lab, at = y, line = label.offset, cex = label.cex, col = 'grey30', 
        adj = 0, side = 4, las=2, family = family, ...)
  if(axes){
    axis(side = 1, at = x.ticks, labels = x.ticks, col = 'grey30', ...)
  }
  title(main = title, sub = subtitle, ...)
  invisible(y)
}

画个图测试下profile.plot()函数的默认效果。一时没想到有什么数据特别适合用在这个图上,所有拿了mtcars数据集中比较接近的两列来测试下。第一个参数df是要画图的数据框,第二个参数left.lab是左侧文本标签,第三个参数left.lab是右侧文本标签,其它参数均是可选参数。

df <- mtcars[, 3:4]
par(
  mai = c(0.4, 0, 0.4, 0),
  omi = c(0.2, 0.2, 1, 0.2), 
  family = 'Arial')
profile.plot(df, left.lab = rownames(df), right.lab = rownames(df), 
             pch = c(21, 23), bg = c('red', 'blue'),
             x.ticks.n = 5, title = 'this is a profile plot')

框线图

乱写了一大串,也不知道讲清楚了还是没讲清,我倒是觉得可以长舒一口气,进入“诹图”环节了,不啰嗦了,还是先上图。

profileplot <- 'profileplot.pdf'
Cairo::CairoPDF(profileplot, bg = 'grey98', width = 11, height = 8)

# 准备数据(懒得找,纯手动)
data <- data.frame(
  Men = c(4.36, 4.08, 3.74, 5.49, 3.71, 5.35),
  Women = c(4.14, 3.87, 3.52, 5.11, 3.92, 5.17)
)
left_lable <- c(
  'Individuals should take more responsibility for\nproviding for themselves.',
  'People who are unemployed should\nhave to take any job available or lose\ntheir unemployment benefits.',
  'Competition is good. It stimulates\npeople to work hard and develop\nnew ideas.',
  'The government should give companies\nmore freedom.',
  'Incomes should be made more equal.',
  'Private ownership of business\nand industry should be increased.'
)
right_label <- c(
  'The state should take more\nresponsibility to ensure that everyone is\nprovided for.',
  'People who are unemployed should\nhave the right to refuse a job they\ndo not want.',
  'Competition is harmful. It brings\nout the worst in people.',
  'The state should control firms more\neffectively.',
  'There should be greater incentives\nfor individual effort.',
  'Government ownership of\nbusiness and industry should be\nincreased.'
)

# 全局绘图参数
opar <- par(no.readonly = TRUE)
par(mai = c(1, 0.1, 0.1, 0.1), # 第二个和第四参数随便设置
    omi = c(0.5, 0.4, 1.2, 0.4), # 第一个和第三个设置大一点,
                                # 预留数据来源和标题的距离。
    family = 'Arial')

# 绘图
y <- profile.plot(data, left_lable, right_label, range.x = c(3, 6), 
                  reverse = TRUE, cex.axis = 0.9, tcl = -0.2,
                  mgp = c(3, 0.5, 0))

# 其它图形元素
col <- RColorBrewer::brewer.pal(8,'Set2')[1:2]
legen.x <- c(3.05, 3.8)
legend.y <- c(0, 0)
points(legen.x, legend.y, pch = 19, cex = 2, col = col, xpd = TRUE)
text(legen.x + 0.1, legend.y, names(data), adj = 0, cex = 1, col = 'grey30', xpd = TRUE)
mtext('Now I’d ask you to please tell me your views on different statements.',
      side = 3, line = 2, adj = 0, cex = 1.5, font = 2, outer = TRUE)
mtext('Source: ZA4753: European Values Study 2008: Germany (EVS 2008). N=2,075',
      side = 1, line = 0, adj = 1, cex = 0.8, col = 'grey30', outer = TRUE)
dev.off()

有了前面造的轮子,绘图就来得容易多了,核心绘图代码15行,出图效果也比较好。调用基础绘图函数,我比较习惯手动添加图例、标题、副标题和数据来源之类的辅助元素,主要我觉得这样更容易控制一些,可能和我对legend()title()等函数的了解还不是很深入有关。

后记

这是一篇从筹划到完成花了近两周时间的文章,从出图的效果来说,我觉得还比较满意。最重要的是,为了这篇文章,各种翻阅文档,文档看不明白的反复测试,几乎重新学习一边基础绘图系统,对基础绘图系统的理解有了很大的进步。最后,还是得说,需要治愈下自己的拖延症,希望“诹图”就真的能每周一“诹”。