厚缊

诹图: An Introduction to ggcor (Part 7)

厚缊 / 2020-02-07


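I get the sense that what many readers hope for from ggcor is an easier way to produce the combination figure from Science. That is indeed where the ggcor project started, but as the author I still want to stress that ggcor is a complete system whose parts all depend on one another. To handle the combination figure well, you need to understand the rest of it too.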

Origins

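The Science figure can be broken into two parts. The upper-right corner is a correlation matrix heatmap, which looks like it was drawn with the corrplot package and then lightly post-processed. The lower-left corner holds the links showing the Mantel correlation between the community and each environmental factor. I have heard that the original authors drew this part with Cytoscape, which I do not really know.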

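Back in April 2019, not knowing my own limits, I saw this figure and tried it with the base graphics system (click here if you are not afraid of getting dizzy). It seemed doable, just rather tedious. Precisely because that version was so cumbersome and confused many friends in the field, I promised a more concise version, and that version is the ggcor you see now.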

The add_link() function

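The basic function is add_link(), which bundles the layers for the link elements in the lower-left corner of the original Science combination figure. add_link() therefore only works with ggplot objects or, more plainly, together with ggcor() or quickcor().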

library(ggcor)
args(add_link)
## function (df, mapping = NULL, spec.key = "spec", env.key = "env", 
##     curvature = NULL, spec.label.hspace = NULL, spec.label.vspace = 0, 
##     on.left = FALSE, diag.label = FALSE, extra.params = extra_params(), 
##     ...) 
## NULL

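The first argument of add_link() is a data frame describing the links; all other arguments are optional. When the input is not a mantel_tbl (that is, the result of mantel_test() or fortify_mantel()), you must set the spec.key and env.key arguments.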

data("varespec", package = "vegan")
data("varechem", package = "vegan")
df <- mantel_test(varespec, varechem)
quickcor(varechem, type = "upper") +
  geom_square() +
  add_link(df)
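
For readers unfamiliar with what a Mantel test reports, here is a minimal base-R sketch of the idea: the Mantel r is the Pearson correlation between two sets of pairwise distances, and the p-value comes from permuting the sites. The toy data, the Euclidean distances and the names spec_toy, env_toy, mantel_r and mantel_p are all assumptions made for illustration. This is not ggcor's implementation, and the distance measures and permutation settings used by mantel_test() may differ.

```r
## Toy data (made up): 10 sites, 4 species counts, one environmental variable.
set.seed(1)
spec_toy <- matrix(rpois(40, lambda = 5), nrow = 10)
env_toy  <- rnorm(10)

## Pairwise site distances for each table (Euclidean, purely for illustration).
d_spec <- dist(spec_toy)
d_env  <- dist(env_toy)

## Mantel r: Pearson correlation between the two sets of pairwise distances.
mantel_r <- cor(as.vector(d_spec), as.vector(d_env))

## Permutation p-value: shuffle the sites of one distance matrix and recompute r.
n_perm <- 999
m_spec <- as.matrix(d_spec)
perm_r <- replicate(n_perm, {
  idx <- sample(nrow(m_spec))
  cor(as.vector(as.dist(m_spec[idx, idx])), as.vector(d_env))
})
mantel_p <- (sum(perm_r >= mantel_r) + 1) / (n_perm + 1)
```

Each row of the data frame passed to add_link() carries one such r and p.value pair, for one community and one environmental variable.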

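The plot from this first add_link(df) call is probably a headache to look at: apart from a few pitch-black links there is nothing to see. That is because we have not set the mapping argument yet.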

library(ggplot2)
library(dplyr) ## provides %>% and mutate()
df <- df %>% 
  mutate(lty = cut(r, breaks = c(-Inf, 0, Inf), 
                   labels = c("r <= 0", "r > 0")),
         col = cut(p.value, breaks = c(0, 0.01, 0.05, 1),
                   labels = c("< 0.01", "< 0.05", ">= 0.05"),
                   right = FALSE, include.lowest = TRUE))
quickcor(varechem, type = "upper") +
  geom_square() +
  add_link(df, mapping = aes(colour = col,
                             size = r,
                             linetype = lty)) +
  scale_fill_gradient2n() +
  scale_size_area(max_size = 3) +
  scale_linetype_manual(values = c("dotted", "solid")) +
  guides(
    fill = guide_colourbar(title = "corr", order = 1),
    colour = guide_legend(title = "Mantel's p", order = 2),
    size = guide_legend(title = "Mantel's r", order = 3),
    linetype = "none"
  )

The plot can also be flipped to the other side.

quickcor(varechem, type = "lower") +
  geom_square() +
  add_link(df, mapping = aes(colour = col,
                             size = r,
                             linetype = lty)) +
  scale_fill_gradient2n() +
  scale_size_area(max_size = 3) +
  scale_linetype_manual(values = c("dotted", "solid")) +
  guides(
    fill = guide_colourbar(title = "corr", order = 1),
    colour = guide_legend(title = "Mantel's p", order = 2),
    size = guide_legend(title = "Mantel's r", order = 3),
    linetype = "none"
  )
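
The lty and col columns used in the two plots above come from cut(), whose interval rules are easy to misread. The sketch below runs the same two calls on made-up values (r_toy and p_toy are hypothetical, not Mantel output) to show which side of each break a boundary value falls on.

```r
## Hypothetical values, chosen to sit on and around the break points.
r_toy <- c(-0.2, 0, 0.35)
p_toy <- c(0.001, 0.01, 0.03, 0.05, 1)

## Default right = TRUE: intervals are (-Inf, 0] and (0, Inf], so r = 0 is "r <= 0".
lty_toy <- cut(r_toy, breaks = c(-Inf, 0, Inf),
               labels = c("r <= 0", "r > 0"))

## right = FALSE: intervals are [0, 0.01), [0.01, 0.05) and [0.05, 1];
## include.lowest = TRUE closes the last interval so that p = 1 is kept.
col_toy <- cut(p_toy, breaks = c(0, 0.01, 0.05, 1),
               labels = c("< 0.01", "< 0.05", ">= 0.05"),
               right = FALSE, include.lowest = TRUE)

data.frame(r_toy, lty_toy)
data.frame(p_toy, col_toy)
```

Without include.lowest = TRUE, a p-value of exactly 1 would become NA, and its link would lose its colour mapping.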

Fine-tuning the details

For a combination figure as complex as this one, getting the details right takes extra effort.

Diagonal labels

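When the correlation matrix has many variables, the axis labels need to go on the diagonal to make it easier to tell which pair of variables each link refers to. Whenever diagonal labels are used, you must set diag.label = TRUE in add_link().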

quickcor(varechem, type = "lower") +
  geom_square() +
  add_diag_label() +
  add_link(df, mapping = aes(colour = col,
                             size = r,
                             linetype = lty), diag.label = TRUE) +
  remove_axis("y") +
  scale_fill_gradient2n() +
  scale_size_area(max_size = 3) +
  scale_linetype_manual(values = c("dotted", "solid")) +
  guides(
    fill = guide_colourbar(title = "corr", order = 1),
    colour = guide_legend(title = "Mantel's p", order = 2),
    size = guide_legend(title = "Mantel's r", order = 3),
    linetype = "none"
  )

Link markers

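Link markers comprise the marker points and the community text labels. All of them are adjusted through the extra.params argument, which can only be set via the extra_params() function; its usage is much like theme() in ggplot2.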

args(extra_params)
## function (spec.label = text_params(), spec.point = point_params(fill = "blue"), 
##     env.point = point_params(fill = "grey60"), link.params = link_params()) 
## NULL
extra_params()
## $spec.label
## $colour
## [1] "black"
## 
## $size
## [1] 3.88
## 
## $angle
## [1] 0
## 
## $hjust
## NULL
## 
## $vjust
## [1] 0.5
## 
## $alpha
## [1] NA
## 
## $family
## [1] ""
## 
## $fontface
## [1] 1
## 
## attr(,"class")
## [1] "text_params"
## 
## $spec.point
## $alpha
## [1] NA
## 
## $colour
## [1] "black"
## 
## $fill
## [1] "blue"
## 
## $shape
## [1] 21
## 
## $size
## [1] 1
## 
## $stroke
## [1] 0.5
## 
## attr(,"class")
## [1] "point_params"
## 
## $env.point
## $alpha
## [1] NA
## 
## $colour
## [1] "black"
## 
## $fill
## [1] "grey60"
## 
## $shape
## [1] 21
## 
## $size
## [1] 1
## 
## $stroke
## [1] 0.5
## 
## attr(,"class")
## [1] "point_params"
## 
## $link.params
## $env.point.hjust
## NULL
## 
## $env.point.vjust
## NULL
## 
## $spec.point.hjust
## NULL
## 
## $spec.point.vjust
## NULL
## 
## attr(,"class")
## [1] "link_params"
## 
## attr(,"class")
## [1] "extra_params"

Text label size and colour

The community text labels are controlled by spec.label, which is set with text_params().

extra.params <- extra_params(
  spec.label = text_params(colour = "red", size = 7)
)
quickcor(varechem, type = "lower") +
  geom_square() +
  add_diag_label() +
  add_link(df, mapping = aes(colour = col,
                             size = r,
                             linetype = lty), 
           diag.label = TRUE, extra.params = extra.params) +
  remove_axis("y") +
  scale_fill_gradient2n() +
  scale_size_area(max_size = 3) +
  scale_linetype_manual(values = c("dotted", "solid")) +
  guides(
    fill = guide_colourbar(title = "corr", order = 1),
    colour = guide_legend(title = "Mantel's p", order = 2),
    size = guide_legend(title = "Mantel's r", order = 3),
    linetype = "none"
  )

Marker points comprise those on the side next to the correlation matrix (called env.point in ggcor) and the community marker points (called spec.point in ggcor).

extra.params <- extra_params(
  spec.label = text_params(colour = "red", size = 7),
  env.point = point_params(size = 3, fill = "grey80"),
  spec.point = point_params(size = 4, shape = 24, fill = "red")
)
quickcor(varechem, type = "lower") +
  geom_square() +
  add_diag_label() +
  add_link(df, mapping = aes(colour = col,
                             size = r,
                             linetype = lty), 
           diag.label = TRUE, extra.params = extra.params) +
  remove_axis("y") +
  expand_axis(x = 16) + ## expand the x-axis range
  scale_fill_gradient2n() +
  scale_size_area(max_size = 3) +
  scale_linetype_manual(values = c("dotted", "solid")) +
  guides(
    fill = guide_colourbar(title = "corr", order = 1),
    colour = guide_legend(title = "Mantel's p", order = 2),
    size = guide_legend(title = "Mantel's r", order = 3),
    linetype = "none"
  )

Adjusting link positions

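The plots above all share one problem: the text labels are covered, which makes the figure look messy. The fix is to shift the start points of the links according to the length of the text labels, again through the extra.params argument.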

extra.params <- extra_params(
  spec.label = text_params(colour = "red", size = 7),
  env.point = point_params(size = 2, fill = "grey80"),
  spec.point = point_params(size = 4, shape = 24, fill = "red"),
  link.params = link_params(env.point.hjust = -0.5,
                            env.point.vjust = -0.1,
                            spec.point.hjust = 1)
)
quickcor(varechem, type = "lower") +
  geom_square() +
  add_link(df, mapping = aes(colour = col,
                             size = r,
                             linetype = lty), 
           diag.label = TRUE, spec.label.hspace = 0.5,
           extra.params = extra.params) +
  add_diag_label(angle = 45) +
  remove_axis("y") +
  expand_axis(x = 16) + ## expand the x-axis range
  scale_fill_gradient2n() +
  scale_size_area(max_size = 3) +
  scale_linetype_manual(values = c("dotted", "solid")) +
  guides(
    fill = guide_colourbar(title = "corr", order = 1),
    colour = guide_legend(title = "Mantel's p", order = 2),
    size = guide_legend(title = "Mantel's r", order = 3),
    linetype = "none"
  )

Link colours

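Looking it over, the plot above has one more flaw: the non-significant links (p >= 0.05) are too brightly coloured, and a less prominent grey would suit them better. Anyone who knows a little ggplot2 will see that this is easy: just set the colours with scale_colour_manual().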

extra.params <- extra_params(
  spec.label = text_params(colour = "red", size = 7),
  env.point = point_params(size = 2, fill = "grey80"),
  spec.point = point_params(size = 4, shape = 24, fill = "red"),
  link.params = link_params(env.point.hjust = -0.5,
                            env.point.vjust = -0.1,
                            spec.point.hjust = 1)
)
quickcor(varechem, type = "lower") +
  geom_square() +
  add_link(df, mapping = aes(colour = col,
                             size = r,
                             linetype = lty), 
           diag.label = TRUE, spec.label.hspace = 0.5,
           extra.params = extra.params) +
  add_diag_label(angle = 45) +
  remove_axis("y") +
  expand_axis(x = 16) + ## expand the x-axis range
  scale_fill_gradient2n() +
  scale_colour_manual(values = c("#D95F02", "#1B9E77", "#D2D2D2")) +
  scale_size_area(max_size = 3) +
  scale_linetype_manual(values = c("dotted", "solid")) +
  guides(
    fill = guide_colourbar(title = "corr", order = 1),
    colour = guide_legend(title = "Mantel's p", order = 2),
    size = guide_legend(title = "Mantel's r", order = 3),
    linetype = "none"
  )
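
The unnamed colour vector above is matched to the levels of col by position, so it silently depends on the level order. One optional variation, sketched here as a suggestion rather than the author's method, is to name the vector so that ggplot2 matches colours to levels by name. The names p_levels and pal are hypothetical.

```r
## Levels of the `col` factor, as defined by the labels in the cut() call.
p_levels <- c("< 0.01", "< 0.05", ">= 0.05")

## Named palette: each colour is tied to a level explicitly.
pal <- setNames(c("#D95F02", "#1B9E77", "#D2D2D2"), p_levels)

## The grey is now attached to the non-significant class by name.
pal[[">= 0.05"]]

## In the plot, the named vector would replace the unnamed one:
## scale_colour_manual(values = pal)
```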

Multiple communities

Multiple communities are handled no differently from the single community shown above, so one example will do.

df02 <- mantel_test(varespec, varechem, 
                    spec.select = list(spec01 = 1:5,
                                       spec02 = 6:12,
                                       spec03 = 13:17)) %>% 
  mutate(lty = cut(r, breaks = c(-Inf, 0, Inf), 
                   labels = c("r <= 0", "r > 0")),
         col = cut(p.value, breaks = c(0, 0.01, 0.05, 1),
                   labels = c("< 0.01", "< 0.05", ">= 0.05"),
                   right = FALSE, include.lowest = TRUE))
extra.params <- extra_params(
  spec.label = text_params(colour = "red", size = 7),
  env.point = point_params(size = 2, fill = "grey80"),
  spec.point = point_params(size = 4, shape = 24, fill = "red"),
  link.params = link_params(env.point.hjust = -0.5,
                            env.point.vjust = -0.1,
                            spec.point.hjust = 1)
)
quickcor(varechem, type = "lower") +
  geom_square() +
  add_link(df02, mapping = aes(colour = col,
                               size = r,
                               linetype = lty),
           diag.label = TRUE, spec.label.hspace = 0.5,
           extra.params = extra.params) +
  add_diag_label(angle = 45) +
  remove_axis("y") +
  expand_axis(x = 25) + ## expand the x-axis range
  scale_fill_gradient2n() +
  scale_colour_manual(values = c("#D95F02", "#1B9E77", "#D2D2D2")) +
  scale_size_area(max_size = 3) +
  scale_linetype_manual(values = c("dotted", "solid")) +
  guides(
    fill = guide_colourbar(title = "corr", order = 1),
    colour = guide_legend(title = "Mantel's p", order = 2),
    size = guide_legend(title = "Mantel's r", order = 3),
    linetype = "none"
  )
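
The spec.select list above names three groups of species columns. As a rough illustration of that grouping idea only (not of how mantel_test() works internally), the base-R sketch below splits a made-up community table into sub-tables using a list of the same shape. The table comm and the result groups are hypothetical.

```r
## Made-up community table: 6 sites by 17 species.
set.seed(42)
comm <- as.data.frame(matrix(rpois(6 * 17, lambda = 3), nrow = 6))

## Same structure as the spec.select argument: group name -> column indices.
spec.select <- list(spec01 = 1:5,
                    spec02 = 6:12,
                    spec03 = 13:17)

## One sub-table per named group; drop = FALSE keeps each one a data frame.
groups <- lapply(spec.select, function(cols) comm[, cols, drop = FALSE])

## Number of species columns in each group.
sapply(groups, ncol)
```

Presumably each group is then tested against every environmental variable, which is why the plot above shows one set of links per group name.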

Data not from a Mantel test

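Other data sets work too. Here we mock up an example with a correlation analysis. The key settings are the spec.key and env.key arguments, which specify which variable in the data set is treated as the community and which one corresponds to the environmental correlation matrix.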

df03 <- fortify_cor(varespec[1:4], varechem, cor.test = TRUE) %>% 
  mutate(p.value = cut(p.value, breaks = c(-Inf, 0.05, Inf),
                       labels = c("<0.05", ">=0.05"), right = FALSE))
quickcor(varechem, type = "upper") + geom_square() + 
  add_link(df03, mapping = aes(colour = p.value, size = r), alpha = 0.5,
           spec.key = ".row.names", env.key = ".col.names",
           diag.label = TRUE) +
  scale_size_continuous(range = c(0.2, 1.5))  +
  scale_color_manual(values = c("#D95F02", "#CCCCCC")) +
  add_diag_label() + 
  expand_axis(x = -5) +
  guides(colour = guide_legend(title = "p value", order = 2), 
         size = guide_legend(title = "correlation", order = 3)) +
  remove_axis("y")
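
To make the role of spec.key and env.key more concrete, the sketch below builds a long-format link table by hand from base R's cor(), using the built-in mtcars data as a stand-in. The column names .row.names and .col.names simply mirror the ones referenced in the call above; the real fortify_cor() output may contain further columns, and this is not its implementation.

```r
## Correlations between two column sets: rows play the "spec" role,
## columns play the "env" role.
m <- cor(mtcars[1:2], mtcars[3:5])

## Reshape the matrix to long format: one row per (row, column) pair.
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c(".row.names", ".col.names", "r")

long
```

In a table like this, spec.key would name the column holding the link sources and env.key the column whose values match the variables of the plotted correlation matrix.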

Asymmetric correlation matrices

I think the asymmetric case is rare, but since the feature exists, here is a brief look at how to handle it.

corr <- fortify_cor(varechem, varespec[1:6])
extra.params <- extra_params(
  spec.label = text_params(colour = "red", size = 7),
  env.point = point_params(size = 2, fill = "grey80"),
  spec.point = point_params(size = 4, shape = 24, fill = "red"),
  link.params = link_params(spec.point.hjust = 3)
)
quickcor(corr) +
  geom_square() +
  add_link(df02, mapping = aes(colour = col,
                               size = r,
                               linetype = lty),
           spec.label.hspace = 0.5,
           extra.params = extra.params) +
  expand_axis(x = 15) +
  scale_fill_gradient2n() +
  scale_colour_manual(values = c("#D95F02", "#1B9E77", "#D2D2D2")) +
  scale_size_area(max_size = 3) +
  scale_linetype_manual(values = c("dotted", "solid")) +
  guides(
    fill = guide_colourbar(title = "corr", order = 1),
    colour = guide_legend(title = "Mantel's p", order = 2),
    size = guide_legend(title = "Mantel's r", order = 3),
    linetype = "none"
  )

The links can also be placed on the other side.

extra.params <- extra_params(
  spec.label = text_params(colour = "red", size = 7),
  env.point = point_params(size = 2, fill = "grey80"),
  spec.point = point_params(size = 4, shape = 24, fill = "red"),
  link.params = link_params(spec.point.hjust = -3)
)
quickcor(corr, axis.y.position = "right") +
  geom_square() +
  add_link(df02, mapping = aes(colour = col,
                               size = r,
                               linetype = lty),
           on.left = TRUE, spec.label.hspace = -0.5,
           extra.params = extra.params) +
  expand_axis(x = -9) +
  scale_fill_gradient2n() +
  scale_colour_manual(values = c("#D95F02", "#1B9E77", "#D2D2D2")) +
  scale_size_area(max_size = 3) +
  scale_linetype_manual(values = c("dotted", "solid")) +
  guides(
    fill = guide_colourbar(title = "corr", order = 1),
    colour = guide_legend(title = "Mantel's p", order = 2),
    size = guide_legend(title = "Mantel's r", order = 3),
    linetype = "none"
  )

Summary

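This post may look tedious, since the code for nearly every plot differs only slightly from the previous one. I still gave almost the full code for every plot (preferring repetition) because I think later readers who land on any one plot should be able to reproduce it right away, without paging back and forth to find where a variable was defined.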

Coming next: processing correlation network data.