Edited by: Alfredo Pulvirenti, University of Catania, Italy
Reviewed by: Gregorio Iraola, Institut Pasteur de Montevideo, Uruguay; Rifat Hamoudi, University of Sharjah, United Arab Emirates
This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Venn diagrams are widely used to show set relationships in biomedical studies. In this study, we developed ggVennDiagram, an R package that automatically generates high-quality Venn diagrams with two to seven sets. ggVennDiagram is built on ggplot2 and integrates the advantages of existing packages, such as venn, RVenn, VennDiagram, and sf. Satisfactory results can be obtained with minimal configuration. Furthermore, we designed comprehensive objects to store the entire data of the Venn diagram, which allow free access to both intersection values and Venn plot sub-elements, such as set label/edge and region label/filling. Therefore, every Venn plot sub-element can be highly customized at no additional learning cost for users who are familiar with ggplot2 methods. To date, ggVennDiagram has been cited in more than 10 publications, and its source code repository has been starred by more than 140 GitHub users, suggesting great potential in applications. The package is open-source software released under the GPL-3 license, and it is freely available through CRAN (
A Venn diagram is a widely used diagram that shows the relationships between multiple sets. In biomedical studies, Venn diagrams are frequently used to distinguish the membership of various types of data, such as compounds, genes, pathways, and species. When there are fewer than five sets, Venn diagrams are probably the most intuitive form of data visualization, superior to heat maps and tables.
In the R environment, one of the most popular platforms for biomedical data visualization, many packages are available to plot a Venn diagram, including VennDiagram (
Feature comparisons of currently available Venn plot tools (R packages and web tools).
ggVennDiagram | Fully support | Yes | List | Yes | Yes | Circle, ellipse, and others | 2–7 | Set edge/label, region filling/label | |
VennDiagram | No | Yes | List | No | No | Circle, ellipse | 2–5 | Set edge/label/filling/area, region label | |
colorfulVennPlot | No | No | Named vector | No | Yes | Circle, ellipse | 2–4 | Set label, region filling/label | |
venn | No | No | List, formula, set number, Boolean values | No | Yes | Circle, ellipse, and others | 2–7 | Set edge/label, region filling/label | |
nVennR | Partial | Yes | List | Yes | No | Irregular polygon (calculated) | 2–many | Set edge/filling/area, region label | |
eulerr | No | No | List, data frame, table, matrix, named vector | No | No | Circle, ellipse | 2–4, maybe many | Set label/filling/area, region label | |
venneuler | No | No | Formula, matrix, character vector | No | No | Circle | 2–4, maybe many | Set label/filling/area | |
RVenn | No | Yes | Venn object (derived from list) | No | No | Circle | 2–3 | Set filling/edge | |
gplots | No | Yes | List, data frame | No | No | Circle, ellipse | 2–5 | Set label, region label | |
InteractiVenn | na | Yes | List (web interface) | na | No | Circle, ellipse, and Edwards | 2–6 | Set label/filling, region label | |
Venny | na | Yes | List (web interface) | na | Yes | Circle, ellipse | 2–4 | Set label, region label/filling |
However, the above-mentioned software packages also have their disadvantages. First, although they can display the original sets, they are limited in showing the differences between the regions of a Venn diagram. colorfulVennPlot and venn do support region filling, but users need to specify a color for every region manually, which is too complicated for ordinary users. Besides, most of these packages lack full support for the grammar of graphics and therefore do not integrate well into the popular ggplot2 ecosystem. In addition, some packages require obscure input formats, so preparing suitable input data is time-consuming.
Considering this, we developed ggVennDiagram, an intuitive, easy-to-use, and customizable R package for generating Venn diagrams, which supports two- to seven-set Venn plots and generates publication-quality figures with minimal input. Furthermore, we developed a comprehensive Venn data structure to simplify the extension of Venn diagrams and to make new presentations of the diagram easy to add in the future.
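As an illustration of the "minimal input" described above, the following is a minimal sketch of a single call to the main function on a named list. The gene names, set sizes, and seed are made up for this example, and the call is guarded so that the script still runs where the package is not installed.

```r
# Simulated input (hypothetical data): four named sets drawn from a gene pool
set.seed(1)
genes <- paste0("gene", 1:1000)
gene_list <- list(
  A = sample(genes, 100),
  B = sample(genes, 200),
  C = sample(genes, 300),
  D = sample(genes, 200)
)

# One call with default settings; the list names are used as set names
if (requireNamespace("ggVennDiagram", quietly = TRUE)) {
  p <- ggVennDiagram::ggVennDiagram(gene_list)
  print(p)
}
```

The only required argument here is the list itself; everything else is left at the package defaults.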
The main function “ggVennDiagram()” accepts a list input and outputs a
Data pre-processing can then be divided into two procedures: shape generation, which defines the edges of Venn sets and regions, and region value calculation, which determines the items in each region and performs the necessary statistics, such as counting and calculating percentages.
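The region value calculation can be sketched in base R. This is not the package's internal code; `region_items()` is a hypothetical helper and the three small sets are invented. It only shows the idea: a region holds the items that belong to every set of one membership pattern and to no other set, and counts and percentages follow from that.

```r
# Hypothetical helper: items found in all sets flagged TRUE in `member`
# and in none of the sets flagged FALSE
region_items <- function(sets, member) {
  inside  <- Reduce(intersect, sets[member])
  outside <- unlist(sets[!member], use.names = FALSE)
  setdiff(inside, outside)
}

sets <- list(
  A = c("g1", "g2", "g3", "g4"),
  B = c("g3", "g4", "g5"),
  C = c("g4", "g5", "g6")
)

# Enumerate all 2^n - 1 non-empty membership patterns
n <- length(sets)
patterns <- expand.grid(rep(list(c(FALSE, TRUE)), n))
patterns <- patterns[rowSums(patterns) > 0, , drop = FALSE]

# Items per region, named by the sets that define the region
regions <- lapply(seq_len(nrow(patterns)), function(i) {
  region_items(sets, unlist(patterns[i, ]))
})
names(regions) <- apply(patterns, 1, function(m) {
  paste(names(sets)[m], collapse = "/")
})

# Region statistics: item count and share of all distinct items
count <- lengths(regions)
percent <- round(100 * count / sum(count), 1)
region_table <- data.frame(region = names(regions), count = count,
                           percent = percent, row.names = NULL)
print(region_table)
```

Because every item falls into exactly one region, the counts sum to the number of distinct items across all sets, which is a convenient sanity check.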
Since the returned data after data pre-processing are compatible with the
Design of ggVennDiagram.
In ggVennDiagram, we treat all edges, labels, and polygons as simple features, a standard that describes how real-world objects can be represented in computers, with emphasis on the spatial geometry of these objects. A total of 15 types of simple features are implemented in R, three of which are used to describe all the components of a Venn diagram.
Firstly, the edges of sets are inherited from
To simplify the calculation of simple features, we introduce an S4 class
The shape used in a Venn diagram with fewer than four sets can be a simple structure, such as a circle or an ellipse, but when the Venn diagram has more than four sets, irregular polygons are required. It is hard to generate irregular polygons with simple geometric functions. Therefore, ggVennDiagram ships with a built-in, preprocessed shape data set imported from venn, VennDiagram, and some online materials, which increases the efficiency of shape generation on the user side.
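To illustrate why circles and ellipses count as simple structures, the sketch below generates a closed outline from a parametric equation in base R. `ellipse_outline()` is a hypothetical helper written for this example and is not part of ggVennDiagram; irregular shapes for many-set diagrams have no comparable closed form, which is the motivation for shipping precomputed coordinates.

```r
# Hypothetical helper: closed outline of an ellipse (a circle when a == b),
# centred at (x0, y0) and rotated by `angle` radians
ellipse_outline <- function(x0 = 0, y0 = 0, a = 1, b = a, angle = 0, n = 100) {
  # n + 1 points so that the first and last point coincide (closed ring)
  t <- seq(0, 2 * pi, length.out = n + 1)
  x <- a * cos(t)
  y <- b * sin(t)
  data.frame(
    x = x0 + x * cos(angle) - y * sin(angle),
    y = y0 + x * sin(angle) + y * cos(angle)
  )
}

# Two overlapping unit circles, as in a two-set Venn diagram
circle_a <- ellipse_outline(x0 = -0.5)
circle_b <- ellipse_outline(x0 = 0.5)

# A tilted ellipse of the kind used in four-set layouts
ellipse_c <- ellipse_outline(a = 2, b = 1, angle = pi / 4)
```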
Region value calculation depends on the RVenn package and new functions written on its defined
After data pre-processing, ggVennDiagram calls native ggplot2 functions to draw Venn diagrams in four layers (
Plotting method of ggVennDiagram. The default manner
As has been noted above, a set of built-in shapes from ggVennDiagram is used to plot the Venn diagram. By default, only the most appropriate shape is used when the main function “ggVennDiagram()” is called. However, other applicable shapes can be specified in a stepwise plot, which has been described in the previous section (
Application of new shapes and support for a Venn diagram of up to seven sets in ggVennDiagram.
From version 1.0, ggVennDiagram supports Venn diagrams with up to seven sets (
To date, there are three major methods to display set relationships: Venn diagram, Euler diagram, and UpSet plot (
The first version of ggVennDiagram was released on October 9th, 2019 (version 0.3). Since then, it has been applied to many biomedical research fields. For example,
Plots generated by the tools listed in
Additionally, ggVennDiagram takes the lead in the following three aspects of data processing capacity. (1) We can get access to region members by querying the
Furthermore, ggVennDiagram is superior in four aspects of visualization. (1) Region filling allows the user to easily identify the differences between various parts of the Venn diagram, and this is one of the key features of ggVennDiagram. Although several other tools have this feature, only ggVennDiagram is fully automatic since it is driven by ggplot2’s aesthetic mapping. (2) The ggVennDiagram has built-in shapes consisting of circles, ellipses, and others. Besides, we also provide functions to help users to import self-defined shapes (
Notably, several tools support both Venn and Euler diagrams. However, an Euler diagram has two shortcomings: firstly, it is area proportional, but the human eye is less sensitive to area than to color; secondly, it shows only the relevant relationships, and it is sometimes impossible to show all intersection regions using only simple geometric shapes, such as circles and ellipses. Therefore, we consider color filling more appropriate for displaying the differences between regions in ordinary biomedical studies.
Overall, ggVennDiagram integrates and optimizes Venn diagram plotting methods, exhibiting multiple advantages over existing tools. Compared with web tools, R scripts are easier to integrate into existing bioinformatics analysis pipelines to automate and batch the drawing of Venn diagrams. Therefore, it is necessary and useful to develop ggVennDiagram.
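A minimal sketch of color filling as an encoding of region differences is given below. The three integer sets and the colors are arbitrary choices for this example. It assumes that the region fill is mapped to the region count, so that an ordinary ggplot2 continuous fill scale can restyle it. The call is guarded so that it is skipped where the packages are not installed.

```r
# Hypothetical input: three small integer sets
x <- list(A = 1:5, B = 2:7, C = 5:10)

if (requireNamespace("ggVennDiagram", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  # Region fill follows the region count; the gradient is a standard ggplot2 scale
  p_fill <- ggVennDiagram::ggVennDiagram(x, label = "count") +
    ggplot2::scale_fill_gradient(low = "white", high = "firebrick")
  print(p_fill)
}
```

Regions with more members appear darker, so differences between regions can be read from color rather than from area.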
The ggVennDiagram R package is open source and freely available on CRAN (
C-HG, GY, and PC wrote this manuscript. C-HG implemented this package with the help of GY. PC supervised the project. All authors contributed to the article and approved the submitted version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
This work was supported by the National Natural Science Foundation of China (32100090, 41877029, and 41961130383), Royal Society-Newton Advanced Fellowship (NAFR1191017), the National Key Research Program of China (2020YFC1806803), Wuhan Applied Foundational Frontier Project (2019020701011469), and Fundamental Research Funds for the Central Universities (2662021JC012).
We thank Adrian Duşa for letting us reuse the “venn:::sets” data in his venn package, which is critical to enabling five- to seven-set Venn diagrams in ggVennDiagram. We also thank the GitHub user Yi Liu (@liuyigh) for his contribution to code curation. We are very grateful to the linguist Ping Liu from Huazhong Agricultural University, Wuhan, China, for her work on English editing and language polishing.