
Phylogenetic network analysis of SARS-CoV-2 genomes

Our take —

This study used publicly available SARS-CoV-2 virus genomes to draw conclusions about the different types of virus circulating in this outbreak, and how these types are spreading to different parts of the world. However, the methodology used was relatively simplistic, and the conclusions were based on a specific relationship between human and bat viruses that is not accurately represented in this study. Other studies using methods developed specifically for viral sequence data (and many more sequences) better represent the different types of virus circulating, and how they are circulating across the globe.

Study design


Study population and setting

This study involved the analysis of 160 previously published viral genetic sequences from all over the world. These sequences were from humans, pangolins, and one bat infected with SARS-CoV-2 between December 2019 and March 4, 2020.

Summary of Main Findings

The primary result of this study is a network analysis of the viral sequence data, from which the authors identified three main types of virus (called “A”, “B”, and “C”) circulating in early 2020. They found that type A is most similar to the most closely related bat virus and that, while types A and C are found in a number of patients outside of East Asia, type B is found predominantly in East Asian patients. The authors also analyzed viral sequences from a number of known transmissions (e.g., the first Brazilian patient, who had a history of travel to Italy) and found that, in all cases, the network analysis confirmed these suggested cross-country links.

Study Strengths

Due to serious concerns about the methods and conclusions in this study noted by numerous members of the viral phylogenetics community, it is difficult to identify specific strengths.


Limitations

The primary limitation of this paper is that the connection between the 160 human virus genomes and the bat virus is misrepresented, and possibly incorrect. Although the paper depicts only 16 mutations between the bat virus and the human viruses, there are actually over 1000 mutations between the most closely related bat coronavirus and the viruses currently circulating in humans. Because the bat virus is so different, it is difficult to determine which of clusters A, B, or C is most closely related to it. Additionally, this paper employs a relatively simple network analysis and does not take advantage of state-of-the-art methods that can more accurately determine relationships between virus sequences.

Value added

This paper adds limited value beyond confirming epidemiological links between virus cases in different countries. Even so, the language used to describe these links is very definitive and does not effectively communicate the caveats of this type of analysis.