On the origin and continuing evolution of SARS-CoV-2

Our take

In an analysis of available SARS-CoV-2 genomes, the authors identified two genomic types, L and S. However, their description of the L type as more “aggressive” should not be read as meaning it is more virulent or pathogenic. The L type's higher frequency early in the outbreak may reflect a higher transmission rate, but it may equally be explained by neutral effects and sampling bias, so this finding should be interpreted with caution.

Study design

Ecological; Modeling/Simulation; Other

Study population and setting

The study uses genetic data from SARS-CoV-2 infecting humans and from related coronaviruses in animals: the SARS-CoV-2 reference genome (NC_045512), human SARS-CoV, four bat SARS-related coronaviruses (SARSr-CoV: RaTG13, ZXC21, ZC45, and BM48-31), one pangolin SARSr-CoV from Guangdong (GD), and six pangolin SARSr-CoV genomes from Guangxi (GX). An additional analysis focused on 103 publicly available SARS-CoV-2 genomes (from GISAID) sampled from patients in and outside Wuhan.

Summary of Main Findings

The authors find that, based on its genome sequence, SARS-CoV-2 is most closely related to the bat SARS-related coronavirus RaTG13 (differing by ~4% on average across the whole genome), but that the divergence is larger at synonymous sites, i.e., nucleotide positions where changes do not alter the encoded protein and are therefore thought to be largely free of natural selection. The authors also find that critical amino acid residues in the SARS-CoV-2 spike protein are more similar to those of a pangolin coronavirus than to those of bat SARSr-CoV RaTG13. However, based on differences in the amino acids adjacent to these key residues, this similarity likely arose through independent adaptation of the viruses in pangolins and in humans. Finally, their analysis of 103 SARS-CoV-2 genomes revealed two distinct types (L and S) that differ at two nucleotide positions. The L type was more common in Wuhan early in the outbreak (before January 7, 2020) than after that date or in locations outside Wuhan.
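The two-site typing described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline. It assumes genomes are already aligned to the NC_045512 reference, so that 1-based position p sits at string index p - 1, and it takes the defining sites to be positions 8782 and 28144, with L = C8782/T28144 and S = T8782/C28144, as we understand them from the original paper; these should be checked against the study before any reuse. The helper names (`classify_type`, `type_frequencies`, `make_toy`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed defining sites (1-based positions on the NC_045512 reference).
SITE_A, SITE_B = 8782, 28144
# Assumed allele combinations: L = C8782/T28144, S = T8782/C28144.
TYPE_BY_ALLELES = {("C", "T"): "L", ("T", "C"): "S"}


def classify_type(aligned_seq: str) -> str:
    """Return 'L', 'S', or 'other' from the bases at the two defining sites.

    Assumes aligned_seq is already aligned to the reference, so that
    1-based position p is at string index p - 1.
    """
    alleles = (aligned_seq[SITE_A - 1].upper(), aligned_seq[SITE_B - 1].upper())
    return TYPE_BY_ALLELES.get(alleles, "other")


def type_frequencies(seqs: Iterable[str]) -> Dict[str, float]:
    """Fraction of sequences assigned to each type (prevalence in this sample only)."""
    counts = Counter(classify_type(s) for s in seqs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: n / total for t, n in counts.items()}


def make_toy(base_a: str, base_b: str, length: int = 29903) -> str:
    """Build a hypothetical all-'N' sequence with chosen bases at the two sites."""
    seq = ["N"] * length
    seq[SITE_A - 1] = base_a
    seq[SITE_B - 1] = base_b
    return "".join(seq)


if __name__ == "__main__":
    toy = [make_toy("C", "T"), make_toy("C", "T"), make_toy("T", "C"), make_toy("N", "T")]
    print(type_frequencies(toy))  # {'L': 0.5, 'S': 0.25, 'other': 0.25}
```

`type_frequencies` only reports how common each type is in whatever genomes happen to have been sequenced; on its own, such a prevalence figure says nothing about whether one type transmits better than the other.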

Study Strengths

Provides an in-depth analysis of patterns of natural selection on the genomes of SARS-CoV-2 and related coronaviruses.

Limitations

This study provides evidence that multiple genetic types of SARS-CoV-2 are circulating. However, the data presented do not support the authors' inference that the L type has a higher transmission rate than, or is more “aggressive” than, the S type. Furthermore, describing a virus as “aggressive” may mislead readers into thinking the L type is more virulent, when the intended meaning is higher transmissibility. Without patient data to assess transmission rates, the authors rely on prevalence alone to infer fitness differences between the types, and this inference does not account for neutral processes such as genetic drift and founder effects (as different types are seeded into different countries). In addition, insufficient and biased sampling early in the outbreak is likely a major confounder of the prevalence estimates and of any inference about transmissibility drawn from them.
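To illustrate why prevalence alone is weak evidence of a fitness difference, the Python sketch below simulates a variant with no fitness advantage at all under a simple Wright-Fisher model of genetic drift, preceded by a founder bottleneck in which a new location is seeded by only a few infections. This is a deliberately simplified toy, not a model of SARS-CoV-2 epidemiology: the source frequency, number of founders, population size, and number of generations are hypothetical illustration values, and the function names are our own.

```python
import random


def wright_fisher(pop_size: int, start_freq: float, generations: int, rng: random.Random) -> list:
    """Frequency trajectory of a selectively neutral variant under pure drift.

    Each generation, every one of pop_size genomes inherits the variant with
    probability equal to its current frequency (binomial sampling). No fitness
    difference between variants is modelled.
    """
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = count / pop_size
        trajectory.append(freq)
    return trajectory


def founder_then_drift(source_freq: float, n_founders: int, pop_size: int,
                       generations: int, rng: random.Random) -> list:
    """Founder effect followed by drift in a newly seeded location.

    The starting frequency is itself a small random sample of n_founders
    infections drawn from a source population with frequency source_freq.
    """
    founders = sum(1 for _ in range(n_founders) if rng.random() < source_freq)
    return wright_fisher(pop_size, founders / n_founders, generations, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # Ten hypothetical locations, all seeded from the same source (70% variant).
    finals = [founder_then_drift(0.7, 5, 200, 20, rng)[-1] for _ in range(10)]
    print([round(f, 2) for f in finals])
```

Because every variant here is neutral by construction, any spread in the final frequencies across the simulated locations arises purely from founder sampling and drift, which is the kind of alternative explanation the prevalence-based inference would need to rule out.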

Value added

The study indicates that SARS-CoV-2 is evolving, predominantly under neutral processes, giving rise to identifiable genomic types.