Study population and setting
This study used SARS-CoV-2 sequences collected worldwide over time and deposited in the GISAID database (gisaid.org). The authors used a sequence analysis pipeline to rapidly identify mutations in the virus over time. They then tried to determine whether particular viral variants possess a biologically relevant advantage (e.g., greater transmission potential) that might explain changes in their frequency among infected populations. Additional sequencing data from COVID-19 patients in Sheffield, England were used to compare temporal changes in variant frequency with patterns observed in other countries, and to test whether viral variants influence patients' clinical outcomes.
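A minimal sketch can illustrate the basic frequency-over-time step that such an analysis rests on. It is not the authors' pipeline. It assumes each GISAID sequence has already been reduced to a collection date and the amino acid called at Spike position 614. The function name `monthly_frequency` and the toy records are hypothetical.

```python
from collections import Counter
from datetime import date

def monthly_frequency(records, target="G", alleles=("D", "G")):
    """Fraction of sequences per calendar month carrying `target` at Spike 614.

    Hypothetical helper for illustration. `records` is an iterable of
    (collection_date, residue) pairs. Residues outside `alleles` (e.g. "X"
    for an ambiguous call) are skipped so they do not distort the denominator.
    """
    counts = Counter()  # (year, month, residue) -> number of sequences
    for collected, residue in records:
        if residue not in alleles:
            continue
        counts[(collected.year, collected.month, residue)] += 1

    months = sorted({(year, month) for year, month, _ in counts})
    result = {}
    for year, month in months:
        total = sum(counts[(year, month, a)] for a in alleles)
        result[(year, month)] = counts[(year, month, target)] / total
    return result

# Toy, made-up records for illustration only (not GISAID data).
records = [
    (date(2020, 3, 2), "D"),
    (date(2020, 3, 20), "G"),
    (date(2020, 4, 1), "G"),
    (date(2020, 4, 9), "G"),
    (date(2020, 4, 30), "X"),  # ambiguous call, skipped
]
print(monthly_frequency(records))  # {(2020, 3): 0.5, (2020, 4): 1.0}
```

A rising monthly fraction like this only describes the trend. On its own, it cannot distinguish selection from founder effects or uneven sampling across regions.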
Summary of Main Findings
The authors identified a mutation in the SARS-CoV-2 spike protein, D614G, that has circulated widely in infected populations since it first emerged in January 2020. The rise in frequency of the G614 variant relative to the D614 variant is observed in multiple countries and states in Europe, Asia, and North America, and is mirrored in the data from Sheffield, England. Within the Sheffield cohort, there was no significant association between D614G genotype and clinical outcome (hospitalized versus not hospitalized). There was, however, a significant association with the cycle threshold for viral detection: lower cycle threshold values in G614 cases suggest potentially higher viral loads in patients carrying this variant. The authors also identified potential recombination events in SARS-CoV-2, which could indicate mixed infections in which distinct viral variants exchanged genetic material.
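Interpreting lower cycle threshold (Ct) values as higher viral loads depends on the exponential amplification in RT-PCR. Under a standard idealization, assume an amplification efficiency E per cycle (E = 1 means the target doubles every cycle) and otherwise comparable assays, specimen types, and extraction. The relative amount of target RNA in two samples A and B is then approximately:

```latex
\frac{N_A}{N_B} \approx (1 + E)^{\,C_{t,B} - C_{t,A}},
\qquad
E = 1 \;\Rightarrow\; \frac{N_A}{N_B} \approx 2^{\,C_{t,B} - C_{t,A}}
```

Here N_A and N_B denote the starting amounts of target RNA, and C_{t,A} and C_{t,B} denote the corresponding cycle thresholds (notation introduced for illustration). Under perfect doubling, a sample whose Ct is 3 cycles lower than another's contains roughly 2^3 = 8 times as much target RNA. The relation holds only when samples are otherwise comparable. Assay, specimen quality, and time since symptom onset can all shift Ct independently of the infecting variant.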
The dataset used for the study was extensive, and the researchers have significant experience working with large-scale viral sequence data. The pipeline for rapidly identifying emerging mutations is robust and appears useful. Pairing this mutation identification with preliminary structural analysis can also help highlight mutations that merit further examination. The Sheffield data were also important in supporting a biological explanation for the rapid increase in frequency of the G614 variant.
A key limitation of the approach described here is that it does not account for other factors that might play a critical role in the global spread of different viral variants. In particular, the authors claim that the increasing frequency of the G614 variant is due to higher transmissibility. Potentially higher viral loads in patients infected with the G614 variant support this claim, but other equally plausible explanations exist for the observed patterns. One is a pervasive founder effect. Under this explanation, the G614 variant increased in frequency by chance after its introduction into Europe from China, and repeated introductions from Europe to other countries around the world then reinforced it. Without controlling for these neutral factors through more complex modeling, many of the paper's claims about the emergence of “dominant” strains are unfounded.
Structural modeling is an interesting addition to this analysis pipeline, but it should be used only to generate questions to be explored functionally. Based on these structural models and some clinical data, the authors speculate that the mutations they identify may confer critical biological advantages. However, establishing that the G614 variant is more transmissible would require much denser longitudinal sequence data from patients, or infection experiments in cell culture or live animals. In addition, the results indicating lower cycle threshold values for the G614 variant do not account for the timing of sample collection relative to symptom onset. Because viral load changes over the course of infection, differences in sampling time alone could produce differences in cycle threshold. Additional sampling or experiments will be needed to verify these results and to establish whether potentially higher viral loads in patients with the G614 variant translate into greater transmission.
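A small neutral simulation can make the founder-effect alternative concrete. The sketch below is a hypothetical illustration, not a model used by the authors and not fit to SARS-CoV-2 data. A handful of introductions drawn from a source population seeds each simulated region. After that, the variant's frequency changes only through Wright-Fisher drift, with no fitness difference between D614 and G614. All function names and parameter values are assumptions chosen for illustration: a 30% source frequency, 5 founders, 500 genomes per generation, and 30 generations.

```python
import random
from statistics import mean

def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (stdlib only)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def neutral_founder_run(rng, source_freq, founders, pop_size, generations):
    """One region: a founding bottleneck followed by neutral Wright-Fisher drift.

    No selection is modelled: each generation resamples pop_size genomes
    using only the previous generation's variant frequency.
    """
    # Founding bottleneck: `founders` introductions sampled from the source.
    freq = binomial(rng, founders, source_freq) / founders
    for _ in range(generations):
        if freq in (0.0, 1.0):
            break  # variant lost or fixed; drift cannot change it further
        freq = binomial(rng, pop_size, freq) / pop_size
    return freq

def simulate_regions(n_regions=200, source_freq=0.3, founders=5,
                     pop_size=500, generations=30, seed=1):
    """Final variant frequency in each of n_regions independent regions.

    Default parameters are hypothetical, not estimates for SARS-CoV-2.
    """
    rng = random.Random(seed)
    return [neutral_founder_run(rng, source_freq, founders, pop_size, generations)
            for _ in range(n_regions)]

finals = simulate_regions()
# Under neutrality the average stays near source_freq...
print(f"mean final frequency: {mean(finals):.2f}")
# ...yet some regions end up dominated by the variant purely by chance.
print(f"regions where the variant exceeds 80%: {sum(f > 0.8 for f in finals)}")
```

Averaged over regions, the variant's frequency stays near its source frequency, as expected with no fitness difference. Even so, some individual regions end up with the variant near or at fixation purely by chance. The sketch deliberately omits epidemic growth, sampling bias, and repeated introductions from a high-frequency source. Repeated introductions would pull regional frequencies further toward the source frequency. Separating processes like these from selection is the job of the more complex modeling the review calls for.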
This study provides an interesting and novel set of tools that researchers can use to rapidly examine the large volume of SARS-CoV-2 sequence data being generated daily. It also offers a preliminary identification of potentially important mutations as they arise.