Molecular Architecture of Early Dissemination and Massive Second Wave of the SARS-CoV-2 Virus in a Major Metropolitan Area

Our take —

This study used genetic sequences to understand the introduction of SARS-CoV-2 into Houston, Texas, a large metropolitan area in the United States. The authors generated over 5,000 genomes, found evidence for multiple introductions of the virus from many parts of the world, and pointed out key differences in the individuals affected during the first (March-May 2020) and second (May-July 2020) waves of infection. They also find evidence that patients infected with virus carrying the D614G mutation have higher viral loads and argue that this supports the hypothesis that the mutation makes the virus more transmissible, but they do not specify when during infection the samples were taken. In general, the breadth of the data and analyses presented in this paper is impressive, but some analyses lack nuance and sufficient validation.

Study design


Study population and setting

This study investigated two waves of COVID-19 infection in Houston, Texas, an ethnically diverse region of the United States. The authors generated and analyzed 5,085 SARS-CoV-2 genomes from samples collected between March 5 and July 7, 2020, drawn from a pool of over 55,000 patients within the Houston Methodist Hospital system. The authors also experimentally analyzed synthetic spike protein constructs in the lab to evaluate the functional effects of specific mutations observed in their genomes. They focus in particular on the D614G mutation, which has been observed in a large proportion of viruses from recent cases in the United States and Europe, prompting speculation that viruses carrying this mutation may be more transmissible.

Summary of Main Findings

This study contains three primary findings: (1) Multiple strains of SARS-CoV-2 were introduced into the Houston area in March 2020 from diverse geographic regions; (2) There were two waves (i.e., peaks) of cases, with the second wave affecting younger individuals with fewer comorbidities. The second wave consisted almost exclusively of strains carrying a much-noted mutation in the virus's spike protein (D614G), whereas 82% of genomes from the first wave carried it; (3) None of the 5,085 genomes had mutations at sites known to confer resistance to the drug remdesivir, and the D614G spike mutation was associated with higher viral loads, which the authors suggest means this variant may be better able to enter host cells and spread through human populations.
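
The wave-to-wave comparison of D614G frequency can be illustrated with a standard two-proportion z-test. This is a minimal sketch under stated assumptions: the review does not say which test the authors used, and the counts in the usage line are hypothetical, chosen only to mirror the reported 82% first-wave frequency and a near-100% second-wave frequency.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test of H0: p1 == p2 using the pooled proportion.

    x1/n1 and x2/n2 are D614G-positive genomes over all genomes in each wave.
    The test treats each wave's genomes as an independent random sample,
    which is the kind of assumption the reviewers question.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the z-test is undefined")
    z = (p2 - p1) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # HYPOTHETICAL counts: 820/1000 in wave 1 (82%), 3990/4000 in wave 2.
    z, p = two_proportion_z_test(820, 1000, 3990, 4000)
    print(f"z = {z:.1f}, p = {p:.1e}")
```

With these hypothetical counts the p-value is vanishingly small, but the test only shows that the sampled proportions differ; on its own it cannot distinguish a transmission advantage from differences in epidemic dynamics or sampling between the two waves.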

Study Strengths

This study is the largest SARS-CoV-2 genomic study in the United States to date, analyzing data from over 5,000 patients. The large number of genomes allowed the authors to comprehensively characterize the mutations circulating in the region. Additionally, the authors' molecular experiments provide much-needed data on the potential functional implications of the D614G spike protein mutation.

Study Limitations

This study has several limitations. First, the authors claim their sequences are representative of COVID-19 cases in Houston but do not provide even basic demographic information on the infected individuals. Second, they claim that the increase in the share of wave 2 cases carrying the D614G spike variant is statistically significant, but they do not address the fact that this comparison assumes similar epidemic dynamics during the two waves. They also report that viral load is higher in patients with the D614G variant but do not address or correct for when during each patient's course of infection the sample was taken; this matters because several studies have already shown a clear correlation between viral load and days since symptom onset. Finally, they provide very few citations to previous work in their introduction (only in the discussion do they give context for some of their findings), and they did not make their genomic data publicly available at the time of publication, hindering further research on this topic.
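
The timing concern can be made concrete with a toy example. In this minimal sketch, which uses entirely hypothetical records rather than the study's data, viral load (in log10 units) depends only on days since symptom onset, declining by an arbitrary 0.3 units per day, and the D614G samples simply happen to be collected earlier in infection. A naive comparison then shows higher viral loads for D614G even though the variant has no effect by construction, while comparing the two groups within days-since-onset strata removes the spurious difference. Stratification is used here only because it is simple; regression adjustment for days since onset would be an equally reasonable choice, and neither is presented as the authors' or the reviewers' method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout: (has_d614g, days_since_onset, log10_viral_load)
Record = tuple[bool, int, float]


def simulate(days_by_group: dict[bool, list[int]]) -> list[Record]:
    """Build toy records where viral load depends ONLY on days since onset."""
    records: list[Record] = []
    for has_d614g, days_list in days_by_group.items():
        for days in days_list:
            records.append((has_d614g, days, 8.0 - 0.3 * days))
    return records


def naive_difference(records: list[Record]) -> float:
    """Mean viral load of D614G samples minus mean of D614 samples, ignoring timing."""
    g = [vl for has_g, _, vl in records if has_g]
    d = [vl for has_g, _, vl in records if not has_g]
    return mean(g) - mean(d)


def stratified_difference(records: list[Record]) -> float:
    """Average within-stratum (same days since onset) D614G-minus-D614 difference,
    weighting each stratum that contains both variants equally."""
    strata: dict[int, dict[bool, list[float]]] = defaultdict(lambda: {True: [], False: []})
    for has_g, days, vl in records:
        strata[days][has_g].append(vl)
    diffs = [mean(s[True]) - mean(s[False]) for s in strata.values() if s[True] and s[False]]
    if not diffs:
        raise ValueError("no stratum contains both variants")
    return mean(diffs)


if __name__ == "__main__":
    # D614G samples are mostly early in infection; D614 samples are mostly late.
    toy = simulate({True: [1, 1, 2, 2, 3, 5], False: [3, 5, 5, 6, 6, 7]})
    print(f"naive difference:      {naive_difference(toy):.2f}")
    print(f"stratified difference: {stratified_difference(toy):.2f}")
```

With these toy inputs the naive difference is about 0.9 log10 units while the stratified difference is zero, the pattern one would expect if a viral-load gap were driven by sampling time rather than by the variant.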

Value added

This study provides a large number of SARS-CoV-2 genome sequences from a diverse metropolitan area in the United States. These data provide a comprehensive picture of the strains circulating in Houston, Texas, between March and July 2020, and show that there were multiple introductions from around the globe. The absence of known resistance mutations also suggests that the widely used drug remdesivir is likely to be effective against all strains circulating in the region. Finally, the authors conduct experiments on the spike protein of the virus and evaluate the potential effect of the D614G mutation on the virus's transmissibility.

This review was posted on 5 October 2020.