Study population and setting
This study describes the identification of a novel SARS-CoV-2 lineage, B.1.x (sometimes referred to as B.1.321.1), in Santa Cruz County, CA, USA, in early 2021. Phylogenetic analysis was performed using consensus sequences from SARS-CoV-2-positive residual samples (n=339) and 1,000 randomly selected global background sequences. Similar sequences were retrieved from GenBank and GISAID for comparison. The growth rate of the B.1.x lineage was estimated using a simple logistic regression model.
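A simple logistic growth estimate of this kind can be illustrated with a minimal sketch. It assumes the data are grouped into sampling batches (days since the first sample, number of B.1.x-positive sequences, total sequences) and fits logit(p) = b0 + b1 × t by Newton-Raphson, where b1 is the logistic growth rate per day. The function name `fit_logistic_growth` and the counts in the usage example are hypothetical and for illustration only. This is not the authors' code, and their exact model specification may differ.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping log-odds to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic_growth(days, lineage_counts, totals, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * day to grouped counts by Newton-Raphson.

    days:           sampling time of each batch (e.g. days since first sample)
    lineage_counts: lineage-positive sequences in each batch
    totals:         sequences in each batch
    Returns (b0, b1); b1 is the logistic growth rate per day.
    """
    # Start from the pooled log-odds with zero slope.
    pooled = sum(lineage_counts) / sum(totals)
    b0, b1 = math.log(pooled / (1.0 - pooled)), 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, y, n in zip(days, lineage_counts, totals):
            p = sigmoid(b0 + b1 * t)
            resid = y - n * p          # observed minus expected count
            w = n * p * (1.0 - p)      # binomial variance (information weight)
            g0 += resid
            g1 += resid * t
            h00 += w
            h01 += w * t
            h11 += w * t * t
        # Newton step: solve the 2x2 system (information) * delta = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT the study's data.
    days = [0, 14, 28, 42, 56]
    positives = [1, 2, 4, 6, 9]
    totals = [90, 85, 80, 75, 70]
    b0, b1 = fit_logistic_growth(days, positives, totals)
    print(f"growth rate {b1:.3f}/day; odds double every {math.log(2) / b1:.1f} days")
```

Under this model the odds that a sequence belongs to the lineage are multiplied by e^b1 each day, so ln 2 / b1 is the doubling time of those odds. With only a handful of positive sequences per batch, the uncertainty in b1 is large.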
Summary of Main Findings
More than half of the sequences in this dataset belonged to lineages B.1.427 and B.1.429, both first identified in California. Two B.1.1.7 sequences were also found, but no other CDC-designated variants of concern (VOC) were identified. However, eight samples (2.4%), collected in February and March 2021, appeared to represent a new lineage within B.1, which the authors provisionally refer to as B.1.x pending a more refined classification. The prevalence of B.1.x increased over time, from 1% in January 2021 to 10% in March 2021. Sequences similar to B.1.x were also identified in more than 20 US states and 6 countries. (Of note, some UK sequences had been submitted under lineage B.1.321.1.)
Lineage-defining point mutations for B.1.x include six in the spike protein (S494P, N501Y, D614G, P681H, K854N, and E1111K) and N:M234I in the nucleocapsid protein. Although several of these mutations are shared with other VOC, it appears unlikely that B.1.x arose from a recombination event. B.1.x sequences also carry a large 35-base-pair deletion in ORF8; because 35 is not a multiple of 3, the deletion shifts the reading frame and results in a premature stop codon. The biological significance of ORF8 inactivation, which is also present in B.1.1.7, is still unknown. However, because the deletion produces a frameshift, submissions of B.1.x sequences to sequence repositories are automatically rejected. Successful submission requires additional, lengthy manual curation steps that many labs elect not to complete, choosing instead to abandon submission or to modify the sequences (e.g., replacing deleted residues with N's) to bypass quality control. As a result, B.1.x and other lineages with frameshift mutations may be underrepresented in sequence databases, limiting the ability to estimate their impact on the pandemic accurately. The authors suggest adding a rapid phylogenetic analysis step to the submission process so that closely related novel sequences can validate each other at the time of submission.
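The frameshift arithmetic, and why N-padding slips past a length-based check, can be sketched as follows. The rule in `toy_qc_passes` is a hypothetical simplification, not the actual validation logic used by GenBank or GISAID, and the 66-nt sequence is a toy open reading frame rather than the real ORF8.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def causes_frameshift(deletion_length: int) -> bool:
    """A deletion shifts the downstream reading frame unless its length is a multiple of 3."""
    return deletion_length % 3 != 0


def has_internal_stop(orf: str) -> bool:
    """True if an in-frame stop codon occurs before the last codon of the ORF."""
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def toy_qc_passes(orf: str) -> bool:
    """Hypothetical submission check: length divisible by 3 and no internal stop."""
    return len(orf) % 3 == 0 and not has_internal_stop(orf)


if __name__ == "__main__":
    toy_orf = "ATG" + "GCT" * 20 + "TAA"   # 66-nt toy ORF, not real ORF8
    start, length = 3, 35                  # hypothetical 35-nt deletion after the start codon
    deleted = toy_orf[:start] + toy_orf[start + length:]
    masked = toy_orf[:start] + "N" * length + toy_orf[start + length:]
    print(causes_frameshift(length))  # True: 35 is not a multiple of 3
    print(toy_qc_passes(deleted))     # False: 31 nt, reading frame broken
    print(toy_qc_passes(masked))      # True: N-padding restores the original length
```

The masked sequence passes only because padding with N's restores the original length, which is why such workarounds hide the deletion rather than describe it.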
Study Strengths
Routine genomic surveillance by whole-genome sequencing identified a new SARS-CoV-2 lineage harboring several mutations found in other VOC.
Limitations
The number of B.1.x sequences in this dataset is small (n=8), and none were detected at the last two study timepoints; both factors limit the accuracy of the growth estimates. Additionally, the samples are not a random sample of the region's population. Because samples were anonymized and lacked covariate data, the growth rate of B.1.x could only be estimated with a simple logistic regression model. The functional relevance of the combination of mutations found in B.1.x was not assessed.
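To illustrate how little precision eight sequences provide, the sketch below computes a Wilson score 95% interval for the overall B.1.x proportion (8 of 339 samples). This standard binomial interval is swapped in for illustration only; it is not an analysis reported by the authors, and it ignores the time structure on which the growth estimate depends.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # 8 of the 339 sequenced samples were assigned to B.1.x in this dataset.
    low, high = wilson_interval(8, 339)
    print(f"point estimate {8 / 339:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

The interval spans roughly 1.2% to 4.6%, so even the pooled proportion is uncertain by a factor of nearly four; counts at individual timepoints are smaller still.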
Value added
This study describes the identification of a new SARS-CoV-2 lineage (B.1.x) through genomic surveillance in early 2021. B.1.x carries a large deletion that causes a frameshift and inactivates ORF8. Because initial submissions of sequences containing frameshift deletions are automatically rejected, this lineage may be underrepresented in sequence databases, which illustrates a limitation in our ability to accurately monitor the spread of some SARS-CoV-2 lineages and VOC.
This review was posted on: 14 May 2021