
Sixteen novel lineages of SARS-CoV-2 in South Africa

Our take —

This study describes the emergence of sixteen novel SARS-CoV-2 lineages in South Africa. These lineages arose from localized outbreaks during the lockdown phase of the pandemic response. The early phase of the South African epidemic was dominated by three of these lineages (B.1.1.54, B.1.1.56 and C.1), and C.1 became the predominant lineage by August 2020. Of note, this study precedes the emergence of B.1.351 (501Y.V2), which has recently received international attention because of findings suggesting reduced vaccine effectiveness against this variant. This work also describes how real-time data from large-scale genomic surveillance were used to contain an outbreak associated with a novel lineage, illustrating the value these methods add to the public health toolkit.

Study design

Retrospective Cohort

Study population and setting

The aim of this study was to understand the early transmission dynamics and molecular evolution of SARS-CoV-2 in South Africa. The study included 1,365 SARS-CoV-2 genome sequences sampled between March and August 2020 from eight of South Africa's nine provinces, including all districts of KwaZulu-Natal (KZN). A global dataset of 5,848 representative genomes served as background sequences for the analysis. Phylogenetic analysis was used to identify introductions of SARS-CoV-2, describe early transmission dynamics, and characterize novel viral lineages in South Africa.

Summary of Main Findings

SARS-CoV-2 was first identified in South Africa on March 5, 2020, and by November 2020 more than 785,000 cases had been confirmed in the country. This study identified at least 101 independent introductions of SARS-CoV-2 into South Africa, largely originating from Europe before the national lockdown began on March 26, 2020. Although lockdown measures were effective in limiting further introductions, localized outbreaks gave rise to South Africa-specific SARS-CoV-2 lineages. Sixteen new South African lineages were identified in this study, most of which carried mutations not observed in other countries. The early South African epidemic was dominated by three of these novel lineages (B.1.1.54, B.1.1.56 and C.1), which together represented ~42% of COVID-19 cases. Approximate viral loads for these lineages did not differ significantly from those of wild-type virus. This implies that their expansion reflected failure to contain localized (largely nosocomial) outbreaks rather than a fitness advantage such as increased transmissibility. The C.1 lineage, which carries 16 mutations (including the spike mutation D614G), became predominant in South Africa by August 2020. Notably, another early lineage (B.1.106) was associated with nosocomial outbreaks in KZN. However, integrating genomic surveillance into outbreak investigation and containment enabled an effective public health response, which ultimately led to this lineage’s extinction.

Study Strengths

This study includes a large dataset of whole-genome sequences from South Africa, with sampling designed to be as random as possible in both geography and time. Phylogenetic data were analyzed in the context of key pandemic events, such as known outbreaks and the implementation of public health measures including the national lockdown.

Study Limitations

The geographic distribution of collected samples was not entirely random: only eight of nine provinces were sampled, and KZN was better represented than other provinces. The temporal distribution of samples was also skewed, with less coverage during the earliest stages of the pandemic. Lineage-specific viral load approximations were based on samples collected at different timepoints in infection, which may have limited the ability to detect differences between novel lineages.

Value added

This work improves our understanding of the dynamics of the COVID-19 epidemic in South Africa and illustrates how genomic surveillance can help identify and track the spread of novel lineages. Additionally, this work shows the value of using these data in real time to identify and effectively contain localized outbreaks caused by unique lineages.

This review was posted on: 12 March 2021