
Protein structure and sequence reanalysis of 2019-nCoV genome refutes snakes as its intermediate host and the unique similarity between its spike protein insertions and HIV-1

Our take —

Reanalysis of two controversial publications using additional data refutes the hypotheses that SARS-CoV-2 was engineered to contain pieces of HIV-1 and that it uses snakes as an intermediate host. Available evidence suggests that SARS-CoV-2 probably originated in bats, with the involvement of one or more mammalian intermediate hosts. Additionally, the study finds that comparing host and virus codon usage is not specific enough to determine intermediate hosts of coronaviruses.

Study design

Ecological, Modeling/simulation, Other

Study population and setting

The authors present a reanalysis of data from three recent sources: 1) similarities in genetic sequences within the spike protein of SARS-CoV-2 and human immunodeficiency virus (HIV-1), published by Pradhan et al. (now withdrawn); 2) identification of potential intermediate hosts of SARS-CoV-2 by comparing the ways that the virus and animals use their genetic code to produce proteins, specifically their relative synonymous codon usage, published by Ji et al.; and 3) metagenomic reads from Malayan pangolins, produced by multiple research groups, which the authors assembled into a draft coronavirus genome.
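As a rough illustration of what a relative synonymous codon usage (RSCU) comparison involves, the sketch below computes RSCU for a coding sequence and a distance between two RSCU profiles. The function names and the choice of Euclidean distance are assumptions made for illustration; the summary does not state which distance measure or software either study used.

```python
from collections import Counter
from math import sqrt

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """RSCU per codon: observed count divided by the mean count
    of all codons that encode the same amino acid."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # skip stop codons and single-codon amino acids (Met, Trp)
        total = sum(counts[c] for c in family)
        for c in family:
            values[c] = counts[c] * len(family) / total if total else 0.0
    return values


def rscu_distance(profile_a, profile_b):
    """Euclidean distance between two RSCU profiles (an illustrative choice)."""
    return sqrt(sum((profile_a[c] - profile_b[c]) ** 2 for c in profile_a))


# Toy sequences only; real analyses use full sets of coding sequences.
virus = rscu("GCTGCTGCCAAAAAG")
host = rscu("GCCGCCGCTAAGAAG")
print(rscu_distance(virus, host))
```

A small distance only means that two codon usage profiles look alike; as the study argues, that similarity is not specific enough on its own to identify an intermediate host.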

Summary of Main Findings

The authors found that the genetic sequences within the spike protein share no significant similarity with HIV-1 (contradicting Pradhan et al.); rather, all four sequences were close matches to other viruses, and three out of four matched exactly with sequences in a coronavirus from a bat. The reanalysis of codon usage between SARS-CoV-2 and potential intermediate hosts was performed using a more complete database than that used by Ji et al. and additional coronaviruses for comparison. The authors found that the most probable intermediate hosts for SARS-CoV-2, SARS-CoV, and MERS-CoV based on codon usage are frogs, which are not known to be involved in any way with the life history of these viruses, thus calling into question the biological validity of relying on codon usage for identifying intermediate hosts. Finally, they assembled the available pangolin coronavirus sequences into a draft genome, with 73% coverage and 91% sequence identity (92% for the spike protein) compared to the SARS-CoV-2 genome.
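To make the coverage and identity figures concrete, here is a minimal sketch of one common way to compute them from a pairwise alignment. The function name and the exact definitions (coverage as the fraction of reference positions with an aligned draft base, identity as the fraction of those positions that match) are assumptions; the summary does not say how the authors defined either quantity.

```python
def coverage_and_identity(reference_aln, draft_aln):
    """Return (coverage, identity) for two equal-length gapped alignment rows.

    coverage: fraction of reference positions that have a called draft base
    identity: fraction of covered positions where the two bases agree
    """
    if len(reference_aln) != len(draft_aln):
        raise ValueError("aligned rows must have equal length")

    ref_len = covered = matches = 0
    for r, d in zip(reference_aln.upper(), draft_aln.upper()):
        if r == "-":
            continue  # insertion in the draft relative to the reference
        ref_len += 1
        if d not in "-N":  # gap or uncalled base means no coverage here
            covered += 1
            if r == d:
                matches += 1

    coverage = covered / ref_len if ref_len else 0.0
    identity = matches / covered if covered else 0.0
    return coverage, identity


# Toy alignment: 6 of 10 reference positions are covered, 5 of those match.
cov, ident = coverage_and_identity("ACGTACGTAC", "ACGAAC----")
print(f"coverage={cov:.2f} identity={ident:.2f}")
```

Under these definitions, a draft genome can show high identity over the regions it covers while still leaving a substantial part of the reference unassembled.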

Study Strengths

The authors make a clear and well-supported argument against the claims presented in the Pradhan et al. and Ji et al. studies. They reexamine the spike protein sequences with broader search parameters than the original Pradhan et al. study. For the codon usage analysis, they considered a broader diversity of viruses (adding SARS-CoV and MERS-CoV) and of potential intermediate hosts, and the codon usage database they used was more recently updated than the one used by Ji et al.

Limitations
Regarding the analysis of pangolin coronavirus metagenomes, the phylogenetic distance between the pangolin coronavirus and SARS-CoV-2 is still too great to implicate pangolins as the intermediate hosts of the virus. More surveying must be done in bats, pangolins, and other mammals to identify the zoonotic source of SARS-CoV-2 in humans.

Value added

This study discredits two controversial hypotheses regarding the origin of SARS-CoV-2 that emerged early in the outbreak and generated significant media attention.