Recovery of Deleted COVID-19 Data From NIH Database, Removal Request Made by Chinese Scientist

Some early SARS-CoV-2 viral sample data was deleted from an NIH database and later recovered, providing additional insights into the origins of COVID-19. (Image: Brett Sayles via Pexels)

Some of the earliest viral samples from the Coronavirus Disease 2019 (COVID-19) pandemic submitted by a Chinese researcher were removed from a shared database maintained by the U.S. National Institutes of Health (NIH), according to a report by Fred Hutchinson Cancer Center researcher Jesse Bloom.

In a preprint posted on bioRxiv, Bloom wrote that the data missing from the NIH database included viral samples obtained from patients in Wuhan who were either suspected of having COVID-19 or hospitalized with the infection. Bloom said he recovered the deleted files from Google Cloud and reconstructed partial sequences of 13 early epidemic viruses.

After conducting a phylogenetic analysis of these sequences, Bloom concluded that “the Huanan Seafood Market sequences that are the focus of the joint WHO-China report are not fully representative of the viruses in Wuhan early in the epidemic. Instead, the progenitor of known SARS-CoV-2 sequences likely contained three mutations relative to the market viruses that made it more similar to SARS-CoV-2’s bat coronavirus relatives.”

“Although events that led to emergence of #SARSCoV2 in Wuhan are unclear (zoonosis vs lab accident), everyone agrees deep ancestors are coronaviruses from bats… Therefore, we’d expect the first #SARSCoV2 sequences would be more similar to bat coronaviruses, and as #SARSCoV2 continued to evolve it would become more divergent from these ancestors. But that is *not* the case,” Bloom said in a series of tweets.

Instead, Bloom found, the early viruses from the Huanan Seafood Market are more distant from bat coronaviruses than SARS-CoV-2 viruses collected later in China and elsewhere. In addition, these early viruses were circulating in Wuhan even before December, when the first seafood market outbreak was reported.

Bloom says his findings have three broader implications. “First, [the] fact [that] this dataset was deleted should make us skeptical that all other relevant early Wuhan sequences have been shared. We already know many labs in China [were] ordered to destroy early samples… Sequence sharing could be further limited by [the] fact that scientists in China are under an order from the State Council requiring central approval of all publications.”

Second, the findings imply that it might be possible to obtain additional information on the early spread of COVID-19 in Wuhan even if the investigations “remain stymied.” Third, Bloom feels that scientists need to stay focused on data-driven studies of the pandemic’s origin and early spread. He is optimistic that relevant data will continue to come to light, and asks scientists to focus on two questions: (1) how to get more data, and (2) how to better analyze the data.

According to the NIH, the missing sequences were initially submitted by Chinese researchers in March 2020. In June 2020, the agency received a request from a Chinese scientist who asked for the sequences to be deleted because they had been updated and were to be posted on an unspecified website.

“Submitting investigators hold the rights to their data and can request withdrawal of the data,” the NIH said in a statement, according to The Wall Street Journal (WSJ). According to the NIH, the investigator wanted the older version of the sequences to be removed to avoid confusion. Although some of the deleted data is still available in a specialized journal, Bloom says that scientists usually look for sequences in major databases like the one maintained by the NIH.

Maciej Boni, an associate professor at Pennsylvania State University, told the South China Morning Post that the recovered data confirmed the generally accepted timeline of the pandemic’s emergence. “It’s further confirmation that the date of origin was in the mid-October to mid-November range… Does it change the overall picture? No. But is the data valuable in confirming the picture? Yes,” Boni said.

In an interview with WSJ, Dr. Vaughn S. Cooper, an evolutionary biologist at the University of Pittsburgh, said that the deleted sequences do not resolve the debate over whether the COVID-19 virus emerged from an animal spillover to humans or from a lab accident. However, “it makes us wonder if there are other sequences like these that have been purged.”

In May, Bloom was one of the international scientists who signed an open letter published in Science magazine criticizing the World Health Organization’s (WHO) investigation into the pandemic’s origin. The investigation began in January, and a report was published in March.

The letter stated that “only 4 of the 313 pages of the report and its annexes addressed the possibility of a laboratory accident.” In addition, the letter asked for a more serious investigation into the natural and lab-leak COVID-19 origin theories.