Using molecular dating tools and epidemiological simulations, researchers at University of California San Diego School of Medicine, with colleagues at the University of Arizona and Illumina, Inc., estimate that the SARS-CoV-2 virus was likely circulating undetected for at most two months before the first human cases of COVID-19 were described in Wuhan, China, in late December 2019.

Writing in the March 18, 2021 online issue of Science, they also note that their simulations suggest that the mutating virus dies out naturally more than two-thirds of the time without causing an epidemic.

“Our study was designed to answer the question of how long could SARS-CoV-2 have circulated in China before it was discovered,” said senior author Joel O. Wertheim, PhD, associate professor in the Division of Infectious Diseases and Global Public Health at UC San Diego School of Medicine.

“To answer this question, we combined three important pieces of information: a detailed understanding of how SARS-CoV-2 spread in Wuhan before the lockdown, the genetic diversity of the virus in China and reports of the earliest cases of COVID-19 in China. By combining these disparate lines of evidence, we were able to put an upper limit of mid-October 2019 for when SARS-CoV-2 started circulating in Hubei province.”

Cases of COVID-19 were first reported in late December 2019 in Wuhan, located in the Hubei province of central China. The virus quickly spread beyond Hubei. Chinese authorities cordoned off the region and implemented mitigation measures nationwide. By April 2020, local transmission of the virus was under control, but by then COVID-19 had become a pandemic, with more than 100 countries reporting cases.

SARS-CoV-2 is a zoonotic coronavirus, believed to have jumped from an unknown animal host to humans. Numerous efforts have been made to identify when the virus first began spreading among humans, based on investigations of early-diagnosed cases of COVID-19. The first cluster of cases — and the earliest sequenced SARS-CoV-2 genomes — were associated with the Huanan Seafood Wholesale Market, but study authors say the market cluster is unlikely to have marked the beginning of the pandemic because the earliest documented COVID-19 cases had no connection to the market.

Regional newspaper reports indicate that COVID-19 diagnoses in Hubei date back to at least November 17, 2019, suggesting the virus was already actively circulating when Chinese authorities enacted public health measures.

In the new study, researchers used molecular clock evolutionary analyses to try to home in on when the first, or index, case of SARS-CoV-2 occurred. “Molecular clock” is a term for a technique that uses the mutation rate of genes to deduce when two or more life forms diverged. In this case, the technique was used to estimate when the common ancestor of all variants of SARS-CoV-2 existed, which the study places as early as mid-November 2019.

Molecular dating of the most recent common ancestor is often taken to be synonymous with the index case of an emerging disease. However, said co-author Michael Worobey, PhD, professor of ecology and evolutionary biology at University of Arizona: “The index case can conceivably predate the common ancestor — the actual first case of this outbreak may have occurred days, weeks or even many months before the estimated common ancestor. Determining the length of that ‘phylogenetic fuse’ was at the heart of our investigation.”

Based on this work, the researchers estimate that the median number of persons infected with SARS-CoV-2 in China was less than one until November 4, 2019. Thirteen days later, it was four individuals, and just nine on December 1, 2019. The first hospitalizations in Wuhan with a condition later identified as COVID-19 occurred in mid-December.

Study authors used a variety of analytical tools to model how the SARS-CoV-2 virus may have behaved during the initial outbreak and early days of the pandemic, when it was largely an unknown entity and the scope of the public health threat was not yet fully realized.

These tools included epidemic simulations based on the virus’s known biology, such as its transmissibility and other factors. In just 29.7 percent of these simulations was the virus able to create self-sustaining epidemics. In the other 70.3 percent, the virus infected relatively few persons before dying out. The average failed epidemic ended just eight days after the index case.
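To illustrate why so many simulated introductions can fizzle out, here is a minimal sketch of a stochastic branching process in Python. It is not the model used in the study. It assumes each infected person infects a random number of others drawn from a negative binomial distribution, and the reproduction number, dispersion value and case threshold are illustrative assumptions rather than figures from the paper.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def outbreak_goes_extinct(r0, dispersion, rng, max_cases=500):
    """Simulate one introduction; return True if it dies out before max_cases.

    r0, dispersion and max_cases are hypothetical parameters chosen for
    illustration, not values taken from the study.
    """
    active = 1  # the index case
    total = 1
    while active > 0:
        if total >= max_cases:
            return False  # treated here as a self-sustaining epidemic
        new_cases = 0
        for _ in range(active):
            # Negative binomial offspring as a gamma-Poisson mixture:
            # each person gets an individual infectiousness with mean r0.
            rate = rng.gammavariate(dispersion, r0 / dispersion)
            new_cases += sample_poisson(rate, rng)
        active = new_cases
        total += new_cases
    return True


def extinction_fraction(r0, dispersion, trials, rng):
    """Fraction of simulated introductions that die out on their own."""
    extinct = sum(outbreak_goes_extinct(r0, dispersion, rng) for _ in range(trials))
    return extinct / trials


if __name__ == "__main__":
    rng = random.Random(2019)
    # Assumed values: r0 = 2.5, dispersion = 0.3 (strong superspreading).
    print(extinction_fraction(2.5, 0.3, 1000, rng))
```

In a sketch like this, a small dispersion value means most infected people pass the virus to no one while a few infect many, which is why an introduction can die out even when the average reproduction number is well above one.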

“Typically, scientists use the viral genetic diversity to get the timing of when a virus started to spread,” said Wertheim. “Our study added a crucial layer on top of this approach by modeling how long the virus could have circulated before giving rise to the observed genetic diversity.

“Our approach yielded some surprising results. We saw that over two-thirds of the epidemics we attempted to simulate went extinct. That means that if we could go back in time and repeat 2019 one hundred times, two out of three times, COVID-19 would have fizzled out on its own without igniting a pandemic. This finding supports the notion that humans are constantly being bombarded with zoonotic pathogens.”

Wertheim noted that even as SARS-CoV-2 was circulating in China in the fall of 2019, the researchers’ model suggests it was doing so at low levels until at least December of that year.

“Given that, it’s hard to reconcile these low levels of virus in China with claims of infections in Europe and the U.S. at the same time,” Wertheim said. “I am quite skeptical of claims of COVID-19 outside China at that time.”

The original strain of SARS-CoV-2 became epidemic, the authors write, because it was widely dispersed, which favors persistence, and because it thrived in urban areas where transmission was easier. In simulated epidemics involving less dense rural communities, epidemics went extinct 94.5 to 99.6 percent of the time.

The virus has since mutated multiple times, with a number of variants becoming more transmissible.

“Pandemic surveillance wasn’t prepared for a virus like SARS-CoV-2,” Wertheim said. “We were looking for the next SARS or MERS, something that killed people at a high rate, but in hindsight, we see how a highly transmissible virus with a modest mortality rate can also lay the world low.”

Co-authors include Jonathan Pekar and Niema Moshiri, UC San Diego, and Konrad Scheffler, Illumina, Inc.

Funding for this research came, in part, from the National Institutes of Health (grants AI135992, AI136056, T15LM011271), the Google Cloud COVID-19 Research Credits Program, the David and Lucile Packard Foundation, the University of Arizona and the National Science Foundation (grant 2028040).