Genomic surveillance, the process of monitoring and sequencing pathogens, is one of the most important tools for detecting emerging viral threats. But global surveillance systems remain costly and unevenly distributed, and they are often too slow to identify dangerous variants before those variants spread internationally, which raises the risk of future disease outbreaks.
A recently published research paper in Nature Communications, co-authored by Dr. Patricia Ning, an assistant professor of statistics, and Jifan Li, a doctoral candidate in the Department of Statistics, along with collaborators from several international institutions, introduces a new framework that addresses these problems with fewer resources. The goal is to make genomic surveillance faster and more cost-effective, and better prepared for new strains of COVID-19.
Ning’s algorithm is designed to strengthen local, community-based surveillance capacity in all regions in anticipation of future pandemics.
“We want to make real-time predictions of what will happen so we can provide guidance for decision-making, especially for government agencies,” Ning said. “This can help improve disease-control efforts.”
Optimizing the system
Ning developed the Iterative Block Particle Filter algorithm, which builds on the basic particle filter, also called the Sequential Monte Carlo (SMC) method, a technique widely used across many fields of scientific research.
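For readers unfamiliar with particle filters, the sketch below shows the basic “bootstrap” version of SMC on a hypothetical one-dimensional toy model: a hidden value that drifts as a random walk and is observed with noise. The model, the function name `bootstrap_particle_filter` and its parameters are illustrative assumptions, not part of the researchers’ code. It only shows the propagate, weight and resample cycle that SMC methods share.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_sd=1.0, obs_sd=1.0, seed=0):
    """Basic (bootstrap) particle filter for a hypothetical toy model:
        hidden state:  x_t = x_{t-1} + Normal(0, process_sd^2)
        observation:   y_t = x_t     + Normal(0, obs_sd^2)
    Returns the filtered estimate of x_t at each time step."""
    rng = random.Random(seed)
    # Start with particles drawn from a Normal(0, 1) guess about the initial state.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate: simulate each particle one step forward in time.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight: score each particle by how well it explains y
        #    (computed on a log scale and shifted by the max to avoid underflow).
        log_w = [-0.5 * ((y - x) / obs_sd) ** 2 for x in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # 3. Resample: duplicate well-scoring particles and drop poor ones.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Each particle is one simulated version of the hidden state. With a single hidden variable, a few hundred particles are plenty; the difficulty described next appears when many interacting variables must be tracked at once.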
The SMC method has a significant limitation: it works well only when a system’s spatial or network dimension is small to moderate, and it becomes increasingly difficult to use effectively in highly complex systems.
One workaround is to break a large network into smaller, separate pieces, but genomic surveillance involves data sets that are often too large and complex for this approach to remain effective.
“For example, we don’t want to disrupt interactions between cities,” Ning said. “When someone flies from New York to California, there is still some interaction. We do not want to break that; we want to preserve it instead of separating cities into isolated study groups.”
Ning’s algorithm was designed to perform scalable inference without suffering from the curse of dimensionality, the problem in which an algorithm’s approximation error grows exponentially with the number of interacting variables.
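The curse of dimensionality can be seen directly in the weighting step of a particle filter. In the hypothetical toy experiment below, which is not taken from the study, each of `dim` independent variables is observed with noise, and the “effective sample size” (a standard measure of how many particles still carry meaningful weight) is computed after a single weighting step. The function name and all parameters are illustrative assumptions. As `dim` grows, nearly all of the weight piles onto one or two particles, so keeping a standard particle filter accurate would require a roughly exponential increase in the number of particles.

```python
import math
import random


def effective_sample_size(dim, n_particles=1000, seed=0):
    """ESS of the importance weights after one update in a toy model where each of
    `dim` variables has prior Normal(0, 1) and is observed with Normal(0, 1) noise."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    obs = [t + rng.gauss(0.0, 1.0) for t in truth]
    log_w = []
    for _ in range(n_particles):
        particle = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # draw from the prior
        # The likelihood multiplies across variables, so the log-weights add up.
        log_w.append(sum(-0.5 * (o - p) ** 2 for o, p in zip(obs, particle)))
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    weights = [w / total for w in weights]
    # ESS = 1 / sum(w^2): equals n_particles when all weights are equal,
    # and 1 when a single particle holds all of the weight.
    return 1.0 / sum(w * w for w in weights)


for dim in (1, 5, 20, 50):
    print(dim, round(effective_sample_size(dim)))
```

Running the loop shows the effective sample size falling sharply as the number of variables rises, even though the number of particles stays fixed at 1,000.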
The algorithm tackles this problem with an iterated approach, in which the output of one pass through the data becomes the input for the next. Rather than updating one massive global system all at once, it localizes updates while preserving critical interactions between neighboring regions and travel hubs. Controlling the filtering error locally keeps the algorithm scalable and lets it beat the curse of dimensionality. It is designed to work with high-dimensional spatiotemporal data and to maintain interactions between spatial units.
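The idea of localized updates can be illustrated with a block particle filter, the general technique named in the algorithm’s title. The sketch below is a simplified, hypothetical example, not the researchers’ implementation. Regions sit on a ring, and each region’s hidden value is pulled toward its neighbors’ values, standing in for travel between them. The regions are split into small blocks. Every particle is still simulated with the full coupled dynamics, so interactions between regions are kept, but weighting and resampling use only the observations inside each block, which keeps the weights from collapsing as the number of regions grows. The iterated part of the method, in which results from one pass feed the next, is left out. The model, the function names `propagate` and `block_particle_filter`, and all parameters are assumptions made for illustration.

```python
import math
import random


def propagate(state, rng, self_weight=0.5, neighbor_weight=0.2, noise_sd=1.0):
    """Hypothetical coupled dynamics: regions on a ring, each one mixing its own
    value with its two neighbors' values, plus random noise."""
    d = len(state)
    return [self_weight * state[i]
            + neighbor_weight * (state[i - 1] + state[(i + 1) % d])
            + rng.gauss(0.0, noise_sd)
            for i in range(d)]


def block_particle_filter(observations, block_size=2, n_particles=200,
                          obs_sd=1.0, seed=0):
    """Filter a sequence of observation vectors (one noisy value per region) and
    return the estimated hidden state of every region at each time step."""
    rng = random.Random(seed)
    d = len(observations[0])
    # Partition the regions into contiguous blocks, e.g. [[0, 1], [2, 3], ...].
    blocks = [list(range(s, min(s + block_size, d))) for s in range(0, d, block_size)]
    particles = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate every particle with the full coupled dynamics,
        #    so interactions between neighboring regions are preserved.
        particles = [propagate(p, rng) for p in particles]
        resampled = [list(p) for p in particles]
        for block in blocks:
            # 2. Weight particles using only this block's observations.
            log_w = [sum(-0.5 * ((y[i] - p[i]) / obs_sd) ** 2 for i in block)
                     for p in particles]
            top = max(log_w)
            weights = [math.exp(lw - top) for lw in log_w]
            # 3. Resample this block's values independently of the other blocks.
            picks = rng.choices(range(n_particles), weights=weights, k=n_particles)
            for k, j in enumerate(picks):
                for i in block:
                    resampled[k][i] = particles[j][i]
        particles = resampled
        # Estimate each region's state as the average over the resampled particles.
        estimates.append([sum(p[i] for p in particles) / n_particles for i in range(d)])
    return estimates
```

Because each block is weighted using only a couple of observations, the problem seen in a single high-dimensional weighting step does not arise. The price is a small, controlled error at the block boundaries, where coupled regions are resampled separately.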
In the study, the researchers applied the algorithm to large-scale multi-strain models built on real data, including epidemiological records, vaccine information and high-resolution international air travel data, tracking dozens of regions and multiple viral strains. The method outperformed existing, commonly used filtering algorithms and shortened the time it takes to detect a new variant through sequencing, the process of identifying the exact genetic code of an organism or virus.
Based on the algorithm’s results, governments and hospitals can better prepare for outbreaks by directing more surveillance resources toward major international travel hubs. This is especially important for lower-resource regions, where large-scale genomic sequencing projects may not be sustainable in the long term. “If we can optimize the allocation, we can spend money more efficiently,” Li said. “That would allow us to detect new variants much earlier without increasing the budget.”
The future of the algorithm
The Iterative Block Particle Filter algorithm has potential uses beyond genomic surveillance. “Our work using COVID data has already produced strong research results,” Ning said. “If we can apply this method to other types of data, we expect it will continue to be a success.”
The researchers emphasized that their framework extends beyond COVID-19. Because the algorithm is compatible with more general systems, the methodology can potentially support surveillance and forecasting efforts for influenza, dengue, Ebola, Zika and other future pandemic threats.
The research group’s code has been made public on GitHub, a platform used by developers to share software projects.
More broadly, Ning sees the algorithm as a general methodology for dynamic systems, such as cities connected by transit systems, ecosystems linked by migration patterns, social networks, electrical grids and gene regulation, among many others. That makes it applicable across scientific disciplines wherever complex interactions evolve over time.
Because it can learn efficiently from streaming data in real time while preserving dependencies across networks, the algorithm opens the door to a new generation of scalable spatiotemporal learning models.
“In the machine learning community, many algorithms are designed to perform well only on specific test data sets,” Ning said. “Our approach is different because we have a theoretical performance guarantee that it works reliably across a broad range of spatiotemporal data.”

