Phylodynamics of SARS-CoV-2

Simon Frost

Principal Data Scientist, Microsoft Health Futures

Professor of Pathogen Dynamics, LSHTM

Models old and new

  • Old
    • SIR/SEIR
  • Not as old
    • Stochastic models
    • Age structured models
  • Recent
    • Agent-based models
    • Network models
  • New
    • Phylodynamic models


  • Models that integrate evolutionary models with…
    • Epidemiology
    • Immunology
    • Ecology
  • Different data streams can complement and enhance each other
  • Villabona-Arenas, Hanage and Tully (2020) argue that phylogenetic data should be integrated with other sources

Sequence data

  • Why include sequence data?
    • Introduction of cases
    • Spatial coupling
    • Hidden heterogeneity
  • Remarkable generation and dissemination of SARS-CoV-2 sequence data
    • GISAID: collates data worldwide
    • COG-UK: COVID-19 Genomics UK Consortium

Increased accessibility

  • Not just ability to download sequences
  • Post-processed data
    • Alignment
    • Lineages
  • Dashboards
    • Nextstrain, Microreact, CoV-GLUE

Microreact: UK data

Yet studies are limited

  • medRxiv/bioRxiv:
    • ‘SEIR and COVID19’: 445 hits
    • ‘phylodynamics and COVID19’: 26 hits
  • Until recently, sample sizes of studies were small
  • Typically stronger on the phylogenetics than on the modeling

Early dynamics (n=53)

  • SEI(2)R model with (assumed) heterogeneity in infectiousness
  • \(R_0=2.15\) \((1.79-2.75)\)

Volz et al., Imperial Report 5, 2020-02-15

Weifang outbreak (n=20)

Volz et al., medRxiv, 2020-03-19

Li and Ayscue: multiple locations (n=1,113)

  • Joint estimation using cases and phylogeny
    • Methodology of Li, Grassly, and Fraser, assuming a branching process with time-varying parameters
  • Estimated undercount of cases as well as heterogeneity in cases

Li and Ayscue, medRxiv, 2020-05-09

Australian epidemic (n=903)

  • Assumed two regimes of \(R\) and fitted their timing and magnitude using a birth-death process

Seemann et al., medRxiv, 2020-05-16


  • With few exceptions, studies that use SARS-CoV-2 sequence data are more ‘phylo’ and less ‘dynamics’
  • Datasets are large and expanding rapidly
    • Allows us to fit more complex models…
    • Computationally expensive


  • Long term:
    • Focus on scalable inference
  • Short term:
    1. Look at smaller (e.g. subnational) epidemics
    2. Generate predicted phylogenies from existing models

1. Smaller epidemics

  • Analyses of datasets in the low 1000s are possible with current frameworks
    • Scotland
    • Wales

Microreact: Scotland data

Microreact: Wales data

2. Generating predicted phylogenies

  • Algorithms and tools exist today to take epidemiological models and output phylogenies
    • ODE models: phydynR (Volz)
    • Gillespie-type models: MASTER (Vaughan)
    • Agent-based models:
      • VirusTreeSimulator (Hall)
      • treesampler (Kosakovsky Pond)
  • This requires understanding the link between state changes in an epidemiological model and the phylogeny

An analogy: deterministic vs. stochastic models

  • Consider a birth-death process with birth rate \(b\) and death rate \(d\)
  • In a deterministic ODE model, we just have to consider the difference in rates \(b-d\)
  • In a stochastic model, we have to consider both processes
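
To make the analogy concrete, here is a minimal Python sketch (not from the talk) using the standard closed-form moments of the linear birth-death process. The function names and the two rate pairs are illustrative assumptions. Both pairs share the same net rate \(b-d\), so the deterministic trajectory is identical, while the stochastic variance scales with \(b+d\).

```python
import math

def deterministic_size(n0, b, d, t):
    """Solution of dN/dt = (b - d) N: depends only on the net rate b - d."""
    return n0 * math.exp((b - d) * t)

def stochastic_variance(n0, b, d, t):
    """Variance of the linear birth-death process started from n0 (requires b != d)."""
    r = b - d
    growth = math.exp(r * t)
    return n0 * (b + d) / r * growth * (growth - 1.0)

# Two hypothetical parameter sets with the same net growth rate b - d = 1
slow = dict(b=2.0, d=1.0)    # low turnover
fast = dict(b=11.0, d=10.0)  # high turnover

for label, rates in (("slow", slow), ("fast", fast)):
    mean = deterministic_size(10, t=1.0, **rates)
    var = stochastic_variance(10, t=1.0, **rates)
    print(f"{label}: mean={mean:.2f} variance={var:.2f}")
```

The same logic carries over to phylogenies: births and deaths affect lineages differently, so they have to be tracked as separate processes rather than as a net rate.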

Phylodynamic vs. non-phylodynamic models

  • By defining models in terms of their components, we can easily extend epidemiological models to generate phylogenies
    • Transmission results in lineages splitting
    • Processes such as movement can result in a change in lineage state
  • With more work, it is possible to extend them to generate summary statistics of phylogenies
    • Clustering, asymmetry etc.
    • Frost and Volz (2010, 2013)
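
As one example of an asymmetry statistic, the sketch below computes the Colless index, which sums the absolute difference in tip counts between the two subtrees at every internal node. This is a generic illustration, not the method of Frost and Volz; the nested-tuple tree encoding and the function name are assumptions made for brevity.

```python
def colless(tree):
    """Return (number_of_tips, Colless index) for a binary tree of nested 2-tuples."""
    if not isinstance(tree, tuple):
        return 1, 0  # a tip contributes one leaf and no imbalance
    left, right = tree
    n_left, c_left = colless(left)
    n_right, c_right = colless(right)
    imbalance = abs(n_left - n_right)
    return n_left + n_right, c_left + c_right + imbalance

balanced = (("a", "b"), ("c", "d"))   # fully symmetric four-tip tree
ladder = ((("a", "b"), "c"), "d")     # maximally asymmetric four-tip tree

print(colless(balanced)[1], colless(ladder)[1])  # prints: 0 3
```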

Core groups and phylogeny

Frost and Volz (2013)

Deterministic models with phydynR

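# Rates of lineage birth (transmission) and death (removal) in the infected deme I
births <- c('parms$beta*S*I')
deaths <- c('(parms$mu+parms$gamma)*I')
names(births) <- names(deaths) <- c("I")
# Dynamics of compartments that carry no lineages; the R equation is assumed
# from the SIR structure, as the original slide is truncated
nonDemeDynamics <- c('parms$mu*(S+I+R)-parms$beta*S*I-parms$mu*S',
                     'parms$gamma*I-parms$mu*R')
names(nonDemeDynamics) <- c("S","R")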
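
The rate expressions above can be restated outside phydynR to check what the model does. The following is a plain Python sketch that does not call phydynR: it integrates the same SIR model with forward Euler and keeps the birth and death rates of the infected compartment as separate terms. The parameter values are hypothetical, and the R equation (recoveries in, deaths out) is an assumption here.

```python
def derivatives(state, parms):
    """Right-hand side of the SIR model with births and deaths kept separate."""
    S, I, R = state
    births = parms["beta"] * S * I               # transmissions: lineages split
    deaths = (parms["mu"] + parms["gamma"]) * I  # removals: lineages end
    dS = parms["mu"] * (S + I + R) - births - parms["mu"] * S
    dI = births - deaths
    dR = parms["gamma"] * I - parms["mu"] * R    # assumed form of the R equation
    return dS, dI, dR

def integrate(state, parms, t_end, dt=0.01):
    """Forward Euler integration; returns rows of (t, S, I, R)."""
    t, rows = 0.0, [(0.0, *state)]
    for _ in range(round(t_end / dt)):
        rates = derivatives(state, parms)
        state = tuple(x + dt * dx for x, dx in zip(state, rates))
        t += dt
        rows.append((t, *state))
    return rows

# Hypothetical parameter values, chosen only for illustration
parms = {"beta": 0.0005, "gamma": 0.2, "mu": 0.01}
trajectory = integrate((999.0, 1.0, 0.0), parms, t_end=50.0)
t, S, I, R = trajectory[-1]
print(f"t={t:.1f} S={S:.1f} I={I:.1f} R={R:.1f}")
```

Because population births balance deaths, S + I + R stays constant, which is a quick sanity check on the rate expressions.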

Dynamics and trees

Stochastic models with MASTER

<reaction spec='Reaction' reactionName="Infection" rate="0.005">
    S + I -> 2I
</reaction>
<lineageSeed spec='Individual' population='@I'/>
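
To show what tracking lineages in a Gillespie-type model involves, here is a minimal Python sketch of the same infection reaction. It does not use MASTER; it assumes a mass-action propensity of rate × S × I and picks the infector uniformly at random among current infecteds. The function name, population sizes, and time horizon are illustrative assumptions.

```python
import random

def simulate_si(s0=99, i0=1, rate=0.005, t_max=20.0, seed=1):
    """Gillespie simulation of S + I -> 2I that records (infector, infectee, time)."""
    rng = random.Random(seed)
    infected = list(range(i0))              # individual labels 0 .. i0-1
    susceptible = list(range(i0, i0 + s0))  # remaining labels
    t, log = 0.0, []
    while susceptible:
        total_rate = rate * len(susceptible) * len(infected)
        t += rng.expovariate(total_rate)    # waiting time to the next infection
        if t > t_max:
            break
        ego = rng.choice(infected)          # infector: its lineage splits here
        alter = susceptible.pop(rng.randrange(len(susceptible)))
        infected.append(alter)
        log.append((ego, alter, t))
    return log

for ego, alter, time in simulate_si()[:5]:
    print(f"{ego} infected {alter} at t={time:.3f}")
```

The only addition over an ordinary stochastic simulation is the event log, which is the information needed to reconstruct a tree afterwards.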

Agent based models

  • Perhaps the simplest to deal with
  • Just need to keep track of
    • who infected whom (and when)
    • when infected cases die/recover

Ego  Alter  Time
  1      2   1.0
  2      3   1.5
  1     -1   2.1
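
A log like the one above is enough to write down a tree. The Python sketch below converts it to a Newick string under several loud assumptions that are not stated in the talk: an Alter of -1 marks death/recovery, the index case is infected at time 0, hosts still infected are sampled at a hypothetical final time of 2.5, and within-host diversity is ignored so that each transmission is a lineage split.

```python
from collections import defaultdict

# (ego, alter, time) rows copied from the table
LOG = [(1, 2, 1.0), (2, 3, 1.5), (1, -1, 2.1)]

def transmission_newick(log, index_case=1, t_origin=0.0, t_final=2.5):
    """Build a Newick string in which every transmission event splits a lineage."""
    onward = defaultdict(list)  # infector -> [(time, infectee), ...] in time order
    removed = {}                # host -> removal time
    for ego, alter, time in sorted(log, key=lambda row: row[2]):
        if alter == -1:         # assumed marker for death/recovery
            removed[ego] = time
        else:
            onward[ego].append((time, alter))

    def lineage(host, t_start, k=0):
        events = onward.get(host, [])
        if k < len(events):
            t_split, infectee = events[k]
            stay = lineage(host, t_split, k + 1)  # lineage remaining in the infector
            leave = lineage(infectee, t_split)    # lineage founded in the infectee
            return f"({stay},{leave}):{t_split - t_start:g}"
        t_tip = removed.get(host, t_final)        # tip at removal or at the final time
        return f"{host}:{t_tip - t_start:g}"

    return lineage(index_case, t_origin) + ";"

print(transmission_newick(LOG))  # prints: (1:1.1,(2:1,3:1):0.5):1;
```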

Moving between model formulations

Phylogenies or lineages?

  • SARS-CoV-2 is not that diverse
  • Rambaut et al. have defined a number of lineages
    • A, B, B.1, B.1.1 etc.
  • Do we need the full phylogeny? Or do lineages suffice?
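
If lineages do suffice, the dotted names already carry a coarse hierarchy that can be used directly. The sketch below is an illustration only: the sample list is made up, the helper names are hypothetical, and it relies on each dotted suffix denoting a descendant of its prefix (B.1.1 within B.1 within B). Relationships that are not encoded in the name, such as that between A and B, are not recovered.

```python
from collections import Counter

def ancestors(lineage):
    """List a lineage name together with the ancestors implied by its dotted prefix."""
    parts = lineage.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

def collapse(lineage, depth):
    """Truncate a lineage name to at most `depth` levels."""
    return ".".join(lineage.split(".")[:depth])

# Made-up lineage assignments for six sequences
samples = ["A", "B", "B.1", "B.1.1", "B.1.1", "B.1"]
counts = Counter(collapse(name, 2) for name in samples)

print(ancestors("B.1.1"))  # prints: ['B', 'B.1', 'B.1.1']
print(dict(counts))        # prints: {'A': 1, 'B': 1, 'B.1': 4}
```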

UK phylogeny

New models?

  • Rather than repurpose existing models, are there new ones we can explore?
  • How can we represent genealogies within an epidemiological model?
  • In population genetics, Fleming-Viot processes have been developed that cover e.g. infinite-alleles and infinite-sites models
    • Ethier and Kurtz (1993)


  • The densely sampled COVID-19 epidemic in the UK provides challenges and opportunities:
    • Methodological development
    • Understanding contact structure
  • Many existing models can be retrofitted to generate pathogen phylogenies
    • Aid to model comparison, as many models may fit the case data equally well

Thanks!  @sdwfrost