May 21, 2018, Monday
University of Colorado at Boulder Search A to Z Campus Map University of Colorado at Boulder CU 
Search Links

MBW:Stochastic Epidemic Modeling

From MathBio

Jump to: navigation, search


This wiki page presents results from our final project for APPM 5390 Mathematical Biology. An exact copy with functioning equations is available here.

In this wiki we present our work regarding stochastic modeling of epidemics using SIS and SIR models. We provide an alternate discretization scheme to handle larger problems and discuss deterministic vs. stochastic model selection. We apply the models to the flu epidemic in an English boarding school reported in 1978 by the British medical journal 'The Lancet' [1].

Some characteristics of the epidemic

  • Involved 763 students
  • Lasted January 22 to February 4
  • Number of infected students was recorded throughout the epidemic

Summary of Project

* Mathematics used: This project utilizes first order ordianary differential equations to model the spread of disease in a population. These deterministic models are then compared with discrete stochastic models that rely upon statistical techniques such as Markov chains and probability distributions.

* Type of Model: The models in this project simulate the dynamics of SIS and SIR diseases for a population of school children in an English boarding school.

* Biological System: The epidemic of a flu strain in a human population over a period of two weeks.

Markov Chains for Stochastic Epidemic Models

Techniques from Markov chains, applied to epidemic modeling, may allow for more detailed analysis of epidemics. All but the simplest epidemic models are nonlinear, preventing direct analytic computation of the time dependent probability distribution. However, numerical solutions can be computed for systems of moderate size (100's or 1000's individuals).

A continuous time Markov chain is a discrete set of states, known as the state space, and a set of rules for transitions among the states. The Markov property means transitions between states occur randomly and depend only on the previous state. Mathematically, Markovianity is represented as follows: Let X_{t} be a random variable that takes a value on the state space and represents the state of the Markov chain at time t then


for all sequences \{i_{n}\} in the state space and all times 0\leq t_{1}<t_{1}<...<t_{n}.

Commonly, Markov chains are represented as directed graphs. Imagine your self arriving at node B in the following figure. Immediately, a random clock on each outward edge starts ticking. When the first clock rings you move along that edge to another node. The process is then repeated indefinitely.

A 3 State Markov Chain

The behavior of a continuous time Markov chain is completely specified by its transition rate matrix, Q, and an initial distribution. The off diagonal elements Q_{{ij}} are rate parameters for exponential random variables describing the transition from state i to j. The diagonal elements Q_{{ii}} are negative with a value such that every row sums to zero. This construction yields the following governing system of differential equations known as the Kolmogorov Backward Equation


Where p is a discrete, time dependent probability distribution on the state space. Given initial conditions p(0) the solution to the Kolmogorov Backward Equation is


In practice the solutions can be determined through by this method or by numerical solution to the system of ODEs. Computing the exponential of a large matrix is computationally and memory intensive but an ODE integrator may require many time steps to reach the desired output time and could lose accuracy. The results in this wiki were computed using the exponential solution.

S-I-S Model

The deterministic model is:

{\frac  {dS}{dt}}=a*I-r*S*I
{\frac  {dI}{dt}}=r*S*I-a*I

for parameters 'r' and 'a'.

The stochastic model is:

S+I{\stackrel  {f(r)}{\rightarrow }}2*I
I{\stackrel  {g(a)}{\rightarrow }}S

The S-I-S model is a typical 'compartmental'-type model. It is effective in modeling one-step reactions with feedback. One might apply the S-I-S model to epidemiological problems where infection does not deliver immunity. Examples include some sexually transmitted diseases and the common cold.

Adjusting Rate Constants

Quite often, when dealing with mass action kinetic equations, we need to convert rate constants when changing form from a deterministic equation to a stoichiometric one. The rate constant units of measurement for a deterministic model differ from those of a stochastic model. The following examples are presented with the assumption that the reactions are elementary (i.e. consist of only one step).


1. X -> Y, with reaction rate k

    • 'k' is the rate at which the concentration of X, [X], changes to a concentration of Y, [Y].
    'k' is expressed in units of 'per seconds'
    'X' is expressed in units of moles
    '[X]' is expressed in units of moles per liter
  • The differential equation is: {\frac  {dx}{dt}}=-c\cdot x.
    • 'c' is the rate at which x (expressed in number of particles) increases.
    'c' is also expressed in units of 'per seconds'

Let V=volume(liters),n_{A}=6.023\times 10^{{23}}. Then -c\times x={\frac  {dx}{dt}}={\frac  {dx}{d[X]}}{\frac  {d[X]}{dt}}=(n_{A}\times V)\times ({\frac  {-k\times x}{n_{A}\times V}})=1\times -k\times x. Thus, in this case c=k.

2. X + Y -> Z

  • This is defined by k[X][Y].
    {\frac  {d[X]}{dt}}=-k\times [X][Y]\times V={\frac  {-k\times x\times y}{V^{2}\times (n_{A})^{2}}}
    This is also equal to: -cxy/(V\times n_{A}). In this case c\neq k. In fact, c={\frac  {k}{V\times n_{A}}}.
    c's units of measurement are now 'per second \cdot particles.'

Deterministic and Stochastic S-I-S Models

The deterministic and stochastic models give different information about the development of the disease. For detail of the benefits and drawbacks of each model type, see below. In sum, the deterministic model is easier to implement but gives less information than the stochastic model.

(For more information on the S-I-S Model and another application see APPM4390:Modelling the dynamics of a complex life-cycle parasite )

Adjusting Rate Constants for the Stochastic Model

To adjust the rate constants for the stochastic S-I-S model, consider the examples above. In the deterministic problem, the I->S rate constant is a function of the parameter a, the deterministic rate constant. This is similar to the first example in which no adjustment in the constant is necessary; thus, a=f(a). For the susceptible to infected reaction, the S+I->2I rate constant is a function of r. This is similar to the second example. The result of interest is c={\frac  {k}{V\times n_{A}}}, where c is the deterministic rate constant and k is the stochastic rate constant. Since the particle in the S-I-S model is a person, n_{A}=1 (meaning that there is one person per person). For this problem, V=1. Thus we do not need to adjust the parameter r.


Once the rate constants have been transformed properly, one can compute the solution. The results of the stochastic solution agree with the deterministic solution. The animation below shows the probability distribution evolving in time.

Probability distribution of the conditioned SIS model.

Deterministic Model Stochastic Model

Conditional Probability

One adjustment that needs to be made is to recognize that the stochastic simulation will give a significant (around 26%) probability of there being no infecteds. Of course, the premise is that an outbreak did occur. To correct for this, it makes sense to look not at the final simulated (day 14) probability distribution but at the final simulated conditional probability distribution, that is, the distribution of possible numbers of infected given that there is at least one infected. One computes the conditional probability as follows:

P({\text{ k infected }}|{\text{ 1 or more people are infected }})=P({\text{ k infected }})/P({\text{ 1 or more people are infected}})

where the probability that one or more people are infected is 1-P({\text{ no outbreak occurs}}).

In sum, the data was normalized to reflect this conditional dependence before plotting.

S-I-R Model

The SIR epidemic is modeled with the following equations

{\frac  {dS}{dt}}=-rSI
{\frac  {dI}{dt}}=rSI-aI
{\frac  {dR}{dt}}=aI

where S, I and R are the numbers of susceptible, infected and recovered students. The parameters r and a respectively control the rate of infection and rate of recovery. The deterministic solution was calculated for the initial conditions of 1 infected and 762 suceptibles using the ode45 integrator in Matlab. For an additional example of an SIR model see APPM4390:Population Biology of Infectious Diseases.

S-I-R Model

Scaling the Stochastic System

As a discrete process the SIR model takes the form

S+I\to 2I
I\to R

One drawback of the stochastic models is encountered immediately upon trying to discretize the model of the boarding school epidemic. The state space for the SIR model consists of all 3-tuples (S,I,R) where S,I,R are nonnegative and S+I+R=763. That's approximately 300,000 elements which is far to large to apply the standard solution. Instead, the system was discretized into 109 groups of 7 students. The size of the groups was dictated by the integer factors of 763 and the new state space has 6105 elements and is computationally tractable. The new initial conditions are: 1 infected group, 108 suceptible groups.

The change in scale requires a change in the r parameter. In the conversion from concentration to discrete stochastic, combining the students into groups requires division by the number of groups per student (this corresponds to the conversion from moles to molecules in the concentration case however, here the students are converted from individuals to groups whereas the molecules are converted from groups to individuals). The a parameter is unchanged as in the stochastic SIS model.


The following figure describes the transition rates for the stochastic SIR. Boundary conditions apply to keep each element of the 3-tuple nonnegative and less than or equal to 109. Matlab code to implement this model is attached to the bottom of this wiki.

SIR MC.png


The animation to the right presents the results from the stochastic SIR model. The animation shows the probability mass function evolving across a simplex describing the state space. Probabilities for states with no infecteds are not shown. The stochastic SIR model agrees with the deterministic results but suggests that the timing of the epidemic is highly variable.

Practically, it would be difficult to validate the probability the model assigns to the case that no epidemic occurs. Still, it is interesting to consider the timing of end of the epidemic. The figure below plots the probability (cumulative distribution function) that the system is in a state with no infecteds as a function of time (at which point the epidemic is over and the model will remain stationary). The plot indicates that there is a significant probability (~0.25) that the epidemic won't happen. However, if there are still infected students after two days there will almost certainly be an epidemic and it won't end until sometime after the tenth day.

Probability distribution for the stochastic SIR model. Probabilities for states with no infecteds are not shown.

Cumulative distribution function for the timing of the end of the epidemic (first time that I=0)

Model Comparison

The ODE and stochastic models have complementary strengths. The stochastic models account for individual interaction and provide the entire probability distribution output. However, accounting for individual interaction is intractable for large systems: for N individuals the state space in the stochastic SIS model has N+1 elements, the state space in the stochastic SIR model requires (N+1)(N+2)/2 elements. Epidemic models with more compartments, such as SEIR or MSIR, require geometrically increasing state spaces. On a personal computer, it becomes impractical to solve a Markov chain with more than a few thousand elements in the state space. Solving the system using an ODE integrator requires considerably less memory but computational requirements may be just as impractical, especially for large times. Finally, verifying the stochastic models experimentally would require many epidemics under controls and would be very difficult for human subjects.

The ODE models are easy to solve accurately but only provide mean output. For large systems, the continuous nature of the ode solution will not significantly change the results.

Recent Advancements

In a recent 2010 paper entitled "Stochasticity in staged models of epidemics: quantifying the dynamics of whooping cough", Andrew J. Black and Alan J. McKane attempt to improve the accuracy of stochastic epidemic models by incorporating subclasses into the standard stages of disease progression (S-E-I-R). As discussed in this wiki project, a usual assumption in these models is that exposed and infectious periods are exponentially distributed, which makes the probability of moving from one class to another independent of time spent within that class. This assumption has been shown to be flawed, as the probability of recovery is strongly dependent on the time since infection. Black and McKane intend to remedy this problem by incorporating staged subclasses which are traversed sequentially. By using this method, the authors were able to reinterpret the dynamics of a seasonally forced model of whooping cough and better quantify this dynamical system, which could not be done successfully with the standard stochastic epidemic model. Further models are being studied and developed that study the effect of human decision on the outcome of the epidemic.


  1. J.D. Murray. Mathematical Biology. Vol. I. pp 325-6. Springer. 2003.
  2. R. Durrett. Essentials of Stochastic Processes. Springer. 2004.
  3. D.J. Wilkinson. Stochastic Modeling for Systems Biology. Chapman and Hall/CRC. 2006.

Stochastic S-I-R Matlab Source

% Matlab script to run scaled version of the epidemic

%% Parameters
N=109; %number of groups of boys - choose this small at first or it may take several days to run
initI=1; %initial number of infecteds
initR=0; %initial number of recovereds
initS=N-initI-initR; %initial number of suceptibles

%% Set up variables

% number of recoverd, suceptible, and infecteds

%flip make I upper triangular, transpose makes it lower triangular
% have to flip the others to match

% create an index matrix that maps (S,I,R) tuples that sum to N into a
% vector.  index(ss,rr) gives index in state space Q,
% S(ss,rr),I(ss,rr),R(ss,rr) gives number of sucepts, infecteds and recvds 
for rr=1:(N+1)  %index rows of S, R N+1-rr gives number of recovereds
    for ss=1:rr  %index cols of S, R. ss-1 gives number of susceptibles
        index(rr,ss)=nn; % index is upper triangular

numStates=nn; %total number of states in the state space

%% Compute the probability rate matrix
Q=zeros(numStates); % probability rate matrix
for rr=1:(N+1)   %index rows of S, R N+1-rr gives number of recovereds 
    for ss=1:rr  %index cols of S, R. ss-1 gives number of susceptibles
        %S->I: S decreases by one (so we take ss-1), R
        %remains the same rate is rSI
        if ss>=2
        %I->R: I decreases by one (ss will stay the same), R increases (so
        %we need rr-1, since r is indexed backwards).  Rate is aI
        if rr>=2 && rr>ss

%% Compute the distribution, p, as a function of time
% build an index array to map vector p to matrix p
revIndex=zeros(numStates,1); %index statevector back into matrix
for ii=1:numStates

%compute distribution
p=zeros(N+1,N+1,numel(t)); % 2d probability distribution, 1st index is S, 2nd is I, 3rd is time
p0(index(N+1-initR,initS+1))=1; %set initial condition
for ii=1:numel(t) % compute distribution as a function of time (solving p'=pQ, p(0)=p0)
    ptemp = p0*expm(Q*t(ii));

%% Condition that there is always at least 1 person infected

%% visualization
axis([0 N+1 0 N+1 -0.04 0.04]);
hold on;
shading flat;
for ii=1:numel(t)
    title([num2str(t(ii)) ' Days']);
    xlabel('S (students)');
    ylabel('R (students)');
    shading interp;