
This learning resource shows how to process the number of reported cases so that they provide an estimate of the real number of cases in the total population. The adjusted number of cases ${\displaystyle a_{n}}$ at day ${\displaystyle n}$ is important for forecasting and mathematical modelling of COVID-19.

## History of Learning Resource

Julian Mendez originally posted this idea here and it is archived here. For an updated reformulation, see below.

We start with a reference day of data collection with the index ${\displaystyle t=0}$ (e.g. January 6, 2020 is ${\displaystyle t=0}$, January 7, 2020 is ${\displaystyle t=1}$, ...). We use the variable ${\displaystyle t}$ because it is a time index.

• ${\displaystyle t}$ number of days after reference day of data collection.
• ${\displaystyle b_{t}}$ number of tests performed on day ${\displaystyle t}$; it defines the baseline (${\displaystyle b}$) for the positively tested patients.
• ${\displaystyle c_{t}}$ number of COVID-19 positive tests for day ${\displaystyle t}$, the day on which the samples for the tests were collected. Keep in mind that a laboratory needs time to process the collected samples, so do not assign positive cases to the day on which they are officially reported. Assign them to the day on which the samples were taken from the patients. The variable ${\displaystyle c}$ is used because it is the first letter of COVID-19.
• The fraction ${\displaystyle {\frac {c_{t}}{b_{t}}}}$ indicates the share of positive tests among all tests, e.g. ${\displaystyle b_{t}=1000}$ tests with ${\displaystyle c_{t}=50}$ positive COVID-19 tests give the fraction ${\displaystyle {\frac {c_{t}}{b_{t}}}=0.05}$ (i.e. 5% of the tests are positive).
• Patients who show only minor symptoms and do not show up in the health system for testing create a bias in the fraction ${\displaystyle {\frac {c_{t}}{b_{t}}}=0.05}$. The same holds for asymptomatic patients, who show no symptoms and may spread the disease without knowing that they are COVID-19 positive.

### Susceptible, Infected and Recovered (SIR)

The spread of the virus depends on the immune status of the population. Consider two completely different situations for the public health status of a population with ${\displaystyle n_{t}}$ citizens at day ${\displaystyle t}$:

• (Situation 1: vulnerable population) There is only one infected citizen (${\displaystyle I_{t}=1}$) at time index ${\displaystyle t}$ in the population, and all the other citizens are susceptible to COVID-19 (i.e. ${\displaystyle S_{t}=n_{t}-1}$). Hence there are no citizens who have recovered from COVID-19 (i.e. ${\displaystyle R_{t}=0}$). An epidemiologically extremely vulnerable community is therefore exposed to a single infected citizen (patient zero). The spread of the disease will initially show exponential growth.
• (Situation 2: protected population) There is again only one infected citizen (${\displaystyle I_{t}=1}$) in the population, but this time all the other citizens have recovered from a COVID-19 infection (i.e. ${\displaystyle R_{t}=n_{t}-1}$). Hence there are no susceptible citizens in the population who can be infected by the single COVID-19 patient (i.e. ${\displaystyle S_{t}=0}$). The single infected patient cannot infect anybody in the population, because the recovered citizens are immune to COVID-19 infection.

Lesson learnt: Having a higher percentage of recovered (immune) patients among the population slows down the spreading of the disease in the population.
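The two situations can be compared with a small simulation. The following is a minimal sketch of a discrete-time SIR model with daily steps; the function name `simulate_sir`, the population size, and the rates `beta` (infection) and `gamma` (recovery) are illustrative assumptions and are not fitted to COVID-19 data.

```python
def simulate_sir(s0, i0, r0, beta=0.3, gamma=0.1, days=100):
    """Discrete-time SIR model with one step per day.

    beta and gamma are assumed, illustrative rates (not COVID-19 estimates).
    Returns a list of (S, I, R) tuples, one per day, starting with day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        # New infections are proportional to contacts between S and I.
        new_infections = min(s, beta * s * i / n)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


n = 1_000_000  # hypothetical population size

# Situation 1: one infected citizen, everybody else susceptible.
vulnerable = simulate_sir(s0=n - 1, i0=1, r0=0)
# Situation 2: one infected citizen, everybody else recovered (immune).
protected = simulate_sir(s0=0, i0=1, r0=n - 1)

peak_vulnerable = max(i for _, i, _ in vulnerable)
peak_protected = max(i for _, i, _ in protected)
print("peak infected, vulnerable population:", round(peak_vulnerable))
print("peak infected, protected population:", round(peak_protected))
```

In the protected population the number of infected citizens never rises above the single initial case, while in the vulnerable population it grows to a large share of the population under these assumed rates.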

• Explain why people in the compartment "recovered" (R) may return to "susceptible" (S) after a period of time. Do you know a disease for which you keep your immune status for a lifetime?

### Vaccination and protected Population

Vaccination moves citizens from the vulnerable status "susceptible" (S) into the status "recovered" (R), because the vaccination "emulates" an infection for the immune system and allows the immune system to produce antibodies against the disease. COVID-19 was a new disease in 2019, and therefore vaccination of the population was not possible before the outbreak. COVID-19 can put a patient into a critical state, so that she/he must be treated in an Intensive Care Unit (ICU). A protected population would therefore be ideal, but this was not possible because a totally vulnerable society was exposed to the new virus.

Lesson learnt: Because vaccination was not possible for COVID-19, the only option to protect the health system is to make sure that the number of cases increases slowly, so that the health system can continue to deliver health services to patients. Keep in mind that the health system has other patients in the ICU and that the capacity might not be sufficient for a huge number of COVID-19 cases. Therefore staying at home and reducing the number of physical contacts in the population is used to slow down the epidemiological spread of COVID-19.

### Estimation of aggregated COVID-19 Infection among Population

• Now we consider once again ${\displaystyle b_{t}}$ as the number of tests performed on day ${\displaystyle t}$. The people who are tested are not selected randomly from the population ${\displaystyle N}$ (e.g. ${\displaystyle N=100\ million}$ people), so that
${\displaystyle {\frac {c_{t}}{b_{t}}}\cdot N=0.05\cdot 100,000,000=5,000,000}$
might be a wrong estimate for the number of infected people in the population. There might be a bias, especially when only patients with symptoms are tested. Alternatively, a random sample of ${\displaystyle {\widehat {b}}_{t}}$ people can be tested at time ${\displaystyle t}$. These people are selected without consideration of any symptoms, and the selection should be representative of the total population. This study will also detect a number ${\displaystyle {\widehat {c}}_{t}}$ of COVID-19 positive tests. The ratio ${\displaystyle {\frac {{\widehat {c}}_{t}}{{\widehat {b}}_{t}}}}$ is an estimate for the fraction of infected people in the population. It will probably differ from the ratio ${\displaystyle {\frac {c_{t}}{b_{t}}}}$ of the regular tests at day ${\displaystyle t}$, e.g.
${\displaystyle {\frac {{\widehat {c}}_{t}}{{\widehat {b}}_{t}}}=0.02\not =0.05={\frac {c_{t}}{b_{t}}}}$
The random control test leads to a better estimate for the total number of infected people in the population if the number ${\displaystyle {\widehat {b}}_{t}}$ of tests on randomly selected people is high (see Borel's law of large numbers). With a total population of ${\displaystyle N}$ (e.g. ${\displaystyle N=100\ million}$ people) we get
${\displaystyle {\frac {{\widehat {c}}_{t}}{{\widehat {b}}_{t}}}\cdot N=0.02\cdot 100,000,000=2,000,000}$
Testing capacity is limited, so random sampling is costly, and this test design might therefore be applied only for calibrating the model. COVID-19 tests are a limited resource and are mostly applied if a patient shows symptoms of COVID-19, or if the immune status must be clarified because someone (e.g. a member of the medical staff) is a risk for the environment in which she/he is working or living. So the estimate for the total number of people who show an immune response or exposure to the COVID-19 virus must be based on ${\displaystyle b_{t}}$ and ${\displaystyle c_{t}}$ of the tested patients, i.e. on the fraction ${\displaystyle {\frac {c_{t}}{b_{t}}}}$. If a control test was performed only once, at day ${\displaystyle t_{0}}$, a correction constant for the number of people in the population who show an immune response or exposure to the COVID-19 virus can be calculated by:

${\displaystyle e_{t_{0}}:={\frac {{\widehat {c}}_{t_{0}}}{{\widehat {b}}_{t_{0}}}}\cdot {\frac {b_{t_{0}}}{c_{t_{0}}}}={\frac {0.02}{0.05}}={\frac {2}{5}}}$
• With the error correction value ${\displaystyle e_{t_{0}}}$ (e.g. ${\displaystyle e_{t_{0}}={\frac {2}{5}}}$) the total number of people who show an immune response or exposure to the COVID-19 virus can be estimated by
${\displaystyle {\frac {c_{t}}{b_{t}}}\cdot e_{t_{0}}\cdot N=0.05\cdot {\frac {2}{5}}\cdot 100,000,000=2,000,000}$
Please keep in mind that the error correction value ${\displaystyle e_{t_{0}}}$ might not be updated as often as new cases are reported.
• The daily growth rate ${\displaystyle d_{t}}$ for the number of new cases is defined as ${\displaystyle d_{t}={\frac {a_{t}}{a_{t-1}}}-1}$. E.g. if you have ${\displaystyle a_{t}=3000}$ at day ${\displaystyle t}$ and ${\displaystyle a_{t-1}=2000}$ on the day before, at index ${\displaystyle t-1}$, then
${\displaystyle d_{t}={\frac {a_{t}}{a_{t-1}}}-1={\frac {3000}{2000}}-1=0.5}$.
This means that we have an increase of 50% in the number of new cases for day ${\displaystyle t}$. The daily growth rate ${\displaystyle d_{t}}$ can also be negative, e.g. if you have ${\displaystyle a_{t}=3000}$ at day ${\displaystyle t}$ and ${\displaystyle a_{t-1}=4000}$ on the day before, at index ${\displaystyle t-1}$, then
${\displaystyle d_{t}={\frac {a_{t}}{a_{t-1}}}-1={\frac {3000}{4000}}-1=-{\frac {1}{4}}=-0.25}$.
This means that the number of new cases for day ${\displaystyle t}$ decreases by 25%.

| date index | tested | positive tests | percentage | adjusted new cases | daily growth rate for new cases |
|---|---|---|---|---|---|
| ${\displaystyle t}$ days | ${\displaystyle b_{t}}$ | ${\displaystyle c_{t}}$ | ${\displaystyle {\frac {c_{t}}{b_{t}}}}$ | ${\displaystyle a_{t}}$ | ${\displaystyle d_{t}={\frac {a_{t}}{a_{t-1}}}-1}$ |
| ${\displaystyle (t-1)}$ days | ${\displaystyle b_{t-1}}$ | ${\displaystyle c_{t-1}}$ | ${\displaystyle {\frac {c_{t-1}}{b_{t-1}}}}$ | ${\displaystyle a_{t-1}}$ | ${\displaystyle d_{t-1}={\frac {a_{t-1}}{a_{t-2}}}-1}$ |
| ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ |
| 1 day | ${\displaystyle b_{1}}$ | ${\displaystyle c_{1}}$ | ${\displaystyle {\frac {c_{1}}{b_{1}}}}$ | ${\displaystyle a_{1}}$ | ${\displaystyle d_{1}={\frac {a_{1}}{a_{0}}}-1}$ |
| 0 reference day | ${\displaystyle b_{0}}$ | ${\displaystyle c_{0}}$ | ${\displaystyle {\frac {c_{0}}{b_{0}}}}$ | ${\displaystyle a_{0}}$ | undefined |
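The calculations of this section can be sketched in a few lines of Python. The function names are hypothetical, and the size of the random control sample (20 positives among 1000 tests) is an assumption chosen only to reproduce the ratio 0.02 from the text. The growth rate is written as new value over old value minus one, so that a rise from 2000 to 3000 gives +0.5 and a drop from 4000 to 3000 gives -0.25, as in the worked examples.

```python
def correction_factor(c_hat, b_hat, c, b):
    """Error correction e_{t0}: positive fraction of the random control
    sample divided by the positive fraction of the regular tests."""
    return (c_hat / b_hat) / (c / b)


def estimated_exposed(c_t, b_t, e_t0, population):
    """Estimate (c_t / b_t) * e_{t0} * N for the whole population."""
    return (c_t / b_t) * e_t0 * population


def daily_growth_rate(a_t, a_prev):
    """Relative change of new cases from the previous day to day t."""
    return a_t / a_prev - 1


# Regular tests from the text: 50 positives among 1000 tests (0.05).
# Control sample: 20 of 1000 is an ASSUMED size that yields 0.02.
e_t0 = correction_factor(c_hat=20, b_hat=1000, c=50, b=1000)
estimate = estimated_exposed(c_t=50, b_t=1000, e_t0=e_t0, population=100_000_000)
print("correction factor:", round(e_t0, 3))
print("estimated exposed:", round(estimate))

print("growth 2000 -> 3000:", daily_growth_rate(3000, 2000))
print("growth 4000 -> 3000:", daily_growth_rate(3000, 4000))
```

Keeping the correction factor as a separate function reflects that it is calibrated once at day ${\displaystyle t_{0}}$ and then reused for later days.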

### Logistic Growth and SIR Model

If we assume that logistic growth can be applied to the COVID-19 disease, there is a point in time at which the number of newly detected cases does not increase anymore. This point in time can be estimated by the condition ${\displaystyle d_{t}\approx 0}$, and it corresponds to the point ${\displaystyle S}$ in the following graph.

When the SIR model is applied to the epidemiological modelling, the logistic growth is, with a delay in time, similar to the green curve of the recovered.

Blue=Susceptible, Red=Infected, and Green=Recovered
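The condition ${\displaystyle d_{t}\approx 0}$ can be illustrated on synthetic data. The sketch below assumes a logistic curve for the cumulative number of cases with made-up parameters (`capacity`, `rate`, `midpoint`), derives daily new cases by differencing, and looks for the first day on which the growth rate of new cases is no longer positive. It is an illustration of the idea, not a fit to real COVID-19 data.

```python
import math


def logistic_cumulative(t, capacity=100_000, rate=0.2, midpoint=50):
    """Cumulative cases on day t under an ASSUMED logistic curve."""
    return capacity / (1 + math.exp(-rate * (t - midpoint)))


last_day = 100
cumulative = [logistic_cumulative(t) for t in range(last_day + 1)]

# New cases on day t are the increase of the cumulative count since day t-1.
new_cases = {t: cumulative[t] - cumulative[t - 1] for t in range(1, last_day + 1)}

# Growth rate of new cases: relative change compared with the day before.
growth = {t: new_cases[t] / new_cases[t - 1] - 1 for t in range(2, last_day + 1)}

# First day on which the number of new cases does not increase anymore.
turning_day = min(t for t, d in growth.items() if d <= 0)
print("new cases stop increasing around day", turning_day)
```

With real data the daily values fluctuate, so in practice one would probably smooth the series (e.g. with a weekly average) before looking for this point.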

Each member of the population typically progresses from susceptible to infectious to recovered. This can be shown as a flow chart in which the boxes represent the different compartments and the arrows the transitions between compartments. An arrow from recovered (R) back to susceptible (S) might be added if patients lose their immune status after a while. That is similar to patients who must refresh their vaccination after a number of years. For some diseases one infection or one vaccination is sufficient for a lifetime. COVID-19 is a new disease, so it is difficult to estimate in 2020 how well the immune system will be prepared for a new exposure to the coronavirus once a patient has recovered.

• Identify a disease that needs just one vaccination for a lifetime, and identify a disease that needs a new vaccination for the immune system after a number of years.
• Explain why an arrow from recovered (R) to susceptible (S) might be added to the flow chart of the SIR model if scientific evidence becomes available for this model extension.

## Testing

There are different tests for a viral disease:

• Polymerase Chain Reaction (PCR): PCR is a method in molecular biology for making millions of copies of a specific segment of genetic material, here a segment of the genetic material of the virus. If the replication fails, the test reports that the sample did not contain the specific segment targeted by the test. Please keep in mind that the test does not address the complete virus genome. Therefore fragmented virus material that is not capable of programming cells to produce new viruses might lead to a positive test (false positive). The PCR test is used to detect patients who might infect other patients. So a PCR test provides information about the red curve of infected people in the population.
• Antibodies: a test for antibodies against COVID-19 shows whether the immune system was exposed to the COVID-19 virus and responded to the exposure by creating antibodies. This test provides information about the green curve of the recovered. Please keep in mind that the immune system needs time to respond to the exposure to a new virus. Therefore the antibody test might be negative although the patient is infected and able to infect others.

## Julian Mendez Contribution

I would like to share an idea to have a more precise understanding of the number of cases of COVID-19. A more precise current number of cases could be approximated by computing the square of the number of cases today divided by the number of cases one week ago. My suggestion is to use the growth of the previous days. Let us imagine that a region had the following cases:

| date | cases | daily growth |
|---|---|---|
| ${\displaystyle n}$ days ago | ${\displaystyle a_{n}}$ | |
| ${\displaystyle (n-1)}$ days ago | ${\displaystyle a_{n-1}}$ | ${\displaystyle {\frac {a_{n-1}}{a_{n}}}}$ |
| ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ |
| 1 day ago | ${\displaystyle a_{1}}$ | ${\displaystyle {\frac {a_{1}}{a_{2}}}}$ |
| today | ${\displaystyle a_{0}}$ | ${\displaystyle {\frac {a_{0}}{a_{1}}}}$ |

We can approximate the future growth by the past, and say that an adjusted number of cases can be approximated with:

${\displaystyle a_{0}\cdot {\frac {a_{0}}{a_{1}}}\cdot {\frac {a_{1}}{a_{2}}}\cdot \ldots \cdot {\frac {a_{n-2}}{a_{n-1}}}\cdot {\frac {a_{n-1}}{a_{n}}}}$

which is the same as ${\displaystyle {\frac {a_{0}^{2}}{a_{n}}}}$

As an example, these are the values for the first 10 countries in the list on 2020-03-14:

| country | cases today | cases one week ago | adjusted cases today (approx.) |
|---|---|---|---|
| China | 80844 | 80695 | 80993 |
| Italy | 21157 | 5883 | 76087 |
| Iran | 12729 | 5823 | 27825 |
| South Korea | 8162 | 7134 | 9338 |
| Spain | 6391 | 430 | 94988 |
| Germany | 3795 | 684 | 21056 |
| France | 4499 | 949 | 21329 |
| United States | 2794 | 352 | 22177 |
| Switzerland | 1359 | 254 | 7271 |
| United Kingdom | 1140 | 206 | 6309 |

These adjusted values may fluctuate with sudden high values, like in the case of Spain.
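As a small check, the adjusted values can be recomputed in Python. This is a sketch with hypothetical function names: it recomputes three rows of the table above and shows on a made-up daily series that multiplying today's value by all daily growth factors gives the same result as the closed form ${\displaystyle {\frac {a_{0}^{2}}{a_{n}}}}$.

```python
import math


def adjusted_cases(cases_today, cases_past):
    """Closed form a_0^2 / a_n: today's count a_0, count a_n from n days ago."""
    return cases_today ** 2 / cases_past


def adjusted_cases_from_series(cases):
    """Same estimate via the product of daily growth factors.

    `cases` is ordered oldest first: [a_n, ..., a_1, a_0].
    """
    growth_factors = [later / earlier for earlier, later in zip(cases, cases[1:])]
    return cases[-1] * math.prod(growth_factors)


# Three rows of the table: (cases today, cases one week ago).
table_rows = {"China": (80844, 80695), "Italy": (21157, 5883), "Spain": (6391, 430)}
adjusted = {
    country: round(adjusted_cases(today, week_ago))
    for country, (today, week_ago) in table_rows.items()
}
print(adjusted)

# HYPOTHETICAL daily series over one week (oldest first), only to show
# that the product of growth factors telescopes to a_0 / a_n.
series = [100, 130, 160, 210, 260, 330, 400, 500]
print(adjusted_cases_from_series(series), adjusted_cases(series[-1], series[0]))
```

Because the intermediate values cancel in the product, only the first and the last value of the series influence the result.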

I hope someone can find this idea useful.

## Adjustments due to the delay in the tests

This is a reformulation of the previous section. I originally posted it here. The updated version is here.

We would like to approximate how many true cases there are. Let us assume that:

• the time between a patient getting infected and the case being reported is always the same
• people do not significantly change the growth of infected cases

The variables are:

• ${\displaystyle k}$ is the number of days between a patient getting infected and the case being reported
• ${\displaystyle r_{i}}$ is the number of reported cases for day ${\displaystyle i}$
• ${\displaystyle t_{i}}$ is an approximation of the true cases for day ${\displaystyle i}$

| day | reported cases | daily growth | true cases | approx. of true cases |
|---|---|---|---|---|
| 0 | ${\displaystyle r_{0}}$ | | ${\displaystyle r_{k}}$ | |
| 1 | ${\displaystyle r_{1}}$ | ${\displaystyle {\frac {r_{1}}{r_{0}}}}$ | ${\displaystyle r_{k+1}}$ | |
| ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | |
| ${\displaystyle k}$ | ${\displaystyle r_{k}}$ | ${\displaystyle {\frac {r_{k}}{r_{k-1}}}}$ | ${\displaystyle r_{2k}}$ | ${\displaystyle t_{k}}$ |
| ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | |
| ${\displaystyle n-k}$ | ${\displaystyle r_{n-k}}$ | ${\displaystyle {\frac {r_{n-k}}{r_{n-k-1}}}}$ | ${\displaystyle r_{n}}$ | ${\displaystyle t_{n-k}}$ |
| ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | ${\displaystyle \cdots }$ | |
| ${\displaystyle n}$ | ${\displaystyle r_{n}}$ | ${\displaystyle {\frac {r_{n}}{r_{n-1}}}}$ | ${\displaystyle r_{n+k}}$ | ${\displaystyle t_{n}}$ |

We would like to find a formula for ${\displaystyle t_{n}}$ that approximates ${\displaystyle r_{n+k}}$.

One possibility is using the previous ${\displaystyle k}$ growth rates. In this case:

${\displaystyle t_{n}=r_{n}\cdot {\frac {r_{n}}{r_{n-1}}}\cdot {\frac {r_{n-1}}{r_{n-2}}}\cdot \ldots \cdot {\frac {r_{n-k+2}}{r_{n-k+1}}}\cdot {\frac {r_{n-k+1}}{r_{n-k}}}}$

Hence, ${\displaystyle t_{n}={\frac {r_{n}^{2}}{r_{n-k}}}}$
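The formula can be applied to a whole series of reported cases. The following sketch uses a hypothetical function name and a made-up series with constant daily growth; under that assumption the approximation ${\displaystyle t_{n}}$ coincides with the value ${\displaystyle r_{n+k}}$ that is reported ${\displaystyle k}$ days later, which is what the approximation aims at. With real data the growth is not constant, so the two values will differ.

```python
def approximate_true_cases(reported, k):
    """Return a list t with t[i] = r_i^2 / r_{i-k}.

    Entries are None where no value k days earlier exists or where
    that earlier value is zero (the ratio is undefined there).
    """
    estimates = []
    for i, r_i in enumerate(reported):
        if i < k or reported[i - k] == 0:
            estimates.append(None)
        else:
            estimates.append(r_i ** 2 / reported[i - k])
    return estimates


# HYPOTHETICAL reported cases that double every day, with a delay of k = 3 days.
k = 3
reported = [100 * 2 ** day for day in range(10)]
estimates = approximate_true_cases(reported, k)

for day in range(k, len(reported) - k):
    # With constant growth, t_n equals the value reported k days later.
    print(day, estimates[day], reported[day + k])
```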