---
title: "Snow and cholera maps"
author: "[Thomas Coleman](http://www.hilerun.org/econ)"
output: html_notebook
---
# Maps for 1855 Broad Street cholera outbreak
#### See "Causality in the Time of Cholera" working paper at https://papers.ssrn.com/abstract=3262234 and my [John Snow project website](http://www.hilerun.org/econ/papers/snow)
#### Using the _cholera_ package from Lindbrook on GitHub: https://github.com/lindbrook/cholera
#### # T Coleman July 2018, April 2023
#### This notebook is licensed under the [BSD 2-Clause License](https://opensource.org/licenses/BSD-2-Clause)
This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code. The results are also saved in a self-contained html document with the suffix *.nb.html*. If you want pure r code (for example to run outside RStudio) you can easily extract code with the command *knit('notebook.Rmd',tangle=TRUE)* which will save a file 'notebook.R' under your working directory.
## Introduction
This notebook displays the Voronoi and walking distance maps from the _cholera_ package, and calculates the actual and predicted cases as discussed in section 5.1.5 of "Causality in the Time of Cholera"
For a brief introduction to Snow's work, see:
+ **Snow's original 1855 monograph** (it is masterful): Snow, John. 1855. *On the Mode of Communication of Cholera*. 2nd ed. London: John Churchill. http://archive.org/details/b28985266.
+ **The best popular exposition I have found**: Johnson, Steven. 2007. *The Ghost Map: The Story of Londonâ€™s Most Terrifying Epidemic--and How It Changed Science, Cities, and the Modern World*. Reprint edition. New York: Riverhead Books.
+ **Another good popular version**: Hempel, Sandra. 2007. *The Strange Case of the Broad Street Pump: John Snow and the Mystery of Cholera*. First edition. Berkeley: University of California Press.
+ **Tufte's classic discussion of Snow's mapping** (a topic I don't cover here): Tufte, Edward R. 1997. *Visual Explanations: Images and Quantities, Evidence and Narrative*. 1st edition. Graphics Press.
+ **Biography**: Vinten-Johansen, Peter, Howard Brody, Nigel Paneth, Stephen Rachman, and Michael Russell Rip. 2003. *Cholera, Chloroform and the Science of Medicine: A Life of John Snow*. Oxford; New York: Oxford University Press. Linked on-line resources https://johnsnow.matrix.msu.edu/snowworks.php
### Map with Pumps
First, we need to load the package. Then we can display the map that shows the pumps, and the address of deaths.
```{r}
rm(list=ls()) # starts a fresh workspace
library(knitr)
# install.packages("cholera")
library("cholera")
snowMap()
```
### Voronoi Diagram
We can plot the map with Voronoi neighborhoods overlaid. A Voronoi diagram or Voronoi tesselation builds "neighborhoods" around each pump by drawing a boundary between pumps that is equidistant from each pump.
```{r}
plot(neighborhoodVoronoi())
```
### Actual vs Expected Counts for Voronoi Neighborhoods (Voronoi Tesselation)
We can calculate the actual number of deaths for each Voronoi neighborhood, and the expected number if deaths were evenly distributed across the map (corresponding to an assumption of equal population density across the map - not such a bad assumption for this densly-populated area of London in 1855). If deaths were even across the map, then the predicted number of deaths (for a pump neighborhood) would be the total number of deaths (321) times the area of a neighborhood.
The function _neighborhoodVoronoi_ from the _cholera_ package calculates the Voronoi neighborhoods, and the $expected.data contains the "pump id", "map area", and the "neighborhood %" for each Voronoi neighborhood. We can populate a table with actual and expected deaths for each Voronoi pump neighborhood. And then we can calculate a Pearson chi-squared statistic to see if they actual and expected are different. In this case (as we would expect) the sum of squares is large and we can soundly reject the hypothesis that the actual and expected are the same. (The 1% significance level for a chi-square with 12 degrees of freedom is 26.2. Our calculated value is much larger. )
```{r}
counts_voronoi <- neighborhoodVoronoi()
# Create table of "Actual vs Expected"
voronoi_actvpred <- matrix(0,nrow=14,ncol=5)
colnames(voronoi_actvpred) <- c("pump.id","Actual","Area %","Expected","Pearson")
rownames(voronoi_actvpred) <- c("Market Place","Adam and Eve Court","Berners Street","Newman Street","Marlborough Mews",
"Little Marlborough Street","Broad Street","Warwick Street","Bridle Street","Rupert Street",
"Dean Street","Tichborne Street","Vigo Street","Total")
# Populate the table from the "counts_voronoi"
voronoi_actvpred[1:13,"pump.id"] <- counts_voronoi$expected.data[,1]
voronoi_actvpred[1:13,"Actual"] <- summary(counts_voronoi) # the actual counts (summing across neighborhoods??)
voronoi_actvpred[1:13,"Area %"] <- counts_voronoi$expected.data[,3] # pump neighborhoods as % of total area
voronoi_actvpred["Total","Actual"] <- sum(voronoi_actvpred[1:13,"Actual"]) # Calculate total counts and area (area should be 1.0)
voronoi_actvpred[1:13,"Expected"] <- voronoi_actvpred[1:13,"Area %"] * voronoi_actvpred["Total","Actual"] # Calculate Expected = %area * total counts
voronoi_actvpred[1:13,"Pearson"] <- ((voronoi_actvpred[1:13,"Actual"] - voronoi_actvpred[1:13,"Expected"])^2) /
voronoi_actvpred[1:13,"Expected"]
voronoi_actvpred[1:13,"Area %"] <- 100*voronoi_actvpred[1:13,"Area %"] # Convert from decimal to percent
voronoi_actvpred["Total",c("Expected","Pearson","Area %")] <- colSums(voronoi_actvpred[1:13,c("Expected","Pearson","Area %")]) # Sum of squares for Pearson chi-squared statistic
kable(voronoi_actvpred,digits=1,caption="Actual versus Expected Deaths by Voronoi Pump Neighborhood",format='pandoc')
```
### Actual vs Expected Counts for Voronoi Neighborhoods (Voronoi Tesselation)
The Voronoi neighborhood calculation above uses the Euclidean distance from the location of a death to a pump - essentially assuming that people would "fly over" buildings in their travels from their home to get water from a pump. This is not realistic. Snow recognized this, and built a "walking neighborhood" around pumps by walking the streets and finding the closest walking distance for deaths. We can do this (or actually the package _cholera_ can do this for us). This defines "walking neighborhoods" and calculates the number of deaths assigned to each pump's walking neighborhood. These will be slightly different from the Voronoi "actual deaths", as can be seen in the table below.
Calculating the expected number of deaths is a little more difficult. What the _cholera_ package does (with the function _neighborhoodWalking(case.set = "expected")_) is to distribute a large number of "artificial deaths" uniformly across the map. For each of these deaths, the function walks to pumps along the roads and finds the pump with the shortest walking distance. At the end we have a list of artificial deaths for each pump, and for each neighborhood we can calculate the fraction of all our original artificial deaths. This fraction or percent is what we would expect given random distribution of deaths across the map, and so we can distribute our total actual deaths (321) according to these percents to given an expected count, assuming even distribution of deaths across the map.
The function _neighborhoodWalking_ from the _cholera_ package calculates the neighborhoods for actual and evenly-distributed deaths. From this we can populate a table with actual and expected deaths for each walking neighborhood. And then we can calculate a Pearson chi-squared statistic to see if they actual and expected are different. In this case (as we would expect) the sum of squares is large and we can soundly reject the hypothesis that the actual and expected are the same. (The 1% significance level for a chi-square with 12 degrees of freedom is 26.2. Our calculated value is much larger. )
#### NOTE: the function _neighborhoodWalking(case.set = "expected")_ is computationally intensive. It will use multiple cores to parallelize the calculation, but it is still slow.
```{r}
counts_walking_act <- neighborhoodWalking() # Takes actual deaths, walks them to nearest pump to define walking neighborhood
counts_walking_exp <- neighborhoodWalking(case.set = "expected") # Distributes artificial "deaths" across map, for each
# walk to nearest pump. NB - this is computationally expensive
# Create table of "Actual vs Expected"
walking_actvpred <- matrix(0,nrow=14,ncol=6)
colnames(walking_actvpred) <- c("pump.id","Actual","Neigh %","Expected","Pearson","Act Voronoi")
rownames(walking_actvpred) <- c("Market Place","Adam and Eve Court","Berners Street","Newman Street","Marlborough Mews",
"Little Marlborough Street","Broad Street","Warwick Street","Bridle Street","Rupert Street",
"Dean Street","Tichborne Street","Vigo Street","Total")
# Populate the table from the "counts_voronoi"
walking_actvpred[1:13,"pump.id"] <- 1:13
walking_actvpred[3:12,"Actual"] <- summary(counts_walking_act) # the actual counts (summing across neighborhoods??)
xx <- summary(counts_walking_exp) # The counts of "artificial deaths" by pump neighborhood. We want to convert to fraction
yy <- sum(xx) # Total of deaths across the map
walking_actvpred[1:13,"Neigh %"] <- xx / yy # pump neighborhood deaths as % of total deaths
walking_actvpred["Total","Actual"] <- sum(walking_actvpred[1:13,"Actual"]) # Calculate total counts and area (area should be 1.0)
walking_actvpred[1:13,"Expected"] <- walking_actvpred[1:13,"Neigh %"] * walking_actvpred["Total","Actual"] # Calculate Expected = %area * total counts
walking_actvpred[1:13,"Pearson"] <- ((walking_actvpred[1:13,"Actual"] - walking_actvpred[1:13,"Expected"])^2) /
walking_actvpred[1:13,"Expected"]
walking_actvpred[1:13,"Neigh %"] <- 100*walking_actvpred[1:13,"Neigh %"] # Convert from decimal to percent
walking_actvpred["Total",c("Expected","Pearson","Neigh %")] <- colSums(walking_actvpred[1:13,c("Expected","Pearson","Neigh %")]) # Sum of squares for Pearson chi-squared statistic
walking_actvpred[,"Act Voronoi"] <- voronoi_actvpred[,"Actual"]
kable(walking_actvpred,digits=1,caption="Actual versus Expected Deaths by Walking Neighborhood",format='pandoc')
```