This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code. The results are also saved in a self-contained html document with the suffix .nb.html. If you want pure r code (for example to run outside RStudio) you can easily extract code with the command knit(‘notebook.Rmd’,tangle=TRUE) which will save a file ‘notebook.R’ under your working directory.
This notebook displays the Voronoi and walking distance maps from the cholera package, and calculates the actual and predicted cases as discussed in section 5.1.5 of “Causality in the Time of Cholera”
For a brief introduction to Snow’s work, see:
First, we need to load the package. Then we can display the map that shows the pumps, and the address of deaths.
rm(list=ls()) # starts a fresh workspace
library(knitr)
# install.packages("cholera")
library("cholera")
snowMap()
We can plot the map with Voronoi neighborhoods overlaid. A Voronoi diagram or Voronoi tesselation builds “neighborhoods” around each pump by drawing a boundary between pumps that is equidistant from each pump.
plot(neighborhoodVoronoi())
We can calculate the actual number of deaths for each Voronoi neighborhood, and the expected number if deaths were evenly distributed across the map (corresponding to an assumption of equal population density across the map - not such a bad assumption for this densly-populated area of London in 1855). If deaths were even across the map, then the predicted number of deaths (for a pump neighborhood) would be the total number of deaths (321) times the area of a neighborhood.
The function neighborhoodVoronoi from the cholera package calculates the Voronoi neighborhoods, and the $expected.data contains the “pump id”, “map area”, and the “neighborhood %” for each Voronoi neighborhood. We can populate a table with actual and expected deaths for each Voronoi pump neighborhood. And then we can calculate a Pearson chi-squared statistic to see if they actual and expected are different. In this case (as we would expect) the sum of squares is large and we can soundly reject the hypothesis that the actual and expected are the same. (The 1% significance level for a chi-square with 12 degrees of freedom is 26.2. Our calculated value is much larger. )
counts_voronoi <- neighborhoodVoronoi()
# Create table of "Actual vs Expected"
voronoi_actvpred <- matrix(0,nrow=14,ncol=5)
colnames(voronoi_actvpred) <- c("pump.id","Actual","Area %","Expected","Pearson")
rownames(voronoi_actvpred) <- c("Market Place","Adam and Eve Court","Berners Street","Newman Street","Marlborough Mews",
"Little Marlborough Street","Broad Street","Warwick Street","Bridle Street","Rupert Street",
"Dean Street","Tichborne Street","Vigo Street","Total")
# Populate the table from the "counts_voronoi"
voronoi_actvpred[1:13,"pump.id"] <- counts_voronoi$expected.data[,1]
voronoi_actvpred[1:13,"Actual"] <- summary(counts_voronoi) # the actual counts (summing across neighborhoods??)
voronoi_actvpred[1:13,"Area %"] <- counts_voronoi$expected.data[,3] # pump neighborhoods as % of total area
voronoi_actvpred["Total","Actual"] <- sum(voronoi_actvpred[1:13,"Actual"]) # Calculate total counts and area (area should be 1.0)
voronoi_actvpred[1:13,"Expected"] <- voronoi_actvpred[1:13,"Area %"] * voronoi_actvpred["Total","Actual"] # Calculate Expected = %area * total counts
voronoi_actvpred[1:13,"Pearson"] <- ((voronoi_actvpred[1:13,"Actual"] - voronoi_actvpred[1:13,"Expected"])^2) /
voronoi_actvpred[1:13,"Expected"]
voronoi_actvpred[1:13,"Area %"] <- 100*voronoi_actvpred[1:13,"Area %"] # Convert from decimal to percent
voronoi_actvpred["Total",c("Expected","Pearson","Area %")] <- colSums(voronoi_actvpred[1:13,c("Expected","Pearson","Area %")]) # Sum of squares for Pearson chi-squared statistic
kable(voronoi_actvpred,digits=1,caption="Actual versus Expected Deaths by Voronoi Pump Neighborhood",format='pandoc')
pump.id | Actual | Area % | Expected | Pearson | |
---|---|---|---|---|---|
Market Place | 1 | 0 | 6.1 | 19.5 | 19.5 |
Adam and Eve Court | 2 | 1 | 1.9 | 6.2 | 4.4 |
Berners Street | 3 | 10 | 4.4 | 14.0 | 1.1 |
Newman Street | 4 | 13 | 9.5 | 30.4 | 10.0 |
Marlborough Mews | 5 | 3 | 8.2 | 26.5 | 20.8 |
Little Marlborough Street | 6 | 39 | 12.4 | 39.9 | 0.0 |
Broad Street | 7 | 182 | 8.5 | 27.2 | 881.5 |
Warwick Street | 8 | 12 | 6.9 | 22.1 | 4.6 |
Bridle Street | 9 | 17 | 4.8 | 15.5 | 0.1 |
Rupert Street | 10 | 38 | 5.9 | 19.0 | 19.1 |
Dean Street | 11 | 2 | 7.7 | 24.6 | 20.8 |
Tichborne Street | 12 | 2 | 9.2 | 29.7 | 25.8 |
Vigo Street | 13 | 2 | 14.5 | 46.4 | 42.5 |
Total | 0 | 321 | 100.0 | 321.0 | 1050.2 |
The Voronoi neighborhood calculation above uses the Euclidean distance from the location of a death to a pump - essentially assuming that people would “fly over” buildings in their travels from their home to get water from a pump. This is not realistic. Snow recognized this, and built a “walking neighborhood” around pumps by walking the streets and finding the closest walking distance for deaths. We can do this (or actually the package cholera can do this for us). This defines “walking neighborhoods” and calculates the number of deaths assigned to each pump’s walking neighborhood. These will be slightly different from the Voronoi “actual deaths”, as can be seen in the table below.
Calculating the expected number of deaths is a little more difficult. What the cholera package does (with the function neighborhoodWalking(case.set = “expected”)) is to distribute a large number of “artificial deaths” uniformly across the map. For each of these deaths, the function walks to pumps along the roads and finds the pump with the shortest walking distance. At the end we have a list of artificial deaths for each pump, and for each neighborhood we can calculate the fraction of all our original artificial deaths. This fraction or percent is what we would expect given random distribution of deaths across the map, and so we can distribute our total actual deaths (321) according to these percents to given an expected count, assuming even distribution of deaths across the map.
The function neighborhoodWalking from the cholera package calculates the neighborhoods for actual and evenly-distributed deaths. From this we can populate a table with actual and expected deaths for each walking neighborhood. And then we can calculate a Pearson chi-squared statistic to see if they actual and expected are different. In this case (as we would expect) the sum of squares is large and we can soundly reject the hypothesis that the actual and expected are the same. (The 1% significance level for a chi-square with 12 degrees of freedom is 26.2. Our calculated value is much larger. )
counts_walking_act <- neighborhoodWalking() # Takes actual deaths, walks them to nearest pump to define walking neighborhood
counts_walking_exp <- neighborhoodWalking(case.set = "expected") # Distributes artificial "deaths" across map, for each
# walk to nearest pump. NB - this is computationally expensive
# Create table of "Actual vs Expected"
walking_actvpred <- matrix(0,nrow=14,ncol=6)
colnames(walking_actvpred) <- c("pump.id","Actual","Neigh %","Expected","Pearson","Act Voronoi")
rownames(walking_actvpred) <- c("Market Place","Adam and Eve Court","Berners Street","Newman Street","Marlborough Mews",
"Little Marlborough Street","Broad Street","Warwick Street","Bridle Street","Rupert Street",
"Dean Street","Tichborne Street","Vigo Street","Total")
# Populate the table from the "counts_voronoi"
walking_actvpred[1:13,"pump.id"] <- 1:13
walking_actvpred[3:12,"Actual"] <- summary(counts_walking_act) # the actual counts (summing across neighborhoods??)
xx <- summary(counts_walking_exp) # The counts of "artificial deaths" by pump neighborhood. We want to convert to fraction
yy <- sum(xx) # Total of deaths across the map
walking_actvpred[1:13,"Neigh %"] <- xx / yy # pump neighborhood deaths as % of total deaths
walking_actvpred["Total","Actual"] <- sum(walking_actvpred[1:13,"Actual"]) # Calculate total counts and area (area should be 1.0)
walking_actvpred[1:13,"Expected"] <- walking_actvpred[1:13,"Neigh %"] * walking_actvpred["Total","Actual"] # Calculate Expected = %area * total counts
walking_actvpred[1:13,"Pearson"] <- ((walking_actvpred[1:13,"Actual"] - walking_actvpred[1:13,"Expected"])^2) /
walking_actvpred[1:13,"Expected"]
walking_actvpred[1:13,"Neigh %"] <- 100*walking_actvpred[1:13,"Neigh %"] # Convert from decimal to percent
walking_actvpred["Total",c("Expected","Pearson","Neigh %")] <- colSums(walking_actvpred[1:13,c("Expected","Pearson","Neigh %")]) # Sum of squares for Pearson chi-squared statistic
walking_actvpred[,"Act Voronoi"] <- voronoi_actvpred[,"Actual"]
kable(walking_actvpred,digits=1,caption="Actual versus Expected Deaths by Walking Neighborhood",format='pandoc')
pump.id | Actual | Neigh % | Expected | Pearson | Act Voronoi | |
---|---|---|---|---|---|---|
Market Place | 1 | 0 | 7.1 | 22.7 | 22.7 | 0 |
Adam and Eve Court | 2 | 0 | 0.5 | 1.7 | 1.7 | 1 |
Berners Street | 3 | 12 | 6.0 | 19.1 | 2.7 | 10 |
Newman Street | 4 | 6 | 8.4 | 27.0 | 16.4 | 13 |
Marlborough Mews | 5 | 1 | 4.6 | 14.6 | 12.7 | 3 |
Little Marlborough Street | 6 | 44 | 17.2 | 55.1 | 2.2 | 39 |
Broad Street | 7 | 189 | 8.6 | 27.7 | 941.2 | 182 |
Warwick Street | 8 | 14 | 7.0 | 22.3 | 3.1 | 12 |
Bridle Street | 9 | 32 | 6.1 | 19.6 | 7.8 | 17 |
Rupert Street | 10 | 20 | 4.7 | 14.9 | 1.7 | 38 |
Dean Street | 11 | 2 | 7.7 | 24.8 | 21.0 | 2 |
Tichborne Street | 12 | 1 | 8.9 | 28.5 | 26.5 | 2 |
Vigo Street | 13 | 0 | 13.3 | 42.8 | 42.8 | 2 |
Total | 0 | 321 | 100.0 | 321.0 | 1102.5 | 321 |