Does Taleb's claim of epidemics with a death count of >1,000 being in the lower hundreds (in the last 2,500 years) seem accurate?

score:25

Accepted answer

The key quotes in full, I think, are as follows. Ioannidis et al. critique it with:

Examining 72 major epidemics recorded through history, they demonstrate a fat-tailed mortality impact. However, they analyze only the 72 most noticed outbreaks, a sample with astounding selection bias. The most famous outbreaks in human history are preferentially selected from the extreme tail of the distribution of all outbreaks. Tens of millions of outbreaks with a couple deaths must have happened throughout time. Probably hundreds of thousands might have claimed dozens of fatalities. Thousands of outbreaks might have exceeded 1,000 fatalities. Most eluded the historical record. The four garden variety coronaviruses may be causing such outbreaks every year [15,16]. One of them, OC43 seems to have been introduced in humans as recently as 1890, probably causing a “bad influenza year” with over a million deaths [17].

Taleb et al defend this with:

Finally, when the authors in [15] state that "Tens of millions of outbreaks with a couple deaths must have happened throughout time," to support their selection bias claim against [1], they seem to ignore the fact that the analysis deals with pandemics and not with a single sternutation. The class of events under considerations in [1] is precisely defined as "pandemics with fatalities in excess of 1K," and their dataset likely contains most (if not all) of them. Worrying about many false alarms in the tail of the distribution of pandemic fatalities is thus misplaced.

Unfortunately, neither of these is amazingly helpful. Ioannidis's citations do not explicitly say "here's some estimates for how many people these mundane illnesses are killing". On the other hand, Taleb's rebuttal seems to just say "we're not worrying about all the tiny outbreaks, just the big ones", and does not address the key issue of whether that list of 72 is comprehensive.

Instinctively, I have some doubts. Cirillo's list is (as noted in comments) heavily sourced from Wikipedia's list of epidemics; I make it 55/72 (56 with one dual-cited). Ten more are cited to a page on "ListFist", which itself seems to be citing back to the same Wikipedia list, and then there are a couple of sources used once or twice. The WP list as of March 2020 is interesting: comparing it to the one in the article, it looks like they have mostly just skipped entries with no quoted death toll. For example, in the "15th century and earlier" section, WP includes plagues with unknown death tolls in 412 BC and in 746 AD, neither of which is shown here (though they do include the 664 British plague, for which WP had a blank death toll).

So, some immediate problems appear. Firstly, the Wikipedia article does not claim to be comprehensive (later versions do misleadingly say "a list of the largest", but this is not in the 2020 version), yet the paper's authors seem to have assumed it was.

Secondly, it is trivial to follow the sources in Wikipedia and find a death toll well in excess of 1,000 for some of those "blank" ones. For example, the cited source for the 1636 English plague gives a reasonably precise figure of 10,400 in its introductory paragraph, and mentions other outbreaks in 1603 and 1625. Wikipedia mentions 1603 but not 1625; Cirillo's list has neither. It turns out 1625 killed 26,350 people per a footnote in that article, and 1603 "over thirty thousand". We've just extended the list by three entries, or about 4%, with just one country in a 33-year period.
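As a quick sanity check on that 4% figure, here is a minimal Python sketch that uses only the numbers quoted above: 72 entries in Cirillo's list and the three English plague outbreaks missing from it. The variable names are illustrative, and since the 1603 toll is given only as "over thirty thousand", it is entered as a lower bound.

```python
# Figures quoted in this answer; the names are illustrative, not from any source.
cirillo_list_size = 72

# English plague outbreaks absent from Cirillo's list, with the quoted death tolls.
# 1603 is reported only as "over thirty thousand", so 30_000 is a lower bound.
missing_english_outbreaks = {1603: 30_000, 1625: 26_350, 1636: 10_400}

# Keep only outbreaks above the 1,000-death threshold used in the paper.
qualifying = {year: deaths
              for year, deaths in missing_english_outbreaks.items()
              if deaths > 1_000}

extension = len(qualifying) / cirillo_list_size
span_years = max(qualifying) - min(qualifying)

print(f"{len(qualifying)} extra outbreaks over {span_years} years "
      f"extend the list by {extension:.1%}")
```

All three outbreaks clear the threshold by an order of magnitude, so the result does not depend on how the 1603 lower bound is chosen.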

Thirdly, just looking at Cirillo's list, the distribution is striking. Up to the mid-sixth-century AD, they are all in the Mediterranean world; 562, 627, 688 are in Mesopotamia; 664 in Britain; 735 is the first in Asia proper. Then nothing until the Black Death in the fourteenth century. It seems vanishingly unlikely that, to pick some of the obvious points:

  • a) China or India - each of which had a population in the tens of millions - did not have any substantial epidemic outbreaks for most of their history;
  • b) There was only one major epidemic outbreak pre-1 AD;
  • c) There were none in the early Mesopotamian civilisations or in ancient Egypt;
  • d) There were none in Europe or the Mediterranean for most of the medieval period.

On that first point, in a comment above I mentioned McNeill's Plagues and Peoples (1976). It has an appendix of epidemics recorded in Chinese sources; at a guess, there are around 200. These include what appear to be very major epidemics (killing a substantial proportion of the population in some provinces) in 312, 322, 413, 468, 682, 707, 762, 806... and those are just the ones with relative death tolls noted. It would be challenging to calculate numbers for an analysis like this one, granted, but it seems hard to just dismiss them from consideration because no estimate is available.


Update: skimming the papers citing the original Cirillo paper, we find "Intensity and frequency of extreme novel epidemics" (PNAS, 2021), whose authors offer a dataset that they have compiled.

In summary, the 1600 to 1945 dataset includes 182 epidemics with known occurrence, duration, and number of deaths, 108 known to have caused less than 10,000 deaths, and 105 for which only occurrence and duration are recorded, for a total of 395 epidemics.

So that gives us almost 200 epidemics identified in a roughly 350-year period with over 10,000 deaths, plus an indeterminate number more among those whose death tolls are unknown, which is well in excess of the original figure.
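For anyone wanting to check the arithmetic, here is a rough Python sketch built only from the counts in the PNAS summary quoted above, the 72 entries in the original list, and the 2,500-year span from the question. The variable names are my own, and the per-century rates are a crude comparison, not anything computed in either paper.

```python
# Counts from the PNAS (2021) summary quoted above, covering 1600 to 1945.
known_deaths = 182        # occurrence, duration, and number of deaths known
under_10k = 108           # known to have caused fewer than 10,000 deaths
deaths_unrecorded = 105   # only occurrence and duration recorded

total = known_deaths + under_10k + deaths_unrecorded
period_years = 1945 - 1600

# Original list: 72 entries over roughly 2,500 years (span taken from the question).
cirillo_list_size = 72
cirillo_span_years = 2500

# Crude events-per-century rates, for comparison only.
pnas_rate = known_deaths / (period_years / 100)
cirillo_rate = cirillo_list_size / (cirillo_span_years / 100)

print(f"PNAS total: {total} epidemics over {period_years} years")
print(f"Per century: {pnas_rate:.1f} (PNAS, deaths known) "
      f"vs {cirillo_rate:.1f} (original list)")
```

The three categories do sum to the quoted total of 395, and the first category alone is about two and a half times the size of the original list despite covering a far shorter period.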

Upvote:6

While my question has been answered well by @Andrew, I'd like to mention what Corral (2021) had to say, since he shares similar concerns about the Black Death being the only event in the dataset from 750 to 1450 AD. He also provides additional data for Spain that is not included in the dataset (much as @Andrew did for China):

In fact, an important limitation when studying the distribution of fatalities in epidemics comes from the available data, not only because of the small sample size (N = 72 in the data of Ref. [6]), but also from the incompleteness of the data (with a bias in favor of very large events that resampling [6] cannot correct) and from the lack of homogeneity in time (with just one event, the Black Death, between 750 and 1450, and 11 events since 2008 in the data of Ref. [6]).

In any case, one can dig a little in the existing records and find many more historical events. As an example, Villalba [43] reported several epidemics in Spain with more than 10 000 fatalities that are not considered in the data of Ref. [6] (these missing Spanish epidemics took place in 1283, 1394, 1490, 1564, 1589, 1637, 1726, 1741, 1784, and 1800). Certainly, there is nothing special about Spain, and other countries can contribute more or less in the same way with more “hidden” epidemics. Nevertheless, the compilation of a reliable record for historical epidemics is something that should not be done by probabilists, statisticians, or physicists and needs to be carefully undertaken by true epidemiologists and historians. We urge here for the necessity of such an important endeavor.
