“Using Crowd-Sourced Data to Explore Police-Related-Deaths in the United States (2000–2017): The Case of Fatal Encounters”

Objectives: We evaluated the Fatal Encounters (FE) database as an open-source surveillance system for tracking police-related deaths (PRDs). Methods: We compared the coverage of FE data to several known government sources of police-related deaths and police homicide data. We also replicated incident selection from a recent review of the National Violent Death Reporting System. Results: FE collected data on n = 23,578 PRDs from 2000–2017. A pilot study and ongoing data integration suggest greater coverage than extant data sets. Advantages of the FE data include circumstance of death specificity, incident geo-locations, identification of involved police agencies, and near immediate availability of data. Disadvantages include a high rate of missingness for decedent race/ethnicity, potentially higher rates of missing incidents in older data, and the exclusion of more comprehensive police use-of-force and nonlethal use-of-force data, a critique applicable to all extant data sets. Conclusions: FE is the largest collection of PRDs in the United States and remains the most likely source for historical trend comparisons and police-department-level analyses of the causes of PRDs.


Introduction
Citizen deaths that occur during interactions with police officers are increasingly viewed by members of the general public and scholars as a public health concern in the United States [1,2]. Although the term "police homicides" is often used in discussions of citizen deaths during police activities, we prefer the term "police-related deaths" (PRDs) as it permits definitional granularity within a broader net of police violence. PRDs include, among other things: police homicides (law enforcement officers killing citizens, justifiably or otherwise), citizens who die in automobile accidents during vehicle pursuits, citizens who suffer medical emergencies during interactions with police, and citizens committing suicide with police on-scene.
In a recent commentary in the New England Journal of Medicine, Crosby and Lyons argue that legal intervention deaths "are not only devastating to the victims' families and the directly affected communities or neighborhoods; … they represent a significant public health burden and can incite further violence in which more people are killed" [2]. Crosby and Lyons call for a study of police homicides that systematically assesses the scope and nature of such deaths [2]. A related commentary notes that citizens being killed by the police "affect[s] the well-being of the families and communities of the deceased" [1]. Research also suggests that, beyond their impact on victims, families, and communities, police homicides can have long-lasting physical and mental health consequences for police officers, many of whom experience symptoms of "post-shooting stress disorder" [3][4][5].
An in-depth understanding of the broader community impact of police homicides and other PRDs requires thorough knowledge of the scope and nature of such deaths. Unfortunately, however, we lack reliable and comprehensive data about these sorts of deaths and the circumstances surrounding them. No public surveillance system in the United States counts PRDs, and the government data collection efforts intended to capture some aspect(s) of the PRD phenomenon, for example a) the Federal Bureau of Investigation's Supplementary Homicide Reports, b) the Bureau of Justice Statistics' Arrest-Related Deaths Reports, and c) the Centers for Disease Control's National Violent Death Reporting System, are inconsistent and unreliable [6][7][8]. In recent years, however, some citizens have responded to the omissions and flaws in these official government-produced sources by developing data sets designed to produce more accurate and complete counts of citizens who die during interactions with police officers.
These unofficial data rely on internet crowd-sourcing and other data collection efforts conducted by the public to catalogue some aspect(s) of PRDs; several researchers have suggested that these efforts may capture more citizen deaths [9] and may therefore be the best current strategy for collecting data on PRDs [10]. Unfortunately, very little is known about the quality of the information contained in data sets produced by citizen researchers. The primary aim of this paper is both to summarize the state of official data sources and to further our understanding of unofficial data collections by analyzing the relative advantages, limitations, and completeness of one of the most prominent sources of PRD data assembled by citizens to date: the Fatal Encounters Project (http://www.fatalencounters.org/).

Extant Data Sources
Currently, government-funded criminal justice data collections comprise two sources: 1) the voluntary justifiable homicides (JH) portion of the Supplementary Homicide Reports (SHR) collected under the Uniform Crime Reporting System of the FBI, and 2) the piecemeal Department of Justice's arrest-related-death (ARD) data that are part of the Deaths in Custody Reporting Program (DCRP). Researchers have long known that these data contain substantial omissions [7,11,12], and an internal review by the Bureau of Justice Statistics (BJS) notes that a majority of incidents may be missing from the arrest-related-death data [13], which have not been reported publicly since 2009. Currently, only 750 of approximately 17,985 (4.2%) law enforcement agencies voluntarily submit "justifiable homicides" to the FBI's SHR program [6][7], and the BJS reports that between 31% and 41% of ARDs in 2011 were not captured, with approximately 50% uncaptured for prior years (2003-2009) [14].

Federally Mandated and Other Data Collection Efforts
The DCRP has recently been mandated to cover all law enforcement agencies [15] and is moving ahead with plans to collect pilot data; however, the ARD program relies on data submission from a single State Reporting Coordinator (SRC) in each state [16], rather than on reports mandated of and produced by each police department. Each SRC must collect his/her own data, as law enforcement agencies are not required to systematically document or report incidents. These coordinators rely either exclusively (39%) or exclusively/partially (73%) on internet searches of news sources, while fewer than 20% use a law enforcement survey [14]. Going forward, we can expect this federally mandated program to continue to fail to capture a significant portion of PRDs, regardless of whether the program is made mandatory for all states.
Given the clear liabilities of the DOJ data sources, some researchers have turned to other government data sources to measure police-related deaths, including: a) the National Vital Statistics System (NVSS), which is based on death certificates, and b) the National Violent Death Reporting System (NVDRS), which is based on death certificates characterized as having resulted from "legal intervention" [9,17], as well as coroner/medical examiner and police reports.
In addition to the non-DOJ government data sets, some scholars have turned to crowd-sourced, internet-based sets developed by citizens, such as Killed by Police, The Counted, The Washington Post, Mapping Police Violence, and Fatal Encounters. However, very little is known about their quality, completeness, or reliability as data sources for quantifying the scope and nature of police-related deaths. To better our understanding of the strengths and weaknesses of citizen-based data sets that catalogue PRDs, we examine one of the most prominent citizen-based data sources assembled to date: the Fatal Encounters Project.

Unofficial PRD Data: Fatal Encounters
PRD data for FE are collected using three methods: 1) Freedom of Information Act (FOIA) and other public records requests of law enforcement agencies, 2) crowdsourced internet searches by volunteers, paid researchers, and the curator of FE, and 3) cross-checking of data against newly developed websites such as those maintained by The Guardian and The Washington Post. These deaths include police homicides, deaths that occur due to suicides in police presence, accidental deaths during foot pursuits, and accidental and use-of-force deaths during vehicular pursuits. Fatal Encounters is essentially a "living document" that is curated daily and is fact-checked and corrected by the curator and crowd-sourced contributors, with the goal of comprehensively documenting all PRDs in the United States.
Newspapers and online news stories have both been shown to be excellent sources of injury surveillance and reporting for a variety of phenomena [18][19][20][21][22]. Further, crowdsourcing of online/print news stories has been shown to be a valid method to comprehensively assess prevalence for social and behavioral research in public health [23]. Finally, crowd-sourcing is a relatively low-cost and efficient way to collect information on newsworthy events such as PRDs.

Advantages of FE
Although, as noted above, there are several crowd-sourced data sources that collect information on police homicides, Fatal Encounters is known to be a much more extensive source of data on PRDs in general, and police homicides in particular [9,24]. FE collects data as far back as 2000 and contains more variables than other data sources. First, Fatal Encounters collects an extensive array of police-related deaths with a diverse set of causes and circumstances (see Table 1). This gives researchers the flexibility to explore diverse characterizations of deaths. While incidents that would almost always be defined as police homicides (e.g., gunshots and bludgeoning) comprise a substantial proportion of these 23,578 police-related deaths, several PRDs involve circumstances of death, such as asphyxiation, that are unlikely to be reported in death certificates or official government sources. Further, although gunshots are the most prevalent circumstance of police-related deaths, Table 1 reveals that various other circumstances account for a non-trivial portion of PRDs and that vehicular pursuits are the second most prevalent. According to Table 1, suicides during an arrest are the third most prevalent police-related death in FE; while many of these are instances of individuals turning a gun on themselves during a pursuit, others reflect grayer areas such as decedents who intentionally burned their homes during standoffs or who drowned while evading arrest.
A second advantage of FE is that its collection of data back to 2000 allows for important trend analyses of police-related deaths. Third, and perhaps most importantly, FE identifies a death as police-related based on reports and follow-up reports made by journalists. Thus, it is not subject to the biases and social pressures that may inform whether a death is adjudicated to be the result of police involvement in an official document like a death certificate. For example, deaths ruled accidental in autopsy reports may in fact be police-related deaths, and autopsy declarations by forensic pathologists can be heavily influenced by police-provided information or biased law enforcement authorities.
Fourth, every incident of a PRD in FE is linked to an address that has been geo-coded. To date, 97% of the 23,578 incidents of police-related deaths in FE have been geo-coded to an exact latitude/longitude. A pilot assessment [24] of the quality of these geo-codes using 15 years of police-related deaths in New York City (n = 384) found that addresses could be described by five tiers of specificity: Tier 1) 90.3% of incidents could be identified by an exact street address or name/cross-street combination (predicted error of ±50 m); Tier 2) 1.3% identified by hundreds blocks (e.g., "800 block of Main Street"; predicted error ±200 m); Tier 3) 7.5% identified by roads (e.g., "I-84"; predicted error ±1,000-100,000 m); Tier 4) 0.7% identified by places (e.g., "College of Staten Island"; predicted error 1,000 m); and Tier 5) 0% with no address. Of course, since FE data are open-source, errors can be (and are) corrected as needed.
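To illustrate how location strings might be sorted into the tiers described above, the following is a minimal sketch in Python. The regular expressions and the function names `address_tier` and `tier_shares` are hypothetical and are not the coding rules used by FE or by the pilot assessment [24]; real address text would need far more careful handling.

```python
import re
from collections import Counter

# Illustrative patterns only; these are assumptions, not FE's actual coding rules.
HUNDRED_BLOCK = re.compile(r"^\d+\s+block\s+of\s+", re.IGNORECASE)
STREET_NUMBER = re.compile(r"^\d+\s+\S+")
CROSS_STREET = re.compile(r"\s(?:&|and|at)\s", re.IGNORECASE)
ROAD_ONLY = re.compile(r"^(?:I-\d+|US-\d+|Route\s+\d+|Highway\s+\d+)$", re.IGNORECASE)


def address_tier(address):
    """Return a specificity tier (1-5) for a free-text location string."""
    text = (address or "").strip()
    if not text:
        return 5  # Tier 5: no address
    if HUNDRED_BLOCK.search(text):
        return 2  # Tier 2: hundreds block
    if ROAD_ONLY.match(text):
        return 3  # Tier 3: road name only
    if STREET_NUMBER.match(text) or CROSS_STREET.search(text):
        return 1  # Tier 1: street address or cross-street combination
    return 4  # Tier 4: named place


def tier_shares(addresses):
    """Return the percentage of addresses that fall in each tier."""
    counts = Counter(address_tier(a) for a in addresses)
    total = sum(counts.values())
    return {tier: 100.0 * n / total for tier, n in sorted(counts.items())}
```

Checking the hundreds-block pattern before the street-number pattern matters here, because a string such as "800 block of Main Street" also begins with a number.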
Fifth, nearly all PRDs in FE are accompanied by news stories, which allows for a careful micro-analysis of each incident. Sixth, unlike the government sources of police homicide data, which can take several years to be released publicly, FE's ongoing data collection efforts, duplicate checks, and data cleaning lead to incidents being released within one week of the date of death, often within a day or two.
Seventh, variable availability in FE is much more extensive than other sources and includes the following variables: decedent's full name, decedent's age, decedent's gender, decedent's race/ethnicity, URL image of decedent, date of incident/death, location of death, zip code of death, GPS coordinates, agency involved, circumstances of death (gunshot, vehicle, stun-gun, bludgeoned with instrument, beaten, medical emergency, asphyxiated, domestic violence, stabbed, drug overdose, bean bag rounds, other), details of the death (e.g., routine arrest, suspicion of activity, weapon present, decedent shots fired), indicators of symptoms of mental illness in the victim (e.g., suicidal threats, law enforcement called by family to assist with mentally ill family member), judicial disposition (justified, excusable, criminal, pending investigation, and others), and links to relevant news articles.
Finally, FE data are open source: corrections and omissions can be submitted by the public. Although the FE curator ultimately determines what appears in the file, transparency is the primary motivating force behind the data collection and reporting, allowing individual researchers to make their own determinations.

Limitations of FE
An important limitation of FE is that there is no gold standard against which to compare the completeness of the incidents it collects. In fairness, this is true for every data source used to quantify police-related deaths. To assess the comprehensiveness of FE data for 2000-2015, a pilot study recently made FOIA requests of a random sample of 328 law enforcement agencies in a sample of 11 states (CT, FL, MA, ME, MT, NH, NV, NY, OR, RI, SD) (Farman 2016). Responses were obtained from 246 (75%) of the sampled agencies, and FE data were found to be fully complete for 9 of the states sampled. Data were missing for only 1 incident in CT (92% complete) and 8 in FL (95% complete); news stories for these incidents have since been located and added to the data. The pilot study further noted that FE data contained a substantial number of incidents that police departments did not report in response to public records requests. Additionally, FE is the only online project to collect data from before 2012; the Killed by Police (KBP) database, by comparison, collects data for May 2013-2018. For the years of overlap examined (2013-2015), FE was 99.1% complete, while KBP was 91.4% complete compared to FE [24].
A second limitation of FE is that the reporting of older incidents may be less complete, either because PRDs were historically less newsworthy or because old internet news stories have been deleted; on the other hand, the FOIA pilot did not indicate that older incidents were more likely to be missing from FE. A third limitation, and this critique applies to virtually all sources of PRD data (cf. Vice News non-fatal police shootings), is that incidents reported by FE are only fatal outcomes and do not reflect the full continuum of police-related violence, which includes gunshots that miss their target as well as nonfatal gunshots and other uses of force that result in injury. Ultimately, researchers will need to document and collect this wide range of data in order to fully understand police use of force.
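The completeness figures above are proportions of known incidents that a source already contains. A minimal sketch of that arithmetic in Python follows. The function names and the choice of denominator are assumptions: `completeness` takes counts of found and missing incidents, and `compare_sources` measures each source against the union of both, which may differ from the denominator used in the pilot study [24]. Matching incidents across sources by a shared identifier is also a simplification; in practice it would require linking records on name, date, and location.

```python
def completeness(found, missing):
    """Share of known incidents that a data set already contains."""
    total = found + missing
    if total == 0:
        raise ValueError("no known incidents to compare against")
    return found / total


def compare_sources(source_a, source_b):
    """Measure two sets of incident identifiers against their union.

    Assumption: the union of both sources stands in for the unknown
    true population of incidents.
    """
    union = source_a | source_b
    if not union:
        raise ValueError("both sources are empty")
    return {
        "a_complete": len(source_a) / len(union),
        "b_complete": len(source_b) / len(union),
        "only_in_a": len(source_a - source_b),
        "only_in_b": len(source_b - source_a),
    }
```

Because no gold standard exists, either measure describes coverage relative to what has been found so far, not coverage of all PRDs.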
A fourth limitation of FE concerns missing data. While missingness is rare for most variables (e.g., name of decedent 2.7%, age 2.8%, gender 0.2%), nearly 40% of the cases (38.9%) are missing information about decedents' race/ethnicity. This is largely because this variable is coded based on news reports and accompanying photos in reported or related news stories. We implemented Bayesian Improved Surname Geocoding (BISG) [25] to impute missing race/ethnicity data. Using the non-missing incidents as a validation sample, we find that the use of surnames combined with Census demographic data at the level of the geo-coded incident block group yields statistically significant (p < .001) point-biserial correlations with race/ethnicity as follows: non-Hispanic White, r = .73; non-Hispanic Black, r = .72; Hispanic, r = .89; Asian, r = .73; Native American, r = .55.
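The general logic of Bayesian Improved Surname Geocoding and of the validation check can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes surname and location are conditionally independent given race/ethnicity, the function names are hypothetical, and the probabilities passed in would in practice come from Census surname and block-group tables. The point-biserial correlation is computed as the Pearson correlation between a 0/1 indicator (observed group membership) and a continuous score (the imputed probability).

```python
import math


def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine a surname-based prior with geography via Bayes' rule.

    Both arguments are dicts keyed by race/ethnicity category.
    Assumption: surname and location are independent given category.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("posterior is undefined: all products are zero")
    return {race: value / total for race, value in unnormalized.items()}


def point_biserial(indicator, score):
    """Pearson correlation between a 0/1 indicator and a continuous score."""
    n = len(indicator)
    mean_x = sum(indicator) / n
    mean_y = sum(score) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(indicator, score))
    var_x = sum((x - mean_x) ** 2 for x in indicator)
    var_y = sum((y - mean_y) ** 2 for y in score)
    return cov / math.sqrt(var_x * var_y)
```

In a validation sample, `indicator` would hold 1 for decedents whose reported race/ethnicity matches a given category and 0 otherwise, and `score` would hold the imputed probability for that category.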

Incident Count of FE Data
To assess the coverage of incidents in FE, we first compared various circumstances of PRDs in FE to the DOJ's arrest- intentional use of force plus vehicular homicides (panel 3), intentional use of force plus vehicular homicides and foot pursuit deaths (panel 4), and intentional use of force plus vehicular homicides, foot pursuit deaths, and medical emergencies and overdoses (panel 5). Finally, using a fairly restrictive definition of PRDs that includes only intentional use of force and vehicular pursuit homicides (panel 6), the number of police homicides as a percentage of the total number of population-based homicides rose steadily from a low of 5% in 2000 to a peak of 11.1% in 2013 and 2014, before declining to 9.0% in 2015. At this time, lacking a true gold standard, there is no way to assess the completeness of the FE data, but we do note that FE contains substantially more incidents, even when definitions of incidents closely match. Tedious but ongoing efforts are being made to ensure that incidents contained in all government data sets are also contained within FE.
Next, we replicated the strict definition and selection criteria for police homicides (2005-2012) detailed in a recent publication [9], which found the NVDRS to be far superior to the NVSS and ARD data sources. Using these precise definitions, we found that overall, FE documented 10% more incidents of police homicide than the NVDRS (see Table 2).

Discussion and Conclusions
The Deaths in Custody Reporting Program (DCRP) will be the only government-mandated collection of police homicide data for the United States going forward. However, as noted, an internal review of this data collection system discovered large holes in coverage that were not simply reducible to voluntary data submission [13]. In addition, the FBI has recently begun a "National Use-of-Force Data Collection Program" (https://www.fbi.gov/services/cjis/ucr/use-of-force), which is designed to capture all shootings (missed shots, injuries, and deaths). This program began in January of 2019 and data are not yet available for assessment; unfortunately, the program remains completely voluntary.