This project presents an enhanced version of the oil spill database from IncidentNews, a platform maintained by NOAA's Office of Response and Restoration (ORR). As of December 7, 2023, IncidentNews had recorded 4,473 oil and chemical release incidents. Each incident entry comprises two categories of data: incident-level and post-level.
The incident-level data includes attributes such as location, date, cause, potential maximum release amount, and incident description. All incident-level data can be downloaded as a CSV file from the Raw Incident Data page on the IncidentNews platform. The post-level data provides a series of textual updates following the incident. Post-level data can be accessed separately for each incident via its homepage. For example, the post data for incident #1275 can be found on this page.
The primary limitation of the original dataset is that it lacks actual release amounts (RA) and provides only potential maximum estimates, which may not reflect what was actually released. Our enhanced database addresses this gap by adding structured data on the actual RA, extracted from incident descriptions and related posts using Natural Language Processing (NLP) tools.
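To give a feel for what extracting a release amount from free text involves, here is a minimal sketch. It uses a plain regular expression, which is a simplification and not the NLP tooling used to build this dataset. The function name `extract_release_amounts`, the pattern, and the unit table are assumptions made for illustration; the only fixed fact is that one oil barrel equals 42 US gallons.

```python
import re

# Assumed unit table: conversion factors to US gallons (1 oil barrel = 42 US gallons).
GALLONS_PER_UNIT = {"gallon": 1.0, "gal": 1.0, "barrel": 42.0, "bbl": 42.0}

# A number (optionally with thousands separators or a decimal part) followed by a unit.
AMOUNT_PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?\s*(gallons?|gals?|barrels?|bbls?)\b",
    re.IGNORECASE,
)


def extract_release_amounts(text: str) -> list[float]:
    """Return every amount mentioned in `text`, converted to gallons (hypothetical helper)."""
    amounts = []
    for whole, frac, unit in AMOUNT_PATTERN.findall(text):
        value = float(whole.replace(",", "") + frac)
        unit_key = unit.lower().rstrip("s")  # "gallons" -> "gallon", "bbls" -> "bbl"
        amounts.append(value * GALLONS_PER_UNIT[unit_key])
    return amounts
```

A pattern like this only finds candidate quantities. It cannot tell whether a number refers to the amount actually released, the vessel's capacity, or the amount recovered, which is why real incident text needs more careful processing.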
The enhanced dataset includes 3,550 oil spill incidents from 1967 to 2023. For each incident, we added three new columns:
- actual RA (gals): The actual oil spill amount identified from incident texts.
- RA source: Whether the actual RA was extracted from the description, posts, or both.
- update label: The relationship between the actual RA and the original potential maximum release value. The labels include:
  - "RA confirmed": The actual RA is identical to the potential maximum RA.
  - "RA updated": The actual RA differs from the potential maximum RA.
  - "RA newly acquired": The potential maximum RA is unavailable, but the text provides an actual RA.
  - "No information better than potential maximum RA": The potential maximum RA is available, but no actual RA was identified in the description or posts to confirm or update it.
  - "RA still unavailable": Both the actual RA and the potential maximum RA are absent for the incident.
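The five labels follow from whether each of the two amounts is present and whether they agree. The sketch below restates the definitions above as a decision function. The name `update_label`, the use of `None` for a missing value, and exact equality as the test for "identical" are assumptions for illustration, not necessarily how the labels were assigned in the dataset.

```python
from typing import Optional


def update_label(actual_ra: Optional[float], potential_max_ra: Optional[float]) -> str:
    """Map the two release amounts (gallons, None if missing) to an update label."""
    if actual_ra is None and potential_max_ra is None:
        return "RA still unavailable"
    if actual_ra is None:
        # Only the potential maximum is known; nothing in the text confirms or updates it.
        return "No information better than potential maximum RA"
    if potential_max_ra is None:
        # The text supplies an amount where the original data had none.
        return "RA newly acquired"
    if actual_ra == potential_max_ra:  # assumption: "identical" means exact equality
        return "RA confirmed"
    return "RA updated"
```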
This dataset can be used for environmental research, risk assessment, and policy-making to better understand oil spill impacts. It can be analyzed with any standard data analysis tool.
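As a starting point, the following sketch counts incidents per update label and totals the actual RA using only the Python standard library. It assumes the data is available as CSV text whose headers match the column names listed above ("actual RA (gals)" and "update label") and that a missing actual RA appears as a blank cell. The function name `summarize_labels` is hypothetical.

```python
import csv
from collections import Counter


def summarize_labels(csv_lines):
    """Count incidents per update label and total the actual RA in gallons.

    `csv_lines` is any iterable of CSV text lines, such as an open file.
    """
    label_counts = Counter()
    total_actual_ra = 0.0
    for row in csv.DictReader(csv_lines):
        label_counts[row["update label"]] += 1
        raw_amount = (row.get("actual RA (gals)") or "").strip()
        if raw_amount:  # assumption: a blank cell means no actual RA was identified
            total_actual_ra += float(raw_amount.replace(",", ""))
    return label_counts, total_actual_ra
```

To run it on a downloaded file, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object to `summarize_labels`. Adjust the column names if the headers in your copy differ.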
This database was enhanced by Yiming Liu under the supervision of Hua Cai. Special thanks go to NOAA's Office of Response and Restoration (ORR) for their help in clarifying questions about the original dataset.
For further inquiries, feel free to contact us via email at liu3285@purdue.edu or huacai@purdue.edu.