About

This map shows the probability that a restaurant will be flagged with two or more critical violations on its next health inspection. These predictions come from a model trained on each restaurant's past health inspection scores, descriptive characteristics, and sentiment scores extracted from Yelp reviews.

Restaurants with red icons have been flagged as being at risk of receiving two or more critical violations on their next health inspection.

Where the Data Comes From

New York City restaurant inspection results are published by the Department of Health on the NYC OpenData Portal. Each row contains an individual violation along with the final score and grade that the restaurant received for that inspection. The dataset also contains descriptive information about the restaurant's location and cuisine.
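Because each row is a single violation, the rows have to be grouped by inspection before an inspection can be labeled as having two or more critical violations. The sketch below shows one way that grouping might look. The field names (restaurant_id, inspection_date, critical) and the sample rows are illustrative assumptions, not the dataset's actual column names or values.

```python
from collections import defaultdict

# Hypothetical rows, one per violation, mirroring the layout described above.
# Field names are assumptions, not the real NYC OpenData column names.
rows = [
    {"restaurant_id": "A", "inspection_date": "2016-01-10", "critical": True},
    {"restaurant_id": "A", "inspection_date": "2016-01-10", "critical": True},
    {"restaurant_id": "A", "inspection_date": "2016-01-10", "critical": False},
    {"restaurant_id": "B", "inspection_date": "2016-02-03", "critical": True},
    {"restaurant_id": "B", "inspection_date": "2016-02-03", "critical": False},
]


def label_inspections(violation_rows, threshold=2):
    """Count critical violations per inspection and flag counts >= threshold."""
    counts = defaultdict(int)
    for row in violation_rows:
        key = (row["restaurant_id"], row["inspection_date"])
        # Adding 0 for non-critical rows keeps inspections with no
        # critical violations in the result.
        counts[key] += 1 if row["critical"] else 0
    return {key: n >= threshold for key, n in counts.items()}


labels = label_inspections(rows)
```

Here the inspection of restaurant "A" has two critical violations and is labeled as flagged, while the inspection of restaurant "B" has one and is not.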

This data was supplemented with reviews scraped from Yelp along with information regarding each establishment's price bracket and star rating.

How Scores are Assigned

We constructed a Random Forest Classifier to predict whether or not a restaurant would receive two or more critical violations on its next health inspection. The model uses past violations, descriptive features, and aggregated sentiment scores for each establishment's Yelp reviews. These sentiment scores were generated using the NRC Emotion Lexicon.
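To illustrate how lexicon-based sentiment scores can be aggregated per establishment, here is a minimal sketch. The NRC Emotion Lexicon associates words with emotion and sentiment categories. The tiny TOY_LEXICON below is a hypothetical stand-in for it, and the aggregation scheme (the share of tokens matching each category, averaged across reviews) is an assumption rather than the exact method used for this map.

```python
import re
from collections import Counter

# Toy stand-in for the NRC Emotion Lexicon: word -> associated categories.
# These entries are illustrative assumptions, not taken from the lexicon file.
TOY_LEXICON = {
    "dirty": {"disgust", "negative"},
    "sick": {"disgust", "sadness", "negative"},
    "delicious": {"joy", "positive"},
    "friendly": {"joy", "trust", "positive"},
}


def review_scores(text, lexicon=TOY_LEXICON):
    """Return, per category, the fraction of tokens in one review that match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category in lexicon.get(token, ()):
            counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty reviews
    return {category: n / total for category, n in counts.items()}


def aggregate_scores(reviews, lexicon=TOY_LEXICON):
    """Average the per-review category scores across all reviews."""
    totals = Counter()
    for text in reviews:
        totals.update(review_scores(text, lexicon))
    n = len(reviews) or 1
    return {category: score / n for category, score in totals.items()}


features = aggregate_scores(["Dirty tables", "Delicious food"])
```

The resulting dictionary holds one number per category for the establishment, which could then sit alongside past violations and descriptive features as inputs to a classifier.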

What are Critical Violations?

Health inspectors check for a wide variety of violations, but only critical violations pose a threat to public health. Critical violations range from not storing meat at a proper temperature to having rats or cockroaches on the premises. By targeting only critical violations, our model helps prioritize inspections for establishments that are at higher risk of causing an outbreak of foodborne illness.

Learn more about the health inspection process.