Machine learning system detects 10 outbreaks of foodborne illness from Yelp reviews

Columbia University and the New York City Department of Health and Mental Hygiene (DOHMH) have developed a machine learning system that uses keywords found in Yelp reviews to identify foodborne illnesses and outbreaks. The findings are published in the Journal of the American Medical Informatics Association.

A prototype text classifier for the system was introduced in 2012. Based on the keywords used in a review, it was designed to determine whether the review described a person experiencing a foodborne illness and whether it indicated multiple foodborne illnesses.
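As a rough illustration of the two decisions described above, here is a minimal Python sketch that flags a review by keyword matching. It is not the system built by Columbia and DOHMH, which relies on machine learning. The keyword lists, the function names `contains_any` and `classify_review`, and the output labels are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical keyword lists, chosen for illustration only; not taken from the study.
SICK_TERMS = {"sick", "vomiting", "diarrhea", "nausea", "food poisoning"}
GROUP_TERMS = {"we", "both", "all of us", "my friends", "my family"}


def contains_any(text, terms):
    """Return True if any term appears in text as a whole word or phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms
    )


def classify_review(text):
    """Return two flags: illness mentioned, and more than one person affected."""
    sick = contains_any(text, SICK_TERMS)
    # The "multiple" flag is only meaningful when illness is mentioned at all.
    multiple = sick and contains_any(text, GROUP_TERMS)
    return {"sick": sick, "multiple": multiple}
```

Calling `classify_review` on a review's text returns a dictionary with the two flags. A trained classifier would learn such signals from labeled reviews rather than rely on a fixed word list, which misses paraphrases and can misfire on negations.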

"Effective information extraction regarding foodborne illness from social media is of high importance--online restaurant review sites are popular, and many people are more likely to discuss food poisoning incidents in such sites than on official government channels," said Luis Gravano and Daniel Hsu, coauthors of the study and professors of Computer Science at Columbia Engineering. "Using machine learning has already had a significant impact on the detection of outbreaks of foodborne illnesses."

The system has assisted DOHMH epidemiologists in identifying 8,523 complaints of foodborne illnesses and 10 outbreaks since 2012, showing a strong correlation between text classifiers. Additionally, the study describes the expansion of the system to include social media sources like Twitter to improve the system's monitoring of possible illnesses and outbreaks.

"The collaboration with Columbia University to identify reports of food poisoning in social media is crucial to improve foodborne illness outbreak detection efforts in New York City," said Health Department epidemiologists and coauthors Vasudha Reddy and Katelynn Devinney. "The incorporation of new data sources allows us to detect outbreaks that may not have been reported and for the earlier identification of outbreaks to prevent more New Yorkers from becoming sick."