MIT's AI deduces ingredients and recipes from food photos
If you've ever been served a delicious dish but were too shy to ask for the recipe, MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) may have the answer. Working with the Qatar Computing Research Institute (QCRI), a CSAIL team has been developing an artificial intelligence system called Pic2Recipe that can predict the ingredients in a dish from an image, and even suggest recipes for similar dishes.
The internet is a tremendously valuable tool for scientists because it provides them with a ready-made source of words, images, and audio to create all sorts of databases. With proper indexing and annotation, such web-derived data has in recent years led to great strides forward in facial recognition software, voice interfaces, and artificial intelligence in general. But when it comes to food, things are still a bit behind the times.
"In computer vision, food is mostly neglected because we don't have the large-scale datasets needed to make predictions," says Yusuf Aytar, an MIT postdoc. "But seemingly useless photos on social media can actually provide valuable insight into health habits and dietary preferences."
Building on previous work by Swiss and Hong Kong researchers, the CSAIL team developed a database of over one million food images called "Recipe1M," along with the algorithms needed to recognize them and extract useful information from them. The team then fed the data into an artificial neural network called Pic2Recipe, which was trained to look at the images and find patterns that allow it to draw connections between the food and the recipes.
The idea is that if you give Pic2Recipe an image of some prepared food, it should be able to deduce a list of ingredients, then correlate this to other images and provide a list of similar recipes. There's even a simple online version that the public can use to try out the technology.
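To make the matching step more concrete, here is a minimal sketch of how a photo could be compared against a set of recipes once both have been turned into lists of numbers. It is an illustration only, not the CSAIL team's code: the article does not say how Pic2Recipe scores similarity, so the use of cosine similarity, the function names, and the tiny hand-made vectors are all assumptions. In a real system, those vectors would come from the trained neural network.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_recipes(image_vec, recipe_vecs, top_k=3):
    """Return the names of the top_k recipes whose vectors are closest to the photo's vector."""
    scored = [
        (cosine_similarity(image_vec, vec), name)
        for name, vec in recipe_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical, hand-made vectors standing in for what a trained network would produce.
recipe_vecs = {
    "chocolate chip cookies": [0.9, 0.1, 0.0],
    "blueberry muffins": [0.7, 0.3, 0.1],
    "vegetable lasagna": [0.0, 0.2, 0.9],
}
photo_vec = [0.8, 0.2, 0.0]  # assumed vector for a photo of a baked dessert

ranked = rank_recipes(photo_vec, recipe_vecs, top_k=2)
print(ranked)
```

With these toy numbers, the two dessert recipes rank ahead of the lasagna, which is the kind of behavior the researchers describe: a photo is placed near the recipes it most resembles, and the closest ones are returned as suggestions.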
So far, Pic2Recipe works best with desserts, like cookies or muffins, while the contents of more ambiguous foods, like sushi, smoothies, and cocktails, are as difficult for the software to guess as they are for human diners. The team says the system also has trouble with very similar dishes, like variations on lasagna, and had to be adjusted to focus on the general ingredients that the dishes have in common before comparing recipes.
According to MIT, the next step is to tweak the system to go beyond ingredients and infer how the dish was prepared – if the tomatoes were diced or stewed, for example – and to tell apart different varieties of the same ingredient, like different mushrooms, onions, or potatoes.
The hope is that Pic2Recipe will have a number of applications. It could, for example, provide insights into people's dining habits or let individuals track their daily nutrition simply by snapping images of their meals – not to mention helping curious cooks recreate restaurant dishes at home.
"This could potentially help people figure out what's in their food when they don't have explicit nutritional information," says CSAIL graduate student Nick Hynes. "For example, if you know what ingredients went into a dish but not the amount, you can take a photo, enter the ingredients, and run the model to find a similar recipe with known quantities, and then use that information to approximate your own meal."
The results of the research will be presented this month at the Computer Vision and Pattern Recognition conference in Honolulu.
The video below shows how Pic2Recipe works.