Facebook AI researchers have built a system that can analyze a photo of food and then create a recipe from scratch. Snap a photo of a particular dish and, within seconds, the system can analyze the image and generate a recipe with a list of ingredients and steps needed to create the dish.
The system can’t determine the exact type of flour used or the cooking temperature, but it will come up with a very credible approximation. For now, the system is available only for internal research at Facebook.
The team’s “inverse cooking” system uses computer vision, a technology that extracts information from digital images and videos to give computers a high level of understanding of the visual world. Mobile apps that allow users to identify plant and dog species, or that scan your credit card, also leverage computer vision.
Facebook’s computer vision system leverages two neural networks — algorithms that are designed to recognize patterns in digital images. Michal Drozdzal, a Research Scientist at Facebook AI Research, explains that the inverse cooking system splits the image-to-recipe problem into two parts: One neural network identifies the ingredients that it sees in the dish, while the other devises a recipe from the list.
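The two-stage split can be made concrete with a minimal sketch. This is not Facebook's implementation; the function names and the toy lookup table standing in for the trained networks are purely illustrative.

```python
# Hypothetical sketch of the "inverse cooking" split: stage 1 maps an
# image to a set of ingredients, stage 2 maps that ingredient set to
# instruction steps. All names and data here are illustrative.

def predict_ingredients(image_label):
    """Stage 1: stand-in for the ingredient-recognition network.
    A trivial lookup replaces the real image encoder."""
    toy_model = {
        "pizza": {"flour", "tomato", "cheese"},
        "omelette": {"egg", "butter", "salt"},
    }
    return toy_model.get(image_label, set())

def generate_recipe(ingredients):
    """Stage 2: stand-in for the recipe-generation network, which
    conditions on the predicted ingredient set."""
    steps = [f"Prepare the {item}." for item in sorted(ingredients)]
    steps.append("Combine and cook until done.")
    return steps

ingredients = predict_ingredients("pizza")
recipe = generate_recipe(ingredients)
```

The point of the split is that the second network never sees pixels: it works only from the intermediate ingredient list, which is also what makes its output inspectable.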
Food recognition is one of the toughest areas of natural image processing. Food comes in all shapes and sizes — what AI scientists call “high intraclass variability” — and changes appearance when it’s cooked.
Previous image-to-recipe programs took a simpler approach. In fact, they thought more like librarians than chefs. Drozdzal explains that these less sophisticated systems merely retrieved a recipe from a fixed data set based on the similarity of the photo to the images on file: “It was like having a photo of the food and then searching in a huge cookbook of pictures to match it up.”
The Facebook AI team implemented an ingredient-predicting network that whittles 17,000 possible ingredients down to 1,500 and trained the model to predict that certain ingredients often appear together, like salt and pepper, cheese and tomato, and cinnamon and sugar.
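The co-occurrence idea can be illustrated with a few lines of Python. The real network learns these associations implicitly from training data; the toy corpus and counting scheme below just make the intuition concrete.

```python
from collections import Counter
from itertools import combinations

# Toy corpus standing in for recipe ingredient lists. The actual model
# learns co-occurrence statistically; this sketch counts them directly.
recipes = [
    {"salt", "pepper", "chicken"},
    {"salt", "pepper", "beef"},
    {"cheese", "tomato", "basil"},
    {"cinnamon", "sugar", "apple"},
    {"cheese", "tomato", "dough"},
]

# Count how often each unordered pair of ingredients appears together.
pair_counts = Counter()
for recipe in recipes:
    for a, b in combinations(sorted(recipe), 2):
        pair_counts[(a, b)] += 1

def likely_companions(ingredient, k=2):
    """Rank the ingredients that most often co-occur with `ingredient`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == ingredient:
            scores[b] += n
        elif b == ingredient:
            scores[a] += n
    return [item for item, _ in scores.most_common(k)]

print(likely_companions("salt"))  # pepper ranks first
```

A prediction network can exploit the same signal: once it is fairly confident about one ingredient, frequent companions like salt and pepper become more probable candidates for the rest of the set.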
The recipe-generating network also leverages the Recipe1M data set, which the team slimmed down from around 1 million recipes to approximately 350,000. Recipes that made the cut all contained images and had two or more ingredients or instructions. The data set also furnishes the neural network with a vocabulary of nearly 25,000 unique words in addition to the information from the image and the ingredient list. The network also analyzes the interplay between image and ingredients for insights on how the food was processed to produce the resulting dish.
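The filtering pass described above is straightforward to sketch. The field names below are illustrative, not the actual Recipe1M schema, and the threshold logic mirrors the criteria as stated in the text.

```python
# Sketch of the data-set filtering step: keep only entries that have an
# image and two or more ingredients or instructions. Field names are
# hypothetical, not the real Recipe1M schema.
raw_recipes = [
    {"title": "Toast", "image": "toast.jpg",
     "ingredients": ["bread", "butter"],
     "instructions": ["Toast the bread.", "Spread the butter."]},
    {"title": "Ice", "image": None,          # no image: filtered out
     "ingredients": ["water"],
     "instructions": ["Freeze."]},
]

def keep(recipe):
    has_image = recipe["image"] is not None
    enough = (len(recipe["ingredients"]) >= 2
              or len(recipe["instructions"]) >= 2)
    return has_image and enough

filtered = [r for r in raw_recipes if keep(r)]
```

Filtering of this kind trades corpus size for quality: the roughly 350,000 surviving recipes all pair an image with enough text for the network to learn the image-to-recipe mapping from.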
The team is continuing to fine-tune the system. They also want to train it to distinguish visually similar foods, like mayonnaise versus sour cream, and, more importantly, they would like to start cooking the recipes.