Microsoft bringing breakthrough AI image captioning to Word, PowerPoint, Outlook

Yolanda Curtis
October 16, 2020

AI researchers at Microsoft reached a major milestone this week: they created a new artificial intelligence system that, in many cases, describes the contents of a photo more accurately than a human does. When presented with an image containing novel objects, the system draws on a visual vocabulary learned during pretraining to generate an accurate caption. Refining captioning techniques helps every user: it makes it easier to find the images you're looking for in search engines.

But while beating a benchmark is significant, the real test for Microsoft's new model will be how it functions in the real world.

Nonetheless, Microsoft's innovations will help make the internet a better place for visually impaired users and sighted individuals alike.


For comparison, back in 2016 Google claimed that its AI systems could caption images with 94 percent accuracy. Microsoft says the new model is twice as accurate as the image captioning model it has been using since 2015. "It represents not only understanding the objects in a scene, but how they're interacting, and how to describe them", according to Microsoft.

During pretraining, one or more tags are randomly masked and the model is asked to predict the masked tags conditioned on the image region features and the remaining tags. Typically, these sorts of models are trained on images paired with full captions, which makes it harder for them to learn how specific objects interact. The end result is better, more accurate captions.
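The masking scheme is easier to see in code. The sketch below is a toy illustration of that training step, not Microsoft's actual model: the region feature size, the tiny tag vocabulary, and the use of a plain transformer encoder are all assumptions chosen for brevity.

```python
# Toy illustration of masked-tag pretraining, not Microsoft's production model.
# Feature sizes, the tag vocabulary, and the plain TransformerEncoder are assumptions.
import torch
import torch.nn as nn

VOCAB = ["person", "dog", "frisbee", "grass", "[MASK]"]
MASK_ID = VOCAB.index("[MASK]")

class MaskedTagModel(nn.Module):
    def __init__(self, num_tags, region_dim=2048, d_model=64):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, d_model)  # project detector region features
        self.tag_embed = nn.Embedding(num_tags, d_model)   # embeddings for tags and [MASK]
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_tags)           # predict a tag id per tag position

    def forward(self, regions, tag_ids):
        x = torch.cat([self.region_proj(regions), self.tag_embed(tag_ids)], dim=1)
        x = self.encoder(x)                                # tags attend to the image regions
        return self.head(x[:, regions.size(1):])           # logits at the tag positions only

# One fake image: 3 detected regions plus the tags "person", "dog", "frisbee".
regions = torch.randn(1, 3, 2048)                          # stand-in for detector features
gold_tags = torch.tensor([[0, 1, 2]])
mask_pos = torch.tensor([[False, True, False]])            # hide the "dog" tag
masked_tags = gold_tags.clone()
masked_tags[mask_pos] = MASK_ID

model = MaskedTagModel(num_tags=len(VOCAB))
logits = model(regions, masked_tags)
# Loss is computed only on masked positions, conditioned on the regions and the visible tags.
loss = nn.functional.cross_entropy(logits[mask_pos], gold_tags[mask_pos])
loss.backward()
print(f"masked-tag prediction loss: {loss.item():.3f}")
```

Because the loss is applied only where tags were hidden, the model has to infer the missing concept from the image regions and the surrounding tags, rather than from a memorized full caption.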

"Given the benefit of this, we've worked to accelerate the integration of this research breakthrough and get it into production and Azure AI", Eric Boyd, corporate vice president of AI platform at Microsoft, told VentureBeat via phone earlier this week.


Microsoft benchmarked the VIVO-pretrained model (VIVO stands for visual vocabulary pretraining) on nocaps, a benchmark created to encourage the development of image captioning models that can learn visual concepts from alternative sources of data. The new capability is launching first in Azure Cognitive Services today and will roll out to Microsoft Word, Outlook, and PowerPoint soon, the company said this morning. Availability in Azure gives any developer the ability to integrate the tool into their apps. According to the World Health Organization, an estimated 285 million people of all ages are visually impaired, of whom 39 million are blind.

In addition to the latest version of the Cognitive Services Computer Vision API, Microsoft says the model is now included in Seeing AI, its app for people who are blind or have low vision.
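For developers, wiring the capability into an app amounts to a single HTTP call. The snippet below is a rough sketch against the Computer Vision REST "describe" operation; the resource endpoint, subscription key, image URL, and API version shown here are placeholders to verify against the current Azure documentation rather than guaranteed values.

```python
# Rough sketch: requesting a caption from the Computer Vision "describe" operation.
# Endpoint, key, image URL, and API version are placeholders -- confirm against Azure docs.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder resource
KEY = "<your-subscription-key>"                                    # placeholder key

def describe_image(image_url, max_candidates=1):
    """Return caption candidates (text + confidence) for a publicly reachable image URL."""
    response = requests.post(
        f"{ENDPOINT}/vision/v3.1/describe",
        params={"maxCandidates": max_candidates, "language": "en"},
        headers={
            "Ocp-Apim-Subscription-Key": KEY,
            "Content-Type": "application/json",
        },
        json={"url": image_url},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["description"]["captions"]

for caption in describe_image("https://example.com/photo.jpg"):  # placeholder image URL
    print(f'{caption["text"]} (confidence {caption["confidence"]:.2f})')
```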

