Scrolling through your Facebook, Twitter or Instagram feed is proof positive that we love taking photos - of our food, our children, our drunken antics and, most of all, ourselves. No matter how good your happy snap, finding a way to caption it can be tricky. It's still early days, but one day you could hand off the responsibility to a Captionbot.
Powered by Microsoft's Cognitive Services, the bot looks over your images and gives rudimentary descriptions of what it can see using a Computer Vision API, an Emotion API and a Bing Image API. This is the same base software Microsoft has used for its How Old Do I Look? system.
To actually create the captions, this system has been coupled with the language system from Tay, Microsoft's attempt at a chat bot that was shut down after a vulnerability led to it tweeting racist and sexist remarks.
The photo captioning system is not completely accurate, but it attempts to describe the person in an image, what they're doing and their emotions in the moment. It can also recognize animals and describe landscapes, although it did respond with "I am not really confident" to both of the images we uploaded, before mistaking one of our male journalists for a female. Okay, so that journalist was me...
What Microsoft's system won't do is read the caption aloud, which means blind and visually impaired people may still need to turn to Facebook's bot for help. The Facebook bot gives suggestions as to what's in an image, with each response qualified by a rating of how confident it is in the description.
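To give a feel for how a confidence rating might shape a caption's wording, here is a minimal Python sketch. It is not Microsoft's or Facebook's code: the function name `hedge_caption`, the 0-to-1 confidence scale and the 0.5 cut-off are all assumptions made for illustration, and only the two phrasings are borrowed from the responses quoted in this article.

```python
def hedge_caption(description: str, confidence: float, threshold: float = 0.5) -> str:
    """Wrap a description in wording that reflects how sure the system is.

    Hypothetical helper: the 0-to-1 scale and the default threshold are
    illustrative assumptions, not values published by any vendor.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        # Reasonably sure: plain phrasing.
        return f"I think it's {description}."
    # Low confidence: say so before offering the guess.
    return f"I am not really confident, but I think it's {description}."


if __name__ == "__main__":
    print(hedge_caption("a dog", 0.9))
    print(hedge_caption("a pair of skis", 0.2))
```

A single cut-off keeps the example short; a real system could just as easily use several bands or show the raw score next to the caption.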
At the moment, the Captionbot system is in testing. Once it has returned a caption, you can rate the response. Before you try to corrupt it by uploading all the crazy, lewd photos you've got, it's worth bearing in mind that the system keeps all the photos it has analyzed.
Source: Microsoft
It said a koala was a dog. It said a skeleton was a bird in front of a mirror, and it said the Earth was a pair of skis! Back to the drawing board, Microsoft!
That would be some impressive Artificial Intelligence in an area where actual intelligence seems to be lacking.
Gold pocket watch > thought it was a cell phone
Kid reading a book > got that one right
Recurve bow and arrows leaning against a target > street sign (not a terrible guess...)
Owl perched on roof > bird on a while (nice job)
But here was the best one: Handwritten note, pen on white paper, measurements for a tuxedo > "I think it's a person on a surf board in a skate park."