
New Google Chrome extension lets you copy and delete text in images

A new Chrome extension called Project Naptha allows users to copy and delete text from images

It's generally just accepted that text embedded in images on the Web is inaccessible. Because images are rendered as a single layer, that's just the way it is ... or was, because a new extension for Google Chrome called Project Naptha now allows users to highlight and copy text from within images.

The first thing to say is that this functionality does exist elsewhere. Certain pieces of software, such as Microsoft OneNote, Google Drive and Google Street View, use optical character recognition (OCR) to identify text within images.

Project Naptha, on the other hand, uses a method called Stroke Width Transform (SWT) that was developed by Microsoft Research. Unsatisfied with the open-source OCR algorithms that were available, developer Kevin Kwok spent time trying to find a solution. He tells Gizmag that he spent weeks looking at letters as "cryptogram puzzles" and recognizing text with an advanced language model, as well as more weeks "trying to build a kind of brute force text recognizer."

Ultimately, he decided to use SWT. This approach uses the width of the lines that make up letters as a means of identifying elements that could potentially be text, rather than trying to spot predetermined separate features as a marker of text. This gives it certain advantages over OCR.
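
To give a feel for the stroke-width idea, here is a much-simplified Python sketch. It is not the SWT algorithm itself, which measures width along the gradient direction from each edge pixel, and it is not Naptha's code. It only measures horizontal runs of "ink" in a tiny binary image and asks whether those widths are roughly uniform, as they tend to be for lettering. The function names and the 0.5 threshold are assumptions made for illustration.

```python
from statistics import mean, pstdev


def horizontal_stroke_widths(rows):
    """Return the length of every horizontal run of foreground (1) pixels."""
    widths = []
    for row in rows:
        run = 0
        for pixel in row:
            if pixel:
                run += 1
            elif run:
                widths.append(run)
                run = 0
        if run:  # a run that reaches the right edge of the row
            widths.append(run)
    return widths


def looks_like_text(rows, max_variation=0.5):
    """Crude check: strokes of near-constant width are a hint of text.

    max_variation is a hypothetical cut-off on the coefficient of
    variation (standard deviation divided by mean) of the run widths.
    """
    widths = horizontal_stroke_widths(rows)
    if len(widths) < 2:
        return False
    return pstdev(widths) / mean(widths) <= max_variation


# Two vertical strokes of equal width, like a pair of lowercase "l"s.
strokes = [
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 0],
]

# An irregular blob whose run widths vary a lot.
blob = [
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]

print(looks_like_text(strokes), looks_like_text(blob))
```

Because this toy version only scans left to right, a horizontal bar such as the crossbar of an "H" would register as one very wide stroke. Measuring width across the stroke, whatever its orientation, is what lets the real technique cope with that.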

"[Stroke Width Transform] is capable of identifying regions of text in a language-agnostic manner," explains Kwok. "In a sense that’s kind of like what a human can do; we can recognize that a sign bears written language without knowing what language it's written in, never mind what it means."

SWT is also able to detect angled text and text in photos, and indeed was actually designed for the purpose of the latter. This means it isn't limited to making out text in scans of printed letters or screenshots from the Web, in which text tends to be more similar to that produced by computers and is therefore easier to pick out.

Kwok explains to Gizmag that Project Naptha was something he initially worked on as part of a hackathon at MIT (at which he won 2nd place). "Selecting text in pictures was something which was quite doable on a technical level, that is, the technology that it requires to function exists, and has done so for quite some time," he explains. "But for some kind of inexplicable reason, it hadn't been done before. Everything else, the transcription, translation, text erasure, and modification just came as an obvious and trivial addition once the first, kind of useless, part of the idea was accomplished."

Kwok gives a number of example sources with which Project Naptha can be used, including scans, photos containing text, diagrams with labels, screenshots and images with text overlays. He also demonstrates the ability for text overlays to be deleted from images and the image backfilled, as well as for highlighted text within images to be translated. To provide a seamless experience for the user, Naptha tracks the movement of the cursor and continuously extrapolates a second ahead based on its position and velocity, so it can begin processing any potential text that the user might want to pick out from an image.
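
Kwok doesn't spell out how the prediction is computed, but the general idea of projecting a cursor forward from its position and velocity can be sketched in a few lines of Python. This is a minimal illustration that assumes simple linear extrapolation from the two most recent samples; the names `CursorSample` and `predict_position` are hypothetical and are not taken from Naptha.

```python
from dataclasses import dataclass


@dataclass
class CursorSample:
    t: float  # timestamp in seconds
    x: float  # horizontal position in pixels
    y: float  # vertical position in pixels


def predict_position(prev, curr, lookahead=1.0):
    """Linearly extrapolate where the cursor will be `lookahead` seconds from now.

    Velocity is estimated from the two most recent samples (an assumption
    made for this sketch; a real implementation might smooth over more).
    """
    dt = curr.t - prev.t
    if dt <= 0:
        # No usable time difference: assume the cursor stays where it is.
        return (curr.x, curr.y)
    vx = (curr.x - prev.x) / dt
    vy = (curr.y - prev.y) / dt
    return (curr.x + vx * lookahead, curr.y + vy * lookahead)


# The cursor moved 10 px right and 5 px down in half a second.
earlier = CursorSample(t=0.0, x=100.0, y=100.0)
latest = CursorSample(t=0.5, x=110.0, y=105.0)
print(predict_position(earlier, latest))  # (130.0, 115.0)
```

A predicted point like this could then be checked against the image regions on the page, so that any region it lands in is processed before the cursor actually arrives.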

Kwok acknowledges that much of the functionality in Project Naptha needs to be improved and suggests that, over time, text recognition, translation and deletion can all be developed further (he actually says in a tweet that the reason he has launched now is to make use of some credit he has with Google that was due to run out). Nevertheless, the basic functionality is very usable and the potential for the more advanced technology is exciting.

"I think the real value that Naptha provides is the experience, which as far as I am aware, is unprecedented," muses Kwok. "In terms of its various subcomponents and algorithms, it's probably quite a few years behind the state of the art, and one of the exciting things would be the possibility of a team to bridge that gap between research and consumer use."

If you were wondering, the name Naptha is derived from naphtha, a substance used in lighter fuels, and from the process of highlighting text.

You can find out more about Project Naptha and test drive a demo at the Project Naptha website.

Chrome extension: Project Naptha

3 comments
Robt
"..the ability for text overlays to be deleted from images and the image backfilled.."
Not good. That would negate photographers' ability to post images online while overlaying with 'Copyright etc.'
Jon A.
Does not work at all well on small text. And by small, I mean roughly 10 point or less.
The type of font seems to matter as well. It doesn't do at all well with the standard "LOLCat font," for instance.
Jeff Kang
The author responds to some comments here: https://news.ycombinator.com/item?id=7629396
I think that Sikuli, an open source automation tool, also uses the Tesseract optical character recognition program, which is an option in Naptha. For those that don’t know what Sikuli is, it’s like AutoHotkey and AutoIt. However, instead of writing keystrokes (e.g. send {control}f) to access the interface elements that could be involved in macros, you just take screenshots of the interface elements. e.g. click . (Picture of the in-line screenshots that are used in Sikuli scripting: http://i.imgur.com/2dqGSPr.png).