Project Naptha

Original author(s) Kevin Kwok
Developer(s) Google Chrome
Initial release April 2013 (2013-04)
Stable release Chrome: 0.9.3 / July 7, 2014 (2014-07-07)
Written in JavaScript
Operating system Chrome
Size 428KB
Type Browser extension
Website www.projectnaptha.com

Project Naptha is a browser extension for Google Chrome that allows users to highlight, copy, edit and translate text from within images. [1] It was created by developer Kevin Kwok, [2] and released in April 2014 as a Chrome add-on. It was first made available only on Google Chrome, downloadable from the Chrome Web Store. It was later made available on Mozilla Firefox, downloadable from the Mozilla Firefox add-ons repository, but was soon removed. The reason for the removal remains unknown. [3]

The extension relies on imaging technology to find and read text in pictures. [4] Similar technologies have also been employed to produce hardcopy art and to identify such works. [5]

Project Naptha identifies text in images automatically by combining several optical character recognition (OCR) algorithms, including libraries developed by Microsoft Research and Google. The OCR step builds up a model of the text regions, words and letters in each image. [6]

The OCR technology that Project Naptha adopts differs slightly from that used by software such as Google Drive and Microsoft OneNote to find and analyse text within images. Project Naptha also makes use of a method called the Stroke Width Transform (SWT), [7] developed by Microsoft Research in 2008 as a form of text detection.

Origin of name

The name Naptha is derived from naphtha, a general term that originated a few thousand years ago and refers to flammable liquid hydrocarbon. The process of highlighting text also inspired the naming of the project.

Difficulty in translation of words from images

Before software such as Project Naptha arrived, editing, copying or quoting text inside images was difficult. The only way to search for or copy a sentence from an image was to transcribe the text manually.

History

In May 2012, Kevin Kwok [2] was reading about seam carving, an algorithm that can rescale images without distorting or damaging the quality of the image. Kwok noticed that the seams tend to converge and arrange themselves in a way that cuts through the spaces between letters. A particularly verbose comic inspired him to develop software that could read images (with canvas), work out the positions of the lines and letters, and draw selection overlays to assuage a pervasive text-selection habit.

Kwok's first attempt was simple. He projected the image onto its side to form a histogram of the pixels in each row. The significant valleys of the resulting histogram served as a signature for the ends of text lines. Once the horizontal lines were detected, each line was cropped and the histogram process was repeated until all horizontal lines in the image had been identified. To determine the positions of the letters, a similar process was carried out, this time vertically. However, the vertical pass was unsuccessful, as the projections it created were not readable. The approach proved applicable only to horizontal machine-printed text. Faced with these technical difficulties, Kwok decided to abandon the project in 2012.
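The row-projection step described above can be sketched roughly as follows. This is an illustration only, not Kwok's code: it assumes the image has already been binarized into rows of 1 (ink) and 0 (background), and the function name `find_text_lines` and the `threshold` parameter are hypothetical.

```python
def find_text_lines(image, threshold=0):
    """Return (start, end) row ranges whose ink count exceeds threshold.

    image: list of equal-length rows, 1 = ink pixel, 0 = background.
    """
    # Project the image onto its side: one ink count per row.
    profile = [sum(row) for row in image]

    lines = []
    start = None
    for y, count in enumerate(profile):
        if count > threshold and start is None:
            start = y                      # leaving a valley: a line begins
        elif count <= threshold and start is not None:
            lines.append((start, y - 1))   # entering a valley: the line ends
            start = None
    if start is not None:                  # a line that touches the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
print(find_text_lines(page))  # [(1, 2), (4, 4)]
```

Running the same idea on columns within one cropped line would correspond to the vertical pass, which the text reports did not work well in practice.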

It was not until Kevin Kwok went on to study at the Massachusetts Institute of Technology (MIT) and entered a hackathon that he picked up the project again. The project eventually won him second place. To him, selecting text in pictures was manageable on a technical level. The relevant technology had existed and been readily available for quite some time, yet for some inexplicable reason it had not been applied to translating text from images. Once Kwok restarted the project, the features for transcription, translation, text erasure and modification followed naturally.

Technical Features

Before optical character recognition (OCR) can be applied, the extension first has to determine whether an image contains blocks of text. Once the blocks of text are identified, the OCR builds up a model of the text regions, words and letters in the image. [6] This gives users the option to copy, translate and even modify text directly in an image, in real time, in their Google Chrome browser. [8]

The primary feature of Project Naptha is text detection. It runs on an algorithm called the “Stroke Width Transform”, developed by Microsoft Research in 2008, [7] which can identify regions of text in a language-agnostic manner and detect angled text and text in images. It does this by using the width of the lines that make up letters to identify elements that could potentially be text, rather than trying to spot predetermined features as markers of text.
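The intuition that text strokes have a roughly constant width can be illustrated with a much cruder stand-in. The sketch below is not the Stroke Width Transform itself, which measures widths between opposing edges of a stroke. It only measures horizontal runs of ink in a binarized image and checks how much they vary. The function names and the `max_variation` cut-off are assumptions made for this example.

```python
from statistics import mean, pstdev


def horizontal_stroke_widths(image):
    """Collect the lengths of horizontal runs of ink pixels (1s) in every row."""
    widths = []
    for row in image:
        run = 0
        for pixel in row:
            if pixel:
                run += 1
            elif run:
                widths.append(run)   # a run just ended
                run = 0
        if run:                      # a run that reaches the right edge
            widths.append(run)
    return widths


def looks_like_text(image, max_variation=0.5):
    """Heuristic: accept a region whose run widths are nearly uniform."""
    widths = horizontal_stroke_widths(image)
    if len(widths) < 2:
        return False
    # Coefficient of variation: spread of the widths relative to their mean.
    return pstdev(widths) / mean(widths) <= max_variation


bars = [[1, 1, 0, 1, 1]] * 3          # two strokes of constant width
blob = [
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
]
print(looks_like_text(bars), looks_like_text(blob))  # True False
```

Because the check never looks at what the strokes spell, it is language-agnostic in the same sense as the approach described above.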

In this respect the program behaves much as humans do: one does not need to understand a language in order to recognize written text. [9]

Project Naptha automatically applies state-of-the-art computer vision algorithms to every image encountered while browsing the web, allowing users to highlight, copy and paste, edit and translate text that was formerly trapped within an image.

A technique called “inpainting”, similar to Adobe Photoshop's “Content-Aware Fill” feature, [10] is adopted. It uses an algorithm that automatically fills the space previously occupied by text with colours from the surrounding area, and the translated text is then rendered in a font matching the style of the original image. The algorithm first detects the text and retrieves the solid colours from the regions surrounding it. These colours are then spread inwards until the entire area is filled. This technique allows users to reconstruct images as well as to edit and remove words from an image by capturing and processing the colours from the regions around the edited text. [8]
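The inward spreading of surrounding colours can be sketched as a simple iterative fill. This is a minimal illustration under stated assumptions, not the extension's implementation: the image is a grid of grey values, `mask` marks the pixels that text occupied, each pass fills masked pixels with the average of their already-known neighbours, and the name `inpaint` is hypothetical. Font matching is not covered.

```python
def inpaint(pixels, mask):
    """Fill masked pixels from the surrounding values, working inwards.

    pixels: 2D list of grey values; mask: 2D list, True where text was.
    """
    height, width = len(pixels), len(pixels[0])
    result = [row[:] for row in pixels]
    unknown = {(y, x) for y in range(height) for x in range(width) if mask[y][x]}

    while unknown:
        filled = {}
        for y, x in unknown:
            # Known 4-neighbours: inside the image and not still waiting to be filled.
            known = [
                result[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width and (ny, nx) not in unknown
            ]
            if known:
                filled[(y, x)] = sum(known) / len(known)
        if not filled:
            break  # the whole image is masked, so there is nothing to spread from
        for (y, x), value in filled.items():
            result[y][x] = value
        unknown.difference_update(filled)  # this ring is now known
    return result


row = [[0, 99, 99, 99, 100]]
gap = [[False, True, True, True, False]]
print(inpaint(row, gap))  # [[0, 0.0, 50.0, 100.0, 100]]
```

Each pass fills one ring of the masked area, so the fill proceeds from the border of the erased text towards its centre.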

In order to provide a seamless and intuitive experience, the extension tracks cursor movements and continuously extrapolates a second ahead based on the cursor's position and velocity, predicting where highlights might be made over an image. [1] Project Naptha then runs its processor-intensive character recognition algorithms ahead of time on the text that users might want to pick out from an image. [11]
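The extrapolation described above amounts to simple linear prediction. The sketch below assumes cursor positions in pixels and velocity in pixels per second. The function names and the rectangle format `(left, top, right, bottom)` are hypothetical and are used only to show how a predicted point could select the regions to recognize in advance.

```python
def predict_cursor(position, velocity, seconds_ahead=1.0):
    """Linearly extrapolate the cursor position a short time into the future."""
    x, y = position
    vx, vy = velocity
    return (x + vx * seconds_ahead, y + vy * seconds_ahead)


def regions_to_prefetch(point, regions):
    """Return the text regions (left, top, right, bottom) that contain the point."""
    px, py = point
    return [
        (left, top, right, bottom)
        for left, top, right, bottom in regions
        if left <= px <= right and top <= py <= bottom
    ]


predicted = predict_cursor((100, 200), (50, -20))
print(predicted)  # (150.0, 180.0)
print(regions_to_prefetch(predicted, [(120, 150, 300, 190), (0, 0, 50, 50)]))
```

Only the region the cursor is heading towards is returned, which is where the costly recognition step would be started early.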

Application

Project Naptha can be applied to many kinds of images, enabling users to copy text from any image displayed in the browser. These include comics, photos, screenshots, images with text overlays such as internet memes, animated GIFs, scans, and diagrams with labels. Text in these images can also be translated. [12]

Comics

In October 2013, the first prototype of the extension for comics was released. Comics called for special handling because they use comic fonts, which are more casual and informal. Characters are often placed so closely together that they appear connected, so text copied from a comic usually comes out jumbled and unclear.

Photos

For photos, Project Naptha uses the Stroke Width Transform, which was designed specifically for detecting text in natural scenes and photographs. Photographs are generally more technically challenging to copy text from than most regular images.

Screenshots

For screenshots, Project Naptha transforms a static screenshot into something closer to an interactive snapshot of the computer as it was when the screen was captured. The cursor changes when hovering over different parts, and blocks of text become selectable.

Editing Text on Images

Project Naptha allows one to erase and edit text in an image using the same technology as translation, which essentially relies on “inpainting”.

When text is changed, the extension uses the same trick that translation uses. The Translate menu can translate in-image text into languages including English, Spanish, Russian, French, Chinese Simplified, Chinese Traditional, Japanese and German. [8]

Technical Limitations

There are a few technical difficulties that Project Naptha still faces despite the constant improvements made to the software.

The language-agnostic nature of Project Naptha's underlying Stroke Width Transform algorithm allows it to detect even little squiggles as text. Although the ability to pick up minor details is a strength, it can also behave like a bug when too many unwanted details are detected and included.

When the colours of the text and the background of an image are similar, the words become less distinct from the image and harder to detect. This creates inaccuracies in the detection and copying of text. [12]

Handwriting is especially tough to detect because of character segmentation. The characters in handwriting are often written so close to each other that it is difficult to segment them or separate the letters. Hence, text copied from such sources is highly inaccurate and has jumbled letters. [12]

As an improvement, Project Naptha has begun to support rotated text. However, this support is limited to rotations of up to about 30 degrees. Text rotated by more than 30 degrees may not be copied or translated.

A present weakness of techniques that rely on inpainting is that the resulting image is hardly a substitute for the original and can show marks of having been edited. From a distance, however, it will look as though the words have been flawlessly removed from the image.

Security

Security Concerns

As with any software that runs on websites, one of the greatest concerns is the balance between user experience and privacy. The developers of Project Naptha try to keep processing on the client side (i.e., within the browser) where possible. However, text selected by users for extraction from an image is processed in the cloud. This means that achieving higher translation accuracy still requires greater reliance on cloud processing, and hence a compromise on privacy. [4]

A default setting strikes a balance between making all the functionality available and respecting user privacy. By default, when users begin selecting text, a secure HTTPS request is sent. The request contains only the URL of the specific image and nothing else: no user tokens, no website information, no cookies or analytics, and the requests are not logged. The server responds with a list of existing translations and OCR languages that have been done. This allows text in an image to be recognized with much more accuracy than otherwise possible.

Users who prefer otherwise can disable this default behaviour by checking the “Disable Lookup” item under the Options menu.

Privacy

When installed, Project Naptha requires permissions that give it sweeping access to the user's information. These permissions are requested in the installation dialog. In order to interact with all images, the software needs the user's permission to read all images from all sites. On the other hand, if the user does not want to give Project Naptha access to all images on all sites, this can also be disabled in the installation dialog. In this case, Project Naptha operates at a very low level of access; ideally, this kind of functionality would be built into browsers and operating systems natively.

The extension is written almost entirely in client-side JavaScript, which allows it to function without access to a remote server. However, online translation cannot run offline, and without access to the cached OCR service running in the cloud, performance and transcription accuracy are reduced.

Lastly, due to scalability issues, the translation feature is currently in limited rollout. The online OCR service has per-user metering and hence requires a unique identifier token. This token is completely anonymous and is not linked to any personally identifiable information.

Future Developments

Apart from the current features for manipulating text inside images, there is an experimental feature intended to widen the software's abilities. It aims to allow users to search for text inside the images on the current page. [4]

Project Naptha has also been looking at ways to overcome its limitations. Currently, text can be rotated by no more than 30 degrees; [13] beyond that, recognition quality is inferior. Project Naptha aims to increase quality in future versions by using better-trained models and algorithms. Human-assisted transcription services may also be included.

Inpainting may also leave marks on the image, making it obvious that it has been edited. This technique is expected to improve as well, particularly through better logic for detecting fonts. Currently, the font is chosen as follows: if the text is uppercase and super bold, the Impact font is used; if it is uppercase otherwise, the XKCD font; and for everything else, Helvetica Neue.
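The font rule quoted above is small enough to write out directly. In this sketch the function name `choose_font` is hypothetical, and how "super bold" is measured is not stated in the text, so it is passed in as a flag.

```python
def choose_font(text, is_super_bold):
    """Pick a replacement font using the three-way rule described above."""
    if text.isupper():
        # Uppercase text: heavy strokes suggest Impact, otherwise a comic-style font.
        return "Impact" if is_super_bold else "XKCD"
    return "Helvetica Neue"


print(choose_font("ONE DOES NOT SIMPLY", is_super_bold=True))   # Impact
print(choose_font("MEANWHILE", is_super_bold=False))            # XKCD
print(choose_font("Figure 3: results", is_super_bold=False))    # Helvetica Neue
```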

As acknowledged by Kwok, Project Naptha still has to improve much of its functionality, mainly because its various subcomponents and algorithms are a few years behind the state of the art. However, he firmly believes that over time text recognition, translation and deletion can all be developed further, and he regards this potential as exciting.

References

  1. Robarts, Stu (24 April 2014). "New Google Chrome extension lets you copy and delete text in images". Gizmag. Retrieved 7 April 2015.
  2. Kwok, Kevin. "Profile". Google+. Retrieved 7 April 2015.
  3. Brinkmann, Martin (26 September 2014). "Project Naptha text on image recognition technology comes to Firefox". ghacks.net. Retrieved 2 April 2015.
  4. Hoffman, Chris (26 June 2014). "Edit Image Text With Chrome's Project Naptha: What It Is & How To Use It". Makeuseof. Retrieved 7 April 2015.
  5. Jarry, Narelle (1996). "Computer Imaging Technology: The Process of Identification". The Book and Paper Group. The American Institute for Conservation. Retrieved 2 April 2015.
  6. Brian, Matt. "This Chrome add-on lets you copy and erase text inside any image on the web". Engadget. Retrieved 7 April 2015.
  7. "Stroke Width Transform". Stroke Width Transform. Retrieved 7 April 2015.
  8. Chacos, Brad. "Meet Project Naptha, an amazing Chrome extension for modifying text in web images". PCWorld. Retrieved 7 April 2015.
  9. Starr, Michelle. "Chrome extension lets you copy text from images". CNET. Retrieved 2 April 2015.
  10. Wollman, Dana. "Adobe unveils Photoshop CS6 beta with redesigned UI and 65 new features, download it for free today". Engadget. Retrieved 30 March 2015.
  11. Chan, Norman. "In Brief: Project Naptha OCRs Web Images". Tested. Retrieved 2 April 2015.
  12. "Project Naptha". Project Naptha. Retrieved 7 April 2015.
  13. Khaw, Cassandra (23 April 2014). "Edit Image Text with the useful Chrome extension". TheVerge. Retrieved 2 April 2015.