Copyfish

Copyfish is a browser extension for Google Chrome and Firefox that lets users copy and paste, or copy and translate, text from within images. "Images" here covers many forms: photographs, charts, diagrams, screenshots, PDF documents, comics, error messages, memes, Flash, and subtitles in YouTube videos. [1] [2]

After a user marks the text in an image, Copyfish extracts it from a website, video or PDF document. [3] [4]

Copyfish was first published in October 2015. [5] [6] Although its user interface is available only in English, Copyfish is used not only in Western countries but also by many Chinese- and Hindi-speaking Chrome users. [7] [8] The software is published under the GPL open-source license and hosted on GitHub. [9]

Related Research Articles

Optical character recognition: Computer recognition of visual text

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.

Handwriting recognition: Ability of a computer to receive and interpret intelligible handwritten input

Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most plausible words.

Adobe Acrobat: Set of application software to view, edit and manage files in Portable Document Format (PDF)

Adobe Acrobat is a family of application software and Web services developed by Adobe Inc. to view, create, manipulate, print and manage Portable Document Format (PDF) files.

Image scanner: Device that optically scans images, printed text

An image scanner—often abbreviated to just scanner—is a device that optically scans images, printed text, handwriting or an object and converts it to a digital image. Commonly used in offices are variations of the desktop flatbed scanner where the document is placed on a glass window for scanning. Hand-held scanners, where the device is moved by hand, have evolved from text scanning "wands" to 3D scanners used for industrial design, reverse engineering, test and measurement, orthotics, gaming and other applications. Mechanically driven scanners that move the document are typically used for large-format documents, where a flatbed design would be impractical.

DjVu is a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, indexed color images, and photographs. It uses technologies such as image layer separation of text and background/images, progressive loading, arithmetic coding, and lossy compression for bitonal (monochrome) images. This allows high-quality, readable images to be stored in a minimum of space, so that they can be made available on the web.

Evince: Free software document viewer

Evince, also known as GNOME Document Viewer, is a free and open source document viewer supporting many document file formats including PDF, PostScript, DjVu, TIFF, XPS and DVI. It is designed for the GNOME desktop environment.

Imaging for Windows: Software product for scanning paper documents

Imaging for Windows from Global 360 is document imaging software. Earlier versions of Imaging for Windows were available for Windows 95-98/Me/NT/2000. Global360 Imaging for Windows is the upgrade to this Imaging software, which was discontinued as of Windows XP. Its image viewing, editing and scanning functions are superseded by Windows Picture and Fax Viewer and Microsoft Paint, both of which are based on GDI+ in Windows XP. However, the multi-page picture editing functions are gone with the Imaging software.

Google Translate: Multilingual neural machine translation service

Google Translate is a multilingual neural machine translation service developed by Google to translate text, documents and websites from one language into another. It offers a website interface, a mobile app for Android and iOS, as well as an API that helps developers build browser extensions and software applications. As of 2022, Google Translate supports 133 languages at various levels; it claimed over 500 million total users as of April 2016, with more than 100 billion words translated daily, after the company stated in May 2013 that it served over 200 million people daily.

PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archiving and long-term preservation of electronic documents. PDF/A differs from PDF by prohibiting features unsuitable for long-term archiving, such as font linking and encryption. The ISO requirements for PDF/A file viewers include color management guidelines, support for embedded fonts, and a user interface for reading embedded annotations.

Wordfast

The name Wordfast is used for any number of translation memory products developed by Wordfast LLC. The original Wordfast product, now called Wordfast Classic, was developed by Yves Champollion in 1999 as a cheaper alternative to Trados, a well-known translation memory program. The current Wordfast products run on a variety of platforms but use largely compatible translation memory formats, and often also have similar workflows. The software is most popular with freelance translators, although some of the products are also suited for corporate environments.

Image translation is the machine translation of images of printed text. Optical character recognition (OCR) is first applied to the image to extract any text it contains, the extracted text is then translated into the language the user chooses, and finally digital image processing is applied to the original image to produce a translated image in the new language.
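The first two stages of the process described above (recognize the text, then translate it) can be sketched as a small pipeline. This is only an illustrative sketch, not how any particular product works: `TextRegion`, `translate_image_text`, `fake_ocr`, and `fake_translate` are hypothetical names, and the OCR and translation stages are stand-ins rather than real engines. The bounding boxes are carried through so that a later rendering stage, omitted here, could draw the translated text back onto the image.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (left, top, width, height) in pixels; a hypothetical box convention.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class TextRegion:
    """A piece of text together with where it was found in the image."""
    box: Box
    text: str


def translate_image_text(
    image: object,
    ocr: Callable[[object], List[TextRegion]],
    translate: Callable[[str, str], str],
    target_lang: str,
) -> List[TextRegion]:
    """Run OCR on the image, then translate each region, keeping its box."""
    regions = ocr(image)
    return [TextRegion(r.box, translate(r.text, target_lang)) for r in regions]


# Stand-in stages so the sketch runs without any OCR or translation service.
def fake_ocr(image: object) -> List[TextRegion]:
    return [
        TextRegion((10, 20, 120, 30), "hello"),
        TextRegion((10, 60, 90, 30), "world"),
    ]


_GLOSSARY = {("hello", "de"): "hallo", ("world", "de"): "Welt"}


def fake_translate(text: str, target_lang: str) -> str:
    # Fall back to the untranslated text when the glossary has no entry.
    return _GLOSSARY.get((text, target_lang), text)


translated = translate_image_text(None, fake_ocr, fake_translate, "de")
for region in translated:
    print(region.box, region.text)
```

Passing the stages in as functions keeps the sketch independent of any specific OCR engine or translation service; a real engine could be substituted for either stand-in without changing `translate_image_text`.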

Forms processing is a process by which one can capture information entered into data fields and convert it into an electronic format. This can be done manually or automatically, but the general process is that hard copy data is filled out by humans and then "captured" from their respective fields and entered into a database or other electronic format.

Document capture software refers to applications that automate the scanning of paper documents or the import of electronic documents, often to feed advanced document classification and data collection processes. Most scanning hardware, both scanners and copiers, provides the basic ability to scan to a number of image file formats, such as PDF, TIFF, JPG, and BMP. This basic functionality is augmented by document capture software, which can add efficiency and standardization to the process.

OCRFeeder

OCRFeeder is an optical character recognition suite for GNOME, which also supports virtually any command-line OCR engine, such as CuneiForm, GOCR, Ocrad and Tesseract. It converts paper documents to digital document files and can serve to make them accessible to visually impaired users.

Microsoft Office shared tools are software components that are included in all Microsoft Office products.

Project Naptha

Project Naptha is a browser extension software for Google Chrome that allows users to highlight, copy, edit and translate text from within images. It was created by developer Kevin Kwok, and released in April 2014 as a Chrome add-on. This software was first made available only on Google Chrome, downloadable from the Chrome Web Store. It was then made available on Mozilla Firefox, downloadable from the Mozilla Firefox add-ons repository but was soon removed. The reason behind the removal remains unknown.

Google Docs: Cloud-based word processing software

Google Docs is an online word processor included as part of the free, web-based Google Docs Editors suite offered by Google, which also includes Google Sheets, Google Slides, Google Drawings, Google Forms, Google Sites and Google Keep. Google Docs is accessible via an internet browser as a web-based application and is also available as a mobile app on Android and iOS and as a desktop application on Google's ChromeOS.

Spark NLP is an open-source text processing library for advanced natural language processing for the Python, Java and Scala programming languages. The library is built on top of Apache Spark and its Spark ML library.

References

  1. Samden Sherpa. "Now you can easily extract texts from images using this tool". GizBot, 5 February 2017.
  2. Joel Lee. "How to Extract Text From Images (OCR)". Make Use Of, 1 February 2017.
  3. Mike Williams. "Copyfish: free OCR and translation for Chrome". Beta News, 19 October 2015.
  4. Mike Williams. "Copyfish 2.6.6 for Chrome". PC Adviser, 22 January 2017. [dead link]
  5. Martin Brinkmann. "Copyfish for Chrome: copy and translate text from media". gHacks Tech News, 10 October 2015.
  6. "Text aus Bildern herauskopieren und übersetzen mit Copyfish für Opera, Chrome, Vivaldi" ("Copy and translate text from images with Copyfish for Opera, Chrome, Vivaldi"). 10 October 2015.
  7. esor huang. "Copyfish 驚奇 Chrome 套件複製圖片影片內中文字!- 電腦玩物" ("Copyfish Surprise Chrome Suite Copy Image in Chinese"). Playpcesor.com, 12 October 2015.
  8. Rohit Kumar. "फोटो पर लिखे टेक्स्ट को टाइप नहीं, कॉपी करें" ("Do not type the text written on the photo, copy it"). Hindustan Times, 13 September 2016.
  9. A9T9/Copyfish, 2024-02-05, retrieved 2024-02-07.