Sketch recognition

Sketch recognition is the process by which a computer or artificial intelligence system interprets hand-drawn sketches created by a human being or another machine. [1] It is a key frontier in the fields of artificial intelligence and human–computer interaction, similar to natural language processing or conversational artificial intelligence. [2] [3]

Uses and Applications

Research in sketch recognition lies at the crossroads of artificial intelligence and human–computer interaction. Recognition algorithms are usually gesture-based, appearance-based, geometry-based, or a combination of these approaches.
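
As a rough illustration of the geometry-based idea, the following minimal sketch labels a single pen stroke by comparing the straight-line distance between its endpoints with the total length of the drawn path. It is a toy heuristic under stated assumptions, not a method taken from the cited literature: the function names `path_length` and `classify_stroke`, the labels, and both thresholds are hypothetical choices made for this example.

```python
import math


def path_length(points):
    """Total length of the polyline through the sampled pen positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def classify_stroke(points, line_threshold=0.95, closed_threshold=0.1):
    """Label one stroke using a single geometric feature.

    The feature is the ratio of the endpoint-to-endpoint distance to the
    path length. Both thresholds are illustrative assumptions.
    """
    length = path_length(points)
    if length == 0:
        return "dot"  # no pen movement at all
    ratio = math.dist(points[0], points[-1]) / length
    if ratio >= line_threshold:
        return "line"  # the path is almost as short as a straight segment
    if ratio <= closed_threshold:
        return "closed shape"  # the stroke ends close to where it started
    return "open curve"


# A straight diagonal stroke and a stroke traced around a unit circle.
straight = [(0, 0), (1, 1), (2, 2)]
circle = [(math.cos(2 * math.pi * i / 32), math.sin(2 * math.pi * i / 32))
          for i in range(33)]
print(classify_stroke(straight))
print(classify_stroke(circle))
```

A practical recognizer would combine many such features, and a gesture-based or appearance-based system would rely on different evidence, such as the way the stroke was drawn or the rendered image of the sketch.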

Advances in the field of sketch recognition would have significant application in the field of forensic science, in which sketches are often used to identify suspects associated with a crime. [4] [5]

In 2023, two developers used OpenAI's DALL-E 2 image generation platform to create a forensic sketch program. The program's results were described as "hyper-realistic", and the program was purported to greatly reduce the time needed to create a forensic sketch while increasing accuracy. [6]

Sketch recognition technology has also been linked to applications in architecture, video game production, animation, construction, and academia, among other fields. [7] [8] [9] [10]

Related Research Articles

MessagePad: Personal digital assistant made by Apple in 1993

The MessagePad was a series of personal digital assistant devices developed and marketed by Apple Computer for the Newton platform in 1993. Some of the electronic engineering and the manufacture of the devices were undertaken in Japan by Sharp. The devices were based on the ARM 610 RISC processor, ran Newton OS, and all featured handwriting recognition software.

Pointing device gesture

In computing, a pointing device gesture or mouse gesture is a way of combining pointing device or finger movements and clicks that the software recognizes as a specific computer event and responds to accordingly. They can be useful for people who have difficulties typing on a keyboard. For example, in a web browser, a user can navigate to the previously viewed page by pressing the right pointing device button, moving the pointing device briefly to the left, then releasing the button.

Optical character recognition: Computer recognition of visual text

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo or from subtitle text superimposed on an image.

Handwriting recognition: Ability of a computer to receive and interpret intelligible handwritten input

Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most plausible words.

Affective computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science. While some core ideas in the field may be traced as far back as to early philosophical inquiries into emotion, the more modern branch of computer science originated with Rosalind Picard's 1995 paper on affective computing and her book Affective Computing published by MIT Press. One of the motivations for the research is the ability to give machines emotional intelligence, including to simulate empathy. The machine should interpret the emotional state of humans and adapt its behavior to them, giving an appropriate response to those emotions.

The PenPoint OS was one of the earliest operating systems written specifically for graphical tablets and personal digital assistants. It was a product of GO Corporation. PenPoint OS ran on AT&T Corporation's EO Personal Communicator and a number of Intel x86 powered tablet PCs including IBM's ThinkPad 700T series, NCR's 3125, 3130 and some of GRiD Systems' pen-based portables. It was never widely adopted.

Newton OS: Discontinued operating system by Apple Inc.

Newton OS is a discontinued operating system for the Apple Newton PDAs produced by Apple Computer, Inc. between 1993 and 1997. It was written entirely in C++ and trimmed to consume little power and use the available memory efficiently. Many applications were pre-installed in the ROM of the Newton to save RAM and flash memory storage for user applications.

Gesture recognition: Topic in computer science and language technology

Gesture recognition is an area of research and development in computer science and language technology concerned with the recognition and interpretation of human gestures. A subdiscipline of computer vision, it employs mathematical algorithms to interpret gestures.

OpenCV: Computer vision library

OpenCV is a library of programming functions mainly for real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage, then Itseez. The library is cross-platform and licensed as free and open-source software under the Apache License 2.0. Since 2011, OpenCV has featured GPU acceleration for real-time operations.

Multimodal interaction provides the user with multiple modes of interacting with a system. A multimodal interface provides several distinct tools for input and output of data.

Unconventional computing is computing by any of a wide range of new or unusual methods.

Multi-touch: Technology

In computing, multi-touch is technology that enables a surface to recognize the presence of more than one point of contact with the surface at the same time. Multi-touch originated at CERN, MIT, the University of Toronto, Carnegie Mellon University and Bell Labs in the 1970s. CERN started using multi-touch screens as early as 1976 for the controls of the Super Proton Synchrotron. Capacitive multi-touch displays were popularized by Apple's iPhone in 2007. Multi-touch may be used to implement additional functionality, such as pinch to zoom, or to activate certain subroutines attached to predefined gestures using gesture recognition.

Pen computing: Uses a stylus and tablet/touchscreen

Pen computing refers to any computer user interface that uses a pen or stylus and a tablet, rather than input devices such as a keyboard or a mouse.

Hands-on computing is a branch of human-computer interaction research which focuses on computer interfaces that respond to human touch or expression, allowing the machine and the user to interact physically. Hands-on computing can make complicated computer tasks more natural to users by attempting to respond to motions and interactions that are natural to human behavior. Thus hands-on computing is a component of user-centered design, focusing on how users physically respond to virtual environments.

Pencept: American computer company

Pencept, Inc. was a company in the 1980s that developed and marketed pen computing.

Human–computer interaction: Academic discipline studying the relationship between computer systems and their users

Human–computer interaction (HCI) is research in the design and the use of computer technology, which focuses on the interfaces between people (users) and computers. HCI researchers observe the ways humans interact with computers and design technologies that allow humans to interact with computers in novel ways. A device that allows interaction between a human being and a computer is known as a "human–computer interface (HCI)".

Handwriting movement analysis is the study and analysis of the movements involved in handwriting and drawing. It forms an important part of graphonomics, which became established after the "International Workshop on Handwriting Movement Analysis" in 1982 in Nijmegen, The Netherlands. It would become the first of a continuing series of International Graphonomics Conferences. The first graphonomics milestone was Thomassen, Keuss, Van Galen, Grootveld (1983).

The history of tablet computers and the associated special operating software is an example of pen computing technology, and thus the development of tablets has deep historical roots. The first patent for a system that recognized handwritten characters by analyzing the handwriting motion was granted in 1914. The first publicly demonstrated system using a tablet and handwriting recognition instead of a keyboard for working with a modern digital computer dates to 1956.

Microsoft Tablet PC: Microsoft's former line of tablets

Microsoft Tablet PC is a term coined by Microsoft for tablet computers conforming to a set of hardware specifications that the company announced in 2001 for a pen-enabled personal computer running a licensed copy of the Windows XP Tablet PC Edition operating system or a derivative thereof.

Cognitive computing refers to technology platforms that, broadly speaking, are based on the scientific disciplines of artificial intelligence and signal processing. These platforms encompass machine learning, reasoning, natural language processing, speech recognition and vision, human–computer interaction, dialog and narrative generation, among other technologies.

References

  1. Hammond, T. and Davis, R. (2005), "LADDER, a sketching language for user interface developers", Computers & Graphics, 29(4), pp. 518–532.
  2. Hammond, T., Logsdon, D., Peschel, J., Johnston, J., Taele, P., Wolin, A., and Paulson, B. (2010), "A sketch recognition interface that recognizes hundreds of shapes in course-of-action diagrams", in Proceedings of the 28th International Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA '10), pp. 4213–4218.
  3. Jorge, J. and Samavati, F. (2011), Sketch-Based Interfaces and Modeling, Springer
  4. IJRASET. "Forensic Face Sketch Construction and Recognition". www.ijraset.com. Retrieved 2024-01-10.
  5. Lei, Haopeng; Chen, Simin; Wang, Mingwen; He, Xiangjian; Jia, Wenjing; Li, Sibo (2021-05-25). "A New Algorithm for Sketch-Based Fashion Image Retrieval Based on Cross-Domain Transformation". Wireless Communications and Mobile Computing. 2021: e5577735. doi:10.1155/2021/5577735. hdl:10453/149881. ISSN 1530-8669.
  6. Xiang, Chloe (2023-02-07). "Developers Created AI to Generate Police Sketches. Experts Are Horrified". Vice. Retrieved 2024-01-10.
  7. "Papers with Code - Sketch Recognition". paperswithcode.com. Retrieved 2024-01-10.
  8. "Sketch Recognition". Microsoft Research. Retrieved 2024-01-10.
  9. Zhang, Lei (2021-08-21). "Hand-drawn sketch recognition with a double-channel convolutional neural network". EURASIP Journal on Advances in Signal Processing. 2021 (1): 73. Bibcode:2021EJASP2021...73Z. doi:10.1186/s13634-021-00752-4. ISSN 1687-6180.
  10. Korkut, Elif Hilal; Surer, Elif (2021). "Sketch Recognition for Interactive Game Experiences Using Neural Networks". In Baalsrud Hauge, Jannicke; C. S. Cardoso, Jorge; Roque, Licínio; Gonzalez-Calero, Pedro A. (eds.). Entertainment Computing – ICEC 2021. Lecture Notes in Computer Science. Cham: Springer International Publishing. pp. 393–401. doi:10.1007/978-3-030-89394-1_31. ISBN 978-3-030-89394-1. S2CID 240415939.