Video browsing

Video browsing, also known as exploratory video search, is the interactive process of skimming through video content to satisfy an information need or to check interactively whether the content is relevant. While originally proposed to help users inspect a single video through visual thumbnails, [1] modern video browsing tools enable users to quickly find desired information in a video archive through iterative human–computer interaction, following an exploratory search approach. [2] [3] Many of these tools presume an engaged user who wants to inspect video content interactively, as well as automatic content-filtering features. For that purpose, several video interaction features [4] are usually provided, such as sophisticated navigation within a video or search by a content-based query. Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction, and semantic concept detection, and create a structured content overview of the video file or video archive. They usually also provide sophisticated navigation features, such as advanced timelines, [5] visual seeker bars, or lists of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering by visual concept (e.g., only shots showing cars), by specific characteristics (e.g., color or motion filtering), by user-provided sketch (e.g., a visually drawn sketch), or by content-based similarity search.
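One of the content-query styles mentioned above, ranking shots by color similarity to a query image, can be sketched with normalized color histograms and histogram intersection. This is only an illustrative outline, not the method of any particular tool; the function names and the histogram quantization are assumptions, and frames are simplified to plain lists of RGB tuples.

```python
# Illustrative sketch: rank shot keyframes by color similarity to a query
# keyframe, as one example of a content-based query in a video browser.
# Frames are lists of (r, g, b) pixel tuples; names are hypothetical.

def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` levels and count pixels."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rank_shots_by_color(query_pixels, shot_keyframes):
    """Return shot indices ordered by decreasing color similarity."""
    q = color_histogram(query_pixels)
    scores = [(histogram_intersection(q, color_histogram(kf)), i)
              for i, kf in enumerate(shot_keyframes)]
    return [i for _, i in sorted(scores, reverse=True)]
```

Real systems typically replace the color histogram with learned visual features, but the ranking step works the same way.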

History

Video browsing was originally proposed by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu, while working at Siemens, and it was presented at the ACM International Conference in August 1993. [1] [6] They described a shot detection algorithm for compressed video that was originally encoded with discrete cosine transform (DCT) coding standards such as JPEG, MPEG and H.26x. [7] The basic idea was that, since the DCT coefficients are mathematically related to the spatial domain and represent the content of each frame, they can be used to detect differences between video frames. In the algorithm, a subset of blocks in a frame, and a subset of DCT coefficients from each block, serve as a compact representation of the frame. By operating directly on compressed DCT representations, the algorithm avoids most of the computational cost of decompression and enables effective video browsing. [8] The algorithm represents each shot of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents. [9]
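The comparison step described above can be sketched as follows: take a sampled vector of DCT coefficients per frame, compute a normalized difference between successive frames, and declare a shot boundary where the difference spikes. This is a hedged outline of the general idea, not the original Siemens algorithm; the threshold and the way coefficients are sampled are illustrative assumptions.

```python
# Sketch of DCT-domain shot boundary detection: each "frame" here is a
# list of DCT coefficient values sampled from a subset of its blocks,
# so no decompression to pixels is needed. Parameters are illustrative.

def frame_difference(coeffs_a, coeffs_b):
    """Normalized absolute difference between two coefficient vectors."""
    diff = sum(abs(a - b) for a, b in zip(coeffs_a, coeffs_b))
    scale = sum(abs(a) + abs(b) for a, b in zip(coeffs_a, coeffs_b)) or 1.0
    return diff / scale

def detect_cuts(frames, threshold=0.5):
    """Return indices i where a cut is declared between frame i-1 and i."""
    return [i for i in range(1, len(frames))
            if frame_difference(frames[i - 1], frames[i]) > threshold]
```

Within a shot the coefficient vectors change little, so the normalized difference stays near zero; at a cut the content of the blocks changes abruptly and the difference jumps.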

Video Notebook

Modern video browsing solutions include Video Notebook, a Menlo Park startup founded in 2021 by Mike Lanza, which uses computer vision to extract slides, and optical character recognition (OCR) and speech recognition to facilitate video search. The software can be used either on the client side (via a browser extension), where slides and text are extracted while the video is watched (e.g. on a video platform like YouTube or Udemy), [10] [11] or on the server side. Processed videos, which can be viewed in the Video Notebook web app, feature a video browsing user interface with extracted timestamped slides, a search bar for querying the video (or a collection of videos), and text chapters. Video Notebook customers include organisations like Ernst & Young. [12]
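A slide-extraction pipeline of the kind described above can be sketched roughly as: detect where the picture changes noticeably (a new slide appearing), keep one keyframe per segment, and run OCR on it to make the video text-searchable. This is an illustrative outline under stated assumptions, not Video Notebook's actual method; the `ocr` callable is a stand-in for a real engine such as Tesseract, and frames are simplified to grayscale pixel lists.

```python
# Hypothetical sketch of slide extraction for video search: one keyframe
# is kept each time the picture changes noticeably, and OCR text is
# attached so the slide becomes searchable. Not any product's real code.

def extract_slides(frames, ocr, change_threshold=0.05):
    """frames: list of grayscale frames (lists of ints in 0..255).
    Returns (frame_index, recognized_text) pairs, one per detected slide."""
    def changed(a, b):
        diff = sum(abs(x - y) for x, y in zip(a, b))
        return diff / (255.0 * len(a)) > change_threshold

    slides = []
    for i, frame in enumerate(frames):
        # A new slide starts at the first frame, or whenever the picture
        # differs enough from the previous frame.
        if i == 0 or changed(frames[i - 1], frame):
            slides.append((i, ocr(frame)))
    return slides
```

The timestamped (index, text) pairs are exactly what a search bar over a video needs: a text index that maps query hits back to positions in the timeline.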

Video Browser Showdown

The Video Browser Showdown (VBS) [13] is an annual live evaluation competition for exploratory video search tools, in which international researchers use video browsing tools to solve ad-hoc video search tasks on a moderately large data set as fast as possible. The main goal of the VBS, which started in 2012 at the International Conference on MultiMedia Modeling (MMM), is to advance the performance of video browsing tools. Since 2016, the VBS has also collaborated with TRECVID. [14] The VBS evaluates video browsing tools for efficiency at known-item search (KIS) tasks on a well-defined data set, in direct comparison to other tools. [15]
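Known-item search tasks of the kind used at the VBS are typically scored so that faster correct submissions earn more points and wrong submissions cost a penalty. The exact formulas have varied across VBS editions; the linear variant below, with an illustrative penalty term, is only a sketch of the idea.

```python
# Hypothetical sketch of a known-item search (KIS) scoring rule: full
# points for an instant correct answer, linearly less as time passes,
# minus a penalty per wrong submission. Parameters are illustrative and
# do not reproduce any particular VBS edition's official formula.

def kis_score(elapsed_s, task_limit_s, wrong_submissions,
              max_score=100.0, penalty=10.0):
    """Score in [0, max_score] for a correct submission at elapsed_s."""
    if elapsed_s > task_limit_s:
        return 0.0
    time_score = max_score * (1.0 - elapsed_s / task_limit_s)
    return max(0.0, time_score - penalty * wrong_submissions)
```

Scoring rules of this shape reward both retrieval effectiveness (finding the target at all) and interface efficiency (finding it quickly with few false submissions), which is exactly what the competition aims to compare across tools.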


References

  1. 1 2 Arman, Farshid; Depommier, Remi; Hsu, Arding; Chiu, Ming-Yee (October 1994). "Content-based browsing of video sequences". Proceedings of the second ACM international conference on Multimedia - MULTIMEDIA '94. Association for Computing Machinery. pp. 97–103. doi: 10.1145/192593.192630 . ISBN   0897916867. S2CID   1360834.
  2. Christel, M. G. (2008). "Supporting video library exploratory search: when storyboards are not enough".
  3. Schoeffmann, K.; Taschwer, M.; Boeszoermenyi, L. (2010). "The Video Explorer - a tool for navigation and searching within a single video based on fast content analysis".
  4. Schoeffmann, K.; Hudelist, M. A.; Huber, J. (2015). "Video Interaction Tools: A Survey of Recent Work".
  5. Hürst, W.; Meier, K. (2008). "Interfaces for timeline-based mobile video browsing".
  6. Arman, Farshid; Hsu, Arding; Chiu, Ming-Yee (August 1993). "Image processing on compressed data for large video databases". Proceedings of the first ACM international conference on Multimedia - MULTIMEDIA '93. Association for Computing Machinery. pp. 267–272. doi:10.1145/166266.166297. ISBN   0897915968. S2CID   10392157.
  7. Skodras, Athanassios (2009-01-01). "Real time data hiding by exploiting the IPCM macroblocks in H.264/AVC streams". Journal of Real-Time Image Processing.
  8. Zhang, HongJiang (1998). "Content-Based Video Browsing And Retrieval". In Furht, Borko (ed.). Handbook of Internet and Multimedia Systems and Applications. CRC Press. pp.  83–108 (89). ISBN   9780849318580.
  9. Steele, Michael; Hearst, Marti A.; Lawrence, A. Rowe (1998). "The Video Workbench: a direct manipulation interface for digital media editing by amateur videographers" (PDF): 1-19 (14). S2CID   18212394. Archived from the original (PDF) on 2019-02-26. Retrieved 18 October 2019.{{cite journal}}: Cite journal requires |journal= (help)
  10. "Video Notebook - Notes on all video platforms". chrome.google.com. Retrieved 2022-06-03.
  11. "Video Screenshots and Notes - YouTube & more". www.videonotebook.com. Retrieved 2022-06-03.
  12. "Videos made Browsable & Searchable - Video Notebook". www.videonotebook.com. Retrieved 2022-06-03.
  13. Video Browser Showdown
  14. TRECVID, Academic benchmark initiative by NIST
  15. Schöffmann, Klaus; Bailer, Werner (2012-07-24). "Video browser showdown". ACM SIGMultimedia Records. 4 (2): 1–2. doi:10.1145/2350204.2350205. S2CID   46224263.