Internet research

Internet research is the practice of using Internet information, especially free information on the World Wide Web, or Internet-based resources (such as Internet discussion forums) in research.

Internet research has had a profound impact on the way ideas are formed and knowledge is created. Common applications of Internet research include personal research on a particular subject (something mentioned on the news, a health problem, etc.), students doing research for academic projects and papers, and journalists and other writers researching stories.

Research is a broad term. Here, it is used to mean "looking something up (on the Web)". It includes any activity in which a topic is identified and an effort is made to actively gather information for the purpose of furthering understanding. It may also include some post-collection activities, such as reading the material, assessing its quality, or synthesizing it to determine whether it should be read in depth.

Through searches on the Internet, pages with some relation to a given topic can be visited and read, or quickly found and gathered. In addition, the Web can be used to communicate with people who have relevant interests and experience, such as experts, to learn their opinions and what they know. Communication tools used for this purpose on the Web include email (including mailing lists), online discussion forums (also known as message boards or BBSs), and other personal communication facilities (instant messaging, IRC, newsgroups, etc.), all of which can provide direct access to experts and other individuals with relevant interests and knowledge.

Internet research is distinct from library research (focusing on library-bound resources)[citation needed] and commercial database research (focusing on commercial databases).[citation needed] While many commercial databases are delivered through the Internet, and some libraries purchase access to library databases on behalf of their patrons, searching such databases is generally not considered part of “Internet research”.[citation needed] It should also be distinguished from scientific research (research following a defined and rigorous process) carried out on the Internet, from straightforward retrieving of details like a name or phone number, and from research about the Internet.[citation needed]

Internet research can provide quick, immediate, and worldwide access to information, although results may be affected by unrecognized bias, by difficulties in verifying a writer's credentials (and therefore the accuracy or pertinence of the information obtained), and by whether the searcher has sufficient skill to draw meaningful results from the abundance of material typically available.[1] The first resources retrieved may not be the most suitable for answering a particular question. Popularity is often a factor used in structuring Internet search results, but popular information is not always the most correct or the most representative of the breadth of knowledge and opinion on a topic.

While commercial research fosters a deep concern with costs, and library research a concern with access, Internet research fosters a deep concern for quality, for managing the abundance of information, and for avoiding unintended bias. This is partly because Internet research occurs in a less mature information environment: search skills are less sophisticated and poorly communicated, and far less effort is put into organizing information. Library and commercial research offer many search tactics and strategies unavailable on the Internet, and the library and commercial environments invest more deeply in organizing and vetting their information.

Search tools

Search tools for finding information on the Internet include web search engines, the search engines on individual websites, the browsers' hotkey-activated feature for searching in the current page, meta search engines, web directories, and specialty search services.

Web search engines

A web search engine allows a user to enter a search query, in the form of keywords or a phrase, into a search box or search form, and then finds matching results and displays them on the screen. The results are retrieved from a database, using search algorithms that select web pages based on the location and frequency of keywords on them, along with the quality and number of external hyperlinks pointing at them. The database is supplied with data by a web crawler that follows the hyperlinks connecting webpages, copying their content and recording their URLs and other data about each page along the way. The content is then indexed to aid retrieval.

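As a rough illustration of the crawl, index, and query flow described above, here is a minimal sketch in Python of a toy inverted index that ranks pages by simple keyword frequency. The example pages, URLs, and scoring rule are illustrative assumptions, not the ranking method of any real search engine.

from collections import defaultdict

# Toy corpus standing in for pages copied by a crawler (illustrative data only).
pages = {
    "https://example.org/solar": "solar power stores solar energy for later use",
    "https://example.org/wind": "wind turbines convert wind energy into power",
    "https://example.org/intro": "an introduction to renewable energy sources",
}

# Indexing: record how often each keyword occurs on each page.
index = defaultdict(lambda: defaultdict(int))
for url, text in pages.items():
    for word in text.lower().split():
        index[word][url] += 1

def search(query):
    """Rank pages by the total frequency of the query keywords, a crude
    stand-in for the location, frequency, and link signals real engines combine."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index[word].items():
            scores[url] += count
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(search("solar energy"))  # the page mentioning "solar" twice ranks first

A real engine adds many more signals (link analysis, freshness, language, location), but the separation into crawling, indexing, and query-time ranking is the same.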

Websites' search feature

Websites often have a search engine of their own, for searching just the site's content, often displayed at the top of every page. For example, Wikipedia provides a search engine for exploring its content. A search engine within a website allows a user to focus on its content and find desired information with more precision than with a web search engine. It may also provide access to parts of the website that a web search engine does not index.
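For instance, Wikipedia's site search is also exposed through the public MediaWiki API, so a query can be sent to it directly. The short Python sketch below uses the third-party requests library (assumed to be installed) and prints the titles of matching articles.

import requests

# Query Wikipedia's search API (MediaWiki "list=search" module); only pages
# on en.wikipedia.org are searched, unlike a general web search engine.
params = {
    "action": "query",
    "list": "search",
    "srsearch": "internet research",
    "format": "json",
}
response = requests.get("https://en.wikipedia.org/w/api.php", params=params, timeout=10)
for result in response.json()["query"]["search"]:
    print(result["title"])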

Browsers' local search features

Browsers typically provide separate input boxes to search history titles, bookmarks, and the currently displayed web page, though the latter only shows up when a hot key is pressed.

Browsers' search hot key

Using a key combo (two or more keys pressed down at the same time), the user can search the current page displayed by the browser. This is especially useful for long articles. A common key combo for this is Ctrl+F.

Meta search engines

A meta search engine enables users to enter a search query once and have it run against multiple search engines simultaneously, producing an aggregated list of search results. Since no single search engine covers the entire web, a meta search engine can produce a more comprehensive search of the web. Most meta search engines automatically eliminate duplicate search results. However, meta search engines have a significant limitation: the most popular search engines, such as Google, are often not included because of legal restrictions.
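A minimal sketch of the aggregation step might look like the following Python function; the two engine wrappers are hypothetical stand-ins for calls to real search engine APIs.

def metasearch(query, engines):
    """Send one query to several engines and merge the results,
    dropping duplicate URLs while preserving order."""
    seen = set()
    merged = []
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical engine wrappers returning canned results for illustration.
engine_a = lambda q: ["https://a.example/1", "https://shared.example/x"]
engine_b = lambda q: ["https://shared.example/x", "https://b.example/2"]

print(metasearch("renewable energy", [engine_a, engine_b]))
# ['https://a.example/1', 'https://shared.example/x', 'https://b.example/2']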

Web directories

A Web directory organizes subjects in a hierarchical fashion that lets users investigate the breadth of a specific topic and drill down to find relevant links and content. Web directories can be assembled automatically by algorithms or handcrafted. Human-edited Web directories have the distinct advantage of higher quality and reliability, while those produced by algorithms can offer more comprehensive coverage. Web directories are generally broad in scope: directories such as Curlie and The WWW Virtual Library cover a wide range of subjects, while others focus on specific topics.
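Conceptually, a directory is a tree of categories with links at the leaves, and drilling down is a walk along one branch. A small Python sketch, with entirely made-up categories and URLs:

# A hand-built fragment of a hierarchical directory (illustrative entries only).
directory = {
    "Science": {
        "Biology": ["https://example.org/genetics", "https://example.org/ecology"],
        "Physics": ["https://example.org/quantum"],
    },
    "Arts": {
        "Music": ["https://example.org/jazz"],
    },
}

def drill_down(tree, path):
    """Follow a list of category names, e.g. ["Science", "Biology"],
    to the links filed under that subtopic."""
    node = tree
    for category in path:
        node = node[category]
    return node

print(drill_down(directory, ["Science", "Biology"]))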

Specialty search tools

Specialty search tools enable users to find information that conventional search engines and meta search engines cannot access because the content is stored in databases. In fact, the vast majority of information on the web is stored in databases that require users to go to a specific site and access it through a search form. Often, the content is generated dynamically. As a consequence, Web crawlers are unable to index this information. In a sense, this content is "hidden" from search engines, leading to the term invisible or deep Web. Specialty search tools have evolved to provide users with the means to quickly and easily find deep Web content. These specialty tools rely on advanced bot and intelligent agent technologies to search the deep Web and automatically generate specialty Web directories, such as the Virtual Private Library.
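Reaching such content usually means submitting a query to the site's own search form rather than following links. A minimal Python sketch using the third-party requests library is shown below; the endpoint URL and parameter names are hypothetical placeholders, not a real service.

import requests

# Query a database-backed site directly through its search form, the way
# specialty tools reach "deep Web" content that ordinary crawlers miss.
response = requests.get(
    "https://catalog.example.org/search",  # hypothetical search endpoint
    params={"q": "19th century patents", "page": 1},
    timeout=10,
)
print(response.status_code)
print(response.text[:500])  # start of the dynamically generated results page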

Website authorship

When using the Internet for research, a large number of websites may appear in the search results for whatever search query is entered. Each of these sites has one or more authors or associated organizations providing content, and the accuracy and reliability of the content may be extremely variable. It is necessary to identify authorship of web content so that reliability and bias can be assessed.

The author or sponsoring organization of a website may be found in several ways. Sometimes the author or organization is listed at the bottom of the website's home page. Another way is to look in the ‘Contact Us’ section of the website: the author may be listed directly, determined from an email address, or identified by emailing and asking. If the author's name or sponsoring organization cannot be determined, one should question the trustworthiness of the website. If it can be found, an Internet search on the name might provide information that can be used to judge whether the website is reliable and unbiased.
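Some of this checking can be partly automated. The Python sketch below, using the third-party requests and BeautifulSoup libraries (assumed to be installed), looks for the common <meta name="author"> tag and falls back to a link labelled "Contact"; the URL is a hypothetical placeholder, and many sites declare authorship in other ways.

import requests
from bs4 import BeautifulSoup

def find_author(url):
    """Best-effort lookup of a page's declared author: check the common
    <meta name="author"> tag, then fall back to a link labelled "Contact"."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    meta = soup.find("meta", attrs={"name": "author"})
    if meta and meta.get("content"):
        return meta["content"]
    contact = soup.find("a", string=lambda s: s and "contact" in s.lower())
    return contact.get("href") if contact else None

print(find_author("https://example.org"))  # hypothetical URL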

Internet research software

Internet research software captures information while the user performs Internet research. This information can then be organized in various ways, including tagging and hierarchical trees. The goal is to collect information relevant to a specific research project in one place, so that it can be found and accessed again quickly.

These tools also allow captured content to be edited and annotated, and some can export it to other formats. Other features common to these tools include full-text search, which aids in quickly locating information, and filters that let the user drill down to see only the information relevant to a specific query. Captured and kept information also provides a backup in case web pages and sites disappear or become inaccessible later.
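The data model behind such tools can be very simple. The Python sketch below is an illustrative assumption about how captured items, tags, full-text search, and filters might fit together, not a description of any particular product.

from dataclasses import dataclass, field

@dataclass
class Capture:
    """One piece of captured web content with its source and user-assigned tags."""
    url: str
    text: str
    tags: set = field(default_factory=set)
    notes: str = ""

captures = [
    Capture("https://example.org/a", "notes on solar subsidies", {"energy", "policy"}),
    Capture("https://example.org/b", "wind turbine noise study", {"energy", "health"}),
]

def search_captures(items, query=None, tag=None):
    """Combine a tag filter with full-text search; both arguments are optional."""
    results = items
    if tag:
        results = [c for c in results if tag in c.tags]
    if query:
        results = [c for c in results if query.lower() in c.text.lower()]
    return results

print([c.url for c in search_captures(captures, query="solar", tag="energy")])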

Related Research Articles

Meta elements are tags used in HTML and XHTML documents to provide structured metadata about a Web page. They are part of a web page's head section. Multiple Meta elements with different attributes can be used on the same page. Meta elements can be used to specify page description, keywords and any other metadata not provided through the other head elements and attributes.

In general computing, a search engine is an information retrieval system designed to help find information stored on a computer system. It is an information retrieval software program that discovers, crawls, transforms, and stores information for retrieval and presentation in response to user queries. The search results are usually presented in a list and are commonly called hits. A search engine normally consists of four components, as follows: a search interface, a crawler, an indexer, and a database. The crawler traverses a document collection, deconstructs document text, and assigns surrogates for storage in the search engine index. Online search engines store images, link data and metadata for the document as well.

Spamdexing is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link building and repeating unrelated phrases, to manipulate the relevance or prominence of resources indexed in a manner inconsistent with the purpose of the indexing system.

A web portal is a specially designed website that brings information from diverse sources, like emails, online forums and search engines, together in a uniform way. Usually, each information source gets its dedicated area on the page for displaying information; often, the user can configure which ones to display. Variants of portals include mashups and intranet dashboards for executives and managers. The extent to which content is displayed in a "uniform way" may depend on the intended user and the intended purpose, as well as the diversity of the content. Very often design emphasis is on a certain "metaphor" for configuring and customizing the presentation of the content and the chosen implementation framework or code libraries. In addition, the role of the user in an organization may determine which content can be added to the portal or deleted from the portal configuration.

The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland, and was founded in 1988 through legislation sponsored by US Congressman Claude Pepper.

Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic rather than direct traffic or paid traffic. Unpaid traffic may originate from different kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines.

An image retrieval system is a computer system used for browsing, searching and retrieving images from a large database of digital images. Most traditional and common methods of image retrieval utilize some method of adding metadata such as captioning, keywords, title or descriptions to the images so that retrieval can be performed over the annotation words. Manual image annotation is time-consuming, laborious and expensive; to address this, there has been a large amount of research done on automatic image annotation. Additionally, the increase in social web applications and the semantic web have inspired the development of several web-based image annotation tools.

The deep web, invisible web, or hidden web are parts of the World Wide Web whose contents are not indexed by standard web search-engine programs. This is in contrast to the "surface web", which is accessible to anyone using the Internet. Computer scientist Michael K. Bergman is credited with inventing the term in 2001 as a search-indexing term.

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.

The Entrez Global Query Cross-Database Search System is a federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website. The NCBI is a part of the National Library of Medicine (NLM), which is itself a department of the National Institutes of Health (NIH), which in turn is a part of the United States Department of Health and Human Services. The name "Entrez" was chosen to reflect the spirit of welcoming the public to search the content available from the NLM.

Federated search retrieves information from a variety of sources via a search application built on top of one or more search engines. A user makes a single query request which is distributed to the search engines, databases or other query engines participating in the federation. The federated search then aggregates the results that are received from the search engines for presentation to the user. Federated search can be used to integrate disparate information resources within a single large organization ("enterprise") or for the entire web.

A search engine is a software system that finds web pages that match a web search. It searches the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). The information may be a mix of hyperlinks to web pages, images, videos, infographics, articles, and other types of files. As of January 2022, Google was by far the world's most used search engine, with a market share of 90.6%; the other most used search engines were Bing, Yahoo!, Baidu, Yandex, and DuckDuckGo.

Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.

ChemXSeer project, funded by the National Science Foundation, is a public integrated digital library, database, and search engine for scientific papers in chemistry. It is being developed by a multidisciplinary team of researchers at the Pennsylvania State University. ChemXSeer was conceived by Dr. Prasenjit Mitra, Dr. Lee Giles and Dr. Karl Mueller as a way to integrate the chemical scientific literature with experimental, analytical, and simulation data from different types of experimental systems. The goal of the project is to create an intelligent search and database which will provide access to relevant data to a diverse community of users who have a need for chemical information. It is hosted on the World Wide Web at the College of Information Sciences and Technology, The Pennsylvania State University.

Science.gov is a web portal and specialized search engine. Using federated search technology, Science.gov serves as a gateway to United States government scientific and technical information and research. Currently in its fifth generation, Science.gov provides a search of over 60 databases from 14 federal science agencies and 200 million pages of science information with just one query, and is a gateway to 2,200+ scientific websites.

DeepPeep was a search engine that aimed to crawl and index every database on the public Web. Unlike traditional search engines, which crawl existing webpages and their hyperlinks, DeepPeep aimed to allow access to the so-called deep Web: World Wide Web content that is available only through, for instance, typed queries into databases. The project started at the University of Utah and was overseen by Juliana Freire, an associate professor at the university's School of Computing WebDB group. The goal was to make 90% of all WWW content accessible, according to Freire. The project ran a beta search engine and was sponsored by the University of Utah and a $243,000 grant from the National Science Foundation. It generated worldwide interest.

Discoverability is the degree to which something, especially a piece of content or information, can be found in a search of a file, database, or other information system. Discoverability is a concern in library and information science, many aspects of digital media, software and web development, and in marketing, since products and services cannot be used if people cannot find them or do not understand what they can be used for.

Personalized search is a web search tailored specifically to an individual's interests by incorporating information about the individual beyond the specific query provided. There are two general approaches to personalizing search results, involving modifying the user's query and re-ranking search results.

The following outline is provided as an overview of and topical guide to search engines.

Contextual search is a form of optimizing web-based search results based on context provided by the user and the computer being used to enter the query. Contextual search services differ from current search engines based on traditional information retrieval that return lists of documents based on their relevance to the query. Rather, contextual search attempts to increase the precision of results based on how valuable they are to individual users.

References

1. Hargittai, E. (April 2002). "Second-Level Digital Divide: Differences in People's Online Skills". First Monday. 7 (4). doi:10.5210/fm.v7i4.942.