Information explosion

The information explosion is the rapid increase in the amount of published information or data and the effects of this abundance. [1] As the amount of available data grows, the problem of managing the information becomes more difficult, which can lead to information overload. The Online Oxford English Dictionary indicates use of the phrase in a March 1964 New Statesman article. [2] The New York Times first used the phrase in its editorial content in an article by Walter Sullivan on June 7, 1964, in which he described the phrase as "much discussed" (p. 11). [3] The earliest known use of the phrase was in a speech about television by NBC president Pat Weaver at the Institute of Practitioners of Advertising in London on September 27, 1955. The speech was rebroadcast on radio station WSUI in Iowa and excerpted in the Daily Iowan newspaper two months later. [4]

Many sectors, including healthcare, supermarkets, and government, are seeing this rapid increase in the amount of available information. [5] Another sector affected by this phenomenon is journalism: a profession that was historically responsible for the dissemination of information may now be overwhelmed by its overabundance. [6]

Techniques to gather knowledge from an overabundance of electronic information (e.g., data fusion may help in data mining) have existed since the 1970s. Another common technique for dealing with such amounts of information is qualitative research. [7] Such approaches aim to organize the information by synthesizing, categorizing, and systematizing it so that it is more usable and easier to search.

Growth patterns

A new metric being used in an attempt to characterize the growth in person-specific information is disk storage per person (DSP), measured in megabytes per person (where a megabyte is 10⁶ bytes, abbreviated MB). Global DSP (GDSP) is the total rigid disk drive space (in MB) of new units sold in a year divided by the world population in that year. The GDSP metric is a crude measure of how much disk storage could possibly be used to collect person-specific data on the world population. [5] In 1983, one million fixed drives with an estimated total of 90 terabytes were sold worldwide, with 30 MB drives making up the largest market segment. [9] In 1996, 105 million drives totaling 160,623 terabytes were sold, with 1 and 2 gigabyte drives leading the industry. [10] By the year 2000, with 20 GB drives leading the industry, rigid disk drives sold for the year were projected to total 2,829,288 terabytes. [10]
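
The arithmetic behind the GDSP metric can be reproduced directly from the sales totals above. The following minimal Python sketch computes GDSP for each year; the world-population values (roughly 4.7, 5.8, and 6.1 billion for 1983, 1996, and 2000) are approximations assumed here for illustration and do not come from the cited Disk/Trend reports.

```python
# Minimal sketch of the GDSP metric: total rigid disk capacity (in MB)
# of new units sold in a year, divided by the world population that year.
MB_PER_TB = 1_000_000  # decimal convention: 1 TB = 10^6 MB

def gdsp(terabytes_sold: float, world_population: float) -> float:
    """Global disk storage per person, in MB per person."""
    return terabytes_sold * MB_PER_TB / world_population

# Sales totals are from the text; the population figures are rough
# assumptions used only to illustrate the calculation.
print(f"1983: {gdsp(90, 4.7e9):.3f} MB/person")         # ~0.019 MB/person
print(f"1996: {gdsp(160_623, 5.8e9):.1f} MB/person")    # ~27.7 MB/person
print(f"2000: {gdsp(2_829_288, 6.1e9):.0f} MB/person")  # ~464 MB/person
```

On these assumed populations, GDSP grows from about 0.02 MB per person in 1983 to roughly 464 MB per person in 2000, an increase of more than four orders of magnitude in under two decades.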

According to Latanya Sweeney, there are three trends in data gathering today:

Type 1. Expanding the number of fields being collected, known as the “collect more” trend.

Type 2. Replacing an existing aggregate data collection with a person-specific one, known as the “collect specifically” trend.

Type 3. Gathering information by starting a new person-specific data collection, known as the “collect it if you can” trend. [5]

Since "information" in electronic media is often used synonymously with "data", the term information explosion is closely related to the concept of data flood (also dubbed data deluge). Sometimes the term information flood is used as well. All of those basically boil down to the ever-increasing amount of electronic data exchanged per time unit. The awareness about non-manageable amounts of data grew along with the advent of ever more powerful data processing since the mid-1960s. [11]

Challenges

Even though the abundance of information can be beneficial on several levels, it raises problems of concern such as privacy, legal and ethical guidelines, filtering, and data accuracy. [12] Filtering refers to finding useful information in the midst of so much data, which relates to the job of data scientists. A typical example of the need for data filtering (data mining) is healthcare, since electronic health records (EHRs) of patients are expected to become widely available in the coming years. With so much information available, doctors will need to be able to identify patterns and select the data that matters for a patient's diagnosis. [12] On the other hand, according to some experts, having so much public data available makes it difficult to provide data that is actually anonymous. [5]

Another point to take into account is legal and ethical guidelines: who owns the data, how frequently they are obliged to release it, and for how long. [12] With so many sources of data, accuracy is another problem. An untrusted source may be challenged by others commissioning a new set of data, causing duplication of the information. [12]

According to Edward Huth, another concern is the accessibility and cost of such information. [13] Accessibility could be improved either by reducing costs or by increasing the utility of the information. Costs, according to the author, could be reduced by associations that assess which information is relevant and gather it in a more organized fashion.

Web servers

As of August 2005, there were over 70 million web servers. [14] As of September 2007, there were over 135 million web servers. [15]

Blogs

According to Technorati, the number of blogs doubles about every 6 months with a total of 35.3 million blogs as of April 2006. [16] This is an example of the early stages of logistic growth, where growth is approximately exponential, since blogs are a recent innovation. As the number of blogs approaches the number of possible producers (humans), saturation occurs, growth declines, and the number of blogs eventually stabilizes.
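
A minimal sketch of this logistic dynamic follows, under assumed parameters: the starting count is Technorati's April 2006 figure and the growth rate is implied by the six-month doubling time quoted above, while the carrying capacity K is an arbitrary illustrative value, since the true saturation level is unknown.

```python
import math

K = 1.0e9              # assumed carrying capacity (possible producers); illustrative only
N0 = 35.3e6            # blogs in April 2006 (Technorati figure quoted above)
r = math.log(2) / 6.0  # per-month growth rate implied by doubling every 6 months

def blogs(t_months: float) -> float:
    """Closed-form logistic curve: N(t) = K / (1 + (K/N0 - 1) * exp(-r*t))."""
    return K / (1.0 + (K / N0 - 1.0) * math.exp(-r * t_months))

for t in (0, 6, 12, 60, 120):
    print(f"month {t:3}: {blogs(t):>13,.0f} blogs")
```

Early on, the curve roughly doubles every six months, matching the Technorati data; by month 120 it has flattened just below the assumed ceiling K, illustrating the saturation described above.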


References

  1. Hilbert, M. (2015). Global Information Explosion: https://www.youtube.com/watch?v=8-AqzPe_gNs&list=PLtjBSCvWCU3rNm46D3R85efM0hrzjuAIg. Digital Technology and Social Change [open online course at the University of California], freely available at https://canvas.instructure.com/courses/949415
  2. "Information." http://dictionary.oed.com. Accessed January 4, 2008.
  3. "U. S. WILL REMOVE REACTOR IN ARCTIC; Compacting Snow Squeezes Device Under Ice Sheet". The New York Times. 7 June 1964.
  4. Weaver, Sylvester (22 Nov 1955). "The Impact of TV in the U.S." Daily Iowan. p. 2. Retrieved 18 Aug 2021. I believe that in the last few years we have set in motion an information explosion. To each man there is flooding more information than he can presently handle, but he is learning how to handle it and, as he learns, it will do him good.
  5. Sweeney, Latanya. "Information explosion." Confidentiality, disclosure, and data access: Theory and practical applications for statistical agencies (2001): 43-74.
  6. Fuller, Jack. What is happening to news: The information explosion and the crisis in journalism. University of Chicago Press, 2010.
  7. Major, Claire Howell, and Maggi Savin-Baden. An introduction to qualitative research synthesis: Managing the information explosion in social science research. Routledge, 2010.
  8. Hilbert, Martin, and Priscila López. "The World's Technological Capacity to Store, Communicate, and Compute Information." Free access to the study and a video animation at martinhilbert.net/WorldInfoCapacity.html.
  9. "Disk/Trend report 1983," Computer Week. Mountain View, CA. (46) 11/11/83.
  10. "Rigid disk drive sales to top $34 billion in 1997," Disk/Trend News. Mountain View, CA: Disk/Trend, Inc., 1997.
  11. Google Books Ngram viewer for the terms mentioned here
  12. Berner, Eta S., and Jacqueline Moss. "Informatics challenges for the impending patient information explosion." Journal of the American Medical Informatics Association 12.6 (2005): 614-617.
  13. Huth, Edward J. "The information explosion." Bulletin of the New York Academy of Medicine 65.6 (1989): 647.
  14. Robert H Zakon (15 December 2010). "Hobbes' Internet Timeline 10.1". zakon.org. Retrieved 27 August 2011.
  15. "August 2011 Web Server Survey". netcraft.com. August 2011. Retrieved 27 August 2011.
  16. "State of the Blogosphere, April 2006 Part 1: On Blogosphere Growth". Sifry's Alerts (sifry.com). April 17, 2006. Archived from the original on 9 January 2013. Retrieved 27 August 2011.