Information explosion

The information explosion is the rapid increase in the amount of published information or data and the effects of this abundance. [1] As the amount of available data grows, managing the information becomes more difficult, which can lead to information overload. The online Oxford English Dictionary indicates use of the phrase in a March 1964 New Statesman article. [2] The New York Times first used the phrase in its editorial content in an article by Walter Sullivan on June 7, 1964, in which he described the phrase as "much discussed" (p. 11). [3] The earliest known use of the phrase was in a speech about television by NBC president Pat Weaver at the Institute of Practitioners of Advertising in London on September 27, 1955. The speech was rebroadcast on radio station WSUI in Iowa City and excerpted in the Daily Iowan newspaper two months later. [4]

Many sectors, such as healthcare, supermarkets, and governments, are seeing this rapid increase in the amount of available information. [5] Journalism is also affected by the phenomenon: a profession that was historically responsible for the dissemination of information may now be overwhelmed by today's overabundance of it. [6]

Techniques to gather knowledge from an overabundance of electronic information (e.g., data fusion may aid data mining) have existed since the 1970s. Another common technique for dealing with such amounts of information is qualitative research. [7] These approaches aim to organize the information by synthesizing, categorizing, and systematizing it so that it is more usable and easier to search.

Growth patterns

A newer metric used to characterize the growth in person-specific information is disk storage per person (DSP), measured in megabytes per person (where a megabyte is 10^6 bytes, abbreviated MB). Global DSP (GDSP) is the total rigid disk drive space (in MB) of new units sold in a year, divided by the world population in that year. The GDSP metric is a crude measure of how much disk storage could possibly be used to collect person-specific data on the world population. [5] In 1983, one million fixed drives, with an estimated total of 90 terabytes, were sold worldwide; 30 MB drives had the largest market segment. [9] In 1996, 105 million drives totaling 160,623 terabytes were sold, with 1 and 2 gigabyte drives leading the industry. [10] By the year 2000, with 20 GB drives leading the industry, rigid drives sold for the year were projected to total 2,829,288 terabytes. [10]
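As a worked illustration of the GDSP arithmetic above: the drive-capacity figures come from the text, but the world-population values are rough outside assumptions, and the `gdsp` helper is purely illustrative.

```python
def gdsp(total_terabytes_sold, world_population):
    """Global disk storage per person, in MB/person (1 TB = 10^6 MB)."""
    return total_terabytes_sold * 1e6 / world_population

# 1996: 160,623 TB sold; world population assumed ~5.8 billion.
gdsp_1996 = gdsp(160_623, 5.8e9)   # ~27.7 MB/person

# 2000 (projected): 2,829,288 TB; population assumed ~6.1 billion.
gdsp_2000 = gdsp(2_829_288, 6.1e9)  # ~464 MB/person

print(round(gdsp_1996, 1), round(gdsp_2000))
```

Even with crude population estimates, the metric shows per-person storage sold annually growing by more than an order of magnitude over four years.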

According to Latanya Sweeney, there are three trends in data gathering today:

Type 1. Expand the number of fields being collected, known as the “collect more” trend.

Type 2. Replace an existing aggregate data collection with a person-specific one, known as the “collect specifically” trend.

Type 3. Gather information by starting a new person-specific data collection, known as the “collect it if you can” trend. [5]

Since "information" in electronic media is often used synonymously with "data", the term information explosion is closely related to the concept of data flood (also called data deluge). The term information flood is sometimes used as well. All of these refer to the ever-increasing amount of electronic data exchanged per unit of time. A term that covers the potential negative effects of the information explosion is information inflation. [11] Awareness of unmanageable amounts of data grew along with the advent of ever more powerful data processing from the mid-1960s onward. [12]

Challenges

Although the abundance of information can be beneficial on several levels, it raises concerns such as privacy, legal and ethical guidelines, filtering, and data accuracy. [13] Filtering refers to finding useful information amid so much data, which relates to the job of data scientists. A typical example of the need for data filtering (data mining) is healthcare, since patients' electronic health records (EHRs) are due to become widely available in the coming years. With so much information available, doctors will need to identify patterns and select the data that is important for a patient's diagnosis. [13] On the other hand, according to some experts, having so much public data available makes it difficult to provide data that is actually anonymous. [5]

Another point to take into account is legal and ethical guidelines: who owns the data, how frequently they are obliged to release it, and for how long. [13] With so many sources of data, accuracy becomes a further problem; an untrusted source may be challenged by others ordering a new set of data, causing repetition of the information. [13] According to Edward Huth, another concern is the accessibility and cost of such information. [14] Accessibility could be improved either by reducing costs or by increasing the utility of the information. Reducing costs, according to the author, could be done by associations, which would assess which information is relevant and gather it in a more organized fashion.
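The kind of filtering described above can be sketched in a few lines: surface only the records that match a clinically relevant pattern. Everything here is hypothetical for illustration; the field names, threshold, and records are not from the source or any real EHR schema.

```python
# Hypothetical patient records; field names and values are invented.
records = [
    {"patient": "A", "glucose_mg_dl": 210, "note": "follow-up"},
    {"patient": "B", "glucose_mg_dl": 95,  "note": "routine"},
    {"patient": "C", "glucose_mg_dl": 180, "note": "new symptoms"},
]

def flag_elevated(records, threshold=140):
    """Select records whose glucose reading exceeds a threshold."""
    return [r for r in records if r["glucose_mg_dl"] > threshold]

flagged = flag_elevated(records)
print([r["patient"] for r in flagged])  # ['A', 'C']
```

Real clinical filtering is far more involved (pattern recognition across many variables, not a single cutoff), but the principle is the same: reduce an overabundance of data to the subset worth a clinician's attention.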

Web servers

As of August 2005, there were over 70 million web servers. [15] As of September 2007 there were over 135 million web servers. [16]

Blogs

According to Technorati, the number of blogs doubles about every 6 months with a total of 35.3 million blogs as of April 2006. [17] This is an example of the early stages of logistic growth, where growth is approximately exponential, since blogs are a recent innovation. As the number of blogs approaches the number of possible producers (humans), saturation occurs, growth declines, and the number of blogs eventually stabilizes.


References

  1. Hilbert, M. (2015). "Global Information Explosion". Digital Technology and Social Change [open online course at the University of California], freely available at https://canvas.instructure.com/courses/949415; lecture video: https://www.youtube.com/watch?v=8-AqzPe_gNs&list=PLtjBSCvWCU3rNm46D3R85efM0hrzjuAIg
  2. "Information". Oxford English Dictionary Online, http://dictionary.oed.com. Accessed January 4, 2008.
  3. "U. S. WILL REMOVE REACTOR IN ARCTIC; Compacting Snow Squeezes Device Under Ice Sheet". The New York Times. 7 June 1964.
  4. Weaver, Sylvester (22 Nov 1955). "The Impact of TV in the U.S." Daily Iowan. p. 2. Retrieved 18 Aug 2021. I believe that in the last few years we have set in motion an information explosion. To each man there is flooding more information than he can presently handle, but he is learning how to handle it and, as he learns, it will do him good.
  5. Sweeney, Latanya. "Information explosion". Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies (2001): 43–74.
  6. Fuller, Jack. What is happening to news: The information explosion and the crisis in journalism. University of Chicago Press, 2010.
  7. Major, Claire Howell, and Maggi Savin-Baden. An introduction to qualitative research synthesis: Managing the information explosion in social science research. Routledge, 2010.
  8. Hilbert, M., and López, P. "The World's Technological Capacity to Store, Communicate, and Compute Information", martinhilbert.net/WorldInfoCapacity.html ("free access to the study" and "video animation").
  9. "Disk/Trend report 1983", Computer Week. Mountain View, CA. (46) 11/11/83.
  10. "Rigid disk drive sales to top $34 billion in 1997", Disk/Trend News. Mountain View, CA: Disk/Trend, Inc., 1997.
  11. Doomen, J. (2009). "Information Inflation". Journal of Information Ethics. 18 (2): 27–37. doi:10.3172/jie.18.2.27.
  12. Google Books Ngram viewer for the terms mentioned here
  13. Berner, Eta S., and Jacqueline Moss. "Informatics challenges for the impending patient information explosion". Journal of the American Medical Informatics Association 12.6 (2005): 614–617.
  14. Huth, Edward J. "The information explosion." Bulletin of the New York Academy of Medicine 65.6 (1989): 647.
  15. Robert H Zakon (15 December 2010). "Hobbes' Internet Timeline 10.1". zakon.org. Retrieved 27 August 2011.
  16. "August 2011 Web Server Survey". netcraft.com. August 2011. Archived from the original on 20 May 2010. Retrieved 27 August 2011.
  17. "State of the Blogosphere, April 2006 Part 1: On Blogosphere Growth". Sifry's Alerts (sifry.com). April 17, 2006. Archived from the original on 9 January 2013. Retrieved 27 August 2011.