Communication |
---|
General aspects |
Fields |
Disciplines |
Categories |
Information is an abstract concept that refers to something which has the power to inform. At the most fundamental level, it pertains to the interpretation (perhaps formally) of that which may be sensed, or their abstractions. Any natural process that is not completely random and any observable pattern in any medium can be said to convey some amount of information. Whereas digital signals and other data use discrete signs to convey information, other phenomena and artifacts such as analogue signals, poems, pictures, music or other sounds, and currents convey information in a more continuous form. [1] Information is not knowledge itself, but the meaning that may be derived from a representation through interpretation. [2]
The concept of information is relevant or connected to various concepts, [3] including constraint, communication, control, data, form, education, knowledge, meaning, understanding, mental stimuli, pattern, perception, proposition, representation, and entropy.
Information is often processed iteratively: Data available at one step are processed into information to be interpreted and processed at the next step. For example, in written text each symbol or letter conveys information relevant to the word it is part of, each word conveys information relevant to the phrase it is part of, each phrase conveys information relevant to the sentence it is part of, and so on until at the final step information is interpreted and becomes knowledge in a given domain. In a digital signal, bits may be interpreted into the symbols, letters, numbers, or structures that convey the information available at the next level up. The key characteristic of information is that it is subject to interpretation and processing.
The derivation of information from a signal or message may be thought of as the resolution of ambiguity or uncertainty that arises during the interpretation of patterns within the signal or message. [4]
Information may be structured as data. Redundant data can be compressed up to an optimal size, which is the theoretical limit of compression.
The information available through a collection of data may be derived by analysis. For example, a restaurant collects data from every customer order. That information may be analyzed to produce knowledge that is put to use when the business subsequently wants to identify the most popular or least popular dish.[ citation needed ]
Information can be transmitted in time, via data storage, and space, via communication and telecommunication. [5] Information is expressed either as the content of a message or through direct or indirect observation. That which is perceived can be construed as a message in its own right, and in that sense, all information is always conveyed as the content of a message.
Information can be encoded into various forms for transmission and interpretation (for example, information may be encoded into a sequence of signs, or transmitted via a signal). It can also be encrypted for safe storage and communication.
The uncertainty of an event is measured by its probability of occurrence. Uncertainty is inversely proportional to the probability of occurrence. Information theory takes advantage of this by concluding that more uncertain events require more information to resolve their uncertainty. The bit is a typical unit of information. It is 'that which reduces uncertainty by half'. [6] Other units such as the nat may be used. For example, the information encoded in one "fair" coin flip is log2(2/1) = 1 bit, and in two fair coin flips is log2(4/1) = 2 bits. A 2011 Science article estimates that 97% of technologically stored information was already in digital bits in 2007 and that the year 2002 was the beginning of the digital age for information storage (with digital storage capacity bypassing analogue for the first time). [7]
Information can be defined exactly by set theory:
"Information is a selection from the domain of information".
The "domain of information" is a set that the sender and receiver of information must know before exchanging information. Digital information, for example, consists of building blocks that are all number sequences. Each number sequence represents a selection from its domain. The sender and receiver of digital information (number sequences) must know the domain and binary format of each number sequence before exchanging information. By defining number sequences online, this would be systematically and universally usable. Before the exchanged digital number sequence, an efficient unique link to its online definition can be set. This online-defined digital information (number sequence) would be globally comparable and globally searchable. [8]
The English word "information" comes from Middle French enformacion/informacion/information 'a criminal investigation' and its etymon, Latin informatiō(n) 'conception, teaching, creation'. [9]
In English, "information" is an uncountable mass noun.
Information theory is the scientific study of the quantification, storage, and communication of information. The field itself was fundamentally established by the work of Claude Shannon in the 1940s, with earlier contributions by Harry Nyquist and Ralph Hartley in the 1920s. [10] [11] The field is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering.
A key measure in information theory is entropy. Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (with six equally likely outcomes). Some other important measures in information theory are mutual information, channel capacity, error exponents, and relative entropy. Important sub-fields of information theory include source coding, algorithmic complexity theory, algorithmic information theory, and information-theoretic security.
Applications of fundamental topics of information theory include source coding/data compression (e.g. for ZIP files), and channel coding/error detection and correction (e.g. for DSL). Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones and the development of the Internet. The theory has also found applications in other areas, including statistical inference, [12] cryptography, neurobiology, [13] perception, [14] linguistics, the evolution [15] and function [16] of molecular codes (bioinformatics), thermal physics, [17] quantum computing, black holes, information retrieval, intelligence gathering, plagiarism detection, [18] pattern recognition, anomaly detection [19] and even art creation.
Often information can be viewed as a type of input to an organism or system. Inputs are of two kinds; some inputs are important to the function of the organism (for example, food) or system (energy) by themselves. In his book Sensory Ecology [20] biophysicist David B. Dusenbery called these causal inputs. Other inputs (information) are important only because they are associated with causal inputs and can be used to predict the occurrence of a causal input at a later time (and perhaps another place). Some information is important because of association with other information but eventually there must be a connection to a causal input.
In practice, information is usually carried by weak stimuli that must be detected by specialized sensory systems and amplified by energy inputs before they can be functional to the organism or system. For example, light is mainly (but not only, e.g. plants can grow in the direction of the light source) a causal input to plants but for animals it only provides information. The colored light reflected from a flower is too weak for photosynthesis but the visual system of the bee detects it and the bee's nervous system uses the information to guide the bee to the flower, where the bee often finds nectar or pollen, which are causal inputs, a nutritional function.
The cognitive scientist and applied mathematician Ronaldo Vigo argues that information is a concept that requires at least two related entities to make quantitative sense. These are, any dimensionally defined category of objects S, and any of its subsets R. R, in essence, is a representation of S, or, in other words, conveys representational (and hence, conceptual) information about S. Vigo then defines the amount of information that R conveys about S as the rate of change in the complexity of S whenever the objects in R are removed from S. Under "Vigo information", pattern, invariance, complexity, representation, and information –five fundamental constructs of universal science –are unified under a novel mathematical framework. [21] [22] [23] Among other things, the framework aims to overcome the limitations of Shannon-Weaver information when attempting to characterize and measure subjective information.
Information is any type of pattern that influences the formation or transformation of other patterns. [24] [25] In this sense, there is no need for a conscious mind to perceive, much less appreciate, the pattern. Consider, for example, DNA. The sequence of nucleotides is a pattern that influences the formation and development of an organism without any need for a conscious mind. One might argue though that for a human to consciously define a pattern, for example a nucleotide, naturally involves conscious information processing. However, the existence of unicellular and multicellular organisms, with the complex biochemistry that leads, among other events, to the existence of enzymes and polynucleotides that interact maintaining the biological order and participating in the development of multicellular organisms, precedes by millions of years the emergence of human consciousness and the creation of the scientific culture that produced the chemical nomenclature.
Systems theory at times seems to refer to information in this sense, assuming information does not necessarily involve any conscious mind, and patterns circulating (due to feedback) in the system can be called information. In other words, it can be said that information in this sense is something potentially perceived as representation, though not created or presented for that purpose. For example, Gregory Bateson defines "information" as a "difference that makes a difference". [26]
If, however, the premise of "influence" implies that information has been perceived by a conscious mind and also interpreted by it, the specific context associated with this interpretation may cause the transformation of the information into knowledge. Complex definitions of both "information" and "knowledge" make such semantic and logical analysis difficult, but the condition of "transformation" is an important point in the study of information as it relates to knowledge, especially in the business discipline of knowledge management. In this practice, tools and processes are used to assist a knowledge worker in performing research and making decisions, including steps such as:
Stewart (2001) argues that transformation of information into knowledge is critical, lying at the core of value creation and competitive advantage for the modern enterprise.
In a biological framework, Mizraji [27] has described information as an entity emerging from the interaction of patterns with receptor systems (eg: in molecular or neural receptors capable of interacting with specific patterns, information emerges from those interactions). In addition, he has incorporated the idea of "information catalysts", structures where emerging information promotes the transition from pattern recognition to goal-directed action (for example, the specific transformation of a substrate into a product by an enzyme, or auditory reception of words and the production of an oral response)
The Danish Dictionary of Information Terms [28] argues that information only provides an answer to a posed question. Whether the answer provides knowledge depends on the informed person. So a generalized definition of the concept should be: "Information" = An answer to a specific question".
When Marshall McLuhan speaks of media and their effects on human cultures, he refers to the structure of artifacts that in turn shape our behaviors and mindsets. Also, pheromones are often said to be "information" in this sense.
These sections are using measurements of data rather than information, as information cannot be directly measured.
It is estimated that the world's technological capacity to store information grew from 2.6 (optimally compressed) exabytes in 1986 – which is the informational equivalent to less than one 730-MB CD-ROM per person (539 MB per person) – to 295 (optimally compressed) exabytes in 2007. [7] This is the informational equivalent of almost 61 CD-ROM per person in 2007. [5]
The world's combined technological capacity to receive information through one-way broadcast networks was the informational equivalent of 174 newspapers per person per day in 2007. [7]
The world's combined effective capacity to exchange information through two-way telecommunication networks was the informational equivalent of 6 newspapers per person per day in 2007. [5]
As of 2007, an estimated 90% of all new information is digital, mostly stored on hard drives. [29]
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 64.2 zettabytes in 2020. Over the next five years up to 2025, global data creation is projected to grow to more than 180 zettabytes. [30]
Part of a series on |
Library and information science |
---|
Records are specialized forms of information. Essentially, records are information produced consciously or as by-products of business activities or transactions and retained because of their value. Primarily, their value is as evidence of the activities of the organization but they may also be retained for their informational value. Sound records management ensures that the integrity of records is preserved for as long as they are required.[ citation needed ]
The international standard on records management, ISO 15489, defines records as "information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business". [31] The International Committee on Archives (ICA) Committee on electronic records defined a record as, "recorded information produced or received in the initiation, conduct or completion of an institutional or individual activity and that comprises content, context and structure sufficient to provide evidence of the activity". [32]
Records may be maintained to retain corporate memory of the organization or to meet legal, fiscal or accountability requirements imposed on the organization. Willis expressed the view that sound management of business records and information delivered "...six key requirements for good corporate governance...transparency; accountability; due process; compliance; meeting statutory and common law requirements; and security of personal and corporate information." [33]
Michael Buckland has classified "information" in terms of its uses: "information as process", "information as knowledge", and "information as thing". [34]
Beynon-Davies [35] [36] explains the multi-faceted concept of information in terms of signs and signal-sign systems. Signs themselves can be considered in terms of four inter-dependent levels, layers or branches of semiotics: pragmatics, semantics, syntax, and empirics. These four layers serve to connect the social world on the one hand with the physical or technical world on the other.
Pragmatics is concerned with the purpose of communication. Pragmatics links the issue of signs with the context within which signs are used. The focus of pragmatics is on the intentions of living agents underlying communicative behaviour. In other words, pragmatics link language to action.
Semantics is concerned with the meaning of a message conveyed in a communicative act. Semantics considers the content of communication. Semantics is the study of the meaning of signs – the association between signs and behaviour. Semantics can be considered as the study of the link between symbols and their referents or concepts – particularly the way that signs relate to human behavior.
Syntax is concerned with the formalism used to represent a message. Syntax as an area studies the form of communication in terms of the logic and grammar of sign systems. Syntax is devoted to the study of the form rather than the content of signs and sign systems.
Nielsen (2008) discusses the relationship between semiotics and information in relation to dictionaries. He introduces the concept of lexicographic information costs and refers to the effort a user of a dictionary must make to first find, and then understand data so that they can generate information.
Communication normally exists within the context of some social situation. The social situation sets the context for the intentions conveyed (pragmatics) and the form of communication. In a communicative situation intentions are expressed through messages that comprise collections of inter-related signs taken from a language mutually understood by the agents involved in the communication. Mutual understanding implies that agents involved understand the chosen language in terms of its agreed syntax and semantics. The sender codes the message in the language and sends the message as signals along some communication channel (empirics). The chosen communication channel has inherent properties that determine outcomes such as the speed at which communication can take place, and over what distance.
The existence of information about a closed system is a major concept in both classical physics and quantum mechanics, encompassing the ability, real or theoretical, of an agent to predict the future state of a system based on knowledge gathered during its past and present. Determinism is a philosophical theory holding that causal determination can predict all future events, [37] positing a fully predictable universe described by classical physicist Pierre-Simon Laplace as "the effect of its past and the cause of its future". [38]
Quantum physics instead encodes information as a wave function, which prevents observers from directly identifying all of its possible measurements. Prior to the publication of Bell's theorem, determinists reconciled with this behavior using hidden variable theories, which argued that the information necessary to predict the future of a function must exist, even if it is not accessible for humans; A view surmised by Albert Einstein with the assertion that "God does not play dice". [39]
Modern astronomy cites the mechanical sense of information in the black hole information paradox, positing that, because the complete evaporation of a black hole into Hawking radiation leaves nothing except an expanding cloud of homogeneous particles, this results in the irrecoverability of any information about the matter to have originally crossed the event horizon, violating both classical and quantum assertions against the ability to destroy information. [40] [41]
The information cycle (addressed as a whole or in its distinct components) is of great concern to information technology, information systems, as well as information science. These fields deal with those processes and techniques pertaining to information capture (through sensors) and generation (through computation, formulation or composition), processing (including encoding, encryption, compression, packaging), transmission (including all telecommunication methods), presentation (including visualization / display methods), storage (such as magnetic or optical, including holographic methods), etc.
Information visualization (shortened as InfoVis) depends on the computation and digital representation of data, and assists users in pattern recognition and anomaly detection.
Information security (shortened as InfoSec) is the ongoing process of exercising due diligence to protect information, and information systems, from unauthorized access, use, disclosure, destruction, modification, disruption or distribution, through algorithms and procedures focused on monitoring and detection, as well as incident response and repair.
Information analysis is the process of inspecting, transforming, and modeling information, by converting raw data into actionable knowledge, in support of the decision-making process.
Information quality (shortened as InfoQ) is the potential of a dataset to achieve a specific (scientific or practical) goal using a given empirical analysis method.
Information communication represents the convergence of informatics, telecommunication and audio-visual media & content.
In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication channel or storage in a storage medium. An early example is an invention of language, which enabled a person, through speech, to communicate what they thought, saw, heard, or felt to others. But speech limits the range of communication to the distance a voice can carry and limits the audience to those present when the speech is uttered. The invention of writing, which converted spoken language into visual symbols, extended the range of communication across space and time.
Complexity characterizes the behavior of a system or model whose components interact in multiple ways and follow local rules, leading to non-linearity, randomness, collective dynamics, hierarchy, and emergence.
Digital data, in information theory and information systems, is information represented as a string of discrete symbols, each of which can take on one of only a finite number of values from some alphabet, such as letters or digits. An example is a text document, which consists of a string of alphanumeric characters. The most common form of digital data in modern information systems is binary data, which is represented by a string of binary digits (bits) each of which can have one of two values, either 0 or 1.
Information theory is the mathematical study of the quantification, storage, and communication of information. The field was established and put on a firm footing by Claude Shannon in the 1940s, though early contributions were made in the 1920s through the works of Harry Nyquist and Ralph Hartley. It is at the intersection of electronic engineering, mathematics, statistics, computer science, neurobiology, physics, and electrical engineering.
In information theory, the entropy of a random variable quantifies the average level of uncertainty or information associated with the variable's potential states or possible outcomes. This measures the expected amount of information needed to describe the state of the variable, considering the distribution of probabilities across all potential states. Given a discrete random variable , which takes values in the set and is distributed according to , the entropy is where denotes the sum over the variable's possible values. The choice of base for , the logarithm, varies for different applications. Base 2 gives the unit of bits, while base e gives "natural units" nat, and base 10 gives units of "dits", "bans", or "hartleys". An equivalent definition of entropy is the expected value of the self-information of a variable.
Quantum information is the information of the state of a quantum system. It is the basic entity of study in quantum information theory, and can be manipulated using quantum information processing techniques. Quantum information refers to both the technical definition in terms of Von Neumann entropy and the general computational term.
In linguistics and philosophy, a vague predicate is one which gives rise to borderline cases. For example, the English adjective "tall" is vague since it is not clearly true or false for someone of middling height. By contrast, the word "prime" is not vague since every number is definitively either prime or not. Vagueness is commonly diagnosed by a predicate's ability to give rise to the Sorites paradox. Vagueness is separate from ambiguity, in which an expression has multiple denotations. For instance the word "bank" is ambiguous since it can refer either to a river bank or to a financial institution, but there are no borderline cases between both interpretations.
A communication channel refers either to a physical transmission medium such as a wire, or to a logical connection over a multiplexed medium such as a radio channel in telecommunications and computer networking. A channel is used for information transfer of, for example, a digital bit stream, from one or several senders to one or several receivers. A channel has a certain capacity for transmitting information, often measured by its bandwidth in Hz or its data rate in bits per second.
Signal refers to both the process and the result of transmission of data over some media accomplished by embedding some variation. Signals are important in multiple subject fields including signal processing, information theory and biology.
Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and data storage. Codes are studied by various scientific disciplines—such as information theory, electrical engineering, mathematics, linguistics, and computer science—for the purpose of designing efficient and reliable data transmission methods. This typically involves the removal of redundancy and the correction or detection of errors in the transmitted data.
Theoretical computer science is a subfield of computer science and mathematics that focuses on the abstract and mathematical foundations of computation.
In mathematics, a time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.
Relevance is the concept of one topic being connected to another topic in a way that makes it useful to consider the second topic when considering the first. The concept of relevance is studied in many different fields, including cognitive sciences, logic, and library and information science. Most fundamentally, however, it is studied in epistemology. Different theories of knowledge have different implications for what is considered relevant and these fundamental views have implications for all other fields as well.
The actor model and process calculi share an interesting history and co-evolution.
Analytical skill is the ability to deconstruct information into smaller categories in order to draw conclusions. Analytical skill consists of categories that include logical reasoning, critical thinking, communication, research, data analysis and creativity. Analytical skill is taught in contemporary education with the intention of fostering the appropriate practices for future professions. The professions that adopt analytical skill include educational institutions, public institutions, community organisations and industry.
In science, computing, and engineering, a black box is a system which can be viewed in terms of its inputs and outputs, without any knowledge of its internal workings. Its implementation is "opaque" (black). The term can be used to refer to many inner workings, such as those of a transistor, an engine, an algorithm, the human brain, or an institution or government.
In common usage, data is a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A datum is an individual value in a collection of data. Data are usually organized into structures such as tables that provide additional context and meaning, and may themselves be used as data in larger structures. Data may be used as variables in a computational process. Data may represent abstract ideas or concrete measurements. Data are commonly used in scientific research, economics, and virtually every other form of human organizational activity. Examples of data sets include price indices, unemployment rates, literacy rates, and census data. In this context, data represent the raw facts and figures from which useful information can be extracted.
Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processing, scientific visualization, algorithms and systems to extract or extrapolate knowledge and insights from potentially noisy, structured, or unstructured data.
The following outline is provided as an overview of and topical guide to natural-language processing:
This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence (AI), its subdisciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.
A theory is deterministic if, and only if, given its state variables for some initial period, the theory logically determines a unique set of values for those variables for any other period.