Information flow (information theory)

Information flow in an information-theoretic context is the transfer of information from a variable x to a variable y in a given process. Not all flows are desirable; for example, a system should not leak any confidential information (in whole or in part) to public observers, since this violates privacy at the individual level and can cause major losses at the corporate level.

Introduction

Securing the data manipulated by computing systems has been a challenge for many years. Several methods to limit information disclosure exist today, such as access control lists, firewalls, and cryptography. However, although these methods do impose limits on the information that is released by a system, they provide no guarantees about information propagation. [1] For example, access control lists of file systems prevent unauthorized file access, but they do not control how the data is used afterwards. Similarly, cryptography provides a means to exchange information privately across a non-secure channel, but no guarantees about the confidentiality of the data are given once it is decrypted.

In low-level information flow analysis, each variable is usually assigned a security level. The basic model comprises two distinct levels: low and high, meaning, respectively, publicly observable information and secret information. To ensure confidentiality, information should not be allowed to flow from high variables to low variables. Conversely, to ensure integrity, flows into high variables should be restricted. [1]

More generally, the security levels can be viewed as a lattice with information flowing only upwards in the lattice. [2]

For example, considering two security levels L and H (low and high), if L ⊑ H, flows from L to L, from H to H, and from L to H would be allowed, while flows from H to L would not. [3]

Throughout this article, the following notation is used: the variable l (low) denotes a publicly observable variable, while the variable h (high) denotes a secret variable, where L and H are the only two security levels in the lattice being considered.
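Viewed operationally, checking a flow against such a lattice amounts to comparing the security levels of the source and the destination. The following Python sketch (an illustration, not part of the cited references) encodes the two-level lattice and a flow check; the names Level and flow_allowed are hypothetical.

from enum import IntEnum

class Level(IntEnum):
    # Two-point lattice: LOW is publicly observable, HIGH is secret.
    LOW = 0
    HIGH = 1

def flow_allowed(src: Level, dst: Level) -> bool:
    # A flow is permitted only if it goes upwards in the lattice
    # (the source level is below or equal to the destination level).
    return src <= dst

assert flow_allowed(Level.LOW, Level.HIGH)      # public data may become secret
assert flow_allowed(Level.HIGH, Level.HIGH)
assert not flow_allowed(Level.HIGH, Level.LOW)  # secrets must not reach public observers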

Explicit flows and side channels

Information flows can be divided into two major categories. The simplest one is the explicit flow, where some secret is explicitly leaked to a publicly observable variable. In the following example, the secret in the variable h flows into the publicly observable variable l.

var l, h
l := h

The other flows fall into the side channel category. For example, in the timing attack or in the power analysis attack, the system leaks information through, respectively, the time or power it takes to perform an action depending on a secret value.

In the following example, the attacker can deduce if the value of h is one or not by the time the program takes to finish:

var l, h
if h = 1 then
    (* do some time-consuming work *)
l := 0

Another side-channel flow is the implicit information flow, which consists of leaking information through the program's control flow. The following program (implicitly) discloses the value of the secret variable h to the variable l. In this case, since the variable h is boolean, all of its bits are disclosed (at the end of the program, l will be 3 if h is true, and 42 otherwise).

var l, h
if h = true then
    l := 3
else
    l := 42

Non-interference

Non-interference is a policy requiring that an attacker should not be able to distinguish two computations from their outputs if they vary only in their secret inputs. However, this policy is too strict to be usable in realistic programs. [4] The classic example is a password checker program that, in order to be useful, needs to disclose some secret information: whether the input password is correct or not (note that the information an attacker learns when the program rejects a password is that the attempted password is not the valid one).
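To make the definition concrete, the following Python sketch (hypothetical, not from the cited references) states non-interference as an executable check for a single pair of runs: two executions that agree on the public input but differ in the secret input must produce the same public output. Such testing can only refute non-interference, never prove it.

def violates_noninterference(program, public, secret_a, secret_b):
    # True if the two runs can be distinguished from the public output alone.
    return program(public, secret_a) != program(public, secret_b)

# The implicit-flow example from the previous section, written as a function.
def leaky(public, h):
    return 3 if h else 42

# A program that ignores its secret input.
def safe(public, h):
    return public + 1

print(violates_noninterference(leaky, None, True, False))  # True: h is observable
print(violates_noninterference(safe, 1, True, False))      # False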

Information flow control

A mechanism for information flow control is one that enforces information flow policies. Several methods to enforce information flow policies have been proposed. Run-time mechanisms that tag data with information flow labels have been employed at the operating system level and at the programming language level. Static program analyses have also been developed that ensure information flows within programs are in accordance with policies.

Both static and dynamic analyses have been developed for current programming languages. However, dynamic analysis techniques cannot observe all execution paths, and therefore cannot be both sound and precise. In order to guarantee non-interference, they either terminate executions that might release sensitive information [5] or ignore updates that might leak information. [6]
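As a rough illustration of the run-time approach, the following Python sketch (a hypothetical monitor, not the API of any particular system) tags each value with a label and propagates labels through explicit operations; implicit flows through control flow are deliberately out of scope here.

from dataclasses import dataclass

LOW, HIGH = 0, 1

@dataclass
class Labeled:
    value: int
    label: int  # LOW or HIGH

def add(a: Labeled, b: Labeled) -> Labeled:
    # The result's label is the join (max) of the operands' labels.
    return Labeled(a.value + b.value, max(a.label, b.label))

def output_public(x: Labeled):
    # A public sink accepts only LOW-labeled data.
    if x.label != LOW:
        raise RuntimeError("flow violation: HIGH data at a public sink")
    print(x.value)

h = Labeled(42, HIGH)
l = Labeled(1, LOW)
output_public(l)              # allowed
try:
    output_public(add(l, h))  # the sum carries the HIGH label
except RuntimeError as e:
    print(e)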

A prominent way to enforce information flow policies in a program is through a security type system: that is, a type system that enforces security properties. In such a sound type system, if a program type-checks, it meets the flow policy and therefore contains no improper information flows.

Security type system

In a programming language augmented with a security type system, every expression carries both a type (such as boolean or integer) and a security label.

In a simple security type system from [1] that enforces non-interference, the notation ⊢ exp : τ means that the expression exp has type τ. Similarly, [sc] ⊢ cmd means that the command cmd is typable in the security context sc.

Well-typed commands include, for example, assignments that move information upwards in the lattice, such as

h := l + 3

since no secret information reaches a publicly observable variable.

Conversely, the program

l := h

is ill-typed, as it would disclose the value of the variable h into l.

Note that the type system includes a subsumption rule: any command that is typable in the high security context [H] is also typable in the low context [L], since typing under [H] only permits assignments to high variables. For example, h := 1 can be typed both as [H] ⊢ h := 1 and as [L] ⊢ h := 1. This is called polymorphism in type theory. Similarly, an expression that satisfies ⊢ exp : L can be given both the type L and the type H, since publicly observable information may always be treated as secret.
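The following Python sketch (a hypothetical checker, not the type system of [1] verbatim) illustrates the same idea for the two-level case: an expression's level is the join of the levels of the variables it mentions, and an assignment type-checks only if both the expression's level and the current security context flow into the level of the assigned variable.

LOW, HIGH = 0, 1

def expr_level(expr_vars, env):
    # Level of an expression = join (max) of the levels of its variables.
    return max((env[v] for v in expr_vars), default=LOW)

def check(cmd, env, pc=LOW):
    # Returns True iff the command is typable in security context pc.
    kind = cmd[0]
    if kind == "assign":                      # ("assign", target, expr_vars)
        _, target, expr_vars = cmd
        return expr_level(expr_vars, env) <= env[target] and pc <= env[target]
    if kind == "seq":                         # ("seq", cmd1, cmd2)
        return check(cmd[1], env, pc) and check(cmd[2], env, pc)
    if kind == "if":                          # ("if", guard_vars, cmd_then, cmd_else)
        _, guard_vars, cmd_then, cmd_else = cmd
        branch_pc = max(pc, expr_level(guard_vars, env))
        return check(cmd_then, env, branch_pc) and check(cmd_else, env, branch_pc)
    raise ValueError(kind)

env = {"l": LOW, "h": HIGH}
print(check(("assign", "h", ["l"]), env))   # True:  h := l
print(check(("assign", "l", ["h"]), env))   # False: l := h
print(check(("if", ["h"],                   # False: implicit flow into l
             ("assign", "l", []),
             ("assign", "l", [])), env))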

Declassification

As shown previously, the non-interference policy is too strict for use in most real-world applications. [7] Therefore, several approaches that allow controlled release of information have been devised. Such approaches are called information declassification.

Robust declassification requires that an active attacker may not manipulate the system in order to learn more secrets than what passive attackers already know. [4]

Information declassification constructs can be classified in four orthogonal dimensions: what information is released, who is authorized to access the information, where the information is released, and when the information is released. [4]

What

A what declassification policy controls which information (in whole or in part) may be released to a publicly observable variable.

The following code example shows a declassify construct from [8]. In this code, the value of the variable h is explicitly allowed by the programmer to flow into the publicly observable variable l.

var l, h
if l = 1 then
    l := declassify(h)
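In a dynamic, label-based setting, a declassify construct can be sketched as an explicit, programmer-sanctioned relabeling from HIGH to LOW; the Python fragment below is a hypothetical illustration of that idea, not the construct of [8] itself.

LOW, HIGH = 0, 1

class Labeled:
    def __init__(self, value, label):
        self.value, self.label = value, label

def declassify(x: Labeled) -> Labeled:
    # The programmer explicitly sanctions releasing this value;
    # this is the only operation allowed to lower a label.
    return Labeled(x.value, LOW)

h = Labeled(1234, HIGH)
l = declassify(h)
print(l.value, l.label)   # 1234 0 -- the release point is explicit in the code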

Who

A who declassification policy controls which principals (i.e., who) can access a given piece of information. This kind of policy has been implemented in the Jif compiler. [9]

The following example allows Bob to share his secret, contained in the variable b, with Alice through the commonly accessible variable ab.

var ab                                (* {Alice, Bob} *)
var b                                 (* {Bob} *)
if ab = 1 then
    ab := declassify(b, {Alice, Bob}) (* {Alice, Bob} *)
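One way to read such who policies is to treat each label as the set of principals permitted to read the value; declassification then enlarges that set with the owner's consent. The following Python sketch is a hypothetical reading of the example above, not the Jif label model.

def can_flow(value_readers: set, target_readers: set) -> bool:
    # Storing a value in a location readable by target_readers is allowed
    # only if every such principal may already read the value.
    return target_readers <= value_readers

def declassify(value_readers: set, extra: set) -> set:
    # The owner explicitly extends who may read the value.
    return value_readers | extra

b_readers = {"Bob"}              # the secret b
ab_readers = {"Alice", "Bob"}    # the shared variable ab
print(can_flow(b_readers, ab_readers))                          # False: Alice may not read b yet
print(can_flow(declassify(b_readers, {"Alice"}), ab_readers))   # True after declassification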

Where

A where declassification policy regulates where the information can be released, for example, by controlling in which lines of the source code information can be released.

The following example makes use of the flow construct proposed in [10]. This construct takes a flow policy (in this case, variables in H are allowed to flow to variables in L) and a command, which is run under the given flow policy.

var l, h
flow H ≺ L in
    l := h

When

A when declassification policy regulates when the information can be released. Policies of this kind can be used to verify programs that implement, for example, the controlled release of secret information after payment, or of encrypted secrets that should not be released within a certain amount of time given polynomially bounded computational power.

Declassification approaches for implicit flows

An implicit flow occurs when code whose conditional execution is based on private information updates a public variable. This is especially problematic when multiple executions are considered since an attacker could leverage the public variable to infer private information by observing how its value changes over time or with the input.

The naïve approach

The naïve approach consists of enforcing the confidentiality property on all variables whose value is affected by other variables. This method leads to partially leaked information, because in some executions of the application a variable may be Low while in others it is High.

No sensitive upgrade

"No sensitive upgrade" halts the program whenever a High variable affects the value of a Low variable. Since it simply looks for expressions where an information leakage might happen, without looking at the context, it may halt a program that, despite having potential information leakage, never actually leaks information.

In the following example x is High and y is Low.

var x, y
y := false
if x = true then
    y := true
return true

In this case the program would be halted since—syntactically speaking—it uses the value of a High variable to change a Low variable, despite the program never leaking information.
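A minimal Python sketch of such a monitor (hypothetical, following the "no sensitive upgrade" idea rather than any particular published implementation) is shown below: inside a branch whose guard is High, any assignment to a Low variable halts execution, regardless of whether information actually leaks.

LOW, HIGH = 0, 1

class NSUMonitor:
    def __init__(self, labels):
        self.labels = dict(labels)  # variable name -> security label
        self.values = {}
        self.pc = LOW               # label of the current control-flow context

    def branch_on(self, guard_label):
        # Entering a branch raises the context to at least the guard's label.
        self.pc = max(self.pc, guard_label)

    def assign(self, var, value, value_label=LOW):
        if self.pc > self.labels[var]:
            raise RuntimeError("halted: sensitive upgrade of low variable " + var)
        self.values[var] = value
        self.labels[var] = max(self.labels[var], value_label, self.pc)

# The example above: x is High, y is Low, and x happens to be true at run time.
m = NSUMonitor({"x": HIGH, "y": LOW})
m.assign("y", False)             # fine: the context is still LOW
m.branch_on(m.labels["x"])       # if x = true then ...
try:
    m.assign("y", True)          # halted, even though nothing is actually leaked
except RuntimeError as e:
    print(e)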

Permissive upgrade

Permissive upgrade introduces an extra security class P, which identifies variables that may leak information. When a High variable affects the value of a Low variable, the latter is labeled P. If a P-labeled variable then affects a Low variable, the program is halted. To prevent the halt, the Low and P variables should be converted to High using a privatization function, which ensures no information leakage can occur; on subsequent runs the program then executes without interruption.
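The following Python sketch (hypothetical, simplified from the description above) extends the previous monitor with the extra label P: assignments to Low variables under a High context mark the target P instead of halting, and the halt is deferred until a P-labeled value would flow into a Low variable.

LOW, P, HIGH = 0, 1, 2

class PermissiveMonitor:
    def __init__(self, labels):
        self.labels = dict(labels)  # variable name -> LOW, P or HIGH
        self.pc = LOW

    def branch_on(self, guard_label):
        self.pc = max(self.pc, guard_label)

    def assign(self, var, rhs_label=LOW):
        if self.pc == HIGH and self.labels[var] == LOW:
            self.labels[var] = P    # mark as possibly leaking instead of halting
        elif rhs_label == P and self.labels[var] == LOW:
            raise RuntimeError("halted: P-labeled data flows into low variable " + var)
        else:
            self.labels[var] = max(self.labels[var], rhs_label, self.pc)

m = PermissiveMonitor({"x": HIGH, "y": LOW, "z": LOW})
m.branch_on(m.labels["x"])                   # if x = true then ...
m.assign("y")                                # y is relabeled P rather than halting
m.pc = LOW                                   # leave the branch (simplified)
try:
    m.assign("z", rhs_label=m.labels["y"])   # halted: P would reach a Low variable
except RuntimeError as e:
    print(e)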

Privatization inference

Privatization inference extends permissive upgrade to automatically apply the privatization function to any variable that might leak information. This method should be used during testing, where it will convert most variables. Once the program moves into production, permissive upgrade should be used to halt the program in case of an information leak, and the privatization functions can be updated to prevent subsequent leaks.

Application in computer systems

Beyond applications to programming languages, information flow control theories have been applied to operating systems, [11] distributed systems, [12] and cloud computing. [13] [14]

References

  1. Andrei Sabelfeld and Andrew C. Myers. Language-Based Information-Flow Security. IEEE Journal on Selected Areas in Communications, 21(1), Jan. 2003.
  2. Dorothy Denning. A lattice model of secure information flow. Communications of the ACM, 19(5):236-242, 1976.
  3. Smith, Geoffrey (2007). "Principles of Secure Information Flow Analysis". Advances in Information Security. Vol. 27. Springer US. pp. 291–307.
  4. Andrei Sabelfeld and David Sands. Dimensions and Principles of Declassification. In Proc. of the IEEE Computer Security Foundations Workshop, 2005.
  5. Thomas H. Austin and Cormac Flanagan. Efficient purely-dynamic information flow analysis. In Proc. of the ACM SIGPLAN Fourth Workshop on Programming Languages and Analysis for Security, ACM, 2009.
  6. J. S. Fenton. Memoryless Subsystems. Comput. J. 17(2): 143-147, 1974.
  7. S. Zdancewic. Challenges for information-flow security. In Workshop on Programming Language Interference and Dependence (PLID'04), 2004.
  8. A. Sabelfeld and A. C. Myers. A model for delimited information release. In Proc. of the International Symposium on Software Security (ISSS), 2003.
  9. Jif: Java information flow
  10. A. Almeida Matos and G. Boudol. On declassification and the non-disclosure policy. In Proc. of the IEEE Computer Security Foundations Workshop, 2005.
  11. M. Krohn, A. Yip, M. Brodsky, N. Cliffer, M. Kaashoek, E. Kohler and R. Morris. Information flow control for standard OS abstractions. In ACM SIGOPS Symposium on Operating Systems Principles, 2007.
  12. N. Zeldovich, S. Boyd-Wickizer and D. Mazieres. Securing Distributed Systems with Information Flow Control. In USENIX Symposium on Networked Systems Design and Implementation, 2008.
  13. J. Bacon, D. Eyers, T. Pasquier, J. Singh, I. Papagiannis and P. Pietzuch. Information Flow Control for Secure Cloud Computing. IEEE Transactions on Network and Service Management, 2014.
  14. Pasquier, Thomas; Singh, Jatinder; Eyers, David; Bacon, Jean (2015). "CamFlow: Managed Data-sharing for Cloud Services". IEEE Transactions on Cloud Computing. 5 (3): 472–484. arXiv:1506.04391. doi:10.1109/TCC.2015.2489211.