Data compaction

In telecommunication, data compaction is the reduction of the number of data elements, bandwidth, cost, and time for the generation, transmission, and storage of data without loss of information by eliminating unnecessary redundancy, removing irrelevancy, or using special coding.
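
As a minimal sketch of lossless "special coding" (an illustration of the idea, not taken from the standard itself), run-length encoding replaces a run of repeated data elements with a single value and a count, and the original data can be recovered exactly:

```python
from itertools import groupby

def rle_encode(data):
    """Replace runs of identical characters with (character, run length) pairs."""
    return [(char, sum(1 for _ in run)) for char, run in groupby(data)]

def rle_decode(pairs):
    """Reverse the encoding exactly -- no information is lost."""
    return "".join(char * count for char, count in pairs)

original = "AAAABBBCCDAA"
encoded = rle_encode(original)          # [('A', 4), ('B', 3), ('C', 2), ('D', 1), ('A', 2)]
assert rle_decode(encoded) == original  # round-trips without loss
```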

Examples of data compaction methods are the use of fixed-tolerance bands, variable-tolerance bands, slope-keypoints, sample changes, curve patterns, curve fitting, variable-precision coding, frequency analysis, and probability analysis.
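
As an illustrative reading of the fixed-tolerance-band method (assumed data and threshold, not a reference implementation), a sample is retained only when it leaves a preset band around the last retained value; samples inside the band are treated as irrelevant and dropped:

```python
def tolerance_band_compact(samples, tolerance):
    """Keep only samples that leave the +/- tolerance band around the last kept value."""
    if not samples:
        return []
    kept = [(0, samples[0])]            # always keep the first sample as (index, value)
    last_value = samples[0]
    for i, value in enumerate(samples[1:], start=1):
        if abs(value - last_value) > tolerance:
            kept.append((i, value))
            last_value = value
    return kept

readings = [20.0, 20.1, 20.05, 20.4, 21.0, 21.02, 20.98, 22.5]
print(tolerance_band_compact(readings, tolerance=0.5))
# [(0, 20.0), (4, 21.0), (7, 22.5)] -- 8 readings compacted to 3
```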

Simply squeezing noncompacted data into a smaller space, for example by increasing packing density when transferring images from newsprint to microfilm or when transferring data from punched cards onto magnetic tape, is not data compaction.

Everyday examples

The use of acronyms in texting is an everyday example: the number of bits required to transmit and store "WYSIWYG" is reduced from that of its expanded equivalent, "What You See Is What You Get" (7 characters versus 28). The representation of Mersenne primes is another example: the largest known as of February 2013 is more than 17 million digits long, yet it is written in the far more compact form M57885161.
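
The scale of the Mersenne-prime example can be checked directly: M57885161 denotes 2^57885161 - 1, and its number of decimal digits follows from the base-10 logarithm. A back-of-the-envelope check (illustrative, standard library only):

```python
import math

exponent = 57885161
# Digit count of 2**exponent - 1: subtracting 1 never changes the count here,
# because 2**exponent is not a power of 10.
digits = math.floor(exponent * math.log10(2)) + 1
print(digits)                # 17425170 -- about 17.4 million decimal digits
print(len("M57885161"))      # 9 characters in the compacted form
```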

Related Research Articles

In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication channel or storage in a storage medium. An early example is the invention of language, which enabled a person, through speech, to communicate what they thought, saw, heard, or felt to others. But speech limits the range of communication to the distance a voice can carry and limits the audience to those present when the speech is uttered. The invention of writing, which converted spoken language into visual symbols, extended the range of communication across space and time.

Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable. Multivariate statistics concerns understanding the different aims and background of each of the different forms of multivariate analysis, and how they relate to each other. The practical application of multivariate statistics to a particular problem may involve several types of univariate and multivariate analyses in order to understand the relationships between variables and their relevance to the problem being studied.

A parameter, generally, is any characteristic that can help in defining or classifying a particular system. That is, a parameter is an element of a system that is useful, or critical, when identifying the system, or when evaluating its performance, status, condition, etc.

In computer science, best, worst, and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively. Usually the resource being considered is running time, i.e. time complexity, but could also be memory or some other resource. Best case is the function which performs the minimum number of steps on input data of n elements. Worst case is the function which performs the maximum number of steps on input data of size n. Average case is the function which performs an average number of steps on input data of n elements.
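
Linear search is a standard concrete instance (sketched below with the comparison count as the measured resource): 1 comparison in the best case, n in the worst case, and about (n + 1)/2 on average when the target is equally likely to be anywhere in the input:

```python
def linear_search(items, target):
    """Return (index, comparisons), counting comparisons as the measured resource."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == target:
            return i, comparisons
    return -1, comparisons

data = list(range(1, 101))             # n = 100 elements
print(linear_search(data, 1))          # best case:  (0, 1)
print(linear_search(data, 100))        # worst case: (99, 100)
avg = sum(linear_search(data, t)[1] for t in data) / len(data)
print(avg)                             # average case: 50.5 comparisons = (n + 1) / 2
```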

In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power consumption.
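
A hand-written illustration of one such transformation, loop-invariant code motion (shown here as source-to-source Python rather than the output of any particular compiler), trades nothing in behaviour for less work per iteration:

```python
# Before: the product scale_a * scale_b is recomputed on every iteration.
def scale_all_naive(values, scale_a, scale_b):
    return [v * scale_a * scale_b for v in values]

# After: the invariant product is hoisted out of the loop, so it is computed once
# instead of once per element (a real compiler must also respect numeric semantics).
def scale_all_optimized(values, scale_a, scale_b):
    factor = scale_a * scale_b          # loop-invariant, computed once
    return [v * factor for v in values]

data = [1.0, 2.0, 3.0]
assert scale_all_naive(data, 2.0, 5.0) == scale_all_optimized(data, 2.0, 5.0)
```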

<span class="mw-page-title-main">Light curve</span> Graph of light intensity of a celestial object or region, as a function of time

In astronomy, a light curve is a graph of light intensity of a celestial object or region as a function of time, typically with the magnitude of light received on the y axis and with time on the x axis. The light is usually in a particular frequency interval or band. Light curves can be periodic, as in the case of eclipsing binaries, Cepheid variables, other periodic variables, and transiting extrasolar planets, or aperiodic, like the light curve of a nova, a cataclysmic variable star, a supernova or a microlensing event or binary as observed during occultation events. The study of the light curve, together with other observations, can yield considerable information about the physical process that produces it or constrain the physical theories about it.

<span class="mw-page-title-main">Complex geometry</span> Study of complex manifolds and several complex variables

In mathematics, complex geometry is the study of geometric structures and constructions arising out of, or described by, the complex numbers. In particular, complex geometry is concerned with the study of spaces such as complex manifolds and complex algebraic varieties, functions of several complex variables, and holomorphic constructions such as holomorphic vector bundles and coherent sheaves. Application of transcendental methods to algebraic geometry falls in this category, together with more geometric aspects of complex analysis.

In telecommunications and computing, bit rate is the number of bits that are conveyed or processed per unit of time.
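
A worked example of the definition, assuming 8-bit characters and an illustrative link speed of 9600 bit/s, ties it back to the compaction example earlier in the article:

```python
phrase = "What You See Is What You Get"
bits = len(phrase) * 8                 # 28 characters * 8 bits = 224 bits
bit_rate = 9600                        # bits conveyed per second (assumed link speed)
print(bits / bit_rate)                 # about 0.0233 s to convey the full phrase
print(len("WYSIWYG") * 8 / bit_rate)   # about 0.0058 s for the compacted acronym
```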

In signal processing and electronics, the frequency response of a system is the quantitative measure of the magnitude and phase of the output as a function of input frequency. The frequency response is widely used in the design and analysis of systems, such as audio and control systems, where it simplifies mathematical analysis by converting governing differential equations into algebraic equations. In an audio system, it may be used to minimize audible distortion by designing components so that the overall response is as flat (uniform) as possible across the system's bandwidth. In control systems, such as a vehicle's cruise control, it may be used to assess system stability, often through the use of Bode plots. Systems with a specific frequency response can be designed using analog and digital filters.
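
A small numeric sketch for an assumed first-order RC low-pass filter (component values chosen arbitrarily): its frequency response is H(f) = 1 / (1 + j 2πfRC), from which magnitude and phase at any input frequency follow directly:

```python
import cmath
import math

R, C = 1_000.0, 1e-6                   # assumed component values: 1 kOhm, 1 uF
cutoff = 1 / (2 * math.pi * R * C)     # about 159 Hz

def frequency_response(f):
    """Complex gain of a first-order RC low-pass filter at frequency f (Hz)."""
    h = 1 / (1 + 1j * 2 * math.pi * f * R * C)
    return abs(h), math.degrees(cmath.phase(h))

for f in (10, cutoff, 10_000):
    magnitude, phase = frequency_response(f)
    print(f"{f:10.1f} Hz  |H| = {magnitude:.3f}   phase = {phase:6.1f} deg")
# At the cutoff frequency |H| is 1/sqrt(2) (about 0.707) and the phase is -45 degrees.
```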

<span class="mw-page-title-main">Break-even (economics)</span> Equality of costs and revenues

The break-even point (BEP) in economics, business—and specifically cost accounting—is the point at which total cost and total revenue are equal, i.e. "even". There is no net loss or gain, and one has "broken even", though opportunity costs have been paid and capital has received the risk-adjusted, expected return. In short, all costs that must be paid are paid, and there is neither profit nor loss. The break-even analysis was developed by Karl Bücher and Johann Friedrich Schär.
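
Numerically (with illustrative figures only): given fixed costs F, unit price p, and unit variable cost v, the break-even quantity is Q = F / (p - v), at which revenue and total cost coincide:

```python
fixed_costs = 12_000.0      # per period, e.g. rent and salaries (assumed)
price_per_unit = 25.0
variable_cost_per_unit = 10.0

break_even_quantity = fixed_costs / (price_per_unit - variable_cost_per_unit)
print(break_even_quantity)                          # 800 units
revenue = break_even_quantity * price_per_unit      # 20,000
total_cost = fixed_costs + break_even_quantity * variable_cost_per_unit
print(revenue == total_cost)                        # True: no profit, no loss
```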

<span class="mw-page-title-main">Time series</span> Sequence of data points over time

In mathematics, a time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

<span class="mw-page-title-main">Content analysis</span> Research method for studying documents and communication artifacts

Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, audio or video. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner. One of the key advantages of using content analysis to analyse social phenomena is its non-invasive nature, in contrast to simulating social experiences or collecting survey answers.

In statistics, a categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property. In computer science and some branches of mathematics, categorical variables are referred to as enumerations or enumerated types. Commonly, each of the possible values of a categorical variable is referred to as a level. The probability distribution associated with a random categorical variable is called a categorical distribution.
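
The computer-science reading of a categorical variable as an enumerated type can be sketched as follows (hypothetical levels, illustrative only):

```python
from enum import Enum
from collections import Counter

class BloodType(Enum):          # a categorical variable with four fixed levels
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"

observations = [BloodType.O, BloodType.A, BloodType.O, BloodType.B, BloodType.O]
print(Counter(obs.name for obs in observations))
# Counter({'O': 3, 'A': 1, 'B': 1}) -- empirical counts per level,
# an estimate of the associated categorical distribution
```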

<span class="mw-page-title-main">Data dredging</span> Misuse of data analysis

Data dredging is the misuse of data analysis to find patterns in data that can be presented as statistically significant, thus dramatically increasing and understating the risk of false positives. This is done by performing many statistical tests on the data and only reporting those that come back with significant results.
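
A small simulation makes the inflation concrete (pure-noise data and a crude z-style test, illustrative only): running 100 unrelated tests at the 5% level and keeping only the "significant" ones yields a handful of false positives almost every time:

```python
import random
import statistics

random.seed(1)

def two_group_noise_test(n=30):
    """Compare two groups drawn from the SAME distribution; any 'effect' is spurious."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # Crude z-like statistic; |z| > 1.96 is treated as 'significant at 5%'.
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / se > 1.96

hits = sum(two_group_noise_test() for _ in range(100))
print(hits)   # typically around 5 'significant' results out of 100 tests on pure noise
```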

<span class="mw-page-title-main">Data analysis</span> Machine Learning Data analysis process inspection. cleansing, generic data-sets and modeling

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

<span class="mw-page-title-main">Hilbert curve</span> Space-filling curve

The Hilbert curve is a continuous fractal space-filling curve first described by the German mathematician David Hilbert in 1891, as a variant of the space-filling Peano curves discovered by Giuseppe Peano in 1890.
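
One standard way to work with the curve in code is the distance-to-coordinates conversion on a 2^k by 2^k grid; the sketch below follows the commonly published iterative d2xy formulation (a general illustration, not specific to any source cited here):

```python
def d2xy(n, d):
    """Map distance d along the Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                     # rotate/reflect the quadrant if needed
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

print([d2xy(4, d) for d in range(16)])
# First cells: (0, 0), (1, 0), (1, 1), (0, 1), (0, 2), ... each adjacent to the previous one.
```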

Optical lens design is the process of designing a lens to meet a set of performance requirements and constraints, including cost and manufacturing limitations. Parameters include surface profile types, as well as radius of curvature, distance to the next surface, material type and optionally tilt and decenter. The process is computationally intensive, using ray tracing or other techniques to model how the lens affects light that passes through it.

<span class="mw-page-title-main">Plot (graphics)</span>

A plot is a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables. The plot can be drawn by hand or by a computer. In the past, sometimes mechanical or electronic plotters were used. Graphs are a visual representation of the relationship between variables, which are very useful for humans who can then quickly derive an understanding which may not have come from lists of values. Given a scale or ruler, graphs can also be used to read off the value of an unknown variable plotted as a function of a known one, but this can also be done with data presented in tabular form. Graphs of functions are used in mathematics, sciences, engineering, technology, finance, and other areas.

Tolerance analysis is the general term for activities related to the study of accumulated variation in mechanical parts and assemblies. Its methods may be used on other types of systems subject to accumulated variation, such as mechanical and electrical systems. Engineers analyze tolerances for the purpose of evaluating geometric dimensioning and tolerancing (GD&T). Methods include 2D tolerance stacks, 3D Monte Carlo simulations, and datum conversions.
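
A minimal Monte Carlo stack-up sketch (assumed part dimensions and tolerances, purely illustrative): three parts stacked end to end, with each +/- tolerance treated as three standard deviations of a normal distribution:

```python
import random
import statistics

random.seed(0)

# (nominal length in mm, +/- tolerance in mm); each tolerance treated as 3 sigma
parts = [(10.0, 0.10), (25.0, 0.20), (15.0, 0.15)]

def one_assembly():
    """Draw each part length from a normal distribution and sum the stack."""
    return sum(random.gauss(nominal, tol / 3) for nominal, tol in parts)

stacks = [one_assembly() for _ in range(100_000)]
mean = statistics.fmean(stacks)
sigma = statistics.stdev(stacks)
print(f"mean = {mean:.3f} mm, 3-sigma spread = +/- {3 * sigma:.3f} mm")
# Statistical spread: about +/- 0.27 mm, versus +/- 0.45 mm for the worst-case sum.
```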

<span class="mw-page-title-main">Cascade chart (NDI interval reliability)</span>

A cascade chart is a tool that can be used in damage tolerance analysis to determine the proper inspection interval, based on reliability analysis and considering all the relevant uncertainties. The chart is called a "cascade chart" because the scatter of data points and downward curvature resembles a waterfall or cascade. The name was first introduced by Dr. Alberto W. Mello in his work "Reliability prediction for structures under cyclic loads and recurring inspections". Materials subject to cyclic loads may form and propagate cracks over time due to fatigue, so it is essential to determine a reliable inspection interval, and numerous factors must be considered in setting it. The non-destructive inspection (NDI) technique must have a high probability of detecting a crack in the material; if a crack is missed, it may lead the structure to catastrophic failure before the next inspection. On the other hand, inspections cannot be so frequent that the structure's maintenance is no longer economical.
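
As a rough illustration of why the probability of detection drives the interval (a simplified independence model with assumed numbers, not the reliability method described above): if each inspection detects a growing crack with probability POD, the chance that the crack survives k successive inspection opportunities undetected is (1 - POD)^k:

```python
pod = 0.90                      # assumed probability of detection per inspection
for inspections in range(1, 6):
    p_missed = (1 - pod) ** inspections
    print(f"{inspections} inspection(s): probability crack is still undetected = {p_missed:.4f}")
# With POD = 0.90, two inspection opportunities before the critical crack size
# already bring the miss probability down to about 1%.
```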

References

This article incorporates public domain material from Federal Standard 1037C. General Services Administration. Archived from the original on 2022-01-22.