Data compaction

In telecommunications, data compaction is the reduction of the number of data elements, bandwidth, cost, and time for the generation, transmission, and storage of data without loss of information by eliminating unnecessary redundancy, removing irrelevancy, or using special coding.

Examples of data compaction methods are the use of fixed-tolerance bands, variable-tolerance bands, slope-keypoints, sample changes, curve patterns, curve fitting, variable-precision coding, frequency analysis, and probability analysis.
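
As a concrete illustration of one of these methods, the sketch below applies a fixed-tolerance band to a sampled signal: a new value is transmitted only when it leaves a tolerance band around the last transmitted value, and the skipped samples are later reconstructed as repeats of that value. The tolerance, data, and function names are invented for the example and are not taken from any standard.

def compact_fixed_tolerance(samples, tolerance):
    """Keep only samples that drift outside a fixed tolerance band
    around the last transmitted value (illustrative sketch)."""
    kept = []          # list of (index, value) pairs actually "transmitted"
    last = None
    for i, value in enumerate(samples):
        if last is None or abs(value - last) > tolerance:
            kept.append((i, value))
            last = value
    return kept

def expand(kept, length):
    """Reconstruct an approximate signal from the kept samples."""
    points = dict(kept)
    out, value = [], None
    for i in range(length):
        value = points.get(i, value)
        out.append(value)
    return out

signal = [10.0, 10.1, 10.05, 12.3, 12.4, 12.35, 9.8]
kept = compact_fixed_tolerance(signal, tolerance=0.5)
# kept == [(0, 10.0), (3, 12.3), (6, 9.8)]: 3 samples stand in for 7, and every
# reconstructed value from expand(kept, 7) is within 0.5 of the original.

Variation inside the band is treated as irrelevant detail, which is how tolerance-band methods trade a controlled approximation error for fewer data elements.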

Simply squeezing noncompacted data into a smaller space is not data compaction: increasing packing density, for example by transferring images from newsprint to microfilm or by transferring data from punched cards onto magnetic tape, reduces the physical volume the data occupies but not the number of data elements to be generated, transmitted, or stored.

Everyday examples

The use of acronyms in texting is an everyday example: transmitting and storing "WYSIWYG" requires 7 characters rather than the 28 of its expansion, "What You See Is What You Get". The representation of Mersenne primes is another example: the largest known as of February 2013 is over 17 million digits long, yet it is written in much more compacted form as M57885161.
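
The scale of that saving can be checked with a little arithmetic: 2^p − 1 has floor(p · log10 2) + 1 decimal digits, so the nine-character label M57885161 stands in for a 17,425,170-digit number. A quick check in Python (purely illustrative):

import math

p = 57885161                      # exponent of the 2013 record Mersenne prime
digits = math.floor(p * math.log10(2)) + 1
print(digits)                     # 17425170 decimal digits, versus the 9-character name "M57885161"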

Related Research Articles

In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication channel or storage in a storage medium. An early example is the invention of language, which enabled a person, through speech, to communicate what they thought, saw, heard, or felt to others. But speech limits the range of communication to the distance a voice can carry and limits the audience to those present when the speech is uttered. The invention of writing, which converted spoken language into visual symbols, extended the range of communication across space and time.

Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable, i.e., multivariate random variables. Multivariate statistics concerns understanding the different aims and background of each of the different forms of multivariate analysis, and how they relate to each other. The practical application of multivariate statistics to a particular problem may involve several types of univariate and multivariate analyses in order to understand the relationships between variables and their relevance to the problem being studied.

A parameter, generally, is any characteristic that can help in defining or classifying a particular system. That is, a parameter is an element of a system that is useful, or critical, when identifying the system, or when evaluating its performance, status, condition, etc.

In computer science, best, worst, and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively. Usually the resource being considered is running time, i.e. time complexity, but could also be memory or some other resource. Best case is the function which performs the minimum number of steps on input data of n elements. Worst case is the function which performs the maximum number of steps on input data of size n. Average case is the function which performs an average number of steps on input data of n elements.
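
As a concrete illustration (not part of the article above), linear search over n elements finds a key in 1 comparison in the best case, n in the worst case, and, for a key equally likely to be in any position, (n + 1)/2 on average:

def linear_search(items, key):
    """Return the index of key in items, counting comparisons as it goes."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == key:
            return i, comparisons
    return -1, comparisons

data = list(range(100))
print(linear_search(data, 0))    # best case: found at index 0 after 1 comparison
print(linear_search(data, 99))   # worst case: found at index 99 after 100 comparisons
# averaged over all present keys, the expected count is (n + 1) / 2 = 50.5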

An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory use, storage size, and power consumption.

Light curve: graph of light intensity of a celestial object or region, as a function of time

In astronomy, a light curve is a graph of the light intensity of a celestial object or region as a function of time, typically with the magnitude of light received on the y-axis and with time on the x-axis. The light is usually in a particular frequency interval or band.

Principal component analysis: method of data analysis

Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing.
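
A minimal sketch of the idea, assuming NumPy is available (the function and variable names are illustrative): center the data, take its singular value decomposition, and project onto the leading principal directions.

import numpy as np

def pca(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)          # remove the per-feature mean
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]           # principal directions
    scores = X_centered @ components.T       # coordinates in the reduced space
    return scores, components

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
scores, components = pca(X, n_components=2)  # 5-dimensional data reduced to 2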

In mathematics, complex geometry is the study of geometric structures and constructions arising out of, or described by, the complex numbers. In particular, complex geometry is concerned with the study of spaces such as complex manifolds and complex algebraic varieties, functions of several complex variables, and holomorphic constructions such as holomorphic vector bundles and coherent sheaves. Application of transcendental methods to algebraic geometry falls in this category, together with more geometric aspects of complex analysis.

In telecommunications and computing, bit rate is the number of bits that are conveyed or processed per unit of time.

In signal processing and electronics, the frequency response of a system is the quantitative measure of the magnitude and phase of the output as a function of input frequency. The frequency response is widely used in the design and analysis of systems, such as audio and control systems, where it simplifies mathematical analysis by converting governing differential equations into algebraic equations. In an audio system, it may be used to minimize audible distortion by designing components so that the overall response is as flat (uniform) as possible across the system's bandwidth. In control systems, such as a vehicle's cruise control, it may be used to assess system stability, often through the use of Bode plots. Systems with a specific frequency response can be designed using analog and digital filters.
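
As a small worked example (component values invented for illustration), a first-order RC low-pass filter has frequency response H(jω) = 1 / (1 + jωRC); its magnitude and phase at any input frequency follow directly:

import cmath, math

R, C = 1_000.0, 1e-6                # example values: 1 kOhm, 1 uF

def H(f):
    """Complex frequency response of a first-order RC low-pass filter."""
    omega = 2 * math.pi * f
    return 1 / (1 + 1j * omega * R * C)

f_c = 1 / (2 * math.pi * R * C)     # cutoff frequency, about 159 Hz
h = H(f_c)
print(abs(h), math.degrees(cmath.phase(h)))   # ~0.707 magnitude, ~-45 degrees phase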

Break-even point: equality of costs and revenues

The break-even point (BEP) in economics, business—and specifically cost accounting—is the point at which total cost and total revenue are equal, i.e. "even". In layman's terms, after all costs are paid for there is neither profit nor loss. In economics specifically, the term has a broader definition; even if there is no net loss or gain, and one has "broken even", opportunity costs have been covered and capital has received the risk-adjusted, expected return. The break-even analysis was developed by Karl Bücher and Johann Friedrich Schär.
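
In its simplest unit-based form, the break-even quantity is fixed costs divided by the contribution margin per unit, i.e. price minus variable cost per unit. The numbers below are invented for illustration:

fixed_costs = 50_000.0        # per period
price_per_unit = 25.0
variable_cost_per_unit = 15.0

contribution_margin = price_per_unit - variable_cost_per_unit
break_even_units = fixed_costs / contribution_margin
print(break_even_units)       # 5000 units: total revenue equals total cost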

Time series: sequence of data points over time

In mathematics, a time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

Content analysis: research method for studying documents and communication artifacts

Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, audio or video. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner. One of the key advantages of using content analysis to analyse social phenomena is its non-invasive nature, in contrast to simulating social experiences or collecting survey answers.

In statistics, a categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property. In computer science and some branches of mathematics, categorical variables are referred to as enumerations or enumerated types. Commonly, each of the possible values of a categorical variable is referred to as a level. The probability distribution associated with a random categorical variable is called a categorical distribution.
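
For instance, a categorical variable such as blood type maps naturally onto an enumerated type; the example below is hypothetical and not taken from the article:

from enum import Enum

class BloodType(Enum):        # the four levels of a categorical variable
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"

sample = BloodType.AB         # each observation takes exactly one level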

Within computer science, a use-definition chain is a data structure that consists of a use U of a variable and all the definitions D of that variable that can reach that use without any other intervening definitions. A definition here generally means an assignment of some value to the variable.
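
A small hypothetical example: for the use of x in the return statement below, the use-definition chain contains both assignments, since either one can reach that use depending on the branch taken.

def example(cond):
    x = 1            # d1: definition of x
    if cond:
        x = 2        # d2: definition of x on one branch
    return x         # u1: use of x; its use-definition chain is {d1, d2},
                     # because either definition can reach this use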

Data dredging: misuse of data analysis

Data dredging is the misuse of data analysis to find patterns in data that can be presented as statistically significant, thus dramatically increasing the risk of false positives while understating it. This is done by performing many statistical tests on the data and only reporting those that come back with significant results.
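
The inflation is easy to quantify in the idealised case of independent tests on data with no real effects (a textbook calculation, not from the article): at significance level α, the chance that at least one of m tests comes back "significant" is 1 − (1 − α)^m.

alpha = 0.05
for m in (1, 10, 20, 100):
    p_any_false_positive = 1 - (1 - alpha) ** m
    print(m, round(p_any_false_positive, 3))
# 1: 0.05, 10: 0.401, 20: 0.642, 100: 0.994; dredging 100 hypotheses almost
# guarantees a spuriously "significant" result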

Data analysis: the process of analyzing data to discover useful information and support decision-making

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

Optical lens design is the process of designing a lens to meet a set of performance requirements and constraints, including cost and manufacturing limitations. Parameters include surface profile types, as well as radius of curvature, distance to the next surface, material type and optionally tilt and decenter. The process is computationally intensive, using ray tracing or other techniques to model how the lens affects light that passes through it.

Tolerance analysis is the general term for activities related to the study of accumulated variation in mechanical parts and assemblies. Its methods may also be applied to other systems subject to accumulated variation, such as electrical systems. Engineers analyze tolerances for the purpose of evaluating geometric dimensioning and tolerancing (GD&T). Methods include 2D tolerance stacks, 3D Monte Carlo simulations, and datum conversions.
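
A minimal Monte Carlo stack-up sketch, with dimensions and tolerances invented for illustration: three parts with normally distributed lengths are stacked, and simulation estimates how often the assembly exceeds its allowed overall length.

import random

def monte_carlo_stack(n_trials=100_000):
    """Estimate the fraction of assemblies whose stacked length is out of spec."""
    limit = 30.5                                  # maximum allowed assembly length
    out_of_spec = 0
    for _ in range(n_trials):
        # each part: nominal 10.0, with variation modelled as a normal, sigma = 0.1
        total = sum(random.gauss(10.0, 0.1) for _ in range(3))
        if total > limit:
            out_of_spec += 1
    return out_of_spec / n_trials

print(monte_carlo_stack())   # roughly 0.002: the total is N(30, 0.173^2), and 30.5 is about 2.9 sigma out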

Cascade chart (NDI interval reliability): tool to determine inspection intervals

A cascade chart is a tool that can be used in damage tolerance analysis to determine the proper inspection interval, based on reliability analysis and considering all the uncertainties involved. The chart is called a "cascade chart" because the scatter of data points and downward curvature resembles a waterfall or cascade. The name was first introduced by Dr. Alberto W Mello in his work "Reliability prediction for structures under cyclic loads and recurring inspections". Materials subject to cyclic loads may form and propagate cracks over time due to fatigue, so it is essential to determine a reliable inspection interval, and numerous factors must be considered in doing so. The non-destructive inspection (NDI) technique must have a high probability of detecting a crack in the material; if a crack is missed, it may lead the structure to a catastrophic failure before the next inspection. On the other hand, inspections cannot be so frequent that maintaining the structure is no longer profitable.
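
As a rough illustration of why detection probability drives the interval choice (a simplified calculation, not the reliability model described by Mello): if each inspection independently detects a growing crack with probability POD, the chance that the crack survives k consecutive inspections undetected is (1 − POD)^k.

pod = 0.9                      # assumed probability of detection per inspection
for k in range(1, 6):
    p_missed = (1 - pod) ** k  # crack survives k consecutive inspections undetected
    print(k, p_missed)
# after 3 inspections the miss probability is about 0.1 ** 3 = 0.001, which is why
# several inspection opportunities are scheduled within the expected crack-growth life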

References

This article incorporates public domain material from Federal Standard 1037C. General Services Administration. Archived from the original on 2022-01-22.