Combinatorial data analysis

Last updated

In statistics, combinatorial data analysis (CDA) is the study of data sets where the order in which objects are arranged is important. CDA can be used either to determine how well a given combinatorial construct reflects the observed data, or to search for a suitable combinatorial construct that does fit the data. [1] [2] [3]

See also

Related Research Articles

Bayesian probability is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief.

Combinatorics is an area of mathematics primarily concerned with counting, both as a means and an end in obtaining results, and certain properties of finite structures. It is closely related to many other areas of mathematics and has many applications ranging from logic to statistical physics and from evolutionary biology to computer science.

<span class="mw-page-title-main">Discrete mathematics</span> Study of discrete mathematical structures

Discrete mathematics is the study of mathematical structures that can be considered "discrete" rather than "continuous". Objects studied in discrete mathematics include integers, graphs, and statements in logic. By contrast, discrete mathematics excludes topics in "continuous mathematics" such as real numbers, calculus or Euclidean geometry. Discrete objects can often be enumerated by integers; more formally, discrete mathematics has been characterized as the branch of mathematics dealing with countable sets. However, there is no exact definition of the term "discrete mathematics".

<span class="mw-page-title-main">Psychological statistics</span>

Psychological statistics is application of formulas, theorems, numbers and laws to psychology. Statistical methods for psychology include development and application statistical theory and methods for modeling psychological data. These methods include psychometrics, factor analysis, experimental designs, and Bayesian statistics. The article also discusses journals in the same field.

<span class="mw-page-title-main">Greedy algorithm</span> Sequence of locally optimal choices

A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time.

<span class="mw-page-title-main">Time series</span> Sequence of data points over time

In mathematics, a time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

In statistics, exploratory data analysis (EDA) is an approach of analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling and thereby contrasts traditional hypothesis testing. Exploratory data analysis has been promoted by John Tukey since 1970 to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA.

Critical discourse analysis (CDA) is an interdisciplinary approach to the study of discourse that views language as a form of social practice. CDA combines critique of discourse and explanation of how it figures within and contributes to the existing social reality, as a basis for action to change that existing reality in particular respects. Scholars working in the tradition of CDA generally argue that (non-linguistic) social practice and linguistic practice constitute one another and focus on investigating how societal power relations are established and reinforced through language use. In this sense, it differs from discourse analysis in that it highlights issues of power asymmetries, manipulation, exploitation, and structural inequities in domains such as education, media, and politics.

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.

<span class="mw-page-title-main">Data analysis</span> The process of analyzing data to discover useful information and support decision-making

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

Louis (Eliyahu) Guttman was an American sociologist and Professor of Social and Psychological Assessment at the Hebrew University of Jerusalem, known primarily for his work in social statistics.

<span class="mw-page-title-main">Anomaly detection</span> Approach in data analysis

In data analysis, anomaly detection is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behaviour. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data.

John Michael Steele is C.F. Koo Professor of Statistics at the Wharton School of the University of Pennsylvania, and he was previously affiliated with Stanford University, Columbia University and Princeton University.

<span class="mw-page-title-main">Kurt Mehlhorn</span> German computer scientist (born 1949)

Kurt Mehlhorn is a German theoretical computer scientist. He has been a vice president of the Max Planck Society and is director of the Max Planck Institute for Computer Science.

Geometric data analysis comprises geometric aspects of image analysis, pattern analysis, and shape analysis, and the approach of multivariate statistics, which treat arbitrary data sets as clouds of points in a space that is n-dimensional. This includes topological data analysis, cluster analysis, inductive data analysis, correspondence analysis, multiple correspondence analysis, principal components analysis and fr:Iconographie des corrélations.

Structured data analysis is the statistical data analysis of structured data. This can arise either in the form of an a priori structure such as multiple-choice questionnaires or in situations with the need to search for structure that fits the given data, either exactly or approximately. This structure can then be used for making comparisons, predictions, manipulations etc.

<span class="mw-page-title-main">Peter Rousseeuw</span> Belgian statistician (born 1956)

Peter J. Rousseeuw is a statistician known for his work on robust statistics and cluster analysis. He obtained his PhD in 1981 at the Vrije Universiteit Brussel, following research carried out at the ETH in Zurich, which led to a book on influence functions. Later he was professor at the Delft University of Technology, The Netherlands, at the University of Fribourg, Switzerland, and at the University of Antwerp, Belgium. Next he was a senior researcher at Renaissance Technologies. He then returned to Belgium as professor at KU Leuven, until becoming emeritus in 2022. His former PhD students include Annick Leroy, Hendrik Lopuhaä, Geert Molenberghs, Christophe Croux, Mia Hubert, Stefan Van Aelst, Tim Verdonck and Jakob Raymaekers.

Mathematics is a broad subject that is commonly divided in many areas that may be defined by their objects of study, by the used methods, or by both. For example, analytic number theory is a subarea of number theory devoted to the use of methods of analysis for the study of natural numbers.

<span class="mw-page-title-main">Jacqueline Meulman</span> Dutch statistician

Jacqueline Meulman is a Dutch statistician and professor emerita of Applied Statistics at the Mathematical Institute of Leiden University.

Ordinal data is a categorical, statistical data type where the variables have natural, ordered categories and the distances between the categories are not known. These data exist on an ordinal scale, one of four levels of measurement described by S. S. Stevens in 1946. The ordinal scale is distinguished from the nominal scale by having a ranking. It also differs from the interval scale and ratio scale by not having category widths that represent equal increments of the underlying attribute.

References

  1. Lawrence J. Hubert (1987). Assignment Methods in Combinatorial Data Analysis. Marcel Dekker. ISBN   978-0824776176.
  2. Lawrence J. Hubert; Phipps Arabie; Jacqueline Meulman (2001). Combinatorial Data Analysis: Optimization by Dynamic Programming. SIAM. ISBN   978-0898714784.
  3. Michael J Brusco; Stephanie Stahl (2005). Branch-and-bound Applications in Combinatorial Data Analysis. Springer. ISBN   978-0387250373.