Arthur Zimek

Nationality	German
Alma mater	Ludwig-Maximilians-Universität München
	Scientific career
Fields	outlier detection, correlation clustering
Institutions	University of Southern Denmark, University of Alberta, Ludwig-Maximilians-Universität München
Doctoral advisor	Hans-Peter Kriegel

Last updated May 22, 2024

Arthur Zimek is a professor in data mining, data science and machine learning at the University of Southern Denmark in Odense, Denmark.

He graduated from the Ludwig Maximilian University of Munich, Germany, where he worked with Prof. Hans-Peter Kriegel.^[1] His dissertation on "Correlation Clustering" received the "SIGKDD Doctoral Dissertation Award 2009 Runner-up"^[2] from the Association for Computing Machinery.

He is well known^[3] for his work on outlier detection,^[4]^[5] density-based clustering,^[6] correlation clustering,^[7]^[8] and the curse of dimensionality.^[9]^[10]

He is one of the founders and core developers of the open-source ELKI data mining framework.^[11]^[12]

Related Research Articles

Outlier: Observation far apart from others in statistics and data science

In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement, an indication of novel data, or the result of experimental error; outliers of the last kind are sometimes excluded from the data set. An outlier can indicate an exciting possibility, but it can also cause serious problems in statistical analyses.

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.

Cluster analysis: Grouping a set of objects by similarity

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning.

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming. The curse generally refers to issues that arise when the number of datapoints is small relative to the intrinsic dimension of the data.
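One commonly cited phenomenon of this kind is distance concentration: as the dimension grows, the nearest and the farthest neighbor of a query point end up at almost the same distance. The following is a minimal sketch of that effect, assuming points drawn uniformly from the unit hypercube and Euclidean distance; the function name `relative_contrast` and its parameters are illustrative choices, not taken from any particular library.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the distances from one random
    query point to n_points uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


# The contrast between farthest and nearest neighbor shrinks with dimension.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

A small relative contrast means that "nearest" carries little information, which is one reason distance-based methods can degrade on high-dimensional data.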

R-tree: Data structures used in spatial indexing

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts. A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as "Find all museums within 2 km of my current location", "retrieve all road segments within 2 km of my location" or "find the nearest gas station". The R-tree can also accelerate nearest neighbor search for various distance metrics, including great-circle distance.

Parallel coordinates: Chart displaying multivariate data

Parallel coordinates plots are a common method of visualizing high-dimensional datasets in order to analyze multivariate data, that is, data with multiple variables or attributes.

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in a data set. The output depends on whether k-NN is used for classification or regression: in classification, the output is a class label decided by a plurality vote of the k neighbors; in regression, it is the average of the neighbors' values.
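As an illustration, here is a minimal brute-force sketch of k-NN for both uses, assuming Euclidean distance, a plurality vote for classification, and an unweighted mean for regression. The helper names (`k_nearest`, `knn_classify`, `knn_regress`) and the toy data are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k (features, target) pairs closest to query (Euclidean)."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]


def knn_classify(train, query, k=3):
    """Classification: plurality vote among the k nearest labels."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: unweighted mean of the k nearest target values."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


# Toy data, invented for this example.
labelled = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
            ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
valued = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_classify(labelled, (1.1, 1.0)))  # label of the nearby group
print(knn_regress(valued, (2.1,)))         # mean of the 3 closest targets
```

The linear scan over all training points keeps the sketch short; for large data sets a spatial index is typically used to speed up the neighbor search.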

In data analysis, anomaly detection is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behavior. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data.

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a non-parametric, density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed, and marks as outliers points that lie alone in low-density regions. DBSCAN is one of the most common, and most commonly cited, clustering algorithms.
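The grouping idea can be sketched in a few lines. The following is a simplified, brute-force illustration rather than a reference implementation: it assumes Euclidean distance, a neighborhood radius `eps`, and a density threshold `min_pts` (a point counts itself as a neighbor). The function names and sample points are hypothetical.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))
```

With these sample points the two tight squares form two clusters and the isolated point at (5, 5) is labelled as noise.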

Clustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance.

Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding density-based clusters in spatial data. It was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander. Its basic idea is similar to DBSCAN, but it addresses one of DBSCAN's major weaknesses: the problem of detecting meaningful clusters in data of varying density. To do so, the points of the database are (linearly) ordered such that spatially closest points become neighbors in the ordering. Additionally, a special distance is stored for each point that represents the density that must be accepted for a cluster so that both points belong to the same cluster. This is represented as a dendrogram.

Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such high-dimensional spaces of data are often encountered in areas such as medicine, where DNA microarray technology can produce many measurements at once, and the clustering of text documents, where, if a word-frequency vector is used, the number of dimensions equals the size of the vocabulary.

ELKI is a data mining software framework developed for use in research and teaching. It was originally created by the database systems research unit at the Ludwig Maximilian University of Munich, Germany, led by Professor Hans-Peter Kriegel. The project has continued at the Technical University of Dortmund, Germany. It aims at allowing the development and evaluation of advanced data mining algorithms and their interaction with database index structures.

In anomaly detection, the local outlier factor (LOF) is an algorithm proposed by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander in 2000 for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours.
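A compact sketch of the idea: each point's local density is estimated from its k nearest neighbors and compared with the densities of those neighbors, so that scores near 1 indicate an inlier and clearly larger scores indicate an outlier. The version below is a simplification under stated assumptions: Euclidean distance, exactly k neighbors per point (ties at the k-distance are not expanded), and no duplicate points. The name `lof_scores` and the sample data are illustrative.

```python
import math


def lof_scores(points, k):
    """Return a simplified local outlier factor for every point."""
    n = len(points)
    knn = []     # indices of the k nearest neighbors of each point
    k_dist = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j)
                        for j in range(n) if j != i)
        knn.append([j for _, j in ranked[:k]])
        k_dist.append(ranked[k - 1][0])

    def reach_dist(a, b):
        # Reachability distance of a from b: never below b's k-distance.
        return max(k_dist[b], math.dist(points[a], points[b]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in knn[i]) / k
        lrd.append(1.0 / mean_reach)

    # LOF: average neighbor density divided by the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

In this sample the four corner points have neighbors of the same density as themselves, while the isolated point (5, 5) is far less dense than its neighbors and receives the highest score.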

Hans-Peter Kriegel is a German computer scientist and professor at the Ludwig Maximilian University of Munich, where he leads the Database Systems Group in the Department of Computer Science. He was previously a professor at the University of Würzburg and the University of Bremen, after his habilitation at the Technical University of Dortmund and his doctorate from the Karlsruhe Institute of Technology.

AMiner is a free online service used to index, search, and mine big scientific data.

Massive Online Analysis (MOA) is a free open-source software project designed specifically for data stream mining with concept drift. It is written in Java and developed at the University of Waikato, New Zealand.

Author name disambiguation

Author name disambiguation is a type of disambiguation and record linkage applied to the names of individual people. The process could, for example, distinguish individuals with the name "John Smith".

Discovering communities in a network, known as community detection or community discovery, is a fundamental problem in network science that has attracted much attention over the past several decades. In recent years, with the surge of research on big data, a related but different problem called community search, which aims to find the most likely community containing a given query node, has attracted great attention from both academia and industry. It is a query-dependent variant of the community detection problem. A detailed survey of community search can be found at ref., which reviews all the recent studies.

Gautam Das is a computer scientist in the field of database research. He is an ACM Fellow and IEEE Fellow.

References

[1] "SIGKDD Awards: 2015 SIGKDD Innovation Award: Hans-Peter Kriegel". www.kdd.org. Retrieved 2017-05-29. "with his team members Peer Kroeger, Erich Schubert and Arthur Zimek"
[2] "SIGKDD Doctoral Dissertation Award". ACM SIGKDD. Archived from the original on 2010-11-29. Retrieved 2010-05-30.
[3] E.g. Aggarwal, Charu C. (2016-12-10). Outlier analysis. Springer. pp. 49pp. ISBN 9783319475783. OCLC 967215852.
[4] Kriegel, Hans-Peter; Schubert, Matthias; Zimek, Arthur (2008). "Angle-based outlier detection in high-dimensional data". Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. KDD '08. New York, NY, USA: ACM. pp. 444–452. CiteSeerX 10.1.1.329.7579. doi:10.1145/1401890.1401946. ISBN 9781605581934. S2CID 3072058.
[5] Kriegel, Hans-Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur (2009). "LoOP". Proceedings of the 18th ACM conference on Information and knowledge management. CIKM '09. New York, NY, USA: ACM. pp. 1649–1652. doi:10.1145/1645953.1646195. ISBN 9781605585123. S2CID 14401236.
[6] Kriegel, Hans-Peter; Kröger, Peer; Sander, Jörg; Zimek, Arthur (2011-04-05). "Density-based clustering". Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 1 (3): 231–240. doi:10.1002/widm.30. S2CID 36920706.
[7] Böhm, Christian; Kailing, Karin; Kröger, Peer; Zimek, Arthur (2004). "Computing Clusters of Correlation Connected objects". Proceedings of the 2004 ACM SIGMOD international conference on Management of data. SIGMOD '04. New York, NY, USA: ACM. pp. 455–466. CiteSeerX 10.1.1.5.1279. doi:10.1145/1007568.1007620. ISBN 978-1581138597. S2CID 6411037.
[8] Achtert, E.; Böhm, C.; David, J.; Kröger, P.; Zimek, A. (2008-04-24). Proceedings of the 2008 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics. pp. 763–774. doi:10.1137/1.9781611972788.69. ISBN 9780898716542.
[9] Zimek, Arthur; Schubert, Erich; Kriegel, Hans-Peter (2012-08-27). "A survey on unsupervised outlier detection in high-dimensional numerical data". Statistical Analysis and Data Mining. 5 (5): 5. doi:10.1002/sam.11161. S2CID 6724536.
[10] Houle, Michael E.; Kriegel, Hans-Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur (2010-06-30). "Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?". Scientific and Statistical Database Management. Lecture Notes in Computer Science. Vol. 6187. Springer, Berlin, Heidelberg. pp. 482–500. CiteSeerX 10.1.1.378.3285. doi:10.1007/978-3-642-13818-8_34. ISBN 978-3-642-13817-1.
[11] Achtert, Elke; Kriegel, Hans-Peter; Zimek, Arthur (2008-07-09). "ELKI: A Software System for Evaluation of Subspace Clustering Algorithms". Scientific and Statistical Database Management. Lecture Notes in Computer Science. Vol. 5069. Springer, Berlin, Heidelberg. pp. 580–585. CiteSeerX 10.1.1.144.3263. doi:10.1007/978-3-540-69497-7_41. ISBN 978-3-540-69476-2.
[12] "The ELKI Team". elki-project.github.io. Retrieved 2017-05-29.


