Coreset

In computational geometry, a coreset is a small set of points that approximates the shape of a larger point set, in the sense that applying some geometric measure to the two sets (such as their minimum bounding box volume) yields approximately equal values. Many natural geometric optimization problems have coresets that approximate an optimal solution to within a factor of 1 + ε, that can be found quickly (in linear or near-linear time), and whose size is bounded by a function of 1/ε independent of the input size, where ε is an arbitrary positive number. When this is the case, one obtains a linear-time or near-linear-time approximation scheme based on the idea of finding a coreset and then applying an exact optimization algorithm to the coreset. Regardless of how slow the exact optimization algorithm is, for any fixed choice of ε the running time of this approximation scheme will be O(1) plus the time to find the coreset. [1][2]
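
A hedged sketch of this "find a coreset, then solve on it" pattern, using the Bădoiu–Clarkson farthest-point iteration for the 1-center (minimum enclosing ball) problem, one standard coreset construction; the function names and test data below are illustrative and not taken from the references.

```python
# Sketch of a coreset-based approximation scheme for the 1-center
# (minimum enclosing ball) problem via the Badoiu-Clarkson farthest-point
# iteration. A known analysis shows that after ceil(1/eps^2) rounds the
# running center is within eps * r_opt of the optimal center, so its
# enclosing radius is at most (1 + eps) * r_opt; the farthest points
# collected along the way form a small set whose size depends only on eps,
# not on the number of input points.

import math
import random


def farthest_point(points, center):
    """Input point at maximum Euclidean distance from `center`."""
    return max(points, key=lambda p: math.dist(p, center))


def one_center_coreset(points, eps):
    """Return (approximate center, coreset) for the minimum enclosing ball."""
    c = list(points[0])
    coreset = [points[0]]
    for i in range(1, math.ceil(1.0 / eps ** 2) + 1):
        q = farthest_point(points, c)
        coreset.append(q)
        # Move the center a 1/(i+1) fraction of the way toward the farthest point.
        c = [ci + (qi - ci) / (i + 1) for ci, qi in zip(c, q)]
    return tuple(c), coreset


if __name__ == "__main__":
    random.seed(1)
    pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50_000)]
    center, core = one_center_coreset(pts, eps=0.2)
    radius = max(math.dist(p, center) for p in pts)
    # Running an exact 1-center solver on `core` alone (whose size is
    # independent of the input size) would likewise give a
    # (1 + eps)-approximate ball for the whole point set.
    print(f"coreset size: {len(core)}, enclosing radius: {radius:.3f}")
```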

Related Research Articles

Clique problem: Task of computing complete subgraphs

In computer science, the clique problem is the computational problem of finding cliques in a graph. It has several different formulations depending on which cliques, and what information about the cliques, should be found. Common formulations of the clique problem include finding a maximum clique, finding a maximum weight clique in a weighted graph, listing all maximal cliques, and solving the decision problem of testing whether a graph contains a clique larger than a given size.
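
As a minimal illustration of the maximum clique formulation, the brute-force sketch below (with a hypothetical toy graph) tests vertex subsets from largest to smallest; it runs in exponential time and is only meant to make the problem statement concrete.

```python
# Brute-force maximum clique: return the largest vertex subset whose
# members are pairwise adjacent. Exponential time; tiny graphs only.

from itertools import combinations


def max_clique(vertices, edges):
    adjacent = {frozenset(e) for e in edges}
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(frozenset(pair) in adjacent for pair in combinations(subset, 2)):
                return subset
    return ()


if __name__ == "__main__":
    V = [1, 2, 3, 4]
    E = [(1, 2), (1, 3), (2, 3), (3, 4)]
    print(max_clique(V, E))  # (1, 2, 3)
```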

Time complexity: Estimate of time taken for running an algorithm

In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. Thus, the amount of time taken and the number of elementary operations performed by the algorithm are taken to be related by a constant factor.

Combinatorial optimization: Subfield of mathematical optimization

Combinatorial optimization is a subfield of mathematical optimization that consists of finding an optimal object from a finite set of objects, where the set of feasible solutions is discrete or can be reduced to a discrete set. Typical combinatorial optimization problems are the travelling salesman problem ("TSP"), the minimum spanning tree problem ("MST"), and the knapsack problem. In many such problems, such as the ones previously mentioned, exhaustive search is not tractable, and so specialized algorithms that quickly rule out large parts of the search space or approximation algorithms must be resorted to instead.

Independent set (graph theory): Unrelated vertices in graphs

In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set S of vertices such that for every two vertices in S, there is no edge connecting the two. Equivalently, each edge in the graph has at most one endpoint in S. A set is independent if and only if it is a clique in the graph's complement. The size of an independent set is the number of vertices it contains. Independent sets have also been called "internally stable sets", of which "stable set" is a shortening.
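
The equivalence with cliques of the complement graph can be checked directly; the sketch below (illustrative function names and a toy graph) verifies both sides of the equivalence for a given vertex subset.

```python
# Check that a vertex subset is independent in a graph exactly when it is
# a clique in the complement graph.

from itertools import combinations


def is_independent(edges, subset):
    """True if no edge has both endpoints in `subset`."""
    s = set(subset)
    return all(not (u in s and v in s) for u, v in edges)


def is_clique_in_complement(edges, subset):
    """True if every pair of vertices in `subset` is a non-edge of the graph."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) not in edge_set for pair in combinations(subset, 2))


if __name__ == "__main__":
    E = [(1, 2), (2, 3), (3, 4)]
    subset = [1, 3]
    print(is_independent(E, subset), is_clique_in_complement(E, subset))  # True True
```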

Vertex cover: Subset of a graph's vertices, including at least one endpoint of every edge

In graph theory, a vertex cover of a graph is a set of vertices that includes at least one endpoint of every edge of the graph.

In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems with provable guarantees on the distance of the returned solution to the optimal one. Approximation algorithms naturally arise in the field of theoretical computer science as a consequence of the widely believed P ≠ NP conjecture. Under this conjecture, a wide class of optimization problems cannot be solved exactly in polynomial time. The field of approximation algorithms, therefore, tries to understand how closely it is possible to approximate optimal solutions to such problems in polynomial time. In the overwhelming majority of cases, the guarantee of such algorithms is a multiplicative one expressed as an approximation ratio or approximation factor, i.e., the optimal solution is always guaranteed to be within a (predetermined) multiplicative factor of the returned solution. However, there are also many approximation algorithms that provide an additive guarantee on the quality of the returned solution. A notable example of an approximation algorithm that provides both is the classic approximation algorithm of Lenstra, Shmoys and Tardos for scheduling on unrelated parallel machines.

In computer science, a polynomial-time approximation scheme (PTAS) is a type of approximation algorithm for optimization problems.

Bounding sphere: Sphere that contains a set of objects

In mathematics, given a non-empty set of objects of finite extension in d-dimensional space, for example a set of points, a bounding sphere, enclosing sphere or enclosing ball for that set is a d-dimensional solid sphere containing all of these objects.

Euclidean minimum spanning tree: Shortest network connecting points

A Euclidean minimum spanning tree of a finite set of points in the Euclidean plane or higher-dimensional Euclidean space connects the points by a system of line segments with the points as endpoints, minimizing the total length of the segments. In it, any two points can reach each other along a path through the line segments. It can be found as the minimum spanning tree of a complete graph with the points as vertices and the Euclidean distances between points as edge weights.
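
The reduction described above can be carried out literally, as in the sketch below: Prim's algorithm is run on the implicit complete graph with Euclidean edge weights. This O(n²) version is purely illustrative; specialised geometric algorithms are faster in low dimensions.

```python
# Euclidean minimum spanning tree via Prim's algorithm on the implicit
# complete graph whose edge weights are pairwise Euclidean distances.

import math


def euclidean_mst(points):
    """Return MST edges as (i, j) index pairs."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest connection of each point to the tree
    best_from = [-1] * n
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Point outside the tree with the cheapest connecting edge.
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best_dist[k])
        edges.append((best_from[j], j))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[j], points[k])
                if d < best_dist[k]:
                    best_dist[k], best_from[k] = d, j
    return edges


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0.1), (0, 1)]
    print(euclidean_mst(pts))  # [(0, 1), (0, 3), (1, 2)]
```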

Set packing is a classical NP-complete problem in computational complexity theory and combinatorics, and was one of Karp's 21 NP-complete problems. Suppose one has a finite set S and a list of subsets of S. Then, the set packing problem asks if some k subsets in the list are pairwise disjoint.
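
A brute-force sketch of this decision problem, purely for illustration: it tests whether any k of the given subsets are pairwise disjoint.

```python
# Set packing (decision version): are there k pairwise disjoint subsets?
# Exhaustive search; exponential in the number of subsets.

from itertools import combinations


def has_packing(subsets, k):
    subsets = [set(s) for s in subsets]
    return any(
        all(a.isdisjoint(b) for a, b in combinations(choice, 2))
        for choice in combinations(subsets, k)
    )


if __name__ == "__main__":
    print(has_packing([{1, 2}, {2, 3}, {3, 4}, {5}], k=3))  # True: {1,2}, {3,4}, {5}
```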

In computational complexity theory, the maximum satisfiability problem (MAX-SAT) is the problem of determining the maximum number of clauses, of a given Boolean formula in conjunctive normal form, that can be made true by an assignment of truth values to the variables of the formula. It is a generalization of the Boolean satisfiability problem, which asks whether there exists a truth assignment that makes all clauses true.
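
For illustration, the exhaustive sketch below counts, over all truth assignments, the largest number of satisfied clauses; the DIMACS-like clause encoding (a negative integer denotes a negated variable) is an assumption made for the example.

```python
# Brute-force MAX-SAT: maximum number of clauses satisfiable by any
# assignment. Clauses are lists of nonzero integers; -k negates variable k.

from itertools import product


def max_sat(clauses, num_vars):
    best = 0
    for bits in product([False, True], repeat=num_vars):
        satisfied = sum(
            any((lit > 0) == bits[abs(lit) - 1] for lit in clause)
            for clause in clauses
        )
        best = max(best, satisfied)
    return best


if __name__ == "__main__":
    # (x1 or x2), (not x1 or x2), (x1 or not x2), (not x1 or not x2):
    # no assignment satisfies all four, but three can be satisfied.
    print(max_sat([[1, 2], [-1, 2], [1, -2], [-1, -2]], num_vars=2))  # 3
```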

Closest pair of points problem

The closest pair of points problem or closest pair problem is a problem of computational geometry: given n points in a metric space, find a pair of points with the smallest distance between them. The closest pair problem for points in the Euclidean plane was among the first geometric problems treated at the origins of the systematic study of the computational complexity of geometric algorithms.
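
A minimal quadratic-time sketch of the problem statement in the plane (illustrative only; the classical divide-and-conquer algorithm achieves O(n log n)):

```python
# Brute-force closest pair: compare all O(n^2) pairs of points.

import math
from itertools import combinations


def closest_pair(points):
    """Pair of points at minimum Euclidean distance."""
    return min(combinations(points, 2), key=lambda pq: math.dist(*pq))


if __name__ == "__main__":
    pts = [(0, 0), (3, 4), (1, 1), (5, 0)]
    print(closest_pair(pts))  # ((0, 0), (1, 1))
```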

The study of facility location problems (FLP), also known as location analysis, is a branch of operations research and computational geometry concerned with the optimal placement of facilities to minimize transportation costs, while taking into account constraints such as keeping hazardous materials away from housing and the locations of competitors' facilities. The techniques also apply to cluster analysis.

A geometric spanner or a t-spanner graph or a t-spanner was initially introduced as a weighted graph over a set of points as its vertices for which there is a t-path between any pair of vertices for a fixed parameter t. A t-path is defined as a path through the graph with weight at most t times the spatial distance between its endpoints. The parameter t is called the stretch factor or dilation factor of the spanner.
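
The stretch factor of a given geometric graph can be computed straight from this definition, as in the sketch below (illustrative function names; shortest paths computed with Dijkstra's algorithm).

```python
# Stretch factor (dilation) of a weighted graph over a point set: the
# maximum ratio of graph distance to straight-line Euclidean distance.

import heapq
import math


def shortest_paths(adj, src):
    """Dijkstra distances from `src` in an adjacency-list graph."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def stretch_factor(points, edges):
    """Dilation of the graph given by `edges` (index pairs) over `points`."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        w = math.dist(points[i], points[j])
        adj[i].append((j, w))
        adj[j].append((i, w))
    worst = 1.0
    for i in range(n):
        dist = shortest_paths(adj, i)
        for j in range(i + 1, n):
            worst = max(worst, dist.get(j, math.inf) / math.dist(points[i], points[j]))
    return worst


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(stretch_factor(square, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 2/sqrt(2) ~ 1.414
```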

In computational geometry and computer science, the minimum-weight triangulation problem is the problem of finding a triangulation of minimal total edge length. That is, an input polygon or the convex hull of an input point set must be subdivided into triangles that meet edge-to-edge and vertex-to-vertex, in such a way as to minimize the sum of the perimeters of the triangles. The problem is NP-hard for point set inputs, but may be approximated to any desired degree of accuracy. For polygon inputs, it may be solved exactly in polynomial time. The minimum weight triangulation has also sometimes been called the optimal triangulation.
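
For the special case of a convex polygon, a classic O(n³) dynamic program finds the optimum; the sketch below scores a triangulation by the sum of its triangles' perimeters, as in the definition above, and is illustrative rather than a method for general point sets.

```python
# Minimum-weight triangulation of a convex polygon (vertices in order),
# by the classic dynamic program over polygon chains.

import math
from functools import lru_cache


def min_weight_triangulation(polygon):
    n = len(polygon)

    def perimeter(i, j, k):
        a, b, c = polygon[i], polygon[j], polygon[k]
        return math.dist(a, b) + math.dist(b, c) + math.dist(c, a)

    @lru_cache(maxsize=None)
    def dp(i, j):
        # Cheapest triangulation of the chain v_i, v_{i+1}, ..., v_j.
        if j - i < 2:
            return 0.0
        return min(dp(i, k) + dp(k, j) + perimeter(i, k, j) for k in range(i + 1, j))

    return dp(0, n - 1)


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(min_weight_triangulation(square))  # two triangles sharing a diagonal
```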

Delone set: Well-spaced set of points in a metric space

In the mathematical theory of metric spaces, ε-nets, ε-packings, ε-coverings, uniformly discrete sets, relatively dense sets, and Delone sets are several closely related definitions of well-spaced sets of points, and the packing radius and covering radius of these sets measure how well-spaced they are. These sets have applications in coding theory, approximation algorithms, and the theory of quasicrystals.

In computational geometry, a maximum disjoint set (MDS) is a largest set of non-overlapping geometric shapes selected from a given set of candidate shapes.

Greedy geometric spanner

In computational geometry, a greedy geometric spanner is an undirected graph whose distances approximate the Euclidean distances among a finite set of points in a Euclidean space. The vertices of the graph represent these points. The edges of the spanner are selected by a greedy algorithm that includes an edge whenever its two endpoints are not connected by a short path of shorter edges. The greedy spanner was first described in the PhD thesis of Gautam Das and in a conference paper and subsequent journal paper by Ingo Althöfer et al. These sources also credited Marshall Bern (unpublished) with the independent discovery of the same construction.
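
A direct, unoptimised sketch of this greedy rule follows (illustrative function names; each candidate edge is tested with a bounded Dijkstra search). Practical constructions use additional techniques to speed up the repeated path queries.

```python
# Greedy geometric spanner: scan candidate edges by increasing length and
# keep an edge only if the graph built so far has no path between its
# endpoints of length at most t times their Euclidean distance.

import heapq
import itertools
import math


def greedy_spanner(points, t):
    """Return spanner edges as (i, j) index pairs."""
    n = len(points)
    adj = {i: [] for i in range(n)}   # adjacency list with edge lengths

    def graph_dist(src, dst, bound):
        """Dijkstra distance from src to dst, pruning paths longer than `bound`."""
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                return d
            if d > dist.get(u, math.inf) or d > bound:
                continue
            for v, w in adj[u]:
                nd = d + w
                if nd < dist.get(v, math.inf):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return math.inf

    candidates = sorted(itertools.combinations(range(n), 2),
                        key=lambda e: math.dist(points[e[0]], points[e[1]]))
    edges = []
    for i, j in candidates:
        d = math.dist(points[i], points[j])
        if graph_dist(i, j, t * d) > t * d:
            adj[i].append((j, d))
            adj[j].append((i, d))
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(greedy_spanner(pts, t=1.5))  # the four sides; diagonals are not needed
```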

A parameterized approximation algorithm is a type of algorithm that aims to find approximate solutions to NP-hard optimization problems in polynomial time in the input size and a function of a specific parameter. These algorithms are designed to combine the best aspects of both traditional approximation algorithms and fixed-parameter tractability.

References

  1. Agarwal, Pankaj K.; Har-Peled, Sariel; Varadarajan, Kasturi R. (2005), "Geometric approximation via coresets", in Goodman, Jacob E.; Pach, János; Welzl, Emo (eds.), Combinatorial and Computational Geometry, Mathematical Sciences Research Institute Publications, vol. 52, Cambridge Univ. Press, Cambridge, pp. 1–30, MR 2178310.
  2. Nielsen, Frank (2016). "10. Fast approximate optimization in high dimensions with core-sets and fast dimension reduction". Introduction to HPC with MPI for Data Science. Springer. pp. 259–272. ISBN 978-3-319-21903-5.