Biclustering

Last updated

Biclustering, block clustering, [1] [2] Co-clustering or two-mode clustering [3] [4] [5] is a data mining technique which allows simultaneous clustering of the rows and columns of a matrix. The term was first introduced by Boris Mirkin [6] to name a technique introduced many years earlier, [6] in 1972, by John A. Hartigan. [7]

Contents

Given a set of samples represented by an -dimensional feature vector, the entire dataset can be represented as rows in columns (i.e., an matrix). The Biclustering algorithm generates Biclusters. A Bicluster is a subset of rows which exhibit similar behavior across a subset of columns, or vice versa.

Development

Biclustering was originally introduced by John A. Hartigan in 1972. [7] The term "Biclustering" was then later used and refined by Boris G. Mirkin. This algorithm was not generalized until 2000, when Y. Cheng and George M. Church proposed a biclustering algorithm based on the mean squared residue score (MSR) and applied it to biological gene expression data. [8]

In 2001 and 2003, I. S. Dhillon published two algorithms applying biclustering to files and words. One version was based on bipartite spectral graph partitioning. [9] The other was based on information theory. Dhillon assumed the loss of mutual information during biclustering was equal to the Kullback–Leibler-distance (KL-distance) between P and Q. P represents the distribution of files and feature words before Biclustering, while Q is the distribution after Biclustering. KL-distance is for measuring the difference between two random distributions. KL = 0 when the two distributions are the same and KL increases as the difference increases. [10] Thus, the aim of the algorithm was to find the minimum KL-distance between P and Q. In 2004, Arindam Banerjee used a weighted-Bregman distance instead of KL-distance to design a Biclustering algorithm that was suitable for any kind of matrix, unlike the KL-distance algorithm. [11]

To cluster more than two types of objects, in 2005, Bekkerman expanded the mutual information in Dhillon's theorem from a single pair into multiple pairs. [12]

Complexity

The complexity of the Biclustering problem depends on the exact problem formulation, and particularly on the merit function used to evaluate the quality of a given Bicluster. However, the most interesting variants of this problem are NP-complete. NP-complete has two conditions. In the simple case that there is an only element a(i,j) either 0 or 1 in the binary matrix A, a Bicluster is equal to a biclique in the corresponding bipartite graph. The maximum size Bicluster is equivalent to the maximum edge biclique in the bipartite graph. In the complex case, the element in matrix A is used to compute the quality of a given Bicluster and solve the more restricted version of the problem. [13] It requires either large computational effort or the use of lossy heuristics to short-circuit the calculation. [14]

Types of Biclusters

Bicluster with constant values (a)

When a Biclustering algorithm tries to find a constant-value Bicluster, it reorders the rows and columns of the matrix to group together similar rows and columns, eventually grouping Biclusters with similar values. This method is sufficient when the data is normalized. A perfect constant Bicluster is a matrix(I,J) in which all values a(i,j) are equal to a given constant μ. In tangible data, these entries a(i,j) may be represented with the form n(i,j) + μ where n(i,j) denotes the noise. According to Hartigan's algorithm, by splitting the original data matrix into a set of Biclusters, variance is used to compute constant Biclusters. Hence, a perfect Bicluster may be equivalently defined as a matrix with a variance of zero. In order to prevent the partitioning of the data matrix into Biclusters with the only one row and one column; Hartigan assumes that there are, for example, K Biclusters within the data matrix. When the data matrix is partitioned into K Biclusters, the algorithm ends.

Bicluster with constant values on rows (b) or columns (c)

Unlike the constant-value Biclusters, these types of Biclusters cannot be evaluated solely based on the variance of their values. To finish the identification, the columns and the rows should be normalized first. There are, however, other algorithms, without the normalization step, that can find Biclusters which have rows and columns with different approaches.

Bicluster with coherent values (d, e)

For Biclusters with coherent values on rows and columns, an overall improvement over the algorithms for Biclusters with constant values on rows or on columns should be considered. This algorithm may contain analysis of variance between groups, using co-variance between both rows and columns. In Cheng and Church's theorem, a Bicluster is defined as a subset of rows and columns with almost the same score. The similarity score is used to measure the coherence of rows and columns.


a) Bicluster with constant values
2.02.02.02.02.0
2.02.02.02.02.0
2.02.02.02.02.0
2.02.02.02.02.0
2.02.02.02.02.0
b) Bicluster with constant values on rows
1.01.01.01.01.0
2.02.02.02.02.0
3.03.03.03.03.0
4.04.04.04.04.0
5.05.05.05.05.0
c) Bicluster with constant values on columns
1.02.03.04.05.0
1.02.03.04.05.0
1.02.03.04.05.0
1.02.03.04.05.0
1.02.03.04.05.0
d) Bicluster with coherent values (additive)
1.04.05.00.01.5
4.07.08.03.04.5
3.06.07.02.03.5
5.08.09.04.05.5
2.05.06.01.02.5
e) Bicluster with coherent values (multiplicative)
1.00.52.00.20.8
2.01.04.00.41.6
3.01.56.00.62.4
4.02.08.00.83.2
5.02.510.01.04.0


The relationship between these cluster models and other types of clustering such as correlation clustering is discussed in. [15]

Algorithms

There are many Biclustering algorithms developed for bioinformatics, including: block clustering, CTWC (Coupled Two-Way Clustering), ITWC (Interrelated Two-Way Clustering), δ-bicluster, δ-pCluster, δ-pattern, FLOC, OPC, Plaid Model, OPSMs (Order-preserving submatrixes), Gibbs, SAMBA (Statistical-Algorithmic Method for Bicluster Analysis), [16] Robust Biclustering Algorithm (RoBA), Crossing Minimization, [17] cMonkey, [18] PRMs, DCC, LEB (Localize and Extract Biclusters), QUBIC (QUalitative BIClustering), BCCA (Bi-Correlation Clustering Algorithm) BIMAX, ISA and FABIA (Factor analysis for Bicluster Acquisition), [19] runibic, [20] and recently proposed hybrid method EBIC (evolutionary-based Biclustering), [21] which was shown to detect multiple patterns with very high accuracy. More recently, IMMD-CC [22] is proposed that is developed based on the iterative complexity reduction concept. IMMD-CC is able to identify co-cluster centroids from highly sparse transformation obtained by iterative multi-mode discretization.

Biclustering algorithms have also been proposed and used in other application fields under the names co-clustering, bi-dimensional clustering, and subspace clustering. [14]

Given the known importance of discovering local patterns in time-series data. Recent proposals have addressed the Biclustering problem in the specific case of time-series gene expression data. In this case, the interesting Biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the development of efficient exhaustive enumeration algorithms such as CCC-Biclustering [23] and e-CCC-Biclustering. [24] The approximate patterns in CCC-Biclustering algorithms allow a given number of errors, per gene, relatively to an expression profile representing the expression pattern in the Bicluster. The e-CCC-Biclustering algorithm uses approximate expressions to find and report all maximal CCC-Bicluster's by a discretized matrix A and efficient string processing techniques.

These algorithms find and report all maximal Biclusters with coherent and contiguous columns with perfect/approximate expression patterns, in time linear/polynomial which is obtained by manipulating a discretized version of original expression matrix in the size of the time-series gene expression matrix using efficient string processing techniques based on suffix trees. These algorithms are also applied to solve problems and sketch the analysis of computational complexity.

Some recent algorithms have attempted to include additional support for Biclustering rectangular matrices in the form of other datatypes, including cMonkey.

There is an ongoing debate about how to judge the results of these methods, as Biclustering allows overlap between clusters and some algorithms allow the exclusion of hard-to-reconcile columns/conditions. Not all of the available algorithms are deterministic and the analyst must pay attention to the degree to which results represent stable minima. Because this is an unsupervised classification problem, the lack of a gold standard makes it difficult to spot errors in the results. One approach is to utilize multiple Biclustering algorithms, with the majority or super-majority voting amongst them to decide the best result. Another way is to analyze the quality of shifting and scaling patterns in Biclusters. [25] Biclustering has been used in the domain of text mining (or classification) which is popularly known as co-clustering. [26] Text corpora are represented in a vectoral form as a matrix D whose rows denote the documents and whose columns denote the words in the dictionary. Matrix elements Dij denote occurrence of word j in document i. Co-clustering algorithms are then applied to discover blocks in D that correspond to a group of documents (rows) characterized by a group of words(columns).

Text clustering can solve the high-dimensional sparse problem, which means clustering text and words at the same time. When clustering text, we need to think about not only the words information, but also the information of words clusters that was composed by words. Then, according to similarity of feature words in the text, will eventually cluster the feature words. This is called co-clustering. There are two advantages of co-clustering: one is clustering the test based on words clusters can extremely decrease the dimension of clustering, it can also appropriate to measure the distance between the tests. Second is mining more useful information and can get the corresponding information in test clusters and words clusters. This corresponding information can be used to describe the type of texts and words, at the same time, the result of words clustering can be also used to text mining and information retrieval.

Several approaches have been proposed based on the information contents of the resulting blocks: matrix-based approaches such as SVD and BVD, and graph-based approaches. Information-theoretic algorithms iteratively assign each row to a cluster of documents and each column to a cluster of words such that the mutual information is maximized. Matrix-based methods focus on the decomposition of matrices into blocks such that the error between the original matrix and the regenerated matrices from the decomposition is minimized. Graph-based methods tend to minimize the cuts between the clusters. Given two groups of documents d1 and d2, the number of cuts can be measured as the number of words that occur in documents of groups d1 and d2.

More recently (Bisson and Hussain) [26] have proposed a new approach of using the similarity between words and the similarity between documents to co-cluster the matrix. Their method (known as χ-Sim, for cross similarity) is based on finding document-document similarity and word-word similarity, and then using classical clustering methods such as hierarchical clustering. Instead of explicitly clustering rows and columns alternately, they consider higher-order occurrences of words, inherently taking into account the documents in which they occur. Thus, the similarity between two words is calculated based on the documents in which they occur and also the documents in which "similar" words occur. The idea here is that two documents about the same topic do not necessarily use the same set of words to describe it, but a subset of the words and other similar words that are characteristic of that topic. This approach of taking higher-order similarities takes the latent semantic structure of the whole corpus into consideration with the result of generating a better clustering of the documents and words.

In text databases, for a document collection defined by a document by term D matrix (of size m by n, m: number of documents, n: number of terms) the cover-coefficient based clustering methodology [27] yields the same number of clusters both for documents and terms (words) using a double-stage probability experiment. According to the cover coefficient concept number of clusters can also be roughly estimated by the following formula where t is the number of non-zero entries in D. Note that in D each row and each column must contain at least one non-zero element.

In contrast to other approaches, FABIA is a multiplicative model that assumes realistic non-Gaussian signal distributions with heavy tails. FABIA utilizes well understood model selection techniques like variational approaches and applies the Bayesian framework. The generative framework allows FABIA to determine the information content of each Bicluster to separate spurious Biclusters from true Biclusters.

See also

Related Research Articles

<span class="mw-page-title-main">Sequence alignment</span> Process in bioinformatics that identifies equivalent sites within molecular sequences

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns. Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data.

In information science, formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the objects sharing some set of properties; and each sub-concept in the hierarchy represents a subset of the objects in the concepts above it. The term was introduced by Rudolf Wille in 1981, and builds on the mathematical theory of lattices and ordered sets that was developed by Garrett Birkhoff and others in the 1930s.

<span class="mw-page-title-main">Cluster analysis</span> Grouping a set of objects by similarity

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning.

Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text. A matrix containing word counts per document is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents.

Semantic similarity is a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. These are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts or instances, through a numerical description obtained according to the comparison of information supporting their meaning or describing their nature. The term semantic similarity is often confused with semantic relatedness. Semantic relatedness includes any relation between two terms, while semantic similarity only includes "is a" relations. For example, "car" is similar to "bus", but is also related to "road" and "driving".

Medoids are representative objects of a data set or a cluster within a data set whose sum of dissimilarities to all the objects in the cluster is minimal. Medoids are similar in concept to means or centroids, but medoids are always restricted to be members of the data set. Medoids are most commonly used on data when a mean or centroid cannot be defined, such as graphs. They are also used in contexts where the centroid is not representative of the dataset like in images, 3-D trajectories and gene expression. These are also of interest while wanting to find a representative using some distance other than squared euclidean distance.

<span class="mw-page-title-main">Fuzzy clustering</span>

Fuzzy clustering is a form of clustering in which each data point can belong to more than one cluster.

Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data. These, in combination with computational and statistical approaches to understanding the function of the genes and statistical association analysis, this field is also often referred to as Computational and Statistical Genetics/genomics. As such, computational genomics may be regarded as a subset of bioinformatics and computational biology, but with a focus on using whole genomes to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means to biological discovery.

<span class="mw-page-title-main">Non-negative matrix factorization</span> Algorithms for matrix decomposition

Non-negative matrix factorization, also non-negative matrix approximation is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.

Computational phylogenetics is the application of computational algorithms, methods, and programs to phylogenetic analyses. The goal is to assemble a phylogenetic tree representing a hypothesis about the evolutionary ancestry of a set of genes, species, or other taxa. For example, these techniques have been used to explore the family tree of hominid species and the relationships between specific genes shared by many types of organisms.

<span class="mw-page-title-main">Microarray analysis techniques</span>

Microarray analysis techniques are used in interpreting the data generated from experiments on DNA, RNA, and protein microarrays, which allow researchers to investigate the expression state of a large number of genes - in many cases, an organism's entire genome - in a single experiment. Such experiments can generate very large amounts of data, allowing researchers to assess the overall state of a cell or organism. Data in such large quantities is difficult - if not impossible - to analyze without the help of computer programs.

<span class="mw-page-title-main">Spectral clustering</span> Clustering methods

In multivariate statistics, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.

Consensus clustering is a method of aggregating results from multiple clustering algorithms. Also called cluster ensembles or aggregation of clustering, it refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete, even when the number of input clusterings is three. Consensus clustering for unsupervised learning is analogous to ensemble learning in supervised learning.

Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such high-dimensional spaces of data are often encountered in areas such as medicine, where DNA microarray technology can produce many measurements at once, and the clustering of text documents, where, if a word-frequency vector is used, the number of dimensions equals the size of the vocabulary.

In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses of a word. Given that the output of word-sense induction is a set of senses for the target word, this task is strictly related to that of word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to solve the ambiguity of words in context.

<span class="mw-page-title-main">Ron Shamir</span> Israeli professor of computer science (born 1953)

Ron Shamir is an Israeli professor of computer science known for his work in graph theory and in computational biology. He holds the Raymond and Beverly Sackler Chair in Bioinformatics, and is the founder and former head of the Edmond J. Safra Center for Bioinformatics at Tel Aviv University.

<span class="mw-page-title-main">Gene co-expression network</span>

A gene co-expression network (GCN) is an undirected graph, where each node corresponds to a gene, and a pair of nodes is connected with an edge if there is a significant co-expression relationship between them. Having gene expression profiles of a number of genes for several samples or experimental conditions, a gene co-expression network can be constructed by looking for pairs of genes which show a similar expression pattern across samples, since the transcript levels of two co-expressed genes rise and fall together across samples. Gene co-expression networks are of biological interest since co-expressed genes are controlled by the same transcriptional regulatory program, functionally related, or members of the same pathway or protein complex.

<span class="mw-page-title-main">Haesun Park</span> South Korean American mathematician

Haesun Park is a professor and chair of Computational Science and Engineering at the Georgia Institute of Technology. She is an IEEE Fellow, ACM Fellow, and Society for Industrial and Applied Mathematics Fellow. Park's main areas of research are Numerical Algorithms, Data Analysis, Visual Analytics and Parallel Computing. She has co-authored over 100 articles in peer-reviewed journals and conferences.

<span class="mw-page-title-main">Umit Catalyurek</span>

Ümit V. Çatalyürek is a professor of computer science at the Georgia Institute of Technology, and Adjunct Professor in department of Biomedical Informatics at the Ohio State University. He is known for his work on graph analytics, parallel algorithms for scientific applications, data-intensive computing, and large scale genomic and biomedical applications. He was the director of the High Performance Computing Lab at the Ohio State University. He was named Fellow of the Institute of Electrical and Electronics Engineers (IEEE) in 2016 for contributions to combinatorial scientific computing and parallel computing.

Machine learning in bioinformatics is the application of machine learning algorithms to bioinformatics, including genomics, proteomics, microarrays, systems biology, evolution, and text mining.

References

  1. G. Govaert; M. Nadif (2008). "Block clustering with bernoulli mixture models: Comparison of different approaches". Computational Statistics and Data Analysis. 52 (6): 3233–3245. doi:10.1016/j.csda.2007.09.007.
  2. R. Balamurugan; A.M. Natarajan; K. Premalatha (2015). "Stellar-Mass Black Hole Optimization for Biclustering Microarray Gene Expression Data". Applied Artificial Intelligence. 29 (4): 353–381. doi: 10.1080/08839514.2015.1016391 . S2CID   44624424.
  3. G. Govaert; M. Nadif (2013). Co-clustering: models, algorithms and applications. ISTE, Wiley. ISBN   978-1-84821-473-6.
  4. R. Balamurugan; A.M. Natarajan; K. Premalatha (2016). "A Modified Harmony Search Method for Biclustering Microarray Gene Expression Data". International Journal of Data Mining and Bioinformatics. 16 (4): 269–289. doi:10.1504/IJDMB.2016.082205.
  5. Van Mechelen I, Bock HH, De Boeck P (2004). "Two-mode clustering methods:a structured overview". Statistical Methods in Medical Research. 13 (5): 363–94. CiteSeerX   10.1.1.706.4201 . doi:10.1191/0962280204sm373ra. PMID   15516031. S2CID   19058237.
  6. 1 2 Mirkin, Boris (1996). Mathematical Classification and Clustering. Kluwer Academic Publishers. ISBN   978-0-7923-4159-8.
  7. 1 2 Hartigan JA (1972). "Direct clustering of a data matrix". Journal of the American Statistical Association. 67 (337): 123–9. doi:10.2307/2284710. JSTOR   2284710.
  8. https://www.cs.princeton.edu/courses/archive/fall03/cs597F/Articles/biclustering_of_expression_data.pdf Cheng Y, Church G M. Biclustering of expression data[C]//Ismb. 2000, 8: 93–103.
  9. Dhillon, Inderjit S. (2001). "Co-clustering documents and words using bipartite spectral graph partitioning". Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 269–274. doi:10.1145/502512.502550. ISBN   158113391X. S2CID   11847258.
  10. Dhillon, Inderjit S.; Mallela, Subramanyam; Modha, Dharmendra S. (2003). "Information-theoretic co-clustering". Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 89–98. doi:10.1145/956750.956764. ISBN   1581137370. S2CID   12286784.
  11. Banerjee, Arindam; Dhillon, Inderjit; Ghosh, Joydeep; Merugu, Srujana; Modha, Dharmendra S. (2004). "A generalized maximum entropy approach to bregman co-clustering and matrix approximation". Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 509–514. doi:10.1145/1014052.1014111. ISBN   1581138881. S2CID   2719002.
  12. Bekkerman, Ron; El-Yaniv, Ran; McCallum, Andrew (2005). "Multi-way distributional clustering via pairwise interactions". Proceedings of the 22nd international conference on Machine learning - ICML '05. pp. 41–48. doi:10.1145/1102351.1102357. ISBN   1595931805. S2CID   858524.
  13. Peeters R (2003). "The maximum edge biclique problem is NP-complete". Discrete Applied Mathematics. 131 (3): 651–654. doi: 10.1016/S0166-218X(03)00333-0 . S2CID   3102766.
  14. 1 2 Madeira SC, Oliveira AL (2004). "Biclustering Algorithms for Biological Data Analysis: A Survey". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 1 (1): 24–45. doi:10.1109/TCBB.2004.2. PMID   17048406. S2CID   206628783.
  15. Kriegel, H.-P.; Kröger, P.; Zimek, A. (March 2009). "Clustering High Dimensional Data: A Survey on Subspace Clustering, Pattern-based Clustering, and Correlation Clustering". ACM Transactions on Knowledge Discovery from Data. 3 (1): 1–58. doi:10.1145/1497577.1497578. S2CID   17363900.
  16. Tanay A, Sharan R, Kupiec M, Shamir R (2004). "Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data". Proceedings of the National Academy of Sciences. 101 (9): 2981–2986. Bibcode:2004PNAS..101.2981T. doi: 10.1073/pnas.0308661100 . PMC   365731 . PMID   14973197.
  17. Abdullah, Ahsan; Hussain, Amir (2006). "A new biclustering technique based on crossing minimization". Neurocomputing. 69 (16–18): 1882–1896. doi:10.1016/j.neucom.2006.02.018.
  18. Reiss DJ, Baliga NS, Bonneau R (2006). "Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks". BMC Bioinformatics. 7: 280–302. doi: 10.1186/1471-2105-7-280 . PMC   1502140 . PMID   16749936.
  19. Hochreiter S, Bodenhofer U, Heusel M, Mayr A, Mitterecker A, Kasim A, Khamiakova T, Van Sanden S, Lin D, Talloen W, Bijnens L, Gohlmann HW, Shkedy Z, Clevert DA (2010). "FABIA: factor analysis for bicluster acquisition". Bioinformatics. 26 (12): 1520–1527. doi:10.1093/bioinformatics/btq227. PMC   2881408 . PMID   20418340.
  20. Orzechowski P, Pańszczyk A, Huang X, Moore JH (2018). "runibic: a Bioconductor package for parallel row-based biclustering of gene expression data". Bioinformatics. 34 (24): 4302–4304. doi:10.1093/bioinformatics/bty512. PMC   6289127 . PMID   29939213.
  21. Orzechowski P, Sipper M, Huang X, Moore JH (2018). "EBIC: an evolutionary-based parallel biclustering algorithm for pattern discovery". Bioinformatics. 34 (21): 3719–3726. arXiv: 1801.03039 . doi:10.1093/bioinformatics/bty401. PMC   6198864 . PMID   29790909.
  22. Fanaee-T H, Thoresen, M (2020). "Iterative Multi-mode Discretization: Applications to Co-clustering". Discovery Science. Lecture Notes in Computer Science. Vol. 12323. pp. 94–105. doi:10.1007/978-3-030-61527-7_7. hdl: 10852/82994 . ISBN   978-3-030-61526-0. S2CID   222832035.
  23. Madeira SC, Teixeira MC, Sá-Correia I, Oliveira AL (2010). "Identification of Regulatory Modules in Time Series Gene Expression Data using a Linear Time Biclustering Algorithm". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 1 (7): 153–165. doi:10.1109/TCBB.2008.34. PMID   20150677. S2CID   7369531.
  24. Madeira SC, Oliveira AL (2009). "A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series". Algorithms for Molecular Biology. 4 (8): 8. doi: 10.1186/1748-7188-4-8 . PMC   2709627 . PMID   19497096.
  25. Aguilar-Ruiz JS (2005). "Shifting and scaling patterns from gene expression data". Bioinformatics. 21 (10): 3840–3845. doi: 10.1093/bioinformatics/bti641 . PMID   16144809.
  26. 1 2 Bisson G.; Hussain F. (2008). "Chi-Sim: A New Similarity Measure for the Co-clustering Task". 2008 Seventh International Conference on Machine Learning and Applications. pp. 211–217. doi:10.1109/ICMLA.2008.103. ISBN   978-0-7695-3495-4. S2CID   15506600.
  27. Can, F.; Ozkarahan, E. A. (1990). "Concepts and effectiveness of the cover coefficient based clustering methodology for text databases" (PDF). ACM Transactions on Database Systems. 15 (4): 483–517. doi:10.1145/99935.99938. hdl: 2374.MIA/246 . S2CID   14309214.

Others