Machine Learning: BIRCH Clustering Algorithm Clearly Explained

July 01, 2019

Existing data clustering methods do not adequately address the problem of processing large datasets with a limited amount of resources (i.e., memory and CPU cycles). As a consequence, they scale poorly in terms of running time and result quality as the dataset size increases. At a high level, Balanced Iterative Reducing and Clustering using Hierarchies, or BIRCH for short, deals with large datasets by first generating a more compact summary that retains as much distribution information as possible, and then clustering the data summary instead of the original dataset. BIRCH actually complements other clustering algorithms by virtue of the fact that different clustering algorithms can be applied to the summary it produces. BIRCH can only deal with metric attributes (similar to the kind of features KMeans can handle). A metric attribute is one whose values can be represented by explicit coordinates in a Euclidean space (no categorical variables).

Clustering Feature (CF)

BIRCH attempts to minimize the memory requirements of large datasets by summarizing the information contained in dense regions as Clustering Feature (CF) entries. A CF is the triple CF = (N, LS, SS), where N is the number of data points in the subcluster, LS is the linear sum of those points, and SS is their squared sum (the sum of their squared norms); the subcluster's centroid, radius, and diameter can all be computed from these three values.


As we're about to see, it's possible to have CFs composed of other CFs. In that case, the CF of the enclosing subcluster is simply the entrywise sum of its children's CFs: the Ns, LSs, and SSs add up.
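To make the bookkeeping concrete, here is a minimal sketch of a CF entry in Python. The class and method names are made up for illustration; only the (N, LS, SS) triple and its additivity come from BIRCH.

import numpy as np

class CF:
    """A Clustering Feature: the (N, LS, SS) summary of a set of points."""
    def __init__(self, points):
        points = np.atleast_2d(points)
        self.n = len(points)                  # N: number of points
        self.ls = points.sum(axis=0)          # LS: linear sum of the points
        self.ss = float((points ** 2).sum())  # SS: squared sum (sum of squared norms)

    def merge(self, other):
        # CF additivity: the CF of a merged subcluster is the entrywise sum
        merged = CF.__new__(CF)
        merged.n = self.n + other.n
        merged.ls = self.ls + other.ls
        merged.ss = self.ss + other.ss
        return merged

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # root-mean-square distance of the points from the centroid,
        # derived from (N, LS, SS) alone, without revisiting the points
        return float(np.sqrt(max(self.ss / self.n - self.centroid() @ self.centroid(), 0.0)))

Merging a CF built from two points with a CF built from a third point yields exactly the same (N, LS, SS) triple as building a single CF from all three points at once, which is what makes these summaries so cheap to maintain.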


CF-tree

The CF-tree is a very compact representation of the dataset because each entry in a leaf node is not a single data point but a subcluster. Each nonleaf node contains at most B entries, where B is the branching factor. In this context, a single entry contains a pointer to a child node and a CF made up of the sum of the CFs in that child (subclusters of subclusters). A leaf node, on the other hand, contains at most L entries, and each entry is a CF (a subcluster of data points). All entries in a leaf node must satisfy a threshold requirement: the diameter of each leaf entry has to be less than the threshold T. In addition, every leaf node has two pointers, prev and next, which are used to chain all leaf nodes together for efficient scans.
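As a rough illustration of that layout, here is a sketch of the node structures in Python. The class names are hypothetical and the values of B, L, and T are arbitrary; the CF class is the one sketched above.

from dataclasses import dataclass, field
from typing import Optional

B, L, T = 4, 5, 0.5  # branching factor, leaf capacity, threshold (illustrative values)

@dataclass
class LeafNode:
    entries: list = field(default_factory=list)  # at most L entries, each a CF subcluster
    prev: Optional["LeafNode"] = None            # leaf nodes are chained together
    next: Optional["LeafNode"] = None            # for efficient sequential scans

@dataclass
class NonLeafEntry:
    cf: "CF"          # sum of the CFs stored in the child node
    child: object     # pointer to the child (a LeafNode or a NonLeafNode)

@dataclass
class NonLeafNode:
    entries: list = field(default_factory=list)  # at most B NonLeafEntry objects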

Machine Learning: BIRCH Clustering Algorithm Clearly Explained (4)

Insertion Algorithm

Let’s describe how we’d go about inserting a CF entry (a single data point or subcluster) into a CF-tree.

  1. Identify the appropriate leaf: Starting from the root, recursively descend the CF-tree by choosing the closest child node according to the chosen distance metric (e.g., Euclidean distance between centroids).
  2. Modify the leaf: Upon reaching a leaf node, find the closest entry and test whether it can absorb the new CF entry without violating the threshold condition. If it can, update that entry; otherwise, add a new CF entry to the leaf. If there isn't enough space on the leaf for this new entry to fit in, then we must split the leaf node. Node splitting is done by choosing the two entries that are farthest apart as seeds and redistributing the remaining entries based on distance (a rough code sketch of this absorb-or-split logic follows the list).
  3. Modify the path to the leaf: Recall that each nonleaf entry holds a CF composed of the CFs of all the entries in its child node. Therefore, after inserting a CF entry into a leaf, we update the CF information for each nonleaf entry on the path to that leaf. In the event of a split, we must insert a new nonleaf entry into the parent node and have it point to the newly formed leaf. If, according to B, the parent doesn't have enough room, then we must split the parent as well, and so on up to the root.
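Here is a rough sketch of steps 1 and 2, reusing the CF class and node classes from the sketches above. The helper names are made up; leaf splitting and the upward propagation of splits in step 3 are only hinted at in the comments.

import numpy as np

def closest(entries, point, centroid_of):
    # pick the entry whose centroid is nearest to the new point (Euclidean distance)
    return min(entries, key=lambda e: np.linalg.norm(centroid_of(e) - point))

def insert_point(node, point):
    if isinstance(node, NonLeafNode):
        # Step 1: descend into the closest child, then update its CF on the way back (step 3)
        entry = closest(node.entries, point, lambda e: e.cf.centroid())
        insert_point(entry.child, point)
        entry.cf = entry.cf.merge(CF(point))
        return
    # Step 2: at a leaf, try to let the closest entry absorb the point
    new_cf = CF(point)
    if node.entries:
        entry = closest(node.entries, point, lambda e: e.centroid())
        candidate = entry.merge(new_cf)
        if 2 * candidate.radius() <= T:  # threshold test (diameter approximated as twice the radius)
            node.entries[node.entries.index(entry)] = candidate
            return
    node.entries.append(new_cf)  # otherwise start a new leaf entry
    # if len(node.entries) > L, a full implementation would now split the leaf:
    # the two farthest entries become seeds and the rest are redistributed by distance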

Clustering Algorithm

Now that we’ve covered some of the concepts underlying BIRCH, let’s walk through how the algorithm works.


Phase 1

The algorithm starts with an initial threshold value, scans the data, and inserts points into the tree. If it runs out of memory before it finishes scanning the data, it increases the threshold value and rebuilds a new, smaller CF-tree by re-inserting the leaf entries of the old CF-tree into the new one. After all the old leaf entries have been re-inserted, the scanning of the data and insertion into the new CF-tree resume from the point at which they were interrupted.

A good choice of threshold value can greatly reduce the number of rebuilds. However, if the initial threshold is too high, we will obtain a less detailed CF-tree than is feasible with the available memory.


Optionally, we can allocate a fixed amount of disk space for handling outliers. Outliers are leaf entries of low density that are judged to be unimportant with respect to the overall clustering pattern. When we rebuild the CF-tree by reinserting the old leaf entries, the size of the new CF-tree is reduced in two ways. First, we increase the threshold value, thereby allowing each leaf entry to absorb more points. Second, we treat some leaf entries as potential outliers and write them out to disk. An old leaf entry is considered a potential outlier if it has far fewer data points than average. An increase in the threshold value or a change in the distribution in response to the new data could well mean that the potential outlier no longer qualifies as an outlier. In consequence, the potential outliers are scanned to check if they can be re-absorbed in the tree without causing the tree to grow in size.

Phase 2

Given that certain clustering algorithms perform best when the number of objects is within a certain range, we can group crowded subclusters into larger ones resulting in an overall smaller CF-tree.

Phase 3

Almost any clustering algorithm can be adapted to work on Clustering Features instead of individual data points. For instance, we could use KMeans to cluster our data while still deriving the benefits of BIRCH (i.e., minimized I/O operations).
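scikit-learn exposes this directly: the n_clusters parameter of Birch accepts a clustering estimator, which is then fit on the subcluster centroids read off the CF-tree leaves. The dataset and parameter values below are purely illustrative.

from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=450, centers=6, cluster_std=0.70, random_state=0)

# BIRCH builds the CF-tree, then KMeans clusters the leaf subclusters (Phase 3)
brc = Birch(branching_factor=50, threshold=1.5, n_clusters=KMeans(n_clusters=6, n_init=10))
labels = brc.fit_predict(X)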

Phase 4

Up until now, although the tree may have been rebuilt multiple times, the original data has only been scanned once. Phase 4 involves additional passes over the data to correct inaccuracies caused by the fact that the clustering algorithm is applied to a coarse summary of the data. Phase 4 also provides us with the option of discarding outliers.
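As a minimal sketch of what such a refinement pass could look like, assume we already have the final centroids from Phase 3; one extra scan then reassigns every point to its nearest centroid (the function name is made up, and a real Phase 4 could also discard outliers during this pass).

import numpy as np

def refine_labels(X, centroids):
    # one additional pass over the original data: reassign each point to its
    # nearest final centroid, correcting for the coarseness of the CF summary
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # optionally recompute the centroids from the corrected assignment
    new_centroids = np.vstack([X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                               for k in range(len(centroids))])
    return labels, new_centroids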

Code

Next, we'll apply BIRCH in Python using scikit-learn.

import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()
from sklearn.datasets import make_blobs
from sklearn.cluster import Birch

We use scikit-learn to generate data with nicely defined clusters.

X, clusters = make_blobs(n_samples=450, centers=6, cluster_std=0.70, random_state=0)
plt.scatter(X[:,0], X[:,1], alpha=0.7, edgecolors='b')


Next, we initialize and train our model, using the following parameters:

  • threshold: The radius of the subcluster obtained by merging a new sample and the closest subcluster should be less than the threshold; otherwise, a new subcluster is started.
  • branching_factor: Maximum number of CF subclusters in each node.
  • n_clusters: Number of clusters after the final clustering step, which treats the subclusters from the leaves as new samples. If set to None, the final clustering step is not performed and the subclusters are returned as they are.
brc = Birch(branching_factor=50, n_clusters=None, threshold=1.5)
brc.fit(X)

We use the predict method to obtain the cluster assignment for each data point.

labels = brc.predict(X)

Finally, we plot the data points using a different color for each cluster.

plt.scatter(X[:,0], X[:,1], c=labels, cmap='rainbow', alpha=0.7, edgecolors='b')
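We can also overlay the centroids of the leaf subclusters that BIRCH built; in scikit-learn they are stored in the fitted model's subcluster_centers_ attribute.

plt.scatter(brc.subcluster_centers_[:, 0], brc.subcluster_centers_[:, 1], c='k', marker='x', s=80)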


Final Thoughts

BIRCH provides a clustering method for very large datasets. It makes a large clustering problem tractable by concentrating on densely occupied regions and creating a compact summary. BIRCH can work with any given amount of memory, and its I/O cost is a little more than one scan of the data. Other clustering algorithms can be applied to the subclusters produced by BIRCH.

Sources

  1. Tian Zhang, Raghu Ramakrishnan, and Miron Livny, "BIRCH: An Efficient Data Clustering Method for Very Large Databases": http://www.cs.uvm.edu/~xwu/kdd/BIRCH.pdf

FAQs

What does the BIRCH algorithm do, in a nutshell?

The BIRCH algorithm divides large data into small clusters and tries to retain the maximum amount of information possible. The smaller groups are then clustered for a final output instead of clustering the large dataset directly.

What is the formula used in the BIRCH algorithm?

In standard BIRCH, each clustering feature is the triple CF = (N, LS, SS), where N is the number of points, LS is their linear sum, and SS is their squared sum. Some variants additionally store the threshold in the leaf entries, giving CF-Leaf = (N, LS, SS, T).

What is the difference between BIRCH and K-Means?

K-Means is a partitioning-based clustering technique, whereas BIRCH is a hierarchical clustering method.

How does the BIRCH algorithm work?

The algorithm is built around the CF (clustering feature) tree: it compresses the data into a tree-structured summary made up of CF nodes, where nodes containing several subclusters are called CF subclusters, and the final clusters are then derived from this summary rather than from the raw data.

What are the disadvantages of the BIRCH algorithm?

BIRCH has one major drawback: it can only process metric attributes. A metric attribute is any attribute whose values can be represented in Euclidean space, i.e., no categorical attributes should be present.

What is the complexity of BIRCH clustering?

The time complexity of the algorithm is O(n), where n is the number of objects to be clustered. Experiments have shown linear scalability with respect to the number of objects, along with good clustering quality.

Is BIRCH agglomerative or divisive?

BIRCH first constructs an in-memory CF-tree (a tree data structure). The second stage, global clustering, employs agglomerative hierarchical clustering with a linkage criterion to cluster the small-diameter, dense subclusters represented by the CF vectors in the leaf nodes of the CF-tree, so the global phase is agglomerative rather than divisive.

What are the advantages of BIRCH clustering?

An advantage of BIRCH is its ability to incrementally and dynamically cluster incoming, multi-dimensional metric data points in an attempt to produce the best quality clustering for a given set of resources (memory and time constraints). In most cases, BIRCH only requires a single scan of the database.

What is the threshold in BIRCH clustering?

There are three parameters in this algorithm that need to be tuned. threshold: the maximum radius a subcluster in a leaf node of the CF-tree may have; a new sample is absorbed by its closest subcluster only if the merged subcluster still fits within this radius. branching_factor: the maximum number of CF subclusters in each (internal) node. n_clusters: the number of clusters produced by the final, global clustering step.

What are the parameters of BIRCH clustering?

BIRCH requires three parameters: the branching factor Br, the threshold T, and the cluster count k. As the data points are entered into BIRCH, a height-balanced tree of hierarchical clusters, called the clustering feature tree or CF-tree, is built.

Is BIRCH incremental?

BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints).

What is hierarchical clustering in machine learning?

Hierarchical clustering is a connectivity-based clustering model that groups together data points that are close to each other, based on a measure of similarity or distance. The assumption is that data points that are close to each other are more similar or related than data points that are farther apart.
