In today’s class, the professor discussed the instability of DBSCAN relative to K-means. The scenarios below highlight situations in which DBSCAN can be unstable.
Sensitivity to Density Variations:
DBSCAN’s stability is affected by variations in data point density. When density differs significantly across regions of the dataset, clusters of varying sizes and shapes emerge, and it becomes difficult to pick a single set of parameters (the neighborhood radius ε and the minimum point count) that defines clusters well everywhere: values tuned for a dense region will fragment a sparse one, and vice versa.
Conversely, K-means assumes roughly spherical, similarly sized clusters, so it can behave more consistently when the clusters share similar densities and shapes.
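To see the density problem concretely, here is a small sketch (the data and parameter values are illustrative, not from the lecture): two Gaussian blobs with very different spreads are clustered with an ε tuned for the dense blob, which causes DBSCAN to discard most of the sparse blob as noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Dense blob: 200 tightly packed points; sparse blob: 200 widely spread points
dense = rng.normal(loc=(0.0, 0.0), scale=0.2, size=(200, 2))
sparse = rng.normal(loc=(10.0, 10.0), scale=3.0, size=(200, 2))
X = np.vstack([dense, sparse])

# eps=0.3 suits the dense blob, but almost no sparse point has
# min_samples neighbors within that radius, so DBSCAN labels them -1 (noise)
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

n_noise = int(np.sum(labels == -1))          # mostly points from the sparse blob
n_dense_clustered = int(np.sum(labels[:200] >= 0))  # dense blob is clustered fine
```

No single ε fixes this: raising it to capture the sparse blob risks merging structure inside the dense one.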
Sensitivity to Parameter Choices:
DBSCAN requires two hyperparameters: ε (the radius defining a data point’s neighborhood) and the minimum number of points required to form a dense region. These choices strongly influence the resulting clusters.
K-means also requires a parameter, the number of clusters K, but K is more intuitive to set because it directly states how many clusters are desired. DBSCAN’s parameters are more abstract, which makes its output sensitive to the chosen values.
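The sensitivity to ε can be demonstrated directly; this sketch (dataset and ε values chosen for illustration) runs DBSCAN with three different radii on the same data and counts the clusters found each time.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Three well-behaved Gaussian blobs
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

clusters_per_eps = {}
for eps in (0.1, 0.8, 3.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    # Noise points get label -1; exclude them from the cluster count
    clusters_per_eps[eps] = len(set(labels)) - (1 if -1 in labels else 0)
```

A too-small ε dissolves everything into noise, a moderate ε recovers the blobs, and a large ε can merge them, so the cluster count changes with the parameter even though the data does not.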
Boundary Points and Noise:
DBSCAN explicitly identifies noise points, data points that belong to no cluster, and handles outliers well. However, the assignment of boundary points (those on the periphery of a cluster) can be arbitrary: a border point that is density-reachable from two clusters is assigned to whichever cluster the algorithm expands first, so its label can depend on processing order.
In K-means, every point must be assigned to some cluster, so a point near the boundary shared by two clusters may flip between them across runs or under small perturbations of the data, producing instability.
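The contrasting treatment of outliers can be shown with a minimal sketch (the blob positions and the outlier are made up for illustration): DBSCAN labels a far-away point as noise (-1), while K-means must assign it to one of the two clusters.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

rng = np.random.default_rng(1)
# Two compact blobs plus one distant outlier
blob_a = rng.normal((0.0, 0.0), 0.3, size=(50, 2))
blob_b = rng.normal((5.0, 5.0), 0.3, size=(50, 2))
outlier = np.array([[20.0, 20.0]])
X = np.vstack([blob_a, blob_b, outlier])

# DBSCAN: the outlier has no neighbors within eps, so it becomes noise
db_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# K-means: the outlier is forced into the nearest cluster (blob_b's)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

DBSCAN reports the outlier honestly as noise; K-means silently absorbs it, which also drags that cluster’s centroid toward the outlier.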
Varying Cluster Shapes:
DBSCAN handles clusters of arbitrary shape and irregular boundaries well. K-means, by contrast, presupposes roughly spherical clusters and is therefore more stable when the data conforms to that assumption.
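The classic illustration of this difference is the two-moons dataset; the sketch below (ε and noise level are illustrative choices) compares both algorithms against the true labels using the adjusted Rand index.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved crescent-shaped clusters, which are not spherical
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

# DBSCAN follows the density-connected crescents
db = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# K-means splits the plane with a straight boundary, cutting both moons
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

db_score = adjusted_rand_score(y, db)  # near 1.0: moons recovered
km_score = adjusted_rand_score(y, km)  # much lower: moons cut apart
```

The adjusted Rand index is 1.0 for a perfect recovery of the true labels, so the gap between the two scores quantifies how much the spherical-cluster assumption costs K-means here.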