Algorithms

Clustering Algorithms

The k-means algorithm goes as follows:

1) Place k-number of points as the initial group centers in the space represented by the objects being clustered

2) Assign each object to the group that has the closest center

3)When all objects have been assigned, recalculate the positions of the k-number of cluster centers

4)Repeat steps 2 and 3 until the centers no longer move. This procedure produces a separation of the objects into groups from which the metric to be minimized can be calculated

The COP-KMeans Clustering algorithm goes as follows:

Let there be a data set D, a set of must-link constraints Con₌ ⊆ D x D, and a set of cannot-link constraints Con_≠ ⊆ D x D.

1) Let C₁...C_k be the initial cluster centers.

2) For each point d_i in D, assign it to the closest cluster C_j such that VIOLATE_CONSTRAINTS(d_i, C_j, Con_&sube, Con_&ne) is false. If no such cluster exists, fail (return{}).

3) For each cluster C_i, update its center by averaging all of the points d_j that have been assigned to it.

4) Iterate between (2) and (3) until convergence.

5) Return {C₁...C_k}.

VIOLATE-CONSTRAINTS(data point d, cluster C, must-link constraints Con₌ ⊆ D x D, cannot-link constraints Con_≠ ⊆ D x D)

1) For each (d, d₌) ∈ Con₌ : If d₌ ∉ C, return true.

2) For each (d, d_≠) ∈ Con_≠ : If d_≠ ∈ C, return true.

3) Otherwise, return false.

The PCKMeans algorithm is similar to COP-KMeans. The main difference is that this algorithm can violate the constraints, but it has to penalize itself for doing so. It tries to come up with a good cluster formation while minimizing the penalty that it incurs.