Algorithms

Clustering Algorithms

The k-means algorithm goes as follows:

1) Place k-number of points as the initial group centers in the space represented by the objects being clustered

2) Assign each object to the group that has the closest center

3)When all objects have been assigned, recalculate the positions of the k-number of cluster centers

4)Repeat steps 2 and 3 until the centers no longer move. This procedure produces a separation of the objects into groups from which the metric to be minimized can be calculated

The COP-KMeans Clustering algorithm goes as follows:

Let there be a data set D, a set of must-link constraints Con=D x D, and a set of cannot-link constraints ConD x D.

1) Let C1...Ck be the initial cluster centers.

2) For each point di in D, assign it to the closest cluster Cj such that VIOLATE_CONSTRAINTS(di, Cj, Con&sube, Con&ne) is false. If no such cluster exists, fail (return{}).

3) For each cluster Ci, update its center by averaging all of the points dj that have been assigned to it.

4) Iterate between (2) and (3) until convergence.

5) Return {C1...Ck}.

VIOLATE-CONSTRAINTS(data point d, cluster C, must-link constraints Con=D x D, cannot-link constraints ConD x D)

1) For each (d, d=) ∈ Con= : If d=C, return true.

2) For each (d, d) ∈ Con : If dC, return true.

3) Otherwise, return false.

The PCKMeans algorithm is similar to COP-KMeans. The main difference is that this algorithm can violate the constraints, but it has to penalize itself for doing so. It tries to come up with a good cluster formation while minimizing the penalty that it incurs.