1) Place k-number
of points as the initial group centers in the space represented
by the objects being clustered
2) Assign each
object to the group that has the closest center
3)When all
objects have been assigned, recalculate the positions of the k-number
of cluster centers
4)Repeat steps
2 and 3 until the centers no longer move. This procedure produces
a separation of the objects into groups from which the metric
to be minimized can be calculated
Let there be
a data set D, a set of must-link constraints Con=
⊆ D x D, and a set of cannot-link constraints
Con≠ ⊆ D x D.
1) Let C1...Ck
be the initial cluster centers.
2) For each
point di in D, assign it to the closest
cluster Cj such that VIOLATE_CONSTRAINTS(di,
Cj, Con&sube, Con&ne)
is false. If no such cluster exists, fail (return{}).
3) For each
cluster Ci, update its center by averaging
all of the points dj that have been assigned
to it.
4) Iterate
between (2) and (3) until convergence.
5) Return {C1...Ck}.
VIOLATE-CONSTRAINTS(data
point d, cluster C, must-link constraints Con=
⊆ D x D, cannot-link constraints Con≠
⊆ D x D)
1) For each
(d, d=) ∈ Con=
: If d= ∉ C, return true.
2) For each
(d, d≠) ∈ Con≠
: If d≠ ∈ C, return true.
3) Otherwise,
return false.