DBSCAN (DENSITY BASED SPATIAL CLUSTERING OF APPLICATION WITH NOISE)

DEFINITIONS

Define as the radius of an hypersphere and a threshold value

  • CORE

    a point with points inside is hypersphere of radius

  • BORDER

    a point with points inside is hypersphere of radius

  • NEIGHBORHOOD

    two points are in neighborhood with each other when they are inside each hypersphere

  • DIRECT DENSITY REACHABILITY

    a point is in direct density reachable with a point when is core and is in neighborhood

  • DENSITY REACHABILITY

    a point is in density reachable with a point when and are connected by a series of direct density reachable points

  • DENSITY CONNECTION

    a point is density connected to point if there is a point such that and are density reachable from

ALGORITHM

Algorithm 1 DBSCAN
Require: SetOfPoints: UNCLASSIfIED points
Require: Eps, MinPts
	ClusterId <- nextId(NOISE);
	for i = 1 to SetOfPoints.size do
		Point <- SetOfPoints.get(i)
		if Point.ClId = UNCLASSIfIED then
			if ExpandCluster(SetOfPoints, Point, ClusterId, Eps, MinPts) then
				ClusterId <- nextId(ClusterId)
Ensure: SetOfPoints
ExpandCluster(SetOfPoints, Point, ClId, Eps, MinPts) : Boolean;
	seeds:=SetOfPoints.regionQuery(Point,Eps);
	If seeds.size<MinPts THEN # no core point
		SetOfPoint.changeClId(Point,NOISE);
		RETURN False;
	ELSE
# all points in seeds are density-reachable from Point
		SetOfPoints.changeClIds(seeds,ClId);
		seeds.delete(Point);
		WHILE seeds <> Empty DO
			currentP := seeds.first();
			result := SetOfPoints.regionQuery(currentP,Eps);
			If result.size >= MinPts THEN
				For i FROM 1 TO result.size DO
					resultP := result.get(i);
					If resultP.ClId IN {UNCLASSIfIED, NOISE} THEN
						If resultP.ClId = UNCLASSIfIED THEN seeds.append(resultP);
						END If;
						SetOfPoints.changeClId(resultP,ClId);
					END If; # UNCLASSIfIED or NOISE
				END For;
			END If; # result.size >= MinPts
			seeds.delete(currentP);
		END WHILE; # seeds <> Empty
		RETURN True;
	END If
END; # ExpandCluster

PARAMETERS TO TUNE

and are the parameter that need to be tuned, a good value for can be where is the number of dimensions

PREVIOUS