Identifying Clusters from Positive Data

dc.contributor.authorJohn CASEen_US
dc.contributor.authorSanjay JAINen_US
dc.contributor.authorEric MARTINen_US
dc.contributor.authorArun SHARMAen_US
dc.contributor.authorFrank STEPHANen_US
dc.date.accessioned2004-10-21T14:28:52Zen_US
dc.date.accessioned2017-01-23T06:59:50Z
dc.date.available2004-10-21T14:28:52Zen_US
dc.date.available2017-01-23T06:59:50Z
dc.date.issued2004-07-01T00:00:00Zen_US
dc.description.abstractThe present work studies clustering from an abstract point of view and investigates its properties in the framework of inductive inference. Any class S considered is given by a numbering A0,A1,... of nonempty subsets of N or Rk which is also used as a hypothesis space. A clustering task is a finite and nonempty set of indices of pairwise disjoint sets. The class S is said to be clusterable if there is an algorithm which, for every clustering task I, converges in the limit on any text for yi I Ai to a finite set J of indices of pairwise disjoint clusters such that y j J Aj = yi I Ai. A class is called semiclusterable if there is such an algorithm which finds a J with the last condition relaxed to yj J Aj { yi I Ai. The relationship between natural topological properties and clusterability is investigated. Topological properties can provide sufficient or necessary conditions for clusterability but they cannot characterize it. On one hand, many interesting conditions make use of both the topological structure of the class and a well-chosen numbering. On the other hand, the clusterability of a class does not depend on the decision which numbering of the class is used as a hypothesis space for the clusterer. These ideas are demonstrated in the context of geometrically defined classes. Clustering of many of these classes requires besides the text for the clustering task some additional information: the class of convex hulls of finitely many points in a rational vector space can be clustered with the number of clusters as additional information. Interestingly, the class of polygons (together with their interiors) is clusterable if the number of clusters and the overall number of vertices of these clusters is given to the clusterer as additional information. Intriguingly this additional information is not sufficient for classes including figures with holes. While some classes are unclusterable due to their topological structure, others are only computationally intractable. An oracle might permit clustering all computationally intractable tasks but fail on some classes which are topologically difficult. It is shown that an oracle E permits clustering all computationally difficult classes iff E dT K and E' dT K''. Furthermore, no 1-generic oracle below K and no 2-generic oracle permits clustering any class which is not clusterable without an oracle.en_US
dc.format.extent885043 bytesen_US
dc.format.mimetypeapplication/pdfen_US
dc.identifier.urihttps://dl.comp.nus.edu.sg/xmlui/handle/1900.100/1449en_US
dc.language.isoenen_US
dc.relation.ispartofseriesTRA7/04en_US
dc.titleIdentifying Clusters from Positive Dataen_US
dc.typeTechnical Reporten_US
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
report.pdf
Size:
864.3 KB
Format:
Adobe Portable Document Format
Description:
License bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
1.52 KB
Format:
Plain Text
Description: