3 editions of Effectiveness and efficiency of agglomerative hierarchic clustering in document retrieval found in the catalog.
Thesis (Ph. D.)--Cornell University, 1986. Bibliography: p. 169-177. Photocopy of typescript. Ann Arbor, Mich. : UMI Dissertation Information Service, 1989. 22 cm.
|The Physical Object|
|Pagination|xvi, 71 p.|
|Number of Pages|92|
Grouper adopts a phrase-analysis algorithm called Suffix Tree Clustering (STC), in which snippets sharing the same sequence of words are grouped together.
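The phrase-sharing idea behind STC can be illustrated with a much simpler stand-in. The sketch below does not build a suffix tree and omits the later step in which overlapping groups are merged; it only indexes fixed-length word sequences and keeps those shared by several snippets. The function name `shared_phrase_groups` and the parameters `phrase_len` and `min_docs` are assumptions made for this illustration, not part of Grouper.

```python
from collections import defaultdict


def shared_phrase_groups(snippets, phrase_len=2, min_docs=2):
    """Map each word sequence of length phrase_len to the snippets containing it.

    Simplified stand-in for STC's base clusters: a plain n-gram index,
    not a suffix tree, and without the cluster-merging step.
    """
    groups = defaultdict(set)
    for idx, text in enumerate(snippets):
        words = text.lower().split()
        for i in range(len(words) - phrase_len + 1):
            phrase = " ".join(words[i:i + phrase_len])
            groups[phrase].add(idx)
    # Keep only phrases that occur in at least min_docs different snippets.
    return {p: sorted(ids) for p, ids in groups.items() if len(ids) >= min_docs}


snippets = [
    "suffix tree clustering of web snippets",
    "fast suffix tree construction",
    "clustering of web pages",
]
base_clusters = shared_phrase_groups(snippets)
for phrase, members in sorted(base_clusters.items()):
    print(phrase, members)
```

A real suffix tree finds shared phrases of every length in one pass; the fixed `phrase_len` here trades that generality for brevity.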
Zeng re-formalizes the clustering problem as a salient phrase ranking problem. For web page clustering, some previous works have adapted the K-means and agglomerative hierarchical clustering algorithms [10,12,13,17].
Suffix trees have been studied and used extensively in fundamental string problems such as searching large volumes of biological sequence data, approximate string matching, and text feature extraction in spam email classification. Some widely used similarity measures are the cosine coefficient, which gives the cosine of the angle between two feature vectors, the Jaccard coefficient, the Euclidean distance, the Pearson correlation, and the Dice coefficient, all normalised. Hierarchical algorithms analyse all the relationships between items, which tends to be costly in terms of time and processing power.
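As a rough illustration of the measures listed above, the following sketch implements each one in plain Python. It assumes that documents are given either as equal-length numeric feature vectors (cosine, Euclidean, Pearson) or as sets of terms (Jaccard, Dice); the function names are chosen for this example only.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Size of the intersection over size of the union of two term sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Twice the shared terms over the total number of terms in both sets."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def euclidean(a, b):
    """Straight-line distance between two vectors (a distance, not a similarity)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b) if sd_a and sd_b else 0.0
```

Note that the Euclidean measure grows as documents become less alike, whereas the other four grow as they become more alike, so it has to be inverted or rescaled before being mixed with them.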
The motivation of this paper is to explain the need for clustering in efficient information retrieval, where clustering closely associates documents that are relevant to the same query.
The first problem is finding the similarity among data items. Phase 2: condense the data into a desirable length by building a smaller CF tree. The applications of document clustering can be categorized into two types, online and offline.
In this algorithm, a random data item is first selected and its neighborhood is investigated to determine whether it contains an acceptable number of data points. The disadvantage of partitional algorithms is that the initial choice of clusters is arbitrary and does not necessarily comprise all the actual groups that exist within a data set.
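The neighborhood test described above resembles the core-point check used in density-based clustering such as DBSCAN, although the text does not name the algorithm. A minimal sketch of that single step follows; the radius `eps`, the threshold `min_pts`, and the function names are assumptions introduced for illustration.

```python
import math
import random


def neighbours(points, i, eps):
    """Indices of all other points within distance eps of points[i]."""
    return [
        j for j, q in enumerate(points)
        if j != i and math.dist(points[i], q) <= eps
    ]


def has_dense_neighbourhood(points, i, eps, min_pts):
    """True if points[i] and its neighbours together number at least min_pts."""
    return len(neighbours(points, i, eps)) + 1 >= min_pts


points = [(0, 0), (0, 1), (1, 0), (10, 10)]

# Pick a random starting item, as the description above suggests.
start = random.randrange(len(points))
print(start, has_dense_neighbourhood(points, start, eps=1.5, min_pts=3))
```

A full density-based algorithm would go on to expand a cluster from every item that passes this test and mark the rest as noise; only the initial check is shown here.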
Hierarchical algorithms can be further divided into agglomerative and divisive approaches. Moreover, these algorithms will yield inconsistent results. We can start with the vector space model (VSM), which represents a document as a vector of the terms that appear across the whole document set.
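A small sketch may make the vector space model concrete. It assumes whitespace tokenisation and raw term counts as weights; practical systems usually add stop-word removal, stemming, and a weighting scheme such as tf-idf. The helper name `build_vsm` is hypothetical.

```python
from collections import Counter


def build_vsm(docs):
    """Return the shared vocabulary and one term-count vector per document."""
    tokenised = [doc.lower().split() for doc in docs]
    # The vocabulary is every term that appears anywhere in the document set.
    vocab = sorted({term for tokens in tokenised for term in tokens})
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Terms absent from this document get a count of 0.
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = [
    "web page clustering",
    "document clustering and retrieval",
    "web document retrieval",
]
vocab, vectors = build_vsm(docs)
print(vocab)
for vec in vectors:
    print(vec)
```

Because every document is expressed over the same vocabulary, the resulting vectors have equal length and can be compared directly with a similarity measure.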