Abstract: Authorship attribution aims to identify the author of an unknown or anonymous document. Most earlier studies treated authorship attribution as a multi-class, single-label text classification problem. In many applications, however, such labelled data are difficult or even impossible to obtain. Unsupervised attribution models are therefore needed that can estimate similarities and differences in the personal writing styles of authors. This paper treats authorship attribution as a clustering task and experiments with several unsupervised clustering algorithms, namely K-means, Mini Batch K-means and Ward hierarchical clustering. Our authorship clustering algorithm achieves 97% clustering accuracy on the English news articles of the C50 data set.

Keywords: authorship clustering; unsupervised algorithms; C50 data set.
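The abstract reports a clustering accuracy without defining it. A common definition, assumed here and not confirmed by the source, counts the documents that are correctly attributed under the best one-to-one mapping from cluster ids to author labels. The minimal Python sketch below uses a hypothetical function name, `clustering_accuracy`, and finds that mapping by brute force over permutations. This is only practical for a handful of clusters. For 50 authors, as in C50, an assignment solver implementing the Hungarian method, such as `scipy.optimize.linear_sum_assignment`, would be needed instead.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Hypothetical helper: fraction of documents correctly attributed under
    the best one-to-one mapping of cluster ids to author labels.
    Brute force over permutations, so suitable for small k only."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have equal length")
    authors = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # Count how many documents of each author fall into each cluster.
    pairs = Counter(zip(cluster_ids, true_labels))
    # If there are more clusters than authors, pad with None so that some
    # clusters can stay unmatched (they then contribute 0 hits).
    padded = authors + [None] * max(0, len(clusters) - len(authors))
    best = 0
    # Try every assignment of distinct authors to clusters and keep the best.
    for perm in permutations(padded, len(clusters)):
        hits = sum(pairs[(c, a)] for c, a in zip(clusters, perm))
        best = max(best, hits)
    return best / len(true_labels)


if __name__ == "__main__":
    authors = ["A", "A", "B", "B", "C", "C"]
    predicted = [1, 1, 0, 0, 2, 0]
    print(clustering_accuracy(authors, predicted))
```

In this toy example, the best mapping is cluster 1 to A, cluster 0 to B and cluster 2 to C, which attributes 5 of the 6 documents correctly.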