I have done one project on document similarity. It was used to find content duplicated from other
documents, and also to group relevant documents of the same type based on a similarity measurement.
The similarity was computed from a Lucene index together with a similarity algorithm.
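
To illustrate the general idea only (this is not the project's code), here is a minimal sketch in plain Python. It assumes cosine similarity over term-frequency vectors, since the post does not say which similarity algorithm was used, and it replaces the Lucene index with a simple in-memory tokenizer. The function names `tokenize`, `cosine_similarity`, and `group_similar`, and the 0.8 threshold, are hypothetical choices for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity of the term-frequency vectors of two texts (0.0 to 1.0)."""
    tf_a, tf_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Dot product over the terms of the first text; Counter returns 0 for missing terms.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def group_similar(docs, threshold=0.8):
    """Greedy grouping: add each document to the first group whose
    first member is at least `threshold` similar, else start a new group."""
    groups = []
    for doc in docs:
        for group in groups:
            if cosine_similarity(doc, group[0]) >= threshold:
                group.append(doc)
                break
        else:
            groups.append([doc])
    return groups
```

A pair of documents scoring near 1.0 can be flagged as likely duplicates, while `group_similar` covers the grouping use case. The threshold is an assumption and would need tuning on real data; a real index such as Lucene would also typically weight terms (for example by inverse document frequency) rather than use raw counts.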