
Document Similarity

ibr
2014-10-28 04:31:53
Category: Java Development, Java

Description

This project measures document similarity. It can be used to find content that has been duplicated from other documents and to group related documents of the same type according to a similarity measure. The similarity computation is based on a Lucene index together with the similarity algorithms we used.
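As a rough illustration of the kind of similarity measurement described above, here is a minimal sketch of cosine and Jaccard similarity over simple term counts. It is not taken from the project: the class name `SimilaritySketch` and all of its methods are hypothetical, the tokenizer is a plain regular-expression split, and Lucene is not used at all (the project itself is described as working from a Lucene index).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative only: hypothetical names, not the project's own similarity classes. */
final class SimilaritySketch {

    /** Lowercases the text, splits on non-letter/non-digit runs, and counts each term. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Euclidean length of a term-frequency vector. */
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine of the angle between two term-frequency vectors; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    /** Jaccard index of two term sets: |A and B| / |A or B|; 0.0 if both are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> first = termFrequencies("the quick brown fox");
        Map<String, Integer> second = termFrequencies("the quick red fox");
        System.out.println("cosine  = " + cosine(first, second));
        System.out.println("jaccard = " + jaccard(first.keySet(), second.keySet()));
    }
}
```

Scores near 1 suggest near-duplicate content, while a looser threshold could be used to group related documents. Any such threshold is a tuning choice and is not specified by the description.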


File list

Name  Size  Date
built-jar.properties  75.00 B  26-07-13 19:15
<.netbeans_automatic_build>  0.00 B  29-07-13 12:49
<.netbeans_update_resources>  0.00 B  29-07-13 12:49
AbbreviationRecognizer.class  5.00 kB  29-07-13 12:49
AbstractSimilarity.class  1.71 kB  29-07-13 12:49
BoundaryRecognizer.class  2.31 kB  29-07-13 12:49
ContentWordRecognizer.class  2.71 kB  29-07-13 12:49
CosineSimilarity.class  1.61 kB  29-07-13 12:49
CosineSimilarity_1.class  4.67 kB  29-07-13 12:49
DocumentVector.class  2.07 kB  29-07-13 12:49
Document_Similarity.class  1.06 kB  29-07-13 12:55
DocVector.class  1.70 kB  29-07-13 12:49
IdfIndexer.class  2.06 kB  29-07-13 12:49
IRecognizer.class  411.00 B  29-07-13 12:49
JaccardSimilarity.class  1.02 kB  29-07-13 12:49
LsiIndexer.class  2.45 kB  29-07-13 12:49
NumericTokenFilter.class  1.31 kB  29-07-13 12:49
PageRanker.class  2.71 kB  29-07-13 12:49
PhraseRecognizer.class  4.18 kB  29-07-13 12:49
RecognizerChain.class  1.79 kB  29-07-13 12:49
RegexBasedWordBreakIterator.rs  52.00 B  29-07-13 12:49
Searcher$1.class  1.23 kB  29-07-13 12:49
Searcher$SearchResult.class  634.00 B  29-07-13 12:49
Searcher.class  4.64 kB  29-07-13 12:49
SentenceTokenizer.class  1.34 kB  29-07-13 12:49
SimilarityTest.class  6.65 kB  29-07-13 12:49
StopwordRecognizer.class  3.03 kB  29-07-13 12:49
TermVectorBasedSimilarityTest$DocVector.class  2.04 kB  29-07-13 12:49
TermVectorBasedSimilarityTest.class  4.42 kB  29-07-13 12:49
TfIndexer.class  1.56 kB  29-07-13 12:49
Token.class  1.34 kB  29-07-13 12:49
TokenType.class  1.70 kB  29-07-13 12:49
VectorGenerator.class  6.78 kB  29-07-13 12:49
WordTokenizer.class  2.79 kB  29-07-13 12:49
nytimes-obama.txt  2.96 kB  29-07-13 12:49
paragraph_break_rules.txt  3.86 kB  29-07-13 12:49
resample-nb.txt  3.06 kB  29-07-13 12:49
sentence_break_rules.txt  3.73 kB  29-07-13 12:49
stopwords.txt  3.84 kB  29-07-13 12:49
word_break_rules.txt  5.78 kB  29-07-13 12:49
Searcher$1.class  1.28 kB  29-07-13 12:49
Searcher$SearchResult.class  684.00 B  29-07-13 12:49
RegexBasedWordBreakIterator.class  3.09 kB  29-07-13 12:49
data.adj  3.01 MB  29-07-13 12:49
data.adv  504.59 kB  29-07-13 12:49
data.noun  14.59 MB  29-07-13 12:49
data.verb  2.64 MB  29-07-13 12:49
index.adj  804.81 kB  29-07-13 12:49
index.adv  159.00 kB  29-07-13 12:49
index.noun  4.56 MB  29-07-13 12:49
index.sense  6.96 MB  29-07-13 12:49
index.verb  511.70 kB  29-07-13 12:49
indexing_sample_data.txt  378.00 B  29-07-13 12:57
resample-nb.txt  3.06 kB  29-07-13 12:49
stopwords.txt  3.84 kB  29-07-13 12:49
word_break_rules.txt  5.78 kB  29-07-13 12:49
build.xml  3.66 kB  26-07-13 16:15
README.TXT  1.30 kB  26-07-13 19:15
manifest.mf  85.00 B  26-07-13 16:15
build-impl.xml  77.10 kB  26-07-13 16:15
genfiles.properties  475.00 B  26-07-13 16:15
private.properties  133.00 B  26-07-13 16:15
private.xml  230.00 B  26-07-13 19:17
project.properties  6.41 kB  26-07-13 16:20
project.xml  527.00 B  26-07-13 16:15
AbbreviationRecognizer.java  4.79 kB  26-07-13 16:46
AbstractSimilarity.java  1.31 kB  26-07-13 16:40
BoundaryRecognizer.java  1.85 kB  26-07-13 16:46
ContentWordRecognizer.java  1.63 kB  26-07-13 18:20
CosineSimilarity.java  1.62 kB  26-07-13 16:41
CosineSimilarity_1.java  2.82 kB  26-07-13 19:07
DocumentVector.java  959.00 B  26-07-13 19:08
Document_Similarity.java  753.00 B  29-07-13 12:55
DocVector.java  1.08 kB  26-07-13 19:06
IdfIndexer.java  2.12 kB  26-07-13 16:41
IRecognizer.java  774.00 B  26-07-13 16:45
JaccardSimilarity.java  878.00 B  26-07-13 16:41
LsiIndexer.java  2.27 kB  26-07-13 16:42
NumericTokenFilter.java  1.23 kB  26-07-13 16:42
PageRanker.java  2.21 kB  26-07-13 16:23
PhraseRecognizer.java  2.84 kB  26-07-13 16:45
RecognizerChain.java  1.45 kB  26-07-13 18:23
RegexBasedWordBreakIterator.java  5.11 kB  26-07-13 16:19
Searcher.java  3.02 kB  26-07-13 16:43
SentenceTokenizer.java  1.18 kB  26-07-13 16:43
SimilarityTest.java  5.65 kB  26-07-13 18:29
StopwordRecognizer.java  2.06 kB  26-07-13 16:45
TermVectorBasedSimilarityTest.java  3.24 kB  26-07-13 18:36
TfIndexer.java  1.29 kB  26-07-13 16:44
Token.java  779.00 B  26-07-13 16:44
TokenType.java  379.00 B  26-07-13 16:44
VectorGenerator.java  4.79 kB  26-07-13 16:44
WordTokenizer.java  2.17 kB  26-07-13 16:44
nytimes-obama.txt  2.96 kB  26-07-13 16:55
paragraph_break_rules.txt  3.86 kB  26-07-13 16:55
resample-nb.txt  3.06 kB  26-07-13 16:55
sentence_break_rules.txt  3.73 kB  26-07-13 16:55
stopwords.txt  3.84 kB  26-07-13 16:55
word_break_rules.txt  5.78 kB  26-07-13 16:55
data.adj  3.01 MB  26-07-13 18:19
data.adv  504.59 kB  26-07-13 18:19
data.noun  14.59 MB  26-07-13 18:19
data.verb  2.64 MB  26-07-13 18:19
index.adj  804.81 kB  26-07-13 18:19
index.adv  159.00 kB  26-07-13 18:19
index.noun  4.56 MB  26-07-13 18:19
index.sense  6.96 MB  26-07-13 18:19
index.verb  511.70 kB  26-07-13 18:19
indexing_sample_data.txt  376.00 B  29-07-13 12:57
resample-nb.txt  3.06 kB  26-07-13 16:50
stopwords.txt  3.84 kB  26-07-13 16:50
word_break_rules.txt  5.78 kB  26-07-13 16:50
<matrix>  0.00 B  29-07-13 12:49
<similarity>  0.00 B  29-07-13 12:49
<tokenizers>  0.00 B  29-07-13 12:49
<jtmt>  0.00 B  26-07-13 19:15
<dict>  0.00 B  26-07-13 19:15
<data>  0.00 B  26-07-13 19:15
<resources>  0.00 B  26-07-13 19:15
<sf>  0.00 B  26-07-13 19:15
<wordnet-3.0>  0.00 B  26-07-13 19:15
<resources>  0.00 B  26-07-13 19:15
<dict>  0.00 B  26-07-13 18:19
<data>  0.00 B  26-07-13 16:50
<document_similarity>  0.00 B  29-07-13 12:55
<main>  0.00 B  26-07-13 19:15
<net>  0.00 B  26-07-13 19:15
<opt>  0.00 B  26-07-13 19:15
<test>  0.00 B  26-07-13 19:15
<ap-source-output>  0.00 B  26-07-13 19:15
<resources>  0.00 B  26-07-13 16:55
<wordnet-3.0>  0.00 B  26-07-13 18:19
<resources>  0.00 B  26-07-13 16:49
<classes>  0.00 B  29-07-13 12:49
<empty>  0.00 B  26-07-13 19:15
<generated-sources>  0.00 B  26-07-13 19:15
<private>  0.00 B  26-07-13 19:17
<document_similarity>  0.00 B  26-07-13 19:06
<main>  0.00 B  26-07-13 16:55
<opt>  0.00 B  26-07-13 18:19
<test>  0.00 B  26-07-13 16:49
<build>  0.00 B  26-07-13 19:15
<dist>  0.00 B  26-07-13 19:15
<nbproject>  0.00 B  26-07-13 16:15
<src>  0.00 B  26-07-13 18:19
<Document_Similarity>  0.00 B  26-07-13 19:15
...
