Natural language processing toolkit developed at Fudan University

FNLP is a toolkit developed mainly for Chinese natural language processing. It includes the machine learning algorithms and data sets needed for these tasks, and the toolkit, including its data sets, is released under the LGPL 3.0 license. Its functions are as follows: information retrieval: Clustering in Chine...

Slope One algorithms

A Java implementation of the Slope One algorithms for data mining and knowledge discovery. Data sets for analysis are also provided....
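To illustrate how a Slope One predictor works, here is a minimal sketch of the weighted Slope One variant in Java. It is not the code from the package described above. The class name `SlopeOne`, the in-memory `user -> (item -> rating)` map, and the `predict` method are assumptions made for this example. For every item pair it stores the average rating difference and the number of users who rated both items. It then predicts a rating as the co-rating-weighted average of (difference + the user's known rating).

```java
import java.util.HashMap;
import java.util.Map;

/** Weighted Slope One predictor (an illustrative sketch, not the packaged implementation). */
final class SlopeOne {
    // diff.get(i).get(j): average of (rating of i - rating of j) over users who rated both
    private final Map<String, Map<String, Double>> diff = new HashMap<>();
    // freq.get(i).get(j): number of users who rated both i and j
    private final Map<String, Map<String, Integer>> freq = new HashMap<>();

    /** @param ratings user -> (item -> rating) */
    SlopeOne(Map<String, Map<String, Double>> ratings) {
        for (Map<String, Double> userRatings : ratings.values()) {
            for (Map.Entry<String, Double> a : userRatings.entrySet()) {
                for (Map.Entry<String, Double> b : userRatings.entrySet()) {
                    if (a.getKey().equals(b.getKey())) continue;
                    diff.computeIfAbsent(a.getKey(), key -> new HashMap<>())
                        .merge(b.getKey(), a.getValue() - b.getValue(), Double::sum);
                    freq.computeIfAbsent(a.getKey(), key -> new HashMap<>())
                        .merge(b.getKey(), 1, Integer::sum);
                }
            }
        }
        // turn the summed differences into averages
        for (Map.Entry<String, Map<String, Double>> e : diff.entrySet()) {
            Map<String, Integer> counts = freq.get(e.getKey());
            e.getValue().replaceAll((other, sum) -> sum / counts.get(other));
        }
    }

    /** Predicts a rating for item from one user's known ratings; NaN if nothing overlaps. */
    double predict(Map<String, Double> userRatings, String item) {
        Map<String, Double> d = diff.get(item);
        Map<String, Integer> f = freq.get(item);
        if (d == null) return Double.NaN;
        double num = 0;
        int den = 0;
        for (Map.Entry<String, Double> r : userRatings.entrySet()) {
            Integer c = f.get(r.getKey());
            if (c == null) continue; // no user rated both this item and the target item
            num += (d.get(r.getKey()) + r.getValue()) * c;
            den += c;
        }
        return den == 0 ? Double.NaN : num / den;
    }
}
```

For example, suppose alice rated A=5, B=3, C=2, bob rated A=3, B=4, and carol rated B=2, C=5. Predicting bob's rating for C then gives ((-3 + 3)·1 + (1 + 4)·2) / 3 ≈ 3.33.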

Apriori and FP-growth source code and output

The program outputs not only the frequent patterns but also their support (the absolute value, i.e. the support count). Each line of the output file has the format "abc: 1000", which means that the pattern abc has a support count of 1000...
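The following is a minimal level-wise Apriori sketch that writes its results in the output format described above. It assumes each item is a single character and each transaction is a string such as "abc". The class `Apriori` and its methods are hypothetical and are not taken from the original source. FP-growth is not shown.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Level-wise Apriori over single-character items (a sketch, not the original source). */
final class Apriori {
    /** Returns every frequent itemset (items sorted, e.g. "abc") mapped to its support count. */
    static Map<String, Integer> frequentItemsets(List<String> transactions, int minSupport) {
        List<Set<Character>> txs = new ArrayList<>();
        for (String t : transactions) {
            Set<Character> items = new HashSet<>();
            for (char c : t.toCharArray()) items.add(c);
            txs.add(items);
        }
        // level-1 candidates: every distinct item, kept in sorted order
        Set<String> candidates = new TreeSet<>();
        for (Set<Character> tx : txs) for (char c : tx) candidates.add(String.valueOf(c));

        Map<String, Integer> result = new TreeMap<>();
        while (!candidates.isEmpty()) {
            List<String> frequent = new ArrayList<>();
            for (String cand : candidates) {
                int count = 0;
                for (Set<Character> tx : txs) if (containsAll(tx, cand)) count++;
                if (count >= minSupport) {
                    result.put(cand, count);
                    frequent.add(cand);
                }
            }
            candidates = nextCandidates(frequent);
        }
        return result;
    }

    private static boolean containsAll(Set<Character> tx, String itemset) {
        for (char c : itemset.toCharArray()) if (!tx.contains(c)) return false;
        return true;
    }

    /** Joins sorted k-itemsets that share their first k-1 items, then prunes by the Apriori property. */
    private static Set<String> nextCandidates(List<String> frequent) {
        Set<String> frequentSet = new HashSet<>(frequent);
        Set<String> next = new TreeSet<>();
        for (int i = 0; i < frequent.size(); i++) {
            for (int j = i + 1; j < frequent.size(); j++) {
                String a = frequent.get(i), b = frequent.get(j);
                int k = a.length();
                if (!a.regionMatches(0, b, 0, k - 1)) continue;
                // a < b with equal prefixes, so appending b's last item keeps the itemset sorted
                String cand = a + b.charAt(k - 1);
                if (allSubsetsFrequent(cand, frequentSet)) next.add(cand);
            }
        }
        return next;
    }

    /** Every (k-1)-subset of a frequent k-itemset must itself be frequent. */
    private static boolean allSubsetsFrequent(String cand, Set<String> frequentSet) {
        for (int i = 0; i < cand.length(); i++) {
            if (!frequentSet.contains(cand.substring(0, i) + cand.substring(i + 1))) return false;
        }
        return true;
    }

    /** Formats one line per itemset as "itemset: supportCount". */
    static String format(Map<String, Integer> itemsets) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : itemsets.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

With the transactions abc, ab, ac, bc, abc and a minimum support count of 2, every non-empty subset of {a, b, c} is frequent. `format` then prints lines such as `abc: 2` and `ab: 3`.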

Internal (in-memory) and external sorting in Java

A text file of 10,000,000 records is generated, and each record is 100 bytes long. The experiments consider only one attribute A of each record, which is assumed to be an integer. Records are packed into blocks in an unspanned manner: a record never crosses a block boundary, so any space left in a block that is too small for a whole record stays unused. Block si...
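The setup above is the standard scenario for a two-phase external merge sort. The sketch below assumes the sort key (attribute A) has already been extracted to a text file with one integer per line. It also assumes that `runSize` stands in for the number of keys that fit in memory. It does not model the 100-byte records or the block layout. The class and method names are hypothetical. Phase 1 sorts memory-sized batches with `Arrays.sort` and writes each batch as a run file. Phase 2 merges all runs with a min-heap.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Two-phase external merge sort of integer keys, one per line (a sketch). */
final class ExternalSort {
    static void sort(Path input, Path output, int runSize) {
        try {
            List<Path> runs = createSortedRuns(input, runSize);
            mergeRuns(runs, output);
            for (Path run : runs) Files.deleteIfExists(run);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Phase 1: read up to runSize keys, sort them in memory, write each batch as a run file.
    private static List<Path> createSortedRuns(Path input, int runSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            int[] buf = new int[runSize];
            int n;
            while ((n = fill(in, buf)) > 0) {
                Arrays.sort(buf, 0, n);
                Path run = Files.createTempFile("run", ".txt");
                try (BufferedWriter out = Files.newBufferedWriter(run, StandardCharsets.UTF_8)) {
                    for (int i = 0; i < n; i++) { out.write(Integer.toString(buf[i])); out.newLine(); }
                }
                runs.add(run);
            }
        }
        return runs;
    }

    private static int fill(BufferedReader in, int[] buf) throws IOException {
        int n = 0;
        String line;
        while (n < buf.length && (line = in.readLine()) != null) {
            if (!line.trim().isEmpty()) buf[n++] = Integer.parseInt(line.trim());
        }
        return n;
    }

    // Phase 2: k-way merge; the min-heap holds {value, runIndex} for the head of every run.
    private static void mergeRuns(List<Path> runs, Path output) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
            for (int i = 0; i < runs.size(); i++) {
                BufferedReader r = Files.newBufferedReader(runs.get(i), StandardCharsets.UTF_8);
                readers.add(r);
                String line = r.readLine();
                if (line != null) heap.add(new int[] {Integer.parseInt(line), i});
            }
            while (!heap.isEmpty()) {
                int[] top = heap.poll();
                out.write(Integer.toString(top[0]));
                out.newLine();
                String line = readers.get(top[1]).readLine();
                if (line != null) heap.add(new int[] {Integer.parseInt(line), top[1]});
            }
        } finally {
            for (BufferedReader r : readers) r.close();
        }
    }

    /** Convenience wrapper: writes values to a temp file, sorts it externally, reads the result back. */
    static List<Integer> sortValues(List<Integer> values, int runSize) {
        try {
            Path in = Files.createTempFile("input", ".txt");
            Path out = Files.createTempFile("output", ".txt");
            List<String> lines = new ArrayList<>();
            for (int v : values) lines.add(Integer.toString(v));
            Files.write(in, lines, StandardCharsets.UTF_8);
            sort(in, out, runSize);
            List<Integer> result = new ArrayList<>();
            for (String s : Files.readAllLines(out, StandardCharsets.UTF_8)) result.add(Integer.parseInt(s));
            Files.delete(in);
            Files.delete(out);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `ExternalSort.sortValues(List.of(5, 3, 9, 1, 7, 2, 8), 3)` creates three runs ([3, 5, 9], [1, 2, 7], [8]) and merges them into [1, 2, 3, 5, 7, 8, 9].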

Decision tree and naive Bayes methods for data classification

Application background: This is code from a classification study, in particular a comparative analysis of the decision tree method and the naive Bayes method for classifying data. Key technology: comparative analysis, decision tree, Bayesian classification of data...
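The naive Bayes side of the comparison can be sketched as a categorical classifier with add-one (Laplace) smoothing. This is an illustrative assumption, not the original study's code. The class `NaiveBayes`, the string-valued features, and the choice of smoothing are all hypothetical. The classifier scores each class by log P(y) + Σ log P(x_i | y) and returns the class with the highest score.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Categorical naive Bayes with Laplace (add-one) smoothing (a sketch). */
final class NaiveBayes {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // key "class|featureIndex|value" -> number of training rows with that combination
    private final Map<String, Integer> featureCounts = new HashMap<>();
    // distinct values seen for each feature index, used in the smoothing denominator
    private final List<Set<String>> featureValues = new ArrayList<>();
    private int total;

    void train(List<String[]> rows, List<String> labels) {
        for (int r = 0; r < rows.size(); r++) {
            String[] x = rows.get(r);
            String y = labels.get(r);
            classCounts.merge(y, 1, Integer::sum);
            total++;
            for (int i = 0; i < x.length; i++) {
                while (featureValues.size() <= i) featureValues.add(new HashSet<>());
                featureValues.get(i).add(x[i]);
                featureCounts.merge(y + "|" + i + "|" + x[i], 1, Integer::sum);
            }
        }
    }

    /** Returns the class maximizing log P(y) + sum over i of log P(x_i | y). */
    String predict(String[] x) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> c : classCounts.entrySet()) {
            double score = Math.log((double) c.getValue() / total);
            for (int i = 0; i < x.length; i++) {
                int count = featureCounts.getOrDefault(c.getKey() + "|" + i + "|" + x[i], 0);
                int distinct = featureValues.get(i).size();
                score += Math.log((count + 1.0) / (c.getValue() + distinct));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c.getKey();
            }
        }
        return best;
    }
}
```

Trained on five rows (sunny/hot: no, sunny/mild: no, rainy/mild: yes, rainy/cool: yes, overcast/hot: yes), the classifier labels rainy/mild as yes and sunny/hot as no. The sketch assumes every row has the same number of features.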

KNN classification algorithm

The KNN algorithm for data classification is simple and easy for beginners to understand. In simple terms, K-NN works like this: you have a set of data whose classes are already known, and when a new data point arrives, its distance to each point in the training data is computed, t...
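The procedure described above can be sketched in Java: compute the distance to every training point, keep the k closest, and take a vote. Euclidean distance and simple majority voting are assumptions here, and the class name `Knn` is hypothetical. Ties in the vote are broken arbitrarily.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** k-nearest-neighbour classifier with Euclidean distance and majority vote (a sketch). */
final class Knn {
    private final double[][] points;
    private final String[] labels;
    private final int k;

    Knn(double[][] points, String[] labels, int k) {
        this.points = points;
        this.labels = labels;
        this.k = k;
    }

    String classify(double[] query) {
        // 1. distance from the query to every training point
        Integer[] idx = new Integer[points.length];
        double[] dist = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            idx[i] = i;
            double s = 0;
            for (int d = 0; d < query.length; d++) {
                double diff = points[i][d] - query[d];
                s += diff * diff;
            }
            dist[i] = Math.sqrt(s);
        }
        // 2. order training indices by distance, nearest first
        Arrays.sort(idx, (a, b) -> Double.compare(dist[a], dist[b]));
        // 3. majority vote among the k nearest neighbours
        Map<String, Integer> votes = new HashMap<>();
        for (int j = 0; j < Math.min(k, idx.length); j++) votes.merge(labels[idx[j]], 1, Integer::sum);
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}
```

With two well-separated clusters labeled A and B and k = 3, a query in the middle of the A cluster gets all three votes from A points.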

Keyword extraction algorithm based on statistics (using the TF-IDF algorithm)

A statistics-based keyword extraction algorithm using TF-IDF. It uses the IKAnalyzer word segmenter, so the corresponding jar package must be added before use. It finds the keywords of every file in the specified folder and displays the top 5 words for each. The specified folder can be changed...
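The TF-IDF scoring itself can be sketched without IKAnalyzer. In the example below, a simple split on non-letter, non-digit characters stands in for the IKAnalyzer segmenter, so it will not segment Chinese text properly. Documents are passed as in-memory strings instead of being read from a folder, and the class `TfIdf` is hypothetical. Each term's score is (term count / document length) × ln(N / document frequency), and ties are broken alphabetically.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** TF-IDF keyword extraction over an in-memory corpus (a sketch). */
final class TfIdf {
    /** Hypothetical stand-in for a real word segmenter: lower-case and split on non-letters/digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) if (!t.isEmpty()) tokens.add(t);
        return tokens;
    }

    /** Returns the topN keywords of each document, highest TF-IDF first. */
    static List<List<String>> topKeywords(List<String> docs, int topN) {
        List<List<String>> tokenized = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>(); // term -> number of documents containing it
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            tokenized.add(tokens);
            for (String term : new HashSet<>(tokens)) docFreq.merge(term, 1, Integer::sum);
        }
        List<List<String>> result = new ArrayList<>();
        for (List<String> tokens : tokenized) {
            Map<String, Integer> tf = new HashMap<>();
            for (String t : tokens) tf.merge(t, 1, Integer::sum);
            Map<String, Double> score = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double tfNorm = (double) e.getValue() / tokens.size();
                double idf = Math.log((double) docs.size() / docFreq.get(e.getKey()));
                score.put(e.getKey(), tfNorm * idf);
            }
            result.add(score.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList()));
        }
        return result;
    }
}
```

Passing `topN = 5` matches the top-5 display described above.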

A distributed Apriori program that runs without errors

The code has already been deployed on the Hadoop platform and runs without errors, but the reduce step is not quite right: it cannot output the frequent itemsets. I hope an expert can help solve this, as I really cannot get it to work. Thanks!!...
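The exact bug cannot be identified from this description. The reduce logic for one Apriori counting pass can, however, be simulated in plain Java with no Hadoop dependency. In this hypothetical sketch, the "map" step emits (candidate, 1) for every candidate contained in a transaction. Grouping by key stands in for Hadoop's shuffle. `reduce` sums all counts for one candidate before applying the minimum-support filter. The class `AprioriCountPass` and the comma-joined key format are assumptions, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java simulation of one MapReduce Apriori counting pass (a sketch, no Hadoop classes). */
final class AprioriCountPass {
    static Map<String, Integer> run(List<Set<String>> transactions, List<Set<String>> candidates, int minSupport) {
        // map + "shuffle": emit a 1 per (transaction, contained candidate), grouped by candidate key
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Set<String> tx : transactions)
            for (Set<String> cand : candidates)
                if (tx.containsAll(cand))
                    grouped.computeIfAbsent(key(cand), k -> new ArrayList<>()).add(1);
        Map<String, Integer> frequent = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet())
            reduce(e.getKey(), e.getValue(), minSupport, frequent);
        return frequent;
    }

    // reduce: total support for one key, written out only when it reaches minSupport
    static void reduce(String key, List<Integer> counts, int minSupport, Map<String, Integer> out) {
        int support = 0;
        for (int v : counts) support += v;
        if (support >= minSupport) out.put(key, support);
    }

    // canonical key: items sorted and comma-joined so that {b,a} and {a,b} group together
    static String key(Set<String> itemset) {
        return String.join(",", new TreeSet<>(itemset));
    }
}
```

One point worth checking in a real job: a combiner sees only the partial counts from one mapper. The minimum-support filter therefore belongs in the reducer, after all counts for a key have been summed.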

Recommendation system - Collaborative Filtering Algorithm

Application background: This program is a Java implementation of the user-based collaborative filtering algorithm. The algorithm is simple, but implementation code for it is hard to find online, so the main purpose of this material is to provide a reference for your inform...
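A minimal user-based collaborative filtering sketch is shown below. Cosine similarity over co-rated items and a similarity-weighted average over the k most similar users are assumptions chosen for this example. The original program may use a different similarity measure, and the class `UserCf` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** User-based collaborative filtering with cosine similarity (a sketch, not the original program). */
final class UserCf {
    private final Map<String, Map<String, Double>> ratings; // user -> (item -> rating)

    UserCf(Map<String, Map<String, Double>> ratings) {
        this.ratings = ratings;
    }

    /** Cosine similarity computed over the items both users rated; 0 if they share none. */
    double similarity(String u, String v) {
        Map<String, Double> ru = ratings.get(u), rv = ratings.get(v);
        double dot = 0, nu = 0, nv = 0;
        for (Map.Entry<String, Double> e : ru.entrySet()) {
            Double other = rv.get(e.getKey());
            if (other == null) continue;
            dot += e.getValue() * other;
            nu += e.getValue() * e.getValue();
            nv += other * other;
        }
        return (nu == 0 || nv == 0) ? 0 : dot / (Math.sqrt(nu) * Math.sqrt(nv));
    }

    /** Similarity-weighted average rating of item over the k most similar users who rated it. */
    double predict(String u, String item, int k) {
        List<String> neighbours = new ArrayList<>();
        for (String v : ratings.keySet())
            if (!v.equals(u) && ratings.get(v).containsKey(item)) neighbours.add(v);
        // most similar first
        neighbours.sort((a, b) -> Double.compare(similarity(u, b), similarity(u, a)));
        double num = 0, den = 0;
        for (String v : neighbours.subList(0, Math.min(k, neighbours.size()))) {
            double s = similarity(u, v);
            num += s * ratings.get(v).get(item);
            den += Math.abs(s);
        }
        return den == 0 ? Double.NaN : num / den;
    }
}
```

Take alice = {A: 5, B: 3, C: 4}, bob = {A: 5, B: 3, D: 4}, and carol = {A: 1, B: 5, D: 2}. `predict("alice", "D", 1)` uses only bob (similarity 1) and returns 4.0. With k = 2, carol's similarity of about 0.67 pulls the prediction down to about 3.20.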

