Text classification (1) - text preprocessing & tex
2016-08-23
Application background
1. Environment: Ubuntu 14, Hadoop 2.6, Eclipse, NLPIR/ICTCLAS2015, etc.
2. Algorithm overview:
   (1) This project is developed for parallel execution with MapReduce on Hadoop 2.6.
   (2) The project covers the text preprocessing and text representation stages of text classification: word segmentation, stop-word removal, feature selection, and text representation. (The classification step uses the random forest algorithm, which is not released for now; readers can use Mahout or Weka for verification.)
   (3) Word segmentation uses NLPIR/ICTCLAS2015; text is represented with the vector space model (VSM), with weights computed by TF-IDF; feature selection uses the CHI algorithm (chi-square statistic).
   (4) For the parallel word segmentation environment, see my blog post: http://www.cnblogs.com/merru/p/4917665.html
   (5) For setting up the Hadoop environment, see my blog posts: http://www.cnblogs.com/merru/p/4901528.html and http://www.cnblogs.com/merru/p/4905118.html
shell
classification
hadoop
text
based on
preprocessing
representation
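
The description above names chi-square (CHI) feature selection and TF-IDF weighting in a VSM, but does not give the exact formulas used. The following is a minimal single-process sketch in Python, not the project's MapReduce code. It assumes the common 2x2 contingency-table form of the chi-square statistic, scores each term by its maximum over classes, and uses raw term frequency times log(N / df) as the TF-IDF weight. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for one (term, class) pair from a 2x2 table.

    a: docs in the class that contain the term
    b: docs outside the class that contain the term
    c: docs in the class that lack the term
    d: docs outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k terms with the highest max-over-classes chi-square score."""
    n_docs = len(docs)
    class_size = Counter(labels)
    df = Counter()           # term -> number of docs containing it
    df_in_class = Counter()  # (term, class) -> docs of that class containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_in_class[(term, label)] += 1

    scores = {}
    for term in df:
        best = 0.0
        for cls, size in class_size.items():
            a = df_in_class[(term, cls)]
            b = df[term] - a
            c = size - a
            d = n_docs - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], df


def tfidf_vector(tokens, features, df, n_docs):
    """VSM vector for one document: raw term frequency times log(N / df)."""
    tf = Counter(tokens)
    return [tf[t] * math.log(n_docs / df[t]) for t in features]


# Toy documents, assumed to be already segmented with stop words removed.
docs = [
    ["football", "match", "goal"],
    ["football", "team", "goal"],
    ["stock", "market", "price"],
    ["stock", "price", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

features, df = select_features(docs, labels, k=3)
vectors = [tfidf_vector(d, features, df, len(docs)) for d in docs]
```

In a MapReduce setting the document-frequency counts (`df` and `df_in_class` here) would typically be produced by a counting job, with the scoring and weighting done in later passes; how the project actually splits these steps is not stated in the description.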