Profile
No self-introduction
codes (1)
Text classification (1) - text preprocessing & tex
Application background
1. Environment: Ubuntu 14, Hadoop 2.6, Eclipse, NLPIR/ICTCLAS2015, etc.
2. Algorithm overview:
(1) This project is developed in parallel with MapReduce on Hadoop 2.6.
(2) The project covers the text preprocessing and text representation stages of text classification: word segmentation, stop-word removal, feature selection, and text representation. The classification step uses the random forest algorithm; that code is not yet released, so readers can use Mahout or Weka for verification.
(3) Word segmentation uses NLPIR/ICTCLAS2015. Texts are represented with the vector space model (VSM), with weights computed by TF-IDF. Feature selection uses the CHI algorithm (chi-square statistic).
(4) For the parallel word segmentation environment, see my blog: http://www.cnblogs.com/merru/p/4917665.html
(5) For setting up the Hadoop environment, see my blog: http://www.cnblogs.com/merru/p/4901528.html and http://www.cnblogs.com/merru/p/4905118.html
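The preprocessing steps described above (stop-word removal, CHI feature selection, TF-IDF weighting in a VSM) can be illustrated with a small single-machine sketch. This is not the project's MapReduce code, which is not shown here; it is written in Python for brevity. It assumes documents have already been segmented into token lists (the job NLPIR/ICTCLAS2015 does in the project). All function names and the toy data are hypothetical, and the TF-IDF variant used (term frequency divided by document length, idf = ln(N/df)) is one common choice, not necessarily the one the project uses.

```python
import math
from collections import Counter


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def chi_square(term, label, docs):
    """Chi-square statistic of `term` for class `label`.

    docs is a list of (label, tokens) pairs. The 2x2 contingency counts are:
    a = in class, has term      b = other class, has term
    c = in class, lacks term    d = other class, lacks term
    """
    n = len(docs)
    a = sum(1 for y, toks in docs if y == label and term in toks)
    b = sum(1 for y, toks in docs if y != label and term in toks)
    c = sum(1 for y, toks in docs if y == label and term not in toks)
    d = n - a - b - c
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, k):
    """Keep the k terms with the highest chi-square score over all classes.

    Ties are broken alphabetically so the result is deterministic.
    """
    labels = {y for y, _ in docs}
    vocab = {t for _, toks in docs for t in toks}
    score = {t: max(chi_square(t, y, docs) for y in labels) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def tfidf_vectors(docs, features):
    """Represent each document as a TF-IDF vector over the selected features."""
    n = len(docs)
    df = Counter(t for _, toks in docs for t in set(toks))
    vectors = []
    for _, toks in docs:
        tf = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append([
            (tf[f] / total) * math.log(n / df[f]) if df[f] else 0.0
            for f in features
        ])
    return vectors


# Toy, already-segmented corpus (assumed data, for illustration only).
stop_words = {"the"}
raw_docs = [
    ("sports", ["the", "football", "match", "team"]),
    ("sports", ["team", "win", "the", "match"]),
    ("finance", ["stock", "market", "team"]),
    ("finance", ["the", "stock", "price", "market"]),
]
docs = [(y, remove_stop_words(toks, stop_words)) for y, toks in raw_docs]
features = select_features(docs, 3)
vectors = tfidf_vectors(docs, features)
print(features)
for (label, _), vec in zip(docs, vectors):
    print(label, vec)
```

In a MapReduce setting, the contingency counts and document frequencies would typically be accumulated by separate jobs rather than by the repeated in-memory scans used here; the sketch only shows what is being computed.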
1055353855@qq.com
2016-08-23