excellent developer-CodeForge.com

Profile


codes (1)

Text classification (1) - text preprocessing & tex


Application background

1. Environment: Ubuntu 14, Hadoop 2.6, Eclipse, NLPIR/ICTCLAS2015, etc.

2. Algorithm overview:
   (1) The project is developed in parallel on Hadoop 2.6 MapReduce.
   (2) The project covers the text preprocessing and text representation stages of text classification: word segmentation, stop-word removal, feature selection, and text representation. (Classification itself uses the random forest algorithm, which is not released for now; readers can use Mahout or Weka for verification.)
   (3) Word segmentation uses NLPIR/ICTCLAS2015; documents are represented with the vector space model (VSM), with weights computed by TF-IDF; feature selection uses the CHI algorithm (chi-square statistic).
   (4) For the parallel word segmentation environment, see my blog post: http://www.cnblogs.com/merru/p/4917665.html
   (5) For setting up the Hadoop environment, see my blog posts: http://www.cnblogs.com/merru/p/4901528.html and http://www.cnblogs.com/merru/p/4905118.html
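
The project's source is not shown on this page, so the following is only a minimal single-machine sketch of the two formulas named above: the chi-square (CHI) statistic for feature selection and TF-IDF weighting for the VSM representation. The class name TextFeatures, the method names, and the specific variants (term frequency normalized by document length, natural-log IDF without smoothing) are assumptions for illustration, not the author's implementation. In a MapReduce job the counts passed in here would typically be produced by earlier counting steps.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not part of the original project.
final class TextFeatures {
    private TextFeatures() {}

    /**
     * Chi-square statistic of a term with respect to a class, from a 2x2 table:
     *   a = documents in the class that contain the term
     *   b = documents outside the class that contain the term
     *   c = documents in the class that lack the term
     *   d = documents outside the class that lack the term
     * Returns 0 when any marginal total is zero (statistic undefined).
     */
    static double chiSquare(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double denom = (double) (a + c) * (b + d) * (a + b) * (c + d);
        if (denom == 0.0) {
            return 0.0;
        }
        double diff = (double) a * d - (double) c * b;
        return n * diff * diff / denom;
    }

    /**
     * TF-IDF weights for one segmented document.
     * Assumed variant: tf = count / document length, idf = ln(numDocs / df).
     * Terms missing from docFreq are skipped.
     */
    static Map<String, Double> tfIdf(List<String> docTerms,
                                     Map<String, Integer> docFreq,
                                     int numDocs) {
        // Count occurrences of each term in this document.
        Map<String, Integer> counts = new HashMap<>();
        for (String term : docTerms) {
            counts.merge(term, 1, Integer::sum);
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (df == 0) {
                continue; // no document frequency known for this term
            }
            double tf = e.getValue() / (double) docTerms.size();
            double idf = Math.log((double) numDocs / df);
            weights.put(e.getKey(), tf * idf);
        }
        return weights;
    }
}
```

In a pipeline like the one described, the terms with the highest chi-square scores per class would be kept as features, and each document would then be turned into a vector of TF-IDF weights over those retained terms.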

1055353855@qq.com

2016-08-23
