Co-Forest semi-supervised classification
The central question in semi-supervised learning is how to train a classifier efficiently from a small amount of labeled data together with a large amount of unlabeled data. Compared with supervised learning, semi-supervised learning can achieve better performance at lower labeling cost, so it has attracted wide attention in both theory and practice. The earliest idea for using unlabeled data in training, self-training, works as follows: first, train an initial classifier on the labeled data set; next, use this classifier to label some of the unlabeled data; then add the newly labeled examples with the highest confidence to the labeled data set and retrain on the enlarged labeled set, repeating until a stopping condition is met (e.g., Scudder (1965); Fralick (1967); Agrawala (1970)). Here, unlabeled data are used to refine the classifier and improve its accuracy. However, since the initial classifier is usually weak, self-training repeatedly uses the classifier trained in the previous iteration to label unlabeled data and feeds those labels into the next round of training. As a result, the algorithm continually accumulates its own classification errors, which ultimately degrades classification performance.
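The self-training loop described above can be sketched as follows. This is a minimal illustrative sketch, not the implementation from any of the cited papers: the toy nearest-centroid classifier, the margin-based confidence score, and all names and parameters (`rounds`, `per_round`, etc.) are assumptions chosen to keep the example self-contained.

```python
def train_centroids(X, y):
    """Fit a toy nearest-centroid 'classifier': one mean per class."""
    centroids = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = sum(pts) / len(pts)
    return centroids

def predict_with_confidence(centroids, x):
    """Return (label, confidence). Confidence is the margin between the
    two nearest centroids: a larger margin means a more reliable label."""
    dists = sorted((abs(x - c), label) for label, c in centroids.items())
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
    return best_label, margin

def self_train(X_lab, y_lab, X_unlab, rounds=5, per_round=2):
    """Self-training: repeatedly label unlabeled points and absorb the
    most confident ones into the labeled set, then retrain."""
    X_lab, y_lab, X_unlab = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        if not X_unlab:
            break
        model = train_centroids(X_lab, y_lab)  # retrain on enlarged labeled set
        # Label every unlabeled point and rank the results by confidence.
        scored = sorted(
            ((conf, x, lab)
             for x in X_unlab
             for lab, conf in [predict_with_confidence(model, x)]),
            reverse=True,
        )
        # Move the most confidently labeled examples into the labeled set.
        for conf, x, lab in scored[:per_round]:
            X_lab.append(x)
            y_lab.append(lab)
            X_unlab.remove(x)
    return train_centroids(X_lab, y_lab)

# Usage: two well-separated 1-D clusters, mostly unlabeled.
labeled_X = [0.0, 10.0]
labeled_y = [0, 1]
unlabeled_X = [0.5, 1.0, 9.0, 9.5, 1.5, 8.5]
model = self_train(labeled_X, labeled_y, unlabeled_X)
print(predict_with_confidence(model, 1.2)[0])  # point near cluster 0
print(predict_with_confidence(model, 9.2)[0])  # point near cluster 1
```

Note that if the classifier mislabels a point early on, that wrong label is absorbed into the labeled set and reinforced in later rounds; this is exactly the error-accumulation weakness of self-training that the paragraph above describes.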