Naive Bayesian classifier

sabarish.raghu
2015-10-31 03:54:44

Description

Application background

Naïve Bayes Classifier:

The Naive Bayes classifier is a probabilistic classifier.

We compute the probability of a document d being in a class c as follows:

$$P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$$

n_d is the length of the document (number of tokens).

P(t_k|c) is the conditional probability of term t_k occurring in a document of class c. It can be read as a measure of how much evidence t_k contributes that c is the correct class.

P(c) is the prior probability of c.

If a document's terms do not provide clear evidence for one class over another, we choose the class c with the highest P(c).
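
As a worked illustration of the formula above, the following Java sketch computes the unnormalized score of one class for one document. It sums logarithms instead of multiplying the probabilities directly; that is an assumption added here to avoid floating-point underflow on long documents, whereas the description above multiplies the probabilities. The names `NaiveBayesScore` and `logScore` and the sample probabilities are hypothetical and do not come from the original source files.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original project.
class NaiveBayesScore {

    // Returns log P(c) + sum over all tokens of log P(t_k | c).
    // The class that maximizes this also maximizes P(c) * prod P(t_k | c).
    static double logScore(double prior, Map<String, Double> termProbabilities, List<String> tokens) {
        double score = Math.log(prior);
        for (String token : tokens) {
            // Assumes every token has a non-zero probability in the map.
            score += Math.log(termProbabilities.get(token));
        }
        return score;
    }

    public static void main(String[] args) {
        // Made-up probabilities for two made-up classes.
        Map<String, Double> sports = Map.of("goal", 0.5, "vote", 0.1);
        Map<String, Double> politics = Map.of("goal", 0.1, "vote", 0.5);
        List<String> doc = List.of("goal", "goal", "vote");

        double sportsScore = logScore(0.5, sports, doc);     // log(0.5 * 0.5 * 0.5 * 0.1)
        double politicsScore = logScore(0.5, politics, doc); // log(0.5 * 0.1 * 0.1 * 0.5)
        System.out.println(sportsScore > politicsScore ? "sports" : "politics");
    }
}
```

With equal priors, the document is assigned to the class whose term probabilities fit its tokens better; when the term evidence is balanced, the prior decides.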

Algorithm (pseudocode):

NaiveBayes(Test_Data_Dir, Training_Data_Dir)
{
    For each test file in Test_Data_Dir
        Map<class, probability> ProbabilityMap        // one map per test file
        For each class
            probability = P(class)                    // start from the class prior
            For each word in test file
                WordProbability = probability of occurrence of that word in the class
                probability = probability * WordProbability
            ProbabilityMap.put(className, probability)
        Classified_class = key of the maximum probability value in ProbabilityMap
}
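
The loop above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the code in NaiveBayesClassifier.java: the class name `ClassifySketch`, the method signature, and the small constant used for words that a class has never seen are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the classification loop; not the original implementation.
class ClassifySketch {

    static String classify(List<String> words,
                           Map<String, Double> priors,
                           Map<String, Map<String, Double>> wordProbabilities) {
        Map<String, Double> probabilityMap = new HashMap<>();
        for (String className : priors.keySet()) {
            double probability = priors.get(className); // start from P(c)
            for (String word : words) {
                // Assumption: unseen words get a small constant instead of zero.
                probability *= wordProbabilities.get(className).getOrDefault(word, 1e-6);
            }
            probabilityMap.put(className, probability);
        }
        // Pick the key with the maximum probability value.
        String best = null;
        for (Map.Entry<String, Double> entry : probabilityMap.entrySet()) {
            if (best == null || entry.getValue() > probabilityMap.get(best)) {
                best = entry.getKey();
            }
        }
        return best;
    }
}
```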

Project Design

Model Classes:

TestRecord:

Holds the Test record as an object.

           

·       String RecordId                      Filename of the Test File      

·       String fullRecord                    Test record as a single string.

·       ArrayList<String> words       words in the test record.

OccuranceProbabilties:

Used as a cache to store the probabilities of words associated with a particular class.

·       String className                    Classname

·       HashMap<String, Double>     Probability of each word in the class

MemoryFile:

Holds the training record as an object.

·       String className                    Class name of the training file

·       ArrayList<String> content     Words in the class.
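
For orientation, the three model classes described above might look like the following plain data holders. The class names, field names, and field types are taken from the lists above; the name `wordProbabilities` for the map field is an assumption, since the description gives only its type, and the real files may add constructors or accessors that are not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;

// Holds one test record.
class TestRecord {
    String RecordId;                               // file name of the test file
    String fullRecord;                             // test record as a single string
    ArrayList<String> words = new ArrayList<>();   // words in the test record
}

// Cache of word probabilities for one class.
class OccuranceProbabilties {
    String className;
    // Field name is assumed; the description lists only the type.
    HashMap<String, Double> wordProbabilities = new HashMap<>();
}

// Holds one training record.
class MemoryFile {
    String className;                              // class name of the training file
    ArrayList<String> content = new ArrayList<>(); // words of the training record
}
```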

 

Flow of the Code:

1.     Read each test file, remove stopwords, perform stemming and load the result into objects.

2.     Read each training file, remove stopwords, perform stemming and load the result into objects.

3.     For each test file, each class and each word, check whether the word's probability for that class already exists in the cache.

4.     If it does not, compute the probability and cache it. Multiply the word probabilities to get the overall probability of the test file for that class.

5.     The class with the maximum probability for the test file is the class assigned to that file.
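
Steps 3 and 4 can be sketched as a cache lookup followed by a computation on a miss. The description does not say how the word probability is estimated, so this sketch assumes a relative frequency with add-one (Laplace) smoothing; that estimator, the class name `ProbabilityCache`, and the `computations` counter are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache sketch; the estimator below is an assumption, not the author's.
class ProbabilityCache {
    // className -> (word -> cached probability)
    private final Map<String, Map<String, Double>> cache = new HashMap<>();
    int computations = 0; // number of cache misses, kept only for illustration

    double wordProbability(String className, String word, List<String> classWords, int vocabularySize) {
        Map<String, Double> perClass = cache.computeIfAbsent(className, k -> new HashMap<>());
        Double cached = perClass.get(word);
        if (cached != null) {
            return cached; // step 3: the probability already exists in the cache
        }
        // Step 4: compute and store. Assumed estimator: add-one smoothing.
        computations++;
        long count = classWords.stream().filter(word::equals).count();
        double probability = (count + 1.0) / (classWords.size() + vocabularySize);
        perClass.put(word, probability);
        return probability;
    }
}
```

The cache pays off because the same word usually appears in many test files, so its per-class probability is computed once and reused.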

Two Modes of Execution:

Choose a mode depending on the size of the dataset and the computing power and memory available on the machine.

·       In Memory

o   Training data is loaded into memory as objects.

o   Executes much faster.

o   Significantly fewer file reads.

o   Higher memory load.

·       File Read

o   Reads the training data from the files each time it is needed.

o   Executes more slowly.

o   More file reads.

o   Significantly lower memory load.
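
The trade-off between the two modes can be sketched with a small wrapper around whatever function reads a class's training words from disk. This is only an illustration of the idea: the class `TrainingAccess`, its constructor, and the `reader` function are hypothetical, and the original code may organize the two modes differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the two execution modes.
class TrainingAccess {
    private final boolean inMemory;
    // Assumed to read all words of one class from its training files.
    private final Function<String, List<String>> reader;
    private final Map<String, List<String>> loaded = new HashMap<>();

    TrainingAccess(boolean inMemory, Function<String, List<String>> reader) {
        this.inMemory = inMemory;
        this.reader = reader;
    }

    List<String> wordsFor(String className) {
        if (!inMemory) {
            // File Read mode: go back to the files on every request.
            return reader.apply(className);
        }
        // In Memory mode: read once, then serve from memory.
        return loaded.computeIfAbsent(className, reader);
    }
}
```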

 

Steps for Execution:

·       Arrange the training and test directories as follows:

o   TrainingDirectoryName
    §  ClassName1
        ·  Trainingfile1
        ·  Trainingfile2
        ...
    §  ClassName2
        ·  Trainingfile1
        ·  Trainingfile2
        ...
    ...
o   TestDirectoryName
    §  Testfile1
    §  Testfile2
    §  Testfile3
    ...
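
Given the layout above, the class names can be recovered from the subdirectory names of the training directory. The following sketch shows one way to walk such a layout with `java.nio.file`; the class `TrainingLayout` and the method `filesByClass` are hypothetical names, and the original code may read the directories differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: maps each class name to the training files under it.
class TrainingLayout {

    static Map<String, List<Path>> filesByClass(Path trainingDir) throws IOException {
        Map<String, List<Path>> result = new TreeMap<>();
        try (Stream<Path> classDirs = Files.list(trainingDir)) {
            // Each subdirectory of the training directory is one class.
            for (Path classDir : classDirs.filter(Files::isDirectory).collect(Collectors.toList())) {
                try (Stream<Path> files = Files.list(classDir)) {
                    result.put(classDir.getFileName().toString(),
                            files.filter(Files::isRegularFile).sorted().collect(Collectors.toList()));
                }
            }
        }
        return result;
    }
}
```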

·       NaiveBayesClassifier is the main class. To run the classifier, use the following command:

Correct Usage: java NaiveBayesClassifier <TrainingDataDirectory> <TestDataDirectory> <InMemoryFlag>

Or run:

java -jar NaiveBayesClassifier <TrainingDataDirectory> <TestDataDirectory> <InMemoryFlag>

Arg 1: Training data directory (the directory name must not contain spaces).

Arg 2: Test data directory (the directory name must not contain spaces).

Arg 3: InMemoryFlag, which selects the mode of execution. Set it to "true" to load the training data into memory (faster computation but higher memory load). Set it to "false" to read the training data from the files each time it is needed; because the number of file reads is high, this is much slower, but the memory load is significantly lower.
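
The three arguments could be validated along the following lines. This is a sketch under assumptions: the class `ClassifierArgs` is hypothetical, the usage text is copied from the usage line above, and whether the original program checks its arguments this way is not stated in the description.

```java
// Hypothetical argument holder; not part of the original source files.
class ClassifierArgs {
    static final String USAGE = "Correct Usage: java NaiveBayesClassifier "
            + "<TrainingDataDirectory> <TestDataDirectory> <InMemoryFlag>";

    final String trainingDir;
    final String testDir;
    final boolean inMemory;

    private ClassifierArgs(String trainingDir, String testDir, boolean inMemory) {
        this.trainingDir = trainingDir;
        this.testDir = testDir;
        this.inMemory = inMemory;
    }

    static ClassifierArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(USAGE);
        }
        // Boolean.parseBoolean returns true only for "true" (ignoring case);
        // any other value selects the file-read mode.
        return new ClassifierArgs(args[0], args[1], Boolean.parseBoolean(args[2]));
    }
}
```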

Key Technology

Java

Machine Learning

Naive Bayesian

Text Classifier

File list

Name Size Date
01.97 kB
MemoryFile.java               534.00 B    2015-09-29 21:42
NaiveBayesClassifier.java     8.76 kB     2015-09-30 18:18
OccuranceProbabilties.java    479.00 B    2015-09-19 21:27
PorterStemmer.java            11.21 kB    2015-09-19 16:31
StopWordAnalyzer.java         17.84 kB    2015-09-29 23:23
TestRecord.java               542.00 B    2015-09-30 08:40
...