From document collection point
Application background
This code counts how many times each word appears in a set of documents.
It consists of a Mapper, a Reducer and a Driver.
The Mapper reads one line at a time and splits it into (key,value) pairs.
The output of the Mapper is passed as input to the Reducer.
The Mapper emits one pair per word occurrence: <key,1>, <key,1>, <key,1>, ...
Between the Mapper and the Reducer, the pairs are shuffled and sorted so that all values for the same key are grouped together.
The input to the Reducer is each key with the list of its values: <key,[1,1,1,...]>
The final output is <key,value>, i.e. <word,count>.
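The data flow described above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration of the map, shuffle/sort and reduce steps, not the project's actual Mapper, Reducer or Driver classes. The class name WordCountSketch and its method names are hypothetical, and splitting words on whitespace (case-sensitive, punctuation kept) is an assumption.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the word count data flow (no Hadoop classes used).
public class WordCountSketch {

    // Map step: split one line into words and emit a <word,1> pair for each occurrence.
    // Assumption: words are separated by whitespace.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and sort step: group all values by key. A TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the list of 1s received for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Plays the role of the driver here: runs map, shuffle/sort and reduce in order.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(mapped).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat sat", "the dog sat", "the end");
        // Prints {cat=1, dog=1, end=1, sat=2, the=3}
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job the framework performs the shuffle and sort itself, and the map and reduce steps run in parallel across the cluster. Here everything runs in one process only to make the intermediate <key,[1,1,1,...]> grouping visible.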
Key Technology
Hadoop:
Hadoop is a framework designed for Big Data.
It consists mainly of two parts: 1. Hadoop Distributed File System (HDFS) 2. MapReduce
The Hadoop Distributed File System stores the data.
MapReduce processes the data.
Every input given to the