
2.6.2. Configure & Write Job

Once you are logged in, configure your job's basic settings. Use any job name and select Python 3 as the source language. This example does not require a particular number of map or reduce tasks, so leave those fields unchanged.
Figure 2.2. Job Configuration

Next, supply input to the job. A small amount of text will suffice for this example, and a small input has the advantage that you can check the results at a glance. The same example could, however, be run on a much larger dataset (try it out when you are finished!). Use something simple, preferably with repeated words, such as the text shown in Figure 2.3, “Job Input”:
Figure 2.3. Job Input

Now define a mapper. The following mapper does exactly what is described in Section 1.2.3, “Example: WordCount”, in the introduction to map-reduce: it takes an entire line of input as its key (ignoring the empty value) and outputs every word in the line along with a 1 as the value:
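def mapper(key, val):
    # key holds one entire line of input; val is empty and unused
    for word in key.split():
        # emit each word with a count of 1 (passed as a string)
        Wmr.emit(word, '1')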
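Notice the following parts of the code:
- The function must be named mapper.[3]
- It takes two arguments, a key and a value. Here the key is one line of input and the value is empty, so the code never uses val.
- Output is produced by calling Wmr.emit once for each key/value pair; the function itself returns nothing.[5]
- The count is passed to Wmr.emit as the string '1' rather than the integer 1.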
If the mapper receives the line "one fish" as input, its output should be:
one    1
fish   1
Figure 2.4. Mapper Source Code
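Before submitting the job, it can help to run the mapper locally on the sample line "one fish". WebMapReduce provides the Wmr object when your code runs on the server, so the sketch below assumes a hypothetical stand-in class, also named Wmr, that only records the pairs passed to emit. It is meant for offline checking and is not part of WebMapReduce.

```python
class Wmr:
    """Hypothetical local stand-in for WebMapReduce's Wmr object.

    It only records (key, value) pairs; the real object is supplied
    by WebMapReduce on the server and is not reproduced here.
    """
    emitted = []

    @classmethod
    def emit(cls, key, value):
        cls.emitted.append((key, value))


def mapper(key, val):
    # key holds one entire line of input; val is empty and unused
    for word in key.split():
        Wmr.emit(word, '1')


# Run the mapper on the sample line, passing an empty value.
mapper("one fish", "")
for key, value in Wmr.emitted:
    print(key, value, sep="\t")
```

Running this prints the same two pairs listed above, one per line, with the key and value separated by a tab.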

Finally, write a reducer. This reducer adds up all of the 1s emitted for each word to get that word's final count:
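def reducer(key, values):
    # values holds every count emitted for this word, as strings
    total = 0  # named total to avoid shadowing Python's built-in sum()
    for v in values:
        total += int(v)
    # emit the word with its final count, converted back to a string
    Wmr.emit(key, str(total))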
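Notice the following:
- The function must be named reducer.[3]
- It is called with one key (a word) and values, which holds every value emitted for that word, for example all of the '1' strings.
- Each value is a string, so int(v) converts it to a number before it is added, and the final sum is converted back to a string with str() before it is emitted with Wmr.emit.[5]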
Figure 2.5. Reducer Source Code
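To see how the mapper and reducer fit together, the whole word-count job can be traced locally. This is a minimal sketch under stated assumptions: Wmr is a hypothetical stand-in that only records emitted pairs, run_word_count is a hypothetical helper, and the framework's shuffle step is approximated by sorting the mapper output by key and grouping it with itertools.groupby. How WebMapReduce actually distributes and schedules the work is not modeled, and the sample lines are made up for illustration (they are not the text in Figure 2.3).

```python
from itertools import groupby
from operator import itemgetter


class Wmr:
    """Hypothetical local stand-in for WebMapReduce's Wmr object (records output only)."""
    emitted = []

    @classmethod
    def emit(cls, key, value):
        cls.emitted.append((key, value))


def mapper(key, val):
    # Emit each word in the line with the string '1'.
    for word in key.split():
        Wmr.emit(word, '1')


def reducer(key, values):
    # Sum the string counts for one word and emit the total as a string.
    total = 0
    for v in values:
        total += int(v)
    Wmr.emit(key, str(total))


def run_word_count(lines):
    """Hypothetical helper: simulate map, shuffle/sort, and reduce locally."""
    # Map phase: each line is passed as the key with an empty value.
    Wmr.emitted = []
    for line in lines:
        mapper(line, "")
    intermediate = sorted(Wmr.emitted, key=itemgetter(0))

    # Reduce phase: group pairs by key and pass each group's values to the reducer.
    Wmr.emitted = []
    for word, pairs in groupby(intermediate, key=itemgetter(0)):
        reducer(word, (value for _, value in pairs))
    return Wmr.emitted


result = run_word_count(["one fish two fish", "red fish blue fish"])
for word, count in result:
    print(word, count, sep="\t")
```

With this input the sketch prints blue 1, fish 4, one 1, red 1, and two 1, in that order, because the simulated shuffle sorts words alphabetically before reducing.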



[3] All languages require certain names for mappers and reducers, although the exact name may differ. The chapter on each language will give the specific requirement.

[4] Although this is generally true in WebMapReduce, some languages may behave differently. These differences will be noted in the chapter for each language.

[5] Other languages may have similar functions, or they may use the return value of the function as output. See the chapter on each language.