************
Introduction
************

What is WebMapReduce?
=====================

WebMapReduce is a simple web-based user interface for creating and submitting Hadoop Map-Reduce jobs in practically any language. It is ideally suited to the introductory computer science classroom, since writing massively parallel programs with it requires very little programming experience.

Some of its features include:

- *Simplified Map-Reduce model*: WebMapReduce offers the features of Map-Reduce that are crucial to the core concept, without details that add to the learning curve.
- *Extensible language support*: Mappers and reducers can be written in practically any language. All that is needed to support a new language is a simple wrapper library, optionally with an API that lets users easily perform common tasks such as string processing.
- *Adaptable APIs*: The APIs of the built-in languages can also be customized, for example to support libraries already introduced in a class, or to suit a particular teaching style or programming paradigm. Purely functional, imperative, and object-oriented strategies are all fair game.
- *Customizable authentication*: Users can be authenticated against an LDAP server or through PAM, which itself provides many options, including traditional Unix authentication, database-backed authentication, and many others.

.. _sect-Admin_Guide-Introduction-Architecture:

Architecture
============

WebMapReduce runs on top of an installation of `Hadoop`_, Apache's implementation of MapReduce. It manages many of the details of using Hadoop, such as compiling code, configuring and submitting jobs, and reading output, so that users do not need to perform these tasks through the command line. It also uses a modified version of `Hadoop Streaming`_, combined with special wrappers, to allow jobs to be written in any language.

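
To illustrate what such a wrapper does: Hadoop Streaming exchanges records with a child process as lines on standard input and output, where by default the key is the text before the first tab and the value is the rest, and the reduce phase receives its input sorted by key. The Python sketch below shows the idea only. It is not WebMapReduce's actual wrapper code, and the names ``run_mapper``, ``run_reducer``, ``emit``, and ``simulate_job`` are hypothetical. It assumes that user functions receive an ``emit`` callback and that a record with no tab arrives as a key with an empty value.

.. code-block:: python

   import io
   from itertools import groupby
   from typing import Callable, Iterable, Iterator, TextIO, Tuple

   # Hypothetical callback type: user code calls emit(key, value).
   Emit = Callable[[str, str], None]


   def split_record(line: str) -> Tuple[str, str]:
       """Split a streaming record at the first tab; no tab means an empty value."""
       key, _, value = line.rstrip("\n").partition("\t")
       return key, value


   def make_emit(out: TextIO) -> Emit:
       """Return a callback that writes one tab-separated key/value line."""
       def emit(key: str, value: str) -> None:
           out.write(f"{key}\t{value}\n")
       return emit


   def run_mapper(mapper: Callable[[str, str, Emit], None],
                  lines: Iterable[str], out: TextIO) -> None:
       """Call the user's mapper once per input record."""
       emit = make_emit(out)
       for line in lines:
           key, value = split_record(line)
           mapper(key, value, emit)


   def run_reducer(reducer: Callable[[str, Iterator[str], Emit], None],
                   lines: Iterable[str], out: TextIO) -> None:
       """Call the user's reducer once per key.

       Hadoop sorts map output by key before the reduce phase, so records
       with the same key arrive on consecutive lines and groupby suffices.
       """
       emit = make_emit(out)
       records = (split_record(line) for line in lines)
       for key, group in groupby(records, key=lambda kv: kv[0]):
           reducer(key, (value for _, value in group), emit)


   # Example user code: word count. Assumes each input line arrives as the key.
   def word_count_mapper(key: str, value: str, emit: Emit) -> None:
       for word in key.split():
           emit(word, "1")


   def word_count_reducer(key: str, values: Iterator[str], emit: Emit) -> None:
       emit(key, str(sum(int(v) for v in values)))


   def simulate_job(text: str) -> str:
       """Run map, sort, and reduce in-process, standing in for Hadoop's shuffle."""
       map_out = io.StringIO()
       run_mapper(word_count_mapper, text.splitlines(), map_out)
       shuffled = sorted(map_out.getvalue().splitlines())
       reduce_out = io.StringIO()
       run_reducer(word_count_reducer, shuffled, reduce_out)
       return reduce_out.getvalue()

In a real streaming job, the map and reduce steps would run as separate processes reading ``sys.stdin`` and writing ``sys.stdout``. Here ``simulate_job`` only imitates the sort-and-shuffle step by sorting the map output locally.
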

WebMapReduce is split into two components, a frontend and a backend:

================= ============================= ===================================
\                 Frontend                      Backend
================= ============================= ===================================
Technology        Django (Python) web app       Java web service

Responsibilities  Interacting with the user:    Interacting with Hadoop:

                  - Authentication              - Preparing code (e.g., compiling)
                  - Submission & monitoring     - Submitting jobs
                    workflow                    - Reading job output
                  - Reporting results/errors

Location          Any :term:`WSGI`-capable      Head node of Hadoop cluster
                  webserver that can
                  communicate with the backend

Communication     `Apache Thrift`_\ -based RPC protocol
----------------- -----------------------------------------------------------------
================= ============================= ===================================

Separating the components provides flexibility: for example, the frontend can run on your school's public webserver, while the backend runs on a cluster that is not widely accessible. The two components can also be installed on the same machine, which, as explained later, is sometimes desirable for simplicity and security. This installation guide covers both cases.

About
=====

WebMapReduce was developed at St. Olaf College by Patrick Garrity '12 and Tim Yates '12, beginning in 2009. Stephen N. Lee '14 contributed various bugfixes to version 2.0, and Boyang Wei '13 contributed the C# plugin. WebMapReduce is free software released under the Apache License, version 2.0.

.. _Hadoop: http://hadoop.apache.org/
.. _Hadoop Streaming: http://hadoop.apache.org/mapreduce/docs/current/streaming.html
.. _Jetty: http://jetty.codehaus.org/jetty/
.. _Apache Thrift: http://incubator.apache.org/thrift/