
1.2. Architecture

WebMapReduce runs on top of an installation of Hadoop, Apache's implementation of MapReduce. It manages many of the details of using Hadoop, such as compiling code, configuring and submitting jobs, and reading output, so that users do not have to perform these tasks through the command line. It also uses a modified version of Hadoop Streaming, combined with special wrappers, to allow jobs to be written in any language.
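Hadoop Streaming runs any executable that reads input records on standard input and writes output records on standard output. By default, each record is a line of text in which everything before the first tab is the key and the rest is the value. The sketch below shows the general idea of a language wrapper built on that convention: it hides line parsing and key grouping so that a job author only writes a `mapper` and a `reducer` function. It is a hypothetical illustration, not WebMapReduce's actual wrappers or API. The names `run_mapper`, `run_reducer`, `word_count_mapper`, and `word_count_reducer` are invented for this example, and the code assumes Streaming's default tab separator.

```python
import sys
from itertools import groupby
from typing import Callable, Iterable, TextIO, Tuple

# A key/value record in Hadoop Streaming's default text format.
Pair = Tuple[str, str]


def parse_line(line: str) -> Pair:
    """Split one Streaming record into (key, value).

    By default Streaming treats everything before the first tab as the key
    and the rest as the value; a line with no tab is all key.
    """
    key, _sep, value = line.rstrip("\n").partition("\t")
    return key, value


def run_mapper(mapper: Callable[[str, str], Iterable[Pair]],
               lines: Iterable[str], out: TextIO) -> None:
    """Call the user's mapper once per input line and write its pairs."""
    for line in lines:
        key, value = parse_line(line)
        for k, v in mapper(key, value):
            out.write(f"{k}\t{v}\n")


def run_reducer(reducer: Callable[[str, Iterable[str]], Iterable[Pair]],
                lines: Iterable[str], out: TextIO) -> None:
    """Group consecutive lines with equal keys (input must be sorted by key)
    and call the user's reducer once per key with an iterator of its values."""
    pairs = (parse_line(line) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = (v for _k, v in group)
        for k, v in reducer(key, values):
            out.write(f"{k}\t{v}\n")


# Hypothetical user code: the only part a job author would write.
def word_count_mapper(key: str, value: str) -> Iterable[Pair]:
    # Plain text lines contain no tab, so the whole line arrives as the key.
    for word in f"{key} {value}".split():
        yield word, "1"


def word_count_reducer(key: str, values: Iterable[str]) -> Iterable[Pair]:
    yield key, str(sum(int(v) for v in values))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python wrapper.py map|reduce  (reads stdin, writes stdout)
    if sys.argv[1] == "map":
        run_mapper(word_count_mapper, sys.stdin, sys.stdout)
    elif sys.argv[1] == "reduce":
        run_reducer(word_count_reducer, sys.stdin, sys.stdout)
    else:
        sys.exit(f"unknown mode: {sys.argv[1]!r}")
```

Assuming the file is saved as `wrapper.py`, the pipeline `cat input.txt | python wrapper.py map | LC_ALL=C sort | python wrapper.py reduce` roughly imitates on one machine what Hadoop does at scale. In this pipeline, `sort` stands in for the shuffle phase that brings records with the same key together.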
WebMapReduce is split into two components, a frontend and a backend:

Frontend
  Technology: PHP web application
  Responsibilities: interacting with the user
    • Authentication
    • Providing the web form
    • Reporting results/errors
  Location: any PHP-capable webserver that can communicate with the backend

Backend
  Technology: Java web service (using an embedded Jetty webserver)
  Responsibilities: interacting with Hadoop
    • Preparing code (e.g., compiling)
    • Submitting jobs
    • Reading job output
    • Protecting against DoS attacks
  Location: the head node of the Hadoop cluster

Communication between the two: an HTTP-based XML protocol over a connection that can optionally be encrypted with SSL/TLS.
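The following sketch makes the division of labor more concrete by showing how a frontend-like client might package a job for the backend. The architecture only states that the two components exchange XML over HTTP, optionally over SSL/TLS. Everything else in the example is hypothetical: the message shape (`<job>`, `<mapper>`, `<reducer>`, `<status>`), the function names, and the URL `https://backend.example.edu/jobs` are invented for illustration. The real frontend is a PHP application, not Python.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def build_submit_request(job_name: str, language: str, mapper_src: str,
                         reducer_src: str, input_path: str) -> str:
    """Serialize a job submission as XML.

    The element and attribute names here are hypothetical; they are not
    WebMapReduce's real message schema.
    """
    job = ET.Element("job", name=job_name)
    ET.SubElement(job, "input").text = input_path
    # ElementTree escapes characters such as '<' and '&' in the source code.
    ET.SubElement(job, "mapper", language=language).text = mapper_src
    ET.SubElement(job, "reducer", language=language).text = reducer_src
    return ET.tostring(job, encoding="unicode")


def make_http_request(backend_url: str, body: str) -> urllib.request.Request:
    """Wrap the XML in an HTTP POST; an https:// URL gives an SSL/TLS channel."""
    return urllib.request.Request(
        backend_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def parse_status(xml_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Read a (hypothetical) <status id="..."><state>...</state></status> reply."""
    root = ET.fromstring(xml_text)
    return root.get("id"), root.findtext("state")


if __name__ == "__main__":
    body = build_submit_request("wordcount", "python",
                                "def mapper(k, v): ...", "def reducer(k, vs): ...",
                                "/user/alice/input.txt")  # hypothetical path
    request = make_http_request("https://backend.example.edu/jobs", body)
    print(request.get_method(), request.full_url)
    print(body)
    # Actually sending it would be: urllib.request.urlopen(request)
```

An XML library is used here instead of string concatenation because submitted code routinely contains characters such as `<` and `&`. These must be escaped for the message to remain well-formed.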
One of the reasons for the separate components is flexibility: the frontend can run, for example, on your school's public webserver, while the backend can be run on a cluster that is not widely accessible.
Of course, the two components can also be installed on the same machine. As will be explained later, this is sometimes desirable for simplicity and security. This installation guide will cover both cases.