1. Introduction

Amazon’s Elastic Compute Cloud (EC2) can be a cost-effective way to try out cluster computing before investing time and money in equipment and maintenance. To make it even easier to get started with Hadoop and parallel computing, we have created Amazon Machine Images (AMIs). By following the process below, you can use them to bring up a fully operational Hadoop cluster and start testing it in a matter of minutes.

1.1. About our AMIs

Amazon Machine Images are prebuilt virtual machine images designed specifically for EC2. They are stored on Amazon’s servers and can be created, customized, and shared by any EC2 user. Anyone with access to an AMI can use it to launch their own instances, which are running virtual machines over which they have complete control.

As of this writing, we have built the following AMIs for creating Hadoop clusters with WebMapReduce (see http://webmapreduce.sf.net/ec2.php for the latest list):

WebMapReduce AMIs

AMI ID         OS/Architecture                      Preloaded Software
ami-b07885d9   Ubuntu 10.10 Lucid Server (32-bit)   • Cloudera Distribution for Hadoop 3, based on Apache Hadoop 0.20.2
                                                    • Apache HTTP Server 2 with Django 1.2.3
                                                    • WebMapReduce frontend & backend
ami-6a7b8603   Ubuntu 10.10 Lucid Server (64-bit)   • Cloudera Distribution for Hadoop 3, based on Apache Hadoop 0.20.2
                                                    • Apache HTTP Server 2 with Django 1.2.3
                                                    • WebMapReduce frontend & backend
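This guide uses the web console, but readers who prefer to script their setup may find a small sketch useful. The Python below picks the AMI ID from the table by architecture and builds the keyword arguments for an EC2 RunInstances request. The dictionary keys (ImageId, InstanceType, KeyName, MinCount, MaxCount) follow the parameter names of boto3's EC2 client run_instances(). The function name launch_params and the example instance type and key pair name are hypothetical placeholders, not part of WebMapReduce. Choose an instance type that supports the selected architecture.

```python
# AMI IDs copied from the WebMapReduce AMIs table above; the current list
# is published at http://webmapreduce.sf.net/ec2.php.
WEBMAPREDUCE_AMIS = {
    "32-bit": "ami-b07885d9",
    "64-bit": "ami-6a7b8603",
}


def launch_params(arch, instance_type, key_name, node_count=1):
    """Build keyword arguments for an EC2 RunInstances request (hypothetical helper).

    Keys match the parameter names of boto3's EC2 client run_instances().
    instance_type and key_name are supplied by the caller; the instance type
    must support the chosen architecture.
    """
    if arch not in WEBMAPREDUCE_AMIS:
        raise ValueError(
            f"unknown architecture {arch!r}; expected one of {sorted(WEBMAPREDUCE_AMIS)}"
        )
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "ImageId": WEBMAPREDUCE_AMIS[arch],
        "InstanceType": instance_type,
        "KeyName": key_name,
        # MinCount == MaxCount: EC2 launches exactly node_count instances
        # or fails the request instead of starting a partial cluster.
        "MinCount": node_count,
        "MaxCount": node_count,
    }
```

For example, launch_params("64-bit", "m1.large", "my-keypair", node_count=4) selects ami-6a7b8603 and requests four instances. With boto3 installed and AWS credentials configured, the resulting dictionary could be passed as boto3.client("ec2").run_instances(**params).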

Our AMIs come with the following features:

  • Fully configurable through Amazon’s web-based AWS Management Console: setup requires only a browser, and logging in requires only an SSH client.
  • Persistent storage on Amazon’s Elastic Block Store (EBS), so you can shut down cluster machines without losing their data.
  • Cluster nodes can be added easily as demand grows.
