Chapter 2. Starting a Cluster

2.1. Creating Security Groups
2.2. Starting the Master Instance
2.3. Starting Slave Instances
The first step in setting up a cluster with our AMIs is to sign up for EC2. If you have never used EC2 before, we strongly suggest reading Amazon's Getting Started Guide first. It should take no more than 10 minutes, and it makes the process described in the following sections much clearer. That said, this guide is still written for users who are new to EC2.

Important

Remember that as soon as you start EC2 nodes, Amazon will begin charging your account! Simply running through these instructions should cost no more than a few dollars, as long as you terminate the cluster afterward.

2.1. Creating Security Groups

Before we launch the cluster itself, we need to create security groups for it. These define the firewall rules for the EC2 instances that make up the cluster. This is a one-time procedure: once you have created appropriate security groups, you can reuse them for new clusters, unless you want each cluster to have its own separate groups.
First, we will create a security group for all cluster nodes:
  1. Log in to the AWS Management Console if you have not already.
  2. Click on Security Groups in the left-hand Navigation pane.
  3. In the toolbar at the top, click Create Security Group.
    1. For Security Group Name, enter hadoop-cluster, or anything else meaningful to you.
    2. For Description, enter anything you choose.
    3. Click Create.
  4. Back in the Security Groups panel, your new group should be selected. In the lower pane, you should see a table where you can add individual security rules.
    1. Leaving all other fields empty, enter the name of your security group (hadoop-cluster) in the Source (IP or group) column of the row at the bottom of the table.
    2. Click the Save button to the right.

    Note

    Due to an apparent bug in the AWS Management Console, you may have to click the Refresh button in the top right toolbar to see changes after you click Save.

    Now you should see three entries: one each for ICMP, TCP, and UDP on all ports. This will allow all your cluster instances to communicate freely among themselves.
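
The effect of the self-referencing rule can be sketched in Python as plain data. This is only an illustrative model, not an AWS API call: the function name self_referencing_rules and the dictionary keys are hypothetical, and the port ranges (0 to 65535 for TCP and UDP, -1 for "all ICMP types") are an assumption about how the console displays "all ports".

```python
# Illustrative model only: these names are hypothetical, not an AWS API.

def self_referencing_rules(group_name):
    """Return the three entries expected after naming the group as its own source."""
    # Assumption: all TCP/UDP ports appear as 0-65535 and "all ICMP types" as -1.
    port_ranges = {"icmp": (-1, -1), "tcp": (0, 65535), "udp": (0, 65535)}
    return [
        {"protocol": proto, "from_port": lo, "to_port": hi, "source": group_name}
        for proto, (lo, hi) in port_ranges.items()
    ]

rules = self_referencing_rules("hadoop-cluster")
for rule in rules:
    print(rule)
```

Because the source of every entry is the group itself, only instances that belong to the group can use these rules, which is why the cluster nodes can talk to each other without being opened to outside hosts.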
Next, we will create a security group for the master node only:
  1. Click Create Security Group again.
    1. For Security Group Name, enter hadoop-cluster-master, or anything else meaningful to you.
    2. For Description, enter anything you choose.
    3. Click Create.
  2. Back in the Security Groups panel, select your new security group and create the following four rules using the bottom pane:
    Connection Method   Protocol   From Port   To Port   Source
    HTTP                TCP        80          80        0.0.0.0/0
    SSH                 TCP        22          22        0.0.0.0/0
    Custom...           TCP        50030       50030     0.0.0.0/0
    Custom...           TCP        50070       50070     0.0.0.0/0
    The last two rules are for accessing the web interfaces for the Hadoop JobTracker and NameNode, respectively.
    The Source column specifies, in CIDR notation, the hosts that are allowed to connect. The value given above allows access from the entire internet (though only on these ports). If you know the IP range of your school or intended users, enter it here instead for greater security.
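
To make the CIDR notation concrete, the following minimal sketch uses Python's standard ipaddress module to check which clients a given Source value would admit. It only models the four rules from the table above; it does not talk to AWS. The helper is_allowed, the rule labels, and the example addresses are hypothetical (192.0.2.0/24 and 198.51.100.0/24 are blocks reserved for documentation), so substitute your own range.

```python
import ipaddress

# The four master-node rules from the table: (label, protocol, from_port, to_port).
MASTER_RULES = [
    ("HTTP", "tcp", 80, 80),
    ("SSH", "tcp", 22, 22),
    ("JobTracker web UI", "tcp", 50030, 50030),
    ("NameNode web UI", "tcp", 50070, 50070),
]

def is_allowed(client_ip, port, source_cidr):
    """True if client_ip lies inside source_cidr and port is opened by some rule."""
    in_range = ipaddress.ip_address(client_ip) in ipaddress.ip_network(source_cidr)
    port_open = any(lo <= port <= hi for _, _, lo, hi in MASTER_RULES)
    return in_range and port_open

# 0.0.0.0/0 matches every IPv4 address, so any client may reach the open ports.
print(is_allowed("198.51.100.7", 22, "0.0.0.0/0"))       # True
# A narrower, hypothetical campus range rejects clients outside it...
print(is_allowed("198.51.100.7", 22, "192.0.2.0/24"))    # False
# ...and admits clients inside it, but still only on the listed ports.
print(is_allowed("192.0.2.15", 50030, "192.0.2.0/24"))   # True
print(is_allowed("192.0.2.15", 8080, "192.0.2.0/24"))    # False
```

The number after the slash is the count of leading bits that must match, so /0 matches everything, while /24 restricts access to the 256 addresses that share the first three octets.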