Documentation
- Python 2.5+
- boto
- simplejson
- prettytable
- setuptools
- dateutil
- cElementTree
- elementtree
- PyYAML
To install the packages above, you may first need to download and install the Python setuptools package, which is not installed by default on some distributions (Ubuntu, for example). Without it, you cannot run easy_install. See http://pypi.python.org/pypi/setuptools for more info.
easy_install simplejson
easy_install boto
easy_install prettytable
easy_install setuptools
easy_install python-dateutil
easy_install PyYAML
easy_install cElementTree
easy_install elementtree
You must specify your AWS credentials when using stratus. The simplest way to do this is to set the environment variables:
- AWS_ACCESS_KEY_ID: Your AWS Access Key ID
- AWS_SECRET_ACCESS_KEY: Your AWS Secret Access Key
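Assuming a POSIX shell, the two variables can be set like this (the values shown are placeholders, not real credentials):

```shell
# Placeholder values -- substitute your real AWS credentials.
export AWS_ACCESS_KEY_ID=your-access-key-id
export AWS_SECRET_ACCESS_KEY=your-secret-access-key
```

Adding these lines to your shell startup file (e.g. ~/.bashrc) makes them available in every session.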
To configure stratus, create a directory called .stratus in your home directory (note the leading period "."). In that directory, create a file called clusters.cfg that contains a section for each cluster you want to control. Start each section with a unique name for the section enclosed in square brackets. Each key/value pair must be on its own line. Keys are separated from values by an equals sign. For example:
[my-cassandra-cluster]
service_type=cassandra
cloud_provider=ec2
Each cluster requires the following key/value pairs:
- service_type: One of cassandra, hadoop, or hadoop_cassandra_hybrid
- cloud_provider: Only ec2 is supported
- image_id: The Amazon EC2 image ID for your cluster nodes
- instance_type: The type of EC2 instance to run (m1.small, m1.large, etc.; see the EC2 documentation for a valid list)
- key_name: The EC2 key pair name to use
- availability_zone: The availability zone to place your instances in (see the EC2 documentation)
- region: The region to place your instances in (see the EC2 documentation)
- private_key: Path to your private key for password-less SSH commands
- user_data_file: Path to a bootstrap script that will be executed on each node after the instance is started (see http://aws.amazon.com/articles/1085)
Optional keys:
- ssh_options: Options to supply to ssh and scp
- security_groups: Any user-defined security groups to authorize your cluster to use (separated by newlines)
- env: List of user-defined key/value pairs to be set in your node's environment (separated by newlines)
NOTES
- It's best practice to give your cluster a unique and identifiable name so that other users will know who owns it.
- security_groups allows you to define custom security groups for your cluster. This is useful if you have multiple clusters that need to communicate via their internal/private network.
- See Cloudera CDH for other AMIs to use with stratus.
- Be sure that your clusters.cfg file uses the proper line feed characters.
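As an aside, clusters.cfg follows the standard INI format, so it can be read with Python's configparser module, which also performs the %(private_key)s-style interpolation used in the ssh_options examples below. This is a hedged sketch, not stratus's own loading code, and it assumes a Python 3 interpreter (stratus itself targets Python 2, where the module is named ConfigParser):

```python
# Sketch: parse a clusters.cfg section with the stdlib configparser.
# The section and keys mirror the examples in this document.
import configparser

CFG = """
[my-cassandra-cluster]
service_type=cassandra
cloud_provider=ec2
private_key=/path/to/key/file
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
"""

parser = configparser.ConfigParser()
parser.read_string(CFG)

section = parser["my-cassandra-cluster"]
print(section["service_type"])   # cassandra
# %(private_key)s is interpolated from the same section:
print(section["ssh_options"])    # -i /path/to/key/file -o StrictHostKeyChecking=no
```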
The following example shows how to specify an i386 Fedora OS as the AMI in a clusters.cfg file for a Cassandra cluster:
[my-cassandra-cluster]
service_type=cassandra
cloud_provider=ec2
image_id=ami-6159bf08
instance_type=m1.small
key_name=your_key_name
availability_zone=us-east-1c
region=us-east-1
private_key=/path/to/key/file
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
security_groups=security-group-1
security-group-2
security-group-3
user_data_file=file:///path/to/cassandra-ec2-init-remote.sh
cassandra_config_file=file:///path/to/storage-conf.xml
env=AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_GOES_HERE
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY_GOES_HERE
NOTES
- cassandra_config_file is the location of your storage-conf.xml file. This file will be copied to each node in your cluster and Cassandra will use it for its configuration. See the Cassandra 0.6.x Config File section for details.
The following example shows a clusters.cfg file for a Cassandra 0.7.x cluster:
[my-cassandra-cluster]
service_type=cassandra
cloud_provider=ec2
image_id=ami-6159bf08
instance_type=m1.small
key_name=your_key_name
availability_zone=us-east-1d
region=us-east-1
private_key=/path/to/key/file
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
security_groups=security-group-1
user_data_file=file:///path/to/cassandra-ec2-init-remote.sh
cassandra_config_file=file:///path/to/cassandra.yaml
keyspace_definitions_file=file:///path/to/keyspace_definitions
env=AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_GOES_HERE
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY_GOES_HERE
CASSANDRA_URL=http://apache.mirrors.pair.com//cassandra/0.7.0/apache-cassandra-0.7.0-beta3-bin.tar.gz
NOTES
- cassandra_config_file is the location of your cassandra.yaml file. This file will be copied to each node in your cluster and Cassandra will use it for its configuration. See the Cassandra 0.7.x Config File section for details.
- keyspace_definitions_file points to a text file containing a batch of Thrift API commands that will be used to set up your keyspaces initially. Cassandra 0.7 allows for dynamic keyspaces and you are now required to use the API to manage them. (See the Keyspace Definitions File section for an example.)
- CASSANDRA_URL in the env section overrides which version of Cassandra is pulled and installed on each node of your cluster. See the cassandra-ec2-init-remote.sh file in cassandra/data for how this variable is used to configure Cassandra.
The following example shows how to specify an i386 Fedora OS (ami-6159bf08) as the AMI in a clusters.cfg file for a Hadoop cluster:
[my-hadoop-cluster]
service_type=hadoop
cloud_provider=ec2
image_id=ami-6159bf08
instance_type=m1.small
key_name=your_key_name
availability_zone=us-east-1c
region=us-east-1
private_key=/path/to/key/file
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
security_groups=security-group-1
security-group-2
security-group-3
user_data_file=file:///path/to/cassandra-ec2-init-remote.sh
env=AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_GOES_HERE
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY_GOES_HERE
NOTES
- cassandra_config_file is not used for Hadoop clusters and is not present here.
Hybrid Hadoop/Cassandra clusters operate exactly like Hadoop clusters: one node acts as the namenode, secondary namenode, and job tracker, and one or more nodes act as data nodes and task trackers. The only difference is that Cassandra is installed and started on the Hadoop nodes designated as data nodes. The same commands used to operate a Cassandra cluster are also available, but they only manipulate the data nodes running Cassandra services.
[my-hadoop-cassandra-cluster]
service_type=hadoop_cassandra_hybrid
cloud_provider=ec2
image_id=ami-6159bf08
instance_type=m1.small
key_name=your_key_name
availability_zone=us-east-1c
region=us-east-1
private_key=/path/to/key/file
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
security_groups=security-group-1
security-group-2
security-group-3
user_data_file=file:///path/to/hadoop-cassandra-hybrid-ec2-init-remote.sh
cassandra_config_file=file:///path/to/storage-conf.xml
env=AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_GOES_HERE
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY_GOES_HERE
NOTES
- cassandra_config_file is the same as in a pure Cassandra 0.6.x or 0.7.x cluster.
- For Cassandra 0.7.x, remember to supply your keyspace_definitions_file.
Cassandra 0.6.x Config File
The cassandra_config_file parameter in your clusters.cfg file points to a local copy of a storage-conf.xml file for Cassandra v0.6.x that will be pushed out to each node in your cluster. You are responsible for configuring settings in this file, but keep in mind that stratus will automatically copy this file and modify various parameters before it pushes it out. The modifications for storage-conf.xml files are:
- The Seeds element will contain valid Seed elements containing the private IP addresses of the seed nodes. Stratus arbitrarily chooses the first two nodes to be seeds.
- InitialToken will contain a generated token for proper key distribution.
- CommitLogDirectory will be /mnt/cassandra-logs.
- DataFileDirectories will contain one DataFileDirectory element with the value /mnt/cassandra-data.
- ListenAddress and ThriftAddress will be null.
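To make the rewrite concrete, here is an illustrative sketch of the kind of transformation described above, using Python's standard xml.etree.ElementTree. The element names come from Cassandra 0.6.x's storage-conf.xml; the rewriting code itself is an illustration, not stratus's actual implementation:

```python
# Illustrative sketch only: apply the storage-conf.xml modifications
# listed above. Element names match Cassandra 0.6.x; this is not
# stratus's real code.
import xml.etree.ElementTree as ET

TEMPLATE = """<Storage>
  <Seeds><Seed>127.0.0.1</Seed></Seeds>
  <InitialToken></InitialToken>
  <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
  </DataFileDirectories>
  <ListenAddress>localhost</ListenAddress>
  <ThriftAddress>localhost</ThriftAddress>
</Storage>"""

def rewrite(xml_text, seed_ips, token):
    root = ET.fromstring(xml_text)
    seeds = root.find("Seeds")
    seeds.clear()                      # drop template seeds
    for ip in seed_ips:                # private IPs of the first two nodes
        ET.SubElement(seeds, "Seed").text = ip
    root.find("InitialToken").text = str(token)
    root.find("CommitLogDirectory").text = "/mnt/cassandra-logs"
    dirs = root.find("DataFileDirectories")
    dirs.clear()
    ET.SubElement(dirs, "DataFileDirectory").text = "/mnt/cassandra-data"
    root.find("ListenAddress").text = None   # emitted empty ("null")
    root.find("ThriftAddress").text = None
    return ET.tostring(root, encoding="unicode")

print(rewrite(TEMPLATE, ["10.0.0.1", "10.0.0.2"], 0))
```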
Cassandra 0.7.x Config File
The cassandra_config_file parameter in your clusters.cfg file points to a local copy of a cassandra.yaml file for Cassandra v0.7.x that will be pushed out to each node in your cluster. You are responsible for configuring settings in this file, but keep in mind that stratus will automatically copy this file and modify various parameters before it pushes it out. The modifications for cassandra.yaml files are:
- seeds will contain a list of the private IP addresses of the seed nodes. Stratus arbitrarily chooses the first two nodes to be seeds.
- initial_token will contain a generated token for proper key distribution.
- commitlog_directory will be /mnt/cassandra-logs.
- data_file_directories will contain a single-item list with the value /mnt/cassandra-data.
- listen_address and rpc_address will be null.
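The same idea for the YAML case can be sketched as follows. This is a hedged illustration, not stratus's actual code; it assumes the file has already been parsed into a dict (e.g. with PyYAML's yaml.safe_load, PyYAML being one of the dependencies listed above):

```python
# Illustrative sketch only: the cassandra.yaml modifications listed
# above, applied to a config already parsed into a dict (e.g. via
# PyYAML's yaml.safe_load). Key names match Cassandra 0.7.x defaults.
def rewrite_conf(conf, seed_ips, token):
    conf = dict(conf)                          # leave the caller's dict intact
    conf["seeds"] = list(seed_ips)             # first two nodes become seeds
    conf["initial_token"] = token              # generated for key distribution
    conf["commitlog_directory"] = "/mnt/cassandra-logs"
    conf["data_file_directories"] = ["/mnt/cassandra-data"]
    conf["listen_address"] = None              # serialized as null in YAML
    conf["rpc_address"] = None
    return conf

parsed = {"cluster_name": "Test", "seeds": ["127.0.0.1"]}
print(rewrite_conf(parsed, ["10.0.0.1", "10.0.0.2"], 0))
```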
Keyspace Definitions File (Cassandra 0.7.x only)
The following is a sample file pulled from http://wiki.apache.org/cassandra/LiveSchemaUpdates that shows how you can use Thrift API commands in a batch style to build up your keyspaces. It creates a keyspace Keyspace1 with two column families, Standard1 and Standard2. If this file is passed in through the keyspace_definitions_file parameter of your clusters.cfg file, it will be executed on ONE node via the cassandra-cli utility after the Cassandra service has started.
/* Create a new keyspace */
create keyspace Keyspace1 with replication_factor = 3 and placement_strategy = 'org.apache.cassandra.locator.RackUnawareStrategy'
/* Switch to the new keyspace */
use Keyspace1
/* Create new column families */
create column family Standard1 with column_type = 'Standard' and comparator = 'BytesType'
create column family Standard2 with column_type = 'Standard' and comparator = 'UTF8Type' and rows_cached = 10000
Check out the package, browse to the project's root directory, and run the following:
% sudo python setup.py install
After specifying an AMI, you can run stratus. It will display usage instructions when you invoke it without arguments.
You can test that the script can connect to your cloud provider by typing:
% stratus list --all
This will list the cluster name, service type, and cloud provider for ALL clusters that have been defined or are currently running in EC2.
After you install stratus and setup your EC2 account information, starting a Cassandra cluster with 10 nodes is easy by using one command:
% stratus exec CLUSTER_NAME launch-cluster 10
(where CLUSTER_NAME is a defined cluster in your ~/.stratus/clusters.cfg file)
To use EBS-backed persistent storage with your cluster:
- Create a new section in your clusters.cfg file. (This is completely optional. Most users will want EBS, so you can use an existing cluster config if you would like.)
- Create storage for the new cluster by creating a temporary EBS volume, formatting it, and saving it as a snapshot in S3. This way, you only have to do the formatting once and can use the snapshot to clone cluster volumes later. NOTE: You only have to do this step once unless you remove the snapshot later. All snapshots of a given size are identical, so you can just reuse one if one already exists in the size you want.
- Create a JSON spec file that defines how storage volumes will be created and assigned for your cluster. This spec file should reference the snapshot ID you created in the previous step. Remember that if you already have a formatted snapshot you may use that ID instead. IMPORTANT CASSANDRA INFO: All Cassandra cluster nodes expect to have two separate storage devices defined. One storage volume will be used to store Cassandra log files (/dev/sdj) and the second will be used to store Cassandra data (/dev/sdk). The automatic configuration of the nodes will try to mount these volumes to /mnt/cassandra-logs and /mnt/cassandra-data respectively, and they MUST exist for persistent storage. A sample JSON spec file can be found in the stratus/cassandra/data directory of the project and is referenced below in the "Sample JSON spec file" section.
- Use the create-storage command to create the storage volumes defined in your spec file for the number of nodes your cluster will have. The following example creates storage for a 3-node Cassandra cluster -- assuming your spec defines the required two volumes per node, this command will create 6 volumes (2 for each node).
- Launch your cluster with the appropriate number of nodes (should be the same number from the previous step).
- When all nodes have finished launching, the configuration of your nodes will begin. This consists of assigning the devices for your storage volumes to the appropriate nodes, mounting those volumes to the proper mount points, and launching the Cassandra services. You can test your persistent storage by:
- writing data to the Cassandra services
- terminating your cluster like normal: % stratus CLUSTER_NAME terminate-cluster
- re-launching the cluster: % stratus CLUSTER_NAME launch-cluster N
- retrieving data previously written to Cassandra
- SSH into your cluster:
% stratus CLUSTER_NAME login
Example:
The following example shows how to create a 100GB snapshot, create storage for a 3-node cluster, and then launch the cluster.
% stratus exec CLUSTER_NAME create-formatted-snapshot 100
% stratus exec CLUSTER_NAME create-storage 3 ~/.stratus/my-cassandra-ebs-cluster-storage-spec.json
% stratus exec CLUSTER_NAME launch-cluster 3
Node type abbreviations used in the JSON spec files:
- nn = Hadoop name node
- snn = Hadoop secondary name node
- dn = Hadoop data node
- tt = Hadoop task tracker
- jt = Hadoop job tracker
- cn = Cassandra node
- hcn = Hadoop/Cassandra node
- Prefix Hadoop-specific keys with "hybrid_" for Hadoop/Cassandra hybrid keys (e.g., hybrid_nn)
Sample JSON spec file for a Cassandra cluster:
{
"cn": [
{
"device": "/dev/sdj",
"mount_point": "/mnt/cassandra-logs",
"size_gb": "100",
"snapshot_id": "snap-xxxxxx"
},
{
"device": "/dev/sdk",
"mount_point": "/mnt/cassandra-data",
"size_gb": "100",
"snapshot_id": "snap-xxxxxx"
}
]
}
- For the automatic configuration to work correctly there need to be two volumes defined, and they must reference the devices /dev/sdj and /dev/sdk. The sdj device must have the mount point /mnt/cassandra-logs and the sdk device must have the mount point /mnt/cassandra-data.
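The requirement above can be checked before you create storage. The helper below is a hedged sketch, not part of stratus itself; it validates that a Cassandra spec defines exactly the two required device/mount-point pairs:

```python
# Illustrative sketch only: verify that a Cassandra JSON spec satisfies
# the requirement above (two "cn" volumes on /dev/sdj and /dev/sdk with
# the expected mount points). Not part of stratus itself.
import json

REQUIRED = {
    "/dev/sdj": "/mnt/cassandra-logs",
    "/dev/sdk": "/mnt/cassandra-data",
}

def check_cassandra_spec(spec_text):
    spec = json.loads(spec_text)
    volumes = spec.get("cn", [])
    found = {v["device"]: v["mount_point"] for v in volumes}
    return found == REQUIRED

SAMPLE = """{
  "cn": [
    {"device": "/dev/sdj", "mount_point": "/mnt/cassandra-logs",
     "size_gb": "100", "snapshot_id": "snap-xxxxxx"},
    {"device": "/dev/sdk", "mount_point": "/mnt/cassandra-data",
     "size_gb": "100", "snapshot_id": "snap-xxxxxx"}
  ]
}"""

print(check_cassandra_spec(SAMPLE))  # True
```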
Sample JSON spec file for a Hadoop cluster:
{
"nn": [
{
"device": "/dev/sdh",
"mount_point": "/mnt/hadoop-ebs",
"size_gb": "100",
"snapshot_id": "snap-xxxxxx"
}
],
"dn": [
{
"device": "/dev/sdi",
"mount_point": "/mnt/hadoop-ebs",
"size_gb": "100",
"snapshot_id": "snap-xxxxxx"
}
]
}
Sample JSON spec file for a hybrid Hadoop/Cassandra cluster:
{
"hybrid_nn": [
{
"device": "/dev/sdh",
"mount_point": "/mnt/hadoop-ebs",
"size_gb": "100",
"snapshot_id": "snap-xxxxxx"
}
],
"hybrid_dn": [
{
"device": "/dev/sdi",
"mount_point": "/mnt/hadoop-ebs",
"size_gb": "100",
"snapshot_id": "snap-xxxxxx"
}
],
"cn": [
{
"device": "/dev/sdj",
"mount_point": "/mnt/cassandra-logs",
"size_gb": "100",
"snapshot_id": "snap-xxxxxx"
},
{
"device": "/dev/sdk",
"mount_point": "/mnt/cassandra-data",
"size_gb": "100",
"snapshot_id": "snap-xxxxxx"
}
]
}