This tutorial introduces some of the key RightGrid concepts and demonstrates how to set up a RightGrid application of your own.
In this example, you will build a "Hello World" RightGrid application that demonstrates the batch processing functionality of the RightGrid system.
NOTE: This tutorial only applies to Grid or Premium accounts. If you have a Developer account and would like to upgrade, please contact sales@rightscale.com.
Before you attempt to set up a RightGrid application, it's important to understand some of the basic concepts and how the different pieces fit together. The diagram below highlights the main parts of a basic RightGrid application.

Now that you understand the core components of RightGrid, this section shows you how to set up a basic RightGrid application.
When creating a RightGrid application or porting an existing application to RightGrid, most users perform the following tasks:
Step 1: Create SQS Queues (Input and Output)
Step 2: Create an S3 bucket
Step 3: Create a Job Producer
Step 4: Create a Job Consumer
Step 5: Create a RightGrid Configuration File
Step 6: Create your Application/Kicker Class
Step 7: Configure a ServerTemplate for Worker Instances
Step 8: Create a Queue-based Server Array
Step 9: Launch a Worker Instance and Test the Results
The first step is to create your SQS Queues. You'll need to create an SQS Input Queue to receive work unit messages from the job producer and an SQS Output Queue to receive the result messages after the work unit has been processed. Optionally, you can create an SQS Audit Queue where RightGrid will send its audit entries.

Go to Manage -> Queues. Click the New Queue button.

Click the Create button. A confirmation window will appear where you can add a message to the queue as a test.
Go back to Clouds -> AWS -> Queues and repeat the process by creating an SQS Output Queue.
You should now have an input queue and output queue.

NOTE: Similar to S3 bucket names, SQS Queue names must be unique.
You will need to create an S3 bucket to store work unit data. You can use the same S3 bucket for storing input and output (result) data.

Go to Manage -> Storage -> S3 Browser and click the New Bucket button.

The job producer does not have to be running on EC2. It can be located anywhere on the Internet. The code for the job producer and job consumer can be written in any programming language provided that you can upload/download data to S3 and send/receive work units from SQS queues.

A job producer performs the following tasks:
To create a job producer follow the steps below.
Because input queue messages are SQS messages, they are limited to 256KB. In most cases, an input queue message contains only the work unit metadata, while the actual input data files for the worker application are uploaded to S3.
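As a rough illustration of keeping messages small, the sketch below builds a work unit that carries only metadata plus an S3 reference, and refuses to serialize anything larger than the 256KB limit mentioned above. The helper name build_work_unit_message and the constant MAX_MESSAGE_BYTES are made up for this example; they are not part of RightGrid or right_aws.

```ruby
require 'yaml'

# Limit taken from the text above (256KB); adjust it if your queue differs.
MAX_MESSAGE_BYTES = 256 * 1024

# Hypothetical helper: serializes work unit metadata to YAML and rejects
# bodies that would not fit in a single queue message.
def build_work_unit_message(bucket_name, key, worker_name, id)
  work_unit = {
    :created_at  => Time.now.utc.strftime('%Y-%m-%d %H:%M:%S %Z'),
    :s3_download => [File.join(bucket_name, key)], # a reference, not the data itself
    :worker_name => worker_name,
    :id          => id
  }
  body = work_unit.to_yaml
  if body.bytesize > MAX_MESSAGE_BYTES
    raise ArgumentError, "work unit #{id} is #{body.bytesize} bytes; upload the payload to S3 instead"
  end
  body
end

message = build_work_unit_message('dw_rightgrid_demo', 'in/Log1.log', 'RGHelloWorld', 1)
puts message.bytesize
```

Checking the size before sending makes an oversized work unit fail in the producer, where it is easy to diagnose, rather than at the queue.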
The sample code below is written in Ruby. Use this code as a template for creating your own job producer.
require 'yaml'
require 'rubygems'
require 'right_aws'

def upload_file(bucket, key, data)
  bucket.put(key, data)
end

def enqueue_work_unit(queue, work_unit)
  queue.send_message(work_unit)
end

# Load jobspec
jobspec = YAML::load_file("oneshotspec.yml")

# Get S3 and SQS handles
s3 = RightAws::S3.new(jobspec[:access_key_id], jobspec[:secret_access_key])
bucket = s3.bucket(jobspec[:bucket], false)
sqs = RightAws::SqsGen2.new(jobspec[:access_key_id], jobspec[:secret_access_key])
inqueue = sqs.queue(jobspec[:inputqueue], false)

# Generate work units
(1..jobspec[:number_of_units]).each do |id|
  puts "Generating work unit #{id}"
  filename = "in/Log#{id}.log"
  text = "HelloWorld!"
  work_unit = {
    :created_at => Time.now.utc.strftime('%Y-%m-%d %H:%M:%S %Z'),
    :s3_download => [File.join(jobspec[:bucket], filename)],
    :worker_name => jobspec[:worker_name],
    :id => id
  }
  wu_yaml = work_unit.to_yaml
  upload_file(bucket, filename, text)
  enqueue_work_unit(inqueue, wu_yaml)
  puts wu_yaml
end
oneshotspec.yml
---
:name: OneshotJob
:worker_name: RGHelloWorld
:number_of_units: 5000
:bucket: dw_rightgrid_demo
:inputqueue: RG-Inputs
:outputqueue: RG-Outputs
:access_key_id: <AWS_ACCESS_KEY>
:secret_access_key: <AWS_SECRET_ACCESS_KEY>
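A minimal sketch of loading and validating a jobspec like the one above before any AWS calls are made. The helper load_jobspec and the REQUIRED_KEYS list are hypothetical names introduced for illustration, and the credential values are placeholders. The sketch uses YAML.safe_load with Symbol permitted because the jobspec keys are written as symbols; on recent Ruby versions the plain loader may reject symbol keys unless they are permitted explicitly.

```ruby
require 'yaml'

# Keys that the sample producer and consumer read from the jobspec.
REQUIRED_KEYS = [:worker_name, :number_of_units, :bucket, :inputqueue,
                 :outputqueue, :access_key_id, :secret_access_key].freeze

# Hypothetical helper: parses jobspec YAML and fails fast on missing keys.
def load_jobspec(yaml_text)
  # Symbol must be permitted because the keys are written as :name, :bucket, ...
  spec = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  missing = REQUIRED_KEYS - spec.keys
  raise ArgumentError, "jobspec is missing #{missing.inspect}" unless missing.empty?
  spec
end

sample = <<~SPEC
  ---
  :name: OneshotJob
  :worker_name: RGHelloWorld
  :number_of_units: 5000
  :bucket: dw_rightgrid_demo
  :inputqueue: RG-Inputs
  :outputqueue: RG-Outputs
  :access_key_id: EXAMPLE_KEY
  :secret_access_key: EXAMPLE_SECRET
SPEC

jobspec = load_jobspec(sample)
puts jobspec[:bucket]
```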
Similar to the job producer, the job consumer can be located anywhere on the Internet. The job consumer typically parses a work unit's result message and data files. It can also update a central database, if necessary.

To create a job consumer that's compatible with the RightGrid framework:
The sample code below was written in Ruby. Use this code as a template for creating your own job consumer.
jobconsumer.rb
require 'rubygems'
require 'yaml'
require 'right_aws'

def download_result(bucket, key)
  bucket.get(key)
end

def dequeue_entry(queue)
  queue.pop
end

# Load jobspec
jobspec = YAML::load_file("oneshotspec.yml")

# Get S3 and SQS handles
s3 = RightAws::S3.new(jobspec[:access_key_id], jobspec[:secret_access_key])
bucket = s3.bucket(jobspec[:bucket], false)
sqs = RightAws::SqsGen2.new(jobspec[:access_key_id], jobspec[:secret_access_key])
outputqueue = sqs.queue(jobspec[:outputqueue], false)

# Continually pop messages off the result queue
loop do
  msg = dequeue_entry(outputqueue)
  if msg.nil?
    sleep 5 # the queue is empty; wait before polling again
    next
  end
  # Here is where you would:
  # 1. Decode msg
  # 2. Download result files from S3
  # 3. Update a central database/update job statistics
end
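The decode and statistics steps in the consumer loop might look something like the sketch below. It assumes that each result message body is a YAML-encoded hash shaped like the hash returned by the sample do_work method (:result, :id, :output) and that a :result of 0 means success; both are assumptions, so inspect your own result messages before relying on them. The helpers decode_result and summarize are hypothetical names.

```ruby
require 'yaml'

# Assumption: the body is a YAML hash with symbol keys, possibly with Time values.
def decode_result(body)
  YAML.safe_load(body, permitted_classes: [Symbol, Time])
end

# Tallies successes and failures, treating :result => 0 as success (assumed).
def summarize(bodies)
  stats = { :succeeded => 0, :failed => 0, :failed_ids => [] }
  bodies.each do |body|
    result = decode_result(body)
    if result[:result] == 0
      stats[:succeeded] += 1
    else
      stats[:failed] += 1
      stats[:failed_ids] << result[:id]
    end
  end
  stats
end

# Stand-in message bodies; a real consumer would take these from the output queue.
bodies = [
  { :result => 0, :id => 1, :output => 'Goodbye World!' }.to_yaml,
  { :result => 1, :id => 2, :output => 'error' }.to_yaml
]
stats = summarize(bodies)
puts stats.inspect
```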
The rightworker.yml configuration file is the heart of a RightGrid application. The config file sets variables needed by the RightGrid worker daemon in order to call the user's application with the correct parameters.
The rightworker.yml file contains the following parameters:
The sample file below is written in YAML. Use it as a template for creating your own rightworker.yml configuration file.
rightworker.yml
development:
  RightWorkersDaemon:
    aws_access_key: <AWS_ACCESS_KEY>
    aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
    log: RGHelloWorld.log
    email: yourName@yourSite.com
    halt_on_exit: true
    workers: 1
    user:
      custom_entry_a: user_entry_1
      custom_entry_b: user_entry_2
    queues:
      rg_example_inputs:
        invocation_model: oneshot
        result_queue: rg_example_results
        audit_queue: rg_example_audit
        message_decoder: RightYamlDecoder
        s3_log: dw_rightgrid_demo/log/%{DATE}/%{MESSAGE_ID}
        s3_out: dw_rightgrid_demo/out/%{DATE}-%{TIME}-%{MESSAGE_ID}
        receive_message_timeout: 3600
        default_worker_name: RGHelloWorld
        life_time_after_fault: 7200
        s3_in: /tmp/s3_in
        s3_in_delete: false
        s3_in_overwrite: false
        s3_in_flat: true
In the sample code above, the parameters are defined in three sub-sections.
The 'Environment' section is the highest-level section in the configuration file and is commonly used to create different configuration setups, such as for development, testing, and production. You can have multiple environments and use a different RightGrid application for each environment. Each environment section requires a subsection called 'RightWorkersDaemon.'
In this example, we are defining a 'development' environment.
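To make the environment layout concrete, here is a small sketch that reads a configuration with two environments and picks out the 'RightWorkersDaemon' settings for one of them. The helper daemon_settings and the 'production' values are invented for illustration; this is not how the RightGrid daemon itself loads the file.

```ruby
require 'yaml'

# Hypothetical helper: returns the RightWorkersDaemon settings for one environment.
# Hash#fetch raises KeyError if the environment or the subsection is missing.
def daemon_settings(config_text, environment)
  config = YAML.safe_load(config_text)
  config.fetch(environment).fetch('RightWorkersDaemon')
end

# Trimmed-down configuration; the production values are made up.
config_text = <<~CONFIG
  development:
    RightWorkersDaemon:
      log: RGHelloWorld.log
      workers: 1
  production:
    RightWorkersDaemon:
      log: RGHelloWorld.log
      workers: 4
CONFIG

settings = daemon_settings(config_text, 'development')
puts settings['workers']
```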
The 'RightWorkersDaemon' section holds all of the RightGrid-specific configuration information. RightGrid will ignore any other subsections of the 'Environment' section.
The 'RightWorkersDaemon' section includes the following variables:
The 'User' section holds application-specific configuration information. This is a useful way for passing common variables/information to all worker instances. It can contain any number of key/value pairs. RightGrid does not read this information, but rather passes it on to the do_work() method of the application class as part of the message_env hash. In the sample code above, the 'custom_entry_a' and 'custom_entry_b' values will be passed to all worker instances.
The 'Queue' subsection defines one or more input queues to monitor. If multiple queues are specified, RightGrid will monitor them in round-robin order. The title of each queue subsection must be the exact name of the input queue.
The following variables are common to all queues:
These are only a sample of the variables that can be defined in the rightworker.yml file. For a complete list of all the variables, see the RightGrid User Guide.
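The round-robin monitoring of multiple input queues described above can be pictured with the sketch below. It uses plain arrays as stand-ins for SQS queues, and the second queue name is made up; it illustrates the polling order only and is not RightGrid's actual implementation.

```ruby
# Stand-in queues: arrays of messages keyed by queue name.
queues = {
  'rg_example_inputs' => ['unit-1', 'unit-2'],
  'rg_other_inputs'   => ['unit-A'] # hypothetical second input queue
}

# Visits the queues in round-robin order, taking at most one message per visit,
# until every queue is empty. Returns [queue_name, message] pairs in handling order.
def drain_round_robin(queues)
  handled = []
  until queues.values.all?(&:empty?)
    queues.each do |name, messages|
      message = messages.shift
      handled << [name, message] if message
    end
  end
  handled
end

order = drain_round_robin(queues)
order.each { |name, message| puts "#{name}: #{message}" }
```

Taking one message per queue per pass keeps a busy queue from starving the others.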
RightGrid supports two ways of invoking the user's application: one shot or persistent. In this example, we are using the 'one shot' invocation model where a new process is created for each new work unit that is received by a worker.
RGHelloWorld.rb
class RGHelloWorld
  def do_work(message_env, message)
    starttime = Time.now
    # Simulate some work
    10_000.times do
      2 + 2
    end
    finishtime = Time.now
    result = {
      :result => 0,
      :id => message_env[:id],
      :starttime => starttime,
      :finishtime => finishtime,
      :created_at => message_env[:created_at],
      :output => "Goodbye World!"
    }
  end
end
Where message_env is given by the following hash:
message_env = {
  'tmp_dir' => "Directory for temp files",
  's3_in' => "Directory with the files downloaded from S3",
  'output_dir' => "Directory where the app or kicker should put output files to be uploaded to S3",
  'log_dir' => "Directory where the app or kicker should put log files to be uploaded to S3",
  'log_file' => "File for right_worker logs",
  'message_id' => "SQS message id",
  'controller' => "RightWorkersDaemon",
  'worker_name' => "name of user worker",
  'logger' => "handle to the RightGrid logger object",
  's3_downloaded_list' => {'bucket/key1' => 'local_file1', ..., 'bucket/keyN' => 'local_fileN'}
}
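As a sketch of how a worker class might use these entries, the example below reads the files listed in 's3_downloaded_list' and writes a small report into 'output_dir'. The class name RGFileSizeReport, the report file name, and the hand-built message_env are all hypothetical; only the do_work(message_env, message) signature and the key names come from this tutorial. It also assumes the decoded message is a hash with an :id entry, as in the sample work unit.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical worker class that reports the size of each downloaded input file.
class RGFileSizeReport
  def do_work(message_env, message)
    downloaded = message_env['s3_downloaded_list'] || {}
    report_path = File.join(message_env['output_dir'], 'sizes.txt')
    File.open(report_path, 'w') do |report|
      downloaded.each do |s3_key, local_file|
        report.puts "#{s3_key}\t#{File.size(local_file)}"
      end
    end
    { :result => 0, :id => message[:id], :output => "#{downloaded.size} file(s) measured" }
  end
end

# Simulate what a worker would receive, using a temporary directory.
dir = Dir.mktmpdir
input = File.join(dir, 'Log1.log')
File.write(input, 'HelloWorld!')
env = {
  'output_dir' => dir,
  's3_downloaded_list' => { 'dw_rightgrid_demo/in/Log1.log' => input }
}
result = RGFileSizeReport.new.do_work(env, { :id => 1 })
report = File.read(File.join(dir, 'sizes.txt'))
FileUtils.remove_entry(dir) # clean up the temporary directory
puts result[:output]
puts report
```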
Each worker instance must be created with the same ServerTemplate. RightScale provides the RightGrid Worker ServerTemplate, which is already configured to install RightGrid and automatically start the rightworker daemon. Simply configure the ServerTemplate with the appropriate input parameters such as your SVN repository and add any custom RightScripts for your application and its dependencies.
Go to Design -> ServerTemplates. Under the RightScale tab, filter by 'worker' and select the latest version of the RightGrid Worker template.

Since you will probably customize the ServerTemplate, click the Clone button and rename the ServerTemplate "Worker."
Now go to the cloned ServerTemplate's Inputs tab and click the Edit link.

Each instance will need to be able to access your SVN repository in order to download your application code.
Define the following input parameters.
The next step is to create a scalable server array for all worker instances.
You can create two types of server arrays, based on how you want the server array to resize. You can either create a server array that will resize based on the number of jobs in the queue or based on the amount of time a job is in the queue. For more information, see Server Arrays.
In this example we will create a server array based on the number of jobs in the queue.
Go to Clouds -> AWS -> Queues and click the New button.

Click Save.

The next step is to activate your server array. Click the enable link.
You're almost ready to launch your RightGrid application.
First, start the job producer to put work units into the SQS Input Queue. Then go to Manage -> Queues to see the number of work units in your input queue.

Next, go to your RightGrid deployment (ex: RightGrid Example). Notice that the "RG Worker Array" is active and running.

Click the Launch button. An input confirmation window will appear. Confirm that you are using the correct launch inputs and click the Launch button.
Because 500 work units are sitting in the input queue and we specified "20 items per instance," 25 instances (500 / 20) would be needed. However, because we defined a maximum of 20 instances in the server array, only 20 instances are launched.
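The arithmetic above can be sketched as a small helper. The function desired_instances is a hypothetical name, and the real server array may apply additional rules (for example, resize delays); the sketch only reproduces the calculation described in this tutorial, with a minimum of 1 instance assumed.

```ruby
# Hypothetical helper: instances needed for a queue, clamped to the array's bounds.
def desired_instances(queue_size, items_per_instance, min_count, max_count)
  needed = (queue_size.to_f / items_per_instance).ceil # round up partial batches
  [[needed, min_count].max, max_count].min
end

# 500 queued items at 20 items per instance would need 25, capped at the maximum of 20.
puts desired_instances(500, 20, 1, 20)
```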
Now check your SQS Queues. You should now see the number of jobs in the output queue slowly increase as more jobs from the input queue are successfully processed.

NOTE: The job producer and job consumer can run continuously.
Eventually, all of the jobs will be processed and the server array will resize down to 1 worker instance.
Congratulations! You've just created a basic RightGrid application and have seen how powerful and easy it is to use cloud computing resources for more efficient batch processing tasks.