Use Cascading with QDS
Cascading is a platform for developing big data applications on Hadoop, and it offers advantages over other MapReduce-based tools. Build an uber jar (that is, a jar with the Cascading classes bundled into it), wordcount-1.0-SNAPSHOT.jar, from the WordCount example; the build places it in the target directory. Upload that jar to an Amazon S3 location. After uploading it, run a Hadoop command on QDS with the following parameters:
Job Type:
Custom Jar
Path to Jar File:
<s3_location>/wordcount-1.0-SNAPSHOT.jar
Arguments:
com.qubole.cascading.WordCount
s3://paid-qubole/default-datasets/gutenberg/
s3://<output_location>/
The final data is present in s3://<output_location>/.
Compose a Hadoop Custom Jar Query describes how to compose a Hadoop custom jar command using the Analyze UI query composer, and Submit a Hadoop Jar Command describes how to submit a Hadoop jar command through a REST API call.
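As a rough sketch of the REST API route, the same job could be submitted with a JSON payload like the one below. The endpoint URL, the X-AUTH-TOKEN header, and the JSON field names are assumptions about the general shape of the QDS commands API; verify them against Submit a Hadoop Jar Command before use.

```shell
# Assemble the command payload. The jar path and arguments mirror the
# parameters listed above; <s3_location> and <output_location> are the
# same placeholders you filled in earlier.
cat > payload.json <<'EOF'
{
  "command_type": "HadoopCommand",
  "sub_command": "jar",
  "sub_command_args": "<s3_location>/wordcount-1.0-SNAPSHOT.jar com.qubole.cascading.WordCount s3://paid-qubole/default-datasets/gutenberg/ s3://<output_location>/"
}
EOF

# Submit the command (AUTH_TOKEN is your QDS API token); endpoint and
# header names are assumed, not taken from this page:
# curl -s -X POST https://api.qubole.com/api/v1.2/commands \
#      -H "X-AUTH-TOKEN: $AUTH_TOKEN" \
#      -H "Content-Type: application/json" \
#      -d @payload.json
```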
To create your own Cascading applications on QDS, use the dependencies listed in the pom.xml file of the WordCount example. The README file in the WordCount example also describes how to run the Cascading job.
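As a rough illustration of what that pom.xml typically declares (the version numbers and repository URL below are assumptions; copy the authoritative coordinates from the WordCount example itself), the Cascading dependencies look like:

```xml
<!-- Sketch only: take the exact versions from the WordCount example's pom.xml -->
<repositories>
  <repository>
    <id>conjars.org</id>
    <url>http://conjars.org/repo</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-core</artifactId>
    <version>2.5.2</version> <!-- assumed version -->
  </dependency>
  <dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-hadoop</artifactId>
    <version>2.5.2</version> <!-- assumed version -->
  </dependency>
</dependencies>
```

Bundling these into the uber jar (for example, with the Maven shade or assembly plugin) is what lets the Custom Jar command run without Cascading being preinstalled on the cluster.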