Composing a Hadoop Job

Use the command composer on the Analyze page to compose a Hadoop job. See the Quick Start Guide for an example.

You can use the command composer for these types of Hadoop job: custom jar, streaming, and DistCp. Each type is described in its own section below.

Note

Hadoop 1, Hadoop 2, and Presto clusters support Hadoop job queries. See Mapping of Cluster and Command Types for more information.

Ensure that the output directory does not already exist before you run a Hadoop job.

Qubole has deprecated Hadoop 1 as-a-service. For more information, see Hadoop 1 is Deprecated.
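One way to satisfy the output-directory requirement in the note above is to derive a new path for every run. The following is a minimal sketch, not a QDS feature: the helper name `fresh_output_dir`, the bucket `example-bucket`, and the timestamp layout are all hypothetical choices made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def fresh_output_dir(base: str, job_name: str, now: Optional[datetime] = None) -> str:
    """Return an output path that changes on every run (to the second).

    `base` and `job_name` are placeholders; substitute your own location.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")  # e.g. 20200102T030405Z
    return f"{base.rstrip('/')}/{job_name}/{stamp}"


# Hypothetical bucket and job name, for illustration only.
print(fresh_output_dir("s3://example-bucket/output", "wordcount"))
```

Two runs started within the same second would still collide, so add a finer-grained suffix if jobs are launched in quick succession.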

Compose a Hadoop Custom Jar Query

Note

For the supported keyboard shortcuts, see Using the Supported Keyboard Shortcuts in Analyze. New QDS accounts include example saved queries of different command types. For more information, see Workspace Tab.

Perform the following steps to compose a Hadoop jar query:

  1. Navigate to the Analyze page and click Compose. Select Hadoop Job from the Command Type drop-down list. Custom Jar is selected by default in the Job Type drop-down list; keep that selection.

  2. In the Path to Jar File field, specify the path of the directory that contains the Hadoop jar file.

  3. In the Arguments text field, specify the main class, generic options, and other JAR arguments. The following figure shows an example using AWS S3:

    [Figure: ComposeHadoopJob.png]

  4. Click Run to execute the query. Click Save if you want to re-run the same query later. (See Workspace for more information on saving queries.)
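As a rough illustration of step 3, the sketch below assembles an Arguments string in the order the step lists: main class, then generic options, then the remaining JAR arguments. The class name `com.example.WordCount`, the bucket paths, and the helper `custom_jar_arguments` are hypothetical. `-D property=value` is the standard Hadoop generic-option form, and it only takes effect if the main class parses generic options (for example through `ToolRunner`).

```python
from typing import Dict, List


def custom_jar_arguments(main_class: str, generic: Dict[str, str], job_args: List[str]) -> str:
    """Join main class, -D generic options, and job arguments with spaces."""
    parts = [main_class]
    for key, value in generic.items():
        parts += ["-D", f"{key}={value}"]  # Hadoop generic option
    parts += job_args                      # application-specific arguments
    return " ".join(parts)


# All names and paths below are placeholders.
arguments = custom_jar_arguments(
    "com.example.WordCount",
    {"mapreduce.job.reduces": "2"},
    ["s3://example-bucket/input/", "s3://example-bucket/output/wordcount/run-001"],
)
print(arguments)
```

The printed string is what you would paste into the Arguments text field. The sketch does no quoting, so it assumes that no individual argument contains spaces.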

For REST API-related information, see Submitting a Hadoop Jar Command. For developing applications, see Use Cascading with QDS.

You can see the result under the Results tab, and the logs under the Logs tab. The Logs tab has an Errors and Warnings filter. For more information on how to download command results and logs, see Download Results and Logs from the Analyze UI.

Compose a Hadoop Streaming Query

Perform the following steps to compose a Hadoop streaming job query:

  1. Navigate to the Analyze page and click Compose.
  2. Select Hadoop Job from the Command Type drop-down list.
  3. Select Streaming from the Job Type drop-down list.
  4. In the Arguments field, specify the streaming and generic options.
  5. Click Run to execute the query. Click Save if you want to re-run the same query later. (See Workspace for more information on saving queries.)
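The Arguments value in step 4 might be put together along the following lines. This is a sketch under stated assumptions: the helper `streaming_arguments`, the script names, and the bucket paths are hypothetical. The flags `-files`, `-D`, `-input`, `-output`, `-mapper`, and `-reducer` are standard Hadoop Streaming options, and Hadoop Streaming expects generic options to come before streaming options.

```python
from typing import List


def streaming_arguments(input_path: str, output_path: str, mapper: str,
                        reducer: str, files: List[str], reduces: int) -> str:
    """Build a streaming argument string: generic options first, then streaming options."""
    generic = ["-D", f"mapreduce.job.reduces={reduces}"]
    if files:
        # -files ships the listed scripts to each task's working directory.
        generic += ["-files", ",".join(files)]
    streaming = ["-input", input_path, "-output", output_path,
                 "-mapper", mapper, "-reducer", reducer]
    return " ".join(generic + streaming)


# Placeholder paths and script names, for illustration only.
arguments = streaming_arguments(
    input_path="s3://example-bucket/input/",
    output_path="s3://example-bucket/output/stream/run-001",
    mapper="mapper.py",
    reducer="reducer.py",
    files=["s3://example-bucket/scripts/mapper.py",
           "s3://example-bucket/scripts/reducer.py"],
    reduces=1,
)
print(arguments)
```

As with any Hadoop job, the `-output` path must not exist before the run.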

You can see the result under the Results tab, and the logs under the Logs tab. The Logs tab has an Errors and Warnings filter. For more information on how to download command results and logs, see Download Results and Logs from the Analyze UI.

Compose a Hadoop DistCp Command

Perform the following steps to compose a Hadoop DistCp command:

  1. Navigate to the Analyze page and click + Create. Select Hadoop Job from the Command Type drop-down list.
  2. From the Job Type drop-down list, select s3distcp for AWS, or clouddistcp for Azure or Oracle.
  3. In the Arguments text field, specify the generic and DistCp options.
  4. Click Run to execute the command. Click Save if you want to re-run the same command later. (See Workspace for more information on saving queries.)
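For the s3distcp job type, the Arguments value in step 3 could be assembled as sketched below. The helper `s3distcp_arguments` and the paths are hypothetical, and the sketch assumes the job type accepts the `--src` and `--dest` options used by S3DistCp. The options for clouddistcp are not covered here; check the product documentation for them.

```python
from typing import Dict, Optional


def s3distcp_arguments(src: str, dest: str,
                       generic: Optional[Dict[str, str]] = None) -> str:
    """Build an argument string: generic options first, then DistCp options."""
    parts = []
    for key, value in (generic or {}).items():
        parts += ["-D", f"{key}={value}"]  # Hadoop generic option
    parts += ["--src", src, "--dest", dest]  # assumed s3distcp options
    return " ".join(parts)


# Placeholder source and destination, for illustration only.
arguments = s3distcp_arguments("s3://example-bucket/logs/", "/datasets/logs")
print(arguments)
```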

You can see the result under the Results tab, and the logs under the Logs tab. The Logs tab has an Errors and Warnings filter. For more information on how to download command results and logs, see Download Results and Logs from the Analyze UI.