Composing a Hadoop Job

Use the command composer on the Analyze page to compose a Hadoop job.

You can use the query composer for these types of Hadoop job: custom jar, streaming, and DistCp. Each type is described in its own section below.

Note

Before running a Hadoop job, make sure that the output directory does not already exist.

Hadoop 2 and Spark clusters support Hadoop job queries. See Mapping of Cluster and Command Types for more information.

See running-hadoop-job for an example.
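One way to satisfy the note about the output directory is to generate a new path for every run. The following is a minimal sketch, not part of QDS; the function name `fresh_output_dir` and the bucket `example-bucket` are hypothetical, and the sketch only builds a path string without checking the storage system.

```python
import uuid
from datetime import datetime, timezone


def fresh_output_dir(base: str) -> str:
    """Return a path under `base` that is very unlikely to exist already.

    The path combines a UTC timestamp with a short random suffix, so two
    runs started in the same second still get different directories.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    suffix = uuid.uuid4().hex[:8]
    return f"{base.rstrip('/')}/{stamp}-{suffix}"


# "example-bucket" is a placeholder, not a real location.
print(fresh_output_dir("s3://example-bucket/hadoop-output"))
```

The generated path can then be used as the output argument of the job.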

Compose a Hadoop Custom Jar Query

Perform the following steps to compose a Hadoop jar query:

Note

Using the Supported Keyboard Shortcuts in Analyze describes the supported keyboard shortcuts. In new QDS accounts, QDS provides example saved queries of different command types. For more information, see Workspace Tab.

  1. Navigate to the Analyze page and click Compose. Select Hadoop Job from the Command Type drop-down list. Custom Jar is selected by default in the Job Type drop-down list; leave it selected.

  2. In the Path to Jar File field, specify the path of the directory that contains the Hadoop jar file.

  3. In the Arguments text field, specify the main class, generic options, and other JAR arguments. The following figure shows an example using AWS S3:

     [Figure: ComposeHadoopJob.png, an example Hadoop custom jar query using AWS S3]

  4. Click Run to execute the query. Click Save if you want to re-run the same query later. (See Workspace for more information on saving queries.)

You can see the result under the Results tab, and the logs under the Logs tab. For more information on how to download command results and logs, see Downloading Results and Logs.
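The order of the Arguments field described in step 3 (main class, then generic options, then other JAR arguments) can be sketched as a small helper. This is an illustration only: `jar_arguments`, the class `com.example.WordCount`, and the `example-bucket` paths are hypothetical, and the sketch assumes that the field takes a single space-separated string.

```python
def jar_arguments(main_class, generic_options, job_args):
    """Build an Arguments string: main class, generic options, job arguments."""
    parts = [main_class]
    for key, value in generic_options.items():
        # -D key=value is Hadoop's generic option for setting a property.
        parts += ["-D", f"{key}={value}"]
    parts += list(job_args)
    for part in parts:
        # This sketch does no quoting, so reject empty values and values
        # that would be split apart on whitespace.
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"argument is empty or needs quoting: {part!r}")
    return " ".join(parts)


print(jar_arguments(
    "com.example.WordCount",                 # hypothetical main class
    {"mapreduce.job.reduces": "2"},          # generic option passed as -D
    ["s3://example-bucket/input", "s3://example-bucket/output/run-001"],
))
```

Generic options such as `-D` take effect only if the main class parses them, for example by running through Hadoop's `ToolRunner`.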

For information on the REST API, see Submitting a Hadoop Jar Command. For information on using Cascading to develop applications, see Use Cascading with QDS.
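As a rough illustration of what a REST submission might carry, the sketch below only builds a JSON request body and sends nothing. The field names `command_type`, `sub_command`, and `sub_command_args`, and the convention that the jar path comes first in the arguments, are assumptions here; check them against Submitting a Hadoop Jar Command before relying on them. The helper name and the paths are hypothetical.

```python
import json


def hadoop_jar_payload(jar_path, arguments):
    """Build a JSON body for a Hadoop jar command (field names are assumed)."""
    payload = {
        "command_type": "HadoopCommand",  # assumption: verify in the API docs
        "sub_command": "jar",             # assumption: verify in the API docs
        # Assumption: the jar path is followed by the same text that would
        # go in the Arguments field of the composer.
        "sub_command_args": f"{jar_path} {arguments}".strip(),
    }
    return json.dumps(payload, sort_keys=True)


print(hadoop_jar_payload(
    "s3://example-bucket/jars/wordcount.jar",  # placeholder path
    "com.example.WordCount s3://example-bucket/input s3://example-bucket/output/run-001",
))
```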

Compose a Hadoop Streaming Query

Perform the following steps to compose a Hadoop streaming query:

  1. Navigate to the Analyze page and click Compose.
  2. Select Hadoop Job from the Command Type drop-down list.
  3. Select Streaming from the Job Type drop-down list.
  4. In the Arguments field, specify the streaming and generic options.
  5. Click Run to execute the query. Click Save if you want to re-run the same query later. (See Workspace for more information on saving queries.)

You can see the result under the Results tab, and the logs under the Logs tab. For more information on how to download command results and logs, see Downloading Results and Logs.
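The Arguments string for a streaming job can be assembled as in the following sketch. Hadoop streaming expects generic options (such as `-files` or `-D`) to come before streaming options (such as `-mapper` or `-input`), so the helper keeps the two groups separate and checks them. The helper name, script names, and `example-bucket` paths are hypothetical, and the sketch assumes that the Arguments field takes a single space-separated string.

```python
# Generic options recognized by Hadoop's GenericOptionsParser.
GENERIC_FLAGS = {"-conf", "-D", "-fs", "-jt", "-files", "-libjars", "-archives"}


def streaming_arguments(generic_options, streaming_options):
    """Join (flag, value) pairs, placing generic options first."""
    for flag, _ in generic_options:
        if flag not in GENERIC_FLAGS:
            raise ValueError(f"{flag} is not a Hadoop generic option")
    for flag, _ in streaming_options:
        if flag in GENERIC_FLAGS:
            raise ValueError(f"{flag} is a generic option and must come first")
    pairs = list(generic_options) + list(streaming_options)
    return " ".join(f"{flag} {value}" for flag, value in pairs)


print(streaming_arguments(
    [("-files", "s3://example-bucket/scripts/mapper.py,s3://example-bucket/scripts/reducer.py")],
    [
        ("-mapper", "mapper.py"),
        ("-reducer", "reducer.py"),
        ("-input", "s3://example-bucket/input"),
        ("-output", "s3://example-bucket/output/run-001"),
    ],
))
```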

Compose a Hadoop DistCp Command

Perform the following steps to compose a Hadoop DistCp command:

  1. Navigate to the Analyze page and click Compose. Select Hadoop Job from the Command Type drop-down list.
  2. From the Job Type drop-down list, select s3distcp for AWS, or clouddistcp for Azure, GCP, or Oracle.
  3. In the Arguments text field, specify the generic and DistCp options.
  4. Click Run to execute the command. Click Save if you want to re-run the same command later. (See Workspace for more information on saving queries.)

You can see the result under the Results tab, and the logs under the Logs tab. For more information on how to download command results and logs, see Downloading Results and Logs.
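The choice of job type in step 2 can be written down as a lookup, as in the sketch below. The mapping follows the step as stated; the helper names are hypothetical. The `--src` and `--dest` flags in `distcp_arguments` are an assumption about the DistCp options and should be checked against the DistCp documentation for your cloud.

```python
# Job type per cloud, as listed in step 2 of the procedure.
JOB_TYPE_BY_CLOUD = {
    "aws": "s3distcp",
    "azure": "clouddistcp",
    "gcp": "clouddistcp",
    "oracle": "clouddistcp",
}


def distcp_job_type(cloud):
    """Return the Job Type to select for the given cloud provider."""
    try:
        return JOB_TYPE_BY_CLOUD[cloud.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported cloud: {cloud!r}") from None


def distcp_arguments(src, dest):
    """Build an Arguments string (ASSUMPTION: --src and --dest are accepted)."""
    return f"--src {src} --dest {dest}"


print(distcp_job_type("AWS"))
print(distcp_arguments("s3://example-bucket/input", "s3://example-bucket/copy"))
```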