Running a Mahout Job

This Quick Start Guide is for users who want to run Mahout jobs on Qubole Data Service (QDS).

Example Mahout Job

This example uses a simple recommender job. To make it easy for Qubole users to try, the required data and code are provided in a publicly accessible S3 bucket:

  • Input Data: s3://paid-qubole/mahout/links-converted.txt and s3://paid-qubole/mahout/users.txt
  • Jar File: s3://paid-qubole/mahout/mahout-core-0.7-job.jar (the Mahout 0.7 job jar)

Running Mahout Jobs from Analyze

Perform the following steps to run a Mahout job:

  1. Navigate to the Analyze page from the top menu and select the Compose tab.
  2. From the Command Type drop-down list, select Hadoop Job.
  3. In the Path to Jar File text field, specify the location of the job JAR file (in this case, s3://paid-qubole/mahout/mahout-core-0.7-job.jar).
  4. In the Arguments text field, specify the arguments to the JAR file. For this example, the arguments are:

     -Dmapred.output.dir=hdfs:///tmp/mo1 --usersFile s3://paid-qubole/mahout/users.txt
     --booleanData -s SIMILARITY_LOGLIKELIHOOD --tempDir hdfs:///tmp/mo1-inter

  5. Click Run to execute the job. The status of the job is displayed in the Results tab.
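For reference, the Compose fields above correspond roughly to the hadoop jar invocation sketched below. This is a sketch only: it requires a Hadoop cluster with the job jar available, and the RecommenderJob driver class name is an assumption (Mahout 0.7 ships org.apache.mahout.cf.taste.hadoop.item.RecommenderJob) — this guide does not state which class QDS invokes.

```shell
# Sketch only -- needs a Hadoop cluster; the driver class below is an
# assumption, not stated in this guide.
hadoop jar mahout-core-0.7-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.output.dir=hdfs:///tmp/mo1 \
  --usersFile s3://paid-qubole/mahout/users.txt \
  --booleanData \
  -s SIMILARITY_LOGLIKELIHOOD \
  --tempDir hdfs:///tmp/mo1-inter
```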


You can provide an output location in a bucket that you own.
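For example, to write the results to a bucket you own rather than to HDFS, change the output-directory argument accordingly (my-bucket below is a hypothetical bucket name):

```shell
-Dmapred.output.dir=s3://my-bucket/mahout/mo1
```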

Congratulations! You have executed your first Mahout command using QDS.

You can also view the output of the Mahout job above by running a shell command. In the query composer on the Analyze page, select Shell Command from the Command Type drop-down list, enter hadoop dfs -cat /tmp/mo1/part* in the Bash Commands text field, and click Run to execute the command.
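Each line that hadoop dfs -cat prints is one user's recommendations. In Mahout 0.7, RecommenderJob emits a user ID, a tab, and a bracketed list of itemID:score pairs; the sample line below is illustrative, not real output from this job:

```shell
# Illustrative RecommenderJob output line: userID <TAB> [itemID:score,...]
line='3	[103:4.5,102:4.1,101:3.9]'
# Extract the user ID, i.e. the field before the tab:
printf '%s\n' "$line" | cut -f1   # prints: 3
```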

Further documentation is available at our Documentation home page.