Running a Mahout Job¶
This Quick Start Guide is for users who want to run Mahout jobs on Qubole Data Service (QDS).
Example Mahout Job¶
This example uses a simple recommender job. To make it easy to follow, the required data and code are provided in a publicly accessible bucket:
- Input Data: s3://paid-qubole/mahout/links-converted.txt and s3://paid-qubole/mahout/users.txt
- Jar File: s3://paid-qubole/mahout/mahout-core-0.7-job.jar (the Mahout job JAR, version 0.7)
Running Mahout Jobs from Analyze¶
Perform the following steps to run a Mahout job:
- Navigate to the Analyze page from the top menu and select the Compose tab.
- In Command Type, select Hadoop Job from the drop-down list.
- Specify the location of the job JAR file in the Path to Jar File text field (in this case, s3://paid-qubole/mahout/mahout-core-0.7-job.jar).
- Specify the arguments to the JAR file in the Arguments text field. For this example, the arguments are:
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=s3://paid-qubole/mahout/links-converted.txt -Dmapred.output.dir=hdfs:///tmp/mo1 --usersFile s3://paid-qubole/mahout/users.txt --booleanData -s SIMILARITY_LOGLIKELIHOOD --tempDir hdfs:///tmp/mo1-inter
- Click Run to execute the job. The status of the job is displayed in the Results tab.
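The Analyze steps above correspond roughly to a single hadoop jar invocation. The sketch below only assembles and prints that command rather than executing it, since actually running it requires a cluster where the `hadoop` CLI is on the PATH:

```shell
# Example bucket paths from this guide.
JAR=s3://paid-qubole/mahout/mahout-core-0.7-job.jar
MAIN_CLASS=org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

# Assemble the command piece by piece; each -D option and flag mirrors
# the Arguments field shown above.
hadoop_cmd="hadoop jar $JAR $MAIN_CLASS"
hadoop_cmd="$hadoop_cmd -Dmapred.input.dir=s3://paid-qubole/mahout/links-converted.txt"
hadoop_cmd="$hadoop_cmd -Dmapred.output.dir=hdfs:///tmp/mo1"
hadoop_cmd="$hadoop_cmd --usersFile s3://paid-qubole/mahout/users.txt"
hadoop_cmd="$hadoop_cmd --booleanData"
hadoop_cmd="$hadoop_cmd -s SIMILARITY_LOGLIKELIHOOD"
hadoop_cmd="$hadoop_cmd --tempDir hdfs:///tmp/mo1-inter"

# Print the full command; on a cluster you could run it with: eval "$hadoop_cmd"
echo "$hadoop_cmd"
```

Here `--booleanData` tells the recommender to ignore preference values, and `-s SIMILARITY_LOGLIKELIHOOD` selects the log-likelihood similarity measure.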
Instead of the HDFS paths used above, you can provide an output location in an S3 bucket that you own.
Congratulations! You have executed your first Mahout command using QDS.
After the job completes, you can view its output by running a shell command. In the query composer of the Analyze page, select Shell Command from the Command Type drop-down list, enter the bash command hadoop dfs -cat /tmp/mo1/part* in the Bash Commands text field, and click Run.
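The same check can be scripted; note that `hadoop fs` is the current spelling of the deprecated `hadoop dfs`. As before, this sketch only prints the command, since running it needs cluster access:

```shell
# Where the RecommenderJob wrote its results (see -Dmapred.output.dir above).
OUTPUT_DIR=hdfs:///tmp/mo1

# `hadoop fs -cat` replaces the deprecated `hadoop dfs -cat`; the part*
# glob matches every reducer output file in the output directory.
view_cmd="hadoop fs -cat $OUTPUT_DIR/part*"
echo "$view_cmd"
```

Each line of the output lists a user ID followed by that user's recommended items.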
Further documentation is available at our Documentation home page.