How Sessions Work
Hive allows you to embed code (Python scripts, shell scripts, Java functions) in SQL queries. This is a way to add functionality that is not natively present in HiveQL. See this Hive page for more information and examples.
Qubole simulates this functionality by associating every command in QDS with a session. Sessions allow you to create temporary data sets, configure parameters which can be used to tune query behavior, and add your code as scripts to QDS to run transformations within HiveQL. These parameters, datasets, and user-defined transformations are active only for the session. A session’s duration is configurable; the default is two hours.
To ensure that scripts are accessible to Qubole’s Hive clusters, upload them to your Cloud storage. A script can be in any Cloud location that is readable using the keys associated with the account, but Qubole recommends that you place scripts in the default location’s scripts folder. Once you have uploaded a script, add it to your session by means of an add file … command.
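As a rough illustration of the naming convention described above, the following sketch builds the URI of a script stored in the default location's scripts folder and the matching add file statement. The helper names (script_uri, add_file_statement) and the location s3://example-bucket/defloc/ are hypothetical and are not part of QDS. The actual layout under your default location may differ.

```python
def script_uri(default_location, script_name):
    """Return the URI of a script kept in the default location's scripts folder.

    Assumes the recommended layout <default location>/scripts/<script name>.
    """
    return default_location.rstrip("/") + "/scripts/" + script_name


def add_file_statement(uri):
    """Return the statement that adds an uploaded script to the session."""
    return "add file " + uri + ";"


# Hypothetical default location, used only for illustration.
uri = script_uri("s3://example-bucket/defloc/", "test.py")
print(add_file_statement(uri))
```

The printed statement can be run ahead of any query that calls the script.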
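For example, consider a simple Python script, test.py:
#!/usr/bin/python
import sys
# Read rows from standard input and echo each one back unchanged.
line = sys.stdin.readline()
while line:
    # Each line keeps its trailing newline, so write it as-is
    # (this form runs under both Python 2 and Python 3).
    sys.stdout.write(line)
    line = sys.stdin.readline()
sys.exit(0)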
- If the default AWS location is
- If the default Azure location is
- If the default Oracle OCI location is
Then add the script to the session and use it in a query; for example (AWS):
add file s3n://prod.qubole.com/ec2-user_hu_6/scripts/test.py;
select count(*) from (select transform(a) using 'python test.py' from tb) V
or (Azure):
add file wasb://email@example.com/scripts/test.py;
select count(*) from (select transform(a) using 'python test.py' from tb) V
or (Oracle OCI):
add file oci://<bucket>@qubole/defloc/scripts/hadoop/test.py;
select count(*) from (select transform(a) using 'python test.py' from tb) V
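A transform script usually does more than echo its input. The sketch below assumes Hive's default TRANSFORM serialization, in which each row arrives on standard input as one line of tab-separated columns and each output line is read back as one row. The function names (transform_row, transform_stream) and the choice of upper-casing the first column are purely illustrative.

```python
def transform_row(line):
    """Upper-case the first tab-separated field of one input row."""
    fields = line.rstrip("\n").split("\t")
    fields[0] = fields[0].upper()
    return "\t".join(fields)


def transform_stream(lines, out):
    """Apply transform_row to every input row, writing one output row each."""
    for line in lines:
        out.write(transform_row(line) + "\n")


# In the uploaded script, the entry point would be:
#     import sys
#     transform_stream(sys.stdin, sys.stdout)
```

Keeping the per-row logic in its own function makes it possible to check the script locally with a few sample lines before uploading it and adding it to a session.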
Creating and Managing Sessions
Navigate to the Sessions page in the Control Panel.
To create a session, click the Add icon at the top right corner of the Sessions page.
A dialog appears.
Select the cluster on which you want to create a session from the drop-down list and click Create Session. A session is created.
The Sessions page contains six columns:
Id: The session ID.
Cluster Id: The ID of the cluster on which the session was created.
Start Time: The start time of the session.
Duration: The default duration is two hours. You can change the number of hours to any value between 1 and 6.
Commands: The number of commands that have been run. To see the commands, click the number. This takes you to the Compose tab of the Analyze page.
Action: Click the down arrow in the Action column to see a list of actions:
You can perform the following operations on an existing session:
View Commands: Select this to see the commands that are running in the session; the Session Details dialog appears. Click the down arrow in the Action column of a command to go to the Analyze page or to Remove the command.
Change Duration: Select this to change the session duration. The Set Duration dialog appears:
The default is two hours. To change it, enter a new value between 1 and 6, and click OK. Click Cancel to restore the previous setting.
Deactivate: Select this to deactivate a session. You can re-activate a deactivated session within two hours. To re-activate a deactivated session, click the down arrow in the Action column and click Activate.
Delete: Select this to delete the session.