Using QDS Package Management

Note

Package management is not enabled by default. Create a ticket with Qubole Support to enable it.

If package management is enabled, you can use the Environments page in the Control Panel of the QDS UI to manage Python and R packages in Spark applications. In addition, QDS automatically attaches an environment with Python version 3.5 to an Airflow 1.8.2 cluster.

The QDS package manager provides:

  • R and Python version selection through the QDS UI to create an environment.
  • An environment loaded with default Anaconda packages. You can install additional Python and R packages via the Environments page.
  • An environment loaded with the CRAN package repo, which you can use to install only R packages. See Adding a Python or R Package for more information.
  • Distributed installation of packages in a running Spark application or Airflow workflow.

Use the Environments tab for the tasks described in the sections below.

Package Management Environment API provides a list of APIs for creating, editing, cloning, and viewing an environment, and for attaching a cluster to an environment.

Creating an Environment

Navigate to the Environments page in the Control Panel and choose New to create a new environment. The following dialog appears:

../../_images/Environment.png

To create an environment, perform these steps:

  1. Name the environment.
  2. Provide a description for the environment.
  3. Select the Python Version. Python 2.7 is the default version; Python 3.5 is the other supported version.
  4. Currently, only R Version 3.3 is supported on Qubole.
  5. Click Create. A new environment is created and displayed:
../../_images/NewEnv.png
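
As a rough sketch of the choices made in the steps above, the following Python helper checks the values against the versions this page lists (Python 2.7 or 3.5, R 3.3) and collects them in a dictionary. The function name and the dictionary keys are hypothetical and chosen only for illustration; they are not taken from the Package Management Environment API, so consult that reference for the actual request format.

```python
# Versions listed on this page as supported.
SUPPORTED_PYTHON = ("2.7", "3.5")
SUPPORTED_R = ("3.3",)


def build_environment_request(name, description="",
                              python_version="2.7", r_version="3.3"):
    """Validate environment settings and return them as a dict.

    The keys below are illustrative placeholders, not confirmed API fields.
    """
    if not name or not name.strip():
        raise ValueError("an environment name is required")
    if python_version not in SUPPORTED_PYTHON:
        raise ValueError("unsupported Python version: %s" % python_version)
    if r_version not in SUPPORTED_R:
        raise ValueError("unsupported R version: %s" % r_version)
    return {
        "name": name.strip(),
        "description": description,
        "python_version": python_version,  # 2.7 is the default in the UI
        "r_version": r_version,
    }


if __name__ == "__main__":
    print(build_environment_request("ml-env", "demo", python_version="3.5"))
```

Validating before submitting makes an unsupported version fail early instead of surfacing later as a creation error.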

By default, a newly created environment contains the Anaconda distribution of R and Python packages and a list of pre-installed Python and R packages. Click See list of pre-installed packages to view them. See also Viewing the List of Pre-installed Python and R Packages.

You can also edit or clone an environment, as described under Editing an Environment and Cloning an Environment.

Attaching a Cluster to an Environment

Click Edit next to Cluster Attached to attach an environment to a cluster. After you click Edit, a drop-down list of available Spark or Airflow clusters appears, for example:

../../_images/AttachClustertoEnv.png

You can attach an environment only to a cluster that is down. You can attach only one cluster to an environment.

Select the cluster that you want to attach to the environment and click Attach Cluster.

You can attach environments to Spark clusters. A Conda virtual environment gets created for Python and R environments. In the Spark cluster, Python and R Conda environments are located in /usr/lib/envs/. The spark.pyspark.python configuration in /usr/lib/spark/conf/spark-defaults.conf points to the Python version installed in the Conda virtual environment for a Spark cluster.
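
To confirm where spark.pyspark.python points on a cluster node, you could read spark-defaults.conf and check whether the value lies under /usr/lib/envs/. The sketch below assumes the file uses whitespace-separated "property value" lines; the helper names and the environment directory name in the sample text are made up for illustration.

```python
ENV_ROOT = "/usr/lib/envs/"


def parse_spark_defaults(text):
    """Parse spark-defaults.conf content into a {property: value} dict.

    Assumes whitespace-separated 'property value' lines; blank lines
    and lines starting with '#' are skipped.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def python_in_env(props, env_root=ENV_ROOT):
    """Return True if spark.pyspark.python is set to a path under env_root."""
    path = props.get("spark.pyspark.python")
    return path is not None and path.startswith(env_root)


# Sample content; the directory name 'env-1234' is a hypothetical placeholder.
SAMPLE = """
# illustrative excerpt
spark.master yarn
spark.pyspark.python /usr/lib/envs/env-1234/bin/python
"""

if __name__ == "__main__":
    print(python_in_env(parse_spark_defaults(SAMPLE)))
```

On a real node, the text would come from reading /usr/lib/spark/conf/spark-defaults.conf rather than from the sample string.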

In a Spark notebook associated with a cluster that is attached to a package management environment, set these properties in the interpreter settings to point to the virtual environment:

  • Set zeppelin.R.cmd to cluster_env_default_r
  • Set zeppelin.pyspark.python to cluster_env_default_py

You can also attach an environment with Python 3.5 to a cluster running Airflow 1.8.2; any previous environment must be detached from the cluster first. For more information, see Configuring an Airflow Cluster.

To detach a cluster from an environment, click the Delete icon next to the cluster ID. The cluster must be down. If you detach a Python 3.5 environment from an Airflow 1.8.2 cluster, you must attach another Python 3.5 environment or the cluster will not start.

Adding a Python or R Package

A newly created environment contains the Anaconda distribution of R and Python packages by default. An environment also supports the conda-forge channel, which provides more packages. Questions about Package Management provides answers to questions related to adding packages.

To add Python or R packages, click Add next to Packages in a specific environment. The Add Packages dialog appears as shown here.

../../_images/AddPackages.png

Perform these steps:

  1. By default, the Source shows Py Packages. You can choose R Packages as the source from the list to install an R package.

  2. Two input modes are supported for adding packages: Simple and Advanced. Simple mode is the default input mode; in it, enter the name of the package in the Name field.

    As you type the name, an autocomplete list appears, from which you can add the package name. The version is optional and can be entered incrementally, as shown here.

    ../../_images/SimpleModePackage.png ../../_images/SimpleModePackage2.png

    If you specify only the package name, the latest version of the package is installed.

    Note

    If you upgrade or downgrade a Python package, the changed version is reflected only after you restart the Spark interpreter. Interpreter Operations lists the restart and other Spark interpreter operations.

    If you choose the Advanced mode, suggestions are shown, and as you start typing the package name, an autocomplete list appears as shown here.

    ../../_images/AutoCompletePackage.png

    In the Advanced mode, you can add multiple package names as a comma-separated list. You can also specify a version of a package, for example, numpy==1.1. To downgrade a package, specify the version number to which you want to downgrade. If you specify only the package name, the latest version of the package is installed.

    Qubole supports adding an R package from the CRAN package repo. This feature is not available by default. Create a ticket with Qubole Support to enable it on a QDS account. You can add an R package from the CRAN package repo only in the Advanced mode. To add an R package from the CRAN package repo, follow these steps:

    1. Click Add Package.
    2. Select R Packages as the Source.
    3. In the CRAN Packages field, enter a comma-separated list of R package names. You can also install packages from Conda Packages at the same time. Both the Conda Packages and the CRAN Packages text fields accept a comma-separated list of packages.

    Here is an example of the UI dialog to add R Packages in the Advanced Mode.

    ../../_images/CranPackage-R.png

    After adding a Python or R package, click Add. The package is added, and its status is first shown as Installing, as shown here.

    ../../_images/Package-Initial.png

    After a while, the status appears as Installed as shown here.

    ../../_images/Package-Success.png
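
To illustrate the comma-separated format described for the Advanced mode, here is a small Python sketch that splits such a list into package names and optional versions. It handles only the two forms mentioned above (a bare name, and name==version as in numpy==1.1). The function name is hypothetical, and this is not how QDS itself parses the field.

```python
def parse_package_list(spec):
    """Split 'numpy==1.1, pandas' into [(name, version_or_None), ...].

    Only the bare-name and name==version forms are handled.
    """
    packages = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        name, sep, version = item.partition("==")
        name = name.strip()
        version = version.strip()
        if not name:
            raise ValueError("missing package name in %r" % item)
        if sep and not version:
            raise ValueError("missing version after '==' in %r" % item)
        # A bare name means "install the latest version".
        packages.append((name, version or None))
    return packages


if __name__ == "__main__":
    for name, version in parse_package_list("numpy==1.1, pandas"):
        print(name, version if version else "(latest)")
```

A version of None corresponds to the case where only the package name is given and the latest version is installed.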

Removing a Python or R Package

To remove a Python or R package from an environment, click the delete icon next to that package. Here is an example that shows the icon next to the installed package.

../../_images/Package-Success.png

Editing an Environment

You can edit an existing environment. In the left navigation bar, a gear (settings) icon appears when you hover over a specific environment. Click the icon to see these options.

../../_images/EnvironSettings.png

Click Edit and you can see the dialog as shown here.

../../_images/EditEnv.png

You can edit the name and description of an environment. After changing the name and/or description, click Edit. You can click Cancel if you do not want to edit the environment.

Cloning an Environment

When you want to use the same environment on a different cluster, clone it and attach the clone to that cluster. (An environment can be attached to only one cluster.) In the left navigation bar, a gear (settings) icon appears when you hover over a specific environment. Click the icon to see these options.

../../_images/EnvironSettings.png

Click Clone and you can see the dialog as shown here.

../../_images/CloneEnv.png

By default, the Name field is filled in with <environment name>-clone. You can retain that name or change it. You can also change the description. You cannot change application versions. After making the changes, click Clone. You can click Cancel if you do not want to clone the environment.

Managing Permissions of an Environment

Here, you can set permissions for an environment. By default, all users in a Qubole account have read access to the environment, but you can change the access. You can override the environment access that is granted at the account level in the Control Panel. If you are part of the system-admin group or any group that has full access to the Environments and Packages resource, you can manage permissions. For more information, see Managing Roles.

Set Object Policy for a Package Management Environment describes how to set the permissions through the REST API.

By default, a system-admin and the owner can manage the permissions of an environment. Perform the following steps to manage an environment’s permissions:

  1. Click the gear icon next to the environment and click Manage Permissions from the list of options (displayed here).

    ../../_images/EnvironSettings.png
  2. The dialog to manage permissions for a specific environment is displayed as shown in the following figure.

    ../../_images/ManagePerm-PM.png
  3. You can set the following environment-level permissions for a user or a group:

    • Read: Set it if you want to change a user/group’s read access to this specific environment.
    • Update: Set it if you want a user/group to have write privileges for this specific environment.
    • Delete: Set it if you want a user/group to be able to delete this specific environment.
    • Manage: Set it if you want a user/group to grant and manage access to other users/groups for accessing this specific environment.
  4. You can add any number of permissions to the environment by clicking Add Permission.

  5. You can click the delete icon against a permission to delete it.

  6. Click Save to apply the permissions to the user/group. Click Cancel to go back to the previous tab.

Deleting an Environment

You can delete an environment. In the left navigation bar, a gear (settings) icon appears when you hover over a specific environment. Click the icon to see these options.

../../_images/EnvironSettings.png

Click Delete to remove the environment.

Migrating Existing Interpreters to Use Package Management

Even after attaching a Spark cluster to an environment, existing Spark interpreters in the notebook keep using the system/virtualenv Python and system R. To use the environment, change Python and R interpreter property values in the existing interpreter to use Anaconda-specific Python and R. Change these interpreter property values:

  • Set zeppelin.R.cmd to cluster_env_default_r.
  • Set zeppelin.pyspark.python to cluster_env_default_py.

The interpreter automatically restarts after its properties change.
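
One way to check which Python an interpreter is actually using after the change is to run a short snippet in a notebook paragraph and inspect the executable path. The sketch below is an assumption-laden illustration: the helper name is hypothetical, it does a plain prefix match against /usr/lib/envs/ (the location given earlier for Conda environments), and it does not resolve symbolic links.

```python
import sys

ENV_ROOT = "/usr/lib/envs/"


def describe_interpreter(executable=None, env_root=ENV_ROOT):
    """Report the Python executable, its version, and whether it is under env_root."""
    path = executable or sys.executable or ""
    return {
        "executable": path,
        "version": "%d.%d" % tuple(sys.version_info[:2]),
        # Plain prefix check; symbolic links are not resolved.
        "in_environment": path.startswith(env_root),
    }


if __name__ == "__main__":
    print(describe_interpreter())
```

If in_environment is False for an interpreter you expected to migrate, the interpreter may still be using the system/virtualenv Python.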

However, a new Spark cluster (not a cloned cluster) that is attached to an environment has its default Spark interpreter set to the Anaconda-specific Python and R, that is, cluster_env_default_py and cluster_env_default_r. Similarly, a new interpreter on an existing cluster uses the Anaconda-specific Python and R.

Note

After a cluster is detached from an environment, the Spark interpreter (existing or new) falls back to system/virtualenv Python and system R.