Prerequisites for Data Visualization

Before using packages for data visualization, you must ensure that the required libraries are installed and the environment is set up appropriately.

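The procedure depends on whether the package management feature is enabled on your QDS account. If it is enabled, create an environment; if it is not, install the libraries through the cluster node bootstrap script.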

Creating an Environment

Create an environment, attach it to the cluster, and add the packages required for data visualization.

  1. From the Home menu, navigate to Control Panel >> Environments.

  2. Click New.

  3. In the Create New Environment dialog box, enter the name and the description in the respective fields.

  4. Select the appropriate Python and R versions in the respective drop-down lists, and click Create.

  5. Select the newly created environment. In the top-right corner, from the Cluster drop-down list, select the cluster that you want to attach to the environment.

    The following figure shows a sample environment that is created for package management.

    [Figure: sample environment (sample-env.png)]

  6. Click the See list of pre-installed packages link to view the list of pre-installed packages.

  7. If you want to add more packages or a different version of a pre-installed package, perform the following steps:

    1. Click +Add.

    2. In the Add Packages dialog box, from the Source drop-down list, select the required package source.

    3. Enter the name and version (optional) of the packages and click Add.

    The following figure shows a sample Add Packages dialog box.

    [Figure: sample Add Packages dialog box (sample-add-pkg.png)]

For more information, see Using the Default Package Management UI.
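
After you add packages to the environment, you may want to confirm which versions are actually installed. The following is a minimal sketch, not part of the QDS product: it assumes a Python 3.8 or later interpreter (for the standard importlib.metadata module), and installed_versions is a hypothetical helper name chosen for illustration.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this interpreter.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # pandas and plotly are example package names; substitute your own list.
    for name, version in installed_versions(["pandas", "plotly"]).items():
        print(name, version if version is not None else "not installed")
```

Run it as a script, or call installed_versions directly with the package names that you added in the Add Packages dialog box.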

Installing the Libraries

  1. From the Home menu, navigate to Clusters. Select the required cluster to view the settings.

  2. Verify that the appropriate Python version is set as the default for the cluster.

  3. If you want to change the default Python version, then add the following code to the Cluster node bootstrap script:

    source /usr/lib/hustler/bin/qubole-bash-lib.sh
    make-python<version>-system-default
    

    The following example shows how to set Python 2.7 as the default version for the cluster.

    source /usr/lib/hustler/bin/qubole-bash-lib.sh
    make-python2.7-system-default

  4. Add the following code to the Cluster node bootstrap script to install the libraries.

    pip install <library name>

    The following example shows how to install the Pandas and Plotly libraries.

    pip install pandas
    pip install 'plotly<=2.0'

  5. Navigate to Notebooks >> Interpreters. In the Interpreter settings, set zeppelin.default.interpreter to pyspark.
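
Once the cluster has restarted with the updated bootstrap script, a quick import check can confirm that the libraries are visible to the interpreter. The following is a minimal sketch, assuming it is run in a notebook paragraph that uses the pyspark interpreter; missing_modules is a hypothetical helper name, and the code uses only the standard importlib module, so it should work on both Python 2.7 and Python 3.

```python
import importlib


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# pandas and plotly are the libraries installed in the example above.
missing = missing_modules(["pandas", "plotly"])
if missing:
    print("Not importable: " + ", ".join(missing))
else:
    print("All libraries are importable.")
```

If a library is reported as not importable, one possible cause is that pip installed it for a Python version other than the cluster default.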