Using Deep Learning Notebooks

QDS supports adding and managing notebooks through both the UI and the REST API. For the Notebooks UI, see Notebooks; for managing notebooks through REST APIs, see Notebook API.

Python 3.5 is the default and supported version on Deep Learning notebooks. To use Python 2.7.13 instead, go to the interpreter settings and set the zeppelin.pyspark.python property in the user-level interpreter to /usr/lib/a-4.2.0-py-2.7.13-dl-gpu-full/bin/python.
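
After changing the property, you can confirm which Python binary the pyspark interpreter is using by running a quick check in a notebook paragraph (a minimal sketch; sys.version is standard Python):

    %pyspark
    # Print the version of the Python binary backing this interpreter.
    import sys
    print(sys.version)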

Python is the default language on Deep Learning clusters and notebooks. In addition to the pyspark interpreter, Deep Learning notebooks support other Spark interpreters such as scala and r.
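
A notebook paragraph selects its interpreter with a % prefix. A minimal pyspark sketch (sc is the SparkContext that Zeppelin injects into Spark paragraphs; the computation below is purely illustrative):

    %pyspark
    # Python is the default, so the %pyspark prefix is optional here.
    print(sc.parallelize(range(100)).sum())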

There are two ways of using Deep Learning:

  • Non-distributed Deep Learning: In this mode, each Deep Learning job uses the resources of a single slave node. Multiple users can run different jobs on the same cluster. The recommended way to launch Deep Learning jobs is through notebooks; see the first sketch after this list.

  • Distributed Deep Learning: This mode supports use cases where the data is too large for a single node to store and process. Currently, Qubole supports distributed mode only for TensorFlow, using Yahoo's open-source project TensorFlowOnSpark (TFOS); see the second sketch after this list. As dynamic allocation is disabled by default on Deep Learning clusters, set these properties in the Deep Learning notebook's Spark interpreter to enable it:

    • spark.dynamicAllocation.enabled = true
    • spark.executor.instances = <the number of executors you want>
    • spark.qubole.max.executors = <the same value as spark.executor.instances>

    In this mode, each executor takes an entire slave node. Ensure that the cluster's minimum and maximum node counts are set appropriately so that Deep Learning jobs can autoscale if required. For distributed mode, the maximum cluster size must be one more than the number of executors given as input to the TFOS job; for example, a TFOS job with 4 executors needs a maximum cluster size of at least 5.
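
A minimal non-distributed sketch, assuming a TensorFlow 1.x build is preinstalled on the Deep Learning cluster image (the constant and session below are purely illustrative):

    %pyspark
    # Runs entirely on one slave node; no Spark executors are involved.
    import tensorflow as tf

    hello = tf.constant("Hello from a Deep Learning notebook")
    with tf.Session() as sess:   # TF 1.x session API
        print(sess.run(hello))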
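And a minimal distributed sketch with TensorFlowOnSpark, assuming the tensorflowonspark package is available on the cluster; map_fun here is a hypothetical placeholder for real TensorFlow training code, and num_executors must match the spark.executor.instances value set above:

    %pyspark
    from tensorflowonspark import TFCluster

    num_executors = 4   # keep in sync with spark.executor.instances

    def map_fun(args, ctx):
        # TFCluster calls this once per executor; ctx identifies the node's
        # role ("ps" or "worker") and index. Real training code goes here.
        print("task %d of job %s" % (ctx.task_index, ctx.job_name))

    cluster = TFCluster.run(sc, map_fun, None, num_executors,
                            num_ps=1, tensorboard=False,
                            input_mode=TFCluster.InputMode.TENSORFLOW)
    cluster.shutdown()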