Troubleshooting Errors and Exceptions in Notebook Paragraphs

This topic provides information about the errors and exceptions that you might encounter when running notebook paragraphs. You can resolve these errors and exceptions by following the respective workarounds.

Notebook fails to load

  • Description: The notebook fails to load in the UI, and the following error might occur.

    502 Bad Gateway
    

    This error occurs mainly when the Zeppelin server is not running or the underlying daemon has been killed.

  • Resolution:

    1. Check the Zeppelin server logs in the /media/ephemeral0/logs/zeppelin/logs/<zeppelin_server.log> and /media/ephemeral0/logs/zeppelin/logs/<zeppelin_server_log.out> files.

    2. The logs might show that Zeppelin ran out of memory (OOM), as in the following stack trace.

      at org.eclipse.jetty.server.Server.doStart(Server.java:354)
      at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
      at org.apache.zeppelin.server.ZeppelinServer.main(ZeppelinServer.java:204)
      Caused by: java.lang.OutOfMemoryError: Java heap space
      at java.util.Arrays.copyOf(Arrays.java:2367)
      at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
      at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
      

      The default heap space, which is 10% of the master node memory, might not be sufficient for Zeppelin to load all the notebooks. If the number of notebooks is large, configure the master node with more memory or delete unnecessary notebooks.

    3. Increase the heap memory by using the node bootstrap. Contact Qubole Support for assistance.
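To confirm the OOM before changing anything, you can scan the server logs for heap-space errors. A minimal sketch, assuming the log location cited in step 1 (adjust the path for your cluster):

```shell
# Print the number of OutOfMemoryError lines across the given log files.
# A nonzero count suggests the Zeppelin heap is too small for the
# current set of notebooks.
count_oom() {
  grep -h "java.lang.OutOfMemoryError" "$@" 2>/dev/null | wc -l | tr -d ' '
}

# Example invocation (path from step 1; adjust as needed):
# count_oom /media/ephemeral0/logs/zeppelin/logs/*.log
```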

Paragraph stops responding

  • Description: While using a notebook, paragraphs might stop responding due to various reasons.

  • Resolution:

    1. Click the Cancel button.

    2. If canceling the paragraph fails, then navigate to the Interpreters page and restart the corresponding interpreter.

    3. If the issue persists, restart the Zeppelin server by running the following command as the root user:

      /usr/lib/zeppelin/bin/zeppelin-daemon.sh restart
      
  • Description: Paragraphs might stop responding when the Spark job is sluggish or fails.

  • Resolution:

    1. In the Notebooks page, navigate to Interpreters and click Logs.
    2. Open the corresponding Interpreter logs.
    3. Analyze the log files for container, executor, or task failures.
    4. Check connectivity to the thrift server.
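For step 4, a quick TCP probe tells you whether the thrift endpoint is reachable at all. A minimal sketch, assuming bash (it relies on bash's /dev/tcp redirection); the host and port below are placeholders, and the actual values for your cluster appear in the interpreter logs:

```shell
# Succeeds (exit 0) only if a TCP connection to $1:$2 can be opened.
# Placeholder host/port -- take the real values from the interpreter logs.
check_port() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Example (hypothetical host and port):
# check_port thrift-server.example.com 10000 && echo reachable
```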

Paragraph keeps running for a long time

  • Description: Paragraphs might run for a long time when they are allocated too few resources.

  • Resolution: Tune the job by providing more resources, such as the minimum number of executors, executor memory, executor memory overhead, and maximum number of executors.

    1. Set appropriately higher values for the minimum number of executors, executor memory, executor memory overhead, and maximum number of executors in the Interpreter settings.
    2. Restart the interpreter.
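The settings above correspond to standard Spark configuration keys. As a sketch, the interpreter properties might look like the following (the values are illustrative, not recommendations; on Spark versions before 2.3, the overhead key is spark.yarn.executor.memoryOverhead):

```
spark.executor.memory                   4g
spark.executor.memoryOverhead           1g
spark.dynamicAllocation.minExecutors    2
spark.dynamicAllocation.maxExecutors    10
```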

Error due to insufficient Spark driver memory

  • Description: In a Qubole notebook, if the configured Spark driver memory is not sufficient to run the job, the following error occurs.

    Interpreter JVM has stopped responding. Restart interpreter with higher driver memory controlled by setting spark.driver.memory.
    
  • Resolution:

    1. Set an appropriately higher driver memory value by configuring spark.driver.memory in the Interpreter settings.
    2. Restart the interpreter.
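As a sketch, the corresponding interpreter property (spark.driver.memory is a standard Spark configuration key; the value is illustrative) might look like:

```
spark.driver.memory    4g
```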

Paragraphs fail

Paragraphs might fail for various reasons. Determine whether the paragraph failed because of the interpreter or because of the job.

  1. Analyze the interpreter logs to check if there is an issue with the interpreter.
  2. If there are no failures at the interpreter, then check the logs in the Spark UI. Analyze the executor container logs or failed executor logs. See Accessing the Spark Logs.
  3. If the failures are in the Spark job, see Troubleshooting Spark Issues.
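When working through the steps above, a quick scan of a saved executor or container log for common failure signatures can help decide which path applies. A minimal sketch; the pattern list is illustrative, not exhaustive:

```shell
# Report the first known failure signature found in the given log file.
# The signatures below are common examples, not a complete catalog.
classify_failure() {
  if grep -q "java.lang.OutOfMemoryError" "$1"; then
    echo "executor OOM"
  elif grep -q "ExecutorLostFailure" "$1"; then
    echo "lost executor"
  elif grep -q "TTransportException" "$1"; then
    echo "interpreter communication error"
  else
    echo "no known signature"
  fi
}

# Example (hypothetical path):
# classify_failure /tmp/executor-3.log
```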

If the issue persists, contact Qubole Support.

TTransportException

  • Description: While running paragraphs, a TTransportException, as shown below, might occur for various unexpected reasons. This exception signifies an error in the communication between Zeppelin and the Spark driver or applications.

    org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:249)
    
  • Resolution: Depending on the cause of the communication error, perform the appropriate actions:

    • Metastore connectivity failure

      1. Check the Interpreter logs. In the Notebooks page, navigate to Interpreters and click Logs.

      2. If Zeppelin cannot connect to the metastore, the logs might contain one of the following errors.

        Error 1:
        
        MetaStoreClient lost connection. Attempting to reconnect.
        org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
        
        OR
        
        Error 2:
        
        Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out
        org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        
        
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client
        
      3. Verify the metastore connectivity and rerun the job.
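Before chasing network issues, you can confirm that the interpreter log actually contains one of the metastore connection errors from step 2. A minimal sketch; the log path is a placeholder:

```shell
# Print the number of metastore connection-error lines in the given
# interpreter log. The two patterns match the errors cited in step 2.
metastore_errors() {
  grep -cE "MetaStoreClient lost connection|SocketTimeoutException: Read timed out" "$1"
}

# Example (hypothetical path):
# metastore_errors /tmp/spark-interpreter.log
```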

    • Interpreter not initiated

      The interpreter might not have started because the driver memory is misconfigured (set too large to allocate, or insufficient for the job). Set an appropriately higher driver memory value by configuring spark.driver.memory in the Interpreter settings, and rerun the job.

    • If neither of the above applies, restart the interpreter and rerun the job.

NullPointerException in a Spark notebook

  • Description: In a Spark notebook, sometimes you cannot create a Spark session, and the following NullPointerException occurs.

    java.lang.NullPointerException
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:34)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:467)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:456)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:156)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:938)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
    at org.apache.zeppelin.spark.PySparkInterpreter.getSparkInterpreter(PySparkInterpreter.java:531)
    at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:201)
    at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:170)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:344)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:185)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    ...
    
  • Resolution:

    1. Check whether any artifacts (dependencies) are set in the Interpreter settings. Remove them, if any, and restart the interpreter.
    2. If the problem persists even after removing the artifacts, trace the error in the Interpreter logs:
      1. In the Notebooks page, navigate to Interpreters and click Logs.
      2. Open the corresponding Interpreter logs.
      3. Trace the errors in those log files.
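To pull out the relevant stack trace quickly, you can extract the lines around the NullPointerException from a downloaded interpreter log. A minimal sketch; the log path is a placeholder:

```shell
# Print each NullPointerException line plus the 10 lines that follow it,
# with line numbers, so the failing call site is easy to spot.
npe_context() {
  grep -n -A 10 "java.lang.NullPointerException" "$1"
}

# Example (hypothetical path):
# npe_context /tmp/spark-interpreter.log
```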

    If you are still unable to trace the error, create a ticket with Qubole Support.