Error invoking plotly's init_notebook_mode with Jupyter (Apache Toree PySpark)

I'm running Jupyter (v4.2.1) with Apache Toree - PySpark. When I try to invoke plotly's init_notebook_mode function, I run into the following error:

import numpy as np
import pandas as pd

import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode()

Error:

Name: org.apache.toree.interpreter.broker.BrokerException
Message: Traceback (most recent call last):
  File "/tmp/kernel-PySpark-6415c581-01c4-4c90-b4d9-81773c2bc03f/pyspark_runner.py", line 134, in <module>
    eval(compiled_code)
  File "<string>", line 7, in <module>
  File "/usr/local/lib/python3.4/dist-packages/plotly/offline/offline.py", line 151, in init_notebook_mode
    display(HTML(script_inject))
  File "/usr/local/lib/python3.4/dist-packages/IPython/core/display.py", line 158, in display
    format = InteractiveShell.instance().display_formatter.format
  File "/usr/local/lib/python3.4/dist-packages/traitlets/config/configurable.py", line 412, in instance
    inst = cls(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 499, in __init__
    self.init_io()
  File "/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py", line 658, in init_io
    io.stdout = io.IOStream(sys.stdout)
  File "/usr/local/lib/python3.4/dist-packages/IPython/utils/io.py", line 34, in __init__
    raise ValueError("fallback required, but not specified")
ValueError: fallback required, but not specified

StackTrace: org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:140)
org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:140)
scala.Option.foreach(Option.scala:236)
org.apache.toree.interpreter.broker.BrokerState.markFailure(BrokerState.scala:139)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
py4j.Gateway.invoke(Gateway.java:259)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.GatewayConnection.run(GatewayConnection.java:209)
java.lang.Thread.run(Thread.java:745)

I'm unable to find any information about this on the web. When I dug into the code where this fails, io.py in IPython's utils, I saw that the stream being passed must have both a write and a flush attribute. But for some reason, the stream passed in this case, sys.stdout, has only a write attribute and no flush attribute.
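
The check that raises the error looks roughly like this (a simplified paraphrase of IOStream.__init__ in IPython's utils/io.py; when no fallback stream is supplied and the given stream is missing either method, it raises):

class IOStream:
    def __init__(self, stream, fallback=None):
        if not hasattr(stream, 'write') or not hasattr(stream, 'flush'):
            if fallback is not None:
                stream = fallback
            else:
                raise ValueError("fallback required, but not specified")
        self.stream = stream

Under Toree's PySpark kernel, for example:

hasattr(sys.stdout, 'write')   # True
hasattr(sys.stdout, 'flush')   # False, which triggers the ValueError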

Answer:

I believe this happens because plotly's notebook mode assumes it is running inside an IPython Jupyter kernel that handles the notebook communication; you can see in the stack trace that it is calling into IPython packages.

Toree, however, is a different Jupyter kernel with its own protocol for communicating with the notebook server. Even when you use Toree to run a PySpark interpreter, you get a "plain" PySpark (just like the one you get when starting it from a shell), and Toree drives that interpreter's input and output.

So the IPython machinery is not set up, and calling init_notebook_mode() in that environment fails, just as it would in a PySpark session started directly from the shell, which knows nothing about notebooks.
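
You can confirm this from inside the kernel; get_ipython() returns the active InteractiveShell under an IPython kernel and None elsewhere:

from IPython import get_ipython
print(get_ipython())   # None under Toree's PySpark, a shell instance under IPython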

To my knowledge, there is currently no way to get plotting output from a PySpark session run via Toree; we recently faced the same problem. Instead of running Python via Toree, you need to run an IPython kernel, import the PySpark libraries there, and connect to your Spark cluster. See https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook for a dockerized example of that setup.
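
A minimal sketch of that approach, assuming the findspark package is installed (pip install findspark) and SPARK_HOME points at your Spark installation; the master URL below is a placeholder for your own cluster:

import findspark
findspark.init()   # puts the PySpark libs on sys.path

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster('spark://your-master:7077')   # placeholder: your cluster URL
sc = SparkContext(conf=conf)

from plotly.offline import init_notebook_mode, iplot
init_notebook_mode()   # works here: the IPython display machinery is available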