UnknownHostException in Spark Streaming


I want my code to read a JSON text file that is regenerated every minute (the station feed data from Citibike), so I tried to use Spark Streaming. But I keep getting an UnknownHostException.

My code:

    String url = "http://citibikenyc.com/stations/json";

    SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("Streaming");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaStreamingContext jssc = new JavaStreamingContext(sc, new Duration(60000));
    JavaDStream<String> lines = jssc.socketTextStream(url, 9999);
    lines.print();

    jssc.start();
    jssc.awaitTermination();

and the error:

14/11/22 15:32:54 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error receiving data - java.net.UnknownHostException: http://citibikenyc.com/stations/json
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:208)
    at org.apache.spark.streaming.dstream.SocketReceiver.receive(SocketInputDStream.scala:71)
    at org.apache.spark.streaming.dstream.SocketReceiver$$anon$2.run(SocketInputDStream.scala:57)
14/11/22 15:32:54 INFO receiver.ReceiverSupervisorImpl: Stopped receiver 0

Answer:

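.socketTextStream serves a completely different purpose: it opens a raw TCP connection to a hostname and port and reads newline-delimited text from it. You passed a full URL where it expects a hostname, so the hostname lookup fails with UnknownHostException. Spark Streaming does not have a built-in receiver that fetches a URL periodically.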
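
To make the mismatch concrete, here is a minimal sketch of the parts a socket would actually need, using only java.net.URI. The class UrlParts and its two methods are names made up for illustration; they are not part of Spark.

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
public class UrlParts {
    /** Returns just the hostname portion of a URL string. */
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    /** Returns the explicit port, or the usual default (80 for http, 443 for https). */
    static int portOf(String url) {
        URI uri = URI.create(url);
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    public static void main(String[] args) {
        String url = "http://citibikenyc.com/stations/json";
        System.out.println(hostOf(url)); // prints citibikenyc.com
        System.out.println(portOf(url)); // prints 80
    }
}
```

Even with the host and port split out, pointing .socketTextStream at that host on port 80 would not return the feed, because an HTTP server sends nothing until it receives a request.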

You will need to write a separate program to fetch the URL periodically and feed it to Spark Streaming. You have many options:

  • Write a shell script to download the URL periodically to a directory, then use Apache Flume to read the files in that directory and send them to Spark Streaming. There is an integration guide: Spark Streaming + Flume Integration Guide
  • Write your own Spark Streaming receiver. You can start here.
  • In your Spark app, start a thread that fetches the URL periodically and opens a socket to send the contents, then connect to that socket (e.g. .socketTextStream("127.0.0.1", 9999)).

There are many variations and a few more advanced solutions, but I would say these are the most convenient.
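
As a rough sketch of the third option, the class below runs a background thread that accepts one local client at a time and sends it the fetched text once per period. LocalFeed, fetch, and port are hypothetical names chosen for this example, and it assumes each fetched document can be flattened to a single line, since .socketTextStream splits its input on newlines. It uses only the JDK, so treat it as a starting point rather than a hardened implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.function.Supplier;

// Hypothetical helper: feeds periodically fetched text to a local socket client.
public class LocalFeed implements AutoCloseable {
    private final ServerSocket server;
    private final Thread worker;

    public LocalFeed(int port, long periodMillis, Supplier<String> fetcher) throws IOException {
        // Listen on loopback only; port 0 lets the OS pick a free port.
        server = new ServerSocket(port, 1, InetAddress.getLoopbackAddress());
        worker = new Thread(() -> serve(periodMillis, fetcher), "local-feed");
        worker.setDaemon(true);
        worker.start();
    }

    /** The port the feed is actually listening on. */
    public int port() {
        return server.getLocalPort();
    }

    /** Downloads the whole body of a URL as text; returns null if the fetch fails. */
    static String fetch(String url) {
        try (InputStream in = URI.create(url).toURL().openStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : null;
        } catch (IOException | RuntimeException e) {
            return null;
        }
    }

    private void serve(long periodMillis, Supplier<String> fetcher) {
        while (!server.isClosed()) {
            try (Socket client = server.accept();
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                // checkError() becomes true once a write to the client has failed.
                while (!out.checkError()) {
                    String body = fetcher.get();
                    if (body != null) {
                        // Flatten to one line so the reader sees one record per fetch.
                        out.println(body.replace("\r", "").replace("\n", " "));
                    }
                    Thread.sleep(periodMillis);
                }
            } catch (IOException e) {
                // accept() fails once the server is closed; otherwise wait for the next client.
            } catch (InterruptedException e) {
                return; // close() was called while sleeping
            }
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        worker.interrupt();
    }
}
```

In the question's code, this would be used by creating new LocalFeed(9999, 60000, () -> LocalFeed.fetch(url)) before jssc.start() and changing the stream to jssc.socketTextStream("127.0.0.1", 9999).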