Handle pycurl hang on Twitter streaming api


I am using pycurl to connect to the twitter streaming API.

This works well, but sometimes, after running for a few hours, it will hang indefinitely without throwing any exceptions. How can I detect and handle a hang in this script?

import pycurl, json

STREAM_URL = "http://stream.twitter.com/1/statuses/filter.json"

USER = "presidentskroob"
PASS = "12345"

def on_receive(data):
  print(data)  # called with each chunk received from the stream

conn = pycurl.Curl()
conn.setopt(pycurl.USERPWD, "%s:%s" % (USER, PASS))
conn.setopt(pycurl.URL, STREAM_URL)
conn.setopt(pycurl.WRITEFUNCTION, on_receive)
conn.perform()

There are 4 best solutions below

0
On BEST ANSWER

FROM: http://man-wiki.net/index.php/3:curl_easy_setopt

CURLOPT_LOW_SPEED_LIMIT - Pass a long as parameter. It contains the transfer speed in bytes per second that the transfer should be below during CURLOPT_LOW_SPEED_TIME seconds for the library to consider it too slow and abort.

and

CURLOPT_LOW_SPEED_TIME - Pass a long as parameter. It contains the time in seconds that the transfer should be below the CURLOPT_LOW_SPEED_LIMIT for the library to consider it too slow and abort.


Example:

conn.setopt(pycurl.LOW_SPEED_LIMIT, 1)
conn.setopt(pycurl.LOW_SPEED_TIME, 90)
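
When the low-speed check trips, conn.perform() raises pycurl.error instead of hanging, so the caller still has to reconnect. The following is a minimal sketch of such a loop, not a definitive implementation. The names backoff_delays and stream_with_reconnect are hypothetical, and the doubling delay between attempts is an illustrative assumption rather than anything curl or Twitter prescribes.

```python
import time


def backoff_delays(base=1.0, cap=60.0):
    """Yield reconnect delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def stream_with_reconnect(url, userpwd, on_receive, max_attempts=5):
    """Hypothetical helper: start the transfer again whenever it ends or aborts."""
    import pycurl  # imported here so backoff_delays works without pycurl installed

    delays = backoff_delays()
    for attempt in range(max_attempts):
        conn = pycurl.Curl()
        conn.setopt(pycurl.URL, url)
        conn.setopt(pycurl.USERPWD, userpwd)
        conn.setopt(pycurl.WRITEFUNCTION, on_receive)
        # Abort if the transfer speed stays below 1 byte/s for 90 seconds.
        conn.setopt(pycurl.LOW_SPEED_LIMIT, 1)
        conn.setopt(pycurl.LOW_SPEED_TIME, 90)
        try:
            conn.perform()  # returns normally if the server closes the stream
        except pycurl.error as exc:
            print("transfer aborted: %s" % (exc.args,))
        finally:
            conn.close()
        time.sleep(next(delays))  # wait before the next attempt
```

It could be called as stream_with_reconnect(STREAM_URL, "%s:%s" % (USER, PASS), on_receive), reusing the names from the question.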
1
On

You can use the timeout settings:

 conn.setopt(pycurl.CONNECTTIMEOUT, 15) 
 conn.setopt(pycurl.TIMEOUT, 25) 

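You'll get a pycurl.error exception if curl times out. Note that TIMEOUT limits the total duration of the transfer, so on a long-lived streaming connection it will fire after 25 seconds even while data is still arriving.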
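
To act on that exception, it helps to tell a timeout apart from other failures. Here is a small sketch, assuming the usual pycurl convention that the exception's args hold (error code, message). The helper names is_timeout and perform_once are hypothetical.

```python
# libcurl's CURLE_OPERATION_TIMEDOUT result code; a low-speed abort
# reports the same code.
CURLE_OPERATION_TIMEDOUT = 28


def is_timeout(error_args):
    """Return True if a pycurl.error's args describe a curl timeout."""
    return len(error_args) > 0 and error_args[0] == CURLE_OPERATION_TIMEDOUT


def perform_once(conn):
    """Hypothetical wrapper: run conn.perform() and report how it ended."""
    import pycurl  # local import; is_timeout itself needs no pycurl

    try:
        conn.perform()
        return "closed"  # the server ended the stream normally
    except pycurl.error as exc:
        return "timeout" if is_timeout(exc.args) else "error"
```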

2
On

I suspect this is related to the TCP "broken pipe" scenario, i.e. the remote peer closes the connection at some point, but our side somehow never notices the event. You will need some kind of keep-alive to deal with this.

The "right", elegant solution to the problem may require some action from Twitter itself. This is a rather common issue; a friend of mine used the streaming API and ran into the same problem.
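
One possible reading of "keep-alives" here is TCP-level keep-alive probes, which libcurl can enable from version 7.25.0 onward. The sketch below assumes a pycurl build that exposes the TCP_KEEPALIVE, TCP_KEEPIDLE and TCP_KEEPINTVL options. The helper names and the 60/30 second values are illustrative assumptions, not recommendations from Twitter.

```python
def keepalive_options(idle=60, interval=30):
    """Map pycurl option names to values that enable TCP keep-alive probes."""
    return {
        "TCP_KEEPALIVE": 1,         # turn probes on
        "TCP_KEEPIDLE": idle,       # idle seconds before the first probe
        "TCP_KEEPINTVL": interval,  # seconds between later probes
    }


def apply_options(conn, options):
    """Set each named option on a pycurl.Curl handle."""
    import pycurl  # local import so keepalive_options needs no pycurl

    for name, value in options.items():
        conn.setopt(getattr(pycurl, name), value)
```

With probes enabled, the operating system should eventually report a vanished peer as a socket error, which surfaces from conn.perform() as a pycurl.error.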

1
On

The curl switch --speed-limit makes curl return an error if the transfer speed dips below a given threshold for a given length of time (set with --speed-time). Unfortunately, the threshold cannot be set to a value below one, and the ideal value for the Twitter Streaming API would be 1/30 byte per second, since it sends a single character every 30 seconds as its keep-alive. The best you can do is use a threshold of 1 byte per second, but then curl will give up whenever there is a period of inactivity (no tweets) longer than the duration you select. The command below will give up if there is a 30-second period during which it receives fewer than 30 bytes.

curl -d @filter.txt https://stream.twitter.com/1/statuses/filter.json -uTwitterLogin:TwitterPassword --speed-time 30 --speed-limit 1

To summarize: there is no satisfactory solution using curl's options alone.
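
Going beyond curl's built-in options, pycurl's progress callback offers one possible workaround: record when data last arrived (including the keep-alive character) and abort the transfer from the callback once the silence exceeds a chosen limit. This is a sketch under assumptions, not a tested fix for the Twitter API. StallWatchdog and attach are hypothetical names, and the 90-second limit is an arbitrary choice of three missed keep-alives.

```python
import time


class StallWatchdog(object):
    """Track when data last arrived and decide whether the stream has stalled."""

    def __init__(self, max_silence=90.0, handler=None, clock=time.time):
        self.max_silence = max_silence
        self.handler = handler      # optional callable that receives each chunk
        self.clock = clock          # injectable so the logic can be tested
        self.last_data = clock()

    def on_receive(self, data):
        # Any bytes, including the keep-alive character, reset the timer.
        self.last_data = self.clock()
        if self.handler is not None:
            self.handler(data)

    def on_progress(self, download_total, downloaded, upload_total, uploaded):
        # libcurl aborts the transfer when this callback returns non-zero.
        stalled = self.clock() - self.last_data > self.max_silence
        return 1 if stalled else 0


def attach(conn, watchdog):
    """Wire the watchdog into a pycurl.Curl handle."""
    import pycurl  # local import so StallWatchdog needs no pycurl

    conn.setopt(pycurl.WRITEFUNCTION, watchdog.on_receive)
    conn.setopt(pycurl.NOPROGRESS, 0)  # progress callbacks are off by default
    conn.setopt(pycurl.PROGRESSFUNCTION, watchdog.on_progress)
```

When the callback aborts the transfer, conn.perform() raises pycurl.error, so the caller can reconnect as with any other failure.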