Collecting and Processing data with PHP (Twitter Streaming API)


After reading through the Twitter Streaming API and Phirehose PHP documentation, I've come across something I have yet to do: collect and process data separately.

The logic behind it, if I understand correctly, is to prevent a logjam in the processing phase that backs up the collecting process. I've seen examples before, but they basically write straight to a MySQL database right after collection, which seems to go against what Twitter recommends you do.

What I'd like some advice/help on is the best way to handle this. People seem to recommend writing all the data directly to a text file and then parsing/processing it with a separate function, but I'd assume that method could become a memory hog.

Here's the catch: it's all going to be running as a daemon/background process. Does anyone have experience solving a problem like this, or more specifically with the Twitter Phirehose library? Thanks!

Some notes: the connection will be through a socket, so my guess is that the file will be constantly appended to? Not sure if anyone has any feedback on that.

1 Answer


The Phirehose library comes with an example of how to do this; see the example scripts included with the library.

This uses a flat file, which is very scalable and fast: your average hard disk can write sequentially at 40 MB/s+ and scales linearly (unlike a database, it doesn't slow down as it gets bigger).
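For reference, the core Phirehose pattern is to subclass the library and implement enqueueStatus(), which receives each tweet as a raw JSON string. Below is a minimal collector sketch along those lines; the credentials, file path and keywords are placeholder assumptions, and newer library versions authenticate through OauthPhirehose rather than basic auth:

```php
<?php
// Collector daemon: subclass Phirehose and implement enqueueStatus(),
// which the library calls once per tweet with the raw JSON string.
// Credentials, paths and keywords below are placeholder assumptions;
// newer Phirehose versions authenticate via OauthPhirehose instead.
require_once 'Phirehose.php';

class FileCollector extends Phirehose
{
    protected $outFile = '/var/spool/tweets/collect.current';

    public function enqueueStatus($status)
    {
        // Append one JSON document per line; no parsing happens here,
        // so the streaming connection is never blocked by slow processing.
        file_put_contents($this->outFile, $status . "\n", FILE_APPEND | LOCK_EX);
    }
}

$collector = new FileCollector('username', 'password', Phirehose::METHOD_FILTER);
$collector->setTrack(array('php', 'mysql'));  // example keywords (assumed)
$collector->consume();                        // blocks forever; run as the daemon
```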

You don't need any database functionality to consume a stream (you just want the next tweet; there's no "querying" involved).
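A separate consumer process can then read a closed file line by line, which also addresses the memory concern: fgets() only ever holds one tweet in memory at a time. This is a sketch under assumed paths, with a hypothetical processTweet() standing in for whatever processing (MySQL inserts, etc.) you want to do:

```php
<?php
// Consumer: a separate process that reads a closed (rotated-out) file
// line by line and does the heavy work at its own pace. The path and
// the processTweet() helper are illustrative assumptions.
$inFile = '/var/spool/tweets/collect.20140101120000.ready';

$fp = fopen($inFile, 'r');
if ($fp === false) {
    exit(1);
}

while (($line = fgets($fp)) !== false) {
    $tweet = json_decode($line, true);
    if ($tweet === null) {
        continue;              // skip blank or truncated lines
    }
    processTweet($tweet);      // e.g. INSERT into MySQL, at the consumer's speed
}
fclose($fp);

function processTweet(array $tweet)
{
    // Hypothetical processing step: pull out the fields you care about
    // and store them. Doing it here keeps the collector lean.
}
```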

If you rotate the file fairly often, you will get near-realtime performance (if desired).
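One way to rotate is to periodically rename the file the collector is appending to; since the collector sketch above reopens the file on every write, it simply starts a fresh one, and the consumer only ever touches closed files. A rough sketch, with an assumed 60-second window:

```php
<?php
// Rotation sketch: periodically rename the file the collector appends to,
// so the consumer only ever sees closed files. The 60-second window and
// the paths are assumptions; a shorter window means lower latency.
$current = '/var/spool/tweets/collect.current';
$window  = 60; // seconds per file

while (true) {
    sleep($window);
    clearstatcache();          // don't trust PHP's cached stat results
    if (is_file($current) && filesize($current) > 0) {
        $ready = '/var/spool/tweets/collect.' . date('YmdHis') . '.ready';
        rename($current, $ready);  // atomic on the same filesystem
    }
}
```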