What is the best way to pass XML files (of size 500-600 KB) as Kafka messages?


I want to read streaming XML files and parse them in Apache Storm. I am using Kafka as the message queue for XML files of about 500 KB each, and I want to pass each whole file as a single message to KafkaSpout. How should I go about it?


There are 2 best solutions below

0
On BEST ANSWER

Just go ahead and pass the whole file. The benchmark from LinkedIn supports this; the relevant details are quoted below.


I have mostly shown performance on small 100 byte messages. Smaller messages are the harder problem for a messaging system as they magnify the overhead of the bookkeeping the system does. We can show this by just graphing throughput in both records/second and MB/second as we vary the record size.

[Figure: throughput in records/second versus record size]

So, as we would expect, this graph shows that the raw count of records we can send per second decreases as the records get bigger. But if we look at MB/second, we see that the total byte throughput of real user data increases as messages get bigger:

[Figure: throughput in MB/second versus record size]

We can see that with the 10 byte messages we are actually CPU bound by just acquiring the lock and enqueuing the message for sending—we are not able to actually max out the network. However, starting with 100 bytes, we are actually seeing network saturation (though the MB/sec continues to increase as our fixed-size bookkeeping bytes become an increasingly small percentage of the total bytes sent).
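To make the last point of the quote concrete, here is a minimal sketch of how a fixed per-record overhead shrinks relative to the payload as records grow. The 30-byte overhead is a hypothetical figure chosen for illustration, not a measured Kafka value, and `PayloadFraction` is just an illustrative name.

```java
final class PayloadFraction {
    // Fraction of the bytes sent that are user data, assuming a fixed
    // number of bookkeeping bytes is added to every record.
    static double payloadFraction(long recordBytes, long overheadBytes) {
        return (double) recordBytes / (recordBytes + overheadBytes);
    }

    public static void main(String[] args) {
        long overhead = 30; // hypothetical fixed overhead per record, for illustration only
        for (long size : new long[] {10, 100, 1_000, 550_000}) {
            System.out.printf("%d-byte record: %.4f of the bytes sent are payload%n",
                    size, payloadFraction(size, overhead));
        }
    }
}
```

Under that assumption, a record the size of the XML files in the question is almost entirely payload, which is why large messages do well on MB/second even though records/second drops.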

0
On

There's nothing wrong with sending the XML file as is. Given the size of the payload, you might want to look at the producer's compression options (the compression.type setting), since XML usually compresses well, but sending XML is not going to cause problems.
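As a rough sketch of what that could look like, the snippet below builds producer properties with gzip compression enabled and estimates locally how much a payload of this size shrinks. It uses only the JDK, so it runs without a broker. The class name `XmlProducerSettings`, the broker address, and the synthetic XML are assumptions made for illustration. The property keys and serializer class names are standard Kafka producer settings, and 1,048,576 bytes is the default of the producer's max.request.size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.GZIPOutputStream;

final class XmlProducerSettings {
    // Default value of the producer setting max.request.size (1 MiB).
    static final int DEFAULT_MAX_REQUEST_SIZE = 1_048_576;

    // Producer properties for sending one whole XML document per message.
    // The broker address is supplied by the caller; nothing is connected here.
    static Properties producerConfig(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Let the producer compress batches before they go over the network.
        props.setProperty("compression.type", "gzip");
        return props;
    }

    // Gzip the payload locally, only to estimate how well it compresses.
    static byte[] gzip(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Rough check that ignores record and request overhead.
    static boolean fitsDefaultRequestSize(byte[] payload) {
        return payload.length < DEFAULT_MAX_REQUEST_SIZE;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a real XML file of roughly 550 KB.
        StringBuilder sb = new StringBuilder("<items>");
        while (sb.length() < 550_000) {
            sb.append("<item id=\"1\"><name>example</name></item>");
        }
        sb.append("</items>");
        byte[] xml = sb.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("raw bytes:  " + xml.length);
        System.out.println("gzip bytes: " + gzip(xml).length);
        System.out.println("fits default max.request.size: " + fitsDefaultRequestSize(xml));
    }
}
```

With the Kafka client library on the classpath, the properties returned by `producerConfig` could be passed to `new KafkaProducer<>(props)`. Real XML will compress less dramatically than this repetitive sample, so it is worth measuring on actual files.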