I am trying to refresh the .tmp file with additional events every 5 minutes. My source is slow, and it takes about 30 minutes to accumulate a 128 MB file in my HDFS sink.
Is there a property in the Flume HDFS sink that controls how often the .tmp file is refreshed before the file is rolled in HDFS?
I need this so I can query the data in the .tmp file through a Hive table.
Currently I can view the data in the .tmp file through a Hive table, but because the roll size is 128 MB, the .tmp file goes a long time without being refreshed.
Consider decreasing your channel's capacity and transactionCapacity settings.
These settings control how many events the channel holds and hands over per transaction before they are flushed to your sink. If you lower them to 10, for instance, events will be flushed to your .tmp file every 10 events.
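A minimal sketch of the channel side, assuming a memory channel and hypothetical agent and channel names (agent, ch1); substitute the names from your own configuration:

```properties
# Hypothetical agent and channel names; adjust to match your configuration.
agent.channels = ch1
agent.channels.ch1.type = memory
# Maximum number of events stored in the channel (must be >= transactionCapacity).
agent.channels.ch1.capacity = 10
# Maximum number of events taken from a source or given to a sink per transaction.
agent.channels.ch1.transactionCapacity = 10
```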
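The second value you will need to change is the batchSize of your HDFS sink (hdfs.batchSize).
The default value of 100 will probably be too high if your source is very slow and you want to see events more often. Keep it no larger than the channel's transactionCapacity, since the sink takes each batch from the channel in a single transaction.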
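A matching sketch for the sink side, again with hypothetical names (agent, ch1, hdfsSink) and a placeholder HDFS path. The roll settings are an assumption meant to reproduce the size-only 128 MB roll described in the question; keep whatever roll settings you already use:

```properties
# Hypothetical sink name, bound to the channel sketched above.
agent.sinks = hdfsSink
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = ch1
# Placeholder target directory.
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
# Take at most 10 events per transaction; open files are flushed after each batch.
agent.sinks.hdfsSink.hdfs.batchSize = 10
# Assumed: roll only by size (~128 MB); 0 disables time- and count-based rolling.
agent.sinks.hdfsSink.hdfs.rollSize = 134217728
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollCount = 0
```

Smaller batches mean more frequent flushes and more transactions, so expect some throughput cost in exchange for fresher data in the .tmp file.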