Flume not closing all files when adding them successively


Here is my Flume conf:

agent.sinks = s3hdfs
agent.sources = MySpooler
agent.channels = channel

agent.sinks.s3hdfs.type = hdfs
agent.sinks.s3hdfs.hdfs.path = s3a://testbucket/test
agent.sinks.s3hdfs.hdfs.filePrefix = FilePrefix
agent.sinks.s3hdfs.hdfs.writeFormat = Text
agent.sinks.s3hdfs.hdfs.fileType = DataStream
agent.sinks.s3hdfs.channel = channel
agent.sinks.s3hdfs.hdfs.useLocalTimeStamp = true
agent.sinks.s3hdfs.hdfs.rollInterval = 0
agent.sinks.s3hdfs.hdfs.rollSize = 0
agent.sinks.s3hdfs.hdfs.rollCount = 0
agent.sinks.s3hdfs.hdfs.idleTimeout = 15

agent.sources.MySpooler.channels = channel
agent.sources.MySpooler.type = spooldir
agent.sources.MySpooler.spoolDir = /flume_to_aws
agent.sources.MySpooler.fileHeader = false
agent.sources.MySpooler.deserializer.maxLineLength = 110000

agent.channels.channel.type = memory
agent.channels.channel.capacity = 100000000

When I add a single file to /flume_to_aws and wait, it is uploaded to Amazon S3 and the file is closed normally.

[root@de flume_to_aws]# cp /tmp_flume/globalterrorismdb_0522dist.00001.csv .

log:

06 Feb 2023 14:02:11,802 INFO  [hdfs-s3hdfs-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing s3a://testbucket/test/FilePrefix.1675699321675.tmp
06 Feb 2023 14:02:13,599 INFO  [hdfs-s3hdfs-call-runner-4] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming s3a://testbucket/test/FilePrefix.1675699321675.tmp to s3a://testbucket/test/FilePrefix.1675699321675

But when I add several files without waiting, not all of them are uploaded.

For example:

[root@de flume_to_aws]# cp /tmp_flume/globalterrorismdb_0522dist.00001.csv .
[root@de flume_to_aws]# cp /tmp_flume/globalterrorismdb_0522dist.00002.csv .
[root@de flume_to_aws]# cp /tmp_flume/globalterrorismdb_0522dist.00003.csv .

log (only one file):

06 Feb 2023 14:02:27,842 INFO  [hdfs-s3hdfs-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.doClose:438)  - Closing s3a://testbucket/test/FilePrefix.1675699338165.tmp
06 Feb 2023 14:02:31,411 INFO  [hdfs-s3hdfs-call-runner-0] (org.apache.flume.sink.hdfs.BucketWriter$7.call:681)  - Renaming s3a://testbucket/test/FilePrefix.1675699338165.tmp to s3a://testbucket/test/FilePrefix.1675699338165

In S3 I only see one file. Why does this happen?

Answered by Astora:

I misunderstood the concept.

Actually, it is working fine. Flume does something called "rolling". Those three files were rolled together into one, mainly because of these three parameters:

agent.sinks.s3hdfs.hdfs.rollInterval = 0
agent.sinks.s3hdfs.hdfs.rollSize = 0
agent.sinks.s3hdfs.hdfs.rollCount = 0

Since rolling by time (rollInterval), by size (rollSize) and by event count (rollCount) are all disabled by being set to 0, Flume keeps appending events to the same open file and only closes it after agent.sinks.s3hdfs.hdfs.idleTimeout = 15 seconds of inactivity. All three source files therefore end up in a single file in S3.
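For comparison, here is a sketch of how one of those triggers could be re-enabled to roll by time instead (the 60-second value is just illustrative, not a recommendation):

```properties
# Illustrative sketch: close and rename the file every 60 seconds,
# regardless of size or event count
agent.sinks.s3hdfs.hdfs.rollInterval = 60
agent.sinks.s3hdfs.hdfs.rollSize = 0
agent.sinks.s3hdfs.hdfs.rollCount = 0
```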

In my case, I am now using agent.sinks.s3hdfs.hdfs.rollSize = 2097152, so it rolls when the file reaches 2 MB. The sizes of those three files are:

[root@de flume_to_aws]# du -sk /tmp_flume/globalterrorismdb_0522dist.00001.csv
1532    /tmp_flume/globalterrorismdb_0522dist.00001.csv
[root@de flume_to_aws]# du -sk /tmp_flume/globalterrorismdb_0522dist.00002.csv
1040    /tmp_flume/globalterrorismdb_0522dist.00002.csv
[root@de flume_to_aws]# du -sk /tmp_flume/globalterrorismdb_0522dist.00003.csv
908     /tmp_flume/globalterrorismdb_0522dist.00003.csv

1532 KB + 1040 KB + 908 KB = 3480 KB (about 3.4 MB)

Since I set it to roll at 2 MB, it stores two files in S3.
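The arithmetic above can be checked with a small shell snippet (the sizes are the `du -sk` values from my machine; the 2048 KB threshold corresponds to rollSize = 2097152 bytes):

```shell
#!/bin/sh
# Sum the three source file sizes (in KB) and estimate how many
# rolled files a 2048 KB (2 MiB) rollSize would produce.
sizes_kb="1532 1040 908"
total=0
for s in $sizes_kb; do
  total=$((total + s))
done
echo "total: ${total} KB"            # 3480 KB, about 3.4 MB
# expected S3 objects = ceiling(total / 2048)
files=$(( (total + 2047) / 2048 ))
echo "expected S3 objects: ${files}" # 2
```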

[Screenshot: the two resulting files listed in the S3 bucket]

As we can see, the sizes of the files in S3 match the sum above:

2 MB + 1.4 MB = 3.4 MB

I just learned this, so please leave feedback if something is wrong.