Generate exactly 1 Flowfile


I'm using the GenerateFlowFile processor in Apache NiFi. When I activate it, I want the processor to create exactly 1 FlowFile.

Right now I use the REST API via Python to change the state to RUNNING, wait 0.5 seconds and change the state to STOPPED. This results in 1 FlowFile being added to the queue to the next processor.

I tested a bit: waiting for 1.5 seconds gives me 2 FlowFiles, and 2.5 seconds gives me 3 FlowFiles. I'm guessing the processor generates one FlowFile for each second it is running.

How can I ensure that exactly 1 FlowFile is generated? The above method obviously depends on the network connection and round-trip times. Worst case: the connection drops while I wait, I can no longer stop the processor, and an unknown number of FlowFiles are generated.

My current configs are:

Settings:

Yield duration: 1 sec
Penalty Duration: 30 sec
Bulletin Level: WARN

Scheduling:

Scheduling Strategy: CRON driven 
Concurrent Tasks: 1 
Run Schedule: * * * * * ?
Execution: All nodes
Run duration: 0ms 

Properties:

File Size: 0B
Batch Size: 1
Data Format: Text
Unique FlowFiles: false
Custom Text: No value set
Character Set: UTF-8
Mime Type: No value set

Best answer:

You'll want to set the GenerateFlowFile processor's Execution to Primary node only (assuming you have more than 1 node) so that each node does not generate its own FlowFile.

Set the Scheduling Strategy to Timer driven and whack the Run Schedule up to something like 604800 sec (1 week). This means that even if you leave the processor running, it's only going to run once a week, which should give you plenty of time to fix a connectivity issue if your script can't connect to tell the processor to stop.

Keep Concurrent Tasks at 1.
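
For the script side, here is a minimal sketch of the start/wait/stop sequence described in the question, written so that the stop call is still attempted if the wait is interrupted. It assumes an unsecured NiFi at `http://localhost:8080` and uses the `PUT /processors/{id}/run-status` endpoint; the base URL and the helper names (`http_json`, `set_state`, `trigger_once`) are made up for illustration, and a secured or clustered instance would need authentication and possibly a client ID in the revision.

```python
import json
import time
import urllib.request

# Assumption: unsecured local NiFi; adjust for your own instance.
NIFI_API = "http://localhost:8080/nifi-api"


def run_status_payload(version, state):
    """Build the request body for PUT /processors/{id}/run-status."""
    return {"revision": {"version": version}, "state": state}


def http_json(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def set_state(processor_id, state, request=http_json):
    """Fetch the processor's current revision, then request a new run state."""
    url = f"{NIFI_API}/processors/{processor_id}"
    entity = request("GET", url)
    payload = run_status_payload(entity["revision"]["version"], state)
    return request("PUT", f"{url}/run-status", payload)


def trigger_once(processor_id, wait=0.5, request=http_json):
    """Start the processor, wait briefly, and always attempt to stop it."""
    set_state(processor_id, "RUNNING", request)
    try:
        time.sleep(wait)
    finally:
        # Runs even if the wait is interrupted (e.g. Ctrl+C).
        set_state(processor_id, "STOPPED", request)
```

Calling `trigger_once("<your-processor-id>")` then starts and stops the processor. With the long Timer driven run schedule above, the exact length of the wait matters much less, because a late or failed stop no longer produces extra FlowFiles every second.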