Oozie coordinator job always stays in RUNNING state: max concurrency reached


I have submitted a coordinator job. My workflow.xml is:

<workflow-app xmlns="uri:oozie:workflow:0.5" name="my_workflow">
  <start to="abc"/>
  <action name='abc'>
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>sh</exec>
      <argument>script.sh</argument>
      <file>/user/nirmalya/share/lib/script.sh</file>
    </shell>
    <ok to="end" />
    <error to="fail" />
  </action>
  <kill name="fail">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/> 
</workflow-app>

My coordinator.xml is:

<coordinator-app xmlns="uri:oozie:coordinator:0.1" name="MyShellScriptCoordinator" frequency="2 * * * *" start="2024-02-01T00:00Z" end="2024-02-02T00:00Z" timezone="UTC">
    <controls>
        <timeout>-1</timeout>
        <concurrency>1</concurrency>
        <execution>FIFO</execution>
    </controls>
    <action>
        <workflow>
            <app-path>/user/nirmalya/oozie_workflows/workflow.xml</app-path>
        </workflow>
    </action>
</coordinator-app>

My script.sh is:

PYTHON_INTERPRETER="/usr/bin/python3.8"

# Set the path to the directory containing your Python script
SCRIPT_DIR="/usr/local/hadoop"

# Set any additional environment variables or configurations needed
export PYTHONPATH=$SCRIPT_DIR:$PYTHONPATH
export FLASK_APP=script.py  # Replace with your actual script name

# Run your Python script
$PYTHON_INTERPRETER $SCRIPT_DIR/script.py

And script.py is:

#!/usr/bin/env python3.8

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    print("HELLO WORLD CALLED !!!!!!!")
    return 'Hello, world'

hello_world()

if __name__ == "__main__":
    app.run(debug=True) 

# print(flask.__version__)

When I run this coordinator job, it stays in the RUNNING state indefinitely, and I see three kinds of messages:

  1. E1100: Command precondition does not hold before execution, [workflow's status is RUNNING is not SUSPENDED], Error Code: E1100

  2. 2024-02-08 07:03:23,294 WARN CoordActionReadyXCommand:544 - SERVER[nirmalya-Lenovo-V15-G2-ITL-Ua] USER[nirmalya] GROUP[-] TOKEN[] APP[MyShellScriptCoordinator] JOB[0000000-240208062749401-oozie-nirm-C] ACTION[-] No actions to start for jobId=0000000-240208062749401-oozie-nirm-C as max concurrency reached!

  3. The NodeManager logs show the following under stderr:

WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 219-115-920

and stdout shows the following, ending in repeated "Heart beat" lines:

Invoking Shell command line now >>

Stdoutput HELLO WORLD CALLED !!!!!!!
Stdoutput  * Serving Flask app 'script'
Stdoutput  * Debug mode: on
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
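A possible reading of message 2, offered as a toy model and not as Oozie's actual scheduler code: with `<concurrency>1</concurrency>` and FIFO execution, a new coordinator action can start only once the running one has finished. The function and action names below are hypothetical and exist only for illustration.

```python
from collections import deque


def start_ready_actions(waiting, running, concurrency):
    """Toy model: move actions from waiting to running in FIFO order
    until the concurrency limit is reached. Returns the actions started."""
    started = []
    while waiting and len(running) < concurrency:
        action = waiting.popleft()
        running.append(action)
        started.append(action)
    return started


# Hypothetical action names standing in for materialized coordinator actions.
waiting = deque(["action-1", "action-2", "action-3"])
running = []

# First pass: one slot is free, so the oldest action starts.
first = start_ready_actions(waiting, running, concurrency=1)

# Second pass: action-1 is still running (it never finishes in this model),
# so no further action can start, which mirrors "max concurrency reached".
second = start_ready_actions(waiting, running, concurrency=1)

print(first, second, list(waiting))
```

Under this model, the warning is a symptom of the first action never completing, not a failure in its own right.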

I have searched a few links, and I suspect a YARN memory-related issue.

How can I get this coordinator job to run to completion successfully?
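One likely explanation, assuming the shell action is only marked done when script.sh exits: `app.run(debug=True)` starts the Flask development server, which blocks until it is stopped. The launcher then keeps printing "Heart beat" and the workflow stays RUNNING, which matches the logs above more closely than a memory problem would. A minimal sketch of a one-shot variant follows, assuming the goal is to run the function once and exit, not to serve HTTP requests. The `main` helper is a name introduced here for illustration; Flask is left out because nothing is being served.

```python
def hello_world() -> str:
    """Same behavior as the original view function, minus the Flask route."""
    print("HELLO WORLD CALLED !!!!!!!")
    return "Hello, world"


def main() -> int:
    """Run the work once and return an exit status (0 means success)."""
    hello_world()
    return 0


if __name__ == "__main__":
    # No server is started here, so the process ends as soon as main()
    # returns and the interpreter exits with status 0.
    main()
```

If the script really needs to run as a long-lived web service, a scheduled Oozie shell action is probably not the right place to host it, since the action is expected to terminate.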
