Why doesn't systemctl return a value in an NRPE check?


I have a problem with an NRPE check that I wrote.

It's a simple shell script that runs "systemctl is-active [service_name]" and returns the result to our Thruk instance.

When I run the script directly as the nrpe user, it works ("démarré" means "started"):

-bash-4.2$ /usr/lib64/nagios/plugins/check_service_active.sh --service dynflowd
dynflowd
Service dynflowd démarré

But when I run it through NRPE, locally, it tells me that the service is stopped ("arrêté"):

-bash-4.2$ ./check_nrpe -H 127.0.0.1 -c check_service_active -a 'dynflowd'
dynflowd
Service dynflowd arrêté

After multiple tests, I figured out that it's linked to the systemctl command: when I replace systemctl with another command such as "echo", it works.

So I think there is some problem between NRPE and systemctl, but I can't find what it is, and I haven't found anything about it on Google.

So here I am!

Thank you in advance for your reply, and sorry if this isn't clear enough.

Here's my script:

#!/bin/sh
#
# Script to query the state of a service via systemctl

# Nagios return codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

# Retrieve the parameters
while test -n "$1"; do
        case "$1" in
                --service)
                        SERV=$2
                        shift
                        ;;

                -u)
                        echo "Usage: $0 --service <service_name>"
                        exit $STATE_OK
                        ;;
        esac
        shift
done

STAT=$(systemctl is-active "$SERV")

# Use POSIX [ ] rather than the bash-only [[ ]], since the shebang is /bin/sh
if [ "$STAT" = "active" ]
then
        echo "Service $SERV démarré"
        exit $STATE_OK
else
        echo "Service $SERV arrêté"
        exit $STATE_CRITICAL
fi
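Part of what made this hard to diagnose is that the script reports every result other than "active", including systemctl failing to run at all, as "arrêté" with CRITICAL. Below is a minimal sketch of a variant that reports UNKNOWN instead when the shell cannot run systemctl, relying on the standard shell exit statuses 127 (command not found) and 126 (found but not executable). The check_service function and the SYSTEMCTL override variable are hypothetical names introduced for illustration, the /usr/bin/systemctl default is an assumption about where systemctl lives, and the messages are in English.

```shell
#!/bin/sh
# Sketch only: check_service and SYSTEMCTL are hypothetical names.
# SYSTEMCTL defaults to the assumed absolute path of systemctl.
SYSTEMCTL=${SYSTEMCTL:-/usr/bin/systemctl}

# Print a status line and return a Nagios code:
# 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
check_service() {
    serv=$1
    # Capture stdout and stderr, so a shell error ends up in the message.
    state=$("$SYSTEMCTL" is-active "$serv" 2>&1)
    rc=$?
    case $rc in
        126|127)
            # The shell could not run systemctl at all (not executable
            # or not found), so the service state is unknown.
            echo "UNKNOWN: could not run $SYSTEMCTL: $state"
            return 3
            ;;
    esac
    if [ "$state" = "active" ]; then
        echo "Service $serv started"
        return 0
    fi
    echo "Service $serv stopped ($state)"
    return 2
}
```

In the script itself, the final if/else block could then be replaced by check_service "$SERV"; exit $? so that a PATH or permission problem shows up as UNKNOWN with the shell's error message rather than as a stopped service.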
Accepted answer by Grimmj0w

I finally found the problem: the NRPE version!

On my server, NRPE is at nrpe-3.2.1-6.

I ran my script via NRPE on another server and it worked.

That other server runs nrpe-3.2.1-8.

So the solution is: update NRPE!

Thank you for your time and ideas, especially the >> /tmp/paxdebug.dynflowd 2>&1 idea, which helped me figure out the problem.
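To check which side of that line a given server is on, a minimal sketch could query the installed package and compare it against 3.2.1-8. It assumes an RPM-based system (the nrpe-3.2.1-N package names suggest one) and GNU sort with its -V version ordering; version_ge and installed_nrpe are hypothetical helper names, and the 3.2.1-8 threshold only reflects the two servers compared above, not a documented fix.

```shell
#!/bin/sh
# Sketch: version_ge and installed_nrpe are hypothetical helper names.

# Succeed if version-release $1 is greater than or equal to $2,
# using GNU sort's version ordering (-V).
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Print the installed nrpe package as VERSION-RELEASE.
installed_nrpe() {
    rpm -q --qf '%{VERSION}-%{RELEASE}\n' nrpe
}
```

Usage: version_ge "$(installed_nrpe)" 3.2.1-8 || echo "nrpe is older than 3.2.1-8". On a yum-based system, the update itself would typically be yum update nrpe followed by a restart of the NRPE service.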

Answer by paxdiablo

Okay, as with cron jobs, it may be that NRPE (the server) runs your script with a different environment from your shell, and something in that environment is preventing systemctl from running properly.

An easy way to see what's happening is to temporarily modify the line:

STAT=$(systemctl is-active $SERV)

Change the script so that this line becomes:

(
    echo ==== $(date) ==== ${SERV}
    systemctl is-active $SERV
) >> /tmp/paxdebug.dynflowd 2>&1
STAT=$(systemctl is-active $SERV)

As well as getting the status as before, this writes some useful information to the /tmp/paxdebug.dynflowd file, which you can then examine to see exactly what's happening in the NRPE-started instance of the script.

Hopefully, it'll say something simple, such as that systemctl cannot be found (indicating a path problem) but, whatever it gives you, it should help you figure out exactly what the problem is.
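Another way to test the different-environment theory is to run the script by hand, as the nrpe user, with a stripped-down environment. This is a minimal sketch using env -i; run_with_path is a hypothetical helper, and the PATH you pass is a guess unless you take it from what the NRPE-started script actually reports.

```shell
#!/bin/sh
# Sketch: run_with_path is a hypothetical helper name.
# Run a command with an otherwise empty environment and only the given
# PATH, roughly imitating a process started by a daemon.
run_with_path() {
    path=$1
    shift
    env -i PATH="$path" "$@"
}
```

For example, run_with_path /usr/local/bin /usr/lib64/nagios/plugins/check_service_active.sh --service dynflowd should reproduce the "stopped" result if the problem is simply that systemctl's directory is missing from PATH (assuming systemctl is not in /usr/local/bin).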


Update 1: based on your comments that attempting to run systemctl resulted in:

systemctl: command not found

That's almost certainly because the PATH is wrong. You can check it by adding the following line to the debug code I posted:

echo "PATH is [$PATH]"

To fix it, either modify PATH in the script to include /usr/bin (assuming that's where systemctl resides) or just use the absolute path (in both the debug and original areas):

/usr/bin/systemctl is-active ${SERV}
STAT=$(/usr/bin/systemctl is-active ${SERV})
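The first option, adjusting PATH inside the script itself, could look like this minimal sketch placed near the top of check_service_active.sh. The directory list is an assumption; it only needs to contain the directory where systemctl actually lives.

```shell
#!/bin/sh
# Sketch: put the usual system directories in front of whatever PATH
# NRPE passed in (the list is an assumption; adjust to your system).
# ${PATH:+:$PATH} appends the old PATH only if it was non-empty.
PATH=/usr/bin:/bin:/usr/sbin:/sbin${PATH:+:$PATH}
export PATH
```

Prepending means these directories take precedence over anything NRPE supplies; if you would rather NRPE's PATH win, append them instead.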

Update 2: based on your comments that, with the absolute path being used, you now get:

/usr/lib64/nagios/plugins/check_service_active.sh: line 32: /usr/bin/systemctl: Permission denied

This is likely because NRPE runs at a low privilege level, or as a separate user, to limit the damage from attacks. Given how central systemd is to running the system, it would be unwise to allow unfettered access to it.

So, similar to the previous update, add the following to the debug area:

/bin/ls -al /usr/bin/systemctl # Check "ls" is in this directory first.
/usr/bin/id                    # Ditto for "id".

The first line shows the permissions on systemctl, and the second shows the user and groups the script is running as. At that point, it becomes an exercise in figuring out how to run systemctl without violating security.
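The two debug lines above can also be folded into a small function that additionally tests whether the running user may execute the file. This is a minimal sketch: report_exec_access is a hypothetical name, and it only reports what it finds; it does not change anything.

```shell
#!/bin/sh
# Sketch: report_exec_access is a hypothetical helper name.
# Print who the script runs as, the target's permissions, and whether
# the current user may execute the target.
report_exec_access() {
    target=$1
    echo "running as: $(id)"
    ls -l "$target"
    if [ -x "$target" ]; then
        echo "$target: executable by this user"
    else
        echo "$target: NOT executable by this user"
    fi
}
```

In the debug area, report_exec_access /usr/bin/systemctl >> /tmp/paxdebug.dynflowd 2>&1 would then log all three pieces of information from the NRPE-started instance.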

If it turns out this is a permission or user issue, one possibility would be to provide a well-secured setuid script which would be owned by (and hence run as) a user that's allowed to run systemctl. But I really mean well-secured, since you don't want to open up a security hole:

#!/bin/bash
# SysCtlIsActive.sh: only allows certain services to be queried.
# (Needs bash, since it uses [[ ]].)

# Limit to these ones (white-space separated).

allowed="dynflowd"

# If not allowed, reject with special status.

result="GoAway"
for service in ${allowed} ; do
    [[ "$1" = "${service}" ]] && result=""
done

# If it IS allowed, get actual status.

[[ -z "${result}" ]] && result="$(/usr/bin/systemctl is-active "$1")"

echo "${result}"

There may be other methods (and they may be better) but that should hopefully be a good start if that is indeed the problem.


Just be aware that Linux ignores the setuid bit on interpreted scripts, i.e. those started via a shebang line such as #!/usr/bin/env bash, so you may have to work around that, possibly by building a real executable to do this work.

If you do have to build a real executable for it, you can start with the following C code, which is an adaptation of the shell script above:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv) {
    // Check service name provided.

    if (argc < 2) {
        puts("NoServiceProvided");
        return 1;
    }

    // Check service name allowed (exact match against a fixed list).

    static const char *allowed[] = { "dynflowd", NULL };
    int isAllowed = 0;
    for (const char **service = allowed; *service != NULL; service++) {
        if (strcmp(*service, argv[1]) == 0) {
            isAllowed = 1;
            break;
        }
    }
    if (! isAllowed) {
        puts("InvalidServiceName");
        return 1;
    }

    // Run systemctl directly rather than via system(): system() goes
    // through /bin/sh, and bash (which is /bin/sh on many systems)
    // drops setuid privileges when its effective and real user IDs
    // differ. execl() also avoids any shell interpretation of argv[1].

    execl("/usr/bin/systemctl", "systemctl", "is-active", argv[1], (char *) NULL);

    // Only reached if execl() failed.

    perror("execl");
    return 1;
}