Why does the Linux kernel not stop at the first handler for a shared IRQ that returns IRQ_HANDLED?

I'm sure there's a good reason for this, but I can't see what it is. Inside __handle_irq_event_percpu the kernel loops over all the handlers registered for a particular IRQ line and calls each one. What I don't understand is why this loop isn't exited as soon as a handler returns IRQ_HANDLED. It seems like a simple performance improvement, so there must be something I don't understand.

Does anyone know why?

There are 2 best solutions below

1
On BEST ANSWER

In the Linux source tree, __handle_irq_event_percpu() is in kernel/irq/handle.c:

irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc, unsigned int *flags)
{
    irqreturn_t retval = IRQ_NONE;
    unsigned int irq = desc->irq_data.irq;
    struct irqaction *action;

    record_irq_time(desc);

    for_each_action_of_desc(desc, action) {
        irqreturn_t res;

        trace_irq_handler_entry(irq, action);
        res = action->handler(irq, action->dev_id);
        trace_irq_handler_exit(irq, action, res);

        if (WARN_ONCE(!irqs_disabled(),"irq %u handler %pS enabled interrupts\n",
                  irq, action->handler))
            local_irq_disable();

        switch (res) {
        case IRQ_WAKE_THREAD:
            /*
             * Catch drivers which return WAKE_THREAD but
             * did not set up a thread function
             */
            if (unlikely(!action->thread_fn)) {
                warn_no_thread(irq, action);
                break;
            }

            __irq_wake_thread(desc, action);

            /* Fall through - to add to randomness */
        case IRQ_HANDLED:
            *flags |= action->flags;
            break;

        default:
            break;
        }

        retval |= res;
    }

    return retval;
}

The for_each_action_of_desc(desc, action) macro walks the action list of the IRQ descriptor:

#define for_each_action_of_desc(desc, act)          \
    for (act = desc->action; act; act = act->next)
[...]
struct irq_desc {
    struct irq_common_data  irq_common_data;
    struct irq_data     irq_data;
    unsigned int __percpu   *kstat_irqs;
    irq_flow_handler_t  handle_irq;
    struct irqaction    *action;    /* IRQ action list */
[...]
struct irqaction {
    irq_handler_t       handler;
    void            *dev_id;
    void __percpu       *percpu_dev_id;
    struct irqaction    *next;
    irq_handler_t       thread_fn;
    struct task_struct  *thread;
    struct irqaction    *secondary;
    unsigned int        irq;
    unsigned int        flags;
    unsigned long       thread_flags;
    unsigned long       thread_mask;
    const char      *name;
    struct proc_dir_entry   *dir;
} ____cacheline_internodealigned_in_smp;

The action list has multiple entries when the interrupt line is shared by several devices. Because several of those devices may raise an interrupt at the same time, every action's handler has to be called so that each driver can check whether its own device has something to do.
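To make the paragraph above concrete, here is a minimal user-space sketch of that dispatch loop. It is not kernel code: mock_device, mock_handler, for_each_action, dispatch_shared_irq and demo_devices_serviced are hypothetical names invented for the illustration, and the "pending" flag stands in for whatever status register a real driver would read.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins. The names echo the kernel's, but this is
 * a mock for illustration only, not kernel code. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

struct mock_device {          /* hypothetical device state */
    int pending;              /* non-zero: device is asking for attention */
    int serviced;             /* how many times its handler did real work */
};

struct irqaction {
    irq_handler_t handler;
    void *dev_id;
    struct irqaction *next;   /* next handler sharing the same line */
};

#define for_each_action(head, act) \
    for ((act) = (head); (act); (act) = (act)->next)

/* A well-behaved shared handler first checks whether its own device
 * raised the interrupt and returns IRQ_NONE if it did not. */
static irqreturn_t mock_handler(int irq, void *dev_id)
{
    struct mock_device *dev = dev_id;

    (void)irq;
    if (!dev->pending)
        return IRQ_NONE;
    dev->pending = 0;
    dev->serviced++;
    return IRQ_HANDLED;
}

/* Call every handler on the line and OR the results together. */
static irqreturn_t dispatch_shared_irq(int irq, struct irqaction *head)
{
    irqreturn_t retval = IRQ_NONE;
    struct irqaction *act;

    for_each_action(head, act)
        retval |= act->handler(irq, act->dev_id);
    return retval;
}

/* Two devices share IRQ 5; returns how many of them were serviced. */
int demo_devices_serviced(int a_pending, int b_pending)
{
    struct mock_device a = { a_pending, 0 }, b = { b_pending, 0 };
    struct irqaction act_b = { mock_handler, &b, NULL };
    struct irqaction act_a = { mock_handler, &a, &act_b };
    irqreturn_t ret = dispatch_shared_irq(5, &act_a);

    /* The line counts as handled exactly when some device was serviced. */
    assert((ret == IRQ_HANDLED) == (a.serviced + b.serviced > 0));
    return a.serviced + b.serviced;
}
```

With both devices pending, demo_devices_serviced(1, 1) services both in a single pass, which is the situation the loop is written for.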

N.B.:

  • This answer argues the point in more detail
  • This blog article depicts the steps of interrupt handling in the Linux kernel.
0
On

Inside __handle_irq_event_percpu the kernel loops over all the handlers registered for a particular IRQ line and calls each one. What I don't understand is why this loop isn't exited as soon as a handler returns IRQ_HANDLED. It seems like a simple performance improvement, so there must be something I don't understand.

There are 2 cases to consider - shared edge triggered IRQs and shared level triggered IRQs.

Shared Edge Triggered IRQs

In this case, 2 or more devices can send an IRQ at the same time or at similar times. An edge triggered line only signals on the transition, so if a second device asserts the line while it is already asserted, no new edge occurs and the interrupt controller sees a single IRQ. If this happens and the "for each driver" loop is exited when the first handler returns IRQ_HANDLED then other devices can/will become stuck in a "waiting for IRQ handler's attention" state (most likely causing devices to lock up permanently). To avoid that, for edge triggered IRQs, the kernel's "for each driver" loop must notify all drivers (and can't stop as soon as one returns IRQ_HANDLED).

Note that shared edge triggered IRQs are rare. For 80x86 PCs it's possible when there are more than 2 serial port controllers (which can be solved by using the same driver for all serial port controllers and dealing with the problem in the driver and not in the kernel's IRQ management code), but apart from that shared edge triggered IRQs simply don't exist (on 80x86 PCs).
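The stuck-device scenario for edge triggered IRQs can be sketched with a small user-space model. Everything here is an assumption made for illustration: mock_device, dispatch and stranded_after_one_edge are invented names, and the model assumes both devices signalled closely enough together that only one edge reached the CPU.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

struct mock_device {      /* hypothetical device state */
    bool pending;         /* true: device is waiting for its handler */
};

static irqreturn_t mock_handler(struct mock_device *dev)
{
    if (!dev->pending)
        return IRQ_NONE;
    dev->pending = false;
    return IRQ_HANDLED;
}

/* One pass over the handlers is the whole response to one edge. */
static void dispatch(struct mock_device *devs, size_t n, bool stop_at_first)
{
    for (size_t i = 0; i < n; i++) {
        if (mock_handler(&devs[i]) == IRQ_HANDLED && stop_at_first)
            break;
    }
}

/* Both devices are pending but only ONE edge was seen (assumption of
 * this model). Returns how many devices are left waiting with no further
 * interrupt on the way. */
size_t stranded_after_one_edge(bool stop_at_first)
{
    struct mock_device devs[2] = { { true }, { true } };
    size_t stranded = 0;

    dispatch(devs, 2, stop_at_first);
    for (size_t i = 0; i < 2; i++) {
        if (devs[i].pending)
            stranded++;
    }
    return stranded;
}
```

In this model stranded_after_one_edge(false) leaves nobody waiting, while stranded_after_one_edge(true) leaves the second device pending with no second edge to rescue it.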

Shared Level Triggered IRQs

In this case, 2 or more devices can send an IRQ at the same time or at similar times; but if this happens and the "for each driver" loop is exited when the first handler returns IRQ_HANDLED then the other IRQs (from other devices) are not lost. Instead, the interrupt controller will see "level is still being triggered by at least one device" and will re-issue the IRQ (and keep sending more IRQs until all devices are satisfied).
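The same kind of toy model shows the level triggered behaviour. Again this is only a sketch under stated assumptions: the names (mock_device, line_asserted, run_level_triggered, run_stats) are hypothetical, and the interrupt controller is reduced to "re-enter the dispatch loop while any device still holds the line".

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

struct mock_device {          /* hypothetical device state */
    bool pending;             /* true: device is holding the line asserted */
};

struct run_stats {
    unsigned irq_entries;     /* times the IRQ was delivered to the CPU */
    unsigned handler_calls;   /* total driver handler invocations */
    unsigned still_pending;   /* devices left unserviced at the end */
};

static irqreturn_t mock_handler(struct mock_device *dev)
{
    if (!dev->pending)
        return IRQ_NONE;
    dev->pending = false;
    return IRQ_HANDLED;
}

/* A level triggered line stays asserted while any device wants attention. */
static bool line_asserted(const struct mock_device *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].pending)
            return true;
    }
    return false;
}

/* Two devices assert the line together; the controller keeps re-issuing
 * the IRQ for as long as the line remains asserted. */
struct run_stats run_level_triggered(bool stop_at_first)
{
    struct mock_device devs[2] = { { true }, { true } };
    struct run_stats s = { 0, 0, 0 };

    while (line_asserted(devs, 2)) {
        s.irq_entries++;
        for (size_t i = 0; i < 2; i++) {
            s.handler_calls++;
            if (mock_handler(&devs[i]) == IRQ_HANDLED && stop_at_first)
                break;
        }
    }
    for (size_t i = 0; i < 2; i++) {
        if (devs[i].pending)
            s.still_pending++;
    }
    return s;
}
```

Either way no device is left pending. In this model, calling every handler finishes in 1 IRQ entry with 2 handler calls, while stopping at the first IRQ_HANDLED takes 2 IRQ entries and 3 handler calls.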

For shared level triggered IRQs, it's a performance compromise (that has nothing to do with "correctness"). More specifically:

  • If it's very likely that multiple devices will want attention at the same or similar time; then you can improve performance by continuing the loop (when a driver returns IRQ_HANDLED) because it's likely that this will avoid the cost of the interrupt controller re-issuing the IRQ.

  • If it's very unlikely that multiple devices will want attention at the same or similar time; then you can improve performance by stopping the loop as soon as a driver returns IRQ_HANDLED because it's likely that this will avoid the cost of executing unnecessary device drivers' interrupt handlers.
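The trade-off in the two points above can be put into a rough cost model. This is a back-of-the-envelope sketch, not a measurement: it assumes exactly 2 devices on the line, that the device whose handler runs first is always the one that raised the IRQ, that the second device is also pending with probability p_second, and that "entry" (cost of taking one IRQ) and "handler" (cost of one handler call) are abstract units I made up.

```c
/* Cost of servicing one event when every handler is always called:
 * one IRQ entry plus both handlers, regardless of who is pending. */
double cost_call_all(double entry, double handler)
{
    return entry + 2.0 * handler;
}

/* Expected cost when the loop stops at the first IRQ_HANDLED.
 * - Second device idle: one entry, one handler call.
 * - Second device also pending: the IRQ is re-issued, so two entries and
 *   three handler calls (first handler twice, second handler once). */
double cost_stop_at_first(double entry, double handler, double p_second)
{
    double alone = entry + handler;
    double both = 2.0 * entry + 3.0 * handler;

    return (1.0 - p_second) * alone + p_second * both;
}

/* Probability of simultaneous IRQs at which both strategies cost the same
 * in this model: handler / (entry + 2 * handler). */
double break_even_probability(double entry, double handler)
{
    return handler / (entry + 2.0 * handler);
}
```

For example, with entry = 10 and handler = 1 the call-all strategy always costs 12, while stopping early costs 11 when the second device is never pending and 23 when it always is. Under these assumptions stopping early only wins when simultaneous IRQs are rare, and the more expensive an IRQ entry is relative to a handler call, the rarer they have to be.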

Note that this depends on the order that device drivers' IRQ handlers are called. To understand this imagine there are 2 devices sharing an IRQ line and almost all IRQs come from the first device. If the first device's driver's IRQ handler is called first and returns IRQ_HANDLED then it'd be unlikely that the second device also sent an IRQ at the same time; but if the second device's driver's IRQ handler is called first and returns IRQ_HANDLED then it'd be likely that the first device also sent an IRQ at the same time.

In other words; if the kernel sorted the list of device drivers in order of "chance the device sent an IRQ"; then it becomes more likely that stopping the loop as soon as a driver returns IRQ_HANDLED will improve performance (and it becomes more likely that the first driver called will return IRQ_HANDLED sooner).
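A small counting example illustrates why the call order matters when stopping early. It is a sketch with made-up numbers: 10 level triggered IRQs, 9 from device 0 and 1 from device 1, never simultaneous, and the function names are hypothetical.

```c
#include <stddef.h>

/* Count handler invocations when the loop stops at the first handler that
 * returns IRQ_HANDLED. Each IRQ is assumed to come from exactly one device,
 * so the loop stops when it reaches that device's handler. */
unsigned calls_with_early_exit(const int *irq_sources, size_t n_irqs,
                               const int *call_order, size_t n_devs)
{
    unsigned calls = 0;

    for (size_t i = 0; i < n_irqs; i++) {
        for (size_t j = 0; j < n_devs; j++) {
            calls++;
            if (call_order[j] == irq_sources[i])
                break;    /* this handler would have returned IRQ_HANDLED */
        }
    }
    return calls;
}

/* 9 IRQs from device 0 and 1 from device 1; first_dev (0 or 1) selects
 * which device's handler is called first. */
unsigned demo_calls(int first_dev)
{
    static const int sources[10] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 };
    int order[2] = { first_dev, 1 - first_dev };

    return calls_with_early_exit(sources, 10, order, 2);
}
```

With the busy device first, demo_calls(0) needs 11 handler calls; with the quiet device first, demo_calls(1) needs 19, which is barely better than the 20 calls made when every handler is always called.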

However tracking statistics and "being smarter" (determining how to optimize performance dynamically based on those statistics) would also add a little overhead, and (at least in theory, especially if device drivers' interrupt handlers are extremely fast anyway) this could cost more performance than you'd gain.

Essentially; it'd take a lot of work (research, benchmarking) to quantify and maximize the potential benefits; and it's a lot easier to not bother (and always call all device drivers' interrupt handlers, even when it is worse).