Interrupt handler (top-half) in the Linux kernel – part 1

1. Preface
Hi all,

As I continue to get a grasp of the Linux kernel, I have now reached the point (chapter 7 of Robert Love’s amazing book) where I study interrupts and interrupt handlers. In this post I will introduce the basics of what I have learned about interrupts and interrupt handlers. In particular, as the name of this post says, the focus here is on the “first half” (in the chronological sense) of interrupt handling.

2. Introduction and motivation for Interrupts:
One of the core capabilities any OS needs is to let the (hardware) devices connected to the machine “interrupt her” (i.e., let her know about something).
One could ask: why does the OS need this kind of capability in the first place? The naive answer is that the OS can manage “just fine” without it, by polling the hardware devices to check whether a task that was “handed” to them is done. YET this approach leaves the CPU poorly utilized: lots of CPU cycles are wasted on polling the hardware device(s) only to receive, most of the time, the answer “No sir, I’m not done yet…”.
This is where interrupts come to the rescue!!
Interrupts let the different hardware devices signal the processor whenever they “have something for it” and/or “are done performing the task they were doing”. For instance, the network card has just finished receiving packets from the network and wants to “hand them” to the kernel for further processing.
Interrupts are, eventually, signals that the hardware devices send to the processor (in the HW sense): actual electrical signals sent on the processor’s interrupt lines (on the motherboard). The processor then receives the interrupt (signal) and notifies the OS (kernel), so the OS in turn can respond to the new data (or, more generally, event) by invoking the interrupt handler of the exact device that sent this interrupt. Together, the OS and the processor implement this whole “interrupt mechanism”.

3. Interrupt controller and Advanced interrupt controller:
Let us briefly describe the two (old and new) hardware mechanisms that enable the actual “dispatching” of the electrical signal from the different devices to the CPU.
a. Programmable Interrupt Controller (PIC – old solution):
The old mechanism used a fixed number of 16 lines, spread over two cascaded chips of 8 lines each (a master and a slave), serving effectively 15 different “devices”. The master controller is essentially a simple chip on the motherboard, and its main task is to multiplex the (effectively) 15 device lines into a single “contact point” with the CPU. This is how the signals were passed on to the CPU. Note that in the picture below there are actually 16 lines, yet one of them (line number 2 of the Master PIC) is allocated for the communication with the Slave PIC.

b. Advanced Programmable Interrupt Controller (APIC – new solution):
A more modern mechanism used nowadays is the APIC, which overcomes the main issue of the PIC (its small, fixed number of lines) by enabling many more “virtual” interrupt lines (up to 2048).
Here, the different hardware devices share a bus and use it to indicate that they need the CPU’s attention. This shared bus is then connected to the APIC, which acts (more or less) like the master chip in the PIC mechanism: it buffers the interrupt signals of the different devices and passes them on to the CPU.
Either way (PIC or APIC), once the CPU (processor) is aware of an interrupt on a specific interrupt line, it notifies the OS about it, so that she can invoke the Interrupt Service Routine (ISR) of the respective device. (Again, in the APIC case, where the bus is shared, the interrupt handler should also verify that the device it serves is indeed the one that generated this interrupt.)
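The cascade arithmetic from (a) can be written down in a few lines. The sketch below is a user-space model, not kernel code; it assumes the classic PC wiring (two 8-pin chips, with the slave cascaded into pin 2 of the master), and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Two cascaded 8259A-style PICs, 8 input pins each (simplified model). */
#define PIC_PINS_PER_CHIP 8
#define PIC_CASCADE_PIN   2   /* master pin wired to the slave's output */

enum pic_chip { PIC_MASTER = 0, PIC_SLAVE = 1 };

/* Map (chip, pin) to a legacy IRQ number: master pins give IRQ 0-7,
 * slave pins give IRQ 8-15. */
static int pic_irq_number(enum pic_chip chip, int pin)
{
    assert(pin >= 0 && pin < PIC_PINS_PER_CHIP);
    return (int)chip * PIC_PINS_PER_CHIP + pin;
}

/* A pin can serve a device unless it is the master's cascade input. */
static bool pic_pin_usable(enum pic_chip chip, int pin)
{
    return !(chip == PIC_MASTER && pin == PIC_CASCADE_PIN);
}

/* Count the lines left for devices: 16 pins minus the cascade pin. */
static int pic_usable_lines(void)
{
    int count = 0;
    for (int chip = PIC_MASTER; chip <= PIC_SLAVE; chip++)
        for (int pin = 0; pin < PIC_PINS_PER_CHIP; pin++)
            if (pic_pin_usable((enum pic_chip)chip, pin))
                count++;
    return count;
}
```

For example, pic_irq_number(PIC_SLAVE, 0) gives IRQ 8, and pic_usable_lines() gives the 15 usable lines mentioned above.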

An important note:
– Just to be clear, so far we have talked ONLY about interrupts caused by a physical device (outside the CPU chip) that needs “attention”. This is the reason these interrupts are also known as “hardware interrupts”.
– It is important to note that these (hardware) interrupts can occur at ANY given moment, not necessarily in synchronization with the CPU clock!! For this reason, hardware interrupts are also known as “asynchronous interrupts”, in contrast to “exceptions”, which are also known as “synchronous interrupts”.

4. Interrupt Service Routine (ISR):
This is the C function that the kernel runs in response to an interrupt from a specific device (each device has its own, pre-defined ISR).
The work of handling an interrupt is divided into two “halves”: the top half and the bottom half.
The top half is the one that runs first.
To illustrate the “top-half/bottom-half” dichotomy, we can use the network card (again). When a network card receives packets from the network, it needs to alert the kernel that they are available. It wants and needs to do this immediately, to optimize network throughput and latency and to avoid timeouts. Thus, it immediately issues an interrupt, and the kernel responds by executing the network card’s registered interrupt service routine: the “top half” of the work.
The interrupt handler runs, acknowledges the hardware, copies the new networking packets into main memory (from the network card’s “on chip” memory buffer), and readies the network card for more packets. These tasks are the important, time-critical, and hardware-specific work, so they need to be done quickly. In this case, the packets must be copied into main memory because the data buffer on the network card is fixed and minuscule in size, particularly compared to main memory. Once the networking data is safely in main memory, the critical (top-half) job is done, and the handler can return control of the system to whatever code was interrupted when the interrupt was generated. Further processing of the packets can be deferred to a “more convenient time” later on; this is the task of the “bottom half”.
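To make the split concrete, here is a minimal user-space model of the network card example, assuming a one-packet on-card buffer and a small ring in “main memory” (the sizes and names are made up). It is not how a real network driver is written; it only shows which work belongs to which half.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: a tiny on-card buffer and a larger RAM queue. */
#define CARD_BUF_SIZE 64
#define RAM_QUEUE_LEN 16

struct packet { char data[CARD_BUF_SIZE]; size_t len; };

static struct packet card_buffer;              /* fixed, minuscule on-card buffer */
static int card_buffer_full;                   /* set by the "hardware" */
static struct packet ram_queue[RAM_QUEUE_LEN]; /* "main memory" ring */
static int queue_head, queue_tail;
static int packets_processed;

/* Simulate the card receiving a packet into its own buffer. */
static void card_receive(const char *payload)
{
    size_t n = strlen(payload);
    assert(n < CARD_BUF_SIZE);
    memcpy(card_buffer.data, payload, n + 1);
    card_buffer.len = n;
    card_buffer_full = 1;
}

/* Top half: only the time-critical work. Copy the packet off the card,
 * free the card buffer for the next packet, and return quickly. */
static void top_half(void)
{
    if (!card_buffer_full)
        return;
    assert(queue_tail - queue_head < RAM_QUEUE_LEN);
    ram_queue[queue_tail % RAM_QUEUE_LEN] = card_buffer;
    queue_tail++;
    card_buffer_full = 0;   /* "ready the card for more packets" */
}

/* Bottom half: the deferred processing, run at a more convenient time. */
static void bottom_half(void)
{
    while (queue_head < queue_tail) {
        struct packet *p = &ram_queue[queue_head % RAM_QUEUE_LEN];
        (void)p;            /* protocol processing would happen here */
        packets_processed++;
        queue_head++;
    }
}
```

After card_receive() and top_half(), the card buffer is free again but nothing has been processed yet; the processing only happens once bottom_half() runs.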

5. Register an interrupt handler:
As mentioned earlier, interrupt handlers belong to devices; therefore, the driver of a particular device is the entity responsible for registering an interrupt handler for that device with the kernel.
To do so, when the driver initializes, it must call the following function:

int request_irq(unsigned int irq,
                irq_handler_t handler,
                unsigned long flags,
                const char *name,
                void *dev)

a. irq – specifies the interrupt number to allocate. For some devices, for example legacy PC devices such as the system timer or keyboard, this value is typically hard-coded and well known. For most other devices, it is probed or otherwise determined programmatically and dynamically.
b. handler – a pointer to the actual “top-half” interrupt handler function that the kernel will invoke upon receiving a hardware interrupt from that device.
c. flags – this argument can be either zero or a bit mask of one or more of the flags defined in <linux/interrupt.h>. Among these flags, the most important are:
IRQF_DISABLED – When set, this flag instructs the kernel to disable all interrupts when executing this interrupt handler. When unset, interrupt handlers run with all interrupts except their own enabled. Most interrupt handlers do not set this flag, as disabling all interrupts is bad form. Its use is reserved for performance-sensitive interrupts that execute very (!!) quickly (because they finish so fast, keeping all other interrupts off while they run costs little). Note that this describes the 2.6-era kernels covered by the book: since 2.6.35 all handlers run with local interrupts disabled, and the flag itself was removed in Linux 4.1.
IRQF_SHARED – This flag specifies that the interrupt line (used by this device) can be shared among multiple interrupt handlers (of other devices). Every handler registered on such a shared line must specify this flag as well; otherwise, only one handler can exist per line.
d. name – is an ASCII text representation of the device associated with the interrupt.
e. dev – is used for shared interrupt lines. When an interrupt handler is freed, dev provides a unique cookie to enable the removal of only the desired interrupt handler from the interrupt line. Without this parameter, it would be impossible for the kernel to know which handler to remove on a given shared interrupt line. You can pass NULL here if the line is not shared, but you must pass a unique cookie if your interrupt line is shared. This pointer is also passed into the interrupt handler on each invocation. A common practice is to pass the driver’s device structure: this pointer is unique and might be useful within the handler.
– On success, request_irq() returns zero.
– A nonzero value indicates an error, in which case the specified interrupt handler was not registered. A common error is -EBUSY, which denotes that the given interrupt line is already in use (and either the current user or you did not specify IRQF_SHARED).
– request_irq() can sleep and therefore cannot be called from interrupt context or other situations where code cannot block.
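The registration rules above (in particular -EBUSY when the line is taken without sharing) can be modeled as a small table. This is a hypothetical user-space sketch, not the kernel’s implementation: sim_request_irq(), SIM_IRQF_SHARED and the fixed per-line limit are invented for illustration, and the real request_irq() performs more checks than this model does. In line with the rule in (e), the model also rejects a NULL dev on a shared line.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model constants (not the kernel's). */
#define SIM_NR_IRQS       16
#define SIM_MAX_PER_LINE   4
#define SIM_IRQF_SHARED  0x1UL

typedef int (*sim_handler_t)(int irq, void *dev);

/* One registered handler, mirroring request_irq()'s parameters. */
struct sim_action {
    sim_handler_t handler;
    unsigned long flags;
    const char *name;
    void *dev;
};

static struct sim_action sim_lines[SIM_NR_IRQS][SIM_MAX_PER_LINE];
static int sim_nr_actions[SIM_NR_IRQS];

/* Returns 0 on success or a negative errno value. */
static int sim_request_irq(unsigned int irq, sim_handler_t handler,
                           unsigned long flags, const char *name, void *dev)
{
    if (irq >= SIM_NR_IRQS || handler == NULL)
        return -EINVAL;
    /* A shared line needs a unique, non-NULL cookie per handler. */
    if ((flags & SIM_IRQF_SHARED) && dev == NULL)
        return -EINVAL;
    if (sim_nr_actions[irq] > 0) {
        /* Line in use: old and new handlers must all agree to share.
         * All handlers on a line carry the same sharing flag, so
         * checking the first one is enough. */
        if (!(flags & SIM_IRQF_SHARED) ||
            !(sim_lines[irq][0].flags & SIM_IRQF_SHARED))
            return -EBUSY;
        if (sim_nr_actions[irq] == SIM_MAX_PER_LINE)
            return -ENOMEM;   /* a limit of this model only */
    }
    sim_lines[irq][sim_nr_actions[irq]++] =
        (struct sim_action){ handler, flags, name, dev };
    return 0;
}

/* A trivial handler used only to exercise the model. */
static int sim_demo_handler(int irq, void *dev)
{
    (void)irq;
    (void)dev;
    return 1;
}
```

In this model, a device that registers IRQ 5 without the sharing flag makes any later request for IRQ 5 fail with -EBUSY, while two handlers that both pass the flag (with distinct dev cookies) can coexist on IRQ 9.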

6. Un-register (free) the interrupt handler:
When a driver unloads, it needs to unregister its interrupt handler and potentially disable the interrupt line (if no other handler is left on that line). To do this, we call the following function:

void free_irq(unsigned int irq, void *dev)

– If the specified interrupt line is not shared, this function removes the handler and also disables the line: there are no other interrupt handlers sharing it, so it can be disabled right away.
– If the specified interrupt line is shared, only the handler identified via dev is removed, and the interrupt line is disabled only when the last handler is removed. With shared interrupt lines, a unique cookie is required to differentiate between the multiple handlers that can exist on a single line and to let free_irq() remove only the correct handler. In either case (shared or not), if dev is non-NULL, it must match the desired handler.
NOTE: A call to free_irq() must be made from process context !!
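The cookie matching described above can be sketched as follows. Again, this is a hypothetical user-space model (sim_add() and sim_free_irq() are illustrative names): it removes only the entry whose dev pointer matches and marks the line disabled once nothing is left on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_IRQS      16
#define SIM_MAX_PER_LINE  4

/* Each line keeps the dev cookies of its handlers and an enabled flag. */
struct sim_line {
    void *devs[SIM_MAX_PER_LINE];
    int count;
    bool enabled;
};

static struct sim_line sim_lines[SIM_NR_IRQS];

/* Register a cookie on a line; the first registration enables it. */
static void sim_add(unsigned int irq, void *dev)
{
    assert(irq < SIM_NR_IRQS);
    struct sim_line *l = &sim_lines[irq];
    assert(l->count < SIM_MAX_PER_LINE);
    l->devs[l->count++] = dev;
    l->enabled = true;
}

/* Remove only the handler whose cookie matches dev and disable the line
 * once its last handler is gone. Returns false if no cookie matched. */
static bool sim_free_irq(unsigned int irq, void *dev)
{
    assert(irq < SIM_NR_IRQS);
    struct sim_line *l = &sim_lines[irq];
    for (int i = 0; i < l->count; i++) {
        if (l->devs[i] != dev)
            continue;
        /* Close the gap left by the removed handler. */
        for (int j = i; j < l->count - 1; j++)
            l->devs[j] = l->devs[j + 1];
        l->count--;
        if (l->count == 0)
            l->enabled = false;
        return true;
    }
    return false;
}
```

With two cookies on one line, freeing the first leaves the line enabled for the second; only freeing the last one disables it.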

IMPORTANT NOTE ABOUT REENTRANCY: Interrupt handlers in Linux need not be re-entrant. When a given interrupt handler is executing, the corresponding interrupt line is masked out on all processors, preventing another interrupt on the same line from being received (normally all other interrupts are enabled, so other interrupts are serviced). Consequently, the same interrupt handler is never invoked concurrently to service a nested interrupt. This greatly simplifies writing your interrupt handler.

7. Shared handlers:
– A shared handler is an interrupt handler for a device that is registered on the same line (IRQ) as the handler(s) of at least one other device, meaning they “share” the interrupt line. The motivation is quite obvious: it allows registering more devices (ISRs) than there are “real” interrupt lines.
– Shared handlers are registered and executed much like non-shared handlers. There are three main differences:
a. The IRQF_SHARED flag must be set in the flags argument to request_irq().
b. The dev argument must be unique to each registered handler. A pointer to any per-device structure is sufficient; a common choice is the device structure, as it is both unique and potentially useful to the handler. You cannot pass NULL for a shared handler.
c. The interrupt handler itself must be able to tell whether its device actually generated an interrupt. This requires both hardware support and associated logic in the interrupt handler (software support). If the hardware did not offer this capability, the interrupt handler would have no way to know whether its associated device or some other device sharing the line caused the interrupt. On most systems this is done via a special “status” register that indicates whether the specific device caused the interrupt; the “top-half” ISR checks it at the very beginning.

– From the driver’s point of view, all drivers sharing the same interrupt line must meet the previous requirements. If any one device does not share “fairly”, none can share the line.
– When request_irq() is called with IRQF_SHARED specified, the call succeeds only if the interrupt line is currently not registered, or if all handlers already registered on the line also specified IRQF_SHARED. Shared handlers, however, can mix usage of IRQF_DISABLED.
NOTE: See an important note about concurrent handling of different ISRs on a shared IRQ in the Resources section below.
– From the kernel’s point of view, when the kernel receives an interrupt on a shared line, it invokes each registered handler on the line in turn. Therefore, each handler must be able to tell whether its device generated the interrupt, and must exit quickly if it did not. This requires the hardware and software support mentioned earlier.
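The kernel’s side of a shared line (calling every handler in turn and noticing when nobody claimed the interrupt) can be modeled like this. The names are hypothetical and the model runs in user space; SIM_IRQ_NONE and SIM_IRQ_HANDLED are given the same values as the kernel’s IRQ_NONE (0) and IRQ_HANDLED (1), so the results can be OR-ed together.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins with the same values as the kernel's IRQ_NONE / IRQ_HANDLED. */
typedef enum { SIM_IRQ_NONE = 0, SIM_IRQ_HANDLED = 1 } sim_irqreturn_t;

/* Hypothetical device with a "status pending" bit. */
struct sim_dev {
    int status_pending;   /* set when this device raised the line */
    int serviced;
};

static sim_irqreturn_t sim_dev_handler(int irq, void *cookie)
{
    struct sim_dev *d = cookie;
    (void)irq;
    if (!d->status_pending)
        return SIM_IRQ_NONE;     /* not our device: exit quickly */
    d->status_pending = 0;       /* acknowledge the device */
    d->serviced++;
    return SIM_IRQ_HANDLED;
}

/* One registered handler on a line, with its dev cookie. */
struct sim_action {
    sim_irqreturn_t (*handler)(int irq, void *dev);
    void *dev;
};

static int sim_unhandled;        /* interrupts that nobody claimed */

/* Kernel side: invoke every handler on the shared line in turn and
 * combine their results; if all returned NONE, count it as spurious. */
static int sim_dispatch_shared(int irq, const struct sim_action *actions,
                               size_t n)
{
    int ret = SIM_IRQ_NONE;
    for (size_t i = 0; i < n; i++)
        ret |= actions[i].handler(irq, actions[i].dev);
    if (ret == SIM_IRQ_NONE)
        sim_unhandled++;
    return ret;
}
```

Only the handler whose device has its status bit set does any work; a second dispatch with no pending device is counted as unclaimed.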

8. Implementing an interrupt handler:
The following is a declaration of an interrupt handler:

static irqreturn_t intr_handler(int irq, void *dev)

a. irq – the numeric value of the interrupt line the handler is servicing. This value is passed into the handler, but it is not used very often (it had a meaning in older versions of the Linux kernel, but nowadays it adds little value).
b. dev – a generic pointer to the same dev that was given to request_irq() when the interrupt handler was registered. If this value is unique (which is required to support shared interrupt lines), it can also act as a cookie to differentiate between multiple devices potentially using the same interrupt handler.

The return value of an interrupt handler is the special type irqreturn_t, which is basically an int. An interrupt handler can return two special values:
IRQ_NONE: returned when the interrupt handler detects an interrupt for which its device was not the originator, so it should (must) not handle it.
IRQ_HANDLED: returned if the interrupt handler was correctly invoked and its device did indeed cause the interrupt.
These special values are used to let the kernel know whether devices are issuing spurious (un-requested) interrupts. If all the interrupt handlers on the given (shared) interrupt line returned IRQ_NONE, then the kernel can detect the problem.
The interrupt handler is normally marked static because it is never called directly from another file.
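Putting the declaration and the return values together, a handler usually reads its device’s status register through the dev cookie and bails out early if its device is not the source. The sketch below compiles in user space: irqreturn_t, IRQ_NONE and IRQ_HANDLED are redefined here as stand-ins (in a driver they come from <linux/interrupt.h>), and struct board with its status_reg field is a hypothetical mock of a memory-mapped register.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins; a real driver gets these from the kernel headers. */
typedef enum irqreturn { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

#define BOARD_STATUS_RX 0x1u   /* hypothetical "data received" bit */

/* Hypothetical per-device structure, passed as dev to request_irq(). */
struct board {
    volatile uint32_t status_reg;  /* mock of a device status register */
    unsigned long events;
};

static irqreturn_t intr_handler(int irq, void *dev)
{
    struct board *b = dev;         /* the cookie identifies the instance */
    uint32_t status = b->status_reg;

    (void)irq;                     /* rarely needed nowadays */
    if (!(status & BOARD_STATUS_RX))
        return IRQ_NONE;           /* another device raised the line */

    /* Acknowledge by clearing the pending bit (real hardware may use
     * write-1-to-clear or another scheme instead). */
    b->status_reg = status & ~BOARD_STATUS_RX;
    b->events++;
    return IRQ_HANDLED;
}
```

Because the device state travels in dev, the same intr_handler() can serve two boards: the cookie of the board whose status bit is set yields IRQ_HANDLED, while the other board’s cookie yields IRQ_NONE.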

9. Interrupt context:
An important distinction between an interrupt handler and a “normal” process concerns the context they run in. When a user-space process runs, it runs in its “own” context. Moreover, even when it runs in kernel mode, for instance due to a system call it invoked, it is still running in process context (for example, the current macro points to the task associated with this process).
On the other hand, when an interrupt handler runs (again, it is important to emphasize that we are referring to a hardware interrupt), it runs in “interrupt context”. Interrupt context is not associated with a process. The current macro is not relevant (although it points to the interrupted process). Without a backing process, interrupt context cannot sleep: how would it ever be rescheduled? Therefore, you cannot call certain functions from interrupt context: if a function sleeps, you cannot use it from your interrupt handler.
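A toy model of the rule: a counter marks “interrupt context”, and a blocking call made while it is set is refused and recorded. All names are hypothetical; the real kernel has its own context checks (for example in_interrupt()) and debug options that complain about sleeping in atomic context, which this sketch only imitates.

```c
#include <assert.h>
#include <stdbool.h>

static int sim_irq_depth;          /* > 0 while "in interrupt context" */
static int sim_sleep_violations;   /* blocking calls attempted there */

static bool sim_in_interrupt(void) { return sim_irq_depth > 0; }

/* A blocking operation: legal only in process context. */
static bool sim_blocking_wait(void)
{
    if (sim_in_interrupt()) {
        sim_sleep_violations++;    /* a debug kernel would complain here */
        return false;
    }
    /* ...would put the current task to sleep and call the scheduler... */
    return true;
}

/* Run a handler with no backing process, as in interrupt context. */
static void sim_handle_irq(void (*handler)(void))
{
    sim_irq_depth++;
    handler();
    sim_irq_depth--;
}

/* A buggy handler that tries to sleep. */
static void sim_bad_handler(void) { (void)sim_blocking_wait(); }
```

The same sim_blocking_wait() succeeds when called from “process context” but is flagged when called through sim_handle_irq().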

10. Interrupt handler stack:
Early in the 2.6 kernel development cycle, an option was added to reduce the kernel stack size from two pages down to one, providing only a 4KB stack on 32-bit systems. This reduced memory pressure, because every process on the system previously needed two pages of contiguous, non-swappable kernel memory. To cope with the reduced stack size, interrupt handlers were given their own stack, one stack per processor, one page in size. This stack is referred to as the interrupt stack. Although the total size of the interrupt stack is half that of the original shared stack, the average stack space available is greater, because interrupt handlers get the full page of memory to themselves.
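The memory trade-off is simple arithmetic. A minimal sketch, assuming 4KB pages and made-up process and CPU counts:

```c
#include <assert.h>

/* Assumed page size on the 32-bit systems discussed above. */
#define PAGE_BYTES 4096UL

/* Old scheme: two pages of kernel stack per process, shared with
 * interrupt handlers. */
static unsigned long stack_bytes_two_pages(unsigned long nproc)
{
    return nproc * 2 * PAGE_BYTES;
}

/* New scheme: one page per process plus one interrupt stack page per CPU. */
static unsigned long stack_bytes_one_page(unsigned long nproc,
                                          unsigned long ncpus)
{
    return nproc * PAGE_BYTES + ncpus * PAGE_BYTES;
}
```

With 1000 processes on 4 CPUs, the two-page scheme needs 8,192,000 bytes of kernel stack, while one page per process plus one interrupt stack per CPU needs 4,112,384 bytes.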

11. How the kernel implements interrupt handling:
The implementation of interrupt handling in the Linux kernel is architecture specific (much as system calls are).
The generic flow of every (hardware) interrupt can be described at a high level according to the next figure:

1) The hardware device generates the hardware signal towards the interrupt controller (whether it is PIC or APIC).
2) If this interrupt line is NOT disabled, the interrupt controller passes the signal on towards the CPU.
3) Unless interrupts are disabled on the processor (which can also happen), the processor immediately stops what it is doing, disables the interrupt system, jumps to a predefined location in memory, and executes the code located there. This predefined point is set up by the kernel and is the entry point for interrupt handlers. This part is platform specific (it is implemented in assembly for each architecture).
4) The interrupt’s journey in the kernel begins at this predefined entry point. For each interrupt line, the processor jumps to a unique location in memory and executes the code located there; in this manner, the kernel knows the IRQ number of the incoming interrupt. The initial entry point simply saves this value and stores the current register values (which belong to the interrupted task) on the stack; then the kernel calls do_IRQ().
do_IRQ() then does several things:
a) Because of the way the C calling convention works, do_IRQ() can read the interrupt line number from the stack, so it can then acknowledge receipt of the interrupt (to the interrupt controller) and disable interrupt delivery on that line. On normal PC machines, these operations are handled by mask_and_ack_8259A().
b) It verifies that:
– there is indeed a valid handler registered on this interrupt line;
– this line is NOT disabled and NOT already being handled at this moment.
5) If all checks go well (in step 4), do_IRQ() calls handle_IRQ_event() to “actually” invoke the handler(s) registered for this line. Looking at how handle_IRQ_event() is implemented (can be found here), one can note that it invokes all registered handlers for this line in a loop (i.e., if the line is NOT shared, this loop has a single iteration). Finally, the function does all the required clean-ups and returns to do_IRQ().
6 + 7) Back in do_IRQ(), the function cleans up and returns to the initial entry point, which then jumps to ret_from_intr(). This routine, like the initial entry code, is written in assembly. It checks whether a reschedule is pending. If a reschedule is pending and the kernel is returning to user-space (that is, the interrupt interrupted a user process), schedule() is called. If the kernel is returning to kernel-space (that is, the interrupt interrupted the kernel itself), schedule() is called only if preempt_count is zero; otherwise it is not safe to preempt the kernel. After schedule() returns, or if there is no work pending, the initial registers are restored and the kernel resumes whatever was interrupted.
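Steps 4 to 7 can be summarized in a much-simplified user-space model. Every name below is hypothetical; the real entry code is architecture-specific assembly, and do_IRQ()/handle_IRQ_event() differ between kernel versions, so this only mirrors the order of the steps described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_HANDLERS 4

typedef int (*sim_handler_t)(int irq, void *dev);

/* Per-line state (hypothetical). */
struct sim_irq_line {
    sim_handler_t handlers[SIM_MAX_HANDLERS];
    void *devs[SIM_MAX_HANDLERS];
    size_t nr;          /* number of registered handlers */
    bool masked;        /* delivery blocked at the controller */
    bool disabled;      /* line disabled by a driver */
    bool in_progress;   /* a handler for this line is already running */
};

/* Per-CPU scheduling state (hypothetical). */
struct sim_cpu {
    bool need_resched;
    int preempt_count;
    int schedule_calls; /* how often "schedule()" was called */
};

/* Steps 6 + 7: reschedule only if it is safe to do so. */
static void sim_ret_from_intr(struct sim_cpu *cpu, bool returning_to_user)
{
    if (cpu->need_resched &&
        (returning_to_user || cpu->preempt_count == 0)) {
        cpu->schedule_calls++;   /* stands in for schedule() */
        cpu->need_resched = false;
    }
    /* ...restore the saved registers and resume the interrupted code... */
}

/* Steps 4 and 5: mask + ack, sanity checks, then run every handler. */
static int sim_do_irq(int irq, struct sim_irq_line *line,
                      struct sim_cpu *cpu, bool returning_to_user)
{
    int handled = 0;

    line->masked = true;                 /* like mask_and_ack_8259A() */
    if (line->nr > 0 && !line->disabled && !line->in_progress) {
        line->in_progress = true;
        for (size_t i = 0; i < line->nr; i++)  /* the handle_IRQ_event() loop */
            handled |= line->handlers[i](irq, line->devs[i]);
        line->in_progress = false;
    }
    if (!line->disabled)
        line->masked = false;            /* re-enable delivery on the line */
    sim_ret_from_intr(cpu, returning_to_user);
    return handled;
}

/* A trivial handler for demonstration: counts its invocations. */
static int sim_count_handler(int irq, void *dev)
{
    (void)irq;
    (*(int *)dev)++;
    return 1;
}
```

With need_resched set and a nonzero preempt_count, returning to kernel code does not call the stand-in schedule(), whereas returning to user space does; and a line already marked in progress is not handled a second time.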

12. Interrupt control:
I won’t go into too many details here, but it is worth mentioning that Linux also provides a set of functions (an interface) to control interrupts, such as local_irq_disable() and local_irq_enable(), which disable and enable, respectively, ALL interrupts on the local (current) processor. Again, I won’t go into details here, but it is important to be aware of these interfaces.
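A toy user-space model of what local_irq_disable()/local_irq_enable() achieve: while interrupts are off on the local CPU, an arriving interrupt stays pending and is delivered once they are re-enabled, so a critical section cannot be interrupted halfway. The sim_ names are hypothetical, the model remembers at most one pending interrupt, and it says nothing about other processors, which these calls do not affect.

```c
#include <assert.h>
#include <stdbool.h>

static bool sim_irqs_on = true;  /* local CPU interrupt flag */
static bool sim_pending;         /* one remembered pending interrupt */
static int sim_shared_counter;   /* data also touched by the handler */

static void sim_handler(void) { sim_shared_counter += 100; }

/* A device raises an interrupt on the local CPU. */
static void sim_raise_irq(void)
{
    if (sim_irqs_on)
        sim_handler();
    else
        sim_pending = true;      /* held until interrupts are re-enabled */
}

static void sim_local_irq_disable(void) { sim_irqs_on = false; }

static void sim_local_irq_enable(void)
{
    sim_irqs_on = true;
    if (sim_pending) {           /* deliver what arrived meanwhile */
        sim_pending = false;
        sim_handler();
    }
}
```

An interrupt raised between sim_local_irq_disable() and sim_local_irq_enable() only runs after the critical section is finished.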

13. Resources:
a. Nice tutorial on Programmable Interrupt Controller
b. Nice tutorial on interrupt handlers in the Linux kernel
c. Nice Q&A on StackOverflow about shared IRQs
d. Important note (Q&A) regarding concurrent interrupt handling for shared IRQ in Linux

The picture: Praia Mole (Mole beach), Florianópolis, State of Santa Catarina, Brazil.
