Emulator Debugging: Halt During Boot

My emulator has a bug. 

I've encountered and fixed countless bugs over the course of development, but this one has haunted me since I was first able to boot the IBM 5160 BIOS, and still isn't squashed. I thought it might be interesting to blog my process of finally squashing this bug once and for all - as I start to write this blog entry, I haven't yet. So we'll see if I am successful!

The bug is that while using the IBM 5160 BIOS, if you hit a key during boot, the system will halt. Not good. Many error conditions in the BIOS POST process will produce beeps or error codes and carry on, but I suppose IBM considered certain early errors to be so severe that there was no point in continuing.

Background

Before any of the following will make sense, I'll need to explain a little bit about the two chips involved here, the PPI and the PIC.

The PIC

The PIC, or the Intel 8259A Programmable Interrupt Controller, manages interrupts on the IBM PC. It has 8 interrupt request lines, which are directly connected to hardware. Internal to the PIC are three 8-bit registers, the Interrupt Mask Register (IMR), the In-Service Register (ISR) and the Interrupt Request Register (IRR).  

The IRR latches requests arriving on the external IR lines. When the PIC is configured for edge-triggered operation, as it is on the PC, a rising edge of an IR line will set the corresponding bit in the IRR. Normally, this would trigger an interrupt request, unless the corresponding bit in the Interrupt Mask Register is set.  The IMR holds one mask bit per IR line; while a bit is set in the IMR, the matching bit in the IRR is ignored for the time being.  Once that bit in the IMR is cleared, the bit in the IRR will be recognized. In this way we can control which interrupts are serviced in a much more granular way than simply setting or clearing the Interrupt flag on the CPU.

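The In-Service Register (ISR) has the corresponding bit set when an interrupt is acknowledged by the CPU and its handler, the interrupt service routine, begins running. Note that "ISR" is also the usual abbreviation for an interrupt service routine; both meanings appear below.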
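To make the relationship between the three registers concrete, here is a minimal Rust sketch. It is not the emulator's actual code: the `Pic` struct, its fields and its methods are hypothetical names chosen for illustration, and it ignores priority nesting and what happens when a line drops before the request is acknowledged.

```rust
/// Minimal model of the three 8259A registers described above.
/// All names here are illustrative assumptions, not a real emulator's API.
#[derive(Default)]
struct Pic {
    irr: u8,      // Interrupt Request Register: latched requests
    imr: u8,      // Interrupt Mask Register: 1 = masked
    isr: u8,      // In-Service Register: interrupts being serviced
    ir_lines: u8, // current level of the eight IR input lines
}

impl Pic {
    /// Drive an IR line high or low. In edge-triggered mode only a
    /// low-to-high transition latches a request in the IRR.
    fn set_ir_line(&mut self, line: u8, high: bool) {
        let bit = 1u8 << line;
        let was_high = self.ir_lines & bit != 0;
        if high {
            self.ir_lines |= bit;
            if !was_high {
                self.irr |= bit;
            }
        } else {
            self.ir_lines &= !bit;
        }
    }

    /// Lowest-numbered request that is latched and not masked, if any.
    fn pending(&self) -> Option<u8> {
        let unmasked = self.irr & !self.imr;
        if unmasked == 0 {
            None
        } else {
            Some(unmasked.trailing_zeros() as u8)
        }
    }

    /// CPU acknowledges: move the request from the IRR to the ISR.
    fn acknowledge(&mut self) -> Option<u8> {
        let line = self.pending()?;
        self.irr &= !(1u8 << line);
        self.isr |= 1u8 << line;
        Some(line)
    }
}
```

With the IMR at 0xFF, a rising edge on line 1 sets a bit in the IRR but `pending()` stays `None`; clearing the IMR makes the same latched request visible without any new edge.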

The PPI

The PPI is the Intel 8255 Programmable Peripheral Interface. This chip has a number of capabilities, but on the IBM PC is used fairly simply - there are 3 8-bit ports on the chip we can read or write to, used to interface with certain hardware lines on the motherboard. It is used for reading motherboard DIP switches, controlling the timer and speaker, and most importantly in our case, reading data from the keyboard.

Digging In

The first thing to do, of course, is to look at the address of where we are halting.  In flat addressing, it's at location FE38E. Here's the relevant BIOS source:

;----- TEST THE IMR REGISTER

C21A:   MOV     DATA_AREA[@MFG_ERR_FLAG-DATA40],05H
                                        ; <><><><><><><><><><><><>
                                        ; <><><>CHECKPOINT 5<><><>
        MOV     AL,0                    ; SET IMR TO ZERO
        OUT     INTA01,AL
        IN      AL,INTA01               ; READ IMR
        OR      AL,AL                   ; IMR = 0?
        JNZ     D6                      ; GO TO ERR ROUTINE IF NOT 0
        MOV     AL,0FFH                 ; DISABLE DEVICE INTERRUPTS
        OUT     INTA01,AL               ; WRITE TO IMR
        IN      AL,INTA01               ; READ IMR
        ADD     AL,1                    ; ALL IMR BITS ON?
        JNZ     D6                      ; NO - GO TO ERR ROUTINE

...

D6:
        MOV     SI,OFFSET E0            ; DISPLAY 101 ERROR
        CALL    E_MSG
        CLI
; FE38E
        HLT                             ; HALT THE SYSTEM

What this routine is doing is testing the IMR of the PIC. It's a simple test: We write 0 to the register, and read the register back to see if it is still 0. If it isn't, we jump to the error handler at D6.

We enter this routine at FE35C, and by setting a breakpoint there we can quickly determine we jump to D6 from the first JNZ, so the IMR was not 0 when it was read back.

What else is interesting is that based on the comment and the CALL to E_MSG, there is supposed to be some sort of message printed to the screen, but nothing shows up. Peeking into video memory, we can see the characters '101' are on the screen, or should be. If the CPU halts in an unrecoverable way (i.e. the CPU interrupt flag is cleared), I stop the emulation and pop up an error. I thought this would be useful to the user, sort of a way to say Hey, this machine has stopped forever, you might want to restart.  However, since I halt immediately this stops all devices as well. The CGA card never gets a chance to actually scan across the screen and draw the characters! 

If I want to keep my halt popup, perhaps it should only trigger after a certain number of cycles have elapsed in the halted state, so that the screen can catch up before we stop emulating.  This behavior also might not be desired (In theory, you could recover via an NMI), so I'll probably make it optional via the config file.
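One way the delayed popup could work is sketched below. This is only an illustration of the idea, not the emulator's implementation: `HaltWatch`, its fields and the cycle counts are all hypothetical, and the `enabled` flag stands in for whatever the config file option ends up being.

```rust
/// Sketch of a delayed "halted forever" check. All names are hypothetical.
struct HaltWatch {
    enabled: bool,      // assumed to come from the config file
    grace_cycles: u64,  // how long devices keep running after the halt
    halted_cycles: u64, // cycles spent halted with interrupts disabled
}

impl HaltWatch {
    fn new(enabled: bool, grace_cycles: u64) -> Self {
        HaltWatch {
            enabled,
            grace_cycles,
            halted_cycles: 0,
        }
    }

    /// Call once per emulation step with the cycles that step consumed.
    /// Returns true once the popup should be shown.
    fn tick(&mut self, cpu_halted: bool, interrupt_flag: bool, cycles: u64) -> bool {
        if cpu_halted && !interrupt_flag {
            self.halted_cycles += cycles;
        } else {
            // The CPU left the halt state (for example via an NMI),
            // so start counting from scratch.
            self.halted_cycles = 0;
        }
        self.enabled && self.halted_cycles >= self.grace_cycles
    }
}
```

Because devices keep ticking while the counter runs, the video card gets its chance to draw the error code before emulation stops.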

Back to the original issue at hand; why is the IMR not 0 when read back? 

If we hit a key (or release one), we produce a rising edge on IR #1. Since the PIC is in edge-triggered mode, this rising edge sets the corresponding bit #1 in the Interrupt Request Register (IRR). If the same bit is set in the IMR, it is masked, and will not result in an interrupt being serviced. 

The IMR is 0xFF when we enter this procedure, that is, all interrupts are masked. If we hit a key during boot, we can see a pending interrupt, with bit #1 in the IRR being set.


When the test procedure runs, it writes 0 to the IMR register - this unmasks all interrupts, and so our keyboard interrupt fires off.

The Interrupt Vector Table (IVT) has been populated at this point, with most entries pointing to a temporary ISR procedure in the BIOS. Here it is:

D11	PROC	NEAR
	ASSUME	DS:DATA
	PUSH	DS
	CALL	DDS
	PUSH	AX                          ; SAVE REG AX CONTENTS
	MOV	AL,0BH                      ; READ IN-SERVICE REG
	OUT	INTA00,AL                   ; (FIND OUT WHAT LEVEL BEING SERVICED)
	NOP			        
	IN	AL,INTA00                   ; GET LEVEL
	MOV	AH,AL                       ; SAVE IT
	OR	AL,AH                       ; 00? (NO HARDWARE ISR ACTIVE)
	JNZ	HW_INT
	MOV	AH,0FFH
	JMP	SHORT SET_INTR_FLAG         ; SET FLAG TO FF IF NON-HDWARE
HW_INT:
	IN	AL,INTA01                   ; GET MASK VALUE
	OR	AL,AH                       ; MASK OFF LVL BEING SERVICED
	OUT	INTA01,AL
	MOV	AL,EOI
	OUT	INTA00,AL
SET_INTR_FLAG:
	MOV	@INTR_FLAG,AH               ; SET FLAG
	POP	AX                          ; RESTORE REG AX CONTENTS
	POP	DS
DUMMY_RETURN:                               ; NEED IRET FOR VECTOR TABLE
	IRET
D11	ENDP

This temporary ISR doesn't do much besides determine the interrupt level by reading the In-Service Register, then reading the IMR and writing back the IMR bitwise-OR'd with the ISR, effectively masking off the interrupt that just occurred.

This means the IMR is no longer 0 after an interrupt.

Here's the sequence of events:
  1. A key is hit during boot, causing a pending interrupt in the IRR, but is masked.
  2. The IMR test in the BIOS sets the IMR to 0.
  3. The keyboard interrupt is immediately serviced, beginning execution of the temporary ISR.
  4. The temporary ISR modifies the IMR to 0x02 and returns.
  5. The IMR test reads back the IMR register, which is now 0x02.
  6. The IMR test jumps to the failure condition since the IMR is not 0.
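
The six steps above can be replayed with a toy model. This is a sketch under the assumption made at this point in the investigation, namely that a pending interrupt is serviced immediately after the OUT that unmasks it. The `Pic` struct and function names are hypothetical; `temporary_isr` mirrors only what D11 does to the PIC.

```rust
/// Toy PIC with just enough state to replay the sequence of events.
/// Names are illustrative assumptions, not the emulator's real types.
struct Pic {
    irr: u8,
    imr: u8,
    isr: u8,
}

/// What the BIOS temporary handler (D11) does to the PIC.
fn temporary_isr(pic: &mut Pic) {
    let in_service = pic.isr;
    if in_service != 0 {
        pic.imr |= in_service; // mask off the level being serviced
        // Non-specific EOI: clear the lowest-numbered in-service bit.
        pic.isr = in_service & (in_service - 1);
    }
}

/// OUT to the IMR port, ASSUMING a newly unmasked request is
/// serviced at once, before the next instruction runs.
fn write_imr_then_service(pic: &mut Pic, value: u8) {
    pic.imr = value;
    let unmasked = pic.irr & !pic.imr;
    if unmasked != 0 {
        let bit = unmasked & unmasked.wrapping_neg(); // lowest set bit
        pic.irr &= !bit;
        pic.isr |= bit;
        temporary_isr(pic);
    }
}

/// Returns the value the IMR test reads back after writing 0.
fn imr_test_readback(key_pressed: bool) -> u8 {
    let mut pic = Pic { irr: 0, imr: 0xFF, isr: 0 };
    if key_pressed {
        pic.irr |= 0x02; // step 1: keyboard request latched but masked
    }
    write_imr_then_service(&mut pic, 0x00); // steps 2 through 4
    pic.imr // step 5: the IN instruction
}
```

Calling `imr_test_readback(true)` yields a non-zero mask, which is exactly the condition that sends the BIOS to D6.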

One question that comes immediately to mind - if the sole reason for this test is to just validate that we can write and read the same value to the IMR register, why have the Interrupt CPU flag enabled at all?

If IBM had preceded this routine with a CLI instruction, none of this would occur, since the INTR line produced by the PIC when the keyboard interrupt was no longer masked would be ignored until the Interrupt bit was set again, most likely with a STI.  In fact, if we look a little later down the source:

;----- INTERRUPTS ARE MASKED OFF.  CHECK THAT NO INTERRUPTS OCCUR

	MOV	DATA_AREA[@INTR_FLAG-DATA40],AL     ; CLEAR INTERRUPT FLAG
	STI				            ; ENABLE EXTERNAL INTERRUPTS
	SUB	CX,CX	                            ; WAIT 1 SEC FOR ANY INTRS THAT
D4:
	LOOP	D4		                    ; MIGHT OCCUR
D5:
	LOOP	D5
	CMP	DATA_AREA[@INTR_FLAG-DATA40],00H    ; ANY INTERRUPTS OCCUR?
	JZ	D7			            ; NO - GO TO NEXT TEST
D6:
	MOV	SI,OFFSET E0		            ; DISPLAY 101 ERROR
	CALL	E_MSG
	CLI
	HLT				            ; HALT THE SYSTEM

We can see that immediately after the IMR test, there's a test for "Hot Interrupts" - and it begins with an STI.  The implication of this was that IBM expected the interrupt flag to be cleared at this point. 

This bug does not occur with the IBM 5150 BIOS.  Let's take a look at that, for comparison:

;   TEST THE IMR REGISTER

    CLI                     ; DISABLE INTERRUPTS
    MOV AL,0
    OUT INTA01,AL
    IN  AL,INTA01           ; READ IMR
    OR  AL,AL               ; IMR = 0?
    JNZ D6                  ; GO TO ERR ROUTINE IF NOT 0
    MOV AL,0FFH             ; DISABLE DEVICE INTERRUPTS
    OUT INTA01,AL           ; WRITE TO IMR
    IN  AL,INTA01           ; READ IMR
    ADD AL,1                ; ALL IMR BITS ON?
    JNZ D6                  ; NO - GO TO ERR ROUTINE

The IMR test on the 5150 begins with a CLI.  The keyboard interrupt is ignored by the CPU, the temporary ISR does not fire, the IMR is not modified, and the test passes and POST continues.

Validating Hardware Behavior

Is this a goof on IBM's part? Did they just forget to clear the interrupt flag before the IMR test? Only one way to find out - try it on real hardware.  If we can mash keys immediately after turning on an IBM 5160 and the system prints '101' and halts, then we know this was IBM's bug, and our emulator is actually correct. I was able to find a volunteer with an IBM 5160, and we set about trying our best, immediately mashing keys after flipping the power switch.

IBM 5160 Keyboard Error

After a dozen or so attempts, no 101 error appeared, and no halt occurred. So my emulator is wrong, after all.

Back to the Drawing Board

At this point it might be useful to come up with a few hypotheses: 
  1. Maybe the interrupt flag should actually be cleared at this point, but isn't, for some reason.
  2. Maybe the keyboard should be disabled at this point, somehow.
  3. Maybe the IMR should not be 0xFF to start, allowing a previous keyboard interrupt to be serviced.
  4. My PIC logic could be wrong. Maybe a pending bit in the IRR shouldn't immediately trigger an interrupt when the corresponding mask bit in the IMR is cleared.

Let's tackle 1. first. So when is the interrupt flag actually set? We clear the flags when the CPU is reset on boot. Tracing backwards, we can see that an STI occurs during the 'KBD_RESET' procedure prior to the IMR test.

;------------------------------------------------------------------------
;	THIS PROCEDURE WILL SEND A SOFTWARE RESET TO THE KEYBOARD.	:
;	SCAN CODE 'AA' SHOULD BE RETURNED TO THE CPU.                   :
;------------------------------------------------------------------------
KBD_RESET	PROC	NEAR
	ASSUME	DS:ABS0
	MOV	AL,08H			; SET KBD CLK LINE LOW
	OUT	PORT_B,AL		; WRITE 8255 PORT B
	MOV	CX,10582		; HOLD KBD CLK LOW FOR 20 MS
G8:
	LOOP	G8			; LOOP FOR 20 MS
	MOV	AL,0C8H 		; SET CLK, ENABLE LINES HIGH
	OUT	PORT_B,AL
SP_TEST:				; ENTRY FOR MANUFACTURING TEST 2
	MOV	AL,48H			; SET KBD CLK HIGH, ENABLE LOW
	OUT	PORT_B,AL
	MOV	AL,0FDH 		; ENABLE KEYBOARD INTERRUPTS
	OUT	INTA01,AL		; WRITE 8255 IMR
	MOV	DATA_AREA[@INTR_FLAG-DATA40],0	; RESET INTERRUPT INDICATOR
	STI				; ENABLE INTERRUPTS
	SUB	CX,CX			; SETUP INTERRUPT TIMEOUT CNT
G9:
	TEST	DATA_AREA[@INTR_FLAG-DATA40],02H ; DID A KEYBOARD INTR OCCUR?
	JNZ	G10			; YES - READ SCAN CODE RETURNED
	LOOP	G9			; NO - LOOP TILL TIMEOUT
G10:
	IN	AL,PORT_A		; READ KEYBOARD SCAN CODE
	MOV	BL,AL			; SAVE SCAN CODE JUST READ
	MOV	AL,0C8H 		; CLEAR KEYBOARD
	OUT	PORT_B,AL
	RET				; RETURN TO CALLER
KBD_RESET	ENDP

This procedure resets the keyboard by holding the keyboard clock line low for approximately 20ms. The keyboard detects this as a reset request, and when the clock line is returned high, will respond with a single byte 0xAA. This scancode generates an interrupt like any regular keystroke would, and that interrupt is checked for.

We can see the STI right there in the middle; it sets the CPU interrupt flag and nothing clears it afterwards. The IMR is set to 0xFD, masking all but keyboard interrupts. As we saw before, when the temporary ISR fires, it will mask its own interrupt level, meaning we'll end up with an IMR of 0xFF. This effectively shoots down hypotheses #1 & #3 at once.

Besides the clock line, I can't help but notice there's a reference to an "enable line." Setting PPI port B to 0xC8 appears to set the enable line high, whereas 0x48 appears to set this enable line low. 0xC8 - 0x48 = 0x80, so the most significant bit of PPI port B, PB7, is what the BIOS refers to as the enable line.
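
The same bit arithmetic can be written out as a small Rust sketch. The constant names are made up for illustration, and the meaning attached to bit 6 is inferred only from the BIOS comments in the KBD_RESET listing (0x08 is described as clock low, 0x48 as clock high).

```rust
/// Bit masks for PPI port B, named after the BIOS comments.
/// The names are hypothetical; the bit meanings are inferred from the listing.
const PB6_KBD_CLOCK: u8 = 0x40; // set = keyboard clock line high
const PB7_KBD_ENABLE_LINE: u8 = 0x80; // the "enable line" in the BIOS comments

/// Bits that differ between two port B values.
fn differing_bits(a: u8, b: u8) -> u8 {
    a ^ b
}

fn pb7_is_high(port_b: u8) -> bool {
    port_b & PB7_KBD_ENABLE_LINE != 0
}

fn kbd_clock_is_high(port_b: u8) -> bool {
    port_b & PB6_KBD_CLOCK != 0
}
```

XOR is used instead of subtraction because it reports which bits differ regardless of which value is larger.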

minuszerodegrees.net has been my primary resource for the behavior of the PPI on the IBM 5150 and 5160.  You can see the page on the PPI registers here.  This is what it has to say about bit 7 of PPI port B:

HIGH: Clear KSR*  +  Clear IRQ1
LOW: Normal state of PB7
*KSR means the Keyboard Shift Register chip (a 74LS322).


Now, I do handle this bit in my emulator, but perhaps I am not handling it properly. Right now, when this bit is set, I clear the virtual KSR, and I clear the IR line #1.  But is that how this bit really operates? This behavior would be correct if this was an edge-triggered effect, but what if it's a level-triggered state? If this bit actually suppressed IR line #1 while set, that would effectively disable keyboard interrupts, and our IMR test could pass without issue.

Let's look to the IBM 5160 diagram.


This is a subsection of Sheet 9.  PB7 comes into the image in the lower left, through the inverter U89, and is tied to both the !CLR line of the 74LS322 shift register U27, and the 74LS74 flip-flop U70.  PB7 also connects to the !OE line of the shift register. Since the output (!QH) of the shift register is the data input of the flip-flop, and PB7 high disables that output, we can see that PB7 high will clear and suppress IR1 indefinitely.
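
The difference between the two readings of PB7 can be sketched in Rust. This models only the IR1 line and leaves the data path out entirely; `Pb7Model`, `KeyboardIrq` and the method names are hypothetical, and whether a scancode that arrived while PB7 was high raises IR1 later is not something the sketch tries to answer.

```rust
/// Two interpretations of PB7. Names are illustrative assumptions.
#[derive(Clone, Copy, PartialEq)]
enum Pb7Model {
    ClearOnce,     // writing PB7 high clears IR1 once (edge-style effect)
    HoldWhileHigh, // IR1 is held low for as long as PB7 is high (level)
}

struct KeyboardIrq {
    model: Pb7Model,
    pb7_high: bool,
    ir1: bool,
}

impl KeyboardIrq {
    fn new(model: Pb7Model) -> Self {
        KeyboardIrq {
            model,
            pb7_high: false,
            ir1: false,
        }
    }

    /// Write the PB7 bit of PPI port B.
    fn write_pb7(&mut self, high: bool) {
        self.pb7_high = high;
        if high {
            self.ir1 = false; // both models clear IR1 on the write
        }
    }

    /// A scancode arrives from the keyboard.
    fn scancode_received(&mut self) {
        let suppressed = self.model == Pb7Model::HoldWhileHigh && self.pb7_high;
        if !suppressed {
            self.ir1 = true;
        }
    }
}
```

Under `ClearOnce`, a key pressed after PB7 goes high still raises IR1; under `HoldWhileHigh`, which is what the schematic suggests, it does not until PB7 is written low again.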

If this is all correct, then at some point, PB7 has to be cleared so the keyboard can operate again. Let's see if we can find that in the BIOS source.

;------------------------------------------------
;	KEYBOARD TEST				:
;DESCRIPTION					:
;	RESET THE KEYBOARD AND CHECK THAT SCAN	:
;	CODE 'AA' IS RETURNED TO THE CPU.       :
;	CHECK FOR STUCK KEYS			:
;------------------------------------------------
TST12:
	MOV	AL,99H			; SET 8255 MODE A,C=IN B=OUT
	OUT	CMD_PORT,AL
	MOV	AL,DATA_AREA[@EQUIP_FLAG-DATA40]
	AND	AL,01			; TEST CHAMBER?
	JZ	F7			; BYPASS IF SO
	CMP	DATA_AREA[@MFG_TST-DATA40],1	; MANUFACTURING TEST MODE?
	JE	F7			; YES - SKIP KEYBOARD TEST
	CALL	KBD_RESET		; ISSUE RESET TO KEYBRD
	JCXZ	F6			; PRINT ERR MSG IF NO INTERRUPT
	MOV	AL,49H			; ENABLE KEYBOARD
	OUT	PORT_B,AL
	CMP	BL,0AAH 		; SCAN CODE AS EXPECTED?
	JNE	F6			; NO - DISPLAY ERROR MSG

;----- CHECK FOR STUCK KEYS

	MOV	AL,0C8H 		; CLR KBD, SET CLK LINE HIGH
	OUT	PORT_B,AL
	MOV	AL,48H			; ENABLE KBD,CLK IN NEXT BYTE

Easy enough - shortly after the next test, the PIT checkout, there's a keyboard test routine. Within, at the bottom of the listing, PB7 is cleared. This routine loops for a while reading the keyboard shift register and checking that it is 0. If a scancode appears, it will print the "301" keyboard error code.

Bug Fixed?

I changed the behavior of the PPI to set a 'kb_enabled' flag to true when PB7 is clear, and vice versa. Let's try booting the emulator and mashing keys now:


Drat. We halted again. Moreover, if we look at the state of PB7, it's clear now. This is odd.

If we just let the BIOS post without hitting a key, with a breakpoint at the IMR test, then PB7 is set as expected. What is clearing PB7?

Enabling instruction logging, we mash keys until we halt. Then it's a simple matter of searching backwards through the instruction log for 'out 0x61, al', where AL==08h.  We find it at F000:ECF5.

        IN      AL,PORT_B
        OR      AL,030H                 ; TOGGLE PARITY CHECK LATCHES
        OUT     PORT_B,AL
        NOP
        AND     AL,0CFH
        OUT     PORT_B,AL

Unfortunately, this is a dead end - it's a masking operation that wouldn't affect PB7.  We need to keep searching backwards in the instruction log until we find the last 'out 0x61, al' where the MSB of AL is 0.

There's a candidate at F000:E253. 

        CALL    KBD_RESET               ; SEE IF MFG. JUMPER IN
        CMP     BL,0EAH                 ; IS THIS THE EXTENDED KEYBOARD?
        JNE     KBX1                    ; IF NOT THEN LEAVE THE FLAG ALONE
        MOV     DATA_AREA[@KB_FLAG_3-DATA40],KBX  ; EXTENDED KEYBOARD
        JMP     SHORT E6                ; DONE WITH KEYBOARD HERE
KBX1:
        CMP     BL,0AAH                 ; KEYBOARD PRESENT?
        JE      E6
        CMP     BL,065H                 ; LOAD MFG. TEST REQUEST?
        JNE     D3B
        JMP     MFG_BOOT                ; GO TO BOOTSTRAP IF SO
D3B:
        OR      BL,BL                   ; MFG PLUG IN?
        JNZ     E6                      ; NO
        MOV     AL,38H
; FE253
        OUT     PORT_B,AL
        NOP
        NOP
        IN      AL,PORT_A
        AND     AL,0FFH                 ; WAS DATA LINE GROUNDED
        JNZ     E6
        INC     DATA_AREA[@MFG_TST-DATA40]      ; SET MANUFACTURING TEST FLAG

This is part of the manufacturing mode check.  KBD_RESET is called, which should return scancode 0xAA.  But it looks like we could have something else connected to the keyboard port which could return scancode 0x65, and if it does we would jump to MFG_BOOT.  Otherwise, if the scancode is zero, we fall through to FE253 and set PPI port B to 0x38, which clears PB7 and re-enables the keyboard, allowing an interrupt through - which will hang our IMR test.

But wait - how did we get here exactly? KBD_RESET should have returned AA, right?  

Embarrassingly, my initial fix was bad:

        if self.kb_enabled {
            log::trace!("PPI: Sending keyboard reset byte");
            self.kb_byte = 0xAA;
            pic.request_interrupt(1);
        }
        else {
            log::trace!("PPI: Keyboard reset byte suppressed by keyboard enable bit");
        }

Recall that PB7 suppresses the output of the keyboard shift register, but it does not affect the output of PPI Port A. We should still set 'kb_byte' to 0xAA, even if PB7 is high. But we shouldn't send an interrupt:

        log::trace!("PPI: Sending keyboard reset byte");
        self.kb_byte = 0xAA;

        if self.kb_enabled {
            pic.request_interrupt(1);
        }

Now, when reading PPI port A, we should either return kb_byte if PB7 is low, or 0 if it is high:

    fn read_u8(&mut self, port: u16, _delta: DeviceRunTimeUnit) -> u8 {
        //log::trace!("PPI Read from port: {:04X}", port);
        match port {
            PPI_PORT_A => {
                // Return dip switch block 1 or kb_byte depending on port mode
                // 5160 will always return kb_byte.
                // PPI PB7 suppresses keyboard shift register output.
                match self.port_a_mode {
                    PortAMode::SwitchBlock1 => self.dip_sw1,
                    PortAMode::KeyboardByte => {
                        if self.kb_enabled {
                            self.kb_byte
                        }
                        else {
                            0
                        }
                    }
                }
            },

And now.... 

It still halts.

Keyboard Input Behavior

Let's take a step back and consider what happens when a key is hit on the 5160.  There is no keyboard buffer to speak of.  A single byte of scancode is available via PPI Port A, and an interrupt is normally generated when a scancode is available. If the keyboard ISR cannot keep up with inbound scancodes, then a keystroke (or key release) could be lost.

When the keyboard is reset by holding the clock line low for 20ms, scancode 0xAA is produced by the keyboard when the clock line is raised high again.  Early in development, I added a slight delay between when the keyboard is reset, and when the scancode 0xAA is produced. If I recall, it was because I encountered an error condition when sending 0xAA immediately, but this could have been because I wasn't handling the interaction between the IRR and IMR registers properly at the time. In any case, that delay has stayed in my code since.

Fixing It For Good?

The KBD_RESET routine waits in a loop for the first keyboard interrupt, then reads whatever scancode is sitting in PPI port A. That means if we are mashing keys, the following can happen:
  1. KBD_RESET pulls the clock line low for 20ms, triggering the keyboard to reset.
  2. The emulated keyboard schedules the reset byte to be sent after a delay.
  3. The user mashes a key, setting the keyboard shift register to some non-zero scancode.
  4. KBD_RESET returns, returning the scancode in BL.
  5. At some point later, the keyboard shift register is set to 0xAA, but it's too late.

The easiest thing to do now would be to see if this reset delay is needed anymore. Let's try setting the delay time to 0, before taking it out entirely.
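
The race between the delayed reset byte and a mashed key can be illustrated with a small Rust sketch. It is a simplification under stated assumptions: the `Keyboard` type, the tick-based timing and `first_byte_seen` are all hypothetical, and the BIOS side is reduced to "take the first byte that shows up".

```rust
/// Hypothetical emulated keyboard with a configurable reset delay.
struct Keyboard {
    reset_delay: u32,             // ticks between reset and the 0xAA byte
    reset_countdown: Option<u32>, // Some(n) while a reset byte is scheduled
    latched: Option<u8>,          // byte currently presented to the PPI
}

impl Keyboard {
    fn new(reset_delay: u32) -> Self {
        Keyboard {
            reset_delay,
            reset_countdown: None,
            latched: None,
        }
    }

    /// Clock line released after the reset pulse.
    fn reset(&mut self) {
        if self.reset_delay == 0 {
            self.latched = Some(0xAA); // respond immediately
        } else {
            self.reset_countdown = Some(self.reset_delay);
        }
    }

    /// The user presses or releases a key.
    fn key_event(&mut self, scancode: u8) {
        self.latched = Some(scancode);
    }

    /// Advance time by one tick, delivering the reset byte when due.
    fn tick(&mut self) {
        if let Some(n) = self.reset_countdown {
            if n <= 1 {
                self.latched = Some(0xAA);
                self.reset_countdown = None;
            } else {
                self.reset_countdown = Some(n - 1);
            }
        }
    }
}

/// First byte the BIOS would pick up if a key arrives at `key_at_tick`.
fn first_byte_seen(reset_delay: u32, key_at_tick: u32, scancode: u8) -> u8 {
    let mut kb = Keyboard::new(reset_delay);
    kb.reset();
    let mut t = 0u32;
    loop {
        if let Some(byte) = kb.latched {
            return byte;
        }
        if t == key_at_tick {
            kb.key_event(scancode);
        }
        kb.tick();
        t += 1;
    }
}
```

With a delay of zero the reset byte is always first; with any non-zero delay a key that lands inside the window wins, and KBD_RESET hands back the wrong scancode.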

Time to boot up the emulator and start mashing keys:



I've never been so happy to see a keyboard error.

Not So Fast

This addendum to this article is new as of 6/21/2023. One of the new features I've added to my emulator is the ability to send the emulated machine the key sequence control-alt-delete, which performs a 'warm boot' on the IBM PC.  When the BIOS keyboard ISR detects all three keys are held down, it literally just does a far jump into the BIOS entrypoint at F000:E05B, not even bothering to properly end the interrupt with an EOI.

This presents two additional problems.  First, there's a bit stuck on in the In-Service Register for the keyboard, since no EOI was received and we are not in auto-EOI mode. This means no more keyboard interrupts can be serviced. That's not good.

What's worse is, we halt during warm boot again. This is the bug that will not die.

We're halting in the same place, the IMR register test. But this time, the situation is a little different. The bit that is on in the IMR after it is cleared is 01h; timer IRQ0. 

Consider that from a cold boot, the Programmable Interval Timer (PIT) is considered to be in an uninitialized state - that means, none of its timers are running and will not start running until configured. Therefore, we didn't have to worry about bit #0 in the IRR being set.  But in a warm boot scenario, timer 0 is likely running and merrily generating interrupts. A quick scan through the BIOS shows us that nothing touches, configures, or resets Timer 0 before the IMR test.

There's nothing to save us this time that will suppress IRQ0 like we found with the keyboard. Checking the schematic, the output of PIT Timer 0 connects directly to IR0 of the PIC with no intermediate logic. A bit in the IRR will be set when the timer hits terminal count, and nothing stops the timer from doing so. 

As far as our first problem goes, the ISR, the BIOS does initialize the PIC by sending Initialization Command Words. The Intel documentation for the 8259A PIC indicates that the IMR is set/reset on receiving an ICW1.  There's no mention of the ISR, however.  For the moment, I am prepared to assume that the ISR is also cleared with an ICW - perhaps something that requires validation against hardware, but for now it solves problem #1. 
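
A sketch of that reset behavior in Rust follows. The struct and method names are hypothetical. Recognizing ICW1 by bit 4 of a write to the command port follows the 8259A documentation; clearing the In-Service Register at the same time is the unverified assumption described above, and it is marked as such in the code.

```rust
/// Hypothetical PIC state; only the registers relevant here.
struct Pic {
    imr: u8,
    isr: u8,
}

/// A write to the command port with bit 4 set is an ICW1.
const ICW1_MARKER: u8 = 0x10;

impl Pic {
    /// Handle a write to the PIC command port (INTA00 in the BIOS source).
    fn write_command_port(&mut self, value: u8) {
        if value & ICW1_MARKER != 0 {
            // Start of an initialization sequence.
            self.imr = 0x00; // per the Intel documentation
            self.isr = 0x00; // ASSUMPTION: not stated in the documentation
        }
        // Other command words (OCW2, OCW3) are not modeled in this sketch.
    }
}
```

With this in place, a bit left stuck in the In-Service Register by the warm boot is gone as soon as the BIOS reinitializes the PIC.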

That leaves us with the IMR test.

This whole time we've been assuming that an interrupt will fire the moment the IMR is cleared due to the pending bit set in the IRR, so that the temporary BIOS ISR will execute immediately after the OUT instruction, modifying the IMR before the IN instruction can read it back.

But what if that assumption is wrong? We can write a test program. What I came up with is a bit long to post the entire listing, but here's the heart of it:

        mov     al, ~02h
        out     PIC_DATA_PORT, al
        in      al, PIC_DATA_PORT               ; Read the IMR back immediately.
        cmp     al, ~02h
        jne     .imr_changed
        jmp     .imr_not_changed

The test program instructs the user to press a key - setting a bit in the IRR, while the IMR is masked off. We have a temporary keyboard ISR installed that sets the IMR to FFh and prints "INT!" to the screen. We then clear the IMR bit for the keyboard, allowing the pending keyboard interrupt request to be serviced. We read the IMR value back immediately.

If the interrupt fires before IN, the IMR value read back will be FFh; otherwise the interrupt fires after the IN. 

Running it on an IBM 5150, we find that the IMR is unchanged, but the interrupt does fire. The interrupt just does not fire immediately after OUT.

Interrupt Processing on the 8088

Intel documentation states that the INTR line - signifying an interrupt request from the PIC - is sampled on the "last clock cycle of an instruction."  

Consider the OUT that we use to clear the IMR mask. The microcode for OUT imm8, al is as follows:


The important bit is the 'W, RNI' instruction in green. The last thing the instruction does is initiate an I/O write bus cycle. This effectively ends the instruction on T3 of that write.

There's some ambiguity over what Intel considers the end of an instruction. The RNI (Read Next Instruction) command may have to fetch the next opcode byte if the instruction queue is empty - are those cycles counted as part of the OUT or part of the next instruction?

Even if the PIC raises INTR instantaneously when the IMR is cleared (an assumption we can't really make), it's possible that the INTR line is sampled on the same cycle as the write itself, that final T3.  From our test program, we can assume that either INTR is delayed on clearing the IMR, or INTR is sampled too early to register - in any case, it appears we can clear the IMR with an OUT and any pending interrupts will not fire until the next instruction.

Fixed At Last (No, Really)

And that was it - the entire bug this entire time: a bad assumption that the interrupt must fire between OUT and IN.  By delaying INTR for one instruction, the IMR test can never fail.  Maybe this is why IBM removed the CLI - some smart engineer recognized this and saved an entire byte from the code listing. Who knows.
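
As a rough illustration of the idea, here is one way a one-instruction INTR delay could be modeled in Rust. The article does not show how the emulator actually implements it, so everything here is a hypothetical sketch: the `just_unmasked` flag, the method names, and the reduction of the temporary ISR to "mask the serviced level".

```rust
/// Toy PIC whose INTR output lags an IMR write by one instruction.
/// All names are assumptions made for this sketch.
struct Pic {
    irr: u8,
    imr: u8,
    just_unmasked: bool, // set by an IMR write, consumed by the next sample
}

impl Pic {
    fn write_imr(&mut self, value: u8) {
        self.imr = value;
        self.just_unmasked = true;
    }

    /// Called by the CPU at the end of every instruction.
    fn sample_intr(&mut self) -> bool {
        if self.just_unmasked {
            // The write has only just landed; INTR is not seen yet.
            self.just_unmasked = false;
            return false;
        }
        self.irr & !self.imr != 0
    }

    /// Stand-in for the BIOS temporary ISR: mask the serviced level.
    fn run_temporary_isr(&mut self) {
        let unmasked = self.irr & !self.imr;
        let bit = unmasked & unmasked.wrapping_neg(); // lowest set bit
        self.irr &= !bit;
        self.imr |= bit;
    }
}

/// Returns (value read back by the test, IMR after the interrupt ran).
fn imr_test(key_pending: bool) -> (u8, u8) {
    let mut pic = Pic {
        irr: if key_pending { 0x02 } else { 0x00 },
        imr: 0xFF,
        just_unmasked: false,
    };
    // OUT INTA01,AL with AL = 0
    pic.write_imr(0x00);
    if pic.sample_intr() {
        pic.run_temporary_isr();
    }
    // IN AL,INTA01
    let readback = pic.imr;
    if pic.sample_intr() {
        pic.run_temporary_isr();
    }
    (readback, pic.imr)
}
```

The test still reads back 0 even with a key pending, and the interrupt is serviced one instruction later, which matches what the test program showed on real hardware.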

I chased down a lot of bad leads, but it's not all for nothing - we discovered plenty of other issues in the process, and now have more accurate PPI and keyboard emulation to show for it. 

I recognize that this was a long, meandering article, so if you got this far, I sincerely thank you for reading!

