Emulator Debugging: Windows 3.0 and VGA

My emulator has incomplete implementations of the EGA/VGA cards. Since these cards share a lot of common functionality and register architecture, I developed them simultaneously. 

In some ways, the VGA is just an EGA card with a DAC, faster crystal, and color registers bolted onto it. So much so that the VGA's classic 640x480 16-color mode is nearly identical in function to the EGA's equivalent 640x350 16-color mode, just with a higher vertical resolution.

This allows you to do some silly things. In Windows 95, you can select the 'Standard VGA' driver, then turn your computer off, replace your VGA card with an EGA card, and power it back on - and you will see your desktop again, just with the bottom of the screen cut off, since you now only have 350 lines of vertical resolution. Now you have Windows 95 on an EGA card - something that was never officially supported. I remember doing this out of desperation as a teenager when our VGA card died. I was even able to get WinQuake running on an EGA card! The Windows GDI graphics functions would dither the 256-color graphics down to 16 colors. It was ugly, it ran at seconds per frame, but it ran.

However there is one subtle difference between the VGA and EGA that would end up haunting me for weeks...

Windows 3.0

Running Windows is of course one of the seminal exercises for a VGA card. I'm emulating an 8088, however, and the VGA driver for Windows 3.0 unfortunately required a 286 CPU (technically, a 186, but those were scarcely encountered outside of the Tandy 2000). This limits an XT machine to running Windows 3.0 in a miserable black-and-white, stretched, high-resolution CGA mode.

That is, until Montecarlo4tony on the VCF Forums got to hacking and removed the incompatible instructions from the Windows 3.0 VGA driver.  Quite impressive work! I've tried it on my real IBM 5150 with an 8-bit compatible Oak VGA video card, and it works brilliantly, if a bit sluggishly in Solitaire (which is simply a matter of the slow CPU).

So of course I was eager to use this driver in my emulator.  I had already tested out the 16-color modes using QBasic, utilities like EGAD, and games like EGA Trek. Given the commonalities between EGA and VGA, I was fairly confident that we'd have Windows running, no problem.

The Windows splash screen came up, so far so good:


But once I reached the Windows desktop, I was greeted with this:


Of course, the vertical stripes should not be there. And the background should be a neutral grey, not the lovely lilac shade it is (although I appreciated the color-coordination with my purple debug UI).

It would make sense if this were just a poor implementation of the 16-color mode; after all, one does not cobble together a VGA card from specifications and documentation without making a few errors in implementation to work through.

But compare the results to another 16-color GUI of the era, GeoWorks Ensemble:


It looks perfect. What then, is the video driver for Windows 3.0 doing differently?

Debugging


Thankfully, I had extended my video card debugger to encompass all the new VGA registers:

VGA Registers


As you can see, the VGA has a lot of registers. Many of these (the fields with brackets) are multiple bit-fields encoded into a single register port. Potentially, a bug in the handling of any of these could be the culprit.
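
To give a sense of what "multiple bit-fields encoded into a single register port" means, here's a rough sketch in Rust, using the Graphics Mode register (Graphics Controller index 5) as an example. The field layout follows the standard VGA documentation, but the struct and its names are purely illustrative - they aren't taken from my emulator's source.

```rust
// Illustrative decoding of the VGA Graphics Mode register (GC index 5).
// Several independent bit-fields share this one register port.
struct GraphicsModeRegister {
    write_mode: u8,         // bits 0-1: write mode 0-3
    read_mode: u8,          // bit 3: read mode 0 or 1
    host_odd_even: bool,    // bit 4: odd/even addressing
    shift_interleave: bool, // bit 5: CGA-compatible interleaved shift
    shift_256: bool,        // bit 6: 256-color shift mode
}

impl GraphicsModeRegister {
    fn from_byte(b: u8) -> Self {
        GraphicsModeRegister {
            write_mode: b & 0b0000_0011,
            read_mode: (b >> 3) & 1,
            host_odd_even: b & 0b0001_0000 != 0,
            shift_interleave: b & 0b0010_0000 != 0,
            shift_256: b & 0b0100_0000 != 0,
        }
    }
}
```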

If you don't know much about the VGA graphics hardware, you might simply be asking: aren't I just writing the wrong values to VRAM? That would indeed be fairly straightforward to debug. But it's a bit more complicated than that.

The VGA Graphics Pipeline

The VGA card can do much fancier tricks than its ancestor, the CGA.  On the CGA card, when you wrote to graphics memory, the byte you wrote was transferred verbatim, and the CGA card interpreted that data literally, depending on the graphics mode set.  Not so with the VGA.  When you write a byte of memory to the VGA card, it is tumbled through a pipeline of configurable operators before turning into a pixel result.

The byte can be rotated and optionally have a Boolean operation (AND, OR, or XOR) applied to it against existing data, with the result able to be masked as well.

Those of you with a bit of graphics background may appreciate the particular usefulness of XOR.  Applied twice in a row, it will erase what you just wrote. Combined with the VGA's masking abilities, it is the most useful operator. If you move the text-selection mouse cursor over some text on the computer you're using right now, you might very well see XOR in action.
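
To make that concrete, here's a minimal sketch of what one plane of that write pipeline could look like inside an emulator - rotate, then Boolean function against existing data, then bit mask. The names and structure are illustrative, not lifted from my actual implementation, but the XOR-twice-restores property is exactly the trick described above.

```rust
// Hypothetical sketch of the VGA's write mode 0 pipeline for a single plane.
#[derive(Clone, Copy)]
enum AluOp {
    None, // pass the data through unmodified
    And,
    Or,
    Xor,
}

struct PlanePipeline {
    rotate_count: u8, // Graphics Controller "Rotate Count" field
    alu_op: AluOp,    // Graphics Controller "Function Select" field
    bit_mask: u8,     // Graphics Controller Bit Mask register
    latch: u8,        // existing data latched by the last read of this plane
}

impl PlanePipeline {
    /// Run one CPU byte through rotate -> ALU -> bit mask for this plane.
    fn write(&self, cpu_byte: u8) -> u8 {
        // 1. Rotate the CPU byte right by the programmed count.
        let rotated = cpu_byte.rotate_right(self.rotate_count as u32);

        // 2. Apply the selected Boolean function against the latched data.
        let combined = match self.alu_op {
            AluOp::None => rotated,
            AluOp::And => rotated & self.latch,
            AluOp::Or => rotated | self.latch,
            AluOp::Xor => rotated ^ self.latch,
        };

        // 3. Bit mask: masked-off bits come from the latch, not the CPU.
        (combined & self.bit_mask) | (self.latch & !self.bit_mask)
    }
}

fn main() {
    let mut p = PlanePipeline {
        rotate_count: 0,
        alu_op: AluOp::Xor,
        bit_mask: 0xFF,
        latch: 0b1010_0000,
    };
    let once = p.write(0b0000_1111);
    p.latch = once; // pretend we re-read the plane, reloading the latch
    let twice = p.write(0b0000_1111);
    assert_eq!(twice, 0b1010_0000); // XOR applied twice restores the original data
    println!("{:08b} -> {:08b} -> {:08b}", 0b1010_0000u8, once, twice);
}
```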

I started out debugging systematically, by disabling parts of the graphics pipeline to see if I could isolate just what was drawing those horrendous yellow and red stripes.  It became clear that Windows wasn't using the other two Boolean operators, but it was using XOR. 

I set the VGA to return a white pixel when the XOR operator was used:


Interesting. There's a lilac color remaining where there shouldn't be, but we can see that Windows uses the XOR mode to draw the desktop and the window backgrounds. What remains looks more-or-less correct. So is XOR my culprit?

I stared at my XOR implementation for quite some time. I pulled it apart into component steps and verified each one. From everything I could tell, it was correct.

Trace Logging the VGA

I ended up adding trace log functionality to the VGA card - a file handle passed to the VGA device to which debug logs could be dumped, including memory and register read/write operations and various logic decisions internal to the card.
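
The mechanism is simple - roughly speaking, the device holds an optional writer and emits one line per interesting event. The sketch below is just an illustration of that idea, not the emulator's real API.

```rust
// Illustrative sketch of optional trace logging on the VGA device.
use std::fs::File;
use std::io::{BufWriter, Write};

struct VgaDevice {
    trace_writer: Option<BufWriter<File>>,
}

impl VgaDevice {
    fn trace(&mut self, msg: &str) {
        // If no trace file was supplied, logging is a no-op.
        if let Some(w) = self.trace_writer.as_mut() {
            let _ = writeln!(w, "{}", msg);
        }
    }

    fn mem_write(&mut self, addr: u32, byte: u8) {
        self.trace(&format!("MEM WRITE [{:05X}] <- {:02X}", addr, byte));
        // ... the actual planar write pipeline would run here ...
    }
}
```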

This file grew very large after running Windows for only a few seconds, but it wasn't so huge that I couldn't still load it in Notepad++.  I was able to identify when the splash screen occurred by finding the graphics writes for the particular stippled background effect it has. 

Windows 3.0 splash screen pattern

Now I could step through the splash screen, and find when the actual desktop started to be drawn.

Remember those VGA pipeline operators? They don't operate on the graphics data already at the destination; they operate on the contents of the VGA latches. The latches are four internal registers - one for each bit plane - that are set when a read operation is performed.

These latches also enable efficient video memory to memory copies. We're not talking hardware blitting here, but you can copy eight pixels at a time within video memory by loading the latches and writing somewhere else. Without bit planes and latches, we'd need four separate 'mov' instructions to accomplish the same result. 

Recall also the masking capabilities of the VGA. We may not want to perform an XOR on the entire span of 8 pixels loaded into the latches. A mask register allows us to select a subset of pixels to operate on - a very handy function indeed if you're, say, drawing or erasing text or complicated UI elements.
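
Here's a rough sketch of how the latches and the bit mask play together on planar memory - again an illustration, not the emulator's actual code. One CPU read refreshes all four latches; a latch-sourced write (in the spirit of the VGA's write mode 1) then moves 8 pixels in a single write, and a masked write only lets the unmasked pixels take new data, with the rest restored from the latches.

```rust
// Illustrative model of planar VRAM with latches and a bit mask.
struct PlanarVram {
    planes: [Vec<u8>; 4], // one byte per plane covers 8 pixels
    latches: [u8; 4],     // refreshed on every CPU read
    bit_mask: u8,         // Graphics Controller Bit Mask register
}

impl PlanarVram {
    /// A CPU read returns one plane's byte, but loads all four latches.
    fn cpu_read(&mut self, offset: usize, read_plane: usize) -> u8 {
        for p in 0..4 {
            self.latches[p] = self.planes[p][offset];
        }
        self.latches[read_plane]
    }

    /// Latch-sourced write: all four latched bytes land at the destination,
    /// so one read plus one write copies 8 pixels across 4 planes.
    fn copy_from_latches(&mut self, offset: usize) {
        for p in 0..4 {
            self.planes[p][offset] = self.latches[p];
        }
    }

    /// Masked write of CPU data: where a mask bit is 0, the latch (i.e. the
    /// previously read screen contents) is written back instead.
    fn masked_write(&mut self, offset: usize, cpu_data: u8) {
        for p in 0..4 {
            self.planes[p][offset] =
                (cpu_data & self.bit_mask) | (self.latches[p] & !self.bit_mask);
        }
    }
}

fn main() {
    let mut vram = PlanarVram {
        planes: [vec![0u8; 2], vec![0u8; 2], vec![0u8; 2], vec![0u8; 2]],
        latches: [0; 4],
        bit_mask: 0b1100_0000, // we only intend to touch 2 of the 8 pixels
    };
    vram.planes[0][0] = 0b0101_0101; // some existing pixels in plane 0

    // 8-pixel copy from offset 0 to offset 1: one read, one write.
    vram.cpu_read(0, 0);
    vram.copy_from_latches(1);
    assert_eq!(vram.planes[0][1], 0b0101_0101);

    // Masked draw at offset 0: only the two unmasked pixels take new data.
    vram.cpu_read(0, 0);
    vram.masked_write(0, 0xFF);
    assert_eq!(vram.planes[0][0], 0b1101_0101);
}
```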

A careful analysis of the trace logs produced by my VGA device seemed to indicate that the latches were being loaded correctly, but a bad value was being programmed into the VGA mask register.  

Consider the effect that might have - we end up drawing pixels where we didn't intend to. Garbage pixels. 

Red and yellow garbage pixels.

About that EGA/VGA compatibility...

I said before that the 16-color graphics modes for EGA and VGA are nearly identical, other than vertical resolution. This is true.  And the pixel-pipeline operations between the EGA and VGA are largely the same - the VGA adds a new write mode the EGA does not have, but I had already determined that Windows 3.0 wasn't using it.

But there's actually another subtle difference between the EGA and VGA cards. 

On the EGA, most of its vast array of registers are write-only.  This was extremely inconvenient for programmers, who found themselves unable to query the current state of the adapter, and had to either ensure that the state was established on every graphics operation or maintain in-memory mirrors of the EGA registers to keep track. Not very efficient in either case.

When IBM designed the VGA, they addressed this, making nearly the entire suite of VGA registers readable at last. 

Say you wanted to set the mask register by modifying its current value.  On the EGA, you would modify your internal variable representing what the mask register currently is, then write the new value - all while hoping that your internal state didn't get out of sync. On the VGA, you could do one better: you could read the VGA mask register, twiddle the necessary bits, and write it back.
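
In rough pseudocode (Rust standing in for the driver's assembly, with hypothetical port_in/port_out helpers in place of the CPU's IN/OUT instructions), the two strategies look something like this. Ports 3CEh/3CFh are the Graphics Controller index and data ports, and index 8 selects the Bit Mask register; the rest of the names are made up for illustration.

```rust
// Sketch of the two driver strategies, not real driver code.
const GC_INDEX: u16 = 0x3CE;
const GC_DATA: u16 = 0x3CF;
const GC_BIT_MASK: u8 = 0x08;

fn port_out(_port: u16, _value: u8) { /* stand-in for an OUT instruction */ }
fn port_in(_port: u16) -> u8 { 0 /* stand-in for an IN instruction */ }

/// EGA-style: the register is write-only, so keep a shadow copy in memory,
/// modify that, and write it out - hoping it never drifts out of sync.
struct EgaStyleDriver {
    bit_mask_shadow: u8,
}

impl EgaStyleDriver {
    fn merge_mask_bits(&mut self, bits: u8) {
        self.bit_mask_shadow |= bits;
        port_out(GC_INDEX, GC_BIT_MASK);
        port_out(GC_DATA, self.bit_mask_shadow);
    }
}

/// VGA-style: the register is readable, so read it, twiddle the bits,
/// and write it back - the read-modify-write pattern the Windows 3.0
/// VGA driver relies on.
fn vga_style_merge_mask_bits(bits: u8) {
    port_out(GC_INDEX, GC_BIT_MASK);
    let current = port_in(GC_DATA); // relies on the register being readable!
    port_out(GC_DATA, current | bits);
}
```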

Unless of course, the VGA mask register wasn't readable -

- because the emulator programmer forgot to implement the newly-readable registers on the VGA.

An Embarrassing Bug


Without properly handling reading of the mask register, the IO operation would return a null result, which the driver would then assume was the current mask value, modify it, and write it back - putting an invalid mask in place. Then, pixels that shouldn't have been drawn end up on the screen - in garish stripes.  When we allow the driver to actually read the mask register and modify it correctly, our stripes disappear, properly masked away.
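
In emulator terms, the whole bug boils down to something like the following. This is an illustrative sketch rather than my actual source - the real card tracks far more state - but it captures both the broken and fixed behavior of the port read handler.

```rust
// Sketch of the Graphics Controller's indexed register file and its
// port read/write handlers.
struct GraphicsController {
    index: u8,      // value last written to port 3CEh
    regs: [u8; 16], // the VGA defines 9; index 8 is the Bit Mask register
}

impl GraphicsController {
    fn port_read(&self, port: u16) -> u8 {
        match port {
            // Buggy behavior: treating 3CFh as write-only (as on the EGA)
            // and returning 0 meant the driver's read-modify-write
            // produced a bogus mask:
            //   0x3CF => 0,

            // Fixed: on the VGA, the currently indexed register is readable.
            0x3CF => self.regs[self.index as usize],
            _ => 0xFF, // open-bus style default for unhandled ports
        }
    }

    fn port_write(&mut self, port: u16, value: u8) {
        match port {
            0x3CE => self.index = value & 0x0F,
            0x3CF => self.regs[self.index as usize] = value,
            _ => {}
        }
    }
}
```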

Weeks of troubleshooting, for something so simple.  Why am I confessing this embarrassment to the world?  Maybe it will make someone else feel better about a difficult bug they're struggling with. Often the hardest bugs have the simplest explanations. It's also an important lesson: always check your assumptions.

Why didn't I consider this earlier? The main reason is that most 16-color software shouldn't be affected. After all, if you were going to write software for 16-color mode, you probably intended it to run on both EGA and VGA, for maximum compatibility with the varied PC ecosystem at the time. That means you couldn't rely on the registers being readable.  GeoWorks is clearly written like this - its VGA driver just utilizes the increased vertical resolution, and doesn't make any attempt to use fancy new VGA features like readable registers.

Also, recall my memory from the '90s about swapping out the VGA for an EGA in my home computer? That worked fine - no stripes to be seen - and that memory lingered in the back of my mind, telling me there shouldn't be a major difference between the EGA and VGA drivers in Windows. If Montecarlo4tony had made an 8088 version of the EGA Windows 3.0 driver as well, I probably would have tried it at some point, and seeing it work flawlessly may have given me the clue I needed.  But sadly, an 8088 EGA driver does not exist.

Ultimately, I was comparing apples to oranges.  The Windows 95 VGA driver apparently doesn't read and modify the VGA mask register, but the Windows 3.0 VGA driver does.  There are plenty of reasons why this might be the case - the code may not share any common heritage; perhaps it was even written from scratch by a different programmer entirely. There's fundamentally no reason to assume that they operate exactly the same way.

About That Purple...

We still have a purple background. This bug, at least, would not torment me for weeks. 

The EGA card has a palette register for each of the 16 colors it can generate at one time - these palette registers allow any of 64 color definitions to fill those 16 palette slots.

Of course we're all familiar with the VGA's 256-color capability. The VGA adds a second level of indirection - the 16 palette registers of the EGA become look-ups into the 256 color registers on the VGA.  The purple isn't a color at all - it's a color index.
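
A rough sketch of that lookup, with the attribute controller's color select bits and other details omitted for brevity - the names here are illustrative, not my emulator's:

```rust
// Illustrative sketch of the EGA vs. VGA color lookup.
struct VgaColors {
    palette: [u8; 16],        // attribute controller palette registers
    dac: [(u8, u8, u8); 256], // DAC color registers, 6 bits per channel
}

impl VgaColors {
    /// EGA-style interpretation: the palette register itself is the color.
    fn ega_color(&self, pixel: u8) -> u8 {
        self.palette[(pixel & 0x0F) as usize]
    }

    /// VGA: the palette register is only an index into the DAC's
    /// 256 color registers - using it directly as a color is what
    /// produced that purple background.
    fn vga_color(&self, pixel: u8) -> (u8, u8, u8) {
        let dac_index = self.palette[(pixel & 0x0F) as usize];
        let (r, g, b) = self.dac[dac_index as usize];
        // Scale 6-bit DAC values up to 8 bits per channel for display.
        (r << 2, g << 2, b << 2)
    }
}
```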

Properly resolving the correct color registers, we finally see Windows 3.0 in all its glory:




