A Test Suite for the Intel 8088

In a previous article, I revealed one of the secrets to MartyPC's accuracy: hardware validation. To summarize, it is a method of using a microcontroller to run an instruction on a real 8088 CPU at the same time as the emulator executes the same instruction, and then comparing the resulting cycle states (and optionally registers) for consistency.

Once again, credit goes to phix and his VirtualXT emulator for pioneering this basic idea. 

The downside to this method is that it is quite slow, and it requires going out and buying a Raspberry Pi or an Arduino, ordering some vintage chips off eBay, then either breadboarding up a CPU or ordering a custom PCB and soldering it together, not to mention the effort of integrating client communication and test logic into your emulator. All this presents a barrier to entry that probably makes this method suitable only for an especially dedicated few.

Compounding things, it's not really practical to validate instructions that frequently. Every time you tweak something as fundamental as BIU logic, you need to validate your emulator all over again, and validating the entire ISA can take many hours.  

In that same blog post I bemoaned the lack of a JSON test suite for the 8088 CPU. I'm happy to announce that as of now, the 8088 no longer lacks a test suite! With the help of my friend Folkert van Heusden, we have taken three Arduino8088 hardware boards and spent the past week or so generating the first complete JSON CPU test suite for the 8088.

My two Arduino8088 boards busy generating tests

JSON CPU Tests

A bit of background if you're not familiar with the concept of a JSON CPU test suite. The idea was originally popularized by Tom Harte, as far as I can tell. JSON is a highly portable data format with readily available parsers in almost every language, saving emulator developers the trouble of writing a custom parser for a binary format. JSON isn't the most efficient storage format, so these JSON files can be a bit weighty - larger test sets are gzipped to keep the repository sizes reasonable.

Each JSON file typically corresponds to a single opcode, and contains an array of randomized tests that include the initial and final states of memory and CPU registers, and an array of data for each cycle spent executing that opcode.

Tom's CPU tests have been very well received in the emulation development community, giving a valuable and convenient way to quickly locate and squash emulation bugs that might otherwise take weeks of debugging to catch. Ever since I developed the Arduino8088, I have known that using it to generate a set of tests was theoretically possible if I put in the work.

A number of CPU test authors have collaborated to combine their efforts, hosting tests for various CPU architectures in the SingleStepTests organization on GitHub.

The 8088 Test Suite

The 8088 Test Suite is fairly comprehensive, exercising 10,000 tests per opcode, and per opcode extension where one applies, with only a handful of opcodes omitted due to practicality (WAIT, HLT) or unpredictability (mostly certain undefined opcode forms).

The test suite contains 324 opcode forms, over 3 million instruction tests in total, including nearly 90 million cycle states. Compressed, the collection weighs in at 677 MB. The uncompressed, pretty-printed JSON totals 9.68 GB.

That's a lot of test data!

Example Test

Here's an example test entry:

{
    "name": "add byte ds:[bx+si+C2h], al",
    "bytes": [0, 64, 194],
    "initial": {
        "regs": {
            "ax": 52773,
            "bx": 22214,
            "cx": 16054,
            "dx": 57938,
            "cs": 60492,
            "ss": 17184,
            "ds": 15619,
            "es": 60510,
            "sp": 56738,
            "bp": 13363,
            "si": 58400,
            "di": 31158,
            "ip": 16937,
            "flags": 62535
        },
        "ram": [
            [264920, 71],
            [984809, 0],
            [984810, 64],
            [984811, 194],
            [984812, 144],
            [984813, 144],
            [984814, 144],
            [984815, 144]
        ],
        "queue": []
    },
    "final": {
        "regs": {
            "ax": 52773,
            "bx": 22214,
            "cx": 16054,
            "dx": 57938,
            "cs": 60492,
            "ss": 17184,
            "ds": 15619,
            "es": 60510,
            "sp": 56738,
            "bp": 13363,
            "si": 58400,
            "di": 31158,
            "ip": 16940,
            "flags": 62470
        },
        "ram": [
            [264920, 108],
            [984809, 0],
            [984810, 64],
            [984811, 194],
            [984812, 144],
            [984813, 144],
            [984814, 144],
            [984815, 144]
        ],
        "queue": [144, 144, 144]
    },
    "cycles": [
        ["-", 984810, "CS", "R--", "---", 0, "CODE", "T2", "F", 0],
        ["-", 984810, "CS", "R--", "---", 64, "PASV", "T3", "-", 0],
        ["-", 984810, "CS", "---", "---", 0, "PASV", "T4", "-", 0],
        ["A", 984811, "--", "---", "---", 0, "CODE", "T1", "-", 0],
        ["-", 984811, "CS", "R--", "---", 0, "CODE", "T2", "S", 64],
        ["-", 984811, "CS", "R--", "---", 194, "PASV", "T3", "-", 0],
        ["-", 984811, "CS", "---", "---", 0, "PASV", "T4", "-", 0],
        ["A", 984812, "--", "---", "---", 0, "CODE", "T1", "-", 0],
        ["-", 984812, "CS", "R--", "---", 0, "CODE", "T2", "-", 0],
        ["-", 984812, "CS", "R--", "---", 144, "PASV", "T3", "-", 0],
        ["-", 984812, "CS", "---", "---", 0, "PASV", "T4", "S", 194],
        ["A", 984813, "--", "---", "---", 0, "CODE", "T1", "-", 0],
        ["-", 984813, "CS", "R--", "---", 0, "CODE", "T2", "-", 0],
        ["-", 984813, "CS", "R--", "---", 144, "PASV", "T3", "-", 0],
        ["-", 984813, "CS", "---", "---", 0, "PASV", "T4", "-", 0],
        ["-", 984813, "--", "---", "---", 0, "PASV", "Ti", "-", 0],
        ["-", 984813, "--", "---", "---", 0, "PASV", "Ti", "-", 0],
        ["A", 264920, "--", "---", "---", 0, "MEMR", "T1", "-", 0],
        ["-", 264920, "DS", "R--", "---", 0, "MEMR", "T2", "-", 0],
        ["-", 264920, "DS", "R--", "---", 71, "PASV", "T3", "-", 0],
        ["-", 264920, "DS", "---", "---", 0, "PASV", "T4", "-", 0],
        ["A", 984814, "--", "---", "---", 0, "CODE", "T1", "-", 0],
        ["-", 984814, "CS", "R--", "---", 0, "CODE", "T2", "-", 0],
        ["-", 984814, "CS", "R--", "---", 144, "PASV", "T3", "-", 0],
        ["-", 984814, "CS", "---", "---", 0, "PASV", "T4", "-", 0],
        ["A", 984815, "--", "---", "---", 0, "CODE", "T1", "-", 0],
        ["-", 984815, "CS", "R--", "---", 0, "CODE", "T2", "-", 0],
        ["-", 984815, "CS", "R--", "---", 144, "PASV", "T3", "-", 0],
        ["-", 984815, "CS", "---", "---", 0, "PASV", "T4", "-", 0],
        ["A", 264920, "--", "---", "---", 0, "MEMW", "T1", "-", 0],
        ["-", 264920, "DS", "-A-", "---", 0, "MEMW", "T2", "-", 0],
        ["-", 264920, "DS", "-AW", "---", 108, "PASV", "T3", "-", 0]
    ]
}

There's a full breakdown of this format in the corresponding README.md file, but basically, the 'initial' object contains the initial register and memory state before the instruction has run, and the 'final' object contains the corresponding post-instruction state. 
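As a sanity check on the example above, the addresses in its 'ram' arrays can be reproduced from the register values using ordinary 8088 segment arithmetic. The following is a minimal sketch for illustration only, not code from MartyPC; the function names are hypothetical.

```rust
/// Convert segment:offset into a 20-bit physical address:
/// segment * 16 + offset, wrapped at 1 MB.
fn physical_address(segment: u16, offset: u16) -> u32 {
    (((segment as u32) << 4) + offset as u32) & 0xF_FFFF
}

/// Effective address for the [bx+si+disp8] addressing mode.
/// The 8-bit displacement is sign-extended and the sum wraps at 16 bits.
fn ea_bx_si_disp8(bx: u16, si: u16, disp: u8) -> u16 {
    let disp = disp as i8 as u16; // sign-extend, e.g. C2h becomes FFC2h
    bx.wrapping_add(si).wrapping_add(disp)
}

/// Recompute (code address, operand address, ADD result) for the example
/// test, using the values from its "initial" state.
fn example_test_values() -> (u32, u32, u8) {
    let (cs, ip, ds) = (60492u16, 16937u16, 15619u16);
    let (bx, si, ax) = (22214u16, 58400u16, 52773u16);

    let code = physical_address(cs, ip);
    let operand = physical_address(ds, ea_bx_si_disp8(bx, si, 0xC2));
    let al = ax as u8; // low byte of AX
    let sum = 71u8.wrapping_add(al); // initial memory byte + AL

    (code, operand, sum)
}
```

For the example test this gives 984809 for the first instruction byte, 264920 for the memory operand, and 108 for the byte written back, which matches the 'ram' arrays in the initial and final states.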

To use this test, you set up your emulator, initialize or reset the CPU, set the registers according to the initial state, and write the bytes in the initial 'ram' array to memory.  Then you execute the instruction, and compare your emulator's registers and memory to the values in the 'final' state.  This can be very fast - MartyPC can execute 10,000 tests in a few seconds, making it practical to re-run the test suite in its entirety on major CPU logic changes. All the benefits of hardware validation, but without spending money and several days doing it yourself.

A Rust Implementation

Here's a quick overview of how I run these tests in my emulator, MartyPC.

First, we read our test file into a string, then deserialize it using an implementation of the Deserialize trait on a compatible test structure, utilizing the serde_json crate:

        file.read_to_string(&mut file_string).expect("Error reading in JSON file to string!");

        result = match serde_json::from_str(&file_string) {
            Ok(json_obj) => Some(json_obj),
            Err(e) if e.is_eof() => {
                println!("JSON file {:?} is empty. Creating new list.", test_path);
                Some(LinkedList::new())
            } 
            Err(e) => {
                eprintln!("Failed to read json from file: {:?}: {:?}", test_path, e);
                None
            }
        }
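The test structure itself isn't shown in this post. As a rough sketch, types along the following lines would match the JSON layout and the field names used in the snippets here (initial_state, final_state, regs, ram). Treat this as an assumption rather than MartyPC's actual definitions: the type names and the ram_is_valid helper are hypothetical, the 'cycles' array is left out, and the serde attributes appear only as comments so that the block compiles with the standard library alone.

```rust
use std::collections::LinkedList;

/// Register file as it appears under "regs" in the JSON.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct TestRegisters {
    pub ax: u16,
    pub bx: u16,
    pub cx: u16,
    pub dx: u16,
    pub cs: u16,
    pub ss: u16,
    pub ds: u16,
    pub es: u16,
    pub sp: u16,
    pub bp: u16,
    pub si: u16,
    pub di: u16,
    pub ip: u16,
    pub flags: u16,
}

/// One "initial" or "final" object.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct TestState {
    pub regs: TestRegisters,
    /// [address, value] pairs, kept wide so both can be range-checked later.
    pub ram: Vec<[u32; 2]>,
    pub queue: Vec<u8>,
}

/// One test entry (hypothetical layout; "cycles" omitted for brevity).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct CpuTest {
    pub name: String,
    pub bytes: Vec<u8>,
    // With serde, this would carry #[serde(rename = "initial")].
    pub initial_state: TestState,
    // `final` is a reserved word in Rust, so the field needs another name;
    // with serde, this would carry #[serde(rename = "final")].
    pub final_state: TestState,
}

impl TestState {
    /// True if every address fits in 20 bits and every value fits in a byte.
    pub fn ram_is_valid(&self) -> bool {
        self.ram.iter().all(|e| e[0] <= 0xF_FFFF && e[1] <= 0xFF)
    }
}

/// A whole test file: the list of tests for one opcode.
pub type TestFile = LinkedList<CpuTest>;
```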

We deserialize into a LinkedList to avoid vector reallocations. Then, we loop through each test in the list. During the loop we set up the initial register and memory state:

        // Set up CPU registers to initial state.
        println!("Setting up initial register state...");
        println!("{}",test.initial_state.regs);

        // Set reset vector to our test instruction ip.
        let cs = test.initial_state.regs.cs;
        let ip = test.initial_state.regs.ip;
        cpu.set_reset_vector(CpuAddress::Segmented(cs, ip));
        cpu.reset();

        cpu.set_register16(Register16::AX, test.initial_state.regs.ax);
        cpu.set_register16(Register16::CX, test.initial_state.regs.cx);
        cpu.set_register16(Register16::DX, test.initial_state.regs.dx);
        cpu.set_register16(Register16::BX, test.initial_state.regs.bx);
        cpu.set_register16(Register16::SP, test.initial_state.regs.sp);
        cpu.set_register16(Register16::BP, test.initial_state.regs.bp);
        cpu.set_register16(Register16::SI, test.initial_state.regs.si);
        cpu.set_register16(Register16::DI, test.initial_state.regs.di);
        cpu.set_register16(Register16::ES, test.initial_state.regs.es);
        cpu.set_register16(Register16::CS, test.initial_state.regs.cs);
        cpu.set_register16(Register16::SS, test.initial_state.regs.ss);
        cpu.set_register16(Register16::DS, test.initial_state.regs.ds);
        cpu.set_register16(Register16::IP, test.initial_state.regs.ip);
        cpu.set_flags(test.initial_state.regs.flags);

        // Set up memory to initial state.
        println!("Setting up initial memory state. {} memory states provided.", test.initial_state.ram.len());
        for mem_entry in &test.initial_state.ram {
            // Validate that mem_entry[1] fits in u8.
            let byte: u8 = mem_entry[1]
                .try_into()
                .unwrap_or_else(|_| panic!("Invalid memory byte value: {:?}", mem_entry[1]));
            cpu.bus_mut().write_u8(mem_entry[0] as usize, byte, 0).expect("Failed to write memory");
        }

Then we can run the instruction itself:

        // We loop here to handle REP string instructions, which are broken up into 1 effective instruction
        // execution per iteration. The 8088 makes no such distinction.
        loop {
            match cpu.step(false) {
                Ok((step_result, cycles)) => {
                    println!("Instruction reported result {:?}, {} cycles", step_result, cycles);

                    if rep && cpu.in_rep() {
                        continue
                    }
                    break;
                },
                Err(err) => {
                    eprintln!("CPU Error: {}\n", err);
                    cpu.trace_flush();
                    panic!("CPU Error: {}\n", err);
                } 
            }
        }

        // CPU is done with execution. Check final state.
        println!("CPU completed execution.");

        // Get cycle states and registers from CPU.
        let mut cpu_cycles = cpu.get_cycle_states().clone();
        let cpu_regs = cpu.get_vregisters();


The CPU collects cycle states when it is run with validation enabled; we retrieve them with get_cycle_states() so that we can compare them with the test's cycle states, if desired. We also read the CPU registers post-instruction. Comparing the registers is fairly trivial, although you may want to mask flag values to remove undefined flags from the equation (mask values for this purpose are provided in the metadata file 8088.json).
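Masking the flags before comparing them might look something like the sketch below. The helper names are hypothetical, and the example mask is purely illustrative (it ignores the auxiliary carry flag); the real per-opcode mask values should be taken from 8088.json rather than from this snippet.

```rust
// Bit positions of the 8086/8088 status flags within the flags register.
pub const FLAG_CARRY: u16 = 0x0001;
pub const FLAG_PARITY: u16 = 0x0004;
pub const FLAG_AUX_CARRY: u16 = 0x0010;
pub const FLAG_ZERO: u16 = 0x0040;
pub const FLAG_SIGN: u16 = 0x0080;
pub const FLAG_OVERFLOW: u16 = 0x0800;

/// Illustrative mask only, NOT a value from 8088.json:
/// compare every bit except the auxiliary carry flag.
pub const EXAMPLE_MASK_IGNORE_AF: u16 = !FLAG_AUX_CARRY;

/// Bits that differ between the two flag words, ignoring masked-out bits.
pub fn flag_diff(expected: u16, actual: u16, mask: u16) -> u16 {
    (expected ^ actual) & mask
}

/// True if the flag words agree on every bit selected by `mask`.
pub fn flags_match(expected: u16, actual: u16, mask: u16) -> bool {
    flag_diff(expected, actual, mask) == 0
}
```

Returning the differing bits from flag_diff, rather than just a pass/fail result, makes it easy to report which flag an emulator got wrong.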

Then, we just need to validate the post-memory state:

        // Validate final memory state.
        for mem_entry in &test.final_state.ram {
            
            // Validate that mem_entry[0] < 1MB
            if mem_entry[0] > 0xFFFFF {
                panic!("Test {}: Invalid memory address value: {:?}", n, mem_entry[0]);
            }

            let addr: usize = mem_entry[0] as usize;

            // Validate that mem_entry[1] fits in u8.
            let byte: u8 = mem_entry[1]
                .try_into()
                .unwrap_or_else(|_| panic!("Test {}: Invalid memory byte value: {:?}", n, mem_entry[1]));
            
            let mem_byte = cpu.bus().peek_u8(addr).expect("Failed to read memory!");

            if byte != mem_byte {
                eprintln!("Test {}: Memory validation error. Address: {:05X} Test value: {:02X} Actual value: {:02X}", n, addr, byte, mem_byte);
                results.pass = false;
            }
        }

That's all you need to validate the functional aspects of a CPU instruction - if you're not interested in emulating a cycle-accurate 8088 CPU, you can stop there.  But if you want that extra accuracy, you can parse the 'cycles' array and get a cycle-by-cycle readout of the CPU's status and bus lines, comparing your emulator to how the real thing performed. Did your reads and writes occur on the same cycle as hardware? Is your prefetch up to snuff? Now you can know for sure.

All of these values, save the t-states and queue read byte, are read directly off of the physical CPU. The t-states and queue read byte are provided by MartyPC running in tandem for your convenience.
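Here is one way a cycle comparison could be sketched. This is an assumption-laden illustration, not MartyPC's implementation: the struct, its field names and the helper functions are hypothetical labels based on the columns visible in the example test, and the README remains the authority on what each column means. The comparison deliberately skips the two emulator-supplied columns (T-state and queue byte); you may well want to compare those too, or loosen the comparison elsewhere.

```rust
/// One entry of the "cycles" array, in JSON column order.
#[derive(Debug, Clone, PartialEq)]
pub struct CycleState {
    pub ale: bool,          // column 0: "A" or "-"
    pub address: u32,       // column 1
    pub segment: String,    // column 2, e.g. "CS", "DS", "--"
    pub mem_status: String, // column 3, e.g. "R--"
    pub io_status: String,  // column 4
    pub data_bus: u8,       // column 5
    pub bus_status: String, // column 6, e.g. "CODE", "MEMR", "PASV"
    pub t_state: String,    // column 7, supplied by the emulator
    pub queue_op: String,   // column 8, e.g. "F", "S", "-"
    pub queue_byte: u8,     // column 9, supplied by the emulator
}

/// Build a CycleState from values in the same order as the JSON columns.
pub fn cycle(
    ale: &str, address: u32, segment: &str, mem_status: &str, io_status: &str,
    data_bus: u8, bus_status: &str, t_state: &str, queue_op: &str, queue_byte: u8,
) -> CycleState {
    CycleState {
        ale: ale == "A",
        address,
        segment: segment.to_string(),
        mem_status: mem_status.to_string(),
        io_status: io_status.to_string(),
        data_bus,
        bus_status: bus_status.to_string(),
        t_state: t_state.to_string(),
        queue_op: queue_op.to_string(),
        queue_byte,
    }
}

/// Compare only the columns read from the physical CPU.
pub fn hardware_fields_equal(a: &CycleState, b: &CycleState) -> bool {
    a.ale == b.ale
        && a.address == b.address
        && a.segment == b.segment
        && a.mem_status == b.mem_status
        && a.io_status == b.io_status
        && a.data_bus == b.data_bus
        && a.bus_status == b.bus_status
        && a.queue_op == b.queue_op
}

/// Index of the first cycle that differs, or None if the traces agree.
/// A length mismatch is reported at the end of the shorter trace.
pub fn first_cycle_mismatch(expected: &[CycleState], actual: &[CycleState]) -> Option<usize> {
    if let Some(i) = expected
        .iter()
        .zip(actual)
        .position(|(e, a)| !hardware_fields_equal(e, a))
    {
        return Some(i);
    }
    if expected.len() != actual.len() {
        Some(expected.len().min(actual.len()))
    } else {
        None
    }
}
```

Reporting the index of the first divergent cycle tends to be more useful than a simple pass/fail, since everything after the first mismatch is usually a knock-on effect.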

Since these tests are hardware generated, we have pretty strong confidence that they are accurate, although there are possible differences between 8088 steppings.  The tests were generated using Harris CD80C88 CPUs. 

The individual tests are double-validated - before a test can be written to disk, MartyPC and the Arduino-controlled 8088 must agree on a register, memory and cycle-by-cycle basis. This is extra assurance that the hardware interface isn't producing erroneous results - serial transfer errors happen! Once a complete test file is produced, MartyPC then runs a separate validation check on the entire file before it is error-checked, formatted, compressed and committed to GitHub.

A New Era of 8088 Emulation

The 8088 is experiencing an emulation renaissance - with the work of reenigne paving the way, the 8088 has been giving up its secrets at a record pace. We have the microcode disassembly. We now understand the prefetch algorithm and BIU logic. And enterprising souls have been diving deep into analysis of the silicon. Of course I am proud of MartyPC, but new emulators are appearing that are not just cycle accurate - they actually run the microcode itself. Work is proceeding apace to integrate new findings into the 8088 implementations of 86Box and UniPCemu.

If you are thinking of making an 8088 emulator today, making a cycle-accurate one is no longer a ridiculous proposition. It's now a fairly straightforward process. You now have a test set that replaces trial-and-error cycle tweaks and lengthy, error-prone post-change test processes such as running the entire 8088MPH demo and hoping it all still works... (we've all been there)

It is my sincere hope that emulator developers find this test suite useful and that it encourages more emulator developers to take up the challenge of cycle-accurate PC emulation.

The Future - 8088 Test Suite V2

V1 of the 8088 CPU Test Suite is not as comprehensive as it could be, in theory. Each test is run from a blank slate - a freshly reset CPU. This has the advantage of starting the instruction with an empty prefetch queue and a known bus state, but it also means it is possible to validate every instruction in the set and still have accuracy issues handling the various bus and queue states encountered during the natural instruction flow of an 8088 CPU deep in program execution.

Instructions might begin with fetch delay cycles active, they might be fully or partially prefetched. The trap flag or interrupt flag might interrupt instruction flow.  These are all things that are probably worth modelling and testing - but complicate the generation of tests.  

To set up and read out the register state for a single instruction, we have to disturb the state of the CPU by running short programs to capture that information. That raises a question of how you capture the CPU state while preserving the register and prefetch queue contents during an instruction stream. An emulator could assist with this: a well-validated emulator could produce the register state while the CPU cycle states are captured from hardware. Hybrid test generation, so to speak.

I have some ideas for producing a set that will fully exercise your BIU and prefetch logic through a variety of instruction transitions, and integrate traps and interrupt handling.  But that is work for another day.

Give me the tests, already!

These tests are now available alongside many others in the SingleStepTests organization on GitHub. You can find the 8088 tests here.

Happy Testing!
