CPU devices

1. Overview

CPU device derivatives are used, unsurprisingly, to implement the emulation of CPUs, MCUs and SoCs. A CPU device is first of all a combination of device_execute_interface, device_memory_interface, device_state_interface and device_disasm_interface. Refer to the associated documentation where it exists.

Two more features are specific to CPU devices: the DRC and interruptibility support.

2. DRC

TODO.

3. Interruptibility

3.1 Definition

An interruptible CPU is defined as a core which is able to suspend the execution of one instruction at any time, exit execute_run, then at the next call of execute_run keep going from where it was. This includes being able to abort an issued memory access, quit execute_run, then upon the next call of execute_run reissue the exact same access.

3.2 Implementation requirements

Memory accesses must be done with read_interruptible or write_interruptible on a memory_access_specific or a memory_access_cache. Each access must be of bus width and bus-aligned.

After each access the core must test whether icount <= 0. This test should be done after icount has been decremented by the time taken by the access itself, to limit the number of tests. When icount reaches 0 or less, the instruction emulation needs to be suspended.

To know whether the access needs to be re-issued, access_to_be_redone() needs to be called. If it returns true then the time taken by the access needs to be credited back, since it hasn't yet happened, and the access will need to be re-issued. The call to access_to_be_redone() clears the reissue flag. If you need to check the flag without clearing it use access_to_be_redone_noclear().

The core needs to do enough bookkeeping to eventually restart the instruction execution either just before the access or just after the test, depending on whether the access needs to be reissued.

Finally, to advertise this support to the rest of the infrastructure, the core must override cpu_is_interruptible() to return true.
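
The requirements above can be sketched as a small, self-contained toy. This is not MAME code: toy_core, device_ready, do_access and the substate constants are hypothetical names made up for the illustration, and the bus is reduced to a single function that aborts the access until a flag is set. Only the pattern itself (decrement icount, test it, call access_to_be_redone(), credit the time back on a reissue) comes from the text.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a core and its bus. All names are hypothetical except
// the ones quoted from the text above.
struct toy_core {
    enum { ACCESS_CYCLES = 4, BEFORE_ACCESS = 1, AFTER_ACCESS = 2 };

    int icount = 0;             // cycles left in the current timeslice
    int substate = 0;           // 0 means "not suspended"
    bool redo_flag = false;     // set by the bus when an access is aborted
    bool device_ready = false;  // toy condition: the bus aborts until this is set
    std::uint8_t result = 0;

    // Toy bus read: aborts the access and eats the timeslice while the
    // device is not ready.
    std::uint8_t read_interruptible(std::uint16_t address) {
        if (!device_ready) {
            redo_flag = true;
            if (icount > 0)
                icount = 0;
            return 0;
        }
        return std::uint8_t(address & 0xff);  // dummy data: low byte of the address
    }

    // Returns the reissue flag and clears it.
    bool access_to_be_redone() {
        bool redo = redo_flag;
        redo_flag = false;
        return redo;
    }

    // One bus access. Returns true if execution can continue, false if the
    // instruction must be suspended (substate then says where to resume).
    bool do_access(std::uint16_t address) {
        result = read_interruptible(address);
        icount -= ACCESS_CYCLES;
        if (icount <= 0) {
            if (access_to_be_redone()) {
                icount += ACCESS_CYCLES;   // the access did not happen: credit its time back
                substate = BEFORE_ACCESS;  // resume by reissuing the access
            } else
                substate = AFTER_ACCESS;   // resume just after the test
            return false;
        }
        assert(!redo_flag);  // in this toy, an aborted access always ends the timeslice
        return true;
    }
};
```

With 10 cycles available and the device not ready, do_access returns false with the substate set to BEFORE_ACCESS and icount back at 0. Once the device is ready, the same call completes and leaves 6 cycles.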

3.3 Example implementation with generators

To ensure decent performance, the current implementations (h8, 6502 and 68000) use a Python generator to generate two versions of each instruction interpreter: one for normal emulation, and one for restarting the instruction.

The restarted version looks like this (for a CPU taking 4 cycles per access):

void device::execute_inst_restarted()
{
    switch(m_inst_substate) {
    case 0:
        [...]

        m_address = [...];
        m_mask = [...];
        [[fallthrough]];
    case 42:
        m_result = specific.read_interruptible(m_address, m_mask);
        m_icount -= 4;
        if(m_icount <= 0) {
            if(access_to_be_redone()) {
                m_icount += 4;
                m_inst_substate = 42;
            } else
                m_inst_substate = 43;
            return;
        }
        [[fallthrough]];
    case 43:
        [...] = m_result;
        [...]
    }
    m_inst_substate = 0;
    return;
}

The non-restarted version is the same thing with the switch and the final m_inst_substate clearing removed.

void device::execute_inst_non_restarted()
{
    [...]
    m_address = [...];
    m_mask = [...];
    m_result = specific.read_interruptible(m_address, m_mask);
    m_icount -= 4;
    if(m_icount <= 0) {
        if(access_to_be_redone()) {
            m_icount += 4;
            m_inst_substate = 42;
        } else
            m_inst_substate = 43;
        return;
    }
    [...] = m_result;
    [...]
    return;
}

The main loop then looks like this:

void device::execute_run()
{
    if(m_inst_substate) {
        // call the appropriate restarted instruction handler
    }
    while(m_icount > 0) {
        debugger_instruction_hook(m_pc);
        // call the appropriate non-restarted instruction handler
    }
}

The idea is thus that m_inst_substate indicates where in an instruction one is, but only when an interruption happens. It otherwise stays at 0 and is essentially never looked at. Having two versions of the interpreter makes it possible to remove the overhead of the switch and of the end-of-instruction substate clearing.

Using a generator-based method is not a requirement, but an alternative that does not have unacceptable performance implications has not yet been found.

3.4 Bus contention cpu_device interface

The main way to set up bus contention is through the memory maps. Lower-level access can be obtained through some methods on cpu_device, though.

bool cpu_device::access_before_time(u64 access_time, u64 current_time) noexcept;

The method access_before_time tries to run an access at a given time, expressed in cpu cycles. It takes the expected time for the access and the current time (total_cycles()). If there aren't enough cycles left to reach that time, the remaining cycles are eaten and the method returns true, meaning that the access must not be done and that the method should be called again later. Otherwise enough cycles are eaten to reach the access time and false is returned, meaning that the access can be done.
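
As an illustration of that contract, here is a toy model written from the description above, not taken from the cpu_device source. The struct toy_timing and its redo flag are hypothetical, and two details are assumptions of the sketch: an access time already in the past lets the access through unchanged, and having exactly enough cycles to reach the access time counts as not enough.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the documented behaviour; names and edge cases are assumptions.
struct toy_timing {
    int icount = 0;     // cycles left in the current timeslice
    bool redo = false;  // toy "access must be redone" flag

    // Returns true to abort the access, false to let it happen.
    bool access_before_time(std::uint64_t access_time, std::uint64_t current_time) {
        if (access_time <= current_time)
            return false;  // already at or past the target (sketch assumption)

        std::int64_t delta = std::int64_t(access_time - current_time);
        if (icount <= delta) {
            // Not enough cycles: eat what is left and ask for a retry.
            if (icount > 0)
                icount = 0;
            redo = true;
            return true;
        }

        // Enough cycles: eat exactly what is needed to reach the access time.
        icount -= int(delta);
        assert(icount > 0);
        return false;
    }
};
```

With 10 cycles left, waiting from time 100 until time 106 succeeds and leaves 4 cycles. Waiting from 106 until 120 then eats the remaining 4 and returns true.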

bool cpu_device::access_before_delay(u32 cycles, const void *tag) noexcept;

The method access_before_delay tries to run an access after a given delay. The tag is an opaque, non-nullptr value used to characterize the source of the delay, so that the delay is not applied multiple times. As with the previous method, cycles are eaten and true is returned to abort the access, false to execute it.
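
One way the tag can prevent a delay from being charged twice is sketched here. This is a toy model built from the description only: toy_delay, pending_tag and redo are hypothetical names, and the real bookkeeping may differ.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: remembers which delay source aborted the last access so that
// the retry of that access is not charged again.
struct toy_delay {
    int icount = 0;                     // cycles left in the current timeslice
    bool redo = false;                  // toy "access must be redone" flag
    const void *pending_tag = nullptr;  // source whose delay was already paid

    // Returns true to abort the access, false to let it happen.
    bool access_before_delay(std::uint32_t cycles, const void *tag) {
        assert(tag != nullptr);

        if (tag == pending_tag) {
            // Retry of an access whose delay was already eaten: let it through.
            pending_tag = nullptr;
            return false;
        }

        icount -= int(cycles);
        if (icount <= 0) {
            // The timeslice ended during the delay: abort, remember the source.
            pending_tag = tag;
            redo = true;
            return true;
        }

        pending_tag = nullptr;
        return false;
    }
};
```

Any object address can serve as a tag, for instance the address of the device imposing the delay. In this sketch a 5-cycle delay requested with 3 cycles left aborts the access, and the retry in the next timeslice passes without eating any further cycles.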

void cpu_device::access_after_delay(u32 cycles) noexcept;

The method access_after_delay adds a delay after an access is done. The access cannot be aborted at that point, hence there is no boolean return value.

void cpu_device::defer_access() noexcept;

The method defer_access tells the cpu that we need to wait for an external event. It marks the access as to be redone, and eats all the remaining cycles of the timeslice. The idea is then that the access will be retried after time advances up to the next global system synchronisation event (sync, timer timeout or set_input_line). This is the method to use when, for instance, waiting on a magic latch for data expected from SCSI transfers, which happen on timer timeouts.
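
The magic latch case can be sketched as follows. Everything here is a hypothetical toy: toy_cpu models only the two effects of defer_access described above, and toy_latch and timer_fired stand in for a device whose data arrives on a timer timeout.

```cpp
#include <cassert>
#include <cstdint>

// Toy cpu: models only the two documented effects of defer_access.
struct toy_cpu {
    int icount = 0;     // cycles left in the current timeslice
    bool redo = false;  // toy "access must be redone" flag

    void defer_access() {
        if (icount > 0)
            icount = 0;  // eat the rest of the timeslice
        redo = true;     // the access will have to be reissued
    }
};

// Toy one-byte latch filled by a timer callback (hypothetical device).
struct toy_latch {
    toy_cpu &cpu;
    bool has_data = false;
    std::uint8_t data = 0;

    explicit toy_latch(toy_cpu &c) : cpu(c) {}

    // Read handler: defers the access until the timer has delivered a byte.
    std::uint8_t read() {
        if (!has_data) {
            cpu.defer_access();
            return 0;  // dummy value, the core will redo the access
        }
        has_data = false;
        return data;
    }

    // Called on the timer timeout, which is also a synchronisation point.
    void timer_fired(std::uint8_t value) {
        assert(!has_data);  // the toy latch holds a single byte
        data = value;
        has_data = true;
    }
};
```

A read on the empty latch ends the timeslice and flags the access for redoing. After timer_fired, the reissued read returns the byte and consumes no extra cycles in this toy.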

void cpu_device::retry_access() noexcept;

The method retry_access tells the cpu that the access will need to be retried, and nothing else. This can easily lead to a livelock, so be careful. It is used, for instance, to simulate a wait line (on the z80, say) which is controlled through set_input_line. The idea is that the device asserting wait does the set_input_line and a retry_access. The cpu core, as long as the wait line is set, just eats cycles. Then, when the line is cleared, the core retries the access.
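
The wait line scenario can be sketched with a toy as well. This is not how any actual core is written: toy_wait_cpu, toy_slow_device and try_access are hypothetical, the wait line is a plain bool rather than something driven through set_input_line, and the way the core polls the line is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Toy cpu with a wait line; retry_access only marks the access, as described.
struct toy_wait_cpu {
    int icount = 0;          // cycles left in the current timeslice
    bool redo = false;       // toy "access must be redone" flag
    bool wait_line = false;  // stands in for a line set via set_input_line

    void retry_access() { redo = true; }  // eats no cycles by itself

    bool access_to_be_redone() {
        bool r = redo;
        redo = false;
        return r;
    }
};

// Toy device that holds the cpu on wait until finish() is called.
struct toy_slow_device {
    toy_wait_cpu &cpu;
    bool busy = true;
    int reads = 0;

    explicit toy_slow_device(toy_wait_cpu &c) : cpu(c) {}

    std::uint8_t read() {
        reads++;
        if (busy) {
            cpu.wait_line = true;  // assert wait...
            cpu.retry_access();    // ...and ask for the access to be retried
            return 0;
        }
        return 0x42;
    }

    void finish() {
        assert(busy);
        busy = false;
        cpu.wait_line = false;  // release wait
    }
};

// Core side: one attempt at a 4-cycle access. Returns true once it completed.
bool try_access(toy_wait_cpu &cpu, toy_slow_device &dev, std::uint8_t &out) {
    if (!cpu.wait_line) {
        out = dev.read();
        if (!cpu.access_to_be_redone()) {
            cpu.icount -= 4;
            return true;
        }
    }
    cpu.icount = 0;  // wait is asserted: just eat the cycles
    return false;
}
```

The livelock risk shows up if the device calls retry_access without anything that eventually makes the retry succeed or makes the core give up its cycles. In the toy, the wait line plays that role.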

3.5 Interaction with DRC

At this point, interruptibility and DRC are entirely incompatible. We do not have a method to quit the generated code before or after an access. It's theoretically possible but definitely non-trivial.