Software 3D Rendering in MAME

Background

Starting in the late 1980s, many arcade games began incorporating hardware-rendered 3D graphics into their video. These 3D graphics are typically rendered from low-level primitives into a frame buffer (usually double- or triple-buffered), then perhaps combined with traditional tilemaps or sprites, before being presented to the player.

When it comes to emulating 3D games, there are two general approaches. The first approach is to leverage modern 3D hardware by mapping the low-level primitives onto modern equivalents. For a cross-platform emulator like MAME, this requires having an API that is flexible enough to describe the primitives and all their associated behaviors with high accuracy. It also requires the emulator to be able to read back from the rendered frame buffer (since many games do this) and combine it with other elements, in a way that is properly synchronized with background rendering.

The alternative approach is to render the low-level primitives directly in software. This can reproduce essentially any behavior exhibited by the original hardware, but at the cost of speed. In MAME, since all emulation happens on one thread, this is particularly painful. However, just as with the 3D hardware approach, a software-based renderer could in theory hand the work off to other threads, as long as mechanisms are present to synchronize when necessary, for example, when reading from or writing to the frame buffer directly.

For the time being, MAME has opted for the second approach, leveraging a templated helper class called poly_manager to handle common situations.

Concepts

At its core, poly_manager is a mechanism to support multi-threaded rendering of low-level 3D primitives. Callers provide poly_manager with a set of vertices for a primitive plus a render callback. poly_manager breaks the primitive into clipped scanline extents and distributes the work among a pool of worker threads. The render callback is then called on the worker thread for each extent, where game-specific logic can do whatever needs to happen to render the data.

One key responsibility that poly_manager takes care of is ensuring order. Given a pool of threads and a number of work items to complete, it is important that—at least within a given scanline—all work is performed serially in order. The basic approach is to assign each extent to a bucket based on the Y coordinate. poly_manager then ensures that only one worker thread at a time is responsible for processing work in a given bucket.
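
The bucketing idea can be pictured with a small standalone sketch. It is not taken from the MAME source: work_item, SCANLINES_PER_BUCKET, and group_by_bucket are hypothetical names, and the bucket size is an arbitrary assumption. It only shows that grouping by Y while preserving submission order keeps the work for any one scanline serialized.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical work item: one extent on scanline y, tagged with the order
// in which its primitive was submitted.
struct work_item
{
    int y;
    int submit_order;
};

// Assumption for this sketch: each bucket covers a fixed number of
// scanlines. The bucket size poly_manager really uses is not stated here.
constexpr int SCANLINES_PER_BUCKET = 8;

inline std::size_t bucket_for(int y)
{
    return static_cast<std::size_t>(y / SCANLINES_PER_BUCKET);
}

// Group items into buckets, preserving submission order inside each bucket.
// A single worker at a time would then drain a bucket from front to back.
std::vector<std::vector<work_item>> group_by_bucket(std::vector<work_item> const &items, int height)
{
    std::vector<std::vector<work_item>> buckets((height + SCANLINES_PER_BUCKET - 1) / SCANLINES_PER_BUCKET);
    for (work_item const &item : items)
        if (item.y >= 0 && item.y < height)
            buckets[bucket_for(item.y)].push_back(item);
    return buckets;
}
```

Because every extent for a given scanline lands in the same bucket, two workers can never race on the same row, while different buckets remain free to run in parallel.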

Vertices in poly_manager consist of simple 2D X and Y coordinates, plus zero or more additional iterated parameters. These iterated parameters can be anything: intensity values for lighting; RGB(A) colors for Gouraud shading; normalized U, V coordinates for texture mapping; 1/Z values for Z buffering; etc. Iterated parameters, regardless of what they represent, are interpolated linearly across the primitive in screen space and provided as part of the extent to the render callback.

ObjectType

When creating a poly_manager class, you must provide it a special type that you define, known as ObjectType.

Because rendering happens asynchronously on worker threads, the idea is that the ObjectType class will hold a snapshot of all the relevant data needed for rendering. This allows the main thread to proceed—potentially modifying some of the relevant state—while rendering happens elsewhere.

In theory, we could allocate a new ObjectType instance for each primitive rendered; however, that would be rather inefficient. It is quite common to set up the rendering state and then render several primitives using the same state.

For this reason, poly_manager maintains an internal array of ObjectType objects and keeps a copy of the last ObjectType used. Before submitting a new primitive, callers can check whether the rendering state has changed. If it has, they can ask poly_manager to allocate a new ObjectType instance and fill it in. When the primitive is submitted for rendering, the most recently allocated ObjectType instance is implicitly captured and provided to the render callbacks.
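
The reuse pattern can be sketched without any MAME headers. Everything below is a hypothetical stand-in: render_state plays the role of an ObjectType, and snapshot_pool imitates only the "allocate a new snapshot when the state changed" decision, not the real allocator.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical per-primitive snapshot; the fields are illustrative only.
struct render_state
{
    std::uint32_t texture_base = 0;
    std::uint8_t blend_mode = 0;

    bool operator==(render_state const &rhs) const
    {
        return texture_base == rhs.texture_base && blend_mode == rhs.blend_mode;
    }
};

// Stand-in for the snapshot storage. A std::deque is used because growing
// it at the end never moves existing elements, so earlier snapshots stay
// valid for primitives that already captured them.
class snapshot_pool
{
public:
    // Reuse the last snapshot if nothing changed; otherwise store a new one.
    render_state const &snapshot_for(render_state const &current)
    {
        if (m_items.empty() || !(m_items.back() == current))
            m_items.push_back(current);
        return m_items.back();
    }

    std::size_t allocated() const { return m_items.size(); }

private:
    std::deque<render_state> m_items;
};
```

Several primitives submitted with identical state then share one snapshot, and changing the live state afterwards does not disturb snapshots already handed out.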

For more complex scenarios, where data might change even more infrequently, there is a poly_array template, which can be used to manage data in a similar way. In fact, internally poly_manager uses the poly_array class to manage its ObjectType allocations. More information on the poly_array class is provided later.

Primitives

poly_manager supports several different types of primitives:

- The most commonly-used primitive in poly_manager is the triangle, which has the nice property that iterated parameters have constant deltas across the full surface. Arbitrary-length triangle fans and triangle strips are also supported.
- In addition to triangles, poly_manager also supports polygons with an arbitrary number of vertices. The list of vertices is expected to be in either clockwise or anticlockwise order. poly_manager will walk the edges to compute deltas across each extent.
- As a special case, poly_manager supports a tile primitive, which is a simple quad defined by two vertices, a top-left vertex and a bottom-right vertex. Like triangles, tiles have constant iterated parameter deltas across their surface.
- Finally, poly_manager supports a fully custom mechanism where the caller provides a list of extents that are more or less fed directly to the worker threads. This is useful if emulating a system that has unusual primitives or requires highly specific behaviors for its edges.
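
The "constant deltas" property of triangles follows from the fact that three vertices define a plane for each iterated parameter. The sketch below uses the standard plane-gradient formula to compute dp/dx and dp/dy once per triangle; it is a textbook derivation, not a copy of poly_manager's code, and vtx, gradient, and triangle_gradient are hypothetical names.

```cpp
// One vertex with a single iterated parameter p.
struct vtx
{
    float x, y, p;
};

struct gradient
{
    float dpdx, dpdy;
};

// Solve p(x, y) = a.p + dpdx * (x - a.x) + dpdy * (y - a.y) so that the
// plane passes through all three vertices. Returns false for a degenerate
// (zero-area) triangle, where no unique plane exists.
bool triangle_gradient(vtx const &a, vtx const &b, vtx const &c, gradient &out)
{
    float const denom = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
    if (denom == 0.0f)
        return false;
    out.dpdx = ((b.p - a.p) * (c.y - a.y) - (c.p - a.p) * (b.y - a.y)) / denom;
    out.dpdy = ((c.p - a.p) * (b.x - a.x) - (b.p - a.p) * (c.x - a.x)) / denom;
    return true;
}
```

Since the two gradients do not depend on the scanline, every extent of the triangle can share the same dp/dx. A general polygon has no such guarantee, which is why its deltas are computed per extent.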

Synchronization

One of the key requirements of providing an asynchronous rendering mechanism is synchronization. Synchronization in poly_manager is super simple: just call the wait() function.

There are several common reasons for issuing a wait:

- At display time, the pixel data must be copied to the screen. If any primitives were queued which touch the portion of the display that is going to be shown, you need to wait for rendering to be complete before copying. Note that this wait may not be strictly necessary in some situations (for example, a triple-buffered system).
- If the emulated system has a mechanism to read back from the framebuffer after rendering, then a wait must be issued prior to the read in order to ensure that asynchronous rendering is complete.
- If the emulated system modifies any state that is not cached in the ObjectType or elsewhere (for example, texture memory), then a wait must be issued to ensure that pending primitives which might consume that state have finished their work.
- If the emulated system can use a previous render target as, say, the texture source for a new primitive, then submitting the second primitive must wait until the first completes. poly_manager provides no internal mechanism to help detect this, so it is on the caller to determine when or if this is necessary.

Because the wait operation knows after it is done that all rendering is complete, poly_manager also takes this opportunity to reclaim all memory allocated for its internal structures, as well as memory allocated for ObjectType structures. Thus it is important that you don’t hang onto any ObjectType pointers after a wait is called.
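
The readback case can be illustrated with a single-threaded stand-in that queues work and only performs it inside wait(). This is a deliberately simplified model, and deferred_renderer, queue_fill, read_pixel, and peek_pixel are all hypothetical names; a real worker pool may finish the work earlier, but the caller cannot rely on that without waiting.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

class deferred_renderer
{
public:
    explicit deferred_renderer(std::size_t pixels) : m_framebuffer(pixels, 0) { }

    // Queue a fill of pixels [start, stop); nothing is drawn yet.
    void queue_fill(std::size_t start, std::size_t stop, std::uint16_t color)
    {
        m_pending.push_back([this, start, stop, color] {
            for (std::size_t i = start; i < stop && i < m_framebuffer.size(); i++)
                m_framebuffer[i] = color;
        });
    }

    // Run all queued work in submission order, then reclaim the queue.
    void wait()
    {
        for (auto const &job : m_pending)
            job();
        m_pending.clear();
    }

    // Emulated framebuffer readback: wait first so the result is complete.
    std::uint16_t read_pixel(std::size_t index)
    {
        wait();
        return m_framebuffer[index];
    }

    // Unsynchronized peek, present only to demonstrate the hazard.
    std::uint16_t peek_pixel(std::size_t index) const { return m_framebuffer[index]; }

private:
    std::vector<std::uint16_t> m_framebuffer;
    std::vector<std::function<void()>> m_pending;
};
```

In this model, peek_pixel() before a wait still sees the old contents, while read_pixel() always observes every queued primitive, with later submissions drawn over earlier ones.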

The poly_manager class

In most applications, poly_manager is not used directly, but rather serves as the base class for a more complete rendering class. The poly_manager class itself is a template:

template<typename BaseType, class ObjectType, int MaxParams, u8 Flags = 0>
class poly_manager;

and the template parameters are:

- BaseType is the type used internally for coordinates and iterated parameters, and should generally be either float or double. In theory, a fixed-point integral type could also be used, though the math logic has not been designed for that, so you may encounter problems.
- ObjectType is the user-defined per-object data structure described above. Internally, poly_manager will manage a poly_array of these, and a pointer to the most-recently allocated one at the time a primitive is submitted will be implicitly passed to the render callback for each corresponding extent.
- MaxParams is the maximum number of iterated parameters that may be specified in a vertex. Iterated parameters are generic and treated identically, so the mapping of parameter indices is completely up to the contract between the caller and the render callback. It is permitted for MaxParams to be 0.
- Flags is zero or more of the following flags:
  - POLY_FLAG_NO_WORK_QUEUE: specify this flag to disable asynchronous rendering; this can be useful for debugging. When this option is enabled, all primitives are queued and then processed in order on the calling thread when wait() is called on the poly_manager class.
  - POLY_FLAG_NO_CLIPPING: specify this if you want poly_manager to skip its internal clipping. Use this if your render callbacks do their own clipping, or if the caller always handles clipping prior to submitting primitives.
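
To show how the four template parameters fit together, the sketch below declares a renderer class on top of a stand-in template that merely has the same signature as the one documented above. The stand-in body, the numeric flag values, the uses_work_queue, clips, and max_params members, and the my_object_data and my_renderer names are all assumptions made for this example; in a real driver you would include MAME's own header instead of redefining anything.

```cpp
#include <cstdint>

using u8 = std::uint8_t;

// Placeholder values for this sketch only; use the constants MAME defines.
constexpr u8 POLY_FLAG_NO_WORK_QUEUE = 0x01;
constexpr u8 POLY_FLAG_NO_CLIPPING = 0x02;

// Stand-in with the documented template signature. The members exist only
// so the effect of the parameters can be inspected in this example.
template<typename BaseType, class ObjectType, int MaxParams, u8 Flags = 0>
class poly_manager
{
public:
    static constexpr bool uses_work_queue = (Flags & POLY_FLAG_NO_WORK_QUEUE) == 0;
    static constexpr bool clips = (Flags & POLY_FLAG_NO_CLIPPING) == 0;
    static constexpr int max_params = MaxParams;
};

// Hypothetical per-object snapshot for an imaginary 3D board.
struct my_object_data
{
    std::uint32_t texture_base;
    std::uint8_t blend_mode;
};

// float coordinates, up to 4 iterated parameters (say U, V, 1/Z, intensity),
// and the work queue disabled while debugging.
class my_renderer : public poly_manager<float, my_object_data, 4, POLY_FLAG_NO_WORK_QUEUE>
{
};
```

Deriving from the template, rather than holding an instance, matches the note above that poly_manager usually serves as a base class for a more complete rendering class.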

Types & Constants

vertex_t

Within the poly_manager class, you’ll find a vertex_t type that describes a single vertex. All primitive drawing methods accept 2 or more of these vertex_t objects. The vertex_t includes the X and Y coordinates along with an array of iterated parameter values at that vertex:

struct vertex_t
{
    vertex_t() { }
    vertex_t(BaseType _x, BaseType _y) { x = _x; y = _y; }

    BaseType x, y;                          // X, Y coordinates
    std::array<BaseType, MaxParams> p;      // iterated parameters
};

Note that vertex_t itself is defined in terms of the BaseType and MaxParams template values of the owning poly_manager class.

All of poly_manager’s primitives operate in screen space, where (0,0) represents the top-left corner of the top-left pixel, and (0.5,0.5) represents the center of that pixel. Left and top pixel values are inclusive, while right and bottom pixel values are exclusive.

Thus, a tile rendered from (2,2)-(4,3) will completely cover 2 pixels: (2,2) and (3,2).
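
The coverage rule can be checked with a few lines of code. The sketch below treats a pixel as covered when its center lies inside the half-open rectangle, which is one way to express the inclusive/exclusive convention stated above; first_pixel_at_or_after and covered_pixels are hypothetical helpers, not poly_manager functions.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Index of the first pixel whose center (index + 0.5) is at or past edge.
inline int first_pixel_at_or_after(float edge)
{
    return static_cast<int>(std::ceil(edge - 0.5f));
}

// List the pixels whose centers fall inside [x0, x1) by [y0, y1).
std::vector<std::pair<int, int>> covered_pixels(float x0, float y0, float x1, float y1)
{
    std::vector<std::pair<int, int>> result;
    int const xstart = first_pixel_at_or_after(x0);
    int const xstop = first_pixel_at_or_after(x1);   // exclusive
    int const ystart = first_pixel_at_or_after(y0);
    int const ystop = first_pixel_at_or_after(y1);   // exclusive
    for (int y = ystart; y < ystop; y++)
        for (int x = xstart; x < xstop; x++)
            result.emplace_back(x, y);
    return result;
}
```

Calling covered_pixels(2, 2, 4, 3) yields exactly the two pixels named in the example above, (2,2) and (3,2).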

When calling a primitive drawing method, the iterated parameter array p need not be completely filled out. The number of valid iterated parameter values is specified as a template parameter to the primitive drawing methods, so only that many parameters need to actually be populated in the vertex_t structures that are passed in.

extent_t

poly_manager breaks primitives into extents, which are contiguous horizontal spans contained within a single scanline. These extents are then distributed to worker threads, which call the render callback with information on how to render each extent. The extent_t type describes one such extent, providing the bounding X coordinates along with an array of iterated parameter start values and deltas across the span:

struct extent_t
{
    struct param_t
    {
        BaseType start;                     // parameter value at start
        BaseType dpdx;                      // dp/dx relative to start
    };
    int16_t startx, stopx;                  // starting (inclusive)/ending (exclusive) endpoints
    std::array<param_t, MaxParams> param;   // array of parameter start/deltas
    void *userdata;                         // custom per-span data
};

For each iterated parameter, the start value contains the value at the left side of the span. The dpdx value contains the change in the parameter’s value per unit step in X.

There is also a userdata field in the extent_t structure, which is not normally used, except when performing custom rendering.
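
A render callback typically walks the span from startx to stopx, stepping each parameter by its dpdx. The sketch below does this for a cut-down copy of the structure above, fixed to float with a single parameter; simple_extent and draw_extent are hypothetical names, and writing the raw parameter value into a float scanline stands in for whatever per-pixel work a real callback performs.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Simplified copy of the extent layout above: BaseType = float, MaxParams = 1.
struct simple_extent
{
    struct param_t
    {
        float start;    // parameter value at start
        float dpdx;     // change per pixel step in X
    };
    std::int16_t startx, stopx;         // inclusive start, exclusive stop
    std::array<param_t, 1> param;
};

// Write the interpolated parameter into each pixel of the span.
void draw_extent(std::vector<float> &scanline, simple_extent const &extent)
{
    float value = extent.param[0].start;
    for (int x = extent.startx; x < extent.stopx; x++)
    {
        // Guard against spans that reach outside the destination buffer.
        if (x >= 0 && x < static_cast<int>(scanline.size()))
            scanline[x] = value;
        value += extent.param[0].dpdx;
    }
}
```

Stepping by addition keeps the inner loop cheap; a callback with several parameters would carry one running value per parameter in the same way.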

render_delegate

When rendering a primitive, in addition to the vertices, you must also provide a render_delegate callback of the form:

void render(int32_t y, extent_t const &extent, ObjectType const &object, int threadid)

This callback is responsible for the actual rendering. It will be called at a later time, likely on a different thread, for each extent. The parameters passed are:

- y is the Y coordinate (scanline) of the current extent.
- extent is a reference to an extent_t structure, described above, which specifies for this extent the start/stop X values along with the start/delta values for each iterated parameter.
- object is a reference to the most recently allocated ObjectType at the time the primitive was submitted for rendering; in theory it should contain most if not all of the necessary data to perform rendering.
- threadid is a unique ID indicating the index of the thread you’re running on; this value is useful if you are keeping any kind of statistics and don’t want to add contention over shared values. In this situation, you can allocate WORK_MAX_THREADS instances of your data and update the instance for the threadid you are passed. When you want to display the statistics, the main thread can accumulate and reset the data from all threads when it’s safe to do so (e.g., after a wait).
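
The per-thread statistics idea might look like the following. The value given to WORK_MAX_THREADS here is a placeholder so the sketch compiles on its own (use MAME's definition in real code), and thread_stats and stats_collector are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Placeholder so this sketch is self-contained; not necessarily MAME's value.
constexpr int WORK_MAX_THREADS = 16;

struct thread_stats
{
    std::uint64_t extents = 0;
    std::uint64_t pixels = 0;
};

class stats_collector
{
public:
    // Called from a render callback; touches only the caller's own slot,
    // so no locking is needed between workers.
    void record_extent(int threadid, int startx, int stopx)
    {
        thread_stats &s = m_stats[threadid];
        s.extents++;
        s.pixels += static_cast<std::uint64_t>(stopx - startx);
    }

    // Called from the main thread once it is safe, e.g. after a wait.
    thread_stats collect_and_reset()
    {
        thread_stats total;
        for (thread_stats &s : m_stats)
        {
            total.extents += s.extents;
            total.pixels += s.pixels;
            s = thread_stats();
        }
        return total;
    }

private:
    std::array<thread_stats, WORK_MAX_THREADS> m_stats{};
};
```

Each worker writes only to its own slot, and the main thread reads the slots only after rendering has finished, so the counters never need atomics.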

Methods

poly_manager

poly_manager(running_machine &machine);

The poly_manager constructor takes just one parameter, a reference to the running_machine. This grants poly_manager access to the work queues needed for multithreaded running.

wait

void wait(char const *debug_reason = "general");

Calling wait() stalls the calling thread until all outstanding rendering is complete:

- debug_reason is an optional parameter specifying the reason for the wait. It is useful if the compile-time constant TRACK_POLY_WAITS is enabled, as it will print a summary of wait times and reasons at the end of execution.

Return value: none.
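
To give a feel for what a per-reason wait summary involves, here is a hedged sketch of a tracker keyed by the debug_reason string. It does not reproduce what TRACK_POLY_WAITS actually records or prints; wait_tracker, timed_wait, and count_for are hypothetical names.

```cpp
#include <chrono>
#include <map>
#include <string>

class wait_tracker
{
public:
    // Time an arbitrary wait operation and file it under the given reason.
    template<typename Fn>
    void timed_wait(char const *debug_reason, Fn &&do_wait)
    {
        auto const start = std::chrono::steady_clock::now();
        do_wait();
        auto const elapsed = std::chrono::steady_clock::now() - start;
        entry &e = m_entries[debug_reason];
        e.count++;
        e.total += elapsed;
    }

    // Number of waits recorded for a reason (0 if never seen).
    unsigned count_for(std::string const &reason) const
    {
        auto const it = m_entries.find(reason);
        return (it != m_entries.end()) ? it->second.count : 0;
    }

private:
    struct entry
    {
        unsigned count = 0;
        std::chrono::steady_clock::duration total{};
    };
    std::map<std::string, entry> m_entries;
};
```

Passing a distinct reason at each call site, such as one string for display-time waits and another for framebuffer readback, makes it easy to see which kind of synchronization dominates.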

object_data

objectdata_array &object_data();

This method just returns a reference to the internally-maintained poly_array of the ObjectType you specified when creating poly_manager. For most applications, the only interesting thing to do with this object is call the next() method to allocate a new object to fill out.

Return value: reference to a poly_array of ObjectType.

register_poly_array

void register_poly_array(poly_array_base &array);

For advanced applications, you may choose to create your own poly_array objects to manage large chunks of infrequently-changed data, such as palettes. After each wait(), poly_manager resets all the poly_array objects it knows about in order to reclaim all outstanding allocated memory. By registering your poly_array objects here, you can ensure that your arrays will also be reset after a wait() call.
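
The register-then-reset relationship can be modelled in a few lines. The classes below are stand-ins written for this example (resettable_array_base, simple_array, and mini_manager are hypothetical and do not mirror the real poly_array interface beyond the idea of next() and a reset); they show only why an unregistered array would keep growing across waits.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Minimal base so a manager can reset arrays of any element type.
struct resettable_array_base
{
    virtual ~resettable_array_base() = default;
    virtual void reset() = 0;
};

template<typename T>
class simple_array : public resettable_array_base
{
public:
    // Allocate a fresh element to fill in.
    T &next()
    {
        m_items.emplace_back();
        return m_items.back();
    }

    std::size_t count() const { return m_items.size(); }

    void reset() override { m_items.clear(); }

private:
    std::deque<T> m_items;
};

class mini_manager
{
public:
    void register_array(resettable_array_base &array) { m_arrays.push_back(&array); }

    // Rendering would be drained here; afterwards nothing can still refer
    // to the registered arrays, so they are all reclaimed.
    void wait()
    {
        for (resettable_array_base *array : m_arrays)
            array->reset();
    }

private:
    std::vector<resettable_array_base *> m_arrays;
};
```

As with ObjectType pointers, anything obtained from a registered array must be treated as invalid once wait() returns.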

Return value: none.