State Management in WineD3D
The idea behind state management is to keep a list of changed states per OpenGL context. If the application changes a Direct3D state, the state is recorded as being dirty, and its OpenGL equivalents are set before the next draw is performed. The basic ideas are:
- Avoid applying states unnecessarily
- Avoid applying states twice
- Keep track of states that interoperate with each other
- And of course, make sure every state that has to be applied is applied
Structures and Data types
States
All states are mapped into a single field of state numbers. This is done by macros defined in dlls/wined3d/wined3d_private.h. For example:
- STATE_RENDER(x) returns the number of render state x
- STATE_TRANSFORM(x) returns the number of transform state x
- STATE_TEXTURESTAGE(s, x) returns the number of texture stage state x for texture stage s
- STATE_VERTEXSHADER returns the number of the vertex shader state
- STATE_IS_RENDER(y) returns whether state number y identifies a render state
- STATE_IS_TEXTURESTAGE(y) returns whether state number y identifies a texture stage state
- STATE_HIGHEST is the number of the highest known state
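As a rough illustration of how several state types can share one number space, here is a minimal sketch. All EX_ names, the range sizes and the resulting numbers are assumptions made for this example; they do not reproduce the definitions in wined3d_private.h.

```c
#include <assert.h>

/* Hypothetical sizes, chosen only for this example. */
#define EX_NUM_RENDER_STATES   256
#define EX_NUM_TEXTURE_STAGES  8
#define EX_NUM_STAGE_STATES    32

/* Render states occupy 1 .. EX_NUM_RENDER_STATES; 0 is left unused. */
#define EX_STATE_RENDER(x)      (x)
#define EX_STATE_IS_RENDER(y) \
    ((y) >= EX_STATE_RENDER(1) && (y) <= EX_STATE_RENDER(EX_NUM_RENDER_STATES))

/* Texture stage states follow directly after the render states. */
#define EX_STATE_TEXTURESTAGE(s, x) \
    (EX_STATE_RENDER(EX_NUM_RENDER_STATES) + 1 + (s) * EX_NUM_STAGE_STATES + (x))
#define EX_STATE_IS_TEXTURESTAGE(y) \
    ((y) >= EX_STATE_TEXTURESTAGE(0, 0) && \
     (y) <= EX_STATE_TEXTURESTAGE(EX_NUM_TEXTURE_STAGES - 1, EX_NUM_STAGE_STATES - 1))

/* A single-valued state placed after the last texture stage state. */
#define EX_STATE_VERTEXSHADER \
    (EX_STATE_TEXTURESTAGE(EX_NUM_TEXTURE_STAGES - 1, EX_NUM_STAGE_STATES - 1) + 1)
#define EX_STATE_HIGHEST EX_STATE_VERTEXSHADER

/* Recovers the texture stage encoded in a texture stage state number. */
static unsigned int ex_stage_of(unsigned int state)
{
    assert(EX_STATE_IS_TEXTURESTAGE(state));
    return (state - EX_STATE_TEXTURESTAGE(0, 0)) / EX_NUM_STAGE_STATES;
}
```

Because every state ends up as a plain index, a single table and a single bitmap can cover render states, texture stage states and shader states alike.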
The state table
typedef void (*APPLYSTATEFUNC)(DWORD state, IWineD3DStateBlockImpl *stateblock, WineD3DContext *ctx);
struct StateEntry
{
DWORD representative;
APPLYSTATEFUNC apply;
};
This structure describes an entry for a single state in the state table. The representative is a state that represents a group of Direct3D states which affect each other; see "State grouping" below. apply is a pointer to a function that applies the state to the OpenGL context.
The dirty list
The dirty states list consists of an array of state numbers, a counter for the number of dirty states, and a bitmap marking each state dirty. The dirty state array is a redundant copy of the bitmap that makes it quick to find out which states are dirty: finding the dirty states in the bitmap alone would need STATE_HIGHEST / 32 iterations, while experience has shown that usually fewer than 20 states are dirtified between draws. The bitmap, on the other hand, is needed to quickly find out whether a specific state is dirty.
There is one such list per GL context in use. There is one GL context for each swapchain and, if pbuffers are used, one context for offscreen rendering with the pbuffer drawable. The context to apply the state to is passed to the apply function.
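The layout described in this section could be sketched roughly as follows. The struct and function names (ex_context, ex_add_dirty, ex_is_dirty) and the value of EX_STATE_HIGHEST are assumptions made for this example, not the actual WineD3D definitions.

```c
#include <assert.h>
#include <stdint.h>

#define EX_STATE_HIGHEST 1000  /* hypothetical value */

/* Per-context dirty tracking: an array for iteration, a bitmap for lookup. */
struct ex_context {
    uint32_t dirty_array[EX_STATE_HIGHEST + 1];   /* dirty state numbers, in order */
    uint32_t num_dirty;                           /* used entries in dirty_array */
    uint32_t is_dirty[EX_STATE_HIGHEST / 32 + 1]; /* one bit per state */
};

/* Constant-time lookup in the bitmap. */
static int ex_is_dirty(const struct ex_context *ctx, uint32_t state)
{
    assert(state <= EX_STATE_HIGHEST);
    return (ctx->is_dirty[state / 32] >> (state % 32)) & 1u;
}

/* Records a state in both structures, unless it is already recorded. */
static void ex_add_dirty(struct ex_context *ctx, uint32_t state)
{
    if (ex_is_dirty(ctx, state))
        return;
    ctx->dirty_array[ctx->num_dirty++] = state;
    ctx->is_dirty[state / 32] |= 1u << (state % 32);
}
```

Since a state is appended to the array only when its bit was clear, the array can never hold more than EX_STATE_HIGHEST + 1 entries.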
Code and Functions
Marking a state dirty
void IWineD3DDeviceImpl_MarkStateDirty(IWineD3DDeviceImpl *This, DWORD state);
This function marks a Direct3D state dirty for all contexts. The function first retrieves the representative of the state from the state table. Then it checks whether the representative is already marked dirty. If the representative is 0 or is already marked dirty, nothing is done. Otherwise it is put onto the dirty states list and the bit representing the state is set to 1 in the dirty bitmap. The context management code also has a function to mark states dirty for a single context, but that shouldn't be needed outside of the context manager.
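The steps above can be sketched as follows. This is a simplified model rather than the WineD3D implementation: the ex_ types, the fixed-size context array and the small EX_STATE_HIGHEST are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define EX_STATE_HIGHEST 63   /* hypothetical, kept small for the example */
#define EX_MAX_CONTEXTS  4    /* hypothetical */

struct ex_state_entry { uint32_t representative; };

struct ex_context {
    uint32_t dirty_array[EX_STATE_HIGHEST + 1];
    uint32_t num_dirty;
    uint32_t is_dirty[EX_STATE_HIGHEST / 32 + 1];
};

struct ex_device {
    struct ex_state_entry table[EX_STATE_HIGHEST + 1];
    struct ex_context contexts[EX_MAX_CONTEXTS];
    unsigned int num_contexts;
};

/* Marks the representative of 'state' dirty in every context of the device. */
static void ex_mark_state_dirty(struct ex_device *dev, uint32_t state)
{
    uint32_t rep;
    unsigned int i;

    assert(state <= EX_STATE_HIGHEST);
    rep = dev->table[state].representative;
    if (!rep)
        return;  /* representative 0: nothing to do */

    for (i = 0; i < dev->num_contexts; i++) {
        struct ex_context *ctx = &dev->contexts[i];
        uint32_t idx = rep / 32, bit = 1u << (rep % 32);

        if (ctx->is_dirty[idx] & bit)
            continue;  /* already dirty in this context */
        ctx->dirty_array[ctx->num_dirty++] = rep;
        ctx->is_dirty[idx] |= bit;
    }
}
```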
Checking if a state is dirty
static inline BOOL isStateDirty(WineD3DContext *context, DWORD state);
Checks whether the bit for a specific state is set in the context's dirty state bitmap. Returns FALSE if it isn't set, and a nonzero value if it is set.
Applying a state
typedef void (*APPLYSTATEFUNC)(DWORD state, IWineD3DStateBlockImpl *stateblock, WineD3DContext *ctx);
This is the prototype for the state application function. To create a new apply function for a state, implement it in dlls/wined3d/state.c and specify it in the state table at the index of the state.
The apply function is passed the number of the dirty state and a pointer to the stateblock implementation to read the new settings from. Most apply functions inherently know which state they have to apply. For example, state_lighting (which applies the render state WINED3DRS_LIGHTING) will never be called with a state other than STATE_RENDER(WINED3DRS_LIGHTING), so it simply ignores the parameter. The functions applying sampler states and texture stage states, however, need the value to find the sampler or stage that was updated.
ctx is a per-context structure; it specifies the context the state is applied to. Some members used for optimization are stored in the context structure.
The apply loop
Applying all dirty states is done in ActivateContext(), in context.c. The apply function of every state that is marked dirty is called; the dirty bit for that state is cleared before the call. After all states have been applied, the number of dirty states is set to 0. The dirty list does not need any other cleanup.
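A minimal sketch of such a loop, under simplifying assumptions: the ex_ names are hypothetical, the apply functions take no stateblock argument, and a per-state call counter stands in for real GL work.

```c
#include <assert.h>
#include <stdint.h>

#define EX_STATE_HIGHEST 63   /* hypothetical */

struct ex_context;
typedef void (*ex_apply_func)(uint32_t state, struct ex_context *ctx);

struct ex_context {
    uint32_t dirty_array[EX_STATE_HIGHEST + 1];
    uint32_t num_dirty;
    uint32_t is_dirty[EX_STATE_HIGHEST / 32 + 1];
    unsigned int applied[EX_STATE_HIGHEST + 1];  /* call counter, example only */
};

/* Stand-in for the state table: one apply function per state number. */
static ex_apply_func ex_table[EX_STATE_HIGHEST + 1];

/* Example apply function that only counts how often it was called. */
static void ex_count_apply(uint32_t state, struct ex_context *ctx)
{
    ctx->applied[state]++;
}

/* Applies every state on the dirty list, then empties the list. */
static void ex_apply_dirty_states(struct ex_context *ctx)
{
    uint32_t i;

    for (i = 0; i < ctx->num_dirty; i++) {
        uint32_t state = ctx->dirty_array[i];

        assert(state <= EX_STATE_HIGHEST && ex_table[state]);
        /* Clear the dirty bit first, then call the apply function. */
        ctx->is_dirty[state / 32] &= ~(1u << (state % 32));
        ex_table[state](state, ctx);
    }
    /* Resetting the counter is the only cleanup the array needs. */
    ctx->num_dirty = 0;
}
```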
For states to be applied, CTXUSAGE_DRAWPRIM has to be passed to ActivateContext. Other options are CTXUSAGE_BLIT, which sets up the context for 2D drawing (most effects deactivated, multitexturing off, shaders off), and CTXUSAGE_LOADRESOURCE, which just activates the context without changing any states.
Techniques
State grouping
State grouping is an effective way to deal with states that interact with each other. For example, WINED3DRS_FOGENABLE, WINED3DRS_FOGSTART, WINED3DRS_FOGEND, WINED3DRS_FOGTABLEMODE and WINED3DRS_FOGVERTEXMODE affect the glEnable(GL_FOG) state, the fog ranges and the fog niceness hinting value. If one of those states is changed, all 3 GL states have to be reapplied. Consequently, if all 5 D3D states are changed, it is enough to apply only one of them once, because doing so updates the others too.
To deal with this, the representative of these 5 render states is STATE_RENDER(WINED3DRS_FOGENABLE), and the apply function is always state_fog. The first fog state that is dirtified will dirtify FOGENABLE, while the others will find their representative dirty and do nothing.
It also works across state types: STATE_TEXTURESTAGE(0, WINED3DTSS_TEXTURETRANSFORMFLAGS) is grouped with STATE_TRANSFORM(WINED3DTS_TEXTURE0). Likewise for stage 1, WINED3DTS_TEXTURE1 and so on.
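A minimal sketch of the fog grouping, assuming a simplified per-state dirty flag instead of the bitmap and array described earlier. The ex_ names are hypothetical, and ex_fog_applies merely counts how often a stand-in for state_fog would run.

```c
#include <assert.h>

enum ex_state {
    EX_NONE = 0,
    EX_FOGENABLE, EX_FOGSTART, EX_FOGEND, EX_FOGTABLEMODE, EX_FOGVERTEXMODE,
    EX_STATE_COUNT
};

/* All five fog states share EX_FOGENABLE as their representative. */
static const enum ex_state ex_representative[EX_STATE_COUNT] = {
    [EX_FOGENABLE]     = EX_FOGENABLE,
    [EX_FOGSTART]      = EX_FOGENABLE,
    [EX_FOGEND]        = EX_FOGENABLE,
    [EX_FOGTABLEMODE]  = EX_FOGENABLE,
    [EX_FOGVERTEXMODE] = EX_FOGENABLE,
};

static unsigned char ex_dirty[EX_STATE_COUNT];
static unsigned int ex_fog_applies;

/* Only the representative is ever marked dirty. */
static void ex_mark_dirty(enum ex_state s)
{
    assert(s > EX_NONE && s < EX_STATE_COUNT);
    ex_dirty[ex_representative[s]] = 1;
}

/* Applies each dirty representative once. */
static void ex_apply_all(void)
{
    int s;

    for (s = 1; s < EX_STATE_COUNT; s++) {
        if (!ex_dirty[s])
            continue;
        ex_dirty[s] = 0;
        if (s == EX_FOGENABLE)
            ex_fog_applies++;  /* stands in for a call to state_fog */
    }
}
```

Changing three fog states between two draws therefore costs one fog update, not three.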
Dynamic state grouping
While the state grouping above is convenient and needs no extra code, it has a disadvantage: the bigger a group is, the more expensive it is to apply and the more likely it is to get dirtified. Many state dependencies are one-directional: state A depends on the settings of state B, but state B does not depend on state A. An example is the vertex declaration (STATE_VDECL) and the lighting state (STATE_RENDER(WINED3DRS_LIGHTING)). If the vertex declaration is changed, lighting may have to be enabled or disabled depending on that, but switching lighting on or off does not change the vertex declaration.
This is dealt with in the following way:
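The apply function for STATE_VDECL checks whether the lighting state is dirty. If it is not, it calls the lighting apply function itself (if it is dirty, the apply loop will call it anyway):
if (!isStateDirty(context, STATE_RENDER(WINED3DRS_LIGHTING))) {
    state_lighting(STATE_RENDER(WINED3DRS_LIGHTING), stateblock, context);
}
The lighting apply function checks whether the vertex declaration is dirty and returns early if so, because the vertex declaration apply function will call it later:
if (isStateDirty(context, STATE_VDECL)) {
    return;
}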
Otherwise it reads what it needs from the vertex declaration and applies the new lighting settings to GL. This technique is used heavily around the vertex declaration state and between WINED3DTSS_COLOROP and the samplers.
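The interplay of the two checks can be modelled with a small sketch. Everything here is an assumption made for illustration: the ex_ names, the two-state table, and the counter that stands in for the actual GL lighting setup. States are applied in index order, so lighting is handled before the vertex declaration.

```c
#include <assert.h>

enum { EX_LIGHTING, EX_VDECL, EX_STATE_COUNT };

struct ex_context {
    int dirty[EX_STATE_COUNT];
    unsigned int lighting_applies;  /* counts real lighting updates */
};

static void ex_state_lighting(struct ex_context *ctx)
{
    /* Lighting is always clean by the time its apply function runs. */
    assert(!ctx->dirty[EX_LIGHTING]);
    /* The vertex declaration is still pending; its apply function calls us. */
    if (ctx->dirty[EX_VDECL])
        return;
    ctx->lighting_applies++;
}

static void ex_state_vdecl(struct ex_context *ctx)
{
    /* ... the vertex declaration itself would be loaded here ... */
    /* If lighting is dirty, the apply loop will get to it anyway. */
    if (!ctx->dirty[EX_LIGHTING])
        ex_state_lighting(ctx);
}

static void ex_apply_dirty(struct ex_context *ctx)
{
    static void (*const table[EX_STATE_COUNT])(struct ex_context *) = {
        ex_state_lighting, ex_state_vdecl
    };
    int s;

    for (s = 0; s < EX_STATE_COUNT; s++) {
        if (!ctx->dirty[s])
            continue;
        ctx->dirty[s] = 0;  /* marked clean before the apply function runs */
        table[s](ctx);
    }
}
```

Whether only the vertex declaration, only lighting, or both are dirty, the lighting update runs exactly once per pass. This relies on the dirty flag being cleared before the apply function is called; otherwise the lighting function called from the vertex declaration would still see the declaration as dirty and return early.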
There are cases when state A will require a modification of state B only under rare conditions. The vertex declaration state will require a projection matrix change only if it switches between transformed and untransformed states. Here the projection matrix apply function does not check if the vertex declaration is dirty. This makes general vertex decl switches cheaper while risking that the projection matrix is applied twice on transformed / untransformed switches.
Another example is vertex shaders and lighting. If a vertex shader is enabled, WINED3DRS_LIGHTING has no effect. So state_lighting could just return if a vertex shader is enabled, which could save new games the penalty of a lighting state switch if they (accidentally) change WINED3DRS_LIGHTING. However, it would make turning vertex shaders on and off more expensive, because lighting would have to be applied every time the vertex shader is changed.
The policy is basically to prioritize newer applications, as older ones should run fast anyway. But often we rely on the cleverness of the apps not to change fixed function states needlessly when using shaders, or hope that the driver makes that cheap.
States are marked clean before the apply function is called!
Debugging
To debug regressions, it is useful to find out whether they occurred due to incorrect dirtification or due to a problem introduced when moving the state application code. To test this, the apply function can be called forcefully after the apply loop in drawprimitive():
StateTable[STATE_SOMETHING].apply(STATE_SOMETHING, This->stateBlock);
will force an update of STATE_SOMETHING before each draw. If that changes the graphics in any way, for the better or worse, then something is wrong with the dirtification of that state. If it does not, then the dirtification of STATE_SOMETHING should work ok.
Basically, states should be applicable at any time, even in IWineD3DDeviceImpl_MarkStateDirty (although doing that would confuse the code a lot and cause major performance issues with vertex buffers).
Misc stuff
- Applying certain states is very expensive. This mainly affects the vertex declaration state. Some apps (Half-Life 1, Half-Life 2, the d3d8 billboard demo) got up to 100% faster by not applying this state on every draw. That state should not be that expensive. I think it is loading the OpenGL arrays that is so expensive. Other culprits are sampler states and the colorop state.
- Applying all states directly in MarkStateDirty causes some corruption. This should not happen. While applying all states immediately will heavily confuse the vertex buffer code and cause performance issues because of that, it should still render correctly. Find out why that is.
