Difference between revisions of "Performance"

From WineHQ Wiki
(Fix category)
(Update the contents a bit)
 
(One intermediate revision by one other user not shown)
Line 1: Line 1:
Now Wine has matured to the point of running many applications correctly, people are expecting them to run as fast as on Windows. Sadly, this is not always the case. Here are a few notes related to tracking down performance issues.
+
Now that Wine has matured to the point of running many applications correctly, people are expecting them to run as fast as on Windows. Sadly, this is not always the case. Here are a few notes related to tracking down performance issues.
  
 
=== User Tips ===
 
=== User Tips ===
  
If you're just trying to increase FPS in your games, you can experiment with the following short list of settings:
+
Surprising as it may be, there really isn't a magic toggle to make everything faster. Anyway, for games, you can experiment with the following short list of settings:
* if the game has an opengl mode, try using that
+
* if the game has an OpenGL mode, you could try using that. Notice that sometimes it enables code paths that aren't well tested (or are plain broken) in the game
* else winetricks glsl=disable can improve Direct3D performance (but not for all games, and probably only if you have an nvidia card)
+
otherwise
* anything listed in the appdb as helping your particular game. For instance, http://appdb.winehq.org/objectManager.php?sClass=version&iId=19065 says Batman Arkham Asylum runs a lot faster if you set AmbientOcclusion=False in UserEngine.ini.
+
* enable CSMT (https://wiki.winehq.org/Useful_Registry_Keys)
 +
* if you run your game from the command line, run it with WINEDEBUG=-all. That disables a bunch of error checking and validation
 +
* enable threaded optimization in the GL drivers: you can do that with environment variables, that's mesa_glthread=true for Mesa drivers or __GL_THREADED_OPTIMIZATIONS=1 for Nvidia proprietary drivers
 +
* disabling UseGLSL might slightly improve Direct3D performance and reduce shader compilation stuttering (but that has a chance to work only for d3d9 or older games and usually only with the Nvidia proprietary driver)
 +
* anything listed in the appdb as helping your particular game. For instance, http://appdb.winehq.org/objectManager.php?sClass=version&iId=19065 says Batman Arkham Asylum runs a lot faster if you set AmbientOcclusion=False in UserEngine.ini.
 
* anything listed on the web as speeding up the Windows version.  (For instance, http://www.hardocp.com/article/2009/10/19/batman_arkham_asylum_physx_gameplay_review/3 suggests Batman Arkham Asylum runs a lot faster if you set MotionBlur=False in that same file.)
 
* anything listed on the web as speeding up the Windows version.  (For instance, http://www.hardocp.com/article/2009/10/19/batman_arkham_asylum_physx_gameplay_review/3 suggests Batman Arkham Asylum runs a lot faster if you set MotionBlur=False in that same file.)
  
Yup, that's a short list.  Sorry.
 
  
 
The rest of this document is meant for developers more than users.
 
The rest of this document is meant for developers more than users.
Line 28: Line 31:
  
 
=== Known Bottlenecks ===
 
=== Known Bottlenecks ===
* applications that frequently compile d3d shaders may have low FPS; see e.g. http://bugs.winehq.org/show_bug.cgi?id=23832  ('winetricks glsl-disable' helps some games with this, as it avoids translating hlsl bytecodes into glsl and running them through the slow compiler. Only works on Nvidia, though.)
+
* applications that frequently compile D3D shaders may suffer from stuttering; see e.g. http://bugs.winehq.org/show_bug.cgi?id=23832  ('winetricks glsl-disable' helps some games with this, as it translates D3D bytecode into the lower level ARB_{vertex|fragment}_program language rather than GLSL and the driver is generally quicker at compiling those. Only works on Nvidia, though.)
 
* applications that rely on precise thread scheduling will be disappointed (see e.g. http://hisouten.koumakan.jp/wiki/Linux_support#Resolved_bugs )
 
* applications that rely on precise thread scheduling will be disappointed (see e.g. http://hisouten.koumakan.jp/wiki/Linux_support#Resolved_bugs )
 
* applications that use kernel events heavily may run slowly (e.g. Unreal Tournament 3 - Stefan?)
 
* applications that use kernel events heavily may run slowly (e.g. Unreal Tournament 3 - Stefan?)
 
* applications that are picky about what core a thread runs on may be disappointed (see e.g. http://bugs.winehq.org/show_bug.cgi?id=19748 )
 
* applications that are picky about what core a thread runs on may be disappointed (see e.g. http://bugs.winehq.org/show_bug.cgi?id=19748 )
* some applications are much slower now that ORM defaults to fbo ( see e.g. http://bugs.winehq.org/show_bug.cgi?id=18232 )
+
* some applications don't interact well with power management and supposedly get 30% to 80% higher frame rate if you run at full power unconditionally (see http://bugs.winehq.org/show_bug.cgi?id=24558)
* some applications (sc2) don't interact well with power management, and supposedly get 30% to 80% more frame rate if you run at full power unconditionally ( see http://bugs.winehq.org/show_bug.cgi?id=24558 )
+
  
 
=== Kernel? ===
 
=== Kernel? ===
The linux kernel might have something to do with performance. CONFIG_NO_HZ might make a difference, for instance.
+
The linux kernel might have something to do with performance. CONFIG_NO_HZ might make a difference, for instance.
 
Here are a few notes on the subject:
 
Here are a few notes on the subject:
* World of Warcraft supposedly has pauses every 15 seconds unless you modify /etc/sysctl to change kernel.sched_batch_wakeup_granularity_ns and a couple other parameters (see http://wiki.archlinux.org/index.php/World_of_Warcraft#Kernel_Timing_Bug )
+
* World of Warcraft supposedly had pauses every 15 seconds unless you modify /etc/sysctl to change kernel.sched_batch_wakeup_granularity_ns and a couple other parameters (see http://wiki.archlinux.org/index.php/World_of_Warcraft#Kernel_Timing_Bug)
 
* http://forums.gentoo.org/viewtopic-t-789725.html is a thread about kernel configuration and Doom3/Quake4/Prey performance.
 
* http://forums.gentoo.org/viewtopic-t-789725.html is a thread about kernel configuration and Doom3/Quake4/Prey performance.
 
* http://code.google.com/p/realtimeconfigquickscan/ (you have to check the whole tree out) is a little script to check whether your system is set up for low latency (mostly for audio)
 
* http://code.google.com/p/realtimeconfigquickscan/ (you have to check the whole tree out) is a little script to check whether your system is set up for low latency (mostly for audio)
Line 54: Line 56:
  
 
Here are a few that have been used with some success:
 
Here are a few that have been used with some success:
* [:ProfileGuidedOptimization:Profile-guided optimization]
+
* [https://perf.wiki.kernel.org/index.php/Main_Page perf]
* gprof (not really recommended anymore)
+
 
* [http://oprofile.sourceforge.net/ oprofile] (classic sampling whole-system profiler for linux, can profile into kernel, too)
 
* [http://oprofile.sourceforge.net/ oprofile] (classic sampling whole-system profiler for linux, can profile into kernel, too)
 
* [http://www.daimi.au.dk/~sandmann/sysprof/ sysprof] newer, easier sampling whole-system profiler for linux
 
* [http://www.daimi.au.dk/~sandmann/sysprof/ sysprof] newer, easier sampling whole-system profiler for linux
See also [http://wine-wiki.org/index.php/Debugging_Wine#Profiling_Wine_-_.28making_it_go_faster.29 jswindle's notes about profiling].
 
  
See [[WineD3DOnWindows]] for how to build WineD3D for Windows; this is cool because it lets you figure out if the bottleneck is in wined3d itself. You might also be able to use debugging nvidia or ati drivers on Windows that point out silly things wined3d is doing.
+
There are also a few graphic debug tools that might come in handy, like [https://apitrace.github.io/ apitrace] or [https://renderdoc.org/ renderdoc].
  
See also [http://members.gamedev.net/jhoxley/directx/DirectXForumFAQ.htm the DirectX Forum FAQ], which seems to have some useful tips.
+
It's possible to build wined3d for Windows; this is cool because it might let you figure out if the bottleneck is in wined3d itself.
 
+
Graphic Remedy are making their OpenGL profiler/debugger gDEBugger, available at zero cost [http://www.gremedy.com/purchase.php see here], GNU/Linux, Mac and Windows versions are available.
+
  
 
=== Game Performance Debugging Tutorial ===
 
=== Game Performance Debugging Tutorial ===
Line 78: Line 76:
 
* Make sure you actually have a problem: Test the game on Windows on the same hardware. If it runs at the same speed there's probably nothing Wine can do to fix it. You're free to try though.
 
* Make sure you actually have a problem: Test the game on Windows on the same hardware. If it runs at the same speed there's probably nothing Wine can do to fix it. You're free to try though.
 
* Make sure your System is OK. No background processes consuming CPU time, etc. Try disabling Desktop compositing
 
* Make sure your System is OK. No background processes consuming CPU time, etc. Try disabling Desktop compositing
* A few drivers are known to cause issues(Sept. 17th 2010): Mesa(all Open Source Linux drivers) and drivers for Intel GPUs on OSX.
 
 
* Check for log messages. Sometimes Wine knows it hits a slow path and writes a FIXME or WARN. Don't fixate too much on such warnings, they may indicate a comparatively minor problem.
 
* Check for log messages. Sometimes Wine knows it hits a slow path and writes a FIXME or WARN. Don't fixate too much on such warnings, they may indicate a comparatively minor problem.
 
* Watch out for high wineserver CPU usage
 
* Watch out for high wineserver CPU usage
 
* Currently 80% of Windows performance is a high water mark, few games run faster. Many games run around 50%, which means there's a lot of room for improvement. If your game runs even slower there are probably some nasty bugs around.
 
* Currently 80% of Windows performance is a high water mark, few games run faster. Many games run around 50%, which means there's a lot of room for improvement. If your game runs even slower there are probably some nasty bugs around.
  
2) GPU, CPU or bandwidth limited?
+
Some words about CSMT. The basic idea is to segregate wined3d's GL calls to a separate thread: wined3d functions indirectly called by the application threads queue messages to the command stream thread which, on its part, essentially keeps checking the queue and executing whatever instruction it finds there.<br />
 +
CSMT has been initially developed as a way to enhance wined3d performance, with the "theory" being that by splitting the potentially expensive GL calls out of the application thread we can quickly return to the application code and hopefully lessen our CPU-side woes. That turned out to work pretty well but with a caveat: some commands (e.g. mapping buffers) actually NEED to wait for the operation to complete since they have to return data to the application. Without some "tricks" when we get one of those commands we have to effectively wait until all the commands queued before it complete their execution, which means that the application thread has to block and wait, potentially throwing a lot of the performance improvement out of the window.<br />
 +
The version of CSMT initially introduced in "official" Wine does without some of those tricks (specifically, those that weren't generally safe) and has some room for further performance improvements. It still works perfectly fine as a better replacement of the StrictDrawOrdering option.
  
3D rendering is a complex process, and multiple components have to play together:
+
2) GPU, CPU or sync limited?
 +
 
 +
3D rendering is a complex process where multiple components have to play together:
 
* The GPU has to render the scene fast enough
 
* The GPU has to render the scene fast enough
 
* Software on the CPU side(driver, game, ...) has to generate rendering commands fast enough
 
* Software on the CPU side(driver, game, ...) has to generate rendering commands fast enough
* The bus must be fast enough to transfer data between the GPU and CPU without delays
+
* Software and GPU should, as much as possible, not have to wait for each other
  
Bus transfers are usually not an issue with Wine. If they are you can probably fix this by improving our GL_ARB_vertex_buffer or GL_ARB_pixel_buffer based code.
+
If your game slows down if you increase the screen resolution (or speeds up when you lower it) you are probably GPU limited. This is because the CPU rarely touches single pixels, so with increased resolution only the strain on the GPU increases. Exceptions may apply especially in older ddraw/d3d7 based games..
  
If your game slows down if you increase the screen resolution(or speeds up when you lower it) you are probably GPU limited. This is because the CPU rarely touches single pixels, so with increased resolution only the strain on the GPU increases. Exceptions may apply especially in older ddraw/d3d7 based games..
+
Figuring out if the issue is excessive synchronization is a bit harder. A significant portion of wined3d time being spent in wined3d_cs_mt_finish() (with CSMT enabled) might hint in this direction, although that isn't necessarily the case and there might be synchronization points hidden in other spots.
  
 
3a) GPU limited situation:
 
3a) GPU limited situation:
  
If you're GPU limited the problem could be inefficient shaders, or incorrectly configured render target formats. Try ARB shaders(if the game is happy with Shader Model 2.0 or you have a Nvidia GPU), and check logs for usage of 16 bit per channel or floating point render targets.
+
If you're GPU limited the problem could be inefficient shaders, or incorrectly configured render target formats. Try ARB shaders (if the game is happy with Shader Model 2.0 or you have a Nvidia GPU), and check logs for usage of 16 bit per channel or floating point render targets.
  
 
Unless you have a low end GPU or play at a very high resolution GPU limitations are rare. A seeming GPU limitation may also be a sign of a software rendering fallback in the fragment pipeline.
 
Unless you have a low end GPU or play at a very high resolution GPU limitations are rare. A seeming GPU limitation may also be a sign of a software rendering fallback in the fragment pipeline.
Line 113: Line 114:
  
 
# The performance on a normal Windows system
 
# The performance on a normal Windows system
# The performance on Windows while running the game with wined3d[insert link here]
+
# The performance on Windows while running the game with wined3d
# The performance on Linux/OSX with Wine
+
# The performance on Linux/macOS with Wine
# If you are on OSX: The performance with Wine+OSX.
+
# If you are on macOS: The performance with Wine+macOS.
  
 
If there is a noticeable difference between (1) and (2) the problem is likely in the 3D related parts(wined3d, OpenGL driver). That's because all the other components haven't been changed. If there is a noticeable difference between (2) and (3) the problem is either in the non-3D parts(Rest of Wine, Linux kernel, ...). Make sure the game runs in the same codepath in all 4(or 3) cases - some games can use both d3d9 and d3d10 for example.
 
If there is a noticeable difference between (1) and (2) the problem is likely in the 3D related parts(wined3d, OpenGL driver). That's because all the other components haven't been changed. If there is a noticeable difference between (2) and (3) the problem is either in the non-3D parts(Rest of Wine, Linux kernel, ...). Make sure the game runs in the same codepath in all 4(or 3) cases - some games can use both d3d9 and d3d10 for example.
  
A difference between (2) and (3) may also indicate a problem in the Linux 3D driver that does not occur on Windows. If you are using AMD's or Nvidia's binary drivers this is unlikely because the Windows and Linux drivers share a pretty big common core. On OSX there is less code sharing, so if you're debugging performance issues on OSX the difference between (3) and (4) can matter.
+
A difference between (2) and (3) may also indicate a problem in the Linux 3D driver that does not occur on Windows. If you are using AMD's or Nvidia's binary drivers this is unlikely because the Windows and Linux drivers share a pretty big common core. On macOS there is less code sharing, so if you're debugging performance issues on macOS the difference between (3) and (4) can matter.
  
4) Oprofile
+
3c) Limited by CPU-GPU synchronization
  
Now that you have a rough idea if you're looking for issues in 3D or non 3D parts it's time to separate modules further. Oprofile or other profilers can help here.
+
Ideally GPU and CPU should run as independent of each other as possible, to allow them to work at the same time without interruptions and exploit the available resources to the fullest.
 +
It's the application that ultimately decides how to set up its rendering pipeline, which means that sometimes there isn't much that Wine can do to avoid costly CPU-GPU synchronization points. That's one more reason why comparing performance with Windows is important.
 +
Assuming a well-behaved application, wined3d should strive to avoid introducing its own synchronization points. To figure out if there is room for improvement, the only practical thing to do is to look at the d3d log by hand (probably with some additional traces around wined3d_cs_finish() or its callers or other interesting points). I've also found it useful to make a log with just those ad-hoc traces (so to not slow down the game too much) and check it "realtime" with the game and maybe perf.
 +
 
 +
4) Profilers
 +
 
 +
Now that you have a rough idea if you're looking for issues in 3D or non 3D parts it's time to separate modules further. perf, sysprof or other profilers can help here.
  
 
Case A): 3D related problems:
 
Case A): 3D related problems:
  
 
The main question is if the GPU time is spent in wined3d.dll or the driver. If more than 5% CPU time are spent in wined3d this is suspicious. If a lot of CPU time is spent in the driver libraries(> 30% if you need some threshold) it may indicate a bug in the driver itself, or inefficient GL calls made by wined3d. With binary drivers the difference doesn't matter much since you can't change the driver anyway.
 
The main question is if the GPU time is spent in wined3d.dll or the driver. If more than 5% CPU time are spent in wined3d this is suspicious. If a lot of CPU time is spent in the driver libraries(> 30% if you need some threshold) it may indicate a bug in the driver itself, or inefficient GL calls made by wined3d. With binary drivers the difference doesn't matter much since you can't change the driver anyway.
 +
 +
Specifically for CSMT, a high wined3d_cs_run() CPU usage most likely means that the command stream thread is spinning a lot waiting for new commands to execute. That in turn points to the application thread not being able to keep up with the pace of the command stream / GPU execution (possibly because of wined3d stuff in the application thread) or that there is a lot of synchronization with the command stream thread and the queue is flushed a lot. A lot of time being spent in wined3d_cs_emit_present(), instead, suggests that the command stream thread is the bottleneck. In turn, that might be busy on the GPU or CPU side. In the latter case, driver's threaded optimizations might help, if available.
  
 
Case B): Non 3D related problems:
 
Case B): Non 3D related problems:
Line 137: Line 146:
 
Telling those apart is not easy. A driver may not support OpenGL extensions used for speedups, causing high CPU usage in wined3d (e.g. GL_ARB_vertex_buffer_object, GL_ARB_vertex_array_bgra). WineD3D may make inefficient calls causing high CPU usage in the driver.
 
Telling those apart is not easy. A driver may not support OpenGL extensions used for speedups, causing high CPU usage in wined3d (e.g. GL_ARB_vertex_buffer_object, GL_ARB_vertex_array_bgra). WineD3D may make inefficient calls causing high CPU usage in the driver.
  
A rule of thumb way is to try a different GPU/driver. If a game is slow on Nvidia GPUs(compared to Windows) and fast on AMD GPUs(compared to Windows on the same card) then the problem may be in the driver. If you have a binary driver you may as well just assume that the problem is in wined3d since otherwise you're mostly out of luck anyway.
+
A rule of thumb way is to try a different GPU/driver. If a game is slow on Nvidia GPUs (compared to Windows) and fast on AMD GPUs (compared to Windows on the same card) then the problem may be in the driver. If you have a binary driver you may as well just assume that the problem is in wined3d since otherwise you're mostly out of luck anyway.
  
 
If you isolate a problem in the driver don't forget to report it to the driver vendor.
 
If you isolate a problem in the driver don't forget to report it to the driver vendor.
Line 143: Line 152:
 
6) Other hints:
 
6) Other hints:
  
* GPU vendors offer various performance debugging tools. Those can be helpful for debugging GPU or bandwidth limitations. They're usually focused on Windows, so you're probably better off by debugging wined3d on windows
+
* GPU vendors offer various performance debugging tools. Those can be helpful for debugging GPU or bandwidth limitations. They're usually focused on Windows, so you're probably better off by debugging wined3d on Windows
* OSX has some nice tools too, specifically the OpenGL profiler and the Driver Monitor. If you have OSX available it may give you some helpful hints even if you are mainly focusing on fixing a performance bug on Linux.
+
 
* Windows has helpful tools as well, for example [http://msdn.microsoft.com/en-us/library/ms182404(VS.80).aspx VsPerfMon] (Kinda like oprofile). To use those you may have to compile wined3d with Visual Studio to get Microsoft-Style debug symbols.
 
* Windows has helpful tools as well, for example [http://msdn.microsoft.com/en-us/library/ms182404(VS.80).aspx VsPerfMon] (Kinda like oprofile). To use those you may have to compile wined3d with Visual Studio to get Microsoft-Style debug symbols.
 
* Don't hesitate to contact wine-devel for help.
 
* Don't hesitate to contact wine-devel for help.

Latest revision as of 18:55, 17 January 2018

Now that Wine has matured to the point of running many applications correctly, people are expecting them to run as fast as on Windows. Sadly, this is not always the case. Here are a few notes related to tracking down performance issues.

User Tips

Surprising as it may be, there really isn't a magic toggle to make everything faster. Anyway, for games, you can experiment with the following short list of settings:

  • if the game has an OpenGL mode, you could try using that. Note that this sometimes enables code paths in the game that aren't well tested (or are plain broken)

otherwise

  • enable CSMT (https://wiki.winehq.org/Useful_Registry_Keys)
  • if you run your game from the command line, run it with WINEDEBUG=-all. That disables a bunch of error checking and validation
  • enable threaded optimization in the GL drivers: you can do that with environment variables, that's mesa_glthread=true for Mesa drivers or __GL_THREADED_OPTIMIZATIONS=1 for Nvidia proprietary drivers
  • disabling UseGLSL might slightly improve Direct3D performance and reduce shader compilation stuttering (but that has a chance to work only for d3d9 or older games and usually only with the Nvidia proprietary driver)
  • anything listed in the appdb as helping your particular game. For instance, http://appdb.winehq.org/objectManager.php?sClass=version&iId=19065 says Batman Arkham Asylum runs a lot faster if you set AmbientOcclusion=False in UserEngine.ini.
  • anything listed on the web as speeding up the Windows version. (For instance, http://www.hardocp.com/article/2009/10/19/batman_arkham_asylum_physx_gameplay_review/3 suggests Batman Arkham Asylum runs a lot faster if you set MotionBlur=False in that same file.)
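The environment variables from the list above can be set from a small launcher script. This is only a sketch: the helper names build_env and run_game are made up for illustration, and it assumes a wine binary is on your PATH. The variable names themselves are the ones mentioned in the list.

```python
import os
import subprocess


def build_env(base_env, gl_driver):
    """Return a copy of base_env with the performance-related variables set.

    gl_driver is a hypothetical selector: "mesa" or "nvidia".
    """
    env = dict(base_env)
    # Silence Wine's debug output, as suggested in the list above.
    env["WINEDEBUG"] = "-all"
    if gl_driver == "mesa":
        env["mesa_glthread"] = "true"
    elif gl_driver == "nvidia":
        env["__GL_THREADED_OPTIMIZATIONS"] = "1"
    else:
        raise ValueError("gl_driver must be 'mesa' or 'nvidia'")
    return env


def run_game(exe_path, gl_driver):
    """Launch the game under Wine with the tweaked environment."""
    env = build_env(os.environ, gl_driver)
    return subprocess.run(["wine", exe_path], env=env).returncode
```

Whether any of these settings helps depends on the game and the driver, so compare frame rates with and without them.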


The rest of this document is meant for developers more than users.

Related Discussion

Performance-related bugs

Here are some performance-related bugzilla queries:

Known Bottlenecks

  • applications that frequently compile D3D shaders may suffer from stuttering; see e.g. http://bugs.winehq.org/show_bug.cgi?id=23832 ('winetricks glsl-disable' helps some games with this, as it translates D3D bytecode into the lower level ARB_{vertex|fragment}_program language rather than GLSL and the driver is generally quicker at compiling those. Only works on Nvidia, though.)
  • applications that rely on precise thread scheduling will be disappointed (see e.g. http://hisouten.koumakan.jp/wiki/Linux_support#Resolved_bugs )
  • applications that use kernel events heavily may run slowly (e.g. Unreal Tournament 3 - Stefan?)
  • applications that are picky about what core a thread runs on may be disappointed (see e.g. http://bugs.winehq.org/show_bug.cgi?id=19748 )
  • some applications don't interact well with power management and supposedly get 30% to 80% higher frame rate if you run at full power unconditionally (see http://bugs.winehq.org/show_bug.cgi?id=24558)

Kernel?

The Linux kernel might have something to do with performance. CONFIG_NO_HZ might make a difference, for instance. Here are a few notes on the subject:

Measuring Performance

If you're making a tweak that speeds up one application, you probably want to run a bunch of different performance benchmarks before and after, to make sure it doesn't slow other applications down.
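One way to organize such a before/after comparison is sketched below. The function name find_regressions, the 5% tolerance and the benchmark names in the usage example are assumptions for illustration; the scores are whatever "higher is better" number your benchmarks report.

```python
def find_regressions(before, after, tolerance=0.05):
    """Return the benchmarks whose score dropped by more than `tolerance`.

    before/after map benchmark name -> score (higher is better).
    Benchmarks missing from either run are ignored.
    """
    regressions = []
    for name in sorted(before.keys() & after.keys()):
        if after[name] < before[name] * (1.0 - tolerance):
            regressions.append(name)
    return regressions


# Made-up scores: the tweak helps "game_a" but hurts "game_b".
slower = find_regressions(
    {"game_a": 100.0, "game_b": 60.0},
    {"game_a": 110.0, "game_b": 50.0},
)
```

The tolerance is there because benchmark results are noisy; running each benchmark several times and comparing averages is more reliable than a single run.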

Many benchmarks exist for Windows. ["yagmark"] is a shell script that knows how to download and run several of them. 3DMark 06 in particular is a classic benchmark we'd like to do better on.

Looking for Bottlenecks

There are many profiling tools that can help find bottlenecks in Wine.

Here are a few that have been used with some success:

  • perf
  • oprofile (classic sampling whole-system profiler for linux, can profile into kernel, too)
  • sysprof (newer, easier sampling whole-system profiler for linux)

There are also a few graphic debug tools that might come in handy, like apitrace or renderdoc.

It's possible to build wined3d for Windows; this is cool because it might let you figure out if the bottleneck is in wined3d itself.

Game Performance Debugging Tutorial

First, the bad news: Often there is not a magic bullet to improve performance. It needs careful debugging and may be tricky to fix.

Note that "unlikely" below doesn't mean "impossible", neither does "likely" mean "always". Rules of thumb can be wrong, so some intuition, double-checking of assumptions and deviations from the guide are necessary.

1) Basic steps for debugging Direct3D performance problems:

Before digging too deep check a few common issues:

  • Keep your software up to date
  • Make sure you actually have a problem: Test the game on Windows on the same hardware. If it runs at the same speed there's probably nothing Wine can do to fix it. You're free to try though.
  • Make sure your System is OK. No background processes consuming CPU time, etc. Try disabling Desktop compositing
  • Check for log messages. Sometimes Wine knows it hits a slow path and writes a FIXME or WARN. Don't fixate too much on such warnings, they may indicate a comparatively minor problem.
  • Watch out for high wineserver CPU usage
  • Currently 80% of Windows performance is a high water mark, few games run faster. Many games run around 50%, which means there's a lot of room for improvement. If your game runs even slower there are probably some nasty bugs around.
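For the log check in the list above, a quick tally of messages per debug channel can show which ones dominate. The sketch below assumes log lines look like "0009:fixme:d3d:function message" or "fixme:d3d:function message" (class names in lowercase, optional thread id prefix); adjust the pattern if your Wine version prints something different. The name count_messages is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: [tid:]class:channel:rest
LINE_RE = re.compile(r"^(?:[0-9a-fA-F]+:)?(fixme|warn|err):([^:\s]+):")


def count_messages(log_lines):
    """Count fixme/warn/err messages per (class, channel) pair."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

A channel that shows up thousands of times per second is worth a look; a handful of messages at startup usually is not.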

Some words about CSMT. The basic idea is to segregate wined3d's GL calls to a separate thread: wined3d functions indirectly called by the application threads queue messages to the command stream thread which, on its part, essentially keeps checking the queue and executing whatever instruction it finds there.
CSMT was initially developed as a way to enhance wined3d performance, with the "theory" being that by splitting the potentially expensive GL calls out of the application thread we can quickly return to the application code and hopefully lessen our CPU-side woes. That turned out to work pretty well but with a caveat: some commands (e.g. mapping buffers) actually NEED to wait for the operation to complete since they have to return data to the application. Without some "tricks", when we get one of those commands we effectively have to wait until all the commands queued before it complete their execution, which means that the application thread has to block and wait, potentially throwing a lot of the performance improvement out of the window.
The version of CSMT initially introduced in "official" Wine does without some of those tricks (specifically, those that weren't generally safe) and has some room for further performance improvements. It still works perfectly fine as a better replacement of the StrictDrawOrdering option.
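The queueing idea can be illustrated with a toy model. This is not wined3d code: the CommandStream class and its method names are invented for this sketch, and plain Python callables stand in for GL calls. It only shows why a command that returns data forces the application thread to wait for everything queued before it.

```python
import queue
import threading


class CommandStream:
    """Toy command stream: one worker thread executes queued commands in order."""

    def __init__(self):
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            command = self._queue.get()
            try:
                if command is None:  # shutdown marker
                    return
                command()
            finally:
                self._queue.task_done()

    def emit(self, command):
        """Queue a command and return to the caller immediately."""
        self._queue.put(command)

    def finish(self):
        """Block until every queued command has been executed."""
        self._queue.join()

    def map_buffer(self, buffer):
        """Needs up-to-date data, so it has to synchronize first."""
        self.finish()
        return list(buffer)

    def shutdown(self):
        self._queue.put(None)
        self._worker.join()


cs = CommandStream()
vertex_data = []
for value in range(3):
    # The "application thread" queues work without waiting for it.
    cs.emit(lambda value=value: vertex_data.append(value))
mapped = cs.map_buffer(vertex_data)  # blocks until the queue is drained
cs.shutdown()
```

In this model emit() is cheap for the caller, while map_buffer() stalls it; an application that maps buffers after every few draws gets little benefit from the extra thread.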

2) GPU, CPU or sync limited?

3D rendering is a complex process where multiple components have to play together:

  • The GPU has to render the scene fast enough
  • Software on the CPU side (driver, game, ...) has to generate rendering commands fast enough
  • Software and GPU should, as much as possible, not have to wait for each other

If your game slows down when you increase the screen resolution (or speeds up when you lower it) you are probably GPU limited. This is because the CPU rarely touches single pixels, so with increased resolution only the strain on the GPU increases. Exceptions may apply, especially in older ddraw/d3d7 based games.
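The resolution test can be written down as a simple rule of thumb. In this sketch the function name and the 10% cutoff are assumptions, not measured values; pick whatever margin is clearly above your frame rate noise.

```python
def likely_gpu_limited(fps_low_res, fps_high_res, min_drop=0.10):
    """Return True if raising the resolution costs at least `min_drop` of the FPS.

    fps_low_res and fps_high_res are average frame rates of the same scene
    at a low and a high screen resolution.
    """
    if fps_low_res <= 0:
        raise ValueError("fps_low_res must be positive")
    drop = (fps_low_res - fps_high_res) / fps_low_res
    return drop >= min_drop
```

If the frame rate barely moves between, say, 800x600 and 1920x1080, the GPU is probably not the limiting factor and the CPU or synchronization sections below are more relevant.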

Figuring out if the issue is excessive synchronization is a bit harder. A significant portion of wined3d time being spent in wined3d_cs_mt_finish() (with CSMT enabled) might hint in this direction, although that isn't necessarily the case and there might be synchronization points hidden in other spots.

3a) GPU limited situation:

If you're GPU limited the problem could be inefficient shaders, or incorrectly configured render target formats. Try ARB shaders (if the game is happy with Shader Model 2.0 or you have an Nvidia GPU), and check logs for usage of 16 bits per channel or floating point render targets.

Unless you have a low end GPU or play at a very high resolution GPU limitations are rare. A seeming GPU limitation may also be a sign of a software rendering fallback in the fragment pipeline.

3b) CPU limitation

Most performance issues on mid-range or high-end systems are CPU side bottlenecks. The problem can be in pretty much every component involved in running the game:

  • The Linux Kernel
  • The X server
  • The 3D parts of Wine
  • The non-3D parts of Wine
  • The 3D driver

It is tricky to tell those apart. A useful first test is comparing the following metrics:

  1. The performance on a normal Windows system
  2. The performance on Windows while running the game with wined3d
  3. The performance on Linux/macOS with Wine
  4. If you are on macOS: The performance with Wine+macOS.

If there is a noticeable difference between (1) and (2) the problem is likely in the 3D related parts (wined3d, OpenGL driver). That's because all the other components haven't been changed. If there is a noticeable difference between (2) and (3) the problem is likely in the non-3D parts (rest of Wine, Linux kernel, ...). Make sure the game runs in the same codepath in all 4 (or 3) cases - some games can use both d3d9 and d3d10, for example.

A difference between (2) and (3) may also indicate a problem in the Linux 3D driver that does not occur on Windows. If you are using AMD's or Nvidia's binary drivers this is unlikely because the Windows and Linux drivers share a pretty big common core. On macOS there is less code sharing, so if you're debugging performance issues on macOS the difference between (3) and (4) can matter.
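The comparison of measurements (1), (2) and (3) can be summarized as a small decision helper. The function name and the 15% "noticeable" margin are assumptions for illustration; the hints it returns are just the rules of thumb from the text, not a diagnosis.

```python
def locate_cpu_bottleneck(fps_windows, fps_windows_wined3d, fps_wine,
                          noticeable=0.15):
    """Suggest where to look, given FPS for cases (1), (2) and (3).

    fps_windows:         (1) normal Windows system
    fps_windows_wined3d: (2) Windows running the game with wined3d
    fps_wine:            (3) Linux/macOS with Wine
    """
    hints = []
    if fps_windows_wined3d < fps_windows * (1.0 - noticeable):
        hints.append("3D parts (wined3d, OpenGL driver)")
    if fps_wine < fps_windows_wined3d * (1.0 - noticeable):
        hints.append("non-3D parts (rest of Wine, kernel) or the Linux 3D driver")
    return hints
```

An empty result means none of the gaps exceeds the margin, in which case the game is probably not much slower than on Windows to begin with.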

3c) Limited by CPU-GPU synchronization

Ideally the GPU and CPU should run as independently of each other as possible, so that they can work at the same time without interruptions and exploit the available resources to the fullest. It's the application that ultimately decides how to set up its rendering pipeline, which means that sometimes there isn't much that Wine can do to avoid costly CPU-GPU synchronization points. That's one more reason why comparing performance with Windows is important. Assuming a well-behaved application, wined3d should strive to avoid introducing its own synchronization points. To figure out if there is room for improvement, the only practical thing to do is to look at the d3d log by hand (probably with some additional traces around wined3d_cs_finish() or its callers or other interesting points). I've also found it useful to make a log with just those ad-hoc traces (so as not to slow down the game too much) and check it in real time alongside the game and maybe perf.
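Once you have a log made of ad-hoc traces, a short script can summarize how many synchronization points occur per frame. The sketch below assumes two things that are entirely up to you: a marker string that your own trace prints at each synchronization point, and another that it prints once per frame (for example at present time). Neither marker is something Wine prints by default, and the function name is made up.

```python
def syncs_per_frame(log_lines, sync_marker, frame_marker):
    """Count lines containing sync_marker between consecutive frame markers.

    Both markers are hypothetical strings emitted by your own ad-hoc traces.
    Returns one count per completed frame.
    """
    counts = []
    current = 0
    for line in log_lines:
        if sync_marker in line:
            current += 1
        if frame_marker in line:
            # Frame boundary: store the count and start over.
            counts.append(current)
            current = 0
    return counts
```

A frame with an unusually high count is a good place to start reading the full log by hand.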

4) Profilers

Now that you have a rough idea whether you're looking for issues in the 3D or non-3D parts, it's time to separate modules further. perf, sysprof or other profilers can help here.

Case A): 3D related problems:

The main question is whether the CPU time is spent in wined3d.dll or in the driver. If more than 5% of CPU time is spent in wined3d, this is suspicious. If a lot of CPU time is spent in the driver libraries (> 30% if you need some threshold), it may indicate a bug in the driver itself, or inefficient GL calls made by wined3d. With binary drivers the difference doesn't matter much since you can't change the driver anyway.
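As a rough illustration of the thresholds above, here is a sketch that takes a per-module CPU share (copied by hand from whatever your profiler reports) and flags the suspicious cases. The module names in the sample data are hypothetical, as is the function name; only the 5% and 30% figures come from the text.

```python
WINED3D_LIMIT = 5.0   # percent of CPU time, from the text above
DRIVER_LIMIT = 30.0   # percent of CPU time, from the text above


def flag_3d_hotspots(cpu_share, driver_modules):
    """cpu_share maps module name -> percent of CPU samples.

    driver_modules is the set of names you consider part of the GL driver.
    """
    wined3d = sum(pct for mod, pct in cpu_share.items() if mod.startswith("wined3d"))
    driver = sum(pct for mod, pct in cpu_share.items() if mod in driver_modules)
    findings = []
    if wined3d > WINED3D_LIMIT:
        findings.append(f"wined3d at {wined3d:.1f}%: suspicious")
    if driver > DRIVER_LIMIT:
        findings.append(
            f"driver at {driver:.1f}%: driver bug or inefficient GL calls from wined3d"
        )
    return findings


# Hypothetical profile; the module names are placeholders, not real output.
sample = {"wined3d.dll.so": 12.5, "libGL-example.so": 8.0, "game.exe": 40.0}
print(flag_3d_hotspots(sample, driver_modules={"libGL-example.so"}))
```

The thresholds are rules of thumb, so a module just under the limit can still be worth a look.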

Specifically for CSMT, high CPU usage in wined3d_cs_run() most likely means that the command stream thread is spinning a lot while waiting for new commands to execute. That in turn points to the application thread not being able to keep up with the pace of the command stream / GPU execution (possibly because of wined3d work in the application thread), or to a lot of synchronization with the command stream thread, so that the queue is flushed a lot. A lot of time being spent in wined3d_cs_emit_present(), instead, suggests that the command stream thread is the bottleneck. That thread, in turn, might be busy on the GPU or CPU side. In the latter case, the driver's threaded optimizations might help, if available.
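The CSMT hints above fit in a small lookup table, which can be handy when going through a long profile. This is a sketch only: the function name and the 10% cut-off are assumptions, and the per-symbol percentages are expected to be copied from your profiler's output.

```python
# Interpretation of hot wined3d symbols, paraphrased from the text above.
CSMT_HINTS = {
    "wined3d_cs_run": (
        "command stream thread is spinning: the application thread cannot keep up, "
        "or there is a lot of synchronization and queue flushing"
    ),
    "wined3d_cs_emit_present": (
        "command stream thread is the bottleneck (busy on the GPU or CPU side)"
    ),
}


def csmt_hints(symbol_share, min_share=10.0):
    """symbol_share maps symbol name -> percent of CPU samples.

    min_share is an arbitrary cut-off (assumption) below which a symbol is ignored.
    """
    return [
        f"{sym} ({share:.1f}%): {CSMT_HINTS[sym]}"
        for sym, share in sorted(symbol_share.items())
        if sym in CSMT_HINTS and share >= min_share
    ]


# Made-up numbers for illustration.
for hint in csmt_hints({"wined3d_cs_run": 35.0, "memcpy": 20.0}):
    print(hint)
```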

Case B): Non-3D related problems:

Usual suspects are ntdll.dll, kernel32.dll, wineserver, various C/C++ runtime libraries, the Linux kernel. You may be able to use the native version of the library in Wine. If that fixes your issues you know where to fix it in Wine. Again note that a high CPU usage in the Linux kernel or in low level Wine libs may be caused by inefficient calls to that library from a higher level, so be careful before taking out the pitchforks.

5) Telling wined3d problems from driver problems

Telling those apart is not easy. A driver may not support OpenGL extensions used for speedups, causing high CPU usage in wined3d (e.g. GL_ARB_vertex_buffer_object, GL_ARB_vertex_array_bgra). WineD3D may make inefficient calls, causing high CPU usage in the driver.

A rule of thumb is to try a different GPU/driver. If a game is slow on Nvidia GPUs (compared to Windows) and fast on AMD GPUs (compared to Windows on the same card), then the problem may be in the driver. If you have a binary driver, you may as well just assume that the problem is in wined3d, since otherwise you're mostly out of luck anyway.

If you isolate a problem in the driver don't forget to report it to the driver vendor.

6) Other hints:

  • GPU vendors offer various performance debugging tools. Those can be helpful for debugging GPU or bandwidth limitations. They're usually focused on Windows, so you're probably better off debugging wined3d on Windows.
  • Windows has helpful tools as well, for example VsPerfMon (kind of like oprofile). To use those you may have to compile wined3d with Visual Studio to get Microsoft-style debug symbols.
  • Don't hesitate to contact wine-devel for help.
  • Do your homework before contacting driver developers or other projects. Installing Wine and a Windows game means a lot of work for non-Wine people, so make sure you're not wasting their time.

Tutorial by Stefan Dösinger