Although at first, disassembly and decompilation may seem like very useful techniques for Wine developers to use, they should probably only be seen as a last resort for a few different reasons. The purely practical reason is that in most cases, compilation strips and optimizes away much of what makes the original source code easy for a human to understand. It often requires a lot of analysis, work, and skill just to have a reasonable guess at what a bit of diassembled code is doing. Other debugging techniques will usually allow you to narrow down your problem faster and more clearly. This is especially true for native Windows DLLs, where disassembly is almost always useless (in addition to bringing up the issue below).
The other problem is more legal in nature. Because Wine is very strict about Clean Room Guidelines, in order to keep from violating any Microsoft copyrights, any contamination of the "cleanroom" by a developer puts the whole project at risk. While this shouldn't be an issue so long as you are only disassembling a third-party application, even briefly studying disassembled Microsoft code for a feature that Wine plans to implement could lead to legal trouble. While in theory it is still possible for someone that has disassembled Microsoft code to document what the code does generally, in their own words, this can become a legal grey area. If in doubt about whether you might violate the Clean Room Guidelines, or if you think you may but would still like to help the project somehow, please ask on the Wine developer's mailing list.
Do not try submitting disassembled or decompiled code to the project. It's been tried before, it didn't work then, it won't work now, and the developers know disassembled code when they see it. Not only will your patch not be accepted, but your reputation will be permanently tarnished, and you probably will not be allowed to contribute to Wine again.
If other methods haven't worked and you want to disassemble a program, you can actually use Wine's built-in debugger. To disassemble a program, first start winedbg and attach it to a process the same way you would for collecting backtraces. When you want to begin disassembling, issue the disas command to winedbg. You can also specify a starting address or an address range to disassemble over with disas <expression> or disas <expression>,<expression> respectively.
When working backwards from a particular address try and choose aligned addresses, otherwise you risk starting the disassembly from the middle of an instruction which will produce garbage. Also, if you aren't already familiar with x86 assembly, you might consider practicing by using the disassembler in gdb on ELF binaries that you've compiled yourself, then going back and comparing the disassembled code to the original source code. The Intel instruction set is vast, but very few instructions are used frequently. If you know what to expect, you won't be tripped up by bad disassembles or anti-disassembly tricks which can corrupt the output.
There are a few common patterns that you especially might want to look for:
- Code which moves values into 0x0(%fs) is probably setting up an SEH handler frame.
- The return code of a function is stored in %eax on all calling conventions used in Windows, so if you see a function call followed by a check against this register, you're very likely to be seeing a check for an error code.
- Call instructions which dereference a value, such as call *0x12345678 are usually calls into other DLLs (often API calls). Winedbg should be able to show you which API the call instruction is invoking. Also keep in mind that instruction sequences that move a value into a register from the stack and then dereference it in a call instruction could well be COM method invocations.
- C++ applications compiled using Microsoft Visual C++ (the most popular C++ compiler on Windows) use the "thiscall" calling convention, in which the instance pointer is stored in %ecx. If you get a crash dereferencing the value of %ecx in a C++ program, you might be seeing a null pointer dereference on an object instance. This is especially likely if the offset is quite large, i.e. a crash accessing 0x36 is a pretty good indication that it's a C++ object and not just a very large C struct.
- Optimizing compilers do a lot of interesting things, such as rearranging instructions to better match the internal scheduling algorithms on superscalar processors. This can seriously obfuscate the assembly stream, so don't be at all surprised if you find it hard to read. A common trick is to use xorl %eax, %eax to clear the %eax register rather than the more intuitive movl $0, %eax. Why? Because the instruction is smaller, so it takes less space, so it uses the CPU cache more effectively.