Earlier this year, I released a feature for my score tracker that allowed players to post in-game scores to Discord by pressing a button on the numpad. Sound familiar? While the code was about as straight-forward as it could possibly get, every now and then I’d receive a vague issue report of the ‘button not doing anything’ that I wasn’t able to reproduce on my end.
Well, fast forward 10 months later and I’ve finally fixed it. Please consider this post an apology for the delay.
To understand where things went wrong, let’s take a look at the function I’m hooking:
.text:0000000180A92680 89 54 24 10 mov [rsp+arg_8], edx .text:0000000180A92684 48 89 4C 24 08 mov [rsp+arg_0], rcx .text:0000000180A92689 48 83 EC 48 sub rsp, 48h .text:0000000180A9268D C7 44 24 20 00 00 00 00 mov [rsp+48h+var_28], 0 .text:0000000180A92695 EB 0A jmp short loc_180A926A1
There’s nothing actually noteworthy here, it’s just the prologue of a rather lengthy function. For reasons unknown to present me, past me decided against making a pattern from bytes in this function, and followed the only reference it had instead.
.rdata:000000018126A6A0 ; const StageResultDrawFrame::`vftable' .rdata:000000018126A6A0 90 0A A9 80 01 00 00 00 ??_7StageResultDrawFrame@@6B@ dq offset sub_180A90A90 .rdata:000000018126A6A0 ; DATA XREF: sub_180A90690+32↑o .rdata:000000018126A6A8 D0 0D A9 80 01 00 00 00 dq offset sub_180A90DD0 .rdata:000000018126A6B0 80 26 A9 80 01 00 00 00 dq offset sub_180A92680
Since the target was a virtual function, I made a pattern to an instruction containing the start address of the table, 0x18126A6A0
, dereferenced that, then added 24 bytes to get to the third function, giving me a final address of 0x18126A6B0
.
Here’s what the code for that looked like, excluding the pattern scan for the sake of brevity:
target = 0x18126A6A0 + (sizeof(void*) * 3); // 0x18126A6B0
target = *reinterpret_cast<std::uintptr_t*>(target); // 0x180A92680
target -= _library->image_base(); // 0x000A92680
target += _library->base_address(); // 0x180A92680 * may differ depending on base address
Do you see the problem? I didn’t. Until a few hours ago, I was sure this was doing what I thought it was doing.
If you look again at the value of 0x18126A6B0
, you can see that it points to the absolute address of 0x180A92680
.
From a quick glance, I assumed that this was relative to the image base, which in this case is 0x180000000
. But since we can’t guarantee the library will always be loaded at the image base defined in the PE header (due to potential conflicts or ASLR), we’ll have to subtract the image base, then add the real base of where the module was actually loaded instead, right?
I’ll admit, the more I think about it now, the more it doesn’t really hold up to scrutiny.
I would’ve caught it there and then during testing, but for some reason, on my machine and the machines of everyone else who successfully used this feature, the game library would always load at its preferred image base value of 0x180000000
.
In all my time working on the game, across the countless reboots, I’ve never seen it load anywhere else.
Fast forward to the present and someone else is having the same issue again. We’ve already eliminated the obvious options, so I decide to try hooking a different function and send over a test build. It’s functionally identical to the non-working builds, just using a different pattern scan which leads to a similar function, but still calling the same underlying input polling code.
I was a little surprised when they reported back confirming the new build had fixed the issue. I would’ve been more than happy to switch out the pattern, push out a new release and call it a day, but something caught my eye in the log they had sent over…
2023-10-25 12:37:41 DEBUG [hook_manager.h:37] Installed hook at 0x1B868600690 forwarding to 0x7FF9777C7B90. 2023-10-25 12:37:41 DEBUG [hook_manager.h:37] Installed hook at 0x1B868595B90 forwarding to 0x7FF9777C7A50. 2023-10-25 12:37:41 DEBUG [hook_manager.h:37] Installed hook at 0x1B868554A30 forwarding to 0x7FF9777C7CE0. 2023-10-25 12:37:41 DEBUG [hook_manager.h:37] Installed hook at 0x1B86861CE70 forwarding to 0x7FF9777C8050.
The game library had loaded somewhere completely different for them. Again, this isn’t abnormal, but it’s something I’ve never been able to pull off while testing, so I asked them to try the non-working build one last time and send over that log…
2023-10-25 13:33:28 DEBUG [hook_manager.h:37] Installed hook at 0x2A2D2030690 forwarding to 0x7FF9779981A0. 2023-10-25 13:33:28 DEBUG [hook_manager.h:37] Installed hook at 0x2A2D1FC5B90 forwarding to 0x7FF977998060. 2023-10-25 13:33:28 DEBUG [hook_manager.h:37] Installed hook at 0x2A2D1F84A30 forwarding to 0x7FF9779982F0. 2023-10-25 13:33:28 DEBUG [hook_manager.h:37] Installed hook at 0x544235D2680 forwarding to 0x7FF977998660.
…and there it was. The fourth hook, the one that I had changed in the test build, was pointing to a completely incorrect address. Something was obviously wrong about my “subtract image base, then add base address” approach, but how could I test it?
My initial thought was to simply edit the image base address in the PE header. I’m sure you can imagine how well that went. Thinking I was on the right track, I tried rebasing the library. Turns out there’s a tool from Visual Studio that can do that.
A:\>editbin /REBASE:BASE=0x200000000 client.dll Microsoft (R) COFF/PE Editor Version 14.36.32537.0 Copyright (C) Microsoft Corporation. All rights reserved. LINK : fatal error LNK1282: unable to rebase 'client.dll'; it has been signed
…Okay, fine, I guess that makes sense.
A:\>signtool remove /s client.dll Successfully committed changes to the file: A:\client.dll Number of errors: 0 A:\>editbin /REBASE:BASE=0x200000000 client.dll Microsoft (R) COFF/PE Editor Version 14.36.32537.0 Copyright (C) Microsoft Corporation. All rights reserved.
It should try to load at 0x200000000
now, but the value of that 3rd virtual function address will still be 0x180A92680
, right?
Nope. If we convert 0x18126A6B0
to a file offset, then inspect it in a hex editor, we can see it changed in the rebased version.
What’s going on here? This must mean that the editbin
tool somehow knew where all the absolute addresses in the binary were, and as part of the rebase operation, amended all of them to be relative to the new image base.
Turns out, that’s exactly what happened. Rather than attempt to explain what happened here, I’ll just link this blog post and a Stack Exchange answer that helped me get a better understanding of what was going on. Long story short, the DLL contains a table of ‘relocations’, which is used by the PE loader when mapping a library into a process.
With that in mind, knowing the loader has fixed the address for us already, here’s what the code should’ve looked like:
target = 0x18126A6A0 + (sizeof(void*) * 3); // 0x18126A6B0
target = *reinterpret_cast<std::uintptr_t*>(target); // 0x180A92680 * may differ depending on base address
Finally, I rebased the game library again to test my fix. This time, using the base address of kernel32.dll
, since it’s loaded into pretty much everything, and also has the same base address across all of those processes, only being randomized upon rebooting.
As far as bugs go, this one wasn’t too bad. Didn’t cause any instability for people it did affect, and I got to learn a few new things in the process of fixing it. At the same time, it’s a long overdue fix and I’m glad I don’t have to worry about it anymore.