This is the second in a series of posts on streamlining score sharing in beatmania IIDX. Using the data we found last time, we'll build an internal library to read score data from memory, find and hook a function to run our code on the result screen, and finally, hijack an import to get our library loaded automatically.

Before we get started, let me quickly emphasise the importance of testing. Haphazardly scanning memory is fine by me, so long as the results are reliable! After spending some time trying out various gameplay permutations, I found that some things didn't quite add up and needed adjusting.

All the updated constants, structures, classes, and a few new utility functions can be found in the accompanying header files.

Without further ado, let's get started! I'll be trying to keep the code simple, albeit a little untidy, for expository purposes. Feel free to re-organise and split things up into separate classes and files as you see fit.

#include <windows.h>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

#include "bm2dx_util.h" // bm2dx::get_chart, bm2dx::get_judgement, GetAbsoluteAddress, etc.
// Also include the accompanying headers declaring state_t, judgement_t,
// StageResultDrawFrame and the *_addr offsets.

std::uintptr_t bm2dx_addr = 0;      // Base address of the 'bm2dx.exe' module.
state_t* state = nullptr;           // Structure containing various game state data.
judgement_t* judgement = nullptr;   // Structure containing judgement data for both players.
const char* player_name = nullptr;  // DJ name of the player, works for both sides.
void* get_pacemaker_data = nullptr; // A function for retrieving pacemaker data.

// Print score data out to the console. Just a placeholder for now.
void scorehook_dump(StageResultDrawFrame* frame = nullptr) {}

DWORD WINAPI scorehook_init(LPVOID dll_instance) {
  // Create a console window and redirect stdout to it.
  AllocConsole();
  FILE* console_out = nullptr;
  freopen_s(&console_out, "CONOUT$", "w", stdout);

  // Get the base address of bm2dx.exe.
  bm2dx_addr = (std::uintptr_t) GetModuleHandleA("bm2dx.exe");

  // Cast various interesting areas of game memory.
  state = (state_t*) (bm2dx_addr + game_state_addr);
  judgement = (judgement_t*) (bm2dx_addr + judgement_addr);
  player_name = (const char*) (bm2dx_addr + player_name_addr);
  get_pacemaker_data = (void*) (bm2dx_addr + pacemaker_addr);

  // Poll the keyboard: F9 dumps the score data, F10 unloads the library.
  do {
    if (GetAsyncKeyState(VK_F9))
      scorehook_dump();
    else if (GetAsyncKeyState(VK_F10))
      break;

    Sleep(100);
  } while (true);

  // Print some text, just so we know that something is happening.
  printf("Detaching from process..\n");

  // Free resources and detach from process. FreeLibraryAndExitThread never returns.
  FreeConsole();
  FreeLibraryAndExitThread((HMODULE) dll_instance, EXIT_SUCCESS);

  return EXIT_SUCCESS;
}

BOOL APIENTRY DllMain(HMODULE dll_instance, DWORD reason, LPVOID) {
  // Do the real work on a separate thread rather than inside DllMain.
  if (reason == DLL_PROCESS_ATTACH) {
    HANDLE thread = CreateThread(nullptr, 0, scorehook_init, dll_instance, 0, nullptr);
    if (thread)
      CloseHandle(thread); // We never wait on the thread, so don't leak its handle.
  }

  return TRUE;
}

Here's my usual boilerplate code. When loaded into a process it creates a console window before entering an infinite input processing loop. We can invoke scorehook_dump by pressing F9, although this won't actually do anything just yet.

When the shutdown signal is received (in this case, when the F10 key is pressed), the library frees any resources and detaches from the process, ready for changes to be made and for the library to be recompiled and loaded into the process again.

If you really want to see an empty console window, feel free to compile and load that into the game now. Pretty much any remote library loader should work just fine. I'm using Process Hacker 2 with the "Miscellaneous > Inject DLL" context menu option.

Now for something a bit more practical - let's start filling in the scorehook_dump function.

auto chart = bm2dx::get_chart();
auto judgement = bm2dx::get_judgement();

This is where the bm2dx_util.h helper functions come in handy.

get_chart uses the global state variable to check which side the player is on, the play style and chosen difficulty for that side, and finally, the currently active music entry. Using these, it returns a chart_t containing the rating, BPM and note count.

get_judgement is a little simpler. It returns a judgement_player_t structure, which is just a cut-down version of judgement_t containing values for a single player rather than both. For the Double play style, it combines both P1 and P2 values.
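The helper implementations aren't reproduced in this post, so here is a purely hypothetical sketch of that combining step. It assumes judgement_player_t is a plain struct of integer counters named after the fields printed in the dump below; the real layout in the header may well differ.

```cpp
#include <cstdint>

// Hypothetical layout: one counter per judgement, named after the fields
// printed by scorehook_dump. The real judgement_player_t may differ.
struct judgement_player_t {
  std::int32_t pgreat = 0;
  std::int32_t great = 0;
  std::int32_t good = 0;
  std::int32_t bad = 0;
  std::int32_t poor = 0;
  std::int32_t combo_break = 0;
  std::int32_t fast = 0;
  std::int32_t slow = 0;
};

// In Double play one player uses both sides of the cabinet,
// so each counter is simply the sum of the P1 and P2 values.
constexpr judgement_player_t combine_sides(const judgement_player_t& p1,
                                           const judgement_player_t& p2) {
  judgement_player_t total;
  total.pgreat = p1.pgreat + p2.pgreat;
  total.great = p1.great + p2.great;
  total.good = p1.good + p2.good;
  total.bad = p1.bad + p2.bad;
  total.poor = p1.poor + p2.poor;
  total.combo_break = p1.combo_break + p2.combo_break;
  total.fast = p1.fast + p2.fast;
  total.slow = p1.slow + p2.slow;
  return total;
}
```

A field-by-field sum keeps the single-player and Double code paths identical from the caller's point of view, which is what lets scorehook_dump print the same way regardless of play style.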

printf("Player name:        %s\n\n", player_name);

printf("Active play side:   %s\n", state->p1_active ? "P1": "P2");
printf("Active play style:  %s\n\n", state->play_style == 0 ? "Single": "Double");

printf("Current music:      %s - %s\n", state->music->artist, state->music->title);

printf("Chart note count:   %i\n", chart.notes);
printf("Chart rating:       Lv. %i\n", chart.rating);

printf("Chart BPM:          %i ~ %i\n\n", chart.bpm_min, chart.bpm_max);

printf("PGREAT:             %i\n", judgement.pgreat);
printf("GREAT:              %i\n", judgement.great);
printf("GOOD:               %i\n", judgement.good);
printf("BAD:                %i\n", judgement.bad);
printf("POOR:               %i\n\n", judgement.poor);

printf("COMBO BREAK:        %i\n\n", judgement.combo_break);

printf("FAST:               %i\n", judgement.fast);
printf("SLOW:               %i\n\n", judgement.slow);

Player name:        AIXXE

Active play side:   P1
Active play style:  Single

Current music:      dj TAKA - Liberation
Chart note count:   760
Chart rating:       Lv. 6
Chart BPM:          0 ~ 150

PGREAT:             463
GREAT:              239
GOOD:               58
BAD:                0
POOR:               0

COMBO BREAK:        0

FAST:               250
SLOW:               47

That all looks about right. Nice and easy so far. Now to add in some data from StageResultDrawFrame.

Let's not dwell on finding the address of this class programmatically just yet - I'll cover that a little further down. For now I've hard-coded in an address using the "Find out what accesses this address" technique from the previous post.

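// Temporary: hard-coded address found with "Find out what accesses this address".
frame = (StageResultDrawFrame*) 0x4939DC00;
// Pick the half of the frame belonging to the active side.
auto frame_data = state->p1_active ? frame->p1: frame->p2;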

printf("Best clear type:    %s\n", CLEAR_TYPE[frame_data.best_clear_type]);
printf("Current clear type: %s\n\n", CLEAR_TYPE[frame_data.current_clear_type]);

printf("Best DJ level:      %s\n", DJ_LEVEL[frame_data.best_dj_level]);
printf("Current DJ level:   %s\n\n", DJ_LEVEL[frame_data.current_dj_level]);

printf("Best EX score:      %i\n", frame_data.best_ex_score);
printf("Current EX score:   %i\n\n", frame_data.current_ex_score);

printf("Best miss count:    %i\n", frame_data.best_miss_count);
printf("Current miss count: %i\n\n", frame_data.current_miss_count);

Best clear type:    clear_failed
Current clear type: clear_fullcombo

Best DJ level:      level_s_c
Current DJ level:   level_s_a

Best EX score:      820
Current EX score:   1165

Best miss count:    60
Current miss count: 0
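Since the first dump printed the raw judgement counts, these numbers can be cross-checked against each other. EX score is two points per PGREAT plus one per GREAT, and the DJ level depends on the EX score as a fraction of the maximum (two points per note). The sketch below uses the commonly documented thresholds in ninths (AAA from 8/9, AA from 7/9, down to E from 2/9). The DJ_LEVEL strings in the dump appear to end in the letter grade, so level_s_a should correspond to A; that mapping is an assumption.

```cpp
#include <cassert>
#include <string_view>

// EX score: two points per PGREAT, one per GREAT, nothing for other judgements.
constexpr int ex_score(int pgreat, int great) { return 2 * pgreat + great; }

// DJ level from EX score as a fraction of the maximum (two points per note),
// using the commonly documented ninths thresholds. Comparing 9 * ex against
// k * max_ex keeps everything in integers, so there's no rounding to worry about.
constexpr std::string_view dj_level(int ex, int notes) {
  assert(notes > 0);
  const int max_ex = 2 * notes;
  if (9 * ex >= 8 * max_ex) return "AAA";
  if (9 * ex >= 7 * max_ex) return "AA";
  if (9 * ex >= 6 * max_ex) return "A";
  if (9 * ex >= 5 * max_ex) return "B";
  if (9 * ex >= 4 * max_ex) return "C";
  if (9 * ex >= 3 * max_ex) return "D";
  if (9 * ex >= 2 * max_ex) return "E";
  return "F";
}

// Cross-check against the dumps: 760 notes, 463 PGREAT and 239 GREAT this play.
static_assert(ex_score(463, 239) == 1165, "Current EX score: 1165");
static_assert(dj_level(1165, 760) == "A", "Current DJ level: level_s_a");
static_assert(dj_level(820, 760) == "C", "Best DJ level: level_s_c");
```

If any of these checks failed to compile, that would be a good hint that an offset or structure field is off, which is exactly the kind of testing mentioned at the start.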

Excellent. That just leaves the pacemaker data now. Remember what I said last time?

As luck would have it, getting to this data programmatically would be easy. I would only need to allocate enough memory to store all the above, meaning 264 bytes, then call sub_520A40 with a pointer to said memory.

That's still what we're going to do, but there's a slight complication.

int __usercall sub_512160@<eax>(_DWORD *a1@<eax>)

The address has changed in an update, so it's now sub_512160, but that isn't the complication.

That __usercall calling convention is IDA's way of telling us that the a1 argument is passed in the eax register rather than on the stack, so we need to load it into eax before calling this function. The @<eax> after the function name means that it also returns its value in the eax register.

It's a little different, but nothing a bit of inline assembly can't handle.

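pacemaker_t pacemaker_data = {};

// sub_512160 expects a pointer to the output buffer in eax (__usercall).
__asm {
  lea eax, [pacemaker_data];  // eax = &pacemaker_data
  call [get_pacemaker_data];  // Indirect call through the stored function pointer.
}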

printf("Pacemaker target:   %i [%s]\n", pacemaker_data.score, pacemaker_data.name);
printf("Pacemaker type:     %s\n", PACEMAKER_TYPE[pacemaker_data.type]);

Pacemaker target:   1216 [80%]
Pacemaker type:     sg_pacemaker
[Image: beatmania IIDX INFINITAS result screen] The dumped score data and the corresponding result screen.
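The pacemaker target can be sanity-checked the same way. Assuming the percentage pacemakers are based on the maximum EX score (two points per note), 80% of a 760-note chart is 0.8 × 1520 = 1216, which matches the dump. This is a sketch; the rounding rule for percentages that don't divide evenly is an assumption, since this chart doesn't exercise it.

```cpp
#include <cassert>

// Hypothetical helper: EX score needed to match a percentage pacemaker,
// assuming the percentage applies to the maximum EX score (two points per note).
// Rounds up so reaching the target always means reaching the percentage;
// the game's actual rounding rule is an assumption.
constexpr int pacemaker_target(int notes, int percent) {
  assert(notes > 0 && percent >= 0 && percent <= 100);
  const int max_ex = 2 * notes;
  return (max_ex * percent + 99) / 100;
}

static_assert(pacemaker_target(760, 80) == 1216, "Pacemaker target: 1216 [80%]");
```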

That was pretty fast. Of course, there's still that one glaring issue: the hard-coded StageResultDrawFrame address.

Let's take another look at Class Informer for some inspiration.

[Image: IDA Class Informer results window]

If I had to guess, I'd bet that something in CStageResultScene is responsible for instantiating all those other stage result drawing classes. Perhaps we'll even find them, or something that could lead us to them somewhere inside.

That doesn't sound like an entirely unreasonable line of thought, so let's jump back to StageResultDrawFrame's virtual function table in IDA and press Ctrl+X on the first entry to bring up the cross-references list.

[Image: Cross-reference list window in IDA]

A single reference in sub_511010. Here's the cut-down pseudo-code of the interesting parts.

_DWORD *__stdcall sub_511010(_DWORD *a1)
{
  // ...
  *a1 = &StageResultDrawFrame::`vftable';
  // ...

  return a1;
}

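It seems that a1 points to memory that this function turns into an instance of StageResultDrawFrame. Writing the vftable pointer into the first field is exactly what a constructor does, so sub_511010 is most likely StageResultDrawFrame's constructor.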

int __thiscall sub_516930(void *this, int a2)
{
  // ...
  *(_DWORD *)a2 = &CStageResultScene::`vftable';
  // ...
  sub_511010((_DWORD *)(a2 + 1480));
  // ...
}

Jumping up to where this is called takes us to sub_516930. The code here suggests we'd be able to find a StageResultDrawFrame by adding 1480 bytes to the address of a CStageResultScene. Let's confirm that.

For this, you can utilise the ancient art of "setting breakpoints on a bunch of virtual functions until one of them fires". This only took a few attempts in the CStageResultScene table before I was able to find sub_5B52B0.

Adjusted for the image base, this became bm2dx.exe+0x1B52B0.
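The rebasing is just subtraction: IDA displays addresses using the preferred image base from bm2dx.exe's PE header, while the live base comes from GetModuleHandleA. Here is a small portable sketch, assuming a preferred base of 0x400000, the usual MSVC default for 32-bit executables and consistent with the addresses in this post.

```cpp
#include <cassert>
#include <cstdint>

// Preferred image base IDA used for bm2dx.exe. 0x400000 is the MSVC default
// for 32-bit executables and matches the addresses in this post, but check
// the base shown in your own IDA database.
constexpr std::uint32_t ida_image_base = 0x400000;

// IDA address -> offset from the module base, e.g. bm2dx.exe+0x1B52B0.
constexpr std::uint32_t to_module_offset(std::uint32_t ida_addr) {
  assert(ida_addr >= ida_image_base);
  return ida_addr - ida_image_base;
}

// Offset -> live address, given the runtime module base
// (what GetModuleHandleA("bm2dx.exe") returns inside the game).
constexpr std::uint32_t to_runtime_addr(std::uint32_t module_base, std::uint32_t offset) {
  return module_base + offset;
}

static_assert(to_module_offset(0x5B52B0) == 0x1B52B0, "sub_5B52B0 -> bm2dx.exe+0x1B52B0");
static_assert(to_module_offset(0x516A58) == 0x116A58, "the call site hooked later (result_hook_addr)");
```

Storing offsets rather than absolute addresses is what keeps the library working even if the game gets loaded at a different base.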

[Image: Cheat Engine debugger at a breakpoint]

If you look back at the earlier code, you'll see that the hard-coded address for StageResultDrawFrame was 0x4939DC00.

Since these are __thiscall member functions, ecx holds the this pointer, so the 4939D638 currently in ecx should be our CStageResultScene instance. Subtracting it from the StageResultDrawFrame address gives 4939DC00 - 4939D638 = 5C8, or 1480 in decimal, exactly the offset from sub_516930. So yeah, that works and we could do that, but after looking around in ReClass we can do even better.

[Image: ReClass.NET dissecting a region of memory] So much better.

It turns out there's a class a bit further down from StageResultDrawFrame called StageResultDrawParts. It literally just contains pointers to all the other classes, including itself. We're reaching levels of convenience that shouldn't be possible.

class StageResultDrawBg;
class StageResultDrawGraph;
class StageResultDrawDjLevel;
class StageResultDrawRivalWindow;
class StageResultDrawMusicInfo;
class StageResultDrawInvalidFrame;
class StageResultDrawDeadPoint;

class StageResultDrawParts {
  private:
    virtual ~StageResultDrawParts() = 0;
  public:
    StageResultDrawBg* bg;
    StageResultDrawGraph* graph;
    StageResultDrawDjLevel* dj_level;
    StageResultDrawRivalWindow* rival_window;
    StageResultDrawMusicInfo* music_info;
    StageResultDrawFrame* frame;
    StageResultDrawInvalidFrame* invalid_frame;
    StageResultDrawDeadPoint* dead_point;
    StageResultDrawParts* parts;
};

We're only interested in StageResultDrawFrame at the moment but there's plenty more fun to be had with these classes.

For instance, you could extract all your rival names and scores from StageResultDrawRivalWindow, or read out the groove graph points from StageResultDrawGraph. I'll leave that up to you for now, though.

int __usercall sub_517330@<eax>(int a1@<eax>, int a2@<ecx>)
{
  int result; // eax

  switch ( a2 )
  {
    case 0: result = a1 + 144; break;   // StageResultDrawBg
    case 1: result = a1 + 420; break;   // StageResultDrawGraph
    case 2: result = a1 + 464; break;   // StageResultDrawDjLevel
    case 3: result = a1 + 476; break;   // StageResultDrawRivalWindow
    case 4: result = a1 + 1468; break;  // StageResultDrawMusicInfo
    case 5: result = a1 + 1480; break;  // StageResultDrawFrame
    case 6: result = a1 + 1652; break;  // StageResultDrawInvalidFrame
    case 7: result = a1 + 1660; break;  // StageResultDrawDeadPoint
    case 8: result = a1 + 1672; break;  // StageResultDrawParts
    default: result = 0; break;
  }
  return result;
}

Looking around a bit more, I found this further down in sub_516930. As you might've guessed from the number of cases in the switch, this function gets called nine times in order to populate the pointers in StageResultDrawParts.
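To double-check the arithmetic, here is a portable model of sub_517330 and the loop that calls it, using the offsets from the switch above. Addresses are treated as plain 32-bit integers; the real function obviously works on live pointers.

```cpp
#include <array>
#include <cstdint>

// Offsets returned by sub_517330, indexed by the value passed in ecx.
// 5 (+1480) is StageResultDrawFrame, 8 (+1672) is StageResultDrawParts.
constexpr std::array<std::uint32_t, 9> part_offsets = {
    144, 420, 464, 476, 1468, 1480, 1652, 1660, 1672};

// Portable model of sub_517330: base address in, part address out,
// 0 for an unknown index (the switch's default case).
constexpr std::uint32_t part_address(std::uint32_t base, std::uint32_t index) {
  return index < part_offsets.size() ? base + part_offsets[index] : 0;
}

// Model of the loop in sub_516930: one call per index 0..8, results stored
// in order, just like the pointer fields of StageResultDrawParts.
constexpr std::array<std::uint32_t, 9> fill_parts(std::uint32_t base) {
  std::array<std::uint32_t, 9> parts{};
  for (std::uint32_t i = 0; i < parts.size(); ++i)
    parts[i] = part_address(base, i);
  return parts;
}

// Assuming 0x4939D638 (ecx at the breakpoint) is the scene the loop passes in eax:
static_assert(fill_parts(0x4939D638)[5] == 0x4939DC00, "index 5 is the hard-coded frame");
```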

I was initially thinking of hooking something from CStageResultScene and just padding out the class until it reached either StageResultDrawFrame or StageResultDrawParts, but this function has given me another idea.

              loc_516A56:                             ; CODE XREF: sub_516930+136↓j
00516A56   8B C5              mov     eax, ebp
00516A58   E8 D3 08 00 00     call    sub_517330      ; Call Procedure
00516A5D   89 02              mov     [edx], eax
00516A5F   41                 inc     ecx             ; Increment by 1
00516A60   83 C2 04           add     edx, 4          ; Add
00516A63   83 F9 09           cmp     ecx, 9          ; Compare Two Operands
00516A66   7C EE              jl      short loc_516A56 ; Jump if Less (SF!=OF)

I reckon we could intercept the call instruction in the loop where this function gets called and redirect it to our code.

We would then have to call sub_517330 ourselves. Once again we're dealing with __usercall, so after the call we'd have one of the class pointers in eax and the number from the switch in ecx, which can be used to identify the class we're dealing with. We can rely on ecx surviving the call, since the loop increments it straight afterwards.

Then, simply check if ecx matches 5, the number for StageResultDrawFrame, and call the scorehook_dump function. Finally, return to the game by jumping back to the mov instruction at 516A5D, immediately after where we placed the hook.
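Before writing any assembly, the plan can be modelled in ordinary portable C++ to make the control flow explicit. This is only a sketch: registers become parameters (scene for eax going in, index for ecx), and the original function and dump routine are passed in as hypothetical function pointers. The real hook stays in assembly because it has to preserve the game's registers exactly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical function types standing in for sub_517330 and scorehook_dump.
using original_fn = std::uint32_t (*)(std::uint32_t scene, std::uint32_t index);
using dump_fn = void (*)(std::uint32_t frame);

// Index of StageResultDrawFrame in sub_517330's switch.
constexpr std::uint32_t frame_index = 5;

// Model of the detour: `scene` plays the role of eax before the call and
// `index` the role of ecx. Returns what eax holds when control goes back to
// 516A5D, where the game stores it into StageResultDrawParts.
std::uint32_t intercept(original_fn original, dump_fn dump,
                        std::uint32_t scene, std::uint32_t index) {
  assert(original != nullptr && dump != nullptr);

  const std::uint32_t part = original(scene, index);  // Call the original function.
  if (index == frame_index)                           // Only StageResultDrawFrame matters.
    dump(part);                                       // Hand the frame to the dump routine.
  return part;                                        // Resume the game with eax intact.
}
```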

That sounds like it would work, so let's go ahead and give it a shot.

constexpr std::uintptr_t result_hook_addr = 0x116A58;
void* result_hook_original_fn = nullptr; // 517330
void* result_hook_return_addr = nullptr; // 516A5D (516A58 + 5)

We'll assign these properly when we install the hook.

__declspec(naked) void scorehook_intercept() {
    __asm {
      // instructions go here..
    }
}

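The __declspec(naked) attribute stops the compiler from generating a prologue or epilogue for this function. That matters here: control jumps in from the hooked call site and leaves through our own jmp rather than a ret, so any stack frame the compiler set up would never be torn down. We'll be writing the whole hook function using the inline assembler. Perhaps not strictly necessary, but certainly more fun.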

call        [result_hook_original_fn];

As mentioned earlier, once this call returns, eax holds the class pointer and ecx still holds the index from the switch.

The only relevant class to us is StageResultDrawFrame, so let's make sure that's the one we're dealing with here.

cmp         ecx, 5;
jne         back;

Remember that 5 was the index for StageResultDrawFrame in the switch, so if the current value in ecx is not 5 then we should jump to the back label. It's not defined yet, but it'll allow us to return to the function we placed our hook in.

pushad;

Now that we're definitely dealing with the right class, we need to make sure we don't mess up the existing registers when we call scorehook_dump. We can do this by pushing all their values onto the stack with a single convenient instruction.

push        eax;
call        [scorehook_dump];

Before calling scorehook_dump, we push the StageResultDrawFrame* argument it expects, currently held in eax, onto the stack.

pop         eax;
popad;

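Almost done. Since scorehook_dump uses the default __cdecl calling convention, the caller is responsible for removing arguments from the stack, so we pop that argument back off, then pop everything back into the correct registers.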

back:
jmp         [result_hook_return_addr];

Finally, we hand control back to the game by returning to the next instruction in the function we hooked.

I don't know about you, but I'm pretty excited to try this out. There's just one last thing we have to take into account, since the call we're replacing is a relative one. The bytes that make up the instruction are E8 D3 08 00 00.

How is IDA turning that into 517330? Nothing too complicated: take the address of the call, 516A58, add the length of the instruction (5 bytes), then add the relative displacement. That's the four bytes following the E8 opcode, D3 08 00 00, which read as a little-endian 32-bit integer give 8D3.

516A58 + 5 + 8D3 = 517330

Well, that's the gist of it. For a proper explanation, check out the CALL entry in the x86 instruction set reference.
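As a portable illustration of that calculation, here is a hypothetical stand-in (not the GetAbsoluteAddress helper from bm2dx_util.h, whose implementation isn't shown) that decodes a copy of the five instruction bytes instead of reading live memory. The displacement is signed, so calls to lower addresses work too.

```cpp
#include <array>
#include <cstdint>

// Hypothetical decoder for a 5-byte relative CALL: E8 followed by a signed
// 32-bit little-endian displacement. Returns 0 for any other opcode.
constexpr std::uint32_t call_target(std::uint32_t call_addr,
                                    const std::array<std::uint8_t, 5>& bytes) {
  if (bytes[0] != 0xE8)
    return 0;

  // Reassemble the displacement from bytes 1..4, lowest byte first.
  const std::uint32_t displacement =
      std::uint32_t{bytes[1]} | (std::uint32_t{bytes[2]} << 8) |
      (std::uint32_t{bytes[3]} << 16) | (std::uint32_t{bytes[4]} << 24);

  // The displacement is relative to the next instruction (call_addr + 5).
  // Unsigned wrap-around makes negative displacements work as well.
  return call_addr + 5 + displacement;
}

// E8 D3 08 00 00 at 516A58: 516A58 + 5 + 8D3 = 517330.
static_assert(call_target(0x516A58, {0xE8, 0xD3, 0x08, 0x00, 0x00}) == 0x517330,
              "sub_517330");
```

Note that the target has to be resolved before the hook is installed, because afterwards those five bytes no longer hold the original call.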

I've included a helper function called GetAbsoluteAddress in bm2dx_util.h that will handle this for us. But that's enough explaining; now we just need to get this thing hooked up! I'll be using MinHook for this example.

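// Requires MinHook.h and linking against the MinHook library.
// This should go before the 'do while' loop in scorehook_init.
MH_Initialize();

const std::uintptr_t hook_address = (bm2dx_addr + result_hook_addr);

// Resolve the original call target before MinHook overwrites the E8 instruction.
result_hook_original_fn = GetAbsoluteAddress(hook_address);
result_hook_return_addr = (void*) (hook_address + 5);

MH_CreateHook((void*) hook_address, &scorehook_intercept, NULL);
MH_EnableHook((void*) hook_address);

// This should go after the 'do while' loop in scorehook_init.
// MH_Uninitialize also disables and removes any hooks that are still installed.
MH_Uninitialize();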

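We've essentially reached the end for this part now. Before compiling, remove the hard-coded StageResultDrawFrame address from scorehook_dump so that it uses the frame pointer passed in by the hook (and either drop the F9 binding or skip the dump when frame is nullptr). Compile and load that into the game, and score data will be printed to the console when you reach the result screen. No more manual key pressing required!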

Okay... there's actually one more thing to do. Last one for real this time. Our library may be functional now, but the process of getting it running is still way too manual for my liking. Does anyone really want to do this every time they start the game?

If you don't, I've got just the trick. It's commonly known as DLL hijacking. I'll only be giving a quick rundown of how it can be applied here since it's already well documented elsewhere. By taking advantage of the DLL search order in Windows, we can replace an imported library with a fake version that acts identically (well, identically enough) to the original.

[Image: PE viewer imports tab] Viewing imports using the PE Viewer utility from Process Hacker 2.

Here you can see the game executable imports d3d9.dll in order to call Direct3DCreate9. Let's cross our fingers and hope that's all the game expects of that library. We'll quickly build a drop-in replacement that provides the same functionality, then place it in the same directory as bm2dx.exe. According to MSDN, this would give it a higher search priority than the real one.

If SafeDllSearchMode is enabled, the search order is as follows:

  1. The directory from which the application loaded.
  2. The system directory. Use the GetSystemDirectory function to get the path of this directory.

We'll assume that the real d3d9.dll can be found in the directory returned from GetSystemDirectory.

Our fake library only needs to do a few things for this to work: export Direct3DCreate9 with the same signature as the original, load the real d3d9.dll from the system directory, call the real Direct3DCreate9 and return the real result.

Anything else is up to us. We'll keep it simple and just load the scorehook library somewhere in-between.

#include <windows.h>
#include <filesystem>

// The elaborated 'class IDirect3D9*' declares the type, so d3d9.h isn't needed;
// we only pass the pointer straight through.
class IDirect3D9* __stdcall Direct3DCreate9(UINT SDKVersion) {
  // Ensure the function is exported without a decorated name.
  #pragma comment(linker, "/EXPORT:" __FUNCTION__ "=" __FUNCDNAME__)

  // Load our library first. The wide-character APIs are used explicitly,
  // since std::filesystem::path::c_str() returns const wchar_t* on Windows.
  LoadLibraryW(L"scorehook.dll");

  // Assume we can find the real 'd3d9.dll' library in system32.
  wchar_t system_dir[MAX_PATH];
  GetSystemDirectoryW(system_dir, MAX_PATH);

  // Locate the real 'Direct3DCreate9' function, call it and return the result.
  const auto real_d3d9 = LoadLibraryW((std::filesystem::path(system_dir) / L"d3d9.dll").c_str());
  const auto real_create = reinterpret_cast<decltype(&Direct3DCreate9)>(GetProcAddress(real_d3d9, "Direct3DCreate9"));

  return real_create(SDKVersion);
}

Make sure that the final library has a single Direct3DCreate9 export, then just copy both libraries to the game\app directory and you'll never have to load manually ever again! But yeah, you might want to add some basic error checking first.

[Image: PE viewer exports tab] A single Direct3DCreate9 symbol as expected.

Admittedly, the scorehook library doesn't do that much right now, so it's probably not that useful to have it loading automatically yet. This section just doesn't quite fit in any of the upcoming parts, but I didn't want to leave it out.

But with that, we're finished for real this time. There's still more to go in this series so look forward to the next instalment, whenever that may be. Until next time, Merry Christmas and a Happy New Year!

Published on Thursday, 26 December 2019 at 11:49 AM.