Developer Documentation
   - How it works ?
   - How hooking works ?
   - What is no stack hooking and how it works ?
   - How hooking at startup works ?
   - How remote call works ?
   - How process creation monitoring is done ?

How it works?

We first inject APIOverride.dll into the target process using the Injlib code written by Jeffrey Richter. This creates a remote thread in the target whose start address is the LoadLibrary function and whose parameter is the name of the dll to inject.

Figure: Injecting the APIOverride dll

The DllMain function of APIOverride.dll then initializes interprocess communication so that the dll can be managed by WinAPIOverride.
The job of the injected dll is to set up hooks, load overriding dlls and monitoring files, and make function calls in the target process (see How remote call works).

Figure: Loading monitoring files or overriding dlls
Figure: Target memory space after overriding dll load

Our project is composed of two main parts:
   - WinApiOverride32.exe: user interface for the CAPIOverride class (easy management of the injected dll)
   - APIOverride.dll: the injected dll.

The CAPIOverride object provides everything needed to manage the injected dll.
Note: the methods of the CAPIOverride object are documented separately.


How hooking works ?

To hook a function, we need its address.
The address can be given directly (if it is known), or obtained by calling GetModuleHandle or LoadLibrary with the dll name, followed by GetProcAddress with the function name.
To create the hook, we have to modify the first asm instructions at that address.
For example, GetProcAddress on the MessageBoxA function in User32.dll returns 0x77d3add7 on some versions of Windows XP.
Therefore, to set up our hook, we just have to replace the first bytes at address 0x77d3add7 with "call OurHandlerAddr" using the WriteProcessMemory API function.

To replace as few bytes as possible, we use a single call instruction, which takes 5 bytes (one opcode byte followed by a 32-bit relative displacement).
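
As an illustration of those 5 bytes, here is a minimal sketch of how a relative call could be encoded, assuming 32-bit x86. The names CallPatch, MakeCallPatch and DecodeCallTarget are hypothetical and are not taken from the WinAPIOverride sources; the sketch only computes the bytes and does not write them into a process.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Five bytes: opcode 0xE8 followed by a little-endian 32-bit displacement.
using CallPatch = std::array<std::uint8_t, 5>;

// Encode "call target" as it would appear at address `site`.
// The displacement is measured from the end of the 5-byte instruction.
CallPatch MakeCallPatch(std::uint32_t site, std::uint32_t target)
{
    const std::uint32_t rel = target - (site + 5);  // wraps modulo 2^32, like the CPU
    return CallPatch{{ 0xE8,
                       static_cast<std::uint8_t>(rel),
                       static_cast<std::uint8_t>(rel >> 8),
                       static_cast<std::uint8_t>(rel >> 16),
                       static_cast<std::uint8_t>(rel >> 24) }};
}

// Inverse operation: recover the call target from a patch placed at `site`.
std::uint32_t DecodeCallTarget(std::uint32_t site, const CallPatch& patch)
{
    assert(patch[0] == 0xE8);  // only the near relative call is handled
    const std::uint32_t rel = patch[1] | (patch[2] << 8) | (patch[3] << 16)
                            | (static_cast<std::uint32_t>(patch[4]) << 24);
    return site + 5 + rel;
}
```

With the MessageBoxA example above, MakeCallPatch(0x77d3add7, handlerAddress) would give the bytes to write at the start of the function, where handlerAddress is whatever address the per-hook asm code was allocated at.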
The issue is that the generic hook needs more information on entry, such as which function was hooked.

So we allocate a small block of memory for each hook and make the call target that block. Doing so, we can plug in as much asm code as we want before calling the generic hook.
The following picture sums it up:


When entering the generic C hook, the stack history is:
   1) push HookedFunc params
   2) call HookedFunc --> push return address of HookedFunc
   --- code modification of first bytes of API ---
   3) call Dynamic asm code --> push return address of our asm call
   4) push pointer of hooked function object
   5) push asm registers


So the stack is in the following state (top of the stack first):
- asm registers
- OurHandlerParam
- return address of our asm call
- return address of HookedFunc
- HookedFunc params

To call the original function from inside the hook, we first have to restore the original asm instructions (this temporarily removes the hook).
Next, as we know the number of parameters (it is given in the configuration file), we can push them on the stack again and call the original function, or another one.

As a safety margin, we push more data than the parameters declared in the configuration file. This avoids a crash if the configuration file is wrong (for example, if it declares too few parameters).
To log parameters, we just copy them from the stack before and/or after the function call.

After the call, we put the hook back so that the next call to the function is logged.

Finally, we restore the stack to the state it would be in after a call without the hook.
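
The remove / call / restore cycle described above can be sketched on a plain byte buffer. This is only an illustration: the BytePatch class is hypothetical, a std::vector stands in for the target's memory (the real tool writes into another process), and the stack handling is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: remembers the original bytes at one patch site so the
// hook can be removed and put back around each call to the original function.
class BytePatch {
public:
    static constexpr std::size_t kSize = 5;  // size of "call rel32"

    BytePatch(std::vector<std::uint8_t>& memory, std::size_t offset,
              const std::array<std::uint8_t, kSize>& hookBytes)
        : memory_(memory), offset_(offset), hook_(hookBytes)
    {
        assert(offset_ + kSize <= memory_.size());
        std::memcpy(original_.data(), &memory_[offset_], kSize);  // save original bytes
    }

    void Install() { std::memcpy(&memory_[offset_], hook_.data(), kSize); }
    void Remove()  { std::memcpy(&memory_[offset_], original_.data(), kSize); }

    // Same order as in the text: remove the hook, run the original code,
    // then put the hook back for the next call.
    template <typename Fn>
    void CallOriginal(Fn&& original)
    {
        Remove();
        original();
        Install();
    }

private:
    std::vector<std::uint8_t>& memory_;
    std::size_t offset_;
    std::array<std::uint8_t, kSize> hook_;
    std::array<std::uint8_t, kSize> original_{};
};
```

Between Remove() and Install() the function is not hooked, which is why a call made by another thread during that window can be missed with this method.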


Particular cases :

   "|FirstBytesCanExecuteAnywhere"
This option declares that the first bytes of the function can be executed from another address. The big difference from the other hooking methods is that the hook is not removed while the original function is called, so no call can be lost.

Since version 3.1, WinAPIOverride includes a length disassembler; depending on the options, the first bytes of each function are analysed automatically to try to install this kind of hook.


   "|BlockingCall"
This works much like a standard hook, except that the hook is restored after a timeout instead of when the call returns.
Hooking this way is useful for slow blocking functions such as MessageBox, because a new call can be hooked even if the first one has not finished.
It is also convenient because you don't need to know the size of the movable asm code, as you do with the FirstBytesCanExecuteAnywhere option.

Disadvantages:
- you can still lose calls
- as thread creation is time consuming, there is no point in using this method for fast blocking calls such as SendMessage, because you lose more calls than with the standard method. Another problem with fast blocking calls: if the restoration done in the thread happens at the same time as the hook removal done by the generic C handler for another call, the application can crash.



What is no stack hooking and how it works ?


Hooking needs to store some information.
With standard hooking, this information is stored on the stack, then the hooked function is called, and after the call the stack is restored according to the function's calling convention.
Doing this changes the esp and ebp registers, so if the hooked function relies on the values of these registers (or on stack-relative values), the call will fail or return a wrong result.
Note that with the /Oy optimization (frame pointer omission), ebp is used as a general-purpose register and holds an ordinary value.

With "no stack hooking", only the return address is modified. The great advantage is that esp and ebp are not changed, so functions relying on these register values (or on stack-relative values) can be hooked.
To avoid changing the stack and ebp, the required information must be allocated on the heap.
For thread safety, we use Thread Local Storage (TLS) to store the heap allocation pointers.
Note: no stack hooking allows a function to be unhooked even while a call to it is still in progress: we just need to restore the exception handler and the return address, and the hooked process won't crash. This is impossible with stack hooking (we have to wait for the hooked function to return in order to restore the stack correctly).
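
The bookkeeping this implies can be sketched as follows. It is a portable illustration only, not the tool's code: CallRecord, ThreadRecords, OnHookEntry and OnHookExit are hypothetical names, C++ thread_local stands in for a Win32 TLS slot, and the actual rewriting of the on-stack return address (which needs asm) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical per-call record, kept on the heap instead of on the stack.
struct CallRecord {
    std::uintptr_t originalReturnAddress;  // put back when the hooked call returns
    const void*    hookInfo;               // stands in for the hooked function object
};

// One list of pending calls per thread.
inline std::vector<std::unique_ptr<CallRecord>>& ThreadRecords()
{
    thread_local std::vector<std::unique_ptr<CallRecord>> records;
    return records;
}

// On entry: remember the real return address. The caller of this helper would
// then overwrite the on-stack return address with the address of its own stub.
inline void OnHookEntry(std::uintptr_t returnAddress, const void* hookInfo)
{
    ThreadRecords().push_back(
        std::unique_ptr<CallRecord>(new CallRecord{returnAddress, hookInfo}));
}

// On exit (reached through the substituted return address): fetch and drop
// the record of the innermost pending call on this thread.
inline std::uintptr_t OnHookExit()
{
    auto& records = ThreadRecords();
    assert(!records.empty());
    const std::uintptr_t ret = records.back()->originalReturnAddress;
    records.pop_back();
    return ret;
}
```

Because the records are per thread and handled last-in first-out, nested or recursive hooked calls on one thread unwind in the right order, and nothing is added to the hooked function's stack frame.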

The difficulty with no stack hooking is catching exceptions.
Please read Matt Pietrek's great article "A Crash Course on the Depths of Win32 Structured Exception Handling" (MSJ, January 1997) for background on exceptions (and to understand the lines below).
A try / catch block allocates a small record on the stack and, for security reasons, the addresses of these records are checked to belong to the stack (so the record can't be allocated on the heap).
So the only way is to replace the existing exception filtering callback with our own.
When doing so, we MUST register our filtering callback as a safe structured exception handler (using the /SAFESEH option), otherwise XP SP2 and newer systems won't execute it.

Figure: Flow
Figure: Impact on stack



How hooking at startup works ?

There are three different ways of hooking a process at startup:


   - When we use the "Attach at application startup" method, we create the process in a suspended state, so we know that the entry point has not been reached.
We could set a breakpoint and debug the process, but debugging a process changes some of its attributes. So we use a different approach: we install a hook at the entry point.
The content of the entry point is replaced by jmp OurHookAddress,
where OurHook is a function that loads the APIOverride dll, restores the original content of the entry point and suspends the process again, so that the user can load monitoring and overriding files.
Of course this function doesn't exist in the process we want to hook, so we have to inject its code through calls to VirtualAllocEx and WriteProcessMemory.

   - When we use the "Inject before statically linked dll execution" method, we have to insert our dll into the import table. It must come before the dlls already imported, because the NT loader proceeds in the following order:
      Put all modules in memory

      Call first dll TLS callbacks
      Call first dll DllMain

      Call second dll TLS callbacks
      Call second dll DllMain

      ...

      Call exe TLS callbacks
      Call exe main
So, by being the first imported dll, our DllMain is the first code executed; we can do all the work we need there and get control before every other dll.

Injection is done by creating a new import section and redirecting the import table to it (the IMAGE_DIRECTORY_ENTRY_IMPORT entry of the NTHeader.OptionalHeader.DataDirectory array). See the Microsoft PE specification for details on the PE format.
Notice1: our dll must export at least one function, because if a dll is declared in the Import Directory Table (IDT) but no function is imported from it in the Import Lookup Table or Import Address Table, it won't be loaded by the NT loader.
Notice2: using CreateProcess with the suspended state is useless here, because the NT loader executes the TLS callbacks and DllMain of imported dlls before suspending (and giving us control).
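
To show where that data directory entry lives, here is a minimal sketch that locates it in a raw file image. It assumes the standard PE layout (e_lfanew at offset 0x3C, a 4-byte signature, a 20-byte COFF header, then the optional header) and uses hypothetical helper names. It only reads the entry; creating the new import section is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DataDirectory {
    std::uint32_t rva;
    std::uint32_t size;
};

// Little-endian reads from a raw file image (the caller guarantees the bounds).
static std::uint16_t Read16(const std::vector<std::uint8_t>& f, std::size_t o)
{
    return static_cast<std::uint16_t>(f[o] | (f[o + 1] << 8));
}
static std::uint32_t Read32(const std::vector<std::uint8_t>& f, std::size_t o)
{
    return Read16(f, o) | (static_cast<std::uint32_t>(Read16(f, o + 2)) << 16);
}

// File offset of the import data directory entry (index 1, i.e.
// IMAGE_DIRECTORY_ENTRY_IMPORT), or 0 if the image does not look like a PE file.
std::size_t ImportDirectoryEntryOffset(const std::vector<std::uint8_t>& file)
{
    if (file.size() < 0x40 || Read16(file, 0) != 0x5A4D) return 0;   // "MZ"
    const std::size_t pe = Read32(file, 0x3C);                       // e_lfanew
    if (pe > file.size() - 26 || Read32(file, pe) != 0x00004550) return 0;  // "PE\0\0"
    const std::size_t opt = pe + 4 + 20;         // signature + COFF file header
    const std::uint16_t magic = Read16(file, opt);
    std::size_t dirs;
    if (magic == 0x10B) dirs = opt + 96;         // PE32
    else if (magic == 0x20B) dirs = opt + 112;   // PE32+
    else return 0;
    const std::size_t entry = dirs + 1 * 8;      // each entry is RVA + size
    if (entry + 8 > file.size()) return 0;
    if (Read32(file, dirs - 4) < 2) return 0;    // NumberOfRvaAndSizes
    return entry;
}

// Reads the RVA and size currently stored in the import directory entry.
DataDirectory ReadImportDirectory(const std::vector<std::uint8_t>& file)
{
    const std::size_t e = ImportDirectoryEntryOffset(file);
    assert(e != 0);
    return DataDirectory{ Read32(file, e), Read32(file, e + 4) };
}
```

Redirecting the import table then amounts to writing the RVA and size of the new import section at that offset.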

   - When we use the "Attach to all new processes" option, we don't create the process ourselves, so we don't know how far its startup has progressed when the procmon driver callback is invoked.
So we can't hook the entry point of the process (it may already have been reached), and we can't inject the dll as soon as we enter the callback, because the NT loader may not have finished its job (if we try to inject at that moment, the application crashes).
So we have to poll until the NT loader has finished. As soon as a module is loaded into the process, we can perform the injection.
To detect that the first module has been loaded, we poll the Module32First API. As soon as it returns TRUE,
at least the exe module is loaded, so we can inject our library.
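
The polling step can be sketched as a generic loop. This is a portable illustration, not the tool's code: PollUntil is a hypothetical helper, and in the real tool the `ready` predicate would wrap the Module32First call on a snapshot of the new process.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls `ready` every `interval` until it returns true or `timeout` elapses.
// Returns true if the condition was met, false on timeout.
bool PollUntil(const std::function<bool()>& ready,
               std::chrono::milliseconds interval,
               std::chrono::milliseconds timeout)
{
    assert(static_cast<bool>(ready));  // an empty predicate cannot be polled
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (ready()) return true;      // e.g. the first module is now listed
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(interval);
    }
}
```

A timeout is worth having because the new process may exit before any module is reported, in which case the poll would otherwise never end.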



How remote call works ?

To call a function in another process, we have to transmit its parameters by some means (shared memory, mailbox, pipe, tcp ...), push them on the stack in the target process and call the function address. We then transmit the parameters and the return value back to our process.
The following code can call a function with any kind of parameter or struct, given only a simple pointer to each parameter or struct.

Copyright Jacquelin POTIER 2006 Sources under GPL V2 license

The following structure is used to define each parameter.
    typedef struct _STRUCT_FUNC_PARAM
    {
        BOOL  bPassAsRef; // TRUE if the parameter is passed by reference
        DWORD dwDataSize; // size in bytes
        PBYTE pData;      // pointer to the data
    } STRUCT_FUNC_PARAM, *PSTRUCT_FUNC_PARAM;
With pFunc the address of the function to call, NbParams the number of parameters and pParams an array of STRUCT_FUNC_PARAM, the code is the following:
    // (the function header and the declarations of the locals are not shown)
    _asm
    {
        // store esp to restore it without caring about calling convention
        mov [dwOriginalESP],ESP

        // reserve a safety margin to avoid crashing if the number of parameters is wrong
        Sub ESP, [dwEspSecuritySize]
    }
    // make things cleaner :D
    try
    {
        // set our allocated memory to 0
        // (warning: memory is allocated by esp-xxx, so the buffer runs from the new esp address up to the old one)
        memset((PBYTE)(dwOriginalESP-dwEspSecuritySize),0,dwEspSecuritySize);
        // for each param, from last to first
        for (cnt=NbParams-1;cnt>=0;cnt--)
        {
            pCurrentParam=&pParams[cnt];

            // if param should be passed as ref
            if (pCurrentParam->bPassAsRef)
            {
                // push param address
                dw=(DWORD)pCurrentParam->pData;
                _asm
                {
                    mov eax,dw
                    push eax
                }
            }
            else // we have to push param value
            {
                // byte
                if (pCurrentParam->dwDataSize==1)
                {
                    b=pCurrentParam->pData[0];
                    _asm
                    {
                        mov al,b
                        push eax
                    }
                }
                // short
                else if (pCurrentParam->dwDataSize==2)
                {
                    memcpy(&us,pCurrentParam->pData,2);
                    _asm
                    {
                        mov ax,us
                        push eax
                    }
                }
                // dword
                else if (pCurrentParam->dwDataSize==4)
                {
                    memcpy(&dw,pCurrentParam->pData,4);
                    _asm
                    {
                        mov eax,dw
                        push eax
                    }
                }
                // more than dword
                else
                {
                    // as we are not always 4 bytes aligned we can't do a loop with push

                    // allocate necessary space in stack
                    dwDataSize=pCurrentParam->dwDataSize;
                    _asm
                    {
                        sub esp, [dwDataSize]
                        mov [dwCurrentESP],esp
                    }

                    // copy data to stack
                    memcpy((PVOID)dwCurrentESP,pCurrentParam->pData,dwDataSize);
                }
            }
        }
        // now all params are pushed in stack --> just make call
        _asm
        {
            ////////////////////////////////
            // save local registers
            ////////////////////////////////
            mov [LocalRegisters.eax],eax
            mov [LocalRegisters.ebx],ebx
            mov [LocalRegisters.ecx],ecx
            mov [LocalRegisters.edx],edx
            mov [LocalRegisters.esi],esi
            mov [LocalRegisters.edi],edi
            pushfd
            pop [LocalRegisters.efl]

            ////////////////////////////////
            // set registers as wanted
            ////////////////////////////////
            mov eax, [Registers.eax]
            mov ebx, [Registers.ebx]
            mov ecx, [Registers.ecx]
            mov edx, [Registers.edx]
            mov esi, [Registers.esi]
            mov edi, [Registers.edi]
            push [Registers.efl]
            popfd

            // call func
            call pFunc

            // save registers after call
            mov [Registers.eax],eax
            mov [Registers.ebx],ebx
            mov [Registers.ecx],ecx
            mov [Registers.edx],edx
            mov [Registers.esi],esi
            mov [Registers.edi],edi
            pushfd
            pop [Registers.efl]

            // put pointer to the return value storage in ecx
            mov ecx,[pRet]
            // put return value in the address pointed by ecx --> *pRet=eax
            mov eax,[Registers.eax]
            mov [ecx],eax

            // store the floating point result (top of the FPU stack)
            fst qword ptr [FloatingResult]

            ////////////////////////////////
            // restore local registers
            ////////////////////////////////
            mov eax, [LocalRegisters.eax]
            mov ebx, [LocalRegisters.ebx]
            mov ecx, [LocalRegisters.ecx]
            mov edx, [LocalRegisters.edx]
            mov esi, [LocalRegisters.esi]
            mov edi, [LocalRegisters.edi]
            push [LocalRegisters.efl]
            popfd
        }

        bRet=TRUE;
    }
    catch(...)
    {
        bRet=FALSE;
    }

    _asm
    {
        // restore esp (works for both calling conventions)
        mov ESP,[dwOriginalESP]
    }

    if (bRet)
        memcpy(pRegisters,&Registers,sizeof(REGISTERS));

    *pFloatingResult=FloatingResult;

    return bRet;
    }
For example, for the GetTempPathA(DWORD nBufferLength, LPSTR lpBuffer) function in Kernel32.dll, the STRUCT_FUNC_PARAM array can be filled like this:
    STRUCT_FUNC_PARAM pParams[2];
    DWORD dwParam=255;
    char pc[255];
    pParams[0].bPassAsRef=FALSE;
    pParams[0].dwDataSize=4;
    pParams[0].pData=(PBYTE)&dwParam;
    pParams[1].bPassAsRef=TRUE;
    pParams[1].dwDataSize=dwParam;
    pParams[1].pData=(PBYTE)pc;

    pApiOverride->ProcessInternalCall(_T("Kernel32.dll"),_T("GetTempPathA"),2,pParams,&dwResult,dwTimeOut);
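
To make the effect of the push loop easier to picture, the following portable sketch builds the argument area the callee would see, lowest address first. It is an illustration under assumptions: FuncParam is a stand-in for STRUCT_FUNC_PARAM using standard types, BuildArgumentImage is a hypothetical name, small values are zero-extended to a 4-byte slot (the asm above only sets the low bytes of eax), and only sizes 1, 2, 4 or larger than 4 are accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Portable stand-in for STRUCT_FUNC_PARAM (BOOL/DWORD/PBYTE replaced by
// standard types so that the sketch builds anywhere).
struct FuncParam {
    bool                passAsRef;
    std::uint32_t       dataSize;
    const std::uint8_t* data;
};

// Parameters are pushed from last to first and the stack grows downwards,
// so the first parameter ends up at the lowest address: appending in
// declaration order gives the layout seen by the callee.
std::vector<std::uint8_t> BuildArgumentImage(const FuncParam* params, std::size_t count)
{
    std::vector<std::uint8_t> image;
    for (std::size_t i = 0; i < count; ++i) {
        const FuncParam& p = params[i];
        if (p.passAsRef) {
            // by reference: one pointer-sized slot holding the buffer address
            const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p.data);
            const std::uint8_t* raw = reinterpret_cast<const std::uint8_t*>(&address);
            image.insert(image.end(), raw, raw + sizeof address);
        } else if (p.dataSize <= 4) {
            // 1, 2 and 4 byte values each take one 4-byte slot
            assert(p.dataSize == 1 || p.dataSize == 2 || p.dataSize == 4);
            std::uint8_t slot[4] = {0, 0, 0, 0};
            std::memcpy(slot, p.data, p.dataSize);
            image.insert(image.end(), slot, slot + 4);
        } else {
            // larger values (structs passed by value) are copied as they are
            image.insert(image.end(), p.data, p.data + p.dataSize);
        }
    }
    return image;
}
```

For the GetTempPathA example, the image holds the 4-byte value 255 followed by one pointer-sized slot containing the address of the buffer; the 255 bytes of the buffer itself are not copied onto the stack.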

How process creation monitoring is done ?

Everything relies on a call to the PsSetCreateProcessNotifyRoutine API.
This function registers a callback that is called upon every process creation and receives the process ID and parent process ID as parameters.

Sounds easy, doesn't it?
Not really, as this function can only be called in kernel mode, which means we have to build a driver (ProcMonDrvJP[64].sys).
All the sources of this driver are located in the Tools\Process\ProcessMonitor\Src\ProcMon directory, but as the Windows DDK (Driver Development Kit) is required to build it, the debug and release versions of the driver are included with the sources.
To manage this driver easily, the CProcMonInterface class has been written (located in Tools\Process\ProcessMonitor).
It lets you define a callback (in user mode, of course), start and stop the driver, and start or stop process creation monitoring.