Developer Documentation

   - How it works?
   - How hooking works?
   - What is no stack hooking and how it works?
   - How hooking at startup works?
   - How remote call works?
   - How process creation monitoring is done?

How it works?

We first inject the APIOverride dll into the target process using the Injlib code written by Jeffrey Richter. This creates a remote thread with the LoadLibrary function as its start address and the name of the dll to inject as its parameter.

Injecting APIOverride dll

The DllMain function of APIOverride.dll then initializes the interprocess communication managed by WinAPIOverride.
The injected dll's job is to set up hooks, load dlls and monitoring files, and make function calls in the target process (see How remote call works).

Loading monitoring files or overriding dlls

Target memory space after overriding dll load

Our project is composed of two main parts:
- WinApiOverride32.exe: user interface for the CAPIOverride class (easy management of the injected dll)
- APIOverride.dll: the injected dll.

The CAPIOverride object provides everything required to manage the injected dll.
Notice: The CAPIOverride object methods are described here

How hooking works?

To hook a function, we need its address.
The address can be provided directly (if it is known), or obtained by calling GetModuleHandle or LoadLibrary with the dll name, followed by GetProcAddress with the function name.
To create the hook, we have to modify the first asm instructions at that address.
For example, GetProcAddress on the MessageBoxA function in User32.dll returns 0x77d3add7 on some versions of Windows XP.
Therefore, to set up our hook, we just have to replace the first bytes at address 0x77d3add7 with "call OurHandlerAddr" using the WriteProcessMemory API function.
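As an illustration of what "call OurHandlerAddr" looks like in memory, here is a minimal sketch that encodes a 32-bit x86 near relative call (opcode 0xE8 followed by a 4-byte displacement). The function name build_call_patch and the handler address used in the usage note are hypothetical; this is not the project's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define CALL_PATCH_SIZE 5 /* 1 opcode byte + 4 displacement bytes */

/* Encode "call handler_addr" as it must appear when the instruction itself
 * is located at hooked_addr (32-bit x86, near relative call).
 * build_call_patch is a hypothetical helper name. */
static void build_call_patch(uint32_t hooked_addr, uint32_t handler_addr,
                             uint8_t patch[CALL_PATCH_SIZE])
{
    assert(patch != 0);

    /* The displacement is counted from the end of the call instruction. */
    uint32_t rel = handler_addr - (hooked_addr + CALL_PATCH_SIZE);

    patch[0] = 0xE8;                  /* opcode of call rel32 */
    patch[1] = (uint8_t)(rel);        /* displacement, little-endian */
    patch[2] = (uint8_t)(rel >> 8);
    patch[3] = (uint8_t)(rel >> 16);
    patch[4] = (uint8_t)(rel >> 24);
}
```

For a function at 0x77d3add7 and a handler assumed to live at 0x10001000, the patch bytes are E8 24 62 2C 98. These are the bytes that would then be written over the start of the function.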

To replace as few bytes as possible, we just perform a call, which takes 5 bytes.
The issue is that we need more parameters, such as information about the hooked function, before entering a generic hook.

So we allocate memory for a small piece of asm code, and the call targets that code. This way, we can plug in as much asm code as we want before calling the generic hook.
The following picture sums it up:

When entering the generic C hook, the stack history is:
   1) push HookedFunc params
   2) call HookedFunc --> push return address of HookedFunc
   --- code modification of first bytes of API ---
   3) call Dynamic asm code --> push return address of our asm call
   4) push pointer to the hooked function object
   5) push asm registers

So the stack is in the following state (top of the stack first):
- asm registers
- OurHandlerParam (pointer to the hooked function object)
- return address of our asm call
- return address of HookedFunc
- HookedFunc params
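The layout above can be written down as slot indices from the top of the stack. This is only a sketch under two assumptions that the text does not state: the registers are saved with a single pushad (eight 32-bit values, edi ending up at the lowest address), and the parameters were pushed right to left (stdcall or cdecl), so the first parameter is the closest to the return address. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of each 32-bit slot, counted from the top of the stack (esp). */
enum {
    SLOT_EDI, SLOT_ESI, SLOT_EBP, SLOT_ESP,   /* registers, assuming pushad */
    SLOT_EBX, SLOT_EDX, SLOT_ECX, SLOT_EAX,
    SLOT_HOOK_OBJECT,      /* pointer to the hooked function object */
    SLOT_STUB_RETURN,      /* return address of our asm call */
    SLOT_CALLER_RETURN,    /* return address of HookedFunc */
    SLOT_FIRST_PARAM       /* HookedFunc params start here */
};

/* Read parameter number `index` (0-based) of the hooked function. */
static uint32_t hooked_param(const uint32_t *stack_top, size_t index)
{
    assert(stack_top != NULL);
    return stack_top[SLOT_FIRST_PARAM + index];
}
```

Copying parameters for logging then amounts to reading hooked_param(stack_top, 0) up to the declared number of parameters.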

To call the original function inside the hook, we first have to restore the original asm instructions (this temporarily removes the hook).
Next, as we know the number of parameters (it is given in the configuration file), we can push them on the stack again and call the original function or another one.

As a safety margin, we push more data than the parameters declared in the configuration file. This avoids a crash in case of a bad configuration file.
To log parameters, we just copy them from the stack before and/or after the function call.

After the call, we restore the hook to be ready to log the next function call.

Finally, we restore the stack to the state it would be in after a call without the hook.
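The remove / call / restore sequence can be sketched on a plain byte buffer standing in for the function code. HookRecord and the helper names are hypothetical, and a real implementation writes into executable memory rather than into an array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PATCH_SIZE 5

typedef struct {
    uint8_t *target;                /* first bytes of the hooked function */
    uint8_t  original[PATCH_SIZE];  /* saved original instructions */
    uint8_t  patch[PATCH_SIZE];     /* "call OurHandlerAddr" bytes */
} HookRecord;

static void hook_init(HookRecord *h, uint8_t *target,
                      const uint8_t patch[PATCH_SIZE])
{
    assert(h != NULL && target != NULL && patch != NULL);
    h->target = target;
    memcpy(h->original, target, PATCH_SIZE);   /* remember what we overwrite */
    memcpy(h->patch, patch, PATCH_SIZE);
}

static void hook_install(HookRecord *h) { memcpy(h->target, h->patch, PATCH_SIZE); }
static void hook_remove(HookRecord *h)  { memcpy(h->target, h->original, PATCH_SIZE); }

typedef int (*OriginalCall)(void *ctx);

/* Standard hook: remove the hook, run the original code, put the hook back. */
static int call_original(HookRecord *h, OriginalCall call, void *ctx)
{
    hook_remove(h);
    int result = call(ctx);   /* calls arriving during this window are not hooked */
    hook_install(h);
    return result;
}

/* Stand-in for the hooked API: reports whether its first byte is patched. */
static int first_byte_is_patched(void *ctx)
{
    const HookRecord *h = ctx;
    return h->target[0] == h->patch[0];
}
```

The window between hook_remove and hook_install is where calls from other threads can be lost, which is what the particular cases below try to address.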

Particular cases:

   "|FirstBytesCanExecuteAnywhere"

The big difference from other hooking methods is that the hook is not removed when calling the original function, so no call can be lost.

Since version 3.1, WinAPIOverride includes a length disassembler, so, depending on the options, the first bytes of each function are automatically analysed to try to install such a hook.
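One common way to keep a hook in place while still running the original code is a trampoline: the overwritten first instructions are copied elsewhere and followed by a jump back into the function. The sketch below assumes that this is what the option relies on; the text does not spell it out. It also assumes the first moved_len bytes are whole, position independent instructions, which is what a length disassembler helps to establish. build_trampoline is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_SIZE 5 /* jmp rel32: opcode 0xE9 + 4 displacement bytes */

/* Build, in `out`, a copy of the first `moved_len` bytes of the function
 * followed by "jmp func_addr + moved_len". trampoline_addr is the address
 * at which `out` will live in the target process.
 * Returns the number of bytes written. */
static size_t build_trampoline(const uint8_t *first_bytes, size_t moved_len,
                               uint32_t func_addr, uint32_t trampoline_addr,
                               uint8_t *out)
{
    /* The moved bytes must cover the whole 5-byte hook patch. */
    assert(first_bytes != NULL && out != NULL && moved_len >= JMP_SIZE);

    memcpy(out, first_bytes, moved_len);

    uint32_t jmp_addr = trampoline_addr + (uint32_t)moved_len;
    uint32_t resume   = func_addr + (uint32_t)moved_len;
    uint32_t rel      = resume - (jmp_addr + JMP_SIZE);

    out[moved_len]     = 0xE9;
    out[moved_len + 1] = (uint8_t)(rel);
    out[moved_len + 2] = (uint8_t)(rel >> 8);
    out[moved_len + 3] = (uint8_t)(rel >> 16);
    out[moved_len + 4] = (uint8_t)(rel >> 24);
    return moved_len + JMP_SIZE;
}
```

Calling the trampoline instead of the function entry runs the original instructions without ever touching the patched bytes.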

"|BlockingCall"

Much like a standard hook, but the hook is restored after a timeout rather than when the original function returns.
This is useful for slow blocking functions like MessageBox, because a new call can be hooked even if the first one has not finished.
It is also convenient because you don't need to know the size of the movable asm code, as you do with the FirstBytesCanExecuteAnywhere option.

Disadvantages:
- you can still lose calls
- As thread creation is time consuming, this method is pointless for high speed blocking calls like SendMessage, because you lose more calls than with the standard method. Another problem with high speed blocking calls: if the restoration of the original opcode in the thread happens at the same time as the hook removal done by the generic C handler for another call, the application can crash.

What is no stack hooking and how it works?

Hooking needs to store some information.
With standard hooking, this information is stored on the stack, then the hooked function is called, and after the call the stack is restored according to the function's calling convention.
Doing this affects the esp and ebp registers, so if the hooked function uses the values of these registers (or stack relative values), the hooked function call will fail or return a bad result.
Note that with the /Oy optimization, ebp can be used as a general purpose register and hold an arbitrary value.

With "no stack hooking", only the return address is modified. The great advantage is that esp and ebp are not changed, so functions using these register values (or stack relative values) can be hooked.
To avoid stack and ebp changes, the required information must be allocated on the heap.
For thread safety, we use Thread Local Storage (TLS) to store the heap allocation pointers.
Notice: no stack hooking allows unhooking a function even if a call to it has not finished: we just need to restore the exception handler and the return address, and the hooked process won't crash. This is impossible with stack hooking (we have to wait for the hooked function to return in order to restore the stack correctly).
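The bookkeeping described above (per-call information on the heap, reached through a per-thread pointer) can be sketched as follows. This uses the C11 _Thread_local keyword for brevity, whereas Windows code of that era would more likely use TlsAlloc / TlsGetValue or __declspec(thread). The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-call information kept on the heap instead of on the stack. */
typedef struct HookCallInfo {
    uintptr_t original_return;       /* return address we replaced */
    struct HookCallInfo *previous;   /* enclosing hooked call on this thread */
} HookCallInfo;

/* One list head per thread, so concurrent hooked calls do not interfere. */
static _Thread_local HookCallInfo *tls_current_call = NULL;

/* Called on entry to a hooked function. Returns 0 if allocation fails. */
static int enter_hooked_call(uintptr_t original_return)
{
    HookCallInfo *info = malloc(sizeof *info);
    if (info == NULL)
        return 0;
    info->original_return = original_return;
    info->previous = tls_current_call;
    tls_current_call = info;
    return 1;
}

/* Called when the hooked function returns through our replaced return
 * address. Gives back the address the function must really return to. */
static uintptr_t leave_hooked_call(void)
{
    HookCallInfo *info = tls_current_call;
    assert(info != NULL);
    uintptr_t original_return = info->original_return;
    tls_current_call = info->previous;
    free(info);
    return original_return;
}
```

Keeping the records in a per-thread list handles nested hooked calls: the innermost call returns first and pops its own record.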

The difficulty with no stack hooking is catching exceptions.
Please read Matt Pietrek's great article "A Crash Course on the Depths of Win32 Structured Exception Handling", MSJ January 1997, for information on exceptions (and to understand the lines below).
A try / catch allocates a small piece of memory on the stack, and, for security reasons, the C++ runtime checks that the addresses of try / catch blocks belong to the stack (so you can't allocate this piece of information on the heap).
So the only way is to replace the existing exception filtering callback with our own.
When doing this, we MUST register our filtering callback function as a safe structured exception handler (using the SAFESEH option), otherwise XP SP2 and newer systems won't execute it.

Flow :

Impact on stack :

How hooking at startup works?

There are three different ways of hooking processes at startup:

   - When we use the "Attach at application startup" method, we create the process in a suspended state, so we know that the entry point has not been reached.
We could set a breakpoint and debug the process, but debugging a process changes some of its attributes. So we use a different way: we install a hook at the entry point.
Hence the entry point content is replaced by jmp OurHookAddress,
where OurHook is a function that loads the APIOverride dll, restores the original content of the entry point and suspends the process again to allow the user to load his monitoring and overriding files.
Of course this function doesn't exist in the process we want to hook, so we have to inject its code through calls to VirtualAllocEx and WriteProcessMemory.
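Hooking the entry point first requires knowing where it is. The text does not say how the address is obtained; one possible way, sketched here, is to read AddressOfEntryPoint from the PE headers of the executable (it sits 16 bytes into the optional header, for both PE32 and PE32+) and add it to the image base. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Extract the entry point RVA from the start of a PE file.
 * Returns 1 on success, 0 if the buffer does not look like a PE image. */
static int pe_entry_point_rva(const uint8_t *image, size_t size, uint32_t *rva)
{
    assert(image != NULL && rva != NULL);

    /* DOS header: "MZ" magic, e_lfanew at offset 0x3C. */
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return 0;
    uint32_t pe_off = read_u32le(image + 0x3C);

    /* Signature (4) + file header (20) + 16 bytes into the optional
     * header, plus the 4 bytes of the field itself. */
    if (pe_off > size || size - pe_off < 4 + 20 + 16 + 4)
        return 0;
    if (image[pe_off] != 'P' || image[pe_off + 1] != 'E' ||
        image[pe_off + 2] != 0 || image[pe_off + 3] != 0)
        return 0;

    *rva = read_u32le(image + pe_off + 4 + 20 + 16);
    return 1;
}
```

The address to patch with "jmp OurHookAddress" is then the module base address in the target process plus this RVA.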

   - When we use the "Inject before statically linked dll execution" method, we have to insert our dll into the import table. It must come before the existing imported dlls, because the NT loader sequence is the following:
      Put all modules in memory

      Call first dll Tls
      Call first dll DllMain

      Call second dll Tls
      Call second dll DllMain

      ...

      Call exe Tls
      Call exe main
So by being the first imported dll, our DllMain is the first code executed, so we can do all the work we need and take control before any other dll runs.

Injection is done by creating a new import section and redirecting the Import Table (the IMAGE_DIRECTORY_ENTRY_IMPORT index of the NTHeader.OptionalHeader.DataDirectory array of the PE header). See the Microsoft PE specification for PE information.
Notice 1: our dll must export at least one function, because if a dll is declared in the Import Directory Table (IDT) but no function is imported in the Import Lookup Table or Import Address Table, our dll won't be loaded by the NT loader.
Notice 2: Using CreateProcess with the suspended state is useless here, because the NT loader executes the Tls and DllMain of imported dlls before suspending (and giving us control).
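The redirected Import Directory Table can be pictured as the old table with one extra descriptor in front. The sketch below builds such a table in ordinary memory. The five 32-bit fields follow the import descriptor layout of the PE specification, while the helper names and the RVA values in the usage note are made up for illustration; writing the table into a new section and updating the data directory are not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One Import Directory Table entry (20 bytes), as laid out in a PE file. */
typedef struct {
    uint32_t OriginalFirstThunk;  /* RVA of the Import Lookup Table */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;                /* RVA of the dll name string */
    uint32_t FirstThunk;          /* RVA of the Import Address Table */
} ImportDescriptor;

/* The table ends with an entry whose fields are all zero. */
static int is_terminator(const ImportDescriptor *d)
{
    return d->OriginalFirstThunk == 0 && d->TimeDateStamp == 0 &&
           d->ForwarderChain == 0 && d->Name == 0 && d->FirstThunk == 0;
}

/* Return a newly allocated table with `ours` first, followed by every entry
 * of `old_table` and a fresh terminator. *new_count receives the number of
 * entries, terminator excluded. Returns NULL if allocation fails. */
static ImportDescriptor *prepend_descriptor(const ImportDescriptor *old_table,
                                            const ImportDescriptor *ours,
                                            size_t *new_count)
{
    assert(old_table != NULL && ours != NULL && new_count != NULL);

    size_t n = 0;
    while (!is_terminator(&old_table[n]))
        n++;

    ImportDescriptor *table = calloc(n + 2, sizeof *table);
    if (table == NULL)
        return NULL;

    table[0] = *ours;                               /* our dll comes first */
    memcpy(&table[1], old_table, n * sizeof *table);
    /* table[n + 1] is already zeroed by calloc: the terminator. */
    *new_count = n + 1;
    return table;
}
```

With an original table importing two dlls, the new table holds three descriptors, ours at index 0, so the loader processes our dll before the others.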

   - When we use the "Attach to all new processes" option, we don't create the process ourselves, so we don't know how far its startup has progressed when the procmon driver callback is called.
So we can't hook the entry point of the process (it may already have been reached), and we can't inject the dll as soon as we enter the callback, because the NT loader may not have finished its job (and if we try to inject at that moment, the application crashes).
So we have to poll until the NT loader has finished. As soon as a module is loaded into the process, we can perform the injection.
To detect whether the first module has been loaded, we poll the Module32First API: as soon as it returns TRUE, at least the exe module is loaded, so we can inject our library.

How remote call works?

To call a function in another process, we have to transmit its parameters by any available means (shared memory, mailslot, pipe, tcp ...), and then push them onto the stack before calling the function address. We then transmit the parameters and the return value back to our process.
The following code can call a function with any parameter or struct, given only a SIMPLE pointer to each parameter or struct.

Copyright Jacquelin POTIER 2006 Sources under GPL V2 license

The following structure is used to define each parameter.

typedef struct _STRUCT_FUNC_PARAM
{
    BOOL bPassAsRef;    // TRUE if the param is passed by reference
    DWORD dwDataSize;   // size in bytes
    PBYTE pData;        // pointer to data
}STRUCT_FUNC_PARAM,*PSTRUCT_FUNC_PARAM;

With pFunc the address of the function to call, NbParams the number of parameters and pParams a STRUCT_FUNC_PARAM array, the code is the following:

    _asm
    {
        // store esp to restore it without caring about calling convention
        mov [dwOriginalESP],ESP
 
        // use a security to avoid crashing in case of bad parameters number
        Sub ESP, [dwEspSecuritySize]
    }
    // make things cleaner :D
    try
    {
        // set our allocated memory to 0 
        // (warning: memory is allocated by esp-xxx, so to clear the buffer we have to go from the new esp address to the old one)
        memset((PBYTE)(dwOriginalESP-dwEspSecuritySize),0,dwEspSecuritySize);
        // for each param
        for (cnt=NbParams-1;cnt>=0;cnt--)
        {
            pCurrentParam=&pParams[cnt];
 
            // if params should be passed as ref
            if (pCurrentParam->bPassAsRef)
            {
                // push param address
                dw=(DWORD)pCurrentParam->pData;
                _asm
                {
                    mov eax,dw
                    push eax
                }
            }
            else // we have to push param value
            {
                // byte
                if (pCurrentParam->dwDataSize==1)
                {
                    b=pCurrentParam->pData[0];
                    _asm
                    {
                        mov al,b
                        push eax
                    }
                }
                // short
                else if (pCurrentParam->dwDataSize==2)
                {
                    memcpy(&us,pCurrentParam->pData,2);
                    _asm
                    {
                        mov ax,us
                        push eax
                    }
                }
                // dword
                else if (pCurrentParam->dwDataSize==4)
                {
                    memcpy(&dw,pCurrentParam->pData,4);
                    _asm
                    {
                        mov eax,dw
                        push eax
                    }
                }
                // more than dword
                else
                {
                    // as we are not always 4 bytes aligned we can't do a loop with push
 
                    // allocate necessary space in stack
                    dwDataSize=pCurrentParam->dwDataSize;
                    _asm
                    {
                        sub esp, [dwDataSize]
                        mov [dwCurrentESP],esp
                    }
 
                    // copy data to stack
                    memcpy((PVOID)dwCurrentESP,pCurrentParam->pData,dwDataSize);
 
                }
            }
        }
        // now all params are pushed in stack --> just make call
        _asm
        {
            ////////////////////////////////
            // save local registers
            ////////////////////////////////
            mov [LocalRegisters.eax],eax
            mov [LocalRegisters.ebx],ebx
            mov [LocalRegisters.ecx],ecx
            mov [LocalRegisters.edx],edx
            mov [LocalRegisters.esi],esi
            mov [LocalRegisters.edi],edi
            pushfd
            pop [LocalRegisters.efl]
 
            ////////////////////////////////
            // set registers as wanted
            ////////////////////////////////
            mov eax, [Registers.eax]
            mov ebx, [Registers.ebx]
            mov ecx, [Registers.ecx]
            mov edx, [Registers.edx]
            mov esi, [Registers.esi]
            mov edi, [Registers.edi]
            push [Registers.efl]
            popfd
 
            // call func
            call pFunc
 
            // save registers after call
            mov [Registers.eax],eax
            mov [Registers.ebx],ebx
            mov [Registers.ecx],ecx
            mov [Registers.edx],edx
            mov [Registers.esi],esi
            mov [Registers.edi],edi
            pushfd
            pop [Registers.efl]
 
 
            // put pointer to return address in ecx
            mov ecx,[pRet]
            // put return value in the address pointed by ecx --> *pRet=eax
            mov eax,[Registers.eax]
            mov [ecx],eax
 
            fst qword ptr [FloatingResult]
 
            ////////////////////////////////
            // restore local registers
            ////////////////////////////////
            mov eax, [LocalRegisters.eax]
            mov ebx, [LocalRegisters.ebx]
            mov ecx, [LocalRegisters.ecx]
            mov edx, [LocalRegisters.edx]
            mov esi, [LocalRegisters.esi]
            mov edi, [LocalRegisters.edi]
            push [LocalRegisters.efl]
            popfd
        }
 
        bRet=TRUE;
    }
 
    catch(...)
    {
        bRet=FALSE;
    }
 
    _asm
    {
        // restore esp (works for both calling convention)
        mov ESP,[dwOriginalESP]
    }
 
    if (bRet)
        memcpy(pRegisters,&Registers,sizeof(REGISTERS));
 
    *pFloatingResult=FloatingResult;
 
    return bRet;
}

For example, for the GetTempPathA(DWORD nBufferLength,LPSTR lpBuffer) function in kernel32, the STRUCT_FUNC_PARAM array can be filled as follows:

STRUCT_FUNC_PARAM pParams[2];
DWORD dwParam=255;
char pc[255];
  
pParams[0].bPassAsRef=FALSE;
pParams[0].dwDataSize=4;
pParams[0].pData=(PBYTE)&dwParam;
pParams[1].bPassAsRef=TRUE;
pParams[1].dwDataSize=dwParam;
pParams[1].pData=(PBYTE)pc; 
 
pApiOverride->ProcessInternalCall(_T("Kernel32.dll"),_T("GetTempPathA"),2,pParams,&dwResult,dwTimeOut);
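Before such a parameter array can cross the process boundary, the data it points to has to be flattened, since pointers are meaningless in the other process. The following is a sketch of one possible flat layout (a count, then for each parameter its flag, its size and its bytes). The layout, the FuncParam type (a portable stand-in for STRUCT_FUNC_PARAM) and the function name are assumptions made for illustration, not the format WinAPIOverride actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for STRUCT_FUNC_PARAM. */
typedef struct {
    uint32_t       pass_as_ref;  /* non-zero if passed by reference */
    uint32_t       data_size;    /* size of the data in bytes */
    const uint8_t *data;         /* pointer to the data */
} FuncParam;

/* Hypothetical wire layout:
 *   u32 count, then for each param: u32 pass_as_ref, u32 data_size, data.
 * Returns the number of bytes written, or 0 if `capacity` is too small. */
static size_t serialize_params(const FuncParam *params, uint32_t count,
                               uint8_t *out, size_t capacity)
{
    assert(out != NULL && (params != NULL || count == 0));

    size_t needed = sizeof(uint32_t);
    for (uint32_t i = 0; i < count; i++)
        needed += 2 * sizeof(uint32_t) + params[i].data_size;
    if (needed > capacity)
        return 0;

    uint8_t *p = out;
    memcpy(p, &count, sizeof count);
    p += sizeof count;
    for (uint32_t i = 0; i < count; i++) {
        memcpy(p, &params[i].pass_as_ref, sizeof(uint32_t));
        p += sizeof(uint32_t);
        memcpy(p, &params[i].data_size, sizeof(uint32_t));
        p += sizeof(uint32_t);
        memcpy(p, params[i].data, params[i].data_size);
        p += params[i].data_size;
    }
    return needed;
}
```

On the receiving side, the injected dll would rebuild a parameter array whose data pointers refer to its own copy of the bytes, and send by-reference buffers back once the call has returned.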

How process creation monitoring is done?

Everything is achieved by a call to the PsSetCreateProcessNotifyRoutine API.
This function puts a callback in place, which is called upon every process creation and receives the process ID and the parent process ID as parameters.

Sounds easy, doesn't it?
Not really, as this function can only be called in kernel mode, which means we have to build a driver (ProcMonDrvJP[64].sys).
All sources of this driver are located in the Tools\Process\ProcessMonitor\Src\ProcMon directory, but as the Windows DDK (Driver Development Kit) is required to build it, the debug and release builds of the driver are included with the sources.
To manage this driver easily, the CProcMonInterface class has been written (located in Tools\Process\ProcessMonitor).
It lets you define a callback (in user mode of course), start and stop the driver, and start or stop process creation monitoring.