Thread overhead

        1. What is the Thread Kernel Object?

        2. What is the Thread Environment Block (TEB)?

        3. What is the Windows User mode stack?

        4. What is the Kernel mode stack?

        5. DLL_THREAD_ATTACH and DLL_THREAD_DETACH flags

        6. What is context switching?

        7. How does context switching hurt performance?

The Thread Kernel Object is a data structure that the OS creates for each thread. It holds the thread context, which is a copy of the CPU registers. It occupies about 700 bytes on x86 (32-bit) processors and about 1,240 bytes on x64 processors.

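The Thread Environment Block (TEB) holds the thread's thread-local storage (TLS) data, which gives each thread its own copy of selected static and global variables. It also holds the head of the thread's structured exception handling (SEH) chain: when the thread enters an SEH block, a node is inserted at the head of this chain, and the node is removed when the thread leaves that block's scope. The TEB occupies one page of memory (4 KB on both x86 and x64).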

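The User mode stack holds the arguments and local variables of the methods the thread calls. It also keeps the return address, which tells the thread where to continue executing when the current method returns. By default Windows sets aside about 1 MB for it.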

The Kernel mode stack holds the data that application code passes to kernel-mode functions. The OS copies these arguments from the user mode stack to the kernel mode stack and validates them there, so the application code cannot manipulate the data once the kernel has started working with it. The kernel-mode functions also use the kernel mode stack for their own calls. The kernel mode stack is 12 KB on 32-bit Windows and 24 KB on 64-bit Windows.
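To get a feel for how these four structures add up, the figures above can be summed per thread. This is only a rough sketch: the names `THREAD_COSTS`, `per_thread_bytes` and `total_bytes` are made up for illustration, and the whole 1 MB user mode stack is counted even though Windows may only reserve that address space and commit pages as they are needed.

```python
# Approximate per-thread memory costs quoted in the text, in bytes.
KB = 1024
MB = 1024 * KB

# Hypothetical table for illustration; the numbers come from the paragraphs above.
THREAD_COSTS = {
    "x86": {
        "kernel_object": 700,
        "teb": 4 * KB,
        "user_stack": 1 * MB,
        "kernel_stack": 12 * KB,
    },
    "x64": {
        "kernel_object": 1240,
        "teb": 4 * KB,
        "user_stack": 1 * MB,
        "kernel_stack": 24 * KB,
    },
}


def per_thread_bytes(arch: str) -> int:
    """Sum the four per-thread structures for the given architecture."""
    return sum(THREAD_COSTS[arch].values())


def total_bytes(arch: str, thread_count: int) -> int:
    """Estimate the memory set aside for thread_count threads."""
    return per_thread_bytes(arch) * thread_count


if __name__ == "__main__":
    for arch in THREAD_COSTS:
        per_thread = per_thread_bytes(arch)
        hundred = total_bytes(arch, 100) / MB
        print(f"{arch}: {per_thread} bytes per thread, {hundred:.1f} MB for 100 threads")
```

The user mode stack dominates the total, so a process with a hundred threads sets aside roughly a hundred megabytes before any of those threads does useful work.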

DLL thread-attach and thread-detach notifications. Windows has a policy that every DLL loaded into a process has its DllMain() function called whenever a thread is created or destroyed in that process. When a thread is created, Windows calls each DllMain() with the DLL_THREAD_ATTACH flag, and when the thread dies, Windows calls each DllMain() with the DLL_THREAD_DETACH flag, so that every DLL is notified and can release the resources it holds for that thread.
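The cost of this policy grows with both the number of loaded DLLs and the number of threads. The toy model below is not Windows code: `FakeProcess` and its methods are hypothetical names that only count how many DllMain() calls the policy would trigger. The two flag values match the constants defined in the Windows headers.

```python
from collections import Counter

# Reason codes passed to DllMain(), as defined in the Windows headers.
DLL_THREAD_ATTACH = 2
DLL_THREAD_DETACH = 3


class FakeProcess:
    """Toy model of a process that notifies every loaded DLL about threads."""

    def __init__(self, dll_names):
        self.dll_names = list(dll_names)
        self.calls = Counter()  # (dll name, reason) -> number of simulated DllMain calls

    def _notify_all(self, reason):
        # One simulated DllMain() call per loaded DLL.
        for dll in self.dll_names:
            self.calls[(dll, reason)] += 1

    def create_thread(self):
        self._notify_all(DLL_THREAD_ATTACH)

    def exit_thread(self):
        self._notify_all(DLL_THREAD_DETACH)

    def total_calls(self):
        return sum(self.calls.values())


if __name__ == "__main__":
    process = FakeProcess(["a.dll", "b.dll", "c.dll"])
    for _ in range(10):
        process.create_thread()
        process.exit_thread()
    print(process.total_calls())  # 3 DLLs * 10 threads * 2 notifications
```

With three DLLs, creating and destroying ten threads already means sixty function calls, and real processes often load far more DLLs than that.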

Windows gives each thread a time-slice (quantum) of about 30 ms. Every context switch requires Windows to:

  1. Copy the values in the CPU's registers to the context structure of the currently running thread.
  2. Select one thread from the set of existing threads to schedule next. If that thread is owned by another process, Windows must also switch the virtual address space before it starts executing any code.
  3. Copy the values from the selected thread's context structure into the CPU's registers.
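The three steps above can be sketched as a small simulation. It is a simplified model, not how the Windows scheduler is implemented: `Thread`, `Cpu` and `context_switch` are hypothetical names, the registers are a plain dictionary, and choosing the next thread (step 2) is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    name: str
    process: str
    # Saved register values; stands in for the thread context structure.
    context: dict = field(default_factory=dict)


class Cpu:
    def __init__(self):
        self.registers = {}
        self.address_space = None  # process whose virtual address space is active
        self.address_space_switches = 0

    def context_switch(self, current, next_thread):
        # Step 1: save the registers into the outgoing thread's context.
        if current is not None:
            current.context = dict(self.registers)
        # Step 2: the caller has selected next_thread; switch the address
        # space only if it belongs to a different process.
        if self.address_space != next_thread.process:
            self.address_space = next_thread.process
            self.address_space_switches += 1
        # Step 3: load the incoming thread's saved context into the registers.
        self.registers = dict(next_thread.context)


if __name__ == "__main__":
    cpu = Cpu()
    a = Thread("a", "p1", {"ip": 10})
    b = Thread("b", "p1", {"ip": 20})
    c = Thread("c", "p2", {"ip": 30})
    cpu.context_switch(None, a)
    cpu.registers["ip"] = 11   # thread a makes some progress
    cpu.context_switch(a, b)   # same process: no address space switch
    cpu.context_switch(b, c)   # different process: address space switch
    print(a.context, cpu.address_space_switches)
```

Switching between two threads of the same process skips the address space change, which is one reason such switches are cheaper than switches across processes.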

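While a thread runs, the CPU's cache improves performance by keeping the code and data the thread recently used close to the CPU. After a context switch the new thread usually works with different code and data, so the CPU has to fetch them from RAM to repopulate the cache before this thread runs at full speed again.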

If the thread that Windows selects to run next is the same thread that is currently running, Windows does not perform a context switch and simply lets the thread continue running.
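A rough way to see the effect of the time-slice and of the "same thread" shortcut is to simulate round-robin scheduling on a single core. This sketch makes several assumptions: the function name `run_round_robin` is invented for illustration, all threads are always ready to run, and the time spent on the switch itself is ignored.

```python
from collections import deque

QUANTUM_MS = 30  # approximate time-slice quoted in the text


def run_round_robin(work_ms):
    """Simulate one core; return (elapsed_ms, context_switches).

    work_ms maps a thread name to the CPU time (in ms) it still needs.
    """
    ready = deque(work_ms.items())
    running = None
    elapsed = 0
    switches = 0
    while ready:
        name, remaining = ready.popleft()
        if running is not None and name != running:
            switches += 1  # a different thread was picked, so a switch is needed
        running = name
        ran = min(QUANTUM_MS, remaining)
        elapsed += ran
        if remaining > ran:
            ready.append((name, remaining - ran))
    return elapsed, switches


if __name__ == "__main__":
    print(run_round_robin({"a": 90}))           # one thread: it just keeps running
    print(run_round_robin({"a": 60, "b": 60}))  # two threads take turns
```

A single runnable thread uses three time-slices in a row without any switch, while two threads that each need two time-slices force a switch every time a time-slice ends.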

On hardware with multiple CPU cores, Windows tries to schedule a thread on the same core it last ran on, so that data still held in that core's cache can be reused.