C ++: Cunning and Love, or What Could Go Wrong?





“C makes it easy to shoot yourself in the foot. It's harder to do this in C ++, but it'll take a whole leg off. ”- Björn Stroustrup, C ++ Creator.


In this article, we'll show you how to write stable, secure, and reliable code, and how easy it is to actually break it completely unintentionally. For this we have tried to collect the most useful and fascinating material.







At SimbirSoft, we work closely with the Secure Code Warrior project to train other developers to create secure solutions. Especially for Habr, we translated an article written by our author for the CodeProject.com portal.



So to the code!



Here is a small piece of abstract C ++ code. This code was specially written in order to demonstrate all kinds of problems and vulnerabilities that can potentially be found on very real projects. As you can see, this is a Windows DLL code (this is an important point). Suppose someone is going to use this code in some (safe, of course) solution.



Take a closer look at the code. What, in your opinion, could go wrong in it?



The code
class Finalizer
{
    struct Data
    {
        int i = 0;
        char* c = nullptr;
        
        union U
        {
            long double d;
            
            int i[sizeof(d) / sizeof(int)];
            
            char c [sizeof(i)];
        } u = {};
        
        time_t time;
    };
    
    struct DataNew;
    DataNew* data2 = nullptr;
    
    typedef DataNew* (*SpawnDataNewFunc)();
    SpawnDataNewFunc spawnDataNewFunc = nullptr;
    
    typedef Data* (*Func)();
    Func func = nullptr;
    
    Finalizer()
    {
        func = GetProcAddress(OTHER_LIB, "func")
        
        auto data = func();
        
        auto str = data->c;
        
        memset(str, 0, sizeof(str));
        
        data->u.d = 123456.789;
        
        const int i0 = data->u.i[sizeof(long double) - 1U];
        
        spawnDataNewFunc = GetProcAddress(OTHER_LIB, "SpawnDataNewFunc")
        data2 = spawnDataNewFunc();
    }
    
    ~Finalizer()
    {
        auto data = func();
        
        delete[] data2;
    }
};

Finalizer FINALIZER;

HMODULE OTHER_LIB;
std::vector<int>* INTEGERS;

DWORD WINAPI Init(LPVOID lpParam)
{
    OleInitialize(nullptr);
    
    ExitThread(0U);
}

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
    static std::vector<std::thread::id> THREADS;
    
    switch (fdwReason)
    {
        case DLL_PROCESS_ATTACH:
            CoInitializeEx(nullptr, COINIT_MULTITHREADED);
            
            srand(time(nullptr));
            
            OTHER_LIB = LoadLibrary("B.dll");
            
            if (OTHER_LIB = nullptr)
                return FALSE;
            
            CreateThread(nullptr, 0U, &Init, nullptr, 0U, nullptr);
        break;
        
        case DLL_PROCESS_DETACH:
            CoUninitialize();
            
            OleUninitialize();
            {
                free(INTEGERS);
                
                const BOOL result = FreeLibrary(OTHER_LIB);
                
                if (!result)
                    throw new std::runtime_error("Required module was not loaded");
                
                return result;
            }
        break;
        
        case DLL_THREAD_ATTACH:
            THREADS.push_back(std::this_thread::get_id());
        break;
        
        case DLL_THREAD_DETACH:
            THREADS.pop_back();
        break;
    }
    return TRUE;
}

__declspec(dllexport) int Initialize(std::vector<int> integers, int& c) throw()
{
    for (int i : integers)
        i *= c;
    
    INTEGERS = new std::vector<int>(integers);
}

int Random()
{
    return rand() + rand();
}

__declspec(dllexport) long long int __cdecl _GetInt(int a)
{
    return 100 / a <= 0 ? a : a + 1 + Random();
}




Perhaps you found this code simple, obvious, and safe enough? Or maybe you found some problems in it? Maybe even a dozen or two?



Well, there are actually over 43 potential threats of varying degrees of significance in this snippet !







What you should pay attention to



1) sizeof (d) (where d is a long double) is not necessarily a multiple of sizeof (int)



int i[sizeof(d) / sizeof(int)];


This situation is not tested or handled here. For example, a long double can be 10 bytes on some platforms (which is not true for the MS VS compiler , but true for RAD Studio , formerly known as C ++ Builder ).



int can also be of different sizes depending on the platform (the above code is for Windows , therefore, in relation to this particular situation, the problem is somewhat contrived, but for portable code this problem is very relevant).



All of this can become a problem if we want to use the so-called typing pun . By the way, it causes undefined behavioraccording to the C ++ language standard. It is common practice to use the typing pun , however, since modern compilers usually define the correct, expected behavior for a given case (as, for example, GCC does ).







Source: Medium.com



By the way, unlike C ++, in modern C, the typing pun is perfectly acceptable (you know that C ++ and C are different languages , and you should n't expect to know C if you know C ++, and the other way around, right?)



Solution: use static_assertto control all such assumptions at compile time. It will warn you if something goes wrong with the type sizes:



static_assert(0U == (sizeof(d) % sizeof(int)), “Houston, we have a problem”);


2) time_t is a macro, in Visual Studio it can refer to 32-bit (old) or 64-bit (new) integer type



time_t time;


Access to a variable of this type from different executable modules (for example, the executable file and the DLL it loads) can lead to read / write outside the bounds of the object if the two binaries are compiled with a different physical representation of this type. Which, in turn, will lead to memory corruption or garbage reading.







Solution: make sure that the same types of a strictly defined size are used for data exchange between all modules:



int64_t time;


3) B.dll (the handle of which is stored by the OTHER_LIB variable ) has not yet been loaded at the moment when we access the above variable, so we cannot get the addresses of the functions of this library



4) the problem with the initialization order of static objects ( SIOF ): ( OTHER_LIB object used in code before it was initialized)



func = GetProcAddress(OTHER_LIB, "func");


FINALIZER is a static object that is created before calling the DllMain function . In its constructor, we are trying to use a library that has not been loaded yet. The problem is compounded by the fact that the static OTHER_LIB that is used by the static FINALIZER is placed in the translation unit downstream. This means that it will also be initialized (zeroed) later. That is, at the time when it will be accessed, it will contain some pseudo-random garbage. WinAPIin general, it should react normally to this, because with a high degree of probability there will simply not be a loaded module with such a descriptor at all. And even if there is an absolutely incredible coincidence and it will still be - it is unlikely that it will have a function named "Func" .



Solution: General advice is to avoid using global objects, especially complex ones, especially if they depend on each other, especially in DLLs . However, if you still need them for any reason, be extremely careful and careful with the order in which they are initialized. To control this order , put all instances (definitions) of global objects into one translation unitin the correct order to ensure they are initialized correctly.



5) previously returned result is not checked before use



auto data = func();


func is a function pointer . And it should point to a function from B.dll . However, since we completely failed everything in the previous step, this will be nullptr . Thus, trying to dereference it, instead of the expected function call, we will receive an access violation or a general protection fault or something like that.



Solution: when working with external code (in our case with WinAPI ), always check the return result of the called functions. For reliable and fault-tolerant systems, this rule even applies to functions for which there is a strict contract [about what they should return and when].



6) reading / writing garbage when exchanging data between modules compiled with different alignment / padding settings



auto str = data->c;


If Data structure (which is used for exchanging information between communication modules) has these same modules in different physical presentation will result in all the previously mentioned access violation , an error memory protection , fault segmentation , heap corruption , etc. Or we'll just read the garbage. The exact result will depend on the actual usage scenario for this memory. All of this can happen because there are no explicit alignment / padding settings for the structure itself . Therefore, if these global settings at the time of compilation were different for interacting modules, we will have problems.







Decision:make sure that all shared data structures have a strong, explicitly defined and obvious physical representation (using fixed-size types, explicitly specified alignment, etc.) and / or interoperable binaries have been compiled with the same global alignment settings / filling.



see also
Alignment (C++ Declarations)

Data structure alignment

Struct padding in C++



7) using the size of a pointer to an array instead of the size of the array itself



memset(str, 0, sizeof(str));


This is usually the result of a trivial typo. But this problem can also potentially arise when dealing with static polymorphism or when the auto keyword is used thoughtlessly ( especially when it is clearly being overused ). One would like to hope, however, that modern compilers are smart enough to detect such problems at compile time, using the capabilities of an internal static analyzer .



Decision:



  • never confuse sizeof ( <full object type> ) and sizeof ( <object pointer type> );
  • do not ignore compiler warnings ;

  • you can also use a bit of C ++ boilerplate magic by combining typeid, constexpr, and static_assert to ensure that types are correct at compile time ( type traits can also be useful here , in particular std :: is_pointer ).


8) undefined behavior when trying to read a different union field than what was previously used to set the value



9) an attempt to read out of bounds is possible if the length of a long double differs between binary modules



const int i0 = data->u.i[sizeof(long double) - 1U];


This was already mentioned earlier, so here we just got another point of presence of the previously mentioned problem.



Solution: Don't refer to a field other than the one you set earlier unless you are sure your compiler is handling it correctly. Make sure the sizes of the shared object types are the same across all interacting modules.



see also
Type-punning and strict-aliasing

What is the Strict Aliasing Rule and Why do we care?



10) even if B.dll was loaded correctly and the "func" function was correctly exported and imported, B.dll is still unloaded from memory by this time (because the FreeLibrary system function was previously called in the DLL_PROCESS_DETACH section of the DllMain callback function )



auto data = func();


Calling a virtual function on a previously destroyed object of polymorphic type, as well as calling a function in an already unloaded dynamic library, will probably result in a pure virtual call error .



Solution: Implement the correct finalization procedure in the application to ensure that all DLLs are finished / unloaded in the correct order. Avoid using static objects with complex logic in DL L. Avoid performing any operations inside the library after calling DllMain / DLL_PROCESS_DETACH (when the library enters its last stage of its life cycle - the phase of destroying its static objects).



You need to understand what the life cycle of a DLL is:



) LoadLibrary



  • ( , )
  • DllMain -> DLL_PROCESS_ATTACH ( , )
  • [] DllMain -> DLL_THREAD_ATTACH / DLL_THREAD_DETACH ( , . 30).
  • , , (, ),
  • ( / , , )
  • , ()
  • ( / , , )
  • - : ,


) FreeLibrary



  • DllMain -> DLL_PROCESS_DETACH ( , )
  • ( , )






11) deletion of an opaque pointer (the compiler needs to know the full type in order to call the destructor, so deleting an object using an opaque pointer can lead to memory leaks and other problems)



12) if the DataNew destructor is virtual, even if the class is exported and imported correctly and the complete information about it, anyway calling its destructor at this stage is a problem - this will probably lead to a purely virtual function call (since the DataNew type is imported from the already unloaded B.dll file ). This problem is possible even if the destructor is not virtual.



13) if DataNew class is abstract polymorphic type, and its base class has a pure virtual destructor without a body, in any case a pure virtual function call will occur .



14) undefined behavior if memory is allocated using new and deleted using delete []



delete[] data2;


In general, you should always be careful when freeing objects received from external modules.



It is also good practice to zero out pointers to destroyed objects.



Decision:



  • when deleting an object, its full type must be known
  • all destructors must have a body
  • the library from which the code is exported should not be unloaded too early
  • always use different forms new and delete correctly, do not confuse them
  • the pointer to the remote object must be zeroed.






Also note the following:

- calling delete on a pointer to void will result in undefined behavior

purely virtual functions should not be called from the constructor

- calling a virtual function in the constructor will not be virtual

- try to avoid manual memory management - use containers , move semantics, and smart pointers



see also
Heap corruption: What could the cause be?



15) ExitThread is the preferred method of exiting a thread in C. In C ++, calling this function will terminate the thread before calling the destructors of local objects (and any other automatic cleanup), so terminating a thread in C ++ should be done simply by returning from the thread function



ExitThread(0U);


Solution: Never manually use this function in C ++ code.



16) in the body of DllMain, calling any standard functions that require system DLLs other than Kernel32.dll can lead to various hard-to-diagnose problems



CoInitializeEx(nullptr, COINIT_MULTITHREADED);


Solution in DllMain:



  • avoid any complicated (de) initialization
  • avoid calling functions from other libraries (or at least be extremely careful with this)






17) incorrect initialization of the pseudo-random number generator in a multithreaded environment



18) since the time returned by the time function has a resolution of 1 sec., Any thread in the program that calls this function during this time period will receive the same value at the output. Using this number to initialize PRNG can lead to collisions (for example, generation of the same pseudo-random names for temporary files, the same port numbers, etc.). One possible solution is to mix ( xor ) the result with some kind of pseudo-random value , such as the address of any stack or object in the heap, a more accurate time, etc.



srand(time(nullptr));


Solution: MS VS requires PRNG initialization for each thread . In addition, using Unix time as an initializer provides insufficient entropy , more advanced initialization value generation is preferred .



see also
Is there an alternative to using time to seed a random number generation?

C++ seeding surprises

Getting random numbers in a thread-safe way [C#]


19) may deadlock or crash (or create dependency loops in DLL load order )



OTHER_LIB = LoadLibrary("B.dll");


Solution: Don't use LoadLibrary at the DllMain entry point . Any complex (de) initialization must be done in certain DLL developer exported functions such as "Init" and "Deint" . The library provides these functions to the user, and the user must call them correctly at the right time. Both parties must strictly abide by this contract.







20) typo (condition is always false), wrong program logic and possible resource leak (because OTHER_LIB is never unloaded on successful download)



if (OTHER_LIB = nullptr)
    return FALSE;


The assignment operator by copying returns a link of the left type, i.e. if will check for OTHER_LIB value (which will be nullptr) and nullptr will be interpreted as false.



Solution: Always use the reverse form to avoid typos like this:



if/while (<constant> == <variable/expression>)


21) it is recommended to use the _beginthread system function to create a new thread in the application (especially if the application was linked with a static version of the C runtime library) otherwise memory leaks may occur when calling ExitThread, DisableThreadLibraryCalls



22) all external calls to DllMain are serialized, so in the body This function should not attempt to create threads / processes or interact with them, otherwise deadlocks may occur.



CreateThread(nullptr, 0U, &Init, nullptr, 0U, nullptr);


23) calling COM functions during DLL termination can lead to incorrect memory access, since the corresponding component may already be unloaded



CoUninitialize();


24) there is no way to control the loading and unloading order of in-process COM / OLE services, so don't call OleInitialize or OleUninitialize from the DllMain function



OleUninitialize();


see also
COM Clients and Servers

In-process, Out-of-process, and Remote Servers



25) call free for a block of memory allocated with new



26) if the application process is in the process of terminating its work (as indicated by a nonzero value of the lpvReserved parameter), all threads in the process, except for the current one, have either already terminated or were forcedly stopped when calling the ExitProcess function, which can leave some of the process resources, such as the heap, in an inconsistent state. As a result, it is not DLL- safe to clean up resources . Instead, the DLL should allow the operating system to reclaim memory.



free(INTEGERS);


Solution: Make sure the old C style of manual memory allocation is not mixed with the “new” C ++ style. Be extremely careful when managing resources in the DllMain function .



27) can cause the DLL to be used even after the system has executed its exit code



const BOOL result = FreeLibrary(OTHER_LIB);


Solution: Don't call FreeLibrary at the DllMain entry point.



28) the current (possibly main) thread will crash



throw new std::runtime_error("    ");


Solution: Avoid throwing exceptions in the DllMain function. If the DLL cannot be loaded correctly for any reason, the function should simply return FALSE. You should also not throw exceptions from the DLL_PROCESS_DETACH section.



Always be careful when throwing exceptions outside the DLL. Any complex objects (for example, classes of the standard library ) can have different physical representation (and even logic of work) in different executable modules if they are compiled with different (incompatible) versions of the runtime libraries .







Try to exchange only simple data types between modules(with a fixed size and well-defined binary representation).



Remember that terminating the main thread will automatically terminate all other threads (which do not terminate correctly and can therefore damage memory, leaving synchronization primitives and other objects in an unpredictable and incorrect state. Moreover, these threads will already cease to exist at the moment when static objects will start their own deconstruction, so don't try to wait for any threads to finish in the destructors of static objects).



see also
Top 20 C++ multithreading mistakes and how to avoid them



29) you can throw an exception (for example, std :: bad_alloc), which is not caught here



THREADS.push_back(std::this_thread::get_id());


Since the DLL_THREAD_ATTACH section is called from some unknown external code, don't expect to see correct behavior here.



Solution: Use try / catch to enclose statements that might throw exceptions that most likely cannot be handled correctly (especially if they exit from the DLL ).



see also
How can I handle a destructor that fails?



30) UB if streams were presented before loading this DLL



THREADS.pop_back();


Threads that already exist at the time the DLL is loaded (including the one that directly loads the DLL ) do not call the loaded DLL's entry point function (which is why they are not registered with the THREADS vector during the DLL_THREAD_ATTACH event), while they still call it with the DLL_THREAD_DETACH event upon completion.

This means that the number of calls to the DLL_THREAD_ATTACH and DLL_THREAD_DETACH sections of the DllMain function will be different.



31) it is better to use fixed-size integer types



32) passing a complex object between modules may crash if compiled with different link and compile settings and flags (different versions of the runtime library, etc.)



33) accessing object c by its virtual address (which is shared by modules) can cause problems if pointers are handled differently in these modules (for example, if modules are associated with different LARGEADDRESSAWARE parameters )



__declspec(dllexport) int Initialize(std::vector<int> integers, int& c) throw()


see also
Is it possible to use more than 2 Gbytes of memory in a 32-bit program launched in the 64-bit Windows?

Application with LARGEADDRESSAWARE flag set getting less virtual memory

Drawbacks of using /LARGEADDRESSAWARE for 32 bit Windows executables?

how to check if exe is set as LARGEADDRESSAWARE [C#]

/LARGEADDRESSAWARE [Ru]

ASLR (Address Space Layout Randomization) [Ru]



And...
Virtual memory

Physical Address Extension

Tagged pointer

std::ptrdiff_t

What is uintptr_t data type

Pointer arithmetic

Pointer aliasing

What is the strict aliasing rule?

reinterpret_cast conversion

restrict type qualifier



The above list is hardly complete, so you can probably add something important in the comments.



Working with pointers is actually much more complex than people usually think of them. Without a doubt, seasoned developers will be able to remember other existing nuances and subtleties (for example, something about the difference between pointers to an object and pointers to a function , because of which, perhaps, not all bits of the pointer can be used , etc. .).







34) an exception can be thrown inside a function :



INTEGERS = new std::vector<int>(integers);


the throw () specification of this function is empty:



__declspec(dllexport) int Initialize(std::vector<int> integers, int& c) throw()


std :: unexpected is called by the C ++ runtime when an exception specification is violated: an exception is thrown from a function whose exception specification disallows exceptions of this type.



Solution: use try / catch (especially when allocating resources, especially in DLLs ) or nothrow form of the new operator. In any case, never start from the naive assumption that all attempts to allocate various kinds of resources will always end successfully .



see also
RAII

We do not use C++ exceptions

Memory Limits for Windows and Windows Server Releases









Problem 1: the formation of such a "more random" value is incorrect. According to the central limit theorem , the sum of independent random variables tends to a normal distribution , not a uniform one (even if the initial values ​​themselves are distributed uniformly).



Issue 2: Possible integer type overflow (which is undefined behavior for signed integer types )



return rand() + rand();


When working with pseudo-random number generators, encryption and the like, always beware of using homemade "solutions." Unless you have specialized education and experience in these highly specific areas, the chances are very high that you will simply outsmart yourself and make the situation worse.



35) the name of the exported function will be decorated (changed) to prevent this use of extern "C"



36) names starting with '_' are implicitly forbidden for C ++, as this naming style is reserved for the STL



__declspec(dllexport) long long int __cdecl _GetInt(int a)


Several problems (and their possible solutions):



37) rand is not thread safe, use rand_r / rand_s instead



38) rand is deprecated, better use modern
C++11 <random>


39) it is not a fact that the rand function was initialized specifically for the current thread (MS VS requires initialization of this function for each thread where it will be called)



40) there are special generators of pseudo-random numbers , and it is better to use them in hack-resistant solutions (they are suitable portable solutions like Libsodium / randombytes_buf , OpenSSL / RAND_bytes , etc.)



41) potential division by zero: may cause the current thread to crash



42) operators with different precedence are used in the same row , which introduces chaos in the order of calculation - use parentheses and / or sequence pointsto specify the obvious sequence of computation



43) potential integer overflow



return 100 / a <= 0 ? a : a + 1 + Random();




see also
Do not use std::rand() for generating pseudorandom numbers





And...
ExitThread function

ExitProcess function

TerminateThread function

TerminateProcess function





And that is not all!



Imagine that you have some important content in memory (for example, a user's password). Of course, you don't want to keep it in memory for longer than is really necessary, thus increasing the likelihood that someone can read it from there .



A naive approach for solving this problem would look something like this:



bool login(char* const userNameBuf, const size_t userNameBufSize,
           char* const pwdBuf, const size_t pwdBufSize) throw()
{
    if (nullptr == userNameBuf || '\0' == *userNameBuf || nullptr == pwdBuf)
        return false;
    
    // Here some actual implementation, which does not checks params
    //  nor does it care of the 'userNameBuf' or 'pwdBuf' lifetime,
    //   while both of them obviously contains private information 
    const bool result = doLoginInternall(userNameBuf, pwdBuf);
    
    // We want to minimize the time this private information is stored within the memory
    memset(userNameBuf, 0, userNameBufSize);
    memset(pwdBuf, 0, pwdBufSize);
}


And it certainly wo n't work the way we would like it to. What then is to be done? :(



Incorrect "solution" # 1: if memset doesn't work, let's do it manually!



void clearMemory(char* const memBuf, const size_t memBufSize) throw()
{
    if (!memBuf || memBufSize < 1U)
        return;
    
    for (size_t idx = 0U; idx < memBufSize; ++idx)
        memBuf[idx] = '\0';
}


Why doesn't this suit us either? The fact is that there are no restrictions in this code that would not allow a modern compiler to optimize it (by the way, the memset function , if it is still used, will most likely be built-in ).



see also
The as-if rule

Are there situations where this rule does not apply?

Copy elision

Atomics and optimization



Incorrect "solution" # 2: try to "improve" the previous "solution" by playing around with the volatile keyword



void clearMemory(volatile char* const volatile memBuf, const volatile size_t memBufSize) throw()
{
    if (!memBuf || memBufSize < 1U)
        return;
    
    for (volatile size_t idx = 0U; idx < memBufSize; ++idx)
        memBuf[idx] = '\0';
    
    *(volatile char*)memBuf = *(volatile char*)memBuf;
    // There is also possibility for someone to remove this "useless" code in the future
}


Will this work? Maybe. For example, this approach is used in RtlSecureZeroMemory (which you can see for yourself by looking at the actual implementation of this function in the Windows SDK sources ).



However, this technique will not work as expected with all compilers .



see also
volatile member functions



Wrong "solution" # 3: use inappropriate OS API function (eg RtlZeroMemory ) or STL (eg std :: fill, std :: for_each)



RtlZeroMemory(memBuf, memBufSize);


More examples of attempts to solve this problem here .



And how is it right?



  • use the correct OS API function , for example, RtlSecureZeroMemory for Windows
  • use function C11 memset_s :


In addition, we can prevent the compiler from optimizing the code by printing (to a file, console or other stream) the value of the variable, but this is obviously not very useful.



see also
Safe clearing of private Data



Summing up



This, of course, is not a complete list of all possible problems, nuances and subtleties that you may encounter when writing applications in C / C ++ .



There are also great things like:





And much more.







Anything to add? Share your interesting experiences in the comments!



PS Want to know more?
Software security errors

Common weakness enumeration

Common types of software vulnerabilities



Vulnerability database

Vulnerability notes database

National vulnerability database



Coding standards

Application security verification standard

Guidelines for the use of the C++ language in critical systems



Secure programming HOWTO

32 OpenMP Traps For C++ Developers

A Collection of Examples of 64-bit Errors in Real Programs




All Articles