Why C won't stop you from making mistakes



In short: because we said so.



:)



Okay, that is too short for an article, dear reader, and such provocative words demand an explanation.



The meeting of the C language committee - which was originally planned for Freiburg, Germany, but did not work out for obvious reasons - ended on August 7. It went well, and we made progress on all fronts. Yes, we really are making progress, I assure you, and C is not dead.



I will also mention that I became the Project Editor for C. So before you dismiss the title as an ignorant statement from someone too lazy to "try to improve things", let me assure you that I am working very hard to make sure C can meet the needs of developers without them having to bolt on 50 vendor-specific extensions just to build reasonably nice and useful libraries and applications.



And yet I said it (that the C language will never stop you from making mistakes), which means I have to justify it. We could look at thousands of CVEs and their associated tickets full of C code, or we could have MISRA vigorously check every single C feature for potential misuse (hello, K&R prototype declarations...), or we could dig into the more complex and fun bugs related to portability and undefined behavior. Instead, we will go to the original source: what the Committee itself said.



Oh, is it time to get some popcorn?!



No, dear reader, put the popcorn aside. As with all ISO proceedings, I cannot quote anyone else's words, and this article is not meant to shame anyone. But I will explain why something we can easily recognize as bad behavior in standards-conforming ISO C code will never be ruled out. Let's start with a paper by Dr. Philipp Klaus Krause:



N2526, "Use const for library data that will not be modified".



N2526 is a very simple document.



Its gist: some of the data the library hands you must never be modified, so let's actually mark it const!


Granted, that is not exactly how it is worded, but I am sure the idea will seem reasonable to you, dear reader. When this paper was put to a vote, there were almost no votes against it. Later, several people strongly objected because the proposal would break old code. Of course, breaking old code is bad; even I hold my breath at the thought. But by adding const? No ABI in C can be affected by that change. C (or rather, its implementations) does not even pay attention to qualifiers, so how could we break anything?! Let's take a look at why some people think this would be a breaking change.



The C Language



Or, as I like to call it, "Type safety is for loser languages." Yes, that is too wordy, so let's stick with "C". You may be wondering why I say that languages like C are not type-safe. After all, consider this:



struct Meow {
    int a;
};

struct Bark {
    double b;
    void* c;
};

int main (int argc, char* argv[]) {
    (void)argc;
    (void)argv;

    struct Meow cat;
    struct Bark dog = cat;
    // error: initializing 'struct Bark' with an expression of incompatible type 'struct Meow'

    return 0;
}


To be honest, that looks like strong type safety to me, Jim! And this is where things get spicier:



#include <stdlib.h>

struct Meow {
    int a;
};

struct Bark {
    double b;
    void* c;
};

int main (int argc, char* argv[]) {
    (void)argc;
    (void)argv;

    struct Meow* p_cat = (struct Meow*)malloc(sizeof(struct Meow));
    struct Bark* p_dog = p_cat;
    // :3
    
    return 0;
}


Yes, C lets a pointer to one type be implicitly converted to a pointer to a completely unrelated type. Most compilers will warn you about this, but the code is still accepted unless you crank up -Werror -Wall -Wpedantic, and so on, and so on.



In fact, a compiler may also accept, without any explicit cast, code that strips away:



  • volatile (who needs those semantics anyway?!)
  • const (write to any read-only data here!)
  • _Atomic (thread safety!)


I am not saying you should be unable to do any of this. But when you write C, a language in which it is very easy to produce a 500-1000 line function full of incomprehensible variable names, you can be 100% sure that you work mostly with pointers, and the base language gives you very little safety. Note: stripping qualifiers like this violates constraints, but so much old code has been written that every implementation ends up letting discarded qualifiers slide, so your code will not be stopped from compiling (thanks, @fanf!). The compiler can easily spot every one of these potential failures, and you will get warnings, but you are not required to write a cast to tell the compiler what you really meant. More importantly, the human beings who come after you will not know what you meant either.



All you need to do is drop the -Werror -Wall -Wpedantic flags, and you are ready to commit crimes against multithreading, read-only memory, and hardware registers.



That is all fair, right? If someone removes all those warning and error flags, they clearly do not care what kind of blunders or stupid things end up in the code. Which means that, in the end, these warnings are completely irrelevant and harmless as far as ISO C conformance is concerned. And yet...



We consider warnings to be breaking changes



Yes.



This is a special hell that C developers, and to a lesser extent C++ developers, are used to. Warnings are annoying and, as practice shows, turning on -Weverything or /W4 is very annoying. Warnings about shadowing variables in the global namespace (thanks, now every header and C library is a problem), about using "reserved" names (as the kids say, "lol nice one xd!!"), and "this structure has padding because you used alignof" (...yes, yes, I know it has padding, I explicitly asked for more padding, BECAUSE I USED alignof, MR. COMPILER): all of this eats a lot of time.



But these are warnings.



Even if they are annoying, they help you avoid problems. The fact that I can shamelessly ignore every qualifier and disregard all sorts of read, write, thread, and read-only safety is a major problem when it comes to communicating my intent and avoiding bugs. Even the old K&R syntax led to bugs in industrial and government codebases because users got something wrong. And they got it wrong not because they were lousy programmers, but because they work with codebases that are often older than they are, bracing themselves to wrestle with millions of lines of code. It is impossible to keep an entire codebase in your head: that is what conventions, static analysis, high warning levels, and other tools are for. Unfortunately,



everyone wants warning-free code.



This means that when a GCC developer makes a warning more sensitive to potentially problematic situations, maintainers (not the original developers) suddenly get fat, multi-gigabyte logs from old code, packed with new warnings and all sorts of other things. "This is idiocy," they say, "the code has worked for YEARS, so why is GCC complaining now?" That is, even adding const to a function signature, however morally, spiritually, and factually correct it may be, will be avoided. "Breaking" people here means "now they have to go looking at code with dubious intentions." This is code that can, on pain of undefined behavior, destroy your chip or corrupt your memory. But that is just another problem that comes with being a C developer these days.



Age as a measure of quality



How many people ever suspected that sudo had a vulnerability as primitive as "-1 or an integer overflow gives access to everything"? How many people thought Heartbleed could be a real problem? How many game developers ship “small” stb libraries without ever running a fuzzer over them, not realizing that these libraries contain more significant input-handling vulnerabilities than you might imagine? I am not criticizing these projects or their authors: they provide vital help that the world has depended on for decades, often with little or no support until some big problem comes up. But the people who idolize these projects, put them on a pedestal, and then spout poisonous sayings about them are giving us a textbook illustration of survivorship bias.


With backward compatibility and a "relaxed" attitude held up as the highest ideals of C, people who survive long enough in this industry begin to equate age with quality, as if codebases were barrels of wine. The older the code and the longer it has been in use, the finer and more refined the wine.



Unfortunately, things are not that romantic and cute: full of bugs and riddled with security holes, all this technical debt gets more dangerous every day. Over time, every system turns into a half-dead, unkempt, and partially unsupported rotting heap. They are dressed up and given an air of nobility, but in reality they are mummies just waiting for an awkward poke, after which their festering, antediluvian boils will burst and flood your application with beautifully aged botulism.



Hmm... disgusting. But what about Standard C?



The problem I noticed during my (incredibly short) tenure as a meeting participant is that we prioritize backward compatibility above all. Even for people who are coming to C today, we cling to old applications and their use cases and deny ourselves the chance to improve the safety, security, or artistry of C code. Dr. Krause's proposal is so simple that it is almost indisputable: if someone does not like the warnings, they can turn them off. And these are warnings, not errors, for a reason: the C abstract machine does not require such diagnostics to be errors, and ISO C allows the code to be accepted even in strict build modes. It would help the whole world move away from APIs that openly state, "changing the content we provide you is undefined behavior."



However, we reversed our decision on this paper once the objection "we cannot introduce new warnings" was raised.



The argument against the proposal was: "So much code has been written that will break if we change these signatures." This, again, treats warnings as behavior-breaking changes (remember, implicit conversions that strip qualifiers, even _Atomic, may still be accepted under ISO C even though they violate constraints). If that were really the case, every compiler author would introduce something like Rust's epochs (now called editions), but for warnings only, to give people a "stable" baseline to test against. The idea is not new: I have read similar write-ups from Coverity engineers about introducing new warnings and how users react to them. It is hard to manage developers' "trust" in new warnings and the like, and it takes a long time to convince people that they are useful. Even John Carmack had to work hard to get the right set of warnings and errors out of his static analysis tools to suit his development, before he concluded that it was "irresponsible not to use this."



And yet we, the Committee, did not agree to add const to the return values of four functions because it would add warnings to potentially dangerous code. We objected to deprecating the old K&R syntax, despite strong evidence of both innocent oversights and serious vulnerabilities caused by passing the wrong types. We almost added undefined behavior to the preprocessor, just to put it in writing so that a C implementation could "behave as it should". Because of backward compatibility, we are always walking on a knife's edge just to avoid making obvious mistakes. And this, dear reader, is what scares me the most about the future of C.



Standard C does not protect you



Make no mistake: it does not matter what programmers tell you or whisper to you. The C language committee is very clear. We will not add new warnings to your old code, even if that code could be dangerous. We will not hold you back from making mistakes, because that could undermine your idea of how your old code works, even when that idea is wrong. We will not help newcomers write better C code. We will not require your old code to conform to any Standard. Every new feature will be optional, because we cannot imagine holding compiler authors to a higher standard or expecting more from our standard library developers.



We will let the compiler lie to you. We will lie to your code. And when everything goes wrong, when there is a bug, an "oh, some kind of garbage happened", or a data leak, we will solemnly shake our heads. We will offer our thoughts and prayers and say, "Well, what a shame." A shame indeed...



Perhaps someday we will fix it, dear reader.


