Understanding JIT in PHP 8

This translation was prepared ahead of the start of the course "Backend Developer in PHP".








TL;DR



The Just In Time compiler in PHP 8 is implemented as part of the Opcache extension and is designed to compile some opcodes into CPU instructions at runtime.



This means that with JIT some opcodes do not need to be interpreted by the Zend VM; such instructions are executed directly as CPU-level instructions.



JIT in PHP 8



One of the most talked-about features of PHP 8 is the Just In Time (JIT) compiler. It comes up in many blogs and communities, and there is a lot of buzz around it, but so far I have not found many details about how the JIT actually works.



After many frustrating attempts to find useful information, I decided to study the PHP source code. Combining my little knowledge of C with all the scattered information I've been able to gather so far, I've managed to prepare this article, and I hope it helps you understand PHP's JIT better.



To keep things simple: when the JIT works as expected, your code is not executed through the Zend VM; instead it is executed directly as a set of CPU-level instructions.



This is the whole idea.



But to understand this better, we need to think about how PHP works internally. It's not very difficult, but it does take some introduction.



I've already written an article with a quick overview of how PHP works. If you think this article is getting too complicated, just read its predecessor and come back. This should make things a little easier.



How is PHP code executed?



We all know PHP is an interpreted language. But what does this actually mean?



Whenever you want to execute PHP code, be it a snippet or a whole web application, you have to go through the PHP interpreter. The most commonly used ones are PHP-FPM and the CLI interpreter. Their job is very simple: receive the PHP code, interpret it, and return the result.



This is a common picture for every interpreted language. Some steps may vary, but the general idea is the same. In PHP it works like this:



  1. PHP code is read and converted into a set of keywords known as tokens. This process allows the interpreter to understand which piece of code is written in which part of the program. This first step is called lexing or tokenizing.
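  2. With the tokens in hand, the PHP interpreter analyzes this collection of tokens and tries to make sense of them. As a result, an Abstract Syntax Tree (AST) is generated through a process called parsing. The AST is a set of nodes indicating which operations should be executed. For example, "echo 1 + 1" should in fact mean "print the result of 1 + 1" or, more realistically, "print an operation, and the operation is 1 + 1".
  3. With the AST, it is much easier to understand the operations and their precedence. Turning this tree into something that can be executed requires an Intermediate Representation (IR), which in PHP is called an opcode. The process of transforming the AST into opcodes is called compilation.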
  4. Now that we have the opcodes, here comes the most interesting part: executing the code! PHP has an engine called the Zend VM that is capable of receiving a list of opcodes and executing them. After all opcodes have been executed, the program ends.
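
The first step of this pipeline can be observed from userland. The snippet below is only a small sketch: it assumes the tokenizer extension is available (it normally is) and uses its token_get_all() and token_name() functions to show how a source string is split into tokens. The sample source string is my own illustration.

```php
<?php
// Lexing only: ask PHP's tokenizer extension how it splits a source string.
$source = '<?php echo 1 + 1;';

$names = [];
foreach (token_get_all($source) as $token) {
    // Multi-character tokens are arrays of [id, text, line];
    // single-character tokens such as "+" or ";" are plain strings.
    $names[] = is_array($token) ? token_name($token[0]) : $token;
}

// Prints the token names in source order, separated by spaces.
echo implode(' ', $names), PHP_EOL;
```

The later steps (AST and opcodes) are not exposed by a stock PHP build, so this sketch stops at the token list.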




To make it a little clearer, I made a diagram:





A simplified diagram of the PHP interpretation process.



Pretty straightforward, as you can see. But there's also a bottleneck here: what's the point of lexing and parsing your code every time you execute it if your PHP code may not even change that often?



After all, we're only interested in opcodes, right? Right! This is why the Opcache extension exists.



Opcache extension



The Opcache extension comes with PHP and there is usually no particular reason to deactivate it. If you are using PHP you should probably enable Opcache.



What it does is add an in-memory shared cache layer for opcodes. Its job is to take the opcodes freshly generated from our AST and cache them so that later executions can skip the lexing and parsing phases.
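
To check whether this cache is actually active for the current process, one option is opcache_get_status(). The helper below is a sketch; the function name describeOpcache() is my own, and the guards are there because the extension may not be loaded or may be disabled (for example on the CLI, where it is off by default).

```php
<?php
// Hypothetical helper: summarize the state of the opcode cache.
function describeOpcache(): string
{
    if (!function_exists('opcache_get_status')) {
        return 'Opcache extension is not loaded';
    }

    // false = do not include the per-script details in the result.
    $status = opcache_get_status(false);
    if (!is_array($status) || empty($status['opcache_enabled'])) {
        return 'Opcache is loaded but not enabled for this SAPI';
    }

    $stats = $status['opcache_statistics'];
    return sprintf(
        'Opcache enabled: %d cached scripts, %d hits, %d misses',
        $stats['num_cached_scripts'],
        $stats['hits'],
        $stats['misses']
    );
}

echo describeOpcache(), PHP_EOL;
```

A growing number of hits relative to misses suggests that requests are being served from cached opcodes instead of being lexed and parsed again.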



Here is a diagram of the same process with the Opcache extension in mind:





PHP interpretation flow with Opcache. If the file has already been parsed, PHP fetches the cached opcodes for it rather than parsing it again.



It's just mesmerizing how beautifully the lexing, parsing and compilation steps are skipped.

Note: This is where the PHP 7.4 preload feature comes in handy! It allows you to tell PHP-FPM to parse your codebase, convert it to opcodes, and cache them even before anything is executed.


You may start to wonder where the JIT fits in here, right?! At least I hope so, which is why I am writing this article...



What does the Just In Time compiler do?



After listening to Zeev's explanation in the PHP and JIT episode of the PHP Internals News podcast, I was able to get some idea of what the JIT is actually supposed to do...



If Opcache makes fetching opcodes faster so that they can go straight to the Zend VM, the JIT is intended to make them run without the Zend VM at all.



The Zend VM is a C program that acts as a layer between the opcodes and the CPU itself. The JIT generates compiled code at runtime, so PHP can skip the Zend VM and go directly to the CPU. In theory, we should benefit from this in terms of performance.
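
To make the idea of "a layer between the opcodes and the CPU" concrete, here is a toy interpreter loop. It is only an illustration written in PHP: the opcode names, the array layout, and the function runToyVm() are made up for this sketch and do not match the real Zend VM, which is written in C.

```php
<?php
// Toy interpreter: every opcode passes through this dispatch loop,
// which is the kind of overhead a JIT tries to remove.
function runToyVm(array $opcodes): array
{
    $temps = [];   // temporary results, keyed by name
    $output = [];  // everything the program "printed"

    foreach ($opcodes as $op) {
        switch ($op['op']) {
            case 'ADD':
                $temps[$op['result']] = $op['op1'] + $op['op2'];
                break;
            case 'ECHO':
                $output[] = (string) $temps[$op['op1']];
                break;
            default:
                throw new InvalidArgumentException("Unknown opcode: {$op['op']}");
        }
    }

    return $output;
}

// Roughly what "echo 1 + 1;" could look like as a list of opcodes.
$program = [
    ['op' => 'ADD', 'op1' => 1, 'op2' => 1, 'result' => 'T0'],
    ['op' => 'ECHO', 'op1' => 'T0'],
];

echo implode('', runToyVm($program)), PHP_EOL; // prints "2"
```

With a JIT, a hot sequence like this would be replaced by machine code that performs the addition directly, without walking through the switch for each opcode.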



It sounded strange at first, because to compile to machine code you have to write a very specific implementation for each type of architecture. But in fact it is quite feasible.



The JIT implementation in PHP uses the DynASM (Dynamic Assembler) library, which maps a set of CPU instructions written in one specific format to assembly code for many different types of CPUs. Thus, the Just In Time compiler converts opcodes to architecture-specific machine code using DynASM.



Although one thought still haunted me ...



If preloading is capable of parsing PHP code into opcodes before execution, and DynASM can compile opcodes to machine code (Just In Time compilation), why the heck don't we compile PHP right away using Ahead of Time compilation?!



One of the insights I got from the podcast episode was that PHP is weakly typed, meaning PHP often doesn't know what type a variable has until the Zend VM tries to execute a specific opcode.



You can understand this by looking at the zend_value union type, which has many pointers to different type representations for a variable. Whenever the Zend VM tries to fetch a value from a zend_value, it uses macros like ZSTR_VAL, which tries to access the string pointer from the value union.



For example, this Zend VM handler must handle the less than or equal to (<=) expression. See how it branches into many different code paths to guess the types of the operands.
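
The branching can be imitated in userland. The function below is a simplified sketch of the idea, not a port of the real handler: the name lessOrEqual(), the path labels, and the choice of three paths are my own assumptions for illustration.

```php
<?php
// Sketch: one "<=" operation, several code paths depending on operand types.
function lessOrEqual(mixed $a, mixed $b): array
{
    // Fast path: both operands are integers.
    if (is_int($a) && is_int($b)) {
        return ['path' => 'int/int', 'result' => $a <= $b];
    }

    // Second path: any mix of integers and floats.
    if ((is_int($a) || is_float($a)) && (is_int($b) || is_float($b))) {
        return ['path' => 'numeric', 'result' => (float) $a <= (float) $b];
    }

    // Slow path: fall back to PHP's generic comparison rules.
    return ['path' => 'generic', 'result' => $a <= $b];
}

var_export(lessOrEqual(1, 2));
echo PHP_EOL;
var_export(lessOrEqual(2.5, 2));
echo PHP_EOL;
var_export(lessOrEqual('a', 'b'));
echo PHP_EOL;
```

Which path runs is only known once the actual values arrive, which is why this decision is hard to make ahead of time.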



Duplicating this type inference logic with machine code is not feasible and could potentially make things even slower.



Compiling after the types have been evaluated is also not a good option, because compiling to machine code is a CPU-intensive task. So compiling EVERYTHING at runtime is a bad idea.



How does the Just In Time compiler behave?



We now know that we cannot infer types well enough to generate good ahead-of-time compiled code. We also know that compilation at runtime is expensive. So how can the JIT be useful for PHP?



To balance this equation, PHP's JIT tries to compile only the few opcodes that it considers worth the effort. To do this, it profiles the opcodes executed by the Zend VM and checks which ones make sense to compile (depending on your configuration).
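
The general "profile first, compile only what is hot" idea can be sketched in a few lines. Everything here is hypothetical and greatly simplified: the class ToyJit, its threshold, and the two closures standing in for interpreted and compiled code are illustrations, not PHP's actual heuristics.

```php
<?php
// Toy model: count how often a routine runs and switch to a faster
// version once it crosses a "hot" threshold.
final class ToyJit
{
    private array $calls = [];     // how often each routine ran interpreted
    private array $compiled = [];  // routines promoted to the fast version

    public function __construct(private int $hotThreshold = 3) {}

    public function run(string $name, callable $interpreted, callable $compiled, int $arg): int
    {
        // Already promoted: bypass the slow path entirely.
        if (isset($this->compiled[$name])) {
            return ($this->compiled[$name])($arg);
        }

        $this->calls[$name] = ($this->calls[$name] ?? 0) + 1;
        if ($this->calls[$name] >= $this->hotThreshold) {
            $this->compiled[$name] = $compiled; // "compile" for later calls
        }

        return $interpreted($arg);
    }

    public function isCompiled(string $name): bool
    {
        return isset($this->compiled[$name]);
    }
}

$jit = new ToyJit(hotThreshold: 3);
$slow = fn(int $n): int => array_sum(range(1, $n));  // stands in for interpreted opcodes
$fast = fn(int $n): int => intdiv($n * ($n + 1), 2); // stands in for machine code

foreach ([10, 10, 10, 10] as $n) {
    $result = $jit->run('sum', $slow, $fast, $n);
}

var_dump($jit->isCompiled('sum'));
```

Code that runs once never pays the compilation cost, while code that runs often gets the faster version after a short warm-up.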



When a particular opcode is compiled, execution is delegated to that compiled code instead of to the Zend VM. It looks like the diagram below:





PHP interpretation flow with JIT. If they are already compiled, the opcodes are not executed through the Zend VM.



Thus, there are a couple of instructions in the Opcache extension that determine whether a certain opcode should be compiled or not. If so, the compiler converts it to machine code using DynASM and executes this newly generated machine code.



Interestingly, since the current implementation has a limit in megabytes for compiled code (also configurable), code execution must be able to switch seamlessly between JIT-compiled and interpreted code.
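
The configured limit and the space still free can be inspected at runtime. This is a sketch that assumes PHP 8 with Opcache: it reads the opcache.jit and opcache.jit_buffer_size ini settings and the "jit" section of opcache_get_status(). The helper name describeJit() is my own.

```php
<?php
// Hypothetical helper: collect the JIT-related settings of this process.
function describeJit(): array
{
    $info = [
        // ini_get() returns false when the directive does not exist.
        'opcache.jit' => ini_get('opcache.jit'),
        'opcache.jit_buffer_size' => ini_get('opcache.jit_buffer_size'),
        'enabled' => false,
        'buffer_free' => null,
    ];

    if (function_exists('opcache_get_status')) {
        $status = opcache_get_status(false);
        if (is_array($status) && isset($status['jit'])) {
            $info['enabled'] = (bool) ($status['jit']['enabled'] ?? false);
            $info['buffer_free'] = $status['jit']['buffer_free'] ?? null;
        }
    }

    return $info;
}

var_export(describeJit());
echo PHP_EOL;
```

If buffer_free gets close to zero, newly hot code has nowhere to go and keeps running through the interpreter.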



By the way, this talk by Benoit Jacquemont about PHP's JIT helped me A LOT to figure this out.



I am still not sure in what specific cases the compilation takes place, but I think I don’t really want to know this yet.



So your performance gains probably won't be huge



I hope it is much clearer now WHY everyone is saying that most PHP applications will not get much performance benefit from the Just In Time compiler, and why Zeev's recommendation to profile and experiment with different JIT configurations for your application is the best way to go.
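
A very small harness is enough to start experimenting. The sketch below times a deliberately CPU-bound function; fib() and timeIt() are illustrative names of my own, and the command in the comment is one possible way to compare runs, assuming a PHP 8 CLI with Opcache available.

```php
<?php
// Compare, for example:
//   php bench.php
//   php -d opcache.enable_cli=1 -d opcache.jit_buffer_size=64M -d opcache.jit=tracing bench.php
// (bench.php is a placeholder file name for this script.)

// Deliberately CPU-bound work: naive recursive Fibonacci.
function fib(int $n): int
{
    return $n < 2 ? $n : fib($n - 1) + fib($n - 2);
}

// Run a callable once and return the elapsed time in milliseconds.
function timeIt(callable $work): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    $work();
    return (hrtime(true) - $start) / 1e6;
}

$ms = timeIt(fn() => fib(25));
printf("fib(25) took %.2f ms\n", $ms);
```

A single run is noisy, so it is worth repeating the measurement several times and, above all, measuring real endpoints of your own application rather than a synthetic function like this one.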



Compiled opcodes will usually be shared across multiple requests if you are using PHP-FPM, but this is still not a game changer.



This is because the JIT optimizes CPU operations, and nowadays most PHP applications are more I/O bound than anything else. It doesn't matter whether the processing operations are compiled if you have to access the disk or the network anyway. The timings will be very similar.



Unless...



You're doing something that is not I/O bound, like image processing or machine learning. Anything that is not I/O bound will benefit from the Just In Time compiler. This is also the reason people now say they are more inclined to write native PHP functions in PHP rather than in C. The overhead will not be dramatically different if such functions are compiled anyway.



Interesting times to be a PHP programmer...



I hope this article was helpful and that you now have a better understanding of what the JIT in PHP 8 is. Feel free to tweet me if you want to add something I might have forgotten here, and don't forget to share this with your fellow developers; it will surely add a little bit of value to your conversations! -- @nawarian












