Porting Detroit: Become Human from PlayStation 4 to PC

Introduction



In this series of posts, we'll cover porting Detroit: Become Human from PlayStation 4 to PC.



Detroit: Become Human was released on PlayStation 4 in May 2018. We started work on the PC version in July 2018 and released it in December 2019. It is an adventure game with three playable characters and many storylines. Its graphics are of very high quality, and most of the graphics technology was developed in-house at Quantic Dream.



The 3D engine has excellent features:



  • Realistic rendering of characters.
  • PBR lighting.
  • High quality post-processing like Depth of Field (DOF), motion blur, and so on.
  • Temporal anti-aliasing.






Detroit: Become Human



From the very beginning, the game's 3D engine was designed specifically for the PlayStation, and we had no idea that it would later support other platforms. Therefore, the PC version was a challenge for us.



  • 3D Engine Lead Ronan Marchalot and 3D Engine Leads Nicolas Vizerie and Jonathan Siret of Quantic Dream will talk about the rendering aspects of the port. They will explain which optimizations carried over seamlessly from PlayStation 4 to PC, and which difficulties arose from the differences between the platforms.
  • Lou Kramer is a Developer Technology Engineer at AMD. She helped us optimize the game, so she will talk in detail about non-uniform resource indexing on PC and, in particular, on AMD cards.


Choosing a Graphics API



We already had an OpenGL version of the engine, which we used in our development tools.



But we didn't want to release the game in OpenGL:



  • We had a lot of proprietary extensions that weren't open to all GPU manufacturers.
  • The engine had very low performance in OpenGL, although, of course, it could be optimized.
  • In OpenGL, there are many ways to do the same thing, so it was a nightmare to implement everything correctly on all platforms.
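  • OpenGL tools are not always reliable. Sometimes they do not work, because we use an extension that is not supported.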


Because we rely heavily on bindless resources, we could not port the game to DirectX 11. It does not have enough resource slots, and it would have been very difficult to achieve decent performance if we had to rework the shaders to use fewer resources.



We chose between DirectX 12 and Vulkan, which have a very similar feature set. Vulkan would further enable us to provide support for Linux and mobile phones, and DirectX 12 would provide support for the Microsoft Xbox. We knew that eventually we would need to implement support for both APIs, but it would make more sense for the port to focus on only one API.



Vulkan supports Windows 7 and Windows 8. Since we wanted to make Detroit: Become Human accessible to as many players as possible, this was a very strong argument. However, the port took a year, and the argument matters much less now that Windows 10 is in widespread use!



Various Graphics API Concepts



OpenGL and older DirectX versions have a very simple GPU control model. These APIs are easy to understand and very well suited for learning. They leave a lot of work to the driver, and that work is hidden from the developer. As a result, it is very difficult to optimize a full-featured 3D engine on top of them.



On the other hand, the PlayStation 4 API is very lightweight and very close to hardware.



Vulkan is somewhere in between. It still has abstractions because it runs on different GPUs, but developers have more control. For example, memory management and the shader cache become our responsibility: since the driver does less work, we have to do it ourselves! However, we come from PlayStation development, so we are more comfortable when we can control everything.



Difficulties



The PlayStation 4 CPU is an AMD Jaguar with 8 cores. It is obviously slower than newer PC hardware; however, the PlayStation 4 has important advantages, in particular very fast access to the hardware. We believe the PlayStation 4 graphics API is far more efficient than any API on PC. It is very direct and has very little overhead. This means that we can issue a large number of draw calls per frame. We knew that such a high draw call count could be a problem on slower PCs.



Another important advantage was that all shaders on the PlayStation 4 could be compiled in advance, which meant that they loaded almost instantly. On a PC, the driver must compile shaders at load time: because of the large number of supported GPU and driver configurations, this cannot be done in advance.



During the development of Detroit: Become Human on PlayStation 4, artists were able to create unique shader trees for all materials. This produced an insane amount of vertex and pixel shaders, so we knew from the beginning of the port that this would be a huge problem.



Shader pipelines



As we knew from our OpenGL engine, compiling shaders can be time-consuming on a PC. During the production of the game, we generated a shader cache for the GPU model of our workstations. Generating a full shader cache for Detroit: Become Human took a whole night! All employees got access to this shader cache in the morning. But the game still stuttered, because the driver needed to convert this code into native GPU shader assembly.



It turned out that Vulkan handles this problem much better than OpenGL.



First, Vulkan does not directly use a high-level shader language like HLSL, but instead uses an intermediate shader language called SPIR-V. SPIR-V makes shader loading faster and is easier for the driver's shader compiler to optimize. In fact, in terms of performance, it is comparable to the OpenGL shader cache system.



In Vulkan, shaders must be linked together to form a VkPipeline. For example, a VkPipeline can be created from a vertex shader and a pixel shader. It also contains information about the render state (depth test, stencil, blending, etc.) and the render target formats. This information is important to the driver so that it can compile shaders as efficiently as possible.



In OpenGL, shaders are compiled without knowing the context in which they will be used. The driver has to wait for a draw call before it can generate the GPU binary, which is why the first draw call with a new shader can take a long time on the CPU.



In Vulkan, the VkPipeline provides the context of use, so the driver has all the information it needs to generate the GPU binary, and the first draw call carries no extra cost. Also, we can update a VkPipelineCache whenever a VkPipeline is created.



Initially, we tried to create each VkPipeline the first time we needed it. This caused stutters similar to those we had seen with the OpenGL drivers. The VkPipelineCache was then updated, and the stutter was gone the next time the same draw call was issued.



Then we tried to create the VkPipeline objects ahead of time, during loading. But when the VkPipelineCache was not up to date, creation was so slow that a background loading strategy could not be implemented.



Ultimately, we decided to generate every VkPipeline during the first launch of the game. This completely eliminated the stuttering, but we now faced a new difficulty: generating the VkPipelineCache took a very long time.



Detroit: Become Human contains roughly 99,500 VkPipeline objects! The game uses forward rendering, so the material shaders contain all the lighting code. Therefore, compiling each shader can take a long time.



We came up with several ideas for optimizing the process:



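  • We optimized our data so that only the SPIR-V intermediate binaries need to be loaded.
  • We optimized the SPIR-V intermediate binaries with the SPIR-V optimizer.
  • We made sure that all CPU cores spend 100% of their time creating VkPipeline objects.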


An important optimization was also suggested by Jeff Bolz from NVIDIA, and in our case it turned out to be very effective.



Many VkPipeline objects are very similar. For example, some may have the same vertex and pixel shaders and differ in only a few render states, such as stencil parameters. In this case, the driver can treat them as one pipeline. But if we create them at the same time, one of the threads will simply sit idle, waiting for the other to complete the task. By its nature, our process submitted all similar VkPipeline objects at the same time. To solve this problem, we simply changed the sort order of the VkPipeline objects. The "clones" were placed at the end, and as a result, their creation took much less time.
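The reordering can be illustrated with a small sketch. It is not the engine's code: `PipelineDesc`, its fields, and `order_for_creation` are hypothetical names, and the assumption is that two pipelines count as "clones" when they share the same vertex and pixel shaders.

```python
from collections import namedtuple

# Hypothetical pipeline description: two shader names plus a render-state tag.
PipelineDesc = namedtuple("PipelineDesc", ["vertex_shader", "pixel_shader", "render_state"])

def order_for_creation(pipelines):
    """Put the first pipeline of each shader pair up front and its clones at the end."""
    seen = set()
    firsts, clones = [], []
    for p in pipelines:
        key = (p.vertex_shader, p.pixel_shader)
        if key in seen:
            clones.append(p)   # same shaders as an earlier pipeline: defer it
        else:
            seen.add(key)
            firsts.append(p)
    return firsts + clones

pipelines = [
    PipelineDesc("vs_skin", "ps_cloth", "stencil_off"),
    PipelineDesc("vs_skin", "ps_cloth", "stencil_on"),
    PipelineDesc("vs_static", "ps_metal", "stencil_off"),
    PipelineDesc("vs_static", "ps_metal", "stencil_on"),
]
ordered = order_for_creation(pipelines)
```

With this ordering, worker threads that pull jobs from the front of the list start on pipelines with distinct shaders, and the clones are only reached once the expensive compilation they share has most likely finished.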



The time needed to create the VkPipeline objects varies greatly. In particular, it depends heavily on the number of available hardware threads. On an AMD Ryzen Threadripper with 64 hardware threads, it can take as little as two minutes. But on low-end PCs, this process can unfortunately take more than 20 minutes.



The latter was too long for us. Unfortunately, the only way to further reduce this time was to reduce the number of shaders. We would need to change the way we create materials so that as many of them as possible are shared. For Detroit: Become Human, this was impossible, because the artists would have to redo all the materials. We plan to implement proper material instancing in the next game, but it was too late for Detroit: Become Human.



Indexing descriptors



To optimize the speed of draw calls on the PC, we used descriptor indexing through the VK_EXT_descriptor_indexing extension. The principle is simple: we create a descriptor set containing all the buffers and textures used in the frame, and then access those buffers and textures through indices. The main advantage is that resources are bound only once per frame, even if they are used in multiple draw calls. This is very similar to using bindless resources in OpenGL.



We create resource arrays for all types of resources used:



  • One array for all 2D textures.
  • One array for all 3D textures.
  • One array for all cube map textures.
  • One array for all material buffers.


The only buffer that changes between draw calls is a main buffer (implemented as a circular buffer). It contains the required matrices and a descriptor index that refers to the desired material buffer. Each material buffer contains the indices of the textures it uses.





Thanks to this strategy, we were able to keep a small number of descriptor sets common to all draw calls and containing all the information needed to draw the frame.
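The chain of indices can be modeled on the CPU side with a minimal sketch. Everything here is an illustration under assumptions: `Material`, `DrawRecord`, `FrameResources`, and `resolve` are hypothetical names, only the 2D texture array is modeled, and textures are represented by plain strings.

```python
from dataclasses import dataclass

@dataclass
class Material:
    albedo_index: int      # index into the 2D texture array
    normal_index: int      # index into the 2D texture array

@dataclass
class DrawRecord:
    material_index: int    # index into the material buffer array
    world_matrix: tuple    # per-draw matrix stored next to the index

class FrameResources:
    """Hypothetical stand-in for the per-frame resource arrays."""
    def __init__(self):
        self.textures_2d = []
        self.materials = []
        self._texture_slots = {}

    def texture_slot(self, name):
        # Each texture gets exactly one slot, however many materials use it.
        if name not in self._texture_slots:
            self._texture_slots[name] = len(self.textures_2d)
            self.textures_2d.append(name)
        return self._texture_slots[name]

    def add_material(self, albedo, normal):
        self.materials.append(Material(self.texture_slot(albedo), self.texture_slot(normal)))
        return len(self.materials) - 1

def resolve(frame, draw):
    """Mimic the shader-side lookup: draw record -> material -> textures."""
    material = frame.materials[draw.material_index]
    return frame.textures_2d[material.albedo_index], frame.textures_2d[material.normal_index]

IDENTITY = tuple(1.0 if i % 5 == 0 else 0.0 for i in range(16))

frame = FrameResources()
skin = frame.add_material("skin_albedo", "skin_normal")
coat = frame.add_material("coat_albedo", "skin_normal")
draws = [DrawRecord(skin, IDENTITY), DrawRecord(coat, IDENTITY)]
resolved = [resolve(frame, d) for d in draws]
```

Only the small `DrawRecord` changes from one draw call to the next; the arrays themselves stay bound for the whole frame.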



Optimizing descriptor set updates



Even with a small number of descriptor sets, updating them was still a bottleneck. Updating a descriptor set can be very costly if it contains many resources. For example, in one frame of Detroit: Become Human there can be more than four thousand textures.



We implemented incremental updates of the descriptor sets by keeping track of the resources that become visible or invisible in the current frame. This also limits the size of the descriptor arrays, because they only need enough capacity for the resources visible at a given time. Tracking visibility costs very little, because we do not use a costly O(n log n) algorithm to compute set intersections. Instead, we use two lists, one for the current frame and one for the previous one. Moving the resources that are still visible from one list to the other, and then examining what remains in the first list, tells us which resources enter and leave the view frustum.
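A minimal sketch of such linear-time tracking is given here. It is one possible reading of the two-list idea, not the engine's implementation: the `Resource` class, the `last_seen` frame stamp, and `frame_delta` are assumptions made for illustration.

```python
class Resource:
    def __init__(self, name):
        self.name = name
        self.last_seen = None   # last frame in which the resource was visible

def frame_delta(previous, visible_now, frame):
    """Return (current, entered, left) in time linear in both list lengths."""
    current, entered = [], []
    for r in visible_now:
        if r.last_seen == frame:
            continue                       # referenced twice this frame: already handled
        if r.last_seen is None or r.last_seen != frame - 1:
            entered.append(r)              # was not in the previous list
        r.last_seen = frame
        current.append(r)
    # Whatever stayed behind in the previous list without a fresh stamp has left.
    left = [r for r in previous if r.last_seen != frame]
    return current, entered, left

a, b, c, d = (Resource(n) for n in "abcd")
list0, entered0, left0 = frame_delta([], [a, b, c], frame=0)
list1, entered1, left1 = frame_delta(list0, [b, c, d, c], frame=1)
```

No sorting or set intersection is involved: each resource is touched a constant number of times per frame, and only `entered` and `left` need to be written to the descriptor set.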



The deltas obtained from these calculations are kept for four frames: we use triple buffering, and one more frame is needed to compute the motion vectors of skinned objects. A descriptor set must remain unchanged for at least four frames before it can be modified again, because the GPU may still be using it. Therefore, we apply the deltas in groups of four frames.
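One way to picture this is a ring of four descriptor sets, each of which catches up on the last four deltas when its turn comes around again. The sketch below models a descriptor set as a Python set of resource names; `DescriptorRing`, `begin_frame`, and `FRAMES_IN_FLIGHT` are hypothetical names, and the ring layout itself is an assumption rather than a detail confirmed by the text.

```python
from collections import deque

FRAMES_IN_FLIGHT = 4   # triple buffering plus one frame for motion vectors

class DescriptorRing:
    """Hypothetical ring of descriptor sets, modeled as sets of resource names."""
    def __init__(self):
        self.sets = [set() for _ in range(FRAMES_IN_FLIGHT)]
        self.history = deque(maxlen=FRAMES_IN_FLIGHT)   # most recent deltas, oldest first
        self.frame = 0

    def begin_frame(self, entered, left):
        self.history.append((set(entered), set(left)))
        # This set was last written FRAMES_IN_FLIGHT frames ago, so the GPU is done with it.
        target = self.sets[self.frame % FRAMES_IN_FLIGHT]
        for added, removed in self.history:
            target -= removed
            target |= added
        self.frame += 1
        return target

visible_per_frame = [{"a", "b"}, {"b", "c"}, {"c"}, {"c", "d"}, {"a", "d"}, {"a"}]
ring = DescriptorRing()
previous = set()
results = []
for visible in visible_per_frame:
    target = ring.begin_frame(visible - previous, previous - visible)
    results.append(set(target))
    previous = visible
```

Each set only ever receives the changes of the four frames it missed, so the update cost follows the amount of change on screen rather than the total number of resources.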



Ultimately, this optimization reduced the update time for descriptor sets by one to two orders of magnitude.



Batching primitives



Descriptor indexing allows us to batch multiple primitives into a single draw call using vkCmdDrawIndexedIndirect. We use gl_InstanceID to access the desired indices in the main buffer. Primitives can be grouped into a batch if they have the same descriptor set, the same shader pipeline, and the same vertex buffer. This is very effective, especially during the depth and shadow passes. The total number of draw calls is reduced by 60%.
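The grouping rule can be sketched as follows. The `Primitive` tuple and `build_batches` are hypothetical; each command tuple mirrors the five fields of Vulkan's VkDrawIndexedIndirectCommand (indexCount, instanceCount, firstIndex, vertexOffset, firstInstance). Using firstInstance to carry the primitive's index into the main buffer is an assumption about how the instance index reaches the shader.

```python
from collections import namedtuple

# Hypothetical CPU-side description of one primitive to draw.
Primitive = namedtuple(
    "Primitive",
    ["descriptor_set", "pipeline", "vertex_buffer", "index_count", "first_index"],
)

def build_batches(primitives):
    """Group primitives that share descriptor set, pipeline, and vertex buffer."""
    batches = {}
    for record_index, p in enumerate(primitives):
        key = (p.descriptor_set, p.pipeline, p.vertex_buffer)
        # (indexCount, instanceCount, firstIndex, vertexOffset, firstInstance)
        command = (p.index_count, 1, p.first_index, 0, record_index)
        batches.setdefault(key, []).append(command)
    return batches

primitives = [
    Primitive("set0", "depth_only", "vb_city", 300, 0),
    Primitive("set0", "depth_only", "vb_city", 120, 300),
    Primitive("set0", "depth_skinned", "vb_chars", 900, 0),
    Primitive("set0", "depth_only", "vb_city", 60, 420),
    Primitive("set0", "depth_skinned", "vb_chars", 450, 900),
]
batches = build_batches(primitives)
draw_calls = len(batches)   # one indirect draw per batch
```

Here five primitives collapse into two indirect draws. Grouping changes the submission order, which is presumably one reason it suits depth and shadow passes, where the result does not depend on draw order.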



This concludes the first part of the article series. In Part 2, Developer Technology Engineer Lou Kramer will talk about non-uniform resource indexing on PC, and on AMD cards in particular.


