About the implementation of the Map data structure in V8



The standard ECMAScript 2015 , known as the ES6, there are many new JavaScript-collections such as Map, Set, WeakMapand WeakSet. They seem to be a great addition to the standard JavaScript capabilities. They are widely used in various libraries, in applications, in the core of Node.js. Today we will talk about the collection Map, try to figure out the specifics of its implementation in V8 and draw some practical conclusions based on the knowledge gained.



The ES6 standard does not provide a clear indication of the approach that should be taken to implement data structure support Map. It only gives some hints on possible ways to implement it. It also contains information about the expected fromMapperformance metrics:



The Map object must be implemented using either hash tables or other mechanisms that, on average, provide sublinear access to the elements of the collection. The data structures used in the Map specification are only intended to describe the observable semantics of Map objects. They were not conceived as a real model for the implementation of these objects.



As you can see, the specification gives those who create JS engines a lot of freedom. But at the same time, there are no specific guidelines regarding the specific approach used for implementation Map, its performance, and memory consumption characteristics. If data structures are used in a critical part of your applicationMapand if you write large amounts of information into such data structures, then a solid knowledge of the implementation Mapwill certainly be of great benefit to you.



I have Java development experience, I'm used to Java collections, where you can choose between different implementations of the interface Mapand even fine-tune the selected implementation if the corresponding class supports it. Moreover, in Java, you can always read the open source code of any class of the standard library and get acquainted with its implementation (it, of course, may change in new versions, but only in the direction of increasing efficiency). That is why I could not resist learning how objects work Mapin V8.



Before we start, I want to note that what will be discussed below refers to the V8 8.4 engine, which is built into the fresh dev version of Node.js (more precisely, we are talking about commit 238104c). You don't need to expect anything outside of the specification.



Algorithm underlying the Map implementation



First of all, I will say that data structures are Mapbased on hash tables. Below I am assuming that you know how hash tables work. If you are not familiar with hash tables, then you should first read about them ( here , for example) and only then continue reading this article.



If you have significant experience with objects Map, then you may already have noticed a contradiction. Namely, hash tables are not guaranteed to return items in some constant order when iterating over them. And the ES6 specification states that in order to implement an object, Mapit is necessary, when traversing it, to return the elements in the order in which they were added to it. As a result, the "classical" algorithm for the implementationMapdoes not fit. But there is a feeling that it, with some changes, can still be used.



V8 uses the so-called " deterministic hash tables " proposed by Tyler Close. The following pseudocode, based on TypeScript, demonstrates the basic data structures used to implement such hash tables:



interface Entry {
    key: any;
    value: any;
    chain: number;
}
 
interface CloseTable {
    hashTable: number[];
    dataTable: Entry[];
    nextSlot: number;
    size: number;
}


Here the interface CloseTablerepresents a hash table. It contains an array hashTablewhose size is equivalent to the number of hash containers. The array element with the index Ncorresponds to the N-th hash container and stores the index of its head element that is in the array dataTable. And this array contains the records of the table in the order they were inserted into it. The entries are presented by the interface Entry. Finally, each entry has a property chainthat points to the next entry in the chain of hash container entries (or, more accurately, in a singly linked list).



Each time a new record is inserted into the table, it is stored in the array element dataTablewith indexnextSlot... This process also requires updating the data in the corresponding hash container, which causes the inserted record to become the new last element of the singly linked list.



When a record is removed from a table, it is removed from dataTable(for example, by writing to its properties keyand valuevalues undefined). Then the entry preceding it and the entry following it are linked directly to each other. As you may have noticed, this means that all deleted entries continue to occupy space in the dataTable.



Now for the last piece of our puzzle. When a table is full of records (both current and deleted), it must be rehashed (rebuilt) with an increase in its size. The size of the table can be changed downwards.



With this approach, traversing the data structure is the Mapsame as traversing an array dataTable. This ensures that the order in which records are inserted into the table is preserved and that the standard is met. With this in mind, I would expect most (if not all) JS engines to use deterministic hash tables as one of the underlying implementation mechanisms Map.



Practical research of the algorithm



Let's take a look at a few examples to help us explore the algorithm in practice. Suppose we have CloseTable2 hash containers ( hastTable.length), the total capacity of which is 4 elements ( dataTable.length). This table is filled with the following content:



// ,    -, 
// ,     ,   function hashCode(n) { return n; }
table.set(0, 'a'); // => - 0 (0 % 2)
table.set(1, 'b'); // => - 1 (1 % 2)
table.set(2, 'c'); // => - 0 (2 % 2)


The internal representation of the table obtained in this example might look like this:



const tableInternals = {
    hashTable: [0, 1],
    dataTable: [
        {
            key: 0,
            value: 'a',
            chain: 2 //  <2, 'c'>
        },
        {
            key: 1,
            value: 'b',
            chain: -1 // -1    
        },
        {
            key: 2,
            value: 'c',
            chain: -1
        },
        //  
    ],
    nextSlot: 3, //    
    size: 3
}


If you delete a record using the method table.delete(0), the hash table will look like the following:



const tableInternals = {
    hashTable: [0, 1],
    dataTable: [
        {
            key: undefined, //  
            value: undefined,
            chain: 2 
        },
        {
            key: 1,
            value: 'b',
            chain: -1
        },
        {
            key: 2,
            value: 'c',
            chain: -1
        },
        //  
    ],
    nextSlot: 3,
    size: 2 //  
}


If we add a couple more records to the table, then it will need to be hashed. We will discuss this process in detail below.



The same approach can be applied when implementing data structures Set. The only difference is that these data structures do not need a property value.



Now that we've figured out what's behind the objects Mapin V8, we're ready to move on.



Implementation details



The data structure implementation Mapin V8 is written in C ++, after which the JS code is given access to the corresponding mechanisms. Most of the code related to Mapis in the OrderedHashTableand classes OrderedHashMap. We already know how these classes work. If you want to take a look at their code yourself, you can find it here , here and here .



Since we are especially interested in the practical details of the implementation Mapin V8, we first need to understand how the capacity of the table is set.



Table capacity



In V8, the capacity of the hash table (data structure Map) is always a power of two. If we talk about the utilization rate of hash containers, then it is always represented by the number 2. That is, the maximum capacity of the table 2 * number_of_bucketsis 2 times the number of hash containers. When creating an empty object, Mapthere are 2 hash containers in its internal hash table. As a result, the capacity of such an object is equal to 4 records.



There is a limitation on the maximum capacity of objects Map. On 64-bit systems, this will be about 16.7 million records. This limitation is due to the peculiarities of representing data structures Mapin the heap. We will talk about it later.



And finally, the factor of increasing or decreasing the table is also always represented by the multiplication of some number by 2. This means that after 4 records are added to the described table, the next insert operation will cause the need to rehashed the table, during which the table size will increase by two times. With a decrease in the size of the table, it, respectively, can become 2 times smaller.



In order to make sure that what I saw in the source code works exactly as I understood it, I modified the V8 engine code built into Node.js, making it so that a Mapnew property bucketscontaining information about the number of hash containers. You can find the results of this modification here... In this special assembly of Node.js, the following script can be run:



const map = new Map();
let prevBuckets = 0;
for (let i = 0; i < 100; i++) {
  if (prevBuckets !== map.buckets) {
    console.log(`size: ${i}, buckets: ${map.buckets}, capacity: ${map.buckets * 2}`);
    prevBuckets = map.buckets;
  }
  map.set({}, {});
}


This script simply inserts Map100 records into the data structure . This is what is displayed in the console after launching it:



$ ./node /home/puzpuzpuz/map-grow-capacity.js
size: 0, buckets: 2, capacity: 4
size: 5, buckets: 4, capacity: 8
size: 9, buckets: 8, capacity: 16
size: 17, buckets: 16, capacity: 32
size: 33, buckets: 32, capacity: 64
size: 65, buckets: 64, capacity: 128


As you can see, when the table is filled, then, with each change of its size, it increases by 2 times. Now let's try to reduce the table by removing elements from it:



const map = new Map();
for (let i = 0; i < 100; i++) {
  map.set(i, i);
}
console.log(`initial size: ${map.size}, buckets: ${map.buckets}, capacity: ${map.buckets * 2}`);
 
let prevBuckets = 0;
for (let i = 0; i < 100; i++) {
  map.delete(i);
  if (prevBuckets !== map.buckets) {
    console.log(`size: ${map.size}, buckets: ${map.buckets}, capacity: ${map.buckets * 2}`);
    prevBuckets = map.buckets;
  }
}


This is what this script will output:



$ ./node /home/puzpuzpuz/map-shrink-capacity.js
initial size: 100, buckets: 64, capacity: 128
size: 99, buckets: 64, capacity: 128
size: 31, buckets: 32, capacity: 64
size: 15, buckets: 16, capacity: 32
size: 7, buckets: 8, capacity: 16
size: 3, buckets: 4, capacity: 8
size: 1, buckets: 2, capacity: 4


Here, again, you can see that the size of the table is reduced each time it has fewer number_of_buckets / 2elements.



Hash function



So far, we have not touched on the question of how V8 calculates hash codes for keys stored in objects Map. And this is an important question.



For values โ€‹โ€‹that can be classified as numeric, some well-known hash function with a low probability of collision is used.



For string values, a hash code is computed based on the values โ€‹โ€‹themselves. After that, this code is cached in the internal header.



And finally, for objects, hashes are calculated based on a random number, and what happens is then cached in the internal header.



Time complexity of operations with Map objects



Most of the operations performed on data structures Map, such as setor delete, require searching through these data structures. As in the case with the "classic" hash tables, the time complexity of the search in our case is O(1).



Imagine a worst-case scenario, when the table is full, that is to say, it occupied Nfrom the Nseats. In this case, all records belong to a single hash container, and the required record is at the very end of the chain of records. In a scenario like this, you will need to take steps to find this entry N.



On the other hand, in the best scenario, when the table is full and there are only 2 records in each hash container, finding a record will require only 2 steps.



Certain operations in hash tables are very fast, but this is not the case with hash operations. The time complexity of the hash operation is O(N). It requires a new hash table to be allocated on the heap. Moreover, rehashing is performed as needed, as part of the operations to insert or remove elements from the table. Therefore, for example, the call map.set()may turn out to be much "more expensive" than expected. Fortunately, the hash operation is rarely performed.



Memory consumption



Of course, the underlying hash table Mapmust be stored on the heap somehow. It is stored in what is called "auxiliary storage". And here another interesting fact awaits us. The entire table (and, therefore, everything that is placed in Map) is stored in a single array of fixed length. The structure of this array is shown in the following figure.





Array used to store Map data structures in memory The



individual parts of the array have the following purposes:



  • Header: Contains general information, such as the number of hash containers or the number of items removed from Map.
  • Hash Container Details: This is where we store data about the containers that correspond to the array hashTablefrom our example.
  • Hash table entries: This is where the data corresponding to the array is stored dataTable. Namely, it contains information about hash table entries. Each record occupies three cells in the array. One stores the key, the second stores the value, and the third stores the "pointer" to the next record in the chain.


If we talk about the size of the array, then it can be roughly estimated as N * 3,5. Here Nis the capacity of the table. In order to understand what this means in terms of memory consumption, let's imagine that we have a 64-bit system and the V8's pointer compression feature is disabled . In this case, 8 bytes are needed to store each element of the array. As a result Map, 29 MB of heap memory is required to store a data structure that contains approximately 1 million records.



Outcome



In this article, we have covered a lot of things related to the data structure Mapin JavaScript. Let's summarize:



  • V8 Mapuses deterministic hash tables for implementation . It is very likely that this data structure is also implemented in other JS engines.
  • The mechanisms that support the work Mapare implemented in C ++, after which they are presented as an API that is accessible from JavaScript.
  • If we talk about the time complexity of operations performed with objects Map, then they, as well as when working with "classic" hash tables, have complexity O(1). In this case, the time complexity of the hashing operation is O(N).
  • 64- Map 1 29 , .
  • , , Set.


Map JavaScript-?










All Articles