Why did we keep package-lock.json support in npm 7?

Ever since we announced that npm 7 would support yarn.lock files, I have been asked the same question several times: "Why keep package-lock.json support, then? Why not use only yarn.lock?" The short answer is: "Because yarn.lock doesn't fully meet npm's needs. Relying on it alone would impair npm's ability to produce optimal package installation layouts and to add new functionality to the project." A more detailed answer follows.







yarn.lock



The basic structure of the yarn.lock file



The yarn.lock file maps package dependency specifiers to metadata describing how those dependencies were resolved. For example:



mkdirp@1.x:
  version "1.0.2"
  resolved "https://registry.yarnpkg.com/mkdirp/-/mkdirp-1.0.2.tgz#5ccd93437619ca7050b538573fc918327eba98fb"
  integrity sha512-N2REVrJ/X/jGPfit2d7zea2J1pf7EAR5chIUcfHffAZ7gmlam5U65sAm76+o4ntQbSRdTjYf7qZz3chuHlwXEA==


This snippet says: "Any dependency on mkdirp@1.x must resolve to exactly what is specified here." If several packages depend on mkdirp@1.x, all of those dependencies are resolved the same way.
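The specifier-to-metadata mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not Yarn's parser: the function name parse_yarn_lock is hypothetical, and it handles only flat fields like the ones in the snippet (no nested blocks such as dependency lists).

```python
SAMPLE = '''\
mkdirp@1.x:
  version "1.0.2"
  resolved "https://registry.yarnpkg.com/mkdirp/-/mkdirp-1.0.2.tgz#5ccd93437619ca7050b538573fc918327eba98fb"
  integrity sha512-N2REVrJ/X/jGPfit2d7zea2J1pf7EAR5chIUcfHffAZ7gmlam5U65sAm76+o4ntQbSRdTjYf7qZz3chuHlwXEA==
'''


def parse_yarn_lock(text):
    """Parse flat yarn.lock-style entries into {specifier: metadata}.

    Hypothetical helper for illustration; nested blocks are not supported.
    """
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not line.startswith(" "):
            # Header line such as `mkdirp@1.x:`. Comma-separated specifiers
            # share one metadata record, so they all resolve identically.
            current = {}
            for spec in line.rstrip(":").split(","):
                entries[spec.strip().strip('"')] = current
        elif current is not None:
            key, _, value = line.strip().partition(" ")
            current[key] = value.strip('"')
    return entries


entries = parse_yarn_lock(SAMPLE)
print(entries["mkdirp@1.x"]["version"])
```

Every package that asks for mkdirp@1.x looks up the same key, which is why all such dependencies get the same resolution.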



In npm 7, if a project contains a yarn.lock file, npm will use the metadata in it. The resolved values tell npm where to download packages from, and the integrity values are used to verify that what was received matches what was expected. If packages are added to or removed from the project, yarn.lock is updated accordingly.
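As a rough sketch of what checking an integrity value involves, the Python below compares a base64-encoded digest of the downloaded bytes with the value from the lock file. The function names are hypothetical, and the sketch assumes a single "algorithm-base64digest" string such as the sha512 value shown earlier; it is not npm's implementation.

```python
import base64
import hashlib


def make_integrity(data: bytes, algorithm: str = "sha512") -> str:
    """Build an 'algorithm-base64digest' string for the given bytes."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Return True if the bytes hash to the digest recorded in `integrity`."""
    algorithm, _, _expected = integrity.partition("-")
    return make_integrity(data, algorithm) == integrity


# Stand-in for a downloaded tarball; real input would be the .tgz bytes.
tarball = b"example tarball bytes"
recorded = make_integrity(tarball)

print(matches_integrity(tarball, recorded))
print(matches_integrity(b"tampered bytes", recorded))
```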



At the same time, npm still creates a package-lock.json file, as before. If this file is present in the project, it is used as the authoritative source of information about the structure (shape) of the dependency tree.



The question here is: "If yarn.lock is good enough for the Yarn package manager, why can't npm just use that file?"



Deterministic Dependency Installation Results



The results of installing packages with Yarn are guaranteed to be the same given the same yarn.lock file and the same version of Yarn. Different versions of Yarn may lay out package files on disk differently.



The yarn.lock file guarantees deterministic dependency resolution. For example, if foo@1.x resolves to foo@1.2.3, then given the same yarn.lock file it will always do so, in every version of Yarn. But this is not (at least by itself) equivalent to guaranteeing that the structure of the dependency tree is deterministic!



Consider the following dependency graph:



root -> (foo@1, bar@1)
foo -> (baz@1)
bar -> (baz@2)


Here are two dependency trees, each of which can be considered correct.



Tree number 1:



root
+-- foo
+-- bar
|   +-- baz@2
+-- baz@1


Tree number 2:



root
+-- foo
|   +-- baz@1
+-- bar
+-- baz@2


The yarn.lock file cannot tell us which of these trees to use. If code in the root package calls require("baz") (which is a mistake, since baz is not among root's declared dependencies), yarn.lock does not guarantee which version that call will load. This is a form of determinism that package-lock.json can provide, but yarn.lock cannot.
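To make the difference concrete, here is a small Python sketch that models the two trees as maps from install location to package and looks a module up the way Node-style resolution does, by checking node_modules at the current location and then walking up. The resolve helper and the path-keyed layout are simplifying assumptions for illustration only.

```python
# Tree number 1: baz@1 is hoisted to the top level.
TREE_1 = {
    "node_modules/foo": "foo@1",
    "node_modules/bar": "bar@1",
    "node_modules/bar/node_modules/baz": "baz@2",
    "node_modules/baz": "baz@1",
}

# Tree number 2: baz@2 is hoisted to the top level.
TREE_2 = {
    "node_modules/foo": "foo@1",
    "node_modules/foo/node_modules/baz": "baz@1",
    "node_modules/bar": "bar@1",
    "node_modules/baz": "baz@2",
}


def resolve(tree, importer, name):
    """Find `name` starting at `importer` ("" means the root package)."""
    location = importer
    while True:
        prefix = f"{location}/" if location else ""
        candidate = f"{prefix}node_modules/{name}"
        if candidate in tree:
            return tree[candidate]
        if not location:
            return None  # reached the root without finding the package
        # Step up to the package that contains `location`.
        location = location.rpartition("node_modules/")[0].rstrip("/")


# foo and bar get what they asked for in both trees...
for tree in (TREE_1, TREE_2):
    print(resolve(tree, "node_modules/foo", "baz"),
          resolve(tree, "node_modules/bar", "baz"))

# ...but an undeclared require("baz") from root differs between them.
print(resolve(TREE_1, "", "baz"), resolve(TREE_2, "", "baz"))
```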



In practice, of course, since Yarn has all the information it needs in yarn.lock to select the appropriate version of each dependency, the result is deterministic as long as everyone uses the same version of Yarn, because the selection is always made in the same way. Code doesn't change until someone changes it. It should be noted that Yarn is smart enough not to be affected by differences in package manifest load times when building the dependency tree. Otherwise, deterministic results could not be guaranteed.



Because this is determined by the details of Yarn's algorithms rather than by the data structures on disk (which do not identify the algorithm to be used), this determinism guarantee is fundamentally weaker than the one given by package-lock.json, which contains a complete description of the dependency tree structure stored on disk.



In other words, the tree Yarn builds is influenced by both the yarn.lock file and the implementation of Yarn itself. In npm, only the package-lock.json file determines what the dependency tree looks like. This makes it harder to accidentally break the recorded project structure across different versions of npm. And if the file does change (whether by mistake or intentionally), the change is clearly visible when the modified version is committed to the project's version-controlled repository.
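The point about visible changes can be illustrated with a simplified lock that records where each package is installed. In the Python sketch below, the path-keyed maps and the diff_locks helper are assumptions made for illustration; a real package-lock.json carries more fields than this.

```python
# Two recorded trees for the same graph (Tree number 1 and Tree number 2).
LOCK_BEFORE = {
    "node_modules/foo": "foo@1",
    "node_modules/bar": "bar@1",
    "node_modules/bar/node_modules/baz": "baz@2",
    "node_modules/baz": "baz@1",
}
LOCK_AFTER = {
    "node_modules/foo": "foo@1",
    "node_modules/foo/node_modules/baz": "baz@1",
    "node_modules/bar": "bar@1",
    "node_modules/baz": "baz@2",
}


def diff_locks(old, new):
    """Return (added, removed, changed) install locations between two locks."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed


added, removed, changed = diff_locks(LOCK_BEFORE, LOCK_AFTER)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Both locks satisfy the same dependency graph, yet the diff shows that the shape of the tree changed. A file that records only resolutions would be identical in both cases.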



Nested dependencies and dependency deduplication



Moreover, there is a whole class of situations involving nested dependencies and deduplication in which yarn.lock cannot accurately reflect the dependency resolution that npm will actually use. This is true even when npm uses yarn.lock as its source of metadata. While npm treats yarn.lock as a reliable source of information, it does not treat the file as an authoritative set of constraints on dependency versions.



In some cases, Yarn generates a dependency tree with a very high level of package duplication, which we do not want. So following Yarn's algorithm exactly is far from ideal in such cases.



Consider the following dependency graph:



root -> (x@1.x, y@1.x, z@1.x)
x@1.1.0 -> ()
x@1.2.0 -> ()
y@1.0.0 -> (x@1.1, z@2.x)
z@1.0.0 -> ()
z@2.0.0 -> (x@1.x)


The root project depends on the 1.x versions of packages x, y, and z. Package y depends on x@1.1 and on z@2.x. Version 1 of package z has no dependencies, but version 2 of z depends on x@1.x.



Based on this information, npm generates the following dependency tree:



root (x@1.x, y@1.x, z@1.x) <-- depends on x@1.x
+-- x 1.2.0                <-- x@1.x resolves to 1.2.0
+-- y (x@1.1, z@2.x)
|   +-- x 1.1.0            <-- x@1.x resolves to 1.1.0
|   +-- z 2.0.0 (x@1.x)    <-- depends on x@1.x
+-- z 1.0.0


The package z@2.0.0 depends on x@1.x, and so does root. The yarn.lock file maps x@1.x to 1.2.0. However, the dependency of package z, which also specifies x@1.x, will instead be resolved to x@1.1.0.



As a result, even though yarn.lock states that x@1.x should resolve to version 1.2.0, there is a second resolution of x@1.x, this time to version 1.1.0.



If you run npm with the --prefer-dedupe flag, it goes one step further and installs only a single instance of dependency x, which produces the following dependency tree:



root (x@1.x, y@1.x, z@1.x)
+-- x 1.1.0       <-- x@1.x resolves to 1.1.0
+-- y (x@1.1, z@2.x)
|   +-- z 2.0.0 (x@1.x)
+-- z 1.0.0


This minimizes duplication of dependencies, and the resulting dependency tree is recorded in the package-lock.json file.



Since the yarn.lock file records only how dependencies are resolved, not the resulting package tree, Yarn will generate a dependency tree like this:



root (x@1.x, y@1.x, z@1.x) <-- depends on x@1.x
+-- x 1.2.0                <-- x@1.x resolves to 1.2.0
+-- y (x@1.1, z@2.x)
|   +-- x 1.1.0            <-- x@1.1 resolves to 1.1.0
|   +-- z 2.0.0 (x@1.x)    <-- x@1.1.0 would suffice, but...
|       +-- x 1.2.0        <-- Yarn duplicates x to satisfy yarn.lock
+-- z 1.0.0


With Yarn, package x appears in the dependency tree three times. With npm and no additional settings, twice. And with the --prefer-dedupe flag, only once (although the tree then contains a version of the package that is neither the newest nor the best).
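The three trees above can be checked mechanically. The Python sketch below models each tree as a map from install location to version, confirms that every package finds a version of each dependency that satisfies its range, and counts the copies of x. The helper names and the simplified range matcher (which understands only forms such as 1.x and 1.1) are assumptions for illustration, not npm's or Yarn's code.

```python
# Dependency graph from the example, keyed by name@version.
DEPS = {
    "root": {"x": "1.x", "y": "1.x", "z": "1.x"},
    "x@1.1.0": {}, "x@1.2.0": {},
    "y@1.0.0": {"x": "1.1", "z": "2.x"},
    "z@1.0.0": {}, "z@2.0.0": {"x": "1.x"},
}

NPM_DEFAULT = {
    "node_modules/x": "1.2.0",
    "node_modules/y": "1.0.0",
    "node_modules/y/node_modules/x": "1.1.0",
    "node_modules/y/node_modules/z": "2.0.0",
    "node_modules/z": "1.0.0",
}
NPM_PREFER_DEDUPE = {
    "node_modules/x": "1.1.0",
    "node_modules/y": "1.0.0",
    "node_modules/y/node_modules/z": "2.0.0",
    "node_modules/z": "1.0.0",
}
YARN = {
    "node_modules/x": "1.2.0",
    "node_modules/y": "1.0.0",
    "node_modules/y/node_modules/x": "1.1.0",
    "node_modules/y/node_modules/z": "2.0.0",
    "node_modules/y/node_modules/z/node_modules/x": "1.2.0",
    "node_modules/z": "1.0.0",
}


def satisfies(version, range_):
    """Simplified matcher: '1.x' and '1.1' style ranges only."""
    return all(want in ("x", "*") or want == have
               for want, have in zip(range_.split("."), version.split(".")))


def package_name(path):
    return path.rpartition("node_modules/")[2]


def resolve(tree, importer, name):
    """Walk up from `importer` ("" is root) looking for node_modules/<name>."""
    location = importer
    while True:
        prefix = f"{location}/" if location else ""
        candidate = f"{prefix}node_modules/{name}"
        if candidate in tree:
            return tree[candidate]
        if not location:
            return None
        location = location.rpartition("node_modules/")[0].rstrip("/")


def is_valid(tree):
    """True if every package sees a satisfying version of each dependency."""
    for location in ["", *tree]:
        key = f"{package_name(location)}@{tree[location]}" if location else "root"
        for dep, range_ in DEPS[key].items():
            found = resolve(tree, location, dep)
            if found is None or not satisfies(found, range_):
                return False
    return True


def count_copies(tree, name):
    return sum(1 for path in tree if package_name(path) == name)


for label, tree in [("yarn", YARN), ("npm", NPM_DEFAULT),
                    ("npm --prefer-dedupe", NPM_PREFER_DEDUPE)]:
    print(label, is_valid(tree), count_copies(tree, "x"))
```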



All three resulting dependency trees can be considered correct in the sense that each package receives versions of its dependencies that meet its stated requirements. But we don't want to create package trees with too many duplicates. Think about what happens if x is a large package with many dependencies of its own!



As a result, there is only one way for npm to optimize the package tree while still producing deterministic and reproducible dependency trees: use a lock file whose construction and use differ from yarn.lock at a fundamental level.



Recording the results of user intent



As already mentioned, in npm 7 the user can pass the --prefer-dedupe flag to apply a tree-building algorithm that prioritizes deduplication over always installing the latest package versions. The --prefer-dedupe flag is usually ideal in situations where package duplication needs to be minimized.



If this flag is used, the resulting tree for the above example will look like this:



root (x@1.x, y@1.x, z@1.x) <-- depends on x@1.x
+-- x 1.1.0                <-- x@1.x resolves to 1.1.0
+-- y (x@1.1, z@2.x)
|   +-- z 2.0.0 (x@1.x)    <-- depends on x@1.x
+-- z 1.0.0


In this case, npm sees that even though x@1.2.0 is the most recent version that satisfies x@1.x, it is perfectly acceptable to choose x@1.1.0 instead. Choosing this version results in less duplication of packages in the dependency tree.
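The trade-off between "newest version" and "one version for everyone" can be sketched as follows. This Python is an illustrative assumption, not npm's actual algorithm: the function names are hypothetical and the range matcher understands only forms such as 1.x and 1.1.

```python
def satisfies(version, range_):
    """Simplified matcher: '1.x' and '1.1' style ranges only."""
    return all(want in ("x", "*") or want == have
               for want, have in zip(range_.split("."), version.split(".")))


def version_key(version):
    """Sort key so that 1.10.0 ranks above 1.2.0."""
    return tuple(int(part) for part in version.split("."))


def newest_match(available, range_):
    """Highest available version satisfying a single range."""
    matches = [v for v in available if satisfies(v, range_)]
    return max(matches, key=version_key, default=None)


def shared_match(available, ranges):
    """Highest available version satisfying every range at once, if any."""
    matches = [v for v in available if all(satisfies(v, r) for r in ranges)]
    return max(matches, key=version_key, default=None)


AVAILABLE_X = ["1.1.0", "1.2.0"]

# root and z ask for x@1.x, y asks for x@1.1.
print(newest_match(AVAILABLE_X, "1.x"))
print(shared_match(AVAILABLE_X, ["1.x", "1.1", "1.x"]))
```

Picking the newest match per range yields two copies of x (1.2.0 and 1.1.0), while the shared match lets a single copy serve all three dependents.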



If the structure of the dependency tree were not recorded in a lock file, every programmer on a team would have to configure their working environment exactly as the other team members do. Only then would they get the same result as everyone else. The ability to tweak the "implementation" of the tree-building mechanism in this way gives npm users a serious opportunity to optimize dependencies for their own specific needs. But if the resulting tree depends on the implementation, deterministic dependency trees become impossible. That is exactly the situation that relying on yarn.lock creates.



Here are a few more examples of how advanced npm settings can lead to the creation of different dependency trees:



  • --legacy-peer-deps, a flag that makes npm ignore peerDependencies entirely.
  • --legacy-bundling, a flag telling npm that it shouldn't even try to flatten the dependency tree.
  • --global-style, a flag that installs all transitive dependencies as nested dependencies inside the folders of the top-level dependencies.


Recording only the dependency resolutions, and expecting the same algorithm to be used to generate the tree, does not work once we give users the ability to customize how the dependency tree is built.



Recording the structure of the finished dependency tree lets us offer users these capabilities without disrupting the construction of deterministic and reproducible dependency trees.



Performance and data completeness



The package-lock.json file is useful not only for ensuring deterministic and reproducible dependency trees. We also rely on it to track and store package metadata, saving considerable time that would otherwise be spent reading package.json files and querying the npm registry. Because the capabilities of yarn.lock are very limited, it does not contain the metadata that we would otherwise have to load constantly.



In npm 7, the package-lock.json file contains everything npm needs to fully build the project's dependency tree. In npm 6 this data is stored less conveniently, so when we encounter an old lock file we have to do some additional work, but only once per project.



As a result, even if yarn.lock did record the structure of the dependency tree, we would still have to use another file to store the additional metadata.



Future opportunities



What we have discussed here could change dramatically once various new approaches to laying out dependencies on disk are taken into account: pnpm, Yarn 2 (Berry), and Yarn's PnP.



As we work on npm 8, we are going to explore a virtual filesystem approach to dependency trees. This idea was modeled in Tink, and the concept was proven to work in 2019. We are also discussing a move to something like the structure used by pnpm, although that is, in a sense, an even more radical change than using a virtual filesystem.



If all dependencies live in some central store, and nested dependencies are represented only by symbolic links or a virtual filesystem, then modeling the structure of the dependency tree would matter much less to us. But we would still need more metadata than the yarn.lock file can provide. So it makes more sense to update and streamline the existing package-lock.json format than to switch entirely to yarn.lock.



This is not an article that could be called "On the dangers of yarn.lock"



I would like to emphasize that, as far as I know, Yarn reliably creates correct project dependency trees. And for a given version of Yarn (at the time of this writing, this applies to all recent versions of Yarn), these trees are, as with npm, completely deterministic.



A yarn.lock file is enough to create deterministic dependency trees with the same version of Yarn. But we cannot rely on mechanisms that depend on the package manager's implementation, given that similar mechanisms are used in many tools. This is even more true considering that the yarn.lock file format is not formally documented anywhere. (This problem is not unique to Yarn; npm was in the same situation. Documenting file formats is pretty serious work.)



The best way to ensure reliably deterministic dependency trees in the long run is to record the result of dependency resolution, that is, the finished tree. We should not rely on the belief that future implementations of the package manager will resolve dependencies along the same path as previous ones. That assumption limits our ability to build optimized dependency trees.



Deviations from the recorded structure of the dependency tree should result from the user's explicit intent. Such deviations should be self-documenting: they change the previously recorded data about the structure of the dependency tree.



Only package-lock.json, or a mechanism like it, can give npm these capabilities.



What package manager do you use in your JavaScript projects?





