yarn.lock, I was asked the same question several times: “Why keep supporting package-lock.json at all, then? Why not use only yarn.lock?”
The short answer is: “Because yarn.lock doesn't fully meet npm's needs. Relying on it exclusively would limit npm's ability to produce optimal package installations and to add new features in the future.” The rest of this article gives a more detailed answer.
The basic structure of the yarn.lock file
A yarn.lock file maps dependency specifiers to metadata describing how those dependencies were resolved. For example:
mkdirp@1.x:
  version "1.0.2"
  resolved "https://registry.yarnpkg.com/mkdirp/-/mkdirp-1.0.2.tgz#5ccd93437619ca7050b538573fc918327eba98fb"
  integrity sha512-N2REVrJ/X/jGPfit2d7zea2J1pf7EAR5chIUcfHffAZ7gmlam5U65sAm76+o4ntQbSRdTjYf7qZz3chuHlwXEA==
This entry says: “Any dependency on mkdirp@1.x should resolve to exactly what is described here.” If several packages depend on mkdirp@1.x, all of those dependencies will be resolved the same way.
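To make the “specifier to metadata” mapping concrete, here is a minimal Python sketch that parses entries like the one above into a dictionary keyed by specifier. It assumes the simple yarn.lock v1 layout shown here (unindented header lines, fields indented by two spaces) and deliberately skips nested blocks such as dependencies; parse_yarn_lock is a hypothetical helper, not part of Yarn or npm.

```python
def parse_yarn_lock(text: str) -> dict[str, dict[str, str]]:
    """Map each dependency specifier to its resolution metadata (toy parser)."""
    entries: dict[str, dict[str, str]] = {}
    current: list[str] = []  # specifiers that share the entry being read
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and comments such as "# yarn lockfile v1"
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            # Header: 'mkdirp@1.x:' or '"a@^1.0.0", "a@^1.1.0":'
            header = line.rstrip(":")
            current = [spec.strip().strip('"') for spec in header.split(",")]
            for spec in current:
                entries[spec] = {}
        elif indent == 2 and not line.endswith(":"):
            # Field: 'version "1.0.2"', 'resolved "..."', 'integrity sha512-...'
            key, _, value = line.strip().partition(" ")
            for spec in current:
                entries[spec][key] = value.strip('"')
        # Deeper lines (e.g. nested "dependencies:" blocks) are ignored here.
    return entries


LOCK_TEXT = """\
mkdirp@1.x:
  version "1.0.2"
  resolved "https://registry.yarnpkg.com/mkdirp/-/mkdirp-1.0.2.tgz#5ccd93437619ca7050b538573fc918327eba98fb"
  integrity sha512-N2REVrJ/X/jGPfit2d7zea2J1pf7EAR5chIUcfHffAZ7gmlam5U65sAm76+o4ntQbSRdTjYf7qZz3chuHlwXEA==
"""

lock = parse_yarn_lock(LOCK_TEXT)
print(lock["mkdirp@1.x"]["version"])  # 1.0.2
```

Because the specifier is the key, every package that asks for mkdirp@1.x gets the same entry back, which is exactly the guarantee described above.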
In npm 7, if a project contains a yarn.lock file, npm uses the metadata in it. The resolved values tell npm where to download packages from, and the integrity values are used to verify that what was downloaded matches what was expected. If packages are added to or removed from the project, yarn.lock is updated accordingly.
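The integrity values use the Subresource Integrity format: a hash algorithm name, a dash, and the base64-encoded digest of the package tarball. The following minimal Python sketch shows that kind of check; make_integrity and verify_integrity are hypothetical helpers, the sketch assumes one or more space-separated hashes with no option suffixes, and it is not npm's actual implementation.

```python
import base64
import hashlib
import hmac


def make_integrity(data: bytes, algorithm: str = "sha512") -> str:
    """Build an SRI string such as 'sha512-<base64 digest>' for some bytes."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(data: bytes, integrity: str) -> bool:
    """Return True if the data matches any hash listed in the SRI string."""
    for token in integrity.split():
        algorithm, _, expected = token.partition("-")
        if algorithm not in hashlib.algorithms_guaranteed:
            continue  # unknown algorithm: this sketch cannot check it
        actual = base64.b64encode(hashlib.new(algorithm, data).digest())
        # Constant-time comparison of the two base64 strings (as bytes).
        if hmac.compare_digest(actual, expected.encode("utf-8")):
            return True
    return False


tarball = b"stand-in bytes for a downloaded .tgz file"
integrity = make_integrity(tarball)
print(verify_integrity(tarball, integrity))         # True
print(verify_integrity(tarball + b"x", integrity))  # False
```

Any change to the downloaded bytes changes the digest, so a tampered or corrupted tarball fails the check even if the resolved URL is correct.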
At the same time, npm still creates a package-lock.json file, as before. If this file is present in the project, it is used as the authoritative source of information about the structure (shape) of the dependency tree.
So the question is: “If yarn.lock is good enough for the Yarn package manager, why can't npm just use that file?”
Deterministic Dependency Installation Results
Yarn guarantees identical installation results for the same combination of yarn.lock file and Yarn version. A different version of Yarn may lay out the package files on disk differently.
A yarn.lock file guarantees deterministic dependency resolution. For example, if foo@1.x resolves to foo@1.2.3, it will keep resolving that way in all versions of Yarn, as long as the same yarn.lock file is used. But this is not (at least not by itself) the same as guaranteeing a deterministic dependency tree structure!
Consider the following dependency graph:
root -> (foo@1, bar@1)
foo -> (baz@1)
bar -> (baz@2)
Here are two dependency trees, either of which is equally valid.
Tree number 1:
root
+-- foo
+-- bar
| +-- baz@2
+-- baz@1
Tree number 2:
root
+-- foo
| +-- baz@1
+-- bar
  +-- baz@2
The yarn.lock file cannot tell us which of these trees to use. If the root package contains require("baz") (which is a mistake, since this dependency is not declared in the dependency graph), yarn.lock does not guarantee that this call will keep working. This is a form of determinism that package-lock.json can provide, but yarn.lock cannot.
In practice, of course, since yarn.lock contains all the information Yarn needs to select the appropriate version of each dependency, the choice is deterministic as long as everyone uses the same version of Yarn: the version is selected the same way every time, and the code doesn't change until someone changes it. To its credit, Yarn is smart enough not to be affected by differences in when package manifests are loaded while it builds the dependency tree. Otherwise, determinism could not be guaranteed.
Because this behavior is determined by the particulars of Yarn's algorithms rather than by the data structures stored on disk (which do not identify the algorithm to be used), this determinism guarantee is fundamentally weaker than the one provided by package-lock.json, which fully describes the structure of the dependency tree on disk.
In other words, the way Yarn builds the dependency tree depends on both the yarn.lock file and the implementation of Yarn itself. In npm, the shape of the dependency tree is determined by package-lock.json alone. This makes it much harder to accidentally break a project's tree structure when moving between versions of npm. And if the file does change (by mistake or intentionally), the change is clearly visible when the updated file is committed to the project's version control repository.
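A toy comparison makes this concrete. In the sketch below, a yarn.lock-style view (specifier to resolved version) is computed for both trees from the foo/bar/baz example and turns out identical, while a path-keyed view, loosely modeled on the packages section of npm 7's package-lock.json (lockfileVersion 2), differs, and that difference shows up as a readable diff. The concrete version numbers and the helpers to_resolution_lock, to_path_lock, and diff_locks are hypothetical.

```python
from typing import Optional

Location = tuple[str, ...]  # () is the project root
Tree = dict[Location, str]  # install location -> installed version

# Declared dependencies from the example graph.
DEPS = {
    "root": {"foo": "1", "bar": "1"},
    "foo": {"baz": "1"},
    "bar": {"baz": "2"},
    "baz": {},
}
# Hypothetical concrete versions for tree 1 and tree 2.
TREE_1: Tree = {("foo",): "1.0.0", ("bar",): "1.0.0",
                ("bar", "baz"): "2.0.0", ("baz",): "1.0.0"}
TREE_2: Tree = {("foo",): "1.0.0", ("foo", "baz"): "1.0.0",
                ("bar",): "1.0.0", ("bar", "baz"): "2.0.0"}


def lookup(tree: Tree, requirer: Location, name: str) -> Optional[str]:
    """Simplified Node.js lookup: own node_modules first, then ancestors'."""
    for depth in range(len(requirer), -1, -1):
        candidate = requirer[:depth] + (name,)
        if candidate in tree:
            return tree[candidate]
    return None


def to_resolution_lock(tree: Tree) -> dict[str, Optional[str]]:
    """yarn.lock-style view: 'name@range' -> version it resolves to."""
    resolutions: dict[str, Optional[str]] = {}
    for location in [(), *tree]:
        owner = location[-1] if location else "root"
        for name, wanted in DEPS[owner].items():
            resolutions[f"{name}@{wanted}"] = lookup(tree, location, name)
    return resolutions


def to_path_lock(tree: Tree) -> dict[str, str]:
    """Path-keyed view, e.g. 'node_modules/bar/node_modules/baz' -> '2.0.0'."""
    return {"node_modules/" + "/node_modules/".join(loc): v for loc, v in tree.items()}


def diff_locks(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, Optional[str], Optional[str]]]:
    """List (path, old version, new version) for every entry that changed."""
    return [(path, old.get(path), new.get(path))
            for path in sorted(old.keys() | new.keys())
            if old.get(path) != new.get(path)]


print(to_resolution_lock(TREE_1) == to_resolution_lock(TREE_2))  # True
for change in diff_locks(to_path_lock(TREE_1), to_path_lock(TREE_2)):
    print(change)
# ('node_modules/baz', '1.0.0', None)
# ('node_modules/foo/node_modules/baz', None, '1.0.0')
```

Switching from tree 1 to tree 2 leaves the resolution view untouched, so the change would be invisible in review; the path-keyed view records it as two explicit line changes.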
Nested dependencies and dependency deduplication
Moreover, there is a whole class of situations involving nested dependencies and deduplication in which yarn.lock cannot accurately reflect the dependency resolutions that npm will actually use, even when npm uses yarn.lock as a source of metadata. Although npm treats yarn.lock as a reliable source of information, it does not treat it as an authoritative set of constraints on dependency versions.
In some cases, Yarn produces a dependency tree with excessive package duplication, which we want to avoid. So following Yarn's algorithm exactly is far from ideal in such cases.
Consider the following dependency graph:
root -> (x@1.x, y@1.x, z@1.x)
x@1.1.0 -> ()
x@1.2.0 -> ()
y@1.0.0 -> (x@1.1, z@2.x)
z@1.0.0 -> ()
z@2.0.0 -> (x@1.x)
The root project depends on the 1.x versions of packages x, y, and z. Package y depends on x@1.1 and z@2.x. Version 1 of package z has no dependencies, but version 2 of z depends on x@1.x.
Based on this information, npm generates the following dependency tree:
root (x@1.x, y@1.x, z@1.x) <-- x@1.x dependency here
+-- x 1.2.0                <-- x@1.x resolves to 1.2.0
+-- y (x@1.1, z@2.x)
| +-- x 1.1.0              <-- x@1.x resolves to 1.1.0
| +-- z 2.0.0 (x@1.x)      <-- x@1.x dependency here
+-- z 1.0.0
Package z@2.0.0 depends on x@1.x, and so does root. The yarn.lock file maps x@1.x to 1.2.0. However, the dependency of z, which also specifies x@1.x, will instead resolve to x@1.1.0.
As a result, even though yarn.lock states that the x@1.x dependency should resolve to version 1.2.0, there is a second resolution of x@1.x, to version 1.1.0.
If you run npm with the --prefer-dedupe flag, it goes one step further and installs only one instance of x, producing the following dependency tree:
root (x@1.x, y@1.x, z@1.x)
+-- x 1.1.0 <-- x@1.x resolves to 1.1.0
+-- y (x@1.1, z@2.x)
| +-- z 2.0.0 (x@1.x)
+-- z 1.0.0
This minimizes duplication, and the resulting dependency tree is recorded in package-lock.json.
Because yarn.lock records only how dependencies are resolved, not the resulting package tree, Yarn would generate this dependency tree:
root (x@1.x, y@1.x, z@1.x) <-- x@1.x dependency here
+-- x 1.2.0                <-- x@1.x resolves to 1.2.0
+-- y (x@1.1, z@2.x)
| +-- x 1.1.0              <-- x@1.1 resolves to 1.1.0
| +-- z 2.0.0 (x@1.x)      <-- x@1.1.0 would satisfy this, but...
|   +-- x 1.2.0            <-- Yarn makes a copy because yarn.lock says so
+-- z 1.0.0
With Yarn, package x appears in the dependency tree three times. With npm's default settings, it appears twice. And with the --prefer-dedupe flag, it appears only once (although in that case it is not the newest version of the package).
All three dependency trees can be considered correct in the sense that every package receives versions of its dependencies that satisfy the stated requirements. But we don't want to create package trees with excessive duplication. Imagine what happens if x is a large package with many dependencies of its own!
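The claim that all three trees are correct can be checked mechanically. The sketch below is a simplified, hypothetical model: it walks each tree, resolves every declared dependency with a Node-style lookup (own node_modules first, then ancestors), checks the found version against the requested range, and counts the copies of x. Its satisfies helper only understands ranges of the form 1.x and 1.1 used in this example; it is not a semver implementation.

```python
from typing import Optional

Location = tuple[str, ...]
Tree = dict[Location, str]  # install location -> installed version; () is root

# Dependency graph from the example: (name, version) -> {dependency: range}.
GRAPH: dict[tuple[str, str], dict[str, str]] = {
    ("x", "1.1.0"): {},
    ("x", "1.2.0"): {},
    ("y", "1.0.0"): {"x": "1.1", "z": "2.x"},
    ("z", "1.0.0"): {},
    ("z", "2.0.0"): {"x": "1.x"},
}
ROOT_DEPS = {"x": "1.x", "y": "1.x", "z": "1.x"}

TREES: dict[str, Tree] = {
    "npm default": {("x",): "1.2.0", ("y",): "1.0.0", ("y", "x"): "1.1.0",
                    ("y", "z"): "2.0.0", ("z",): "1.0.0"},
    "npm --prefer-dedupe": {("x",): "1.1.0", ("y",): "1.0.0",
                            ("y", "z"): "2.0.0", ("z",): "1.0.0"},
    "yarn": {("x",): "1.2.0", ("y",): "1.0.0", ("y", "x"): "1.1.0",
             ("y", "z"): "2.0.0", ("y", "z", "x"): "1.2.0", ("z",): "1.0.0"},
}


def satisfies(version: str, spec: str) -> bool:
    """Toy range check that only understands specs like '1.x' and '1.1'."""
    wanted = [part for part in spec.split(".") if part != "x"]
    return version.split(".")[: len(wanted)] == wanted


def lookup(tree: Tree, requirer: Location, name: str) -> Optional[Location]:
    """Simplified Node.js lookup: own node_modules first, then ancestors'."""
    for depth in range(len(requirer), -1, -1):
        candidate = requirer[:depth] + (name,)
        if candidate in tree:
            return candidate
    return None


def is_valid(tree: Tree) -> bool:
    """True if every declared dependency resolves to a satisfying version."""
    for location in [(), *tree]:
        deps = ROOT_DEPS if location == () else GRAPH[(location[-1], tree[location])]
        for name, spec in deps.items():
            found = lookup(tree, location, name)
            if found is None or not satisfies(tree[found], spec):
                return False
    return True


for label, tree in TREES.items():
    copies = sum(1 for location in tree if location[-1] == "x")
    print(f"{label}: valid={is_valid(tree)}, copies of x={copies}")
# npm default: valid=True, copies of x=2
# npm --prefer-dedupe: valid=True, copies of x=1
# yarn: valid=True, copies of x=3
```

Validity alone does not pick a tree; the number of copies is what differs, and that is the property npm wants to be free to optimize.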
So the only way npm can optimize the package tree while still supporting deterministic and reproducible dependency trees is to use a lock file that is built and used in a fundamentally different way from yarn.lock.
Capturing the results of user intent
As already mentioned, in npm 7 the user can pass the --prefer-dedupe flag to opt in to a tree-building algorithm that gives priority to deduplicating dependencies over always installing the latest package versions. The --prefer-dedupe flag is usually ideal when package duplication needs to be minimized.
If this flag is used, the resulting tree for the above example will look like this:
root (x@1.x, y@1.x, z@1.x) <-- x@1.x dependency here
+-- x 1.1.0                <-- x@1.x resolves to 1.1.0
+-- y (x@1.1, z@2.x)
| +-- z 2.0.0 (x@1.x)      <-- x@1.x dependency here
+-- z 1.0.0
In this case, npm sees that even though x@1.2.0 is the most recent version satisfying the x@1.x requirement, it can choose x@1.1.0 instead. Choosing that version results in less package duplication in the dependency tree.
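The trade-off can be illustrated with a deliberately simplified sketch. It compares giving each dependent the newest version that matches its own range (which, for x@1.x, is the 1.2.0 that yarn.lock records) with preferring a single version that satisfies every dependent whenever one exists. This is not npm's actual algorithm, which makes these decisions while placing packages in the tree; the function names and the toy satisfies range check are hypothetical.

```python
# Available versions of x and every requirement on x in the example graph.
AVAILABLE_X = ["1.1.0", "1.2.0"]
REQUIREMENTS = {"root": "1.x", "y": "1.1", "z@2.0.0": "1.x"}


def satisfies(version: str, spec: str) -> bool:
    """Toy range check that only understands specs like '1.x' and '1.1'."""
    wanted = [part for part in spec.split(".") if part != "x"]
    return version.split(".")[: len(wanted)] == wanted


def version_key(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def newest_per_requirement(available, requirements):
    """Newest matching version for each dependent (assumes one always matches)."""
    return {
        who: max((v for v in available if satisfies(v, spec)), key=version_key)
        for who, spec in requirements.items()
    }


def prefer_dedupe(available, requirements):
    """Use one shared version if any satisfies every range, else fall back."""
    shared = [v for v in available
              if all(satisfies(v, spec) for spec in requirements.values())]
    if shared:
        best = max(shared, key=version_key)
        return {who: best for who in requirements}
    return newest_per_requirement(available, requirements)


for strategy in (newest_per_requirement, prefer_dedupe):
    choice = strategy(AVAILABLE_X, REQUIREMENTS)
    print(strategy.__name__, sorted(set(choice.values())))
# newest_per_requirement ['1.1.0', '1.2.0']
# prefer_dedupe ['1.1.0']
```

The first strategy needs two distinct versions of x; the second gives up the newer 1.2.0 for root and z in exchange for a single copy, which is the choice described above.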
If the structure of the dependency tree were not recorded in a lock file, every developer on the team would have to configure their environment exactly the way the other team members do; only then would they get the same result. Being able to tweak the "implementation" of tree building in this way gives npm users a powerful tool for optimizing dependencies for their specific needs. But if the resulting tree depends on the implementation, deterministic dependency trees become impossible. That is exactly the situation with yarn.lock.
Here are a few more examples of how advanced npm settings can lead to the creation of different dependency trees:
--legacy-peer-deps, a flag that tells npm to ignore peerDependencies entirely.
--legacy-bundling, a flag that tells npm not to even try to flatten the dependency tree.
--global-style, a flag that installs all transitive dependencies as nested dependencies inside the folders of the top-level dependencies.
Recording resolution results and expecting the same algorithm to generate the same dependency tree stops working once users can customize how the dependency tree is built.
Locking the structure of the finished dependency tree lets us give users these capabilities without breaking deterministic, reproducible builds.
Performance and data completeness
The package-lock.json file is useful for more than ensuring deterministic, reproducible dependency trees. We also rely on it to track and store package metadata, which saves a significant amount of time that would otherwise be spent reading package.json files and making requests to the npm registry. Because yarn.lock is so limited, it does not contain the metadata we need to load on a regular basis.
In npm 7, package-lock.json contains everything npm needs to fully build the project's dependency tree. In npm 6, this data was not stored as completely, so when npm encounters an old lockfile it has to do some extra work, but this happens only once per project.
So even if yarn.lock did record information about the structure of the dependency tree, we would still need another file to store this additional metadata.
Future opportunities
What we have discussed here could change dramatically if we take into account new approaches to laying out dependencies on disk, such as pnpm, Yarn 2 (Berry), and Yarn's Plug'n'Play (PnP).
As we work on npm 8, we are going to explore a virtual filesystem approach to dependency trees. This idea is modeled on Tink, which was proven out as a concept in 2019. We are also discussing moving to something like the structure pnpm uses, although in a sense that would be an even bigger breaking change than a virtual filesystem.
If all dependencies live in a central store, and nested dependencies are represented only by symbolic links or virtual filesystem paths, then modeling the structure of the dependency tree becomes a much less important issue for us. But we still need more metadata than yarn.lock can provide. So it makes more sense to update and streamline the existing package-lock.json format than to switch entirely to yarn.lock.
This is not an article about the dangers of yarn.lock
I want to emphasize that, as far as I know, Yarn reliably produces correct project dependency trees. And for a given version of Yarn (as of this writing, this applies to all recent versions), these trees are fully deterministic, just as they are with npm.
A yarn.lock file is enough to produce deterministic dependency trees with a given version of Yarn. But we cannot rely on mechanisms that depend on a package manager's implementation, given how many tools use such mechanisms. This is all the more true because the yarn.lock file format is not formally documented anywhere. (This is not a problem unique to Yarn; npm has been in the same situation. Documenting file formats is a serious job.)
The best way to ensure reliable, strictly deterministic dependency trees in the long run is to lock in the results of dependency resolution, rather than trusting that future implementations of the package manager will resolve dependencies the same way past implementations did. Relying on that would limit our ability to build optimized dependency trees.
Deviations from the initially locked dependency tree structure should happen only as a result of an explicit user decision. Such deviations should document themselves by updating the previously recorded data about the structure of the dependency tree.
Only package-lock.json, or a mechanism like it, can give npm these capabilities.