Approx. transl. : This is not an ordinary translation, because it is not based on a separate article, but a recent case with Stack Exchange, which became the main hit of the resource this month. Its author asks a question, the answer to which turned out to be a real revelation for some site visitors.
Compressing directories by ~ 1.3 GB, each with 1440 JSON files, I found a 15x difference between the size of archives compressed using tar
macOS or Raspbian 10 (Buster) and archives obtained using the tarfile library built in in Python.
Minimal working example
This script compares both methods:
#!/usr/bin/env python3
from pathlib import Path
from subprocess import call
import tarfile
fullpath = Path("/Users/user/Desktop/temp/tar/2021-03-11")
zsh_out = Path(fullpath.parent, "zsh-archive.tar.xz")
py_out = Path(fullpath.parent, "py-archive.tar.xz")
# tar using terminal
# tar cJf zsh-archive.tar.xz folderpath
call(["tar", "cJf", zsh_out, fullpath])
# tar using tarfile library
with tarfile.open(py_out, "w:xz") as tar:
tar.add(fullpath, arcname=fullpath.stem)
# Print filesizes
print(f"zsh tar filesize: {round(Path(zsh_out).stat().st_size/(1024*1024), 2)} MB")
print(f"py tar filesize: {round(Path(py_out).stat().st_size/(1024*1024), 2)} MB")
The result is this:
zsh tar filesize: 23.7 MB py tar filesize: 1.49 MB
The following versions were used:
tar
on MacOS:bsdtar 3.3.2 - libarchive 3.3.2 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.6
;
tar
Raspbian at 10:xz (XZ Utils) 5.2.4 liblzma 5.2.4
;
tarfile
Python:0.9.0
.
:
diff -r py-archive-expanded zsh-archive-expanded
.
Β« Β» ( ) :
β diff zsh-archive.tar.xz py-archive.tar.xz Binary files zsh-archive.tar.xz and py-archive.tar.xz differ
Quicklook ( Betterzip) , -:
zsh
, Python β . , .
? ? , Python- ? 15- - Python-?
: , tarlib
Python ; BSD- tar
.
:
, , BSD- GNU- tar
.
GNU tar
--sort
:
ORDER
,none
,name
inode
.
--sort=none
β , .
GNU tar
GNU tar
Mac:
brew install gnu-tar
'tar' , --sort
:
gtar --sort='name' -cJf zsh-archive-sorted.tar.xz /Users/user/Desktop/temp/tar/2021-03-11
zsh-archive-sorted.tar.xz
1,5 β , , Python-.
, , JSON-, ( β unixtime), BSD tar
:
cat *.json > all.txt tar cJf zsh-cat-archive.tar.xz all.txt
zsh-cat-archive.tar.xz
1,5 .
Python- tarfile
, TarFile.add Python , tarfile
Python :
. , recursive False. .
, , , :
JSON- . , .
, . , .
P.S.
UPD: β XZ/LZMA β , @iliazeus!
:
Β«Git happens! 6 Git Β»;
-
-