Why are tar.xz files created with Python's tarfile 15 times smaller than those created with macOS tar?

Translator's note: this is not an ordinary translation. It is based not on a standalone article but on a recent Stack Exchange question that became the site's top hit this month. The author asked a question whose answer turned out to be a real revelation for some visitors.





While compressing directories of about 1.3 GB, each containing 1440 JSON files, I found a 15x difference in size. Archives created with tar on macOS or Raspbian 10 (Buster) were 15 times larger than archives created with Python's built-in tarfile library.





Minimal working example

This script compares both methods:





#!/usr/bin/env python3

from pathlib import Path
from subprocess import call
import tarfile

fullpath = Path("/Users/user/Desktop/temp/tar/2021-03-11")
zsh_out = Path(fullpath.parent, "zsh-archive.tar.xz")
py_out = Path(fullpath.parent, "py-archive.tar.xz")

# tar using terminal
# tar cJf zsh-archive.tar.xz folderpath
call(["tar", "cJf", zsh_out, fullpath])

# tar using tarfile library
with tarfile.open(py_out, "w:xz") as tar:
    tar.add(fullpath, arcname=fullpath.stem)

# Print filesizes
print(f"zsh tar filesize: {round(zsh_out.stat().st_size/(1024*1024), 2)} MB")
print(f"py tar filesize: {round(py_out.stat().st_size/(1024*1024), 2)} MB")



The result is this:





zsh tar filesize: 23.7 MB
py tar filesize: 1.49 MB



The following versions were used:





  • tar on macOS: bsdtar 3.3.2 - libarchive 3.3.2 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.6;
  • tar on Raspbian 10: xz (XZ Utils) 5.2.4 liblzma 5.2.4;
  • tarfile in Python: 0.9.0.





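After extracting both archives, I compared them:

diff -r py-archive-expanded zsh-archive-expanded

No differences were found: the extracted contents are identical.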
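The same check can be done without leaving Python. A rough equivalent of an empty `diff -r` could look like the sketch below. `trees_identical` is a hypothetical helper, not part of the original question. It compares relative file paths and file contents only, and ignores empty directories, permissions and timestamps.

```python
import filecmp
from pathlib import Path


def trees_identical(left, right):
    """True if both trees contain the same relative file paths and every file
    is byte-for-byte identical, i.e. roughly what an empty `diff -r` means."""
    left, right = Path(left), Path(right)
    left_files = {p.relative_to(left) for p in left.rglob("*") if p.is_file()}
    right_files = {p.relative_to(right) for p in right.rglob("*") if p.is_file()}
    if left_files != right_files:
        return False
    # shallow=False compares contents instead of trusting size and mtime alone.
    return all(filecmp.cmp(left / rel, right / rel, shallow=False) for rel in left_files)
```

For example, `trees_identical("py-archive-expanded", "zsh-archive-expanded")` should return True for the two extracted directories above.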





Comparing the compressed archives directly, as binary files, gives:





➜ diff zsh-archive.tar.xz py-archive.tar.xz
Binary files zsh-archive.tar.xz and py-archive.tar.xz differ



Viewing the archives in Quick Look (with BetterZip) shows that the files are stored in a different order:





On the left is zsh-archive.tar.xz, on the right is py-archive.tar.xz.

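zsh tar appears to store the files in no particular order, while Python stores them alphabetically. The file contents themselves are the same.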
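The stored order can also be inspected without a GUI. Below is a minimal sketch, assuming the archives can be read by Python's tarfile. The helper names are hypothetical. `getnames()` returns the members in the order they appear in the archive, and that order is exactly what differs here.

```python
import tarfile


def member_names(archive_path):
    """Member names in the order they are stored in the archive."""
    # Mode "r:*" lets tarfile detect the compression (xz, gzip, bzip2 or none).
    with tarfile.open(archive_path, "r:*") as tar:
        return tar.getnames()


def stored_in_name_order(archive_path):
    """True if the members appear sorted by name. Only meaningful for a flat
    directory like the one in the question: a recursive walk that sorts each
    directory level is not always identical to sorting the full path strings."""
    names = member_names(archive_path)
    return names == sorted(names)
```

For example, `member_names("zsh-archive.tar.xz")[:5]` lists the first few entries in the order tar wrote them.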





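Why is this happening? How can I get tar to produce an archive as small as the Python one? And where does a 15x difference come from?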





Answer: it comes down to the file order. Python's tarfile sorts the files, while the BSD version of tar does not.





Solution:

To get the files sorted, use the GNU version of tar instead of the BSD one.





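GNU tar has a --sort option: --sort=ORDER, where ORDER is none, name or inode.

--sort=none is the default: no sorting is performed, so files go into the archive in whatever order the file system returns them.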
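These three orders can be imitated in Python to see what each one means for a given directory. This is an illustration under assumptions, not GNU tar's implementation. `directory_entries` is a hypothetical helper. "none" uses the raw `os.scandir` order. "name" sorts by Python string comparison, which may not match tar's exact ordering for non-ASCII names.

```python
import os


def directory_entries(directory, order="none"):
    """Names of the entries in `directory`, in one of three orders modelled
    on the values of GNU tar's --sort option (an illustration only)."""
    with os.scandir(directory) as it:
        entries = list(it)  # "none": the order the file system returns
    if order == "name":
        entries.sort(key=lambda entry: entry.name)
    elif order == "inode":
        entries.sort(key=lambda entry: entry.inode())
    elif order != "none":
        raise ValueError(f"unknown order: {order!r}")
    return [entry.name for entry in entries]
```

Comparing `directory_entries(path)` with `directory_entries(path, "name")` on the JSON directory shows how far the unsorted order is from name order.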





Installing GNU tar

To install GNU tar on a Mac with Homebrew:





brew install gnu-tar



Then call 'gtar' instead of 'tar' and add the --sort option:





gtar --sort='name' -cJf zsh-archive-sorted.tar.xz /Users/user/Desktop/temp/tar/2021-03-11



The resulting zsh-archive-sorted.tar.xz is 1.5 MB, about the same size as the archive created by Python.





Alternatively, you can concatenate the JSON files in name order (the names are unix timestamps) and archive the result with BSD tar:





cat *.json > all.txt
tar cJf zsh-cat-archive.tar.xz all.txt



The resulting zsh-cat-archive.tar.xz is also 1.5 MB.
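The same idea can be sketched with Python's lzma module, without tar. `compress_concatenated` is a hypothetical helper, and the sketch makes these assumptions:

- Files are sorted by Python string order. A shell glob sorts by the current locale, which gives the same result for equal-length digit names such as unix timestamps.
- Compression uses preset 6, the default level of the xz command-line tool.

```python
import lzma
from pathlib import Path


def compress_concatenated(directory, out_path, pattern="*.json"):
    """Concatenate the matching files in name order and xz-compress the
    result, much like `cat *.json > all.txt` followed by compressing all.txt
    (without the tar wrapper). Returns the compressed size in bytes."""
    files = sorted((p for p in Path(directory).glob(pattern) if p.is_file()),
                   key=lambda p: p.name)
    # preset=6 is also the default level of the xz command-line tool.
    with lzma.open(out_path, "wb", preset=6) as out:
        for path in files:
            out.write(path.read_bytes())
    return Path(out_path).stat().st_size
```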





Python's tarfile library

As for Python, TarFile.add sorts the entries it adds. From the tarfile documentation:





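Directories are added recursively by default. This can be avoided by setting recursive to False. Recursion adds entries in sorted order.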
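With `recursive=False`, `TarFile.add` does not walk into a directory, so the member order can be chosen explicitly. The sketch below assumes a flat directory of files, as in the question. `tar_in_order` is a hypothetical helper. With `sort_names=False` it keeps the `os.listdir` order, which only approximates what an unsorted tar would write.

```python
import os
import tarfile
from pathlib import Path


def tar_in_order(directory, out_path, sort_names=True):
    """Write a flat directory into an xz tarball with an explicit member order:
    name order if sort_names is True, otherwise the os.listdir order."""
    directory = Path(directory)
    names = os.listdir(directory)
    if sort_names:
        names.sort()
    with tarfile.open(out_path, "w:xz") as tar:
        # Add the directory entry itself without recursing into it...
        tar.add(directory, arcname=directory.name, recursive=False)
        # ...then add each file ourselves, in the order chosen above.
        for name in names:
            tar.add(directory / name, arcname=f"{directory.name}/{name}", recursive=False)
    return names
```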





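Why does the order matter so much? Here is the explanation:

The JSON files are similar to one another, especially files written close together in time, which also have neighbouring names. When the files are sorted by name, similar files end up next to each other. The compressor can then encode each new file largely as references to the previous ones.

xz (LZMA) can only reference data within its dictionary window, which is far smaller than 1.3 GB. When the files are stored in an arbitrary order, similar files are usually too far apart for the compressor to exploit their similarity. Hence the 15x difference.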
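The effect can be reproduced on synthetic data. This sketch is a toy built on assumptions:

- The "files" are hypothetical random blobs, grouped so that neighbours in sorted order share content.
- The LZMA2 dictionary is shrunk to 32 KiB so that a few hundred KB of input already exceeds it. xz's default preset 6 uses 8 MiB.

```python
import lzma
import random

# Toy setting (an assumption, not the question's data): a 32 KiB LZMA2
# dictionary instead of the 8 MiB that xz uses at its default preset 6,
# so that a few hundred KB of input is already larger than the window.
FILTERS = [{"id": lzma.FILTER_LZMA2, "dict_size": 32 * 1024}]


def make_files(n_groups=20, files_per_group=10, size=2048, seed=0):
    """Hypothetical stand-in for the JSON files: files in the same group
    (think "written close together in time") share a random payload;
    different groups share nothing."""
    rng = random.Random(seed)
    files = []
    for g in range(n_groups):
        payload = rng.getrandbits(8 * size).to_bytes(size, "little")
        for i in range(files_per_group):
            # A short per-file header makes every file slightly different.
            files.append(f"group={g} file={i}\n".encode() + payload)
    return files


def xz_size(chunks):
    """Size in bytes of the xz stream for the chunks concatenated in order."""
    return len(lzma.compress(b"".join(chunks), format=lzma.FORMAT_XZ, filters=FILTERS))


files = make_files()            # "sorted": similar files are adjacent
shuffled = files[:]             # the same files in an arbitrary order
random.Random(1).shuffle(shuffled)

print("sorted:  ", xz_size(files), "bytes")
print("shuffled:", xz_size(shuffled), "bytes")
```

In sorted order, every file after the first in its group falls inside the window of its predecessor and compresses to almost nothing. In shuffled order, many files have no similar file within the last 32 KiB and stay close to their raw size. Exact numbers depend on the liblzma version, so none are claimed here.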





P.S.

UPD: for more on how XZ/LZMA behaves here, see the comments. Thanks, @iliazeus!




