We downloaded 10 million Jupyter notebooks from GitHub: here is what we found.

Hello, Habr! 





This is the JetBrains Datalore team. We'd like to share the results of our analysis of several million Jupyter notebooks from public GitHub repositories. We downloaded the notebooks to get a bit more numerical insight into the current state of arguably the most popular data science tool.









Inspired by research from the Design Lab team at UC San Diego, we downloaded Jupyter notebooks twice: in October 2019 and in October 2020.





Two years ago, 1.23 million Jupyter notebooks were publicly available on GitHub. By October 2020, that number had grown roughly eightfold, and we were able to download 9.72 million notebooks. We have made this dataset public; download instructions are at the end of the post.





A few words about Datalore: Datalore is an online environment for Jupyter notebooks developed by JetBrains.





If you have questions or feedback, reach out to us on Twitter at @JBDatalore or by email at contact@datalore.jetbrains.com.










Programming languages in data science

Despite the popularity of R and Julia, Python is still the most popular language for Jupyter notebooks.





Other languages we found include Bash, MatLab and Scilab, as well as Scala, C++ and Java.





Some notebooks do not specify a language in their metadata; they are labeled “nan”.
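
A notebook's language is recorded in its JSON metadata. The sketch below shows one way a language label (and the Python major version) could be read from an nbformat 4 file; it is an illustration under assumptions, not the rule the authors used. The helper names `notebook_language` and `python_major_version` are hypothetical; the metadata fields `language_info.name`, `language_info.version` and `kernelspec.language` are standard nbformat fields, and returning "nan" for a missing language is an assumption chosen to match the label above.

```python
from typing import Optional


def notebook_language(nb: dict) -> str:
    """Return a lowercase language name from nbformat metadata, or "nan" if absent.

    Hypothetical helper: prefers language_info (written by the kernel) and
    falls back to kernelspec.language.
    """
    meta = nb.get("metadata") or {}
    lang = (meta.get("language_info") or {}).get("name") or (
        meta.get("kernelspec") or {}
    ).get("language")
    return lang.lower() if lang else "nan"


def python_major_version(nb: dict) -> Optional[str]:
    """Return "2" or "3" for Python notebooks, or None if it cannot be determined."""
    if notebook_language(nb) != "python":
        return None
    meta = nb.get("metadata") or {}
    version = (meta.get("language_info") or {}).get("version") or ""
    if version[:1] in ("2", "3"):
        return version[:1]
    # Fallback (assumption): kernel names such as "python2" / "python3".
    name = (meta.get("kernelspec") or {}).get("name", "")
    return name[-1] if name in ("python2", "python3") else None


if __name__ == "__main__":
    nb = {"metadata": {"kernelspec": {"name": "python3", "language": "python"}}}
    print(notebook_language(nb), python_major_version(nb))  # python 3
```

In practice each `.ipynb` file would be loaded with `json.load` before being passed to these functions.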









Shares of Python 2 and Python 3 notebooks in 2018, 2019 and 2020:









| Year | Python 2 | Python 3 | Other languages |
|---|---|---|---|
| 2018 | 52.5% | 43.8% | 3.7% |
| 2019 (JetBrains Datalore) | 18.1% (1029 K) | 72.6% (4128 K) | 9.3% (529 K) |
| 2020 (JetBrains Datalore) | 11.8% (1154 K, +125 K vs. 2019) | 79.3% (7710 K, +3582 K vs. 2019) | 10.8% (1050 K, +521 K vs. 2019) |





Compared with 2019, the number of Python 3 notebooks grew by 87%, while the number of Python 2 notebooks grew by only 12%.
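
The growth figures for Python 3 and Python 2 follow directly from the 2019 and 2020 notebook counts in the table. The short calculation below reproduces them; the dictionary names are purely illustrative, and counts are in thousands of notebooks as in the table.

```python
# Notebook counts in thousands, copied from the 2019 and 2020 rows of the table.
COUNTS_2019 = {"Python 2": 1029, "Python 3": 4128, "Other languages": 529}
COUNTS_2020 = {"Python 2": 1154, "Python 3": 7710, "Other languages": 1050}


def growth_percent(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


for lang, old in COUNTS_2019.items():
    new = COUNTS_2020[lang]
    print(f"{lang}: +{new - old} K ({growth_percent(old, new):.0f}%)")
# Python 2: +125 K (12%)
# Python 3: +3582 K (87%)
# Other languages: +521 K (98%)
```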





, Python R, :





Data science libraries

Datalore , Python-. Jupyter-.





Numpy is used in 60% of notebooks, and Pandas and Matplotlib in 47%.
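
Shares like these can be estimated by parsing each notebook's code cells and recording which packages are imported. The sketch below is one possible approach, assuming nbformat 4 JSON and counting top-level package names from `import` and `from ... import` statements; the article does not say how the authors detected libraries, and `imported_modules` and `library_shares` are hypothetical helpers.

```python
import ast
from collections import Counter
from typing import Dict, Iterable, Set


def imported_modules(nb: dict) -> Set[str]:
    """Top-level package names imported anywhere in a notebook's code cells."""
    modules: Set[str] = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [ln for ln in source.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # e.g. Python 2 syntax when parsed with Python 3
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


def library_shares(notebooks: Iterable[dict]) -> Dict[str, float]:
    """Fraction of notebooks that import each top-level package."""
    counts: Counter = Counter()
    total = 0
    for nb in notebooks:
        total += 1
        counts.update(imported_modules(nb))
    return {name: n / total for name, n in counts.items()} if total else {}
```

Cells that fail to parse are skipped rather than guessed at, so Python 2 notebooks would be undercounted by this sketch; a real analysis might parse them with a Python 2 grammar instead.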





:





:





PyTorch vs. TensorFlow

, PyTorch TensorFlow.





PyTorch is growing noticeably faster than TensorFlow.





, Keras TensorFlow , Fast.ai PyTorch . , TensorFlow, , , , .









| Year | TensorFlow | Keras | PyTorch | Fastai |
|---|---|---|---|---|
| 2019 (JetBrains Datalore) | 321 K | 231 K | 110 K | 19 K |
| 2020 (JetBrains Datalore) | 430 K (+34%) | 367 K (+59%) | 253 K (+130%) | 25 K (+32%) |









( , Python 3.6 ):





  • 71.90% Markdown.
  • 42.13% output.
  • 12.34% LaTeX.
  • 19.77% HTML.
  • 20.63% Markdown.





Markdown Jupyter-. 50% 4 Markdown 14 . 
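
The figures above appear to describe how many Markdown and code cells a typical notebook contains (much of the original sentence is lost, so this reading is an assumption). Per-notebook cell counts and their median could be computed as in this sketch, which assumes nbformat 4 JSON; `cell_counts` and `median_cell_count` are hypothetical helpers.

```python
from collections import Counter
from statistics import median
from typing import Iterable


def cell_counts(nb: dict) -> Counter:
    """Count cells by type ("code", "markdown", "raw") in one notebook."""
    return Counter(cell.get("cell_type") for cell in nb.get("cells", []))


def median_cell_count(notebooks: Iterable[dict], cell_type: str) -> float:
    """Median number of cells of the given type per notebook (missing types count as 0)."""
    return median(cell_counts(nb)[cell_type] for nb in notebooks)
```

Using the median rather than the mean keeps a handful of very long notebooks from skewing the "typical notebook" figure.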





Markdown- :





. , 25 000 , 95% 465 :
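
Assuming the 25,000 and 95%/465 figures above refer to lines of code per notebook and a 95th percentile (the surrounding words are lost), the sketch below shows how such numbers could be obtained. It counts non-blank source lines in code cells; both that definition and the hypothetical helpers `code_lines` and `percentile_95` are assumptions, not the authors' method.

```python
from statistics import quantiles
from typing import List


def code_lines(nb: dict) -> int:
    """Count non-blank source lines across all code cells of a notebook."""
    total = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        if isinstance(src, list):  # nbformat allows a list of lines
            src = "".join(src)
        total += sum(1 for line in src.splitlines() if line.strip())
    return total


def percentile_95(values: List[int]) -> float:
    """95th percentile of per-notebook values, with inclusive interpolation."""
    return quantiles(values, n=100, method="inclusive")[94]
```

The maximum (`max` over all `code_lines` values) captures the single longest notebook, while the 95th percentile shows where almost all notebooks fall.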





. , 42% . 10% 8 .





Jupyter-

Jupyter- — . , . Jupyter- , 36% Jupyter- , . . .
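
The 36% figure above appears to concern notebooks whose cells were not executed top to bottom; much of the surrounding text is lost, so this reading is an assumption. One simple way to flag such notebooks is to compare the `execution_count` values of executed code cells in document order, as in this sketch. The helpers are hypothetical, and the criterion (any count not greater than the one before it) is a deliberate simplification.

```python
from typing import List


def execution_counts(nb: dict) -> List[int]:
    """Execution counts of executed code cells, in document order."""
    return [
        cell["execution_count"]
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code" and cell.get("execution_count") is not None
    ]


def is_nonlinear(nb: dict) -> bool:
    """True if some executed cell has a count not greater than the executed cell above it."""
    counts = execution_counts(nb)
    return any(later <= earlier for earlier, later in zip(counts, counts[1:]))
```

Gaps such as 1, 5, 9 are treated as linear here, since they can result from re-running cells in order; a stricter check could require the counts to be exactly 1, 2, 3, and so on.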





, Markdown- . , , , , , .









Jupyter- , data science. 





, . , , Datalore-.





2018





Datalore





How to download the dataset:





  1. :





    1. The full dataset (10 million notebooks, 4.4 TB): https://github-notebooks-update1.s3-eu-west-1.amazonaws.com/





    2. If you'd rather not use the AWS S3 API, the list of notebook files is available as JSON: https://github-notebooks-samples.s3-eu-west-1.amazonaws.com/ntbslist.json





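    3. A file name from the JSON list, appended to the bucket URL, gives a direct link to that notebook, for example: https://github-notebooks-update1.s3-eu-west-1.amazonaws.com/0000036466ae1fe8f89eada0a7e55faa1773e7ed.ipynb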





  2. (3 ). Datalore-.
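
The bucket URL and the example notebook link in the instructions above suggest that a notebook's download URL is simply the bucket URL followed by its file name. The sketch below builds such URLs and parses one page of a standard S3 `ListBucketResult` XML listing, assuming the bucket permits public listing as its URL suggests. `keys_from_listing`, `notebook_url` and `fetch_notebook` are hypothetical helpers, and nothing is assumed about the internal layout of ntbslist.json.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

# Bucket URL taken from the download instructions above.
BUCKET_URL = "https://github-notebooks-update1.s3-eu-west-1.amazonaws.com/"
# XML namespace used by S3 bucket listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def keys_from_listing(xml_text: str) -> List[str]:
    """Extract object keys from one page of an S3 ListBucketResult XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


def notebook_url(key: str) -> str:
    """Build a download URL the same way as the example link: bucket URL + file name."""
    return BUCKET_URL + key


def fetch_notebook(key: str) -> dict:
    """Download one notebook and parse it as JSON (requires network access)."""
    with urllib.request.urlopen(notebook_url(key)) as resp:
        return json.load(resp)
```

For example, `fetch_notebook("0000036466ae1fe8f89eada0a7e55faa1773e7ed.ipynb")` would download the sample notebook linked above. A listing page holds at most 1,000 keys, so walking the whole bucket this way means paging through many requests.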











