R vs Python in a production environment

Elegant tricks in a notebook on a personal laptop are nice and interesting. But as soon as code has to run in a production environment, a whole set of constraints appears at once, such as:

  • the amount of available hardware;
  • performance requirements;
  • stability;
  • compliance with information security requirements;
  • … (add spices to taste).

In Russia today we are in a phase where Python is positioned as a "silver bullet" for data science tasks. It seems this thesis was put forward by those who sell DS courses in Python, and then the flywheel started spinning. In general, this is quite normal: almost all processes in the physical world are oscillatory.

Nevertheless, some things get left unsaid in all this hype. Python has a number of annoying issues, even in basic DS tasks, that greatly complicate its use in a production environment.

Problem 1

The name of this problem is BlockManager. It is one of the pillars of the pandas architecture. Outwardly it manifests itself as follows:

  • memory is consumed voraciously;
  • code execution time depends on the interpreter's previous state and on the sequence of operations, and can vary by several orders of magnitude.
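Both symptoms are easy to check in your own environment before trusting anyone's claims, including these. Below is a minimal sketch, using only the Python standard library, that runs a callable several times and reports the spread of wall-clock times and the peak traced memory. The names `time_runs`, `peak_memory`, `profile` and `workload` are hypothetical helpers introduced for illustration, and `workload` is just a stand-in for the pandas operation you actually care about.

```python
import time
import tracemalloc
from statistics import median


def time_runs(func, repeats=5):
    """Call func `repeats` times and return the wall-clock time of each run."""
    timings = []
    for _ in range(repeats):
        started = time.perf_counter()
        func()
        timings.append(time.perf_counter() - started)
    return timings


def peak_memory(func):
    """Return the peak number of bytes traced by tracemalloc during one call."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def profile(func, repeats=5):
    """Summarise timing spread and peak memory for func."""
    # Timing is done without tracing, because tracemalloc slows the code down.
    timings = time_runs(func, repeats)
    fastest, slowest = min(timings), max(timings)
    return {
        "min_s": fastest,
        "median_s": median(timings),
        "max_s": slowest,
        # Ratio of slowest to fastest run; guarded against a zero timing.
        "spread": slowest / fastest if fastest > 0 else float("inf"),
        "peak_bytes": peak_memory(func),
    }


def workload():
    # Hypothetical stand-in: replace with the data frame operation under study.
    rows = [(i, i * 0.5) for i in range(100_000)]
    return sum(value for _, value in rows)


if __name__ == "__main__":
    print(profile(workload))
```

To see whether run time depends on what happened earlier in the session, call `profile` on the same operation in a fresh interpreter and again after a long chain of other operations, then compare the reports. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so treat `peak_bytes` as a lower bound.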




[…] + sql; […] data.table + Clickhouse ([…] data.frame). Database-like ops benchmark. […]


Story-telling. Literate Programming. […] python […] Rmarkdown.


Clearly, our trends are shaped by courses and by the requirements in job postings on hh.ru. But when it comes to solving practical problems in an enterprise, the R + Clickhouse combination turns out to be far more advantageous. You can also add golang to the mix; it is a great tool too.

Fin. Get your napalm out.

Frame from a children's cartoon.

Previous publication: "R, Monte Carlo and Enterprise Problems, Part 2".
