R vs Python in production

Elegant tricks in a notebook on a laptop are nice and interesting. But as soon as the code has to run in production, a whole set of constraints appears:

  • available hardware;
  • performance requirements;
  • stability;
  • compliance with information security requirements;
  • … (add to taste).

In Russia today, data science is going through a phase in which Python is positioned as a "silver bullet" for DS tasks. The thesis seems to have been put forward by people selling Python DS courses, and then the flywheel started spinning. On the whole this is quite normal: almost all processes in the physical world are oscillatory.

Nevertheless, some things tend to go unmentioned amid the hype. Python has a number of annoying quirks, even in basic DS tasks, that seriously complicate its use in production.

Problem 1

The name of this problem is BlockManager, one of the pillars of the pandas architecture. Outwardly, it shows up in two ways:

  • it devours memory voraciously;
  • the execution time of the code depends on the previous state of the interpreter and on the order of operations, and can vary by several orders of magnitude.
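
A minimal sketch of one well-known way the block structure makes cost depend on the order of operations is shown here. Adding columns to a frame one at a time appends a new internal block for each column, leaving the frame fragmented. Collecting the columns first and building the frame in one call avoids this. The sketch assumes pandas (1.3 or later, where a fragmentation `PerformanceWarning` exists) and NumPy are installed. The helper names `build_incrementally`, `build_at_once` and `timed` are hypothetical, and the sizes are arbitrary. This is an illustration, not the author's benchmark, and timings will vary with version and hardware.

```python
import time
import warnings

import numpy as np
import pandas as pd


def build_incrementally(n_rows: int, n_cols: int) -> pd.DataFrame:
    """Hypothetical helper: add columns one at a time.

    Each new column is appended as its own internal block, so the frame
    becomes fragmented until pandas consolidates it.
    """
    df = pd.DataFrame(index=range(n_rows))
    for i in range(n_cols):
        df[f"c{i}"] = np.arange(n_rows, dtype="float64") + i
    return df


def build_at_once(n_rows: int, n_cols: int) -> pd.DataFrame:
    """Hypothetical helper: collect all columns first, construct the frame once."""
    cols = {f"c{i}": np.arange(n_rows, dtype="float64") + i for i in range(n_cols)}
    return pd.DataFrame(cols, index=range(n_rows))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n_rows, n_cols = 10_000, 200
    with warnings.catch_warnings(record=True) as caught:
        # Recent pandas versions warn once a frame holds many unconsolidated blocks.
        warnings.simplefilter("always", pd.errors.PerformanceWarning)
        frag, t_frag = timed(build_incrementally, n_rows, n_cols)
    n_perf = sum(issubclass(w.category, pd.errors.PerformanceWarning) for w in caught)

    whole, t_whole = timed(build_at_once, n_rows, n_cols)
    # A deep copy returns a consolidated frame; the data itself is unchanged.
    consolidated = frag.copy()

    print(f"incremental: {t_frag:.3f}s ({n_perf} PerformanceWarning(s))")
    print(f"at once:     {t_whole:.3f}s")
    print("same data:", frag.equals(whole) and consolidated.equals(whole))
```

Both frames hold identical data; only the internal layout and the cost of getting there differ. The pandas fragmentation warning itself suggests `frame.copy()` to de-fragment, and building columns up front is the usual way to avoid the issue entirely.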

Problem 2

pandas + sql/spark (…) data.table + ClickHouse (… data.frame). Database-like ops benchmark.

Problem 3

Story-telling. Literate Programming. … python … Rmarkdown.

Clearly, our trends are shaped by courses and by job requirements on hh.ru. But when it comes to solving practical problems in an enterprise, the R + ClickHouse combination turns out to be far more cost-effective. You can also add golang to this stack; it is a great tool too.

Fin. Get your napalm out.

(Image: a still from a children's cartoon)

Previous publication: "R, Monte Carlo and Enterprise Problems, Part 2".
