Star Wars or the detailed dplyr guide

Today, May 4, is Star Wars Day, and for the occasion we have prepared a detailed guide to the main functions of the dplyr library. Why on Star Wars Day? Because we will work through everything using the starwars dataset as our example.

Let's start!





, Data Science , , . .





By the way, why May 4? It comes from a pun: «May the Force be with you» sounds like «May, the 4th», which is why Star Wars Day is celebrated on May 4.





, dplyr



. library







starwars



. , .





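  1. name - the character's name
  2. height - height (cm)
  3. mass - weight (kg)
  4. hair_color - hair color
  5. skin_color - skin color
  6. eye_color - eye color
  7. birth_year - year of birth (BBY, before the Battle of Yavin)
  8. sex - biological sex (male, female, hermaphroditic, or none, as for droids)
  9. gender - gender role or identity (masculine, feminine)
  10. homeworld - name of the home planet
  11. species - name of the species
  12. films - list of films the character appeared in
  13. vehicles - list of vehicles the character has piloted
  14. starships - list of starships the character has piloted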
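For a first look at the table described above, a minimal sketch along these lines can be used. It assumes only that dplyr is installed; `glimpse()` is a dplyr helper that prints one line per column.

```r
library(dplyr)

# starwars ships with dplyr, so nothing needs to be downloaded
dim(starwars)      # number of rows and columns
glimpse(starwars)  # one line per column: name, type, first values
```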





, , dplyr



. - .





dplyr

dplyr is one of the core packages of the tidyverse. Its closest counterpart in Python is Pandas



. dplyr



: , , . 





dplyr



SQL



. Netpeak : dplyr



SQL



. .





, dplyr



, , tidyverse



tidy data



. , « » - : 

















, starwars



. , , ?





tidy data



, .





, dplyr



. dplyr



































- !





- ? - select



.





, ? , 20 , . . - :





select



. , . :





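  • contains: the column name contains the given string
  • ends_with: the column name ends with the given string
  • matches: the column name matches a regular expression
  • num_range: columns with a common prefix and a numeric range, for example «V1, V2, V3...»
  • one_of: the column name is one of the listed names
  • starts_with: the column name starts with the given string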
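The helpers listed above can be tried on starwars roughly as follows. This is an illustrative sketch; the particular column choices are assumptions made for the example, not taken from the original article.

```r
library(dplyr)

# Select columns by listing their names
basic <- starwars %>% select(name, height, mass)

# Select with a helper: name plus every column ending in "color"
colors <- starwars %>% select(name, ends_with("color"))

# A minus sign inverts a helper: drop every column whose name contains "_"
no_underscore <- starwars %>% select(-contains("_"))

names(colors)
names(no_underscore)
```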





. , «»



.





, .





- : , «t»



, 1 .





, tibble



. , , ? , dplyr



pull



. , dplyr



, .
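The difference between getting a tibble back and getting a plain vector back can be sketched like this; the choice of the height column is only an example.

```r
library(dplyr)

as_table  <- starwars %>% select(height)  # still a tibble, with one column
as_vector <- starwars %>% pull(height)    # a plain vector of values

class(as_table)
class(as_vector)
```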





, . , .





- WHERE



SQL



. dplyr



filter



( , ?).





filter



- , True



. :





&



|



:





>/<



, :





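  • >=: greater than or equal to
  • <=: less than or equal to
  • is.na: the value is missing
  • !is.na: the value is not missing
  • %in%: the value is one of the listed values
  • !: negation of a condition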







:
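A few ways of combining these operators inside filter, as a hedged sketch; the thresholds and values are arbitrary examples.

```r
library(dplyr)

# Comma-separated conditions are combined with AND
heavy_humans <- starwars %>%
  filter(species == "Human", mass >= 80)

# | is OR, & is AND
droid_or_tall <- starwars %>%
  filter(species == "Droid" | height > 200)

# %in% tests membership, !is.na() keeps only rows with a known value
light_eyes <- starwars %>%
  filter(eye_color %in% c("blue", "yellow"), !is.na(birth_year))
```

Rows for which a condition evaluates to NA are dropped, which is why heavy_humans contains no characters with unknown mass.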





filter



, . , distinct



.





, sample_n



, n



.





slice



, , :





sample_frac



. , . , 0.5



, . 
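The row-picking functions mentioned above can be compared side by side in a short sketch. The seed and sizes are arbitrary; in recent dplyr versions sample_n and sample_frac are superseded by slice_sample, but they still work.

```r
library(dplyr)

eye_colors <- starwars %>% distinct(eye_color)  # unique values of one column

set.seed(4)                                     # make the random draws repeatable
five_random <- starwars %>% sample_n(5)         # 5 random rows
first_three <- starwars %>% slice(1:3)          # rows picked by position
half        <- starwars %>% sample_frac(0.5)    # a random half of the rows
```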





, , .





SQL



ORDER BY



. dplyr



arrange



.





- , .





, desc



.
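Sorting in both directions can be sketched as follows; the columns used are just examples.

```r
library(dplyr)

shortest_first <- starwars %>% arrange(height)        # ascending by default
tallest_first  <- starwars %>% arrange(desc(height))  # descending

# Several columns: sort by species, then by mass within each species
by_species_mass <- starwars %>% arrange(species, desc(mass))
```

In both directions rows with a missing height end up at the bottom.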





? , :)





, arrange



, select



. across



. :
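One way to combine arrange with across and a select helper is sketched below. The choice of ends_with("color") is an assumption for illustration; newer dplyr versions also offer pick() for the same purpose.

```r
library(dplyr)

# Sort by every column whose name ends with "color", left to right
sorted_helpers <- starwars %>% arrange(across(ends_with("color")))

# The same ordering written out by hand
sorted_manual <- starwars %>% arrange(hair_color, skin_color, eye_color)
```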





, , - .





- , - , , , . SQL



GROUP BY



sum, min, max



. dplyr



… . , .





eye_color



:





15 , . - - . summarise



.
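Grouping by eye_color and then aggregating with summarise might look like the sketch below; the aggregates chosen here (row count and mean mass) are examples, not necessarily the ones used in the original.

```r
library(dplyr)

eye_stats <- starwars %>%
  group_by(eye_color) %>%                  # one group per eye color
  summarise(
    n = n(),                               # number of rows in the group
    mean_mass = mean(mass, na.rm = TRUE)   # average mass, ignoring NA
  )

eye_stats
```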





, , , , .





, drop_na



tidyr



, , . , // NA
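As a sketch of how drop_na from tidyr fits into such a pipeline (the grouping column is an assumption for the example): rows with a missing mass or height are removed first, so the aggregates no longer need na.rm.

```r
library(dplyr)
library(tidyr)

complete_stats <- starwars %>%
  drop_na(mass, height) %>%   # keep only rows where both values are known
  group_by(sex) %>%
  summarise(mean_mass = mean(mass), mean_height = mean(height))
```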







4 ( , ) . , summarise







, , : 





  • n_distinct - number of unique values
  • last - last value
  • nth - n-th value
  • quantile - quantile
  • IQR - interquartile range (inter-quartile range)
  • mad - median absolute deviation
  • sd - standard deviation
  • var - variance
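The aggregation functions from the list above can be combined in a single summarise call. This is only a sketch; the grouping by gender and the output column names are assumptions.

```r
library(dplyr)

height_stats <- starwars %>%
  filter(!is.na(height)) %>%
  group_by(gender) %>%
  summarise(
    species_n  = n_distinct(species),                  # unique species
    last_name  = last(name),                           # last value in row order
    second     = nth(name, 2),                         # second value in row order
    height_q90 = quantile(height, 0.9, names = FALSE), # 90th percentile
    height_iqr = IQR(height),
    height_mad = mad(height),
    height_sd  = sd(height),
    height_var = var(height)
  )
```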





, . …





- , , ? , mass



, mass



height







- across



.





, mass



height



. , («_»), .names



.
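Applying several functions to mass and height at once with across, and controlling the resulting names through .names, can be sketched like this. The pair of functions (mean and median) and the grouping column are assumptions for illustration.

```r
library(dplyr)

size_stats <- starwars %>%
  group_by(sex) %>%
  summarise(across(
    c(mass, height),
    list(
      mean   = ~ mean(.x, na.rm = TRUE),
      median = ~ median(.x, na.rm = TRUE)
    ),
    .names = "{.col}_{.fn}"   # column name, underscore, function name
  ))

names(size_stats)
```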





, .





- . , A



, B



, A/B



. - . mutate.





- .
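A computed column built from two existing ones can be sketched with mutate as follows. The body-mass-index formula is a hypothetical example chosen for illustration; it is not taken from the original.

```r
library(dplyr)

with_bmi <- starwars %>%
  select(name, height, mass) %>%
  mutate(
    height_m = height / 100,    # centimetres to metres
    bmi = mass / height_m^2     # later expressions can use earlier ones
  )
```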





? - across



. , 10 . - .





, , 10.





- «_new»



. stringr



tidyverse



.





, _new



. , .
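Multiplying several columns by 10 and writing the results to new columns with a «_new» suffix might look like the sketch below; the choice of height and mass is an assumption.

```r
library(dplyr)

scaled <- starwars %>%
  select(name, height, mass) %>%
  mutate(across(c(height, mass), ~ .x * 10, .names = "{.col}_new"))

names(scaled)
```

Without .names, across would overwrite height and mass in place instead of adding new columns.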





, mutate



SQL



. , mass



dense_rank



:





, rnk



.
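Ranking characters by mass with dense_rank and storing the result in a column named rnk can be sketched as follows. Ranking in descending order is an assumption; the original may have ranked the other way.

```r
library(dplyr)

ranked <- starwars %>%
  select(name, mass) %>%
  mutate(rnk = dense_rank(desc(mass))) %>%  # 1 = heaviest; unknown mass stays NA
  arrange(rnk)

head(ranked)
```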





, , . : 





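  • lag - value from the previous row
  • lead - value from the next row
  • cumsum - cumulative sum
  • dense_rank - rank without gaps
  • ntile - split the rows into n buckets
  • row_number - row number
  • case_when - multi-branch condition, like CASE WHEN in SQL
  • coalesce - first non-missing value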
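Several of the functions listed above are shown together in the following sketch. The thresholds in case_when and the replacement value in coalesce are arbitrary assumptions.

```r
library(dplyr)

windowed <- starwars %>%
  select(name, height, mass) %>%
  mutate(
    row_id       = row_number(),        # 1, 2, 3, ...
    prev_height  = lag(height),         # value from the previous row
    next_height  = lead(height),        # value from the next row
    mass_filled  = coalesce(mass, 0),   # replace NA with 0
    running_mass = cumsum(mass_filled), # running total
    quartile     = ntile(height, 4),    # four roughly equal buckets
    size = case_when(
      is.na(height) ~ NA_character_,
      height >= 200 ~ "tall",
      height >= 100 ~ "medium",
      TRUE          ~ "short"
    )
  )
```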







, , 100% SQL



. .





, - .





, - . dplyr



SQL



. :





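  • left_join - all rows from the left table plus matching rows from the right
  • right_join - all rows from the right table plus matching rows from the left
  • inner_join - only rows that have a match in both tables
  • full_join - all rows from both tables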







SQL



. - starwars



, - . .





, rename name. , by



( ON



SQL



) .





inner_join



, 35 , .. df



35. by



, .





full_join



, 87, .. starwars



87.
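The row counts of the two joins can be reproduced with a small sketch. The table df below is a hypothetical stand-in built for this example (a subset of starwars with the key column renamed to new_name); it is not necessarily the table used in the original.

```r
library(dplyr)

# Hypothetical second table: a subset of starwars with a renamed key column
df <- starwars %>%
  filter(species == "Human") %>%
  select(new_name = name, mass)

# by maps the key of the left table to the key of the right one (like ON in SQL)
inner <- inner_join(starwars, df, by = c("name" = "new_name"))
full  <- full_join(starwars, df, by = c("name" = "new_name"))

nrow(inner)  # as many rows as df: every key in df has a match
nrow(full)   # as many rows as starwars: df adds no new keys

# mass exists in both tables, so the result has mass.x and mass.y
names(inner)
```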





, . .. mass



, , .x



.y



. ?





1 :





, by



. , . - - new_name



.





2 :





, , , , . , :
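Two ways of avoiding the .x and .y suffixes can be sketched as follows. As before, df is a hypothetical table built for this example: either the shared column is added to by, or it is dropped from one of the tables before joining.

```r
library(dplyr)

# Hypothetical second table with a renamed key and a column shared with starwars
df <- starwars %>%
  filter(species == "Human") %>%
  select(new_name = name, mass)

# Option 1: join on the shared column too, so it is not duplicated
opt1 <- inner_join(starwars, df, by = c("name" = "new_name", "mass" = "mass"))

# Option 2: keep only the needed columns in one table before joining
opt2 <- inner_join(starwars, select(df, -mass), by = c("name" = "new_name"))
```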





, dplyr



:





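  • bind_rows - stacks tables by rows, «one under another»
  • bind_cols - attaches tables by columns, «side by side»
  • intersect - rows present in both tables
  • setdiff - the difference, i.e. rows that are in the first table but not in the second
  • union - rows from both tables (without duplicates)
  • union_all - the same as union, but duplicates are kept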
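The behaviour of these functions is easiest to see on two small overlapping tables. The tables a and b below are hypothetical slices of starwars created only for this sketch.

```r
library(dplyr)

a <- starwars %>% slice(1:5) %>% select(name, height)  # rows 1 to 5
b <- starwars %>% slice(4:8) %>% select(name, height)  # rows 4 to 8

stacked  <- bind_rows(a, b)   # all rows of a, then all rows of b
common   <- intersect(a, b)   # rows present in both (rows 4 and 5)
only_a   <- setdiff(a, b)     # rows in a but not in b (rows 1 to 3)
combined <- union(a, b)       # rows from both, duplicates removed
with_dup <- union_all(a, b)   # rows from both, duplicates kept

# bind_cols needs tables with the same number of rows
mass_col <- starwars %>% slice(1:5) %>% select(mass)
wide     <- bind_cols(a, mass_col)
```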





dplyr



. - , . - :)










, . May the fource be with you!





P.S. dplyr



. :)







