How the Data Practice Was Built at EPAM

EPAM has been working with data for a long time: its first large customers with Big Data projects appeared back in 2001. At the time, the well-known analyst firms Gartner and Forrester, as well as the major vendors Oracle, Microsoft and IBM, were noting that companies should move towards Big Data, since these technologies are indispensable wherever large amounts of data are processed. Since then, EPAM's team of experts has grown steadily, taking on increasingly complex projects and offering proven solutions and quality products for working with big data. Today, more than 500 people work in the Data practice at EPAM in Russia alone. I spoke with Ilya Gerasimov, head of the EPAM Data Practice in Russia, about how it all began, which projects the team has encountered, what failures happened, what kinds of Data specialists there are and what they should prepare for.





Career 

Tell us how you came to work in Data.

I joined EPAM in 2006 as a junior .NET and MS SQL Server developer. Before that, I worked at a product company as a team leader, developing software for automating hotels and restaurants, but at EPAM I started my career from scratch. By 2013 I had grown into a team lead and was looking for new opportunities to develop within EPAM. It was then, at the SEC in Minsk, that I met the head of the Big Data competence center, and we agreed that this area should be developed in Russia.





At that time there were only two or three of us. Colleagues from other countries helped us, ran courses for us, and involved us in various activities related to this area. I had to study a lot and then pass on the knowledge I had gained.





Why have you been working for the company for so long?

Data , - . , , . - โ€” , .





Data?

   โ€”  Data,  Data.  :)





Data-?

Data-: Data Science, Machine Learning, Business Intelligence, Enterprise Search, DevOps in Data, Data Quality, Business Data Analysis. 500 - .





     .     ยซยป  ,  ยซยป  .  





 Data-  Data governance, ..  , , ,    .   , ,  ,  .. 










, , . ,    โ€”  , ,  , Data Science  .  





, , ,

2013-2014 , - , , , , Data Science.





, Scala , DevOps, , . , , , .





?

. , . Java, Python, DevOps- .





ยซ ยป, , . , 2012 โ€” , . , , , . , . , , , , -, EPAM.





- Data Analytics, , Data Engineering, Data Science , - EPAM.





, , . โ€” , .





? ?

, - , - , - , . , . , . . Cadence, , , , , , .





, Reinforcement Learning. . 2- , . , . , , Reinforcement Learning. , , , .





ยซ ยป, Data-. . , , ยซยป . , ยซ ยป โ€” . , , , . , , Theano, TensorFlow, Theano - .





  • Apache , , - Spark, Cassandra, Elasticsearch .





  • YARN, HDFS, MapReduce, Hive, Kafka, ZooKeeper - , . Hadoop , , , , .





  • - Amazon, Microsoft Azure, GCP - Hadoop, .





  • ,  Kerberos, Knox, Ranger.  





  • , NoSQL NewSQL - Cassandra, ( ), Snowflake, Amazon Redshift, HBase, MongoDB, Teradata





  • DevOps - Kubernetes, Docker, Jenkins.





  • : Power BI, Tableau, QlikView. 





  • Data Science , TensorFlow Google BERT ( " ", ), PyTorch, Keras.





  • Streaming. Streaming Data, - Spark Streaming, Kafka Streams, Apache Flink, Apache Storm.










SQL ( ), DWH ( โ€” , , Data Vault,  ..), ( ,   , , ), , DWH, Data Mart, Data Lake.  





, . , AWS, Azure, GCP. 





, ETL ( ) ETL ELT, , , slowly changing dimension. ETL (PL/SQL, T-SQL, pgSQL, Python, Spark), (, Airflow), , , (Talend, Informatica PowerCenter, Pentaho, etc.).





(Data Analytics and Visualization), 2- (Power BI, Tableau, TIBCO Spotfire, MicroStrategy, Pentaho,  ..)   (, Storytelling). 





- ?

Apache - Spark, NiFi, Elasticsearch . . , , - Open Source .





,     Open Source , , Open Data Analytics Hub (ODAHU) , ML .  





?

- ,    Data โ€”     ,   . (blueprint) . ,  ,   .  ,  ,        , , . 





  blueprint   -  , ,      , ,   Data Scientists, ,   .. 





?

, , , e-commerce, , , Life Science - , -. , , blueprints , .





,     , , .    , ,   .





2020 ?

, ,      XXI .  2020 , (late majority),   ,    . 





The diffusion of innovations according to Rogers.  (From Wikipedia)

, : , ?

, ,    , . , ,   ,    .  





,   ,   Data,  Java, Scala  Python. 





- EPAM , Data Engineering, Data Science, BI, Python , .





, Data EPAM?

. , Data - Java, Scala Python (, ), SQL, , , , DevOps- , Machine Learning .







