EPAM has been working with data for a long time: the first large customers with Big Data projects appeared back in 2001. At the time, the well-known analyst firms Gartner and Forrester, as well as major vendors Oracle, Microsoft and IBM, noted that companies should move towards Big Data, since these technologies are indispensable wherever large amounts of data have to be processed. Since then, EPAM's team of experts has grown steadily, taking on increasingly complex projects and offering proven solutions and quality products for working with big data. Today, more than 500 people work in the Data practice in EPAM Russia alone. I spoke with Ilya Gerasimov, head of EPAM Data Practice in Russia, about how it all began, what projects came along, what failures happened, what Data specialists should prepare for, and what kinds of Data specialists there are.
Career
Tell us how you came to work in Data.
I joined EPAM in 2006 as a junior .NET and MS SQL Server developer. Before that, I had worked at a product company as a team leader, developing software for automating hotels and restaurants, but at EPAM I started my career from scratch. By 2013 I had grown into a team lead and was looking for new opportunities to develop within EPAM. That was when I met the head of the Big Data competence center at the SEC in Minsk, and we agreed that this area should be developed in Russia.
At first there were only two or three of us. Colleagues from other countries helped us: they gave us courses and involved us in various activities in this area. I had to study a lot and then share the knowledge I had gained.
Why have you been working for the company for so long?
Data , - . , , . - - , .
Data?
- Data, Data. :)
Data-?
Data-: Data Science, Machine Learning, Business Intelligence, Enterprise Search, DevOps in Data, Data Quality, Business Data Analysis. 500 - .
Data- Data governance, .. , , , . , , , ..
, , . , - , , , Data Science .
2013-2014 , - , , , , Data Science.
, Scala , DevOps, , . , , , .
. , . Java, Python, DevOps- .
" ", , . , 2012 - , . , , , . , . , , , , -, EPAM.
- Data Analytics, , Data Engineering, Data Science , - EPAM.
, - , - , - , . , . , . . Cadence, , , , , , .
, Reinforcement Learning. . 2- , . , . , , Reinforcement Learning. , , , .
" ", Data-. . , , "" . , " " - . , , , . , , Theano, TensorFlow, Theano - .
Apache projects and similar tools: Spark, Cassandra, Elasticsearch.
Hadoop and related components: Yarn, HDFS, MapReduce, Hive, Kafka, ZooKeeper.
Cloud platforms (Amazon, Microsoft Azure, GCP) and Hadoop.
Security: Kerberos, Knox, Ranger.
Databases, including NoSQL and NewSQL: Cassandra, Snowflake, Amazon Redshift, HBase, MongoDB, Teradata.
DevOps: Kubernetes, Docker, Jenkins.
Visualization: Power BI, Tableau, QlikView.
Data Science: TensorFlow, Google BERT, PyTorch, Keras.
Streaming: Spark Streaming, Kafka Streams, Apache Flink, Apache Storm.
SQL; DWH design approaches, including Data Vault; DWH, Data Mart, and Data Lake.
Cloud platforms: AWS, Azure, GCP.
ETL: the difference between ETL and ELT, slowly changing dimensions, ETL languages and frameworks (PL/SQL, T-SQL, PL/pgSQL, Python, Spark), orchestration tools (e.g., Airflow), and ETL platforms (Talend, Informatica PowerCenter, Pentaho, etc.).
Data Analytics and Visualization: BI tools (Power BI, Tableau, TIBCO Spotfire, MicroStrategy, Pentaho, etc.), as well as Storytelling.
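Slowly changing dimensions come up in the ETL skill set above. As a rough illustration, here is a minimal Python sketch of SCD Type 2, the variant that preserves history by closing the current row and appending a new version whenever a tracked attribute changes. The customer dimension, its `city` attribute, the `DimRow` type and the `apply_scd2` helper are all hypothetical. The sketch also assumes that `valid_to` is exclusive, so a new version starts on the same date the old one ends. In practice this logic usually lives in a SQL `MERGE` or an ETL tool rather than in in-memory lists.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a row in a hypothetical customer dimension."""
    customer_id: int                 # business (natural) key
    city: str                        # attribute whose history we track
    valid_from: date                 # first day this version is valid
    valid_to: Optional[date] = None  # None marks the current version (exclusive end otherwise)

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(dim: List[DimRow], snapshot: Dict[int, str], load_date: date) -> List[DimRow]:
    """Merge a source snapshot {customer_id: city} into the dimension (SCD Type 2).

    Changed keys: the current version is closed at load_date and a new version
    is appended. New keys get a fresh current row. Unchanged keys, keys missing
    from the snapshot (deletes are not handled here), and already-closed
    history rows are kept as they are. The input list is not modified.
    """
    history = [row for row in dim if not row.is_current]
    current = {row.customer_id: row for row in dim if row.is_current}

    result = list(history)
    for key, row in current.items():
        new_city = snapshot.get(key)
        if new_city is not None and new_city != row.city:
            result.append(replace(row, valid_to=load_date))  # close the old version
            result.append(DimRow(key, new_city, load_date))  # open the new version
        else:
            result.append(row)  # unchanged or absent from the snapshot
    for key, city in snapshot.items():
        if key not in current:
            result.append(DimRow(key, city, load_date))  # brand-new business key
    return result


if __name__ == "__main__":
    dim = [DimRow(1, "Moscow", date(2020, 1, 1))]
    dim = apply_scd2(dim, {1: "Saint Petersburg", 2: "Kazan"}, date(2021, 6, 1))
    # Three rows: the closed Moscow version, current Saint Petersburg, current Kazan
    for row in dim:
        print(row)
```

Keeping rows immutable and returning a new list makes each load easy to reason about: re-running the same snapshot produces the same dimension, which is the idempotence one usually wants from a batch ETL step.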
Apache - Spark, NiFi, Elasticsearch . . , , - - Open Source .
, Open Source , , Open Data Analytics Hub (ODAHU) , ML .
- , Data - , . (blueprint) . , , . , , , , .
blueprint - , , , , Data Scientists, , ..
, , , e-commerce, , , Life Science - , -. , , blueprints , .
, , , . , , .
2020 ?
, , XXI . 2020 , (late majority), , .
, , Data, Java, Scala Python.
- EPAM , Data Engineering, Data Science, BI, Python , .
, Data EPAM?
. , Data - Java, Scala Python (, ), SQL, , , , DevOps- , Machine Learning .