Data Scientist notes: where to start, and is it needed?



TL;DR: this is a Q&A post about Data Science and how to enter and grow in the profession. In this article I go over the basic principles and frequently asked questions, and I am happy to answer your specific questions: write in the comments (or in a private message), and I will try to answer everything within a few days.
Since the "Data Satanist notes" series started, I have received many messages and comments asking how to start and where to dig. Today we will go over the main skills and the questions that came up after those posts.



Nothing here claims to be the ultimate truth; it is the author's subjective opinion. We will go over the things that seem most important along the way.



Why exactly do you need this



To make the goal more achievable, make it concrete. If you want to become a DS or Research Scientist at Facebook / Apple / Amazon / Netflix / Google, look at the requirements, languages and skills needed for that specific position. What is the hiring process? What does a typical day in such a role look like? What does the average profile of a person who works there look like?



Often a person does not really understand what exactly they want, and it is unclear how to prepare for such a vague image. So it is worth having at least a rough plan of what exactly you want.

Make your current view of the goal concrete
Even if it changes along the way (and changing plans mid-course is perfectly normal), you should have a goal in front of you and focus on it, periodically evaluating and rethinking it.



Will it still be relevant?



That is, by the time you grow into the position.



Imagine that before getting the position you need to earn a PhD, work in industry for 2-3 years and, for good measure, take monastic vows and meditate in a monastery. Will Data Science end up in the same situation that economists and lawyers once did? Will the area you want to work in change beyond recognition?



Is there a good chance that everyone will rush there now, and we will end up with a large crowd of people trying to enter the profession while entry-level positions are scarce?



When choosing a path, it may be worth considering not only the current state of the labor market, but also your own idea of how it is changing and where it is heading.



For example, the author did not plan to become a data scientist, but during his PhD he worked on side projects that overlapped heavily with DS in terms of skills, and after finishing graduate school he naturally moved into the field when he saw a good position.



If it turns out along the way that you need to move somewhere else, because that is where the momentum is and where the most interesting things are happening, then that is naturally where we will go.



Breakdown of skills



These are rough categories of skills that I consider key to working fully and effectively in DS. I will single out English separately: learn it whatever you do in CS. The key categories follow.



Programming / Scripting



Which languages should you definitely get to know? Python? Java? Shell scripting? Lua? SQL? C++?



What exactly you need to be able to do in terms of programming, and why, varies widely from position to position.



For example, I often have to implement complex logic, queries, models and analytics, and generally develop interpreted systems, but there are almost never any requirements for code speed beyond the most general and reasonable ones.



So my skill set is very different from that of people who write the TensorFlow library and think about optimizing code for efficient use of the L1 cache and the like. Look at what exactly you need and choose a learning path accordingly.



For example, for Python people have already put together a language learning map.



There is surely experienced advice and good sources for your needs too: decide on a list and start working through it.



Understanding business processes



You cannot do without it: you need to understand why you are needed in the process, what you are doing and why. Often this is what saves you tons of time, maximizes your impact and keeps you from wasting time and resources on bullshit.



I usually ask the following questions:



  • What exactly am I doing in the company?
  • Why?
  • Who will use it and how?
  • What options do I have?
  • What are the constraints on the parameters?


A little more on the parameters: you can often change the way you work dramatically if you know that something can be sacrificed. Interpretability, for example. Or, conversely, a couple of percent of model quality may not matter, and then we have a very fast solution, which is exactly what the client needs because they pay for the time the pipeline runs on AWS.
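
To make the trade-off concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the candidate names, the accuracy and runtime figures, the hourly price, and the `cheapest_acceptable` helper itself. It only shows the idea of "knowing what can be sacrificed": once the client says how much accuracy they can give up, the choice may flip to a much cheaper solution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical solution label
    accuracy: float       # validation accuracy in 0..1 (made-up number)
    runtime_hours: float  # pipeline run time per job (made-up number)


def cheapest_acceptable(candidates, max_accuracy_loss, price_per_hour):
    """Return the cheapest candidate whose accuracy is within
    max_accuracy_loss of the best candidate's accuracy."""
    best = max(c.accuracy for c in candidates)
    acceptable = [c for c in candidates if best - c.accuracy <= max_accuracy_loss]
    return min(acceptable, key=lambda c: c.runtime_hours * price_per_hour)


candidates = [
    Candidate("big_ensemble", accuracy=0.93, runtime_hours=6.0),
    Candidate("small_linear", accuracy=0.91, runtime_hours=0.5),
]

# No accuracy may be sacrificed: only the best model qualifies.
strict = cheapest_acceptable(candidates, max_accuracy_loss=0.0, price_per_hour=3.0)
# Three points of accuracy may be sacrificed: the fast model wins on cost.
relaxed = cheapest_acceptable(candidates, max_accuracy_loss=0.03, price_per_hour=3.0)
print(strict.name, relaxed.name)
```

The point is not the code but the question behind `max_accuracy_loss`: you can only set it after talking to whoever pays for the pipeline.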



Maths



Here, I think, you understand everything yourself: without knowledge of basic mathematics you are nothing more than a monkey with a grenade (forgive me, Random Forest), so you need to understand at least the basics. If I had to make the most minimal list, it would include:



  • Linear algebra - a huge number of resources are easy to google, look for what suits you best;
  • Calculus - at least the first two semesters' worth;
  • Probability theory - it is everywhere in machine learning;
  • Combinatorics - it effectively complements probability theory;
  • Graph theory - at least the basics;
  • Algorithms - at least the first two semesters' worth (see Cormen's recommendations in his book);
  • Mathematical logic - at least the basics.


Practical data analysis and visualization



One of the most important things is not being afraid to get your hands dirty with data: being able to carry out a comprehensive analysis of a dataset or project and to throw together a quick visualization.



Exploratory data analysis should become second nature, as should all other data transformations and the ability to throw together a simple pipeline from Unix tools (see previous articles) or to write a readable, understandable notebook.
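
As an illustration of what such a first look can be, here is a small Python sketch that uses only the standard library. The inline dataset, the column names and the `quick_look` helper are invented for the example; in real work you would more likely reach for your usual tools, but the questions stay the same: how many rows, how much is missing, which categories dominate, and what the basic statistics look like.

```python
import csv
import io
import statistics
from collections import Counter

# Tiny inline dataset so the sketch is self-contained (made-up values).
RAW = """city,price
Moscow,120
Berlin,95
Moscow,130
Paris,
Berlin,105
"""


def quick_look(text, category_col, numeric_col):
    """First-pass summary: row count, missing values, top categories, basic stats."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Empty strings are treated as missing values and skipped.
    values = [float(r[numeric_col]) for r in rows if r[numeric_col]]
    return {
        "rows": len(rows),
        "missing": len(rows) - len(values),
        "top": Counter(r[category_col] for r in rows).most_common(2),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


summary = quick_look(RAW, "city", "price")
print(summary)
```

The `Counter(...).most_common(...)` call plays roughly the role that `sort | uniq -c | sort -rn` plays in a shell pipeline.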



Visualization deserves a separate mention: it is better to see once than to hear a hundred times.



Showing a manager a chart is a hundred times easier and clearer than showing a set of numbers, so matplotlib, seaborn and ggplot2 are your friends.



Soft skills



It is just as important to be able to communicate your ideas, results and concerns to others. Make sure you can clearly state the task in both technical and business terms.



You can explain to colleagues, managers, bosses, clients and anyone else who needs it what is going on, what data you work with and what results you got.



Your charts and documentation should be readable without you. That is, nobody should have to come to you to understand what is written there.



You can make a clear presentation to get the message across and / or document your project / work.



You can convey your position in a reasoned and emotionless way, say yes / no, or question / support the decision.



Learning



There are many different places where you can learn all this. I will give a short list - I tried everything from it and, to be honest, each item has its pros and cons. Try and decide what works for you, but I highly recommend trying several options and not getting hung up on one.



  • Online courses: Coursera, Udacity, edX, etc.;
  • New schools: online and offline - SkillFactory, SHAD, MADE;
  • Classical schools: university master's programs and refresher courses;
  • Projects - you can simply pick problems that interest you and work on them, uploading the results to GitHub;
  • Internships - it is hard to recommend anything specific here; you have to look at what is available and find suitable options.


Is it necessary?



In conclusion, I will add three personal principles that I try to follow myself.



  • It should be interesting;
  • It should bring inner pleasure (or at least not cause suffering);
  • “Being yours”.


Why these? It is hard to imagine doing something day after day that you do not like or are not interested in. Imagine you are a doctor and you hate talking to people. Of course, this can work somehow, but you will be constantly uncomfortable with the stream of patients who want to ask you something. It does not work in the long run.



Why did I mention inner pleasure specifically? It seems to me that it is necessary for further development and, really, for the learning process itself. I really enjoy it when I manage to finish a complex feature, build a model or calculate an important parameter. I enjoy it when my code is aesthetically pleasing and well written. So learning something new is interesting and does not require any significant extra motivation.



“Being yours” is the very feeling that this is what you wanted to do. I have a little story. Since childhood I was fond of rock music (and metal - SALMON!) and, like many people, wanted to learn to play, and that was it. It turned out that I had neither an ear for music nor a voice. This did not bother me at all (and, I must say, it does not bother many performers right on stage), and as a schoolboy I got a guitar... and it became clear that I did not really like sitting and playing it for hours. It was hard going, and it always seemed to me that some kind of garbage was coming out. I got no pleasure from it at all and only felt lousy, stupid and completely incapable. I literally had to force myself to sit down and practice, and in general it was all wasted on me.



At the same time, I could quite happily sit for hours developing some toy or scripting an animation in Flash (or something else), and I was wildly motivated to finish elements of a game, figure out movement mechanics or connect third-party libraries, plugins and everything else.



And at some point I realized that playing the guitar is not my thing, and that what I really like is listening, not playing. My eyes lit up when I wrote games and code (listening to all kinds of metal while doing it). That is what I liked back then, and that is what I should have been doing.



Do you still have questions?



Of course, we could not cover every topic and question, so write in the comments or in a private message. I am always glad to get questions.









