In this article, I want to talk about how someone who wants to learn data science (DS), artificial intelligence (AI) and machine learning (ML) on their own can go about it. The study methods suggested here can help you make good progress in learning new things. I am also going to share links to resources that I use myself and that I can recommend to others without a shadow of a doubt.
Learn some math disciplines
Mathematics, even if you don't like it, is very important in our field. I think it is safe to say that anyone reading this already has some math knowledge from school. That is a good base, but it is not nearly enough for someone who wants to grow in DS, AI and ML. You need to go a little deeper into mathematics than school does and learn some things from statistics, algebra and other mathematical disciplines. I would put together a list of useful math resources for DS, but this has already been done for me in this article, and done very well.
Learn to program
If you're just starting out with self-learning, don't jump straight into writing code for machine learning. Instead, learn the basic programming concepts that are not tied to any domain. Learn what programming is, familiarize yourself with the different types of code that exist, and understand how to write programs correctly. This matters because, as you master programming, you will pick up many fundamental ideas that will serve you well throughout your DS career.
Do not rush, and do not try to learn something difficult right away. How well you understand the basics will affect your entire professional career. Here you can find very good video tutorials that introduce programming and computer science and cover the most important things you need to understand. Give this topic enough time and try to truly understand everything you study.
Pick one programming language and learn it well
Many programming languages are used in the DS, AI and ML fields. The most common are Python, R, Java, Julia and SQL. Other languages can be used too, but the ones I have listed are used more often than others for several reasons:
- They are easy to learn. If you set aside enough time and study persistently and consistently, you can make noticeable progress fairly quickly.
- , , .
- , .
- , , , DS, AI ML.
- — .
There is nothing wrong with learning multiple programming languages; in fact, it is useful to know more than one. But when learning to program, you shouldn't rush. Try to study only one language at a time, otherwise you can get very confused. Learn languages one after another, paying special attention to the features that will be useful in your work. I would suggest choosing Python as your first language. It is a fairly simple language that even a beginner can pick up easily. I would also recommend learning general programming in Python first, and only then moving on to Python's specialized data analysis tools.
Learn to collect data
More often than not, no one will hand you data prepared specially for you, and sometimes you may have no data at all. Either way, you need to find a way to collect the data you will work with. The organization you work for may have a good data collection system. If so, that is a big plus for you. If not, you will have to find a way to collect data yourself. And not just any data: you need high-quality information that you can work with productively to reach your goals. Data collection is not the same thing as "data mining", which is the in-depth analysis of data. Data collection is the step that comes before analysis.
Open data that is free to use can be found in many places on the Internet. Sometimes the data you need can be collected from websites using web scraping techniques. Web scraping is a very important skill for a data scientist, so I strongly encourage everyone who is going to work in DS, AI and ML to master it. Here's a good guide to web scraping.
Data can also be stored in databases, so basic knowledge of database administration and of how to interact with databases will be very useful to you. Knowledge of SQL is especially important here. Learn SQL here.
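To give a feel for what web scraping involves, here is a minimal sketch that pulls table rows out of an HTML page using only Python's standard library. The page content, the names in it and the helper names (TableRowParser, scrape_rows) are made up for illustration; a real scraper would first download the page and would often use a dedicated parsing library instead.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows of cell text
        self._row = None        # row currently being read, or None
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Rows without <td> cells (such as the header row) are skipped.
            if self._row:
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None


# Hypothetical page content with made-up values. A real scraper would
# download it first, for example with urllib.request.urlopen(url).
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Units</th></tr>
  <tr><td>Alpha</td><td> 120 </td></tr>
  <tr><td>Beta</td><td>85</td></tr>
</table>
"""


def scrape_rows(html):
    """Return the data rows of an HTML table as lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_rows(SAMPLE_HTML)
print(rows)
```

Before scraping a real site, check its terms of use and whether it offers the same data through an API or a download.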
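As a small taste of SQL, here is a sketch that uses Python's built-in sqlite3 module with a throwaway in-memory database. The orders table, its columns and the values in it are invented for the example; the point is only to show creating a table, inserting rows and running an aggregate query.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table of customer orders.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ann", 20.0), ("bob", 35.5), ("ann", 14.5)],
)
conn.commit()

# Total spent per customer, largest total first.
totals = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()

for customer, total in totals:
    print(customer, total)
```

The same SELECT, GROUP BY and ORDER BY ideas carry over to larger database systems, although each has its own dialect details.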
Learn to process data
What I'm going to talk about here is often referred to as data wrangling. This process includes cleaning the data you have: you explore it and remove everything unnecessary. It also includes structuring the data, bringing it into a form you can work with. This stage of working with data is the most difficult and exhausting one. The data you come across while learning will usually already be prepared for analysis, but the data you meet in the real world can be completely raw. If you really want to become a data scientist, you need to find real data and work out ways to get it into decent shape.
Real data can be found almost everywhere, for example on Kaggle. This great platform has data from many companies around the world. Primary data processing is a very tedious activity, but if you do it regularly and persistently, you will gradually realize that it is also a very interesting one. Here are some good lectures on primary data processing.
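The cleaning and structuring steps described above can be sketched on a tiny, deliberately messy dataset. The CSV text, the column names and the clean_rows helper below are all invented for illustration, and the sketch sticks to the standard library; in practice a library such as pandas is often used for this kind of work.

```python
import csv
import io

# Made-up raw data with typical problems: stray spaces, missing values,
# a duplicate row, inconsistent capitalisation and a non-numeric age.
RAW_CSV = """name, age ,city
 Alice ,34,Lagos
bob,,Nairobi
 Alice ,34,Lagos
Carol,n/a,accra
Dan,41,
"""


def clean_rows(raw_text):
    """Return de-duplicated records with tidy names, ages and cities."""
    reader = csv.DictReader(io.StringIO(raw_text))
    cleaned, seen = [], set()
    for row in reader:
        # Strip whitespace from both the column names and the values.
        record = {key.strip(): (value or "").strip() for key, value in row.items()}
        # Anything that is not a whole number is treated as a missing age.
        age = int(record["age"]) if record["age"].isdigit() else None
        name = record["name"].title()
        city = record["city"].title() or None
        key = (name, age, city)
        if key in seen:      # drop exact duplicates
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age, "city": city})
    return cleaned


for record in clean_rows(RAW_CSV):
    print(record)
```

Here missing values are kept as None rather than guessed. Whether to drop, keep or fill them depends on the analysis that follows.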
Learn to visualize data
If you are a DS, AI or ML expert and know your field well, don't forget that what seems obvious to you may be completely incomprehensible to others. Don't expect them, for example, to be able to draw conclusions from columns of numbers. You need to learn to visualize data so that specialists from other fields can use the results of your work. “Data visualization” usually means presenting data in graphical form. Presented this way, the data becomes useful even to people with no special knowledge of DS, AI and ML.
There are many ways to visualize data. Since we are programmers, our main method of data visualization should be writing code. It is fast and does not require buying specialized tools. You can use the many free and open source libraries created for the programming languages we use. For Python, for example, these include Matplotlib, Seaborn and Bokeh. Here is a video tutorial on Matplotlib.
Another way to visualize data is to use proprietary tools such as Tableau. There are many of these tools, and they can give you pretty good results, but they are not free. Tableau is one of the most common, and I use it a lot. I would advise anyone involved in data analysis and visualization to learn Tableau. Here's a good guide to this tool.
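To show how little code a basic chart takes, here is a minimal Matplotlib sketch that draws a bar chart. The sales figures, the labels and the plot_monthly_sales helper are made up for the example, and the non-interactive "Agg" backend is chosen only so that the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no window is opened
import matplotlib.pyplot as plt


def plot_monthly_sales(months, sales, path=None):
    """Draw a labelled bar chart and optionally save it to a file."""
    fig, ax = plt.subplots(figsize=(6, 3.5))
    ax.bar(months, sales, color="steelblue")
    ax.set_title("Monthly sales (made-up data)")
    ax.set_xlabel("Month")
    ax.set_ylabel("Units sold")
    fig.tight_layout()
    if path is not None:
        fig.savefig(path, dpi=150)
    return fig, ax


# Invented numbers, purely for illustration.
fig, ax = plot_monthly_sales(["Jan", "Feb", "Mar", "Apr"], [120, 95, 140, 160])
```

Passing a file name such as "sales.png" as path writes the chart to disk, so it can be dropped into a report for readers who will never look at the raw numbers.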
Artificial intelligence and machine learning
AI and ML can be thought of as subfields of DS, since they are data-driven. They are technologies built on teaching machines to behave in ways similar to humans, using specially prepared data that is fed to the machines. Computer models can be taught much of what humans are capable of. To do this, they are trained and guided toward the desired result. In this sense, "machines" can be thought of as small children with absolutely no knowledge. These children are gradually taught to identify objects and to speak. They learn from their mistakes and, as they learn, get better at the tasks they are given. The same is true of machines.
AI and ML technologies are what bring machines to life, using a variety of mathematical algorithms. Humanity still does not know the limits of these constantly improving technologies. These days, AI and ML are widely used to solve cognitive problems such as object detection and recognition, face and speech recognition, natural language processing, spam detection and fraud detection. The list could go on for a very long time.
A more detailed account of AI and ML deserves a separate publication. In the meantime, I can recommend this video on the general questions of applying these technologies. And here are hours of video tutorials on machine learning. By working through these videos, you can acquire ML knowledge at a beginner or even intermediate level. You will learn about many of the existing machine learning algorithms, how they work, and how to use them. After that, you should know enough to start building your own simple ML models. You can read about how to do this here.
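As an illustration of what a very simple ML model can look like, here is a sketch of linear regression trained by gradient descent in plain Python. The data points, the learning rate, the number of epochs and the function names are my own choices for the example, not something taken from the tutorials mentioned above. The model starts out knowing nothing, measures its errors on the training data, and nudges its two parameters a little on every pass to make those errors smaller.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = slope * x + intercept by minimising mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # How far off is the current model on every training point?
        errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error for each parameter.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step a little in the direction that reduces the error.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to a new input."""
    return slope * x + intercept


# Made-up training data that follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))
print(round(predict(10.0, slope, intercept), 3))
```

Real projects would normally rely on a library rather than hand-written loops, but writing the loop once makes it clearer what "training" means.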
Explore ways to publish ML models online
There are tools that let you publish ML models on the Internet and so make them available to everyone. To publish models on the Internet, you need a good understanding of web development. The point is that "publishing a model" means creating a web page (or a group of pages) that makes it possible to work with the model in a browser. You also need to take into account that the project's front end, its interface, must exchange data with the back end, the server side of the project where the model itself lives. To build such projects, you must be able to create server-side APIs and use them on the client side of applications.
If you plan to publish models in the cloud or to use Docker, you will also need a good knowledge of cloud computing and DevOps.
In fact, there are many ways to deploy models on the Internet. I would suggest starting by learning how to do this using the Python-based Flask web framework. Here's a good tutorial on this.
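To make the idea of a server-side API concrete, here is a minimal Flask sketch. The /predict route, the JSON field x and the predict function are all hypothetical, and the "model" is just a fixed formula standing in for a trained one; a real project would load a saved model instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(x):
    """Stand-in for a trained model: a fixed line y = 2x + 1."""
    return 2.0 * x + 1.0


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    """Read {"x": <number>} from the request body and return a prediction."""
    payload = request.get_json(silent=True) or {}
    try:
        x = float(payload["x"])
    except (KeyError, TypeError, ValueError):
        # The client sent no JSON, no "x" field, or a non-numeric value.
        return jsonify(error='expected JSON like {"x": 3.0}'), 400
    return jsonify(prediction=predict(x))
```

Saved as, say, app.py (a file name picked for this example), the service can be started locally with "flask --app app run". A front-end page would then send a POST request with a JSON body to /predict and display the number that comes back.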
Find a mentor
Self-learning is great, but nothing beats learning from professionals. With this approach you absorb what is actually used in practice, and you learn by doing. Many things can only be learned through practice. Mentoring has many strengths, but keep in mind that not every mentor can have a significant impact on your career or life. This is why it is very important to find a good mentor.
For example, you can try to solve this problem using the Notitia AI platform. There, students are assigned personal mentors who contribute to their personal and professional development. Mentors take those who want to learn from beginner to expert level in DS, AI and ML. Notitia AI is also the most affordable platform of its kind.
Summary
Keep in mind that taking courses, reading articles and watching videos will not, on their own, make you a data scientist. You will need to be certified by a specialized institution. In addition, some job openings require specific educational qualifications. Invest time in self-study, get certified or earn your educational credentials, and you'll be ready for real work.
What do you think one needs to know and be able to do in order to become a valuable expert in artificial intelligence or machine learning?