Sometimes I like to explore datasets that interest me. If I build a successful model on a Kaggle dataset for which countless models already exist, the result has little practical value, though it does at least teach me something new. But data scientists strive to create something new and unique, something that can bring real benefit to the world.
How do you generate new ideas? To answer this question, I combined my own experience with findings from creativity research. The result is 5 questions whose answers help you find new ideas. I also give examples of ideas I found using this technique. Working through the questions takes you along the path of generating ideas and helps you use your creative potential to the fullest, so that you end up with unique ideas you can implement in your Data Science projects.
1. Why do I want to start working on a new project?
When you think about starting a new project, you have an intention or goal in mind. First, work out why you want to create another data science project. Even a rough outline of your goal will help you focus the search for an idea. So think about what the project is for. Here are some options:
- This is a portfolio project that you are going to showcase to potential employers.
- This is a draft for an article about concepts, models, or exploratory data analysis.
- This is a project that lets you practice something, for example natural language processing, data visualization, data preprocessing, or a specific machine learning algorithm.
- This is a very special project that is not described in this list.
2. What are my areas of interest and experience?
There are three main reasons to think about this question.
- First, remember the Venn diagrams used to describe the skills that data science requires. Domain knowledge is an important asset for every data scientist. You can solve a problem by processing data only if you understand the domain the data comes from. Otherwise, you will apply algorithms and produce visualizations and predictions that look wrong to any practitioner in that field. And if what you are doing doesn't make sense, then why bother doing it at all?
- -, , , , . , . , . , , .
- -, , , . , , - .
Let me give you an example. The areas that interest me and in which I have experience include the environmental and socio-economic sustainability of systems, finance, popular culture, and natural language processing. Focusing on these topics helps me leverage what I already know. With this knowledge, I can judge whether something that inspires me can be turned into a new idea that I can actually implement.
3. How do you find inspiration?
The main source of inspiration is reading. As you search for an idea, you can find interesting topics by reading various materials:
- , , . , , . , WIRED , , Google . , . , Google.
- . , . , GPT-2 , , , , , , . - ?
- . , , Data Science, , . , NLP- «», , , . - ? , ? GPT-2.
Inspiration can also be found in everyday life, as long as you stay open to new ideas. Whenever a question catches your interest, think about whether you can answer it with data. For example, I recently stumbled upon a trailer for The Boys and found a lot of positive reviews of the show on IMDb. "Is there any evidence that the number of violent scenes in TV shows is increasing over time?" I asked myself. "Is there an ever-growing audience that enjoys violent TV shows?" I continued. If something interests you, take a moment and study the relevant data.
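One quick way to probe a question like this is to fit a straight line to yearly values and look at the sign of the slope. The sketch below is only an illustration in Python: `linear_trend` is a hypothetical helper name, and the numbers in `violent_scenes_per_year` are invented placeholders, not real measurements.

```python
def linear_trend(points):
    """Return the least-squares (slope, intercept) for a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of squared deviations of x, and of x-y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder data, made up for illustration only:
# (year, average number of violent scenes per episode in some sample of shows).
violent_scenes_per_year = [(2015, 3.0), (2016, 3.5), (2017, 4.0), (2018, 4.5)]

slope, intercept = linear_trend(violent_scenes_per_year)
print(f"change per year: {slope:+.2f}")
```

A positive slope would only hint at an upward trend. With real data you would also want to check how the sample of shows was chosen and how a "violent scene" was counted.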
How do you generate project ideas from the above sources of inspiration? Neuroscientists have identified three different psychological processes associated with generating ideas:
- You can combine existing ideas to create new ones (combinatorial creativity). For example, various projects have analyzed rental listings posted on Airbnb, and other projects analyze the real estate market. Combining these ideas, you can ask whether Airbnb is driving up housing prices in a particular city.
- , ( ). , -, , , . , , - .
- - , ( ). β . . . : , , .
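The combination in the Airbnb example above can be sketched in a few lines: join two datasets on a shared key, then check whether the two quantities move together. This is a minimal illustration under stated assumptions. The helper names `join_on_year` and `pearson` are hypothetical, and both dictionaries hold invented placeholder values for a single imaginary city, not real Airbnb or housing data.

```python
import math


def join_on_year(listings_by_year, prices_by_year):
    """Keep only the years present in both datasets (an inner join on year)."""
    years = sorted(set(listings_by_year) & set(prices_by_year))
    return [(y, listings_by_year[y], prices_by_year[y]) for y in years]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, made up for illustration only.
airbnb_listings = {2016: 100, 2017: 150, 2018: 200, 2019: 250}
price_per_sqm = {2017: 2000, 2018: 2100, 2019: 2200, 2020: 2300}

joined = join_on_year(airbnb_listings, price_per_sqm)
r = pearson([row[1] for row in joined], [row[2] for row in joined])
print(f"{len(joined)} overlapping years, correlation = {r:.2f}")
```

A high correlation alone would not show that Airbnb causes price increases, since both series may simply grow over time. It is only a first check on whether the combined idea is worth pursuing.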
4. Where can I find the data I need?
Once you have decided on the general direction of your research, you need to look for data that will let you work out how to turn your idea into a Data Science project. This step is extremely important in determining whether the idea will succeed. To answer the question in this section's title, consider whether what you need already exists in available data repositories. You may have to collect the necessary data yourself, which makes the task harder. Here is an overview of data sources:
- Existing dataset repositories: Kaggle, Google Datasets, FiveThirtyEight, BuzzFeed, AWS, UCI Machine Learning Repository, data.world, Data.gov, and others you can find through Google.
- , -. Google Google Scholar. , - , . ? , Our World in Data , .
- Data you need to collect yourself. To collect such data, you can use web scraping, text analysis, various APIs, event tracking, and log files.
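As one illustration of collecting data yourself, the sketch below turns raw log lines into per-day event counts using only the Python standard library. The log format (`YYYY-MM-DD HH:MM:SS event key=value`) and the sample lines are assumptions made up for this example, and `count_events` is a hypothetical helper name. A real log would need its own pattern.

```python
import re
from collections import Counter

# Assumed line format: "2024-01-01 09:00:00 login user=1"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} (?P<event>\w+)"
)


def count_events(lines):
    """Count occurrences of each (date, event) pair, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # ignore lines that do not follow the assumed format
        counts[(match["date"], match["event"])] += 1
    return counts


# Invented sample lines, for illustration only.
sample_log = [
    "2024-01-01 09:00:00 login user=1",
    "2024-01-01 09:05:00 click user=1",
    "2024-01-01 09:06:00 click user=2",
    "not a log line",
    "2024-01-02 10:00:00 login user=2",
]

event_counts = count_events(sample_log)
for (date, event), n in sorted(event_counts.items()):
    print(date, event, n)
```

The same function accepts an open file object in place of the list, since it only iterates over lines.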
If you cannot find data that helps you implement your project idea, reformulate the idea. Try to derive from the original idea a version that you can implement with the data you have. At the same time, ask yourself why you cannot find the data you need. What is wrong with the area you are interested in? What can you do about it? The answers to these questions alone can give rise to a new Data Science project.
5. Is the idea feasible?
So you have a fantastic idea! But can you implement it? Go through the steps of the idea generation process again. Think about what you want to achieve (question 1), whether you are interested in the chosen area and have experience in it (question 2), and whether you have the data you need to implement the idea (question 4). Now you need to determine one more thing: whether you have the skills needed to implement the idea and reach your goal.
It is important to take into account how much time you plan to spend on the project. You are probably not going to write a doctoral dissertation on your chosen topic, so the project may cover only part of the idea. It may even consist only of learning something new that you will need in order to implement the idea later.
After working through the 5 questions above, you should have a question that you can and want to answer within the amount of time you are willing to spend on reaching your goal.
Summary
- . , , , . β , , , . β . , , .
- - . . , - , . , . , , , , .
- Don't be afraid to start over. Whatever you do, you always learn something new. Every time you write a line of code, you practice and expand your knowledge and skills. If you realize that implementing the idea will not bring you closer to your goal, or that the idea is not feasible, do not be afraid to drop it and move on. The time you spent looking for the idea is not wasted. You need to assess soberly what benefit implementing the idea can bring.
Using the method described here, I constantly find original ideas for my Data Science projects. I hope this technique is useful to you as well.
How do you look for new ideas for your Data Science projects?