How to Analyze the Photo Studio Market with Python (3/3). Analytics

Everyone who opens their own business wants to guess the perfect opening moment, find the perfect location and take precise, effective steps for the business to survive and grow. Finding ideal parameters is impossible, but statistical analysis tools help to assess the best opportunities.



Open sources contain a huge amount of useful information. Collecting, storing and analyzing it correctly will help you find the best business opportunities.



A group of young entrepreneurs was considering opening their own photo studio in Moscow. They needed to find out:



  • What is the general state of the photo studio market: growing, stable or shrinking?
  • What is the seasonality of the market?
  • How much can they earn?
  • Where is the best place to open halls?
  • How much should they invest in the project?
  • How strong is the competition in the market?


A simple parser, a database, and the analytics presented in this article helped them answer these and many other questions.







In the first article, we parsed the photo studio aggregator ugoloc.ru and downloaded general information about photo studios and halls, as well as hall booking data.



In the second article, we wrote the collected data to a database, read it back, and made the parser's behavior depend on what was already stored in the database.



In this article, we will conduct a simple analysis of the collected data.



You can find the finished project, with examples of database tables, intermediate tables, charts and additional comments, on my GitHub page.



Directions of analysis



  • determine the dynamics of opening photo studios;
  • calculate the profitability of photo studios depending on the month of opening;
  • determine the seasonality of the business;
  • calculate the average income per hall, as well as the optimal number of halls at the photo studio;
  • investigate the dependence of profitability on the location of the photo studio;
  • find out the number of halls of competing studios;
  • calculate the influence of other parameters on income, such as ceiling height, hall area, booking prices;
  • consider other possible directions of analysis.


Loading data from the database



To load the data, do the following:



import datetime
import sqlite3
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# establish a connection to the database
conn = sqlite3.connect('./photostudios_moscow1.sqlite')
cur = conn.cursor()




# load studio data (reader functions from the second article)
studios = db_to_studios(conn)
studios




# load hall data
halls = db_to_halls(conn)
halls




# load booking data
booking = db_to_booking(conn)
booking




# keep only studios with a known opening date and exclude dressing rooms from the halls
studios = studios[[x.year > 0 for x in studios['established_date']]]
halls = halls[halls['is_hall'] == 1]
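The loaders db_to_studios, db_to_halls and db_to_booking come from the second article. If you don't have them at hand, a rough stand-in can be written with pandas.read_sql_query. This is a minimal sketch: the helper name read_table, the table name `studios` and its columns are hypothetical and may not match the real schema.

```python
import sqlite3

import pandas as pd


def read_table(conn, table, date_columns=None):
    """Hypothetical stand-in for the db_to_* helpers: load a whole table into a DataFrame."""
    # The table name comes from our own code, not user input, so an f-string is acceptable here;
    # parse_dates converts the listed text columns to datetime64
    return pd.read_sql_query(f"SELECT * FROM {table}", conn, parse_dates=date_columns)


# In-memory database with a toy table, only for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studios (studio_id INTEGER, established_date TEXT)")
conn.executemany(
    "INSERT INTO studios VALUES (?, ?)",
    [(1, "2017-03-01"), (2, "2019-10-15")],
)
studios = read_table(conn, "studios", date_columns=["established_date"])
print(studios.dtypes)
```

Parsing dates at load time means later code can use `.year`, `.month` and date comparisons directly.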




Dynamics of opening photo studios by year



Let's build a histogram of photo studio openings by year. To do this, we calculate the number of periods (years) and use it as the number of bins.



# plot a histogram of openings by year, one bin per year
years = [x.year for x in studios['established_date']]
num_bins = max(years) - min(years) + 1
plt.hist(years, num_bins)
plt.show()






The histogram shows the number of new photo studios roughly doubling every year. This is unlikely to reflect real market growth; more likely, it reflects the growth of the aggregator itself.



This means we need to split studios into two categories: those that registered on the aggregator when they opened ("new") and those that joined long after opening ("old"). This is our next task.



Identifying new photo studios



Which photo studio counts as new? One that is just being promoted and is still gaining clients. A visual analysis of booking calendars from the opening date shows that a studio builds a steady stream of customers within a few months.



So, to tell a new photo studio from an old one (which did not join the aggregator right away), we compare its income for the first half-month after "opening" with the same period a year later. A new studio's income should grow significantly over the year, while an old one's should stay at about the same level.



First, let's combine all the tables and keep only the booked hours:
# merge all tables
data = (booking
        .merge(halls, on='hall_id', how='inner')
        .merge(studios, on='studio_id', how='inner')
       )
# .copy() avoids SettingWithCopyWarning on the assignments below
data = data[data['is_working_hour'] == 1].copy()
data['date'] = pd.to_datetime(data['date'])
data




Then we calculate each studio's income in its first half-month of operation:
# income for the first 15 days after the studio's opening date
first_month = (data[data['date'] <= [x + datetime.timedelta(days=15) for x in data['established_date']]]
               .loc[:, ['studio_id', 'price', 'duration']]
               .copy()
              )
first_month['income'] = first_month['price'] * first_month['duration']
first_month = first_month.groupby('studio_id').sum()
first_month




Then the income for the same half-month one year later:
# income for the 15 days starting one year after the opening date
month_after_year = (data[(data['date'] >= [x + datetime.timedelta(days=365) for x in data['established_date']])
                         & (data['date'] <= [x + datetime.timedelta(days=365 + 15) for x in data['established_date']])]
                    .loc[:, ['studio_id', 'price', 'duration']]
                    .copy()
                   )
month_after_year['income'] = month_after_year['price'] * month_after_year['duration']
month_after_year = month_after_year.groupby('studio_id').sum()
month_after_year




Next, we divide the income a year later by the income at opening:
# join both periods and the number of halls in each studio
month_diff = (month_after_year.merge(first_month, on='studio_id', how='inner')
              .merge(halls.groupby('studio_id').count(), on='studio_id', how='inner')
             )[['income_x', 'income_y', 'is_hall']]
# growth ratio: income a year later (income_x) over income at opening (income_y), root taken by the number of halls
month_diff['income_diff'] = (month_diff['income_x'] / month_diff['income_y']) ** (1 / month_diff['is_hall'])
month_diff.sort_values('income_diff')




This gives the growth ratio of income after one year. Across studios, it ranges from 0.75 to 2.1 without sharp jumps. This suggests that studios joined the aggregator at different points: right after opening, a week later, a month later, a year later, and so on.



To identify new photo studios, we use the median growth ratio, 1.18, as a threshold: if a studio's income grew by more than 18% over the year, we treat it as new. There were 22 such studios.
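The `is_new` flag used in the next section is not created in the code above. A minimal sketch of how it could be set, assuming `month_diff` and `studios` are both indexed by `studio_id`; the toy values below are hypothetical and only show the mechanics of the median threshold.

```python
import pandas as pd

# Toy stand-ins for month_diff and studios (hypothetical values)
month_diff = pd.DataFrame(
    {"income_diff": [0.75, 1.05, 1.18, 1.40, 2.10]},
    index=pd.Index([1, 2, 3, 4, 5], name="studio_id"),
)
studios = pd.DataFrame(
    {"name": ["A", "B", "C", "D", "E", "F"]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="studio_id"),
)

# Threshold: the median growth ratio across studios
threshold = month_diff["income_diff"].median()

# A studio is "new" if its income grew faster than the median;
# studios without a growth ratio (no data a year later) stay 0
new_ids = month_diff.index[month_diff["income_diff"] > threshold]
studios["is_new"] = studios.index.isin(new_ids).astype(int)
print(threshold, studios["is_new"].tolist())
```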



In what month is it better to open a photo studio?



We have identified the photo studios that registered on the aggregator shortly after opening. For these studios, we treat the opening date in our data as the actual opening date.



For the calculation, we take the new photo studios, compute income as the sum of booking prices over all booked hours in the first year, group by hall (keeping the month the studio opened), and average the annual income by opening month.



Calculation of the average income for the year depending on the month of opening
# first-year bookings of the halls of new studios
new = studios['is_new'].reset_index().merge(data, on='studio_id', how='inner')
new = new[new['is_new'] == 1]
new = new[new['date'] <= [x + datetime.timedelta(days=365) for x in new['established_date']]].copy()
new['est_year'] = [x.year for x in new['established_date']]
new['est_month'] = [x.month for x in new['established_date']]
new['income'] = new['price'] * new['is_booked']
# total first-year income per hall, then the average over halls for each opening month
mean_income = (new
 .groupby(['hall_id', 'est_year', 'est_month'])['income'].sum().reset_index()
 .groupby('est_month')['income'].mean()
)
plt.bar(mean_income.index, mean_income)
plt.show()








The chart shows a clear relationship:



  • the best months to open a photo studio are at the beginning of the year (January-April);
  • September-October are also good months to open;
  • the worst months are May-June.


It will be interesting to compare this data with the seasonality of the market.



Determination of business seasonality



Seasonality is the change in the number of orders depending on the time of year. Let's analyze annual seasonality.



For the calculation, let's take studios that opened before 2018 and look at their bookings for 2018-2020. A studio's income is the sum of prices of its booked hours. We then calculate the total income of all studios for each month of the selected period.



Seasonality calculation
# studios opened before 2018, bookings from 2018 onward
season = data[(data['open_date'] < '2018-01-01') & (data['date'] > '2018-01-01')].copy()
season['income'] = season['price'] * season['duration']
season['year'] = [x.year for x in season['date']]
season['month'] = [x.month for x in season['date']]
# total income of all studios per (year, month)
incomes = season.groupby(['year', 'month'])['income'].sum().sort_index()




Plotting
# drop the last three months, then plot total income by month
incomes = incomes.iloc[:-3]
plt.figure(figsize=(20, 10))
plt.plot([f'{year}-{month}' for year, month in incomes.index], incomes)
plt.xticks(rotation=60)
plt.grid()
plt.show()








The graph shows pronounced seasonality: most orders come from October to April, with a sharp drop from May to September. This fits the logic of the business. In summer, people take pictures outdoors, on the street and in parks. In winter, that is rarely possible, and photo sessions have to be held indoors. Hence the seasonality: few clients in summer, many in winter. Orders peak in December, probably because of the New Year and the holiday mood people want to capture in photos.
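One way to put a number on this seasonality is a monthly seasonal index: the average income for a calendar month divided by the average over all months, so 1.0 means a typical month. A minimal sketch, assuming `incomes` is the Series indexed by (year, month) built above; the toy values here are made up purely to show the mechanics.

```python
import pandas as pd

# Toy monthly incomes indexed by (year, month); hypothetical numbers
index = pd.MultiIndex.from_tuples(
    [(2018, 6), (2018, 12), (2019, 6), (2019, 12)], names=["year", "month"]
)
incomes = pd.Series([50.0, 150.0, 70.0, 130.0], index=index, name="income")

# Average income for each calendar month across years
monthly_mean = incomes.groupby(level="month").mean()

# Seasonal index: above 1 means an above-average month, below 1 a below-average one
seasonal_index = monthly_mean / monthly_mean.mean()
print(seasonal_index)
```

Averaging across years first keeps one unusually strong or weak year from dominating a given month.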



So the best months to open are tied to the season: it is better to open a studio during the season or a month before it starts. A studio should not open between May and August, because it would start in the off-season.



Hall profitability calculation



An important metric for a new business is the income per hall.



To calculate it, we group income by hall for each month, exclude 2020 as an anomalous year due to the quarantine, and summarize the sample with the .describe() function.



Calculation of the profitability of 1 hall
# monthly income of each hall
hall_income = season.groupby(['studio_id', 'hall_id', 'year', 'month'])['income'].sum().reset_index()
hall_income = hall_income[hall_income['year'] < 2020]  # exclude the anomalous year 2020
hall_income['income'].describe()




count       648.000000
mean     184299.691358
std      114304.925311
min           0.000000
25%       95575.000000
50%      170350.000000
75%      256575.000000
max      617400.000000
Name: income, dtype: float64


These are monthly incomes per hall, in rubles.



The percentiles show that half of the monthly hall incomes lie between 95,000 and 256,000 rubles, with a median of 170,000 rubles.



From the mean and standard deviation, by the one-sigma rule (which assumes a roughly normal distribution), about two-thirds of monthly hall incomes lie between 70,000 and 300,000 rubles, around a mean of 184,000 rubles.



So a typical hall can expect a monthly income of about 170,000-180,000 rubles ± 80,000 rubles.
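These ranges can be reproduced directly from the .describe() output above. A short check in plain Python: the constants are copied from that output, and reading ± 80,000 as half the interquartile range is an interpretation, not something stated in the analysis.

```python
# Values copied from hall_income['income'].describe() (rubles per hall per month)
mean = 184299.691358
std = 114304.925311
q1, median, q3 = 95575.0, 170350.0, 256575.0

# One-sigma range: about two-thirds of observations if the distribution were roughly normal
one_sigma_low, one_sigma_high = mean - std, mean + std

# Half of the interquartile range: a spread measure that is robust to outliers
half_iqr = (q3 - q1) / 2

print(f"1 sigma: {one_sigma_low:,.0f} .. {one_sigma_high:,.0f} RUB")
print(f"median {median:,.0f} RUB, half IQR {half_iqr:,.0f} RUB")
```

Because the distribution is skewed (the minimum is 0, the maximum 617,400), the quartile-based range is arguably the safer of the two summaries.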



Such a large spread is due to other factors, which we will try to identify below.



How many halls should a photo studio have?



To find out, we compute the average monthly income of each hall, then the number of halls and the average hall income for each studio, and finally group studios by the number of halls, averaging the income per hall.



Calculation of the profitability of the hall depending on the number of halls in the photo studio
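(hall_income
 .groupby(['studio_id', 'hall_id'])['income'].mean().reset_index()  # average monthly income of each hall
 .groupby('studio_id')['income'].agg(['count', 'mean'])             # number of halls and mean hall income per studio
 .groupby('count').agg('mean')                                      # average hall income by number of halls
)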




mean
count	
1	134847.916667
2	146531.944444
3	300231.944444
4	222202.604167


This is the average monthly income of one hall depending on the number of halls in the studio. Income per hall grows with the number of halls up to three: studios with 3 halls show the maximum, while 4-hall studios earn less per hall than 3-hall ones.



This is presumably because a client using one hall of a studio sees the other halls and books them right away. In this way, one hall "promotes" the others.



Dependence of income on the location of the hall



Location can strongly affect income: halls in the center are more accessible to customers, so their income should be higher. Let's check this hypothesis.



To check it, let's calculate the average monthly income of each hall, group it by the "metro" field, and sort in ascending order.



Hall profitability depending on distance from the center
data['income'] = data['price'] * data['duration']
data['year'] = [x.year for x in data['date']]
data['month'] = [x.month for x in data['date']]
# monthly income per hall -> average per hall -> average per metro station
(data
 .groupby(['hall_id', 'metro', 'year', 'month'])['income'].sum().reset_index()
 .groupby(['hall_id', 'metro'])['income'].mean().reset_index()
 .groupby('metro')['income'].mean().sort_values()
).iloc[-59:]




We got the following data:



[Table: average monthly income per hall (rubles) for 59 metro station entries, sorted in ascending order, from 5,017 to 556,622]


Note that I left the metro data as is. For a more accurate picture, it should be brought to a common format: for example, "Baumanskaya, Elektrozavodskaya", "Elektrozavodskaya metro station" and "Electrozavodskaya" should all be recorded under one name.
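A minimal sketch of such normalization: lowercase the string, drop the "metro station" suffix, collapse whitespace, and look the result up in a hand-made map of spelling variants. The map and the helper name normalize_metro are hypothetical; on the real (Russian-language) data, the map would have to be filled in by hand for the actual station spellings.

```python
import re

import pandas as pd

# Hypothetical hand-made map from cleaned spellings to one canonical station name
CANONICAL = {
    "electrozavodskaya": "Elektrozavodskaya",
    "elektrozavodskaya": "Elektrozavodskaya",
    "baumanskaya, elektrozavodskaya": "Elektrozavodskaya",
}


def normalize_metro(raw: str) -> str:
    """Lowercase, drop the 'metro station' suffix, collapse spaces, then look up the map."""
    key = raw.lower().replace("metro station", "")
    key = re.sub(r"\s+", " ", key).strip(" ,/")
    # Fall back to the cleaned, title-cased string when no canonical name is known
    return CANONICAL.get(key, key.title())


metro = pd.Series(
    ["Baumanskaya, Elektrozavodskaya", "Elektrozavodskaya metro station", "Electrozavodskaya"]
)
print(metro.map(normalize_metro).unique())
```

Applied as `data['metro'] = data['metro'].map(normalize_metro)` before grouping, this would merge the variants into one row of the table.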



The data show that in areas with expensive real estate, such as Maryina Roshcha, Novye Cheryomushki and Krylatskoye, income per hall is higher.



How many halls do competing studios have



How many halls do the studios on the market have? To find out, let's join the halls table to the studios table, count the halls in each studio, and build a histogram.



Calculation of the number of halls in studios
# number of halls in each studio
hall_num = studios.merge(halls, on='studio_id').groupby('studio_id')['is_hall'].count()

plt.hist(hall_num, bins=range(hall_num.min(), hall_num.max() + 2))  # one bin per hall count
plt.show()
hall_num.describe()








count    105.000000
mean       2.685714
std        2.292606
min        1.000000
25%        1.000000
50%        2.000000
75%        3.000000
max       13.000000


The data show that most photo studios (at least 75%) have no more than 3 halls. Across the market, studios rarely have more than 5 halls.



Impact of other parameters on studio income



Ceiling height



Photos require a lot of light, and large windows in a room with high ceilings provide plenty of natural light. In addition, the higher the ceilings, the more diffused, even light reaches the floor. Therefore, the height of the ceiling can affect the profitability of the photo studio. Let's check this hypothesis.



Let's calculate the average monthly income of each hall, keeping its ceiling height, then average the income by ceiling height and plot it.



Hall income depending on the ceiling height in meters
# average monthly income of each hall, keeping its ceiling height and area
halls_sq_ceil = (data
 .groupby(['hall_id', 'ceiling', 'square', 'year', 'month'])['income'].sum().reset_index()
 .groupby(['hall_id', 'ceiling', 'square'])['income'].mean().reset_index()
)
# average income by ceiling height, excluding the two largest heights
ceiling = halls_sq_ceil.groupby('ceiling')['income'].mean()
plt.bar(ceiling.index[:-2], ceiling.iloc[:-2])
plt.show()








The data show that up to 6 meters, income grows with ceiling height. The optimal height is 5-6 meters.



Hall area



Hypothesis: the larger the hall's area, the more income it brings.



To test it, we reuse the previous table, calculate the average income for each hall area, and plot it.



Hall income depending on its area
# average hall income by area, excluding the three largest areas
square = halls_sq_ceil.groupby('square')['income'].mean()
plt.bar(square.index[:-3], square.iloc[:-3])
plt.show()








The chart shows a clear pattern: the larger the area, the more income the hall brings.



Booking price



Hypothesis: there is an optimal hall price that clients will pay for almost any hall; they are willing to pay more only for high quality.



To test the hypothesis, let's first look at current price levels. We group the general booking table by hall, price, year and month and sum the income, then group by hall and booking price and average the income, and finally group by price and average again. The result is the average monthly income per hall for each booking price.



Average monthly hall income depending on the booking price
# monthly income per hall and price -> average per hall and price -> average per price
price = (data
 .groupby(['hall_id', 'price', 'year', 'month'])['income'].sum().reset_index()
 .groupby(['hall_id', 'price'])['income'].mean().reset_index()
 .groupby('price')['income'].mean()
)




Distribution of hourly rental prices
# histogram of the distinct booking prices, excluding the five highest
plt.figure(figsize=(20, 10))
plt.hist(price.index[:-5])
plt.show()








The histogram shows that most studios set a rental price between 500 and 2,000 rubles per hour. Prices below 500 rubles are rare. The maximum hourly rental price of a hall is 3,500 rubles.
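The histogram above is built from the index of `price`, that is, from the list of distinct prices, so each price is counted once regardless of how many halls use it. To count halls instead, one option is to count unique `hall_id` values per price. A sketch, assuming the `data` table with `hall_id` and `price` columns; the toy rows below are hypothetical.

```python
import pandas as pd

# Toy booking rows (hypothetical): one row per booked slot
data = pd.DataFrame({
    "hall_id": [1, 1, 2, 3, 3, 4],
    "price":   [1000, 1000, 1500, 1000, 2000, 1500],
})

# Number of distinct halls booked at each hourly price
halls_per_price = data.groupby("price")["hall_id"].nunique()
print(halls_per_price)
```

The result can be plotted with `plt.bar(halls_per_price.index, halls_per_price)`. A hall that changed its price over time is counted once for each price it used.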



Graph of the dependence of the average monthly income on the rental price of the hall
# keep prices whose average monthly income exceeds 10,000 rubles
price = price[price > 10000]
plt.figure(figsize=(20, 10))
plt.scatter(price.index, price)
plt.show()








The graph shows a clear positive relationship up to 2,000 rubles per hour: the higher the booking price, the more the hall earns. Above 2,000 rubles, a hall's income can be either low or high. Apparently, clients are willing to pay more than 2,000 rubles only for high quality: a convenient location, equipment, a large area, a high-quality interior, and so on.



Other areas of market analytics



Equipment analysis



The site ugoloc.ru also lists studio equipment: colored backgrounds, flash brands, and so on. Equipment can also affect income, so a complete analysis should take it into account.



However, not all studios list their additional equipment, so the estimate of this factor's influence may be inaccurate.



Analysis of the influence of several parameters on income



Parameters do not affect income in isolation. For example, area and booking price are linked and together affect a studio's overall income. It is therefore more reasonable to consider their influence jointly, choosing which parameters to combine based on the specifics of the client's requests.
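A minimal sketch of looking at several parameters together: an ordinary least-squares regression of a hall's average monthly income on its area, ceiling height and booking price, using numpy.linalg.lstsq. The toy halls and the formula generating their income are made up so that the fitted coefficients can be checked; on real data, correlated parameters (such as area and price) make individual coefficients hard to interpret, so treat them with care.

```python
import numpy as np
import pandas as pd

# Hypothetical halls: income is generated exactly from a known formula
halls = pd.DataFrame({
    "square":  [40, 60, 80, 100, 120, 50],
    "ceiling": [3.0, 4.0, 4.5, 5.0, 6.0, 3.5],
    "price":   [1000, 1200, 1500, 1800, 2000, 2500],
})
halls["income"] = 1000 * halls["square"] + 20000 * halls["ceiling"] + 50 * halls["price"] + 5000

features = ["square", "ceiling", "price"]
# Design matrix: an intercept column of ones plus the feature columns
X = np.column_stack([np.ones(len(halls)), halls[features].to_numpy(dtype=float)])
y = halls["income"].to_numpy(dtype=float)

# Ordinary least squares: coefficients that minimize ||X @ coef - y||^2
coef, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["intercept"] + features, coef.round(2))))
```

Each coefficient estimates the change in monthly income per unit of one parameter with the others held fixed, which is exactly what separate one-parameter charts cannot show.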



Enhanced data collection



By number of studios, ugoloc.ru covers less than a third of the market. Its share by income and by market segment cannot be estimated. For a more accurate picture, it is worth collecting data from AppEvent, Google Calendar, and possibly custom booking applications.



Accounting expenses



You may have noticed that expense data was often missing from the picture. For example, the larger the hall, the higher its income. That is a useful conclusion, but as the area grows, so does the rent. So it would certainly be useful to plot rental costs on the same chart. The project's profitability lies in the optimal ratio of income to expenses for each parameter.
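The income-versus-expense trade-off for one parameter can be sketched as a profit curve over hall area. Everything here is an assumption for illustration: the income-by-area figures stand in for the `square` series computed earlier, and the rent per square meter is a hypothetical flat rate.

```python
import pandas as pd

# Hypothetical average monthly income by hall area (sq. m), in rubles
income_by_area = pd.Series({50: 120000, 80: 170000, 120: 220000, 200: 240000})

# Hypothetical monthly rent per square meter, in rubles
RENT_PER_M2 = 1000

# Monthly profit before other costs: income minus rent, which grows linearly with area
profit = income_by_area - RENT_PER_M2 * income_by_area.index.to_numpy()
best_area = profit.idxmax()
print(profit)
print("best area:", best_area)
```

Renovation and staff costs could be subtracted the same way once they are expressed per month as a function of area.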



Renovation costs also depend on the area: the larger the hall, the more expensive the renovation.



As the number of halls grows, staff costs per hall decrease, since one administrator can serve either one hall or three.



Analysis of the distance from the metro



When assessing how a studio's location affects hall income, an important factor not yet accounted for is the distance from the metro. You will have to enter it manually, or, if you are familiar with the Google Maps API, you can try to automate it.
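Once coordinates of studios and metro stations are available (for example, geocoded once), the straight-line distance can be computed locally with the haversine formula. A sketch; the helper name haversine_km and the coordinates below are hypothetical placeholders, and the actual walking distance will be longer than this great-circle estimate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical studio and nearest-metro coordinates (degrees)
studio = (55.7800, 37.7000)
metro = (55.7800, 37.7100)
print(f"{haversine_km(*studio, *metro) * 1000:.0f} m")
```

The same function can count competitors within a given radius of a studio, which is relevant to the next point.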



Distance from competitors



Studios are often located close to one another: there are about 40 at Elektrozavod alone. One hypothesis is that proximity to other photo studios increases income: customers may know and trust the place (a building or business center), which benefits all the studios there.



Workload of photo studios



Separately, you can investigate the workload of photo studios:



  • what share of a hall's opening hours is booked;
  • how bookings depend on the day of the week (spoiler: weekends are booked more often);
  • whether there are days with no bookings (when the administrator may not need to come in);
  • which hours are booked most often (especially interesting on weekdays);
  • etc.
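For the first two questions, a sketch of the booked share of working hours by day of the week. It assumes one row per working hour with a 0/1 `is_booked` flag and a `date` column, as in the booking data; the rows below are a hypothetical toy sample.

```python
import pandas as pd

# Toy hourly slots (hypothetical): 1 = booked, 0 = free
slots = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-03-04", "2019-03-04", "2019-03-04", "2019-03-04",  # Monday
        "2019-03-09", "2019-03-09", "2019-03-09", "2019-03-09",  # Saturday
    ]),
    "is_booked": [1, 0, 0, 0, 1, 1, 1, 0],
})

# Share of working hours that are booked, by day of week (0 = Monday ... 6 = Sunday)
slots["weekday"] = slots["date"].dt.dayofweek
occupancy = slots.groupby("weekday")["is_booked"].mean()
print(occupancy)
```

Grouping by hall and date instead of weekday, and keeping days where the mean is 0, would answer the question about days with no bookings.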


Photo studios in the off-season



Studios close more often in summer, when there are no orders. At the same time, some studios see only a small drop in orders. What advantages do studios that stay popular in the off-season have? This is a separate area to investigate.



Competitor profitability analysis



Having information about the cost of renting premises for a photo studio and the average salaries of staff, one can assess the financial condition of competitors. It may turn out that some studios are on the verge of closing. Accordingly, you can identify their mistakes and try to avoid them.



Likewise, you can study the practices of the most profitable photo studios and apply them in your own studio.



Analysis stages



The analytics above are a first step that gives a rough picture of the market. For further analysis, the client needs to decide what kind of studio they want to open: the price segment, possible locations, rent, equipment, and so on.



Ideally, identify several rental options. Then the area, ceiling height, approximate number of halls, costs and nearest competitors will all be known.



In this case, analytics can be carried out more substantively and accurately.



Outcome



In a series of articles, we looked at how to collect data from open sources, save it to a database and analyze it. The result of the work was a general understanding of the photo studio services market.



The above calculations can be applied:



  • in the revenue section of a business plan, backed by statistical data;
  • in assessing the feasibility and profitability of the project by comparing income and expenses for different opening options;
  • in running existing photo studios: many studios sit idle without orders or operate at a loss, which means they are doing something wrong, and the analytics above can help them identify the causes.


I enjoyed doing this project.



I decided to share my experience in case it is useful to you.



How helpful was the information in these three articles?



Share your opinion.



You can find the finished project on my GitHub page.


