How to find the number of all letters on all signs of the "entry to city X" type in the country? The exact way to answer such questions

Recently, in the framework of one interview, I needed to solve a problem, the condition of which is given below:

The best manager in the world named Penultimo has another brilliant idea, which you have to realize. He believes that the flow of tourists to Isla de Educados will increase if he can tell the whole world how many wonderful road signs with long inscriptions they have on the island. You are invited to come up with an algorithm that allows you to calculate the total number of letters on all signs "Entrance to the city X" on the island, and then apply the knowledge gained to calculate a similar metric for the Republic of Belarus. Pay attention to the language used to designate settlements, as well as the fact that there may be several entrances to the city. Penultimo also encourages initiative, so you can research this issue for specific areas, compare with the number of people living in the area,and also conduct any other research that you find interesting.


Under the cut I will show you the exact solution to this and other similar problems, for example: "How many gas stations are located within Moscow?"



General solution method



If you look at the OpenStreetMap map, then the following idea immediately comes to mind: let's get for each city its borders and the roads inside its borders, and then find their intersections, on which there will be signs! How we will look for intersections: we take a segment of the border, then a segment of the road and see if they intersect (a typical geometric problem). And so on until all the sections and cities are over.



About OSM data architecture
, : , .

ID, .



  • — , ID
  • — ,
  • — , , ,




OverPass



OverPass - This is an API for getting data from OpenStreetMap. It has its own language for writing queries, you can read about it in detail in this article .



In order to make it easier and more convenient to compose queries, there is a tool Overpass-turbo , where the result of the query can be viewed in a convenient and interactive form.



Using the OverPass API in Python



To process data from OSM in Python, you can use the Overpy package as a wrapper.

To send requests and receive data, you need to do the following:



import overpy

api = overpy.Overpass()
Data = api.query("""
* *
""")


where the variable (?) Data contains everything that the server gave us.



How to process this data? Suppose we entered the following request to get the boundaries of Minsk:



relation["type"="boundary"]["boundary"="administrative"]["name:be"="і"];
//:      
>; out skel qt;


At the output, we have an XML file (you can choose Json) with the following structure:



<* *>
<     >
  <node id="277218521" lat="53.8605688" lon="27.3946601"/>
  <node id="4623647835" lat="53.8603938" lon="27.3966685"/>
  <node id="4713906615" lat="53.8605343" lon="27.3998220"/>
  <node id="4713906616" lat="53.8605398" lon="27.3966820"/>
  <node id="4713906617" lat="53.8605986" lon="27.3947987"/>
  <node id="277050633" lat="53.8463790" lon="27.4431241"/>
  <node id="277050634" lat="53.8455797" lon="27.4452681"/>
  <node id="4713906607" lat="53.8460017" lon="27.4439797"/>
<    ID ,    >
<way id="572768148">
    <nd ref="5502433452"/>
    <nd ref="277218520"/>
    <nd ref="4713906620"/>
    <nd ref="277218521"/>
    <nd ref="4713906617"/>
    <nd ref="4623647835"/>
    <nd ref="4713906616"/>
</way>
<way id="29079842">
    <nd ref="277212682"/>
    <nd ref="277051005"/>
    <nd ref="4739822889"/>
    <nd ref="4739822888"/>
    <nd ref="4739845423"/>
    <nd ref="4739845422"/>
    <nd ref="4739845421"/>
</way>


Let's get some data:



import overpy

api = overpy.Overpass()
Data = api.query("""
relation["type"="boundary"]["boundary"="administrative"]["name:be"="і"];
>; out skel qt;
""")
Xa=Data.ways[0].nodes[0].lon #     
Ya=Data.ways[0].nodes[0].lat # 
Xb=Data.ways[0].nodes[1].lon
Yb=Data.ways[0].nodes[1].lat
NodeID=Data.ways[0]._node_ids[0] # ID    
print(len(Data.nodes)) #   
print(NodeID)
print(Xa,Ya)
print(Xb,Yb)


From the point of view of working with OpenStreetMap in python, this is all that is needed to get the data.



Let's go directly to the problem



To solve it, the code was written in Python, you can see it under the spoiler. Please do not scold too much for the quality of the code, this is the first such large project on it.



Spoiler header
import overpy


###########################
def line_intersection(line1, line2): #  
    ax1 = line1[0][0]
    ay1 = line1[0][1]
    ax2 = line1[1][0]
    ay2 = line1[1][1]
    bx1 = line2[0][0]
    by1 = line2[0][1]
    bx2 = line2[1][0]
    by2 = line2[1][1]
    v1 = (bx2 - bx1) * (ay1 - by1) - (by2 - by1) * (ax1 - bx1)
    v2 = (bx2 - bx1) * (ay2 - by1) - (by2 - by1) * (ax2 - bx1)
    v3 = (ax2 - ax1) * (by1 - ay1) - (ay2 - ay1) * (bx1 - ax1)
    v4 = (ax2 - ax1) * (by2 - ay1) - (ay2 - ay1) * (bx2 - ax1)
    return (v1 * v2 < 0) & (v3 * v4 < 0)


#######################################
citytmp = []
city = []
Borderway = []
Roadway = []
Total = 0
A = [0, 0]
B = [0, 0]
C = [0, 0]
D = [0, 0]
amount = 0
progressbar = 0 
ReadyData = open(' .txt','w')
with open(" .txt", "r", encoding='utf8') as file:
    for i in range(115):
        citytmp.append(file.readline())
citytmp = [line.rstrip() for line in citytmp]
for i in range(115):
    city.append('"' + citytmp[i] + '"')
city[0]='"і"'

api = overpy.Overpass()
for number in range(0,115):#  ,  
    borderstring = """(
relation["type"="boundary"]["boundary"="administrative"]["name:be"=""" + city[number] + """][place=town]; 
relation["type"="boundary"]["boundary"="administrative"]["name:be"=""" + city[number] + """][place=city];
);
>; out skel qt;"""
    roadstring = """(
area[place=town]["name:be"=""" + city[number] + """]; 
way["highway"][highway!=service]["highway"!="footway"]["highway"!="track"]["highway"!="path"]
    ["highway"!="cycleway"]["highway"!="pedestrian"]["highway"!="steps"]["highway"!="residential"](area);
area[place=city]["name:be"=""" + city[number] + """]; 
way["highway"][highway!=service]["highway"!="footway"]["highway"!="track"]["highway"!="path"]
    ["highway"!="cycleway"]["highway"!="pedestrian"]["highway"!="steps"]["highway"!="residential"](area);
);
out body; >; out skel qt;"""
    print('Getting data about', city[number],'...')
        road = api.query(roadstring)
        border = api.query(borderstring)
    print('got data!, city:', city[number]) # 
    for w in range(len(border.ways)): #  
        for i in range(len(border.ways[w]._node_ids)):#    
            progressbar = i / len(border.ways[w]._node_ids) * 100
            print(progressbar, "%;", w, "of", len(border.ways), "parts ready; city-", city[number])
            A[0] = border.ways[w].nodes[i].lon
            A[1] = border.ways[w].nodes[i].lat
            if i == len(border.ways[w]._node_ids) - 1:
                break
            B[0] = border.ways[w].nodes[i+1].lon
            B[1] = border.ways[w].nodes[i+1].lat
            for j in range(len(road.ways)):
                for k in range(len(road.ways[j]._node_ids)):
                    C[0] = road.ways[j].nodes[k].lon
                    C[1] = road.ways[j].nodes[k].lat
                    if k == len(road.ways[j]._node_ids) - 1:
                        break
                    D[0] = road.ways[j].nodes[k+1].lon
                    D[1] = road.ways[j].nodes[k+1].lat
                    if line_intersection((A, B), (C, D)) == 1:
                        amount += 1
                        print(road.ways[j]._node_ids[k])
    print(amount)
    Total += amount * len(city[number])
    ReadyData.write(str(city[number]))
    ReadyData.write(str(amount))
    ReadyData.write('\n')
    amount = 0
print('Total', Total) #  





Code notes



I made a request for a long time, choosing different types of roads so that it was less to count and not to miss the signs. The final query simply removes those roads on which there are no signs, for example residential, service, footway, track, etc.



I parsed the list of cities from Wikipedia and saved them in the format.tht



The code takes a long time, I even had a desire once rewrite it in C ++, but decided to leave it as it is. It took me two days, all due to problems with the dictatorship of the Belarusian Internet and the overload of the OverPass server. To solve the second problem, you need to make one request for all cities, but I have not yet figured out how to do this normally.



My answer to the problem

18981





What I want to say about the correctness of the figure: everything rests on the quality of the data of the OSM itself, that is, there are places where, for example, one road crosses two border lines, or somewhere at the junction the border is drawn a little bit wrong, and as a result we have too much / missing intersection. But this is a feature of this particular task that has no practical meaning, otherwise OSM is strength.



Second task



Now let's calculate the number of gas stations within Moscow:

area[name=""];
(
  node["amenity"="fuel"](area);
  way["amenity"="fuel"](area);
  relation["amenity"="fuel"](area);
);
out body;
>;
out skel qt;


The code
import overpy

api = overpy.Overpass()
Data = api.query("""
area[name=""];
(
  node["amenity"="fuel"](area);
  way["amenity"="fuel"](area);
  relation["amenity"="fuel"](area);
);
out body;
>;
out skel qt;
""")
print(len(Data.nodes)) #   




Result - 489 fillings:






All Articles