Algorithm for ranking segments of a river network using graphs for geoinformation analysis

Hello, Habr.



In this article, I would like to touch upon the topic of the application of information technology in the Earth Sciences, namely, in Hydrology and Cartography. Below the cut is a description of the stream ranking algorithm and the plug-in we have implemented for the open geographic information system QGIS.



Introduction



Currently, an important aspect of hydrological research is not only field observations, but also office work with hydrological data, including using GIS (geographic information systems). However, studying the spatial structure of hydrological systems can be difficult due to the large amount of data. In such cases, you cannot do without the use of additional tools that allow a specialist to automate the process.



An important role in working with digital data in general and hydrological data in particular is played by visualization - the correct and visual presentation of the analysis results. To display watercourses in classical cartography, the following method is adopted: rivers are depicted by a solid line with a gradual (depending on the number of tributaries flowing into the river) thickening from the source to the mouth. In addition, for the segments of the river network, some ranking is often necessary according to the degree of distance from the source. Information of this kind is important not only from the point of view of visualization, but also suitable for a more complete perception of the data structure, their spatial distribution, and correct subsequent processing.



The task of ranking watercourses can be illustrated as follows (Fig. 1):



Figure 1. The task of ranking watercourses, numbers indicate the attribute of the total number of inflowing tributaries assigned to each segment of the river network



Thus, each segment of the river needs to be associated with a value that would show how many segments collectively flow into the given section.



, ArcGIS open-source QGIS, . , , , , . , - . , , , (OpenStreetMap, Natural Earth, ).



, , , .



β€” . QGIS β€” "Lines Ranking". QGIS, GitHub.



For the plugin to work, the QGIS version> = 3.14 is required, as well as the following dependencies: Python programming language libraries - networkx and pandas.



, (Line, MultiLine). , .



  • β€œfid”. GRASS β€” v.clean β€” , .


(Start Point Coordinates), . , , , QGIS. , ( ).



: , , .





: . β€” , . , , . , , . . .



β€” , , .





, . / , .. , .. (. 2), , ..





2.



: β€œβ€œ , . QGIS – β€œ ” (fixgeometries β€” ) v.clean ( GRASS).



, . (. 3).





3.



, QGIS " " (splitwithlines β€” ) .



QGIS (. 4). ( -> -> -> ).





4.



" " (lineintersections β€” ) , . .



(. 5).





5.



Python networkx. , β€” , ( ), .





, , , ( ). , , "" (). :



  1. "rank" ( ) "offspring" ( );
  2. "value" "distance" ( ).


, , .



"rank" "offspring"



β€” . , "rank" . , , , , . ( )





? , , .



: - ( ), . (. 6).





6.



, 1 , β€” .



β€” ? β€” , , , β€” ( , , , ). A* (A-star), , . ( ).



1) . . . (. 7)





7.



, "length", . , .





distance_attr

Input:



  • G β€”
  • start β€” ,
  • dataframe β€” pandas , 2 : id_field ( /) 'length' ( )
  • id_field β€” dataframe,
  • main_id β€” , ( = None)


Output:



  • , 1, : 2 3, 'length' 1 β€” 10, 2 β€” 15, 3 β€” 20. : (1, 2) (2, 1) β€” 15; (1, 3) (3, 1) β€” 20 ..
  • last_vertex β€” ( )


# Function for assigning weights to graph edges
def distance_attr(G, start, dataframe, id_field, main_id = None):

    # List of all vertexes that can be reached from the start vertex using BFS
    vert_list = list(nx.bfs_successors(G, source=start))
    # One of the most remote vertices in the graph (this will be necessary for A*)
    last_vertex = vert_list[-1][-1][0]

    for component in vert_list:
        vertex = component[0]  # The vertex where we are at this iteration
        neighbors = component[1]  # Vertices that are neighboring (which we haven't visited yet)

        dist_vertex = int(dataframe['length'][dataframe[id_field] == vertex])
        # Assign the segment length value as a vertex attribute
        attrs = {vertex: {'component' : 1, 'size' : dist_vertex}}
        nx.set_node_attributes(G, attrs)

        for n in neighbors:  

            # If the main index value is not set
            if main_id == None:
                # Assign weights to the edges of the graph
                # The length of the section in meters (int)
                dist_n = int(dataframe['length'][dataframe[id_field] == n])   
            # Otherwise we are dealing with a complex composite index
            else:
                # If the vertex we are at is part of the main river
                if vertex.split(':')[0] == main_id:
                    # And at the same time, the vertex that we "look" at from the vertex "vertex" also, then
                    if n.split(':')[0] == main_id:
                        # The weight value must be zero
                        dist_n = 0
                    else:
                        dist_n = int(dataframe['length'][dataframe[id_field] == n])
                else:
                    dist_n = int(dataframe['length'][dataframe[id_field] == n])
            attrs = {(vertex, n): {'weight': dist_n},
                     (n, vertex): {'weight': dist_n}}
            nx.set_edge_attributes(G, attrs)    

            # Assign attributes to the nodes of the graph
            attrs = {n: {'component' : 1, 'size' : dist_n}}
            nx.set_node_attributes(G, attrs)

        # Look at the surroundings of the vertex where we are located
        offspring = list(nx.bfs_successors(G, source = vertex, depth_limit = 1)) 
        offspring = offspring[0][1]
        # If the weight value was not assigned, we assign it
        for n in offspring:

            if len(G.get_edge_data(vertex, n)) == 0:

                ##############################
                # Assigning weights to edges #
                ##############################
                if main_id == None:
                    dist_n = int(dataframe['length'][dataframe[id_field] == n])   
                else:
                    if vertex.split(':')[0] == main_id:
                        if n.split(':')[0] == main_id:
                            dist_n = 0
                        else:
                            dist_n = int(dataframe['length'][dataframe[id_field] == n])
                    else:
                        dist_n = int(dataframe['length'][dataframe[id_field] == n])
                attrs = {(vertex, n): {'weight': dist_n},
                         (n, vertex): {'weight': dist_n}}
                nx.set_edge_attributes(G, attrs)
                ##############################
                # Assigning weights to edges #
                ##############################

            elif len(G.get_edge_data(n, vertex)) == 0:

                ##############################
                # Assigning weights to edges #
                ##############################
                if main_id == None:
                    dist_n = int(dataframe['length'][dataframe[id_field] == n])   
                else:
                    if vertex.split(':')[0] == main_id:
                        if n.split(':')[0] == main_id:
                            dist_n = 0
                        else:
                            dist_n = int(dataframe['length'][dataframe[id_field] == n])
                    else:
                        dist_n = int(dataframe['length'][dataframe[id_field] == n])
                attrs = {(vertex, n): {'weight': dist_n},
                         (n, vertex): {'weight': dist_n}}
                nx.set_edge_attributes(G, attrs)    
                ##############################
                # Assigning weights to edges #
                ##############################

    for vertex in list(G.nodes()):
        # If the graph is incompletely connected, then we delete the elements that we can't get to
        if G.nodes[vertex].get('component') == None:
            G.remove_node(vertex)
        else:
            pass    
    return(last_vertex)    

# The application of the algorithm
last_vertex = distance_attr(G, '7126:23', dataframe, id_field = 'id', main_id = '7126')


2) A* (-star) ( ). "";



3) . , , 1, β€” 2, β€” 3 ..



4) . , , , .





rank_set

Input:



  • G β€”
  • start β€” ,
  • last_vertex β€”


Output:



  • . 1 , . 2 , 1. 3 , 2 .. , 'rank', 'offspring'. 'offspring' ' , '.


# Function for assigning 'rank' and 'offspring' attributes to graph vertices
def rank_set(G, start, last_vertex):

    # Traversing a graph with attribute assignment
    # G           --- graph as a networkx object
    # vertex      --- vertex from which the graph search begins
    # kernel_path --- list of vertexes that are part of the main path that the search is being built from
    def bfs_attributes(G, vertex, kernel_path):

        # Creating a copy of the graph
        G_copy = G.copy()

        # Deleting all edges that are associated with the reference vertexes
        for kernel_vertex in kernel_path:
            # Leaving the reference vertex from which we start the crawl
            if kernel_vertex == vertex:
                pass
            else:
                # For all other vertexes, we delete edges
                kernel_n = list(nx.bfs_successors(G_copy, source = kernel_vertex, depth_limit = 1))   
                kernel_n = kernel_n[0][1]
                for i in kernel_n:
                    try:
                        G_copy.remove_edge(i, kernel_vertex)
                    except Exception:
                        pass

        # The obtained subgraph is isolated from all reference vertices, except the one 
        # from which the search begins at this iteration                   
        # Breadth-first search
        all_neighbors = list(nx.bfs_successors(G_copy, source = vertex))    

        # Attention!                                 
        # Labels are not assigned on an isolated subgraph, but on the source graph 

        for component in all_neighbors:
            v = component[0] # The vertex where we are at this iteration
            neighbors = component[1] # Vertices that are neighboring (which we haven't visited yet)

            # Value of the 'rank' attribute in the considering vertex
            att = G.nodes[v].get('rank')
            if att != None:
                # The value of the attribute to be assigned to neighboring vertices
                att_number = att + 1

            # We look at all the closest first offspring
            first_n = list(nx.bfs_successors(G, source = v, depth_limit = 1))   
            first_n = first_n[0][1]

            # Assigning ranks to vertices
            for i in first_n: 
                # If the neighboring vertex is the main node in this iteration, then skip it
                # vertex - the reference point from which we started the search
                if i == vertex:
                    pass 
                else:
                    current_i_rank = G.nodes[i].get('rank')
                    # If the rank value has not yet been assigned, then assign it
                    if current_i_rank == None:
                        attrs = {i: {'rank': att_number}}
                        nx.set_node_attributes(G, attrs)
                    # If the rank in this node is already assigned
                    else:
                        # The algorithm either "looks back" at vertices that it has already visited
                        # In this case we don't do anything
                        # Either the algorithm "came up" to the main path (kernel path) in the graph
                        if any(i == bearing_v for bearing_v in kernel_path):
                            G.remove_edge(v, i)
                        else:
                            pass

            # Additional "search"
            for neighbor in neighbors:
                # We look at all the closest first offspring
                first_n = list(nx.bfs_successors(G, source = neighbor, depth_limit = 1))   
                first_n = first_n[0][1]

                for i in first_n: 
                    # If the neighboring vertex is the main node in this iteration, then skip it
                    # vertex - the reference point from which we started the search
                    if i == vertex:
                        pass 
                    else:
                        # The algorithm either "looks back" at vertices that it has already visited
                        # In this case we don't do anything
                        # Either the algorithm "came up" to the main path (kernel path) in the graph
                        if any(i == bearing_v for bearing_v in kernel_path):
                            G.remove_edge(neighbor, i)
                        else:
                            pass               

    # Finding the shortest path A* - building a route around which we will build the next searchs
    a_path = list(nx.astar_path(G, source = start, target = last_vertex, weight = 'weight'))

    ##############################
    #   Route validation block   #
    ##############################
    true_a_path = []
    for index, V in enumerate(a_path):
        if index == 0:
            true_a_path.append(V)
        elif index == (len(a_path) - 1):
            true_a_path.append(V)
        else:
            # Previous and next vertices for the reference path (a_path)
            V_prev = a_path[index - 1]
            V_next = a_path[index + 1]

            # Which vertexes are adjacent to this one
            V_prev_neighborhood = list(nx.bfs_successors(G, source = V_prev, depth_limit = 1)) 
            V_prev_neighborhood = V_prev_neighborhood[0][1]
            V_next_neighborhood = list(nx.bfs_successors(G, source = V_next, depth_limit = 1))
            V_next_neighborhood = V_next_neighborhood[0][1]

            # If the next and previous vertices are connected to each other without an intermediary
            # in the form of vertex V, then vertex V is excluded from the reference path
            if any(V_next == VPREV for VPREV in V_prev_neighborhood):
                if any(V_prev == VNEXT for VNEXT in V_next_neighborhood):
                    pass
                else:
                    true_a_path.append(V)
            else:
                true_a_path.append(V)
    ##############################
    #   Route validation block   #
    ##############################

    # Verification completed
    a_path = true_a_path     
    RANK = 1
    for v in a_path:
        # Assign the attribute rank value - 1 to the starting vertex. The further away, the greater the value
        attrs = {v: {'rank' : RANK}}
        nx.set_node_attributes(G, attrs) 
        RANK += 1

    # The main route is ready, then we will iteratively move from each node
    for index, vertex in enumerate(a_path):
        # Starting vertex
        if index == 0:
            next_vertex = a_path[index + 1]

            # Disconnect vertices
            G.remove_edge(vertex, next_vertex)

            # Subgraph BFS block
            bfs_attributes(G, vertex = vertex, kernel_path = a_path)      

            # Connect vertices back
            G.add_edge(vertex, next_vertex)

        # Finishing vertex
        elif index == (len(a_path) - 1):
            prev_vertex = a_path[index - 1]

            # Disconnect vertices
            G.remove_edge(prev_vertex, vertex)

            # Subgraph BFS block
            bfs_attributes(G, vertex = vertex, kernel_path = a_path)

            # Connect vertices back
            G.add_edge(prev_vertex, vertex)

        # Vertices that are not the first or last in the reference path
        else:
            prev_vertex = a_path[index - 1]
            next_vertex = a_path[index + 1]

            # Disconnect vertices 
            # Previous with current vertex
            try:
                G.remove_edge(prev_vertex, vertex)
            except Exception:
                pass
            # Current with next vertex
            try:
                G.remove_edge(vertex, next_vertex)
            except Exception:
                pass

            # Subgraph BFS block
            bfs_attributes(G, vertex = vertex, kernel_path = a_path)

            # Connect vertices back
            try:
                G.add_edge(prev_vertex, vertex)
                G.add_edge(vertex, next_vertex)
            except Exception:
                pass

    # Attribute assignment block - number of descendants 
    vert_list = list(nx.bfs_successors(G, source = start)) 
    for component in vert_list:
        vertex = component[0] # The vertex where we are at this iteration
        neighbors = component[1] # Vertices that are neighboring (which we haven't visited yet)

        # Adding an attribute - the number of descendants of this vertex
        n_offspring = len(neighbors)
        attrs = {vertex: {'offspring' : n_offspring}}
        nx.set_node_attributes(G, attrs)


:





, , . .



, "" , , , , , .



"value" "distance"



, "rank" "offspring". .



, , , , "value" β€” 1. , , "value", 1, "value" ( 1 , ) "value". , , .



, .



iter_set_values

Input:



  • G β€” 'rank' 'offspring'
  • start β€” ,


Output:



  • G 'value'. ' '. 'distance', , , .


# Function for determining the order of river segments similar to the Shreve method
# In addition, the "distance" attribute is assigned
def set_values(G, start, considering_rank, vert_list): 

    # For each vertex in the list
    for vertex in vert_list:

        # If value has already been assigned, then skip it
        if G.nodes[vertex].get('value') == 1:
            pass
        else:                    
            # Defining descendants
            offspring = list(nx.bfs_successors(G, source = vertex, depth_limit = 1)) 
            # We use only the nearest neighbors to this vertex (first descendants)
            offspring = offspring[0][1]

            # The cycle of determining the values at the vertices of a descendant
            last_values = []
            for child in offspring:
                # We only need descendants whose rank value is greater than that of the vertex
                if G.nodes[child].get('rank') > considering_rank:
                    if G.nodes[child].get('value') != None:
                        last_values.append(G.nodes[child].get('value'))
                    else:
                        pass
                else:
                    pass

            last_values = np.array(last_values)
            sum_values = np.sum(last_values)

            # If the amount is not equal to 0, the attribute is assigned
            if sum_values != 0:
                attrs = {vertex: {'value' : sum_values}}
                nx.set_node_attributes(G, attrs)
            else:
                pass

# Function for iteratively assigning the value attribute
def iter_set_values(G, start):

    # Vertices and corresponding attribute values
    ranks_list = []
    vertices_list = []
    offspring_list = []
    for vertex in list(G.nodes()):
        ranks_list.append(G.nodes[vertex].get('rank'))
        vertices_list.append(vertex)
        att_offspring = G.nodes[vertex].get('offspring')

        if att_offspring == None:
            offspring_list.append(0) 
        else:
            offspring_list.append(att_offspring)

    # Largest rank value in a graph
    max_rank = max(ranks_list)

    # Creating pandas dataframe
    df = pd.DataFrame({'ranks': ranks_list,
                       'vertices': vertices_list,
                       'offspring': offspring_list})

    # We assign value = 1 to all vertices of the graph that have no offspring
    value_1_list = list(df['vertices'][df['offspring'] == 0])
    for vertex in value_1_list:
        attrs = {vertex: {'value' : 1}}
        nx.set_node_attributes(G, attrs) 

    # For each rank, we begin to assign attributes
    for considering_rank in range(max_rank, 0, -1):

        # List of vertices of suitable rank
        vert_list = list(df['vertices'][df['ranks'] == considering_rank])
        set_values(G, start, considering_rank, vert_list) 

    # Assigning the "distance" attribute to the graph vertices
    # List of all vertexes that can be reached from the start vertex using BFS
    vert_list = list(nx.bfs_successors(G, source = start))
    for component in vert_list:
        vertex = component[0] # The vertex where we are at this iteration
        neighbors = component[1] # Vertices that are neighboring (which we haven't visited yet)

        # If we are at the closing vertex
        if vertex == start:
            # Length of this segment
            att_vertex_size = G.nodes[vertex].get('size')
            # Adding the value of the distance attribute
            attrs = {vertex: {'distance' : att_vertex_size}}
            nx.set_node_attributes(G, attrs)
        else:
            pass

        vertex_distance = G.nodes[vertex].get('distance')        
        # For each neighbor, we assign an attribute
        for i in neighbors:            
            # Adding the value of the distance attribute
            i_size = G.nodes[i].get('size')
            attrs = {i: {'distance' : (vertex_distance + i_size)}}
            nx.set_node_attributes(G, attrs) 




"value", "distance", , , , , , .



8. "rank" "value".





8. ( OpenStreetMap, "rank", - "value")





, . ( ), .



, , .



Python, QGIS . , .



.



Source code repository:





Mikhail Sarafanov and Yulia Borisova worked on the algorithm and article .




All Articles