What happens when Google searches use the word "vs"

Has this ever happened to you: you search for something on Google and type "vs" after the search term, hoping that the search engine will suggest something similar to what you need?

Entering "vs" after the search word

This has happened to me.

As it turns out, this is a big deal. When you are looking for an alternative to something, this technique can save a ton of time.

I see three reasons why this technique works so well when you use it to look up technologies, developments, and concepts that you want to understand:

The best way to learn something new is to find out how it is similar to, or different from, what you already know. For example, in the list of suggestions that appears after "vs", you may spot something and think: "Ah, so the thing I'm looking for is like this one, which I already know."
— . , , .
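The word "vs" signals to Google that you want a comparison between things. The word "or" can be used as well, but with "or" Google's suggestions tend to drift off topic.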

For the query "bert or", Google offers suggestions about Sesame Street, while the query "bert vs" produces suggestions about Google BERT.

It got me thinking. What if we take the terms that Google suggested after “vs” and search for each of them, again adding “vs”? And what if we repeat this several times? The result is a nice network graph of related queries.

For example, it may look like this.

Ego graph for a bert query with a radius of 25

This is a very useful technique for creating mental maps of technologies, developments, or ideas that reflect the interconnectedness of such entities.

I'll tell you how to build such graphs.

Automating the collection of “vs” data from Google

Here is a link you can use to get autocomplete suggestions from Google in XML format. It does not look like an API intended for general use, so it is probably best not to send it too many requests.

http://suggestqueries.google.com/complete/search?&output=toolbar&gl=us&hl=en&q=<search_term>

The URL parameter output=toolbar indicates that we want the results in XML format, gl=us sets the country code, hl=en sets the language, and q=<search_term> is the query for which you want autocomplete results.

The gl and hl parameters use the standard two-letter country and language codes.

Let's experiment, starting with the query tensorflow.

The first step is to request the URL above with the query written as q=tensorflow%20vs%20. The entire link looks like this:

http://suggestqueries.google.com/complete/search?&output=toolbar&gl=us&hl=en&q=tensorflow%20vs%20

In response, we will receive XML data.
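The request and the parsing of the reply could be sketched in Python roughly as follows. This is only an illustration: the helper names (build_url, parse_suggestions, fetch_suggestions) are made up for this example, and the XML layout (a suggestion element carrying the text in a data attribute) is an assumption about what the endpoint returns, so check it against a real response.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://suggestqueries.google.com/complete/search"


def build_url(term, gl="us", hl="en"):
    """Build the autocomplete URL for the query '<term> vs '."""
    params = {"output": "toolbar", "gl": gl, "hl": hl, "q": f"{term} vs "}
    # quote_via=quote encodes spaces as %20, as in the link shown in the article
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{BASE_URL}?{query}"


def parse_suggestions(xml_data):
    """Return the suggestion strings from an XML reply, in their original order."""
    root = ET.fromstring(xml_data)
    # Assumed layout: <suggestion data="..."/> elements somewhere under the root
    return [el.attrib["data"] for el in root.iter("suggestion") if "data" in el.attrib]


def fetch_suggestions(term, timeout=10):
    """Download and parse the suggestions for '<term> vs ' (makes a network request)."""
    with urllib.request.urlopen(build_url(term), timeout=timeout) as response:
        return parse_suggestions(response.read())


# Offline demonstration on a hand-written reply in the assumed format
SAMPLE = (
    '<?xml version="1.0"?><toplevel>'
    '<CompleteSuggestion><suggestion data="tensorflow vs pytorch"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="tensorflow vs keras"/></CompleteSuggestion>'
    '</toplevel>'
)
print(build_url("tensorflow"))
print(parse_suggestions(SAMPLE))
```

Calling fetch_suggestions("tensorflow") would perform the real request. Since the endpoint does not appear to be a public API, it is sensible to pause between calls.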

What to do with XML?

Now we need to check the autocomplete results against a set of criteria. We will keep working with the ones that pass.

Checking the results

When checking the results, I used the following criteria:

The suggested query should not contain the text of the original query (here, tensorflow).
The suggestion should not contain a query that has already been accepted (for example, pytorch).
The suggestion should not contain more than one "vs".
Once 5 suitable suggestions have been found, the rest are ignored.

This is just one way to "clean up" Google's list of autocomplete suggestions. I sometimes also find it useful to keep only one-word suggestions, but whether that helps depends on the situation.
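The four criteria above could be applied with a small filter like the one below. This is a sketch under assumptions: the function name filter_suggestions is invented for the example, "contains" is interpreted as a plain substring test, and the weight is taken to be 5 for the first accepted suggestion down to 1 for the fifth, which matches the later remark that two mutual top suggestions give an edge weight of 10.

```python
def filter_suggestions(term, suggestions, max_results=5):
    """Apply the cleaning criteria and return [(target, weight), ...]."""
    prefix = f"{term} vs "
    accepted = []
    for suggestion in suggestions:
        # Only look at suggestions that extend '<term> vs '
        if not suggestion.startswith(prefix):
            continue
        candidate = suggestion[len(prefix):].strip()
        if not candidate:
            continue
        # Criterion 1: must not contain the original query
        if term in candidate:
            continue
        # Criterion 2: must not contain an already accepted query
        if any(found in candidate for found in accepted):
            continue
        # Criterion 3: must not contain another "vs"
        if "vs" in candidate.split():
            continue
        accepted.append(candidate)
        # Criterion 4: stop after max_results accepted suggestions
        if len(accepted) == max_results:
            break
    # Assumed weighting: the first accepted suggestion gets the highest weight
    return [(c, max_results - rank) for rank, c in enumerate(accepted)]


raw = [
    "tensorflow vs pytorch",
    "tensorflow vs keras",
    "tensorflow vs pytorch vs keras",
    "tensorflow vs tensorflow lite",
    "tensorflow vs pytorch speed",
    "tensorflow vs scikit learn",
]
print(filter_suggestions("tensorflow", raw))
```

The raw list here is hand-written for illustration and is not real Google output.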

So, using this set of criteria, we got the following 5 results, each of which is assigned a certain weight.

5 results

Next iteration

Then these 5 suggestions go through the same processing as the initial query. Each is sent to the API with “vs” appended, and again the 5 autocomplete results that meet the criteria above are selected. Here is the result of processing the list above.

Finding autocomplete results for words that have already been found

You can continue this process by examining the words in the target column that have not yet been examined.

If you run enough iterations of this word search, you get a fairly large table containing information about queries and about weights. This data is well suited for graphical visualization.
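The repeated expansion can be sketched as a breadth-first crawl. The code below is an illustration only: crawl and get_targets are hypothetical names, and get_targets stands for whatever function returns the filtered (target, weight) pairs for a term. Here it is replaced by a hand-written dictionary so the example runs without network access.

```python
from collections import deque


def crawl(seed, get_targets, max_queries=25):
    """Expand queries breadth-first and return rows of (source, target, weight)."""
    rows = []
    visited = set()
    queue = deque([seed])
    while queue and len(visited) < max_queries:
        term = queue.popleft()
        if term in visited:
            continue  # already examined as a source
        visited.add(term)
        for target, weight in get_targets(term):
            rows.append((term, target, weight))
            if target not in visited:
                queue.append(target)  # examine this target in a later iteration
    return rows


# Hand-written stand-in for the autocomplete results
FAKE_RESULTS = {
    "tensorflow": [("pytorch", 5), ("keras", 4)],
    "pytorch": [("tensorflow", 5), ("keras", 4)],
    "keras": [("pytorch", 5)],
}
table = crawl("tensorflow", lambda term: FAKE_RESULTS.get(term, []))
for row in table:
    print(row)
```

The max_queries limit caps the number of terms examined, which also keeps the number of requests to the endpoint small.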

Ego graphs

The network graph that I showed at the beginning of the article is a so-called ego graph, built in our case for the query tensorflow. An ego graph is a graph in which every node lies within a specified distance of a central node, here tensorflow.
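To make the definition concrete, here is a small pure-Python sketch that finds the nodes of an ego graph with a shortest-path search (Dijkstra's algorithm). It assumes the edge distances are already known. The function name nodes_within_radius and the sample edges are invented for this illustration.

```python
import heapq


def nodes_within_radius(edges, center, radius):
    """Return {node: shortest distance} for nodes at most `radius` from `center`."""
    # Build an undirected adjacency list from (a, b, distance) triples
    adjacency = {}
    for a, b, distance in edges:
        adjacency.setdefault(a, []).append((b, distance))
        adjacency.setdefault(b, []).append((a, distance))

    best = {center: 0}
    heap = [(0, center)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, step in adjacency.get(node, []):
            candidate = dist + step
            # Keep the neighbor only if it is inside the radius and this path is shorter
            if candidate <= radius and candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return best


sample_edges = [
    ("tensorflow", "pytorch", 1),
    ("tensorflow", "keras", 2),
    ("keras", "theano", 4),
    ("tensorflow", "spark", 10),
    ("spark", "hadoop", 3),
]
print(nodes_within_radius(sample_edges, "tensorflow", radius=6))
```

With a radius of 6, spark and hadoop fall outside the ego graph because the shortest path to them is longer than 6.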

How is the distance between nodes determined?

Let's take a look at the finished graph first.

Ego-graph for tensorflow query with radius 22

We already know the weight of the edge connecting queries A and B. It is the rank of the suggestion in the autocomplete list, ranging from 1 to 5 (the top suggestion gets 5). To make the graph undirected, you can simply add up the weights of the connections going in both directions (from A to B and, if such a connection exists, from B to A). This gives edge weights ranging from 1 to 10.

The edge length (distance) is then calculated as distance = 11 - weight. We chose 11 because the maximum edge weight is 10 (an edge has that weight when both suggestions appear at the very top of each other's autocomplete lists). As a result, the minimum distance between queries is 1.

The size and color of a vertex are determined by the number (count) of times the corresponding query appears in the suggestion lists. So the larger the vertex, the more important the concept it represents.
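The weight, distance and count rules can be sketched as follows. The function name build_graph_data is hypothetical, the input rows are hand-written, and the code assumes each directed pair (source, target) occurs at most once in the table.

```python
from collections import Counter, defaultdict

MAX_WEIGHT = 10  # two mutual top suggestions: 5 + 5


def build_graph_data(rows):
    """Turn (source, target, weight) rows into node and edge lists."""
    weights = defaultdict(int)
    counts = Counter()
    for source, target, weight in rows:
        # Sum both directions into one undirected edge
        weights[tuple(sorted((source, target)))] += weight
        # Count how often a query appears as a suggestion
        counts[target] += 1
        counts[source] += 0  # make sure sources are listed even if never suggested
    edges = [
        (a, b, {"weight": w, "distance": MAX_WEIGHT + 1 - w})
        for (a, b), w in sorted(weights.items())
    ]
    nodes = [(name, {"count": c}) for name, c in sorted(counts.items())]
    return nodes, edges


rows = [
    ("tensorflow", "pytorch", 5),
    ("pytorch", "tensorflow", 5),
    ("tensorflow", "keras", 4),
    ("keras", "pytorch", 5),
]
nodes, edges = build_graph_data(rows)
print(nodes)
print(edges)
```

The resulting lists have the same shape as the nodes and edges used with networkx later in the article.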

The ego graph in question has a radius of 22. This means that every query can be reached from the tensorflow vertex by travelling a distance of no more than 22. Let's take a look at what happens if we increase the radius of the graph to 50.

Ego-graph for tensorflow query with radius 50

It turned out interesting! This graph contains most of the basic technologies that those involved in artificial intelligence should know about. Moreover, the names of these technologies are logically grouped.

And it's all built around one single keyword.

How to draw such graphs?

I used the Flourish online tool to draw this graph.

This service allows you to build network graphs and other diagrams using a simple interface. I suppose it is worth looking at for those interested in building ego graphs.
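If you want to move the data into a charting tool, one option is to export it as CSV. The sketch below writes one table of points and one table of links. The function name write_graph_csv and the column names (id, count, source, target, weight) are my own choices for illustration, not something the tool prescribes, so check what your tool expects before uploading.

```python
import csv
import io


def write_graph_csv(nodes, edges, points_file, links_file):
    """Write nodes and edges to two CSV files (any writable text files)."""
    points = csv.writer(points_file)
    points.writerow(["id", "count"])  # hypothetical column names
    for name, attrs in nodes:
        points.writerow([name, attrs["count"]])

    links = csv.writer(links_file)
    links.writerow(["source", "target", "weight"])  # hypothetical column names
    for a, b, attrs in edges:
        links.writerow([a, b, attrs["weight"]])


sample_nodes = [("tensorflow", {"count": 13}), ("pytorch", {"count": 6})]
sample_edges = [("pytorch", "tensorflow", {"weight": 10, "distance": 1})]

# In-memory buffers are used here; real files opened with newline="" work the same way
points_buf, links_buf = io.StringIO(), io.StringIO()
write_graph_csv(sample_nodes, sample_edges, points_buf, links_buf)
print(points_buf.getvalue())
print(links_buf.getvalue())
```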

How to create an ego graph with a given radius?

You can use the Python package networkx to create an ego graph with a given radius. It has a very handy function, ego_graph. The radius of the graph is specified when this function is called.

import networkx as nx

# Sample data (the full lists contain many more entries)
nodes = [('tensorflow', {'count': 13}),
         ('pytorch', {'count': 6}),
         ('keras', {'count': 6}),
         ('scikit', {'count': 2}),
         ('opencv', {'count': 5}),
         ('spark', {'count': 13})]

edges = [('pytorch', 'tensorflow', {'weight': 10, 'distance': 1}),
         ('keras', 'tensorflow', {'weight': 9, 'distance': 2}),
         ('scikit', 'tensorflow', {'weight': 8, 'distance': 3}),
         ('opencv', 'tensorflow', {'weight': 7, 'distance': 4}),
         ('spark', 'tensorflow', {'weight': 1, 'distance': 10})]

# Build the graph
G = nx.Graph()
G.add_nodes_from(nodes)
G.add_edges_from(edges)

# Build the ego graph around 'tensorflow'
EG = nx.ego_graph(G, 'tensorflow', distance='distance', radius=22)

# Find the 3-edge-connected subgraphs (yielded as sets of nodes)
subgraphs = nx.algorithms.connectivity.edge_kcomponents.k_edge_subgraphs(EG, k=3)

# Keep the subgraph that contains 'tensorflow'
for s in subgraphs:
    if 'tensorflow' in s:
        break
pruned_EG = EG.subgraph(s)

ego_nodes = pruned_EG.nodes()
ego_edges = pruned_EG.edges()

In addition, I used another function here: k_edge_subgraphs. It is used to remove results that do not meet our needs.

For example, storm is an open-source framework for real-time distributed computing. But it is also a character from the Marvel universe. Which search suggestions do you think will "win" if you type "storm vs" into Google?

The k_edge_subgraphs function finds groups of vertices that cannot be disconnected by removing fewer than k edges. As it turned out, the values k=2 and k=3 work well here. In the end, only the subgraph that contains tensorflow is kept. This ensures that we don't stray too far from where we started our search.

Using ego graphs in life

Let's move away from the tensorflow example and consider another ego graph, this time for something else that interests me: the chess opening known as the "Spanish Game" (the Ruy Lopez).

▍Research of chess openings

Researching the "Spanish Game" (ruy lopez)

Our method allowed us to quickly discover the most common opening ideas, which can help a chess researcher.

Now let's look at other examples of using ego graphs.

▍Healthy food

Cabbage! Yummy!

But what if you want to replace the beautiful, incomparable cabbage with something else? The ego graph built around cabbage (kale) will help you with this.

Ego graph for kale query with radius 25

▍Buying a dog

There are so many dogs, and so little time... I need a dog. But which one? Maybe something like a poodle?

Ego graph for poodle query with radius 18

▍Looking for love

A dog and cabbage haven't changed anything? Need to find your soulmate? If so, here is a small but entirely self-contained ego graph that can help.

Ego graph for the coffee meets bagel query with radius 18

▍What if the dating apps didn’t help?

If dating apps are useless, then instead of lingering in them you could watch a TV series, stocking up on cabbage-flavored ice cream (or ice cream flavored with the recently discovered arugula). If you like The Office (the UK version, of course), you may also like some other shows.

Ego graph for the office query with radius 25

Summary

That concludes my story about the use of the word "vs" in Google searches and about ego graphs. I hope all of this helps you at least a little in finding love, a good dog and healthy food.

Are you using any unusual search techniques on the Internet?
