Scrolling through the feed of our favorite social network half-asleep in the morning, we hardly think about how the algorithm behind it works, the one that serves up information we find interesting. Thanks to this and other algorithms, content follows us everywhere. If you're lucky, it feels like a large cozy blanket of atmospheric photos and music; if you're not, it trails behind you like an annoying, sticky cloud that you want to brush aside, which is not always possible.
It seems we did not notice the moment the physical world gained a new dimension: the dimension of content, with its own rules and characteristics. But we got used to it quickly.
The abundance of information makes us forget how to find and sift out grains of knowledge and experience: after all, it arrives on our plate ready and sorted, like an assortment of delicacies. But where does all this come from and, most importantly, how can we influence our content environment? Can we at all?
History of Ranking and Searching
Contrary to popular belief, tools for selecting and ranking information for various useful purposes are a fairly old invention. They appeared not in our time, but in the era of the now half-forgotten library catalogs.
Before the invention of the printing press in the 15th century, a library catalog was just an inventory of precious books with their titles. It was the arrival of printed copies that created the need, among librarians and readers alike, for cataloging and a convenient way to find the works they wanted.
It is rather difficult to establish who exactly created the first catalog. Some sources attribute its invention to Johannes Trithemius, Abbot of Sponheim, a librarian, historian and lover of cryptography, but most mention Gottfried van Swieten, an Austrian official and prefect of the Imperial Library in Vienna.
It was Gottfried van Swieten who in 1780 created the first card catalog, very similar to modern library catalogs: cards with the title of the book, the name of the author, the year of publication and a brief description. We can say that the card catalog was a harbinger of modern search engines. In effect, it was the first metadata, that is, information about other information, needed for search and navigation. Of course, van Swieten's modest cards could not meet all the needs of readers and researchers, but they were superseded only in 1876, thanks to the invention of the American librarian Melvil Dewey.
Dewey worked for a long time on making cataloging more efficient and arrived at a completely new system based on classifying books by content, the so-called decimal system. Its idea was to divide all works into ten main classes, from general works through religion and language to geography and history. Each class, in turn, was divided into ten divisions, and so on, with the code formed from the numerical indexes of the class and its subdivisions, written from left to right, for example:
500 Natural sciences and mathematics
510 Mathematics
516 Geometry
In effect, it was the first country-wide subject directory, making it easy to find any information you needed. Moreover, because its subject indexes contain no non-numeric characters, Dewey's system was ideally suited to machine processing, and it is still in use in libraries in the USA and Canada.
This invention pushed the Belgian bibliographers Paul Otlet and Henri La Fontaine toward an even more daring idea: to replace paper books with a system of electronic cards holding fragments of information, which could be classified independently of the author's subjective opinion. In 1935, this idea was set out in Paul Otlet's book "Monde", which, according to many researchers, anticipated the creation of the Internet. Unfortunately, this book is difficult to find in Russian, so I will give only one quote in English:
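To see why purely numeric codes are convenient for machines, here is a minimal Python sketch that recovers the chain of parent classes from a class number. It assumes plain three-digit numbers such as 516 (decimal extensions are out of scope), and the function name `dewey_ancestors` and the `LABELS` table are made up for this illustration; the labels are simply the three from the example above.

```python
def dewey_ancestors(code: str) -> list[str]:
    """Return the chain of classes from the main class down to `code`.

    Assumes a plain three-digit class number such as "516".
    """
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"expected a three-digit class number, got {code!r}")
    chain = []
    for depth in range(1, 4):
        # Keep the first `depth` digits and pad with zeros on the right.
        prefix = code[:depth].ljust(3, "0")
        if prefix not in chain:
            chain.append(prefix)
    return chain


# Illustrative labels only: the three classes quoted in the article.
LABELS = {
    "500": "Natural sciences and mathematics",
    "510": "Mathematics",
    "516": "Geometry",
}

for class_number in dewey_ancestors("516"):
    print(class_number, LABELS[class_number])
```

Reading the code digit by digit is enough to walk from the general class to the specific topic, with no lookup of free-text subject names.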
“Everything in the universe, and everything of man, would be registered at a distance as it was produced. In this way a moving image of the world will be established, a true mirror of his memory. From a distance, everyone will be able to read text, enlarged and limited to the desired subject, projected on an individual screen. In this way, everyone from his armchair will be able to contemplate creation, as a whole or in certain of its parts.”
Sounds a lot like our present-day reality, doesn't it?
Unfortunately, Paul Otlet's ideas did not become reality during his lifetime, and the World Wide Web was born much later. Then in 1998, with Sergey Brin and Larry Page's invention of the PageRank algorithm for ranking web pages, the era of endless web surfing began.
Information became accessible, and search became convenient and easy. And with the arrival of new storage and computing power, businesses began to collect data.
The Double-Edged Sword of Big Data
Increasing data collection promises new business opportunities, from better customer insight to completely new digital products.
Analytics has turned from the painstaking, jeweler-like work of testing each hypothesis into a search for stable patterns in huge amounts of data describing people and the world around them. This approach made it possible to see things that were simply not visible before, to model and optimize all kinds of processes, from advertising to product offers, and to personalize the customer experience in different areas, improving it to the delight of both the client and the business. This leap, in my opinion, is comparable to the transition from a medieval book inventory to a coherent system of card catalogs, where each object is assigned its own shelf space and tag.
Nevertheless, working with big data has not yet become a panacea for everything, and there are several reasons for this.
- , , , . , – , , , .
- , . , , , , , .
- , , . , – , .
- – , , – , -.
Despite these limitations, more and more companies are finding the resources and opportunities to deploy their own services that personalize the customer experience and improve their bottom line. From a source of knowledge, data is turning into a source of monetization, sometimes quite aggressive. In some cases there are even side effects for both the client and the business, from information overload to the so-called content bubble. But before we talk about them, let's figure out what is hiding under the hood of recommendations.
Under the Hood of Personal Recommendations
Most models that recommend content, products, or services are built on one of five simple concepts.
- . , – , , , .
- . , / , , .
- . , , « – » . , – , .
- . , – , . , – , . – , . , , 70- – .
- – , .
Recommendation Issues and Reloading the Content Environment
All of these models work pretty well (even heuristics!), but they can still lead to unpleasant situations:
- Oversaturation. Many similar models trained on incomplete data (after all, every company holds only a piece of the knowledge) bombard you with the same offers. Let's say you are a coffee lover. This morning you were offered a wonderful, fragrant cappuccino at the nearest cafe. The offer delighted you, and you enjoyed every sip down to the crema. But then another push notification arrives hinting at coffee, then another banner, and now there are fifteen of them. How many cups of coffee can you drink in a day?
- – , , / , . , – .
- – -, .
- – , , 9 , . , , .
- – , , , - . , .
Such situations are highly undesirable not only for the client but also for the business, since they can significantly reduce the desire to keep interacting with the advertised services or to use a particular product or application.
A significant part of them can be corrected within the recommendation system itself. For example, badly timed or intrusive recommendations are eliminated by a well-designed communication policy and schedule.
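As a rough illustration of such a communication policy, the Python sketch below caps how many messages on one topic a user can receive within a time window and blocks sending during quiet hours. The class name `ContactPolicy`, its parameters and the default values are assumptions made for this example, not a description of any particular product.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class ContactPolicy:
    """Hypothetical frequency cap with quiet hours (illustration only)."""

    def __init__(self, max_per_window=2, window=timedelta(hours=24),
                 quiet_start=22, quiet_end=8):
        self.max_per_window = max_per_window
        self.window = window
        self.quiet_start = quiet_start
        self.quiet_end = quiet_end
        # (user, topic) -> timestamps of messages already sent
        self._sent = defaultdict(deque)

    def _is_quiet(self, hour):
        if self.quiet_start <= self.quiet_end:
            return self.quiet_start <= hour < self.quiet_end
        # The quiet period wraps around midnight, e.g. 22:00 to 08:00.
        return hour >= self.quiet_start or hour < self.quiet_end

    def allow(self, user, topic, now):
        """Return True and record the send if a message may go out now."""
        if self._is_quiet(now.hour):
            return False
        history = self._sent[(user, topic)]
        # Forget sends that have fallen out of the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.max_per_window:
            return False
        history.append(now)
        return True


policy = ContactPolicy(max_per_window=2)
morning = datetime(2024, 1, 1, 9, 0)
for hours in (0, 1, 2):
    moment = morning + timedelta(hours=hours)
    print(moment.time(), policy.allow("anna", "coffee", moment))
```

With a cap of two, the third coffee offer of the morning is simply not sent, which is the effect the fifteen-banner scenario above calls for.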
Even the content bubble can become less monotonous if you add competing algorithms to the recommendation system that will show alternative proposals, or an additional element of randomness that will offer you something completely new and, if interested, expand the boundaries of the recommendations (see Figure 1).
Figure 1. Competing models with random additions.
Nevertheless, some of the consequences of imperfect recommendations we will have to treat ourselves. What methods can help you fight for an enjoyable content environment?
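The idea of competing models with random additions can be sketched in a few lines of Python. This is only one possible reading of it: the models take turns filling the slots, and with some probability a slot is given to a random item from the catalog instead. The function name `blend_recommendations`, the parameter `explore_prob` and the sample items are all hypothetical.

```python
import random


def blend_recommendations(ranked_lists, catalog, k=5, explore_prob=0.2, rng=None):
    """Fill up to k slots from competing models, with random exploration.

    ranked_lists: one ranked list of items per model (best first).
    catalog: all items that may be shown as a random addition.
    """
    rng = rng or random.Random()
    queues = [list(items) for items in ranked_lists]
    result, seen = [], set()
    turn = 0
    while len(result) < k:
        unseen_catalog = [item for item in catalog if item not in seen]
        pick = None
        if unseen_catalog and rng.random() < explore_prob:
            # Random addition: something no model necessarily proposed.
            pick = rng.choice(unseen_catalog)
        else:
            # Round-robin: the next model with an unseen item gets the slot.
            for offset in range(len(queues)):
                index = (turn + offset) % len(queues)
                queue = queues[index]
                while queue and queue[0] in seen:
                    queue.pop(0)
                if queue:
                    pick = queue.pop(0)
                    turn = index + 1
                    break
        if pick is None:
            if not unseen_catalog:
                break  # nothing left to show
            pick = rng.choice(unseen_catalog)
        seen.add(pick)
        result.append(pick)
    return result


model_a = ["cappuccino", "latte", "espresso"]
model_b = ["latte", "green tea"]
catalog = model_a + model_b + ["hiking boots", "jazz playlist"]
print(blend_recommendations([model_a, model_b], catalog, k=4, rng=random.Random(0)))
```

Raising `explore_prob` makes the feed less predictable; setting it to zero reduces the scheme to plain alternation between the models.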
How to Improve Your Content Environment
To find your way to content that is relevant to you, try playing with the algorithms around you and find out what they respond to best. But before that, I suggest adopting a few simple rules of data hygiene that will save you from the most annoying recommendations.
- – , , , . – , – , email.
- – , .
- – « », , - .
- Be careful when paying for purchases - it is best to have separate payment instruments for all family members, and sometimes for separate purposes.
- Turn off Wi-Fi from time to time in places with many public networks.
Beyond that, use active search more often and try something new. Most good recommender models use not only historical data (data on your activity over a long period) but also data on your current actions, giving the latter higher priority. After playing around with new queries for a while, you can get a portion of content that suits your current mood.
And if this does not seem enough, join the ranks of data scientists to create that ideal recommendation system and learn all its subtleties from the inside. Machine learning cannot do without an inquisitive human mind!
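One common way to give current actions more weight than old ones is exponential time decay, where the weight of an event halves after a fixed period. The Python sketch below is an assumption about how such weighting might look, not a description of any specific recommender; the function name `topic_scores`, the one-week half-life and the sample events are made up.

```python
from collections import defaultdict


def topic_scores(events, now, half_life_days=7.0):
    """Score topics so that recent activity counts more than old activity.

    events: iterable of (topic, day) pairs, where day is a number of days
    on the same scale as `now`.
    """
    scores = defaultdict(float)
    for topic, day in events:
        age = now - day
        # The weight of an event halves every `half_life_days`.
        scores[topic] += 0.5 ** (age / half_life_days)
    return dict(scores)


# Ten coffee searches two months ago versus two hiking searches today.
history = [("coffee", 0.0)] * 10 + [("hiking", 60.0)] * 2
scores = topic_scores(history, now=60.0)
print(sorted(scores, key=scores.get, reverse=True))
```

Under these assumptions, two fresh searches outweigh ten old ones, which is why a few new queries can noticeably shift what you are shown.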