Instead of a preface
It all started with the fact that I was offered to participate in the project within the framework of the "Basics of Web Programming" subject, instead of doing laboratory work and coursework, since I stated that I wanted to be doing something remote from the general course (and so there was already enough knowledge on a bunch of DRF + Vue, I wanted something new). And so in one of my PRs on github, I decided to use full-text search (the assignment hinted at this) to filter the content, which made me turn to the Django documentationin search of how best to implement this business. I think you know most of the methods suggested there (contains, icontains, trigram_similar). All of them are suitable for some specific tasks, but not very good at, namely, full-text search. Scrolling down a bit, I came across a section that talked about the interaction of Django and Pgsql to implement document-based search, which attracted me, since postgre has a built-in tool for implementing this very [full-text] search. And I decided that most likely, django simply provides an API for this search, on the basis of which such a solution should work and more accurate and faster than any other options. The teacher did not believe me too much, we argued with him, and he offered to conduct research on this topic. And here I am.
Beginning of work
The first problem that arose before me was the search for a database mockup, so as not to come up with any incomprehensible things myself, and I went to google and read the postgres wiki . As a result, I settled on their demo base about flights across Russia.
Okay, the base has been found. Now you need to decide what filtering methods will be used for comparison. The first thing I would like to use is, of course, the standard search method from django.contrib.postgres.search. The second is contains (searches for a word in a string) and icontains (provides data, ignoring accents, for example: for the query "Helen" the result will be: <Author: Helen Mirren>, <Author: Helena Bonham Carter>, <Author: Hélène Joy>) which django itself provides. I also want to compare all these filtering methods with the built-in search inside postgresql. I decided to search for the tickets table in the small version, it contains 366733 entries. The search will be performed on the passenger_name field, which, as you might guess, contains the passenger's name. It is written in transliteration.
Let django work with an existing database
β django . django , , :
$ python manage.py inspectdb > models.py
, , settings.py. . , . , ( ), , 300+ , 10, , . , , curl. .
, , , curl, , . , ( ).
django
, β , queryset - . .
A QuerySet is iterable, and it executes its database query the first time you iterate over it. For example, this will print the headline of all entries in the database:
for e in Entry.objects.all(): print(e.headline)```
class TicketListView(g.ListAPIView):
serializer_class = TicketSerializer
def get_queryset(self):
queryset = ''
params = self.request.query_params
name = params.get('name', None)
if name:
start_time = d.datetime.now()
queryset = queryset.filter(passenger_name__contains=name)
print('len of result is {} rows'.format(len(queryset)))
end_time = d.datetime.now()
time_diff = (end_time - start_time)
execution_time = time_diff.total_seconds() * 1000
print("Filter execution time {} ms".format(execution_time))
return queryset
Contains
Let's start with contains, it basically works like a WHERE LIKE.
queryset = queryset.filter(passenger_name__contains=name)
SELECT "tickets"."ticket_no", "tickets"."book_ref", "tickets"."passenger_id", "tickets"."passenger_name", "tickets"."contact_data" FROM "tickets" WHERE "tickets"."passenger_name"::text LIKE %IVAN%
In order to get the result from curl, I executed the request as follows (counted in seconds):
$ curl -w "%{time_total}\n" -o /dev/null -s http://127.0.0.1:8000/api/tickets/?name=IVAN
1,242888
I put everything in a table on the appropriate sheet.
β , 140 1400 . , . ORM 73 600 , 55 100 .
Icontains
Icontains - ( , ). , contains β icontains. .
queryset = queryset.filter(passenger_name__icontains=name)
SELECT "tickets"."ticket_no", "tickets"."book_ref", "tickets"."passenger_id", "tickets"."passenger_name", "tickets"."contact_data" FROM "tickets" WHERE UPPER("tickets"."passenger_name"::text) LIKE UPPER(%IVAN%)
, , ( 300 ), 200 1500 . ORM β 200 700 .
Full text search ( django.contrib.postgres)
, full text search . 1300 , 1000 1700 . , ORM β 1000 1450 .
class TicketListView(g.ListAPIView):
serializer_class = TicketSerializer
def get_queryset(self):
# queryset = Tickets.objects.all()
queryset = ''
params = self.request.query_params
name = params.get('name', None)
if name:
start_time = d.datetime.now()
queryset = Tickets.objects.filter(passenger_name__search=name)
end_time = d.datetime.now()
time_diff = (end_time - start_time)
execution_time = time_diff.total_seconds() * 1000
print("Filter execution time {} ms".format(execution_time))
f = open('results.txt', 'a')
f.write('{}'.format(execution_time))
f.write('\n')
f.close()
return queryset
Full text search ( rest_framework.filters, β SearchFilter)
FTS, FTS , , contains icontains. 200 1710 .
FTS , . , 800 1120 .
...
from rest_framework import filters as f
class TicketListView(g.ListAPIView):
queryset = Tickets.objects.all()
serializer_class = TicketSerializer
filter_backends = [f.SearchFilter]
search_fields = ['@passenger_name']
django-filter
contains icontains, . , django-filter - Django ORM.
?
β (, , ) , . β . , ( , , contains/icontains) , , , , .
Overall, my understanding of some of the inner workings of django has settled down thanks to this research. And finally came the realization of the difference between substring search and full-text search. The difference in their implementation through the Django ORM.