Common pitfalls of Python developers in interviews





Hello everyone, today I would like to talk about some of the difficulties and misconceptions that many job seekers face. Our company is growing actively, and I often conduct or take part in interviews, so I have identified several issues that put many candidates in a difficult position. Let's look at them together. I will cover Python-specific questions, but overall the article applies to any job interview. Experienced developers will find no revelations here, but for those who are just starting their journey it should make it easier to pick topics to study over the next few days.



The difference between processes and threads in Linux



A typical and, on the whole, simple question, asked purely to check understanding, without digging into details and subtleties. Most applicants will tell you that threads are lighter weight, that context switches between them are faster, and that they generally live inside a process. All of this is correct and wonderful as long as we are not talking about Linux. In the Linux kernel, threads are implemented the same way as ordinary processes: a thread is simply a process that shares some resources with other processes.



There are two system calls that can be used to create processes in Linux:



  • clone() creates a new process and lets the caller specify exactly which resources the child will share with the parent (address space, open file descriptors, signal handlers, and so on). This is the call used to create threads.
  • fork() creates a child process as a copy of the parent. In the Linux kernel it is implemented on top of the same mechanism as clone(), just without resource sharing.


I would point out the following: when you fork() a process, you do not immediately get a copy of the parent process's memory. Both processes initially run on a single in-memory instance, so even if their combined memory would, on paper, no longer fit, everything keeps working. The kernel marks the parent's memory pages as read-only, and an attempt to write to them (by either the child or the parent) raises an exception that the kernel handles by creating a full copy of the affected page. This mechanism is called Copy-on-Write.
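As a small illustration, os.fork() in Python exposes this mechanism directly. The sketch below only runs on Unix-like systems and demonstrates process isolation rather than Copy-on-Write itself, which happens invisibly inside the kernel:

import os

# A large list that, right after fork(), is backed by the same physical pages.
data = list(range(1_000_000))

pid = os.fork()
if pid == 0:
    # Child: writing dirties the affected pages, and the kernel copies them;
    # the parent's copy stays untouched.
    data[0] = -1
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print(data[0])  # still 0 in the parent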



On the subject of how Linux works, I recommend the excellent book "Linux System Programming" by Robert Love.



Event Loop issues



Asynchronous services and workers in Python or Go are ubiquitous in our company, so we consider it important that candidates have a general understanding of asynchrony and of how the Event Loop works. Many candidates already answer questions about the advantages of the asynchronous approach quite well and correctly describe the Event Loop as a kind of infinite loop that finds out whether a certain event has arrived from the operating system (for example, data being written to a socket). But the glue is often missing: how exactly does the program get this information from the operating system?



Of course, the simplest mechanism to remember is select. With its help, you build a list of file descriptors that you want to monitor. The client code then has to check every descriptor it passed for events (and their number is limited to 1024), which makes select slow and inconvenient.
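For illustration, a minimal sketch of how select is used from Python; the host and request are placeholders chosen only for the example:

import select
import socket

# Open a connection and send a trivial HTTP request.
sock = socket.create_connection(("example.com", 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")

# Block until the socket is readable or the 5-second timeout expires.
readable, _, _ = select.select([sock], [], [], 5.0)
if readable:
    print(sock.recv(4096))
sock.close()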



An answer about select is more than enough, but if you also remember poll or epoll and can talk about the problems they solve, it will be a big plus for your answer. To avoid unnecessary worry: we do not ask for C code or a detailed specification, we are only talking about a basic understanding of what is going on. You can read about the differences between select, poll and epoll in this article.
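In Python you rarely call epoll directly; the standard selectors module picks the most efficient mechanism available on the platform (epoll on Linux). Below is a minimal echo-server sketch, with an arbitrary port chosen for the example:

import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS, etc.

server = socket.socket()
server.bind(("localhost", 9000))  # port picked arbitrarily for this sketch
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

while True:
    for key, _ in sel.select():
        if key.fileobj is server:
            # New incoming connection: register it for read events.
            conn, _ = server.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            # Existing connection is readable: echo the data back.
            data = key.fileobj.recv(1024)
            if data:
                key.fileobj.sendall(data)
            else:
                sel.unregister(key.fileobj)
                key.fileobj.close()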



I also advise you to look at David Beazley's material on asynchrony in Python.



The GIL protects, but not you



Another common misconception is that the GIL was designed to protect developers from concurrent data access issues. This is not the case. The GIL will, of course, prevent you from parallelizing your program with threads (but not with processes). In simple terms, the GIL is a lock that must be acquired before any call into the interpreter, no matter whether Python code is being executed or the Python C API is being called. So the GIL protects the interpreter's internal structures from inconsistent states, but you, just as in any other language, still have to use synchronization primitives yourself.



It is also said that the GIL is only needed for the GC to work correctly. It is indeed needed for that, but that is not the whole story.
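Reference counting is exactly the kind of internal interpreter state the GIL keeps consistent. A tiny sketch (the exact numbers may differ between Python versions):

import sys

x = []
print(sys.getrefcount(x))  # typically 2: the name x plus the temporary argument
y = x
print(sys.getrefcount(x))  # one more reference, typically 3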



From an execution point of view, even the simplest function will be broken down into several steps:



import dis

def sum_2(a, b):
    return a + b

dis.dis(sum_2)


  4           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE





From the processor's point of view, none of these operations is atomic: Python executes many machine instructions per bytecode instruction. While one of them is running, other threads must not be allowed to change the state of the stack or perform any other memory modification, since that would lead to a segmentation fault or incorrect behavior. That is why the interpreter holds the global lock for each bytecode instruction. However, a thread switch can happen between individual instructions, and there the GIL does not save us at all. You can read more about bytecode and how to work with it in the documentation.
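For example, even a bare a += 1 on a global variable compiles into several bytecode instructions, and a thread switch between the load and the store is enough to lose an update (the exact opcodes depend on the Python version):

import dis

a = 0

def increment():
    global a
    a += 1

dis.dis(increment)
# Roughly: LOAD_GLOBAL a, LOAD_CONST 1, an in-place add, STORE_GLOBAL a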



To see that the GIL does not protect your data, look at a simple example:



import threading

a = 0
def x():
    global a
    for i in range(100000):
        a += 1

threads = []

for j in range(10):
    thread = threading.Thread(target=x)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

assert a == 1000000





On my machine, the assertion fails consistently. If it happens to pass for you, run the script a few times or add more threads. With a small number of threads the problem is flaky: the error sometimes appears and sometimes does not. So besides producing incorrect data, such situations are also hard to reproduce because of their intermittent nature. This brings us to the next topic: synchronization primitives.



And once again I cannot help but refer you to David Beazley.



Synchronization primitives



In general, synchronization primitives are not the most Python-specific question, but they show a general understanding of the problem and how deeply you have dug into the subject. The topic of multithreading, at least with us, is asked as a bonus, and answering it will only be a plus. It is fine if you have not encountered it yet; one could say this question is not tied to a specific language.



Many novice Python developers, as I wrote above, hope for the miraculous power of the GIL and so never look into synchronization primitives. In vain: they come in handy when performing background operations and tasks. The topic of synchronization primitives is large and well covered; in particular, I recommend reading about it in the book "Core Python Applications Programming" by Wesley J. Chun.



And since we have already seen an example where the GIL did not help us when working with threads, let's look at the simplest way to protect ourselves from this problem.



import threading
lock = threading.Lock()

a = 0
def x():
    global a
    lock.acquire()
    try:
        for i in range(100000):
            a += 1
    finally:
        lock.release()

threads = []

for j in range(10):
    thread = threading.Thread(target=x)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

assert a == 1000000
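The same protection is usually written more idiomatically with the lock as a context manager; a small equivalent sketch:

import threading

lock = threading.Lock()
a = 0

def x():
    global a
    with lock:  # acquire() on entry, release() on exit, even if an exception is raised
        for i in range(100000):
            a += 1

threads = [threading.Thread(target=x) for j in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

assert a == 1000000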






Retries as a cure-all



You can never rely on the infrastructure always working stably. In interviews we often ask candidates to design a simple microservice that interacts with other services (for example, over HTTP). The question of resilience sometimes confuses candidates, and I would like to point out a few issues they overlook when they propose simply retrying HTTP requests.



The first problem: the service may simply be down for a long time, so repeating the request in real time, while the caller waits, is pointless.



A crudely implemented retry can finish off a service that has already started to slow down under load. The last thing it needs is extra load, which can grow significantly because of repeated requests. We always find it interesting to discuss ways of saving state and re-sending the requests once the service is healthy again.
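As a sketch of a gentler approach: a retry with exponential backoff, jitter and a bounded number of attempts, so that clients do not hammer a struggling service in lockstep. It assumes the requests library, and the function name and parameters are purely illustrative:

import random
import time

import requests

def fetch_with_backoff(url, max_attempts=5, base_delay=0.5, max_delay=30.0):
    for attempt in range(max_attempts):
        try:
            response = requests.get(url, timeout=5)
            response.raise_for_status()
            return response
        except requests.RequestException:
            if attempt == max_attempts - 1:
                raise  # give up and let the caller decide what to do next
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))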



Alternatively, you can try to change the protocol from HTTP to something with guaranteed delivery (AMQP, etc.).



A service mesh can also take over retries. You can read more in this article.



Overall, as I said, there are no revelations here, but this article can help you figure out which topics to brush up on, not only for interviews but also for a deeper understanding of what is actually going on under the hood.


