Why it is worth testing applications for race conditions

If your application or service uses an internal currency, you should test it for race condition vulnerabilities. A race condition is an intermittent ("floating") bug that attackers can exploit. In short, by taking advantage of parallel code execution, an attacker can gain access to the application's internal currency, manipulate it and, if desired, cause significant financial damage to the owner of the service. We recently discovered this problem at one of our clients and helped fix it.







What is a race condition



Developers often forget that code can be executed by several threads at the same time, so they do not test the product for race conditions, although this error is quite common.



From the backend's point of view, it looks like this: several threads simultaneously access the same shared resource, such as a variable or a file, that is not protected by a lock or any other synchronization. This leads to inconsistent data.



Here is a concrete example of such a vulnerability. Let's say we have an application that allows users to transfer bonuses between payment wallets. The attacker has two wallets, A and B, and each of them holds 1000 bonuses. The diagram shows how, by manipulating the timing of the transaction requests, an attacker can inflate the amount credited to his account and turn a transfer of 10 bonuses into 20.
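The wallet scenario can be reproduced in a few lines. The following is a minimal Python sketch, not the code of any real service: the `balances` dictionary, the `transfer` function and the barrier that forces the unlucky timing are all assumptions made for illustration. Two identical transfers of 10 bonuses from A to B both read A's balance before either of them writes it back, so A is debited once while B is credited twice.

```python
import threading


def run_double_transfer():
    """Simulate two simultaneous transfers of 10 bonuses from A to B."""
    balances = {"A": 1000, "B": 1000}      # hypothetical in-memory wallets
    both_have_read = threading.Barrier(2)  # forces the racy interleaving
    credit_lock = threading.Lock()         # keeps the credit itself correct

    def transfer(amount):
        balance_a = balances["A"]          # unsynchronized read of shared state
        both_have_read.wait()              # both threads now hold the stale value 1000
        if balance_a >= amount:
            balances["A"] = balance_a - amount   # both threads write 990
            with credit_lock:
                balances["B"] += amount          # B is credited twice

    threads = [threading.Thread(target=transfer, args=(10,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balances
```

Under this forced timing wallet A ends at 990 and wallet B at 1020, so the attacker's combined balance grows from 2000 to 2010 even though only 10 bonuses were actually debited.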







There are automated tools for finding such vulnerabilities. One example is RacePWN, which sends many HTTP requests to the server in the shortest possible time and accepts a JSON configuration as input, which makes the attack easier to carry out. Without such a tool, the same thing is done manually by sending POST requests.
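The idea behind such tools, releasing many requests at the same moment, can be sketched with the Python standard library alone. This is not RacePWN and does not reproduce its configuration format; `fire_together` and `post_form` are hypothetical helpers, and the URL and form field in the usage comment are placeholders.

```python
import threading
import urllib.parse
import urllib.request


def post_form(url, fields):
    """Send one POST request with form-encoded fields and return the HTTP status."""
    data = urllib.parse.urlencode(fields).encode()
    request = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def fire_together(send, count):
    """Call send(i) from `count` threads, all released at the same moment."""
    start = threading.Barrier(count)
    results = [None] * count

    def worker(i):
        start.wait()                 # hold every thread until all are ready
        try:
            results[i] = send(i)
        except Exception as exc:     # keep the error so the caller can inspect it
            results[i] = exc

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical usage against a test stand (placeholder URL and field name):
# fire_together(lambda i: post_form("https://shop.example/coupon", {"code": "PROMO10"}), 20)
```

The barrier is the important part: starting threads in a loop spreads the requests out over time, while releasing them together narrows the window and makes a race on the server far more likely to show up.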



Deadly race condition



From June 1985 to January 1987, a race condition in the Therac-25 radiation therapy machine, built by the Canadian Crown corporation Atomic Energy of Canada Limited (AECL), caused six radiation overdoses in the United States and Canada. The victims received doses estimated at up to tens of thousands of rad; a dose of 1000 rad is considered lethal. Several of the victims died as a result of the radiation injuries they received.



Earlier Therac models had hardware safety mechanisms: independent interlock circuits that monitored the electron beam, mechanical interlocks, hardware circuit breakers, and fuses. In the Therac-25 the hardware protection was removed, and safety became the responsibility of the software. The machine had several operating modes, and because of the race condition the operator sometimes could not tell which mode it was actually in. During the court proceedings, it emerged that the Therac-25 software had been developed by a single programmer, but AECL had no information about who exactly it was.



Following these proceedings, the US government significantly tightened the requirements for the design and operation of systems whose safety is critical to human life.



How to protect yourself



The easiest and cheapest way to solve the race condition problem is to design the application architecture correctly. Here is what it should provide for:



  • Locking critical records in the database. There are different ways to ensure that only one thread works with a record at any given moment. The main thing is not to lock anything unnecessary.
  • Transaction isolation in the database, which ensures that conflicting transactions behave as if they were executed one after another. The most important thing here is to strike a balance between safety and speed.
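As an illustration of database-level locking, here is a minimal sketch that uses SQLite from the Python standard library. The `orders` table, its columns and the function names are assumptions made for this example. `BEGIN IMMEDIATE` takes SQLite's write lock before the record is read, so the check and the update cannot be interleaved with another writer. In PostgreSQL or MySQL the same idea is usually expressed with a row-level lock such as `SELECT ... FOR UPDATE`.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with one hypothetical order."""
    # isolation_level=None means autocommit mode: transactions are
    # controlled explicitly with BEGIN / COMMIT / ROLLBACK below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "price_cents INTEGER NOT NULL, "
        "promo_applied INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO orders (id, price_cents) VALUES (1, 1000)")
    return conn


def apply_promo(conn, order_id, promo_percent):
    """Apply the discount at most once. Returns True if it was applied."""
    conn.execute("BEGIN IMMEDIATE")      # acquire the write lock before reading
    try:
        row = conn.execute(
            "SELECT price_cents, promo_applied FROM orders WHERE id = ?",
            (order_id,),
        ).fetchone()
        if row is None or row[1]:
            conn.execute("ROLLBACK")     # nothing to do: unknown order or already discounted
            return False
        new_price = row[0] * (100 - promo_percent) // 100
        conn.execute(
            "UPDATE orders SET price_cents = ?, promo_applied = 1 WHERE id = ?",
            (new_price, order_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Only the single transaction that modifies the order is locked here, and only for the duration of the check and the update, which is in line with the advice not to lock anything unnecessary.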




Our client is an online grocery delivery store that offers discounts through coupons. During testing, we discovered a vulnerability in the POST request that submits a coupon value: by sending the request several times with different delays, it was possible to get the discount twice. Apparently, the developers made a serious mistake in handling shared access to the object associated with the purchase.



Most likely, the code was similar to the following pseudocode, with no synchronization mechanisms:



1 if promo_flag is not set:
2     price = get_price()
3     price -= price * promo_percent
4     set_price(price)
5     set_promo_flag()
...

Here, applying the promo code and setting the corresponding flag is not an atomic operation. Most likely, when the second application of the promo code began, the first one had stopped just before line 5 (that is, the flag had not yet been set). The check on line 1 therefore passed again, and get_price() on line 2 returned the new, already discounted price.
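The interleaving described here can be replayed deterministically. The sketch below is an assumption-laden illustration rather than the client's code: the `Order` class, the integer percentage and the `before_flag` hook (which pauses the first thread between lines 4 and 5 of the pseudocode) exist only to force the timing.

```python
import threading


class Order:
    """Hypothetical purchase object shared between request handlers."""

    def __init__(self, price):
        self.price = price
        self.promo_flag = False


def apply_promo(order, promo_percent, before_flag=None):
    if not order.promo_flag:                      # line 1
        price = order.price                       # line 2
        price -= price * promo_percent // 100     # line 3
        order.price = price                       # line 4
        if before_flag is not None:
            before_flag()                         # demo hook: stall before line 5
        order.promo_flag = True                   # line 5


def replay_race():
    order = Order(1000)
    first_paused = threading.Event()
    second_done = threading.Event()

    def pause_first():
        first_paused.set()      # the first request has already written the new price
        second_done.wait()      # ...but has not set the flag yet

    def second_request():
        first_paused.wait()
        apply_promo(order, 10)  # sees no flag and the already discounted price
        second_done.set()

    t1 = threading.Thread(target=apply_promo, args=(order, 10, pause_first))
    t2 = threading.Thread(target=second_request)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return order.price
```

With a starting price of 1000 and a 10 percent coupon, the first request lowers the price to 900 and the second lowers it again to 810.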



Solution



The problem is solved simply:



1 acquire_mutex()
2 if promo_flag is not set:
3     price = get_price()
4     price -= price * promo_percent
5     set_price(price)
6     set_promo_flag()
7 release_mutex()
...

Now the promo code is applied in full and exactly once. Even if a second thread tries to apply the promo code while the first is still processing it, it will not be able to do so. The mutex blocks access to the critical section, and the second thread has to wait until the first one has finished. By then the flag is already set, so the check fails and no second discount is applied.
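In Python the same fix could look roughly like this. It is a sketch under assumptions: the `Order` class and `run_concurrently` helper are hypothetical, and `threading.Lock` plays the role of the mutex. The `with` statement releases the lock even if an exception is raised inside the critical section.

```python
import threading


class Order:
    """Hypothetical purchase object with its own mutex."""

    def __init__(self, price):
        self.price = price
        self.promo_flag = False
        self.lock = threading.Lock()


def apply_promo(order, promo_percent):
    with order.lock:                                # acquire_mutex() / release_mutex()
        if not order.promo_flag:
            price = order.price
            price -= price * promo_percent // 100
            order.price = price
            order.promo_flag = True


def run_concurrently(order, promo_percent, attempts):
    """Try to apply the same promo code from several threads at once."""
    threads = [
        threading.Thread(target=apply_promo, args=(order, promo_percent))
        for _ in range(attempts)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order.price
```

An in-process mutex like this only protects threads inside one process. If the application runs as several processes or on several servers, the same guarantee has to come from a shared mechanism such as a database lock.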



Race conditions should not be underestimated. It is better to spend time and resources on finding such vulnerabilities than to face unforeseen consequences, including for the company's budget.






ITGLOBAL.COM blog: managed IT, private clouds, IaaS, and information security services for business.





