Bypassing Incapsula with Selenium

A customer contacted me because his data collector could not get past the Incapsula protection.

In a nutshell, instead of the page's HTML the server returns JavaScript code. When that code runs, it sends a request to the Incapsula server, which checks some browser parameters. If the browser is recognized as valid, the page and some cookies are returned.





A detailed description is available on the developer's website (www.imperva.com).



Adding a JavaScript handler, like the other solutions that Google turns up (for example, setting up your own servers), seemed too complicated or too slow to implement. Selenium, as it turned out, bypasses this protection perfectly. However, there is a lot of data, so I did not want to collect it in a single thread (or even by switching between tabs), and there were not enough resources to launch several browsers. So I decided to write a proxy server.



Since the load varied with the time of day and other conditions, I decided to make the web part scalable using the combination Nginx + uwsgi + Flask. Running a Selenium instance for each worker seemed too costly, so Selenium was moved into a separate service that communicates with the other components via Redis. To keep the implementation as simple as possible, requests are executed synchronously.
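The idea of sharing state between the web workers and a single Selenium service can be sketched roughly as follows. This is only an illustration under assumptions, not the project's code: InMemoryStore is a hypothetical stand-in for a Redis client (it only mimics the get/set method names so the sketch runs without a server), and cookie_key, get_cookies and the refresh callable are hypothetical names. In the real setup the refresh step would be the request to the Selenium service.

```python
import json
from typing import Callable, Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in for a Redis client; only get/set are provided."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def cookie_key(host: str) -> str:
    # Assumed key layout: one entry per target host.
    return f"cookies:{host}"


def get_cookies(store, host: str,
                refresh: Callable[[str], Dict[str, str]]) -> Dict[str, str]:
    """Return cached cookies for host, refreshing them synchronously on a miss."""
    cached = store.get(cookie_key(host))
    if cached is not None:
        return json.loads(cached)
    # Blocking call: the worker waits until the browser side has finished.
    cookies = refresh(host)
    store.set(cookie_key(host), json.dumps(cookies))
    return cookies
```

Because the cookies live in a shared store, any worker can reuse them, and the browser is only involved when no cookies are cached yet.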





Project structure







uwsgi.ini



Selenium:

gecko/Sel.py



The Selenium service obtains the cookies and stores them in Redis.





API:





src

It accepts one parameter, url:





@app.route('/', methods=['GET', 'POST'])
      
      



, url url, , post .





Example:





http://127.0.0.1:5000/?url=https://www.example.com/vehicledetails/34313441?RowNumber=0& 
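The target address in this example carries its own query string, so a raw "&" inside it would be read as a separator by the proxy's query parser. A minimal sketch of building the proxy URL with the target percent-encoded is shown here. The function name proxy_url and the PROXY constant are hypothetical; only the url parameter name and the address come from the example above.

```python
from urllib.parse import urlencode

# Address of the proxy from the example above.
PROXY = "http://127.0.0.1:5000/"


def proxy_url(target: str, proxy: str = PROXY) -> str:
    """Build a proxy URL whose 'url' parameter holds the encoded target."""
    # urlencode percent-encodes '?', '&', '=' and '/' inside the value,
    # so the whole target survives as a single query parameter.
    return proxy + "?" + urlencode({"url": target})


if __name__ == "__main__":
    print(proxy_url("https://www.example.com/vehicledetails/34313441?RowNumber=0&"))
```

On the receiving side, a standard query-string parser decodes the url parameter back to the original target.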
      
      






request.py performs the GET and POST requests with the requests library, using the cookies that Selenium stored in Redis.












uwsgi







