With this publication I am starting a series of articles on how new technologies in the field of cybersecurity can transform the entire IT industry. The fight against threats is usually assigned an auxiliary role, and few people realize that in the near future protection technologies may change our lives significantly, making them not only safer but fundamentally different. In my forecasts I will try to focus on a horizon of 5 to 30 years: beyond 30 years the discussion drifts into pure abstraction, and under 5 years the forecast becomes too obvious. In this first part we will talk about a market for intellectual labor that is practically absent today: the market for algorithms.
Every programmer who has worked on complex optimization problems, developed new cryptographic functions, or obtained some significant new result in ML/AI has at some point wondered: is it possible to sell an algorithm that absorbed so much intellectual work and earn something from it? The answer is usually no. Sometimes you can sell it, but only once and only to some special service, with an obligation not to use it anywhere else. When I was in graduate school at the department of systems analysis, local graduate students wrote many interesting and significant papers on multi-criteria optimization, improving individual algorithms and obtaining results several percent more accurate than the existing ones. However, those developments never went any further, except for the sale of the R&D itself as a service.
Can algorithms be sold?
Let us analyze the difficulties of such a sale using a concrete example. Suppose a developer has created a new hash function that is provably collision resistant. Such a useful result leads to the idea that it would be nice to sell access to this hash function. There are two ways to do it:
1. In-cloud: host the function somewhere in the cloud and provide it as a HASHaaS service. Today this solution is as simple as it is meaningless. Even if we imagine that the speed and quality of the communication channel are sufficient to ensure the required SLA for the function calls, we face the difficulty of sending the data itself to the cloud. The information we want to hash is most likely of some value to us. For example, we may be hashing a document for subsequent certification with a digital signature, or hashing user passwords so as not to store them in the database. Sending passwords in the clear to some foreign server just to receive a hash back looks absurd. If we encrypt them for transmission, the remote server will still have to decrypt them to compute the hashes. Thus, it will receive all the passwords, all the documents and any other data we want to hash. The cloud model turns out to be unviable, except in those rare cases when the information sent to the remote server has absolutely no value for us. But such situations are the exception rather than the rule.
2. On-premise: the second way is to transfer the algorithm directly to the client's side, where the client will run it themselves. There are several complications here. If we hand over a program in an interpreted (open) language such as Python, the client can do whatever they want with it; further copying and modification of the code will be impossible to control. If we hand it over in compiled form, then, firstly, this is not always convenient for the client, and secondly, tracing the logic of the algorithm and reproducing it will not be difficult. Even if we obfuscate the code in advance and strip all debugging information, it can be disassembled and its logic traced, since the amount of code to analyze is most likely not too large. Thus, both paths lead the programmer to failure. The idea of generating intellectual property in the form of specialized algorithms and living on passive income from it for the rest of one's life remains a dream... Or does it?
The revolution of recent years
Over the past decade, some theoretical areas of cryptography have come a colossal way from unrealizable theoretical constructs to applied solutions. One of these areas is homomorphic encryption.
The homomorphism of a cipher means that changes made to the ciphertext correspond to changes made to the plaintext. Let Enc() be the encryption function and Dec() the decryption function; then the addition homomorphism can be expressed as x + y = Dec(Enc(x) + Enc(y)). Similarly for multiplication: x ∙ y = Dec(Enc(x) ∙ Enc(y)). If a cipher is homomorphic with respect to both addition and multiplication at the same time, it is called Fully Homomorphic Encryption (FHE). Why is this enough? Because any logical circuit can be built from these two operations. In particular, with arithmetic modulo 2, NAND(A, B) = 1 + A ∙ B, and NAND in turn is a universal gate: through it you can express any other logical operator and therefore write any program.

The first ideas about homomorphic ciphers appeared long ago, back in the 1970s. However, the first construction of a fully homomorphic cipher was described only in 2009 (Craig Gentry. Fully Homomorphic Encryption Using Ideal Lattices. In the 41st ACM Symposium on Theory of Computing (STOC), 2009). That design was so limited in practical terms that, at a cryptographically adequate key length, it required about 30 minutes of computation per elementary operation. Over the next few years, many FHE schemes better suited to practical implementation appeared. Some of the best known are BGV and CKKS (Z. Brakerski, C. Gentry, and V. Vaikuntanathan. Fully Homomorphic Encryption without Bootstrapping. ITCS 2012; and Cheon, Jung Hee; Kim, Andrey; Kim, Miran; Song, Yongsoo. Homomorphic Encryption for Arithmetic of Approximate Numbers. Advances in Cryptology - ASIACRYPT 2017. Springer, Cham, pp. 409-437). This was followed by many open-source implementations and libraries of homomorphic ciphers. One of the first was IBM with its HElib library (2013), then HEAAN from Seoul National University (2016), PALISADE from DARPA (2017), the extended SEAL from Microsoft (2018) and many other implementations, including ones accelerated by GPU, FPGA, etc.
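To make the homomorphism property tangible, here is a minimal sketch using the python-paillier (phe) package. Paillier is only additively homomorphic (ciphertexts can be added to each other and multiplied by plaintext constants, but two ciphertexts cannot be multiplied together), so it is not FHE; still, it demonstrates the relation Dec(Enc(x) + Enc(y)) = x + y from above, while the FHE libraries listed in the previous paragraph also support multiplication of ciphertexts.

```python
# Minimal demonstration of additive homomorphism with the python-paillier (phe) package.
# Paillier is additively homomorphic only -- enough to show Dec(Enc(x) + Enc(y)) = x + y.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

x, y = 15, 27
enc_x = public_key.encrypt(x)
enc_y = public_key.encrypt(y)

enc_sum = enc_x + enc_y            # addition performed directly on ciphertexts
enc_scaled = enc_x * 3             # multiplication of a ciphertext by a plaintext constant

print(private_key.decrypt(enc_sum))     # 42
print(private_key.decrypt(enc_scaled))  # 45
```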
The FHE example shows how, in about ten years, the path from abstract theoretical ideas to concrete applied solutions was traveled. Homomorphic cryptography opens up a number of new possibilities: for example, it allows working with data without decrypting it. Previously, in order to extract and process information from a large encrypted database, one first had to download the entire database, decrypt it, change it in the right places, and then encrypt and upload it again. Now this can be done in a single operation: in the encrypted database we immediately find the desired cell and modify it without decrypting the entire table.
Now, returning to the in-cloud scheme, we can implement a remote marketplace of algorithms to which it will be possible to send not open but encrypted data. This makes the business model much more realistic: accessing the service no longer obliges the client to disclose anything. We can send personal information, accumulated big data or any other confidential information in encrypted form and receive the processing result also encrypted, with the key remaining only in our hands.
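As an illustration of this in-cloud scheme, here is a hedged sketch, again using the additively homomorphic python-paillier package rather than a true FHE library; the function and variable names (score_encrypted, StubProvider-style roles, the toy weights) are invented for illustration. The provider applies its proprietary linear model to the client's encrypted features without ever seeing them in the clear, and only the client can decrypt the result.

```python
# Hypothetical sketch of the in-cloud scheme with the python-paillier (phe) package.
# The provider's weights stay on the provider's side; the client's data stays encrypted.
from phe import paillier

# --- provider side: the proprietary algorithm being sold (a toy linear model) ---
def score_encrypted(public_key, enc_features):
    weights, bias = [2, -1, 5], 4
    enc_score = public_key.encrypt(bias)
    for w, enc_x in zip(weights, enc_features):
        enc_score += enc_x * w      # ciphertext * plaintext constant, then ciphertext addition
    return enc_score

# --- client side: encrypt confidential features, send them, decrypt only the answer ---
public_key, private_key = paillier.generate_paillier_keypair()
features = [12, 7, 3]
enc_features = [public_key.encrypt(v) for v in features]

enc_score = score_encrypted(public_key, enc_features)    # the provider never sees 12, 7, 3
print(private_key.decrypt(enc_score))                     # 2*12 - 1*7 + 5*3 + 4 = 36
```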
Another trajectory is to sell access to the algorithm on-premise. Here it is worth paying attention to another cryptographic discovery of recent years: so-called indistinguishability obfuscation. The idea of indistinguishability obfuscation was first voiced in 2001 (B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. P. Vadhan, and K. Yang. On the (Im)possibility of Obfuscating Programs. CRYPTO 2001, pp. 1-18) in connection with the need to rethink the formal notion of obfuscation, since the previous approaches were not entirely correct from a mathematical point of view and gave no measurable indication of how well or poorly a program had been obfuscated. In 2013, a team of researchers partly overlapping with the authors of the 2001 paper proposed a solution to the problem posed back then: they found a construction that could serve as a candidate for such an obfuscator (Sanjam Garg, Craig Gentry, Shai Halevi, Mariana Raykova, Amit Sahai, Brent Waters. Candidate Indistinguishability Obfuscation and Functional Encryption for All Circuits. FOCS 2013, pp. 40-49).

The essence of indistinguishability obfuscation can be explained as follows. Suppose we have a program obf() that receives some program code as input and outputs it in an obfuscated (scrambled) form. If we take two programs A and B with identical functionality and obtain their obfuscated variants obf(A) and obf(B), then, up to a negligible probability, we cannot tell which of the two was fed to the obfuscator (a similar approach is used to formulate the indistinguishability of encryption schemes). Several non-obvious conclusions follow from this, one of which is the ability of the program at the obfuscator's output to keep a "secret" inside itself. This can be, for example, an encryption key: it is then shipped along with the executable code and at the same time cannot be extracted from it.
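In the standard cryptographic formulation this property reads as follows: for any two programs (circuits) A and B of the same size that compute the same function, and for any efficient distinguisher D, |Pr[D(obf(A)) = 1] − Pr[D(obf(B)) = 1]| ≤ negl(λ), where negl(λ) is a negligible function of the security parameter λ.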
The possible bonuses of indistinguishability obfuscation do not end there. Another important consequence is that you no longer have to trust the hardware: any data processing can be performed on untrusted hardware. As a result, the billions spent on developing domestic computers, or the confrontation between Huawei and the United States, become meaningless insofar as they are justified by requirements of trust in the hardware. But that is a subject for another article.
In the case of selling algorithms, it becomes possible to hand the client their obfuscated code. Moreover, even if we embed in the code some individualization for a particular user, the client will not be able to extract this customized part from it. As a result, we not only make it impossible to analyze the inner workings of the algorithm, but also gain a way to supply programs with an inseparable label that will always leave a digital trace when distributed on the Internet. However, it is worth noting that current constructions of indistinguishability obfuscation are so cumbersome that it is too early to talk about their practical use. Most likely, unlike the in-cloud scheme, we will not see on-premise implementations for at least another ten years.
Forecasts and caveats
Thus, within the next five or more years a market for algorithms may appear in the form of cloud schemes, with on-premise delivery becoming possible later. Of course, this will not happen overnight. For such relationships to take shape, the following still needs to appear:
- A platform (or platforms) for the exchange of data between providers and consumers of algorithms. It must perform all FHE functions automatically, at the level of a kind of transport layer, so that the service becomes genuinely convenient and, most importantly, understandable to all market participants, because today few IT specialists know what FHE is and how to use it (a possible client-side interface is sketched after this list).
- Big data exchange is still hampered by the limited bandwidth of communication channels. So here we either have to wait until channel bandwidth organically grows to the required values, or deploy additional data preprocessing services on the client side, which could be part of the platforms and frameworks from the previous item.
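To make the first point more concrete, here is a purely hypothetical sketch of what a client-side SDK for such a marketplace might look like. All names (AlgoMarketClient, StubProvider, call, run) are invented for illustration, and the homomorphic layer is again represented by the additively homomorphic python-paillier package rather than a real FHE library; the point is only that encryption and decryption happen transparently at the "transport layer".

```python
# Hypothetical client-side SDK sketch: encryption happens at the transport layer,
# so the consumer of an algorithm never handles ciphertexts explicitly.
from phe import paillier


class AlgoMarketClient:
    def __init__(self, provider):
        self.public_key, self._private_key = paillier.generate_paillier_keypair()
        self.provider = provider                       # remote algorithm provider (stubbed below)

    def call(self, algorithm_id, values):
        # Encrypt automatically before the data leaves the client...
        enc_values = [self.public_key.encrypt(v) for v in values]
        enc_result = self.provider.run(algorithm_id, self.public_key, enc_values)
        # ...and decrypt automatically when the result comes back.
        return self._private_key.decrypt(enc_result)


class StubProvider:
    """Stands in for the remote marketplace; here it simply sums the encrypted inputs."""
    def run(self, algorithm_id, public_key, enc_values):
        total = public_key.encrypt(0)
        for ev in enc_values:
            total += ev                                # homomorphic addition on ciphertexts
        return total


client = AlgoMarketClient(StubProvider())
print(client.call("sum_v1", [10, 20, 12]))             # 42, computed without exposing the inputs
```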
The development of the market for algorithms can have a significant impact on many sectors of the economy. Several areas can be identified that will definitely be influenced by this transformation.
Big data. The configuration of the modern big data market is made up not only of the data sets themselves, but also, even more so, of the analytics centers able to extract knowledge and build models from this information. Every big data collector, such as a telecom operator, a bank or a retailer, has its own staff of analysts who develop knowledge-extraction models and sell these results to other consumers. With a rich marketplace of algorithms and models for working with data, these divisions will lose their importance. Big data accumulators will no longer be able to add value to the information extracted from their data and will be forced to sell only the "raw" material, whose price will in turn begin to fall over time, just as classic raw materials (oil, gas, aluminum, etc.) are depreciating now.
Three layers of development. The classic dichotomy in development today is "backend and frontend": the frontend developer writes the user interface, the backend developer writes the entire server-side logic of the application. A new layer may form here, which could be called "algoend". It would contain the key, most important and complex algorithms (NLP, ML/AI, data mining, blockchain, etc.). In other words, algoend is the essential content of any development, while frontend and backend are its individualization for a specific project. Algoend will require the highest qualifications and will move into the field of supplementary services, forming a new service market; frontend and backend, in turn, remain a labor market whose cost will decrease.
C2B market. Already from the first two points we can see the transformation taking place in the labor market. The development of new technologies in the field of cybersecurity will revive the now virtually absent C2B sector. In other words, we are moving from legal schemes for controlling intellectual property (which only large corporations can currently afford to fight over) to technological schemes available to anyone. If the intellectual property produced is inseparable from the service that uses it, there is no need for the legal and organizational costs of maintaining the regime of its use.
Legal services market. It is generally accepted that the transition to an information economy creates a great demand for lawyers dealing with patents and legal disputes. Up to a certain point this was indeed the case. However, looking 10 or more years ahead, I would predict the complete death of this service market (at least in the IT field). Already, patenting and registering algorithms looks like an impractical procedure that offers little real protection; developers increasingly prefer to keep important results as know-how rather than disclose and patent them. Another important fact is added here: the code at the output of an indistinguishability obfuscator cannot be the subject of copyright. This follows from the very definition of indistinguishability obfuscation, since it is impossible to determine and prove what software construction was fed to its input. I would predict that in ten years there will be no more legal disputes in the IT industry, at least not in the form we see them now.
The predictions voiced in this article, like any other predictions, may of course fail to come true. Discoveries and developments in R&D are the most thankless area for forecasting. We cannot say, for example, that the current cumbersome constructions of indistinguishability obfuscation will be improved and become practical within 5 years; that may not happen. It would be more accurate to say that the forecasts and conclusions of this article will themselves come true with high probability, while the time frames in which they are laid are considerably more uncertain.