A new neural network could help computers code themselves

Hello, Habr! Here is a translation of the article "A new neural network could help computers code themselves" by Will Douglas Heaven.






The tool detects similarities between programs to help programmers write faster and more efficient software.



Computer programming has never been easy. The first coders wrote programs by hand, tracing characters on graph paper before converting them into large stacks of punched cards that could be processed by a computer. One mistake, and it all might have to be redone.



Coders today have many powerful tools that automate much of the work, from catching errors as you type to testing code before it is deployed. But in other ways, little has changed. One silly mistake can still crash an entire program. And as systems become more complex, tracking down these errors becomes more difficult.
“Sometimes it can take several days for teams of coders to fix one bug,” says Justin Gottschlich, director of the machine programming research group at Intel.




This is why some people think that we should just get machines to program themselves. Automatic code generation has been a hot research topic for a number of years. Microsoft is building basic code generation into its widely used software development tools, Facebook has created a system called Aroma that autocompletes small programs, and DeepMind has developed a neural network that can come up with more efficient versions of simple algorithms than those devised by humans. Even OpenAI's GPT-3 language model can produce simple pieces of code, such as web page layouts, from natural-language prompts.



Gottschlich and his colleagues call this machine programming. Working with a team from Intel, MIT, and the Georgia Institute of Technology in Atlanta, he has developed a system called Machine Inferred Code Similarity, or MISIM, that can extract the meaning of a piece of code (what the code tells the computer to do) in much the same way that natural-language processing (NLP) systems can read a paragraph written in English.



MISIM can then suggest other ways the code might be written, offering corrections and ways to make it faster or more efficient. The tool's ability to understand what a program is trying to do lets it identify other programs that do similar things. In theory, this approach could be used by machines that write their own software, drawing on a patchwork of pre-existing programs with minimal human oversight or input.



MISIM works by comparing snippets of code with millions of other programs it has already seen, taken from a large number of online repositories. First it translates the code into a form that captures what it does but ignores how it is written, because two programs written in very different ways sometimes do the same thing. MISIM then uses a neural network to find other code that has a similar meaning. In a preprint, Gottschlich and his colleagues report that MISIM is 40 times more accurate than previous systems that try to do this, including Aroma.
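To make the idea of "what it does, not how it is written" more concrete, here is a deliberately tiny Python sketch. It is not MISIM's method: the article does not describe MISIM's internal representation, and a real system uses a learned neural model. As a stand-in, this toy reduces each program to a count of its syntax-tree node types (so variable names and formatting disappear) and compares the counts with cosine similarity. All function names and sample snippets are hypothetical.

```python
import ast
import math
from collections import Counter


def structure_profile(source: str) -> Counter:
    """Count AST node types; identifiers, literals and layout are ignored."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def code_similarity(src_a: str, src_b: str) -> float:
    """Toy stand-in for a learned similarity score (assumption, not MISIM)."""
    return cosine_similarity(structure_profile(src_a), structure_profile(src_b))


# Two snippets that differ only in naming, and one unrelated snippet.
SUM_LOOP = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

SUM_LOOP_RENAMED = """
def add_up(xs):
    acc = 0
    for item in xs:
        acc += item
    return acc
"""

UNRELATED = """
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "Hello, " + self.name
"""

if __name__ == "__main__":
    print("renamed copy:", code_similarity(SUM_LOOP, SUM_LOOP_RENAMED))
    print("unrelated:   ", code_similarity(SUM_LOOP, UNRELATED))
```

The renamed copy scores as essentially identical to the original, while the unrelated class scores lower. The toy only sees shape, though: two programs that compute the same thing with different constructs (a loop versus a call to `sum`) would look dissimilar here, which is exactly the gap a meaning-based system like the one described above is meant to close.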



MISIM is an exciting step forward, says Veselin Raychev, CTO of the Swiss company DeepCode, whose bug-catching tools (among the most advanced on the market) use neural networks trained on millions of programs to suggest improvements to coders as they write.

But machine learning is still unable to predict whether something is a bug, Raychev says. This is because it is difficult to teach a neural network what is or is not an error unless a human has labeled it as such.
According to him, there has been a lot of interesting research on deep neural networks and bug fixing, "but practically they are not there yet, by a very large margin." As a rule, AI bug-catching tools produce a lot of false positives, he says.



MISIM gets around this by using machine learning to identify similarities between programs rather than to detect bugs directly. By comparing a new program with existing software that is known to be correct, it can alert the coder to important differences that could lead to errors.
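The compare-against-known-good workflow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the similarity score here is plain character-level text matching from the standard library's `difflib`, standing in for MISIM's learned, meaning-based similarity, and the tiny `KNOWN_GOOD` corpus, the function names, and the sample candidate are all hypothetical.

```python
import difflib

# Hypothetical corpus of snippets assumed to be correct.
KNOWN_GOOD = {
    "mean": (
        "def mean(values):\n"
        "    if not values:\n"
        "        return 0.0\n"
        "    return sum(values) / len(values)\n"
    ),
    "clamp": (
        "def clamp(x, lo, hi):\n"
        "    return max(lo, min(x, hi))\n"
    ),
}


def closest_reference(candidate: str, corpus: dict) -> str:
    """Return the name of the corpus entry most textually similar to candidate."""
    def score(name: str) -> float:
        return difflib.SequenceMatcher(None, candidate, corpus[name]).ratio()
    return max(corpus, key=score)


def differences(candidate: str, reference: str) -> list:
    """List lines removed from (-) or added to (+) the reference."""
    diff = difflib.unified_diff(
        reference.splitlines(),
        candidate.splitlines(),
        fromfile="reference",
        tofile="candidate",
        lineterm="",
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# A new snippet that forgets the empty-input check.
CANDIDATE = (
    "def mean(values):\n"
    "    return sum(values) / len(values)\n"
)

if __name__ == "__main__":
    name = closest_reference(CANDIDATE, KNOWN_GOOD)
    print("closest known-good snippet:", name)
    for line in differences(CANDIDATE, KNOWN_GOOD[name]):
        print(line)
```

Here the missing guard for an empty list shows up as two removed lines, which is the kind of difference a reviewer would want flagged. Text matching breaks down as soon as two programs are written differently, so a practical tool needs the meaning-based comparison the article describes.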



Intel plans to use the tool as an aid for its in-house developers, suggesting alternative ways to write code that are faster or more efficient. But because MISIM is not tied to the syntax of a particular program, it could do much more. For example, it could be used to translate code written in an older language such as COBOL into a more modern language such as Python. This matters because many institutions, including the US government, still rely on software written in languages that few coders know how to maintain or update.



Ultimately, Gottschlich believes that this idea can be applied to natural language. Combined with NLP (natural-language processing, not to be confused with neuro-linguistic programming), the ability to work with the meaning of code separately from its textual representation may one day let people write software simply by describing in words what they want it to do, he says.

“Making small apps for your phone, or things like that which will help you in your day-to-day life: I think that's not that far off,” says Gottschlich. “I would like to see 8 billion people create software in whatever way is natural for them.”


