International Natural Language Programming

Lately I have often come across articles about new programming languages, as well as various ratings and forecasts of the popularity of computer languages.



New tools keep appearing as well. They define their own formats for configuration files or for sequences of executable commands, which brings them very close to the concept of a "programming language".



This article sets out expectations for, and a possible implementation of, an abstract programming language that could become a universal tool for communication between a person and a computer.



About programmers



To start from the very beginning: I once heard a paraphrased saying, "every programmer should write their own database, text editor and programming language." I wrote the first two long ago, but the programming language has not happened yet.



After all, how are programming languages usually created?



Every programmer always has some kind of previous experience:



  • knowledge of one or more programming languages (how could we do without it)
  • negative experience of using them (if everything suited you, why come up with something new?)
  • a desire for new capabilities (when something is missing from existing languages).


Before describing the syntax, choosing keywords and starting the main work (lexer, parser, base libraries), you need to answer some basic questions:



  • Compiler / interpreter / transpiler?
  • Static or dynamic typing?
  • Manual memory management or automatic with garbage collector?
  • Programming model: object-oriented, functional, structured, or something new?
  • Are inserts from other programming languages allowed? And so on.


Like most readers, I probably have experience with several programming languages. So it has long been my practice that, to solve a problem, it is better to take a well-known language, or even learn a new one, than to start writing my own.



Moreover, I don't want to invent yet another language just to tick a box or for the sake of the language itself. I believe the purpose of creating one should lie outside the needs of the developer himself.



And it seems to me that I have found an area where developing a programming language could be in demand, and where the effort spent on it could bring real benefit.



About non-programmers



This area is "natural" language programming for "non-programmers". I deliberately put the words "non-programmers" and "natural" in quotation marks, since both terms are very loose.



After all, once a non-programmer starts programming, they automatically become a programmer without realizing it ;-). And by definition, a programming language cannot be "natural". More precisely, for a computer the "natural" language would most likely be assembly language or a set of machine instructions.



So the most ambitious goal is to bring the programming language as close as possible to natural human language.



This will not only make program text easier for non-professionals to read, but will also let them start writing programs once they have mastered written language and a bare minimum of basic rules.



But this formulation hides a very big problem!



Any programming language is international, because its syntax is independent of the natural language in which the programmer communicates.



And if the program text is written in a "natural" language, it becomes understandable only to those who know that language and incomprehensible to everyone else.



By way of illustration: once or twice .


If I let myself dream up a wish list for such a language, I see the following requirements and restrictions:



  • ( ), , , , .
  • / , «» , «» .
  • I would very much like the new language to be tolerant of mistakes. Natural-language writing has this "feature": despite typos, the meaning is almost always preserved. Naturally, this should not be taken to extremes. The compiler does not read minds and cannot really "understand" what the user meant, yet it is quite feasible to tolerate typos in the program text based on context (albeit with warning messages).
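
The typo tolerance wished for above can be pictured as fuzzy matching of an unknown word against the names the compiler already knows. The following is only a minimal sketch under assumptions: `KNOWN_TERMS`, `resolve_term` and the 0.8 similarity cutoff are hypothetical and not part of the article's design. A word is accepted only when exactly one known term is close enough, and a warning is produced.

```python
import difflib

# Hypothetical vocabulary of names already known to the compiler.
KNOWN_TERMS = ["print", "total", "counter", "repeat"]

def resolve_term(word, known=KNOWN_TERMS, cutoff=0.8):
    """Return (term, message); term is None when no safe correction exists."""
    if word in known:
        return word, None
    # Ask for up to two candidates so that ambiguity can be detected.
    candidates = difflib.get_close_matches(word, known, n=2, cutoff=cutoff)
    if len(candidates) == 1:
        return candidates[0], f"warning: '{word}' read as '{candidates[0]}'"
    # Nothing similar, or more than one plausible reading: do not guess.
    return None, f"error: unknown term '{word}'"
```

With this vocabulary, `resolve_term("conter")` resolves to `counter` with a warning, while a word that resembles nothing known is rejected instead of being guessed.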


Nevertheless, such a language must remain a real programming language: it should allow programs of any complexity, support functional and object-oriented programming, and leave no ambiguity about what has been written.



About a hypothetical language



Based on the rules of writing, the basic conventions and punctuation for a new language might look something like this:



  • Any text consists of sentences and comments. Sentences are processed and comments are ignored.
  • A sentence consists of a sequence of terms, literals and symbols, separated by spaces and punctuation marks, and ends with an end-of-sentence character.
  • A term is an unbroken sequence of letters, digits and the characters ":" and "_".
  • Literals are constants written directly in the program text whose type is uniquely determined: character strings in quotes, integers and real numbers, and some special formats (time, date).
  • Symbols are all other characters that are not punctuation marks, whitespace, digits or letters.
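  • Punctuation marks are characters with a special purpose, namely:



    • «.», «;», «!», «?», «…»: end-of-sentence characters.
    • «=»: assignment.
    • "" (quotes): character strings.
    • «()» (parentheses): used as in most programming languages.
    • «[]» (square brackets): used as in most programming languages.
    • «{}» (curly braces): inserts of ordinary program code.
    • «$»: system variable.
    • «@»: system function.
    • «,» (comma): a sequence of equal logical blocks, or the items of a list.
    • «:» (colon): lists and logical relationships.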
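
The lexical rules above are regular enough to sketch as a small tokenizer. This is an illustration under assumptions rather than the article's implementation: the names `TOKEN_RE`, `tokenize` and `split_sentences` are hypothetical, comments are assumed to start with `//`, special literal formats (time, date) are left out, and a colon counts as part of a term only when it sits between word characters.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*)                   # assumed comment syntax
  | (?P<string>"[^"\n]*")                   # character string in quotes
  | (?P<real>\d+\.\d+)                      # real number
  | (?P<integer>\d+)                        # integer
  | (?P<term>[$@]?[^\W\d]\w*(?::\w+)*)      # term, optionally marked $ or @
  | (?P<end>[.;!?…])                        # end-of-sentence characters
  | (?P<punct>[=()\[\]{},:])                # remaining punctuation marks
  | (?P<space>\s+)
  | (?P<symbol>.)                           # everything else
""", re.VERBOSE)

def tokenize(text):
    """Return (kind, value) pairs; whitespace and comments are dropped."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind not in ("space", "comment"):
            tokens.append((kind, match.group()))
    return tokens

def split_sentences(tokens):
    """Group tokens into sentences at end-of-sentence characters."""
    sentences, current = [], []
    for kind, value in tokens:
        if kind == "end":
            if current:
                sentences.append(current)
                current = []
        else:
            current.append((kind, value))
    if current:
        sentences.append(current)
    return sentences
```

Because a real number needs a digit after the dot, the text `list: 1, 2.` ends with the integer `2` followed by an end-of-sentence character, not with a malformed real.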


The assignment symbol, quotes, parentheses and square brackets should be more or less clear, since they serve the same purpose as in the overwhelming majority of programming languages. The remaining characters (curly braces, colon, comma, and the system function / variable symbols) need a little explanation.



Since the goal of the hypothetical language is still to write programs, it must be possible to insert ordinary program code that is free of the ambiguities inherent in any natural language.



This ability is also required to implement low-level functions and to interact with external libraries.



Curly braces can be used for such inserts: all the text between them goes into the final file with little or no processing.
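
A minimal sketch of how such inserts might be separated from the surrounding text is given below. It rests on assumptions the article does not state: braces inside an insert are assumed to nest, braces inside string literals of the surrounding text are not treated specially, and the name `split_inserts` is hypothetical.

```python
def split_inserts(source):
    """Return ('text', chunk) and ('insert', chunk) pieces in source order."""
    pieces, depth, start = [], 0, 0
    for i, ch in enumerate(source):
        if ch == "{":
            if depth == 0:
                # Close the ordinary text that preceded the insert.
                if i > start:
                    pieces.append(("text", source[start:i]))
                start = i + 1
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                # The insert body is kept verbatim, inner braces included.
                pieces.append(("insert", source[start:i]))
                start = i + 1
    if depth != 0:
        raise ValueError("unclosed '{' in source")
    if start < len(source):
        pieces.append(("text", source[start:]))
    return pieces
```

Only the `text` pieces would go through the natural-language front end; the `insert` pieces could be copied to the output unchanged.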



The symbols "$" (system variable) and "@" (system function) serve similar purposes. Placed at the beginning of a word, such a symbol marks an object of the corresponding kind. For example, "@exit" denotes a function and "$var" denotes a variable with the given names, and the objects themselves are available both in normal code and in program inserts inside curly braces.



Access to individual fields / methods of objects is organized in a similar way: "object@method" or "object$field".



The comma character "," is used to indicate a sequence of equal logical blocks in one sentence or to create lists.



The colon character ":" is used to create lists and to indicate a logical relationship between two parts of a word / text, including the full module path.



For example, creating a list:



_: 1, 2, .



_:

- 1;

- 2;

- .



Consequence / indication of a relationship:



module:calc // «calc», «module»

super:module:example$var // «$var» .



As you can see, the punctuation marks keep the roles they have in ordinary writing, which should provide a reasonable trade-off between the syntax of standard programming languages and writing in natural language.
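
Qualified names such as `super:module:example$var` can be taken apart mechanically. The sketch below is an assumption about one possible reading, not the article's specification: the `Reference` type, its field names and `parse_reference` are hypothetical, and only one "$" or "@" per name is handled.

```python
from typing import NamedTuple, Optional

class Reference(NamedTuple):
    path: tuple              # enclosing modules, outermost first
    name: str                # last path element; empty for a bare $var or @func
    kind: Optional[str]      # "variable", "function", or None
    member: Optional[str]    # name after the "$" or "@" symbol

def parse_reference(text):
    """Split a qualified name into module path, object and member."""
    kind = member = None
    for sigil, label in (("$", "variable"), ("@", "function")):
        head, sep, tail = text.partition(sigil)
        if sep:
            text, kind, member = head, label, tail
            break
    parts = text.split(":") if text else [""]
    return Reference(tuple(parts[:-1]), parts[-1], kind, member)
```

For instance, `parse_reference("module:calc")` gives the path `("module",)` and the name `calc`, with no member part.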











About computers



Since we are talking about a programming language, we cannot do without the standard algorithmic constructs: sequence, branching and loops.



Sequence is easily described by the usual rules of natural-language writing. Within one sentence, operations and function calls that run one after another are written in order, separated by commas. If they are in different sentences, they are likewise written one after another. Paragraph formatting serves only to make the text easier to read and to separate individual fragments logically.



Conditional and looping control structures do need keywords. But since, according to the original wishes for the language, ordinary terms cannot be reserved for algorithmic constructs, it is enough to put the system function symbol in front of a keyword, which distinguishes an ordinary term from a key (control) word.



Naturally, these system names can be used directly while programming, but that is not required: when the language is set up for a specific natural language, system functions and keywords are assigned ordinary terms, and those terms are used instead. For example:



= @goto,

= @label,

= @continue,

=@break ..
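
One way to picture this per-language setup is a lookup table from ordinary words to system functions. The sketch below is an assumption, not the article's design: the `VOCABULARIES` table, the function `to_canonical`, and the English and German words in it are hypothetical placeholders; only the names @goto, @label, @continue and @break come from the list above.

```python
# Hypothetical vocabularies; the words are placeholders chosen for illustration.
VOCABULARIES = {
    "en": {"jump": "@goto", "mark": "@label", "next": "@continue", "stop": "@break"},
    "de": {"springe": "@goto", "marke": "@label", "weiter": "@continue", "halt": "@break"},
}

def to_canonical(word, language):
    """Map a localized term to its system function; other words pass through."""
    if word.startswith("@"):
        return word  # already written in the formal syntax
    return VOCABULARIES[language].get(word.lower(), word)
```

A word is a keyword only because the active vocabulary says so, which means the same word remains an ordinary term under any other vocabulary.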








Last in order, but probably the most important in substance, is passing parameters when calling functions. If we strive for a completely natural syntax, we end up with that same natural language, which is very difficult to analyze.



Nevertheless, it seems to me that the two approaches can be combined by dropping the mandatory parentheses wherever the syntax permits:



: (1, 2(), 3=).

: 1 2 3=.



But:



: ( 2() ).

: 2().

: (2 ).



In other words, when the arguments follow in their natural order, the parentheses around them and the commas between them can be omitted. Whether to use them should be decided primarily by the target natural language rather than by the syntax.
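
To make the idea of optional parentheses concrete, here is a minimal sketch that reads both spellings of a flat call into the same structure. It is an illustration under assumptions: `parse_call` and the function name `show` used below are hypothetical, arguments are plain words or `name=value` pairs, and nested calls with their own arguments (the cases where parentheses stay necessary) are deliberately not handled.

```python
import re

END_MARKS = ".;!?…"

def parse_call(sentence):
    """Parse a flat call into (function, positional_args, named_args)."""
    body = sentence.strip().rstrip(END_MARKS).strip()
    match = re.fullmatch(r"(\S+?)\s*\((.*)\)", body)
    if match:
        # Formal spelling: name(arg, arg, name=value)
        name = match.group(1)
        raw_args = [part.strip() for part in match.group(2).split(",")]
    else:
        # Natural spelling: name arg arg name=value
        name, *raw_args = body.split()
    positional, named = [], {}
    for arg in raw_args:
        if not arg:
            continue
        key, sep, value = arg.partition("=")
        if sep:
            named[key.strip()] = value.strip()
        else:
            positional.append(arg)
    return name, positional, named
```

Both `show(1, total, width=10).` and `show 1 total width=10.` produce the same result, so the choice between them can be left to the conventions of the target natural language.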












About objections



I foresee well-founded objections from programmers to the use of such a language: any program written in it will be much more verbose than one written in the strict formal syntax of ordinary computer languages.



So let me recall its obligatory property: the ability to convert program text from one language to another. This makes it possible to write programs in a strictly formal syntax, without redefined natural-language terms, and then convert the source text into a "natural" language for a "non-programmer".
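
Such a conversion could, in the simplest case, be a substitution of keywords through their canonical system names. The sketch below is only an assumption about how this might work: `translate` and the two small vocabularies are hypothetical, only keywords are rewritten (user-defined names and word order are left alone), and quoted strings are skipped so that their contents are not translated.

```python
import re

# Hypothetical vocabularies mapping localized words to system functions.
EN = {"next": "@continue", "stop": "@break"}
DE = {"weiter": "@continue", "halt": "@break"}

def translate(text, source_terms, target_terms):
    """Rewrite keywords from one vocabulary into another.

    An empty target vocabulary yields the formal "@" syntax.
    """
    localized = {fn: term for term, fn in target_terms.items()}

    def swap(match):
        word = match.group()
        if word.startswith('"'):
            return word                      # string literal: leave untouched
        fn = word if word.startswith("@") else source_terms.get(word)
        if fn is None:
            return word                      # ordinary term: leave untouched
        return localized.get(fn, fn)

    # Match quoted strings first so that words inside them are never swapped.
    return re.sub(r'"[^"\n]*"|@?\w+', swap, text)
```

Here `translate('next, stop.', EN, {})` gives the formal text `@continue, @break.`, and translating that with `DE` as the target gives `weiter, halt.`.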


