We design a multi-paradigm programming language. Part 6 - Borrowing from SQL

We continue the story about the creation of a multi-paradigm programming language that combines a declarative logical style with an object-oriented and functional style, which would be convenient when working with semi-structured data and integrating data from disparate sources. The language will consist of two components closely integrated with each other: the declarative component will be responsible for describing the domain model, and the imperative or functional component will be responsible for describing the algorithms for working with the model and computing.



The hybrid language modeling component is a set of concepts-objects connected by logical relationships. I managed to talk about the main ways of defining concepts, including inheritance and defining relationships between them. And also about some of the nuances of logical programming, including the semantics of the negation operator and higher order logic. A complete list of publications on this topic can be found at the end of this article.



In the field of working with data, the indisputable leader is the SQL language. Some of its features, which turned out to be very convenient in practice, such as aggregation, later migrated to logic programming. Therefore, it will be useful to borrow from SQL as much as possible for the modeling component. In this article, I want to show you how nested queries, outer joins, and aggregation can be embedded in concept definitions. I will also talk about another type of concept, which is described using a function that generates objects (entities) in an algorithmic style without resorting to logical search. And I will show you how you can use arrays of objects as parent concepts by analogy with the SQL UNNEST operation that converts collections to table format and allows them to be joined to other tables in the FROM clause .



Anonymous definitions of concepts



In the SQL world, nested queries are a tool that is often used when it is necessary to obtain intermediate data for further processing in the main query. There is no such urgent need for intermediate data in the modeling component, since the way of obtaining them can be formalized as a separate concept. But there are cases where nested concept definitions would be handy.



Sometimes you need to slightly modify a concept, select its individual attributes, filter out values. If this modification is needed only in one place, then it makes no sense to create a separate concept with a unique name. This situation is often encountered when concepts are arguments to functions such as exists , find, or findOne , which check the deducibility of the concept, finding all or only the first object (entity) of the concept. Here you can draw an analogy with anonymous functions in functional programming languages, which are often used as arguments to functions such as map , find , filter , etc.



Consider the syntax for defining an anonymous concept. In general, it follows the syntax of a common concept definition, except that in some cases you can omit the attribute lists and the child concept name. If an anonymous concept is used as an argument to exists, then its name and the list of attributes are not important, it is enough to check that there is at least some result. The find and findOne functions may not need the concept name if the output is not used as a whole concept, but only as a set of attributes in an associative array. If no attributes are specified, then the inheritance mechanism is used by default and the attributes will be inherited from parent concepts. If no concept name is specified, it will be generated automatically.



Let's try to explain what was written above using a few examples. Using the exists function, you can check the deducibility or non-deducibility of a nested concept:



concept freeExecutor is executor e where not exists (
    task t where t.executor = e.id and t.status in ('assigned', 'in process')
)
      
      





In this example, the anonymous concept:



(task t where t.executor = e.id and t.status in ('assigned', 'in process'))
      
      





is actually a concept that inherits all the attributes of the task concept :



(concept _unanimous_task_1 as task t where t.executor = e.id and t.status in ('assigned', 'in process'))
      
      





The find function allows you to return as a list all the values ​​of a concept, which can then be associated with an attribute:



concept customerOrdersThisYear is customer c with orders where c.orders = find(
    (id = o.id, status = o.status, createdDate = o.createdDate, total = o.total) 
    from order o where o.customerId = c.id and o.createdDate > '2021-01-01'
)
      
      





In this example, we extend the notion of customer with a list of orders, which are objects containing the selected attributes of the order notion . We have specified a list of attributes for an anonymous concept, but its name is omitted.



Conditions in the section where anonymous concepts may include other attributes of the parent or subsidiary concepts, in this case c.id . A feature of anonymous concepts is that all such external variables and attributes must necessarily be associated with values ​​at the time of starting the search for solutions. Thus, the objects of the anonymous concept can be found only after finding the objects of the customer concept. ...



External connections



Anonymous concept definitions can also be used in the from section , where they represent parent concepts. Moreover, in the definition of an anonymous concept, it is possible to transfer some of the conditions linking it with other concepts, which will have a special effect. These conditions will be checked at the stage of finding a solution to the anonymous concept and will not affect the inference process of the child concept. Here you can draw an analogy between the conditions in the where section of the anonymous concept and the conditions in the JOIN ON section of the SQL language.



Thus, anonymous concepts can be used to implement a SQL analog of left outer join. This requires three things.



  1. First, replace the desired parent concept with an anonymous concept based on it and transfer all connections with other parent concepts to it.
  2. Second, to point out that the failure of the inference of this concept should not lead to an automatic failure of the inference of the entire child concept. To do this, you need to mark this parent concept with the optional keyword .
  3. And thirdly, in the where section of the child concept, you can check whether there are solutions for this anonymous concept.


Let's look at a small example:



concept taskAssignedTo (task = t, assignee = u, assigneeName) 
from task t, optional (user where id = t.assignedTo) u 
where assigneeName = if(defined(u), u.firstName + ' ' + u.lastName, 'Unassigned') 
      
      





Attributes of the taskAssignedTo concept include objects of the task, its executor and separately the name of the executor. The parent concepts are task and user , and the latter can be empty if the task does not already have an executor. It is wrapped in an anonymous concept definition, preceded by the optional keyword . The inference procedure will first find objects of the task concept , then, based on the user, it will create an anonymous concept, associating it with a specific value of the assignedTo attribute of the task concept . Keyword optional tells the inference routine that if the inference of this concept fails, its object will be associated with the special value UNDEFINED . And checking the result of its output at the child concept level allows the assigneeName attribute to set a default value. If the optional keyword was not specified, then failure to deduce an anonymous concept would result in the current branch of the child concept search failing. This would be analogous to the inner join of SQL.



Because the conditions in the where clause of the anonymous concept include the assignedTo attribute of the other parent concept task , then searching for objects of the concept user is possible only after binding task objects with values. They cannot be swapped:



from optional (user where id = t.assignedTo) u, task t 
      
      





Since at the initial stage the value of t.assignedTo will be unknown, it will not work to create a definition of an anonymous concept.



If in SQL the order of tables in the from section does not matter, then in Prolog the order of predicates in a rule uniquely determines the sequence of traversing the decision tree. The same can be said for the simulation component, whose output rule is based on the SLD resolution used in Prolog. In it, the output of objects of the left concept determines the constraints for the output of objects of the right one. Because of this, unfortunately, it will not work to implement the right outer join and full outer join operations in the same natural way. Since the cardinality of the set of results of the right parent concept may be greater than that of the left one, they will need to output in the opposite direction - from the right concept to the left. Unfortunately, the peculiarities of the chosen inference procedure impose their limitations on the functionality of the language. But the full outer join operation can be emulated by joining the inner,left and right unions:



concept outerJoinRelation( 
concept1Name, 
concept2Name, 
concept1Key,
concept2Key,
concept1 = c1,
concept2 = c2
) from <concept1Name> c1, <concept2Name> c2
where c1.<concept1Key> = c2.<concept2Key>;

concept outerJoinRelation(
concept1Name, 
concept2Name, 
concept1Key,
concept2Key,
concept1 = c1,
concept2 = null
) from <concept1Name> c1
where not exists( <concept2Name> c2 where c1.<concept1Key> = c2.<concept2Key>);

concept outerJoinRelation(
concept1Name, 
concept2Name, 
concept1Key,
concept2Key,
concept1 = null,
concept2 = c2
) from <concept2Name> c2
where not exists( <concept1Name> c1 where c1.<concept1Key> = c2.<concept2Key>);
      
      





This generalized concept is described using the higher-order logic described in the previous article . Its definition is divided into three parts. The first one will find intersections of concepts, the second - objects that the left concept has and the left one does not, and the third - vice versa. Since the names of each part are the same, the results of their inference will be combined.



Aggregation



Aggregation is an integral part of both relational algebra and logic programming. In SQL, the GROUP BY clause allows you to group rows that have the same key value into summary rows. It allows you to remove duplicate values ​​and is commonly used with aggregate functions such as sum , count , min , max , avg.... For each group of rows, aggregate functions return an ordinary value based on all rows in that group. In logic programming, aggregation has more complex semantics. This is due to the fact that in some cases of recursive definition of SLD rules, the resolution gets into an infinite loop and is not able to complete. As in the case of denial as a failure, the problem of recursion in the aggregation operation is solved using persistent model semantics or well-grounded semantics. I tried to briefly talk about these approaches in the previous article . But since the semantics of the modeling component should be as simple as possible, the standard SLD resolution is preferred. And the problem of avoiding infinite recursion is best solved by re-forming the connections between concepts.



Aggregation could naturally be implemented in a functional style using the compute component of a hybrid language. To do this, a function that collapses inference results into unique groups and calculates aggregate functions for each of them is sufficient. But dividing the definition of a concept into logical and functional parts will not be the most convenient solution for such an important tool as aggregation. Better to expand the syntax of the definition to include a grouping section and aggregation functions:



concept < > < > (
    < > = <>,
    ... 
)
group by < >, ... 
from 
    <  > <  > (
        < > = <> ,
        ...
    ),
    ...
where < >
      
      





The group by section , just like in SQL, contains a list of attributes by which the grouping is performed. Relationship expressions can also include aggregation functions. Expressions containing such functions will be considered undefined until the values ​​of all parent concepts are found and grouping is performed. Then their values ​​can be calculated for each group, associated with attributes and / or used to filter groups. With this lazy approach to evaluating and checking conditions, there is no need for a HAVING section separating filter conditions before and after grouping. The runtime will do this automatically.



The main aggregation functions are count , sum, avg , min , max . The purpose of the functions can be understood from their names. Since the modeling component can naturally work with composite data types, you can also add a function that returns grouped values ​​as a list. Let's call it group . Its input argument is an expression. The function returns a list of the results of evaluating this expression for each element in the group. By combining it with other functions, you can implement any arbitrary aggregation function. The group function will be more convenient than such SQL functions as group_concat or json_arrayagwhich are often used as an intermediate step to get an array of field values.



Grouping example:



concept totalOrders (
    customer = c,
    orders = group(o),
    ordersTotal = sum(o.total)
) group by customer
from customer c, order o 
where c.id = o.customerId and ordersTotal > 100
      
      





The orders attribute will contain a list of all user orders, ordersTotal - the total of all orders. The ordersTotal> 100 condition will be checked after the grouping is done and the sum function is calculated .



Concept defined by function



The declarative logical form of describing concepts is not always convenient. Sometimes it will be more convenient to set a sequence of calculations, the result of which will be the essence of the concept. This situation often occurs when you need to load facts from external data sources, for example, from a database, files, send requests to external services, etc. It will be convenient to represent the concept in the form of a function that translates the input query into a query to the database and returns the result of executing this query. Sometimes it makes sense to abandon a logical conclusion, replacing it with a specific implementation that takes into account the specifics of a specific problem and solves it more efficiently. It is also more convenient to describe in a functional style infinite sequences that generate concept entities, for example, a sequence of integers.



The principles of working with such concepts should be the same as with the rest of the concepts described above. The search for solutions should be launched with the same methods. They can themselves be used as parent concepts in concept definitions. Only the internal implementation of the search for a solution should differ. Therefore, we will introduce another way to define a concept using a function:



concept < > (
< >, ... 
)
by <  >
      
      





To define a concept defined using a function, you must specify a list of its attributes and a function for generating objects. I decided to make the list of attributes a mandatory element of the definition, since this will simplify the use of such a concept - to understand its structure, you will not have to study the function of generating objects.



Now let's talk about the function for generating objects. Obviously, it should receive a request as input - the initial values ​​of the attributes. Since these values ​​can be set or not, for convenience they can be placed in an associative array, which will be the input argument of the function. It will also be useful to know in which mode the search for concept meanings is started - find all possible values, find only the first, or only check the existence of a solution. Therefore, we add the search mode as the second input argument.



The result of evaluating the function should be a list of concept objects. But, since the inference procedure based on search with backtracking will consume these values ​​one at a time, it is possible to make the function's output argument not the list itself, but an iterator to it. This would make the definition of the concept more flexible, for example, it would allow, if necessary, to implement lazy evaluation or an infinite sequence of objects. You can use an iterator of any standard collection or create your own custom implementation. The collection element must be an associative array with the values ​​of the concept attributes. The essences of the concept will be created on their basis automatically.

Using an iterator as the return type has its drawbacks. It is more cumbersome and less user-friendly than simply returning a list of results. Finding the best option that combines versatility, simplicity and usability is a challenge for the future.



As an example, consider the concept that describes time intervals. Let's say we want to break the workday into 15 minute intervals. We can do this with a fairly simple function:



concept timeSlot15min (id, hour, minute) by function(query, mode) {
    var timeSlots = [];
    var curId = 1;
    for(var curHour = 8; curHour < 19; curHour += 1) {
        for(var curMinute = 0; curMinute < 60; curMinute += 15) {
            timeSlots.push({
                id: curId,
                hour: curHour,
                minute: curMinute;
            });
            curId++;
        }
    }
    return timeSlots.iterator();
}
      
      





The function returns an iterator for all possible values ​​of a 15-minute time interval for a working day. It can be used, for example, to search for free slots that have not yet been booked:



concept freeTimeSlot is timeSlot15min s where not exists (bookedSlot b where b.id = s.id)
      
      





The function does not check the result of calculations for compliance with the query query ; this will be done automatically when converting an array of attributes into an entity. But if necessary, query fields can be used to optimize the function. For example, form a database query based on the query fields for a concept.



The concept through function combines logical and functional semantics. If in the functional paradigm the function calculates the result for the given values ​​of the input arguments, then in the logical paradigm there is no division into output and input arguments. Only part of the arguments can be given, and in any combination, and the function needs to find the values ​​of the remaining arguments. In practice, it is not always possible to implement such a function capable of performing calculations in any direction, so it makes sense to limit the possible combinations of free arguments. For example, declare that some arguments must be bound to values ​​before evaluating a function. To do this, mark such attributes in the definition of the concept with the keyword required .



As an example, consider the concept that defines the values ​​of a certain exponential scale.



concept expScale (value, position, required limit) by function(query, mode) {
return {
    _curPos = 0,
    _curValue = 1,
    next: function() {
        if(!this.hasNext()) {
            return null;
        }
        var curItem = {value: this._curValue, position: this._curPosition, limit:  query.limit};
        this._curPos += 1;
        this._curValue = this._curValue * Math.E;
        return curItem;
    },
    hasNext: function() {
        return query.limit == 0 || this._curPos < query.limit; 
    }
}}
      
      





The function returns an iterator that generates concept entities using lazy evaluation. The size of the sequence is limited by the value of the limit attribute , but if it is zero, it becomes infinite. Concepts based on infinite sequences must be used with extreme caution, as they do not guarantee that the inference will complete. The limit attribute is auxiliary in nature and is used to organize calculations. We cannot infer it from the values ​​of other attributes, it must be known before the calculation starts, so it was marked as required



We have considered one of the options for how a concept as a function might look. But issues of safety and usability of such concepts require more detailed research in the future.



Flattening nested collections



Some SQL dialects that can work with data in object form support an operation such as UNNEST , which converts the contents of a collection to a table format (rowset) and adds the resulting table to the FROM clause . It is usually used to flatten objects with nested structures, in other words, to flatten or flatten them. Examples of such languages ​​are BigQuery and SQL ++.



Let's say we store user information as a JSON object:



[ {
"id":1,
"alias":"Margarita",
"name":"MargaritaStoddard",
"nickname":"Mags",
"userSince":"2012-08-20T10:10:00",
"friendIds":[2,3,6,10],
"employment":[{
    "organizationName":"Codetechno",
    "start-date":"2006-08-06"
},
{
    "organizationName":"geomedia",
    "start-date":"2010-06-17",
    "end-date":"2010-01-26"
}],
"gender":"F"
},
{
"id":2,
"alias":"Isbel",
"name":"IsbelDull",
"nickname":"Izzy",
"userSince":"2011-01-22T10:10:00",
"friendIds":[1,4],
"employment":[{
    "organizationName":"Hexviafind",
    "startDate":"2010-04-27"
}]
}, 
…]
      
      





User objects store nested collections with lists of friends and places of work. It is possible to extract the attached information about the user's places of work by gluing it together with the data about the user taken from the upper level of the object using a SQL ++ query:



SELECT u.id AS userId, u.name AS userName, e.organizationName AS orgName
FROM Users u UNNEST u.employment e
WHERE u.id = 1;
      
      





The result will be:



[ {
"userId": 1,
"userName": "MargaritaStoddard",
"orgName": "Codetechno"
}, {
"userId": 1,
"userName": "MargaritaStoddard",
"orgName": "geomedia"
} ]
      
      





This operation is discussed in more detail here .



Unlike SQL, in the modeling component, the embedded data must be converted not to tabular, but to object format. The concepts discussed above will help us with this - a concept defined through a function and an anonymous concept. A concept through a function will allow you to transform a nested collection into object format, and an anonymous concept will allow you to embed its definition into the list of parent concepts and access the values ​​of their attributes containing the desired nested collection.



Since the complete definition of a concept through a function is too cumbersome to use as an anonymous concept:



concept conceptName(attribute1, attribute2, ...) by function(query, mode) {...}
      
      





we need to find a way to shorten it. First, you can get rid of the title of the function definition with the query and mode parameters . In the position of the parent concept, the mode argument will always be "find all concept values". The query argument will always be empty, since dependencies on the attributes of other concepts can be embedded in the body of the function. The concept keyword can also be dropped. Thus, we get:



conceptName(attribute1, attribute2, ...) {…}
      
      





If the name of the concept is not important, then it can be omitted, and it will be generated automatically:



(attribute1, attribute2, ...) {…}
      
      





If in the future it is possible to create a compiler that can deduce a list of attributes from the type of objects returned by a function, then the list of attributes can be discarded:



{…}
      
      







So, an example with users and their places of work as a concept would look like this:



concept userEmployments (
    userId = u.id,
    userName = u.name,
    orgName = e.orgName
) from users u, {u.employment.map((item) => {orgName: item.organizationName}).iterator()} e
      
      





The solution turned out to be a bit verbose, but universal. At the same time, if no transformation of the objects of the nested collection is required, then it can be significantly simplified:



concept userEmployments (
    userId = u.id,
    userName = u.name,
    orgName = e. organizationName
) from users u, {u.employment.iterator()} e
      
      





conclusions



This article focused on two issues. First, the transfer of some features of the SQL language to the modeling component: nested queries, outer joins, aggregations, and joins with nested collections. Secondly, the introduction of two new constructions into the modeling component: anonymous definitions of concepts and concepts defined through a function.



An anonymous concept is an abbreviated form of a concept definition, intended to be used as arguments to functions ( find , findOne, and exists ) or as a nested concept definition in a where clause... It can be thought of as analogous to anonymous function definitions in functional programming languages.



A concept defined through a function is a concept, the way of generating objects of which is expressed using an algorithm in an explicit form. It is a kind of "interface" between the worlds of functional or object-oriented programming and logic programming. It will be useful in many cases when a logical way of defining a concept is not convenient or impossible: for example, to load the initial facts from their databases, files or from requests to remote services, to replace the universal logical search with its specific optimized implementation, or to implement any arbitrary rules creating objects.



Interestingly, borrowings from SQL, such as nested queries, outer joins, and nested collection joins, did not require major changes in the logic of the modeling component and were implemented using concepts such as anonymous concepts and concepts through a function. This suggests that these types of concepts are flexible and versatile tools with great expressive power. I think there are many more interesting ways to use them.



So, in this and two previous articles, I described the basic concepts and elements of the modeling component of a hybrid programming language. But, before moving on to the issues of integrating a modeling component with a computation component that implements a functional or object-oriented programming style, I decided to devote the next article to its possible options for its application. From my point of view, the modeling component has advantages over traditional query languages, primarily SQL, and can be used on its own without deep integration with the computation component, which I want to demonstrate with several examples in the next article.



The full scientific text in English is available at: papers.ssrn.com/sol3/papers.cfm?abstract_id=3555711



Links to previous publications:



Designing a multi-paradigm programming language. Part 1 - What is it for?

We design a multi-paradigm programming language. Part 2 - Comparison of building models in PL / SQL, LINQ and GraphQL We

design a multi-paradigm programming language. Part 3 - Overview of knowledge representation languages ​​We

design a multi-paradigm programming language. Part 4 - Basic constructions of the modeling language We

design a multi-paradigm programming language. Part 5 - Features of Logic Programming



All Articles