We design a multi-paradigm programming language. Part 4 - Basic Constructions of the Modeling Language

We continue the story about creating a multi-paradigm programming language that combines a declarative style with object-oriented and functional ones, and that would be convenient for working with semi-structured data and integrating data from disparate sources. After the introduction and the reviews of existing multi-paradigm technologies and knowledge representation languages, we have finally reached the part of the hybrid language that is responsible for describing the domain model. I named it the modeling component.



The modeling component is intended for a declarative description of a domain model in the form of an ontology: a network of data instances (facts) and abstract concepts interconnected through relationships. It is based on frame logic, a hybrid of an object-oriented approach to knowledge representation and first-order logic. Its main element is a concept that describes a modeled object using a set of attributes. A concept is built on the basis of other concepts or facts; the initial concepts will be called parent concepts and the derived one the child concept. Relationships bind the attribute values of the child and parent concepts or constrain their possible values. I decided to include relationships in the definition of the concept so that all information about it is, as far as possible, in one place. The syntax of concept definitions will be similar to SQL: attributes, parent concepts, and the relationships between them are separated into different sections.



In this post, I want to present the main ways of defining concepts.



First, a concept that is built by transforming parent concepts.

Secondly, the object-oriented style implies inheritance, which means that you need a mechanism for creating a concept by inheriting the attributes and relations of parent concepts, expanding or narrowing them.



Third, I believe it would be useful to have a mechanism for defining relationships between peer concepts, without dividing them into child and parent concepts.

Now let's take a detailed look at the main types of concepts in the modeling component.



Let's start with the facts



Facts represent a description of specific knowledge about a domain in the form of a named set of key-value pairs:



fact <fact name> {
	<attribute name> : <attribute value>
	...
}


For instance:



fact product {
	name: "Cabernet Sauvignon",
	type: "red wine",
	country: "Chile"
}


A fact name does not have to be unique: for example, there may be many products with different names, types and countries of origin. We will consider two facts identical if their names coincide and the names and values of their attributes coincide.



An analogy can be drawn between facts in the modeling component and facts in Prolog. They differ only in syntax. In Prolog, fact arguments are identified by their position, and the attributes of the facts of the modeling component are identified by name.



Concepts



A concept is a structure that describes an abstract entity and is based on other concepts and facts. The definition of a concept includes a name, a list of attributes, and a list of parent concepts. It also includes a logical expression that describes the dependencies between the attributes of the child concept and the attributes of the parent concepts, which makes it possible to derive the values of the child concept's attributes:



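concept <concept name> <concept alias> (
	<attribute name> = <expression>,
	...
)
from
	<parent concept name> <parent concept alias> (
		<attribute name> = <expression>
		...
	),
	...
where <relations expression>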


An example of defining profit based on revenue and cost:



concept profit p (
	value = r.value - c.value,
	date
) from revenue r, cost c
where p.date = r.date = c.date


The definition of a concept is similar in form to an SQL query, but instead of table names you specify the names of the parent concepts, and instead of the returned columns you specify the attributes of the child concept. In addition, a concept has a name by which it can be referred to in the definitions of other concepts or in queries to the model. A parent can be either a concept or a fact. The relationship expression in the where clause is a boolean expression that can include logical operators, equality conditions, arithmetic operators, function calls, etc. Their arguments can be variables, constants, and references to attributes of both parent and child concepts. Attribute references have the following format:



<concept alias>.<attribute name>


Compared to frame logic, the definition of a concept combines its structure (attributes) with its relationships to other concepts (the parent concepts and the relationship expression). From my point of view, this makes the code easier to understand, since all information about the concept is collected in one place. It also complies with the principle of encapsulation, in the sense that the implementation details of a concept are hidden within its definition. For comparison, a small example in the language of frame logic can be found in the previous publication.



The relationship expression has a conjunctive form (it consists of subexpressions connected by the logical operation "AND") and must include equality conditions for all attributes of the child concept, sufficient to determine their values. In addition, it can include conditions that restrict the values of the parent concepts or connect them with each other. If not all parent concepts are related in the where clause, the inference engine will return all possible combinations of their values as a result (similar to the CROSS JOIN operation in SQL).
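To make the SQL analogy concrete, here is a minimal Python sketch of how the profit concept could be evaluated over in-memory data. It is an illustration only: the sample revenue and cost entities and the link_dates flag are my assumptions, and a real inference engine would not enumerate all pairs first. With the date condition every profit entity is built from a matching pair; without it, every combination of the parents is returned.

```python
from itertools import product

# Hypothetical source entities of the parent concepts.
revenue = [{"value": 100, "date": "2020-01"}, {"value": 120, "date": "2020-02"}]
cost = [{"value": 70, "date": "2020-01"}, {"value": 90, "date": "2020-02"}]


def profit(link_dates: bool = True) -> list:
    """Derive profit entities from every (revenue, cost) pair that satisfies the where clause."""
    result = []
    for r, c in product(revenue, cost):
        if link_dates and r["date"] != c["date"]:
            continue  # violates: where p.date = r.date = c.date
        result.append({"value": r["value"] - c["value"], "date": r["date"]})
    return result


linked = profit()                    # parents related through the date attribute
unlinked = profit(link_dates=False)  # parents unrelated: all combinations are returned
```

With two revenue and two cost entities, the linked variant yields two profit entities and the unlinked one yields four, which is the Cartesian product of the parents.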



For convenience, some of the equality conditions for attributes can be placed in the attributes sections of the child and parent concepts. For example, in the definition of profit above, the condition for the value attribute is moved to the attributes section, while the condition for the date attribute is left in the where section. You can also move such conditions to the from section:



concept profit p (
	value = r.value - c.value,
	date = r.date
) from revenue r, cost c (date = r.date)


This syntactic sugar allows you to make the dependencies between attributes more explicit and distinguish them from other conditions.



Concepts resemble rules in Prolog but have slightly different semantics. Prolog focuses on constructing logically related statements and posing questions to them. Concepts are primarily intended for structuring input data and extracting information from it. Concept attributes correspond to the arguments of Prolog terms. But whereas in Prolog the arguments of terms are bound using variables, in our case attributes can be accessed directly by name.



Since the list of parent concepts and the relationship conditions are placed in separate sections, inference will differ slightly from that in Prolog. I will describe its algorithm in general terms. Parent concepts are derived in the order in which they are specified in the from section. The search for a solution for the next concept is performed for each partial solution of the previous concepts, in the same way as in SLD resolution. In addition, for each partial solution, the validity of the relationship expression from the where clause is checked. Since this expression is a conjunction, each subexpression is tested separately. If a subexpression is false, the partial solution is rejected and the search proceeds to the next one. If some arguments of a subexpression are not yet defined (not bound to values), its check is postponed. If the subexpression is an equality operator and only one of its arguments is defined, the inference system computes its value and tries to bind it to the remaining argument. This is possible if the free argument is an attribute or a variable.



For example, when deriving the entities of the profit concept, the entities of the revenue concept, and hence the values of its attributes, will be found first. Then the equality p.date = r.date = c.date in the where section makes it possible to bind the date attributes of the other concepts to values. When the search reaches the cost concept, the value of its date attribute will already be known and will serve as an input argument for this branch of the search tree. I plan to discuss inference algorithms in detail in one of the next publications.
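The steps above can be sketched in Python for the special case where the where clause contains only equalities between attributes. This is a simplified illustration under my own assumptions, not the actual inference engine: the functions propagate and solve, the representation of an attribute reference as an (alias, attribute) pair, and the sample data are all hypothetical.

```python
from typing import Optional


def propagate(bindings: dict, equalities: list) -> Optional[dict]:
    """Check or propagate each equality; return None if one of them is violated."""
    bindings = dict(bindings)
    changed = True
    while changed:
        changed = False
        for left, right in equalities:
            if left in bindings and right in bindings:
                if bindings[left] != bindings[right]:
                    return None                    # subexpression is false: reject
            elif left in bindings:
                bindings[right] = bindings[left]   # bind the free argument
                changed = True
            elif right in bindings:
                bindings[left] = bindings[right]
                changed = True
            # otherwise neither side is known yet, so the check is postponed
    return bindings


def solve(parents: list, equalities: list) -> list:
    """Derive parents in `from` order, extending every partial solution in turn."""
    partial = [{}]
    for alias, entities in parents:
        extended = []
        for bindings in partial:
            for entity in entities:
                candidate = dict(bindings)
                consistent = True
                for attr, value in entity.items():
                    key = (alias, attr)
                    if key in candidate and candidate[key] != value:
                        consistent = False         # contradicts an already bound input argument
                        break
                    candidate[key] = value
                if not consistent:
                    continue
                candidate = propagate(candidate, equalities)
                if candidate is not None:
                    extended.append(candidate)
        partial = extended
    return partial


revenue = [{"value": 100, "date": "2020-01"}, {"value": 120, "date": "2020-02"}]
cost = [{"value": 90, "date": "2020-02"}, {"value": 70, "date": "2020-01"}]
# where p.date = r.date = c.date, split into two equality subexpressions
equalities = [(("p", "date"), ("r", "date")), (("r", "date"), ("c", "date"))]

solutions = solve([("r", revenue), ("c", cost)], equalities)
profits = [
    {"value": s[("r", "value")] - s[("c", "value")], "date": s[("p", "date")]}
    for s in solutions
]
```

After a revenue entity is chosen, c.date is already bound, so cost entities with another date are rejected as soon as they are tried. A real engine could use the bound value as a lookup key instead of scanning all cost entities.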



The difference from Prolog is that everything in a Prolog rule is a predicate: both references to other rules and built-in predicates for equality, comparison, and so on. The order in which they are checked must be specified explicitly; for example, the two goals must come first and the arithmetic on their variables last:



profit(Value, Date) :- revenue(RValue, Date), cost(CValue, Date), Value is RValue - CValue.


They will be executed in exactly this order. In the modeling component, it is assumed that all conditions in the where clause are computed deterministically, that is, they do not require a recursive descent into the next search branch. Since their computation depends only on their arguments, they can be computed in arbitrary order as the arguments become bound to values.



As a result of inference, all attributes of the child concept must be bound to values. In addition, the relationship expression must be true and must not contain undefined subexpressions. It is worth noting that the derivation of parent concepts does not have to succeed. There are cases where you need to check that a parent concept cannot be derived from the source data, for example in negation operations. The order of parent concepts in the from section determines the order in which the decision tree is traversed. This makes it possible to optimize the search for a solution by starting with the concepts that restrict the search space most strongly.



The task of inference is to find all possible substitutions for the attributes of the child concept and to represent each of them as an object. Such objects are considered identical if their concept names, attribute names and attribute values match.



It is acceptable to create several concepts with the same name but different implementations, including different sets of attributes. These can be different versions of the same concept, related concepts that can conveniently be combined under one name, identical concepts from different sources, etc. During inference, all existing definitions of the concept will be considered, and the results of their search will be combined. Several concepts with the same name are analogous to a Prolog rule whose body has a disjunctive form (its terms are ORed).



Concept inheritance



One of the most common relationships between concepts is a hierarchical one, such as genus-species. Its peculiarity is that the structures of the child and parent concepts are very similar. Therefore, support for inheritance at the syntax level is very important; without it, programs will be full of repetitive code. When constructing a network of concepts, it would be convenient to reuse both their attributes and their relationships. While it is easy to expand or shorten the list of attributes or to redefine some of them, modifying relationships is more complicated. Since they form a logical expression in conjunctive form, it is easy to add further subexpressions to it. However, deleting or changing subexpressions can require a significant complication of the syntax. The benefits of this are not so obvious; therefore, we will postpone this task for the future.



You can declare a concept based on inheritance using the following construction:



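concept <concept name> <concept alias> is
	<parent concept name> <parent concept alias> (
		<attribute name> = <expression>,
		...
	),
	...
with <attribute name> = <expression>, ...
without <parent attribute name>, ...
where <relations expression>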


The is section contains a list of inherited concepts. Their names can be specified directly in this section. Alternatively, you can specify the complete list of parent concepts in the from section and list in the is section the aliases of only those that will be inherited:



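concept <concept name> <concept alias> is
	<parent concept alias>,
	...
from
	<parent concept name> <parent concept alias> (
		<attribute name> = <expression>
		...
	),
	...
with <attribute name> = <expression>, ...
without <parent attribute name>, ...
where <relations expression>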


The with section allows you to expand the list of attributes of the inherited concepts or to override some of them; the without section allows you to shorten it.



The inference algorithm of a concept based on inheritance is the same as that of the concept discussed above. The only difference is that the list of attributes is automatically generated based on the list of attributes of the parent concept, and the expression of relations is supplemented with operations of equality of attributes of the child and parent concepts.



Let's consider several examples of using the inheritance mechanism. Inheritance allows you to create a concept based on an existing one, getting rid of those attributes that are meaningful only for the parent, but not for the child concept. For example, if the source data is presented in the form of a table, then the cells of certain columns can be given their own names (getting rid of the attribute with the column number):



concept revenue is tableCell without columnNum where columnNum = 2
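As a rough illustration of what such a definition does, the following Python sketch models inheritance as a transformation over the entities of the parent concept: filter them (where), add or override attributes (with), and drop attributes (without). The helper inherit and the sample table cells are my assumptions, not part of the language.

```python
def inherit(parent_entities, where=lambda e: True, with_=None, without=()):
    """Derive child entities: filter, add or override attributes, then drop attributes."""
    with_ = with_ or {}
    children = []
    for entity in parent_entities:
        if not where(entity):
            continue
        child = dict(entity)                 # inherit all parent attributes
        for name, compute in with_.items():
            child[name] = compute(entity)    # `with`: extend or override
        for name in without:
            child.pop(name, None)            # `without`: shorten
        children.append(child)
    return children


# Hypothetical table cells: column 1 holds dates, column 2 holds revenue values.
table_cells = [
    {"rowNum": 1, "columnNum": 1, "value": "2020-01"},
    {"rowNum": 1, "columnNum": 2, "value": 100},
    {"rowNum": 2, "columnNum": 2, "value": 120},
]

# concept revenue is tableCell without columnNum where columnNum = 2
revenue = inherit(table_cells, where=lambda c: c["columnNum"] == 2, without=("columnNum",))
```

Note that the where condition is evaluated on the parent entity, so it can still refer to columnNum even though the child concept no longer exposes that attribute.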


You can also convert multiple related concepts into one generic form. The with section is needed to convert some of the attributes to the general format and add the missing ones. For example, the source data can be documents of different versions, the list of fields of which has changed over time:



concept resume is resumeV1 with skills = 'N/A'
concept resume is resumeV2 r with skills = r.coreSkills


Let's assume that the first version of the "Resume" concept did not have an attribute with skills, and that in the second version this attribute had a different name.
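The effect of having two definitions with the same name can be sketched in Python as follows. The sample resumes and the function names are hypothetical; the point is only that a query on resume evaluates every definition with that name and combines the results.

```python
# Hypothetical source entities of the two document versions.
resume_v1 = [{"name": "Ann", "experience": 5}]
resume_v2 = [{"name": "Bob", "experience": 3, "coreSkills": "SQL, Prolog"}]


def resume_from_v1(source):
    # concept resume is resumeV1 with skills = 'N/A'
    return [{**r, "skills": "N/A"} for r in source]


def resume_from_v2(source):
    # concept resume is resumeV2 r with skills = r.coreSkills
    return [{**r, "skills": r["coreSkills"]} for r in source]


def query_resume():
    """Evaluate every definition named `resume` and merge the results."""
    return resume_from_v1(resume_v1) + resume_from_v2(resume_v2)


resumes = query_resume()
```

Both kinds of source documents now expose a skills attribute, so code that consumes resume does not need to know which version an entity came from. The second definition keeps coreSkills as well, because nothing removes it with a without section.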



Expanding the list of attributes may be required in many cases. Common tasks are changing the format of attributes, adding attributes that functionally depend on existing attributes or external data, etc. For instance:



concept price is basicPrice with valueUSD = valueEUR * getCurrentRate('USD', 'EUR')


It is also possible to simply combine several concepts under one name without changing their structure. For example, to indicate that they are of the same genus:



concept webPageElement is webPageLink
concept webPageElement is webPageInput


Or create a subset of a concept by filtering out some of its entities:



concept exceptionalPerformer is employee where performanceEvaluationScore > 0.95


Multiple inheritance is also possible, in which a child concept inherits the attributes of all parent concepts. If attribute names coincide, priority is given to the parent concept that appears further to the left in the list. You can also resolve this conflict manually by explicitly overriding the desired attribute in the with section. For example, this kind of inheritance is convenient if you need to collect several related concepts into one "flat" structure:



concept employeeInfo is employee e, department d where e.departmentId = d.id 
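The priority rule for clashing attribute names can be sketched in Python. The helper merge_parents, the sample data and the departmentName attribute are assumptions made for illustration: both parents have id and name attributes, so the values of employee (the leftmost parent) win, and the department name has to be kept explicitly, which corresponds to an override in the with section.

```python
def merge_parents(*parents):
    """Merge parent entities; on a name clash the leftmost parent wins."""
    merged = {}
    for parent in reversed(parents):
        merged.update(parent)   # parents further left are applied last and overwrite
    return merged


employees = [{"id": 1, "name": "Ann", "departmentId": 10}]
departments = [{"id": 10, "name": "R&D"}]

# concept employeeInfo is employee e, department d where e.departmentId = d.id
# plus a hypothetical `with departmentName = d.name` to keep the clashing attribute
employee_info = [
    {**merge_parents(e, d), "departmentName": d["name"]}
    for e in employees
    for d in departments
    if e["departmentId"] == d["id"]
]
```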


Inheritance without changing the structure of concepts complicates checking the identity of objects. As an example, consider the definition of exceptionalPerformer. Queries on the parent (employee) and child (exceptionalPerformer) concepts will return the same employee entity. The objects representing it will be identical in meaning. They will have a common data source and the same list of attributes and attribute values, but a different concept name, depending on which concept the query was made to. Therefore, the object equality operation must take this feature into account. Concept names are considered equal if they coincide or are linked by a transitive inheritance relationship that does not change the structure.
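One possible way to implement such an equality check is sketched below in Python. The registry SAME_STRUCTURE_PARENTS and the object layout (a concept name plus a dictionary of attributes) are my assumptions; the article does not prescribe a representation.

```python
# Hypothetical registry: child concept -> parents inherited without changing the structure.
SAME_STRUCTURE_PARENTS = {"exceptionalPerformer": {"employee"}}


def ancestors(concept: str) -> set:
    """Return the concept plus everything reachable through structure-preserving inheritance."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SAME_STRUCTURE_PARENTS.get(current, ()))
    return seen


def names_equal(a: str, b: str) -> bool:
    """Names are equal if they coincide or one transitively inherits from the other."""
    return a in ancestors(b) or b in ancestors(a)


def objects_equal(x: dict, y: dict) -> bool:
    return names_equal(x["concept"], y["concept"]) and x["attributes"] == y["attributes"]


ann = {"name": "Ann", "performanceEvaluationScore": 0.97}
as_employee = {"concept": "employee", "attributes": ann}
as_top_performer = {"concept": "exceptionalPerformer", "attributes": dict(ann)}
```

Here as_employee and as_top_performer compare equal, while objects of unrelated concepts do not, even if their attributes happen to match.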



Inheritance is a useful mechanism that allows you to explicitly express relationships between concepts such as class-subclass, particular-general, and set-subset. It also helps to get rid of duplicate code in concept definitions and makes the code easier to understand. The inheritance mechanism is based on adding and removing attributes, combining several concepts under one name, and adding filtering conditions. No special semantics are built into it; everyone can interpret and apply it as they see fit. For example, you can build a hierarchy from the particular to the general, as in the examples with the concepts resume, price and webPageElement. Or, conversely, from the general to the particular, as in the examples with the concepts revenue and exceptionalPerformer. This allows you to adapt flexibly to the specifics of the data sources.



Concept for describing relationships



It was decided that, to make the code easier to understand and to facilitate the integration of the modeling component with the OOP model, the relationship of a child concept with its parents should be built into its definition. Thus, these relationships define the way a child concept is obtained from the parent ones. If the domain model is built in layers, and each new layer is based on the previous one, this is justified. But in some cases the relationship between concepts must be declared separately rather than included in the definition of one of the concepts. It can be a universal relationship that you want to define in general terms and apply to different concepts, for example, the Parent-Child relationship. Or a relationship connecting two concepts may have to be included in the definitions of both concepts, so that it is possible to find the entities of the first concept given the known attributes of the second, and vice versa. Then, in order to avoid code duplication, it is convenient to define the relationship separately.



In the definition of a relationship, it is necessary to list the concepts included in it and set a logical expression connecting them to each other:



relation <relation name>
between <nested concept name> <nested concept alias> (
	<attribute name> = <expression>,
	...
),
...
where <relations expression>


For example, a relationship describing nested rectangles can be defined as follows:



relation insideSquareRelation between square inner, square outer 
where inner.xLeft > outer.xLeft and inner.xRight < outer.xRight 
and inner.yBottom > outer.yBottom and inner.yUp < outer.yUp


Such a relationship is, in fact, an ordinary concept whose attributes are the entities of the nested concepts:



concept insideSquare (
	inner = i,
	outer = o
) from square i, square o
where i.xLeft > o.xLeft and i.xRight < o.xRight
and i.yBottom > o.yBottom and i.yUp < o.yUp


A relationship can be used in concept definitions along with other parent concepts. The concepts included in the relationship will be accessible from the outside and will play the role of its attributes. The attribute names will match the nested concept aliases. The following example states that the HTML form includes those HTML elements that are located inside it on the HTML page:



concept htmlFormElement is e
from htmlForm f, insideSquareRelation(inner = e, outer = f), htmlElement e


When searching for a solution, all the values of the htmlForm concept will be found first; then they will be bound to the nested concept outer of the relation insideSquareRelation, and the values of its inner attribute will be found. Finally, only those inner values that belong to the htmlElement concept will be kept.



The relationship can also be given functional semantics - it can be used as a function of a Boolean type to check whether the relationship is satisfied for the given nested concept entities:



concept htmlFormElement is e
from htmlElement e, htmlForm f
where insideSquareRelation(e, f)


Unlike the previous case, here the relation is treated as a function, which affects the order of inference. The evaluation of the function is deferred until all its arguments are bound to values. That is, first all combinations of values of the concepts htmlElement and htmlForm will be found, and then those that do not satisfy the relation insideSquareRelation will be filtered out. I plan to talk in more detail about the integration of the logical and functional programming paradigms in one of the next publications.
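The two ways of using the relation can be compared in a small Python sketch. It is only a model of the evaluation order under my assumptions: the sample squares, the decoration entity that is a square but not an HTML element, and the function names are hypothetical.

```python
from itertools import product


def inside(inner: dict, outer: dict) -> bool:
    """Boolean form of insideSquareRelation."""
    return (inner["xLeft"] > outer["xLeft"] and inner["xRight"] < outer["xRight"]
            and inner["yBottom"] > outer["yBottom"] and inner["yUp"] < outer["yUp"])


forms = [{"id": "form", "xLeft": 0, "xRight": 10, "yBottom": 0, "yUp": 10}]
elements = [
    {"id": "input", "xLeft": 1, "xRight": 5, "yBottom": 1, "yUp": 2},
    {"id": "footer", "xLeft": 0, "xRight": 10, "yBottom": 20, "yUp": 22},
]
# A square that lies inside the form but is not an htmlElement.
decoration = {"id": "decoration", "xLeft": 6, "xRight": 9, "yBottom": 6, "yUp": 9}
all_squares = forms + elements + [decoration]


def via_relation() -> list:
    """Relation as a parent concept: forms first, then inner squares, then the htmlElement check."""
    result = []
    for f in forms:
        inner_candidates = [s for s in all_squares if inside(s, f)]
        result += [s["id"] for s in inner_candidates if s in elements]
    return result


def via_function() -> list:
    """Relation as a Boolean function: enumerate all pairs first, then filter them."""
    return [e["id"] for e, f in product(elements, forms) if inside(e, f)]
```

Both variants return the same elements; they differ only in the order in which candidates are generated and checked, which matters for performance on large inputs.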



Now it's time to look at a small example.



The definitions of facts and of the basic types of concepts are sufficient to implement the example with debtors from the first publication. Suppose we have two CSV files storing customer information (customer ID, name and email address) and invoices (invoice ID, customer ID, date, amount due, amount paid).



And also there is a certain procedure that reads the contents of these files and converts them into a set of facts:



fact cell {
	table: "TableClients",
	value: 1,
	rowNum: 1,
	columnNum: 1
};
fact cell {
	table: "TableClients",
	value: "John",
	rowNum: 1,
	columnNum: 2
};
fact cell {
	table: "TableClients",
	value: "john@somewhere.net",
	rowNum: 1,
	columnNum: 3
};
fact cell {
	table: "TableBills",
	value: 1,
	rowNum: 1,
	columnNum: 1
};
fact cell {
	table: "TableBills",
	value: 1,
	rowNum: 1,
	columnNum: 2
};
fact cell {
	table: "TableBills",
	value: 2020-01-01,
	rowNum: 1,
	columnNum: 3
};
fact cell {
	table: "TableBills",
	value: 100,
	rowNum: 1,
	columnNum: 4
};
fact cell {
	table: "TableBills",
	value: 50,
	rowNum: 1,
	columnNum: 5
};


First, let's give the table cells meaningful names:



concept clientId is cell where table = "TableClients" and columnNum = 1;
concept clientName is cell where table = "TableClients" and columnNum = 2;
concept clientEmail is cell where table = "TableClients" and columnNum = 3;
concept billId is cell where table = "TableBills" and columnNum = 1;
concept billClientId is cell where table = "TableBills" and columnNum = 2;
concept billDate is cell where table = "TableBills" and columnNum = 3;
concept billAmountToPay is cell where table = "TableBills" and columnNum = 4;
concept billAmountPaid is cell where table = "TableBills" and columnNum = 5;


Now you can combine cells of one row into a single object:



concept client (
	id = id.value,
	name = name.value,
	email = email.value
) from clientId id, clientName name, clientEmail email
where id.rowNum = name.rowNum = email.rowNum;


concept bill (
	id = id.value,
	clientId = clientId.value,
	date = date.value,
	amountToPay = toPay.value,
	amountPaid = paid.value
) from billId id, billClientId clientId, billDate date, billAmountToPay  toPay,  billAmountPaid  paid
where id.rowNum = clientId.rowNum = date.rowNum = toPay.rowNum = paid.rowNum;


Let's introduce the concepts "Unpaid invoice" and "Debtor":



concept unpaidBill is bill where amountToPay >  amountPaid;
concept debtor is client c where exist(unpaidBill {clientId: c.id});


Both definitions use inheritance: the concept unpaidBill is a subset of the concept bill, and debtor is a subset of the concept client. The definition of debtor contains a subquery on the unpaidBill concept. We will consider the mechanism of nested queries in detail in one of the following publications.



As an example of a "flat" concept, let us also define the concept "Client debt", in which we combine some fields from the client and bill concepts:



concept clientDebt (
	clientName = c.name,
	billDate = b.date,
	debt = b.amountToPay - b.amountPaid
) from unpaidBill b, client c (id = b.clientId);


The dependency between the attributes of the concepts client and bill is moved to the from section, and the dependencies of the child concept clientDebt are moved to its attributes section. If desired, they can all be placed in the where section; the result will be the same. But from my point of view, the current version is more concise and better emphasizes the purpose of these dependencies, which is to define relationships between concepts.
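For comparison, here is roughly what the same chain of definitions computes, written as a Python sketch over the cell facts above. It is not how the modeling component would evaluate the model: the tuple layout of cells and the helper rows are assumptions made for brevity.

```python
# Cell facts as (table, rowNum, columnNum, value) tuples.
cells = [
    ("TableClients", 1, 1, 1), ("TableClients", 1, 2, "John"),
    ("TableClients", 1, 3, "john@somewhere.net"),
    ("TableBills", 1, 1, 1), ("TableBills", 1, 2, 1), ("TableBills", 1, 3, "2020-01-01"),
    ("TableBills", 1, 4, 100), ("TableBills", 1, 5, 50),
]


def rows(table: str, columns: list) -> list:
    """Group the cells of one table by row and give each column a meaningful name."""
    grouped = {}
    for cell_table, row, col, value in cells:
        if cell_table == table:
            grouped.setdefault(row, {})[columns[col - 1]] = value
    return [grouped[row] for row in sorted(grouped)]


clients = rows("TableClients", ["id", "name", "email"])
bills = rows("TableBills", ["id", "clientId", "date", "amountToPay", "amountPaid"])

# concept unpaidBill is bill where amountToPay > amountPaid
unpaid_bills = [b for b in bills if b["amountToPay"] > b["amountPaid"]]
# concept debtor is client c where exist(unpaidBill {clientId: c.id})
debtors = [c for c in clients if any(b["clientId"] == c["id"] for b in unpaid_bills)]
# concept clientDebt (...) from unpaidBill b, client c (id = b.clientId)
client_debts = [
    {"clientName": c["name"], "billDate": b["date"], "debt": b["amountToPay"] - b["amountPaid"]}
    for b in unpaid_bills
    for c in clients
    if c["id"] == b["clientId"]
]
```

The imperative version has to spell out the grouping and the joins by hand, which is exactly the part the declarative definitions leave to the inference engine.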



Now let's try to define the concept of a malicious defaulter who has at least 3 unpaid invoices in a row. To do this, you need a relationship that allows you to order the invoices of one customer by their date. A generic definition would look like this:



relation billsOrder between bill next, bill prev
where next.date > prev.date and next.clientId = prev.clientId and not exist(
    bill inBetween 
    where  next.clientId = inBetween.clientId 
    and  next.date > inBetween.date  > prev.date
);


It states that two invoices go in a row if they belong to the same customer, the date of one is greater than the date of the other, and there is no other invoice lying between them. At this stage, I do not want to dwell on the computational complexity of such a definition. But if, for example, we know that all invoices are issued with an interval of 1 month, then it can be greatly simplified:



relation billsOrder between bill next, bill prev
where next.date = prev.date + 1 month and next.clientId = prev.clientId;


The sequence of 3 unpaid invoices will look like this:



concept unpaidBillsSequence (clientId = b1.clientId, bill1 = b1, bill2 = b2, bill3 = b3)
from
    unpaidBill b1,
    billsOrder next1 (next = b1, prev = b2),
    unpaidBill b2,
    billsOrder next2 (next = b2, prev = b3),
    unpaidBill b3;


In this concept, all unpaid invoices are found first; then, for each of them, the preceding invoice is found using the next1 relation. The concept b2 makes it possible to verify that this invoice is unpaid. By the same principle, using next2 and b3, the third unpaid invoice in a row is found. The customer identifier has been added to the list of attributes separately, in order to make it easier to connect this concept with the client concept:



concept hardCoreDefaulter is client c where exist(unpaidBillsSequence{clientId: c.id});
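The logic of billsOrder and unpaidBillsSequence can be checked on sample data with a short Python sketch. The bills below are hypothetical, and dates are kept as ISO strings so that string comparison matches chronological order. Client 1 has three unpaid invoices in a row, while client 2 has a paid invoice between two unpaid ones.

```python
bills = [
    {"id": 1, "clientId": 1, "date": "2020-01-01", "amountToPay": 100, "amountPaid": 50},
    {"id": 2, "clientId": 1, "date": "2020-02-01", "amountToPay": 100, "amountPaid": 0},
    {"id": 3, "clientId": 1, "date": "2020-03-01", "amountToPay": 100, "amountPaid": 20},
    {"id": 4, "clientId": 2, "date": "2020-01-01", "amountToPay": 100, "amountPaid": 0},
    {"id": 5, "clientId": 2, "date": "2020-02-01", "amountToPay": 100, "amountPaid": 100},
    {"id": 6, "clientId": 2, "date": "2020-03-01", "amountToPay": 100, "amountPaid": 0},
]


def is_unpaid(bill: dict) -> bool:
    return bill["amountToPay"] > bill["amountPaid"]


def bills_order(next_bill: dict, prev_bill: dict) -> bool:
    """True if prev_bill directly precedes next_bill among the bills of the same client."""
    if next_bill["clientId"] != prev_bill["clientId"] or not next_bill["date"] > prev_bill["date"]:
        return False
    # not exist(bill inBetween ...): no bill of this client lies strictly between the two
    return not any(
        b["clientId"] == next_bill["clientId"] and next_bill["date"] > b["date"] > prev_bill["date"]
        for b in bills
    )


def unpaid_sequences() -> list:
    """All chains of three consecutive unpaid bills, newest first (bill1, bill2, bill3)."""
    unpaid = [b for b in bills if is_unpaid(b)]
    return [
        {"clientId": b1["clientId"], "bill1": b1, "bill2": b2, "bill3": b3}
        for b1 in unpaid
        for b2 in unpaid if bills_order(b1, b2)
        for b3 in unpaid if bills_order(b2, b3)
    ]


hard_core_defaulters = sorted({s["clientId"] for s in unpaid_sequences()})
```

This naive version compares every pair of bills against all bills, which is the computational cost of the generic definition mentioned above.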


The debtor example demonstrates how a domain model can be fully described in a declarative style. Compared to the implementation of this example in OOP or functional style, the resulting code is very concise, understandable and close to the description of the problem in natural language.



Brief conclusions.



So, I proposed three main kinds of concepts of the hybrid language modeling component:



  • concepts created on the basis of the transformation of other concepts;
  • concepts that inherit the structure and relationships of other concepts;
  • concepts that define relationships between other concepts.


These three types of concepts have different forms and purposes, but the internal logic of finding solutions is the same for them, only the method of forming the list of attributes differs.



Concept definitions resemble SQL queries - both in form and in the internal logic of execution. Therefore, I hope that the proposed language will be understandable to developers and have a relatively low entry threshold. And additional features such as the use of concepts in the definitions of other concepts, inheritance, derived relations, and recursive definitions will allow you to go beyond SQL and make it easier to structure and reuse code.



Unlike RDF and OWL, the modeling component does not distinguish between concepts and relationships: everything is a concept. In contrast to frame logic languages, the frames that describe the structure of a concept and the rules that define the connections between concepts are combined. Unlike traditional logic programming languages such as Prolog, the main elements of the model are concepts, which have an object-oriented structure, rather than rules, which have a flat structure. This language design may not be as convenient for creating large-scale ontologies or rule sets, but it is much better suited for working with semi-structured data and for integrating disparate data sources. The concepts of the modeling component are close to the classes of the OOP model, which should make it easier to include a declarative description of the model in application code.



The description of the modeling component is not yet complete. In the next article, I plan to discuss such issues from the world of computer logic as boolean variables, negation, and elements of higher-order logic. And after that - nested definitions of concepts, aggregation and concepts that generate their entities using a given function.



The full text in a scientific style in English is available at: papers.ssrn.com/sol3/papers.cfm?abstract_id=3555711



Links to previous publications:



Designing a multi-paradigm programming language. Part 1 - What is it for?

We design a multi-paradigm programming language. Part 2 - Comparison of Model Building in PL / SQL, LINQ and GraphQL

We design a multi-paradigm programming language. Part 3 - Overview of Knowledge Representation Languages


