We design a multi-paradigm programming language. Part 2 - Comparison of Model Building in PL / SQL, LINQ and GraphQL

In the last postI raised the question that the business logic of modern information systems includes a lot of elements, the descriptions of which are declarative in nature: the structure of concepts, the relationships between them, conditions, rules, the transformation of concepts when moving from one layer of the application to another, their unification, filtering, aggregation, etc. From my point of view, functional and object-oriented styles are inferior to logical ones in terms of the convenience of software implementation of the domain model. The logical style conveys relationships between concepts in a more compact and natural way. Therefore, I set myself the goal of creating a hybrid programming language that would combine an object-oriented or functional paradigm with a logical one. Moreover, the logical component should be convenient for describing the domain model - the structure of its concepts,as well as relationships and dependencies between them.



In this post, I want to talk about some popular languages ​​and technologies that include elements of declarative programming - PL / SQL , MS LINQ, and GraphQL . I will try to figure out what tasks are solved in them using declarative programming, how closely the declarative and imperative approaches are intertwined, what advantages it gives, and what ideas can be learned from them.



SQL Procedural Extensions



Let's start with an area in which this association has long become an industry standard - data access languages. The most famous of these is PL / SQL, a procedural extension of the SQL language. This language allows you to process data in a relational database using both imperative (variables, control statements, functions, objects) and declarative programming styles (SQL expressions). Using an SQL query, we can describe what properties the data we need has - what fields are needed, from which tables to take them, how they are related to each other, what constraints they must comply with, how they must be aggregated, etc. And the database server will independently draw up a query execution plan and find all possible sets of fields that meet the specified conditions. The procedural part of PL / SQL allows you to implement those taskswhich are difficult or impossible to express in a declarative form - to process the result of a query in a loop, perform arbitrary calculations, structure the code in functions and classes.



The procedural and declarative components of the language are tightly integrated. PL / SQL allows you to declare functions, execute queries inside them and return their results, use functions inside a query, passing them the values ​​of table fields as arguments. You can access the results of a query using cursors and then imperatively loop through all the records received. Cursors give you more control over the contents of tables and allow you to implement much more complex data processing logic than using SQL alone. A cursor can be assigned to a cursor variable and passed as an argument to functions, procedures, or even a client application. The request code itself can be generated dynamically by a sequence of imperative commands.The combination of procedures and queries, using some tweaks, allows you to implement recursive queries. There are even object-oriented features in PL / SQL that allow you to declare composite data types for table fields, include methods in them, and create classes through inheritance.



PL / SQL allows you to implement business logic on the database server side. Moreover, the implementation of the domain model will be quite close to its description. The basic concepts of the domain model will be mapped to the relational data model. The concepts will correspond to tables, attributes - their fields. Constraints on field values ​​can be embedded in table descriptions. And relationships with other tables can be set using foreign keys. The abstract concepts constructed on the basis of the base ones will correspond to the view. They can be used in queries along with tables, including for building other views. Views are built on the basis of queries, allowing you to harness the full power and flexibility of SQL. In this way,from tables and views, you can build a fairly complex and multi-level domain model completely in a declarative style. And anything that does not fit well into the declarative style can be implemented using procedures and functions.



The main problem is that the PL / SQL code is executed exclusively on the DB server side. This makes it difficult to scale such a solution. In addition, the resulting model will be rigidly bound to the relational database and it will be problematic to include data from other sources in it.



Language Integrated Query



Language Integrated Query (LINQ) is a popular component of the .NET platform that allows you to naturally include SQL query expressions in your main object-oriented language code. In contrast to PL / SQL, which adds an imperative paradigm to SQL on the database server side, LINQ brings SQL to the application level. Because of this, LINQ queries can be used to retrieve data not only from relational databases, but also from collections of objects, XML documents, and other LINQ queries.



LINQ architecture is quite flexible and query definitions are deeply integrated with the OOP model. LINQ allows you to create your own providers to access new data sources. You can also set your own way of executing the query and, for example, convert the LINQ expression tree of the query into a query to the desired data source. You can use lambda expressions and functions defined in your application code in your request body. True, in the case of LINQ to SQL, the query will be executed on the database server side, where these functions will not be available, but stored procedures can be used instead. The request is the essence of the first-level language, you can work with it as with an ordinary object. The compiler is able to automatically infer the type of the query result and generate the appropriate class, even if it has not been explicitly declared.



Let's try using LINQ to build a domain model as a set of queries. The original facts can be placed in lists on the application side or in tables on the database side, and abstract concepts can be formatted as LINQ queries. LINQ allows you to build queries based on other queries by specifying them in the FROM clause. This allows you to construct a new concept based on existing ones. The fields in the SELECT section will correspond to the attributes of the concept. And the WHERE section will contain dependencies between concepts. An example with invoices from a previous publication will look like this.



We will place objects with accounts and customer information in the lists:



List<Bill> bills = new List<Bill>() { ... };
List<Client> clients = new List<Client>() { ... };


And then let's build queries for them to get unpaid bills and debtors:



IEnumerable<Bill> unpaidBillsQuery =
from bill in bills
where bill.AmountToPay > bill.AmountPaid 
select bill;
IEnumerable<Client> debtorsQuery =
from bill in unpaidBillsQuery 
join client in clients on bill.ClientId equals client.ClientId
select client;


The domain model implemented with LINQ has taken on a rather bizarre form - multiple programming styles are involved. The top level of the model has imperative semantics. It can be represented as chains of object transformations, building collections of objects on top of collections. Query objects are elements of the OOP world. They need to be created, assigned to variables, and references to them must be passed to other requests. At the middle level, the query object implements the procedure for executing a query, which is functionally customized with lambda expressions that allow you to form the result structure in the SELECT section and filter out the entries in the WHERE clause. The internal level is represented by a query execution procedure that has logical semantics and is based on relational algebra.



Although LINQ made it possible to describe the domain model, the SQL syntax is aimed primarily at fetching and manipulating data. It lacks some constructs that would be useful in modeling. If in PL / SQL the structure of basic concepts was very clearly represented in the form of tables and views, then in LINQ it turned out to be rendered into OOP code. In addition, while tables and views could be referenced by name, LINQ queries could be referenced in an imperative style. In addition, SQL is limited by the relational model and has limited capabilities when working with structures in the form of graphs or trees.



Parallels between the relational model and logic programming



You can see that the SQL and Prolog implementations of the model have similarities. In SQL, we build a view based on tables or other views, and in Prolog we build rules based on facts and rules. In SQL, tables are a collection of fields, and predicates in Prolog are a collection of attributes. In SQL, we specify dependencies between table fields as expressions in the WHERE clause, and in Prolog, using predicates and boolean variables that link predicate attributes to each other. In both cases, we declaratively set the specification of the solution, and the built-in query execution engine returns the found entries in SQL or possible values ​​of variables in Prolog to us.



This similarity is not accidental. Although the theoretical foundation of SQL - relational algebra was developed in parallel with logic programming, but later a theoretical connection was revealed between them. They have a common mathematical foundation - first-order logic. The relational data model describes the rules for building relationships between data tables, logical programming - between statements. Both theories use different terms, are applied in different fields, were developed in parallel, but they had a common mathematical basis.



Strictly speaking, relational calculus is an adaptation of first-order logic to work with tabular data. This question is discussed in more detail here.... That is, any expression of relational algebra (any SQL query) can be reformulated into an expression of first-order logic, and then implemented in Prolog. But not vice versa. The relational calculus is a subset of first order logic. This means that for some kinds of statements that are admissible in first-order logic, we cannot find analogies in relational algebra. For example, the capabilities of recursive queries in SQL are very limited, and the construction of transitive relations is also not always available. Prolog operations such as target disjunction and negation like refusal are much more difficult to implement in SQL. Prolog's flexible syntax gives you more flexibility for working with complex nested structures and supports pattern matching operations on them.This makes it convenient when working with complex data structures such as trees and graphs.



But you have to pay for everything. Built-in query algorithms in relational databases are simpler and less versatile than inference algorithms in Prolog. This makes it possible to optimize them and achieve much higher performance. Prolog is also unable to quickly process millions of rows in relational databases. In addition, Prolog's inference algorithm does not guarantee the end of program execution at all - the output of some statements can lead to infinite recursion.



By the way, at the intersection of databases and logical programming, there is also such technology as deductive databases and the language of rules and queries to them Datalog. Instead of records in tables, deductive databases store large amounts of facts and rules in a logical style. And Datalog looks like Prolog, but it focuses on working with facts combined into sets, not single facts. In addition, some features of the first-order logic in it were cut in order to optimize the inference algorithm for fast work with large amounts of data. So the less expressive syntax of a logical language also has its advantages.



Declarative approach to API layer description



SQL binds the model building to the data access layer. But declarative programming is also actively developing at the opposite end of the application - in the API layer. Its peculiarity is that information about the structure of requests should be available to those who use this API. Having a formal description of the structure of requests and responses is a good form. Accordingly, there is a desire to synchronize this description with the application code, for example, generate request and response classes based on it. In which you will then need to write the logic for processing requests.



GraphQL is a framework for building APIs that goes far beyond this traditional approach, and offers not only a query language, but also a query execution environment. There is no need to generate code, the runtime understands request descriptions anyway. To implement the API using GraphQL, you need:



  1. describe the data types (objects) of the application that are part of the requests and responses;
  2. describe the structure of requests and responses;
  3. implement functions that implement the logic of creating objects to obtain the values ​​of their fields.


Data types are descriptions of object fields. Types such as scalar types, lists, enumerations, and references to nested types are supported. Since type fields can contain references to other types, the entire data schema can be represented as a graph. The request is a description of the data structure requested from the API. The description of the request includes a list of required objects, their fields, and input attributes. Each data type and each of its fields must be associated with a resolver function. The type (object) resolver describes the algorithm for obtaining its objects, the field resolver describes the object field values. They represent functions in one of the functional or object-oriented languages. The GraphQL runtime receives a request, determines the required data types, calls their resolvers, including along a chain of nested objects,collects a response object.



GraphQL combines declarative data schema description with imperative or functional algorithms for obtaining them. The data schema is described explicitly and is central to the application. Many people point out that it is good practice to create a data schema that does not duplicate data source schemas, but conforms to the domain model. This makes GraphQL quite a popular solution for integrating disparate data sources.



Thus, the GraphQL language allows you to express the domain model in a fairly clear way, to distinguish it from the rest of the code, to bring the model and its implementation closer together. Unfortunately, the declarative component of the language is limited only to the description of the composition of data types; all other relations between model elements have to be implemented using resolvers. On the one hand, resolvers allow a developer to independently implement any method of obtaining data for an object and any relationship between them. But, on the other hand, you will have to try to implement more complex query options than, for example, access to a record by key. On the one hand, the data schema in GraphQL clearly shows the relationship between the API layer and the data access layer. But, on the other hand, the leading layer to which the data schema is bound is the API layer.The content of the data schema adjusts to it; it will not contain entities that are not involved in processing requests. Although the expressive power of the data description language GraphQL is inferior to such full-fledged declarative languages ​​such as SQL and Prolog, the popularity of this framework shows that tools for declarative model description can and should be part of modern programming languages.



I will summarize



PL / SQL is a language that is convenient both for describing a domain model in the form of tables and views, and for the logic of working with it. The declarative and procedural components are tightly integrated and complementary. The main problem is that this language is closely tied to the data storage location, it can only be executed on the database server side, and the query execution logic is limited to the relational data model.



On the application side, you can use technologies such as LINQ and GraphQL to describe the model in a declarative form. Using the GraphQL data schema, you can clearly and very clearly describe the structure of the domain model, the nesting of its concepts. And the runtime is capable of automatically collecting the required objects. Unfortunately, all other relationships and connections between concepts, except for their nesting, have to be implemented in the layer of resolver functions. LINQ has opposite advantages and disadvantages. SQL's flexible syntax gives you more flexibility to describe the relationships between concepts. But outside of the request, declarativeness ends, request objects are elements of the OOP world. They need to be created, assigned to variables, and used in an imperative style.



I would like to combine the advantages of both LINQ and GraphQL. So that the description of the structure of concepts was clear as in GraphQL, and the relationships between them could be set based on logic as in SQL. And so that definitions of concepts are available directly by name as classes, without the need to explicitly create their objects, assign them to variables, pass references to them, etc.



I will start designing such a solution by developing a language for describing a domain model. But for this it is necessary to make an overview of the existing languages ​​of knowledge representation. Therefore, in the next publication I want to talk about logic programming, RDF, OWL and frame logic languages, compare them and try to find such features that would be interesting for the designed language for describing business logic.



For those who do not want to wait for the release of all publications on Habré, there is a full text in a scientific style in English, available at the link: Hybrid Ontology-Oriented Programming for Semi-Structured Data Processing .



Links to previous publications:

Designing a multi-paradigm programming language. Part 1 - What is it for?



All Articles