Machine Learning Web Server “VKF Solver”

Nowadays, in the eyes of the general public, machine learning is strongly associated with various ways of training neural networks. While the earliest of these were fully connected networks, later displaced by convolutional and recurrent ones, the field has since moved on to quite exotic variants such as GANs and LSTM networks. Besides the ever-growing volume of samples required for their training, they still suffer from the impossibility of explaining why a particular decision was made. But structural approaches to machine learning also exist, and this article describes the software implementation of one of them.







This is an approach to machine learning developed in Russia, called the VKF method: machine learning based on lattice theory. The history of its origin and the choice of the name are explained at the very end of this article.



1. Description of the method



Initially, the entire system was created by the author as a console application in C++, then it was hooked up to a database under the MariaDB DBMS (using the mariadb++ library), and finally turned into a Python library (using the pybind11 package).

Several datasets from the machine learning repository of the University of California, Irvine were selected as test data for the machine learning algorithms.



On the Mushroom dataset, containing descriptions of 8,124 North American mushrooms, the system showed 100% accuracy. More precisely, a random number generator divided the initial data into a training set (2,088 edible and 1,944 poisonous mushrooms) and a test set (2,120 edible and 1,972 poisonous). After computing about 100 hypotheses about the causes of edibility, all test cases were predicted correctly. Since the algorithm uses a coupled Markov chain, the sufficient number of hypotheses can vary: quite often, generating 50 random hypotheses was enough. Note that when generating the causes of poisonousness, the number of required hypotheses clusters around 120; nevertheless, all test cases are again predicted correctly. Kaggle.com hosts a Mushroom Classification competition where quite a few authors have achieved 100% accuracy, but most of the solutions are neural networks. Our approach lets the mushroom picker learn only about 50 rules. Since most of the features are insignificant, each hypothesis is a conjunction of a small number of values of the essential features, which makes them easy to remember. After that, the mushroom picker can go mushroom hunting without fear of taking a toadstool or missing an edible mushroom.
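For readers who want to reproduce such a split, here is a minimal sketch (my illustration, not the author's code). It assumes the agaricus-lepiota.data file of the UCI Mushroom dataset, in which the first comma-separated field of every line is the class label ('e' for edible, 'p' for poisonous):

#!/usr/bin/env python
# Minimal sketch of a random train/test split of the UCI Mushroom data.
# Assumes 'agaricus-lepiota.data' downloaded from the UCI repository.
import random

def split_mushrooms(path, train_ratio=0.5, seed=2020):
    rng = random.Random(seed)
    train, test = [], []
    with open(path) as f:
        for line in f:
            row = line.strip().split(',')
            if row == ['']:  # skip empty lines
                continue
            (train if rng.random() < train_ratio else test).append(row)
    return train, test

train, test = split_mushrooms('agaricus-lepiota.data')
print(len(train), 'training and', len(test), 'test mushrooms')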



Here is an example of one of the hypotheses on the basis of which a mushroom may be considered edible:

[('gill_attachment', 'free'), ('gill_spacing', 'close'), ('gill_size', 'broad'), ('stalk_shape', 'enlarging'), ('stalk_surface_below_ring', 'scaly'), ('veil_type', 'partial'), ('veil_color', 'white'), ('ring_number', 'one'), ('ring_type', 'pendant')]



Note that only 9 out of the 22 features are listed, since the similarity on the remaining 13 features is not observed among the edible mushrooms that gave rise to this cause.



Another dataset was SPECT Heart. There, the prediction accuracy on test cases reached 86.1%, slightly above the results (84%) of the CLIP3 machine learning system (based on covering examples with the help of integer programming) used by the authors of the dataset. I believe that, because of the structure of the heart tomogram descriptions, which are already pre-encoded there into binary features, it is not possible to improve the prediction quality significantly.



Quite recently, the author came up with (and implemented in software) an extension of his approach to processing data described by continuous (numeric) features. In some respects, this approach is similar to the C4.5 system for learning decision trees. The variant was tested on the Wine Quality dataset, which describes the quality of Portuguese wines. The results are encouraging: if one takes high-quality red wines, the hypotheses fully explain their high scores.
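To give a feel for what a C4.5-style treatment of a numeric feature looks like, here is a toy sketch (an illustration of the general idea only, not the author's actual procedure): it picks a binary threshold for a single feature by minimizing the weighted entropy of the induced split.

# Toy illustration of a C4.5-style threshold choice for one numeric
# feature; NOT the VKF solver's actual procedure.
from math import log2

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def best_threshold(values, labels):
    # Candidate thresholds are midpoints between consecutive sorted values.
    pairs = sorted(zip(values, labels))
    best, best_h = None, float('inf')
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if h < best_h:
            best, best_h = t, h
    return best

# Example: alcohol content vs. a "high quality" flag for a few wines
alcohol = [9.4, 9.8, 10.5, 11.2, 12.8, 13.0]
quality = [0, 0, 0, 1, 1, 1]
print(best_threshold(alcohol, quality))  # -> 10.85, separating the classes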



2. Choice of platform



Currently, students of the Department of Intelligent Systems of the RSUH are creating a series of web servers for various kinds of tasks (using the Nginx + Gunicorn + Django stack).



However, I decided to describe my personal variant here (using the aiohttp, aiojobs and aiomysql packages). The aiomcache module is not used because of known security issues.



There are several advantages to the proposed option:



  1. it is asynchronous due to the use of aiohttp;
  2. it allows Jinja2 templating;
  3. it works with a pool of connections to the database via aiomysql;
  4. it enables independent computational processes to be launched via aiojobs.aiohttp.spawn.


We point out the obvious disadvantages (compared to Django):



  1. no Object Relational Mapping (ORM);
  2. it is more difficult to organize the use of the Nginx proxy server;
  3. no Django Template Language (DTL).


Each of the two options targets a different strategy of working with a web server. The synchronous strategy (in Django) is aimed at single-user mode, in which an expert works with a single database at a time. Although the probabilistic procedures of the VKF method parallelize remarkably well, a case cannot theoretically be ruled out where the machine learning procedures take considerable time. Therefore, the variant discussed in this note is aimed at several experts, each of whom can simultaneously work (in different browser tabs) with different databases that differ not only in data but also in how the data is represented (different lattices on the values of discrete features, different significant regressions and numbers of thresholds for continuous ones). Then, having started a VKF experiment in one tab, the expert can switch to another one, where he will prepare or analyze an experiment with different data and/or parameters.



To keep track of several users, experiments, and the stages they are at, there is a service database (vkf) with two tables (users, experiments). The users table stores the login and password of every registered user, while experiments stores, in addition to the names of the auxiliary and main tables of each experiment, the fill status of those tables. We ditched aiohttp_session, since an Nginx proxy is needed anyway to protect critical data.



Here is the structure of the experiments table:



  • id int(11) NOT NULL PRIMARY KEY
  • expName varchar(255) NOT NULL
  • encoder varchar(255)
  • goodEncoder tinyint(1)
  • lattices varchar(255)
  • goodLattices tinyint(1)
  • complex varchar(255)
  • goodComplex tinyint(1)
  • verges varchar(255)
  • goodVerges tinyint(1)
  • vergesTotal int(11)
  • trains varchar(255) NOT NULL
  • goodTrains tinyint(1)
  • tests varchar(255)
  • goodTests tinyint(1)
  • hypotheses varchar(255) NOT NULL
  • goodHypotheses tinyint(1)
  • type varchar(255) NOT NULL
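The article does not show the corresponding DDL, so the following sketch reconstructs it from the column list above. AUTO_INCREMENT is my addition (so that the INSERT ... VALUES(NULL, ...) used later works), and the layout of the users table is likewise an assumption: the article only states its login/password role.

# Sketch of the service database DDL, reconstructed from the column
# list above; the 'users' layout is an assumption.
import asyncio
import aiomysql

EXPERIMENTS_DDL = """
CREATE TABLE IF NOT EXISTS vkf.experiments (
    id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    expName varchar(255) NOT NULL,
    encoder varchar(255),
    goodEncoder tinyint(1),
    lattices varchar(255),
    goodLattices tinyint(1),
    complex varchar(255),
    goodComplex tinyint(1),
    verges varchar(255),
    goodVerges tinyint(1),
    vergesTotal int(11),
    trains varchar(255) NOT NULL,
    goodTrains tinyint(1),
    tests varchar(255),
    goodTests tinyint(1),
    hypotheses varchar(255) NOT NULL,
    goodHypotheses tinyint(1),
    type varchar(255) NOT NULL
)
"""

async def create_service_tables(host, user, password):
    conn = await aiomysql.connect(host=host, user=user, password=password)
    async with conn.cursor() as cur:
        await cur.execute("CREATE DATABASE IF NOT EXISTS vkf")
        await cur.execute(EXPERIMENTS_DDL)
        # Hypothetical 'users' layout: only login/password are documented.
        await cur.execute("""CREATE TABLE IF NOT EXISTS vkf.users (
            id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
            login varchar(255) NOT NULL,
            password varchar(255) NOT NULL)""")
    await conn.commit()
    conn.close()

# loop = asyncio.get_event_loop()
# loop.run_until_complete(create_service_tables('127.0.0.1', 'root', 'toor'))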


It should be noted that there are certain sequences of data-preparation steps for VKF experiments which, unfortunately, differ radically between the discrete and continuous cases. The mixed case combines the requirements of both types; an illustrative helper follows the chains below.



discrete: => goodLattices (semi-automatic)

discrete: goodLattices => goodEncoder (automatic)

discrete: goodEncoder => goodTrains (semi-automatic)

discrete: goodEncoder, goodTrains => goodHypotheses (automatic)

discrete: goodEncoder => goodTests (semi-automatic),

discrete: goodTests, goodEncoder, goodHypotheses => (automatic)

continuous: => goodVerges (manual)

continuous: goodVerges => goodTrains (manual)

continuous: goodTrains => goodComplex (automatic)

continuous: goodComplex, goodTrains => goodHypotheses (automatic)

continuous: goodVerges => goodTests (manual)

continuous: goodTests, goodComplex, goodHypotheses => (automatic)
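These chains can be read as a small dependency graph. As an illustration (mine, not part of the server), here is a sketch of a helper that, given the status flags of an experiment, lists the stages that may be launched next:

# Illustration only: which stages of a VKF experiment may run next,
# derived from the dependency chains listed above.
PIPELINES = {
    'discrete': [
        ((), 'goodLattices'),
        (('goodLattices',), 'goodEncoder'),
        (('goodEncoder',), 'goodTrains'),
        (('goodEncoder', 'goodTrains'), 'goodHypotheses'),
        (('goodEncoder',), 'goodTests'),
        (('goodTests', 'goodEncoder', 'goodHypotheses'), 'prediction'),
    ],
    'continuous': [
        ((), 'goodVerges'),
        (('goodVerges',), 'goodTrains'),
        (('goodTrains',), 'goodComplex'),
        (('goodComplex', 'goodTrains'), 'goodHypotheses'),
        (('goodVerges',), 'goodTests'),
        (('goodTests', 'goodComplex', 'goodHypotheses'), 'prediction'),
    ],
}

def runnable_stages(exp_type, flags):
    """flags: dict like {'goodLattices': True, ...} from the experiments row."""
    done = {name for name, ok in flags.items() if ok}
    return [target for deps, target in PIPELINES[exp_type]
            if target not in done and all(d in done for d in deps)]

print(runnable_stages('discrete', {'goodLattices': True, 'goodEncoder': True}))
# -> ['goodTrains', 'goodTests']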



The machine learning library itself is named vkf.cpython-36m-x86_64-linux-gnu.so under Linux or vkf.cp36-win32.pyd under Windows (36 refers to the 3.6 version of Python the library was built for).



The term "automatic" means the work of this library, "semi-automatic" means the work of the auxiliary library vkfencoder.cpython-36m-x86_64-linux-gnu.so. Finally, the "manual" mode is a call of programs that specially process the data of a particular experiment and are now transferred to the vkfencoder library.



3. Implementation details



When creating the web server, we use the Model/View/Controller approach. The Python code is located in 5 files:



  1. app.py - the application launch file
  2. control.py - procedures for working with the VKF solver
  3. models.py - classes for processing data and working with the database
  4. settings.py - the application settings
  5. views.py - view classes and route handling


The app.py file looks like this:



#! /usr/bin/env python
import asyncio
import jinja2
import aiohttp_jinja2

from settings import SITE_HOST as siteHost
from settings import SITE_PORT as sitePort

from aiohttp import web
from aiojobs.aiohttp import setup

from views import routes

async def init(loop):
    app = web.Application(loop=loop)
    # install aiojobs.aiohttp
    setup(app)
    # install jinja2 templates
    aiohttp_jinja2.setup(app, 
        loader=jinja2.FileSystemLoader('./template'))
    # add routes from api/views.py
    app.router.add_routes(routes)
    return app

loop = asyncio.get_event_loop()
try:
    app = loop.run_until_complete(init(loop))
    web.run_app(app, host=siteHost, port=sitePort)
except Exception:
    loop.stop()


I don’t think there is anything to explain here. The next file in the order of inclusion in the project is views.py:



import aiohttp_jinja2
from aiohttp import web
from aiojobs.aiohttp import spawn
from models import User
from models import Expert
from models import Experiment
from models import Solver
from models import Predictor

routes = web.RouteTableDef()

@routes.view(r'/tests/{name}', name='test-name')
class Predict(web.View):
    @aiohttp_jinja2.template('tests.html')
    async def get(self):
        return {'explanation': 'Please, confirm prediction!'}

    async def post(self):
        data = await self.request.post()
        db_name = self.request.match_info['name']
        analogy = Predictor(db_name, data)
        await analogy.load_data()
        job = await spawn(self.request, analogy.make_prediction())
        return await job.wait()

@routes.view(r'/vkf/{name}', name='vkf-name')
class Generate(web.View):
    #@aiohttp_jinja2.template('vkf.html')
    async def get(self):
        db_name = self.request.match_info['name']
        solver = Solver(db_name)
        await solver.load_data()
        context = { 'dbname': str(solver.dbname),
                    'encoder': str(solver.encoder),
                    'lattices': str(solver.lattices),
                    'good_lattices': bool(solver.good_lattices),
                    'verges': str(solver.verges),
                    'good_verges': bool(solver.good_verges),
                    'complex': str(solver.complex),
                    'good_complex': bool(solver.good_complex),
                    'trains': str(solver.trains),
                    'good_trains': bool(solver.good_trains),
                    'hypotheses': str(solver.hypotheses),
                    'type': str(solver.type)
            }
        response = aiohttp_jinja2.render_template('vkf.html', 
            self.request, context)
        return response
            
    async def post(self):
        data = await self.request.post()
        step = data.get('value')
        db_name = self.request.match_info['name']
        if step == 'init':
            location = self.request.app.router['experiment-name'].url_for(
                name=db_name)
            raise web.HTTPFound(location=location)
        solver = Solver(db_name)
        await solver.load_data()
        if step == 'populate':
            job = await spawn(self.request, solver.create_tables())
            return await job.wait()                
        if step == 'compute':
            job = await spawn(self.request, solver.compute_tables())
            return await job.wait()                
        if step == 'generate':
            hypotheses_total = int(data.get('hypotheses_total'))
            threads_total = int(data.get('threads_total'))
            job = await spawn(self.request, solver.make_induction(
                hypotheses_total, threads_total))
            return await job.wait()                

@routes.view(r'/experiment/{name}', name='experiment-name')
class Prepare(web.View):
    @aiohttp_jinja2.template('expert.html')
    async def get(self):
        return {'explanation': 'Please, enter your data'}

    async def post(self):
        data = await self.request.post()
        db_name = self.request.match_info['name']
        experiment = Experiment(db_name, data)
        job = await spawn(self.request, experiment.create_experiment())
        return await job.wait()


For this note I have shortened the file by throwing out the classes that serve the utility routes:



  1. the Auth class serves the root route '/'; it redirects an unregistered visitor to SignIn at '/signin', and a registered user to '/user/{name}'.
  2. the SignIn class serves the '/signin' route and registers a new user.
  3. the Select class serves the '/user/{name}' route and asks which data the expert is going to work with; it then redirects to '/vkf/{name}' or '/experiment/{name}' (depending on whether the chosen experiment already exists).


The remaining classes process the routes responsible for the stages of machine learning:



  1. the Prepare class serves the route '/experiment/{name}' and collects the names of the service tables and the numeric parameters required to run the VKF method procedures. After saving this information to the database, the user is redirected to the '/vkf/{name}' route.
  2. the Generate class serves the route '/vkf/{name}' and starts the various stages of the VKF induction procedure, depending on how far the expert has prepared the data.
  3. the Predict class serves the route '/tests/{name}' and starts the VKF prediction-by-analogy procedure.


To pass a large number of parameters into the vkf.html template, the render_template construction from aiohttp_jinja2 is used:



response = aiohttp_jinja2.render_template('vkf.html', self.request, context)
return response




Also note the use of the spawn call from the aiojobs.aiohttp package:



job = await spawn(self.request, 
    solver.make_induction(hypotheses_total, threads_total))
return await job.wait()


This is necessary to safely call coroutines from the classes defined in the models.py file that process user and experiment data stored in a DB managed by MariaDB:



import asyncio
import aiomysql
from aiohttp import web

from settings import AUX_NAME as auxName
from settings import AUTH_TABLE as authTable
from settings import AUX_TABLE as auxTable
from settings import SECRET_KEY as secretKey
from settings import DB_HOST as dbHost

from control import createAuxTables
from control import createMainTables
from control import computeAuxTables
from control import induction
from control import prediction

class Experiment():
    def __init__(self, dbName, data, **kw):
        self.encoder = data.get('encoder_table')
        self.lattices = data.get('lattices_table')
        self.complex = data.get('complex_table')
        self.verges = data.get('verges_table')
        self.verges_total = data.get('verges_total')
        self.trains = data.get('training_table')
        self.tests = data.get('tests_table')
        self.hypotheses = data.get('hypotheses_table')
        self.type = data.get('type')
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.secret = secretKey
        self.dbname = dbName

    async def create_db(self, pool):
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("CREATE DATABASE IF NOT EXISTS " +
                    str(self.dbname)) 
                await conn.commit() 
        await createAuxTables(self)
 
    async def register_experiment(self, pool):
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "INSERT INTO " + str(self.auxname) + "." + 
                    str(self.auxtable)
                sql += " VALUES(NULL, '" 
                sql += str(self.dbname) 
                sql += "', '" 
                sql += str(self.encoder) 
                sql += "', 0, '" #goodEncoder
                sql += str(self.lattices) 
                sql += "', 0, '" #goodLattices
                sql += str(self.complex) 
                sql += "', 0, '" #goodComplex 
                sql += str(self.verges) 
                sql += "', 0, " #goodVerges
                sql += str(self.verges_total) 
                sql += ", '" 
                sql += str(self.trains) 
                sql += "', 0, '" #goodTrains 
                sql += str(self.tests) 
                sql += "', 0, '" #goodTests 
                sql += str(self.hypotheses) 
                sql += "', 0, '" #goodHypotheses 
                sql += str(self.type)
                sql += "')"
                await cur.execute(sql)
                await conn.commit() 

    async def create_experiment(self, **kw):
        pool = await aiomysql.create_pool(host=self.dbhost, 
            user='root', password=self.secret)
        task1 = self.create_db(pool=pool)
        task2 = self.register_experiment(pool=pool)
        tasks = [asyncio.ensure_future(task1), 
            asyncio.ensure_future(task2)]
        await asyncio.gather(*tasks)
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)        

class Solver():
    def __init__(self, dbName, **kw):
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.dbname = dbName
        self.secret = secretKey

    async def load_data(self, **kw):    
        pool = await aiomysql.create_pool(host=dbHost, 
            user='root', password=secretKey, db=auxName)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "SELECT * FROM "
                sql += str(auxTable)
                sql += " WHERE  expName='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                row = await cur.fetchone()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        self.encoder = str(row[2])
        self.good_encoder = bool(row[3])
        self.lattices = str(row[4])
        self.good_lattices = bool(row[5])
        self.complex = str(row[6])
        self.good_complex = bool(row[7])
        self.verges = str(row[8])
        self.good_verges = bool(row[9])
        self.verges_total = int(row[10])
        self.trains = str(row[11])
        self.good_trains = bool(row[12])
        self.hypotheses = str(row[15])
        self.good_hypotheses = bool(row[16])
        self.type = str(row[17])

    async def create_tables(self, **kw):
        await createMainTables(self)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root', 
            password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET encoderStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql) 
                await conn.commit() 
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)        

    async def compute_tables(self, **kw):
        await computeAuxTables(self)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root', 
            password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET complexStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql) 
                await conn.commit() 
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)        

    async def make_induction(self, hypotheses_total, threads_total, **kw):
        await induction(self, hypotheses_total, threads_total)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root', 
            password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET hypothesesStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql) 
                await conn.commit() 
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/tests/' + self.dbname)        

class Predictor():
    def __init__(self, dbName, data, **kw):
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.dbname = dbName
        self.secret = secretKey
        self.plus = 0
        self.minus = 0

    async def load_data(self, **kw):    
        pool = await aiomysql.create_pool(host=dbHost, user='root', 
            password=secretKey, db=auxName)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "SELECT * FROM "
                sql += str(auxTable)
                sql += " WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql) 
                row = await cur.fetchone()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        self.encoder = str(row[2])
        self.good_encoder = bool(row[3])
        self.complex = str(row[6])
        self.good_complex = bool(row[7])
        self.verges = str(row[8])
        self.trains = str(row[11])
        self.tests = str(row[13])
        self.good_tests = bool(row[14])
        self.hypotheses = str(row[15])
        self.good_hypotheses = bool(row[16])
        self.type = str(row[17])

    async def make_prediction(self, **kw):
        if self.good_tests and self.good_hypotheses:
            await induction(self, 0, 1)
            await prediction(self)
            message_body = str(self.plus)
            message_body += " correct positive cases. "
            message_body += str(self.minus)
            message_body += " correct negative cases."
            return web.Response(text=message_body)
        else:
            raise web.HTTPFound(location='/vkf/' + self.dbname)




Again, some helper classes are hidden:



  1. The User class corresponds to the site visitor. It allows you to register and log in as an expert.
  2. The Expert class allows you to choose one of the experiments.


The remaining classes correspond to the main procedures:



  1. The Experiment class allows one to specify the names of the key and auxiliary tables and the parameters required for conducting VKF experiments.
  2. The Solver class is responsible for inductive generalization in the VKF method.
  3. The Predictor class is responsible for prediction by analogy in the VKF method.


It is important to note the use of the create_pool() construction of the aiomysql package. It allows working with a database over several connections. The ensure_future() and gather() helpers from the asyncio module are also needed to wait for the tasks to complete.



pool = await aiomysql.create_pool(host=self.dbhost, 
    user='root', password=self.secret)
task1 = self.create_db(pool=pool)
task2 = self.register_experiment(pool=pool)
tasks = [asyncio.ensure_future(task1), 
    asyncio.ensure_future(task2)]
await asyncio.gather(*tasks)
pool.close()
await pool.wait_closed()


When reading from a table, row = await cur.fetchone() returns a single database record as a tuple, from which field values can be retrieved (for example, str(row[2]) retrieves the name of the table with the encoding of the discrete feature values).




pool = await aiomysql.create_pool(host=dbHost, user='root', 
    password=secretKey, db=auxName)
async with pool.acquire() as conn:
    async with conn.cursor() as cur:
        await cur.execute(sql) 
        row = await cur.fetchone()
        await cur.close()
pool.close()
await pool.wait_closed()
self.encoder = str(row[2])


Key system parameters are read from the .env file or (if it is absent) fall back to the defaults in settings.py.



from os.path import isfile
from envparse import env

if isfile('.env'):
    env.read_envfile('.env')

AUX_NAME = env.str('AUX_NAME', default='vkf')
AUTH_TABLE = env.str('AUTH_TABLE', default='users')
AUX_TABLE = env.str('AUX_TABLE', default='experiments')
DB_HOST = env.str('DB_HOST', default='127.0.0.1')
DB_PORT = env.int('DB_PORT', default=3306)
DEBUG = env.bool('DEBUG', default=False)
SECRET_KEY = env.str('SECRET_KEY', default='toor')
SITE_HOST = env.str('HOST', default='127.0.0.1')
SITE_PORT = env.int('PORT', default=8080)


It is important to note that localhost must be specified as an IP address; otherwise aiomysql will try to connect to the database through a Unix socket, which does not work under Windows. Finally, here is the last file (control.py):



import os
import asyncio
import vkf

async def createAuxTables(db_data):
    if db_data.type != "discrete":
        await vkf.CAttributes(db_data.verges, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        await vkf.DAttributes(db_data.encoder, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret)
        await vkf.Lattices(db_data.lattices, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret) 

async def createMainTables(db_data):
    if db_data.type == "continuous":
        await vkf.CData(db_data.trains, db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.CData(db_data.tests, db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    if db_data.type == "discrete":
        await vkf.FCA(db_data.lattices, db_data.encoder, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.DData(db_data.trains, db_data.encoder, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.DData(db_data.tests, db_data.encoder, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    if db_data.type == "full":
        await vkf.FCA(db_data.lattices, db_data.encoder, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.FData(db_data.trains, db_data.encoder, db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.FData(db_data.tests, db_data.encoder, db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)

async def computeAuxTables(db_data):
    if db_data.type != "discrete":
        async with vkf.Join(db_data.trains, db_data.dbname, '127.0.0.1', 
            'root', db_data.secret) as join:
            await join.compute_save(db_data.complex, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        await vkf.Generator(db_data.complex, db_data.trains, db_data.verges, 
            db_data.dbname, db_data.dbname, db_data.verges_total, 1, 
            '127.0.0.1', 'root', db_data.secret)

async def induction(db_data, hypothesesNumber, threadsNumber):
    if db_data.type != "discrete":
        qualifier = await vkf.Qualifier(db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        beget = await vkf.Beget(db_data.complex, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        encoder = await vkf.Encoder(db_data.encoder, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret)
    async with vkf.Induction() as induction: 
        if db_data.type == "continuous":
            await induction.load_continuous_hypotheses(qualifier, beget, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.load_discrete_hypotheses(encoder, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.load_full_hypotheses(encoder, qualifier, beget, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if hypothesesNumber > 0:
            await induction.add_hypotheses(hypothesesNumber, threadsNumber)
            if db_data.type == "continuous":
                await induction.save_continuous_hypotheses(qualifier, 
                    db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', 
                    db_data.secret)
            if db_data.type == "discrete":
                await induction.save_discrete_hypotheses(encoder, 
                    db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', 
                    db_data.secret)
            if db_data.type == "full":
                await induction.save_full_hypotheses(encoder, qualifier, 
                    db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', 
                    db_data.secret)

async def prediction(db_data):
    if db_data.type != "discrete":
        qualifier = await vkf.Qualifier(db_data.verges, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        beget = await vkf.Beget(db_data.complex, db_data.dbname, 
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        encoder = await vkf.Encoder(db_data.encoder, 
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    async with vkf.Induction() as induction: 
        if db_data.type == "continuous":
            await induction.load_continuous_hypotheses(qualifier, beget, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.load_discrete_hypotheses(encoder, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.load_full_hypotheses(encoder, qualifier, beget, 
                db_data.trains, db_data.hypotheses, db_data.dbname, 
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "continuous":
            async with vkf.TestSample(qualifier, induction, beget, 
                db_data.tests, db_data.dbname, '127.0.0.1', 'root', 
                db_data.secret) as tests:
                db_data.plus = await tests.correct_positive_cases()
                db_data.minus = await tests.correct_negative_cases()
        if db_data.type == "discrete":
            async with vkf.TestSample(encoder, induction, 
                db_data.tests, db_data.dbname, '127.0.0.1', 'root', 
                db_data.secret) as tests:
                db_data.plus = await tests.correct_positive_cases()
                db_data.minus = await tests.correct_negative_cases()
        if db_data.type == "full":
            async with vkf.TestSample(encoder, qualifier, induction, 
                beget, db_data.tests, db_data.dbname, '127.0.0.1', 
                'root', db_data.secret) as tests:
                db_data.plus = await tests.correct_positive_cases()
                db_data.minus = await tests.correct_negative_cases()


I have kept this file in its entirety, since it shows the names, the calling order, and the arguments of the VKF method procedures from the vkf.cpython-36m-x86_64-linux-gnu.so library. All arguments after dbname can be omitted, since the defaults in the CPython library are set to standard values.



4. Comments



Anticipating the question of professional programmers as to why the logic controlling the VKF experiment is exposed through numerous if statements rather than hidden behind polymorphic types, the answer is this: unfortunately, the dynamic typing of Python does not allow shifting the decision about the type of the object in use onto the language itself, so a sequence of nested ifs would occur somewhere in any case. The author therefore preferred an explicit (C-like) style to make the logic as transparent (and efficient) as possible.



Let me comment on the missing components:



  1. loading source data through the web interface is not implemented: the corresponding forms would rely on the semi-automatic procedures of the vkfencoder.cpython-36m-x86_64-linux-gnu.so library, which for now have to be invoked directly.
  2. intermediate results can only be inspected by working with MariaDB directly (for example, through DBeaver 7.1.1 Community); in the Django variant, the ORM would make them accessible from the application.


5. Origin of the approach and its name



The author has been involved in data mining tasks for over 30 years. After graduating from the Faculty of Mechanics and Mathematics of Lomonosov Moscow State University, he was invited into a group of researchers led by Prof. V.K. Finn, Doctor of Technical Sciences (VINITI AN SSSR). Since the early 1980s, Viktor Konstantinovich has been investigating plausible reasoning and its formalization by means of many-valued logics.



The following can be considered the key ideas proposed by V.K. Finn:



  1. the use of a binary similarity operation (originally, the intersection operation in Boolean algebra);
  2. the idea of discarding a generated similarity of a group of training examples if it is embedded in the description of an example of the opposite sign (a counter-example);
  3. the idea of predicting the investigated (target) property of new examples by weighing the pros and cons;
  4. the idea of checking the completeness of a set of hypotheses by finding, among the generated similarities, causes for the presence/absence of the target property in the training examples.


It should be noted that V.K. Finn attributes some of his ideas to foreign authors. Perhaps only the logic of argumentation can rightfully be considered his own invention. The idea of taking counter-examples into account he borrowed, by his own account, from K.R. Popper. The origins of checking the completeness of inductive generalization he traces to the (completely obscure, in my opinion) works of the American mathematician and logician C.S. Peirce. The generation of hypotheses about causes by means of the similarity operation he considers borrowed from the ideas of the British economist, philosopher and logician John Stuart Mill. He therefore titled the set of ideas he created the "JSM method" in honor of J.S. Mill.



Strangely, V.K. Finn holds in low regard a much more useful branch of the algebraic theory of lattices, Formal Concept Analysis (FCA), which emerged in the late 1970s in the works of Prof. Rudolf Wille (Germany). In my opinion, the reason is its unfortunate name which, for a person who first graduated from the Faculty of Philosophy and then from the engineering stream of the Faculty of Mechanics and Mathematics of Moscow State University, provokes rejection.



As the successor of his teacher's work, the author named his approach the "VKF method" in his honor. There is, however, another decoding: the initials also stand for a probabilistic combinatorial formal ("veroyatnostno-kombinatorny formalny") method of machine learning based on lattice theory.



V.K. Finn now works at the A.A. Dorodnicyn Computing Centre of the Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences and at the Department of Intelligent Systems of the Russian State University for the Humanities (RSUH).



More information on the mathematics of the VKF solver can be found in the author's dissertation or in his video lectures at Ulyanovsk State University (the author is grateful to A.B. Verevkin and N.G. Baranets for organizing the lectures and processing their notes).



The complete package of source files is stored on Bitbucket.



The source files (in C++) of the vkf library are currently undergoing approval for placement at savannah.nongnu.org. If it is granted, a download link will be added here.



Finally, one last note: I started learning Python on April 6, 2020. Until then, the only language I had programmed in was C++. This circumstance, however, does not excuse possible inaccuracies in the code.



The author would like to thank Tatyana A. Volkova (robofreak) for support, constructive suggestions and criticism, which made it possible to significantly improve the presentation (and even significantly improve the code). Responsibility for the remaining errors and for the decisions made (even against her advice) rests solely with the author.


