This article describes a machine learning approach developed in Russia: the VKF-method, which is based on lattice theory. The history of its origin and the choice of its name are explained at the very end of the article.
1. Description of the method
Initially, the author wrote the entire system in C++ as a console application. It was then connected to a database managed by MariaDB (using the mariadb++ library) and finally turned into a Python library (using the pybind11 package).
Several datasets from the machine learning repository of the University of California, Irvine were chosen as test data.
On the Mushrooms dataset, which contains descriptions of 8,124 North American mushrooms, the system achieved 100% accuracy. More precisely, the random number generator split the initial data into a training set (2,088 edible and 1,944 poisonous mushrooms) and a test set (2,120 edible and 1,972 poisonous). After about 100 hypotheses about the causes of edibility had been computed, all test cases were predicted correctly. Since the algorithm uses a paired Markov chain, the sufficient number of hypotheses varies: quite often 50 random hypotheses were enough. When generating the causes of toxicity, the required number of hypotheses clusters around 120, and all test cases are again predicted correctly.
There is a Mushroom Classification competition in which quite a few authors have achieved 100% accuracy, but most of those solutions are neural networks. Our approach lets a mushroom picker learn only about 50 rules. Since most of the features are insignificant, each hypothesis is a conjunction of a small number of values of the essential features, which makes the rules easy to remember. After that, the mushroom picker can go out for mushrooms without fear of picking a toadstool or missing an edible mushroom.
Here is an example of one of the hypotheses, on the basis of which it can be considered that the mushroom is edible:
[('gill_attachment', 'free'), ('gill_spacing', 'close'), ('gill_size', 'broad'), ('stalk_shape', 'enlarging'), ('stalk_surface_below_ring', 'scaly'), ('veil_type', 'partial'), ('veil_color', 'white'), ('ring_number', 'one'), ('ring_type', 'pendant')]
Note that only 9 of the 22 features are listed: the edible mushrooms that gave rise to this cause do not share the values of the remaining 13 features.
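To illustrate how such a hypothesis can be applied, here is a minimal sketch. It assumes that a mushroom is represented as a dict from feature name to value; the function name matches and the sample mushroom are hypothetical and are not part of the vkf library.

```python
# A hypothesis is a conjunction of (feature, value) pairs (copied from the text).
HYPOTHESIS = [
    ('gill_attachment', 'free'), ('gill_spacing', 'close'),
    ('gill_size', 'broad'), ('stalk_shape', 'enlarging'),
    ('stalk_surface_below_ring', 'scaly'), ('veil_type', 'partial'),
    ('veil_color', 'white'), ('ring_number', 'one'),
    ('ring_type', 'pendant'),
]


def matches(hypothesis, mushroom):
    """Return True if every (feature, value) pair of the hypothesis holds."""
    return all(mushroom.get(feature) == value
               for feature, value in hypothesis)


# Hypothetical mushroom: it agrees with the hypothesis on all 9 features
# and has some other features that the hypothesis does not mention.
sample = dict(HYPOTHESIS, cap_color='brown', odor='none')
# The same mushroom with one essential feature changed.
changed = dict(sample, ring_number='two')

print(matches(HYPOTHESIS, sample))
print(matches(HYPOTHESIS, changed))
```

Features that are absent from the hypothesis are simply ignored, which is why a rule with 9 conjuncts can cover mushrooms described by 22 features.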
Another dataset was SPECT Hearts. There, the prediction accuracy on the test cases reached 86.1%, slightly higher than the 84% achieved by the CLIP3 machine learning system (which learns to cover examples using integer programming) used by the authors of the dataset. I believe that, because the heart tomograms are already pre-encoded there as binary features, the quality of the forecast cannot be improved significantly.
The author recently devised (and implemented in software) an extension of the approach to data described by continuous (numerical) features. In some respects it is similar to the C4.5 system for learning decision trees. This variant was tested on the Wine Quality dataset, which describes the quality of Portuguese wines. The results are encouraging: for high-quality red wines, the hypotheses fully explain their high scores.
2. Choice of platform
Currently, students of the Department of Intelligent Systems of the RSUH are creating a series of web servers for various types of tasks (using the Nginx + Gunicorn + Django bundle).
However, I decided to describe my personal version here (using the aiohttp, aiojobs and aiomysql bundles). The aiomcache module is not used due to known security issues.
There are several advantages to the proposed option:
- it is asynchronous due to the use of aiohttp;
- it allows Jinja2 templating;
- it works with a pool of connections to the database via aiomysql;
- it enables independent computational processes to be launched via aiojobs.aiohttp.spawn.
We point out the obvious disadvantages (compared to Django):
- no Object Relational Mapping (ORM);
- it is more difficult to organize the use of the Nginx proxy server;
- no Django Template Language (DTL).
The two options target different strategies for working with a web server. The synchronous strategy (in Django) targets single-user mode, in which an expert works with one database at a time. Although the probabilistic procedures of the VKF method parallelize remarkably well, it cannot be ruled out, in theory, that the machine learning procedures will take considerable time. Therefore, the option discussed in this note targets several experts, each of whom can work simultaneously (in different browser tabs) with different databases that differ not only in their data but also in the way the data are represented (different lattices on the values of discrete features, different significant regressions and numbers of thresholds for continuous ones). Then, after starting a VKF experiment in one tab, the expert can switch to another tab and prepare or analyze an experiment with different data and/or parameters.
To keep track of several users, their experiments and the stages these experiments have reached, there is a service database (vkf) with two tables (users, experiments). The users table stores the login and password of every registered user, while the experiments table stores, in addition to the names of the auxiliary and main tables of each experiment, the status of filling these tables. We dropped aiohttp_session, since an Nginx proxy is still needed to protect critical data.
Here is the structure of the experiments table:
- id int(11) NOT NULL PRIMARY KEY
- expName varchar(255) NOT NULL
- encoder varchar(255)
- goodEncoder tinyint(1)
- lattices varchar(255)
- goodLattices tinyint(1)
- complex varchar(255)
- goodComplex tinyint(1)
- verges varchar(255)
- goodVerges tinyint(1)
- vergesTotal int(11)
- trains varchar(255) NOT NULL
- goodTrains tinyint(1)
- tests varchar(255)
- goodTests tinyint(1)
- hypotheses varchar(255) NOT NULL
- goodHypotheses tinyint(1)
- type varchar(255) NOT NULL
It should be noted that data preparation for VKF experiments follows certain sequences of steps which, unfortunately, are radically different for the discrete and continuous cases. The mixed case combines the requirements of both types.
discrete: => goodLattices (semi-automatic)
discrete: goodLattices => goodEncoder (automatic)
discrete: goodEncoder => goodTrains (semi-automatic)
discrete: goodEncoder, goodTrains => goodHypotheses (automatic)
discrete: goodEncoder => goodTests (semi-automatic)
discrete: goodEncoder, goodHypotheses => (automatic)
continuous: => goodVerges (manual)
continuous: goodVerges => goodTrains (manual)
continuous: goodTrains => goodComplex (automatic)
continuous: goodComplex, goodTrains => goodHypotheses (automatic)
continuous: goodVerges => goodTests (manual)
continuous: goodTests, goodComplex, goodHypotheses => (automatic)
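The rules above can be read as a small dependency table. The following sketch is only an illustration of that reading: the dictionaries copy the flag names from the list, while the function runnable_stages is hypothetical and does not exist in the project.

```python
# Prerequisite flags for every stage, copied from the rules above.
DISCRETE_STAGES = {
    'goodLattices': [],
    'goodEncoder': ['goodLattices'],
    'goodTrains': ['goodEncoder'],
    'goodHypotheses': ['goodEncoder', 'goodTrains'],
    'goodTests': ['goodEncoder'],
}
CONTINUOUS_STAGES = {
    'goodVerges': [],
    'goodTrains': ['goodVerges'],
    'goodComplex': ['goodTrains'],
    'goodHypotheses': ['goodComplex', 'goodTrains'],
    'goodTests': ['goodVerges'],
}


def runnable_stages(stages, flags):
    """Return the unfinished stages whose prerequisites are all satisfied.

    flags maps a flag name to a truthy value once that stage is done,
    as in one row of the experiments table.
    """
    return [stage for stage, needs in stages.items()
            if not flags.get(stage) and all(flags.get(n) for n in needs)]


print(runnable_stages(DISCRETE_STAGES, {}))
print(runnable_stages(DISCRETE_STAGES,
                      {'goodLattices': True, 'goodEncoder': True}))
print(runnable_stages(CONTINUOUS_STAGES, {'goodVerges': True}))
```

A table of this kind makes it easy to see which buttons a form could offer to the expert at a given moment.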
The machine learning library itself is named vkf.cpython under Linux or vkf.cp36-win32.pyd under Windows (36 stands for Python 3.6, the version this library was built for).
The term "automatic" means the work of this library, and "semi-automatic" means the work of the auxiliary library. Finally, the "manual" mode means calling programs that process the data of a particular experiment individually; these have now been moved to the vkfencoder library.
3. Implementation details
When creating the web server, we use the "View / Model / Control" approach. The Python code is located in 5 files:
- the application launch file
- control.py - file with procedures for working with the VKF solver
- models.py - file with classes for processing data and working with the database
- settings.py - file with application settings
- views.py - file with visualization and handling of routes.
The launch file looks like this:
#!/usr/bin/env python
import asyncio
import jinja2
import aiohttp_jinja2
from settings import SITE_HOST as siteHost
from settings import SITE_PORT as sitePort
from aiohttp import web
from aiojobs.aiohttp import setup
from views import routes

async def init(loop):
    app = web.Application(loop=loop)
    # install aiojobs.aiohttp
    setup(app)
    # install jinja2 templates (the 'templates' directory name is an assumption)
    aiohttp_jinja2.setup(app, loader=jinja2.FileSystemLoader('templates'))
    # add routes from api/
    app.router.add_routes(routes)
    return app

loop = asyncio.get_event_loop()
app = loop.run_until_complete(init(loop))
web.run_app(app, host=siteHost, port=sitePort)
I don’t think there is anything to explain here. The next file in the order of inclusion in the project is views.py:
import aiohttp_jinja2
from aiohttp import web#, WSMsgType
from aiojobs.aiohttp import spawn#, get_scheduler
from models import User
from models import Expert
from models import Experiment
from models import Solver
from models import Predictor

routes = web.RouteTableDef()

@routes.view(r'/tests/{name}', name='test-name')
class Predict(web.View):
    async def get(self):
        return {'explanation': 'Please, confirm prediction!'}

    async def post(self):
        # read the submitted form fields
        data = await self.request.post()
        db_name = self.request.match_info['name']
        analogy = Predictor(db_name, data)
        await analogy.load_data()
        job = await spawn(self.request, analogy.make_prediction())
        return await job.wait()

@routes.view(r'/vkf/{name}', name='vkf-name')
class Generate(web.View):
    async def get(self):
        db_name = self.request.match_info['name']
        solver = Solver(db_name)
        await solver.load_data()
        context = { 'dbname': str(solver.dbname),
            'encoder': str(solver.encoder),
            'lattices': str(solver.lattices),
            'good_lattices': bool(solver.good_lattices),
            'verges': str(solver.verges),
            'good_verges': bool(solver.good_verges),
            'complex': str(solver.complex),
            'good_complex': bool(solver.good_complex),
            'trains': str(solver.trains),
            'good_trains': bool(solver.good_trains),
            'hypotheses': str(solver.hypotheses),
            'type': str(solver.type) }
        response = aiohttp_jinja2.render_template('vkf.html',
            self.request, context)
        return response

    async def post(self):
        data = await self.request.post()
        step = data.get('value')
        db_name = self.request.match_info['name']
        # strings are compared with ==, not with the identity operator
        if step == 'init':
            location = self.request.app.router['experiment-name'].url_for(
                name=db_name)
            raise web.HTTPFound(location=location)
        solver = Solver(db_name)
        await solver.load_data()
        if step == 'populate':
            job = await spawn(self.request, solver.create_tables())
            return await job.wait()
        if step == 'compute':
            job = await spawn(self.request, solver.compute_tables())
            return await job.wait()
        if step == 'generate':
            hypotheses_total = data.get('hypotheses_total')
            threads_total = data.get('threads_total')
            job = await spawn(self.request, solver.make_induction(
                hypotheses_total, threads_total))
            return await job.wait()

@routes.view(r'/experiment/{name}', name='experiment-name')
class Prepare(web.View):
    async def get(self):
        return {'explanation': 'Please, enter your data'}

    async def post(self):
        data = await self.request.post()
        db_name = self.request.match_info['name']
        experiment = Experiment(db_name, data)
        job = await spawn(self.request, experiment.create_experiment())
        return await job.wait()
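I shortened this file for the present note by dropping the classes that serve the utility routes:
- the Auth class handles the root route '/' and passes the visitor on either to SignIn ('/signin') or to '/user/{name}';
- the SignIn class handles the route '/signin';
- the Select class handles the route '/user/{name}' and passes the expert on to '/vkf/{name}' or '/experiment/{name}'.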
The remaining classes process the routes responsible for the stages of machine learning:
- the Prepare class processes the route '/experiment/{name}' and collects the names of the service tables and the numeric parameters required to run the procedures of the VKF method. After saving this information to the database, the user is redirected to the '/vkf/{name}' route.
- the Generate class processes the route '/vkf/{name}' and starts the various stages of the induction procedure of the VKF method, depending on how far the expert has prepared the data.
- the Predict class processes the route '/tests/{name}' and starts the VKF procedure of prediction by analogy.
To pass a large number of parameters to the vkf.html form, a construction from aiohttp_jinja2 is used:
response = aiohttp_jinja2.render_template('vkf.html', self.request, context)
return response
Also note the use of the spawn call from the aiojobs.aiohttp package:
job = await spawn(self.request,
    solver.make_induction(hypotheses_total, threads_total))
return await job.wait()
This is necessary to safely call coroutines from the classes defined in models.py, which process user and experiment data stored in a database managed by MariaDB:
import asyncio
import aiomysql
from aiohttp import web
from settings import AUX_NAME as auxName
from settings import AUTH_TABLE as authTable
from settings import AUX_TABLE as auxTable
from settings import SECRET_KEY as secretKey
from settings import DB_HOST as dbHost
from control import createAuxTables
from control import createMainTables
from control import computeAuxTables
from control import induction
from control import prediction

class Experiment():
    def __init__(self, dbName, data, **kw):
        self.encoder = data.get('encoder_table')
        self.lattices = data.get('lattices_table')
        self.complex = data.get('complex_table')
        self.verges = data.get('verges_table')
        self.verges_total = data.get('verges_total')
        self.trains = data.get('training_table')
        self.tests = data.get('tests_table')
        self.hypotheses = data.get('hypotheses_table')
        self.type = data.get('type')
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.secret = secretKey
        self.dbname = dbName

    async def create_db(self, pool):
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("CREATE DATABASE IF NOT EXISTS " +
                    str(self.dbname))
                await conn.commit()
        await createAuxTables(self)

    async def register_experiment(self, pool):
        # the values follow the column order of the experiments table
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "INSERT INTO " + str(self.auxname) + "." + str(self.auxtable)
                sql += " VALUES(NULL, '"
                sql += str(self.dbname)
                sql += "', '"
                sql += str(self.encoder)
                sql += "', 0, '" #goodEncoder
                sql += str(self.lattices)
                sql += "', 0, '" #goodLattices
                sql += str(self.complex)
                sql += "', 0, '" #goodComplex
                sql += str(self.verges)
                sql += "', 0, " #goodVerges
                sql += str(self.verges_total)
                sql += ", '"
                sql += str(self.trains)
                sql += "', 0, '" #goodTrains
                sql += str(self.tests)
                sql += "', 0, '" #goodTests
                sql += str(self.hypotheses)
                sql += "', 0, '" #goodHypotheses
                sql += str(self.type)
                sql += "')"
                await cur.execute(sql)
                await conn.commit()

    async def create_experiment(self, **kw):
        pool = await aiomysql.create_pool(host=self.dbhost,
            user='root', password=self.secret)
        task1 = self.create_db(pool=pool)
        task2 = self.register_experiment(pool=pool)
        tasks = [asyncio.ensure_future(task1),
            asyncio.ensure_future(task2)]
        await asyncio.gather(*tasks)
        # wait_closed() may only be awaited after close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)

class Solver():
    def __init__(self, dbName, **kw):
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.dbname = dbName
        self.secret = secretKey

    async def load_data(self, **kw):
        pool = await aiomysql.create_pool(host=dbHost,
            user='root', password=secretKey, db=auxName)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "SELECT * FROM "
                sql += str(auxTable)
                sql += " WHERE expName='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                row = cur.fetchone()
                await cur.close()
        # wait_closed() may only be awaited after close()
        pool.close()
        await pool.wait_closed()
        # indices follow the column order of the experiments table
        self.encoder = str(row.result()[2])
        self.good_encoder = bool(row.result()[3])
        self.lattices = str(row.result()[4])
        self.good_lattices = bool(row.result()[5])
        self.complex = str(row.result()[6])
        self.good_complex = bool(row.result()[7])
        self.verges = str(row.result()[8])
        self.good_verges = bool(row.result()[9])
        self.verges_total = int(row.result()[10])
        self.trains = str(row.result()[11])
        self.good_trains = bool(row.result()[12])
        # createMainTables() also needs the name of the tests table
        self.tests = str(row.result()[13])
        self.hypotheses = str(row.result()[15])
        self.good_hypotheses = bool(row.result()[16])
        self.type = str(row.result()[17])

    async def set_flag(self, flag):
        # mark one stage as done; flag is a column of the experiments table
        pool = await aiomysql.create_pool(host=self.dbhost, user='root',
            password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET " + flag + "=1 WHERE expName='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                await conn.commit()
                await cur.close()
        pool.close()
        await pool.wait_closed()

    async def create_tables(self, **kw):
        await createMainTables(self)
        await self.set_flag('goodEncoder')
        raise web.HTTPFound(location='/vkf/' + self.dbname)

    async def compute_tables(self, **kw):
        await computeAuxTables(self)
        await self.set_flag('goodComplex')
        raise web.HTTPFound(location='/vkf/' + self.dbname)

    async def make_induction(self, hypotheses_total, threads_total, **kw):
        await induction(self, hypotheses_total, threads_total)
        await self.set_flag('goodHypotheses')
        raise web.HTTPFound(location='/tests/' + self.dbname)

class Predictor():
    def __init__(self, dbName, data, **kw):
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.dbname = dbName
        self.secret = secretKey
        # counters of correctly predicted test cases
        self.plus = 0
        self.minus = 0

    async def load_data(self, **kw):
        pool = await aiomysql.create_pool(host=dbHost, user='root',
            password=secretKey, db=auxName)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "SELECT * FROM "
                sql += str(auxTable)
                sql += " WHERE expName='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                row = cur.fetchone()
                await cur.close()
        # wait_closed() may only be awaited after close()
        pool.close()
        await pool.wait_closed()
        self.encoder = str(row.result()[2])
        self.good_encoder = bool(row.result()[3])
        self.complex = str(row.result()[6])
        self.good_complex = bool(row.result()[7])
        self.verges = str(row.result()[8])
        self.trains = str(row.result()[11])
        self.tests = str(row.result()[13])
        self.good_tests = bool(row.result()[14])
        self.hypotheses = str(row.result()[15])
        self.good_hypotheses = bool(row.result()[16])
        self.type = str(row.result()[17])

    async def make_prediction(self, **kw):
        if self.good_tests and self.good_hypotheses:
            await induction(self, 0, 1)
            await prediction(self)
            message_body = str(self.plus)
            message_body += " correct positive cases. "
            message_body += str(self.minus)
            message_body += " correct negative cases."
            # an ordinary response is returned here, since the base class
            # web.HTTPException carries no valid status code
            return web.Response(text=message_body)
        raise web.HTTPFound(location='/vkf/' + self.dbname)
Again, some helper classes are hidden:
- The User class corresponds to the site visitor. It allows you to register and log in as an expert.
- The Expert class allows you to choose one of the experiments.
The remaining classes correspond to the main procedures:
- The Experiment class allows you to specify the names of the main and auxiliary tables and the parameters required for conducting VKF experiments.
- The Solver class is responsible for inductive generalization in the VKF method.
- The Predictor class is responsible for predictions by analogy in the VKF method.
It is important to note the use of the create_pool() construct of the aiomysql package, which lets you work with the database over several connections. The ensure_future() and gather() routines from the asyncio module are also needed to wait for execution to complete.
pool = await aiomysql.create_pool(host=self.dbhost,
    user='root', password=self.secret)
task1 = self.create_db(pool=pool)
task2 = self.register_experiment(pool=pool)
tasks = [asyncio.ensure_future(task1),
    asyncio.ensure_future(task2)]
await asyncio.gather(*tasks)
pool.close()
await pool.wait_closed()
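The ensure_future() and gather() pattern can be tried without a database. In the sketch below, the two coroutines are hypothetical stand-ins for create_db and register_experiment: they only sleep briefly and record that they ran.

```python
import asyncio


async def create_db(log):
    # Stand-in for the CREATE DATABASE step.
    await asyncio.sleep(0.01)
    log.append('db created')


async def register_experiment(log):
    # Stand-in for the INSERT into the experiments table.
    await asyncio.sleep(0.01)
    log.append('experiment registered')


async def create_experiment():
    log = []
    # Schedule both coroutines, then wait until both have finished.
    tasks = [asyncio.ensure_future(create_db(log)),
             asyncio.ensure_future(register_experiment(log))]
    await asyncio.gather(*tasks)
    return log


result = asyncio.run(create_experiment())
print(sorted(result))
```

Both tasks run concurrently, so the order in which they finish is not guaranteed; only their joint completion is.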
When reading from a table, row = cur.fetchone() returns a future that, for the ordinary aiomysql cursor used here, is already completed. Therefore row.result() returns a database record from which field values can be retrieved (for example, str(row.result()[2]) retrieves the name of the table that encodes the values of discrete features). The more common form, row = await cur.fetchone(), gives the record directly.
pool = await aiomysql.create_pool(host=dbHost, user='root',
    password=secretKey, db=auxName)
async with pool.acquire() as conn:
    async with conn.cursor() as cur:
        await cur.execute(sql)
        row = cur.fetchone()
        await cur.close()
pool.close()
await pool.wait_closed()
self.encoder = str(row.result()[2])
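The difference between calling result() on a completed future and awaiting it can be seen in a sketch that uses only asyncio. The function fetch_one_stub and the sample record are hypothetical; they merely imitate a call that hands back a future whose result is already set.

```python
import asyncio


def fetch_one_stub(loop, record):
    """Imitate a fetch call that returns an already completed future."""
    fut = loop.create_future()
    fut.set_result(record)
    return fut


async def read_encoder():
    loop = asyncio.get_running_loop()
    record = (1, 'mushrooms', 'encoder', 1)  # made-up table row
    row = fetch_one_stub(loop, record)
    # result() works only because the future is already done ...
    first = str(row.result()[2])
    # ... while awaiting works whether or not it is done yet.
    second = str((await row)[2])
    return first, second


names = asyncio.run(read_encoder())
print(names)
```

Calling result() on a future that is still pending raises asyncio.InvalidStateError, which is why the awaiting form is the safer habit.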
Key system parameters are read from the .env file or (if there is none) taken from the defaults in the settings.py file:
from os.path import isfile
from envparse import env

if isfile('.env'):
    env.read_envfile('.env')

AUX_NAME = env.str('AUX_NAME', default='vkf')
AUTH_TABLE = env.str('AUTH_TABLE', default='users')
AUX_TABLE = env.str('AUX_TABLE', default='experiments')
DB_HOST = env.str('DB_HOST', default='')
DB_PORT = env.int('DB_PORT', default=3306)
DEBUG = env.bool('DEBUG', default=False)
SECRET_KEY = env.str('SECRET_KEY', default='toor')
SITE_HOST = env.str('HOST', default='')
SITE_PORT = env.int('PORT', default=8080)
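The idea behind such a settings file (take a value from the environment, fall back to a default, convert ports to integers) can be sketched with the standard library alone. The helpers env_str, env_int and env_bool below are hypothetical stand-ins written for this illustration; they are not the envparse API.

```python
import os


def env_str(name, default, environ=os.environ):
    """Return the variable as a string, or the default if it is unset."""
    return environ.get(name, default)


def env_int(name, default, environ=os.environ):
    """Return the variable converted to int, or the default if it is unset."""
    return int(environ.get(name, default))


def env_bool(name, default, environ=os.environ):
    """Interpret common spellings of true; return the default if unset."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ('1', 'true', 'yes', 'on')


# A made-up environment, used here instead of the real os.environ.
fake_env = {'PORT': '9090', 'DEBUG': 'true'}

SITE_PORT = env_int('PORT', 8080, fake_env)
DB_PORT = env_int('DB_PORT', 3306, fake_env)
DEBUG = env_bool('DEBUG', False, fake_env)
AUX_NAME = env_str('AUX_NAME', 'vkf', fake_env)
print(SITE_PORT, DB_PORT, DEBUG, AUX_NAME)
```

Reading ports as integers matters because environment variables are always strings, while a server expects a numeric port.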
It is important to note that localhost must be specified by its IP address; otherwise aiomysql will try to connect to the database via a Unix socket, which may not work under Windows. Finally, here is the last file (control.py):
import os
import asyncio
import vkf

# string values are compared with == and !=, not with the identity operator
async def createAuxTables(db_data):
    if db_data.type != "discrete":
        await vkf.CAttributes(db_data.verges, db_data.dbname,
            '', 'root', db_data.secret)
    if db_data.type != "continuous":
        await vkf.DAttributes(db_data.encoder, db_data.dbname,
            '', 'root', db_data.secret)
        await vkf.Lattices(db_data.lattices, db_data.dbname,
            '', 'root', db_data.secret)

async def createMainTables(db_data):
    if db_data.type == "continuous":
        await vkf.CData(db_data.trains, db_data.verges,
            db_data.dbname, '', 'root', db_data.secret)
        await vkf.CData(db_data.tests, db_data.verges,
            db_data.dbname, '', 'root', db_data.secret)
    if db_data.type == "discrete":
        await vkf.FCA(db_data.lattices, db_data.encoder,
            db_data.dbname, '', 'root', db_data.secret)
        await vkf.DData(db_data.trains, db_data.encoder,
            db_data.dbname, '', 'root', db_data.secret)
        await vkf.DData(db_data.tests, db_data.encoder,
            db_data.dbname, '', 'root', db_data.secret)
    if db_data.type == "full":
        await vkf.FCA(db_data.lattices, db_data.encoder,
            db_data.dbname, '', 'root', db_data.secret)
        await vkf.FData(db_data.trains, db_data.encoder, db_data.verges,
            db_data.dbname, '', 'root', db_data.secret)
        await vkf.FData(db_data.tests, db_data.encoder, db_data.verges,
            db_data.dbname, '', 'root', db_data.secret)

async def computeAuxTables(db_data):
    if db_data.type != "discrete":
        async with vkf.Join(db_data.trains, db_data.dbname, '',
                'root', db_data.secret) as join:
            await join.compute_save(db_data.complex, db_data.dbname,
                '', 'root', db_data.secret)
        await vkf.Generator(db_data.complex, db_data.trains, db_data.verges,
            db_data.dbname, db_data.dbname, db_data.verges_total, 1,
            '', 'root', db_data.secret)

async def induction(db_data, hypothesesNumber, threadsNumber):
    if db_data.type != "discrete":
        qualifier = await vkf.Qualifier(db_data.verges,
            db_data.dbname, '', 'root', db_data.secret)
        beget = await vkf.Beget(db_data.complex, db_data.dbname,
            '', 'root', db_data.secret)
    if db_data.type != "continuous":
        encoder = await vkf.Encoder(db_data.encoder, db_data.dbname,
            '', 'root', db_data.secret)
    async with vkf.Induction() as induction:
        # load the hypotheses that are already stored
        if db_data.type == "continuous":
            await induction.load_continuous_hypotheses(qualifier, beget,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.load_discrete_hypotheses(encoder,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.load_full_hypotheses(encoder, qualifier, beget,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '', 'root', db_data.secret)
        # generate new hypotheses and save them
        if hypothesesNumber > 0:
            await induction.add_hypotheses(hypothesesNumber, threadsNumber)
            if db_data.type == "continuous":
                await induction.save_continuous_hypotheses(qualifier,
                    db_data.hypotheses, db_data.dbname, '', 'root',
                    db_data.secret)
            if db_data.type == "discrete":
                await induction.save_discrete_hypotheses(encoder,
                    db_data.hypotheses, db_data.dbname, '', 'root',
                    db_data.secret)
            if db_data.type == "full":
                await induction.save_full_hypotheses(encoder, qualifier,
                    db_data.hypotheses, db_data.dbname, '', 'root',
                    db_data.secret)

async def prediction(db_data):
    if db_data.type != "discrete":
        qualifier = await vkf.Qualifier(db_data.verges,
            db_data.dbname, '', 'root', db_data.secret)
        beget = await vkf.Beget(db_data.complex, db_data.dbname,
            '', 'root', db_data.secret)
    if db_data.type != "continuous":
        encoder = await vkf.Encoder(db_data.encoder,
            db_data.dbname, '', 'root', db_data.secret)
    async with vkf.Induction() as induction:
        if db_data.type == "continuous":
            await induction.load_continuous_hypotheses(qualifier, beget,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.load_discrete_hypotheses(encoder,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.load_full_hypotheses(encoder, qualifier, beget,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '', 'root', db_data.secret)
        # count the correctly predicted positive and negative test cases
        if db_data.type == "continuous":
            async with vkf.TestSample(qualifier, induction, beget,
                    db_data.tests, db_data.dbname, '', 'root',
                    db_data.secret) as tests:
                #plus = await tests.correct_positive_cases()
                db_data.plus = await tests.correct_positive_cases()
                #minus = await tests.correct_negative_cases()
                db_data.minus = await tests.correct_negative_cases()
        if db_data.type == "discrete":
            async with vkf.TestSample(encoder, induction,
                    db_data.tests, db_data.dbname, '', 'root',
                    db_data.secret) as tests:
                #plus = await tests.correct_positive_cases()
                db_data.plus = await tests.correct_positive_cases()
                #minus = await tests.correct_negative_cases()
                db_data.minus = await tests.correct_negative_cases()
        if db_data.type == "full":
            async with vkf.TestSample(encoder, qualifier, induction,
                    beget, db_data.tests, db_data.dbname, '',
                    'root', db_data.secret) as tests:
                #plus = await tests.correct_positive_cases()
                db_data.plus = await tests.correct_positive_cases()
                #minus = await tests.correct_negative_cases()
                db_data.minus = await tests.correct_negative_cases()
I have kept this file in full, since it shows the names, the calling order and the arguments of the VKF-method procedures from the library. All arguments after dbname can be omitted, since the CPython library sets default values for them.
Professional programmers may ask why the logic that controls the VKF experiment is spelled out (through numerous ifs) rather than hidden behind polymorphism over types. The answer is that, unfortunately, the dynamic typing of Python does not let us shift the decision about the type of the object used onto the system, so this sequence of ifs occurs in any case. Therefore, the author preferred an explicit (C-like) syntax to make the logic as transparent (and efficient) as possible.
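For comparison only, branching on the experiment type can also be written as a lookup table. This is a sketch with hypothetical loader functions that merely return strings; it does not call the vkf library, and whether it reads better than explicit ifs is a matter of taste.

```python
def load_continuous(dbname):
    # Hypothetical stand-in for loading continuous hypotheses.
    return 'continuous hypotheses from ' + dbname


def load_discrete(dbname):
    # Hypothetical stand-in for loading discrete hypotheses.
    return 'discrete hypotheses from ' + dbname


def load_full(dbname):
    # Hypothetical stand-in for loading mixed hypotheses.
    return 'full hypotheses from ' + dbname


# The experiment type selects the loader, so the choice is made in one place.
LOADERS = {
    'continuous': load_continuous,
    'discrete': load_discrete,
    'full': load_full,
}


def load_hypotheses(kind, dbname):
    try:
        loader = LOADERS[kind]
    except KeyError:
        raise ValueError('unknown experiment type: ' + repr(kind))
    return loader(dbname)


print(load_hypotheses('discrete', 'mushrooms'))
```

The decision about the type is still made at run time, which is the point of the remark above; the table only moves it out of the function body.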
Let me comment on the missing components:
The author has been involved in data mining tasks for over 30 years. After graduating from the Faculty of Mechanics and Mathematics of Lomonosov Moscow State University, he was invited to join a group of researchers led by Prof. V.K. Finn, Doctor of Technical Sciences (VINITI of the USSR Academy of Sciences). Since the early 1980s, Viktor Konstantinovich has been exploring plausible reasoning and its formalization by means of multivalued logics.
The following can be considered the key ideas proposed by V.K. Finn:
- the use of a binary similarity operation (originally, the intersection operation in Boolean algebra);
- the idea of discarding the generated similarity of a group of training examples if it is embedded in the description of an example of the opposite sign (counter-example);
- the idea of predicting the investigated (target) properties of new examples by taking into account the pros and cons;
- the idea of checking the completeness of a set of hypotheses by finding reasons (among the generated similarities) for the presence / absence of a target property in training examples.
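The first three ideas can be illustrated by a minimal sketch. It assumes that an example is a set of (feature, value) pairs and that similarity is set intersection, as in the original Boolean-algebra formulation. All function names and the toy mushrooms are hypothetical, and the decision rule in predict (at least one pro and no cons) is a simplifying assumption, not the rule of the vkf library.

```python
def similarity(*examples):
    """Similarity of a group of examples: their common (feature, value) pairs."""
    common = set(examples[0])
    for example in examples[1:]:
        common &= set(example)
    return frozenset(common)


def is_embedded(fragment, example):
    """True if the fragment is contained in the description of the example."""
    return set(fragment) <= set(example)


def survives(fragment, counter_examples):
    """Discard a similarity that is empty or embedded in a counter-example."""
    return bool(fragment) and not any(
        is_embedded(fragment, c) for c in counter_examples)


def predict(positive_causes, negative_causes, example):
    """Weigh pros and cons: positive if there are pros and no cons."""
    pros = sum(1 for h in positive_causes if is_embedded(h, example))
    cons = sum(1 for h in negative_causes if is_embedded(h, example))
    return pros > 0 and cons == 0


edible_1 = {('odor', 'none'), ('gill_size', 'broad'), ('ring_number', 'one')}
edible_2 = {('odor', 'almond'), ('gill_size', 'broad'), ('ring_number', 'one')}
poisonous = {('odor', 'foul'), ('gill_size', 'narrow'), ('ring_number', 'one')}

cause = similarity(edible_1, edible_2)
print(sorted(cause))
print(survives(cause, [poisonous]))
print(survives(frozenset({('ring_number', 'one')}), [poisonous]))
```

Here the similarity of the two edible mushrooms survives because the poisonous one has a narrow gill, while the single pair ('ring_number', 'one') is discarded because the counter-example contains it.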
It should be noted that V.K. Finn attributes some of his ideas to foreign authors. Perhaps only the logic of argumentation can rightfully be considered his own independent invention. The idea of taking counter-examples into account was borrowed, in his words, from K.R. Popper. He traces the origins of checking the completeness of inductive generalization to the works (quite obscure, in my opinion) of the American mathematician and logician C.S. Peirce. He considers the generation of hypotheses about causes by means of the similarity operation to be borrowed from the ideas of the British economist, philosopher and logician J.S. Mill. Therefore, he named the set of ideas he created the "JSM-method" ("DSM" in the Russian transliteration) in honor of J.S. Mill.
Strangely, V.K. Finn makes no use of a much more useful branch of the algebraic theory of lattices, Formal Concept Analysis (FCA), which emerged in the late 1970s in the works of Prof. Rudolf Wille (Germany). In my opinion, the reason is its unfortunate name, which repels him as a person who graduated first from the Faculty of Philosophy and then from the engineering stream of the Faculty of Mechanics and Mathematics of Moscow State University.
As the successor of his teacher's work, the author named his approach the "VKF-method" in his honor. However, the abbreviation can also be read (in Russian) as a probabilistic combinatorial formal method of machine learning based on lattice theory.
V.K. Finn now works at the A.A. Dorodnicyn Computing Center (FRC IU RAS) and at the Department of Intelligent Systems of the Russian State University for the Humanities.
More information on the mathematics of the VKF-solver can be found in the author's dissertation or in his video lectures at the Ulyanovsk State University (the author is grateful to A.B. Verevkin and N.G. Baranets for organizing the lectures and processing their notes).
The complete package of source files is stored on Bitbucket.
The source files (in C++) for the vkf library are awaiting approval for publication. Once approval is given, a download link will be added here.
Finally, one last note: the author started learning Python on April 6, 2020. Until then, the only language he had programmed in was C++. But this circumstance does not absolve him of the charge that the code may be inaccurate.
The author would like to thank Tatyana A. Volkova (robofreak) for support, constructive suggestions and criticism, which made it possible to significantly improve the presentation (and even the code). However, responsibility for the remaining errors and for the decisions made (even contrary to her advice) rests solely with the author.