👩🏿‍🍳 👎🏾 🤸 Building a mathematical library of the future 😡 👩‍🌾 🐗

A small community of mathematicians is using Lean to build a new digital base. They hope it will secure the future of their scientific field.

Every day, dozens of like-minded mathematicians meet in the Zulip chat to work on what they believe to shape the future of their scientific field.

They are all fans of the Lean program . It is an interactive theorem proving tool that, in principle, is capable of helping mathematicians write proofs. However, for this, mathematicians must manually enter mathematical rules into the program base, bringing the knowledge collected over thousands of years to a form that a computer can understand.

For many participants in the project, its benefits are almost self-explanatory.

“At a fundamental level, it’s obvious that once something is digitized, it can be used in new ways,” said Kevin Buzzardfrom Imperial College London. "And we're going to digitize mathematics, making it better."

Digitizing mathematics is an old dream. The expected benefits of this range from the trivial - checking students' homework with computers - to the extraordinary: using artificial intelligence to make new mathematical discoveries and find new solutions to old problems. Mathematicians believe that automatic theorem provers will also be able to review articles submitted to journals, find errors that human reviewers sometimes miss, and do the tedious technical work of filling out the proof with all the necessary details.

But first, mathematicians meeting in a chat need to equip Lean with knowledge within the framework of school mathematics, and so far this task is only half completed. In the near future, Lean is unlikely to be able to solve open problems, but people working on the base are almost sure that in just a few years the program will at least learn to understand exam level issues in high school.

Who knows? The mathematicians participating in the project do not always have a clear idea of what digital mathematics can be useful for.

“We don't know exactly where we are going,” said Sebastien Gösel of the University of Rennes.

You plan, Lean works

In the summer, a group of advanced Lean users held an online seminar " Lean for Curious Mathematicians ". In the first lesson, Scott Morrison of the University of Sydney showed how you can write a proof using a program.

He began by writing down the statement he wanted to prove in Lean terms. In human language, it meant "there are an infinite number of prime numbers." There are several ways to prove this claim, but Morrison wanted to use a slightly modified version of the first evidence found by Euclid from 300 BC. In the proof, all known prime numbers are multiplied, and then, after adding 1, a new prime number is obtained (either the product itself or one of its divisors will be prime). Morrison's choice was based on one of the basic rules for using Lean: the user must come up with the main idea of the proof himself.

"You are responsible for the first guess," Morrison said in an interview.

Introducing a statement and choosing a strategy, Morrison outlined the structure of the proof in a few minutes. He identified a sequence of intermediate steps, each of which was relatively easy to prove. While Lean cannot offer a general proof strategy, it can help you execute small, concrete steps. Breaking the proof down into accessible sub-tasks, Morrison was a bit like a chef instructing ordinary chefs to chop onions or boil water. “This is where you hope Lean will take over and start helping you,” Morrison said.

Lean accomplishes intermediate tasks using automated processes called tactics. These are kind of short algorithms designed to perform very specific tasks.

Working with proof, Morrison launched a tactic called library search. She combed through the program's math database and returned several theorems that she thought could fill in the gaps in a particular section of the proof. Different tactics perform different math tasks. One of them, linarith, can take a set of inequalities for, say, two real numbers, and confirm the truth of the new inequality, which includes the third number: if a is 2 and b is greater than a, then 3a + 4b is greater than 12. The other does most of the work with basic algebraic rules such as associativity.

“Two years ago at Lean, associativity had to be applied manually,” said Amelia Livingston, a math student at Imperial College London who is studying Lean with Buzzard. - And then someone wrote a tactic that does it for you. I am happy every time I use it. "

In all, it took Morrison 20 minutes to complete the proof of Euclid. Here and there he completed the details himself; sometimes the tacticians did it for him. After each step, Lean checked the consistency of the proof according to built-in logical rules written in a formal language called dependent type theory .

“It looks like a Sudoku app. If you make a wrong move, it beeps, ”said Buzzard. As a result, Lean confirmed the workability of Morrison's proof.

It was a very fun exercise - as it usually happens when technology does something for you. However, Euclid's proof has been around for over 2000 years. The questions of today's mathematicians are so complex that Lean cannot even understand the questions, let alone help answer them.

“It will probably take decades for this tool to help with research,” said Heather Macbeth of Fordham University, one of Lean's users.

So, before mathematicians can work with Lean on questions that are really interesting to them, they have to add more math rules to the program. And this, in fact, is a fairly simple task.

Anatomy of a proof-building program: Lean helps mathematicians prove theorems by verifying work and automating some of the steps in proofs.

1) The mathematician describes the basic structure of the proof, and writes down the sequence of steps in Lean.

2) The mathlib library of math programs has a set of tactics that perform the steps of this proof. Mathematicians keep adding new data to mathlib. Proven theorems are added to mathlib.

3) The Kernel algorithm checks the correctness of the proof using simple rules.

* The octopus has somehow become the designation for joyful emotions in the Lean community.

“For Lean to be able to understand something, people need to translate math textbooks into a form that the program can understand,” Morrison said.

Unfortunately, the simplicity of a problem doesn't mean it's easy to complete - given that much of the math is not covered in textbooks.

Scattered knowledge

If you haven't studied advanced mathematics, this area may seem accurate and well described to you. Algebra, differential calculus, mathematical analysis - everything follows from everything, and there are answers to all questions, listed at the end of the textbook.

However, mathematics in the school curriculum, college curriculum and even most of the university is a vanishing small part of all knowledge. Most of them are not well organized.

There are huge and important areas of mathematics that are not fully described. They are stored in the heads of a small number of people who learned their sub-field of mathematics from people, learned it from those who invented it - that is, they exist on the basis of folklore.

There are other areas where the basic material has been written down, but it is so complex and long that no one has yet been able to verify its correctness. Instead, mathematicians simply believe in him.

“We rely on the author's reputation. We know that he is a genius and a meticulous guy, so the proof is most likely true, ”said Patrick Massot of the University of Paris-Saclay.

This is one of the reasons why theorem-proving programs are so attractive. Translating mathematics into a language understandable by a computer forces mathematicians to finally catalog their knowledge and precisely define all objects.

Assia Mabubifrom the French State Institute for Research in Informatics and Automation recalls the first time she realized the potential of such an orderly digital library: “I was captivated by the possibility of theoretical coverage of all mathematical literature using the language of pure logic, storing the entire mathematical library in a computer, checking and searching for data in it with the help of programs ".

Lean is not the first program to have this potential. The first, Automath, appeared in the 1960s, and Coq, one of the most popular today, came out in 1989. Coq users formalized a lot of mathematics in program language, but this work was decentralized and not well organized. Mathematicians worked only on projects of interest to them, and identified only those objects that they needed for their own work, often describing them in unique ways. As a result, libraries for Coq look cluttered, like a blueprint for a city that grew naturally.

“Coq is an old man with scars today,” said Mabubi, who has been active with the program. "He was supported by many people for a long time, and thanks to his long history, his shortcomings are now well known."

In 2013, Leonardo de Mura , a researcher at Microsoft, launched the Lean project. Its name [lean] reflects de Moore's desire to create an efficient and clean program. He assumed that the program would be a tool for checking the accuracy of the code of other programs, not mathematics. However, it turns out that checking the correctness of a program is a lot like checking a proof.

“We created Lean because we are interested in software development, and there is an analogy between creating math and writing code,” said de Moore.

Heather Macbeth of Fordham University, one of the Lean users

At the time of Lean's release, there were quite a few other helper programs, including Coq, which is most similar to Lean. The logical basis for both programs is based on the theory of dependent types. However, Lean offers a chance to start over.

She quickly attracted mathematicians. They used it with such enthusiasm that they began to take all of de Moore's free time with their questions about development in the field of mathematics. "He got a little sick of leading mathematicians and said - why don't you guys make a separate repository?" Morrison said.

Mathematicians created this library in 2017. They called it mathlib and zealously set about filling it with world mathematical knowledge, creating something like an analogue of the Library of Alexandria.in the XXI century. Mathematicians created and uploaded snippets of digitized mathematics, gradually creating a catalog for Lean. And since mathlib was new, they could learn from the limitations of past systems like Coq, and keep track of how the material was organized.

“There is a deliberate attempt to create a monolithic mathematical library, all the fragments of which will work with each other,” said Macbeth.

Alexandrian mathlib

The mathlib project home page has graphs showing the progress of the project in real time. The project has a list of leaders who invest in it the most, sorted by the number of lines of code [in the first place, by the way, is the Russian mathematician Yuri Kudryashov / approx. transl.]. The total number of digitized mathematical objects is also calculated: at the beginning of October, it contains 18,756 definitions and 39,157 theorems.

All of these ingredients of mathematics can be mixed together to create mathematics within Lean. So far, their range is severely limited, despite the seemingly impressive numbers. There is almost nothing from the areas of complex analysis or differential equations - and these are two basic elements of many areas of higher mathematics. In addition, the program does not yet know enough to even describe any of the Millennium Problems - mathematical problems defined by the Clay Mathematical Institute in 2000 as "important classical problems that have not been solved for many years."

mathlib is slowly but surely filling in. The work proceeds in the spirit of teamwork. In the Zulip chat, mathematicians select definitions to formalize, are challenged to write them down, and provide prompt feedback on the work of others.

“Any research mathematician can study mathlib and make a list of 40 things that aren't there,” Macbeth said. - Therefore, he decides to fill any of these shortcomings. And instant pleasure is guaranteed - someone will read your work and leave a comment about it within 24 hours. "

Many add-ons come out small - as Sophie Morel discoveredfrom Lyon Normal School during the online seminar "Lean for Curious Mathematicians". The conference organizers gave the participants relatively simple mathematical statements that needed to be proved in Lean as practice. Working with one of them, Morel realized that her proof required a lemma - a small intermediate result - that was not found in mathlib.

“It was a small piece from linear algebra, which, for some reason, was not there yet. The people filling mathlib try to grasp everything, but you can't foresee all the options, ”said Morel, who entered the required lemma three lines into the base herself.

Other contributions are more significant. For the last year, Gösel has been working on defining a smooth manifold for mathlib. Smooth manifolds are sets (such as lines, circles, or surfaces of a ball) that play a fundamental role in the study of geometry and topology. They also often play an important role in problems from areas such as number theory and calculus. Most mathematical research is impossible without them.

However, smooth manifolds can hide under different guises - it all depends on the context. They can have a finite or infinite number of dimensions, have boundaries or not have them, be defined over various number systems - on the set of real, complex or p-adicnumbers. Defining smooth diversity is like defining love: you recognize it when you meet it, but any strict definition will surely drop some of the most obvious examples of this phenomenon.

“When defining basic things, you don't have to make any choices,” Gösel said. "But more complex objects can be formalized in 10-20 different ways."

Gösel needs to maintain a balance of specificity and generalization. “I had a rule - I wanted to be able to define 15 different uses of manifolds,” he said. “At the same time, I didn’t want the definition to be too general, otherwise it would be impossible to work with it.”

The definition he developed fits into 1,600 lines of code - quite a lot for a definition from mathlib, but perhaps not so much compared to all the mathematical possibilities he opens up to Lean.

“Now that we've got the language we need, we can start proving theorems,” he said.

Finding the correct definition of an object of the correct degree of generality is the main occupation of the mathematicians filling mathlib. The library's creators hope to define objects in a way that is useful today, and at the same time flexible enough that these objects can be used in a way that is unpredictable today.

“The need for everything to be useful for use in the distant future is underlined,” Macbeth said.

Perfectoids are born from practice

But Lean isn't just a useful tool - it gives mathematicians a chance to re-engage with their work. Macbeth still recalls the first time she tried a program to help write proofs. It was 2019 and the program was Coq (although it is now Lean). She couldn't stop.

“On one crazy weekend, I worked with this program for 12 hours straight,” she said. "It was a real addiction."

Other mathematicians describe their experiences in a similar way. Lean, they say, is like a computer game - down to the same reward hormones that make it hard to put a game controller aside. “You can do it for 14 hours a day without feeling tired and in an elevated state all day,” Livingston said. "You get positive feedback all the time."

Sebastien Gösel

Yet the Lean community understands that for many mathematicians, there are simply not enough levels in their program.

"If you quantify the percentage of formalized mathematics, I would say that less than one thousandth of a percent is ready yet," said Christian Szegedi, a Google programmer working on AI systems that he dreams of being able to independently read and formalize mathematics textbooks.

But mathematicians are increasing this percentage. If today mathlib contains most of the material taught by university sophomores, the project participants hope to add the rest of the program in a few years - and this will be a significant achievement.

“In all 50 years of these systems, no one has ever proposed: let's sit down and organize a consistent knowledge base of mathematics that describes the institute material,” said Buzzard. "We are creating a thing that can understand the questions of the institute exam - this has not happened before."

It will probably be decades before the content of mathlib matches the research library, but Lean users are demonstrating at least the theoretical possibility of creating such a catalog - and achieving this goal simply depends on entering all the mathematics into the database in the form of program code.

To this end, last year Buzzard, Masso and Johan Kommelin from the University of Freiburg in Germany implemented a project that proves the viability of this dream. They temporarily postponed the gradual filling of the base with institute material and moved on to advanced areas. Their goal was to identify one of the greatest innovations in mathematics of the 21st century - an object such as the "perfectoid space" developed over the past ten years by Peter Scholze of the University of Bonn. In 2018, he received the Fields Prize for his work, the greatest award in mathematics.

Buzzard, Masso, and Commelin wanted to demonstrate that, at least in principle, Lean can work with mathematics that mathematicians are really interested in. “They took something very complex and new and showed that these objects can be manipulated with a program to help write proofs,” Mabubi said.

Kevin Buzzard

To define a perfectoid space, three mathematicians needed to combine more than 3,000 definitions of mathematical objects and 30,000 connections between them. Definitions have stretched across many areas of mathematics, from algebra to topology and geometry. Putting them together in one definition was a powerful demonstration of the growth in complexity of mathematics over time - and why it is so important to get the foundations for mathlib right.

“Many areas of advanced mathematics require you to know all kinds of mathematics that are taught to students,” Macbeth said.

The Trinity has succeeded in defining the perfectoid space, but so far there is little mathematicians can do with it. Many more mathematical definitions need to be introduced into Lean before the program can at least formulate those complex questions in which these perfectoid spaces arise.

“It's pretty silly that Lean knows what a perfectoid space is, but doesn't know complex analysis,” Masso said.

Buzzard agrees with him, calling the formalization of the perfectoid space a "gimmick" - a trick that new technologies are sometimes shown to demonstrate their usefulness. And this time the trick worked.

“You don’t have to think that thanks to our work, all mathematicians on Earth began to use programs to help write proofs,” said Masso, “but I think that many of them noticed this possibility and started asking questions.”

It will be a long time before Lean can become a real part of mathematical research. However, this does not mean that today the program is a picture from science fiction. The mathematicians involved in its development compare their work to laying the first railroad tracks - the necessary start of an important enterprise, even if they themselves may not be able to ride the track.

“It’s going to be such a cool thing that it’s worth the trouble today,” Macbeth said. "I am investing my time in her today so that someone in the future can have such an amazing experience working with her."

Building a mathematical library of the future

A small community of mathematicians is using Lean to build a new digital base. They hope it will secure the future of their scientific field.

You plan, Lean works

Scattered knowledge

Alexandrian mathlib

Perfectoids are born from practice

More articles: