Restoration of lost texts using modern technologies

First, some news.



As you may remember, in 2018 I published the article How We Managed to Read the Manuscript, found in the 80s near the third crematorium in Auschwitz-Birkenau. You can also read an interview with me in Novaya Gazeta.






After our joint work, the new information prompted the Birkenau Museum itself and historians to act. Pavel Polyan published The Scrolls from the Ashes in German for the first time.



In January 2020, we received a letter from our historian friend Andreas Killian in Frankfurt with a link to the Auschwitz-Birkenau museum shop.






In Russia there is a large decentralized movement that searches for soldiers who died in the Second World War and works to perpetuate their memory. Search teams are scattered throughout the country and often quarrel over the incompetent opening of soldiers' identification medallions, which destroys them irrevocably. And even what can be unrolled is far from easy to read. The same goes for letters and memoirs kept by relatives, and for the huge amount of damaged material in regional archives across the country. Many other documents that look ruined at first glance have great potential to be read. The forensic literature I could find through Google is extremely outdated. After looking at Russian publications on the reconstruction of letters by Dostoevsky and Chekhov, talking with archivists from state institutions and private companies, and studying the publications and experience of Western colleagues, I decided to prepare this educational overview of modern technologies (or, as it is now fashionable to say: top 10 features for an unreadable letter inherited from your great-grandfather).



This material belongs to the field of cultural heritage studies and is compiled from scientific publications of the past 15 years, as well as my own experience and analysis.



In this publication we will move from the complex to the accessible, and in the next, more practical one, we will talk about algorithms and software.



  1. X-ray microtomography
  2. X-ray phase contrast imaging
  3. X-ray fluorescence imaging
  4. Optical coherence tomography
  5. Terahertz imaging
  6. Infrared thermography
  7. Raman spectroscopy (Raman imaging)
  8. Multispectral imaging
  9. Choice of technology


1. X-ray microtomography



Cardiff University (UK)



Let me describe what typically interests a museum archivist: something old, unexplored and very interesting. A 16th-century court roll from the Diss Heywood estate in Norfolk (UK) will do just fine. At some sad point in its past it was burned, and attempts to soak and unroll it could destroy both the ink and the medium itself. The ink (most likely iron gall ink) is completely unreadable on the burned areas. In addition, there is soot and other adhering debris on the scroll. In theory, it should contain information about life on the estate: land transactions, breaches of the peace, payment of fines, names of jurors and other bureaucracy. Its data can be used to study demography, crop yields and history itself. Who knows what is inside the roll unless you look? And unrolling it without consequences is possible only virtually.





An X-ray tomograph was used to capture the virtual copy. Here and throughout the text I will not name specific models, because scientists work with whatever is free on the dates they need or is generally available. In addition, there are so many settings, accessories, manual calibrations and measurements that the procedure is unique from experiment to experiment. Sometimes scientists are forced to speed up an experiment at the expense of resolution because time is short.



Scanning process



The scroll was assumed to have no complex folds inside, so the problem of analyzing its internal orientation could be avoided. The first scans confirmed this.





One of the tomographic sections



As a rule, parchment and air differ greatly in density in X-ray tomographic images. Processing to retrieve the content therefore begins with automatic segmentation using a threshold filter (binarization). But that is only half the trouble: there are many places where layers are stuck together or have holes, and these need manual adjustment.
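The threshold step can be sketched roughly as follows. This is only an illustration, assuming a tomographic slice is a 2D NumPy array of grey values and using Otsu's method to pick the threshold; the researchers' actual filter and parameters are not given here, and the names `otsu_threshold` and `binarize` are hypothetical.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Pick the grey level that best separates two classes (Otsu's method)."""
    hist, edges = np.histogram(image.ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    weight_low = np.cumsum(hist)                # pixel count in bins 0..i
    weight_high = np.cumsum(hist[::-1])[::-1]   # pixel count in bins i..end
    mean_low = np.cumsum(hist * centers) / np.maximum(weight_low, 1)
    mean_high = (np.cumsum((hist * centers)[::-1])
                 / np.maximum(weight_high[::-1], 1))[::-1]
    # Between-class variance when the split falls right after bin i
    variance = weight_low[:-1] * weight_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    split = np.argmax(variance)
    return edges[split + 1]                     # upper edge of the last "air" bin

def binarize(image):
    """Return True where a pixel is denser than the threshold (parchment)."""
    return image > otsu_threshold(image)

# Synthetic slice (made-up data): low-density air with one dense parchment band
rng = np.random.default_rng(0)
slice_image = rng.normal(0.1, 0.02, (64, 64))
slice_image[20:30, :] = rng.normal(0.9, 0.02, (10, 64))
mask = binarize(slice_image)
print(mask[25, 0], mask[0, 0])
```

On real slices the stuck-together layers and holes mentioned above would survive this step, which is exactly why manual adjustment is still needed.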





Demonstration of the layer separation algorithm



The assumption of an average parchment thickness for the entire document allows a merged area to be divided into several layers as evenly as possible.
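The even division of a merged area can be pictured with a few lines of code. This is a minimal sketch, assuming we look at one run of parchment pixels along a line through a binarized slice; the function name `split_merged_run` and the thickness of 10 pixels are hypothetical, not taken from the researchers' code.

```python
def split_merged_run(start, end, average_thickness):
    """Split a merged parchment run [start, end) into evenly sized layers.

    The number of layers is the run length divided by the assumed average
    sheet thickness, rounded to the nearest whole layer (at least one).
    Returns a list of (layer_start, layer_end) pixel boundaries.
    """
    length = end - start
    layers = max(1, round(length / average_thickness))
    boundaries = [start + round(i * length / layers) for i in range(layers + 1)]
    return list(zip(boundaries[:-1], boundaries[1:]))

# A 31-pixel run where one sheet is assumed to be about 10 pixels thick
print(split_merged_run(100, 131, 10))
```

A run shorter than one sheet is returned unchanged as a single layer, so the same function can be applied to every run in a slice.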



Initial analysis reveals that the Diss Heywood scroll consists of four tightly rolled sheets, with text on both sides of each. If segmentation errors make you miss the right layer, the text falls apart.



Surprisingly, this process was almost completely automatic! Despite the severe damage, only 15 of 8044 slices required manual correction.



The segmentation algorithm itself was far from optimal (the researchers write that the code is dreadful, and in MATLAB at that): it took 4 minutes per slice! At that rate 8044 slices need about 32,000 minutes, so segmenting the entire scroll took about 3 weeks. Still, 15 corrections in roughly 8000 slices over three weeks is very good compared with other studies.



This is what a virtual deployment looks like. 





Let me add a thought of my own: ideally you need software in which, by clicking on the text of the virtually unrolled copy, you could locally adjust the segmentation depth. That would let you choose the most readable separation boundary. This is a more painstaking procedure that should be handed over to the translators themselves; the scientists' task should end at this stage.



Stunning as the result is, it relies on existing contrast between parchment and ink. You can see black gaps in the manuscript; these are areas where the X-rays could not pick up any contrast in the material. But what if the sample is completely charred?



2. X-ray phase contrast imaging



From school we know about the eruption of Vesuvius in 79 AD. Some will remember Karl Bryullov's painting "The Last Day of Pompeii". The catastrophe destroyed Roman cities, most notably Pompeii and Herculaneum. Burial under thick layers of volcanic material preserved these places for hundreds of years. Today they offer an absolutely amazing opportunity for students of ancient Greco-Roman culture.



After the first discovery of papyrus scrolls in 1752, an entire library was discovered in a small room in a huge villa, containing hundreds of handwritten charred scrolls, carefully stored on shelves. This rich book collection, consisting mainly of Epicurean philosophical texts, is a unique cultural treasure. This is the only ancient library preserved along with its books!



How many attempts were made to unroll these half-charred scrolls! All of them led to irrecoverable loss. It was decided to preserve the physical integrity of the remaining scrolls in the hope of the great minds of the future.



Over the past 20 years, significant progress has been made in reading the texts of Herculaneum. The use of binocular microscopes and multispectral imaging (we'll talk about this below) has significantly improved the readability of these texts. Unfortunately, these methods cannot be applied to texts that remain rolled up and look, on the whole, like a lump of coal from your barbecue, dear reader.





As mentioned above, contrast in X-ray computed tomography comes from the absorption of X-rays. The method works especially well for distinguishing strongly absorbing materials from weakly absorbing ones (bone versus soft tissue).



In ancient times, papyri were written with carbon-based ink made from soot, whose density is almost the same as that of the charred papyrus itself. For many years the closeness of these physical properties made it impossible to find the contrast needed to isolate the text.



After examining similar unburnt manuscripts, the researchers concluded that the ink did not penetrate the papyrus but sits on top of it. This fact proved decisive for the experiments, because phase contrast can detect exactly this difference: a different thickness of material produces a different X-ray phase shift. The ink stands about 100 microns above the papyrus. It was this technology that, for the first time, made it possible to isolate reasonably readable characters.
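To get a feeling for why phase helps where absorption fails, here is a back-of-the-envelope sketch. It uses the standard textbook relations for a material with X-ray refractive index n = 1 - delta + i*beta: a layer of thickness t shifts the phase by 2*pi*delta*t/lambda and transmits exp(-4*pi*beta*t/lambda) of the intensity. The values of delta, beta and the beam energy below are placeholder orders of magnitude that I assumed for a light, carbon-rich material; they are not measurements of ink or papyrus and are not taken from the study.

```python
import math

def wavelength_m(energy_kev):
    """Photon wavelength in metres (lambda = hc / E, hc = 12.398 keV*angstrom)."""
    return 12.398e-10 / energy_kev

def phase_shift_rad(delta, thickness_m, energy_kev):
    """Phase shift picked up in a layer with refractive index n = 1 - delta + i*beta."""
    return 2 * math.pi * delta * thickness_m / wavelength_m(energy_kev)

def absorbed_fraction(beta, thickness_m, energy_kev):
    """Fraction of the beam intensity absorbed in the same layer."""
    return 1 - math.exp(-4 * math.pi * beta * thickness_m / wavelength_m(energy_kev))

# Placeholder optical constants (assumed, illustrative only)
DELTA, BETA = 1e-7, 1e-10
ENERGY_KEV = 70            # assumed beam energy
INK_RELIEF_M = 100e-6      # about 100 microns of ink relief, as quoted above

print(phase_shift_rad(DELTA, INK_RELIEF_M, ENERGY_KEV))    # radians
print(absorbed_fraction(BETA, INK_RELIEF_M, ENERGY_KEV))   # fraction of intensity
```

With these assumed numbers the ink relief changes the phase by a few radians while absorbing less than one percent of the beam, which illustrates why a phase-sensitive setup can see letters that absorption imaging cannot.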



Unlike the scroll from England, this papyrus has inner layers that are extremely difficult to unwind virtually, because segmentation algorithms are useless on such complex surfaces. In almost all cases, continuous sections of text were identified manually.





This groundbreaking study opens up new prospects not only for the many papyri already found, but also for those that have not yet been discovered. Perhaps there is another library beneath the deeper volcanic rocks!



3. X-ray fluorescence imaging



Stanford Laboratory (USA)



Have you heard of palimpsests? These are documents whose content was worth far less than the medium itself: unneeded texts could be scraped off or bleached and overwritten with fresh ones.



Galen of Pergamon was a physician to emperors and gladiators. His text "On the Mixtures and the Power of Simple Medicines" was translated into Syriac in the 6th century, which helped his ideas spread to the ancient Islamic world. Restoring this text will show how diseases were treated at the time, which is very valuable information. Unfortunately, despite the doctor's fame, the most complete surviving copy of the translation was erased and overwritten with hymns in the 11th century. Earlier research revealed traces of text underneath but failed to read it: both texts were written in the same kind of ink, and the older one had been scraped off thoroughly. For 10 years it was not possible to achieve the contrast needed for reading.



Recently, an international team of researchers showed excellent results at the Stanford Synchrotron Radiation Lightsource (SSRL) at the SLAC National Accelerator Laboratory.



"We hoped there would be enough ink traces for us to decipher even one or two words," says Uwe Bergmann, a staff scientist at SLAC who led the X-ray imaging project. "The crisp writing we see now marks a huge success."



Of course, the team feared that even with the powerful X-ray imaging techniques at SSRL the text might remain illegible, for example if the iron left in the ink turned out to be too scarce or too smeared.



X-ray fluorescence imaging works by knocking out inner-shell electrons of metal atoms. The vacancies are filled by outer electrons, which produces characteristic X-ray fluorescence that can be detected. Galen's hidden text and the newer religious text fluoresce slightly differently because their inks contain different combinations of iron, zinc, mercury and copper. Centuries of difference inevitably show in the composition of the ink, and it is precisely these differences that make it possible to separate the recorded data.
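One way to picture how two inks can be pulled apart is linear unmixing of the elemental maps. The sketch below assumes that each ink has a known elemental signature (relative iron, zinc, mercury and copper signal) and that the signals simply add up; the signature numbers are invented for illustration, `unmix` is a hypothetical helper, and the team's real processing is not described here.

```python
import numpy as np

# Rows: elements (Fe, Zn, Hg, Cu); columns: inks (undertext, overtext).
# Hypothetical signatures, made up for this example.
signatures = np.array([
    [0.7, 0.4],
    [0.2, 0.1],
    [0.0, 0.3],
    [0.1, 0.2],
])

def unmix(element_maps, signatures):
    """Estimate per-pixel ink amounts from elemental maps by least squares.

    element_maps: array of shape (n_elements, height, width)
    returns:      array of shape (n_inks, height, width)
    """
    n_elements, height, width = element_maps.shape
    pixels = element_maps.reshape(n_elements, -1)
    amounts, *_ = np.linalg.lstsq(signatures, pixels, rcond=None)
    return amounts.reshape(signatures.shape[1], height, width)

# Synthetic page: one erased horizontal stroke crossed by a later vertical one
under = np.zeros((4, 4))
under[1, :] = 1.0
over = np.zeros((4, 4))
over[:, 2] = 1.0
maps = signatures[:, 0, None, None] * under + signatures[:, 1, None, None] * over

recovered_under, recovered_over = unmix(maps, signatures)
print(np.round(recovered_under, 2))
```

Real data is noisy and the signatures are not known in advance, so this only shows the principle of using differences in composition to separate layers.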



Scanning takes approximately 10 hours for each of the 26 pages. The result is a huge amount of data. The researchers even had to resort to machine learning to extract information, because sorting through it by hand is extremely difficult.







At the end of January 2019, Michael Tott posted a photo on his Twitter account: strong contrast was found in the channel that corresponds to sulfur in the manuscript.





And this is a diagram of the elemental composition of a section of the manuscript.




Personally, I would like a Photoshop in which the image layers were its constituent chemical elements. What would the color space be called then?



The manuscript is still under study.



4. Optical coherence tomography 



Duke University (USA)



This photonic imaging technique is mainly used in ophthalmology. In premature infants, for example, the degree of brain development can be judged from the fundus of the eye. The technology is based on a principle similar to ultrasound measurement, except that infrared light (850-1000 nm) serves as the radiation. The images are highly detailed (microscopic resolution is a bonus), and because infrared light penetrates tissue to a depth of 1-2 mm, we can obtain a volumetric array through which we can cut "slices" at the required depth.



Papyrus




One published case describes the study of a papyrus sample from the 2nd century BC. In ancient Egypt, the middle-class dead were mummified with a mask made from scraps of papyrus, rather like papier-mâché, which was then primed and painted. It is suspected that the papyrus was second-hand and already carried text. Some scientists, according to Michael Tott, dissolve the masks in dish soap to get at the papyrus layers under the paint. That would be fine, except that it destroys the artifact, depends on how steady your hands are and guarantees nothing. And the wish for non-invasive research is not the only problem: just try taking a sample out of the country! There are laws prohibiting the export of cultural heritage, bureaucracy, packaging, shaking in transit and so on. It so happened that Michael's sister, Cynthia Tott, works as an ophthalmologist not far from the university's papyrus archive (a few minutes' walk), and her institution has an optical coherence tomography scanner.





Here is that very optical scanner, held like a pistol, together with the interested parties





In the background of the photo, that very strip of papyrus lies on the glass. The scan produced a data cube: slice off its top layer (like tearing off the first layer of wallpaper in your favorite Khrushchev-era flat, dear reader) and you really can make out letters of the alphabet!
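Cutting such a cube at a chosen depth is a one-liner once the scan is in memory. The sketch below assumes the volume is a NumPy array indexed as (depth, height, width) with the surface at depth 0; the array contents and the helper `en_face_slice` are made up for illustration and do not reproduce the Duke data.

```python
import numpy as np

def en_face_slice(volume, depth_index, thickness=1):
    """Average `thickness` consecutive depth layers starting at depth_index.

    volume is indexed as (depth, height, width); depth 0 is the surface.
    """
    return volume[depth_index:depth_index + thickness].mean(axis=0)

# Synthetic volume: an opaque paint layer on top, a stroke hidden below it
volume = np.zeros((8, 5, 5))
volume[0] = 1.0                 # paint covering the whole surface
volume[3, 2, 1:4] = 1.0         # a short stroke three layers down

hidden = en_face_slice(volume, depth_index=3)
print(hidden)
```

Averaging a few neighbouring layers (a larger `thickness`) is a simple way to trade depth precision for less noise.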





Don't be surprised to see familiar symbols. Michael points out that Greek was the language of government at the time, so searching for letters does not require experts in a dead language. The main difficulty with this equipment and this kind of task is that almost all of the world's resources are focused on preserving health and life, which is understandable. There are very few specialists, and even fewer who are both available and enthusiastic. And the existing software is not designed for cultural heritage problems. Still, it is a promising technology.



5. Terahertz imaging



This is one of the young technologies that has lately been gaining momentum in a huge number of fields. I could not find successful large-scale cases of manuscript recovery, but there are many analytical experiments confirming its potential, which in some cases exceeds that of X-rays thanks to the contrast it reveals in iron-free materials. There is also a good and very interesting lecture about this technology.







The frequencies used, from 100 gigahertz to 3 terahertz, can penetrate paper and many other materials. The radiation is non-ionizing and therefore safe for humans. Based on the statistics of the reflected field over time, it is possible to localize each page.
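The idea of localizing pages from the reflected signal can be illustrated with a heavily simplified time-of-flight sketch. It assumes an idealized waveform in which every page boundary returns one clean echo, and a refractive index of 1.5 for the stack, which is my assumption rather than a measured value; real systems work with noisy signals and the statistical analysis mentioned above. The names `find_echoes` and `echo_depths_um` are hypothetical.

```python
SPEED_OF_LIGHT_UM_PER_PS = 299.792458  # micrometres per picosecond

def find_echoes(signal, threshold):
    """Return indices of local maxima in `signal` that exceed `threshold`."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > threshold and signal[i - 1] < signal[i] >= signal[i + 1]
    ]

def echo_depths_um(echo_times_ps, refractive_index):
    """Convert echo arrival times to depths below the first reflection.

    The pulse travels down and back again, hence the factor of two.
    """
    first = echo_times_ps[0]
    return [
        SPEED_OF_LIGHT_UM_PER_PS * (t - first) / (2 * refractive_index)
        for t in echo_times_ps
    ]

# Synthetic waveform sampled every 0.1 ps with three weakening echoes
SAMPLE_STEP_PS = 0.1
signal = [0.0] * 60
signal[10], signal[30], signal[50] = 1.0, 0.6, 0.35

echoes = find_echoes(signal, threshold=0.2)
depths = echo_depths_um([i * SAMPLE_STEP_PS for i in echoes], refractive_index=1.5)
print(echoes, depths)
```

Each echo is weaker than the previous one, which hints at why the deeper pages eventually drown in accumulated errors.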



Here is an animation showing the letters LAZ, THZ in turn. These letters were printed on a laser printer and the sheets were stacked. The emitter was placed on top, and the reflected signal made it possible to distinguish text up to 20 sheets deep. Deeper than that, the accumulated errors made the reflected signal unreadable.





The Metropolitan Museum of Art in New York is interested in this approach, because its archives contain books that must not be opened, since opening them would destroy them, and access to tomography is not so easy to get. The availability of the equipment is a big plus: unlike the previous technologies, there are already several finished products on the market that plug directly into a laptop via USB.



6. Infrared thermography



Now let us consider imaging in the range of thermal cameras. Active pulse thermography has been used successfully to reveal ancient texts in parchment-bound books non-invasively. One example is the analysis of a 13th-century manuscript (ms 509 / D813) kept in the Angelica Library in Rome. The manuscript is a summary of the Old Testament and consists of 127 written parchment leaves. Some of them were damaged by water. The last pages have large blurred stains that make the text unreadable.



Thermograms performed on various damaged areas show partial restoration of ink text in all areas examined.





Such thermograms were obtained using two 1 kW flash lamps. The loss of the pigment component of the ink does not mean that the rest of its components are washed away. The ability to restore contrast may be due to the temporary heating of ink residue areas that effectively absorb some of the incident light.



7. Raman spectroscopy



Bodleian Library, Oxford



When any substance is irradiated with a laser, then in addition to Rayleigh scattering an extremely small part of the scattered light changes its frequency. Spectral lines appear that were not present in the original light source. The number and position of these lines are determined by the molecular structure of the substance, so its composition can be identified. By mounting the laser on a CNC stage, you can record this data point by point and then build an image of the composition. This method is very popular for studying the pigments of paintings and revealing hidden inscriptions. True, in the work on the Armenian manuscript the goal was slightly different. It should be noted that the laser irradiation is very weak, but still damaging.
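Building an image from point-by-point spectra can be sketched as follows. The example assumes the scan is stored as a NumPy array with one spectrum per stage position and sums the intensity inside one spectral band; the band limits of 240-260 cm^-1 and the synthetic peak are placeholders I chose for illustration, not values from the Bodleian study, and `band_map` is a hypothetical helper.

```python
import numpy as np

def band_map(spectra, shifts, band_low, band_high):
    """Sum spectral intensity inside [band_low, band_high] for every scan point.

    spectra: array (rows, cols, n_shifts), one spectrum per stage position
    shifts:  array (n_shifts,), Raman shift axis in cm^-1
    """
    in_band = (shifts >= band_low) & (shifts <= band_high)
    return spectra[:, :, in_band].sum(axis=2)

# Synthetic 3x3 scan: only the centre point shows a peak near 250 cm^-1
shifts = np.linspace(100, 500, 401)
spectra = np.zeros((3, 3, shifts.size))
spectra[1, 1] = np.exp(-((shifts - 250) / 5) ** 2)

pigment_map = band_map(spectra, shifts, 240, 260)
mask = pigment_map > 0.5 * pigment_map.max()
print(mask)
```

Thresholding the band map gives a coverage mask for one pigment; repeating it for other bands would give one mask per substance.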







And this is what the resulting pigment coverage mask looks like, based on the measured composition. This example shows the result for the red pigment.





Not so cool, you say. After all, couldn't you try to extract such a mask from a photograph one way or another? So it turns out that photography is an analytical tool too?



8. Multispectral imaging



And so we come to what it really makes sense to talk about when it comes to accessible technology. Most of the world's largest museums and archives have this equipment at their disposal. In 1993, the Dead Sea Scrolls became one of the first manuscripts to be studied using spectral imaging. At that time, however, researchers were still trying to recover faded or illegible texts using infrared film.





Film is gone; digital sensors and super-bright LEDs have arrived (or a set of filters and two halogen work lamps). The essence of the technology is quite simple. You need to take about 12 digital images on a monochrome sensor (highly desirable) in 12 different bands of the optical range: three in IR, then red, amber, orange, yellow, green, cyan, blue, violet and UV. In the photo above, two LED spotlights are illuminating the sample with UV light. The results show whether the sample has potential, whether software will help, and whether it is time to start wearing out the doorsteps of officials to squeeze out a budget for a trip to a national research laboratory.



In 2020, scholars studying supposedly blank parchment fragments from the Qumran manuscripts accidentally discovered letters. A huge number of small fragments had never been examined for text, because nothing hinted at it. Some pieces had even been deliberately cut up for other purposes. When the fragments were re-photographed in the IR range, what had seemed empty suddenly turned out to be a sensation.





One of the greatest explorers, David Livingstone, devoted most of his life to Africa, walking over 50 thousand kilometers. In one of his last works, having run out of ink, he used the juice of a local berry instead. But the beautiful contrast did not last: by the time the manuscript reached his colleagues, the juice had lost its pigment. It waited 140 years to be fully read. By the way, the project to study his diary took 1st place in the DH Awards in 2016.





The image above shows the manuscript page followed by combinations of the spectral images. These serve both as noise-suppression masks and as a direct way to enhance the contrast of the elements of interest.



The diary was written over printed newspaper pages. The newspaper text was suppressed with a mask from the IR range: the berry juice is absent there, while in the other channels it shows with more contrast alongside the newsprint. The deciphered text turned out to be a story in which Livingstone was a direct witness to a terrible massacre committed by slave traders. He was so shocked by what happened that he interrupted his search for the source of the Nile. Today the manuscript has been fully transcribed and is available to anyone. But since you, dear reader, most likely live in a time when nobody appreciates what is given for free, you most likely will not read it.
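The masking trick can be sketched in a few lines. The example assumes two registered greyscale images of the same page as NumPy arrays, where larger values mean darker marks: an IR image containing only the printed text and a visible-band image containing both the print and the faded ink. It subtracts the best-fitting multiple of the IR image. This is my simplified illustration with synthetic patterns, not the processing actually used on the diary, and `suppress_with_ir` is a hypothetical name.

```python
import numpy as np

def suppress_with_ir(visible, infrared):
    """Subtract the best-fitting multiple of the IR image from a visible band.

    Whatever appears in both images (the printed text) is cancelled;
    whatever appears only in the visible band (the faded ink) remains.
    """
    vis = visible - visible.mean()
    ir = infrared - infrared.mean()
    scale = (vis * ir).sum() / (ir * ir).sum()   # least-squares fit of vis by ir
    return vis - scale * ir

# Synthetic page: print fills the left half, handwriting fills the top half
printed = np.zeros((4, 4))
printed[:, :2] = 1.0
handwriting = np.zeros((4, 4))
handwriting[:2, :] = 1.0

infrared = printed                                # the juice is invisible in IR
visible = 0.8 * printed + 0.5 * handwriting       # both show in a visible band

cleaned = suppress_with_ir(visible, infrared)
print(np.round(cleaned, 2))
```

On a real page the handwriting and the print are not perfectly independent, so some residue of the print would remain and further combinations of channels would be needed.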



On the British Library blog you will also find regular results of multispectral imaging research. The 800-year-old Magna Carta showed excellent results despite its condition. Or take the result for the 9th-century Gospel of Bodmin. Look closely: this is the same page.





To get a better understanding of how the process works, there is a good video tour of a multispectral laboratory.










Moreover, if this seems out of reach for mere mortals, my Italian friend Antonino Cosentino (a scientist himself) describes on his project site https://chsopensource.org/ his research into using consumer DSLRs and lenses for multispectral shooting. Antonino, of course, is not concerned with restoring lost texts. He deals with the general task of studying cultural heritage and helping museum workers around the globe. His Antonello project is devoted entirely to this. However, I'm not sure a filter kit is a better solution than LED spotlights; both approaches have their nuances. To better understand how color pigments behave in multispectral photography, I'll show you Antonino's pigment chart.





Here you can see how many pigments in IR become transparent or reflect or absorb IR rays, and how everything looks completely different in UV. Shooting between IR and UV will also show its own set of contrasts.



Now, armed with sufficient knowledge, let us compare the methods above to find out which of them is best for examining a given sample.



9. Choice of technology



The study of papyri is one of the most popular topics in cultural heritage. In one scientific paper, the researchers asked what a mummy should be illuminated with. Is it worth going through the technologies one by one in search of a result, or is it better to narrow down the choice in advance?



If ideal conditions are reproduced, it becomes possible to reason fairly accurately about how well each technology reveals particular pigments compared with the others.



Using ancient techniques, the researchers prepared 4 sheets of papyrus measuring 10x15 cm (phantoms), each divided into four zones. Going clockwise, the zones were marked with bold crosses of different ink compositions, arranged so that the crosses would not overlap one another when the sheets were stacked.





Three types of ink were chosen for historical reasons, and the last one is modern (why let the space go to waste):



  • carbon (soot, charcoal)
  • iron oxide (most common)
  • iron gall ink (to a lesser extent)
  • modern carbon ink (Winsor and Newton, UK)


Multispectral imaging provides excellent high-resolution surface detail for iron- and carbon-based inks, but limited depth penetration.





However, this disadvantage is mitigated to some extent by shooting in transmission.





Optical coherence tomography showed unexpectedly low penetration due to the high optical attenuation of papyrus.





X-ray methods identified iron-based ink even when additional sheets of papyrus were placed on top of the phantoms, but they could not detect carbon-based ink.





X-ray fluorescence imaging





The crosses drawn in modern ink and in carbon-based ink were not detected. Carbon is a light element (atomic number 6) and fluoresces at too low an energy to be detected by the system being used. The lightest element that could be detected was phosphorus (15). The iron (26) present in the iron gall ink was clearly visible and could be distinguished from the background even through 6 layers of papyrus.



Phase Contrast X-ray Tomography



Due to time constraints, the researchers scanned only the crosses made with iron oxide and carbon inks. The fibrous structure of the papyrus is clearly visible. The crosses are visible as well, because their refractive index differs from that of the papyrus. Traces of the carbon-containing ink can be seen too, though rather faintly.





Terahertz imaging, to the surprise of the researchers, detected carbon-based inks better than iron-based inks. THz waves appear to be sensitive to inks that are not visible with X-ray techniques. These results are supported by previous research.
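For convenience, the findings listed in this section can be folded into a small lookup table. The encoding below ("yes", "weak", "no") is my own simplification of the results as described above; the ability of terahertz waves to pass through layers is taken from the terahertz section rather than from the phantom results, and the helper `candidate_methods` is hypothetical.

```python
# Simplified summary of the phantom results described in this section.
# "yes" / "weak" / "no" say how well an ink type was detected.
FINDINGS = {
    "multispectral imaging": {
        "iron": "yes", "carbon": "yes", "through_layers": False},
    "X-ray fluorescence imaging": {
        "iron": "yes", "carbon": "no", "through_layers": True},
    "phase contrast X-ray tomography": {
        "iron": "yes", "carbon": "weak", "through_layers": True},
    "terahertz imaging": {
        "iron": "weak", "carbon": "yes", "through_layers": True},
}

def candidate_methods(ink, covered):
    """List methods that detected `ink` ("iron" or "carbon"), best first.

    covered=True keeps only methods that worked through overlying layers.
    """
    order = {"yes": 0, "weak": 1}
    hits = [
        (order[result[ink]], name)
        for name, result in FINDINGS.items()
        if result[ink] != "no" and (result["through_layers"] or not covered)
    ]
    return [name for _, name in sorted(hits)]

print(candidate_methods("carbon", covered=True))
print(candidate_methods("iron", covered=False))
```

Such a table is only a starting point for narrowing down the choice; it says nothing about cost, access to equipment or the condition of a particular sample.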



Sample results


I am pleased to bring this topic to the Russian Internet: when I first had to study it myself, I discovered how important this material can be. The topic is vast, so I decided not to squeeze everything into one article. In the next article we will talk about algorithms and digital image processing.



If you wish, subscribe to my twitter before it ever becomes fashionable or before mitzgol finishes his hypertext fidonet.



Sources
https://www.research.ed.ac.uk/portal/files/59293691/IST_Archiving_Paper_Mummy_OCT_MSI.pdf



https://www.nature.com/articles/s41598-018-29037-x



https://www.nature.com/articles/ncomms6895



https://www.semanticscholar.org/paper/Application-of-terahertz-spectroscopy-for-character-Fukunaga-Ogawa/422ab4431a929b269800ee3d95a6833b7777f493



https://heritagesciencejournal.springeropen.com/articles/10.1186/s40494-018-0206-1



https://www.media.mit.edu/projects/reading-through-a-closed-book/overview/



https://heritagesciencejournal.springeropen.com/articles/10.1186/s40494-018-0175-4



