Stains as Spectral Curves

Over the course of the last few months, the Library of Stains team has been analyzing data collected from the hundreds of images we gathered during three trips to the Library of Congress, the University of Pennsylvania, and the Universities of Wisconsin and Iowa.

Multispectral data analysis can take many shapes and forms. As part of the Library of Stains project, the team has applied a methodology specific to stains.  This post takes you through the steps of that process as it relates to characterizing stains and what is, and is not, possible to know.

The project uses ImageJ software, freely available online, and the Paleo Toolbox, which was designed by Dr William (Bill) Christens-Barry of Equipoise Imaging.

The data that come to us straight from the camera first need to be converted into .tif files. This is done through Capture One, software from Phase One, and both sets of images can be saved in the same file. New software is being released soon and will do this step automatically.

Next, all .tif files need to be flattened. Flattening is done in two steps. First you need to clean the flats – a series of shots of a white piece of paper taken on the same day as the images. Second, you flatten the set of images for a given side (or folio) to create an image that is evenly balanced against the white light spectrum. Again using ImageJ, the software automatically processes the series of “flats” in line with the manuscript images. The flattening process is easy to learn but somewhat time-consuming. The good news is that this step will also be done automatically with the new software.
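For readers who want a feel for what flattening does to the numbers, here is a minimal sketch in Python. It assumes that flattening works like a conventional flat-field correction, where each pixel is divided by the matching pixel of the flat and rescaled by the flat's average brightness. That is an assumption for illustration only: the post does not describe the exact calculation ImageJ and the Paleo Toolbox perform, and the function name and pixel values below are made up.

```python
from statistics import mean

def flatten_image(raw, flat):
    """Generic flat-field correction on images stored as lists of rows.

    Assumption: this is the textbook correction, not necessarily what
    the Paleo Toolbox does internally.
    """
    flat_pixels = [value for row in flat for value in row]
    if min(flat_pixels) <= 0:
        raise ValueError("flat must be positive everywhere")
    flat_mean = mean(flat_pixels)
    # Divide each pixel by the flat and rescale by the flat's mean brightness.
    return [
        [r * flat_mean / f for r, f in zip(raw_row, flat_row)]
        for raw_row, flat_row in zip(raw, flat)
    ]

# Made-up 2x3 images: the lighting is dimmer on the left, brighter on the right.
flat = [[800, 1000, 1200], [800, 1000, 1200]]  # shot of the white paper
raw = [[400, 500, 600], [400, 500, 600]]       # evenly coloured object, same lighting
corrected = flatten_image(raw, flat)
print(corrected)
```

In this toy case the uneven lighting cancels out and every corrected pixel comes out at the same value, which is the "evenly balanced" result the flattening step is after.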

Once the files have been flattened, an intermediary step recreates a color image using ImageJ. This takes the flattened .tif files from a specific side (folio) and, with the light information from the full stack of multispectral images, recreates a color image. The Library of Stains team wanted .jpg images that could be annotated to identify the specific areas on a given folio where elements were analyzed.

Eventually the image will end up in Digital Mappa – a data curation software environment being used for our data visualization. In the meantime, to make our data analysis workflow as efficient as possible, we have put all the RGB color images into a PowerPoint presentation.



Now, with the PowerPoint up on one computer and ImageJ on another, we’re ready to begin analysis. We will use ImageJ once again to plot the z-axis for each spot, which will serve to create a spectral curve. For the stain project, this meant plotting a z-axis for the substrate (parchment or paper), the inks, the red and blue pigments used for rubrication and decoration, and, of course, as many stains as our hearts desired.

Moving from image files to spectral curves on an Excel spreadsheet requires a few steps:

  1. We opened the 10 non-filtered wavelengths we imaged for each side in ImageJ and configured them into one stack.
  2. Scrolling through the stack highlights how different components on a given folio react to different wavelengths of light.
  3. Plotting the z-axis on a particular part of the image is easy with ImageJ. Outline the portion to be plotted with a small rectangle and choose “Plot z-axis.” The results immediately appear as a curve on the screen. The curve can also be viewed as a list of numerical values.
  4. These values are entered in the appropriate columns on an Excel spreadsheet and labelled with the corresponding element. As shown below, we have columns for substrate, inks, pigments and stains.
  5. In order to see the true reflectance, the z-axis values of a specific component need to be adjusted against the values for the white color checker. For that reason, the white color checker values are the first to be plotted and inserted into the appropriate columns for each of the material components.
  6. Following the methodology devised by colleagues at the Preservation Research and Testing Division of the Library of Congress, the spreadsheet automatically calculates reflectance by dividing the intensity of the ink, substrate, pigment or stain by the intensity of the white swatch on the color checker. This is what is plotted on the x and y axes of the spectral curve: the x axis shows light wavelengths and the y axis shows reflectance levels.
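The division in the last step can be sketched in a few lines of Python. This is only an illustration of the arithmetic the spreadsheet performs, not the project's actual spreadsheet: the function name is hypothetical, the three wavelengths stand in for the ten we imaged, and the intensity values are invented.

```python
def reflectance_curve(sample, white):
    """Divide the sample intensity by the white-swatch intensity at each wavelength.

    Both arguments map wavelength (nm) to the mean intensity read from
    the z-axis plot for that spot.
    """
    if sample.keys() != white.keys():
        raise ValueError("sample and white must cover the same wavelengths")
    return {wl: sample[wl] / white[wl] for wl in sorted(sample)}

# Placeholder wavelengths and intensities, not measurements from the project.
white_swatch = {450: 200.0, 590: 220.0, 850: 240.0}
ink = {450: 50.0, 590: 55.0, 850: 180.0}

curve = reflectance_curve(ink, white_swatch)
for wavelength, reflectance in curve.items():
    print(f"{wavelength} nm: reflectance {reflectance:.2f}")
```

Each pair in `curve` is one point on the spectral curve, with wavelength on the x axis and reflectance on the y axis.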

Then comes the fun part.  We are able to begin to decipher the curves.

Spectral Curves for inks stains on University of Wisconsin manuscript MS 170A, no. 8.
Spectral Curves for all blue inks found in the University of Wisconsin manuscripts.
Spectral curves for all red inks found in the University of Iowa manuscripts.
Spectral Curves for possible wax stains in the University of Iowa manuscripts.
















Preliminary results from the Universities of Iowa and Wisconsin show variations of ink curves, as well as possible curves on a number of folios that may indicate wax residues. Further analysis is underway and final results, alongside the data itself, will be ready for open access by August 31, 2018.

With many thanks to Leah Pope Parker, PhD candidate in English at the University of Wisconsin, for her intellectual contributions to this project, as well as countless hours analyzing and visualizing data.


Multispectral Imaging: People, Processes & Technology

Michael B. Toth

President, R.B. Toth Associates

Alberto Campagnolo, Erin Connelly and Heather Wacha setting up manuscripts at SIMS for multispectral imaging as part of the “Stains Alive” project.

Alberto Campagnolo, Heather Wacha and Erin Connelly have discussed the Stains Alive Project, citing our imaging technology, work processes and data output in support of this unique scientific study into stains on ancient manuscripts. This builds on almost two decades of work we’ve put into developing our equipment and techniques. As we continue from the University of Pennsylvania’s Schoenberg Institute for Manuscript Studies (SIMS) and the Library of Congress to the Universities of Wisconsin and Iowa (I’ll try to ignore the forecasts of 16°F and snow showers), I thought I’d discuss what some consider the more mundane aspects of these projects, which often go unrecognized.

A small sample of the standardized data output from “Stains Alive” imaging at SIMS

The methodologies and technologies we use for multispectral imaging today are based on our 18 years of experience in narrowband multispectral imaging systems development. Yet for all the advances in the latest equipment – higher resolution sensors, better signal-to-noise ratio, improved illumination panels – success or failure of these projects depends on more than just the technology. A successful program also requires solid work processes and dedicated people. This is where systems integrators and program managers come in – not just to make sure all the technology is working together, but to ensure the project is fully supported by the processes and people as well. Without these – especially the latter – a project can yield some pretty pictures for scholars and conservators to gasp and drool over, but might not successfully produce and preserve the solid corpus of standardized data and metadata for future generations to study.

Multispectral imaging sequence of images in a darkened room, each illuminated by different wavelengths of narrowband LEDs shining on the manuscript from Ultraviolet to Infrared light

As Alberto noted in his blog, the current narrowband multispectral imaging system used for this project includes commercial-off-the-shelf hardware and software for digital spectral image capture and viewing with the integrated system. This includes customized image processing software developed by Bill Christens-Barry of Equipoise Imaging to allow users to exploit the spectral images, utilizing techniques developed in other scientific and cultural heritage studies.

Our Phase One high-pixel-count camera takes a series of high-quality digital images, each illuminated by a specific wavelength of light from banks of light emitting diodes (LEDs). Everyone tends to get excited about being able to observe multispectral imaging of manuscripts: the sequences of various colored lights are visually compelling, you are seeing new features on an object and are part of leading-edge studies. But Heather, Erin and Alberto are learning that after a few sequences of images in a dark room, many people don’t have the patience for more and excuse themselves. System operators like Meghan Hill Wilson and the PRTD team in the Library of Congress, the CHIC team at the John Rylands Library in Manchester, the digitization team in the Duke Libraries, Cerys Jones at UCL, and Damian Kasotakis in the Sinai are the unsung heroes of these projects, as they work in dark rooms day after day setting up and imaging manuscript leaf after manuscript leaf. This is where checklists are needed to make sure mistakes don’t creep in.

Compressed pseudocolor image of SIMS manuscript inner cover digitally processed from a sequence of captured images

The resulting image set is then digitally processed and combined to reveal residues and features in the object that are not visible to the eye in natural light. These processed images provide the data needed for research into stains and residues. Lots of data! Each archival 16-bit TIFF image from our current 60-megapixel Phase One monochrome camera is about 117 MB in size, and we capture 15-18 images in a sequence, so each sequence yields about 2 GB of captured data. Multiply that by the number of leaves imaged and the data pile up quickly. By the end of this short project, the team will have collected about a quarter terabyte of captured data alone.
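The arithmetic behind those figures is easy to check. The sketch below is a back-of-the-envelope calculation only: it assumes decimal units (1 GB = 1000 MB, a quarter terabyte = 250 GB) and counts nothing but the captured images, and the function names are hypothetical. The number of sequences it prints is an estimate implied by the figures above, not a count from the project.

```python
MB_PER_IMAGE = 117  # size of one archival 16-bit TIFF, as quoted above

def sequence_size_gb(n_images, mb_per_image=MB_PER_IMAGE):
    """Size of one imaging sequence in GB, assuming 1 GB = 1000 MB."""
    return n_images * mb_per_image / 1000

def sequences_in(total_gb, n_images):
    """How many sequences of n_images fit in total_gb of captured data."""
    return total_gb / sequence_size_gb(n_images)

QUARTER_TERABYTE_GB = 250  # assumption: 1 TB = 1000 GB

for n in (15, 18):
    print(
        f"{n} images: {sequence_size_gb(n):.1f} GB per sequence, "
        f"roughly {sequences_in(QUARTER_TERABYTE_GB, n):.0f} sequences in 250 GB"
    )
```

Under these assumptions a sequence comes to between about 1.8 and 2.1 GB, which is where the "about 2 GB" figure comes from.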

Workshop on multispectral imaging system and processing tools at SIMS

While the processed images are usually stored as 8-bit TIFF images, multiple processed images are available from each sequence, including some larger pseudocolor images, so they add up to yet more data to store. With open source image processing tools and training, scholars and conservators can now produce their own processed images to meet their research needs, which also need to be managed.


All these data require good metadata and file structures, for without them we would be blindly trying to find data across hard drives and the cloud. And when we found the data, we wouldn’t be able to remember details about the imaging, spectral illumination or object. This highlights the additional unsung heroes of our multispectral imaging: the data managers and administrators. The dean of this cadre is Doug Emery, whose pioneering work on data management and preservation on the Archimedes Palimpsest Project was “recognized” by Program Director Will Noel’s dedication in his book (below):

“To Doug Emery, Whose critical contribution to this project goes unrecorded in this book. Sorry. Metadata doesn’t sell. Thank you so much! Will Noel”

Data Manager Doug Emery and Data Administrator Susan Marshall standardizing and organizing Sinai Palimpsests Project data and metadata

Doug’s work, and that of so many others responsible for the metadata and data output, has proven critical to multispectral imaging programs ranging from various palimpsest projects to David Livingstone’s Diaries, Top Treasures at the Library of Congress, and mummy masks around the globe. Starting with the Archimedes Palimpsest Metadata Standard (really a specification) that Doug, Bill Christens-Barry and I developed over a decade ago, multispectral imaging data management has advanced on the shoulders of pioneers working with the International Image Interoperability Framework, Dublin Core, the Text Encoding Initiative, and others. Only with the diligence and attention to detail provided by dedicated data managers and administrators have large amounts of multispectral image data been archived and made available online for global access.

Training workshop for PACSCL members at SIMS on multispectral image processing and work flow to meet users’ diverse goals

For Stains Alive and our other multispectral imaging projects, we use the latest technology, which is always getting better. At SIMS and for the Philadelphia Area Consortium of Special Collections Libraries (PACSCL) we were able to try out the latest 100 MP Phase One camera back thanks to a loan from Digital Transitions. The CMOS sensor allowed us to autofocus and capture more detail in larger images, while collecting even more high-quality data. With these new cameras, illumination panels, processors and other technologies for multispectral imaging, we also have to continuously improve our work processes. Most of all, we need to ensure the people on the team have all the resources they need to carry out their goals.

Going on a Stain Hunt

The imaging schedule for the #Stains Alive project has been set. The PIs will soon be welcoming Mike Toth, multispectral imaging expert from R.B. Toth Associates, to the University of Pennsylvania, the University of Wisconsin – Madison, and the University of Iowa. But before he arrives with his imaging equipment, the first task is to single out interesting-looking stains in our respective collections.


University of Wisconsin, Special Collections, MS 257, f. 110r
University of Pennsylvania, MS Codex 115.

In other words, we’re going on a stain hunt. For us, what makes a stain a possible candidate for imaging is its size, shape, placement on the page, color, and the genre of manuscript in which it appears. Stains found in alchemical texts or books of recipes may not be the same kind of stains that appear in a Bible or literary text. Indeed, a note on the book of remedies at right suggests that the stain is due to “a chemical spilled on the ms by an alchemist.”





Information about the stains chosen for imaging is catalogued in a spreadsheet, starting with the call number and the folio on which the stain appears. Since a camera will be attached to a copy stand and sit above the manuscript, we also measure the x and y axes of the folio/manuscript, as well as the z axis, i.e. how far the folio rises from its base. If the stain appears on the recto of a folio, the z axis is measured from the back cover (for those manuscripts written in European languages) up to the folio with the stain. If the stain appears on the verso of a folio, the z axis is measured from the front cover up to the folio. Changing the focus on the camera each time a new stain is placed underneath it takes time, so knowing how far each stain sits above the table helps us sequence the images and keep the workflow as efficient as possible.
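One simple way to use the z measurements is to image the stains in order of height, so that each change of focus is as small as possible. The sketch below illustrates that idea in Python. It is not the project's actual spreadsheet or scheduling method: the record fields, the millimetre units, the call numbers and all the measurements are placeholders invented for the example.

```python
from dataclasses import dataclass

@dataclass
class StainRecord:
    call_number: str
    folio: str
    x_mm: float  # width of the folio (units assumed)
    y_mm: float  # height of the folio (units assumed)
    z_mm: float  # how far the stained folio sits above the table

def imaging_order(records):
    """Sort by z so that consecutive shots need the smallest focus changes."""
    return sorted(records, key=lambda r: r.z_mm)

def total_refocus(records):
    """Sum of the focus changes between consecutive records, in mm."""
    return sum(abs(b.z_mm - a.z_mm) for a, b in zip(records, records[1:]))

# Placeholder catalogue entries, not real measurements.
catalogue = [
    StainRecord("MS A", "12r", 180, 250, 31.0),
    StainRecord("MS B", "3v", 150, 210, 6.0),
    StainRecord("MS C", "40r", 200, 280, 18.5),
]

ordered = imaging_order(catalogue)
print([r.call_number for r in ordered])
print(total_refocus(catalogue), "mm of refocusing in catalogue order")
print(total_refocus(ordered), "mm of refocusing in z order")
```

In practice other constraints, such as which manuscripts are available on which day, would also shape the schedule. The sketch only shows why having the z measurement in the spreadsheet is useful.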


University of Wisconsin, Special Collections, MS 170a, Box 1, no. 8

In between measuring and cataloguing these manuscripts, it’s always nice to step back from entering data and momentarily travel back through time to the room that held the manuscript and the person who accidentally, or perhaps intentionally, left a stain: a visible physical trace of human interaction that has endured through the centuries and can be studied today with technology like multispectral imaging.

Dirty Old Books

Free Library Lewis 003, f. 18v.

At some point in your career as a reader of books, you may have accidentally spilled coffee or left a stain on a book you were reading, just like someone did with the book in the image at left. Stains in and on books are usually seen as inconveniences at best and tragedies at worst. The Library of Stains project proposes to focus on these oft-disparaged “dirty” old books and the stains found in them, using them as a tool for gathering scientific data that will provide clues to how previous generations used and stored their reading material. This project examines a variety of stains found on parchment, paper, and bindings from medieval manuscripts. The data will provide a new approach for learning about the history of the book, book conservation, and the materiality of books, and will offer both scholars and the public an opportunity to engage in the intimate connection between readers and what they read.

The Library of Stains project is conceived broadly as a first foray into providing a fixed dataset for characterized stains that are commonly found on manuscripts, a sound methodology for replicating the gathering and analysis of the data, and a clear explanation of how to implement and use the database as a means to further the study of medieval manuscripts and their conservation. In so doing, the Library of Stains hopes to equip scholars with additional tools for analyzing their manuscripts vis-à-vis provenance, use, transmission, preservation and materiality. The project also aims to engage both scholarly and public audiences with the intrigue of studying manuscripts traditionally pushed aside and dismissed due to their “dirty” or “stained” appearances. Contextual information will be provided concerning each manuscript studied in order to elicit public participation in the making and identifying of stains. If not coffee stains, as humans we are probably all guilty of leaving some sort of stain, perhaps a tear on an old letter, or blood from an accidental cut on a recipe book. This project will bring together a human audience in order to explore and study the human experience, be that a medieval person’s relationship to a manuscript or how that information relates to our interactions with books today.