Impression of a Buenos Aires slaughterhouse by Charles Pellegrini, 1829.
[First Published on the NiCHE Website] By Andrew Watson with Jim Clifford
For the past two weeks I’ve been in Saskatoon, working with Jim Clifford in the University of Saskatchewan’s Historical Geographic Information Systems (HGIS) Lab. Since January 2014 I’ve been working with Jim and Colin Coates on the Trading Consequences research project, thinking about how historians can use these valuable new text mining, database and visualization tools to understand the economic and environmental histories of global commodity flows during the nineteenth century. This trip to Saskatchewan has allowed Jim and me to focus our energies on using Trading Consequences for historical research. We used text-mined spatial data in conjunction with trade statistics and textual sources as a means of testing the search results and functionality of Trading Consequences. To do this, we chose a case study: the history of commodities related to leather tanning during the nineteenth century.
Neckinger Leather Mills. Wellcome Images on Flickr, Creative Commons by-nc-nd 2.0 UK.
We chose leather tanning for our case study because this topic intersects with both of our research interests. Jim is interested in how industrial development across London, including the leather district of Bermondsey, contributed to broader environmental transformations through the development of global commodity flows. Part of my recently completed doctoral research examined the economic and environmental dimensions of hemlock bark harvesting for leather tanneries in Muskoka, Ontario during the same time period. Trading Consequences provides the opportunity to learn more about the ways tanneries in Muskoka and London functioned as part of transnational networks in hides, tannins and leather. Apart from some primary and secondary source background reading, our work over these initial two weeks of research on this project focused almost exclusively on exploring nineteenth-century trade statistics for Britain and, to a lesser extent, the United States. These statistics came mainly from the Annual Statement of the Trade of the United Kingdom with Foreign Countries and British Possessions, which the HGIS Lab’s research assistant, Stephen Langlois, entered into a Commodity Flows database. With the help of Jon Bath, Director of the Digital Research Centre at U Sask, Jim and I exported the statistics from the Commodity Flows database to create spreadsheets, graphs and maps, which we used to help us understand broad patterns and trends in the global trade of leather tanning commodities during the nineteenth century. One of the tools we used to start to get a sense of the transnational connections of these commodities is SourceMap.com, a web-based supply chain mapping service that allows users to generate maps populated with directional flow information.
Using the information from the Commodity Flows database related to where commodities originated as well as their destination, Jim created four maps representing the flow of leather tanning related commodities at different points in the nineteenth century.
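For readers curious about what sits between a database export and a flow map, here is a minimal sketch of the aggregation step. The row layout, the field names and every figure are assumptions made up for illustration; they are not the Commodity Flows schema or real trade statistics, and this is not the SourceMap.com import format.

```python
from collections import defaultdict

# Hypothetical export rows: (year, commodity, origin, destination, quantity).
# All values are placeholders for illustration, not figures from the database.
ROWS = [
    (1850, "hides", "Argentina", "United Kingdom", 100),
    (1850, "hides", "Argentina", "United Kingdom", 50),
    (1850, "bark", "Canada", "United Kingdom", 30),
    (1880, "hides", "India", "United Kingdom", 200),
]


def flows_for_year(rows, year):
    """Sum quantities per (origin, destination) pair for a single year."""
    totals = defaultdict(int)
    for row_year, _commodity, origin, destination, quantity in rows:
        if row_year == year:
            totals[(origin, destination)] += quantity
    return dict(totals)


if __name__ == "__main__":
    # One directional flow per origin/destination pair, ready to draw as a line.
    for (origin, destination), quantity in flows_for_year(ROWS, 1850).items():
        print(f"{origin} -> {destination}: {quantity}")
```

Running the same function for several years gives one set of directional flows per map, which is roughly how a series of maps at different points in the century could be compared.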
Nineteenth-century place names are a major challenge for the Trading Consequences project. The Edinburgh Geoparser uses the Geonames Gazetteer to supply crucial geographic information, including the place names themselves, their longitudes and latitudes, and population data that helps the algorithms determine which “Toronto” is most likely mentioned in the text (there are a lot of Torontos). Based on the first results from our tests, the Geoparser using Geonames works remarkably well. However, it often fails for historic place names that are not in the Geonames Gazetteer. Where is “Lower Canada” or the “Republic of New Granada”? What about all of the colonies created during the Scramble for Africa, but renamed after decolonization? Some of these terms are in Geonames, while others are not (Ceylon and Oil Rivers Protectorate, respectively). Geonames also lacks many of the regional terms often used in historical documents, such as “West Africa” or “Western Canada”.
To help reduce the number of missed place names or errors in our text-mined results, we asked David Zylberberg, who did great work annotating our test samples, to help us solve many of the problems he identified. A draft of his new Gazetteer of missing 19th century place names is displayed above. Some of these are place names David found in the 150-page test sample that the prototype system missed. These include some common OCR errors and a few longer forms of place names that are found in Geonames, which don’t totally fit within the 19th century place name gazetteer, but will still be helpful for our project. He also expanded beyond the place names he found in the annotation by identifying trends. Because our project focuses on commodities in the 19th century British world, he worked to identify abandoned mining towns in Canada and Australia. He also did a lot of work in identifying key place names in Africa, as he noticed that the system seemed to work a lot better in South Asia than it did in Africa. Finally, he worked on Eastern Europe, where many German place names changed in the aftermath of the Second World War. Unfortunately, some of these locations turned out to be listed as alternate names in Geonames, and we solved the problem by changing the geoparser settings, which made David’s work on Eastern Europe and a few other locations redundant. Nonetheless, we now have the beginnings of a database of place names and region names missing from the standard gazetteers. We plan to publish this database in the near future and invite others to use and add to it. This work is at an early stage, so we’d be very interested to hear from others about how they’ve dealt with similar issues related to text-mining historical documents.
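The lookup logic described in the last two paragraphs can be sketched in a few lines. This is a deliberately simplified stand-in, not the Edinburgh Geoparser’s actual algorithm: it assumes that the most populous candidate wins, that alternate names can be switched on or off, and that a supplementary gazetteer is consulted only when the main one has no match. The `Place` record, the function names, and all coordinates and populations are hypothetical illustrative values, not data copied from Geonames or from David’s gazetteer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    name: str
    lat: float
    lon: float
    population: int
    alternate_names: tuple = ()


# Illustrative entries only; numbers are rough placeholders.
MAIN_GAZETTEER = [
    Place("Toronto", 43.70, -79.42, 2_600_000),
    Place("Toronto", 40.46, -80.60, 5_000),
    Place("Sri Lanka", 7.0, 81.0, 21_000_000, ("Ceylon",)),
]

# Hypothetical supplementary gazetteer of historic names missing from the main one.
SUPPLEMENT = [Place("Lower Canada", 47.0, -71.0, 0)]


def candidates(name, gazetteer, use_alternates=True):
    """Return every entry whose name (or, optionally, alternate name) matches."""
    key = name.casefold()
    matches = []
    for place in gazetteer:
        alternates = [a.casefold() for a in place.alternate_names]
        if place.name.casefold() == key or (use_alternates and key in alternates):
            matches.append(place)
    return matches


def resolve(name, use_alternates=True) -> Optional[Place]:
    """Try the main gazetteer first, then the supplement; prefer larger places."""
    for gazetteer in (MAIN_GAZETTEER, SUPPLEMENT):
        found = candidates(name, gazetteer, use_alternates)
        if found:
            return max(found, key=lambda place: place.population)
    return None
```

In this toy version, `resolve("Toronto")` picks the larger of the two Torontos, `resolve("Ceylon")` succeeds only while alternate names are enabled, and `resolve("Lower Canada")` falls through to the supplement. A name found in neither list returns `None`, which is the kind of miss the new gazetteer is meant to reduce.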
A word cloud of diseases found in The Diseases of Tropical Plants by Melville Thurston Cook
During the 19th century British industrialists and botanists searched the world for economically useful plants. They moved seeds and plants between continents and developed networks of trade and plantations to supply British industries and consumers. This global network also spread diseases. Stuart McCook is working on the history of Coffee Rust (Hemileia vastatrix) and there are a few books that examine the diseases that prevented Brazil from developing rubber plantations. Building on this work, we’re using the Trading Consequences text mining pipeline to try to explore the wider trends of plant diseases as they spread through the trade and plantation network.
We need a list of diseases with both the scientific and common names from the time period. The Internet Archive provides a number of textbooks from the end of the 19th and start of the 20th century. They were written by American botanists, but one book in particular attempts a global survey of tropical plant diseases (The Diseases of Tropical Plants). Because these books are organized in an encyclopedic fashion, it is relatively easy to have a student go through and create a list of plant diseases. We’re working on expanding our list from other sources over the next few weeks. Once the list is complete we’ll add it to our pipeline and extract relationships between mentions of these diseases, locations, dates and commodities in our corpus of 19th century documents. This should allow us to track Sooty Mould, Black Rot, Fleshy Fungi, Coffee Leaf Rust and hundreds of other diseases at points in time when they became enough of a problem to appear in our document collection.
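To give a rough idea of what “extracting relationships” could look like, here is a toy sketch that pairs disease names with place names and years when they occur in the same sentence. It is an assumption-laden simplification, not the Trading Consequences pipeline: the real system relies on the geoparser and far richer linguistic processing, whereas this version uses plain substring matching. The term lists, function names and sample sentence are all invented for illustration.

```python
import re

# Short hypothetical lists; the real disease list would run to hundreds of names.
DISEASES = ["Sooty Mould", "Black Rot", "Coffee Leaf Rust"]
PLACES = ["Ceylon", "Java", "Brazil"]  # stand-in for geoparser output

# Four-digit years from 1800 to 1999.
YEAR = re.compile(r"\b(1[89]\d{2})\b")


def find_terms(sentence, terms):
    """Return the terms that occur in the sentence, ignoring case."""
    lowered = sentence.casefold()
    return [term for term in terms if term.casefold() in lowered]


def extract_relations(text):
    """Pair each disease with each place mentioned in the same sentence."""
    relations = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        diseases = find_terms(sentence, DISEASES)
        places = find_terms(sentence, PLACES)
        years = [int(year) for year in YEAR.findall(sentence)]
        for disease in diseases:
            for place in places:
                relations.append((disease, place, years))
    return relations


# Invented sample text, written for this example rather than taken from the corpus.
SAMPLE = "Coffee leaf rust was reported in Ceylon in 1869. The crop in Java was sound."

if __name__ == "__main__":
    for relation in extract_relations(SAMPLE):
        print(relation)
```

Sentence-level co-occurrence is a crude proxy: it would link a disease to a place even in a sentence saying the place was free of it, which is one reason hand-checked samples remain important.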
Our York University team members, led by Timothy Bristow at the Library, have organized a one-day workshop on text mining in the humanities on March 15:
A macroscope is designed to capture the bigger picture, to render visible vastly complex systems. Large-scale text mining offers researchers the promise of such perspective, while posing distinct challenges around data access, licensing, dissemination, and preservation, digital infrastructure, project management, and project costs. Join our panel of researchers, librarians, and technologists as they discuss not only the operational demands of text mining the humanities, but also how Ontario institutions can better support this work.
Earlier this year, Jim Clifford and I were invited to present the Trading Consequences project to a group of scholars, many of them from English Departments in the Toronto region, who are interested in the Victorian period. We contributed to the workshop, “Making Connections in Victorian Research”, held at York University in Toronto, on 19 October.
Our paper was sandwiched between talks about clothing reform in Victorian Britain and the pornographic elements of Bram Stoker’s Dracula. Not surprisingly, we were concerned that our discussion of computer-assisted analysis of trading patterns and associated environmental consequences in the British empire might appear tangential, maybe even irrelevant, to the cultural concerns of this audience of scholars.
However, one of the advantages of historical studies is that issues within the same chronological time frame do have ways of connecting. As Barry Commoner suggested, a key principle of ecology is that “everything is connected to everything else.” The same is true when one approaches matters historically.
We presented the methodology of the Trading Consequences project, discussing the collaboration with computational linguists and computer scientists, and we showed some preliminary visualisations of the research findings. The map we showed illustrated the global geographical locations associated with references to natural resources in Canadian government documents from 1860 to 1900. In presenting these data, we are hoping to understand the mental geography of Canadian decision-makers (politicians, government officials and businesspeople) in this time period. A feature of the exploitation of natural resources is that extraction activities can shift fairly quickly from one part of the globe to another. In other words, a fisher off Nova Scotia may have to keep in mind what fishers in the North Sea are doing. The production of lime for fertiliser in Ontario may be influenced by developments in Florida or Algeria. The map was based on an experiment with visualisation techniques. Much of what it illustrated was fairly commonsense: concentrations of references to the United Kingdom and the United States. France seemed more prominent than we would have expected, as were Nepal and the Philippines, possibly illustrating some problems with the data which we will need to explore. China seemed under-represented. However, to our mind, the emphasis on the Caribbean seemed one angle worth pursuing.
Jay Young was inspired to write an ActiveHistory.ca post on the historical background of a favourite summer drink while working as a Researcher with Trading Consequences:
The Gin and Tonic – what better a drink during the dog days of summer? Put some ice in a glass, pour one part gin, add another part tonic water, finish with a slice of lime, and you have a refreshing drink to counter the heat. But it is also steeped in the history of medicine, global commodity frontiers, and the expansion of the British Empire.
Since our last post the Trading Consequences team have been working with our identified and potential data providers to begin gathering digital data for the project.
As the various data providers were sending us millions of pages of text from digitized historical documents, I flew over to London to spend some time in the archives.
A major component of our Digging Into Data project will involve doing traditional historical research, in archives and using the digitized repositories, to provide a comparison between what the historians are able to find and what the data mining and visualization components discover. So I set about researching a few of the more interesting commodities flowing into London industry during the nineteenth century. This included archival records related to the palm oil trade in West Africa and records at Kew Gardens’ archives related to John Eliot Howard’s scientific investigations into cinchona and quinine. John Eliot was one of the “Sons” in Howard & Sons, who manufactured chemicals and drugs in Stratford (near the site of the 2012 Olympics) throughout the nineteenth century. After photographing most of his papers at Kew, I also spent time at the London Metropolitan Archives, looking through the company records. It was at the LMA that I was reminded about the disappointments often associated with historical research. It turned out that the single most interesting document listed in the archival holdings, a ledger listing the imports of cinchona bark throughout the middle of the century, had been destroyed at some point, and a second document on their trade with plantations in Java is missing.
After collecting enough material to begin my study of the relationships between factories in the Thames Estuary and commodity frontiers in South America, Africa and India, I focused my final day in the archive on a set of sources that will directly assist with the data mining aspects of the project. I photographed four years of customs ledgers, which record the quantity, declared value and country of origin of the hundreds of different commodity categories imported into Britain (everything from live animals to works of art). This source will provide the foundation of the taxonomy of commodities that we will create over the next few months, which will then be used to mine the data. Moreover, these ledgers provide a good starting point for our research into Canada’s trade with Britain, and we are recording the quantity and value of all the goods shipped across the Atlantic. Just through the monotonous process of photographing a few thousand pages, the major changes between the early and late nineteenth century began to stand out. Not only were there a lot more commodities by the century’s end, but Britain was relying on far more countries to supply it with raw materials.
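Once ledger pages are transcribed, the impression that both commodities and supplying countries multiplied over the century can be checked with a simple count. The sketch below assumes a hypothetical transcription format of (year, commodity, country of origin, quantity, declared value); the rows are invented placeholders, not figures from the customs ledgers.

```python
from collections import defaultdict

# Hypothetical transcribed rows; every value is a placeholder for illustration.
LEDGER = [
    (1820, "hides", "Country A", 10, 100),
    (1820, "hides", "Country B", 5, 40),
    (1820, "bark", "Country A", 7, 20),
    (1890, "hides", "Country A", 30, 300),
    (1890, "bark", "Country C", 12, 60),
    (1890, "palm oil", "Country D", 50, 500),
    (1890, "cinchona", "Country E", 8, 90),
]


def summarise(ledger):
    """Count distinct commodities and distinct origin countries for each year."""
    commodities = defaultdict(set)
    countries = defaultdict(set)
    for year, commodity, country, _quantity, _value in ledger:
        commodities[year].add(commodity)
        countries[year].add(country)
    return {
        year: (len(commodities[year]), len(countries[year]))
        for year in sorted(commodities)
    }


if __name__ == "__main__":
    for year, (n_commodities, n_countries) in summarise(LEDGER).items():
        print(year, n_commodities, "commodities from", n_countries, "countries")
```

The distinct commodity names collected this way could also serve as a first rough input to the taxonomy of commodities, before any manual grouping into categories.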