Mapping Supply Chains for 19th Century Leather

Impression of a Buenos Aires slaughterhouse by Charles Pellegrini, 1829.

[First Published on the NiCHE Website]

By Andrew Watson with Jim Clifford

For the past two weeks I’ve been in Saskatoon, working with Jim Clifford in the University of Saskatchewan’s Historical Geographic Information Systems (HGIS) Lab. Since January 2014 I’ve been working with Jim and Colin Coates on the Trading Consequences research project, thinking about how historians can use these valuable new text mining, database and visualization tools to understand the economic and environmental histories of global commodity flows during the nineteenth century. This trip to Saskatchewan has allowed Jim and me to focus our energies on using Trading Consequences for historical research. We used text-mined spatial data in conjunction with trade statistics and textual sources as a means of testing the search results and functionality of Trading Consequences. To do this, we chose a case study: the history of commodities related to leather tanning during the nineteenth century.

We chose leather tanning for our case study because this topic intersects with both our research interests. Jim is interested in how industrial development across London, including the leather district of Bermondsey, contributed to broader environmental transformations through the development of global commodity flows. Part of my recently completed doctoral research examined the economic and environmental dimensions of hemlock bark harvesting for leather tanneries in Muskoka, Ontario during the same time period. Trading Consequences provides the opportunity to learn more about the ways tanneries in Muskoka and London functioned as part of transnational networks in hides, tannins and leather. Apart from some primary and secondary source background reading, our work over these initial two weeks of research on this project focused almost exclusively on exploring nineteenth century trade statistics for Britain and, to a lesser extent, the United States. These statistics came mainly from the Annual Statement of the Trade of the United Kingdom with Foreign Countries and British Possessions, which the HGIS Lab’s research assistant, Stephen Langlois, entered into a Commodity Flows database. With the help of Jon Bath, Director of the Digital Research Centre at U Sask, Jim and I exported the statistics from the Commodity Flows database to create spreadsheets, graphs and maps, which we used to help us understand broad patterns and trends in the global trade of leather tanning commodities during the nineteenth century. One of the tools we used to start getting a sense of the transnational connections of these commodities is a web-based supply chain mapping service that allows users to generate maps populated with directional flow information. Using the information from the Commodity Flows database on where commodities originated and where they were destined, Jim created four maps representing the flow of leather tanning related commodities at different points in the nineteenth century.

Official Launch of Trading Consequences!

Today we are delighted to officially announce the launch of Trading Consequences!

Over the course of the last two years the project team have been hard at work using text mining, traditional and innovative historical research methods, and visualization techniques to turn digitized nineteenth century papers and trading records (and their OCR’d text) into a unique database of commodities, together with engaging visualization and search interfaces for exploring that data.

Today we launch the database, searches and visualization tools alongside the Trading Consequences White Paper, which charts our work on the project, including our technical approaches, some of the challenges we faced, and what we have achieved during the project and how. The White Paper also discusses, in detail, how we built the tools we are launching today. It is therefore an essential point of reference for those wanting to better understand how data is presented in our interfaces, how these interfaces came to be, and how you might best use and interpret the data shared in these resources in your own historical research.

There are four ways to explore the Trading Consequences database:

  1. Commodity Search. This performs a search of the database table of unique commodities, for commodities beginning with the search term entered. The returned list of commodities is sorted by two criteria: (1) whether the commodity is a “commodity concept” (where any one of several unique names known to be used for the same commodity returns aggregated data for that commodity); or (2) alphabetically. Read more here.
  2. Location Search. This performs a search of the database table of unique locations, for locations beginning with the search term entered. The returned list of locations is sorted by the frequency with which the search term is mentioned within the historical documents. Selecting a location displays information about the location, such as which country it is within and its population; a map highlighting the location with a map marker; and a list of historical documents with an indication of how many times the selected location is mentioned within each document. Read more here.
  3. Location Cloud Visualization. This shows the relation between a selected commodity and its related locations. The visualization is based on over 170000 documents from digital historical archives (see list of archives below). The purpose of the visualization is to provide a general overview of how the importance of location mentions in relation to a particular commodity changed between 1800 and 1920. Read more here.
  4. Interlinked Visualization. This provides a general overview of how commodities were discussed between 1750 and 1950 along geographic and temporal dimensions. It draws on commodity and location mentions extracted from 179000 historic documents (from the digital archives listed below). Read more here.
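
To make the Commodity Search ordering described above more concrete, here is a minimal sketch of a prefix search. It assumes that “commodity concepts” are listed ahead of plain names and that each group is then sorted alphabetically; that reading, the `Commodity` class, the `search_commodities` function and the sample entries are all hypothetical illustrations, not the project’s actual code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commodity:
    name: str
    is_concept: bool  # True when the entry aggregates several known names


def search_commodities(commodities, term):
    """Return commodities whose name begins with `term`.

    Assumed ordering: commodity concepts first, then alphabetical by name.
    """
    prefix = term.strip().lower()
    matches = [c for c in commodities if c.name.lower().startswith(prefix)]
    # `not c.is_concept` is False (sorts first) for concepts.
    return sorted(matches, key=lambda c: (not c.is_concept, c.name.lower()))


# Hypothetical sample entries, for illustration only.
SAMPLE = [
    Commodity("cinnamon", False),
    Commodity("cinchona bark", False),
    Commodity("cinchona", True),
    Commodity("coal", False),
]

if __name__ == "__main__":
    for commodity in search_commodities(SAMPLE, "cin"):
        print(commodity.name, "(concept)" if commodity.is_concept else "")
```

Searching this sample for “cin” lists the concept entry before the two plain names, and leaves out “coal” because it does not begin with the search term.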

Please do try out these tools (please note that the two visualizations will only work with newer versions of the Chrome browser) and let us know what you think – we would love to know what other information or support might be useful, what feedback you have for the project team, and how you think you might be able to use these tools in your own research.

We are also very pleased to announce that we are sharing some of the code and resources behind Trading Consequences via GitHub. This includes a range of Lexical Resources that we think historians and those undertaking historical text mining in related areas may find particularly useful: the base lexicon of commodities created by hand for this project; the Trading Consequences SKOS ontology; and an aggregated gazetteer of ports and cities with ports.

The Trading Consequences team would like to acknowledge and thank the project partners, funders and data providers that have made this work possible. We would particularly like to thank the Digging Into Data Challenge, and the international partners and funders of DiD, for making this fun, challenging and highly collaborative transatlantic project possible. We have hugely enjoyed working together and we have learned a great deal from the interdisciplinary and international exchanges that have been so central to this project.

We would also like to extend our thanks to all of those who have supported the project over the last few years with help, advice, opportunities to present and share our work, publicity for events and blog posts. Most of all we would like to thank all of those members of the historical research community who generously gave their time and perspectives to our historians, to our text mining experts, and particularly to our visualization experts to help us ensure that what we have created in this project meets genuine research needs and may have application in a range of historical research contexts.

What next?
Trading Consequences does not come to an end with this launch. Now that the search and visualization tools are live – and open for anyone to use freely on the web – our historians Professor Colin Coates (York University, Canada) and Dr Jim Clifford (University of Saskatchewan) will be continuing their research. We will continue to share their findings on historical trading patterns, and environmental history, via the Trading Consequences blog.

Over the coming months we will be continuing to update our publications page with the latest research and dissemination associated with the project, and we will also be sharing additional resources associated with the project via GitHub, so please do continue to keep an eye on this website for key updates and links to resources.

We value and welcome your feedback on the visualizations, search interfaces, the database, or any other aspect of the project, website or White Paper at any point. Indeed, if you do find Trading Consequences useful in your own research we would particularly encourage you to get in touch with us (via the comments here, or via Twitter) and consider writing a guest post for the blog. We also welcome mentions of the project or website in your own publications and we are happy to help you to publicize these.

Explore Trading Consequences

Comparing Apples with Oranges

This Friday (21st March) we will officially launch Trading Consequences, with publication of our White Paper and the launch of our visualization and search tools. Ahead of the launch we wanted to give you some idea of what you will be able to access, what you might want to view and what you might want to compare with these new historical research tools. Professor Colin Coates has been exploring the possibilities…

The “Trading Consequences” website literally allows us to compare apples and oranges.  Both fruits became the objects of substantial international trade in the nineteenth century, as in the right conditions they can remain edible despite being shipped great distances.

They are complementary fruits in many ways, as apples are grown in temperate climates whilst oranges prefer warmer conditions.  They may overlap geographically, but typically we associate different parts of the world with each fruit.  In the context of the British world, apples grew in the United Kingdom, of course, but they also came from Canada, New Zealand and the United States, among other locations.  Oranges from places like Spain, Florida or Latin America entered the United Kingdom in the nineteenth century.  The two maps which result from entering “apple” and “orange” into the database show, at a glance, how oranges appeared more often in reference to warmer zones than apples.

The chronological distribution of commodity mentions was roughly similar in both cases.  Increased attention from 1880 to 1900 reflects in part the expansion of the documentation in that period, but it likely also reflected growth in trade and consumption.  Historian James Murton has pointed out that regular trade in apples developed from Canada to Great Britain in the 1880s, focused primarily in Nova Scotia.  On average, one million bushels of apples reached British markets (Murton, 2012).

In contrast, both apples and oranges show sudden spikes in the 1830s, for entirely different reasons.  The spike for apples points the researcher to a useful “Report from the Selection Committee on the Fresh Fruit Trade” in 1839.  But the mid-1830s spike in oranges points instead to the activities of Orange Lodges in Ireland.  The other visualisation shows this anomaly even more clearly, as IRELAND takes on a prominence in related geographical terms in the 1830s that it did not occupy afterwards.

This project entailed teaching computers to read as an historian might, and there are distinct advantages to being able to deal with such a wide range of documentation.  However, all historians must be critical of the sources we use. The visualisations in “Trading Consequences” point towards useful sources for further study, and suggest that historians may wish to consider some regions in their analysis.  The importance of the United States in the discussions about apples is noteworthy, for instance.  Australia has a large number of mentions of oranges, though it is important to note that a small city boasts the same name and could account for part of the number.  (Interestingly enough, Orange, New South Wales, did not grow many oranges according to the Australian Atlas 2006! But it does have apples.)

The increase in mentions of both apples and oranges from the 1880s on may reflect improving living standards in Britain in that period.  Britain’s decision to adopt free trade had led to an increase in a wide variety of imported foodstuffs (Darwin, 2009).  As the heightened attention to both apples and oranges probably shows, these fruits were part of that movement.

The “Trading Consequences” visualisations show some instructive comparisons, some that may point to different ways to conceive of trade in these resources, and others which illustrate the care with which researchers should approach results.


Text Mining 19th Century Place Names

By Jim Clifford

Nineteenth century place names are a major challenge for the Trading Consequences project. The Edinburgh Geoparser uses the Geonames Gazetteer to supply crucial geographic information, including the place names themselves, their longitudes and latitudes, and population data that helps the algorithms determine which “Toronto” is most likely mentioned in the text (there are a lot of Torontos). Based on the first results from our tests, the Geoparser using Geonames works remarkably well. However, it often fails for historic place names that are not in the Geonames Gazetteer. Where is “Lower Canada” or the “Republic of New Granada”? What about all of the colonies created during the Scramble for Africa, but renamed after decolonization? Some of these terms are in Geonames, while others are not: Ceylon and Oil Rivers Protectorate. Geonames also lacks many of the regional terms often used in historical documents, such as “West Africa” or “Western Canada”.
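
As a rough illustration of the lookup problem described above, the sketch below prefers the most populous candidate when a name has several matches, and falls back to a supplementary list of historical names when the main gazetteer has no entry. It is a deliberate simplification rather than the Edinburgh Geoparser’s actual algorithm: the function `resolve_place`, both dictionaries and the rounded population figures are hypothetical placeholders, not data taken from Geonames.

```python
# Hypothetical stand-in for a modern gazetteer: several places share a name.
# Population figures are rounded placeholders, for illustration only.
GAZETTEER_SAMPLE = {
    "toronto": [
        {"name": "Toronto", "region": "Ontario, Canada", "population": 2_000_000},
        {"name": "Toronto", "region": "New South Wales, Australia", "population": 5_000},
        {"name": "Toronto", "region": "Ohio, United States", "population": 5_000},
    ],
}

# Hypothetical supplement for historical names missing from the main gazetteer.
HISTORICAL_SUPPLEMENT = {
    "lower canada": {
        "name": "Lower Canada",
        "region": "historical region, present-day Quebec",
        "population": None,
    },
}


def resolve_place(name, primary=GAZETTEER_SAMPLE, supplement=HISTORICAL_SUPPLEMENT):
    """Pick one gazetteer entry for `name`, or None if nothing matches."""
    key = name.strip().lower()
    candidates = primary.get(key, [])
    if candidates:
        # Simplifying assumption: the most populous candidate is the likeliest.
        best = max(candidates, key=lambda place: place["population"])
        return {**best, "source": "primary"}
    if key in supplement:
        return {**supplement[key], "source": "supplement"}
    return None


if __name__ == "__main__":
    print(resolve_place("Toronto"))
    print(resolve_place("Lower Canada"))
    print(resolve_place("Oil Rivers Protectorate"))  # absent from both samples
```

A name found in neither list returns `None`, which is the kind of miss a supplementary gazetteer of historical place names is meant to reduce.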

To help reduce the number of missed place names or errors in our text mined results, we asked David Zylberberg, who did great work annotating our test samples, to help us solve many of the problems he identified. A draft of his new Gazetteer of missing 19th century place names is displayed above. Some of these are place names David found in the 150 page test sample that the prototype system missed. These include some common OCR errors and a few longer forms of place names that are found in Geonames, which don’t totally fit within the 19th century place name gazetteer, but will still be helpful for our project. He also expanded beyond the place names he found in the annotation by identifying trends. Because our project focuses on commodities in the 19th century British world, he worked to identify abandoned mining towns in Canada and Australia. He also did a lot of work in identifying key place names in Africa, as he noticed that the system seemed to work in South Asia a lot better than it did in Africa. Finally, he worked on Eastern Europe, where many German place names changed in the aftermath of the Second World War. Unfortunately, some of these locations were already listed as alternate names in Geonames, and we solved this problem by changing the geoparser settings, making David’s work on Eastern Europe and a few other locations redundant. Nonetheless, we now have the beginnings of a database of place names and region names missing from the standard gazetteers. We plan to publish this database in the near future and invite others to use and add to it. This work is at an early stage, so we’d be very interested to hear from others about how they’ve dealt with similar issues related to text-mining historical documents.

Invited talk on Digital History and Big Data

Last week I was invited to give a talk about Trading Consequences at the Digital Scholarship: day of ideas event 2 organised by Dr. Siân Bayne.  If you are interested in my slides, you can look at them here on Slideshare.

Rather than give a summary talk about all the different things going on in the Edinburgh Language Technology Group at the School of Informatics, we decided that it would be more informative to focus on one specific project and provide a bit more detail without getting too technical.  My aim was to raise our profile with attendees from the humanities and social sciences in Edinburgh and further afield who are interested in digital humanities research.  They made up the majority of the audience, so this talk was a great opportunity.

Most of my previous presentations were directed at people in my own field, that is, experts in text mining and information extraction.  This talk would therefore have to be completely different from how I normally present my work, which is to provide detailed information on methods and algorithms, their scientific evaluation etc.  None of the attendees would be interested in such things, but I wanted them to know what sort of things our technology is capable of and at the same time let them understand some of the challenges we face.

I decided to focus the talk on the user-centric approach to our collaboration in Trading Consequences, explaining that our current users and collaborators (Prof. Colin Coates and Dr. Jim Clifford, environmental historians at York University, Toronto) and their research questions are key in all that we design and develop.  Their comments and error analysis feed directly back into the technology allowing us to improve the text mining and visualisation with every iteration.  The other point I wanted to bring across is that transparency in the quality of the text mining is crucial to our users, who want to know to what level they can trust the technology.  Moreover, the output of our text mining tool in its raw XML format is not something that most historians would be able to understand and query easily.  However, when text mining is combined with interesting types of visualisations, the data mined from all the historical document collections becomes alive.

We are currently processing digitised versions of over 10 million scanned document images from 5 different collections, amounting to several hundred gigabytes worth of information.  This is not big data in the computer science sense, where people talk about terabytes or petabytes.  However, it is big data to historians, who in the best case have access to some of these collections online using keyword search but often have to visit libraries and archives and go through them manually.  Even if a collection is available digitally and indexed, it does not mean that all the information relevant to a search term is easily accessible to users.  In a large proportion of our data, the optical character recognised (OCRed) text contains a lot of errors and, unless corrected, those errors then find their way into the index.  This means that a search for a correctly spelled term will not return matches from sources that mention the term but contain one or more OCR errors in it.
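
The effect of OCR errors on keyword search can be shown with a small sketch. It compares an exact token match with an approximate match based on the similarity ratio from Python’s standard `difflib` module. This is only an illustration of the problem, not the correction and normalisation pipeline used in the project; the sample sentence, the function names and the 0.8 cutoff are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Split text into lowercase word tokens (digits count as word characters)."""
    return re.findall(r"\w+", text.lower())


def exact_find(term, text):
    """Return tokens that match the search term exactly."""
    return [token for token in tokenize(text) if token == term.lower()]


def fuzzy_find(term, text, cutoff=0.8):
    """Return tokens whose similarity to the search term is at least `cutoff`."""
    term = term.lower()
    return [
        token
        for token in tokenize(text)
        if SequenceMatcher(None, term, token).ratio() >= cutoff
    ]


# Hypothetical OCR output in which the letter "o" was read as the digit "0".
OCR_TEXT = "Imports of cinch0na bark from Java"

if __name__ == "__main__":
    print(exact_find("cinchona", OCR_TEXT))
    print(fuzzy_find("cinchona", OCR_TEXT))
```

The exact search misses the misrecognised word entirely, while the approximate search recovers it. A lower cutoff would catch more damaged tokens at the cost of more false matches.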

The low text quality in large parts of our text collections is also one of our main challenges when it comes to mining this data.  So, I summarised the types of text correction and normalisation steps we carry out in order to improve the input for our text mining component.  However, there are cases when even we give up, that is, when the text quality is so low that it is impossible even for a human being to read a document.  I showed a real example of one of the documents in the collections, the textual equivalent of an upside-down image which was OCRed the wrong way round.

At the end, I got the sense that my talk was well received.  I got several interesting questions, including one asking whether we see that our users’ research questions are now shaped by the technology when the initial idea was for the technology to be driven by their research.  I also made some connections with people in literature, so there could be some exciting new collaborations on the horizon.  Overall, the workshop was extremely interesting and very well organised and I’m glad that I had the opportunity to present our work.



From Cod to Cinchona: Creating a Bibliographic Database of Sources for the Trading Consequences Project

As part of our work with the Trading Consequences project, Jim Clifford and I have compiled a bibliographic database of secondary sources that focus on the environmental and economic effects of the nineteenth-century global commodity trade. This is no small task, since the historiography is as vast as the imperial networks that this project seeks to explore. In this post, I’ll explain how we went about creating the database.

Earlier this year, Jim created a preliminary database of sources that originated from his own research interests in the environmental history of the British Empire during the nineteenth century. Project members had included many of these sources in the Digging into Data funding application, so it made an obvious starting point for us.

Zotero was an easy choice of software for our database, and it offers a number of advantages. For example, users can create folders within the larger database so that entries can be categorized by descriptors such as geographic area and type of commodity analyzed within the text. The software also enables users to enter source entries by clicking on an icon within the web browser address bar, create notes for such entries, and share their work with others in a group. With the click of a few keys, Zotero easily converts these entries into a conventional bibliography, as we’ve done at the end of this post.

During the summer, I joined the Trading Consequences project as a researcher. One of my tasks was to add sources to the existing database. My first strategy led me to survey existing bibliographies related to environmental history. For example, I used the Network in Canadian History and Environment’s (NiCHE) New Scholars Wiki that its members had created in 2008 in order to assist graduate students who needed to compile secondary sources for comprehensive exams in environmental history. Continue reading

Gin and Tonic: A Short History of a Stiff Drink (from

Jay Young was inspired to write a post on the historical background of a favourite summer drink while working as a Researcher with Trading Consequences:

The Gin and Tonic – what better a drink during the dog days of summer? Put some ice in a glass, pour one part gin, add another part tonic water, finish with a slice of lime, and you have a refreshing drink to counter the heat. But it is also steeped in the history of medicine, global commodity frontiers, and the expansion of the British Empire. Continue reading

Digging for Data in Archives

Since our last post the Trading Consequences team have been working with our identified and potential data providers to begin gathering digital data for the project.

As the various data providers were sending us millions of pages of text from digitized historical documents, I flew over to London to spend some time in the archives.

A major component of our Digging Into Data project will involve doing traditional historical research, in archives and using the digitized repositories, to provide a comparison between what the historians are able to find and what the data mining and visualization components discover. So I set about researching a few of the more interesting commodities flowing into London industry during the nineteenth century. This included archival records related to the palm oil trade in west Africa and records at Kew Gardens’ archives related to John Eliot Howard’s scientific investigations into cinchona and quinine. John Eliot was one of the “Sons” in Howard & Sons, who manufactured chemicals and drugs in Stratford (near the site of the 2012 Olympics) throughout the nineteenth century. After photographing most of his papers at Kew, I also spent time at the London Metropolitan Archive, looking through the company records. It was at the LMA that I was reminded about the disappointments often associated with historical research. It turned out that the single most interesting document listed in the archival holdings, a ledger listing the imports of cinchona bark throughout the middle of the century, had been destroyed at some point, and that a second document on their trade with plantations in Java is missing.

After collecting enough material to begin my study of the relationships between factories in the Thames Estuary and commodity frontiers in South America, Africa and India, I focused my final day in the archive on a set of sources that will directly assist with the data mining aspects of the project. I recorded four years of customs ledgers, which record the quantity, declared value and country of origin of the hundreds of different commodity categories imported into Britain (everything from live animals to works of art). This source will provide the foundation of the taxonomy of commodities that we will create over the next few months, which will then be used to mine the data. Moreover, these ledgers provide a good starting point for our research into Canada’s trade with Britain and we are recording the quantity and value of all the goods shipped across the Atlantic. Just through the monotonous process of photographing a few thousand pages, the major changes between the early and late nineteenth century began to stand out. Not only were there a lot more commodities by the century’s end, but Britain was relying on far more countries to supply it with raw materials.