Official Launch of Trading Consequences!

Today we are delighted to officially announce the launch of Trading Consequences!

Over the course of the last two years the project team has been hard at work using text mining, traditional and innovative historical research methods, and visualization techniques to turn digitized nineteenth-century papers and trading records (and their OCR’d text) into a unique database of commodities, together with engaging visualization and search interfaces for exploring that data.

Today we launch the database, searches and visualization tools alongside the Trading Consequences White Paper, which charts our work on the project, including our technical approaches, some of the challenges we faced, and what we have achieved and how. The White Paper also discusses, in detail, how we built the tools we are launching today. It is therefore an essential point of reference for those wanting to better understand how data is presented in our interfaces, how these interfaces came to be, and how you might best use and interpret the data shared in these resources in your own historical research.

Find the Trading Consequences searches, visualizations and code via the panel on the top right hand side of the project website (outlined in orange).

There are four ways to explore the Trading Consequences database:

  1. Commodity Search. This searches the database table of unique commodities for commodities beginning with the search term entered. The returned list of commodities is sorted by two criteria: (1) whether the commodity is a “commodity concept” (where any one of several unique names known to be used for the same commodity returns aggregated data for that commodity); then (2) alphabetically. Read more here.
  2. Location Search. This searches the database table of unique locations for locations beginning with the search term entered. The returned list of locations is sorted by the frequency with which the search term is mentioned within the historical documents. Selecting a location displays: information about the location, such as which country it is in, its population, etc.; a map highlighting the location with a map marker; and a list of historical documents with an indication of how many times the selected location is mentioned within each document. Read more here.
  3. Location Cloud Visualization. This shows the relation between a selected commodity and its related locations. The visualization is based on over 170,000 documents from digital historical archives (see the list of archives below). Its purpose is to provide a general overview of how the importance of location mentions in relation to a particular commodity changed between 1800 and 1920. Read more here.
  4. Interlinked Visualization. This provides a general overview of how commodities were discussed between 1750 and 1950 along geographic and temporal dimensions. It presents commodity and location mentions extracted from 179,000 historical documents (drawn from the digital archives listed below). Read more here.
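To illustrate the sorting behaviour of the Commodity Search described above, here is a minimal sketch using an in-memory SQLite table. The table name, columns and sample commodities are all hypothetical; the real Trading Consequences schema may differ.

```python
import sqlite3

# Hypothetical table: each commodity has a name and a flag marking
# whether it is a "commodity concept" (aggregating several unique names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commodities (name TEXT, is_concept INTEGER)")
conn.executemany(
    "INSERT INTO commodities VALUES (?, ?)",
    [("guano", 1), ("gum arabic", 0), ("gutta-percha", 1)],
)

def commodity_search(term):
    # Prefix match on the search term; concepts sort first,
    # then the remainder alphabetically.
    rows = conn.execute(
        "SELECT name FROM commodities WHERE name LIKE ? "
        "ORDER BY is_concept DESC, name ASC",
        (term + "%",),
    )
    return [r[0] for r in rows]

print(commodity_search("gu"))
# -> ['guano', 'gutta-percha', 'gum arabic']
```

Note how "gum arabic" would come first alphabetically but is demoted below the two commodity concepts.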

Please do try out these tools (please note that the two visualizations will only work with newer versions of the Chrome browser) and let us know what you think – we would love to know what other information or support might be useful, what feedback you have for the project team, and how you think you might be able to use these tools in your own research.

Start page of the Interlinked Visualization.

We are also very pleased to announce that we are sharing some of the code and resources behind Trading Consequences via GitHub. This includes a range of Lexical Resources that we think historians, and those undertaking historical text mining in related areas, may find particularly useful: the base lexicon of commodities created by hand for this project; the Trading Consequences SKOS ontology; and an aggregated gazetteer of ports and cities with ports.

Bea Alex shares text mining progress with the team at an early Trading Consequences meeting.

Acknowledgements

The Trading Consequences team would like to acknowledge and thank the project partners, funders and data providers that have made this work possible. We would particularly like to thank the Digging Into Data Challenge, and the international partners and funders of DiD, for making this fun, challenging and highly collaborative transatlantic project possible. We have hugely enjoyed working together and we have learned a great deal from the interdisciplinary and international exchanges that have been so central to this project.

We would also like to extend our thanks to all of those who have supported the project over the last few years with help, advice, opportunities to present and share our work, publicity for events and blog posts. Most of all we would like to thank all of those members of the historical research community who generously gave their time and perspectives to our historians, to our text mining experts, and particularly to our visualization experts to help us ensure that what we have created in this project meets genuine research needs and may have application in a range of historical research contexts.

Image of the Trading Consequences Project Team at our original kick off meeting.

What next?
Trading Consequences does not come to an end with this launch. Now that the search and visualization tools are live – and open for anyone to use freely on the web – our historians Professor Colin Coates (York University, Canada) and Dr Jim Clifford (University of Saskatchewan) will be continuing their research. We will continue to share their findings on historical trading patterns, and environmental history, via the Trading Consequences blog.

Over the coming months we will be continuing to update our publications page with the latest research and dissemination associated with the project, and we will also be sharing additional resources associated with the project via GitHub, so please do continue to keep an eye on this website for key updates and links to resources.

We value and welcome your feedback on the visualizations, search interfaces, the database, or any other aspect of the project, website or White Paper at any point. Indeed, if you do find Trading Consequences useful in your own research we would particularly encourage you to get in touch with us (via the comments here, or via Twitter) and consider writing a guest post for the blog. We also welcome mentions of the project or website in your own publications and we are happy to help you to publicize these.

Testing and feedback at CHESS’13.

Explore Trading Consequences

Invited talk on Digital History and Big Data

Last week I was invited to give a talk about Trading Consequences at the Digital Scholarship: Day of Ideas 2 event organised by Dr. Siân Bayne.  If you are interested in my slides, you can look at them here on Slideshare.

Rather than give a summary talk about all the different things going on in the Edinburgh Language Technology Group at the School of Informatics, we decided that it would be more informative to focus on one specific project and provide a bit more detail without getting too technical.  My aim was to raise our profile with attendees from the humanities and social sciences in Edinburgh and further afield who are interested in digital humanities research.  They made up the majority of the audience, so this talk was a great opportunity.

My presentation on Trading Consequences at the Digital Scholarship workshop (photo taken by Ewan Klein).

Most of my previous presentations have been directed at people in my field, that is, experts in text mining and information extraction.  So this talk had to be completely different from how I normally present my work, which is to provide detailed information on methods and algorithms, their scientific evaluation, etc.  None of the attendees would be interested in such things, but I wanted them to know what sort of things our technology is capable of and, at the same time, to understand some of the challenges we face.

I decided to focus the talk on the user-centric approach to our collaboration in Trading Consequences, explaining that our current users and collaborators (Prof. Colin Coates and Dr. Jim Clifford, environmental historians at York University, Toronto) and their research questions are key in all that we design and develop.  Their comments and error analysis feed directly back into the technology, allowing us to improve the text mining and visualisation with every iteration.  The other point I wanted to bring across is that transparency about the quality of the text mining is crucial to our users, who want to know to what level they can trust the technology.  Moreover, the output of our text mining tool in its raw XML format is not something that most historians would be able to understand and query easily.  However, when text mining is combined with interesting types of visualisations, the data mined from all the historical document collections comes alive.

We are currently processing digitised versions of over 10 million scanned document images from 5 different collections, amounting to several hundred gigabytes worth of information.  This is not big data in the computer science sense, where people talk about terabytes or petabytes.  However, it is big data to historians, who in the best case have access to some of these collections online using keyword search but often have to visit libraries and archives and go through them manually.  Even if a collection is available digitally and indexed, it does not mean that all the information relevant to a search term is easily accessible to users.  In a large proportion of our data, the optical character recognised (OCRed) text contains a lot of errors and, unless corrected, those errors then find their way into the index.  This means that searches for correctly spelled terms will not return any matches in sources which mention them but with one or more errors contained in them.

The low text quality in large parts of our text collections is also one of our main challenges when it comes to mining this data.  So, I summarised the types of text correction and normalisation steps we carry out in order to improve the input for our text mining component.  However, there are cases where even we give up: when the text quality is just so low that it is impossible even for a human being to read a document.  I showed a real example of one of the documents in the collections, the textual equivalent of an upside-down image, which had been OCRed the wrong way round.
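As a flavour of what such correction and normalisation steps can look like, here is a minimal sketch. The substitution patterns are invented examples of common OCR confusions in nineteenth-century print (the long s misread as "f", and end-of-line hyphenation); the project's actual pipeline is more sophisticated than this.

```python
import re

# Illustrative OCR fixes only, not the project's real correction rules.
OCR_FIXES = [
    (re.compile(r"\bfhip\b"), "ship"),        # long s misread as 'f'
    (re.compile(r"\bfugar\b"), "sugar"),
    (re.compile(r"(\w+)-\n(\w+)"), r"\1\2"),  # rejoin hyphenated line breaks
]

def normalise(text):
    # Apply each correction pattern in turn.
    for pattern, repl in OCR_FIXES:
        text = pattern.sub(repl, text)
    return text

print(normalise("a fhip carrying fugar and tim-\nber"))
# -> "a ship carrying sugar and timber"
```

Corrections like these matter for indexing as well as mining: a search for "sugar" will never match "fugar" unless the text is normalised first.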

At the end, I got the sense that my talk was well received.  I got several interesting questions, including one asking whether we see our users’ research questions now being shaped by the technology, when the initial idea was for the technology to be driven by their research.  I also made some connections with people in literature, so there could be some exciting new collaborations on the horizon.  Overall, the workshop was extremely interesting and very well organised, and I’m glad that I had the opportunity to present our work.


How to Build a Macroscope

Our York University team members, led by Timothy Bristow at the Library, have organized a one-day workshop on text mining in the humanities on March 15:

A macroscope is designed to capture the bigger picture, to render visible vastly complex systems. Large-scale text mining offers researchers the promise of such perspective, while posing distinct challenges around data access, licensing, dissemination, and preservation, digital infrastructure, project management, and project costs. Join our panel of researchers, librarians, and technologists as they discuss not only the operational demands of text mining the humanities, but also how Ontario institutions can better support this work. Read More

The Boundaries of Commodities

Together with Jim Clifford and Uta Hinrichs, I was lucky enough to be able to attend the first Networking Workshop for the AHRC Commodity Histories project on 6–7 September. This was organised by Sandip Hazareesingh, Jean Stubbs and Jon Curry-Machado, who are also jointly responsible for the Commodities of Empire project. The main stated goal of the meeting was to design a collaborative research web space for the community of digital historians interested in tracing the origins and growth of the global trade in commodities. This aspect of the meeting was deftly coordinated by Mia Ridge, and also took inspiration from William Turkel‘s analysis of designing and running a web portal for the NiCHE community of environmental historians in Canada.

Complementing the design and planning activity was an engaging programme of short talks, both by participants of Commodities of Empire and by people working on related initiatives. I won’t try to summarise the talks here; there are others who are much better qualified than me to do that. Instead, I want to mention a small idea about commodities that emerged from a discussion during the breaks.

A number of the workshop participants problematized the notion of ‘commodity’, and pointed out that it isn’t always possible or realistic to set sharp boundaries on what counts as a commodity. It’s certainly the case that we have tended to accept a simple reification of commodities within Trading Consequences. Tim Hitchcock argued that commodities are convenient fictions that abstract away from a complex chain of causes and effects. He gave guano as an example of such a commodity: it results from a collection of processes, during which fish are consumed by seabirds, digested and excreted, and the resulting accumulation of excrement is then harvested for subsequent trading. Of course, we can also think about the processes that guano undergoes after being transported, most obviously for use as a crop fertiliser that enters into further relations of production and trade. Here’s a picture that tries to capture this notion of a commodity being a transient spatio-temporal phase in a longer chain of processes, each of which takes place in a specific social/natural/technological environment.
Diagram of commodity as phase in a chain of processes
Although we have little access within the framework of Trading Consequences to these wider aspects of context, one idea that might be worth pursuing would be to annotate the plant-based commodities in our data with information about their preferred growing conditions. For example, it might be useful to know whether a given plant is limited to, say, tropical climate zones, and whether it grows in forested or open environments. Some of this data can probably be recovered from Wikipedia, but it would be nice if we could find a Linked Data set which could be more directly linked to from our current commodity vocabulary. One benefit of recording such information might be an additional sanity check that we have correctly geo-referenced locations that are associated with plants. Another line of investigation would be whether a particular plant is being cultivated on the margins of its environmental tolerance by colonists. Finally, data about climatic zone could play well with map-based visualisations of trading routes.
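One way to picture the sanity check suggested above: if a plant-based commodity is known to grow only in certain climate zones, a geo-referenced mention far outside those zones is suspicious. The sketch below is purely illustrative; the commodity-to-zone mapping is invented, and our vocabulary does not currently carry this data.

```python
# Hypothetical climate-zone annotations for plant-based commodities.
CLIMATE_ZONES = {"cinchona": "tropical", "flax": "temperate"}

def plausible_location(commodity, latitude):
    """Flag geo-references inconsistent with a plant's climate zone."""
    zone = CLIMATE_ZONES.get(commodity)
    if zone == "tropical":
        # The tropics lie roughly between 23.5 degrees N and S.
        return abs(latitude) <= 23.5
    return True  # no data for this commodity: nothing to check

print(plausible_location("cinchona", 6.2))   # near the equator: plausible
print(plausible_location("cinchona", 55.9))  # Scottish latitude: suspicious
```

A failed check would not prove the geo-referencing wrong (the mention might concern trade rather than cultivation), but it would be a useful prompt for manual review.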

Trading Consequences at the Geospatial in the Cultural Heritage Domain Event

Earlier this month Claire Grover, one of the Trading Consequences team based at the University of Edinburgh’s School of Informatics, gave a presentation on the project at the JISC GECO Geospatial in the Cultural Heritage Domain event in London.

The presentation gives a broad overview of the Trading Consequences project and the initial text mining work that is currently taking place. The slides are now up on SlideShare, and the audio recording of Claire’s talk will also be available here shortly.



You can also read a liveblog of all of the talks, including Claire’s, over on the JISC GECO blog.