The Boundaries of Commodities

Together with Jim Clifford and Uta Hinrichs, I was lucky enough to attend the first Networking Workshop for the AHRC Commodity Histories project on 6–7 September. This was organised by Sandip Hazareesingh, Jean Stubbs and Jon Curry-Machado, who are also jointly responsible for the Commodities of Empire project. The main stated goal of the meeting was to design a collaborative research web space for the community of digital historians interested in tracing the origins and growth of the global trade in commodities. This aspect of the meeting was deftly coordinated by Mia Ridge, and also took inspiration from William Turkel’s analysis of designing and running a web portal for the NiCHE community of environmental historians in Canada.

Complementing the design and planning activity was an engaging programme of short talks, both by participants of Commodities of Empire and by people working on related initiatives. I won’t try to summarise the talks here; there are others who are much better qualified than me to do that. Instead, I want to mention a small idea about commodities that emerged from a discussion during the breaks.

A number of the workshop participants problematized the notion of ‘commodity’, and pointed out that it isn’t always possible or realistic to set sharp boundaries on what counts as a commodity. It’s certainly the case that we have tended to accept a simple reification of commodities within Trading Consequences. Tim Hitchcock argued that commodities are convenient fictions that abstract away from a complex chain of causes and effects. He gave guano as an example of such a commodity: it results from a collection of processes, during which fish are consumed by seabirds, digested and excreted, and the resulting accumulation of excrement is then harvested for subsequent trading. Of course, we can also think about the processes that guano undergoes after being transported, most obviously for use as a crop fertiliser that enters into further relations of production and trade. Here’s a picture that tries to capture this notion of a commodity being a transient spatio-temporal phase in a longer chain of processes, each of which takes place in a specific social/natural/technological environment.
Diagram of commodity as phase in a chain of processes
Although we have little access within the framework of Trading Consequences to these wider aspects of context, one idea that might be worth pursuing would be to annotate the plant-based commodities in our data with information about their preferred growing conditions. For example, it might be useful to know whether a given plant is limited to, say, tropical climate zones, and whether it grows in forested or open environments. Some of this data can probably be recovered from Wikipedia, but it would be nice if we could find a Linked Data set that could be linked more directly to our current commodity vocabulary. One benefit of recording such information might be an additional sanity check that we have correctly geo-referenced locations that are associated with plants. Another line of investigation would be whether colonists were cultivating a particular plant on the margins of its environmental tolerance. Finally, data about climatic zone could play well with map-based visualisations of trading routes.
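To make the sanity-check idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `GrowingConditions` record, the `CONDITIONS` table and its placeholder values, and the use of the Tropics (roughly 23.5 degrees of latitude) as a crude stand-in for a proper climatic-zone data set. It is not part of the Trading Consequences system.

```python
from dataclasses import dataclass

# Approximate latitude of the Tropics of Cancer and Capricorn, used here
# as a crude proxy for "tropical climate zone" (an assumption).
TROPICAL_LIMIT_DEG = 23.5


@dataclass(frozen=True)
class GrowingConditions:
    """Hypothetical annotation attached to a plant-based commodity."""
    tropical_only: bool


# Illustrative placeholder annotations, not curated botanical data.
CONDITIONS = {
    "cinchona": GrowingConditions(tropical_only=True),
    "wheat": GrowingConditions(tropical_only=False),
}


def plausible_location(commodity: str, latitude: float) -> bool:
    """Return False when a tropical-only plant is placed outside the tropics."""
    conditions = CONDITIONS.get(commodity)
    if conditions is None:
        return True  # no annotation, so nothing to check
    if conditions.tropical_only:
        return abs(latitude) <= TROPICAL_LIMIT_DEG
    return True


# A plantation in Java passes; a location at Edinburgh's latitude is flagged.
print(plausible_location("cinchona", -7.5))
print(plausible_location("cinchona", 55.95))
```

A flagged pair would not necessarily be a geo-referencing error: the text might mention a place where the plant was traded or processed rather than grown, so a check like this could only ever suggest candidates for a human to review.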

10 things we learned at the Trading Consequences project meeting…

On Thursday 17th and Friday 18th May we held a Trading Consequences project meeting in Edinburgh where the whole team finally got to meet each other after months of virtual meetings. Here are the 10 awesome things we found out…

  1. Visualisation isn’t about pretty pictures; it’s about insight. Take, for example, the London Underground map and a New York Subway map… you will see some seriously different stylings (you can see both in Aaron’s presentation here). The London Underground map is all about key points on the routes; the map isn’t a literal representation of distance but a conceptual take on London’s origins as a network of villages. In New York, where residents are used to walking above ground and are particularly used to the grid system for roads, the map reflects this in order to make it easier to conceptualise the combination of Subway and walking routes. And that’s the key thing… visualisations are about representing different world views, different conceptions of information, specific mental maps of the data. A good visualisation reflects a particular world view rather than trying to loyally mirror reality.
  2. Image of a banana

    Moved banana by Flickr user ungard | dave ungar

    Yes, we have no bananas! Well, actually, we might have some bananas today, but did you know that in London in 1905 you were allowed to steal bananas if they were brown or blackened? There is an oral history description of being allowed to steal these bananas as they couldn’t be sold. So, can we find evidence to back this up? If we are going to, then we need to leave as much information as possible in the ontology we are building, to ensure we can find and access that sort of detail. Of course we know what we want to look for here – banana-bread ready fruit is a bit of a known unknown – but what about the things we don’t know about yet? The unknown unknowns we may want to find in the future? Not being able to find something in the data we have gathered doesn’t necessarily mean it’s not there; it just means we can’t confirm that it’s there.
  3. The 19th Century take on “animal, vegetable, or mineral?” was “from the sea”, “from the farm”, or “from the forest”? This is all about ontologies again… So what is an ontology? Well, it’s a way to understand the world, a conceptual model that allows you to structure, sort, classify, connect and understand each item within its immediate and wider context. In an era of trading raw materials and early manufactured items “from the sea” made sense, “from the farm” added useful context… similarly we might be used to understanding trees by their genus but historically qualities such as whether it can be sawn or hewn were important classifications. We’ve been thinking about this since the meeting and you can read about some of the issues around ontologies on Ewan’s blog.
  4. Image of artificial eyes

    Eyes (NOT FOR SALE) by Flickr User fumikaharukaze | Fumika Harukaze

    The eyes have it… and that can be a real problem as we humans are quite a lot better built for reading visual information than machines. When we are looking at sources for Trading Consequences we are seeing digitised materials that have been scanned then OCRed (put through Optical Character Recognition). Printing presses used to be pretty quirky – the letter “a” might look squiffy in every print, or a mark might appear on every page, ink may have smudged, etc. Scanning and OCR technology might look much more high tech but they too have quirks – digital cameras and scanners get better all the time and OCR engines improve each year… that means materials we are working with that were digitised years back look noticeably different from those that have been recently scanned and OCRed. That can be pretty challenging… and then we get to the many tables of traded goods. The human may see a very attractive pattern of columns and rows but the computer just doesn’t see it that easily, and we have to try to guide it to read the data in so that it makes sense to the machine and to us humans, and so that it reflects what was in the original document.
  5. Image of turkey red cotton

    "Turkey red floral patterns." by the National Museum of Scotland's Feastbowl Blog (click through to read a full post on Turkey Red)

    Wild turkey and rubber demands… Turkey Red is a type of dyed cotton – named after the place not the bird – which was exported in huge amounts, much of it from Aberdeen. But Turkey Red was a complicated and expensive dye to make and the process was incompatible with the new textile printing processes that were emerging. There was a shift from natural dyes to synthetic materials and demand for Turkey Red plummeted. The project team has been in discussion with Edinburgh University’s Stana Nenadic and her Colouring the Nation project, which specifically looks at the history of Turkey Red. However, this is just one great example of changes in society being echoed by the consequences of trade and we hope this project will help us explore more of these. Big changes generally take place at key pivotal dates due to shifts in economic, political and environmental factors, and historians will look for these peaks and sharp changes. Changes such as a huge increase in demand for rubber because of the bicycle craze!
  6. Lost in translation? With academic historians, informatics researchers, visualisation experts, specialists in geospatially enabled databases and a social media specialist gathered together in one small room with a lot of coffee, we knew we’d have to do a lot of talking to explain our very different positions. For a start, our informatics researchers are used to beginning with a hypothesis whilst our historical researchers are much more likely to take a grounded research approach. This is a really different way to plan and conduct work and we need to understand where we’re all coming from. The tools this project creates need to enable historians in their processes and we must be careful to build something that meets specific needs and appropriate expectations. At the same time, as a project team, we also need to be working together to ensure our publication schedules make sense. So we needed to spend some time getting up to speed on which conferences matter in each discipline, where we can work collaboratively on papers and publications, and what types of research outputs are most important for the project partners.
  7. Image of tape storage.

    The History of Tape Storage by Flickr user Pargon

    Storage solutions: a database is not just “a database”. Just like furniture from a certain Swedish home furnishing chain, you need to know the measurements, the aesthetic needs and the future extensibility before you buy. And just like a house, you need the right foundations to build something stable, fit for purpose and ready to use. The questions we will be asking of our data are the essential starting point here (see also Aaron’s blog, “The question is key in Trading Consequences”) – knowing these and some sort of suitable ontology early on helps us ensure we can design the right structure for our database.
  8. History in a changeable climate – part of the Trading Consequences project is to consider the impact, the consequences, of historical trades. That means looking at different resources and seeing what the most likely environmental impacts of timber trade, cattle trade and so on might be. That means users may want to query our data based on those impacts – looking up the kind of trades that might contribute to flooding, that may be reflected in famine, that might be affected by drought, etc. That requires a whole separate ontology for environmental impact that can somehow account for these very interconnected factors – and that is a lot harder than it looks!
  9. Image of a lab

    Harvey W. Wiley conducting experiments in his laboratory by DC Public Library Commons | DCPL Commons on Flickr Commons (click for more information)

    Shipping drugs – no, not a sinister diversification for the project but a reflection of the complexity of trading data. We can look for records of trading particular types of medicines and drugs but sometimes that’s not the right data to look at. Botanical trades also reflect the trading of drugs, as some plant material was shipped for later use or processing into pharmaceuticals (for an idea of the type of plants involved take a look at the Alnwick Poison Garden). The same issue applies to leather goods, for instance – you might trade the hides, specific goods like leather gloves, perhaps even the whole cow. All of those trades may reflect the leather trade, but understanding, combining and querying that data poses some challenges.
  10. Pithy headings! They matter! Part of our project meeting was considering how we communicate the project. As well as learning to use pithy headings, images, bullet points and other web-friendly formatting, we also found out that blog posts should usually be no more than 200-300 words. We also discussed how people access this site on other devices, particularly mobiles. Although we are working on historical data a lot of us are using smart phones and they have smaller screens and differing requirements. We agreed to apply a new mobile theme – so do try reading this blog on your phone and let us know if you like it!

We hope that gave you a flavour of our kick-off meeting. It took place over two days so we’ve obviously trimmed it down a lot, but if you have any questions, comments or suggestions do add them here and we’ll get back to you.

Gin and Tonic: A Short History of a Stiff Drink (from ActiveHistory.ca)

Gin and Tonic, from Wikipedia

Jay Young was inspired to write an ActiveHistory.ca post on the historical background of a favourite summer drink while working as a Researcher with Trading Consequences:

The Gin and Tonic – what better a drink during the dog days of summer? Put some ice in a glass, pour one part gin, add another part tonic water, finish with a slice of lime, and you have a refreshing drink to counter the heat. But it is also steeped in the history of medicine, global commodity frontiers, and the expansion of the British Empire.

Building vocabulary with SPARQL

Judging from the Oxford Digital Humanities workshop on A Humanities Web of Data, and this related post on SPARQL queries by Jonathan Blaney, there is growing interest in using Semantic Web technologies for the digital humanities. Since the Trading Consequences digital historians are already perched on the edge of this particular bandwagon, I have written up a somewhat more technical post on how we’re using SPARQL and SKOS to develop the commodities vocabulary. You can find it here.

“Weevils”, “Vapours” and “Silver oics”: Finding Commodity Terms

One of the core tasks in Trading Consequences is being able to identify words in digitised texts which refer to commodities (as well as words which refer to places). Here’s a snippet of the kind of text we might be trying to analyse:

How do we know that gutta-percha in this text is a commodity name but, say, electricity is not? The simplest approach, and the one that we are adopting, is to use a big list of terms that we think could be names of commodities, and check against this list when we process our input texts. If we find gutta-percha in both our list of commodity terms and in the document that is being processed, then we add an annotation to the document that labels gutta-percha as a commodity name.
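The list-lookup approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `COMMODITY_TERMS` set, the `annotate_commodities` function and the tuple-based annotation format are all hypothetical, and the real text mining system will differ in its details.

```python
import re

# A tiny illustrative term list; the project's actual list is far larger.
COMMODITY_TERMS = {"gutta-percha", "cod-liver oil", "coal", "tin"}


def annotate_commodities(text, terms=COMMODITY_TERMS):
    """Return sorted (start, end, term) tuples for each listed term found in text."""
    annotations = []
    for term in terms:
        # The lookarounds stop "tin" matching inside "tinned" or "gutta" inside
        # "gutta-percha"; hyphens are treated as part of a word.
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), term))
    return sorted(annotations)


sample = "The cable was insulated with gutta-percha, not tin; electricity did the rest."
for start, end, term in annotate_commodities(sample):
    print(start, end, term)
```

Because electricity is not in the list, it is never annotated, which is exactly the behaviour described above: the quality of the output depends almost entirely on the quality of the term list.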

In our first version of the text mining system, we derived the list of commodity terms from WordNet. WordNet is a big thesaurus or lexical database, and its terms are organised hierarchically. This means that as a first approximation, we can guess that any lexical item in WordNet that is categorised as a subclass of Physical Matter, Plant Life, or Animal might be a commodity term. How well do we do with this? Not surprisingly, when we carried out some initial experiments at the very start of our work on the project, we found that there are some winners and some losers. Here are some of the terms that were plausibly labelled as commodities in a sample corpus of digitised text:
horse, tin, coal, seedlings, grains, crab, merino fleece, fur, cod-liver oil, ice, log, potatoes, liquor, lemons.
And here are some less plausible candidate commodity terms:
weevil, water frontage, vomit, vienna dejeuner, verde-antique, vapours, toucans, steam frigates, smut, simple question, silver oics.
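The “subclass of a broad category” heuristic, and the reason it over-generates, can be illustrated with a toy hierarchy. The sketch below is an assumption-laden stand-in: the `PARENT` links are invented for the example and are not real WordNet data, and `is_candidate` is a hypothetical helper rather than the project’s code.

```python
# Toy child -> parent links standing in for a lexical hierarchy.
# These are invented for illustration and are NOT taken from WordNet.
PARENT = {
    "coal": "physical matter",
    "tin": "physical matter",
    "vapours": "physical matter",
    "potato": "plant life",
    "fish": "animal",
    "cod": "fish",
    "horse": "animal",
    "weevil": "animal",
    "electricity": "phenomenon",
}

# The broad categories whose descendants are treated as candidate commodities.
ROOTS = {"physical matter", "plant life", "animal"}


def is_candidate(term):
    """Walk up the hierarchy; True if any ancestor is one of the broad roots."""
    seen = set()
    node = term
    while node in PARENT and node not in seen:
        seen.add(node)  # guard against accidental cycles in the toy data
        node = PARENT[node]
        if node in ROOTS:
            return True
    return False


for word in ["cod", "horse", "weevil", "vapours", "electricity"]:
    print(word, is_candidate(word))
```

In this toy data, weevil and vapours pass the test for the same reason that horse and coal do: the hierarchy encodes what kind of thing a word denotes, and says nothing about whether anyone ever traded it.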

There are a number of factors that conspire to give the incorrect results. The first is that our list of terms is just too broad, and includes things that could never be commodities. The second is that for now, we are not taking into account the context in which words occur in the text; this is computationally quite expensive, and not an immediate priority. The third is that the input to our text mining tools is not nice clean text such as we would get from ‘born-digital’ newswire. Instead, nineteenth-century books have been scanned and then turned into text by the process of Optical Character Recognition (OCR for short). As we’ll describe in future posts, OCR can sometimes produce bizarrely bad results, and this is probably responsible for our silver oics.

At the moment, we are working on generating a better list of commodity terms (as mentioned in a recent post by Jim Clifford). We’ll report back on progress soon.

Trading Consequences at the Geospatial in the Cultural Heritage Domain Event

Earlier this month Claire Grover, one of the Trading Consequences team based at the University of Edinburgh School of Informatics, gave a presentation on the project at the JISC GECO Geospatial in the Cultural Heritage Domain event in London.

The presentation gives a broad overview of the Trading Consequences project and the initial text mining work that is currently taking place. The slides are now up on SlideShare and the audio recording of Claire’s talk will also be available here shortly:



You can also read a liveblog of all of the talks, including Claire’s, over on the JISC GECO blog.

Digging for Data in Archives

Since our last post the Trading Consequences team have been working with our identified and potential data providers to begin gathering digital data for the project.

As the various data providers were sending us millions of pages of text from digitized historical documents, I flew over to London to spend some time in the archives.

A major component of our Digging Into Data project will involve doing traditional historical research, in archives and using the digitized repositories, to provide a comparison between what the historians are able to find and what the data mining and visualization components discover. So I set about researching a few of the more interesting commodities flowing into London industry during the nineteenth century. This included archival records related to the palm oil trade in west Africa and records at Kew Gardens’ archives related to John Eliot Howard’s scientific investigations into cinchona and quinine. John Eliot was one of the “Sons” in Howard & Sons, who manufactured chemicals and drugs in Stratford (near the site of the 2012 Olympics) throughout the nineteenth century. After photographing most of his papers at Kew, I also spent time at the London Metropolitan Archive, looking through the company records. It was at the LMA that I was reminded about the disappointments often associated with historical research. It turned out that the single most interesting document listed in the archival holdings, a ledger listing the imports of cinchona bark throughout the middle of the century, had been destroyed at some point, and a second document on their trade with plantations in Java is missing.

After collecting enough material to begin my study of the relationships between factories in the Thames Estuary and commodity frontiers in South America, Africa and India, I focused my final day in the archive on a set of sources that will directly assist with the data mining aspects of the project. I recorded four years of customs ledgers, which record the quantity, declared value and country of origin of the hundreds of different commodity categories imported into Britain (everything from live animals to works of art). This source will provide the foundation of the taxonomy of commodities that we will create over the next few months, which will then be used to mine the data. Moreover, these ledgers provide a good starting point for our research into Canada’s trade with Britain and we are recording the quantity and value of all the goods shipped across the Atlantic. Just through the monotonous process of photographing a few thousand pages, the major changes between the early and late nineteenth century began to stand out. Not only were there a lot more commodities by the century’s end, but Britain was relying on far more countries to supply it with raw materials.

The question is key in Trading Consequences

“Dreams are today’s answers to tomorrow’s questions.”
– Edgar Cayce

Looking back to the global trading of commodities during the 19th century, we see increasing access to digitised historical records, in a myriad of forms. Today, the rate at which we can collect and store data about trading is ever expanding, from high level statistics to low level sensor data on containers in transit. In each case, the scale of the data is rapidly outstripping the provision of tools for the effective analysis and exploration of such data. The volume of data results in historians focussing on popular commodities or analysts asking for coarse-grained, aggregate measures.

Image of Savannah ©iStockPhoto 2012

Instead, to understand the consequences of our trading history, historians need to ask difficult, subtle, multifaceted and challenging questions. Questions which aren’t polluted by knowledge of the limitations of the methods and technologies we have today. These insightful questions won’t come from a focus on what the tools of today can support, what the analysis or visualisation methods can do or what data is available. Simply put, if you only know about hammers, all your problems will start to look like nails. And worse than this, everyone will start to think like the carpenter, reducing the power that the breadth of inter-disciplinary expertise gives you.

Figure 1: Overview of Information Visualisation pipeline

In this project we are bringing together an inter-disciplinary research team of historians, text analysis and information visualisation experts. Instead of starting with the key “historical questions” which historians are seeking answers to, it’s very tempting to focus on one or more of the earlier technology stages as shown in Figure 1. This figure is our adapted view of the “information visualisation pipeline” [1,2]. Data comes in a variety of abstract forms without a clear physical manifestation and needs to be dynamically collected, processed, cleaned and hence mined before interactive display.

However, if we first focus on a technology stage it will impact on the questions the historians might be able to ask or the approaches to be taken. Consider, for example, the rendering step in Figure 1. Modern graphics APIs (e.g. OpenGL), desktop computers and even commodity displays are making 3D software and hardware increasingly easy to access. If this is our starting point, we can quickly see how 3D stereoscopic tools will emerge and will shape what (if any) questions historians might pose with our tools.

Focussing first on the data, mining, software, algorithms, layouts, methods etc. is the wrong approach in a project such as Trading Consequences. Instead, the historical questions are key. Our challenge as a team is to ensure that at the earliest stage we do not pollute the aspirations of historians. We need to encourage the historians to ask interesting questions about the data, without being hampered by expectations of what is feasible given current technology.

Of course, over time as questions emerge, prototypes will be developed and the creation of a shared view across a team is natural. We aim to continually bring in fresh perspectives to ensure that we are answering the questions which need to be asked, rather than the questions which can be asked.

[1] Stuart K. Card, Jock D. Mackinlay, Ben Shneiderman (1999), Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann.
[2] Ben Fry (2007), Visualizing Data: Exploring and Explaining Data with the Processing Environment. O’Reilly Media.

 

Welcome to the Trading Consequences Blog

On this blog we will be sharing news and updates on the Trading Consequences project, a joint project between York University in Canada, the University of Edinburgh, and the University of St Andrews.

The project is looking at the historical documentation around commodity trading in the British Empire with a particular focus on the role of Canadian natural resources in the network of commodity flows. You can find more about the project on the About page.