The Boundaries of Commodities

Together with Jim Clifford and Uta Hinrichs, I was lucky enough to be able to attend the first Networking Workshop for the AHRC Commodity Histories project on 6–7 September. This was organised by Sandip Hazareesingh, Jean Stubbs and Jon Curry-Machado, who are also jointly responsible for the Commodities of Empire project. The main stated goal of the meeting was to design a collaborative research web space for the community of digital historians interested in tracing the origins and growth of the global trade in commodities. This aspect of the meeting was deftly coordinated by Mia Ridge, and also took inspiration from William Turkel’s analysis of designing and running a web portal for the NiCHE community of environmental historians in Canada.

Complementing the design and planning activity was an engaging programme of short talks, both by participants of Commodities of Empire and by people working on related initiatives. I won’t try to summarise the talks here; there are others who are much better qualified than me to do that. Instead, I want to mention a small idea about commodities that emerged from a discussion during the breaks.

A number of the workshop participants problematized the notion of ‘commodity’, and pointed out that it isn’t always possible or realistic to set sharp boundaries on what counts as a commodity. It’s certainly the case that we have tended to accept a simple reification of commodities within Trading Consequences. Tim Hitchcock argued that commodities are convenient fictions that abstract away from a complex chain of causes and effects. He gave guano as an example of such a commodity: it results from a collection of processes, during which fish are consumed by seabirds, digested and excreted, and the resulting accumulation of excrement is then harvested for subsequent trading. Of course, we can also think about the processes that guano undergoes after being transported, most obviously for use as a crop fertiliser that enters into further relations of production and trade. Here’s a picture that tries to capture this notion of a commodity being a transient spatio-temporal phase in a longer chain of processes, each of which takes place in a specific social/natural/technological environment.
[Figure: Diagram of commodity as phase in a chain of processes]
Although we have little access within the framework of Trading Consequences to these wider aspects of context, one idea that might be worth pursuing would be to annotate the plant-based commodities in our data with information about their preferred growing conditions. For example, it might be useful to know whether a given plant is limited to, say, tropical climate zones, and whether it grows in forested or open environments. Some of this data can probably be recovered from Wikipedia, but it would be nice if we could find a Linked Data set which could be more directly linked to from our current commodity vocabulary. One benefit of recording such information might be an additional sanity check that we have correctly geo-referenced locations that are associated with plants. Another line of investigation would be whether a particular plant is being cultivated on the margins of its environmental tolerance by colonists. Finally, data about climatic zone could play well with map-based visualisations of trading routes.
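The geo-referencing sanity check suggested above could look something like the following rough sketch. Everything in it is an assumption made for illustration: the `GROWING_ZONE` table, the function names, and above all the crude rule that treats "tropical" as "between the Tropics of Cancer and Capricorn". A real version would draw on a proper climate classification and a Linked Data source rather than a hand-written dictionary.

```python
TROPIC_LATITUDE = 23.44  # approximate latitude of the Tropics of Cancer and Capricorn

# Hypothetical annotations: commodity -> climate zone the plant is limited to.
GROWING_ZONE = {"rubber": "tropical", "cocoa": "tropical"}


def zone_for_latitude(lat):
    """Crude latitude-only zone; real climate zones also depend on altitude etc."""
    return "tropical" if abs(lat) <= TROPIC_LATITUDE else "non-tropical"


def flag_suspect_locations(mentions):
    """Return mentions whose location lies outside the commodity's recorded zone."""
    suspects = []
    for commodity, place, lat in mentions:
        expected = GROWING_ZONE.get(commodity)
        # Commodities without a recorded zone are skipped rather than flagged.
        if expected is not None and zone_for_latitude(lat) != expected:
            suspects.append((commodity, place))
    return suspects


# Illustrative mentions: (commodity, geo-referenced place, latitude in degrees).
mentions = [("rubber", "Singapore", 1.35), ("rubber", "Edinburgh", 55.95)]
print(flag_suspect_locations(mentions))  # [('rubber', 'Edinburgh')]
```

A flagged pair is only a prompt for a closer look. The plant may be mentioned at a place where it is traded rather than grown, so the check cannot decide by itself that the geo-reference is wrong.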

“Weevils”, “Vapours” and “Silver oics”: Finding Commodity Terms

One of the core tasks in Trading Consequences is being able to identify words in digitised texts which refer to commodities (as well as words which refer to places). Here’s a snippet of the kind of text we might be trying to analyse:

How do we know that gutta-percha in this text is a commodity name but, say, electricity is not? The simplest approach, and the one that we are adopting, is to use a big list of terms that we think could be names of commodities, and check against this list when we process our input texts. If we find gutta-percha in both our list of commodity terms and in the document that is being processed, then we add an annotation to the document that labels gutta-percha as a commodity name.
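As a minimal sketch of this list-lookup idea (not the project's actual code), the following matches a small, hypothetical set of terms against a piece of text and records where each one occurs. The `COMMODITY_TERMS` set, the `annotate_commodities` function and the sample sentence are all invented for illustration; the sentence is not the snippet referred to above.

```python
import re

# Hypothetical, tiny stand-in for the project's much larger commodity list.
COMMODITY_TERMS = {"gutta-percha", "coal", "cod-liver oil", "tin"}


def annotate_commodities(text, terms=COMMODITY_TERMS):
    """Return sorted (start, end, matched_text) tuples for listed terms in text."""
    annotations = []
    for term in terms:
        # Require that the term is not part of a longer word or hyphenated
        # compound, so that "tin" does not match inside "tinned".
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), match.group(0)))
    return sorted(annotations)


sample = "The cable was insulated with gutta-percha before electricity was applied."
for start, end, term in annotate_commodities(sample):
    print(f"{term!r} labelled as a commodity at characters {start}-{end}")
```

Here gutta-percha is labelled because it is in the list, and electricity is ignored because it is not. The quality of the annotations therefore depends entirely on the quality of the list.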

In our first version of the text mining system, we derived the list of commodity terms from WordNet. WordNet is a large lexical database, rather like a thesaurus, whose terms are organised hierarchically. This means that, as a first approximation, we can guess that any lexical item in WordNet that is categorised as a subclass of Physical Matter, Plant Life, or Animal might be a commodity term. How well do we do with this? Not surprisingly, when we carried out some initial experiments at the very start of our work on the project, we found that there are some winners and some losers. Here are some of the terms that were plausibly labelled as commodities in a sample corpus of digitised text:
horse, tin, coal, seedlings, grains, crab, merino fleece, fur, cod-liver oil, ice, log, potatoes, liquor, lemons.
And here are some less plausible candidate commodity terms:
weevil, water frontage, vomit, vienna dejeuner, verde-antique, vapours, toucans, steam frigates, smut, simple question, silver oics.
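To make the hierarchy-based guess concrete, here is a toy sketch. It does not use WordNet itself: the `HYPERNYM` table is a hand-made stand-in with a few illustrative "is-a" links, and the names `ROOT_CLASSES` and `is_candidate` are assumptions of the example, not anything from our system.

```python
# Toy "is-a" links standing in for WordNet; every entry is illustrative only.
HYPERNYM = {
    "coal": "physical matter",
    "tin": "physical matter",
    "vapour": "physical matter",
    "potato": "plant life",
    "lemon": "plant life",
    "horse": "animal",
    "weevil": "animal",
    "toucan": "animal",
    "fish": "animal",
    "cod": "fish",
    "electricity": "phenomenon",
}

# The three top-level classes whose subclasses we treat as possible commodities.
ROOT_CLASSES = {"physical matter", "plant life", "animal"}


def is_candidate(term, hypernym=HYPERNYM, roots=ROOT_CLASSES):
    """Walk up the is-a chain and report whether it reaches a root class."""
    seen = set()
    current = term
    while current in hypernym and current not in seen:
        seen.add(current)  # guard against accidental cycles in the table
        current = hypernym[current]
        if current in roots:
            return True
    return False


candidates = sorted(t for t in HYPERNYM if is_candidate(t))
print(candidates)
```

Even in this tiny example, weevil, toucan and vapour come out as candidates alongside coal and potato, simply because they sit under one of the three chosen classes. That is the same over-generation we saw in the real lists.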

Several factors conspire to produce these incorrect results. The first is that our list of terms is just too broad, and includes things that could never be commodities. The second is that, for now, we are not taking into account the context in which words occur in the text; this is computationally quite expensive, and not an immediate priority. The third is that the input to our text mining tools is not nice clean text such as we would get from ‘born-digital’ newswire. Instead, nineteenth-century books have been scanned and then turned into text by the process of Optical Character Recognition (OCR for short). As we’ll describe in future posts, OCR can sometimes produce bizarrely bad results, and this is probably responsible for our silver oics.

At the moment, we are working on generating a better list of commodity terms (as mentioned in a recent post by Jim Clifford). We’ll report back on progress soon.