Society of Cartographers Redux

To be filed under the “slightly self-promoting” department: earlier this year I was invited to speak at the Society of Cartographers Summer School in Manchester, UK. It’s always great to be invited to speak at a conference, but I was particularly excited by the SoC. The geo world I inhabit is one of data, APIs, platforms and data mining and aggregation techniques; sometimes the map gets lost in all of this. So it was an honour to speak at an event that was all about the map. The Summer School was written up in November’s edition of the SoC Newsletter, which is only available to society members, but with permission I’ve reproduced below the sections of the newsletter which cover my involvement.

Welcome to the world of the geo data silo: where closed data is open and open data is closed – Gary Gale (Nokia)

Inspired by London Transport maps, various historical maps and his son, Gary has been involved with maps and mapping for many years. His entertaining, informative and well-illustrated lecture took delegates on a short trip along the route taken by location-based communications, from smoke signals, pigeons, the compass, maps such as the Mappa Mundi, radio signals and triangulation through to today’s maps as seen on GPS-enabled smartphones and other mobile devices. He then turned his attention to data, silos of data and the “geo-industry”, where the map doesn’t seem to be important any more; it’s all about the data, and the map is often strangely absent.

Gary then took delegates on another trip, this time into the dark world of ‘Geo-Babel’, where we have data, lots of data, wide and varied: some commercial (Navteq and Tele Atlas), some authoritative (Britain’s Ordnance Survey), some crowd-sourced and growing aggressively (OpenStreetMap), some from unlikely sources (Flickr) and some from location-based social networking services (Foursquare and Gowalla). All this data, often available and free, is a cartographer’s dream. But wait: Gary explains that there is now a darker side to data. Much of this ‘free’ data appears to be locked in its own private little data silo. Ironically, this is happening at a time when previously proprietary data is becoming unlocked and open (Ordnance Survey) while crowd-sourced data is becoming locked behind a well-meaning but restrictive license. The question posed to delegates: how can we, as part of the geo-industry, dig ourselves out of this hole?

Mike Shand

Panel discussion: “All this data is good but what about the cartography?”

The last session of the conference was set up as a panel discussion, with the theme of “All this data is good, but what about the cartography?” To start the ball rolling, the preceding presentation was given by Gary Gale (Nokia/Ovi Maps). His grandly entitled presentation – Welcome to the world of the geo data silo; where closed data is open and open data is closed – certainly resonated with me, particularly “the four horsemen of the geopocalypse”. Gary stepped aside to allow his fellow panelists a short rant-space each. Richard Fairhurst concentrated on his vision of carto-goodness. He made an interesting analogy between industrial carto (Google), Boing Boing carto (retro 8-bit games style map) and Artisan carto (cartography with care). For a laugh (I presume!) he proposed a figurehead for web cartography and then flipped up a slide with three figureheads: Jobs, Gates and Chilton. He was followed by Bob Barr with a wider view of maps and quality. I then tried to pose some questions to the panel (e.g. you have shown examples of good/bad design, but what exactly are you looking for when you are making those choices?) and then opened it up for audience participation and questions/comments. We really should have recorded this session as there was a wide range of points made, few of which I can now recall! You really needed to be there to get the full impact of the panelists’ views and the lively discussion that ensued.

Steve Chilton, SoC Chair

When I last wrote about my theory of GeoBabel I seem to recall saying I was retiring it. That’s still true, but seeing as I didn’t actually write the newsletter, my geoconscience is clear on this point.

Talking GeoBabel In Three Cities (And Then Retiring It)

You’re invited to speak at a conference. Great. The organisers want a talk title and abstract and they want it pretty much immediately. Not so great; mind goes blank; what shall I talk about? Help! With this in mind, my first thought is normally “can I adapt, cannibalise or repurpose one of my other talks?”. This sometimes works. If there’s a theme which you haven’t fully worked through, it can serve you well.

But a conference audience is an odd beast, a percentage of whom will be “the usual suspects”. They’ve seen you talk before, maybe a few times. The usual suspects also tend to hang out on the conference Twitter back channel. Woe betide you if you recycle a talk, or even some slides, too many times; comments such as “I’m sure I’ve seen that slide before” start to crop up. Far better to come up with new and fresh material each time.

But sometimes you can get away with it, and so it was with my theme of GeoBabel. Three conferences: the Society of Cartographers Summer School, the Location Business Summit USA, AGI GeoCommunity 2010. Three cities: Manchester, San Jose, Stratford-upon-Avon. Three audiences: cartographers, Silicon Valley geo-location business types, UK GIS business types.

I’ve written about GeoBabel before; it’s the problem the location industry faces as we build more and more data sets which are fundamentally incompatible with each other. This incompatibility arises either from differing unique geographic identifiers, where Heathrow Airport, for example, is found in each data set but with different metadata and a different identifier, or from different licensing schemes which don’t allow data to be co-mingled. We now have more geographic data than ever before, but each data set is locked away in its own silo, either intentionally or through misguided attempts to be open.
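To make the identifier problem concrete, here’s a minimal sketch, assuming three hypothetical silos that each hold a record for Heathrow under their own identifier and their own metadata schema. Nothing in the records themselves says they describe the same place; the only way to join them is an external concordance (crosswalk) table that someone has to build and maintain by hand. The silo names, identifiers and the `heathrow-airport` key are all invented for illustration.

```python
from typing import Optional

# Three silos describing the same airport, each with its own identifier and
# its own metadata schema. All identifiers and values are hypothetical.
SILOS = {
    "silo_a": {"A-0001": {"name": "Heathrow Airport", "type": "airport"}},
    "silo_b": {"poi/77xq": {"title": "London Heathrow (LHR)", "category": "Airports"}},
    "silo_c": {"9001": {"label": "Heathrow", "kind": "aerodrome"}},
}

# A hand-maintained concordance: (silo, local id) -> shared canonical key.
CONCORDANCE = {
    ("silo_a", "A-0001"): "heathrow-airport",
    ("silo_b", "poi/77xq"): "heathrow-airport",
    ("silo_c", "9001"): "heathrow-airport",
}


def canonical(silo: str, local_id: str) -> Optional[str]:
    """Look up the shared key for a silo-local identifier, if one exists."""
    return CONCORDANCE.get((silo, local_id))


def same_place(a: tuple, b: tuple) -> bool:
    """True only when both (silo, id) pairs map to the same canonical key."""
    key_a, key_b = canonical(*a), canonical(*b)
    return key_a is not None and key_a == key_b


def records_for(key: str) -> dict:
    """Gather every silo's record for one canonical key."""
    return {
        silo: SILOS[silo][local_id]
        for (silo, local_id), k in CONCORDANCE.items()
        if k == key and local_id in SILOS.get(silo, {})
    }
```

Remove `CONCORDANCE` and `same_place` can never return `True`; that missing table is GeoBabel in miniature.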

The slide deck I used in San Jose is on SlideShare; the ones for Manchester and for Stratford-upon-Avon are pretty much identical and are on SlideShare as well.

As another way of illustrating the problems of GeoBabel, I came up with what I’ve termed The Four Horsemen Of The Geopocalypse. All very fin de siècle, but it seemed to be understood and liked by the audience at each talk.

The first Horseman is not Pestilence but Data Silos. All of the different types of geographic data we have (international and national commercial data, national and crowd-sourced open data, specialist and niche data, and social network crowd-sourced data) live in isolation from each other, with the only common denominator being the geo-coordinates attached to each data set’s idea of a place.
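If coordinates are the only common denominator, the best you can do across silos is a proximity match. A minimal sketch, assuming each record carries a WGS 84 latitude/longitude and using the haversine great-circle distance; the record ids, the approximate coordinates and the 2 km threshold are all hypothetical choices for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_matches(record: dict, other_silo: list, max_m: float = 2000) -> list:
    """Records in other_silo whose coordinates lie within max_m metres of record."""
    return [
        r for r in other_silo
        if haversine_m(record["lat"], record["lon"], r["lat"], r["lon"]) <= max_m
    ]


# Hypothetical records: the same airport with slightly different coordinates
# in two silos, plus a different airport some 40 km away.
heathrow_a = {"id": "A-0001", "lat": 51.4700, "lon": -0.4543}
SILO_B = [
    {"id": "poi/77xq", "lat": 51.4775, "lon": -0.4614},
    {"id": "poi/12ab", "lat": 51.1537, "lon": -0.1821},
]
```

A proximity match only yields candidates, not certainty: a terminal, a car park and an airport hotel can all sit within the same radius, so the threshold is a trade-off rather than a fact.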

The second Horseman is not War but Licensing. Nowadays in the Web 2.0 community we’re used to having access to data but we’re not willing to pay for it. Licenses range from closed commercial licenses to open licensing. But even in the open license world there are silos, with well-meaning licenses becoming viral and attaching themselves to any derived work.

Which segues neatly to the third Horseman, who’s not Famine but Derivation. Each time you create something from data, you’re deriving a new work in the eyes of most licenses and that means the derived work often has the original license still attached to it. You do the work, but you don’t own the work.

Finally, the fourth Horseman is not Death but Co-Mingling. There is no one single authoritative geographic data set, you need to find the ones which work for you and for your business or use case. That means you need to mingle the data sets and frequently the licenses you have for those data sets explicitly prohibit this.
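The interplay of Licensing, Derivation and Co-Mingling can be sketched as a toy model. This is an illustration under loudly stated assumptions: the two flags (`share_alike`, `allows_mixing`), the licence names and the compatibility rule below are all hypothetical and deliberately crude, and do not describe the terms of any real license.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class License:
    name: str
    share_alike: bool    # hypothetical flag: derived works must carry this license
    allows_mixing: bool  # hypothetical flag: may be combined with other licenses


@dataclass
class Dataset:
    name: str
    licenses: frozenset = field(default_factory=frozenset)


def derive(source: Dataset, new_name: str) -> Dataset:
    """Derivation: the new work inherits every license attached to its source."""
    return Dataset(new_name, source.licenses)


def can_comingle(a: Dataset, b: Dataset) -> bool:
    """Toy rule: one shared license is fine; mixing several needs every license
    to permit mixing and none to be share-alike."""
    combined = a.licenses | b.licenses
    if len(combined) <= 1:
        return True
    return all(lic.allows_mixing and not lic.share_alike for lic in combined)


# Hypothetical licenses, not modelled on any real license text.
OPEN_SA = License("hypothetical-open-share-alike", share_alike=True, allows_mixing=True)
COMMERCIAL = License("hypothetical-commercial", share_alike=False, allows_mixing=False)
PERMISSIVE = License("hypothetical-permissive", share_alike=False, allows_mixing=True)
PUBLIC = License("hypothetical-public", share_alike=False, allows_mixing=True)
```

Deriving map tiles from the crowd-sourced data set carries `OPEN_SA` along with it (you do the work, but you don’t own the work), and `can_comingle` then refuses to combine those tiles with the commercial data set.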

Babel by Cildo Meireles

But now after three outings, it’s time to retire GeoBabel, for now at least, just as I retired my Theory Of Stuff earlier this year. That means I had to find a new theme to talk about at my next event, the Geospatial Specialist Group at the British Computer Society. But that’s in my next post.

Your Place Is Not My Place; The Perils of Disambiguation

We take the art of geographic lookup for granted these days; type a place name into a form on a web site or feed it into a web service API and hey presto! Most of the time you’ll be told whether or not the place name is valid and, in case there’s more than one place with the same name, either asked to choose which one you mean or presented with the most likely place.

Most of the time … but not all of the time.

Which Way To The Town Centre?

The hey presto bit of the process seems at first glance to be relatively trivial, but it isn’t. Just ask anyone who’s had to implement a system that handles place names. The hey presto part is actually two discrete processes in their own right. First of all we need to identify a place, or indeed whether there’s a place at all; this is usually called geoidentification.

identify; verb; establish or indicate who or what (someone or something) is

This is the thing that determines that there is a place in “I’m in London today” but not in “I do love Yorkshire Pudding”.
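A naive gazetteer lookup would happily find a place in “Yorkshire Pudding”, which is exactly why geoidentification isn’t trivial. Here’s a minimal sketch, assuming a tiny hypothetical gazetteer plus a hypothetical list of phrases that contain a place name but don’t refer to a place; real systems use far richer context than a two-word lookahead.

```python
import re

# A tiny, hypothetical gazetteer; a real one holds millions of names.
GAZETTEER = {"london", "yorkshire", "ouagadougou"}

# Hypothetical phrases that contain a place name but don't refer to a place.
NON_PLACE_PHRASES = {"yorkshire pudding"}


def identify_places(text: str) -> list[str]:
    """Return the gazetteer names that appear to refer to places in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    found = []
    for i, token in enumerate(tokens):
        if token not in GAZETTEER:
            continue
        # Look one word ahead: "yorkshire pudding" is food, not a place.
        if " ".join(tokens[i:i + 2]) in NON_PLACE_PHRASES:
            continue
        found.append(token)
    return found


print(identify_places("I'm in London today"))          # ['london']
print(identify_places("I do love Yorkshire Pudding"))  # []
```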

Once a place has been identified, we need to work out if there’s more than one place of the same name (which is more than likely as we’re stunningly unimaginative where place names are concerned, duplicating and reusing the same name all over the world) and if so, which one. This is usually called geodisambiguation.

disambiguate; verb; remove uncertainty of meaning from (an ambiguous sentence, phrase or other linguistic unit)

Some places are pretty easy to disambiguate; as far as I know there’s only one Ouagadougou and that’s the capital of Burkina Faso. Some places should be easy to disambiguate, at least at first sight; take London, that should be easy. It’s the capital of the United Kingdom. Well, that’s true, but it could also be the London in Ontario, or the one in Arkansas, in California, in Kentucky or any of the other 22 Londons that I’m aware of.
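Once “London” has been identified, geodisambiguation has to choose between the candidates. A minimal sketch of one common approach: prefer a candidate whose region or country is mentioned in the surrounding context, otherwise fall back to the most prominent candidate. The `prominence` weights are hypothetical numbers chosen for illustration, not real population or usage data, and the substring matching on context is deliberately simplistic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    region: str
    country: str
    prominence: float  # hypothetical ranking weight, not real data


CANDIDATES = {
    "london": [
        Candidate("London", "England", "United Kingdom", 0.90),
        Candidate("London", "Ontario", "Canada", 0.05),
        Candidate("London", "Kentucky", "United States", 0.02),
        Candidate("London", "Arkansas", "United States", 0.01),
        Candidate("London", "California", "United States", 0.01),
    ],
}


def disambiguate(name: str, context: str = "") -> Optional[Candidate]:
    """Pick the most likely place for name, using any hints in context."""
    candidates = CANDIDATES.get(name.lower(), [])
    if not candidates:
        return None
    ctx = context.lower()
    # Prefer candidates whose region or country appears in the context.
    hinted = [c for c in candidates
              if c.region.lower() in ctx or c.country.lower() in ctx]
    pool = hinted or candidates
    # Within the pool, fall back to the most prominent candidate.
    return max(pool, key=lambda c: c.prominence)
```

With no context, `disambiguate("London")` returns the UK candidate; given “near Toronto, Ontario” it returns the Canadian one. Picking the most prominent candidate is a reasonable default, but it is exactly the choice that annoys the people who really do live in London, Ontario.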

The gentle art of disambiguation is critical to geocoding, geoparsing, geotagging and any of the other words the location industry chooses to prefix with geo. Get disambiguation wrong and you fail on two counts.

Firstly, you’re showing your audience that you don’t know or don’t care about what they’re trying to tell you. Secondly, you allow your users the opportunity to specify the same place in a multitude of conflicting ways.

This is part of the problem of GeoBabel; your place is not my place.

So far, so theoretical, but let’s look at a concrete example of this. A few weeks back I added my Twitter account to a Twitter directory site. The first thing you’re asked to do is supply your location, or to “Type Your City” as the site phrases it. So I type London and the site starts to attempt to disambiguate on the fly; do I mean “London, United Kingdom” or “London, Ontario”? But wait, what about the other options?

London geo disambiguation fail #1

Which “London” is the one tagged by 436 people but with no indication of which country? What’s the difference between “London, United Kingdom”, “London,UK” and “London England”? Space and punctuation, or the lack of it, is obviously important here. So let’s try to give the system some help and start to type United Kingdom …

London geo disambiguation fail #2

Oh dear. “London, United Kingdom” still shows up, but because I’ve put a space in there I don’t get offered “London,UK” any more. Instead I do get offered the London in the lesser-known country of “Uunited Kingdom” and also “London, Ub2”, which one assumes is the UB2 postal code covering the London suburb of Southall.
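Much of this could be avoided by normalising free-text locations before storing or counting them. Below is a minimal sketch, assuming a hypothetical alias table and assuming the city is a single word whenever no comma is present. It ignores case, spacing and punctuation, and uses Python’s `difflib.get_close_matches` to absorb typos such as “Uunited Kingdom”.

```python
import difflib
import re
from collections import Counter

# Hypothetical alias table: what users type -> canonical country for counting.
# Mapping England to the United Kingdom is an assumption made for counting.
COUNTRY_ALIASES = {
    "united kingdom": "united kingdom",
    "uk": "united kingdom",
    "england": "united kingdom",
    "ontario": "canada",
    "canada": "canada",
}


def normalise(entry: str) -> tuple[str, str]:
    """Reduce a free-text 'City, Country' entry to a (city, country) key."""
    text = entry.strip().lower()
    if not text:
        return "", ""
    if "," in text:
        city, _, rest = text.partition(",")
    else:
        # Assumption: with no comma, the city is the first word.
        parts = text.split(maxsplit=1)
        city, rest = parts[0], (parts[1] if len(parts) > 1 else "")
    city = city.strip()
    rest = " ".join(re.sub(r"[^\w\s]", " ", rest).split())
    if not rest:
        return city, ""  # no country given at all: genuinely ambiguous
    if rest in COUNTRY_ALIASES:
        return city, COUNTRY_ALIASES[rest]
    close = difflib.get_close_matches(rest, list(COUNTRY_ALIASES), n=1, cutoff=0.85)
    if close:
        return city, COUNTRY_ALIASES[close[0]]
    return city, rest  # unresolved, e.g. a postcode that needs a separate lookup


entries = ["London, United Kingdom", "London,UK", "London England",
           "London, Uunited Kingdom", "London, Ontario", "London, Ub2"]
counts = Counter(normalise(e) for e in entries)
```

Four of the six variants collapse into a single `("london", "united kingdom")` key, which starts to answer the “how many users in London in the UK do we have?” question. The `ub2` entry still needs a postcode lookup, and a bare “London” stays ambiguous rather than being silently guessed.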

Your place is not my place.

To be fair, I’m not singling out this site for attack here; this is just one of many examples of sites which try to use geographic lookup but end up making life difficult for their users (but which London do I pick?) and for themselves (now, how many users in London in the UK do we have?). I’d happily offer to help them, if only I could find any contact information anywhere on the site …

The Letter W and Hype (or Local) at the Location Business Summit

Each time I give my Hyperlocal or Hype (and Local) talk it morphs slightly and becomes more scathing of the term hyperlocal.

I started to write the talk for Where 2.0 in San Jose earlier this year and approached it from the point of view of a hopeful sceptic who was looking to be persuaded that the long-promised hyperlocal nirvana was either right here, right now or at least looming hopefully on the horizon.

A month later I had the pleasure of sharing the keynote slot with Professor Danny Dorling at the GIS Research UK conference at University College London, and I revisited the theme. By this time any hope of hyperlocal nirvana had pretty much vanished.

Yesterday I took the talk out for the final time at the Location Business Summit in Amsterdam, and the elephant in the room relating to hyperlocality had grown into a full-blown herd of elephants.

My scepticism was echoed by several members of the audience, notably James Thornett from the BBC who blogged about it and with whom I shared a panel on the nebulous concept that is the geoweb today.

But what really seemed to catch the audience’s imagination was my twin memes of GeoBabel and the Three W’s of Geo: the where, the when and the what.

A new and accurat map of the world

The where is what we’ve been doing for centuries; mapping the globe. Whilst it’s a sweeping generalisation, we’ve pretty much done this, albeit to a varying degree of accuracy, coverage and granularity. We’ve mapped the globe, now it’s time to do something with all of this data.

The when is the gnarly problem of temporality, which just won’t go away. Places and geography change over time; how we map a place today doesn’t show how the place was 100 years ago, and neither can we expect the geography of a place to be static 100 years hence. As we update our geographic data sets and throw away the old, supposedly obsolete, historical versions, we’re throwing away a rich seam of temporal data in the process.
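One way to stop throwing history away is to never overwrite a place record, and instead append a new version with a validity interval. Here’s a minimal sketch, assuming a hypothetical stable `place_id` shared across versions and using the 1965 change from the County of London to Greater London as the illustration. Geometry is omitted, and treating the two as versions of one place is itself a modelling simplification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PlaceVersion:
    place_id: str                # hypothetical identifier shared by every version
    name: str
    valid_from: Optional[date]   # None: from the earliest date this data set holds
    valid_to: Optional[date]     # None: still current

    def valid_on(self, when: date) -> bool:
        """Half-open interval: valid_from <= when < valid_to."""
        starts_ok = self.valid_from is None or self.valid_from <= when
        ends_ok = self.valid_to is None or when < self.valid_to
        return starts_ok and ends_ok


# Versions are appended, never overwritten.
HISTORY = [
    PlaceVersion("london-admin", "County of London", None, date(1965, 4, 1)),
    PlaceVersion("london-admin", "Greater London", date(1965, 4, 1), None),
]


def as_of(place_id: str, when: date) -> Optional[PlaceVersion]:
    """Return the version of a place that was valid on a given date."""
    for version in HISTORY:
        if version.place_id == place_id and version.valid_on(when):
            return version
    return None
```

The half-open interval means exactly one version is valid on the changeover date itself, so a query never gets two answers for the same day.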

Map from memory

Then finally there’s the what; a reference to a place is intrinsically bound to its granularity. References to London from outside the United Kingdom are frequently aimed at the non-specific London bounded by the M25 orbital motorway. Zoom in and London becomes Greater London, then the London Boroughs and finally the City of London and the neighbouring City of Westminster.
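Granularity can be made explicit by giving each unit a parent and a level, so two references can be compared at a common level instead of being treated as different places. A minimal sketch, assuming a simplified hierarchy that follows the description above. The level names are hypothetical, and real boundaries such as Greater London and the M25 don’t nest this neatly.

```python
from typing import Optional

# Parent pointers: each unit points at the next-coarser unit containing it.
# A simplified, hypothetical hierarchy; real boundaries don't nest this neatly.
PARENT = {
    "City of Westminster": "Greater London",
    "City of London": "Greater London",
    "Greater London": "London (within the M25)",
}

# Hypothetical level labels for each unit.
LEVEL = {
    "London (within the M25)": "metropolitan",
    "Greater London": "region",
    "City of Westminster": "district",
    "City of London": "district",
}


def ancestors(unit: str) -> list[str]:
    """The unit itself followed by every coarser unit that contains it."""
    chain = [unit]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalise(unit: str, level: str) -> Optional[str]:
    """Return the containing unit at the requested level, or None if there isn't one."""
    for u in ancestors(unit):
        if LEVEL.get(u) == level:
            return u
    return None
```

Generalise both “City of Westminster” and “City of London” to the `region` level and they agree on Greater London. Going the other way, from a coarse reference down to a finer one, returns `None`, because you can’t recover detail that was never in the reference.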

The strong reaction to these twin memes makes me think that we’ll be seeing these topics continue to raise their heads until we’re able to find workarounds or solutions.

Fighting GeoBabel on Two Fronts

The well known, highly opinionated and occasionally error prone Tech Crunch seems to think there’s a location war going on.

A search for the keywords location and war on the site yields strident post titles including “Just In Time For The Location Wars, Twitter Turns on Geolocation On Its Website”, “Location Isn’t A War Between Two Sides, It’s A Gold Rush For Everyone”, “What Did The Location War Look Like At SXSW? Like This” and “Google Escalates The Location War With Google Places”.

And Tech Crunch are right, there is a location war going on, but it’s not the war that Michael Arrington and crew are thinking of; this war is much more insidious. It’s the war against GeoBabel and it’s being fought right now on two fronts.

Babel by Cildo Meireles

Front number one is your place is not my place. You may think we’re talking about the same place, the same POI, the same location, the same city or neighbourhood but we’re not. You’re fluent in Gowalla, I’m fluent in Foursquare and the rest of the internet is fluent in Geonames, OpenStreetMap and WOEIDs, each with their own subjective view of where. GeoBabel.

The second front is that we think we’re using the same terminology, but we’re not. Recent articles and comments, not exclusively restricted to Tech Crunch, have bandied about the terms place, map, location, centroid, coordinate and long/lat, using them interchangeably and inconsistently. GeoBabel again.
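One small defence against this is to give each term its own type, so a coordinate can’t quietly stand in for a place and the order of latitude and longitude is never left to guesswork. A minimal sketch, assuming WGS 84 decimal degrees; the `Place` fields and the `example:london` identifier are hypothetical. The one firm fact it relies on is that GeoJSON (RFC 7946) writes positions as longitude first, then latitude, which is the opposite of the “lat/long” many people say out loud.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Coordinate:
    """A single point in decimal degrees (WGS 84 assumed), not a place."""
    lat: float
    lon: float

    def __post_init__(self):
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"longitude out of range: {self.lon}")

    @classmethod
    def from_geojson(cls, position: Sequence[float]) -> "Coordinate":
        # GeoJSON (RFC 7946) orders a position as [longitude, latitude].
        lon, lat = position[0], position[1]
        return cls(lat=lat, lon=lon)


@dataclass(frozen=True)
class Place:
    """A named thing in the world; its centroid is one coordinate, not the place."""
    place_id: str  # identifier within some hypothetical gazetteer
    name: str
    centroid: Coordinate
    bbox: Optional[tuple[float, float, float, float]] = None  # (min_lon, min_lat, max_lon, max_lat)


# Approximate centroid for London, read from a GeoJSON-style position.
london = Place("example:london", "London", Coordinate.from_geojson([-0.1276, 51.5072]))
```

The range checks catch some swapped values, such as a latitude of 151. They don’t catch London’s swapped values, which describe a perfectly valid point far away from London. That is why the distinction has to live in the types and the vocabulary, not just in validation.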

There’s little doubt that the dream of location as a key context is now on the cards and we’re rushing headlong to meet it. We think we’re all speaking about the same thing, but the sad truth is that we’re speaking about totally disparate concepts and terms most of the time.

Until we solve this GeoBabel in the making, the location war will be lost without most of the people impacted by it ever knowing it was being fought.

