Posts Tagged ‘geocom’

Talking GeoBabel In Three Cities (And Then Retiring It)

You’re invited to speak at a conference. Great. The organisers want a talk title and abstract and they want it pretty much immediately. Not so great; mind goes blank; what shall I talk about; help! With this in mind, my first thought is normally “can I adapt, cannibalise or repurpose one of my other talks?”. This sometimes works. If there’s a theme which you haven’t fully worked through it can serve you well.

But a conference audience is an odd beast, a percentage of whom will be “the usual suspects”. They’ve seen you talk before, maybe a few times. The usual suspects also tend to hang out on the conference Twitter back channel. Woe betide you if you recycle a talk or even some slides too many times; comments such as “I’m sure I’ve seen that slide before” start to crop up. Far better to come up with new and fresh material each time.

But sometimes you can get away with it and so it was with my theme of GeoBabel. Three conferences: the Society of Cartographers Summer School, The Location Business Summit USA, AGI GeoCommunity 2010. Three cities: Manchester, San Jose, Stratford-upon-Avon. Three audiences: cartographers, Silicon Valley geo-location business types, UK GIS business types.

I’ve written about GeoBabel before; it’s the problem the location industry faces as we build more and more data sets which are fundamentally incompatible with each other. This incompatibility arises either due to differing unique geographic identifiers, where Heathrow Airport, for example, is found in each data set, with differing metadata and a different identifier, or due to different licensing schemes which don’t allow data to be co-mingled. We now have more geographic data than before but each data set is locked away in its own silo, either intentionally or through misguided attempts to be open.

The slide deck, embedded above, is the one I used in San Jose. The ones for Manchester and for Stratford-upon-Avon are pretty much identical and are on SlideShare as well.

As another way of illustrating the problems of GeoBabel, I came up with what I’ve termed The Four Horsemen Of The Geopocalypse. All very fin de siècle but it seemed to be understood and liked by the audience at each talk.

The first Horseman is not Pestilence but Data Silos. All of the different types of geographic data we have (international and national commercial data, national and crowd sourced open data, specialist and niche data, and social network crowd sourced data) live in isolation from each other, with the only common denominator being the geo-coordinates of each data set’s idea of a place.

The second Horseman is not War but Licensing. Nowadays in the Web 2.0 community we’re used to having access to data but we’re not willing to pay for it. Licenses vary between closed commercial licenses and open licensing. But even in the open license world there are silos, with well meaning licenses becoming viral and attaching themselves to any derived work.

Which segues neatly to the third Horseman, who’s not Famine but Derivation. Each time you create something from data, you’re deriving a new work in the eyes of most licenses and that means the derived work often has the original license still attached to it. You do the work, but you don’t own the work.

Finally, the fourth Horseman is not Death but Co-Mingling. There is no one single authoritative geographic data set; you need to find the ones which work for you and for your business or use case. That means you need to mingle the data sets, and frequently the licenses you have for those data sets explicitly prohibit this.

Babel by Cildo Meireles

But now after three outings, it’s time to retire GeoBabel, for now at least, just as I retired my Theory Of Stuff earlier this year. That means I had to find a new theme to talk about at my next event, the Geospatial Specialist Group at the British Computer Society. But that’s in my next post.

Photo Credits: Nick. J. Webb on Flickr.
Written and posted from home (51.427051, -0.333344)

W3G – A Chair’s Eye View

Last year GeoCommunity, the annual conference of Britain’s Association for Geographic Information, took the brave (and in my view totally necessary) step of branching out from their traditional GIS heartland audience (sometimes referred to somewhat disparagingly as paleotards) to take on board the views of the neo-geographers, Web 2.0 and LBMS community (sometimes equally disparagingly called neotards). Mud-slinging labels aside, both geographic communities benefitted from the Geo-Web Track as it was called. I was lucky enough to be asked to participate and the Geo-Web Track was a resounding success, for both the paleo-geography and neo-geography camps.

This year, attempting to build on the success of the Geo-Web Track, I was asked by the AGI to chair a one day conference to run on the day before GeoCommunity 2010. Originally pitched as a true unconference, I went for an (un)conference, half way between the joyous informality of an unconference and the formality of an invited speaker conference. So we had both: unconference sessions (all of which were filled with ease) and a set of invited guest speakers and keynotes. Trying to think of a name, I came up with W3G … the 3 W’s of Geo, which had cropped up in a blog post in April of this year. Any resemblance in name between W3G and the W3C is, of course, purely intentional.

W3G Closing Panel

Attending any sort of conference is a tiring affair; chairing and organising one is truly exhausting. While most of the thanks on the day and afterwards were directed at me, the real thanks needs to go to my fellow organiser, Rollo Home, with the support of Chris Holcroft and Claire Huppertz, all of whom had their hands more than full with GeoCommunity starting the very next day after W3G.

As chair, I gave the opening introduction, to set the theme and tone of the day and to introduce the unconference element to those unfamiliar with the concept.

So should W3G have existed at all? The GeoWeb Track at GeoCommunity 2009 certainly showed that there was an appetite for the neo-geographic side of the Location Industry, so why not integrate W3G or the GeoWeb Track into the main GeoCommunity again? That’s a difficult decision to come to … whilst there was probably around 30% of the audience of W3G attending GeoCommunity, that still leaves 70% of the audience who were totally new to the AGI. Would they have paid the asking price of a GeoCommunity ticket? Probably not. The neo-geography side of things does tend to thrive on free or low cost events (with the notable exception of O’Reilly’s Where 2.0 in Silicon Valley, which is both excellent and eye-wateringly expensive). So for this year at least, W3G served a valuable dual purpose, bringing the AGI to the attention of a community which probably didn’t know it even existed and allowing a whole load of latent geographers to meet, talk, learn and network … as well as consuming vast amounts of coffee, beer and curry. In that order.

We’re already talking about repeating the success of W3G next year in some shape or form; something I definitely want to be involved in. But I would like to see the gap between the GIS heartland and the neo-geographers, who still seem to be a long way apart at times, narrowed or even closed. The AGI is eminently poised to help bring these two parts of the community together and GeoCommunity 2011 would be the ideal event to do this, making it a Geo Community in the truest sense of the word. In 2009 I questioned whether GeoCommunity would unite the two polarised worlds of geo … the answer in 2010 is that we’ve taken a few steps in the right direction, but we’re still not there yet.

Photo Credits: Paul Clarkel on Flickr.

Know Your Place; Adding Geographic Intelligence to your Content

Day two of the AGI GeoCommunity conference and the conference as a whole has ended. We discussed neogeography, paleogeography and pretty much all points in between, finally agreeing that labels such as these get in the way of the geography itself. I was fortunate enough to have my paper submission accepted and presented a talk on how to Know Your Place at the end of the morning’s geoweb track. The paper is reproduced below and the deck that accompanies it is on SlideShare.

Know Your Place; Adding Geographic Intelligence to your Content


Yahoo! GeoPlanet exposes a geographic ontology of over six million named places, enabling technologies that join users with the most geographically relevant information possible, and forms the heart of the Yahoo! Geo Technologies group’s technology platform.

GeoPlanet uses a unique, language neutral identifier for (nearly) all named places around the world. Each place exists within a graph of other places; the relationships between places are categorised by administrative hierarchy, geographical scope and place type, amongst others.

GeoPlanet’s geodata repository is exposed by publicly available web service platforms that allow places to be identified within content (Yahoo! Placemaker) and investigated by place name or identifier (Yahoo! GeoPlanet). Users are able to navigate rich metadata associated with a place including the place hierarchies and obtain parent, child, belong-to and neighbouring relationships.

For example, a list of first level administrative entities in a given country may be obtained by requesting the list of the children of that country. In a similar manner the surrounding postal codes of a given postal code may be obtained via a request for its neighbours.
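The children and neighbours requests described above can be sketched with a toy, in-memory place graph. This is only an illustration of the idea and not the GeoPlanet web service itself: the `children` and `neighbours` functions, the place type labels and every identifier in the data are made-up placeholders.

```python
# Toy in-memory place graph. Every identifier, name and type label here is a
# made-up placeholder for illustration; none of them is a real WOEID.
PLACES = {
    1: {"name": "Exampleland", "type": "Country", "parent": None},
    10: {"name": "North Province", "type": "Admin1", "parent": 1},
    11: {"name": "South Province", "type": "Admin1", "parent": 1},
    100: {"name": "AB1", "type": "PostalCode", "parent": 10},
    101: {"name": "AB2", "type": "PostalCode", "parent": 10},
    102: {"name": "AB3", "type": "PostalCode", "parent": 11},
}

# Neighbour relationships are stored separately from the parent/child
# hierarchy, because a neighbour can sit under a different parent.
NEIGHBOURS = {100: [101, 102], 101: [100], 102: [100]}


def children(place_id, place_type=None):
    """Return the ids of places whose parent is place_id, optionally by type."""
    return sorted(
        pid
        for pid, place in PLACES.items()
        if place["parent"] == place_id
        and (place_type is None or place["type"] == place_type)
    )


def neighbours(place_id):
    """Return the ids of the places adjacent to place_id."""
    return sorted(NEIGHBOURS.get(place_id, []))


# First level administrative entities of the country.
first_level = children(1, "Admin1")
# Postal codes surrounding AB1, including one in a different province.
around_ab1 = neighbours(100)
```

Keeping the neighbour links apart from the parent links is a deliberate choice in this sketch; it lets AB3 be a neighbour of AB1 even though the two sit in different provinces.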

The framework for this is uniform and consistent across the globe and facilitates geo-enrichment and geo-identification in a wide range of content, both structured and unstructured.

Place-based Thinking

Traditionally geography has been treated as a purely spatial exercise; this is certainly the case on the internet. Places are specified in terms of their longitude and latitude, and so cities or towns are referenced by the co-ordinate pair that identifies the theoretical or arbitrary centre of the place.

From this it can be seen that everything on the internet which is location related is referenced by a co-ordinate pair that has little relevance to a human but much relevance to a geographer or software which can algorithmically undertake a radius search from a point. Instead of a spatially based approach to location, Yahoo! Geo Technologies take a place based approach.

The map above shows a spatially correct map of the central area of the London Underground network similar to those produced up until the early 1930s; in the central area of London the map is compressed due to the close proximity of the lines and their stations.

In 1932 the familiar Tube map, shown below, was produced by Harry Beck in the form of a non geographic linear diagram. Whilst not geographically or spatially correct it is far more accessible and information rich due to Beck’s assumption that people are less concerned with the exact location of a station and more interested in how to change between lines and get to their destination.

We have taken a not dissimilar approach with our repository of named places, where a place can be a monument, a park, a colloquial region such as the Home Counties, a continent or even the Earth. We have taken each of these different place names at all of their differing granularities and given them unique identifiers, called Where On Earth IDs.


The Where On Earth ID is a unique and permanent global identifier, shared publicly via the GeoPlanet and Placemaker API platforms.

They are language neutral, thus the WOEID for London is the same as for Londres, for Londra and for ロンドン, whilst recognising, for the London in the United Kingdom, that London, Central London, Greater London and the City of London are geographically related though separate places.

Their usage ensures that all Yahoo! APIs have the ability to employ geography consistently and globally.
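A minimal sketch of what language neutrality means in practice: several names, in several languages, resolve to one identifier, while related places keep identifiers of their own. The identifiers below are hypothetical placeholders rather than real WOEIDs, and `woeid_for` is an invented helper, not part of any Yahoo! API.

```python
# Hypothetical placeholder identifiers; these are NOT real WOEIDs.
LONDON = 1001
GREATER_LONDON = 1002
CITY_OF_LONDON = 1003

# Many names, in many languages, point at the same identifier.
NAME_TO_ID = {
    "london": LONDON,
    "londres": LONDON,
    "londra": LONDON,
    "ロンドン": LONDON,
    # Related places are separate places, so they get separate identifiers.
    "greater london": GREATER_LONDON,
    "city of london": CITY_OF_LONDON,
}


def woeid_for(name):
    """Look a name up regardless of case; return None if it is unknown."""
    return NAME_TO_ID.get(name.strip().casefold())
```

Anything keyed on the identifier, rather than on the name, then behaves the same way whichever language the user happened to type in.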

A Global Geographic Ontology

Within our geodata repository we know not only where a place is geographically located, via its centroid, but also how these places relate to each other. This is more than an index of places, it is a geographic ontology of named places, each of which is referenced by a WOEID.

Using the postal town of Stratford-upon-Avon as an example, we can determine the children of a place, its parent, its adjacent places and non administrative or colloquial areas that a place belongs to or is contained within, at the following granularities. 

  • Supernames
  • Continents
  • Countries
  • Counties
  • Regions
  • Neighbourhoods
  • ZIP and Postal Codes
  • Custom Geographies

Joining People with Content and Content with People

We can use Placemaker to parse structured and unstructured content and to identify the places referenced, each of which is represented by a WOEID. Where more than one potential place exists for each name, a ranked list of disambiguated names is presented.

Each of the WOEIDs returned by Placemaker has the notional centroid and the bounding box, described by the South West and North East coordinates, as attributes. This allows the concept of a place to be displayed, such as that for the postal town of Stratford-upon-Avon, as shown below.
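As a rough sketch of how a bounding box described by its South West and North East corners can be used, the snippet below tests whether a point falls inside one. The `BoundingBox` class is an invented helper and the coordinates are illustrative values in the general area of Stratford-upon-Avon, not the figures Placemaker returns. The check is deliberately simple and assumes the box does not cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A box described by its South West and North East corners, in degrees."""

    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        """True if the point lies inside the box (edges included).

        Assumes the box does not cross the antimeridian.
        """
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Illustrative coordinates only; not taken from Placemaker or GeoPlanet.
stratford_box = BoundingBox(south=52.17, west=-1.75, north=52.21, east=-1.67)

in_town = stratford_box.contains(52.19, -1.71)
in_london = stratford_box.contains(51.5, -0.12)
```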

For each WOEID, we can use GeoPlanet to determine the vertical relationships of the place, such as which cities are in a country or which postal codes are within a city. We can determine the states, provinces or districts within a country and which countries are on a continent. This powerful vertical hierarchy can be easily navigated from any WOEID.
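Walking the vertical hierarchy upwards from a single place can be sketched as a loop over parent links. The two positive identifiers below are the WOEIDs quoted in this paper for the postal town of Stratford-upon-Avon and the Stratford-on-Avon district; the negative identifiers are placeholders I have made up for the levels above, and the `ancestors` function is an invented helper rather than a GeoPlanet call.

```python
# Parent links, child -> parent. The positive ids are the WOEIDs quoted in
# the text; the negative ids are made-up placeholders for the higher levels.
PARENT = {
    36424: 12696101,  # Stratford-upon-Avon (town) -> Stratford-on-Avon (district)
    12696101: -1,     # district -> county (placeholder id)
    -1: -2,           # county -> country (placeholder id)
}


def ancestors(place_id):
    """Return the chain of parents of place_id, nearest first."""
    chain = []
    seen = {place_id}
    while place_id in PARENT:
        place_id = PARENT[place_id]
        if place_id in seen:
            # Guard against bad data that would otherwise loop forever.
            raise ValueError("cycle in parent links")
        seen.add(place_id)
        chain.append(place_id)
    return chain


town_ancestors = ancestors(36424)
```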

GeoPlanet also contains a horizontal-like hierarchy, which frequently overlaps. If searching against a specific place such as a postal code, we can determine the surrounding postal codes as well; if searching for a town, we can determine the surrounding postal towns, as shown below.

GeoPlanet contains a rich ontology of named places, which allows us to look up places and where these places are. But more powerful is the relationship between places, which allows users of GeoPlanet to add geographic intelligence to their use cases and applications, browsing the horizontal and vertical hierarchies with ease to discover geographic detail that a point and radius-based search could not provide.

Capturing the World’s Geography as it is Used by the World’s People 

The Oxford English Dictionary, often criticised for capturing transient or contentious terms, states its goal as “to capture the English language as it is used at this time” and not to impose how things are called. In the same vein, our goal is to capture the world’s geography as it is used by the world’s people.

We aim to follow the United Nations and ISO 3166-1 guidelines on the official name for a place but we strive to know the informal, the ethnic and the colloquial. We are less concerned with imposing a formal geography than we are with describing how a place is described today and what its relationship is with its parent, its children and its neighbours.

Thus we recognise that MOMA NYC (WOEIDs 23617044 and 2459115) is used to refer to the Museum of Modern Art in New York, that San Francisco (2487956) is the more commonly used form of The City and County of San Francisco and that the London Eye and the Millennium Wheel are synonymous (WOEID 22475381).

A Tale of Two Stratfords

Stratford is an important tourist destination, due to the town being William Shakespeare’s birthplace, with both the “on-Avon” and “upon-Avon” suffixes being used to refer to the town. GeoPlanet recognises both Stratford-on-Avon and Stratford-upon-Avon (WOEID 36424) when referring to the postal town and further recognises Stratford-on-Avon (WOEID 12696101) as the administrative District which is the parent for Stratford-upon-Avon.

“the Council often gets asked why there is a difference in using the terms ‘Stratford-on-Avon’ and ‘Stratford-upon-Avon’. Anything to do with the town of Stratford is always referred to as Stratford-upon-Avon. However, as a district council, we cover a much larger area than the town itself, but did not want to lose the instantly recognised tag of Stratford, so anything to do with the district is referred to as Stratford-on-Avon.” 
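The two Stratfords make a handy worked example of one name mapping to more than one place. The sketch below uses the two WOEIDs quoted above; the `resolve` function and the "Town" and "District" labels are my own shorthand for illustration and are not GeoPlanet's place type names.

```python
# Candidate places for each name. The WOEIDs are the ones quoted in the text;
# the "Town" and "District" labels are shorthand invented for this sketch.
CANDIDATES = {
    "stratford-upon-avon": [(36424, "Town")],
    "stratford-on-avon": [(36424, "Town"), (12696101, "District")],
}


def resolve(name, place_type=None):
    """Return the WOEIDs a name may refer to, optionally narrowed by type."""
    matches = CANDIDATES.get(name.strip().casefold(), [])
    return [
        woeid
        for woeid, kind in matches
        if place_type is None or kind == place_type
    ]


either = resolve("Stratford-on-Avon")
district_only = resolve("Stratford-on-Avon", "District")
town_only = resolve("Stratford-upon-Avon")
```

Asking for the name alone gives both candidates back; it is the place type that tells the town and the district apart.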

Appendix A – Data Background

The GeoPlanet geodata repository is derived from a variety of sources: spatial data vendors, openly available sources and Yahoo! sourced data. In raw form, it occupies 25 GB of storage; after automated topology generation and semi automated processing to clean the data and to remove duplicates, the final data footprint is around 9.5 GB. A specialised Editorial team assesses overall data quality and integrity, areas of ambiguity and challenging geographies, such as disputed territories and colloquial areas.

Appendix B – Further Reading

  1. Yahoo! Developer Network – Yahoo! Placemaker
  2. Yahoo! Developer Network – Yahoo! GeoPlanet
  3. The London Tube Map Archive
  4. Transport for London – Design Classic
  5. Yahoo! Developer Network – Where On Earth Identifiers
  6. Oxford English Dictionary – Preface to the Second Edition (1989)
  7. Yahoo! Developer Network – On Naming and Representation
  8. Stratford-on-Avon District Council – Community and Living

Posted via email from Gary’s Posterous

Location and Privacy – Where Do We Care?

As part of this year’s AGI GeoCommunity ’09 conference, I took part in the Privacy: Where Do We Care? panel on location and the implications for privacy with Terry Jones, Audrey Mandela and Ian Broadbent, chaired and overseen by conference chair Steven Feldman.

Our location is probably the single most valuable facet of our online identity, although where I currently am, whilst interesting, is far less valuable and  personal than where I’ve been. Where I’ve been, if stored, monitored and analysed, provides a level of insight into my real world activities that transcends the other forms of insight and targeting that are directed at my online activities, such as behavioural and demographic analysis.

Where I’ve been, my location stream if you will, is a convergence of online and real world identity and should not be revealed, ignored or given away without thought and without consent.

In the real world we unconsciously provide differing levels of granularity in our social engagements when we answer the seemingly trivial question “where have you been?”. To our family and close friends we may give a detailed reply … “I was out with colleagues from work at Browns on St. Martin’s Lane, London”, to other friends and colleagues we may give a more circumspect reply … “I was out in the Covent Garden area” and to acquaintances, a more generalised reply … “I was in Central London” or even “mind your own business”.

As with the real world, so we should choose to reveal our location to applications and to companies online with differing levels of granularity, including the ability to be our own source of truth and to conceal ourselves entirely; in other words, to lie about where we are.
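As a rough sketch of the idea, and nothing more than that, the snippet below picks how much to reveal based on who is asking. The audience names, the `reveal` function and its `override` parameter are all invented for illustration; the replies are the ones from the example above.

```python
# The same evening out, described at three levels of granularity,
# most detailed first.
LOCATION_BY_GRANULARITY = [
    "Browns on St. Martin's Lane, London",
    "the Covent Garden area",
    "Central London",
]

# Who gets which level. The audience names are invented for this sketch.
AUDIENCE_LEVEL = {"family": 0, "friend": 1, "acquaintance": 2}


def reveal(audience, override=None):
    """Return what to tell this audience about where I have been.

    Anyone not explicitly listed gets nothing, so sharing is opt-in.
    An override lets me be my own source of truth and say whatever I like.
    """
    level = AUDIENCE_LEVEL.get(audience)
    if level is None:
        return None
    if override is not None:
        return override
    return LOCATION_BY_GRANULARITY[level]
```

An audience that has never been opted in, an advertiser say, gets `None` back rather than a coarse location.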

Where I am in the real world should be revealed to the online world only on an opt-in basis, carefully considered and with an eye on the value proposition that is being given to me on the basis of revealing my location to a third party. My location is mine and mine alone and I should never have to opt out of revealing where I am and where I’ve been.


The Geo Ice Has Broken

Last night was the icebreaker for the AGI GeoCommunity conference in Stratford-upon-Avon (but not Stratford-on-Avon, oh no, that’s the district not the town you know) and the run up to the conference has started extremely well, with the added bonus for me that John McKerrell of used a quote from one of my decks as the #geocom landing page.

Twitter is abuzz with commentary on what’s happening and who’s going to be doing what, all accompanied by the eponymous #geocom hashtag and everyone’s hoping that the conference lives up to their expectations. As Thierry Gregorious aptly put it on Twitter “#geocom If this feed is producing messages at current rate, will people be glued to their mobiles instead of the presentations?” … we shall see.

The ice breaker dinner well and truly broke the ice and I landed up on a table full of geostrangers and Andrew Turner; as table 24 we put in a rather respectable joint second place in the 100 question quiz, but then crashed and burned to 3rd place after not being nearly accurate enough in the tie-breaker question on precisely when the Berlin Wall came down.

After a surprisingly good dinner, with surprisingly good wine, we sat through a surprising, and intriguing, comedienne who appeared to be the result of a union between Jasper Carrott and Victoria Wood. It was certainly an experience.

Finally everyone headed to the bar where some overworked and entirely good natured bar staff served us geolibations, geolagavulins and geo-gin-and-tonics until the early hours.

And the conference hasn’t even begun yet …
