Posts about open

The Challenge Of Open

One of the great things about the combination of maps, geo, location and London is that roughly once a month there's some kind of meetup happening in the city on these themes. One of the longer-running players in this space is the Geospatial Specialist Group of the British Computer Society, which is being relaunched and reinvigorated as the Location Information SG. Earlier this week I gave a talk there, but what to talk about?

It didn't take too long to come up with a suitable theme. In my current day job, consulting with open data specialists Lokku, I come across the benefits and the challenges of using open data on almost a daily basis. One of the earliest lessons is that nothing is simple and nothing is straightforward when you bring licensing into a field, and open data is no exception.

Slide01 Slide02

So, hello, I’m Gary and I’m from the Internet. I’m a self-confessed map addict, a geo-technologist and a geographer. I’m Geotechnologist in Residence for Lokku in London. I used to be Director of Global Community Programs for Nokia’s HERE maps and before that I led Yahoo’s Geotechnologies group in the United Kingdom. I’m a founder of the Location Forum, a co-founder of WhereCamp EU, I sit on the Council for the AGI, the UK’s Association for Geographic Information, I’m the chair of the W3G conference and I’m also a Fellow of the Royal Geographical Society.

Slide03

There are a lot of URLs in the slides and, rather than frantically trying to jot them all down, this is the only URL you really need to know about. If you go there right now the link will 404 on you, but sometime tomorrow this is where my slides and all my talk notes will appear.

Slide04

I've been in this "industry" for almost 25 years. I'm not quite sure what actually comprises this "industry" though; I think of it as a loose collection of software, data, geo, maps and location. Thinking back, maybe life was easier when everything was proprietary and locked up? You knew the boundaries, you knew what you could and couldn't do with software and data. You didn't need to be a part-time lawyer.

Slide05

But this isn't 25 years ago; like it or not, we're in the future.

Slide06

And the future is very much open.

Slide07

Whether it's the open source software that runs your laptop or desktop or the open source software that runs the vast majority of the internet and the web ...

Slide08

Or whether it's open data, such as OpenStreetMap or open government data, the concept of open is very much of the now, and that means we need to be able to deal with both the benefits this brings and some of the pitfalls that lie in wait for the unwary.

Slide09

One of those pitfalls is the license, that usually vast amount of frankly impenetrable legalese that is difficult to understand and seems to have been written for lawyers and not for mere mortals.

Slide10

This isn't a new thing. Think back to the days before we downloaded software in the blink of an eye. Remember shrink-wrapped software? Remember the Catch-22 of breaking the seal meaning you'd accepted the EULA that was underneath the shrink wrap?

Slide11

No one read the EULA, we just wanted to get our hands on those brand new floppy disks and then patiently feed them, one by one, to our computer to get at our new purchase.

Slide12

Even in the days of the web, where downloads have supplanted floppies, CD and DVD ROMs, we just want to get to the "good stuff". We instinctively look for the button that says "accept" or "agree" and just ... click.

Slide13

We don't read the EULA, or the terms of service, or the terms of use, or the license. In essence we're blind to what we're agreeing to and sometimes what we do agree to can be surprising.

Slide14

If you use iTunes on your phone, tablet or computer you'll have agreed to the iTunes terms of service and in doing so, scuppered your plans for taking over the world by use of anything nuclear, chemical or biological.

Slide15

If you're using Apple's Safari browser on a Windows machine, you'll also be in breach of the license which you've accepted and which clearly states that you won't run Safari for Windows on a Windows machine.

Slide16

But you may be missing out on an unexpected treat. In 2005, the makers of PC Pitstop included a clause in their EULA that promised a financial reward to anyone who read it and contacted the company. Five months and 3,000 sales after release, one person did read the EULA and was rewarded with a cheque for $1,000.

Slide17

But I am not a lawyer. I have no legal training whatsoever. With the proliferation of open source and open data it now feels that I have to be able to read the small print. If you don't read your open licenses then I would strongly recommend that you do.

Slide18

In doing so, you'll probably feel as I first did; that you're walking into a veritable minefield of clauses, exclusions and prohibitions.

Slide19

You'd be forgiven for thinking that if you're fortunate enough to be dealing with purely open licensing, with not even a whiff of anything proprietary, that everything is clear, it's all black and white.

Slide20

You'll start to become familiar with the GPL.

Slide21

With Creative Commons, with or without attribution and with or without non-commercial use clauses.

Slide22

And if you're using OpenStreetMap data, with the ODbL.

Slide23

You'd probably be forgiven for thinking that it's all cut and dried and no one can make any mistakes, especially not the big players in the industry, those with large amounts of cash and an equally large team of in-house lawyers who specialise in this sort of thing.

You'd be forgiven, but it's neither black and white nor clear cut. Let me give you an example.

Slide24

This example hinges on TechCrunch, the sometimes scathing tech blog started by Michael Arrington in 2005.

Slide25

One of the by-products of TechCrunch is CrunchBase, a freely editable database of companies, people and investors in the tech industry.

Slide26

It will probably come as no surprise that in 2007 the CrunchBase API was launched, providing access to the whole of the database under a CC-BY license.

Slide27

It's worth looking at the human readable version of the CC-BY license.

You can share: in any way, in any form.
You can adapt: remix the data, build a derived work, transform it.
You can make money: use it for any purpose, even commercial endeavours.

Slide28

Then in 2010, TechCrunch plus CrunchBase was acquired by AOL for an undisclosed but estimated figure of $25M.

Slide29

In July of 2013 an app called People+ launched using the CrunchBase data set to "know who you're doing business with".

Slide30

Four months later this came to the attention of CrunchBase's new owner, who promptly sent a series of cease and desist letters for all the wrong reasons, displaying a stunning lack of understanding of how open licenses work and what they mean.

Slide31

The first cease and desist makes the following assertions, all of which are true. Yes, People+ replicates what CrunchBase does; after all, it's based on CrunchBase. Yes, People+ exposes the CrunchBase data in a way that's far more intuitive and valuable than CrunchBase's own (web-based) search.

All of this is true. Except that none of it is in breach of the CC-BY license, which AOL clearly doesn't understand. AOL may not like the fact that someone is making a better job of its data than AOL itself is, but hurt feelings are irrelevant to whether a cease and desist is valid, and this one clearly is not.

Slide32

The second cease and desist makes AOL's hurt feelings clear. The second clause here is completely wrong. AOL can decide to forbid someone from using the API if they feel it violates their terms, but they cannot "terminate" the license to use the content. The content is free to use under the license, and there's nothing AOL can legally do about it.

Slide33

As an interesting footnote to this tale, if you look at the CrunchBase terms now, you'll note that AOL have, as of December 2013, reissued the CrunchBase data under CC-BY-NC, but they also seem to have learned a valuable lesson, noting that any data created before this date remains under CC-BY.
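For anyone consuming the CrunchBase data after that change, the license now depends on when each record was created. A minimal sketch, assuming each record carries a creation date; the `CompanyRecord` shape is hypothetical, and because the exact day in December 2013 isn't given here, the cutoff is passed in as a parameter rather than hard-coded.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CompanyRecord:
    # Hypothetical record shape; real CrunchBase records look different.
    name: str
    created: date


def license_for(record: CompanyRecord, relicense_date: date) -> str:
    """Records created before the relicensing date stay CC-BY; later ones are CC-BY-NC."""
    return "CC-BY" if record.created < relicense_date else "CC-BY-NC"


def commercially_usable(records: list[CompanyRecord], relicense_date: date) -> list[CompanyRecord]:
    """Keep only the records whose license carries no non-commercial restriction."""
    return [r for r in records if license_for(r, relicense_date) == "CC-BY"]
```

Keeping the cutoff as an explicit argument means the assumption about the exact date stays visible at every call site.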

Slide34

So even the big players can and do get open licensing wrong. That example was just over a single data set, covered under a single license and one where the license contains both the full legal terms as well as a human readable form, for those of us who aren't lawyers.

Things get much more fun when you start to try and mix open data licenses, to produce a derived or co-mingled work.

Slide35

Actually this is where the fun stops. Whilst there are co-mingled works out there on the interwebs, they are few and far between. Finding the correct path to take when attempting to rationalise two open licensing schemes is incredibly difficult. Most legal advice is to just say no.

Slide36

To take a slightly contentious view, this may be one of the reasons why none of the big players has ever produced a derived work that contains OpenStreetMap, and this may also be one of the biggest single barriers to the adoption of OSM. From speaking to various lawyers, all of whom specialise in IP and in data licenses, I've found that the main stumbling point is the "viral" nature of the share-alike clause in most open data licenses. Large companies, which have invested a considerable amount of time and effort in making their proprietary data, are unwilling to add in a data source which effectively means they have to share the derived work with the public ... and their competitors.

Slide37

Another stumbling block, admittedly one which is more down to the creators of an open data set than to the license, is that of provenance. If you take a data set, can you really be certain where all of the data came from? Did some of the data come from another source? Do you know what that source is? Do you know what license that other source is under? Do you know if the licenses are compatible?

The answer to most of these questions is usually "no". Among some members of the tech community it's a truism that an approach of "sue first, ask questions later" is often taken. Taking all of this into consideration, it gets easier to see why the default legal answer to "can we use this open data set?" is often "no".
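Answering those provenance questions is far easier if every record carries its source and license with it. The sketch below is a deliberately simplified toy model, not a description of how any real data set tracks provenance: the `Record` type, the license labels and the `SHARE_ALIKE` set are hypothetical, and it only flags a data set for a human (and probably a lawyer) to review rather than deciding whether licenses are compatible.

```python
from dataclasses import dataclass
from typing import Optional

# Toy assumption: the licenses we treat as carrying a share-alike clause.
SHARE_ALIKE = {"ODbL", "CC-BY-SA"}


@dataclass(frozen=True)
class Record:
    record_id: str
    source: Optional[str]   # where the record came from; None if nobody knows
    license: Optional[str]  # license of that source; None if unknown


def provenance_report(records: list[Record]) -> dict:
    """Summarise what is (and isn't) known about where a data set came from."""
    unknown = [r.record_id for r in records if r.source is None or r.license is None]
    licenses = sorted({r.license for r in records if r.license is not None})
    share_alike = [lic for lic in licenses if lic in SHARE_ALIKE]
    return {
        "unknown_provenance": unknown,  # records nobody can vouch for
        "licenses": licenses,           # every license feeding the derived work
        # Two different share-alike clauses may each require the derived work
        # to be released under their own terms, so flag that mix for review.
        "needs_review": bool(unknown) or len(share_alike) > 1,
    }
```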

Slide38

If there was a concerted effort on the part of the organisations behind open licenses to make their licenses compatible, to set aside or work together on differences, then maybe we'd see more widespread adoption of open data outside of the existing open data community.

Slide39

For open source licenses things are a little clearer; lots of work has been done to rationalise the differences between the GPL, LGPL, BSD, MIT, X11, Apache and all the other open licenses that are focused on code and on software.

Slide40

But for open data licenses, the picture is anything but clear. Yes, there's loads of commentary on how to approach open data compatibility but nothing that's clearly and humanly readable.

Nowhere is this more apparent than in the admission from Creative Commons that the number of other licenses that are compatible with CC licensing is ... none.

Slide41

Maybe, to bring agreement between the differing parties and factions where open data licensing is concerned, we need to put our disagreements behind us. Maybe the way forward is a new open licensing scheme, where attribution is maintained but the viral element is softened or removed.

Slide42

Maybe, but that day hasn't yet come. There have been some attempts to do this, but strangely they've yet to see widespread adoption.

Slide43

Finally, a shameless plug …

Slide44

If you like the topics of maps, of geo, of location and all points in between, then you'll probably like #geomob, the roughly quarterly meetup of like minds. The next event is on the 13th of May at the UCL campus.

Slide45

The Quest For The London Flood Map

My morning's reading today has been dominated by a map image that the UK's Environment Agency released on December 6th that, to quote the Tweet, shows "the extent of potential flooding of London if the Thames Barrier wasn't in place". If you know London at all, it's certainly an arresting image but like so many times when I encounter a map, I want to interact with it, move it, see whether where I live in London would have been impacted. So I started investigating.

Some background context is probably in order. On December 5th, the UK's Met Office issued severe weather warnings for the East Coast of England. A combination of a storm in the Atlantic to the north of Scotland, low atmospheric pressure and high tides was pushing a massive swell of water through the narrows of the English Channel, in effect squeezing the water through the Dover Strait. As the North Sea and English Channel are relatively shallow, the sea would back up and had the potential to flood large areas of the East Coast of England as well as the areas surrounding the tidal stretch of the River Thames, and that means London and possibly even where I live in Teddington, which marks the upper limit of the tidal Thames. Thankfully for those of us who live west of Woolwich, the Thames Barrier exists to protect London from such flooding, though I'm sure this is less of a comfort to those people who live to the east of the barrier.

3WxNK

But back to that map. It's a nice overlay of flood levels on the Docklands area of London based on satellite imagery. The cartography is simple and pleasing; light blue for the River Thames and Bow Creek, darker blue for the banks of the rivers and a washed out aquamarine for areas that would be flooded. But it's a static image. I can't pan and scroll it. The Tweet from the Environment Agency and the image itself contained no context as to where it came from or how it was made. So I browsed over to the Environment Agency's website in search of enlightenment.

The Environment Agency is a governmental body and that's very much apparent from the website. It simply screams corporate website produced by a large contractor. But no matter, I'm not here to critique website design; I'm here looking for a map. So I looked. I searched. If that map is on that website, it doesn't want to be found. It's the map equivalent of the plans for the bypass that would demolish Arthur Dent's house in The Hitchhiker's Guide To The Galaxy, on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying "Beware of the Leopard". But what I did find was this map ... the Risk Of Flooding From Rivers And Seas map. With this map I could finally find out what risk there was of flooding to my local area. Eventually.

Now it's only fair to state upfront that the original version of this post, from this point onwards, was less a critique of a map and much more a scathing flaying alive of a map. But thankfully, before I posted it, I'd also taken the time to read Gretchen Peterson's Getting Along: The Objective And The Subjective In Mapping. After rereading my original post, it was only too evident that calling it a critique was unfair, as it was far, far too subjective. So I rewrote it, trying to be objective wherever I could.

flood-1

So let's start ... this map has some significant flaws. The questions are why and what could be done to rectify those flaws?

flood-2

The map starts zoomed out to encompass the entirety of England, with no apparent flood information at all. There's a prompt to "enter a postcode or place name", but I know where I live so I try to zoom in by double-clicking. The map's click event is trapped and I'm told to "zoom in query the map", which I work out to mean I have to use the map's zoom slider control. But if you take the time to write code to trap clicks on a map, why not go one step further and support double-click zooming, which is by now an almost universal map navigation paradigm? This is also a flood map, so why not use my web browser's built-in geolocation facility to automatically zoom the map to where I am right now, or at least present the map at a scale where some flood information is visible? Why make the user do all of this additional work? With a few simple lines of JavaScript, the map could be made so much more immediate and easily usable.
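Once the browser's geolocation API has handed over a latitude and longitude, centring a tiled "slippy" map on that point mostly comes down to working out which tile it falls in at the chosen zoom level. A minimal sketch, written in Python for readability rather than the JavaScript a web map would actually run, assuming the common Web Mercator tiling scheme popularised by OpenStreetMap; whether the Environment Agency's map tiles this way isn't known, and the function name is illustrative.

```python
import math


def lat_lon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) slippy-map tile containing a WGS84 point at `zoom`."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    # Project the latitude with Web Mercator and scale it to the range [0, n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the antimeridian or near the poles stay in range
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

For example, `lat_lon_to_tile(51.5074, -0.1278, 10)` gives the zoom-level-10 tile covering central London; a map library would then request that tile and its neighbours and centre the view on the point.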

flood-3

So I started to zoom in, using the pan control. The next zoom level was less than visually pleasing. Jagged, blocky and pixellated place labels are scattered across the map. It's almost as if the map's tiles were hand rolled, but more about that in a minute.

When zooming, the map's centre had changed and after my initial double click zooming attempts were rebuffed, I feared that I wouldn't be able to pan the map without recourse to the pan controls. Indeed my first attempt at panning looked more as if I was trying to drag the map image out of the browser window. But then a few seconds later the map redrew itself. This was less a slippy map and much more a slow-py map.

flood-4

After zooming in a further 3 times, the pixellation on the place labels had cleared up but the map itself was washed out and faded, almost as if there was a semi transparent overlay on top of the underlying base map, which itself looked like the Ordnance Survey map style. It also looked, to be frank, a bit of a mess. Given that I was trying to find out flooding information there was far too much information being displayed in front of me and apart from the map's legend, helpfully marked legend, none of it was flood related. Yet.

flood-5

One further zoom level in and I finally found what I was looking for. A visualisation of what looked like an overflowing River Thames. At first sight this explained the washed out nature of the map I'd seen earlier. Surely this was due to an overlay containing the flooded areas but rather than overlay just the flooded area, the entirety of the map was overlaid, with the non-flooded areas being made translucent to allow the underlying map to bleed through.

The great thing about JavaScript web maps is that, if you know how, you can actually break apart the layers of the map and see how it's constructed. Doing just this led me to discover that the flood data I was seeing wasn't an overlay. With the exception of the map's pan and zoom controls, the map is a single layer. Whoever was behind the map has made their own tile set with the flood data an intrinsic part of it. All of which is extremely laudable, but at higher zoom levels the tile set just doesn't work and the choice of underlying base map leaves quite a bit to be desired.
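Whether the translucency comes from a separate overlay layer in the browser or has been baked into the tiles before they're served, the washed-out look has the same arithmetic behind it: the standard "source over" alpha-compositing rule, where each output channel is `alpha * overlay + (1 - alpha) * base`. A small sketch with illustrative colours and opacity; the actual values used on the Environment Agency map are not known.

```python
Color = tuple[int, int, int]  # 8-bit RGB


def source_over(overlay: Color, alpha: float, base: Color) -> Color:
    """Blend an overlay colour with opacity `alpha` (0..1) on top of an opaque base colour."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Each channel is a weighted average of the overlay and the base underneath it
    return tuple(round(alpha * o + (1.0 - alpha) * b) for o, b in zip(overlay, base))
```

A half-opaque white wash over black label text, `source_over((255, 255, 255), 0.5, (0, 0, 0))`, lifts it to a mid grey. That is the faded effect in a nutshell: every feature on the base map loses contrast, not just the ones near the river.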

flood-7

Finally, after several more pan and zoom operations, I could see my local area. But it had taken 7 attempts at zooming in and almost as many panning operations to keep the map centred on where I wanted to see. Now it's true that entering my postcode would have taken me there immediately, but one of the habits we've developed when viewing digital maps is to dive in and get where we want to go by interacting with the map itself and not necessarily with the map's controls.

Even when I'd found the information I wanted, the flood data seemed placed on top of the base map almost as an afterthought, despite the two data sets being baked together into a single map layer. I can appreciate the cartographical choice of using shades of blue for the two flood zones, but the pink chosen to show existing flood defences is a questionable, albeit subjective, choice. The flood data just doesn't sit well on top of the underlying Ordnance Survey map, whose style clashes with that of the flood data. Finally, and probably worst of all, the map is slow, almost to the point of being unusable. All of which makes me wonder how many people have come across this map and simply given up trying to find the information they're looking for. If only the map looked as good as the original graphic that started me on this map quest (pun intended). Surely someone could do better?

Maybe someone will. The flood zones are available via WMS from the UK's data.gov.uk site, though that very same site warns you that registration is required and that they're not under an open license. Even taking a simpler base map approach and overlaying the tiles from the WMS would make the map far more accessible and easier to comprehend. Some of the data itself looks like it could be available from the Environment Agency's DataShare site, though it's only fair to say that both this site and data.gov.uk suffer from the same lack of discoverability and ease of use as the flood map.
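Overlaying WMS data on a simpler base map is mostly a matter of asking the WMS server for one transparent image per base-map tile. A minimal sketch of building an OGC WMS 1.3.0 `GetMap` request for a given slippy-map tile, assuming the server can render in Web Mercator (EPSG:3857); the endpoint URL and layer name you'd pass in are placeholders, since the real data.gov.uk service details (and its registration requirements) aren't given here.

```python
from urllib.parse import urlencode

# Half the width of the Web Mercator (EPSG:3857) world, in metres
ORIGIN_SHIFT = 20037508.342789244


def tile_bbox_3857(x: int, y: int, zoom: int) -> tuple[float, float, float, float]:
    """Bounding box (minx, miny, maxx, maxy) of slippy tile x/y at `zoom`, in EPSG:3857."""
    tile_size_m = 2 * ORIGIN_SHIFT / (2 ** zoom)
    minx = -ORIGIN_SHIFT + x * tile_size_m
    maxy = ORIGIN_SHIFT - y * tile_size_m  # tile rows count down from the top
    return minx, maxy - tile_size_m, minx + tile_size_m, maxy


def wms_getmap_url(endpoint: str, layer: str, x: int, y: int, zoom: int) -> str:
    """Build a WMS 1.3.0 GetMap URL for one transparent 256x256 PNG matching a base-map tile."""
    bbox = tile_bbox_3857(x, y, zoom)
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,        # placeholder: the real layer name depends on the service
        "STYLES": "",           # empty means the server's default style
        "CRS": "EPSG:3857",     # easting/northing order, so no axis swap is needed
        "BBOX": ",".join(f"{v:.2f}" for v in bbox),
        "WIDTH": 256,
        "HEIGHT": 256,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",  # draw only the flood polygons; the base map shows through
    }
    return f"{endpoint}?{urlencode(params)}"
```

With a transparent PNG, the base map stays visible wherever there's no flooding, which is the opposite of baking the flood data into every tile.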

For geospatial information such as flood data, there's no better way to make it easily comprehensible and visible than on a map. The mere fact that there is such a map is to be applauded. It just could be so much better and this would take a trivial amount of technical acumen from anyone who's used to making even simplistic digital maps. This map could be amazing and shine so brightly but as it currently stands, it can only receive the same score as I saw too many times on my school report cards. "B-. Could try harder."

Image Credits: Environment Agency.

Service Suspended On The London Underground (API)

If you build it they will come. Or to put it another way, sometimes demand outstrips supply. After the phenomenal success of the Transport For London Tube API, the London Datastore blog sadly notes:

Owing to overwhelming demand by apps that use the service, the London Underground feed has had to be temporarily suspended. We hope to restore the service as soon as possible but this may take some days. We will keep everyone informed of progress towards a resolution.
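One way a client app can avoid adding to that kind of overwhelming demand is to cache the feed and only re-fetch it once the cached copy is older than some sensible interval, rather than hitting the API on every screen refresh. A minimal sketch, assuming a hypothetical `fetch` function that returns the raw feed text; the `CachedFeed` class is illustrative, and the interval is a choice for the app developer, not a figure from TfL.

```python
import time
from typing import Callable, Optional


class CachedFeed:
    """Wrap a feed fetcher so repeated reads within `ttl_seconds` reuse one response."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # hypothetical function that calls the upstream feed
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so the behaviour can be tested without waiting
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            # Only hit the upstream feed when there's no copy or the copy is stale
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Run on a server sitting between an app's users and the feed, this turns many client requests per interval into a single upstream request.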

In the meantime, if you want to see how it looks when the API is up and running, there's a video clip of Matthew Somerville's recent Science Hack Day visualisation over on my Flickr photo and video stream.

No Victoria line service after 2000 tonight. Photo Credits: Martin Deutch on Flickr.

Where's My Tube Train? Ah, There's My Tube Train

Back in December of 2009, I wrote about Paul Clarke trying to solve the problem of where's my train; that there must be a definitive, raw source of real-time (train) information and that

I assert that train operators know where their assets are; it would be irresponsible if they didn't

Whilst the plethora of train operators that fragmented from the ashes of the old British Rail network haven't answered this challenge yet, Transport for London has, opening up just such data as part of the London Datastore API. In today's age of talented web mashup developers, if you release an API, people will build things with it if the information is useful and interesting, and that's just what Matthew Somerville of mySociety did at the recent Science Hack Day ... a (near) real-time map of the London Underground showing the movement of trains on all of the Tube lines. A screen grab wouldn't do it justice and it takes a while to load, so a video grab might help here.

Coming down the escalators at Waterloo and want to know whether to head for the Bakerloo or the Northern Line to take you north of the river? Now you can tell which line has a northbound train closest to Waterloo.
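Answering the Waterloo question from a feed of live predictions boils down to filtering by direction and picking the smallest time to arrival. A minimal sketch; the `Arrival` shape, its field names and the sample numbers are all hypothetical and are not the format of the actual TfL feed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Arrival:
    line: str          # e.g. "Bakerloo" (hypothetical field)
    direction: str     # e.g. "Northbound" (hypothetical field)
    seconds_away: int  # predicted seconds until the train reaches the platform


def soonest_arrival(arrivals: list[Arrival], direction: str) -> Optional[Arrival]:
    """Return the prediction for the next train heading in `direction`, or None if there isn't one."""
    heading_our_way = [a for a in arrivals if a.direction == direction]
    return min(heading_our_way, key=lambda a: a.seconds_away, default=None)


# Made-up predictions for Waterloo, purely for illustration.
waterloo = [
    Arrival("Bakerloo", "Northbound", 240),
    Arrival("Northern", "Northbound", 90),
    Arrival("Northern", "Southbound", 30),
]
best = soonest_arrival(waterloo, "Northbound")
print(best.line if best else "No northbound trains")  # prints "Northern"
```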

Want to see just how small the gap between Leicester Square and Covent Garden on the Piccadilly Line really is? Now you can.

Of course, this doesn't solve every problem ...

1. If you're on the escalators at Waterloo, how do you get 3G data coverage to view this mashup on your phone, as Transport for London still hasn't managed to achieve cellular coverage underground, unlike Amsterdam, Berlin and other cities?
2. The site will probably be the target of a tutting campaign from the Health and Safety police, insisting that such a visualisation will cause people to run for the train and, of course, they might trip and hurt themselves.
3. If you're at the top of the escalator and the train is in the station, now, right this very minute now, how do you get down to the platforms quickly?

Whilst I can't answer the first two of these questions, this publicity stunt from Volkswagen at Berlin's Alexanderplatz U-Bahn station might just hold the solution to the third ... a slide!