The Challenge Of Open • Gary Gale

One of the great things about the combination of maps, geo, location and London is that roughly once a month there's some kind of meetup happening in the city on these themes. One of the longer running players in this space is the Geospatial Specialist Group of the British Computer Society which is being relaunched and reinvigorated as the Location Information SG. Earlier this week I gave a talk, but what to talk about?

It didn't take too long to come up with a suitable theme. In my current day job, consulting with open data specialists Lokku, I come across the benefits and the challenges in using open data on almost a daily basis. One of the earliest lessons is that nothing is simple and nothing is straightforward when you bring licensing into a field, and open data is no exception.

So, hello, I’m Gary and I’m from the Internet. I’m a self-confessed map addict, a geo-technologist and a geographer. I’m Geotechnologist in Residence for Lokku in London. I used to be Director of Global Community Programs for Nokia’s HERE maps and before that I led Yahoo’s Geotechnologies group in the United Kingdom. I’m a founder of the Location Forum, a co-founder of WhereCamp EU, I sit on the Council for the AGI, the UK’s Association for Geographic Information, I’m the chair of the W3G conference and I’m also a Fellow of the Royal Geographical Society.

There are a lot of URLs in the slides to follow and rather than try to frantically jot them down, this is the only URL you really need to know about. If you go there right now, this link will 404 on you, but sometime tomorrow this is where my slides and all my talk notes will appear.

I've been in this "industry" for almost 25 years. I'm not quite sure what actually comprises this "industry" though; I think of it as a loose collection of software, data, geo, maps and location. Thinking back, maybe life was easier when everything was proprietary and locked up? You knew the boundaries, you knew what you could and couldn't do with software and data. You didn't need to be a part-time lawyer.

But this isn't 25 years ago; like it or not, we're in the future.

And the future is very much open.

Whether it's the open source software that runs your laptop or desktop or the open source software that runs the vast majority of the internet and the web ...

Or whether it's open data, such as OpenStreetMap or open government data, the concept of open is very much of the now. That means we need to be able to deal with both the benefits this brings and some of the pitfalls that lie in wait for the unwary.

One of those pitfalls is the license, that usually vast amount of frankly impenetrable legalese that is difficult to understand and seems to have been written for lawyers and not for mere mortals.

This isn't a new thing. Think back to the days before we downloaded software in the blink of an eye. Remember shrink-wrapped software? Remember the catch-22 of breaking the seal meaning you accepted the EULA that was underneath the shrink wrap?

No one read the EULA; we just wanted to get our hands on those brand new floppy disks and then patiently feed them, one by one, to our computer to get at our new purchase.

Even in the days of the web, where downloads have supplanted floppies, CD and DVD ROMs, we just want to get to the "good stuff". We instinctively look for the button that says "accept" or "agree" and just ... click.

We don't read the EULA, or the terms of service, or the terms of use, or the license. In essence we're blind to what we're agreeing to and sometimes what we do agree to can be surprising.

If you use iTunes on your phone, tablet or computer you'll have agreed to the iTunes terms of service and in doing so, scuppered your plans for taking over the world by use of anything nuclear, chemical or biological.

If you're using Apple's Safari browser on a Windows machine, you'll also be in breach of the license which you've accepted and which clearly states that you won't run Safari for Windows on a Windows machine.

But you may be missing out on an unexpected treat. In 2005, the makers of PC Pitstop included a clause that promised a financial reward for reading the EULA and contacting the company. Five months after release and 3,000 sales later, one person did read the EULA and was rewarded with a cheque for $1000.

But I am not a lawyer. I have no legal training whatsoever. With the proliferation of open source and open data it now feels that I have to be able to read the small print. If you don't read your open licenses then I would strongly recommend that you do.

In doing so, you'll probably feel as I first did; that you're walking into a veritable minefield of clauses, exclusions and prohibitions.

You'd be forgiven for thinking that if you're fortunate enough to be dealing with purely open licensing, with not even a whiff of anything proprietary, everything is clear and it's all black and white.

You'll start to become familiar with the GPL.

With Creative Commons, with or without attribution and with or without non-commercial use clauses.

And if you're using OpenStreetMap data, with the ODbL.

You'd probably be forgiven for thinking that it's all cut and dried and no one can make any mistakes, especially not the big players in the industry, those with large amounts of cash and an equally large team of in-house lawyers who specialise in this sort of thing.

You'd be forgiven, but it's not black and white nor is it clear cut. Let me give you an example of this.

This example hinges on TechCrunch, the sometimes scathing tech blog started by Michael Arrington in 2005.

One of the by-products of TechCrunch is CrunchBase, which is a freely editable database of companies, people and investors in the tech industry.

It will probably come as no surprise that in 2007 the CrunchBase API was launched, providing access to the whole of the database under a CC-BY license.

It's worth looking at the human readable version of the CC-BY license.

You can share - in any way, in any form
You can adapt - remix the data, build a derived work, transform it
You can make money - this is for any purpose, even commercial endeavours

Then in 2010, TechCrunch, together with CrunchBase, was acquired by AOL for an undisclosed sum, estimated at $25M.

In July of 2013 an app called People+ launched using the CrunchBase data set to "know who you're doing business with".

Four months later this came to the attention of CrunchBase's new owner, who promptly sent a series of cease and desists for all the wrong reasons, displaying a stunning lack of understanding of how open licenses work and what they mean.

The first cease and desist makes the following assertions, all of which are true. Yes, People+ replicates what CrunchBase does; after all, it's based on CrunchBase. Yes, People+ exposes the CrunchBase data in a way that's far more intuitive and valuable than CrunchBase's own (web based) search.

All of this is true, except that none of it is in breach of the CC-BY license, which AOL clearly doesn't understand. AOL may not like the fact that someone is making a better job of their own data than AOL is, but having hurt feelings is irrelevant in the context of whether a cease & desist is valid, and this one is clearly not.

The second cease and desist makes AOL's hurt feelings clear. The second clause here is completely wrong. AOL can decide to forbid someone from using the API if they feel it violates their terms, but they cannot "terminate" the license to use the content. The content is free to use under the license, and there's nothing AOL can legally do about it.

As an interesting footnote to this tale, if you look at the CrunchBase terms now, you'll note that AOL have, as of December 2013, reissued the CrunchBase data under CC-BY-NC. They also seem to have learned a valuable lesson, noting that any data that was created before this date remains under CC-BY.

So even the big players can and do get open licensing wrong. That example was just over a single data set, covered under a single license and one where the license contains both the full legal terms as well as a human readable form, for those of us who aren't lawyers.

Things get much more fun when you start to try and mix open data licenses, to produce a derived or co-mingled work.

Actually this is where the fun stops. Whilst there are co-mingled works out there on the interwebs, they are few and far between. Finding the correct path to take when attempting to rationalise two open licensing schemes is incredibly difficult. Most legal advice is to just say no.

To take a slightly contentious view, this may be one of the reasons why none of the big players have ever produced a derived work that contains OpenStreetMap, and this may also be one of the biggest single barriers to adoption of OSM. From speaking to various lawyers, all of whom actually specialise in IP and in data licenses, the main stumbling block is the "viral" nature of the share alike clause in most open data licenses. Large companies, who have invested a considerable amount of time and effort in making their proprietary data, are unwilling to add in a data source which effectively means they have to share the derived work with the public ... and their competitors.
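To make the "viral" point concrete, here is a minimal sketch, in Python, of how one share alike source reaches the whole of a co-mingled work. The `Source` class, the `share_alike_sources` function and the example data are all hypothetical names invented for illustration; the only facts assumed are that the ODbL carries a share alike clause and that a proprietary license does not. It is a toy model of the reasoning, not legal advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One input to a co-mingled work (hypothetical model, for illustration only)."""
    name: str
    license_id: str
    share_alike: bool  # True if the license carries a share alike clause


def share_alike_sources(sources):
    """Return the names of the sources whose share alike clause would reach the derived work."""
    return [s.name for s in sources if s.share_alike]


# Hypothetical mix: a company's own data plus an OpenStreetMap extract.
mix = [
    Source("in-house proprietary data", "proprietary", False),
    Source("OpenStreetMap extract", "ODbL", True),
]

triggers = share_alike_sources(mix)
if triggers:
    print("Derived work may have to be shared alike because of:", ", ".join(triggers))
else:
    print("No share alike source in the mix")
```

A single share alike input is enough to flag the whole mix, which is roughly the conversation that ends with the legal team saying no.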

Another stumbling block, admittedly one which is more down to the creators of an open data set rather than the license, is that of provenance. If you take a data set, can you really be certain where all of the data came from? Did some of the data come from another source? Do you know what that source is? Do you know what license that other source is under? Do you know if the licenses are compatible?

The answer to most of these questions is usually "no". It's a truism of some members of the tech community that an approach of "sue first, ask questions later" is often used. Taking all of this into consideration, it gets easier to see why the default legal answer to "can we use this open data set" is often "no".
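The provenance questions above can be sketched as a simple checklist in Python. Everything here is hypothetical and made up for illustration: the `Contribution` class, its fields, the function names and the example records. The sketch only covers the "do you know the source?" and "do you know its license?" questions; whether two known licenses are actually compatible is a legal judgement that this toy deliberately does not attempt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contribution:
    """One part of a data set (hypothetical model, for illustration only)."""
    description: str
    origin: Optional[str] = None      # where the data came from, if known
    license_id: Optional[str] = None  # license of that origin, if known


def unanswered_questions(contributions):
    """List every provenance question we cannot answer for this data set."""
    problems = []
    for c in contributions:
        if c.origin is None:
            problems.append(f"{c.description}: origin unknown")
        elif c.license_id is None:
            problems.append(f"{c.description}: license of {c.origin} unknown")
    return problems


def can_we_use(contributions):
    """Cautious default: any unanswered question means the answer is no."""
    return not unanswered_questions(contributions)


# Hypothetical data set with one well documented part and one of unknown origin.
dataset = [
    Contribution("street names", origin="OpenStreetMap", license_id="ODbL"),
    Contribution("postcode centroids"),
]

print(can_we_use(dataset))
for problem in unanswered_questions(dataset):
    print(" -", problem)
```

Even in this toy form, one contribution of unknown origin is enough to flip the answer to "no", which mirrors the default legal position described above.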

If there were a concerted effort on the part of the organisations behind open licenses to make their licenses compatible, to set aside or work together on differences, then maybe we'd see more widespread adoption of open data outside of the existing open data community.

For open source licenses things are a little clearer; lots of work has been done to rationalise between GPL, LGPL, BSD, MIT, X11, Apache and all the other open licenses that are focused on code and on software.

But for open data licenses, the picture is anything but clear. Yes, there's loads of commentary on how to approach open data compatibility but nothing that's clearly and humanly readable.

Nowhere is this more apparent than in the admission from Creative Commons that the number of other licenses that are compatible with CC licensing is ... none.

Maybe, to bring agreement between the differing parties and factions where open data licensing is concerned, we need to put disagreements behind us. Maybe the way forward is a new open licensing scheme, where attribution is maintained but with the viral element softened or removed.

Maybe, but that day hasn't yet come. There have been some attempts to do this but, strangely, they've yet to see widespread adoption.

Finally, a shameless plug …

If you like the topics of maps, of geo, of location and all points in between, then you'll probably like #geomob, the roughly quarterly meetup of like minds. The next event is on the 13th of May at the UCL Campus.