Posts Tagged: data


3
Mar 10

Reclaim and Own Your Short URLs

There are many reasons to like URL shorteners such as bit.ly and tinyurl.com. These free services take a long URL, such as the one for this post (http://www.vicchi.org/2010/03/03/reclaim-and-own-your-short-urls), and compress it down to a much more manageable shortened version: http://bit.ly/aG1RBx or http://tinyurl.com/ylaodny.

They increase link sharing. The vast majority of social networking sites use 140 characters as the maximum size for an update, so using the full version of a URL you’re sharing reduces the space left for your own thoughts. Just compare the full URL http://www.vicchi.org/2010/03/03/reclaim-and-own-your-short-urls at 64 characters against http://bit.ly/aG1RBx at 20 characters.

They can track and yield click and referrer information. The information that bit.ly provides is particularly useful, showing live clicks along with geographic and referrer information, amongst others.

another awesome bit.ly site down graphic

But almost a year ago, Delicious founder and ex-Yahoo! Joshua Schachter made some pretty compelling arguments against short URLs:

The worst problem is that shortening services add another layer of indirection to an already creaky system. A regular hyperlink implicates a browser, its DNS resolver, the publisher’s DNS server, and the publisher’s website. With a shortening service, you’re adding something that acts like a third DNS resolver.

But the biggest burden falls on the clicker, the person who follows the links. The extra layer of indirection slows down browsing with additional DNS lookups and server hits. A new and potentially unreliable middleman now sits between the link and its destination. And the long-term archivability of the hyperlink now depends on the health of a third party.

Or to put it another way, you no longer own your links or the click data that those links yield. If the service dies, your links break, pure and simple, and that does happen, as the demise of the original tr.im and cli.gs services shows.

Get used to it... tr.im is currently unavailable

But there is a way to take all the benefits that short URLs offer and keep ownership of your links and of all the data that clicks on those links give you: run your own URL shortening service. That is precisely what I’ve done with vtny.org, which runs the YOURLS code behind the scenes. This gives me all the benefits and metrics that other URL shorteners provide, with the added and crucial benefit that I now own the links and the data they generate, in this case via the vtny.org/4 short URL.
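To make the idea of owning both the links and the click data concrete, here is a minimal sketch of what a self-hosted shortener does at its core. It is not the YOURLS code: the `Shortener` class, the `encode` helper, the `example.org` domain and the choice of sequential IDs rendered in base 36 are all assumptions made purely for illustration.

```python
import string

# Alphabet for short codes: digits followed by lowercase letters (base 36).
# This choice is an assumption for the sketch, not a claim about any real service.
ALPHABET = string.digits + string.ascii_lowercase


def encode(number: int) -> str:
    """Convert a non-negative integer ID into a short base-36 code."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    base = len(ALPHABET)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class Shortener:
    """Toy in-memory shortener: the owner holds the links and the click data."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self.links = {}    # short code -> long URL
        self.clicks = {}   # short code -> list of referrers, one per click
        self.next_id = 1   # sequential ID for the next link

    def shorten(self, long_url: str) -> str:
        """Store the long URL and return the short URL that points at it."""
        code = encode(self.next_id)
        self.next_id += 1
        self.links[code] = long_url
        self.clicks[code] = []
        return f"http://{self.domain}/{code}"

    def resolve(self, code: str, referrer: str = "") -> str:
        """Record the click and return the long URL to redirect to."""
        long_url = self.links[code]  # raises KeyError for an unknown code
        self.clicks[code].append(referrer)
        return long_url


if __name__ == "__main__":
    service = Shortener("example.org")
    short = service.shorten(
        "http://www.vicchi.org/2010/03/03/reclaim-and-own-your-short-urls"
    )
    print(short)
    print(service.resolve("1", referrer="twitter.com"))
    print(service.clicks)
```

A real service would persist the two dictionaries in a database and answer each lookup with an HTTP redirect, but the ownership point is the same: whoever runs this code holds both the mapping and the click log.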

The URL shortener at vtny.org goes live

Photo credit: playerx and revrev on Flickr
Written and posted from home (51.427051, -0.333344)

11
Dec 09

Geographic and Transport Data; a Tale of Capriciousness, Whimsy and Downright Insanity

The industry I work in thrives on data; we consume loads of the stuff and in turn we generate petabytes of it. I’m talking about data in general, not the geographic, mapping or place data that I usually write about.

But the longer I work in the Internet industry the more convinced I become that, as an industry, we need to get our act together. How else to explain the bizarre, rapidly changing and capricious nature of how we gain access to, use, pay for, don’t pay for and disseminate data?

We’re socially conditioned to assume that free does not equate to good, hence the adage “there’s no such thing as a free lunch”. So stuff that costs is good and stuff that’s free isn’t. But normal rules don’t apply here.

Let’s take geographic data; I’m on home ground here so this should be relatively straightforward.

The proprietary data vendors, Navteq, Tele Atlas and others, charge for their data and limit what you can and can’t do with it. OpenStreetMap, on the other hand, charges nothing for its data and only places limits on it to protect it, by way of the Creative Commons Attribution Share Alike license.

So naturally the data you pay for should be good and the data you don’t pay for should be … less than good. Naturally.

Except OpenStreetMap data isn’t less than good. UCL’s Muki Haklay summed this up neatly as “How good is OpenStreetMap? Good enough” at the OpenStreetMap conference in Amsterdam this year. Conversely, the proprietary data vendors don’t always get it right. One data vendor, who will remain anonymous, shipped a release of data with wildly incorrect centroids, the lat/long coordinate which represents the nominal centre of a place, which meant that amongst others, Covent Garden ended up being centred on Holborn Underground Station.

This isn’t an isolated incident.

On the one hand, the City of Vancouver in British Columbia makes its data, all of its data, free and open. On the other hand, the City of Tempe in Arizona decides to charge a “fair approximation of market value” for its data, which as James Fee recently discovered means that you’ll need to cough up $100,000 to use it commercially.

In San Francisco, BART, the Bay Area Rapid Transit, makes its data, which includes train times, freely available and takes a refreshingly prosaic approach to accessibility and licensing.

Getting an API key: Psyche: you don’t need one. We’re opting for “open” without a lot of strings attached. Just follow our simple License Agreement, give our customers good information and don’t hog resources. If that doesn’t work for you, we can certainly manage usage with keys and write more terms and conditions. But who wants that?

Here in the UK, TfL (Transport for London) give you some data for free, but not the train times, and for overground trains the Association of Train Operating Companies (pdf link) value this data at a staggering £27,430 per year.

And elsewhere in the world, other operators are closing down people who want to use this data, in New York, in Berlin and in New South Wales, and we can’t really seem to work out who owns the data and whether there’s intellectual property being infringed or a public service being undertaken.

… and don’t even talk about the British postal code data, which was closed, was then going to be opened up, but now isn’t. Apparently.

With all the data we consume and emit, we spend a lot of time and effort evangelising APIs and web services that use it. But as an industry we really need to start to act clearly and consistently in order to be taken seriously and in order for the Internet industry to realise the potential that we all think it’s capable of.

Posted via email from Gary’s Posterous


16
Nov 09

The (Geo) Data Dichotomy Dilemma

Before Web 2.0, before mashups, before FreeOurData.org.uk and other pleas, before the Internet itself, things used to be so much simpler for geo data. You were either an end user and accessed the data as a map or you were a GIS Professional and accessed the data via a (frequently very expensive and very specialised) Geographical Information System. But now we have geo data, lots of geo data, some of it free, some of it far from free, both in terms of usage and cost, and a fundamental problem has replaced the paucity of data.

Everyone wants free, open, high quality geo data and no one wants to pay for it. But it’s not quite that simple.
The recent acquisitions of Tele Atlas and Navteq, the two big global geo data providers, by TomTom and Nokia respectively show the inherent value in owning data. But owning the data isn’t enough any more as the market for licensing the data is a shrinking one, despite the phenomenal growth of the satnav market, both in car and on mobile handsets. Why is the market shrinking? Because no one wants to pay for it, at least directly.
TomTom, primarily a hardware vendor, are diversifying into the software and data market and seem to be concentrating on the PND (personal navigation device) usage of the data, although we’ve yet to see how the outlay necessary to acquire Tele Atlas, coupled with the overall economic downturn, will affect their overall 2009 earnings. Their Q1 2009 report somewhat dryly notes that “market conditions were challenging” and that “we are making clear progress with the transformation of Tele Atlas into a focused business to business digital content and services production company”. There may be other aspirations at play here but for now at least, the company is keeping quiet.
Nokia, also primarily a hardware vendor in the form of mobile and cellular handsets, are also moving away from their roots and into a wider market, presumably in an attempt to stop the encroachment of upstarts such as HTC, Apple and RIM into Nokia’s traditionally strong smartphone heartland. Again, Nokia has yet to make a public play into this arena but all the composite elements are in place to enable this to happen.
Taking the opposite route, Google, which started off as a software player, are now moving to being a player in the data market by gathering high quality geo and mapping data under the smokescreen of gathering Street View imagery. This has allowed them to gather sufficient data to supplant Tele Atlas as a data provider, at least in the Continental United States.

All three companies are either making or have the prospect of making determined plays in the location space but all three of them have ways of leveraging the value inherent in their data. Google has their unique users, their search index and a vast amount of advertising inventory; TomTom their satnav customers; Nokia their handset customers, albeit one level removed with the Mobile Network Operators as an uneasy partner and intermediary.
So what of the open data providers? It’s important to remember here that open doesn’t always mean free, it means the ability to create derived works and to use the data in ways that the originator may not have immediately foreseen. True, a lot of open data is free, but even then it’s the Free Software Foundation’s definition of the word.
“Free (software) is a matter of liberty, not price. To understand the concept, you should think of free as in free speech, not as in free beer.”
The poster child of open geo data is OpenStreetMap, the “free editable map of the world”. Founded in 2004 by Steve Coast, OSM has enjoyed phenomenal growth in users and in contributions of data that can be used anywhere and by anyone and which espouses the values of free as in speech and as in beer. As with all community or crowd sourced collaborative projects, OSM’s challenge is to sustain that growth and, once complete coverage of a region is reached, to keep that coverage fresh, current and valid. We’ll leave aside the fact that complete coverage is an extremely subjective concept and means many things to many people.
Traditionally strongest in urban regions, one of OSM’s other key challenges is to match the expectations of their user community who consume that data rather than those who create it. Both internationalisation of the data and expansion out of the urban conurbations will potentially prove challenging in the years to come. That’s not to say OSM isn’t a significant player in this space and the quality of the data, though varying and in some places duplicated, is for the majority of use cases, good enough. This was backed up by research undertaken by Muki Haklay of UCL which answered the perennial question of “how good is OSM data” with a pithy “good enough”.
Attempts to capitalise on and monetise the success and data corpus of OSM through the venture capital funded Cloudmade have yet to deliver on the promise; apart from releasing a set of APIs, Cloudmade has announced the loss of their OpenStreetMap Community Ambassadors and the closure of their London office. All of which lends credence to the idea that simply owning the data isn’t enough.
So how to solve the dichotomy of geo data? Everyone wants it but no one’s willing to pay for it, with the exception of the big players: the Googles, the Yahoos and the Microsofts of the world. Meanwhile, control of the proprietary data sources has centralised into TomTom and Nokia, both of whom are well placed to capitalise on their data assets but who haven’t yet delivered on that promise.
Maybe the answer is twofold. Firstly, develop an open attribution model whereby the provenance of an atom of data can be tagged and preserved; this would remove a lot of the prohibitions on creating derived works, as the original data provenance could still be maintained. Secondly, allow limited usage of proprietary data at varying levels of granularity, accuracy and currency, thus creating a freemium model for the data and stimulating developer involvement in donating data to the community as a whole.
It’s too early to see whether this will come to pass or whether an already tight hold on the data will become tighter still.

Posted via email from Gary’s Posterous


7
Oct 09

O2 in Positive Customer Service Shock?

O2, the UK Telefonica brand and soon-to-be-losing-the-iPhone-exclusivity-to-just-about-anyone mobile operator, have a reputation which is, to be honest, just a little bit crap. Their coverage in the rural wilds of Central London, especially around Soho and Covent Garden, seems to be scaled for a single user and a web search for “o2 customer service problems” throws up such gems as “O2 customer service consists of PAY UP OR ELSE” and “O2’s customer service has to be the poorest I have ever come across”.

So we’ll leave aside for one moment the fact that I have to pay an additional £20.00 for a measly 10MB of data when abroad via O2’s Data Abroad 10 bolt on and accept that I ordered this to be added to my account so I could use data on my iPhone when in the US for this week’s Open Hack NYC.

The first mailed response from O2 didn’t inspire confidence.

“Hi, Thanks for getting in touch. We’ll look into your query and get back to you as quickly as we can, normally within 24 hours.”

So I waited and less than 24 hours later I got this:

“Good Morning Gary. Thanks for emailing us about adding the 10Mb Data Roaming Bolt On to your account.

Gary, you’ll be pleased to know that I’ve added the 10Mb Data Roaming Bolt On to your account effective from your next bill onwards (10 October 2009).  You’ll be charged £17.02 excluding VAT (Value Added Tax) per month for this Bolt On.

If you want to add the above Bolt On on a different date, please reply to this email and we’ll help you further.”

Data roaming on; WIN. Data roaming on from the date of my next bill and after the event in New York; FAIL.

So I asked them, nicely.

“I’m having to travel at very short notice so I really need this up and running from my first day out of the country which is this Wednesday, October 7th. Can the bolt on start date be brought forward to this day?”

That automated reply came back again:

“Hi, Thanks for getting in touch. We’ll look into your query and get back to you as quickly as we can, normally within 24 hours.”

I’d expected a cut-and-paste response that they could only start services such as this on the first day of a new monthly bill, which basically means minimal work for them and maximum inconvenience for the customer. Then this morning I got this, which was emphatically not what I was expecting.

“Good Evening Gary. Thanks for emailing us as you want to pre-phone your Bolt On start date. I’ve pre phoned your Bolt On start date to 07 October 2009 as requested by you. Important – When you email us please provide: your date of birth, postcode and mobile number as it helps us answer your query faster”

So fair play to you O2; I’m not entirely sure what pre-phoning is and a bit surprised that you expect me to provide personal data including my date of birth and postal code in every email, but I went into this dialogue with you with zero expectation of success and you pleasantly surprised me. Now if we can just fix that “No Service” in Central London …

Posted via email from Gary’s Posterous


31
Jul 09

Deliciousness: data, licensing, WordPress autosaves, cheese in space and lots of Nutella

More intriguing, interesting and just plain bonkers stuff from the information hose pipe we call the internet:

  • Starting off with a serious note, Ed Parsons, my opposite number at Google, wrote a great blog post on the knots that data licensing can tie you up in and why you end up paying more for a leased digital version than you do for the physical paper version.
  • WordPress started bugging me about an auto-saved version of a blog post I didn’t want to keep but couldn’t get rid of. Turns out there’s no way to do this from the WordPress dashboard but some MySQL hackery did the trick.
  • “I am, and am VERY badly affected by being in close proximity to WiFi and other microwave transmission sources. Not that I’d expect you or anyone else who isn’t adversely affected to believe me”. The rest of the story on the Daily Telegraph blog is priceless.
  • Ofcom confirmed what anyone with a UK ADSL line already knows, that the average UK broadband speed is just over half of what’s being advertised and paid for.
  • A US highway exit sign got every word misspelled, apart from the word “exit”.
  • Forget putting men on Mars or getting the Space Shuttle working; we put cheese into space, tracked it, lost it and found it again. Makes you proud to be British.
  • Someone likes Nutella. A lot.
  • And finally, if your iPhone gets a text message containing a single square character, turn it off. Turn it off now.