Data Where?

Posted on 4 November 2013 by Andrew McHugh

Introduction by Prof. Lilian Edwards

Below is a blog post originally published on the personal website of Professor Derek McAuley, head of the Horizon Digital Economy Hub and Doctoral Training Centre and lead for Nottingham as partner in CREATe. I have added a short new introduction to put into context why the work outlined below is an integral and vital part of the CREATe work programme.
Horizon is CREATe’s major partner looking at the creative industries’ problems from the viewpoint of technology and computing science. Specifically, Horizon took on the Herculean job of considering a number of interlocked problems. First, the Internet is obviously the source of, and platform for, much of the new creative and innovative activity in modern society. It plainly cannot be ignored by a Centre devoted to promoting the creative industries.

On the other hand, many of the most significant new data-intensive Internet platforms and players – social networks like Facebook, Twitter, Pinterest and Tumblr, search engines like Google and hosting platforms like YouTube – which bring the greatest social benefit and encourage economic growth, are also data-intrusive. They largely survive and, indeed, thrive on a business model where the customer or audience does not pay for access to information, creative works or useful services directly in hard cash, but by giving away their personal data (most often without much conscious knowledge, and with consent that is no more than formal and illusory). As a result, social networks and Internet intermediaries are becoming key problem actors in a general landscape of incursion into, and reduction of, personal privacy in the modern information society.

To make matters worse, while the Internet is borderless, the key intermediaries tend to be US-based – and US privacy law is acknowledged to be one of the weakest in the world (for the private and commercial sectors anyway). Recent events, such as the Snowden revelations concerning NSA and GCHQ covert and possibly illegal surveillance of social media and cloud storage, have made citizens and consumers alike suddenly more aware of how vulnerable their personal data and communications are online – and this in turn may be bad news for industry, including the creative industries of the UK and EU, if it reduces trust in use of the Internet generally.

As a result CREATe tasked itself (foresightfully, in 2011!) to build, as a pilot, an innovative open intermediary platform, useable by the public for social networking and by artists and creators to reach new audiences and collaborate in new ways; all within a supportive and privacy friendly environment.

Early work at Horizon has shown that this project, intended to last the 4 years of CREATe’s lifetime, requires whole new ways of thinking – or perhaps old ways we have forgotten – about what “networking” means. Why do we need to share all our data with Facebook or with Google to get what we really want, which is, say, search, contact with our friends, or storage for our holiday photos or our artworks for sale? Can we not keep our data in our own secure non centralised stores – “Personal Data Containers” or PDCs – and exchange what data we need, on our own terms, to get just what services and connections we want, without fearing our privacy is being breached? Do we need the cloud, or can we store our data on our own servers, phones, laptops, etc and get those gadgets to speak to each other without involving a third party company who may be under state pressure to spy or leak? Can we perhaps play with and make money out of our own data rather than giving it away to Google? Are there opportunities here for UK and EU industry?

This is a challenging approach, but one which is receiving serious attention from a number of very different directions: inter alia, the UK MiData project, Eben Moglen’s proposal for a “Freedom Box”, the social network start-up Krowdthink, and the proposals from Europe for a right of data interoperability as part of the draft Data Protection Regulation process. Market signals from worlds like cloud computing and social networking, as well as speeches by the likes of Viviane Reding in the EU, show a pressing public demand for this kind of work.

CREATe cannot build the new open, distributed, privacy-secure Facebook tomorrow – network effects are a big part of Internet business models, as is scale – but it can pilot and model the techniques which will allow such a new Facebook to be built. We hope to bring you a series of blogs explaining what we are doing. The first is below. This work already involves technologists and privacy and IP lawyers; soon we would like to widen it to include creators, artists and the creative industries generally. If you would like to be part of this work, please let us know!

Data Where?

by Prof Derek McAuley, University of Nottingham (this was originally published on Prof McAuley’s personal blog)

[Image: @gikii and #gikii2013 on Twitter]

Attending GikII gave me a great opportunity to talk to folks at the junction of law and technology who concern themselves greatly with personal privacy and see a sea of tech washing over the population that causes great concern. I’d been laying out my strategy for dealing with the current tech and our research plans in this space, and was encouraged to get it written down in an easy to read version – i.e. not our research publications! So here goes…

The desire to access information anywhere has been leading to an increasing centralization of services into the cloud so that one can have access to email, files, contacts, etc. from anywhere – I’ll refer to this as “data in the cloud”. Following closely on this has been a series of applications, either built-in (MacOS Mail, Android Mail), free (Dropbox, SkyDrive) or purchased (Outlook), that synchronize contents between the cloud servers and mobile devices and computers in the background – “data sync with cloud”. This is done so that when the user interacts with the application, both the application and its data are local, which makes it more responsive and able to operate even when disconnected. One logical and privacy-enhancing conclusion to this trend is to arrange for the devices to synchronize information directly with each other and forget about maintaining a copy in the cloud – “data on my devices”.

[Image: Crazy file sharing icons]

These “data on my devices” services are already emerging for file sharing – I currently run seven file-sharing applications, which fall into two distinct categories.

“Files in the cloud” services include Dropbox, SkyDrive, GoogleDrive, Memopal and SpiderOak. The first three all maintain an unencrypted copy of my files in the cloud, while the latter two assert they store encrypted copies of files – your level of PRISM-related paranoia will dictate whether you trust the encryption of the latter, but for the big three you need to trust the provider to maintain confidentiality. Hence, I use these services for my research talks, publications and random other storage uses where the information is not private or personal; the consequences of a breach of confidentiality for this information are nothing more than a minor irritation – someone sees a work-in-progress paper or a half-baked presentation.

For private and personal information, including any data relating to other people, I use services that synchronize files across my devices without maintaining a copy in the cloud or ever seeing the contents; examples include BitTorrentSync and AeroFS. In these examples the cloud service merely provides the means for devices to find each other, and possibly provides an encrypted forwarding service if they cannot communicate directly (e.g. your iPad in the Internet cafe talking to your home computer behind your home router).
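
The division of labour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how BitTorrentSync or AeroFS actually work: the `Device`, `Rendezvous` and `choose_path` names are hypothetical, the addresses are documentation placeholders, and the rule "connect directly if either side accepts inbound connections, otherwise relay" is a deliberate simplification of real NAT traversal.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Device:
    name: str
    address: str
    accepts_inbound: bool  # False when hidden behind a home router or cafe Wi-Fi


class Rendezvous:
    """Hypothetical lookup service: it stores addresses only, never file contents."""

    def __init__(self) -> None:
        self._devices: Dict[str, Device] = {}

    def register(self, device: Device) -> None:
        self._devices[device.name] = device

    def lookup(self, name: str) -> Optional[Device]:
        return self._devices.get(name)


def choose_path(rendezvous: Rendezvous, source: Device, target_name: str) -> str:
    """Return 'direct' if one side can be dialled, else 'relay'."""
    target = rendezvous.lookup(target_name)
    if target is None:
        raise LookupError(f"{target_name} has not registered")
    if source.accepts_inbound or target.accepts_inbound:
        return "direct"
    # Neither side is reachable: fall back to a forwarding service that
    # would only ever carry end-to-end encrypted bytes (not modelled here).
    return "relay"


service = Rendezvous()
ipad = Device("ipad", "203.0.113.7", accepts_inbound=False)
home_pc = Device("home-pc", "198.51.100.20", accepts_inbound=False)
service.register(ipad)
service.register(home_pc)
print(choose_path(service, ipad, "home-pc"))  # prints "relay"
```

The point of the sketch is what the cloud component does not hold: it knows where devices are, not what they exchange.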

The impending serious concern is the simultaneous arrival of the “Internet of Things” and “personal data stores” – the scope for dangerous privacy breaches if these services are all in the cloud is significant. I have a simple take on this – don’t put the data in the cloud, synchronize it across your devices and run applications locally. I mean – it’s not like we don’t know how to do it and still make it easy for the user.

To this end we have been developing Nymote as a general solution for secure data synchronization across computing systems, one use of which is to securely store and share private information across my personal devices. Nymote is composed of three elements: Signpost, Irminsule and Mirage. Signpost provides the cloud service that allows your devices to find each other and establish secure communications paths. Irminsule provides the distributed data store, which moves beyond files to provide a robust database that allows simultaneous conflicting updates to data items on different devices with application hooks for their resolution – simply a more useful building block than files. Mirage is the underlying runtime environment that, in its most secure instantiation, runs within its own virtual machine.
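
To make "simultaneous conflicting updates with application hooks for their resolution" concrete, here is a minimal Python sketch of a three-way merge over a key-value store. It is not Irminsule's API: `three_way_merge`, the `resolve` hook signature and the contact data are all hypothetical, chosen only to show the idea that the store merges what it can and hands genuine conflicts to the application.

```python
from typing import Callable, Dict, Optional

Store = Dict[str, str]
# Hypothetical hook signature: (key, ancestor, mine, theirs) -> resolved value.
# A value of None means "the key is absent on that side".
Resolve = Callable[[str, Optional[str], Optional[str], Optional[str]], Optional[str]]


def three_way_merge(ancestor: Store, mine: Store, theirs: Store, resolve: Resolve) -> Store:
    """Merge two divergent copies against their last common version."""
    merged: Store = {}
    for key in sorted(set(mine) | set(theirs)):
        base, a, b = ancestor.get(key), mine.get(key), theirs.get(key)
        if a == b:            # both sides agree
            value = a
        elif a == base:       # only the other device changed it
            value = b
        elif b == base:       # only this device changed it
            value = a
        else:                 # genuine conflict: ask the application
            value = resolve(key, base, a, b)
        if value is not None:
            merged[key] = value
    return merged


def keep_both(key: str, base: Optional[str], a: Optional[str], b: Optional[str]) -> str:
    """Example application hook for a contacts app: keep every address."""
    return "; ".join(sorted(v for v in (a, b) if v is not None))


ancestor = {"alice": "alice@old.example"}
phone = {"alice": "alice@new.example", "bob": "bob@example.org"}
laptop = {"alice": "alice@work.example"}

merged = three_way_merge(ancestor, phone, laptop, keep_both)
print(merged)
```

Here the phone's new contact for Bob merges silently, while the two different edits to Alice's address go to the hook, which in this example chooses to keep both. A different application could supply a different hook, which is what makes this a more useful building block than files.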

That’s the tech side, aiming to build in “privacy by design”; it still needs the underpinning legislation for consumer protection in digital services rather than informed consent (blogs passim), especially if we are going to roll it out as a legal obligation to companies. So over to you JR…