Data is being created and collected everywhere, all the time these days. By just reading this page, you created several thousand durable bits of new data spread across several infrastructures. These bits, like grains of sand, just begin to collect. From web sites, phones, cars, smart cities and dumb terminals, it collects. Personal data accretes into vast stores waiting for analysis by algorithms and analysts. We know that part well.

What we don’t know well: how to put the I back into Private. As concerns over privacy and personal data heat up, we will need a new kind of service model for personal data. Not only must it provide for identity and ownership, it must also play to the technical advantages of the cloud.

Thirty Years of Data

Creating data by just clicking is nothing new. But the magnitude of it has changed: data is now forever. Back in the 1990s, you stored a file on a floppy disk and that was sort of its permanent home. You could back it up on another disk, but that was your choice. Floppies went bad, and the work went with them.

Today, a file is in the cloud. In reality, it’s on some number of disks, spread across arrays, and if any disk fails, a copy is resurrected. The data essentially becomes immortal. So, like sand in giant dunes, the data is pushed around in heaps by the automated processes of the clouds themselves.

These heaps are made up of people’s pictures, videos, documents, and the mountains of sensor and logging data from their phones, cars, and smart homes. They also contain security camera video, payroll records, mortgage originations, medical records, just about everything you can think of. Today, it’s all buckets of data that are poured around like liquid, except that in the 2020s, the data doesn’t evaporate. On a free-tier storage account, the pictures in my album should theoretically last until the end of time.

Forever and a Day

This is a huge change from only having printed photos, papers and magnetic media, all of which degrade in mere decades. Even optical storage, like DVD-ROMs, isn’t forever. We’re only about 20 years into this new stage of never forgetting anything, but how we handle it hasn’t kept up.

I went back and took a look at my own data: since everything went digital (about 2002 for me), it has been growing at an average rate of 200% a year. Bigger pictures, more videos, more logging. My phone tracks health, location, sound, light and even more if Mark or Jeff are actively eavesdropping on me.

The Dune concept addresses how this data is managed in a modern architecture, with modern privacy and confidentiality concepts baked in. While the best solution for a privacy-minded person is to have physical custody of the bits, that isn’t feasible for most people. Many efforts exist to pull your data back down into a box on your desk, but these negate the cost and durability advantages of the cloud. The majority of people don’t have the time, money, or skills for that idea to be sustainable. And in the biggest irony of all, most on-desk solutions back up to the cloud anyway.

The next best thing is to define how personal data should be stored so that it stays private and portable.

Rivers, Lakes, Beaches and Dunes

Now, you may recognize a giant unmanaged heap of data as a “data lake,” a term that’s become very popular lately. Lakes hold raw unstructured data and objects. And while a dune can hold unstructured data, there is one fundamental difference: items in a dune are always encapsulated with ownership and access control. They always belong to somebody or something.

So instead of data for data’s sake (the lake), the dune is composed of data that can be tied back to a discrete identity and access-controlled in a fine-grained way. It’s the opposite of a “data swamp”, which is unmanaged, useless data. Dunes retain all the scale and automation the cloud brings, keeping the data immortal and, most of all, cheap to store and fast to access. Best of all, the data can be pulled back onto a box on your desk, if you feel the need to have it on-premises.
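To make the lake-versus-dune difference concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not a Data Dunes specification: the names DuneItem, Dune, readers, get and export are all made up for this example. The one rule it enforces is the one described above: nothing goes into the dune without an owner, and every read is checked against that item's access list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DuneItem:
    """One grain in the dune: a payload wrapped with ownership and access control.

    Hypothetical structure for illustration only.
    """
    owner: str        # the identity this data ties back to
    payload: bytes    # raw, possibly unstructured content
    readers: frozenset = field(default_factory=frozenset)  # identities granted read access

    def __post_init__(self):
        # Unlike a lake, a dune never accepts ownerless data.
        if not self.owner:
            raise ValueError("every dune item must have an owner")

    def can_read(self, identity: str) -> bool:
        # The owner can always read; anyone else must be explicitly listed.
        return identity == self.owner or identity in self.readers


class Dune:
    """A toy in-memory store that only hands data to permitted identities."""

    def __init__(self):
        self._items = {}

    def put(self, key: str, item: DuneItem) -> None:
        self._items[key] = item

    def get(self, key: str, identity: str) -> bytes:
        item = self._items[key]
        if not item.can_read(identity):
            raise PermissionError(f"{identity} may not read {key}")
        return item.payload

    def export(self, owner: str) -> dict:
        # Portability: gather everything that belongs to one owner,
        # for example to pull it back onto a box on your desk.
        return {k: v for k, v in self._items.items() if v.owner == owner}


dune = Dune()
dune.put("photo-1", DuneItem(owner="alice", payload=b"jpeg-bytes",
                             readers=frozenset({"bob"})))
print(dune.get("photo-1", "bob"))
```

A real dune would sit on cloud object storage and use real identities and encryption rather than plain strings, but the shape is the same: the ownership and access rules travel with each item instead of living in a separate system.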

As privacy and confidentiality (not to mention compliance and governance) regulations increase, attaching identity to everything will be a critical step in meeting requirements. Even without major changes to the law, plain old consumer preference will begin to demand these things.

Just as physical property rights defined modern civilization, so must digital property rights follow. We’ll get to Locke, Hobbes and Rousseau in Part Two.

Data Dunes is a concept spearheaded by Hans Cathcart and myself at SandDune.org. We think it’s cool, and we think you should think so too.