An anonymised dataset is supposed to have had all personally identifiable information removed from it, while retaining a core of useful information that researchers can work with without fear of invading privacy.
But in practice, data can be deanonymised in a number of ways. Now researchers have built a model to estimate how easily any given dataset could be deanonymised. A dataset with 15 demographic attributes, for instance, “would render 99.98% of people in Massachusetts unique”. And for smaller populations, reidentification gets easier still: if town-level location data is included, for instance, “it would not take much to reidentify people living in Harwich Port, Massachusetts, a city of fewer than 2,000 inhabitants”.
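To make the idea of being “unique” in a dataset concrete, here is a minimal Python sketch. It is not the researchers' model, which estimates uniqueness for a whole population; it simply counts, in a complete toy table, how many records have a combination of attribute values that nobody else shares. The function name `unique_fraction`, the attribute names and the six records are all invented for illustration.

```python
from collections import Counter


def unique_fraction(records, attributes):
    """Return the share of records whose combination of values for
    `attributes` appears exactly once in `records`."""
    if not records:
        return 0.0
    # Reduce each record to the tuple of values an attacker could match on.
    keys = [tuple(record[attr] for attr in attributes) for record in records]
    counts = Counter(keys)
    # A record is "unique" if no other record shares its combination.
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Made-up toy records: no names, only a few demographic attributes.
people = [
    {"town": "A", "birth_year": 1980, "sex": "F"},
    {"town": "A", "birth_year": 1980, "sex": "M"},
    {"town": "A", "birth_year": 1975, "sex": "F"},
    {"town": "B", "birth_year": 1980, "sex": "F"},
    {"town": "B", "birth_year": 1980, "sex": "F"},
    {"town": "B", "birth_year": 1962, "sex": "M"},
]

for attrs in (["sex"], ["town", "sex"], ["town", "birth_year", "sex"]):
    print(attrs, round(unique_fraction(people, attrs), 2))
# ['sex'] 0.0
# ['town', 'sex'] 0.33
# ['town', 'birth_year', 'sex'] 0.67
```

Even in this tiny example, each extra attribute raises the share of people who can be singled out, which is the intuition behind the figures quoted above.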