Synthetic data is useful only if it is accurate, said Microsoft Research Director Darren Edge.
“You can generate synthetic data with perfect privacy but zero utility by sampling random values from random distributions.” Useful synthetic data must match the distribution of the real dataset, down to the combinations of individual traits such as age, nationality, location, occupation, and so on.
Using Microsoft’s Synthetic Data Showcase, an open-source tool, the United Nations International Organization for Migration has developed a synthetic human trafficking dataset that has the same structure and statistics as the real data.
As a result, the synthetic dataset yields the same insights into what types of people are exploited, where, and how, but not enough detail to track real individuals. It comes with a Power BI dashboard that can be opened in the cloud or in the free Power BI Desktop app.
Keeping the data useful without exposing anyone comes down to controlling its resolution: making sure that any given combination of attributes applies to a sufficiently large number of people that it does not act like a fingerprint for a specific person.
Microsoft does this using a technique called k-anonymity, where k is the minimum number of people who must share each combination of attributes. A related use of k-anonymity lets password monitoring tools such as Have I Been Pwned, 1Password, and Google’s Password Checkup tell you if your password has been leaked without learning the password itself.
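The idea behind k-anonymity can be illustrated with a minimal Python sketch. It is not the Synthetic Data Showcase implementation: the function names, the attribute names, and the toy records are all invented for illustration, and it checks only one fixed set of attributes rather than every possible combination.

```python
from collections import Counter

def rare_combinations(records, attributes, k):
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

def is_k_anonymous(records, attributes, k):
    """True if every combination of the given attributes covers at least k records."""
    return not rare_combinations(records, attributes, k)

# Toy records, invented for illustration only.
records = [
    {"age": "18-25", "nationality": "A", "occupation": "farm work"},
    {"age": "18-25", "nationality": "A", "occupation": "farm work"},
    {"age": "18-25", "nationality": "A", "occupation": "farm work"},
    {"age": "26-35", "nationality": "B", "occupation": "domestic work"},
]
attrs = ["age", "nationality", "occupation"]

print(is_k_anonymous(records, attrs, k=3))     # False
print(rare_combinations(records, attrs, k=3))  # the single "26-35" record
```

Here the last record is the only one with its combination of traits, so it would act like a fingerprint. A release that aims for k = 3 would have to suppress or generalize it before publishing.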
For more information, view the original story from TechRepublic.