September 27, 2021

Dedicated Forum to help removing adware, malware, spyware, ransomware, trojans, viruses and more!

DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation. (arXiv:2103.11109v4 [cs.LG] UPDATED)

Recent success of deep neural networks (DNNs) hinges on the availability of
large-scale dataset; however, training on such dataset often poses privacy
risks for sensitive training information. In this paper, we aim to explore the
power of generative models and gradient sparsity, and propose a scalable
privacy-preserving generative model DATALENS. Comparing with the standard PATE
privacy-preserving framework which allows teachers to vote on one-dimensional
predictions, voting on the high dimensional gradient vectors is challenging in
terms of privacy preservation. As dimension reduction techniques are required,
we need to navigate a delicate tradeoff space between (1) the improvement of
privacy preservation and (2) the slowdown of SGD convergence. To tackle this,
we take advantage of communication efficient learning and propose a novel noise
compression and aggregation approach TOPAGG by combining top-k compression for
dimension reduction with a corresponding noise injection mechanism. We
theoretically prove that the DATALENS framework guarantees differential privacy
for its generated data, and provide analysis on its convergence. To demonstrate
the practical usage of DATALENS, we conduct extensive experiments on diverse
datasets including MNIST, Fashion-MNIST, and high dimensional CelebA, and we
show that, DATALENS significantly outperforms other baseline DP generative
models. In addition, we adapt the proposed TOPAGG approach, which is one of the
key building blocks in DATALENS, to DP SGD training, and show that it is able
to achieve higher utility than the state-of-the-art DP SGD approach in most
cases. Our code is publicly available at