June 20, 2021


On Privacy and Confidentiality of Communications in Organizational Graphs. (arXiv:2105.13418v1 [cs.CR])

Machine-learned models trained on organizational communication data, such as
emails in an enterprise, carry unique risks of breaching confidentiality, even
if the model is intended only for internal use. This work shows how
confidentiality is distinct from privacy in an enterprise context, and aims to
formulate an approach to preserving confidentiality while leveraging principles
from differential privacy. The goal is to perform machine learning tasks, such
as learning a language model or performing topic analysis, using interpersonal
communications in the organization, while not learning about confidential
information shared in the organization. Works that apply differential privacy
techniques to natural language processing tasks usually assume independently
distributed data, and overlook potential correlation among the records.
Ignoring this correlation results in a false promise of privacy. Naively
extending differential privacy techniques to focus on group privacy instead of
record-level privacy is a straightforward approach to mitigate this issue. This
approach, although it provides a more realistic privacy guarantee, is
overly cautious and severely impacts model utility. We show the gap between
these two extreme measures of privacy on two language tasks, and introduce a
middle-ground solution. We propose a model that captures the correlation in the
social network graph, and incorporates this correlation in the privacy
calculations through Pufferfish privacy principles.
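
To make the utility cost of the naive group-privacy extension concrete, here is a minimal sketch using the standard Laplace mechanism. It relies on the textbook group-privacy property of differential privacy: a mechanism that is epsilon-DP at the record level is (k * epsilon)-DP for groups of k records, so holding epsilon fixed for a group of k correlated records requires k times more noise. The function name `laplace_scale` and the example values (sensitivity 1, epsilon 1, group size 20) are assumptions chosen for illustration; they do not come from the paper, and the paper's Pufferfish-based model is not reproduced here.

```python
import math


def laplace_scale(sensitivity: float, epsilon: float, group_size: int = 1) -> float:
    """Return the Laplace noise scale b for an epsilon-DP release.

    With group_size == 1 this is ordinary record-level privacy
    (b = sensitivity / epsilon). With group_size == k, the scale grows
    linearly in k, which is the naive group-privacy extension.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return group_size * sensitivity / epsilon


def laplace_std(scale: float) -> float:
    """Standard deviation of a Laplace distribution with scale b is sqrt(2) * b."""
    return math.sqrt(2.0) * scale


# Hypothetical setting: a count query (sensitivity 1) released with epsilon = 1.
record_level = laplace_scale(sensitivity=1.0, epsilon=1.0)
# Same query, but protecting a group of 20 correlated records at once.
group_level = laplace_scale(sensitivity=1.0, epsilon=1.0, group_size=20)

print(f"record-level noise std: {laplace_std(record_level):.3f}")
print(f"group-level noise std:  {laplace_std(group_level):.3f}")
```

The noise grows linearly with the group size, which is one way to see why treating every correlated record as part of one protected group can be overly cautious, and why a model of the actual correlation structure might recover utility between these two extremes.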