Accessibility of covariance information creates vulnerability in Federated Learning frameworks.

30/08/2023

Bioinformatics (Oxford, England)

Authors: Manuel Huth, Jonas Arruda, Roy Gusinow, Lorenzo Contento, Evelina Tacconelli, Jan Hasenauer

MOTIVATION: Federated Learning (FL) is gaining traction in various fields, such as healthcare, because it enables integrative data analysis without sharing sensitive data. However, the risk of data leakage caused by malicious attacks must be considered. In this study, we introduce a novel attack algorithm that requires only three capabilities on the data owner side: computing sample means, computing sample covariances, and constructing known linearly independent vectors.

RESULTS: We show that these basic functionalities, which are available in several established FL frameworks, are sufficient to reconstruct privacy-protected data. Additionally, the attack algorithm is robust to defense strategies that involve adding random noise. We demonstrate the limitations of existing frameworks and propose potential defense strategies, analyzing the implications of using differential privacy. The novel insights presented in this study will aid in the improvement of FL frameworks.
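
The following is a minimal sketch of why these three functionalities can suffice, not the authors' implementation. It assumes a noise-free setting in which the attacker can place n known, linearly independent vectors (the columns of a hypothetical matrix V) next to a private vector x of length n and query the sample mean of x and the sample covariance of x with each column. Because each covariance is a known linear function of the centered data, the private values follow from solving one linear system. The function names and the simulated "server" are assumptions for illustration only.

```python
import numpy as np


def sample_cov(a, b):
    """Unbiased sample covariance of two equally long 1-D arrays."""
    return float((a - a.mean()) @ (b - b.mean()) / (len(a) - 1))


def query_data_owner(x, V):
    """Simulate what the data owner releases: mean(x) and cov(x, v_k) per column."""
    covs = np.array([sample_cov(x, V[:, k]) for k in range(V.shape[1])])
    return x.mean(), covs


def reconstruct(mean_x, covs, V):
    """Recover x from its mean and its covariances with the known columns of V.

    With H the centering matrix, covs = V.T @ (H @ x) / (n - 1).
    V is invertible, so H @ x = (n - 1) * solve(V.T, covs) and x = mean_x + H @ x.
    """
    n = V.shape[0]
    centered_x = np.linalg.solve(V.T, (n - 1) * covs)
    return mean_x + centered_x


# Illustrative run on synthetic "private" data (values are made up).
rng = np.random.default_rng(0)
n = 8
private_x = rng.normal(loc=50.0, scale=10.0, size=n)
# Diagonally dominant, hence invertible: columns are linearly independent.
known_V = np.eye(n) + rng.uniform(0.0, 0.05, size=(n, n))

released_mean, released_covs = query_data_owner(private_x, known_V)
recovered_x = reconstruct(released_mean, released_covs, known_V)
```

In this idealized setting `recovered_x` matches `private_x` up to floating-point error. The sketch does not model added noise or any specific FL framework's interface.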

AVAILABILITY AND IMPLEMENTATION: The code examples are provided on GitHub (https://github.com/manuhuth/Data-Leakage-From-Covariances.git). The CNSIM1 dataset used in the manuscript is available in the DSData R package (https://github.com/datashield/DSData/tree/main/data).

SUPPLEMENTARY INFORMATION: Mathematical proofs and further information are available online.

PMID: 37647639

Participating cluster members

Prof. Dr. Jan Hasenauer

Life and Medical Sciences Institute (LIMES) and Hausdorff Center for Mathematics
jan.hasenauer@uni-bonn.de
