Private Information Leakage in Functional Genomics Data


Functional genomics experiments on human subjects present a privacy conundrum. On one hand, many of the conclusions we infer from these experiments are not tied to the identity of individuals but represent universal statements about disease and developmental stages. On the other hand, by virtue of the experimental procedures, the reads from them are tagged with small bits of patients’ variant information, which presents privacy challenges in terms of data sharing. There are many benefits to sharing the data as broadly as possible. Measuring the amount of variant information leaked in a variety of experiments, particularly in relation to the amount of sequencing, will allow us to uncover ways of reducing information leakage and determine an appropriate set point for sharing information with minimal leakage.

To resolve the tension between data sharing and privacy leakage, we propose a file format system that enables the sharing of large amounts of data while protecting individuals’ sensitive information and preserving the utility of the data. The proposed file format can achieve different balances between privacy and utility. At the highest level of privacy, our file format masks all the variant information leaked from reads; the masked files can still be used to calculate signal profiles with 99% recovery of the original profiles and 100% recovery of the original gene expression levels.

Information Quantification and Linking


We developed several information-theoretic measures to quantify the amount of private information leaked, at various sampled coverages, from the BAM files of different functional genomics experiments. For more details, please refer to our paper (here). A detailed explanation of our source code, of how to calculate the amount of sensitive information leaked from a given BAM file, and of the linking attack can be found here.
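As a rough illustration of what an information-theoretic leakage measure can look like, the sketch below scores each observed genotype by its self-information in bits, so that rare genotypes count as more identifying. It is not the measure implemented in our source code: the function names are hypothetical, and it assumes Hardy-Weinberg genotype frequencies and independence between variants.

```python
import math

def genotype_information_bits(alt_allele_freq: float, genotype: int) -> float:
    """Self-information, in bits, of observing a genotype.

    `genotype` is the number of alternate alleles (0, 1, or 2).
    Assumption: genotype probabilities follow Hardy-Weinberg equilibrium.
    """
    p = alt_allele_freq
    q = 1.0 - p
    probabilities = {0: q * q, 1: 2 * p * q, 2: p * p}
    prob = probabilities[genotype]
    if prob <= 0.0:
        raise ValueError("genotype has zero probability at this allele frequency")
    return -math.log2(prob)

def total_leakage_bits(observed):
    """Sum the information over (alt_allele_freq, genotype) pairs.

    Assumption: variants are independent, so their information adds.
    """
    return sum(genotype_information_bits(f, g) for f, g in observed)

if __name__ == "__main__":
    # Two toy variants called from reads: one heterozygous, one homozygous alternate.
    calls = [(0.5, 1), (0.5, 2)]
    print(total_leakage_bits(calls))
```

Under these assumptions, repeating the calculation on genotypes called from down-sampled reads would show how the estimate changes with coverage.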

Privacy-Aware File Format System For Sharing Raw Alignment Files from Functional Genomics Experiments

Our privacy-aware file format system has two components: (1) a publicly sharable alignment file (pBAM) and (2) a private, lightweight file that requires secure access (.diff). We provide a set of scripts called "p-tools" that convert alignment files into the privacy-aware file format and back. Click here for more information.
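The idea of splitting an alignment into a public, masked part and a small private part can be sketched on a single toy read. This is only a conceptual illustration, not the p-tools implementation: the function names are hypothetical, the read is assumed to align to the reference without gaps, and real alignment files also carry indels, clipping, and quality values that are ignored here.

```python
def mask_read(read: str, reference: str):
    """Split a read into a public masked sequence and a private diff.

    The masked sequence is the reference itself, so it reveals no variants.
    The diff lists (position, base) for every mismatch against the reference.
    Assumption: read and reference have equal length (gapless alignment).
    """
    if len(read) != len(reference):
        raise ValueError("this toy example only handles gapless alignments")
    diff = [
        (i, base)
        for i, (base, ref_base) in enumerate(zip(read, reference))
        if base != ref_base
    ]
    return reference, diff

def restore_read(masked_read: str, diff) -> str:
    """Rebuild the original read from the masked sequence and the private diff."""
    bases = list(masked_read)
    for position, base in diff:
        bases[position] = base
    return "".join(bases)

if __name__ == "__main__":
    reference = "ACGATA"
    read = "ACGTTA"  # carries one variant base at position 3
    masked, diff = mask_read(read, reference)
    print(masked, diff)
    print(restore_read(masked, diff) == read)
```

In this sketch the masked sequence plays the role of the sharable file and the diff list plays the role of the file kept under secure access; only someone holding both can recover the original read.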

Frequently Asked Questions

Please e-mail your questions to privaseq3@gersteinlab.org. We will soon publish a page of questions and answers for everyone.

Developers:

Gamze Gursoy