We are in the process of building the GitHub repository. In the meantime, you can download the leakage quantification and vulnerable fraction computation code from here. The directory contains a README.txt that describes the file formats
the code uses and gives an example of running the analysis in the manuscript. Currently, only leakage from eQTL datasets is considered. The code can evaluate 3 sources of leakage:
- Per eQTL Information Leakage: For each eQTL, the code computes the amount of information that is leaked for the given expression and genotype dataset. This leakage is an overall estimate of
how much information this eQTL would leak in general. eQTLs that leak a large amount of information can be excluded from queried databases or from the published data.
- Cumulative Leakage of Information over Sorted eQTL Lists: After the per eQTL analysis, one should also evaluate how much predictability and leakage a list of eQTLs conveys to an attacker. This
can be used to estimate the risks associated with releasing the list of eQTLs.
- Extremity Based Linking Attack: The linking attack analysis can be used when genotype and phenotype datasets are to be published or served. Individuals found to be vulnerable can be excluded from the datasets.
The outputs from each analysis are explained in more detail in the README.txt file.
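To make the first two analyses concrete, here is a minimal sketch in Python. It is not the distributed code and does not follow its file formats. It assumes, for illustration only, that the leakage of a single eQTL is approximated by the Shannon entropy of its genotype distribution (genotypes coded 0/1/2), and that the eQTLs in a list are independent so that their leakages add. The function names, the input dictionaries and the ranking scores are all hypothetical.

```python
import math
from collections import Counter


def genotype_entropy(genotypes):
    """Shannon entropy (in bits) of a list of genotypes coded 0/1/2.

    Assumption: used here as a simple stand-in for per eQTL leakage.
    """
    counts = Counter(genotypes)
    n = len(genotypes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cumulative_leakage(eqtl_genotypes, scores):
    """Running total of per eQTL leakage over a sorted eQTL list.

    eqtl_genotypes: dict mapping eQTL name -> list of genotypes (hypothetical layout)
    scores: dict mapping eQTL name -> ranking score, e.g. association strength
    Assumption: eQTLs are treated as independent, so leakages are summed.
    """
    ordered = sorted(eqtl_genotypes, key=lambda name: scores[name], reverse=True)
    total = 0.0
    running = []
    for name in ordered:
        total += genotype_entropy(eqtl_genotypes[name])
        running.append((name, total))
    return running


if __name__ == "__main__":
    # Toy data: two eQTLs genotyped in four individuals.
    genotypes = {"eqtl_a": [0, 0, 1, 1], "eqtl_b": [0, 0, 0, 0]}
    ranking = {"eqtl_a": 0.9, "eqtl_b": 0.5}
    for name, bits in cumulative_leakage(genotypes, ranking):
        print(name, bits)
```

In this toy setting, an eQTL whose genotype is the same in every individual adds nothing to the running total, while an eQTL with evenly split genotypes adds the most. The actual estimates produced by the downloadable code are defined in the manuscript and the README.txt.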
We are working on extending the code to handle other data formats and other types of QTLs, so that leakage estimates can be computed for other data types.