Differentially Private Data Analysis of Social Networks via ...

Differentially Private Data Analysis of Social Networks via ...

18734: Foundations of Privacy Privacy-preserving Release of Statistics: Differential Privacy Anupam Datta CMU Fall 2015 Privacy-Preserving Statistics: Non-Interactive Setting Add noise, sample,

generalize, suppress x1 xn Analyst Sanitized Database D Goals: Accurate statistics (low noise)

Preserve individual privacy (what does that mean?) Database D maintained by trusted curator

Census data Health data Network data 2 Privacy-Preserving Statistics: Interactive Setting Query f # individuals with salary > $30K

x1 xn f(D)+noise Database D maintained by trusted curator Analyst

Goals: Accurate statistics (low noise) Preserve individual privacy (what does that mean?) Census data

Health data Network data 3 Classical Intuition for Privacy If the release of statistics S makes it possible to determine the value [of private information] more accurately than is possible without access to S, a disclosure has taken place. [Dalenius 1977] Privacy means that anything that can be learned

about a respondent from the statistical database can be learned without access to the database Similar to semantic security of encryption 5 Impossibility Result [Dwork, Naor 2006] Result: For reasonable breach, if sanitized database contains information about database, then some adversary breaks this definition Example

Terry Gross is two inches shorter than the average Lithuanian woman DB allows computing average height of a Lithuanian woman This DB breaks Terry Grosss privacy according to this definition even if her record is not in the database! 6 Very Informal Proof Sketch Suppose DB is uniformly random Breach is predicting a predicate g(DB) Adversarys background knowledge:

r, H(r ; San(DB)) g(DB) where H is a suitable hash function, r=H(DB) By itself, does not leak anything about DB Together with San(DB), reveals g(DB) 7 Differential Privacy: Idea [Dwork, McSherry, Nissim, Smith 2006]

Released statistic is about the same if any individuals record is removed from the database 8 An Information Flow Idea Changing input databases in a specific way changes output statistic by a small amount 9

Not Absolute Confidentiality Does not guarantee that Terry Grosss height wont be learned by the adversary 10 Differential Privacy: Definition Randomized sanitization function has -differential privacy if for all data sets D1 and D2 differing by at

most one element and all subsets S of the range of , Pr[(D1) S ] e Pr[(D2) S ] Answer to query # individuals with salary > $30K is in range [100, 110] with approximately the same probability in D1 and D2 11 Achieving Differential Privacy: Interactive Setting User

Database D Tell me f(D) f(D)+noise x1 xn How much and what type of noise should be added?

12 Slide: Adam Smith Example: Noise Addition 13 Slide: Adam Smith Global Sensitivity

14 Exercise Function f: # individuals with salary > $30K Global Sensitivity of f = ? Answer: 1 15 Background on Probability Theory

(see Oct 11, 2013 recitation) 16 Continuous Probability Distributions Probability density function (PDF), fX Example distributions Normal, exponential, Gaussian, Laplace 17

Laplace Distribution PDF = Mean = Variance = 2b2 Source: Wikipedia 18 Laplace Distribution

Change of notation from previous slide: xy 0 b 19 Achieving Differential Privacy 20

Slide: Adam Smith Laplace Mechanism 21 Laplace Mechanism: Proof Idea Pr[A(x) = t] Pr[A(x) = t]

22 Laplace Mechanism: More details For , we have -differential privacy 23 Slide: Adam Smith

Example: Noise Addition 24 Using Global Sensitivity Many natural functions have low global sensitivity Histogram, covariance matrix, strongly convex optimization problems

25 Composition Theorem If A1 is 1-differentially private and A2 is 2differentially private and they use independent random coins then < A1 , A2 > is (1+2)-differentially private Repeated querying degrades privacy; degradation is quantifiable 26

Applications Netflix data set [McSherry, Mironov 2009; MSR] Accuracy of differentially private recommendations (wrt one movie rating) comparable to baseline set by Netflix Network trace data sets [McSherry, Mahajan 2010; MSR] 27 Challenge: High Sensitivity

Approach: Add noise proportional to sensitivity to preserve -differential privacy Improvements: Smooth sensitivity [Nissim, Raskhodnikova, Smith 2007; BGUPSU] Restricted sensitivity [Blocki, Blum, Datta, Sheffet 2013; CMU] 28 Challenge: Identifying an Individuals Information

Information about an individual may not be just in their own record Example: In a social network, information about node A also in node B influenced by A, for example, because A may have caused a link between B and C 29 Differential Privacy: Summary An approach to releasing privacy-preserving

statistics A rigorous privacy guarantee Significant activity in theoretical CS community Several applications to real data sets Recommendation systems, network trace data,.. Some challenges High sensitivity, identifying individuals information, repeated querying 30

Recently Viewed Presentations

  • From Classical to Contemporary

    From Classical to Contemporary

    Defining the Romantics HUM 2212: British and American Literature I Fall 2012 Dr. Perdigao August 22-24, 2012
  • JNDI - University of Technology Sydney

    JNDI - University of Technology Sydney

    Advanced Java Programming JNDI v2 Chris Wong [email protected] based on notes by Wayne Brookes JNDI - Introduction JNDI = Java Naming and Directory Interface JNDI provides a standard way for Java applications to interface with a variety of naming and...
  • Présentation PowerPoint

    Présentation PowerPoint

    « Par le passé, le rapport annuel devait susciter la confiance des investisseurs. Désormais, il devra en plus imposer la crédibilité de l'entreprise», dit Ron Blunn, dont la firme Blunn & Co., de Toronto, prépare bon an, mal an une...
  • POWERPOINT JEOPARDY - PepsiCo

    POWERPOINT JEOPARDY - PepsiCo

    Select Team players: Evenly divide students into 5 teams. Students should be sitting in a group in order to consult on the correct answers. The first team to quietly get themselves in a small group, will play first.
  • TABLE 5: TOP 20 AUTHORITARIAN REGIMES, 2004 China

    TABLE 5: TOP 20 AUTHORITARIAN REGIMES, 2004 China

    * * Notes. Simplified. Easy Viewing. Elec College. Projections2 . Incumbents. OECD Comparison. OECD Comparison (2) Table 7 Incumbents. Dictatorships Table 5. Table 4 Global Elections
  • Somatotypes - socio-cultural stuff

    Somatotypes - socio-cultural stuff

    Somatotypes GCSE PE Theory Mr. Leighton Today's lesson… ??? ??? ??? Task… 7 mins Using the paper in front of you and the board marker provided define and describe the following two terms…
  • What Is Financial Aid?  Financial aid consists of

    What Is Financial Aid? Financial aid consists of

    Alien registration or permanent resident card (if not a U.S. citizen) The FAFSA - 2019-2020Important Things to Keep In Mind: ... Online help available on the form. Federal Programs. Pell Grant (2018-19 max award $6,095)* ... Marla Kane. Higher Education...
  • And of clay are we created

    And of clay are we created

    (washing machines, lawn mowers, tape recorders etc.) Work once, usually for the first few hours after being brought home, and then never work again ... As Rolf sang an Austrian song, he remembered his sore experience of being led by...