Managing Uncertainty in Social Networks

Written by: Eytan Adar and Christopher Re
Presented by: Amir

Adar, E., & Re, C. (2007). Managing uncertainty in social networks. IEEE Data Eng. Bull., 30(2), 15-22.

Social network analysis
Social Network Analysis (SNA) is the process of analyzing social network data (e.g., network structure, use of the network). Traditionally, SNA is done with graph theory.

Challenges in SNA
1. Multiple sources may provide different results. Sometimes you forget to update your information in all places, or you are not active on all Social Networking Sites (SNS) at the same time.
2. How do we combine data from different SNS, blogs, etc.?
3. How much certainty does each piece of data have, and how much do their combinations have?
4. The network is growing, but we still need a fast way to find the data.

Abstract
Collecting Social Network (SN) data becomes imprecise as the network grows (e.g., a shift in scale). Traditional techniques (e.g., graph theory) work fine with a small database; however, the degree of imprecision increases as the database becomes large. Probabilistic database management addresses these challenges.

Background and solution
Over the last 50 years, SNA has grown a lot, and the analysis of large-scale social networks has received public attention. SNA helps us understand marketing, health, communication, and trends in commercial applications. Large scale comes with imprecision, so we need ways to analyze the data with confidence.

Background and solution (continued)
Traditionally, managing and analyzing large-scale data has been the domain of data management research and technologies, which have almost always assumed that the data is precise. Probabilistic databases (PDBs) can be a solution to model, manage, and mine emerging social network data.

Graph techniques
Nodes are called actors (e.g., people).

Edges are relationships between nodes.

Issues with collecting Social Network data
• Survey
• Observation / missed observation
• Reporting / misreporting
• Intimate involvement of the researcher (frequently, and over extended periods)
• Internet / spammers
• Noisy data: a tremendous level of uncertainty in the data

Challenges
• How do we find accurate data (up to a given level of certainty)?
• How do we find data fast?
• How do we maintain a large-scale database?

Probabilistic relational databases
All of these challenges can be addressed with the help of probabilistic relational databases. Adar and Re (2007) present probabilistic relational databases as the potential answer to these issues (p. 2). The following slides explain how they used probabilistic relational databases to address the challenges above.

PD and RD (probabilistic and relational databases)
For example, tuples (i.e., rows of data) can be stored, searched, and aggregated in different ways.

Student Table
StudentID  Name
001        John
002        Paul
003        Smith

Course Table
CourseID  CourseName
001       Intro to CS
002       Intro to Math
003       Intro to Physics

Enrollment Table (relational)
EnrollmentID  StudentID  CourseID
001           001        001
002           001        002
003           002        001

Enrollment Table (probabilistic, with a Probability column)
EnrollmentID  StudentID  CourseID  Probability
001           001        001       0.9
002           001        002       0.1
003           002        001       1.0
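The probabilistic Enrollment table above can be explored with a few lines of Python. This is only an illustrative sketch: the in-memory structures and the helper names `enrollment_probabilities` and `expected_enrollment` are hypothetical and do not come from Adar and Re (2007). It attaches readable names to each uncertain enrollment and sums the probabilities per course, which gives the expected number of enrolled students (this holds by linearity of expectation, whether or not the tuples are independent).

```python
from collections import defaultdict

# Rows copied from the slide's tables.
STUDENTS = {"001": "John", "002": "Paul", "003": "Smith"}
COURSES = {"001": "Intro to CS", "002": "Intro to Math", "003": "Intro to Physics"}

# (enrollment_id, student_id, course_id, probability)
ENROLLMENTS = [
    ("001", "001", "001", 0.9),
    ("002", "001", "002", 0.1),
    ("003", "002", "001", 1.0),
]


def enrollment_probabilities(enrollments=ENROLLMENTS):
    """Join each uncertain enrollment tuple to student and course names."""
    return [(STUDENTS[s], COURSES[c], p) for _, s, c, p in enrollments]


def expected_enrollment(enrollments=ENROLLMENTS):
    """Expected number of students per course: the sum of tuple probabilities."""
    totals = defaultdict(float)
    for _, _, course_id, p in enrollments:
        totals[COURSES[course_id]] += p
    return dict(totals)


if __name__ == "__main__":
    for name, course, p in enrollment_probabilities():
        print(f"{name} is enrolled in {course} with probability {p}")
    print(expected_enrollment())
```

Courses with no enrollment tuple (Intro to Physics here) simply do not appear in the result.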

Another example of imprecise data
You have seen Paul in Apartment A 9 times and in Apartment B 1 time. Which apartment does he live in? A probabilistic database can keep both answers, e.g., Apartment A with probability 0.9 and Apartment B with probability 0.1, instead of forcing a single choice.

Diffusion model
The diffusion model is a model of the cognitive processes involved in simple two-choice decisions. It separates the quality of evidence entering the decision from decision criteria and from other, nondecision, processes such as stimulus encoding and response execution. The model should be applied only to relatively fast two-choice decisions (mean RTs less than about 1000 to 1500 ms) and only to decisions that are a single-stage decision process (as opposed to the multiple-stage processes that might be involved in, for example, reasoning tasks).

Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4), 873-922. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2474742/

Model and system
Data (from Fig. 1): there are two types of data:
(1) standard actor/node information (e.g., name, age, city of residence); see Figure 1b
(2) preference data (e.g., music preference) in terms of genre; see Figure 1c

The model assigns a probability to each tuple (p. 2).

Model and system (continued)
Relation between genres and songs (see Fig. 1d).

What is the probability that a user would like song A?

    SELECT U.name
    FROM Users U, Prefs P, Songs S
    WHERE U.name = P.name
      AND P.genre = S.genre
      AND S.sname = 'A'

User Kim would have a probability of 0.75 * 0.1 = 0.075.

What is the probability that Kim will be affected by these recommendations?

    SELECT U.name
    FROM Users U, Prefs P, Songs S, MusicInfluence M, Recommends R
    WHERE U.name = P.name
      AND P.genre = S.genre
      AND M.name2 = U.name
      AND M.name1 = R."from"
      AND R."to" = U.name
      AND R.song = S.sname
      AND S.sname = 'D'
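Answer probabilities for queries like these rest on two basic rules, assuming the tuples involved are independent: events that must all hold are multiplied, and alternative ways of producing the same answer are combined as 1 minus the product of their complements. The sketch below is a minimal illustration. The helper names `all_of` and `any_of` are hypothetical, and the reading of the slide's numbers is an assumption: 0.75 and 0.1 are taken as the two tuple probabilities joined for Kim in the first query, while 0.6, 0.75, 0.8, and 0.9 are the factors the slide gives for the recommendation query, with 0.8 and 0.9 treated as two alternative supports.

```python
import math


def all_of(*probs):
    """Probability that all of several independent events occur."""
    return math.prod(probs)


def any_of(*probs):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - math.prod(1.0 - p for p in probs)


# First query: one preference tuple joined with one song tuple.
kim_likes_a = all_of(0.75, 0.1)

# Second query: two required factors and two alternative supports
# (assumed interpretation of the slide's arithmetic).
kim_affected = all_of(0.6, 0.75, any_of(0.8, 0.9))

if __name__ == "__main__":
    print(round(kim_likes_a, 3))   # 0.075
    print(round(kim_affected, 3))  # 0.441
```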

Result for Kim: 0.6 * 0.75 * (1 - (1 - 0.8) * (1 - 0.9)) = 0.441

Application area
Data analysis (SNA): in our scenario we may be interested in distributing free concert tickets for a new artist to a small subset of users. Since there is a cost to providing these tickets, we would like to find a small group of consumers who have a large amount of influence, i.e., a set of trend-setters or influencers.

How would you identify influencers?
Solution: choose nodes with a high degree-centrality measure (e.g., a high number of outgoing edges).

Keep in mind that different social networks may provide different values, so the data is uncertain.
Solution: send the tickets to high-value users, e.g., those with a high probability of having more than k edges.

Scalability
How do we find similar nodes? Record the number of outgoing and incoming edges of each node, list them, and sort them.
Sometimes a query may take a long time to execute and degrade the performance of the system. Two remedies:
• Run SQL safe plans. Safe plans tell us when a probabilistic query can be computed by simply multiplying (and summing) probabilities.
• Materialize a probabilistic view. This will help us to scale.
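One way to act on the "high probability of having more than k edges" rule is sketched below. It assumes that each outgoing edge of a user is present independently with its own probability, and it computes the distribution of the out-degree by dynamic programming. This is an illustrative technique, not necessarily the safe-plan evaluation used by Adar and Re (2007); the function names, users, and edge probabilities are all hypothetical.

```python
def prob_more_than_k(edge_probs, k):
    """P(number of present edges > k), assuming independent edges."""
    # dist[j] = probability that exactly j of the edges seen so far exist.
    dist = [1.0]
    for p in edge_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)  # this edge is absent
            nxt[j + 1] += q * p      # this edge is present
        dist = nxt
    return sum(dist[k + 1:])


def rank_influencers(out_edges, k):
    """Sort users by their probability of having more than k outgoing edges."""
    scores = {user: prob_more_than_k(ps, k) for user, ps in out_edges.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical edge probabilities per user.
OUT_EDGES = {
    "Kim": [0.9, 0.8, 0.7],
    "Lee": [0.5, 0.5, 0.5, 0.5],
    "Pat": [1.0],
}

if __name__ == "__main__":
    for user, score in rank_influencers(OUT_EDGES, k=1):
        print(f"{user}: {score:.4f}")
```

Ranking by this probability, rather than by a raw edge count, lets a user with a few near-certain edges outrank one with many doubtful ones.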

Materialized view
A materialized view is simply the result of a query that has been precomputed and stored. In traditional databases, materialized views are used widely to speed up query processing (Dalvi, Re & Suciu, 2011, p. 473).

Dalvi, N., Re, C., & Suciu, D. (2011). Queries and materialized views on probabilistic databases. Journal of Computer and System Sciences, 77(3), 473-490.

• It is a sort of copy of one or more tables.
• Changes to the underlying tables are reflected in the materialized view when it is refreshed; the refresh is run periodically.
• You can also create an index on a materialized view to improve performance.
• Deletes are reflected immediately.
• Joining takes time; however, once the view is created and stored, queries against it are faster.

Maintainability
Adding a new tuple adds a new probability value. It also changes the relationships. It is important for the administrator to know how and why each probability value is computed.
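To make the materialized-view and maintainability points concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no native materialized views, so the view is emulated by storing a query result in an ordinary table and rebuilding it on demand. The table names, column names, and the genre value are hypothetical. Each stored row keeps a `lineage` string next to its probability so that an administrator can see how the value was derived. Multiplying the two tuple probabilities assumes the tuples are independent and that each result row comes from exactly one pair of base tuples.

```python
import sqlite3


def build_db():
    """Create two small probabilistic base tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prefs (name TEXT, genre TEXT, p REAL);
        CREATE TABLE songs (sname TEXT, genre TEXT, p REAL);
        INSERT INTO prefs VALUES ('Kim', 'country', 0.75);
        INSERT INTO songs VALUES ('A', 'country', 0.1);
    """)
    return conn


def refresh_view(conn):
    """Rebuild the emulated materialized view from the base tables."""
    conn.executescript("""
        DROP TABLE IF EXISTS likes_mv;
        CREATE TABLE likes_mv AS
        SELECT pr.name AS name,
               s.sname AS sname,
               pr.p * s.p AS p,
               'prefs(' || pr.name || ',' || pr.genre || ') * songs('
                   || s.sname || ',' || s.genre || ')' AS lineage
        FROM prefs pr JOIN songs s ON pr.genre = s.genre;
        CREATE INDEX likes_mv_name ON likes_mv (name);
    """)


def lookup(conn, name):
    """Read precomputed answers for one user from the stored view."""
    return conn.execute(
        "SELECT sname, p, lineage FROM likes_mv WHERE name = ? ORDER BY sname",
        (name,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    refresh_view(conn)
    print(lookup(conn, "Kim"))
    conn.execute("INSERT INTO songs VALUES ('B', 'country', 0.4)")
    print(len(lookup(conn, "Kim")))  # stored result, not yet refreshed
    refresh_view(conn)
    print(len(lookup(conn, "Kim")))  # new song visible after the refresh
```

In this emulation a new base tuple only shows up after `refresh_view` runs again, which mirrors a periodically refreshed view; database systems with native materialized views may behave differently.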

Integration
A relational database allows us to integrate databases (e.g., tables) using a reconciliation method.

Missing data
Object  Instance  Probability
A       A1        0.7
A       A2        ?? 0.3
B       B1        0.6
B       B2        0.2
B       B3        ?? 0.2
C       C1        0.7
C       C2        ?? 0.3

Conclusion
Probabilistic databases are a useful paradigm. They are helpful for data collection and analysis, and they help with many database-related issues, e.g., integration, calculation, and handling missing data. There is still a lot to explore.
