Revenue Maximization by Viral Marketing: A Social Network Hosts Perspective Arijit Khan Nanyang Technological University Singapore Benjamin Zehnder Kossmann ETH Zurich Research Switzerland Donald Microsoft Redmond, USA Viral Marketing in Social Networks A. Khan, B. Zehnder, D. Kossmann 1/20

Viral Marketing in Social Networks Find a small subset of influential individuals in a social network, such that they can influence the largest number of people in the network. [Domingos et. al. KDD 2001, Kempe et. al. KDD 2003] A. Khan, B. Zehnder, D. Kossmann 1/20 Viral Marketing in Social Networks Find a small subset of influential individuals in a social network, such that they can influence the largest number of people in the network. [Domingos et. al. KDD 2001, Kempe et. al. KDD 2003] A. Khan, B. Zehnder, D. Kossmann Viral Marketing as a Service Challenges for Campaigners Social network graph is hidden by the host of the social network (e.g., Facebook, Twitter, LinkedIn)

A campaigner (e.g., AT&T, Sony, Microsoft, Samsung) is unable to identify the top-k seed sets for maximizing her campaign A. Khan, B. Zehnder, D. Kossmann 2/20 Viral Marketing as a Service Challenges for Campaigners Social network graph is hidden by the host of the social network (e.g., Facebook, Twitter, LinkedIn) A campaigner (e.g., AT&T, Sony, Microsoft, Samsung) is unable to identify the top-k seed sets for maximizing her campaign Social network host sells viral marketing campaigns selects seed nodes for its client campaigners. [Lu et. al., KDD 2013]

2/20 Viral Marketing as a Service Challenges for Social Network Host multiple companies compete and they launch comparable products around the same time e.g., Microsofts Surface vs. Apples iPad vs. Samsung Note 3 Host needs to run multiple competing viral marketing campaigns together. 3/20 Viral Marketing as a Service Constraints Each campaigner spends her budget in two parts: - (a) her budget on the seedset size (i.e., the number of seed users, k), - (b) how much money she is willing to pay to the host for each of her target users if that user adopts her product.

An average user will purchase only one of the competing products seed sets are mutually exclusive. 4/20 Our Problem: Hosts Revenue Maximization A. Khan, B. Zehnder, D. Kossmann 5/20 Our Problem: Hosts Revenue Maximization How the campaigner selects the seed set for each of her client campaigner so that the hosts expected revenue is maximized? 5/20 Why Classical Viral Marketing May Not Work? [10$, 1$]

[1$, 10$] [10$, 1$] [10$, 1$] [1$, 10$] [1$, 10$] [10$, 1$] [1$, 10$] Two campaigners C1, C2: seed set size for each campaigner is 1 A. Khan, B. Zehnder, D. Kossmann 6/20 Why Classical Viral Marketing May Not Work? [10$, 1$]

[1$, 10$] C1 [1$, 10$] [10$, 1$] [10$, 1$] C2 [1$, 10$] [10$, 1$] [1$, 10$] Best Solution: V3 C1, V6 C2 . Hosts total revenue = 60$ A. Khan, B. Zehnder, D. Kossmann 6/20 Why Classical Viral Marketing May Not Work? [10$, 1$]

[1$, 10$] C1 [10$, 1$] [1$, 10$] [10$, 1$] [1$, 10$] [10$, 1$] [1$, 10$] Best seed node for C1 (individually): V4 Hosts maximum possible revenue from C1 (individually): 43$ A. Khan, B. Zehnder, D. Kossmann 6/20 Why Classical Viral Marketing May Not

Work? [10$, 1$] [1$, 10$] [10$, 1$] [1$, 10$] [10$, 1$] [10$, 1$] [1$, 10$] C2 [1$, 10$] Best seed node for C2 (individually): V5 Hosts maximum possible revenue from C2 (individually): 43$ A. Khan, B. Zehnder, D. Kossmann

6/20 Why Classical Viral Marketing May Not Work? [10$, 1$] [1$, 10$] C1 [10$, 1$] C2 [1$, 10$] [10$, 1$] [1$, 10$] [10$, 1$] [1$, 10$] V4 C1, V5 C2 . Hosts total revenue = 44$

Suboptimal Solution! 6/20 Roadmap Motivation Related Work Influence Diffusion Models Approximate Algorithms Greedy Heuristics Experimental Results Conclusion A. Khan, B. Zehnder, D. Kossmann 7/20 Related Work Influence Maximization - Domingos et. al. KDD 2001 - Kempe et. al. KDD 2003 Competitive Viral Marketing - Preventing the spread of an existing negative campaign

[Bharathi et. al., WINE 2007] [Borodin et. al., WINE 2007] - Non-cooperative campaigns who select seeds alternatively [Fazeli et. al., CDC 2012] [Tzoumas et. al., WINE 2012] - Competing campaigners promote their products at the same time [Li et. al., SIGMOD 2015] Viral Marketing by Social Network Host - Lu et. al. KDD 2013 8/20 Related Work Influence Maximization - Domingos et. al. KDD 2001 - Kempe et. al. KDD 2003 l a r i yv b - Preventing the spread of an existing negative ioncampaign

t a izet. al.,eWINE m. 2007] m i [Bharathi et. al., WINE 2007] [Borodin l x ma l prob e u e n v e o v n re campaigns - Non-cooperative who select seeds alternatively s sa

i t s g n 2012] [Tzoumas et. al., WINE 2012] [Fazeli al., Ho et.ke tiCDC r ma Competitive Viral Marketing - Competing campaigners promote their products at the same time [Li et. al., SIGMOD 2015] Viral Marketing by Social Network Host - Lu et. al. KDD 2013 A. Khan, B. Zehnder, D. Kossmann 8/20

Influence Diffusion Models Multi-Campaigner Independent Cascade Model (MCIC) Budak et. al. [WWW 2011] Similar to Single-Campaigner IC model When node u first becomes active with campaign of Ci, it gets a single chance to activate each of its currently inactive out-neighbors v with campaign of Ci .It succeeds with probability p(u,v). An activated node v adopts one campaign uniform at random from all its inneighbors which were successfully activated in the last round. Each node can be activated only once and by only one of the campaigns; also the node stays activated with that campaign until the end A. Khan, B. Zehnder, D. Kossmann 9/20 Influence Diffusion Models Multi-Campaigner Independent Cascade Model (MCIC) Budak et. al. [WWW 2011] Similar to Single-Campaigner IC model All All

Possibl Possibl ee Worlds Worlds Pr(v3, C1) = 0.4 + (0.1) = 0.45 Pr(v3, C2) = 0.1 + (0.1) = 0.15 A. Khan, B. Zehnder, D. Kossmann 9/20 Influence Diffusion Models Multi-Campaigner Independent Cascade Model (MCIC) Budak et. al. [WWW 2011] Similar to Single-Campaigner IC model All All Possibl Possibl ee Worlds

Worlds in e m y co e h ry t e v n e o wh wh t c s u d d ro

rien p f a r i Pr(v3, C1) = d 0.4 + (0.1) = 0.45 the op t a h t i e w l ct. u

t d c o r a Peop ont at p c h t t d c ire pt e dPr(v o d a , C )

= 0.1 + (0.1) = 0.15 3 nt2ly e c re 9/20 Influence Diffusion Models Multi-Campaigner Linear Threshold Model (K-LT) Lu et. al. [KDD 2013] Similar to Single-Campaigner LT model If the sum of the probabilities of the incoming edges from all active nodes is greater than or equal to the activation threshold of an inactive node, then the node gets activated in the next round Let us consider all nodes u that were activated in the last round and contributed to the activation of a node v in the current round. Then, v will adopt the same campaign as that of u with probability p(u,v)/ u p(u,v) Each node can be activated only once and by only one of the campaigns; also the node stays activated with that campaign until the end

A. Khan, B. Zehnder, D. Kossmann 10/20 Influence Diffusion Models Multi-Campaigner Linear Threshold Model (K-LT) Lu et. al. [KDD 2013] Similar to Single-Campaigner LT model Time step t1: v2 becomes active with C1 Time step t2: v3 becomes active also with C1 A. Khan, B. Zehnder, D. Kossmann 10/20 Influence Diffusion Models Multi-Campaigner Linear Threshold Model (K-LT) Lu et. al. [KDD 2013] Similar to Single-Campaigner LT model n a

a h t ore m ilar n m e i s h ly w d a n e t, t o p p o o y

d d g a a olo C to rs o hnwith s b c Time step t1: v2 becomes active e h er e d t i g h

1 i c e e a n n d o er pt s h er s o u sed d f a a o e b h

r r t e e nly ce mb o n u A us o t n c , r u e Time step t2: v3 becomes also

with evactive hold roCd1 s p w it. e o r d c H h e t fi t i . p ec d op o gy

l s a o y n l e t h h t tec cen s e t r c e t l s se

mo o h she w ors b h g i ne 10/20 Our Contribution: Complexity Results Hosts revenue maximization problem is NP-hard under MCIC and K-LT models. Hosts revenue maximization problem is neither monotonic, nor sub-modular under MCIC and K-LT models. A. Khan, B. Zehnder, D. Kossmann 11/20

Our Contribution: Complexity Results Hosts revenue maximization problem is NP-hard under MCIC and K-LT models. Hosts revenue maximization problem is neither monotonic, nor sub-modular under MCIC and K-LT models. [3$, 5$] u u 1.0 vv [8$, 9$] Counter-example of monotonicity C C22 v, v, Hosts Hosts revenue revenue =

= 14$ 14$ C C11 u, u, Hosts Hosts revenue revenue = = 11$ 11$ C C22 v, v, C C11 u, u, Hosts Hosts revenue revenue = = 12$ A.12$ Khan, B. Zehnder, D. Kossmann C

C11 u, u, C C22 v, v, Hosts Hosts revenue revenue = = 12$ 12$ Our Contribution: Theoretical Results Polynomial-time exact solution over tree dataset under both MCIC and K-LT models Polynomial-time approximate solution over graph dataset under KLT model*, and theoretical performance guarantee: 1 m 1

1 e * with an additional constraint that each campaigner has the same number of seed nodes. Here, m is the number of campaigners. A. Khan, B. Zehnder, D. Kossmann 12/20 Algorithm: MCIC Model [RevMax-C] Exact Algorithm over Tree Dataset Dynamic programming over binary tree A. Khan, B. Zehnder, D. Kossmann 13/20 Algorithm: MCIC Model [RevMax-C] Exact Algorithm over Tree Dataset Dynamic programming over

binary tree 13/20 Algorithm: MCIC Model [RevMax-C] Exact Algorithm over Tree Dataset Dynamic programming over binary tree Algorithm: MCIC Model [RevMax-C] Exact Algorithm over Tree Dataset Dynamic programming over binary tree Time Complexity: O(ndm2k2m) n = no of nodes d = depth of tree m = no of campaigners k = seed nodes per campaigner 13/20 Algorithm: MCIC Model [RevMax-C]

Heuristic Algorithm over Graph Dataset Find most influential tree from graph dataset Convert most influential tree to an equivalent binary tree Apply dynamic algorithm over binary tree A. Khan, B. Zehnder, D. Kossmann 14/20 Algorithm: K-LT Model [RevMax-C] Approximate Algorithm over Graph Dataset Two-step method with overall performance guarantee: 1 m 1

1 e Find km best seed nodes optimistically assuming that there is only one campaigner Optimally partition the seed nodes among m campaigners Time Complexity: O(mkn(n+e)t + m2k + mkm) n = no of nodes, e = no of edges t = no of MC Samples m = no of campaigners k = seed nodes per campaigner 15/20 Efficient Heuristic Algorithm [RevMaxS] Sort the campaigners in descending order of the expected revenue from that campaigner. Apply classical viral marketing algorithms to find the seed set for each campaigner in order. Delete already selected seed nodes of previous campaigners before deciding seed nodes for the current campaigner. Time Complexity: O(mkn(n+e)t )

n = no of nodes, e = no of edges t = no of MC Samples m = no of campaigners k = seed nodes per campaigner 16/20 List of Experiments Datasets: Revenue Distribution Models: - Uniform (U) - Not Equal (NE) - Clustering with Low Competition (CLC) - Clustering with High Competition (CHC) - Clustering with Not Equal Competition (CNC) Algorithms (RevMax-C, RevMax-S): hosts expected revenue, running time, and scalability under MCIC and K-LT models Revenue Improvement Rate (RIR): ratio of the hosts expected revenue obtained from the seed sets identified by RevMax-C (or, RevMax-S) with respect to the hosts revenue obtained from a random seed sets. List of Experiments Datasets:

Revenue Distribution Models: - Uniform (U) - Not Equal (NE) - Clustering with Low Competition (CLC) - Clustering with High Competition (CHC) - Clustering with Not Equal Competition (CNC) We We vary vary the the number number of of campaigners campaigners and and the the number number of of seed seed nodes nodes per per

campaigner campaigner Algorithms (RevMax-C, RevMax-S): hosts expected revenue, running time, and scalability under MCIC and K-LT models Revenue Improvement Rate (RIR): ratio of the hosts expected revenue obtained from the seed sets identified by RevMax-C (or, RevMax-S) with respect to the hosts revenue obtained from a random seed sets. Experimental Results: MCIC Influence Cascade Efficiency of Seeds Finding Effectiveness in terms of Hosts Revenue 18/20 Experimental Results: Scalability of RevMax-S Varying Number of Seed Nodes A. Khan, B. Zehnder, D. Kossmann

Varying Number of Campaigners 19/20 Conclusions Hosts revenue maximization by viral marketing novel problem NP-hard, neither monotonic, nor sub-modular Algorithms - RevMax-C [approximation guarantees under additional constraints] - RevMax-S [more efficient greedy heuristic] RevMax-C usually outperforms RevMax-S by 5~10% in terms of hosts revenue. RevMax-S scalable for more number of seeds and campaigners Future Work: more efficient algorithms, how the campaigner divides her budget optimally? A. Khan, B. Zehnder, D. Kossmann 20/20 Questions? A. Khan, B. Zehnder, D. Kossmann