RECOMBINOMICS: Myth or Reality?
Laxmi Parida
IBM Watson Research, New York, USA
IBM Computational Biology Center

RoadMap
1. Motivation
2. Reconstructability (Random Graphs Framework)
3. Reconstruction Algorithm (DSR Algorithm)
4. Conclusion

www.nationalgeographic.com/genographic
www.ibm.com/genographic

A five-year study, launched in April 2005, to address anthropological questions on a global scale using genetics as a tool.

Although fossil records fix human origins in Africa, little is known about the great journey that took Homo sapiens to the far reaches of the earth. How did we, each of us, end up where we are? This is a phylogeographic question.

Samples from all around the world are being collected, and the mtDNA and Y-chromosome are being sequenced and analyzed.

DNA material in use under unilinear transmission
(figure labels: 16000 bp, 58 million bp, 0.38%)

Missing information in unilinear transmissions
(figure; axis labels: past, present)

Paradigm Shift in Locus & Analysis
Using recombining DNA sequences

Why? Nonrecombining DNA gives a partial story:
1. represents only a small part of the genome
2. behaves as a single locus
3. unilinear (exclusively male or female) transmission

Recombining: towards more complete information

Challenges
Computationally very complex
How to comprehend complex reticulations?

RoadMap
1. Motivation
2. Reconstructability (Random Graphs Framework)
3. Reconstruction Algorithm (DSR Algorithm)
4. Conclusion

L Parida, Pedigree History: A Reconstructability Perspective using Random-Graphs Framework, under preparation.

RoadMap
1. Motivation
2. Reconstructability (Random Graphs Framework)
3. Reconstruction Algorithm (DSR Algorithm)
4. Conclusion

L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 122, 2008
L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics 2009

INPUT: Chromosomes (haplotypes)
OUTPUT: Recombinational Landscape (Recotypes)

Our Approach
(flowchart) Granularity g (statistical) -> Acceptable p-value? NO: back to Granularity g; YES: IRiS (combinatorial) -> Analyze Results (statistical)

M Mele, A Javed, F Calafell, L Parida, J Bertranpetit and Genographic Consortium, Recombination-based genomics: a genetic variation analysis in human populations, under submission.
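The flowchart for this approach can be read as a simple loop: try a granularity g, estimate a p-value, and move on to the combinatorial stage only once the p-value is acceptable. The sketch below illustrates just that control flow. The function name, the `alpha` threshold of 0.05, and the toy p-values are assumptions for illustration and do not come from the slides.

```python
from typing import Callable, Iterable, Optional


def choose_granularity(
    candidates: Iterable[int],
    p_value_for: Callable[[int], float],
    alpha: float = 0.05,  # assumed acceptance threshold, not from the slides
) -> Optional[int]:
    """Return the first candidate g whose p-value is acceptable, else None."""
    for g in candidates:
        if p_value_for(g) <= alpha:
            return g  # "YES" branch: hand this g to the next stage
        # "NO" branch: fall through and try the next granularity
    return None


# Toy p-values standing in for a real statistical test (hypothetical numbers).
toy_p_values = {1: 0.40, 2: 0.20, 3: 0.01, 4: 0.001}
chosen = choose_granularity([1, 2, 3, 4], toy_p_values.__getitem__)
print(chosen)  # 3
```

Returning `None` when no candidate passes keeps the "no acceptable granularity" case explicit instead of silently proceeding.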

Preprocess: Dimension reduction via Clustering
(figure: numbered cluster labels)

Analysis Flow
(flowchart) Granularity g (statistical) -> Acceptable p-value? NO: back to Granularity g; YES: IRiS (combinatorial) -> Analyze Results (statistical)

p-value Estimation
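The extracted slides do not preserve the test statistic or the randomization schemes that were compared. As a rough illustration of how a p-value can be estimated by randomization, the sketch below shuffles one SNP column against another and counts how often the shuffled data looks at least as extreme as the observed data. The `agreement` statistic, the shuffling scheme, and the function names are all assumptions made for this example.

```python
import random


def agreement(col_a, col_b):
    """Toy statistic: number of haplotypes with the same allele at both SNPs."""
    return sum(a == b for a, b in zip(col_a, col_b))


def permutation_p_value(col_a, col_b, n_perm=999, seed=0):
    """Monte Carlo p-value for the agreement statistic.

    col_b is shuffled across haplotypes, which breaks any association
    with col_a while keeping its allele frequencies fixed.
    """
    rng = random.Random(seed)
    observed = agreement(col_a, col_b)
    shuffled = list(col_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if agreement(col_a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


linked_a = [0] * 10 + [1] * 10
linked_b = [0] * 10 + [1] * 10   # identical to linked_a: strong association
unlinked_b = [0, 1] * 10         # alternating alleles: no association

p_linked = permutation_p_value(linked_a, linked_b)
p_unlinked = permutation_p_value(linked_a, unlinked_b)
```

With these toy columns, `p_linked` comes out small and `p_unlinked` large, which is the behavior an "acceptable p-value" check would rely on.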

Comparison of the Randomization Schemes

SNP Blocks (granularity g=3)
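One way to picture SNP blocks at granularity g=3 is to cut each haplotype into consecutive groups of three SNPs and treat each distinct group as a pattern. The sketch below assumes non-overlapping consecutive blocks and haplotypes written as strings of 0/1 alleles; the slides do not state how blocks are actually formed, and the function names are hypothetical.

```python
def snp_blocks(haplotype: str, g: int = 3):
    """Split a haplotype string into consecutive blocks of g SNPs.

    A trailing block shorter than g is kept as-is (an assumption).
    """
    return [haplotype[i:i + g] for i in range(0, len(haplotype), g)]


def block_pattern_ids(haplotypes, g: int = 3):
    """For each block, number the distinct patterns in order of first appearance.

    Returns one list per block, with one pattern id per haplotype.
    Assumes all haplotypes have the same length.
    """
    blocked = [snp_blocks(h, g) for h in haplotypes]
    ids = []
    for b in range(len(blocked[0])):
        seen = {}
        ids.append([seen.setdefault(row[b], len(seen)) for row in blocked])
    return ids


haplotypes = ["011010", "011110", "000010"]
print(snp_blocks(haplotypes[0]))      # ['011', '010']
print(block_pattern_ids(haplotypes))  # [[0, 0, 1], [0, 1, 0]]
```

Working with pattern ids rather than raw alleles reduces each haplotype to one symbol per block, which is the kind of dimension reduction a larger g buys.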

Analysis Flow
(flowchart) Granularity g (statistical) -> Acceptable p-value? NO: back to Granularity g; YES: IRiS (combinatorial) -> Analyze Results (statistical)

IRiS (Identifying Recombinations in Sequences)
1. Stage haplotypes: use SNP block patterns (biological insights)
2. Segment along the length: infer trees (computational insights)
3. Infer network (ARG)

L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 122, 2008

Segmentation
12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345
11111111111111111111111111111111111111112222222222222222222222222222222222233333333344444444455555555555555----
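The segmentation slide shows a position ruler above a row of segment labels, one label per position. A label row like that can be turned into explicit segment boundaries by collapsing runs of equal labels. The sketch below uses a short made-up label string rather than the row from the slide, reports zero-based inclusive positions, and assumes that `-` marks positions outside any segment.

```python
from itertools import groupby


def segments(labels: str, unassigned: str = "-"):
    """Collapse a per-position label string into (label, start, end) runs.

    Positions are zero-based and inclusive. Runs of the `unassigned`
    character are skipped (treating '-' this way is an assumption).
    """
    out = []
    pos = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != unassigned:
            out.append((label, pos, pos + length - 1))
        pos += length
    return out


example = "1112222-33"  # hypothetical label row
print(segments(example))  # [('1', 0, 2), ('2', 3, 6), ('3', 8, 9)]
```

Each resulting interval is a stretch along the sequence for which a single tree would then be inferred.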

Segmentation (continued)

Consensus of Trees

Algorithm Design
1. Ensure compatibility of component trees
2. Parsimony model: minimize the number of recombinations

Theorem: The problem is NP-Hard. Unless P = NP, no polynomial-time algorithm can guarantee an optimal solution.

DSR Scheme (Dominant-Subdominant-Recombinant)

DSR Scheme: Level 1 (figure)

DSR Assignment Rules
1. Each row and each column has at most one D; if it has no D, it has at most one S.
2. A non-R entry can have other non-R entries either in its row or in its column, but not both.

DSR Scheme: Level 1 (figure, continued)
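The two DSR assignment rules are mechanical enough to check in code. The sketch below validates a grid of D/S/R labels against a literal reading of those rules. It assumes the grid is given as a list of equal-length strings and takes no position on what the rows and columns represent, since the slides' figures are not preserved. The function name and the violation messages are hypothetical.

```python
def check_dsr(grid):
    """Return a list of rule violations for a grid of 'D', 'S', 'R' labels."""
    rows = [list(r) for r in grid]
    cols = [list(c) for c in zip(*rows)]
    violations = []

    # Rule 1: at most one D per row/column; with no D, at most one S.
    for kind, lines in (("row", rows), ("column", cols)):
        for i, line in enumerate(lines):
            d, s = line.count("D"), line.count("S")
            if d > 1:
                violations.append(f"{kind} {i}: more than one D")
            elif d == 0 and s > 1:
                violations.append(f"{kind} {i}: no D and more than one S")

    # Rule 2: a non-R entry may share its row or its column with other
    # non-R entries, but not both.
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            if cell == "R":
                continue
            in_row = any(c != "R" for k, c in enumerate(row) if k != j)
            in_col = any(c != "R" for k, c in enumerate(cols[j]) if k != i)
            if in_row and in_col:
                violations.append(f"cell ({i}, {j}): non-R in both row and column")
    return violations


print(check_dsr(["DRR", "RSR", "RRD"]))  # []
print(check_dsr(["DDR", "RRR", "RRR"]))  # ['row 0: more than one D']
print(check_dsr(["DSR", "SRR", "RRR"]))  # ['cell (0, 0): non-R in both row and column']
```

Returning the full list of violations, rather than a single boolean, makes it easier to see which rule a candidate assignment breaks.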

DSR Scheme: Level 2 (figure, two slides)
DSR Scheme: Level 3 (figure, two slides)
DSR Scheme: Level 4 (figure)

DSR Scheme: Level 5 (figure)

Mathematical Analysis: Approximation Factor
Greedy DSR Scheme
Z and Y are computable functions of the input.

L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics 2009

Analysis Flow
(flowchart) Granularity g (statistical) -> Acceptable p-value? NO: back to Granularity g; YES: IRiS (combinatorial) -> Analyze Results (statistical)

IRiS Output: RECOTYPE
Recombination vectors

    R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14
s1   1  0  0  0  1  1  1  1  0   0   0   0   1   0
s2   0  1  0  1  1  1  0  1  0   0   1   0   0   0
...

Quick Sanity Check: Ultrametric Network on RECOTYPES
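A sanity check built on recotypes would typically start from pairwise distances between the recombination vectors. The sketch below computes the Hamming distance between the two vectors s1 and s2 shown in the recotype table. Hamming distance is an assumed choice here, since the slides do not name the metric behind the ultrametric network.

```python
def hamming(u, v):
    """Number of recombination events (columns) at which two recotypes differ."""
    if len(u) != len(v):
        raise ValueError("recotypes must have the same length")
    return sum(a != b for a, b in zip(u, v))


# Recombination vectors over R1..R14, copied from the recotype table.
s1 = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
s2 = [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print(hamming(s1, s2))  # 6 (they differ at R1, R2, R4, R7, R11, R13)
```

A full distance matrix over all samples could then be passed to a standard hierarchical clustering routine to obtain an ultrametric tree for comparison.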

IRiS (Identifying Recombinations in Sequences)
1. Stage haplotypes: use SNP block patterns (biological insights)
2. Segment along the length: infer trees (computational insights)
3. Infer network (ARG)

IRiS software will be released by the end of summer 09
Asif Javed

L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 122, 2008

What's in a name?
1. Allele-frequency variation between populations is also reflected in the purely recombination-based variations
2. Detects subcontinental divide from short segments, based on population-level analysis
3. Detects populations from short segments, based on recombination-events analysis

RECOMBIN-OMICS (Jaume Bertranpetit)
RECOMBIN-OMETRICS (Robert Elston)

Are we ready for the OMICS / OMETRICS?
1. Allele-frequency variation between populations is also reflected in the purely recombination-based variations
2. Detects subcontinental divide from short segments, based on population-level analysis
3. Detects populations from short segments, based on recombination-events analysis

- population-specific signals?
- other critical signals?
- anything we didn't already know?

Thank you!!
