Advanced Bioinformatics
Medicago Basic project: Study gene expression under a single condition

Team members
Jente, Lifei, Yuebang, Nick

Our chosen eukaryotic organism
Yeast

Input data
  • Fastq files as sequence data
  • Genome.fa file as a reference genome
  • Genes.gtf

Tophat, Cufflinks and Cuffmerge
  • Genes.gtf, genome.* and the fastq files are used by Tophat to generate .bam files.
  • The accepted_hits.bam is used by Cufflinks to generate a file called transcripts.gtf.
  • Because the experiment was done in triplicate, we get 3 transcripts.gtf files. These are merged with Cuffmerge.

gtf_to_fasta

With the program gtf_to_fasta we create a FASTA file that contains all the transcripts with their sequences.
So now we have a FASTA and a GTF file to extract data from with the help of programs and scripts.

The Big Hash Table
From the FASTA we use/determine:
  • Gene_id
  • Sequence length
  • GC content
  • Codon usage
From the GTF we use/determine:
  • Gene_id
  • Expression level
  • Inter-transcript size
  • Intron length

Reading gtf file: Sort top 100 expressed genes

From the GTF we use/determine:
  • Gene_id
  • Expression level
  • Inter-transcript size
  • Intron length
Key points:
  • First sort, then take the top 100 genes.
  • Build a hash table: gene_id (keys) to FPKM, intron length and inter-transcript size (values).
  • Using an array: Gene_ID and FPKM are in seq[8].
  • Inter-transcript size: use defined($seq2[1]).
  • Intron length: divided into different conditions (subroutines). After reading the next transcript line, calculate the last intron length.
  • Important: hash table matching!
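
The key points above can be sketched in Perl roughly as follows. This is a minimal sketch, not the team's script: it assumes Cufflinks-style GTF lines with gene_id, transcript_id and FPKM in the attribute column (index 8), takes a gene's level to be its highest transcript FPKM, and assumes exons are listed in ascending order. The names parse_gtf and top_genes and the demo rows are hypothetical.

```perl
use strict;
use warnings;

# Parse Cufflinks-style GTF lines into a hash keyed by gene_id.
# Each value holds the gene's FPKM and the lengths of its introns.
sub parse_gtf {
    my (@lines) = @_;
    my (%gene, %prev_exon_end);
    for my $line (@lines) {
        chomp(my $row = $line);
        next if $row =~ /^#/ || $row !~ /\S/;
        my @col = split /\t/, $row;
        next unless @col >= 9;
        my ($feature, $start, $end, $attr) = @col[2, 3, 4, 8];

        # gene_id and FPKM sit in the attribute column, i.e. index 8.
        my ($gene_id) = $attr =~ /gene_id "([^"]+)"/       or next;
        my ($tx_id)   = $attr =~ /transcript_id "([^"]+)"/ or next;
        my $g = $gene{$gene_id} //= { fpkm => 0, introns => [] };

        if ($feature eq 'transcript') {
            my ($fpkm) = $attr =~ /FPKM "([^"]+)"/ or next;
            # Assumption: a gene's level is its highest transcript FPKM.
            $g->{fpkm} = $fpkm if $fpkm > $g->{fpkm};
        }
        elsif ($feature eq 'exon') {
            # Assumption: exons of a transcript come in ascending order.
            my $prev = $prev_exon_end{$tx_id};
            push @{ $g->{introns} }, $start - $prev - 1 if defined $prev;
            $prev_exon_end{$tx_id} = $end;
        }
    }
    return \%gene;
}

# First sort on FPKM (high to low), then take the first $n gene IDs.
sub top_genes {
    my ($gene, $n) = @_;
    my @ids = sort { $gene->{$b}{fpkm} <=> $gene->{$a}{fpkm} || $a cmp $b }
              keys %$gene;
    $n = @ids if $n > @ids;
    return @ids[0 .. $n - 1];
}

# Hypothetical demo rows; real input would come from transcripts.gtf.
my @demo = map { join("\t", @$_) . "\n" } (
    [qw(chrI Cufflinks transcript 100 900 1000 + .),
        'gene_id "G1"; transcript_id "G1.1"; FPKM "50.0";'],
    [qw(chrI Cufflinks exon 100 300 1000 + .),
        'gene_id "G1"; transcript_id "G1.1";'],
    [qw(chrI Cufflinks exon 401 900 1000 + .),
        'gene_id "G1"; transcript_id "G1.1";'],
    [qw(chrI Cufflinks transcript 2000 2500 1000 + .),
        'gene_id "G2"; transcript_id "G2.1"; FPKM "120.5";'],
    [qw(chrI Cufflinks exon 2000 2500 1000 + .),
        'gene_id "G2"; transcript_id "G2.1";'],
);

my $genes = parse_gtf(@demo);
for my $id (top_genes($genes, 100)) {
    print join("\t", $id, $genes->{$id}{fpkm}, "@{ $genes->{$id}{introns} }"), "\n";
}
```

With a real file, the lines would be passed in as parse_gtf(<$fh>). Inter-transcript size could be tracked the same way, by remembering the end of the previous transcript line.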

Why do we need to analyse FPKM, intron length and inter-transcript size (correlation)?
  • FPKM: gene expression level
  • Intron length: positively related to gene expression level
  • Inter-transcript size: gene density

Reading the fasta file
The important information is the sequence. From this, GC content, codon usage etc. can be determined. To couple this information to the GTF output, we analyse the ID as well.

Reading the fasta file
The analysis was performed by reading the file line by line, just like in the exercises. Then the ID was extracted from the first line and saved in a hash table. Normally a hash table holds only one value per key, but we managed to put arrays in these values.

Reading the fasta file
>xxxxx 1:783285 gene_id etcetcetcetcetcetc.
AGCTGCTAGGCTGCGCATCGTGAGCTGCCTTG
%hesh ID; seqLength, GC_content, codonUsage
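
Reading the FASTA file line by line into a hash of arrays could look roughly like this. It is a sketch under assumptions, not the team's code: the ID is taken to be the first word of the header line, codons are counted in reading frame 0 only, and the sub names read_fasta, gc_content, codon_usage and build_table are hypothetical.

```perl
use strict;
use warnings;

# Collect sequences from FASTA lines, keyed by the first word of each header.
sub read_fasta {
    my (@lines) = @_;
    my (%seq, $id);
    for my $line (@lines) {
        (my $row = $line) =~ s/\s+$//;    # strip newline and trailing blanks
        next unless length $row;
        if ($row =~ /^>(\S+)/) {
            $id = $1;
            $seq{$id} = '';
        }
        elsif (defined $id) {
            $seq{$id} .= uc $row;
        }
    }
    return \%seq;
}

# Percentage of G and C bases in a sequence.
sub gc_content {
    my ($seq) = @_;
    return 0 unless length $seq;
    my $gc = ($seq =~ tr/GC//);
    return 100 * $gc / length($seq);
}

# Count codons in reading frame 0, stopping at the first non-ACGT triplet.
sub codon_usage {
    my ($seq) = @_;
    my %count;
    $count{$1}++ while $seq =~ /\G([ACGT]{3})/g;
    return \%count;
}

# Hash of arrays: ID => [ seqLength, GC_content, codonUsage ].
sub build_table {
    my ($seqs) = @_;
    my %table;
    for my $id (keys %$seqs) {
        my $s = $seqs->{$id};
        $table{$id} = [ length($s), gc_content($s), codon_usage($s) ];
    }
    return \%table;
}

# Hypothetical demo record; real input would come from the gtf_to_fasta output.
my $table = build_table(read_fasta(
    ">T1 1:100-111 gene_id G1\n", "ATGGCC\n", "GGTTAA\n",
));
for my $id (sort keys %$table) {
    my ($len, $gc, $codons) = @{ $table->{$id} };
    printf "%s\tlength=%d\tGC=%.1f%%\tcodons=%d\n",
        $id, $len, $gc, scalar keys %$codons;
}
```

Storing an array reference as the hash value is what lets one key carry several values at once.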

Combine the best of both!
The array values from the %gtf hash table are pushed into the %fasta hash table.
For example:
    my $newval  = $gtf{$i}[0];
    my $newval2 = $gtf{$i}[1];
    push @{ $fasta{$ID} }, "$newval\t", "$newval2\t";    # hash table
In this way we obtained a table that contained:
ID; length, CUP, GC, TSp, TEp, ITL, Intron size(s)
We give options to show a variable number of genes and to sort on specific parameters.

Now Jente will unleash his package
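
The options for showing a variable number of genes and sorting on a chosen parameter might be wired up along these lines. This is only a sketch: the option names --n and --sort, the column layout in %COLUMN, the sub top_ids and the toy rows are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical column layout of each row (array ref) in the combined table.
my %COLUMN = (length => 0, gc => 1, fpkm => 2);

# Return up to $n IDs, ordered from high to low on the chosen column.
sub top_ids {
    my ($table, $sort_by, $n) = @_;
    my $col = $COLUMN{$sort_by};
    die "unknown sort key '$sort_by'\n" unless defined $col;
    my @ids = sort { $table->{$b}[$col] <=> $table->{$a}[$col] || $a cmp $b }
              keys %$table;
    splice @ids, $n if $n < @ids;    # keep only the first $n IDs
    return @ids;
}

# Toy rows standing in for the combined table: [ length, GC, FPKM ].
my %table = (
    G1 => [  900, 41.2,  50.0 ],
    G2 => [  500, 38.5, 120.5 ],
    G3 => [ 1500, 44.0,   5.3 ],
);

my ($n, $sort_by) = (100, 'fpkm');    # defaults when no options are given
GetOptions('n=i' => \$n, 'sort=s' => \$sort_by)
    or die "usage: $0 [--n N] [--sort length|gc|fpkm]\n";

for my $id (top_ids(\%table, $sort_by, $n)) {
    print join("\t", $id, @{ $table{$id} }), "\n";
}
```

Keeping the column positions in one lookup hash means a new sortable parameter only needs one extra entry.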

Package: Jente

My Package
  • Codon Usage Bias
  • R: correlations
Jente

Codon Usage Bias
  • Relative Synonymous Codon Usage (RSCU)
  • Effective Number of Codons (NC)

Codon Usage Bias
RSCU
  • Not in pipeline
  • Optional subroutine

Codon Usage Bias
NC = 2 + + +
Only possible for sequences that use all amino acids

Codon Usage Proportion (CUP)
CUP =
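
For reference, the two published measures named on the Codon Usage Bias slides are usually defined as below: RSCU following Sharp and Li, and NC following Wright. The notation is an assumption and is not taken from the slides. Here x_{ij} is the count of the j-th codon of amino acid i, and n_i is the number of synonymous codons for that amino acid. For the homozygosity F of one amino acid, n is the total count of its codons, k its number of synonymous codons and p_m the frequency of codon m. \bar{F}_d is the average F over all amino acids with d synonymous codons.

```latex
\[
\mathrm{RSCU}_{ij} = \frac{x_{ij}}{\dfrac{1}{n_i}\sum_{m=1}^{n_i} x_{im}}
\]
\[
F = \frac{n\sum_{m=1}^{k} p_m^{2} - 1}{n - 1},
\qquad
N_C = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3}
        + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}
\]
```

Because F cannot be computed for an amino acid that does not occur in the sequence, this fits the remark that NC is only possible for sequences that use all amino acids.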

R: Correlations
FPKM vs. GC: R = -0.1205

R: Correlations
R = -0.1220
Highly expressed genes have a more extreme codon bias
tRNAs?

R: Correlations
R = -0.1282
Highly expressed genes are smaller
More efficient?

R: Correlations
R = 0.9588
Longer genes use more codons...

Visualize highly expressed genes in the interaction network

What are Networks?
  • A map of interactions or relationships
  • A collection of nodes and links (edges)

Why Network?
  • Predict protein function through identification of partners
  • A protein's relative position in a network
  • Mechanistic understanding of the gene-function & phenotype association

Visualize highly expressed genes in the interaction network

Interaction network (1)
Download Yeast Interactome:
http://interactome.dfci.harvard.edu/S_cerevisiae/index.php?page=download
http://www.yeastnet.org/data/

Interaction network (2)
Running Cytoscape and importing the yeast interactome

Interaction network (3)
Visual analysis of the interaction network

Interaction network (4)
Visualize the highly expressed genes in the interaction network

Interaction network (5)

Interaction network (6)
Top 100 genes interactome data

Interaction network (7)

Interaction network (8)

Interaction network (9)

Interaction network (10)
Visualize the highly expressed genes in the interaction network

Interaction network (11)
Interaction network of the top 100 interactome data

Interaction network (12)

GO graph (1)
Install BiNGO

GO graph (2)
Import the top 100 expressed genes and start BiNGO

GO graph (3)

Conclusion
  • In the CCSB-YI1 file, 8 of the top 100 highly expressed genes are found, and there is no direct interaction among them in the interaction network.
  • The GO terms confirm that highly expressed genes are related to protein production.

Thank you forever
