Overlapping languages: the splicing code
17 December 2010, Trento
Piva F, Giulietti M, Principato G
Department of Biochemistry, Biology and Genetics, Polytechnic University of Marche, Ancona
One point mutation at a time: BRCA1 exon 18
[Figure: splicing products of exons 17, 18 and 19; labels 100%, 20% and 80%; isoforms 17-18-19 and 17-19]
Goina E, Skoko N, Pagani F. Binding of DAZAP1 and hnRNPA1/A2 to an Exonic Splicing Silencer in a Natural BRCA1 Exon 18 Mutant. Mol Cell Biol 2008; 28: 3850–3860
Two point mutations at a time: BRCA1 exon 18
[Figure labels: decreased efficiency; complete exon 18 skipping]
Goina E, Skoko N, Pagani F. Binding of DAZAP1 and hnRNPA1/A2 to an Exonic Splicing Silencer in a Natural BRCA1 Exon 18 Mutant. Mol Cell Biol 2008; 28: 3850–3860
Effect of variations in CFTR exon 9
WT 5’-ACAGTTGTTGGCGGTTG-3’
[Figure: bar chart of % exon 9 inclusion (scale 10 to 100) for WT and variants 144A, 145C, 146A, 147G, 148T, 149T, 150G, 151T, 153G, 154G, 155C, 156G, 157G, with ex9 + and ex9 - bands; three variants are labelled pathological]
Pagani, F., Buratti, E., Stuani, C., and Baralle, F. E. (2003) J Biol Chem
Pagani, F., Stuani, C., Zuccato, E., Kornblihtt, A. R., and Baralle, F. E. (2003) J Biol Chem
The genetic code is degenerate, but it is not all robustness
. . . Ala Val Arg . . .
Ala: GCA, GCC, GCG, GCT
Val: GTA, GTC, GTG, GTT
Arg: CGA, CGC, CGG, CGT, AGA, AGG
4 × 4 × 6 = 96
Three AAs specified by 96 synonymous words:
GCAGTACGA GCAGTACGC GCAGTACGG GCAGTACGT GCAGTAAGA GCAGTAAGG
GCAGTCCGA GCAGTCCGC GCAGTCCGG GCAGTCCGT GCAGTCAGA GCAGTCAGG
GCAGTGCGA GCAGTGCGC GCAGTGCGG GCAGTGCGT GCAGTGAGA GCAGTGAGG
GCAGTTCGA GCAGTTCGC GCAGTTCGG GCAGTTCGT GCAGTTAGA GCAGTTAGG
GCCGTACGA GCCGTACGC GCCGTACGG GCCGTACGT GCCGTAAGA GCCGTAAGG
GCCGTCCGA GCCGTCCGC GCCGTCCGG GCCGTCCGT GCCGTCAGA GCCGTCAGG
GCCGTGCGA GCCGTGCGC GCCGTGCGG GCCGTGCGT GCCGTGAGA GCCGTGAGG
GCCGTTCGA GCCGTTCGC GCCGTTCGG GCCGTTCGT GCCGTTAGA GCCGTTAGG
GCGGTACGA GCGGTACGC GCGGTACGG GCGGTACGT GCGGTAAGA GCGGTAAGG
GCGGTCCGA GCGGTCCGC GCGGTCCGG GCGGTCCGT GCGGTCAGA GCGGTCAGG
GCGGTGCGA GCGGTGCGC GCGGTGCGG GCGGTGCGT GCGGTGAGA GCGGTGAGG
GCGGTTCGA GCGGTTCGC GCGGTTCGG GCGGTTCGT GCGGTTAGA GCGGTTAGG
GCTGTACGA GCTGTACGC GCTGTACGG GCTGTACGT GCTGTAAGA GCTGTAAGG
GCTGTCCGA GCTGTCCGC GCTGTCCGG GCTGTCCGT GCTGTCAGA GCTGTCAGG
GCTGTGCGA GCTGTGCGC GCTGTGCGG GCTGTGCGT GCTGTGAGA GCTGTGAGG
GCTGTTCGA GCTGTTCGC GCTGTTCGG GCTGTTCGT GCTGTTAGA GCTGTTAGG
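The count on this slide (4 × 4 × 6 = 96) can be reproduced with a minimal Python sketch. The codon lists are the standard synonymous codons for Ala, Val and Arg; the names `SYNONYMOUS_CODONS` and `synonymous_words` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard synonymous codons for the three amino acids on the slide.
SYNONYMOUS_CODONS = {
    "Ala": ["GCA", "GCC", "GCG", "GCT"],
    "Val": ["GTA", "GTC", "GTG", "GTT"],
    "Arg": ["CGA", "CGC", "CGG", "CGT", "AGA", "AGG"],
}


def synonymous_words(peptide):
    """Return every nucleotide word that encodes the given amino acid sequence."""
    choices = [SYNONYMOUS_CODONS[aa] for aa in peptide]
    # The Cartesian product picks one codon per position; join them into one word.
    return ["".join(codons) for codons in product(*choices)]


words = synonymous_words(["Ala", "Val", "Arg"])
print(len(words))            # 96
print(words[0], words[-1])   # GCAGTACGA GCTGTTAGG
```

All 96 words encode the same tripeptide, so any additional constraint carried by the nucleotide sequence must be read from which of these words is actually used.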
An additional exonic constraint: the splicing code
NF1 gene: cryptic exon, exon 31
ttttatagTGAGAATA, A>G (WT vs MUT)
The mutation activates a cryptic exon (in red).
Raponi M, Upadhyaya M, Baralle D. Functional splicing assay shows a pathogenic intronic mutation in neurofibromatosis type 1 (NF1) due to intronic sequence exonization. Hum Mutat. 2006; 27(3):294-295.
NF1 gene: cryptic exon, exon 31
CAGgtattg TAGataata CAAgtattg TAGgtggga
Disruption of 5’ss restores normal splicing
TAGataata CAAgtattg TAGgtggga CAAgtaagc TAGgtaata CAAgtaagg
Sequence 2 has a weaker 5’ splice site than sequence 1. Sequence 3 lacks the site.
Raponi M, Upadhyaya M, Baralle D. Functional splicing assay shows a pathogenic intronic mutation in neurofibromatosis type 1 (NF1) due to intronic sequence exonization. Hum Mutat. 2006; 27(3):294-295.
ATM gene structure
Mutations:
WT:  GGCCAGGTAAGTGATA
DEL: GGCCAG____GTGATA
MUT: GGCCAGGTCTGTGATA
Results: [Figure: gel with lanes M, WT, del, mut; exons 20 and 21]
Pagani F, Buratti E, Stuani C, Bendix R, Dörk T, Baralle FE. A new type of mutation causes a splicing defect in ATM. Nature Genetics 2002, 30: 426-429
AIM: pre-mRNA sequence → SPLICING PREDICTION TOOL → mRNA structure
A compact formalism, but…
[Figure: score matrix]
Compression and reconstruction of motifs
Experimentally assessed binding sites: AGG, AGT, CGT
zip → consensus sequence: [A/C] G [G/T]
unzip → AGG, AGT, CGG, CGT (CGG was not among the experimentally assessed sites)
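The zip/unzip step on this slide can be sketched in a few lines of Python. This is only an illustration of why a consensus is lossy, assuming equal-length sites; the function names `zip_motifs` and `unzip_motifs` are hypothetical and simply borrow the slide's labels.

```python
from itertools import product


def zip_motifs(sites):
    """Collapse equal-length sites into a consensus: the set of bases seen at each position."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    return [sorted({site[i] for site in sites}) for i in range(length)]


def unzip_motifs(consensus):
    """Expand a consensus back into every word it matches."""
    return ["".join(bases) for bases in product(*consensus)]


sites = ["AGG", "AGT", "CGT"]           # binding sites from the slide
consensus = zip_motifs(sites)           # [['A', 'C'], ['G'], ['G', 'T']]
expanded = unzip_motifs(consensus)      # ['AGG', 'AGT', 'CGG', 'CGT']
spurious = sorted(set(expanded) - set(sites))

print(consensus)
print(expanded)
print(spurious)                         # ['CGG']
```

The round trip returns four words from three inputs: the consensus treats positions as independent, so it also matches CGG, which was never observed.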
Elements promoting exons; elements promoting introns
ESE, ISS: exon
ESS, ISE: intron
PROTEINS REGULATING SPLICING STORED IN SPLICEAID 9G8, CUG-BP1, DAZAP1, ETR-3, Fox-1, Fox-2, FMRP, hnRNP A0, hnRNP A1, hnRNP A2/B1, hnRNP C, hnRNP C1, hnRNP C2, hnRNP D, hnRNP D0, hnRNP DL, hnRNP E1, hnRNP E2, hnRNP F, hnRNP G, hnRNP H1, hnRNP H2, hnRNP I (PTB), hnRNP J, hnRNP K, hnRNP L, hnRNP LL, hnRNP M, hnRNP P (TLS), hnRNP Q, hnRNP U, HTra2alpha, HTra2beta1, HuB, HuD, HuR, KSRP, MBNL1, Nova-1, Nova-2, nPTB, PSF, RBM4, RBM25, Sam68, SAP155, SC35, SF1, SF2/ASF, SLM-1, SLM-2, SRp20, SRp30c, SRp38, SRp40, SRp54, SRp55, SRp75, TDP43, TIA-1, TIAL1, YB-1, ZRANB2 …
Some comparisons among literature data (SpliceAid) and prediction tools
Columns: SEQUENCE; SPLICEAID (experimentally assessed binding); PREDICTORS (ESE Finder, Rescue ESE, Splicing Rainbow)
ACAAC: YB-1; no binding; no ESE; SRp40
GAAGAAGA: HTra2A, HTra2B1, SF2/ASF, SC35, SRp40, SRp55, SRp75; 3 ESE; Tra2B
CUGGCGUCGUCGC: SF2/ASF, SRp55; 2 ESE; SRp40, SRp55
UGACUG: hnRNP A1
UUUUAGACAA: hnRNP C1, Sam68, hnRNP A1, hnRNP D, hnRNP E1, hnRNP E2, SRp38; 1 ESE; hnRNP A2/B1, hnRNP C1/C2, hnRNP E1/E2, SRp40, SRp55, U2AF65
UGUGUGUGUGUGUGUGUG: CUG-BP1, ETR-3, TDP43; SRp55; hnRNP U
SpliceAid 2
Favoured correlations between the end of one exon and the start of the next
[Figure: three panels (exon junctions in phase 0, phase 1 and phase 2); horizontal axis with positions m from -10 to -1 and n from 1 to 10; arcs join favoured nucleotide pairs, e.g. A…A, T…T, G…G, C…C, C…G, A…T]
Correlations between the start and the end of human introns
[Figure: three panels (exons that start in phase 0 and end in phase 0; phase 0 → 1; phase 0 → 2); horizontal axis with positions 1 to 10 and -10 to -1; arcs join correlated nucleotide pairs, e.g. G…G, G…C, A…A, T…A, T…T, C…C]
Processing in progress at CASPUR, using multiprocessor ClustalW and multithreaded programming… to repeat the analyses on a set of genes with lower redundancy
[Diagram: sequences are moved one at a time from the redundant set to the non-redundant set]
Redundant: Seq 1, Seq 2, Seq 3, Seq 4, Seq 5, …, Seq N | Non-redundant: (none)
Redundant: Seq 2, Seq 3, Seq 4, Seq 5, …, Seq N | Non-redundant: Seq 1
Redundant: Seq 3, Seq 4, Seq 5, …, Seq N | Non-redundant: Seq 1, Seq 2
Redundant: Seq 4, Seq 5, …, Seq N | Non-redundant: Seq 1, Seq 2, Seq 3
Redundant: Seq 5, …, Seq N | Non-redundant: Seq 1, Seq 2, Seq 3, Seq 4
Starting from a set of 10,000 sequences, with no pruning at all, at most 49,995,000 alignments (10,000 × 9,999 / 2) would be needed
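The worst-case figure and the effect of pruning can be illustrated with a small Python sketch. It does not call ClustalW: `identity_fraction` is a crude, hypothetical stand-in for an alignment score, the 0.75 threshold is arbitrary, and `prune_redundant` is only one plausible reading of the diagram (move one sequence to the non-redundant set, then drop from the pool everything redundant with it).

```python
def max_pairwise_alignments(n):
    """Number of unordered pairs among n sequences: the worst case with no pruning."""
    return n * (n - 1) // 2


def identity_fraction(a, b):
    """Hypothetical stand-in for an alignment score: ungapped fraction of identical positions."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def prune_redundant(seqs, is_redundant):
    """Greedy filter: returns (non-redundant sequences, number of comparisons made)."""
    pool = list(seqs)
    kept = []
    comparisons = 0
    while pool:
        ref = pool.pop(0)           # move the next sequence to the non-redundant set
        kept.append(ref)
        survivors = []
        for other in pool:
            comparisons += 1        # one pairwise alignment per remaining sequence
            if not is_redundant(ref, other):
                survivors.append(other)
        pool = survivors            # redundant sequences are never aligned again
    return kept, comparisons


print(max_pairwise_alignments(10_000))  # 49995000

demo = ["AAAA", "AAAT", "CCCC", "GGGG"]
kept, used = prune_redundant(demo, lambda a, b: identity_fraction(a, b) >= 0.75)
print(kept, used)                       # ['AAAA', 'CCCC', 'GGGG'] 4
```

In the toy run, dropping AAAT after the first pass saves two of the six possible comparisons; when nothing is redundant, the loop performs exactly n(n-1)/2 comparisons.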
Other papers published or accepted in 2010:
Piva F, Giulietti M, Nardi B, Bellantuono C, Principato G. An improved in silico selection of phenotype affecting polymorphisms in SLC6A4, HTR1A and HTR2A genes. Human Psychopharmacology 2010; 25: 153-61.
Piva F, Ciaprini F, Onorati F, Benedetti M, Fattorini D, Ausili A, Regoli F. Assessing sediment hazard through a Weight Of Evidence approach with bioindicator organisms: a practical model to elaborate data from sediment chemistry, bioavailability, biomarkers and ecotoxicological bioassays. Chemosphere 2010, accepted.
Bianchi F, Raponi M, Piva F, Viel A, Bearzi I, Galizia E, Bracci R, Belvederesi L, Loretelli C, Brugiati C, Corradini F, Baralle D, Cellerino R. An intronic mutation in MLH1 associated with familial colon and breast cancer. Familial Cancer 2010, published.
Nardi B, Turchi C, Piva F, Giulietti M, Castellucci G, Arimatea E, Rocchetti D, Rocchetti G, Principato G, Tagliabracci A, Bellantuono C. Searching for a relationship between the Serotonin Receptor 2A Gene variations and the development of Inward and Outward Personal Meaning Organisations. Psychiatric Genetics 2010, accepted.

Papers submitted in 2010:
Piva F, Giulietti M, Ballone Burini A, Principato G. SpliceAid 2: a database of human splicing factors expression data and RNA target motifs.
Piva F, Giulietti M, Baldelli L, Nardi B, Bellantuono C, Armeni T, Saccucci F, Principato G. Bioinformatic analyses to select phenotype affecting polymorphisms in HTR2C gene.
Piva F, Giulietti M, Principato G. CLIP data to detect polymorphisms lying in splicing regulatory motifs: a method to refine SNP selection in association studies.
Turchi C, Piva F, Solito G, Principato G, Buscemi L, Tagliabracci A. ADH4 intronic variations are associated with alcohol dependence: results from an Italian case-control association study.
Lenzi L, Facchin F, Piva F, Giulietti M, Pelleri MC, Frabetti F, Vitale L, Casadei R, Canaider S, Bortoluzzi S, Coppe A, Danieli GA, Principato G, Ferrari S, Strippoli P. TRAM (Transcriptome Mapper): database-driven creation and analysis of transcriptome maps from multiple sources.
Facchin F, …, Piva F, … Complexity of bidirectional transcription and alternative splicing at human RCAN3 locus.
Giovanni Principato Francesco Piva Matteo Giulietti