Next Generation Sequencing

Presentation transcript:

Next Generation Sequencing Giulio Pavesi University of Milano giulio.pavesi@unimi.it

Next generation sequencing vs Sanger sequencing http://en.wikipedia.org/wiki/DNA_sequencing

Next Generation Sequencing Applications: De novo genome sequencing; Genome resequencing to identify variants; Metagenomics; Transcriptome sequencing and quantification; Sequencing of DNA/RNA “samples” (extracted according to different criteria)

“Epigenetics” Epigenetics (from the Greek επί, epì = "above" and γεννετικός, gennetikòs = "relating to family inheritance") refers to changes that affect the phenotype without altering the genotype; it is the branch of genetics that describes all heritable modifications that change gene expression without altering the DNA sequence. So what does DNA sequencing have to do with something that does *not* concern the DNA sequence?

“Nucleosome” The nucleosome core particle consists of approximately 147 base pairs of DNA wrapped in 1.67 left-handed superhelical turns around a histone octamer Octamer: 2 copies each of the core histones H2A, H2B, H3, and H4 Core particles are connected by stretches of "linker DNA", which can be up to about 80 bp long

The histone code Example: H3K4me3. H3 is the histone. K4 is the modified residue and its position (K, lysine, at position 4 of the sequence). me3 is the modification (three methyl groups attached to K4). If there is no number at the end, as in H3K9ac, only one group is attached
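
The naming scheme above is regular enough to be split mechanically. The following is a minimal sketch, assuming names always follow the histone/residue/position/modification/count layout shown on the slide; the function name `parse_histone_mark` and the regular expression are illustrative and not part of the original material.

```python
import re

# Assumed layout: histone (H3, H4, H2B...), one-letter residue, position,
# lowercase modification code (me, ac...), optional number of groups.
MARK_RE = re.compile(r"^(H[0-9][A-Z]?)([A-Z])([0-9]+)([a-z]+)([0-9]*)$")

def parse_histone_mark(name):
    """Split a mark name such as 'H3K4me3' into its components."""
    match = MARK_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised histone mark name: {name!r}")
    histone, residue, position, modification, count = match.groups()
    return {
        "histone": histone,
        "residue": residue,
        "position": int(position),
        "modification": modification,
        # No trailing number (as in H3K9ac) means a single group.
        "groups": int(count) if count else 1,
    }

print(parse_histone_mark("H3K4me3"))
print(parse_histone_mark("H3K9ac"))
```

Histone variants such as H2A.Z are not modification names and are deliberately rejected by this pattern.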

Different chromatin states Chromatin structure (and thus, gene expression) depends also on the post-translational modifications associated with the histones forming nucleosomes

“ChIP” If we have the “right” antibody, we can extract (“immunoprecipitate”) from living cells the protein of interest bound to the DNA And - we can try to identify which were the DNA regions bound by the protein Can be done for transcription factors But can be done also for histones - and separately for each modification

ChIP-Seq Histone ChIP TF ChIP

Many cells- many copies of the same region bound by the protein

After ChIP Size selection: only fragments of the “right size” (200 bp) are kept Identification of the DNA fragment bound by the protein Sequencing

So - if we found that a region has been sequenced many times, then we can suppose that it was bound by the protein, but…

Only a short fragment of the extracted DNA region can be sequenced, at either or both ends (“single” vs “paired end” sequencing) for no more than 35 (before) / 50 (yesterday) / 100 (now) bps Thus, original regions have to be “reconstructed”
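
One common way to "reconstruct" the original region from a single-end read is to extend the read to the expected fragment size in the direction of its strand. The sketch below assumes 0-based, half-open coordinates, a read described by its leftmost mapped position, and the 200 bp size-selected fragments mentioned earlier; `extend_read` is a hypothetical helper, not a function from any specific tool.

```python
def extend_read(start, read_length, strand, fragment_length=200):
    """Estimate the (start, end) interval of the fragment a read came from.

    start is the leftmost mapped coordinate of the read (0-based, half-open).
    Reads on '+' mark the left end of the fragment, reads on '-' the right end.
    """
    if strand == "+":
        return start, start + fragment_length
    if strand == "-":
        end = start + read_length
        # Clamp at zero so fragments never run off the chromosome start.
        return max(0, end - fragment_length), end
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")

print(extend_read(1000, 35, "+"))
print(extend_read(1000, 35, "-"))
```

With paired-end data the two mates delimit the fragment directly, so no extension is needed.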

Read Mapping Each sequence read has to be assigned to its original position in the genome. A typical ChIP-Seq experiment produces from 6 million (before) to 100 million (now) reads of 50-70 or more base pairs for each sequencing “lane” (Solexa/Illumina). Efficient “sequence mappers” exist for aligning NGS reads against the genome

Read Mapping “Typical” Output
@12_10_2007_SequencingRun_3_1_119_647
(actual sequence) TTTGAATATATTGAGAAAATATGACCATTTTT
+12_10_2007_SequencingRun_3_1_119_647
(“quality” scores) 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 39 27 40 40 4 27 40
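
A record like the one above can be read with a few lines of code. This is a minimal sketch, assuming the four-line layout shown on the slide with space-separated numeric quality scores (modern FASTQ files usually encode qualities as ASCII characters instead); the function names and the threshold of 20 are illustrative assumptions.

```python
def parse_record(lines):
    """Parse a four-line read record with space-separated numeric qualities."""
    header, sequence, plus_line, quality_line = [line.strip() for line in lines]
    if not header.startswith("@") or not plus_line.startswith("+"):
        raise ValueError("record must start with '@' and have a '+' separator line")
    qualities = [int(q) for q in quality_line.split()]
    if len(qualities) != len(sequence):
        raise ValueError("one quality score is expected per base")
    return header[1:], sequence, qualities

def low_quality_positions(qualities, threshold=20):
    """Return 0-based positions whose score is below the (assumed) threshold."""
    return [i for i, q in enumerate(qualities) if q < threshold]

record = [
    "@12_10_2007_SequencingRun_3_1_119_647",
    "TTTGAATATATTGAGAAAATATGACCATTTTT",
    "+12_10_2007_SequencingRun_3_1_119_647",
    "40 " * 25 + "39 27 40 40 4 27 40",
]
name, sequence, qualities = parse_record(record)
print(name, len(sequence), low_quality_positions(qualities))
```

Here the single score of 4 near the end of the read is the only base flagged as unreliable.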

“Peak finding” The critical part of any ChIP-Seq analysis is the identification of the genomic regions that produced a significantly high number of sequence reads, corresponding to the region where the protein (nucleosome) of interest was bound to DNA Since a graphical visualization of the “piling” of read mapping on the genome produces a “peak” in correspondence of these regions, the problem is often referred to as “peak finding” A “peak” then marks the region that was enriched in the original DNA sample
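
The "piling" idea can be illustrated with a toy example. The sketch below computes per-base coverage from fragment intervals and reports every maximal run of positions whose coverage reaches a fixed height. It is only an illustration under simple assumptions (one chromosome, half-open intervals, a hand-picked threshold `min_height`); real peak finders assess significance statistically rather than with a fixed cutoff.

```python
from collections import Counter

def pileup(fragments):
    """Per-base coverage from half-open (start, end) fragment intervals."""
    coverage = Counter()
    for start, end in fragments:
        for pos in range(start, end):
            coverage[pos] += 1
    return coverage

def call_peaks(coverage, min_height):
    """Return (start, end, height) for each maximal run with coverage >= min_height."""
    peaks = []
    run_start = None
    previous = None
    height = 0
    for pos in sorted(p for p, c in coverage.items() if c >= min_height):
        if run_start is None or pos != previous + 1:
            # A gap closes the current run (if any) and opens a new one.
            if run_start is not None:
                peaks.append((run_start, previous + 1, height))
            run_start, height = pos, 0
        height = max(height, coverage[pos])
        previous = pos
    if run_start is not None:
        peaks.append((run_start, previous + 1, height))
    return peaks

fragments = [(0, 10), (5, 15), (8, 12), (100, 110)]
print(call_peaks(pileup(fragments), min_height=2))
```

Each reported tuple gives the peak width (end minus start) and its height, the two quantities one typically inspects first.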

“Peak finding” Peaks: How tall? How wide? How much enriched?

“Peak finding” The main issue: the DNA sample sequenced (apart from sequencing errors/artifacts) contains a lot of “noise” Sample “contamination” - the DNA of the PhD student performing the experiment DNA shearing is not uniform: open chromatin regions tend to be fragmented more easily and thus are more likely to be sequenced Repetitive sequences might be artificially enriched due to inaccuracies in genome assembly Amplification pushed too much: you see a single DNA fragment amplified, not enriched As yet unknown problems, that anyway seem to produce “noisy” sequencings and screw the experiment up

ChIP-Seq histone data Histone modifications tend to be located at preferred locations with respect to gene annotations/transcribed regions. Hence, enrichment can be assessed in two ways: enrichment with respect to a control experiment, with peak identification; “local” enrichment in given regions with respect to gene annotations: promoters (active/non-active), upstream of transcribed/non-transcribed genes, within transcribed/non-transcribed regions, enhancers, whatever else
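
Enrichment with respect to a control can be expressed as a ratio of read counts in the same region, after scaling the control to the sequencing depth of the ChIP sample. The sketch below is one simple way to do this, not the method of any particular tool; the pseudocount (an assumption added here) avoids division by zero in regions the control does not cover.

```python
def fold_enrichment(chip_count, control_count, chip_total, control_total,
                    pseudocount=1.0):
    """Ratio of ChIP to control reads in a region, normalised by library size.

    chip_total and control_total are the total mapped reads of each experiment.
    """
    scale = chip_total / control_total  # bring the control to ChIP depth
    return (chip_count + pseudocount) / (control_count * scale + pseudocount)

# Same depth in both experiments: 99 ChIP reads vs 9 control reads.
print(fold_enrichment(99, 9, 1_000_000, 1_000_000))
```

The same function can be applied to called peaks or to annotation-based regions such as promoters and gene bodies.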

Experiment Perform ChIP-Seq for several histone modifications, starting from the most “classic” ones. Check: whether each modification has a “preferred” location on the genome or relative to genes (e.g. in the promoter, in the transcribed region, etc.); whether each modification is “correlated” in some way with gene transcription/expression

Genome-wide histone modification maps through ChIP-Seq Barski et al., Cell 129:823-837, 2007. 20 histone lysine and arginine methylations in CD4+ T cells: H3K27 H3K9 H3K36 H3K79 H3R2 H4K20 H4R3 H2BK5 Plus: Pol II binding, H2A.Z (replaces H2A in some nucleosomes), insulator-binding protein (CTCF)

Genome wide histone modifications maps through ChIP-Seq

Experiment ChIP-Seq for a particular modification (e.g. H3K4me3). Question: can the modification be “correlated” with gene transcription? That is, does the modification “mark” particular nucleosomes relative to the transcription start or to the transcribed region? Example: there could be modifications that: mark the start of transcription; mark all and only the transcribed region; “silence” particular gene loci by preventing transcription; have nothing to do with transcription itself and are located elsewhere

Experiment Sequences obtained from ChIP-Seq for the modification under study. Input: genomic coordinates of the positions where each sequence maps (see example file). Input: genomic coordinates of the annotated RefSeq genes. A nucleosome marked by the modification should correspond to a “pile” of overlapping reads (a “peak”). We count, nucleosome by nucleosome, how tall the “pile” is, i.e. how many reads can be assigned to the nucleosome

Example: if the modification were found in the nucleosome upstream of the TSS of transcribed genes, we would see a “pile” like this (figure labels: Modification, Nucleosome)

Example: if the modification were found in the nucleosomes associated with transcribed regions, we would see “piles” like these (figure labels: Modification, Nucleosome)

“Transcription starts” Laboratory techniques such as “CAGE” (Cap Analysis of Gene Expression) make it possible to: map the 5’ end of RNAs on the genome exactly, i.e. locate the exact TSSs; quantify the level of transcript produced from each of the identified TSSs. Since we are looking for the precise location of histone modifications relative to the TSSs, it is important to locate the TSSs precisely as well

Analysis: first example Input: sorted list of the genomic coordinates of the TSSs, with their transcript levels; sorted list of the genomic coordinates where each ChIP-Seq sequence maps. Output: compute the distribution (the “piles”) relative to the TSSs. Split the TSSs by transcript level: transcribed genes; weakly transcribed genes; NON-transcribed genes. Then check whether there are evident differences depending on whether the TSS is actually transcribed or not. Compare the results for the histone modification with a control experiment
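
The split of TSSs into three classes can be sketched as follows. The record layout `(position, strand, level)` and the cutoff `high=10.0` are assumptions made for illustration; the slides do not state which thresholds were used.

```python
def split_by_transcript_level(tss_records, high=10.0):
    """Group (position, strand, level) records into three expression classes.

    The 'high' cutoff is a hypothetical value; choose it from the data.
    """
    groups = {"transcribed": [], "weakly transcribed": [], "not transcribed": []}
    for position, strand, level in tss_records:
        if level <= 0:
            key = "not transcribed"
        elif level < high:
            key = "weakly transcribed"
        else:
            key = "transcribed"
        groups[key].append((position, strand))
    return groups

records = [(100, "+", 50.0), (200, "-", 2.0), (300, "+", 0.0)]
print(split_by_transcript_level(records))
```

Each group can then be profiled separately and compared against the same profile computed on the control experiment.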

Algorithm! (figure: window from -1000 to +1000 around the TSS) For each TSS, compute how many sequences map between -1000 and +1000 bp relative to the TSS. Count how many sequences map at -1000, -999, -998, ..., -1, 0, +1, +2, ..., +998, +999, +1000. Sum the counts at each distance (-1000, -999, -998, ..., -1, 0, +1, +2, ..., +998, +999, +1000) over all TSSs

Warning! (figure: the -1000 to +1000 window around the TSS, drawn once for each direction of transcription) The coordinates relative to the TSS depend on the direction of transcription!
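
The counting procedure, including the strand correction, can be sketched as below. It assumes a single chromosome, a sorted list of read positions (so binary search can find the reads in each window), and TSSs given as `(position, strand)` pairs; `tss_profile` is a name chosen here for illustration.

```python
from bisect import bisect_left, bisect_right

def tss_profile(tss_list, read_positions, window=1000):
    """Sum read counts at each distance from -window to +window over all TSSs.

    read_positions must be sorted. Index i of the result corresponds to
    distance i - window from the TSS, in the direction of transcription.
    """
    profile = [0] * (2 * window + 1)
    for tss, strand in tss_list:
        lo = bisect_left(read_positions, tss - window)
        hi = bisect_right(read_positions, tss + window)
        for pos in read_positions[lo:hi]:
            offset = pos - tss
            if strand == "-":
                # On the reverse strand, upstream lies at higher coordinates.
                offset = -offset
            profile[offset + window] += 1
    return profile

reads = [4900, 5000, 5100, 19900, 20100]
profile = tss_profile([(5000, "+"), (20000, "-")], reads)
print(profile[900], profile[1000], profile[1100])  # distances -100, 0, +100
```

Plotting the resulting list against distances -1000 to +1000 gives the read count profile around the TSS.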

Output: histone modifications at TSS (figure: read count, i.e. peak height, plotted against distance from the TSS, from -1000 to +1000)

PolII is found bound to DNA at the TSS of transcribed genes

H3K4me3 is found just before and after the TSS of transcribed genes

H3K4me2 (not me3!) is found just before and after the TSS of transcribed genes, but farther away than H3K4me3

H3K4me1 is found just before and after the TSS of transcribed genes, but farther away than H3K4me3 and H3K4me2

H3K27me3 covers the whole locus of “silent” genes - no transcription here

H3K27me1 (not me3!) is, conversely, found before and after the loci of transcribed genes

H3K36me3 is found within the transcribed region - a bit downstream of the TSS - as if it “lets” polymerase proceed with transcription

H3K9me1 is similar in profile to H3K4me3

Barski et al., High-Resolution Profiling of Histone Methylations in the Human Genome, Cell 129(4)

Histone modifications at transcribed regions (figure: read count, i.e. peak height, for genes ordered by expression level, from high to low)

Top: profiles for nine chromatin marks (greyscale) are shown across the WLS gene in four cell types, and summarized in a single chromatin state annotation track for each (coloured according to b). WLS is poised in ESCs, repressed in GM12878 and transcribed in HUVEC and NHLF. Its TSS switches accordingly between poised (purple), repressed (grey) and active (red) promoter states; enhancer regions within the gene body become activated (orange, yellow); and its gene body changes from low signal (white) to transcribed (green). These chromatin state changes summarize coordinated changes in many chromatin marks; for example, H3K27me3, H3K4me3 and H3K4me2 jointly mark a poised promoter, whereas loss of H3K27me3 and gain of H3K27ac and H3K9ac mark promoter activation. WCE, whole-cell extract. Bottom: nine chromatin state tracks, one per cell type, in a 900-kb region centred at WLS, summarizing 90 chromatin tracks in directly interpretable dynamic annotations and showing activation and repression patterns for six genes and hundreds of regulatory regions, including enhancer states. b, Chromatin states learned jointly across cell types by a multivariate hidden Markov model. The table shows emission parameters learned de novo on the basis of genome-wide recurrent combinations of chromatin marks. Each entry denotes the frequency with which a given mark is found at genomic positions corresponding to the chromatin state. c, Genome coverage, functional enrichments and candidate annotations for each chromatin state. Blue shading indicates intensity, scaled by column. CNV, copy number variation; GM, GM12878. d, Box plots depicting enhancer activity for predicted regulatory elements. Sequences 250 bp long corresponding either to strong or weak/poised HepG2 enhancer elements or to GM12878-specific strong enhancer elements were inserted upstream of a luciferase gene and transfected into HepG2. Reporter activity was measured in relative light units. 
Robust activity is seen for strong enhancers in the matched cell type, but not for weak/poised enhancers or for strong enhancers specific to a different cell type. Boxes indicate 25th, 50th and 75th percentiles, and whiskers indicate 5th and 95th percentiles.