DNA: ACGTATGGTTAACTAGTTAGGAATCGCGCATTATGTCCACGTTAGGTTGAACGGCAGGTTTAAATCGATTCCACGTTATGAAATTGGGGCAGGTTTAACGCGCCC
mRNA: AUGGUUAACUAGUUAGGAAUCGCGCAUUAUGUCCACGU
ACGUUAGGUUGAACGGCAGGUUUAAAUCGAUUCCCAGUACGUUAUGAAAUUGGGGCAGGUUUAACGCGCCC
Translation: AUG GUU AAC UAG → Methionine (M), Valine (V), Asparagine (N), STOP
Other amino acids shown: Serine (S), Threonine (T), Proline (P), Lysine (K), Leucine (L), Glycine (G), Glutamine (Q)
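Here is a minimal sketch of the two steps on this slide. It assumes the DNA shown is the coding strand, so transcription is simply T → U, and that translation starts at the first AUG. The names `transcribe`, `translate` and `CODON_TABLE` are chosen here for illustration. The table is the standard genetic code, written compactly.

```python
from itertools import product

# Standard genetic code: the 64 codons enumerated in U, C, A, G order at each
# position (first base varies slowest); '*' marks the stop codons UAA, UAG, UGA.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: same sequence, T replaced by U."""
    return dna.replace("T", "U")


def translate(mrna):
    """Read codons from the first AUG until a stop codon or the end of the string."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: translation ends here
            break
        protein.append(aa)
    return "".join(protein)


print(translate(transcribe("ACGTATGGTTAACTAGTTAGG")))  # MVN
```

Translation halts at UAG. This reproduces the Methionine, Valine, Asparagine, STOP reading above.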
Is ATTACGGCCATGCGGAGCCGGAAG present in the text? There is an algorithm that needs a number of comparisons equal to the length of …
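The slide does not name the algorithm. One well-known algorithm with this property is Knuth-Morris-Pratt (KMP). It never moves backwards in the text, so it makes at most about 2n character comparisons on a text of length n. The sketch below is a standard KMP and not necessarily the method the slides go on to describe. The names `prefix_function` and `kmp_search` are chosen here.

```python
def prefix_function(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i + 1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(pattern, text):
    """Start positions of every occurrence of a non-empty pattern in text."""
    fail = prefix_function(pattern)
    hits = []
    k = 0  # pattern characters matched so far
    for i, ch in enumerate(text):  # the text is scanned once, left to right
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # shift the pattern instead of re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits


pattern = "ATTACGGCCATGCGGAGCCGGAAG"
print(kmp_search(pattern, "GGC" + pattern + "TTA"))  # [3]
```

The failure table is built once from the pattern alone. After that, each text character is read a single time.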
Approximate string comparison
ALIGNMENT

TGTACGGAATCGGA
TCTCCGACCATCGGA

TG-TA-CGGA--ATCGGA
T-CT-CCG-ACCATCGGA
4 + 3 = 7 (gaps in the first row + gaps in the second row)
[Figure: alignment grid (edit graph) labelled with TGTACGGAATCGGA]
Another alignment with the same cost, 4 + 3 = 7:

T-GTAC-GGA--ATCGGA
TC-T-CC-GACCATCGGA
Shortest path: how many operations? N.B.: the number of paths is very large, so evaluating each one explicitly is impossible!
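RECURSION! Let D(i, j) be the minimum number of operations (insertions and deletions) needed to align the first i characters of a with the first j characters of b, with D(i, 0) = i and D(0, j) = j:

$$D(i,j) = \min\bigl\{\, D(i-1,j)+1,\; D(i,j-1)+1,\; D(i-1,j-1) \ \text{(only if } a_i = b_j) \,\bigr\}$$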
Each edge is considered exactly once, so the number of operations equals the number of edges: two sequences of 1000 bases require about a million operations.
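Here is a minimal sketch of the recursion for the gap-only model, where insertions and deletions cost 1 and substitutions are not allowed. The table `D` has one cell per grid node. Each cell looks at no more than three incoming edges, so the work grows with the product of the two lengths. The function name is illustrative.

```python
def indel_distance(a, b):
    """Minimum number of insertions/deletions (no substitutions) turning a into b."""
    n, m = len(a), len(b)
    # D[i][j] = cost of aligning a[:i] with b[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(D[i - 1][j] + 1, D[i][j - 1] + 1)  # gap in b / gap in a
            if a[i - 1] == b[j - 1]:
                best = min(best, D[i - 1][j - 1])  # free diagonal step on a match
            D[i][j] = best
    return D[n][m]


print(indel_distance("TGTACGGAATCGGA", "TCTCCGACCATCGGA"))  # 7
```

The result agrees with the 4 + 3 = 7 alignments shown above. Neither of those alignments can be improved.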
Different model: substitutions allowed

TGTACGGAATCGGA
TCTCCGACCATCGGA

TGTACGGA--ATCGGA
TCTCCG-ACCATCGGA
Cost: 4 + 2 + 2 = 8 (each gap costs 2, each substitution costs 1)

TGTACGGA-ATCGGA
TCTCCGACCATCGGA
Cost: 6

With the same costs, the gap-only alignment TG-TA-CGGA--ATCGGA / T-CT-CCG-ACCATCGGA costs 8 + 6 = 14.
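The same dynamic programme works once a substitution move is added on the diagonal. The costs gap = 2 and substitution = 1 are an assumption read off the slide's figures (4 + 2 + 2 = 8, and 14 for the gap-only alignment). The helper `alignment_cost` scores a given pair of aligned rows, so the alignments shown can be checked. Both names are illustrative.

```python
def edit_distance(a, b, gap, sub):
    """Minimum alignment cost with the given gap and substitution costs (matches cost 0)."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = D[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub)
            D[i][j] = min(diagonal, D[i - 1][j] + gap, D[i][j - 1] + gap)
    return D[n][m]


def alignment_cost(top, bottom, gap, sub):
    """Cost of a given pairwise alignment written with '-' for gaps."""
    assert len(top) == len(bottom)
    cost = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            cost += gap
        elif x != y:
            cost += sub
    return cost


GAP, SUB = 2, 1  # assumed costs
print(alignment_cost("TGTACGGA--ATCGGA", "TCTCCG-ACCATCGGA", GAP, SUB))      # 8
print(alignment_cost("TGTACGGA-ATCGGA", "TCTCCGACCATCGGA", GAP, SUB))        # 6
print(alignment_cost("TG-TA-CGGA--ATCGGA", "T-CT-CCG-ACCATCGGA", GAP, SUB))  # 14
print(edit_distance("TGTACGGAATCGGA", "TCTCCGACCATCGGA", GAP, SUB))          # optimum, at most 6
```

Setting sub = 2 and gap = 1 changes the picture. A substitution is then never cheaper than a deletion plus an insertion, and `edit_distance` returns the gap-only value 7.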
MULTIPLE ALIGNMENT

TGTACGGAATCGGA
TCTCCGACCATCGGA
ACTCAGACAATGA

TGTACG-GAATCGGA
TCTCCGACCATCGGA
ACTCAGACAAT--GA
Number of comparisons = product of the string lengths: three strings of length 1000 require a billion operations!
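Here is a minimal sketch of exact alignment for three sequences. It assumes a sum-of-pairs cost, where each column costs the sum of its pairwise costs, and uses gap = mismatch = 1. The slides do not specify the scoring, so this is only one common choice. The table has (n+1)(m+1)(p+1) cells, each with seven possible predecessors, which is where the product of the lengths comes from. All names are illustrative.

```python
from itertools import product


def pair_cost(x, y, gap, sub):
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return 0 if x == y else sub


def column_cost(col, gap, sub):
    """Sum-of-pairs cost of one alignment column."""
    return sum(pair_cost(col[p], col[q], gap, sub)
               for p in range(len(col)) for q in range(p + 1, len(col)))


def sp_cost(rows, gap=1, sub=1):
    """Sum-of-pairs cost of a given multiple alignment (equal-length rows)."""
    return sum(column_cost(col, gap, sub) for col in zip(*rows))


def msa3(a, b, c, gap=1, sub=1):
    """Optimal sum-of-pairs cost for three sequences: one cell per (i, j, k)."""
    INF = float("inf")
    D = [[[INF] * (len(c) + 1) for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
    D[0][0][0] = 0
    steps = [s for s in product((0, 1), repeat=3) if any(s)]  # 7 possible moves
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            for k in range(len(c) + 1):
                for di, dj, dk in steps:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if min(pi, pj, pk) < 0:
                        continue
                    col = (a[pi] if di else "-", b[pj] if dj else "-", c[pk] if dk else "-")
                    D[i][j][k] = min(D[i][j][k], D[pi][pj][pk] + column_cost(col, gap, sub))
    return D[len(a)][len(b)][len(c)]


rows = ["TGTACG-GAATCGGA", "TCTCCGACCATCGGA", "ACTCAGACAAT--GA"]
print(sp_cost(rows), msa3(*(r.replace("-", "") for r in rows)))
```

The second number printed is never larger than the first, because `msa3` searches every alignment of the three strings.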
[Figure: sequences ATAGA, CTAGA, ATGA, CTGA, AGGA, TAGA, TACA arranged on a tree with an unknown (?) ancestor; aligned leaves AG-GA, -TACA, -TAGA, CT-GA, AT-GA]
AUGCCGAUUCAACGGUCCUACUCGGACUUUACC
M P I Q R S Y S D F T
M R I S R S D S D Y T
Score (M ↔ M, P ↔ R, …) based on mutation probabilities
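The score of a gap-free protein alignment is the sum of one table value per aligned pair. Real tables such as PAM or BLOSUM are derived from observed mutation probabilities. In Biopython, for example, `Bio.Align.substitution_matrices.load("BLOSUM62")` returns such a matrix. To stay self-contained, the sketch below uses a hypothetical toy score (+2 identical, -1 otherwise) as a stand-in.

```python
def toy_score(x, y):
    """Hypothetical placeholder: +2 for identical residues, -1 otherwise."""
    return 2 if x == y else -1


def ungapped_score(p, q, score=toy_score):
    """Sum of pair scores for two equal-length aligned protein sequences."""
    if len(p) != len(q):
        raise ValueError("aligned sequences must have the same length")
    return sum(score(x, y) for x, y in zip(p, q))


print(ungapped_score("MPIQRSYSDFT", "MRISRSDSDYT"))  # 10: 7 identities, 4 substitutions
```

To use a real matrix, pass any two-argument function as `score`, for example one that looks up BLOSUM62.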
FRAGMENT RECONSTRUCTION
ACGTTACG TTACGGAT CGGATTCA CGGCGATT AACAAGCTT CGGAATCG TTACCGGAT CGGTTAGG CGAATTAG TGGCGAA GGCCTTAA ACGACGAT GCATTCGA ATATCGAT CGCGCGAA TGTGCATA ACCGGAC TGTCGCGA CGCGCGAT GTGTAGAG CTTGATCT CGGATATA CGCGATAT TGTGAATA ACGTTACG TTACGAAT CGGATTCA CGGCGATT AACCAGCTT CGGAATCG TTACCGGAT CGGTTAGG CGAATTAG TGGCGAA AGCCTTAA ACGACGAT GCATTCGA ATATCGAT CGCGCGAA TGTGCATA ACCGGAC TGTCGCGA CGCGCGAT GTGCAGAG CTTGATCT CGGATATA CGCGATAT TGTGAATA ACGTTACG TTACGGAT CGGATTTA CGGCGATT AACAAGCTT CGGAATCG TTACCGGAT CGGTTAGG AGAATTAG TGGCGAA GGCCTTAA ACGACGAT GCATTCGA ATATCGAT CGCGCGAA TGTGCATA ACCGGAC TCTCGCGA CGCGCGAT GTGTAGAG CTTGATCT CGGATATA CGCGCTAT TGTGAATA ACATTACG TTACGGAT CGGATTCA CGGCGACT AACAAGCTT CGGAATCG TTACCGGAT CGGTTAAG CGAATTAG TGGCGAA GGCCTTAA ACGACGTT GCATTCGA ATATCGAT CGCGCGAA TGTGCATA ACCGGAC TGTCGCGA CGCGCGAT TTGTAGAG CTTGATCT CGGATATA CGCAATAT TGTGAATA ACGTTACG TTACTGAT CGGATTCA CGGCGATT AACAAGCGT CGGAATCG TTACCGGAT CGGTTAGG AGAATTAG TGGCGAA GGCCTTAA ACGACGAT GCATTGGA ATATCGAT CGCGCGAA TGTGCATA AACGGAC TGTCGCGA CGCGCGAT GTGTAGAG CTTGTTCT CGGATATA CGCGATAT TGTGAATA ACGTTACG TTACGGAT CGGATTCA CGGCAATT AACAAGCTT CGGAATAG TTACCGGAT CGGTTAGG CGAATTAG TGGCGAA GGCCTTAA ACGACGAT GTATTCGA ATATCGAT CGCGCGAA TGTGCATA ACCGGAC TGTCGCGA CGCTCGAT GTGTAGAG CTTGATCT AGGATATA CGCGATAT TGTGAATA
Fragments: ACCGT, CGTGC, TTAC, TACCGT

--ACCGT--
----CGTGC
TTAC-----
-TACCGT--
TTACCGTGC

TTAC-----
-TACCGT--
--ACCGT--
----CGTGC

1 + 2 = ___ 4
Fragments: TAGG, AGGT, CGTC, GTCG
[Figure: overlap graph on the four fragments; each edge u → v is labelled with the number of characters v adds when it follows u, e.g. TAGG → AGGT: 1, AGGT → TAGG: 3]

CGTC------
-GTCG-----
-----TAGG-
------AGGT
CGTCGTAGGT: length 10

TAGG-----
-AGGT----
---GTCG--
-----CGTC
TAGGTCGTC: length 9 (shorter than CGTCGTAGGT)
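Both layouts are orderings of the fragments. The string length equals the length of the first fragment plus the edge labels along the path, so the shortest layout is a shortest path through all nodes. The brute-force sketch below tries every order, which is only feasible for a handful of fragments. It is meant to make the objective explicit, not to be a practical assembler. It assumes error-free fragments and drops any fragment contained in another one (ACCGT inside TACCGT). All names are illustrative.

```python
from itertools import permutations


def overlap(u, v):
    """Length of the longest proper suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0


def merge(order):
    """Lay the fragments out in the given order, overlapping neighbours maximally."""
    s = order[0]
    for prev, cur in zip(order, order[1:]):
        s += cur[overlap(prev, cur):]
    return s


def shortest_superstring(fragments):
    frags = sorted(set(fragments))
    # Drop fragments contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    best = None
    for order in permutations(frags):  # n! orders: tiny inputs only
        s = merge(order)
        if best is None or len(s) < len(best):
            best = s
    return best


print(shortest_superstring(["TAGG", "AGGT", "CGTC", "GTCG"]))    # TAGGTCGTC
print(shortest_superstring(["ACCGT", "CGTGC", "TTAC", "TACCGT"]))  # TTACCGTGC
```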
PHYLOGENETIC TREES
Species: A, B, C, D, E, F
[Table: binary character matrix, characters a, b, c, d, e × species A–F]
[Figure: tree whose nodes carry the character vectors 00110, 00010, 00100, 10010, 00011, 00101, 00100, 01011, 00010, 10011, 10010]
Characters a–e, species A–F: does a perfect phylogenetic tree exist with A, B, C, D, E, F as nodes?
2 leaves: A, B
3 leaves: A, B, C
4 leaves: 12, 6, 18
5 leaves: 60, 30, 120
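These figures show how quickly the number of candidate trees grows. For comparison, and not necessarily the exact quantity counted on the slide, the standard count of rooted binary trees with n labelled leaves is (2n - 3)!! = 1 · 3 · 5 · … · (2n - 3). The function name is illustrative.

```python
def rooted_binary_trees(n):
    """Rooted binary trees with n labelled leaves: (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3)."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


for n in range(2, 11):
    print(n, rooted_binary_trees(n))
```

n = 5 already gives 105 trees and n = 10 gives 34,459,425, so examining every tree one by one quickly becomes impractical.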
Ordered characters: only 0 → 1 changes are allowed. This is an easy problem.
[Table: binary character matrix, characters a–e × species A–F]
[Figure: trees over species A–F and characters a–e; leaf orders E, C, B, F, D, A and C, F, B, E, A, D]
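Why is the ordered case easy? A standard criterion helps here. Assume the ancestor is all zeros and that each character switches 0 → 1 on exactly one edge. Then a tree exists if and only if, for every pair of characters, the sets of species with state 1 are either disjoint or nested. The sketch below only checks this condition and does not build the tree. The example matrices are hypothetical, because the slide's matrix is not recoverable.

```python
def ordered_perfect_phylogeny_exists(matrix):
    """matrix: one '0'/'1' string per species, one column per character.
    Assumes an all-zero root and a single 0 -> 1 change per character."""
    n_chars = len(matrix[0])
    # For each character, the set of species (row indices) having state 1.
    ones = [{s for s, row in enumerate(matrix) if row[c] == "1"} for c in range(n_chars)]
    for i in range(n_chars):
        for j in range(i + 1, n_chars):
            a, b = ones[i], ones[j]
            # Two characters must be nested or disjoint.
            if a & b and not (a <= b or b <= a):
                return False
    return True


print(ordered_perfect_phylogeny_exists(["11000", "11100", "10010", "00001"]))  # True
print(ordered_perfect_phylogeny_exists(["11", "10", "01"]))                    # False
```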
Unordered characters (perfect phylogeny)
[Table: binary character matrix, characters a–g × species A–F]
[Figure: tree whose nodes are labelled with the character vectors 1001011, 1101011, 1001010, 0101011, 0100011, 1101001; leaves E, B, C, F, D, A]
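When the ancestral state is unknown, a known criterion for binary characters is the four-gamete test. A perfect phylogeny exists if and only if no pair of characters shows all four combinations 00, 01, 10 and 11 across the species. The sketch below checks only this condition. The function name and the small matrices are illustrative.

```python
def perfect_phylogeny_exists(matrix):
    """Four-gamete test: matrix holds one '0'/'1' string per species."""
    n_chars = len(matrix[0])
    for i in range(n_chars):
        for j in range(i + 1, n_chars):
            # All four combinations for one pair of characters rule out a perfect phylogeny.
            if len({(row[i], row[j]) for row in matrix}) == 4:
                return False
    return True


print(perfect_phylogeny_exists(["11", "10", "01"]))        # True
print(perfect_phylogeny_exists(["11", "10", "01", "00"]))  # False
```

The matrix ["11", "10", "01"] fails the ordered test but passes this one, because here the root may be 10 instead of 00. The seven-character vectors in the figure, taken as rows, also pass the test.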