(Laboratorio di) Sistemi Informatici Avanzati

Giuseppe Manco

Mathematical models

What is the simplest way to generate a graph?
Erdős-Rényi random graph model [Erdős-Rényi, '60]. Two variants:
$G_{n,p}$: a graph with $n$ nodes in which each edge $(u,v)$ appears independently with probability $p$
$G_{n,m}$: a graph with $n$ nodes and $m$ edges chosen uniformly at random
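The two variants can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function names `gnp_random_graph` and `gnm_random_graph` and the edge-list representation are assumptions made here.

```python
import itertools
import random


def gnp_random_graph(n, p, rng=None):
    """Sample G(n, p): each possible edge (u, v) is kept independently with probability p."""
    rng = rng or random.Random()
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


def gnm_random_graph(n, m, rng=None):
    """Sample G(n, m): pick m distinct edges uniformly at random among all n(n-1)/2."""
    rng = rng or random.Random()
    possible_edges = list(itertools.combinations(range(n), 2))
    return rng.sample(possible_edges, m)
```

In `gnp_random_graph` the number of edges is itself random, while `gnm_random_graph` fixes it exactly; for example, `gnp_random_graph(12, 1 / 6)` draws a graph with the parameters of the first example slide.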

p=1/6 N=12

$p = 0.03$, $N = 100$. It does look like a real network, does it not?

Model: $N = 10$, $p = 1/6$

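Random graph
Probability of a given graph $G$ with $m$ edges under $G_{n,p}$: $P(G) = p^{m}(1-p)^{\binom{n}{2}-m}$
Each of the $\binom{n}{2}$ possible edges is an independent Bernoulli trial. What kind of graph does such a Bernoulli process produce?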

Binomial/Poisson distribution
Probability that there are exactly $m$ edges, with $M = \binom{n}{2}$ possible edges: $P(m) = \binom{M}{m} p^{m}(1-p)^{M-m}$

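From the binomial to the Poisson distribution…
Probability of $m$ successes in $M$ trials: $P(m) = \binom{M}{m} p^{m}(1-p)^{M-m}$
Mean: $\langle m \rangle = Mp$
Variance: $\sigma^{2} = Mp(1-p)$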

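From the binomial to the Poisson distribution
Probability of $m$ successes: $P(m) = \binom{M}{m} p^{m}(1-p)^{M-m}$
If $M$ is large (with $m \ll M$ and small $p$): $\binom{M}{m} \approx \frac{M^{m}}{m!}$ and $(1-p)^{M-m} \approx e^{-Mp}$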

From the binomial to the Poisson distribution
Putting it all together: $P(m) \approx \frac{(Mp)^{m}}{m!} e^{-Mp}$
$Mp$ is the mean; this is the Poisson distribution.
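The quality of the Poisson approximation can be checked numerically. The sketch below is illustrative only: it keeps the mean $Mp$ fixed, increases $M$, and reports the largest pointwise gap between the two distributions. The helper names and the cut-off `m_max` are assumptions made here.

```python
import math


def binomial_pmf(m, M, p):
    """Probability of exactly m successes in M independent trials."""
    if m > M:
        return 0.0
    return math.comb(M, m) * p**m * (1 - p) ** (M - m)


def poisson_pmf(m, mean):
    """Poisson probability of m events when the mean is `mean`."""
    return math.exp(-mean) * mean**m / math.factorial(m)


def max_gap(M, mean, m_max=30):
    """Largest |binomial - Poisson| over m = 0..m_max, with p = mean / M."""
    p = mean / M
    return max(abs(binomial_pmf(m, M, p) - poisson_pmf(m, mean))
               for m in range(m_max + 1))


if __name__ == "__main__":
    for M in (10, 1000, 100000):
        print(M, max_gap(M, mean=5.0))
```

The gap shrinks as $M$ grows at fixed mean, which is the regime in which the two approximations $\binom{M}{m} \approx M^{m}/m!$ and $(1-p)^{M-m} \approx e^{-Mp}$ hold.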

Random graph: the degree distribution is binomial (Poisson for large $N$)

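Degree distribution
$P(k) = \binom{N-1}{k} p^{k} (1-p)^{N-1-k}$
$\binom{N-1}{k}$: choose $k$ nodes out of the possible $N-1$
$p^{k}$: probability of having $k$ edges
$(1-p)^{N-1-k}$: probability that the remaining $N-1-k$ edges are absent
As the network grows, the distribution narrows: it concentrates around $\langle k \rangle$.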

[Figure: degree distribution $P(k)$ versus $k$ for random networks of different sizes $N$]
Note that the exact result for the degree distribution is the binomial form (\ref{P_K}), and (\ref{E-RG-Poisson}) represents only an approximation to (\ref{P_K}) for large $N$. Yet, for $k \ll N$, there is no fundamental difference between (\ref{P_K}) and (\ref{E-RG-Poisson}): the Poisson form (\ref{E-RG-Poisson}) simply represents a different way to write (\ref{P_K}) in this limit. The advantage of the Poisson form is that it does not explicitly depend on the number of nodes $N$. Therefore, it predicts that the degree distributions of random networks with the same average degree $\langle k \rangle$ but different sizes are indistinguishable from each other. This is illustrated in Figure \ref{F-RG-PoissonNDependence}, where we show the degree distribution of several random networks of different sizes $N$. The figure indicates that while for small $N$ there are differences between the numerically obtained $p_k$ and (\ref{E-RG-Poisson}), for large $N$ the differences vanish and the degree distribution becomes independent of the system size.
(Network Science: Random Graphs, 2012)

Probability distribution function (PDF)
Exact result (binomial distribution): $P(k) = \binom{N-1}{k} p^{k}(1-p)^{N-1-k}$
Large $N$ (Poisson distribution): $P(k) = e^{-\langle k \rangle} \frac{\langle k \rangle^{k}}{k!}$
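To see the Poisson form emerge from data, one can sample a $G_{n,p}$ graph and compare its empirical degree histogram with $e^{-\langle k \rangle}\langle k \rangle^{k}/k!$. This is a rough sketch under assumed parameters ($N = 1000$, $\langle k \rangle = 6$, a fixed seed); the function names are illustrative.

```python
import itertools
import math
import random
from collections import Counter


def degree_histogram(n, p, rng):
    """Sample G(n, p) and return {degree: fraction of nodes with that degree}."""
    degree = [0] * n
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            degree[u] += 1
            degree[v] += 1
    counts = Counter(degree)
    return {k: c / n for k, c in sorted(counts.items())}


def poisson_pmf(k, mean):
    """Poisson prediction for the fraction of nodes with degree k."""
    return math.exp(-mean) * mean**k / math.factorial(k)


if __name__ == "__main__":
    n, mean_degree = 1000, 6.0
    hist = degree_histogram(n, mean_degree / (n - 1), random.Random(7))
    for k, fraction in hist.items():
        print(k, round(fraction, 4), round(poisson_pmf(k, mean_degree), 4))
```

Because the histogram comes from a single sample, the two columns agree only up to sampling noise.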

Nodes have comparable degrees in random networks
In the continuum approximation: for a network with average degree $\langle k \rangle$, consider the probability that the degree of a node exceeds $k_0$.
For example, with $\langle k \rangle = 10$:
The probability that a node has degree at least 20 is …
The probability that a node has degree at least 100 is … $\times 10^{-13}$
The probability that a node has degree below one tenth of the average is …
In the discrete case:
The probability of seeing a node with a very high or a very low degree is exponentially small
Most nodes have comparable degrees
The larger the network, the more comparable the degrees
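For the discrete case, the tail probabilities can be evaluated directly from the Poisson distribution. The sketch below is an illustration with assumed function names; it sums the tail term by term in log space so that very small probabilities do not get lost to rounding. Its outputs are exact Poisson tails and need not coincide with the continuum figures quoted on the slide.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of degree k, computed in log space to avoid overflow."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def prob_degree_at_least(k0, mean):
    """P(k >= k0): sum the tail until the terms no longer change the total."""
    total = 0.0
    k = k0
    while True:
        term = poisson_pmf(k, mean)
        total += term
        # Past the mean the terms decrease monotonically, so we can stop safely.
        if k > mean and term <= total * 1e-17:
            return total
        k += 1


def prob_degree_below(k0, mean):
    """P(k < k0)."""
    return sum(poisson_pmf(k, mean) for k in range(k0))


if __name__ == "__main__":
    print(prob_degree_at_least(20, 10.0))
    print(prob_degree_at_least(100, 10.0))
    print(prob_degree_below(1, 10.0))
```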

Random networks, social networks
According to sociological research, $\langle k \rangle \approx 1{,}000$
The probability of finding an individual with $k > 2{,}000$ is $10^{-27}$
A random society would consist essentially of people with the same number of friends
No outliers
How different are the node degrees in a random network, really? To get a feel for this difference, let us assume that the random network model is a good model for social networks. According to sociological research, a typical person knows about 1,000 individuals on a first-name basis, so we take $\langle k \rangle \approx 1000$ and estimate the likelihood of observing nodes with degrees that are significantly different from $k = 1000$. Using Eq. (\ref{E-RG-Poisson}) we find that the probability of finding an individual with degree $k > 2000$, i.e. one that has at least twice as many friends as the average person, is $\approx 10^{-27}$. Given that the Earth's population is about $7 \times 10^{9}$, the chance of finding an individual with 2,000 acquaintances is so small that such nodes are virtually nonexistent in a random society. In other words, highly connected nodes are practically forbidden in a random network. That is, a random society would consist mainly of average individuals, where everyone has roughly the same number of friends. It would lack outliers, individuals that are either highly popular or reclusive.

Evolution of a random graph
Imagine organizing a party for a hundred guests who do not know each other. Offer them wine and cheese, and soon you will see thirty to forty chatting groups of two to three. Now mention to a guest that the red wine in the unlabeled dark green bottles is a rare vintage, better than that with the red label. Ask him to share this information only with his acquaintances and you know that your expensive wine is fairly safe, because your guest has only had time to meet two or three people in the room. However, inevitably the guests will mix, joining other groups, and subtle invisible paths will start connecting people that may still be strangers to each other. For example, while John has not met Mary yet, they have both met Mike, and so there is an invisible path from John to Mary through Mike. As time goes on, the guests will be increasingly interwoven by such intangible links. With that, the identity of the expensive wine moves from a tiny group of insiders to more and more chatting groups. To be sure, once all guests have gotten to know each other, everybody will be pouring the superior wine. But if each encounter took only ten minutes, meeting all ninety-nine others would take about sixteen hours. Thus, you could reasonably hope that a few drops of the better wine would be left at the end of the party. Yet you could not be more wrong, and the purpose of this chapter is to show why. We will see that this problem maps into a classic problem in network science, leading us to the concept of random networks. In turn, random network theory will tell us that we do not have to wait until all individuals get to know each other to endanger our expensive wine. Rather, after each person meets at least \textit{one} other guest, you may find yourself tipping an empty bottle into your expectant glass, as everybody could be drinking the reserve wine.

Disconnected nodes → network
How does the transition happen?

Phase transition
Let $u = 1 - N_G/N$ denote the fraction of nodes that are not part of the giant component (GC), whose size is $N_G$
A node $i$ becomes part of the GC by connecting to another node $j$ that belongs to it
Node $i$ fails to join the GC through $j$ for one of two reasons:
$i$ is not connected to $j$ (probability $1-p$)
$i$ is connected to $j$, but $j$ is not part of the GC (probability $pu$)
In total, the probability is $1 - p + pu$
Since $i$ can link to $N-1$ other nodes, $u = (1 - p + pu)^{N-1}$

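Evolution
Substituting $p = \langle k \rangle/(N-1)$, taking the logarithm and expanding for large $N$, we obtain $\ln u \approx -\langle k \rangle (1-u)$
Exponentiating: $u = e^{-\langle k \rangle (1-u)}$
Denoting by $S$ the fraction of nodes in the GC ($S = N_G/N = 1-u$): $S = 1 - e^{-\langle k \rangle S}$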

Transition point
A non-trivial solution appears when the curve $y = 1 - e^{-\langle k \rangle S}$ is tangent to the diagonal $y = S$, that is when $\langle k \rangle e^{-\langle k \rangle S} = 1$. With $S = 0$, we obtain $\langle k \rangle = 1$.
Graphical solution for the size of the giant component. (a) The three curves in the left panel show $y = 1 - e^{-\langle k \rangle S}$ for values of $\langle k \rangle$ as marked, the diagonal dashed line shows $y = S$, and the intersection gives the solution to Eq. (\ref{S}), $S = 1 - e^{-\langle k \rangle S}$. For the bottom curve there is only one intersection, at $S = 0$, so there is no giant component, while for the top curve there is a solution at $S = 0.583\ldots$ (vertical dashed line). The middle curve is precisely at the threshold between the regime where a non-trivial solution for $S$ exists and the regime where there is only the trivial solution $S = 0$. (b) The resulting solution for the size of the giant component as a function of $\langle k \rangle$, as predicted by Eq. (\ref{S}). (After Newman)
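The self-consistency equation $S = 1 - e^{-\langle k \rangle S}$ has no closed-form solution, but it can be solved numerically. The following sketch uses plain fixed-point iteration starting from $S = 1$, which converges to the largest solution; the function name, tolerance and iteration cap are choices made here, not part of the slides.

```python
import math


def giant_component_fraction(mean_degree, tol=1e-12, max_iterations=10_000):
    """Solve S = 1 - exp(-<k> S) by fixed-point iteration, starting from S = 1."""
    s = 1.0
    for _ in range(max_iterations):
        s_next = 1.0 - math.exp(-mean_degree * s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


if __name__ == "__main__":
    for mean_degree in (0.5, 1.5, 3.96):
        print(mean_degree, giant_component_fraction(mean_degree))
```

Below the threshold the iteration collapses to the trivial solution $S = 0$; above it, it settles on the non-trivial intersection described in the caption. Very close to $\langle k \rangle = 1$ convergence becomes slow, so the returned value there should be read with care.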

Conclusion
How many links must be added for the giant component (GC) to appear?
The component appears when $\langle k \rangle = 1$.
[Figure: size of the GC as a function of $\langle k \rangle$, with sample networks at $\langle k \rangle = 0.99$, $1.18$ and $3.96$]
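The emergence of the giant component can also be observed by simulation. The sketch below samples a $G_{n,p}$ graph with $p = \langle k \rangle/(N-1)$ and measures the fraction of nodes in its largest connected component using a union-find structure. The function name, the network size and the seed are illustrative assumptions, and a single sample is subject to noise.

```python
import itertools
import random


def largest_component_fraction(n, mean_degree, rng):
    """Sample G(n, p) with p = <k>/(n-1); return (size of largest component) / n."""
    p = mean_degree / (n - 1)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


if __name__ == "__main__":
    rng = random.Random(3)
    for mean_degree in (0.5, 0.99, 1.18, 3.96):
        print(mean_degree, largest_component_fraction(1000, mean_degree, rng))
```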

Clustering coefficient
Since edges are independent and each has probability $p$: $C = p = \frac{\langle k \rangle}{N-1}$
The clustering coefficient is low in random graphs
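Because two neighbours of a node are linked with the same probability $p$ as any other pair, the average local clustering of a $G_{n,p}$ sample should be close to $p$. The sketch below checks this on one sample; the helper names and the parameters ($N = 300$, $p = 0.1$) are assumptions made for illustration, and nodes with fewer than two neighbours are counted as having clustering zero.

```python
import itertools
import random


def gnp_adjacency(n, p, rng):
    """Sample G(n, p) as a dict mapping each node to the set of its neighbours."""
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue  # contributes 0 to the average
        links = sum(1 for a, b in itertools.combinations(neighbours, 2)
                    if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


if __name__ == "__main__":
    p = 0.1
    graph = gnp_adjacency(300, p, random.Random(11))
    print(p, average_clustering(graph))
```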

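Small world
Tree-like topology
Neighbors at level 1: $\langle k \rangle$
…
Neighbors at level $d$: $\langle k \rangle^{d}$
Setting $\langle k \rangle^{d} \approx N$ gives $d \approx \ln N / \ln \langle k \rangle$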
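If the tree-like growth continues until all $N$ nodes are reached, the number of levels needed satisfies $\langle k \rangle^{d} \approx N$, which gives the estimate $d \approx \ln N / \ln \langle k \rangle$. The one-line sketch below evaluates this estimate; the function name is illustrative, and the formula is only meaningful for $\langle k \rangle > 1$.

```python
import math


def estimated_path_length(n_nodes, mean_degree):
    """Tree-like estimate of the typical distance: ln N / ln <k> (requires <k> > 1)."""
    return math.log(n_nodes) / math.log(mean_degree)


if __name__ == "__main__":
    # A thousandfold increase in N adds only a few steps to the typical distance.
    for n_nodes in (10**3, 10**6, 10**9):
        print(n_nodes, estimated_path_length(n_nodes, 10))
```

The logarithmic dependence on $N$ is what makes the distances short even in very large random graphs.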

[Table columns: $N$, $L$, $\langle k \rangle$]
(Source: M. E. J. Newman, The structure and function of complex networks, SIAM Review 45 (2003))

Summary
The random graph can be characterized by the following properties:
Average path length
Clustering coefficient
Degree distribution
What do real graphs look like?

Average path length
Prediction: $\langle l \rangle \approx \ln N / \ln \langle k \rangle$
Real data: [figure]

Clustering coefficient
Prediction: $C = p = \frac{\langle k \rangle}{N-1}$
Real data: [figure]

Degree distribution
Prediction: $P(k) = e^{-\langle k \rangle} \frac{\langle k \rangle^{k}}{k!}$
Real data: (a) Internet; (b) Movie actors; (c) Coauthorship, high energy physics; (d) Coauthorship, neuroscience

Watts-Strogatz model
Reconciles two observations:
High clustering: the friends of my friends are my friends
Short average geodesic path
Source: Watts, D.J., Strogatz, S.H. (1998) Collective dynamics of 'small-world' networks. Nature 393.

Watts-Strogatz model
Starting point: the lattice
Every pair of vertices at lattice distance at most $k$ is connected

Watts-Strogatz model
We select a fraction $p$ of the lattice edges and reposition their endpoints,
reattaching them to vertices chosen at random
Source: Watts, D.J., Strogatz, S.H. (1998) Collective dynamics of 'small-world' networks. Nature 393.

Watts-Strogatz model
$p = 0$: lattice
$p = 1$: random graph
$0.001 < p < 0.01$: high transitivity, short average path
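The construction can be sketched as follows: build a ring in which every vertex is linked to its $k$ nearest neighbours on each side, then rewire each edge with probability $p$ (so that, in expectation, a fraction $p$ of the edges is rewired). The function name, the adjacency-set representation and the rule of never creating self-loops or duplicate edges are assumptions of this sketch, which expects $n > 2k$.

```python
import random


def watts_strogatz(n, k, p, rng=None):
    """Ring lattice on n vertices (k neighbours per side), each edge rewired w.p. p."""
    rng = rng or random.Random()
    adj = {v: set() for v in range(n)}
    # Step 1: regular ring lattice.
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    # Step 2: visit every lattice edge (v, w) once and possibly move its far endpoint.
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            if rng.random() < p:
                candidates = [x for x in range(n) if x != v and x not in adj[v]]
                if candidates:
                    new_w = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new_w)
                    adj[new_w].add(v)
    return adj
```

With `p=0` the lattice is returned unchanged and every vertex has degree $2k$; rewiring never changes the total number of edges, which stays at $nk$.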

Geographic models
Nodes are placed on a lattice and connected to their nearest neighbors
Additional long-range connections are added according to the law $P(u \to v) \propto d(u,v)^{-r}$
Kleinberg, 'Navigation in a small world', Nature, 2000

With $r = 0$, the links are distributed uniformly at random
With $r < 2$, the expected delivery time of a decentralized search is $\sim N^{(2-r)/3}$

With $r > 2$, the expected delivery time is $\sim N^{(r-2)/(r-1)}$

With $r = 2$, the expected delivery time is $\sim (\log N)^{2}$
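The rule for the additional connections can be sketched as a weighted random choice: on a square grid, a node $u$ picks its long-range contact $v$ with probability proportional to $d(u,v)^{-r}$, where $d$ is the lattice (Manhattan) distance. The grid layout, the function name and the parameter names below are assumptions made for illustration.

```python
import random


def long_range_contact(u, side, r, rng):
    """Pick a long-range contact for grid node u = (x, y), with P(v) ~ d(u, v)^(-r)."""
    nodes = [(x, y) for x in range(side) for y in range(side) if (x, y) != u]
    weights = [float(abs(x - u[0]) + abs(y - u[1])) ** (-r) for x, y in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(5)
    # r = 0: uniform choice; r = 2: the critical exponent on a two-dimensional grid.
    print(long_range_contact((5, 5), 11, 0, rng))
    print(long_range_contact((5, 5), 11, 2, rng))
```

With `r=0` all weights are equal and the contact is uniform over the grid; as `r` grows, the contact is increasingly likely to be a nearby node.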

Degree distribution: no power law

Real networks

Random vs Scale-free Binomial distribution Power-law distribution

Preferential attachment
Introduced in [Price 65] for citation networks
Each new paper is generated with $m$ citations on average
New papers cite old ones with probability proportional to their in-degree (the number of citations they already have)
Each paper has a "default" number of citations: the probability of citing a paper with degree $k$ is proportional to $k+1$
"The rich get richer"
Power law with exponent $\alpha = 2 + 1/m$
Probability of linking to the $i$-th node: $\Pi(k_i) = \frac{k_i + 1}{\sum_j (k_j + 1)}$

Barabási-Albert model
A simple model
Start from an initial set of $m_0$ connected nodes, e.g. $m_0 = 3$
Add nodes one at a time, each with $m$ edges
Each new edge connects to an existing node in proportion to the number of edges that node already has (preferential attachment)
Source: Barabási & Albert, Science 286, 509 (1999)

Barabási-Albert model
Initially nodes 1, 2 and 3 each have the same number of edges (2), so each is selected with probability 1/3
A new node (4) arrives with $m = 2$: we pick two nodes at random, e.g. 2 and 3
The selection probabilities for nodes 1, 2, 3 and 4 become 1/5, 3/10, 3/10, 1/5
Add a new node (5), connect it in the same way, and so on
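The worked example can be reproduced in code. The sketch below is an illustration under stated assumptions: it starts from the triangle on nodes 1, 2, 3 ($m_0 = 3$), keeps a list in which every node appears once per incident edge (so that a uniform draw from the list is a degree-proportional draw), and redraws until the $m$ targets of a new node are distinct, which is a simplification of the sampling rule. Function names are hypothetical, and $1 \le m \le 3$ is expected.

```python
import random
from fractions import Fraction


def attachment_probabilities(degree):
    """Map each node to degree / (sum of all degrees), as exact fractions."""
    total = sum(degree.values())
    return {node: Fraction(k, total) for node, k in degree.items()}


def barabasi_albert(n, m, rng=None):
    """Grow a graph to n nodes from a triangle, adding m edges per new node."""
    rng = rng or random.Random()
    edges = [(1, 2), (1, 3), (2, 3)]
    # Each node is listed once per incident edge: uniform choice = preferential choice.
    endpoints = [v for edge in edges for v in edge]
    for new_node in range(4, n + 1):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


if __name__ == "__main__":
    # Degrees after node 4 has attached to nodes 2 and 3, as in the example.
    print(attachment_probabilities({1: 2, 2: 3, 3: 3, 4: 2}))
    print(len(barabasi_albert(1000, 2, random.Random(1))))
```

`attachment_probabilities` reproduces the fractions of the example exactly, while `barabasi_albert` returns the edge list of the grown network, from which the degree distribution can be tabulated.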

Properties
The degree distribution is a power law with exponent $\alpha = 3$
The graph is connected: every node is born with one link ($m = 1$) or with several links ($m > 1$), and connects to older vertices, which are part of the giant component
The old are richer: nodes accumulate links over time

Average path length in PA models
In the first two cases there are large hubs, so every node is connected to all the others through these hubs by a path of length about two
In the last two cases the average path length has values similar to those of a random graph
References: Cohen, Havlin, Phys. Rev. Lett. 90, 58701 (2003); Cohen, Havlin and ben-Avraham, in Handbook of Graphs and Networks, Eds. Bornholdt and Schuster (Wiley-VCH, NY, 2002), Chap. 4. Confirmed also by: Dorogovtsev et al. (2002); Chung and Lu (2002); Bollobas, Riordan (2002); Bollobas (1985); Newman (2001)

BA model and clustering coefficient
Behavior similar to that of the random graph

Preferential attachment in the real world
Four social networks observed over a time span
Network         Time    N           L
Flickr (F)      621     584,207     3,554,130
Delicious (D)   292     203,234     430,707
Answers (A)     121     598,314     1,834,217
LinkedIn (L)    1294    7,550,955   30,682,028

Preferential networks
Network         τ
Flickr (F)      1
Delicious (D)
Answers (A)     0.9
LinkedIn (L)    0.6
[Figure legend: PA, $G_{n,p}$]

Consequences: resilience
Real networks are resilient to random attacks
To disconnect the web, all pages with degree > 5 would have to be removed: a small percentage
Random networks withstand targeted attacks better

Consequences: Web search
Since the Web is scale-free (and not random), outliers (high-degree pages) are common
Ranking based on link structure works well: PageRank; Hubs and Authorities

Summary
Table columns: Model, $\langle l \rangle$, $C$, $P(k)$
Table rows: Random, Watts-Strogatz, BA
Exponential
