Information Extraction from Text
Why study it? It is a particularly complex application. It draws on most of the resources used in language analysis tasks. Studying it therefore gives a good overview of the problems and technologies involved in natural language analysis.
What is information extraction from text?
Information retrieval (IR): searching for and finding information in texts in response to specific requests.
Passage retrieval: searching for and finding passages (paragraphs, sentences) within a text that can answer given questions.
Information extraction (IE): finding information that can fill predefined schemas (templates).
Question answering: answering general questions posed by a user (IE + IR).
Text comprehension: modeling how humans understand texts.
[Figure: types of questions handled by IR, passage retrieval, IE, question answering, and text comprehension. Label shown: predefined questions, fixed aspects of the textual information.]
What is Information Extraction
As a task: filling slots in a database from sub-segments of text.

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access."

Richard Stallman, founder of the Free Software Foundation, countered saying…

Slots to fill: NAME, TITLE, ORGANIZATION
IE output for the passage above:

NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Soft..
What is Information Extraction
As a family of techniques: Information Extraction = segmentation + classification + association + clustering

Segments extracted from the passage above (aka named entity extraction):
Microsoft Corporation
CEO
Bill Gates
Microsoft
Gates
Microsoft
Bill Veghte
Microsoft
VP
Richard Stallman
founder
Free Software Foundation
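The clustering step can be illustrated with a toy sketch. It assumes the segments have already been found and classified by hand, and it groups mentions with a deliberately naive heuristic (a shorter mention joins a longer one of the same class when all its words appear in it). The function name `cluster_mentions` and the heuristic are assumptions made for illustration, not a method described in the slides.

```python
# Hand-labelled segments from the example passage (segmentation and
# classification are assumed to be done already).
segments = [
    ("Microsoft Corporation", "ORGANIZATION"), ("CEO", "TITLE"),
    ("Bill Gates", "NAME"), ("Microsoft", "ORGANIZATION"),
    ("Gates", "NAME"), ("Microsoft", "ORGANIZATION"),
    ("Bill Veghte", "NAME"), ("Microsoft", "ORGANIZATION"),
    ("VP", "TITLE"), ("Richard Stallman", "NAME"),
    ("founder", "TITLE"), ("Free Software Foundation", "ORGANIZATION"),
]


def cluster_mentions(labelled_segments):
    """Group mentions that plausibly refer to the same entity.

    Naive heuristic (an assumption): process distinct mentions from longest
    to shortest; a mention joins the first cluster of the same class whose
    full form contains all of its words, otherwise it starts a new cluster.
    """
    clusters = []  # list of (label, [full form, shorter forms...])
    ordered = sorted(set(labelled_segments), key=lambda s: (-len(s[0]), s[0]))
    for text, label in ordered:
        words = set(text.split())
        for cluster_label, mentions in clusters:
            if cluster_label == label and words <= set(mentions[0].split()):
                mentions.append(text)
                break
        else:
            clusters.append((label, [text]))
    return clusters


clusters = cluster_mentions(segments)
for label, mentions in clusters:
    print(label, mentions)
```

With this heuristic "Gates" is grouped with "Bill Gates" and "Microsoft" with "Microsoft Corporation"; a real system would need far more robust coreference evidence.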
An example: FASTUS (1993)

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and metal wood clubs a month.

TIE-UP-1
  Relationship: TIE-UP
  Entities: Bridgestone Sports Co.
            a local concern
            a Japanese trading house
  Joint Venture Company: Bridgestone Sports Taiwan Co.
  Activity: ACTIVITY-1
  Amount: NT$

ACTIVITY-1
  Activity: PRODUCTION
  Company: Bridgestone Sports Taiwan Co.
  Product: iron and metal wood clubs
  Start Date: DURING: January 1990
How FASTUS works
1. Complex words and proper names, e.g. "set up", "new Taiwan dollars"
2. Simple phrases (noun groups, verb groups, particles), e.g. "a Japanese trading house", "had set up"
3. Complex phrases, e.g. "production of 20,000 iron and metal wood clubs"
4. Relevant events: construction of simple templates, e.g. [company] [set up] [Joint-Venture] with [company]
5. Template merging, when templates carry information about the same event
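Step 5 can be sketched as follows. The slides do not say how FASTUS decides that two templates describe the same event, so the rule used here (merge when no filled slot conflicts) and the name `merge_templates` are assumptions for illustration only.

```python
from typing import Optional


def merge_templates(a: dict, b: dict) -> Optional[dict]:
    """Merge two partial templates (slot name -> value, None if unfilled).

    Assumed rule: the templates are compatible when no slot is filled with
    two different values. Returns the merged template, or None on conflict.
    """
    merged = dict(a)
    for slot, value in b.items():
        if value is None:
            merged.setdefault(slot, None)  # keep the slot, still unfilled
            continue
        existing = merged.get(slot)
        if existing is not None and existing != value:
            return None  # conflicting fillers: treat as different events
        merged[slot] = value
    return merged


# Two partial ACTIVITY templates built from different parts of the text.
first = {"Activity": "PRODUCTION",
         "Company": "Bridgestone Sports Taiwan Co.",
         "Product": None}
second = {"Company": "Bridgestone Sports Taiwan Co.",
          "Product": "iron and metal wood clubs",
          "Start Date": "January 1990"}

merged = merge_templates(first, second)
conflict = merge_templates(first, {"Company": "Yaxing Benz"})
print(merged)
print(conflict)
```

Here the two partial templates share the same company and fill complementary slots, so they merge; a template naming a different company is rejected.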
Another example: a wrong template

……….
Jurgen Pfrang, 51, reportedly stumbled upon the robbers on the second floor of his Nanjing home early on Sunday. The deputy general manager of Yaxing Benz, a Sino-German joint venture that makes buses and bus chassis in nearby Yangzhou, was hacked to death with 45 cm watermelon knives.
……….

Wrong template:
  Name of the Venture: Yaxing Benz
  Products: buses and bus chassis
  Location: Yangzhou, China
  Companies involved:
    (1) Name: X?  Country: Germany
    (2) Name: Y?  Country: China
Correct template

A German vehicle-firm executive was stabbed to death ….
……….
Jurgen Pfrang, 51, reportedly stumbled upon the robbers on the second floor of his Nanjing home early on Sunday. The deputy general manager of Yaxing Benz, a Sino-German joint venture that makes buses and bus chassis in nearby Yangzhou, was hacked to death with 45 cm watermelon knives.
……….

Crime-Type: Murder
  Type: Stabbing
The killed:
  Name: Jurgen Pfrang
  Age: 51
  Profession: Deputy general manager
Location: Nanjing, China
Who performs the interpretation?
[Figure: the tasks placed along an axis running from User to System: (1) IR, (2) passage retrieval, (3) IE, (4) question answering, (5) text comprehension.]
[Figure: IR system. Labels: set of texts, characterization of the texts, request, interpretation, knowledge; passage retrieval shown alongside IR.]
[Figure: IE system, taking texts as input and producing templates. Labels: set of texts, IR, passage retrieval, characterization of the texts, queries, natural language processing, interpretation, knowledge.]
IE: a pragmatic approach to NLP
[Figure labels: interpretation, knowledge, IE, texts, templates.]
Predefined templates, as opposed to a general approach to natural language processing and understanding.
Performance evaluation
Tasks: (1) IR, (2) passage retrieval, (3) IE, (4) question answering, (5) text comprehension.
Methodology labels, in the order they appear on the slide: clear methodology, unclear methodology, clear methodology, fairly vague methodology, very vague methodology.
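[Figure: within the set of documents, a query defines a set of N correct documents and a set of M retrieved documents; their overlap is C.]
N: correct documents
M: retrieved documents
C: retrieved documents that are correct

Precision: $P = \frac{C}{M}$
Recall: $R = \frac{C}{N}$
F-value: $F = \frac{2PR}{P + R}$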
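As a small worked example of these measures, the sketch below computes them from two sets of document identifiers. The function name, the document ids, and the choice to return 0.0 when a denominator is zero are assumptions for illustration.

```python
def precision_recall_f(correct: set, retrieved: set):
    """Return (precision, recall, F-value) for a single query.

    N = |correct|, M = |retrieved|, C = |correct ∩ retrieved|.
    Zero denominators yield 0.0 (a convention chosen here, not from the slides).
    """
    n = len(correct)
    m = len(retrieved)
    c = len(correct & retrieved)
    precision = c / m if m else 0.0
    recall = c / n if n else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Hypothetical query: 4 correct documents, 3 retrieved, 2 in common.
correct_docs = {"d1", "d2", "d3", "d4"}
retrieved_docs = {"d2", "d3", "d5"}
p, r, f = precision_recall_f(correct_docs, retrieved_docs)
print(p, r, f)  # P = 2/3, R = 1/2, F = 4/7
```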
[Figure: the same diagram applied to templates, with N correct templates, M retrieved templates, and overlap C.]
N: correct templates
M: retrieved templates
C: correct templates that were retrieved

Precision: $P = \frac{C}{M}$
Recall: $R = \frac{C}{N}$
F-value: $F = \frac{2PR}{P + R}$

In practice things are more complicated, because templates may be only partially filled.
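One possible way to handle partially filled templates is to count at the level of individual slots rather than whole templates. The sketch below does this with exact string matching; both the slot-level counting and the name `slot_scores` are assumptions for illustration, not the scoring scheme of any particular evaluation campaign.

```python
def slot_scores(gold: dict, predicted: dict):
    """Slot-level precision, recall and F-value for one template.

    N = slots filled in the gold template, M = slots filled by the system,
    C = system slots whose value exactly matches the gold value.
    Unfilled slots are represented by None.
    """
    n = sum(1 for v in gold.values() if v is not None)
    m = sum(1 for v in predicted.values() if v is not None)
    c = sum(1 for slot, v in predicted.items()
            if v is not None and gold.get(slot) == v)
    precision = c / m if m else 0.0
    recall = c / n if n else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


gold = {"Company": "Bridgestone Sports Taiwan Co.",
        "Product": "iron and metal wood clubs",
        "Start Date": "January 1990"}
# Hypothetical system output: one slot right, one wrong, one left empty.
predicted = {"Company": "Bridgestone Sports Taiwan Co.",
             "Product": "golf clubs",
             "Start Date": None}

sp, sr, sf = slot_scores(gold, predicted)
print(sp, sr, sf)  # P = 1/2, R = 1/3, F = 0.4
```

Exact matching is a strict choice: a filler such as "golf clubs" gets no credit even though it is related to the gold value, which is one reason real evaluations of IE output need more careful scoring rules.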