Using a Generative Lexicon Resource to Compute Bridging Anaphora in Italian: preliminary observations and data Tommaso Caselli Istituto di Linguistica.

Presentation transcript:

Using a Generative Lexicon Resource to Compute Bridging Anaphora in Italian: preliminary observations and data
Tommaso Caselli
Istituto di Linguistica Computazionale – ILC-CNR Pisa
Dip. di Linguistica “T. Bolelli”, Università degli Studi di Pisa
{ tommaso (dot) caselli (at) ilc (dot) cnr (dot) it }
CBA 2008, Barcelona, 14 November 2008

Outline
- Motivations
- Bridging in Italian: corpus-study
- Introducing a Different Resource: PAROLE/SIMPLE/CLIPS
- Preliminary Experiments and Evaluation
- Conclusion & Future Work

Motivations
- Bridging anaphora is a very challenging phenomenon, and its resolution is essential to improving the performance of many NLP applications (Q.A., I.R., I.E. and summarizers).
- So far, the use of (lexical) resources has concentrated on exploiting semantic relations (meronymy, synonymy, hyponymy...), but the results show limitations: the relation between the bridging anaphor and the anchor is not always a semantic relation in the classical sense.
- Relations between words are not randomly created by speakers. This calls for resources based on strong theoretical frameworks that can account for the way words combine and are related: Generative Lexicon (G.L.) and G.L.-based resources.

Bridging anaphora: theoretical assumptions
- It is “a type of indirect textual reference whereby a new referent is introduced as an anaphoric not of but via the referent of an antecedent expression” [Kleiber 1999: 339]. Example: Yesterday we went for a picnic, but I forgot to put the beers in the fridge.
- It is a class of inferences required to maintain the coherence of the discourse (Clark 1977).
- Bridging anaphors give rise to three kinds of presupposition: the Uniqueness Presupposition; the Familiarity/Identifiability Presupposition; and the Inferential Presupposition, i.e. “the [N1] R [N2]” (e.g. N1 = [the beers], N2 = [a picnic], R = is_a_member_of).

Bridging anaphora: theoretical assumptions (2)
- The identification of their antecedents is a matter of the local focus of the discourse (Sidner 1979, Poesio 2003).
- Three pragma-cognitive dimensions can be identified for their interpretation (Korzen 2003): the Lexical Semantics Dimension; the Co-textual Dimension (discourse structure); the Con-textual Dimension (scripts, frames, world knowledge).

Bridging anaphora: corpus-study
Two-fold corpus study:
1) a general corpus study of Full Definite Noun Phrases (FDNPs) in Italian;
2) a study of those FDNPs which are instances of bridging anaphora in Italian.
METHODOLOGICAL NOTE
- corpus of seventeen randomly chosen articles from the Italian financial newspaper “Il Sole-24 Ore”, a workpackage of the SI-TAL project;
- use of processing requirements for the classification both of FDNPs in general and of bridging anaphors;
- Minimal vs Maximal NP (MUC-7);
- all instances of NPs (pronouns, including zero anaphora, and lexical expressions), VPs and frames have been considered as probable anchors;
- pre- and post-nominal modifiers (adjectives, non-finite verb forms, relative clauses and prepositional phrases) have been considered as disambiguating clues.

Bridging anaphora: corpus-study (1)
Full Definite Noun Phrases in Italian:

CLASS              NUMBER OF ITEMS    PERCENTAGE
First Mention                         %
Direct Anaphora                       %
Bridging                              %
Possessives        36                 2.54%
Idiom              25                 1.62%
Doubt              49                 3.47%
Total                                 %

Bridging anaphora: corpus-study (2)
Bridging Anaphora in Italian:

CLASS OF BRIDGING FDNPs    NUMBER OF ITEMS    PERCENTAGE
Lexical                                       %
Event                      18                 6.02%
Rhetorical Relation        27                 9.03%
Inferential                                   %
Discourse Topic            26                 8.69%
Total                      299                100%

LEXICAL SEMANTICS > PRAGMATICS > DISCOURSE STRUCTURE
- 221 anchors are nominal entities, and ~70% have a lookback ranging [...];
- 119 of the 221 anchors (119/221) are previous Cbs/Cps (Centering Theory);
- 25.33% are proper names;
- 34.03% are NPs of postmodifying PPs, i.e. the explicit argument of the head noun of Löbner’s FC2.
e.g.: 4) i due Paesi (the two countries) - i due partner commerciali (the two trading partners): I negoziatori dei due Paesi hanno annunciato che i colloqui “informali” in corso da giovedì scorso nella capitale Usa hanno portato all' alba di martedì al compromesso[...]. Doppiato questo scoglio [...] i due partner commerciali hanno promesso di procedere a passo spedito.

A Different Resource: PAROLE/SIMPLE/CLIPS
As the corpus study has shown, more than 45% of the relations between anchor and bridging anaphor are not strictly lexical.
WHY USE (AGAIN) A LEXICAL RESOURCE?
SIMPLE is based on the Generative Lexicon (Pustejovsky, 1995):
- a formal framework which explains how senses are generated in the lexicon;
- the basic qualia (telic, constitutive, agentive and formal) enable the description of the meaning of a word and capture orthogonal relations between semantic units;
- the span of semantic relations in the G.L. framework is much wider, which reduces the need for world/pragmatic knowledge to explain semantic relations between words.

A Different Resource: PAROLE/SIMPLE/CLIPS (2)
PAROLE/SIMPLE/CLIPS is the largest computational lexical knowledge base of the Italian language.

SEMANTICS
Lemmas: 45,437
  verbs             2,830
  common nouns      14,088
  proper nouns      526
  adjectives        1,856
Semantic Units: 57,101
  verbs             5,351
  common nouns      19,123
  proper nouns      873
  adjectives        3,163

A Different Resource: PAROLE/SIMPLE/CLIPS (3)
SEMANTIC UNIT
FEATURES: Ontological Type, Domain, Event Type, Semantic Properties
RELATIONS: Extended Qualia, Synonymy, Derivation, Regular Polysemy

A Different Resource: PAROLE/SIMPLE/CLIPS (4)

A Different Resource: PAROLE/SIMPLE/CLIPS (5)
Qualia structure:
- the classical 4 qualia have been extended to 64 relations;
- finer-grained specification of meaning dimensions;
- from a single keyword it is possible to retrieve and extract a set of semantic units, regardless of their semantic type, which creates a rich semantic network in the text.

FORMAL          5 semantic relations
TELIC           35 semantic relations
AGENTIVE        10 semantic relations
CONSTITUTIVE    14 semantic relations

THESE SEMANTIC RELATIONS ARE TAKEN TO EXPRESS THE R ELEMENT OF THE INFERENTIAL PRESUPPOSITION TO RESOLVE BRIDGING ANAPHORS
PISTOLA (gun) – ARMA (weapon): SemRel = is_a
MORTE (death) – SUICIDIO (suicide): SemRel = resulting_state
BENZINA (petrol) – PETROLIO (oil): SemRel = derived_from
PROIETTILE (bullet) – COLPIRE (hit): SemRel = used_for
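The example pairs on this slide can be read as labelled arcs between semantic units. The following is a minimal Python sketch of that idea, assuming a hand-built toy table: the dictionary layout, the names `QUALIA_ROLE_SIZES`, `SEM_RELS` and `find_relation`, and the choice to look pairs up in either order are illustrative assumptions, not the actual PAROLE/SIMPLE/CLIPS data format or any API of the resource.

```python
# Number of extended relations per qualia role, as reported on the slide.
QUALIA_ROLE_SIZES = {"formal": 5, "telic": 35, "agentive": 10, "constitutive": 14}

# Toy relation table built only from the four example pairs on the slide.
# Assumption: pairs are stored in the order in which the slide lists them.
SEM_RELS = {
    ("pistola", "arma"): "is_a",
    ("morte", "suicidio"): "resulting_state",
    ("benzina", "petrolio"): "derived_from",
    ("proiettile", "colpire"): "used_for",
}


def find_relation(word_a, word_b):
    """Return the relation label linking two words in either order, or None."""
    return SEM_RELS.get((word_a, word_b)) or SEM_RELS.get((word_b, word_a))


if __name__ == "__main__":
    # The four role sizes add up to the 64 extended relations.
    print(sum(QUALIA_ROLE_SIZES.values()))
    # A candidate R element for the pair "l'arma" / "la pistola".
    print(find_relation("arma", "pistola"))
```

In this reading, the label returned for an anaphor–anchor pair plays the role of R in “the [N1] R [N2]”; a pair with no entry simply returns `None`.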

A Different Resource: PAROLE/SIMPLE/CLIPS (6)
1. i prezzi – al consumatore [the prices – to the consumer]; INFERENTIAL → indirect_telic + agent_verb
2. il processo – gli imputati [the trial – the defendants]; INFERENTIAL → member_of
3. essersi sparato – il suicidio [to shoot oneself – the suicide]; EVENT → resulting_state
4. fatto esplodere – le macerie [exploded – the debris]; EVENT → result_of
5. condannare – il pubblico ministero [to condemn – the public prosecutor]; EVENT → relates
6. il voto – l’elezione [the vote – the election]; RHET. RELATION → purpose

Experiments and evaluation
Experiment: 129 bridging anaphor – anchor pairs were selected from the corpus study, corresponding to the following classes:
- Lexical
- Event
- Rhetorical Relations
- Inferential
Anaphoric relations involving named entities (N.E.) were excluded.

Experiments and evaluation (2)
Bridging anaphor, Anchor, SIMPLE:
- WSD of the bridging anaphor
- selection of the anchor
- automatic retrieval of the semantic relation
- maximum 2 semantic arcs allowed
- direct connection between the 2 SemUs or between the 2 SemTypes.
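The retrieval step with at most two semantic arcs can be pictured as a depth-limited search over a graph of semantic units. The sketch below is one possible reading, not the procedure actually used in the experiment: it assumes arcs can be traversed in both directions, it uses breadth-first search, and the arc between "proiettile" and "pistola" is invented purely to show a two-arc path. The names `ARCS`, `build_graph` and `find_path` are hypothetical.

```python
from collections import deque

# Toy arcs: the first three come from the slides; the last one is invented
# for illustration only and is not claimed to exist in SIMPLE.
ARCS = [
    ("pistola", "is_a", "arma"),
    ("benzina", "derived_from", "petrolio"),
    ("proiettile", "used_for", "colpire"),
    ("proiettile", "HYPOTHETICAL_relation", "pistola"),
]


def build_graph(arcs):
    """Index arcs by node; each arc is made traversable in both directions."""
    graph = {}
    for source, label, target in arcs:
        graph.setdefault(source, []).append((label, target))
        graph.setdefault(target, []).append((label, source))
    return graph


def find_path(graph, anaphor, anchor, max_arcs=2):
    """Breadth-first search from anaphor to anchor using at most max_arcs arcs.

    Returns the list of (from_node, label, to_node) steps, or None.
    """
    queue = deque([(anaphor, [])])
    visited = {anaphor}
    while queue:
        node, path = queue.popleft()
        if node == anchor:
            return path
        if len(path) == max_arcs:
            continue  # do not expand beyond the allowed number of arcs
        for label, neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [(node, label, neighbour)]))
    return None


if __name__ == "__main__":
    graph = build_graph(ARCS)
    print(find_path(graph, "arma", "pistola"))     # one arc
    print(find_path(graph, "arma", "proiettile"))  # two arcs
    print(find_path(graph, "arma", "colpire"))     # would need three arcs
```

Breadth-first search is used here so that the shortest connection is found first; the same limit could be applied between SemTypes instead of SemUs by building the graph over type names.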

Experiments and evaluation: results

Resource    # Bridging     Lexical        Inferential    Event          Rhet. Relation
SIMPLE      22 (17.05%)    11 (50.00%)    7 (31.82%)     2 (9.09%)
IWN         19 (14.72%)    12 (63.20%)    5 (26.31%)     2 (10.52%)     0

Unsatisfactory results, BUT still better than using IWN.
Reason: many of the extended qualia relations have not been introduced into the resource.
The classes of Inferential and Rhetorical Relations are mostly resolved by two types of qualia: CONSTITUTIVE & TELIC.
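As a quick check of the arithmetic behind these figures, the overall resolution rates can be recomputed from the counts on the slides (129 selected pairs, 22 resolved with SIMPLE, 19 with IWN). This is a small sketch; the helper name `rate` is an assumption, and the last digit may differ slightly from the slide, presumably because of rounding.

```python
TOTAL_PAIRS = 129  # bridging anaphor - anchor pairs selected for the experiment
RESOLVED = {"SIMPLE": 22, "IWN": 19}  # pairs resolved with each resource


def rate(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    for resource, count in RESOLVED.items():
        print(f"{resource}: {count}/{TOTAL_PAIRS} = {rate(count, TOTAL_PAIRS):.2f}%")
    # Share of the Lexical class among the pairs resolved with SIMPLE.
    print(f"SIMPLE, Lexical: {rate(11, 22):.2f}%")
```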

Conclusion & Future Work
- the use of a GL-based resource can be seen as a way of reducing the need for extralinguistic knowledge;
- the problem of bridging anaphora resolution becomes part of a more general problem of identifying semantic relations between linguistic elements;
- a resource with GL qualia relations encoded in it should not be compared with a world-knowledge database. GL-based relations are dynamic: they make it possible to discover new relations between lexical items and can provide an account of the creative use of language.

Conclusion & Future Work (2)
- qualia relations can represent new features for machine learning approaches;
- GL pattern induction from a corpus-based study can improve the resource by adding missing relations;
- extensive exploitation of the SemTypes can overcome the need to introduce single SemUs.
ESPLODERE (explode) - MACERIE (debris)
ESPLODERE: Resulting_state, SemU maceria; SemType Cause_change_of_state
MACERIA: result_of, SemU esplodere; SemType Cause_change_of_state
SemType Cause_change_of_state: SemU DETRITO, SemU ...
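One way to picture the SemType idea is a lookup that first tries a relation encoded between two individual SemUs and, failing that, falls back to a relation whose target is a whole SemType. The sketch below is only an interpretation of the slide under stated assumptions: the table layout, the function `relate`, the type-level arc for "maceria", and the verb "demolire" with its type assignment are all hypothetical and are not taken from the resource.

```python
# SemType of each verb. "esplodere" comes from the slide; "demolire" and its
# type assignment are hypothetical, added only to exercise the fallback.
SEMTYPE_OF = {
    "esplodere": "Cause_change_of_state",
    "demolire": "Cause_change_of_state",
}

# Relation encoded between two individual SemUs (from the slide).
SEMU_ARCS = {("maceria", "esplodere"): "result_of"}

# Assumed type-level arc: the target is a SemType rather than a single SemU.
TYPE_ARCS = {("maceria", "Cause_change_of_state"): "result_of"}


def relate(source, target):
    """Return (relation, level) for a SemU pair, or None if nothing is found.

    level is "SemU" for a directly encoded arc and "SemType" when the
    relation is recovered through the target's semantic type.
    """
    direct = SEMU_ARCS.get((source, target))
    if direct is not None:
        return direct, "SemU"
    via_type = TYPE_ARCS.get((source, SEMTYPE_OF.get(target)))
    if via_type is not None:
        return via_type, "SemType"
    return None


if __name__ == "__main__":
    print(relate("maceria", "esplodere"))  # found at the SemU level
    print(relate("maceria", "demolire"))   # recovered through the SemType
    print(relate("maceria", "pistola"))    # no relation in this toy table
```

The point of the design is that one type-level arc covers every SemU of that type, so missing SemU-level relations do not each have to be added by hand.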

Thanks

The Model
2) Bill gave a book to Maria. The author is very famous.
Main DRS: Bill (Cb), book (Cp), Maria (Cf); Bill(x1), book(x2), Maria(x3), give(x1, x3, x2) ...
DRS 2: author (Cb); author(y1), famous(y1)

Lexical Bridging
la pistola (the gun) - l' arma (the weapon): E poi stupisce che nel tamburo della pistola mancasse un proiettile[…]. Alcune tracce di ruggine, infatti, farebbero pensare che l' arma fu collocata nella cintura dei pantaloni almeno 4 o 5 giorni prima del ritrovamento del corpo.
l’esplosivo (the explosive) – la bomba (the bomb): Gli agenti sono risaliti al furgone utilizzato per trasportare l' esplosivo nel garage e alla persona che l' aveva affittato, Salameh. Il suo arresto, anche per aver collaborato alla preparazione della bomba, fu seguito dalla cattura di Ayyad, un chimico

Event
essersi sparato (to shoot oneself) - il suicidio (the suicide): dopo essersi sparato una prima volta con la sua “Smith e Wesson” calibro 38, carica a proiettili “rafforzati” […]. Ciò non esclude automaticamente l' ipotesi del suicidio, ma avvalora quella di successive manomissioni, effettuate subito dopo la morte.
rispose (answered) - le domande (the questions): Nel 1993, invece, rispose positivamente alle domande degli inquirenti perché ritenne che il clima politico consentisse di parlare liberamente.

Rhetorical Relations
il voto (the vote) - l' elezione (the election): il voto di lista maggioritario per l' elezione in assemblea dei componenti del cda (che per altro verranno retribuiti anche in relazione ai risultati ottenuti dalla società)
due elementi (two elements) - il voto... (the vote) / i limiti... (the limits): [i tre ministri] che hanno voluto introdurre nello statuto due elementi finora sconosciuti nell' universo italiano delle privatizzazioni: il voto di lista maggioritario per l' elezione in assemblea dei componenti del cda (che per altro verranno retribuiti anche in relazione ai risultati ottenuti dalla società) e, soprattutto, i limiti imposti al tetto azionario che vanno ben oltre il vincolo del 5 per cento.

Inferential
quattro uomini (four men) - i quattro immigrati (the four immigrants): “Non potrò veder crescere mio figlio perché quattro uomini hanno deciso di far saltare simboli americani” [...]. Neppure la chiusura del processo ai quattro immigrati di origine araba promette però di scrivere la parola fine
il tribunale (the court) - il giudice (the judge): La decisione del tribunale era parsa scontata e non ha sorpreso neppure Mohammed Salameh, Ahmad Ajaj, Mahmud Abouhalima e Nidal Ayyad, il gruppo di fondamentalisti islamici sotto processo. “Mi aspetto il massimo della pena” aveva detto Ajaj poco prima di ascoltare il responso del giudice [...].
la Cina (China) – Pechino (Beijing): gli Stati Uniti sono parsi più vicini a trovare una soluzione di compromesso anche sulla controversia con la Cina sui diritti umani. Il segretario di Stato Warren Christopher avrebbe infatti stabilito che Pechino ha soddisfatto richieste specifiche alle quali gli Usa.

I terroristi hanno fatto esplodere una potentissima carica di esplosivo nel garage dei piu' alti grattacieli di New York : tra le macerie persero la vita sei persone e altre mille rimasero ferite,