HUMANITIES COMPUTING D: LEXICOGRAPHY AND COMPUTERS
Lexical semantics
Thesauri
WordNet
LEXICAL SEMANTICS
In lecture 2 we began discussing how contemporary dictionaries characterize the meaning of words.
In this lecture we look at these definitions in more detail, and discuss other kinds of dictionaries that try to characterize these meanings more precisely: thesauri and WordNet.
TYPES OF DEFINITIONS IN A DICTIONARY
GENUS AND DIFFERENTIA: "stating the superordinate concept next to the definiendum together with at least one distinctive feature"
SYNONYMY
TYPICALITY
USAGE
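GENUS AND DIFFERENTIA
horse noun 1 a solid-hoofed plant-eating domesticated mammal with a flowing mane and tail, used for riding, racing, and to carry and pull loads
New Oxford Dictionary of English
GENUS: mammal
DIFFERENTIAE: solid-hoofed, plant-eating, domesticated; with a flowing mane and tail; used for riding, racing, and to carry and pull loads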
LIMITS OF DEFINITION BY GENUS & DIFFERENTIA (lecture 2)
Putnam: 'beech' / 'elm'; 'diamond' / 'zircon'
Jackson: happen vs occur vs befall vs transpire
Everything Is Illuminated: 'harmonize' vs 'agree'
TYPES OF DEFINITIONS IN A DICTIONARY
GENUS AND DIFFERENTIA
SYNONYMY: many words, especially abstract ones, are hard to define analytically; in these cases synonyms are used
TYPICALITY
USAGE
DEFINITION BY SYNONYMY
miserable 1 very unhappy, wretched 2 causing misery 3 squalid 4 mean
unhappy 1 sad or depressed 2 unfortunate or wretched
wretched 1 miserable or unhappy 2 worthless
Collins Pocket English Dictionary (2000)
CIRCULARITY: the three entries define one another
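The circularity in these entries can be checked mechanically. The following is a minimal sketch, assuming each definition is reduced to the list of words it uses; the names `DEFINITIONS` and `find_cycle` are introduced here for illustration and are not part of any dictionary tool.

```python
# Synonym-style definitions from the Collins entries above, reduced to the
# words each definition mentions (a simplification made for this sketch).
DEFINITIONS = {
    "miserable": ["unhappy", "wretched", "squalid", "mean"],
    "unhappy": ["sad", "depressed", "unfortunate", "wretched"],
    "wretched": ["miserable", "unhappy", "worthless"],
}


def find_cycle(start, definitions):
    """Return a chain of headwords leading from start back to start, or None."""
    def visit(word, path):
        for used in definitions.get(word, []):
            if used == start:
                # The definition of `word` uses the word we started from.
                return path + [used]
            if used in definitions and used not in path:
                found = visit(used, path + [used])
                if found:
                    return found
        return None

    return visit(start, [start])


if __name__ == "__main__":
    print(find_cycle("miserable", DEFINITIONS))
```

A user who looks up "miserable" and follows the defining words ends up back at "miserable" without ever reaching an analytic definition.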
TYPES OF DEFINITIONS IN A DICTIONARY
GENUS AND DIFFERENTIA
SYNONYMY
TYPICALITY: the definition specifies what is "typical" of the referent
USAGE
DEFINITION BY TYPICALITY
day of rest: a day set aside from normal activity, typically Sunday on religious grounds
measles: an infectious viral disease causing fever and a red rash, typically occurring in childhood
Concise Oxford Dictionary
TYPES OF DEFINITIONS IN A DICTIONARY
GENUS AND DIFFERENTIA
SYNONYMY
TYPICALITY
USAGE: the definition explains how a word is used; typical especially of function words (articles, prepositions, etc.)
MEANING RELATIONS
Many of these definitions establish the meaning of a word through meaning relations with other words:
HYPONYMY: dog / animal
SYNONYMY: fool / idiot
ANTONYMY: right / wrong
MERONYMY: horse / mane
HYPONYMY
HYPONYMY is the relation between a subclass and a superclass: CAR and VEHICLE; DOG and ANIMAL; BUNGALOW and HOUSE
Generally speaking, a hyponymy relation holds between X and Y whenever it is possible to substitute Y for X: That is an X -> That is a Y
E.g., That is a CAR -> That is a VEHICLE
HYPERNYMY is the inverse relation
HYPONYMY IN DEFINITIONS
Examples already seen above
SYNONYMY
Two words are SYNONYMS if they have the same meaning in at least some contexts
E.g., PRICE and FARE; CHEAP and INEXPENSIVE; LAPTOP and NOTEBOOK; HOME and HOUSE
I'm looking for a CHEAP FLIGHT / INEXPENSIVE FLIGHT
From Roget's thesaurus: OBLITERATION, erasure, cancellation, deletion
But few words are truly synonymous in ALL contexts:
I wanna go HOME / ?? I wanna go HOUSE
The flight was CANCELLED / ?? OBLITERATED / ??? DELETED
SYNONYMY IN DEFINITIONS
Examples already seen earlier
ANTONYMY
The antonymy relation links lemmas with opposite meanings: right / wrong; small / large
Sometimes also "extended" antonymy: right / left; dog / cat
ANTONYMY
artificial: not real
conventional: not spontaneous or sincere or original
vacant: not occupied
Concise Oxford Dictionary 9
MERONYMY
The relation between the parts and the whole: mane / horse; wheel / car
MERONYMY IN DEFINITIONS
horse noun 1 a solid-hoofed plant-eating domesticated mammal with a flowing mane and tail, used for riding, racing, and to carry and pull loads
New Oxford Dictionary of English
HYPERNYM: mammal
PARTS: mane, tail
HOW MANY SENSES?
horse noun
1 a solid-hoofed plant-eating domesticated mammal with a flowing mane and tail, used for riding, racing, and to carry and pull loads
  Equus caballus, family Equidae (the horse family), descended from the wild Przewalski's horse. The horse family also includes the asses and zebras.
  An adult male horse; a stallion or gelding.
  A wild mammal of the horse family
2 a frame or structure on which something is mounted or supported, especially a sawhorse
3 [mass noun] informal heroin
4 informal a unit of horsepower: the huge 63-horse 701-cc engine
5 Mining an obstruction in a vein
New Oxford Dictionary of English
HOW MANY SENSES?
horse n
1 a domesticated perissodactyl mammal, Equus caballus, used for draught work and riding: family Equidae
2 the adult male of this species; stallion
3 wild horse. 3a a horse (Equus caballus) that has become feral. 3b another name for Przewalski's horse
4a any other member of the family Equidae, such as the zebra or ass. 4b (as modifier): the horse family
5 (functioning as pl) horsemen, especially cavalry: a regiment of horse
6 Also called: buck. Gymnastics. a padded apparatus on legs, used for vaulting, etc
7 a narrow board supported by a pair of legs at each end, used as a frame for sawing or as a trestle, barrier, etc
8 a contrivance on which a person may ride and exercise
9 a slang word for heroin
10 Mining. a mass of rock within a vein of ore
11 Nautical. a rod, rope or cable, fixed at the ends, along which something may slide by means of a thimble, shackle, or other fitting; traveller
12 Chess. an informal name for knight
13 Informal. short for horsepower
14 (modifier) drawn by horse or horses: a horse cart
Collins English Dictionary 4
HOMONYMY AND POLYSEMY
HOMONYMY: the senses are clearly distinct (e.g., different etymologies)
BANK
Italian SCANNARE as 'to tear to pieces' / 'Italianization of TO SCAN'
Italian GRU ('crane') as a bird / as a machine for lifting weights
POLYSEMY: the senses are related
MOUTH
Italian VERDE ('green') as 'having a certain colour' and as 'rich in vegetation'
HOW MANY SENSES?
"The 'lumpers' like to lump meanings together and leave the user to extract the nuance of meaning that corresponds to a particular context, whereas the 'splitters' prefer to enumerate differences of meaning in more detail; the distinction corresponds to that between summarizing and analysing."
Allen, R. Lumping and splitting. English Today, 16(4), 61-3
CRITERIA?
GRAMMATICAL
  Nominal vs verbal senses
  Transitive & intransitive uses (Hirst, 1987):
    Ross KEPT staring at Nadia's decolletage
    Nadia KEPT calm and made a cutting remark
    Ross wrote of his embarrassment in the diary that he KEPT
COLLOCATIONS
  isometric, from CED4:
    (of a crystal or system of crystallization) having three mutually perpendicular equal axes
    (of a method of projecting a drawing in three dimensions) having the three axes equally inclined and all lines drawn to scale
ETYMOLOGY
PROBLEMS
Already mentioned: sense distinctions are not always easy to draw
Circularity
Relations are not used consistently
SEMANTICS & LEXICON: A SUMMARY
WORD-FORMS: "eat", "eats", "ate", "eaten"
LEXEMES: EAT-LEX-1
SENSES: eat0600, eat0700
THE ORGANIZATION OF THE LEXICON
WORD-FORMS: "stock"
LEXEMES: STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3
SENSES: stock0100, stock0200, stock0600, stock0700, stock0900, stock1000
SYNONYMY
WORD-FORMS: "cheap", "inexpensive"
LEXEMES: CHEAP-LEX-1, CHEAP-LEX-2, INEXP-LEX-3
SENSES: cheap0100, …, cheapXXXX, inexp0900, inexpYYYY
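The three layers of the lexicon (word-forms, lexemes, senses) can be sketched as a small data structure. This is only an illustration: the identifiers come from the diagrams, but the assignment of senses to the three STOCK lexemes is an assumption, and the names `Lexeme`, `LEXICON`, `lexemes_for` and `senses_for` are introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """One lexeme with the word-forms that realize it and the senses it has."""
    lexeme_id: str
    word_forms: set = field(default_factory=set)
    sense_ids: set = field(default_factory=set)


# Identifiers come from the diagrams above; which sense belongs to which
# STOCK lexeme is an assumption made for this sketch.
LEXICON = [
    Lexeme("EAT-LEX-1", {"eat", "eats", "ate", "eaten"}, {"eat0600", "eat0700"}),
    Lexeme("STOCK-LEX-1", {"stock"}, {"stock0100", "stock0200"}),
    Lexeme("STOCK-LEX-2", {"stock"}, {"stock0600", "stock0700"}),
    Lexeme("STOCK-LEX-3", {"stock"}, {"stock0900", "stock1000"}),
]


def lexemes_for(word_form):
    """Return the ids of all lexemes that the word-form can realize."""
    return [lex.lexeme_id for lex in LEXICON if word_form in lex.word_forms]


def senses_for(word_form):
    """Return every sense reachable from the word-form via its lexemes."""
    senses = set()
    for lex in LEXICON:
        if word_form in lex.word_forms:
            senses |= lex.sense_ids
    return sorted(senses)


if __name__ == "__main__":
    print(lexemes_for("ate"))
    print(senses_for("stock"))
```

Several word-forms map to one lexeme ("ate" to EAT-LEX-1), while one word-form can map to several lexemes ("stock"), which is where homonymy shows up in this model.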
DICTIONARIES ORGANIZED BY MEANING
Thesauri
WordNet
THESAURI
Dictionaries organized by topic appeared at the same time as alphabetically organized ones (Ælfric: Glossary, ~1000)
Most famous thematic dictionary: Peter Mark Roget, Thesaurus of English Words and Phrases, first published in 1852
ROGET'S THESAURUS: CLASSES
ABSTRACT RELATIONS (sections: Existence, Relation, Quantity, Order, Number, Time, Change, Causation)
SPACE
MATTER
INTELLECT
VOLITION
AFFECTIONS
ROGET'S THESAURUS: SECTIONS & WORD SETS
ABSTRACT RELATIONS
….
IV. ORDER
  1. GENERAL
    58 Order
    59 Disorder
    60 Arrangement
    61 Derangement
  2. CONSECUTIVE
    62 Precedence
    63 Sequence
    64 Precursor
    65 Sequel
    66 Beginning
    67 End
    68 Middle
OTHER THESAURI
A THESAURUS OF OLD ENGLISH (Roberts, 1995)
HISTORICAL THESAURUS OF ENGLISH (Christian Kay)
LONGMAN DICTIONARY OF SCIENTIFIC USAGE
WORDNET
A lexical database created at Princeton
Freely available for research from the Princeton site: http://www.cogsci.princeton.edu/~wn/
Information about a variety of SEMANTIC RELATIONS
Three sub-databases (supported by psychological research as early as Fillenbaum and Jones, 1965):
  NOUNS
  VERBS
  ADJECTIVES and ADVERBS
Each database is organized around SYNSETS
SYNSETS
Senses (or 'lexicalized concepts') are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET
E.g., {chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug} (gloss: person who is gullible and easy to take advantage of)
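A synset can be modelled as a set of words paired with a gloss. The sketch below is not WordNet's own storage format: it uses the synset quoted above plus one toy entry that is not taken from WordNet, and the names `SYNSETS`, `senses_of` and `share_a_sense` are introduced here for illustration.

```python
# The first entry is the synset quoted above; the second is a toy entry added
# for illustration only and is NOT quoted from WordNet.
SYNSETS = [
    (frozenset({"chump", "fish", "fool", "gull", "mark", "patsy", "fall guy",
                "sucker", "shlemiel", "soft touch", "mug"}),
     "person who is gullible and easy to take advantage of"),
    (frozenset({"fish"}), "aquatic animal (toy gloss)"),
]


def senses_of(word):
    """A word has one sense for every synset it belongs to."""
    return [gloss for words, gloss in SYNSETS if word in words]


def share_a_sense(word_a, word_b):
    """Two words are synonyms in at least one context if they share a synset."""
    return any(word_a in words and word_b in words for words, _ in SYNSETS)


if __name__ == "__main__":
    print(senses_of("fish"))
    print(share_a_sense("chump", "sucker"))
```

In this representation polysemy is simply membership in more than one synset, and synonymy is co-membership in at least one.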
STRUCTURE OF WORDNET
Diagrams with synsets and relations
THE NOUN DATABASE
About 90,000 forms, 116,000 senses
Relations:
  hypernym: breakfast -> meal
  hyponym: meal -> lunch
  has-member: faculty -> professor
  member-of: copilot -> crew
  has-part: table -> leg
  part-of: course -> meal
  antonym: leader -> follower
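Most of these noun relations come in inverse pairs (hypernym / hyponym, has-member / member-of, has-part / part-of), while antonymy is symmetric. A minimal sketch of that idea, using only the example pairs listed above; `INVERSE`, `STATED`, `close_under_inverse` and `related` are names introduced here, and nothing is claimed about how WordNet itself stores the links.

```python
# Each relation and the relation that holds in the opposite direction.
INVERSE = {
    "hypernym": "hyponym", "hyponym": "hypernym",
    "has-member": "member-of", "member-of": "has-member",
    "has-part": "part-of", "part-of": "has-part",
    "antonym": "antonym",  # symmetric
}

# (source, relation, target) triples taken from the examples above.
STATED = [
    ("breakfast", "hypernym", "meal"),
    ("meal", "hyponym", "lunch"),
    ("faculty", "has-member", "professor"),
    ("copilot", "member-of", "crew"),
    ("table", "has-part", "leg"),
    ("course", "part-of", "meal"),
    ("leader", "antonym", "follower"),
]


def close_under_inverse(triples):
    """Add, for every stated link, the link in the opposite direction."""
    closed = set(triples)
    for source, relation, target in triples:
        closed.add((target, INVERSE[relation], source))
    return closed


def related(word, relation, triples):
    """Return the sorted targets that `word` reaches via `relation`."""
    return sorted(t for s, r, t in triples if s == word and r == relation)


if __name__ == "__main__":
    closed = close_under_inverse(STATED)
    print(related("meal", "hyponym", closed))
    print(related("meal", "has-part", closed))
```

After closing the set, "meal" has both "breakfast" and "lunch" as hyponyms, even though only one of those links was stated in that direction.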
HYPERNYMY
2 senses of robin
Sense 1
robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
  => thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
    => oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
      => passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
        => bird -- (warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings)
          => vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
            => chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
              => animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
                => organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
                  => living thing, animate thing -- (a living (or once living) entity)
                    => object, physical object --
                      => entity, physical thing --
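The listing above is a walk up the hypernym links from one sense of "robin" to the top of the hierarchy. A minimal sketch of that walk over a hand-copied table: each synset is reduced to its first word, and each node is assumed to have a single hypernym, which is a simplification. `HYPERNYM`, `hypernym_chain` and `is_a` are names introduced here.

```python
# One hypernym per node, copied from the robin listing above; each synset is
# represented by its first word only (a simplification for this sketch).
HYPERNYM = {
    "robin": "thrush",
    "thrush": "oscine",
    "oscine": "passerine",
    "passerine": "bird",
    "bird": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
    "animal": "organism",
    "organism": "living thing",
    "living thing": "object",
    "object": "entity",
}


def hypernym_chain(word):
    """Follow hypernym links upward until a node with no hypernym is reached."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_a(word, candidate):
    """True if 'That is a <word>' licenses 'That is a <candidate>'."""
    return candidate in hypernym_chain(word)


if __name__ == "__main__":
    print(hypernym_chain("robin"))
    print(is_a("robin", "bird"), is_a("bird", "robin"))
```

The `is_a` check is the substitution test for hyponymy applied transitively: it holds from "robin" to "bird" but not in the other direction.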
MERONYMY
wn beak -holon
Holonyms of noun beak
1 of 3 senses of beak
beak, bill, neb, nib
  PART OF: bird
VERBS
About 10,000 forms, 20,000 senses
Relations between verb meanings:
  hypernym: fly -> travel
  troponym: walk -> stroll
  entails: snore -> sleep
  antonym: increase -> decrease
RELATIONS BETWEEN VERB MEANINGS
V1 ENTAILS V2 when "Someone V1" (logically) entails "Someone V2", e.g., snore entails sleep
TROPONYMY holds when "to do V1" is "to do V2" in some manner, e.g., limp is a troponym of walk
ADJECTIVES & ADVERBS
About 20,000 adjective forms, 30,000 senses
4,000 adverbs, 5,600 senses
Relations:
  antonym (adjective): heavy <-> light
  antonym (adverb): quickly <-> slowly
HOW TO USE IT
Online: http://cogsci.princeton.edu/cgi-bin/webwn
Or download it, then from the command line (search word first, then the option):
  Get synonyms: wn bank -synsn
  Get hypernyms: wn robin -hypen
  Similar options exist for adjectives and verbs, e.g. get antonyms: wn right -antsa
THE LIMITS OF WORDNET
Coverage
  Words not in WordNet: crocidolite, spinoff (spin-off)
  Missing information: MERONYMY
  Context-dependent senses: slump, crash, bust are all synonyms in the WSJ corpus
The structure of WordNet
  Some information is encoded in complex ways (room, wall, floor)
But: MOVING TARGET!
MERONYMY IN WORDNET: AN EXPERIMENT
100 bridging descriptions (BDs) in a mereological relation
Ran a script trying to find a direct link in WordNet (1.7) between one of the senses of the BD and one of the senses of any of the previous NPs
Results: in only 6 cases does WordNet contain a direct lexical relation between a BD and one of the CFs
John looked at the HOUSE. The WALL was crumbling.
[Diagram: ARTIFACT, HOUSING, BUILDING, HOUSE, HOME, ROOM, WALL and FLOOR connected by IS-A and PART-OF links]
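The HOUSE / WALL example shows why a search for a direct link often fails: the part-of information may only be reachable through intermediate nodes. The sketch below contrasts a direct check with a search that follows PART-OF links and then IS-A links. The edges are assumptions made for illustration, since the exact links in the diagram (and in WordNet 1.7) are not reproduced here; `IS_A`, `PART_OF` and the three functions are hypothetical names.

```python
from collections import deque

# Toy links, ASSUMED for this sketch; they are not copied from WordNet.
IS_A = {"house": "building", "home": "housing",
        "building": "artifact", "housing": "artifact"}
PART_OF = {"wall": ["room"], "floor": ["room"], "room": ["building"]}


def ancestors(word):
    """The word itself followed by everything above it via IS-A links."""
    seen = [word]
    while word in IS_A:
        word = IS_A[word]
        seen.append(word)
    return seen


def direct_part_of(part, whole):
    """True only if a single PART-OF link connects the two words."""
    return whole in PART_OF.get(part, [])


def indirect_part_of(part, whole):
    """Follow chains of PART-OF links; accept the whole or any IS-A ancestor."""
    targets = set(ancestors(whole))
    queue = deque([part])
    visited = {part}
    while queue:
        current = queue.popleft()
        for container in PART_OF.get(current, []):
            if container in targets:
                return True
            if container not in visited:
                visited.add(container)
                queue.append(container)
    return False


if __name__ == "__main__":
    print(direct_part_of("wall", "house"))
    print(indirect_part_of("wall", "house"))
```

With these toy links the direct check fails for WALL and HOUSE, while the indirect search succeeds by going from WALL to ROOM to BUILDING, which HOUSE inherits from via IS-A.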
SOLUTION: LEXICAL ACQUISITION
Partial (add information to WordNet, especially for specialist domains)
Total (build a new lexicon from scratch)
READINGS
Jackson, ch. 8
C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998, ch. 1