Bayesian Modeling of Lexical Resources for Low-Resource Settings


Nicholas Andrews, with Mark Dredze, Benjamin Van Durme, and Jason Eisner

This Talk: Sequence Labeling
Corpus: ... known as [Llanfairpwllgwyngyll] (a place name)
$y_t$ = Person? Location? Other?

This Talk: Sequence Labeling with Gazetteers
Gazetteer: Llanfairpwllgwyngyll, Jacksonville, New York, Allentown
Corpus: ... known as [Llanfairpwllgwyngyll]
$y_t$ = Person? Location? Other?

This Talk in One Slide
Don't condition on the gazetteer: $P(y \mid \text{gazetteer}, x)$
Do generate the gazetteer! $P(\text{gazetteer}, x, y)$

Warning: Pick a good generative model
It's easy to have rich discriminative models.
It's a little harder for generative models, but possible:
High-resource case: LSTM language models
Low-resource case (this paper): hierarchical Bayesian LMs

PART I: GAZETTEER FEATURES

Discriminative Named-Entity Recognition
Corpus: he went to [Jacksonville]
What type is Jacksonville? Person? Location?
The model estimates $P(\text{labels} \mid \text{words})$ from features of the token:
Context says: Location!
Spelling says: Location!
So $y_t$ = loc.
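To make the discriminative setup concrete, here is a minimal sketch of a log-linear classifier for one token, $P(y_t \mid \text{words}) \propto \exp(\sum_f w_{f,y_t})$, with one context feature (the previous word) and one spelling feature (a three-character suffix). The feature names and weight values are hypothetical and hand-set rather than learned, and a real sequence tagger such as a CRF would also score label transitions.

```python
import math

LABELS = ["per", "loc", "other"]

# Hypothetical hand-set weights keyed by (feature, label); a real tagger
# would learn these from the annotated corpus.
WEIGHTS = {
    ("prev=to", "loc"): 1.5,
    ("prev=from", "loc"): 0.5,
    ("prev=from", "per"): 0.5,
    ("suffix=lle", "loc"): 2.0,
    ("suffix=ein", "per"): 1.0,
}


def features(words, t):
    """Context feature (previous word) and spelling feature (3-char suffix)."""
    prev = words[t - 1].lower() if t > 0 else "<s>"
    return [f"prev={prev}", f"suffix={words[t][-3:].lower()}"]


def label_distribution(words, t):
    """P(y_t | words): softmax over labels of the summed feature weights."""
    feats = features(words, t)
    scores = {y: sum(WEIGHTS.get((f, y), 0.0) for f in feats) for y in LABELS}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}


dist = label_distribution(["he", "went", "to", "Jacksonville"], 3)
print(max(dist, key=dist.get))  # loc
```

Running the same function on `["known", "as", "Llanfairpwllgwyngyll"]` fires no weighted feature, so every label gets probability 1/3: the case where context and spelling aren't enough.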

What if context and spelling aren't enough?
Corpus: ... known as [Llanfairpwllgwyngyll]
$y_t$ = Person? Location?
Context says: ?
Spelling says: ?

Hmm, what if we had some sort of list of names we knew were locations?

Solution: use Gazetteers
Gazetteer: Llanfairpwllgwyngyll, Jacksonville, New York, Allentown, Albert Einstein, Albert Eynteyn, Alberts Einteins

Gazetteer Features
Gazetteer: Llanfairpwllgwyngyll, Jacksonville, New York, Allentown
Corpus: ... known as [Llanfairpwllgwyngyll]
$$\mathrm{GazFeature}(\mathit{str}) = \begin{cases} 1 & \text{if } \mathit{str} \in \mathrm{GAZ} \\ 0 & \text{otherwise} \end{cases}$$
Context says: ?
Spelling says: ?
Gazetteer says: Location!
So $y_t$ = loc.
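A minimal sketch of the gazetteer indicator feature, assuming entries are matched by exact string comparison after collapsing whitespace (real systems often also fire on partial or token-level matches). The helper names `normalize` and `gaz_feature` are hypothetical.

```python
GAZETTEER = ["Llanfairpwllgwyngyll", "Jacksonville", "New York", "Allentown"]


def normalize(s):
    """Collapse runs of whitespace so 'New  York' matches 'New York'."""
    return " ".join(s.split())


GAZ = {normalize(entry) for entry in GAZETTEER}


def gaz_feature(s):
    """GazFeature(str): 1 if str is in the gazetteer, 0 otherwise."""
    return 1 if normalize(s) in GAZ else 0


print(gaz_feature("Llanfairpwllgwyngyll"))  # 1
print(gaz_feature("Townville"))             # 0
```

Note what the feature cannot do: it returns 0 for Townville no matter how location-like its spelling is, because it only memorizes strings.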

PART II: THE TROUBLE WITH GAZETTEER FEATURES

What goes wrong with gazetteer features
1. Overfitting: the gazetteer inhibits learning of spelling and context features from the annotated corpus.
2. Discriminative training doesn't learn spelling information from the gazetteer.

The larger the gazetteer, the more we overfit
Gazetteer: Llanfairpwllgwyngyll, Jacksonville, New York, Allentown
Training corpus:
a statement from Clinton [...]
1. known as [Llanfairpwllgwyngyll]
2. he went to [Jacksonville]
3. She is from [New York]
4. a statement from [Allentown]
As the gazetteer grows, it covers every training location in turn. The gazetteer feature alone then explains those labels, and the context and spelling features are barely trained.

Test corpus: a statement by [Townville] [...] a statement from Clinton [...]
Townville is not in the gazetteer, so the gazetteer feature is silent and the model must fall back on undertrained context and spelling features. It should say Location!, but it says Person!

The Problem
Aren't more observations supposed to help? (Bayes)
So far, we treat the gazetteer as features, not as observations.

Gazetteer Features Ignore Information
Gazetteer: Llanfairpwllgwyngyll, Jacksonville, New York, Allentown
Test corpus: A statement by [Townville] [...] a statement from Clinton [...]
Can we learn spelling from the gazetteer?

Prior Work
We are not the first to notice some of these issues; CRF-specific remedies have been proposed:
Weight undertraining (Sutton et al., 2006)
Logarithmic opinion pools (Smith et al., 2005)
Our solution: model the corpus and the gazetteer jointly.

PART III: GENERATE THE GAZETTEER

Explorer's Gazetteer and Diary
Explorer's gazetteer: Jacksonville, Allentown, Greenville, Georgetown
Explorer's diary:
1. known as Centertown
2. he went to Townville
3. She is from Georgeville
4. a statement from Allentown

Explorer names new places
Each gazetteer entry is spelled out by $P_{\text{spelling}}(\text{name} \mid y_t = \text{loc})$.

Explorer writes about places
Each location mention in the diary is scored by $P_{\text{context}}(y_t = \text{loc} \mid \text{context}) \cdot P_{\text{spelling}}(\text{name} \mid y_t = \text{loc})$.
NOTE: the SAME spelling model generates both types (gazetteer entries) and tokens (mentions in the diary).

[Figure: graphical models comparing the conditional model, which conditions on x, with the proposed model, which models x (gazetteer + corpus). Context model: labels $y_{t-2}, y_{t-1}, y_t$. Spelling model: $y_t$ generates $x_t$.]

We can now generalize from the gazetteer!
Gazetteer: Llanfairpwllgwyngyll, Jacksonville, New York, Allentown
Test corpus: A statement by [Townville]
With gazetteer features: Townville is not in the gazetteer, so the feature never fires.
With a generated gazetteer: the spelling model trained on the gazetteer scores $P_{\text{spelling}}(\text{T, o, w, n, v, i, l, l, e} \mid y_t = \text{location})$, and the model predicts Location!
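The key move on these slides is that gazetteer entries become training data for a per-label spelling model. Below is a minimal sketch using an add-one-smoothed character bigram model for each label, a deliberately simple stand-in for the hierarchical Bayesian spelling model in the talk. The class name and the person list are hypothetical, and the location list merges entries from the slides. Townville appears in neither list, but it shares pieces like "town" and "ville" with the location entries, so the location model scores it higher.

```python
import math
import string
from collections import Counter

BOS, EOS = "<", ">"
ALPHABET = string.ascii_lowercase + " "


class CharBigramSpelling:
    """P_spelling(name | y): add-one-smoothed character bigram model for one label."""

    def __init__(self, names):
        self.vocab_size = len(ALPHABET) + 1  # possible next symbols: chars + EOS
        self.bigrams = Counter()
        self.contexts = Counter()
        for name in names:
            chars = [BOS] + list(name.lower()) + [EOS]
            for a, b in zip(chars, chars[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def log_prob(self, name):
        chars = [BOS] + list(name.lower()) + [EOS]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + self.vocab_size))
            for a, b in zip(chars, chars[1:])
        )


# Location gazetteer entries (types) plus location tokens from the annotated
# corpus: the SAME spelling model is fit to both.
LOC_GAZETTEER = ["Llanfairpwllgwyngyll", "Jacksonville", "New York",
                 "Allentown", "Greenville", "Georgetown"]
LOC_TOKENS = ["Jacksonville"]
# Hypothetical person gazetteer, for comparison only.
PER_GAZETTEER = ["Albert Einstein", "Marie Curie", "Isaac Newton", "Ada Lovelace"]

loc = CharBigramSpelling(LOC_GAZETTEER + LOC_TOKENS)
per = CharBigramSpelling(PER_GAZETTEER)

# Townville is in neither list, yet its spelling looks like a location.
print(loc.log_prob("Townville") > per.log_prob("Townville"))  # True
```

A real model would multiply this spelling score by the context score for each token and normalize over labels; the comparison above isolates the spelling half.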

What about Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch?
Problem: $P_{\text{spelling}}(\text{L, l, a, n, f, a, i, r, p, w, l, l, \ldots} \mid y = \text{loc})$ is tiny.
But gazetteer features handled this case! They recognize specific strings via
$$\mathrm{GazFeature}(\mathit{str}) = \begin{cases} 1 & \text{if } \mathit{str} \in \mathrm{GAZ} \\ 0 & \text{otherwise} \end{cases}$$
Even a weirdly spelled name is a location if it's in the gazetteer!
Can we account for this in a generative model?

Solution: Stochastic Memoization
Gazetteer: Llanfairpwllgwyngyll, Jacksonville, New York, Allentown
With probability $\lambda$: sample an existing word from the gazetteer, e.g. Llanfairpwllgwyngyll.
With probability $1 - \lambda$: spell a new word character by character, e.g. Townville.
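The two branches above can be sketched as a sampler plus the mixture probability they induce. This is a minimal sketch under simplifying assumptions: the reuse probability `lam` is fixed, the cache distribution is the empirical distribution over gazetteer entries, and the spelling model is a toy (each letter uniform over 26 symbols, case ignored, stopping with probability 0.1 after each character). All function names here are hypothetical; in the talk's model these distributions come from hierarchical Bayesian processes.

```python
import random
import string

ALPHABET = string.ascii_lowercase
STOP = 0.1  # toy spelling model: stop after each character with this probability


def toy_spelling_prob(word):
    """Toy P_spelling: each letter (case ignored) uniform over ALPHABET, length >= 1."""
    n = len(word)
    return (1 / len(ALPHABET)) ** n * (1 - STOP) ** (n - 1) * STOP


def toy_spell_new_word(rng):
    """Sample a new lowercase word character by character."""
    chars = []
    while True:
        chars.append(rng.choice(ALPHABET))
        if rng.random() < STOP:
            return "".join(chars)


def memoized_prob(word, gazetteer, lam, spelling_prob):
    """lam * P_cache(word) + (1 - lam) * P_spelling(word)."""
    p_cache = gazetteer.count(word) / len(gazetteer)  # empirical cache distribution
    return lam * p_cache + (1 - lam) * spelling_prob(word)


def sample_word(gazetteer, lam, spell_new_word, rng):
    """With probability lam reuse a gazetteer entry, else spell a new word."""
    if rng.random() < lam:
        return rng.choice(gazetteer)
    return spell_new_word(rng)


GAZ = ["Llanfairpwllgwyngyll", "Jacksonville", "New York", "Allentown"]
print(memoized_prob("Llanfairpwllgwyngyll", GAZ, 0.5, toy_spelling_prob))  # 0.125
print(toy_spelling_prob("Llanfairpwllgwyngyll"))  # ~7e-31
```

Under the toy spelling model alone, Llanfairpwllgwyngyll is vanishingly unlikely, but the cache branch gives it about $\lambda / 4$ because it is one of four gazetteer entries: the generative counterpart of the gazetteer feature.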

$$P(x = \text{word} \mid y = \text{label}) = \lambda \, P_{\text{cache}}(\text{word}) + (1 - \lambda) \, P_{\text{spelling}}(x = \text{w, o, r, d} \mid y = \text{label})$$
where $\lambda$ is the probability of reusing an existing gazetteer entry.

Summary & Trade-offs

Condition on the gazetteer: fewer independence assumptions; gazetteer features may overfit; the gazetteer is not modeled, so annotated data is needed to learn spelling.
Generate the gazetteer: more independence assumptions; the gazetteer is data, so no overfitting; spelling is learned from gazetteers, with no need for supervised data.

PART IV: EXPERIMENTS
LOW-RESOURCE NAMED-ENTITY RECOGNITION + PART-OF-SPEECH INDUCTION

Experiment 1: Low-Resource NER

Language: Turkish
Baseline: CRF with gazetteer features
We vary:
Supervision: 1 to 500 labeled sentences
Gazetteer size: 10, 100, or 1000 entries for each type (person, location, organization, other)

[Figure: F1 of the model minus F1 of the baseline, plotted against the number of labeled sentences used for training.]
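The plotted quantity is a difference of F1 scores. As a sketch, assuming the usual exact-match, entity-level F1 for NER (the slides do not say which scorer was used), the difference for one test set could be computed as below. The function name and the toy spans are hypothetical and illustrative only, not results from the experiments.

```python
def span_f1(gold, pred):
    """Exact-match entity F1 over (start, end, label) spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy gold and predicted spans (illustrative only).
gold = {(3, 4, "per"), (7, 8, "loc")}
model = {(3, 4, "per"), (7, 8, "loc")}
baseline = {(3, 4, "per"), (7, 8, "per")}  # mislabels the location as a person

print(span_f1(gold, model) - span_f1(gold, baseline))  # 0.5
```

A positive difference means the model beats the baseline at that amount of supervision.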

Experiment 2: Part-of-Speech Induction
Gazetteer: Wiktionary entries, an (incomplete) dictionary of words and their parts of speech
Baseline: HMM trained with EM (Li et al., 2012), using the dictionary as constraints on the possible parts of speech for each word type
Data: CoNLL-X and CoNLL 2007 languages

Concluding Remarks

Key ideas / take-aways
Discriminative training has intrinsic limitations when incorporating gazetteers or other lexical knowledge.
Solution: use a generative model and treat gazetteer entries as ordinary observations.
Pick your favorite rich generative model:
Low-resource (this paper): Bayesian backoff via Pitman-Yor processes
High-resource: LSTM language model + LSTM spelling model
Experiments with more languages are in the paper.
Code: https://github.com/noa/bayesner
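Pitman-Yor processes provide the Bayesian backoff mentioned above: a word's probability interpolates between its discounted count and a base distribution (for example, a spelling model), which is the stochastic memoization idea with the mixing weight set by the counts. Below is a minimal sketch of the Chinese-restaurant predictive rule, simplified by assuming exactly one table per observed type. The function name and default hyperparameters are hypothetical; this is not the bayesner implementation.

```python
from collections import Counter


def py_predictive(word, counts, base_prob, discount=0.5, concentration=1.0):
    """Pitman-Yor predictive probability, assuming one table per observed type.

    counts: Counter of observed words (e.g. gazetteer entries and corpus
            mentions for one label).
    base_prob: the backoff distribution (e.g. a spelling model).
    """
    n = sum(counts.values())
    num_tables = sum(1 for c in counts.values() if c > 0)
    c = counts[word]
    reuse = (c - discount) / (concentration + n) if c > 0 else 0.0
    backoff = (concentration + discount * num_tables) / (concentration + n)
    return reuse + backoff * base_prob(word)


counts = Counter({"Allentown": 2, "Jacksonville": 1})
vocab = ["Allentown", "Jacksonville", "Townville", "Greenville"]


def uniform(word):
    """Toy base distribution: uniform over a 4-word vocabulary."""
    return 1 / len(vocab)


print(py_predictive("Allentown", counts, uniform))             # 0.5
print(sum(py_predictive(w, counts, uniform) for w in vocab))   # 1.0
```

Allentown, seen twice, gets most of its mass from its count, while unseen names like Townville get probability only through the base distribution; a richer base, such as a spelling model, is what lets unseen but location-like names score well.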

Generate your Gazetteer!
Explorer's gazetteer: Llanfairpwllgwyngyll, Allentown, Greenville, Georgetown
Explorer's diary:
1. known as Llanfairpwllgwyngyll
2. he went to Townville
3. She is from Georgeville
4. a statement from Allentown
