# Lecture 8: Learning to Search Imitation Learning in

Lecture 8: Learning to Search Imitation Learning in NLP Kai-Wei Chang CS @ UCLA [email protected] Couse webpage: https://uclanlp.github.io/CS269-17/ ML in NLP 1 Learning to search approaches Shift-Reduce parser Maintain a buffer and a stack Make predictions from left to right Three (four) types of actions: Shift, Reduce, Left, Right

Credit: Google research blog Kai-Wei Chang (University of Virginia) 2 Structured Prediction as a Search problem Decomposition of y often implies an ordering a sequential decision making process I Pro can can a can Md

Dt Nn Vb Kai-Wei Chang 3 Notations Input: Truth: Predicted: Loss: I can can a can

Pro Md Vb Dt Nn Pro Pro Pro Pro Md Md Md Md Md Md Nn Nn Dt Dt Dt

Dt Vb Nn Md Vb Goal: make joint prediction to minimize a joint loss find such that minimizing based on samples Kai-Wei Chang ( MSR -> U of Virginia) 4 Credit Assignment Problem When making a mistake, which local decision should be blamed? sentence

Kai-Wei Chang (University of Virginia) 5 An Analogy from Playing Mario From Mario AI competition 2009 Output: Jump in {0,1} Right in {0,1} Left in {0,1} Speed in {0,1} High level goal: Extracted 27K+ binary features from last 4 observations Watch an expert play and (14 binary features for every cell)

learn to mimic her behavior Video credit: Stphane Ross, Geoff Gordon and Drew Bagnell Input: Example of Search Space I can Pro Md can a I

can can can decision Pro Md Vb Dt Nn decision action Pro Md Vb Dt Nn decision action Pro Md Vb Dt Nn

action Kai-Wei Chang 7 Example of Search Space I can can a can Pro Md

Vb Dt Vb Encodes an output = (e) from which loss(y, ) can be computed (at training time) I can can a can Pro Md Vb Dt Nn

e end Kai-Wei Chang 8 Policies A policy maps observations to actions p( obs. input: x timestep: t partial traj:

anything else Kai-Wei Chang ) =a 9 Labeled data Reference policy Given partial traj. and true label , the minimum achievable loss is 1 2 . 1 ( ,+1 , )

Kai-Wei Chang loss( y , ( a )) e 10 Labeled data Reference policy Given partial traj. and true label , the minimum achievable loss is The optimal action is the corresponding The optimal policy is the policy that always selects the optimal action Reference policy can be constructed by the gold label in the training phase Kai-Wei Chang 11 Ingredients for learning to search

Structured Learning Learning to Search Input: Truth: Outputs: Loss: Search Space: - state: - action: - end state Policies: Reference policy: Kai-Wei Chang 12

A Simple Approach Collect trajectories from expert Store as dataset Train classifier on Let play the game! Kai-Wei Chang 13 Learning a Policy[Chang+ 15, Ross+15] At ? state, we construct a cost-sensitive multi-class example (?, [0, .2, .8]) E loss=0 E loss=.2 ? E loss=.8

rollout rollin one-step deviations Kai-Wei Chang 14 Example: Sequence Labeling x = Receive input: the monster ate the sandwich y = Dt Nn Vb Dt Nn

Make a sequence of predictions: x = the monster ate the sandwich = Dt Dt Dt Dt Dt Pick a timestep and try all perturbations there: x = Dt = Nn = Vb = the monster ate the sandwich Dt Dt Vb Dt Nn Dt

Nn Vb Dt Nn Dt Vb Vb Dt Nn Compute losses and construct example: ( { w=monster, p=Dt, }, [1,0,1] ) l=1 l=0 l=1 Learning a Policy[Chang+ 15, Ross+15] At ? state, we construct a cost-sensitive multi-class example (?, [0, .2, .8])

E loss=0 E loss=.2 ? E loss=.8 rollout rollin one-step deviations Kai-Wei Chang 16 Analysis [ICML 15]: Learning to search better than your teacher

17 Analysis Roll-in with Ref: unbounded structured regret 18 Analysis Roll-out with Ref: no local optimal if reference is sub-optimal 19 Analysis

Roll-in & Roll-out with current policy ignore Ref 20 Analysis Minimizes a combination of regret to Ref and regret to its own one-step deviations. Competes with Ref when Ref is good. Competes with local deviations to improve on suboptimal Ref 21 How to Program? Sample codes are available in VW CS XXX Lecture 1

22

## Recently Viewed Presentations

• Parkin y Esquivel (1999) Adam Smith (1723-1790) La riqueza de las Naciones (1776) Macroeconomía: es el estudio de la economía nacional y de la economía global. Busca explicar los precios promedio y el empleo, el ingreso y la producción totales....
• Part Four - The World Wide Web and Windows XP Objectives Start your browser. Change browser options. Navigate Web pages. Display a Web page on your desktop. Manage Web pages. Download Web files. Use e-mail.
• The TR/LM schema is a representation of this very basic relation. It has two roles: --The TR - the object being located -- the LM - the object whose location is already known, I.e. the reference object These terms are...
• Design of Beams, columns and Foundation. Design of shear and retaining walls. ... Loads: Due to surcharge - 0.363. kip/ft2 ( Acting Downward) Active earth pressure - 2.4. ... Design of Beams,Columns and Footings
• cadre: In Asian Communist systems, party members serving as officials. France (1 of 2) During the seventeenth and eighteenth centuries, France set the pattern for the rest of Europe with its heavily bureaucratized state.
• Highest Common Factor HCF. Take two numbers. What's the biggest number that will go into both? 2 goes into 8 four times 8 ÷ 2 = 4. 2 goes into 10 five times 10 ÷ 2 = 5
• An Introduction to Operating Systems Definition An Operating System, or OS, is low-level software that enables a user and higher-level application software to interact with a computer's hardware and the data and other programs stored on the computer.
• Median Search Time in Seconds In top 20 results Not in top 20 results Category is 36% faster Category is 48% faster Source: Chen & Dumais \$ 1,000,000 Call response costs per month \$ 3,600,000 Service costs savings per year...