CSE 582 - Compilers - University of Washington

CSE 582 - Compilers - University of Washington

CSE P501 Compiler Construction Compiler Backend Organization Instruction Selection Spring 2014 Instruction Selection Instruction Scheduling Registers Allocation Peephole Optimization Peephole Instruction Selection Jim Hogg - UW - CSE - P501 N-1 A Compiler Source Front End Middle End chars Back End IR IR Select Instructions

Scan tokens Parse IR Optimize Allocate Registers IR AST Emit Convert IR AST = Abstract Syntax Tree IR = Intermediate Representation Spring 2014 Target IR Machine Code Instruction Selection: processors don't support IR; need to convert IR into real machine code Jim Hogg - UW - CSE - P501 N-2 The Big Picture Compiler = lots of fast stuff, followed by hard problems Scanner: O(n)

Parser: O(n) Analysis & Optimization: ~ O(n log n) Instruction selection: fast or NP-Complete Instruction scheduling: NP-Complete Register allocation: NP-Complete Recall: approach in P501 to describing backend is 'survey' level A deeper dive would require a further full, 10-week course Spring 2014 Jim Hogg - UW - CSE - P501 N-3 Compiler Backend: 3 Parts Select Instructions Schedule Instructions eg: Best x86 instruction to implement: vr17 = 12(arp) ? eg: Given instructions a,b,c,d,e,f,g would the program run faster if they were issued in a different order, such as d,c,a,b,g,e,f ? Allocate Registers eg: which variables to store in registers? which to store in memory? Spring 2014 Jim Hogg - UW - CSE - P501 N-4

Instruction Selection is . . . Mid-Level IR TAC - Three Address Code t1 a op b Tree or Linear supply of temps Array address calcs. explicit Optimizations all done Storage locations decided Spring 2014 Select Real Instructions Low-Level IR Specific to one chip/ISA Use chip addressing modes But still not decided on registers Jim Hogg - UW - CSE - P501 N-5 Instruction Scheduling is . . . a b c d e f g h Schedule Execute in-order to

get correct answer Originally devised for super-computers Now used everywhere: in-order procs - Atom, older ARM out-of-order procs - newer x86 Compiler does 'heavy lifting' - reduce chip power Spring 2014 b a d f c g h f Issue in new order eg: memory fetch is slow eg: divide is slow Overall faster Still get correct answer! Jim Hogg - UW - CSE - P501 N-6 Register Allocation is . . . Allocate Virtual registers or temps supply R1 = = R1 push R1 R1 = = R1 pop R1

Real machine registers eg: EAX, EBP Very finite supply! Enregister/Spill After allocating registers, may do a second pass of scheduling to improve speed of spill code Spring 2014 Jim Hogg - UW - CSE - P501 N-7 Instruction Selection Instruction selection chooses which target instructions (eg: for x86, for ARM) to use for each IR instruction Selecting the best instructions is massively difficult because modern ISAs provide: huge choice of instructions wide choice of addressing modes Eg: Intel's x64 Instruction Set Reference Manual = 1422 pages Spring 2014 Jim Hogg - UW - CSE - P501 N-8 Choices, choices ...

Most chip ISAs provide many ways to do the same thing: eg: to set eax to 0 on x86 has several alternatives: mov eax, 0 xor eax, eax sub eax, eax imul eax, 0 Many machine instructions do several things at once eg: register arithmetic and effective address calculation. Recall: lea rdst, [rbase + rindex*scale + offset] Spring 2014 Jim Hogg - UW - CSE - P501 N-9 Overview: Instruction Selection Map IR into near-assembly code Assume known storage layout and code shape For MiniJava, we emitted textual assembly code Commercial compilers emit binary code directly ie: optimization phases have already done their thing Combine low-level IR operations into machine

instructions (take advantage of addressing modes, etc) Spring 2014 Jim Hogg - UW - CSE - P501 N-10 Criteria for Instruction Selection Several possibilities Fastest Smallest Minimal power (eg: dont use a function-unit if leaving it powered-down is a win) Sometimes not obvious eg: if one of the function-units in the processor is idle and we can select an instruction that uses that unit, it effectively executes for free, even if that instruction wouldnt be chosen normally (Some interaction with scheduling here) Spring 2014 Jim Hogg - UW - CSE - P501 N-11 Instruction Selection: Approaches Two main techniques:

1. Tree-based matches (eg: maximul munch algorithm) 2. Peephole-based generation Note that we select instructions from the target ISA. We do not decide on which physical registers to use. That comes later during Register Allocation We have a few generic registers: ARP, StackPointer, ResultReg Spring 2014 Jim Hogg - UW - CSE - P501 N-12 Tree-Based Instruction Selection e f How to generate target code for this simple tree? We could use a template approach - similar to converting an AST into IR: ID ID case ID: t1 = offset(node) t2 = base(node) reg = nextReg() emit(loadA0, t1, t2, res) Resulting codegen is correct, but dumb Doesn't even cover: call-by-val, call-by-ref, enregistered, different data type, etc Spring 2014 Jim Hogg - UW - CSE - P501

N-13 Template Code Generation e f ID ID Naive loadI loadAO loadI loadAO mult 4 rarp, r5=> 8 => rarp, r7=> r6, r8 => Spring 2014 case ID: t1 = offset(node) t2 = base(node) reg = nextReg() emit(loadA0, t1, t2, res) Ideal => r5 r6 r7 r8 r9 loadAI rarp, 4 => r5 loadAI rarp, 8 =>

r6 mult r5, r6 => r7 Jim Hogg - UW - CSE - P501 N-14 IR (3-address Code) Tree-lets Rules for Tree-to-Target Conversion (prefix notation) Production ILOC Template 5 ... 6 Reg Lab 8 9 load l => rn Reg Num Reg Reg1 load r1 => rn 10 Reg + Reg1 Reg2 loadA0 r1, r2 => rn 11 Reg + Reg1 Num2 loadAI r1, n2 => rn 14 Reg + Lab1 Reg2 loadAI r2, l1 => rn 15 Reg + Reg1 Reg2 add 16 Reg + Reg1 Num2

19 Reg + Lab1 Reg2 20 ... r1, r2 => rn addI addI r1, 22 => rn + Reg1 Num2 r2, l1 => rn + Reg1 Num2 Memory Dereference Spring 2014 11 Jim Hogg - UW - CSE - P501 N-15 Example: Tiling a tiny 4-node tree Load variable + Lab @G Num 12 The tree above shows code to access variable, c, stored at offset 12 bytes from label G How many ways can we tile this tree into equivalent ILOC code? Spring 2014

Jim Hogg - UW - CSE - P501 N-16 Potential Matches 11 14 + 6 + Num 12 Lab @G <6,11> loadI @G => ri loadAI ri 12 => rj 9 16 Lab @G 8 <8,14> loadI l2 => ri loadAI ri @G => rj 19 +

<6,16,9> Lab @G 10 + 8 + Lab @G 8 Num 12 <6,8,10> loadI @G => ri loadI l2 => rj loadAO ri rj => rk loadI l2 => ri loadI @G => rj loadAO ri rj => rk 9 9 8 6

Num 12 Lab @G <8,19,9> 6 Num 12 Lab @G 15 + Num 12 6 Num 12 Lab @G 9 6 10 <8,6,10> 15

+ 8 6 Num 12 Lab @G <6,8,15,9> + 8 Num 12 <8,6,15,9> N-17 Tree-Pattern Matching A given tiling implements a tree if: covers every node in the tree, and overlap between any two tiles (trees) is limited to a single node If is in the tiling, then node is also covered by a leaf in another operation tree in the tiling unless it is the root Where two operation trees meet, they must be compatible (ie: expect the same value in the same location) Spring 2014

Jim Hogg - UW - CSE - P501 N-18 IR AST for Simple Expression = - + Val arp a=b-2 c Num 4 Num 2 + Val arp a local var, offset 4 from arp b call-by-ref param c var, offset 12 from label @G + Num -16 Val @G Num

12 Prefix form - same info as IR tree: = + Val1 Num1 - + Val2 Num2 Num3 + Lab1 Num4 Spring 2014 Jim Hogg - UW - CSE - P501 N-19 IR AST as Prefix Text String = + Val1 Num1 - + Val2 Num2 Num3 + Lab1 Num4 - don't need them: evaluate this expression from rightNo parentheses? to-left, using simple stack machine Rewrite Rules - Another BNF, but including ambiguity! 5 6 Production ... Reg Lab1 ILOC Template 7 Reg Val1 8 Reg Num1 loadl n1 => rn 9 Reg Reg1 load r1

load l1 => rn => rn 10 Reg + Reg1 Reg2 loadA0 r1, r2 => rn 11 Reg + Reg1 Num2 loadAI r1, n2 => rn 14 Reg + Lab1 Reg2 loadAI r2, l1 => rn 15 Reg + Reg1 Reg2 add r1, r2 => rn 16 Reg + Reg1 Num2 addI r1, 22 => rn 19 Reg + Lab1 Reg2 addI r2, l1 => rn 20 ... Spring 2014 Jim Hogg - UW - CSE - P501 N-20 Select Instructions with LR Parser Production

Template 5 ... 6 Reg Lab1 7 Reg Val1 8 Reg Num1 9 Reg Reg1 = + Val1 Num1 - + Val2 Num2 Num3 + Lab1 Num4 ILOC load l1 loadl load r1 => rn n1 => rn => rn 10 Reg + Reg1 Reg2 loadA0 r1, r2 => rn 11 Reg + Reg1 Num2 loadAI r1, n2 => rn 14 Reg + Lab1 Reg2 loadAI r2, l1 => rn

15 Reg + Reg1 Reg2 add r1, r2 => rn 16 Reg + Reg1 Num2 addI r1, 22 => rn 19 Reg + Lab1 Reg2 addI r2, l1 => rn 20 ... Use LR parser for Grammar to parse prefix expression Ambiguous, so lots of parse-action conflicts: resolve with tiebreaker rules: Lowest cost (need to augment Grammar productions with cost) Favor large reductions over smaller ("maximal munch") reduce-reduce conflict - choose longer reduction shift-reduce conflict - choose shift => largest number of prefix ops translated into machine instruction Spring 2014 Jim Hogg - UW - CSE - P501 N-21 Peephole Optimization Originally devised as the last optimization pass of a compiler Examine a small, sliding window of target code (a few adjacent instructions) and optimize Reminder Cooper&Torczon "ILOC" => rarp, 8

Original storeAI r1 loadAI rarp, 8 => r15 Optimized storeAI r1 => rarp, 8 i2i r1 => r15 Spring 2014 Memory[rarp + 8] = r1 r15 = Memory[rarp + 8] Memory[rarp + 8] = r1 r15 = r1 Jim Hogg - UW - CSE - P501 N-22 Peeps, 2 Reminder addI r2, 0 => r7 mult r4, r7 => r10 r7 = r2 + 0 r10 = r4 * r7 Optimized mult r4, r2 => r10 r10 = r4 * r2 Original Spring 2014 Original jumpI L10 L10: jumpI L20 Optimized jumpI L20 L10: jumpI L20 Jim Hogg - UW - CSE - P501 N-23

Modern Peephole Optimizer Modern ISAs are enormous Linear search for a match no longer fast enough So . . . Expander IR Simplifier LIR Matcher LIR LIR Like a miniature compiler Use for both peeps and instruction-selection Spring 2014 Jim Hogg - UW - CSE - P501 N-24 Instruction Selection via Peeps IR LIR t1 = 2 c a = b - t1 r10 = 2 r11 = @G r12 = 12 r13 = r11 + r12 r14 = M[r13] r15 = r10 * r14 r16 = -16 r17 = rarp + r16 r18 = M[r17]

r19 = M[r18] r20 = r19 - r15 r21 = 4 14 r22 = r arp + rInstructions 21 M[r22] = r20 Spring 2014 LIR Simplified LIR r10 = 2 r11 = @G r14 = M(r11+r12) r15 = r10 * r14 r18 = M(rarp16) r19 = M(r18) r20 = r19 - r15 8 Instructions M(r arp+4) = r20 a 4(arp) b -16(arp) parameter c 12(@G) loadI loadI loadAI mult loadAI load sub storeAI rarp,4 2 @G r11, 12 r10, r14

rarp, -16 r18 r19, r15 r20 => => => => => => => => r10 r11 r14 r15 r18 r19 r20 Local Variable Call-by-reference Variable Jim Hogg - UW - CSE - P501 N-25 The Simplifier r10 = 2 r11 = @G r12 = 12 r11 = @G r12 = 12 r13 = r11 + r12 r11 = @G r13 = r11 + 12 r14 = M[r13] r14 = M[r11+12] r15 = r10 * r14 r16 = -16

r15 r16 r17 r16 r19 r20 r21 = r10 * r14 = -16 = rarp + r15 = r10 * r14 r17 = rarp - 16 r18 = M[r17] = M[r18] = r19 - r15 = 4 r20 = r19 - r15 r21 = 4 r22 = rarp + r21 r18 = M[rarp-16] r19 = M[r18] r20 = r19 - r15 r20 = r19 - r15 M[r22] = r20 Roll r11 = r14 = 12] r15 = r15 = r18 = r19 = @G M[r11 + r10 * r14 r10 * r14

M[rarp-16] M[r18] r20 = r19 - r15 r22 = rarp + 4 M[r22] = r20 Const Prop + DCE This example is simplified - never re-uses a virtual register: so we can delete the instruction after a constant propagation. In general, we would have to generate liveness info for each virtual register to perform safe DCE Spring 2014 Jim Hogg - UW - CSE - P501 N-26 Next Instruction Scheduling Register Allocation And more. Spring 2014 Jim Hogg - UW - CSE - P501 N-27

Recently Viewed Presentations

  • The Depression - Ms. Andres&#x27; Class

    The Depression - Ms. Andres' Class

    The government and those around them began to profit by making them a significant tourist attraction in Ontario. Ontario Premier Mitchell Hepburn with the Dionne babies ca. 1934 Canadians saving lives: The discovery of insulin Sir Frederick Banting was a...
  • Completed by: Dawn Cooper EDUC 524 Spring 2012

    Completed by: Dawn Cooper EDUC 524 Spring 2012

    Completed by: Dawn Cooper EDUC 524 Spring 2012
  • Continuous Quality Improvement: Executive Orientation

    Continuous Quality Improvement: Executive Orientation

    Catalog features/functions of each other potential products. Current Product Funding. Document cost of each product. Explore how each product is funded for each user group. Explore how much staff task time is required to complete workflow (e.g. budget-based workload analysis,...
  • Newsletter 4 - Ohio State University

    Newsletter 4 - Ohio State University

    Ohio Military Kids is a partnership between Ohio State Extension 4-H Youth Development and Family Readiness with Ohio National Guard. Through this partnership, Ohio Military Kids puts on events all around the state of Ohio. These events range from one...
  • Lecture 6 - Florida State University

    Lecture 6 - Florida State University

    Deasserted: The register file destination number for the Write registercomes from the rtfield. ... The jump target address (IR[25-0] shifted left 2 bits and concatenated with PC + 4[31-28]) is sent to thePCfor writing. Multi-cycle datapath and control.
  • CREDIT BASICS AND CREDIT SCORES Personal Finance Mrs.

    CREDIT BASICS AND CREDIT SCORES Personal Finance Mrs.

    Mrs. Bullock. Your Present Self Impacts Your Future Self. Credit availability depends on if lenders trust you will pay back the loan as agreed. ... _____credit report annually from each of the three credit reporting agencies. If _____ credit, the...
  • Ecozones of Canada

    Ecozones of Canada

    Écozones du Canada Écozones marines Écozones terrestres Écozones Marine Pacifique Archipel arctique Bassin arctique Atlantique Nord-Ouest Atlantique Terrestres Cordillère arctique Haut-arctique Bas-arctique Taïga des plaines Taïga du bouclier Taïga de la cordillère Plaines hudsoniennes Plaines boréales Bouclier boréale Cordillère boréale...
  • BCs professional Annual General association Meeting for registered

    BCs professional Annual General association Meeting for registered

    College Liaison Committee . The DTABC and our College are working in partnership with the Denturists and the Dental Hygienists to submit a joint proposal to government for an expanded Scope of Practice for all three professions.