CSE P501 Compiler Construction Compiler Backend Organization Instruction Selection Spring 2014 Instruction Selection Instruction Scheduling Registers Allocation Peephole Optimization Peephole Instruction Selection Jim Hogg - UW - CSE - P501 N-1 A Compiler Source Front End Middle End chars Back End IR IR Select Instructions
Scan tokens Parse IR Optimize Allocate Registers IR AST Emit Convert IR AST = Abstract Syntax Tree IR = Intermediate Representation Spring 2014 Target IR Machine Code Instruction Selection: processors don't support IR; need to convert IR into real machine code Jim Hogg - UW - CSE - P501 N-2 The Big Picture Compiler = lots of fast stuff, followed by hard problems Scanner: O(n)
Parser: O(n) Analysis & Optimization: ~ O(n log n) Instruction selection: fast or NP-Complete Instruction scheduling: NP-Complete Register allocation: NP-Complete Recall: approach in P501 to describing backend is 'survey' level A deeper dive would require a further full, 10-week course Spring 2014 Jim Hogg - UW - CSE - P501 N-3 Compiler Backend: 3 Parts Select Instructions Schedule Instructions eg: Best x86 instruction to implement: vr17 = 12(arp) ? eg: Given instructions a,b,c,d,e,f,g would the program run faster if they were issued in a different order, such as d,c,a,b,g,e,f ? Allocate Registers eg: which variables to store in registers? which to store in memory? Spring 2014 Jim Hogg - UW - CSE - P501 N-4
Instruction Selection is . . . Mid-Level IR TAC - Three Address Code t1 a op b Tree or Linear supply of temps Array address calcs. explicit Optimizations all done Storage locations decided Spring 2014 Select Real Instructions Low-Level IR Specific to one chip/ISA Use chip addressing modes But still not decided on registers Jim Hogg - UW - CSE - P501 N-5 Instruction Scheduling is . . . a b c d e f g h Schedule Execute in-order to
get correct answer Originally devised for super-computers Now used everywhere: in-order procs - Atom, older ARM out-of-order procs - newer x86 Compiler does 'heavy lifting' - reduce chip power Spring 2014 b a d f c g h f Issue in new order eg: memory fetch is slow eg: divide is slow Overall faster Still get correct answer! Jim Hogg - UW - CSE - P501 N-6 Register Allocation is . . . Allocate Virtual registers or temps supply R1 = = R1 push R1 R1 = = R1 pop R1
Real machine registers eg: EAX, EBP Very finite supply! Enregister/Spill After allocating registers, may do a second pass of scheduling to improve speed of spill code Spring 2014 Jim Hogg - UW - CSE - P501 N-7 Instruction Selection Instruction selection chooses which target instructions (eg: for x86, for ARM) to use for each IR instruction Selecting the best instructions is massively difficult because modern ISAs provide: huge choice of instructions wide choice of addressing modes Eg: Intel's x64 Instruction Set Reference Manual = 1422 pages Spring 2014 Jim Hogg - UW - CSE - P501 N-8 Choices, choices ...
Most chip ISAs provide many ways to do the same thing: eg: to set eax to 0 on x86 has several alternatives: mov eax, 0 xor eax, eax sub eax, eax imul eax, 0 Many machine instructions do several things at once eg: register arithmetic and effective address calculation. Recall: lea rdst, [rbase + rindex*scale + offset] Spring 2014 Jim Hogg - UW - CSE - P501 N-9 Overview: Instruction Selection Map IR into near-assembly code Assume known storage layout and code shape For MiniJava, we emitted textual assembly code Commercial compilers emit binary code directly ie: optimization phases have already done their thing Combine low-level IR operations into machine
instructions (take advantage of addressing modes, etc) Spring 2014 Jim Hogg - UW - CSE - P501 N-10 Criteria for Instruction Selection Several possibilities Fastest Smallest Minimal power (eg: dont use a function-unit if leaving it powered-down is a win) Sometimes not obvious eg: if one of the function-units in the processor is idle and we can select an instruction that uses that unit, it effectively executes for free, even if that instruction wouldnt be chosen normally (Some interaction with scheduling here) Spring 2014 Jim Hogg - UW - CSE - P501 N-11 Instruction Selection: Approaches Two main techniques:
1. Tree-based matches (eg: maximul munch algorithm) 2. Peephole-based generation Note that we select instructions from the target ISA. We do not decide on which physical registers to use. That comes later during Register Allocation We have a few generic registers: ARP, StackPointer, ResultReg Spring 2014 Jim Hogg - UW - CSE - P501 N-12 Tree-Based Instruction Selection e f How to generate target code for this simple tree? We could use a template approach - similar to converting an AST into IR: ID ID case ID: t1 = offset(node) t2 = base(node) reg = nextReg() emit(loadA0, t1, t2, res) Resulting codegen is correct, but dumb Doesn't even cover: call-by-val, call-by-ref, enregistered, different data type, etc Spring 2014 Jim Hogg - UW - CSE - P501
N-13 Template Code Generation e f ID ID Naive loadI loadAO loadI loadAO mult 4 rarp, r5=> 8 => rarp, r7=> r6, r8 => Spring 2014 case ID: t1 = offset(node) t2 = base(node) reg = nextReg() emit(loadA0, t1, t2, res) Ideal => r5 r6 r7 r8 r9 loadAI rarp, 4 => r5 loadAI rarp, 8 =>
19 Reg + Lab1 Reg2 20 ... r1, r2 => rn addI addI r1, 22 => rn + Reg1 Num2 r2, l1 => rn + Reg1 Num2 Memory Dereference Spring 2014 11 Jim Hogg - UW - CSE - P501 N-15 Example: Tiling a tiny 4-node tree Load variable + Lab @G Num 12 The tree above shows code to access variable, c, stored at offset 12 bytes from label G How many ways can we tile this tree into equivalent ILOC code? Spring 2014
Jim Hogg - UW - CSE - P501 N-16 Potential Matches 11 14 + 6 + Num 12 Lab @G <6,11> loadI @G => ri loadAI ri 12 => rj 9 16 Lab @G 8 <8,14> loadI l2 => ri loadAI ri @G => rj 19 +
<6,16,9> Lab @G 10 + 8 + Lab @G 8 Num 12 <6,8,10> loadI @G => ri loadI l2 => rj loadAO ri rj => rk loadI l2 => ri loadI @G => rj loadAO ri rj => rk 9 9 8 6
Num 12 Lab @G <8,19,9> 6 Num 12 Lab @G 15 + Num 12 6 Num 12 Lab @G 9 6 10 <8,6,10> 15
+ 8 6 Num 12 Lab @G <6,8,15,9> + 8 Num 12 <8,6,15,9> N-17 Tree-Pattern Matching A given tiling implements a tree if: covers every node in the tree, and overlap between any two tiles (trees) is limited to a single node If is in the tiling, then node is also covered by a leaf in another operation tree in the tiling unless it is the root Where two operation trees meet, they must be compatible (ie: expect the same value in the same location) Spring 2014
Jim Hogg - UW - CSE - P501 N-18 IR AST for Simple Expression = - + Val arp a=b-2 c Num 4 Num 2 + Val arp a local var, offset 4 from arp b call-by-ref param c var, offset 12 from label @G + Num -16 Val @G Num
12 Prefix form - same info as IR tree: = + Val1 Num1 - + Val2 Num2 Num3 + Lab1 Num4 Spring 2014 Jim Hogg - UW - CSE - P501 N-19 IR AST as Prefix Text String = + Val1 Num1 - + Val2 Num2 Num3 + Lab1 Num4 - don't need them: evaluate this expression from rightNo parentheses? to-left, using simple stack machine Rewrite Rules - Another BNF, but including ambiguity! 5 6 Production ... Reg Lab1 ILOC Template 7 Reg Val1 8 Reg Num1 loadl n1 => rn 9 Reg Reg1 load r1
15 Reg + Reg1 Reg2 add r1, r2 => rn 16 Reg + Reg1 Num2 addI r1, 22 => rn 19 Reg + Lab1 Reg2 addI r2, l1 => rn 20 ... Use LR parser for Grammar to parse prefix expression Ambiguous, so lots of parse-action conflicts: resolve with tiebreaker rules: Lowest cost (need to augment Grammar productions with cost) Favor large reductions over smaller ("maximal munch") reduce-reduce conflict - choose longer reduction shift-reduce conflict - choose shift => largest number of prefix ops translated into machine instruction Spring 2014 Jim Hogg - UW - CSE - P501 N-21 Peephole Optimization Originally devised as the last optimization pass of a compiler Examine a small, sliding window of target code (a few adjacent instructions) and optimize Reminder Cooper&Torczon "ILOC" => rarp, 8
Modern Peephole Optimizer Modern ISAs are enormous Linear search for a match no longer fast enough So . . . Expander IR Simplifier LIR Matcher LIR LIR Like a miniature compiler Use for both peeps and instruction-selection Spring 2014 Jim Hogg - UW - CSE - P501 N-24 Instruction Selection via Peeps IR LIR t1 = 2 c a = b - t1 r10 = 2 r11 = @G r12 = 12 r13 = r11 + r12 r14 = M[r13] r15 = r10 * r14 r16 = -16 r17 = rarp + r16 r18 = M[r17]
M[rarp-16] M[r18] r20 = r19 - r15 r22 = rarp + 4 M[r22] = r20 Const Prop + DCE This example is simplified - never re-uses a virtual register: so we can delete the instruction after a constant propagation. In general, we would have to generate liveness info for each virtual register to perform safe DCE Spring 2014 Jim Hogg - UW - CSE - P501 N-26 Next Instruction Scheduling Register Allocation And more. Spring 2014 Jim Hogg - UW - CSE - P501 N-27
The government and those around them began to profit by making them a significant tourist attraction in Ontario. Ontario Premier Mitchell Hepburn with the Dionne babies ca. 1934 Canadians saving lives: The discovery of insulin Sir Frederick Banting was a...
Catalog features/functions of each other potential products. Current Product Funding. Document cost of each product. Explore how each product is funded for each user group. Explore how much staff task time is required to complete workflow (e.g. budget-based workload analysis,...
Ohio Military Kids is a partnership between Ohio State Extension 4-H Youth Development and Family Readiness with Ohio National Guard. Through this partnership, Ohio Military Kids puts on events all around the state of Ohio. These events range from one...
Deasserted: The register file destination number for the Write registercomes from the rtfield. ... The jump target address (IR[25-0] shifted left 2 bits and concatenated with PC + 4[31-28]) is sent to thePCfor writing. Multi-cycle datapath and control.
Mrs. Bullock. Your Present Self Impacts Your Future Self. Credit availability depends on if lenders trust you will pay back the loan as agreed. ... _____credit report annually from each of the three credit reporting agencies. If _____ credit, the...
College Liaison Committee . The DTABC and our College are working in partnership with the Denturists and the Dental Hygienists to submit a joint proposal to government for an expanded Scope of Practice for all three professions.
Ready to download the document? Go ahead and hit continue!