Practical Approaches to Formally Verify Concurrent Software Ganesh

Practical Approaches to Formally Verify Concurrent Software Ganesh

Practical Approaches to Formally Verify Concurrent Software Ganesh Gopalakrishnan Microsoft HPC Institutes, NSF CNS-0509379 SRC Contract TJ 1318 http://www.cs.utah.edu/formal_verification (talk is kept at above URL under presentations/ce-junior-seminar-2008.pptx) 1 Multicores are the future! Why ? (photo courtesy of Intel Corporation.) 2 Multicores are the future! Why ? That is the only way to get more compute power

Energy per compute operation lowered thru the use of parallelism - e.g. Todays DSP processors (photo courtesy of Intel Corporation.) 3 How many different types of multicore CPUs? (photo courtesy of Intel Corporation.) Quite a variety Use in cloud computing Use in hand-helds

Use in embedded devices Use in Graphics / Gaming Use in HPC A plethora of SW and HW challenges 4 HW design challenges: Hierarchical Cache Coherence Protocols will play a major role in multi-core processors Chip-level protocols Intra-cluster protocols

mem dir mem dir Inter-cluster protocols State Space grows multiplicatively across the hierarchy! Verification will become harder 5 Other challenges : Varying grains of atomicity in modeling / verification One step in highlevel home remot e Multiple steps in lowlevel hom e route

r remot e buf an atomic guarded/command 6 Diagrams showing Xiaofang Chens PhD work that deals with complex cache coherence protocols : Abstraction / refinement Inv P1 Inv P2 Inv P Inv P3 7 Step 1 of Refinement Inv P2 P1

Inv P3 Inv P2 Inv P1 Inv P3 8 Step 2 of Refinement Inv P2 P1 Inv P3 Inv P2 Inv P1 Inv P3 Inv P2 nv P1 Inv P3 9

Final Step of Refinement Inv P2 P1 Inv P3 Inv P3 Inv P2 v P1 Inv P3 Inv P2 Inv P1 Inv P2 Inv P1 Inv P3 10 Multicore Challenges :

Need to employ / teach concurrent programming at an unprecedented scale! Some of todays proposals: Threads (various) (photo courtesy of Intel Corporation.) Message Passing (various) Transactional Memory (various) OpenMP MPI

Intels Ct Microsofts Parallel Fx Cilk Artss Cilk Intels TBB Nvidias Cuda 11 Sequential program verification remains hard!

main(){ int Z1, Z2, Z3; int x1, x2; int z11, z12, z13, z21, z22, z23; /* x1 = x2; */ z11 = z21; z12 = z22; z13 = z23; if (x1 == 1) z11 = Z1; if (x1 == 2) z12 = Z2; if (x1 == 3) z13 = Z3; if (x2 == 1) z21 = Z1; else if (x2 == 2) z22 = Z2; else if (x2 == 3) z23 = Z3; assert((z11 + z12 + z13) == (z21 + z22 + z23)); } How might we prove / disprove the assertion for ALL possible initial values of the variables ? (Welcome to symbolic methods!) Concurrent program verification is harder !! main(){ int Z1, Z2, Z3; int x1, x2; int z11, z12, z13, z21, z22, z23; /* x1 = x2; */ z11 = z21; z12 = z22; z13 = z23; At each ; lurks a bug !! *see below if (x1 == 1) z11 = Z1; if (x1 == 2) z12 = Z2; if (x1 == 3) z13 = Z3;

if (x2 == 1) z21 = Z1; else if (x2 == 2) z22 = Z2; else if (x2 == 3) z23 = Z3; assert((z11 + z12 + z13) == (z21 + z22 + z23)); } (* More specifically: at each grain boundary of atomicity lurks a potential interleaving that was not considered) Q: Why is concurrent program debugging hard ? A: Too many interleavings !! Card Deck 1 Card Deck 0 0: 1: 2: 3: 4: 5: 0: 1: 2: 3: 4:

5: Suppose only the interleavings of the red cards matter Then dont try all riffle-shuffles (12!) / ((6!) (6!)) = 924 Instead just try TWO shuffles of the decks !! 14 The Growth of (n.p)! / (n!)p Thread 1 . Thread p 1: 2: 3: 4: n: 1: 2: 3: 4: n:

Unity / Murphi guard / action rules : n=1, p=R p = 3, n = 5 106 interleavings p = 3, n = 6 17 * 106 interleavings p = 4, n = 5 1010 interleavings R! interleavings 15 Ad-hoc Testing is INEFFECTIVE for thread verification ! Thread 1 1: 2: 3: 4:

n: . Thread p 1: 2: 3: 4: n: MUST reduce the number of interleavings considered Rigorous proof to back omitting interleavings DYNAMIC PARTIAL ORDER REDUCTION achieves both 16 Example: Dining Philosophers in C / PThreads #include // Dining Philosophers with no deadlock #include // all phils but "odd" one pickup their #include // left fork first; odd phil picks #include // up right fork first #include #include

#include #include #define NUM_THREADS 3 pthread_mutex_t mutexes[NUM_THREADS]; pthread_cond_t conditionVars[NUM_THREADS]; int permits[NUM_THREADS]; pthread_t tids[NUM_THREADS]; int data = 0; void * Philosopher(void * arg){ int i; i = (int)arg; // pickup left fork pthread_mutex_lock(&mutexes[i%NUM_THREADS]); while (permits[i%NUM_THREADS] == 0) { printf("P%d : tryget F%d\n", i, i%NUM_THREADS); pthread_cond_wait(&conditionVars[i%NUM_THREADS],&mutexes[i %NUM_THREADS]); } permits[i%NUM_THREADS] = 0; printf("P%d : get F%d\n", i, i%NUM_THREADS); pthread_mutex_unlock(&mutexes[i%NUM_THREADS]); // pickup right fork pthread_mutex_lock(&mutexes[(i+1)%NUM_THREADS]); while (permits[(i+1)%NUM_THREADS] == 0) { printf("P%d : tryget F%d\n", i, (i+1)%NUM_THREADS);

pthread_cond_wait(&conditionVars[(i+1)%NUM_THREADS],&mutexes[ (i+1)%NUM_THREADS]); } permits[(i+1)%NUM_THREADS] = 0; printf("P%d : get F%d\n", i, (i+1)%NUM_THREADS); pthread_mutex_unlock(&mutexes[(i+1)%NUM_THREADS]); //printf("philosopher %d thinks \n",i); printf("%d\n", i); // data = 10 * data + i; fflush(stdout); // putdown right fork pthread_mutex_lock(&mutexes[(i+1)%NUM_THREADS]); permits[(i+1)%NUM_THREADS] = 1; printf("P%d : put F%d\n", i, (i+1)%NUM_THREADS); pthread_cond_signal(&conditionVars[(i+1)%NUM_THREADS]); pthread_mutex_unlock(&mutexes[(i+1)%NUM_THREADS]); 17 Philosophers in PThreads // putdown left fork pthread_mutex_lock(&mutexes[i%NUM_THREADS]); permits[i%NUM_THREADS] = 1; printf("P%d : put F%d \n", i, i%NUM_THREADS); pthread_cond_signal(&conditionVars[i%NUM_THREADS]);

pthread_mutex_unlock(&mutexes[i%NUM_THREADS]); pthread_create(&tids[NUM_THREADS-1], NULL, OddPhilosopher, (void*)(NUM_THREADS-1) ); for (i = 0; i < NUM_THREADS; i++){ pthread_join(tids[i], NULL); } // putdown right fork pthread_mutex_lock(&mutexes[(i+1)%NUM_THREADS]); permits[(i+1)%NUM_THREADS] = 1; printf("P%d : put F%d \n", i, (i+1)%NUM_THREADS); pthread_cond_signal(&conditionVars[(i+1)%NUM_THREADS]); pthread_mutex_unlock(&mutexes[(i+1)%NUM_THREADS]); for (i = 0; i < NUM_THREADS; i++){ pthread_mutex_destroy(&mutexes[i]); } for (i = 0; i < NUM_THREADS; i++){ pthread_cond_destroy(&conditionVars[i]); } //printf(" data = %d \n", data); return NULL; //assert( data != 201); return 0;

} } int main(){ int i; for (i = 0; i < NUM_THREADS; i++) pthread_mutex_init(&mutexes[i], NULL); for (i = 0; i < NUM_THREADS; i++) pthread_cond_init(&conditionVars[i], NULL); for (i = 0; i < NUM_THREADS; i++) permits[i] = 1; for (i = 0; i < NUM_THREADS-1; i++){ pthread_create(&tids[i], NULL, Philosopher, (void*)(i) ); } 18 Plain run of Philosophers gcc -g -O3 -o nobug examples/Dining3.c -L ./lib -lpthread -lstdc++ -lssl % time nobug P0 : get F0 P0 : get F1 0 P0 : put F1 P0 : put F0 P1 : get F1

P1 : get F2 1 P1 : put F2 P1 : put F1 P2 : get F0 P2 : get F2 2 P2 : put F2 P2 : put F0 real 0m0.075s user 0m0.001s sys 0m0.008s 19 Buggy Philosophers in PThreads // putdown left fork pthread_mutex_lock(&mutexes[i%NUM_THREADS]); permits[i%NUM_THREADS] = 1; printf("P%d : put F%d \n", i, i%NUM_THREADS); pthread_cond_signal(&conditionVars[i%NUM_THREADS]); pthread_mutex_unlock(&mutexes[i%NUM_THREADS]); pthread_create(&tids[NUM_THREADS-1], NULL, Philosopher, (void*)(NUM_THREADS-1) ); for (i = 0; i < NUM_THREADS; i++){ pthread_join(tids[i], NULL);

} // putdown right fork pthread_mutex_lock(&mutexes[(i+1)%NUM_THREADS]); permits[(i+1)%NUM_THREADS] = 1; printf("P%d : put F%d \n", i, (i+1)%NUM_THREADS); pthread_cond_signal(&conditionVars[(i+1)%NUM_THREADS]); pthread_mutex_unlock(&mutexes[(i+1)%NUM_THREADS]); for (i = 0; i < NUM_THREADS; i++){ pthread_mutex_destroy(&mutexes[i]); } for (i = 0; i < NUM_THREADS; i++){ pthread_cond_destroy(&conditionVars[i]); } //printf(" data = %d \n", data); return NULL; //assert( data != 201); return 0; } } int main(){ int i; for (i = 0; i < NUM_THREADS; i++)

pthread_mutex_init(&mutexes[i], NULL); for (i = 0; i < NUM_THREADS; i++) pthread_cond_init(&conditionVars[i], NULL); for (i = 0; i < NUM_THREADS; i++) permits[i] = 1; for (i = 0; i < NUM_THREADS-1; i++){ pthread_create(&tids[i], NULL, Philosopher, (void*)(i) ); } 20 Plain run of buggy philosopher .. bugs missed by testing gcc -g -O3 -o buggy examples/Dining3Buggy.c -L ./lib -lpthread -lstdc++ -lssl % time buggy P0 : get F0 P0 : get F1 0 P0 : put F1 P0 : put F0 P1 : get F1 P1 : get F2 1 P1 : put F2 P1 : put F1 P2 : get F2 P2 : get F0

2 P2 : put F0 P2 : put F2 real 0m0.084s user 0m0.002s sys 0m0.011s 21 Jiggling Schedule in Buggy Philosopher.. #include // Dining Philosophers with no deadlock #include // all phils but "odd" one pickup their #include // left fork first; odd phil picks #include // up right fork first #include #include #include #include #define NUM_THREADS 3 pthread_mutex_t mutexes[NUM_THREADS]; pthread_cond_t conditionVars[NUM_THREADS]; int permits[NUM_THREADS]; pthread_t tids[NUM_THREADS]; int data = 0; void * Philosopher(void * arg){ int i; i = (int)arg;

// pickup left fork pthread_mutex_lock(&mutexes[i%NUM_THREADS]); while (permits[i%NUM_THREADS] == 0) { printf("P%d : tryget F%d\n", i, i%NUM_THREADS); pthread_cond_wait(&conditionVars[i%NUM_THREADS],&mutexes[i %NUM_THREADS]); } permits[i%NUM_THREADS] = 0; printf("P%d : get F%d\n", i, i%NUM_THREADS); pthread_mutex_unlock(&mutexes[i%NUM_THREADS]); nanosleep (0) added here // pickup right fork pthread_mutex_lock(&mutexes[(i+1)%NUM_THREADS]); while (permits[(i+1)%NUM_THREADS] == 0) { printf("P%d : tryget F%d\n", i, (i+1)%NUM_THREADS); pthread_cond_wait(&conditionVars[(i+1)%NUM_THREADS],&mutexes[ (i+1)%NUM_THREADS]); } permits[(i+1)%NUM_THREADS] = 0; printf("P%d : get F%d\n", i, (i+1)%NUM_THREADS); pthread_mutex_unlock(&mutexes[(i+1)%NUM_THREADS]); //printf("philosopher %d thinks \n",i); printf("%d\n", i); // data = 10 * data + i;

fflush(stdout); // putdown right fork pthread_mutex_lock(&mutexes[(i+1)%NUM_THREADS]); permits[(i+1)%NUM_THREADS] = 1; printf("P%d : put F%d\n", i, (i+1)%NUM_THREADS); pthread_cond_signal(&conditionVars[(i+1)%NUM_THREADS]); pthread_mutex_unlock(&mutexes[(i+1)%NUM_THREADS]); 22 Plain runs of buggy philosopher bug still very dodgy gcc -g -O3 -o buggynsleep examples/Dining3BuggyNanosleep0.c -L ./lib -lpthread -lstdc++ -lssl % buggysleep P0 : get F0 P0 : sleeping 0 ns P1 : get F1 P1 : sleeping 0 ns P2 : get F2 P2 : sleeping 0 ns P0 : tryget F1 P2 : tryget F0 P1 : tryget F2 First run deadlocked second did not ..

buggysleep P0 : get F0 P0 : sleeping 0 ns P0 : get F1 0 P0 : put F1 P0 : put F0 P1 : get F1 P1 : sleeping 0 ns P2 : get F2 P2 : sleeping 0 ns P1 : tryget F2 P2 : get F0 2 P2 : put F0 P2 : put F2 P1 : get F2 1 P1 : put F2 P1 : put F1 23 Dijkstras famous quote Testing only confirms the presence of errors never its absence

MUCH MORE TRUE for concurrent software Some terminology and an overview Concurrent includes Parallel (aka shared memory) and Distributed (aka message passing) In our research group, we have developed tools to verify PRACTICAL Parallel and Distributed software Currently for Pthreads (parallel) and MPI (distributed) This talk mainly focusses on Parallel (Pthreads / C) software verification An important use of message passing The scientific community is increasingly employing expensive supercomputers that employ distributed programming libraries. (BlueGene/L - Image courtesy of IBM / LLNL) to program large-scale simulations in all walks of science, engineering, math, economics, etc. (Image courtesy of Steve Parker, CSAFE, Utah)

26 Verification Realities Code written using mature libraries (MPI, OpenMP, PThreads, ) 27 Verification Realities Code written using mature libraries (MPI, OpenMP, PThreads, ) API calls made from real programming languages (C, Fortran, C++) 28 Verification Realities Code written using mature libraries (MPI, OpenMP, PThreads, ) API calls made from real programming languages (C, Fortran, C++) Runtime semantics determined by realistic Compilers and Runtimes

29 Verification Realities Code written using mature libraries (MPI, OpenMP, PThreads, ) API calls made from real programming languages (C, Fortran, C++) How best to verify codes that will run on actual platforms? Runtime semantics determined by realistic Compilers and Runtimes 30 Traditional verification methods dont work well in practice e.g. Classical Approach to Model Checking Finite State Model of Concurrent

Program Check Properties 31 Background Classical Approach to Model Checking Finite State Model of Concurrent Program Check Properties DRAWBACK : Extraction of Finite State Models for Realistic Programs is Difficult ! 32 Background Dynamic Model Checking Actual

Concurrent Program Check Properties 33 Background Dynamic Model Checking Actual Concurrent Program Check Properties MAIN ADVANTAGE: Avoid model extraction which can be tedious and imprecise Program serves as its own model Reduce Complexity through Reduction of Interleavings (and other methods) 34 Background Dynamic Model Checking in Practice

Actual Concurrent Program Check Properties One Specific Test Harness 35 Background Need test harness in Dynamic Model Checking in Practice order to run the code Actual Concurrent Program Check Properties We will Explore ALL RELEVANT INTERLEAVINGS for ONE test harness

One Specific Test Harness Conventional Testing tools cannot do this !! E.g. 5 threads 5 instructions each 1010 interleavings !! 36 Background Dynamic Model Checking in Practice Drawback: Need to consider all test harnesses Actual Concurrent Program One Specific Test Harness

Check Properties (but luckily, FOR MANY PROGRAMS, this number seems small -- e.g. Hypergraph partitioner test harness is the initial graph to be partitioned) 37 Workflow of Parallel Software Verification tool Inspect Multithreaded C/ C++ program instrumentation instrumented program compile executable thread 1 thread n

request/permit scheduler mit r e t/p s e u req Thread library wrapper 38 Result of instrumentation void * Philosopher(void * arg){ int i; i = (int)arg; ... pthread_mutex_lock(&mutexes[i%3]); ... while (permits[i%3] == 0) {

printf("P%d : tryget F%d\n", i, i%3); pthread_cond_wait(...); } ... permits[i%3] = 0; ... pthread_cond_signal(&conditionVars[i%3]); pthread_mutex_unlock(&mutexes[i%3]); return NULL; } void *Philosopher(void *arg ) { int i ; pthread_mutex_t *tmp ; { inspect_thread_start("Philosopher"); i = (int )arg; tmp = & mutexes[i % 3]; inspect_mutex_lock(tmp); while (1) { __cil_tmp43 = read_shared_0(& permits[i % 3]) if (! __cil_tmp32) { break; } __cil_tmp33 = i % 3; tmp___0 = __cil_tmp33;

inspect_cond_wait(...); } ... write_shared_1(& permits[i % 3], 0); ... inspect_cond_signal(tmp___25); 39 Inspect of nonbuggy and buggy Philosophers .. ./instrument file.c ./compile file.instr.c ./inspect ./target P0 : get F0 P0 : get F1 0 P0 : put F1 P0 : put F0 P1 : get F1 P1 : get F2 1 P1 : put F2 P1 : put F1 P2 : get F0 P2 : get F2 2 P2 : put F2

P2 : put F0 num of threads = 1 === run 2 === P0 : get F0 ... P1 : put F1 === run 48 === P2 : get F0 P2 : get F2 2 P2 : put F2 P2 : put F0 P0 : get F0 P0 : get F1 0 P1 : tryget F1 << Total number of runs: 48, Transitions explored: 1814 Used time (seconds): 7.999327

=== run 1 === P0 : get F0 P0 : get F1 0 P0 : put F1 P0 : put F0 P1 : get F1 P1 : get F2 1 P1 : put F2 P1 : put F1 P2 : get F2 P2 : get F0 2 P2 : put F0 P2 : put F2 === run 2 === P0 : get F0 P0 : get F1 0 P0 : put F1 P0 : put F0 P1 : get F1 P1 : get F2 1

P2 : tryget F2 P1 : put F2 P1 : put F1 === run 28 === P0 : get F0 P1 : get F1 P0 : tryget F1 P2 : get F2 P1 : tryget F2 P2 : tryget F0 Found a deadlock!! (0, thread_start) (0, mutex_init, 5) (0, mutex_init, 6) (0, mutex_init, 7) (0, cond_init, 8) (0, cond_init, 9) (0, cond_init, 10) (0, obj_write, 2) (0, obj_write, 3) (0, obj_write, 4) (0, thread_create, 1) (0, thread_create, 2) (0, thread_create, 3) (1, mutex_lock, 5)

(1, obj_read, 2) (1, obj_write, 2) (1, mutex_unlock, 5) (2, mutex_lock, 6) (2, obj_read, 3) (2, obj_write, 3) (2, mutex_unlock, 6) (1, mutex_lock, 6) (1, obj_read, 3) (1, mutex_unlock, 6) (3, mutex_lock, 7) (3, obj_read, 4) (3, obj_write, 4) (3, mutex_unlock, 7) (2, mutex_lock, 7) (2, obj_read, 4) (2, mutex_unlock, 7) (3, mutex_lock, 5) (3, obj_read, 2) (3, mutex_unlock, 5) (-1, unknown) Total number of runs: 29, killed-in-the-middle runs: 4

Transitions explored: 1193 Used time (seconds): 5.990523 40 The Growth of (n.p)! / (n!)p for Diningp.c illustrating the MAIN technology behind Inspect Diningp.c has n = 4 (roughly) p = 3 : We get 34,650 (loose upper-bound) versus 48 with DPOR p = 5 : We get 305,540,235,000 versus 2,375 with DPOR DPOR really works well in reducing the number of interleavings !! Testing will have to exhibit its cleverness among 3 * 1011 interleavings 41 Obtaining and Running Inspect (Linux) http://www.cs.utah.edu/~yuyang/inspect May need to obtain libssl-dev Need Ocaml-3.10.2 or higher Remove the contents of the cache directory autom4te.cache in case make loops

bin/instrument file.c bin/compile file.instr.c inspect help inspect target inspect s target 42 DPOR produces alternate interleaving ONLY when there are contentions First HAPPENS-BEFORE: Another HAPPENS-BEFORE pthread_mutex_lock(&mutex); A_count++; pthread_mutex_unlock(&mutex); pthread_mutex_lock(&mutex);

A_count-- ; pthread_mutex_unlock(&mutex); pthread_mutex_lock(&lock); B_count++; pthread_mutex_unlock(&lock); pthread_mutex_lock(&lock); B_count++; pthread_mutex_unlock(&lock); pthread_mutex_lock(&mutex); A_count-- ; pthread_mutex_unlock(&mutex); pthread_mutex_lock(&mutex); A_count++; pthread_mutex_unlock(&mutex); 43 The demos to follow will show that these ideas do work !! DEMO 1 : Seq C program verification DEMO 2 : Inspect DEMO 3 : Classical model checking

Recently Viewed Presentations

  • The Underwriting of Marine War, Terrorism and Piracy

    The Underwriting of Marine War, Terrorism and Piracy

    "The Underwriting of Marine War, Terrorism and Piracy in the 21st Century" History - Why a separate class of business, why not included in hull all risks.
  •  OnCourse Learning 1 Chapter 5 Land Use Controls

    OnCourse Learning 1 Chapter 5 Land Use Controls

    the delegation of authority from the North Carolina General Assembly to the local municipalities. Zoning . Implements or enforces city or county planning: to provide for orderly growth of a community. to result in greatest social and economic benefit to...
  • Human Evolution: an overview Dr Jeremy Pritchard o

    Human Evolution: an overview Dr Jeremy Pritchard o

    Human Evolution: an overview Dr Jeremy Pritchard Hominoid: Great apes and humans Hominid: Closer to us than gorillas and chimps Humans did not evolve from Apes You are descended from your mother and father You are related to your aunt...
  • What is the message of HIP HOP - Univerzita Karlova

    What is the message of HIP HOP - Univerzita Karlova

    Slouching toward Bork: The Culture Wars and Self-Criticism in Hip-Hop Music Jeffrey O. G. Ogbar Journal of Black Studies, Vol. 30, No. 2. (Nov., 1999), pp. 164-183. + 3 personal videos by DAVE CHAPPELLE What is the message of HIP...
  • Higher Level Questioning - Laurel County

    Higher Level Questioning - Laurel County

    I can identify the purpose behind asking questions at a higher level. I can incorporate Bloom's Taxonomy when developing high level questions. I understand what question space is and can implement this into my own questioning practice.
  • EPHESIANS The Epistle to the Ephesians is a

    EPHESIANS The Epistle to the Ephesians is a

    Ephesians 4:1 "As a prisoner for the Lord" Ephesians 6:20 "…an ambassador in chains" Acts 24-28: Trials and Journey from Caesarea to Rome Felix was the Procurator in Caesarea from about 52-58 AD Two years in prison in Caesarea Festus...
  • L01:L04: Intro,Memory Combinational & Data Logic II CSE369,

    L01:L04: Intro,Memory Combinational & Data Logic II CSE369,

    Google's AlphaGo AI secretively won more than 50 straight games against the world's top Go playersKe, the reigning top-ranked Go player, has acknowledged that human beings are no match for robots in the complex board game, after he lost three...
  • Drag&amp;Drop Macro: Electronic Jungle

    Drag&Drop Macro: Electronic Jungle

    Open PowerPoint Click Tools Click Options Click Security Tag Click Macro Security Select Medium Close and re-open PowerPoint (you won't have to do any of that again, but if asked click on enable macros at start-up next time)