Yuri Boykov, UWO Tutorial on Medical Image Segmentation: Beyond Level-Sets MICCAI, 2014 Western University Canada www.csd.uwo.ca/faculty/yuri/miccai14_MIS 1: Basics of optimization-based segmentation - continuous and discrete approaches 2 : Exact and approximate techniques - non-submodular and high-order problems 3: Multi-region segmentation (Milan) - high-dimensional applications Yuri Boykov, UWO Introduction to Image Segmentation Part 1 implicit/explicit representation of boundaries active contours, level-sets, graph cut, etc. Basic low-order objective functions (energies) physics, geometry, statistics, information theory Part 2

Set functions, submodularity Exact methods Approximation methods Higher-order and non-submodular objectives Comparison to gradient descent (level-sets) Yuri Boykov, UWO Thresholding T Yuri Boykov, UWO Thresholding T S={ p : Ip < T } Yuri Boykov, UWO Thresholding - =

Background I= Iobj - Ibkg Subtraction ? Threshold intensities above T better segmentation? Yuri Boykov, UWO Good segmentation S ? Objective function must be specified Quality function Cost function Loss function E(S) Energy Regularization functional Segmentation becomes an optimization problem: : 2P S = arg min E(S) Yuri Boykov, UWO Good segmentation S ?

Objective function must be specified Quality function Cost function Loss function E(S) = E1(S)++ En(S) Energy Regularization functional combining different constraints e.g. on region and boundary Segmentation becomes an optimization problem: S = arg min E(S) Yuri Boykov, UWO Beyond linear combination of terms Ratios are also used E1( S )

E( S ) E2 ( S ) Normalized cuts [Shi, Malik, 2000] Minimum Ratio cycles [Jarmin Ishkawa, 2001] Ratio regions [Cox et al, 1996] Parametric max-flow applications [Kolmogorov et al 2007] Not in this tutorial Yuri Boykov, UWO Segmentation principles interactive Boundary seeds vs. Livewire (intelligent scissors) Region seeds Graph cuts (intelligent paint) Distance (Voronoi-like cells) Bounding box

Grabcut [Rother et al] Center seeds Star shape [Veksler] Many other options unsupervised Normalized cuts [Shi Malik] Mean-shift [Comaniciu] MDL [Zhu&Yuille] Entropy of appearance Add enough constraints: Saliency Shape Known appearance Texture Yuri Boykov, UWO 1. We wont cover everything 2. We will not emphasize the differences between interactive and unsupervised (too easy to convert one into the other) interactive

Boundary seeds vs. Livewire (intelligent scissors) Region seeds Graph cuts (intelligent paint) Distance (Voronoi-like cells) Bounding box Grabcut [Rother et al] Center seeds Star shape [Veksler] Many other options unsupervised

Normalized cuts [Shi Malik] Mean-shift [Comaniciu] MDL [Zhu&Yuille] Entropy of appearance Add enough constraints: Saliency Shape Known appearance Texture Yuri Boykov, UWO Common segmentation techniques boundary-based boundary region-based region-growing thresholding intelligent scissors both region & geodesic active contours (live-wire) (e.g. level-sets)

active contours MRF (snakes) (e.g. graph-cuts) watersheds random walker optimization-based Yuri Boykov, UWO Common surface representations continuous optimization sp mesh level-sets combinatorial optimization mixed optimization s p {0,1}

sp Z p graph labeling point cloud labeling on complex on grid Yuri Boykov, UWO Active contours (e.g. snakes) [Kass, Witkin, Terzopoulos 1987] Given: initial contour (model) near desirable object Goal: evolve the contour to fit exact object boundary Yuri Boykov, UWO Tracking via active contours Tracking Heart Ventricles 5-14 Yuri Boykov, UWO Active contours - snakes Parametric Curve Representation (continuous case)

A curve can be represented by 2 functions C { ( s) | s [0,1]} ( s ) ( x( s) , y ( s)) open curve parameter 0 s 1 closed curve 5-15 Yuri Boykov, UWO Snake Energy E( C ) Ein ( C ) Eex ( C ) internal energy encourages smoothness or any particular shape external energy encourages curve onto image structures (e.g. image edges) 5-16 Yuri Boykov, UWO Active contours - snakes

(continuous case) internal energy (physics of elastic band) 1 1 2 2 2 Ein ( C ) d ds d 2 ds ds d s 0 0 elasticity / stretching stiffness / bending external energy (from image) 1 Eex ( C ) 2 | I (v

( s )) | ds 0 proximity to image edges 5-17 Yuri Boykov, UWO Active contours snakes (discrete case) v3 elastic energy (elasticity) d vi 1 i 1 ds 2 v4 v5 v2 v1 C ( 0 , 1 , 2 ,...., n 1 ) 2n

i ( xi , yi ) v6 v7 v 10 v9 v8 bending energy (stiffness) d 2 ( i 1 i ) ( i i 1 ) i 1 2i i 1 2 ds Yuri Boykov, UWO Basic Elastic Snake 1 dv 2 E | | ds ds 0 n 1 2

E | vi 1 vi | i 0 elastic smoothness term (interior energy) 1 2 | I ( v( s )) | ds continuous C { (case s ) | s [ 0 ,1 ]} 0 n 1 2 | I ( vi ) | discrete case C { i | 0 i n } i 0 image data term (exterior energy) 5-19

Yuri Boykov, UWO Snakes - gradient descent C n 1 E( x0 , , xn 1 , y0 , , yn 1 ) here, energy is a function of 2n variables simple elastic snake energy 2 2 | I ( x , y ) | | I ( x , y ) | x i i y i

i i 0 n 1 ( xi 1 xi ) 2 ( yi 1 yi ) 2 i 0 update equation for the whole snake C' C E t C x'0 x0 y ' y 0 0 ... ... x ' x n 1 n 1 y' n 1 y n 1 xE0 E y0

... t E x n 1 E y n 1 5-20 Yuri Boykov, UWO Snakes - gradient descent C n 1 E( x0 , , xn 1 , y0 , , yn 1 ) here, energy is a function of 2n variables simple elastic snake energy 2 2 | I ( x , y ) | | I (

x , y ) | x i i y i i i 0 n 1 ( xi 1 xi ) 2 ( yi 1 yi ) 2 i 0 ' i i Fi t update equation for each node Fi C x'0 x0 y ' y 0 0 ... ...

x ' x n 1 n 1 y' n 1 y n 1 xE0 E y0 ... t E x n 1 E y n 1 xEi Fi E yi 5-21 Yuri Boykov, UWO Snakes - gradient descent E(C) energy function E(C) for contours C C 2n

c local minima for E(C) step size could be tricky c 2 c1 c0 gradient descent steps C i 1 C i t E second derivative of 5-22 image intensities Yuri Boykov, UWO Implicit (region-based) surface representation via level-sets z u(x, y) C {( x, y ) : u ( x, y ) 0} (implicit contour representation) [Dervieux, Thomasset, 79, 81] [Osher, Sethian, 89] Yuri Boykov, UWO

Implicit (region-based) surface representation via level-sets dC N Normal contour motion can be represented by evolution of level-set function u du | u| [Dervieux, Thomasset, 79, 81] [Osher, Sethian, 89] u (x) The scaling by | u | is easily verified in one dimension du | u | dC dC p x dC E dt Note 1: - commonly used for gradient descent evolution Note 2: - level sets can not represent tangential motion of contour points Yuri Boykov, UWO Tangential vs. normal motion of contour points

normal motion of a contour point visibly changes shape (geometry) tangential motion generates no visible shape change A simple example of tangential motion of contour points (rotation) A simple example of normal motion of contour points (expansion) Comments: - geometric energy of a contour measures visible shape properties (length, curvature, area, e.t.c.). Thus, gradient descent w.r.t. geometric objective generates only visible (normal) motion. level sets can represent contour gradient descent evolution N only for geometric energies E(C) (s.t. E is collinear with contour normal ) dC dt E Level sets (implicit contour representation) Geodesic active contours Yuri Boykov, UWO

Tangential vs. normal motion of contour points normal motion of a contour point visibly changes shape (geometry) tangential motion generates no visible shape change Np E p Q: in what medical applications tangential motion of segment boundary matters? Comments: - gradient descent for physics-based energy of a contour (e.g. elasticity) may produce geometrically invisible tangential motion of contour points physics-based energy of a contour depends on its parameterization, while geometrically it could be the same contour (compare two shapes above) Parametric contours (explicit contour representation) Physics-based active contours Yuri Boykov, UWO Physics vs. Geometry continuous optimization mesh

snakes, balloons, active contours level-sets geodesic active contours explicit or parametric contour representation physics-based objectives (typically) gradient descent could use dynamic programming in 2D [Amini, Weymouth, Jain, 1990] implicit or non-parametric representation geometry-based objectives gradient descent can use convex formulations (TV-based) [Chan, Esidoglu, Nikolova 2006] Yuri Boykov, UWO Most common geometric functionals for segmentation with level-sets dC N Functional E( C ) E (C ) g ds

weighted length E (C ) f da E (C ) v, N ds (oriented boundary alignment) ~ g g , N ~ f int(C ) (region alignment to appearance model) flux du | u | C (boundary alignment to intensity edges) weighted area or

C ~ div(v) Yuri Boykov, UWO Most common geometric functionals for segmentation with level-sets dC N Functional E( C ) E (C ) | S | g weighted length or du | u | ~ g g , N (boundary alignment to intensity edges) weighted area E (C ) E (C ) v, N ds

(oriented boundary alignment) ~ f int(C ) (region alignment to appearance model) flux f da C ~ div(v) Yuri Boykov, UWO Towards discrete geometry: weighted boundary length on a graph [Barrett and Mortensen 1996] Live wire or intelligent scissors w (| I |) | I | pixels | I| image-based edge weights

B A shortest path algorithm (Dijkstra) Yuri Boykov, UWO shortest path on a 2D graph graph cut Example: find the shortest closed contour in a given domain of a graph Shortest paths approach Graph Cuts approach p Compute the shortest path p ->p for a point Repeat forp. all points on the gray line. Then choose the optimal contour. Compute the

minimum cut that separates red region from blue region Yuri Boykov, UWO graph cuts vs. shortest paths On 2D grids graph cuts and shortest paths give optimal 1D contours. B A A Path connects points A Cut separates regions Shortest paths still give optimal 1-D contours on N-D grids Min-cuts give optimal hyper-surfaces on N-D grids Yuri Boykov, UWO Graph cut [Boykov and Jolly 2001] hard constraint

t n-links a cut s Minimum cost cut can be computed in polynomial time (max-flow/min-cut algorithms) hard constraint Ipq w exp 2 pq 2 I pq Yuri Boykov, UWO Minimum s-t cuts algorithms

Augmenting paths [Ford & Fulkerson, 1962] - heuristically tuned to grids [Boykov&Kolmogorov 2003] Push-relabel [Goldberg-Tarjan, 1986] - good choice for denser grids, e.g. in 3D Preflow [Hochbaum, 2003] - also competitive Yuri Boykov, UWO Optimal boundary in 2D max-flow = min-cut Yuri Boykov, UWO Optimal boundary in 3D 3D bone segmentation (real time screen capture, year 2000) Yuri Boykov, UWO Smoothness of segmentation boundary - snakes (physics-based contours) - geodesic contours (geometry) - graph cuts (discrete geometry)

E: many distance-to-seed methods optimize segmentation boundary only indirec they compute some analogue of optimum Voronoi cells [fuzzy connectivity, random walker, geodesic Voronoi cells, etc.] Yuri Boykov, UWO Discrete vs. continuous boundary cost Geodesic contours E (S) w s ds S Graph cuts E ( S ) w pq [ S p S q ] p ,q S p {0,1} C [Caselles, Kimmel, Sapiro, 1997] (level-sets) [Chan, Esidoglu, Nikolova, 2006] (convex) [Boykov and Jolly 2001] [Boykov and Kolmogorov 2003] Both incorporate segmentation boundary smoothness and

alignment to image edges Yuri Boykov, UWO Graph cuts on a grid and boundary of S S p {0,1} B(S ) | e | eC Severed n-links can approximate geometric length of contour C [Boykov&Kolmogorov, ICCV 2003] This result fundamentally relies on ideas of Integral Geometry (also known as Probabilistic Geometry) originally developed in 1930s. e.g. Blaschke, Santalo, Gelfand Yuri Boykov, UWO Integral geometry approach to length C 2 a subset of lines L intersecting contour C

LC a set of all lines L 0 probability that a randomly drown line intersects C Euclidean length of C : 1 || C || n d d 2 L

Cauchy-Crofton formula the number of times line L intersects C Yuri Boykov, UWO Graph cuts and integral geometry Graph nodes are imbedded in R2 in a grid-like fashion C Edges of any regular neighborhood system generate families of lines { , , , } 1 || C ||gc ||

C || n 2 k k k Euclidean length k the number of edges of family k intersecting C graph cut cost for edge weights: k k w

k 2 Length can be estimated without computing any derivatives Yuri Boykov, UWO Metrication errors Euclidean metric standard 4-neighborhoods (Manhattan metric) Riemannian metric 8-neighborhoods larger-neighborhoods Yuri Boykov, UWO Metrication errors 4-neighborhood 8-neighborhood Yuri Boykov, UWO Differential geometry

Differential vs. integral approach to geometric boundary length 1 || C || Ct dt 0 Parametric (explicit) contour representation Level-set function representation || C || | u | dx Integral geometry || C || n d

d 1 2L Cauchy-Crofton formula implicit (region-based) representation of contours Yuri Boykov, UWO Implicit (region-based) surface representation via level-sets Level set function u(p) is normally stored on image pixels Values of u(p) can be interpreted as distances or heights of image pixel -1.7 -0.8 -0.8 0.2 A contour may be approximated from u(x,y) with sub-pixel accuracy u ( p ) u ( x , y) -0.6

-0.4 -0.5 0.5 C -0.2 0.6 0.3 0.7 Yuri Boykov, UWO Implicit (region-based) surface representation via graph-cuts Graph cuts represent surfaces via binary labeling Sp of each graph nod Binary values of Sp indicate interior or exterior points (e.g. pixel centers) 0 0 0 1 There are many contours satisfying interior/

exterior labeling of points S p S ( x , y) 0 0 0 Question: Is this a contour to be reconstructed from binary labeling Sp ? Answer: NO 0 1 C 1 1 1 Yuri Boykov, UWO Contour/surface representations (summary) Implicit (area-based)

Level sets (geodesic active contours) Graph cuts (minimum cost cuts) What else besides boundary length |S| ? Explicit (boundary-based) Snakes (physics-based band model) Live-wire (shortest paths on graphs) Yuri Boykov, UWO From seeds to more general region constraints [Boykov and Jolly 2001] t-link D p (s ) S t-link w pq

S D p (t ) s segmentation s t n-links a cut t I and I assume are s D ( s ) | I I | p p known

t expected intensities D ( t ) | I I | p p of object and background cost of severed t-links s t E(S) = | I p I | | I p I | + | S | pS pS cost of severed n-links Yuri Boykov, UWO From seeds to more general region constraints [Boykov and Jolly 2001] t n-links

a cut t-link D p (s ) S S segmentation D p (t ) I s and I t s D ( s ) | I I | p p unknown intensities of object and background Chan-Vese model

s re-estimate I s and I t could be Block-Coord.Descent t-link w pq t D ( t ) | I I | p p cost of severed t-links s E(S, Is,It) = | I p I | pS t |

I I p | pS + | S | cost of severed n-links Yuri Boykov, UWO Block-coordinate descent for E (S , I 0 , I 1 ) Minimize over labeling S for fixed I , I 0 E(S , I 0 , I 1 ) (I 0 I p )2 p:S p 0 (I 1

I p )2 w pq 1 [ S p S q ] { pq}N p:S p 1 optimal L can be computed using graph cuts Minimize over I , I for fixed labeling S 0 0 1 E( S, I , I ) (I 0 1

2 Ip) p:S p 0 I 0 1 |S | (I 1 Ip) p:S p 1 Ip p:S p 0 I1 2 S=const w pq [ S p S q ] fixed for { pq}N

optimal I 1, I 0 can be computed by minimizing squared errors inside object and background segments 1 |S | Ip p :S p 1 Yuri Boykov, UWO Chan-Vese segmentation (binary case S p {0 ,1} ) E (S , I 0 , I 1 ) 0 2 ( I I ) p p :S p 0 w

{ pq}N pq 1 2 ( I I ) p p :S p 1 [ S p S q ] Yuri Boykov, UWO Chan-Vese segmentation (could be used for more than 2 labels S p {0,1,2,...} ) can be used for segmentation, to reduce color-depth, or to create a cartoon E ( S , I 0 , I 1 ,...) 0 2 ( I

I ) p p :S p 0 w { pq}N pq 1 2 ( I I ) ... p p :S p 1 [ S p S q ] Yuri Boykov, UWO

Chan-Vese segmentation (could be used for more than 2 labels S p {0,1,2,...} ) can be used for segmentation, to reduce color-depth, or to create a cartoon E ( S , I 0 , I 1 ,...) joint optimization over S and I0, I1, is NP-hard 0 2 ( I I ) p p :S p 0 1 2 ( I I ) ...

p p :S p 1 without the smoothing term, this is like K-means clustering in the color space Yuri Boykov, UWO From fixed intensity segments to general intensity distributions [Boykov and Jolly 2001] t-link D p (s ) t n-links a cut S S t-link w pq D p (t ) s

segmentation Appearance models can D p ( s ) ln Pr ( I p | s ) be D p (t ) ln Pr ( I p | t ) given by intensity distributions cost of severed t-links of objectE(S) and background ln Pr( I p | s ) ln Pr( I p | t ) + | S | = pS pS cost of severed n-links Yuri Boykov, UWO Graph cut (region + boundary) Yuri Boykov, UWO Graph cut as energy optimization for S [Boykov and Jolly 2001] t n-links a cut t-link

D p (s ) S S D p (t ) s segmentation cut S p {0 ,1} E(S) = cost(cut) (1)D ( S)D D p pS p p t-link w pq p

pS p ( 0) + w pq [ S p S q ] pqN cost of severed t-links cost of severed n-links unary terms pair-wise terms regional properties of S boundary smoothness for S Yuri Boykov, UWO Unary potentials as linear term wrt.

S p {0 ,1} unary terms D p p ( S p ) D p (1) D p (0) pS pS D p (1) S p D p (0) (1 - S p ) p = const D p (1) D p (0) S p p f ( p) f p s p f, S p

Linear region term analogous to geodesic active contours E ( S ) f S Yuri Boykov, UWO Unary potentials as linear term wrt. f,S f p s p p unary terms (linear) Examples of potential functions f Log-likelihoods Chan-Vese f p lnPr( I p ) f p ( I p c) 2

Volume Ballooning f p 1 Attention f p filter response at p Yuri Boykov, UWO In general, k-arity potentials are k-th order polynomial pair-wise terms w pq N pq [ S p S q ] w pq S p (1 S q ) (1 S p ) S q pqN quadratic polynomial wrt.S p Quadratic term analogous to boundary length in geodesic active E ( S ) |contours S |w ws ds S

Yuri Boykov, UWO Basic (quadratic) boundary regularization | S |w w pq [ s p sq ] pqN s p {0,1} second-order terms (quadratic) Examples of discontinuity penalties w Euclidean boundary length 1 w pq ~ | pq| [Boykov&Kolmogorov, 2003], via integral geometry contrast-weighted boundary length [Boykov&Jolly, 2001] w pq ~ exp( I p I q ) 2 Yuri Boykov, UWO Basic second-order segmentation energy includes linear and quadratic terms

E(S) f, S | S |w f pS p w pqN pq [ S p S q ] Yuri Boykov, UWO Basic second-order segmentation energy includes linear and quadratic terms E(S) f, S | S |w AIN ADVANTAGE: guaranteed global optimum (t.e. best segmentation w.r.t. objective) (discrete case) via graph cuts [Boykov&Jolly01; Boykov&Kolmogorov03] public code [BK2004], fast on CPU (continuous case) via convex TV formulations

[Chen,Esidoglu,Nikolova06; Chambolle,Pock,Cremers08] public code [C. Nieuwenhuis2014], comparably fast on GPU NOTE: this formulation is different from basic level-sets Yuri Boykov, UWO Optimization vs Thresholding f, Sf p B(S) E(S) pS S thresholding Pr(I | Fg) e.g. graph cut [BJ, 2001] Pr(I | Bg) Pr(I(p) | fg) f(p) ln Pr(I(p) | bg) I Yuri Boykov, UWO Other examples of useful globally optimizable segmentation objectives

Flux [Boykov&Kolmogorov 2005] Color consistency [One Cut, Tang et al. 2014] Distance ||S-S0|| from template shape Hamming, L2, [e.g. Boykov,Cremers,Kolmogorov, 2006] Star-shape prior [Veksler 2008] NOT ALLOWED Yuri Boykov, UWO Many more example of useful hard-to-optimize segmentation objectives Typical Problems: Continuous case Non-convexity Typical Solutions: gradient descent (linearization) + level sets Discrete case Non-submodularity (more later) High-order

Density (too many terms) to be discussed Yuri Boykov, UWO Examples of useful higher-order energies Cardinality potentials (constraints on segment size) E(S) V0 | S| E(S) V0 s p can not be represented as a sum of simpler (unary or quadratic) terms high-order potential V0 |S| Yuri Boykov, UWO

Examples of useful higher-order energies Cardinality potentials E(S) (constraints on segment size) V0 | S| E(S) 2 V0 s 2 p can be represented as a sum of unary and quadratic terms NOTE: 2nd-order potential V0 |S| still difficult to optimize (completely connected graph)

Yuri Boykov, UWO Examples of useful higher-order energies Cardinality potentials E(S) (constraints on segment size) V0 | S| E(S) m V0 s m p can not be represented as a sum of simpler (unary or quadratic) terms high-order potential V0

|S| Yuri Boykov, UWO Examples of useful higher-order energies Cardinality potentials (constraints on segment size) Curvature of the boundary Shape convexity Segment connectivity Appearance entropy, color consistency Distribution consistency High-order shape moments Yuri Boykov, UWO Summary Not-Covered: Covered basics of:

Thresholding, region growing Snakes, active contours Geodesic contours Graph cuts (binary labeling, MRF) Implicit surface representation Global optimization is possible To be covered later: High-order models Ratio functionals Normalized cuts Watersheds Random walker Many others