The Swiss Cheese Problem: Deduplication of High-Density Storage

The Swiss Cheese Problem, or, Chesterton Redeemed
Being a response to, and enlargement upon, his long-awaited opus, The Neglect of Cheese in European Literature, through a comparison of the formal qualities of caseinetic foodstuffs to problems in Library Economy.

Jacob Nadal
Executive Director, ReCAP

On Cheese
On volumetric inefficiencies in cheese
On Switzerland and its national dish
  Rub the pot with garlic
  Heat wine and a touch of lemon juice
  Mix with your starch, preferably corn or potato
  Add cheese slowly, stirring and melting
  Finish with a dash of kirsch and a little nutmeg
On the arrangement of items on shelves

The Swiss Cheese Problem of library shelving
Removing individual volumes from high-density storage creates random gaps
Gaps can only be filled by a same-size item
Thus, manual selection and placement of items is required
And the cost per item in labor is high enough that it is less expensive to simply build additional storage space

ReCAP Quantified: 11.3 Million Items (and growing at a rate of some 50,000 each month)
2012 OCLC Study
  442,422 duplicates + 46,575 triplicates from 4,437,546 monographs
    9.9% duplication + 0.1% triplication = 10%
  471,593 duplicates + 48,880 triplicates from 4,851,089 titles
    9.7% duplication + 0.1% triplication = 9.8%

So let's assume 10% item-level duplication
From 11,300,060 items: 1,130,006 dups

Skepticism
Can this be true?
Is it really the case that it's less expensive to build space for 1,130,006 books than to backfill existing space?
Are we just going to give up!?

R. I. P.
Bold Spirit of Librarianship
Dawn of Recorded History -- 2014

Tray (a) is reshelved, the duplicate is removed from tray (b), and tray (c) is introduced

Newly arrived books are added into the partly filled trays (b) and (c), and then accessioned

Items do not have a standalone workflow
(Items are accessioned in trays, the actual unit of work)

Step            Items   Mins   Mins/Item
Retrieve (a)       65     60        0.92
Retrieve (b)       65     60        0.92
Sort Items         65     39        0.60
Reshelve (a)       65     60        0.92
Size              234     60        0.26
Accession         650     60        0.09
Verify            650     60        0.09
Tray to Shelf     650     60        0.09

Cost/Item: $1.32
  w/o sizing: $1.23
  partial comparison: $0.91

Retrieve: based on our regular quota of 65 retrievals/hour. Each retrieval gets one of the pair of items to deduplicate.
Sort Items: items are sorted for refile at 100/hr. That rate is used here.
Reshelve (a): re-shelving the retained tray (a) is the same rate as retrieval.
Size: an average of 18 trays per hour; trays have 13 items on average.
Accession: 50 trays are accessioned per hour.
Verify: 50 trays/hour
Tray to Shelf: 50 trays/hour
w/o sizing: if large batches of same-size materials are coordinated, sizing time may not matter.
Partial comparison: if comparison is only when the item fails a condition check (50%).

But, are these item costs that we can implement?
This kind of analysis abstracts out some limits (materials handling, key in this case) and accidentally assumes a zero-cost, zero-friction shift from one unit of work to another. This is an accidental best case analysis. If we started work on this basis, we'd go over time and budget. Seem familiar?

Trays do have an existing workflow

Standard Rate: These are from actual operations at ReCAP
  Retrieve: 65/hour
  Reshelve: 65/hour
  Sorting (on return): 100/hr
  Sizing: 1.25 batches/hr (3-4 batches)
    1 batch = 1 full shelf of trays
    Trays/shelf varies w/ tray size
  Accession, Verify, TTS (Tray to Shelf)

Duplicate Resolution

Step                 Units   Minutes
Retrieve Tray (a)       65        60
Retrieve Tray (b)       65        60
Sort Items              65        39
Reshelve (a)            65        60
Reaccession into Trays
Size Trays              60       180
Accession               60        68
Verify                  60        68
Tray to Shelf           60        60

Hours for 65: 10 hrs
Mins/Item: 9.29
Cost/Item: $3.15
  w/o sizing, partial: $1.01

After deriving an item cost from the actual workflow, the costs are much higher when sizing is factored in.
Sizing starts with a large set of effectively random materials that are sorted into like-size batches. That takes a lot of time.
Sorting and sizing work across trays, so once again, we have two units of work in our projection.
On a tray-basis, if we eliminate sorting or sizing, we get closer to the item-basis projection.
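As a rough check on the item-basis projection, the sketch below re-derives the minutes-per-item column from the stated hourly rates (65 retrievals, 100 items sorted, 18 trays sized, 50 trays accessioned, 13 items per tray). The slides do not state a labor rate; HOURLY_RATE is a hypothetical value chosen only because it reproduces the $1.32 and $1.23 figures, and all names here are illustrative.

```python
# Sketch: re-derive the item-basis minutes per item from the stated rates.
ITEMS_PER_TRAY = 13   # stated average items per tray
BATCH = 65            # one hour of retrievals at the regular quota

# (step, items handled, minutes spent)
steps = [
    ("Retrieve (a)", BATCH, 60),
    ("Retrieve (b)", BATCH, 60),
    ("Sort Items", BATCH, BATCH / 100 * 60),   # 100 items/hr, so 39 min for 65
    ("Reshelve (a)", BATCH, 60),
    ("Size", 18 * ITEMS_PER_TRAY, 60),         # 18 trays/hr = 234 items
    ("Accession", 50 * ITEMS_PER_TRAY, 60),    # 50 trays/hr = 650 items
    ("Verify", 50 * ITEMS_PER_TRAY, 60),
    ("Tray to Shelf", 50 * ITEMS_PER_TRAY, 60),
]

# Round each step to two decimals, as the slide's table does.
per_item = {name: round(minutes / items, 2) for name, items, minutes in steps}

total_mins = round(sum(per_item.values()), 2)
without_sizing = round(total_mins - per_item["Size"], 2)

# ASSUMPTION: hourly labor rate is not in the source; this value is
# back-fitted so that the totals match the slide's cost figures.
HOURLY_RATE = 20.36

def cost(minutes):
    """Labor cost in dollars for the given minutes at HOURLY_RATE."""
    return round(minutes * HOURLY_RATE / 60, 2)

print(per_item)
print(total_mins, cost(total_mins))
print(without_sizing, cost(without_sizing))
```

Because every step is divided by its own item count, the cheap per-item figures for sizing and accessioning depend on the tray-based steps running at full hourly volume, which is the hidden assumption the tray-basis table exposes.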

On the nature of things singular and plural
1,130,006 dups (est.)
843,709 trays
  1.34 dups / tray
  5 dups / 4 trays
80,035 shelves
  14.12 dups / shelf
  113 dups / 8 shelves

These rates suggest several possibilities
Must be trays with multiple instances
May be large clusters of duplicates
  Hot spot mapping: less travel during retrieval
  Whole (serials) trays that can be removed: instant open space with no downstream work
  Large shelf areas reopened: faster to refill
  Large same-size batches of duplicates reduce work of sizing, our most expensive step

On the likelihood of a positive return on effort
Per Volume Costs
For 442,422 monographs:
  At $1.25: $552,893
  At $3.15: $1,392,300
Even with cost overruns up to 300% on the low end, and 40% on the high end, this is a worthwhile effort.
These are suitable projected conditions for a useful pilot project.

Space reclamation v Construction
$2,000,000 to build shelves
Marginal benefit of $600,000 to $1,450,000
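The density and return figures above can be rechecked with a few lines of arithmetic. This sketch takes the slide's own totals as inputs (the duplicate estimate, the tray and shelf counts, the two projected costs, and the $2,000,000 construction figure). The names are illustrative, and the break-even multiple is an added convenience rather than a number from the presentation.

```python
# Sketch: duplicate density and space-reclamation benefit from the slide totals.
DUPS = 1_130_006      # estimated duplicates
TRAYS = 843_709
SHELVES = 80_035

dups_per_tray = round(DUPS / TRAYS, 2)
dups_per_shelf = round(DUPS / SHELVES, 2)

# The same rates expressed in whole duplicates per group.
dups_per_4_trays = round(dups_per_tray * 4)
dups_per_8_shelves = round(dups_per_shelf * 8)

CONSTRUCTION = 2_000_000                          # cost to build shelves
projected = {"low": 552_893, "high": 1_392_300}   # project totals as stated

def marginal_benefit(project_cost):
    """Dollars saved by reclaiming space instead of building."""
    return CONSTRUCTION - project_cost

def break_even_multiple(project_cost):
    """How many times the estimate the project can cost before it matches construction."""
    return round(CONSTRUCTION / project_cost, 2)

for label, project_cost in projected.items():
    print(label, marginal_benefit(project_cost), break_even_multiple(project_cost))
```

Under these inputs the benefit comes to $1,447,107 at the low estimate and $607,700 at the high estimate, which round to the stated $1,450,000 and $600,000. The project could cost about 3.6 times the low estimate, or about 1.4 times the high estimate, before it matched the construction figure.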

In addition to cost savings:
Verify shelving accuracy (better than 99.99% confidence with a margin of error below 0.002%)
Pushes back date of next construction (frees two aisles in the existing facility)

Conclusions and Outright Speculation
Conclusion: Deduplication of fixed location storage can show a positive return on effort
Speculation:
We have the right process (plan, pilot, project)
We need to be more rigorous in planning
  Formalized and repeated at different scales
  Planning is tedious, can be protracted, but still cheap
  Professional hourly rates look expensive, but labor and unit costs accrue en masse; scale matters once again
  Labor and unit costs are hard to contain once a project starts

