Key-Value stores simple data model that maps keys to a list of values Easy to achieve Performance Fault tolerance Heterogeneity Availability due to its schema-less data model and fine granularity partitioning of the data
Google BigTable Google use the key-value paradigm to map URLs to multidimensional data, such as: Timestamps/Versions Rank Keywords Links No explicit ordering is needed on keys since a hash function is used
MapReduce Many data distributed over hundreds or thousands of machines These data need to be processed and produce new data (usually the process is a simple task) ex. aggregate functions, filtering, etc. MapReduce Partition the data according to pre-defined key
ex. words, URLs, etc. A master node assigns to workers specific partitions (=keys, =mappings of data to keys) The worker will produce a new list of keyvalue, corresponding to the new intermediate processed data (Map phase) Workers then will gather all intermediate data belonging to a specific key, and reduce them to the requested output ordered by key (Reduce phase) MapReduce map (in_key, in_value) -> list(out_key, intermediate_value) reduce (out_key, list(intermediate_value)) -> list(out_value) k, v k', v' map map k1,v1 k2,v2
... reduce reduce r1 r2 ... r1 r2 ... The dirty little secret of Google that is too obvious is ... Hadoop Map/Reduce open source implementation of the map/reduce idea takes care of scheduling tasks, monitoring them
and re-executes the failed tasks a single master JobTracker and one slave TaskTracker per cluster-node HadoopDB HadoopDB: Database Connector extends Hadoop's InputFormat class connects to a database, executes the SQL query and returns results as key-value pairs should support any JDBC-compliant database that resides in the cluster HadoopDB: Catalog
The catalog maintains meta-information about the databases connection parameters such as database location, driver class and credentials metadata such as data sets contained in the cluster, replica locations, and data partitioning properties HadoopDB: Data Loader responsible for: globally repartitioning data on a given partition key upon loading
breaking apart single node data into multiple smaller partitions or chunks and finally bulk-loading the single-node databases with the chunks The [not so] easy way to do the labwork... HadoopDB with MonetDB instead of PostgreSQL read the HadoopDB paper download HadoopDB hook in MonetDB define a benchmark
do experiments write an excellent report! implementation details+experiments The [not so] fun way to do the labwork... Combine something of the following: Map/Reduce URLs DHTs - Chord N-gram Strings Strings
M5 module DHTs a traditional key-value store based on a distributed hash table there are no master nodes keys are distributed to nodes according to a hash function values are retrieved with O(logN) messages by employing routing tables Chord protocol
keys are assigned an identifier: hash(key) Each peer maintains a routing table (finger table) to route lookups peers are assigned an identifier: hash(IP) store and retrieve pairs of (key, data): lookup(key) peer6 peer1 V+2^0 peer2 V+2^1 peer4 V+2^2 peer5 peer5 Chord Ring modulo 2^m peer2 peer4 peer3
The [not so] fun way to do the labwork... Design the BAT representation of distributed KeyValue store module Develop a wrapper around it to (simulate) parallel behavior define a benchmark do experiments write an excellent report! implementation details + experiments What is the interface of the KV
ICT Standards Policy 2011. Objectives. Improve . the effective functioning of ICT sector. through adoption of high-quality and interoperable standards for ICT infrastructures in order to increase the efficiency of service delivery.
MIC at a Glance Mission Assist businesses in seizing market opportunities Assist the Taiwanese government in formulating industry development strategies Since 1987, MIC has been serving as a strategic think thank for senior decision makers in industries, public sectors, academic...
Modality principle . Although multimedia only combines 2 of the senses, we use this method quite often. Cognitive psychologist Richard Mayer studied 3 different groups. One group who received info via hearing, one group via sight, and one group with...
study abroad - to live, travel, and take classes in a foreign country - for a semester or even a year. Knowing a second language such as SPANISH increases the opportunity to immerse oneself in a foreign language and culture.
Hebrews 12:1-2. 1 Therefore we also, since we are surrounded by so great a cloud of witnesses, let us lay aside every weight, and the sin which so easily ensnares us, and let us run with endurance the race that...
Digital Video Disc or Digital Versatile Disc (DVD) DVD is an optical media standard that can be used to store large amounts of different types of data (computer data, video, audio). ... The NTFS file system automatically detects bad sectors...
Adam Gouker A2LA, EMC Program Manager ... Telecom 88 Bluetooth 10 CTIA 23 Product Safety (electrical) 68 VCCI 67 SAR 13 WiMAX 4 Energy Star - 23 (1 CB) TCB's - 8 Upcoming A2LA Training November 14-15 (SC) - Intro...
Ready to download the document? Go ahead and hit continue!