Package: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 21712
Depends: adduser, sun-java6-jre
Recommends: hadoop-native
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/hadoop_0.18.3-6cloudera0.3.0~lenny_all.deb
Size: 12455396
SHA256: f6afb269a8dcc6b76de9902cd6f9454071194698636df7f8e5557d8a174fb223
SHA1: 5f972e8a9181580b87f22998ce70e8af5319cfcb
MD5sum: 9d7d01fb011f28e83ce646e0e6fb84e0
Description: A software platform for processing vast amounts of data
 Hadoop is a software platform that lets one easily write and run
 applications that process vast amounts of data.
 .
 Here's what makes Hadoop especially useful:
  * Scalable: Hadoop can reliably store and process petabytes.
  * Economical: It distributes the data and processing across clusters
    of commonly available computers. These clusters can number into
    the thousands of nodes.
  * Efficient: By distributing the data, Hadoop can process it in
    parallel on the nodes where the data is located. This makes it
    extremely rapid.
  * Reliable: Hadoop automatically maintains multiple copies of data
    and automatically redeploys computing tasks based on failures.
 .
 Hadoop implements MapReduce, using the Hadoop Distributed File System
 (HDFS). MapReduce divides applications into many small blocks of work.
 HDFS creates multiple replicas of data blocks for reliability, placing
 them on compute nodes around the cluster. MapReduce can then process
 the data where it is located.

Package: hadoop-conf-pseudo
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 224
Depends: hadoop (= 0.18.3-6cloudera0.3.0~lenny)
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/hadoop-conf-pseudo_0.18.3-6cloudera0.3.0~lenny_all.deb
Size: 93454
SHA256: a072644d1d86f4e9d96d98357e1b878b13e2e18efa10c12783b365c0aec83905
SHA1: 03ed128fac16d871b36f9ce10a34cc836943dc5f
MD5sum: ec906bb16c4075b1ecba6c348205f9b7
Description: Pseudo-distributed Hadoop configuration
 Contains configuration files for a "pseudo-distributed" Hadoop
 deployment. In this mode, each of the hadoop components runs as a
 separate Java process, but all on the same machine.

Package: hadoop-datanode
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 132
Depends: hadoop (= 0.18.3-6cloudera0.3.0~lenny)
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/hadoop-datanode_0.18.3-6cloudera0.3.0~lenny_all.deb
Size: 79580
SHA256: a38de0c89b29e5f6283b62ed9a3417a5f152a661167181ad8d4eda6e6eb33df8
SHA1: 3eed39d9e32e972f590c76424fc4c408ba88e59a
MD5sum: 2036176f07f9ebc357f91e09e32cc3af
Description: Data Node for Hadoop
 The Data Nodes in the Hadoop Cluster are responsible for serving up
 blocks of data over the network to Hadoop Distributed Filesystem
 (HDFS) clients.

Package: hadoop-doc
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 37052
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: doc
Filename: pool/contrib/h/hadoop/hadoop-doc_0.18.3-6cloudera0.3.0~lenny_all.deb
Size: 5718880
SHA256: 093a506b327f890e2d4d7ad7f59850a5fccff1e491fee1e243a4d9e157e5442a
SHA1: b09b14f2b8c54d37a7a09da3012e26d2f936c3ab
MD5sum: f04e6698d1ed1b88e1e6c5d8b2ba2fea
Description: Documentation for Hadoop
 This package contains the Java Documentation for Hadoop and its
 relevant APIs.

Package: hadoop-jobtracker
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 132
Depends: hadoop (= 0.18.3-6cloudera0.3.0~lenny)
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/hadoop-jobtracker_0.18.3-6cloudera0.3.0~lenny_all.deb
Size: 79616
SHA256: 14305b8d31cba62dba166975a4eecb8555ac09e08a7ecda21cda69a1414dc092
SHA1: 43ef3abb06d4e998ce0358ef943cb0fee9ef9acc
MD5sum: cbb4546329e6a9b9ff704598dc297104
Description: Job Tracker for Hadoop
 The jobtracker is a central service which is responsible for managing
 the tasktracker services running on all nodes in a Hadoop Cluster.
 The jobtracker allocates work to the tasktracker nearest to the data
 with an available work slot.

Package: hadoop-namenode
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 132
Depends: hadoop (= 0.18.3-6cloudera0.3.0~lenny)
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/hadoop-namenode_0.18.3-6cloudera0.3.0~lenny_all.deb
Size: 79576
SHA256: 9229904698f3dc55bfb8bf52c0257a183b7938834823f3f641b45dc71b0a4486
SHA1: ef161672f47640856e4daceec3c1fff958fe3922
MD5sum: 89d5aec034bd516438693112d1699324
Description: Name Node for Hadoop
 The Hadoop Distributed Filesystem (HDFS) requires one unique server,
 the namenode, which manages the block locations of files on the
 filesystem.

Package: hadoop-native
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: i386
Maintainer: Todd Lipcon
Installed-Size: 212
Depends: libc6 (>= 2.7-1), hadoop (= 0.18.3-6cloudera0.3.0~lenny), liblzo2-2, libz1
Enhances: hadoop
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/hadoop-native_0.18.3-6cloudera0.3.0~lenny_i386.deb
Size: 93034
SHA256: 0c2acaecda837ac74eb77d42911ec15be0ac483f27f89dfc1d6bb9c469a4f235
SHA1: d026a9eeddc573c72db703f89058c59d1626cd97
MD5sum: 68f38fa13ab717644cc08be1ba46c72f
Description: Native libraries for Hadoop (e.g., compression)
 This optional package contains native libraries that increase the
 performance of Hadoop's compression.

Package: hadoop-pipes
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: i386
Maintainer: Todd Lipcon
Installed-Size: 320
Depends: hadoop (= 0.18.3-6cloudera0.3.0~lenny)
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/hadoop-pipes_0.18.3-6cloudera0.3.0~lenny_i386.deb
Size: 134302
SHA256: 0198e7f7bcb721f0def951939ff9be0451c43839b38e41eb8d504cde6f8f94a5
SHA1: 890d86a81b12a688f6025a6e2b03baceadabe374
MD5sum: 0e27ec894e0eb2e5b79c3e5cf7dc2f70
Description: Interface to author Hadoop MapReduce jobs in C++
 Contains Hadoop Pipes, a library which allows Hadoop MapReduce jobs to
 be written in C++.

Package: hadoop-secondarynamenode
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 132
Depends: hadoop (= 0.18.3-6cloudera0.3.0~lenny)
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/hadoop-secondarynamenode_0.18.3-6cloudera0.3.0~lenny_all.deb
Size: 79608
SHA256: 26c614e71a933d963cff43e6c00b7c2581cf45b4c141a0bc22c8211ce1b5b800
SHA1: ec0191ef6b5595f582810dfc0535197269bc95c0
MD5sum: 186b91b84be3fd65877b44588cbbf707
Description: Secondary Name Node for Hadoop
 The Secondary Name Node is responsible for checkpointing file system
 images. It is _not_ a failover pair for the namenode, and may safely
 be run on the same machine.

Package: hadoop-source
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 45564
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/hadoop-source_0.18.3-6cloudera0.3.0~lenny_all.deb
Size: 16291114
SHA256: d9f0cb8c3191b6d85a822fcce66ddf1e17a70dd6309d1c38f351f2fd21864a61
SHA1: 0e332749b4de04fc354a0f4c870c7a9efb537ac3
MD5sum: b294788f74b8e669ccd50586126b0fb9
Description: Source code for Hadoop
 This package contains the source code for Hadoop and its contrib
 modules.

Package: hadoop-tasktracker
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 132
Depends: hadoop (= 0.18.3-6cloudera0.3.0~lenny)
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/hadoop-tasktracker_0.18.3-6cloudera0.3.0~lenny_all.deb
Size: 79584
SHA256: d6d7dc76e2f47286e60830ed1ab63f3d484016c963dceecaff7bbe709cfc3be1
SHA1: a46f44e3b8cdc2153736f835386c2b8344831c8e
MD5sum: 14c38fc41d7ccf1fb1d353e3666ea221
Description: Task Tracker for Hadoop
 The Task Tracker is the Hadoop service that accepts MapReduce tasks
 and computes results. Each node in a Hadoop cluster that should be
 doing computation should run a Task Tracker.

Package: hive
Version: 0.3.0-0cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 10996
Depends: java6-jre | java6-sdk, hadoop
Homepage: http://hadoop.apache.org/hive/
Priority: extra
Section: misc
Filename: pool/contrib/h/hive/hive_0.3.0-0cloudera0.3.0~lenny_all.deb
Size: 9506218
SHA256: f89faa4840205749294009c5af876938f5ab7695779034f9641ce7cd64bead4d
SHA1: 89156ad1bea0e54dc67c7e165e3489a3eda43e94
MD5sum: 8711163cdbbfc6d2b766733667aff9db
Description: A data warehouse infrastructure built on top of Hadoop
 Hive is a data warehouse infrastructure built on top of Hadoop that
 provides tools to enable easy data summarization, ad hoc querying and
 analysis of large datasets stored in Hadoop files. It provides a
 mechanism to put structure on this data and it also provides a simple
 query language called Hive QL which is based on SQL and which enables
 users familiar with SQL to query this data. At the same time, this
 language also allows traditional map/reduce programmers to plug in
 their custom mappers and reducers to do more sophisticated analysis
 which may not be supported by the built-in capabilities of the
 language.

Package: libhdfs0
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: i386
Maintainer: Todd Lipcon
Installed-Size: 156
Depends: hadoop (= 0.18.3-6cloudera0.3.0~lenny), libc6 (>= 2.7-1)
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: misc
Filename: pool/contrib/h/hadoop/libhdfs0_0.18.3-6cloudera0.3.0~lenny_i386.deb
Size: 90138
SHA256: a5b96e11c06ab4cb2591a46c2ca931ec67d55d956ee007b486ec34cbc096fa1e
SHA1: 03989a9dfddf17ae6b91b19700ec6ba858baac7f
MD5sum: ec45e4244bc5cd9bc22bd1dcaa3e67f1
Description: JNI Bindings to access Hadoop HDFS from C
 See http://wiki.apache.org/hadoop/LibHDFS

Package: libhdfs0-dev
Source: hadoop
Version: 0.18.3-6cloudera0.3.0~lenny
Architecture: i386
Maintainer: Todd Lipcon
Installed-Size: 156
Depends: hadoop (= 0.18.3-6cloudera0.3.0~lenny), libhdfs0 (= 0.18.3-6cloudera0.3.0~lenny)
Homepage: http://hadoop.apache.org/core/
Priority: extra
Section: libdevel
Filename: pool/contrib/h/hadoop/libhdfs0-dev_0.18.3-6cloudera0.3.0~lenny_i386.deb
Size: 86342
SHA256: 723021555d5be85da74a5c88ed86931adccc810e1443f63cf1e984d3718a3c25
SHA1: ea1b4bda94e70387896dca861d99c9ad92d8c5a0
MD5sum: f3a67fe9446cdb9897ec261ae1ea5e64
Description: Development support for libhdfs0
 Includes examples and header files for accessing HDFS from C

Package: pig
Version: 0.2.0-0cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 42348
Depends: java6-jre | java6-sdk, hadoop
Homepage: http://hadoop.apache.org/pig/
Priority: extra
Section: misc
Filename: pool/contrib/p/pig/pig_0.2.0-0cloudera0.3.0~lenny_all.deb
Size: 16054166
SHA256: 7f05ede05d6c71fd8a5431523e65af939a86bc58d2a5a7dc0a2523e2a89aab95
SHA1: eef6e8ef15617295d46bc88ddaeb7ec9a00535d3
MD5sum: 8379351b4c4536e31d1e3ecf7f887438
Description: A platform for analyzing large data sets using Hadoop
 Pig is a platform for analyzing large data sets that consists of a
 high-level language for expressing data analysis programs, coupled
 with infrastructure for evaluating these programs. The salient
 property of Pig programs is that their structure is amenable to
 substantial parallelization, which in turn enables them to handle
 very large data sets.
 .
 At the present time, Pig's infrastructure layer consists of a compiler
 that produces sequences of Map-Reduce programs, for which large-scale
 parallel implementations already exist (e.g., the Hadoop subproject).
 Pig's language layer currently consists of a textual language called
 Pig Latin, which has the following key properties:
 .
  * Ease of programming
    It is trivial to achieve parallel execution of simple,
    "embarrassingly parallel" data analysis tasks. Complex tasks
    comprised of multiple interrelated data transformations are
    explicitly encoded as data flow sequences, making them easy to
    write, understand, and maintain.
  * Optimization opportunities
    The way in which tasks are encoded permits the system to optimize
    their execution automatically, allowing the user to focus on
    semantics rather than efficiency.
  * Extensibility
    Users can create their own functions to do special-purpose
    processing.

Package: python-hive
Source: hive
Version: 0.3.0-0cloudera0.3.0~lenny
Architecture: all
Maintainer: Todd Lipcon
Installed-Size: 688
Depends: python, python-support (>= 0.7.1)
Provides: python2.4-hive, python2.5-hive
Homepage: http://hadoop.apache.org/hive/
Priority: extra
Section: python
Filename: pool/contrib/h/hive/python-hive_0.3.0-0cloudera0.3.0~lenny_all.deb
Size: 49444
SHA256: 81910b98a14ca61ea9fdeb95b5fcddd486ea5d95c0a1141768513e4fed0b7946
SHA1: 6240ddd35056e6a7ba7b62b8bafbf727759f51f1
MD5sum: 773cfb2bd36f679736b48c4ff4998674
Description: Python client library to talk to the Hive Metastore
 This is a generated Thrift client to talk to the Hive Metastore.