Reprints from my posting to SAN-Tech Mailing List and ...


[san-tech][03244] "Apache Hadoop: The Scalability Update", June 2011, ;login:, The USENIX Magazine

Date: Mon, 04 Jul 2011 06:11:19 +0900
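In ;login:, The USENIX Magazine, June 2011 (Volume 36, Number 3), the article

"Apache Hadoop: The Scalability Update"
 Konstantin V. Shvachko
 Principal Hadoop architect at eBay

has been made publicly available (several other pieces from the issue are public as well; URLs are at the end of this post).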

  "Scalability is one of the primary forces driving popularity and
   adoption of the Apache Hadoop project. A typical use case for Hadoop
   is an emerging Web site starting to run a five-node Hadoop cluster
   and then gradually increasing it to hundreds of nodes as business

   Last year ;login: published my article [12] summarizing one aspect of
   Hadoop scalability, namely, the limits of scalability of the Hadoop
   Distributed File System [13]. There are many other dimensions to
   Hadoop scalability. Here I would like to address some of them."

Size Matters
*)Yahoo reportedly ran numerous clusters having 4000+ nodes with four
  1 TB drives per node, 15 PB of total storage capacity, 70 million files,
  and 80 million blocks using 50 GB NameNode heap. (I believe these figures date from 2008.)
Note: the slides for [13] (2010) give:
  25 PB of application data across 25,000 servers
  Largest cluster is 4,000 servers

*)Facebook's 2000-node warehouse cluster [2] is provisioned for 21 PB
  of total storage capacity. Extrapolating the announced growth rate,
  its namespace should have close to 200 million objects (files + blocks)
  by now, but an immense 108 GB heap should allow room for close to 400
  million objects.

*)eBay runs a 700-node cluster. Each node has 24 TB of local disk storage,
  72 GB of RAM, and a 12-core CPU. Total cluster size is 16 PB. It is
  configured to run 26,000 MapReduce tasks simultaneously.
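
Note: as a rough sanity check of the figures quoted above, the following minimal Python sketch multiplies node counts by per-node disk and divides NameNode heap by object counts. It is my own back-of-envelope arithmetic, not something from the article. The helper names are hypothetical, decimal units (1 PB = 1000 TB, 1 GB = 10**9 bytes) are an assumption, and the Facebook object count is the "room for" figure rather than a measured namespace size.

```python
# Back-of-envelope check of the cluster figures quoted above.
# Assumption: decimal units (1 PB = 1000 TB, 1 GB = 10**9 bytes).
# Helper names are hypothetical and chosen only for this illustration.

def raw_capacity_pb(nodes, tb_per_node):
    """Raw disk capacity in PB for a cluster of identical nodes."""
    return nodes * tb_per_node / 1000


def heap_bytes_per_object(heap_gb, objects):
    """NameNode heap spread evenly over namespace objects (files + blocks)."""
    return heap_gb * 10**9 / objects


# Yahoo: 4000 nodes (the article says "4000+"), four 1 TB drives per node.
yahoo_pb = raw_capacity_pb(4000, 4 * 1)
# eBay: 700 nodes with 24 TB of local disk each.
ebay_pb = raw_capacity_pb(700, 24)

# Yahoo: 50 GB heap for 70 million files + 80 million blocks.
yahoo_bpo = heap_bytes_per_object(50, 70e6 + 80e6)
# Facebook: 108 GB heap with room for close to 400 million objects.
facebook_bpo = heap_bytes_per_object(108, 400e6)

print(f"Yahoo raw capacity:  {yahoo_pb:.1f} PB (quoted: 15 PB)")
print(f"eBay raw capacity:   {ebay_pb:.1f} PB (quoted: 16 PB)")
print(f"Yahoo heap/object:    {yahoo_bpo:.0f} bytes")
print(f"Facebook heap/object: {facebook_bpo:.0f} bytes")
```

The raw capacities come out slightly above the quoted totals (16 PB and 16.8 PB), which is consistent with the quoted numbers being rounded or measured in different units. The heap-per-object values (roughly 333 and 270 bytes) are only averages implied by the quoted figures.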

[2] D. Borthakur, "Facebook Has the World's Largest Hadoop Cluster!": 

[12] K.V. Shvachko, "HDFS Scalability: The Limits to Growth," ;login:, 
     vol. 35, no. 2, April 2010, pp. 6-16.
Note: the author is listed there as Principal software engineer at Yahoo!

[13] K.V. Shvachko, H. Kuang, S. Radia, and R. Chansler, 
"The Hadoop Distributed File System," Proceedings of Symposium on 
 Mass Storage Systems and Technologies, May 2010.
Note: the affiliation at that time was Yahoo!

;login:, The USENIX Magazine, June 2011 (Volume 36, Number 3)
Note: the FAST '11 conference reports and other items from this issue are also publicly available.
"FAST '11: 9th USENIX Conference on File and Storage Technologies"
