Reprints from my posting to SAN-Tech Mailing List and ...

2011/07/31

[san-tech][03314] "How Facebook moved 30 petabytes of Hadoop data", Jul. 27, 2011, GigaOM

Date: Sat, 30 Jul 2011 21:03:25 +0900
--------------------------------------------------
Another Hadoop-related item:

"How Facebook moved 30 petabytes of Hadoop data"
 Jul. 27, 2011, GigaOM
  http://gigaom.com/cloud/facebook-hadoop-cluster/

  "... Today, it shared the latest of those innovations - moving its
   whopping 30-petabyte cluster from one data center to another."

  "... The move was necessary because Facebook had run out of both
   power and space to expand the cluster - very likely the largest
   in the world - and had to find it a new home. Yang writes that
   there were two options, physical migration of the machines or
   replication, and Facebook chose replication to minimize downtime."


Below is the report (the original source) from the Facebook engineer in charge:

"Moving an Elephant: Large Scale Hadoop Data Migration at Facebook"
 2011/07/27, Paul Yang
  http://www.facebook.com/notes/paul-yang/moving-an-elephant-large-scale-hadoop-data-migration-at-facebook/10150246275318920

.....
  "... By March 2011, the cluster had grown to 30 PB - that's 3,000 times
   the size of the Library of Congress! At that point, we had run out of
   power and space to add more nodes, necessitating the move to a larger
   data center."
.....
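(The comparison works out to about 30 PB / 3,000 = 10 TB per Library
of Congress, in line with the ~10 TB figure commonly cited for its
print collection.)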

Replication It Is
  "Once the required systems were developed, the replication approach
   was executed in two steps. First, a bulk copy transferred most of
   the data from the source cluster to the destination." ..... "After
   the bulk copy was done, file changes after the start of the bulk copy
   were copied over to the destination cluster through the new
   replication system. File changes were detected through a custom
   Hive plug-in that recorded the changes to an audit log. The replication
   system continuously polled the audit log and copied modified files
   so that the destination would never be more than a couple of hours
   behind. The plug-in recorded Hive metadata changes as well, so that
   metadata modifications such as the last accessed time of Hive tables
   and partitions were propagated. Both the plug-in and the replication
   system were developed in-house by members of the Hive team."
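
The post does not include the plug-in or replication code, but the
design it describes (poll an append-only audit log, re-copy the files
it names) is easy to sketch. The Python below is a minimal
illustration, not Facebook's in-house system: the audit-log format,
host names, and the use of stock "hadoop distcp" as the copy
mechanism are all assumptions.

  #!/usr/bin/env python3
  # Sketch of an audit-log-driven replication loop. All names are
  # hypothetical; the real plug-in and replication system are
  # unpublished, in-house Facebook code.
  import subprocess
  import time

  POLL_INTERVAL = 60  # seconds between polls of the audit log

  def read_new_entries(last_id):
      # Assumed audit-log format: one "<id>\t<hdfs path>" line per
      # file or metadata change, appended by the Hive plug-in.
      entries, newest = [], last_id
      with open("audit.log") as log:
          for line in log:
              entry_id, path = line.rstrip("\n").split("\t", 1)
              if int(entry_id) > last_id:
                  entries.append(path)
                  newest = int(entry_id)
      return entries, newest

  def copy_to_destination(path):
      # Re-copy one changed file; "hadoop distcp -update" is the
      # stock Hadoop copier, standing in here for Facebook's faster
      # in-house replication mechanism.
      subprocess.run(
          ["hadoop", "distcp", "-update",
           "hdfs://source-nn" + path,
           "hdfs://dest-nn" + path],
          check=True)

  def replication_loop():
      last_id = 0
      while True:
          changed, last_id = read_new_entries(last_id)
          for path in changed:
              copy_to_destination(path)
          time.sleep(POLL_INTERVAL)

  if __name__ == "__main__":
      replication_loop()

Continuously draining a change log like this is what let the
destination stay "never more than a couple of hours behind" without
ever locking the source cluster.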

  "At the final migration switchover time, we set up camp in a war room
   and shut down Hadoop JobTracker so that new files would not be created.
   Then, the replication system was allowed to catch up." ..... "Once
   replication was caught up, both clusters were identical, and we changed
   the DNS entries so that the hostnames referenced by Hadoop jobs pointed
   to the servers in the new cluster. We started the JobTracker in the
   new data center, and the jobs were able to run as usual, with
   no modifications required." .....
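
The switchover itself then reduces to a short, ordered sequence:
stop job submission, drain the replication backlog, flip DNS,
restart. A sketch under the same caveats (host names, the DNS
command, and the backlog check are hypothetical stand-ins; only the
step ordering comes from the post):

  #!/usr/bin/env python3
  # Sketch of the final migration switchover sequence.
  import subprocess
  import time

  def run(host, *cmd):
      # Run a command on a remote host over ssh.
      subprocess.run(["ssh", host, *cmd], check=True)

  def replication_backlog():
      # Hypothetical: audit-log entries written but not yet copied,
      # tracked as two counter files kept by the replication system.
      with open("last_logged_id") as logged, \
           open("last_copied_id") as copied:
          return int(logged.read()) - int(copied.read())

  def switchover():
      # 1. Shut down the JobTracker so no new files are created.
      run("source-jt", "hadoop-daemon.sh", "stop", "jobtracker")

      # 2. Let the replication system catch up on the backlog.
      while replication_backlog() > 0:
          time.sleep(30)

      # 3. Change DNS so the hostnames referenced by Hadoop jobs
      #    point at the new cluster (command name is hypothetical).
      run("dns-server", "repoint-hadoop-alias", "dest-cluster")

      # 4. Start the JobTracker in the new data center.
      run("dest-jt", "hadoop-daemon.sh", "start", "jobtracker")

  if __name__ == "__main__":
      switchover()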

Size Matters
   ........
Move Fast, Replicate Things
   ........

  "The next set of challenges for us include providing an ability to support
   a data warehouse that is distributed across multiple data centers.
   If you're interested in working on these and other "petascale" problems
   related to Hadoop, Hive, or just large systems, come join Facebook's data
   infrastructure team!"

Careers | Facebook
  https://www.facebook.com/careers/

