Date: Sat, 30 Jul 2011 21:03:25 +0900
"How Facebook moved 30 petabytes of Hadoop data"
Jul. 27, 2011, GigaOM
"... Today, it shared the latest of those innovations - moving its
whopping 30-petabyte cluster from one data center to another."
"... The move was necessary because Facebook had run out of both
power and space to expand the cluster - very likely the largest
in the world - and had to find it a new home. Yang writes that
there were two options, physical migration of the machines or
replication, and Facebook chose replication to minimize downtime."
Below is the report from the Facebook engineer responsible (the original source):
"Moving an Elephant: Large Scale Hadoop Data Migration at Facebook"
2011/07/27, Paul Yang
"... By March 2011, the cluster had grown to 30 PB - that's 3,000 times
the size of the Library of Congress! At that point, we had run out of
power and space to add more nodes, necessitating the move to a larger
data center."
Replication It Is
"Once the required systems were developed, the replication approach
was executed in two steps. First, a bulk copy transferred most of
the data from the source cluster to the destination." ..... " After
the bulk copy was done, file changes after the start of the bulk copy
were copied over to the destination cluster through the new
replication system. File changes were detected through a custom
Hive plug-in that recorded the changes to an audit log. The replication
system continuously polled the audit log and copied modified files
so that the destination would never be more than a couple of hours
behind. The plug-in recorded Hive metadata changes as well, so that
metadata modifications such as the last accessed time of Hive tables
and partitions were propagated. Both the plug-in and the replication
system were developed in-house by members of the Hive team."
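The incremental step described above - a custom plug-in appends file changes to an audit log, and a replication system polls that log and copies modified files so the destination stays at most a couple of hours behind - can be sketched roughly as follows. This is a minimal illustration, not Facebook's actual in-house system: the audit-log format (one relative path per line), the byte-offset bookkeeping, and the local-filesystem copy standing in for an HDFS transfer are all assumptions made here for clarity.

```python
import shutil
import time
from pathlib import Path

def read_new_entries(audit_log: Path, offset: int):
    """Return (entries, new_offset): audit-log lines appended past byte `offset`.

    Assumed log format (hypothetical): one relative file path per line.
    """
    with audit_log.open("rb") as f:
        f.seek(offset)
        data = f.read()
    entries = [line.strip() for line in data.decode().splitlines() if line.strip()]
    return entries, offset + len(data)

def replicate_once(audit_log: Path, src_root: Path, dst_root: Path, offset: int) -> int:
    """Copy every file named in new audit entries from source to destination.

    Returns the new log offset so the next poll resumes where this one stopped.
    """
    entries, offset = read_new_entries(audit_log, offset)
    for rel in entries:
        src, dst = src_root / rel, dst_root / rel
        if src.exists():  # the file may have been deleted since it was logged
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also propagates mtime metadata
    return offset

def poll_loop(audit_log: Path, src_root: Path, dst_root: Path, interval: float = 5.0):
    """Continuously poll the audit log so the destination lags only slightly."""
    offset = 0
    while True:
        offset = replicate_once(audit_log, src_root, dst_root, offset)
        time.sleep(interval)
```

The key design point the post hints at is that the poller is stateless apart from its log offset: as long as the offset is tracked, a bulk copy plus repeated catch-up passes converge on an identical destination cluster.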
"At the final migration switchover time, we set up camp in a war room
and shut down Hadoop JobTracker so that new files would not be created.
Then, the replication system was allowed to catch up." ..... "Once
replication was caught up, both clusters were identical, and we changed
the DNS entries so that the hostnames referenced by Hadoop jobs pointed
to the servers in the new cluster. We started the JobTracker in the
new data center, and the jobs were able to run as usual, with
no modifications required." .....
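The cutover paragraph describes a strict ordering: freeze writes, let replication drain, flip DNS, restart jobs. A tiny sketch of that sequence, with each step as a caller-supplied callable (the function and parameter names here are hypothetical, not from the post):

```python
def switchover(stop_jobtracker, catch_up_replication, update_dns, start_jobtracker):
    """Run the final-migration cutover steps in the order the post describes."""
    stop_jobtracker()       # freeze the source: no new files can be created
    catch_up_replication()  # drain remaining audit-log entries; clusters now identical
    update_dns()            # point the hostnames Hadoop jobs use at the new cluster
    start_jobtracker()      # jobs resume in the new data center, unmodified
```

Because jobs reference hostnames rather than addresses, the DNS change alone redirects them, which is why no job modifications were required.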
Move Fast, Replicate Things
"The next set of challenges for us include providing an ability to support
a data warehouse that is distributed across multiple data centers.
If you're interested in working on these and other "petascale" problems
related to Hadoop, Hive, or just large systems, come join Facebook's data
....."
Careers | Facebook