Date: Fri, 09 Oct 2009 22:20:48 +0100
------------------------------------------------
2011/06/13
[san-tech][01877] Re: Report on DRAM reliability
------------------------------------------------
Dr. Bianca Schroeder, who previously studied disk reliability in Professor
Gibson's group at CMU, has published a report on DRAM reliability:
"DRAM errors in the wild: A Large-Scale Field Study."
B. Schroeder, E. Pinheiro, W.-D. Weber. Sigmetrics/Performance 2009
http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
ABSTRACT
"The goal of this paper is to answer questions such as the following:
How common are memory errors in practice? What are their
statistical properties? How are they affected by external factors,
such as temperature and utilization, and by chip-specific factors,
such as chip density, memory technology and DIMM age?"
As for whose data they analyzed:
1. INTRODUCTION
"This paper provides the first large-scale study of DRAM memory
errors in the field. It is based on data collected from Google's
server fleet over a period of more than two years making up many
millions of DIMM days."
Google comes up because the two co-authors are from Google:
Eduardo Pinheiro
http://research.google.com/pubs/author1777.html
Wolf-Dietrich Weber
http://research.google.com/pubs/author10649.html
Moreover, both are co-authors of a paper that drew a strong response:
"Failure Trends in a Large Disk Drive Population", Eduardo Pinheiro,
Wolf-Dietrich Weber, Luiz Andre Barroso, 5th USENIX Conference on
File and Storage Technologies (FAST 2007)
http://research.google.com/pubs/pub32774.html
Schroeder's own presentation at the same FAST 2007:
"Disk failures in the real world: What does an MTTF of 1,000,000 hours
mean to you?", Bianca Schroeder, Garth Gibson.
http://www.cs.toronto.edu/~bianca/papers/fast07.pdf
FAST 2007 Technical Session
http://www.usenix.org/events/fast07/tech/
Bianca Schroeder, Assistant professor
Computer Science Department
University of Toronto
http://www.cs.toronto.edu/~bianca/
PDSI at CMU: Analyzing Failure Data
http://www.pdl.cmu.edu/PDSI/FailureData/index.html
SIGMETRICS/Performance 2009, June 15 - 19, 2009
http://conferences.sigmetrics.org/sigmetrics/2009/
The talk reportedly received the Sigmetrics Best Presentation Award,
but the presentation slides do not appear to be publicly available.
I learned of the above from James Hamilton's blog:
"You really DO need ECC Memory", October 7, 2009
http://perspectives.mvdirona.com/2009/10/07/YouReallyDONeedECCMemory.aspx
Also mentioned in that entry:
"Ten Ways to Waste a Parallel Computer",
Katherine Yelick (Professor, U.C. Berkeley and Director of NERSC),
Keynote, ISCA 2009, June 22, 2009
http://isca09.cs.columbia.edu/ISCA09-WasteParallelComputer.pdf
Katherine Yelick
http://www.cs.berkeley.edu/~yelick/
ISCA 2009 presentation materials:
http://isca09.cs.columbia.edu/papers.html
Although the page is titled "Papers", the papers themselves are not posted; only the presentation slides are.
------------------------------------------------
[san-tech][03151] "Survey of Error and Fault Detection Mechanisms", Technical report, April 2011