Reprints from my posting to SAN-Tech Mailing List and ...

2011/06/13

[san-tech][01874] Report on DRAM reliability

Date: Fri, 09 Oct 2009 22:20:48 +0100
------------------------------------------------
2011/06/13
[san-tech][01877] Re: Report on DRAM reliability
------------------------------------------------
Dr. Bianca Schroeder, who previously studied disk reliability in
Prof. Gibson's group at CMU, has published a report on DRAM reliability:

"DRAM errors in the wild: A Large-Scale Field Study."
 B. Schroeder, E. Pinheiro, W.-D. Weber. Sigmetrics/Performance 2009
  http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
ABSTRACT
"The goal of this paper is to answer questions such as the follow-
 ing: How common are memory errors in practice? What are their
 statistical properties? How are they affected by external factors,
 such as temperature and utilization, and by chip-specific factors,
 such as chip density, memory technology and DIMM age?"
As for whose data they analyzed:

1. INTRODUCTION
"This paper provides the first large-scale study of DRAM memory
 errors in the field. It is based on data collected from Google's
 server fleet over a period of more than two years making up many
 millions of DIMM days."

Why Google? Because the two co-authors are from Google:
Eduardo Pinheiro
  http://research.google.com/pubs/author1777.html
Wolf-Dietrich Weber
  http://research.google.com/pubs/author10649.html
Moreover, both are co-authors of the widely discussed paper
"Failure Trends in a Large Disk Drive Population",  Eduardo Pinheiro,
 Wolf-Dietrich Weber, Luiz Andre Barroso, 5th USENIX Conference on
 File and Storage Technologies (FAST 2007)
  http://research.google.com/pubs/pub32774.html
Dr. Schroeder's presentation at the same FAST 2007:
"Disk failures in the real world: What does an MTTF of 1,000,000 hours
 mean to you?", Bianca Schroeder, Garth Gibson.
  http://www.cs.toronto.edu/~bianca/papers/fast07.pdf

FAST 2007 Technical Session
  http://www.usenix.org/events/fast07/tech/


Bianca Schroeder, Assistant professor
Computer Science Department
University of Toronto
  http://www.cs.toronto.edu/~bianca/
PDSI at CMU: Analyzing Failure Data
  http://www.pdl.cmu.edu/PDSI/FailureData/index.html

SIGMETRICS/Performance 2009, June 15 - 19, 2009
  http://conferences.sigmetrics.org/sigmetrics/2009/
She reportedly received the Sigmetrics Best Presentation Award, but
the presentation slides do not appear to be publicly available.


I learned of the above from James Hamilton's blog:
"You really DO need ECC Memory", October 7, 2009
  http://perspectives.mvdirona.com/2009/10/07/YouReallyDONeedECCMemory.aspx
That entry also introduces:
"Ten Ways to Waste a Parallel Computer",
 Katherine Yelick (Professor, U.C. Berkeley and Director of NERSC),
 Keynote, ISCA 2009, June 22, 2009
  http://isca09.cs.columbia.edu/ISCA09-WasteParallelComputer.pdf
Katherine Yelick
  http://www.cs.berkeley.edu/~yelick/

ISCA 2009 presentation materials:
  http://isca09.cs.columbia.edu/papers.html
Although the page is titled "Papers", the papers themselves are not
posted; only the presentation slides are.
------------------------------------------------
[san-tech][03151] "Survey of Error and Fault Detection Mechanisms", Technical report, April 2011
