Date: Sun, 05 Jun 2011 07:35:15 +0900
--------------------------------------------------
Here is a survey report that should be useful for a basic understanding of
trends in system fault tolerance and resiliency:
"Survey of Error and Fault Detection Mechanisms"
Ikhwan Lee, ..... Mattan Erez, The University of Texas at Austin
Technical report TR-LPH-2011-002, April 2011 (24 pages)
http://lph.ece.utexas.edu/merez/uploads/MattanErez/detection_mechanisms_TR_LPH_2011_002.pdf
Abstract
"This report describes diverse error detection mechanisms that can be
utilized within a resilient system to protect applications against
various types of errors and faults, both hard and soft. These
detection mechanisms have different overhead costs in terms of energy,
performance, and area, and also differ in their error coverage,
complexity, and programmer effort.
In order to achieve the highest efficiency in designing and running
a resilient computer system, one must understand the trade-offs among
the aforementioned metrics for each detection mechanism and choose
the most efficient option for a given running environment. To
accomplish such a goal, we first enumerate many error detection
techniques previously suggested in the literature."
1 Introduction
2 Failure Mechanisms
3 Detection Mechanisms for Memory
  3.1 Information Redundancy
  3.2 Cache Memory Error Protection
  3.3 Main Memory Error Protection
4 Detection Mechanisms for Compute
  4.1 Circuit-level Techniques
  4.2 Architecture-level Techniques
    4.2.1 Code-based Techniques
    4.2.2 Execution Redundancy
  4.3 Software Systems
  4.4 Application-level Techniques
    4.4.1 Algorithmic Based Fault Tolerance (ABFT)
    4.4.2 Assertion and Sanity-Based Fault Tolerance
  4.5 Hybrid Techniques
5 System-Level Detection Mechanisms
  5.1 Detection at the Core Level
  5.2 Detection at the System Level
    5.2.1 Detecting Network Failures
    5.2.2 Detecting Node Failures
6 Conclusion
Acknowledgements
References [1] - [81]
Mattan Erez, Assistant Professor
Electrical and Computer Engineering Department,
The University of Texas at Austin
http://lph.ece.utexas.edu/merez/MattanErez/Home
Dr. Erez conducts research on System Resiliency, Reliability, and
Dependability, among other topics.
Research
http://lph.ece.utexas.edu/merez/MattanErez/Research
Even before publishing the above technical report, he had been pursuing
interesting-looking research, for example:
"Virtualized and Flexible ECC for Main Memory"
Doe Hyun Yoon, and Mattan Erez
Fifteenth International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS'10)
http://lph.ece.utexas.edu/merez/uploads/MattanErez/vecc_asplos_2010.pdf
http://lph.ece.utexas.edu/merez/uploads/MattanErez/vecc_asplos_2010.pptx
==================================================
[san-tech][01874] Report on DRAM reliability
[san-tech][01877] Re: Report on DRAM reliability