Date: Mon, 09 Apr 2012 13:58:42 +0900
-------------------------------------------------
This is the CY 2011 (2011/01/01-12/31) Operational Assessment Report (OAR)
of the Oak Ridge Leadership Computing Facility:
"High Performance Computing Facility Operational Assessment,
CY 2011 Oak Ridge Leadership Computing Facility"
February 2012, 89 pages
U.S. Department of Energy, Office of Science
http://www.osti.gov/bridge/product.biblio.jsp?osti_id=1036210
Oak Ridge National Laboratory's Leadership Computing Facility (OLCF)
http://www.olcf.ornl.gov/
http://science.energy.gov/ascr/facilities/olcf/
Related facility:
Argonne Leadership Computing Facility (ALCF)
http://www.alcf.anl.gov/
http://science.energy.gov/ascr/facilities/alcf/
Related program:
Innovative & Novel Computational Impact on Theory and Experiment (INCITE)
http://science.energy.gov/ascr/facilities/incite/
"INCITE in Review", March, 2012
http://science.energy.gov/~/media/ascr/pdf/program-documents/docs/INCITE_IR.pdf
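The report sets a CHARGE QUESTION for each of User Results / Business Results /
Strategic Results / Innovation / Risk Management / Summary of the Proposed
Metric Values. Each chapter answers its question at the outset and then
lays out the supporting data.
The CHARGE QUESTIONs, plus terms and figures that caught my eye while skimming: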
EXECUTIVE SUMMARY
"Oak Ridge National Laboratory's Leadership Computing Facility (OLCF)
continues to deliver the most powerful resources in the U.S. for
open science. At 2.33 petaflops peak performance, the Cray XT Jaguar
delivered more than 1.4 billion core hours in calendar year (CY) 2011
to researchers around the world ..."
....
"Effective operations of the OLCF play a key role in the scientific
missions and accomplishments of its users. This Operational Assessment
Report (OAR) will delineate the policies, procedures, and innovations
implemented by the OLCF to continue delivering a petaflop-scale resource
for cutting-edge research. This report covers CY 2011 that unless
otherwise specified, denotes January 1, 2011 through December 31, 2011."
.....
Communications with Key Stakeholders
Communication with the Program Office
Communication with the User Community
Communication with the Vendors
Communication with Advisory Groups
Summary of 2011 Metrics
Responses to Recommendations from the Previous 2011 Operational Assessment Review
User Results
CHARGE QUESTION 1:
Are the processes for supporting the customers, resolving problems,
and outreach effective?
1.1 User Results Summary
1.2 User Support Metrics
1.2.1 Overall Satisfaction Rating for the Facility
1.2.2 Average Rating across All User Support Questions
1.4 Problem Resolution Metrics
1.4.1 Problem Resolution Metric Summary
Figure 1.1. Number of Helpdesk Tickets Issued per Month
Figure 1.2. Categorization of Helpdesk Tickets
1.5 User Support and Outreach
1.5.3 Scientific Computing Liaisons
Responding to Time-Critical Needs
the OLCF's rapid response to the Fukushima nuclear accident
Table 1.9. Training Event Summary
1.6 User Support Conclusion (Page 26)
user satisfaction (4.2/5.0)
user services (4.1/5.0)
problem resolution (4.2/5.0)
to address user problems within 3 business days
OLCF training effort and rated it a 4.2/5.0
Business Results
CHARGE QUESTION 2:
Is the facility maximizing the use of its HPC systems and other
resources consistent with its mission?
2.2 Cray XT Compute Partition Summary
Table 2.3. OLCF Business Results Summary for HPC Systems
Cray XT/HPSS: Scheduled Availability/MTTI/MTTF/Total Usage etc.
2.3 Resource Availability
2.3.1 Scheduled Availability
2.3.2 Overall Availability
Increasing System Availability
Figure 2.1. Eliminating VRM failures increases system stability
2.3.3 Mean time to Interrupt
2.3.4 Mean Time to Failure
2.4 Resource Utilization
2.4.1 Total System Utilization
Table 2.10. 2011 OLCF System Utilization
Figure 2.2. 2011 XT5 Resource Utilization - Core Hours by Program
2.5 Capability Utilization
Figure 2.3. Effective Scheduling Policy Enables Leadership-class Usage
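As a reading aid for the availability items listed under 2.3 (Scheduled Availability, Overall Availability, MTTI, MTTF), here is a minimal sketch of how such metrics are typically computed from an outage log. The formulas follow the definitions as I understand them to be used in DOE operational assessments; that is an assumption on my part, so check them against the report itself. The names `Outage` and `availability_metrics` are hypothetical, and the sample numbers are made up, not taken from the OAR.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Outage:
    hours: float      # duration of the outage in hours
    scheduled: bool   # True for planned maintenance, False for a failure


def availability_metrics(period_hours: float, outages: List[Outage]) -> Dict[str, float]:
    """Compute availability-style metrics for one reporting period.

    Assumed definitions (not quoted from the report):
      scheduled availability = uptime / (period - scheduled outage time)
      overall availability   = uptime / period
      MTTI = uptime / (number of all outages + 1)
      MTTF = (period - unscheduled outage time) / (number of unscheduled outages + 1)
    """
    sched_hours = sum(o.hours for o in outages if o.scheduled)
    unsched = [o for o in outages if not o.scheduled]
    unsched_hours = sum(o.hours for o in unsched)
    uptime = period_hours - sched_hours - unsched_hours
    return {
        "scheduled_availability": uptime / (period_hours - sched_hours),
        "overall_availability": uptime / period_hours,
        "mtti_hours": uptime / (len(outages) + 1),
        "mttf_hours": (period_hours - unsched_hours) / (len(unsched) + 1),
    }


# Illustrative input only: a 1000-hour period with one planned and one unplanned outage.
example = availability_metrics(1000.0, [Outage(40.0, True), Outage(10.0, False)])
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

The point of keeping scheduled and unscheduled outages separate is that scheduled availability does not penalize the facility for planned maintenance, while overall availability does.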
Strategic Results
CHARGE QUESTION 3:
Is OLCF enabling scientific achievements consistent with
the Department of Energy Strategic Goal 2, which is to
"maintain a vibrant U. S. effort in science and engineering as
a cornerstone of our economic prosperity and clear leadership
in strategic areas?"
3.1 Science Output
Table 3.1. List of OLCF Publications
3.2 Scientific Accomplishments
3.3 Accomplishments in Energy Systems Research
3.4 Allocation of Facility Director's Reserve
3.4.1 Director's Discretionary Program
Table 3.2. Director's Discretionary Program: Domain Allocation Distribution
Table 3.3. Director's Discretionary Program: Awards and User Demographics
3.4.2 Industrial HPC Partnerships Program
Table 3.4. Industry Projects at the OLCF
Innovation
CHARGE QUESTION 4:
What innovations have been implemented that have improved
the facility's operations?
4.1 Application Readiness
4.2 Application Support
4.3 Outreach
4.4 Systems
Breaking Bottlenecks in Parallel I/O - Innovative Systems
"I/O Congestion Avoidance via Routing and Object Placement"
2011 Cray User Group meeting
http://info.ornl.gov/sites/publications/Files/Pub30140.pdf
Intuitive Data Portal for Collaborative Climate Science
- Innovative Systems
Real-time Monitoring of Simulations through an Integrated Dashboard
- Innovative Systems
4.5 Leadership
Empowering a Sustainable Lustre Ecosystem through OpenSFS
- Innovative Leadership
4.6 Energy Management
Effects of CRU Top Hats on Air Flow - Innovative Energy Management
Risk Management
CHARGE QUESTION 5:
Is the Facility effectively managing risk?
5.1 Risk Management
5.2 Major Risks Tracked in the Current Year
Summary of the Proposed Metric Values
CHARGE QUESTION 6:
Are the performance metrics used for the review year and proposed
for future years sufficient and reasonable for assessing Operational
performance?
The OLCF provides (below) a summary table of the metrics and actuals
for 2011, and proposed metrics and targets for 2012 and 2013.