Date: Wed, 10 Feb 2010 17:41:27 +0900
--------------------------------------------------
2012/05/01
"Practical experiences with OS Jitter"
Feb 09, 2012, IBM developerWorks Wikis
https://www.ibm.com/developerworks/wikis/display/LinuxP/Practical+experiences+with+OS+Jitter
"OS Jitter Mitigation Techniques"
Feb 09, 2010, IBM developerWorks Wikis
https://www.ibm.com/developerworks/wikis/display/LinuxP/OS+Jitter+Mitigation+Techniques
--------------------------------------------------
A report on OS jitter on a large-scale system (16K nodes).
(It is actually a report from the HPC Colony Project.)
"Linux OS Jitter Measurements at Large Node Counts using a BlueGene/L"
Jones, Terry R [ORNL] ;
Tauferner, Mr. Andrew [IBM T.J. Watson Research Center] ;
Inglett, Mr. Todd [IBM T.J. Watson Research Center]
Publication Date: 2010 Jan 01 (On Paper: November 30, 2009)
http://www.osti.gov/bridge/product.biblio.jsp?query_id=1&page=0&osti_id=971232
Abstract
"We present experimental results for a coordinated scheduling
implementation of the Linux operating system. Results were collected
on an IBM Blue Gene/L machine at scales up to 16K nodes. Our results
indicate coordinated scheduling was able to provide a dramatic
improvement in scaling performance for two applications characterized
as bulk synchronous parallel programs."
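Why OS noise matters so much for bulk synchronous parallel programs can be
illustrated with a back-of-the-envelope model. This is a sketch, not the
paper's methodology: it assumes each node is interrupted independently with
probability p during a compute phase, and that a synchronization step is
delayed whenever at least one node is interrupted. The function name and the
value of p are hypothetical.

```python
def prob_step_delayed(p_node: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is
    interrupted during a compute phase, given per-node probability `p_node`."""
    return 1.0 - (1.0 - p_node) ** nodes


if __name__ == "__main__":
    p = 1e-4  # assumed per-node, per-phase interruption probability
    for n in (1, 1024, 16384):
        print(f"{n:6d} nodes: P(step delayed) = {prob_step_delayed(p, n):.4f}")
```

Under these assumptions an interruption that is rare on one node becomes
likely somewhere in the machine once the node count reaches the thousands.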
Operating systems (kernels) compared
Kernel 1: Blue Gene/L Compute Node Kernel (CNK)
"One of CNK's principal design points was to avoid OS noise.
It runs one process at a time; therefore it does not need to
perform time-slicing or preemptive multitasking."
"This static memory map completely avoids TLB misses ..."
Kernel 2: Colony Linux Kernel with unmodified Scheduler
Linux version 2.6.16
"A console driver and RAS driver were added in addition to various
changes to support the BlueGene/L platform. The default 4KB pages
were replaced with 64KB pages."
Kernel 3: Colony Linux Kernel with Coordinated Scheduler
"Two /proc interfaces were created and the scheduler was modified
to give priority to the HPC applications in a coordinated fashion."
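The idea behind coordinating the scheduler across nodes can be sketched with
a toy model. This is not the Colony implementation, whose /proc interfaces
and scheduler changes are not described here. It assumes every node runs
system work in exactly one time slot per period, and that a tightly coupled
application makes progress only in slots where all nodes are free. All names
and parameters are hypothetical.

```python
import random
from typing import List, Tuple


def useful_fraction(offsets: List[int], period: int) -> float:
    """Fraction of slots per period in which no node runs system work.

    Node i is busy with system work in slot `offsets[i] % period`.
    """
    busy_slots = {off % period for off in offsets}
    return (period - len(busy_slots)) / period


def compare(nodes: int, period: int, seed: int = 0) -> Tuple[float, float]:
    """Return (uncoordinated, coordinated) useful fractions."""
    rng = random.Random(seed)
    # Uncoordinated: each node picks its own slot for system work.
    uncoordinated = [rng.randrange(period) for _ in range(nodes)]
    # Coordinated: every node does its system work in the same slot.
    coordinated = [0] * nodes
    return (useful_fraction(uncoordinated, period),
            useful_fraction(coordinated, period))


if __name__ == "__main__":
    for n in (1, 16, 256, 4096):
        unco, co = compare(n, period=100)
        print(f"{n:5d} nodes: uncoordinated={unco:.2f} coordinated={co:.2f}")
```

In this model the coordinated case loses one slot per period regardless of
node count, while the uncoordinated case loses more slots as nodes are added.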
Applications used in this evaluation
Application 1: Allreduce
Application 2: glob
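For readers unfamiliar with the Allreduce collective, the following is a
generic simulation of a sum allreduce using recursive doubling. It is an
illustration only: the note does not say how the benchmark or the underlying
communication library implements the operation, and the function name is
hypothetical. Ranks are simulated as list indices in a single process.

```python
def allreduce_sum(values):
    """Simulate a recursive-doubling sum allreduce.

    `values[r]` is the contribution of rank r. Returns (results, rounds),
    where results[r] is what rank r holds afterwards. The number of ranks
    must be a power of two in this simplified version.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of ranks must be a power of two")
    current = list(values)
    distance = 1
    rounds = 0
    while distance < n:
        # In each round rank r exchanges its partial sum with rank
        # r XOR distance, and both keep the combined value.
        current = [current[r] + current[r ^ distance] for r in range(n)]
        distance *= 2
        rounds += 1
    return current, rounds


if __name__ == "__main__":
    results, rounds = allreduce_sum([1, 2, 3, 4, 5, 6, 7, 8])
    print(results, rounds)
```

Every rank waits for a partner in every round, so a delay on any one rank
propagates to all of them, which is why such a collective is sensitive to
OS jitter.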
Through a good deal of trial and error, the project is building an OS suited to large-scale systems.
(As noted below, the HPC-Colony project was awarded an INCITE 2010 allocation.)
HPC-Colony Project
http://www.hpc-colony.org/
The source code does not appear to have been released yet.
"Colony Update", Terry Jones, Principal Investigator
http://sites.google.com/site/fastos2/fastos-workshop-slides/fastos-workshop-materials/sc09-fastos-slides/Colony_SC2009_Talk.ppt?attredirects=0&d=1
(The link above is a PPT file.)
FastOS 2, Birds-of-a-Feather at Supercomputing 2009
http://sites.google.com/site/fastos2/supercomputing-2009-bof
Terry Jones, Application Performance Tools group, CSM, ORNL
http://www.csm.ornl.gov/~trj/
Terry Jones, Stanford University
http://www-cs-students.stanford.edu/~trj/
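HPC Colony was newly selected for INCITE 2010. The machine is a Cray XT5, yet
more than half of the co-investigators are from IBM. The allocation is 4,000,000 core-hours (roughly 456 years of single-core time).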
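The conversion from core-hours to years of single-core time is a simple
division, and the exact figure depends slightly on the year length assumed.
A minimal sketch (the function name and the days_per_year parameter are
hypothetical):

```python
def core_hours_to_years(core_hours: float, days_per_year: float = 365) -> float:
    """Convert core-hours to years of continuous single-core time."""
    return core_hours / (24 * days_per_year)


if __name__ == "__main__":
    allocation = 4_000_000  # INCITE allocation in core-hours
    for days in (365, 366):
        years = core_hours_to_years(allocation, days)
        print(f"{days}-day year: {years:.1f} years")
```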
Title:
"HPC Colony: Removing Scalability, Fault, and Performance
Barriers in Leadership Class Systems through Adaptive System
Software"
Principal Investigator: Terry Jones (Oak Ridge National Laboratory)
Co-Investigators
Laxmikant Kale (University of Illinois at Urbana-Champaign)
Jose Moreira (International Business Machines)
Celso Mendes, Esteban Meneses (UIUC),
Yoav Tock, Eliezer Dekel, Roie Melamed, Eli Luboshitz,
Menachem Shtalhaim, Benjamin Mandler (IBM)
Scientific Discipline: Computer Science
INCITE Allocation: 4,000,000 processor hours
Site: Oak Ridge National Laboratory
Machine (Allocation): Cray XT (4,000,000 processor hours)
[san-tech][02043] Re: US DOE INCITE 2010 AWARDS announced (10/01/26), 28 Jan 2010
2010 Awards Fact Sheet
http://www.er.doe.gov/ascr/incite/2010INCITEFactSheets.pdf