Lightweight kernel operating system

A massively parallel high-performance computing (HPC) system is particularly sensitive to operating system overhead. Traditional multi-purpose operating systems are designed to support a wide range of usage models and requirements. To do so, they provide a large number of system processes, which are often interdependent. The computing overhead of these processes makes the amount of processor time available to a parallel application unpredictable. A very common parallel programming model is the bulk synchronous parallel model, in which processors synchronize at specific points in the application code. If one processor takes longer to reach such a point than the others, all of them must wait, and the overall finish time increases. Unpredictable and frequent operating system overhead is one significant reason a processor might take longer than the others to reach the synchronization point.
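The effect of operating system overhead on a bulk synchronous application can be illustrated with a toy simulation. The sketch below is not taken from any of the systems discussed here; the function name `bsp_finish_time` and its parameters (`work`, `noise_prob`, `noise_cost`) are hypothetical, and the model assumes that each processor is independently delayed by a fixed amount with a fixed probability in every step.

```python
import random


def bsp_finish_time(num_procs, num_steps, work=1.0,
                    noise_prob=0.0, noise_cost=0.0, seed=0):
    """Toy model of a bulk synchronous parallel run (illustrative only).

    In each step every processor performs `work` units of computation.
    With probability `noise_prob` a processor is also delayed by
    `noise_cost` units, standing in for operating system overhead.
    Because all processors wait at the synchronization point, a step
    takes as long as its slowest processor.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_steps):
        step_times = [
            work + (noise_cost if rng.random() < noise_prob else 0.0)
            for _ in range(num_procs)
        ]
        # Everyone waits for the slowest processor before continuing.
        total += max(step_times)
    return total


if __name__ == "__main__":
    # Same per-processor noise rate, different machine sizes.
    for procs in (1, 10, 100, 1000):
        t = bsp_finish_time(procs, num_steps=100,
                            noise_prob=0.01, noise_cost=0.5, seed=1)
        print(procs, t)
```

Under these assumptions, a delay that affects a single processor only rarely becomes likely to affect at least one processor in every step as the processor count grows, so the finish time tends toward the worst case. This is the scaling behavior that motivates reducing operating system overhead on compute nodes.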

Custom lightweight kernel (LWK) operating systems, currently used in some of the fastest computers in the world, help alleviate this problem. The IBM Blue Gene line of supercomputers runs various versions of the Compute Node Kernel (CNK).^[1] The Cray XT4 and Cray XT5 supercomputers run Compute Node Linux.^[2] Sandia National Laboratories has an almost two-decade commitment to lightweight kernels on its high-end HPC systems.^[3] Sandia and University of New Mexico researchers began work on SUNMOS for the Intel Paragon in the early 1990s. This operating system evolved into the Puma, Cougar, and Catamount operating systems deployed on ASCI Red and Red Storm. Sandia continues its work on LWKs with a new R&D effort called Kitten.^[4]

The design goals of these operating systems are:
• Target massively parallel environments composed of thousands of processors with distributed memory and a tightly coupled network.
• Provide necessary support for scalable, performance-oriented scientific applications.
• Offer a suitable development environment for parallel applications and libraries.
• Emphasize efficiency over functionality.
• Maximize the amount of resources (e.g. CPU, memory, and network bandwidth) allocated to the application.
• Seek to minimize time to completion for the application.

References

[1] Moreira, Jose; et al. (November 2006). "Designing a Highly-Scalable Operating System: The Blue Gene/L Story". Proceedings of the 2006 ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC’06).
[2] Wallace, D. (May 2007). "Compute Node Linux: Overview, progress to date, and roadmap". Proceedings of the 2007 Cray User Group Annual Technical Conference.
[3] Riesen, Rolf; et al. (April 2009). "Designing and Implementing Lightweight Kernels for Capability Computing". Concurrency and Computation: Practice and Experience.
[4] "Kitten Lightweight Kernel".
