BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//WSSP - ECPv6.15.20//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:WSSP
X-ORIGINAL-URL:https://www.wssp.hlrs.de
X-WR-CALDESC:Events for WSSP
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:Europe/Berlin
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20240331T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20241027T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20250330T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20251026T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20260329T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20261025T010000
END:STANDARD
END:VTIMEZONE
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:20170326T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:20171029T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:20180325T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:20181028T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:20190331T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:20191027T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:20200329T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:20201025T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:20210328T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:20211031T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:20220327T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:20221030T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:20230326T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:20231029T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:20240331T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:20241027T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:20250330T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:20251026T010000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;VALUE=DATE:20250527
DTEND;VALUE=DATE:20250529
DTSTAMP:20260508T071404Z
CREATED:20250423T112900Z
LAST-MODIFIED:20250528T135251Z
UID:346-1748304000-1748476799@www.wssp.hlrs.de
SUMMARY:39th Workshop on Sustained Simulation Performance
DESCRIPTION:Agenda\nAll times are given in Central European Summer Time (CEST).  \n\n\n\n\n\nTuesday\, May 27th\, 2025\n\n\n\n\n09:15 – 09:30\nWelcome & Introduction\nMichael Resch\, High-Performance Computing Center Stuttgart\, University of Stuttgart\n\n\n09:30 – 10:00\nResearch and user support activities at Tohoku University Cyberscience Center\nHiroyuki Takizawa\, Cyberscience Center\, Tohoku University\nTohoku University Cyberscience Center has been operating vector supercomputers and assisting users in fully utilizing their potential. This presentation will report on our recent efforts to help users optimize their code for our computing system\, AOBA\, which is powered by the latest generation of NEC SX-Aurora TSUBASA\, the most powerful vector supercomputer. At both the system operation and research levels\, we are continuously exploring effective methods to make the most of vector computing technologies. Performance evaluation results demonstrate that the SX-Aurora TSUBASA achieves high sustained performance for memory-intensive applications without requiring special programming models or languages.\n\n\n10:00 – 10:30\nThe Future of HLRS\nMichael Resch\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nIn this talk we will look at the new developments in HPC and what impact they will have on HLRS. We will explore the role of AI for an HPC center and see how this will change services and operation of HLRS.\n\n\n10:30 – 11:00\nMusing on performance sustainability in the age of AI\, superchips\, APUs\, increasing TCOs and NetZero impact\nSadaf Alam\, Bristol Center of Supercomputing\, University of Bristol\nSupercomputing ecosystems have experienced considerable shifts since the early 2020s across applications and technology domains. 
HPC and supercomputing resources are increasingly being allocated to AI\, a domain where the pace of hardware changes and especially of software stack updates is considerably different compared to classic modelling and simulation HPC applications. Recently\, the fastest reported HPL system on the November 2024 Top500 list is based on an APU\, or accelerated processor unit. This talk gives an overview of Isambard-AI\, a UK national AI research resource (AIRR) comprising the Nvidia Arm-GPU superchip GH200\, its software stack for AI and HPC\, and its sustainability credentials using the Modular Data Centre (MDC) solution to manage Total Cost of Ownership (TCO) and NetZero impact.\n\n\n11:00 – 11:30\nCoffee Break\n\n\n11:30 – 12:00\nFuture Computing at HLRS\nJohannes Gebert\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nIncreasing computational capabilities enables larger and more detailed simulations\, even after decades of development. While high-performance computing (HPC) performance continues to increase\, physical and hence technological limits are on the horizon. Specialized processors may focus on certain mathematical operations in widespread software stacks\, but they also promise to accelerate the performance of potentially heterogeneous systems. Hardware improvements need software environments that allow for reasonably quick ports of existing software stacks to these new accelerator devices.\nAt HLRS\, we contribute to the HPC community by investigating computing devices’ capabilities\, testing the software stack\, and sharing our experiences and improvements on hardware architectures.\nIn this talk\, we will give an overview of the future computing program at HLRS\, its goals\, and its strategy. 
We will outline current projects and present initial results.\n\n\n12:00 – 12:30\nIntroduction and challenges of the new supercomputing system towards the Open Science era\nSusumu Date\, University of Osaka\nOsaka University has been working on the procurement of a supercomputing system to succeed OCTOPUS. We plan to complete its installation and start operation in September 2025. In this talk\, the speaker introduces and explains the specification of the new supercomputing system as well as the challenges faced in realizing a supercomputing system for the Open Science era.\n\n\n12:30 – 13:30\nLunch Break\n\n\n13:30 – 14:00\nFirst Results of GPU Porting Activities for Aeroacoustic Prediction Methods\nMatthias Meinke\, Institute of Aerodynamics\, RWTH Aachen University\ntbd\n\n\n14:00 – 14:30\nFrom PDE to x in NeoFOAM: An overview of the NeoFOAM project\nGregor Olenik\, TU München\nNeoFOAM is a platform-portable implementation of OpenFOAM’s core algorithms and data structures. It aims to bring modern software development methods to existing simulation workflows\, leveraging C++20-compliant code that is extensively unit-tested\, hardware-vendor-agnostic\, and extensible via plugins. 
This talk discusses the architecture and design choices behind NeoFOAM using neoIcoFoam as an illustrative example.\n\n\n14:30 – 15:00\nAccelerating the FlowSimulator: Speeding up the HPC codes used at DLR\nImmo Huismann\, DLR Dresden\nThis contribution summarizes activities performed at the German Aerospace Center (DLR) to speed up the high-performance computing (HPC) codes in use.\nIt starts out from the overarching goals that are to be solved via HPC\, showcases the current distribution of HPC usage at DLR and\, thereafter\, demonstrates multiple case studies for analysing and accelerating the specific codes.\nThe list of case studies for performance analysis includes\, but is not limited to\, the CFD software “CFD by ONERA\, DLR and Airbus” (CODA) and an industrial-grade aeroelastic toolchain.\nRuntimes\, profiles\, and traces are shown\, and current actions to address the underlying bottlenecks are discussed.\n\n\n15:00 – 15:30\nCoffee Break\n\n\n15:30 – 16:00\nSustainable research software for accessible high-performance computing\nMichael Schlottke-Lakemper\, High-Performance Scientific Computing\, University of Augsburg\nModern supercomputers are becoming increasingly heterogeneous\, incorporating hardware components from multiple vendors. At the same time\, high-performance computing software development has grown more collaborative\, often uniting research groups and institutions across different regions. Coupled with the constant addition of new features and performance optimizations\, this raises critical questions about sustainability: How can we handle hardware complexity\, coordinate diverse development teams\, and maintain evolving research software within an academic environment while still writing energy-efficient\, high-performance code? In this talk\, we present some of our strategies for tackling these challenges through the Trixi Framework. 
We introduce Trixi.jl\, a high-order numerical simulation environment for conservation laws built in Julia\, along with its spin-off packages (TrixiShallowWater.jl\, TrixiAtmo.jl) and its sister project\, TrixiParticles.jl. We then discuss how we handle software architectures\, automation\, code reuse\, and organizational practices to balance extensibility with accessible high performance on heterogeneous systems. Finally\, we point out remaining open questions and outline plans for future development.\n\n\n16:00 – 16:30\nInvestigating the Cerebras CS2 Chip: Mathematical and Software Engineering Goals\, Methods and Initial Results\nJonathan Schäfer\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nIn a world of ever-increasing computational demand\, with Moore’s law slowing down for conventional chips\, interest in alternative hardware for specialized mathematical operations is rising. One interesting new hardware concept is the Cerebras CS2 chip with around 850000 cores\, which is therefore called a “supercomputer on a chip”. As part of the new Future Computing Group led by Johannes Gebert and in collaboration with the Computational Mathematics Group led by Prof. Hartwig Anzt at the Technical University of Munich\, HLRS investigates the new chip from various angles\, namely from a mathematical\, hardware-oriented\, software engineering\, and user experience point of view. We present a roadmap for research as well as initial findings.\n\n\n16:30 – 17:00\nReproducible and Performance-Optimized Environments for Large-Scale Machine Learning Applications\nFelix Ruhnke\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nWith the increasing size of machine learning models for solving complex problems\, the demand for computational resources is rising significantly. This leads to a stronger convergence between the disciplines of machine learning and High-Performance Computing. 
The growing performance capabilities of modern machine learning models open up new application areas\, including safety-critical domains such as medical technology. At the same time\, the scientific community is increasingly drawing attention to a reproducibility crisis in the field of machine learning. The intersection of these three developments underscores the necessity of thoroughly investigating reproducible and performance-optimized environments for large-scale machine learning applications. In this work\, various implementations of the Message Passing Interface in containerized environments were developed to examine the impact of different communication modes on the numerical reproducibility of results. For this purpose\, benchmark tests were conducted to optimize polynomials of varying degrees on a single node with a varying number of Graphics Processing Units. The gradient aggregation algorithms Average and Adaptive Summation were employed. The results showed that variations in communication modes using the Average algorithm had no impact on numerical reproducibility. In contrast\, tests using the Adaptive Summation algorithm with varying communication modes resulted in non-reproducible outcomes. Furthermore\, it was observed that\, due to the non-associative nature of floating-point arithmetic operations and the varying execution order of computations during parallel training of machine learning models\, deviations of over 7% in the number of iterations until convergence can occur. 
Additionally\, the investigations revealed that the convergence behavior of the models during training is highly sensitive to configuration changes\, emphasizing the need for careful selection of the computing environment and precise hyperparameter adjustment.\n\n\n17:00 – 17:30\nDevelopment of dynamic resource assignment for effective system usage\nMasatoshi Kawai\, Tohoku University\nRecently\, energy efficiency\, as well as improving the parallel performance of applications\, has become important for the operation and use of supercomputers. In this presentation\, we will introduce a platform under development that provides dynamic resource assignment to improve parallel performance and energy consumption.\n\n\n\n\nWednesday\, May 28\, 2025\n\n\n\n\n09:30 – 10:00\nHammerHAI – The German AI Factory for Engineering\, Global Challenges and Industry\nBastian Koller\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nThis talk will provide insight into the German AI Factory HammerHAI\, which resulted from a European Key Initiative and which started service operations in Q1/25.\n\n\n10:00 – 10:30\nLeveraging Cloud-Native Supercomputing for AI Workflows\nDennis Hoppe\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nThis talk explores the transformative potential of cloud-native supercomputing concepts and their impact on AI workflows. While powerful and GPU-rich\, traditional High-Performance Computing systems present unique challenges when executing AI tasks. A clear paradigm shift is underway\, moving from a system-centric to a user-centric approach\, where usability\, accessibility\, and flexibility become key design criteria. 
Using the AI Factory HammerHAI as a concrete example\, the presentation will demonstrate how HammerHAI aims to lower these barriers by integrating cloud-native concepts like containerisation and orchestration\, thereby optimising supercomputing resources specifically for the practical needs of the AI community.\n\n\n10:30 – 11:00\nCoffee Break\n\n\n11:00 – 11:30\nNew Brand “NEC BluStellar” Use Case – Research Information Infrastructure (RII)\nFutoshi Tabata\, NEC Japan\ntbd\n\n\n11:30 – 12:00\nIntroduction of Confidential Computing in HPC: Usage Model\, Use Cases and Challenges\nKamil Tokmakov\, NEC\nHPC data centres accommodate users from various domains\, each with varying security requirements. Sensitive data processing\, such as for medical use cases or intellectual property protection\, requires stricter security measures\, including encryption across all data states: at-rest\, in-motion and in-use. While parallel file systems found in HPC\, such as GPFS and Lustre\, already offer encryption of data at-rest and in-motion\, encryption keys and sensitive data are not yet fully protected in memory. Confidential computing addresses such protection of data in-use by performing computations in trusted execution environments\, where data and code are secured at the hardware level. This talk introduces confidential computing in the context of HPC\, covering its use cases and challenges.\n\n\n12:00 – 12:30\nNEC SX-Aurora TSUBASA – Our best friend for a long time\nChristoph Wenzel\, Institut für Aero- und Gasdynamik (IAG)\, University of Stuttgart\nFor a long time\, the NEC SX-Aurora TSUBASA has been a valuable addition to the HPE Apollo (Hawk) system at HLRS. Even though its size was not sufficient for large-scale production runs for the fundamental research on turbulent boundary layers with direct numerical simulation (DNS)\, Aurora has still played a central role in our research pipeline. 
In this talk\, our group’s experience working with the SX-Aurora platform will be presented\, highlighting its integration into our workflows. Additionally\, a performance study of our in-house DNS code NS3D on Aurora will be presented\, providing insights into the computational efficiency achieved on Aurora.\n\n\n12:30 – 13:30\nLunch Break\n\n\n13:30 – 14:00\nSuper-resolution Reconstruction of Three-dimensional Vorticity Fields by Latent Diffusion Models\nMitsuo Yokokawa\, Tohoku University\ntbd\n\n\n14:00 – 14:30\nEnabling AMD APU Support for CalculiX CrunchiX: A Port to the Instinct MI300A\nBenjamin Schnabel\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nCalculiX CrunchiX is an open-source finite-element analysis (FEA) application featuring both implicit and explicit solvers\, written in C\, C++ and Fortran 77 and maintained by Guido Dhondt since 1998. Its flexible architecture allows it to interface with a variety of sparse linear solvers\, such as an iterative Cholesky solver\, SPOOLES\, or Intel oneMKL PARDISO\, to solve structural mechanics problems. Recently\, heterogeneous computing with GPUs has emerged as a key strategy to accelerate the solution of linear systems of equations in scientific applications. While CalculiX already supports NVIDIA GPUs through the PaStiX solver and the CUDA library\, there is currently no counterpart for AMD architectures. In this work\, we describe the design\, implementation\, and optimization of a new backend for CalculiX that targets the AMD Instinct MI300A APU.\nOur implementation is deployed and benchmarked on HLRS’s newest flagship supercomputer\, an HPE Cray EX4000 system (Hunter).\n\n\n14:30 – 15:00\nPower and Performance\nNico Formanek\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nThe biggest driver of computing performance is hardware improvements. Even though the relative energy efficiency (e.g. 
FLOPS/watt) is improving at an almost exponential rate\, this does not translate into any reduction in absolute energy consumption. Rebound effects like this have been studied in economics since Jevons (1865)\, but there is still no consensus on whether economic growth can be decoupled from absolute energy consumption. Here I will argue that we face a similar problem in computing\, i.e. that performance cannot be decoupled from absolute energy input. This in turn casts doubt on the feasibility of sustainability efforts like the GREENER (2023) principles. I will close by evaluating several accounts of why we still could want to improve performance even in the light of such hard tradeoffs.\n\n\n15:00 – 15:30\nCoffee Break\n\n\n15:30 – 16:00\nTowards scientific foundation models\nSteffen Staab\, University of Stuttgart\nFoundation models are machine-learned models that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. Foundation models have been used successfully for question answering and text generation (ChatGPT)\, image understanding (CLIP\, ViT)\, or image generation. Recently\, the basic idea underlying foundation models has been considered for learning scientific foundation models that capture expectations about partial differential equations. Existing scientific foundation models have still been very much limited with respect to the types of PDEs or differential operators. In this talk\, I present some of our recent work on paving the way towards scientific foundation models\, which aims to make them more robust and more generalisable.\n\n\n16:00 – 16:30\nQuantum Computing Status – Technologies\, Benchmarks and Use Cases in Academic and Industry Fields\nShintaro Momose\, NEC Japan\ntbd\n\n\n16:30 – 16:45\nFarewell\nMichael Resch\, High-Performance Computing Center Stuttgart\, University of Stuttgart\n\n\n\n 
URL:https://www.wssp.hlrs.de/events/39th-workshop-on-sustained-simulation-performance/
LOCATION:HLRS\, Nobelstraße 19\, Stuttgart\, Baden-Württemberg\, 70569\, Germany
ORGANIZER;CN="Mr. Johannes Gebert":MAILTO:gebert@hlrs.de
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20240617
DTEND;VALUE=DATE:20240619
DTSTAMP:20260508T071404Z
CREATED:20240527T091834Z
LAST-MODIFIED:20240618T075544Z
UID:292-1718582400-1718755199@www.wssp.hlrs.de
SUMMARY:37th Workshop on Sustained Simulation Performance
DESCRIPTION:Agenda\nAll times are given in Central European Summer Time (CEST).  \n\n\n\n\n\nMonday\, June 17th\, 2024\n\n\n\n\n10:00 – 10:15\nWelcome & Introduction\nMichael Resch\, High-Performance Computing Center Stuttgart\, University of Stuttgart\n\n\n10:15 – 10:45\nOperational experience of the latest-generation SX-Aurora TSUBASA system\, AOBA-S\nHiroyuki Takizawa\, Cyberscience Center\, Tohoku University\nTohoku University Cyberscience Center started operation of the world’s largest SX-Aurora TSUBASA system\, named AOBA-S\, in August 2023. This talk reports the experience of operating the AOBA-S system while showing some performance evaluation results and discussions. Several important applications have already been optimized for AOBA-S\, and the performance evaluation results clearly suggest the potential of the latest-generation vector engines adopted in AOBA-S. This talk also introduces our research projects that have recently started in collaboration with AOBA users.\n\n\n10:45 – 11:15\nPerformance of a direct numerical simulation code for isothermal compressible turbulence on the SX-Aurora TSUBASA\nMitsuo Yokokawa\, Kobe University\nA direct numerical simulation code for compressible turbulent flows under isothermal conditions in a box with periodic boundary conditions was developed. A finite difference method is used for the discretization of the governing equations. In particular\, an eighth-order compact difference scheme was used for the convective terms\, and Mattor’s method\, a parallel solver for linear systems with a tridiagonal matrix\, was used to compute the first-order derivative of the convective terms. Performance of the DNS code was measured on the SX-Aurora TSUBASA.\n\n\n11:15 – 11:45\nAim and Strategy of mdx2\, an IaaS-type Computing Infrastructure\nSusumu Date\, Osaka University\nThe Cybermedia Center at Osaka University installed an IaaS-type computing infrastructure in March 2024 and will soon start the IaaS service. 
In this talk\, the speaker will present the background\, aim\, and strategy of this system installation and give an overview of the system’s structure.\n\n\n11:45 – 12:15\nImproving Efficiency of the Monte Carlo Method via a Code Intrinsic Framework\nQifeng Pan\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nThe Monte Carlo (MC) method is widely used in many engineering fields\, especially in uncertainty quantification\, due to its robustness and simplicity. However\, the existing execution pattern of MC suffers from low efficiency and scaling problems in high-performance computing (HPC). In this talk\, the speaker will introduce the code intrinsic framework designed to tackle the efficiency problem of MC in HPC. The basic idea of the code intrinsic framework is to reduce the redundant calculations of MC and increase the code vectorization rate. Numerical results show that performance improvements can be achieved on various platforms\, including the Intel and SX-Aurora TSUBASA machines.\n\n\n\n12:15 – 13:15\nLunch Break\n\n\n\n13:15 – 13:45\nDEPO Meets Mechanics: A Case Study on Dynamic Power Capping for Energy Efficiency\nJohannes Gebert\, Jonathan Schäfer\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nMoore’s law is slowing down while HPC centers’ energy consumption keeps increasing. For cost and climate reasons\, developing techniques for reducing energy usage while at least maintaining performance is sensible. DEPO\, a software-agnostic\, node-level\, and dynamic power-capping approach by Krzywaniak\, Czarnul\, and Proficz (2022)\, promises to achieve these goals. We apply their approach to real-world challenges based on an FEA application. The Direct Tensor Computation (DTC) of Ralf Schneider and Johannes Gebert is run with DEPO to demonstrate the tool’s applicability. 
We explore different points in the input configuration space of the application and investigate the impact on energy consumption under power capping. Furthermore\, we present and discuss different ways to continue and expand this research.\n\n\n\n13:45 – 14:15\nConnecting Software Methods and Data-Driven Methods\nSabine Roller\, German Aerospace Center\, TU Dresden\n\n\n14:15 – 14:45\nEvaluating a Real-Time Lossy Array Compression Algorithm for Computer Simulations\nDarjan Krijan\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nComputer simulations that were previously regarded as CPU-bound are gradually becoming memory-bound\, as the growth in memory bandwidth cannot keep up with the much higher advancements in raw computational power. This imbalance is quantified with a relative factor of approximately 5.1 per decade since the 1990s\, where a rise in memory bandwidth is met with a 5.1-times increase in relative computing power. In practical terms\, comparing a NEC SX-4 from 1994 that operated at a balanced computational intensity of 0.125 FLOP/Byte with an Intel Ponte Vecchio accelerator from 2023 that operates at 15.9 FLOP/Byte shows a factor of 127 in the described imbalance. Mixed-precision approaches that were traditionally used to speed up the throughput of calculations inside the CPU core on a cache/register level now provide speedup by demanding less memory bandwidth. Approaches to compress arrays in a lossless or lossy manner to reduce memory bandwidth were implemented in LLNL’s zfp library\, although it is not able to process the data in real time. 
A similar approach targeting a real-time lossy array compression (RTLAC) algorithm is currently being evaluated for use in highly memory-bound computer simulations at HLRS.\n\n\n14:45 – 15:15\nCoffee Break\n\n\n15:15 – 15:45\nDirect Numerical Simulation of Turbulent Boundary Layers – On the Road to High Reynolds Numbers\nChristoph Wenzel\, Institute of Aerodynamics and Gas Dynamics\, University of Stuttgart\n\n\n\n15:45 – 16:15\nData Management for HPC Workloads at Scale: Case Studies on Data Structure and Compression\nGregor Weiß\, High-Performance Computing Center Stuttgart\, University of Stuttgart\n\n\n\n16:15 – 16:45\nCloud-resolving global weather simulation with the Model for Prediction Across Scales (MPAS) – A case study\nThomas Schwitalla\, University of Hohenheim\n\n\n\n16:45 – 17:15\nThe Quest for Sustained Performance on Heterogeneous Exascale Architectures with the Climate Model ICON\nPanagiotis Adamidis\, Deutsches Klimarechenzentrum Hamburg\n\n\n\n18:30\nDinner\n\n\n\n\nTuesday\, June 18\, 2024\n\n\n\n\n09:00 – 09:30\nChallenges for HLRS ahead\nMichael Resch\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nThis talk summarizes the challenges that lie ahead for HLRS in the coming decade. It looks into the challenges that we face when changing technology from CPU to GPU. It will also address the issue of AI as a new user community in HPC. \n\n\n09:30 – 10:00\nAnalyses of Turbomachinery and Heat Transfer Cases using HPC Systems\nMatthias Meinke\, Institute of Aerodynamics\, RWTH Aachen University\nThis presentation will highlight computational methodologies and results from two industrial applications related to the field of turbomachinery and steel casting. 
Details of the computational methods\, featuring adaptive mesh refinement\, dynamic load balancing and multigrid methods\, will be presented together with the required HPC resources.\n\n\n10:00 – 10:30\nAlgorithmic Differentiation of Geometric Modelling Libraries Aimed at Gradient-Based Shape Optimization\nMladen Banovic\, German Aerospace Center\, TU Dresden\n\n\n\n10:30 – 11:00\nCoffee Break\n\n\n\n11:00 – 11:30\nML for Computational Science: Machine Learning Models to Accelerate Simulation Science on HPC\nMakoto Takamoto\, NEC Germany\n\n\n\n11:30 – 12:00\nA Constraint Partition Method for Combinatorial Optimization Problems\nKazuhiko Komatsu\, Cyberscience Center\, Tohoku University\nIn recent years\, Ising machines have attracted much attention due to their potential in solving combinatorial optimization problems that are challenging for conventional computers. For optimization problems formulated into quadratic unconstrained binary optimization (QUBO) problems with constraints\, known as constraint problems\, an objective function and constraint function\, along with penalty coefficients\, are combined into a single Hamiltonian of a QUBO problem. However\, when solving constraint problems\, solution accuracy typically degrades because excessively large penalty coefficients are required to avoid constraint violations. As a result\, minimizing the objective function becomes challenging. To solve this issue\, the presentation introduces a method that partitions constraint functions and reduces penalty coefficient values. 
Performance evaluation using the traveling salesperson problem (TSP) with one-hot constraints illustrates that the proposed method enhances solution accuracy compared to conventional approaches.\n\n\n12:00 – 12:30\nAn Analysis of Kernel Performance when Porting a CFD Code to Nvidia GPUs using Different Programming Models\nPaul Saumet\, High-Performance Computing Center Stuttgart\, University of Stuttgart\n\n\n\n12:30 – 13:30\nLunch\n\n\n13:30 – 14:00\nAI for HPC: Optimising Operations\nRishabh Saxena\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nIn the last few years\, Artificial Intelligence and Machine Learning have been dominant topics in computer science and the wider scientific community. Since machine learning-based algorithms require huge amounts of data to be processed at large scale\, high-performance hardware is needed for such computations. In this context\, HPC can provide a suitable platform for various aspects of the ML pipeline\, from data pre-processing to the deployment of models in production environments. 
In this talk\, we will look at how HLRS is working towards optimizing its systems for AI and ML workloads\, which aspects of ML pipelines are relevant for HPC\, how traditional HPC workloads\, like simulations\, can be integrated into ML pipelines\, and the near-term outlook for AI on HPC at HLRS.\n\n\n14:00 – 14:30\nModelling of an Offshore Wind Park with OpenFOAM\nFlavio Galeazzo\, High-Performance Computing Center Stuttgart\, University of Stuttgart\n\n\n\n14:30 – 15:00\nHANAMI: Advancing Supercomputing Collaboration Between Europe and Japan\nSophia Honisch\, High-Performance Computing Center Stuttgart\, University of Stuttgart\nThe HANAMI project\, a strategic alliance between Europe and Japan\, aims to innovate high-performance computing (HPC) applications for next-generation supercomputers across various scientific domains. This collaboration focuses on enhancing simulation capabilities in environmental sciences\, biomedicine\, and materials science. Aligned with the EuroHPC Joint Undertaking\, HANAMI will port existing code\, evaluate application performance on new architectures\, and facilitate access to advanced supercomputing resources like Fugaku and EuroHPC systems.\n\n\n15:00 – 15:45\n\nFarewell\n\n\n\n  \n  \n 
URL:https://www.wssp.hlrs.de/events/37th-workshop-on-sustained-simulation-performance/
LOCATION:HLRS\, Nobelstraße 19\, Stuttgart\, Baden-Württemberg\, 70569\, Germany
ORGANIZER;CN="Mr. Johannes Gebert":MAILTO:gebert@hlrs.de
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20230413
DTEND;VALUE=DATE:20230415
DTSTAMP:20260508T071404
CREATED:20230203T095652Z
LAST-MODIFIED:20230704T074522Z
UID:229-1681344000-1681516799@www.wssp.hlrs.de
SUMMARY:35th Workshop on Sustained Simulation Performance
DESCRIPTION:Agenda\nAll times are given in Central European Summer Time (CEST). \nThe registration is closed. \n\n\n\n\n\nThursday\, April 13\, 2023\n\n\n\n\n10:00 – 10:15\nWelcome & Introduction\nMichael Resch\, HLRS\, University of Stuttgart\n\n\n10:15 – 10:45\nLessons Learned from A Quantum-Annealing Assisted HPC R&D Project\nHiroaki Kobayashi\, Tohoku University\n\n\n10:45 – 11:15\nNEC’s Quantum Computing Strategy\, Technology\, and Use Cases\nShintaro Momose\, NEC Corporation \n\n\n\nThis presentation consists of two parts\, discussing the SX-Aurora TSUBASA vector supercomputer and introducing a simulated annealer running on SX-Aurora TSUBASA\, called Aurora Vector Annealing. The first half of the presentation shows the vector architecture of SX-Aurora TSUBASA\, especially its latest vector processors\, which provide the highest level of memory bandwidth. Sustained performance and power efficiency are also discussed\, as well as NEC’s future plans and roadmap. The second half of the presentation shows NEC’s quantum computing strategies and its products to provide higher sustained performance in the annealing/optimization fields. NEC developed Aurora Vector Annealing as a simulated annealer and has a strong business relationship with D-Wave\, which provides a quantum annealer. NEC aims to solve various social issues by using quantum/simulated annealing technologies and by developing a hybrid platform combining a supercomputer with a quantum/simulated annealer to provide much higher sustained performance. \n\n\n\n\n\n\n11:15 – 11:45\nTowards Science DMZ based on Accelerated ONION using DTN\nSusumu Date\, Osaka University\nThe speaker introduces what is happening at Osaka University towards the promotion and advancement of Data-driven Scientific Research. 
In this talk\, the experience of using DTN is first introduced\, and then a future direction for a compute infrastructure composed of supercomputers (SQUID and OCTOPUS) and a data infrastructure is explained based on that experience.\n\n\n11:45 – 13:15\nLunch\n\n\n13:15 – 13:45\nSPH-EXA: A Framework for Scalable\, Flexible\, and Extensible Astrophysical and Cosmological Simulations – Slides\nFlorina M. Ciorba\, Department of Mathematics and Computer Science\, University of Basel \n\n\n\nSPH-EXA is a highly scalable and extensible simulation framework for astrophysical and cosmological simulations. It is co-designed by computational scientists (astrophysicists and cosmologists) and computer scientists (high-performance computing) to achieve scientific advances in astrophysics\, cosmology\, and high-performance computing\, for highly scalable simulations using Smoothed Particle Hydrodynamics. SPH-EXA includes highly optimized and parallelized hydrodynamics and gravity solvers. It supports new particle types and particle data fields that can be combined with custom-made\, simulation-derived observable properties and in-situ data analysis. It requires minimal software dependencies\, provides scalable parallelization and communication support\, and offers portability with optimizations for recent CPUs and GPUs. This design relieves potential users from architectural specifics and performance concerns when performing and scaling up simulations with SPH-EXA. This talk will describe the SPH-EXA framework\, its approach to domain decomposition\, parallelization\, gravity and hydrodynamics solvers\, and their flexible use as pluggable components. A scalable initial conditions generator\, test cases and simulations will also be presented. SPH-EXA is extensible with additional physical effects through the use of propagators\, whereby the pluggable framework components are easily customized or implemented from scratch using the provided building blocks in an abstract\, efficient\, and scalable way. 
\n\n\n\n\n\n\n13:45 – 14:15\nPrediction and Mitigation of Aeroacoustic Noise on HPC Systems\nMatthias Meinke\, Ansgar Niemöller\, Miro Gondrum\, Zhe Yang\, Wolfgang Schröder\, Institute of Aerodynamics\, RWTH Aachen University\, Germany\n\n\n14:15 – 14:45\nReal-time flood inundation simulation on SX-Aurora TSUBASA\nHiroyuki Takizawa\, Yoichi Shimomura\, Akihiro Musa\, Yoshihiko Sato\, Atuhiko Konja\, Guoqing Cui\, Rei Aoyagi\, and Keichi Takahashi\, Tohoku University\nA real-time flood inundation simulation based on the Rainfall-Runoff Inundation (RRI) model is memory-intensive\, and SX-Aurora TSUBASA is hence a promising platform for executing the simulation in time\, due to the high sustained memory bandwidth provided by its vector processors. This talk will report how the real-time simulation has been successfully migrated to and optimized for SX-Aurora TSUBASA. Assuming a shared computing system such as Supercomputer AOBA installed at Tohoku University\, a resource demand estimation method is developed to minimize the amount of shared computing resources used for prediction in order to reduce the impact on other users sharing the system. The evaluation results show that SX-Aurora TSUBASA with only 32 cores can meet the real-time simulation requirement of simulating 7-hour flood inundation for the Tohoku region of Japan within 20 minutes\, and that the resource demand estimation method can adaptively adjust the amount of computing resources used for the real-time simulation.\n\n\n14:45 – 15:15\nBreak\n\n\n15:15 – 15:45\nCompetence Centres and Centres of Excellence within the European Strategy – Slides\nBastian Koller\, HLRS\, University of Stuttgart\n\n\n15:45 – 16:15\nPower capping in high performance computing – experiences and prospects\nPawel Czarnul\, Adam Krzywaniak and Jerzy Proficz\, Gdańsk University of Technology\nIn this work we investigate usage\, limitations and prospects of power capping in high performance computing (HPC). 
Specifically\, we discuss APIs for modern CPUs and GPUs\, and present\, as an illustration\, new unpublished data showing performance and energy characteristics for selected parallel OpenMP applications executed under power caps on a dual-socket Intel Skylake-X system. These APIs can be used within algorithms that derive configurations for which selected performance-energy goals (such as EDP\, EDS) are optimized under non-trivial\, i.e. non-default\, power caps\, and that minimize execution times under power caps. We discuss various factors of future interest in power-capping-aware HPC\, such as other metrics not considered so far\, and the applicability and accuracy of measurement methods: using filters\, hardware vs. software methods\, and the conditions and use cases for particular methods. We describe future work and areas\, including scenarios that can benefit from power capping in HPC.\n\n\n16:15 – 16:45\nContainerization for DLR HPC applications\nSabine Roller\, Deutsches Zentrum für Luft- und Raumfahrt\, Institut für Softwaremethoden zur Produkt-Virtualisierung\n\n\n18:30\nDinner\n\n\n\n\nFriday\, April 14\, 2023\n\n\n\n\n09:00 – 09:45\nKeynote – Sustaining Simulation Performance in the US Exascale Computing Project – Slides\nHartwig Anzt\, University of Tennessee\, Knoxville\nThe US Exascale Computing Project (ECP) has the goal of delivering a capable exascale computing ecosystem to provide breakthrough modeling and simulation solutions to address the most critical challenges in scientific discovery\, energy assurance\, economic competitiveness\, and national security. This requires providing scientific computing applications with a software stack that allows them to perform on the leadership supercomputers. 
In this talk\, we discuss the impact of the ECP hardware landscape on software design and how the Ginkgo math library responds to the ECP application requirements and helps to achieve the simulation performance goals.\n\n\n09:45 – 10:15\nHPC and AI at HLRS – Slides\nMichael Resch\, HLRS\, University of Stuttgart \n\n\n\nHLRS was created to support users with High-Performance Computing. Over recent years\, however\, Artificial Intelligence has become an important topic. In this talk\, we look into the changes that AI drives in computer simulation. We will explore the needs of AI users. Furthermore\, we will have a first look at the needs of traditional computer simulation and at the potential of AI for such simulations. \n  \n\n\n\n\n\n\n10:15 – 11:00\nBreak\n\n\n11:00 – 11:30\nTowards Building a Digital Twin of Job Scheduling\nTatsuyoshi Ohmura\, NEC Corporation \n\n\n\nA job scheduler\, which is the core component of an HPC system\, has several parameters. Tuning these parameters can improve the efficiency of system operation\, but this requires knowledge and experience and places a burden on the system operator. To reduce this burden\, we are studying a digital twin of the job scheduler. To realize the digital twin\, we have developed a job scheduler simulator. We demonstrate examples of the use of the simulator. \n\n\n\n\n\n\n11:30 – 12:00\nScalable Cluster Administration with LXC³ – Slides\nErich Focht\, NEC Corporation\nThe LXC³ Cluster Command and Control tools have evolved at NEC Germany over the past two decades\, going through various changes to adapt to continuously changing requirements. The talk discusses the design choices of LXC³-neo\, which runs as a pool of micro-services orchestrated by Docker Swarm\, and its scalability and limitations seen at customer sites. 
The most recent developments move the cluster management stack even closer to methods used in cloud systems management\, simplifying node image handling and management network setup.\n\n\n12:00 – 13:30\nLunch\n\n\n13:30 – 14:00\nA feasibility study of quantum annealing for the next-generation computing infrastructure – Slides\nKazuhiko Komatsu\, Tohoku University\nThis presentation introduces a new project\, a feasibility study of quantum computing for the next-generation computing infrastructure\, and shows an early evaluation of annealing machines.\n\n\n14:00 – 14:30\nEnergy Efficiency and Renewable Energy for Distributed High-Performance Computing\nChristoph Niethammer\, HLRS\, University of Stuttgart\n\n\n14:30 – 15:00\nA new framework for calibrating COVID-19 SEIR models with spatial-/time-varying coefficients using genetic and sliding window algorithms – Slides\nHuan Zhou\, HLRS\, University of Stuttgart \n\n\n\nA susceptible-exposed-infected-removed (SEIR) model assumes spatial-/time-varying coefficients to model the effect of non-pharmaceutical interventions (NPIs) on the regional and temporal distribution of COVID-19 disease epidemics. A significant challenge in using such models is their fast and accurate calibration to observed\, geo-referenced hospitalization data\, i.e.\, efficient estimation of the spatial-/time-varying parameters. In this work\, a new calibration framework is proposed for optimizing the spatial-/time-varying parameters of the SEIR model. We also devise a method for combining the overlapping sliding window technique (OSW) with a genetic algorithm (GA) calibration routine to automatically search the segmented parameter space. A parallelized GA is used to reduce the computational burden. Our framework abstracts the implementation complexity of the method away from the user. It provides high-level APIs for setting up a customized calibration system and consuming the optimized values of parameters. 
We evaluated the application of our method on the calibration of a spatial age-structured microsimulation model (CoSMic) using a single objective function that comprises observed COVID-19-related ICU demand. The results reflect the effectiveness of the proposed method in estimating the parameters in a changing environment. \n\n\n\n\n\n\n15:00\nOpen End
URL:https://www.wssp.hlrs.de/events/35th-workshop-on-sustained-simulation-performance/
LOCATION:HLRS\, Nobelstraße 19\, Stuttgart\, Baden-Württemberg\, 70569\, Germany
ORGANIZER;CN="Mr. Johannes Gebert":MAILTO:gebert@hlrs.de
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=Europe/London:20230113T080000
DTEND;TZID=Europe/London:20230113T170000
DTSTAMP:20260508T071404
CREATED:20230113T120530Z
LAST-MODIFIED:20230113T120530Z
UID:219-1673596800-1673629200@www.wssp.hlrs.de
SUMMARY:26th Workshop on Sustained Simulation
DESCRIPTION:Agenda\n\n\n\n\n\nTuesday\, 10 October 2017\n\n\n\n\n9:00 – 9:15\nIntroduction\n      Michael Resch\, HLRS\, University of Stuttgart\n\n\n9:15 – 9:45\nTwo-Year Experiences with Vector Supercomputer SX-ACE and Design Space Exploration of the Next Generation Vector System\n      Hiroaki Kobayashi\, Cyberscience Center\, Tohoku University\n      Abstract \n\n\n\n             \n          In this talk\, I will be presenting two-year experiences with our brand-new vector-parallel supercomputer SX-ACE. In particular\, we will show operation statistics\, applications developed on SX-ACE\, and some case studies of program tuning to exploit its potential. In addition\, I will describe the future plan for supercomputing resource installation and deployment at Tohoku University and discuss the design space exploration of the future vector system. \n            \n          \n \n \n\n\n\n9:45 – 10:15\nSiVeGCS – The Future of German Supercomputing\n      Michael Resch\, HLRS\n      Abstract \n\n\n\n            \n         This talk will summarize the research and development activities of HLRS and will highlight future activities and directions.\n         \n          \n \n \n\n\n\n10:15 – 10:45\nBreak\n\n\n10:45 – 11:15\nJAMSTEC Next Scalar Supercomputer System\n      Ken’ichi Itakura\, JAMSTEC\n      Abstract\n\n\n11:15 – 11:45\nOCTOPUS: a new supercomputing service of Osaka University\n      Susumu Date\, Cybermedia Center\, Osaka University\n      Abstract\n\n\n11:45 – 12:15\nThe Brand-new Vector Supercomputer\, Aurora\n      Shintaro Momose\, NEC\n      Abstract\n\n\n12:15 – 13:15\nLunch\n\n\n13:15 – 13:45\nA Multiple-layer Bypass Mechanism for Energy Efficient Computing\n      Ryusuke Egawa\, Masayuki Sato\, Ryoma Saito\, Hiroaki Kobayashi\, Cyberscience Center\, Tohoku University\n      Abstract\n\n\n13:45 – 14:15\nCoupling Strategies for Multiphysics Simulations on Hierarchical Cartesian Meshes\n      Matthias Meinke\, Michael Schlottke\, 
Ansgar Niemöller\, Institute of Aerodynamics\, RWTH Aachen University\n      Abstract\n\n\n14:15 – 14:45\nLocally Linearized Euler Equations in Discontinuous Galerkin with Legendre Polynomials\n      H. Klimach\, M. Gaida\, S. Roller\, Simulationstechnik & Wissenschaftliches Rechnen\, Universität Siegen\n      Abstract\n\n\n14:45 – 15:15\nUnveiling Insight on Fluid Systems in a Diverse Environment using CFD\n      Manuel Hasert\, Festo AG & Co. KG\n      Abstract\n\n\n15:15 – 15:45\nBreak\n\n\n15:45 – 16:15\nHigh-fidelity Simulation of Helicopter Phenomena: HPC aspects in advanced engineering applications\n      Manuel Keßler\, Institute for Aero and Gasdynamics\, University of Stuttgart\n      Abstract\n\n\n16:15 – 16:45\nHighly portable CFD solutions for heterogeneous computing on unstructured meshes\n      A.V. Gorobets\, S.A. Soukov\, Keldysh Institute of Applied Mathematics of RAS\, Moscow\, Russia\n      P.B. Bogdanov\, Scientific Research Institute of System Development of RAS\, Moscow\, Russia\n      X. Alvarez\, F.X. Trias\, Heat and Mass Transfer Technological Center of UPC\, Barcelona\, Spain\n      Abstract \n      \n\n\n16:45 – 17:15\nNumerical modelling of phase change processes in clouds – challenges and approaches\n      Martin Reitzle\, Bernhard Weigand\, Institute of Aerospace Thermodynamics\, University of Stuttgart\n      Abstract\n\n\n17:15 – 17:45\nA dynamic load-balancing strategy for large scale CFD-applications\n      Philipp Offenhäuser\, HLRS\n      Abstract\n\n\n19:00 – 21:00\nDinner in Goldener Adler\n\n\n\n\nWednesday\, 11 October 2017\n\n\n\n\n9:00 – 9:30\nAPI Extension and Resource Manager Integration for Malleable MPI Applications\n      Isaias Alberto Compres Urena\, Institute of Informatics\, Technical University of Munich\n      Abstract\n\n\n9:30 – 10:00\nPerformance and Quality Analysis of Interpolation Methods for Coupling\n      N. Ebrahimi-Pour\, S. 
Roller\, Simulationstechnik & Wissenschaftliches Rechnen\, Universität Siegen\n      Abstract\n\n\n10:00 – 10:30\nTowards Realizing a Dynamic and MPI Application-aware Interconnect with SDN\n      Keichi Takahashi\, Cybermedia Center\, Osaka University\n      Abstract\n\n\n10:30 – 11:00\nBreak\n\n\n11:00 – 11:30\nFEniCS HPC: An automated predictive high-performance framework for multiphysics simulations\n      Niclas Jansson\, Department of High Performance Computing and Visualization\, School of Computer Science and Communication\, KTH Royal Institute of Technology\n      Abstract\n\n\n11:30 – 12:00\nAutomated derivation and parallel execution of finite difference models on CPUs\, GPUs and Intel Xeon Phi processors using code generation techniques\n      Christian T. Jacobs\, Satya P. Jammy\, David J. Lusher\, Neil D. Sandham\, Engineering and the Environment at the University of Southampton\n      Abstract \n      \n\n\n12:00 – 12:30\nPerformance tuning of Ateles using Xevolver\n      Kazuhiko Komatsu\, Cyberscience Center\, Tohoku University\n      Abstract\n\n\n12:30 – 13:30\nLunch\n\n\n13:30 – 14:00\nvTorque – Introducing virtualization capabilities to Torque\n      Nico Struckmann\, HLRS\n      Abstract\n\n\n14:00 – 14:30\nOptimised scheduling mechanisms for Virtual Machine deployment in Cloud infrastructures\n      Michael Gienger\, HLRS\n      Abstract\n\n\n14:30 – 15:00\nTo be defined\n      Christopher L. 
Barrett\, Biocomplexity Institute\, Virginia Tech\n      Abstract\n\n\n15:00 – 15:30\nBreak\n\n\n15:30 – 16:00\nSoftware for agent based social simulation in the distributed HPC environments\n      Sergiy Gogolenko\, HLRS\n      Abstract\n\n\n16:00 – 16:30\nA parallel solver for a linear system with a symmetric sparse matrix by one-dissection ordering\n      Mitsuo Yokokawa\, Tomoki Nakano\, Kobe University\n      Takeshi Fukaya\, Hokkaido University\n      Yusaku Yamamoto\, The University of Electro-Communications\n      Abstract\n\n\n16:30 – 17:00\nVistle\, a scalable visualization system for immersive virtual environments\n      Martin Aumüller\, Uwe Wössner\, HLRS\n      Abstract\n\n\n17:00 – 17:30\nFarewell
URL:https://www.wssp.hlrs.de/events/26th-workshop-on-sustained-simulation/
LOCATION:Baden-Württemberg
ORGANIZER;CN="Mr. Johannes Gebert":MAILTO:gebert@hlrs.de
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20220523
DTEND;VALUE=DATE:20220525
DTSTAMP:20260508T071404
CREATED:20220926T181045Z
LAST-MODIFIED:20221204T144319Z
UID:71-1653264000-1653436799@www.wssp.hlrs.de
SUMMARY:33rd Workshop on Sustained Simulation
DESCRIPTION:Agenda\nAll times are given in Central European Summer Time (CEST). An additional live stream will be available. Please contact Mr. Johannes Gebert (gebert@hlrs.de) if you would like to participate. \n\n\n\n\n\nMonday\, May 23rd\, 2022\n\n\n\n\n10:00 – 10:15\nWelcome & Introduction\n      Michael Resch\, HLRS\, University of Stuttgart\n\n\n10:15 – 10:45\nReaggregation of Disaggregation:\n      A Smart Approach to the Optimized Architecture and Platform Design\n      Hiroaki Kobayashi\, Tohoku University\n\n\n10:45 – 11:15\nCombinatorial Clustering for a Material Informatics Application using Aurora Vector Annealing\n      Kazuhiko Komatsu\, Tohoku University\n      Abstract \n\n\n\n            Due to recent advances in data science\, such as machine learning and big-data analysis\, the approach of using data science techniques has attracted attention even for developing new materials\, a field called material informatics. In material informatics\, clustering is one of the essential data processing techniques for understanding thermophysical properties. To improve clustering accuracy\, this presentation gives an Ising-based clustering method using the Aurora annealing machine for a material informatics application. \n \n \n\n\n\n11:15 – 11:45\nHPC Refactoring Catalog: Updates\n      Ryusuke Egawa\, Tokyo Denki University\n      Abstract \n\n\n\n            Aiming to support smooth code migration and optimization among HPC systems\, HPC Refactoring\, a database of system-aware code optimization patterns\, was designed in 2015. This talk gives an overview of the HPC refactoring catalog and presents updates and future plans for its improvement. 
\n \n \n\n\n\n11:45 – 13:15\nLunch\n\n\n13:15 – 13:45\nConcept of a File Tracing Mechanism for Research Data Management in High Performance Computing Systems\n      Yuta Namiki\, Takeo Hosomi\, Akihiro Yamashita\, Susumu Date\, Joint Research Laboratory for Integrated Infrastructure of High Performance Computing and Data Analysis\, Cybermedia Center\, Osaka University\n      Abstract \n\n\n\n            Research data management has come to play a role of great importance for reproducible and reusable research. In the recent academic research scene\, researchers and scientists leverage HPC systems as a base for processing and analyzing a large amount of data through numerical analysis and computer simulations. However\, the data produced and analyzed on HPC systems are not managed\, due to the lack of functionality that allows us to understand how data is produced and processed in the system. From this perspective\, focusing on data lineage\, which shows the origins and history of data produced on HPC systems\, we have prototyped a mechanism that generates lineage by tracing the file access operations of user programs in the kernel. In this presentation\, we report the interim results we have achieved so far\, along with open issues. \n \n \n\n\n\n13:45 – 14:15\nMitigation of Aeroacoustic Noise based on Simulations on HPC Systems\n      Matthias Meinke\, Ansgar Niemoeller\, Miro Gondrum\, Moritz Waldmann\, and Wolfgang Schroeder\, AIA\, RWTH Aachen\n      Abstract \n\n\n\n            The numerical simulation of aeroacoustic sound is important for an improved understanding of noise generation mechanisms and the design of noise mitigation strategies. In this paper\, the performance of two directly coupled two-step CFD/CAA methods implemented on HPC hardware is discussed. 
For the flow field\, either a finite-volume method for the solution of the Navier-Stokes equations or a lattice Boltzmann method is coupled to a discontinuous Galerkin method for the solution of the acoustic perturbation equations. The coupling takes advantage of a joint Cartesian mesh allowing for the exchange of the acoustic sources without MPI communication. An immersed boundary treatment of the acoustic scattering from solid bodies by a novel solid wall formulation is implemented and validated in the DG method. Results for the case of a spinning vortex pair and the low Reynolds number unsteady flow around a circular cylinder show that a solution with comparable accuracy is obtained for the two direct hybrid methods when using identical mesh resolution. Finally\, results of a large-scale application\, i.e.\, the noise prediction for a nose landing gear\, are presented. \n \n \n\n\n\n14:15 – 14:45\nManagement of data flows between Cloud\, HPC and IoT/Edge\n      Kamil Tokamov\, HLRS\, University of Stuttgart\n      Abstract \n\n\n\n            The components of heterogeneous applications are deployed across various execution platforms and utilise the capabilities of those platforms. As such\, one component can utilise HPC resources for better performance in batch computation\, while another uses Cloud resources for better scalability and elasticity. Furthermore\, there is also the possibility of processing on Edge devices. The usage of such a hybrid setup\, where dependent components of the applications are deployed across various platforms\, might require flexible and adaptive data transfers from one platform to another. This work presents a data management framework\, based on the Apache NiFi dataflow management system and developed in the scope of the SODALITE EU project. This framework enables scalable data transfer between any of GridFTP (a file transfer protocol dominant in HPC)\, HTTP\, S3-compatible and data streaming (such as MQTT) endpoints. 
\n \n \n\n\n\n14:45 – 15:15\nBreak\n\n\n15:15 – 15:45\n\n      Sabine Roller\, DLR (Deutsches Zentrum für Luft- und Raumfahrt e.V.)\n      Abstract \n\n\n\n            The abstract will be provided soon. \n \n \n\n\n\n15:45 – 16:15\nIntegration of parallel HDF5 I/O in a large scale computational fluid dynamics solver\n      Tobias Gibis\, University of Stuttgart\n      Abstract \n\n\n\n            As the number of cores in massively parallel computer systems increases\, I/O strategies must be adapted so as not to present bottlenecks. The “read and write one file per core” strategy\, while efficient on earlier\, smaller computational architectures\, leads to an unmanageable number of files and is poorly suited for Lustre filesystems. With the introduction of Hawk at HLRS\, an urgent need arose to develop a new framework based on MPI-I/O in which all cores write simultaneously to a common file. To this end\, a new I/O framework based on HDF5 was implemented in the IAG in-house CFD code NS3D. The associated talk will discuss selected design decisions and challenges encountered\, especially regarding adequate I/O performance.\n          \n \n \n\n\n\n16:15 – 16:45\nHigh Performance Object-Oriented Data Processing Workflows for Researchers and Scientists\n      Jason Appelbaum\, University of Stuttgart\n      Abstract \n\n\n\n            As computing capacity increases\, datasets generated by HPC applications grow in size as well. The researchers and scientists who use such datasets for their work require scalability and efficiency in their data-processing workflows\, but still prioritize utility and practicality. Typically\, such researchers are self-taught\, intermediate-level programmers working collaboratively with others\, in which case object-oriented languages such as Python offer a greatly reduced barrier to entry and improved code maintainability within their research groups. 
Research progress is accelerated by flexible\, easy access to large datasets for comparison and testing. The merits of a high-performance yet flexible approach utilizing HDF5 and MPI\, wrapped by h5py and mpi4py in Python\, will be discussed. Examples taking advantage of collective I/O and parallel processing for the analysis of direct numerical simulation datasets will be presented\, along with performance metrics. Additionally\, techniques for ‘big data’ visualization using ParaView\, HDF5 and XDMF will be showcased. \n \n \n\n\n\n18:30\nDinner (Registration closed)\n\n\n\n\nTuesday\, May 24th\, 2022\n\n\n\n\n09:00 – 09:45\nKeynote\n      Jack Dongarra\, University of Tennessee\n\n\n09:45 – 10:15\nAn ML-Based Approach to Automatic Selection of Compiler and its Option Flags\n      Hiroyuki Takizawa\, Tohoku University\n      Abstract \n\n\n\n            Today\, one HPC platform can have multiple compilers\, each of which provides a lot of option flags. Those compilers have different optimization capabilities\, and may even target different processors on a heterogeneous computing system such as NEC SX-Aurora TSUBASA. Thus\, it can be challenging to select an appropriate build configuration\, such as the best available compiler and its option flags\, for each application code. In this talk\, I will introduce our ongoing work on using machine learning to predict an appropriate build configuration from performance counter values. \n \n \n\n\n\n10:15 – 11:00\nBreak\n\n\n11:00 – 11:30\nSpeeding up k-nearest neighbors search with a space-filling curve\n      Masashi Kotera\, Sourav Saha\, Takuya Araki\, NEC Corporation\n      Abstract \n\n\n\n            k-nearest neighbors search (k-NN) is a useful algorithm that can be used for classification and regression\, but naive k-NN is slow because it requires scanning all the training data at prediction time. There are methods that address this problem for low-dimensional data\, such as dividing the search space with a kd-tree. 
However\, algorithms using tree structures are difficult to vectorize because they require recursion. In this talk\, we present a speed-up method for k-NN using the z-curve\, a kind of space-filling curve; the calculation of the z-curve is easy to vectorize\, and the z-curve enables an efficient range search that can be used to implement k-NN. \n \n \n\n\n\n11:30 – 12:00\nHeterogeneous Computing with SX-Aurora TSUBASA Vector Engine\n      Ryota Ishihara\, NEC Corporation\n      Abstract \n\n\n\n            SX-Aurora TSUBASA supports a variety of execution models in heterogeneous environments including various computational resources such as GPU and x86. Users can select appropriate computational resources according to the characteristics of each application. In this session\, we will introduce the functions provided in each execution model and how to use them. \n \n \n\n\n\n12:00 – 13:30\nLunch\n\n\n13:30 – 14:00\nTrends\n      Michael Resch\, HLRS\, University of Stuttgart\n\n\n14:00 – 14:30\nSearching a roadmap to solve partial differential equations with quantum machine learning\n      Markus Mieth\, Pia Siegl\, DLR (German Aerospace Center)\n      Abstract \n\n\n\n            Our research aims to evaluate the potential of quantum computing to solve partial differential equations (PDEs) in the context of aerospace engineering. Established algorithms and methods for PDEs often rely on discretization in time and space. For reasonable accuracy\, they come with high computational costs in terms of time and memory space. Machine learning approaches are studied as an alternative. One promising concept is the physics-informed neural network (PINN) [1]. Here\, the PDE is directly included in the loss function such that no data\, or only a limited amount\, is needed for training. In our approach\, we exchange the classical neural network of the PINN with a trainable quantum circuit\, while the optimization still runs on a classical computer. 
While we can already approximate simple functions successfully with known strategies [2\, 3]\, more complex PDEs are hard to solve. Our work focuses on the search for problem-oriented quantum circuits and data encoding strategies\, which increase the expressibility of the quantum model and allow for the approximation of more complex functions. \n \n \n\n\n\n14:30 – 15:00\nPrediction of Bio-Hybrid Fuel Injection and Mixture Formation in an Internal Combustion Engine\n      Tim Wegmann\, Matthias Meinke\, Wolfgang Schroeder\, AIA\, RWTH Aachen\n      Abstract \n\n\n\n            For an efficient\, stable and low-emission combustion of novel e-fuels in piston engines\, the fuel distribution at the start of ignition plays a crucial role. The injection system and fuel properties define the initial fuel vapor distribution. The subsequent fuel-air mixing depends on the convection\, turbulence intensities\, and the formation and break-up of large-scale flow structures\, in particular the tumble vortex. Large-eddy simulations (LES) with high mesh resolution are necessary to accurately predict all involved scales of the mixing process. In this study\, numerical analyses of the liquid fuel injection and the fuel-air mixing in a piston engine are performed. LES are conducted using a hierarchical unstructured Cartesian mesh method with an efficient four-way coupling of the spray droplets with the gas phase. Due to the large number of spray droplets\, a Lagrangian Particle Tracking (LPT) algorithm is used to accurately predict the liquid spray propagation and evaporation. The spray model is based on a KHRT-breakup formulation. The Navier-Stokes equations are solved for compressible flow using a finite-volume method\, where boundary surfaces are represented by a conservative cut-cell method. The hierarchical Cartesian mesh ensures efficient use of high performance computing platforms through solution adaptive refinement and dynamic load balancing. 
\n \n \n\n\n\n15:00 – 15:30\nBreak\n\n\n15:30 – 16:00\nQuantum machine learning for data analysis\n      Li Zhong\, HLRS\, University of Stuttgart\n      Abstract \n\n\n\n            Fault-tolerant quantum computers have been proven to be able to improve machine learning through speed-ups in computation or improved model scalability. Therefore\, research at the junction of the two fields has garnered an increasing amount of interest\, which has led to the rapid development of quantum deep learning and quantum-inspired deep learning techniques. In this work\, we will demonstrate how quantum computers and quantum algorithms can be leveraged for image processing through quantum-inspired deep neural networks. \n \n \n\n\n\n16:00\nOpen End
URL:https://www.wssp.hlrs.de/events/33rd-workshop-on-sustained-simulation/
LOCATION:HLRS\, Nobelstraße 19\, Stuttgart\, Baden-Württemberg\, 70569\, Germany
ATTACH;FMTTYPE=image/png:https://www.wssp.hlrs.de/wp-content/uploads/2022/09/featured.png
ORGANIZER;CN="Mr. Johannes Gebert":MAILTO:gebert@hlrs.de
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20210316
DTEND;VALUE=DATE:20210320
DTSTAMP:20260508T071404
CREATED:20221204T235613Z
LAST-MODIFIED:20221205T001529Z
UID:204-1615852800-1616198399@www.wssp.hlrs.de
SUMMARY:31st Workshop on Sustained Simulation
DESCRIPTION:Agenda\nDue to the corona pandemic\, the workshop was held online. All times are given in Central European Time (CET) \n\n\n\n\n\nTuesday\, March 16th\, 2021\n\n\n\n\n9:15 – 9:30\nWelcome & Introduction\n      Michael Resch\, HLRS\, University of Stuttgart\n\n\n9:30 – 10:00\nThe New Era of Hybrid-Computing on and with SX-Aurora TSUBASA: Vector-Scalar to Vector-Digital Annealing\, to Vector-Quantum Annealing\n      Hiroaki Kobayashi\, Cyberscience Center\, Tohoku University\n      Abstract \n\n\n\n            I will be giving my talk about recent achievements in our ongoing project entitled “R&D of A Quantum-Annealing-Assisted Next Generation HPC Infrastructure and its Applications.” My talk will start with some performance evaluation results of SX-Aurora TSUBASA as a vector computing platform\, and then go into its hybrid computing using the VH and VE together. Finally\, I will discuss two types of hybrid\, at the computing-mechanism and hardware-platform levels: annealing on SX-Aurora TSUBASA itself\, and annealing combining SX-Aurora TSUBASA with a quantum annealer. \n \n \n\n\n\n10:00 – 10:30\nHPC in the Next Decade\n      Michael Resch\, HLRS\, University of Stuttgart\n      Abstract \n\n\n\n            The coming ten years of HPC will be dominated by three main developments. First\, the end of Moore’s law is already showing its impact. Second\, a new application − Artificial Intelligence − is forcing computing centers to adapt their strategies. Finally\, Quantum Computing is knocking on the door of HPC without providing any hints about the direction in which QC is headed. HLRS has to adapt to these challenges\, and in this talk we will present challenges and opportunities as well as strategies to cope with both. \n \n \n\n\n\n10:30 – 11:00\nSoftware Methods for Product Virtualization\n      Sabine Roller\, DLR (Deutsches Zentrum für Luft- und Raumfahrt e.V.)\n      Abstract \n\n\n\n            The abstract will be provided soon. 
\n \n \n\n\n\n11:00 – 11:15\nBreak\n\n\n11:15 – 11:45\nSX-Aurora TSUBASA VE Design\n      Hiroki Asano\, NEC Tokyo\n      Abstract \n\n\n\n            The VE (Vector Engine) of SX-Aurora TSUBASA leverages innovative vector processors which have high single-core performance and memory bandwidth. \n            NEC has been developing the VE series and launched the latest model\, VE20\, in 2020. \n            Our current challenge is to achieve better performance for future VE products through several studies. \n            In this session\, we will introduce one of these studies\, regarding the memory subsystem. \n \n \n\n\n\n11:45 – 12:15\nFostering HPC Competences in Europe to Support Academia and Industry\n      Bastian Koller\, HLRS\n      Abstract \n\n\n\n            This talk will give an update on the EuroHPC projects EuroCC and CASTIEL\, which are implementing 33 National Competence Centres in Europe. It will also give an update on the FF4EuroHPC project\, with insights into the first\, now closed\, open call for industrial experimentation. \n \n \n\n\n\n12:15\nOpen Meeting\n\n\n\n\nWednesday\, March 17th\, 2021\n\n\n\n\n8:30 – 8:35\nIntroduction of Chair\n\n\n8:35 – 9:05\nExploiting Hybrid Parallelism in LBM Implementation Musubi on Hawk\n      Harald Klimach\, Kannan Masilamani\, Sabine Roller\, University of Siegen\n      Abstract \n\n\n\n            In this contribution we look into the efficiency and scalability of our Lattice Boltzmann implementation Musubi when using OpenMP threads within an MPI-parallel computation on Hawk. The Lattice Boltzmann method enables explicit computation of incompressible flows\, and the mesh discretization can be generated automatically\, even for complex geometries. The basic Lattice Boltzmann kernel is fairly simple and involves only a few floating-point operations for each lattice node. A simple loop over all lattice nodes in each partition of the MPI-parallel setup lends itself to straightforward loop parallelization with OpenMP. 
With increased core counts per compute node\, the use of threads on the shared memory nodes is gaining importance\, as it avoids overly small partitions with many outbound communications to neighboring partitions. We briefly discuss the hybrid parallelization of Musubi and investigate how the usage of OpenMP threads affects the performance when running simulations on the Hawk supercomputer at HLRS. \n \n \n\n\n\n9:05 – 9:35\nHybrid Computation on Building Responses for Earthquakes on a VH and VEs of SX-Aurora TSUBASA\n      Mitsuo Yokokawa\, Kobe University\n      Abstract \n\n\n\n            Earthquakes occur frequently\, and therefore buildings need to be earthquake resistant. A code for the time response of buildings to earthquakes was parallelized using a hybrid execution model on a VH and VEs of the SX-Aurora TSUBASA. Computational performance will be presented. \n \n \n\n\n\n09:35 – 10:05\nForecasting Intensive Care Unit Demand during the COVID-19 Pandemic: A Spatial Age-structured Microsimulation Model\n      Ralf Schneider\, HLRS\, Sebastian Klüsener\, Matthias Rosenbaum-Feldbrügge\n      Abstract \n\n\n\n            Background: The COVID-19 pandemic poses the risk of overburdening health care systems\, and in particular intensive care units (ICUs). Non-pharmaceutical interventions (NPIs)\, ranging from wearing masks to (partial) lockdowns\, have been implemented as mitigation measures around the globe. However\, especially severe NPIs are used with great caution due to their negative effects on the economy\, social life and mental well-being. Thus\, understanding the impact of the pandemic on ICU demand under alternative scenarios reflecting different levels of NPIs is vital for political decision-making. The aim is to support political decision-making by forecasting COVID-19-related ICU demand under alternative scenarios of COVID-19 progression reflecting different levels of NPIs. 
\n            Methods: In this talk we will present our implementation of a spatial age-structured microsimulation model of the COVID-19 pandemic\, extending the Susceptible-Exposed-Infectious-Recovered (SEIR) framework. The model accounts for regional variation in population age structure and in spatial diffusion pathways. In a first step\, we calibrate the model by applying a genetic optimization algorithm against hospital data on ICU patients with COVID-19. In a second step\, we forecast COVID-19-related ICU demand under alternative scenarios of COVID-19 progression reflecting different levels of NPIs. The third step is the automation of the procedure for the provision of weekly forecasts. The automated estimation of the model’s parameters is done by means of Random-Forest regression. \n            Results: In the results section we will show the application of the model to Germany and demonstrate state-level forecasts over a 2-month period\, which can be updated daily based on the latest data on the progression of the pandemic. To illustrate the merits of our model\, we present here “forecasts” of ICU demand for different stages of the pandemic during 2020 and 2021. Our forecasts for a quiet summer phase with low infection rates identified considerable variation in the potential for relaxing NPIs across the federal states. By contrast\, our forecasts during a phase of quickly rising infection numbers in autumn (second wave) suggested that all federal states should implement additional NPIs. However\, the identified need for additional NPIs again varied across federal states. In addition\, our model suggests that during large infection waves ICU demand would quickly exceed supply if there were no NPIs in place to contain the virus. 
\n              \n \n \n\n\n\n10:05 – 10:20\nBreak\n\n\n10:20 – 10:50\nHPC Based Analyses of Biofuel Injection in IC Engines and Metal Machining Processes\n      Matthias Meinke\, Tim Wegmann\, Julian Vorspohl\, Daniel Lauwers\, and Wolfgang Schröder\, RWTH Aachen\n      Abstract \n\n\n\n            Some recent engineering applications will be presented\, in which the turbulent flow in technical devices with embedded droplets and particles is simulated on the HPC platform Hawk installed at HLRS. The flow field in an internal combustion engine is investigated to analyze the mixing of air with various injected biofuels. The distribution of the evaporated fuel in the internal combustion engine has a large influence on pollutant emissions and engine efficiency. Since biofuels possess quite different fluid properties\, it is important to accurately predict the concentration of the evaporated fuels for an optimization of the engine performance. The second application concerns electrical discharge and electro-chemical machining processes\, which are used to manufacture work pieces such as turbine blades of high-strength material. In these processes\, fluid flow plays an important role in the transport of removed material and thus in the quality of the final product. The simulations conducted are based on a solver formulated for hierarchical Cartesian meshes\, in which a Lagrangian particle solver is used to track the motion of droplets of the fuel spray or the transport of removed material. Several aspects of the numerical methods\, their parallelization\, dynamic load balancing and the implementation on high-performance computing platforms will be presented in this contribution. 
\n \n \n\n\n\n10:50 – 11:20\nParticle-in-Cell Plasma Simulation of Filamentary Coherent Structures\n      Seiji Ishiguro and Hiroki Hasegawa\, National Institute for Fusion Science\, National Institute of Natural Sciences\n      Abstract \n\n\n\n            We have performed three-dimensional particle-in-cell (PIC) plasma simulations to investigate the filamentary coherent structure\, the so-called blob/hole.\n            Impurity ion transport by this structure is revealed. The impurity ion profile in the blob/hole structure becomes a dipole structure\, and this propagates with the blob/hole. \n            The performance of the three-dimensional PIC code on the new supercomputer based on NEC SX-Aurora TSUBASA at the National Institute for Fusion Science will also be presented. \n \n \n\n\n\n11:20 – 11:50\nComputational Simulation of Chiral Transition and Paramagnetic Current Induced by Paramagnetic Coupling in Chiral Superconductor\n      Hirono Kaneyasu\, University of Hyogo\n      Abstract \n\n\n\n            Assuming non-unitary chiral superconductivity as the bulk state of Sr₂RuO₄\, we show the field-induced chiral stability generating the paramagnetic current in the eutectic Sr₂RuO₄-Ru by computational simulation of the Ginzburg-Landau equation. The paramagnetic coupling with the chiral magnetization causes the field-induced chiral transition and the paramagnetic current. The field-induced chiral stability is consistent with the field dependence of the zero-bias anomaly in tunneling spectroscopy. This good agreement with the experimental result indicates that the non-unitary chiral spin-triplet state is one of the candidates for the superconducting state of Sr₂RuO₄\, in addition to the chiral spin-singlet state as another candidate. High-performance computing with code optimized for SX-Aurora makes it possible to analyze the field dependence and spatial variation of the chiral state and supercurrent in more detail. 
\n \n \n\n\n\n11:50 – 12:20\nDirect Numerical Flow Simulation on Vector and Massively-parallel Supercomputers\n      Johannes Peter\, University of Stuttgart\n      Abstract \n\n\n\n            Direct numerical simulations of turbulent flows require high computational power\, only available on supercomputers such as those provided at HLRS. At the Institute of Aerodynamics and Gas Dynamics\, an adapted in-house high-order finite-difference code is used for the analysis of canonical transitional and turbulent wall-bounded flows and for (in-)stability investigations. The talk will present some recent results regarding supersonic mixing of two gases and show performance data for the massively-parallel ‘Hawk’ system and the ‘SX-Aurora’ vector system. \n \n \n\n\n\n12:20\nOpen Meeting\n\n\n\n\nThursday\, March 18th\, 2021\n\n\n\n\n8:30 – 8:35\nIntroduction of Chair\n\n\n8:35 – 9:05\nIntroduction of Cloud Bursting and Seamless Use of SX-Aurora TSUBASA under Job Scheduler\n      Tatsuyoshi Ohmura\, NEC Tokyo\n      Abstract \n\n\n\n            With the development of AI/BDA\, the number of users of HPC systems is increasing\, and the computational power required by users is increasing as well. However\, on-premise HPC systems are difficult to expand due to limitations in power\, space\, and budget. In this session\, we will introduce cloud bursting\, a Job Scheduler function to temporarily expand computational power by using cloud computing resources. In addition\, we will introduce the function that enables the HPC system to be used transparently under the Job Scheduler. \n \n \n\n\n\n9:05 – 9:35\nPorting and Optimizing Molecular Docking Simulations on SX-Aurora Vector Engine\n      Erich Focht (NEC Germany)\, Leonardo Solis-Vasquez (TU Darmstadt)\, Andreas Koch (TU Darmstadt)\n      Abstract \n\n\n\n            Molecular docking simulations are widely used in computational drug discovery to predict molecular interactions at close distances. 
Specifically\, these simulations aim to predict the binding poses between a small molecule and a macromolecular target\, referred to as the ligand and receptor\, respectively. The purpose of drug discovery is to identify ligands that effectively inhibit the harmful function of a certain receptor. In that context\, molecular docking simulations are critical: by using them\, the time-consuming preliminary task of identifying potential drug candidates can be significantly shortened. Subsequent wet-lab experiments can be carried out using only a narrowed list of promising ligands\, hence reducing the overall cost of experiments. \n            AutoDock is one of the most widely used software applications for molecular docking simulations. Its main engine is a Lamarckian Genetic Algorithm (LGA)\, which combines a genetic algorithm and a local-search method to explore several molecular poses. The prediction of the best pose is based on the score\, a function that evaluates the free energy (kcal/mol) of a ligand-receptor system. AutoDock is characterized by nested loops with variable upper bounds and divergent control structures. Moreover\, the time-intensive score evaluations are typically invoked a couple of million times within each LGA run. Due to its computational intensity\, AutoDock suffers from long execution runtimes\, which are mainly attributed to its inability to leverage its embarrassing parallelism. In recent years\, an OpenCL-based implementation of AutoDock has been developed to accelerate its execution on a variety of devices including multi-core CPUs\, GPUs\, and even FPGAs. \n            In this work\, we present our experiences porting and optimizing the OpenCL-based AutoDock onto the SX-Aurora Vector Engine. The OpenCL code is composed of a host part and a device part\, both of which are maintained in the NEC VEO version. As the API functions of OpenCL and VEOffload resemble each other\, porting the host code was very smooth. 
While the device part was easily ported too\, an extra effort was required to increase the performance on SX-Aurora. For this\, we used hardware-specific techniques that involve: appropriate data types for wider vectors\, leveraging the multiple cores of the SX-Aurora\, pushing outer loops into inner loops in the score calculations and local search\, and using multi-process VEO to overcome OpenMP limitations in NUMA mode. Our evaluations were done on VE10B and VE20B models and compared to modern multicore CPUs and GPUs. \n \n \n\n\n\n09:35 – 10:05\nEvaluating and Exploiting the Potential of the Second-generation SX-Aurora TSUBASA\n      Hiroyuki Takizawa\, Tohoku University\n      Abstract \n\n\n\n            In October 2020\, we started the operation of Supercomputer AOBA\, which employs the second-generation SX-Aurora TSUBASA as its main computing resource. In this talk\, I would like to share performance evaluation results to discuss the potential of the second-generation SX-Aurora TSUBASA through comparison with the previous generations. I will also introduce our recent research activities for making good use of its performance while keeping the code portable. \n \n \n\n\n\n10:05 – 10:20\nBreak\n\n\n10:20 – 10:50\nAcceleration of Structural Analysis Software FrontISTR on NEC SX-Aurora TSUBASA\n      Toshiaki Hishinuma\, Research Institute for Computational Science Co. Ltd.\n      Abstract \n\n\n\n            Structural analysis using the finite element method (FEM) is widely used in the field of engineering.\n            Recently\, NEC has introduced SX-Aurora TSUBASA\, which has vector accelerator boards (Vector Engine\, VE).\n            One VE has a high-speed memory with a bandwidth of about 1.2 TB/s and eight high-performance vector cores.\n            Each core has three Fused Multiply-Add (FMA) arithmetic units\, each of which can operate on 32 double-precision floating-point elements simultaneously.\n            The host CPU is called the VH. 
\n            FrontISTR is one of the highly parallelized open-source FEM software programs for nonlinear structural analysis.\n            This software first generates a stiffness matrix using the FEM and then solves linear equations for the sparse matrix generated by the FEM.\n            The stiffness matrix generation is not suitable for the VE because it cannot process data continuously and involves many integer operations. \n            There is an API for transferring data between VH and VE called Another/Alternative/Awesome VE Offloading (AVEO)\, which can be used to execute compute-intensive portions of a program on the VE.\n            We accelerate the overall structural analysis program by running the linear equation solvers on the VE using AVEO.\n            We chose the JAD format as the sparse matrix storage format and the conjugate gradient (CG) method as the linear solver. \n            In this study\, we evaluate the accelerated FrontISTR in terms of the following three parts:\n            (1) the generation time of stiffness matrices on the VH\, (2) the transfer time of sparse matrices and vectors from VH to VE using AVEO\, and (3) the calculation time of the linear equations by the CG method on the VE.\n            We describe the effectiveness of accelerated structural analysis execution on NEC SX-Aurora TSUBASA. \n \n \n\n\n\n10:50 – 11:20\nVGL: a High-Performance Graph Processing Framework for the NEC SX-Aurora TSUBASA Vector Architecture\n      Ilya Afanasyev\, Moscow State University\n      Abstract \n\n\n\n            Developing efficient graph algorithm implementations is an extremely important problem of modern computer science\, since graphs are frequently used in various real-world applications. Graph algorithms typically belong to the data-intensive class\, and thus using architectures with high-bandwidth memory potentially allows many graph problems to be solved significantly faster compared to modern multicore CPUs. 
Among other supercomputer architectures\, vector systems\, such as the SX family of NEC vector supercomputers\, are equipped with high-bandwidth memory. However\, the highly irregular structure of many real-world graphs makes it extremely challenging to implement graph algorithms on vector systems\, since these implementations are usually bulky and complicated\, and a deep understanding of the hardware features of vector architectures is required. We present the world’s first attempt to develop an efficient and simultaneously simple graph processing framework for modern vector systems. Our Vector Graph Library (VGL) framework targets NEC SX-Aurora TSUBASA as the primary vector architecture and provides relatively simple computational and data abstractions. These abstractions incorporate many vector-oriented optimization strategies into a high-level programming model\, allowing quick implementation of new graph algorithms with a small amount of code and minimal knowledge of the features of vector systems. The provided comparative performance analysis demonstrates that VGL-based implementations achieve significant acceleration over existing high-performance frameworks and libraries: up to 14 times speedup over multicore CPUs (Ligra\, Galois\, GAPBS) and up to 3 times speedup compared to NVIDIA GPU implementations (Gunrock\, NVGRAPH). 
\n \n \n\n\n\n11:20 – 11:50\nOptimization of the stencil computation considering the architecture of SX-Aurora TSUBASA\n      Kazuhiko Komatsu\, Tohoku University\n      Abstract \n\n\n\n            This presentation introduces optimizations of the stencil computation on SX-Aurora TSUBASA that focus on its distinctive bandwidth characteristics.\n \n \n\n\n\n11:50 – 12:20\nComputational Simulation of External Aerodynamics: Evaluation of Performance and Scalability\n      Michael Wagner\, DLR (Deutsches Zentrum für Luft- und Raumfahrt e.V.)\n      Abstract \n\n\n\n            This presentation will share our efforts and experiences in evaluating the performance and scalability of CODA on current HPC architectures. CODA is a CFD solver for external aircraft aerodynamics developed by DLR\, ONERA\, and Airbus\, and one of the key next-generation engineering applications represented in the European Centre of Excellence for Engineering Applications (EXCELLERAT).\n \n \n\n\n\n12:20\nOpen Meeting\n\n\n\n\nFriday\, March 19th\, 2021\n\n\n\n\n8:30 – 8:35\nIntroduction of Chair\n\n\n8:35 – 9:05\nPacked Mode Vectorization in LLVM for SX-Aurora\n      Simon Moll (NEC Germany)\n      Abstract \n\n\n\n            The abstract will be provided soon.\n \n \n\n\n\n9:05 – 9:35\nAn Energy-aware Cache Control Mechanism for Deep Cache Hierarchy\n      Ryusuke Egawa (Tokyo Denki University)\, Liu Jiaheng (Tohoku University)\n      Abstract \n\n\n\n            To overcome the memory wall problem\, the cache hierarchies of modern microprocessors have become deeper and larger as the number of cores increases. In addition\, the power and energy consumption of the deep cache hierarchy becomes non-negligible. 
In this talk\, we present a mechanism to improve cache energy efficiency by adapting the cache hierarchy to individual applications\, together with its evaluation results.\n \n \n\n\n\n09:35 – 10:05\nLoose Coupling of Task-based Programming Models with MPI Through Continuations\n      Joseph Schuchart\, HLRS\n      Abstract \n\n\n\n            Using MPI in combination with asynchronous task-based programming models can be a daunting task. Applications typically have to manage a dynamic set of active operations\, fall back to a fork-join model\, or rely on some middleware to coordinate the interaction between MPI and the task scheduler. In this talk\, I will propose an extension to MPI\, called MPI Continuations\, that provides a callback-based notification mechanism to simplify the usage of MPI inside asynchronous tasks.\n \n \n\n\n\n10:05 – 10:20\nBreak\n\n\n10:20 – 10:50\nFive challenges of the new supercomputing system SQUID at Osaka University\n      Susumu Date\, Cybermedia Center\, Osaka University\n      Abstract \n\n\n\n            The abstract will be provided soon.\n \n \n\n\n\n10:50 – 11:20\nBasics on Quantum Computation\n      Thomas Kloss\, University Grenoble\n      Abstract \n\n\n\n            With the announcement of quantum supremacy in 2019\, Google claimed to have solved the first real-world problem out of reach for classical computers. Since then\, at the latest\, quantum computing has moved into the political and economic spotlight. In this talk I will present some very basic slides on quantum computers and what makes them different from classical computers. I will also show new work which puts Google’s supremacy claim into question. \n \n \n\n\n\n11:20 – 11:50\nAI@HLRS: The Past\, the Present\, and the Future\n      Dennis Hoppe\, HLRS\n      Abstract \n\n\n\n            \n        The growth of artificial intelligence (AI) is accelerating. AI has left research and innovation labs and nowadays plays a significant role in everyday life. 
The impact on society is tangible: autonomous cars produced by Tesla\, voice assistants such as Siri\, and AI systems that beat renowned champions in board games like Go. All these advancements are facilitated by powerful computing infrastructures based on HPC and advanced AI-specific hardware\, as well as highly optimized AI codes. For several years\, HLRS has been engaged in big data and AI-specific activities around HPC. In this talk\, I will give a brief overview of our AI-focused research project CATALYST to engage with researchers and industry\, present selected case studies\, and outline our journey over the last years with respect to the convergence of AI and HPC from both a software and hardware point of view.\n\n          \n \n \n\n\n\n11:50 – 12:00\nBreak\n\n\n12:00 – 13:00\nNEC Vector Engine Performance with Legacy CFD Codes\n      Keith Obenschain\, NRL (United States Naval Research Laboratory)\n      Abstract \n\n\n\n            \nMany codes that were developed during the vector supercomputing era from the 1970s to the 1990s are still in use\, with vector-friendly constructs in their codebase. The recently released NEC Vector Engine provides an opportunity to exploit this vector heritage and can potentially provide state-of-the-art performance without a complete rewrite of the codebase. Given the time and cost required to port or rewrite codes\, this is potentially an attractive solution. This presentation will assess how the NEC Vector Engine performance compares with existing architectures using traditional benchmarks and the legacy CFD program FDL3DI\, and the effort required to take full advantage of the architecture.\n        \n          \n \n \n\n\n\n13:00\nOpen Meeting
URL:https://www.wssp.hlrs.de/events/31st-workshop-on-sustained-simulation/
LOCATION:HLRS\, Nobelstraße 19\, Stuttgart\, Baden-Württemberg\, 70569\, Germany
ATTACH;FMTTYPE=image/png:https://www.wssp.hlrs.de/wp-content/uploads/2022/09/featured.png
ORGANIZER;CN="Mr. Johannes Gebert":MAILTO:gebert@hlrs.de
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20181009
DTEND;VALUE=DATE:20181011
DTSTAMP:20260508T071404
CREATED:20221205T170916Z
LAST-MODIFIED:20221205T172127Z
UID:213-1539043200-1539215999@www.wssp.hlrs.de
SUMMARY:28th Workshop on Sustained Simulation
DESCRIPTION:Agenda\n\n\n\n\n\nTuesday\, 9 October 2018\n\n\n\n\n9:00 – 9:15\nIntroduction\n      Michael Resch\, HLRS\, University of Stuttgart\n\n\n9:15 – 9:45\nExperiences with SX-Aurora TSUBASA and its extension for the future\n      Hiroaki Kobayashi\, Cyberscience Center\, Tohoku University\n      Abstract \n\n\n\n            \n        In my talk\, I would like to share with you some experiences with NEC’s new vector system named SX-Aurora TSUBASA. I will also present our ongoing project\, named Quantum Annealing-Assisted Next Generation HPC Infrastructure\, with the extension of SX-Aurora TSUBASA for the future.\n        \n          \n \n \n\n\n\n9:45 – 10:15\nUpdate on the HLRS Strategy\n      Michael Resch\, HLRS\n      Abstract \n\n\n\n            \n         This talk will summarize the research and development activities of HLRS and will highlight future activities and directions.\n         \n          \n \n \n\n\n\n10:15 – 10:45\nBreak\n\n\n10:45 – 11:15\nStatus of HPC in Siegen\n      Sabine Roller\, Simulationstechnik & Wissenschaftliches Rechnen\, Universität Siegen\n      Abstract \n\n\n\n \n \n\n\n\n11:15 – 11:45\nHPC\, HPDA\, Machine Learning\, Deep Learning… a glance at the evolution of traditional HPC centers\n      Bastian Koller\, HLRS\n      Abstract \n\n\n\n            \n         In this talk\, the current evolution of HPC from a single resource offering towards a set of solutions will be presented and analysed\, and some thoughts about the future use of such systems will be given.\n         \n          \n \n \n\n\n\n11:45 – 12:15\nData Reduction using the Singular Value Decomposition (SVD) Algorithm\n      Jing Zhang\, HLRS\n      Abstract \n\n\n\n            \n         This talk will present the Singular Value Decomposition (SVD) algorithm and give results for smaller data sets using SVD as a data reduction algorithm\, which can perform feature extraction on “raw” data.\n         \n          \n \n \n\n\n\n12:15 – 13:15\nLunch\n\n\n13:15 – 
13:45\nPorting Climate Models to Aurora TSUBASA\n      Panos Adamidis\, DKRZ\n      Abstract \n\n\n\n            \n         The main interest of the climate models running at DKRZ focuses on two major directions. On the one hand\, high-resolution grids are being used in order to resolve small-scale physical processes. In this way\, parametrisation and its inherent uncertainty can be avoided\, thus significantly improving climate change projections. On the other hand\, addressing questions related to climate variability also involves simulations running over long time periods\, e.g. modeling of a complete glacial cycle on grids with coarser resolution.\n  		 Such simulations are computationally very intensive\, and high sustained performance is vital in order to be able to conduct real-world experiments. Both single-node performance and good scaling capabilities of the soft- and hardware are important.\n  		 The NEC Aurora TSUBASA system promises high sustained performance by combining the high floating-point performance of vector processors with extremely high memory bandwidth. The presentation will show first results from our tests with earth system models on the NEC Aurora system at DKRZ.\n         \n          \n \n \n\n\n\n13:45 – 14:15\nPerformance evaluation and analysis of SX-Aurora TSUBASA\n      Kazuhiko Komatsu\, Cyberscience Center\, Tohoku University\n      Abstract \n\n\n\n            \n         A new vector supercomputer\, SX-Aurora TSUBASA\, has been released. It has a newly developed Vector Engine (VE) processor to achieve high sustained performance through powerful vector processing and high memory bandwidth. 
This presentation examines the basic potential of SX-Aurora TSUBASA through performance evaluations.\n         \n          \n \n \n\n\n\n14:15 – 14:45\nPerformance of a DNS code on SX-Aurora TSUBASA\n      Mitsuo Yokokawa\, Kobe University\n      Abstract \n\n\n\n           \n         Direct numerical simulations (DNSs) of turbulence at high Reynolds number are very important to understand the behavior of turbulent flow and to establish turbulence models. We have carried out large-scale DNSs for more than a decade on the Earth Simulator and the K computer\, and would like to execute DNSs with more grid points than at present on future supercomputers. In this talk\, the first evaluation results of the performance of a DNS code on SX-Aurora TSUBASA will be presented. In particular\, off-loaded I/O performance for checkpoints of instantaneous velocity fields from a vector engine (VE) to a vector host (VH)\, as well as computing performance on VEs\, will be included.\n         \n          \n \n \n\n\n\n14:45 – 15:15\nOrganizing MPI parallel Simulations\n      Harald Klimach\, Uni Siegen\n      Abstract \n\n\n\n            \n        MPI parallel simulations present some challenges when dealing with user interaction. We present:\n* a method to obtain configuration settings from Lua scripts in a scalable way\,\n* a strategy to manage logging output during the parallel execution with a configurable level of detail\,\n* a concept to deal with errors detected by the application at runtime.\nThese components are put together into a Fortran library to build a basic infrastructure for parallel simulation applications. 
It relies on Fypp as a pre-processing tool\, which allows the use of Python to generate Fortran code.\n\n15:15 – 15:45\nBreak\n\n15:45 – 16:15\nAccelerating Heatstroke Risk Simulation on Modern Vector Systems\n      Ryusuke Egawa\, Cyberscience Center\, Tohoku University\n      Abstract\nTBD\n\n16:15 – 16:45\nVector Engine Processor and 2D vector function\n      Shintaro Momose\, NEC Germany\n      Abstract\nTBD\n\n16:45 – 17:15\nAurora SW Update\n      Masashi Ikuta\, NEC Tokyo\n      Abstract\nTBD\n\n17:15 – 17:45\nLimits of sustained performance and remedies\n      Uwe Küster\, NEC Germany\n      Abstract\nWith limited processor frequency\, any performance increase results from parallelism. But the startup time needed to fill the operating units decreases the effective performance if the executed kernels are not large. We discuss this effect with measurements on Aurora. A remedy could be implementing bricks of kernels in hardware.\n\n19:00 –\nDinner in Goldener Adler\n\nWednesday\, 10 October 2018\n\n9:00 – 9:30\nDeep Neural Networks for Data-Driven Turbulence Models\n      Andrea Beck\, IAG Stuttgart\n      Abstract\nIn this talk\, we present a novel data-based approach to turbulence modelling for Large Eddy Simulation by artificial neural networks. We define the exact closure terms including the discretization operators and generate training data from direct numerical simulations of decaying homogeneous isotropic turbulence. We design and train artificial neural networks based on local convolution filters to predict the underlying unknown non-linear mapping from the coarse grid quantities to the closure terms without a priori assumptions. 
All investigated networks are able to generalize from the data and learn approximations with a cross correlation of up to 47% and even 73% for the inner elements\, leading to the conclusion that the current training success is data-bound. We further show that selecting both the coarse grid primitive variables and the coarse grid LES operator as input features significantly improves training results. Finally\, we construct a stable and accurate LES model from the learned closure terms. To this end\, we translate the model predictions into a data-adaptive\, pointwise eddy viscosity closure and show that the resulting LES scheme performs well compared to current state-of-the-art approaches. This work represents the starting point for further research into data-driven\, universal turbulence models.\n\n9:30 – 10:00\nAn object oriented multiphysics simulation concept for HPC\n      Matthias Meinke\, Lennart Schneiders\, Michael Schlottke-Lakemper\, Wolfgang Schroeder\, Institute of Aerodynamics\, RWTH Aachen University\, Aachen\, Germany\n      Abstract\nA simulation framework for a generalized multiphysics simulation concept is introduced which is based on an implementation of various solvers formulated for Cartesian hierarchical meshes. The implementation features a generalized mesh object which communicates with various solvers based on finite-volume or discontinuous Galerkin methods. Since all solution methods share a common mesh\, solution-adaptive meshes with dynamic load balancing are straightforward to implement. Interleaved time stepping of the solvers for the different physics allows an efficient implementation on HPC systems. 
Examples are presented for the coupling of various solution methods for flow simulations\, acoustic fields and level sets used for tracking moving surfaces.\n\n10:00 – 10:30\nMoving geometries in high-order discontinuous Galerkin discretization\n      Neda Ebrahimi Pour\, Uni Siegen\n      Abstract\nRepresenting geometries in high-order schemes is a crucial task with special requirements. An attractive solution to this problem is the employment of penalizing terms to represent the geometry within elements. This approach also allows for convenient movement of obstacles through flows\, for example\, as it avoids the need for expensive remeshing and interpolations. We present this concept in our high-order discontinuous Galerkin solver Ateles and show first results for compressible flows.\n\n10:30 – 11:00\nBreak\n\n11:00 – 11:30\nTowards Performance and Power Model for Multi-Core Processors with DVFS\n      Dmitry Khabi\, HLRS\n      Abstract\nA significant part of a homogeneous supercomputer consists of an extensive number of general-purpose processors (CPUs)\, which are connected with each other over a high-performance network. The degree of parallelism of these high-performance computing (HPC) platforms is generally limited by the number of processor cores. The scalability and performance enhancement almost exclusively stem from the growing number of CPU cores\, although that number no longer meets the constantly expanding HPC requirements. The growing number of CPU cores is to be seen not only as a simple increase in the number of cores but also as additional overhead in distributing the rest of the hardware resources\, such as the network\, last-level cache\, memory channels\, power budget\, etc.\, between the growing number of cores and hence also between the parallel processes and threads. 
Particularly in combination with the capability of the processor to change its operating voltage and frequency\, so-called “Dynamic Voltage and Frequency Scaling” (DVFS)\, the analysis of the scalability and energy efficiency of a multi-core processor is additionally complicated\, even on the basis of existing models such as the “Roofline Performance Model” or the “Execution-Cache-Memory Model” (ECM). The performance and power dissipation of the CPU and DRAM are in complex interaction with the number of active cores and the CPU frequency. This talk presents an extension of the ECM model (hereinafter referred to as DTM – “Data Transfer Model”)\, which describes the performance taking into consideration the various frequencies of the hardware components. The evaluation of the model using the “STREAM” kernels (with temporal memory access) is performed on different hardware architectures.\n\n11:30 – 12:00\nA method to reduce load imbalances in simulations of phase change processes with FS3D\n      Johannes Müller\, Martin Reitzle\, Institute of Aerospace Thermodynamics (ITLR)\, University of Stuttgart\, Philipp Offenhäuser\, High-Performance Computing Center Stuttgart (HLRS)\n      Abstract\nNumerical simulations of phase change processes require a precise reconstruction of the interface between two phases. Based on the Volume of Fluid (VoF) method for multiphase flows\, the height function technique is able to reconstruct the sharp interface accurately and enables simulations with complex interface deformations. But these calculations increase the computational load for cells containing the phase interface\, so an equidistant domain decomposition leads to an imbalanced workload distribution. In order to perform investigations with a high spatial and temporal resolution\, it is necessary to use the available HPC resources efficiently. The challenge of parallelization is to distribute the workload homogeneously among the cores. 
For simulations of solidification processes with the multiphase code Free Surface 3D (FS3D)\, a load-balanced domain decomposition is presented. The first part is the decomposition of the structured computational domain by recursive bisection. The second part is the corresponding process communication\, which enables nearest-neighbor communication through non-blocking MPI calls. The transport of the diagonal element is realized via a communication sequence\, and thus an exchange of small amounts of data is avoided. A measure for the load imbalance is presented based on test cases. Finally\, advantages and limitations of load balancing are discussed based on the tracing of the calculations for one timestep.\n\n12:00 – 12:30\nOptimization and Parallelization of a Phase-Field Solver to Investigate Sintering Processes\n      J. Hötzer\, H. Hierl\, M. Seiz\, M. Kellner\, B. Nestler\, IAM-CMS / KIT\n      Abstract\nSimulations make it possible to improve the development of new high-performance materials with tailored microstructures and defined properties. The process of sintering is of high interest for producing defined ceramic materials as needed for a broad range of applications\, e.g. healthcare\, electronics\, automotive and aerospace. The phase-field method allows efficient investigation of the microstructure evolution in large-scale 3D domains during the sintering process. A phase-field model based on the grand potential approach is implemented in the massively parallel phase-field solver framework PACE3D. It is optimized on various levels\, starting from the model and parameters down to the hardware. The solver can resolve and calculate an arbitrary number of individual particles in the green body by using a local reduction technique and a material-class-based parametrization concept. 
The evolution equations for the phase-fields and the concentration are explicitly vectorized using vector intrinsics. Performance results on a single core\, on a single node and with up to 96100 processes on the German supercomputers Hazel Hen\, SuperMUC and ForHLR II are shown and discussed. Besides an optimized voxel-based format to store the simulation data for checkpointing with MPI-IO\, an efficient and reduced mesh-based output is used.\n\n12:30 – 13:30\nLunch\n\n13:30 – 14:00\nThe potential of MPI shared-memory model for supporting hybrid communication scheme\n      Huan Zhou\, HLRS\n      Abstract\nThis talk first introduces the MPI shared-memory model and then illustrates its usage via two use cases. One is hybrid RMA\, which is fully employed in DART-MPI. The other is hybrid collectives (to be specific\, the allgather operation). These two use cases highlight the performance benefit brought by applying the MPI shared-memory model. However\, extra synchronization operations have to be added to guarantee deterministic behaviour.\n\n14:00 – 14:30\nFine-Grained Synchronization using Global Task Dependencies in DASH\n      J. Schuchart\, J. Gracia\, HLRS\n      Abstract\nThe current usage of MPI communication operations leads to a global synchronization across many processes and compute nodes. The problem becomes more severe when combining MPI with a thread-parallel programming model such as OpenMP: synchronization latencies are paid manyfold by all threads within an MPI process. We present ongoing work to address this problem by implementing a task-based programming model which allows dependencies to be expressed across MPI processes. 
This kind of fine-grained synchronization can replace global MPI synchronization in many cases and thus result in substantially improved communication efficiency.\n\n14:30 – 15:00\nAutomatic Parameter Tuning for Efficient Checkpointing\n      Hiroyuki Takizawa\, Muhammad Alfian Amrizal\, Kazuhiko Komatsu\, and Ryusuke Egawa\, Graduate School of Information Sciences\, Tohoku University\n      Abstract\nOne of the most intensive I/O operations of a scientific simulation is so-called checkpointing\, which saves the state of a running simulation into a checkpoint file so that the simulation can be resumed from the file upon a system failure. Generally\, it is difficult to increase the I/O performance of a system at the same pace as its computational performance. Moreover\, a future system will need to perform checkpointing more frequently\, because it will consist of more hardware components and hence the probability of encountering a system failure during the simulation will significantly increase. As a result\, the relative overhead of checkpointing will grow and could dominate the total simulation time. In this talk\, therefore\, I will discuss the possibility and potential benefit of employing automatic parameter tuning for reducing checkpointing overheads.\n\n15:00 – 15:30\nBreak\n\n15:30 – 16:00\nNEC SX-Aurora TSUBASA and the LLVM ecosystem\n      Simon Moll\, Compiler Design Lab\, Saarland University\n      Abstract\nThe NEC SX-Aurora TSUBASA is a high-performance vector CPU for sustained simulation performance. The existing compiler toolchain for the SX-Aurora is comprehensive but also proprietary\, restricting its use in research and confining its development to internal teams at NEC. 
In recent years\, the open-source LLVM compiler infrastructure has seen significant support and contributions from major players such as NVIDIA\, AMD\, ARM\, Intel\, Apple and Google. These employ LLVM in their official toolchains\, GPU driver stacks and mission-critical infrastructure. Likewise\, many compiler research labs have adopted LLVM for its accessibility\, robustness and permissive license. Recently\, the LLVM community has been discussing an extension for scalable vector architectures (LLVM-SVE)\, which feature an active vector length just as the SX-Aurora does. In this talk\, we will discuss the potential of LLVM for the NEC SX-Aurora. The Compiler Design Lab at Saarland University is working with NEC on an LLVM-SVE backend for the SX-Aurora.\n\n16:00 – 16:30\nApproach to provide supercomputer storage I/O information to users\n      Tsuyoshi Nakagawa\, JAMSTEC\n      Abstract\nThe role of supercomputer storage at JAMSTEC is becoming increasingly important not only for simulation but also for data-driven science. Information such as the storage I/O performance and the file system characteristics may improve usability for users and enable new scientific knowledge and innovation. In this talk\, we introduce our approach to providing I/O information to users.\n\n16:30 – 17:00\nJob Scheduler Simulator Extension For Evaluating Queue Mapping to Computing Node\n      Susumu Date\, Yuki Matsui\, Yasuhiro Watashiba\, Tatashi Yoshikawa\, Shinji Shimojo\, Cybermedia Center\, Osaka University\n      Abstract\nTBD\n\n17:00 – 17:30\nFarewell
URL:https://www.wssp.hlrs.de/events/28th-workshop-on-sustained-simulation/
LOCATION:HLRS\, Nobelstraße 19\, Stuttgart\, Baden-Württemberg\, 70569\, Germany
ATTACH;FMTTYPE=image/png:https://www.wssp.hlrs.de/wp-content/uploads/2022/09/featured.png
ORGANIZER;CN="Mr. Johannes Gebert":MAILTO:gebert@hlrs.de
END:VEVENT
END:VCALENDAR