Research
At present, I'm carrying out my research activities as part of my post-doctorate with the Camus (Inria) and ICPS (ICube) teams, where I am working on the automatic parallelization of C/C++ source code by task insertion (see Automatic parallelization through task insertion). However, to this day, most of my research activities have taken place as part of my doctoral thesis at the University of Bordeaux within the Concace team. My thesis was focused on the development of methods for solving large coupled FEM/BEM sparse/dense linear systems arising from aeroacoustic problems (see Solvers for coupled FEM/BEM linear systems). That being said, I took my first steps towards computer science research during my two internships within the Camus and the ICPS teams, focusing on a new programming structure, called XFOR, and the associated software tools (see XFOR programming structure).
Automatic parallelization through task insertion
Today, multi-core processors can be found in most computers, from cell phones to high-performance computing clusters. Nevertheless, designing and writing parallel applications that make efficient use of these numerous computing resources often remains the prerogative of experts with the technical knowledge and time required to optimize their software. Numerous compilation studies have addressed this issue and proposed different approaches to automatic parallelization. These are traditionally oriented towards loop nest parallelization, based on the polyhedral model. We are interested in task-based parallel programming. In this model, the program is broken down into tasks comprising one or more instructions. Tasks are associated with dependency constraints to enable an execution engine to schedule and execute tasks in the right order, i.e. respecting any dependencies between tasks accessing the same data in memory. In this way, tasks that are independent of each other can be executed in parallel. The challenge in optimizing this approach is to find an efficient way of breaking down the initial program into tasks. In this context, we are working on a tool for the automatic parallelization of C/C++ source code, enabling us to analyze a sequential input program, find an efficient task-based parallelism model and insert tasks at the right places in the input program. Having implemented the preliminary analysis phases, notably the analysis of dependencies between different instructions in the input program, we are now exploring different strategies for automatically inserting tasks into the program.
To implement our approach, we rely on the OptiTrust source-to-source code transformation tool, to which we add the functionalities needed to analyze and transform the original program. We work closely with the authors and contributors of the OptiTrust project, in particular Arthur Chargéraud (Inria) and Thomas Koehler (Inria).
Running tutorials
- Towards a reproducible research study
- ComPAS 2023 conferece @ Annecy, France, July 2023
- Self-contained tutorial
- How to use Org mode and Guix to build a reproducible experimental study
- Workshop on Reproducible Software Environments for Research and High-Performance @ Montpellier, France, November 2023
- Slides
Participation on program committees
- Artifact evaluation committee
- International Symposium on Code Generation and Optimization (CGO) 2024 @ Edinburgh, United Kingdom, March 2024
- Committee members
- Peer-review
- Parallel Processing Letters (PPL) journal, January 2024
- Reproducibility committee (upcoming)
- International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) 2024 @ Atlanta, Georgia, United States, November 2024
- Committee members
Solvers for coupled FEM/BEM linear systems
In the aeronautical industry, aeroacoustics is used to model the propagation of sound waves in the airflow enveloping an aircraft in flight. It is then possible to simulate the noise produced by an aircraft at ground level during take-off and landing, to ensure compliance with environmental standards and to enable the design of future aircraft models. Unlike most other complex physics simulations, the method consists in solving coupled linear sparse/dense systems. To produce a realistic result, the number of unknowns in the system can be extremely large, making its resolution a major challenge. My thesis [1] focused on the design and evaluation of algorithms for solving large linear systems of this kind.
On the one hand, we proposed algorithms using the existing programming interface (API) of fully-featured and well-optimized sparse and dense direct solvers such as MUMPS 1, HMAT 2 and SPIDO 3. Thanks to these algorithms, we were able to bypass the major shortcomings of a basic usage of these solvers and take full advantage of their advanced features such as numerical compression, out-of-core computation and distributed memory parallelism. In summary, compared with a state-of-the-art reference approach, the proposed algorithms allow for processing up to 7× larger coupled FEM/BEM systems on a single shared-memory multi-core machine, and more than 6.5× larger coupled FEM/BEM systems in a distributed-memory environment.
In the research report [2], we began by formalizing and benchmarking existing implementations at Airbus. Following the first new developments, I presented a preliminary study of the proposed algorithms [3] at ComPAS 2021. This national-level peer-reviewed conference is ideally suited for PhD students seeking detailed feedback on their preliminary work. The absence of conference proceedings is voluntary and allows future submission of the work to an international journal or conference, for example.
Indeed, we subsequently published our final study of all the proposed shared-memory algorithms in [4], which I presented at the IPDPS 2022 international conference. This is one of the major events in the field, the reputation of which is, and rightly so, well recognized by existing rankings, and in which it is important to publish.
We also conducted a multi-metric study of the proposed algorithms, including energy consumption, memory usage and number of floating-point operations [5]. I presented this study at SBAC-PAD 2022. Firstly, the study confirmed the interest in numerical compression and out-of-core computation. Next, profiles of processor and memory power consumption, memory usage and the number of floating-point operations gave us a better understanding of the application's behavior. Finally, the study revealed a major bottleneck in our implementation, as well as a potential load-balancing problem in the sparse direct solver.
We then briefly presented our work in the short paper [6] published at the Waves 2022 conference 4.
Finally, the study of these algorithms carried out in a distributed memory environment and presented in the thesis is currently the subject of an article being prepared for publication.
The methods developed have been implemented and included in Airbus proprietary software based on the MUMPS, SPIDO and HMAT solvers.
On the other hand, we evaluated an alternative solver API reyling on a coupling of direct task-based solvers using the same runtime. A customized API allows us to improve composability and to simplify data exchange between solvers for a more efficient use of computational resources. While the introduction of such substantial changes in fully-featured community-driven solvers can only be considered in the long term due to the complexity of their source code (a few hundred thousand lines of code), we were able to implement a proof of concept of this approach in a reduced prototype. A preliminary comparative experimental study validated our implementation, confirming that it can achieve the targeted solution accuracy. In addition, we have illustrated the potential benefits of an asynchronous task execution and shown that even a proof-of-concept of this approach can compete with previously proposed methods as well as those in the state of the art.
This work was the fruit of a collaboration with Alfredo Buttari (CNRS, IRIT). In May 2022, I spent a week in Toulouse with the aim of working on the incorporation of necessary changes in the sparse direct solver qrmumps we have used and which is developed by Alfredo Buttari.
I had the opportunity to submit an abstract [7] and present this work at ComPAS 2023.
In addition to the main contribution, we devoted a significant effort to the reproducibility of our work. To this end, we have explored the principles of literate programming and associated software tools to ensure reproducibility of experimental environments and the numerical experiments themselves running on different machines and spanning over extended periods of time. The thesis itself contains a chapter dedicated to this subject.
Moreover, the technical report [8] published as a companion to the study [2] and describing its literate and reproducible environment represents our first complete work implementing the reproducibility principles studied. Subsequently, these community activity reports [9], [10] document our ongoing efforts. In this context, I also contributed to the GNU Guix package manager, helping to package new software and localize GNU Guix and its documentation into Slovak.
Publications
Author names always appear in alphabetical order.
Talks
- Solution of larger coupled sparse/dense linear systems in an industrial
aeroacoustic context
- CAMUS team seminar @ ICube lab, Illkirch-Graffenstaden, France, June 2022
- Slides
- Direct solution of larger coupled sparse/dense FEM/BEM linear systems using
low-rank compression
- Sparse Days 2022 @ St-Girons, France, June 2022
- Slides
- Reconciling high-performance computing with the use of third-party
libraries?
- with E. Agullo
- Datamove team seminar, held virtually, May 2022
- Slides
- An energy consumption study of coupled solvers for FEM/BEM linear systems:
preliminary results
- SOLHARIS plenary meeting @ Inria Bordeaux Sud-Ouest, Bordeaux, France, February 2022
- Slides
- Towards memory-aware multi-solve two-stage solver for coupled FEM/BEM
systems
- SOLHARIS plenary meeting, held virtually, July 2021
- Slides
- Coupled solvers for high-frequency aeroacoustics
- Doctoral school days, held virtually, May 2021
- Poster
- Guix et Org mode, deux amis du doctorant sur le chemin
vers une thèse reproductible
- Atelier reproductibilité des environnements logiciels, held virtually, May 2021
- Slides
- Video recording
- Coupled solvers for FEM/BEM linear systems arising from discretization of
aeroacoustic problems
- HiePACS team work group, held virtually, April 2021
- Slides
- A preliminary comparative study of solvers for coupled FEM/BEM linear
systems in a reproducible environment
- SOLHARIS plenary meeting, held virtually, December 2020
- Slides
Running tutorials
- Guix and Org mode, a powerful association for building a reproducible
research study
- Seminar and hands-on session @ Inria Grand-Est, Nancy, France, June 2022
- Slides
- Hands-on session
XFOR programming structure
The work I carried out within my Master's thesis [11] was related to the field of program optimization, and in particular for-loops, using the polyhedral model. More specifically, I worked on the XFOR programming structure [12]. Its syntax is very similar to that of standard for-loops in the C language. However, it allows several for-loops to be grouped together and managed at the same time. Thanks to its two specific parameters, grain and offset, the programmer can change the way these loops are executed in a simpler, more intuitive way.
The goal is to adjust the order of execution of instructions within managed loops, so as to highlight the possibilities offered by modern computer architectures. The idea is to optimize the use of cache memory and to exploit the parallelization capabilities of processors. A program rewritten using XFOR can be up to 6× faster compared to its original version.
One of the main tools dedicated to this structure is the "Iterate, But Better!" (IBB) compiler, which translates XFOR loops into equivalent for-loops. This way, the translated program can be compiled by any C compiler. The need to compile an XFOR program twice was one of the reasons for integrating the XFOR structure into a production compiler such as Clang/LLVM. This would allow direct compilation of XFOR programs and could help to promote the structure within the programming community.
As part of my internship, I extended the lexical and syntax analyzers of the Clang/LLVM compiler, so that it could recognize and correctly translate XFOR loops. I also implemented the transformation of XFOR loops into the intermediate representation used by the compiler (aka. LLVM IR) to produce executable files. By the end of the internship, the extended Clang/LLVM was able to compile programs with both simple and nested XFOR loops.
To make XFOR programs even more powerful, I then focused on strategies for parallelizing XFOR loops using threads in order to allow for exploring of a more coarse-grained parallelism. In particular, I studied the use of the OpenMP parallelization library in XFOR programs.
Publications
Talks
- XFOR loops, integration into the Clang/LLVM compiler and extenstion to
parallel programming
- Software corner @ ICube UMR 7357, Illkirch-Graffenstaden, France, June 2019
- Slides
- On the XFOR programming structure
- Introduction to research for Bachelor's degree students @ Faculty of Computer science, University of Strasbourg, France, April 2019
References
Footnotes
sparse direct solver, https://mumps-solver.org/
dense direct solver, https://theses.hal.science/tel-01244260/
proprietary dense direct solver developped internally at Airbus
International Conference on Mathematical and Numerical Aspects of Wave Propagation