Marek Felšöci


At present, I'm carrying out my research activities as part of my post-doctorate with the Camus (Inria) and ICPS (ICube) teams, where I am working on the automatic parallelization of C/C++ source code by task insertion (see Automatic parallelization through task insertion). However, to this day, most of my research activities have taken place as part of my doctoral thesis at the University of Bordeaux within the Concace team. My thesis was focused on the development of methods for solving large coupled FEM/BEM sparse/dense linear systems arising from aeroacoustic problems (see Solvers for coupled FEM/BEM linear systems). That being said, I took my first steps towards computer science research during my two internships within the Camus and the ICPS teams, focusing on a new programming structure, called XFOR, and the associated software tools (see XFOR programming structure).

Automatic parallelization through task insertion

Today, multi-core processors can be found in most computers, from cell phones to high-performance computing clusters. Nevertheless, designing and writing parallel applications that make efficient use of these numerous computing resources often remains the prerogative of experts with the technical knowledge and time required to optimize their software. Numerous compilation studies have addressed this issue and proposed different approaches to automatic parallelization. These are traditionally oriented towards loop nest parallelization, based on the polyhedral model. We are interested in task-based parallel programming. In this model, the program is broken down into tasks comprising one or more instructions. Tasks are associated with dependency constraints to enable an execution engine to schedule and execute tasks in the right order, i.e. respecting any dependencies between tasks accessing the same data in memory. In this way, tasks that are independent of each other can be executed in parallel. The challenge in optimizing this approach is to find an efficient way of breaking down the initial program into tasks. In this context, we are working on a tool for the automatic parallelization of C/C++ source code, enabling us to analyze a sequential input program, find an efficient task-based parallelism model and insert tasks at the right places in the input program. Having implemented the preliminary analysis phases, notably the analysis of dependencies between different instructions in the input program, we are now exploring different strategies for automatically inserting tasks into the program.

To implement our approach, we rely on the OptiTrust source-to-source code transformation tool, to which we add the functionalities needed to analyze and transform the original program. We work closely with the authors and contributors of the OptiTrust project, in particular Arthur Chargéraud (Inria) and Thomas Koehler (Inria).

Running tutorials

Participation on program committees

Solvers for coupled FEM/BEM linear systems

In the aeronautical industry, aeroacoustics is used to model the propagation of sound waves in the airflow enveloping an aircraft in flight. It is then possible to simulate the noise produced by an aircraft at ground level during take-off and landing, to ensure compliance with environmental standards and to enable the design of future aircraft models. Unlike most other complex physics simulations, the method consists in solving coupled linear sparse/dense systems. To produce a realistic result, the number of unknowns in the system can be extremely large, making its resolution a major challenge. My thesis [1] focused on the design and evaluation of algorithms for solving large linear systems of this kind.

On the one hand, we proposed algorithms using the existing programming interface (API) of fully-featured and well-optimized sparse and dense direct solvers such as MUMPS 1, HMAT 2 and SPIDO 3. Thanks to these algorithms, we were able to bypass the major shortcomings of a basic usage of these solvers and take full advantage of their advanced features such as numerical compression, out-of-core computation and distributed memory parallelism. In summary, compared with a state-of-the-art reference approach, the proposed algorithms allow for processing up to 7× larger coupled FEM/BEM systems on a single shared-memory multi-core machine, and more than 6.5× larger coupled FEM/BEM systems in a distributed-memory environment.

In the research report [2], we began by formalizing and benchmarking existing implementations at Airbus. Following the first new developments, I presented a preliminary study of the proposed algorithms [3] at ComPAS 2021. This national-level peer-reviewed conference is ideally suited for PhD students seeking detailed feedback on their preliminary work. The absence of conference proceedings is voluntary and allows future submission of the work to an international journal or conference, for example.

Indeed, we subsequently published our final study of all the proposed shared-memory algorithms in [4], which I presented at the IPDPS 2022 international conference. This is one of the major events in the field, the reputation of which is, and rightly so, well recognized by existing rankings, and in which it is important to publish.

We also conducted a multi-metric study of the proposed algorithms, including energy consumption, memory usage and number of floating-point operations [5]. I presented this study at SBAC-PAD 2022. Firstly, the study confirmed the interest in numerical compression and out-of-core computation. Next, profiles of processor and memory power consumption, memory usage and the number of floating-point operations gave us a better understanding of the application's behavior. Finally, the study revealed a major bottleneck in our implementation, as well as a potential load-balancing problem in the sparse direct solver.

We then briefly presented our work in the short paper [6] published at the Waves 2022 conference 4.

Finally, the study of these algorithms carried out in a distributed memory environment and presented in the thesis is currently the subject of an article being prepared for publication.

The methods developed have been implemented and included in Airbus proprietary software based on the MUMPS, SPIDO and HMAT solvers.

On the other hand, we evaluated an alternative solver API reyling on a coupling of direct task-based solvers using the same runtime. A customized API allows us to improve composability and to simplify data exchange between solvers for a more efficient use of computational resources. While the introduction of such substantial changes in fully-featured community-driven solvers can only be considered in the long term due to the complexity of their source code (a few hundred thousand lines of code), we were able to implement a proof of concept of this approach in a reduced prototype. A preliminary comparative experimental study validated our implementation, confirming that it can achieve the targeted solution accuracy. In addition, we have illustrated the potential benefits of an asynchronous task execution and shown that even a proof-of-concept of this approach can compete with previously proposed methods as well as those in the state of the art.

This work was the fruit of a collaboration with Alfredo Buttari (CNRS, IRIT). In May 2022, I spent a week in Toulouse with the aim of working on the incorporation of necessary changes in the sparse direct solver qrmumps we have used and which is developed by Alfredo Buttari.

I had the opportunity to submit an abstract [7] and present this work at ComPAS 2023.

In addition to the main contribution, we devoted a significant effort to the reproducibility of our work. To this end, we have explored the principles of literate programming and associated software tools to ensure reproducibility of experimental environments and the numerical experiments themselves running on different machines and spanning over extended periods of time. The thesis itself contains a chapter dedicated to this subject.

Moreover, the technical report [8] published as a companion to the study [2] and describing its literate and reproducible environment represents our first complete work implementing the reproducibility principles studied. Subsequently, these community activity reports [9], [10] document our ongoing efforts. In this context, I also contributed to the GNU Guix package manager, helping to package new software and localize GNU Guix and its documentation into Slovak.


Author names always appear in alphabetical order.

M. Felšöci, “Fast solvers for high-frequency aeroacoustics - Solveurs rapides pour l’aéroacoustique haute-fréquence,” Thesis, Université de Bordeaux, 2023 [Online]. Available:
E. Agullo, M. Felšöci, and G. Sylvand, “A comparison of selected solvers for coupled FEM/BEM linear systems arising from discretization of aeroacoustic problems,” Inria Bordeaux Sud-Ouest, Research Report RR-9412, 2021 [Online]. Available:
E. Agullo, M. Felšöci, and G. Sylvand, “Comparison of coupled solvers for FEM/BEM linear systems arising from discretization of aeroacoustic problems,” in COMPAS 2021 - Conférence francophone d’informatique en Parallélisme, Architecture et Système, 2021 [Online]. Available:
E. Agullo, M. Felšöci, and G. Sylvand, “ Direct solution of larger coupled sparse/dense linear systems using low-rank compression on single-node multi-core machines in an industrial context ,” in 2022 ieee international parallel and distributed processing symposium (ipdps), 2022, pp. 25–35, doi: 10.1109/IPDPS53621.2022.00012 [Online]. Available:
E. Agullo, M. Felšöci, A. Guermouche, H. Mathieu, G. Sylvand, and B. Tagliaro, “Study of the processor and memory power and energy consumption of coupled sparse/dense solvers,” in 2022 ieee 34th international symposium on computer architecture and high performance computing (sbac-pad), 2022, pp. 211–220, doi: 10.1109/SBAC-PAD55451.2022.00032.
T. Abboud, B. Chaigne, M. Felšöci, A. Piche, and G. Sylvand, “Numerical Modelisation of Interaction between Plasma Thruster Plume and Antennas on Small Satellites,” in Waves 2022 - International Conference on Mathematical and Numerical Aspects of Wave Propagation, 2022.
E. Agullo, A. Buttari, M. Felšöci, and G. Sylvand, “Vers un solveur direct à base de tâches pour des systèmes linéaires FEM/BEM creux/denses,” in Conférence francophone d’informatique en Parallélisme, Architecture et Système (ComPAS), Jul. 2023 [Online]. Available:
E. Agullo, M. Felšöci, and G. Sylvand, “A comparison of selected solvers for coupled FEM/BEM linear systems arising from discretization of aeroacoustic problems: literate and reproducible environment,” Inria Bordeaux Sud-Ouest, Technical Report RT-0513, 2021 [Online]. Available:
P.-A. Bouttier, L. Courtès, Y. Dupont, M. Felšöci, F. Gruber, K. Hinsen, A. Isaac, P. Prins, P. Swartvagher, S. Tournier, and R. Wurmus, “Guix-HPC Activity Report 2020-2021,” Inria Bordeaux - Sud-Ouest ; Université Grenoble - Alpes ; Université Paris, Technical Report, 2022 [Online]. Available:
C. Acary-Robert, L. Courtès, Y. Dupont, M. Felšöci, K. Hinsen, O. Lünsdorf, P. Prins, P. Swartvagher, S. Tournier, and R. Wurmus, “Guix-HPC Activity Report 2021-2022,” Inria Bordeaux - Sud-Ouest ; Université Grenoble - Alpes ; Université Paris, Technical Report, 2023 [Online]. Available:


  • Solution of larger coupled sparse/dense linear systems in an industrial aeroacoustic context
    • CAMUS team seminar @ ICube lab, Illkirch-Graffenstaden, France, June 2022
    • pdf.png Slides
  • Direct solution of larger coupled sparse/dense FEM/BEM linear systems using low-rank compression
  • Reconciling high-performance computing with the use of third-party libraries?
  • An energy consumption study of coupled solvers for FEM/BEM linear systems: preliminary results
  • Towards memory-aware multi-solve two-stage solver for coupled FEM/BEM systems
  • Coupled solvers for high-frequency aeroacoustics
    • Doctoral school days, held virtually, May 2021
    • pdf.png Poster
  • fr.png Guix et Org mode, deux amis du doctorant sur le chemin vers une thèse reproductible
  • Coupled solvers for FEM/BEM linear systems arising from discretization of aeroacoustic problems
  • A preliminary comparative study of solvers for coupled FEM/BEM linear systems in a reproducible environment

Running tutorials

  • Guix and Org mode, a powerful association for building a reproducible research study

XFOR programming structure

The work I carried out within my Master's thesis [11] was related to the field of program optimization, and in particular for-loops, using the polyhedral model. More specifically, I worked on the XFOR programming structure [12]. Its syntax is very similar to that of standard for-loops in the C language. However, it allows several for-loops to be grouped together and managed at the same time. Thanks to its two specific parameters, grain and offset, the programmer can change the way these loops are executed in a simpler, more intuitive way.

The goal is to adjust the order of execution of instructions within managed loops, so as to highlight the possibilities offered by modern computer architectures. The idea is to optimize the use of cache memory and to exploit the parallelization capabilities of processors. A program rewritten using XFOR can be up to 6× faster compared to its original version.

One of the main tools dedicated to this structure is the "Iterate, But Better!" (IBB) compiler, which translates XFOR loops into equivalent for-loops. This way, the translated program can be compiled by any C compiler. The need to compile an XFOR program twice was one of the reasons for integrating the XFOR structure into a production compiler such as Clang/LLVM. This would allow direct compilation of XFOR programs and could help to promote the structure within the programming community.

As part of my internship, I extended the lexical and syntax analyzers of the Clang/LLVM compiler, so that it could recognize and correctly translate XFOR loops. I also implemented the transformation of XFOR loops into the intermediate representation used by the compiler (aka. LLVM IR) to produce executable files. By the end of the internship, the extended Clang/LLVM was able to compile programs with both simple and nested XFOR loops.

To make XFOR programs even more powerful, I then focused on strategies for parallelizing XFOR loops using threads in order to allow for exploring of a more coarse-grained parallelism. In particular, I studied the use of the OpenMP parallelization library in XFOR programs.


M. Felšöci, “Integration of the XFOR programming structure into the Clang/LLVM compiler and extension to parallel programming,” Université de Strasbourg, Master’s thesis, Aug. 2019 [Online]. Available:


  • XFOR loops, integration into the Clang/LLVM compiler and extenstion to parallel programming
    • Software corner @ ICube UMR 7357, Illkirch-Graffenstaden, France, June 2019
    • pdf.png Slides
  • On the XFOR programming structure
    • Introduction to research for Bachelor's degree students @ Faculty of Computer science, University of Strasbourg, France, April 2019


E. Violard, P. Clauss, and I. Fassi, “Xfor: Semantics and Performance,” Team ICPS (ICube Laboratory), Research Report, Dec. 2014 [Online]. Available:



sparse direct solver,


proprietary dense direct solver developped internally at Airbus


International Conference on Mathematical and Numerical Aspects of Wave Propagation

Last update on 11/02/2024

This site is proudly powered by Org mode for Emacs on the servers of Websupport, spol. s r. o.

Source code of the site is publicly available on GitHub.

Featured icons come from the open-source sets Chicago95 and flag-icons.

Content is available under the Creative Commons BY NC ND 4.0 International license unless otherwise stated.

Creative Commons License