**Minimum fill-in: Inapproximability and almost tight lower bounds,**

by Yixin **Cao** and R. B. **Sandeep** (url).

The minimum fill-in problem is one of the core problems in sparse direct methods. This problem is NP-complete [6]. The NP-completeness of the problem was first conjectured to be true in 1976 by Rose, Tarjan, and Lueker [5] in terms of the elimination process on undirected graphs. Then Rose and Tarjan [4] proved in 1978 that finding an elimination ordering on a directed graph that gives minimum fill-in is NP-complete (there was apparently a glitch in the proof which was rectified by Gilbert [1] two years later). Finally, Yannakakis [6] proved the NP-completeness of the minimum fill-in problem on undirected graphs in 1981.

The sparse matrix community needs a heuristic to tackle the minimum fill-in problem, as direct methods are used in many applications. There are quite efficient and effective heuristics for this purpose, which are variants of the approximate minimum degree and incomplete nested dissection algorithms. These heuristics are efficiently implemented in widespread software. The mentioned heuristics do not have any performance guarantees (except for classes of graphs such as graphs with good separators), but they are very well established. So there is an NP-complete problem, and practitioners have heuristics that they are happy with.
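To make the elimination process concrete, here is a minimal Python sketch that counts the fill edges produced by a given elimination ordering and computes a plain minimum degree ordering. It is the textbook greedy rule, not the approximate minimum degree algorithm used in production codes; the function names and the dictionary-of-sets graph representation are assumptions made for this illustration.

```python
from itertools import combinations

def fill_in(adj, order):
    """Count the fill edges created by eliminating vertices in the given order.

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    fill = 0
    for v in order:
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        # Eliminating v makes its remaining neighbors pairwise adjacent.
        for a, b in combinations(sorted(nbrs), 2):
            if b not in g[a]:
                g[a].add(b)
                g[b].add(a)
                fill += 1
    return fill

def minimum_degree_order(adj):
    """Greedy ordering: always eliminate a vertex of smallest current degree."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while g:
        v = min(g, key=lambda x: (len(g[x]), x))  # ties broken by label
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        for a, b in combinations(nbrs, 2):
            g[a].add(b)
            g[b].add(a)
        order.append(v)
    return order

# A star with center 0: eliminating the center first connects all leaves.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(fill_in(star, [0, 1, 2, 3]))               # center first
print(fill_in(star, minimum_degree_order(star)))  # leaves first
```

On the star, eliminating the center first creates a fill edge between every pair of leaves, while the minimum degree ordering eliminates leaves first and creates none.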

On the other hand, the minimum fill-in problem poses many challenges to the theoreticians. Natanzon, Shamir, and Sharan [2] obtained a polynomial-time algorithm that approximates the minimum fill-in by bounding the fill by a polynomial in the optimum. The paper by Cao and Sandeep shows that unless P=NP, there is no polynomial time approximation scheme (PTAS) for the minimum fill-in problem. A PTAS is an algorithm which takes an instance of an optimization problem and a parameter $latex \epsilon > 0$ and, in polynomial time, produces a solution that is within a factor $latex 1+\epsilon$ of the optimum. This is bad news in a way: no simple heuristic with a performance guarantee for theoretically oriented practitioners is in view.

The paper by Cao and Sandeep contains two other theorems, ruling out algorithms with approximation and with a running time , and exact algorithms with running time in , assuming exponential time hypothesis (ETH). Here is again the number of vertices, is an integer parameterizing the fill-in, and is a small positive constant. ETH posits that the satisfiability problem with at most three variables per clause cannot be solved in time, where and denote the number of clauses and variables, respectively, in the problem.

If you are interested in these problems, then see the tree-width and minimum fill-in challenges. The minimum tree-width is related to the shortest elimination tree over all elimination orderings of an undirected graph, which one of us had proved to be NP-complete [3].

- John R. Gilbert, A note on the NP-completeness of vertex elimination on directed graphs, SIAM Journal on Algebraic and Discrete Methods, 1 (1980), pp. 292–294 (link).
- Assaf Natanzon, Ron Shamir, and Roded Sharan, A polynomial approximation for the minimum fill-in problem, SIAM Journal on Computing, 30, (2000), pp. 1067–1079 (link).
- Alex Pothen, The complexity of optimal elimination trees, Tech. Report, Department of Computer Science, Penn State University, 1988 (link).
- Donald J. Rose and Robert E. Tarjan, Algorithmic aspects of vertex elimination in directed graphs, SIAM Journal on Applied Mathematics, 34 (1978), pp. 176–197 (link).
- Donald J. Rose, Robert E. Tarjan, and George S. Lueker, Algorithmic aspects of vertex elimination on graphs, SIAM Journal on Computing, 5 (1976), pp. 266–283 (link).
- Mihalis Yannakakis, Computing the minimum fill-in is NP-complete, SIAM Journal on Algebraic and Discrete Methods, 2 (1981), pp. 77–79 (link).

As noted in the citation of the award, this paper brings together recent half-approximation algorithms for weighted matchings and the classical Gale-Shapley and McVitie-Wilson algorithms for the Stable Marriage problem. This is helpful in two ways.

First, the authors show that a recent half-approximation algorithm for computing greedy matchings, the Suitor algorithm and its variants, is equivalent to the classical algorithms for the Stable Marriage problem. This correspondence enables the authors to propose multi-core and many-core parallel algorithms for the Stable Marriage problem, based on the greedy matching algorithms. This way, if I am not mistaken, they develop the first GPU algorithm for the Stable Marriage problem.

Second, the extensive theoretical work on the Stable Marriage algorithms explains the behavior of the Suitor matching algorithm. The worst-case number of proposals in the Suitor algorithm can be $latex O(n^2)$, where $latex n$ is the number of vertices in the bipartite graph. However, if the weights are assigned randomly, the expected number of proposals is $latex O(n \log n)$, which follows from a classical analysis of Donald Knuth [1, 2].

This work represents a nice marriage between theory and practice: the practical algorithms for the greedy matching help in designing parallel algorithms for the Stable Marriage problem, and the theoretical understanding of the Stable Marriage problem sheds light into the behavior of the greedy weighted matching algorithms.

**Stable marriage problem:** This is described on a complete bipartite graph. Each vertex on the left has a total ranking of the vertices on the right; similarly, each vertex on the right has a total ranking of all vertices on the left. The aim is to find a matching such that no left vertex and right vertex would both obtain a higher ranked partner if they were to abandon their current partners in the matching and rematch with each other.
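As an illustration of this definition, the following Python sketch runs the classical left-proposing Gale-Shapley procedure and then checks stability by searching for a blocking pair. The function names and the list-of-lists preference format are assumptions of this sketch, not taken from the paper.

```python
def gale_shapley(left_prefs, right_prefs):
    """Left-proposing Gale-Shapley. Returns a dict: right vertex -> left vertex.

    left_prefs[l] lists the right vertices in l's order of preference;
    right_prefs[r] lists the left vertices in r's order of preference.
    """
    n = len(left_prefs)
    # rank[r][l] = position of l in r's list (smaller is better)
    rank = [{l: i for i, l in enumerate(prefs)} for prefs in right_prefs]
    next_choice = [0] * n   # index of the next right vertex each l proposes to
    partner = [None] * n    # partner[r] = left vertex currently held by r
    free = list(range(n))
    while free:
        l = free.pop()
        r = left_prefs[l][next_choice[l]]
        next_choice[l] += 1
        held = partner[r]
        if held is None:
            partner[r] = l
        elif rank[r][l] < rank[r][held]:
            partner[r] = l      # r trades up; the old partner is free again
            free.append(held)
        else:
            free.append(l)      # r rejects l
    return {r: partner[r] for r in range(n)}

def is_stable(match, left_prefs, right_prefs):
    """True if no pair (l, r) prefers each other to their current partners."""
    left_partner = {l: r for r, l in match.items()}
    for l, prefs in enumerate(left_prefs):
        for r in prefs:
            if r == left_partner[l]:
                break  # every later r is ranked below l's current partner
            if right_prefs[r].index(l) < right_prefs[r].index(match[r]):
                return False  # (l, r) is a blocking pair
    return True

left_prefs = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
right_prefs = [[1, 0, 2], [0, 1, 2], [0, 1, 2]]
match = gale_shapley(left_prefs, right_prefs)
print(match, is_stable(match, left_prefs, right_prefs))
```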

**Greedy matching algorithm:** Here we compute approximations to matchings of maximum weight in weighted graphs. The Greedy algorithm considers edges in a non-increasing order of weights: the heaviest remaining edge $latex (u,v)$ is added to the matching, whereupon all edges incident on $latex u$ and $latex v$ are removed. The Greedy matching is a half-approximate matching.
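A minimal Python sketch of the Greedy algorithm follows. The edge-list format `(weight, u, v)` and the tie-breaking by endpoint labels are assumptions made for this illustration.

```python
def greedy_matching(edges):
    """edges: iterable of (weight, u, v). Returns the list of matched edges."""
    matched = set()
    matching = []
    # Scan edges from heaviest to lightest; ties are broken by endpoint labels.
    for w, u, v in sorted(edges, key=lambda e: (-e[0], e[1], e[2])):
        if u not in matched and v not in matched:
            matching.append((w, u, v))
            matched.update((u, v))
    return matching

# Path a-b-c-d with weights 2, 3, 2.
path = [(2, "a", "b"), (3, "b", "c"), (2, "c", "d")]
print(greedy_matching(path))
```

On this path, Greedy takes the middle edge of weight 3, while the optimum takes the two outer edges of total weight 4, so the result is within the half-approximation guarantee but not optimal.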

**Suitor algorithm:** This is a proposal-based algorithm [3]. Vertices can propose in any order; however, each vertex proposes to the neighbor with the heaviest edge that does not already have a better offer. A vertex can annul the proposal received by a neighbor if it has a heavier edge to offer that neighbor. When two vertices propose to each other, they are matched. The Suitor algorithm computes the same matching as the one obtained by the Greedy algorithm and the Locally Dominant edge algorithm described by Robert Preis [4]!
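The proposal mechanism can be sketched sequentially in Python as below. This is a simplified reading of the description above, assuming positive and distinct edge weights and a symmetric dictionary-of-dictionaries graph; the names are hypothetical and the parallel variants in [3] are not reproduced here.

```python
def suitor_matching(adj):
    """adj: dict vertex -> dict neighbor -> weight (symmetric, positive weights).

    Returns the set of matched edges as frozensets {u, v}.
    """
    suitor = {v: None for v in adj}  # best proposer seen so far by each vertex
    ws = {v: 0 for v in adj}         # weight of that best proposal
    for start in adj:
        current = start
        while current is not None:
            # Heaviest neighbor that does not already hold a better offer.
            best, best_w = None, 0
            for v, w in adj[current].items():
                if w > best_w and w > ws[v]:
                    best, best_w = v, w
            if best is None:
                break  # current has nobody left to propose to
            displaced = suitor[best]
            suitor[best], ws[best] = current, best_w
            current = displaced  # an annulled proposer must propose again
    # Two vertices that are each other's suitor are matched.
    return {frozenset((u, v)) for u, v in suitor.items()
            if v is not None and suitor[v] == u}

# Path a-b-c-d with weights 2, 3, 2.
adj = {
    "a": {"b": 2},
    "b": {"a": 2, "c": 3},
    "c": {"b": 3, "d": 2},
    "d": {"c": 2},
}
print(suitor_matching(adj))
```

On this path, vertex `a` first proposes to `b`, is later displaced by `c`, and then finds no acceptable neighbor, so only the edge `b`-`c` is matched, which is also what Greedy selects.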

- Vicki Knoblauch, Marriage matching: A conjecture of Donald Knuth, Economics Working Papers, Paper 200715, http://digitalcommons.uconn.edu/econ_wpapers/200715, 2007.
- Donald E. Knuth, Stable Marriage and its Relation to Other Combinatorial Problems, CRM Proceedings and Lecture Notes, Vol. 10, American Mathematical Society, 1997.
- Fredrik Manne and Mahantesh Halappanavar, New effective multithreaded matching algorithms, in Proc. IPDPS 2014, IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, USA, pp. 519–528, 2014.
- Robert Preis, Linear time 1/2-approximation algorithm for maximum weighted matching in general graphs, in Proc. STACS 99, 16th Annual Symposium on Theoretical Aspects of Computer Science, Trier, Germany, pp. 259–269, 1999.

Umit V. Çatalyürek presented our work on the directed acyclic graph (DAG) partitioning problem. In this problem, we are given a DAG $latex G=(V,E)$ and an integer $latex k$. The aim is to partition the vertex set $latex V$ into $latex k$ parts $latex V_1,\ldots,V_k$ in such a way that the parts have (almost) equal weight and the sum of the costs of all arcs having their endpoints in different parts is minimized. Vertices can have weights, and the edges can have costs. Up to now, all is standard. What is not standard is that the quotient graph of the parts should be acyclic. In other words, the directed graph $latex G'=(V',E')$, where $latex V'=\{V_1,\ldots,V_k\}$ and $latex (V_i,V_j)\in E'$ with $latex i \neq j$ iff $latex (u,v)\in E$ for some $latex u\in V_i$ and $latex v\in V_j$, should be acyclic.
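The acyclicity constraint is easy to check for a given partition. The Python sketch below builds the quotient graph and tests it with Kahn's topological sort; it also computes the edge cut. The function names and input format are assumptions for this illustration, not the interface of our partitioner.

```python
from collections import defaultdict, deque

def quotient_is_acyclic(edges, part):
    """edges: list of arcs (u, v); part: dict vertex -> part id."""
    succ = defaultdict(set)
    for u, v in edges:
        if part[u] != part[v]:
            succ[part[u]].add(part[v])  # arc between two different parts
    parts = set(part.values())
    indeg = {p: 0 for p in parts}
    for p in list(succ):
        for q in succ[p]:
            indeg[q] += 1
    # Kahn's algorithm: the quotient graph is acyclic iff every part is removed.
    queue = deque(p for p in parts if indeg[p] == 0)
    removed = 0
    while queue:
        p = queue.popleft()
        removed += 1
        for q in succ.get(p, ()):
            indeg[q] -= 1
            if indeg[q] == 0:
                queue.append(q)
    return removed == len(parts)

def edge_cut(edges, part, cost=None):
    """Sum of the costs (default 1) of arcs whose endpoints lie in different parts."""
    cost = cost or {}
    return sum(cost.get((u, v), 1) for u, v in edges if part[u] != part[v])

# Chain a -> b -> c.
chain = [("a", "b"), ("b", "c")]
print(quotient_is_acyclic(chain, {"a": 0, "b": 0, "c": 1}))  # parts follow the chain
print(quotient_is_acyclic(chain, {"a": 0, "b": 1, "c": 0}))  # b is sandwiched
```

In the second partition, the part holding `a` and `c` both precedes and follows the part holding `b`, so the quotient graph has a cycle even though the original graph is a DAG.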

John R. Gilbert wanted to understand the complexity of the problem, with respect to the undirected version. He is an expert on the subject matter (see, e.g., [2]). He asked what happens if we orient the edges of the model problem. If you are not familiar with this jargon, it is the mesh with each node being connected to its immediate neighbors in the four main directions, if those neighbors exist. See the small image for an example.

Partitioning this kind of mesh is a very hard problem. Gary Miller had mentioned their optimal partitioning in his invited talk (about something else). Rob Bisseling [1, Ch. 4] has a great text about partitioning these meshes and their cousins in 3D. I had surveyed known methods in a paper with Anaël Grandjean [3]. In particular, Anaël found out about discrete isoperimetric problems [5] and showed that the shape of an optimal partition at a corner or inside the mesh was discussed in the related literature. He also showed that the Cartesian partitioning is optimal for edge cut. Anaël also proposed efficient heuristics which produced connected components. See the cited papers for nice images. Ours were based on our earlier work with Umit [4].

Anyhow, let’s return to acyclic partitioning of DAGs, and John’s question. He suggested that we should look at the electrical spiral heater to get an orientation. This orientation results in a total order of the vertices. The figures below show the ordering for two mesh sizes. Only some of the edges are shown; all edges of the mesh, including those that are not shown, are from the lower numbered vertex to the higher numbered one.

As seen in the figures above, the spiral ordering is a total order and there is only one way to cut the meshes into two parts with the same number of vertices; blue and red show the two parts.

Theorem: Consider the mesh whose edges are oriented following the electrical spiral heater ordering. The unique acyclic cut with vertices in each side has edges in cut, for .

The theorem can be proved by observing that the blue vertices in the border (excluding the corners) have one arc going to a red vertex; those in the interior, except the one labeled , have 2 such arcs; the vertex labeled has three such arcs. The condition comes from the fact that we assumed that there are blue vertices in the interior of the mesh. This is a lot of edges to cut!

Later, John said that he visited the Maxwell Museum of Anthropology at UNM after the CSC16 workshop, and saw similar designs by the original native New Mexicans.

- Rob H. Bisseling, Parallel Scientific Computation: A Structured Approach using BSP and MPI, 1st ed, Oxford University Press, 2004.
- John R. Gilbert, Some nested dissection order is nearly optimal. Inf. Process. Lett. 26(6): 325–328 (1988).
- Anaël Grandjean and Bora Uçar, On partitioning two dimensional finite difference meshes for distributed memory parallel computers. PDP 2014: 9–16.
- Bora Uçar and Umit V. Çatalyürek, On the scalability of hypergraph models for sparse matrix partitioning. PDP 2014: 593–600.
- Da-Lun Wang and Ping Wang, Discrete isoperimetric problems, SIAM Journal on Applied Mathematics, 32(4):860–870 (1977).

There is so much to talk about. We will have a series of posts about the workshop and related things. Here are some bits and pieces.

The workshop had three invited talks, 19 contributed talks, and eight posters, and was attended by 60+ people. There will be a proceedings with 11 papers. The proceedings will be published by SIAM and hosted on its publication platform.

We also celebrated the 60th birthdays of Alex Pothen and Rob Bisseling.

There was a best paper award. It went to Fredrik Manne, Md. Naim, Håkon Lerring, and Mahantesh Halappanavar for their paper titled On Stable Marriages and Greedy Matching. Congratulations. The citation by the best paper award committee (Uwe Naumann, Alex Pothen, and Sivan Toledo) reads as:

for the way the paper brings together several decades of work on stable marriages with the more recent work on approximation algorithms for weighted matchings, and the consequences for the average case complexity of the latter algorithms.

As announced before, we had a mini-symposium on CSC in three sessions. SIAM keeps records of the program, the speakers and the abstracts (here). As the organizers, we thought that we should take this one step ahead and make the pdf’s of the talks available as well, wherever possible.

Here is the list of talks, in the order of speaker line-up.

**Authors**: **Alex Pothen** (Purdue University, USA) and **Mu Wang** (Purdue University, USA)

**Abstract.** Not yet

**Talk**: No files yet.

**Comments**: Alex had a video showcasing the use of the solver, in real-time, updating the mesh and showing the result of the surgery. The video was prepared with the help of professionals. The audience was captivated and silent during the video!

**Authors: Mathias Jacquelin** (Lawrence Berkeley National Laboratory, USA), **Esmond Ng** (Lawrence Berkeley National Laboratory, USA), **Barry Peyton** (Dalton State College, USA), **Yili Zheng** (Lawrence Berkeley National Laboratory, USA), and **Kathy Yelick** (Lawrence Berkeley National Laboratory, USA).

**Abstract.** Combinatorial techniques are used in several phases of sparse matrix computation. For large-scale problems, while numerical phases are often executed in parallel, most of these combinatorial techniques are serial and can become bottlenecks. We are investigating the extent to which some of the combinatorial techniques can be performed in parallel.

**Talk**: No files yet.

**Comments**: RCM is discussed as the showcase. I think Aydin and Ariful were also involved (the second slide of the talk had this information). Given the group’s experience in distributed memory BFS, it is no surprise that the RCM implementation is based on it. The target was neither small-world graphs nor social network graphs; graphs with large diameters were the focus. So the parallelization problem is rather tough. Sorting (by the vertex degrees) is required for a formal RCM (I could not catch the details of which sorting algorithm was used). This step incurred cost and was detrimental to performance. Maybe, in an application, one can skip sorting and obtain a variant of RCM (after all, RCM is a heuristic). Also, Esmond pointed out that the motivation for this work is that the matrix/graph could be already distributed in another context. Instead of collecting the global data at a central processor, solving the problem there, and distributing the result back to everyone, one could possibly solve the problem in parallel. In RCM, pseudo-peripheral nodes are traditionally used. They are again found by BFS. There is recent work on finding the diameters of graphs with a few rounds of BFS. Maybe review this.
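For readers who have not seen RCM, here is a small sequential Python sketch of the ingredients mentioned in these comments: a BFS that visits neighbors by increasing degree (the sorting step), a pseudo-peripheral start vertex found by repeated BFS, and the final reversal. It is a textbook-style illustration with hypothetical function names and has nothing to do with the distributed-memory implementation presented in the talk.

```python
from collections import deque

def bfs_order(adj, root):
    """BFS from root, visiting unvisited neighbors by increasing degree.

    Returns (visit order, eccentricity of root within its component).
    """
    dist = {root: 0}
    order = [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u], key=lambda x: (len(adj[x]), x)):
            if v not in dist:
                dist[v] = dist[u] + 1
                order.append(v)
                queue.append(v)
    return order, dist[order[-1]]  # the last visited vertex is a farthest one

def pseudo_peripheral_vertex(adj, start):
    """Restart BFS from the last vertex reached until the eccentricity stops growing."""
    root, ecc = start, -1
    while True:
        order, new_ecc = bfs_order(adj, root)
        if new_ecc <= ecc:
            return root
        ecc, root = new_ecc, order[-1]

def rcm(adj):
    """Reverse Cuthill-McKee ordering; adj maps each vertex to its set of neighbors."""
    seen, order = set(), []
    for s in sorted(adj, key=lambda x: (len(adj[x]), x)):
        if s in seen:
            continue
        component, _ = bfs_order(adj, pseudo_peripheral_vertex(adj, s))
        seen.update(component)
        order.extend(component)
    return order[::-1]

def bandwidth(adj, order):
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)

# A path 0-3-1-4-2 whose labels are scrambled.
adj = {0: {3}, 3: {0, 1}, 1: {3, 4}, 4: {1, 2}, 2: {4}}
print(bandwidth(adj, sorted(adj)), bandwidth(adj, rcm(adj)))
```

Dropping the `sorted` call inside `bfs_order` gives the sorting-free variant speculated about above: still a valid BFS-based ordering, but no longer the formal RCM.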

**Authors: Ariful Azad** (Lawrence Berkeley National Laboratory, USA) and **Aydin Buluç** (Lawrence Berkeley National Laboratory, USA).

**Abstract.** We present distributed-memory parallel algorithms for computing matchings in bipartite graphs. We consider both exact and approximate algorithms for cardinality and weighted matching problems. We substitute the asynchronous data access patterns of traditional matching algorithms by a small subset of more structured, bulk-synchronous functions based on matrix algebra. Relying on communication-avoiding algorithms for the underlying matrix-algebraic modules, different matching algorithms achieve good speedups on tens of thousands of cores on current supercomputers.

**Talk**: (arifulAzad-pp16) file.

**Authors**: **Michele Benzi** (Emory University, USA), **Fanny Dufossé** (Inria Lille-Nord Europe, France), **Kamer Kaya** (Sabanci University, Turkey), **Alex Pothen** (Purdue University, USA), and **Bora Uçar** (CNRS and ENS Lyon, France).

**Abstract.** Not yet

**Talk**: (boraUcar-pp16) file.

**Authors: Edmond Chow** (Georgia Institute of Technology, USA)

**Abstract.** We present a combinatorial problem and potential solutions arising in parallel computational chemistry. The Hartree-Fock (HF) method has a very complex data access pattern. Much research has been devoted over 20 years for parallelizing this important method, based primarily on intuition and experience. A formal approach for parallelizing HF while reducing communication may come from graph and hypergraph partitioning. Besides providing a potential solution, this approach may also shed light on the optimality of existing approaches.

**Talk**: (edmondChow-pp16) file.

**Authors: Md Naim** (University of Bergen, Norway) and **Fredrik Manne** (University of Bergen, Norway)

**Abstract.** There has been considerable interest in community detection for finding the modularity structure in real world data. Such data sets can arise from social networks as well as various scientific domains. The Louvain method is one popular method for this problem as it is simple and fast. It can also be used to detect hierarchical structures in the data. However, its inherently sequential nature and cache unfriendly workloads make it difficult to parallelize. This is particularly true for co-processor architectures. In this work we show how these obstacles can be overcome and present results from implementing the algorithm on a GPU.

**Talk**: (fredrikManne-pp16) file.

**Comments**: Md Naim could not attend the conference (a pity), and Fredrik replaced him.

**Authors: Evangelos Georganas** (University of California, Berkeley, USA)

**Abstract.** A critical problem for computational genomics is the problem of *de novo* genome assembly: the development of robust scalable methods for transforming short randomly sampled sequences into the contiguous and accurate reconstruction of complex genomes. While advanced methods exist for assembling the small and haploid genomes of prokaryotes, the genomes of eukaryotes are more complex. We address this challenge head on by developing HipMer, an end-to-end high performance de novo assembler designed to scale to massive concurrencies. HipMer employs an efficient Unified Parallel C (UPC) implementation and computes the assembly of the human genome in only 8.4 minutes using 15,360 cores of a Cray XC30 system.

**Talk**: (evangelosGeorganas-pp16) file.

**Authors: Aydin Buluç** (Lawrence Berkeley National Laboratory, USA)

**Abstract.** We present a faster and more scalable implementation of the sparse matrix-matrix multiplication (SpGEMM) kernel. The implementation exploits multiple levels of parallelism, using a scalable three-dimensional algorithm for inter-node parallelism and multithreaded subroutines for intra-node parallelism. The three-dimensional formalism has characteristics that are special for the sparse case, which we thoroughly explain. We then provide results on applications in Markov graph clustering and Algebraic Multigrid based graph coarsening.

**Talk**: (aydinBuluc-pp16) file.

**Authors: Julien Herrmann** (The Ohio State University, USA), **Umit V. Catalyurek** (The Ohio State University, USA), **Kamer Kaya** (Sabanci University, Turkey), and **Bora Uçar** (CNRS and ENS Lyon, France).

**Abstract.** In scientific computing, directed graphs are commonly used for modeling dependencies among entities. However, while modeling some of the problems as graph partitioning problems, directionality is generally ignored. Accurate modeling of some of the problems requires taking the directionality into account, which adds constraints that cannot be easily addressed by the current state-of-the-art partitioning methods and tools. In this talk, we will discuss some example problems, models and potential solution approaches for them.

**Talk**: (pdf) file.

**Authors: Arif Khan** (Purdue University, USA), and **Alex Pothen** (Purdue University, USA).

**Abstract.** We propose a new 3/2-approximation algorithm, called *LSE*, for computing *$latex b$-Edge Cover* and its application to a data privacy problem called adaptive $latex k$-Anonymity. $latex b$-Edge Cover is a special case of the well-known *Set Multicover* problem and also a generalization of the *Edge Cover* problem in graphs. The objective is to choose a subset $latex C$ of edges in the graph with weights on the edges, such that at least a specified number $latex b(v)$ of edges in $latex C$ are incident on each vertex $latex v$ and the sum of edge weights is minimized. We implement the algorithm on serial and shared-memory parallel processors and compare its performance against a collection of inherently sequential approximation algorithms that have been proposed for the *Set Multicover* problem. With *LSE*, i) we propose the first shared-memory parallel algorithm for the adaptive $latex k$-Anonymity problem and ii) give new theoretical results regarding privacy guarantees which are significantly stronger than the best known previous results.

**Talk**: (arifKhan-pp16) file.

**Authors: Iain S. Duff** (Science & Technology Facilities Council, United Kingdom and CERFACS, Toulouse, France), **Philip Knight** (University of Strathclyde, United Kingdom), **Sandrine Mouysset** (Université de Toulouse, France), **Daniel Ruiz** (ENSEEIHT, France), and **Bora Uçar** (CNRS and ENS Lyon, France).

**Abstract.** Considering any square fully indecomposable matrix $latex A$, we can apply a two-sided diagonal scaling to $latex |A|$ to render it into doubly stochastic form. The Perron-Frobenius theorem is a key tool to exploit and we aim to use spectral properties of doubly stochastic matrices to reveal hidden block structure in matrices. We also combine this with classical graph analysis techniques to design partitioning algorithms for large sparse matrices based on both numerical values and pattern information.

**Talk**: (danielRuiz-pp16) file.

**Authors: Mehmet Deveci** (Sandia National Laboratories, USA), **Erik Boman** (Sandia National Laboratories, USA), and **Siva Rajamanickam** (Sandia National Laboratories, USA).

**Abstract.** In scientific computing, the problem of finding sets of independent tasks is usually addressed with graph coloring. We study performance portable graph coloring algorithms for many-core architectures. We propose a novel edge-based algorithm and enhancements of the speculative Gebremedhin-Manne algorithm that exploit architectures. We show superior quality and execution time of the proposed algorithms on GPUs and Xeon Phi compared to previous work. We present effects of coloring on applications such as Gauss-Seidel preconditioned solvers.

**Talk**: (mehmetDeveci-pp16) file.

Rob Bisseling and Alex Pothen contributed to a mini-symposium on combinatorial scientific computing.

Rob talked about hypergraph partitioning and how to use it with an iterative solver. We usually get this question: how many mat-vecs (or iterations in a solver) does one need to perform to offset the cost of hypergraph partitioning? Rob’s main point in this talk was that one can estimate the number of iterations and spend some more time partitioning the hypergraph, if the number of iterations allows it. He has an ongoing project of optimally bisecting sparse matrices (see the link); his talk included updates (theoretical and practical) to this project. He says he adds a matrix a day to this page. As of now, there are 263 matrices. Chapeau! as the French say.

Also, he said (well maybe slipped out after a few glasses of Côtes du Rhône) that the new edition of his book (Parallel Scientific Computation: A Structured Approach using BSP and MPI) will be coming out. There are new materials; in particular a few sections on sorting algorithms and a complete chapter on graph algorithms (mainly matching). Stay tuned! Rob will be at SIAM PP next week. I will try to get more information about his book.

[I have just realized that I did not put Alex’s photo anywhere yet. So let’s have his face too.]

Alex discussed approximation algorithms for matching and $latex b$-matching problems. He took up the challenge of designing parallel algorithms for matching problems, where the concurrency is usually limited. He discussed approximation algorithms with provable guarantees and great parallel performance for the $latex b$-matching and a related edge cover problem. He also discussed an application of these algorithms in a data privacy problem he has been working on.

Alex arrived early in Lyon and we did some work. With Alex, we always end up discussing matching problems. This was no exception. We looked at the foundations of bottleneck matching algorithms. Alex and I will be attending SIAM PP16 next week. If you know/like these algorithms, please attend the CSC mini-symposia so that we can talk.

I chaired an invited talk by Yousef Saad.

The talk was for 90 minutes, without a break! His talk was very engaging and illuminating. I enjoyed it very much and appreciated how he communicates deep math to innocent (or ignorant;)) computer scientists. His two books, Iterative Methods for Sparse Linear Systems (2nd edition) and Numerical Methods for Large Eigenvalue Problems (2nd edition), are available at his web page and attest to this.

Here is a crash course on Krylov subspace methods from his talk.

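Let $latex x_0$ be an initial guess and $latex r_0 = b - Ax_0$ be the initial residual of the linear system $latex Ax = b$.

Define $latex K_m = \mathrm{span}\{r_0, Ar_0, \ldots, A^{m-1}r_0\}$ and another subspace $latex L_m$ of dimension $latex m$.

Basic Krylov step is then:

$latex x_m = x_0 + \delta$

where $latex \delta \in K_m$ and $latex b - Ax_m \perp L_m$.

At this point, the reader/listener gets the principle and starts wondering: which choices of $latex L_m$ make sense? How do I keep all vectors? How do I get something orthogonal to them? Yousef had another slide:

1. $latex L_m = K_m$; class of Galerkin or orthogonal projection methods (e.g., CG), where $latex \|x^* - x_m\|_A = \min_{z \in x_0 + K_m} \|x^* - z\|_A$ for a symmetric positive definite $latex A$ with exact solution $latex x^*$.

2. $latex L_m = AK_m$; class of minimal residual methods (e.g., ORTHOMIN, GMRES), where $latex \|b - Ax_m\|_2 = \min_{z \in x_0 + K_m} \|b - Az\|_2$.

So we learned the alternatives for $latex L_m$, and we probably guessed correctly that we do not need to keep all vectors in all cases (e.g., CG), that we sometimes need all of them (e.g., GMRES without restart), and that even if we need them all, we can take a shortcut and restart. Getting orthogonal vectors could be tougher, especially if we do not store all the vectors. Now that we have a guide, a feeling, and a few questions, we can turn to resources to study.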
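As a small companion to the crash course, here is a pure-Python sketch of the conjugate gradient method, the Galerkin-type example mentioned above that needs only a few vectors in memory. It assumes a symmetric positive definite matrix given as a dense list of lists and starts from a zero initial guess; it is meant for reading, not for performance, and the function name and tolerance are choices made for this illustration.

```python
def conjugate_gradient(A, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (dense list of lists)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n          # initial guess x_0 = 0
    r = list(b)            # initial residual r_0 = b - A x_0 = b
    p = list(r)            # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)                       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction: A-conjugate to the previous ones via a short recurrence.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(conjugate_gradient(A, b))  # exact solution is (1/11, 7/11)
```

Only `x`, `r`, `p`, and `Ap` are stored, regardless of the iteration count; this is the short recurrence that CG enjoys and that GMRES without restart does not.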

Professor Thomas F. Coleman has been named among the Class of 2016 of SIAM Fellows. Tom is currently the Ophelia Lazaridis Research Chair at the University of Waterloo, Canada. He has served earlier as the Dean of the Faculty of Mathematics at Waterloo (2005-2010), and also the Director of the Theory Center at Cornell University (1998-2005). Tom’s research contributions are in optimization algorithms, financial optimization, Automatic Differentiation, and in CSC. Tom was a pioneer, with Jorge Moré of Argonne National Lab, in modeling the estimation of sparse Jacobian and Hessian matrices as graph coloring problems, and thereby developing efficient algorithms for computing these derivative matrices. Tom was the PhD advisor of one of us (Alex Pothen) and Bruce Hendrickson at Cornell, and through his mentoring and research has profoundly influenced the CSC community.

Xiaoye Sherry Li has also been named among the Class of 2016 of SIAM Fellows (the whole list is here). She is very well known internationally for her work on methods and software for sparse matrix computations. In particular, she is the lead author behind SuperLU (software for solving general sparse systems of linear equations). Her citation also highlights the enabling role of her contributions in large-scale scientific and engineering applications. Sherry has been recently elected to lead the Scalable Solvers Group in Berkeley Lab’s Computational Research Division (CRD).

Congratulations to Tom and Sherry! We are also fortunate to have Sherry serve on the CSC Steering Committee.

Alex and Bora

Timothy A. Davis, Sivasankaran Rajamanickam, and Wissam Sid-Lakhdar, “A survey of direct methods for sparse linear systems” (link). The authors state their goal in the abstract:

The goal of this survey article is to impart a working knowledge of the underlying theory and practice of sparse direct methods for solving linear systems and least-squares problems, and to provide an overview of the algorithms, data structures, and software available to solve these problems, so that the reader can both understand the methods and know how best to use them.

I have very much appreciated the breadth of the survey. It reviews the earlier work on methods for the classical problems (e.g., LU, QR, Cholesky factorizations) and gives the context of the recent work (e.g., GPU acceleration of the associated software; most recent problems of updating/downdating; exploiting low-rank approximations for efficiency).

One of the most impressive parts of surveys is their reference lists. This one has 604 bibliographic items (if I did not make any errors in counting). There is great scholarly work in collecting 604 bibliographic items, reading through them, and putting them into a well-organized survey. There are virtually no bulk references; all citations come with at least a few words. This assiduous approach got me excited and I dug into the reference list. The earliest cited works are from 1957 (one by Berge [1], and one by Markowitz [2]); the latest are from 2015 (there are a number of them). There are no cited papers from the years 1958, 1959, 1960, 1962, 1964, and 1965. Here is a histogram of the number of papers per 5-year period (centered at years 1955 to 2015 by increments of 5, e.g., 1955:5:2015).

The histogram tells at least two things: (i) much of the activities at the foundations of today’s methods are from the years 1990–2000; (ii) the field is very active, considering that the survey gives an overview of fundamentals, and recent developments which did not fit neatly into the grand/traditional lines of the world of direct methods are only summarized in a relatively short section (Section 12).

I underlined another quotation from the survey:

Well-designed mathematical software has long been considered a cornerstone of scholarly contributions in computational science.

This is a great survey, even for those who know the area. Kudos to Tim, Siva, and Wissam for having crafted it.

- Claude Berge, Two theorems in graph theory, Proceedings of the National Academy of Sciences of the United States of America 43(9), 842–844, 1957 (link).
- Harry M. Markowitz, The elimination form of the inverse and its application to linear programming, Management Science, 3 (3), 255–269, 1957 (link).

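A doubly stochastic matrix $latex A$ can be written as a convex combination of permutation matrices:

$latex A = \alpha_1 P_1 + \alpha_2 P_2 + \cdots + \alpha_k P_k$

where $latex \alpha_i > 0$, $latex \sum_{i=1}^{k} \alpha_i = 1$, and each $latex P_i$ is a permutation matrix.

Given this formulation, one wonders if the decomposition is unique. Well, the answer is “No”. Then, one asks what can be said about the number $latex k$. And this is the main topic of this post.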

Richard A. Brualdi [1] discusses many things, among which are a lower bound and an upper bound on the number $latex k$ of permutation matrices. The minimum number of permutation matrices is equal to the maximum cardinality of a set of nonzero positions of $latex A$, no two of which could appear together in a single permutation matrix in the pattern of $latex A$. An easy lower bound is then equal to the maximum number of nonzeros in a row or a column of $latex A$. The upper bound is $latex \tau - 2n + 2$ for a fully indecomposable $latex n \times n$ matrix with $latex \tau$ nonzeros; more generally, if there are $latex \ell$ fully indecomposable blocks, then the upper bound is $latex \tau - 2n + \ell + 1$.

What about the minimum number of permutation matrices? It turns out that this is an NP-complete problem. Let’s state it in the form of a standard NP-completeness result.

**Input**: A doubly stochastic matrix $latex A$.

**Output**: A Birkhoff-von Neumann decomposition of $latex A$ as $latex A = \alpha_1 P_1 + \alpha_2 P_2 + \cdots + \alpha_k P_k$.

**Measure**: The number of permutation matrices in the decomposition.

The NP-completeness of the decision version of this problem is shown in a recent technical report [2].
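While finding a decomposition with the fewest permutation matrices is hard, finding some decomposition is easy. The Python sketch below follows the constructive idea behind Birkhoff's theorem: repeatedly find a perfect matching among the nonzero positions of the residual, subtract the smallest matched entry times the corresponding permutation matrix, and continue until the residual is zero. The matching is an arbitrary one found with augmenting paths, so the number of terms is not optimized in any way; function names are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

def perfect_matching(support, n):
    """Augmenting-path bipartite matching on the nonzero pattern.

    support[i] lists the columns j with a nonzero in row i.
    Returns col_of_row (a permutation as a list) or None if none exists.
    """
    row_of_col = [None] * n

    def augment(i, visited):
        for j in support[i]:
            if j in visited:
                continue
            visited.add(j)
            if row_of_col[j] is None or augment(row_of_col[j], visited):
                row_of_col[j] = i
                return True
        return False

    for i in range(n):
        if not augment(i, set()):
            return None
    col_of_row = [None] * n
    for j, i in enumerate(row_of_col):
        col_of_row[i] = j
    return col_of_row

def birkhoff(A):
    """Returns a list of (coefficient, permutation) pairs that sum to A."""
    n = len(A)
    R = [[Fraction(a) for a in row] for row in A]  # exact residual
    terms = []
    while any(any(row) for row in R):
        support = [[j for j in range(n) if R[i][j] > 0] for i in range(n)]
        perm = perfect_matching(support, n)
        if perm is None:
            raise ValueError("input is not doubly stochastic")
        alpha = min(R[i][perm[i]] for i in range(n))
        for i in range(n):
            R[i][perm[i]] -= alpha  # at least one entry becomes zero
        terms.append((alpha, perm))
    return terms

half = Fraction(1, 2)
A = [[half, half, 0], [half, 0, half], [0, half, half]]
for alpha, perm in birkhoff(A):
    print(alpha, perm)
```

Each step zeroes at least one nonzero of the residual, so the loop terminates after at most as many steps as there are nonzeros. On this example every row has two nonzeros, so the easy lower bound on the number of permutation matrices is 2.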

- Richard A. Brualdi, Notes on the Birkhoff algorithm for doubly stochastic matrices, *Canadian Mathematical Bulletin*, 25(2), 191–199, 1982 (doi).
- Fanny Dufossé and Bora Uçar, Notes on Birkhoff-von Neumann decomposition of doubly stochastic matrices, Technical Report, RR-8852, Inria Grenoble Rhône-Alpes, 2016 (link).

Aydın Buluç, a great scholar and a good friend, has received the IEEE TCSC Award for Excellence for Early Career Researchers. This award recognizes up to three individuals who have made outstanding, influential, and potentially long-lasting contributions in the field of scalable computing within five years of receiving their PhD degree as of January 1 of the year of the award. Aydın received a plaque at the SC15 conference that was held in Austin, TX on Nov. 15–20, 2015.

Congratulations Aydın!
