## Recent news on the minimum fill-in problem

We have just seen a a paper in the Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2017. The proceedings is available online for the curious. A paper from the proceedings is the subject of this post.

Minimum fill-in: Inapproximability and almost tight lower bounds,
by, Yixin Cao and R. B. Sandeep (url).

The minimum fill-in problem is one of the core problems in sparse direct methods. This problem is NP-complete [6]. The NP-completeness of the problem was first conjectured to be true in 1976 by Rose, Tarjan, and Lueker [5] in terms of the elimination process on undirected graphs. Then Rose and Tarjan [4] proved in 1978 that finding an elimination ordering on a directed graph that gives minimum fill-in is NP-complete (there was apparently a glitch in the proof which was rectified by Gilbert [1] two years later). Finally, Yannakakis [6] proved the NP-completeness of the minimum fill-in problem on undirected graphs in 1981.

The sparse matrix community needs a heuristic to tackle the minimum fill-in problem, as direct methods are used in many applications. There are quite efficient and effective heuristics for this purpose, which are variants of the approximate minimum degree and incomplete nested dissection algorithms. These heuristics are efficiently implemented in programs that are wide spread. The mentioned heuristics do not have any performance guarantees (except for classes of graphs such as graphs with good separators) but they are very-well established. So there is an NP-complete problem, practitioners have some heuristics and are happy with what they have.

On the other hand, the minimum fill-in problem poses many challenges to the theoreticians. Natanzon, Shamir, and Sharan [2] obtained an algorithm that approximates the minimum fill-in ($k$) by bounding the fill by $8k^2$ in $O(k n^3)$ time, for a graph with $n$ vertices. The paper by Cao and Sandeep shows that unless P=NP, there is no polynomial time approximation scheme (PTAS) for the minimum fill-in problem. A PTAS is an algorithm which takes an instance of an optimization problem and a parameter $\epsilon > 0$ and, in polynomial time, produces a solution that is within a factor $1 + \epsilon$ of the optimum. This is a bad news in a way: no simple heuristic with a performance guarantee for theoretically oriented practitioners is in view.

The paper by Cao and Sandeep contains two other theorems, ruling out algorithms with approximation $1+\epsilon$ and with a running time $2^{O(n^{1-\delta})}$, and exact algorithms with running time in $2^{O(k^{1/2-\delta})}\cdot n^{O(1)}$, assuming exponential time hypothesis (ETH). Here $n$ is again the number of vertices, $k$ is an integer parameterizing the fill-in, and $\delta$ is a small positive constant. ETH posits that the satisfiability problem with at most three variables per clause cannot be solved in $2^{o(p+q)}$ time, where $p$ and $q$ denote the number of clauses and variables, respectively, in the problem.

If you are interested in these problems, then see the tree-width and minimum fill-in challenges. The minimum tree-width is related to the shortest elimination tree over all elimination orderings of an undirected graph, which one of us had proved to be NP-complete [3].

### References

1. John R. Gilbert, A note on the NP-completeness of vertex elimination on directed graphs, SIAM Journal on Algebraic and Discrete Methods, 1 (1980), pp. 292–294 (link).
2. Assaf Natanzon, Ron Shamir, and Roded Sharan, A polynomial approximation for the minimum fill-in problem, SIAM Journal on Computing, 30, (2000), pp. 1067–1079 (link).
3. Alex Pothen, The complexity of optimal elimination trees,
Tech. Report, Department of Computer Science, Penn State University, 1988 (link).
4. Donald J. Rose and Robert E. Tarjan, Algorithmic aspects of vertex elimination in directed graphs, SIAM Journal on Applied Mathematics, 34 (1978), pp. 176–197 (link).
5. Donald J. Rose, Robert E. Tarjan, and George S. Lueker, Algorithmic aspects of vertex elimination on graphs, SIAM Journal on Computing, 5 (1976), pp. 266–283 (link).
6. Mihalis Yannakakis, Computing the minimum fill-in is NP-complete, SIAM Journal on Algebraic and Discrete Methods, 2 (1981), pp. 77–79 (link).

## After CSC16 (cont’)

Umit V. Çatalyürek presented our work on the directed acyclic graph (DAG) partitioning problem. In this problem, we are given a DAG $G=(V,E)$ and an integer $k\geq 2$. The aim is to partition the vertex set $V$ into $k$ parts $V_1,\ldots, V_k$ in such a way that the parts have (almost) equal weight and the sum of the costs of all those arcs having their endpoint in different parts minimized. Vertices can have weights, and the edges can have costs. Up to now, all is standard. What is not standard is that the quotient graph of the parts should be acyclic. In other words, the directed graph $G'=(V', E')$, where $V'=\{V_1,\ldots,V_k\}$ and $V_i\rightarrow V_j\in E'$ iff $v_i\rightarrow v_j\in E$ for some $v_i\in V_i$ and $v_j\in V_j$, should be acyclic.

John R. Gilbert wanted to understand the complexity of the problem, with respect to the undirected version. He is an expert on the subject matter (see, e.g., [2]). He asked what happens if we orient the edges of the $n\times n$ model problem. If you are not familiar with this jargon, it is the $n\times n$ mesh with each node being connected to its immediate neighbor in the four main directions, if those neighbors exist. See the small image for an $8\times 8$ example.

Partitioning these kind of meshes is a very hard problem. Gary Miller had mentioned their optimal partitioning in his invited talk (about something else). Rob Bisseling [Ch. 4, 1] has a great text about partitioning these meshes and their cousins in 3D. I had surveyed known methods in a paper with Anaël Grandjean [3]. In particular, Anaël found about discrete isoperimetric problems, showed that the shape of an optimal partition at a corner, or inside the mesh was discussed in the related literature. He also showed that the Cartesian partitioning is optimal for edge cut.  Anaël also proposed efficient heuristics which produced connected components. See the cited papers for nice images. Our were based on our earlier work with Umit [4].

Anyhow,  let’s return back to acyclic partitioning of DAGs, and John’s question. He suggested that we should look at the electrical spiral heater to get an orientation. This orientation results in a total order of the vertices. The figures below show the ordering of the $8\times 8$ and the $16\times 16$ meshes. Only some of the edges are shown; all edges of the mesh, including those that are not shown are from the lower numbered vertex to the higher numbered one.

As seen in the figures above, the spiral ordering is a total order and there is only one way to cut the meshes into two parts with the same number of vertices; blue and red show the two parts.

Theorem: Consider the $n\times n$ mesh whose edges are oriented following the electrical spiral heater ordering. The unique acyclic cut with $n^2/2$ vertices in each side has $n^2-4n+3$ edges in cut, for $n\geq 8$.

The theorem can be proved by observing that the blue vertices in the border (excluding the corners) has one arc going to a red vertex; those in the interior, except the one labeled $n^2/2$ has 2 such arcs;  the vertex labeled $n^2/2$ has three such arcs. The condition $n\geq 8$ comes from the fact that we assumed that there are blue vertices in the interior of the mesh. This is a lot of edges to cut!

Later, John said that he visited the Maxwell Museum of Anthropology at UNM after the CSC16 workshop, and saw that similar designs by the original native New Mexicans.

### References

1. Rob H. Bisseling, Parallel Scientific Computation: A Structured Approach using BSP and MPI, 1st ed, Oxford University Press, 2004.
2. John R. Gilbert, Some nested dissection order is nearly optimal. Inf. Process. Lett. 26(6): 325–328 (1988).
3. Anaël Grandjean and Bora Uçar, On partitioning two dimensional finite difference meshes for distributed memory parallel computers. PDP 2014: 9–16.
4. Bora Uçar and Umit V. Çatalyürek, On the scalability of hypergraph models for sparse matrix partitioning. PDP 2014: 593–600.
5. Da-Lun Wang and Ping Wang, Discrete isoperimetric problems, SIAM Journal on Applied Mathematics, 32(4):860–870 (1977).

## On HPC Days in Lyon

Last week (6–8 April, 2016) we had an incredible meeting called HPC Days in Lyon. This three day event featured only invited long talks and invited mini-symposia talks. The meeting was organized with generous support of Labex MILYON.

Rob Bisseling and Alex Pothen contributed to a mini-symposium on combinatorial scientific computing.

Rob talked about hypergraph partitioning and how to use it with an iterative solver. We usually get this question: how many mat-vecs (or iterations in a solver) one needs to perform to offset the use of hypergraph partitioning. Rob’s main point in this talk was that one can estimate the number of iterations and spend some more time partitioning the hypergraph, if the number of iterations allow it. He has an ongoing project of optimally bisecting sparse matrices (see the link); his talk included updates (theoretical and practical) to this project. He says he adds a matrix a day to this page. As of now, there are 263 matrices. Chapeau! as the French say.

Also, he said (well maybe slipped out after a few glasses of Côtes du Rhône) that the new edition of his book (Parallel Scientific Computation: A Structured Approach using BSP and MPI) will be coming out.  There are new materials; in particular a few sections on sorting algorithms and a complete chapter on graph algorithms (mainly matching). Stay tuned! Rob will be at SIAM PP next week. I will try to get more  information about his book.

[I have just realized that I did not put Alex’s photo anywhere yet. So let’s have his face too.]

Alex discussed approximation algorithms to matching and $b$-matching problems. He took up the challenge of designing parallel algorithms for matching problems, where the concurrency is usually limited. He discussed approximation algorithms with provable guarantees and great parallel performance for the $b$-matching and a related edge cover problem. He also discussed an application of these algorithms in a data privacy problem he has been working on.

Alex arrived earlier to Lyon and we did some work. With Alex, we always end up discussing matching problems. This was not exception. We looked at the foundations of bottleneck matching algorithms. Alex and I will be attending SIAM PP16 next week. If you know/like these algorithms, please attend CSC mini-symposia so that we can talk.

I had chaired an invited talk by Yousef Saad.

The talk was for 90 minutes, without break! His talk was very engaging and illuminating. I enjoyed very much and appreciated how he communicates deep math to innocent (or ignorant;)) computer scientists. His two books Iterative methods for sparse linear systems (2nd edition) and Numerical Methods for Large Eigenvalue Problems (2nd Edition) are available at his web page and attest this.
Here is a crash course on Krylov subspace methods from his talk.

Let $x_0$ be an initial guess and $r_0=b-Ax_0$ be the initial residual.
Define $K_m=\textrm{span}\{r_0, Ar_0,\ldots,A^{m-1}r_0\}$ and $L_m$ another subspace of dimension $m$.
Basic Krylov step is then:
$x_m=x_0 + \delta$ where $\delta\in K_m$ and $b-Ax_m \perp L_m$.

At this point, the reader/listener gets the principle and starts wondering what are the choices of $L_m$ that make sense? How do I keep all $m$ vectors? How do I get something orthogonal to them? Yousef had another slide:

1. $L_m=K_m$; class of Galerkin or orthogonal projection methods (e.g., CG), where $\|x^*-\tilde{x}\|=\min_{z\in K_m}\|x^*-z\|_{A}.$
2. $L_m=AK_m$; class of minimal residual methods (e.g., ORTHOMIN, GMRES) where $\|b-A\tilde{x}\|_2=\min_{z\in K_m}\|b-Az\|_2$.

So we learned the alternatives for $L_m$, and we probably guessed correctly that we do not need to keep all $m$ vectors in all cases (e.g., CG), sometimes need all (e.g., GMRES without restart), and even if we need we can short-cut and restart. Getting orthogonal vectors could be tougher, especially if we do not store all the $m$ vectors. Now that we have a guide, a feeling, and a few questions we can turn to resources to study.

## On the Birkhoff-von Neumann decomposition

Michele Benzi, Alex Pothen and I have been making use of the celebrated Birkhoff-von Neumann theorem on doubly stochastic matrices. The theorem says that any doubly stochastic matrix can be written as a convex combination of a finitely many permutation matrices. Formally, let $\mathbf{A}$ be a doubly stochastic matrix. Then,

$\mathbf{A}=\sum_{j=1}^k \alpha_j \mathbf{P}_j\;,$

where $\alpha_j>0, \sum_j \alpha_j=1$ and each $\mathbf{P}_j$ is a permutation matrix.

Given this formulation, one wonders if the decomposition is unique. Well, the answer is “No”. Then, one asks what can be said about the number $k$. And this is the main topic of this post.

Richard A. Brualdi [1] discusses many things among which a lower bound and an upper bound on $k$. The minimum number of permutation matrices is equal to the maximum cardinality of a set of nonzeros positions of $\mathbf{A}$ no two of which could appear together in a single permutation matrix in the pattern of $\mathbf{A}$. An easy lower bound is then equal to the maximum number of nonzeros in a row or a column of $\mathbf{A}$. The upper bound is $\mathrm{nnz}(\mathbf{A})-2n+2$, for a fully indecomposable matrix $\mathbf{A}$ with $\mathrm{nnz}(\mathbf{A})$ nonzeros; more generally if there are $b$ fully indecomposable blocks, then the upper bound is $\mathrm{nnz}(\mathbf{A})-2n+b+1$.

What about the minimum number of permutation matrices? It turns out that this is an NP-complete problem. Let’s state it in the form of a standard NP-completeness result.

Input:  A doubly stochastic matrix $\mathbf{A}$.
Output: A Birkhoff-von Neumann decomposition of $\mathbf{A}$ as $\mathbf{A} = \alpha_1\mathbf{P}_1 + \alpha_2\mathbf{P}_2 + \cdots +\alpha_k\mathbf{P}_k$.
Measure: The number $k$ of permutation matrices in the decomposition.

The NP-completeness of the decision version of this problem is shown in a recent technical report [2].

### References

1. Richard A. Brualdi, Notes on the Birkhoff algorithm for doubly stochastic matrices, Canadian Mathematical Bulletin, 25(2), 191–199, 1982 (doi).
2. Fanny Dufossé and Bora Uçar, Notes on Birkhoff-von Neumann decomposition of doubly stochastic matrices, Technical Report, RR-8852, Inria Grenoble Rhône-Alpes, 2016 (link).