## 9 November 2018

Today was a great day at the school. There were two remarkable events. The first one is the 30th anniversary of the lab LIP at ENS Lyon (here is the program). The second was that the ENS Lyon awarded Marc Snir with an honorary doctorate degree (Docteur Honoris Causa). The page announcing this event is in French and google translate does a good job in translating it to English. Yves Robert had slides for introducing Marc during the event.

Many of us know Marc (ever heard of the MPI standard and the book “MPI: The complete reference“). His work span complexity theory, to MPI standard, to parallel computing systems. And oh, he speaks French.

I was lucky to see his talk during the 30th anniversary of the LIP (but unlucky to miss the ceremony of Docteur Honoris Causa). He gave an overview of his involvement with building parallel machines: BlueGene, Blue Waters, SP/Vulcan, and others. His talk has many whimsical observations. Here are some:

• A supercomputer research prototype is an oxymoron.
• A supercomputer research design is either boring or unpractical.
• The main contribution [of all the supercomputer design projects]: The projects educated a generation of researchers.
• Theory informs practice, but should not be taken literally.

After stating

Often theory follows practice, rather than practice following theory

he discussed how his paper with Upfal and Felperin was motivated by Vulcan’s practically well behaving design of $\log N + O(1)$ stages. Back then, the theory demonstrated $2\log N$ stages to avoid worst cases. The cited paper shows $\log N + \log\log N$, where the extra term is $O(1)$ for practical purposes.

After the talk, I wondered which computer he helped to build was his favorite. He said, more or less,

I created them, so all are my favorite !

### References

Eli Upfal, Sergio Felperin, and Marc Snir, Randomized routing with shorter paths, IEEE Transactions on Parallel and Distributed Systems, 7(4), pp. 356–362, 1996. (doi)

## After CSC18

It has been long time we last blogged, due to Bora’s other professional engagements. There has been many things in between, including the 8th SIAM Workshop on CSC (CSC18), June 6–8, 2018, Bergen, Norway. The best paper of CSC18 by Kevin Deweese and John R. Gilbert, entitled Evolving Difficult Graphs for Laplacian Solvers is our subject.

Kevin is currently a PhD student at the University of California,  Santa Barbara, working on provably fast Laplacian solvers. See his web page for a few of his papers with experimental evaluation (most of the similar solvers are hard to implement).

Here is the abstract of the subject paper (Link to paper) by Kevin and John:

We seek to discover Laplacian linear systems that stress the ability of existing Laplacian solver packages to solve them efficiently. We employ a genetic algorithm to explore the problem space of graphs of fixed size and edge density. The goal is to measure the gap between theoretical and existing Laplacian solvers, by trying to find worst case example graphs for existing solvers. These problems may have little use inside any real world application, but they give great insight into solver behavior. We report performance results of our genetic algorithm, and explore the properties of the evolved graphs.

Kevin and John focus on the combinatorial solver by Kelner, Orecchia, Sidford, and Zhu (arXiv link) known as KOSZ on the provably fast solver side, and PCG with a Jacobi preconditioner on the traditional side. The genetic algorithms by Kevin and John create populations of graphs by starting with an initial population. Then, a Laplacian solver is run on all graphs and those which required the most work to solve are selected as parents. A random vertex is swapped between the selected parents to yield new individuals. Random edge mutations of the form edge removal and replacement are performed. The techniques are versatile: they are used for creating hard instances for KOSZ and PCG; they are also used to create instances in which the performance of KOSZ with respect to that of PCG varies. On a reported instance, KOSZ outperforms PCG by a factor of 2, and on another one PCG outperforms KOSZ by a factor of 140! In all experiments, the performance is measured in terms of the number of arithmetic operations. Future work includes combining different instances to yield larger problems which stress both solvers to understand our abilities (beware solvers!).

## After CSC16 (cont’)

Umit V. Çatalyürek presented our work on the directed acyclic graph (DAG) partitioning problem. In this problem, we are given a DAG $G=(V,E)$ and an integer $k\geq 2$. The aim is to partition the vertex set $V$ into $k$ parts $V_1,\ldots, V_k$ in such a way that the parts have (almost) equal weight and the sum of the costs of all those arcs having their endpoint in different parts minimized. Vertices can have weights, and the edges can have costs. Up to now, all is standard. What is not standard is that the quotient graph of the parts should be acyclic. In other words, the directed graph $G'=(V', E')$, where $V'=\{V_1,\ldots,V_k\}$ and $V_i\rightarrow V_j\in E'$ iff $v_i\rightarrow v_j\in E$ for some $v_i\in V_i$ and $v_j\in V_j$, should be acyclic.

John R. Gilbert wanted to understand the complexity of the problem, with respect to the undirected version. He is an expert on the subject matter (see, e.g., [2]). He asked what happens if we orient the edges of the $n\times n$ model problem. If you are not familiar with this jargon, it is the $n\times n$ mesh with each node being connected to its immediate neighbor in the four main directions, if those neighbors exist. See the small image for an $8\times 8$ example.

Partitioning these kind of meshes is a very hard problem. Gary Miller had mentioned their optimal partitioning in his invited talk (about something else). Rob Bisseling [Ch. 4, 1] has a great text about partitioning these meshes and their cousins in 3D. I had surveyed known methods in a paper with Anaël Grandjean [3]. In particular, Anaël found about discrete isoperimetric problems, showed that the shape of an optimal partition at a corner, or inside the mesh was discussed in the related literature. He also showed that the Cartesian partitioning is optimal for edge cut.  Anaël also proposed efficient heuristics which produced connected components. See the cited papers for nice images. Our were based on our earlier work with Umit [4].

Anyhow,  let’s return back to acyclic partitioning of DAGs, and John’s question. He suggested that we should look at the electrical spiral heater to get an orientation. This orientation results in a total order of the vertices. The figures below show the ordering of the $8\times 8$ and the $16\times 16$ meshes. Only some of the edges are shown; all edges of the mesh, including those that are not shown are from the lower numbered vertex to the higher numbered one.

As seen in the figures above, the spiral ordering is a total order and there is only one way to cut the meshes into two parts with the same number of vertices; blue and red show the two parts.

Theorem: Consider the $n\times n$ mesh whose edges are oriented following the electrical spiral heater ordering. The unique acyclic cut with $n^2/2$ vertices in each side has $n^2-4n+3$ edges in cut, for $n\geq 8$.

The theorem can be proved by observing that the blue vertices in the border (excluding the corners) has one arc going to a red vertex; those in the interior, except the one labeled $n^2/2$ has 2 such arcs;  the vertex labeled $n^2/2$ has three such arcs. The condition $n\geq 8$ comes from the fact that we assumed that there are blue vertices in the interior of the mesh. This is a lot of edges to cut!

Later, John said that he visited the Maxwell Museum of Anthropology at UNM after the CSC16 workshop, and saw that similar designs by the original native New Mexicans.

### References

1. Rob H. Bisseling, Parallel Scientific Computation: A Structured Approach using BSP and MPI, 1st ed, Oxford University Press, 2004.
2. John R. Gilbert, Some nested dissection order is nearly optimal. Inf. Process. Lett. 26(6): 325–328 (1988).
3. Anaël Grandjean and Bora Uçar, On partitioning two dimensional finite difference meshes for distributed memory parallel computers. PDP 2014: 9–16.
4. Bora Uçar and Umit V. Çatalyürek, On the scalability of hypergraph models for sparse matrix partitioning. PDP 2014: 593–600.
5. Da-Lun Wang and Ping Wang, Discrete isoperimetric problems, SIAM Journal on Applied Mathematics, 32(4):860–870 (1977).

## After CSC16

CSC16 was held last week. Kudos to Assefaw and Erik as the chair of the workshop.

There are so much to talk about. We will have a series of posts about the workshop and related things. Here are some bits and pieces.

The workshop had three invited talks, 19 contributed talks, and eight posters, and attended by 60+ people. There will be a proceedings with 11 papers. The proceedings will be published by SIAM and will be hosted at its publication platform.

We had also celebrated the 60th birthdays of Alex Pothen and Rob Bisseling.

There was a best paper award. It went to Fredrik Manne, Md. Naim, Håkon Lerring, and Mahantesh Halappanavar for their paper titled On Stable Marriages and Greedy Matching. Congratulations. The citation by the best paper award committee (Uwe Naumann, Alex Pothen, and Sivan Toledo) reads as:

for the way the paper brings together several decades of work on stable marriages with the more recent work on approximation algorithms for weighted matchings, and the consequences for the average case complexity of the latter algorithms.

A heads up: the CSC18 meeting will likely be in Bergen, Norway. Erik cracked a joke about this in saying that the best paper awardees should take on organizing the next meeting.

## On HPC Days in Lyon

Last week (6–8 April, 2016) we had an incredible meeting called HPC Days in Lyon. This three day event featured only invited long talks and invited mini-symposia talks. The meeting was organized with generous support of Labex MILYON.

Rob Bisseling and Alex Pothen contributed to a mini-symposium on combinatorial scientific computing.

Rob talked about hypergraph partitioning and how to use it with an iterative solver. We usually get this question: how many mat-vecs (or iterations in a solver) one needs to perform to offset the use of hypergraph partitioning. Rob’s main point in this talk was that one can estimate the number of iterations and spend some more time partitioning the hypergraph, if the number of iterations allow it. He has an ongoing project of optimally bisecting sparse matrices (see the link); his talk included updates (theoretical and practical) to this project. He says he adds a matrix a day to this page. As of now, there are 263 matrices. Chapeau! as the French say.

Also, he said (well maybe slipped out after a few glasses of Côtes du Rhône) that the new edition of his book (Parallel Scientific Computation: A Structured Approach using BSP and MPI) will be coming out.  There are new materials; in particular a few sections on sorting algorithms and a complete chapter on graph algorithms (mainly matching). Stay tuned! Rob will be at SIAM PP next week. I will try to get more  information about his book.

[I have just realized that I did not put Alex’s photo anywhere yet. So let’s have his face too.]

Alex discussed approximation algorithms to matching and $b$-matching problems. He took up the challenge of designing parallel algorithms for matching problems, where the concurrency is usually limited. He discussed approximation algorithms with provable guarantees and great parallel performance for the $b$-matching and a related edge cover problem. He also discussed an application of these algorithms in a data privacy problem he has been working on.

Alex arrived earlier to Lyon and we did some work. With Alex, we always end up discussing matching problems. This was not exception. We looked at the foundations of bottleneck matching algorithms. Alex and I will be attending SIAM PP16 next week. If you know/like these algorithms, please attend CSC mini-symposia so that we can talk.

The talk was for 90 minutes, without break! His talk was very engaging and illuminating. I enjoyed very much and appreciated how he communicates deep math to innocent (or ignorant;)) computer scientists. His two books Iterative methods for sparse linear systems (2nd edition) and Numerical Methods for Large Eigenvalue Problems (2nd Edition) are available at his web page and attest this.
Here is a crash course on Krylov subspace methods from his talk.

Let $x_0$ be an initial guess and $r_0=b-Ax_0$ be the initial residual.
Define $K_m=\textrm{span}\{r_0, Ar_0,\ldots,A^{m-1}r_0\}$ and $L_m$ another subspace of dimension $m$.
Basic Krylov step is then:
$x_m=x_0 + \delta$ where $\delta\in K_m$ and $b-Ax_m \perp L_m$.

At this point, the reader/listener gets the principle and starts wondering what are the choices of $L_m$ that make sense? How do I keep all $m$ vectors? How do I get something orthogonal to them? Yousef had another slide:

1. $L_m=K_m$; class of Galerkin or orthogonal projection methods (e.g., CG), where $\|x^*-\tilde{x}\|=\min_{z\in K_m}\|x^*-z\|_{A}.$
2. $L_m=AK_m$; class of minimal residual methods (e.g., ORTHOMIN, GMRES) where $\|b-A\tilde{x}\|_2=\min_{z\in K_m}\|b-Az\|_2$.

So we learned the alternatives for $L_m$, and we probably guessed correctly that we do not need to keep all $m$ vectors in all cases (e.g., CG), sometimes need all (e.g., GMRES without restart), and even if we need we can short-cut and restart. Getting orthogonal vectors could be tougher, especially if we do not store all the $m$ vectors. Now that we have a guide, a feeling, and a few questions we can turn to resources to study.

## Good news again!

Professor Thomas F. Coleman has  been named among the Class of 2016 of SIAM Fellows. Tom is currently the Ophelia Lazaridis Research Chair at the University of Waterloo, Canada. He has served earlier as the Dean of the Faculty of Mathematics at Waterloo (2005-2010), and also the Director of the Theory Center at Cornell University (1998-2005). Tom’s research contributions are in optimization algorithms, financial optimization, Automatic Differentiation, and in CSC. Tom was a pioneer with Jorge More of Argonne National Lab, to model the estimation of sparse Jacobian and Hessian matrices as graph coloring problems, and thereby develop efficient algorithms for computing these derivative matrices. Tom was the PhD advisor of one of us (Alex Pothen) and Bruce Hendrickson at Cornell, and  through his mentoring and research has profoundly influenced the CSC community.

Xiaoye Sherry Li has also been named among the Class of 2016 of SIAM Fellows (the whole list is here). She is very well known internationally for her work on methods and software for sparse matrix computations. In particular, she is the lead author behind SuperLU (software for solving general sparse systems of linear equations). Her citation also highlights the enabling role of her contributions in large-scale scientific and engineering applications. Sherry has been recently elected to lead the Scalable Solvers Group in Berkeley Lab’s Computational Research Division (CRD).

Congratulations to Tom and Sherry! We are also fortunate  to have Sherry serve on the CSC Steering Committee.

Alex and Bora

## A recent survey on direct methods

I have just read a recent survey by

Timothy A. Davis, Sivasankaran Rajamanickam, and Wissam Sid-Lakhdar,“A survey of direct methods for sparse linear systems (link)”. The authors state their goal in the abstract:

The goal of this survey article is to impart a working knowledge of the underlying theory and practice of sparse direct methods for solving linear systems and least-squares problems, and to provide an overview of the algorithms, data structures, and software available to solve these problems, so that the reader can both understand the methods and know how best to use them.

I have very much appreciated the breadth of the survey. It reviews the earlier work on methods for the classical problems (e.g., LU, QR, Cholesky factorizations) and gives the context of the recent work  (e.g., GPU acceleration of the associated software; most recent problems of updating/downdating; exploiting low-rank approximations for efficiency).

One of the most impressive parts of the surveys are their reference lists. This one has 604 bibliographic items  (if I did not do any errors in counting). There is great scholarly work in collecting 604 bibliographic items, reading through them, and putting them into a well-organized survey. There is virtually no bulk references; all citations come with at least a few words. This assiduous approach got me excited and I dug into the reference list. The earliest cited work is from 1957 (one by Berge [1], and one by Markowitz [2]); the latests are from 2015 (there are a number of them).  There are no cited papers from the years 1958, 1959, 1960, 1962, 1964, and 1965. Here is a histogram of the number of papers per 5-year periods (centered at years 1955 to 2015 by increments of 5, e.g., 1955:5:2015).

The histogram tells at least two things: (i) much of the activities at the foundations of today’s methods are from the years 1990–2000; (ii) the field is very active, considering that the survey gives an overview of fundamentals, and recent developments which did not fit neatly into the grand/traditional lines of the world of direct methods are only summarized in a relatively short section (Section 12).

I underlined another quotation from the survey:

Well-designed mathematical software has long been considered a cornerstone of scholarly contributions in computational science.

This is a great survey, even for those who know the area. Kudos to Tim, Siva, and Wissam for having crafted it.

### References

1. Claude Berge, Two theorems in graph theory, Proceedings of the National Academy of Sciences of the United States of America 43(9), 842–844, 1957  (link).
2. Harry M. Markowitz, The elimination form of the inverse and its application to linear programming, Management Science, 3 (3), 255–269, 1957 (link).