# DLAP Sparse Linear Algebra Package

DLAP is a FORTRAN90 library which implements a number of routines for solving sparse linear systems, by Anne Greenbaum and Mark Seager.

DLAP contains "core" routines for the iterative solution of symmetric and non-symmetric positive definite and positive semi-definite linear systems.

Included in this package are core routines to do

• Iterative Refinement iteration,
• Preconditioned Conjugate Gradient iteration on the Normal Equations,
• Preconditioned BiConjugate Gradient Squared iteration,
• Orthomin iteration,
• Generalized Minimum Residual iteration (GMRES).

The DLAP core routines generally do not interact directly with the sparse matrix. Instead, they require the user to supply two routines:

• MATVEC (Matrix Vector Multiply)
• MSOLVE (Preconditioning).
This allows the core routines to be written in a way that makes them independent of the matrix data structure. For each core routine there are several drivers and support routines that allow the user to utilize Diagonal Scaling and Incomplete Cholesky factorization or Incomplete LU factorization as preconditioners with no coding. The price for this convenience is that one must use the a specific matrix data structure: DLAP Column or DLAP Triad format.

This document contains the specifications for the DLAP Version 2.0 package, a Fortran package for the solution of large sparse linear systems, Ax = b, via preconditioned iterative methods.

Included in this package are core routines to do

• Iterative Refinement (Jacobi's method),
• Conjugate Gradient on the normal equations, AA'y = b, (where x = A'y and A' denotes the transpose of A),
• Orthomin
• Generalized Minimum Residual (GMRES) Iteration.

These core routines do not require a fixed data structure for storing the matrix A and the preconditioning matrix M. The user is free to choose any structure that facilitates efficient solution of the problem at hand. The drawback to this approach is that the user must also supply at least two routines (MATVEC and MSOLVE, say).

MATVEC must calculate y = A*x, given x and the user's data structure for A. MSOLVE must solve, r = M*z, for z (*NOT* r) given r and the user's data structure for M (or its inverse). The user should choose M so that M*A is approximately the identity and the solution step r = M*z is "easy" to solve.

For some of the "core" routines (Orthomin, BiConjugate Gradient and Conjugate Gradient on the normal equations) the user must also supply a matrix transpose times vector routine (MTTVEC, say) and (possibly, depending on the "core" method) a routine that solves the transpose of the preconditioning step (MTSOLV, say). Specifically, MTTVEC is a routine which calculates y = A'x, given x and the user's data structure for A (A' is the transpose of A). MTSOLV is a routine which solves the system r = M'z for z given r and the user's data structure for M.

This process of writing the matrix vector operations can be time consuming and error prone. To alleviate these problems we have written drivers for the "core" methods that assume the user supplies one of two specific data structures (DLAP Triad and DLAP Column format), see below. Utilizing these data structures we have augmented each "core" method with two preconditioners: Diagonal Scaling and Incomplete Factorization. Diagonal scaling is easy to implement, vectorizes very well and for problems that are not too ill-conditioned reduces the number of iterations enough to warrant its use. On the other hand, an Incomplete factorization (Incomplete Cholesky for symmetric systems and Incomplete LU for nonsymmetric systems) may take much longer to calculate, but it reduces the iteration count (for most problems) significantly. Our implementations of IC and ILU vectorize for machines with hardware gather scatter, but the vector lengths can be quite short if the number of nonzeros in a column is not large.

In the DLAP Triad format only the non-zeros are stored. They may appear in any order. The user supplies three arrays of length NELT, where NELT is the number of non-zeros in the matrix: (IA(NELT), JA(NELT), A(NELT)). If the matrix is symmetric then one need only store the lower triangle (including the diagonal) and NELT would be the corresponding number of non-zeros stored. For each non-zero the user puts the row and column index of that matrix element in the IA and JA arrays. The value of the non-zero matrix element is placed in the corresponding location of the A array. This is an extremely easy data structure to generate. On the other hand, it is not very efficient on vector computers for the iterative solution of linear systems. Hence, DLAP changes this input data structure to the DLAP Column format for the iteration (but does not change it back).

Here is an example of the DLAP Triad storage format for a nonsymmetric 5x5 Matrix. NELT=11. Recall that the entries may appear in any order.

```      |11 12  0  0 15|
|21 22  0  0  0|
| 0  0 33  0 35|
| 0  0  0 44  0|
|51  0 53  0 55|
```
Here is the DLAP Triad format for the same 5x5 matrix:
```           1  2  3  4  5  6  7  8  9 10 11
A: 51 12 11 33 15 53 55 22 35 44 21
IA:  5  1  1  3  1  5  5  2  3  4  2
JA:  1  2  1  3  5  3  5  2  5  4  1
```

## DLAP Column format

In the DLAP Column format the non-zeros are stored counting down columns (except for the diagonal entry, which must appear first in each "column") and are stored in the real array A. In other words, for each column in the matrix first put the diagonal entry in A. Then put in the other non-zero elements going down the column (except the diagonal) in order. The IA array holds the row index for each non-zero. The JA array holds the offsets into the IA, A arrays for the beginning of each column. That is, IA(JA(ICOL)), A(JA(ICOL)) are the first elements of the ICOL-th column in IA and A. IA(JA(ICOL+1)-1), A(JA(ICOL+1)-1) are the last elements of the ICOL-th column. Note that we always have JA(N+1) = NELT+1, where N is the number of columns in the matrix and NELT is the number of non-zeros in the matrix. If the matrix is symmetric one need only store the lower triangle (including the diagonal) and NELT would be the corresponding number of non-zeros stored.

Here is an example of the DLAP Column storage format for a nonsymmetric 5x5 Matrix (in the A and IA arrays '|' denotes the end of a column):

```      |11 12  0  0 15|
|21 22  0  0  0|
| 0  0 33  0 35|
| 0  0  0 44  0|
|51  0 53  0 55|
```
Here is the DLAP Column format for the 5x5 matrix:
```           1  2  3    4  5    6  7    8    9 10 11
A: 11 21 51 | 22 12 | 33 53 | 44 | 55 15 35
IA:  1  2  5 |  2  1 |  3  5 |  4 |  5  1  3
JA:  1  4  6    8  9   12
```

## Naming Conventions

DLAP iterative methods, matrix vector and preconditioner calculation routines follow a naming convention which, when understood, allows one to determine the iterative method and data structure(s) used. The subroutine naming convention takes the following form:

P[S][M]DESC
where P stands for the precision (or data type) of the routine and is required in all names, S denotes whether or not the routine requires the DLAP Triad or Column format (it does if the second letter of the name is S and does not otherwise), the optional M stands for the type of preconditioner used (only appears in drivers for "core" routines) and DESC is some number of letters describing the method or purpose of the routine. In this incarnation of DLAP only single precision data types are supported (no double precision or complex data type routines have been written). Hence, all routines start with an S, boring. The brackets around S and M designate that these fields are optional.

The following is a list of the "DESC" fields for iterative methods and their meaning: BCG: BiConjugate Gradient; CG: Conjugate Gradient; CGN: Conjugate Gradient on the Normal equations; CGS, CS: biConjugate Gradient Squared; GMRES, GMR, GM: Generalized Minimum RESidual; IR, R: Iterative Refinement; JAC: JACobi's method; GS: Gauss-Seidel; OMN, OM: Orthomin.

Here are some examples of the routines:

1. SBCG: Single precision BiConjugate Gradient "core" routine. One can deduce that this is a "core" routine, because the S and M fields are missing and BiConjugate Gradient is an iterative method.
2. SSDBCG: Single precision, DLAP data structure BCG with Diagonal scaling.
3. SSLUBC: Single precision, BCG with incomplete LU factorization as the preconditioning.
4. SCG: Single precision Conjugate Gradient "core" routine.
5. SSDCG: Single precision, DLAP data structure Conjugate Gradient with Diagonal scaling.
6. SSICCG: Single precision, DLAP data structure Conjugate Gradient with Incomplete Cholesky factorization preconditioning.

## Which Method To Use

In solving a large sparse linear system Ax = b using an iterative method, it is not necessary to actually store the matrix A. Rather, what is needed is a procedure for multiplying the matrix A times a given vector y to obtain the matrix-vector product, Ay. DLAP has been written to take advantage of this fact. The higher level routines in the package require storage only of the nonzero elements of A (and their positions), and even this can be avoided, if the user writes his own subroutine for multiplying the matrix times a vector and calls the lower-level iterative routines in the package.

If the matrix A is ill-conditioned, then most iterative methods will be slow to converge (if they converge at all!). To improve the convergence rate, one may use a "matrix splitting," or, "preconditioning matrix," say, M. It is then necessary to solve, at each iteration, a linear system with coefficient matrix M. A good preconditioner M should have two properties: (1) M should "approximate" A, in the sense that the matrix inv(M)*A (or some variant thereof) is better conditioned than the original matrix A; and (2) linear systems with coefficient matrix M should be much easier to solve than the original system with coefficient matrix A. Preconditioning routines in the DLAP package are separate from the iterative routines, so that any of the preconditioners provided in the package, or one that the user codes, can be used with any of the iterative routines.

If you are willing to live with either the DLAP Triad or Column matrix data structure you can then choose one of two types of preconditioners to use: diagonal scaling or incomplete factorization. To choose between these two methods requires knowing something about the computer you're going to run these codes on and how well incomplete factorization approximates the inverse of your matrix.

Let's suppose you have a scalar machine. Then, unless the incomplete factorization is very, very poor this is *GENERALLY* the method to choose. It will reduce the number of iterations significantly and is not all that expensive to compute. So if you have just one linear system to solve and "just want to get the job done" then try incomplete factorization first. If you are thinking of integrating some DLAP iterative method into your favorite "production code" then try incomplete factorization first, but also check to see that diagonal scaling is indeed slower for a large sample of test problems.

Let's now suppose you have a vector computer with hardware gather/scatter support (Cray X-MP, Y-MP, SCS-40 or Cyber 205, ETA 10, ETA Piper or Convex C-1, etc.). Then it's much harder to choose between the two methods. The versions of incomplete factorization in DLAP do in fact vectorize, but have short vector lengths and the factorization step is relatively more expensive. Hence, for most problems (i.e., unless your problem is ill conditioned, sic!) diagonal scaling is faster, with its very fast set up time and vectorized (with long vectors) preconditioning step (even though it may take more iterations). If you have several systems (or right hand sides) to solve that can utilize the same preconditioner then the cost of the incomplete factorization can be amortized over these several solutions. This situation gives more advantage to the incomplete factorization methods. If you have a vector machine without hardware gather/scatter (Cray 1, Cray 2 & Cray 3) then the advantages for incomplete factorization are even less.

If you're trying to shoehorn DLAP into your favorite "production code" and can not easily generate either the DLAP Triad or Column format, then you are left to your own devices in terms of preconditioning. Also, you may find that the preconditioners supplied with DLAP are not sufficient for your problem. In this situation, we would recommend that you talk with a numerical analyst versed in iterative methods about writing other preconditioning subroutines (e.g., polynomial preconditioning, shifted incomplete factorization, SOR or SSOR iteration). You can always "roll your own" by using the "core" iterative methods and supplying your own MSOLVE and MATVEC (and possibly MTSOLV and MTTVEC) routines. If you do develop a new preconditioner for the DLAP data structure send the code to us (if you can do that with no strings attached!, i.e. copyright restrictions) and we'll add it to the package!

If your matrix is symmetric then you would want to use one of the symmetric system solvers. If your system is also positive definite, (Ax,x) (Ax dot product with x) is positive for all non-zero vectors x, then use Conjugate Gradient (SCG, SSDCG, SSICSG). If you're not sure it's SPD (symmetric and Positive Definite) then try SCG anyway and if it works, fine. If you're sure your matrix is not positive definite then you may want to try the iterative refinement methods (SIR) or the GMRES code (SGMRES) if SIR converges too slowly.

Nonsymmetric systems are an area of active research in numerical analysis and there are new strategies being developed. Consequently take the following advice with a grain of salt. If your matrix is positive definite, (Ax,x) (Ax dot product with x is positive for all non-zero vectors x), then you can use any of the methods for nonsymmetric systems (Orthomin, GMRES, BiConjugate Gradient, BiConjugate Gradient Squared and Conjugate Gradient applied to the normal equations). If your system is not too ill conditioned then try BiConjugate Gradient Squared (BCGS) or GMRES (SGMRES). Both of these methods converge very quickly and do not require A' or M' (' denotes transpose) information. SGMRES does require some additional storage, though. If the system is very ill conditioned or nearly positive indefinite ((Ax,x) is positive, but may be very small), then GMRES should be the first choice, but try the other methods if you have to fine tune the solution process for a "production code". If you have a great preconditioner for the normal equations (i.e., M is an approximation to the inverse of AA' rather than just A) then this is not a bad route to travel. Old wisdom would say that the normal equations are a disaster (since it squares the condition number of the system and SCG convergence is linked to this number of infamy), but some preconditioners (like incomplete factorization) can reduce the condition number back below that of the original system.

DLAP can store a sparse matrix in a file, using SLAP/DLAP Sparse Matrix File Format. Such a file can be written by the routine DTOUT, or read back in by DTIN.

There is also a routine DBHIN which can read in a file in Harwell Boeing Sparse Matrix File Format.

### Languages:

DLAP is available in a FORTRAN90 version.

### Related Data and Programs:

CC, a data directory which contains examples of the Compressed Column (CC) sparse matrix file format;

CG, a FORTRAN90 library which implements the conjugate gradient method for solving a positive definite sparse linear system A*x=b, using reverse communication.

CR, a data directory which contains examples of the Compressed Row (CR) sparse matrix file format;

DLAP_IO, a FORTRAN90 library which reads and writes files describing sparse matrices used by DLAP.

DSP, a data directory which contains a description and examples of the DSP format for storing sparse matrices, which is equivalent to the SLAP/DLAP Triad format.

HB_TO_ST, a FORTRAN90 program which converts the sparse matrix information stored in a Harwell-Boeing file into a sparse triplet file.

LINPLUS, a FORTRAN90 library which manipulates matrices in a variety of formats.

MACHINE, a FORTRAN90 library which supplies certain machine arithmetic constants. A copy of this library is used by DLAP.

MGMRES, a FORTRAN90 library which applies the restarted GMRES algorithm to solve a sparse linear system, by Lili Ju.

SLATEC, a FORTRAN90 library which collects together a number of standard numerical libraries, including BLAS, DASSL, DEPAC, EISPACK, FFTPACK, FISHPACK, FNLIB, LINPACK, PCHIP, QUADPACK, SDRIV, SLAP, XERROR.

SPARSEKIT, a FORTRAN90 library which carries out sparse matrix operations, by Yousef Saad.

SPARSEPAK, a FORTRAN90 library which forms an obsolete version of the Waterloo Sparse Matrix Package.

SUPERLU, FORTRAN90 programs which illustrate how to use the SUPERLU library, which applies a fast direct solution method to solve sparse linear systems, by James Demmel, John Gilbert, and Xiaoye Li.

### Author:

Anne Greenbaum, Mark Seager

### Reference:

1. Peter Brown, Alan Hindmarsh,
Reduced Storage Matrix Methods In Stiff ODE Systems,
Technical Report UCRL-95088, Revision 1,
Lawrence Livermore National Laboratory, June 1987.
2. Paul Concus, Gene Golub, Dianne OLeary,
A Generalized Conjugate Gradient Method for the Numerical Solution of Elliptic Partial Differential Equations,
in Symposium on Sparse Matrix Computations,
edited by James Bunch, Donald Rose,
ISBN: 0121410501,
LC: QA188.S9.
3. Gene Golub, Charles VanLoan,
Matrix Computations,
Third Edition,
Johns Hopkins, 1996,
ISBN: 0-8018-4513-X,
LC: QA188.G65.
4. Louis Hageman, David Young,
Applied Iterative Methods,
ISBN: 0-12-313340-8,
LC: QA297.8.H34.
5. Ron Jones, David Kahaner,
XERROR, The SLATEC Error Handling Package,
Technical Report SAND82-0800,
Sandia National Laboratories, 1982.
6. Ron Jones, David Kahaner,
XERROR, The SLATEC Error Handling Package,
Software: Practice and Experience,
Volume 13, Number 3, 1983, pages 251-257.
7. Erik Kaasschieter,
The solution of non-symmetric linear systems by bi-conjugate gradients or conjugate gradients squared,
Technical Report 86-21,
Delft University of Technology Report, 1986.
8. Charles Lawson, Richard Hanson, David Kincaid, Fred Krogh,
Algorithm 539: Basic Linear Algebra Subprograms for Fortran Usage,
ACM Transactions on Mathematical Software,
Volume 5, Number 3, September 1979, pages 308-323.
9. Mark Seager,
A SLAP for the Masses,
Technical Report: UCRL-100267,
Lawrence Livermore National Laboratory, December 1988.
10. Richard Singleton,
Algorithm 347: An Efficient Algorithm for Sorting with Minimal Storage,
Communications of the ACM,
Volume 12, Number 3, March 1969, pages 185-187.
11. Peter Sonneveld,
CGS, a fast Lanczos-type solver for nonsymmetric linear systems,
Technical Report 84-16,
Department of Mathematics and Informatics,
Delft University of Technology, 1984.
12. http://www.netlib.org/slap/index.html, NETLIB web site.

### Examples and Tests:

DLAP_TEST is a formal test program that checks all the DLAP routines. However, it's not very useful as a template for writing your own program!

DLAP_DGMRES_TEST is a "simple" test program applies the GMRES algorithm to the [-1,2,-1] matrix. No preconditioning is used. The entries of the matrix are stored naively, in DLAP Triad format. You might find this example easier to adapt to your own purposes.

### List of Routines:

• I1MACH returns integer machine constants.
• ISAMAX finds the index of the vector element of maximum absolute value.
• ISDBCG: Non-Symmetric Linear system, Sparse, Iterative Precondition, Stop Test.
• ISDCG calculates the stop test for the Conjugate Gradient iteration scheme.
• ISDCGN: stop test for the Conjugate Gradient applied to the normal equations.
• ISDCGS: stop test for the BiConjugate Gradient iteration scheme.
• ISDGMR: stop test for the GMRES iteration.
• ISDIR: stop test for the iterative refinement iteration.
• ISDOMN : stop test for the Orthomin iteration.
• J4SAVE saves and recalls variables needed for error handling.
• QS2I1R sorts an integer array, and updates companion integer and real arrays.
• D1MACH returns double precision machine constants.
• RAND generates a uniformly distributed random number.
• DASUM sums the absolute values of the entries of a vector.
• DAMAX returns the maximum absolute value of the entries in a vector.
• DAXPY adds a constant times one vector to another.
• DAXPYX computes a constant times a vector plus a vector.
• DBCG: solve a Non-Symmetric system using Preconditioned BiConjugate Gradient.
• DBHIN: read a sparse linear system in Boeing Harwell format.
• DCG: solve PDS system using Preconditioned Conjugate Gradient.
• DCGN: solve system using Preconditioned Conjugate Gradient/Normal Equations.
• DCGS solve Non-Symmetric system using Preconditioned BiConjugate Gradient.
• DCHKW checks work array lengths.
• DCOPY copies one real vector into another.
• DCPPLT prints a DLAP column format matrix.
• DDOT forms the dot product of two vectors.
• DDSDOT computes the dot product of real vectors using double precision.
• DGMRES solves a nonsymmetric system using preconditioned GMRES.
• DHELS minimizes the norm of B-A*X using factors from SHEQR.
• DHEQR QR decomposes an upper Hessenberg matrix using Givens rotations.
• DIR: Linear system, Sparse, Iterative Precondition
• DLLTI2 solves (L*D*L')*X=B, L is unit lower triangular, and D is diagonal.
• DNRM2 computes the Euclidean norm of a vector.
• DOMN: Non-Symmetric Linear system, Sparse, Iterative Precondition, Orthomin
• DORTH: Non-Symmetric Linear system, Sparse, Iterative Precondition, Generalized Minimum Residual
• DPIGMR: Non-Symmetric Linear system, Sparse, Iterative Precondition, Generalized Minimum Residual
• DRLCAL calculates the scaled residual RL from the V(I)'s.
• DROT applies a plane rotation.
• DROTG constructs a Givens plane rotation.
• DROTM applies a modified Givens plane rotation.
• DROTMG constructs a modified Givens plane rotation.
• DS2LT: Linear system, DLAP Sparse, Lower Triangle.
• DS2Y: convert from DLAP Triad to DLAP Column format.
• DSCAL scales a vector by a constant.
• DSD2S: compute the inverse of the diagonal of A*A'.
• DSDBCG: Non-Symmetric Linear system, Sparse, Iterative Precondition
• DSDCG: Symmetric Linear system, Sparse, Iterative Precondition
• DSDCGN: Non-Symmetric Linear system solve, Sparse, Iterative Precondition
• DSDCGS: Non-Symmetric Linear system, Sparse, Iterative Precondition
• DSDGMR: Non-Symmetric Linear system, Sparse, Iterative Precondition, Generalized Minimum Residual
• DSDI calculates the product X = DIAG*B where DIAG is a diagonal matrix.
• DSDOMN: Non-Symmetric Linear system solve, Sparse, Iterative Precondition
• DSDS: DLAP Sparse, Diagonal
• DSDSCL: DLAP Sparse, Diagonal
• DSGS: Linear system, Sparse, Iterative Precondition
• DSICCG: Symmetric Linear system, Sparse, Iterative Precondition, Incomplete Cholesky
• DSICS: Linear system, DLAP Sparse, Iterative Precondition, Incomplete Cholesky Factorization.
• DSILUR: Linear system, Sparse, Iterative Precondition
• DSILUS: Non-Symmetric Linear system, Sparse, Iterative Precondition, Incomplete LU Factorization
• DSJAC solves a linear system A*x=b using Jacobi iteration.
• DSLI: Linear system solve, Sparse, Iterative Precondition
• DSLI2: Linear system solve, Sparse, Iterative Precondition
• DSLLTI: Linear system solve, Sparse, Iterative Precondition
• DSLUBC: Non-Symmetric Linear system, Sparse, Iterative incomplete LU Precondition
• DSLUCN: Non-Symmetric Linear system, Sparse, Iterative Incomplete LU Precondition
• DSLUCS: Non-Symmetric Linear system, Sparse, Iterative incomplete LU Precondition
• DSLUGM: Non-Symmetric Linear system, Sparse, Iterative Precondition, Generalized Minimum Residual
• DSLUI applies the incomplete LU preconditioner.
• DSLUI2 carries out the incomplete LU preconditioning.
• DSLUI4 solves (L*D*U)'*X = B.
• DSLUOM is an incomplete LU Orthomin solver for A*x=b.
• DSLUTI is the DLAP MTSOLV for LDU Factorization.
• DSMMI2 back solves for LDU factorization of the normal equations.
• DSMMTI is the DLAP MSOLVE for LDU Factorization of Normal Equations.
• DSMTV: Matrix transpose Vector Multiply, Sparse
• DSMV, Matrix Vector Multiply, Sparse
• DSWAP interchanges two vectors.
• DTIN Linear system, DLAP Sparse, Diagnostics
• DTOUT writes out DLAP Triad Format Linear System.
• DXLCAL computes the solution XL, the current DGMRES iterate.
• XERABT aborts program execution and print error message.
• XERCLR resets current error number to zero.
• XERCTL allow user control over handling of errors.
• XERDMP prints the error tables and then clears them.
• XERMAX sets maximum number of times any error message is to be printed.
• XERPRT prints error messages.
• XERROR processes an error message.
• XERRWV processes an error message.
• XERSAV records that an error has occurred.
• XGETF returns the current value of the error control flag.
• XGETUA returns unit number(s) to which error messages are being sent.
• XGETUN returns the (first) output file to which error messages are being sent.
• XSETF sets the error control flag.
• XSETUA sets logical unit numbers to which error messages are to be sent.
• XSETUN sets the output file to which error messages are to be sent.
• TIMESTAMP prints the current YMDHMS date as a time stamp.

You can go up one level to the FORTRAN90 source codes.

Last revised on 01 January 2010.