: matrix-matrix product, triangular matrix, double-precision complex.

Sparse BLAS Level 1 naming conventions are similar to those of BLAS Level 1. For more information, see Naming Conventions.

Fortran 95 Interface Conventions

The Fortran 95 interface to BLAS and Sparse BLAS Level 1 routines is implemented through wrappers that call the respective FORTRAN 77 routines. This interface uses such Fortran 95 features as assumed-shape arrays and optional arguments to provide simplified calls to BLAS and Sparse BLAS Level 1 routines with fewer parameters.

NOTE For BLAS, Intel MKL offers two types of Fortran 95 interfaces:
• using mkl_blas.fi only, through the include 'mkl_blas.fi' statement. Such interfaces allow you to make use of the original BLAS routines with all their arguments;
• using blas.f90, which includes improved interfaces. This file is used to generate the module files blas95.mod and f95_precision.mod. The module files mkl95_blas.mod and mkl95_precision.mod are also generated. See also the section "Fortran 95 interfaces and wrappers to LAPACK and BLAS" of the Intel® MKL User's Guide for details.

The module files are used to process the FORTRAN use clauses referencing the BLAS interface: use blas95 (or the equivalent use mkl95_blas) and use f95_precision (or the equivalent use mkl95_precision).

The main conventions used in the Fortran 95 interface are as follows:
• The names of parameters used in the Fortran 95 interface are typically the same as those used for the respective generic (FORTRAN 77) interface. In rare cases formal argument names may be different.
• Some input parameters, such as array dimensions, are not required in Fortran 95 and are skipped from the calling sequence. Array dimensions are reconstructed from the user data, which must exactly follow the required array shape.
• A parameter can be skipped if its value is completely defined by the presence or absence of another parameter in the calling sequence, and the restored value is the only meaningful value for the skipped parameter.
• Parameters specifying the increment values incx and incy are skipped. In most cases their values are equal to 1. In Fortran 95 an increment with a different value can be established directly in the corresponding parameter.
• Some generic parameters are declared as optional in the Fortran 95 interface and may or may not be present in the calling sequence. A parameter can be declared optional if it satisfies one of the following conditions:
1. It can take only a few possible values. The default value of such a parameter is typically the first value in the list; all exceptions to this rule are explicitly stated in the routine description.
2. It has a natural default value.
Optional parameters are given in square brackets in the Fortran 95 call syntax.

The particular rules used for reconstructing the values of omitted optional parameters are specific to each routine and are detailed in the respective "Fortran 95 Notes" subsection at the end of the routine specification section. If this subsection is omitted, the Fortran 95 interface for the given routine does not differ from the corresponding FORTRAN 77 interface.

Note that this interface is not implemented in the current version of Sparse BLAS Level 2 and Level 3 routines.
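
To illustrate these conventions, here is a small sketch (not part of the original manual) contrasting the FORTRAN 77 and Fortran 95 calling styles for the axpy operation y := a*x + y, which is documented later in this chapter. It assumes that the blas95 module files have been built and that the program is linked against Intel MKL together with its Fortran 95 interface library; exact build steps depend on the MKL version.

  program interface_conventions_demo
    use blas95, only: axpy          ! Fortran 95 interface described above
    implicit none
    double precision :: x(4) = (/ 1d0, 2d0, 3d0, 4d0 /)
    double precision :: y77(4) = 0d0, y95(4) = 0d0

    ! FORTRAN 77 interface: length and increments are passed explicitly.
    call daxpy(4, 2d0, x, 1, y77, 1)

    ! Fortran 95 wrapper: n, incx, incy are reconstructed from the
    ! assumed-shape arrays; the scalar a is optional (default 1).
    call axpy(x, y95, a=2d0)

    print *, y77     ! 2 4 6 8
    print *, y95     ! same result
  end program interface_conventions_demo
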
Matrix Storage Schemes

Matrix arguments of BLAS routines can use the following storage schemes:
• Full storage: a matrix A is stored in a two-dimensional array a, with the matrix element aij stored in the array element a(i, j).
• Packed storage allows you to store symmetric, Hermitian, or triangular matrices more compactly: the upper or lower triangle of the matrix is packed by columns in a one-dimensional array.
• Band storage: a band matrix is stored compactly in a two-dimensional array: columns of the matrix are stored in the corresponding columns of the array, and diagonals of the matrix are stored in rows of the array.
For more information on matrix storage schemes, see Matrix Arguments in Appendix B.

BLAS Level 1 Routines and Functions

BLAS Level 1 includes routines and functions that perform vector-vector operations. Table "BLAS Level 1 Routine and Function Groups and Their Data Types" lists the BLAS Level 1 routine and function groups and the data types associated with them.

BLAS Level 1 Routine and Function Groups and Their Data Types

Routine or Function Group   Data Types           Description
?asum                       s, d, sc, dz         Sum of vector magnitudes (functions)
?axpy                       s, d, c, z           Scalar-vector product (routines)
?copy                       s, d, c, z           Copy vector (routines)
?dot                        s, d                 Dot product (functions)
?sdot                       sd, d                Dot product with extended precision (functions)
?dotc                       c, z                 Dot product conjugated (functions)
?dotu                       c, z                 Dot product unconjugated (functions)
?nrm2                       s, d, sc, dz         Vector 2-norm (Euclidean norm) (functions)
?rot                        s, d, cs, zd         Plane rotation of points (routines)
?rotg                       s, d, c, z           Generate Givens rotation of points (routines)
?rotm                       s, d                 Modified Givens plane rotation of points (routines)
?rotmg                      s, d                 Generate modified Givens plane rotation of points (routines)
?scal                       s, d, c, z, cs, zd   Vector-scalar product (routines)
?swap                       s, d, c, z           Vector-vector swap (routines)
i?amax                      s, d, c, z           Index of the maximum absolute value element of a vector (functions)
i?amin                      s, d, c, z           Index of the minimum absolute value element of a vector (functions)
?cabs1                      s, d                 Auxiliary functions; compute the absolute value of a complex number of single or double precision

?asum
Computes the sum of magnitudes of the vector elements.

Syntax
Fortran 77:
res = sasum(n, x, incx)
res = scasum(n, x, incx)
res = dasum(n, x, incx)
res = dzasum(n, x, incx)
Fortran 95:
res = asum(x)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?asum routine computes the sum of the magnitudes of elements of a real vector, or the sum of magnitudes of the real and imaginary parts of elements of a complex vector:
res = |Re x(1)| + |Im x(1)| + |Re x(2)| + |Im x(2)| + ... + |Re x(n)| + |Im x(n)|,
where x is a vector with a number of elements that equals n.

Input Parameters
n      INTEGER. Specifies the number of elements in vector x.
x      REAL for sasum; DOUBLE PRECISION for dasum; COMPLEX for scasum; DOUBLE COMPLEX for dzasum. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for indexing vector x.

Output Parameters
res    REAL for sasum; DOUBLE PRECISION for dasum; REAL for scasum; DOUBLE PRECISION for dzasum. Contains the sum of magnitudes of real and imaginary parts of all elements of the vector.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine asum interface are the following:
x      Holds the array of size n.
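
As a usage illustration (a sketch added here, not from the original manual), the following program sums the magnitudes of a double-precision vector with dasum; it assumes linkage against Intel MKL or any conforming BLAS.

  program asum_demo
    implicit none
    double precision, external :: dasum
    double precision :: x(5) = (/ 1d0, -2d0, 3d0, -4d0, 5d0 /)

    ! res = dasum(n, x, incx): sum of |x(i)| over n elements with stride incx.
    print *, 'sum of magnitudes =', dasum(5, x, 1)    ! expected 15.0
  end program asum_demo
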
?axpy
Computes a vector-scalar product and adds the result to a vector.

Syntax
Fortran 77:
call saxpy(n, a, x, incx, y, incy)
call daxpy(n, a, x, incx, y, incy)
call caxpy(n, a, x, incx, y, incy)
call zaxpy(n, a, x, incx, y, incy)
Fortran 95:
call axpy(x, y [,a])

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?axpy routines perform a vector-vector operation defined as
y := a*x + y,
where:
a is a scalar,
x and y are vectors, each with a number of elements that equals n.

Input Parameters
n      INTEGER. Specifies the number of elements in vectors x and y.
a      REAL for saxpy; DOUBLE PRECISION for daxpy; COMPLEX for caxpy; DOUBLE COMPLEX for zaxpy. Specifies the scalar a.
x      REAL for saxpy; DOUBLE PRECISION for daxpy; COMPLEX for caxpy; DOUBLE COMPLEX for zaxpy. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.
y      REAL for saxpy; DOUBLE PRECISION for daxpy; COMPLEX for caxpy; DOUBLE COMPLEX for zaxpy. Array, DIMENSION at least (1 + (n-1)*abs(incy)).
incy   INTEGER. Specifies the increment for the elements of y.

Output Parameters
y      Contains the updated vector y.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine axpy interface are the following:
x      Holds the array of size n.
y      Holds the array of size n.
a      The default value is 1.

?copy
Copies a vector to another vector.

Syntax
Fortran 77:
call scopy(n, x, incx, y, incy)
call dcopy(n, x, incx, y, incy)
call ccopy(n, x, incx, y, incy)
call zcopy(n, x, incx, y, incy)
Fortran 95:
call copy(x, y)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?copy routines perform a vector-vector operation defined as y = x, where x and y are vectors.

Input Parameters
n      INTEGER. Specifies the number of elements in vectors x and y.
x      REAL for scopy; DOUBLE PRECISION for dcopy; COMPLEX for ccopy; DOUBLE COMPLEX for zcopy. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.
y      REAL for scopy; DOUBLE PRECISION for dcopy; COMPLEX for ccopy; DOUBLE COMPLEX for zcopy. Array, DIMENSION at least (1 + (n-1)*abs(incy)).
incy   INTEGER. Specifies the increment for the elements of y.

Output Parameters
y      Contains a copy of the vector x if n is positive. Otherwise, parameters are unaltered.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine copy interface are the following:
x      Holds the vector with the number of elements n.
y      Holds the vector with the number of elements n.
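
The sketch below (added for illustration, assuming an MKL or other BLAS link line) combines dcopy and daxpy with a non-unit increment to gather and update every other element of a vector.

  program copy_axpy_demo
    implicit none
    double precision :: x(6) = (/ 1d0, 2d0, 3d0, 4d0, 5d0, 6d0 /)
    double precision :: y(3)

    ! Gather x(1), x(3), x(5) into y (incx = 2, incy = 1) ...
    call dcopy(3, x, 2, y, 1)          ! y = (1, 3, 5)
    ! ... then update y := 10*x(1:5:2) + y.
    call daxpy(3, 10d0, x, 2, y, 1)    ! y = (11, 33, 55)
    print *, y
  end program copy_axpy_demo
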
?dot
Computes a vector-vector dot product.

Syntax
Fortran 77:
res = sdot(n, x, incx, y, incy)
res = ddot(n, x, incx, y, incy)
Fortran 95:
res = dot(x, y)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?dot routines perform a vector-vector reduction operation defined as
res = x(1)*y(1) + x(2)*y(2) + ... + x(n)*y(n),
where x(i) and y(i) are elements of vectors x and y.

Input Parameters
n      INTEGER. Specifies the number of elements in vectors x and y.
x      REAL for sdot; DOUBLE PRECISION for ddot. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.
y      REAL for sdot; DOUBLE PRECISION for ddot. Array, DIMENSION at least (1 + (n-1)*abs(incy)).
incy   INTEGER. Specifies the increment for the elements of y.

Output Parameters
res    REAL for sdot; DOUBLE PRECISION for ddot. Contains the result of the dot product of x and y, if n is positive. Otherwise, res contains 0.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine dot interface are the following:
x      Holds the vector with the number of elements n.
y      Holds the vector with the number of elements n.

?sdot
Computes a vector-vector dot product with extended precision.

Syntax
Fortran 77:
res = sdsdot(n, sb, sx, incx, sy, incy)
res = dsdot(n, sx, incx, sy, incy)
Fortran 95:
res = sdot(sx, sy)
res = sdot(sx, sy, sb)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?sdot routines compute the inner product of two vectors with extended precision. Both routines use extended-precision accumulation of the intermediate results, but the sdsdot routine outputs the final result in single precision, whereas the dsdot routine outputs the double-precision result. The function sdsdot also adds the scalar value sb to the inner product.

Input Parameters
n        INTEGER. Specifies the number of elements in the input vectors sx and sy.
sb       REAL. Single-precision scalar to be added to the inner product (for the function sdsdot only).
sx, sy   REAL. Arrays, DIMENSION at least (1 + (n-1)*abs(incx)) and (1 + (n-1)*abs(incy)), respectively. Contain the input single-precision vectors.
incx     INTEGER. Specifies the increment for the elements of sx.
incy     INTEGER. Specifies the increment for the elements of sy.

Output Parameters
res      REAL for sdsdot; DOUBLE PRECISION for dsdot. Contains the result of the dot product of sx and sy (with sb added for sdsdot), if n is positive. Otherwise, res contains sb for sdsdot and 0 for dsdot.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine sdot interface are the following:
sx     Holds the vector with the number of elements n.
sy     Holds the vector with the number of elements n.
NOTE The scalar parameter sb is declared as a required parameter in the Fortran 95 interface for the function sdot to distinguish between function flavors that output the final result in different precision.

?dotc
Computes a dot product of a conjugated vector with another vector.

Syntax
Fortran 77:
res = cdotc(n, x, incx, y, incy)
res = zdotc(n, x, incx, y, incy)
Fortran 95:
res = dotc(x, y)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?dotc routines perform a vector-vector operation defined as
res = conjg(x(1))*y(1) + conjg(x(2))*y(2) + ... + conjg(x(n))*y(n),
where x(i) and y(i) are elements of vectors x and y.

Input Parameters
n      INTEGER. Specifies the number of elements in vectors x and y.
x      COMPLEX for cdotc; DOUBLE COMPLEX for zdotc. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.
y      COMPLEX for cdotc; DOUBLE COMPLEX for zdotc. Array, DIMENSION at least (1 + (n-1)*abs(incy)).
incy   INTEGER. Specifies the increment for the elements of y.

Output Parameters
res    COMPLEX for cdotc; DOUBLE COMPLEX for zdotc. Contains the result of the dot product of the conjugated x and unconjugated y, if n is positive. Otherwise, res contains 0.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine dotc interface are the following:
x      Holds the vector with the number of elements n.
y      Holds the vector with the number of elements n.

?dotu
Computes a vector-vector dot product.

Syntax
Fortran 77:
res = cdotu(n, x, incx, y, incy)
res = zdotu(n, x, incx, y, incy)
Fortran 95:
res = dotu(x, y)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?dotu routines perform a vector-vector reduction operation defined as
res = x(1)*y(1) + x(2)*y(2) + ... + x(n)*y(n),
where x(i) and y(i) are elements of complex vectors x and y.

Input Parameters
n      INTEGER. Specifies the number of elements in vectors x and y.
x      COMPLEX for cdotu; DOUBLE COMPLEX for zdotu. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.
y      COMPLEX for cdotu; DOUBLE COMPLEX for zdotu. Array, DIMENSION at least (1 + (n-1)*abs(incy)).
incy   INTEGER. Specifies the increment for the elements of y.

Output Parameters
res    COMPLEX for cdotu; DOUBLE COMPLEX for zdotu. Contains the result of the dot product of x and y, if n is positive. Otherwise, res contains 0.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine dotu interface are the following:
x      Holds the vector with the number of elements n.
y      Holds the vector with the number of elements n.

?nrm2
Computes the Euclidean norm of a vector.

Syntax
Fortran 77:
res = snrm2(n, x, incx)
res = dnrm2(n, x, incx)
res = scnrm2(n, x, incx)
res = dznrm2(n, x, incx)
Fortran 95:
res = nrm2(x)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?nrm2 routines perform a vector reduction operation defined as
res = ||x||,
where:
x is a vector,
res is a value containing the Euclidean norm of the elements of x.

Input Parameters
n      INTEGER. Specifies the number of elements in vector x.
x      REAL for snrm2; DOUBLE PRECISION for dnrm2; COMPLEX for scnrm2; DOUBLE COMPLEX for dznrm2. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.

Output Parameters
res    REAL for snrm2; DOUBLE PRECISION for dnrm2; REAL for scnrm2; DOUBLE PRECISION for dznrm2. Contains the Euclidean norm of the vector x.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine nrm2 interface are the following:
x      Holds the vector with the number of elements n.
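
A small usage sketch (added here; assumes MKL or another BLAS on the link line): the Euclidean norm from dnrm2 compared against the square root of the ddot self-product.

  program nrm2_demo
    implicit none
    double precision, external :: dnrm2, ddot
    double precision :: x(3) = (/ 3d0, 0d0, 4d0 /)

    print *, 'dnrm2      =', dnrm2(3, x, 1)              ! expected 5.0
    ! Mathematically the same value; dnrm2 is the preferred form because
    ! it is designed to avoid intermediate overflow and underflow.
    print *, 'sqrt(ddot) =', sqrt(ddot(3, x, 1, x, 1))
  end program nrm2_demo
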
?rot
Performs rotation of points in the plane.

Syntax
Fortran 77:
call srot(n, x, incx, y, incy, c, s)
call drot(n, x, incx, y, incy, c, s)
call csrot(n, x, incx, y, incy, c, s)
call zdrot(n, x, incx, y, incy, c, s)
Fortran 95:
call rot(x, y, c, s)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
Given two vectors x and y, each vector element of these vectors is replaced as follows:
x(i) = c*x(i) + s*y(i)
y(i) = c*y(i) - s*x(i)

Input Parameters
n      INTEGER. Specifies the number of elements in vectors x and y.
x      REAL for srot; DOUBLE PRECISION for drot; COMPLEX for csrot; DOUBLE COMPLEX for zdrot. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.
y      REAL for srot; DOUBLE PRECISION for drot; COMPLEX for csrot; DOUBLE COMPLEX for zdrot. Array, DIMENSION at least (1 + (n-1)*abs(incy)).
incy   INTEGER. Specifies the increment for the elements of y.
c      REAL for srot; DOUBLE PRECISION for drot; REAL for csrot; DOUBLE PRECISION for zdrot. A scalar.
s      REAL for srot; DOUBLE PRECISION for drot; REAL for csrot; DOUBLE PRECISION for zdrot. A scalar.

Output Parameters
x      Each element is replaced by c*x + s*y.
y      Each element is replaced by c*y - s*x.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine rot interface are the following:
x      Holds the vector with the number of elements n.
y      Holds the vector with the number of elements n.

?rotg
Computes the parameters for a Givens rotation.

Syntax
Fortran 77:
call srotg(a, b, c, s)
call drotg(a, b, c, s)
call crotg(a, b, c, s)
call zrotg(a, b, c, s)
Fortran 95:
call rotg(a, b, c, s)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
Given the Cartesian coordinates (a, b) of a point, these routines return the parameters c, s, r, and z associated with the Givens rotation. The parameters c and s define a unitary matrix such that:

    | c          s |   | a |   | r |
    | -conjg(s)  c | * | b | = | 0 |

The parameter z is defined such that if |a| > |b|, z is s; otherwise if c is not 0, z is 1/c; otherwise z is 1.
See also the more accurate LAPACK version, ?lartg.

Input Parameters
a      REAL for srotg; DOUBLE PRECISION for drotg; COMPLEX for crotg; DOUBLE COMPLEX for zrotg. Provides the x-coordinate of the point p.
b      REAL for srotg; DOUBLE PRECISION for drotg; COMPLEX for crotg; DOUBLE COMPLEX for zrotg. Provides the y-coordinate of the point p.

Output Parameters
a      Contains the parameter r associated with the Givens rotation.
b      Contains the parameter z associated with the Givens rotation.
c      REAL for srotg; DOUBLE PRECISION for drotg; REAL for crotg; DOUBLE PRECISION for zrotg. Contains the parameter c associated with the Givens rotation.
s      REAL for srotg; DOUBLE PRECISION for drotg; COMPLEX for crotg; DOUBLE COMPLEX for zrotg. Contains the parameter s associated with the Givens rotation.
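
The sketch below (an added illustration, assuming an MKL or other BLAS link line) generates the rotation parameters with drotg and then applies them to a pair of vectors with drot.

  program givens_demo
    implicit none
    double precision :: a, b, c, s
    double precision :: x(3) = (/ 3d0, 1d0, 2d0 /)
    double precision :: y(3) = (/ 4d0, 2d0, 2d0 /)

    ! Compute c and s so that the rotation maps (a, b) = (3, 4) to (r, 0).
    a = 3d0
    b = 4d0
    call drotg(a, b, c, s)     ! on return a holds r = 5, b holds z
    print *, 'r =', a, 'c =', c, 's =', s

    ! Apply the same plane rotation to the vectors x and y.
    call drot(3, x, 1, y, 1, c, s)
    print *, x
    print *, y
  end program givens_demo
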
?rotm
Performs modified Givens rotation of points in the plane.

Syntax
Fortran 77:
call srotm(n, x, incx, y, incy, param)
call drotm(n, x, incx, y, incy, param)
Fortran 95:
call rotm(x, y, param)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
Given two vectors x and y, each vector element of these vectors is replaced as follows:

    | x(i) |       | x(i) |
    | y(i) | = H * | y(i) |    for i = 1 to n,

where H is a modified Givens transformation matrix whose values are stored in the param(2) through param(5) array. See the discussion of the param argument.

Input Parameters
n      INTEGER. Specifies the number of elements in vectors x and y.
x      REAL for srotm; DOUBLE PRECISION for drotm. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.
y      REAL for srotm; DOUBLE PRECISION for drotm. Array, DIMENSION at least (1 + (n-1)*abs(incy)).
incy   INTEGER. Specifies the increment for the elements of y.
param  REAL for srotm; DOUBLE PRECISION for drotm. Array, DIMENSION 5. The elements of the param array are:
       param(1) contains a switch, flag;
       param(2-5) contain h11, h21, h12, and h22, respectively, the components of the matrix H.
       Depending on the values of flag, the components of H are set as follows:

       flag = -1.:  H = | h11  h12 |     flag = 0.:   H = | 1.   h12 |
                        | h21  h22 |                      | h21  1.  |

       flag = 1.:   H = | h11  1.  |     flag = -2.:  H = | 1.   0.  |
                        | -1.  h22 |                      | 0.   1.  |

       In the last three cases, the matrix entries of 1., -1., and 0. are assumed based on the value of flag and are not required to be set in the param vector.

Output Parameters
x      Each element x(i) is replaced by h11*x(i) + h12*y(i).
y      Each element y(i) is replaced by h21*x(i) + h22*y(i).

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine rotm interface are the following:
x      Holds the vector with the number of elements n.
y      Holds the vector with the number of elements n.

?rotmg
Computes the parameters for a modified Givens rotation.

Syntax
Fortran 77:
call srotmg(d1, d2, x1, y1, param)
call drotmg(d1, d2, x1, y1, param)
Fortran 95:
call rotmg(d1, d2, x1, y1, param)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
Given the Cartesian coordinates (x1, y1) of an input vector, these routines compute the components of a modified Givens transformation matrix H that zeros the y-component of the resulting vector, that is, of the scaled vector (x1*sqrt(d1), y1*sqrt(d2)).

Input Parameters
d1     REAL for srotmg; DOUBLE PRECISION for drotmg. Provides the scaling factor for the x-coordinate of the input vector.
d2     REAL for srotmg; DOUBLE PRECISION for drotmg. Provides the scaling factor for the y-coordinate of the input vector.
x1     REAL for srotmg; DOUBLE PRECISION for drotmg. Provides the x-coordinate of the input vector.
y1     REAL for srotmg; DOUBLE PRECISION for drotmg. Provides the y-coordinate of the input vector.

Output Parameters
d1     REAL for srotmg; DOUBLE PRECISION for drotmg. Provides the first diagonal element of the updated matrix.
d2     REAL for srotmg; DOUBLE PRECISION for drotmg. Provides the second diagonal element of the updated matrix.
x1     REAL for srotmg; DOUBLE PRECISION for drotmg. Provides the x-coordinate of the rotated vector before scaling.
param  REAL for srotmg; DOUBLE PRECISION for drotmg. Array, DIMENSION 5. The elements of the param array are:
       param(1) contains a switch, flag;
       param(2-5) contain h11, h21, h12, and h22, respectively, the components of the matrix H.
       Depending on the values of flag, the components of H are set as follows:

       flag = -1.:  H = | h11  h12 |     flag = 0.:   H = | 1.   h12 |
                        | h21  h22 |                      | h21  1.  |

       flag = 1.:   H = | h11  1.  |     flag = -2.:  H = | 1.   0.  |
                        | -1.  h22 |                      | 0.   1.  |

       In the last three cases, the matrix entries of 1., -1., and 0. are assumed based on the value of flag and are not required to be set in the param vector.

?scal
Computes the product of a vector by a scalar.

Syntax
Fortran 77:
call sscal(n, a, x, incx)
call dscal(n, a, x, incx)
call cscal(n, a, x, incx)
call zscal(n, a, x, incx)
call csscal(n, a, x, incx)
call zdscal(n, a, x, incx)
Fortran 95:
call scal(x, a)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?scal routines perform a vector operation defined as
x = a*x,
where:
a is a scalar,
x is an n-element vector.

Input Parameters
n      INTEGER. Specifies the number of elements in vector x.
a      REAL for sscal and csscal; DOUBLE PRECISION for dscal and zdscal; COMPLEX for cscal; DOUBLE COMPLEX for zscal. Specifies the scalar a.
x      REAL for sscal; DOUBLE PRECISION for dscal; COMPLEX for cscal and csscal; DOUBLE COMPLEX for zscal and zdscal. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.

Output Parameters
x      Updated vector x.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine scal interface are the following:
x      Holds the vector with the number of elements n.

?swap
Swaps a vector with another vector.

Syntax
Fortran 77:
call sswap(n, x, incx, y, incy)
call dswap(n, x, incx, y, incy)
call cswap(n, x, incx, y, incy)
call zswap(n, x, incx, y, incy)
Fortran 95:
call swap(x, y)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
Given two vectors x and y, the ?swap routines return vectors y and x swapped, each replacing the other.

Input Parameters
n      INTEGER. Specifies the number of elements in vectors x and y.
x      REAL for sswap; DOUBLE PRECISION for dswap; COMPLEX for cswap; DOUBLE COMPLEX for zswap. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.
y      REAL for sswap; DOUBLE PRECISION for dswap; COMPLEX for cswap; DOUBLE COMPLEX for zswap. Array, DIMENSION at least (1 + (n-1)*abs(incy)).
incy   INTEGER. Specifies the increment for the elements of y.

Output Parameters
x      Contains the resultant vector x, that is, the input vector y.
y      Contains the resultant vector y, that is, the input vector x.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine swap interface are the following:
x      Holds the vector with the number of elements n.
y      Holds the vector with the number of elements n.
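
The following added sketch (assuming an MKL or other BLAS link line) scales a vector in place with dscal and then exchanges it with another vector using dswap.

  program scal_swap_demo
    implicit none
    double precision :: x(3) = (/ 1d0, 2d0, 3d0 /)
    double precision :: y(3) = (/ 10d0, 20d0, 30d0 /)

    call dscal(3, 0.5d0, x, 1)   ! x = (0.5, 1.0, 1.5)
    call dswap(3, x, 1, y, 1)    ! x = (10, 20, 30), y = (0.5, 1.0, 1.5)
    print *, x
    print *, y
  end program scal_swap_demo
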
i?amax
Finds the index of the element with maximum absolute value.

Syntax
Fortran 77:
index = isamax(n, x, incx)
index = idamax(n, x, incx)
index = icamax(n, x, incx)
index = izamax(n, x, incx)
Fortran 95:
index = iamax(x)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
This function is declared in mkl_blas.fi for the FORTRAN 77 interface, in blas.f90 for the Fortran 95 interface, and in mkl_blas.h for the C interface.
Given a vector x, the i?amax functions return the position of the vector element x(i) that has the largest absolute value for real flavors, or the largest sum |Re(x(i))| + |Im(x(i))| for complex flavors.
If n is not positive, 0 is returned.
If more than one vector element is found with the same largest absolute value, the index of the first one encountered is returned.

Input Parameters
n      INTEGER. Specifies the number of elements in vector x.
x      REAL for isamax; DOUBLE PRECISION for idamax; COMPLEX for icamax; DOUBLE COMPLEX for izamax. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.

Output Parameters
index  INTEGER. Contains the position of the vector element x that has the largest absolute value.

Fortran 95 Interface Notes
Functions and routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the function iamax interface are the following:
x      Holds the vector with the number of elements n.

i?amin
Finds the index of the element with the smallest absolute value.

Syntax
Fortran 77:
index = isamin(n, x, incx)
index = idamin(n, x, incx)
index = icamin(n, x, incx)
index = izamin(n, x, incx)
Fortran 95:
index = iamin(x)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
This function is declared in mkl_blas.fi for the FORTRAN 77 interface, in blas.f90 for the Fortran 95 interface, and in mkl_blas.h for the C interface.
Given a vector x, the i?amin functions return the position of the vector element x(i) that has the smallest absolute value for real flavors, or the smallest sum |Re(x(i))| + |Im(x(i))| for complex flavors.
If n is not positive, 0 is returned.
If more than one vector element is found with the same smallest absolute value, the index of the first one encountered is returned.

Input Parameters
n      INTEGER. On entry, n specifies the number of elements in vector x.
x      REAL for isamin; DOUBLE PRECISION for idamin; COMPLEX for icamin; DOUBLE COMPLEX for izamin. Array, DIMENSION at least (1 + (n-1)*abs(incx)).
incx   INTEGER. Specifies the increment for the elements of x.

Output Parameters
index  INTEGER. Contains the position of the vector element x that has the smallest absolute value.

Fortran 95 Interface Notes
Functions and routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the function iamin interface are the following:
x      Holds the vector with the number of elements n.
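
A small added sketch (assuming Intel MKL on the link line, since i?amin is an MKL extension of the standard BLAS set) shows both index searches.

  program iamax_demo
    implicit none
    integer, external :: idamax, idamin
    double precision :: x(5) = (/ 2d0, -7d0, 4d0, 7d0, -1d0 /)

    ! First index of the largest |x(i)|; ties resolve to the first occurrence.
    print *, 'idamax =', idamax(5, x, 1)   ! expected 2
    ! Index of the smallest |x(i)|.
    print *, 'idamin =', idamin(5, x, 1)   ! expected 5
  end program iamax_demo
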
?cabs1
Computes absolute value of complex number.

Syntax
Fortran 77:
res = scabs1(z)
res = dcabs1(z)
Fortran 95:
res = cabs1(z)

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
?cabs1 is an auxiliary routine for a few BLAS Level 1 routines. This routine performs an operation defined as
res = |Re(z)| + |Im(z)|,
where z is a scalar, and res is a value containing the absolute value of a complex number z.

Input Parameters
z      COMPLEX scalar for scabs1. DOUBLE COMPLEX scalar for dcabs1.

Output Parameters
res    REAL for scabs1. DOUBLE PRECISION for dcabs1. Contains the absolute value of a complex number z.

BLAS Level 2 Routines

This section describes BLAS Level 2 routines, which perform matrix-vector operations. Table "BLAS Level 2 Routine Groups and Their Data Types" lists the BLAS Level 2 routine groups and the data types associated with them.

BLAS Level 2 Routine Groups and Their Data Types

Routine Groups   Data Types   Description
?gbmv            s, d, c, z   Matrix-vector product using a general band matrix
?gemv            s, d, c, z   Matrix-vector product using a general matrix
?ger             s, d         Rank-1 update of a general matrix
?gerc            c, z         Rank-1 update of a conjugated general matrix
?geru            c, z         Rank-1 update of a general matrix, unconjugated
?hbmv            c, z         Matrix-vector product using a Hermitian band matrix
?hemv            c, z         Matrix-vector product using a Hermitian matrix
?her             c, z         Rank-1 update of a Hermitian matrix
?her2            c, z         Rank-2 update of a Hermitian matrix
?hpmv            c, z         Matrix-vector product using a Hermitian packed matrix
?hpr             c, z         Rank-1 update of a Hermitian packed matrix
?hpr2            c, z         Rank-2 update of a Hermitian packed matrix
?sbmv            s, d         Matrix-vector product using a symmetric band matrix
?spmv            s, d         Matrix-vector product using a symmetric packed matrix
?spr             s, d         Rank-1 update of a symmetric packed matrix
?spr2            s, d         Rank-2 update of a symmetric packed matrix
?symv            s, d         Matrix-vector product using a symmetric matrix
?syr             s, d         Rank-1 update of a symmetric matrix
?syr2            s, d         Rank-2 update of a symmetric matrix
?tbmv            s, d, c, z   Matrix-vector product using a triangular band matrix
?tbsv            s, d, c, z   Solution of a linear system of equations with a triangular band matrix
?tpmv            s, d, c, z   Matrix-vector product using a triangular packed matrix
?tpsv            s, d, c, z   Solution of a linear system of equations with a triangular packed matrix
?trmv            s, d, c, z   Matrix-vector product using a triangular matrix
?trsv            s, d, c, z   Solution of a linear system of equations with a triangular matrix

?gbmv
Computes a matrix-vector product using a general band matrix.

Syntax
Fortran 77:
call sgbmv(trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
call dgbmv(trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
call cgbmv(trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
call zgbmv(trans, m, n, kl, ku, alpha, a, lda, x, incx, beta, y, incy)
Fortran 95:
call gbmv(a, x, y [,kl] [,m] [,alpha] [,beta] [,trans])

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?gbmv routines perform a matrix-vector operation defined as
y := alpha*A*x + beta*y, or
y := alpha*A'*x + beta*y, or
y := alpha*conjg(A')*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-n band matrix, with kl sub-diagonals and ku super-diagonals.

Input Parameters
trans  CHARACTER*1. Specifies the operation:
       If trans = 'N' or 'n', then y := alpha*A*x + beta*y.
       If trans = 'T' or 't', then y := alpha*A'*x + beta*y.
       If trans = 'C' or 'c', then y := alpha*conjg(A')*x + beta*y.
m      INTEGER. Specifies the number of rows of the matrix A. The value of m must be at least zero.
n      INTEGER. Specifies the number of columns of the matrix A. The value of n must be at least zero.
kl     INTEGER. Specifies the number of sub-diagonals of the matrix A. The value of kl must satisfy 0 ≤ kl.
ku     INTEGER. Specifies the number of super-diagonals of the matrix A. The value of ku must satisfy 0 ≤ ku.
alpha  REAL for sgbmv; DOUBLE PRECISION for dgbmv; COMPLEX for cgbmv; DOUBLE COMPLEX for zgbmv. Specifies the scalar alpha.
a      REAL for sgbmv; DOUBLE PRECISION for dgbmv; COMPLEX for cgbmv; DOUBLE COMPLEX for zgbmv. Array, DIMENSION (lda, n). Before entry, the leading (kl + ku + 1) by n part of the array a must contain the matrix of coefficients. This matrix must be supplied column-by-column, with the leading diagonal of the matrix in row (ku + 1) of the array, the first super-diagonal starting at position 2 in row ku, the first sub-diagonal starting at position 1 in row (ku + 2), and so on. Elements in the array a that do not correspond to elements in the band matrix (such as the top left ku by ku triangle) are not referenced.
       The following program segment transfers a band matrix from conventional full matrix storage to band storage:

       do 20, j = 1, n
          k = ku + 1 - j
          do 10, i = max(1, j-ku), min(m, j+kl)
             a(k+i, j) = matrix(i, j)
       10 continue
       20 continue

lda    INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least (kl + ku + 1).
x      REAL for sgbmv; DOUBLE PRECISION for dgbmv; COMPLEX for cgbmv; DOUBLE COMPLEX for zgbmv. Array, DIMENSION at least (1 + (n-1)*abs(incx)) when trans = 'N' or 'n', and at least (1 + (m-1)*abs(incx)) otherwise. Before entry, the array x must contain the vector x.
incx   INTEGER. Specifies the increment for the elements of x. incx must not be zero.
beta   REAL for sgbmv; DOUBLE PRECISION for dgbmv; COMPLEX for cgbmv; DOUBLE COMPLEX for zgbmv. Specifies the scalar beta. When beta is equal to zero, then y need not be set on input.
y      REAL for sgbmv; DOUBLE PRECISION for dgbmv; COMPLEX for cgbmv; DOUBLE COMPLEX for zgbmv. Array, DIMENSION at least (1 + (m-1)*abs(incy)) when trans = 'N' or 'n' and at least (1 + (n-1)*abs(incy)) otherwise. Before entry, the incremented array y must contain the vector y.
incy   INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero.

Output Parameters
y      Updated vector y.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine gbmv interface are the following:
a      Holds the array a of size (kl+ku+1, n). Contains an m-by-n band matrix with kl lower diagonals and ku upper diagonals.
x      Holds the vector with the number of elements rx, where rx = n if trans = 'N', rx = m otherwise.
y      Holds the vector with the number of elements ry, where ry = m if trans = 'N', ry = n otherwise.
trans  Must be 'N', 'C', or 'T'. The default value is 'N'.
kl     If omitted, assumed kl = ku, that is, the number of lower diagonals equals the number of upper diagonals.
ku     Restored as ku = lda-kl-1, where lda is the leading dimension of matrix A.
m      If omitted, assumed m = n, that is, a square matrix.
alpha  The default value is 1.
beta   The default value is 0.
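
The following sketch (added for illustration; assumes the FORTRAN 77 interface and an MKL or other BLAS link line) builds a small tridiagonal matrix, packs it with the conversion loop shown above, and multiplies it by a vector with dgbmv.

  program gbmv_demo
    implicit none
    integer, parameter :: n = 4, kl = 1, ku = 1, lda = kl + ku + 1
    double precision :: full(n, n), ab(lda, n), x(n), y(n)
    integer :: i, j, k

    ! Dense tridiagonal test matrix: 2 on the diagonal, -1 next to it.
    full = 0d0
    do j = 1, n
       full(j, j) = 2d0
       if (j > 1) full(j, j-1) = -1d0
       if (j < n) full(j-1, j) = -1d0
    end do

    ! Pack it into band storage with the manual's conversion loop.
    ab = 0d0
    do j = 1, n
       k = ku + 1 - j
       do i = max(1, j-ku), min(n, j+kl)
          ab(k+i, j) = full(i, j)
       end do
    end do

    x = 1d0
    ! y := 1.0*A*x + 0.0*y
    call dgbmv('N', n, n, kl, ku, 1d0, ab, lda, x, 1, 0d0, y, 1)
    print *, y    ! expected 1 0 0 1
  end program gbmv_demo
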
?gemv
Computes a matrix-vector product using a general matrix.

Syntax
Fortran 77:
call sgemv(trans, m, n, alpha, a, lda, x, incx, beta, y, incy)
call dgemv(trans, m, n, alpha, a, lda, x, incx, beta, y, incy)
call cgemv(trans, m, n, alpha, a, lda, x, incx, beta, y, incy)
call zgemv(trans, m, n, alpha, a, lda, x, incx, beta, y, incy)
call scgemv(trans, m, n, alpha, a, lda, x, incx, beta, y, incy)
call dzgemv(trans, m, n, alpha, a, lda, x, incx, beta, y, incy)
Fortran 95:
call gemv(a, x, y [,alpha][,beta] [,trans])

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?gemv routines perform a matrix-vector operation defined as
y := alpha*A*x + beta*y, or
y := alpha*A'*x + beta*y, or
y := alpha*conjg(A')*x + beta*y,
where:
alpha and beta are scalars,
x and y are vectors,
A is an m-by-n matrix.

Input Parameters
trans  CHARACTER*1. Specifies the operation:
       If trans = 'N' or 'n', then y := alpha*A*x + beta*y;
       if trans = 'T' or 't', then y := alpha*A'*x + beta*y;
       if trans = 'C' or 'c', then y := alpha*conjg(A')*x + beta*y.
m      INTEGER. Specifies the number of rows of the matrix A. The value of m must be at least zero.
n      INTEGER. Specifies the number of columns of the matrix A. The value of n must be at least zero.
alpha  REAL for sgemv; DOUBLE PRECISION for dgemv; COMPLEX for cgemv and scgemv; DOUBLE COMPLEX for zgemv and dzgemv. Specifies the scalar alpha.
a      REAL for sgemv and scgemv; DOUBLE PRECISION for dgemv and dzgemv; COMPLEX for cgemv; DOUBLE COMPLEX for zgemv. Array, DIMENSION (lda, n). Before entry, the leading m-by-n part of the array a must contain the matrix of coefficients.
lda    INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, m).
x      REAL for sgemv; DOUBLE PRECISION for dgemv; COMPLEX for cgemv and scgemv; DOUBLE COMPLEX for zgemv and dzgemv. Array, DIMENSION at least (1 + (n-1)*abs(incx)) when trans = 'N' or 'n' and at least (1 + (m-1)*abs(incx)) otherwise. Before entry, the incremented array x must contain the vector x.
incx   INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero.
beta   REAL for sgemv; DOUBLE PRECISION for dgemv; COMPLEX for cgemv and scgemv; DOUBLE COMPLEX for zgemv and dzgemv. Specifies the scalar beta. When beta is set to zero, then y need not be set on input.
y      REAL for sgemv; DOUBLE PRECISION for dgemv; COMPLEX for cgemv and scgemv; DOUBLE COMPLEX for zgemv and dzgemv. Array, DIMENSION at least (1 + (m-1)*abs(incy)) when trans = 'N' or 'n' and at least (1 + (n-1)*abs(incy)) otherwise. Before entry with non-zero beta, the incremented array y must contain the vector y.
incy   INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero.

Output Parameters
y      Updated vector y.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine gemv interface are the following:
a      Holds the matrix A of size (m,n).
x      Holds the vector with the number of elements rx, where rx = n if trans = 'N', rx = m otherwise.
y      Holds the vector with the number of elements ry, where ry = m if trans = 'N', ry = n otherwise.
trans  Must be 'N', 'C', or 'T'. The default value is 'N'.
alpha  The default value is 1.
beta   The default value is 0.
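
A minimal dgemv sketch follows (added here, not from the original manual; assumes an MKL or other BLAS link line). The matrix is stored column-major, as the Fortran interface expects.

  program gemv_demo
    implicit none
    integer, parameter :: m = 2, n = 3
    double precision :: a(m, n), x(n), y(m)

    ! A = | 1 2 3 |
    !     | 4 5 6 |
    a = reshape( (/ 1d0, 4d0, 2d0, 5d0, 3d0, 6d0 /), (/ m, n /) )
    x = 1d0
    y = 0d0

    ! y := 1.0*A*x + 0.0*y
    call dgemv('N', m, n, 1d0, a, m, x, 1, 0d0, y, 1)
    print *, y    ! expected 6 15
  end program gemv_demo
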
?ger
Performs a rank-1 update of a general matrix.

Syntax
Fortran 77:
call sger(m, n, alpha, x, incx, y, incy, a, lda)
call dger(m, n, alpha, x, incx, y, incy, a, lda)
Fortran 95:
call ger(a, x, y [,alpha])

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?ger routines perform a matrix-vector operation defined as
A := alpha*x*y' + A,
where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n general matrix.

Input Parameters
m      INTEGER. Specifies the number of rows of the matrix A. The value of m must be at least zero.
n      INTEGER. Specifies the number of columns of the matrix A. The value of n must be at least zero.
alpha  REAL for sger; DOUBLE PRECISION for dger. Specifies the scalar alpha.
x      REAL for sger; DOUBLE PRECISION for dger. Array, DIMENSION at least (1 + (m-1)*abs(incx)). Before entry, the incremented array x must contain the m-element vector x.
incx   INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero.
y      REAL for sger; DOUBLE PRECISION for dger. Array, DIMENSION at least (1 + (n-1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.
incy   INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero.
a      REAL for sger; DOUBLE PRECISION for dger. Array, DIMENSION (lda, n). Before entry, the leading m-by-n part of the array a must contain the matrix of coefficients.
lda    INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, m).

Output Parameters
a      Overwritten by the updated matrix.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine ger interface are the following:
a      Holds the matrix A of size (m,n).
x      Holds the vector with the number of elements m.
y      Holds the vector with the number of elements n.
alpha  The default value is 1.

?gerc
Performs a rank-1 update (conjugated) of a general matrix.

Syntax
Fortran 77:
call cgerc(m, n, alpha, x, incx, y, incy, a, lda)
call zgerc(m, n, alpha, x, incx, y, incy, a, lda)
Fortran 95:
call gerc(a, x, y [,alpha])

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?gerc routines perform a matrix-vector operation defined as
A := alpha*x*conjg(y') + A,
where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n matrix.

Input Parameters
m      INTEGER. Specifies the number of rows of the matrix A. The value of m must be at least zero.
n      INTEGER. Specifies the number of columns of the matrix A. The value of n must be at least zero.
alpha  COMPLEX for cgerc; DOUBLE COMPLEX for zgerc. Specifies the scalar alpha.
x      COMPLEX for cgerc; DOUBLE COMPLEX for zgerc. Array, DIMENSION at least (1 + (m-1)*abs(incx)). Before entry, the incremented array x must contain the m-element vector x.
incx   INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero.
y      COMPLEX for cgerc; DOUBLE COMPLEX for zgerc. Array, DIMENSION at least (1 + (n-1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.
incy   INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero.
a      COMPLEX for cgerc; DOUBLE COMPLEX for zgerc. Array, DIMENSION (lda, n). Before entry, the leading m-by-n part of the array a must contain the matrix of coefficients.
lda    INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, m).

Output Parameters
a      Overwritten by the updated matrix.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine gerc interface are the following:
a      Holds the matrix A of size (m,n).
x      Holds the vector with the number of elements m.
y      Holds the vector with the number of elements n.
alpha  The default value is 1.

?geru
Performs a rank-1 update (unconjugated) of a general matrix.

Syntax
Fortran 77:
call cgeru(m, n, alpha, x, incx, y, incy, a, lda)
call zgeru(m, n, alpha, x, incx, y, incy, a, lda)
Fortran 95:
call geru(a, x, y [,alpha])

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?geru routines perform a matrix-vector operation defined as
A := alpha*x*y' + A,
where:
alpha is a scalar,
x is an m-element vector,
y is an n-element vector,
A is an m-by-n matrix.

Input Parameters
m      INTEGER. Specifies the number of rows of the matrix A. The value of m must be at least zero.
n      INTEGER. Specifies the number of columns of the matrix A. The value of n must be at least zero.
alpha  COMPLEX for cgeru; DOUBLE COMPLEX for zgeru. Specifies the scalar alpha.
x      COMPLEX for cgeru; DOUBLE COMPLEX for zgeru. Array, DIMENSION at least (1 + (m-1)*abs(incx)). Before entry, the incremented array x must contain the m-element vector x.
incx   INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero.
y      COMPLEX for cgeru; DOUBLE COMPLEX for zgeru. Array, DIMENSION at least (1 + (n-1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.
incy   INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero.
a      COMPLEX for cgeru; DOUBLE COMPLEX for zgeru. Array, DIMENSION (lda, n). Before entry, the leading m-by-n part of the array a must contain the matrix of coefficients.
lda    INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, m).

Output Parameters
a      Overwritten by the updated matrix.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine geru interface are the following:
a      Holds the matrix A of size (m,n).
x      Holds the vector with the number of elements m.
y      Holds the vector with the number of elements n.
alpha  The default value is 1.
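
For completeness, here is an added sketch of the unconjugated rank-1 update with zgeru (it assumes that kind(0d0) selects the DOUBLE COMPLEX kind on the target compiler and that an MKL or other BLAS library is linked).

  program geru_demo
    implicit none
    integer, parameter :: dp = kind(0d0)
    complex(dp) :: a(2, 2), x(2), y(2)

    a = (0d0, 0d0)
    x = (/ (1d0, 1d0), (2d0, 0d0) /)
    y = (/ (1d0, 0d0), (0d0, 1d0) /)

    ! A := alpha*x*y' + A, with y NOT conjugated (compare ?gerc).
    call zgeru(2, 2, (1d0, 0d0), x, 1, y, 1, a, 2)
    print *, a(1, :)   ! (1,1)  (-1,1)
    print *, a(2, :)   ! (2,0)  ( 0,2)
  end program geru_demo
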
?hbmv
Computes a matrix-vector product using a Hermitian band matrix.

Syntax
Fortran 77:
call chbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
call zhbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy)
Fortran 95:
call hbmv(a, x, y [,uplo][,alpha] [,beta])

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?hbmv routines perform a matrix-vector operation defined as
y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian band matrix, with k super-diagonals.

Input Parameters
uplo   CHARACTER*1. Specifies whether the upper or lower triangular part of the Hermitian band matrix A is used:
       If uplo = 'U' or 'u', then the upper triangular part of the matrix A is used.
       If uplo = 'L' or 'l', then the lower triangular part of the matrix A is used.
n      INTEGER. Specifies the order of the matrix A. The value of n must be at least zero.
k      INTEGER. Specifies the number of super-diagonals of the matrix A. The value of k must satisfy 0 ≤ k.
alpha  COMPLEX for chbmv; DOUBLE COMPLEX for zhbmv. Specifies the scalar alpha.
a      COMPLEX for chbmv; DOUBLE COMPLEX for zhbmv. Array, DIMENSION (lda, n).
       Before entry with uplo = 'U' or 'u', the leading (k + 1) by n part of the array a must contain the upper triangular band part of the Hermitian matrix. The matrix must be supplied column-by-column, with the leading diagonal of the matrix in row (k + 1) of the array, the first super-diagonal starting at position 2 in row k, and so on. The top left k by k triangle of the array a is not referenced.
       The following program segment transfers the upper triangular part of a Hermitian band matrix from conventional full matrix storage to band storage:

       do 20, j = 1, n
          m = k + 1 - j
          do 10, i = max(1, j - k), j
             a(m + i, j) = matrix(i, j)
       10 continue
       20 continue

       Before entry with uplo = 'L' or 'l', the leading (k + 1) by n part of the array a must contain the lower triangular band part of the Hermitian matrix, supplied column-by-column, with the leading diagonal of the matrix in row 1 of the array, the first sub-diagonal starting at position 1 in row 2, and so on. The bottom right k by k triangle of the array a is not referenced.
       The following program segment transfers the lower triangular part of a Hermitian band matrix from conventional full matrix storage to band storage:

       do 20, j = 1, n
          m = 1 - j
          do 10, i = j, min(n, j + k)
             a(m + i, j) = matrix(i, j)
       10 continue
       20 continue

       The imaginary parts of the diagonal elements need not be set and are assumed to be zero.
lda    INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least (k + 1).
x      COMPLEX for chbmv; DOUBLE COMPLEX for zhbmv. Array, DIMENSION at least (1 + (n-1)*abs(incx)). Before entry, the incremented array x must contain the vector x.
incx   INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero.
beta   COMPLEX for chbmv; DOUBLE COMPLEX for zhbmv. Specifies the scalar beta.
y      COMPLEX for chbmv; DOUBLE COMPLEX for zhbmv. Array, DIMENSION at least (1 + (n-1)*abs(incy)). Before entry, the incremented array y must contain the vector y.
incy   INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero.

Output Parameters
y      Overwritten by the updated vector y.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts.
For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine hbmv interface are the following:
a      Holds the array a of size (k+1,n).
x      Holds the vector with the number of elements n.
y      Holds the vector with the number of elements n.
uplo   Must be 'U' or 'L'. The default value is 'U'.
alpha  The default value is 1.
beta   The default value is 0.

?hemv
Computes a matrix-vector product using a Hermitian matrix.

Syntax
Fortran 77:
call chemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
call zhemv(uplo, n, alpha, a, lda, x, incx, beta, y, incy)
Fortran 95:
call hemv(a, x, y [,uplo][,alpha] [,beta])

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?hemv routines perform a matrix-vector operation defined as
y := alpha*A*x + beta*y,
where:
alpha and beta are scalars,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix.

Input Parameters
uplo   CHARACTER*1. Specifies whether the upper or lower triangular part of the array a is used.
       If uplo = 'U' or 'u', then the upper triangular part of the array a is used.
       If uplo = 'L' or 'l', then the lower triangular part of the array a is used.
n      INTEGER. Specifies the order of the matrix A. The value of n must be at least zero.
alpha  COMPLEX for chemv; DOUBLE COMPLEX for zhemv. Specifies the scalar alpha.
a      COMPLEX for chemv; DOUBLE COMPLEX for zhemv. Array, DIMENSION (lda, n).
       Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array a must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of a is not referenced.
       Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array a must contain the lower triangular part of the Hermitian matrix and the strictly upper triangular part of a is not referenced.
       The imaginary parts of the diagonal elements need not be set and are assumed to be zero.
lda    INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n).
x      COMPLEX for chemv; DOUBLE COMPLEX for zhemv. Array, DIMENSION at least (1 + (n-1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x.
incx   INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero.
beta   COMPLEX for chemv; DOUBLE COMPLEX for zhemv. Specifies the scalar beta. When beta is supplied as zero, then y need not be set on input.
y      COMPLEX for chemv; DOUBLE COMPLEX for zhemv. Array, DIMENSION at least (1 + (n-1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y.
incy   INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero.

Output Parameters
y      Overwritten by the updated vector y.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine hemv interface are the following:
a      Holds the matrix A of size (n,n).
x      Holds the vector with the number of elements n.
y      Holds the vector with the number of elements n.
uplo   Must be 'U' or 'L'. The default value is 'U'.
alpha  The default value is 1.
beta   The default value is 0.
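
A short zhemv sketch follows (added for illustration; assumes kind(0d0) selects the DOUBLE COMPLEX kind and an MKL or other BLAS link line). Only the upper triangle of a is referenced because uplo = 'U'.

  program hemv_demo
    implicit none
    integer, parameter :: dp = kind(0d0), n = 2
    complex(dp) :: a(n, n), x(n), y(n)

    ! Hermitian matrix A = | 2     1+i |; the strictly lower part is ignored.
    !                      | 1-i   3   |
    a(1, 1) = (2d0, 0d0)
    a(1, 2) = (1d0, 1d0)
    a(2, 2) = (3d0, 0d0)
    a(2, 1) = (999d0, 999d0)   ! never referenced with uplo = 'U'

    x = (1d0, 0d0)
    y = (0d0, 0d0)

    ! y := 1.0*A*x + 0.0*y
    call zhemv('U', n, (1d0, 0d0), a, n, x, 1, (0d0, 0d0), y, 1)
    print *, y    ! expected (3,1) and (4,-1)
  end program hemv_demo
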
?her
Performs a rank-1 update of a Hermitian matrix.

Syntax
Fortran 77:
call cher(uplo, n, alpha, x, incx, a, lda)
call zher(uplo, n, alpha, x, incx, a, lda)
Fortran 95:
call her(a, x [,uplo] [, alpha])

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?her routines perform a matrix-vector operation defined as
A := alpha*x*conjg(x') + A,
where:
alpha is a real scalar,
x is an n-element vector,
A is an n-by-n Hermitian matrix.

Input Parameters
uplo   CHARACTER*1. Specifies whether the upper or lower triangular part of the array a is used.
       If uplo = 'U' or 'u', then the upper triangular part of the array a is used.
       If uplo = 'L' or 'l', then the lower triangular part of the array a is used.
n      INTEGER. Specifies the order of the matrix A. The value of n must be at least zero.
alpha  REAL for cher; DOUBLE PRECISION for zher. Specifies the scalar alpha.
x      COMPLEX for cher; DOUBLE COMPLEX for zher. Array, DIMENSION at least (1 + (n-1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x.
incx   INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero.
a      COMPLEX for cher; DOUBLE COMPLEX for zher. Array, DIMENSION (lda, n).
       Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array a must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of a is not referenced.
       Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array a must contain the lower triangular part of the Hermitian matrix and the strictly upper triangular part of a is not referenced.
       The imaginary parts of the diagonal elements need not be set and are assumed to be zero.
lda    INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n).

Output Parameters
a      With uplo = 'U' or 'u', the upper triangular part of the array a is overwritten by the upper triangular part of the updated matrix.
       With uplo = 'L' or 'l', the lower triangular part of the array a is overwritten by the lower triangular part of the updated matrix.
       The imaginary parts of the diagonal elements are set to zero.

Fortran 95 Interface Notes
Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions.
Specific details for the routine her interface are the following:
a      Holds the matrix A of size (n,n).
x      Holds the vector with the number of elements n.
uplo   Must be 'U' or 'L'. The default value is 'U'.
alpha  The default value is 1.

?her2
Performs a rank-2 update of a Hermitian matrix.

Syntax
Fortran 77:
call cher2(uplo, n, alpha, x, incx, y, incy, a, lda)
call zher2(uplo, n, alpha, x, incx, y, incy, a, lda)
Fortran 95:
call her2(a, x, y [,uplo][,alpha])

Include Files
• FORTRAN 77: mkl_blas.fi
• Fortran 95: blas.f90
• C: mkl_blas.h

Description
The ?her2 routines perform a matrix-vector operation defined as
A := alpha*x*conjg(y') + conjg(alpha)*y*conjg(x') + A,
where:
alpha is a scalar,
x and y are n-element vectors,
A is an n-by-n Hermitian matrix.

Input Parameters
uplo   CHARACTER*1. Specifies whether the upper or lower triangular part of the array a is used.
       If uplo = 'U' or 'u', then the upper triangular part of the array a is used.
BLAS and Sparse BLAS Routines 2 89 If uplo = 'L' or 'l', then the low triangular of the array a is used. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. alpha COMPLEX for cher2 DOUBLE COMPLEX for zher2 Specifies the scalar alpha. x COMPLEX for cher2 DOUBLE COMPLEX for zher2 Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. y COMPLEX for cher2 DOUBLE COMPLEX for zher2 Array, DIMENSION at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y. incy INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero. a COMPLEX for cher2 DOUBLE COMPLEX for zher2 Array, DIMENSION (lda, n). Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array a must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of a is not referenced. Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array a must contain the lower triangular part of the Hermitian matrix and the strictly upper triangular part of a is not referenced. The imaginary parts of the diagonal elements need not be set and are assumed to be zero. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n). Output Parameters a With uplo = 'U' or 'u', the upper triangular part of the array a is overwritten by the upper triangular part of the updated matrix. With uplo = 'L' or 'l', the lower triangular part of the array a is overwritten by the lower triangular part of the updated matrix. The imaginary parts of the diagonal elements are set to zero. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine her2 interface are the following: a Holds the matrix A of size (n,n). x Holds the vector with the number of elements n. y Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. 2 Intel® Math Kernel Library Reference Manual 90 ?hpmv Computes a matrix-vector product using a Hermitian packed matrix. Syntax Fortran 77: call chpmv(uplo, n, alpha, ap, x, incx, beta, y, incy) call zhpmv(uplo, n, alpha, ap, x, incx, beta, y, incy) Fortran 95: call hpmv(ap, x, y [,uplo][,alpha] [,beta]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?hpmv routines perform a matrix-vector operation defined as y := alpha*A*x + beta*y, where: alpha and beta are scalars, x and y are n-element vectors, A is an n-by-n Hermitian matrix, supplied in packed form. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap. If uplo = 'U' or 'u', then the upper triangular part of the matrix A is supplied in the packed array ap . If uplo = 'L' or 'l', then the low triangular part of the matrix A is supplied in the packed array ap . n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. alpha COMPLEX for chpmv DOUBLE COMPLEX for zhpmv Specifies the scalar alpha. 
ap COMPLEX for chpmv DOUBLE COMPLEX for zhpmv Array, DIMENSION at least ((n*(n + 1))/2). Before entry with uplo = 'U' or 'u', the array ap must contain the upper triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains a(1, 1), ap(2) and ap(3) contain a(1, 2) and a(2, 2) respectively, and so on. Before entry with uplo = 'L' or 'l', the array ap must contain the lower triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains a(1, 1), ap(2) and ap(3) contain a(2, 1) and a(3, 1) respectively, and so on. BLAS and Sparse BLAS Routines 2 91 The imaginary parts of the diagonal elements need not be set and are assumed to be zero. x COMPLEX for chpmv DOUBLE PRECISION COMPLEX for zhpmv Array, DIMENSION at least (1 +(n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. beta COMPLEX for chpmv DOUBLE COMPLEX for zhpmv Specifies the scalar beta. When beta is equal to zero then y need not be set on input. y COMPLEX for chpmv DOUBLE COMPLEX for zhpmv Array, DIMENSION at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y. incy INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero. Output Parameters y Overwritten by the updated vector y. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine hpmv interface are the following: ap Holds the array ap of size (n*(n+1)/2). x Holds the vector with the number of elements n. y Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. beta The default value is 0. ?hpr Performs a rank-1 update of a Hermitian packed matrix. Syntax Fortran 77: call chpr(uplo, n, alpha, x, incx, ap) call zhpr(uplo, n, alpha, x, incx, ap) Fortran 95: call hpr(ap, x [,uplo] [, alpha]) Include Files • FORTRAN 77: mkl_blas.fi 2 Intel® Math Kernel Library Reference Manual 92 • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?hpr routines perform a matrix-vector operation defined as A := alpha*x*conjg(x') + A, where: alpha is a real scalar, x is an n-element vector, A is an n-by-n Hermitian matrix, supplied in packed form. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap. If uplo = 'U' or 'u', the upper triangular part of the matrix A is supplied in the packed array ap . If uplo = 'L' or 'l', the low triangular part of the matrix A is supplied in the packed array ap . n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. alpha REAL for chpr DOUBLE PRECISION for zhpr Specifies the scalar alpha. x COMPLEX for chpr DOUBLE COMPLEX for zhpr Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. incx must not be zero. ap COMPLEX for chpr DOUBLE COMPLEX for zhpr Array, DIMENSION at least ((n*(n + 1))/2). 
Before entry with uplo = 'U' or 'u', the array ap must contain the upper triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains a(1, 1), ap(2) and ap(3) contain a(1, 2) and a(2, 2) respectively, and so on. Before entry with uplo = 'L' or 'l', the array ap must contain the lower triangular part of the Hermitian matrix packed sequentially, column-bycolumn, so that ap(1) contains a(1, 1), ap(2) and ap(3) contain a(2, 1) and a(3, 1) respectively, and so on. The imaginary parts of the diagonal elements need not be set and are assumed to be zero. Output Parameters ap With uplo = 'U' or 'u', overwritten by the upper triangular part of the updated matrix. With uplo = 'L' or 'l', overwritten by the lower triangular part of the updated matrix. The imaginary parts of the diagonal elements are set to zero. BLAS and Sparse BLAS Routines 2 93 Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine hpr interface are the following: ap Holds the array ap of size (n*(n+1)/2). x Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. ?hpr2 Performs a rank-2 update of a Hermitian packed matrix. Syntax Fortran 77: call chpr2(uplo, n, alpha, x, incx, y, incy, ap) call zhpr2(uplo, n, alpha, x, incx, y, incy, ap) Fortran 95: call hpr2(ap, x, y [,uplo][,alpha]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?hpr2 routines perform a matrix-vector operation defined as A := alpha*x*conjg(y') + conjg(alpha)*y*conjg(x') + A, where: alpha is a scalar, x and y are n-element vectors, A is an n-by-n Hermitian matrix, supplied in packed form. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap. If uplo = 'U' or 'u', then the upper triangular part of the matrix A is supplied in the packed array ap . If uplo = 'L' or 'l', then the low triangular part of the matrix A is supplied in the packed array ap . n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. alpha COMPLEX for chpr2 2 Intel® Math Kernel Library Reference Manual 94 DOUBLE COMPLEX for zhpr2 Specifies the scalar alpha. x COMPLEX for chpr2 DOUBLE COMPLEX for zhpr2 Array, dimension at least (1 +(n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. y COMPLEX for chpr2 DOUBLE COMPLEX for zhpr2 Array, DIMENSION at least (1 +(n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y. incy INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero. ap COMPLEX for chpr2 DOUBLE COMPLEX for zhpr2 Array, DIMENSION at least ((n*(n + 1))/2). Before entry with uplo = 'U' or 'u', the array ap must contain the upper triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains a(1,1), ap(2) and ap(3) contain a(1,2) and a(2,2) respectively, and so on. 
Before entry with uplo = 'L' or 'l', the array ap must contain the lower triangular part of the Hermitian matrix packed sequentially, column-by-column, so that ap(1) contains a(1,1), ap(2) and ap(3) contain a(2,1) and a(3,1) respectively, and so on. The imaginary parts of the diagonal elements need not be set and are assumed to be zero. Output Parameters ap With uplo = 'U' or 'u', overwritten by the upper triangular part of the updated matrix. With uplo = 'L' or 'l', overwritten by the lower triangular part of the updated matrix. The imaginary parts of the diagonal elements are set to zero. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine hpr2 interface are the following: ap Holds the array ap of size (n*(n+1)/2). x Holds the vector with the number of elements n. y Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. ?sbmv Computes a matrix-vector product using a symmetric band matrix. Syntax Fortran 77: call ssbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy) call dsbmv(uplo, n, k, alpha, a, lda, x, incx, beta, y, incy) Fortran 95: call sbmv(a, x, y [,uplo][,alpha] [,beta]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?sbmv routines perform a matrix-vector operation defined as y := alpha*A*x + beta*y, where: alpha and beta are scalars, x and y are n-element vectors, A is an n-by-n symmetric band matrix, with k super-diagonals. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the band matrix A is used: if uplo = 'U' or 'u' - upper triangular part; if uplo = 'L' or 'l' - lower triangular part. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. k INTEGER. Specifies the number of super-diagonals of the matrix A. The value of k must satisfy 0 ≤ k. alpha REAL for ssbmv DOUBLE PRECISION for dsbmv Specifies the scalar alpha. a REAL for ssbmv DOUBLE PRECISION for dsbmv Array, DIMENSION (lda, n). Before entry with uplo = 'U' or 'u', the leading (k + 1) by n part of the array a must contain the upper triangular band part of the symmetric matrix, supplied column-by-column, with the leading diagonal of the matrix in row (k + 1) of the array, the first super-diagonal starting at position 2 in row k, and so on. The top left k by k triangle of the array a is not referenced. The following program segment transfers the upper triangular part of a symmetric band matrix from conventional full matrix storage to band storage: do 20, j = 1, n m = k + 1 - j do 10, i = max( 1, j - k ), j a( m + i, j ) = matrix( i, j ) 10 continue 20 continue Before entry with uplo = 'L' or 'l', the leading (k + 1) by n part of the array a must contain the lower triangular band part of the symmetric matrix, supplied column-by-column, with the leading diagonal of the matrix in row 1 of the array, the first sub-diagonal starting at position 1 in row 2, and so on. The bottom right k by k triangle of the array a is not referenced.
The following program segment transfers the lower triangular part of a symmetric band matrix from conventional full matrix storage to band storage: do 20, j = 1, n m = 1 - j do 10, i = j, min( n, j + k ) a( m + i, j ) = matrix( i, j ) 10 continue 20 continue lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least (k + 1). x REAL for ssbmv DOUBLE PRECISION for dsbmv Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. beta REAL for ssbmv DOUBLE PRECISION for dsbmv Specifies the scalar beta. y REAL for ssbmv DOUBLE PRECISION for dsbmv Array, DIMENSION at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the vector y. incy INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero. Output Parameters y Overwritten by the updated vector y. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine sbmv interface are the following: a Holds the array a of size (k+1,n). x Holds the vector with the number of elements n. y Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. beta The default value is 0. BLAS and Sparse BLAS Routines 2 97 ?spmv Computes a matrix-vector product using a symmetric packed matrix. Syntax Fortran 77: call sspmv(uplo, n, alpha, ap, x, incx, beta, y, incy) call dspmv(uplo, n, alpha, ap, x, incx, beta, y, incy) Fortran 95: call spmv(ap, x, y [,uplo][,alpha] [,beta]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?spmv routines perform a matrix-vector operation defined as y := alpha*A*x + beta*y, where: alpha and beta are scalars, x and y are n-element vectors, A is an n-by-n symmetric matrix, supplied in packed form. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap. If uplo = 'U' or 'u', then the upper triangular part of the matrix A is supplied in the packed array ap . If uplo = 'L' or 'l', then the low triangular part of the matrix A is supplied in the packed array ap . n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. alpha REAL for sspmv DOUBLE PRECISION for dspmv Specifies the scalar alpha. ap REAL for sspmv DOUBLE PRECISION for dspmv Array, DIMENSION at least ((n*(n + 1))/2). Before entry with uplo = 'U' or 'u', the array ap must contain the upper triangular part of the symmetric matrix packed sequentially, column-bycolumn, so that ap(1) contains a(1,1), ap(2) and ap(3) contain a(1,2) and a(2, 2) respectively, and so on. Before entry with uplo = 'L' or 'l', the array ap must contain the lower triangular part of the symmetric 2 Intel® Math Kernel Library Reference Manual 98 matrix packed sequentially, column-by-column, so that ap(1) contains a(1,1), ap(2) and ap(3) contain a(2,1) and a(3,1) respectively, and so on. x REAL for sspmv DOUBLE PRECISION for dspmv Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. 
Specifies the increment for the elements of x. The value of incx must not be zero. beta REAL for sspmv DOUBLE PRECISION for dspmv Specifies the scalar beta. When beta is supplied as zero, then y need not be set on input. y REAL for sspmv DOUBLE PRECISION for dspmv Array, DIMENSION at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y. incy INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero. Output Parameters y Overwritten by the updated vector y. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine spmv interface are the following: ap Holds the array ap of size (n*(n+1)/2). x Holds the vector with the number of elements n. y Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. beta The default value is 0. ?spr Performs a rank-1 update of a symmetric packed matrix. Syntax Fortran 77: call sspr(uplo, n, alpha, x, incx, ap) call dspr(uplo, n, alpha, x, incx, ap) Fortran 95: call spr(ap, x [,uplo] [, alpha]) BLAS and Sparse BLAS Routines 2 99 Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?spr routines perform a matrix-vector operation defined as a:= alpha*x*x'+ A, where: alpha is a real scalar, x is an n-element vector, A is an n-by-n symmetric matrix, supplied in packed form. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap. If uplo = 'U' or 'u', then the upper triangular part of the matrix A is supplied in the packed array ap . If uplo = 'L' or 'l', then the low triangular part of the matrix A is supplied in the packed array ap . n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. alpha REAL for sspr DOUBLE PRECISION for dspr Specifies the scalar alpha. x REAL for sspr DOUBLE PRECISION for dspr Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. ap REAL for sspr DOUBLE PRECISION for dspr Array, DIMENSION at least ((n*(n + 1))/2). Before entry with uplo = 'U' or 'u', the array ap must contain the upper triangular part of the symmetric matrix packed sequentially, column-by-column, so that ap(1) contains a(1,1), ap(2) and ap(3) contain a(1,2) and a(2,2) respectively, and so on. Before entry with uplo = 'L' or 'l', the array ap must contain the lower triangular part of the symmetric matrix packed sequentially, column-bycolumn, so that ap(1) contains a(1,1), ap(2) and ap(3) contain a(2,1) and a(3,1) respectively, and so on. Output Parameters ap With uplo = 'U' or 'u', overwritten by the upper triangular part of the updated matrix. 2 Intel® Math Kernel Library Reference Manual 100 With uplo = 'L' or 'l', overwritten by the lower triangular part of the updated matrix. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. 
Specific details for the routine spr interface are the following: ap Holds the array ap of size (n*(n+1)/2). x Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. ?spr2 Performs a rank-2 update of a symmetric packed matrix. Syntax Fortran 77: call sspr2(uplo, n, alpha, x, incx, y, incy, ap) call dspr2(uplo, n, alpha, x, incx, y, incy, ap) Fortran 95: call spr2(ap, x, y [,uplo][,alpha]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?spr2 routines perform a matrix-vector operation defined as A:= alpha*x*y'+ alpha*y*x' + A, where: alpha is a scalar, x and y are n-element vectors, A is an n-by-n symmetric matrix, supplied in packed form. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the matrix A is supplied in the packed array ap. If uplo = 'U' or 'u', then the upper triangular part of the matrix A is supplied in the packed array ap . If uplo = 'L' or 'l', then the low triangular part of the matrix A is supplied in the packed array ap . BLAS and Sparse BLAS Routines 2 101 n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. alpha REAL for sspr2 DOUBLE PRECISION for dspr2 Specifies the scalar alpha. x REAL for sspr2 DOUBLE PRECISION for dspr2 Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. y REAL for sspr2 DOUBLE PRECISION for dspr2 Array, DIMENSION at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y. incy INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero. ap REAL for sspr2 DOUBLE PRECISION for dspr2 Array, DIMENSION at least ((n*(n + 1))/2). Before entry with uplo = 'U' or 'u', the array ap must contain the upper triangular part of the symmetric matrix packed sequentially, column-by-column, so that ap(1) contains a(1,1), ap(2) and ap(3) contain a(1,2) and a(2,2) respectively, and so on. Before entry with uplo = 'L' or 'l', the array ap must contain the lower triangular part of the symmetric matrix packed sequentially, column-bycolumn, so that ap(1) contains a(1,1), ap(2) and ap(3) contain a (2,1) and a(3,1) respectively, and so on. Output Parameters ap With uplo = 'U' or 'u', overwritten by the upper triangular part of the updated matrix. With uplo = 'L' or 'l', overwritten by the lower triangular part of the updated matrix. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine spr2 interface are the following: ap Holds the array ap of size (n*(n+1)/2). x Holds the vector with the number of elements n. y Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. ?symv Computes a matrix-vector product for a symmetric matrix. 
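Before the formal syntax and parameter descriptions that follow, here is a minimal free-form Fortran sketch of a dsymv call; the matrix and vector data are made up purely for illustration:
program symv_example
   implicit none
   integer, parameter :: n = 3, lda = 3
   double precision :: a(lda,n), x(n), y(n)
   integer :: i, j
   ! Store only the upper triangle of the symmetric matrix A(i,j) = i + j.
   do j = 1, n
      do i = 1, j
         a(i,j) = dble(i + j)
      end do
   end do
   x = 1.0d0
   y = 0.0d0
   ! y := alpha*A*x + beta*y with alpha = 1 and beta = 0, using the upper triangle
   call dsymv('U', n, 1.0d0, a, lda, x, 1, 0.0d0, y, 1)
   print *, y
end program symv_example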
2 Intel® Math Kernel Library Reference Manual 102 Syntax Fortran 77: call ssymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy) call dsymv(uplo, n, alpha, a, lda, x, incx, beta, y, incy) Fortran 95: call symv(a, x, y [,uplo][,alpha] [,beta]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?symv routines perform a matrix-vector operation defined as y := alpha*A*x + beta*y, where: alpha and beta are scalars, x and y are n-element vectors, A is an n-by-n symmetric matrix. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the array a is used. If uplo = 'U' or 'u', then the upper triangular part of the array a is used. If uplo = 'L' or 'l', then the low triangular part of the array a is used. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. alpha REAL for ssymv DOUBLE PRECISION for dsymv Specifies the scalar alpha. a REAL for ssymv DOUBLE PRECISION for dsymv Array, DIMENSION (lda, n). Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array a must contain the upper triangular part of the symmetric matrix A and the strictly lower triangular part of a is not referenced. Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array a must contain the lower triangular part of the symmetric matrix A and the strictly upper triangular part of a is not referenced. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n). x REAL for ssymv DOUBLE PRECISION for dsymv Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. BLAS and Sparse BLAS Routines 2 103 incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. beta REAL for ssymv DOUBLE PRECISION for dsymv Specifies the scalar beta. When beta is supplied as zero, then y need not be set on input. y REAL for ssymv DOUBLE PRECISION for dsymv Array, DIMENSION at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y. incy INTEGER. Specifies the increment for the elements of y. The value of incy must not be zero. Output Parameters y Overwritten by the updated vector y. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine symv interface are the following: a Holds the matrix A of size (n,n). x Holds the vector with the number of elements n. y Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. beta The default value is 0. ?syr Performs a rank-1 update of a symmetric matrix. Syntax Fortran 77: call ssyr(uplo, n, alpha, x, incx, a, lda) call dsyr(uplo, n, alpha, x, incx, a, lda) Fortran 95: call syr(a, x [,uplo] [, alpha]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?syr routines perform a matrix-vector operation defined as A := alpha*x*x' + A , 2 Intel® Math Kernel Library Reference Manual 104 where: alpha is a real scalar, x is an n-element vector, A is an n-by-n symmetric matrix. Input Parameters uplo CHARACTER*1. 
Specifies whether the upper or lower triangular part of the array a is used. If uplo = 'U' or 'u', then the upper triangular part of the array a is used. If uplo = 'L' or 'l', then the low triangular part of the array a is used. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. alpha REAL for ssyr DOUBLE PRECISION for dsyr Specifies the scalar alpha. x REAL for ssyr DOUBLE PRECISION for dsyr Array, DIMENSION at least (1 + (n-1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. a REAL for ssyr DOUBLE PRECISION for dsyr Array, DIMENSION (lda, n). Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array a must contain the upper triangular part of the symmetric matrix A and the strictly lower triangular part of a is not referenced. Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array a must contain the lower triangular part of the symmetric matrix A and the strictly upper triangular part of a is not referenced. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n). Output Parameters a With uplo = 'U' or 'u', the upper triangular part of the array a is overwritten by the upper triangular part of the updated matrix. With uplo = 'L' or 'l', the lower triangular part of the array a is overwritten by the lower triangular part of the updated matrix. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine syr interface are the following: a Holds the matrix A of size (n,n). x Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. BLAS and Sparse BLAS Routines 2 105 ?syr2 Performs a rank-2 update of symmetric matrix. Syntax Fortran 77: call ssyr2(uplo, n, alpha, x, incx, y, incy, a, lda) call dsyr2(uplo, n, alpha, x, incx, y, incy, a, lda) Fortran 95: call syr2(a, x, y [,uplo][,alpha]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?syr2 routines perform a matrix-vector operation defined as A := alpha*x*y'+ alpha*y*x' + A, where: alpha is a scalar, x and y are n-element vectors, A is an n-by-n symmetric matrix. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the array a is used. If uplo = 'U' or 'u', then the upper triangular part of the array a is used. If uplo = 'L' or 'l', then the low triangular part of the array a is used. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. alpha REAL for ssyr2 DOUBLE PRECISION for dsyr2 Specifies the scalar alpha. x REAL for ssyr2 DOUBLE PRECISION for dsyr2 Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. y REAL for ssyr2 DOUBLE PRECISION for dsyr2 Array, DIMENSION at least (1 + (n - 1)*abs(incy)). Before entry, the incremented array y must contain the n-element vector y. incy INTEGER. Specifies the increment for the elements of y. 
The value of incy must not be zero. 2 Intel® Math Kernel Library Reference Manual 106 a REAL for ssyr2 DOUBLE PRECISION for dsyr2 Array, DIMENSION (lda, n). Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array a must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of a is not referenced. Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array a must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of a is not referenced. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n). Output Parameters a With uplo = 'U' or 'u', the upper triangular part of the array a is overwritten by the upper triangular part of the updated matrix. With uplo = 'L' or 'l', the lower triangular part of the array a is overwritten by the lower triangular part of the updated matrix. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine syr2 interface are the following: a Holds the matrix A of size (n,n). x Holds the vector x of length n. y Holds the vector y of length n. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. ?tbmv Computes a matrix-vector product using a triangular band matrix. Syntax Fortran 77: call stbmv(uplo, trans, diag, n, k, a, lda, x, incx) call dtbmv(uplo, trans, diag, n, k, a, lda, x, incx) call ctbmv(uplo, trans, diag, n, k, a, lda, x, incx) call ztbmv(uplo, trans, diag, n, k, a, lda, x, incx) Fortran 95: call tbmv(a, x [,uplo] [, trans] [,diag]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h BLAS and Sparse BLAS Routines 2 107 Description The ?tbmv routines perform one of the matrix-vector operations defined as x := A*x, or x := A'*x, or x := conjg(A')*x, where: x is an n-element vector, A is an n-by-n unit, or non-unit, upper or lower triangular band matrix, with (k +1) diagonals. Input Parameters uplo CHARACTER*1. Specifies whether the matrix A is an upper or lower triangular matrix: if uplo = 'U' or 'u', then the matrix is upper triangular; if uplo = 'L' or 'l', then the matrix is low triangular. trans CHARACTER*1. Specifies the operation: if trans = 'N' or 'n', then x := A*x; if trans = 'T' or 't', then x := A'*x; if trans = 'C' or 'c', then x := conjg(A')*x. diag CHARACTER*1. Specifies whether the matrix A is unit triangular: if diag = 'U' or 'u' then the matrix is unit triangular; if diag = 'N' or 'n', then the matrix is not unit triangular. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. k INTEGER. On entry with uplo = 'U' or 'u', k specifies the number of super-diagonals of the matrix A. On entry with uplo = 'L' or 'l', k specifies the number of sub-diagonals of the matrix a. The value of k must satisfy 0 = k. a REAL for stbmv DOUBLE PRECISION for dtbmv COMPLEX for ctbmv DOUBLE COMPLEX for ztbmv Array, DIMENSION (lda, n). 
Before entry with uplo = 'U' or 'u', the leading (k + 1) by n part of the array a must contain the upper triangular band part of the matrix of coefficients, supplied column-by-column, with the leading diagonal of the matrix in row (k + 1) of the array, the first super-diagonal starting at position 2 in row k, and so on. The top left k by k triangle of the array a is not referenced. The following program segment transfers an upper triangular band matrix from conventional full matrix storage to band storage: do 20, j = 1, n m = k + 1 - j do 10, i = max(1, j - k), j a(m + i, j) = matrix(i, j) 10 continue 20 continue Before entry with uplo = 'L' or 'l', the leading (k + 1) by n part of the array a must contain the lower triangular band part of the matrix of coefficients, supplied column-by-column, with the leading diagonal of the matrix in row1 of the array, the first sub-diagonal starting at position 1 in 2 Intel® Math Kernel Library Reference Manual 108 row 2, and so on. The bottom right k by k triangle of the array a is not referenced. The following program segment transfers a lower triangular band matrix from conventional full matrix storage to band storage: do 20, j = 1, n m = 1 - j do 10, i = j, min(n, j + k) a(m + i, j) = matrix (i, j) 10 continue 20 continue Note that when diag = 'U' or 'u', the elements of the array a corresponding to the diagonal elements of the matrix are not referenced, but are assumed to be unity. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least (k + 1). x REAL for stbmv DOUBLE PRECISION for dtbmv COMPLEX for ctbmv DOUBLE COMPLEX for ztbmv Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. Output Parameters x Overwritten with the transformed vector x. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine tbmv interface are the following: a Holds the array a of size (k+1,n). x Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. trans Must be 'N', 'C', or 'T'. The default value is 'N'. diag Must be 'N' or 'U'. The default value is 'N'. ?tbsv Solves a system of linear equations whose coefficients are in a triangular band matrix. Syntax Fortran 77: call stbsv(uplo, trans, diag, n, k, a, lda, x, incx) call dtbsv(uplo, trans, diag, n, k, a, lda, x, incx) call ctbsv(uplo, trans, diag, n, k, a, lda, x, incx) call ztbsv(uplo, trans, diag, n, k, a, lda, x, incx) BLAS and Sparse BLAS Routines 2 109 Fortran 95: call tbsv(a, x [,uplo] [, trans] [,diag]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?tbsv routines solve one of the following systems of equations: A*x = b, or A'*x = b, or conjg(A')*x = b, where: b and x are n-element vectors, A is an n-by-n unit, or non-unit, upper or lower triangular band matrix, with (k + 1) diagonals. The routine does not test for singularity or near-singularity. Such tests must be performed before calling this routine. Input Parameters uplo CHARACTER*1. 
Specifies whether the matrix A is an upper or lower triangular matrix: if uplo = 'U' or 'u' the matrix is upper triangular; if uplo = 'L' or 'l', the matrix is lower triangular. trans CHARACTER*1. Specifies the system of equations: if trans = 'N' or 'n', then A*x = b; if trans = 'T' or 't', then A'*x = b; if trans = 'C' or 'c', then conjg(A')*x = b. diag CHARACTER*1. Specifies whether the matrix A is unit triangular: if diag = 'U' or 'u' then the matrix is unit triangular; if diag = 'N' or 'n', then the matrix is not unit triangular. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. k INTEGER. On entry with uplo = 'U' or 'u', k specifies the number of super-diagonals of the matrix A. On entry with uplo = 'L' or 'l', k specifies the number of sub-diagonals of the matrix A. The value of k must satisfy 0 ≤ k. a REAL for stbsv DOUBLE PRECISION for dtbsv COMPLEX for ctbsv DOUBLE COMPLEX for ztbsv Array, DIMENSION (lda, n). Before entry with uplo = 'U' or 'u', the leading (k + 1) by n part of the array a must contain the upper triangular band part of the matrix of coefficients, supplied column-by-column, with the leading diagonal of the matrix in row (k + 1) of the array, the first super-diagonal starting at position 2 in row k, and so on. The top left k by k triangle of the array a is not referenced. The following program segment transfers an upper triangular band matrix from conventional full matrix storage to band storage: do 20, j = 1, n m = k + 1 - j do 10, i = max(1, j - k), j a(m + i, j) = matrix(i, j) 10 continue 20 continue Before entry with uplo = 'L' or 'l', the leading (k + 1) by n part of the array a must contain the lower triangular band part of the matrix of coefficients, supplied column-by-column, with the leading diagonal of the matrix in row 1 of the array, the first sub-diagonal starting at position 1 in row 2, and so on. The bottom right k by k triangle of the array a is not referenced. The following program segment transfers a lower triangular band matrix from conventional full matrix storage to band storage: do 20, j = 1, n m = 1 - j do 10, i = j, min(n, j + k) a(m + i, j) = matrix(i, j) 10 continue 20 continue When diag = 'U' or 'u', the elements of the array a corresponding to the diagonal elements of the matrix are not referenced, but are assumed to be unity. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least (k + 1). x REAL for stbsv DOUBLE PRECISION for dtbsv COMPLEX for ctbsv DOUBLE COMPLEX for ztbsv Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element right-hand side vector b. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. Output Parameters x Overwritten with the solution vector x. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine tbsv interface are the following: a Holds the array a of size (k+1,n). x Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. trans Must be 'N', 'C', or 'T'. The default value is 'N'. diag Must be 'N' or 'U'. The default value is 'N'.
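As a concrete illustration of the band storage described above and of the ?tbsv calling sequence, the following free-form Fortran sketch solves a bidiagonal upper triangular system with dtbsv; the matrix values and program name are made up for this example:
program tbsv_example
   implicit none
   integer, parameter :: n = 4, k = 1, lda = k + 1
   double precision :: ab(lda,n), x(n)
   integer :: j
   ! Band storage with uplo = 'U': row k+1 holds the diagonal a(j,j),
   ! row k holds the first super-diagonal a(j-1,j), starting at column 2.
   ab = 0.0d0
   do j = 1, n
      ab(k+1, j) = 2.0d0
      if (j > 1) ab(k, j) = -1.0d0
   end do
   x = 1.0d0
   ! Solve A*x = b with trans = 'N' and diag = 'N'; the solution overwrites x.
   call dtbsv('U', 'N', 'N', n, k, ab, lda, x, 1)
   print *, x
end program tbsv_example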
?tpmv Computes a matrix-vector product using a triangular packed matrix. Syntax Fortran 77: call stpmv(uplo, trans, diag, n, ap, x, incx) call dtpmv(uplo, trans, diag, n, ap, x, incx) call ctpmv(uplo, trans, diag, n, ap, x, incx) call ztpmv(uplo, trans, diag, n, ap, x, incx) Fortran 95: call tpmv(ap, x [,uplo] [, trans] [,diag]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?tpmv routines perform one of the matrix-vector operations defined as x := A*x, or x := A'*x, or x := conjg(A')*x, where: x is an n-element vector, A is an n-by-n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. Input Parameters uplo CHARACTER*1. Specifies whether the matrix A is upper or lower triangular: if uplo = 'U' or 'u', then the matrix is upper triangular; if uplo = 'L' or 'l', then the matrix is low triangular. trans CHARACTER*1. Specifies the operation: if trans = 'N' or 'n', then x := A*x; if trans = 'T' or 't', then x := A'*x; if trans = 'C' or 'c', then x := conjg(A')*x. diag CHARACTER*1. Specifies whether the matrix A is unit triangular: if diag = 'U' or 'u' then the matrix is unit triangular; if diag = 'N' or 'n', then the matrix is not unit triangular. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. ap REAL for stpmv 2 Intel® Math Kernel Library Reference Manual 112 DOUBLE PRECISION for dtpmv COMPLEX for ctpmv DOUBLE COMPLEX for ztpmv Array, DIMENSION at least ((n*(n + 1))/2). Before entry with uplo = 'U' or 'u', the array ap must contain the upper triangular matrix packed sequentially, column-by-column, so that ap(1) contains a(1,1), ap(2) and ap(3) contain a(1,2) and a(2,2) respectively, and so on. Before entry with uplo = 'L' or 'l', the array ap must contain the lower triangular matrix packed sequentially, column-by-column, so that ap(1) contains a(1,1), ap(2) and ap(3) contain a(2,1) and a(3,1) respectively, and so on. When diag = 'U' or 'u', the diagonal elements of a are not referenced, but are assumed to be unity. x REAL for stpmv DOUBLE PRECISION for dtpmv COMPLEX for ctpmv DOUBLE COMPLEX for ztpmv Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. Output Parameters x Overwritten with the transformed vector x. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine tpmv interface are the following: ap Holds the array ap of size (n*(n+1)/2). x Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. trans Must be 'N', 'C', or 'T'. The default value is 'N'. diag Must be 'N' or 'U'. The default value is 'N'. ?tpsv Solves a system of linear equations whose coefficients are in a triangular packed matrix. 
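Before the syntax and parameter descriptions that follow, a minimal free-form Fortran sketch shows how dtpsv might be called on a packed upper triangular matrix; the data values are made up for illustration:
program tpsv_example
   implicit none
   integer, parameter :: n = 3
   double precision :: ap(n*(n+1)/2), x(n)
   ! Upper triangular matrix packed column-by-column (uplo = 'U'):
   !     | 1  1  1 |
   ! A = | 0  2  1 |
   !     | 0  0  4 |
   ap = [ 1.0d0, 1.0d0, 2.0d0, 1.0d0, 1.0d0, 4.0d0 ]
   x  = [ 3.0d0, 3.0d0, 4.0d0 ]   ! right-hand side b
   ! Solve A*x = b; the solution (1, 1, 1) overwrites x.
   call dtpsv('U', 'N', 'N', n, ap, x, 1)
   print *, x
end program tpsv_example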
Syntax Fortran 77: call stpsv(uplo, trans, diag, n, ap, x, incx) call dtpsv(uplo, trans, diag, n, ap, x, incx) call ctpsv(uplo, trans, diag, n, ap, x, incx) call ztpsv(uplo, trans, diag, n, ap, x, incx) Fortran 95: call tpsv(ap, x [,uplo] [, trans] [,diag]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?tpsv routines solve one of the following systems of equations: A*x = b, or A'*x = b, or conjg(A')*x = b, where: b and x are n-element vectors, A is an n-by-n unit, or non-unit, upper or lower triangular matrix, supplied in packed form. This routine does not test for singularity or near-singularity. Such tests must be performed before calling this routine. Input Parameters uplo CHARACTER*1. Specifies whether the matrix A is upper or lower triangular: if uplo = 'U' or 'u', then the matrix is upper triangular; if uplo = 'L' or 'l', then the matrix is lower triangular. trans CHARACTER*1. Specifies the system of equations: if trans = 'N' or 'n', then A*x = b; if trans = 'T' or 't', then A'*x = b; if trans = 'C' or 'c', then conjg(A')*x = b. diag CHARACTER*1. Specifies whether the matrix A is unit triangular: if diag = 'U' or 'u' then the matrix is unit triangular; if diag = 'N' or 'n', then the matrix is not unit triangular. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. ap REAL for stpsv DOUBLE PRECISION for dtpsv COMPLEX for ctpsv DOUBLE COMPLEX for ztpsv Array, DIMENSION at least ((n*(n + 1))/2). Before entry with uplo = 'U' or 'u', the array ap must contain the upper triangular matrix packed sequentially, column-by-column, so that ap(1) contains a(1, 1), ap(2) and ap(3) contain a(1, 2) and a(2, 2) respectively, and so on. Before entry with uplo = 'L' or 'l', the array ap must contain the lower triangular matrix packed sequentially, column-by-column, so that ap(1) contains a(1, 1), ap(2) and ap(3) contain a(2, 1) and a(3, 1) respectively, and so on. When diag = 'U' or 'u', the diagonal elements of a are not referenced, but are assumed to be unity. x REAL for stpsv DOUBLE PRECISION for dtpsv COMPLEX for ctpsv DOUBLE COMPLEX for ztpsv Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element right-hand side vector b. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. Output Parameters x Overwritten with the solution vector x. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine tpsv interface are the following: ap Holds the array ap of size (n*(n+1)/2). x Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. trans Must be 'N', 'C', or 'T'. The default value is 'N'. diag Must be 'N' or 'U'. The default value is 'N'. ?trmv Computes a matrix-vector product using a triangular matrix.
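For illustration, a minimal free-form Fortran sketch of a dtrmv call with a made-up upper triangular matrix; the calling sequence is documented below:
program trmv_example
   implicit none
   integer, parameter :: n = 3, lda = 3
   double precision :: a(lda,n), x(n)
   ! Upper triangular matrix; the strictly lower triangle is not referenced.
   a = 0.0d0
   a(1,1) = 1.0d0; a(1,2) = 2.0d0; a(1,3) = 3.0d0
   a(2,2) = 1.0d0; a(2,3) = 2.0d0
   a(3,3) = 1.0d0
   x = 1.0d0
   ! x := A*x (trans = 'N', diag = 'N')
   call dtrmv('U', 'N', 'N', n, a, lda, x, 1)
   print *, x   ! yields (6, 3, 1) for this data
end program trmv_example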
Syntax Fortran 77: call strmv(uplo, trans, diag, n, a, lda, x, incx) call dtrmv(uplo, trans, diag, n, a, lda, x, incx) call ctrmv(uplo, trans, diag, n, a, lda, x, incx) call ztrmv(uplo, trans, diag, n, a, lda, x, incx) Fortran 95: call trmv(a, x [,uplo] [, trans] [,diag]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?trmv routines perform one of the following matrix-vector operations defined as x := A*x, or x := A'*x, or x := conjg(A')*x, where: x is an n-element vector, A is an n-by-n unit, or non-unit, upper or lower triangular matrix. BLAS and Sparse BLAS Routines 2 115 Input Parameters uplo CHARACTER*1. Specifies whether the matrix A is upper or lower triangular: if uplo = 'U' or 'u', then the matrix is upper triangular; if uplo = 'L' or 'l', then the matrix is low triangular. trans CHARACTER*1. Specifies the operation: if trans = 'N' or 'n', then x := A*x; if trans = 'T' or 't', then x := A'*x; if trans = 'C' or 'c', then x := conjg(A')*x. diag CHARACTER*1. Specifies whether the matrix A is unit triangular: if diag = 'U' or 'u' then the matrix is unit triangular; if diag = 'N' or 'n', then the matrix is not unit triangular. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. a REAL for strmv DOUBLE PRECISION for dtrmv COMPLEX for ctrmv DOUBLE COMPLEX for ztrmv Array, DIMENSION (lda,n). Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array a must contain the upper triangular matrix and the strictly lower triangular part of a is not referenced. Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array a must contain the lower triangular matrix and the strictly upper triangular part of a is not referenced. When diag = 'U' or 'u', the diagonal elements of a are not referenced either, but are assumed to be unity. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n). x REAL for strmv DOUBLE PRECISION for dtrmv COMPLEX for ctrmv DOUBLE COMPLEX for ztrmv Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element vector x. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. Output Parameters x Overwritten with the transformed vector x. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine trmv interface are the following: a Holds the matrix A of size (n,n). x Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. trans Must be 'N', 'C', or 'T'. 2 Intel® Math Kernel Library Reference Manual 116 The default value is 'N'. diag Must be 'N' or 'U'. The default value is 'N'. ?trsv Solves a system of linear equations whose coefficients are in a triangular matrix. 
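Before the parameter descriptions that follow, a minimal free-form Fortran sketch (with made-up data) of solving a lower triangular system with dtrsv:
program trsv_example
   implicit none
   integer, parameter :: n = 3, lda = 3
   double precision :: a(lda,n), b(n)
   ! Lower triangular matrix (uplo = 'L'); the strictly upper triangle is not referenced.
   a = 0.0d0
   a(1,1) = 2.0d0
   a(2,1) = 1.0d0; a(2,2) = 2.0d0
   a(3,1) = 1.0d0; a(3,2) = 1.0d0; a(3,3) = 2.0d0
   b = [ 2.0d0, 3.0d0, 4.0d0 ]
   ! Solve A*x = b; the solution (1, 1, 1) overwrites b.
   call dtrsv('L', 'N', 'N', n, a, lda, b, 1)
   print *, b
end program trsv_example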
Syntax Fortran 77: call strsv(uplo, trans, diag, n, a, lda, x, incx) call dtrsv(uplo, trans, diag, n, a, lda, x, incx) call ctrsv(uplo, trans, diag, n, a, lda, x, incx) call ztrsv(uplo, trans, diag, n, a, lda, x, incx) Fortran 95: call trsv(a, x [,uplo] [, trans] [,diag]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?trsv routines solve one of the systems of equations: A*x = b, or A'*x = b, or conjg(A')*x = b, where: b and x are n-element vectors, A is an n-by-n unit, or non-unit, upper or lower triangular matrix. The routine does not test for singularity or near-singularity. Such tests must be performed before calling this routine. Input Parameters uplo CHARACTER*1. Specifies whether the matrix A is upper or lower triangular: if uplo = 'U' or 'u', then the matrix is upper triangular; if uplo = 'L' or 'l', then the matrix is lower triangular. trans CHARACTER*1. Specifies the system of equations: if trans = 'N' or 'n', then A*x = b; if trans = 'T' or 't', then A'*x = b; if trans = 'C' or 'c', then conjg(A')*x = b. diag CHARACTER*1. Specifies whether the matrix A is unit triangular: if diag = 'U' or 'u' then the matrix is unit triangular; if diag = 'N' or 'n', then the matrix is not unit triangular. n INTEGER. Specifies the order of the matrix A. The value of n must be at least zero. a REAL for strsv DOUBLE PRECISION for dtrsv COMPLEX for ctrsv DOUBLE COMPLEX for ztrsv Array, DIMENSION (lda,n). Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array a must contain the upper triangular matrix and the strictly lower triangular part of a is not referenced. Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array a must contain the lower triangular matrix and the strictly upper triangular part of a is not referenced. When diag = 'U' or 'u', the diagonal elements of a are not referenced either, but are assumed to be unity. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. The value of lda must be at least max(1, n). x REAL for strsv DOUBLE PRECISION for dtrsv COMPLEX for ctrsv DOUBLE COMPLEX for ztrsv Array, DIMENSION at least (1 + (n - 1)*abs(incx)). Before entry, the incremented array x must contain the n-element right-hand side vector b. incx INTEGER. Specifies the increment for the elements of x. The value of incx must not be zero. Output Parameters x Overwritten with the solution vector x. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine trsv interface are the following: a Holds the matrix A of size (n,n). x Holds the vector with the number of elements n. uplo Must be 'U' or 'L'. The default value is 'U'. trans Must be 'N', 'C', or 'T'. The default value is 'N'. diag Must be 'N' or 'U'. The default value is 'N'. BLAS Level 3 Routines BLAS Level 3 routines perform matrix-matrix operations. Table "BLAS Level 3 Routine Groups and Their Data Types" lists the BLAS Level 3 routine groups and the data types associated with them.
BLAS Level 3 Routine Groups and Their Data Types Routine Group Data Types Description ?gemm s, d, c, z Matrix-matrix product of general matrices ?hemm c, z Matrix-matrix product of Hermitian matrices ?herk c, z Rank-k update of Hermitian matrices 2 Intel® Math Kernel Library Reference Manual 118 Routine Group Data Types Description ?her2k c, z Rank-2k update of Hermitian matrices ?symm s, d, c, z Matrix-matrix product of symmetric matrices ?syrk s, d, c, z Rank-k update of symmetric matrices ?syr2k s, d, c, z Rank-2k update of symmetric matrices ?trmm s, d, c, z Matrix-matrix product of triangular matrices ?trsm s, d, c, z Linear matrix-matrix solution for triangular matrices Symmetric Multiprocessing Version of Intel® MKL Many applications spend considerable time executing BLAS routines. This time can be scaled by the number of processors available on the system through using the symmetric multiprocessing (SMP) feature built into the Intel MKL Library. The performance enhancements based on the parallel use of the processors are available without any programming effort on your part. To enhance performance, the library uses the following methods: • The BLAS functions are blocked where possible to restructure the code in a way that increases the localization of data reference, enhances cache memory use, and reduces the dependency on the memory bus. • The code is distributed across the processors to maximize parallelism. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 ?gemm Computes a scalar-matrix-matrix product and adds the result to a scalar-matrix product. Syntax Fortran 77: call sgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) call dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) call cgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) call zgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) call scgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) call dzgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc) Fortran 95: call gemm(a, b, c [,transa][,transb] [,alpha][,beta]) BLAS and Sparse BLAS Routines 2 119 Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?gemm routines perform a matrix-matrix operation with general matrices. The operation is defined as C := alpha*op(A)*op(B) + beta*C, where: op(x) is one of op(x) = x, or op(x) = x', or op(x) = conjg(x'), alpha and beta are scalars, A, B and C are matrices: op(A) is an m-by-k matrix, op(B) is a k-by-n matrix, C is an m-by-n matrix. See also ?gemm3m, BLAS-like extension routines, that use matrix multiplication for similar matrix-matrix operations. Input Parameters transa CHARACTER*1. 
Specifies the form of op(A) used in the matrix multiplication: if transa = 'N' or 'n', then op(A) = A; if transa = 'T' or 't', then op(A) = A'; if transa = 'C' or 'c', then op(A) = conjg(A'). transb CHARACTER*1. Specifies the form of op(B) used in the matrix multiplication: if transb = 'N' or 'n', then op(B) = B; if transb = 'T' or 't', then op(B) = B'; if transb = 'C' or 'c', then op(B) = conjg(B'). m INTEGER. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero. n INTEGER. Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero. k INTEGER. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero. alpha REAL for sgemm DOUBLE PRECISION for dgemm COMPLEX for cgemm, scgemm DOUBLE COMPLEX for zgemm, dzgemm Specifies the scalar alpha. a REAL for sgemm, scgemm DOUBLE PRECISION for dgemm, dzgemm COMPLEX for cgemm DOUBLE COMPLEX for zgemm 2 Intel® Math Kernel Library Reference Manual 120 Array, DIMENSION (lda, ka), where ka is k when transa = 'N' or 'n', and is m otherwise. Before entry with transa = 'N' or 'n', the leading mby- k part of the array a must contain the matrix A, otherwise the leading kby- m part of the array a must contain the matrix A. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. When transa = 'N' or 'n', then lda must be at least max(1, m), otherwise lda must be at least max(1, k). b REAL for sgemm DOUBLE PRECISION for dgemm COMPLEX for cgemm, scgemm DOUBLE COMPLEX for zgemm, dzgemm Array, DIMENSION (ldb, kb), where kb is n when transb = 'N' or 'n', and is k otherwise. Before entry with transb = 'N' or 'n', the leading kby- n part of the array b must contain the matrix B, otherwise the leading nby- k part of the array b must contain the matrix B. ldb INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program. When transb = 'N' or 'n', then ldb must be at least max(1, k), otherwise ldb must be at least max(1, n). beta REAL for sgemm DOUBLE PRECISION for dgemm COMPLEX for cgemm, scgemm DOUBLE COMPLEX for zgemm, dzgemm Specifies the scalar beta. When beta is equal to zero, then c need not be set on input. c REAL for sgemm DOUBLE PRECISION for dgemm COMPLEX for cgemm, scgemm DOUBLE COMPLEX for zgemm, dzgemm Array, DIMENSION (ldc, n). Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry. ldc INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program. The value of ldc must be at least max(1, m). Output Parameters c Overwritten by the m-by-n matrix (alpha*op(A)*op(B) + beta*C). Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine gemm interface are the following: a Holds the matrix A of size (ma,ka) where ka = k if transa= 'N', ka = m otherwise, ma = m if transa= 'N', ma = k otherwise. b Holds the matrix B of size (mb,kb) where BLAS and Sparse BLAS Routines 2 121 kb = n if transb = 'N', kb = k otherwise, mb = k if transb = 'N', mb = n otherwise. c Holds the matrix C of size (m,n). transa Must be 'N', 'C', or 'T'. The default value is 'N'. 
transb Must be 'N', 'C', or 'T'. The default value is 'N'. alpha The default value is 1. beta The default value is 0. ?hemm Computes a scalar-matrix-matrix product (either one of the matrices is Hermitian) and adds the result to scalar-matrix product. Syntax Fortran 77: call chemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) call zhemm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) Fortran 95: call hemm(a, b, c [,side][,uplo] [,alpha][,beta]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?hemm routines perform a matrix-matrix operation using Hermitian matrices. The operation is defined as C := alpha*A*B + beta*C or C := alpha*B*A + beta*C, where: alpha and beta are scalars, A is an Hermitian matrix, B and C are m-by-n matrices. Input Parameters side CHARACTER*1. Specifies whether the Hermitian matrix A appears on the left or right in the operation as follows: if side = 'L' or 'l', then C := alpha*A*B + beta*C; if side = 'R' or 'r', then C := alpha*B*A + beta*C. 2 Intel® Math Kernel Library Reference Manual 122 uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the Hermitian matrix A is used: If uplo = 'U' or 'u', then the upper triangular part of the Hermitian matrix A is used. If uplo = 'L' or 'l', then the low triangular part of the Hermitian matrix A is used. m INTEGER. Specifies the number of rows of the matrix C. The value of m must be at least zero. n INTEGER. Specifies the number of columns of the matrix C. The value of n must be at least zero. alpha COMPLEX for chemm DOUBLE COMPLEX for zhemm Specifies the scalar alpha. a COMPLEX for chemm DOUBLE COMPLEX for zhemm Array, DIMENSION (lda,ka), where ka is m when side = 'L' or 'l' and is n otherwise. Before entry with side = 'L' or 'l', the m-by-m part of the array a must contain the Hermitian matrix, such that when uplo = 'U' or 'u', the leading m-by-m upper triangular part of the array a must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of a is not referenced, and when uplo = 'L' or 'l', the leading m-by-m lower triangular part of the array a must contain the lower triangular part of the Hermitian matrix, and the strictly upper triangular part of a is not referenced. Before entry with side = 'R' or 'r', the n-by-n part of the array a must contain the Hermitian matrix, such that when uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array a must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of a is not referenced, and when uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array a must contain the lower triangular part of the Hermitian matrix, and the strictly upper triangular part of a is not referenced. The imaginary parts of the diagonal elements need not be set, they are assumed to be zero. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub) program. When side = 'L' or 'l' then lda must be at least max(1, m), otherwise lda must be at least max(1,n). b COMPLEX for chemm DOUBLE COMPLEX for zhemm Array, DIMENSION (ldb,n). Before entry, the leading m-by-n part of the array b must contain the matrix B. ldb INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program. The value of ldb must be at least max(1, m). beta COMPLEX for chemm DOUBLE COMPLEX for zhemm Specifies the scalar beta. When beta is supplied as zero, then c need not be set on input. 
c COMPLEX for chemm DOUBLE COMPLEX for zhemm Array, DIMENSION (ldc, n). Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is zero, in which case c need not be set on entry. ldc INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program. The value of ldc must be at least max(1, m). Output Parameters c Overwritten by the m-by-n updated matrix. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine hemm interface are the following: a Holds the matrix A of size (k,k) where k = m if side = 'L', k = n otherwise. b Holds the matrix B of size (m,n). c Holds the matrix C of size (m,n). side Must be 'L' or 'R'. The default value is 'L'. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. beta The default value is 0. ?herk Performs a rank-k update of a Hermitian matrix. Syntax Fortran 77: call cherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc) call zherk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc) Fortran 95: call herk(a, c [,uplo] [, trans] [,alpha][,beta]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?herk routines perform a matrix-matrix operation using Hermitian matrices. The operation is defined as C := alpha*A*conjg(A') + beta*C, or C := alpha*conjg(A')*A + beta*C, where: alpha and beta are real scalars, C is an n-by-n Hermitian matrix, A is an n-by-k matrix in the first case and a k-by-n matrix in the second case. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the array c is used. If uplo = 'U' or 'u', then the upper triangular part of the array c is used. If uplo = 'L' or 'l', then the lower triangular part of the array c is used. trans CHARACTER*1. Specifies the operation: if trans = 'N' or 'n', then C:= alpha*A*conjg(A')+beta*C; if trans = 'C' or 'c', then C:= alpha*conjg(A')*A+beta*C. n INTEGER. Specifies the order of the matrix C. The value of n must be at least zero. k INTEGER. With trans = 'N' or 'n', k specifies the number of columns of the matrix A, and with trans = 'C' or 'c', k specifies the number of rows of the matrix A. The value of k must be at least zero. alpha REAL for cherk DOUBLE PRECISION for zherk Specifies the scalar alpha. a COMPLEX for cherk DOUBLE COMPLEX for zherk Array, DIMENSION (lda, ka), where ka is k when trans = 'N' or 'n', and is n otherwise. Before entry with trans = 'N' or 'n', the leading n-by-k part of the array a must contain the matrix A, otherwise the leading k-by-n part of the array a must contain the matrix A. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. When trans = 'N' or 'n', then lda must be at least max(1, n), otherwise lda must be at least max(1, k). beta REAL for cherk DOUBLE PRECISION for zherk Specifies the scalar beta. c COMPLEX for cherk DOUBLE COMPLEX for zherk Array, DIMENSION (ldc,n). Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array c must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of c is not referenced.
Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array c must contain the lower triangular part of the Hermitian matrix and the strictly upper triangular part of c is not referenced. The imaginary parts of the diagonal elements need not be set, they are assumed to be zero. ldc INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program. The value of ldc must be at least max(1, n). BLAS and Sparse BLAS Routines 2 125 Output Parameters c With uplo = 'U' or 'u', the upper triangular part of the array c is overwritten by the upper triangular part of the updated matrix. With uplo = 'L' or 'l', the lower triangular part of the array c is overwritten by the lower triangular part of the updated matrix. The imaginary parts of the diagonal elements are set to zero. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine herk interface are the following: a Holds the matrix A of size (ma,ka) where ka = k if transa= 'N', ka = n otherwise, ma = n if transa= 'N', ma = k otherwise. c Holds the matrix C of size (n,n). uplo Must be 'U' or 'L'. The default value is 'U'. trans Must be 'N' or 'C'. The default value is 'N'. alpha The default value is 1. beta The default value is 0. ?her2k Performs a rank-2k update of a Hermitian matrix. Syntax Fortran 77: call cher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) call zher2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) Fortran 95: call her2k(a, b, c [,uplo][,trans] [,alpha][,beta]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?her2k routines perform a rank-2k matrix-matrix operation using Hermitian matrices. The operation is defined as C := alpha*A*conjg(B') + conjg(alpha)*B*conjg(A') + beta*C, or C := alpha *conjg(B')*A + conjg(alpha) *conjg(A')*B + beta*C, where: 2 Intel® Math Kernel Library Reference Manual 126 alpha is a scalar and beta is a real scalar, C is an n-by-n Hermitian matrix, A and B are n-by-k matrices in the first case and k-by-n matrices in the second case. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the array c is used. If uplo = 'U' or 'u', then the upper triangular of the array c is used. If uplo = 'L' or 'l', then the low triangular of the array c is used. trans CHARACTER*1. Specifies the operation: if trans = 'N' or 'n', then C:=alpha*A*conjg(B') + alpha*B*conjg(A') + beta*C; if trans = 'C' or 'c', then C:=alpha*conjg(A')*B + alpha*conjg(B')*A + beta*C. n INTEGER. Specifies the order of the matrix C. The value of n must be at least zero. k INTEGER. With trans = 'N' or 'n', k specifies the number of columns of the matrix A, and with trans = 'C' or 'c', k specifies the number of rows of the matrix A. The value of k must be at least equal to zero. alpha COMPLEX for cher2k DOUBLE COMPLEX for zher2k Specifies the scalar alpha. a COMPLEX for cher2k DOUBLE COMPLEX for zher2k Array, DIMENSION (lda, ka), where ka is k when trans = 'N' or 'n', and is n otherwise. Before entry with trans = 'N' or 'n', the leading nby- k part of the array a must contain the matrix A, otherwise the leading kby- n part of the array a must contain the matrix A. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. 
When trans = 'N' or 'n', then lda must be at least max(1, n), otherwise lda must be at least max(1, k). beta REAL for cher2k DOUBLE PRECISION for zher2k Specifies the scalar beta. b COMPLEX for cher2k DOUBLE COMPLEX for zher2k Array, DIMENSION (ldb, kb), where kb is k when trans = 'N' or 'n', and is n otherwise. Before entry with trans = 'N' or 'n', the leading nby- k part of the array b must contain the matrix B, otherwise the leading kby- n part of the array b must contain the matrix B. ldb INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program. When trans = 'N' or 'n', then ldb must be at least max(1, n), otherwise ldb must be at least max(1, k). c COMPLEX for cher2k DOUBLE COMPLEX for zher2k Array, DIMENSION (ldc,n). BLAS and Sparse BLAS Routines 2 127 Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array c must contain the upper triangular part of the Hermitian matrix and the strictly lower triangular part of c is not referenced. Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array c must contain the lower triangular part of the Hermitian matrix and the strictly upper triangular part of c is not referenced. The imaginary parts of the diagonal elements need not be set, they are assumed to be zero. ldc INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program. The value of ldc must be at least max(1, n). Output Parameters c With uplo = 'U' or 'u', the upper triangular part of the array c is overwritten by the upper triangular part of the updated matrix. With uplo = 'L' or 'l', the lower triangular part of the array c is overwritten by the lower triangular part of the updated matrix. The imaginary parts of the diagonal elements are set to zero. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine her2k interface are the following: a Holds the matrix A of size (ma,ka) where ka = k if trans = 'N', ka = n otherwise, ma = n if trans = 'N', ma = k otherwise. b Holds the matrix B of size (mb,kb) where kb = k if trans = 'N', kb = n otherwise, mb = n if trans = 'N', mb = k otherwise. c Holds the matrix C of size (n,n). uplo Must be 'U' or 'L'. The default value is 'U'. trans Must be 'N' or 'C'. The default value is 'N'. alpha The default value is 1. beta The default value is 0. ?symm Performs a scalar-matrix-matrix product (one matrix operand is symmetric) and adds the result to a scalarmatrix product. Syntax Fortran 77: call ssymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) call dsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) call csymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) 2 Intel® Math Kernel Library Reference Manual 128 call zsymm(side, uplo, m, n, alpha, a, lda, b, ldb, beta, c, ldc) Fortran 95: call symm(a, b, c [,side][,uplo] [,alpha][,beta]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?symm routines perform a matrix-matrix operation using symmetric matrices. The operation is defined as C := alpha*A*B + beta*C, or C := alpha*B*A + beta*C, where: alpha and beta are scalars, A is a symmetric matrix, B and C are m-by-n matrices. Input Parameters side CHARACTER*1. 
Specifies whether the symmetric matrix A appears on the left or right in the operation: if side = 'L' or 'l', then C := alpha*A*B + beta*C; if side = 'R' or 'r', then C := alpha*B*A + beta*C. uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the symmetric matrix A is used: if uplo = 'U' or 'u', then the upper triangular part is used; if uplo = 'L' or 'l', then the lower triangular part is used. m INTEGER. Specifies the number of rows of the matrix C. The value of m must be at least zero. n INTEGER. Specifies the number of columns of the matrix C. The value of n must be at least zero. alpha REAL for ssymm DOUBLE PRECISION for dsymm COMPLEX for csymm DOUBLE COMPLEX for zsymm Specifies the scalar alpha. a REAL for ssymm DOUBLE PRECISION for dsymm COMPLEX for csymm DOUBLE COMPLEX for zsymm Array, DIMENSION (lda, ka), where ka is m when side = 'L' or 'l' and is n otherwise. Before entry with side = 'L' or 'l', the m-by-m part of the array a must contain the symmetric matrix, such that when uplo = 'U' or 'u', the leading m-by-m upper triangular part of the array a must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part BLAS and Sparse BLAS Routines 2 129 of a is not referenced, and when uplo = 'L' or 'l', the leading m-by-m lower triangular part of the array a must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of a is not referenced. Before entry with side = 'R' or 'r', the n-by-n part of the array a must contain the symmetric matrix, such that when uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array a must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of a is not referenced, and when uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array a must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of a is not referenced. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. When side = 'L' or 'l' then lda must be at least max(1, m), otherwise lda must be at least max(1, n). b REAL for ssymm DOUBLE PRECISION for dsymm COMPLEX for csymm DOUBLE COMPLEX for zsymm Array, DIMENSION (ldb,n). Before entry, the leading m-by-n part of the array b must contain the matrix B. ldb INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program. The value of ldb must be at least max(1, m). beta REAL for ssymm DOUBLE PRECISION for dsymm COMPLEX for csymm DOUBLE COMPLEX for zsymm Specifies the scalar beta. When beta is set to zero, then c need not be set on input. c REAL for ssymm DOUBLE PRECISION for dsymm COMPLEX for csymm DOUBLE COMPLEX for zsymm Array, DIMENSION (ldc,n). Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is zero, in which case c need not be set on entry. ldc INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program. The value of ldc must be at least max(1, m). Output Parameters c Overwritten by the m-by-n updated matrix. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. 
Specific details for the routine symm interface are the following: a Holds the matrix A of size (k,k) where k = m if side = 'L', k = n otherwise. 2 Intel® Math Kernel Library Reference Manual 130 b Holds the matrix B of size (m,n). c Holds the matrix C of size (m,n). side Must be 'L' or 'R'. The default value is 'L'. uplo Must be 'U' or 'L'. The default value is 'U'. alpha The default value is 1. beta The default value is 0. ?syrk Performs a rank-n update of a symmetric matrix. Syntax Fortran 77: call ssyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc) call dsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc) call csyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc) call zsyrk(uplo, trans, n, k, alpha, a, lda, beta, c, ldc) Fortran 95: call syrk(a, c [,uplo] [, trans] [,alpha][,beta]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?syrk routines perform a matrix-matrix operation using symmetric matrices. The operation is defined as C := alpha*A*A' + beta*C, or C := alpha*A'*A + beta*C, where: alpha and beta are scalars, C is an n-by-n symmetric matrix, A is an n-by-k matrix in the first case and a k-by-n matrix in the second case. Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the array c is used. If uplo = 'U' or 'u', then the upper triangular part of the array c is used. If uplo = 'L' or 'l', then the low triangular part of the array c is used. trans CHARACTER*1. Specifies the operation: if trans = 'N' or 'n', then C := alpha*A*A' + beta*C; if trans = 'T' or 't', then C := alpha*A'*A + beta*C; if trans = 'C' or 'c', then C := alpha*A'*A + beta*C. BLAS and Sparse BLAS Routines 2 131 n INTEGER. Specifies the order of the matrix C. The value of n must be at least zero. k INTEGER. On entry with trans = 'N' or 'n', k specifies the number of columns of the matrix a, and on entry with trans = 'T' or 't' or 'C' or 'c', k specifies the number of rows of the matrix a. The value of k must be at least zero. alpha REAL for ssyrk DOUBLE PRECISION for dsyrk COMPLEX for csyrk DOUBLE COMPLEX for zsyrk Specifies the scalar alpha. a REAL for ssyrk DOUBLE PRECISION for dsyrk COMPLEX for csyrk DOUBLE COMPLEX for zsyrk Array, DIMENSION (lda,ka), where ka is k when trans = 'N' or 'n', and is n otherwise. Before entry with trans = 'N' or 'n', the leading nby- k part of the array a must contain the matrix A, otherwise the leading kby- n part of the array a must contain the matrix A. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. When trans = 'N' or 'n', then lda must be at least max(1,n), otherwise lda must be at least max(1, k). beta REAL for ssyrk DOUBLE PRECISION for dsyrk COMPLEX for csyrk DOUBLE COMPLEX for zsyrk Specifies the scalar beta. c REAL for ssyrk DOUBLE PRECISION for dsyrk COMPLEX for csyrk DOUBLE COMPLEX for zsyrk Array, DIMENSION (ldc,n). Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array c must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of c is not referenced. Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array c must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of c is not referenced. ldc INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program. The value of ldc must be at least max(1, n). 
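To see how these arguments fit together, the following minimal sketch (an illustration only; it assumes an Intel MKL or reference BLAS library is linked) updates the upper triangle of C with A*A' for a small double-precision matrix:

program syrk_sketch
  implicit none
  integer, parameter :: n = 3, k = 2
  double precision :: a(n,k), c(n,n)
  a = reshape((/1d0, 2d0, 3d0, 4d0, 5d0, 6d0/), (/n, k/))
  c = 0d0
  ! C := alpha*A*A' + beta*C with alpha = 1, beta = 1; only the
  ! upper triangle of c is referenced and updated (uplo = 'U').
  call dsyrk('U', 'N', n, k, 1d0, a, n, 1d0, c, n)
  print *, c(1,1), c(1,2), c(1,3)   ! prints 17, 22, 27
end program syrk_sketch
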
Output Parameters c With uplo = 'U' or 'u', the upper triangular part of the array c is overwritten by the upper triangular part of the updated matrix. With uplo = 'L' or 'l', the lower triangular part of the array c is overwritten by the lower triangular part of the updated matrix. 2 Intel® Math Kernel Library Reference Manual 132 Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine syrk interface are the following: a Holds the matrix A of size (ma,ka) where ka = k if transa= 'N', ka = n otherwise, ma = n if transa= 'N', ma = k otherwise. c Holds the matrix C of size (n,n). uplo Must be 'U' or 'L'. The default value is 'U'. trans Must be 'N', 'C', or 'T'. The default value is 'N'. alpha The default value is 1. beta The default value is 0. ?syr2k Performs a rank-2k update of a symmetric matrix. Syntax Fortran 77: call ssyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) call dsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) call csyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) call zsyr2k(uplo, trans, n, k, alpha, a, lda, b, ldb, beta, c, ldc) Fortran 95: call syr2k(a, b, c [,uplo][,trans] [,alpha][,beta]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?syr2k routines perform a rank-2k matrix-matrix operation using symmetric matrices. The operation is defined as C := alpha*A*B' + alpha*B*A' + beta*C, or C := alpha*A'*B + alpha*B'*A + beta*C, where: alpha and beta are scalars, C is an n-by-n symmetric matrix, A and B are n-by-k matrices in the first case, and k-by-n matrices in the second case. BLAS and Sparse BLAS Routines 2 133 Input Parameters uplo CHARACTER*1. Specifies whether the upper or lower triangular part of the array c is used. If uplo = 'U' or 'u', then the upper triangular part of the array c is used. If uplo = 'L' or 'l', then the low triangular part of the array c is used. trans CHARACTER*1. Specifies the operation: if trans = 'N' or 'n', then C := alpha*A*B'+alpha*B*A'+beta*C; if trans = 'T' or 't', then C := alpha*A'*B +alpha*B'*A +beta*C; if trans = 'C' or 'c', then C := alpha*A'*B +alpha*B'*A +beta*C. n INTEGER. Specifies the order of the matrix C. The value of n must be at least zero. k INTEGER. On entry with trans = 'N' or 'n', k specifies the number of columns of the matrices A and B, and on entry with trans = 'T' or 't' or 'C' or 'c', k specifies the number of rows of the matrices A and B. The value of k must be at least zero. alpha REAL for ssyr2k DOUBLE PRECISION for dsyr2k COMPLEX for csyr2k DOUBLE COMPLEX for zsyr2k Specifies the scalar alpha. a REAL for ssyr2k DOUBLE PRECISION for dsyr2k COMPLEX for csyr2k DOUBLE COMPLEX for zsyr2k Array, DIMENSION (lda,ka), where ka is k when trans = 'N' or 'n', and is n otherwise. Before entry with trans = 'N' or 'n', the leading nby- k part of the array a must contain the matrix A, otherwise the leading kby- n part of the array a must contain the matrix A. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. When trans = 'N' or 'n', then lda must be at least max(1, n), otherwise lda must be at least max(1, k). 
b REAL for ssyr2k DOUBLE PRECISION for dsyr2k COMPLEX for csyr2k DOUBLE COMPLEX for zsyr2k Array, DIMENSION (ldb, kb) where kb is k when trans = 'N' or 'n' and is n otherwise. Before entry with trans = 'N' or 'n', the leading n-by-k part of the array b must contain the matrix B, otherwise the leading k-by-n part of the array b must contain the matrix B. ldb INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program. When trans = 'N' or 'n', then ldb must be at least max(1, n), otherwise ldb must be at least max(1, k). beta REAL for ssyr2k DOUBLE PRECISION for dsyr2k COMPLEX for csyr2k DOUBLE COMPLEX for zsyr2k Specifies the scalar beta. c REAL for ssyr2k DOUBLE PRECISION for dsyr2k COMPLEX for csyr2k DOUBLE COMPLEX for zsyr2k Array, DIMENSION (ldc,n). Before entry with uplo = 'U' or 'u', the leading n-by-n upper triangular part of the array c must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of c is not referenced. Before entry with uplo = 'L' or 'l', the leading n-by-n lower triangular part of the array c must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of c is not referenced. ldc INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program. The value of ldc must be at least max(1, n). Output Parameters c With uplo = 'U' or 'u', the upper triangular part of the array c is overwritten by the upper triangular part of the updated matrix. With uplo = 'L' or 'l', the lower triangular part of the array c is overwritten by the lower triangular part of the updated matrix. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine syr2k interface are the following: a Holds the matrix A of size (ma,ka) where ka = k if trans = 'N', ka = n otherwise, ma = n if trans = 'N', ma = k otherwise. b Holds the matrix B of size (mb,kb) where kb = k if trans = 'N', kb = n otherwise, mb = n if trans = 'N', mb = k otherwise. c Holds the matrix C of size (n,n). uplo Must be 'U' or 'L'. The default value is 'U'. trans Must be 'N', 'C', or 'T'. The default value is 'N'. alpha The default value is 1. beta The default value is 0. ?trmm Computes a scalar-matrix-matrix product (one matrix operand is triangular). Syntax Fortran 77: call strmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) call dtrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) call ctrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) call ztrmm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) Fortran 95: call trmm(a, b [,side] [, uplo] [,transa][,diag] [,alpha]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?trmm routines perform a matrix-matrix operation using triangular matrices. The operation is defined as B := alpha*op(A)*B or B := alpha*B*op(A) where: alpha is a scalar, B is an m-by-n matrix, A is a unit, or non-unit, upper or lower triangular matrix op(A) is one of op(A) = A, or op(A) = A', or op(A) = conjg(A'). Input Parameters side CHARACTER*1.
Specifies whether op(A) appears on the left or right of B in the operation: if side = 'L' or 'l', then B := alpha*op(A)*B; if side = 'R' or 'r', then B := alpha*B*op(A). uplo CHARACTER*1. Specifies whether the matrix A is upper or lower triangular: if uplo = 'U' or 'u', then the matrix is upper triangular; if uplo = 'L' or 'l', then the matrix is low triangular. transa CHARACTER*1. Specifies the form of op(A) used in the matrix multiplication: if transa = 'N' or 'n', then op(A) = A; if transa = 'T' or 't', then op(A) = A'; if transa = 'C' or 'c', then op(A) = conjg(A'). diag CHARACTER*1. Specifies whether the matrix A is unit triangular: if diag = 'U' or 'u' then the matrix is unit triangular; if diag = 'N' or 'n', then the matrix is not unit triangular. m INTEGER. Specifies the number of rows of B. The value of m must be at least zero. n INTEGER. Specifies the number of columns of B. The value of n must be at least zero. alpha REAL for strmm DOUBLE PRECISION for dtrmm COMPLEX for ctrmm DOUBLE COMPLEX for ztrmm Specifies the scalar alpha. 2 Intel® Math Kernel Library Reference Manual 136 When alpha is zero, then a is not referenced and b need not be set before entry. a REAL for strmm DOUBLE PRECISION for dtrmm COMPLEX for ctrmm DOUBLE COMPLEX for ztrmm Array, DIMENSION (lda,k), where k is m when side = 'L' or 'l' and is n when side = 'R' or 'r'. Before entry with uplo = 'U' or 'u', the leading k by k upper triangular part of the array a must contain the upper triangular matrix and the strictly lower triangular part of a is not referenced. Before entry with uplo = 'L' or 'l', the leading k by k lower triangular part of the array a must contain the lower triangular matrix and the strictly upper triangular part of a is not referenced. When diag = 'U' or 'u', the diagonal elements of a are not referenced either, but are assumed to be unity. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. When side = 'L' or 'l', then lda must be at least max(1, m), when side = 'R' or 'r', then lda must be at least max(1, n). b REAL for strmm DOUBLE PRECISION for dtrmm COMPLEX for ctrmm DOUBLE COMPLEX for ztrmm Array, DIMENSION (ldb,n). Before entry, the leading m-by-n part of the array b must contain the matrix B. ldb INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program. The value of ldb must be at least max(1, m). Output Parameters b Overwritten by the transformed matrix. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine trmm interface are the following: a Holds the matrix A of size (k,k) where k = m if side = 'L', k = n otherwise. b Holds the matrix B of size (m,n). side Must be 'L' or 'R'. The default value is 'L'. uplo Must be 'U' or 'L'. The default value is 'U'. transa Must be 'N', 'C', or 'T'. The default value is 'N'. diag Must be 'N' or 'U'. The default value is 'N'. alpha The default value is 1. BLAS and Sparse BLAS Routines 2 137 ?trsm Solves a matrix equation (one matrix operand is triangular). 
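For example, the following minimal sketch (an illustration only; it assumes an Intel MKL or reference BLAS library is linked) uses dtrsm to solve A*X = B in place for a 2-by-2 lower triangular A and one right-hand side:

program trsm_sketch
  implicit none
  integer, parameter :: m = 2, n = 1
  double precision :: a(m,m), b(m,n)
  ! Lower triangular A = | 2 0 |, right-hand side B = | 2 |
  !                      | 1 3 |                      | 7 |
  a = reshape((/2d0, 1d0, 0d0, 3d0/), (/m, m/))
  b = reshape((/2d0, 7d0/), (/m, n/))
  ! Solve A*X = 1*B; b is overwritten by the solution X = (1, 2).
  call dtrsm('L', 'L', 'N', 'N', m, n, 1d0, a, m, b, m)
  print *, b
end program trsm_sketch
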
Syntax Fortran 77: call strsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) call dtrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) call ctrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) call ztrsm(side, uplo, transa, diag, m, n, alpha, a, lda, b, ldb) Fortran 95: call trsm(a, b [,side] [, uplo] [,transa][,diag] [,alpha]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?trsm routines solve one of the following matrix equations: op(A)*X = alpha*B, or X*op(A) = alpha*B, where: alpha is a scalar, X and B are m-by-n matrices, A is a unit, or non-unit, upper or lower triangular matrix op(A) is one of op(A) = A, or op(A) = A', or op(A) = conjg(A'). The matrix B is overwritten by the solution matrix X. Input Parameters side CHARACTER*1. Specifies whether op(A) appears on the left or right of X in the equation: if side = 'L' or 'l', then op(A)*X = alpha*B; if side = 'R' or 'r', then X*op(A) = alpha*B. uplo CHARACTER*1. Specifies whether the matrix A is upper or lower triangular: if uplo = 'U' or 'u', then the matrix is upper triangular; if uplo = 'L' or 'l', then the matrix is low triangular. transa CHARACTER*1. Specifies the form of op(A) used in the matrix multiplication: if transa = 'N' or 'n', then op(A) = A; if transa = 'T' or 't', then op(A) = A'; if transa = 'C' or 'c', then op(A) = conjg(A'). diag CHARACTER*1. Specifies whether the matrix A is unit triangular: 2 Intel® Math Kernel Library Reference Manual 138 if diag = 'U' or 'u' then the matrix is unit triangular; if diag = 'N' or 'n', then the matrix is not unit triangular. m INTEGER. Specifies the number of rows of B. The value of m must be at least zero. n INTEGER. Specifies the number of columns of B. The value of n must be at least zero. alpha REAL for strsm DOUBLE PRECISION for dtrsm COMPLEX for ctrsm DOUBLE COMPLEX for ztrsm Specifies the scalar alpha. When alpha is zero, then a is not referenced and b need not be set before entry. a REAL for strsm DOUBLE PRECISION for dtrsm COMPLEX for ctrsm DOUBLE COMPLEX for ztrsm Array, DIMENSION (lda, k), where k is m when side = 'L' or 'l' and is n when side = 'R' or 'r'. Before entry with uplo = 'U' or 'u', the leading k by k upper triangular part of the array a must contain the upper triangular matrix and the strictly lower triangular part of a is not referenced. Before entry with uplo = 'L' or 'l', the leading k by k lower triangular part of the array a must contain the lower triangular matrix and the strictly upper triangular part of a is not referenced. When diag = 'U' or 'u', the diagonal elements of a are not referenced either, but are assumed to be unity. lda INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program. When side = 'L' or 'l', then lda must be at least max(1, m), when side = 'R' or 'r', then lda must be at least max(1, n). b REAL for strsm DOUBLE PRECISION for dtrsm COMPLEX for ctrsm DOUBLE COMPLEX for ztrsm Array, DIMENSION (ldb,n). Before entry, the leading m-by-n part of the array b must contain the right-hand side matrix B. ldb INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program. The value of ldb must be at least max(1, +m). Output Parameters b Overwritten by the solution matrix X. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. 
For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine trsm interface are the following: a Holds the matrix A of size (k,k) where k = m if side = 'L', k = n otherwise. BLAS and Sparse BLAS Routines 2 139 b Holds the matrix B of size (m,n). side Must be 'L' or 'R'. The default value is 'L'. uplo Must be 'U' or 'L'. The default value is 'U'. transa Must be 'N', 'C', or 'T'. The default value is 'N'. diag Must be 'N' or 'U'. The default value is 'N'. alpha The default value is 1. Sparse BLAS Level 1 Routines This section describes Sparse BLAS Level 1, an extension of BLAS Level 1 included in the Intel® Math Kernel Library beginning with the Intel MKL release 2.1. Sparse BLAS Level 1 is a group of routines and functions that perform a number of common vector operations on sparse vectors stored in compressed form. Sparse vectors are those in which the majority of elements are zeros. Sparse BLAS routines and functions are specially implemented to take advantage of vector sparsity. This allows you to achieve large savings in computer time and memory. If nz is the number of non-zero vector elements, the computer time taken by Sparse BLAS operations will be O(nz). Vector Arguments Compressed sparse vectors. Let a be a vector stored in an array, and assume that the only non-zero elements of a are the following: a(k1), a (k2), a (k3) . . . a(knz), where nz is the total number of non-zero elements in a. In Sparse BLAS, this vector can be represented in compressed form by two FORTRAN arrays, x (values) and indx (indices). Each array has nz elements: x(1)=a(k1), x(2)=a(k2), . . . x(nz)= a(knz), indx(1)=k1, indx(2)=k2, . . . indx(nz)= knz. Thus, a sparse vector is fully determined by the triple (nz, x, indx). If you pass a negative or zero value of nz to Sparse BLAS, the subroutines do not modify any arrays or variables. Full-storage vectors. Sparse BLAS routines can also use a vector argument fully stored in a single FORTRAN array (a full-storage vector). If y is a full-storage vector, its elements must be stored contiguously: the first element in y(1), the second in y(2), and so on. This corresponds to an increment incy = 1 in BLAS Level 1. No increment value for full-storage vectors is passed as an argument to Sparse BLAS routines or functions. Naming Conventions Similar to BLAS, the names of Sparse BLAS subprograms have prefixes that determine the data type involved: s and d for single- and double-precision real; c and z for single- and double-precision complex respectively. If a Sparse BLAS routine is an extension of a "dense" one, the subprogram name is formed by appending the suffix i (standing for indexed) to the name of the corresponding "dense" subprogram. For example, the Sparse BLAS routine saxpyi corresponds to the BLAS routine saxpy, and the Sparse BLAS function cdotci corresponds to the BLAS function cdotc. 2 Intel® Math Kernel Library Reference Manual 140 Routines and Data Types Routines and data types supported in the Intel MKL implementation of Sparse BLAS are listed in Table “Sparse BLAS Routines and Their Data Types”. 
Sparse BLAS Routines and Their Data Types Routine/ Function Data Types Description ?axpyi s, d, c, z Scalar-vector product plus vector (routines) ?doti s, d Dot product (functions) ?dotci c, z Complex dot product conjugated (functions) ?dotui c, z Complex dot product unconjugated (functions) ?gthr s, d, c, z Gathering a full-storage sparse vector into compressed form nz, x, indx (routines) ?gthrz s, d, c, z Gathering a full-storage sparse vector into compressed form and assigning zeros to gathered elements in the fullstorage vector (routines) ?roti s, d Givens rotation (routines) ?sctr s, d, c, z Scattering a vector from compressed form to full-storage form (routines) BLAS Level 1 Routines That Can Work With Sparse Vectors The following BLAS Level 1 routines will give correct results when you pass to them a compressed-form array x(with the increment incx=1): ?asum sum of absolute values of vector elements ?copy copying a vector ?nrm2 Euclidean norm of a vector ?scal scaling a vector i?amax index of the element with the largest absolute value for real flavors, or the largest sum |Re(x(i))|+|Im(x(i))| for complex flavors. i?amin index of the element with the smallest absolute value for real flavors, or the smallest sum |Re(x(i))|+|Im(x(i))| for complex flavors. The result i returned by i?amax and i?amin should be interpreted as index in the compressed-form array, so that the largest (smallest) value is x(i); the corresponding index in full-storage array is indx(i). You can also call ?rotg to compute the parameters of Givens rotation and then pass these parameters to the Sparse BLAS routines ?roti. ?axpyi Adds a scalar multiple of compressed sparse vector to a full-storage vector. Syntax Fortran 77: call saxpyi(nz, a, x, indx, y) BLAS and Sparse BLAS Routines 2 141 call daxpyi(nz, a, x, indx, y) call caxpyi(nz, a, x, indx, y) call zaxpyi(nz, a, x, indx, y) Fortran 95: call axpyi(x, indx, y [, a]) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?axpyi routines perform a vector-vector operation defined as y := a*x + y where: a is a scalar, x is a sparse vector stored in compressed form, y is a vector in full storage form. The ?axpyi routines reference or modify only the elements of y whose indices are listed in the array indx. The values in indx must be distinct. Input Parameters nz INTEGER. The number of elements in x and indx. a REAL for saxpyi DOUBLE PRECISION for daxpyi COMPLEX for caxpyi DOUBLE COMPLEX for zaxpyi Specifies the scalar a. x REAL for saxpyi DOUBLE PRECISION for daxpyi COMPLEX for caxpyi DOUBLE COMPLEX for zaxpyi Array, DIMENSION at least nz. indx INTEGER. Specifies the indices for the elements of x. Array, DIMENSION at least nz. y REAL for saxpyi DOUBLE PRECISION for daxpyi COMPLEX for caxpyi DOUBLE COMPLEX for zaxpyi Array, DIMENSION at least max(indx(i)). Output Parameters y Contains the updated vector y. 2 Intel® Math Kernel Library Reference Manual 142 Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine axpyi interface are the following: x Holds the vector with the number of elements nz. indx Holds the vector with the number of elements nz. y Holds the vector with the number of elements nz. a The default value is 1. 
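As a usage illustration (a sketch only; it assumes Intel MKL is linked), the following program adds twice a compressed sparse vector to a full-storage vector with daxpyi:

program axpyi_sketch
  implicit none
  integer, parameter :: nz = 3
  integer :: indx(nz)
  double precision :: x(nz), y(6)
  x    = (/1d0, 2d0, 3d0/)   ! non-zero values of the sparse vector
  indx = (/2, 4, 6/)         ! their positions in the full-storage vector
  y    = 10d0
  ! y(indx(i)) := 2*x(i) + y(indx(i)); other elements of y are untouched.
  call daxpyi(nz, 2d0, x, indx, y)
  print *, y                 ! prints 10, 12, 10, 14, 10, 16
end program axpyi_sketch
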
?doti Computes the dot product of a compressed sparse real vector by a full-storage real vector. Syntax Fortran 77: res = sdoti(nz, x, indx, y ) res = ddoti(nz, x, indx, y ) Fortran 95: res = doti(x, indx, y) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?doti routines return the dot product of x and y defined as res = x(1)*y(indx(1)) + x(2)*y(indx(2)) +...+ x(nz)*y(indx(nz)) where the triple (nz, x, indx) defines a sparse real vector stored in compressed form, and y is a real vector in full storage form. The functions reference only the elements of y whose indices are listed in the array indx. The values in indx must be distinct. Input Parameters nz INTEGER. The number of elements in x and indx . x REAL for sdoti DOUBLE PRECISION for ddoti Array, DIMENSION at least nz. indx INTEGER. Specifies the indices for the elements of x. Array, DIMENSION at least nz. y REAL for sdoti DOUBLE PRECISION for ddoti Array, DIMENSION at least max(indx(i)). BLAS and Sparse BLAS Routines 2 143 Output Parameters res REAL for sdoti DOUBLE PRECISION for ddoti Contains the dot product of x and y, if nz is positive. Otherwise, res contains 0. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine doti interface are the following: x Holds the vector with the number of elements nz. indx Holds the vector with the number of elements nz. y Holds the vector with the number of elements nz. ?dotci Computes the conjugated dot product of a compressed sparse complex vector with a full-storage complex vector. Syntax Fortran 77: res = cdotci(nz, x, indx, y ) res = zdotci(nz, x, indx, y ) Fortran 95: res = dotci(x, indx, y) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?dotci routines return the dot product of x and y defined as conjg(x(1))*y(indx(1)) + ... + conjg(x(nz))*y(indx(nz)) where the triple (nz, x, indx) defines a sparse complex vector stored in compressed form, and y is a real vector in full storage form. The functions reference only the elements of y whose indices are listed in the array indx. The values in indx must be distinct. Input Parameters nz INTEGER. The number of elements in x and indx . x COMPLEX for cdotci DOUBLE COMPLEX for zdotci Array, DIMENSION at least nz. indx INTEGER. Specifies the indices for the elements of x. 2 Intel® Math Kernel Library Reference Manual 144 Array, DIMENSION at least nz. y COMPLEX for cdotci DOUBLE COMPLEX for zdotci Array, DIMENSION at least max(indx(i)). Output Parameters res COMPLEX for cdotci DOUBLE COMPLEX for zdotci Contains the conjugated dot product of x and y, if nz is positive. Otherwise, res contains 0. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine dotci interface are the following: x Holds the vector with the number of elements (nz). indx Holds the vector with the number of elements (nz). y Holds the vector with the number of elements (nz). ?dotui Computes the dot product of a compressed sparse complex vector by a full-storage complex vector. 
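For example, a minimal sketch (an illustration only; Intel MKL is assumed to be linked) that forms the unconjugated dot product with zdotui:

program dotui_sketch
  implicit none
  integer, parameter :: nz = 2
  integer :: indx(nz)
  double complex :: x(nz), y(4), res
  double complex :: zdotui
  external zdotui
  x    = (/(1d0, 1d0), (2d0, 0d0)/)   ! compressed values
  indx = (/1, 3/)                     ! their full-storage positions
  y    = (1d0, 0d0)
  ! res = x(1)*y(indx(1)) + x(2)*y(indx(2)), with no conjugation of x.
  res = zdotui(nz, x, indx, y)
  print *, res                        ! prints (3.0, 1.0)
end program dotui_sketch
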
Syntax Fortran 77: res = cdotui(nz, x, indx, y ) res = zdotui(nz, x, indx, y ) Fortran 95: res = dotui(x, indx, y) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?dotui routines return the dot product of x and y defined as res = x(1)*y(indx(1)) + x(2)*y(indx(2)) +...+ x(nz)*y(indx(nz)) where the triple (nz, x, indx) defines a sparse complex vector stored in compressed form, and y is a real vector in full storage form. The functions reference only the elements of y whose indices are listed in the array indx. The values in indx must be distinct. Input Parameters nz INTEGER. The number of elements in x and indx . BLAS and Sparse BLAS Routines 2 145 x COMPLEX for cdotui DOUBLE COMPLEX for zdotui Array, DIMENSION at least nz. indx INTEGER. Specifies the indices for the elements of x. Array, DIMENSION at least nz. y COMPLEX for cdotui DOUBLE COMPLEX for zdotui Array, DIMENSION at least max(indx(i)). Output Parameters res COMPLEX for cdotui DOUBLE COMPLEX for zdotui Contains the dot product of x and y, if nz is positive. Otherwise, res contains 0. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine dotui interface are the following: x Holds the vector with the number of elements nz. indx Holds the vector with the number of elements nz. y Holds the vector with the number of elements nz. ?gthr Gathers a full-storage sparse vector's elements into compressed form. Syntax Fortran 77: call sgthr(nz, y, x, indx ) call dgthr(nz, y, x, indx ) call cgthr(nz, y, x, indx ) call zgthr(nz, y, x, indx ) Fortran 95: res = gthr(x, indx, y) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?gthr routines gather the specified elements of a full-storage sparse vector y into compressed form(nz, x, indx). The routines reference only the elements of y whose indices are listed in the array indx: 2 Intel® Math Kernel Library Reference Manual 146 x(i) = y(indx(i)), for i=1,2,... +nz. Input Parameters nz INTEGER. The number of elements of y to be gathered. indx INTEGER. Specifies indices of elements to be gathered. Array, DIMENSION at least nz. y REAL for sgthr DOUBLE PRECISION for dgthr COMPLEX for cgthr DOUBLE COMPLEX for zgthr Array, DIMENSION at least max(indx(i)). Output Parameters x REAL for sgthr DOUBLE PRECISION for dgthr COMPLEX for cgthr DOUBLE COMPLEX for zgthr Array, DIMENSION at least nz. Contains the vector converted to the compressed form. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine gthr interface are the following: x Holds the vector with the number of elements nz. indx Holds the vector with the number of elements nz. y Holds the vector with the number of elements nz. ?gthrz Gathers a sparse vector's elements into compressed form, replacing them by zeros. 
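For example, a minimal sketch (an illustration only; Intel MKL is assumed to be linked) of the gather-and-zero behavior with dgthrz:

program gthrz_sketch
  implicit none
  integer, parameter :: nz = 2
  integer :: indx(nz)
  double precision :: x(nz), y(5)
  y    = (/1d0, 2d0, 3d0, 4d0, 5d0/)
  indx = (/2, 5/)
  ! x receives y(2) and y(5); those positions of y are then set to zero.
  call dgthrz(nz, y, x, indx)
  print *, x    ! prints 2, 5
  print *, y    ! prints 1, 0, 3, 4, 0
end program gthrz_sketch
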
Syntax Fortran 77: call sgthrz(nz, y, x, indx ) call dgthrz(nz, y, x, indx ) call cgthrz(nz, y, x, indx ) call zgthrz(nz, y, x, indx ) Fortran 95: res = gthrz(x, indx, y) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h BLAS and Sparse BLAS Routines 2 147 Description The ?gthrz routines gather the elements with indices specified by the array indx from a full-storage vector y into compressed form (nz, x, indx) and overwrite the gathered elements of y by zeros. Other elements of y are not referenced or modified (see also ?gthr). Input Parameters nz INTEGER. The number of elements of y to be gathered. indx INTEGER. Specifies indices of elements to be gathered. Array, DIMENSION at least nz. y REAL for sgthrz DOUBLE PRECISION for dgthrz COMPLEX for cgthrz DOUBLE COMPLEX for zgthrz Array, DIMENSION at least max(indx(i)). Output Parameters x REAL for sgthrz DOUBLE PRECISION for d gthrz COMPLEX for cgthrz DOUBLE COMPLEX for zgthrz Array, DIMENSION at least nz. Contains the vector converted to the compressed form. y The updated vector y. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine gthrz interface are the following: x Holds the vector with the number of elements nz. indx Holds the vector with the number of elements nz. y Holds the vector with the number of elements nz. ?roti Applies Givens rotation to sparse vectors one of which is in compressed form. Syntax Fortran 77: call sroti(nz, x, indx, y, c, s) call droti(nz, x, indx, y, c, s) Fortran 95: call roti(x, indx, y, c, s) Include Files • FORTRAN 77: mkl_blas.fi 2 Intel® Math Kernel Library Reference Manual 148 • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?roti routines apply the Givens rotation to elements of two real vectors, x (in compressed form nz, x, indx) and y (in full storage form): x(i) = c*x(i) + s*y(indx(i)) y(indx(i)) = c*y(indx(i))- s*x(i) The routines reference only the elements of y whose indices are listed in the array indx. The values in indx must be distinct. Input Parameters nz INTEGER. The number of elements in x and indx. x REAL for sroti DOUBLE PRECISION for droti Array, DIMENSION at least nz. indx INTEGER. Specifies the indices for the elements of x. Array, DIMENSION at least nz. y REAL for sroti DOUBLE PRECISION for droti Array, DIMENSION at least max(indx(i)). c A scalar: REAL for sroti DOUBLE PRECISION for droti. s A scalar: REAL for sroti DOUBLE PRECISION for droti. Output Parameters x and y The updated arrays. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine roti interface are the following: x Holds the vector with the number of elements nz. indx Holds the vector with the number of elements nz. y Holds the vector with the number of elements nz. ?sctr Converts compressed sparse vectors into full storage form. 
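For example, a minimal sketch (an illustration only; Intel MKL is assumed to be linked) that scatters a compressed vector with dsctr:

program sctr_sketch
  implicit none
  integer, parameter :: nz = 3
  integer :: indx(nz)
  double precision :: x(nz), y(6)
  x    = (/7d0, 8d0, 9d0/)
  indx = (/1, 3, 5/)
  y    = 0d0
  ! y(indx(i)) = x(i); the remaining elements of y are left unchanged.
  call dsctr(nz, x, indx, y)
  print *, y    ! prints 7, 0, 8, 0, 9, 0
end program sctr_sketch
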
Syntax Fortran 77: call ssctr(nz, x, indx, y) call dsctr(nz, x, indx, y) call csctr(nz, x, indx, y) call zsctr(nz, x, indx, y) Fortran 95: call sctr(x, indx, y) Include Files • FORTRAN 77: mkl_blas.fi • Fortran 95: blas.f90 • C: mkl_blas.h Description The ?sctr routines scatter the elements of the compressed sparse vector (nz, x, indx) to a full-storage vector y. The routines modify only the elements of y whose indices are listed in the array indx: y(indx(i)) = x(i), for i=1,2,...,nz. Input Parameters nz INTEGER. The number of elements of x to be scattered. indx INTEGER. Specifies indices of elements to be scattered. Array, DIMENSION at least nz. x REAL for ssctr DOUBLE PRECISION for dsctr COMPLEX for csctr DOUBLE COMPLEX for zsctr Array, DIMENSION at least nz. Contains the vector to be converted to full-storage form. Output Parameters y REAL for ssctr DOUBLE PRECISION for dsctr COMPLEX for csctr DOUBLE COMPLEX for zsctr Array, DIMENSION at least max(indx(i)). Contains the vector y with updated elements. Fortran 95 Interface Notes Routines in Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see Fortran 95 Interface Conventions. Specific details for the routine sctr interface are the following: x Holds the vector with the number of elements nz. indx Holds the vector with the number of elements nz. y Holds the vector with the number of elements nz. Sparse BLAS Level 2 and Level 3 Routines This section describes Sparse BLAS Level 2 and Level 3 routines included in the Intel® Math Kernel Library (Intel® MKL). Sparse BLAS Level 2 is a group of routines and functions that perform operations between a sparse matrix and dense vectors. Sparse BLAS Level 3 is a group of routines and functions that perform operations between a sparse matrix and dense matrices. The terms and concepts required to understand the use of the Intel MKL Sparse BLAS Level 2 and Level 3 routines are discussed in the Linear Solvers Basics appendix. The Sparse BLAS routines can be useful for implementing iterative methods for solving large sparse systems of equations or eigenvalue problems. For example, these routines can be considered as building blocks for Iterative Sparse Solvers based on Reverse Communication Interface (RCI ISS) described in Chapter 8 of the manual. Intel MKL provides Sparse BLAS Level 2 and Level 3 routines with a typical (or conventional) interface similar to the interface used in the NIST* Sparse BLAS library [Rem05]. Some software packages and libraries (the PARDISO* Solver used in Intel MKL, Sparskit 2 [Saad94], the Compaq* Extended Math Library (CXML) [CXML01]) use a different (early) variation of the compressed sparse row (CSR) format and support only Level 2 operations with simplified interfaces. Intel MKL provides an additional set of Sparse BLAS Level 2 routines with similar simplified interfaces. Each of these routines operates only on a matrix of a fixed type. The routines described in this section support both one-based indexing and zero-based indexing of the input data (see details in the section One-based and Zero-based Indexing). Naming Conventions in Sparse BLAS Level 2 and Level 3 Each Sparse BLAS Level 2 and Level 3 routine has a six- or eight-character base name preceded by the prefix mkl_ or mkl_cspblas_.
The routines with the typical (conventional) interface have six-character base names in accordance with the template:
mkl_<data type><storage format><operation>( )
The routines with simplified interfaces have eight-character base names in accordance with the templates:
mkl_<data type><storage format><matrix type><operation>( ) for routines with one-based indexing; and
mkl_cspblas_<data type><storage format><matrix type><operation>( ) for routines with zero-based indexing.
The <data type> field indicates the data type:
s real, single precision
c complex, single precision
d real, double precision
z complex, double precision
The <storage format> field indicates the sparse matrix storage format (see section Sparse Matrix Storage Formats):
coo coordinate format
csr compressed sparse row format and its variations
csc compressed sparse column format and its variations
dia diagonal format
sky skyline storage format
bsr block sparse row format and its variations
The <operation> field indicates the type of operation:
mv matrix-vector product (Level 2)
mm matrix-matrix product (Level 3)
sv solving a single triangular system (Level 2)
sm solving triangular systems with multiple right-hand sides (Level 3)
The <matrix type> field indicates the matrix type:
ge sparse representation of a general matrix
sy sparse representation of the upper or lower triangle of a symmetric matrix
tr sparse representation of a triangular matrix
Sparse Matrix Storage Formats
The current version of the Intel MKL Sparse BLAS Level 2 and Level 3 routines supports the following point-entry [Duff86] storage formats for sparse matrices:
• compressed sparse row format (CSR) and its variations;
• compressed sparse column format (CSC);
• coordinate format;
• diagonal format;
• skyline storage format;
and one block-entry storage format:
• block sparse row format (BSR) and its variations.
For more information see "Sparse Matrix Storage Formats" in Appendix A.
Intel MKL provides auxiliary routines - matrix converters - that convert a sparse matrix from one storage format to another.
Routines and Supported Operations
This section describes operations supported by the Intel MKL Sparse BLAS Level 2 and Level 3 routines. The following notation is used here:
A is a sparse matrix; B and C are dense matrices; D is a diagonal scaling matrix; x and y are dense vectors; alpha and beta are scalars;
op(A) is one of the possible operations: op(A) = A; op(A) = A' - transpose of A; op(A) = conj(A') - conjugated transpose of A.
inv(op(A)) denotes the inverse of op(A).
The Intel MKL Sparse BLAS Level 2 and Level 3 routines support the following operations:
• computing the vector product between a sparse matrix and a dense vector: y := alpha*op(A)*x + beta*y
• solving a single triangular system: y := alpha*inv(op(A))*x
• computing a product between a sparse matrix and a dense matrix: C := alpha*op(A)*B + beta*C
• solving a sparse triangular system with multiple right-hand sides: C := alpha*inv(op(A))*B
Intel MKL provides an additional set of Sparse BLAS Level 2 routines with simplified interfaces. Each of these routines operates on a matrix of a fixed type. The following operations are supported:
• computing the vector product between a sparse matrix and a dense vector (for general and symmetric matrices): y := op(A)*x
• solving a single triangular system (for triangular matrices): y := inv(op(A))*x
The matrix type is indicated by the <matrix type> field in the routine name (see section Naming Conventions in Sparse BLAS Level 2 and Level 3).
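To make the Level 2 operation concrete, the following minimal C sketch spells out y := alpha*A*x + beta*y for a matrix stored in the CSR format (3-array variation) with one-based indexing, assuming op(A) = A. It is an illustration only, not an Intel MKL routine; the function name csr_mv_ref and the sample data are hypothetical.

#include <stdio.h>

/* Reference loop for y := alpha*A*x + beta*y, A in the CSR format
   (3-array variation) with one-based indexing. Illustration only;
   not part of Intel MKL. */
static void csr_mv_ref(int m, double alpha, const double *a, const int *ia,
                       const int *ja, const double *x, double beta, double *y)
{
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        /* Row i+1 occupies one-based positions ia[i]..ia[i+1]-1,
           that is a[ia[i]-1 .. ia[i+1]-2] in this zero-based C array. */
        for (int k = ia[i] - 1; k < ia[i + 1] - 1; k++)
            sum += a[k] * x[ja[k] - 1];
        y[i] = alpha * sum + beta * y[i];
    }
}

int main(void)
{
    /* A = [ 1 0 2 ; 0 3 0 ; 4 0 5 ], x = (1,1,1), alpha = 1, beta = 0 */
    double a[]  = {1.0, 2.0, 3.0, 4.0, 5.0};
    int    ja[] = {1, 3, 2, 1, 3};   /* one-based column indices */
    int    ia[] = {1, 3, 4, 6};      /* one-based row pointers   */
    double x[]  = {1.0, 1.0, 1.0};
    double y[]  = {0.0, 0.0, 0.0};

    csr_mv_ref(3, 1.0, a, ia, ja, x, 0.0, y);
    printf("y = %g %g %g\n", y[0], y[1], y[2]);   /* prints y = 3 3 9 */
    return 0;
}

In application code this loop corresponds to a single call to a typical-interface Level 2 routine such as mkl_?csrmv operating on the same alpha, beta, and CSR arrays.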
NOTE The routines with simplified interfaces support only four sparse matrix storage formats, specifically:
• CSR format in the 3-array variation accepted in the direct sparse solvers and in the CXML;
• diagonal format accepted in the CXML;
• coordinate format;
• BSR format in the 3-array variation.
Note that routines with both typical (conventional) and simplified interfaces use the same computational kernels that work with certain internal data structures.
The Intel MKL Sparse BLAS Level 2 and Level 3 routines do not support in-place operations. A complete list of all routines is given in the table "Sparse BLAS Level 2 and Level 3 Routines".
Interface Consideration
One-Based and Zero-Based Indexing
The Intel MKL Sparse BLAS Level 2 and Level 3 routines support one-based and zero-based indexing of data arrays. Routines with typical interfaces support zero-based indexing for the following sparse data storage formats: CSR, CSC, BSR, and COO. Routines with simplified interfaces support zero-based indexing for the following sparse data storage formats: CSR, BSR, and COO. See the complete list of Sparse BLAS Level 2 and Level 3 Routines.
One-based indexing uses the convention of starting array indices at 1; zero-based indexing uses the convention of starting array indices at 0. For example, the indices of the 5-element array x can be presented in the case of one-based indexing as follows:
Element index: 1   2   3   4   5
Element value: 1.0 5.0 7.0 8.0 9.0
and in the case of zero-based indexing as follows:
Element index: 0   1   2   3   4
Element value: 1.0 5.0 7.0 8.0 9.0
The detailed descriptions of the one-based and zero-based variants of the sparse data storage formats are given in "Sparse Matrix Storage Formats" in Appendix A.
Most parameters of the routines are identical for both one-based and zero-based indexing, but some of them differ. The following table lists all these differences.
Parameter: val
One-based indexing: Array containing non-zero elements of the matrix A; its length is pntre(m) - pntrb(1).
Zero-based indexing: Array containing non-zero elements of the matrix A; its length is pntre(m-1) - pntrb(0).
Parameter: pntrb
One-based indexing: Array of length m. This array contains row indices, such that pntrb(i) - pntrb(1)+1 is the first index of row i in the arrays val and indx.
Zero-based indexing: Array of length m. This array contains row indices, such that pntrb(i) - pntrb(0) is the first index of row i in the arrays val and indx.
Parameter: pntre
One-based indexing: Array of length m. This array contains row indices, such that pntre(i) - pntrb(1) is the last index of row i in the arrays val and indx.
Zero-based indexing: Array of length m. This array contains row indices, such that pntre(i) - pntrb(0)-1 is the last index of row i in the arrays val and indx.
Parameter: ia
One-based indexing: Array of length m + 1, containing indices of elements in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m + 1) is equal to the number of non-zeros plus one.
Zero-based indexing: Array of length m + 1, containing indices of elements in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m) is equal to the number of non-zeros.
Parameter: ldb
One-based indexing: Specifies the leading dimension of b as declared in the calling (sub)program.
Zero-based indexing: Specifies the second dimension of b as declared in the calling (sub)program.
Parameter: ldc
One-based indexing: Specifies the leading dimension of c as declared in the calling (sub)program.
Zero-based indexing: Specifies the second dimension of c as declared in the calling (sub)program.
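As an illustration of the table above, the following C declarations (hypothetical variable names, sample values only) store the same 3-by-3 matrix A = [1 0 2; 0 3 0; 4 0 5] in the CSR format (3-array variation) with one-based and with zero-based indexing. Only the integer index arrays change; the values array is identical.

/* One-based indexing: indices follow the Fortran convention (start at 1). */
double a_one[]  = {1.0, 2.0, 3.0, 4.0, 5.0};
int    ja_one[] = {1, 3, 2, 1, 3};   /* column indices, starting at 1         */
int    ia_one[] = {1, 3, 4, 6};      /* ia(m+1) = number of non-zeros + 1 = 6 */

/* Zero-based indexing: indices follow the C convention (start at 0). */
double a_zero[]  = {1.0, 2.0, 3.0, 4.0, 5.0};
int    ja_zero[] = {0, 2, 1, 0, 2};  /* column indices, starting at 0         */
int    ia_zero[] = {0, 2, 3, 5};     /* ia(m) = number of non-zeros = 5       */

For the 3-array variation, the pointer pair used by the typical-interface routines can be taken as pntrb(i) = ia(i) and pntre(i) = ia(i+1), so both val length formulas in the table evaluate to 5, the number of stored non-zeros, for this example.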
Difference Between Fortran and C Interfaces Intel MKL provides both Fortran and C interfaces to all Sparse BLAS Level 2 and Level 3 routines. Parameter descriptions are common for both interfaces with the exception of data types that refer to the FORTRAN 77 standard types. Correspondence between data types specific to the Fortran and C interfaces are given below: Fortran C REAL*4 float REAL*8 double INTEGER*4 int INTEGER*8 long long int CHARACTER char For routines with C interfaces all parameters (including scalars) must be passed by references. Another difference is how two-dimensional arrays are represented. In Fortran the column-major order is used, and in C - row-major order. This changes the meaning of the parameters ldb and ldc (see the table above). Differences Between Intel MKL and NIST* Interfaces The Intel MKL Sparse BLAS Level 3 routines have the following conventional interfaces: 2 Intel® Math Kernel Library Reference Manual 154 mkl_xyyymm(transa, m, n, k, alpha, matdescra, arg(A), b, ldb, beta, c, ldc), for matrixmatrix product; mkl_xyyysm(transa, m, n, alpha, matdescra, arg(A), b, ldb, c, ldc), for triangular solvers with multiple right-hand sides. Here x denotes data type, and yyy - sparse matrix data structure (storage format). The analogous NIST* Sparse BLAS (NSB) library routines have the following interfaces: xyyymm(transa, m, n, k, alpha, descra, arg(A), b, ldb, beta, c, ldc, work, lwork), for matrix-matrix product; xyyysm(transa, m, n, unitd, dv, alpha, descra, arg(A), b, ldb, beta, c, ldc, work, lwork), for triangular solvers with multiple right-hand sides. Some similar arguments are used in both libraries. The argument transa indicates what operation is performed and is slightly different in the NSB library (see Table “Parameter transa”). The arguments m and k are the number of rows and column in the matrix A, respectively, n is the number of columns in the matrix C. The arguments alpha and beta are scalar alpha and beta respectively (beta is not used in the Intel MKL triangular solvers.) The arguments b and c are rectangular arrays with the leading dimension ldb and ldc, respectively. arg(A) denotes the list of arguments that describe the sparse representation of A. Parameter transa MKL interface NSB interface Operation data type CHARACTER*1 INTEGER value N or n 0 op(A) = A T or t 1 op(A) = A' C or c 2 op(A) = A' Parameter matdescra The parameter matdescra describes the relevant characteristic of the matrix A. This manual describes matdescra as an array of six elements in line with the NIST* implementation. However, only the first four elements of the array are used in the current versions of the Intel MKL Sparse BLAS routines. Elements matdescra(5) and matdescra(6) are reserved for future use. Note that whether matdescra is described in your application as an array of length 6 or 4 is of no importance because the array is declared as a pointer in the Intel MKL routines. To learn more about declaration of the matdescra array, see Sparse BLAS examples located in the following subdirectory of the Intel MKL installation directory: examples/spblas/. The table below lists elements of the parameter matdescra, their values and meanings. The parameter matdescra corresponds to the argument descra from NSB library. 
Possible Values of the Parameter matdescra (descra) MKL interface NSB interface Matrix characteristics one-based indexing zero-based indexing data type CHARACTER Char INTEGER 1st element matdescra(1) matdescra(0) descra(1) matrix structure value G G 0 general S S 1 symmetric (A = A') BLAS and Sparse BLAS Routines 2 155 MKL interface NSB interface Matrix characteristics H H 2 Hermitian (A=conjg(A')) T T 3 triangular A A 4 skew(anti)-symmetric (A=-A') D D 5 diagonal 2nd element matdescra(2) matdescra(1) descra(2) upper/lower triangular indicator value L L 1 lower U U 2 upper 3rd element matdescra(3) matdescra(2) descra(3) main diagonal type value N N 0 non-unit U U 1 unit 4th element matdescra(4) matdescra(3) type of indexing value F one-based indexing C zero-based indexing In some cases possible element values of the parameter matdescra depend on the values of other elements. The Table "Possible Combinations of Element Values of the Parameter matdescra" lists all possible combinations of element values for both multiplication routines and triangular solvers. Possible Combinations of Element Values of the Parameter matdescra Routines matdescra(1) matdescra(2) matdescra(3) matdescra(4) Multiplication Routines G ignored ignored F (default) or C S or H L (default) N (default) F (default) or C S or H L (default) U F (default) or C S or H U N (default) F (default) or C S or H U U F (default) or C A L (default) ignored F (default) or C A U ignored F (default) or C Multiplication Routines and Triangular Solvers T L U F (default) or C T L N F (default) or C T U U F (default) or C T U N F (default) or C D ignored N (default) F (default) or C D ignored U F (default) or C For a matrix in the skyline format with the main diagonal declared to be a unit, diagonal elements must be stored in the sparse representation even if they are zero. In all other formats, diagonal elements can be stored (if needed) in the sparse representation if they are not zero. 2 Intel® Math Kernel Library Reference Manual 156 Operations with Partial Matrices One of the distinctive feature of the Intel MKL Sparse BLAS routines is a possibility to perform operations only on partial matrices composed of certain parts (triangles and the main diagonal) of the input sparse matrix. It can be done by setting properly first three elements of the parameter matdescra. An arbitrary sparse matrix A can be decomposed as A = L + D + U where L is the strict lower triangle of A, U is the strict upper triangle of A, D is the main diagonal. Table "Output Matrices for Multiplication Routines" shows correspondence between the output matrices and values of the parameter matdescra for the sparse matrix A for multiplication routines. 
Output Matrices for Multiplication Routines
matdescra(1)  matdescra(2)  matdescra(3)  Output Matrix (vector form; matrix form)
G       ignored  ignored  alpha*op(A)*x + beta*y;  alpha*op(A)*B + beta*C
S or H  L        N        alpha*op(L+D+L')*x + beta*y;  alpha*op(L+D+L')*B + beta*C
S or H  L        U        alpha*op(L+I+L')*x + beta*y;  alpha*op(L+I+L')*B + beta*C
S or H  U        N        alpha*op(U'+D+U)*x + beta*y;  alpha*op(U'+D+U)*B + beta*C
S or H  U        U        alpha*op(U'+I+U)*x + beta*y;  alpha*op(U'+I+U)*B + beta*C
T       L        U        alpha*op(L+I)*x + beta*y;  alpha*op(L+I)*B + beta*C
T       L        N        alpha*op(L+D)*x + beta*y;  alpha*op(L+D)*B + beta*C
T       U        U        alpha*op(U+I)*x + beta*y;  alpha*op(U+I)*B + beta*C
T       U        N        alpha*op(U+D)*x + beta*y;  alpha*op(U+D)*B + beta*C
A       L        ignored  alpha*op(L-L')*x + beta*y;  alpha*op(L-L')*B + beta*C
A       U        ignored  alpha*op(U-U')*x + beta*y;  alpha*op(U-U')*B + beta*C
D       ignored  N        alpha*D*x + beta*y;  alpha*D*B + beta*C
D       ignored  U        alpha*x + beta*y;  alpha*B + beta*C
Table "Output Matrices for Triangular Solvers" shows the correspondence between the output matrices and the values of the parameter matdescra for the sparse matrix A for triangular solvers.
Output Matrices for Triangular Solvers
matdescra(1)  matdescra(2)  matdescra(3)  Output Matrix (vector form; matrix form)
T  L        N  alpha*inv(op(L+D))*x;  alpha*inv(op(L+D))*B
T  L        U  alpha*inv(op(L+I))*x;  alpha*inv(op(L+I))*B
T  U        N  alpha*inv(op(U+D))*x;  alpha*inv(op(U+D))*B
T  U        U  alpha*inv(op(U+I))*x;  alpha*inv(op(U+I))*B
D  ignored  N  alpha*inv(D)*x;  alpha*inv(D)*B
D  ignored  U  alpha*x;  alpha*B
Sparse BLAS Level 2 and Level 3 Routines
Table "Sparse BLAS Level 2 and Level 3 Routines" lists the Sparse BLAS Level 2 and Level 3 routines described in more detail later in this section.
Sparse BLAS Level 2 and Level 3 Routines
Routine/Function  Description
Simplified interface, one-based indexing
mkl_?csrgemv  Computes matrix - vector product of a sparse general matrix in the CSR format (3-array variation).
mkl_?bsrgemv  Computes matrix - vector product of a sparse general matrix in the BSR format (3-array variation).
mkl_?coogemv  Computes matrix - vector product of a sparse general matrix in the coordinate format.
mkl_?diagemv  Computes matrix - vector product of a sparse general matrix in the diagonal format.
mkl_?csrsymv  Computes matrix - vector product of a sparse symmetrical matrix in the CSR format (3-array variation).
mkl_?bsrsymv  Computes matrix - vector product of a sparse symmetrical matrix in the BSR format (3-array variation).
mkl_?coosymv  Computes matrix - vector product of a sparse symmetrical matrix in the coordinate format.
mkl_?diasymv  Computes matrix - vector product of a sparse symmetrical matrix in the diagonal format.
mkl_?csrtrsv  Triangular solver with simplified interface for a sparse matrix in the CSR format (3-array variation).
mkl_?bsrtrsv  Triangular solver with simplified interface for a sparse matrix in the BSR format (3-array variation).
mkl_?cootrsv  Triangular solver with simplified interface for a sparse matrix in the coordinate format.
mkl_?diatrsv  Triangular solver with simplified interface for a sparse matrix in the diagonal format.
Simplified interface, zero-based indexing
mkl_cspblas_?csrgemv  Computes matrix - vector product of a sparse general matrix in the CSR format (3-array variation) with zero-based indexing.
mkl_cspblas_?bsrgemv  Computes matrix - vector product of a sparse general matrix in the BSR format (3-array variation) with zero-based indexing.
mkl_cspblas_?coogemv Computes matrix - vector product of a sparse general matrix in the coordinate format with zero-based indexing. mkl_cspblas_?csrsymv Computes matrix - vector product of a sparse symmetrical matrix in the CSR format (3-array variation) with zero-based indexing mkl_cspblas_?bsrsymv Computes matrix - vector product of a sparse symmetrical matrix in the BSR format (3-array variation) with zero-based indexing. mkl_cspblas_?coosymv Computes matrix - vector product of a sparse symmetrical matrix in the coordinate format with zero-based indexing. mkl_cspblas_?csrtrsv Triangular solvers with simplified interface for a sparse matrix in the CSR format (3-array variation) with zero-based indexing. mkl_cspblas_?bsrtrsv Triangular solver with simplified interface for a sparse matrix in the BSR format (3-array variation) with zero-based indexing. mkl_cspblas_?cootrsv Triangular solver with simplified interface for a sparse matrix in the coordinate format with zero-based indexing. Typical (conventional) interface, one-based and zero-based indexing mkl_?csrmv Computes matrix - vector product of a sparse matrix in the CSR format. mkl_?bsrmv Computes matrix - vector product of a sparse matrix in the BSR format. mkl_?cscmv Computes matrix - vector product for a sparse matrix in the CSC format. mkl_?coomv Computes matrix - vector product for a sparse matrix in the coordinate format. mkl_?csrsv Solves a system of linear equations for a sparse matrix in the CSR format. BLAS and Sparse BLAS Routines 2 159 Routine/Function Description mkl_?bsrsv Solves a system of linear equations for a sparse matrix in the BSR format. mkl_?cscsv Solves a system of linear equations for a sparse matrix in the CSC format. mkl_?coosv Solves a system of linear equations for a sparse matrix in the coordinate format. mkl_?csrmm Computes matrix - matrix product of a sparse matrix in the CSR format mkl_?bsrmm Computes matrix - matrix product of a sparse matrix in the BSR format. mkl_?cscmm Computes matrix - matrix product of a sparse matrix in the CSC format mkl_?coomm Computes matrix - matrix product of a sparse matrix in the coordinate format. mkl_?csrsm Solves a system of linear matrix equations for a sparse matrix in the CSR format. mkl_?bsrsm Solves a system of linear matrix equations for a sparse matrix in the BSR format. mkl_?cscsm Solves a system of linear matrix equations for a sparse matrix in the CSC format. mkl_?coosm Solves a system of linear matrix equations for a sparse matrix in the coordinate format. Typical (conventional) interface, one-based indexing mkl_?diamv Computes matrix - vector product of a sparse matrix in the diagonal format. mkl_?skymv Computes matrix - vector product for a sparse matrix in the skyline storage format. mkl_?diasv Solves a system of linear equations for a sparse matrix in the diagonal format. mkl_?skysv Solves a system of linear equations for a sparse matrix in the skyline format. mkl_?diamm Computes matrix - matrix product of a sparse matrix in the diagonal format. mkl_?skymm Computes matrix - matrix product of a sparse matrix in the skyline storage format. mkl_?diasm Solves a system of linear matrix equations for a sparse matrix in the diagonal format. mkl_?skysm Solves a system of linear matrix equations for a sparse matrix in the skyline storage format. Auxiliary routines Matrix converters 2 Intel® Math Kernel Library Reference Manual 160 Routine/Function Description mkl_?dnscsr Converts a sparse matrix in the dense representation to the CSR format (3-array variation). 
mkl_?csrcoo Converts a sparse matrix in the CSR format (3-array variation) to the coordinate format and vice versa. mkl_?csrbsr Converts a sparse matrix in the CSR format to the BSR format (3-array variations) and vice versa. mkl_?csrcsc Converts a sparse matrix in the CSR format to the CSC and vice versa (3-array variations). mkl_?csrdia Converts a sparse matrix in the CSR format (3-array variation) to the diagonal format and vice versa. mkl_?csrsky Converts a sparse matrix in the CSR format (3-array variation) to the sky line format and vice versa. Operations on sparse matrices mkl_?csradd Computes the sum of two sparse matrices stored in the CSR format (3-array variation) with one-based indexing. mkl_?csrmultcsr Computes the product of two sparse matrices stored in the CSR format (3-array variation) with one-based indexing. mkl_?csrmultd Computes product of two sparse matrices stored in the CSR format (3-array variation) with one-based indexing. The result is stored in the dense matrix. mkl_?csrgemv Computes matrix - vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing. Syntax Fortran: call mkl_scsrgemv(transa, m, a, ia, ja, x, y) call mkl_dcsrgemv(transa, m, a, ia, ja, x, y) call mkl_ccsrgemv(transa, m, a, ia, ja, x, y) call mkl_zcsrgemv(transa, m, a, ia, ja, x, y) C: mkl_scsrgemv(&transa, &m, a, ia, ja, x, y); mkl_dcsrgemv(&transa, &m, a, ia, ja, x, y); mkl_ccsrgemv(&transa, &m, a, ia, ja, x, y); mkl_zcsrgemv(&transa, &m, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h BLAS and Sparse BLAS Routines 2 161 Description The mkl_?csrgemv routine performs a matrix-vector operation defined as y := A*x or y := A'*x, where: x and y are vectors, A is an m-by-m sparse square matrix in the CSR format (3-array variation), A' is the transpose of A. NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. transa CHARACTER*1. Specifies the operation. If transa = 'N' or 'n', then as y := A*x If transa = 'T' or 't' or 'C' or 'c', then y := A'*x, m INTEGER. Number of rows of the matrix A. a REAL for mkl_scsrgemv. DOUBLE PRECISION for mkl_dcsrgemv. COMPLEX for mkl_ccsrgemv. DOUBLE COMPLEX for mkl_zcsrgemv. Array containing non-zero elements of the matrix A. Its length is equal to the number of non-zero elements in the matrix A. Refer to values array description in Sparse Matrix Storage Formats for more details. ia INTEGER. Array of length m + 1, containing indices of elements in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m + 1) is equal to the number of non-zeros plus one. Refer to rowIndex array description in Sparse Matrix Storage Formats for more details. ja INTEGER. Array containing the column indices for each non-zero element of the matrix A. Its length is equal to the length of the array a. Refer to columns array description in Sparse Matrix Storage Formats for more details. x REAL for mkl_scsrgemv. DOUBLE PRECISION for mkl_dcsrgemv. COMPLEX for mkl_ccsrgemv. DOUBLE COMPLEX for mkl_zcsrgemv. Array, DIMENSION is m. On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_scsrgemv. DOUBLE PRECISION for mkl_dcsrgemv. 
COMPLEX for mkl_ccsrgemv. 2 Intel® Math Kernel Library Reference Manual 162 DOUBLE COMPLEX for mkl_zcsrgemv. Array, DIMENSION at least m. On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_scsrgemv(transa, m, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) SUBROUTINE mkl_dcsrgemv(transa, m, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) SUBROUTINE mkl_ccsrgemv(transa, m, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) SUBROUTINE mkl_zcsrgemv(transa, m, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_scsrgemv(char *transa, int *m, float *a, int *ia, int *ja, float *x, float *y); void mkl_dcsrgemv(char *transa, int *m, double *a, int *ia, int *ja, double *x, double *y); void mkl_ccsrgemv(char *transa, int *m, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zcsrgemv(char *transa, int *m, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y); BLAS and Sparse BLAS Routines 2 163 mkl_?bsrgemv Computes matrix - vector product of a sparse general matrix stored in the BSR format (3-array variation) with one-based indexing. Syntax Fortran: call mkl_sbsrgemv(transa, m, lb, a, ia, ja, x, y) call mkl_dbsrgemv(transa, m, lb, a, ia, ja, x, y) call mkl_cbsrgemv(transa, m, lb, a, ia, ja, x, y) call mkl_zbsrgemv(transa, m, lb, a, ia, ja, x, y) C: mkl_sbsrgemv(&transa, &m, &lb, a, ia, ja, x, y); mkl_dbsrgemv(&transa, &m, &lb, a, ia, ja, x, y); mkl_cbsrgemv(&transa, &m, &lb, a, ia, ja, x, y); mkl_zbsrgemv(&transa, &m, &lb, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_?bsrgemv routine performs a matrix-vector operation defined as y := A*x or y := A'*x, where: x and y are vectors, A is an m-by-m block sparse square matrix in the BSR format (3-array variation), A' is the transpose of A. NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. transa CHARACTER*1. Specifies the operation. If transa = 'N' or 'n', then the matrix-vector product is computed as y := A*x If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is computed as y := A'*x, m INTEGER. Number of block rows of the matrix A. 2 Intel® Math Kernel Library Reference Manual 164 lb INTEGER. Size of the block in the matrix A. a REAL for mkl_sbsrgemv. DOUBLE PRECISION for mkl_dbsrgemv. COMPLEX for mkl_cbsrgemv. DOUBLE COMPLEX for mkl_zbsrgemv. Array containing elements of non-zero blocks of the matrix A. Its length is equal to the number of non-zero blocks in the matrix A multiplied by lb*lb. Refer to values array description in BSR Format for more details. ia INTEGER. Array of length (m + 1), containing indices of block in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m + 1) is equal to the number of non-zero blocks plus one. Refer to rowIndex array description in BSR Format for more details. ja INTEGER. Array containing the column indices for each non-zero block in the matrix A. 
Its length is equal to the number of non-zero blocks of the matrix A. Refer to columns array description in BSR Format for more details. x REAL for mkl_sbsrgemv. DOUBLE PRECISION for mkl_dbsrgemv. COMPLEX for mkl_cbsrgemv. DOUBLE COMPLEX for mkl_zbsrgemv. Array, DIMENSION (m*lb). On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_sbsrgemv. DOUBLE PRECISION for mkl_dbsrgemv. COMPLEX for mkl_cbsrgemv. DOUBLE COMPLEX for mkl_zbsrgemv. Array, DIMENSION at least (m*lb). On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_sbsrgemv(transa, m, lb, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m, lb INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) SUBROUTINE mkl_dbsrgemv(transa, m, lb, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m, lb INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) BLAS and Sparse BLAS Routines 2 165 SUBROUTINE mkl_cbsrgemv(transa, m, lb, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m, lb INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) SUBROUTINE mkl_zbsrgemv(transa, m, lb, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m, lb INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_dbsrgemv(char *transa, int *m, int *lb, double *a, int *ia, int *ja, double *x, double *y); void mkl_sbsrgemv(char *transa, int *m, int *lb, float *a, int *ia, int *ja, float *x, float *y); void mkl_cbsrgemv(char *transa, int *m, int *lb, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zbsrgemv(char *transa, int *m, int *lb, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y); mkl_?coogemv Computes matrix-vector product of a sparse general matrix stored in the coordinate format with one-based indexing. Syntax Fortran: call mkl_scoogemv(transa, m, val, rowind, colind, nnz, x, y) call mkl_dcoogemv(transa, m, val, rowind, colind, nnz, x, y) call mkl_ccoogemv(transa, m, val, rowind, colind, nnz, x, y) call mkl_zcoogemv(transa, m, val, rowind, colind, nnz, x, y) C: mkl_scoogemv(&transa, &m, val, rowind, colind, &nnz, x, y); mkl_dcoogemv(&transa, &m, val, rowind, colind, &nnz, x, y); mkl_ccoogemv(&transa, &m, val, rowind, colind, &nnz, x, y); mkl_zcoogemv(&transa, &m, val, rowind, colind, &nnz, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h 2 Intel® Math Kernel Library Reference Manual 166 Description The mkl_?coogemv routine performs a matrix-vector operation defined as y := A*x or y := A'*x, where: x and y are vectors, A is an m-by-m sparse square matrix in the coordinate format, A' is the transpose of A. NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. transa CHARACTER*1. Specifies the operation. If transa = 'N' or 'n', then the matrix-vector product is computed as y := A*x If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is computed as y := A'*x, m INTEGER. Number of rows of the matrix A. val REAL for mkl_scoogemv. DOUBLE PRECISION for mkl_dcoogemv. COMPLEX for mkl_ccoogemv. DOUBLE COMPLEX for mkl_zcoogemv. Array of length nnz, contains non-zero elements of the matrix A in the arbitrary order. Refer to values array description in Coordinate Format for more details. rowind INTEGER. Array of length nnz, contains the row indices for each non-zero element of the matrix A. 
Refer to rows array description in Coordinate Format for more details. colind INTEGER. Array of length nnz, contains the column indices for each nonzero element of the matrix A. Refer to columns array description in Coordinate Format for more details. nnz INTEGER. Specifies the number of non-zero element of the matrix A. Refer to nnz description in Coordinate Format for more details. x REAL for mkl_scoogemv. DOUBLE PRECISION for mkl_dcoogemv. COMPLEX for mkl_ccoogemv. DOUBLE COMPLEX for mkl_zcoogemv. Array, DIMENSION is m. One entry, the array x must contain the vector x. Output Parameters y REAL for mkl_scoogemv. DOUBLE PRECISION for mkl_dcoogemv. BLAS and Sparse BLAS Routines 2 167 COMPLEX for mkl_ccoogemv. DOUBLE COMPLEX for mkl_zcoogemv. Array, DIMENSION at least m. On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_scoogemv(transa, m, val, rowind, colind, nnz, x, y) CHARACTER*1 transa INTEGER m, nnz INTEGER rowind(*), colind(*) REAL val(*), x(*), y(*) SUBROUTINE mkl_dcoogemv(transa, m, val, rowind, colind, nnz, x, y) CHARACTER*1 transa INTEGER m, nnz INTEGER rowind(*), colind(*) DOUBLE PRECISION val(*), x(*), y(*) SUBROUTINE mkl_ccoogemv(transa, m, val, rowind, colind, nnz, x, y) CHARACTER*1 transa INTEGER m, nnz INTEGER rowind(*), colind(*) COMPLEX val(*), x(*), y(*) SUBROUTINE mkl_zcoogemv(transa, m, val, rowind, colind, nnz, x, y) CHARACTER*1 transa INTEGER m, nnz INTEGER rowind(*), colind(*) DOUBLE COMPLEX val(*), x(*), y(*) C: void mkl_scoogemv(char *transa, int *m, float *val, int *rowind, int *colind, int *nnz, float *x, float *y); void mkl_dcoogemv(char *transa, int *m, double *val, int *rowind, int *colind, int *nnz, double *x, double *y); void mkl_ccoogemv(char *transa, int *m, MKL_Complex8 *val, int *rowind, int *colind, int *nnz, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zcoogemv(char *transa, int *m, MKL_Complex16 *val, int *rowind, int *colind, int *nnz, MKL_Complex16 *x, MKL_Complex16 *y); 2 Intel® Math Kernel Library Reference Manual 168 mkl_?diagemv Computes matrix - vector product of a sparse general matrix stored in the diagonal format with one-based indexing. Syntax Fortran: call mkl_sdiagemv(transa, m, val, lval, idiag, ndiag, x, y) call mkl_ddiagemv(transa, m, val, lval, idiag, ndiag, x, y) call mkl_cdiagemv(transa, m, val, lval, idiag, ndiag, x, y) call mkl_zdiagemv(transa, m, val, lval, idiag, ndiag, x, y) C: mkl_sdiagemv(&transa, &m, val, &lval, idiag, &ndiag, x, y); mkl_ddiagemv(&transa, &m, val, &lval, idiag, &ndiag, x, y); mkl_cdiagemv(&transa, &m, val, &lval, idiag, &ndiag, x, y); mkl_zdiagemv(&transa, &m, val, &lval, idiag, &ndiag, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_?diagemv routine performs a matrix-vector operation defined as y := A*x or y := A'*x, where: x and y are vectors, A is an m-by-m sparse square matrix in the diagonal storage format, A' is the transpose of A. NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. transa CHARACTER*1. Specifies the operation. If transa = 'N' or 'n', then y := A*x If transa = 'T' or 't' or 'C' or 'c', then y := A'*x, m INTEGER. Number of rows of the matrix A. val REAL for mkl_sdiagemv. DOUBLE PRECISION for mkl_ddiagemv. 
BLAS and Sparse BLAS Routines 2 169 COMPLEX for mkl_ccsrgemv. DOUBLE COMPLEX for mkl_zdiagemv. Two-dimensional array of size lval*ndiag, contains non-zero diagonals of the matrix A. Refer to values array description in Diagonal Storage Scheme for more details. lval INTEGER. Leading dimension of val lval=m. Refer to lval description in Diagonal Storage Scheme for more details. idiag INTEGER. Array of length ndiag, contains the distances between main diagonal and each non-zero diagonals in the matrix A. Refer to distance array description in Diagonal Storage Scheme for more details. ndiag INTEGER. Specifies the number of non-zero diagonals of the matrix A. x REAL for mkl_sdiagemv. DOUBLE PRECISION for mkl_ddiagemv. COMPLEX for mkl_ccsrgemv. DOUBLE COMPLEX for mkl_zdiagemv. Array, DIMENSION is m. On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_sdiagemv. DOUBLE PRECISION for mkl_ddiagemv. COMPLEX for mkl_ccsrgemv. DOUBLE COMPLEX for mkl_zdiagemv. Array, DIMENSION at least m. On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_sdiagemv(transa, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 transa INTEGER m, lval, ndiag INTEGER idiag(*) REAL val(lval,*), x(*), y(*) SUBROUTINE mkl_ddiagemv(transa, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 transa INTEGER m, lval, ndiag INTEGER idiag(*) DOUBLE PRECISION val(lval,*), x(*), y(*) SUBROUTINE mkl_cdiagemv(transa, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 transa INTEGER m, lval, ndiag INTEGER idiag(*) COMPLEX val(lval,*), x(*), y(*) 2 Intel® Math Kernel Library Reference Manual 170 SUBROUTINE mkl_zdiagemv(transa, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 transa INTEGER m, lval, ndiag INTEGER idiag(*) DOUBLE COMPLEX val(lval,*), x(*), y(*) C: void mkl_sdiagemv(char *transa, int *m, float *val, int *lval, int *idiag, int *ndiag, float *x, float *y); void mkl_ddiagemv(char *transa, int *m, double *val, int *lval, int *idiag, int *ndiag, double *x, double *y); void mkl_cdiagemv(char *transa, int *m, MKL_Complex8 *val, int *lval, int *idiag, int *ndiag, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zdiagemv(char *transa, int *m, MKL_Complex16 *val, int *lval, int *idiag, int *ndiag, MKL_Complex16 *x, MKL_Complex16 *y); mkl_?csrsymv Computes matrix - vector product of a sparse symmetrical matrix stored in the CSR format (3-array variation) with one-based indexing. Syntax Fortran: call mkl_scsrsymv(uplo, m, a, ia, ja, x, y) call mkl_dcsrsymv(uplo, m, a, ia, ja, x, y) call mkl_ccsrsymv(uplo, m, a, ia, ja, x, y) call mkl_zcsrsymv(uplo, m, a, ia, ja, x, y) C: mkl_scsrsymv(&uplo, &m, a, ia, ja, x, y); mkl_dcsrsymv(&uplo, &m, a, ia, ja, x, y); mkl_ccsrsymv(&uplo, &m, a, ia, ja, x, y); mkl_zcsrsymv(&uplo, &m, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_?csrsymv routine performs a matrix-vector operation defined as y := A*x where: x and y are vectors, BLAS and Sparse BLAS Routines 2 171 A is an upper or lower triangle of the symmetrical sparse matrix in the CSR format (3-array variation). NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is used. 
If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. m INTEGER. Number of rows of the matrix A. a REAL for mkl_scsrsymv. DOUBLE PRECISION for mkl_dcsrsymv. COMPLEX for mkl_ccsrsymv. DOUBLE COMPLEX for mkl_zcsrsymv. Array containing non-zero elements of the matrix A. Its length is equal to the number of non-zero elements in the matrix A. Refer to values array description in Sparse Matrix Storage Formats for more details. ia INTEGER. Array of length m + 1, containing indices of elements in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m + 1) is equal to the number of non-zeros plus one. Refer to rowIndex array description in Sparse Matrix Storage Formats for more details. ja INTEGER. Array containing the column indices for each non-zero element of the matrix A. Its length is equal to the length of the array a. Refer to columns array description in Sparse Matrix Storage Formats for more details. x REAL for mkl_scsrsymv. DOUBLE PRECISION for mkl_dcsrsymv. COMPLEX for mkl_ccsrsymv. DOUBLE COMPLEX for mkl_zcsrsymv. Array, DIMENSION is m. On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_scsrsymv. DOUBLE PRECISION for mkl_dcsrsymv. COMPLEX for mkl_ccsrsymv. DOUBLE COMPLEX for mkl_zcsrsymv. Array, DIMENSION at least m. On exit, the array y must contain the vector y. 2 Intel® Math Kernel Library Reference Manual 172 Interfaces FORTRAN 77: SUBROUTINE mkl_scsrsymv(uplo, m, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) SUBROUTINE mkl_dcsrsymv(uplo, m, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) SUBROUTINE mkl_ccsrsymv(uplo, m, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) SUBROUTINE mkl_zcsrsymv(uplo, m, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_scsrsymv(char *uplo, int *m, float *a, int *ia, int *ja, float *x, float *y); void mkl_dcsrsymv(char *uplo, int *m, double *a, int *ia, int *ja, double *x, double *y); void mkl_ccsrsymv(char *uplo, int *m, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zcsrsymv(char *uplo, int *m, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y); mkl_?bsrsymv Computes matrix-vector product of a sparse symmetrical matrix stored in the BSR format (3-array variation) with one-based indexing. BLAS and Sparse BLAS Routines 2 173 Syntax Fortran: call mkl_sbsrsymv(uplo, m, lb, a, ia, ja, x, y) call mkl_dbsrsymv(uplo, m, lb, a, ia, ja, x, y) call mkl_cbsrsymv(uplo, m, lb, a, ia, ja, x, y) call mkl_zbsrsymv(uplo, m, lb, a, ia, ja, x, y) C: mkl_sbsrsymv(&uplo, &m, &lb, a, ia, ja, x, y); mkl_dbsrsymv(&uplo, &m, &lb, a, ia, ja, x, y); mkl_cbsrsymv(&uplo, &m, &lb, a, ia, ja, x, y); mkl_zbsrsymv(&uplo, &m, &lb, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_?bsrsymv routine performs a matrix-vector operation defined as y := A*x where: x and y are vectors, A is an upper or lower triangle of the symmetrical sparse matrix in the BSR format (3-array variation). NOTE This routine supports only one-based indexing of the input arrays. 
Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is considered. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. m INTEGER. Number of block rows of the matrix A. lb INTEGER. Size of the block in the matrix A. a REAL for mkl_sbsrsymv. DOUBLE PRECISION for mkl_dbsrsymv. COMPLEX for mkl_cbsrsymv. DOUBLE COMPLEX for mkl_zcsrgemv. 2 Intel® Math Kernel Library Reference Manual 174 Array containing elements of non-zero blocks of the matrix A. Its length is equal to the number of non-zero blocks in the matrix A multiplied by lb*lb. Refer to values array description in BSR Format for more details. ia INTEGER. Array of length (m + 1), containing indices of block in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m + 1) is equal to the number of non-zero blocks plus one. Refer to rowIndex array description in BSR Format for more details. ja INTEGER. Array containing the column indices for each non-zero block in the matrix A. Its length is equal to the number of non-zero blocks of the matrix A. Refer to columns array description in BSR Format for more details. x REAL for mkl_sbsrsymv. DOUBLE PRECISION for mkl_dbsrsymv. COMPLEX for mkl_cbsrsymv. DOUBLE COMPLEX for mkl_zcsrgemv. Array, DIMENSION (m*lb). On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_sbsrsymv. DOUBLE PRECISION for mkl_dbsrsymv. COMPLEX for mkl_cbsrsymv. DOUBLE COMPLEX for mkl_zcsrgemv. Array, DIMENSION at least (m*lb). On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_sbsrsymv(uplo, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m, lb INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) SUBROUTINE mkl_dbsrsymv(uplo, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m, lb INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) SUBROUTINE mkl_cbsrsymv(uplo, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m, lb INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) BLAS and Sparse BLAS Routines 2 175 SUBROUTINE mkl_zbsrsymv(uplo, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m, lb INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_sbsrsymv(char *uplo, int *m, int *lb, float *a, int *ia, int *ja, float *x, float *y); void mkl_dbsrsymv(char *uplo, int *m, int *lb, double *a, int *ia, int *ja, double *x, double *y); void mkl_cbsrsymv(char *uplo, int *m, int *lb, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zbsrsymv(char *uplo, int *m, int *lb, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y); mkl_?coosymv Computes matrix - vector product of a sparse symmetrical matrix stored in the coordinate format with one-based indexing. 
Syntax Fortran: call mkl_scoosymv(uplo, m, val, rowind, colind, nnz, x, y) call mkl_dcoosymv(uplo, m, val, rowind, colind, nnz, x, y) call mkl_ccoosymv(uplo, m, val, rowind, colind, nnz, x, y) call mkl_zcoosymv(uplo, m, val, rowind, colind, nnz, x, y) C: mkl_scoosymv(&uplo, &m, val, rowind, colind, &nnz, x, y); mkl_dcoosymv(&uplo, &m, val, rowind, colind, &nnz, x, y); mkl_ccoosymv(&uplo, &m, val, rowind, colind, &nnz, x, y); mkl_zcoosymv(&uplo, &m, val, rowind, colind, &nnz, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_?coosymv routine performs a matrix-vector operation defined as y := A*x where: x and y are vectors, 2 Intel® Math Kernel Library Reference Manual 176 A is an upper or lower triangle of the symmetrical sparse matrix in the coordinate format. NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is used. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. m INTEGER. Number of rows of the matrix A. val REAL for mkl_scoosymv. DOUBLE PRECISION for mkl_dcoosymv. COMPLEX for mkl_ccoosymv. DOUBLE COMPLEX for mkl_zcoosymv. Array of length nnz, contains non-zero elements of the matrix A in the arbitrary order. Refer to values array description in Coordinate Format for more details. rowind INTEGER. Array of length nnz, contains the row indices for each non-zero element of the matrix A. Refer to rows array description in Coordinate Format for more details. colind INTEGER. Array of length nnz, contains the column indices for each nonzero element of the matrix A. Refer to columns array description in Coordinate Format for more details. nnz INTEGER. Specifies the number of non-zero element of the matrix A. Refer to nnz description in Coordinate Format for more details. x REAL for mkl_scoosymv. DOUBLE PRECISION for mkl_dcoosymv. COMPLEX for mkl_ccoosymv. DOUBLE COMPLEX for mkl_zcoosymv. Array, DIMENSION is m. On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_scoosymv. DOUBLE PRECISION for mkl_dcoosymv. COMPLEX for mkl_ccoosymv. DOUBLE COMPLEX for mkl_zcoosymv. Array, DIMENSION at least m. On exit, the array y must contain the vector y. 
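As a usage sketch (assuming the double-precision C prototype shown in the Interfaces subsection below; the matrix and vector values are illustrative only), the following program multiplies a 3-by-3 symmetric matrix, supplied as its upper triangle in the coordinate format with one-based indexing, by a vector of ones:

#include <stdio.h>
#include "mkl_spblas.h"

int main(void)
{
    /* A = [ 1 2 0 ; 2 3 4 ; 0 4 5 ]; only the upper triangle is stored. */
    char   uplo = 'U';
    int    m = 3, nnz = 5;
    double val[]    = {1.0, 2.0, 3.0, 4.0, 5.0};
    int    rowind[] = {1, 1, 2, 2, 3};   /* one-based row indices    */
    int    colind[] = {1, 2, 2, 3, 3};   /* one-based column indices */
    double x[] = {1.0, 1.0, 1.0};
    double y[3];

    /* y := A*x for the symmetric matrix A given by its upper triangle. */
    mkl_dcoosymv(&uplo, &m, val, rowind, colind, &nnz, x, y);
    printf("y = %g %g %g\n", y[0], y[1], y[2]);   /* expected: y = 3 9 9 */
    return 0;
}

Only the stored triangle is passed; the symmetric product accounts for the mirrored off-diagonal entries, which is why y = (3, 9, 9) for this data.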
BLAS and Sparse BLAS Routines 2 177 Interfaces FORTRAN 77: SUBROUTINE mkl_scoosymv(uplo, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo INTEGER m, nnz INTEGER rowind(*), colind(*) REAL val(*), x(*), y(*) SUBROUTINE mkl_dcoosymv(uplo, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo INTEGER m, nnz INTEGER rowind(*), colind(*) DOUBLE PRECISION val(*), x(*), y(*) SUBROUTINE mkl_cdcoosymv(uplo, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo INTEGER m, nnz INTEGER rowind(*), colind(*) COMPLEX val(*), x(*), y(*) SUBROUTINE mkl_zcoosymv(uplo, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo INTEGER m, nnz INTEGER rowind(*), colind(*) DOUBLE COMPLEX val(*), x(*), y(*) C: void mkl_scoosymv(char *uplo, int *m, float *val, int *rowind, int *colind, int *nnz, float *x, float *y); void mkl_dcoosymv(char *uplo, int *m, double *val, int *rowind, int *colind, int *nnz, double *x, double *y); void mkl_ccoosymv(char *uplo, int *m, MKL_Complex8 *val, int *rowind, int *colind, int *nnz, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zcoosymv(char *uplo, int *m, MKL_Complex16 *val, int *rowind, int *colind, int *nnz, MKL_Complex16 *x, MKL_Complex16 *y); mkl_?diasymv Computes matrix - vector product of a sparse symmetrical matrix stored in the diagonal format with one-based indexing. 2 Intel® Math Kernel Library Reference Manual 178 Syntax Fortran: call mkl_sdiasymv(uplo, m, val, lval, idiag, ndiag, x, y) call mkl_ddiasymv(uplo, m, val, lval, idiag, ndiag, x, y) call mkl_cdiasymv(uplo, m, val, lval, idiag, ndiag, x, y) call mkl_zdiasymv(uplo, m, val, lval, idiag, ndiag, x, y) C: mkl_sdiasymv(&uplo, &m, val, &lval, idiag, &ndiag, x, y); mkl_ddiasymv(&uplo, &m, val, &lval, idiag, &ndiag, x, y); mkl_cdiasymv(&uplo, &m, val, &lval, idiag, &ndiag, x, y); mkl_zdiasymv(&uplo, &m, val, &lval, idiag, &ndiag, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_?diasymv routine performs a matrix-vector operation defined as y := A*x where: x and y are vectors, A is an upper or lower triangle of the symmetrical sparse matrix. NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is used. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. m INTEGER. Number of rows of the matrix A. val REAL for mkl_sdiasymv. DOUBLE PRECISION for mkl_ddiasymv. COMPLEX for mkl_cdiasymv. DOUBLE COMPLEX for mkl_zdiasymv. Two-dimensional array of size lval by ndiag, contains non-zero diagonals of the matrix A. Refer to values array description in Diagonal Storage Scheme for more details. lval INTEGER. Leading dimension of val, lval =m. Refer to lval description in Diagonal Storage Scheme for more details. BLAS and Sparse BLAS Routines 2 179 idiag INTEGER. Array of length ndiag, contains the distances between main diagonal and each non-zero diagonals in the matrix A. Refer to distance array description in Diagonal Storage Scheme for more details. ndiag INTEGER. Specifies the number of non-zero diagonals of the matrix A. x REAL for mkl_sdiasymv. DOUBLE PRECISION for mkl_ddiasymv. COMPLEX for mkl_cdiasymv. DOUBLE COMPLEX for mkl_zdiasymv. Array, DIMENSION is m. 
On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_sdiasymv. DOUBLE PRECISION for mkl_ddiasymv. COMPLEX for mkl_cdiasymv. DOUBLE COMPLEX for mkl_zdiasymv. Array, DIMENSION at least m. On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_sdiasymv(uplo, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 uplo INTEGER m, lval, ndiag INTEGER idiag(*) REAL val(lval,*), x(*), y(*) SUBROUTINE mkl_ddiasymv(uplo, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 uplo INTEGER m, lval, ndiag INTEGER idiag(*) DOUBLE PRECISION val(lval,*), x(*), y(*) SUBROUTINE mkl_cdiasymv(uplo, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 uplo INTEGER m, lval, ndiag INTEGER idiag(*) COMPLEX val(lval,*), x(*), y(*) SUBROUTINE mkl_zdiasymv(uplo, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 uplo INTEGER m, lval, ndiag INTEGER idiag(*) DOUBLE COMPLEX val(lval,*), x(*), y(*) 2 Intel® Math Kernel Library Reference Manual 180 C: void mkl_sdiasymv(char *uplo, int *m, float *val, int *lval, int *idiag, int *ndiag, float *x, float *y); void mkl_ddiasymv(char *uplo, int *m, double *val, int *lval, int *idiag, int *ndiag, double *x, double *y); void mkl_cdiasymv(char *uplo, int *m, MKL_Complex8 *val, int *lval, int *idiag, int *ndiag, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zdiasymv(char *uplo, int *m, MKL_Complex16 *val, int *lval, int *idiag, int *ndiag, MKL_Complex16 *x, MKL_Complex16 *y); mkl_?csrtrsv Triangular solvers with simplified interface for a sparse matrix in the CSR format (3-array variation) with onebased indexing. Syntax Fortran: call mkl_scsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) call mkl_dcsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) call mkl_ccsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) call mkl_zcsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) C: mkl_scsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y); mkl_dcsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y); mkl_ccsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y); mkl_zcsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_?csrtrsv routine solves a system of linear equations with matrix-vector operations for a sparse matrix stored in the CSR format (3 array variation): A*y = x or A'*y = x, where: x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A' is the transpose of A. BLAS and Sparse BLAS Routines 2 181 NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is used. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. transa CHARACTER*1. Specifies the system of linear equations. If transa = 'N' or 'n', then A*y = x If transa = 'T' or 't' or 'C' or 'c', then A'*y = x, diag CHARACTER*1. Specifies whether A is unit triangular. If diag = 'U' or 'u', then A is a unit triangular. If diag = 'N' or 'n', then A is not unit triangular. m INTEGER. Number of rows of the matrix A. a REAL for mkl_scsrtrmv. DOUBLE PRECISION for mkl_dcsrtrmv. COMPLEX for mkl_ccsrtrmv. DOUBLE COMPLEX for mkl_zcsrtrmv. 
Array containing non-zero elements of the matrix A. Its length is equal to the number of non-zero elements in the matrix A. Refer to values array description in Sparse Matrix Storage Formats for more details. NOTE The non-zero elements of the given row of the matrix must be stored in the same order as they appear in the row (from left to right). No diagonal element can be omitted from a sparse storage if the solver is called with the non-unit indicator. ia INTEGER. Array of length m + 1, containing indices of elements in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m + 1) is equal to the number of non-zeros plus one. Refer to rowIndex array description in Sparse Matrix Storage Formats for more details. ja INTEGER. Array containing the column indices for each non-zero element of the matrix A. Its length is equal to the length of the array a. Refer to columns array description in Sparse Matrix Storage Formats for more details. NOTE Column indices must be sorted in increasing order for each row. x REAL for mkl_scsrtrmv. DOUBLE PRECISION for mkl_dcsrtrmv. COMPLEX for mkl_ccsrtrmv. DOUBLE COMPLEX for mkl_zcsrtrmv. Array, DIMENSION is m. On entry, the array x must contain the vector x. 2 Intel® Math Kernel Library Reference Manual 182 Output Parameters y REAL for mkl_scsrtrmv. DOUBLE PRECISION for mkl_dcsrtrmv. COMPLEX for mkl_ccsrtrmv. DOUBLE COMPLEX for mkl_zcsrtrmv. Array, DIMENSION at least m. Contains the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_scsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) SUBROUTINE mkl_dcsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) SUBROUTINE mkl_ccsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) SUBROUTINE mkl_zcsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_scsrtrsv(char *uplo, char *transa, char *diag, int *m, float *a, int *ia, int *ja, float *x, float *y); void mkl_dcsrtrsv(char *uplo, char *transa, char *diag, int *m, double *a, int *ia, int *ja, double *x, double *y); void mkl_ccsrtrsv(char *uplo, char *transa, char *diag, int *m, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zcsrtrsv(char *uplo, char *transa, char *diag, int *m, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y); BLAS and Sparse BLAS Routines 2 183 mkl_?bsrtrsv Triangular solver with simplified interface for a sparse matrix stored in the BSR format (3-array variation) with one-based indexing. 
Syntax Fortran: call mkl_sbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) call mkl_dbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) call mkl_cbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) call mkl_zbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) C: mkl_sbsrtrsv(&uplo, &transa, &diag, &m, &lb, a, ia, ja, x, y); mkl_dbsrtrsv(&uplo, &transa, &diag, &m, &lb, a, ia, ja, x, y); mkl_cbsrtrsv(&uplo, &transa, &diag, &m, &lb, a, ia, ja, x, y); mkl_zbsrtrsv(&uplo, &transa, &diag, &m, &lb, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_?bsrtrsv routine solves a system of linear equations with matrix-vector operations for a sparse matrix stored in the BSR format (3-array variation) : y := A*x or y := A'*x, where: x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A' is the transpose of A. NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies the upper or low triangle of the matrix A is used. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. transa CHARACTER*1. Specifies the operation. If transa = 'N' or 'n', then the matrix-vector product is computed as y := A*x 2 Intel® Math Kernel Library Reference Manual 184 If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is computed as y := A'*x. diag CHARACTER*1. Specifies whether A is a unit triangular matrix. If diag = 'U' or 'u', then A is a unit triangular. If diag = 'N' or 'n', then A is not a unit triangular. m INTEGER. Number of block rows of the matrix A. lb INTEGER. Size of the block in the matrix A. a REAL for mkl_sbsrtrsv. DOUBLE PRECISION for mkl_dbsrtrsv. COMPLEX for mkl_cbsrtrsv. DOUBLE COMPLEX for mkl_zbsrtrsv. Array containing elements of non-zero blocks of the matrix A. Its length is equal to the number of non-zero blocks in the matrix A multiplied by lb*lb. Refer to values array description in BSR Format for more details. NOTE The non-zero elements of the given row of the matrix must be stored in the same order as they appear in the row (from left to right). No diagonal element can be omitted from a sparse storage if the solver is called with the non-unit indicator. ia INTEGER. Array of length (m + 1), containing indices of block in the array a, such that ia(I) is the index in the array a of the first non-zero element from the row I. The value of the last element ia(m + 1) is equal to the number of non-zero blocks plus one. Refer to rowIndex array description in BSR Format for more details. ja INTEGER. Array containing the column indices for each non-zero block in the matrix A. Its length is equal to the number of non-zero blocks of the matrix A. Refer to columns array description in BSR Format for more details. x REAL for mkl_sbsrtrsv. DOUBLE PRECISION for mkl_dbsrtrsv. COMPLEX for mkl_cbsrtrsv. DOUBLE COMPLEX for mkl_zbsrtrsv. Array, DIMENSION (m*lb). On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_sbsrtrsv. DOUBLE PRECISION for mkl_dbsrtrsv. COMPLEX for mkl_cbsrtrsv. DOUBLE COMPLEX for mkl_zbsrtrsv. Array, DIMENSION at least (m*lb). 
On exit, the array y must contain the vector y. BLAS and Sparse BLAS Routines 2 185 Interfaces FORTRAN 77: SUBROUTINE mkl_sbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, lb INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) SUBROUTINE mkl_dbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, lb INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) SUBROUTINE mkl_cbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, lb INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) SUBROUTINE mkl_zbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, lb INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_sbsrtrsv(char *uplo, char *transa, char *diag, int *m, int *lb, float *a, int *ia, int *ja, float *x, float *y); void mkl_dbsrtrsv(char *uplo, char *transa, char *diag, int *m, int *lb, double *a, int *ia, int *ja, double *x, double *y); void mkl_cbsrtrsv(char *uplo, char *transa, char *diag, int *m, int *lb, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zbsrtrsv(char *uplo, char *transa, char *diag, int *m, int *lb, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y); mkl_?cootrsv Triangular solvers with simplified interface for a sparse matrix in the coordinate format with one-based indexing. 2 Intel® Math Kernel Library Reference Manual 186 Syntax Fortran: call mkl_scootrsv(uplo, transa, diag, m, val, rowind, colind, nnz, x, y) call mkl_dcootrsv(uplo, transa, diag, m, val, rowind, colind, nnz, x, y) call mkl_ccootrsv(uplo, transa, diag, m, val, rowind, colind, nnz, x, y) call mkl_zcootrsv(uplo, transa, diag, m, val, rowind, colind, nnz, x, y) C: mkl_scootrsv(&uplo, &transa, &diag, &m, val, rowind, colind, &nnz, x, y); mkl_dcootrsv(&uplo, &transa, &diag, &m, val, rowind, colind, &nnz, x, y); mkl_ccootrsv(&uplo, &transa, &diag, &m, val, rowind, colind, &nnz, x, y); mkl_zcootrsv(&uplo, &transa, &diag, &m, val, rowind, colind, &nnz, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_?cootrsv routine solves a system of linear equations with matrix-vector operations for a sparse matrix stored in the coordinate format: A*y = x or A'*y = x, where: x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A' is the transpose of A. NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is considered. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. transa CHARACTER*1. Specifies the system of linear equations. If transa = 'N' or 'n', then A*y = x If transa = 'T' or 't' or 'C' or 'c', then A'*y = x, diag CHARACTER*1. Specifies whether A is unit triangular. If diag = 'U' or 'u', then A is unit triangular. If diag = 'N' or 'n', then A is not unit triangular. BLAS and Sparse BLAS Routines 2 187 m INTEGER. Number of rows of the matrix A. val REAL for mkl_scootrsv. DOUBLE PRECISION for mkl_dcootrsv. COMPLEX for mkl_ccootrsv. DOUBLE COMPLEX for mkl_zcootrsv. 
Array of length nnz, contains non-zero elements of the matrix A in arbitrary order. Refer to values array description in Coordinate Format for more details. rowind INTEGER. Array of length nnz, contains the row indices for each non-zero element of the matrix A. Refer to rows array description in Coordinate Format for more details. colind INTEGER. Array of length nnz, contains the column indices for each non-zero element of the matrix A. Refer to columns array description in Coordinate Format for more details. nnz INTEGER. Specifies the number of non-zero elements of the matrix A. Refer to nnz description in Coordinate Format for more details. x REAL for mkl_scootrsv. DOUBLE PRECISION for mkl_dcootrsv. COMPLEX for mkl_ccootrsv. DOUBLE COMPLEX for mkl_zcootrsv. Array, DIMENSION is m. On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_scootrsv. DOUBLE PRECISION for mkl_dcootrsv. COMPLEX for mkl_ccootrsv. DOUBLE COMPLEX for mkl_zcootrsv. Array, DIMENSION at least m. Contains the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_scootrsv(uplo, transa, diag, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, nnz INTEGER rowind(*), colind(*) REAL val(*), x(*), y(*) SUBROUTINE mkl_dcootrsv(uplo, transa, diag, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, nnz INTEGER rowind(*), colind(*) DOUBLE PRECISION val(*), x(*), y(*) SUBROUTINE mkl_ccootrsv(uplo, transa, diag, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, nnz INTEGER rowind(*), colind(*) COMPLEX val(*), x(*), y(*) SUBROUTINE mkl_zcootrsv(uplo, transa, diag, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, nnz INTEGER rowind(*), colind(*) DOUBLE COMPLEX val(*), x(*), y(*) C: void mkl_scootrsv(char *uplo, char *transa, char *diag, int *m, float *val, int *rowind, int *colind, int *nnz, float *x, float *y); void mkl_dcootrsv(char *uplo, char *transa, char *diag, int *m, double *val, int *rowind, int *colind, int *nnz, double *x, double *y); void mkl_ccootrsv(char *uplo, char *transa, char *diag, int *m, MKL_Complex8 *val, int *rowind, int *colind, int *nnz, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zcootrsv(char *uplo, char *transa, char *diag, int *m, MKL_Complex16 *val, int *rowind, int *colind, int *nnz, MKL_Complex16 *x, MKL_Complex16 *y);
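As an illustration of the coordinate-format solver declared above, the sketch below calls mkl_dcootrsv on a 3-by-3 upper triangular matrix. The numeric data and the one-based COO arrays are example values only, and the int prototypes above (LP64) are assumed.

#include <stdio.h>
#include "mkl_spblas.h"

int main(void)
{
    /* Upper triangular matrix (one-based coordinate format):
         | 1 2 0 |
         | 0 3 1 |
         | 0 0 2 |                                                              */
    double val[]    = {1.0, 2.0, 3.0, 1.0, 2.0}; /* non-zero values, arbitrary order allowed */
    int    rowind[] = {1, 1, 2, 2, 3};           /* one-based row indices    */
    int    colind[] = {1, 2, 2, 3, 3};           /* one-based column indices */
    int    m = 3, nnz = 5;
    char   uplo = 'U', transa = 'N', diag = 'N';
    double x[] = {3.0, 5.0, 4.0};                /* right-hand side of A*y = x */
    double y[3];                                 /* receives the solution (1, 1, 2) */

    mkl_dcootrsv(&uplo, &transa, &diag, &m, val, rowind, colind, &nnz, x, y);
    printf("y = %.1f %.1f %.1f\n", y[0], y[1], y[2]);
    return 0;
}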
mkl_?diatrsv Triangular solvers with simplified interface for a sparse matrix in the diagonal format with one-based indexing. Syntax Fortran: call mkl_sdiatrsv(uplo, transa, diag, m, val, lval, idiag, ndiag, x, y) call mkl_ddiatrsv(uplo, transa, diag, m, val, lval, idiag, ndiag, x, y) call mkl_cdiatrsv(uplo, transa, diag, m, val, lval, idiag, ndiag, x, y) call mkl_zdiatrsv(uplo, transa, diag, m, val, lval, idiag, ndiag, x, y) C: mkl_sdiatrsv(&uplo, &transa, &diag, &m, val, &lval, idiag, &ndiag, x, y); mkl_ddiatrsv(&uplo, &transa, &diag, &m, val, &lval, idiag, &ndiag, x, y); mkl_cdiatrsv(&uplo, &transa, &diag, &m, val, &lval, idiag, &ndiag, x, y); mkl_zdiatrsv(&uplo, &transa, &diag, &m, val, &lval, idiag, &ndiag, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_?diatrsv routine solves a system of linear equations with matrix-vector operations for a sparse matrix stored in the diagonal format: A*y = x or A'*y = x, where: x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A' is the transpose of A. NOTE This routine supports only one-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is used. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. transa CHARACTER*1. Specifies the system of linear equations. If transa = 'N' or 'n', then A*y = x. If transa = 'T' or 't' or 'C' or 'c', then A'*y = x. diag CHARACTER*1. Specifies whether A is unit triangular. If diag = 'U' or 'u', then A is unit triangular. If diag = 'N' or 'n', then A is not unit triangular. m INTEGER. Number of rows of the matrix A. val REAL for mkl_sdiatrsv. DOUBLE PRECISION for mkl_ddiatrsv. COMPLEX for mkl_cdiatrsv. DOUBLE COMPLEX for mkl_zdiatrsv. Two-dimensional array of size lval by ndiag, contains non-zero diagonals of the matrix A. Refer to values array description in Diagonal Storage Scheme for more details. lval INTEGER. Leading dimension of val, lval=m. Refer to lval description in Diagonal Storage Scheme for more details. idiag INTEGER. Array of length ndiag, contains the distances between the main diagonal and each non-zero diagonal of the matrix A. NOTE All elements of this array must be sorted in increasing order. Refer to distance array description in Diagonal Storage Scheme for more details. ndiag INTEGER. Specifies the number of non-zero diagonals of the matrix A. x REAL for mkl_sdiatrsv. DOUBLE PRECISION for mkl_ddiatrsv. COMPLEX for mkl_cdiatrsv. DOUBLE COMPLEX for mkl_zdiatrsv. Array, DIMENSION is m. On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_sdiatrsv. DOUBLE PRECISION for mkl_ddiatrsv. COMPLEX for mkl_cdiatrsv. DOUBLE COMPLEX for mkl_zdiatrsv. Array, DIMENSION at least m. Contains the vector y.
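Before the formal interfaces, here is a minimal C sketch of the double-precision variant. It assumes the layout described in the Diagonal Storage Scheme section: column k of val holds the diagonal at distance idiag(k), element i of that column corresponds to row i of the matrix, out-of-range positions are padded (the padding value does not matter), and the lval-by-ndiag array is flattened column by column as in Fortran. All numeric values are illustrative.

#include <stdio.h>
#include "mkl_spblas.h"

int main(void)
{
    /* Lower triangular matrix with two non-zero diagonals:
         | 4 0 0 |
         | 1 5 0 |
         | 0 2 6 |                                                      */
    int    m = 3, lval = 3, ndiag = 2;
    int    idiag[] = {-1, 0};          /* distances, sorted in increasing order */
    double val[]   = {0.0, 1.0, 2.0,   /* column for the sub-diagonal (first slot is padding) */
                      4.0, 5.0, 6.0};  /* column for the main diagonal */
    char   uplo = 'L', transa = 'N', diag = 'N';
    double x[] = {4.0, 11.0, 22.0};    /* right-hand side of A*y = x */
    double y[3];                       /* receives the solution (1, 2, 3) */

    mkl_ddiatrsv(&uplo, &transa, &diag, &m, val, &lval, idiag, &ndiag, x, y);
    printf("y = %.1f %.1f %.1f\n", y[0], y[1], y[2]);
    return 0;
}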
Interfaces FORTRAN 77: SUBROUTINE mkl_sdiatrsv(uplo, transa, diag, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, lval, ndiag INTEGER idiag(*) REAL val(lval,*), x(*), y(*) SUBROUTINE mkl_ddiatrsv(uplo, transa, diag, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, lval, ndiag INTEGER idiag(*) DOUBLE PRECISION val(lval,*), x(*), y(*) SUBROUTINE mkl_cdiatrsv(uplo, transa, diag, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, lval, ndiag INTEGER idiag(*) COMPLEX val(lval,*), x(*), y(*) SUBROUTINE mkl_zdiatrsv(uplo, transa, diag, m, val, lval, idiag, ndiag, x, y) CHARACTER*1 uplo, transa, diag INTEGER m, lval, ndiag INTEGER idiag(*) DOUBLE COMPLEX val(lval,*), x(*), y(*) C: void mkl_sdiatrsv(char *uplo, char *transa, char *diag, int *m, float *val, int *lval, int *idiag, int *ndiag, float *x, float *y); void mkl_ddiatrsv(char *uplo, char *transa, char *diag, int *m, double *val, int *lval, int *idiag, int *ndiag, double *x, double *y); void mkl_cdiatrsv(char *uplo, char *transa, char *diag, int *m, MKL_Complex8 *val, int *lval, int *idiag, int *ndiag, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_zdiatrsv(char *uplo, char *transa, char *diag, int *m, MKL_Complex16 *val, int *lval, int *idiag, int *ndiag, MKL_Complex16 *x, MKL_Complex16 *y); mkl_cspblas_?csrgemv Computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with zero-based indexing. Syntax Fortran: call mkl_cspblas_scsrgemv(transa, m, a, ia, ja, x, y) call mkl_cspblas_dcsrgemv(transa, m, a, ia, ja, x, y) call mkl_cspblas_ccsrgemv(transa, m, a, ia, ja, x, y) call mkl_cspblas_zcsrgemv(transa, m, a, ia, ja, x, y) C: mkl_cspblas_scsrgemv(&transa, &m, a, ia, ja, x, y); mkl_cspblas_dcsrgemv(&transa, &m, a, ia, ja, x, y); mkl_cspblas_ccsrgemv(&transa, &m, a, ia, ja, x, y); mkl_cspblas_zcsrgemv(&transa, &m, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_cspblas_?csrgemv routine performs a matrix-vector operation defined as y := A*x or y := A'*x, where: x and y are vectors, A is an m-by-m sparse square matrix in the CSR format (3-array variation) with zero-based indexing, A' is the transpose of A. NOTE This routine supports only zero-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. transa CHARACTER*1. Specifies the operation. If transa = 'N' or 'n', then the matrix-vector product is computed as y := A*x. If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is computed as y := A'*x. m INTEGER. Number of rows of the matrix A. a REAL for mkl_cspblas_scsrgemv. DOUBLE PRECISION for mkl_cspblas_dcsrgemv. COMPLEX for mkl_cspblas_ccsrgemv. DOUBLE COMPLEX for mkl_cspblas_zcsrgemv. Array containing non-zero elements of the matrix A. Its length is equal to the number of non-zero elements in the matrix A. Refer to values array description in Sparse Matrix Storage Formats for more details. ia INTEGER. Array of length m + 1, containing indices of elements in the array a, such that ia(I) is the index in the array a of the first non-zero element from the row I. The value of the last element ia(m) is equal to the number of non-zeros.
Refer to rowIndex array description in Sparse Matrix Storage Formats for more details. ja INTEGER. Array containing the column indices for each non-zero element of the matrix A. Its length is equal to the length of the array a. Refer to columns array description in Sparse Matrix Storage Formats for more details. x REAL for mkl_cspblas_scsrgemv. DOUBLE PRECISION for mkl_cspblas_dcsrgemv. COMPLEX for mkl_cspblas_ccsrgemv. DOUBLE COMPLEX for mkl_cspblas_zcsrgemv. Array, DIMENSION is m. On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_cspblas_scsrgemv. DOUBLE PRECISION for mkl_cspblas_dcsrgemv. COMPLEX for mkl_cspblas_ccsrgemv. DOUBLE COMPLEX for mkl_cspblas_zcsrgemv. Array, DIMENSION at least m. On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_cspblas_scsrgemv(transa, m, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) SUBROUTINE mkl_cspblas_dcsrgemv(transa, m, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) SUBROUTINE mkl_cspblas_ccsrgemv(transa, m, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) SUBROUTINE mkl_cspblas_zcsrgemv(transa, m, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_cspblas_scsrgemv(char *transa, int *m, float *a, int *ia, int *ja, float *x, float *y); void mkl_cspblas_dcsrgemv(char *transa, int *m, double *a, int *ia, int *ja, double *x, double *y); void mkl_cspblas_ccsrgemv(char *transa, int *m, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_cspblas_zcsrgemv(char *transa, int *m, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y);
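A minimal C sketch of the zero-based CSR matrix-vector product using mkl_cspblas_dcsrgemv follows; the 3-by-3 matrix and its arrays are illustrative values chosen for this example.

#include <stdio.h>
#include "mkl_spblas.h"

int main(void)
{
    /* General matrix (zero-based CSR, 3-array variation):
         | 1 2 0 |
         | 0 3 4 |
         | 5 0 6 |                                                          */
    double a[]  = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    int    ia[] = {0, 2, 4, 6};              /* row starts; last entry = number of non-zeros */
    int    ja[] = {0, 1, 1, 2, 0, 2};        /* zero-based column indices */
    int    m = 3;
    char   transa = 'N';
    double x[] = {1.0, 1.0, 1.0};
    double y[3];                             /* receives A*x = (3, 7, 11) */

    mkl_cspblas_dcsrgemv(&transa, &m, a, ia, ja, x, y);
    printf("y = %.1f %.1f %.1f\n", y[0], y[1], y[2]);
    return 0;
}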
mkl_cspblas_?bsrgemv Computes the matrix-vector product of a sparse general matrix stored in the BSR format (3-array variation) with zero-based indexing. Syntax Fortran: call mkl_cspblas_sbsrgemv(transa, m, lb, a, ia, ja, x, y) call mkl_cspblas_dbsrgemv(transa, m, lb, a, ia, ja, x, y) call mkl_cspblas_cbsrgemv(transa, m, lb, a, ia, ja, x, y) call mkl_cspblas_zbsrgemv(transa, m, lb, a, ia, ja, x, y) C: mkl_cspblas_sbsrgemv(&transa, &m, &lb, a, ia, ja, x, y); mkl_cspblas_dbsrgemv(&transa, &m, &lb, a, ia, ja, x, y); mkl_cspblas_cbsrgemv(&transa, &m, &lb, a, ia, ja, x, y); mkl_cspblas_zbsrgemv(&transa, &m, &lb, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_cspblas_?bsrgemv routine performs a matrix-vector operation defined as y := A*x or y := A'*x, where: x and y are vectors, A is an m-by-m block sparse square matrix in the BSR format (3-array variation) with zero-based indexing, A' is the transpose of A. NOTE This routine supports only zero-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. transa CHARACTER*1. Specifies the operation. If transa = 'N' or 'n', then the matrix-vector product is computed as y := A*x. If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is computed as y := A'*x. m INTEGER. Number of block rows of the matrix A. lb INTEGER. Size of the block in the matrix A. a REAL for mkl_cspblas_sbsrgemv. DOUBLE PRECISION for mkl_cspblas_dbsrgemv. COMPLEX for mkl_cspblas_cbsrgemv. DOUBLE COMPLEX for mkl_cspblas_zbsrgemv. Array containing elements of non-zero blocks of the matrix A. Its length is equal to the number of non-zero blocks in the matrix A multiplied by lb*lb. Refer to values array description in BSR Format for more details. ia INTEGER. Array of length (m + 1), containing indices of blocks in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m + 1) is equal to the number of non-zero blocks. Refer to rowIndex array description in BSR Format for more details. ja INTEGER. Array containing the column indices for each non-zero block in the matrix A. Its length is equal to the number of non-zero blocks of the matrix A. Refer to columns array description in BSR Format for more details. x REAL for mkl_cspblas_sbsrgemv. DOUBLE PRECISION for mkl_cspblas_dbsrgemv. COMPLEX for mkl_cspblas_cbsrgemv. DOUBLE COMPLEX for mkl_cspblas_zbsrgemv. Array, DIMENSION (m*lb). On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_cspblas_sbsrgemv. DOUBLE PRECISION for mkl_cspblas_dbsrgemv. COMPLEX for mkl_cspblas_cbsrgemv. DOUBLE COMPLEX for mkl_cspblas_zbsrgemv. Array, DIMENSION at least (m*lb). On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_cspblas_sbsrgemv(transa, m, lb, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m, lb INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) SUBROUTINE mkl_cspblas_dbsrgemv(transa, m, lb, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m, lb INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) SUBROUTINE mkl_cspblas_cbsrgemv(transa, m, lb, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m, lb INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) SUBROUTINE mkl_cspblas_zbsrgemv(transa, m, lb, a, ia, ja, x, y) CHARACTER*1 transa INTEGER m, lb INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_cspblas_sbsrgemv(char *transa, int *m, int *lb, float *a, int *ia, int *ja, float *x, float *y); void mkl_cspblas_dbsrgemv(char *transa, int *m, int *lb, double *a, int *ia, int *ja, double *x, double *y); void mkl_cspblas_cbsrgemv(char *transa, int *m, int *lb, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_cspblas_zbsrgemv(char *transa, int *m, int *lb, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y); mkl_cspblas_?coogemv Computes the matrix-vector product of a sparse general matrix stored in the coordinate format with zero-based indexing. Syntax Fortran: call mkl_cspblas_scoogemv(transa, m, val, rowind, colind, nnz, x, y) call mkl_cspblas_dcoogemv(transa, m, val, rowind, colind, nnz, x, y) call mkl_cspblas_ccoogemv(transa, m, val, rowind, colind, nnz, x, y) call mkl_cspblas_zcoogemv(transa, m, val, rowind, colind, nnz, x, y) C: mkl_cspblas_scoogemv(&transa, &m, val, rowind, colind, &nnz, x, y); mkl_cspblas_dcoogemv(&transa, &m, val, rowind, colind, &nnz, x, y); mkl_cspblas_ccoogemv(&transa, &m, val, rowind, colind, &nnz, x, y); mkl_cspblas_zcoogemv(&transa, &m, val, rowind, colind, &nnz, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_cspblas_?coogemv routine performs a matrix-vector operation defined as y := A*x or y := A'*x, where: x and y are vectors, A is an m-by-m sparse square matrix in the coordinate format with zero-based indexing, A' is the transpose of A.
NOTE This routine supports only zero-based indexing of the input arrays. BLAS and Sparse BLAS Routines 2 197 Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. transa CHARACTER*1. Specifies the operation. If transa = 'N' or 'n', then the matrix-vector product is computed as y := A*x If transa = 'T' or 't' or 'C' or 'c', then the matrix-vector product is computed as y := A'*x. m INTEGER. Number of rows of the matrix A. val REAL for mkl_cspblas_scoogemv. DOUBLE PRECISION for mkl_cspblas_dcoogemv. COMPLEX for mkl_cspblas_ccoogemv. DOUBLE COMPLEX for mkl_cspblas_zcoogemv. Array of length nnz, contains non-zero elements of the matrix A in the arbitrary order. Refer to values array description in Coordinate Format for more details. rowind INTEGER. Array of length nnz, contains the row indices for each non-zero element of the matrix A. Refer to rows array description in Coordinate Format for more details. colind INTEGER. Array of length nnz, contains the column indices for each nonzero element of the matrix A. Refer to columns array description in Coordinate Format for more details. nnz INTEGER. Specifies the number of non-zero element of the matrix A. Refer to nnz description in Coordinate Format for more details. x REAL for mkl_cspblas_scoogemv. DOUBLE PRECISION for mkl_cspblas_dcoogemv. COMPLEX for mkl_cspblas_ccoogemv. DOUBLE COMPLEX for mkl_cspblas_zcoogemv. Array, DIMENSION is m. On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_cspblas_scoogemv. DOUBLE PRECISION for mkl_cspblas_dcoogemv. COMPLEX for mkl_cspblas_ccoogemv. DOUBLE COMPLEX for mkl_cspblas_zcoogemv. Array, DIMENSION at least m. On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_cspblas_scoogemv(transa, m, val, rowind, colind, nnz, x, y) CHARACTER*1 transa INTEGER m, nnz INTEGER rowind(*), colind(*) REAL val(*), x(*), y(*) 2 Intel® Math Kernel Library Reference Manual 198 SUBROUTINE mkl_cspblas_dcoogemv(transa, m, val, rowind, colind, nnz, x, y) CHARACTER*1 transa INTEGER m, nnz INTEGER rowind(*), colind(*) DOUBLE PRECISION val(*), x(*), y(*) SUBROUTINE mkl_cspblas_ccoogemv(transa, m, val, rowind, colind, nnz, x, y) CHARACTER*1 transa INTEGER m, nnz INTEGER rowind(*), colind(*) COMPLEX val(*), x(*), y(*) SUBROUTINE mkl_cspblas_zcoogemv(transa, m, val, rowind, colind, nnz, x, y) CHARACTER*1 transa INTEGER m, nnz INTEGER rowind(*), colind(*) DOUBLE COMPLEX val(*), x(*), y(*) C: void mkl_cspblas_scoogemv(char *transa, int *m, float *val, int *rowind, int *colind, int *nnz, float *x, float *y); void mkl_cspblas_dcoogemv(char *transa, int *m, double *val, int *rowind, int *colind, int *nnz, double *x, double *y); void mkl_cspblas_ccoogemv(char *transa, int *m, MKL_Complex8 *val, int *rowind, int *colind, int *nnz, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_cspblas_zcoogemv(char *transa, int *m, MKL_Complex16 *val, int *rowind, int *colind, int *nnz, MKL_Complex16 *x, MKL_Complex16 *y); mkl_cspblas_?csrsymv Computes matrix-vector product of a sparse symmetrical matrix stored in the CSR format (3-array variation) with zero-based indexing. 
Syntax Fortran: call mkl_cspblas_scsrsymv(uplo, m, a, ia, ja, x, y) call mkl_cspblas_dcsrsymv(uplo, m, a, ia, ja, x, y) call mkl_cspblas_ccsrsymv(uplo, m, a, ia, ja, x, y) call mkl_cspblas_zcsrsymv(uplo, m, a, ia, ja, x, y) C: mkl_cspblas_scsrsymv(&uplo, &m, a, ia, ja, x, y); mkl_cspblas_dcsrsymv(&uplo, &m, a, ia, ja, x, y); BLAS and Sparse BLAS Routines 2 199 mkl_cspblas_ccsrsymv(&uplo, &m, a, ia, ja, x, y); mkl_cspblas_zcsrsymv(&uplo, &m, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_cspblas_?csrsymv routine performs a matrix-vector operation defined as y := A*x where: x and y are vectors, A is an upper or lower triangle of the symmetrical sparse matrix in the CSR format (3-array variation) with zero-based indexing. NOTE This routine supports only zero-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is used. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. m INTEGER. Number of rows of the matrix A. a REAL for mkl_cspblas_scsrsymv. DOUBLE PRECISION for mkl_cspblas_dcsrsymv. COMPLEX for mkl_cspblas_ccsrsymv. DOUBLE COMPLEX for mkl_cspblas_zcsrsymv. Array containing non-zero elements of the matrix A. Its length is equal to the number of non-zero elements in the matrix A. Refer to values array description in Sparse Matrix Storage Formats for more details. ia INTEGER. Array of length m + 1, containing indices of elements in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m + 1) is equal to the number of non-zeros. Refer to rowIndex array description in Sparse Matrix Storage Formats for more details. ja INTEGER. Array containing the column indices for each non-zero element of the matrix A. Its length is equal to the length of the array a. Refer to columns array description in Sparse Matrix Storage Formats for more details. x REAL for mkl_cspblas_scsrsymv. DOUBLE PRECISION for mkl_cspblas_dcsrsymv. COMPLEX for mkl_cspblas_ccsrsymv. 2 Intel® Math Kernel Library Reference Manual 200 DOUBLE COMPLEX for mkl_cspblas_zcsrsymv. Array, DIMENSION is m. On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_cspblas_scsrsymv. DOUBLE PRECISION for mkl_cspblas_dcsrsymv. COMPLEX for mkl_cspblas_ccsrsymv. DOUBLE COMPLEX for mkl_cspblas_zcsrsymv. Array, DIMENSION at least m. On exit, the array y must contain the vector y. 
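Before the formal interfaces, the sketch below calls the double-precision variant mkl_cspblas_dcsrsymv with only the upper triangle of a symmetric 3-by-3 matrix stored in zero-based CSR form; all numeric values are illustrative, and the int prototypes shown in the Interfaces subsection (LP64) are assumed.

#include <stdio.h>
#include "mkl_spblas.h"

int main(void)
{
    /* Symmetric matrix, upper triangle stored (zero-based CSR):
         | 2 1 0 |
         | 1 3 4 |
         | 0 4 5 |                                                          */
    double a[]  = {2.0, 1.0, 3.0, 4.0, 5.0}; /* upper-triangular non-zeros, row by row */
    int    ia[] = {0, 2, 4, 5};              /* row starts; last entry = stored non-zeros */
    int    ja[] = {0, 1, 1, 2, 2};           /* zero-based column indices */
    int    m = 3;
    char   uplo = 'U';
    double x[] = {1.0, 1.0, 1.0};
    double y[3];                             /* receives A*x = (3, 8, 9) */

    mkl_cspblas_dcsrsymv(&uplo, &m, a, ia, ja, x, y);
    printf("y = %.1f %.1f %.1f\n", y[0], y[1], y[2]);
    return 0;
}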
Interfaces FORTRAN 77: SUBROUTINE mkl_cspblas_scsrsymv(uplo, m, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) SUBROUTINE mkl_cspblas_dcsrsymv(uplo, m, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) SUBROUTINE mkl_cspblas_ccsrsymv(uplo, m, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) SUBROUTINE mkl_cspblas_zcsrsymv(uplo, m, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_cspblas_scsrsymv(char *uplo, int *m, float *a, int *ia, int *ja, float *x, float *y); void mkl_cspblas_dcsrsymv(char *uplo, int *m, double *a, int *ia, int *ja, double *x, double *y); BLAS and Sparse BLAS Routines 2 201 void mkl_cspblas_ccsrsymv(char *uplo, int *m, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_cspblas_zcsrsymv(char *uplo, int *m, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y); mkl_cspblas_?bsrsymv Computes matrix-vector product of a sparse symmetrical matrix stored in the BSR format (3-arrays variation) with zero-based indexing. Syntax Fortran: call mkl_cspblas_sbsrsymv(uplo, m, lb, a, ia, ja, x, y) call mkl_cspblas_dbsrsymv(uplo, m, lb, a, ia, ja, x, y) call mkl_cspblas_cbsrsymv(uplo, m, lb, a, ia, ja, x, y) call mkl_cspblas_zbsrsymv(uplo, m, lb, a, ia, ja, x, y) C: mkl_cspblas_sbsrsymv(&uplo, &m, &lb, a, ia, ja, x, y); mkl_cspblas_dbsrsymv(&uplo, &m, &lb, a, ia, ja, x, y); mkl_cspblas_cbsrsymv(&uplo, &m, &lb, a, ia, ja, x, y); mkl_cspblas_zbsrsymv(&uplo, &m, &lb, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_cspblas_?bsrsymv routine performs a matrix-vector operation defined as y := A*x where: x and y are vectors, A is an upper or lower triangle of the symmetrical sparse matrix in the BSR format (3-array variation) with zero-based indexing. NOTE This routine supports only zero-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. 2 Intel® Math Kernel Library Reference Manual 202 uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is used. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. m INTEGER. Number of block rows of the matrix A. lb INTEGER. Size of the block in the matrix A. a REAL for mkl_cspblas_sbsrsymv. DOUBLE PRECISION for mkl_cspblas_dbsrsymv. COMPLEX for mkl_cspblas_cbsrsymv. DOUBLE COMPLEX for mkl_cspblas_zbsrsymv. Array containing elements of non-zero blocks of the matrix A. Its length is equal to the number of non-zero blocks in the matrix A multiplied by lb*lb. Refer to values array description in BSR Format for more details. ia INTEGER. Array of length (m + 1), containing indices of block in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m + 1) is equal to the number of non-zero blocks plus one. Refer to rowIndex array description in BSR Format for more details. ja INTEGER. Array containing the column indices for each non-zero block in the matrix A. Its length is equal to the number of non-zero blocks of the matrix A. 
Refer to columns array description in BSR Format for more details. x REAL for mkl_cspblas_sbsrsymv. DOUBLE PRECISION for mkl_cspblas_dbsrsymv. COMPLEX for mkl_cspblas_cbsrsymv. DOUBLE COMPLEX for mkl_cspblas_zbsrsymv. Array, DIMENSION (m*lb). On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_cspblas_sbsrsymv. DOUBLE PRECISION for mkl_cspblas_dbsrsymv. COMPLEX for mkl_cspblas_cbsrsymv. DOUBLE COMPLEX for mkl_cspblas_zbsrsymv. Array, DIMENSION at least (m*lb). On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_cspblas_sbsrsymv(uplo, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m, lb INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) BLAS and Sparse BLAS Routines 2 203 SUBROUTINE mkl_cspblas_dbsrsymv(uplo, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m, lb INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) SUBROUTINE mkl_cspblas_cbsrsymv(uplo, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m, lb INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) SUBROUTINE mkl_cspblas_zbsrsymv(uplo, m, lb, a, ia, ja, x, y) CHARACTER*1 uplo INTEGER m, lb INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_cspblas_sbsrsymv(char *uplo, int *m, int *lb, float *a, int *ia, int *ja, float *x, float *y); void mkl_cspblas_dbsrsymv(char *uplo, int *m, int *lb, double *a, int *ia, int *ja, double *x, double *y); void mkl_cspblas_cbsrsymv(char *uplo, int *m, int *lb, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_cspblas_zbsrsymv(char *uplo, int *m, int *lb, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y); mkl_cspblas_?coosymv Computes matrix - vector product of a sparse symmetrical matrix stored in the coordinate format with zero-based indexing . Syntax Fortran: call mkl_cspblas_scoosymv(uplo, m, val, rowind, colind, nnz, x, y) call mkl_cspblas_dcoosymv(uplo, m, val, rowind, colind, nnz, x, y) call mkl_cspblas_ccoosymv(uplo, m, val, rowind, colind, nnz, x, y) call mkl_cspblas_zcoosymv(uplo, m, val, rowind, colind, nnz, x, y) C: mkl_cspblas_scoosymv(&uplo, &m, val, rowind, colind, &nnz, x, y); mkl_cspblas_dcoosymv(&uplo, &m, val, rowind, colind, &nnz, x, y); 2 Intel® Math Kernel Library Reference Manual 204 mkl_cspblas_ccoosymv(&uplo, &m, val, rowind, colind, &nnz, x, y); mkl_cspblas_zcoosymv(&uplo, &m, val, rowind, colind, &nnz, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_cspblas_?coosymv routine performs a matrix-vector operation defined as y := A*x where: x and y are vectors, A is an upper or lower triangle of the symmetrical sparse matrix in the coordinate format with zero-based indexing. NOTE This routine supports only zero-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is used. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. m INTEGER. Number of rows of the matrix A. val REAL for mkl_cspblas_scoosymv. DOUBLE PRECISION for mkl_cspblas_dcoosymv. COMPLEX for mkl_cspblas_ccoosymv. DOUBLE COMPLEX for mkl_cspblas_zcoosymv. Array of length nnz, contains non-zero elements of the matrix A in the arbitrary order. 
Refer to values array description in Coordinate Format for more details. rowind INTEGER. Array of length nnz, contains the row indices for each non-zero element of the matrix A. Refer to rows array description in Coordinate Format for more details. colind INTEGER. Array of length nnz, contains the column indices for each nonzero element of the matrix A. Refer to columns array description in Coordinate Format for more details. nnz INTEGER. Specifies the number of non-zero element of the matrix A. Refer to nnz description in Coordinate Format for more details. x REAL for mkl_cspblas_scoosymv. DOUBLE PRECISION for mkl_cspblas_dcoosymv. COMPLEX for mkl_cspblas_ccoosymv. DOUBLE COMPLEX for mkl_cspblas_zcoosymv. Array, DIMENSION is m. On entry, the array x must contain the vector x. BLAS and Sparse BLAS Routines 2 205 Output Parameters y REAL for mkl_cspblas_scoosymv. DOUBLE PRECISION for mkl_cspblas_dcoosymv. COMPLEX for mkl_cspblas_ccoosymv. DOUBLE COMPLEX for mkl_cspblas_zcoosymv. Array, DIMENSION at least m. On exit, the array y must contain the vector y. Interfaces FORTRAN 77: SUBROUTINE mkl_cspblas_scoosymv(uplo, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo INTEGER m, nnz INTEGER rowind(*), colind(*) REAL val(*), x(*), y(*) SUBROUTINE mkl_cspblas_dcoosymv(uplo, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo INTEGER m, nnz INTEGER rowind(*), colind(*) DOUBLE PRECISION val(*), x(*), y(*) SUBROUTINE mkl_cspblas_ccoosymv(uplo, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo INTEGER m, nnz INTEGER rowind(*), colind(*) COMPLEX val(*), x(*), y(*) SUBROUTINE mkl_cspblas_zcoosymv(uplo, m, val, rowind, colind, nnz, x, y) CHARACTER*1 uplo INTEGER m, nnz INTEGER rowind(*), colind(*) DOUBLE COMPLEX val(*), x(*), y(*) C: void mkl_cspblas_scoosymv(char *uplo, int *m, float *val, int *rowind, int *colind, int *nnz, float *x, float *y); void mkl_cspblas_dcoosymv(char *uplo, int *m, double *val, int *rowind, int *colind, int *nnz, double *x, double *y); void mkl_cspblas_ccoosymv(char *uplo, int *m, MKL_Complex8 *val, int *rowind, int *colind, int *nnz, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_cspblas_zcoosymv(char *uplo, int *m, MKL_Complex16 *val, int *rowind, int *colind, int *nnz, MKL_Complex16 *x, MKL_Complex16 *y); 2 Intel® Math Kernel Library Reference Manual 206 mkl_cspblas_?csrtrsv Triangular solvers with simplified interface for a sparse matrix in the CSR format (3-array variation) with zero-based indexing. Syntax Fortran: call mkl_cspblas_scsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) call mkl_cspblas_dcsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) call mkl_cspblas_ccsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) call mkl_cspblas_zcsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) C: mkl_cspblas_scsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y); mkl_cspblas_dcsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y); mkl_cspblas_ccsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y); mkl_cspblas_zcsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_cspblas_?csrtrsv routine solves a system of linear equations with matrix-vector operations for a sparse matrix stored in the CSR format (3-array variation) with zero-based indexing: A*y = x or A'*y = x, where: x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A' is the transpose of A. NOTE This routine supports only zero-based indexing of the input arrays. 
Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is used. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. transa CHARACTER*1. Specifies the system of linear equations. If transa = 'N' or 'n', then A*y = x BLAS and Sparse BLAS Routines 2 207 If transa = 'T' or 't' or 'C' or 'c', then A'*y = x, diag CHARACTER*1. Specifies whether matrix A is unit triangular. If diag = 'U' or 'u', then A is unit triangular. If diag = 'N' or 'n', then A is not unit triangular. m INTEGER. Number of rows of the matrix A. a REAL for mkl_cspblas_scsrtrsv. DOUBLE PRECISION for mkl_cspblas_dcsrtrsv. COMPLEX for mkl_cspblas_ccsrtrsv. DOUBLE COMPLEX for mkl_cspblas_zcsrtrsv. Array containing non-zero elements of the matrix A. Its length is equal to the number of non-zero elements in the matrix A. Refer to values array description in Sparse Matrix Storage Formats for more details. NOTE The non-zero elements of the given row of the matrix must be stored in the same order as they appear in the row (from left to right). No diagonal element can be omitted from a sparse storage if the solver is called with the non-unit indicator. ia INTEGER. Array of length m+1, containing indices of elements in the array a, such that ia(i) is the index in the array a of the first non-zero element from the row i. The value of the last element ia(m) is equal to the number of non-zeros. Refer to rowIndex array description in Sparse Matrix Storage Formats for more details. ja INTEGER. Array containing the column indices for each non-zero element of the matrix A. Its length is equal to the length of the array a. Refer to columns array description in Sparse Matrix Storage Formats for more details. NOTE Column indices must be sorted in increasing order for each row. x REAL for mkl_cspblas_scsrtrsv. DOUBLE PRECISION for mkl_cspblas_dcsrtrsv. COMPLEX for mkl_cspblas_ccsrtrsv. DOUBLE COMPLEX for mkl_cspblas_zcsrtrsv. Array, DIMENSION is m. On entry, the array x must contain the vector x. Output Parameters y REAL for mkl_cspblas_scsrtrsv. DOUBLE PRECISION for mkl_cspblas_dcsrtrsv. COMPLEX for mkl_cspblas_ccsrtrsv. DOUBLE COMPLEX for mkl_cspblas_zcsrtrsv. Array, DIMENSION at least m. Contains the vector y. 
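Before the formal interfaces, a minimal C sketch of the zero-based triangular solver mkl_cspblas_dcsrtrsv follows; the matrix, its zero-based CSR arrays, and the right-hand side are illustrative values chosen for this example.

#include <stdio.h>
#include "mkl_spblas.h"

int main(void)
{
    /* Lower triangular matrix (zero-based CSR, 3-array variation):
         | 1 0 0 |
         | 2 3 0 |
         | 0 4 5 |                                                          */
    double a[]  = {1.0, 2.0, 3.0, 4.0, 5.0};
    int    ia[] = {0, 1, 3, 5};              /* row starts; last entry = number of non-zeros */
    int    ja[] = {0, 0, 1, 1, 2};           /* zero-based column indices, sorted per row */
    int    m = 3;
    char   uplo = 'L', transa = 'N', diag = 'N';
    double x[] = {1.0, 5.0, 9.0};            /* right-hand side of A*y = x */
    double y[3];                             /* receives the solution (1, 1, 1) */

    mkl_cspblas_dcsrtrsv(&uplo, &transa, &diag, &m, a, ia, ja, x, y);
    printf("y = %.1f %.1f %.1f\n", y[0], y[1], y[2]);
    return 0;
}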
Interfaces FORTRAN 77: SUBROUTINE mkl_cspblas_scsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m INTEGER ia(*), ja(*) REAL a(*), x(*), y(*) SUBROUTINE mkl_cspblas_dcsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m INTEGER ia(*), ja(*) DOUBLE PRECISION a(*), x(*), y(*) SUBROUTINE mkl_cspblas_ccsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m INTEGER ia(*), ja(*) COMPLEX a(*), x(*), y(*) SUBROUTINE mkl_cspblas_zcsrtrsv(uplo, transa, diag, m, a, ia, ja, x, y) CHARACTER*1 uplo, transa, diag INTEGER m INTEGER ia(*), ja(*) DOUBLE COMPLEX a(*), x(*), y(*) C: void mkl_cspblas_scsrtrsv(char *uplo, char *transa, char *diag, int *m, float *a, int *ia, int *ja, float *x, float *y); void mkl_cspblas_dcsrtrsv(char *uplo, char *transa, char *diag, int *m, double *a, int *ia, int *ja, double *x, double *y); void mkl_cspblas_ccsrtrsv(char *uplo, char *transa, char *diag, int *m, MKL_Complex8 *a, int *ia, int *ja, MKL_Complex8 *x, MKL_Complex8 *y); void mkl_cspblas_zcsrtrsv(char *uplo, char *transa, char *diag, int *m, MKL_Complex16 *a, int *ia, int *ja, MKL_Complex16 *x, MKL_Complex16 *y); mkl_cspblas_?bsrtrsv Triangular solver with simplified interface for a sparse matrix stored in the BSR format (3-array variation) with zero-based indexing. Syntax Fortran: call mkl_cspblas_sbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) call mkl_cspblas_dbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) call mkl_cspblas_cbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) call mkl_cspblas_zbsrtrsv(uplo, transa, diag, m, lb, a, ia, ja, x, y) C: mkl_cspblas_sbsrtrsv(&uplo, &transa, &diag, &m, &lb, a, ia, ja, x, y); mkl_cspblas_dbsrtrsv(&uplo, &transa, &diag, &m, &lb, a, ia, ja, x, y); mkl_cspblas_cbsrtrsv(&uplo, &transa, &diag, &m, &lb, a, ia, ja, x, y); mkl_cspblas_zbsrtrsv(&uplo, &transa, &diag, &m, &lb, a, ia, ja, x, y); Include Files • FORTRAN 77: mkl_spblas.fi • C: mkl_spblas.h Description The mkl_cspblas_?bsrtrsv routine solves a system of linear equations with matrix-vector operations for a sparse matrix stored in the BSR format (3-array variation) with zero-based indexing: A*y = x or A'*y = x, where: x and y are vectors, A is a sparse upper or lower triangular matrix with unit or non-unit main diagonal, A' is the transpose of A. NOTE This routine supports only zero-based indexing of the input arrays. Input Parameters Parameter descriptions are common for all implemented interfaces with the exception of data types that refer here to the FORTRAN 77 standard types. Data types specific to the different interfaces are described in the section "Interfaces" below. uplo CHARACTER*1. Specifies whether the upper or low triangle of the matrix A is used. If uplo = 'U' or 'u', then the upper triangle of the matrix A is used. If uplo = 'L' or 'l', then the low triangle of the matrix A is used. transa CHARACTER*1. Specifies the system of linear equations. If transa = 'N' or 'n', then A*y = x. If transa = 'T' or 't' or 'C' or 'c', then A'*y = x. diag CHARACTER*1. Specifies whether matrix A is unit triangular or not. If diag = 'U' or 'u', A is unit triangular. If diag = 'N' or 'n', A is not unit triangular. m INTEGER. Number of block rows of the matrix A. lb INTEGER. Size of the block in the matrix A.
a REAL for mkl_cspblas_sbsrtrsv. DOUBLE PRECISION for mkl_cspblas_dbsrtrsv. COMPLEX for mkl_cspblas_cbsrtrsv. DOUBLE COMPLEX for mkl_cspblas_zbsrtrsv. Array containing elements of non-zero blocks of the matrix A. Its length is equal to the number of non-zero blocks in the matrix A multiplied by lb*lb. Refer to values array description in BSR Format for more details. NOTE The non-zero elements of the given row of the matrix must be stored in the same order as they appear in the row (from left to right). No diagonal element can be omitted from a sparse storage if the solver is called with the non-unit indicator. ia INTEGER. Array of length (m + 1), containing indices of blocks in the array a, such that ia(I) is the index in the array a of the first non-zero element from the row I. The value of the last element ia(m + 1) is equal to the number of non-zero blocks. Refer to rowIndex array description in BSR Format for more details.
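To close the section, here is a minimal C sketch of the zero-based BSR solver mkl_cspblas_dbsrtrsv. It assumes the zero-based BSR convention from the BSR Format section (non-zero blocks stored block by block in block-row order, with the elements inside each block in row-major order for zero-based indexing); the 4-by-4 matrix, its 2-by-2 blocks, and the right-hand side are illustrative values only.

#include <stdio.h>
#include "mkl_spblas.h"

int main(void)
{
    /* Lower triangular 4-by-4 matrix viewed as 2-by-2 blocks of size lb = 2:
         | 1  0 | 0  0 |
         | 2  3 | 0  0 |
         |------+------|
         | 4  5 | 8  0 |
         | 6  7 | 9 10 |                                                     */
    int    m = 2, lb = 2;                 /* 2 block rows, block size 2 */
    double a[]  = {1.0, 0.0, 2.0, 3.0,    /* block (0,0), row-major inside the block */
                   4.0, 5.0, 6.0, 7.0,    /* block (1,0) */
                   8.0, 0.0, 9.0, 10.0};  /* block (1,1) */
    int    ia[] = {0, 1, 3};              /* block-row starts; last entry = non-zero blocks */
    int    ja[] = {0, 0, 1};              /* zero-based block column indices */
    char   uplo = 'L', transa = 'N', diag = 'N';
    double x[] = {1.0, 5.0, 17.0, 32.0};  /* right-hand side of A*y = x */
    double y[4];                          /* receives the solution (1, 1, 1, 1) */

    mkl_cspblas_dbsrtrsv(&uplo, &transa, &diag, &m, &lb, a, ia, ja, x, y);
    printf("y = %.1f %.1f %.1f %.1f\n", y[0], y[1], y[2], y[3]);
    return 0;
}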

Notice revision #20110804 Intel® Math Kernel Library for Linux* OS User's Guide 8Introducing the Intel® Math Kernel Library The Intel ® Math Kernel Library (Intel ® MKL) improves performance of scientific, engineering, and financial software that solves large computational problems. Among other functionality, Intel MKL provides linear algebra routines, fast Fourier transforms, as well as vectorized math and random number generation functions, all optimized for the latest Intel processors, including processors with multiple cores (see the Intel ® MKL Release Notes for the full list of supported processors). Intel MKL also performs well on non-Intel processors. Intel MKL is thread-safe and extensively threaded using the OpenMP* technology. Intel MKL provides the following major functionality: • Linear algebra, implemented in LAPACK (solvers and eigensolvers) plus level 1, 2, and 3 BLAS, offering the vector, vector-matrix, and matrix-matrix operations needed for complex mathematical software. If you prefer the FORTRAN 90/95 programming language, you can call LAPACK driver and computational subroutines through specially designed interfaces with reduced numbers of arguments. A C interface to LAPACK is also available. • ScaLAPACK (SCAlable LAPACK) with its support functionality including the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS). ScaLAPACK is available for Intel MKL for Linux* and Windows* operating systems. • Direct sparse solver, an iterative sparse solver, and a supporting set of sparse BLAS (level 1, 2, and 3) for solving sparse systems of equations. • Multidimensional discrete Fourier transforms (1D, 2D, 3D) with a mixed radix support (for sizes not limited to powers of 2). Distributed versions of these functions are provided for use on clusters on the Linux* and Windows* operating systems. • A set of vectorized transcendental functions called the Vector Math Library (VML). For most of the supported processors, the Intel MKL VML functions offer greater performance than the libm (scalar) functions, while keeping the same high accuracy. • The Vector Statistical Library (VSL), which offers high performance vectorized random number generators for several probability distributions, convolution and correlation routines, and summary statistics functions. • Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search. For details see the Intel® MKL Reference Manual. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. 
Notice revision #20110804 9 Intel® Math Kernel Library for Linux* OS User's Guide 10Getting Help and Support Intel provides a support web site that contains a rich repository of self help information, including getting started tips, known product issues, product errata, license information, user forums, and more. Visit the Intel MKL support website at http://www.intel.com/software/products/support/. The Intel MKL documentation integrates into the Eclipse* integrated development environment (IDE). See Getting Assistance for Programming in the Eclipse* IDE . 11 Intel® Math Kernel Library for Linux* OS User's Guide 12Notational Conventions The following term is used in reference to the operating system. Linux* OS This term refers to information that is valid on all supported Linux* operating systems. The following notations are used to refer to Intel MKL directories. The installation directory for the Intel® C++ Composer XE or Intel® Fortran Composer XE . The main directory where Intel MKL is installed: =/mkl. Replace this placeholder with the specific pathname in the configuring, linking, and building instructions. The following font conventions are used in this document. Italic Italic is used for emphasis and also indicates document names in body text, for example: see Intel MKL Reference Manual. Monospace lowercase Indicates filenames, directory names, and pathnames, for example: ./benchmarks/ linpack Monospace lowercase mixed with uppercase Indicates: • Commands and command-line options, for example, icc myprog.c -L$MKLPATH -I$MKLINCLUDE -lmkl -liomp5 -lpthread • C/C++ code fragments, for example, a = new double [SIZE*SIZE]; UPPERCASE MONOSPACE Indicates system variables, for example, $MKLPATH. Monospace italic Indicates a parameter in discussions, for example, lda. When enclosed in angle brackets, indicates a placeholder for an identifier, an expression, a string, a symbol, or a value, for example, . Substitute one of these items for the placeholder. [ items ] Square brackets indicate that the items enclosed in brackets are optional. { item | item } Braces indicate that only one of the items listed between braces should be selected. A vertical bar ( | ) separates the items. 13 Intel® Math Kernel Library for Linux* OS User's Guide 14Overview 1 Document Overview The Intel® Math Kernel Library (Intel® MKL) User's Guide provides usage information for the library. The usage information covers the organization, configuration, performance, and accuracy of Intel MKL, specifics of routine calls in mixed-language programming, linking, and more. This guide describes OS-specific usage of Intel MKL, along with OS-independent features. The document contains usage information for all Intel MKL function domains. This User's Guide provides the following information: • Describes post-installation steps to help you start using the library • Shows you how to configure the library with your development environment • Acquaints you with the library structure • Explains how to link your application with the library and provides simple usage scenarios • Describes how to code, compile, and run your application with Intel MKL This guide is intended for Linux OS programmers with beginner to advanced experience in software development. See Also Language Interfaces Support, by Function Domain What's New This User's Guide documents the Intel® Math Kernel Library (Intel® MKL) 10.3 Update 8. The document was updated to reflect addition of Data Fitting Functions to the product. 
Related Information To reference how to use the library in your application, use this guide in conjunction with the following documents: • The Intel® Math Kernel Library Reference Manual, which provides reference information on routine functionalities, parameter descriptions, interfaces, calling syntaxes, and return values. • The Intel® Math Kernel Library for Linux* OS Release Notes. 151 Intel® Math Kernel Library for Linux* OS User's Guide 16Getting Started 2 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Checking Your Installation After installing the Intel® Math Kernel Library (Intel® MKL), verify that the library is properly installed and configured: 1. Intel MKL installs in . Check that the subdirectory of referred to as was created. 2. If you want to keep multiple versions of Intel MKL installed on your system, update your build scripts to point to the correct Intel MKL version. 3. Check that the following files appear in the /bin directory and its subdirectories: mklvars.sh mklvars.csh ia32/mklvars_ia32.sh ia32/mklvars_ia32.csh intel64/mklvars_intel64.sh intel64/mklvars_intel64.csh Use these files to assign Intel MKL-specific values to several environment variables, as explained in Setting Environment Variables 4. To understand how the Intel MKL directories are structured, see Intel® Math Kernel Library Structure. 5. To make sure that Intel MKL runs on your system, launch an Intel MKL example, as explained in Using Code Examples. See Also Notational Conventions Setting Environment Variables See Also Setting the Number of Threads Using an OpenMP* Environment Variable 17Scripts to Set Environment Variables When the installation of Intel MKL for Linux* OS is complete, set the INCLUDE, MKLROOT, LD_LIBRARY_PATH, MANPATH, LIBRARY_PATH, CPATH, FPATH, and NLSPATH environment variables in the command shell using one of the script files in the bin subdirectory of the Intel MKL installation directory. Choose the script corresponding to your system architecture and command shell as explained in the following table: Architecture Shell Script File IA-32 C ia32/mklvars_ia32.csh IA-32 Bash and Bourne (sh) ia32/mklvars_ia32.sh Intel® 64 C intel64/mklvars_intel64.csh Intel® 64 Bash and Bourne (sh) intel64/mklvars_intel64.sh IA-32 and Intel® 64 C mklvars.csh IA-32 and Intel® 64 Bash and Bourne (sh) mklvars.sh Running the Scripts The scripts accept parameters to specify the following: • Architecture. • Addition of a path to Fortran 95 modules precompiled with the Intel ® Fortran compiler to the FPATH environment variable. Supply this parameter only if you are using the Intel ® Fortran compiler. • Interface of the Fortran 95 modules. This parameter is needed only if you requested addition of a path to the modules. 
Usage and values of these parameters depend on the scriptname (regardless of the extension). The following table lists values of the script parameters. Script Architecture (required, when applicable) Addition of a Path to Fortran 95 Modules (optional) Interface (optional) mklvars_ia32 n/a † mod n/a mklvars_intel64 n/a mod lp64, default ilp64 mklvars ia32 intel64 mod lp64, default ilp64 † Not applicable. For example: • The command mklvars_ia32.sh sets environment variables for the IA-32 architecture and adds no path to the Fortran 95 modules. • The command mklvars_intel64.sh mod ilp64 sets environment variables for the Intel ® 64 architecture and adds the path to the Fortran 95 modules for the ILP64 interface to the FPATH environment variable. • The command mklvars.sh intel64 mod 2 Intel® Math Kernel Library for Linux* OS User's Guide 18sets environment variables for the Intel ® 64 architecture and adds the path to the Fortran 95 modules for the LP64 interface to the FPATH environment variable. NOTE Supply the parameter specifying the architecture first, if it is needed. Values of the other two parameters can be listed in any order. See Also High-level Directory Structure Interface Libraries and Modules Fortran 95 Interfaces to LAPACK and BLAS Setting the Number of Threads Using an OpenMP* Environment Variable Automating the Process of Setting Environment Variables To automate setting of the INCLUDE, MKLROOT, LD_LIBRARY_PATH, MANPATH, LIBRARY_PATH, CPATH, FPATH, and NLSPATH environment variables, add mklvars*.*sh to your shell profile so that each time you login, the script automatically executes and sets the paths to the appropriate Intel MKL directories. To do this, with a local user account, edit the following files by adding the appropriate script to the path manipulation section right before exporting variables: Shell Files Commands bash ~/.bash_profile, ~/.bash_login or ~/.profile # setting up MKL environment for bash . /bin [/]/mklvars[].sh [] [mod] [lp64|ilp64] sh ~/.profile # setting up MKL environment for sh . /bin [/]/mklvars[].sh [] [mod] [lp64|ilp64] csh ~/.login # setting up MKL environment for sh . /bin [/]/mklvars[].csh [] [mod] [lp64|ilp64] In the above commands, replace with ia32 or intel64. If you have super user permissions, add the same commands to a general-system file in /etc/profile (for bash and sh) or in /etc/csh.login (for csh). CAUTION Before uninstalling Intel MKL, remove the above commands from all profile files where the script execution was added. Otherwise you may experience problems logging in. See Also Scripts to Set Environment Variables Compiler Support Intel MKL supports compilers identified in the Release Notes. However, the library has been successfully used with other compilers as well. Intel MKL provides a set of include files to simplify program development by specifying enumerated values and prototypes for the respective functions. Calling Intel MKL functions from your application without an appropriate include file may lead to incorrect behavior of the functions. Getting Started 2 19See Also Include Files Using Code Examples The Intel MKL package includes code examples, located in the examples subdirectory of the installation directory. Use the examples to determine: • Whether Intel MKL is working on your system • How you should call the library • How to link the library The examples are grouped in subdirectories mainly by Intel MKL function domains and programming languages. 
For example, the examples/spblas subdirectory contains a makefile to build the Sparse BLAS examples and the examples/vmlc subdirectory contains the makefile to build the C VML examples. Source code for the examples is in the next-level sources subdirectory. See Also High-level Directory Structure What You Need to Know Before You Begin Using the Intel® Math Kernel Library Target platform Identify the architecture of your target machine: • IA-32 or compatible • Intel® 64 or compatible Reason: Because Intel MKL libraries are located in directories corresponding to your particular architecture (see Architecture Support), you should provide proper paths on your link lines (see Linking Examples). To configure your development environment for the use with Intel MKL, set your environment variables using the script corresponding to your architecture (see Setting Environment Variables for details). Mathematical problem Identify all Intel MKL function domains that you require: • BLAS • Sparse BLAS • LAPACK • PBLAS • ScaLAPACK • Sparse Solver routines • Vector Mathematical Library functions (VML) • Vector Statistical Library functions • Fourier Transform functions (FFT) • Cluster FFT • Trigonometric Transform routines • Poisson, Laplace, and Helmholtz Solver routines • Optimization (Trust-Region) Solver routines • Data Fitting Functions • GMP* arithmetic functions. Deprecated and will be removed in a future release 2 Intel® Math Kernel Library for Linux* OS User's Guide 20Reason: The function domain you intend to use narrows the search in the Reference Manual for specific routines you need. Additionally, if you are using the Intel MKL cluster software, your link line is function-domain specific (see Working with the Cluster Software). Coding tips may also depend on the function domain (see Tips and Techniques to Improve Performance). Programming language Intel MKL provides support for both Fortran and C/C++ programming. Identify the language interfaces that your function domains support (see Intel® Math Kernel Library Language Interfaces Support). Reason: Intel MKL provides language-specific include files for each function domain to simplify program development (see Language Interfaces Support, by Function Domain). For a list of language-specific interface libraries and modules and an example how to generate them, see also Using Language-Specific Interfaces with Intel® Math Kernel Library. Range of integer data If your system is based on the Intel 64 architecture, identify whether your application performs calculations with large data arrays (of more than 2 31 -1 elements). Reason: To operate on large data arrays, you need to select the ILP64 interface, where integers are 64-bit; otherwise, use the default, LP64, interface, where integers are 32-bit (see Using the ILP64 Interface vs. LP64 Interface). Threading model Identify whether and how your application is threaded: • Threaded with the Intel compiler • Threaded with a third-party compiler • Not threaded Reason: The compiler you use to thread your application determines which threading library you should link with your application. For applications threaded with a third-party compiler you may need to use Intel MKL in the sequential mode (for more information, see Sequential Mode of the Library and Linking with Threading Libraries). Number of threads Determine the number of threads you want Intel MKL to use. Reason: Intel MKL is based on the OpenMP* threading. By default, the OpenMP* software sets the number of threads that Intel MKL uses. 
If you need a different number, you have to set it yourself using one of the available mechanisms. For more information, see Using Parallelism of the Intel® Math Kernel Library. Linking model Decide which linking model is appropriate for linking your application with Intel MKL libraries: • Static • Dynamic Reason: The link line syntax and libraries for static and dynamic linking are different. For the list of link libraries for static and dynamic models, linking examples, and other relevant topics, like how to save disk space by creating a custom dynamic library, see Linking Your Application with the Intel® Math Kernel Library. MPI used Decide what MPI you will use with the Intel MKL cluster software. You are strongly encouraged to use Intel® MPI 3.2 or later. MPI used Reason: To link your application with ScaLAPACK and/or Cluster FFT, the libraries corresponding to your particular MPI should be listed on the link line (see Working with the Cluster Software). Getting Started 2 212 Intel® Math Kernel Library for Linux* OS User's Guide 22Structure of the Intel® Math Kernel Library 3 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Architecture Support Intel® Math Kernel Library (Intel® MKL) for Linux* OS provides two architecture-specific implementations. The following table lists the supported architectures and directories where each architecture-specific implementation is located. Architecture Location IA-32 or compatible /lib/ia32 Intel® 64 or compatible /lib/intel64 See Also High-level Directory Structure Detailed Structure of the IA-32 Architecture Directories Detailed Structure of the Intel® 64 Architecture Directories High-level Directory Structure Directory Contents Installation directory of the Intel® Math Kernel Library (Intel® MKL) Subdirectories of bin Scripts to set environmental variables in the user shell bin/ia32 Shell scripts for the IA-32 architecture bin/intel64 Shell scripts for the Intel® 64 architecture benchmarks/linpack Shared-memory (SMP) version of the LINPACK benchmark benchmarks/mp_linpack Message-passing interface (MPI) version of the LINPACK benchmark examples Examples directory. 
Each subdirectory has source and data files include INCLUDE files for the library routines, as well as for tests and examples 23Directory Contents include/ia32 Fortran 95 .mod files for the IA-32 architecture and Intel® Fortran compiler include/intel64/lp64 Fortran 95 .mod files for the Intel® 64 architecture, Intel Fortran compiler, and LP64 interface include/intel64/ilp64 Fortran 95 .mod files for the Intel® 64 architecture, Intel Fortran compiler, and ILP64 interface include/fftw Header files for the FFTW2 and FFTW3 interfaces interfaces/blas95 Fortran 95 interfaces to BLAS and a makefile to build the library interfaces/fftw2x_cdft MPI FFTW 2.x interfaces to the Intel MKL Cluster FFTs interfaces/fftw3x_cdft MPI FFTW 3.x interfaces to the Intel MKL Cluster FFTs interfaces/fftw2xc FFTW 2.x interfaces to the Intel MKL FFTs (C interface) interfaces/fftw2xf FFTW 2.x interfaces to the Intel MKL FFTs (Fortran interface) interfaces/fftw3xc FFTW 3.x interfaces to the Intel MKL FFTs (C interface) interfaces/fftw3xf FFTW 3.x interfaces to the Intel MKL FFTs (Fortran interface) interfaces/lapack95 Fortran 95 interfaces to LAPACK and a makefile to build the library lib/ia32 Static libraries and shared objects for the IA-32 architecture lib/intel64 Static libraries and shared objects for the Intel® 64 architecture tests Source and data files for tests tools Tools and plug-ins tools/builder Tools for creating custom dynamically linkable libraries tools/plugins/ com.intel.mkl.help Eclipse* IDE plug-in with Intel MKL Reference Manual in WebHelp format. See mkl_documentation.htm for more information Subdirectories of Documentation/en_US/mkl Intel MKL documentation. man/en_US/man3 Man pages for Intel MKL functions. No directory for man pages is created in locales other than en_US even if a directory for the localized documentation is created in the respective locales. For more information, see Contents of the Documentation Directories. See Also Notational Conventions Layered Model Concept Intel MKL is structured to support multiple compilers and interfaces, different OpenMP* implementations, both serial and multiple threads, and a wide range of processors. Conceptually Intel MKL can be divided into distinct parts to support different interfaces, threading models, and core computations: 1. Interface Layer 2. Threading Layer 3. Computational Layer 3 Intel® Math Kernel Library for Linux* OS User's Guide 24You can combine Intel MKL libraries to meet your needs by linking with one library in each part layer-bylayer. Once the interface library is selected, the threading library you select picks up the chosen interface, and the computational library uses interfaces and OpenMP implementation (or non-threaded mode) chosen in the first two layers. To support threading with different compilers, one more layer is needed, which contains libraries not included in Intel MKL: • Compiler run-time libraries (RTL). The following table provides more details of each layer. Layer Description Interface Layer This layer matches compiled code of your application with the threading and/or computational parts of the library. This layer provides: • LP64 and ILP64 interfaces. • Compatibility with compilers that return function values differently. • A mapping between single-precision names and double-precision names for applications using Cray*-style naming (SP2DP interface). SP2DP interface supports Cray-style naming in applications targeted for the Intel 64 architecture and using the ILP64 interface. 
SP2DP interface provides a mapping between single-precision names (for both real and complex types) in the application and double-precision names in Intel MKL BLAS and LAPACK. Function names are mapped as shown in the following example for BLAS functions ?GEMM: SGEMM -> DGEMM DGEMM -> DGEMM CGEMM -> ZGEMM ZGEMM -> ZGEMM Mind that no changes are made to double-precision names. Threading Layer This layer: • Provides a way to link threaded Intel MKL with different threading compilers. • Enables you to link with a threaded or sequential mode of the library. This layer is compiled for different environments (threaded or sequential) and compilers (from Intel, GNU*, and so on). Computational Layer This layer is the heart of Intel MKL. It has only one library for each combination of architecture and supported OS. The Computational layer accommodates multiple architectures through identification of architecture features and chooses the appropriate binary code at run time. Compiler Run-time Libraries (RTL) To support threading with Intel compilers, Intel MKL uses RTLs of the Intel® C++ Composer XE or Intel® Fortran Composer XE. To thread using third-party threading compilers, use libraries in the Threading layer or an appropriate compatibility library. See Also Using the ILP64 Interface vs. LP64 Interface Linking Your Application with the Intel® Math Kernel Library Linking with Threading Libraries Accessing the Intel® Math Kernel Library Documentation Structure of the Intel® Math Kernel Library 3 25Contents of the Documentation Directories Most of Intel MKL documentation is installed at /Documentation// mkl. For example, the documentation in English is installed at / Documentation/en_US/mkl. However, some Intel MKL-related documents are installed one or two levels up. The following table lists MKL-related documentation. File name Comment Files in /Documentation /clicense or /flicense Common end user license for the Intel® C++ Composer XE 2011 or Intel® Fortran Composer XE 2011, respectively mklsupport.txt Information on package number for customer support reference Contents of /Documentation//mkl redist.txt List of redistributable files mkl_documentation.htm Overview and links for the Intel MKL documentation mkl_manual/index.htm Intel MKL Reference Manual in an uncompressed HTML format Release_Notes.htm Intel MKL Release Notes mkl_userguide/index.htm Intel MKL User's Guide in an uncompressed HTML format, this document mkl_link_line_advisor.htm Intel MKL Link-line Advisor Viewing Man Pages To access Intel MKL man pages, add the man pages directory to the MANPATH environment variable. If you performed the Setting Environment Variables step of the Getting Started process, this is done automatically. To view the man page for an Intel MKL function, enter the following command in your command shell: man In this release, is the function name with omitted prefixes denoting data type, task type, or any other field that may vary for this function. Examples: • For the BLAS function ddot, enter man dot • For the ScaLAPACK function pzgeql2, enter man pgeql2 • For the statistical function vslConvSetMode, enter man vslSetMode • For the VML function vdPackM , enter man vPack • For the FFT function DftiCommitDescriptor, enter man DftiCommitDescriptor NOTE Function names in the man command are case-sensitive. 
See Also High-level Directory Structure Setting Environment Variables
Linking Your Application with the Intel® Math Kernel Library 4
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
Linking Quick Start
Intel® Math Kernel Library (Intel® MKL) provides several options for quick linking of your application, which depend on the way you link:
• Using the Intel® Composer XE compiler: see Using the -mkl Compiler Option.
• Explicit dynamic linking: see Using the Single Dynamic Library for how to simplify your link line.
• Explicitly listing libraries on your link line: see Selecting Libraries to Link with for a summary of the libraries.
• Using an interactive interface: see Using the Link-line Advisor to determine the libraries and options to specify on your link or compilation line.
• Using an internally provided tool: see Using the Command-line Link Tool to determine libraries, options, and environment variables, or even compile and build your application.
Using the -mkl Compiler Option
The Intel® Composer XE compiler supports the following variants of the -mkl compiler option:
• -mkl or -mkl=parallel to link with the standard threaded Intel MKL.
• -mkl=sequential to link with the sequential version of Intel MKL.
• -mkl=cluster to link with the Intel MKL cluster components (sequential) that use Intel MPI.
For more information on the -mkl compiler option, see the Intel Compiler User and Reference Guides. On Intel® 64 architecture systems, for each variant of the -mkl option, the compiler links your application using the LP64 interface. If you specify any variant of the -mkl compiler option, the compiler automatically includes the Intel MKL libraries. In cases not covered by the option, use the Link-line Advisor or see Linking in Detail.
See Also Listing Libraries on a Link Line Using the ILP64 Interface vs. LP64 Interface Using the Link-line Advisor Intel® Software Documentation Library
Using the Single Dynamic Library
You can simplify your link line through the use of the Intel MKL Single Dynamic Library (SDL). To use SDL, place libmkl_rt.so on your link line. For example:
icc application.c -lmkl_rt
SDL enables you to select the interface and threading library for Intel MKL at run time. By default, linking with SDL provides:
• LP64 interface on systems based on the Intel® 64 architecture
• Intel threading
To use other interfaces or change threading preferences, including use of the sequential version of Intel MKL, you need to specify your choices using functions or environment variables as explained in section Dynamically Selecting the Interface and Threading Layer.
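The following minimal C sketch illustrates one way to exercise this run-time selection; it is an illustration only, not one of the product examples, and the file name sdl_demo.c is hypothetical. The service function mkl_set_threading_layer and the constant MKL_THREADING_SEQUENTIAL are those described in Dynamically Selecting the Interface and Threading Layer, and the program is assumed to be built along the lines of icc sdl_demo.c -lmkl_rt.
#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Force the sequential threading layer before the first Intel MKL call.
       Once this function is called, the MKL_THREADING_LAYER environment
       variable is ignored (see Dynamically Selecting the Interface and
       Threading Layer). */
    mkl_set_threading_layer(MKL_THREADING_SEQUENTIAL);

    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};

    /* The BLAS call resolves through libmkl_rt.so at run time. */
    double dot = cblas_ddot(3, x, 1, y, 1);
    printf("dot = %f\n", dot);   /* expected: 32.000000 */
    return 0;
}
Setting the MKL_THREADING_LAYER environment variable before running the program achieves the same effect without modifying the source.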
Selecting Libraries to Link with To link with Intel MKL: • Choose one library from the Interface layer and one library from the Threading layer • Add the only library from the Computational layer and run-time libraries (RTL) The following table lists Intel MKL libraries to link with your application. Interface layer Threading layer Computational layer RTL IA-32 architecture, static linking libmkl_intel.a libmkl_intel_ thread.a libmkl_core.a libiomp5.so IA-32 architecture, dynamic linking libmkl_intel. so libmkl_intel_ thread.so libmkl_core. so libiomp5.so Intel® 64 architecture, static linking libmkl_intel_ lp64.a libmkl_intel_ thread.a libmkl_core.a libiomp5.so Intel® 64 architecture, dynamic linking libmkl_intel_ lp64.so libmkl_intel_ thread.so libmkl_core. so libiomp5.so The Single Dynamic Library (SDL) automatically links interface, threading, and computational libraries and thus simplifies linking. The following table lists Intel MKL libraries for dynamic linking using SDL. See Dynamically Selecting the Interface and Threading Layer for how to set the interface and threading layers at run time through function calls or environment settings. SDL RTL IA-32 and Intel® 64 architectures libmkl_rt.so libiomp5.so † † Use the Link-line Advisor to check whether you need to explicitly link the libiomp5.so RTL. For exceptions and alternatives to the libraries listed above, see Linking in Detail. See Also Layered Model Concept 4 Intel® Math Kernel Library for Linux* OS User's Guide 28Using the Link-line Advisor Using the -mkl Compiler Option Working with the Intel® Math Kernel Library Cluster Software Using the Link-line Advisor Use the Intel MKL Link-line Advisor to determine the libraries and options to specify on your link or compilation line. The latest version of the tool is available at http://software.intel.com/en-us/articles/intel-mkl-link-lineadvisor. The tool is also available in the product. The Advisor requests information about your system and on how you intend to use Intel MKL (link dynamically or statically, use threaded or sequential mode, etc.). The tool automatically generates the appropriate link line for your application. See Also Contents of the Documentation Directories Using the Command-line Link Tool Use the command-line Link tool provided by Intel MKL to simplify building your application with Intel MKL. The tool not only provides the options, libraries, and environment variables to use, but also performs compilation and building of your application. The tool mkl_link_tool is installed in the /tools directory. See the knowledge base article at http://software.intel.com/en-us/articles/mkl-command-line-link-tool for more information. Linking Examples See Also Using the Link-line Advisor Examples for Linking with ScaLAPACK and Cluster FFT Linking on IA-32 Architecture Systems The following examples illustrate linking that uses Intel(R) compilers. The examples use the .f Fortran source file. C/C++ users should instead specify a .cpp (C++) or .c (C) file and replace ifort with icc NOTE If you successfully completed the Setting Environment Variables step of the Getting Started process, you can omit -I$MKLINCLUDE in all the examples and omit -L$MKLPATH in the examples for dynamic linking. 
In these examples, MKLPATH=$MKLROOT/lib/ia32, MKLINCLUDE=$MKLROOT/include : • Static linking of myprog.f and parallel Intel MKL: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/ libmkl_core.a -Wl,--end-group -liomp5 -lpthread • Dynamic linking of myprog.f and parallel Intel MKL: Linking Your Application with the Intel® Math Kernel Library 4 29ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel -lmkl_intel_thread -lmkl_core -liomp5 -lpthread • Static linking of myprog.f and sequential version of Intel MKL: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_sequential.a $MKLPATH/ libmkl_core.a -Wl,--end-group -lpthread • Dynamic linking of myprog.f and sequential version of Intel MKL: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel -lmkl_sequential -lmkl_core -lpthread • Dynamic linking of user code myprog.f and parallel or sequential Intel MKL (Call the mkl_set_threading_layer function or set value of the MKL_THREADING_LAYER environment variable to choose threaded or sequential mode): ifort myprog.f -lmkl_rt • Static linking of myprog.f, Fortran 95 LAPACK interface, and parallel Intel MKL: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/ia32 -lmkl_lapack95 -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/ libmkl_core.a -Wl,--end-group -liomp5 -lpthread • Static linking of myprog.f, Fortran 95 BLAS interface, and parallel Intel MKL: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/ia32 -lmkl_blas95 -Wl,--start-group $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/ libmkl_core.a -Wl,--end-group -liomp5 -lpthread See Also Fortran 95 Interfaces to LAPACK and BLAS Examples for Linking a C Application Examples for Linking a Fortran Application Using the Single Dynamic Library Linking on Intel(R) 64 Architecture Systems The following examples illustrate linking that uses Intel(R) compilers. The examples use the .f Fortran source file. C/C++ users should instead specify a .cpp (C++) or .c (C) file and replace ifort with icc NOTE If you successfully completed the Setting Environment Variables step of the Getting Started process, you can omit -I$MKLINCLUDE in all the examples and omit -L$MKLPATH in the examples for dynamic linking. 
In these examples, MKLPATH=$MKLROOT/lib/intel64, MKLINCLUDE=$MKLROOT/include: • Static linking of myprog.f and parallel Intel MKL supporting the LP64 interface: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread 4 Intel® Math Kernel Library for Linux* OS User's Guide 30• Dynamic linking of myprog.f and parallel Intel MKL supporting the LP64 interface: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread • Static linking of myprog.f and sequential version of Intel MKL supporting the LP64 interface: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread • Dynamic linking of myprog.f and sequential version of Intel MKL supporting the LP64 interface: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread • Static linking of myprog.f and parallel Intel MKL supporting the ILP64 interface: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread • Dynamic linking of myprog.f and parallel Intel MKL supporting the ILP64 interface: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread • Dynamic linking of user code myprog.f and parallel or sequential Intel MKL (Call appropriate functions or set environment variables to choose threaded or sequential mode and to set the interface): ifort myprog.f -lmkl_rt • Static linking of myprog.f, Fortran 95 LAPACK interface, and parallel Intel MKL supporting the LP64 interface: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/intel64/lp64 -lmkl_lapack95_lp64 -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/ libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread • Static linking of myprog.f, Fortran 95 BLAS interface, and parallel Intel MKL supporting the LP64 interface: ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/intel64/lp64 -lmkl_blas95_lp64 -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/ libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread See Also Fortran 95 Interfaces to LAPACK and BLAS Examples for Linking a C Application Examples for Linking a Fortran Application Using the Single Dynamic Library Linking in Detail This section recommends which libraries to link with depending on your Intel MKL usage scenario and provides details of the linking. Listing Libraries on a Link Line To link with Intel MKL, specify paths and libraries on the link line as shown below. Linking Your Application with the Intel® Math Kernel Library 4 31NOTE The syntax below is for dynamic linking. For static linking, replace each library name preceded with "-l" with the path to the library file. For example, replace -lmkl_core with $MKLPATH/ libmkl_core.a, where $MKLPATH is the appropriate user-defined environment variable. 
-L -I [-I/{ia32|intel64|{ilp64|lp64}}] [-lmkl_blas{95|95_ilp64|95_lp64}] [-lmkl_lapack{95|95_ilp64|95_lp64}] [ ] -lmkl_{intel|intel_ilp64|intel_lp64|intel_sp2dp|gf|gf_ilp64|gf_lp64} -lmkl_{intel_thread|gnu_thread|pgi_thread|sequential} -lmkl_core -liomp5 [-lpthread] [-lm] In case of static linking, enclose the cluster components, interface, threading, and computational libraries in grouping symbols (for example, -Wl,--start-group $MKLPATH/libmkl_cdft_core.a $MKLPATH/ libmkl_blacs_intelmpi_ilp64.a $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/ libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group). The order of listing libraries on the link line is essential, except for the libraries enclosed in the grouping symbols above. See Also Using the Link-line Advisor Linking Examples Working with the Intel® Math Kernel Library Cluster Software Dynamically Selecting the Interface and Threading Layer The Single Dynamic Library (SDL) enables you to dynamically select the interface and threading layer for Intel MKL. Setting the Interface Layer Available interfaces depend on the architecture of your system. On systems based on the Intel ® 64 architecture, LP64 and ILP64 interfaces are available. To set one of these interfaces at run time, use the mkl_set_interface_layer function or the MKL_INTERFACE_LAYER environment variable. The following table provides values to be used to set each interface. Interface Layer Value of MKL_INTERFACE_LAYER Value of the Parameter of mkl_set_interface_layer LP64 LP64 MKL_INTERFACE_LP64 ILP64 ILP64 MKL_INTERFACE_ILP64 If the mkl_set_interface_layer function is called, the environment variable MKL_INTERFACE_LAYER is ignored. By default the LP64 interface is used. See the Intel MKL Reference Manual for details of the mkl_set_interface_layer function. 4 Intel® Math Kernel Library for Linux* OS User's Guide 32Setting the Threading Layer To set the threading layer at run time, use the mkl_set_threading_layer function or the MKL_THREADING_LAYER environment variable. The following table lists available threading layers along with the values to be used to set each layer. Threading Layer Value of MKL_THREADING_LAYER Value of the Parameter of mkl_set_threading_layer Intel threading INTEL MKL_THREADING_INTEL Sequential mode of Intel MKL SEQUENTIAL MKL_THREADING_SEQUENTIAL GNU threading GNU MKL_THREADING_GNU PGI threading PGI MKL_THREADING_PGI If the mkl_set_threading_layer function is called, the environment variable MKL_THREADING_LAYER is ignored. By default Intel threading is used. See the Intel MKL Reference Manual for details of the mkl_set_threading_layer function. See Also Using the Single Dynamic Library Layered Model Concept Directory Structure in Detail Linking with Interface Libraries Using the ILP64 Interface vs. LP64 Interface The Intel MKL ILP64 libraries use the 64-bit integer type (necessary for indexing large arrays, with more than 2 31 -1 elements), whereas the LP64 libraries index arrays with the 32-bit integer type. The LP64 and ILP64 interfaces are implemented in the Interface layer. 
Link with the following interface libraries for the LP64 or ILP64 interface, respectively: • libmkl_intel_lp64.a or libmkl_intel_ilp64.a for static linking • libmkl_intel_lp64.so or libmkl_intel_ilp64.so for dynamic linking The ILP64 interface provides for the following: • Support large data arrays (with more than 2 31 -1 elements) • Enable compiling your Fortran code with the -i8 compiler option The LP64 interface provides compatibility with the previous Intel MKL versions because "LP64" is just a new name for the only interface that the Intel MKL versions lower than 9.1 provided. Choose the ILP64 interface if your application uses Intel MKL for calculations with large data arrays or the library may be used so in future. Intel MKL provides the same include directory for the ILP64 and LP64 interfaces. Compiling for LP64/ILP64 The table below shows how to compile for the ILP64 and LP64 interfaces: Linking Your Application with the Intel® Math Kernel Library 4 33Fortran Compiling for ILP64 ifort -i8 -I/include ... Compiling for LP64 ifort -I/include ... C or C++ Compiling for ILP64 icc -DMKL_ILP64 -I/include ... Compiling for LP64 icc -I/include ... CAUTION Linking of an application compiled with the -i8 or -DMKL_ILP64 option to the LP64 libraries may result in unpredictable consequences and erroneous output. Coding for ILP64 You do not need to change existing code if you are not using the ILP64 interface. To migrate to ILP64 or write new code for ILP64, use appropriate types for parameters of the Intel MKL functions and subroutines: Integer Types Fortran C or C++ 32-bit integers INTEGER*4 or INTEGER(KIND=4) int Universal integers for ILP64/ LP64: • 64-bit for ILP64 • 32-bit otherwise INTEGER without specifying KIND MKL_INT Universal integers for ILP64/ LP64: • 64-bit integers INTEGER*8 or INTEGER(KIND=8) MKL_INT64 FFT interface integers for ILP64/ LP64 INTEGER without specifying KIND MKL_LONG To determine the type of an integer parameter of a function, use appropriate include files. For functions that support only a Fortran interface, use the C/C++ include files *.h. The above table explains which integer parameters of functions become 64-bit and which remain 32-bit for ILP64. The table applies to most Intel MKL functions except some VML and VSL functions, which require integer parameters to be 64-bit or 32-bit regardless of the interface: • VML: The mode parameter of VML functions is 64-bit. • Random Number Generators (RNG): All discrete RNG except viRngUniformBits64 are 32-bit. The viRngUniformBits64 generator function and vslSkipAheadStream service function are 64-bit. • Summary Statistics: The estimate parameter of the vslsSSCompute/vsldSSCompute function is 64- bit. Refer to the Intel MKL Reference Manual for more information. 4 Intel® Math Kernel Library for Linux* OS User's Guide 34To better understand ILP64 interface details, see also examples and tests. Limitations All Intel MKL function domains support ILP64 programming with the following exceptions: • FFTW interfaces to Intel MKL: • FFTW 2.x wrappers do not support ILP64. • FFTW 3.2 wrappers support ILP64 by a dedicated set of functions plan_guru64. • GMP* Arithmetic Functions do not support ILP64. NOTE GMP Arithmetic Functions are deprecated and will be removed in a future release. 
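As a brief illustration of these conventions, the C sketch below declares all integer arguments passed to Intel MKL as MKL_INT, so the same source builds for either interface; the file name ilp64_demo.c is hypothetical. Compile it with icc -I$MKLROOT/include for LP64 or icc -DMKL_ILP64 -I$MKLROOT/include for ILP64, and link with the matching interface library as described above.
#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* MKL_INT is 64-bit when the code is compiled with -DMKL_ILP64 and
       linked with the ILP64 interface library; it is 32-bit for LP64. */
    MKL_INT n = 5, incx = 1;
    double x[5] = {1.0, -2.0, 3.0, -4.0, 5.0};

    double s = cblas_dasum(n, x, incx);   /* sum of absolute values: 15.0 */
    printf("asum = %f, sizeof(MKL_INT) = %d\n", s, (int)sizeof(MKL_INT));
    return 0;
}
Keep the compile option and the interface library consistent: as the CAUTION above notes, compiling with -DMKL_ILP64 (or -i8) and linking with the LP64 libraries may produce erroneous output.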
See Also High-level Directory Structure Include Files Language Interfaces Support, by Function Domain Layered Model Concept Directory Structure in Detail Linking with Fortran 95 Interface Libraries The libmkl_blas95*.a and libmkl_lapack95*.a libraries contain Fortran 95 interfaces for BLAS and LAPACK, respectively, which are compiler-dependent. In the Intel MKL package, they are prebuilt for the Intel® Fortran compiler. If you are using a different compiler, build these libraries before using the interface. See Also Fortran 95 Interfaces to LAPACK and BLAS Compiler-dependent Functions and Fortran 90 Modules Linking with Threading Libraries Sequential Mode of the Library You can use Intel MKL in a sequential (non-threaded) mode. In this mode, Intel MKL runs unthreaded code. However, it is thread-safe (except the LAPACK deprecated routine ?lacon), which means that you can use it in a parallel region in your OpenMP* code. The sequential mode requires no compatibility OpenMP* run-time library and does not respond to the environment variable OMP_NUM_THREADS or its Intel MKL equivalents. You should use the library in the sequential mode only if you have a particular reason not to use Intel MKL threading. The sequential mode may be helpful when using Intel MKL with programs threaded with some non-Intel compilers or in other situations where you need a non-threaded version of the library (for instance, in some MPI cases). To set the sequential mode, in the Threading layer, choose the *sequential.* library. Add the POSIX threads library (pthread) to your link line for the sequential mode because the *sequential.* library depends on pthread . See Also Directory Structure in Detail Using Parallelism of the Intel® Math Kernel Library Avoiding Conflicts in the Execution Environment Linking Examples Linking Your Application with the Intel® Math Kernel Library 4 35Selecting the Threading Layer Several compilers that Intel MKL supports use the OpenMP* threading technology. Intel MKL supports implementations of the OpenMP* technology that these compilers provide. To make use of this support, you need to link with the appropriate library in the Threading Layer and Compiler Support Run-time Library (RTL). Threading Layer Each Intel MKL threading library contains the same code compiled by the respective compiler (Intel, gnu and PGI* compilers on Linux OS). RTL This layer includes libiomp, the compatibility OpenMP* run-time library of the Intel compiler. In addition to the Intel compiler, libiomp provides support for one more threading compiler on Linux OS (GNU). That is, a program threaded with a GNU compiler can safely be linked with Intel MKL and libiomp. The table below helps explain what threading library and RTL you should choose under different scenarios when using Intel MKL (static cases only): Compiler Application Threaded? Threading Layer RTL Recommended Comment Intel Does not matter libmkl_intel_ thread.a libiomp5.so PGI Yes libmkl_pgi_ thread.a or libmkl_ sequential.a PGI* supplied Use of libmkl_sequential.a removes threading from Intel MKL calls. PGI No libmkl_intel_ thread.a libiomp5.so PGI No libmkl_pgi_ thread.a PGI* supplied PGI No libmkl_ sequential.a None gnu Yes libmkl_gnu_ thread.a libiomp5.so or GNU OpenMP run-time library libiomp5 offers superior scaling performance. 
gnu Yes libmkl_ sequential.a None gnu No libmkl_intel_ thread.a libiomp5.so other Yes libmkl_ sequential.a None other No libmkl_intel_ thread.a libiomp5.so 4 Intel® Math Kernel Library for Linux* OS User's Guide 36Linking with Computational Libraries If you are not using the Intel MKL cluster software, you need to link your application with only one computational library, depending on the linking method: Static Linking Dynamic Linking lib mkl_core.a lib mkl_core.so Computational Libraries for Applications that Use the Intel MKL Cluster Software ScaLAPACK and Cluster Fourier Transform Functions (Cluster FFT) require more computational libraries, which may depend on your architecture. The following table lists computational libraries for IA-32 architecture applications that use ScaLAPACK or Cluster FFT. Computational Libraries for IA-32 Architecture Function domain Static Linking Dynamic Linking ScaLAPACK † libmkl_scalapack_core.a libmkl_core.a libmkl_scalapack_core.so libmkl_core.so Cluster Fourier Transform Functions † libmkl_cdft_core.a libmkl_core.a libmkl_cdft_core.so libmkl_core.so † Also add the library with BLACS routines corresponding to the MPI used. The following table lists computational libraries for Intel ® 64 architecture applications that use ScaLAPACK or Cluster FFT. Computational Libraries for the Intel ® 64 Architecture Function domain Static Linking Dynamic Linking ScaLAPACK, LP64 interface 1 libmkl_scalapack_lp64.a libmkl_core.a libmkl_scalapack_lp64.so libmkl_core.so ScaLAPACK, ILP64 interface 1 libmkl_scalapack_ilp64.a libmkl_core.a libmkl_scalapack_ilp64.so libmkl_core.so Cluster Fourier Transform Functions 1 libmkl_cdft_core.a libmkl_core.a libmkl_cdft_core.so libmkl_core.so † Also add the library with BLACS routines corresponding to the MPI used. See Also Linking with ScaLAPACK and Cluster FFTs Using the Link-line Advisor Using the ILP64 Interface vs. LP64 Interface Linking with Compiler Run-time Libraries Dynamically link libiomp, the compatibility OpenMP* run-time library, even if you link other libraries statically. Linking Your Application with the Intel® Math Kernel Library 4 37Linking to the libiomp statically can be problematic because the more complex your operating environment or application, the more likely redundant copies of the library are included. This may result in performance issues (oversubscription of threads) and even incorrect results. To link libiomp dynamically, be sure the LD_LIBRARY_PATH environment variable is defined correctly. See Also Scripts to Set Environment Variables Layered Model Concept Linking with System Libraries To use the Intel MKL FFT, Trigonometric Transform, or Poisson, Laplace, and Helmholtz Solver routines, link in the math support system library by adding " -lm " to the link line. On Linux OS, the libiomp library relies on the native pthread library for multi-threading. Any time libiomp is required, add -lpthread to your link line afterwards (the order of listing libraries is important). Building Custom Shared Objects ?ustom shared objects reduce the collection of functions available in Intel MKL libraries to those required to solve your particular problems, which helps to save disk space and build your own dynamic libraries for distribution. The Intel MKL custom shared object builder enables you to create a dynamic library (shared object) containing the selected functions and located in the tools/builder directory. The builder contains a makefile and a definition file with the list of functions. 
NOTE The objects in Intel MKL static libraries are position-independent code (PIC), which is not typical for static libraries. Therefore, the custom shared object builder can create a shared object from a subset of Intel MKL functions by picking the respective object files from the static libraries. Using the Custom Shared Object Builder To build a custom shared object, use the following command: make target [] The following table lists possible values of target and explains what the command does for each value: Value Comment libia32 The builder uses static Intel MKL interface, threading, and core libraries to build a custom shared object for the IA-32 architecture. libintel64 The builder uses static Intel MKL interface, threading, and core libraries to build a custom shared object for the Intel® 64 architecture. soia32 The builder uses the single dynamic library libmkl_rt.so to build a custom shared object for the IA-32 architecture. sointel64 The builder uses the single dynamic library libmkl_rt.so to build a custom shared object for the Intel® 64 architecture. help The command prints Help on the custom shared object builder The placeholder stands for the list of parameters that define macros to be used by the makefile. The following table describes these parameters: 4 Intel® Math Kernel Library for Linux* OS User's Guide 38Parameter [Values] Description interface = {lp64|ilp64} Defines whether to use LP64 or ILP64 programming interfacefor the Intel 64architecture.The default value is lp64. threading = {parallel| sequential} Defines whether to use the Intel MKL in the threaded or sequential mode. The default value is parallel. export = Specifies the full name of the file that contains the list of entry-point functions to be included in the shared object. The default name is user_example_list (no extension). name = Specifies the name of the library to be created. By default, the names of the created library is mkl_custom.so. xerbla = Specifies the name of the object file .o that contains the user's error handler. The makefile adds this error handler to the library for use instead of the default Intel MKL error handler xerbla. If you omit this parameter, the native Intel MKL xerbla is used. See the description of the xerbla function in the Intel MKL Reference Manual on how to develop your own error handler. MKLROOT = Specifies the location of Intel MKL libraries used to build the custom shared object. By default, the builder uses the Intel MKL installation directory. All the above parameters are optional. In the simplest case, the command line is make ia32, and the missing options have default values. This command creates the mkl_custom.so library for processors using the IA-32 architecture. The command takes the list of functions from the user_list file and uses the native Intel MKL error handler xerbla. An example of a more complex case follows: make ia32 export=my_func_list.txt name=mkl_small xerbla=my_xerbla.o In this case, the command creates the mkl_small.so library for processors using the IA-32 architecture. The command takes the list of functions from my_func_list.txt file and uses the user's error handler my_xerbla.o. The process is similar for processors using the Intel® 64 architecture. See Also Using the Single Dynamic Library Composing a List of Functions To compose a list of functions for a minimal custom shared object needed for your application, you can use the following procedure: 1. 
Link your application with installed Intel MKL libraries to make sure the application builds. 2. Remove all Intel MKL libraries from the link line and start linking. Unresolved symbols indicate Intel MKL functions that your application uses. 3. Include these functions in the list. Important Each time your application starts using more Intel MKL functions, update the list to include the new functions. See Also Specifying Function Names Linking Your Application with the Intel® Math Kernel Library 4 39Specifying Function Names In the file with the list of functions for your custom shared object, adjust function names to the required interface. For example, for Fortran functions append an underscore character "_" to the names as a suffix: dgemm_ ddot_ dgetrf_ For more examples, see domain-specific lists of functions in the /tools/builder folder. NOTE The lists of functions are provided in the /tools/builder folder merely as examples. See Composing a List of Functions for how to compose lists of functions for your custom shared object. TIP Names of Fortran-style routines (BLAS, LAPACK, etc.) can be both upper-case or lower-case, with or without the trailing underscore. For example, these names are equivalent: BLAS: dgemm, DGEMM, dgemm_, DGEMM_ LAPACK: dgetrf, DGETRF, dgetrf_, DGETRF_. Properly capitalize names of C support functions in the function list. To do this, follow the guidelines below: 1. In the mkl_service.h include file, look up a #define directive for your function. 2. Take the function name from the replacement part of that directive. For example, the #define directive for the mkl_disable_fast_mm function is #define mkl_disable_fast_mm MKL_Disable_Fast_MM. Capitalize the name of this function in the list like this: MKL_Disable_Fast_MM. For the names of the Fortran support functions, see the tip. NOTE If selected functions have several processor-specific versions, the builder automatically includes them all in the custom library and the dispatcher manages them. Distributing Your Custom Shared Object To enable use of your custom shared object in a threaded mode, distribute libiomp5.so along with the custom shared object. 4 Intel® Math Kernel Library for Linux* OS User's Guide 40Managing Performance and Memory 5 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Using Parallelism of the Intel® Math Kernel Library Intel MKL is extensively parallelized. See Threaded Functions and Problems for lists of threaded functions and problems that can be threaded. Intel MKL is thread-safe, which means that all Intel MKL functions (except the LAPACK deprecated routine ? lacon) work correctly during simultaneous execution by multiple threads. 
In particular, any chunk of threaded Intel MKL code provides access for multiple threads to the same shared data, while permitting only one thread at any given time to access a shared piece of data. Therefore, you can call Intel MKL from multiple threads and not worry about the function instances interfering with each other. The library uses OpenMP* threading software, so you can use the environment variable OMP_NUM_THREADS to specify the number of threads or the equivalent OpenMP run-time function calls. Intel MKL also offers variables that are independent of OpenMP, such as MKL_NUM_THREADS, and equivalent Intel MKL functions for thread management. The Intel MKL variables are always inspected first, then the OpenMP variables are examined, and if neither is used, the OpenMP software chooses the default number of threads. By default, Intel MKL uses the number of threads equal to the number of physical cores on the system. To achieve higher performance, set the number of threads to the number of real processors or physical cores, as summarized in Techniques to Set the Number of Threads. See Also Managing Multi-core Performance Threaded Functions and Problems The following Intel MKL function domains are threaded: • Direct sparse solver. • LAPACK. For the list of threaded routines, see Threaded LAPACK Routines. • Level1 and Level2 BLAS. For the list of threaded routines, see Threaded BLAS Level1 and Level2 Routines. • All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers. • All mathematical VML functions. • FFT. For the list of FFT transforms that can be threaded, see Threaded FFT Problems. 41Threaded LAPACK Routines In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z. The following LAPACK routines are threaded: • Linear equations, computational routines: • Factorization: ?getrf, ?gbtrf, ?potrf, ?pptrf, ?sytrf, ?hetrf, ?sptrf, ?hptrf • Solving: ?dttrsb, ?gbtrs, ?gttrs, ?pptrs, ?pbtrs, ?pttrs, ?sytrs, ?sptrs, ?hptrs, ? tptrs, ?tbtrs • Orthogonal factorization, computational routines: ?geqrf, ?ormqr, ?unmqr, ?ormlq, ?unmlq, ?ormql, ?unmql, ?ormrq, ?unmrq • Singular Value Decomposition, computational routines: ?gebrd, ?bdsqr • Symmetric Eigenvalue Problems, computational routines: ?sytrd, ?hetrd, ?sptrd, ?hptrd, ?steqr, ?stedc. • Generalized Nonsymmetric Eigenvalue Problems, computational routines: chgeqz/zhgeqz. A number of other LAPACK routines, which are based on threaded LAPACK or BLAS routines, make effective use of parallelism: ?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, cggevx/zggevx, and so on. Threaded BLAS Level1 and Level2 Routines In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z. The following routines are threaded for Intel ® Core™2 Duo and Intel ® Core™ i7 processors: • Level1 BLAS: ?axpy, ?copy, ?swap, ddot/sdot, cdotc, drot/srot • Level2 BLAS: ?gemv, ?trmv, dsyr/ssyr, dsyr2/ssyr2, dsymv/ssymv Threaded FFT Problems The following characteristics of a specific problem determine whether your FFT computation may be threaded: • rank • domain • size/length • precision (single or double) • placement (in-place or out-of-place) • strides • number of transforms • layout (for example, interleaved or split layout of complex data) Most FFT problems are threaded. 
In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow. One-dimensional (1D) transforms 1D transforms are threaded in many cases. 5 Intel® Math Kernel Library for Linux* OS User's Guide 421D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions depending on the architecture: Architecture Conditions Intel ® 64 N is a power of 2, log2(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1. IA-32 N is a power of 2, log2(N) > 13, and the transform is single-precision. N is a power of 2, log2(N) > 14, and the transform is double-precision. Any N is composite, log2(N) > 16, and input/output strides equal 1. 1D real-to-complex and complex-to-real transforms are not threaded. 1D complex-to-complex transforms using split-complex layout are not threaded. Prime-size complex-to-complex 1D transforms are not threaded. Multidimensional transforms All multidimensional transforms on large-volume data are threaded. Avoiding Conflicts in the Execution Environment Certain situations can cause conflicts in the execution environment that make the use of threads in Intel MKL problematic. This section briefly discusses why these problems exist and how to avoid them. If you thread the program using OpenMP directives and compile the program with Intel compilers, Intel MKL and the program will both use the same threading library. Intel MKL tries to determine if it is in a parallel region in the program, and if it is, it does not spread its operations over multiple threads unless you specifically request Intel MKL to do so via the MKL_DYNAMIC functionality. However, Intel MKL can be aware that it is in a parallel region only if the threaded program and Intel MKL are using the same threading library. If your program is threaded by some other means, Intel MKL may operate in multithreaded mode, and the performance may suffer due to overuse of the resources. The following table considers several cases where the conflicts may arise and provides recommendations depending on your threading model: Threading model Discussion You thread the program using OS threads (pthreads on Linux* OS). If more than one thread calls Intel MKL, and the function being called is threaded, it may be important that you turn off Intel MKL threading. Set the number of threads to one by any of the available means (see Techniques to Set the Number of Threads). You thread the program using OpenMP directives and/or pragmas and compile the program using a compiler other than a compiler from Intel. This is more problematic because setting of the OMP_NUM_THREADS environment variable affects both the compiler's threading library and libiomp. In this case, choose the threading library that matches the layered Intel MKL with the OpenMP compiler you employ (see Linking Examples on how to do this). If this is not possible, use Intel MKL in the sequential mode. To do this, you should link with the appropriate threading library: libmkl_sequential.a or libmkl_sequential.so (see High-level Directory Structure). There are multiple programs running on a multiple-cpu system, for example, a parallelized program that runs using MPI for communication in which each processor is treated as a node. The threading software will see multiple processors on the system even though each processor has a separate MPI process running on it. 
In this case, one of the solutions is to set the number of threads to one by any of the available means (see Techniques to Set the Number of Threads). Section Intel(R) Optimized MP LINPACK Benchmark for Clusters discusses another solution for a Hybrid (OpenMP* + MPI) mode. Managing Performance and Memory 5 43See Also Using Additional Threading Control Linking with Compiler Run-time Libraries Techniques to Set the Number of Threads Use one of the following techniques to change the number of threads to use in Intel MKL: • Set one of the OpenMP or Intel MKL environment variables: • OMP_NUM_THREADS • MKL_NUM_THREADS • MKL_DOMAIN_NUM_THREADS • Call one of the OpenMP or Intel MKL functions: • omp_set_num_threads() • mkl_set_num_threads() • mkl_domain_set_num_threads() When choosing the appropriate technique, take into account the following rules: • The Intel MKL threading controls take precedence over the OpenMP controls because they are inspected first. • A function call takes precedence over any environment variables. The exception, which is a consequence of the previous rule, is the OpenMP subroutine omp_set_num_threads(), which does not have precedence over Intel MKL environment variables, such as MKL_NUM_THREADS. See Using Additional Threading Control for more details. • You cannot change run-time behavior in the course of the run using the environment variables because they are read only once at the first call to Intel MKL. Setting the Number of Threads Using an OpenMP* Environment Variable You can set the number of threads using the environment variable OMP_NUM_THREADS. To change the number of threads, use the appropriate command in the command shell in which the program is going to run, for example: • For the bash shell, enter: export OMP_NUM_THREADS= • For the csh or tcsh shell, enter: set OMP_NUM_THREADS= See Also Using Additional Threading Control Changing the Number of Threads at Run Time You cannot change the number of threads during run time using environment variables. However, you can call OpenMP API functions from your program to change the number of threads during run time. The following sample code shows how to change the number of threads during run time using the omp_set_num_threads() routine. See also Techniques to Set the Number of Threads. The following example shows both C and Fortran code examples. To run this example in the C language, use the omp.h header file from the Intel(R) compiler package. If you do not have the Intel compiler but wish to explore the functionality in the example, use Fortran API for omp_set_num_threads() rather than the C version. For example, omp_set_num_threads_( &i_one ); // ******* C language ******* #include "omp.h" 5 Intel® Math Kernel Library for Linux* OS User's Guide 44#include "mkl.h" #include #define SIZE 1000 int main(int args, char *argv[]){ double *a, *b, *c; a = (double*)malloc(sizeof(double)*SIZE*SIZE); b = (double*)malloc(sizeof(double)*SIZE*SIZE); c = (double*)malloc(sizeof(double)*SIZE*SIZE); double alpha=1, beta=1; int m=SIZE, n=SIZE, k=SIZE, lda=SIZE, ldb=SIZE, ldc=SIZE, i=0, j=0; char transa='n', transb='n'; for( i=0; i #include ... mkl_set_num_threads ( 1 ); // ******* Fortran language ******* ... call mkl_set_num_threads( 1 ) See the Intel MKL Reference Manual for the detailed description of the threading control functions, their parameters, calling syntax, and more code examples. MKL_DYNAMIC The MKL_DYNAMIC environment variable enables Intel MKL to dynamically change the number of threads. 
The default value of MKL_DYNAMIC is TRUE, regardless of OMP_DYNAMIC, whose default value may be FALSE. When MKL_DYNAMIC is TRUE, Intel MKL tries to use what it considers the best number of threads, up to the maximum number you specify. For example, MKL_DYNAMIC set to TRUE enables optimal choice of the number of threads in the following cases: • If the requested number of threads exceeds the number of physical cores (perhaps because of using the Intel® Hyper-Threading Technology), and MKL_DYNAMIC is not changed from its default value of TRUE, Intel MKL will scale down the number of threads to the number of physical cores. • If you are able to detect the presence of MPI, but cannot determine if it has been called in a thread-safe mode (it is impossible to detect this with MPICH 1.2.x, for instance), and MKL_DYNAMIC has not been changed from its default value of TRUE, Intel MKL will run one thread. Managing Performance and Memory 5 47When MKL_DYNAMIC is FALSE, Intel MKL tries not to deviate from the number of threads the user requested. However, setting MKL_DYNAMIC=FALSE does not ensure that Intel MKL will use the number of threads that you request. The library may have no choice on this number for such reasons as system resources. Additionally, the library may examine the problem and use a different number of threads than the value suggested. For example, if you attempt to do a size one matrix-matrix multiply across eight threads, the library may instead choose to use only one thread because it is impractical to use eight threads in this event. Note also that if Intel MKL is called in a parallel region, it will use only one thread by default. If you want the library to use nested parallelism, and the thread within a parallel region is compiled with the same OpenMP compiler as Intel MKL is using, you may experiment with setting MKL_DYNAMIC to FALSE and manually increasing the number of threads. In general, set MKL_DYNAMIC to FALSE only under circumstances that Intel MKL is unable to detect, for example, to use nested parallelism where the library is already called from a parallel section. MKL_DOMAIN_NUM_THREADS The MKL_DOMAIN_NUM_THREADS environment variable suggests the number of threads for a particular function domain. MKL_DOMAIN_NUM_THREADS accepts a string value , which must have the following format: ::= { } ::= [ * ] ( | | | ) [ * ] ::= ::= MKL_DOMAIN_ALL | MKL_DOMAIN_BLAS | MKL_DOMAIN_FFT | MKL_DOMAIN_VML | MKL_DOMAIN_PARDISO ::= [ * ] ( | | ) [ * ] ::= ::= | | In the syntax above, values of indicate function domains as follows: MKL_DOMAIN_ALL All function domains MKL_DOMAIN_BLAS BLAS Routines MKL_DOMAIN_FFT non-cluster Fourier Transform Functions MKL_DOMAIN_VML Vector Mathematical Functions MKL_DOMAIN_PARDISO PARDISO For example, MKL_DOMAIN_ALL 2 : MKL_DOMAIN_BLAS 1 : MKL_DOMAIN_FFT 4 MKL_DOMAIN_ALL=2 : MKL_DOMAIN_BLAS=1 : MKL_DOMAIN_FFT=4 MKL_DOMAIN_ALL=2, MKL_DOMAIN_BLAS=1, MKL_DOMAIN_FFT=4 MKL_DOMAIN_ALL=2; MKL_DOMAIN_BLAS=1; MKL_DOMAIN_FFT=4 MKL_DOMAIN_ALL = 2 MKL_DOMAIN_BLAS 1 , MKL_DOMAIN_FFT 4 MKL_DOMAIN_ALL,2: MKL_DOMAIN_BLAS 1, MKL_DOMAIN_FFT,4 . The global variables MKL_DOMAIN_ALL, MKL_DOMAIN_BLAS, MKL_DOMAIN_FFT, MKL_DOMAIN_VML, and MKL_DOMAIN_PARDISO, as well as the interface for the Intel MKL threading control functions, can be found in the mkl.h header file. The table below illustrates how values of MKL_DOMAIN_NUM_THREADS are interpreted. 
5 Intel® Math Kernel Library for Linux* OS User's Guide 48Value of MKL_DOMAIN_NUM_ THREADS Interpretation MKL_DOMAIN_ALL= 4 All parts of Intel MKL should try four threads. The actual number of threads may be still different because of the MKL_DYNAMIC setting or system resource issues. The setting is equivalent to MKL_NUM_THREADS = 4. MKL_DOMAIN_ALL= 1, MKL_DOMAIN_BLAS =4 All parts of Intel MKL should try one thread, except for BLAS, which is suggested to try four threads. MKL_DOMAIN_VML= 2 VML should try two threads. The setting affects no other part of Intel MKL. Be aware that the domain-specific settings take precedence over the overall ones. For example, the "MKL_DOMAIN_BLAS=4" value of MKL_DOMAIN_NUM_THREADS suggests trying four threads for BLAS, regardless of later setting MKL_NUM_THREADS, and a function call "mkl_domain_set_num_threads ( 4, MKL_DOMAIN_BLAS );" suggests the same, regardless of later calls to mkl_set_num_threads(). However, a function call with input "MKL_DOMAIN_ALL", such as "mkl_domain_set_num_threads (4, MKL_DOMAIN_ALL);" is equivalent to "mkl_set_num_threads(4)", and thus it will be overwritten by later calls to mkl_set_num_threads. Similarly, the environment setting of MKL_DOMAIN_NUM_THREADS with "MKL_DOMAIN_ALL=4" will be overwritten with MKL_NUM_THREADS = 2. Whereas the MKL_DOMAIN_NUM_THREADS environment variable enables you set several variables at once, for example, "MKL_DOMAIN_BLAS=4,MKL_DOMAIN_FFT=2", the corresponding function does not take string syntax. So, to do the same with the function calls, you may need to make several calls, which in this example are as follows: mkl_domain_set_num_threads ( 4, MKL_DOMAIN_BLAS ); mkl_domain_set_num_threads ( 2, MKL_DOMAIN_FFT ); Setting the Environment Variables for Threading Control To set the environment variables used for threading control, in the command shell in which the program is going to run, enter the export or set commands, depending on the shell you use. For example, for a bash shell, use the export commands: export = For example: export MKL_NUM_THREADS=4 export MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=4" export MKL_DYNAMIC=FALSE For the csh or tcsh shell, use the set commands. set =. For example: set MKL_NUM_THREADS=4 set MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=4" set MKL_DYNAMIC=FALSE Tips and Techniques to Improve Performance Managing Performance and Memory 5 49Coding Techniques To obtain the best performance with Intel MKL, ensure the following data alignment in your source code: • Align arrays on 16-byte boundaries. See Aligning Addresses on 16-byte Boundaries for how to do it. • Make sure leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16, where element_size is the size of an array element in bytes. • For two-dimensional arrays, avoid leading dimension values divisible by 2048 bytes. For example, for a double-precision array, with element_size = 8, avoid leading dimensions 256, 512, 768, 1024, … (elements). LAPACK Packed Routines The routines with the names that contain the letters HP, OP, PP, SP, TP, UP in the matrix type and storage position (the second and third letters respectively) operate on the matrices in the packed format (see LAPACK "Routine Naming Conventions" sections in the Intel MKL Reference Manual). Their functionality is strictly equivalent to the functionality of the unpacked routines with the names containing the letters HE, OR, PO, SY, TR, UN in the same positions, but the performance is significantly lower. 
If the memory restriction is not too tight, use an unpacked routine for better performance. In this case, you need to allocate N 2 /2 more memory than the memory required by a respective packed routine, where N is the problem size (the number of equations). For example, to speed up solving a symmetric eigenproblem with an expert driver, use the unpacked routine: call dsyevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z, ldz, work, lwork, iwork, ifail, info) where a is the dimension lda-by-n, which is at least N 2 elements, instead of the packed routine: call dspevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz, work, iwork, ifail, info) where ap is the dimension N*(N+1)/2. FFT Functions Additional conditions can improve performance of the FFT functions. The addresses of the first elements of arrays and the leading dimension values, in bytes (n*element_size), of two-dimensional arrays should be divisible by cache line size, which equals: • 32 bytes for the Intel ® Pentium® III processors • 64 bytes for the Intel ® Pentium® 4 processors and processors using Intel ® 64 architecture Hardware Configuration Tips Dual-Core Intel® Xeon® processor 5100 series systems To get the best performance with Intel MKL on Dual-Core Intel ® Xeon® processor 5100 series systems, enable the Hardware DPL (streaming data) Prefetcher functionality of this processor. To configure this functionality, use the appropriate BIOS settings, as described in your BIOS documentation. 5 Intel® Math Kernel Library for Linux* OS User's Guide 50Intel® Hyper-Threading Technology Intel ® Hyper-Threading Technology (Intel ® HT Technology) is especially effective when each thread performs different types of operations and when there are under-utilized resources on the processor. However, Intel MKL fits neither of these criteria because the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance by disabling Intel HT Technology. If you run with Intel HT Technology enabled, performance may be especially impacted if you run on fewer threads than physical cores. Moreover, if, for example, there are two threads to every physical core, the thread scheduler may assign two threads to some cores and ignore the other cores altogether. If you are using the OpenMP* library of the Intel Compiler, read the respective User Guide on how to best set the thread affinity interface to avoid this situation. For Intel MKL, apply the following setting: set KMP_AFFINITY=granularity=fine,compact,1,0 See Also Using Parallelism of the Intel® Math Kernel Library Managing Multi-core Performance You can obtain best performance on systems with multi-core processors by requiring that threads do not migrate from core to core. To do this, bind threads to the CPU cores by setting an affinity mask to threads. 
Use one of the following options:
• OpenMP facilities (recommended, if available), for example, the KMP_AFFINITY environment variable using the Intel OpenMP library
• A system function, as explained below

Consider the following performance issue:
• The system has two sockets with two cores each, for a total of four cores (CPUs).
• The two-thread parallel application that calls the Intel MKL FFT happens to run faster than in four threads, but the performance in two threads is very unstable.

The following code example shows how to resolve this issue by setting an affinity mask by operating system means using the Intel compiler. The code calls the system function sched_setaffinity to bind the threads to the cores on different sockets. Then the Intel MKL FFT function is called:

#define _GNU_SOURCE  // for using the GNU CPU affinity
// (works with the appropriate kernel and glibc)
// Set affinity mask
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <omp.h>
int main(void) {
    int NCPUs = sysconf(_SC_NPROCESSORS_CONF);
    printf("Using thread affinity on %i NCPUs\n", NCPUs);
#pragma omp parallel default(shared)
    {
        cpu_set_t new_mask;
        cpu_set_t was_mask;
        int tid = omp_get_thread_num();
        CPU_ZERO(&new_mask);
        // 2 packages x 2 cores/pkg x 1 threads/core (4 total cores)
        CPU_SET(tid==0 ? 0 : 2, &new_mask);
        if (sched_getaffinity(0, sizeof(was_mask), &was_mask) == -1) {
            printf("Error: sched_getaffinity(%d, sizeof(was_mask), &was_mask)\n", tid);
        }
        if (sched_setaffinity(0, sizeof(new_mask), &new_mask) == -1) {
            printf("Error: sched_setaffinity(%d, sizeof(new_mask), &new_mask)\n", tid);
        }
        printf("tid=%d new_mask=%08X was_mask=%08X\n", tid,
               *(unsigned int*)(&new_mask), *(unsigned int*)(&was_mask));
    }
    // Call Intel MKL FFT function
    return 0;
}

Compile the application with the Intel compiler using the following command:

icc test_application.c -openmp

where test_application.c is the filename for the application. Build the application. Run it in two threads, for example, by using the environment variable to set the number of threads:

env OMP_NUM_THREADS=2 ./a.out

See the Linux Programmer's Manual (in man pages format) for particulars of the sched_setaffinity function used in the above example.

Operating on Denormals

The IEEE 754-2008 standard, "IEEE Standard for Floating-Point Arithmetic", defines denormal (or subnormal) numbers as non-zero numbers smaller than the smallest possible normalized numbers for a specific floating-point format. Floating-point operations on denormals are slower than on normalized operands because denormal operands and results are usually handled through a software assist mechanism rather than directly in hardware. This software processing causes Intel MKL functions that consume denormals to run slower than with normalized floating-point numbers. You can mitigate this performance issue by setting the appropriate bit fields in the MXCSR floating-point control register to flush denormals to zero (FTZ) or to replace any denormals loaded from memory with zero (DAZ). Check your compiler documentation to determine whether it has options to control FTZ and DAZ. Note that these compiler options may slightly affect accuracy.

FFT Optimized Radices

You can improve the performance of Intel MKL FFT if the length of your data vector permits factorization into powers of optimized radices. In Intel MKL, the optimized radices are 2, 3, 5, 7, 11, and 13.
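To make this radix rule concrete, the following small helper is one way to check whether a given transform length factors entirely into the optimized radices before you create an FFT descriptor. The helper is not part of Intel MKL; the function name and the sample lengths are made up for illustration only.

#include <stdio.h>

/* Hypothetical helper: returns 1 if n > 0 factors completely into the
   Intel MKL FFT optimized radices 2, 3, 5, 7, 11, and 13; 0 otherwise. */
static int uses_optimized_radices_only(long n)
{
    static const int radices[6] = { 2, 3, 5, 7, 11, 13 };
    int i;
    if (n < 1) return 0;
    for (i = 0; i < 6; i++)
        while (n % radices[i] == 0)
            n /= radices[i];
    return n == 1;
}

int main(void)
{
    /* 1024 = 2^10 and 1000 = 2^3 * 5^3 factor into optimized radices;
       1021 is prime, so a transform of that length cannot use them. */
    long lengths[3] = { 1024, 1000, 1021 };
    int i;
    for (i = 0; i < 3; i++)
        printf("%ld: %s\n", lengths[i],
               uses_optimized_radices_only(lengths[i]) ? "optimized radices only"
                                                       : "contains other prime factors");
    return 0;
}

If a length fails this check, padding the data to the next length that passes it (and ignoring the padded portion of the result where appropriate) is a common way to recover the faster code path.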
Using Memory Management Intel MKL Memory Management Software Intel MKL has memory management software that controls memory buffers for the use by the library functions. New buffers that the library allocates when your application calls Intel MKL are not deallocated until the program ends. To get the amount of memory allocated by the memory management software, call the mkl_mem_stat() function. If your program needs to free memory, call mkl_free_buffers(). If another call is made to a library function that needs a memory buffer, the memory manager again allocates the buffers and they again remain allocated until either the program ends or the program deallocates the memory. This behavior facilitates better performance. However, some tools may report this behavior as a memory leak. The memory management software is turned on by default. To turn it off, set the MKL_DISABLE_FAST_MM environment variable to any value or call the mkl_disable_fast_mm() function. Be aware that this change may negatively impact performance of some Intel MKL routines, especially for small problem sizes. 5 Intel® Math Kernel Library for Linux* OS User's Guide 52Redefining Memory Functions In C/C++ programs, you can replace Intel MKL memory functions that the library uses by default with your own functions. To do this, use the memory renaming feature. Memory Renaming Intel MKL memory management by default uses standard C run-time memory functions to allocate or free memory. These functions can be replaced using memory renaming. Intel MKL accesses the memory functions by pointers i_malloc, i_free, i_calloc, and i_realloc, which are visible at the application level. These pointers initially hold addresses of the standard C run-time memory functions malloc, free, calloc, and realloc, respectively. You can programmatically redefine values of these pointers to the addresses of your application's memory management functions. Redirecting the pointers is the only correct way to use your own set of memory management functions. If you call your own memory functions without redirecting the pointers, the memory will get managed by two independent memory management packages, which may cause unexpected memory issues. How to Redefine Memory Functions To redefine memory functions, use the following procedure: 1. Include the i_malloc.h header file in your code. This header file contains all declarations required for replacing the memory allocation functions. The header file also describes how memory allocation can be replaced in those Intel libraries that support this feature. 2. Redefine values of pointers i_malloc, i_free, i_calloc, and i_realloc prior to the first call to MKL functions, as shown in the following example: #include "i_malloc.h" . . . i_malloc = my_malloc; i_calloc = my_calloc; i_realloc = my_realloc; i_free = my_free; . . . // Now you may call Intel MKL functions Managing Performance and Memory 5 535 Intel® Math Kernel Library for Linux* OS User's Guide 54Language-specific Usage Options 6 The Intel® Math Kernel Library (Intel® MKL) provides broad support for Fortran and C/C++ programming. However, not all functions support both Fortran and C interfaces. For example, some LAPACK functions have no C interface. You can call such functions from C using mixed-language programming. 
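As a quick illustration of such mixed-language programming, the sketch below calls the Fortran LAPACK routine dgetrf directly from C++ code. It is an illustrative example only: it assumes the LP64 interface (so Fortran INTEGER maps to a 32-bit int) and the lower-case, trailing-underscore entry point, and it spells out the prototype by hand for clarity; in a real program you would typically include mkl.h, which declares the LAPACK prototypes. The conventions it relies on (pass-by-address, column-major storage, case-insensitive names) are explained in the sections that follow.

// Minimal mixed-language sketch (illustrative only): calling the Fortran
// LAPACK routine dgetrf from C++ under the LP64 interface.
#include <stdio.h>

extern "C" void dgetrf_(const int *m, const int *n, double *a,
                        const int *lda, int *ipiv, int *info);

int main()
{
    int n = 2, lda = 2, info = 0;
    int ipiv[2];
    // The 2x2 matrix | 4  3 |
    //                | 6  3 |
    // stored column by column, as Fortran expects.
    double a[4] = { 4.0, 6.0, 3.0, 3.0 };

    dgetrf_(&n, &n, a, &lda, ipiv, &info);   // LU factorization with partial pivoting

    printf("dgetrf returned info = %d\n", info);
    return info;
}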
If you want to use LAPACK or BLAS functions that support Fortran 77 in the Fortran 95 environment, additional effort may be initially required to build compiler-specific interface libraries and modules from the source code provided with Intel MKL. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Using Language-Specific Interfaces with Intel® Math Kernel Library This section discusses mixed-language programming and the use of language-specific interfaces with Intel MKL. See also Appendix G in the Intel MKL Reference Manual for details of the FFTW interfaces to Intel MKL. Interface Libraries and Modules You can create the following interface libraries and modules using the respective makefiles located in the interfaces directory. File name Contains Libraries, in Intel MKL architecture-specific directories libmkl_blas95.a 1 Fortran 95 wrappers for BLAS (BLAS95) for IA-32 architecture. libmkl_blas95_ilp64.a 1 Fortran 95 wrappers for BLAS (BLAS95) supporting LP64 interface. libmkl_blas95_lp64.a 1 Fortran 95 wrappers for BLAS (BLAS95) supporting ILP64 interface. libmkl_lapack95.a 1 Fortran 95 wrappers for LAPACK (LAPACK95) for IA-32 architecture. libmkl_lapack95_lp64.a 1 Fortran 95 wrappers for LAPACK (LAPACK95) supporting LP64 interface. libmkl_lapack95_ilp64.a 1 Fortran 95 wrappers for LAPACK (LAPACK95) supporting ILP64 interface. 55File name Contains libfftw2xc_intel.a 1 Interfaces for FFTW version 2.x (C interface for Intel compilers) to call Intel MKL FFTs. libfftw2xc_gnu.a Interfaces for FFTW version 2.x (C interface for GNU compilers) to call Intel MKL FFTs. libfftw2xf_intel.a Interfaces for FFTW version 2.x (Fortran interface for Intel compilers) to call Intel MKL FFTs. libfftw2xf_gnu.a Interfaces for FFTW version 2.x (Fortran interface for GNU compiler) to call Intel MKL FFTs. libfftw3xc_intel.a 2 Interfaces for FFTW version 3.x (C interface for Intel compiler) to call Intel MKL FFTs. libfftw3xc_gnu.a Interfaces for FFTW version 3.x (C interface for GNU compilers) to call Intel MKL FFTs. libfftw3xf_intel.a 2 Interfaces for FFTW version 3.x (Fortran interface for Intel compilers) to call Intel MKL FFTs. libfftw3xf_gnu.a Interfaces for FFTW version 3.x (Fortran interface for GNU compilers) to call Intel MKL FFTs. libfftw2x_cdft_SINGLE.a Single-precision interfaces for MPI FFTW version 2.x (C interface) to call Intel MKL cluster FFTs. libfftw2x_cdft_DOUBLE.a Double-precision interfaces for MPI FFTW version 2.x (C interface) to call Intel MKL cluster FFTs. libfftw3x_cdft.a Interfaces for MPI FFTW version 3.x (C interface) to call Intel MKL cluster FFTs. libfftw3x_cdft_ilp64.a Interfaces for MPI FFTW version 3.x (C interface) to call Intel MKL cluster FFTs supporting the ILP64 interface. 
Modules, in architecture- and interface-specific subdirectories of the Intel MKL include directory blas95.mod 1 Fortran 95 interface module for BLAS (BLAS95). lapack95.mod 1 Fortran 95 interface module for LAPACK (LAPACK95). f95_precision.mod 1 Fortran 95 definition of precision parameters for BLAS95 and LAPACK95. mkl95_blas.mod 1 Fortran 95 interface module for BLAS (BLAS95), identical to blas95.mod. To be removed in one of the future releases. mkl95_lapack.mod 1 Fortran 95 interface module for LAPACK (LAPACK95), identical to lapack95.mod. To be removed in one of the future releases. mkl95_precision.mod 1 Fortran 95 definition of precision parameters for BLAS95 and LAPACK95, identical to f95_precision.mod. To be removed in one of the future releases. mkl_service.mod 1 Fortran 95 interface module for Intel MKL support functions. 1 Prebuilt for the Intel® Fortran compiler 2 FFTW3 interfaces are integrated with Intel MKL. Look into /interfaces/fftw3x*/ makefile for options defining how to build and where to place the standalone library with the wrappers. See Also Fortran 95 Interfaces to LAPACK and BLAS 6 Intel® Math Kernel Library for Linux* OS User's Guide 56Fortran 95 Interfaces to LAPACK and BLAS Fortran 95 interfaces are compiler-dependent. Intel MKL provides the interface libraries and modules precompiled with the Intel® Fortran compiler. Additionally, the Fortran 95 interfaces and wrappers are delivered as sources. (For more information, see Compiler-dependent Functions and Fortran 90 Modules). If you are using a different compiler, build the appropriate library and modules with your compiler and link the library as a user's library: 1. Go to the respective directory /interfaces/blas95 or / interfaces/lapack95 2. Type one of the following commands depending on your architecture: • For the IA-32 architecture, make libia32 INSTALL_DIR= • For the Intel® 64 architecture, make libintel64 [interface=lp64|ilp64] INSTALL_DIR= Important The parameter INSTALL_DIR is required. As a result, the required library is built and installed in the /lib directory, and the .mod files are built and installed in the /include/[/{lp64|ilp64}] directory, where is one of {ia32, intel64}. By default, the ifort compiler is assumed. You may change the compiler with an additional parameter of make: FC=. For example, the command make libintel64 FC=pgf95 INSTALL_DIR= interface=lp64 builds the required library and .mod files and installs them in subdirectories of . To delete the library from the building directory, use one of the following commands: • For the IA-32 architecture, make cleania32 INSTALL_DIR= • For the Intel ® 64 architecture, make cleanintel64 [interface=lp64|ilp64] INSTALL_DIR= • For all the architectures, make clean INSTALL_DIR= CAUTION Even if you have administrative rights, avoid setting INSTALL_DIR=../.. or INSTALL_DIR= in a build or clean command above because these settings replace or delete the Intel MKL prebuilt Fortran 95 library and modules. Compiler-dependent Functions and Fortran 90 Modules Compiler-dependent functions occur whenever the compiler inserts into the object code function calls that are resolved in its run-time library (RTL). Linking of such code without the appropriate RTL will result in undefined symbols. Intel MKL has been designed to minimize RTL dependencies. In cases where RTL dependencies might arise, the functions are delivered as source code and you need to compile the code with whatever compiler you are using for your application. 
Language-specific Usage Options 6 57In particular, Fortran 90 modules result in the compiler-specific code generation requiring RTL support. Therefore, Intel MKL delivers these modules compiled with the Intel compiler, along with source code, to be used with different compilers. Mixed-language Programming with the Intel Math Kernel Library Appendix A: Intel(R) Math Kernel Library Language Interfaces Support lists the programming languages supported for each Intel MKL function domain. However, you can call Intel MKL routines from different language environments. Calling LAPACK, BLAS, and CBLAS Routines from C/C++ Language Environments Not all Intel MKL function domains support both C and Fortran environments. To use Intel MKL Fortran-style functions in C/C++ environments, you should observe certain conventions, which are discussed for LAPACK and BLAS in the subsections below. CAUTION Avoid calling BLAS 95/LAPACK 95 from C/C++. Such calls require skills in manipulating the descriptor of a deferred-shape array, which is the Fortran 90 type. Moreover, BLAS95/LAPACK95 routines contain links to a Fortran RTL. LAPACK and BLAS Because LAPACK and BLAS routines are Fortran-style, when calling them from C-language programs, follow the Fortran-style calling conventions: • Pass variables by address, not by value. Function calls in Example "Calling a Complex BLAS Level 1 Function from C++" and Example "Using CBLAS Interface Instead of Calling BLAS Directly from C" illustrate this. • Store your data in Fortran style, that is, column-major rather than row-major order. With row-major order, adopted in C, the last array index changes most quickly and the first one changes most slowly when traversing the memory segment where the array is stored. With Fortran-style columnmajor order, the last index changes most slowly whereas the first index changes most quickly (as illustrated by the figure below for a two-dimensional array). For example, if a two-dimensional matrix A of size mxn is stored densely in a one-dimensional array B, you can access a matrix element like this: A[i][j] = B[i*n+j] in C ( i=0, ... , m-1, j=0, ... , -1) A(i,j) = B(j*m+i) in Fortran ( i=1, ... , m, j=1, ... , n). When calling LAPACK or BLAS routines from C, be aware that because the Fortran language is caseinsensitive, the routine names can be both upper-case or lower-case, with or without the trailing underscore. For example, the following names are equivalent: 6 Intel® Math Kernel Library for Linux* OS User's Guide 58• LAPACK: dgetrf, DGETRF, dgetrf_, and DGETRF_ • BLAS: dgemm, DGEMM, dgemm_, and DGEMM_ See Example "Calling a Complex BLAS Level 1 Function from C++" on how to call BLAS routines from C. See also the Intel(R) MKL Reference Manual for a description of the C interface to LAPACK functions. CBLAS Instead of calling BLAS routines from a C-language program, you can use the CBLAS interface. CBLAS is a C-style interface to the BLAS routines. You can call CBLAS routines using regular C-style calls. Use the mkl.h header file with the CBLAS interface. The header file specifies enumerated values and prototypes of all the functions. It also determines whether the program is being compiled with a C++ compiler, and if it is, the included file will be correct for use with C++ compilation. Example "Using CBLAS Interface Instead of Calling BLAS Directly from C" illustrates the use of the CBLAS interface. 
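As an additional illustration of the CBLAS interface, the following sketch multiplies two small matrices with cblas_dgemm. It is illustrative only and assumes the LP64 interface, where plain int arguments match MKL_INT. Note that, unlike the Fortran-style calls described above, the data stays in natural C row-major order and the scalar arguments are passed by value.

#include <stdio.h>
#include "mkl.h"

int main()
{
    // C = A * B, with A 2x3, B 3x2, and C 2x2, all stored row-major.
    double a[2*3] = { 1.0, 2.0, 3.0,
                      4.0, 5.0, 6.0 };
    double b[3*2] = { 1.0, 0.0,
                      0.0, 1.0,
                      1.0, 1.0 };
    double c[2*2] = { 0.0, 0.0,
                      0.0, 0.0 };
    // In row-major order the leading dimensions are the row lengths
    // (that is, the number of columns) of each matrix.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 3,          // m, n, k
                1.0, a, 3,        // alpha, A, lda
                b, 2,             // B, ldb
                0.0, c, 2);       // beta, C, ldc
    // Expected result: C = |  4  5 |
    //                      | 10 11 |
    printf("C = [ %g %g ; %g %g ]\n", c[0], c[1], c[2], c[3]);
    return 0;
}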
C Interface to LAPACK Instead of calling LAPACK routines from a C-language program, you can use the C interface to LAPACK provided by Intel MKL. The C interface to LAPACK is a C-style interface to the LAPACK routines. This interface supports matrices in row-major and column-major order, which you can define in the first function argument matrix_order. Use the mkl_lapacke.h header file with the C interface to LAPACK. The header file specifies constants and prototypes of all the functions. It also determines whether the program is being compiled with a C++ compiler, and if it is, the included file will be correct for use with C++ compilation. You can find examples of the C interface to LAPACK in the examples/lapacke subdirectory in the Intel MKL installation directory. Using Complex Types in C/C++ As described in the documentation for the Intel® Fortran Compiler XE, C/C++ does not directly implement the Fortran types COMPLEX(4) and COMPLEX(8). However, you can write equivalent structures. The type COMPLEX(4) consists of two 4-byte floating-point numbers. The first of them is the real-number component, and the second one is the imaginary-number component. The type COMPLEX(8) is similar to COMPLEX(4) except that it contains two 8-byte floating-point numbers. Intel MKL provides complex types MKL_Complex8 and MKL_Complex16, which are structures equivalent to the Fortran complex types COMPLEX(4) and COMPLEX(8), respectively. The MKL_Complex8 and MKL_Complex16 types are defined in the mkl_types.h header file. You can use these types to define complex data. You can also redefine the types with your own types before including the mkl_types.h header file. The only requirement is that the types must be compatible with the Fortran complex layout, that is, the complex type must be a pair of real numbers for the values of real and imaginary parts. For example, you can use the following definitions in your C++ code: #define MKL_Complex8 std::complex and #define MKL_Complex16 std::complex See Example "Calling a Complex BLAS Level 1 Function from C++" for details. You can also define these types in the command line: -DMKL_Complex8="std::complex" -DMKL_Complex16="std::complex" See Also Intel® Software Documentation Library Language-specific Usage Options 6 59Calling BLAS Functions that Return the Complex Values in C/C++ Code Complex values that functions return are handled differently in C and Fortran. Because BLAS is Fortran-style, you need to be careful when handling a call from C to a BLAS function that returns complex values. However, in addition to normal function calls, Fortran enables calling functions as though they were subroutines, which provides a mechanism for returning the complex value correctly when the function is called from a C program. When a Fortran function is called as a subroutine, the return value is the first parameter in the calling sequence. You can use this feature to call a BLAS function from C. The following example shows how a call to a Fortran function as a subroutine converts to a call from C and the hidden parameter result gets exposed: Normal Fortran function call: result = cdotc( n, x, 1, y, 1 ) A call to the function as a subroutine: call cdotc( result, n, x, 1, y, 1) A call to the function from C: cdotc( &result, &n, x, &one, y, &one ) NOTE Intel MKL has both upper-case and lower-case entry points in the Fortran-style (caseinsensitive) BLAS, with or without the trailing underscore. So, all these names are equivalent and acceptable: cdotc, CDOTC, cdotc_, and CDOTC_. 
The above example shows one of the ways to call several level 1 BLAS functions that return complex values from your C and C++ applications. An easier way is to use the CBLAS interface. For instance, you can call the same function using the CBLAS interface as follows:

cblas_cdotu( n, x, 1, y, 1, &result )

NOTE The complex value comes last on the argument list in this case.

The following examples show use of the Fortran-style BLAS interface from C and C++, as well as the CBLAS (C language) interface:
• Example "Calling a Complex BLAS Level 1 Function from C"
• Example "Calling a Complex BLAS Level 1 Function from C++"
• Example "Using CBLAS Interface Instead of Calling BLAS Directly from C"

Example "Calling a Complex BLAS Level 1 Function from C"

The example below illustrates a call from a C program to the complex BLAS Level 1 function zdotc(). This function computes the dot product of two double-precision complex vectors. In this example, the complex dot product is returned in the structure c.

#include <stdio.h>
#include "mkl.h"
#define N 5
int main()
{
    int n = N, inca = 1, incb = 1, i;
    MKL_Complex16 a[N], b[N], c;
    for( i = 0; i < n; i++ ){
        a[i].real = (double)i;
        a[i].imag = (double)i * 2.0;
        b[i].real = (double)(n - i);
        b[i].imag = (double)i * 2.0;
    }
    zdotc( &c, &n, a, &inca, b, &incb );
    printf( "The complex dot product is: ( %6.2f, %6.2f)\n", c.real, c.imag );
    return 0;
}

Example "Calling a Complex BLAS Level 1 Function from C++"

Below is the C++ implementation:

#include <complex>
#include <iostream>
#define MKL_Complex16 std::complex<double>
#include "mkl.h"
#define N 5
int main()
{
    int n, inca = 1, incb = 1, i;
    std::complex<double> a[N], b[N], c;
    n = N;
    for( i = 0; i < n; i++ ){
        a[i] = std::complex<double>(i, i*2.0);
        b[i] = std::complex<double>(n - i, i*2.0);
    }
    zdotc( &c, &n, a, &inca, b, &incb );
    std::cout << "The complex dot product is: " << c << std::endl;
    return 0;
}

Example "Using CBLAS Interface Instead of Calling BLAS Directly from C"

This example uses CBLAS:

#include <stdio.h>
#include "mkl.h"
typedef struct{ double re; double im; } complex16;
#define N 5
int main()
{
    int n, inca = 1, incb = 1, i;
    complex16 a[N], b[N], c;
    n = N;
    for( i = 0; i < n; i++ ){
        a[i].re = (double)i;
        a[i].im = (double)i * 2.0;
        b[i].re = (double)(n - i);
        b[i].im = (double)i * 2.0;
    }
    cblas_zdotc_sub(n, a, inca, b, incb, &c );
    printf( "The complex dot product is: ( %6.2f, %6.2f)\n", c.re, c.im );
    return 0;
}

Support for Boost uBLAS Matrix-matrix Multiplication

If you are used to uBLAS, you can perform BLAS matrix-matrix multiplication in C++ using the Intel MKL substitution of Boost uBLAS functions. uBLAS is the Boost C++ open-source library that provides BLAS functionality for dense, packed, and sparse matrices. The library uses an expression template technique for passing expressions as function arguments, which enables evaluating vector and matrix expressions in one pass without temporary matrices. uBLAS provides two modes:
• Debug (safe) mode, default. Checks types and conformance.
• Release (fast) mode. Does not check types and conformance. To enable this mode, use the NDEBUG preprocessor symbol.
The documentation for Boost uBLAS is available at www.boost.org.

Intel MKL provides overloaded prod() functions for substituting uBLAS dense matrix-matrix multiplication with the Intel MKL gemm calls. Though these functions break uBLAS expression templates and introduce temporary matrices, the performance advantage can be considerable for matrix sizes that are not too small (roughly, over 50).
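For readers who want to see what such a substitution looks like in practice, here is a minimal sketch. The matrix size and values are arbitrary, and it assumes Boost is installed and the Intel MKL include and library paths are set up; the exact header and link requirements are listed right after this example. Build it in release mode, that is, with NDEBUG defined, so that the prod() call is actually routed to Intel MKL gemm.

// Minimal illustration (not from the Intel MKL examples): dense matrix-matrix
// multiplication through Boost uBLAS prod(), substituted by Intel MKL gemm
// when built with -DNDEBUG and the Intel MKL uBLAS header.
#include <cstdio>
#include <boost/numeric/ublas/matrix.hpp>
#include "mkl_boost_ublas_matrix_prod.hpp"

namespace ublas = boost::numeric::ublas;

int main()
{
    const std::size_t n = 512;            // large enough for the substitution to pay off
    ublas::matrix<double> a(n, n), b(n, n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            a(i, j) = 1.0;
            b(i, j) = 2.0;
        }
    ublas::matrix<double> c = prod(a, b); // dispatched to Intel MKL dgemm in release mode
    std::printf("c(0,0) = %g (expected %g)\n", c(0, 0), 2.0 * n);
    return 0;
}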
You do not need to change your source code to use the functions. To call them: • Include the header file mkl_boost_ublas_matrix_prod.hpp in your code (from the Intel MKL include directory) • Add appropriate Intel MKL libraries to the link line. The list of expressions that are substituted follows: prod( m1, m2 ) prod( trans(m1), m2 ) prod( trans(conj(m1)), m2 ) prod( conj(trans(m1)), m2 ) prod( m1, trans(m2) ) prod( trans(m1), trans(m2) ) prod( trans(conj(m1)), trans(m2) ) prod( conj(trans(m1)), trans(m2) ) prod( m1, trans(conj(m2)) ) prod( trans(m1), trans(conj(m2)) ) prod( trans(conj(m1)), trans(conj(m2)) ) prod( conj(trans(m1)), trans(conj(m2)) ) prod( m1, conj(trans(m2)) ) prod( trans(m1), conj(trans(m2)) ) prod( trans(conj(m1)), conj(trans(m2)) ) prod( conj(trans(m1)), conj(trans(m2)) ) These expressions are substituted in the release mode only (with NDEBUG preprocessor symbol defined). Supported uBLAS versions are Boost 1.34.1 and higher. To get them, visit www.boost.org. A code example provided in the /examples/ublas/source/sylvester.cpp file illustrates usage of the Intel MKL uBLAS header file for solving a special case of the Sylvester equation. To run the Intel MKL ublas examples, specify the BOOST_ROOT parameter in the make command, for instance, when using Boost version 1.37.0: make libia32 BOOST_ROOT = /boost_1_37_0 See Also Using Code Examples Invoking Intel MKL Functions from Java* Applications Intel MKL Java* Examples To demonstrate binding with Java, Intel MKL includes a set of Java examples in the following directory: /examples/java. The examples are provided for the following MKL functions: • ?gemm, ?gemv, and ?dot families from CBLAS • The complete set of non-cluster FFT functions 6 Intel® Math Kernel Library for Linux* OS User's Guide 62• ESSL 1 -like functions for one-dimensional convolution and correlation • VSL Random Number Generators (RNG), except user-defined ones and file subroutines • VML functions, except GetErrorCallBack, SetErrorCallBack, and ClearErrorCallBack You can see the example sources in the following directory: /examples/java/examples. The examples are written in Java. They demonstrate usage of the MKL functions with the following variety of data: • 1- and 2-dimensional data sequences • Real and complex types of the data • Single and double precision However, the wrappers, used in the examples, do not: • Demonstrate the use of large arrays (>2 billion elements) • Demonstrate processing of arrays in native memory • Check correctness of function parameters • Demonstrate performance optimizations The examples use the Java Native Interface (JNI* developer framework) to bind with Intel MKL. The JNI documentation is available from http://java.sun.com/javase/6/docs/technotes/guides/jni/. The Java example set includes JNI wrappers that perform the binding. The wrappers do not depend on the examples and may be used in your Java applications. The wrappers for CBLAS, FFT, VML, VSL RNG, and ESSL-like convolution and correlation functions do not depend on each other. To build the wrappers, just run the examples. The makefile builds the wrapper binaries. After running the makefile, you can run the examples, which will determine whether the wrappers were built correctly. 
As a result of running the examples, the following directories will be created in /examples/ java: • docs • include • classes • bin • _results The directories docs, include, classes, and bin will contain the wrapper binaries and documentation; the directory _results will contain the testing results. For a Java programmer, the wrappers are the following Java classes: • com.intel.mkl.CBLAS • com.intel.mkl.DFTI • com.intel.mkl.ESSL • com.intel.mkl.VML • com.intel.mkl.VSL Documentation for the particular wrapper and example classes will be generated from the Java sources while building and running the examples. To browse the documentation, open the index file in the docs directory (created by the build script): /examples/java/docs/index.html. The Java wrappers for CBLAS, VML, VSL RNG, and FFT establish the interface that directly corresponds to the underlying native functions, so you can refer to the Intel MKL Reference Manual for their functionality and parameters. Interfaces for the ESSL-like functions are described in the generated documentation for the com.intel.mkl.ESSL class. Each wrapper consists of the interface part for Java and JNI stub written in C. You can find the sources in the following directory: Language-specific Usage Options 6 63/examples/java/wrappers. Both Java and C parts of the wrapper for CBLAS and VML demonstrate the straightforward approach, which you may use to cover additional CBLAS functions. The wrapper for FFT is more complicated because it needs to support the lifecycle for FFT descriptor objects. To compute a single Fourier transform, an application needs to call the FFT software several times with the same copy of the native FFT descriptor. The wrapper provides the handler class to hold the native descriptor, while the virtual machine runs Java bytecode. The wrapper for VSL RNG is similar to the one for FFT. The wrapper provides the handler class to hold the native descriptor of the stream state. The wrapper for the convolution and correlation functions mitigates the same difficulty of the VSL interface, which assumes a similar lifecycle for "task descriptors". The wrapper utilizes the ESSL-like interface for those functions, which is simpler for the case of 1-dimensional data. The JNI stub additionally encapsulates the MKL functions into the ESSL-like wrappers written in C and so "packs" the lifecycle of a task descriptor into a single call to the native method. The wrappers meet the JNI Specification versions 1.1 and 5.0 and should work with virtually every modern implementation of Java. The examples and the Java part of the wrappers are written for the Java language described in "The Java Language Specification (First Edition)" and extended with the feature of "inner classes" (this refers to late 1990s). This level of language version is supported by all versions of the Sun Java Development Kit* (JDK*) developer toolkit and compatible implementations starting from version 1.1.5, or by all modern versions of Java. The level of C language is "Standard C" (that is, C89) with additional assumptions about integer and floatingpoint data types required by the Intel MKL interfaces and the JNI header files. That is, the native float and double data types must be the same as JNI jfloat and jdouble data types, respectively, and the native int must be 4 bytes long. 1 IBM Engineering Scientific Subroutine Library (ESSL*). See Also Running the Java* Examples Running the Java* Examples The Java examples support all the C and C++ compilers that Intel MKL does. 
The makefile intended to run the examples also needs the make utility, which is typically provided with the Linux* OS distribution. To run the Java examples, the JDK* developer toolkit is required for compiling and running Java code. A Java implementation must be installed on the computer or available via the network. You may download the JDK from the vendor website. The examples should work for all versions of the JDK. However, they were tested only with the following Java implementations for all the supported architectures:
• J2SE* SDK 1.4.2, JDK 5.0 and 6.0 from Sun Microsystems, Inc. (http://sun.com/).
• JRockit* JDK 1.4.2 and 5.0 from Oracle Corporation (http://oracle.com/).
Note that the Java run-time environment* (JRE*) system, which may be pre-installed on your computer, is not enough. You need the JDK* developer toolkit that supports the following set of tools:
• java
• javac
• javah
• javadoc
To make these tools available for the examples makefile, set the JAVA_HOME environment variable and add the JDK binaries directory to the system PATH, for example, using the bash shell:
export JAVA_HOME=/home//jdk1.5.0_09
export PATH=${JAVA_HOME}/bin:${PATH}
You may also need to clear the JDK_HOME environment variable, if it is assigned a value:
unset JDK_HOME
To start the examples, use the makefile found in the Intel MKL Java examples directory:
make {soia32|sointel64|libia32|libintel64} [function=...] [compiler=...]
If you type the make command and omit the target (for example, soia32), the makefile prints help information, which explains the targets and parameters. For the examples list, see the examples.lst file in the Java examples directory.
Known Limitations of the Java* Examples
This section explains the limitations of the Java examples.
Functionality
Some Intel MKL functions may fail to work if called from the Java environment by using a wrapper, like those provided with the Intel MKL Java examples. Only the specific CBLAS, FFT, VML, VSL RNG, and convolution/correlation functions listed in the Intel MKL Java Examples section were tested with the Java environment. So, you may use the Java wrappers for these CBLAS, FFT, VML, VSL RNG, and convolution/correlation functions in your Java applications.
Performance
An Intel MKL function is expected to work faster than a similar function written in pure Java. However, the main goal of these wrappers is to provide code examples, not maximum performance. So, an Intel MKL function called from a Java application will probably work slower than the same function called from a program written in C/C++ or Fortran.
Known bugs
There are a number of known bugs in Intel MKL (identified in the Release Notes), as well as incompatibilities between different versions of the JDK. The examples and wrappers include workarounds for these problems. Look at the source code in the examples and wrappers for comments that describe the workarounds.
Coding Tips
This section provides coding tips for programming with the Intel® Math Kernel Library (Intel® MKL) to meet specific needs, such as consistent results of computations or conditional compilation.
Aligning Data for Consistent Results
Routines in Intel MKL may return different results from run to run on the same system. This is usually due to a change in the order in which floating-point operations are performed.
The two most influential factors are array alignment and parallelism. Array alignment can determine how internal loops order floating-point operations. Non-deterministic parallelism may change the order in which computational tasks are executed. While these results may differ, they should still fall within acceptable computational error bounds. To better ensure identical results from run to run, do the following:
• Align input arrays on 16-byte boundaries
• Run Intel MKL in the sequential mode
To align input arrays on 16-byte boundaries, use mkl_malloc() in place of system-provided memory allocators, as shown in the code example below. The sequential mode of Intel MKL removes the influence of non-deterministic parallelism.
Aligning Addresses on 16-byte Boundaries
// ******* C language *******
...
#include <mkl.h>
...
void *darray;
int workspace;
...
// Allocate workspace aligned on 16-byte boundary
darray = mkl_malloc( sizeof(double)*workspace, 16 );
...
// call the program using MKL
mkl_app( darray );
...
// Free workspace
mkl_free( darray );

! ******* Fortran language *******
...
double precision darray
pointer (p_wrk,darray(1))
integer workspace
...
! Allocate workspace aligned on 16-byte boundary
p_wrk = mkl_malloc( 8*workspace, 16 )
...
! call the program using MKL
call mkl_app( darray )
...
! Free workspace
call mkl_free(p_wrk)

Using Predefined Preprocessor Symbols for Intel® MKL Version-Dependent Compilation
Preprocessor symbols (macros) substitute values in a program before it is compiled. The substitution is performed in the preprocessing phase. The following predefined preprocessor symbols are available:
• __INTEL_MKL__ : Intel MKL major version
• __INTEL_MKL_MINOR__ : Intel MKL minor version
• __INTEL_MKL_UPDATE__ : Intel MKL update number
• INTEL_MKL_VERSION : Intel MKL full version, computed as (__INTEL_MKL__*100 + __INTEL_MKL_MINOR__)*100 + __INTEL_MKL_UPDATE__
For example, for Intel MKL 10.3 Update 4 the full version is (10*100 + 3)*100 + 4 = 100304.
These symbols enable conditional compilation of code that uses new features introduced in a particular version of the library. To perform conditional compilation:
1. Include in your code the file where the macros are defined:
• mkl.h for C/C++
• mkl.fi for Fortran
2. [Optionally] Use the following preprocessor directives to check whether the macro is defined:
• #ifdef, #endif for C/C++
• !DEC$IF DEFINED, !DEC$ENDIF for Fortran
3. Use preprocessor directives for conditional inclusion of code:
• #if, #endif for C/C++
• !DEC$IF, !DEC$ENDIF for Fortran
Example
Compile a part of the code only if the Intel MKL version is 10.3 Update 4:
C/C++:
#include "mkl.h"
#ifdef INTEL_MKL_VERSION
#if INTEL_MKL_VERSION == 100304
// Code to be conditionally compiled
#endif
#endif
Fortran:
include "mkl.fi"
!DEC$IF DEFINED INTEL_MKL_VERSION
!DEC$IF INTEL_MKL_VERSION .EQ. 100304
* Code to be conditionally compiled
!DEC$ENDIF
!DEC$ENDIF
Working with the Intel® Math Kernel Library Cluster Software
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Linking with ScaLAPACK and Cluster FFTs The Intel MKL ScaLAPACK and Cluster FFTs support MPI implementations identified in the Intel MKL Release Notes. To link a program that calls ScaLAPACK or Cluster FFTs, you need to know how to link a message-passing interface (MPI) application first. Use mpi scripts to do this. For example, mpicc or mpif77 are C or FORTRAN 77 scripts, respectively, that use the correct MPI header files. The location of these scripts and the MPI library depends on your MPI implementation. For example, for the default installation of MPICH, /opt/mpich/bin/mpicc and /opt/ mpich/bin/mpif77 are the compiler scripts and /opt/mpich/lib/libmpich.a is the MPI library. Check the documentation that comes with your MPI implementation for implementation-specific details of linking. To link with Intel MKL ScaLAPACK and/or Cluster FFTs, use the following general form : < linker script> \ -L [-Wl,--start-group] \ [-Wl,--end-group] where the placeholders stand for paths and libraries as explained in the following table: One of ScaLAPACK or Cluster FFT libraries for the appropriate architecture and programming interface (LP64 or ILP64). Available libraries are listed in Directory Structure in Detail. For example, for the IA-32 architecture, it is either - lmkl_scalapack_core or -lmkl_cdft_core. The BLACS library corresponding to your architecture, programming interface (LP64 or ILP64), and MPI version. Available BLACS libraries are listed in Directory Structure in Detail. For example, for the IA-32 architecture, choose one of - lmkl_blacs, -lmkl_blacs_intelmpi, or -lmkl_blacs_openmpi, depending on the MPI version you use; specifically, for Intel MPI 3.x, choose - lmkl_blacs_intelmpi. for ScaLAPACK, and for Cluster FFTs. Processor optimized kernels, threading library, and system library for threading support, linked as described in Listing Libraries on a Link Line. 69 The LAPACK library and . One of several MPI implementations (MPICH, Intel MPI, and so on). < linker script> A linker script that corresponds to the MPI version. For instance, for Intel MPI 3.x, use . For example, if you are using Intel MPI 3.x, want to statically use the LP64 interface with ScaLAPACK, and have only one MPI process per core (and thus do not use threading), specify the following linker options: -L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_scalapack_lp64.a $MKLPATH/ libmkl_blacs_intelmpi_lp64.a $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -static_mpi -Wl,--end-group -lpthread -lm NOTE Grouping symbols -Wl,--start-group and -Wl,--end-group are required for static linking. TIP Use the Link-line Advisor to quickly choose the appropriate set of , , and . See Also Linking Your Application with the Intel® Math Kernel Library Examples for Linking with ScaLAPACK and Cluster FFT Setting the Number of Threads The OpenMP* software responds to the environment variable OMP_NUM_THREADS. Intel MKL also has other mechanisms to set the number of threads, such as the MKL_NUM_THREADS or MKL_DOMAIN_NUM_THREADS environment variables (see Using Additional Threading Control). Make sure that the relevant environment variables have the same and correct values on all the nodes. 
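For example, a minimal sketch of propagating a consistent thread count to an MPI job (the -genv option is specific to the Intel MPI launcher and is an assumption here; ./your_app is a placeholder for your ScaLAPACK or Cluster FFT binary):
export OMP_NUM_THREADS=8
mpiexec -genv OMP_NUM_THREADS 8 -n 16 ./your_app
With MPI implementations other than Intel MPI, propagate the variable by the equivalent mechanism their launchers provide.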
Intel MKL versions 10.0 and higher no longer set the default number of threads to one, but depend on the OpenMP libraries used with the compiler to set the default number. For the threading layer based on the Intel compiler (libmkl_intel_thread.a), this value is the number of CPUs according to the OS. CAUTION Avoid over-prescribing the number of threads, which may occur, for instance, when the number of MPI ranks per node and the number of threads per node are both greater than one. The product of MPI ranks per node and the number of threads per node should not exceed the number of physical cores per node. The best way to set an environment variable, such as OMP_NUM_THREADS, is your login environment. Remember that changing this value on the head node and then doing your run, as you do on a sharedmemory (SMP) system, does not change the variable on all the nodes because mpirun starts a fresh default shell on all the nodes. To change the number of threads on all the nodes, in .bashrc, add a line at the top, as follows: OMP_NUM_THREADS=1; export OMP_NUM_THREADS You can run multiple CPUs per node using MPICH. To do this, build MPICH to enable multiple CPUs per node. Be aware that certain MPICH applications may fail to work perfectly in a threaded environment (see the Known Limitations section in the Release Notes. If you encounter problems with MPICH and setting of the number of threads is greater than one, first try setting the number of threads to one and see whether the problem persists. 8 Intel® Math Kernel Library for Linux* OS User's Guide 70See Also Techniques to Set the Number of Threads Using Shared Libraries All needed shared libraries must be visible on all the nodes at run time. To achieve this, point these libraries by the LD_LIBRARY_PATH environment variable in the .bashrc file. If Intel MKL is installed only on one node, link statically when building your Intel MKL applications rather than use shared libraries. The Intel compilers or GNU compilers can be used to compile a program that uses Intel MKL. However, make sure that the MPI implementation and compiler match up correctly. Building ScaLAPACK Tests To build ScaLAPACK tests, • For the IA-32 architecture, add libmkl_scalapack_core.a to your link command. • For the Intel® 64 architecture, add libmkl_scalapack_lp64.a or libmkl_scalapack_ilp64.a, depending on the desired interface. Examples for Linking with ScaLAPACK and Cluster FFT This section provides examples of linking with ScaLAPACK and Cluster FFT. Note that a binary linked with ScaLAPACK runs the same way as any other MPI application (refer to the documentation that comes with your MPI implementation). For instance, the script mpirun is used in the case of MPICH2 and OpenMPI, and a number of MPI processes is set by -np. In the case of MPICH 2.0 and all Intel MPIs, start the daemon before running your application; the execution is driven by the script mpiexec. For further linking examples, see the support website for Intel products at http://www.intel.com/software/ products/support/. See Also Directory Structure in Detail Examples for Linking a C Application These examples illustrate linking of an application whose main module is in C under the following conditions: • MPICH2 1.0.7 or higher is installed in /opt/mpich. • $MKLPATH is a user-defined variable containing /lib/ia32. • You use the Intel® C++ Compiler 10.0 or higher. 
To link with ScaLAPACK for a cluster of systems based on the IA-32 architecture, use the following link line: /opt/mpich/bin/mpicc \ -L$MKLPATH \ -lmkl_scalapack_core \ -lmkl_blacs_intelmpi \ -lmkl_intel -lmkl_intel_thread -lmkl_core \ -liomp5 -lpthread To link with Cluster FFT for a cluster of systems based on the IA-32 architecture, use the following link line: /opt/mpich/bin/mpicc \ -Wl,--start-group \ $MKLPATH/libmkl_cdft_core.a \ Working with the Intel® Math Kernel Library Cluster Software 8 71 $MKLPATH/libmkl_blacs_intelmpi.a \ $MKLPATH/libmkl_intel.a \ $MKLPATH/libmkl_intel_thread.a \ $MKLPATH/libmkl_core.a \ -Wl,--end-group \ -liomp5 -lpthread See Also Linking with ScaLAPACK and Cluster FFTs Examples for Linking a Fortran Application These examples illustrate linking of an application whose main module is in Fortran under the following conditions: • Intel MPI 3.0 is installed in /opt/intel/mpi/3.0. • $MKLPATH is a user-defined variable containing /lib/intel64 . • You use the Intel® Fortran Compiler 10.0 or higher. To link with ScaLAPACK for a cluster of systems based on the Intel® 64 architecture, use the following link line: /opt/intel/mpi/3.0/bin/mpiifort \ -L$MKLPATH \ -lmkl_scalapack_lp64 \ -lmkl_blacs_intelmpi_lp64 \ -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core \ -liomp5 -lpthread To link with Cluster FFT for a cluster of systems based on the Intel® 64 architecture, use the following link line: /opt/intel/mpi/3.0/bin/mpiifort \ -Wl,--start-group \ $MKLPATH/libmkl_cdft_core.a \ $MKLPATH/libmkl_blacs_intelmpi_ilp64.a \ $MKLPATH/libmkl_intel_ilp64.a \ $MKLPATH/libmkl_intel_thread.a \ $MKLPATH/libmkl_core.a \ -Wl,--end-group \ -liomp5 -lpthread See Also Linking with ScaLAPACK and Cluster FFTs 8 Intel® Math Kernel Library for Linux* OS User's Guide 72Programming with Intel® Math Kernel Library in the Eclipse* Integrated Development Environment (IDE) 9 Configuring the Eclipse* IDE CDT to Link with Intel MKL This section explains how to configure the Eclipse* Integrated Development Environment (IDE) C/C++ Development Tools (CDT) to link with Intel® Math Kernel Library (Intel® MKL). TIP After configuring your CDT, you can benefit from the Eclipse-provided code assist feature. See Code/Context Assist description in the CDT Help for details. To configure your Eclipse IDE CDT to link with Intel MKL, you need to perform the steps explained below. The specific instructions for performing these steps depend on your version of the CDT and on the tool-chain/ compiler integration. Refer to the CDT Help for more details. To configure your Eclipse IDE CDT, do the following: 1. Open Project Properties for your project. 2. Add the Intel MKL include path, that is, /include, to the project's include paths. 3. Add the Intel MKL library path for the target architecture to the project's library paths. For example, for the Intel® 64 architecture, add /lib/intel64. 4. Specify the names of the Intel MKL libraries to link with your application. For example, you may need the following libraries: mkl_intel_lp64, mkl_intel_thread, mkl_core, and iomp5. NOTE Because compilers typically require library names rather than file names, omit the "lib" prefix and "a" or "so" extension. See Also Selecting Libraries to Link with Linking in Detail Getting Assistance for Programming in the Eclipse* IDE Intel MKL provides an Eclipse* IDE plug-in (com.intel.mkl.help) that contains the Intel MKL Reference Manual (see High-level Directory Structure for the plug-in location after the library installation). 
To install the plug-in, do one of the following: • Use the Eclipse IDE Update Manager (recommended). To invoke the Manager, use Help > Software Updates command in your Eclipse IDE. • Copy the plug-in to the plugins folder of your Eclipse IDE directory. In this case, if you use earlier C/C++ Development Tools (CDT) versions (3.x, 4.x), delete or rename the index subfolder in the eclipse/configuration/org.eclipse.help.base folder of your Eclipse IDE to avoid delays in Index updating. The following Intel MKL features assist you while programming in the Eclipse* IDE: • The Intel MKL Reference Manual viewable from within the IDE 73• Eclipse Help search tuned to target the Intel Web sites • Code/Content Assist in the Eclipse IDE CDT The Intel MKL plug-in for Eclipse IDE provides the first two features. The last feature is native to the Eclipse IDE CDT. See the Code Assist description in Eclipse IDE Help for details. Viewing the Intel® Math Kernel Library Reference Manual in the Eclipse* IDE To view the Reference Manual, in Eclipse, 1. Select Help > Help Contents from the menu. 2. In the Help tab, under All Topics , click Intel® Math Kernel Library Help . 3. In the Help tree that expands, click Intel Math Kernel Library Reference Manual. 4. The Intel MKL Help Index is also available in Eclipse, and the Reference Manual is included in the Eclipse Help search. Searching the Intel Web Site from the Eclipse* IDE The Intel MKL plug-in tunes Eclipse Help search to targethttp://www.intel.com so that when you are connected to the Internet and run a search from the Eclipse Help pane, the search hits at the site are shown through a separate link. The following figure shows search results for "VML Functions" in Eclipse Help. In the figure, 1 hit means an entry hit to the respective site. Click "Intel.com (1 hit)" to open the list of actual hits to the Intel Web site. 9 Intel® Math Kernel Library for Linux* OS User's Guide 74Programming with Intel® Math Kernel Library in the Eclipse* Integrated Development Environment (IDE) 9 759 Intel® Math Kernel Library for Linux* OS User's Guide 76LINPACK and MP LINPACK Benchmarks 10 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Intel® Optimized LINPACK Benchmark for Linux* OS Intel® Optimized LINPACK Benchmark is a generalization of the LINPACK 1000 benchmark. It solves a dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. The generalization is in the number of equations (N) it can solve, which is not limited to 1000. It uses partial pivoting to assure the accuracy of the results. Do not use this benchmark to report LINPACK 100 performance because that is a compiled-code only benchmark. 
This is a shared-memory (SMP) implementation which runs on a single platform. Do not confuse this benchmark with: • MP LINPACK, which is a distributed memory version of the same benchmark. • LINPACK, the library, which has been expanded upon by the LAPACK library. Intel provides optimized versions of the LINPACK benchmarks to help you obtain high LINPACK benchmark results on your genuine Intel processor systems more easily than with the High Performance Linpack (HPL) benchmark. Use this package to benchmark your SMP machine. Additional information on this software as well as other Intel software performance products is available at http://www.intel.com/software/products/. Contents of the Intel® Optimized LINPACK Benchmark The Intel Optimized LINPACK Benchmark for Linux* OS contains the following files, located in the ./ benchmarks/linpack/ subdirectory of the Intel® Math Kernel Library (Intel® MKL) directory: File in ./benchmarks/ linpack/ Description xlinpack_xeon32 The 32-bit program executable for a system based on Intel® Xeon® processor or Intel® Xeon® processor MP with or without Streaming SIMD Extensions 3 (SSE3). xlinpack_xeon64 The 64-bit program executable for a system with Intel® Xeon® processor using Intel® 64 architecture. runme_xeon32 A sample shell script for executing a pre-determined problem set for linpack_xeon32. OMP_NUM_THREADS set to 2 processors. runme_xeon64 A sample shell script for executing a pre-determined problem set for linpack_xeon64. OMP_NUM_THREADS set to 4 processors. 77File in ./benchmarks/ linpack/ Description lininput_xeon32 Input file for pre-determined problem for the runme_xeon32 script. lininput_xeon64 Input file for pre-determined problem for the runme_xeon64 script. lin_xeon32.txt Result of the runme_xeon32 script execution. lin_xeon64.txt Result of the runme_xeon64 script execution. help.lpk Simple help file. xhelp.lpk Extended help file. See Also High-level Directory Structure Running the Software To obtain results for the pre-determined sample problem sizes on a given system, type one of the following, as appropriate: ./runme_xeon32 ./runme_xeon64 To run the software for other problem sizes, see the extended help included with the program. Extended help can be viewed by running the program executable with the -e option: ./xlinpack_xeon32 -e ./xlinpack_xeon64 -e The pre-defined data input fileslininput_xeon32 and lininput_xeon64 are provided merely as examples. Different systems have different number of processors or amount of memory and thus require new input files. The extended help can be used for insight into proper ways to change the sample input files. Each input file requires at least the following amount of memory: lininput_xeon32 2 GB lininput_xeon64 16 GB If the system has less memory than the above sample data input requires, you may need to edit or create your own data input files, as explained in the extended help. Each sample script uses the OMP_NUM_THREADS environment variable to set the number of processors it is targeting. To optimize performance on a different number of physical processors, change that line appropriately. If you run the Intel Optimized LINPACK Benchmark without setting the number of threads, it will default to the number of cores according to the OS. You can find the settings for this environment variable in the runme_* sample scripts. If the settings do not yet match the situation for your machine, edit the script. 
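For example, a minimal sketch of locating and then adjusting the thread setting in one of the sample scripts (the grep step only shows where the variable is set; edit that line to match your physical core count before rerunning the script):
grep OMP_NUM_THREADS ./runme_xeon64
./runme_xeon64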
Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 10 Intel® Math Kernel Library for Linux* OS User's Guide 78Known Limitations of the Intel® Optimized LINPACK Benchmark The following limitations are known for the Intel Optimized LINPACK Benchmark for Linux* OS: • Intel Optimized LINPACK Benchmark is threaded to effectively use multiple processors. So, in multiprocessor systems, best performance will be obtained with the Intel® Hyper-Threading Technology turned off, which ensures that the operating system assigns threads to physical processors only. • If an incomplete data input file is given, the binaries may either hang or fault. See the sample data input files and/or the extended help for insight into creating a correct data input file. Intel® Optimized MP LINPACK Benchmark for Clusters Overview of the Intel® Optimized MP LINPACK Benchmark for Clusters The Intel® Optimized MP LINPACK Benchmark for Clusters is based on modifications and additions to HPL 2.0 from Innovative Computing Laboratories (ICL) at the University of Tennessee, Knoxville (UTK). The Intel Optimized MP LINPACK Benchmark for Clusters can be used for Top 500 runs (see http://www.top500.org). To use the benchmark you need be intimately familiar with the HPL distribution and usage. The Intel Optimized MP LINPACK Benchmark for Clusters provides some additional enhancements and bug fixes designed to make the HPL usage more convenient, as well as explain Intel® Message-Passing Interface (MPI) settings that may enhance performance. The ./benchmarks/mp_linpack directory adds techniques to minimize search times frequently associated with long runs. The Intel® Optimized MP LINPACK Benchmark for Clusters is an implementation of the Massively Parallel MP LINPACK benchmark by means of HPL code. It solves a random dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. You can solve any size (N) system of equations that fit into memory. The benchmark uses full row pivoting to ensure the accuracy of the results. Use the Intel Optimized MP LINPACK Benchmark for Clusters on a distributed memory machine. On a shared memory machine, use the Intel Optimized LINPACK Benchmark. Intel provides optimized versions of the LINPACK benchmarks to help you obtain high LINPACK benchmark results on your systems based on genuine Intel processors more easily than with the HPL benchmark. Use the Intel Optimized MP LINPACK Benchmark to benchmark your cluster. The prebuilt binaries require that you first install Intel® MPI 3.x be installed on the cluster. The run-time version of Intel MPI is free and can be downloaded from www.intel.com/software/products/ . 
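As a quick functional check of the prebuilt pure-MPI binaries, you can invoke one of the supplied sample run scripts (a hedged sketch; the directory and file names are taken from the Contents table below, and the script assumes a working Intel MPI environment on the cluster):
cd ./benchmarks/mp_linpack/bin_intel/intel64
./runme_intel64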
The Intel package includes software developed at the University of Tennessee, Knoxville, Innovative Computing Laboratories and neither the University nor ICL endorse or promote this product. Although HPL 2.0 is redistributable under certain conditions, this particular package is subject to the Intel MKL license. Intel MKL has introduced a new functionality into MP LINPACK, which is called a hybrid build, while continuing to support the older version. The term hybrid refers to special optimizations added to take advantage of mixed OpenMP*/MPI parallelism. If you want to use one MPI process per node and to achieve further parallelism by means of OpenMP, use the hybrid build. In general, the hybrid build is useful when the number of MPI processes per core is less than one. If you want to rely exclusively on MPI for parallelism and use one MPI per core, use the non-hybrid build. In addition to supplying certain hybrid prebuilt binaries, Intel MKL supplies some hybrid prebuilt libraries for Intel® MPI to take advantage of the additional OpenMP* optimizations. If you wish to use an MPI version other than Intel MPI, you can do so by using the MP LINPACK source provided. You can use the source to build a non-hybrid version that may be used in a hybrid mode, but it would be missing some of the optimizations added to the hybrid version. Non-hybrid builds are the default of the source code makefiles provided. In some cases, the use of the hybrid mode is required for external reasons. If there is a choice, the non-hybrid code may be faster. To use the non-hybrid code in a hybrid mode, use the threaded version of Intel MKL BLAS, link with a thread-safe MPI, and call function MPI_init_thread() so as to indicate a need for MPI to be thread-safe. LINPACK and MP LINPACK Benchmarks 10 79Intel MKL also provides prebuilt binaries that are dynamically linked against Intel MPI libraries. NOTE Performance of statically and dynamically linked prebuilt binaries may be different. The performance of both depends on the version of Intel MPI you are using. You can build binaries statically linked against a particular version of Intel MPI by yourself. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Contents of the Intel® Optimized MP LINPACK Benchmark for Clusters The Intel Optimized MP LINPACK Benchmark for Clusters (MP LINPACK Benchmark) includes the HPL 2.0 distribution in its entirety, as well as the modifications delivered in the files listed in the table below and located in the ./benchmarks/mp_linpack/ subdirectory of the Intel MKL directory. Directory/File in ./benchmarks/ mp_linpack/ Contents testing/ptest/HPL_pdtest.c HPL 2.0 code modified to display captured DGEMM information in ASYOUGO2_DISPLAY if it was captured (for details, see New Features). 
src/blas/HPL_dgemm.c HPL 2.0 code modified to capture DGEMM information, if desired, from ASYOUGO2_DISPLAY. src/grid/HPL_grid_init.c HPL 2.0 code modified to do additional grid experiments originally not in HPL 2.0. src/pgesv/HPL_pdgesvK2.c HPL 2.0 code modified to do ASYOUGO and ENDEARLY modifications. src/pgesv/HPL_pdgesv0.c HPL 2.0 code modified to do ASYOUGO, ASYOUGO2, and ENDEARLY modifications. testing/ptest/HPL.dat HPL 2.0 sample HPL.dat modified. Make.ia32 (New) Sample architecture makefile for processors using the IA-32 architecture and Linux OS. Make.intel64 (New) Sample architecture makefile for processors using the Intel® 64 architecture and Linux OS. HPL.dat A repeat of testing/ptest/HPL.dat in the top-level directory. Prebuilt executables readily available for simple performance testing. bin_intel/ia32/xhpl_ia32 (New) Prebuilt binary for the IA-32 architecture and Linux OS. Statically linked against Intel® MPI 3.2. bin_intel/ia32/xhpl_ia32_dynamic (New) Prebuilt binary for the IA-32 architecture and Linux OS. Dynamically linked against Intel® MPI 3.2. 10 Intel® Math Kernel Library for Linux* OS User's Guide 80Directory/File in ./benchmarks/ mp_linpack/ Contents bin_intel/intel64/xhpl_intel64 (New) Prebuilt binary for the Intel® 64 architecture and Linux OS. Statically linked against Intel® MPI 3.2. bin_intel/intel64/ xhpl_intel64_dynamic (New) Prebuilt binary for the Intel® 64 architecture and Linux OS. Dynamically linked against Intel® MPI 3.2. Prebuilt hybrid executables bin_intel/ia32/xhpl_hybrid_ia32 (New) Prebuilt hybrid binary for the IA-32 architecture and Linux OS. Statically linked against Intel® MPI 3.2. bin_intel/ia32/ xhpl_hybrid_ia32_dynamic (New) Prebuilt hybrid binary for the IA-32 architecture and Linux OS. Dynamically linked against Intel® MPI 3.2. bin_intel/intel64/ xhpl_hybrid_intel64 (New) Prebuilt hybrid binary for the Intel® 64 architecture and Linux OS. Statically linked against Intel® MPI 3.2. bin_intel/intel64/ xhpl_hybrid_intel64_dynamic (New) Prebuilt hybrid binary for the Intel® 64 and Linux OS. Dynamically linked against Intel® MPI 3.2. Prebuilt libraries lib_hybrid/ia32/libhpl_hybrid.a (New) Prebuilt library with the hybrid version of MP LINPACK for the IA-32 architecture and Intel MPI 3.2. lib_hybrid/intel64/ libhpl_hybrid.a (New) Prebuilt library with the hybrid version of MP LINPACK for the Intel® 64 architecture and Intel MPI 3.2. Files that refer to run scripts bin_intel/ia32/runme_ia32 (New) Sample run script for the IA-32 architecture and a pure MPI binary statically linked against Intel MPI 3.2. bin_intel/ia32/ runme_ia32_dynamic (New) Sample run script for the IA-32 architecture and a pure MPI binary dynamically linked against Intel MPI 3.2. bin_intel/ia32/HPL_serial.dat (New) Example of an MP LINPACK benchmark input file for a pure MPI binary and the IA-32 architecture. bin_intel/ia32/runme_hybrid_ia32 (New) Sample run script for the IA-32 architecture and a hybrid binary statically linked against Intel MPI 3.2. bin_intel/ia32/ runme_hybrid_ia32_dynamic (New) Sample run script for the IA-32 architecture and a hybrid binary dynamically linked against Intel MPI 3.2. bin_intel/ia32/HPL_hybrid.dat (New) Example of an MP LINPACK benchmark input file for a hybrid binary and the IA-32 architecture. bin_intel/intel64/runme_intel64 (New) Sample run script for the Intel® 64 architecture and a pure MPI binary statically linked against Intel MPI 3.2. 
bin_intel/intel64/ runme_intel64_dynamic (New) Sample run script for the Intel® 64 architecture and a pure MPI binary dynamically linked against Intel MPI 3.2. bin_intel/intel64/HPL_serial.dat (New) Example of an MP LINPACK benchmark input file for a pure MPI binary and the Intel® 64 architecture. bin_intel/intel64/ runme_hybrid_intel64 (New) Sample run script for the Intel® 64 architecture and a hybrid binary statically linked against Intel MPI 3.2. LINPACK and MP LINPACK Benchmarks 10 81Directory/File in ./benchmarks/ mp_linpack/ Contents bin_intel/intel64/ runme_hybrid_intel64_dynamic (New) Sample run script for the Intel® 64 architecture and a hybrid binary dynamically linked against Intel MPI 3.2. bin_intel/intel64/HPL_hybrid.dat (New) Example of an MP LINPACK benchmark input file for a hybrid binary and the Intel® 64 architecture. nodeperf.c (New) Sample utility that tests the DGEMM speed across the cluster. See Also High-level Directory Structure Building the MP LINPACK The MP LINPACK Benchmark contains a few sample architecture makefiles. You can edit them to fit your specific configuration. Specifically: • Set TOPdir to the directory that MP LINPACK is being built in. • You may set MPI variables, that is, MPdir, MPinc, and MPlib. • Specify the location Intel MKL and of files to be used (LAdir, LAinc, LAlib). • Adjust compiler and compiler/linker options. • Specify the version of MP LINPACK you are going to build (hybrid or non-hybrid) by setting the version parameter for the make command. For example: make arch=intel64 version=hybrid install For some sample cases, like Linux systems based on the Intel® 64 architecture, the makefiles contain values that must be common. However, you need to be familiar with building an HPL and picking appropriate values for these variables. New Features of Intel® Optimized MP LINPACK Benchmark The toolset is basically identical with the HPL 2.0 distribution. There are a few changes that are optionally compiled in and disabled until you specifically request them. These new features are: ASYOUGO: Provides non-intrusive performance information while runs proceed. There are only a few outputs and this information does not impact performance. This is especially useful because many runs can go for hours without any information. ASYOUGO2: Provides slightly intrusive additional performance information by intercepting every DGEMM call. ASYOUGO2_DISPLAY: Displays the performance of all the significant DGEMMs inside the run. ENDEARLY: Displays a few performance hints and then terminates the run early. FASTSWAP: Inserts the LAPACK-optimized DLASWP into HPL's code. You can experiment with this to determine best results. HYBRID: Establishes the Hybrid OpenMP/MPI mode of MP LINPACK, providing the possibility to use threaded Intel MKL and prebuilt MP LINPACK hybrid libraries. CAUTION Use this option only with an Intel compiler and the Intel® MPI library version 3.1 or higher. You are also recommended to use the compiler version 10.0 or higher. 10 Intel® Math Kernel Library for Linux* OS User's Guide 82Benchmarking a Cluster To benchmark a cluster, follow the sequence of steps below (some of them are optional). Pay special attention to the iterative steps 3 and 4. They make a loop that searches for HPL parameters (specified in HPL.dat) that enable you to reach the top performance of your cluster. 1. Install HPL and make sure HPL is functional on all the nodes. 2. You may run nodeperf.c (included in the distribution) to see the performance of DGEMM on all the nodes. 
Compile nodeperf.c with your MPI and Intel MKL. For example: mpiicc -O3 nodeperf.c -L$MKLPATH $MKLPATH/libmkl_intel_lp64.a \ -Wl,--start-group $MKLPATH/libmkl_sequential.a \ $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread . Launching nodeperf.c on all the nodes is especially helpful in a very large cluster. nodeperf enables quick identification of the potential problem spot without numerous small MP LINPACK runs around the cluster in search of the bad node. It goes through all the nodes, one at a time, and reports the performance of DGEMM followed by some host identifier. Therefore, the higher the DGEMM performance, the faster that node was performing. 3. Edit HPL.dat to fit your cluster needs. Read through the HPL documentation for ideas on this. Note, however, that you should use at least 4 nodes. 4. Make an HPL run, using compile options such as ASYOUGO, ASYOUGO2, or ENDEARLY to aid in your search. These options enable you to gain insight into the performance sooner than HPL would normally give this insight. When doing so, follow these recommendations: • Use MP LINPACK, which is a patched version of HPL, to save time in the search. All performance intrusive features are compile-optional in MP LINPACK. That is, if you do not use the new options to reduce search time, these features are disabled. The primary purpose of the additions is to assist you in finding solutions. HPL requires a long time to search for many different parameters. In MP LINPACK, the goal is to get the best possible number. Given that the input is not fixed, there is a large parameter space you must search over. An exhaustive search of all possible inputs is improbably large even for a powerful cluster. MP LINPACK optionally prints information on performance as it proceeds. You can also terminate early. • Save time by compiling with -DENDEARLY -DASYOUGO2 and using a negative threshold (do not use a negative threshold on the final run that you intend to submit as a Top500 entry). Set the threshold in line 13 of the HPL 2.0 input file HPL.dat • If you are going to run a problem to completion, do it with -DASYOUGO. 5. Using the quick performance feedback, return to step 3 and iterate until you are sure that the performance is as good as possible. See Also Options to Reduce Search Time Options to Reduce Search Time Running large problems to completion on large numbers of nodes can take many hours. The search space for MP LINPACK is also large: not only can you run any size problem, but over a number of block sizes, grid layouts, lookahead steps, using different factorization methods, and so on. It can be a large waste of time to run a large problem to completion only to discover it ran 0.01% slower than your previous best problem. Use the following options to reduce the search time: • -DASYOUGO • -DENDEARLY • -DASYOUGO2 LINPACK and MP LINPACK Benchmarks 10 83Use -DASYOUGO2 cautiously because it does have a marginal performance impact. To see DGEMM internal performance, compile with -DASYOUGO2 and -DASYOUGO2_DISPLAY. These options provide a lot of useful DGEMM performance information at the cost of around 0.2% performance loss. If you want to use the old HPL, simply omit these options and recompile from scratch. To do this, try "make arch= clean_arch_all". -DASYOUGO -DASYOUGO gives performance data as the run proceeds. The performance always starts off higher and then drops because this actually happens in LU decomposition (a decomposition of a matrix into a product of a lower (L) and upper (U) triangular matrices). 
The ASYOUGO performance estimate is usually an overestimate (because the LU decomposition slows down as it goes), but it gets more accurate as the problem proceeds. The greater the lookahead step, the less accurate the first number may be. ASYOUGO tries to estimate where one is in the LU decomposition that MP LINPACK performs and this is always an overestimate as compared to ASYOUGO2, which measures actually achieved DGEMM performance. Note that the ASYOUGO output is a subset of the information that ASYOUGO2 provides. So, refer to the description of the -DASYOUGO2 option below for the details of the output. -DENDEARLY -DENDEARLY t erminates the problem after a few steps, so that you can set up 10 or 20 HPL runs without monitoring them, see how they all do, and then only run the fastest ones to completion. -DENDEARLY assumes -DASYOUGO. You do not need to define both, although it doesn't hurt. To avoid the residual check for a problem that terminates early, set the "threshold" parameter in HPL.dat to a negative number when testing ENDEARLY. It also sometimes gives a better picture to compile with -DASYOUGO2 when using - DENDEARLY. Usage notes on -DENDEARLY follow: • -DENDEARLY stops the problem after a few iterations of DGEMM on the block size (the bigger the blocksize, the further it gets). It prints only 5 or 6 "updates", whereas -DASYOUGO prints about 46 or so output elements before the problem completes. • Performance for -DASYOUGO and -DENDEARLY always starts off at one speed, slowly increases, and then slows down toward the end (because that is what LU does). -DENDEARLY is likely to terminate before it starts to slow down. • -DENDEARLY terminates the problem early with an HPL Error exit. It means that you need to ignore the missing residual results, which are wrong because the problem never completed. However, you can get an idea what the initial performance was, and if it looks good, then run the problem to completion without - DENDEARLY. To avoid the error check, you can set HPL's threshold parameter in HPL.dat to a negative number. • Though -DENDEARLY terminates early, HPL treats the problem as completed and computes Gflop rating as though the problem ran to completion. Ignore this erroneously high rating. • The bigger the problem, the more accurately the last update that -DENDEARLY returns is close to what happens when the problem runs to completion. -DENDEARLY is a poor approximation for small problems. It is for this reason that you are suggested to use ENDEARLY in conjunction with ASYOUGO2, because ASYOUGO2 reports actual DGEMM performance, which can be a closer approximation to problems just starting. -DASYOUGO2 -DASYOUGO2 gives detailed single-node DGEMM performance information. It captures all DGEMM calls (if you use Fortran BLAS) and records their data. Because of this, the routine has a marginal intrusive overhead. Unlike -DASYOUGO, which is quite non-intrusive, -DASYOUGO2 interrupts every DGEMM call to monitor its performance. You should beware of this overhead, although for big problems, it is, less than 0.1%. Here is a sample ASYOUGO2 output (the first 3 non-intrusive numbers can be found in ASYOUGO and ENDEARLY), so it suffices to describe these numbers here: 10 Intel® Math Kernel Library for Linux* OS User's Guide 84Col=001280 Fract=0.050 Mflops=42454.99 (DT=9.5 DF=34.1 DMF=38322.78). The problem size was N=16000 with a block size of 128. After 10 blocks, that is, 1280 columns, an output was sent to the screen. 
Here, the fraction of columns completed is 1280/16000=0.08. Only up to 40 outputs are printed, at various places through the matrix decomposition: fractions 0.005 0.010 0.015 0.020 0.025 0.030 0.035 0.040 0.045 0.050 0.055 0.060 0.065 0.070 0.075 0.080 0.085 0.090 0.095 0.100 0.105 0.110 0.115 0.120 0.125 0.130 0.135 0.140 0.145 0.150 0.155 0.160 0.165 0.170 0.175 0.180 0.185 0.190 0.195 0.200 0.205 0.210 0.215 0.220 0.225 0.230 0.235 0.240 0.245 0.250 0.255 0.260 0.265 0.270 0.275 0.280 0.285 0.290 0.295 0.300 0.305 0.310 0.315 0.320 0.325 0.330 0.335 0.340 0.345 0.350 0.355 0.360 0.365 0.370 0.375 0.380 0.385 0.390 0.395 0.400 0.405 0.410 0.415 0.420 0.425 0.430 0.435 0.440 0.445 0.450 0.455 0.460 0.465 0.470 0.475 0.480 0.485 0.490 0.495 0.515 0.535 0.555 0.575 0.595 0.615 0.635 0.655 0.675 0.695 0.795 0.895. However, this problem size is so small and the block size so big by comparison that as soon as it prints the value for 0.045, it was already through 0.08 fraction of the columns. On a really big problem, the fractional number will be more accurate. It never prints more than the 112 numbers above. So, smaller problems will have fewer than 112 updates, and the biggest problems will have precisely 112 updates. Mflops is an estimate based on 1280 columns of LU being completed. However, with lookahead steps, sometimes that work is not actually completed when the output is made. Nevertheless, this is a good estimate for comparing identical runs. The 3 numbers in parenthesis are intrusive ASYOUGO2 addins. DT is the total time processor 0 has spent in DGEMM. DF is the number of billion operations that have been performed in DGEMM by one processor. Hence, the performance of processor 0 (in Gflops) in DGEMM is always DF/DT. Using the number of DGEMM flops as a basis instead of the number of LU flops, you get a lower bound on performance of the run by looking at DMF, which can be compared to Mflops above (It uses the global LU time, but the DGEMM flops are computed under the assumption that the problem is evenly distributed amongst the nodes, as only HPL's node (0,0) returns any output.) Note that when using the above performance monitoring tools to compare different HPL.dat input data sets, you should be aware that the pattern of performance drop-off that LU experiences is sensitive to some input data. For instance, when you try very small problems, the performance drop-off from the initial values to end values is very rapid. The larger the problem, the less the drop-off, and it is probably safe to use the first few performance values to estimate the difference between a problem size 700000 and 701000, for instance. Another factor that influences the performance drop-off is the grid dimensions (P and Q). For big problems, the performance tends to fall off less from the first few steps when P and Q are roughly equal in value. You can make use of a large number of parameters, such as broadcast types, and change them so that the final performance is determined very closely by the first few steps. Using these tools will greatly assist the amount of data you can test. See Also Benchmarking a Cluster LINPACK and MP LINPACK Benchmarks 10 8510 Intel® Math Kernel Library for Linux* OS User's Guide 86Intel® Math Kernel Library Language Interfaces Support A Language Interfaces Support, by Function Domain The following table shows language interfaces that Intel® Math Kernel Library (Intel® MKL) provides for each function domain. 
However, Intel MKL routines can be called from other languages using mixed-language programming. See Mixed-language Programming with Intel® MKL for an example of how to call Fortran routines from C/C++. Function Domain FORTRAN 77 interface Fortran 9 0/95 interface C/C++ interface Basic Linear Algebra Subprograms (BLAS) Yes Yes via CBLAS BLAS-like extension transposition routines Yes Yes Sparse BLAS Level 1 Yes Yes via CBLAS Sparse BLAS Level 2 and 3 Yes Yes Yes LAPACK routines for solving systems of linear equations Yes Yes Yes LAPACK routines for solving least-squares problems, eigenvalue and singular value problems, and Sylvester's equations Yes Yes Yes Auxiliary and utility LAPACK routines Yes Yes Parallel Basic Linear Algebra Subprograms (PBLAS) Yes ScaLAPACK routines Yes † DSS/PARDISO* solvers Yes Yes Yes Other Direct and Iterative Sparse Solver routines Yes Yes Yes Vector Mathematical Library (VML) functions Yes Yes Yes Vector Statistical Library (VSL) functions Yes Yes Yes Fourier Transform functions (FFT) Yes Yes Cluster FFT functions Yes Yes Trigonometric Transform routines Yes Yes Fast Poisson, Laplace, and Helmholtz Solver (Poisson Library) routines Yes Yes Optimization (Trust-Region) Solver routines Yes Yes Yes Data Fitting functions Yes Yes Yes GMP* arithmetic functions †† Yes Support functions (including memory allocation) Yes Yes Yes † Supported using a mixed language programming call. See Intel ® MKL Include Files for the respective header file. 87†† GMP Arithmetic Functions are deprecated and will be removed in a future release. Include Files Function domain Fortran Include Files C/C++ Include Files All function domains mkl.fi mkl.h BLAS Routines blas.f90 mkl_blas.fi mkl_blas.h BLAS-like Extension Transposition Routines mkl_trans.fi mkl_trans.h CBLAS Interface to BLAS mkl_cblas.h Sparse BLAS Routines mkl_spblas.fi mkl_spblas.h LAPACK Routines lapack.f90 mkl_lapack.fi mkl_lapack.h C Interface to LAPACK mkl_lapacke.h ScaLAPACK Routines mkl_scalapack.h All Sparse Solver Routines mkl_solver.f90 mkl_solver.h PARDISO mkl_pardiso.f77 mkl_pardiso.f90 mkl_pardiso.h DSS Interface mkl_dss.f77 mkl_dss.f90 mkl_dss.h RCI Iterative Solvers ILU Factorization mkl_rci.fi mkl_rci.h Optimization Solver Routines mkl_rci.fi mkl_rci.h Vector Mathematical Functions mkl_vml.f77 mkl_vml.90 mkl_vml.h Vector Statistical Functions mkl_vsl.f77 mkl_vsl.f90 mkl_vsl_functions.h Fourier Transform Functions mkl_dfti.f90 mkl_dfti.h Cluster Fourier Transform Functions mkl_cdft.f90 mkl_cdft.h Partial Differential Equations Support Routines Trigonometric Transforms mkl_trig_transforms.f90 mkl_trig_transforms.h Poisson Solvers mkl_poisson.f90 mkl_poisson.h Data Fitting functions mkl_df.f77 mkl_df.f90 mkl_df.h GMP interface † mkl_gmp.h Support functions mkl_service.f90 mkl_service.h A Intel® Math Kernel Library for Linux* OS User's Guide 88Function domain Fortran Include Files C/C++ Include Files mkl_service.fi Memory allocation routines i_malloc.h Intel MKL examples interface mkl_example.h † GMP Arithmetic Functions are deprecated and will be removed in a future release. See Also Language Interfaces Support, by Function Domain Intel® Math Kernel Library Language Interfaces Support A 89A Intel® Math Kernel Library for Linux* OS User's Guide 90Support for Third-Party Interfaces B GMP* Functions Intel® Math Kernel Library (Intel® MKL) implementation of GMP* arithmetic functions includes arbitrary precision arithmetic operations on integer numbers. 
The interfaces of such functions fully match the GNU Multiple Precision* (GMP) Arithmetic Library. For specifications of these functions, please see http:// software.intel.com/sites/products/documentation/hpc/mkl/gnump/index.htm. NOTE Intel MKL GMP Arithmetic Functions are deprecated and will be removed in a future release. If you currently use the GMP* library, you need to modify INCLUDE statements in your programs to mkl_gmp.h. FFTW Interface Support Intel® Math Kernel Library (Intel® MKL) offers two collections of wrappers for the FFTW interface (www.fftw.org). The wrappers are the superstructure of FFTW to be used for calling the Intel MKL Fourier transform functions. These collections correspond to the FFTW versions 2.x and 3.x and the Intel MKL versions 7.0 and later. These wrappers enable using Intel MKL Fourier transforms to improve the performance of programs that use FFTW without changing the program source code. See the "FFTW Interface to Intel® Math Kernel Library" appendix in the Intel MKL Reference Manual for details on the use of the wrappers. Important For ease of use, FFTW3 interface is also integrated in Intel MKL. 91B Intel® Math Kernel Library for Linux* OS User's Guide 92Directory Structure in Detail C Tables in this section show contents of the Intel(R) Math Kernel Library (Intel(R) MKL) architecture-specific directories. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Detailed Structure of the IA-32 Architecture Directories Static Libraries in the lib/ia32 Directory File Contents Interface layer libmkl_intel.a Interface library for the Intel compilers libmkl_blas95.a Fortran 95 interface library for BLAS for the Intel® Fortran compiler libmkl_lapack95.a Fortran 95 interface library for LAPACK for the Intel Fortran compiler libmkl_gf.a Interface library for the GNU* Fortran compiler Threading layer libmkl_intel_thread.a Threading library for the Intel compilers libmkl_gnu_thread.a Threading library for the GNU Fortran and C compilers libmkl_pgi_thread.a Threading library for the PGI* compiler libmkl_sequential.a Sequential library Computational layer libmkl_core.a Kernel library for the IA-32 architecture libmkl_solver.a Deprecated. Empty library for backward compatibility libmkl_solver_sequential.a Deprecated. 
Empty library for backward compatibility libmkl_scalapack_core.a ScaLAPACK routines libmkl_cdft_core.a Cluster version of FFT functions 93File Contents Run-time Libraries (RTL) libmkl_blacs.a BLACS routines supporting the following MPICH versions: • Myricom* MPICH version 1.2.5.10 • ANL* MPICH version 1.2.5.2 libmkl_blacs_intelmpi.a BLACS routines supporting Intel MPI and MPICH2 libmkl_blacs_intelmpi20.a A soft link to lib/32/libmkl_blacs_intelmpi.a libmkl_blacs_openmpi.a BLACS routines supporting OpenMPI Dynamic Libraries in the lib/ia32 Directory File Contents libmkl_rt.so Single Dynamic Library Interface layer libmkl_intel.so Interface library for the Intel compilers libmkl_gf.so Interface library for the GNU Fortran compiler Threading layer libmkl_intel_thread.so Threading library for the Intel compilers libmkl_gnu_thread.so Threading library for the GNU Fortran and C compilers libmkl_pgi_thread.so Threading library for the PGI* compiler libmkl_sequential.so Sequential library Computational layer libmkl_core.so Library dispatcher for dynamic load of processor-specific kernel library libmkl_def.so Default kernel library (Intel® Pentium®, Pentium® Pro, Pentium® II, and Pentium® III processors) libmkl_p4.so Pentium® 4 processor kernel library libmkl_p4p.so Kernel library for the Intel® Pentium® 4 processor with Streaming SIMD Extensions 3 (SSE3), including Intel® Core™ Duo and Intel® Core™ Solo processors. libmkl_p4m.so Kernel library for processors based on the Intel® Core™ microarchitecture (except Intel® Core™ Duo and Intel® Core™ Solo processors, for which mkl_p4p.so is intended) libmkl_p4m3.so Kernel library for the Intel® Core™ i7 processors libmkl_vml_def.so VML/VSL part of default kernel for old Intel® Pentium® processors libmkl_vml_ia.so VML/VSL default kernel for newer Intel® architecture processors C Intel® Math Kernel Library for Linux* OS User's Guide 94File Contents libmkl_vml_p4.so VML/VSL part of Pentium® 4 processor kernel libmkl_vml_p4m.so VML/VSL for processors based on the Intel® Core™ microarchitecture libmkl_vml_p4m2.so VML/VSL for 45nm Hi-k Intel® Core™2 and Intel Xeon® processor families libmkl_vml_p4m3.so VML/VSL for the Intel® Core™ i7 processors libmkl_vml_p4p.so VML/VSL for Pentium® 4 processor with Streaming SIMD Extensions 3 (SSE3) libmkl_vml_avx.so VML/VSL optimized for the Intel® Advanced Vector Extensions (Intel® AVX) libmkl_scalapack_core.so ScaLAPACK routines. libmkl_cdft_core.so Cluster version of FFT functions. Run-time Libraries (RTL) libmkl_blacs_intelmpi.so BLACS routines supporting Intel MPI and MPICH2 locale/en_US/mkl_msg.cat Catalog of Intel® Math Kernel Library (Intel® MKL) messages in English locale/ja_JP/mkl_msg.cat Catalog of Intel MKL messages in Japanese. Available only if the Intel® MKL package provides Japanese localization. Please see the Release Notes for this information Detailed Structure of the Intel® 64 Architecture Directories Static Libraries in the lib/intel64 Directory File Contents Interface layer libmkl_intel_lp64.a LP64 interface library for the Intel compilers libmkl_intel_ilp64.a ILP64 interface library for the Intel compilers libmkl_intel_sp2dp.a SP2DP interface library for the Intel compilers libmkl_blas95_lp64.a Fortran 95 interface library for BLAS for the Intel® Fortran compiler. Supports the LP64 interface libmkl_blas95_ilp64.a Fortran 95 interface library for BLAS for the Intel® Fortran compiler. 
libmkl_lapack95_lp64.a         Fortran 95 interface library for LAPACK for the Intel® Fortran compiler. Supports the LP64 interface
libmkl_lapack95_ilp64.a        Fortran 95 interface library for LAPACK for the Intel® Fortran compiler. Supports the ILP64 interface
libmkl_gf_lp64.a               LP64 interface library for the GNU Fortran compilers
libmkl_gf_ilp64.a              ILP64 interface library for the GNU Fortran compilers
Threading layer
libmkl_intel_thread.a          Threading library for the Intel compilers
libmkl_gnu_thread.a            Threading library for the GNU Fortran and C compilers
libmkl_pgi_thread.a            Threading library for the PGI compiler
libmkl_sequential.a            Sequential library
Computational layer
libmkl_core.a                  Kernel library for the Intel® 64 architecture
libmkl_solver_lp64.a           Deprecated. Empty library for backward compatibility
libmkl_solver_lp64_sequential.a     Deprecated. Empty library for backward compatibility
libmkl_solver_ilp64.a          Deprecated. Empty library for backward compatibility
libmkl_solver_ilp64_sequential.a    Deprecated. Empty library for backward compatibility
libmkl_scalapack_lp64.a        ScaLAPACK routine library supporting the LP64 interface
libmkl_scalapack_ilp64.a       ScaLAPACK routine library supporting the ILP64 interface
libmkl_cdft_core.a             Cluster version of FFT functions
Run-time Libraries (RTL)
libmkl_blacs_lp64.a            LP64 version of BLACS routines supporting the following MPICH versions: Myricom* MPICH version 1.2.5.10 and ANL* MPICH version 1.2.5.2
libmkl_blacs_ilp64.a           ILP64 version of BLACS routines supporting the following MPICH versions: Myricom* MPICH version 1.2.5.10 and ANL* MPICH version 1.2.5.2
libmkl_blacs_intelmpi_lp64.a   LP64 version of BLACS routines supporting Intel MPI and MPICH2
libmkl_blacs_intelmpi_ilp64.a  ILP64 version of BLACS routines supporting Intel MPI and MPICH2
libmkl_blacs_intelmpi20_lp64.a      A soft link to lib/intel64/libmkl_blacs_intelmpi_lp64.a
libmkl_blacs_intelmpi20_ilp64.a     A soft link to lib/intel64/libmkl_blacs_intelmpi_ilp64.a
libmkl_blacs_openmpi_lp64.a    LP64 version of BLACS routines supporting OpenMPI
libmkl_blacs_openmpi_ilp64.a   ILP64 version of BLACS routines supporting OpenMPI
libmkl_blacs_sgimpt_lp64.a     LP64 version of BLACS routines supporting SGI MPT
libmkl_blacs_sgimpt_ilp64.a    ILP64 version of BLACS routines supporting SGI MPT

Dynamic Libraries in the lib/intel64 Directory

File                           Contents
libmkl_rt.so                   Single Dynamic Library
Interface layer
libmkl_intel_lp64.so           LP64 interface library for the Intel compilers
libmkl_intel_ilp64.so          ILP64 interface library for the Intel compilers
libmkl_intel_sp2dp.so          SP2DP interface library for the Intel compilers
libmkl_gf_lp64.so              LP64 interface library for the GNU Fortran compilers
libmkl_gf_ilp64.so             ILP64 interface library for the GNU Fortran compilers
Threading layer
libmkl_intel_thread.so         Threading library for the Intel compilers
libmkl_gnu_thread.so           Threading library for the GNU Fortran and C compilers
libmkl_pgi_thread.so           Threading library for the PGI* compiler
libmkl_sequential.so           Sequential library
Computational layer
libmkl_core.so                 Library dispatcher for dynamic load of the processor-specific kernel
libmkl_def.so                  Default kernel library
libmkl_mc.so                   Kernel library for processors based on the Intel® Core™ microarchitecture
libmkl_mc3.so                  Kernel library for the Intel® Core™ i7 processors
libmkl_avx.so                  Kernel optimized for the Intel® Advanced Vector Extensions (Intel® AVX)
libmkl_vml_def.so              VML/VSL part of default kernels
libmkl_vml_p4n.so              VML/VSL for the Intel® Xeon® processor using the Intel® 64 architecture
libmkl_vml_mc.so               VML/VSL for processors based on the Intel® Core™ microarchitecture
libmkl_vml_mc2.so              VML/VSL for 45nm Hi-k Intel® Core™2 and Intel® Xeon® processor families
libmkl_vml_mc3.so              VML/VSL for the Intel® Core™ i7 processors
libmkl_vml_avx.so              VML/VSL optimized for the Intel® Advanced Vector Extensions (Intel® AVX)
libmkl_scalapack_lp64.so       ScaLAPACK routine library supporting the LP64 interface
libmkl_scalapack_ilp64.so      ScaLAPACK routine library supporting the ILP64 interface
libmkl_cdft_core.so            Cluster version of FFT functions
Run-time Libraries (RTL)
libmkl_blacs_intelmpi_lp64.so  LP64 version of BLACS routines supporting Intel MPI and MPICH2
libmkl_blacs_intelmpi_ilp64.so ILP64 version of BLACS routines supporting Intel MPI and MPICH2
locale/en_US/mkl_msg.cat       Catalog of Intel® Math Kernel Library (Intel® MKL) messages in English
locale/ja_JP/mkl_msg.cat       Catalog of Intel MKL messages in Japanese. Available only if the Intel® MKL package provides Japanese localization; see the Release Notes for this information
Intel® Math Kernel Library for Mac OS* X User's Guide
Intel® MKL - Mac OS* X
Document Number: 315932-018US

Contents

Legal Information
Introducing the Intel® Math Kernel Library
Getting Help and Support
Notational Conventions
Chapter 1: Overview
    Document Overview
    What's New
    Related Information
Chapter 2: Getting Started
    Checking Your Installation
    Setting Environment Variables
    Compiler Support
    Using Code Examples
    What You Need to Know Before You Begin Using the Intel® Math Kernel Library
Chapter 3: Structure of the Intel® Math Kernel Library
    Architecture Support
    High-level Directory Structure
    Layered Model Concept
    Accessing the Intel® Math Kernel Library Documentation
        Contents of the Documentation Directories
        Viewing Man Pages
Chapter 4: Linking Your Application with the Intel® Math Kernel Library
    Linking Quick Start
        Using the -mkl Compiler Option
        Using the Single Dynamic Library
        Selecting Libraries to Link with
        Using the Link-line Advisor
        Using the Command-line Link Tool
    Linking Examples
        Linking on IA-32 Architecture Systems
        Linking on Intel(R) 64 Architecture Systems
    Linking in Detail
        Listing Libraries on a Link Line
        Dynamically Selecting the Interface and Threading Layer
        Linking with Interface Libraries
            Using the ILP64 Interface vs. LP64 Interface
            Linking with Fortran 95 Interface Libraries
        Linking with Threading Libraries
            Sequential Mode of the Library
            Selecting the Threading Layer
        Linking with Compiler Run-time Libraries
        Linking with System Libraries
    Building Custom Dynamically Linked Shared Libraries
        Using the Custom Dynamically Linked Shared Library Builder
        Composing a List of Functions
        Specifying Function Names
        Distributing Your Custom Dynamically Linked Shared Library
Chapter 5: Managing Performance and Memory
    Using Parallelism of the Intel® Math Kernel Library
        Threaded Functions and Problems
        Avoiding Conflicts in the Execution Environment
        Techniques to Set the Number of Threads
        Setting the Number of Threads Using an OpenMP* Environment Variable
        Changing the Number of Threads at Run Time
        Using Additional Threading Control
            Intel MKL-specific Environment Variables for Threading Control
            MKL_DYNAMIC
            MKL_DOMAIN_NUM_THREADS
            Setting the Environment Variables for Threading Control
    Tips and Techniques to Improve Performance
        Coding Techniques
        Hardware Configuration Tips
        Operating on Denormals
        FFT Optimized Radices
    Using Memory Management
        Intel MKL Memory Management Software
        Redefining Memory Functions
Chapter 6: Language-specific Usage Options
    Using Language-Specific Interfaces with Intel® Math Kernel Library
        Interface Libraries and Modules
        Fortran 95 Interfaces to LAPACK and BLAS
        Compiler-dependent Functions and Fortran 90 Modules
    Mixed-language Programming with the Intel Math Kernel Library
        Calling LAPACK, BLAS, and CBLAS Routines from C/C++ Language Environments
        Using Complex Types in C/C++
        Calling BLAS Functions that Return the Complex Values in C/C++ Code
        Support for Boost uBLAS Matrix-matrix Multiplication
        Invoking Intel MKL Functions from Java* Applications
            Intel MKL Java* Examples
            Running the Java* Examples
            Known Limitations of the Java* Examples
Chapter 7: Coding Tips
    Aligning Data for Consistent Results
    Using Predefined Preprocessor Symbols for Intel® MKL Version-Dependent Compilation
Chapter 8: Configuring Your Integrated Development Environment to Link with Intel Math Kernel Library
    Configuring the Apple Xcode* Developer Software to Link with Intel® Math Kernel Library
Chapter 9: Intel® Optimized LINPACK Benchmark for Mac OS* X
    Contents of the Intel® Optimized LINPACK Benchmark
    Running the Software
    Known Limitations of the Intel® Optimized LINPACK Benchmark
Appendix A: Intel® Math Kernel Library Language Interfaces Support
    Language Interfaces Support, by Function Domain
    Include Files
Appendix B: Support for Third-Party Interfaces
    GMP* Functions
    FFTW Interface Support
Appendix C: Directory Structure in Detail
    Static Libraries in the lib directory
    Dynamic Libraries in the lib directory

Legal Information

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number/
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow.
logo, Intel StrataFlash, Intel vPro, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, Puma, skoool, the skoool logo, SMARTi, Sound Mark, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Java is a registered trademark of Oracle and/or its affiliates. Copyright © 2007 - 2011, Intel Corporation. All rights reserved. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for 7Optimization Notice use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Intel® Math Kernel Library for Mac OS* X User's Guide 8Introducing the Intel® Math Kernel Library The Intel ® Math Kernel Library (Intel ® MKL) improves performance of scientific, engineering, and financial software that solves large computational problems. Among other functionality, Intel MKL provides linear algebra routines, fast Fourier transforms, as well as vectorized math and random number generation functions, all optimized for the latest Intel processors, including processors with multiple cores (see the Intel ® MKL Release Notes for the full list of supported processors). Intel MKL also performs well on non-Intel processors. Intel MKL is thread-safe and extensively threaded using the OpenMP* technology. Intel MKL provides the following major functionality: • Linear algebra, implemented in LAPACK (solvers and eigensolvers) plus level 1, 2, and 3 BLAS, offering the vector, vector-matrix, and matrix-matrix operations needed for complex mathematical software. If you prefer the FORTRAN 90/95 programming language, you can call LAPACK driver and computational subroutines through specially designed interfaces with reduced numbers of arguments. A C interface to LAPACK is also available. • ScaLAPACK (SCAlable LAPACK) with its support functionality including the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS). ScaLAPACK is available for Intel MKL for Linux* and Windows* operating systems. • Direct sparse solver, an iterative sparse solver, and a supporting set of sparse BLAS (level 1, 2, and 3) for solving sparse systems of equations. • Multidimensional discrete Fourier transforms (1D, 2D, 3D) with a mixed radix support (for sizes not limited to powers of 2). Distributed versions of these functions are provided for use on clusters on the Linux* and Windows* operating systems. • A set of vectorized transcendental functions called the Vector Math Library (VML). 
For most of the supported processors, the Intel MKL VML functions offer greater performance than the libm (scalar) functions, while keeping the same high accuracy.
• The Vector Statistical Library (VSL), which offers high-performance vectorized random number generators for several probability distributions, convolution and correlation routines, and summary statistics functions.
• The Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search.
For details see the Intel® MKL Reference Manual.

Getting Help and Support

Intel provides a support web site that contains a rich repository of self-help information, including getting started tips, known product issues, product errata, license information, user forums, and more. Visit the Intel MKL support website at http://www.intel.com/software/products/support/.

Notational Conventions

The following term is used in reference to the operating system.
Mac OS* X    This term refers to information that is valid on all Intel®-based systems running the Mac OS* X operating system.
The following notations are used to refer to Intel MKL directories.
<Composer XE directory>    The installation directory for the Intel® C++ Composer XE or Intel® Fortran Composer XE.
<mkl directory>    The main directory where Intel MKL is installed: <mkl directory>=<Composer XE directory>/mkl. Replace this placeholder with the specific pathname in the configuring, linking, and building instructions.
The following font conventions are used in this document.
Italic    Italic is used for emphasis and also indicates document names in body text, for example: see Intel MKL Reference Manual.
Monospace lowercase mixed with uppercase    Indicates commands and command-line options, for example, icc myprog.c -L$MKLPATH -I$MKLINCLUDE -lmkl -liomp5 -lpthread; filenames, directory names, and pathnames, for example, /System/Library/Frameworks/JavaVM.framework/Versions/1.5/Home; and C/C++ code fragments, for example, a = new double [SIZE*SIZE];
UPPERCASE MONOSPACE    Indicates system variables, for example, $MKLPATH.
Monospace italic    Indicates a parameter in discussions, for example, lda. When enclosed in angle brackets, indicates a placeholder for an identifier, an expression, a string, a symbol, or a value, for example, <mkl directory>. Substitute one of these items for the placeholder.
[ items ]    Square brackets indicate that the items enclosed in brackets are optional.
{ item | item }    Braces indicate that only one of the items listed between braces should be selected. A vertical bar ( | ) separates the items.
Overview

Document Overview

The Intel® Math Kernel Library (Intel® MKL) User's Guide provides usage information for the library. The usage information covers the organization, configuration, performance, and accuracy of Intel MKL, specifics of routine calls in mixed-language programming, linking, and more. This guide describes OS-specific usage of Intel MKL, along with OS-independent features. The document contains usage information for all Intel MKL function domains. This User's Guide provides the following information:
• Describes post-installation steps to help you start using the library
• Shows you how to configure the library with your development environment
• Acquaints you with the library structure
• Explains how to link your application with the library and provides simple usage scenarios
• Describes how to code, compile, and run your application with Intel MKL
This guide is intended for Mac OS X programmers with beginner to advanced experience in software development.
See Also: Language Interfaces Support, by Function Domain

What's New

This User's Guide documents the Intel® Math Kernel Library (Intel® MKL) 10.3 Update 8. The document was updated to reflect the addition of the Data Fitting functions to the product.

Related Information

To reference how to use the library in your application, use this guide in conjunction with the following documents:
• The Intel® Math Kernel Library Reference Manual, which provides reference information on routine functionalities, parameter descriptions, interfaces, calling syntaxes, and return values.
• The Intel® Math Kernel Library for Mac OS* X Release Notes.

Getting Started

Checking Your Installation

After installing the Intel® Math Kernel Library (Intel® MKL), verify that the library is properly installed and configured:
1. Intel MKL installs in <Composer XE directory>. Check that the subdirectory of <Composer XE directory> referred to as <mkl directory> was created.
2. If you want to keep multiple versions of Intel MKL installed on your system, update your build scripts to point to the correct Intel MKL version.
3. Check that the following files appear in the <mkl directory>/bin directory and its subdirectories (a quick check from the shell is sketched after this list):
mklvars.sh
mklvars.csh
ia32/mklvars_ia32.sh
ia32/mklvars_ia32.csh
intel64/mklvars_intel64.sh
intel64/mklvars_intel64.csh
Use these files to assign Intel MKL-specific values to several environment variables, as explained in Setting Environment Variables.
4. To understand how the Intel MKL directories are structured, see Intel® Math Kernel Library Structure.
5. To make sure that Intel MKL runs on your system, launch an Intel MKL example, as explained in Using Code Examples.
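For instance, the check in step 3 can be done from the shell. The lines below are only a sketch: the installation path is an assumption (substitute your own <mkl directory>), and the files listed are the ones named above.

    # Hypothetical check of the mklvars scripts; /opt/intel/composerxe/mkl is an assumed install path.
    MKL_DIR=/opt/intel/composerxe/mkl
    ls $MKL_DIR/bin/mklvars.sh $MKL_DIR/bin/mklvars.csh
    ls $MKL_DIR/bin/ia32 $MKL_DIR/bin/intel64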
See Also: Notational Conventions

Setting Environment Variables

When the installation of Intel MKL for Mac OS* X is complete, set the INCLUDE, MKLROOT, DYLD_LIBRARY_PATH, MANPATH, LIBRARY_PATH, CPATH, FPATH, and NLSPATH environment variables in the command shell using one of the script files in the bin subdirectory of the Intel MKL installation directory. Choose the script corresponding to your system architecture and command shell as explained in the following table:

Architecture             Shell    Script File
IA-32                    C        ia32/mklvars_ia32.csh
IA-32                    Bash     ia32/mklvars_ia32.sh
Intel® 64                C        intel64/mklvars_intel64.csh
Intel® 64                Bash     intel64/mklvars_intel64.sh
IA-32 and Intel® 64      C        mklvars.csh
IA-32 and Intel® 64      Bash     mklvars.sh

Running the Scripts

The scripts accept parameters to specify the following:
• Architecture.
• Addition of a path to Fortran 95 modules precompiled with the Intel® Fortran compiler to the FPATH environment variable. Supply this parameter only if you are using the Intel® Fortran compiler.
• Interface of the Fortran 95 modules. This parameter is needed only if you requested addition of a path to the modules.
Usage and values of these parameters depend on the script name (regardless of the extension). The following table lists values of the script parameters.

Script             Architecture (required, when applicable)    Addition of a Path to Fortran 95 Modules (optional)    Interface (optional)
mklvars_ia32       n/a†                                        mod                                                    n/a
mklvars_intel64    n/a                                         mod                                                    lp64 (default) or ilp64
mklvars            ia32 or intel64                             mod                                                    lp64 (default) or ilp64
† Not applicable.

For example:
• The command mklvars_ia32.sh sets environment variables for the IA-32 architecture and adds no path to the Fortran 95 modules.
• The command mklvars_intel64.sh mod ilp64 sets environment variables for the Intel® 64 architecture and adds the path to the Fortran 95 modules for the ILP64 interface to the FPATH environment variable.
• The command mklvars.sh intel64 mod sets environment variables for the Intel® 64 architecture and adds the path to the Fortran 95 modules for the LP64 interface to the FPATH environment variable.
NOTE Supply the parameter specifying the architecture first, if it is needed. Values of the other two parameters can be listed in any order.
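As an illustration of the commands above, the minimal sketch below (assuming a Bash shell on an Intel® 64 system) sources the script and then checks that MKLROOT is set; <mkl directory> is the placeholder defined in Notational Conventions, so substitute your actual installation path.

    # Minimal sketch, assuming Bash on an Intel(R) 64 system; replace <mkl directory> with your installation path.
    source <mkl directory>/bin/intel64/mklvars_intel64.sh mod ilp64
    echo $MKLROOT    # should print the Intel MKL installation directory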
See Also: High-level Directory Structure, Interface Libraries and Modules, Fortran 95 Interfaces to LAPACK and BLAS, Setting the Number of Threads Using an OpenMP* Environment Variable

Compiler Support

Intel MKL supports compilers identified in the Release Notes. However, the library has been successfully used with other compilers as well. Intel MKL provides a set of include files to simplify program development by specifying enumerated values and prototypes for the respective functions. Calling Intel MKL functions from your application without an appropriate include file may lead to incorrect behavior of the functions.
See Also: Include Files

Using Code Examples

The Intel MKL package includes code examples, located in the examples subdirectory of the installation directory. Use the examples to determine:
• Whether Intel MKL is working on your system
• How you should call the library
• How to link the library
The examples are grouped in subdirectories mainly by Intel MKL function domains and programming languages. For example, the examples/spblas subdirectory contains a makefile to build the Sparse BLAS examples, and the examples/vmlc subdirectory contains the makefile to build the C VML examples. Source code for the examples is in the next-level sources subdirectory.
See Also: High-level Directory Structure

What You Need to Know Before You Begin Using the Intel® Math Kernel Library

Target platform: Identify the architecture of your target machine:
• IA-32 or compatible
• Intel® 64 or compatible
Reason: To configure your development environment for the use with Intel MKL, set your environment variables using the script corresponding to your architecture (see Setting Environment Variables for details); see also Linking Examples.

Mathematical problem: Identify all Intel MKL function domains that you require:
• BLAS
• Sparse BLAS
• LAPACK
• Sparse Solver routines
• Vector Mathematical Library functions (VML)
• Vector Statistical Library functions
• Fourier Transform functions (FFT)
• Trigonometric Transform routines
• Poisson, Laplace, and Helmholtz Solver routines
• Optimization (Trust-Region) Solver routines
• Data Fitting functions
• GMP* arithmetic functions (deprecated and will be removed in a future release)
Reason: The function domain you intend to use narrows the search in the Reference Manual for the specific routines you need. Coding tips may also depend on the function domain (see Tips and Techniques to Improve Performance).

Programming language: Intel MKL provides support for both Fortran and C/C++ programming. Identify the language interfaces that your function domains support (see Intel® Math Kernel Library Language Interfaces Support).
Reason: Intel MKL provides language-specific include files for each function domain to simplify program development (see Language Interfaces Support, by Function Domain). For a list of language-specific interface libraries and modules and an example of how to generate them, see also Using Language-Specific Interfaces with Intel® Math Kernel Library.

Range of integer data: If your system is based on the Intel 64 architecture, identify whether your application performs calculations with large data arrays (of more than 2^31-1 elements).
Reason: To operate on large data arrays, you need to select the ILP64 interface, where integers are 64-bit; otherwise, use the default, LP64, interface, where integers are 32-bit (see Using the ILP64 Interface vs. LP64 Interface).

Threading model: Identify whether and how your application is threaded:
• Threaded with the Intel compiler
• Threaded with a third-party compiler
• Not threaded
Reason: The compiler you use to thread your application determines which threading library you should link with your application. For applications threaded with a third-party compiler you may need to use Intel MKL in the sequential mode (for more information, see Sequential Mode of the Library and Linking with Threading Libraries).

Number of threads: Determine the number of threads you want Intel MKL to use.
Reason: Intel MKL is based on the OpenMP* threading. By default, the OpenMP* software sets the number of threads that Intel MKL uses. If you need a different number, you have to set it yourself using one of the available mechanisms. For more information, see Using Parallelism of the Intel® Math Kernel Library.

Linking model: Decide which linking model is appropriate for linking your application with Intel MKL libraries:
• Static
• Dynamic
Reason: The link line syntax and libraries for static and dynamic linking are different.
For the list of link libraries for static and dynamic models, linking examples, and other relevant topics, like how to save disk space by creating a custom dynamic library, see Linking Your Application with the Intel® Math Kernel Library.

Structure of the Intel® Math Kernel Library

Architecture Support

Intel® Math Kernel Library (Intel® MKL) for Mac OS* X supports the IA-32, Intel® 64, and compatible architectures in its universal libraries, located in the <mkl directory>/lib directory.
NOTE Universal libraries contain both 32-bit and 64-bit code. If these libraries are used for linking, the linker dispatches appropriate code as follows:
• A 32-bit linker dispatches 32-bit code and creates 32-bit executable files.
• A 64-bit linker dispatches 64-bit code and creates 64-bit executable files.
See Also: High-level Directory Structure, Directory Structure in Detail

High-level Directory Structure

Directory                            Contents
<mkl directory>                      Installation directory of the Intel® Math Kernel Library (Intel® MKL)
Subdirectories of <mkl directory>
bin/                                 Scripts to set environmental variables in the user shell
bin/ia32                             Shell scripts for the IA-32 architecture
bin/intel64                          Shell scripts for the Intel® 64 architecture
benchmarks/linpack                   Shared-Memory (SMP) version of the LINPACK benchmark
examples                             Examples directory. Each subdirectory has source and data files
include                              INCLUDE files for the library routines, as well as for tests and examples
include/ia32                         Fortran 95 .mod files for the IA-32 architecture and Intel® Fortran compiler
include/intel64/lp64                 Fortran 95 .mod files for the Intel® 64 architecture, Intel Fortran compiler, and LP64 interface
include/intel64/ilp64                Fortran 95 .mod files for the Intel® 64 architecture, Intel® Fortran compiler, and ILP64 interface
include/fftw                         Header files for the FFTW2 and FFTW3 interfaces
interfaces/blas95                    Fortran 95 interfaces to BLAS and a makefile to build the library
interfaces/fftw2xc                   FFTW 2.x interfaces to Intel MKL FFTs (C interface)
interfaces/fftw2xf                   FFTW 2.x interfaces to Intel MKL FFTs (Fortran interface)
interfaces/fftw3xc                   FFTW 3.x interfaces to Intel MKL FFTs (C interface)
interfaces/fftw3xf                   FFTW 3.x interfaces to Intel MKL FFTs (Fortran interface)
interfaces/lapack95                  Fortran 95 interfaces to LAPACK and a makefile to build the library
lib                                  Universal static libraries and shared objects for the IA-32 and Intel® 64 architectures
tests                                Source and data files for tests
tools                                Tools and plug-ins
tools/builder                        Tools for creating custom dynamically linkable libraries
tools/plugins/com.intel.mkl.help     Eclipse* IDE plug-in with the Intel MKL Reference Manual in WebHelp format. See mkl_documentation.htm for more information
Subdirectories of <Composer XE directory>
Documentation/en_US/mkl              Intel MKL documentation
man/en_US/man3                       Man pages for Intel MKL functions

See Also: Notational Conventions

Layered Model Concept

Intel MKL is structured to support multiple compilers and interfaces, different OpenMP* implementations, both serial and multiple threads, and a wide range of processors. Conceptually, Intel MKL can be divided into distinct parts to support different interfaces, threading models, and core computations:
1. Interface Layer
2. Threading Layer
3. Computational Layer
You can combine Intel MKL libraries to meet your needs by linking with one library in each part, layer by layer. Once the interface library is selected, the threading library you select picks up the chosen interface, and the computational library uses the interfaces and OpenMP implementation (or non-threaded mode) chosen in the first two layers. To support threading with different compilers, one more layer is needed, which contains libraries not included in Intel MKL:
• Compiler run-time libraries (RTL).
The following table provides more details of each layer.

Interface Layer
This layer matches compiled code of your application with the threading and/or computational parts of the library. This layer provides:
• LP64 and ILP64 interfaces.
• Compatibility with compilers that return function values differently.
• A mapping between single-precision names and double-precision names for applications using Cray*-style naming (SP2DP interface). The SP2DP interface supports Cray-style naming in applications targeted for the Intel 64 architecture and using the ILP64 interface. It provides a mapping between single-precision names (for both real and complex types) in the application and double-precision names in Intel MKL BLAS and LAPACK. Function names are mapped as shown in the following example for the BLAS functions ?GEMM:
SGEMM -> DGEMM
DGEMM -> DGEMM
CGEMM -> ZGEMM
ZGEMM -> ZGEMM
Note that no changes are made to double-precision names.

Threading Layer
This layer:
• Provides a way to link threaded Intel MKL with different threading compilers.
• Enables you to link with a threaded or sequential mode of the library.
This layer is compiled for different environments (threaded or sequential) and compilers (from Intel, GNU*).

Computational Layer
This layer is the heart of Intel MKL. It has only one library for each combination of architecture and supported OS. The Computational layer accommodates multiple architectures through identification of architecture features and chooses the appropriate binary code at run time.

Compiler Run-time Libraries (RTL)
To support threading with Intel compilers, Intel MKL uses RTLs of the Intel® C++ Composer XE or Intel® Fortran Composer XE. To thread using third-party threading compilers, use libraries in the Threading layer or an appropriate compatibility library.

See Also: Using the ILP64 Interface vs. LP64 Interface, Linking Your Application with the Intel® Math Kernel Library, Linking with Threading Libraries

Accessing the Intel® Math Kernel Library Documentation

Contents of the Documentation Directories

Most of the Intel MKL documentation is installed at <Composer XE directory>/Documentation/<locale>/mkl. For example, the documentation in English is installed at <Composer XE directory>/Documentation/en_US/mkl. However, some Intel MKL-related documents are installed one or two levels up. The following table lists MKL-related documentation.
File name                    Comment
Files in <Composer XE directory>/Documentation
clicense or flicense         Common end-user license for the Intel® C++ Composer XE 2011 or Intel® Fortran Composer XE 2011, respectively
mklsupport.txt               Information on the package number for customer support reference
Contents of <Composer XE directory>/Documentation/<locale>/mkl
redist.txt                   List of redistributable files
mkl_documentation.htm        Overview and links for the Intel MKL documentation
mkl_manual/index.htm         Intel MKL Reference Manual in an uncompressed HTML format
Release_Notes.htm            Intel MKL Release Notes
mkl_userguide/index.htm      Intel MKL User's Guide in an uncompressed HTML format, this document
mkl_link_line_advisor.htm    Intel MKL Link-line Advisor

Viewing Man Pages

To access Intel MKL man pages, add the man pages directory to the MANPATH environment variable. If you performed the Setting Environment Variables step of the Getting Started process, this is done automatically. To view the man page for an Intel MKL function, enter the following command in your command shell:
man <function base name>
In this release, <function base name> is the function name with omitted prefixes denoting data type, task type, or any other field that may vary for this function. Examples:
• For the BLAS function ddot, enter man dot
• For the statistical function vslConvSetMode, enter man vslSetMode
• For the VML function vdPackM, enter man vPack
• For the FFT function DftiCommitDescriptor, enter man DftiCommitDescriptor
NOTE Function names in the man command are case-sensitive.
See Also: High-level Directory Structure, Setting Environment Variables

Linking Your Application with the Intel® Math Kernel Library

Linking Quick Start

Intel® Math Kernel Library (Intel® MKL) provides several options for quick linking of your application, which depend on the way you link:
Using the Intel® Composer XE compiler: see Using the -mkl Compiler Option.
Explicit dynamic linking: see Using the Single Dynamic Library for how to simplify your link line.
Explicitly listing libraries on your link line: see Selecting Libraries to Link with for a summary of the libraries.
Using an interactive interface: see Using the Link-line Advisor to determine libraries and options to specify on your link or compilation line.
Using an internally provided tool: see Using the Command-line Link Tool to determine libraries, options, and environment variables, or even compile and build your application.

Using the -mkl Compiler Option

The Intel® Composer XE compiler supports the following variants of the -mkl compiler option:
-mkl or -mkl=parallel    to link with standard threaded Intel MKL.
-mkl=sequential          to link with the sequential version of Intel MKL.
-mkl=cluster             to link with Intel MKL cluster components (sequential) that use Intel MPI.
For more information on the -mkl compiler option, see the Intel Compiler User and Reference Guides. On Intel® 64 architecture systems, for each variant of the -mkl option, the compiler links your application using the LP64 interface. If you specify any variant of the -mkl compiler option, the compiler automatically includes the Intel MKL libraries. In cases not covered by the option, use the Link-line Advisor or see Linking in Detail.
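As a quick illustration of the -mkl variants above, the lines below sketch one-step compile-and-link commands for a Fortran source; myprog.f stands for your own source file.

    # Sketch only; myprog.f stands for your own source file.
    ifort myprog.f -mkl                # same as -mkl=parallel: threaded Intel MKL
    ifort myprog.f -mkl=sequential     # sequential Intel MKL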
See Also: Listing Libraries on a Link Line, Using the ILP64 Interface vs. LP64 Interface, Using the Link-line Advisor, Intel® Software Documentation Library

Using the Single Dynamic Library

You can simplify your link line through the use of the Intel MKL Single Dynamic Library (SDL). To use SDL, place libmkl_rt.dylib on your link line. For example:
icc application.c -lmkl_rt
SDL enables you to select the interface and threading library for Intel MKL at run time. By default, linking with SDL provides:
• LP64 interface on systems based on the Intel® 64 architecture
• Intel threading
To use other interfaces or change threading preferences, including use of the sequential version of Intel MKL, you need to specify your choices using functions or environment variables as explained in section Dynamically Selecting the Interface and Threading Layer.

Selecting Libraries to Link with

To link with Intel MKL:
• Choose one library from the Interface layer and one library from the Threading layer
• Add the only library from the Computational layer and run-time libraries (RTL)
The following table lists Intel MKL libraries to link with your application.

                                           Interface layer          Threading layer            Computational layer    RTL
IA-32 architecture, static linking         libmkl_intel.a           libmkl_intel_thread.a      libmkl_core.a          libiomp5.dylib
IA-32 architecture, dynamic linking        libmkl_intel.dylib       libmkl_intel_thread.dylib  libmkl_core.dylib      libiomp5.dylib
Intel® 64 architecture, static linking     libmkl_intel_lp64.a      libmkl_intel_thread.a      libmkl_core.a          libiomp5.dylib
Intel® 64 architecture, dynamic linking    libmkl_intel_lp64.dylib  libmkl_intel_thread.dylib  libmkl_core.dylib      libiomp5.dylib

The Single Dynamic Library (SDL) automatically links interface, threading, and computational libraries and thus simplifies linking. The following table lists Intel MKL libraries for dynamic linking using SDL. See Dynamically Selecting the Interface and Threading Layer for how to set the interface and threading layers at run time through function calls or environment settings.

                                      SDL              RTL
IA-32 and Intel® 64 architectures     libmkl_rt.dylib  libiomp5.dylib†
† Use the Link-line Advisor to check whether you need to explicitly link the libiomp5.dylib RTL.

For exceptions and alternatives to the libraries listed above, see Linking in Detail.
See Also: Layered Model Concept, Using the Link-line Advisor, Using the -mkl Compiler Option

Using the Link-line Advisor

Use the Intel MKL Link-line Advisor to determine the libraries and options to specify on your link or compilation line. The latest version of the tool is available at http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor. The tool is also available in the product. The Advisor requests information about your system and on how you intend to use Intel MKL (link dynamically or statically, use threaded or sequential mode, etc.). The tool automatically generates the appropriate link line for your application.
See Also: Contents of the Documentation Directories

Using the Command-line Link Tool

Use the command-line Link tool provided by Intel MKL to simplify building your application with Intel MKL. The tool not only provides the options, libraries, and environment variables to use, but also performs compilation and building of your application. The tool mkl_link_tool is installed in the <mkl directory>/tools directory. See the knowledge base article at http://software.intel.com/en-us/articles/mkl-command-line-link-tool for more information.

Linking Examples

See Also: Using the Link-line Advisor

Linking on IA-32 Architecture Systems

The following examples illustrate linking that uses Intel(R) compilers. The examples use the .f Fortran source file. C/C++ users should instead specify a .cpp (C++) or .c (C) file and replace ifort with icc.
NOTE If you successfully completed the Setting Environment Variables step of the Getting Started process, you can omit -I$MKLINCLUDE in all the examples and omit -L$MKLPATH in the examples for dynamic linking.
In these examples, MKLPATH=$MKLROOT/lib, MKLINCLUDE=$MKLROOT/include:
• Static linking of myprog.f and parallel Intel MKL:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -liomp5 -lpthread
• Dynamic linking of myprog.f and parallel Intel MKL:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
• Static linking of myprog.f and the sequential version of Intel MKL:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -lpthread
• Dynamic linking of myprog.f and the sequential version of Intel MKL:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel -lmkl_sequential -lmkl_core -lpthread
• Dynamic linking of user code myprog.f and parallel or sequential Intel MKL (call the mkl_set_threading_layer function or set the MKL_THREADING_LAYER environment variable to choose threaded or sequential mode):
ifort myprog.f -lmkl_rt
• Static linking of myprog.f, the Fortran 95 LAPACK interface, and parallel Intel MKL:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/ia32 -lmkl_lapack95 $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -liomp5 -lpthread
• Static linking of myprog.f, the Fortran 95 BLAS interface, and parallel Intel MKL:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/ia32 -lmkl_blas95 $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -liomp5 -lpthread
See Also: Fortran 95 Interfaces to LAPACK and BLAS

Linking on Intel(R) 64 Architecture Systems

The following examples illustrate linking that uses Intel(R) compilers. The examples use the .f Fortran source file. C/C++ users should instead specify a .cpp (C++) or .c (C) file and replace ifort with icc.
NOTE If you successfully completed the Setting Environment Variables step of the Getting Started process, you can omit -I$MKLINCLUDE in all the examples and omit -L$MKLPATH in the examples for dynamic linking.
In these examples, MKLPATH=$MKLROOT/lib, MKLINCLUDE=$MKLROOT/include:
• Static linking of myprog.f and parallel Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -liomp5 -lpthread
• Dynamic linking of myprog.f and parallel Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
• Static linking of myprog.f and the sequential version of Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -lpthread
• Dynamic linking of myprog.f and the sequential version of Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
• Static linking of myprog.f and parallel Intel MKL supporting the ILP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel_ilp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -liomp5 -lpthread
• Dynamic linking of myprog.f and parallel Intel MKL supporting the ILP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
• Dynamic linking of user code myprog.f and parallel or sequential Intel MKL (call appropriate functions or set environment variables to choose threaded or sequential mode and to set the interface):
ifort myprog.f -lmkl_rt
• Static linking of myprog.f, the Fortran 95 LAPACK interface, and parallel Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/intel64/lp64 -lmkl_lapack95_lp64 $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -liomp5 -lpthread
• Static linking of myprog.f, the Fortran 95 BLAS interface, and parallel Intel MKL supporting the LP64 interface:
ifort myprog.f -L$MKLPATH -I$MKLINCLUDE -I$MKLINCLUDE/intel64/lp64 -lmkl_blas95_lp64 $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -liomp5 -lpthread
See Also: Fortran 95 Interfaces to LAPACK and BLAS

Linking in Detail

This section recommends which libraries to link with depending on your Intel MKL usage scenario and provides details of the linking.

Listing Libraries on a Link Line

To link with Intel MKL, specify paths and libraries on the link line as shown below.
NOTE The syntax below is for dynamic linking. For static linking, replace each library name preceded with "-l" with the path to the library file. For example, replace -lmkl_core with $MKLPATH/libmkl_core.a, where $MKLPATH is the appropriate user-defined environment variable.
-L<MKL path> -I<MKL include> [-I<MKL include>/{ia32|intel64|{ilp64|lp64}}] [-lmkl_blas{95|95_ilp64|95_lp64}] [-lmkl_lapack{95|95_ilp64|95_lp64}] -lmkl_{intel|intel_ilp64|intel_lp64} -lmkl_{intel_thread|sequential} -lmkl_core -liomp5 [-lpthread] [-lm]
In case of static linking, for all components except BLAS and FFT, repeat the interface, threading, and computational libraries two times (for example, libmkl_intel_ilp64.a libmkl_intel_thread.a libmkl_core.a libmkl_intel_ilp64.a libmkl_intel_thread.a libmkl_core.a). For the LAPACK component, repeat the threading and computational libraries three times. The order in which libraries are listed on the link line is essential.
See Also Using the Link-line Advisor Linking Examples
Dynamically Selecting the Interface and Threading Layer
The Single Dynamic Library (SDL) enables you to dynamically select the interface and threading layer for Intel MKL.
Setting the Interface Layer
Available interfaces depend on the architecture of your system. On systems based on the Intel® 64 architecture, the LP64 and ILP64 interfaces are available. To set one of these interfaces at run time, use the mkl_set_interface_layer function or the MKL_INTERFACE_LAYER environment variable:
• LP64 interface: set MKL_INTERFACE_LAYER=LP64 or call mkl_set_interface_layer with the MKL_INTERFACE_LP64 parameter value.
• ILP64 interface: set MKL_INTERFACE_LAYER=ILP64 or call mkl_set_interface_layer with the MKL_INTERFACE_ILP64 parameter value.
If the mkl_set_interface_layer function is called, the MKL_INTERFACE_LAYER environment variable is ignored. By default the LP64 interface is used. See the Intel MKL Reference Manual for details of the mkl_set_interface_layer function.
Setting the Threading Layer
To set the threading layer at run time, use the mkl_set_threading_layer function or the MKL_THREADING_LAYER environment variable:
• Intel threading: set MKL_THREADING_LAYER=INTEL or call mkl_set_threading_layer with the MKL_THREADING_INTEL parameter value.
• Sequential mode of Intel MKL: set MKL_THREADING_LAYER=SEQUENTIAL or call mkl_set_threading_layer with the MKL_THREADING_SEQUENTIAL parameter value.
If the mkl_set_threading_layer function is called, the MKL_THREADING_LAYER environment variable is ignored. By default Intel threading is used. See the Intel MKL Reference Manual for details of the mkl_set_threading_layer function.
See Also Using the Single Dynamic Library Layered Model Concept Directory Structure in Detail
Linking with Interface Libraries
Using the ILP64 Interface vs. LP64 Interface
The Intel MKL ILP64 libraries use the 64-bit integer type, which is necessary for indexing large arrays with more than 2^31-1 elements, whereas the LP64 libraries index arrays with the 32-bit integer type. The LP64 and ILP64 interfaces are implemented in the Interface layer. Link with the following interface libraries for the LP64 or ILP64 interface, respectively:
• libmkl_intel_lp64.a or libmkl_intel_ilp64.a for static linking
• libmkl_intel_lp64.dylib or libmkl_intel_ilp64.dylib for dynamic linking
The ILP64 interface provides for the following:
• Support for large data arrays (with more than 2^31-1 elements)
• Compiling your Fortran code with the -i8 compiler option
The LP64 interface provides compatibility with previous Intel MKL versions because "LP64" is just a new name for the only interface that Intel MKL versions lower than 9.1 provided.
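When the application is linked with the Single Dynamic Library, the LP64/ILP64 choice discussed in this section can also be made programmatically. The following is a minimal sketch, not a program shipped with Intel MKL: it assumes compiling with -DMKL_ILP64 and linking with -lmkl_rt, and it uses the mkl_set_interface_layer and mkl_set_threading_layer functions and the MKL_INTERFACE_ILP64 and MKL_THREADING_INTEL constants declared in mkl.h; the file name and build line are illustrative assumptions only:
// sdl_layers.cpp: select the ILP64 interface and Intel threading at run time.
// Assumed build line: icc -DMKL_ILP64 sdl_layers.cpp -lmkl_rt
#include <iostream>
#include "mkl.h"

int main()
{
    // Both calls must precede the first computational Intel MKL call; once they
    // are made, MKL_INTERFACE_LAYER and MKL_THREADING_LAYER are ignored.
    mkl_set_interface_layer(MKL_INTERFACE_ILP64);
    mkl_set_threading_layer(MKL_THREADING_INTEL);

    double x[4] = {1.0, 2.0, 3.0, 4.0};
    double y[4] = {4.0, 3.0, 2.0, 1.0};
    MKL_INT n = 4, inc = 1;   // MKL_INT is 64-bit here because of -DMKL_ILP64

    // This call is dispatched through the interface and threading layers selected above.
    std::cout << "ddot = " << cblas_ddot(n, x, inc, y, inc) << std::endl;
    return 0;
}
With any of the static or layered dynamic link lines shown earlier, the interface and threading layers are fixed at link time instead, and these calls are not needed.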
Choose the ILP64 interface if your application uses Intel MKL for calculations with large data arrays or may do so in the future. Intel MKL provides the same include directory for the ILP64 and LP64 interfaces.
Compiling for LP64/ILP64
Compile for the ILP64 and LP64 interfaces as follows:
• Fortran, ILP64: ifort -i8 -I<mkl directory>/include ...
• Fortran, LP64: ifort -I<mkl directory>/include ...
• C or C++, ILP64: icc -DMKL_ILP64 -I<mkl directory>/include ...
• C or C++, LP64: icc -I<mkl directory>/include ...
CAUTION Linking an application compiled with the -i8 or -DMKL_ILP64 option to the LP64 libraries may result in unpredictable consequences and erroneous output.
Coding for ILP64
You do not need to change existing code if you are not using the ILP64 interface. To migrate to ILP64 or write new code for ILP64, use appropriate types for parameters of the Intel MKL functions and subroutines:
• 32-bit integers: INTEGER*4 or INTEGER(KIND=4) in Fortran, int in C or C++.
• Universal integers for ILP64/LP64 (64-bit for ILP64, 32-bit otherwise): INTEGER without specifying KIND in Fortran, MKL_INT in C or C++.
• Universal 64-bit integers for ILP64/LP64: INTEGER*8 or INTEGER(KIND=8) in Fortran, MKL_INT64 in C or C++.
• FFT interface integers for ILP64/LP64: INTEGER without specifying KIND in Fortran, MKL_LONG in C or C++.
To determine the type of an integer parameter of a function, use the appropriate include files. For functions that support only a Fortran interface, use the C/C++ include files *.h.
The above list explains which integer parameters of functions become 64-bit and which remain 32-bit for ILP64. It applies to most Intel MKL functions except some VML and VSL functions, which require integer parameters to be 64-bit or 32-bit regardless of the interface:
• VML: The mode parameter of VML functions is 64-bit.
• Random Number Generators (RNG): All discrete RNG functions except viRngUniformBits64 are 32-bit. The viRngUniformBits64 generator function and the vslSkipAheadStream service function are 64-bit.
• Summary Statistics: The estimate parameter of the vslsSSCompute/vsldSSCompute function is 64-bit.
Refer to the Intel MKL Reference Manual for more information. To better understand ILP64 interface details, see also the examples and tests.
Limitations
All Intel MKL function domains support ILP64 programming with the following exceptions:
• FFTW interfaces to Intel MKL:
• FFTW 2.x wrappers do not support ILP64.
• FFTW 3.2 wrappers support ILP64 by a dedicated set of plan_guru64 functions.
• GMP* Arithmetic Functions do not support ILP64.
NOTE GMP Arithmetic Functions are deprecated and will be removed in a future release.
See Also High-level Directory Structure Include Files Language Interfaces Support, by Function Domain Layered Model Concept Directory Structure in Detail
Linking with Fortran 95 Interface Libraries
The libmkl_blas95*.a and libmkl_lapack95*.a libraries contain Fortran 95 interfaces for BLAS and LAPACK, respectively, which are compiler-dependent. In the Intel MKL package, they are prebuilt for the Intel® Fortran compiler. If you are using a different compiler, build these libraries before using the interface.
See Also Fortran 95 Interfaces to LAPACK and BLAS Compiler-dependent Functions and Fortran 90 Modules
Linking with Threading Libraries
Sequential Mode of the Library
You can use Intel MKL in a sequential (non-threaded) mode. In this mode, Intel MKL runs unthreaded code.
However, it is thread-safe (except the LAPACK deprecated routine ?lacon), which means that you can use it in a parallel region in your OpenMP* code. The sequential mode requires no compatibility OpenMP* run-time library and does not respond to the environment variable OMP_NUM_THREADS or its Intel MKL equivalents. You should use the library in the sequential mode only if you have a particular reason not to use Intel MKL threading. The sequential mode may be helpful when using Intel MKL with programs threaded with some non-Intel compilers or in other situations where you need a non-threaded version of the library (for instance, in some MPI cases). To set the sequential mode, in the Threading layer, choose the *sequential.* library. Add the POSIX threads library (pthread) to your link line for the sequential mode because the *sequential.* library depends on pthread . See Also Directory Structure in Detail Using Parallelism of the Intel® Math Kernel Library Avoiding Conflicts in the Execution Environment Linking Examples Selecting the Threading Layer Several compilers that Intel MKL supports use the OpenMP* threading technology. Intel MKL supports implementations of the OpenMP* technology that these compilers provide. To make use of this support, you need to link with the appropriate library in the Threading Layer and Compiler Support Run-time Library (RTL). Threading Layer Each Intel MKL threading library contains the same code compiled by the respective compiler (Intel, gnu and PGI* compilers on Mac OS X). RTL This layer includes libiomp, the compatibility OpenMP* run-time library of the Intel compiler. In addition to the Intel compiler, libiomp provides support for one more threading compiler on Mac OS X (GNU). That is, a program threaded with a GNU compiler can safely be linked with Intel MKL and libiomp. The table below helps explain what threading library and RTL you should choose under different scenarios when using Intel MKL (static cases only): Linking Your Application with the Intel® Math Kernel Library 4 33Compiler Application Threaded? Threading Layer RTL Recommended Comment Intel Does not matter libmkl_intel_ thread.a libiomp5.dylib PGI Yes libmkl_pgi_ thread.a or libmkl_ sequential.a PGI* supplied Use of libmkl_sequential.a removes threading from Intel MKL calls. PGI No libmkl_intel_ thread.a libiomp5.dylib PGI No libmkl_pgi_ thread.a PGI* supplied PGI No libmkl_ sequential.a None gnu Yes libmkl_ sequential.a None gnu No libmkl_intel_ thread.a libiomp5.dylib other Yes libmkl_ sequential.a None other No libmkl_intel_ thread.a libiomp5.dylib Linking with Compiler Run-time Libraries Dynamically link libiomp, the compatibility OpenMP* run-time library, even if you link other libraries statically. Linking to the libiomp statically can be problematic because the more complex your operating environment or application, the more likely redundant copies of the library are included. This may result in performance issues (oversubscription of threads) and even incorrect results. To link libiomp dynamically, be sure the DYLD_LIBRARY_PATH environment variable is defined correctly. See Also Setting Environment Variables Layered Model Concept Linking with System Libraries To use the Intel MKL FFT, Trigonometric Transform, or Poisson, Laplace, and Helmholtz Solver routines, link in the math support system library by adding " -lm " to the link line. On Mac OS X, the libiomp library relies on the native pthread library for multi-threading. 
Any time libiomp is required, add -lpthread to your link line after it (the order of listing libraries is important).
Building Custom Dynamically Linked Shared Libraries
Custom dynamically linked shared libraries reduce the collection of functions available in the Intel MKL libraries to those required to solve your particular problems, which helps to save disk space and to build your own dynamic libraries for distribution. The Intel MKL custom dynamically linked shared library builder, located in the tools/builder directory, enables you to create a dynamically linked shared library that contains only the selected functions. The builder contains a makefile and a definition file with the list of functions.
Using the Custom Dynamically Linked Shared Library Builder
To build a custom dynamically linked shared library, use the following command:
make target [<options>]
The possible values of target and what the command does for each value are as follows:
• libuni: The builder uses static Intel MKL interface, threading, and core libraries to build a universal dynamically linked shared library for the IA-32 or Intel® 64 architecture.
• dylibuni: The builder uses the single dynamic library libmkl_rt.dylib to build a universal dynamically linked shared library for the IA-32 or Intel® 64 architecture.
• help: The command prints Help on the custom dynamically linked shared library builder.
The <options> placeholder stands for the list of parameters that define macros to be used by the makefile:
• interface = {lp64|ilp64}: Defines whether to use the LP64 or ILP64 programming interface for the Intel® 64 architecture. The default value is lp64.
• threading = {parallel|sequential}: Defines whether to use Intel MKL in the threaded or sequential mode. The default value is parallel.
• export = <file name>: Specifies the full name of the file that contains the list of entry-point functions to be included in the shared object. The default name is user_example_list (no extension).
• name = <library name>: Specifies the name of the library to be created. By default, the name of the created library is mkl_custom.dylib.
• xerbla = <object file>: Specifies the name of the object file (.o) that contains the user's error handler. The makefile adds this error handler to the library for use instead of the default Intel MKL error handler xerbla. If you omit this parameter, the native Intel MKL xerbla is used. See the description of the xerbla function in the Intel MKL Reference Manual on how to develop your own error handler.
• MKLROOT = <MKL directory>: Specifies the location of the Intel MKL libraries used to build the custom dynamically linked shared library. By default, the builder uses the Intel MKL installation directory.
All the above parameters are optional. In the simplest case, the command line is make ia32, and the missing options have default values. This command creates the mkl_custom.dylib library for processors using the IA-32 architecture. The command takes the list of functions from the user_list file and uses the native Intel MKL error handler xerbla.
An example of a more complex case follows:
make ia32 export=my_func_list.txt name=mkl_small xerbla=my_xerbla.o
In this case, the command creates the mkl_small.dylib library for processors using the IA-32 architecture. The command takes the list of functions from the my_func_list.txt file and uses the user's error handler my_xerbla.o.
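To illustrate the xerbla parameter used above, a user-supplied error handler can be as small as the following sketch. This is not code from the Intel MKL distribution: the file name my_xerbla.cpp and the message text are examples only, and the three-argument prototype (routine name, parameter number, and name length) is assumed to match the xerbla declaration in the mkl.h headers of your Intel MKL version, so verify it there before relying on it. Compile the file to an object (for example, icc -c my_xerbla.cpp) and pass the resulting my_xerbla.o through the xerbla parameter of the builder.
// my_xerbla.cpp: replacement for the default Intel MKL error handler xerbla.
#include <cstdio>

// extern "C" keeps the symbol unmangled so the builder can substitute it
// for the library's own xerbla.
extern "C" void xerbla(const char *srname, const int *info, const int len)
{
    // srname is not necessarily null-terminated; len gives its length.
    std::printf("Custom handler: parameter %d was incorrect on entry to %.*s\n",
                *info, len, srname);
}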
The process is similar for processors using the Intel® 64 architecture. See Also Using the Single Dynamic Library Composing a List of Functions To compose a list of functions for a minimal custom dynamically linked shared library needed for your application, you can use the following procedure: 1. Link your application with installed Intel MKL libraries to make sure the application builds. 2. Remove all Intel MKL libraries from the link line and start linking. Unresolved symbols indicate Intel MKL functions that your application uses. 3. Include these functions in the list. Important Each time your application starts using more Intel MKL functions, update the list to include the new functions. See Also Specifying Function Names Specifying Function Names In the file with the list of functions for your custom dynamically linked shared library, adjust function names to the required interface. For example, for Fortran functions append an underscore character "_" to the names as a suffix: dgemm_ ddot_ dgetrf_ For more examples, see domain-specific lists of functions in the /tools/builder folder. NOTE The lists of functions are provided in the /tools/builder folder merely as examples. See Composing a List of Functions for how to compose lists of functions for your custom dynamically linked shared library. TIP Names of Fortran-style routines (BLAS, LAPACK, etc.) can be both upper-case or lower-case, with or without the trailing underscore. For example, these names are equivalent: BLAS: dgemm, DGEMM, dgemm_, DGEMM_ LAPACK: dgetrf, DGETRF, dgetrf_, DGETRF_. Properly capitalize names of C support functions in the function list. To do this, follow the guidelines below: 1. In the mkl_service.h include file, look up a #define directive for your function. 2. Take the function name from the replacement part of that directive. For example, the #define directive for the mkl_disable_fast_mm function is #define mkl_disable_fast_mm MKL_Disable_Fast_MM. Capitalize the name of this function in the list like this: MKL_Disable_Fast_MM. 4 Intel® Math Kernel Library for Mac OS* X User's Guide 36For the names of the Fortran support functions, see the tip. NOTE If selected functions have several processor-specific versions, the builder automatically includes them all in the custom library and the dispatcher manages them. Distributing Your Custom Dynamically Linked Shared Library To enable use of your custom dynamically linked shared library in a threaded mode, distribute libiomp5.dylib along with the custom dynamically linked shared library. Linking Your Application with the Intel® Math Kernel Library 4 374 Intel® Math Kernel Library for Mac OS* X User's Guide 38Managing Performance and Memory 5 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. 
Notice revision #20110804 Using Parallelism of the Intel® Math Kernel Library Intel MKL is extensively parallelized. See Threaded Functions and Problems for lists of threaded functions and problems that can be threaded. Intel MKL is thread-safe, which means that all Intel MKL functions (except the LAPACK deprecated routine ? lacon) work correctly during simultaneous execution by multiple threads. In particular, any chunk of threaded Intel MKL code provides access for multiple threads to the same shared data, while permitting only one thread at any given time to access a shared piece of data. Therefore, you can call Intel MKL from multiple threads and not worry about the function instances interfering with each other. The library uses OpenMP* threading software, so you can use the environment variable OMP_NUM_THREADS to specify the number of threads or the equivalent OpenMP run-time function calls. Intel MKL also offers variables that are independent of OpenMP, such as MKL_NUM_THREADS, and equivalent Intel MKL functions for thread management. The Intel MKL variables are always inspected first, then the OpenMP variables are examined, and if neither is used, the OpenMP software chooses the default number of threads. By default, Intel MKL uses the number of threads equal to the number of physical cores on the system. To achieve higher performance, set the number of threads to the number of real processors or physical cores, as summarized in Techniques to Set the Number of Threads. Threaded Functions and Problems The following Intel MKL function domains are threaded: • Direct sparse solver. • LAPACK. For the list of threaded routines, see Threaded LAPACK Routines. • Level1 and Level2 BLAS. For the list of threaded routines, see Threaded BLAS Level1 and Level2 Routines. • All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers. • All mathematical VML functions. • FFT. For the list of FFT transforms that can be threaded, see Threaded FFT Problems. Threaded LAPACK Routines In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z. 39The following LAPACK routines are threaded: • Linear equations, computational routines: • Factorization: ?getrf, ?gbtrf, ?potrf, ?pptrf, ?sytrf, ?hetrf, ?sptrf, ?hptrf • Solving: ?dttrsb, ?gbtrs, ?gttrs, ?pptrs, ?pbtrs, ?pttrs, ?sytrs, ?sptrs, ?hptrs, ? tptrs, ?tbtrs • Orthogonal factorization, computational routines: ?geqrf, ?ormqr, ?unmqr, ?ormlq, ?unmlq, ?ormql, ?unmql, ?ormrq, ?unmrq • Singular Value Decomposition, computational routines: ?gebrd, ?bdsqr • Symmetric Eigenvalue Problems, computational routines: ?sytrd, ?hetrd, ?sptrd, ?hptrd, ?steqr, ?stedc. • Generalized Nonsymmetric Eigenvalue Problems, computational routines: chgeqz/zhgeqz. A number of other LAPACK routines, which are based on threaded LAPACK or BLAS routines, make effective use of parallelism: ?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, cggevx/zggevx, and so on. Threaded BLAS Level1 and Level2 Routines In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z. 
The following routines are threaded for Intel ® Core™2 Duo and Intel ® Core™ i7 processors: • Level1 BLAS: ?axpy, ?copy, ?swap, ddot/sdot, cdotc, drot/srot • Level2 BLAS: ?gemv, ?trmv, dsyr/ssyr, dsyr2/ssyr2, dsymv/ssymv Threaded FFT Problems The following characteristics of a specific problem determine whether your FFT computation may be threaded: • rank • domain • size/length • precision (single or double) • placement (in-place or out-of-place) • strides • number of transforms • layout (for example, interleaved or split layout of complex data) Most FFT problems are threaded. In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow. One-dimensional (1D) transforms 1D transforms are threaded in many cases. 1D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions depending on the architecture: 5 Intel® Math Kernel Library for Mac OS* X User's Guide 40Architecture Conditions Intel ® 64 N is a power of 2, log2(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1. IA-32 N is a power of 2, log2(N) > 13, and the transform is single-precision. N is a power of 2, log2(N) > 14, and the transform is double-precision. Any N is composite, log2(N) > 16, and input/output strides equal 1. 1D real-to-complex and complex-to-real transforms are not threaded. 1D complex-to-complex transforms using split-complex layout are not threaded. Prime-size complex-to-complex 1D transforms are not threaded. Multidimensional transforms All multidimensional transforms on large-volume data are threaded. Avoiding Conflicts in the Execution Environment Certain situations can cause conflicts in the execution environment that make the use of threads in Intel MKL problematic. This section briefly discusses why these problems exist and how to avoid them. If you thread the program using OpenMP directives and compile the program with Intel compilers, Intel MKL and the program will both use the same threading library. Intel MKL tries to determine if it is in a parallel region in the program, and if it is, it does not spread its operations over multiple threads unless you specifically request Intel MKL to do so via the MKL_DYNAMIC functionality. However, Intel MKL can be aware that it is in a parallel region only if the threaded program and Intel MKL are using the same threading library. If your program is threaded by some other means, Intel MKL may operate in multithreaded mode, and the performance may suffer due to overuse of the resources. The following table considers several cases where the conflicts may arise and provides recommendations depending on your threading model: Threading model Discussion You thread the program using OS threads (pthreads on Mac OS* X). If more than one thread calls Intel MKL, and the function being called is threaded, it may be important that you turn off Intel MKL threading. Set the number of threads to one by any of the available means (see Techniques to Set the Number of Threads). You thread the program using OpenMP directives and/or pragmas and compile the program using a compiler other than a compiler from Intel. This is more problematic because setting of the OMP_NUM_THREADS environment variable affects both the compiler's threading library and libiomp. 
In this case, choose the threading library that matches the layered Intel MKL with the OpenMP compiler you employ (see Linking Examples on how to do this). If this is not possible, use Intel MKL in the sequential mode. To do this, you should link with the appropriate threading library: libmkl_sequential.a or libmkl_sequential.dylib (see High-level Directory Structure). There are multiple programs running on a multiple-cpu system, for example, a parallelized program that runs using MPI for communication in which each processor is treated as a node. The threading software will see multiple processors on the system even though each processor has a separate MPI process running on it. In this case, one of the solutions is to set the number of threads to one by any of the available means (see Techniques to Set the Number of Threads). See Also Using Additional Threading Control Managing Performance and Memory 5 41Linking with Compiler Run-time Libraries Techniques to Set the Number of Threads Use one of the following techniques to change the number of threads to use in Intel MKL: • Set one of the OpenMP or Intel MKL environment variables: • OMP_NUM_THREADS • MKL_NUM_THREADS • MKL_DOMAIN_NUM_THREADS • Call one of the OpenMP or Intel MKL functions: • omp_set_num_threads() • mkl_set_num_threads() • mkl_domain_set_num_threads() When choosing the appropriate technique, take into account the following rules: • The Intel MKL threading controls take precedence over the OpenMP controls because they are inspected first. • A function call takes precedence over any environment variables. The exception, which is a consequence of the previous rule, is the OpenMP subroutine omp_set_num_threads(), which does not have precedence over Intel MKL environment variables, such as MKL_NUM_THREADS. See Using Additional Threading Control for more details. • You cannot change run-time behavior in the course of the run using the environment variables because they are read only once at the first call to Intel MKL. Setting the Number of Threads Using an OpenMP* Environment Variable You can set the number of threads using the environment variable OMP_NUM_THREADS. To change the number of threads, in the command shell in which the program is going to run, enter: export OMP_NUM_THREADS=. See Also Using Additional Threading Control Changing the Number of Threads at Run Time You cannot change the number of threads during run time using environment variables. However, you can call OpenMP API functions from your program to change the number of threads during run time. The following sample code shows how to change the number of threads during run time using the omp_set_num_threads() routine. See also Techniques to Set the Number of Threads. The following example shows both C and Fortran code examples. To run this example in the C language, use the omp.h header file from the Intel(R) compiler package. If you do not have the Intel compiler but wish to explore the functionality in the example, use Fortran API for omp_set_num_threads() rather than the C version. 
For example, omp_set_num_threads_( &i_one ); // ******* C language ******* #include "omp.h" #include "mkl.h" #include #define SIZE 1000 int main(int args, char *argv[]){ double *a, *b, *c; a = (double*)malloc(sizeof(double)*SIZE*SIZE); b = (double*)malloc(sizeof(double)*SIZE*SIZE); c = (double*)malloc(sizeof(double)*SIZE*SIZE); double alpha=1, beta=1; 5 Intel® Math Kernel Library for Mac OS* X User's Guide 42int m=SIZE, n=SIZE, k=SIZE, lda=SIZE, ldb=SIZE, ldc=SIZE, i=0, j=0; char transa='n', transb='n'; for( i=0; i #include ... mkl_set_num_threads ( 1 ); // ******* Fortran language ******* ... call mkl_set_num_threads( 1 ) See the Intel MKL Reference Manual for the detailed description of the threading control functions, their parameters, calling syntax, and more code examples. MKL_DYNAMIC The MKL_DYNAMIC environment variable enables Intel MKL to dynamically change the number of threads. The default value of MKL_DYNAMIC is TRUE, regardless of OMP_DYNAMIC, whose default value may be FALSE. When MKL_DYNAMIC is TRUE, Intel MKL tries to use what it considers the best number of threads, up to the maximum number you specify. For example, MKL_DYNAMIC set to TRUE enables optimal choice of the number of threads in the following cases: • If the requested number of threads exceeds the number of physical cores (perhaps because of using the Intel® Hyper-Threading Technology), and MKL_DYNAMIC is not changed from its default value of TRUE, Intel MKL will scale down the number of threads to the number of physical cores. • If you are able to detect the presence of MPI, but cannot determine if it has been called in a thread-safe mode (it is impossible to detect this with MPICH 1.2.x, for instance), and MKL_DYNAMIC has not been changed from its default value of TRUE, Intel MKL will run one thread. When MKL_DYNAMIC is FALSE, Intel MKL tries not to deviate from the number of threads the user requested. However, setting MKL_DYNAMIC=FALSE does not ensure that Intel MKL will use the number of threads that you request. The library may have no choice on this number for such reasons as system resources. Managing Performance and Memory 5 45Additionally, the library may examine the problem and use a different number of threads than the value suggested. For example, if you attempt to do a size one matrix-matrix multiply across eight threads, the library may instead choose to use only one thread because it is impractical to use eight threads in this event. Note also that if Intel MKL is called in a parallel region, it will use only one thread by default. If you want the library to use nested parallelism, and the thread within a parallel region is compiled with the same OpenMP compiler as Intel MKL is using, you may experiment with setting MKL_DYNAMIC to FALSE and manually increasing the number of threads. In general, set MKL_DYNAMIC to FALSE only under circumstances that Intel MKL is unable to detect, for example, to use nested parallelism where the library is already called from a parallel section. MKL_DOMAIN_NUM_THREADS The MKL_DOMAIN_NUM_THREADS environment variable suggests the number of threads for a particular function domain. 
MKL_DOMAIN_NUM_THREADS accepts a string value <MKL-env-string>, which must have the following format:
<MKL-env-string> ::= <MKL-domain-env-string> { <delimiter> <MKL-domain-env-string> }
<delimiter> ::= [ <space>* ] ( <space> | <comma> | <semicolon> | <colon> ) [ <space>* ]
<MKL-domain-env-string> ::= <MKL-domain-env-name> <uses> <number-of-threads>
<MKL-domain-env-name> ::= MKL_DOMAIN_ALL | MKL_DOMAIN_BLAS | MKL_DOMAIN_FFT | MKL_DOMAIN_VML | MKL_DOMAIN_PARDISO
<uses> ::= [ <space>* ] ( <space> | <equality-sign> | <comma> ) [ <space>* ]
<number-of-threads> ::= <positive-number>
<positive-number> ::= <decimal-number> | <octal-number> | <hexadecimal-number>
In the syntax above, values of <MKL-domain-env-name> indicate function domains as follows:
• MKL_DOMAIN_ALL: All function domains
• MKL_DOMAIN_BLAS: BLAS Routines
• MKL_DOMAIN_FFT: Fourier Transform Functions
• MKL_DOMAIN_VML: Vector Mathematical Functions
• MKL_DOMAIN_PARDISO: PARDISO
For example, all of the following strings are valid and equivalent:
MKL_DOMAIN_ALL 2 : MKL_DOMAIN_BLAS 1 : MKL_DOMAIN_FFT 4
MKL_DOMAIN_ALL=2 : MKL_DOMAIN_BLAS=1 : MKL_DOMAIN_FFT=4
MKL_DOMAIN_ALL=2, MKL_DOMAIN_BLAS=1, MKL_DOMAIN_FFT=4
MKL_DOMAIN_ALL=2; MKL_DOMAIN_BLAS=1; MKL_DOMAIN_FFT=4
MKL_DOMAIN_ALL = 2 MKL_DOMAIN_BLAS 1 , MKL_DOMAIN_FFT 4
MKL_DOMAIN_ALL,2: MKL_DOMAIN_BLAS 1, MKL_DOMAIN_FFT,4
The global variables MKL_DOMAIN_ALL, MKL_DOMAIN_BLAS, MKL_DOMAIN_FFT, MKL_DOMAIN_VML, and MKL_DOMAIN_PARDISO, as well as the interface for the Intel MKL threading control functions, can be found in the mkl.h header file.
The following list illustrates how values of MKL_DOMAIN_NUM_THREADS are interpreted:
• MKL_DOMAIN_ALL=4: All parts of Intel MKL should try four threads. The actual number of threads may still be different because of the MKL_DYNAMIC setting or system resource issues. The setting is equivalent to MKL_NUM_THREADS=4.
• MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=4: All parts of Intel MKL should try one thread, except for BLAS, which is suggested to try four threads.
• MKL_DOMAIN_VML=2: VML should try two threads. The setting affects no other part of Intel MKL.
Be aware that the domain-specific settings take precedence over the overall ones. For example, the "MKL_DOMAIN_BLAS=4" value of MKL_DOMAIN_NUM_THREADS suggests trying four threads for BLAS, regardless of a later setting of MKL_NUM_THREADS, and a function call "mkl_domain_set_num_threads ( 4, MKL_DOMAIN_BLAS );" suggests the same, regardless of later calls to mkl_set_num_threads(). However, a function call with input "MKL_DOMAIN_ALL", such as "mkl_domain_set_num_threads (4, MKL_DOMAIN_ALL);", is equivalent to "mkl_set_num_threads(4)" and thus will be overridden by later calls to mkl_set_num_threads. Similarly, the environment setting of MKL_DOMAIN_NUM_THREADS with "MKL_DOMAIN_ALL=4" will be overridden by setting MKL_NUM_THREADS=2.
Whereas the MKL_DOMAIN_NUM_THREADS environment variable enables you to set several domains at once, for example, "MKL_DOMAIN_BLAS=4,MKL_DOMAIN_FFT=2", the corresponding function does not take string syntax. So, to do the same with function calls, you may need to make several calls, which in this example are as follows:
mkl_domain_set_num_threads ( 4, MKL_DOMAIN_BLAS );
mkl_domain_set_num_threads ( 2, MKL_DOMAIN_FFT );
Setting the Environment Variables for Threading Control
To set the environment variables used for threading control, in the command shell in which the program is going to run, enter:
export <VARIABLE NAME>=<value>
For example:
export MKL_NUM_THREADS=4
export MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=4"
export MKL_DYNAMIC=FALSE
Tips and Techniques to Improve Performance
Coding Techniques
To obtain the best performance with Intel MKL, ensure the following data alignment in your source code:
• Align arrays on 16-byte boundaries. See Aligning Addresses on 16-byte Boundaries for how to do it.
• Make sure leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16, where element_size is the size of an array element in bytes.
• For two-dimensional arrays, avoid leading dimension values divisible by 2048 bytes. For example, for a double-precision array with element_size = 8, avoid leading dimensions of 256, 512, 768, 1024, ... (elements).
LAPACK Packed Routines
The routines with names that contain the letters HP, OP, PP, SP, TP, UP in the matrix type and storage positions (the second and third letters, respectively) operate on matrices in the packed format (see the LAPACK "Routine Naming Conventions" sections in the Intel MKL Reference Manual). Their functionality is strictly equivalent to the functionality of the unpacked routines with names containing the letters HE, OR, PO, SY, TR, UN in the same positions, but the performance is significantly lower. If the memory restriction is not too tight, use an unpacked routine for better performance. In this case, you need to allocate N^2/2 more memory than the memory required by a respective packed routine, where N is the problem size (the number of equations). For example, to speed up solving a symmetric eigenproblem with an expert driver, use the unpacked routine:
call dsyevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z, ldz, work, lwork, iwork, ifail, info)
where a is an array of dimension lda-by-n, which is at least N^2 elements, instead of the packed routine:
call dspevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz, work, iwork, ifail, info)
where ap is an array of dimension N*(N+1)/2.
FFT Functions
Additional conditions can improve performance of the FFT functions. The addresses of the first elements of arrays and the leading dimension values, in bytes (n*element_size), of two-dimensional arrays should be divisible by the cache line size, which equals 64 bytes.
Hardware Configuration Tips
Dual-Core Intel® Xeon® processor 5100 series systems
To get the best performance with Intel MKL on Dual-Core Intel® Xeon® processor 5100 series systems, enable the Hardware DPL (streaming data) Prefetcher functionality of this processor. To configure this functionality, use the appropriate BIOS settings, as described in your BIOS documentation.
Intel® Hyper-Threading Technology
Intel® Hyper-Threading Technology (Intel® HT Technology) is especially effective when each thread performs different types of operations and when there are under-utilized resources on the processor. However, Intel MKL fits neither of these criteria because the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance by disabling Intel HT Technology. If you run with Intel HT Technology enabled, performance may be especially impacted if you run on fewer threads than physical cores. Moreover, if, for example, there are two threads to every physical core, the thread scheduler may assign two threads to some cores and ignore the other cores altogether. If you are using the OpenMP* library of the Intel Compiler, read the respective User Guide on how to best set the thread affinity interface to avoid this situation.
For Intel MKL, apply the following setting: set KMP_AFFINITY=granularity=fine,compact,1,0 See Also Using Parallelism of the Intel® Math Kernel Library 5 Intel® Math Kernel Library for Mac OS* X User's Guide 48Operating on Denormals The IEEE 754-2008 standard, "An IEEE Standard for Binary Floating-Point Arithmetic", defines denormal (or subnormal) numbers as non-zero numbers smaller than the smallest possible normalized numbers for a specific floating-point format. Floating-point operations on denormals are slower than on normalized operands because denormal operands and results are usually handled through a software assist mechanism rather than directly in hardware. This software processing causes Intel MKL functions that consume denormals to run slower than with normalized floating-point numbers. You can mitigate this performance issue by setting the appropriate bit fields in the MXCSR floating-point control register to flush denormals to zero (FTZ) or to replace any denormals loaded from memory with zero (DAZ). Check your compiler documentation to determine whether it has options to control FTZ and DAZ. Note that these compiler options may slightly affect accuracy. FFT Optimized Radices You can improve the performance of Intel MKL FFT if the length of your data vector permits factorization into powers of optimized radices. In Intel MKL, the optimized radices are 2, 3, 5, 7, 11, and 13. Using Memory Management Intel MKL Memory Management Software Intel MKL has memory management software that controls memory buffers for the use by the library functions. New buffers that the library allocates when your application calls Intel MKL are not deallocated until the program ends. To get the amount of memory allocated by the memory management software, call the mkl_mem_stat() function. If your program needs to free memory, call mkl_free_buffers(). If another call is made to a library function that needs a memory buffer, the memory manager again allocates the buffers and they again remain allocated until either the program ends or the program deallocates the memory. This behavior facilitates better performance. However, some tools may report this behavior as a memory leak. The memory management software is turned on by default. To turn it off, set the MKL_DISABLE_FAST_MM environment variable to any value or call the mkl_disable_fast_mm() function. Be aware that this change may negatively impact performance of some Intel MKL routines, especially for small problem sizes. Redefining Memory Functions In C/C++ programs, you can replace Intel MKL memory functions that the library uses by default with your own functions. To do this, use the memory renaming feature. Memory Renaming Intel MKL memory management by default uses standard C run-time memory functions to allocate or free memory. These functions can be replaced using memory renaming. Intel MKL accesses the memory functions by pointers i_malloc, i_free, i_calloc, and i_realloc, which are visible at the application level. These pointers initially hold addresses of the standard C run-time memory functions malloc, free, calloc, and realloc, respectively. You can programmatically redefine values of these pointers to the addresses of your application's memory management functions. Redirecting the pointers is the only correct way to use your own set of memory management functions. 
If you call your own memory functions without redirecting the pointers, the memory will get managed by two independent memory management packages, which may cause unexpected memory issues. Managing Performance and Memory 5 49How to Redefine Memory Functions To redefine memory functions, use the following procedure: 1. Include the i_malloc.h header file in your code. This header file contains all declarations required for replacing the memory allocation functions. The header file also describes how memory allocation can be replaced in those Intel libraries that support this feature. 2. Redefine values of pointers i_malloc, i_free, i_calloc, and i_realloc prior to the first call to MKL functions, as shown in the following example: #include "i_malloc.h" . . . i_malloc = my_malloc; i_calloc = my_calloc; i_realloc = my_realloc; i_free = my_free; . . . // Now you may call Intel MKL functions 5 Intel® Math Kernel Library for Mac OS* X User's Guide 50Language-specific Usage Options 6 The Intel® Math Kernel Library (Intel® MKL) provides broad support for Fortran and C/C++ programming. However, not all functions support both Fortran and C interfaces. For example, some LAPACK functions have no C interface. You can call such functions from C using mixed-language programming. If you want to use LAPACK or BLAS functions that support Fortran 77 in the Fortran 95 environment, additional effort may be initially required to build compiler-specific interface libraries and modules from the source code provided with Intel MKL. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Using Language-Specific Interfaces with Intel® Math Kernel Library This section discusses mixed-language programming and the use of language-specific interfaces with Intel MKL. See also Appendix G in the Intel MKL Reference Manual for details of the FFTW interfaces to Intel MKL. Interface Libraries and Modules You can create the following interface libraries and modules using the respective makefiles located in the interfaces directory. File name Contains Libraries, in Intel MKL architecture-specific directories libmkl_blas95.a 1 Fortran 95 wrappers for BLAS (BLAS95) for IA-32 architecture. libmkl_blas95_ilp64.a 1 Fortran 95 wrappers for BLAS (BLAS95) supporting LP64 interface. libmkl_blas95_lp64.a 1 Fortran 95 wrappers for BLAS (BLAS95) supporting ILP64 interface. libmkl_lapack95.a 1 Fortran 95 wrappers for LAPACK (LAPACK95) for IA-32 architecture. libmkl_lapack95_lp64.a 1 Fortran 95 wrappers for LAPACK (LAPACK95) supporting LP64 interface. libmkl_lapack95_ilp64.a 1 Fortran 95 wrappers for LAPACK (LAPACK95) supporting ILP64 interface. 51File name Contains libfftw2xc_intel.a 1 Interfaces for FFTW version 2.x (C interface for Intel compilers) to call Intel MKL FFTs. 
libfftw2xc_gnu.a Interfaces for FFTW version 2.x (C interface for GNU compilers) to call Intel MKL FFTs. libfftw2xf_intel.a Interfaces for FFTW version 2.x (Fortran interface for Intel compilers) to call Intel MKL FFTs. libfftw2xf_gnu.a Interfaces for FFTW version 2.x (Fortran interface for GNU compiler) to call Intel MKL FFTs. libfftw3xc_intel.a 2 Interfaces for FFTW version 3.x (C interface for Intel compiler) to call Intel MKL FFTs. libfftw3xc_gnu.a Interfaces for FFTW version 3.x (C interface for GNU compilers) to call Intel MKL FFTs. libfftw3xf_intel.a 2 Interfaces for FFTW version 3.x (Fortran interface for Intel compilers) to call Intel MKL FFTs. libfftw3xf_gnu.a Interfaces for FFTW version 3.x (Fortran interface for GNU compilers) to call Intel MKL FFTs. Modules, in architecture- and interface-specific subdirectories of the Intel MKL include directory blas95.mod 1 Fortran 95 interface module for BLAS (BLAS95). lapack95.mod 1 Fortran 95 interface module for LAPACK (LAPACK95). f95_precision.mod 1 Fortran 95 definition of precision parameters for BLAS95 and LAPACK95. mkl95_blas.mod 1 Fortran 95 interface module for BLAS (BLAS95), identical to blas95.mod. To be removed in one of the future releases. mkl95_lapack.mod 1 Fortran 95 interface module for LAPACK (LAPACK95), identical to lapack95.mod. To be removed in one of the future releases. mkl95_precision.mod 1 Fortran 95 definition of precision parameters for BLAS95 and LAPACK95, identical to f95_precision.mod. To be removed in one of the future releases. mkl_service.mod 1 Fortran 95 interface module for Intel MKL support functions. 1 Prebuilt for the Intel® Fortran compiler 2 FFTW3 interfaces are integrated with Intel MKL. Look into /interfaces/fftw3x*/ makefile for options defining how to build and where to place the standalone library with the wrappers. See Also Fortran 95 Interfaces to LAPACK and BLAS Fortran 95 Interfaces to LAPACK and BLAS Fortran 95 interfaces are compiler-dependent. Intel MKL provides the interface libraries and modules precompiled with the Intel® Fortran compiler. Additionally, the Fortran 95 interfaces and wrappers are delivered as sources. (For more information, see Compiler-dependent Functions and Fortran 90 Modules). If you are using a different compiler, build the appropriate library and modules with your compiler and link the library as a user's library: 1. Go to the respective directory /interfaces/blas95 or / interfaces/lapack95 6 Intel® Math Kernel Library for Mac OS* X User's Guide 522. Type one of the following commands depending on your architecture: • For the IA-32 architecture, make libia32 INSTALL_DIR= • For the Intel® 64 architecture, make libintel64 [interface=lp64|ilp64] INSTALL_DIR= Important The parameter INSTALL_DIR is required. As a result, the required library is built and installed in the /lib directory, and the .mod files are built and installed in the /include/[/{lp64|ilp64}] directory, where is one of {ia32, intel64}. By default, the ifort compiler is assumed. You may change the compiler with an additional parameter of make: FC=. For example, the command make libintel64 FC=pgf95 INSTALL_DIR= interface=lp64 builds the required library and .mod files and installs them in subdirectories of . 
To delete the library from the building directory, use one of the following commands: • For the IA-32 architecture, make cleania32 INSTALL_DIR= • For the Intel ® 64 architecture, make cleanintel64 [interface=lp64|ilp64] INSTALL_DIR= • For all the architectures, make clean INSTALL_DIR= CAUTION Even if you have administrative rights, avoid setting INSTALL_DIR=../.. or INSTALL_DIR= in a build or clean command above because these settings replace or delete the Intel MKL prebuilt Fortran 95 library and modules. Compiler-dependent Functions and Fortran 90 Modules Compiler-dependent functions occur whenever the compiler inserts into the object code function calls that are resolved in its run-time library (RTL). Linking of such code without the appropriate RTL will result in undefined symbols. Intel MKL has been designed to minimize RTL dependencies. In cases where RTL dependencies might arise, the functions are delivered as source code and you need to compile the code with whatever compiler you are using for your application. In particular, Fortran 90 modules result in the compiler-specific code generation requiring RTL support. Therefore, Intel MKL delivers these modules compiled with the Intel compiler, along with source code, to be used with different compilers. Mixed-language Programming with the Intel Math Kernel Library Appendix A: Intel(R) Math Kernel Library Language Interfaces Support lists the programming languages supported for each Intel MKL function domain. However, you can call Intel MKL routines from different language environments. Language-specific Usage Options 6 53Calling LAPACK, BLAS, and CBLAS Routines from C/C++ Language Environments Not all Intel MKL function domains support both C and Fortran environments. To use Intel MKL Fortran-style functions in C/C++ environments, you should observe certain conventions, which are discussed for LAPACK and BLAS in the subsections below. CAUTION Avoid calling BLAS 95/LAPACK 95 from C/C++. Such calls require skills in manipulating the descriptor of a deferred-shape array, which is the Fortran 90 type. Moreover, BLAS95/LAPACK95 routines contain links to a Fortran RTL. LAPACK and BLAS Because LAPACK and BLAS routines are Fortran-style, when calling them from C-language programs, follow the Fortran-style calling conventions: • Pass variables by address, not by value. Function calls in Example "Calling a Complex BLAS Level 1 Function from C++" and Example "Using CBLAS Interface Instead of Calling BLAS Directly from C" illustrate this. • Store your data in Fortran style, that is, column-major rather than row-major order. With row-major order, adopted in C, the last array index changes most quickly and the first one changes most slowly when traversing the memory segment where the array is stored. With Fortran-style columnmajor order, the last index changes most slowly whereas the first index changes most quickly (as illustrated by the figure below for a two-dimensional array). For example, if a two-dimensional matrix A of size mxn is stored densely in a one-dimensional array B, you can access a matrix element like this: A[i][j] = B[i*n+j] in C ( i=0, ... , m-1, j=0, ... , -1) A(i,j) = B(j*m+i) in Fortran ( i=1, ... , m, j=1, ... , n). When calling LAPACK or BLAS routines from C, be aware that because the Fortran language is caseinsensitive, the routine names can be both upper-case or lower-case, with or without the trailing underscore. 
For example, the following names are equivalent: • LAPACK: dgetrf, DGETRF, dgetrf_, and DGETRF_ • BLAS: dgemm, DGEMM, dgemm_, and DGEMM_ See Example "Calling a Complex BLAS Level 1 Function from C++" on how to call BLAS routines from C. See also the Intel(R) MKL Reference Manual for a description of the C interface to LAPACK functions. CBLAS Instead of calling BLAS routines from a C-language program, you can use the CBLAS interface. 6 Intel® Math Kernel Library for Mac OS* X User's Guide 54CBLAS is a C-style interface to the BLAS routines. You can call CBLAS routines using regular C-style calls. Use the mkl.h header file with the CBLAS interface. The header file specifies enumerated values and prototypes of all the functions. It also determines whether the program is being compiled with a C++ compiler, and if it is, the included file will be correct for use with C++ compilation. Example "Using CBLAS Interface Instead of Calling BLAS Directly from C" illustrates the use of the CBLAS interface. C Interface to LAPACK Instead of calling LAPACK routines from a C-language program, you can use the C interface to LAPACK provided by Intel MKL. The C interface to LAPACK is a C-style interface to the LAPACK routines. This interface supports matrices in row-major and column-major order, which you can define in the first function argument matrix_order. Use the mkl_lapacke.h header file with the C interface to LAPACK. The header file specifies constants and prototypes of all the functions. It also determines whether the program is being compiled with a C++ compiler, and if it is, the included file will be correct for use with C++ compilation. You can find examples of the C interface to LAPACK in the examples/lapacke subdirectory in the Intel MKL installation directory. Using Complex Types in C/C++ As described in the documentation for the Intel® Fortran Compiler XE, C/C++ does not directly implement the Fortran types COMPLEX(4) and COMPLEX(8). However, you can write equivalent structures. The type COMPLEX(4) consists of two 4-byte floating-point numbers. The first of them is the real-number component, and the second one is the imaginary-number component. The type COMPLEX(8) is similar to COMPLEX(4) except that it contains two 8-byte floating-point numbers. Intel MKL provides complex types MKL_Complex8 and MKL_Complex16, which are structures equivalent to the Fortran complex types COMPLEX(4) and COMPLEX(8), respectively. The MKL_Complex8 and MKL_Complex16 types are defined in the mkl_types.h header file. You can use these types to define complex data. You can also redefine the types with your own types before including the mkl_types.h header file. The only requirement is that the types must be compatible with the Fortran complex layout, that is, the complex type must be a pair of real numbers for the values of real and imaginary parts. For example, you can use the following definitions in your C++ code: #define MKL_Complex8 std::complex and #define MKL_Complex16 std::complex See Example "Calling a Complex BLAS Level 1 Function from C++" for details. You can also define these types in the command line: -DMKL_Complex8="std::complex" -DMKL_Complex16="std::complex" See Also Intel® Software Documentation Library Calling BLAS Functions that Return the Complex Values in C/C++ Code Complex values that functions return are handled differently in C and Fortran. Because BLAS is Fortran-style, you need to be careful when handling a call from C to a BLAS function that returns complex values. 
However, in addition to normal function calls, Fortran enables calling functions as though they were subroutines, which provides a mechanism for returning the complex value correctly when the function is called from a C program. When a Fortran function is called as a subroutine, the return value is the first parameter in the calling sequence. You can use this feature to call a BLAS function from C. The following example shows how a call to a Fortran function as a subroutine converts to a call from C and the hidden parameter result gets exposed: Language-specific Usage Options 6 55Normal Fortran function call: result = cdotc( n, x, 1, y, 1 ) A call to the function as a subroutine: call cdotc( result, n, x, 1, y, 1) A call to the function from C: cdotc( &result, &n, x, &one, y, &one ) NOTE Intel MKL has both upper-case and lower-case entry points in the Fortran-style (caseinsensitive) BLAS, with or without the trailing underscore. So, all these names are equivalent and acceptable: cdotc, CDOTC, cdotc_, and CDOTC_. The above example shows one of the ways to call several level 1 BLAS functions that return complex values from your C and C++ applications. An easier way is to use the CBLAS interface. For instance, you can call the same function using the CBLAS interface as follows: cblas_cdotu( n, x, 1, y, 1, &result ) NOTE The complex value comes last on the argument list in this case. The following examples show use of the Fortran-style BLAS interface from C and C++, as well as the CBLAS (C language) interface: • Example "Calling a Complex BLAS Level 1 Function from C" • Example "Calling a Complex BLAS Level 1 Function from C++" • Example "Using CBLAS Interface Instead of Calling BLAS Directly from C" Example "Calling a Complex BLAS Level 1 Function from C" The example below illustrates a call from a C program to the complex BLAS Level 1 function zdotc(). This function computes the dot product of two double-precision complex vectors. In this example, the complex dot product is returned in the structure c. 
#include "mkl.h" #define N 5 int main() { int n = N, inca = 1, incb = 1, i; MKL_Complex16 a[N], b[N], c; for( i = 0; i < n; i++ ){ a[i].real = (double)i; a[i].imag = (double)i * 2.0; b[i].real = (double)(n - i); b[i].imag = (double)i * 2.0; } zdotc( &c, &n, a, &inca, b, &incb ); printf( "The complex dot product is: ( %6.2f, %6.2f)\n", c.real, c.imag ); return 0; } Example "Calling a Complex BLAS Level 1 Function from C++" Below is the C++ implementation: #include #include #define MKL_Complex16 std::complex #include "mkl.h" #define N 5 int main() { int n, inca = 1, incb = 1, i; std::complex a[N], b[N], c; n = N; 6 Intel® Math Kernel Library for Mac OS* X User's Guide 56 for( i = 0; i < n; i++ ){ a[i] = std::complex(i,i*2.0); b[i] = std::complex(n-i,i*2.0); } zdotc(&c, &n, a, &inca, b, &incb ); std::cout << "The complex dot product is: " << c << std::endl; return 0; } Example "Using CBLAS Interface Instead of Calling BLAS Directly from C" This example uses CBLAS: #include #include "mkl.h" typedef struct{ double re; double im; } complex16; #define N 5 int main() { int n, inca = 1, incb = 1, i; complex16 a[N], b[N], c; n = N; for( i = 0; i < n; i++ ){ a[i].re = (double)i; a[i].im = (double)i * 2.0; b[i].re = (double)(n - i); b[i].im = (double)i * 2.0; } cblas_zdotc_sub(n, a, inca, b, incb, &c ); printf( "The complex dot product is: ( %6.2f, %6.2f)\n", c.re, c.im ); return 0; } Support for Boost uBLAS Matrix-matrix Multiplication If you are used to uBLAS, you can perform BLAS matrix-matrix multiplication in C++ using Intel MKL substitution of Boost uBLAS functions. uBLAS is the Boost C++ open-source library that provides BLAS functionality for dense, packed, and sparse matrices. The library uses an expression template technique for passing expressions as function arguments, which enables evaluating vector and matrix expressions in one pass without temporary matrices. uBLAS provides two modes: • Debug (safe) mode, default. Checks types and conformance. • Release (fast) mode. Does not check types and conformance. To enable this mode, use the NDEBUG preprocessor symbol. The documentation for the Boost uBLAS is available at www.boost.org. Intel MKL provides overloaded prod() functions for substituting uBLAS dense matrix-matrix multiplication with the Intel MKL gemm calls. Though these functions break uBLAS expression templates and introduce temporary matrices, the performance advantage can be considerable for matrix sizes that are not too small (roughly, over 50). You do not need to change your source code to use the functions. To call them: • Include the header file mkl_boost_ublas_matrix_prod.hpp in your code (from the Intel MKL include directory) • Add appropriate Intel MKL libraries to the link line. The list of expressions that are substituted follows: prod( m1, m2 ) prod( trans(m1), m2 ) prod( trans(conj(m1)), m2 ) prod( conj(trans(m1)), m2 ) Language-specific Usage Options 6 57prod( m1, trans(m2) ) prod( trans(m1), trans(m2) ) prod( trans(conj(m1)), trans(m2) ) prod( conj(trans(m1)), trans(m2) ) prod( m1, trans(conj(m2)) ) prod( trans(m1), trans(conj(m2)) ) prod( trans(conj(m1)), trans(conj(m2)) ) prod( conj(trans(m1)), trans(conj(m2)) ) prod( m1, conj(trans(m2)) ) prod( trans(m1), conj(trans(m2)) ) prod( trans(conj(m1)), conj(trans(m2)) ) prod( conj(trans(m1)), conj(trans(m2)) ) These expressions are substituted in the release mode only (with NDEBUG preprocessor symbol defined). Supported uBLAS versions are Boost 1.34.1 and higher. To get them, visit www.boost.org. 
A code example provided in the /examples/ublas/source/sylvester.cpp file illustrates usage of the Intel MKL uBLAS header file for solving a special case of the Sylvester equation. To run the Intel MKL uBLAS examples, specify the BOOST_ROOT parameter in the make command, for instance, when using Boost version 1.37.0:
make libia32 BOOST_ROOT=/boost_1_37_0
See Also Using Code Examples
Invoking Intel MKL Functions from Java* Applications
Intel MKL Java* Examples
To demonstrate binding with Java, Intel MKL includes a set of Java examples in the following directory: /examples/java. The examples are provided for the following MKL functions:
• ?gemm, ?gemv, and ?dot families from CBLAS
• The complete set of FFT functions
• ESSL-like functions for one-dimensional convolution and correlation (ESSL*: IBM Engineering Scientific Subroutine Library)
• VSL Random Number Generators (RNG), except user-defined ones and file subroutines
• VML functions, except GetErrorCallBack, SetErrorCallBack, and ClearErrorCallBack
You can see the example sources in the following directory: /examples/java/examples.
The examples are written in Java. They demonstrate usage of the MKL functions with the following variety of data:
• 1- and 2-dimensional data sequences
• Real and complex types of the data
• Single and double precision
However, the wrappers used in the examples do not:
• Demonstrate the use of large arrays (>2 billion elements)
• Demonstrate processing of arrays in native memory
• Check correctness of function parameters
• Demonstrate performance optimizations
The examples use the Java Native Interface (JNI* developer framework) to bind with Intel MKL. The JNI documentation is available from http://java.sun.com/javase/6/docs/technotes/guides/jni/.
The Java example set includes JNI wrappers that perform the binding. The wrappers do not depend on the examples and may be used in your Java applications. The wrappers for CBLAS, FFT, VML, VSL RNG, and ESSL-like convolution and correlation functions do not depend on each other.
To build the wrappers, just run the examples. The makefile builds the wrapper binaries. After running the makefile, you can run the examples, which will determine whether the wrappers were built correctly. As a result of running the examples, the following directories will be created in /examples/java:
• docs
• include
• classes
• bin
• _results
The directories docs, include, classes, and bin will contain the wrapper binaries and documentation; the directory _results will contain the testing results.
For a Java programmer, the wrappers are the following Java classes:
• com.intel.mkl.CBLAS
• com.intel.mkl.DFTI
• com.intel.mkl.ESSL
• com.intel.mkl.VML
• com.intel.mkl.VSL
Documentation for the particular wrapper and example classes is generated from the Java sources while building and running the examples. To browse the documentation, open the index file in the docs directory (created by the build script): /examples/java/docs/index.html.
The Java wrappers for CBLAS, VML, VSL RNG, and FFT establish an interface that directly corresponds to the underlying native functions, so you can refer to the Intel MKL Reference Manual for their functionality and parameters. Interfaces for the ESSL-like functions are described in the generated documentation for the com.intel.mkl.ESSL class.
Each wrapper consists of the interface part for Java and a JNI stub written in C. You can find the sources in the following directory: /examples/java/wrappers.
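For a flavor of what such a binding looks like, here is a schematic JNI stub that exposes cblas_ddot() to Java. It is a hypothetical sketch, not a reproduction of the com.intel.mkl wrappers (which are written in C and include error handling this sketch omits): the class name com.example.MklDot and the native method signature are invented for illustration, and the stub is written with C++ JNI calls.
// Schematic JNI stub (hypothetical). The Java side would declare, in a class
// com.example.MklDot:
//     public static native double ddot(int n, double[] x, int incx,
//                                      double[] y, int incy);
#include <jni.h>
#include "mkl.h"

extern "C" JNIEXPORT jdouble JNICALL
Java_com_example_MklDot_ddot( JNIEnv *env, jclass,
                              jint n, jdoubleArray x, jint incx,
                              jdoubleArray y, jint incy )
{
    // Obtain the Java arrays as plain native arrays that Intel MKL can read.
    jdouble *px = env->GetDoubleArrayElements( x, NULL );
    jdouble *py = env->GetDoubleArrayElements( y, NULL );
    jdouble result = cblas_ddot( n, px, incx, py, incy );
    // JNI_ABORT: the inputs were not modified, so nothing needs to be copied back.
    env->ReleaseDoubleArrayElements( x, px, JNI_ABORT );
    env->ReleaseDoubleArrayElements( y, py, JNI_ABORT );
    return result;
}
The compiled stub is loaded from Java with System.loadLibrary(), and the exported symbol name must match the fully qualified class and method name, as required by the JNI naming convention.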
Both the Java and C parts of the wrapper for CBLAS and VML demonstrate the straightforward approach, which you may use to cover additional CBLAS functions.
The wrapper for FFT is more complicated because it needs to support the lifecycle of FFT descriptor objects. To compute a single Fourier transform, an application needs to call the FFT software several times with the same copy of the native FFT descriptor. The wrapper provides a handler class to hold the native descriptor while the virtual machine runs Java bytecode.
The wrapper for VSL RNG is similar to the one for FFT. The wrapper provides a handler class to hold the native descriptor of the stream state.
The wrapper for the convolution and correlation functions addresses the same difficulty of the VSL interface, which assumes a similar lifecycle for "task descriptors". The wrapper utilizes the ESSL-like interface for those functions, which is simpler for the case of 1-dimensional data. The JNI stub additionally encapsulates the MKL functions into ESSL-like wrappers written in C and so "packs" the lifecycle of a task descriptor into a single call to the native method.
The wrappers meet the JNI Specification versions 1.1 and 5.0 and should work with virtually every modern implementation of Java.
The examples and the Java part of the wrappers are written for the Java language described in "The Java Language Specification (First Edition)" and extended with the feature of "inner classes", that is, a late-1990s level of the language. This level is supported by all versions of the Sun Java Development Kit* (JDK*) developer toolkit and compatible implementations starting from version 1.1.5, or by all modern versions of Java. The level of the C language is "Standard C" (that is, C89) with additional assumptions about the integer and floating-point data types required by the Intel MKL interfaces and the JNI header files. That is, the native float and double data types must be the same as the JNI jfloat and jdouble data types, respectively, and the native int must be 4 bytes long.
See Also Running the Java* Examples
Running the Java* Examples
The Java examples support all the C and C++ compilers that Intel MKL does. The makefile intended to run the examples also needs the make utility, which is typically provided with the Mac OS* X distribution.
To run the Java examples, the JDK* developer toolkit is required for compiling and running Java code. A Java implementation must be installed on the computer or available via the network. You may download the JDK from the vendor website.
The examples should work for all versions of JDK. However, they were tested only with the following Java implementation for all the supported architectures:
• J2SE* SDK 1.4.2 and JDK 5.0 from Apple Computer, Inc. (http://apple.com/).
Note that the Java run-time environment* (JRE*) system, which may be pre-installed on your computer, is not enough.
You need the JDK* developer toolkit that supports the following set of tools:
• java
• javac
• javah
• javadoc
To make these tools available to the examples makefile, set the JAVA_HOME environment variable and add the JDK binaries directory to the system PATH, for example:
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.5/Home
export PATH=${JAVA_HOME}/bin:${PATH}
You may also need to clear the JDK_HOME environment variable, if it is assigned a value:
unset JDK_HOME
To start the examples, use the makefile found in the Intel MKL Java examples directory:
make {dylibia32|libia32} [function=...] [compiler=...]
If you type the make command and omit the target (for example, dylibia32), the makefile prints help information that explains the targets and parameters.
For the examples list, see the examples.lst file in the Java examples directory.
Known Limitations of the Java* Examples
This section explains limitations of the Java examples.
Functionality
Some Intel MKL functions may fail to work if called from the Java environment by using a wrapper, like those provided with the Intel MKL Java examples. Only the specific CBLAS, FFT, VML, VSL RNG, and convolution/correlation functions listed in the Intel MKL Java Examples section were tested with the Java environment. So, you may use the Java wrappers for these CBLAS, FFT, VML, VSL RNG, and convolution/correlation functions in your Java applications.
Performance
The Intel MKL functions are expected to work faster than similar functions written in pure Java. However, the main goal of these wrappers is to provide code examples, not maximum performance. So, an Intel MKL function called from a Java application will probably work slower than the same function called from a program written in C/C++ or Fortran.
Known bugs
There are a number of known bugs in Intel MKL (identified in the Release Notes), as well as incompatibilities between different versions of JDK. The examples and wrappers include workarounds for these problems. Look at the source code in the examples and wrappers for comments that describe the workarounds.
Coding Tips
This section discusses programming with the Intel® Math Kernel Library (Intel® MKL) to provide coding tips for specific needs, such as consistent results of computations or conditional compilation.
Aligning Data for Consistent Results
Routines in Intel MKL may return different results from run to run on the same system. This is usually due to a change in the order in which floating-point operations are performed. The two most influential factors are array alignment and parallelism. Array alignment can determine how internal loops order floating-point operations. Non-deterministic parallelism may change the order in which computational tasks are executed. While these results may differ, they should still fall within acceptable computational error bounds. To better assure identical results from run to run, do the following:
• Align input arrays on 16-byte boundaries
• Run Intel MKL in the sequential mode
To align input arrays on 16-byte boundaries, use mkl_malloc() in place of system-provided memory allocators, as shown in the code example below. The sequential mode of Intel MKL removes the influence of non-deterministic parallelism.
Aligning Addresses on 16-byte Boundaries
// ******* C language *******
...
#include <mkl.h>
...
void *darray;
int workspace;
...
// Allocate workspace aligned on 16-byte boundary
darray = mkl_malloc( sizeof(double)*workspace, 16 );
...
// call the program using MKL
mkl_app( darray );
...
// Free workspace
mkl_free( darray );

! ******* Fortran language *******
...
double precision darray
pointer (p_wrk,darray(1))
integer workspace
...
! Allocate workspace aligned on 16-byte boundary
p_wrk = mkl_malloc( 8*workspace, 16 )
...
! call the program using MKL
call mkl_app( darray )
...
! Free workspace
call mkl_free(p_wrk)
Using Predefined Preprocessor Symbols for Intel® MKL Version-Dependent Compilation
Preprocessor symbols (macros) substitute values in a program before it is compiled; the substitution is performed in the preprocessing phase. The following preprocessor symbols are available:
__INTEL_MKL__ : Intel MKL major version
__INTEL_MKL_MINOR__ : Intel MKL minor version
__INTEL_MKL_UPDATE__ : Intel MKL update number
INTEL_MKL_VERSION : Intel MKL full version, in the following format:
INTEL_MKL_VERSION = (__INTEL_MKL__*100+__INTEL_MKL_MINOR__)*100+__INTEL_MKL_UPDATE__
These symbols enable conditional compilation of code that uses new features introduced in a particular version of the library. To perform conditional compilation:
1. Include in your code the file where the macros are defined:
• mkl.h for C/C++
• mkl.fi for Fortran
2. [Optionally] Use the following preprocessor directives to check whether the macro is defined:
• #ifdef, #endif for C/C++
• !DEC$IF DEFINED, !DEC$ENDIF for Fortran
3. Use preprocessor directives for conditional inclusion of code:
• #if, #endif for C/C++
• !DEC$IF, !DEC$ENDIF for Fortran
Example
Compile a part of the code if the Intel MKL version is MKL 10.3 update 4:
C/C++:
#include "mkl.h"
#ifdef INTEL_MKL_VERSION
#if INTEL_MKL_VERSION == 100304
// Code to be conditionally compiled
#endif
#endif
Fortran:
include "mkl.fi"
!DEC$IF DEFINED INTEL_MKL_VERSION
!DEC$IF INTEL_MKL_VERSION .EQ. 100304
*   Code to be conditionally compiled
!DEC$ENDIF
!DEC$ENDIF
Configuring Your Integrated Development Environment to Link with Intel Math Kernel Library
Configuring the Apple Xcode* Developer Software to Link with Intel® Math Kernel Library
This section provides information on linking Intel MKL with the Apple Xcode* developer software. The steps below reflect Apple Xcode* 2.4 and the details may differ in other versions, but the fundamental approach to configuring Xcode* for use with Intel MKL applies more widely:
1. Open your project that uses Intel MKL.
2. Under Targets, double-click the active target. In the Target dialog box, assign values to the build settings as explained in the next steps.
3. Click the plus icon under the Build Settings table, located at the bottom of the dialog box, to add a row. In the new row, type HEADER_SEARCH_PATHS under Name and the path to the Intel® MKL include files, that is, /include, under Value.
4. Click the plus icon under the Build Settings table to add another row, in which type LIBRARY_SEARCH_PATHS under Name and the path to the Intel MKL libraries, such as /lib, under Value.
5. Double-click OTHER_LDFLAGS under Name and, under Value, type linker options for additional libraries (for example, -lmkl_core -lguide -lpthread).
6.
(Optional, needed only for dynamic linking) Under Executables, double-click the active executable, click the Arguments tab, and under Variables to be set in the environment, add DYLD_LIBRARY_PATH with the value of /lib.
See Also Notational Conventions
Linking in Detail
Intel® Optimized LINPACK Benchmark for Mac OS* X
The Intel® Optimized LINPACK Benchmark is a generalization of the LINPACK 1000 benchmark. It solves a dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. The generalization is in the number of equations (N) it can solve, which is not limited to 1000. It uses partial pivoting to assure the accuracy of the results.
Do not use this benchmark to report LINPACK 100 performance because that is a compiled-code-only benchmark. This is a shared-memory (SMP) implementation which runs on a single platform. Do not confuse this benchmark with LINPACK, the library, which has been expanded upon by the LAPACK library.
Intel provides optimized versions of the LINPACK benchmarks to help you obtain high LINPACK benchmark results on your genuine Intel processor systems more easily than with the High Performance Linpack (HPL) benchmark. Use this package to benchmark your SMP machine.
Additional information on this software, as well as on other Intel software performance products, is available at http://www.intel.com/software/products/.
Contents of the Intel® Optimized LINPACK Benchmark
The Intel Optimized LINPACK Benchmark for Mac OS* X contains the following files, located in the ./benchmarks/linpack/ subdirectory of the Intel® Math Kernel Library (Intel® MKL) directory:
linpack_cd32.app : The 32-bit program executable for a system using the Intel® Core™ Duo processor on Mac OS* X.
linpack_cd64.app : The 64-bit program executable for a system using the Intel® Core™ microarchitecture on Mac OS* X.
runme32 : A sample shell script for executing a pre-determined problem set for linpack_cd32.app; OMP_NUM_THREADS is set to 2 cores.
runme64 : A sample shell script for executing a pre-determined problem set for linpack_cd64.app; OMP_NUM_THREADS is set to 2 cores.
lininput : Input file for the pre-determined problem for the runme32 script.
lin_cd32.txt : Result of the runme32 script execution.
lin_cd64.txt : Result of the runme64 script execution.
help.lpk : Simple help file.
xhelp.lpk : Extended help file.
See Also High-level Directory Structure
Running the Software
To obtain results for the pre-determined sample problem sizes on a given system, type one of the following, as appropriate:
./runme32
./runme64
To run the software for other problem sizes, see the extended help included with the program. Extended help can be viewed by running the program executable with the -e option:
./linpack_cd32.app -e
./linpack_cd64.app -e
The pre-defined data input file lininput is provided merely as an example. Different systems have different amounts of memory and thus require new input files. The extended help can be used for insight into proper ways to change the sample input files. lininput requires at least 2 GB of memory.
If the system has less memory than the above sample data input requires, you may need to edit or create your own data input files, as explained in the extended help.
Each sample script uses the OMP_NUM_THREADS environment variable to set the number of processors it is targeting.
To optimize performance on a different number of physical processors, change that line appropriately. If you run the Intel Optimized LINPACK Benchmark without setting the number of threads, it will default to the number of cores according to the OS. You can find the settings for this environment variable in the runme* sample scripts. If the settings do not yet match the situation for your machine, edit the script. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Known Limitations of the Intel® Optimized LINPACK Benchmark The following limitations are known for the Intel Optimized LINPACK Benchmark for Mac OS* X: • If an incomplete data input file is given, the binaries may either hang or fault. See the sample data input files and/or the extended help for insight into creating a correct data input file. • The binary will hang if it is not given an input file or any other arguments. 9 Intel® Math Kernel Library for Mac OS* X User's Guide 68Intel® Math Kernel Library Language Interfaces Support A Language Interfaces Support, by Function Domain The following table shows language interfaces that Intel® Math Kernel Library (Intel® MKL) provides for each function domain. However, Intel MKL routines can be called from other languages using mixed-language programming. See Mixed-language Programming with Intel® MKL for an example of how to call Fortran routines from C/C++. Function Domain FORTRAN 77 interface Fortran 9 0/95 interface C/C++ interface Basic Linear Algebra Subprograms (BLAS) Yes Yes via CBLAS BLAS-like extension transposition routines Yes Yes Sparse BLAS Level 1 Yes Yes via CBLAS Sparse BLAS Level 2 and 3 Yes Yes Yes LAPACK routines for solving systems of linear equations Yes Yes Yes LAPACK routines for solving least-squares problems, eigenvalue and singular value problems, and Sylvester's equations Yes Yes Yes Auxiliary and utility LAPACK routines Yes Yes DSS/PARDISO* solvers Yes Yes Yes Other Direct and Iterative Sparse Solver routines Yes Yes Yes Vector Mathematical Library (VML) functions Yes Yes Yes Vector Statistical Library (VSL) functions Yes Yes Yes Fourier Transform functions (FFT) Yes Yes Trigonometric Transform routines Yes Yes Fast Poisson, Laplace, and Helmholtz Solver (Poisson Library) routines Yes Yes Optimization (Trust-Region) Solver routines Yes Yes Yes Data Fitting functions Yes Yes Yes GMP* arithmetic functions †† Yes Support functions (including memory allocation) Yes Yes Yes †† GMP Arithmetic Functions are deprecated and will be removed in a future release. 
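As noted before the table, routines from these domains can also be reached across languages. For example, a C or C++ program can call the Fortran-interface BLAS directly by passing every argument by address and storing matrices in column-major order. The sketch below does this for dgemm; it is a minimal illustration, assuming the dgemm prototype declared through mkl.h and an appropriate link line, and the 2 x 2 matrices are arbitrary example data.
// Calling the Fortran-interface BLAS routine dgemm from C++:
// all arguments are passed by address and matrices are column-major.
#include <cstdio>
#include "mkl.h"

int main()
{
    char transa = 'N', transb = 'N';
    MKL_INT m = 2, n = 2, k = 2, lda = 2, ldb = 2, ldc = 2;
    double alpha = 1.0, beta = 0.0;
    // Column-major storage: element (i,j) is at a[i + j*lda].
    double a[4] = { 1.0, 3.0,     // first column
                    2.0, 4.0 };   // second column
    double b[4] = { 1.0, 0.0,
                    0.0, 1.0 };   // identity, so c should equal a
    double c[4];
    dgemm( &transa, &transb, &m, &n, &k, &alpha, a, &lda, b, &ldb, &beta, c, &ldc );
    std::printf( "c = [ %g %g ; %g %g ]\n", c[0], c[2], c[1], c[3] );
    return 0;
}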
69Include Files Function domain Fortran Include Files C/C++ Include Files All function domains mkl.fi mkl.h BLAS Routines blas.f90 mkl_blas.fi mkl_blas.h BLAS-like Extension Transposition Routines mkl_trans.fi mkl_trans.h CBLAS Interface to BLAS mkl_cblas.h Sparse BLAS Routines mkl_spblas.fi mkl_spblas.h LAPACK Routines lapack.f90 mkl_lapack.fi mkl_lapack.h C Interface to LAPACK mkl_lapacke.h All Sparse Solver Routines mkl_solver.f90 mkl_solver.h PARDISO mkl_pardiso.f77 mkl_pardiso.f90 mkl_pardiso.h DSS Interface mkl_dss.f77 mkl_dss.f90 mkl_dss.h RCI Iterative Solvers ILU Factorization mkl_rci.fi mkl_rci.h Optimization Solver Routines mkl_rci.fi mkl_rci.h Vector Mathematical Functions mkl_vml.f77 mkl_vml.90 mkl_vml.h Vector Statistical Functions mkl_vsl.f77 mkl_vsl.f90 mkl_vsl_functions.h Fourier Transform Functions mkl_dfti.f90 mkl_dfti.h Partial Differential Equations Support Routines Trigonometric Transforms mkl_trig_transforms.f90 mkl_trig_transforms.h Poisson Solvers mkl_poisson.f90 mkl_poisson.h Data Fitting functions mkl_df.f77 mkl_df.f90 mkl_df.h GMP interface † mkl_gmp.h Support functions mkl_service.f90 mkl_service.fi mkl_service.h Memory allocation routines i_malloc.h Intel MKL examples interface mkl_example.h † GMP Arithmetic Functions are deprecated and will be removed in a future release. A Intel® Math Kernel Library for Mac OS* X User's Guide 70See Also Language Interfaces Support, by Function Domain Intel® Math Kernel Library Language Interfaces Support A 71A Intel® Math Kernel Library for Mac OS* X User's Guide 72Support for Third-Party Interfaces B GMP* Functions Intel® Math Kernel Library (Intel® MKL) implementation of GMP* arithmetic functions includes arbitrary precision arithmetic operations on integer numbers. The interfaces of such functions fully match the GNU Multiple Precision* (GMP) Arithmetic Library. For specifications of these functions, please see http:// software.intel.com/sites/products/documentation/hpc/mkl/gnump/index.htm. NOTE Intel MKL GMP Arithmetic Functions are deprecated and will be removed in a future release. If you currently use the GMP* library, you need to modify INCLUDE statements in your programs to mkl_gmp.h. FFTW Interface Support Intel® Math Kernel Library (Intel® MKL) offers two collections of wrappers for the FFTW interface (www.fftw.org). The wrappers are the superstructure of FFTW to be used for calling the Intel MKL Fourier transform functions. These collections correspond to the FFTW versions 2.x and 3.x and the Intel MKL versions 7.0 and later. These wrappers enable using Intel MKL Fourier transforms to improve the performance of programs that use FFTW without changing the program source code. See the "FFTW Interface to Intel® Math Kernel Library" appendix in the Intel MKL Reference Manual for details on the use of the wrappers. Important For ease of use, FFTW3 interface is also integrated in Intel MKL. 73B Intel® Math Kernel Library for Mac OS* X User's Guide 74Directory Structure in Detail C Tables in this section show contents of the /lib directory. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. 
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Static Libraries in the lib directory File Contents Interface layer libmkl_intel.a Interface library for the Intel compilers. To be used on Intel® 64 architecture systems to support LP64 interface or on IA-32 architecture systems. libmkl_intel_lp64.a Interface library for the Intel compilers. To be used on Intel® 64 architecture systems to support LP64 interface or on IA-32 architecture systems. libmkl_intel_ilp64.a Interface library for the Intel compilers. To be used on Intel® 64 architecture systems to support ILP64 interface or on IA-32 architecture systems. libmkl_intel_sp2dp.a SP2DP interface library for the Intel compilers. Threading layer libmkl_intel_thread.a Threading library for the Intel compilers libmkl_pgi_thread.a Threading library for the PGI* compiler libmkl_sequential.a Sequential library Computational layer libmkl_core.a Kernel library libmkl_solver_lp64.a Deprecated. Empty library for backward compatibility libmkl_solver_lp64_sequential.a Deprecated. Empty library for backward compatibility libmkl_solver_ilp64.a Deprecated. Empty library for backward compatibility libmkl_solver_ilp64_sequential.a Deprecated. Empty library for backward compatibility 75Dynamic Libraries in the lib directory File Contents libmkl_rt.dylib Single Dynamic Library Interface layer libmkl_intel.dylib Interface library for the Intel compilers. To be used on Intel® 64 architecture systems to support LP64 interface or on IA-32 architecture systems. libmkl_intel_lp64.dylib Interface library for the Intel compilers. To be used on Intel® 64 architecture systems to support LP64 interface or on IA-32 architecture systems. libmkl_intel_ilp64.dylib Interface library for the Intel compilers. To be used on Intel® 64 architecture systems to support ILP64 interface or on IA-32 architecture systems. libmkl_intel_sp2dp.dylib SP2DP interface library for the Intel compilers. Threading layer libmkl_intel_thread.dylib Threading library for the Intel compilers libmkl_sequential.dylib Sequential library Computational layer libmkl_core.dylib Contains the dispatcher for dynamic load of the processor-specific kernel library libmkl_lapack.dylib LAPACK and DSS/PARDISO routines and drivers libmkl_mc.dylib 64-bit kernel for processors based on the Intel® Core™ microarchitecture libmkl_mc3.dylib 64-bit kernel for the Intel® Core™ i7 processors libmkl_p4p.dylib 32-bit kernel for the Intel® Pentium® 4 processor with Streaming SIMD Extensions 3 (SSE3), including Intel® Core™ Duo and Intel® Core™ Solo processors. 
libmkl_p4m.dylib 32-bit kernel for the Intel® Core™ microarchitecture libmkl_p4m3.dylib 32-bit kernel library for the Intel® Core™ i7 processors libmkl_vml_mc.dylib 64-bit VML for processors based on the Intel® Core™ microarchitecture libmkl_vml_mc2.dylib 64-bit VML/VSL for 45nm Hi-k Intel® Core™2 and the Intel Xeon® processor families libmkl_vml_mc3.dylib 64-bit VML/VSL for the Intel® Core™ i7 processors libmkl_vml_p4p.dylib 32-bit VML for the Intel® Pentium® 4 processor with Streaming SIMD Extensions 3 (SSE3) libmkl_vml_p4m.dylib 32-bit VML for processors based on Intel® Core™ microarchitecture libmkl_vml_p4m2.dylib 32-bit VML/VSL for 45nm Hi-k Intel® Core™2 and Intel Xeon® processor families C Intel® Math Kernel Library for Mac OS* X User's Guide 76File Contents libmkl_vml_p4m3.dylib 32-bit VML/VSL for the Intel® Core™ i7 processors libmkl_vml_avx.dylib VML/VSL optimized for the Intel® Advanced Vector Extensions (Intel® AVX) RTL locale/en_US/mkl_msg.cat Catalog of Intel® Math Kernel Library (Intel® MKL) messages in English Directory Structure in Detail C 77C Intel® Math Kernel Library for Mac OS* X User's Guide 78Index A aligning data 63 architecture support 21 B BLAS calling routines from C 54 Fortran 95 interface to 52 threaded routines 39 C C interface to LAPACK, use of 54 C, calling LAPACK, BLAS, CBLAS from 54 C/C++, Intel(R) MKL complex types 55 calling BLAS functions from C 55 CBLAS interface from C 55 complex BLAS Level 1 function from C 55 complex BLAS Level 1 function from C++ 55 Fortran-style routines from C 54 CBLAS interface, use of 54 code examples, use of 19 coding data alignment techniques to improve performance 47 compilation, Intel(R) MKL version-dependent 64 compiler run-time libraries, linking with 34 compiler-dependent function 53 complex types in C and C++, Intel(R) MKL 55 computation results, consistency 63 conditional compilation 64 consistent results 63 conventions, notational 13 custom dynamically linked shared library building 35 composing list of functions 36 specifying function names 36 D denormal number, performance 49 directory structure documentation 23 high-level 21 in-detail documentation directories, contents 23 man pages 24 E Enter index keyword 25 environment variables, setting 17 examples, linking 27 F FFT interface data alignment 47 optimised radices 49 threaded problems 39 FFTW interface support 73 Fortran 95 interface libraries 33 G GNU* Multiple Precision Arithmetic Library 73 H header files, Intel(R) MKL 70 HT technology, configuration tip 48 I ILP64 programming, support for 31 include files, Intel(R) MKL 70 installation, checking 17 Intel(R) Hyper-Threading Technology, configuration tip 48 interface Fortran 95, libraries 33 LP64 and ILP64, use of 31 interface libraries and modules, Intel(R) MKL 51 interface libraries, linking with 31 J Java* examples 58 L language interfaces support 69 language-specific interfaces interface libraries and modules 51 LAPACK C interface to, use of 54 calling routines from C 54 Fortran 95 interface to 52 performance of packed routines 47 threaded routines 39 layers, Intel(R) MKL structure 22 libraries to link with interface 31 run-time 34 system libraries 34 threading 33 link tool, command line 27 link-line syntax 29 linking examples 27 linking with compiler run-time libraries 34 interface libraries 31 system libraries 34 threading libraries 33 linking, quick start 25 linking, Web-based advisor 27 LINPACK benchmark Index 79M man pages, viewing 24 memory functions, redefining 49 memory management 49 memory renaming 
49 mixed-language programming 53 module, Fortran 95 52 N notational conventions 13 number of threads changing at run time 42 changing with OpenMP* environment variable 42 Intel(R) MKL choice, particular cases 45 techniques to set 42 P parallel performance 41 parallelism, of Intel(R) MKL 39 performance with denormals 49 with subnormals 49 S SDL 26, 30 sequential mode of Intel(R) MKL 33 Single Dynamic Library 26, 30 structure high-level 21 in-detail model 22 support, technical 11 supported architectures 21 syntax, link-line 29 system libraries, linking with 34 T technical support 11 thread safety, of Intel(R) MKL 39 threaded functions 39 threaded problems 39 threading control, Intel(R) MKL-specific 44 threading libraries, linking with 33 U uBLAS, matrix-matrix multiplication, substitution with Intel MKL functions 57 unstable output, getting rid of 63 usage information 15 X Xcode*, configuring 65 Intel® Math Kernel Library for Mac OS* X User's Guide 80 Intel ® Math Kernel Library for Windows* OS User's Guide Intel® MKL - Windows* OS Document Number: 315930-018US Legal InformationContents Legal Information................................................................................7 Introducing the Intel® Math Kernel Library...........................................9 Getting Help and Support...................................................................11 Notational Conventions......................................................................13 Chapter 1: Overview Document Overview.................................................................................15 What's New.............................................................................................15 Related Information.................................................................................15 Chapter 2: Getting Started Checking Your Installation.........................................................................17 Setting Environment Variables ..................................................................17 Compiler Support.....................................................................................19 Using Code Examples...............................................................................19 What You Need to Know Before You Begin Using the Intel ® Math Kernel Library...............................................................................................19 Chapter 3: Structure of the Intel® Math Kernel Library Architecture Support................................................................................23 High-level Directory Structure....................................................................23 Layered Model Concept.............................................................................25 Contents of the Documentation Directories..................................................26 Chapter 4: Linking Your Application with the Intel® Math Kernel Library Linking Quick Start...................................................................................27 Using the /Qmkl Compiler Option.......................................................27 Automatically Linking a Project in the Visual Studio* Integrated Development Environment with Intel ® MKL......................................28 Automatically Linking Your Microsoft Visual C/C++* Project with Intel ® MKL..........................................................................28 Automatically Linking Your Intel ® Visual Fortran Project with Intel ® 
MKL..........................................................................28 Using the Single Dynamic Library.......................................................28 Selecting Libraries to Link with..........................................................29 Using the Link-line Advisor................................................................29 Using the Command-line Link Tool.....................................................30 Linking Examples.....................................................................................30 Linking on IA-32 Architecture Systems...............................................30 Linking on Intel(R) 64 Architecture Systems........................................31 Linking in Detail.......................................................................................31 Dynamically Selecting the Interface and Threading Layer......................32 Linking with Interface Libraries..........................................................33 Using the cdecl and stdcall Interfaces.........................................33 Using the ILP64 Interface vs. LP64 Interface...............................34 Linking with Fortran 95 Interface Libraries..................................36 Contents 3Linking with Threading Libraries.........................................................36 Sequential Mode of the Library..................................................36 Selecting the Threading Layer...................................................36 Linking with Computational Libraries..................................................37 Linking with Compiler Run-time Libraries............................................38 Linking with System Libraries............................................................38 Building Custom Dynamic-link Libraries.......................................................39 Using the Custom Dynamic-link Library Builder in the Command-line Mode.........................................................................................39 Composing a List of Functions ..........................................................40 Specifying Function Names...............................................................41 Building a Custom Dynamic-link Library in the Visual Studio* Development System...................................................................41 Distributing Your Custom Dynamic-link Library....................................42 Chapter 5: Managing Performance and Memory Using Parallelism of the Intel ® Math Kernel Library........................................43 Threaded Functions and Problems......................................................43 Avoiding Conflicts in the Execution Environment..................................45 Techniques to Set the Number of Threads...........................................46 Setting the Number of Threads Using an OpenMP* Environment Variable......................................................................................46 Changing the Number of Threads at Run Time.....................................46 Using Additional Threading Control.....................................................48 Intel MKL-specific Environment Variables for Threading Control. . . . 
.48 MKL_DYNAMIC........................................................................49 MKL_DOMAIN_NUM_THREADS..................................................50 Setting the Environment Variables for Threading Control..............51 Tips and Techniques to Improve Performance..............................................52 Coding Techniques...........................................................................52 Hardware Configuration Tips.............................................................53 Managing Multi-core Performance......................................................53 Operating on Denormals...................................................................54 FFT Optimized Radices.....................................................................54 Using Memory Management ......................................................................54 Intel MKL Memory Management Software............................................54 Redefining Memory Functions............................................................55 Chapter 6: Language-specific Usage Options Using Language-Specific Interfaces with Intel ® Math Kernel Library.................57 Interface Libraries and Modules.........................................................57 Fortran 95 Interfaces to LAPACK and BLAS..........................................59 Compiler-dependent Functions and Fortran 90 Modules.........................59 Using the stdcall Calling Convention in C/C++.....................................60 Compiling an Application that Calls the Intel ® Math Kernel Library and Uses the CVF Calling Conventions..................................................60 Mixed-language Programming with the Intel Math Kernel Library....................61 Calling LAPACK, BLAS, and CBLAS Routines from C/C++ Language Environments..............................................................................61 Using Complex Types in C/C++.........................................................62 Intel® Math Kernel Library for Windows* OS User's Guide 4Calling BLAS Functions that Return the Complex Values in C/C++ Code..........................................................................................63 Support for Boost uBLAS Matrix-matrix Multiplication...........................64 Invoking Intel MKL Functions from Java* Applications...........................65 Intel MKL Java* Examples........................................................66 Running the Java* Examples.....................................................67 Known Limitations of the Java* Examples...................................68 Chapter 7: Coding Tips Aligning Data for Consistent Results...........................................................69 Using Predefined Preprocessor Symbols for Intel ® MKL Version-Dependent Compilation.........................................................................................70 Chapter 8: Working with the Intel® Math Kernel Library Cluster Software MPI Support............................................................................................71 Linking with ScaLAPACK and Cluster FFTs....................................................71 Determining the Number of Threads...........................................................73 Using DLLs..............................................................................................73 Setting Environment Variables on a Cluster.................................................74 Building ScaLAPACK 
Tests.........................................................................74 Examples for Linking with ScaLAPACK and Cluster FFT..................................74 Examples for Linking a C Application..................................................75 Examples for Linking a Fortran Application..........................................75 Chapter 9: Programming with Intel® Math Kernel Library in Integrated Development Environments (IDE) Configuring Your Integrated Development Environment to Link with Intel Math Kernel Library .............................................................................77 Configuring the Microsoft Visual C/C++* Development System to Link with Intel ® MKL............................................................................77 Configuring Intel ® Visual Fortran to Link with Intel MKL.........................77 Running an Intel MKL Example in the Visual Studio* 2008 IDE...............78 Creating, Configuring, and Running the Intel ® C/C++ and/or Visual C++* 2008 Project.....................................................78 Creating, Configuring, and Running the Intel Visual Fortran Project...............................................................................80 Support Files for Intel ® Math Kernel Library Examples...................81 Known Limitations of the Project Creation Procedure....................82 Getting Assistance for Programming in the Microsoft Visual Studio* IDE .........82 Viewing Intel MKL Documentation in Visual Studio* IDE........................82 Using Context-Sensitive Help............................................................83 Using the IntelliSense* Capability......................................................84 Chapter 10: LINPACK and MP LINPACK Benchmarks Intel ® Optimized LINPACK Benchmark for Windows* OS................................87 Contents of the Intel ® Optimized LINPACK Benchmark..........................87 Running the Software.......................................................................88 Known Limitations of the Intel ® Optimized LINPACK Benchmark.............89 Intel ® Optimized MP LINPACK Benchmark for Clusters...................................89 Overview of the Intel ® Optimized MP LINPACK Benchmark for Clusters....89 Contents 5Contents of the Intel ® Optimized MP LINPACK Benchmark for Clusters. . . 
.90 Building the MP LINPACK..................................................................91 New Features of Intel ® Optimized MP LINPACK Benchmark....................91 Benchmarking a Cluster....................................................................92 Options to Reduce Search Time.........................................................92 Appendix A: Intel® Math Kernel Library Language Interfaces Support Language Interfaces Support, by Function Domain.......................................95 Include Files............................................................................................96 Appendix B: Support for Third-Party Interfaces GMP* Functions.......................................................................................99 FFTW Interface Support............................................................................99 Appendix C: Directory Structure in Detail Detailed Structure of the IA-32 Architecture Directories...............................101 Static Libraries in the lib\ia32 Directory............................................101 Dynamic Libraries in the lib\ia32 Directory........................................102 Contents of the redist\ia32\mkl Directory..........................................102 Detailed Structure of the Intel ® 64 Architecture Directories..........................103 Static Libraries in the lib\intel64 Directory.........................................104 Dynamic Libraries in the lib\intel64 Directory.....................................105 Contents of the redist\intel64\mkl Directory......................................105 Intel® Math Kernel Library for Windows* OS User's Guide 6Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. 
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http:// www.intel.com/design/literature.htm Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/ processor_number/ Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel vPro, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, Puma, skoool, the skoool logo, SMARTi, Sound Mark, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Microsoft, Windows, Visual Studio, Visual C++, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries. Java is a registered trademark of Oracle and/or its affiliates. Copyright © 2007 - 2011, Intel Corporation. All rights reserved. Microsoft product screen shot(s) reprinted with permission from Microsoft Corporation. 7Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Intel® Math Kernel Library for Windows* OS User's Guide 8Introducing the Intel® Math Kernel Library The Intel ® Math Kernel Library (Intel ® MKL) improves performance of scientific, engineering, and financial software that solves large computational problems. 
Among other functionality, Intel MKL provides linear algebra routines, fast Fourier transforms, as well as vectorized math and random number generation functions, all optimized for the latest Intel processors, including processors with multiple cores (see the Intel ® MKL Release Notes for the full list of supported processors). Intel MKL also performs well on non-Intel processors. Intel MKL is thread-safe and extensively threaded using the OpenMP* technology. Intel MKL provides the following major functionality: • Linear algebra, implemented in LAPACK (solvers and eigensolvers) plus level 1, 2, and 3 BLAS, offering the vector, vector-matrix, and matrix-matrix operations needed for complex mathematical software. If you prefer the FORTRAN 90/95 programming language, you can call LAPACK driver and computational subroutines through specially designed interfaces with reduced numbers of arguments. A C interface to LAPACK is also available. • ScaLAPACK (SCAlable LAPACK) with its support functionality including the Basic Linear Algebra Communications Subprograms (BLACS) and the Parallel Basic Linear Algebra Subprograms (PBLAS). ScaLAPACK is available for Intel MKL for Linux* and Windows* operating systems. • Direct sparse solver, an iterative sparse solver, and a supporting set of sparse BLAS (level 1, 2, and 3) for solving sparse systems of equations. • Multidimensional discrete Fourier transforms (1D, 2D, 3D) with a mixed radix support (for sizes not limited to powers of 2). Distributed versions of these functions are provided for use on clusters on the Linux* and Windows* operating systems. • A set of vectorized transcendental functions called the Vector Math Library (VML). For most of the supported processors, the Intel MKL VML functions offer greater performance than the libm (scalar) functions, while keeping the same high accuracy. • The Vector Statistical Library (VSL), which offers high performance vectorized random number generators for several probability distributions, convolution and correlation routines, and summary statistics functions. • Data Fitting Library, which provides capabilities for spline-based approximation of functions, derivatives and integrals of functions, and search. For details see the Intel® MKL Reference Manual. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 9 Intel® Math Kernel Library for Windows* OS User's Guide 10Getting Help and Support Intel provides a support web site that contains a rich repository of self help information, including getting started tips, known product issues, product errata, license information, user forums, and more. Visit the Intel MKL support website at http://www.intel.com/software/products/support/. 
The Intel MKL documentation integrates into the Microsoft Visual Studio* integrated development environment (IDE). See Getting Assistance for Programming in the Microsoft Visual Studio* IDE. 11 Intel® Math Kernel Library for Windows* OS User's Guide 12Notational Conventions The following term is used in reference to the operating system. Windows* OS This term refers to information that is valid on all supported Windows* operating systems. The following notations are used to refer to Intel MKL directories. The installation directory for the Intel® C++ Composer XE or Intel® Visual Fortran Composer XE . The main directory where Intel MKL is installed: =\mkl. Replace this placeholder with the specific pathname in the configuring, linking, and building instructions. The following font conventions are used in this document. Italic Italic is used for emphasis and also indicates document names in body text, for example: see Intel MKL Reference Manual. Monospace lowercase mixed with uppercase Indicates: • Commands and command-line options, for example, ifort myprog.f mkl_blas95.lib mkl_c.lib libiomp5md.lib • Filenames, directory names, and pathnames, for example, C:\Program Files\Java\jdk1.5.0_09 • C/C++ code fragments, for example, a = new double [SIZE*SIZE]; UPPERCASE MONOSPACE Indicates system variables, for example, $MKLPATH. Monospace italic Indicates a parameter in discussions, for example, lda. When enclosed in angle brackets, indicates a placeholder for an identifier, an expression, a string, a symbol, or a value, for example, . Substitute one of these items for the placeholder. [ items ] Square brackets indicate that the items enclosed in brackets are optional. { item | item } Braces indicate that only one of the items listed between braces should be selected. A vertical bar ( | ) separates the items. 13 Intel® Math Kernel Library for Windows* OS User's Guide 14Overview 1 Document Overview The Intel® Math Kernel Library (Intel® MKL) User's Guide provides usage information for the library. The usage information covers the organization, configuration, performance, and accuracy of Intel MKL, specifics of routine calls in mixed-language programming, linking, and more. This guide describes OS-specific usage of Intel MKL, along with OS-independent features. The document contains usage information for all Intel MKL function domains. This User's Guide provides the following information: • Describes post-installation steps to help you start using the library • Shows you how to configure the library with your development environment • Acquaints you with the library structure • Explains how to link your application with the library and provides simple usage scenarios • Describes how to code, compile, and run your application with Intel MKL This guide is intended for Windows OS programmers with beginner to advanced experience in software development. See Also Language Interfaces Support, by Function Domain What's New This User's Guide documents the Intel® Math Kernel Library (Intel® MKL) 10.3 Update 8. The document was updated to reflect addition of Data Fitting Functions to the product and to describe how to build a custom dynamic-link library in the Visual Studio* Development System (see Building a Custom Dynamic-link Library in the Visual Studio* Development System). 
Related Information To reference how to use the library in your application, use this guide in conjunction with the following documents: • The Intel® Math Kernel Library Reference Manual, which provides reference information on routine functionalities, parameter descriptions, interfaces, calling syntaxes, and return values. • The Intel® Math Kernel Library for Windows* OS Release Notes. 151 Intel® Math Kernel Library for Windows* OS User's Guide 16Getting Started 2 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Checking Your Installation After installing the Intel® Math Kernel Library (Intel® MKL), verify that the library is properly installed and configured: 1. Intel MKL installs in . Check that the subdirectory of referred to as was created. Check that subdirectories for Intel MKL redistributable DLLs redist\ia32\mkl and redist \intel64\mkl were created in the directory (See redist.txt in the Intel MKL documentation directory for a list of files that can be redistributed.) 2. If you want to keep multiple versions of Intel MKL installed on your system, update your build scripts to point to the correct Intel MKL version. 3. Check that the following files appear in the \bin directory and its subdirectories: mklvars.bat ia32\mklvars_ia32.bat intel64\mklvars_intel64.bat Use these files to assign Intel MKL-specific values to several environment variables, as explained in Setting Environment Variables 4. To understand how the Intel MKL directories are structured, see Intel® Math Kernel Library Structure. 5. To make sure that Intel MKL runs on your system, do one of the following: • Launch an Intel MKL example, as explained in Using Code Examples • In the Visual Studio* IDE, create and run a simple project that uses Intel MKL, as explained in Running an Intel MKL Example in the Visual Studio IDE See Also Notational Conventions Setting Environment Variables When the installation of Intel MKL for Windows* OS is complete, set the PATH, LIB, and INCLUDE environment variables in the command shell using one of the script files in the bin subdirectory of the Intel MKL installation directory: ia32\mklvars_ia32.bat for the IA-32 architecture, 17intel64\mklvars_intel64.bat for the Intel® 64 architecture, mklvars.bat for the IA-32 and Intel® 64 architectures. Running the Scripts The scripts accept parameters to specify the following: • Architecture. • Addition of a path to Fortran 95 modules precompiled with the Intel ® Fortran compiler to the INCLUDE environment variable. Supply this parameter only if you are using the Intel ® Fortran compiler. • Interface of the Fortran 95 modules. This parameter is needed only if you requested addition of a path to the modules. Usage and values of these parameters depend on the script. 
The following table lists values of the script parameters.

• mklvars_ia32: Architecture: not applicable. Addition of a path to Fortran 95 modules (optional): mod. Interface: not applicable.
• mklvars_intel64: Architecture: not applicable. Addition of a path to Fortran 95 modules (optional): mod. Interface (optional): lp64 (default) or ilp64.
• mklvars: Architecture (required): ia32 or intel64. Addition of a path to Fortran 95 modules (optional): mod. Interface (optional): lp64 (default) or ilp64.

For example:
• The command mklvars_ia32 sets environment variables for the IA-32 architecture and adds no path to the Fortran 95 modules.
• The command mklvars_intel64 mod ilp64 sets environment variables for the Intel® 64 architecture and adds the path to the Fortran 95 modules for the ILP64 interface to the INCLUDE environment variable.
• The command mklvars intel64 mod sets environment variables for the Intel® 64 architecture and adds the path to the Fortran 95 modules for the LP64 interface to the INCLUDE environment variable.

NOTE Supply the parameter specifying the architecture first, if it is needed. Values of the other two parameters can be listed in any order.

See Also
High-level Directory Structure
Interface Libraries and Modules
Fortran 95 Interfaces to LAPACK and BLAS
Setting the Number of Threads Using an OpenMP* Environment Variable

Compiler Support

Intel MKL supports compilers identified in the Release Notes. However, the library has been successfully used with other compilers as well.

Although Compaq no longer supports the Compaq Visual Fortran* (CVF) compiler, Intel MKL still preserves the CVF interface in the IA-32 architecture implementation. You can use this interface with the Intel® Fortran Compiler. Intel MKL provides both stdcall (default CVF interface) and cdecl (default interface of the Microsoft Visual C* application) interfaces for the IA-32 architecture.

Intel MKL provides a set of include files to simplify program development by specifying enumerated values and prototypes for the respective functions. Calling Intel MKL functions from your application without an appropriate include file may lead to incorrect behavior of the functions.

See Also
Compiling an Application that Calls the Intel® Math Kernel Library and Uses the CVF Calling Conventions
Using the cdecl and stdcall Interfaces
Include Files

Using Code Examples

The Intel MKL package includes code examples, located in the examples subdirectory of the installation directory. Use the examples to determine:
• Whether Intel MKL is working on your system
• How you should call the library
• How to link the library

The examples are grouped in subdirectories mainly by Intel MKL function domains and programming languages. For example, the examples\spblas subdirectory contains a makefile to build the Sparse BLAS examples and the examples\vmlc subdirectory contains the makefile to build the C VML examples. Source code for the examples is in the next-level sources subdirectory.

See Also
High-level Directory Structure
Running an Intel MKL Example in the Visual Studio* 2008 IDE

What You Need to Know Before You Begin Using the Intel® Math Kernel Library

Target platform   Identify the architecture of your target machine:
• IA-32 or compatible
• Intel® 64 or compatible
Reason: Because Intel MKL libraries are located in directories corresponding to your particular architecture (see Architecture Support), you should provide proper paths on your link lines (see Linking Examples).
To configure your development environment for the use with Intel MKL, set your environment variables using the script corresponding to your architecture (see Setting Environment Variables for details). Mathematical problem Identify all Intel MKL function domains that you require: • BLAS • Sparse BLAS Getting Started 2 19• LAPACK • PBLAS • ScaLAPACK • Sparse Solver routines • Vector Mathematical Library functions (VML) • Vector Statistical Library functions • Fourier Transform functions (FFT) • Cluster FFT • Trigonometric Transform routines • Poisson, Laplace, and Helmholtz Solver routines • Optimization (Trust-Region) Solver routines • Data Fitting Functions • GMP* arithmetic functions. Deprecated and will be removed in a future release Reason: The function domain you intend to use narrows the search in the Reference Manual for specific routines you need. Additionally, if you are using the Intel MKL cluster software, your link line is function-domain specific (see Working with the Cluster Software). Coding tips may also depend on the function domain (see Tips and Techniques to Improve Performance). Programming language Intel MKL provides support for both Fortran and C/C++ programming. Identify the language interfaces that your function domains support (see Intel® Math Kernel Library Language Interfaces Support). Reason: Intel MKL provides language-specific include files for each function domain to simplify program development (see Language Interfaces Support, by Function Domain). For a list of language-specific interface libraries and modules and an example how to generate them, see also Using Language-Specific Interfaces with Intel® Math Kernel Library. Range of integer data If your system is based on the Intel 64 architecture, identify whether your application performs calculations with large data arrays (of more than 2 31 -1 elements). Reason: To operate on large data arrays, you need to select the ILP64 interface, where integers are 64-bit; otherwise, use the default, LP64, interface, where integers are 32-bit (see Using the ILP64 Interface vs. LP64 Interface). Threading model Identify whether and how your application is threaded: • Threaded with the Intel compiler • Threaded with a third-party compiler • Not threaded Reason: The compiler you use to thread your application determines which threading library you should link with your application. For applications threaded with a third-party compiler you may need to use Intel MKL in the sequential mode (for more information, see Sequential Mode of the Library and Linking with Threading Libraries). Number of threads Determine the number of threads you want Intel MKL to use. Reason: Intel MKL is based on the OpenMP* threading. By default, the OpenMP* software sets the number of threads that Intel MKL uses. If you need a different number, you have to set it yourself using one of the available mechanisms. For more information, see Using Parallelism of the Intel® Math Kernel Library. Linking model Decide which linking model is appropriate for linking your application with Intel MKL libraries: • Static 2 Intel® Math Kernel Library for Windows* OS User's Guide 20• Dynamic Reason: The link libraries for static and dynamic linking are different. For the list of link libraries for static and dynamic models, linking examples, and other relevant topics, like how to save disk space by creating a custom dynamic library, see Linking Your Application with the Intel® Math Kernel Library. 
MPI used Decide what MPI you will use with the Intel MKL cluster software. You are strongly encouraged to use Intel® MPI 3.2 or later. MPI used Reason: To link your application with ScaLAPACK and/or Cluster FFT, the libraries corresponding to your particular MPI should be listed on the link line (see Working with the Cluster Software). Getting Started 2 212 Intel® Math Kernel Library for Windows* OS User's Guide 22Structure of the Intel® Math Kernel Library 3 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Architecture Support Intel® Math Kernel Library (Intel® MKL) for Windows* OS provides two architecture-specific implementations. The following table lists the supported architectures and directories where each architecture-specific implementation is located. Architecture Location IA-32 or compatible \lib\ia32 \redist\ia32\mkl (DLLs) Intel® 64 or compatible \lib\intel64 \redist \intel64\mkl (DLLs) See Also High-level Directory Structure Detailed Structure of the IA-32 Architecture Directories Detailed Structure of the Intel® 64 Architecture Directories High-level Directory Structure Directory Contents Installation directory of the Intel® Math Kernel Library (Intel® MKL) Subdirectories of bin Batch files to set environmental variables in the user shell bin\ia32 Batch files for the IA-32 architecture bin\intel64 Batch files for the Intel® 64 architecture benchmarks\linpack Shared-Memory (SMP) version of the LINPACK benchmark benchmarks\mp_linpack Message-passing interface (MPI) version of the LINPACK benchmark 23Directory Contents lib\ia32 Static libraries and static interfaces to DLLs for the IA-32 architecture lib\intel64 Static libraries and static interfaces to DLLs for the Intel® 64 architecture examples Examples directory. 
Each subdirectory has source and data files include INCLUDE files for the library routines, as well as for tests and examples include\ia32 Fortran 95 .mod files for the IA-32 architecture and Intel Fortran compiler include\intel64\lp64 Fortran 95 .mod files for the Intel® 64 architecture, Intel® Fortran compiler, and LP64 interface include\intel64\ilp64 Fortran 95 .mod files for the Intel® 64 architecture, Intel Fortran compiler, and ILP64 interface include\fftw Header files for the FFTW2 and FFTW3 interfaces interfaces\blas95 Fortran 95 interfaces to BLAS and a makefile to build the library interfaces\fftw2x_cdft MPI FFTW 2.x interfaces to Intel MKL Cluster FFTs interfaces\fftw3x_cdft MPI FFTW 3.x interfaces to Intel MKL Cluster FFTs interfaces\fftw2xc FFTW 2.x interfaces to the Intel MKL FFTs (C interface) interfaces\fftw2xf FFTW 2.x interfaces to the Intel MKL FFTs (Fortran interface) interfaces\fftw3xc FFTW 3.x interfaces to the Intel MKL FFTs (C interface) interfaces\fftw3xf FFTW 3.x interfaces to the Intel MKL FFTs (Fortran interface) interfaces\lapack95 Fortran 95 interfaces to LAPACK and a makefile to build the library tests Source and data files for tests tools Commad-line link tool and tools for creating custom dynamically linkable libraries tools\builder Tools for creating custom dynamically linkable libraries Subdirectories of redist\ia32\mkl DLLs for applications running on processors with the IA-32 architecture redist\intel64\mkl DLLs for applications running on processors with Intel® 64 architecture Documentation\en_US\MKL Intel MKL documentation Documentation\vshelp \1033\ intel.mkldocs Help2-format files for integration of the Intel MKL documentation with the Microsoft Visual Studio* 2005/2008 IDE Documentation\msvhelp \1033\mkl Microsoft Help Viewer*-format files for integration of the Intel MKL documentation with the Microsoft Visual Studio* 2010 IDE See Also Notational Conventions 3 Intel® Math Kernel Library for Windows* OS User's Guide 24Layered Model Concept Intel MKL is structured to support multiple compilers and interfaces, different OpenMP* implementations, both serial and multiple threads, and a wide range of processors. Conceptually Intel MKL can be divided into distinct parts to support different interfaces, threading models, and core computations: 1. Interface Layer 2. Threading Layer 3. Computational Layer You can combine Intel MKL libraries to meet your needs by linking with one library in each part layer-bylayer. Once the interface library is selected, the threading library you select picks up the chosen interface, and the computational library uses interfaces and OpenMP implementation (or non-threaded mode) chosen in the first two layers. To support threading with different compilers, one more layer is needed, which contains libraries not included in Intel MKL: • Compiler run-time libraries (RTL). The following table provides more details of each layer. Layer Description Interface Layer This layer matches compiled code of your application with the threading and/or computational parts of the library. This layer provides: • cdecl and CVF default interfaces. • LP64 and ILP64 interfaces. • Compatibility with compilers that return function values differently. • A mapping between single-precision names and double-precision names for applications using Cray*-style naming (SP2DP interface). SP2DP interface supports Cray-style naming in applications targeted for the Intel 64 architecture and using the ILP64 interface. 
SP2DP interface provides a mapping between single-precision names (for both real and complex types) in the application and double-precision names in Intel MKL BLAS and LAPACK. Function names are mapped as shown in the following example for BLAS functions ?GEMM: SGEMM -> DGEMM DGEMM -> DGEMM CGEMM -> ZGEMM ZGEMM -> ZGEMM Mind that no changes are made to double-precision names. Threading Layer This layer: • Provides a way to link threaded Intel MKL with different threading compilers. • Enables you to link with a threaded or sequential mode of the library. This layer is compiled for different environments (threaded or sequential) and compilers (from Intel, Microsoft, and so on). Computational Layer This layer is the heart of Intel MKL. It has only one library for each combination of architecture and supported OS. The Computational layer accommodates multiple architectures through identification of architecture features and chooses the appropriate binary code at run time. Compiler Run-time Libraries (RTL) To support threading with Intel compilers, Intel MKL uses RTLs of the Intel® C++ Composer XE or Intel® Visual Fortran Composer XE. To thread using third-party threading compilers, use libraries in the Threading layer or an appropriate compatibility library. See Also Using the ILP64 Interface vs. LP64 Interface Structure of the Intel® Math Kernel Library 3 25Linking Your Application with the Intel® Math Kernel Library Linking with Threading Libraries Contents of the Documentation Directories Most of Intel MKL documentation is installed at \Documentation\ \mkl. For example, the documentation in English is installed at \Documentation\en_US\mkl. However, some Intel MKL-related documents are installed one or two levels up. The following table lists MKL-related documentation. File name Comment Files in \Documentation \clicense.rtf or \flicense.rtf Common end user license for the Intel® C++ Composer XE 2011 or Intel® Visual Fortran Composer XE 2011, respectively mklsupport.txt Information on package number for customer support reference Contents of \Documentation\\mkl redist.txt List of redistributable files mkl_documentation.htm Overview and links for the Intel MKL documentation mkl_manual\index.htm Intel MKL Reference Manual in an uncompressed HTML format Release_Notes.htm Intel MKL Release Notes mkl_userguide\index.htm Intel MKL User's Guide in an uncompressed HTML format, this document mkl_link_line_advisor.htm Intel MKL Link-line Advisor 3 Intel® Math Kernel Library for Windows* OS User's Guide 26Linking Your Application with the Intel® Math Kernel Library 4 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Linking Quick Start Intel® Math Kernel Library (Intel® MKL) provides several options for quick linking of your application. 
The simplest options depend on your development environment: Intel® Composer XE compiler see Using the /Qmkl Compiler Option. Microsoft Visual Studio* Integrated Development Environment (IDE) see Automatically Linking a Project in the Visual Studio* IDE with Intel MKL. Other options are independent of your development environment, but depend on the way you link: Explicit dynamic linking see Using the Single Dynamic Library for how to simplify your link line. Explicitly listing libraries on your link line see Selecting Libraries to Link with for a summary of the libraries. Using an interactive interface see Using the Link-line Advisor to determine libraries and options to specify on your link or compilation line. Using an internally provided tool see Using the Command-line Link Tool to determine libraries, options, and environment variables or even compile and build your application. Using the /Qmkl Compiler Option The Intel® Composer XE compiler supports the following variants of the /Qmkl compiler option: /Qmkl or /Qmkl:parallel to link with standard threaded Intel MKL. /Qmkl:sequential to link with sequential version of Intel MKL. /Qmkl:cluster to link with Intel MKL cluster components (sequential) that use Intel MPI. For more information on the /Qmkl compiler option, see the Intel Compiler User and Reference Guides. For each variant of the /Qmkl option, the compiler links your application using the following conventions: • cdecl for the IA-32 architecture • LP64 for the Intel® 64 architecture If you specify any variant of the /Qmkl compiler option, the compiler automatically includes the Intel MKL libraries. In cases not covered by the option, use the Link-line Advisor or see Linking in Detail. 27See Also Using the ILP64 Interface vs. LP64 Interface Using the Link-line Advisor Intel® Software Documentation Library Automatically Linking a Project in the Visual Studio* Integrated Development Environment with Intel® MKL After a default installation of the Intel® Math Kernel Library (Intel® MKL), Intel® C++ Composer XE, or Intel® Visual Fortran Composer XE, you can easily configure your project to automatically link with Intel MKL. Automatically Linking Your Microsoft Visual C/C++* Project with Intel® MKL Configure your Microsoft Visual C/C++* project for automatic linking with Intel MKL as follows: • For the Visual Studio* 2010 development system: 1. Go to Project>Properties>Configuration Properties>Intel Performance Libraries. 2. Change the Use MKL property setting by selecting Parallel, Sequential, or Cluster as appropriate. • For the Visual Studio 2005/2008 development system: 1. Go to Project>Intel C++ Composer XE 2011>Select Build Components. 2. From the Use MKL drop-down menu, select Parallel, Sequential, or Cluster as appropriate. Specific Intel MKL libraries that link with your application may depend on more project settings. For details, see the Intel® Composer XE documentation. See Also Intel® Software Documentation Library Automatically Linking Your Intel® Visual Fortran Project with Intel® MKL Configure your Intel® Visual Fortran project for automatic linking with Intel MKL as follows: Go to Project > Properties > Libraries > Use Intel Math Kernel Library and select Parallel, Sequential, or Cluster as appropriate. Specific Intel MKL libraries that link with your application may depend on more project settings. For details see the Intel® Visual Fortran Compiler XE User and Reference Guides. 
See Also
Intel® Software Documentation Library

Using the Single Dynamic Library

You can simplify your link line through the use of the Intel MKL Single Dynamic Library (SDL). To use SDL, place mkl_rt.lib on your link line. For example:

icl.exe application.c mkl_rt.lib

mkl_rt.lib is the import library for mkl_rt.dll.

SDL enables you to select the interface and threading library for Intel MKL at run time. By default, linking with SDL provides:
• LP64 interface on systems based on the Intel® 64 architecture
• Intel threading

To use other interfaces or change threading preferences, including use of the sequential version of Intel MKL, you need to specify your choices using functions or environment variables as explained in section Dynamically Selecting the Interface and Threading Layer.

Selecting Libraries to Link with

To link with Intel MKL:
• Choose one library from the Interface layer and one library from the Threading layer
• Add the only library from the Computational layer and run-time libraries (RTL)

The following table lists Intel MKL libraries to link with your application.

• IA-32 architecture, static linking: mkl_intel_c.lib (Interface layer), mkl_intel_thread.lib (Threading layer), mkl_core.lib (Computational layer), libiomp5md.lib (RTL)
• IA-32 architecture, dynamic linking: mkl_intel_c_dll.lib, mkl_intel_thread_dll.lib, mkl_core_dll.lib, libiomp5md.lib
• Intel® 64 architecture, static linking: mkl_intel_lp64.lib, mkl_intel_thread.lib, mkl_core.lib, libiomp5md.lib
• Intel® 64 architecture, dynamic linking: mkl_intel_lp64_dll.lib, mkl_intel_thread_dll.lib, mkl_core_dll.lib, libiomp5md.lib

The Single Dynamic Library (SDL) automatically links interface, threading, and computational libraries and thus simplifies linking. The following table lists Intel MKL libraries for dynamic linking using SDL. See Dynamically Selecting the Interface and Threading Layer for how to set the interface and threading layers at run time through function calls or environment settings.

• IA-32 and Intel® 64 architectures: mkl_rt.lib (SDL), libiomp5md.lib (RTL) †
† Linking with libiomp5md.lib is not required.

For exceptions and alternatives to the libraries listed above, see Linking in Detail.

See Also
Layered Model Concept
Using the Link-line Advisor
Using the /Qmkl Compiler Option
Working with the Intel® Math Kernel Library Cluster Software

Using the Link-line Advisor

Use the Intel MKL Link-line Advisor to determine the libraries and options to specify on your link or compilation line. The latest version of the tool is available at http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor. The tool is also available in the product.

The Advisor requests information about your system and how you intend to use Intel MKL (link dynamically or statically, use threaded or sequential mode, etc.). The tool automatically generates the appropriate link line for your application.

See Also
Contents of the Documentation Directories

Using the Command-line Link Tool

Use the command-line link tool provided by Intel MKL to simplify building your application with Intel MKL. The tool not only provides the options, libraries, and environment variables to use, but also performs compilation and building of your application. The tool mkl_link_tool.exe is installed in the <mkl directory>\tools directory. See the knowledge base article at http://software.intel.com/en-us/articles/mkl-command-line-link-tool for more information.
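Whichever of the quick-linking options above you choose, a small C program can serve as an end-to-end check of the tool chain before you link a real application. The following is only an illustrative sketch, not one of the examples shipped with the product; it calls a single BLAS routine through the C interface declared in mkl.h.

/* verify_mkl.c: minimal smoke test (illustrative sketch, not shipped with Intel MKL) */
#include <stdio.h>
#include <mkl.h>                     /* declares cblas_ddot and MKL_INT */

int main(void)
{
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};
    /* dot product: 1*4 + 2*5 + 3*6 = 32 */
    double d = cblas_ddot(3, x, 1, y, 1);
    printf("cblas_ddot = %.1f (expected 32.0)\n", d);
    return 0;
}

Compile and link it, for example with the SDL as icl verify_mkl.c mkl_rt.lib, and run the resulting executable. Getting the expected value indicates that the compiler, the Intel MKL include files, and the run-time libraries are all found correctly.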
Linking Examples See Also Using the Link-line Advisor Examples for Linking with ScaLAPACK and Cluster FFT Linking on IA-32 Architecture Systems The following examples illustrate linking that uses Intel(R) compilers. The examples use the .f Fortran source file. C/C++ users should instead specify a .cpp (C++) or .c (C) file and replace ifort with icc: • Static linking of myprog.f and parallel Intel MKL supporting the cdecl interface: ifort myprog.f mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib • Dynamic linking of myprog.f and parallel Intel MKL supporting the cdecl interface: ifort myprog.f mkl_intel_c_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib • Static linking of myprog.f and sequential version of Intel MKL supporting the cdecl interface: ifort myprog.f mkl_intel_c.lib mkl_sequential.lib mkl_core.lib • Dynamic linking of myprog.f and sequential version of Intel MKL supporting the cdecl interface: ifort myprog.f mkl_intel_c_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib • Static linking of user code myprog.f and parallel Intel MKL supporting the stdcall interface: ifort myprog.f mkl_intel_s.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib • Dynamic linking of user code myprog.f and parallel Intel MKL supporting the stdcall interface: ifort myprog.f mkl_intel_s_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib • Dynamic linking of user code myprog.f and parallel or sequential Intel MKL supporting the cdecl or stdcall interface (Call the mkl_set_threading_layer function or set value of the MKL_THREADING_LAYER environment variable to choose threaded or sequential mode): ifort myprog.f mkl_rt.lib • Static linking of myprog.f, Fortran 95 LAPACK interface, and parallel Intel MKL supporting the cdecl interface: ifort myprog.f mkl_lapack95.lib mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib • Static linking of myprog.f, Fortran 95 BLAS interface, and parallel Intel MKL supporting the cdecl interface: ifort myprog.f mkl_blas95.lib mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib 4 Intel® Math Kernel Library for Windows* OS User's Guide 30See Also Fortran 95 Interfaces to LAPACK and BLAS Examples for Linking a C Application Examples for Linking a Fortran Application Using the Single Dynamic Library Linking on Intel(R) 64 Architecture Systems The following examples illustrate linking that uses Intel(R) compilers. The examples use the .f Fortran source file. 
C/C++ users should instead specify a .cpp (C++) or .c (C) file and replace ifort with icc: • Static linking of myprog.f and parallel Intel MKL supporting the LP64 interface: ifort myprog.f mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib • Dynamic linking of myprog.f and parallel Intel MKL supporting the LP64 interface: ifort myprog.f mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib • Static linking of myprog.f and sequential version of Intel MKL supporting the LP64 interface: ifort myprog.f mkl_intel_lp64.lib mkl_sequential.lib mkl_core.lib • Dynamic linking of myprog.f and sequential version of Intel MKL supporting the LP64 interface: ifort myprog.f mkl_intel_lp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib • Static linking of myprog.f and parallel Intel MKL supporting the ILP64 interface: ifort myprog.f mkl_intel_ilp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib • Dynamic linking of myprog.f and parallel Intel MKL supporting the ILP64 interface: ifort myprog.f mkl_intel_ilp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib • Dynamic linking of user code myprog.f and parallel or sequential Intel MKL supporting the LP64 or ILP64 interface (Call appropriate functions or set environment variables to choose threaded or sequential mode and to set the interface): ifort myprog.f mkl_rt.lib • Static linking of myprog.f, Fortran 95 LAPACK interface, and parallel Intel MKL supporting the LP64 interface: ifort myprog.f mkl_lapack95_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib • Static linking of myprog.f, Fortran 95 BLAS interface, and parallel Intel MKL supporting the LP64 interface: ifort myprog.f mkl_blas95_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib See Also Fortran 95 Interfaces to LAPACK and BLAS Examples for Linking a C Application Examples for Linking a Fortran Application Using the Single Dynamic Library Linking in Detail This section recommends which libraries to link with depending on your Intel MKL usage scenario and provides details of the linking. Linking Your Application with the Intel® Math Kernel Library 4 31Dynamically Selecting the Interface and Threading Layer The Single Dynamic Library (SDL) enables you to dynamically select the interface and threading layer for Intel MKL. Setting the Interface Layer Available interfaces depend on the architecture of your system. On systems based on the Intel ® 64 architecture, LP64 and ILP64 interfaces are available. To set one of these interfaces at run time, use the mkl_set_interface_layer function or the MKL_INTERFACE_LAYER environment variable. The following table provides values to be used to set each interface. Interface Layer Value of MKL_INTERFACE_LAYER Value of the Parameter of mkl_set_interface_layer LP64 LP64 MKL_INTERFACE_LP64 ILP64 ILP64 MKL_INTERFACE_ILP64 If the mkl_set_interface_layer function is called, the environment variable MKL_INTERFACE_LAYER is ignored. By default the LP64 interface is used. See the Intel MKL Reference Manual for details of the mkl_set_interface_layer function. On systems based on the IA-32 architecture, the cdecl and stdcall interfaces are available. These interfaces have different function naming conventions, and SDL selects between cdecl and stdcall at link time according to the function names. Setting the Threading Layer To set the threading layer at run time, use the mkl_set_threading_layer function or the MKL_THREADING_LAYER environment variable. 
The following table lists available threading layers along with the values to be used to set each layer. Threading Layer Value of MKL_THREADING_LAYER Value of the Parameter of mkl_set_threading_layer Intel threading INTEL MKL_THREADING_INTEL Sequential mode of Intel MKL SEQUENTIAL MKL_THREADING_SEQUENTIAL PGI threading PGI MKL_THREADING_PGI If the mkl_set_threading_layer function is called, the environment variable MKL_THREADING_LAYER is ignored. By default Intel threading is used. See the Intel MKL Reference Manual for details of the mkl_set_threading_layer function. Replacing Error Handling and Progress Information Routines You can replace the Intel MKL error handling routine xerbla or progress information routine mkl_progress with your own function. If you are using SDL, to replace xerbla or mkl_progress, call the mkl_set_xerbla and mkl_set_progress function, respectively. See the Intel MKL Reference Manual for details. 4 Intel® Math Kernel Library for Windows* OS User's Guide 32NOTE If you are using SDL, you cannot perform the replacement by linking the object file with your implementation of xerbla or mkl_progress. See Also Using the Single Dynamic Library Layered Model Concept Using the cdecl and stdcall Interfaces Directory Structure in Detail Linking with Interface Libraries Using the cdecl and stdcall Interfaces Intel MKL provides the following interfaces in its IA-32 architecture implementation: • stdcall Default Compaq Visual Fortran* (CVF) interface. Use it with the Intel® Fortran Compiler. • cdecl Default interface of the Microsoft Visual C/C++* application. To use each of these interfaces, link with the appropriate library, as specified in the following table: Interface Library for Static Linking Library for Dynamic Linking cdecl mkl_intel_c.lib mkl_intel_c_dll.lib stdcall mkl_intel_s.lib mkl_intel_s_dll.lib To link with the cdecl or stdcall interface library, use appropriate calling syntax in C applications and appropriate compiler options for Fortran applications. If you are using a C compiler, to link with the cdecl or stdcall interface library, call Intel MKL routines in your code as explained in the table below: Interface Library Calling Intel MKL Routines mkl_intel_s [_dll].lib Call a routine with the following statement: extern __stdcall name( , , .. ); where stdcall is actually the CVF compiler default compilation, which differs from the regular stdcall compilation in the way how strings are passed to the routine. Because the default CVF format is not identical with stdcall, you must specially handle strings in the calling sequence. See how to do it in sections on interfaces in the CVF documentation. mkl_intel_c [_dll].lib Use the following declaration: name( , , .. ); If you are using a Fortran compiler, to link with the cdecl or stdcall interface library, provide compiler options as explained in the table below: Interface Library Compiler Options Comment CVF compiler mkl_intel_s[_dll].lib Default mkl_intel_c[_dll].lib /iface=(cref, nomixed_str_len_arg) Linking Your Application with the Intel® Math Kernel Library 4 33Interface Library Compiler Options Comment Intel® Fortran compiler mkl_intel_c[_dll].lib Default mkl_intel_s[_dll].lib /Gm or /iface:cvf /Gm and /iface:cvf options enable compatibility of the CVF and Powerstation calling conventions See Also Using the stdcall Calling Convention in C/C++ Compiling an Application that Calls the Intel® Math Kernel Library and Uses the CVF Calling Conventions Using the ILP64 Interface vs. 
LP64 Interface The Intel MKL ILP64 libraries use the 64-bit integer type (necessary for indexing large arrays, with more than 2 31 -1 elements), whereas the LP64 libraries index arrays with the 32-bit integer type. The LP64 and ILP64 interfaces are implemented in the Interface layer. Link with the following interface libraries for the LP64 or ILP64 interface, respectively: • mkl_intel_lp64.lib or mkl_intel_ilp64.lib for static linking • mkl_intel_lp64_dll.lib or mkl_intel_ilp64_dll.lib for dynamic linking The ILP64 interface provides for the following: • Support large data arrays (with more than 2 31 -1 elements) • Enable compiling your Fortran code with the /4I8 compiler option The LP64 interface provides compatibility with the previous Intel MKL versions because "LP64" is just a new name for the only interface that the Intel MKL versions lower than 9.1 provided. Choose the ILP64 interface if your application uses Intel MKL for calculations with large data arrays or the library may be used so in future. Intel MKL provides the same include directory for the ILP64 and LP64 interfaces. Compiling for LP64/ILP64 The table below shows how to compile for the ILP64 and LP64 interfaces: Fortran Compiling for ILP64 ifort /4I8 /I\include ... Compiling for LP64 ifort /I\include ... C or C++ Compiling for ILP64 icl /DMKL_ILP64 /I\include ... Compiling for LP64 icl /I\include ... CAUTION Linking of an application compiled with the /4I8 or /DMKL_ILP64 option to the LP64 libraries may result in unpredictable consequences and erroneous output. Coding for ILP64 You do not need to change existing code if you are not using the ILP64 interface. 4 Intel® Math Kernel Library for Windows* OS User's Guide 34To migrate to ILP64 or write new code for ILP64, use appropriate types for parameters of the Intel MKL functions and subroutines: Integer Types Fortran C or C++ 32-bit integers INTEGER*4 or INTEGER(KIND=4) int Universal integers for ILP64/ LP64: • 64-bit for ILP64 • 32-bit otherwise INTEGER without specifying KIND MKL_INT Universal integers for ILP64/ LP64: • 64-bit integers INTEGER*8 or INTEGER(KIND=8) MKL_INT64 FFT interface integers for ILP64/ LP64 INTEGER without specifying KIND MKL_LONG To determine the type of an integer parameter of a function, use appropriate include files. For functions that support only a Fortran interface, use the C/C++ include files *.h. The above table explains which integer parameters of functions become 64-bit and which remain 32-bit for ILP64. The table applies to most Intel MKL functions except some VML and VSL functions, which require integer parameters to be 64-bit or 32-bit regardless of the interface: • VML: The mode parameter of VML functions is 64-bit. • Random Number Generators (RNG): All discrete RNG except viRngUniformBits64 are 32-bit. The viRngUniformBits64 generator function and vslSkipAheadStream service function are 64-bit. • Summary Statistics: The estimate parameter of the vslsSSCompute/vsldSSCompute function is 64- bit. Refer to the Intel MKL Reference Manual for more information. To better understand ILP64 interface details, see also examples and tests. Limitations All Intel MKL function domains support ILP64 programming with the following exceptions: • FFTW interfaces to Intel MKL: • FFTW 2.x wrappers do not support ILP64. • FFTW 3.2 wrappers support ILP64 by a dedicated set of functions plan_guru64. • GMP* Arithmetic Functions do not support ILP64. NOTE GMP Arithmetic Functions are deprecated and will be removed in a future release. 
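The coding guidelines above can be illustrated with a minimal C sketch (written for this guide, not taken from the product examples): every integer passed to Intel MKL is declared as MKL_INT, so the same source compiles correctly both for LP64 (default compilation) and for ILP64 (compilation with the /DMKL_ILP64 option).

/* daxpy_ilp64.c: illustrative sketch; all integer arguments use MKL_INT */
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    MKL_INT n = 5, incx = 1, incy = 1;   /* 32-bit under LP64, 64-bit under ILP64 */
    double  x[5] = {1.0, 1.0, 1.0, 1.0, 1.0};
    double  y[5] = {2.0, 2.0, 2.0, 2.0, 2.0};

    cblas_daxpy(n, 3.0, x, incx, y, incy);   /* y := 3*x + y */
    printf("y[0] = %.1f (expected 5.0)\n", y[0]);
    return 0;
}

Using MKL_INT consistently matters most for integer arrays (for example, pivot arrays returned by LAPACK routines), whose element size must match the interface you link against.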
See Also
High-level Directory Structure
Include Files
Language Interfaces Support, by Function Domain
Layered Model Concept
Directory Structure in Detail

Linking with Fortran 95 Interface Libraries

The mkl_blas95*.lib and mkl_lapack95*.lib libraries contain Fortran 95 interfaces for BLAS and LAPACK, respectively, which are compiler-dependent. In the Intel MKL package, they are prebuilt for the Intel® Fortran compiler. If you are using a different compiler, build these libraries before using the interface.

See Also
Fortran 95 Interfaces to LAPACK and BLAS
Compiler-dependent Functions and Fortran 90 Modules

Linking with Threading Libraries

Sequential Mode of the Library

You can use Intel MKL in a sequential (non-threaded) mode. In this mode, Intel MKL runs unthreaded code. However, it is thread-safe (except the LAPACK deprecated routine ?lacon), which means that you can use it in a parallel region in your OpenMP* code. The sequential mode requires no compatibility OpenMP* run-time library and does not respond to the environment variable OMP_NUM_THREADS or its Intel MKL equivalents.

You should use the library in the sequential mode only if you have a particular reason not to use Intel MKL threading. The sequential mode may be helpful when using Intel MKL with programs threaded with some non-Intel compilers or in other situations where you need a non-threaded version of the library (for instance, in some MPI cases). To set the sequential mode, in the Threading layer, choose the *sequential.* library.

See Also
Directory Structure in Detail
Using Parallelism of the Intel® Math Kernel Library
Avoiding Conflicts in the Execution Environment
Linking Examples

Selecting the Threading Layer

Several compilers that Intel MKL supports use the OpenMP* threading technology. Intel MKL supports implementations of the OpenMP* technology that these compilers provide. To make use of this support, you need to link with the appropriate library in the Threading Layer and Compiler Support Run-time Library (RTL).

Threading Layer   Each Intel MKL threading library contains the same code compiled by the respective compiler (Intel and PGI* compilers on Windows OS).

RTL   This layer includes libiomp, the compatibility OpenMP* run-time library of the Intel compiler. In addition to the Intel compiler, libiomp provides support for one more threading compiler on Windows OS (Microsoft Visual C++*). That is, a program threaded with the Microsoft Visual C++ compiler can safely be linked with Intel MKL and libiomp.

The list below helps explain what threading library and RTL you should choose under different scenarios when using Intel MKL (static cases only):

• Intel compiler, application threaded or not: use mkl_intel_thread.lib with libiomp5md.lib.
• PGI compiler, application threaded: use mkl_pgi_thread.lib or mkl_sequential.lib with the PGI*-supplied RTL. Use of mkl_sequential.lib removes threading from Intel MKL calls.
• PGI compiler, application not threaded: use mkl_intel_thread.lib with libiomp5md.lib, or mkl_pgi_thread.lib with the PGI*-supplied RTL, or mkl_sequential.lib with no RTL.
• Microsoft compiler, application threaded: use mkl_intel_thread.lib with libiomp5md.lib for the OpenMP* library of the Microsoft Visual Studio* IDE version 2005 or later, or use mkl_sequential.lib with no RTL for Win32 threading.
• Microsoft compiler, application not threaded: use mkl_intel_thread.lib with libiomp5md.lib.
• Other compilers, application threaded: use mkl_sequential.lib with no RTL.
• Other compilers, application not threaded: use mkl_intel_thread.lib with libiomp5md.lib.

TIP To use the threaded Intel MKL, compile your code with the /MT option. The compiler driver will pass the option to the linker and the latter will load multi-thread (MT) run-time libraries.

Linking with Computational Libraries

If you are not using the Intel MKL cluster software, you need to link your application with only one computational library, depending on the linking method:
• Static linking: mkl_core.lib
• Dynamic linking: mkl_core_dll.lib

Computational Libraries for Applications that Use the Intel MKL Cluster Software

ScaLAPACK and Cluster Fourier Transform Functions (Cluster FFT) require more computational libraries, which may depend on your architecture.

The following list shows computational libraries for IA-32 architecture applications that use ScaLAPACK or Cluster FFT.

Computational Libraries for IA-32 Architecture
• ScaLAPACK †: mkl_scalapack_core.lib and mkl_core.lib for static linking; mkl_scalapack_core_dll.lib and mkl_core_dll.lib for dynamic linking.
• Cluster Fourier Transform Functions †: mkl_cdft_core.lib and mkl_core.lib for static linking; mkl_cdft_core_dll.lib and mkl_core_dll.lib for dynamic linking.
† Also add the library with BLACS routines corresponding to the MPI used.

The following list shows computational libraries for Intel® 64 architecture applications that use ScaLAPACK or Cluster FFT.

Computational Libraries for the Intel® 64 Architecture
• ScaLAPACK, LP64 interface †: mkl_scalapack_lp64.lib and mkl_core.lib for static linking; mkl_scalapack_lp64_dll.lib and mkl_core_dll.lib for dynamic linking.
• ScaLAPACK, ILP64 interface †: mkl_scalapack_ilp64.lib and mkl_core.lib for static linking; mkl_scalapack_ilp64_dll.lib and mkl_core_dll.lib for dynamic linking.
• Cluster Fourier Transform Functions †: mkl_cdft_core.lib and mkl_core.lib for static linking; mkl_cdft_core_dll.lib and mkl_core_dll.lib for dynamic linking.
† Also add the library with BLACS routines corresponding to the MPI used.

See Also
Linking with ScaLAPACK and Cluster FFTs
Using the Link-line Advisor
Using the ILP64 Interface vs. LP64 Interface

Linking with Compiler Run-time Libraries

Dynamically link libiomp, the compatibility OpenMP* run-time library, even if you link other libraries statically. Linking to libiomp statically can be problematic because the more complex your operating environment or application, the more likely redundant copies of the library are included. This may result in performance issues (oversubscription of threads) and even incorrect results. To link libiomp dynamically, be sure the PATH environment variable is defined correctly.

See Also
Setting Environment Variables
Layered Model Concept

Linking with System Libraries

If your system is based on the Intel® 64 architecture, be aware that Microsoft SDK builds 1289 or higher provide the bufferoverflowu.lib library to resolve the __security_cookie external references. Makefiles for examples and tests include this library by using the buf_lib=bufferoverflowu.lib macro. If you are using older SDKs, leave this macro empty on your command line as follows: buf_lib= .

Building Custom Dynamic-link Libraries

Custom dynamic-link libraries (DLLs) reduce the collection of functions available in Intel MKL libraries to those required to solve your particular problems, which helps to save disk space and build your own dynamic libraries for distribution. The Intel MKL custom DLL builder, located in the tools\builder directory, enables you to create a dynamic library containing only the selected functions.
The builder contains a makefile and a definition file with the list of functions.

Using the Custom Dynamic-link Library Builder in the Command-line Mode

To build a custom DLL, use the following command:

nmake target [<options>]

The following list gives possible values of target and explains what the command does for each value:

• libia32: The builder uses static Intel MKL interface, threading, and core libraries to build a custom DLL for the IA-32 architecture.
• libintel64: The builder uses static Intel MKL interface, threading, and core libraries to build a custom DLL for the Intel® 64 architecture.
• dllia32: The builder uses the single dynamic library mkl_rt.dll to build a custom DLL for the IA-32 architecture.
• dllintel64: The builder uses the single dynamic library mkl_rt.dll to build a custom DLL for the Intel® 64 architecture.
• help: The command prints Help on the custom DLL builder.

The placeholder <options> stands for the list of parameters that define macros to be used by the makefile. The following list describes these parameters:

• interface: Defines which programming interface to use. Possible values:
• For the IA-32 architecture, {cdecl|stdcall}. The default value is cdecl.
• For the Intel 64 architecture, {lp64|ilp64}. The default value is lp64.
• threading = {parallel|sequential}: Defines whether to use the Intel MKL in the threaded or sequential mode. The default value is parallel.
• export = <file name>: Specifies the full name of the file that contains the list of entry-point functions to be included in the DLL. The default name is user_example_list (no extension).
• name = <dll name>: Specifies the name of the DLL and interface library to be created. By default, the names of the created libraries are mkl_custom.dll and mkl_custom.lib.
• xerbla = <object file name>: Specifies the name of the object file (with the .obj extension) that contains the user's error handler. The makefile adds this error handler to the library for use instead of the default Intel MKL error handler xerbla. If you omit this parameter, the native Intel MKL xerbla is used. See the description of the xerbla function in the Intel MKL Reference Manual on how to develop your own error handler. For the IA-32 architecture, the object file should be in the interface defined by the interface macro (cdecl or stdcall).
• MKLROOT = <mkl directory>: Specifies the location of Intel MKL libraries used to build the custom DLL. By default, the builder uses the Intel MKL installation directory.
• buf_lib: Manages resolution of the __security_cookie external references in the custom DLL on systems based on the Intel® 64 architecture. By default, the makefile uses the bufferoverflowu.lib library of Microsoft SDK builds 1289 or higher. This library resolves the __security_cookie external references. To avoid using this library, set the empty value of this parameter. Therefore, if you are using an older SDK, set buf_lib= .
CAUTION Use the buf_lib parameter only with the empty value. An incorrect value of the parameter causes builder errors.
• crt = <c runtime library>: Specifies the name of the Microsoft C run-time library to be used to build the custom DLL. By default, the builder uses msvcrt.lib.
• manifest = {yes|no|embed}: Manages the creation of a Microsoft manifest for the custom DLL:
• If manifest=yes, the manifest file with the name defined by the name parameter above and the manifest extension will be created.
• If manifest=no, the manifest file will not be created.
• If manifest=embed, the manifest will be embedded into the DLL. By default, the builder does not use the manifest parameter. All the above parameters are optional. In the simplest case, the command line is nmake ia32, and the missing options have default values. This command creates the mkl_custom.dll and mkl_custom.lib libraries with the cdecl interface for processors using the IA-32 architecture. The command takes the list of functions from the functions_list file and uses the native Intel MKL error handler xerbla. An example of a more complex case follows: nmake ia32 interface=stdcall export=my_func_list.txt name=mkl_small xerbla=my_xerbla.obj In this case, the command creates the mkl_small.dll and mkl_small.lib libraries with the stdcall interface for processors using the IA-32 architecture. The command takes the list of functions from my_func_list.txt file and uses the user's error handler my_xerbla.obj. The process is similar for processors using the Intel® 64 architecture. See Also Linking with System Libraries Composing a List of Functions To compose a list of functions for a minimal custom DLL needed for your application, you can use the following procedure: 1. Link your application with installed Intel MKL libraries to make sure the application builds. 2. Remove all Intel MKL libraries from the link line and start linking. Unresolved symbols indicate Intel MKL functions that your application uses. 3. Include these functions in the list. 4 Intel® Math Kernel Library for Windows* OS User's Guide 40Important Each time your application starts using more Intel MKL functions, update the list to include the new functions. See Also Specifying Function Names Specifying Function Names In the file with the list of functions for your custom DLL, adjust function names to the required interface. For example, you can list the cdecl entry points as follows: DGEMM DTRSM DDOT DGETRF DGETRS cblas_dgemm cblas_ddot You can list the stdcall entry points as follows: _DGEMM@60 _DDOT@20 _DGETRF@24 For more examples, see domain-specific lists of function names in the \tools\builder folder. This folder contains lists of function names for both cdecl or stdcall interfaces. NOTE The lists of function names are provided in the \tools\builder folder merely as examples. See Composing a List of Functions for how to compose lists of functions for your custom DLL. TIP Names of Fortran-style routines (BLAS, LAPACK, etc.) can be both upper-case or lower-case, with or without the trailing underscore. For example, these names are equivalent: BLAS: dgemm, DGEMM, dgemm_, DGEMM_ LAPACK: dgetrf, DGETRF, dgetrf_, DGETRF_. Properly capitalize names of C support functions in the function list. To do this, follow the guidelines below: 1. In the mkl_service.h include file, look up a #define directive for your function. 2. Take the function name from the replacement part of that directive. For example, the #define directive for the mkl_disable_fast_mm function is #define mkl_disable_fast_mm MKL_Disable_Fast_MM. Capitalize the name of this function in the list like this: MKL_Disable_Fast_MM. For the names of the Fortran support functions, see the tip. Building a Custom Dynamic-link Library in the Visual Studio* Development System You can build a custom dynamic-link library (DLL) in the Microsoft Visual Studio* Development System (VS*) . To do this, use projects available in the tools\builder\MSVS_Projects subdirectory of the Intel MKL directory. 
The directory contains the VS2005, VS2008, and VS2010 subdirectories with projects for the respective versions of the Visual Studio Development System. For each version of VS two solutions are available: Linking Your Application with the Intel® Math Kernel Library 4 41• libia32.sln builds a custom DLL for the IA-32 architecture. • libintel64.sln builds a custom DLL for the Intel® 64 architecture. The builder uses the following default settings for the custom DLL: Interface: cdecl for the IA-32 architecture and LP64 for the Intel 64 architecture Error handler: Native Intel MKL xerbla Create Microsoft manifest: yes List of functions: in the project's source file examples.def To build a custom DLL: 1. Open the libia32.sln or libintel64.sln solution depending on the architecture of your system. The solution includes the following projects: • i_malloc_dll • vml_dll_core • cdecl_parallel (in libia32.sln) or lp64_parallel (in libintel64.sln) • cdecl_sequential (in libia32.sln) or lp64_sequential (in libintel64.sln) 2. [Optional] To change any of the default settings, select the project depending on whether the DLL will use Intel MKL functions in the sequential or multi-threaded mode: • In the libia32 solution, select the cdecl_sequential or cdecl_parallel project. • In the libintel64 solution, select the lp64_sequential or lp64_parallel project. 3. [Optional] To build the DLL that uses the stdcall interface for the IA-32 architecture or the ILP64 interface for the Intel 64 architecture: a. Select Project>Properties>Configuration Properties>Linker>Input>Additional Dependencies. b. In the libia32 solution, change mkl_intel_c.lib to mkl_intel_s.lib. In the libintel64 solution, change mkl_intel_lp64.lib to mkl_intel_ilp64.lib. 4. [Optional] To include your own error handler in the DLL: a. Select Project>Properties>Configuration Properties>Linker>Input. b. Add .obj 5. [Optional] To turn off creation of the manifest: a. Select Project>Properties>Configuration Properties>Linker>Manifest File>Generate Manifest. b. Select: no. 6. [Optional] To change the list of functions to be included in the DLL: a. Select Source Files. b. Edit the examples.def file. Refer to Specifying Function Names for how to specify entry points. 7. To build the library: • In VS2005 - VS2008, select Build>Project Only>Link Only and link projects in this order: i_malloc_dll, vml_dll_core, cdecl_sequential/lp64_sequential or cdecl_ parallel/ lp64_parallel. • In VS2010, select Build>Build Solution. See Also Using the Custom Dynamic-link Library Builder in the Command-line Mode Distributing Your Custom Dynamic-link Library To enable use of your custom DLL in a threaded mode, distribute libiomp5md.dll along with the custom DLL. 4 Intel® Math Kernel Library for Windows* OS User's Guide 42Managing Performance and Memory 5 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. 
Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Using Parallelism of the Intel® Math Kernel Library Intel MKL is extensively parallelized. See Threaded Functions and Problems for lists of threaded functions and problems that can be threaded. Intel MKL is thread-safe, which means that all Intel MKL functions (except the LAPACK deprecated routine ? lacon) work correctly during simultaneous execution by multiple threads. In particular, any chunk of threaded Intel MKL code provides access for multiple threads to the same shared data, while permitting only one thread at any given time to access a shared piece of data. Therefore, you can call Intel MKL from multiple threads and not worry about the function instances interfering with each other. The library uses OpenMP* threading software, so you can use the environment variable OMP_NUM_THREADS to specify the number of threads or the equivalent OpenMP run-time function calls. Intel MKL also offers variables that are independent of OpenMP, such as MKL_NUM_THREADS, and equivalent Intel MKL functions for thread management. The Intel MKL variables are always inspected first, then the OpenMP variables are examined, and if neither is used, the OpenMP software chooses the default number of threads. By default, Intel MKL uses the number of threads equal to the number of physical cores on the system. To achieve higher performance, set the number of threads to the number of real processors or physical cores, as summarized in Techniques to Set the Number of Threads. See Also Managing Multi-core Performance Threaded Functions and Problems The following Intel MKL function domains are threaded: • Direct sparse solver. • LAPACK. For the list of threaded routines, see Threaded LAPACK Routines. • Level1 and Level2 BLAS. For the list of threaded routines, see Threaded BLAS Level1 and Level2 Routines. • All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers. • All mathematical VML functions. • FFT. For the list of FFT transforms that can be threaded, see Threaded FFT Problems. 43Threaded LAPACK Routines In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z. The following LAPACK routines are threaded: • Linear equations, computational routines: • Factorization: ?getrf, ?gbtrf, ?potrf, ?pptrf, ?sytrf, ?hetrf, ?sptrf, ?hptrf • Solving: ?dttrsb, ?gbtrs, ?gttrs, ?pptrs, ?pbtrs, ?pttrs, ?sytrs, ?sptrs, ?hptrs, ? tptrs, ?tbtrs • Orthogonal factorization, computational routines: ?geqrf, ?ormqr, ?unmqr, ?ormlq, ?unmlq, ?ormql, ?unmql, ?ormrq, ?unmrq • Singular Value Decomposition, computational routines: ?gebrd, ?bdsqr • Symmetric Eigenvalue Problems, computational routines: ?sytrd, ?hetrd, ?sptrd, ?hptrd, ?steqr, ?stedc. • Generalized Nonsymmetric Eigenvalue Problems, computational routines: chgeqz/zhgeqz. A number of other LAPACK routines, which are based on threaded LAPACK or BLAS routines, make effective use of parallelism: ?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, cggevx/zggevx, and so on. Threaded BLAS Level1 and Level2 Routines In the following list, ? stands for a precision prefix of each flavor of the respective routine and may have the value of s, d, c, or z. 
The following routines are threaded for Intel ® Core™2 Duo and Intel ® Core™ i7 processors: • Level1 BLAS: ?axpy, ?copy, ?swap, ddot/sdot, cdotc, drot/srot • Level2 BLAS: ?gemv, ?trmv, dsyr/ssyr, dsyr2/ssyr2, dsymv/ssymv Threaded FFT Problems The following characteristics of a specific problem determine whether your FFT computation may be threaded: • rank • domain • size/length • precision (single or double) • placement (in-place or out-of-place) • strides • number of transforms • layout (for example, interleaved or split layout of complex data) Most FFT problems are threaded. In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow. One-dimensional (1D) transforms 1D transforms are threaded in many cases. 5 Intel® Math Kernel Library for Windows* OS User's Guide 441D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions depending on the architecture: Architecture Conditions Intel ® 64 N is a power of 2, log2(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1. IA-32 N is a power of 2, log2(N) > 13, and the transform is single-precision. N is a power of 2, log2(N) > 14, and the transform is double-precision. Any N is composite, log2(N) > 16, and input/output strides equal 1. 1D real-to-complex and complex-to-real transforms are not threaded. 1D complex-to-complex transforms using split-complex layout are not threaded. Prime-size complex-to-complex 1D transforms are not threaded. Multidimensional transforms All multidimensional transforms on large-volume data are threaded. Avoiding Conflicts in the Execution Environment Certain situations can cause conflicts in the execution environment that make the use of threads in Intel MKL problematic. This section briefly discusses why these problems exist and how to avoid them. If you thread the program using OpenMP directives and compile the program with Intel compilers, Intel MKL and the program will both use the same threading library. Intel MKL tries to determine if it is in a parallel region in the program, and if it is, it does not spread its operations over multiple threads unless you specifically request Intel MKL to do so via the MKL_DYNAMIC functionality. However, Intel MKL can be aware that it is in a parallel region only if the threaded program and Intel MKL are using the same threading library. If your program is threaded by some other means, Intel MKL may operate in multithreaded mode, and the performance may suffer due to overuse of the resources. The following table considers several cases where the conflicts may arise and provides recommendations depending on your threading model: Threading model Discussion You thread the program using OS threads (Win32* threads on Windows* OS). If more than one thread calls Intel MKL, and the function being called is threaded, it may be important that you turn off Intel MKL threading. Set the number of threads to one by any of the available means (see Techniques to Set the Number of Threads). You thread the program using OpenMP directives and/or pragmas and compile the program using a compiler other than a compiler from Intel. This is more problematic because setting of the OMP_NUM_THREADS environment variable affects both the compiler's threading library and libiomp. 
In this case, choose the threading library that matches the layered Intel MKL with the OpenMP compiler you employ (see Linking Examples on how to do this). If this is not possible, use Intel MKL in the sequential mode. To do this, you should link with the appropriate threading library: mkl_sequential.lib or mkl_sequential.dll (see High-level Directory Structure). There are multiple programs running on a multiple-cpu system, for example, a parallelized program that runs using MPI for communication in which each processor is treated as a node. The threading software will see multiple processors on the system even though each processor has a separate MPI process running on it. In this case, one of the solutions is to set the number of threads to one by any of the available means (see Techniques to Set the Number of Threads). Section Intel(R) Optimized MP LINPACK Benchmark for Clusters discusses another solution for a Hybrid (OpenMP* + MPI) mode. Managing Performance and Memory 5 45TIP To get best performance with threaded Intel MKL, compile your code with the /MT option. See Also Using Additional Threading Control Linking with Compiler Run-time Libraries Techniques to Set the Number of Threads Use one of the following techniques to change the number of threads to use in Intel MKL: • Set one of the OpenMP or Intel MKL environment variables: • OMP_NUM_THREADS • MKL_NUM_THREADS • MKL_DOMAIN_NUM_THREADS • Call one of the OpenMP or Intel MKL functions: • omp_set_num_threads() • mkl_set_num_threads() • mkl_domain_set_num_threads() When choosing the appropriate technique, take into account the following rules: • The Intel MKL threading controls take precedence over the OpenMP controls because they are inspected first. • A function call takes precedence over any environment variables. The exception, which is a consequence of the previous rule, is the OpenMP subroutine omp_set_num_threads(), which does not have precedence over Intel MKL environment variables, such as MKL_NUM_THREADS. See Using Additional Threading Control for more details. • You cannot change run-time behavior in the course of the run using the environment variables because they are read only once at the first call to Intel MKL. Setting the Number of Threads Using an OpenMP* Environment Variable You can set the number of threads using the environment variable OMP_NUM_THREADS. To change the number of threads, in the command shell in which the program is going to run, enter: set OMP_NUM_THREADS=. Some shells require the variable and its value to be exported: export OMP_NUM_THREADS=. You can alternatively assign value to the environment variable using Microsoft Windows* OS Control Panel. Note that you will not benefit from setting this variable on Microsoft Windows* 98 or Windows* ME because multiprocessing is not supported. See Also Using Additional Threading Control Changing the Number of Threads at Run Time You cannot change the number of threads during run time using environment variables. However, you can call OpenMP API functions from your program to change the number of threads during run time. The following sample code shows how to change the number of threads during run time using the omp_set_num_threads() routine. See also Techniques to Set the Number of Threads. 5 Intel® Math Kernel Library for Windows* OS User's Guide 46The following example shows both C and Fortran code examples. To run this example in the C language, use the omp.h header file from the Intel(R) compiler package. 
If you do not have the Intel compiler but wish to explore the functionality in the example, use Fortran API for omp_set_num_threads() rather than the C version. For example, omp_set_num_threads_( &i_one ); // ******* C language ******* #include "omp.h" #include "mkl.h" #include #define SIZE 1000 int main(int args, char *argv[]){ double *a, *b, *c; a = (double*)malloc(sizeof(double)*SIZE*SIZE); b = (double*)malloc(sizeof(double)*SIZE*SIZE); c = (double*)malloc(sizeof(double)*SIZE*SIZE); double alpha=1, beta=1; int m=SIZE, n=SIZE, k=SIZE, lda=SIZE, ldb=SIZE, ldc=SIZE, i=0, j=0; char transa='n', transb='n'; for( i=0; i #include ... mkl_set_num_threads ( 1 ); // ******* Fortran language ******* ... call mkl_set_num_threads( 1 ) See the Intel MKL Reference Manual for the detailed description of the threading control functions, their parameters, calling syntax, and more code examples. MKL_DYNAMIC The MKL_DYNAMIC environment variable enables Intel MKL to dynamically change the number of threads. The default value of MKL_DYNAMIC is TRUE, regardless of OMP_DYNAMIC, whose default value may be FALSE. When MKL_DYNAMIC is TRUE, Intel MKL tries to use what it considers the best number of threads, up to the maximum number you specify. Managing Performance and Memory 5 49For example, MKL_DYNAMIC set to TRUE enables optimal choice of the number of threads in the following cases: • If the requested number of threads exceeds the number of physical cores (perhaps because of using the Intel® Hyper-Threading Technology), and MKL_DYNAMIC is not changed from its default value of TRUE, Intel MKL will scale down the number of threads to the number of physical cores. • If you are able to detect the presence of MPI, but cannot determine if it has been called in a thread-safe mode (it is impossible to detect this with MPICH 1.2.x, for instance), and MKL_DYNAMIC has not been changed from its default value of TRUE, Intel MKL will run one thread. When MKL_DYNAMIC is FALSE, Intel MKL tries not to deviate from the number of threads the user requested. However, setting MKL_DYNAMIC=FALSE does not ensure that Intel MKL will use the number of threads that you request. The library may have no choice on this number for such reasons as system resources. Additionally, the library may examine the problem and use a different number of threads than the value suggested. For example, if you attempt to do a size one matrix-matrix multiply across eight threads, the library may instead choose to use only one thread because it is impractical to use eight threads in this event. Note also that if Intel MKL is called in a parallel region, it will use only one thread by default. If you want the library to use nested parallelism, and the thread within a parallel region is compiled with the same OpenMP compiler as Intel MKL is using, you may experiment with setting MKL_DYNAMIC to FALSE and manually increasing the number of threads. In general, set MKL_DYNAMIC to FALSE only under circumstances that Intel MKL is unable to detect, for example, to use nested parallelism where the library is already called from a parallel section. MKL_DOMAIN_NUM_THREADS The MKL_DOMAIN_NUM_THREADS environment variable suggests the number of threads for a particular function domain. 
MKL_DOMAIN_NUM_THREADS accepts a string value <MKL-env-string>, which must have the following format:
<MKL-env-string> ::= <MKL-domain-env-string> { <delimiter> <MKL-domain-env-string> }
<delimiter> ::= [ <space>* ] ( <space> | <comma> | <semicolon> | <colon> ) [ <space>* ]
<MKL-domain-env-string> ::= <MKL-domain-env-name> <uses> <number-of-threads>
<MKL-domain-env-name> ::= MKL_DOMAIN_ALL | MKL_DOMAIN_BLAS | MKL_DOMAIN_FFT | MKL_DOMAIN_VML | MKL_DOMAIN_PARDISO
<uses> ::= [ <space>* ] ( <space> | <equality-sign> | <comma> ) [ <space>* ]
<number-of-threads> ::= <positive-number>
<positive-number> ::= <decimal-number> | <octal-number> | <hexadecimal-number>
In the syntax above, values of <MKL-domain-env-name> indicate function domains as follows:
MKL_DOMAIN_ALL All function domains
MKL_DOMAIN_BLAS BLAS Routines
MKL_DOMAIN_FFT Non-cluster Fourier Transform Functions
MKL_DOMAIN_VML Vector Mathematical Functions
MKL_DOMAIN_PARDISO PARDISO
For example, the following settings are equivalent:
MKL_DOMAIN_ALL 2 : MKL_DOMAIN_BLAS 1 : MKL_DOMAIN_FFT 4
MKL_DOMAIN_ALL=2 : MKL_DOMAIN_BLAS=1 : MKL_DOMAIN_FFT=4
MKL_DOMAIN_ALL=2, MKL_DOMAIN_BLAS=1, MKL_DOMAIN_FFT=4
MKL_DOMAIN_ALL=2; MKL_DOMAIN_BLAS=1; MKL_DOMAIN_FFT=4
MKL_DOMAIN_ALL = 2 MKL_DOMAIN_BLAS 1 , MKL_DOMAIN_FFT 4
MKL_DOMAIN_ALL,2: MKL_DOMAIN_BLAS 1, MKL_DOMAIN_FFT,4
The global variables MKL_DOMAIN_ALL, MKL_DOMAIN_BLAS, MKL_DOMAIN_FFT, MKL_DOMAIN_VML, and MKL_DOMAIN_PARDISO, as well as the interface for the Intel MKL threading control functions, can be found in the mkl.h header file. The list below illustrates how values of MKL_DOMAIN_NUM_THREADS are interpreted:
MKL_DOMAIN_ALL=4: All parts of Intel MKL should try four threads. The actual number of threads may still differ because of the MKL_DYNAMIC setting or system resource issues. The setting is equivalent to MKL_NUM_THREADS=4.
MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=4: All parts of Intel MKL should try one thread, except for BLAS, which is suggested to try four threads.
MKL_DOMAIN_VML=2: VML should try two threads. The setting affects no other part of Intel MKL.
Be aware that the domain-specific settings take precedence over the overall ones. For example, the "MKL_DOMAIN_BLAS=4" value of MKL_DOMAIN_NUM_THREADS suggests trying four threads for BLAS, regardless of a later setting of MKL_NUM_THREADS, and a function call "mkl_domain_set_num_threads ( 4, MKL_DOMAIN_BLAS );" suggests the same, regardless of later calls to mkl_set_num_threads(). However, a function call with input "MKL_DOMAIN_ALL", such as "mkl_domain_set_num_threads (4, MKL_DOMAIN_ALL);", is equivalent to "mkl_set_num_threads(4)" and thus will be overwritten by later calls to mkl_set_num_threads. Similarly, the environment setting of MKL_DOMAIN_NUM_THREADS with "MKL_DOMAIN_ALL=4" will be overwritten by MKL_NUM_THREADS=2. Whereas the MKL_DOMAIN_NUM_THREADS environment variable enables you to set several variables at once, for example, "MKL_DOMAIN_BLAS=4,MKL_DOMAIN_FFT=2", the corresponding function does not take string syntax. So, to do the same with function calls, you may need to make several calls, which in this example are as follows:
mkl_domain_set_num_threads ( 4, MKL_DOMAIN_BLAS );
mkl_domain_set_num_threads ( 2, MKL_DOMAIN_FFT );
Setting the Environment Variables for Threading Control To set the environment variables used for threading control, in the command shell in which the program is going to run, enter: set <variable name>=<value>. For example:
set MKL_NUM_THREADS=4
set MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=4"
set MKL_DYNAMIC=FALSE
Some shells require the variable and its value to be exported: export <variable name>=<value>. For example:
export MKL_NUM_THREADS=4
export MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=4"
export MKL_DYNAMIC=FALSE
You can alternatively assign values to the environment variables using Microsoft Windows* OS Control Panel.
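The domain-specific suggestions can also be made from code with the threading control functions declared in mkl.h. The following is a minimal sketch, mirroring the environment setting MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=4 and reading the result back with mkl_domain_get_max_threads():
#include <stdio.h>
#include "mkl.h"

int main()
{
    // Mirror MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=4: all domains are suggested
    // one thread, while BLAS is suggested four.
    mkl_domain_set_num_threads( 1, MKL_DOMAIN_ALL );
    mkl_domain_set_num_threads( 4, MKL_DOMAIN_BLAS );

    printf( "BLAS: %d thread(s), FFT: %d thread(s)\n",
            mkl_domain_get_max_threads( MKL_DOMAIN_BLAS ),
            mkl_domain_get_max_threads( MKL_DOMAIN_FFT ) );
    return 0;
}
As with the environment variable, these values are only suggestions; MKL_DYNAMIC and system resources may still change the number of threads actually used.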
Tips and Techniques to Improve Performance Coding Techniques To obtain the best performance with Intel MKL, ensure the following data alignment in your source code: • Align arrays on 16-byte boundaries. See Aligning Addresses on 16-byte Boundaries for how to do it. • Make sure leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16, where element_size is the size of an array element in bytes. • For two-dimensional arrays, avoid leading dimension values divisible by 2048 bytes. For example, for a double-precision array, with element_size = 8, avoid leading dimensions 256, 512, 768, 1024, … (elements). LAPACK Packed Routines The routines with the names that contain the letters HP, OP, PP, SP, TP, UP in the matrix type and storage position (the second and third letters respectively) operate on the matrices in the packed format (see LAPACK "Routine Naming Conventions" sections in the Intel MKL Reference Manual). Their functionality is strictly equivalent to the functionality of the unpacked routines with the names containing the letters HE, OR, PO, SY, TR, UN in the same positions, but the performance is significantly lower. If the memory restriction is not too tight, use an unpacked routine for better performance. In this case, you need to allocate N 2 /2 more memory than the memory required by a respective packed routine, where N is the problem size (the number of equations). For example, to speed up solving a symmetric eigenproblem with an expert driver, use the unpacked routine: call dsyevx(jobz, range, uplo, n, a, lda, vl, vu, il, iu, abstol, m, w, z, ldz, work, lwork, iwork, ifail, info) where a is the dimension lda-by-n, which is at least N 2 elements, instead of the packed routine: call dspevx(jobz, range, uplo, n, ap, vl, vu, il, iu, abstol, m, w, z, ldz, work, iwork, ifail, info) where ap is the dimension N*(N+1)/2. FFT Functions Additional conditions can improve performance of the FFT functions. The addresses of the first elements of arrays and the leading dimension values, in bytes (n*element_size), of two-dimensional arrays should be divisible by cache line size, which equals: • 32 bytes for the Intel ® Pentium® III processors • 64 bytes for the Intel ® Pentium® 4 processors and processors using Intel ® 64 architecture 5 Intel® Math Kernel Library for Windows* OS User's Guide 52Hardware Configuration Tips Dual-Core Intel® Xeon® processor 5100 series systems To get the best performance with Intel MKL on Dual-Core Intel ® Xeon® processor 5100 series systems, enable the Hardware DPL (streaming data) Prefetcher functionality of this processor. To configure this functionality, use the appropriate BIOS settings, as described in your BIOS documentation. Intel® Hyper-Threading Technology Intel ® Hyper-Threading Technology (Intel ® HT Technology) is especially effective when each thread performs different types of operations and when there are under-utilized resources on the processor. However, Intel MKL fits neither of these criteria because the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance by disabling Intel HT Technology. If you run with Intel HT Technology enabled, performance may be especially impacted if you run on fewer threads than physical cores. Moreover, if, for example, there are two threads to every physical core, the thread scheduler may assign two threads to some cores and ignore the other cores altogether. 
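As a worked instance of the leading-dimension guidance in the Coding Techniques section above (the matrix sizes here are hypothetical), the sketch below allocates a double-precision column-major array on a 16-byte boundary with mkl_malloc() and pads the leading dimension so that lda*sizeof(double) remains divisible by 16 bytes without being a multiple of 2048 bytes:
#include "mkl.h"

int main()
{
    const int n   = 1024;   /* the matrix is 1024 rows by 1024 columns */
    /* lda = 1024 would give 1024*8 = 8192 bytes, an exact multiple of 2048 bytes,
       which the guidelines above recommend avoiding. Padding to 1040 keeps
       lda*sizeof(double) = 8320 bytes divisible by 16 but not by 2048. */
    const int lda = 1040;

    double *a = (double*)mkl_malloc( (size_t)lda * n * sizeof(double), 16 );
    if ( a == NULL ) return 1;

    /* ... fill the leading 1024-by-1024 part of the lda-by-n buffer and pass
       a and lda to the Intel MKL routine of interest ... */

    mkl_free( a );
    return 0;
}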
If you are using the OpenMP* library of the Intel Compiler, read the respective User Guide on how to best set the thread affinity interface to keep the scheduler from loading some cores with two threads while leaving others idle. For Intel MKL, apply the following setting:
set KMP_AFFINITY=granularity=fine,compact,1,0
See Also Using Parallelism of the Intel® Math Kernel Library
Managing Multi-core Performance You can obtain best performance on systems with multi-core processors by requiring that threads do not migrate from core to core. To do this, bind threads to the CPU cores by setting an affinity mask to threads. Use one of the following options: • OpenMP facilities (recommended, if available), for example, the KMP_AFFINITY environment variable using the Intel OpenMP library • A system function, as explained below Consider the following performance issue: • The system has two sockets with two cores each, for a total of four cores (CPUs) • Performance of the four-thread parallel application using the Intel MKL LAPACK is unstable The following code example shows how to resolve this issue by setting an affinity mask by operating system means using the Intel compiler. The code calls the system function SetThreadAffinityMask to bind the threads to appropriate cores, thus preventing migration of the threads. Then the Intel MKL LAPACK routine is called:
// Set affinity mask
#include <windows.h>
#include <omp.h>
int main(void) {
#pragma omp parallel default(shared)
{
int tid = omp_get_thread_num();
// 2 packages x 2 cores/pkg x 1 threads/core (4 total cores)
DWORD_PTR mask = (1 << (tid == 0 ? 0 : 2 ));
SetThreadAffinityMask( GetCurrentThread(), mask );
}
// Call Intel MKL LAPACK routine
return 0;
}
Compile the application with the Intel compiler using the following command:
icl /Qopenmp test_application.c
where test_application.c is the filename for the application. Build the application. Run it in four threads, for example, by using the environment variable to set the number of threads:
set OMP_NUM_THREADS=4
test_application.exe
See Windows API documentation at msdn.microsoft.com/ for the restrictions on the usage of Windows API routines and particulars of the SetThreadAffinityMask function used in the above example. See also a similar example at en.wikipedia.org/wiki/Affinity_mask.
Operating on Denormals The IEEE 754-2008 standard ("IEEE Standard for Floating-Point Arithmetic") defines denormal (or subnormal) numbers as non-zero numbers smaller than the smallest normalized numbers representable in a specific floating-point format. Floating-point operations on denormals are slower than on normalized operands because denormal operands and results are usually handled through a software assist mechanism rather than directly in hardware. This software processing causes Intel MKL functions that consume denormals to run slower than with normalized floating-point numbers. You can mitigate this performance issue by setting the appropriate bit fields in the MXCSR floating-point control register to flush denormals to zero (FTZ) or to replace any denormals loaded from memory with zero (DAZ). Check your compiler documentation to determine whether it has options to control FTZ and DAZ. Note that these compiler options may slightly affect accuracy.
FFT Optimized Radices You can improve the performance of Intel MKL FFT if the length of your data vector permits factorization into powers of optimized radices. In Intel MKL, the optimized radices are 2, 3, 5, 7, 11, and 13.
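Returning to the denormals discussion above, the sketch below shows one way to set the FTZ and DAZ bit fields of the MXCSR register directly from C/C++ code with the SSE intrinsic macros, before any Intel MKL calls are made; with the Intel compiler, options such as /Qftz can have a similar effect (check your compiler documentation, and note that flushing denormals may slightly affect accuracy):
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */

int main()
{
    /* Flush denormal results to zero (FTZ) and treat denormal operands loaded
       from memory as zero (DAZ) by setting the corresponding MXCSR bit fields. */
    _MM_SET_FLUSH_ZERO_MODE( _MM_FLUSH_ZERO_ON );
    _MM_SET_DENORMALS_ZERO_MODE( _MM_DENORMALS_ZERO_ON );

    /* ... Intel MKL calls that might otherwise consume or produce denormals ... */
    return 0;
}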
Using Memory Management Intel MKL Memory Management Software Intel MKL has memory management software that controls memory buffers for the use by the library functions. New buffers that the library allocates when your application calls Intel MKL are not deallocated until the program ends. To get the amount of memory allocated by the memory management software, call the mkl_mem_stat() function. If your program needs to free memory, call mkl_free_buffers(). If another call is made to a library function that needs a memory buffer, the memory manager again allocates the buffers and they again remain allocated until either the program ends or the program deallocates the memory. This behavior facilitates better performance. However, some tools may report this behavior as a memory leak. The memory management software is turned on by default. To turn it off, set the MKL_DISABLE_FAST_MM environment variable to any value or call the mkl_disable_fast_mm() function. Be aware that this change may negatively impact performance of some Intel MKL routines, especially for small problem sizes. 5 Intel® Math Kernel Library for Windows* OS User's Guide 54Redefining Memory Functions In C/C++ programs, you can replace Intel MKL memory functions that the library uses by default with your own functions. To do this, use the memory renaming feature. Memory Renaming Intel MKL memory management by default uses standard C run-time memory functions to allocate or free memory. These functions can be replaced using memory renaming. Intel MKL accesses the memory functions by pointers i_malloc, i_free, i_calloc, and i_realloc, which are visible at the application level. These pointers initially hold addresses of the standard C run-time memory functions malloc, free, calloc, and realloc, respectively. You can programmatically redefine values of these pointers to the addresses of your application's memory management functions. Redirecting the pointers is the only correct way to use your own set of memory management functions. If you call your own memory functions without redirecting the pointers, the memory will get managed by two independent memory management packages, which may cause unexpected memory issues. How to Redefine Memory Functions To redefine memory functions, use the following procedure: If you are using the statically linked Intel MKL, 1. Include the i_malloc.h header file in your code. This header file contains all declarations required for replacing the memory allocation functions. The header file also describes how memory allocation can be replaced in those Intel libraries that support this feature. 2. Redefine values of pointers i_malloc, i_free, i_calloc, and i_realloc prior to the first call to MKL functions, as shown in the following example: #include "i_malloc.h" . . . i_malloc = my_malloc; i_calloc = my_calloc; i_realloc = my_realloc; i_free = my_free; . . . // Now you may call Intel MKL functions If you are using the dynamically linked Intel MKL, 1. Include the i_malloc.h header file in your code. 2. Redefine values of pointers i_malloc_dll, i_free_dll, i_calloc_dll, and i_realloc_dll prior to the first call to MKL functions, as shown in the following example: #include "i_malloc.h" . . . i_malloc_dll = my_malloc; i_calloc_dll = my_calloc; i_realloc_dll = my_realloc; i_free_dll = my_free; . . . 
// Now you may call Intel MKL functions Managing Performance and Memory 5 555 Intel® Math Kernel Library for Windows* OS User's Guide 56Language-specific Usage Options 6 The Intel® Math Kernel Library (Intel® MKL) provides broad support for Fortran and C/C++ programming. However, not all functions support both Fortran and C interfaces. For example, some LAPACK functions have no C interface. You can call such functions from C using mixed-language programming. If you want to use LAPACK or BLAS functions that support Fortran 77 in the Fortran 95 environment, additional effort may be initially required to build compiler-specific interface libraries and modules from the source code provided with Intel MKL. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Using Language-Specific Interfaces with Intel® Math Kernel Library This section discusses mixed-language programming and the use of language-specific interfaces with Intel MKL. See also Appendix G in the Intel MKL Reference Manual for details of the FFTW interfaces to Intel MKL. Interface Libraries and Modules You can create the following interface libraries and modules using the respective makefiles located in the interfaces directory. File name Contains Libraries, in Intel MKL architecture-specific directories mkl_blas95.lib 1 Fortran 95 wrappers for BLAS (BLAS95) for IA-32 architecture. mkl_blas95_ilp64.lib 1 Fortran 95 wrappers for BLAS (BLAS95) supporting LP64 interface. mkl_blas95_lp64.lib 1 Fortran 95 wrappers for BLAS (BLAS95) supporting ILP64 interface. mkl_lapack95.lib 1 Fortran 95 wrappers for LAPACK (LAPACK95) for IA-32 architecture. mkl_lapack95_lp64.lib 1 Fortran 95 wrappers for LAPACK (LAPACK95) supporting LP64 interface. mkl_lapack95_ilp64.lib 1 Fortran 95 wrappers for LAPACK (LAPACK95) supporting ILP64 interface. 57File name Contains fftw2xc_intel.lib 1 Interfaces for FFTW version 2.x (C interface for Intel compilers) to call Intel MKL FFTs. fftw2xc_ms.lib Contains interfaces for FFTW version 2.x (C interface for Microsoft compilers) to call Intel MKL FFTs. fftw2xf_intel.lib Interfaces for FFTW version 2.x (Fortran interface for Intel compilers) to call Intel MKL FFTs. fftw3xc_intel.lib 2 Interfaces for FFTW version 3.x (C interface for Intel compiler) to call Intel MKL FFTs. fftw3xc_ms.lib Interfaces for FFTW version 3.x (C interface for Microsoft compilers) to call Intel MKL FFTs. fftw3xf_intel.lib 2 Interfaces for FFTW version 3.x (Fortran interface for Intel compilers) to call Intel MKL FFTs. fftw2x_cdft_SINGLE.lib Single-precision interfaces for MPI FFTW version 2.x (C interface) to call Intel MKL cluster FFTs. fftw2x_cdft_DOUBLE.lib Double-precision interfaces for MPI FFTW version 2.x (C interface) to call Intel MKL cluster FFTs. 
fftw3x_cdft.lib Interfaces for MPI FFTW version 3.x (C interface) to call Intel MKL cluster FFTs. fftw3x_cdft_ilp64.lib Interfaces for MPI FFTW version 3.x (C interface) to call Intel MKL cluster FFTs supporting the ILP64 interface. Modules, in architecture- and interface-specific subdirectories of the Intel MKL include directory blas95.mod 1 Fortran 95 interface module for BLAS (BLAS95). lapack95.mod 1 Fortran 95 interface module for LAPACK (LAPACK95). f95_precision.mod 1 Fortran 95 definition of precision parameters for BLAS95 and LAPACK95. mkl95_blas.mod 1 Fortran 95 interface module for BLAS (BLAS95), identical to blas95.mod. To be removed in one of the future releases. mkl95_lapack.mod 1 Fortran 95 interface module for LAPACK (LAPACK95), identical to lapack95.mod. To be removed in one of the future releases. mkl95_precision.mod 1 Fortran 95 definition of precision parameters for BLAS95 and LAPACK95, identical to f95_precision.mod. To be removed in one of the future releases. mkl_service.mod 1 Fortran 95 interface module for Intel MKL support functions. 1 Prebuilt for the Intel® Fortran compiler 2 FFTW3 interfaces are integrated with Intel MKL. Look into \interfaces\fftw3x* \makefile for options defining how to build and where to place the standalone library with the wrappers. See Also Fortran 95 Interfaces to LAPACK and BLAS 6 Intel® Math Kernel Library for Windows* OS User's Guide 58Fortran 95 Interfaces to LAPACK and BLAS Fortran 95 interfaces are compiler-dependent. Intel MKL provides the interface libraries and modules precompiled with the Intel® Fortran compiler. Additionally, the Fortran 95 interfaces and wrappers are delivered as sources. (For more information, see Compiler-dependent Functions and Fortran 90 Modules). If you are using a different compiler, build the appropriate library and modules with your compiler and link the library as a user's library: 1. Go to the respective directory \interfaces\blas95 or \interfaces\lapack95 2. Type one of the following commands depending on your architecture: • For the IA-32 architecture, nmake libia32 install_dir= • For the Intel® 64 architecture, nmake libintel64 [interface=lp64|ilp64] install_dir= Important The parameter install_dir is required. As a result, the required library is built and installed in the \lib directory, and the .mod files are built and installed in the \include\[\{lp64|ilp64}] directory, where is one of {ia32, intel64}. By default, the ifort compiler is assumed. You may change the compiler with an additional parameter of nmake: FC=. For example, the command nmake libintel64 FC=f95 install_dir= interface=lp64 builds the required library and .mod files and installs them in subdirectories of . To delete the library from the building directory, use one of the following commands: • For the IA-32 architecture, nmake cleania32 install_dir= • For the Intel ® 64 architecture, nmake cleanintel64 [interface=lp64|ilp64] install_dir= • For all the architectures, nmake clean install_dir= CAUTION Even if you have administrative rights, avoid setting install_dir=..\.. or install_dir= in a build or clean command above because these settings replace or delete the Intel MKL prebuilt Fortran 95 library and modules. Compiler-dependent Functions and Fortran 90 Modules Compiler-dependent functions occur whenever the compiler inserts into the object code function calls that are resolved in its run-time library (RTL). Linking of such code without the appropriate RTL will result in undefined symbols. 
Intel MKL has been designed to minimize RTL dependencies. In cases where RTL dependencies might arise, the functions are delivered as source code and you need to compile the code with whatever compiler you are using for your application. Language-specific Usage Options 6 59In particular, Fortran 90 modules result in the compiler-specific code generation requiring RTL support. Therefore, Intel MKL delivers these modules compiled with the Intel compiler, along with source code, to be used with different compilers. Using the stdcall Calling Convention in C/C++ Intel MKL supports stdcall calling convention for the following function domains: • BLAS Routines • Sparse BLAS Routines • LAPACK Routines • Vector Mathematical Functions • Vector Statistical Functions • PARDISO • Direct Sparse Solvers • RCI Iterative Solvers • Support Functions To use the stdcall calling convention in C/C++, follow the guidelines below: • In your function calls, pass lengths of character strings to the functions. For example, compare the following calls to dgemm: cdecl: dgemm("N", "N", &n, &m, &k, &alpha, b, &ldb, a, &lda, &beta, c, &ldc); stdcall: dgemm("N", 1, "N", 1, &n, &m, &k, &alpha, b, &ldb, a, &lda, &beta, c, &ldc); • Define the MKL_STDCALL macro using either of the following techniques: – Define the macro in your source code before including Intel MKL header files: ... #define MKL_STDCALL #include "mkl.h" ... – Pass the macro to the compiler. For example: icl -DMKL_STDCALL foo.c • Link your application with the following library: – mkl_intel_s.lib for static linking – mkl_intel_s_dll.lib for dynamic linking See Also Using the cdecl and stdcall Interfaces Compiling an Application that Calls the Intel® Math Kernel Library and Uses the CVF Calling Conventions Include Files Compiling an Application that Calls the Intel® Math Kernel Library and Uses the CVF Calling Conventions The IA-32 architecture implementation of Intel MKL supports the Compaq Visual Fortran* (CVF) calling convention by providing the stdcall interface. 6 Intel® Math Kernel Library for Windows* OS User's Guide 60Although the Intel MKL does not provide the CVF interface in its Intel® 64 architecture implementation, you can use the Intel® Visual Fortran Compiler to compile your Intel® 64 architecture application that calls Intel MKL and uses the CVF calling convention. To do this: • Provide the following compiler options to enable compatibility with the CVF calling convention: /Gm or /iface:cvf • Additionally provide the following options to enable calling Intel MKL from your application: /iface:nomixed_str_len_arg See Also Using the cdecl and stdcall Interfaces Compiler Support Mixed-language Programming with the Intel Math Kernel Library Appendix A: Intel(R) Math Kernel Library Language Interfaces Support lists the programming languages supported for each Intel MKL function domain. However, you can call Intel MKL routines from different language environments. Calling LAPACK, BLAS, and CBLAS Routines from C/C++ Language Environments Not all Intel MKL function domains support both C and Fortran environments. To use Intel MKL Fortran-style functions in C/C++ environments, you should observe certain conventions, which are discussed for LAPACK and BLAS in the subsections below. CAUTION Avoid calling BLAS 95/LAPACK 95 from C/C++. Such calls require skills in manipulating the descriptor of a deferred-shape array, which is the Fortran 90 type. Moreover, BLAS95/LAPACK95 routines contain links to a Fortran RTL. 
LAPACK and BLAS Because LAPACK and BLAS routines are Fortran-style, when calling them from C-language programs, follow the Fortran-style calling conventions: • Pass variables by address, not by value. Function calls in Example "Calling a Complex BLAS Level 1 Function from C++" and Example "Using CBLAS Interface Instead of Calling BLAS Directly from C" illustrate this. • Store your data in Fortran style, that is, column-major rather than row-major order. With row-major order, adopted in C, the last array index changes most quickly and the first one changes most slowly when traversing the memory segment where the array is stored. With Fortran-style columnmajor order, the last index changes most slowly whereas the first index changes most quickly (as illustrated by the figure below for a two-dimensional array). Language-specific Usage Options 6 61For example, if a two-dimensional matrix A of size mxn is stored densely in a one-dimensional array B, you can access a matrix element like this: A[i][j] = B[i*n+j] in C ( i=0, ... , m-1, j=0, ... , -1) A(i,j) = B(j*m+i) in Fortran ( i=1, ... , m, j=1, ... , n). When calling LAPACK or BLAS routines from C, be aware that because the Fortran language is caseinsensitive, the routine names can be both upper-case or lower-case, with or without the trailing underscore. For example, the following names are equivalent: • LAPACK: dgetrf, DGETRF, dgetrf_, and DGETRF_ • BLAS: dgemm, DGEMM, dgemm_, and DGEMM_ See Example "Calling a Complex BLAS Level 1 Function from C++" on how to call BLAS routines from C. See also the Intel(R) MKL Reference Manual for a description of the C interface to LAPACK functions. CBLAS Instead of calling BLAS routines from a C-language program, you can use the CBLAS interface. CBLAS is a C-style interface to the BLAS routines. You can call CBLAS routines using regular C-style calls. Use the mkl.h header file with the CBLAS interface. The header file specifies enumerated values and prototypes of all the functions. It also determines whether the program is being compiled with a C++ compiler, and if it is, the included file will be correct for use with C++ compilation. Example "Using CBLAS Interface Instead of Calling BLAS Directly from C" illustrates the use of the CBLAS interface. C Interface to LAPACK Instead of calling LAPACK routines from a C-language program, you can use the C interface to LAPACK provided by Intel MKL. The C interface to LAPACK is a C-style interface to the LAPACK routines. This interface supports matrices in row-major and column-major order, which you can define in the first function argument matrix_order. Use the mkl_lapacke.h header file with the C interface to LAPACK. The header file specifies constants and prototypes of all the functions. It also determines whether the program is being compiled with a C++ compiler, and if it is, the included file will be correct for use with C++ compilation. You can find examples of the C interface to LAPACK in the examples\lapacke subdirectory in the Intel MKL installation directory. Using Complex Types in C/C++ As described in the documentation for the Intel® Visual Fortran Compiler XE, C/C++ does not directly implement the Fortran types COMPLEX(4) and COMPLEX(8). However, you can write equivalent structures. The type COMPLEX(4) consists of two 4-byte floating-point numbers. The first of them is the real-number component, and the second one is the imaginary-number component. 
The type COMPLEX(8) is similar to COMPLEX(4) except that it contains two 8-byte floating-point numbers. Intel MKL provides complex types MKL_Complex8 and MKL_Complex16, which are structures equivalent to the Fortran complex types COMPLEX(4) and COMPLEX(8), respectively. The MKL_Complex8 and MKL_Complex16 types are defined in the mkl_types.h header file. You can use these types to define complex data. You can also redefine the types with your own types before including the mkl_types.h header file. The only requirement is that the types must be compatible with the Fortran complex layout, that is, the complex type must be a pair of real numbers for the values of real and imaginary parts. For example, you can use the following definitions in your C++ code: #define MKL_Complex8 std::complex and #define MKL_Complex16 std::complex 6 Intel® Math Kernel Library for Windows* OS User's Guide 62See Example "Calling a Complex BLAS Level 1 Function from C++" for details. You can also define these types in the command line: -DMKL_Complex8="std::complex" -DMKL_Complex16="std::complex" See Also Intel® Software Documentation Library Calling BLAS Functions that Return the Complex Values in C/C++ Code Complex values that functions return are handled differently in C and Fortran. Because BLAS is Fortran-style, you need to be careful when handling a call from C to a BLAS function that returns complex values. However, in addition to normal function calls, Fortran enables calling functions as though they were subroutines, which provides a mechanism for returning the complex value correctly when the function is called from a C program. When a Fortran function is called as a subroutine, the return value is the first parameter in the calling sequence. You can use this feature to call a BLAS function from C. The following example shows how a call to a Fortran function as a subroutine converts to a call from C and the hidden parameter result gets exposed: Normal Fortran function call: result = cdotc( n, x, 1, y, 1 ) A call to the function as a subroutine: call cdotc( result, n, x, 1, y, 1) A call to the function from C: cdotc( &result, &n, x, &one, y, &one ) NOTE Intel MKL has both upper-case and lower-case entry points in the Fortran-style (caseinsensitive) BLAS, with or without the trailing underscore. So, all these names are equivalent and acceptable: cdotc, CDOTC, cdotc_, and CDOTC_. The above example shows one of the ways to call several level 1 BLAS functions that return complex values from your C and C++ applications. An easier way is to use the CBLAS interface. For instance, you can call the same function using the CBLAS interface as follows: cblas_cdotu( n, x, 1, y, 1, &result ) NOTE The complex value comes last on the argument list in this case. The following examples show use of the Fortran-style BLAS interface from C and C++, as well as the CBLAS (C language) interface: • Example "Calling a Complex BLAS Level 1 Function from C" • Example "Calling a Complex BLAS Level 1 Function from C++" • Example "Using CBLAS Interface Instead of Calling BLAS Directly from C" Example "Calling a Complex BLAS Level 1 Function from C" The example below illustrates a call from a C program to the complex BLAS Level 1 function zdotc(). This function computes the dot product of two double-precision complex vectors. In this example, the complex dot product is returned in the structure c. 
#include <stdio.h>
#include "mkl.h"
#define N 5
int main()
{
int n = N, inca = 1, incb = 1, i;
MKL_Complex16 a[N], b[N], c;
for( i = 0; i < n; i++ ){
a[i].real = (double)i; a[i].imag = (double)i * 2.0;
b[i].real = (double)(n - i); b[i].imag = (double)i * 2.0;
}
zdotc( &c, &n, a, &inca, b, &incb );
printf( "The complex dot product is: ( %6.2f, %6.2f)\n", c.real, c.imag );
return 0;
}
Example "Calling a Complex BLAS Level 1 Function from C++" Below is the C++ implementation:
#include <complex>
#include <iostream>
#define MKL_Complex16 std::complex<double>
#include "mkl.h"
#define N 5
int main()
{
int n, inca = 1, incb = 1, i;
std::complex<double> a[N], b[N], c;
n = N;
for( i = 0; i < n; i++ ){
a[i] = std::complex<double>( i, i*2.0 );
b[i] = std::complex<double>( n-i, i*2.0 );
}
zdotc( &c, &n, a, &inca, b, &incb );
std::cout << "The complex dot product is: " << c << std::endl;
return 0;
}
Example "Using CBLAS Interface Instead of Calling BLAS Directly from C" This example uses CBLAS:
#include <stdio.h>
#include "mkl.h"
typedef struct{ double re; double im; } complex16;
#define N 5
int main()
{
int n, inca = 1, incb = 1, i;
complex16 a[N], b[N], c;
n = N;
for( i = 0; i < n; i++ ){
a[i].re = (double)i; a[i].im = (double)i * 2.0;
b[i].re = (double)(n - i); b[i].im = (double)i * 2.0;
}
cblas_zdotc_sub( n, a, inca, b, incb, &c );
printf( "The complex dot product is: ( %6.2f, %6.2f)\n", c.re, c.im );
return 0;
}
Support for Boost uBLAS Matrix-matrix Multiplication If you are used to uBLAS, you can perform BLAS matrix-matrix multiplication in C++ using the Intel MKL substitution of Boost uBLAS functions. uBLAS is the Boost C++ open-source library that provides BLAS functionality for dense, packed, and sparse matrices. The library uses an expression template technique for passing expressions as function arguments, which enables evaluating vector and matrix expressions in one pass without temporary matrices. uBLAS provides two modes: • Debug (safe) mode, default. Checks types and conformance. • Release (fast) mode. Does not check types and conformance. To enable this mode, use the NDEBUG preprocessor symbol. The documentation for Boost uBLAS is available at www.boost.org. Intel MKL provides overloaded prod() functions for substituting uBLAS dense matrix-matrix multiplication with the Intel MKL gemm calls. Though these functions break uBLAS expression templates and introduce temporary matrices, the performance advantage can be considerable for matrix sizes that are not too small (roughly, over 50). You do not need to change your source code to use the functions. To call them: • Include the header file mkl_boost_ublas_matrix_prod.hpp in your code (from the Intel MKL include directory) • Add appropriate Intel MKL libraries to the link line. The list of expressions that are substituted follows:
prod( m1, m2 ) prod( trans(m1), m2 ) prod( trans(conj(m1)), m2 ) prod( conj(trans(m1)), m2 )
prod( m1, trans(m2) ) prod( trans(m1), trans(m2) ) prod( trans(conj(m1)), trans(m2) ) prod( conj(trans(m1)), trans(m2) )
prod( m1, trans(conj(m2)) ) prod( trans(m1), trans(conj(m2)) ) prod( trans(conj(m1)), trans(conj(m2)) ) prod( conj(trans(m1)), trans(conj(m2)) )
prod( m1, conj(trans(m2)) ) prod( trans(m1), conj(trans(m2)) ) prod( trans(conj(m1)), conj(trans(m2)) ) prod( conj(trans(m1)), conj(trans(m2)) )
These expressions are substituted in the release mode only (with the NDEBUG preprocessor symbol defined). Supported uBLAS versions are Boost 1.34.1 and higher. To get them, visit www.boost.org.
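For illustration, a minimal program that picks up the substitution might look like the sketch below; it is not part of the Intel MKL example set and assumes that Boost uBLAS is installed, that NDEBUG is defined (here in the source; /DNDEBUG on the compile line works equally well), and that the appropriate Intel MKL libraries are added to the link line as described above:
#define NDEBUG                      /* release (fast) mode: required for the substitution */
#include <boost/numeric/ublas/matrix.hpp>
#include "mkl_boost_ublas_matrix_prod.hpp"

using namespace boost::numeric::ublas;

int main()
{
    const int n = 512;              /* large enough for the gemm substitution to pay off */
    matrix<double> a(n, n), b(n, n), c(n, n);

    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            a(i, j) = 1.0;
            b(i, j) = 2.0;
        }

    c = prod(a, b);                 /* resolved to the Intel MKL overload, which calls gemm */
    return 0;
}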
A code example provided in the \examples\ublas\source\sylvester.cpp file illustrates usage of the Intel MKL uBLAS header file for solving a special case of the Sylvester equation. To run the Intel MKL ublas examples, specify the boost_root parameter in the n make command, for instance, when using Boost version 1.37.0: nmake libia32 boost_root = \boost_1_37_0 Intel MKL ublas examples on default Boost uBLAS configuration support only: • Microsoft Visual C++* Compiler versions 2005 and higher • Intel C++ Compiler versions 11.1 and higher with Microsoft Visual Studio IDE versions 2005 and higher See Also Using Code Examples Invoking Intel MKL Functions from Java* Applications Language-specific Usage Options 6 65Intel MKL Java* Examples To demonstrate binding with Java, Intel MKL includes a set of Java examples in the following directory: \examples\java. The examples are provided for the following MKL functions: • ?gemm, ?gemv, and ?dot families from CBLAS • The complete set of non-cluster FFT functions • ESSL 1 -like functions for one-dimensional convolution and correlation • VSL Random Number Generators (RNG), except user-defined ones and file subroutines • VML functions, except GetErrorCallBack, SetErrorCallBack, and ClearErrorCallBack You can see the example sources in the following directory: \examples\java\examples. The examples are written in Java. They demonstrate usage of the MKL functions with the following variety of data: • 1- and 2-dimensional data sequences • Real and complex types of the data • Single and double precision However, the wrappers, used in the examples, do not: • Demonstrate the use of large arrays (>2 billion elements) • Demonstrate processing of arrays in native memory • Check correctness of function parameters • Demonstrate performance optimizations The examples use the Java Native Interface (JNI* developer framework) to bind with Intel MKL. The JNI documentation is available from http://java.sun.com/javase/6/docs/technotes/guides/jni/. The Java example set includes JNI wrappers that perform the binding. The wrappers do not depend on the examples and may be used in your Java applications. The wrappers for CBLAS, FFT, VML, VSL RNG, and ESSL-like convolution and correlation functions do not depend on each other. To build the wrappers, just run the examples. The makefile builds the wrapper binaries. After running the makefile, you can run the examples, which will determine whether the wrappers were built correctly. As a result of running the examples, the following directories will be created in \examples \java: • docs • include • classes • bin • _results The directories docs, include, classes, and bin will contain the wrapper binaries and documentation; the directory _results will contain the testing results. For a Java programmer, the wrappers are the following Java classes: • com.intel.mkl.CBLAS • com.intel.mkl.DFTI • com.intel.mkl.ESSL • com.intel.mkl.VML • com.intel.mkl.VSL 6 Intel® Math Kernel Library for Windows* OS User's Guide 66Documentation for the particular wrapper and example classes will be generated from the Java sources while building and running the examples. To browse the documentation, open the index file in the docs directory (created by the build script): \examples\java\docs\index.html. The Java wrappers for CBLAS, VML, VSL RNG, and FFT establish the interface that directly corresponds to the underlying native functions, so you can refer to the Intel MKL Reference Manual for their functionality and parameters. 
Interfaces for the ESSL-like functions are described in the generated documentation for the com.intel.mkl.ESSL class. Each wrapper consists of the interface part for Java and JNI stub written in C. You can find the sources in the following directory: \examples\java\wrappers. Both Java and C parts of the wrapper for CBLAS and VML demonstrate the straightforward approach, which you may use to cover additional CBLAS functions. The wrapper for FFT is more complicated because it needs to support the lifecycle for FFT descriptor objects. To compute a single Fourier transform, an application needs to call the FFT software several times with the same copy of the native FFT descriptor. The wrapper provides the handler class to hold the native descriptor, while the virtual machine runs Java bytecode. The wrapper for VSL RNG is similar to the one for FFT. The wrapper provides the handler class to hold the native descriptor of the stream state. The wrapper for the convolution and correlation functions mitigates the same difficulty of the VSL interface, which assumes a similar lifecycle for "task descriptors". The wrapper utilizes the ESSL-like interface for those functions, which is simpler for the case of 1-dimensional data. The JNI stub additionally encapsulates the MKL functions into the ESSL-like wrappers written in C and so "packs" the lifecycle of a task descriptor into a single call to the native method. The wrappers meet the JNI Specification versions 1.1 and 5.0 and should work with virtually every modern implementation of Java. The examples and the Java part of the wrappers are written for the Java language described in "The Java Language Specification (First Edition)" and extended with the feature of "inner classes" (this refers to late 1990s). This level of language version is supported by all versions of the Sun Java Development Kit* (JDK*) developer toolkit and compatible implementations starting from version 1.1.5, or by all modern versions of Java. The level of C language is "Standard C" (that is, C89) with additional assumptions about integer and floatingpoint data types required by the Intel MKL interfaces and the JNI header files. That is, the native float and double data types must be the same as JNI jfloat and jdouble data types, respectively, and the native int must be 4 bytes long. 1 IBM Engineering Scientific Subroutine Library (ESSL*). See Also Running the Java* Examples Running the Java* Examples The Java examples support all the C and C++ compilers that Intel MKL does. The makefile intended to run the examples also needs the n make utility, which is typically provided with the C/C++ compiler package. To run Java examples, the JDK* developer toolkit is required for compiling and running Java code. A Java implementation must be installed on the computer or available via the network. You may download the JDK from the vendor website. The examples should work for all versions of JDK. However, they were tested only with the following Java implementation s for all the supported architectures: • J2SE* SDK 1.4.2, JDK 5.0 and 6.0 from Sun Microsystems, Inc. (http://sun.com/). • JRockit* JDK 1.4.2 and 5.0 from Oracle Corporation (http://oracle.com/). Note that the Java run-time environment* (JRE*) system, which may be pre-installed on your computer, is not enough. 
You need the JDK* developer toolkit that supports the following set of tools: Language-specific Usage Options 6 67• java • javac • javah • javadoc To make these tools available for the examples makefile, set the JAVA_HOME environment variable and add the JDK binaries directory to the system PATH, for example : SET JAVA_HOME=C:\Program Files\Java\jdk1.5.0_09 SET PATH=%JAVA_HOME%\bin;%PATH% You may also need to clear the JDK_HOME environment variable, if it is assigned a value: SET JDK_HOME= To start the examples, use the makefile found in the Intel MKL Java examples directory: nmake {dllia32|dllintel64|libia32|libintel64} [function=...] [compiler=...] If you type the make command and omit the target (for example, dllia32), the makefile prints the help info, which explains the targets and parameters. For the examples list, see the examples.lst file in the Java examples directory. Known Limitations of the Java* Examples This section explains limitations of Java examples. Functionality Some Intel MKL functions may fail to work if called from the Java environment by using a wrapper, like those provided with the Intel MKL Java examples. Only those specific CBLAS, FFT, VML, VSL RNG, and the convolution/correlation functions listed in the Intel MKL Java Examples section were tested with the Java environment. So, you may use the Java wrappers for these CBLAS, FFT, VML, VSL RNG, and convolution/ correlation functions in your Java applications. Performance The Intel MKL functions must work faster than similar functions written in pure Java. However, the main goal of these wrappers is to provide code examples, not maximum performance. So, an Intel MKL function called from a Java application will probably work slower than the same function called from a program written in C/ C++ or Fortran. Known bugs There are a number of known bugs in Intel MKL (identified in the Release Notes), as well as incompatibilities between different versions of JDK. The examples and wrappers include workarounds for these problems. Look at the source code in the examples and wrappers for comments that describe the workarounds. 6 Intel® Math Kernel Library for Windows* OS User's Guide 68Coding Tips 7 This section discusses programming with the Intel® Math Kernel Library (Intel® MKL) to provide coding tips that meet certain, specific needs, such as consistent results of computations or conditional compilation. Aligning Data for Consistent Results Routines in Intel MKL may return different results from run-to-run on the same system. This is usually due to a change in the order in which floating-point operations are performed. The two most influential factors are array alignment and parallelism. Array alignment can determine how internal loops order floating-point operations. Non-deterministic parallelism may change the order in which computational tasks are executed. While these results may differ, they should still fall within acceptable computational error bounds. To better assure identical results from run-to-run, do the following: • Align input arrays on 16-byte boundaries • Run Intel MKL in the sequential mode To align input arrays on 16-byte boundaries, use mkl_malloc() in place of system provided memory allocators, as shown in the code example below. Sequential mode of Intel MKL removes the influence of nondeterministic parallelism. Aligning Addresses on 16-byte Boundaries // ******* C language ******* ... #include ... void *darray; int workspace; ... 
// Allocate workspace aligned on 16-byte boundary darray = mkl_malloc( sizeof(double)*workspace, 16 ); ... // call the program using MKL mkl_app( darray ); ... // Free workspace mkl_free( darray ); ! ******* Fortran language ******* ... double precision darray pointer (p_wrk,darray(1)) integer workspace ... ! Allocate workspace aligned on 16-byte boundary p_wrk = mkl_malloc( 8*workspace, 16 ) ... ! call the program using MKL call mkl_app( darray ) ... ! Free workspace call mkl_free(p_wrk) 69Using Predefined Preprocessor Symbols for Intel® MKL Version-Dependent Compilation Preprocessor symbols (macros) substitute values in a program before it is compiled. The substitution is performed in the preprocessing phase. The following preprocessor symbols are available: Predefined Preprocessor Symbol Description __INTEL_MKL__ Intel MKL major version __INTEL_MKL_MINOR__ Intel MKL minor version __INTEL_MKL_UPDATE__ Intel MKL update number INTEL_MKL_VERSION Intel MKL full version in the following format: INTEL_MKL_VERSION = (__INTEL_MKL__*100+__INTEL_MKL_MINOR__)*100+__I NTEL_MKL_UPDATE__ These symbols enable conditional compilation of code that uses new features introduced in a particular version of the library. To perform conditional compilation: 1. Include in your code the file where the macros are defined: • mkl.h for C/C++ • mkl.fi for Fortran 2. [Optionally] Use the following preprocessor directives to check whether the macro is defined: • #ifdef, #endif for C/C++ • !DEC$IF DEFINED, !DEC$ENDIF for Fortran 3. Use preprocessor directives for conditional inclusion of code: • #if, #endif for C/C++ • !DEC$IF, !DEC$ENDIF for Fortran Example Compile a part of the code if Intel MKL version is MKL 10.3 update 4: C/C++: #include "mkl.h" #ifdef INTEL_MKL_VERSION #if INTEL_MKL_VERSION == 100304 // Code to be conditionally compiled #endif #endif Fortran: include "mkl.fi" !DEC$IF DEFINED INTEL_MKL_VERSION !DEC$IF INTEL_MKL_VERSION .EQ. 100304 * Code to be conditionally compiled !DEC$ENDIF !DEC$ENDIF 7 Intel® Math Kernel Library for Windows* OS User's Guide 70Working with the Intel® Math Kernel Library Cluster Software 8 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 MPI Support Intel MKL ScaLAPACK and Cluster FFTs support MPI implementations identified in the Intel® Math Kernel Library (Intel® MKL) Release Notes. To link applications with ScaLAPACK or Cluster FFTs, you need to configure your system depending on your message-passing interface (MPI) implementation as explained below. If you are using MPICH2, do the following: 1. Add mpich2\include to the include path (assuming the default MPICH2 installation). 2. Add mpich2\lib to the library path. 3. Add mpi.lib to your link command. 4. Add fmpich2.lib to your Fortran link command. 5. 
Add cxx.lib to your Release target link command and cxxd.lib to your Debug target link command for C++ programs. If you are using the Microsoft MPI, do the following: 1. Add Microsoft Compute Cluster Pack\include to the include path (assuming the default installation of the Microsoft MPI). 2. Add Microsoft Compute Cluster Pack\Lib\AMD64 to the library path. 3. Add msmpi.lib to your link command. If you are using the Intel® MPI, do the following: 1. Add the following string to the include path: %ProgramFiles%\Intel\MPI\\\include, where is the directory for a particular MPI version and is ia32 or intel64, for example, %ProgramFiles%\Intel\MPI\3.1\intel64\include. 2. Add the following string to the library path: %ProgramFiles%\Intel\MPI\\\lib, for example, %ProgramFiles%\Intel\MPI\3.1\intel64\lib. 3. Add impi.lib and impicxx.lib to your link command. Check the documentation that comes with your MPI implementation for implementation-specific details of linking. Linking with ScaLAPACK and Cluster FFTs To link with Intel MKL ScaLAPACK and/or Cluster FFTs, use the following commands : 71set lib =;;%lib% where the placeholders stand for paths and libraries as explained in the following table: \lib\{ia32|intel64}, depending on your architecture. If you performed the Setting Environment Variables step of the Getting Started process, you do not need to add this directory to the lib environment variable. Typically the lib subdirectory in the MPI installation directory. For example, C:\Program Files (x86)\Intel\MPI\3.2.0.005\ia32\lib for a default installation of Intel MPI 3.2. One of icl, ifort, xilink. One of ScaLAPACK or Cluster FFT libraries for the appropriate architecture, which are listed in Directory Structure in Detail. For example, for the IA-32 architecture, it is one of mkl_scalapack_core.lib or mkl_cdft_core.lib. The BLACS library corresponding to your architecture, programming interface (LP64 or ILP64), and MPI version. These libraries are listed in Directory Structure in Detail. For example, for the IA-32 architecture, choose one of mkl_blacs_mpich2.lib or mkl_blacs_intelmpi.lib in case of static linking or mkl_blacs_dll.lib in case of dynamic linking; specifically, for MPICH2, choose mkl_blacs_mpich2.lib in case of static linking. Intel MKL libraries other than ScaLAPACK or Cluster FFTs libraries. TIP Use the Link-line Advisor to quickly choose the appropriate set of , , and . Intel MPI provides prepackaged scripts for its linkers to help you link using the respective linker. Therefore, if you are using Intel MPI, the best way to link is to use the following commands: \mpivars.bat set lib = ;%lib% where the placeholders that are not yet defined are explained in the following table: 8 Intel® Math Kernel Library for Windows* OS User's Guide 72 By default, the bin subdirectory in the MPI installation directory. For example, C: \Program Files (x86)\Intel\MPI\3.2.0.005\ia32\lib for a default installation of Intel MPI 3.2; mpicl or mpiifort See Also Linking Your Application with the Intel® Math Kernel Library Examples for Linking with ScaLAPACK and Cluster FFT Determining the Number of Threads The OpenMP* software responds to the environment variable OMP_NUM_THREADS. Intel MKL also has other mechanisms to set the number of threads, such as the MKL_NUM_THREADS or MKL_DOMAIN_NUM_THREADS environment variables (see Using Additional Threading Control). Make sure that the relevant environment variables have the same and correct values on all the nodes. 
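The environment variables mentioned above also have function-call counterparts among the Intel MKL support functions, such as mkl_set_num_threads and mkl_domain_set_num_threads (see Using Additional Threading Control). The following minimal C sketch is not taken from the Intel MKL examples; it assumes the mkl_set_num_threads, mkl_domain_set_num_threads, and mkl_get_max_threads declarations and the MKL_DOMAIN_BLAS constant available through mkl.h (check mkl_service.h for the exact constant names in your version).

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Roughly the function-call counterpart of setting MKL_NUM_THREADS=4. */
    mkl_set_num_threads(4);

    /* Limit only the BLAS domain to 2 threads, the counterpart of a
       MKL_DOMAIN_NUM_THREADS setting for BLAS. */
    mkl_domain_set_num_threads(2, MKL_DOMAIN_BLAS);

    /* Number of threads Intel MKL plans to use for the next parallel call. */
    printf("mkl_get_max_threads() = %d\n", mkl_get_max_threads());
    return 0;
}

Set this way, the values apply only to the calling process and take precedence over the corresponding environment variables for that process.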
Intel MKL versions 10.0 and higher no longer set the default number of threads to one, but depend on the OpenMP libraries used with the compiler to set the default number. For the threading layer based on the Intel compiler (mkl_intel_thread.lib), this value is the number of CPUs according to the OS.

CAUTION Avoid over-prescribing the number of threads, which may occur, for instance, when the number of MPI ranks per node and the number of threads per node are both greater than one. The product of the number of MPI ranks per node and the number of threads per node should not exceed the number of physical cores per node.

The OMP_NUM_THREADS environment variable is assumed in the discussion below. Set OMP_NUM_THREADS so that the product of its value and the number of MPI ranks per node equals the number of real processors or cores of a node. If the Intel® Hyper-Threading Technology is enabled on the node, use only half of the processors that are visible on Windows OS.

See Also
Setting Environment Variables on a Cluster

Using DLLs

All the needed DLLs must be visible on all the nodes at run time, and you should install Intel® Math Kernel Library (Intel® MKL) on each node of the cluster. You can use Remote Installation Services (RIS) provided by Microsoft to remotely install the library on each of the nodes that are part of your cluster. The best way to make the DLLs visible is to point to these libraries in the PATH environment variable. See Setting Environment Variables on a Cluster on how to set the value of the PATH environment variable.

The ScaLAPACK DLLs for the IA-32 and Intel® 64 architectures (in the \redist\ia32\mkl and \redist\intel64\mkl directories, respectively) use the MPI dispatching mechanism. MPI dispatching is based on the MKL_BLACS_MPI environment variable. The BLACS DLL uses MKL_BLACS_MPI for choosing the needed MPI libraries. The table below lists possible values of the variable.

Value       Comment
MPICH2      Default value. MPICH2 1.0.x for Windows* OS is used for message passing
INTELMPI    Intel MPI is used for message passing
MSMPI       Microsoft MPI is used for message passing

If you are using a non-default MPI, assign the same appropriate value to MKL_BLACS_MPI on all nodes.

See Also
Setting Environment Variables on a Cluster

Setting Environment Variables on a Cluster

If you are using MPICH2 or Intel MPI, to set an environment variable on the cluster, use the -env, -genv, or -genvlist keys of mpiexec. See the following MPICH2 examples on how to set the value of OMP_NUM_THREADS:

mpiexec -genv OMP_NUM_THREADS 2 ....
mpiexec -genvlist OMP_NUM_THREADS ....
mpiexec -n 1 -host first -env OMP_NUM_THREADS 2 test.exe : -n 1 -host second -env OMP_NUM_THREADS 3 test.exe ....

See the following Intel MPI examples on how to set the value of MKL_BLACS_MPI:

mpiexec -genv MKL_BLACS_MPI INTELMPI ....
mpiexec -genvlist MKL_BLACS_MPI ....
mpiexec -n 1 -host first -env MKL_BLACS_MPI INTELMPI test.exe : -n 1 -host second -env MKL_BLACS_MPI INTELMPI test.exe ....

When using MPICH2, you may have problems with getting the global environment, such as MKL_BLACS_MPI, by the -genvlist key. In this case, set up user or system environments on each node as follows: From the Start menu, select Settings > Control Panel > System > Advanced > Environment Variables.
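Because a wrong value on a single node can silently change threading or MPI dispatching behavior, it can help to have every MPI rank report what it actually sees. The small C sketch below is not part of Intel MKL or its examples; it assumes only standard MPI calls and the mkl_get_max_threads support function, and the file name check_env.c is hypothetical.

/* check_env.c - sketch for verifying per-node settings; not an Intel MKL example */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <mkl.h>

int main(int argc, char *argv[])
{
    int rank, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];
    const char *blacs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &namelen);

    /* Report the thread count Intel MKL will use and the BLACS dispatch value. */
    blacs = getenv("MKL_BLACS_MPI");
    printf("rank %d on %s: mkl_get_max_threads()=%d, MKL_BLACS_MPI=%s\n",
           rank, host, mkl_get_max_threads(),
           blacs ? blacs : "(not set)");

    MPI_Finalize();
    return 0;
}

Launch it with the same mpiexec options you plan to use for the real run, for example with -genv OMP_NUM_THREADS as shown above, and compare the output lines across nodes.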
If you are using Microsoft MPI, the above ways of setting environment variables are also applicable if the Microsoft Single Program Multiple Data (SPMD) process managers are running in a debug mode on all nodes of the cluster. However, the best way to set environment variables is using the Job Scheduler with the Microsoft Management Console (MMC) and/or the Command Line Interface (CLI) to submit a job and pass environment variables. For more information about MMC and CLI, see the Microsoft Help and Support page at the Microsoft Web site (http://www.microsoft.com/). Building ScaLAPACK Tests To build ScaLAPACK tests, • For the IA-32 architecture, add mkl_scalapack_core.lib to your link command. • For the Intel® 64 architecture, add mkl_scalapack_lp64.lib or mkl_scalapack_ilp64.lib, depending on the desired interface. Examples for Linking with ScaLAPACK and Cluster FFT This section provides examples of linking with ScaLAPACK and Cluster FFT. Note that a binary linked with ScaLAPACK runs the same way as any other MPI application (refer to the documentation that comes with your MPI implementation). For further linking examples, see the support website for Intel products at http://www.intel.com/software/ products/support/. 8 Intel® Math Kernel Library for Windows* OS User's Guide 74See Also Directory Structure in Detail Examples for Linking a C Application These examples illustrate linking of an application whose main module is in C under the following conditions: • MPICH2 1.0.x is installed in c:\mpich2x64. • You use the Intel® C++ Compiler 10.0 or higher. To link with ScaLAPACK using LP64 interface for a cluster of Intel® 64 architecture based systems, set the environment variable and use the link line as follows: set lib=c:\mpich2x64\lib;\lib\intel64;%lib% icl mkl_scalapack_lp64.lib mkl_blacs_mpich2_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib mpi.lib cxx.lib bufferoverflowu.lib To link with Cluster FFT using LP64 interface for a cluster of Intel® 64 architecture based systems, set the environment variable and use the link line as follows: set lib=c:\mpich2x64\lib;\lib\intel64;%lib% icl mkl_cdft_core.lib mkl_blacs_mpich2_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib mpi.lib cxx.lib bufferoverflowu.lib See Also Linking with ScaLAPACK and Cluster FFTs Linking with System Libraries Examples for Linking a Fortran Application These examples illustrate linking of an application whose main module is in Fortran under the following conditions: • Microsoft Windows Compute Cluster Pack SDK is installed in c:\MS CCP SDK. • You use the Intel® Fortran Compiler 10.0 or higher. 
To link with ScaLAPACK using LP64 interface for a cluster of Intel® 64 architecture based systems, set the environment variable and use the link line as follows: set lib="c:\MS CCP SDK\Lib\AMD64";\lib\intel64;%lib% ifort mkl_scalapack_lp64.lib mkl_blacs_mpich2_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib msmpi.lib bufferoverflowu.lib To link with Cluster FFTs using LP64 interface for a cluster of Intel® 64 architecture based systems, set the environment variable and use the link line as follows: set lib="c:\MS CCP SDK\Lib\AMD64";\lib\intel64;%lib% ifort mkl_cdft_core.lib mkl_blacs_mpich2_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib msmpi.lib bufferoverflowu.lib See Also Linking with ScaLAPACK and Cluster FFTs Linking with System Libraries Working with the Intel® Math Kernel Library Cluster Software 8 758 Intel® Math Kernel Library for Windows* OS User's Guide 76Programming with Intel® Math Kernel Library in Integrated Development Environments (IDE) 9 Configuring Your Integrated Development Environment to Link with Intel Math Kernel Library Configuring the Microsoft Visual C/C++* Development System to Link with Intel® MKL Steps for configuring Microsoft Visual C/C++* Development System for linking with Intel® Math Kernel Library (Intel® MKL) depend on whether If you installed the C++ Integration(s) in Microsoft Visual Studio* component of the Intel® Composer XE: • If you installed the integration component, see Automatically Linking Your Microsoft Visual C/C++ Project with Intel MKL. • If you did not install the integration component or need more control over Intel MKL libraries to link, you can configure the Microsoft Visual C++* 2005, Visual C++* 2008, or Visual C++* 2010 development system by performing the following steps. Though some versions of the Visual C++* development system may vary slightly in the menu items mentioned below, the fundamental configuring steps are applicable to all these versions. 1. From the menu, select View > Solution Explorer (and make sure this window is active) 2. Select Tools > Options > Projects > VC++ Directories 3. From the Show directories for list, select Include Files. Add the directory for the Intel MKL include files, that is, \include 4. From the Show directories for list, select Library Files. Add architecture-specific directories for Intel MKL and OpenMP* libraries, for example: \lib\ia32 and \compiler\lib\ia32 5. From the Show directories for list, select Executable Files. Add architecture-specific directories with dynamic-link libraries: • For OpenMP* support, for example: \redist\ia32\compiler • For Intel MKL (only if you link dynamically), for example: \redist \ia32\mkl 6. Select Project>Properties>Configuration Properties>Linker>Input>Additional Dependencies. Add the libraries required, for example, mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib See Also Intel® Software Documentation Library Linking in Detail Configuring Intel® Visual Fortran to Link with Intel MKL Steps for configuring Intel® Visual Fortran for linking with Intel® Math Kernel Library (Intel® MKL) depend on whether you installed the Visual Fortran Integration(s) in Microsoft Visual Studio* component of the Intel® Composer XE: • If you installed the integration component, see Automatically Linking Your Intel® Visual Fortran Project with Intel® MKL. 
77• If you did not install the integration component or need more control over Intel MKL libraries to link, you can configure your project as follows: 1. Select Project>Properties>Linker>General>Additional Library Directories. Add architecturespecific directories for Intel MKL and OpenMP* libraries, for example: \lib\ia32 and \compiler\lib\ia32 2. Select Project>Properties>Linker>Input>Additional Dependencies. Insert names of the required libraries, for example: mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib 3. Select Project>Properties>Debugging>Environment. Add architecture-specific paths to dynamiclink libraries: • For OpenMP* support; for example: enter PATH=%PATH%;\redist \ia32\compiler • For Intel MKL (only if you link dynamically); for example: enter PATH=%PATH%;\redist\ia32\mkl See Also Intel® Software Documentation Library Running an Intel MKL Example in the Visual Studio* 2008 IDE This section explains how to create and configure projects with the Intel® Math Kernel Library (Intel® MKL) examples in Microsoft Visual Studio* 2008. For Intel MKL examples where the instructions below do not work, see Known Limitations. To run the Intel MKL C examples in Microsoft Visual Studio 2008: 1. Do either of the following: • Install Intel® C/C++ Compiler and integrate it into Visual Studio (recommended). • Use the Microsoft Visual C++* 2008 Compiler integrated into Visual Studio*. 2. Create, configure, and run the Intel C/C++ and/or Microsoft Visual C++* 2008. To run the Intel MKL Fortran examples in Microsoft Visual Studio 2008: 1. Install Intel® Visual Fortran Compiler and integrate it into Visual Studio. The default installation of the Intel Visual Fortran Compiler performs this integration. For more information, see the Intel Visual Fortran Compiler documentation. 2. Create, configure, and run the Intel Visual Fortran project. Creating, Configuring, and Running the Intel® C/C++ and/or Visual C++* 2008 Project This section demonstrates how to create a Visual C/C++ project using an Intel® Math Kernel Library (Intel® MKL) example in Microsoft Visual Studio 2008. The instructions below create a Win32/Debug project running one Intel MKL example in a Console window. For details on creation of different kinds of Microsoft Visual Studio projects, refer to MSDN Visual Studio documentation at http://www.microsoft.com. To create and configure the Win32/Debug project running an Intel MKL C example with the Intel® C/C++ Compiler integrated into Visual Studio and/or Microsoft Visual C++* 2008, perform the following steps: 1. Create a C Project: a. Open Visual Studio 2008. b. On the main menu, select File > New > Project to open the New Project window. c. Select Project Types > Visual C++ > Win32, then select Templates > Win32 Console Application. In the Name field, type , for example, MKL_CBLAS_CAXPYIX, and click OK. The New Project window closes, and the Win32 Application Wizard - window opens. d. Select Next, then select Application Settings, check Additional options > Empty project, and click Finish. 9 Intel® Math Kernel Library for Windows* OS User's Guide 78The Win32 Application Wizard - window closes. The next steps are performed inside the Solution Explorer window. To open it, select View > Solution Explorer from the main menu. 2. (optional) To switch to the Intel C/C++ project, right-click and from the drop-down menu, select Convert to use Intel® C++ Project System. (The menu item is available if the Intel® C/C++ Compiler is integrated into Visual Studio.) 3. 
Add sources of the Intel MKL example to the project: a. Right-click the Source Files folder under and select Add > Existing Item... from the drop-down menu. The Add Existing Item - window opens. b. Browse to the Intel MKL example directory, for example, \examples\cblas \source. Select the example file and supporting files with extension ".c" (C sources), for example, select files cblas_caxpyix.c and common_func.c For the list of supporting files in each example directory, see Support Files for Intel MKL Examples. Click Add. The Add Existing Item - window closes, and selected files appear in the Source Files folder in Solution Explorer. The next steps adjust the properties of the project. 4. Select . 5. On the main menu, select Project > Properties to open the Property Pages window. 6. Set Intel MKL Include dependencies: a. Select Configuration Properties > C/C++ > General. In the right-hand part of the window, select Additional Include Directories > ... (the browse button). The Additional Include Directories window opens. b. Click the New Line button (the first button in the uppermost row). When the new line appears in the window, click the browse button. The Select Directory window opens. c. Browse to the \include directory and click OK. The Select Directory window closes, and full path to the Intel MKL include directory appears in the Additional Include Directories window. d. Click OK to close the window. 7. Set library dependencies: a. Select Configuration Properties > Linker > General. In the right-hand part of the window, select Additional Library Directories > ... (the browse button). The Additional Library Directories window opens. b. Click the New Line button (the first button in the uppermost row). When the new line appears in the window, click the browse button. The Select Directory window opens. c. Browse to the directory with the Intel MKL libraries \lib\, where is one of {ia32, intel64}, for example: \lib\ia32. (For most laptop and desktop computers, is ia32.). Click OK. The Select Directory window closes, and the full path to the Intel MKL libraries appears in the Additional Library Directories window. d. Click the New Line button again. When the new line appears in the window, click the browse button. The Select Directory window opens. e. Browse to the compiler\lib\, where is one of { ia32, intel64 }, for example: \compiler\lib\ia32. Click OK. The Select Directory window closes, and the specified full path appears in the Additional Library Directories window. f. Click OK to close the Additional Library Directories window. g. Select Configuration Properties > Linker > Input. In the right-hand part of the window, select Additional Dependencies > ... (the browse button). The Additional Dependencies window opens. h. Type the libraries required, for example, if =ia32, type mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib For more details, see Linking in Detail. i. Click OK to close the Additional Dependencies window. Programming with Intel® Math Kernel Library in Integrated Development Environments (IDE) 9 79j. If the Intel MKL example directory does not contain a data directory, skip the next step. 8. Set data dependencies for the Intel MKL example: a. Select Configuration Properties > Debugging. In the right-hand part of the window, select Command Arguments > > . The Command Arguments window opens. b. Type the path to the proper data file in quotes. 
The name of the data file is the same as the name of the example file, with a "d" extension, for example, "\examples\cblas\data \cblas_caxpyix.d". c. Click OK to close the Command Arguments window. 9. Click OK to close the Property Pages window. 10.Certain examples do not pause before the end of execution. To see the results printed in the Console window, set a breakpoint at the very last 'return 0;' statement or add a call to 'getchar();' before the last 'return 0' statement. 11.To build the solution, select Build > Build Solution . NOTE You may see warnings about unsafe functions and variables. To get rid of these warnings, go to Project > Properties, and when the Property Pages window opens, go to Configuration Properties > C/C++ > Preprocessor. In the right-hand part of the window, select Preprocessor Definitions, add _CRT_SECURE_NO_WARNINGS, and click OK. 12.To run the example, select Debug > Start Debugging. The Console window opens. 13.You can see the results of the example in the Console window. If you used the 'getchar();' statement to pause execution of the program, press Enter to complete the run. If you used a breakpoint to pause execution of the program, select Debug > Continue. The Console window closes. See Also Running an Intel MKL Example in the Visual Studio* 2008 IDE Creating, Configuring, and Running the Intel Visual Fortran Project This section demonstrates how to create an Intel Visual Fortran project running an Intel MKL example in Microsoft Visual Studio 2008. The instructions below create a Win32/Debug project running one Intel MKL example in a Console window. For details on creation of different kinds of Microsoft Visual Studio projects, refer to MSDN Visual Studio documentation at http://www.microsoft.com. To create and configure a Win32/Debug project running the Intel MKL Fortran example with the Intel Visual Fortran Compiler integrated into Visual Studio, perform the following steps: 1. Create a Visual Fortran Project: a. Open Visual Studio 2008. b. On the main menu, select File > New > Project to open the New Project window. c. Select Project Types > Intel® Fortran > Console Application, then select Templates > Empty Project. When done, in the Name field, type for example, MKL_PDETTF_D_TRIG_TRANSFORM_BVP, and click OK. The New Project window closes. The next steps are performed inside the Solution Explorer window. To open it, select View>Solution Explorer from the main menu. 2. Add sources of Intel MKL example to the project: a. Right-click the Source Files folder under and select Add > Existing Item... from the drop-down menu. The Add Existing Item - window opens. b. Browse to the Intel MKL example directory, for example, \examples\pdettf \source. Select the example file and supporting files with extension ".f" or ".f90" (Fortran sources). For example, select the d_trig_tforms_bvp.f90 file. For the list of supporting files in each example directory, see Support Files for Intel MKL Examples. Click Add. 9 Intel® Math Kernel Library for Windows* OS User's Guide 80The Add Existing Item - window closes, and the selected files appear in the Source Files folder in Solution Explorer. Some examples with the "use" statements require the next two steps. c. Right-click the Header Files folder under and select Add > Existing Item... from the drop-down menu. The Add Existing Item - window opens. d. Browse to the \include directory. Select the header files that appear in the "use" statements. For example, select the mkl_dfti.f90 and mkl_trig_transforms.f90 files. Click Add. 
The Add Existing Item - window closes, and the selected files to appear in theHeader Filesfolder in Solution Explorer. The next steps adjust the properties of the project: 3. Select the . 4. On the main menu, select Project > Properties to open the Property Pages window. 5. Set the Intel MKL include dependencies: a. Select Configuration Properties > Fortran > General. In the right-hand part of the window, select Additional Include Directories > > . The Additional Include Directories window opens. b. Type the Intel MKL include directory in quotes: "\include". Click OK to close the window. 6. Select Configuration Properties > Fortran > Preprocessor. In the right-hand part of the window, select Preprocess Source File > Yes (default is No). This step is recommended because some examples require preprocessing. 7. Set library dependencies: a. Select Configuration Properties > Linker > General. In the right-hand part of the window, select Additional Library Directories > > . The Additional Library Directories window opens. b. Type the directory with the Intel MKL libraries in quotes, that is, "\lib \", where is one of { ia32, intel64 }, for example: "\lib\ia32". (For most laptop and desktop computers is ia32.) Click OK to close the window. c. Select Configuration Properties > Linker > Input. In the right-hand part of the window, select Additional Dependencies and type the libraries required, for example, if =ia32, type mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib. 8. In the Property Pages window, click OK to close the window. 9. Some examples do not pause before the end of execution. To see the results printed in the Console window, set a breakpoint at the very end of the program or add the 'pause' statement before the last 'end' statement. 10.To build the solution, select Build > Build Solution. 11.To run the example, select Debug > Start Debugging. The Console window opens. 12.You can see the results of the example in the Console window. If you used 'pause' statement to pause execution of the program, press Enter to complete the run. If you used a breakpoint to pause execution of the program, select Debug > Continue. The Console window closes. Support Files for Intel® Math Kernel Library Examples Below is the list of support files that have to be added to the project for respective examples: examples\cblas\source: common_func.c examples\dftc\source: dfti_example_status_print.c dfti_example_support.c Programming with Intel® Math Kernel Library in Integrated Development Environments (IDE) 9 81Known Limitations of the Project Creation Procedure You cannot create a Visual Studio* project using the instructions from Creating, Configuring, and Running the Intel® C/C++ and/or Visual C++* 2008 Project or Creating, Configuring, and Running the Intel® Visual Fortran Project for examples from the following directories: examples\blas examples\blas95 examples\cdftc examples\cdftf examples\dftf examples\fftw2x_cdf examples\fftw2xc examples\fftw2xf examples\fftw3xc examples\fftw3xf examples\java examples\lapack examples\lapack95 Getting Assistance for Programming in the Microsoft Visual Studio* IDE Viewing Intel MKL Documentation in Visual Studio* IDE Viewing Intel MKL Documentation in Document Explorer (Visual Studio* 2005/2008 IDE) Intel MKL documentation is integrated in the Visual Studio IDE (VS) help collection. To open Intel MKL help, 1. Select Help > Contents from the menu. This displays the list of VS Help collections. 2. Click Intel Math Kernel Library Help. 3. 
In the help tree that expands, click Intel MKL Reference Manual. To open the help index, select Help > Inde x from the menu. To search in the help, select Help > Search from the menu and enter a search string. 9 Intel® Math Kernel Library for Windows* OS User's Guide 82You can filter Visual Studio Help collections to show only content related to installed Intel tools. To do this, select "Intel" from the Filtered by list. This hides the contents and index entries for all collections that do not refer to Intel. Accessing Intel MKL Documentation in Visual Studio* 2010 IDE To access the Intel MKL documentation in Visual Studio* 2010 IDE: • Configure the IDE to use local help (once). To do this, Go to Help > Manage Help Settings and check I want to use online help • Use the Help > View Help menu item to view a list of available help collections and open the Intel MKL documentation. Using Context-Sensitive Help When typing your code in the Visual Studio* (VS) IDE Code Editor, you can get context-sensitive help using the F1 Help and Dynamic Help features. F1 Help To open the help topic relevant to the current selection, press F1. In particular, to open the help topic describing an Intel MKL function called in your code, select the function name and press F1. The topic with the function description opens in the window that displays search results: Programming with Intel® Math Kernel Library in Integrated Development Environments (IDE) 9 83Dynamic Help Dynamic Help also provides access to topics relevant to the current selection or to the text being typed. Links to all relevant topics are displayed in the Dynamic Help window. To get the list of relevant topics each time you select the Intel MKL function name or as you type it in your code, open the Dynamic Help window by selecting Help > Dynamic Help from the menu. To open a topic from the list, click the appropriate link in the Dynamic Help window, shown in the above figure. Typically only one link corresponds to each Intel MKL function. Using the IntelliSense* Capability IntelliSense is a set of native Visual Studio*(VS) IDE features that make language references easily accessible. The user programming with Intel MKL in the VS Code Editor can employ two IntelliSense features: Parameter Info and Complete Word. Both features use header files. Therefore, to benefit from IntelliSense, make sure the path to the include files is specified in the VS or solution settings. For example, see Configuring the Microsoft Visual C/C++* Development System to Link with Intel® MKL on how to do this. Parameter Info The Parameter Info feature displays the parameter list for a function to give information on the number and types of parameters. This feature requires adding the include statement with the appropriate Intel MKL header file to your code. To get the list of parameters of a function specified in the header file, 1. Type the function name. 2. Type the opening parenthesis. This brings up the tooltip with the list of the function parameters: 9 Intel® Math Kernel Library for Windows* OS User's Guide 84Complete Word For a software library, the Complete Word feature types or prompts for the rest of the name defined in the header file once you type the first few characters of the name in your code. This feature requires adding the include statement with the appropriate Intel MKL header file to your code. To complete the name of the function or named constant specified in the header file, 1. Type the first few characters of the name. 2. 
Press Alt+RIGHT ARROW or Ctrl+SPACEBAR. If you have typed enough characters to disambiguate the name, the rest of the name is typed automatically. Otherwise, a pop-up list appears with the names specified in the header file 3. Select the name from the list, if needed. Programming with Intel® Math Kernel Library in Integrated Development Environments (IDE) 9 859 Intel® Math Kernel Library for Windows* OS User's Guide 86LINPACK and MP LINPACK Benchmarks 10 Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Intel® Optimized LINPACK Benchmark for Windows* OS Intel® Optimized LINPACK Benchmark is a generalization of the LINPACK 1000 benchmark. It solves a dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. The generalization is in the number of equations (N) it can solve, which is not limited to 1000. It uses partial pivoting to assure the accuracy of the results. Do not use this benchmark to report LINPACK 100 performance because that is a compiled-code only benchmark. This is a shared-memory (SMP) implementation which runs on a single platform. Do not confuse this benchmark with: • MP LINPACK, which is a distributed memory version of the same benchmark. • LINPACK, the library, which has been expanded upon by the LAPACK library. Intel provides optimized versions of the LINPACK benchmarks to help you obtain high LINPACK benchmark results on your genuine Intel processor systems more easily than with the High Performance Linpack (HPL) benchmark. Use this package to benchmark your SMP machine. Additional information on this software as well as other Intel software performance products is available at http://www.intel.com/software/products/. Contents of the Intel® Optimized LINPACK Benchmark The Intel Optimized LINPACK Benchmark for Windows* OS contains the following files, located in the benchmarks\linpack\ subdirectory of the Intel® Math Kernel Library (Intel® MKL) directory: File in benchmarks \linpack\ Description linpack_xeon32.exe The 32-bit program executable for a system based on Intel® Xeon® processor or Intel® Xeon® processor MP with or without Streaming SIMD Extensions 3 (SSE3). linpack_xeon64.exe The 64-bit program executable for a system with Intel® Xeon® processor using Intel® 64 architecture. runme_xeon32.bat A sample shell script for executing a pre-determined problem set for linpack_xeon32.exe. OMP_NUM_THREADS set to 2 processors. runme_xeon64.bat A sample shell script for executing a pre-determined problem set for linpack_xeon64.exe. OMP_NUM_THREADS set to 4 processors. 87File in benchmarks \linpack\ Description lininput_xeon32 Input file for pre-determined problem for the runme_xeon32 script. 
lininput_xeon64       Input file for pre-determined problem for the runme_xeon64 script.
win_xeon32.txt        Result of the runme_xeon32 script execution.
win_xeon64.txt        Result of the runme_xeon64 script execution.
help.lpk              Simple help file.
xhelp.lpk             Extended help file.

See Also
High-level Directory Structure

Running the Software

To obtain results for the pre-determined sample problem sizes on a given system, type one of the following, as appropriate:

runme_xeon32.bat
runme_xeon64.bat

To run the software for other problem sizes, see the extended help included with the program. Extended help can be viewed by running the program executable with the -e option:

linpack_xeon32.exe -e
linpack_xeon64.exe -e

The pre-defined data input files lininput_xeon32 and lininput_xeon64 are provided merely as examples. Different systems have different numbers of processors or amounts of memory and thus require new input files. The extended help can be used for insight into proper ways to change the sample input files.

Each input file requires at least the following amount of memory:

lininput_xeon32    2 GB
lininput_xeon64    16 GB

If the system has less memory than the above sample data inputs require, you may need to edit or create your own data input files, as explained in the extended help.

Each sample script uses the OMP_NUM_THREADS environment variable to set the number of processors it is targeting. To optimize performance on a different number of physical processors, change that line appropriately. If you run the Intel Optimized LINPACK Benchmark without setting the number of threads, it will default to the number of cores according to the OS. You can find the settings for this environment variable in the runme_* sample scripts. If the settings do not yet match the situation for your machine, edit the script.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Known Limitations of the Intel® Optimized LINPACK Benchmark

The following limitations are known for the Intel Optimized LINPACK Benchmark for Windows* OS:

• Intel Optimized LINPACK Benchmark is threaded to effectively use multiple processors. So, in multi-processor systems, best performance will be obtained with the Intel® Hyper-Threading Technology turned off, which ensures that the operating system assigns threads to physical processors only.
• If an incomplete data input file is given, the binaries may either hang or fault. See the sample data input files and/or the extended help for insight into creating a correct data input file.
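To make the relationship between the measured time and the reported performance rate concrete, the sketch below times a single factor-and-solve of a random dense system with LAPACKE_dgesv and converts the elapsed time to a rate using the standard 2/3*n^3 + 2*n^2 LINPACK operation count. It is illustrative only, not a substitute for the provided benchmark binaries; the matrix size and file name are arbitrary assumptions, and dsecnd() is the Intel MKL timing support function.

/* solve_rate.c - illustrative only; use the linpack_xeon*.exe binaries for
   actual benchmarking. */
#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>
#include <mkl_lapacke.h>

int main(void)
{
    MKL_INT n = 2000, nrhs = 1, info, i;
    double t, flops;
    double  *a    = (double *)  mkl_malloc((size_t)n * n * sizeof(double), 16);
    double  *b    = (double *)  mkl_malloc((size_t)n * sizeof(double), 16);
    MKL_INT *ipiv = (MKL_INT *) mkl_malloc((size_t)n * sizeof(MKL_INT), 16);
    if (!a || !b || !ipiv) return 1;

    for (i = 0; i < n * n; i++) a[i] = (double) rand() / RAND_MAX;  /* random A */
    for (i = 0; i < n; i++)     b[i] = 1.0;                         /* rhs b    */

    t = dsecnd();                                     /* wall-clock timer */
    info = LAPACKE_dgesv(LAPACK_COL_MAJOR, n, nrhs, a, n, ipiv, b, n);
    t = dsecnd() - t;

    /* Standard LINPACK operation count for factor + solve. */
    flops = 2.0 / 3.0 * (double) n * n * n + 2.0 * (double) n * n;
    printf("info=%d  time=%.3f s  rate=%.2f GFLOPS\n", (int) info, t, flops / t / 1e9);

    mkl_free(a); mkl_free(b); mkl_free(ipiv);
    return 0;
}

It can be linked with the same library sets shown elsewhere in this guide, for example mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib for the IA-32 architecture.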
Intel® Optimized MP LINPACK Benchmark for Clusters Overview of the Intel® Optimized MP LINPACK Benchmark for Clusters The Intel® Optimized MP LINPACK Benchmark for Clusters is based on modifications and additions to HPL 2.0 from Innovative Computing Laboratories (ICL) at the University of Tennessee, Knoxville (UTK). The Intel Optimized MP LINPACK Benchmark for Clusters can be used for Top 500 runs (see http://www.top500.org). To use the benchmark you need be intimately familiar with the HPL distribution and usage. The Intel Optimized MP LINPACK Benchmark for Clusters provides some additional enhancements and bug fixes designed to make the HPL usage more convenient, as well as explain Intel® Message-Passing Interface (MPI) settings that may enhance performance. The .\benchmarks\mp_linpack directory adds techniques to minimize search times frequently associated with long runs. The Intel® Optimized MP LINPACK Benchmark for Clusters is an implementation of the Massively Parallel MP LINPACK benchmark by means of HPL code. It solves a random dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. You can solve any size (N) system of equations that fit into memory. The benchmark uses full row pivoting to ensure the accuracy of the results. Use the Intel Optimized MP LINPACK Benchmark for Clusters on a distributed memory machine. On a shared memory machine, use the Intel Optimized LINPACK Benchmark. Intel provides optimized versions of the LINPACK benchmarks to help you obtain high LINPACK benchmark results on your systems based on genuine Intel processors more easily than with the HPL benchmark. Use the Intel Optimized MP LINPACK Benchmark to benchmark your cluster. The prebuilt binaries require that you first install Intel® MPI 3.x be installed on the cluster. The run-time version of Intel MPI is free and can be downloaded from www.intel.com/software/products/ . The Intel package includes software developed at the University of Tennessee, Knoxville, Innovative Computing Laboratories and neither the University nor ICL endorse or promote this product. Although HPL 2.0 is redistributable under certain conditions, this particular package is subject to the Intel MKL license. Intel MKL has introduced a new functionality into MP LINPACK, which is called a hybrid build, while continuing to support the older version. The term hybrid refers to special optimizations added to take advantage of mixed OpenMP*/MPI parallelism. If you want to use one MPI process per node and to achieve further parallelism by means of OpenMP, use the hybrid build. In general, the hybrid build is useful when the number of MPI processes per core is less than one. If you want to rely exclusively on MPI for parallelism and use one MPI per core, use the non-hybrid build. In addition to supplying certain hybrid prebuilt binaries, Intel MKL supplies some hybrid prebuilt libraries for Intel® MPI to take advantage of the additional OpenMP* optimizations. If you wish to use an MPI version other than Intel MPI, you can do so by using the MP LINPACK source provided. You can use the source to build a non-hybrid version that may be used in a hybrid mode, but it would be missing some of the optimizations added to the hybrid version. Non-hybrid builds are the default of the source code makefiles provided. In some cases, the use of the hybrid mode is required for external reasons. 
If there is a choice, the non-hybrid code may be faster. To use the non-hybrid code in a hybrid mode, use the threaded version of Intel MKL BLAS, link with a thread-safe MPI, and call function MPI_init_thread() so as to indicate a need for MPI to be thread-safe. LINPACK and MP LINPACK Benchmarks 10 89Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 Contents of the Intel® Optimized MP LINPACK Benchmark for Clusters The Intel Optimized MP LINPACK Benchmark for Clusters (MP LINPACK Benchmark) includes the HPL 2.0 distribution in its entirety, as well as the modifications delivered in the files listed in the table below and located in the benchmarks\mp_linpack\ subdirectory of the Intel MKL directory. NOTE Because MP LINPACK Benchmark includes the entire HPL 2.0 distribution, which provides a configuration for Linux* OS only, some Linux OS files remain in the directory. Directory/File in benchmarks \mp_linpack\ Contents testing\ptest\HPL_pdtest.c HPL 2.0 code modified to display captured DGEMM information in ASYOUGO2_DISPLAY if it was captured (for details, see New Features). src\blas\HPL_dgemm.c HPL 2.0 code modified to capture DGEMM information, if desired, from ASYOUGO2_DISPLAY. src\grid\HPL_grid_init.c HPL 2.0 code modified to do additional grid experiments originally not in HPL 2.0. src\pgesv\HPL_pdgesvK2.c HPL 2.0 code modified to do ASYOUGO and ENDEARLY modifications. src\pgesv\HPL_pdgesv0.c HPL 2.0 code modified to do ASYOUGO, ASYOUGO2, and ENDEARLY modifications. testing\ptest\HPL.dat HPL 2.0 sample HPL.dat modified. makes All the makefiles in this directory have been rebuilt in the Windows OS distribution. testing\ptimer\ Some files in here have been modified in the Windows OS distribution. testing\timer\ Some files in here have been modified in the Windows OS distribution. Make (New) Sample architecture makefile for nmake utility to be used on processors based on the IA-32 and Intel® 64 architectures and Windows OS. bin_intel\ia32\xhpl_ia32.exe (New) Prebuilt binary for the IA-32 architecture, Windows OS, and Intel® MPI. bin_intel \intel64\xhpl_intel64.exe (New) Prebuilt binary for the Intel® 64 architecture, Windows OS, and Intel MPI. 10 Intel® Math Kernel Library for Windows* OS User's Guide 90Directory/File in benchmarks \mp_linpack\ Contents lib_hybrid \ia32\libhpl_hybrid.lib (New) Prebuilt library with the hybrid version of MP LINPACK for the IA-32 architecture and Intel MPI. lib_hybrid \intel64\libhpl_hybrid.lib (New) Prebuilt library with the hybrid version of MP LINPACK for the Intel® 64 architecture and Intel MPI. bin_intel \ia32\xhpl_hybrid_ia32.exe (New) Prebuilt hybrid binary for the IA-32 architecture, Windows OS, and Intel MPI. 
bin_intel \intel64\xhpl_hybrid_intel64.exe (New) Prebuilt hybrid binary for the Intel® 64 architecture, Windows OS, and Intel MPI. nodeperf.c (New) Sample utility that tests the DGEMM speed across the cluster. See Also High-level Directory Structure Building the MP LINPACK The MP LINPACK Benchmark contains a few sample architecture makefiles. You can edit them to fit your specific configuration. Specifically: • Set TOPdir to the directory that MP LINPACK is being built in. • Set MPI variables, that is, MPdir, MPinc, and MPlib. • Specify the location Intel MKL and of files to be used (LAdir, LAinc, LAlib). • Adjust compiler and compiler/linker options. • Specify the version of MP LINPACK you are going to build (hybrid or non-hybrid) by setting the version parameter for the nmake command. For example: nmake arch=intel64 mpi=intelmpi version=hybrid install For some sample cases, the makefiles contain values that must be common. However, you need to be familiar with building an HPL and picking appropriate values for these variables. New Features of Intel® Optimized MP LINPACK Benchmark The toolset is basically identical with the HPL 2.0 distribution. There are a few changes that are optionally compiled in and disabled until you specifically request them. These new features are: ASYOUGO: Provides non-intrusive performance information while runs proceed. There are only a few outputs and this information does not impact performance. This is especially useful because many runs can go for hours without any information. ASYOUGO2: Provides slightly intrusive additional performance information by intercepting every DGEMM call. ASYOUGO2_DISPLAY: Displays the performance of all the significant DGEMMs inside the run. ENDEARLY: Displays a few performance hints and then terminates the run early. FASTSWAP: Inserts the LAPACK-optimized DLASWP into HPL's code. You can experiment with this to determine best results. HYBRID: Establishes the Hybrid OpenMP/MPI mode of MP LINPACK, providing the possibility to use threaded Intel MKL and prebuilt MP LINPACK hybrid libraries. CAUTION Use this option only with an Intel compiler and the Intel® MPI library version 3.1 or higher. You are also recommended to use the compiler version 10.0 or higher. LINPACK and MP LINPACK Benchmarks 10 91Benchmarking a Cluster To benchmark a cluster, follow the sequence of steps below (some of them are optional). Pay special attention to the iterative steps 3 and 4. They make a loop that searches for HPL parameters (specified in HPL.dat) that enable you to reach the top performance of your cluster. 1. Install HPL and make sure HPL is functional on all the nodes. 2. You may run nodeperf.c (included in the distribution) to see the performance of DGEMM on all the nodes. Compile nodeperf.c with your MPI and Intel MKL. For example: icl /Za /O3 /w /D_WIN_ /I"\include" "\" "\lib\intel64\mkl_core.lib" "\lib\intel64\libiomp5md.lib" nodeperf.c where is msmpi.lib in the case of Microsoft* MPI and mpi.lib in the case of MPICH. Launching nodeperf.c on all the nodes is especially helpful in a very large cluster. nodeperf enables quick identification of the potential problem spot without numerous small MP LINPACK runs around the cluster in search of the bad node. It goes through all the nodes, one at a time, and reports the performance of DGEMM followed by some host identifier. Therefore, the higher the DGEMM performance, the faster that node was performing. 3. Edit HPL.dat to fit your cluster needs. Read through the HPL documentation for ideas on this. 
Note, however, that you should use at least 4 nodes. 4. Make an HPL run, using compile options such as ASYOUGO, ASYOUGO2, or ENDEARLY to aid in your search. These options enable you to gain insight into the performance sooner than HPL would normally give this insight. When doing so, follow these recommendations: • Use MP LINPACK, which is a patched version of HPL, to save time in the search. All performance intrusive features are compile-optional in MP LINPACK. That is, if you do not use the new options to reduce search time, these features are disabled. The primary purpose of the additions is to assist you in finding solutions. HPL requires a long time to search for many different parameters. In MP LINPACK, the goal is to get the best possible number. Given that the input is not fixed, there is a large parameter space you must search over. An exhaustive search of all possible inputs is improbably large even for a powerful cluster. MP LINPACK optionally prints information on performance as it proceeds. You can also terminate early. • Save time by compiling with -DENDEARLY -DASYOUGO2 and using a negative threshold (do not use a negative threshold on the final run that you intend to submit as a Top500 entry). Set the threshold in line 13 of the HPL 2.0 input file HPL.dat • If you are going to run a problem to completion, do it with -DASYOUGO. 5. Using the quick performance feedback, return to step 3 and iterate until you are sure that the performance is as good as possible. See Also Options to Reduce Search Time Options to Reduce Search Time Running large problems to completion on large numbers of nodes can take many hours. The search space for MP LINPACK is also large: not only can you run any size problem, but over a number of block sizes, grid layouts, lookahead steps, using different factorization methods, and so on. It can be a large waste of time to run a large problem to completion only to discover it ran 0.01% slower than your previous best problem. Use the following options to reduce the search time: 10 Intel® Math Kernel Library for Windows* OS User's Guide 92• -DASYOUGO • -DENDEARLY • -DASYOUGO2 Use -DASYOUGO2 cautiously because it does have a marginal performance impact. To see DGEMM internal performance, compile with -DASYOUGO2 and -DASYOUGO2_DISPLAY. These options provide a lot of useful DGEMM performance information at the cost of around 0.2% performance loss. If you want to use the old HPL, simply omit these options and recompile from scratch. To do this, try "nmake arch= clean_arch_all". -DASYOUGO -DASYOUGO gives performance data as the run proceeds. The performance always starts off higher and then drops because this actually happens in LU decomposition (a decomposition of a matrix into a product of a lower (L) and upper (U) triangular matrices). The ASYOUGO performance estimate is usually an overestimate (because the LU decomposition slows down as it goes), but it gets more accurate as the problem proceeds. The greater the lookahead step, the less accurate the first number may be. ASYOUGO tries to estimate where one is in the LU decomposition that MP LINPACK performs and this is always an overestimate as compared to ASYOUGO2, which measures actually achieved DGEMM performance. Note that the ASYOUGO output is a subset of the information that ASYOUGO2 provides. So, refer to the description of the -DASYOUGO2 option below for the details of the output. 
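As a rough illustration of the kind of estimate ASYOUGO reports (this is not the actual benchmark code), note that the flops needed to eliminate the first j of n columns of an LU factorization are approximately (2/3)*(n^3 - (n - j)^3); dividing by the elapsed wall-clock time gives a rate that starts as an overestimate and settles as the run proceeds, matching the behavior described above. The numbers in main() are hypothetical.

/* progress_rate.c - illustrative sketch only; ASYOUGO itself is compiled into HPL */
#include <stdio.h>

/* Estimated GFLOPS after cols_done of n columns have been eliminated. */
static double lu_estimate_gflops(double n, double cols_done, double seconds)
{
    double remaining = n - cols_done;
    double flops = (2.0 / 3.0) * (n * n * n - remaining * remaining * remaining);
    return flops / seconds / 1e9;
}

int main(void)
{
    /* Hypothetical values: N=16000, 1280 columns finished after 30 s. */
    printf("estimated rate: %.1f GFLOPS\n",
           lu_estimate_gflops(16000.0, 1280.0, 30.0));
    return 0;
}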
-DENDEARLY -DENDEARLY t erminates the problem after a few steps, so that you can set up 10 or 20 HPL runs without monitoring them, see how they all do, and then only run the fastest ones to completion. -DENDEARLY assumes -DASYOUGO. You do not need to define both, although it doesn't hurt. To avoid the residual check for a problem that terminates early, set the "threshold" parameter in HPL.dat to a negative number when testing ENDEARLY. It also sometimes gives a better picture to compile with -DASYOUGO2 when using - DENDEARLY. Usage notes on -DENDEARLY follow: • -DENDEARLY stops the problem after a few iterations of DGEMM on the block size (the bigger the blocksize, the further it gets). It prints only 5 or 6 "updates", whereas -DASYOUGO prints about 46 or so output elements before the problem completes. • Performance for -DASYOUGO and -DENDEARLY always starts off at one speed, slowly increases, and then slows down toward the end (because that is what LU does). -DENDEARLY is likely to terminate before it starts to slow down. • -DENDEARLY terminates the problem early with an HPL Error exit. It means that you need to ignore the missing residual results, which are wrong because the problem never completed. However, you can get an idea what the initial performance was, and if it looks good, then run the problem to completion without - DENDEARLY. To avoid the error check, you can set HPL's threshold parameter in HPL.dat to a negative number. • Though -DENDEARLY terminates early, HPL treats the problem as completed and computes Gflop rating as though the problem ran to completion. Ignore this erroneously high rating. • The bigger the problem, the more accurately the last update that -DENDEARLY returns is close to what happens when the problem runs to completion. -DENDEARLY is a poor approximation for small problems. It is for this reason that you are suggested to use ENDEARLY in conjunction with ASYOUGO2, because ASYOUGO2 reports actual DGEMM performance, which can be a closer approximation to problems just starting. LINPACK and MP LINPACK Benchmarks 10 93-DASYOUGO2 -DASYOUGO2 gives detailed single-node DGEMM performance information. It captures all DGEMM calls (if you use Fortran BLAS) and records their data. Because of this, the routine has a marginal intrusive overhead. Unlike -DASYOUGO, which is quite non-intrusive, -DASYOUGO2 interrupts every DGEMM call to monitor its performance. You should beware of this overhead, although for big problems, it is, less than 0.1%. Here is a sample ASYOUGO2 output (the first 3 non-intrusive numbers can be found in ASYOUGO and ENDEARLY), so it suffices to describe these numbers here: Col=001280 Fract=0.050 Mflops=42454.99 (DT=9.5 DF=34.1 DMF=38322.78). The problem size was N=16000 with a block size of 128. After 10 blocks, that is, 1280 columns, an output was sent to the screen. Here, the fraction of columns completed is 1280/16000=0.08. 
Only up to 40 outputs are printed, at various places through the matrix decomposition: fractions 0.005 0.010 0.015 0.020 0.025 0.030 0.035 0.040 0.045 0.050 0.055 0.060 0.065 0.070 0.075 0.080 0.085 0.090 0.095 0.100 0.105 0.110 0.115 0.120 0.125 0.130 0.135 0.140 0.145 0.150 0.155 0.160 0.165 0.170 0.175 0.180 0.185 0.190 0.195 0.200 0.205 0.210 0.215 0.220 0.225 0.230 0.235 0.240 0.245 0.250 0.255 0.260 0.265 0.270 0.275 0.280 0.285 0.290 0.295 0.300 0.305 0.310 0.315 0.320 0.325 0.330 0.335 0.340 0.345 0.350 0.355 0.360 0.365 0.370 0.375 0.380 0.385 0.390 0.395 0.400 0.405 0.410 0.415 0.420 0.425 0.430 0.435 0.440 0.445 0.450 0.455 0.460 0.465 0.470 0.475 0.480 0.485 0.490 0.495 0.515 0.535 0.555 0.575 0.595 0.615 0.635 0.655 0.675 0.695 0.795 0.895. However, this problem size is so small and the block size so big by comparison that as soon as it prints the value for 0.045, it was already through 0.08 fraction of the columns. On a really big problem, the fractional number will be more accurate. It never prints more than the 112 numbers above. So, smaller problems will have fewer than 112 updates, and the biggest problems will have precisely 112 updates. Mflops is an estimate based on 1280 columns of LU being completed. However, with lookahead steps, sometimes that work is not actually completed when the output is made. Nevertheless, this is a good estimate for comparing identical runs. The 3 numbers in parenthesis are intrusive ASYOUGO2 addins. DT is the total time processor 0 has spent in DGEMM. DF is the number of billion operations that have been performed in DGEMM by one processor. Hence, the performance of processor 0 (in Gflops) in DGEMM is always DF/DT. Using the number of DGEMM flops as a basis instead of the number of LU flops, you get a lower bound on performance of the run by looking at DMF, which can be compared to Mflops above (It uses the global LU time, but the DGEMM flops are computed under the assumption that the problem is evenly distributed amongst the nodes, as only HPL's node (0,0) returns any output.) Note that when using the above performance monitoring tools to compare different HPL.dat input data sets, you should be aware that the pattern of performance drop-off that LU experiences is sensitive to some input data. For instance, when you try very small problems, the performance drop-off from the initial values to end values is very rapid. The larger the problem, the less the drop-off, and it is probably safe to use the first few performance values to estimate the difference between a problem size 700000 and 701000, for instance. Another factor that influences the performance drop-off is the grid dimensions (P and Q). For big problems, the performance tends to fall off less from the first few steps when P and Q are roughly equal in value. You can make use of a large number of parameters, such as broadcast types, and change them so that the final performance is determined very closely by the first few steps. Using these tools will greatly assist the amount of data you can test. See Also Benchmarking a Cluster 10 Intel® Math Kernel Library for Windows* OS User's Guide 94Intel® Math Kernel Library Language Interfaces Support A Language Interfaces Support, by Function Domain The following table shows language interfaces that Intel® Math Kernel Library (Intel® MKL) provides for each function domain. However, Intel MKL routines can be called from other languages using mixed-language programming. 
Note that when using the above performance monitoring tools to compare different HPL.dat input data sets, you should be aware that the pattern of performance drop-off that LU experiences is sensitive to some input data. For instance, when you try very small problems, the performance drop-off from the initial values to the end values is very rapid. The larger the problem, the smaller the drop-off, and it is probably safe to use the first few performance values to estimate the difference between, for instance, problem sizes 700000 and 701000. Another factor that influences the performance drop-off is the grid dimensions (P and Q). For big problems, the performance tends to fall off less over the first few steps when P and Q are roughly equal in value. You can make use of a large number of parameters, such as broadcast types, and change them so that the final performance is determined very closely by the first few steps. Using these tools greatly expands the amount of data you can test.
See Also
Benchmarking a Cluster

Appendix A: Intel® Math Kernel Library Language Interfaces Support

Language Interfaces Support, by Function Domain
The following table shows the language interfaces that Intel® Math Kernel Library (Intel® MKL) provides for each function domain. However, Intel MKL routines can be called from other languages using mixed-language programming. See Mixed-language Programming with Intel® MKL for an example of how to call Fortran routines from C/C++.

Function Domain | FORTRAN 77 interface | Fortran 90/95 interface | C/C++ interface
Basic Linear Algebra Subprograms (BLAS) | Yes | Yes | via CBLAS
BLAS-like extension transposition routines | Yes | | Yes
Sparse BLAS Level 1 | Yes | Yes | via CBLAS
Sparse BLAS Level 2 and 3 | Yes | Yes | Yes
LAPACK routines for solving systems of linear equations | Yes | Yes | Yes
LAPACK routines for solving least-squares problems, eigenvalue and singular value problems, and Sylvester's equations | Yes | Yes | Yes
Auxiliary and utility LAPACK routines | Yes | | Yes
Parallel Basic Linear Algebra Subprograms (PBLAS) | Yes | |
ScaLAPACK routines | Yes | | †
DSS/PARDISO* solvers | Yes | Yes | Yes
Other Direct and Iterative Sparse Solver routines | Yes | Yes | Yes
Vector Mathematical Library (VML) functions | Yes | Yes | Yes
Vector Statistical Library (VSL) functions | Yes | Yes | Yes
Fourier Transform functions (FFT) | | Yes | Yes
Cluster FFT functions | | Yes | Yes
Trigonometric Transform routines | | Yes | Yes
Fast Poisson, Laplace, and Helmholtz Solver (Poisson Library) routines | | Yes | Yes
Optimization (Trust-Region) Solver routines | Yes | Yes | Yes
Data Fitting functions | Yes | Yes | Yes
GMP* arithmetic functions†† | | | Yes
Support functions (including memory allocation) | Yes | Yes | Yes

† Supported using a mixed language programming call. See Intel® MKL Include Files for the respective header file.
†† GMP Arithmetic Functions are deprecated and will be removed in a future release.

Include Files

Function domain | Fortran Include Files | C/C++ Include Files
All function domains | mkl.fi | mkl.h
BLAS Routines | blas.f90, mkl_blas.fi | mkl_blas.h
BLAS-like Extension Transposition Routines | mkl_trans.fi | mkl_trans.h
CBLAS Interface to BLAS | | mkl_cblas.h
Sparse BLAS Routines | mkl_spblas.fi | mkl_spblas.h
LAPACK Routines | lapack.f90, mkl_lapack.fi | mkl_lapack.h
C Interface to LAPACK | | mkl_lapacke.h
ScaLAPACK Routines | | mkl_scalapack.h
All Sparse Solver Routines | mkl_solver.f90 | mkl_solver.h
PARDISO | mkl_pardiso.f77, mkl_pardiso.f90 | mkl_pardiso.h
DSS Interface | mkl_dss.f77, mkl_dss.f90 | mkl_dss.h
RCI Iterative Solvers, ILU Factorization | mkl_rci.fi | mkl_rci.h
Optimization Solver Routines | mkl_rci.fi | mkl_rci.h
Vector Mathematical Functions | mkl_vml.f77, mkl_vml.f90 | mkl_vml.h
Vector Statistical Functions | mkl_vsl.f77, mkl_vsl.f90 | mkl_vsl_functions.h
Fourier Transform Functions | mkl_dfti.f90 | mkl_dfti.h
Cluster Fourier Transform Functions | mkl_cdft.f90 | mkl_cdft.h
Partial Differential Equations Support Routines, Trigonometric Transforms | mkl_trig_transforms.f90 | mkl_trig_transforms.h
Partial Differential Equations Support Routines, Poisson Solvers | mkl_poisson.f90 | mkl_poisson.h
Data Fitting functions | mkl_df.f77, mkl_df.f90 | mkl_df.h
GMP interface† | | mkl_gmp.h
Support functions | mkl_service.f90, mkl_service.fi | mkl_service.h
Memory allocation routines | | i_malloc.h
Intel MKL examples interface | | mkl_example.h

† GMP Arithmetic Functions are deprecated and will be removed in a future release.

See Also
Language Interfaces Support, by Function Domain
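As a minimal sketch of how these include files are used from C, the program below pulls in the umbrella header mkl.h (which in turn includes the domain-specific headers such as mkl_cblas.h) and calls one CBLAS routine; the link line follows the usual interface/threading/core layering described in the linking chapters of this guide:

    #include <stdio.h>
    #include <mkl.h>              /* umbrella header; includes mkl_cblas.h and others */

    int main(void)
    {
        double x[3] = {1.0, 2.0, 3.0};
        double y[3] = {4.0, 5.0, 6.0};
        /* CBLAS interface to BLAS: double-precision dot product with unit strides */
        double d = cblas_ddot(3, x, 1, y, 1);
        printf("dot = %.1f\n", d); /* prints 32.0 */
        return 0;
    }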
Appendix B: Support for Third-Party Interfaces

GMP* Functions
Intel® Math Kernel Library (Intel® MKL) implementation of GMP* arithmetic functions includes arbitrary precision arithmetic operations on integer numbers. The interfaces of such functions fully match the GNU Multiple Precision* (GMP) Arithmetic Library. For specifications of these functions, please see http://software.intel.com/sites/products/documentation/hpc/mkl/gnump/index.htm.

NOTE Intel MKL GMP Arithmetic Functions are deprecated and will be removed in a future release.

If you currently use the GMP* library, you need to modify INCLUDE statements in your programs to mkl_gmp.h.

FFTW Interface Support
Intel® Math Kernel Library (Intel® MKL) offers two collections of wrappers for the FFTW interface (www.fftw.org). The wrappers are the superstructure of FFTW to be used for calling the Intel MKL Fourier transform functions. These collections correspond to the FFTW versions 2.x and 3.x and the Intel MKL versions 7.0 and later. These wrappers enable using Intel MKL Fourier transforms to improve the performance of programs that use FFTW without changing the program source code. See the "FFTW Interface to Intel® Math Kernel Library" appendix in the Intel MKL Reference Manual for details on the use of the wrappers.

Important For ease of use, the FFTW3 interface is also integrated in Intel MKL.

Appendix C: Directory Structure in Detail

Tables in this section show the contents of the Intel® Math Kernel Library (Intel® MKL) architecture-specific directories.

Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804

Detailed Structure of the IA-32 Architecture Directories

Static Libraries in the lib\ia32 Directory

File | Contents
Interface layer
mkl_intel_c.lib | cdecl interface library
mkl_intel_s.lib | CVF default interface library
mkl_blas95.lib | Fortran 95 interface library for BLAS. Supports the Intel® Fortran compiler
mkl_lapack95.lib | Fortran 95 interface library for LAPACK. Supports the Intel® Fortran compiler
Threading layer
mkl_intel_thread.lib | Threading library for the Intel compilers
mkl_pgi_thread.lib | Threading library for the PGI* compiler
mkl_sequential.lib | Sequential library
Computational layer
mkl_core.lib | Kernel library for IA-32 architecture
mkl_solver.lib | Deprecated. Empty library for backward compatibility
mkl_solver_sequential.lib | Deprecated. Empty library for backward compatibility
mkl_scalapack_core.lib | ScaLAPACK routines
mkl_cdft_core.lib | Cluster version of FFTs
Run-time Libraries (RTL)
mkl_blacs_intelmpi.lib | BLACS routines supporting Intel MPI
mkl_blacs_mpich2.lib | BLACS routines supporting MPICH2

Dynamic Libraries in the lib\ia32 Directory

File | Contents
mkl_rt.lib | Single Dynamic Library to be used for linking
Interface layer
mkl_intel_c_dll.lib | cdecl interface library for dynamic linking
mkl_intel_s_dll.lib | CVF default interface library for dynamic linking
Threading layer
mkl_intel_thread_dll.lib | Threading library for dynamic linking with the Intel compilers
mkl_pgi_thread_dll.lib | Threading library for dynamic linking with the PGI* compiler
mkl_sequential_dll.lib | Sequential library for dynamic linking
Computational layer
mkl_core_dll.lib | Core library for dynamic linking
mkl_scalapack_core_dll.lib | ScaLAPACK routine library for dynamic linking
mkl_cdft_core_dll.lib | Cluster FFT library for dynamic linking
Run-time Libraries (RTL)
mkl_blacs_dll.lib | BLACS interface library for dynamic linking

Contents of the redist\ia32\mkl Directory

File | Contents
mkl_rt.dll | Single Dynamic Library
Threading layer
mkl_intel_thread.dll | Dynamic threading library for the Intel compilers
mkl_pgi_thread.dll | Dynamic threading library for the PGI* compiler
mkl_sequential.dll | Dynamic sequential library
Computational layer
mkl_core.dll | Core library containing processor-independent code and a dispatcher for dynamic loading of processor-specific code
mkl_def.dll | Default kernel (Intel® Pentium®, Pentium® Pro, Pentium® II, and Pentium® III processors)
mkl_p4.dll | Pentium® 4 processor kernel
mkl_p4p.dll | Kernel for the Intel® Pentium® 4 processor with Streaming SIMD Extensions 3 (SSE3), including Intel® Core™ Duo and Intel® Core™ Solo processors
mkl_p4m.dll | Kernel for processors based on the Intel® Core™ microarchitecture (except Intel® Core™ Duo and Intel® Core™ Solo processors, for which mkl_p4p.dll is intended)
mkl_p4m3.dll | Kernel for the Intel® Core™ i7 processors
mkl_vml_def.dll | VML/VSL part of default kernel for old Intel® Pentium® processors
mkl_vml_ia.dll | VML/VSL default kernel for newer Intel® architecture processors
mkl_vml_p4.dll | VML/VSL part of Pentium® 4 processor kernel
mkl_vml_p4p.dll | VML/VSL for Pentium® 4 processor with Streaming SIMD Extensions 3 (SSE3)
mkl_vml_p4m.dll | VML/VSL for processors based on the Intel® Core™ microarchitecture (except Intel® Core™ Duo and Intel® Core™ Solo processors, for which mkl_vml_p4p.dll is intended)
mkl_vml_p4m2.dll | VML/VSL for 45nm Hi-k Intel® Core™2 and Intel Xeon® processor families
mkl_vml_p4m3.dll | VML/VSL for the Intel® Core™ i7 processors
mkl_vml_avx.dll | VML/VSL optimized for the Intel® Advanced Vector Extensions (Intel® AVX)
mkl_scalapack_core.dll | ScaLAPACK routines
mkl_cdft_core.dll | Cluster FFT dynamic library
libimalloc.dll | Dynamic library to support renaming of memory functions
Run-time Libraries (RTL)
mkl_blacs.dll | BLACS routines
mkl_blacs_intelmpi.dll | BLACS routines supporting Intel MPI
mkl_blacs_mpich2.dll | BLACS routines supporting MPICH2
1033\mkl_msg.dll | Catalog of Intel® Math Kernel Library (Intel® MKL) messages in English
1041\mkl_msg.dll | Catalog of Intel MKL messages in Japanese. Available only if the Intel® MKL package provides Japanese localization. Please see the Release Notes for this information.
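To show how one library is picked from each layer listed above, here is an illustrative IA-32 static link line. It is only a sketch: it assumes the Intel C/C++ compiler (icl), the cdecl interface, OpenMP threading, and the libiomp5md.lib OpenMP run-time library; see the linking chapters of this guide for the combinations supported in your environment.

    icl myprog.c mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib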
Detailed Structure of the Intel® 64 Architecture Directories

Static Libraries in the lib\intel64 Directory

File | Contents
Interface layer
mkl_intel_lp64.lib | LP64 interface library for the Intel compilers
mkl_intel_ilp64.lib | ILP64 interface library for the Intel compilers
mkl_intel_sp2dp.a | SP2DP interface library for the Intel compilers
mkl_blas95_lp64.lib | Fortran 95 interface library for BLAS. Supports the Intel® Fortran compiler and LP64 interface
mkl_blas95_ilp64.lib | Fortran 95 interface library for BLAS. Supports the Intel® Fortran compiler and ILP64 interface
mkl_lapack95_lp64.lib | Fortran 95 interface library for LAPACK. Supports the Intel® Fortran compiler and LP64 interface
mkl_lapack95_ilp64.lib | Fortran 95 interface library for LAPACK. Supports the Intel® Fortran compiler and ILP64 interface
Threading layer
mkl_intel_thread.lib | Threading library for the Intel compilers
mkl_pgi_thread.lib | Threading library for the PGI* compiler
mkl_sequential.lib | Sequential library
Computational layer
mkl_core.lib | Kernel library for the Intel® 64 architecture
mkl_solver_lp64.lib | Deprecated. Empty library for backward compatibility
mkl_solver_lp64_sequential.lib | Deprecated. Empty library for backward compatibility
mkl_solver_ilp64.lib | Deprecated. Empty library for backward compatibility
mkl_solver_ilp64_sequential.lib | Deprecated. Empty library for backward compatibility
mkl_scalapack_lp64.lib | ScaLAPACK routine library supporting the LP64 interface
mkl_scalapack_ilp64.lib | ScaLAPACK routine library supporting the ILP64 interface
mkl_cdft_core.lib | Cluster version of FFTs
Run-time Libraries (RTL)
mkl_blacs_intelmpi_lp64.lib | LP64 version of BLACS routines supporting Intel MPI
mkl_blacs_intelmpi_ilp64.lib | ILP64 version of BLACS routines supporting Intel MPI
mkl_blacs_mpich2_lp64.lib | LP64 version of BLACS routines supporting MPICH2
mkl_blacs_mpich2_ilp64.lib | ILP64 version of BLACS routines supporting MPICH2
mkl_blacs_msmpi_lp64.lib | LP64 version of BLACS routines supporting Microsoft* MPI
mkl_blacs_msmpi_ilp64.lib | ILP64 version of BLACS routines supporting Microsoft* MPI

Dynamic Libraries in the lib\intel64 Directory

File | Contents
mkl_rt.lib | Single Dynamic Library to be used for linking
Interface layer
mkl_intel_lp64_dll.lib | LP64 interface library for dynamic linking with the Intel compilers
mkl_intel_ilp64_dll.lib | ILP64 interface library for dynamic linking with the Intel compilers
Threading layer
mkl_intel_thread_dll.lib | Threading library for dynamic linking with the Intel compilers
mkl_pgi_thread_dll.lib | Threading library for dynamic linking with the PGI* compiler
mkl_sequential_dll.lib | Sequential library for dynamic linking
Computational layer
mkl_core_dll.lib | Core library for dynamic linking
mkl_scalapack_lp64_dll.lib | ScaLAPACK routine library for dynamic linking supporting the LP64 interface
mkl_scalapack_ilp64_dll.lib | ScaLAPACK routine library for dynamic linking supporting the ILP64 interface
mkl_cdft_core_dll.lib | Cluster FFT library for dynamic linking
Run-time Libraries (RTL)
mkl_blacs_lp64_dll.lib | LP64 version of BLACS interface library for dynamic linking
mkl_blacs_ilp64_dll.lib | ILP64 version of BLACS interface library for dynamic linking

Contents of the redist\intel64\mkl Directory

File | Contents
mkl_rt.dll | Single Dynamic Library
Threading layer
mkl_intel_thread.dll | Dynamic threading library for the Intel compilers
mkl_pgi_thread.dll | Dynamic threading library for the PGI* compiler
mkl_sequential.dll | Dynamic sequential library
Computational layer
mkl_core.dll | Core library containing processor-independent code and a dispatcher for dynamic loading of processor-specific code
mkl_def.dll | Default kernel for the Intel® 64 architecture
mkl_p4n.dll | Kernel for the Intel® Xeon® processor using the Intel® 64 architecture
mkl_mc.dll | Kernel for processors based on the Intel® Core™ microarchitecture
mkl_mc3.dll | Kernel for the Intel® Core™ i7 processors
mkl_avx.dll | Kernel optimized for the Intel® Advanced Vector Extensions (Intel® AVX)
mkl_vml_def.dll | VML/VSL part of default kernel
mkl_vml_p4n.dll | VML/VSL for the Intel® Xeon® processor using the Intel® 64 architecture
mkl_vml_mc.dll | VML/VSL for processors based on the Intel® Core™ microarchitecture
mkl_vml_mc2.dll | VML/VSL for 45nm Hi-k Intel® Core™2 and Intel Xeon® processor families
mkl_vml_mc3.dll | VML/VSL for the Intel® Core™ i7 processors
mkl_vml_avx.dll | VML/VSL optimized for the Intel® Advanced Vector Extensions (Intel® AVX)
mkl_scalapack_lp64.dll | ScaLAPACK routine library supporting the LP64 interface
mkl_scalapack_ilp64.dll | ScaLAPACK routine library supporting the ILP64 interface
mkl_cdft_core.dll | Cluster FFT dynamic library
libimalloc.dll | Dynamic library to support renaming of memory functions
Run-time Libraries (RTL)
mkl_blacs_lp64.dll | LP64 version of BLACS routines
mkl_blacs_ilp64.dll | ILP64 version of BLACS routines
mkl_blacs_intelmpi_lp64.dll | LP64 version of BLACS routines supporting Intel MPI
mkl_blacs_intelmpi_ilp64.dll | ILP64 version of BLACS routines supporting Intel MPI
mkl_blacs_mpich2_lp64.dll | LP64 version of BLACS routines supporting MPICH2
mkl_blacs_mpich2_ilp64.dll | ILP64 version of BLACS routines supporting MPICH2
mkl_blacs_msmpi_lp64.dll | LP64 version of BLACS routines supporting Microsoft* MPI
mkl_blacs_msmpi_ilp64.dll | ILP64 version of BLACS routines supporting Microsoft* MPI
1033\mkl_msg.dll | Catalog of Intel® Math Kernel Library (Intel® MKL) messages in English
1041\mkl_msg.dll | Catalog of Intel MKL messages in Japanese. Available only if the Intel® MKL package provides Japanese localization. Please see the Release Notes for this information.
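The simplest way to link on this architecture is through the Single Dynamic Library (mkl_rt) listed in the tables above. The commands below are only an illustrative sketch, again assuming the Intel C/C++ compiler; the second line shows the conventional layered alternative for the LP64 interface with OpenMP threading:

    icl myprog.c mkl_rt.lib
    icl myprog.c mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib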
Intel® Math Kernel Library Reference Manual
Document Number: 630813-045US
MKL 10.3 Update 8

Contents
Legal Information
Introducing the Intel® Math Kernel Library
Getting Help and Support
What's New
Notational Conventions
Chapter 1: Function Domains
    BLAS Routines
    Sparse BLAS Routines
    LAPACK Routines
    ScaLAPACK Routines
    PBLAS Routines
    Sparse Solver Routines
    VML Functions
    Statistical Functions
    Fourier Transform Functions
    Partial Differential Equations Support
    Nonlinear Optimization Problem Solvers
    Support Functions
    BLACS Routines
    Data Fitting Functions
    GMP Arithmetic Functions
    Performance Enhancements
    Parallelism
    C Datatypes Specific to Intel MKL
Chapter 2: BLAS and Sparse BLAS Routines
    BLAS Routines
        Routine Naming Conventions
        Fortran 95 Interface Conventions
        Matrix Storage Schemes
        BLAS Level 1 Routines and Functions
        BLAS Level 2 Routines
        BLAS Level 3 Routines
    Sparse BLAS Level 1 Routines
        Vector Arguments
        Naming Conventions
        Routines and Data Types
        BLAS Level 1 Routines That Can Work With Sparse Vectors
    Sparse BLAS Level 2 and Level 3 Routines
        Naming Conventions in Sparse BLAS Level 2 and Level 3
        Sparse Matrix Storage Formats
        Routines and Supported Operations
        Interface Consideration
        Sparse BLAS Level 2 and Level 3 Routines
    BLAS-like Extensions
Chapter 3: LAPACK Routines: Linear Equations
    Routine Naming Conventions
    C Interface Conventions
    Fortran 95 Interface Conventions
    Intel® MKL Fortran 95 Interfaces for LAPACK Routines vs. Netlib Implementation
    Matrix Storage Schemes
    Mathematical Notation
    Error Analysis
    Computational Routines
        Routines for Matrix Factorization
        Routines for Solving Systems of Linear Equations
        Routines for Estimating the Condition Number
        Refining the Solution and Estimating Its Error
        Routines for Matrix Inversion
        Routines for Matrix Equilibration
    Driver Routines
Chapter 4: LAPACK Routines: Least Squares and Eigenvalue Problems
    Routine Naming Conventions
    Matrix Storage Schemes
    Mathematical Notation
    Computational Routines
        Orthogonal Factorizations
        Singular Value Decomposition
        Symmetric Eigenvalue Problems
        Generalized Symmetric-Definite Eigenvalue Problems
        Nonsymmetric Eigenvalue Problems
        Generalized Nonsymmetric Eigenvalue Problems
        Generalized Singular Value Decomposition
        Cosine-Sine Decomposition
    Driver Routines
        Linear Least Squares (LLS) Problems
        Generalized LLS Problems
        Symmetric Eigenproblems
        Nonsymmetric Eigenproblems
        Singular Value Decomposition
        Cosine-Sine Decomposition
        Generalized Symmetric Definite Eigenproblems
?sbgvd
?hbgvd
?sbgvx
?hbgvx
Generalized Nonsymmetric Eigenproblems
?gges
?ggesx
?ggev
?ggevx
Chapter 5: LAPACK Auxiliary and Utility Routines
Auxiliary Routines
?lacgv
?lacrm
?lacrt
?laesy
?rot
?spmv
?spr
?symv
?syr
i?max1
?sum1
?gbtf2
?gebd2
?gehd2
?gelq2
?geql2
?geqr2
?geqr2p
?gerq2
?gesc2
?getc2
?getf2
?gtts2
?isnan
?laisnan
?labrd
?lacn2
?lacon
?lacpy
?ladiv
?lae2
?laebz
?laed0
?laed1
?laed2
?laed3
?laed4
?laed5
?laed6
?laed7
?laed8
?laed9
?laeda
?laein
?laev2
?laexc
?lag2
?lags2
?lagtf
?lagtm
?lagts
?lagv2
?lahqr
?lahrd
?lahr2
?laic1
?laln2
?lals0
?lalsa
?lalsd
?lamrg
?laneg
?langb
?lange
?langt
?lanhs
?lansb
?lanhb
?lansp
?lanhp
?lanst/?lanht
?lansy
?lanhe
?lantb
?lantp
?lantr
?lanv2
?lapll
?lapmr
?lapmt
?lapy2
?lapy3
?laqgb
?laqge
?laqhb
?laqp2
?laqps
?laqr0
?laqr1
?laqr2
?laqr3
?laqr4
?laqr5
?laqsb
?laqsp
?laqsy
?laqtr
?lar1v
?lar2v
?larf
?larfb
?larfg
?larfgp
?larft
?larfx
?largv
?larnv
?larra
?larrb
?larrc
?larrd
?larre
?larrf
?larrj
?larrk
?larrr
?larrv
?lartg
?lartgp
?lartgs
?lartv
?laruv
?larz
?larzb
?larzt
?las2
?lascl
?lasd0
?lasd1
?lasd2
?lasd3
?lasd4
?lasd5
?lasd6
?lasd7
?lasd8
?lasd9
?lasda
?lasdq
?lasdt
?laset
?lasq1
?lasq2
?lasq3
?lasq4
?lasq5
?lasq6
?lasr
?lasrt
?lassq
?lasv2
?laswp
?lasy2
?lasyf
?lahef
?latbs
?latdf
?latps
?latrd
?latrs
?latrz
?lauu2
?lauum
?org2l/?ung2l
?org2r/?ung2r
?orgl2/?ungl2
?orgr2/?ungr2
?orm2l/?unm2l
?orm2r/?unm2r
?orml2/?unml2
?ormr2/?unmr2
?ormr3/?unmr3
?pbtf2
?potf2
?ptts2
?rscl
?syswapr
?heswapr
?sygs2/?hegs2
?sytd2/?hetd2
?sytf2
?hetf2
?tgex2
?tgsy2
?trti2
clag2z
dlag2s
slag2d
zlag2c
?larfp
ila?lc
ila?lr
?gsvj0
?gsvj1
?sfrk
?hfrk
?tfsm
?lansf
?lanhf
?tfttp
?tfttr
?tpttf
?tpttr
?trttf
?trttp
?pstf2
dlat2s
zlat2c
?lacp2
?la_gbamv
?la_gbrcond
?la_gbrcond_c
?la_gbrcond_x
?la_gbrfsx_extended
?la_gbrpvgrw
?la_geamv
?la_gercond
?la_gercond_c
?la_gercond_x
?la_gerfsx_extended
?la_heamv
?la_hercond_c
?la_hercond_x
?la_herfsx_extended
?la_herpvgrw
?la_lin_berr
?la_porcond
?la_porcond_c
?la_porcond_x
?la_porfsx_extended
?la_porpvgrw
?laqhe
?laqhp
?larcm
?la_rpvgrw
?larscl2
?lascl2
?la_syamv
?la_syrcond
?la_syrcond_c
?la_syrcond_x
?la_syrfsx_extended
?la_syrpvgrw
?la_wwaddw
Utility Functions and Routines
ilaver
ilaenv
iparmq
ieeeck
lsamen
?labad
?lamch
?lamc1
?lamc2
?lamc3
?lamc4
?lamc5
second/dsecnd
chla_transtype
iladiag
ilaprec
ilatrans
ilauplo
xerbla_array
Chapter 6: ScaLAPACK Routines
Overview
Routine Naming Conventions
Computational Routines
Linear Equations
Routines for Matrix Factorization
p?getrf
p?gbtrf
p?dbtrf
p?dttrf
p?potrf
p?pbtrf
p?pttrf
Routines for Solving Systems of Linear Equations
p?getrs
p?gbtrs
p?dbtrs
p?dttrs
p?potrs
p?pbtrs
p?pttrs
p?trtrs
Routines for Estimating the Condition Number
p?gecon
p?pocon
p?trcon
Refining the Solution and Estimating Its Error
p?gerfs
p?porfs
p?trrfs
Routines for Matrix Inversion
p?getri
p?potri
p?trtri
Routines for Matrix Equilibration
p?geequ
p?poequ
Orthogonal Factorizations
p?geqrf
p?geqpf
p?orgqr
p?ungqr
p?ormqr
p?unmqr
p?gelqf
p?orglq
p?unglq
p?ormlq
p?unmlq
p?geqlf
p?orgql
p?ungql
p?ormql
p?unmql
p?gerqf
p?orgrq
p?ungrq
p?ormrq
p?unmrq
p?tzrzf
p?ormrz
p?unmrz
p?ggqrf
p?ggrqf
Symmetric Eigenproblems
p?sytrd
p?ormtr
p?hetrd
p?unmtr
p?stebz
p?stein
Nonsymmetric Eigenvalue Problems
p?gehrd
p?ormhr
p?unmhr
p?lahqr
Singular Value Decomposition
p?gebrd
p?ormbr
p?unmbr
Generalized Symmetric-Definite Eigen Problems
p?sygst
p?hegst
Driver Routines
p?gesv
p?gesvx
p?gbsv
p?dbsv
p?dtsv
p?posv
p?posvx
p?pbsv
p?ptsv
p?gels
p?syev
p?syevd
p?syevx
p?heev
p?heevd
p?heevx
p?gesvd
p?sygvx
p?hegvx
Chapter 7: ScaLAPACK Auxiliary and Utility Routines
Auxiliary Routines
p?lacgv
p?max1
?combamax1
p?sum1
p?dbtrsv
p?dttrsv
p?gebd2
p?gehd2
p?gelq2
p?geql2
p?geqr2
p?gerq2
p?getf2
p?labrd
p?lacon
p?laconsb
p?lacp2
p?lacp3
p?lacpy
p?laevswp
p?lahrd
p?laiect
p?lange
p?lanhs
p?lansy, p?lanhe
p?lantr
p?lapiv
p?laqge
p?laqsy
p?lared1d
p?lared2d
p?larf
p?larfb
p?larfc
p?larfg
p?larft
p?larz
p?larzb
p?larzc
p?larzt
p?lascl
p?laset
p?lasmsub
p?lassq
p?laswp
p?latra
p?latrd
p?latrs
p?latrz
p?lauu2
p?lauum
p?lawil
p?org2l/p?ung2l
p?org2r/p?ung2r
p?orgl2/p?ungl2
p?orgr2/p?ungr2
p?orm2l/p?unm2l
p?orm2r/p?unm2r
p?orml2/p?unml2
p?ormr2/p?unmr2
p?pbtrsv
p?pttrsv
p?potf2
p?rscl
p?sygs2/p?hegs2
p?sytd2/p?hetd2
p?trti2
?lamsh
?laref
?lasorte
?lasrt2
?stein2
?dbtf2
?dbtrf
?dttrf
?dttrsv
?pttrsv
?steqr2
Utility Functions and Routines
p?labad
p?lachkieee
p?lamch
p?lasnbt
pxerbla
Chapter 8: Sparse Solver Routines
PARDISO* - Parallel Direct Sparse Solver Interface
pardiso
pardisoinit
pardiso_64
pardiso_getenv, pardiso_setenv
PARDISO Parameters in Tabular Form
Direct Sparse Solver (DSS) Interface Routines
DSS Interface Description
DSS Routines
dss_create
dss_define_structure
dss_reorder
dss_factor_real, dss_factor_complex
dss_solve_real, dss_solve_complex
dss_delete
dss_statistics
mkl_cvt_to_null_terminated_str
Implementation Details
Iterative Sparse Solvers based on Reverse Communication Interface (RCI ISS)
CG Interface Description
FGMRES Interface Description
RCI ISS Routines
dcg_init
dcg_check
dcg
dcg_get
dcgmrhs_init
dcgmrhs_check
dcgmrhs
dcgmrhs_get
dfgmres_init
dfgmres_check
dfgmres
dfgmres_get
Implementation Details
Preconditioners based on Incomplete LU Factorization Technique
ILU0 and ILUT Preconditioners Interface Description
dcsrilu0
dcsrilut
Calling Sparse Solver and Preconditioner Routines from C/C++
Chapter 9: Vector Mathematical Functions
Data Types, Accuracy Modes, and Performance Tips
Function Naming Conventions
Function Interfaces
VML Mathematical Functions
Pack Functions
Unpack Functions
Service Functions
Input Parameters
Output Parameters
Vector Indexing Methods
Error Diagnostics
VML Mathematical Functions
Special Value Notations
Arithmetic Functions
v?Add
v?Sub
v?Sqr
v?Mul
v?MulByConj
v?Conj
v?Abs
v?Arg
v?LinearFrac
Power and Root Functions
v?Inv
v?Div
v?Sqrt
v?InvSqrt
v?Cbrt
v?InvCbrt
v?Pow2o3
v?Pow3o2
v?Pow
v?Powx
v?Hypot
Exponential and Logarithmic Functions
v?Exp
v?Expm1
v?Ln
v?Log10
v?Log1p
Trigonometric Functions
v?Cos.................................................................................2031 v?Sin..................................................................................2034 v?SinCos............................................................................2036 v?CIS.................................................................................2038 v?Tan.................................................................................2040 v?Acos...............................................................................2042 v?Asin................................................................................2045 v?Atan................................................................................2047 v?Atan2..............................................................................2050 Hyperbolic Functions.....................................................................2052 v?Cosh...............................................................................2052 v?Sinh................................................................................2055 v?Tanh...............................................................................2058 v?Acosh..............................................................................2061 v?Asinh..............................................................................2064 v?Atanh..............................................................................2067 Contents 23 Special Functions.........................................................................2070 v?Erf..................................................................................2070 v?Erfc.................................................................................2073 v?CdfNorm..........................................................................2075 v?ErfInv.............................................................................2077 v?ErfcInv............................................................................2080 v?CdfNormInv.....................................................................2082 v?LGamma..........................................................................2084 v?TGamma.........................................................................2086 Rounding Functions......................................................................2088 v?Floor...............................................................................2088 v?Ceil.................................................................................2089 v?Trunc..............................................................................2091 v?Round.............................................................................2093 v?NearbyInt........................................................................2094 v?Rint................................................................................2096 v?Modf...............................................................................2098 VML Pack/Unpack Functions...................................................................2100 v?Pack........................................................................................2100 v?Unpack....................................................................................2103 VML Service Functions...........................................................................2106 vmlSetMode................................................................................2106 
vmlGetMode................................................................................2108 vmlSetErrStatus...........................................................................2109 vmlGetErrStatus..........................................................................2110 vmlClearErrStatus........................................................................2111 vmlSetErrorCallBack.....................................................................2111 vmlGetErrorCallBack.....................................................................2114 vmlClearErrorCallBack..................................................................2114 Chapter 10: Statistical Functions Random Number Generators..................................................................2115 Conventions................................................................................2116 Mathematical Notation..........................................................2117 Naming Conventions............................................................2118 Basic Generators..........................................................................2121 BRNG Parameter Definition....................................................2122 Random Streams.................................................................2123 Data Types.........................................................................2124 Error Reporting............................................................................2124 VSL RNG Usage Model..................................................................2125 Service Routines..........................................................................2127 vslNewStream.....................................................................2128 vslNewStreamEx..................................................................2129 vsliNewAbstractStream.........................................................2131 vsldNewAbstractStream........................................................2133 vslsNewAbstractStream........................................................2135 vslDeleteStream..................................................................2137 vslCopyStream....................................................................2138 vslCopyStreamState.............................................................2139 Intel® Math Kernel Library Reference Manual 24 vslSaveStreamF...................................................................2140 vslLoadStreamF...................................................................2141 vslSaveStreamM..................................................................2142 vslLoadStreamM..................................................................2144 vslGetStreamSize.................................................................2145 vslLeapfrogStream...............................................................2146 vslSkipAheadStream............................................................2148 vslGetStreamStateBrng........................................................2151 vslGetNumRegBrngs.............................................................2152 Distribution Generators.................................................................2153 Continuous Distributions.......................................................2156 Discrete Distributions...........................................................2189 Advanced Service Routines............................................................2208 Data 
types..........................................................................2208 vslRegisterBrng...................................................................2209 vslGetBrngProperties............................................................2210 Formats for User-Designed Generators...................................2211 Convolution and Correlation...................................................................2214 Naming Conventions.....................................................................2215 Data Types..................................................................................2215 Parameters.................................................................................2216 Task Status and Error Reporting.....................................................2218 Task Constructors........................................................................2220 vslConvNewTask/vslCorrNewTask...........................................2220 vslConvNewTask1D/vslCorrNewTask1D...................................2223 vslConvNewTaskX/vslCorrNewTaskX.......................................2225 vslConvNewTaskX1D/vslCorrNewTaskX1D...............................2228 Task Editors................................................................................2232 vslConvSetMode/vslCorrSetMode...........................................2232 vslConvSetInternalPrecision/vslCorrSetInternalPrecision............2234 vslConvSetStart/vslCorrSetStart............................................2235 vslConvSetDecimation/vslCorrSetDecimation...........................2237 Task Execution Routines................................................................2238 vslConvExec/vslCorrExec......................................................2239 vslConvExec1D/vslCorrExec1D...............................................2242 vslConvExecX/vslCorrExecX...................................................2246 vslConvExecX1D/vslCorrExecX1D...........................................2249 Task Destructors..........................................................................2253 vslConvDeleteTask/vslCorrDeleteTask.....................................2253 Task Copy...................................................................................2254 vslConvCopyTask/vslCorrCopyTask.........................................2254 Usage Examples...........................................................................2256 Mathematical Notation and Definitions............................................2258 Data Allocation............................................................................2259 VSL Summary Statistics........................................................................2261 Naming Conventions.....................................................................2262 Data Types..................................................................................2263 Parameters.................................................................................2263 Task Status and Error Reporting.....................................................2263 Task Constructors........................................................................2267 Contents 25 vslSSNewTask.....................................................................2267 Task Editors................................................................................2269 vslSSEditTask......................................................................2270 
vslSSEditMoments................................................................2278 vslSSEditCovCor..................................................................2280 vslSSEditPartialCovCor.........................................................2282 vslSSEditQuantiles...............................................................2284 vslSSEditStreamQuantiles.....................................................2286 vslSSEditPooledCovariance....................................................2287 vslSSEditRobustCovariance...................................................2289 vslSSEditOutliersDetection....................................................2292 vslSSEditMissingValues.........................................................2294 vslSSEditCorParameterization................................................2298 Task Computation Routines...........................................................2300 vslSSCompute.....................................................................2302 Task Destructor...........................................................................2303 vslSSDeleteTask..................................................................2303 Usage Examples...........................................................................2304 Mathematical Notation and Definitions............................................2305 Chapter 11: Fourier Transform Functions FFT Functions.......................................................................................2312 Computing an FFT........................................................................2313 FFT Interface...............................................................................2313 Descriptor Manipulation Functions..................................................2313 DftiCreateDescriptor.............................................................2314 DftiCommitDescriptor...........................................................2316 DftiFreeDescriptor................................................................2317 DftiCopyDescriptor...............................................................2318 FFT Computation Functions............................................................2319 DftiComputeForward............................................................2320 DftiComputeBackward..........................................................2322 Descriptor Configuration Functions.................................................2325 DftiSetValue........................................................................2325 DftiGetValue........................................................................2327 Status Checking Functions.............................................................2329 DftiErrorClass......................................................................2329 DftiErrorMessage.................................................................2331 Configuration Settings..................................................................2332 DFTI_PRECISION.................................................................2334 DFTI_FORWARD_DOMAIN.....................................................2335 DFTI_DIMENSION, DFTI_LENGTHS.........................................2336 DFTI_PLACEMENT................................................................2336 DFTI_FORWARD_SCALE, DFTI_BACKWARD_SCALE...................2336 DFTI_NUMBER_OF_USER_THREADS.......................................2336 DFTI_INPUT_STRIDES, DFTI_OUTPUT_STRIDES......................2337 
DFTI_NUMBER_OF_TRANSFORMS..........................................2339 DFTI_INPUT_DISTANCE, DFTI_OUTPUT_DISTANCE..................2339 DFTI_COMPLEX_STORAGE, DFTI_REAL_STORAGE, DFTI_CONJUGATE_EVEN_STORAGE....................................2340 Intel® Math Kernel Library Reference Manual 26 DFTI_PACKED_FORMAT........................................................2347 DFTI_WORKSPACE...............................................................2351 DFTI_COMMIT_STATUS........................................................2352 DFTI_ORDERING..................................................................2352 Cluster FFT Functions............................................................................2352 Computing Cluster FFT..................................................................2353 Distributing Data among Processes.................................................2354 Cluster FFT Interface....................................................................2356 Descriptor Manipulation Functions..................................................2356 DftiCreateDescriptorDM........................................................2357 DftiCommitDescriptorDM.......................................................2358 DftiFreeDescriptorDM...........................................................2359 FFT Computation Functions............................................................2360 DftiComputeForwardDM........................................................2360 DftiComputeBackwardDM......................................................2362 Descriptor Configuration Functions.................................................2364 DftiSetValueDM...................................................................2365 DftiGetValueDM...................................................................2367 Error Codes.................................................................................2370 Chapter 12: PBLAS Routines Overview.............................................................................................2373 Routine Naming Conventions.................................................................2374 PBLAS Level 1 Routines.........................................................................2375 p?amax......................................................................................2376 p?asum.......................................................................................2377 p?axpy.......................................................................................2378 p?copy........................................................................................2379 p?dot..........................................................................................2380 p?dotc........................................................................................2381 p?dotu........................................................................................2382 p?nrm2.......................................................................................2383 p?scal.........................................................................................2384 p?swap.......................................................................................2385 PBLAS Level 2 Routines.........................................................................2386 p?gemv......................................................................................2387 p?agemv.....................................................................................2389 
p?ger..........................................................................................2391 p?gerc........................................................................................2393 p?geru........................................................................................2394 p?hemv......................................................................................2396 p?ahemv.....................................................................................2397 p?her.........................................................................................2399 p?her2........................................................................................2400 p?symv.......................................................................................2402 p?asymv.....................................................................................2404 p?syr..........................................................................................2406 p?syr2........................................................................................2407 p?trmv.......................................................................................2409 p?atrmv......................................................................................2410 Contents 27 p?trsv.........................................................................................2413 PBLAS Level 3 Routines.........................................................................2414 p?geadd......................................................................................2415 p?tradd.......................................................................................2416 p?gemm.....................................................................................2418 p?hemm.....................................................................................2420 p?herk........................................................................................2422 p?her2k......................................................................................2424 p?symm......................................................................................2426 p?syrk........................................................................................2428 p?syr2k......................................................................................2430 p?tran........................................................................................2432 p?tranu.......................................................................................2433 p?tranc.......................................................................................2434 p?trmm......................................................................................2435 p?trsm........................................................................................2437 Chapter 13: Partial Differential Equations Support Trigonometric Transform Routines..........................................................2441 Transforms Implemented..............................................................2442 Sequence of Invoking TT Routines..................................................2443 Interface Description....................................................................2445 TT Routines.................................................................................2445 ?_init_trig_transform............................................................2445 ?_commit_trig_transform......................................................2446 
?_forward_trig_transform.....................................................2448 ?_backward_trig_transform...................................................2450 free_trig_transform..............................................................2451 Common Parameters....................................................................2452 Implementation Details.................................................................2455 Poisson Library Routines .......................................................................2457 Poisson Library Implemented.........................................................2457 Sequence of Invoking PL Routines..................................................2462 Interface Description....................................................................2464 PL Routines for the Cartesian Solver...............................................2465 ?_init_Helmholtz_2D/?_init_Helmholtz_3D..............................2465 ?_commit_Helmholtz_2D/?_commit_Helmholtz_3D..................2467 ?_Helmholtz_2D/?_Helmholtz_3D..........................................2470 free_Helmholtz_2D/free_Helmholtz_3D...................................2474 PL Routines for the Spherical Solver................................................2475 ?_init_sph_p/?_init_sph_np...................................................2475 ?_commit_sph_p/?_commit_sph_np.......................................2476 ?_sph_p/?_sph_np...............................................................2478 free_sph_p/free_sph_np.......................................................2480 Common Parameters....................................................................2481 Implementation Details.................................................................2486 Calling PDE Support Routines from Fortran 90..........................................2492 Chapter 14: Nonlinear Optimization Problem Solvers Organization and Implementation...........................................................2495 Intel® Math Kernel Library Reference Manual 28 Routine Naming Conventions.................................................................2496 Nonlinear Least Squares Problem without Constraints................................2496 ?trnlsp_init..................................................................................2497 ?trnlsp_check..............................................................................2499 ?trnlsp_solve...............................................................................2500 ?trnlsp_get..................................................................................2502 ?trnlsp_delete..............................................................................2503 Nonlinear Least Squares Problem with Linear (Bound) Constraints..............2504 ?trnlspbc_init...............................................................................2505 ?trnlspbc_check...........................................................................2506 ?trnlspbc_solve............................................................................2508 ?trnlspbc_get...............................................................................2510 ?trnlspbc_delete..........................................................................2511 Jacobian Matrix Calculation Routines.......................................................2512 ?jacobi_init..................................................................................2512 
?jacobi_solve...............................................................................2513 ?jacobi_delete.............................................................................2514 ?jacobi........................................................................................2515 ?jacobix......................................................................................2516 Chapter 15: Support Functions Version Information Functions................................................................2521 mkl_get_version..........................................................................2521 mkl_get_version_string.................................................................2523 Threading Control Functions...................................................................2524 mkl_set_num_threads..................................................................2524 mkl_domain_set_num_threads......................................................2525 mkl_set_dynamic.........................................................................2526 mkl_get_max_threads..................................................................2526 mkl_domain_get_max_threads......................................................2527 mkl_get_dynamic.........................................................................2528 Error Handling Functions.......................................................................2528 xerbla.........................................................................................2529 pxerbla.......................................................................................2530 Equality Test Functions.........................................................................2530 lsame.........................................................................................2530 lsamen.......................................................................................2531 Timing Functions..................................................................................2532 second/dsecnd.............................................................................2532 mkl_get_cpu_clocks.....................................................................2533 mkl_get_cpu_frequency................................................................2534 mkl_get_max_cpu_frequency........................................................2534 mkl_get_clocks_frequency.............................................................2535 Memory Functions................................................................................2536 mkl_free_buffers..........................................................................2536 mkl_thread_free_buffers...............................................................2537 mkl_disable_fast_mm...................................................................2538 mkl_mem_stat............................................................................2538 mkl_malloc..................................................................................2539 mkl_free.....................................................................................2540 Contents 29 Examples of mkl_malloc(), mkl_free(), mkl_mem_stat() Usage..........2540 Miscellaneous Utility Functions...............................................................2542 mkl_progress...............................................................................2542 mkl_enable_instructions................................................................2544 Functions Supporting 
the Single Dynamic Library......................................2545 mkl_set_interface_layer................................................................2545 mkl_set_threading_layer...............................................................2546 mkl_set_xerbla............................................................................2546 mkl_set_progress.........................................................................2547 Chapter 16: BLACS Routines Matrix Shapes......................................................................................2549 BLACS Combine Operations...................................................................2550 ?gamx2d.....................................................................................2551 ?gamn2d.....................................................................................2552 ?gsum2d.....................................................................................2553 BLACS Point To Point Communication......................................................2554 ?gesd2d......................................................................................2556 ?trsd2d.......................................................................................2557 ?gerv2d......................................................................................2557 ?trrv2d.......................................................................................2558 BLACS Broadcast Routines.....................................................................2559 ?gebs2d......................................................................................2560 ?trbs2d.......................................................................................2560 ?gebr2d......................................................................................2561 ?trbr2d.......................................................................................2562 BLACS Support Routines........................................................................2562 Initialization Routines...................................................................2562 blacs_pinfo.........................................................................2563 blacs_setup.........................................................................2563 blacs_get............................................................................2564 blacs_set............................................................................2565 blacs_gridinit.......................................................................2566 blacs_gridmap.....................................................................2567 Destruction Routines....................................................................2568 blacs_freebuff.....................................................................2568 blacs_gridexit......................................................................2569 blacs_abort.........................................................................2569 blacs_exit...........................................................................2569 Informational Routines..................................................................2570 blacs_gridinfo......................................................................2570 blacs_pnum........................................................................2570 blacs_pcoord.......................................................................2571 Miscellaneous 
Routines.................................................................2571 blacs_barrier.......................................................................2571 Examples of BLACS Routines Usage........................................................2572 Chapter 17: Data Fitting Functions Naming Conventions.............................................................................2581 Data Types..........................................................................................2582 Intel® Math Kernel Library Reference Manual 30 Mathematical Conventions.....................................................................2582 Data Fitting Usage Model.......................................................................2585 Data Fitting Usage Examples..................................................................2585 Task Status and Error Reporting.............................................................2590 Task Creation and Initialization Routines..................................................2592 df?newtask1d..............................................................................2592 Task Editors.........................................................................................2594 df?editppspline1d.........................................................................2595 df?editptr....................................................................................2601 dfieditval.....................................................................................2602 df?editidxptr................................................................................2604 Computational Routines........................................................................2606 df?construct1d.............................................................................2606 df?interpolate1d/df?interpolateex1d................................................2607 df?integrate1d/df?integrateex1d.....................................................2613 df?searchcells1d/df?searchcellsex1d...............................................2619 df?interpcallback..........................................................................2621 df?integrcallback..........................................................................2623 df?searchcellscallback...................................................................2625 Task Destructors..................................................................................2627 dfdeletetask................................................................................2627 Appendix A: Linear Solvers Basics Sparse Linear Systems..........................................................................2629 Matrix Fundamentals....................................................................2629 Direct Method..............................................................................2630 Sparse Matrix Storage Formats......................................................2634 Appendix B: Routine and Function Arguments Vector Arguments in BLAS.....................................................................2645 Vector Arguments in VML......................................................................2646 Matrix Arguments.................................................................................2646 Appendix C: Code Examples BLAS Code Examples............................................................................2653 Fourier Transform Functions Code Examples............................................2656 FFT Code 
Examples......................................................................2656 Examples of Using Multi-Threading for FFT Computation............2662 Examples for Cluster FFT Functions.................................................2666 Auxiliary Data Transformations......................................................2667 Appendix D: CBLAS Interface to the BLAS CBLAS Arguments................................................................................2669 Level 1 CBLAS......................................................................................2670 Level 2 CBLAS......................................................................................2672 Level 3 CBLAS......................................................................................2676 Sparse CBLAS......................................................................................2678 Appendix E: Specific Features of Fortran 95 Interfaces for LAPACK Routines Interfaces Identical to Netlib..................................................................2681 Contents 31 Interfaces with Replaced Argument Names..............................................2682 Modified Netlib Interfaces......................................................................2684 Interfaces Absent From Netlib................................................................2684 Interfaces of New Functionality...............................................................2687 Appendix F: FFTW Interface to Intel® Math Kernel Library Notational Conventions ........................................................................2689 FFTW2 Interface to Intel® Math Kernel Library .........................................2689 Wrappers Reference.....................................................................2689 One-dimensional Complex-to-complex FFTs ............................2689 Multi-dimensional Complex-to-complex FFTs............................2690 One-dimensional Real-to-half-complex/Half-complex-to-real FFTs...............................................................................2690 Multi-dimensional Real-to-complex/Complex-to-real FFTs..........2690 Multi-threaded FFTW............................................................2691 FFTW Support Functions.......................................................2691 Limitations of the FFTW2 Interface to Intel MKL.......................2691 Calling Wrappers from Fortran.......................................................2692 Installation..................................................................................2693 Creating the Wrapper Library.................................................2693 Application Assembling ........................................................2694 Running Examples ...............................................................2694 MPI FFTW Wrappers.....................................................................2694 MPI FFTW Wrappers Reference..............................................2694 Creating MPI FFTW Wrapper Library.......................................2696 Application Assembling with MPI FFTW Wrapper Library............2696 Running Examples ...............................................................2696 FFTW3 Interface to Intel® Math Kernel Library..........................................2697 Using FFTW3 Wrappers.................................................................2697 Calling Wrappers from Fortran.......................................................2699 Building Your Own Wrapper 
Library.................................................2699 Building an Application..................................................................2700 Running Examples .......................................................................2700 MPI FFTW Wrappers.....................................................................2701 Building Your Own Wrapper Library........................................2701 Building an Application.........................................................2701 Running Examples...............................................................2702 Appendix G: Bibliography Appendix H: Glossary Intel® Math Kernel Library Reference Manual 32 Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http:// www.intel.com/design/literature.htm Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/ processor_number/ Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. 
BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel vPro, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Moblin, Pentium, Pentium Inside, Puma, skoool, the skoool logo, SMARTi, Sound Mark, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Microsoft, Windows, Visual Studio, Visual C++, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries. Java is a registered trademark of Oracle and/or its affiliates. Third Party Content Intel® Math Kernel Library (Intel® MKL) includes content from several 3rd party sources that was originally governed by the licenses referenced below: • Portions© Copyright 2001 Hewlett-Packard Development Company, L.P. 33 • Sections on the Linear Algebra PACKage (LAPACK) routines include derivative work portions that have been copyrighted: © 1991, 1992, and 1998 by The Numerical Algorithms Group, Ltd. • Intel MKL fully supports LAPACK 3.3 set of computational, driver, auxiliary and utility routines under the following license: Copyright © 1992-2010 The University of Tennessee. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer listed in this license in the documentation and/or other materials provided with the distribution. • Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. The original versions of LAPACK from which that part of Intel MKL was derived can be obtained from http://www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. 
• The original versions of the Basic Linear Algebra Subprograms (BLAS) from which the respective part of Intel® MKL was derived can be obtained from http://www.netlib.org/blas/index.html.
• The original versions of the Basic Linear Algebra Communication Subprograms (BLACS) from which the respective part of Intel MKL was derived can be obtained from http://www.netlib.org/blacs/index.html. The authors of BLACS are Jack Dongarra and R. Clint Whaley.
• The original versions of Scalable LAPACK (ScaLAPACK) from which the respective part of Intel® MKL was derived can be obtained from http://www.netlib.org/scalapack/index.html. The authors of ScaLAPACK are L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley.
• The original versions of the Parallel Basic Linear Algebra Subprograms (PBLAS) routines from which the respective part of Intel® MKL was derived can be obtained from http://www.netlib.org/scalapack/html/pblas_qref.html.
• PARDISO (PARallel DIrect SOlver)* in Intel® MKL is compliant with the 3.2 release of PARDISO that is freely distributed by the University of Basel. It can be obtained at http://www.pardiso-project.org.
• Some Fast Fourier Transform (FFT) functions in this release of Intel® MKL have been generated by the SPIRAL software generation system (http://www.spiral.net/) under license from Carnegie Mellon University. The authors of SPIRAL are Markus Puschel, Jose Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nick Rizzolo.

Copyright © 1994-2011, Intel Corporation. All rights reserved.

Introducing the Intel® Math Kernel Library

The Intel® Math Kernel Library (Intel® MKL) improves performance of scientific, engineering, and financial software that solves large computational problems. Among other functionality, Intel MKL provides linear algebra routines, fast Fourier transforms, as well as vectorized math and random number generation functions, all optimized for the latest Intel processors, including processors with multiple cores (see the Intel® MKL Release Notes for the full list of supported processors). Intel MKL also performs well on non-Intel processors. Intel MKL is thread-safe and extensively threaded using the OpenMP* technology. For more details about functionality provided by Intel MKL, see the Function Domains section.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804

Getting Help and Support

Getting Help
The online version of the Intel® Math Kernel Library (Intel® MKL) Reference Manual integrates into the Microsoft Visual Studio* development system help on Windows* OS or into the Eclipse* development system help on Linux* OS. For information on how to use the online help, see the Intel MKL User's Guide.

Getting Technical Support
Intel MKL provides a product web site that offers timely and comprehensive product information, including product features, white papers, and technical articles. For the latest information, check: http://www.intel.com/software/products/support. Intel also provides a support web site that contains a rich repository of self-help information, including getting started tips, known product issues, product errata, license information, user forums, and more (visit http://www.intel.com/software/products/). Registering your product entitles you to one year of technical support and product updates through Intel® Premier Support. Intel Premier Support is an interactive issue management and communication web site providing these services:
• Submit issues and review their status.
• Download product updates at any time of day.
To register your product, contact Intel, or seek product support, visit http://www.intel.com/software/products/support.

What's New

This Reference Manual documents the Intel® Math Kernel Library (Intel® MKL) 10.3 Update 8 release. The following function domains were updated in Intel MKL 10.3 Update 8 with new functions, enhancements to the existing functionality, or improvements to the existing documentation:
• New data fitting functions provide spline-based interpolation capabilities that you can use to approximate functions, function derivatives, or function integrals, and to perform cell search operations. See Data Fitting Functions.
• The Fourier transform documentation has been updated and improved, especially in the descriptions of configuration settings that define the forward domain of the transform (see DFTI_FORWARD_DOMAIN), the memory layout of the input/output data (see DFTI_INPUT_STRIDES, DFTI_OUTPUT_STRIDES), the distances between consecutive data sets for computing multiple transforms (see DFTI_INPUT_DISTANCE, DFTI_OUTPUT_DISTANCE), and the storage schemes (see DFTI_COMPLEX_STORAGE, DFTI_REAL_STORAGE).
Additionally, several minor updates have been made to correct errors in the manual.

Notational Conventions

This manual uses the following terms to refer to operating systems:
Windows* OS    This term refers to information that is valid on all supported Windows* operating systems.
Linux* OS      This term refers to information that is valid on all supported Linux* operating systems.
Mac OS* X      This term refers to information that is valid on Intel®-based systems running the Mac OS* X operating system.
This manual uses the following notational conventions:
• Routine name shorthand (for example, ?ungqr instead of cungqr/zungqr).
• Font conventions used for distinction between the text and the code.

Routine Name Shorthand
For shorthand, names that contain a question mark "?" represent groups of routines with similar functionality. Each group typically consists of routines used with four basic data types: single-precision real, double-precision real, single-precision complex, and double-precision complex.
The question mark is used to indicate any or all possible varieties of a function; for example:
?swap    Refers to all four data types of the vector-vector ?swap routine: sswap, dswap, cswap, and zswap.

Font Conventions
The following font conventions are used:
UPPERCASE COURIER    Data type used in the description of input and output parameters for the Fortran interface. For example, CHARACTER*1.
lowercase courier    Code examples: a(k+i,j) = matrix(i,j), and data types for the C interface, for example, const float*.
lowercase courier mixed with UpperCase courier    Function names for the C interface, for example, vmlSetMode.
lowercase courier italic    Variables in argument and parameter descriptions. For example, incx.
*    Used as a multiplication symbol in code examples and equations, and where required by the Fortran syntax.

Function Domains

The Intel® Math Kernel Library includes Fortran routines and functions optimized for Intel® processor-based computers running operating systems that support multiprocessing. In addition to the Fortran interface, Intel MKL includes a C-language interface for the Discrete Fourier transform functions, as well as for the Vector Mathematical Library and Vector Statistical Library functions. For hardware and software requirements to use Intel MKL, see the Intel® MKL Release Notes.
The Intel® Math Kernel Library includes the following groups of routines:
• Basic Linear Algebra Subprograms (BLAS):
  – vector operations
  – matrix-vector operations
  – matrix-matrix operations
• Sparse BLAS Level 1, 2, and 3 (basic operations on sparse vectors and matrices)
• LAPACK routines for solving systems of linear equations
• LAPACK routines for solving least squares problems, eigenvalue and singular value problems, and Sylvester's equations
• Auxiliary and utility LAPACK routines
• ScaLAPACK computational, driver and auxiliary routines (only in Intel MKL for Linux* and Windows* operating systems)
• PBLAS routines for distributed vector, matrix-vector, and matrix-matrix operations
• Direct and Iterative Sparse Solver routines
• Vector Mathematical Library (VML) functions for computing core mathematical functions on vector arguments (with Fortran and C interfaces)
• Vector Statistical Library (VSL) functions for generating vectors of pseudorandom numbers with different types of statistical distributions and for performing convolution and correlation computations
• General Fast Fourier Transform (FFT) functions, providing fast computation of the Discrete Fourier Transform via FFT algorithms, with Fortran and C interfaces
• Cluster FFT functions (only in Intel MKL for Linux* and Windows* operating systems)
• Tools for solving partial differential equations: trigonometric transform routines and a Poisson solver
• Optimization Solver routines for solving nonlinear least squares problems through Trust-Region (TR) algorithms and computing the Jacobian matrix by central differences
• Basic Linear Algebra Communication Subprograms (BLACS) that are used to support a linear-algebra-oriented message-passing interface
• Data Fitting functions for spline-based approximation of functions, derivatives and integrals of functions, and cell search
• GMP arithmetic functions
For specific issues on using the library, also see the Intel® MKL Release Notes.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804

BLAS Routines

The BLAS routines and functions are divided into the following groups according to the operations they perform:
• BLAS Level 1 Routines perform operations of both addition and reduction on vectors of data. Typical operations include scaling and dot products.
• BLAS Level 2 Routines perform matrix-vector operations, such as matrix-vector multiplication, rank-1 and rank-2 matrix updates, and solution of triangular systems.
• BLAS Level 3 Routines perform matrix-matrix operations, such as matrix-matrix multiplication, rank-k update, and solution of triangular systems.
Starting from release 8.0, Intel® MKL also supports the Fortran 95 interface to the BLAS routines. Starting from release 10.1, a number of BLAS-like Extensions have been added to enable certain data manipulations, including matrix in-place and out-of-place transposition combined with simple matrix arithmetic operations.

Sparse BLAS Routines

The Sparse BLAS Level 1 Routines and Functions and the Sparse BLAS Level 2 and Level 3 Routines operate on sparse vectors and matrices. These routines perform operations similar to those of the BLAS Level 1, 2, and 3 routines. The Sparse BLAS routines take advantage of vector and matrix sparsity: they allow you to store only the non-zero elements of vectors and matrices. Intel MKL also supports a Fortran 95 interface to the Sparse BLAS routines.

LAPACK Routines

The Intel® Math Kernel Library fully supports the LAPACK 3.3 set of computational, driver, auxiliary, and utility routines.
The original versions of LAPACK from which that part of Intel MKL was derived can be obtained from http://www.netlib.org/lapack/index.html. The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen.
The LAPACK routines can be divided into the following groups according to the operations they perform:
• Routines for solving systems of linear equations, factoring and inverting matrices, and estimating condition numbers (see Chapter 3).
• Routines for solving least squares problems, eigenvalue and singular value problems, and Sylvester's equations (see Chapter 4).
• Auxiliary and utility routines used to perform certain subtasks, common low-level computations, or related tasks (see Chapter 5).
Starting from release 8.0, Intel MKL also supports the Fortran 95 interface to LAPACK computational and driver routines. This interface allows simplified calls to LAPACK routines with fewer required arguments.
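For orientation, the following fragment is a minimal sketch of calling a BLAS Level 3 routine through the C interface to the BLAS (CBLAS, described in Appendix D). It multiplies a 2-by-3 matrix by a 3-by-2 matrix with cblas_dgemm; the use of the umbrella header mkl.h is an assumption of this sketch, and a Fortran program would call dgemm directly with the same mathematical arguments.

    #include <stdio.h>
    #include "mkl.h"              /* assumed umbrella header declaring the CBLAS interface */

    int main(void)
    {
        /* C = alpha*A*B + beta*C with row-major storage */
        double A[2*3] = { 1.0, 2.0, 3.0,
                          4.0, 5.0, 6.0 };
        double B[3*2] = { 1.0, 0.0,
                          0.0, 1.0,
                          1.0, 1.0 };
        double C[2*2] = { 0.0, 0.0,
                          0.0, 0.0 };

        /* dgemm: double-precision general matrix-matrix multiply (BLAS Level 3) */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 3,           /* m, n, k       */
                    1.0, A, 3,         /* alpha, A, lda */
                    B, 2,              /* B, ldb        */
                    0.0, C, 2);        /* beta, C, ldc  */

        printf("C = [ %4.1f %4.1f ; %4.1f %4.1f ]\n", C[0], C[1], C[2], C[3]);
        return 0;
    }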
ScaLAPACK Routines
The ScaLAPACK package (included only with the Intel® MKL versions for Linux* and Windows* operating systems; see Chapter 6 and Chapter 7) runs on distributed-memory architectures and includes routines for solving systems of linear equations, solving linear least squares problems, and solving eigenvalue and singular value problems, as well as for performing a number of related computational tasks. The original versions of ScaLAPACK from which that part of Intel MKL was derived can be obtained from http://www.netlib.org/scalapack/index.html. The authors of ScaLAPACK are L. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. Whaley.
The Intel MKL version of ScaLAPACK is optimized for Intel® processors and uses the MPICH implementation of MPI as well as Intel MPI.

PBLAS Routines
The PBLAS routines perform operations with distributed vectors and matrices.
• PBLAS Level 1 Routines perform operations of both addition and reduction on vectors of data. Typical operations include scaling and dot products.
• PBLAS Level 2 Routines perform distributed matrix-vector operations, such as matrix-vector multiplication, rank-1 and rank-2 matrix updates, and solution of triangular systems.
• PBLAS Level 3 Routines perform distributed matrix-matrix operations, such as matrix-matrix multiplication, rank-k update, and solution of triangular systems.
Intel MKL provides the PBLAS routines with an interface similar to the interface used in the Netlib PBLAS (part of the ScaLAPACK package; see http://www.netlib.org/scalapack/html/pblas_qref.html).

Sparse Solver Routines
Direct sparse solver routines in Intel MKL (see Chapter 8) solve sparse systems of linear equations with symmetric and symmetrically structured coefficient matrices that have real or complex coefficients. For symmetric matrices, these Intel MKL subroutines can solve both positive-definite and indefinite systems.
Intel MKL includes the PARDISO* sparse solver interface as well as an alternative set of user-callable direct sparse solver routines. If you use the PARDISO* sparse solver from Intel MKL, please cite:
O. Schenk and K. Gärtner. Solving unsymmetric sparse systems of linear equations with PARDISO. Future Generation Computer Systems, 20(3):475-487, 2004.
Intel MKL also provides an iterative sparse solver (see Chapter 8) that uses Sparse BLAS Level 2 and 3 routines and works with different sparse data formats.

VML Functions
The Vector Mathematical Library (VML) functions (see Chapter 9) include a set of highly optimized implementations of certain computationally expensive core mathematical functions (power, trigonometric, exponential, hyperbolic, and so on) that operate on vectors of real and complex numbers. Application programs that might significantly improve performance with VML include nonlinear programming software, computation of integrals, and many others. VML provides interfaces for both the Fortran and C languages.

Statistical Functions
The Vector Statistical Library (VSL) contains three sets of functions (see Chapter 10):
• The first set includes a collection of pseudo- and quasi-random number generator subroutines implementing basic continuous and discrete distributions. To provide the best performance, the VSL subroutines use calls to highly optimized Basic Random Number Generators (BRNGs) and a library of vector mathematical functions (see the sketch after this list).
• The second set includes a collection of routines that implement a wide variety of convolution and correlation operations.
• The third set includes a collection of routines for initial statistical analysis of raw single- and double-precision multi-dimensional datasets.
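To make the first set concrete, the sketch below generates a vector of Gaussian pseudorandom numbers through the VSL C interface. It assumes the MT19937 basic generator constant VSL_BRNG_MT19937 and the Gaussian generator vdRngGaussian with the ICDF method, as declared in mkl_vsl.h; the seed and vector length are arbitrary.

#include <stdio.h>
#include "mkl_vsl.h"

int main(void) {
    VSLStreamStatePtr stream;
    double r[1000];
    int status;

    /* Create a stream for the MT19937 basic generator, seed 777. */
    status = vslNewStream(&stream, VSL_BRNG_MT19937, 777);
    if (status != 0) return 1;   /* non-zero status indicates an error */

    /* Fill r with 1000 Gaussian deviates, mean 0.0, standard deviation 1.0. */
    vdRngGaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF, stream, 1000, r, 0.0, 1.0);

    printf("first deviate: %f\n", r[0]);
    vslDeleteStream(&stream);
    return 0;
}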
Fourier Transform Functions
The Intel® MKL multidimensional Fast Fourier Transform (FFT) functions with mixed-radix support (see Chapter 11) provide uniformity of discrete Fourier transform computation and combine functionality with ease of use. Both Fortran and C interface specifications are given. There is also a cluster version of the FFT functions, which runs on distributed-memory architectures and is provided only in the Intel MKL versions for the Linux* and Windows* operating systems. The FFT functions provide fast computation via the FFT algorithms for arbitrary lengths. See the Intel® MKL User's Guide for the specific radices supported.

Partial Differential Equations Support
Intel® MKL provides tools for solving Partial Differential Equations (PDE) (see Chapter 13). These tools are the Trigonometric Transform interface routines and the Poisson Library.
The Trigonometric Transform routines may be helpful to users who implement their own solvers similar to the solver that the Poisson Library provides. Users can improve the performance of their solvers by using the fast sine, cosine, and staggered cosine transforms implemented in the Trigonometric Transform interface. The Poisson Library is designed for fast solution of simple Helmholtz, Poisson, and Laplace problems. The Trigonometric Transform interface, which underlies the solver, is based on the Intel MKL FFT interface (refer to Chapter 11), optimized for Intel® processors.

Nonlinear Optimization Problem Solvers
Intel® MKL provides Nonlinear Optimization Problem Solver routines (see Chapter 14) that can be used to solve nonlinear least squares problems, with or without linear (bound) constraints, through the Trust-Region (TR) algorithms, and to compute the Jacobian matrix by central differences.

Support Functions
The Intel® MKL support functions (see Chapter 15) support the operation of the Intel MKL software and provide basic information on the library and library operation, such as the current library version, timing, setting and measuring of CPU frequency, error handling, and memory allocation. Starting with release 10.0, the Intel MKL support functions provide additional threading control. Starting with release 10.1, Intel MKL selectively supports a Progress Routine feature to track the progress of a lengthy computation and/or interrupt the computation using a callback function mechanism. The user application can define a function called mkl_progress that is regularly called from the Intel MKL routines that support the progress routine feature. See the Progress Routines section in Chapter 15 for reference. Refer to a specific LAPACK or DSS/PARDISO function description to see whether the function supports this feature.
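A user-defined progress routine might look like the following sketch. The prototype shown follows the Progress Routines description; the reporting format is illustrative only, and the exact calling details and override behavior should be confirmed against Chapter 15.

#include <stdio.h>

/* Linking this definition into the application is intended to override the
   default mkl_progress stub; routines that support the feature call it
   periodically with the current thread, stage name, and step count. */
int mkl_progress(int *thread, int *step, char *stage, int lstage) {
    printf("thread %d, stage %.*s, step %d\n", *thread, lstage, stage, *step);
    return 0;   /* return a non-zero value here to request interruption */
}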
BLACS Routines
The Intel® Math Kernel Library implements routines from the BLACS (Basic Linear Algebra Communication Subprograms) package (see Chapter 16). These routines support a linear-algebra-oriented message-passing interface that may be implemented efficiently and uniformly across a large range of distributed-memory platforms. The original versions of BLACS from which that part of Intel MKL was derived can be obtained from http://www.netlib.org/blacs/index.html. The authors of BLACS are Jack Dongarra and R. Clint Whaley.

Data Fitting Functions
The Data Fitting component includes a set of highly optimized implementations of algorithms for the following spline-based computations:
• spline construction
• interpolation, including computation of derivatives, and integration
• search
The algorithms operate on single- and double-precision vector-valued functions defined at the points of a given partition. You can use the Data Fitting algorithms in applications that are based on data approximation.

GMP Arithmetic Functions
The Intel® MKL implementation of the GMP* arithmetic functions includes arbitrary-precision arithmetic operations on integer numbers. The interfaces of these functions fully match the GNU Multiple Precision (GMP*) Arithmetic Library.
NOTE GMP Arithmetic Functions are deprecated and will be removed in a future Intel MKL release.

Performance Enhancements
The Intel® Math Kernel Library has been optimized by exploiting both processor and system features and capabilities. Special care has been given to those routines that profit most from cache-management techniques. These especially include matrix-matrix operation routines such as dgemm(). In addition, code optimization techniques have been applied to minimize scheduling dependencies between the integer and floating-point units within the processor. The major optimization techniques used throughout the library include:
• Loop unrolling to minimize loop management costs
• Blocking of data to improve data reuse opportunities
• Copying to reduce chances of data eviction from cache
• Data prefetching to help hide memory latency
• Multiple simultaneous operations (for example, dot products in dgemm) to eliminate stalls due to arithmetic unit pipelines
• Use of hardware features such as the SIMD arithmetic units, where appropriate
These are the techniques from which the arithmetic code benefits the most.
Parallelism
In addition to the performance enhancements discussed above, Intel® MKL offers performance gains through the parallelism provided by symmetric multiprocessing (SMP). You can obtain improvements from SMP in the following ways:
• One way is based on user-managed threads in the program and further distribution of the operations over the threads based on data decomposition, domain decomposition, control decomposition, or some other parallelizing technique. Each thread can use any of the Intel MKL functions (except for the deprecated ?lacon LAPACK routine) because the library has been designed to be thread-safe.
• Another method is to use the FFT and BLAS Level 3 routines. They have been parallelized and require no alterations of your application to gain the performance enhancements of multiprocessing. Performance using multiple processors on the Level 3 BLAS shows excellent scaling. Since the threads are called and managed within the library, the application does not need to be recompiled to be thread-safe (see also Fortran 95 Interface Conventions in Chapter 2).
• Yet another method is to use tuned LAPACK routines. Currently these include the single- and double-precision flavors of routines for QR factorization of general matrices, triangular factorization of general and symmetric positive-definite matrices, solving systems of equations with such matrices, as well as solving symmetric eigenvalue problems.
For instructions on setting the number of available processors for the BLAS Level 3 and LAPACK routines, see the Intel® MKL User's Guide.
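For example, a minimal sketch of querying and setting the number of threads with the threading-control support functions declared in mkl.h (see Chapter 15) might look as follows; the thread count chosen here is arbitrary.

#include <stdio.h>
#include "mkl.h"

int main(void) {
    /* Report the current maximum, then request up to 4 threads for
       subsequent threaded Intel MKL calls. */
    printf("default maximum threads: %d\n", mkl_get_max_threads());
    mkl_set_num_threads(4);
    printf("requested maximum threads: %d\n", mkl_get_max_threads());
    return 0;
}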
C Datatypes Specific to Intel MKL
The mkl_types.h file defines datatypes specific to Intel MKL:
• MKL_INT (MKL integer). Fortran type: INTEGER (default INTEGER). LP32 and LP64: C/C++ int, Fortran INTEGER*4 (4 bytes). ILP64: C/C++ long long (selected when the MKL_ILP64 macro is defined), Fortran INTEGER*8 (8 bytes).
• MKL_UINT (MKL unsigned integer). No Fortran equivalent. LP32 and LP64: C/C++ unsigned int (4 bytes). ILP64: C/C++ unsigned long long (8 bytes).
• MKL_LONG (MKL long integer). No Fortran equivalent. LP32: C/C++ long (4 bytes). LP64: C/C++ long (Windows*: 4 bytes; Linux*, Mac: 8 bytes). ILP64: C/C++ long (8 bytes).
• MKL_Complex8 (like C99 complex float). Fortran type: COMPLEX*8. 8 bytes in all data models.
• MKL_Complex16 (like C99 complex double). Fortran type: COMPLEX*16. 16 bytes in all data models.
You can redefine the datatypes specific to Intel MKL. One reason to do this is that you have your own types that are binary-compatible with the Intel MKL datatypes, that is, have the same representation or memory layout. To redefine a datatype, use one of these methods:
• Insert a #define statement redefining the datatype before the #include statement for the mkl.h header file. For example,
#define MKL_INT size_t
#include "mkl.h"
• Use the compiler -D option to redefine the datatype. For example, ...-DMKL_INT=size_t...
NOTE If you redefine Intel MKL datatypes, you are responsible for making sure that your definition is compatible with that of Intel MKL. Otherwise, the redefinition might cause unpredictable results or crash the application.
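To illustrate how these types appear in application code, here is a small sketch; it assumes the struct layout of MKL_Complex16 (double real and imag fields) declared in mkl_types.h, and the values are arbitrary.

#include <stdio.h>
#include "mkl_types.h"

int main(void) {
    MKL_INT n = 1024;                    /* 4 bytes under LP64, 8 under ILP64 */
    MKL_Complex16 alpha = {1.0, -2.0};   /* corresponds to 1.0 - 2.0i */

    printf("sizeof(MKL_INT) = %zu bytes\n", sizeof(n));
    printf("alpha = %f + %fi\n", alpha.real, alpha.imag);
    return 0;
}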
BLAS and Sparse BLAS Routines
This chapter describes the Intel® Math Kernel Library implementation of the BLAS and Sparse BLAS routines, and BLAS-like extensions. The routine descriptions are arranged in several sections:
• BLAS Level 1 Routines (vector-vector operations)
• BLAS Level 2 Routines (matrix-vector operations)
• BLAS Level 3 Routines (matrix-matrix operations)
• Sparse BLAS Level 1 Routines (vector-vector operations)
• Sparse BLAS Level 2 and Level 3 Routines (matrix-vector and matrix-matrix operations)
• BLAS-like Extensions
Each section presents the routine and function group descriptions in alphabetical order by routine or function group name; for example, the ?asum group, the ?axpy group. The question mark in the group name corresponds to different character codes indicating the data type (s, d, c, and z or their combination); see Routine Naming Conventions.
When BLAS or Sparse BLAS routines encounter an error, they call the error-reporting routine xerbla.
In the BLAS Level 1 groups i?amax and i?amin, an "i" is placed before the data-type indicator and corresponds to the index of an element in the vector. These groups are placed at the end of the BLAS Level 1 section.

BLAS Routines
Routine Naming Conventions
BLAS routine names have the following structure:
<character> <name> <mod> ( )
The <character> field indicates the data type:
s - real, single precision
c - complex, single precision
d - real, double precision
z - complex, double precision
Some routines and functions can have combined character codes, such as sc or dz. For example, the function scasum uses a complex input array and returns a real value.
The <name> field, in BLAS Level 1, indicates the operation type. For example, the BLAS Level 1 routines ?dot, ?rot, and ?swap compute a vector dot product, vector rotation, and vector swap, respectively. In BLAS Level 2 and 3, <name> reflects the matrix argument type:
ge - general matrix
gb - general band matrix
sy - symmetric matrix
sp - symmetric matrix (packed storage)
sb - symmetric band matrix
he - Hermitian matrix
hp - Hermitian matrix (packed storage)
hb - Hermitian band matrix
tr - triangular matrix
tp - triangular matrix (packed storage)
tb - triangular band matrix
The <mod> field, if present, provides additional details of the operation. BLAS Level 1 names can have the following characters in the <mod> field:
c - conjugated vector
u - unconjugated vector
g - Givens rotation construction
m - modified Givens rotation
mg - modified Givens rotation construction
BLAS Level 2 names can have the following characters in the <mod> field:
mv - matrix-vector product
sv - solving a system of linear equations with a single unknown vector
r - rank-1 update of a matrix
r2 - rank-2 update of a matrix
BLAS Level 3 names can have the following characters in the <mod> field:
mm - matrix-matrix product
sm - solving a system of linear equations with multiple unknown vectors
rk - rank-k update of a matrix
r2k - rank-2k update of a matrix
The examples below illustrate how to interpret BLAS routine names:
ddot - double-precision real vector-vector dot product
cdotc - complex vector-vector dot product, conjugated
scasum - sum of magnitudes of vector elements, single-precision real output and single-precision complex input
cdotu - vector-vector dot product, unconjugated, complex
sgemv - matrix-vector product, general matrix, single precision
ztrmm