GHOST can generate fast mathematical kernels for common usage scenarios. Which kernels are generated is decided at compile time. Calling the auto-generated kernels is transparent: GHOST first tries to find a suitable generated kernel and, if none exists, calls a fallback implementation.

Usage

The relevant build variables for code generation have the prefix GHOST_GEN. Currently, there are three variables: GHOST_GEN_DENSEMAT_DIM, GHOST_GEN_SELL_C, and GHOST_GEN_CUSELLSPMV. GHOST_GEN_DENSEMAT_DIM and GHOST_GEN_SELL_C are comma-separated lists of numbers. They should reflect commonly used widths of block vectors and chunk heights of SELL matrices, respectively. For example, if block vectors of width 4 and 8 occur and the matrix should be stored in SELL-32, set GHOST_GEN_DENSEMAT_DIM=4,8 and GHOST_GEN_SELL_C=32.

CUDA kernels are very sensitive to branches. For this reason, there is a fully templated version of the CUDA SELL SpMV in which all constant decisions regarding scaling, shifts, and dot computations are based on template parameters. In total, there are six boolean template parameters, which gives 2^6 = 64 possible instances. Generating all of them would result in a very long compile time. Thus, the combinations of those boolean values for which kernel code should be created can be selected at compile time via the variable GHOST_GEN_CUSELLSPMV.
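To illustrate why only a subset of instances is generated, the following Python sketch enumerates all combinations of six boolean parameters and checks a requested combination against a selected subset. The contents of `selected` and the helper `has_generated_kernel` are hypothetical; the sketch does not reproduce the format of GHOST_GEN_CUSELLSPMV or the meaning of the individual template parameters.

```python
from itertools import product

N_FLAGS = 6  # number of boolean template parameters named in the text

# Every possible instantiation: one tuple of six booleans each.
all_variants = list(product((False, True), repeat=N_FLAGS))

# Hypothetical selection: only these combinations would be compiled.
selected = {
    (False,) * N_FLAGS,                         # all flags off
    (True, False, False, False, False, False),  # only the first flag on
}

def has_generated_kernel(flags):
    """Return True if a kernel was generated for this flag combination."""
    return tuple(flags) in selected
```

Any combination outside the selected set would then be served by a fallback implementation instead of a generated kernel.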

Implementation

GHOST comes with a simple code generator consisting of several Perl scripts which are located in bin/. The main preprocessing script bin/ghost_pp.pl first substitutes placeholders with the values configured for code generation and writes the results to separate files. In a second step, the desired code lines are duplicated (GHOST_UNROLL). The generation of function variants works similarly to C++ function templates. However, C++ function templates cannot be used here because they would not work together with GHOST_UNROLL (e.g., if the unroll size depends on a template parameter).

In addition, bin/ghost_extractfunc.pl generates a header file containing the prototypes of all generated functions.

Lastly, the script bin/ghost_mapfunc.pl creates a .def file which should be included in a GHOST source file. In this .def file, all generated kernels are inserted into a map for easy lookup.
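The lookup idea can be sketched in Python as a dictionary of kernels with a fallback. This is only an illustration of the concept, not GHOST's C implementation: the key layout (chunk height, block vector width) and the names `make_kernel`, `fallback_kernel`, `KERNEL_MAP`, and `select_kernel` are assumptions made for this example.

```python
def make_kernel(chunk_height, width):
    """Stand-in for a kernel generated for one fixed configuration."""
    def kernel():
        return f"generated SELL-{chunk_height}, width {width}"
    return kernel

def fallback_kernel():
    """Stand-in for the generic implementation that handles any configuration."""
    return "fallback"

# Map as it might look for GHOST_GEN_SELL_C=32 and GHOST_GEN_DENSEMAT_DIM=4,8
# (assumed key: (chunk height, block vector width)).
KERNEL_MAP = {(32, w): make_kernel(32, w) for w in (4, 8)}

def select_kernel(chunk_height, width):
    """Prefer a generated kernel; otherwise return the fallback."""
    return KERNEL_MAP.get((chunk_height, width), fallback_kernel)
```

With this map, `select_kernel(32, 8)` returns a generated variant, whereas `select_kernel(16, 8)` falls through to the generic implementation.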

GHOST_SUBST

This macro is used for the generation of function variants. In GHOST, the generation of block vector kernels with fixed block sizes and the generation of SpMV kernels with fixed chunk heights for the SELL matrix are done with this mechanism.

Example: A file containing

would, together with GHOST_AUTOGEN_SPMMV=32,1;1,4, result in the generation of the following two files:

int func_32_1(void) {
    return 32+1;
}

int func_1_4(void) {
    return 1+4;
}
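The substitution step can be mimicked with a few lines of Python. This is a rough sketch under stated assumptions: Python's `str.format` placeholders `{a}` and `{b}` stand in for the placeholders used by GHOST_SUBST, and `TEMPLATE`, `parse_variants`, and `generate` are hypothetical names, not part of bin/ghost_pp.pl.

```python
# Hypothetical template; doubled braces are literal braces in str.format.
TEMPLATE = "int func_{a}_{b}(void) {{\n    return {a}+{b};\n}}\n"

def parse_variants(value):
    """Split a value such as '32,1;1,4' into [(32, 1), (1, 4)]."""
    return [tuple(int(n) for n in group.split(","))
            for group in value.split(";")]

def generate(value):
    """Return one generated source text per configured variant."""
    return [TEMPLATE.format(a=a, b=b) for a, b in parse_variants(value)]
```

Calling `generate("32,1;1,4")` yields two source texts, one defining `func_32_1` and one defining `func_1_4`, matching the two files listed above.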

GHOST_UNROLL

This macro is used in the intrinsics implementation of compute kernels. The code line to be duplicated has to start with #GHOST_UNROLL#, followed by the actual code on the same line. Every position that should be replaced with a running index has to be marked with an "@" sign. The code has to be followed by another "#" and the unroll size.

Example:

#GHOST_UNROLL#int bla@ = @*4;#4

would result in the following code after preprocessing:

int bla0 = 0*4;
int bla1 = 1*4;
int bla2 = 2*4;
int bla3 = 3*4;
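The expansion rule can be expressed as a short Python function. This is a minimal sketch of the behavior described above, not the Perl code in bin/ghost_pp.pl; the names `UNROLL_RE` and `expand_unroll` are hypothetical, and lines that do not match the pattern are assumed to pass through unchanged.

```python
import re

# "#GHOST_UNROLL#" + code + "#" + unroll size, all on one line.
# The greedy (.*) makes the last "#" on the line the size separator.
UNROLL_RE = re.compile(r"^#GHOST_UNROLL#(.*)#(\d+)\s*$")

def expand_unroll(line):
    """Return the list of lines that replace one input line."""
    match = UNROLL_RE.match(line)
    if match is None:
        return [line]  # not an unroll line: keep it as it is
    code, count = match.group(1), int(match.group(2))
    # Each copy gets every "@" replaced by its running index.
    return [code.replace("@", str(i)) for i in range(count)]
```

Applied to the example line, `expand_unroll("#GHOST_UNROLL#int bla@ = @*4;#4")` returns the four lines shown above.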