CUDA_LOOP
Demonstrate How CUDA Blocks and Threads Allocate Tasks

CUDA_LOOP is a FORTRAN90 library that demonstrates, by simulation, how the user's choice of CUDA blocks and threads determines how the user's tasks are distributed across the GPU.

A CUDA kernel "kernel()" is launched by a statement of the form

      kernel <<< blocks, threads >>> ( args )

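where blocks and threads each hold up to 3 values (in CUDA C, typically of type dim3, with unspecified values defaulting to 1), giving the number of blocks in each dimension of the grid and the number of threads in each dimension of a block.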
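As a rough illustration, here is a host-side C sketch of the launch geometry. The type dim3_sim and the helpers dim3_1d and total_processes are hypothetical names (dim3_sim stands in for CUDA's dim3) and are not part of CUDA_LOOP; the sketch only shows that the number of processes launched is the product of all the block and thread counts, with unspecified dimensions counting as 1.

```c
#include <assert.h>

/* Hypothetical stand-in for CUDA's dim3: up to three extents.
   In CUDA C, dimensions that are not specified default to 1. */
typedef struct {
    unsigned x, y, z;
} dim3_sim;

/* Mimic a 1D launch such as kernel<<<4, 8>>>:
   only x is given, so y and z default to 1. */
static dim3_sim dim3_1d(unsigned x) {
    dim3_sim d = { x, 1u, 1u };
    return d;
}

/* Total number of processes (threads) launched by
   kernel<<<blocks, threads>>>: the product of all six extents. */
static unsigned long total_processes(dim3_sim blocks, dim3_sim threads) {
    assert(blocks.x && blocks.y && blocks.z);    /* every extent >= 1 */
    assert(threads.x && threads.y && threads.z);
    return (unsigned long)blocks.x * blocks.y * blocks.z
         * threads.x * threads.y * threads.z;
}
```

For example, kernel <<< 4, 8 >>> ( args ) corresponds to total_processes ( dim3_1d ( 4 ), dim3_1d ( 8 ) ), which is 32.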

If a problem involves N tasks, the tasks are assigned to CUDA processes (threads) by a fixed rule, described below. Depending on N and the total number of processes, a given process may receive no task, one task, or several tasks.
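The counts follow from simple arithmetic. Assuming the strided rule described below (with P processes in total, process K handles tasks K, K+P, K+2P, ...), process K receives ceil((N-K)/P) tasks if K < N, and none otherwise. A minimal C sketch; the function name tasks_for_process is hypothetical:

```c
#include <assert.h>

/* Number of tasks that process k (0 <= k < p) receives when n tasks
   are dealt out with stride p: process k handles tasks
   k, k + p, k + 2p, ... as long as they are < n. */
static unsigned long tasks_for_process(unsigned long k, unsigned long p,
                                       unsigned long n) {
    assert(p > 0 && k < p);
    if (k >= n) {
        return 0;                   /* more processes than tasks */
    }
    return (n - k + p - 1) / p;     /* ceil((n - k) / p) */
}
```

With N = 10 and P = 4, processes 0 and 1 receive three tasks each and processes 2 and 3 receive two; with N = 3, process 3 receives none.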

Each process has access to built-in variables that it can use to determine which tasks to perform:

gridDim.x, gridDim.y, gridDim.z: the grid dimensions (number of blocks in each direction), as given by the user in "blocks";
blockDim.x, blockDim.y, blockDim.z: the block dimensions (number of threads in each direction), as given by the user in "threads";
blockIdx.x, blockIdx.y, blockIdx.z: the block indices for this process, counting from 0;
threadIdx.x, threadIdx.y, threadIdx.z: the thread indices for this process, counting from 0.
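The six index variables in the list above behave like the digits of a mixed-radix number whose radices are the block and grid dimensions. The following C sketch recovers threadIdx and blockIdx from a linear process number, with threadIdx.x varying fastest and blockIdx.z slowest, the same ordering as the linear index formula on this page. The names dim3_sim, process_ids and decode_process are hypothetical, not CUDA or CUDA_LOOP API.

```c
#include <assert.h>

/* Hypothetical stand-in for CUDA's dim3 / uint3. */
typedef struct { unsigned x, y, z; } dim3_sim;

/* The per-process built-in indices, as a simulation would see them. */
typedef struct {
    dim3_sim threadIdx;
    dim3_sim blockIdx;
} process_ids;

/* Recover threadIdx and blockIdx for linear process number k.
   blockDim and gridDim act as the radices; threadIdx.x is the
   fastest-varying digit and blockIdx.z the slowest. */
static process_ids decode_process(unsigned long k, dim3_sim blockDim,
                                  dim3_sim gridDim) {
    process_ids id;
    id.threadIdx.x = (unsigned)(k % blockDim.x); k /= blockDim.x;
    id.threadIdx.y = (unsigned)(k % blockDim.y); k /= blockDim.y;
    id.threadIdx.z = (unsigned)(k % blockDim.z); k /= blockDim.z;
    id.blockIdx.x  = (unsigned)(k % gridDim.x);  k /= gridDim.x;
    id.blockIdx.y  = (unsigned)(k % gridDim.y);  k /= gridDim.y;
    id.blockIdx.z  = (unsigned)k;
    /* Fails exactly when k was not less than the total process count. */
    assert(id.blockIdx.z < gridDim.z);
    return id;
}
```

With blockDim = (4,2,1) and gridDim = (3,1,1), process 13 decodes to threadIdx = (1,1,0) and blockIdx = (1,0,0).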

A process can then compute its linear index K, numbering from 0, as:

      K = threadIdx.x
        + blockDim.x * threadIdx.y
        + blockDim.x * blockDim.y * threadIdx.z
        + blockDim.x * blockDim.y * blockDim.z * blockIdx.x
        + blockDim.x * blockDim.y * blockDim.z * gridDim.x * blockIdx.y
        + blockDim.x * blockDim.y * blockDim.z * gridDim.x * gridDim.y * blockIdx.z
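The formula above can be written as a C function for experimenting on the host. This is a sketch: dim3_sim and linear_index are hypothetical names, and inside a real CUDA kernel the index variables are built-ins rather than arguments. Grouping the terms as (position within the block) + (threads per block) * (block number) gives the same six-term sum.

```c
#include <assert.h>

/* Hypothetical stand-in for CUDA's dim3 / uint3. */
typedef struct { unsigned x, y, z; } dim3_sim;

/* Linear index K of a process: threadIdx.x varies fastest,
   blockIdx.z slowest, exactly as in the six-term formula. */
static unsigned long linear_index(dim3_sim threadIdx, dim3_sim blockIdx,
                                  dim3_sim blockDim, dim3_sim gridDim) {
    assert(threadIdx.x < blockDim.x && threadIdx.y < blockDim.y &&
           threadIdx.z < blockDim.z);
    assert(blockIdx.x < gridDim.x && blockIdx.y < gridDim.y &&
           blockIdx.z < gridDim.z);

    /* Number of threads in one block. */
    unsigned long per_block =
        (unsigned long)blockDim.x * blockDim.y * blockDim.z;

    /* Position of this thread within its block (first three terms). */
    unsigned long thread_in_block = threadIdx.x
        + (unsigned long)blockDim.x * threadIdx.y
        + (unsigned long)blockDim.x * blockDim.y * threadIdx.z;

    /* Number of this block within the grid. */
    unsigned long block_in_grid = blockIdx.x
        + (unsigned long)gridDim.x * blockIdx.y
        + (unsigned long)gridDim.x * gridDim.y * blockIdx.z;

    /* per_block * block_in_grid expands to the last three terms. */
    return thread_in_block + per_block * block_in_grid;
}
```

For instance, with blockDim = (4,2,1) and gridDim = (3,1,1), the process with threadIdx = (1,1,0) and blockIdx = (1,0,0) has K = 1 + 4 + 8 = 13.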

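It then uses this index to step through the tasks. The total number of processes is P = blockDim.x * blockDim.y * blockDim.z * gridDim.x * gridDim.y * gridDim.z, and K takes each value 0, 1, ..., P-1 exactly once, so stepping by P means process K handles tasks K, K+P, K+2P, ... and every task is carried out exactly once:

      T = K
      while ( T < N )
        carry out task T
        T = T + P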
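To see which tasks a single process ends up with, the loop can be simulated directly in C. In this sketch, tasks_of is a hypothetical helper, p stands for the total number of processes (the six-fold product of block and grid dimensions in the loop above), and the task numbers are collected into a caller-supplied array.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate the strided loop for one process: start at task t = k,
   step by the total number of processes p while t < n.
   Stores the task numbers in out[] (capacity cap) and returns
   how many tasks the process was given. */
static size_t tasks_of(unsigned long k, unsigned long p, unsigned long n,
                       unsigned long *out, size_t cap) {
    size_t count = 0;
    assert(p > 0);
    for (unsigned long t = k; t < n; t += p) {
        assert(count < cap);        /* caller's array must be big enough */
        out[count++] = t;
    }
    return count;
}
```

With p = 4 and N = 10, process 1 is given tasks 1, 5 and 9.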

The CUDA_LOOP library shows how a specific choice of block and thread parameters determines the assignment of individual tasks to CUDA processes.
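A rough C analogue of such a simulation is sketched below. It is not a translation of cuda_loop.f90; simulate_cuda_loop and its arguments are hypothetical. It visits every (block, thread) pair, computes K in Horner form (algebraically the same as the six-term formula), runs the strided task loop, and records which process performs each task, optionally printing one line per assignment.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for CUDA's dim3. */
typedef struct { unsigned x, y, z; } dim3_sim;

/* For every process of the launch kernel<<<gridDim, blockDim>>>,
   walk the strided task loop and record in owner[t] the linear
   index K of the process that carries out task t (0 <= t < n). */
static void simulate_cuda_loop(dim3_sim gridDim, dim3_sim blockDim,
                               unsigned long n, unsigned long *owner,
                               int verbose) {
    unsigned long p = (unsigned long)gridDim.x * gridDim.y * gridDim.z
                    * blockDim.x * blockDim.y * blockDim.z;
    assert(p > 0);

    for (unsigned bz = 0; bz < gridDim.z; bz++)
    for (unsigned by = 0; by < gridDim.y; by++)
    for (unsigned bx = 0; bx < gridDim.x; bx++)
    for (unsigned tz = 0; tz < blockDim.z; tz++)
    for (unsigned ty = 0; ty < blockDim.y; ty++)
    for (unsigned tx = 0; tx < blockDim.x; tx++) {
        /* Horner form of the six-term formula for K. */
        unsigned long k = bz;
        k = k * gridDim.y + by;
        k = k * gridDim.x + bx;
        k = k * blockDim.z + tz;
        k = k * blockDim.y + ty;
        k = k * blockDim.x + tx;

        /* The strided loop: tasks k, k + p, k + 2p, ... below n. */
        for (unsigned long t = k; t < n; t += p) {
            owner[t] = k;
            if (verbose) {
                printf("block (%u,%u,%u) thread (%u,%u,%u) K=%lu task %lu\n",
                       bx, by, bz, tx, ty, tz, k, t);
            }
        }
    }
}
```

Because every process steps by the total process count P, the recorded owner of task T always works out to T mod P.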

Licensing:

The computer code and data files made available on this web page are distributed under the GNU LGPL license.

Languages:

CUDA_LOOP is available in C, C++, FORTRAN90, MATLAB, and Python versions.

Reference:

John Cheng, Max Grossman, Ty McKercher,
Professional CUDA C Programming,
John Wiley, 2014,
ISBN: 978-1-118-73932-7.

Jason Sanders, Edward Kandrot,
CUDA by Example,
Addison-Wesley, 2010,
ISBN: 978-0-13-138768-3,
LC: QA76.76.A65S255 2010.

Source Code:

cuda_loop.f90, the source code.

Examples and Tests:

cuda_loop_test.f90, a sample calling program.
cuda_loop_test.txt, the output file.

List of Routines:

CUDA_LOOP simulates the behavior of a CUDA loop.
TIMESTAMP prints the current YMDHMS date as a time stamp.


Last revised on 29 March 2018.