Thread block (CUDA programming)

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Ssdeshp5 (talk | contribs) at 03:55, 21 September 2016 (Added Introduction). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

CUDA is a parallel computing platform and programming model that higher-level languages can use to exploit parallelism. In CUDA, the kernel is executed with the aid of threads. A thread is an abstract entity that represents one execution of the kernel, and a kernel is a small program or function. Multithreaded applications use many such threads running at the same time to organize parallel computation. Every thread has an index, which is used for calculating memory address locations and for making control-flow decisions.
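As a minimal sketch of this idea (the kernel name, array, and scaling factor are hypothetical, not from the article), a kernel can use its thread index to select the element it operates on:

```cuda
// Each thread scales one element of the array, chosen by its thread index.
__global__ void scaleArray(float *data, float factor, int n)
{
    int i = threadIdx.x;   // index of this thread within its block
    if (i < n)             // guard threads whose index falls past the end
        data[i] *= factor;
}

// Hypothetical launch: one block of n threads (assumes n fits in a block):
// scaleArray<<<1, n>>>(d_data, 2.0f, n);
```

Because every thread runs the same kernel code, the index is what differentiates the work each thread performs.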

For better process and data mapping, threads are grouped into thread blocks. A thread block is a programming abstraction that represents a group of threads that can be executed serially or in parallel. The number of threads in a block varies with the available shared memory, and is also limited by the architecture to a total of 512 threads per block. The threads in the same thread block run on the same stream processor. Threads in the same block can communicate with each other via shared memory, barrier synchronization, or other synchronization primitives such as atomic operations.
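A sketch of intra-block communication (the kernel and array names are hypothetical): threads stage data in shared memory, synchronize at a barrier, and then read values written by other threads in the same block:

```cuda
// Reverse up to 512 elements within one block using shared memory.
__global__ void reverseInBlock(float *data, int n)
{
    __shared__ float tile[512];    // shared memory visible to all threads in this block
    int i = threadIdx.x;
    if (i < n)
        tile[i] = data[i];         // each thread stages its own element
    __syncthreads();               // barrier: wait until every thread has written
    if (i < n)
        data[i] = tile[n - 1 - i]; // read an element written by another thread
}
```

Without the barrier, a thread might read a shared-memory slot before the thread responsible for it has written, which is why barrier synchronization is listed among the block-level communication primitives.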

Multiple blocks are combined to form a grid. All the blocks in the same grid contain the same number of threads. Since the number of threads in a block is limited to 512, grids can be used for computations that require a large number of thread blocks to operate in parallel.
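To handle data sets larger than a single block allows, a common pattern (sketched here with hypothetical names) combines the block index and thread index into a global index across the grid:

```cuda
// Each thread computes a grid-wide index from its block and thread indices.
__global__ void scaleLarge(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // global index across all blocks
    if (i < n)                                     // the last block may be partly unused
        data[i] *= factor;
}

// Hypothetical launch: enough 512-thread blocks to cover n elements:
// int blocks = (n + 511) / 512;
// scaleLarge<<<blocks, 512>>>(d_data, 2.0f, n);
```

Rounding the block count up ensures every element is covered; the bounds check inside the kernel discards the surplus threads in the final block.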
