Thread block (CUDA programming)
CUDA is a parallel computing platform and programming model that higher-level languages can use to exploit parallelism. In CUDA, work is expressed as kernels, which are executed by threads. A kernel is a small program or function, and a thread is an abstract entity that represents one execution of that kernel. Multithreaded applications use many such threads, running at the same time, to organize parallel computation. Every thread has an index, which is used to calculate memory address locations and to make control decisions.
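The role of the thread index can be sketched with a minimal kernel (the names `addOne`, `data`, and `n` are illustrative, not part of the article):

```cuda
// Sketch of a CUDA kernel: each thread computes its own global index
// and uses it both to address memory and to make a control decision.
__global__ void addOne(float *data, int n)
{
    // Global index: block offset plus the thread's offset within its block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // control decision based on the thread index
        data[i] += 1.0f; // memory address derived from the thread index
}
```

Each thread executes the same function body; only the value of its index differs, which is what lets the threads cover different elements of the array.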
For better process and data mapping, threads are grouped into thread blocks. A thread block is a programming abstraction that represents a group of threads that can execute serially or in parallel. The number of threads per block is constrained by the available shared memory, and it is also limited by the architecture: to 512 threads per block on early devices, and to 1,024 on devices of compute capability 2.0 and later. All threads in the same thread block run on the same streaming multiprocessor. Threads in the same block can communicate with each other via shared memory, barrier synchronization, or other synchronization primitives such as atomic operations.
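Cooperation within a block can be illustrated with a hedged sketch (kernel and variable names are hypothetical; the block is assumed to be launched with 256 threads):

```cuda
// Sketch: threads in one block cooperate through shared memory,
// using a barrier to separate the write phase from the read phase.
__global__ void reverseBlock(float *data)
{
    __shared__ float tile[256];   // shared memory, visible to the whole block
    int t = threadIdx.x;

    tile[t] = data[t];            // each thread writes one element
    __syncthreads();              // barrier: wait until every write is done

    // Safe only because of the barrier: read an element written by another thread.
    data[t] = tile[blockDim.x - 1 - t];
}
```

Without the `__syncthreads()` barrier, a thread could read a `tile` slot before the owning thread had written it; this kind of ordering guarantee is exactly what intra-block synchronization provides, and it is not available between threads of different blocks.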
Multiple blocks are combined to form a grid. All the blocks in the same grid contain the same number of threads. Since the number of threads in a single block is limited, grids are used for computations that require a large number of thread blocks operating in parallel.
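Sizing a grid so that its blocks cover an entire input can be sketched as follows (the kernel `myKernel` and the device pointer `d_data` are assumed to exist; they are not defined in the article):

```cuda
// Sketch: launch enough 256-thread blocks to cover n elements.
int n = 1 << 20;                 // one million elements, far more than one block can hold
int threadsPerBlock = 256;
// Round up so the last, partially filled block is still launched.
int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;

myKernel<<<blocksPerGrid, threadsPerBlock>>>(d_data, n);
```

Because the rounded-up grid may launch a few more threads than there are elements, the kernel itself typically guards with a bounds check such as `if (i < n)` before touching memory.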