CUDA Shared Memory Examples
CUDA shared memory is a small amount of fast, on-chip memory shared among the threads of a single thread block. It has much lower latency and higher bandwidth than global memory, but its capacity is limited (on the order of tens of kilobytes per streaming multiprocessor, depending on the GPU architecture). Here are some examples of how to use shared memory in CUDA programs.
Usage Examples
Example 1: Array Sum
- In this example, we calculate the sum of an array, using shared memory to accumulate per-block partial sums so that each block performs only a single write to global memory.
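A minimal sketch of such a kernel (the name arraySum and the block size of 256 are illustrative assumptions; it also assumes *out is zero-initialized and blockDim.x is a power of two):

```cuda
// Each block stages one element per thread in shared memory, reduces the
// block's slice to a single partial sum, then adds it to the global result.
__global__ void arraySum(const float *in, float *out, int n) {
    __shared__ float partial[256];          // one slot per thread in the block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    partial[tid] = (i < n) ? in[i] : 0.0f;  // stage one element per thread
    __syncthreads();                        // all loads must finish first

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();                    // wait between reduction steps
    }

    if (tid == 0)
        atomicAdd(out, partial[0]);         // one global write per block
}
```

A launch such as `arraySum<<<(n + 255) / 256, 256>>>(d_in, d_out, n)` would produce the full sum in `*d_out` after one kernel call, at the cost of one atomic per block.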
Example 2: Matrix Multiplication
- This example demonstrates tiled matrix multiplication: each block stages tiles of the input matrices in shared memory, so every global element is loaded once per tile instead of once per output element.
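One common form of this kernel is sketched below (matMulTiled, the tile width of 16, and the restriction to square matrices with n a multiple of the tile width are all simplifying assumptions for illustration):

```cuda
#define TILE 16

// Computes C = A * B for square n x n row-major matrices.
// Each block produces one TILE x TILE tile of C, staging the matching
// tiles of A and B in shared memory so each global element is read
// n / TILE times instead of n times.
__global__ void matMulTiled(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {   // assumes n % TILE == 0
        // Cooperative load: one element of each tile per thread.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();                   // tiles fully loaded

        for (int k = 0; k < TILE; ++k)     // partial dot product from the tile
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                   // done reading before the next load
    }
    C[row * n + col] = acc;
}
```

The two `__syncthreads()` calls matter: the first guarantees the tile is complete before any thread reads it, and the second prevents a thread from overwriting a tile that a neighbor is still reading.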
Example 3: Reduction
- The reduction example shows the general pattern behind the array sum: each block collapses its slice of the input to a single value in shared memory, and the per-block results are combined in a second step.
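The same pattern works for any associative operator; the sketch below uses max (reduceMax is an illustrative name, and it assumes blockDim.x is a power of two). It uses dynamically sized shared memory, declared with `extern __shared__` and sized in the launch configuration:

```cuda
#include <cfloat>   // FLT_MAX

// Two-level reduction: each block reduces its slice to one value in shared
// memory and writes it to blockResults; that smaller array is then reduced
// on the host or by a second kernel launch.
__global__ void reduceMax(const float *in, float *blockResults, int n) {
    extern __shared__ float sdata[];       // sized at launch: blockDim.x floats
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (i < n) ? in[i] : -FLT_MAX;  // identity for max
    __syncthreads();

    // Sequential addressing: thread tid combines sdata[tid] and sdata[tid + s],
    // which keeps active threads contiguous and avoids bank conflicts.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    if (tid == 0)
        blockResults[blockIdx.x] = sdata[0];
}
```

A launch would pass the shared-memory size as the third configuration argument, e.g. `reduceMax<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_partial, n)`.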
Performance Benefits
- Shared memory provides a significant speedup over global memory: its access latency is on the order of tens of cycles versus hundreds for an uncached global load, and it does not consume global memory bandwidth.
- By staging data in shared memory, threads in a block can cooperatively load each value once and reuse it many times, reducing the number of global memory transactions your kernel issues.
For more examples and in-depth tutorials on CUDA shared memory, visit our CUDA Shared Memory Guide.