CUDA Bounds-Checked Increment

Problem Statement

Add one to every element of a 1D array (out = x + 1) whose length n is not a multiple of the block size.

Background

When n isn't a multiple of blockDim.x, the grid has to be rounded up to ceil(n / blockDim.x) blocks to cover every element, so the last block contains threads whose global index i is n or greater. The if i < n guard stops those threads from reading or writing out of bounds.
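The rounding described above can be checked with plain integer arithmetic. The sketch below is only illustrative: launch_shape is a hypothetical helper, and the block size of 256 is an assumption, since the problem does not fix one.

```python
def launch_shape(n: int, threads_per_block: int) -> tuple[int, int]:
    """Return (blocks, surplus_threads) for a 1D launch that covers n elements."""
    # Ceiling division: enough blocks so that blocks * threads_per_block >= n.
    blocks = (n + threads_per_block - 1) // threads_per_block
    # Threads in the tail block whose global index i satisfies i >= n.
    surplus = blocks * threads_per_block - n
    return blocks, surplus


print(launch_shape(1000, 256))  # (4, 24)
print(launch_shape(1024, 256))  # (4, 0)
```

With n = 1000 and 256 threads per block, 4 blocks launch 1024 threads, so 24 threads in the last block must be turned away by the guard.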

Your Task

Implement inc_kernel and run(n=1000), where n is deliberately not a multiple of the block size, and have run return whether the result matches the reference.

How it is tested

Your solution must define a top-level function run(...) that allocates the inputs, copies them to the GPU, launches your @cuda.jit kernel, and returns a Python bool from np.allclose(gpu_result, reference). The grader prints run(...); the expected output is True.

Example:

Input:

n = 1000

Output:

True

Reasoning:

The input n = 1000 sets the length of the 1D input array x; run chooses the initial values.
inc_kernel is launched on this array and computes out = x + 1 for each element, with the bounds check if i < n keeping the surplus threads in the last block from writing out of bounds.
The result is compared with a reference array (each original value plus 1) using np.allclose, which tests agreement within a floating-point tolerance rather than bit-for-bit equality.
The comparison yields True, meaning the kernel's output matches the reference within that tolerance; this is the final output.
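Because the check relies on np.allclose, it may help to see how a tolerance-based comparison differs from strict equality. The snippet below is a small illustration with made-up values: the reference array and the 1e-9 perturbation are assumptions, not part of the grader.

```python
import numpy as np

reference = np.arange(1000, dtype=np.float64) + 1.0   # assumed reference values
nudged = reference + 1e-9                             # hypothetical tiny error

# np.allclose passes when |a - b| <= atol + rtol * |b| (defaults: rtol=1e-5, atol=1e-8).
within_tolerance = bool(np.allclose(nudged, reference))
# np.array_equal requires every element to compare equal.
bit_exact = bool(np.array_equal(nudged, reference))

print(within_tolerance, bit_exact)  # True False
```

So a True from np.allclose shows agreement within tolerance; np.array_equal would be the stricter check if exact equality were required.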

Constraints:

out[i] = x[i] + 1.0
The if i < x.size guard (x.size equals n) must protect the tail block
n is not a multiple of the block size
