Add one to every element of a 1D array, out = x + 1, with a length that is not a multiple of the block size.
When n isn't a multiple of blockDim.x, the last block has threads whose global index runs past the end of the array. The if i < n guard is what stops those threads from writing out of bounds.
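The index arithmetic behind that guard can be checked on the CPU before any kernel is written. The following is a minimal sketch in plain Python, not Numba code: the name simulate_inc and the block size of 256 are assumptions made for illustration, and the two nested loops stand in for the blocks and threads that a GPU would run in parallel.

```python
def simulate_inc(x, threads_per_block=256):
    """CPU stand-in for a guarded 1D launch (hypothetical helper, not a Numba API)."""
    n = len(x)
    # Round up so every element is covered by some thread.
    blocks = (n + threads_per_block - 1) // threads_per_block
    out = [0.0] * n
    idle = 0  # threads whose global index falls past the end of the array
    for block in range(blocks):
        for thread in range(threads_per_block):
            i = block * threads_per_block + thread  # global index
            if i < n:  # the bounds guard
                out[i] = x[i] + 1
            else:
                idle += 1  # without the guard, this would be an out-of-bounds write
    return out, blocks, idle


x = [float(v) for v in range(1000)]
out, blocks, idle = simulate_inc(x)
print(blocks, idle)  # 4 blocks of 256 threads = 1024 threads, so 24 are idle
```

With n = 1000 and an assumed block size of 256, the launch rounds up to 4 blocks, and the last 24 threads of the final block are the ones the guard has to stop.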
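Implement inc_kernel and run(n=1000), returning whether the result matches the reference. The length 1000 is deliberately not a power of two, so it is not a multiple of typical block sizes such as 128 or 256.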
Your solution must define a top-level function run(...) that allocates the inputs, copies them to the GPU, launches your @cuda.jit kernel, and returns a Python bool from np.allclose(gpu_result, reference). The grader prints run(...); the expected output is True.
Example input: n = 1000
Expected output: True
n = 1000 is used to create a 1D array of length 1000, with each element initialized to a value.
The inc_kernel function is launched with this array, adding 1 to each element using the formula out = x + 1, with the bounds check if i < n preventing out-of-bounds writes.
The result is compared against the reference with np.allclose, which tests for equality within a small floating-point tolerance rather than bit-for-bit equality.
run returns True, indicating that the output of the CUDA kernel matches the reference; this is the final output.
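To make the tolerance in that comparison concrete, here is a rough sketch of the element-wise criterion that np.allclose documents, |a - b| <= atol + rtol * |b|, with its default rtol of 1e-05 and atol of 1e-08. The helper name close_enough is hypothetical, and the sketch ignores NaN/inf handling and broadcasting, which the real function also deals with.

```python
def close_enough(result, reference, rtol=1e-05, atol=1e-08):
    """Plain-Python sketch of the documented np.allclose test (hypothetical helper)."""
    return all(
        abs(a - b) <= atol + rtol * abs(b)
        for a, b in zip(result, reference)
    )


reference = [float(v) + 1 for v in range(1000)]

exact = list(reference)   # identical values
nudged = list(reference)
nudged[0] += 1e-7         # inside the tolerance for a value near 1.0
wrong = list(reference)
wrong[0] += 1e-3          # well outside the tolerance

print(close_enough(exact, reference))
print(close_enough(nudged, reference))
print(close_enough(wrong, reference))
```

Under this criterion a result that differs from the reference by a tiny rounding error still passes, so a True from run means "equal within tolerance", not necessarily bit-identical.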