CUDA Scalar Multiply Kernel

Problem Statement

Multiply every element of a 1D array by a runtime scalar: out = x * scale.

Background

Scalars are passed as ordinary kernel arguments, with no device pointer needed, and are used directly inside the kernel.

Your Task

Implement mul_kernel and run(n=1024, scale=3.0); run returns whether the kernel's output equals x * scale.

How it is tested

Your solution must define a top-level function run(...) that allocates the inputs, copies them to the GPU, launches your @cuda.jit kernel, and returns a Python bool from np.allclose(gpu_result, reference). The grader prints run(...); the expected output is True.

The input values are n = 1024 and scale = 3.0: the length of the 1D array and the scalar multiplier, respectively.
A 1D array x of length n is created and passed to mul_kernel, which writes an output array out with $out_i = x_i \cdot scale$ for every index i.
The output is then compared against a reference array, also computed as $x \cdot scale$, using np.allclose to check for equality within a tolerance.
If mul_kernel multiplies every element by scale correctly, the comparison returns True, meaning the output array matches the reference array.
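
Because the check uses np.allclose rather than exact equality, small floating-point differences between the GPU result and the reference are tolerated. The following CPU-only sketch shows how that comparison behaves, assuming float32 data and NumPy's default tolerances (rtol=1e-05, atol=1e-08). The arrays close_result and wrong_result are hypothetical stand-ins for a GPU output, not data from the grader.

```python
import numpy as np

n, scale = 1024, 3.0
x = np.arange(n, dtype=np.float32)
reference = x * scale

# Stand-in for a correct GPU result that differs by a tiny rounding error.
close_result = reference * np.float32(1.0 + 1e-7)
# Stand-in for a wrong result: the scale was never applied.
wrong_result = x.copy()

# np.allclose passes when |a - b| <= atol + rtol * |b| holds for every element.
ok_close = bool(np.allclose(close_result, reference))
ok_wrong = bool(np.allclose(wrong_result, reference))
```

Here ok_close is True and ok_wrong is False, so a kernel that skips the multiplication would make run(...) return False.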
