Triton Scalar Multiply Kernel

Problem Statement

Write a Triton kernel that multiplies every element of a 1D tensor by a runtime scalar: out = x * scale.

Background

Scalars can be passed as ordinary kernel arguments (no pointer needed) and used directly inside the kernel.
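
As a rough illustration of passing a scalar by value, the sketch below takes scale as a plain kernel argument next to the two tensor pointers. The names mul_kernel and run come from this problem. The names x_ptr, out_ptr, n_elements and BLOCK_SIZE, the block size of 256, the float32 dtype and the random input are assumptions made for the example, not requirements of the task.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def mul_kernel(x_ptr, out_ptr, scale, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Mask out-of-range lanes so n need not be a multiple of BLOCK_SIZE.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    # `scale` is an ordinary scalar argument: no pointer, no load.
    tl.store(out_ptr + offsets, x * scale, mask=mask)


def run(n=1024, scale=3.0):
    # Assumed input: random float32 values on the GPU.
    x = torch.randn(n, device="cuda", dtype=torch.float32)
    out = torch.empty_like(x)
    block_size = 256  # assumed block size for this sketch
    grid = (triton.cdiv(n, block_size),)
    mul_kernel[grid](x, out, scale, n, BLOCK_SIZE=block_size)
    return torch.allclose(out, x * scale)
```

Only x_ptr and out_ptr are dereferenced with tl.load and tl.store; scale and n_elements are used directly as values. BLOCK_SIZE is marked tl.constexpr because tl.arange needs compile-time bounds.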

Your Task

Implement the kernel mul_kernel and a function run(n=1024, scale=3.0) that returns whether the kernel's output matches x * scale.

How it is tested

Your solution must define a top-level function run(...) that allocates inputs on the GPU, launches your Triton kernel, and returns a boolean from torch.allclose(triton_out, torch_reference, ...). The grader prints run(...); the expected output is True.

The inputs are n = 1024 and scale = 3.0: n is the length of the 1D input tensor x, and scale is the scalar multiplier.
run launches mul_kernel, which multiplies every element of x by scale and writes the result to an output tensor out, so that $out_i = x_i \cdot scale$.
The output tensor is then compared with a reference tensor computed in PyTorch, whose elements are also $x_i \cdot scale$, using torch.allclose with a small tolerance.
The comparison returns True when every element of out is within tolerance of the corresponding reference element, which indicates that the Triton kernel produced the correct result.
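
To make "a small tolerance" concrete, here is a minimal sketch of the elementwise criterion that torch.allclose documents for finite values, |out - ref| <= atol + rtol * |ref|, using its default rtol=1e-5 and atol=1e-8. The helper within_tolerance and the CPU tensors good and bad are hypothetical stand-ins for a kernel output. The grader's actual tolerance is not stated here, so the defaults are an assumption.

```python
import torch


def within_tolerance(out, ref, rtol=1e-5, atol=1e-8):
    # Hypothetical helper mirroring the documented torch.allclose check
    # for finite values: |out - ref| <= atol + rtol * |ref|
    return bool(((out - ref).abs() <= atol + rtol * ref.abs()).all())


# CPU tensors stand in for the kernel output and the PyTorch reference.
x = torch.arange(1024, dtype=torch.float32)
ref = x * 3.0
good = ref.clone()   # matches the reference exactly
bad = ref.clone()
bad[0] += 1.0        # ref[0] is 0.0, so an error of 1.0 exceeds atol

print(within_tolerance(good, ref), torch.allclose(good, ref))
print(within_tolerance(bad, ref), torch.allclose(bad, ref))
```

A single element outside the tolerance is enough for the comparison to fail, so run returns True only when the whole output tensor is correct.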
