Write a Triton kernel that multiplies every element of a 1D tensor by a runtime scalar: out = x * scale.
Scalars can be passed as ordinary kernel arguments (no pointer needed) and used directly inside the kernel.
Implement mul_kernel and a function run(n=1024, scale=3.0) that returns whether the kernel's output matches x * scale.
Your solution must define a top-level function run(...) that allocates inputs on the GPU, launches your Triton kernel, and returns a boolean from torch.allclose(triton_out, torch_reference, ...). The grader prints run(...); the expected output is True.
Example input: n = 1024, scale = 3.0
Expected output: True
The inputs are n = 1024 and scale = 3.0: n sets the length of the 1D input tensor x, and scale is the scalar multiplier.
The mul_kernel function is launched and multiplies every element of x by scale, so each element of the output tensor is out[i] = x[i] * scale.
The output tensor out is then compared with a PyTorch reference, whose elements are also x[i] * scale, using torch.allclose with a small tolerance.
The result is True if every element of out is close to the corresponding element of the reference, which indicates that the Triton kernel produced the correct result.
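The flow described above can be sketched as follows. This is one possible shape for a solution, not a reference answer: the problem fixes only the names mul_kernel and run. The argument names x_ptr, out_ptr and n_elements, the block size of 256, the float32 dtype, the random input and the atol value are all assumptions made for illustration. It needs a CUDA-capable GPU with Triton installed.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def mul_kernel(x_ptr, out_ptr, n_elements, scale, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Mask off lanes past the end when n is not a multiple of BLOCK_SIZE.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    # `scale` arrives as a plain scalar argument, so it is used directly.
    tl.store(out_ptr + offsets, x * scale, mask=mask)


def run(n=1024, scale=3.0):
    # Assumed input: random float32 data allocated on the GPU.
    x = torch.randn(n, device="cuda", dtype=torch.float32)
    out = torch.empty_like(x)
    BLOCK_SIZE = 256  # illustrative choice, not given in the problem
    grid = (triton.cdiv(n, BLOCK_SIZE),)  # one program per block
    mul_kernel[grid](x, out, n, scale, BLOCK_SIZE=BLOCK_SIZE)
    # Compare against the PyTorch reference with a small tolerance.
    return torch.allclose(out, x * scale, atol=1e-6)


if __name__ == "__main__":
    print(run())
```

Note that x and out are passed as tensors and become pointers inside the kernel, while n and scale are passed as ordinary Python numbers. The mask keeps the kernel correct for lengths that are not a multiple of the block size.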