2D Image Convolution

Implement a 2D image convolution operation in valid mode, which involves sliding a kernel over an image to generate a feature map. This process is fundamental in Computer Vision as it enables the extraction of relevant features from images by applying a set of learnable filters.

The concept of convolution is based on the idea of scanning an image with a smaller matrix, known as a kernel, to compute feature values at each position. The kernel is slid over the entire image, and at each position, the element-wise product between the kernel and the overlapping image region is computed, followed by summation of these products. The resulting feature map has a size of $(H - kH + 1) \times (W - kW + 1)$ , where $H$ and $W$ are the dimensions of the image, and $kH$ and $kW$ are the dimensions of the kernel.

Here are the steps to perform the convolution:

Initialize an empty output matrix with dimensions $(H - kH + 1) \times (W - kW + 1)$ .
Slide the kernel over the image, scanning each valid position.
At each position, compute the element-wise product between the kernel and the overlapping image region.
Sum up the products to obtain the feature value at that position.

\text{Output}(i, j) = \sum_{x=0}^{kH-1} \sum_{y=0}^{kW-1} \text{Image}(i+x, j+y) \times \text{Kernel}(x, y)

This technique is widely used in image processing and analysis applications.

Example:

Input:

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

Output:

[[45.0]]

Reasoning:

The image size is $3 \times 3$ and the kernel size is $3 \times 3$ , so the output size will be $(3 - 3 + 1) \times (3 - 3 + 1) = 1 \times 1$ .
To compute the single output value, we calculate the sum of element-wise products between the kernel and the overlapping image region: $(1 \cdot 1) + (2 \cdot 1) + (3 \cdot 1) + (4 \cdot 1) + (5 \cdot 1) + (6 \cdot 1) + (7 \cdot 1) + (8 \cdot 1) + (9 \cdot 1) = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 45$ .
The result $45$ is already an integer, so rounding to 4 decimal places yields $45.0$ .
The final output is $[[45.0]]$ .

Constraints:

image and kernel are 2D lists of numbers
kernel fits within the image
Return 2D list of convolution results rounded to 4 decimal places

Example:

Constraints:

Test Results