Convolution Kernel Gradient Calculation
Key idea, in two wordings
- Multiplying the input patches by the derivatives of the output feature map gives us the derivative of the total loss with respect to the kernel.
- In short: the kernel gradient is found by summing the input patches that produced the outputs, each weighted by how much that output mattered (its upstream gradient).
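The summed-patches wording can be sketched directly in NumPy. This is a minimal illustration under my own assumptions (single channel, no padding, stride 1; the function name and shapes are hypothetical, not from the course):

```python
import numpy as np

def conv2d_kernel_grad(x, dout, kh, kw):
    """Gradient of the loss w.r.t. a (kh, kw) kernel for a stride-1,
    no-padding 2D convolution: sum each input patch, weighted by the
    upstream gradient of the output element it produced."""
    oh, ow = dout.shape
    dk = np.zeros((kh, kw))
    for i in range(oh):
        for j in range(ow):
            # patch x[i:i+kh, j:j+kw] produced output (i, j)
            dk += dout[i, j] * x[i:i+kh, j:j+kw]
    return dk
```

Note that this loop is itself a cross-correlation of the input with the output gradient, which is exactly the stride-1 observation below.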
Start with this:
NB!
- If the stride is 1, the output gradient tensor is applied to the input image directly as a convolution (cross-correlation). NB: the method shown in the example course Jupyter notebook works only with stride 1; with any other stride it does not.
- If the stride is 2 (or larger), the output gradient is first dilated (zeros inserted between its elements) and then applied to the input tensor in the same way.
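The dilation trick for stride > 1 can be sketched as follows. This is my own illustrative helper (single channel, no padding; names are hypothetical): inserting stride-1 zeros between the output-gradient entries reduces the problem back to the stride-1 case.

```python
import numpy as np

def dilate(dout, s):
    """Insert s-1 zeros between neighboring elements of dout."""
    oh, ow = dout.shape
    d = np.zeros((oh + (oh - 1) * (s - 1), ow + (ow - 1) * (s - 1)))
    d[::s, ::s] = dout
    return d

def conv_kernel_grad(x, dout, kh, kw, stride):
    """Kernel gradient for a strided, no-padding 2D convolution:
    dilate the output gradient, then apply the stride-1 procedure."""
    d = dilate(dout, stride)
    dh, dw = d.shape
    dk = np.zeros((kh, kw))
    for i in range(dh):
        for j in range(dw):
            dk += d[i, j] * x[i:i+kh, j:j+kw]
    return dk
```

With stride 1, `dilate` is a no-op and this reduces to the plain summed-patches formula; with stride 2, the inserted zeros skip exactly the patches that never contributed to any output.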