Convolution Kernel Gradient Calculation
Key idea, in two wordings
- Multiplying the input patches by the derivatives of the output feature map gives us the derivative of the total loss with respect to the kernel.
- In short: the kernel gradient is found by summing the input patches that produced the outputs, each weighted by how much that output mattered (its upstream gradient).
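The summed-patches wording can be sketched directly in NumPy. This is a minimal illustration under my own assumptions (single channel, no padding, stride 1; the function name and shapes are hypothetical, not from the course):

```python
import numpy as np

def conv2d_kernel_grad(x, dout, kh, kw):
    """Gradient of the loss w.r.t. a (kh, kw) kernel for a stride-1,
    no-padding 2D convolution: sum each input patch, weighted by the
    upstream gradient of the output element it produced."""
    oh, ow = dout.shape
    dk = np.zeros((kh, kw))
    for i in range(oh):
        for j in range(ow):
            # patch x[i:i+kh, j:j+kw] produced output (i, j)
            dk += dout[i, j] * x[i:i+kh, j:j+kw]
    return dk
```

Note that this loop is itself a cross-correlation of the input with the output gradient, which is exactly the stride-1 observation below.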
Start with this:
NB!
- If the stride is 1, the output gradient tensor is applied to the input image directly as a convolution (cross-correlation). NB: the method shown in the example course Jupyter notebook works only with stride 1; with any other stride it does not.
- If the stride is 2 (or larger), the output gradient is first dilated (zeros inserted between its elements) and then applied to the input tensor in the same way.
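The dilation trick for stride > 1 can be sketched as follows. This is my own illustrative helper (single channel, no padding; names are hypothetical): inserting stride-1 zeros between the output-gradient entries reduces the problem back to the stride-1 case.

```python
import numpy as np

def dilate(dout, s):
    """Insert s-1 zeros between neighboring elements of dout."""
    oh, ow = dout.shape
    d = np.zeros((oh + (oh - 1) * (s - 1), ow + (ow - 1) * (s - 1)))
    d[::s, ::s] = dout
    return d

def conv_kernel_grad(x, dout, kh, kw, stride):
    """Kernel gradient for a strided, no-padding 2D convolution:
    dilate the output gradient, then apply the stride-1 procedure."""
    d = dilate(dout, stride)
    dh, dw = d.shape
    dk = np.zeros((kh, kw))
    for i in range(dh):
        for j in range(dw):
            dk += d[i, j] * x[i:i+kh, j:j+kw]
    return dk
```

With stride 1, `dilate` is a no-op and this reduces to the plain summed-patches formula; with stride 2, the inserted zeros skip exactly the patches that never contributed to any output.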