N-D convolution backpropagation

For my own education, I am trying to implement an N-dimensional convolutional layer in a convolutional neural network.

I would like to implement a backpropagation function. However, I am not sure of the most efficient way to do so.

At present, I am using signal.fftconvolve to:

In the forward step, convolve the input with each kernel, over all filters;
In the backpropagation step, convolve the derivatives (reversed in all dimensions with the FlipAllAxes function) with the array ( https://jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/ ) over all filters and sum them. The output I take to be the sum of each image convolved with each derivative for each filter.
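The role of flipping every axis can be illustrated with a small check: convolving with a kernel reversed along all axes gives the same result as cross-correlating with the original kernel. This is a minimal sketch, not the layer from the post; `flip_all_axes` is a standalone stand-in for the FlipAllAxes method, and the array shapes are arbitrary choices for illustration.

```python
import numpy as np
from scipy import signal


def flip_all_axes(array):
    """Reverse an array along every axis (stand-in for FlipAllAxes)."""
    return array[(slice(None, None, -1),) * array.ndim]


rng = np.random.default_rng(0)
image = rng.standard_normal((6, 7, 5))   # an arbitrary 3-D input
kernel = rng.standard_normal((3, 3, 3))  # a single 3-D filter

# FFT convolution with the flipped kernel ...
conv_flipped = signal.fftconvolve(image, flip_all_axes(kernel), mode="valid")
# ... should agree with direct cross-correlation with the original kernel.
corr = signal.correlate(image, kernel, mode="valid", method="direct")

flip_matches_correlation = np.allclose(conv_flipped, corr)
```

The same identity is what lets an FFT-based convolution routine stand in for the correlations that appear in the backward pass.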

I am particularly confused about how to convolve the derivatives. Using the class below to backpropagate leads to an explosion in the size of the weights.

What is the correct way to program the convolution of the derivative with the output and the filters?

EDIT:

According to this paper ( Fast Training of Convolutional Networks through FFTs ), which seeks to do exactly what I want to do:

The derivatives for the previous layer are given by the convolution of the derivatives of the current layer with the weights:

dL / dy_f = dL / dx * w_f ^ T

The derivative for the weights is the sum of the convolutions of the derivatives with the original input:

dL / dw_f = dL / dx * x
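For comparison, here is a minimal sketch of the two gradients for one N-D input and one filter, checked against finite differences. It rests on assumptions that the post does not state: the forward pass is a "valid"-mode convolution (the post uses "same" mode), and w_f^T is read as the filter reversed along every axis rather than a matrix transpose. The names `forward`, `backward`, and `numerical_grad` are illustrative only.

```python
import numpy as np
from scipy import signal


def flip_all_axes(a):
    """Reverse an array along every axis."""
    return a[(slice(None, None, -1),) * a.ndim]


def forward(x, w):
    # Assumed forward pass: "valid" convolution of an N-D input with one filter.
    return signal.fftconvolve(x, w, mode="valid")


def backward(x, w, d_out):
    # Gradient w.r.t. the input: "full" convolution of the upstream
    # gradient with the filter flipped along every axis.
    d_x = signal.fftconvolve(d_out, flip_all_axes(w), mode="full")
    # Gradient w.r.t. the filter: "valid" convolution of the flipped
    # input with the upstream gradient; the result has the filter's shape.
    d_w = signal.fftconvolve(flip_all_axes(x), d_out, mode="valid")
    return d_x, d_w


def numerical_grad(f, a, eps=1e-6):
    """Central finite differences of the scalar f() with respect to array a."""
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        old = a[idx]
        a[idx] = old + eps
        plus = f()
        a[idx] = old - eps
        minus = f()
        a[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 3))
w = rng.standard_normal((2, 3, 2))
d_out = rng.standard_normal(forward(x, w).shape)


def loss():
    # Scalar loss chosen so that dL/d(output) equals d_out.
    return float(np.sum(forward(x, w) * d_out))


d_x, d_w = backward(x, w, d_out)
grads_match = (np.allclose(d_x, numerical_grad(loss, x), atol=1e-5)
               and np.allclose(d_w, numerical_grad(loss, w), atol=1e-5))
```

In this sketch the weight gradient comes out with exactly the filter's shape, so no windowing or extra summation is needed. A gradient check of this kind is also a quick way to test any other backward-pass implementation.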

I have implemented this, to the best of my knowledge, below. However, it does not seem to give the intended result, as the network I have written using this layer shows wild fluctuations during training.

    import numbers

    import numpy as np
    from numpy.lib.stride_tricks import as_strided
    from scipy import signal


    class ConvNDLayer:
        def __init__(self, channels, kernel_size, dim):
            self.channels = channels
            self.kernel_size = kernel_size
            self.dim = dim

            self.last_input = None

            # Filter bank of shape (channels, kernel_size, ..., kernel_size)
            self.filt_dims = np.ones(dim + 1).astype(int)
            self.filt_dims[1:] = self.filt_dims[1:] * kernel_size
            self.filt_dims[0] = self.filt_dims[0] * channels
            self.filters = np.random.randn(*self.filt_dims) / (kernel_size) ** dim

        def FlipAllAxes(self, array):
            # Reverse the array along every axis
            sl = slice(None, None, -1)
            return array[tuple([sl] * array.ndim)]

        def Convolve(self, array, kernel):
            # Assumed helper (not shown in the post): "same"-mode FFT convolution,
            # matching the output shape allocated in Forward
            return signal.fftconvolve(array, kernel, mode="same")

        def ViewAsWindows(self, array, window_shape, step=1):
            # -- basic checks on arguments
            if not isinstance(array, np.ndarray):
                raise TypeError("`array` must be a NumPy ndarray")
            ndim = array.ndim
            if isinstance(window_shape, numbers.Number):
                window_shape = (window_shape,) * ndim
            if not (len(window_shape) == ndim):
                raise ValueError("`window_shape` is incompatible with `arr_in.shape`")

            if isinstance(step, numbers.Number):
                if step < 1:
                    raise ValueError("`step` must be >= 1")
                step = (step,) * ndim
            if len(step) != ndim:
                raise ValueError("`step` is incompatible with `arr_in.shape`")

            arr_shape = np.array(array.shape)
            window_shape = np.asarray(window_shape, dtype=arr_shape.dtype)

            if ((arr_shape - window_shape) < 0).any():
                raise ValueError("`window_shape` is too large")

            if ((window_shape - 1) < 0).any():
                raise ValueError("`window_shape` is too small")

            # -- build rolling window view
            slices = tuple(slice(None, None, st) for st in step)
            window_strides = array.strides
            indexing_strides = array[slices].strides
            win_indices_shape = ((arr_shape - window_shape) // np.array(step)) + 1

            new_shape = tuple(list(win_indices_shape) + list(window_shape))
            strides = tuple(list(indexing_strides) + list(window_strides))

            arr_out = as_strided(array, shape=new_shape, strides=strides)

            return arr_out

        def UnrollAxis(self, array, axis):
            # This so it works with a single dimension or a sequence of them
            axis = tuple(int(a) for a in np.atleast_1d(axis))
            axis2 = tuple(range(len(axis)))

            # Put unrolled axes at the beginning
            array = np.moveaxis(array, axis, axis2)
            # Unroll
            return array.reshape((-1,) + array.shape[len(axis):])

        def Forward(self, array):
            # One output map per filter, each with the same shape as the input
            output = np.zeros((self.channels,) + array.shape)

            self.last_input = array

            for i, kernel in enumerate(self.filters):
                conv = self.Convolve(array, kernel)
                output[i] = conv

            return output

        def Backprop(self, d_L_d_out, learn_rate):
            d_A = np.zeros_like(self.last_input)
            d_W = np.zeros_like(self.filters)

            for i, (kernel, d_L_d_out_f) in enumerate(zip(self.filters, d_L_d_out)):
                # Input gradient: derivative convolved with the transposed kernel
                d_A += signal.fftconvolve(d_L_d_out_f, kernel.T, "same")
                # Filter gradient: derivative convolved with the input, then
                # summed over every kernel-sized window of the result
                conv = signal.fftconvolve(d_L_d_out_f, self.last_input, "same")
                conv = self.ViewAsWindows(conv, kernel.shape)
                axes = np.arange(kernel.ndim)
                conv = self.UnrollAxis(conv, axes)
                d_W[i] = np.sum(conv, axis=0)

            output = d_A * learn_rate
            self.filters = self.filters - d_W * learn_rate
            return output

— Jack Rolph

Multiplying the gradient by learn_rate is usually not enough.

For better performance and to reduce heavy fluctuations, the gradients are usually rescaled by an optimizer, for example by dividing by a running root-mean-square of recent gradients (RMSprop).
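As a rough illustration of that idea, here is a minimal RMSprop sketch in NumPy. The class name `RMSProp`, its `step` method, and the default hyperparameters are assumptions made for this example, not part of the layer in the question.

```python
import numpy as np


class RMSProp:
    """Minimal RMSprop sketch: scale each gradient entry by a running RMS."""

    def __init__(self, learn_rate=1e-3, decay=0.9, eps=1e-8):
        self.learn_rate = learn_rate
        self.decay = decay      # weight given to the previous running average
        self.eps = eps          # guards against division by zero
        self.mean_sq = None     # running average of squared gradients

    def step(self, params, grad):
        if self.mean_sq is None:
            self.mean_sq = np.zeros_like(params)
        self.mean_sq = self.decay * self.mean_sq + (1 - self.decay) * grad ** 2
        return params - self.learn_rate * grad / (np.sqrt(self.mean_sq) + self.eps)


# Two gradient entries that differ by six orders of magnitude
# produce steps of nearly the same size.
optimizer = RMSProp(learn_rate=0.01)
updated = optimizer.step(np.zeros(2), np.array([1000.0, 0.001]))
```

In a layer like the one above, the line that subtracts `d_W*learn_rate` would be replaced by a call such as `optimizer.step(self.filters, d_W)`, so that unusually large gradients no longer translate directly into large weight changes.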

The updates also depend on the error. If you backpropagate the error for each sample individually, the updates tend to be noisy, so averaging over several samples (a mini-batch) is generally considered better.
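A minimal sketch of mini-batch averaging, using a toy linear least-squares problem rather than the convolutional layer: per-sample gradients are averaged over a batch before a single update is applied. The function names, the batch size of 32, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def per_sample_grad(weights, sample):
    # Gradient of 0.5 * (weights . x - y)^2 for one (x, y) pair.
    x, y = sample
    return (weights @ x - y) * x


def minibatch_gradient(grad_fn, weights, batch):
    """Average the per-sample gradients over a batch."""
    grads = [grad_fn(weights, sample) for sample in batch]
    return np.mean(grads, axis=0)


rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
xs = rng.standard_normal((256, 2))
ys = xs @ true_w + 0.1 * rng.standard_normal(256)

weights = np.zeros(2)
learn_rate = 0.1
batch_size = 32
for epoch in range(20):
    order = rng.permutation(len(xs))
    for start in range(0, len(xs), batch_size):
        idx = order[start:start + batch_size]
        batch = list(zip(xs[idx], ys[idx]))
        # One update per batch, using the averaged gradient.
        weights = weights - learn_rate * minibatch_gradient(per_sample_grad, weights, batch)
```

For the layer in the question, the equivalent change would be to accumulate `d_W` over the samples of a batch and apply one averaged update, instead of updating the filters inside every Backprop call.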

— SajanGohil