Keras classifier accuracy increases steadily during training, then drops to 0.25 (local minimum?)

I have the following neural network, written in Keras with Tensorflow as the backend, which I am running under Python 3.5 (Anaconda) on Windows 10:

    from keras.models import Sequential
    from keras.layers import Dense, Dropout
    from keras.optimizers import SGD

    model = Sequential()
    model.add(Dense(100, input_dim=283, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(150, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(4, init='normal', activation='sigmoid'))

    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
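
The fit call is omitted above; roughly, it looks like this (X_train and y_labels are placeholder names, and batch_size=32 is an assumption):

    from keras.utils import np_utils

    # X_train: (6120, 283) feature matrix, y_labels: integer class ids in 0..3 (placeholder names)
    y_train = np_utils.to_categorical(y_labels, 4)
    model.fit(X_train, y_train, nb_epoch=10000, batch_size=32, verbose=1)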

I train on my GPU. During training (10000 epochs), the accuracy of the naive network rises steadily from 0.25 to somewhere between 0.7 and 0.9, then suddenly drops and gets stuck at 0.25:

    Epoch 1/10000
    6120/6120 [==============================] - 1s - loss: 1.5329 - acc: 0.2665
    Epoch 2/10000
    6120/6120 [==============================] - 1s - loss: 1.2985 - acc: 0.3784
    Epoch 3/10000
    6120/6120 [==============================] - 1s - loss: 1.2259 - acc: 0.4891
    Epoch 4/10000
    6120/6120 [==============================] - 1s - loss: 1.1867 - acc: 0.5208
    Epoch 5/10000
    6120/6120 [==============================] - 1s - loss: 1.1494 - acc: 0.5199
    Epoch 6/10000
    6120/6120 [==============================] - 1s - loss: 1.1042 - acc: 0.4953
    Epoch 7/10000
    6120/6120 [==============================] - 1s - loss: 1.0491 - acc: 0.4982
    Epoch 8/10000
    6120/6120 [==============================] - 1s - loss: 1.0066 - acc: 0.5065
    Epoch 9/10000
    6120/6120 [==============================] - 1s - loss: 0.9749 - acc: 0.5338
    Epoch 10/10000
    6120/6120 [==============================] - 1s - loss: 0.9456 - acc: 0.5696
    Epoch 11/10000
    6120/6120 [==============================] - 1s - loss: 0.9252 - acc: 0.5995
    Epoch 12/10000
    6120/6120 [==============================] - 1s - loss: 0.9111 - acc: 0.6106
    Epoch 13/10000
    6120/6120 [==============================] - 1s - loss: 0.8772 - acc: 0.6160
    Epoch 14/10000
    6120/6120 [==============================] - 1s - loss: 0.8517 - acc: 0.6245
    Epoch 15/10000
    6120/6120 [==============================] - 1s - loss: 0.8170 - acc: 0.6345
    Epoch 16/10000
    6120/6120 [==============================] - 1s - loss: 0.7850 - acc: 0.6428
    Epoch 17/10000
    6120/6120 [==============================] - 1s - loss: 0.7633 - acc: 0.6580
    Epoch 18/10000
    6120/6120 [==============================] - 4s - loss: 0.7375 - acc: 0.6717
    Epoch 19/10000
    6120/6120 [==============================] - 1s - loss: 0.7058 - acc: 0.6850
    Epoch 20/10000
    6120/6120 [==============================] - 1s - loss: 0.6787 - acc: 0.7018
    Epoch 21/10000
    6120/6120 [==============================] - 1s - loss: 0.6557 - acc: 0.7093
    Epoch 22/10000
    6120/6120 [==============================] - 1s - loss: 0.6304 - acc: 0.7208
    Epoch 23/10000
    6120/6120 [==============================] - 1s - loss: 0.6052 - acc: 0.7270
    Epoch 24/10000
    6120/6120 [==============================] - 1s - loss: 0.5848 - acc: 0.7371
    Epoch 25/10000
    6120/6120 [==============================] - 1s - loss: 0.5564 - acc: 0.7536
    Epoch 26/10000
    6120/6120 [==============================] - 1s - loss: 0.1787 - acc: 0.4163
    Epoch 27/10000
    6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
    Epoch 28/10000
    6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
    Epoch 29/10000
    6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
    Epoch 30/10000
    6120/6120 [==============================] - 2s - loss: 1.1921e-07 - acc: 0.2500
    Epoch 31/10000
    6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
    Epoch 32/10000
    6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
    ...

My guess is that the optimizer is falling into a local minimum where it assigns all the data to a single category. How can I prevent it from doing this?

Things I have tried (none of which seem to stop this from happening):

  1. Using a different optimizer (adam); see the sketch just after this list
  2. Making sure the training data contain an equal number of examples from each category
  3. Increasing the amount of training data (currently at 6000 examples)
  4. Varying the number of categories between 2 and 5
  5. Increasing the number of hidden layers in the network from 1 to 5
  6. Varying the width of the layers (from 50 to 500)
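
For reference, the optimizer swap from item 1 looked roughly like this (default Adam settings assumed, not necessarily the exact configuration I used):

    from keras.optimizers import Adam

    model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.001), metrics=['accuracy'])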

None of this helped. Any other ideas why this happens and/or how to suppress it? Could it be a bug in Keras? Many thanks in advance for any suggestions.

EDIT: The problem appears to have been solved by changing the activation of the last layer to softmax (from sigmoid) and by adding maxnorm(3) regularization to the last two hidden layers:

    from keras.constraints import maxnorm

    model = Sequential()
    model.add(Dense(100, input_dim=npoints, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(150, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu', W_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(200, init='normal', activation='relu', W_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.add(Dense(ncat, init='normal', activation='softmax'))
    model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])

Many thanks for the suggestions.

The problem was the sigmoid activation function in the last layer. With sigmoid, the output of the final layer cannot be interpreted as a probability distribution over the categories an example might belong to, because the outputs of that layer do not, in general, sum to 1. In that situation the optimization can lead to unexpected behaviour. In my opinion adding the maxnorm constraint is not necessary, but I strongly recommend that you use the categorical_crossentropy loss instead of mse, since that loss function has been shown to work better for this kind of optimization problem.
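
A quick way to see the difference, with purely illustrative numbers, is to compare the two activations directly; only softmax turns the final layer into a proper distribution over the classes:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    logits = np.array([2.0, -1.0, 0.5, 0.1])  # hypothetical pre-activation outputs of the 4-unit layer
    print(sigmoid(logits).sum())  # ~2.3 -> not a probability distribution
    print(softmax(logits).sum())  # 1.0  -> a valid distribution over the 4 classes

With the softmax output in place, the compile call would simply become model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy']).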