Bengio, Y., De Mori, R., Flammia, G., and Kompe, R. (1991). Phonetically motivated acoustic
parameters for continuous speech recognition using artificial neural networks. In Proceedings
of EuroSpeech’91. 17
Bengio, Y., De Mori, R., Flammia, G., and Kompe, R. (1992). Global optimization of a neural
network-hidden Markov model hybrid. IEEE Transactions on Neural Networks, 3(2), 252–259.
229, 231
Bengio, Y., Frasconi, P., and Simard, P. (1993). The problem of learning long-term dependencies
in recurrent networks. In IEEE International Conference on Neural Networks, pages 1183–
1195, San Francisco. IEEE Press. (invited paper). 163, 218
Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient
descent is difficult. IEEE Transactions on Neural Networks. 163, 164, 210, 216, 218, 219
Bengio, Y., LeCun, Y., Nohl, C., and Burges, C. (1995). Lerec: A NN/HMM hybrid for on-line
handwriting recognition. Neural Computation, 7(6), 1289–1303. 231
Bengio, Y., Ducharme, R., and Vincent, P. (2001a). A neural probabilistic language model. In
NIPS’00, pages 932–938. MIT Press. 16
Bengio, Y., Ducharme, R., and Vincent, P. (2001b). A neural probabilistic language model. In
NIPS’2000, pages 932–938. 248, 249
Bengio, Y., Ducharme, R., and Vincent, P. (2001c). A neural probabilistic language model. In
T. K. Leen, T. G. Dietterich, and V. Tresp, editors, NIPS’2000, pages 932–938. MIT Press.
343, 344
Bengio, Y., Ducharme, R., Vincent, P., and Jauvin, C. (2003a). A neural probabilistic language
model. JMLR, 3, 1137–1155. 248
Bengio, Y., Ducharme, R., Vincent, P., and Jauvin, C. (2003b). A neural probabilistic language
model. Journal of Machine Learning Research, 3, 1137–1155. 343, 344
Bengio, Y., Delalleau, O., and Le Roux, N. (2006a). The curse of highly variable functions for
local kernel machines. In NIPS’2005. 94
Bengio, Y., Larochelle, H., and Vincent, P. (2006b). Non-local manifold Parzen windows. In
NIPS’2005. MIT Press. 97, 340
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007). Greedy layer-wise training of
deep networks. In NIPS’2006. 16, 308, 311
Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009). Curriculum learning. In
ICML’09. 117
Bengio, Y., Léonard, N., and Courville, A. (2013a). Estimating or propagating gradients through
stochastic neurons for conditional computation. arXiv:1308.3432. 275
Bengio, Y., Yao, L., Alain, G., and Vincent, P. (2013b). Generalized denoising auto-encoders as
generative models. In NIPS’2013. 304, 405, 408
Bengio, Y., Courville, A., and Vincent, P. (2013c). Representation learning: A review and
new perspectives. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 35(8),
1798–1828. 333, 403