Evaluation of deep learning techniques for acoustic environmental events detection


  • Svetlana Segarceanu Beia Consult International, Bucharest, Romania
  • Inge Gavat Politehnica University of Bucharest, Department of Applied Electronics and Information Engineering, Bucharest, Romania
  • George Suciu Beia Consult International, Bucharest, Romania


environmental sound recognition, deep neural networks, spectral features


Environmental sound detection plays an essential role in computer science and robotics, as it emulates the human sense of hearing. It is applied in environmental research, monitoring, and protection, enabling the investigation of natural reserves and revealing potential risks of damage inferred from the acoustic environment. In this paper we present experimental results of attempts to separate different types of acoustic events from continuous environmental recordings. We apply deep learning approaches, namely deep feed-forward networks and long short-term memory (LSTM) recurrent neural networks, feeding them several types of input parameters (spectral, cepstral, and temporal features) and assessing various input/output data organizations and network configurations. The methods are evaluated and compared with classical methods explored in earlier work.
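To make the classification setup concrete, the sketch below shows a forward pass of a deep feed-forward classifier over cepstral feature frames in plain NumPy. All dimensions, layer sizes, and class labels here are hypothetical illustrations, not the configuration evaluated in the paper: a real experiment would train the weights and would typically extract MFCCs with an audio library rather than use random inputs.

```python
import numpy as np

def relu(x):
    # Element-wise rectified linear activation.
    return np.maximum(0.0, x)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical dimensions: 13 cepstral coefficients per analysis frame,
# one hidden layer of 32 units, and 4 acoustic event classes
# (e.g. chainsaw, bird song, rain, wind).
n_features, n_hidden, n_classes = 13, 32, 4

# Untrained, randomly initialized weights (for illustration only).
W1 = rng.standard_normal((n_features, n_hidden)) * 0.1
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_classes)) * 0.1
b2 = np.zeros(n_classes)

def forward(frames):
    """Map a batch of feature frames to per-class probabilities."""
    h = relu(frames @ W1 + b1)
    return softmax(h @ W2 + b2)

# Stand-in for 5 feature frames extracted from a recording.
frames = rng.standard_normal((5, n_features))
probs = forward(frames)
print(probs.shape)  # one probability row per frame
```

An LSTM variant of the same pipeline would replace the first dense layer with a recurrent layer consuming the frame sequence in order, which is what allows it to exploit the temporal structure of continuous recordings.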

