Abstract: Bird sound recognition is of great significance for bird protection. With appropriate sound classification, researchers can automatically assess the quality of life in an area. Deep learning models are now used to classify bird sound data with high accuracy. However, most existing bird sound recognition models generalize poorly and rely on complicated algorithms to extract bird sound features. To address these problems, this paper constructs a large data set containing 264 kinds of birds to enhance the generalization ability of the model, and then proposes a lightweight bird sound recognition model whose feature extraction and recognition network uses MobileNetV3 as the backbone. Adjusting the depthwise separable convolution in the model improves its recognition ability.
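As a rough illustration of why depthwise separable convolution suits a lightweight backbone, the sketch below compares weight counts for a standard convolution and its depthwise separable counterpart. The function names and the layer sizes (32 input channels, 64 output channels, 3×3 kernel) are hypothetical and chosen only for illustration; biases are ignored, and this is not the layer configuration used in the paper.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only.
standard = standard_conv_params(32, 64, 3)
separable = depthwise_separable_params(32, 64, 3)
print(standard, separable, round(separable / standard, 4))
```

For these sizes the separable layer needs about 13% of the weights of the standard layer, which matches the general ratio 1/c_out + 1/k².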
A multi-scale feature fusion structure is designed, and a Pyramid Split Attention (PSA) module is added to it to improve the network's ability to extract spatial and channel information at multiple scales. To improve the model's ability to refine global information, a channel attention mechanism and ordinary convolution are introduced into the Bneck module, yielding a modified module referred to as Bnecks. Experimental results show that, when identifying 264 kinds of birds on the self-built data set, the model achieves Top-1 and Top-5 accuracies of 95.12% and 100%, respectively, both higher than those of MobileNetV1, MobileNetV2, and MobileNetV3. Although its accuracy is lower than that of ResNet50, the model has only 2.6M parameters and 127M floating-point operations (FLOPs); accuracy is reduced by only 2.25% in exchange for this lower cost.
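For readers unfamiliar with the Top-1 and Top-5 metrics, the following minimal sketch shows how Top-k accuracy is commonly computed: a sample counts as correct if its true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are assumptions made for illustration; they are not taken from the paper's evaluation code or data.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with 3 samples and 3 classes (hypothetical values).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, 1))
print(top_k_accuracy(scores, labels, 2))
```

Top-k accuracy never decreases as k grows, which is why a Top-5 figure can reach 100% while the Top-1 figure stays lower.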
Keywords: Attention mechanism, bird sound recognition, deep learning, lightweight, multi-scale feature fusion.
DOI: 10.17148/IARJSET.2023.10211