Bats inhabit every continent except Antarctica and have great potential as bioindicators; monitoring bats therefore helps us understand changes in the surrounding environment. However, bats are nocturnal, which makes it difficult to monitor their behavior visually. This paper proposes a bat species identification method based on the analysis of ultrasonic echolocation calls, a promising approach for effectively monitoring bats' activity levels. We develop a robust method that identifies bat species with improved accuracy by analyzing their echolocation calls. First, 1,400 sound files covering four families, 13 genera, and 30 species were recorded in Japan and in Jincheon-gun, South Korea, between 1999 and 2019. Bat echolocation calls were detected in the sound files and converted into 54,525 spectrograms using the short-time Fourier transform. We developed a deep learning-based bat species identifier using a convolutional neural network with MobileNetV1 as the model architecture. Furthermore, we applied nested cross-validation with a Bayesian optimization algorithm to search for the optimal combination of hyperparameters and to estimate the expected performance. We achieved 98.1% accuracy, outperforming previous studies that treated more than 30 bat species. Using Guided Grad-CAM, we visualized the regions of the spectrograms that were important for the predictions. Moreover, we discussed how to treat the noise class and how to minimize model training time. We then proposed potential solutions for boosting the identifier's performance, generalizing the echolocation call recording protocols, and applying techniques that improve identification accuracy. Future directions are 1) to change the deep learning approach from image classification to object detection and 2) to apply the proposed identifier to unknown bat echolocation calls to evaluate the feasibility of estimating bat fauna and spatial activity distribution.
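The spectrogram step can be illustrated with a short NumPy sketch. The abstract does not state the STFT settings or the recording sampling rate, so everything below is an assumption chosen for illustration: a Hann window, a 256-point FFT, a 64-sample hop, a hypothetical 250 kHz sampling rate, and a synthetic 40 kHz tone standing in for a detected echolocation call. The sketch shows the general technique, not the authors' actual pipeline.

```python
import numpy as np


def stft_spectrogram(signal: np.ndarray, n_fft: int = 256, hop: int = 64) -> np.ndarray:
    """Return a log-magnitude STFT spectrogram of shape (n_fft // 2 + 1, n_frames).

    Assumed settings (not from the paper): Hann window, n_fft=256, hop=64.
    """
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one STFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping frames and taper each with the Hann window
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # One-sided FFT per frame; transpose so rows are frequency bins, columns are time frames
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Convert to decibels; the small epsilon avoids log(0) in silent bins
    return 20.0 * np.log10(magnitude + 1e-10)


# Usage with hypothetical values: 5 ms of a 40 kHz tone sampled at 250 kHz
fs = 250_000
t = np.arange(int(0.005 * fs)) / fs
call = np.sin(2 * np.pi * 40_000 * t)
spec = stft_spectrogram(call)
freqs = np.fft.rfftfreq(256, d=1 / fs)
# Frequency bin with the highest average energy across time frames
peak_hz = freqs[spec.mean(axis=1).argmax()]
```

Each column of `spec` is one time slice and each row one frequency bin, so the 2-D array can be rendered or normalized as an image and fed to an image classifier such as a CNN; real bat calls are usually frequency-modulated sweeps rather than constant tones, which would show up as sloped ridges instead of a single horizontal band.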