Music visualization research is complex and rapidly evolving. Researchers have applied a wide range of methods to study the elements that make up music, including waveform, frequency, pitch, rhythm, tempo, timbre, and chords. Recent work has addressed the extraction of individual musical elements, their visualization, and cross-disciplinary applications of these aspects. At present, most research related to music visualization is grounded in computer science, psychology, sports science, and related disciplines. Research on the elements of music itself has concentrated on music visualization, music element extraction, music association, music emotion, and key musical attributes such as waveform, frequency, pitch, rhythm, tempo, timbre, and chord. This review finds that, with the continuing development of science and technology, music visualization increasingly intersects with computer science, artificial intelligence, and neural networks; future research can therefore engage even more closely with computer science.
Ariza, C., & Cuthbert, M. S. (2010). Modeling Beats, Accents, Beams, and Time Signatures Hierarchically with Music21 Meter Objects. ICMC.
https://www.academia.edu/download/6799610/meterobjects.pdf
Bishop, D. T., Wright, M. J., & Karageorghis, C. I. (2014). The tempo and intensity of pre-task music modulate neural activity during reactive task performance. Psychology of Music, 42(5), 714–727. https://doi.org/10.1177/0305735613490595
Bittner, R. M., Bosch, J. J., Rubinstein, D., Meseguer-Brocal, G., & Ewert, S. (2022). A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 781–785.
https://doi.org/10.1109/ICASSP43922.2022.9746549
Bittner, R. M., Salamon, J., Essid, S., & Bello, J. P. (2015). Melody extraction by contour classification. International Conference on Music Information Retrieval (ISMIR). https://hal.science/hal-02943532/
Blok, M., Banaś, J., & Pietrolaj, M. (2021). IFE: NN-aided Instantaneous Pitch Estimation. 2021 14th International Conference on Human System Interaction (HSI), 1–7.
https://doi.org/10.1109/HSI52170.2021.9538713
Böck, S., & Davies, M. E. (2020). Deconstruct, Analyse, Reconstruct: How to improve Tempo, Beat, and Downbeat Estimation. ISMIR, 574–582.
https://program.ismir2020.net/static/final_papers/223.pdf
Böck, S., Davies, M. E., & Knees, P. (2019). Multi-Task Learning of Tempo and Beat: Learning One to Improve the Other. ISMIR, 486–493.
https://archives.ismir.net/ismir2019/paper/000058.pdf
Böck, S., Krebs, F., & Widmer, G. (2016). Joint Beat and Downbeat Tracking with Recurrent Neural Networks. ISMIR, 255–261.
https://archives.ismir.net/ismir2016/paper/000186.pdf
Burger, B., Thompson, M. R., Luck, G., Saarikallio, S., & Toiviainen, P. (2013). Influences of Rhythm- and Timbre-Related Musical Features on Characteristics of Music-Induced Movement. Frontiers in Psychology, 4. https://doi.org/10.3389/fpsyg.2013.00183
Calvo-Zaragoza, J., Hajič, J., Jr., & Pacha, A. (2021). Understanding Optical Music Recognition. ACM Computing Surveys, 53(4), 1–35. https://doi.org/10.1145/3397499
Camacho, A., & Harris, J. G. (2008). A sawtooth waveform-inspired pitch estimator for speech and music. The Journal of the Acoustical Society of America, 124(3), 1638–1652.
Cambouropoulos, E., Kaliakatsos-Papakostas, M. A., & Tsougras, C. (2014). An idiom-independent representation of chords for computational music analysis and generation. ICMC. http://users.auth.gr/~emilios/papers/icmc-smc2014-GCT.pdf
Chan, W.-Y., Qu, H., & Mak, W.-H. (2009). Visualizing the semantic structure in classical music works. IEEE Transactions on Visualization and Computer Graphics, 16(1), 161–173.
Chen, L., Zheng, X., Zhang, C., Guo, L., & Yu, B. (2022). Multi-scale temporal-frequency attention for music source separation. 2022 IEEE International Conference on Multimedia and Expo (ICME), 1–6.
https://ieeexplore.ieee.org/abstract/document/9859957/
Chu, X. (2022). Feature Extraction and Intelligent Text Generation of Digital Music. Computational Intelligence and Neuroscience, 2022.
https://www.hindawi.com/journals/cin/2022/7952259/
Ciuha, P., Klemenc, B., & Solina, F. (2010). Visualization of concurrent tones in music with colors. Proceedings of the 18th ACM International Conference on Multimedia, 1677–1680. https://doi.org/10.1145/1873951.1874320
Clarke, E., DeNora, T., & Vuoskoski, J. (2015). Music, empathy, and cultural understanding. Physics of Life Reviews, 15, 61–88.
Cooper, M., Foote, J., Pampalk, E., & Tzanetakis, G. (2006). Visualization in audio-based music information retrieval. Computer Music Journal, 30(2), 42–62.
Coorevits, E., Moelants, D., Maes, P.-J., & Leman, M. (2019). Exploring the effect of tempo changes on violinists’ body movements. Musicae Scientiae, 23(1), 87–110. https://doi.org/10.1177/1029864917714609
Cousineau, M., Carcagno, S., Demany, L., & Pressnitzer, D. (2014). What is a melody? On the relationship between pitch and brightness of timbre. Frontiers in Systems Neuroscience, 7. https://doi.org/10.3389/fnsys.2013.00127
Cu, J., Cabredo, R., Legaspi, R., & Suarez, M. T. (2012). On Modelling Emotional Responses to Rhythm Features. In P. Anthony, M. Ishizuka, & D. Lukose (Eds.), PRICAI 2012: Trends in Artificial Intelligence (Vol. 7458, pp. 857–860). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-32695-0_85
Dalla Bella, S., Peretz, I., Rousseau, L., & Gosselin, N. (2001). A developmental study of the affective value of tempo and mode in music. Cognition, 80(3), B1–B10.
Dalton, B., Johnson, D., & Tzanetakis, G. (2019). DAW-integrated beat tracking for music production. Proc. Sound Music Comput. Conf., 7–11.
https://smc2019.uma.es/articles/P1/P1_01_SMC2019_paper.pdf
Défossez, A., Usunier, N., Bottou, L., & Bach, F. (2021). Music Source Separation in the Waveform Domain (arXiv:1911.13254). arXiv. http://arxiv.org/abs/1911.13254
Degara, N., Rúa, E. A., Pena, A., Torres-Guijarro, S., Davies, M. E., & Plumbley, M. D. (2011). Reliability-informed beat tracking of musical signals. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 290–301.
Dobrota, S., & Reić Ercegovac, I. (2015). The relationship between music preferences of different modes and tempo and personality traits – implications for music pedagogy. Music Education Research, 17(2), 234–247.
https://doi.org/10.1080/14613808.2014.933790
Dong, H.-W., Hsiao, W.-Y., Yang, L.-C., & Yang, Y.-H. (2018). MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://ojs.aaai.org/index.php/AAAI/article/view/11312
Donnelly, P. J., & Sheppard, J. W. (2013). Classification of musical timbre using Bayesian networks. Computer Music Journal, 37(4), 70–86.
Driedger, J., Schreiber, H., de Haas, W. B., & Müller, M. (2019). Towards Automatically Correcting Tapped Beat Annotations for Music Recordings. ISMIR, 200–207. https://www.academia.edu/download/79136113/000022.pdf
Eghbal-Zadeh, H., Lehner, B., Schedl, M., & Widmer, G. (2015). I-Vectors for Timbre-Based Music Similarity and Music Artist Classification. ISMIR, 554–560. https://archives.ismir.net/ismir2015/paper/000128.pdf
Farbood, M. M. (2012). A parametric, temporal model of musical tension. Music Perception, 29(4), 387–428.
Flexer, A., Levé, F., Peeters, G., & Urbano, J. (2020). Introduction to the Special Collection "20th Anniversary of ISMIR". Transactions of the International Society for Music Information Retrieval, 3(1), 218–220.
Fonteles, J. H., Rodrigues, M. A. F., & Basso, V. E. (2014). Real-time animations of virtual fountains based on a particle system for visualizing the musical structure. 2014 XVI Symposium on Virtual and Augmented Reality, 171–180.
https://ieeexplore.ieee.org/abstract/document/6913091/
Fonteles, J. H., Rodrigues, M. A. F., & Basso, V. E. D. (2013). Creating and evaluating a particle system for music visualization. Journal of Visual Languages & Computing, 24(6), 472–482.
Getz, L. M., Marks, S., & Roy, M. (2014). The influence of stress, optimism, and music training on music uses and preferences. Psychology of Music, 42(1), 71–85. https://doi.org/10.1177/0305735612456727
Ghahremani, P., BabaAli, B., Povey, D., Riedhammer, K., Trmal, J., & Khudanpur, S. (2014). A pitch extraction algorithm tuned for automatic speech recognition. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2494–2498. https://doi.org/10.1109/ICASSP.2014.6854049
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19(5), 893–906.
Greer, T., Singla, K., Ma, B., & Narayanan, S. (2019). Learning shared vector representations of lyrics and chords in music. ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3951–3955. https://ieeexplore.ieee.org/abstract/document/8683735/
Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cerebral Cortex, 9(7), 697–704.
Herremans, D., Chuan, C.-H., & Chew, E. (2018). A Functional Taxonomy of Music Generation Systems. ACM Computing Surveys, 50(5), 1–30. https://doi.org/10.1145/3108242
Holzapfel, A., Davies, M. E., Zapata, J. R., Oliveira, J. L., & Gouyon, F. (2012). Selective sampling for beat tracking evaluation. IEEE Transactions on Audio, Speech, and Language Processing, 20(9), 2539–2548.
Hosoda, Y., Kawamura, A., & Iiguni, Y. (2021). Pitch Estimation Algorithm for Narrowband Speech Signal using Phase Differences between Harmonics. 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 920–925.
Huang, H., Wang, K., Hu, Y., & Li, S. (2021). Encoder-decoder-based pitch tracking and joint model training for Mandarin tone classification. ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6943–6947. https://ieeexplore.ieee.org/abstract/document/9413888/
Janata, P., Tomic, S. T., & Haberman, J. M. (2012). Sensorimotor coupling in music and the psychology of the groove. Journal of Experimental Psychology: General, 141(1), 54.
Jeong, W.-U., & Kim, S.-H. (2019). Synesthesia Visualization of Music Waveform: Kinetic Lighting for Music Visualization. International Journal of Asia Digital Art and Design Association, 23(2), 22–27.
Juslin, P. N., Harmat, L., & Eerola, T. (2014). What makes music emotionally significant? Exploring the underlying mechanisms. Psychology of Music, 42(4), 599–623. https://doi.org/10.1177/0305735613484548
Karageorghis, C. I., Cheek, P., Simpson, S. D., & Bigliassi, M. (2018). Interactive effects of music tempi and intensities on grip strength and subjective affect. Scandinavian Journal of Medicine & Science in Sports, 28(3), 1166–1175. https://doi.org/10.1111/sms.12979
Karageorghis, C., Jones, L., & Stuart, D. (2008). Psychological Effects of Music Tempi during Exercise. International Journal of Sports Medicine, 29(7), 613–619. https://doi.org/10.1055/s-2007-989266
Khulusi, R., Kusnick, J., Meinecke, C., Gillmann, C., Focht, J., & Jänicke, S. (2020). A Survey on Visualizations for Musical Data. Computer Graphics Forum, 39(6), 82–110. https://doi.org/10.1111/cgf.13905
Kim, J., Ananthanarayan, S., & Yeh, T. (2015). Seen music: Ambient music data visualization for children with hearing impairments. Proceedings of the 14th International Conference on Interaction Design and Children, 426–429.
https://doi.org/10.1145/2771839.2771870
Kim, J. W., Bittner, R., Kumar, A., & Bello, J. P. (2019). Neural music synthesis for flexible timbre control. ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 176–180.
https://ieeexplore.ieee.org/abstract/document/8683596/
Kim, J. W., Salamon, J., Li, P., & Bello, J. P. (2018). Crepe: A Convolutional Representation for Pitch Estimation. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 161–165. https://doi.org/10.1109/ICASSP.2018.8461329
Klapuri, A. (2008). Multipitch analysis of polyphonic music and speech signals using an auditory model. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 255–266.
Koelsch, S., Gunter, T., Friederici, A. D., & Schröger, E. (2000). Brain indices of music processing: “nonmusicians” are musical. Journal of Cognitive Neuroscience, 12(3), 520–541.
Koelsch, S., & Jäncke, L. (2015). Music and the heart. European Heart Journal, 36(44), 3043–3049.
Koelsch, S., Rohrmeier, M., Torrecuso, R., & Jentschke, S. (2013). Processing of hierarchical syntactic structure in music. Proceedings of the National Academy of Sciences, 110(38), 15443–15448. https://doi.org/10.1073/pnas.1300272110
Krumhansl, C. L. (2000). Rhythm and pitch in music cognition. Psychological Bulletin, 126(1), 159.
Lahdelma, I., & Eerola, T. (2016). Single chords convey distinct emotional qualities to both naïve and expert listeners. Psychology of Music, 44(1), 37–54. https://doi.org/10.1177/0305735614552006
Lerch, A., & Knees, P. (2021). Machine learning applied to music/audio signal processing. Electronics, 10(24), 3077. https://www.mdpi.com/2079-9292/10/24/3077
Levitin, D. J., Grahn, J. A., & London, J. (2018). The Psychology of Music: Rhythm and Movement. Annual Review of Psychology, 69(1), 51–75. https://doi.org/10.1146/annurev-psych-122216-011740
Lex, A., Gehlenborg, N., Strobelt, H., Vuillemot, R., & Pfister, H. (2014). UpSet: Visualization of intersecting sets. IEEE Transactions on Visualization and Computer Graphics, 20(12), 1983–1992.
Li, B., Liu, X., Dinesh, K., Duan, Z., & Sharma, G. (2018). Creating a multitrack classical music performance dataset for multimodal music analysis: Challenges, insights, and applications. IEEE Transactions on Multimedia, 21(2), 522–535.
Lima, H. B., Santos, C. G. R. D., & Meiguins, B. S. (2022). A Survey of Music Visualization Techniques. ACM Computing Surveys, 54(7), 1–29. https://doi.org/10.1145/3461835
Lin, Q., Lu, L., Weare, C., & Seide, F. (2010). Music rhythm characterization with application to workout-mix generation. 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 69–72.
https://ieeexplore.ieee.org/abstract/document/5496203/
Lluís, F., Pons, J., & Serra, X. (2019). End-to-end music source separation: Is it possible in the waveform domain? (arXiv:1810.12187). arXiv. http://arxiv.org/abs/1810.12187
Lu, C.-Y., Xue, M.-X., Chang, C.-C., Lee, C.-R., & Su, L. (2019). Play as you like: Timbre-enhanced multi-modal music style transfer. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 1061–1068.
https://aaai.org/ojs/index.php/AAAI/article/view/3897
Lui, S. (2013). A Music Timbre Self-Training Tool on Mobile Device Using Volume Normalized Simplified Spectral Information. ICMC.
https://www.researchgate.net/profile/Simon-Lui-2/publication/288201909_A_music_timbre_self-training_tool_on_mobile_device_using_volume_normalized_simplified_spectral_information/links/580097d908aec5444b724df8/A-music-timbre-self-training-tool-on-mobile-device-using-volume-normalized-simplified-spectral-information.pdf
Malandrino, D., Pirozzi, D., Zaccagnino, G., & Zaccagnino, R. (2015). A color-based visualization approach to understand harmonic structures of musical compositions. 2015 19th International Conference on Information Visualisation, 56–61.
https://ieeexplore.ieee.org/abstract/document/7272579/
Malandrino, D., Pirozzi, D., & Zaccagnino, R. (2018). Visualization and music harmony: Design, implementation, and evaluation. 2018 22nd International Conference Information Visualisation (IV), 498–503. https://ieeexplore.ieee.org/abstract/document/8564210/
Margulis, E. H. (2005). A model of melodic expectation. Music Perception, 22(4), 663–714.
McDermott, J. H., Schultz, A. F., Undurraga, E. A., & Godoy, R. A. (2016). Indifference to dissonance in native Amazonians reveals cultural variation in music perception. Nature, 535(7613), 547–550.
McLeod, P., & Wyvill, G. (2003). Visualization of musical pitch. Proceedings Computer Graphics International 2003, 300–303.
https://ieeexplore.ieee.org/abstract/document/1214486/
Miller, M., Bonnici, A., & El-Assady, M. (2019). Augmenting Music Sheets with Harmonic Fingerprints. Proceedings of the ACM Symposium on Document Engineering 2019, 1–10. https://doi.org/10.1145/3342558.3345395
Wu, M., Wang, D., & Brown, G. J. (2003). A multipitch tracking algorithm for noisy speech. IEEE Transactions on Speech and Audio Processing, 11(3), 229–241. https://doi.org/10.1109/TSA.2003.811539
Mo, S., & Niu, J. (2017). A novel method based on OMPGW method for feature extraction in automatic music mood classification. IEEE Transactions on Affective Computing, 10(3), 313–324.
Murthy, Y. V. S., & Koolagudi, S. G. (2019). Content-Based Music Information Retrieval (CB-MIR) and Its Applications toward the Music Industry: A Review. ACM Computing Surveys, 51(3), 1–46. https://doi.org/10.1145/3177849
Nakamura, T., & Saruwatari, H. (2020). Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 386–390. https://doi.org/10.1109/ICASSP40776.2020.9053934
Nanayakkara, S. C., Taylor, E., Wyse, L., & Ong, S. H. (2007). Towards building an experiential music visualizer. 2007 6th International Conference on Information, Communications & Signal Processing, 1–5. https://ieeexplore.ieee.org/abstract/document/4449609/
Nanni, L., Costa, Y. M., Lumini, A., Kim, M. Y., & Baek, S. R. (2016). Combining visual and acoustic features for music genre classification. Expert Systems with Applications, 45, 108–117.
Neuhoff, H., Polak, R., & Fischinger, T. (2017). Perception and evaluation of timing patterns in drum ensemble music from Mali. Music Perception: An Interdisciplinary Journal, 34(4), 438–451.
Nieto, O., Mysore, G. J., Wang, C., Smith, J. B., Schlüter, J., Grill, T., & McFee, B. (2020). Audio-Based Music Structure Analysis: Current Trends, Open Challenges, and Applications. Transactions of the International Society for Music Information Retrieval, 3(1), 246–263.
Ohmi, K. (2007). Music Visualization in Style and Structure. Journal of Visualization, 10(3), 257–258. https://doi.org/10.1007/BF03181691
Oord, A., Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K., Driessche, G., Lockhart, E., Cobo, L., & Stimberg, F. (2018). Parallel wavenet: Fast high-fidelity speech synthesis. International Conference on Machine Learning, 3918–3926. https://proceedings.mlr.press/v80/oord18a.html
Oord, A. van den, Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., & Kavukcuoglu, K. (2016). WaveNet: A Generative Model for Raw Audio (arXiv:1609.03499). arXiv. https://doi.org/10.48550/arXiv.1609.03499
Oord, A. van den, Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K., Driessche, G. van den, Lockhart, E., Cobo, L. C., Stimberg, F., Casagrande, N., Grewe, D., Noury, S., Dieleman, S., Elsen, E., Kalchbrenner, N., Zen, H., Graves, A., King, H., … Hassabis, D. (2017). Parallel WaveNet: Fast High-Fidelity Speech Synthesis (arXiv:1711.10433). arXiv. https://doi.org/10.48550/arXiv.1711.10433
Oramas, S., Espinosa-Anke, L., Sordo, M., Saggion, H., & Serra, X. (2016). Information extraction for knowledge base construction in the music domain. Data & Knowledge Engineering, 106, 70–83.
Oxenham, A. J. (2012). Pitch Perception. Journal of Neuroscience, 32(39), 13335–13338. https://doi.org/10.1523/JNEUROSCI.3815-12.2012
Palmer, S. E., Schloss, K. B., Xu, Z., & Prado-León, L. R. (2013). Music–color associations are mediated by emotion. Proceedings of the National Academy of Sciences, 110(22), 8836–8841. https://doi.org/10.1073/pnas.1212562110
Papantonakis, P., Garoufis, C., & Maragos, P. (2022). Multi-band Masking for Waveform-based Singing Voice Separation. 2022 30th European Signal Processing Conference (EUSIPCO), 249–253. https://ieeexplore.ieee.org/abstract/document/9909713/
Patil, K., Pressnitzer, D., Shamma, S., & Elhilali, M. (2012). Music in our ears: The biological bases of musical timbre perception. PLoS Computational Biology, 8(11), e1002759.
Pauwels, J., & Peeters, G. (2013). Segmenting music through the joint estimation of keys, chords and structural boundaries. Proceedings of the 21st ACM International Conference on Multimedia, 741–744. https://doi.org/10.1145/2502081.2502193
Percival, G., & Tzanetakis, G. (2014). Streamlined tempo estimation based on autocorrelation and cross-correlation with pulses. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12), 1765–1776.
Pérez-Marcos, J., Jiménez-Bravo, D. M., De Paz, J. F., Villarrubia González, G., López, V. F., & Gil, A. B. (2020). Multi-agent system application for music features extraction, meta-classification and context analysis. Knowledge and Information Systems, 62(1), 401–422. https://doi.org/10.1007/s10115-018-1319-2
Pinto, A. S., Böck, S., Cardoso, J. S., & Davies, M. E. (2021). User-driven fine-tuning for beat tracking. Electronics, 10(13), 1518.
Polansky, L., & Bassein, R. (1992). Possible and impossible melody: Some formal aspects of contour. Journal of Music Theory, 36(2), 259–284.
Polo, A., & Sevillano, X. (2019). Musical Vision: An interactive bio-inspired sonification tool to convert images into music. Journal on Multimodal User Interfaces, 13(3), 231–243. https://doi.org/10.1007/s12193-018-0280-4
Pons, J., & Serra, X. (2017). Designing efficient architectures for modeling temporal features with convolutional neural networks. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2472–2476. https://doi.org/10.1109/ICASSP.2017.7952601
Povey, D., Burget, L., Agarwal, M., Akyazi, P., Kai, F., Ghoshal, A., Glembek, O., Goel, N., Karafiát, M., Rastrow, A., Rose, R. C., Schwarz, P., & Thomas, S. (2011). The subspace Gaussian mixture model—A structured model for speech recognition. Computer Speech & Language, 25(2), 404–439. https://doi.org/10.1016/j.csl.2010.06.003
Pressnitzer, D., McAdams, S., Winsberg, S., & Fineberg, J. (2000). Perception of musical tension for nontonal orchestral timbres and its relation to psychoacoustic roughness. Perception & Psychophysics, 62(1), 66–80. https://doi.org/10.3758/BF03212061
Puzoń, B., & Kosugi, N. (2011). Extraction and visualization of the repetitive structure of music in acoustic data: Misual project. Proceedings of the 13th International Conference on Information Integration and Web-Based Applications and Services, 152–159. https://doi.org/10.1145/2095536.2095563
Queiroz, A., & Coelho, R. (2022). Noisy Speech Based Temporal Decomposition to Improve Fundamental Frequency Estimation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 2504–2513. https://doi.org/10.1109/TASLP.2022.3190670
Quinton, E. (2017). Towards the Automatic Analysis of Metric Modulations [PhD Thesis, Queen Mary University of London].
https://qmro.qmul.ac.uk/xmlui/handle/123456789/25936
Rajan, R., Misra, M., & Murthy, H. A. (2017). Melody extraction from music using modified group delay functions. International Journal of Speech Technology, 20(1), 185–204. https://doi.org/10.1007/s10772-017-9397-1
Ramadhana, Z. H. G., & Widiartha, I. M. (n.d.). Classification of Pop and RnB (Rhythm and Blues) Songs with MFCC Feature Extraction and K-NN Classifier. Jurnal Elektronik Ilmu Komputer Udayana, P-ISSN 2301-5373.
Reddy, G. S. R., & Rompapas, D. (2021). Liquid Hands: Evoking Emotional States via Augmented Reality Music Visualizations. ACM International Conference on Interactive Media Experiences, 305–310. https://doi.org/10.1145/3452918.3465496
Ren, J.-M., Wu, M.-J., & Jang, J.-S. R. (2015). Automatic music mood classification based on timbre and modulation features. IEEE Transactions on Affective Computing, 6(3), 236–246.
Repp, B. H. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic Bulletin & Review, 12(6), 969–992. https://doi.org/10.3758/BF03206433
Richter, J. (2019). Style-Specific Beat Tracking with Deep Neural Networks. https://www.static.tu.berlin/fileadmin/www/10002020/Dokumente/Abschlussarbeiten/Richter_MasA.pdf
Rocha, B., Bogaards, N., & Honingh, A. (2013). Segmentation and timbre- and rhythm-similarity in Electronic Dance Music. https://eprints.illc.uva.nl/482/
Rosemann, S., Altenmüller, E., & Fahle, M. (2016). The art of sight-reading: Influence of practice, playing tempo, complexity and cognitive skills on the eye–hand span in pianists. Psychology of Music, 44(4), 658–673.
https://doi.org/10.1177/0305735615585398
Roy, W. G., & Dowd, T. J. (2010). What Is Sociological about Music? Annual Review of Sociology, 36(1), 183–203. https://doi.org/10.1146/annurev.soc.012809.102618
Salamon, J., & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Salamon, J., Gómez, E., Ellis, D. P., & Richard, G. (2014). Melody extraction from polyphonic music signals: Approaches, applications, and challenges. IEEE Signal Processing Magazine, 31(2), 118–134.
Salamon, J., Rocha, B., & Gómez, E. (2012). Musical genre classification using melody features extracted from polyphonic music signals. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 81–84.
https://ieeexplore.ieee.org/abstract/document/6287822/
Schedl, M., Gómez, E., & Urbano, J. (2014). Music information retrieval: Recent developments and applications. Foundations and Trends® in Information Retrieval, 8(2–3), 127–261.
Shen, J., Pang, R., Weiss, R. J., Schuster, M., Jaitly, N., Yang, Z., Chen, Z., Zhang, Y., Wang, Y., & Skerry-Ryan, R. (2018). Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4779–4783.
https://ieeexplore.ieee.org/abstract/document/8461368/
Shin, S., Yun, H., Jang, W., & Park, H. (2019). Extraction of acoustic features based on auditory spike code and its application to music genre classification. IET Signal Processing, 13(2), 230–234. https://doi.org/10.1049/iet-spr.2018.5158
Smith, S. M., & Williams, G. N. (1997). A visualization of music. Proceedings. Visualization’97 (Cat. No. 97CB36155), 499–503.
https://ieeexplore.ieee.org/abstract/document/663931/
Steinmetz, C. J., & Reiss, J. D. (2021). WaveBeat: End-to-end beat and downbeat tracking in the time domain (arXiv:2110.01436). arXiv. http://arxiv.org/abs/2110.01436
Swaminathan, S., & Schellenberg, E. G. (2015). Current Emotion Research in Music Psychology. Emotion Review, 7(2), 189–197.
https://doi.org/10.1177/1754073914558282
Thaut, M. H., Trimarchi, P. D., & Parsons, L. M. (2014). Human brain basis of musical rhythm perception: Common and distinct neural substrates for meter, tempo, and pattern. Brain Sciences, 4(2), 428–452.
Town, S. M., & Bizley, J. K. (2013). Neural and behavioral investigations into timbre perception. Frontiers in Systems Neuroscience, 7.
https://doi.org/10.3389/fnsys.2013.00088
Van Der Zwaag, M. D., Westerink, J. H. D. M., & Van Den Broek, E. L. (2011). Emotional and psychophysiological responses to tempo, mode, and percussiveness. Musicae Scientiae, 15(2), 250–269. https://doi.org/10.1177/1029864911403364
Virtala, P., Huotilainen, M., Partanen, E., Fellman, V., & Tervaniemi, M. (2013). Newborn infants’ auditory system is sensitive to Western music chord categories. Frontiers in Psychology, 4, 47528.
Virtala, P., Huotilainen, M., Partanen, E., & Tervaniemi, M. (2014). Musicianship facilitates the processing of Western music chords—An ERP and behavioral study. Neuropsychologia, 61, 247–258.
Wang, Y., Salamon, J., Cartwright, M., Bryan, N. J., & Bello, J. P. (2020). Few-Shot Drum Transcription in Polyphonic Music (arXiv:2008.02791). arXiv.
http://arxiv.org/abs/2008.02791
Wu, Y.-C., Hayashi, T., Tobing, P. L., Kobayashi, K., & Toda, T. (2021). Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 1134–1148. https://doi.org/10.1109/TASLP.2021.3061245
Yu, S., Sun, X., Yu, Y., & Li, W. (2021). Frequency-temporal attention network for singing melody extraction. ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 251–255.
https://ieeexplore.ieee.org/abstract/document/9413444/
Yu, S., Yu, Y., Sun, X., & Li, W. (2023). A neural harmonic-aware network with gated attentive fusion for singing melody extraction. Neurocomputing, 521, 160–171. https://doi.org/10.1016/j.neucom.2022.11.086
Zamm, A., Schlaug, G., Eagleman, D. M., & Loui, P. (2013). Pathways to seeing music: Enhanced structural connectivity in colored-music synesthesia. Neuroimage, 74, 359–366.
Zatorre, R. J., & Baum, S. R. (2012). Musical melody and speech intonation: Singing a different tune. PLoS Biology, 10(7), e1001372. https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001372
Zhang, J. (2022). Music Data Feature Analysis and Extraction Algorithm Based on Music Melody Contour. Mobile Information Systems, 2022.
https://www.hindawi.com/journals/misy/2022/8030569/
Zhu, Y. (2022). Recognition Method of Matching Error between Dance Action and Music Beat Based on Data Mining. Security and Communication Networks, 2022. https://www.hindawi.com/journals/scn/2022/8176863/