Automatic Music Transcription using Audio-Visual Fusion for Violin Practice in Home Environment

Date
2009-07-03T09:02:40Z
Abstract
Violin practice in a home environment, where there is often no teacher available, can benefit from automatic music transcription that provides feedback to the student. This paper describes a high-performance violin transcription system with three main contributions. First, because onset detection is an important but challenging task in the automatic transcription of pitched non-percussive music, such as violin music, we propose an effective audio-only onset detection approach based on supervised learning. The proposed approach substantially outperforms state-of-the-art methods. Second, we introduce the visual modality, i.e., the bowing and fingering involved in violin playing, to infer onsets and thus enhance audio-only onset detection. We devise automatic, real-time video processing algorithms to extract features indicative of onsets from bowing and fingering videos. Third, we evaluate state-of-the-art multimodal fusion techniques for fusing the audio and visual modalities and show that this fusion significantly improves onset detection and transcription performance. The audio-visual fusion-based violin transcription system provides more accurate transcriptions as learning feedback, even in acoustically inferior environments. With efficient and fully automatic audio-visual analysis components, the system can be easily deployed in a home environment.
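
To illustrate the general idea of fusing audio and visual evidence for onsets, here is a minimal sketch of weighted late fusion followed by simple peak picking. It is not the method evaluated in the paper, which the abstract does not specify. The function names `fuse_onset_scores` and `pick_onsets`, the `audio_weight` and `threshold` parameters, and the per-frame scores in [0, 1] are all hypothetical choices made for this example.

```python
from typing import List


def fuse_onset_scores(audio: List[float], visual: List[float],
                      audio_weight: float = 0.7) -> List[float]:
    """Weighted late fusion of per-frame onset scores (assumed to lie in [0, 1]).

    Hypothetical helper: the weighting scheme is an assumption for illustration.
    """
    if len(audio) != len(visual):
        raise ValueError("score sequences must have the same length")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return [audio_weight * a + visual_weight * v for a, v in zip(audio, visual)]


def pick_onsets(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Return indices of frames that are local maxima at or above the threshold."""
    onsets = []
    for i, score in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if score >= threshold and score > left and score >= right:
            onsets.append(i)
    return onsets


# Made-up scores: the audio evidence for the onset at frame 1 is weak
# (for example, a soft legato note), but the visual evidence is strong.
audio_scores = [0.1, 0.4, 0.1, 0.1, 0.9, 0.2]
visual_scores = [0.0, 0.9, 0.1, 0.0, 0.8, 0.1]

audio_only_onsets = pick_onsets(audio_scores)
fused_onsets = pick_onsets(fuse_onset_scores(audio_scores, visual_scores, audio_weight=0.5))
```

With these made-up scores, audio-only peak picking finds only the onset at frame 4, while the fused scores also recover the onset at frame 1. In a real system the two score streams would first need to be aligned to a common frame rate.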