To make sure this kind of research is ecologically valid, it is important to record various types of real musical performances in a good concert hall. So we hired professional musicians to form a string quartet and a piano trio, and also had kind performance contributions from a colleague who is a pipe organist and from a student a cappella ensemble. We also managed to capture room impulse responses (RIRs) for 13 different source positions using all 71 mics. The recording venue was St. Paul’s concert hall at the University of Huddersfield, which has been a crucial venue for my 3D audio research over the last 10 years. It was a Victorian-style church until it was converted into a concert hall several decades ago, and it is a main venue for the Huddersfield Contemporary Music Festival. St. Paul’s has a reverb time of 2.1 seconds, much of it produced by reverberation from the high ceiling, which is perfect for capturing diffuse ambience for the height channels in 3D reproduction.
The microphone arrays included in this project are as follows:
- Decca Cuboid (Decca Tree with surround and height channels)
- Hamasaki Square with two types of height configurations
- mhAcoustics Eigenmike EM32 (Higher-Order Ambisonics up to 4th order)
- Sennheiser Ambeo VR
- Neumann KU100 Dummy head
- Additional microphones for side/height channels, floor channels, overhead channel and spot microphones for individual instruments.
The details of the recording setup can be found in a related AES paper and an article I wrote for Resolution Magazine. There is also a YouTube video that DPA filmed on site during the recording session.
The database is now being used by a number of academics, researchers and developers for spatial audio research, critical listening, recording education, etc. At the time of this writing, the 3D-MARCo database has been downloaded 3975 times from the Zenodo repository.
The original plan was to conduct a subjective listening test earlier this year to elicit salient perceptual differences between all of these microphone arrays, but due to Covid this has unfortunately been postponed indefinitely (hopefully until sometime next year). However, during the lockdown period, I’ve worked on “objective” analyses of the arrays using the RIRs instead. The parameters analysed include interchannel crosstalk, interchannel cross-correlation, interaural cross-correlation, fluctuations of interaural level and time differences, direct-to-reverberant (D/R) ratio and spectral distortion caused by the height microphones. The results will hopefully be published as a journal paper soon. For now, I can share some interesting insights obtained from the analyses.
- There were substantial differences among the arrays in the amount of both horizontal and vertical interchannel crosstalk, and this was found to be related to the considerable differences in the amount of spectral distortion in the ear signals as well as in the magnitude of interaural level difference (ILD) and interaural time difference (ITD) fluctuations over time. From this, the arrays are expected to differ audibly in perceived timbral characteristics as well as in the localisation stability and spread of phantom images.
- The arrays are expected to differ considerably in the perceived magnitude of horizontal spatial impression (e.g., apparent source width (ASW) and listener envelopment (LEV)) and in the size of the listening area, owing to their different degrees of interchannel decorrelation. Considerable differences in vertical decorrelation were also observed, but based on previous research this is hypothesised to have a minimal effect on perceived vertical image spread.
- The analysis of interaural cross-correlation suggests that the addition of the height layer to the base layer would have a minor effect on ASW and LEV regardless of the array, even though the base and height layers might have audible differences independently.
- The differences in the D/R ratios of the ear-input signals resulting from the 9-channel playback were around or below the just noticeable difference for perceived auditory distance, even though individual microphones had larger differences, especially in the rear channels. This raises an interesting question as to whether it is the channel-dependent balance of D/R ratio or the D/R ratio of the final ear signal that affects perceived auditory distance.
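To give a rough idea of what two of the parameters above measure, here is a minimal Python sketch of a cross-correlation coefficient and a D/R ratio computed from impulse responses. It is an illustration only, not the analysis code behind the results: the function names, the ±1 ms lag range and the 2.5 ms direct-sound window are all assumptions of mine for this example, and the actual analysis may use different windows, band filtering and time segmentation.

```python
import numpy as np


def cross_correlation_coefficient(x, y, fs, max_lag_ms=1.0):
    """Largest absolute normalised cross-correlation within +/- max_lag_ms.

    x and y are equal-length 1-D signals (two channels, or two ear signals).
    The 1 ms lag range is an assumed default for this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    norm = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    if norm == 0.0:
        return 0.0
    max_lag = min(int(round(fs * max_lag_ms / 1000.0)), len(x) - 1)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate the overlapping parts of x and y at this lag.
        if lag >= 0:
            value = np.dot(x[lag:], y[:len(y) - lag])
        else:
            value = np.dot(x[:len(x) + lag], y[-lag:])
        best = max(best, abs(value))
    return float(best / norm)


def direct_to_reverberant_db(rir, fs, direct_ms=2.5):
    """D/R ratio in dB from a single impulse response.

    The direct sound is taken as a window of +/- direct_ms around the
    largest peak (an assumed window length); everything after that
    window counts as reverberant energy.
    """
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(fs * direct_ms / 1000.0))
    start = max(peak - half, 0)
    end = min(peak + half + 1, len(rir))
    direct = np.sum(rir[start:end] ** 2)
    reverberant = np.sum(rir[end:] ** 2)
    if reverberant == 0.0:
        return float("inf")
    return float(10.0 * np.log10(direct / reverberant))
```

A coefficient close to 1 means the two signals are highly correlated, while a value close to 0 means they are decorrelated. A lower D/R ratio means relatively more reverberant energy in the signal.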
More details with graphs will be available in a paper soon. In the meantime, check out 3D-MARCo if you haven’t yet. The download link has a Reaper session template including a binaural playback configuration for easily comparing all the recordings back to back.