In this paper we proposed an unsupervised algorithm to estimate the reverberation time (RT) directly from the reverberant speech signal. For estimation process we use maximum likelihood estimation (MLE) which is a very wellknown and state of the art method for estimation in the field of signal processing. All existing RT estimation methods are based on the decay rate distribution. The decay rate can be obtained either from the energy envelop decay curve analysis of noise source when it is switch off or from decay curve of impulse response of an enclosure. The analysis of a preexisting method of reverberation time estimation is the foundation of the proposed method. In one of the state of the art method, the reverberation decay is modeled as a Laplacian distribution. In this paper, the proposed method models the reverberation decay as a Gamma distribution along with the unification of an effective technique for spotting free decay in reverberant speech. Maximum likelihood estimation technique is then used to estimate the RT from the free decays. The method was motivated by our observation that the RT of a reverberant signal when falls in specific range, then the decay rate of the signal follows Gamma distribution. Experiments are carried out on different reverberant speech signal to measure the accuracy of the suggested method. The experimental results reveal that the proposed method performs better and the accuracy is high in comparison to the state of the art method.
1. Introduction
Reverberation is an important problem for all those engineers and scientist whom deals in the processing of audio signals and room acoustics. Reverberation time (RT) is an important parameter in the quality based categorization of an auditory enclosure. The speech quality and intelligibility observed in an enclosure is indicated by the estimated RT of that speech. After switching off a sound source present in an enclosed environment, linger time of sound is the RT of that enclosure. The time taken by the sound level to have a decrement of 60dB than its cessation value is also referred as RT
[1
,
2]
. The main cause of reverberation is the multipath propagation of reflected signal
b
(
n
) from a source to microphone. The signal received at the microphone can be expressed as
Where
h
(
n
) is the impulse response of the enclosure and
c
(
n
) is the noise present in the enclosure. Some of the major issues associated with reverberant speech is sounding distinct, echo, lower quality, intelligibility, distortion in the energy envelop and fine structure of the speech signal. This issue becomes more problematic and heightened in the unwired or handsfree systems and speech recognition software’s.
Various algorithms have been developed to estimate the RT. In the beginning of the 20
^{th}
century Sabine
[3]
proposed the first empirical formula for the prediction of RT in an enclosed environment e.g. room. The base of the formula is purely laid on the geometry (volume and surface area) of the enclosed environment and the absorption capabilities of the enclosure surfaces. To increase the accuracy of the Sabine RT formula, extensive modification has been done (for modifications details refer
[1]
). These RT calculating formulae is used in the designing of the places where the sound quality has as great importance like classrooms, anechoic chamber, concert halls. However, to implement these formulae the pre determination of the geometry and the absorptive characteristic of the enclosure is required. As these determinations are not easily possible so new methods is required, based only on the enclosed environment radiated test sound signal.
Such RT measuring methods uses the decay curve, like interrupted noise method
[4]
which used a narrow or wide band noise burst to radiate in the enclosed environment. The decay curve is obtained by switching off the noise source after reaching its steady state. In RT estimation the decay curve slope is used. The fluctuation presence in the noise source will result different decay curves from trial to trial. The averaging technique is applied on a large number of obtained decay curve to get a reliable RT value. In another method the segmentation process is used for the detection of gaps in the reverberant speech
[5
,
6]
which leads to the tracking of sound decay curve. In 1965 Schroeder addressed this problem by developing a method based on the integrated impulse response
[7]
. In this method the narrow or broad band excitation signal is replaced by a brief pulse. Schroeder reported an integral relationship between the overall average of the interrupted noise method, decay curves and the impulse response of the room, and thus the recurred trials were unnecessary.
Schroeder’s method ruled the area of voice signal processing for the past few decades however there was a need of a blind RT estimation technique, which has the ability to estimate RT from the available reverberant signals, without using the information of the enclosure or in the absence of the test sound signal. In hands free telephony devices or hearing aids incorporating, the method based on the direct analysis of the sound signal, will be very helpful. Recently many method have been deployed for the blind RT estimation
[2
,
8
,
9
,
10

12]
, uses the reverberant sound (recorded sound) directly for estimation. In all these methods, statistical modelling is the core followed by the MLE to determine the optimized RT. Some semiblind methods have also been developed
[13

16]
in which the enclosure characteristics are learned using neural network approaches. In neural networks first the network is trained by a set of data and then tested after that it is used for estimation. The estimation of reverberation characteristics using only the microphone (reverberant) signal is a good tool for the study of reverberations
[2]
.
Ratnam
et al
.
[2]
used noise decay curve approach to model the room reverberation characteristics in order to develop a blind RT estimation algorithm i.e. completely based on the reverberant sound. The test room sound signal is incessantly processed to achieve a running RT estimate by utilising the ML parameter estimation technique. An order statistics filter is used in decision making step to get the most probable RT from the pool of estimated RT (over a period of time). In
[2]
the objective of detecting sound decay is obtained by using the iterative technique. This iterative approach for the detection of the sound decay makes the algorithm computationally expensive. Ratnam
et al
. made some changes in his algorithm in
[2]
in order to improve the computational efficiency of it and presented a new algorithm in
[10]
. Lollmann
et al
.
[8]
recently developed an algorithm based on the sound decay statistical model
[2]
for the blind RT estimation directly from reverberant speech. The main advantage of the Lollmann
et al
. method is the detection of the possible sound decay via a pre selection method which leads towards an improvement in robustness of estimation and computational efficiency. In this paper we proposed a method for the blind RT estimation based on the Lollmann
et al
. method and a pre selection mechanism for the possible decay rate detection and used Gamma distribution for modelling the decay rate.
The rest of the paper is divided in four sections and is organized as follows. ML estimation procedure and the sound decay model used in the proposed method is discussed in Section 2. In Section 3 the estimation procedure of RT and its efficiency is discussed. Experimental results and proposed algorithm performance is evaluated in Section 4, and followed by Section 5 containing the conclusion.
2. Proposed Model and MLE
In this work, to model the reverberant signal energy decay, the statistical model used is based on the Gamma distribution. When the RT of a reverberant signal falls in a certain range then Gamma distribution is used for the better modelling of the amplitude distribution of the signal
[17]
. The proposed work is motivated from the method in
[17]
. A figure has been derived here based on the method in
[17]
and it shows the main motivation for this work as shown in
Fig. 1
. As the reverberation time lies in the lower range (0150ms) then the distribution pattern followed by signal is Gamma. In current work our main focus is on the small chamber such as telephone booth, aeroplane chamber (cockpit), ATM booth, where we have to develop a method to estimate the reverberation time properly such that enhanced speech signal can be achieved at the end. In such small chambers, RT generally lies in the lower range. Therefore reverberant speech decay curve would be modelled in the work based on Gamma distribution.
Reverberant speech distribution and theoretical PDF of different distribution (Laplace distribution blue line, Gamma distribution black line, Gaussian distribution pink line) with zero mean.
Hence, the decaying sound reverberant tail is modeled by employing Gamma distribution G(x, k, θ) along with a sequence of random variables. In G(x, k, θ)
k
is the shape factor and is equal to one for zero mean, θ is the variance of the Gamma distribution and
x
is the random variable. The general mathematical expression of Gamma distribution probability density is given below
[18]
.
where ⌈(
k
) is the gamma function of
k
. For zero mean we put
k
=1 in (2) and becomes
[18]
;
after simplification and calculating the value of ⌈(1) , (3) reduced to
[18]
,
In the evolution of model an assumption is made. In this assumption the reverberant decaying sound tail is represented by
y
which is the product of
x
and
a
. Where
x
is a random process and representing the fine structure, and
a
represents the deterministic envelope. Furthermore,
x
(
n
) is supposed as an independent and identically distribution (i.i.d.) random number sequence valid for
n
≥0 , having Gamma distribution with zero mean(
k
=1) and variance θ, G(1, θ). Likewise for each and every value of
n
a deterministic sequence
a
(
n
)>0 is defined in order to evolve the room decay (
y
) model
[2]
. The mathematical expression of the decay model is given as.
After rearranging the terms, (5) can be written as
As
y
(
n
) are independent but due to variation in
x
(
n
) with time the
y
(
n
) are not identically distributed and have a probability density function
G
(1, θ
a
(
n
)).
Select a finite observation sequence,
n
= 0,1,2,...,
N
−1, in order to estimate the decay rate of the reverberant signal. For the sake of simplicity and notational convenience, the Ndimensional vectors of y and a are announced as
y
and
a
respectively. Thus the likelihood function of
y
(the joint probability density), parameterized by
a
and θ is,
where
a
and θ are the (
N
+1) unknown parameters that are required to be estimated from the observation
y
. Here the main objective is the modelling of sound decay in an enclosure and to make some simplification to the acquired likelihood function obtained in (7). A supposition is made that the reverberant sound signal energy envelop damping is defined by a single decay rate
σ
in the intermediate free decay regions (i.e., the regions followed by steep speechsound offset) instead of the sound ongoing, onset, or gradually declining speech offsets regions. This leads to an expression for
a
(
n
) and is determined by
[2]
,
As the
N
dimensional parameter
a
(
n
) can be expressed in terms of
σ
and a single parameter
a, i.e., a
=
e
^{−1/σ}
by using a scalar parameter
a
instead of
a
(
n
) and expressed as
By putting the value of Eq. (9) in Eq. (8) we get
After incorporating Eq. (10), Eq. (7) becomes
We use mathematical induction to simplify Eq. (12) by replacing 0 →N−1 positive consecutive integer by its logical equivalent, so that
After making use of Eq. (13) and terms rearrangements, Eq. (12) becomes
For the estimation of the unknown parameters (
a
,θ) MLE is used. First step is to obtain the loglikelihood function, for this logarithm is applied on the both side of the Eq. (14)
Eq. (15) is then simplified by using logarithm identities and terms rearrangements
By making use of mathematical induction, Eq. (16) is more simplified and becomes
For the maximum of
ln
(
L
(
y
;
a, θ
)), we differentiate Eq. (17) with respect to
a
to get the score function S
f_{a}
[19]
.
To get the extremum achieved by the loglikelihood function, we put the differentiation resultant (Eq. (18)) equals to zero, given as
The estimated value of
a
, or in other words the zero of the scoring function is denoted by
a
_{Sf0}
. The value of
a
_{Sf0}
should satisfy Eq. (19). For verification that the estimated value of
a
(
a
_{Sf0}
) maximize the loglikelihood function, the second derivate test
, can be performed.
Similarly, the score function is obtained for
θ
by following the same procedure as followed for
a
. The log likelihood function in Eq. (17) is partially differentiated with respect to
θ
.
The score function (S
f_{θ}L
(
θ
;
y, a
)) is then put equal to zero to obtain an expression for
θ
.
The estimated value of
θ
is denoted by
θ
_{Sf0}
and should satisfy Eq. (21a). For the value of
θ
given in Eq. (21b) the loglikelihood function will achieve an extremum, for verification that the estimated value of
θ
maximizes the loglikelihood function, second derivative test
, can be performed
By observing estimation expression of
a
and
θ
represented by Eq. (19) and Eq. (21b) respectively, it is clear that Eq. (19) belongs to the category of the implicit expressions and its explicit solution will not exist, on the other hand Eq. (21b) fall in the category of explicit expressions and its explicit solution exists ,if
a
is known. It is already defined in Eq. (9), σ is the required time constant and to be estimated.
The mapping of
a
over the σ is observed and it revealed that
a
∈[0,1) maps onetoone onto σ∈[0, ∞). The method that we used here and the method used in
[2]
and
[10]
are similar because in both method quantization is the base for the estimation of
a
. To form the histogram of
a
, first the bins are created by quantizing the given range of
a
. For assigning values to these bin likelihood is used. When the likelihood values are calculated the maximum likelihood value is then assigned to that bin in the histogram.
Let X values are obtained after the quantization of
a
having the range
a
∈[0,1). The quantized values of
a
is then represented by
a_{w}
, where w=1, 2, 3,..., X and for each
a_{w}
the loglikelihood is calculated using Eq. (17) ad is given as;
And
can be decides as
The result obtained from Eq. (23) is then put in Eq. (9) to obtain the estimated decay rate
. At the end the value of RT
is calculated by using the formula from
[11]
.
3. Effective Estimation of RT
A looping approach is used in the parent method proposed in
[2]
to estimate the decay rate of the sound, but the computational resources requirement for that algorithm is significantly high. The computational efficiency of the parent technique is enhanced by the algorithm introduced in
[10]
in which the recorded signal is divided into frames and then processed instead of processing the whole signal to find the free sound decay regions for maximum likelihood estimation of the reverberant sound decay rate. The computational efficiency can be further enhanced by first capturing the reverberant signal free sound decay regions and then using those detected region for the ML estimation of the decay rate. That is the reason that this approach leads to the increment in computational efficiency by using only small portion of the reverberant signal for processing. This goal can be achieves by an estimation procedure proposed by Lollmann
et al
.
[8]
and have a residue benefit of diluting the outliers effects on RT estimated value. To enhance the Maximum Likelihood estimation of the Gamma parameters for our proposed method, we have used this procedure.
The reverberant speech signal is used as an input to the algorithm and is represented by
g
(
n
). The
g
(
n
) is a discrete time signal and the
n
represents its index. The
g
(
n
) has been processed framewise. The samples sequence is divided into frame and each frame has D samples, moved by an instant of ΔD
[8]
. The purpose of this division is to detect the free sound decay regions and is given as
where ϒ∈N. To spot the potential sound decays, pre selection is carried out in the first step. This objective is achieved by the division of
G
(
ϒ,d
) into T=D/Q∈N subframes.
where
j_{sub}
= 0,1,2,...,
Q
− 1 and
i_{sub}
= 0,1,2,...,
T
− 1 represents subframe indices. In next step we examined the maxima and minima of the subframe energy to find whether these values diverts from the next subframe values as done in
[9]
.
where
τ_{isub}
is used as a weight and have a range of 0≤
τ_{isub}
≤1. For some frames the above equations (27a, 27b, 27c) may not be satisfied due to the counter
i_{sub}
, when it touches its minimum value1 <
i_{submin}
<
T
− 2 and if this is not the case, the inequality check is ceased and the incoming signal frame
G
(ϒ +1,
b
) is computed. On the other hand the subframe sequence satisfying Eq. (27a), (27b), (27c) leads towards potential sound decay. RT
of the spotted frame is calculated for a finite band of RT values by making use of Eq. (22), (23), (9), (24).
The estimated values of RT i.e.
is the assigned to the bins in order to generate the histogram. The RT
value is update every time when a new value is generated. The bin size of the histogram is kept 10 for the sake of reducing the computational complexity and enhancing estimation accuracy.
Instead of the first peak, the maximum of the histogram is associated to the current estimated RT
, because of the reduced number of outliers (result of the frame preselection process). The frames preselection also effected the estimated RT variance, reduced it by recursive smoothing, and modified the estimation expression. The modified RT expression is given as
where 0.9 <β< 1 . The RT value is finally estimated by
Table 1
summarizes the blind RT estimation technique utilizing the Gamma distribution based statistical model for the decay of reverberant sound signal.
The proposed RT value estimation method
The proposed RT value estimation method
4. Experimental Result and Discussion
Matlab simulations are carried out to evaluate the performance of proposed method for the blind estimation of RT. For simulation our first goal was to generate 10 different reverberant speech signal. For this we convolved 10 anechoic speech signals with RIRs to get the desired signals (reverberant signals). These 10 anechoic speech signal are randomly selected from the TIMIT database. Five of these are uttered by male and five by female, sampled at 16 KHz. The RIRs are obtained from the AIR database
[20]
. To check the performance of the algorithm for different room environment, the RIRs selected were recorded in different environment (booth, lecture room, office, meeting room). A source microphone distance pair {
S
_{1}
,
S
_{2}
} m is selected for each environment respectively, i.e. {0.5, 1.5}, {2.25, 7.1}, {1, 3}, and {1.45, 2.8}. The values of other parameter used in simulation are given as:
X
=10,
T
=7,
i_{submin}
=3, β = 0.995,
D
= 1631 (corresponds approximately to a time span of 0.10 s),
Q
= 233, Δ
D
= 67 (corresponds approximately to a frame shift of 0.0042 s), and
_{isub}
= 1.
The process of reverberant speech generation is repeated 20 time for each enclosed environment. 10 reverberant speech signals were generated with respect to source microphone distance
S
_{1}
and 10 reverberant speech signals were generated with respect to source microphone distance
S
_{2}
. For four different test environments a total of 80 reverberant speech signals were generated. All the reverberant signals were then tested for the RT estimation and an average over the 10 different result were calculated for each source microphone distance and room type. These average values of the estimated RTs is given in
Fig. 2
and
3
respectively. The performance of the proposed method is evaluated by comparing the estimated RT value with the RT values estimated directly from the RIRs as done in
[7]
and with the RT values estimated using Laplace distribution
[21]
.
Different RT estimation methods performance evaluation in terms of accuracy achieved for different room environments from the AIR database. Blue bars represents the RT values estimated from the RIRs by Schroeder’s method [7], green bars shows RT values estimated by our proposed method and the brown bars shows RT estimated by using Laplace distribution [21]. The source to microphone distances for all of the four rooms are S_{1}= {0.5, 1.0, 1.45, 2.25} m respectively. The standard deviations are represented by short orange lines, plotted on top of the green bars.
Different RT estimation methods performance evaluation in terms of accuracy achieved for different room environments from the AIR database. Blue bars represents the RT values estimated from the RIRs by Schroeder’s method [7], green bars shows RT values estimated by our proposed method and the brown bars shows RT estimated by using Laplace distribution [21]. The source to microphone distances for all of the four rooms are S_{2}={1.5, 3.0, 2.8, 7.1} m respectively. The standard deviations are represented by short orange lines, plotted on top of the green bars.
Fig. 2
shows the results obtained for the shorter sourcemicrophone distances i.e.
S
_{1}
from the above used pairs, while
Fig. 3
shows the result obtained from the longer sourcemicrophone distances i.e.
S
_{2}
. It can be observed that for different environments the difference between the RT estimated from the RIRs (represented by blue bars) and the RT estimated by our proposed method (represented by green bar) is small.
For example, for the booth at the source to microphone distance
S
_{1}
the estimated RT value obtained by our proposed method is 0.347 seconds and the RT value estimated from the RIRs is 0.428 seconds. Similarly for the booth at the source to microphone distance
S
_{2}
the estimated RT value obtained by our proposed method is 0.451 seconds and the RT value estimatedfrom the RIRs is 0.456 seconds. Overall, the proposed method produced comparable results to the baseline method by Schroeder.
5. Conclusion
A novel approach for the blind estimation of reverberation time has been presented. The method is established on a Gamma distribution based statistical model for the decay of reverberant sound signal and a maximum likelihood approach for estimation of parameter. Experiments showed that the results obtained by utilizing our proposed method with real data (for booth RT value for
S
_{1}
and
S
_{2}
is 0.347 second and 0.451 second respectively) using speech signal samples are in close agreement with measured RT values (for booth RT value for
S
_{1}
and
S
_{2}
is 0.428 second and 0.456 second respectively). The Gamma distribution based statistical model produced acceptable results whenever the RT resided in the lower range (few hundred millisecond’s). The results achieved by the proposed algorithm is comparable to the results obtained by the state of the art method and better in few cases. For estimating the RT values over a wide range we would add some modifications to the algorithm in future, to make it possible.
BIO
Amad Hamza received his BSc degree in Electrical (Electronics) Engineering from Air University Islamabad, Pakistan in 2011 and his MS degree in Communication and Electronics from Univ. of Eng. & Tech. Peshawar, Pakistan in 2015. His research interests are audio/video processing, ANN and CGP.
Tariqullah Jan received his B.Sc. degree in Elect. Eng. from Pakistan in 2002 and PhD in the field of Electronic Engineering from UK in 2012. His Research interest includes Blind signal processing, machine learning, blind reverberation time estimation, multimodal based approaches for the blind source separation, compressed sensing, and Nonnegative matrix/tensor factorization for the blind source separation.
Asiya Jehangir received her B.Sc. degree in Electrical Engineering and M.S degree in Communication & electronics from Univ. of Eng. & Tech. Peshawar, Pakistan in 2007 and 2012 respectively. Her research interest includes Wireless Sensor Networks, Blind Signal Processing, and Applications of Signal processing in the field of Networks.
Syed Waqar Shah received his BSc degree in Elect. Eng. from Pakistan in 1994, MS degree from Pakistan in 2000 and PhD degree from UK in 2005. His research interests include Coordination in Mobile Agent Environment, Channel Estimation / Equalization of Wireless channels, Error Correction etc.
M.Haseeb Zafar received his BSc degree in Elect. Eng. from Pakistan in 1996, MS degree in Telecom. and Computers from USA in 2003 and PhD degree from UK in 2009. He has more than 18 years of teaching, research and industrial experience. His research interests include Wireless Sensor Networks, Smart Grids, Software Defined Networks, Genetic Algorithms, Intelligent Transportation Systems and Signal Processing.
M. Asif received his B.Sc degree in Computer Systems Eng. from Pakistan in 2006 and Ph.D degree in Electronics Engineering from UK, in 2012. He is currently working as an Assistant Professor in Department of Electronics, University of Peshawar, Pakistan. His main research interests include MANETs, Sensor Networks, Signal Processing and Renewable Energy.
Kuttruff H.
1991
Room Acoustics
3rd ed.
Elsevier Science Publishers Ltd., Lindin
Ratnam R.
,
Jones D. L.
,
Wheeler B. C.
,
OBrien W. D.
,
Lansing C. R.
,
Feng A. S.
2003
“Blind estimation of reverberation time,”
J. Acoust. Soc. Am.
114
2877 
2892
DOI : 10.1121/1.1616578
Sabine W. C.
1922
“Collected Papers on Acoustics,”
International Organization for Standardization
1997
Acoustics Measurements of the Reverberation Time of Rooms with Reference to Other Acoustical Parameters
Geneva
Lebart K.
,
Boucher J.
,
Denbigh P.
2001
“A new method based on spectral subtraction for speech deriverberation,”
Acta Acustica
87
359 
366
Vesa S.
,
Harma A.
2005
“Automatic estimation of reverberation time from binaural signals,”
Proc. IEEE Int. Conf. Acoust., Speech, Signal Process.
3
281 
284
Schroeder M. R.
1965
“New method of measuring reverberation time,”
J. Acoust. Soc. Am.
409 
412
Lollmann H. W.
,
Yilmaz E.
,
Jeub M.
,
Vary P.
2010
“An improved algorithm for blind reverberation time estimation,”
Proc. Int. Workshop Acoust. Echo and Noise Control (IWAENC)
Tel Aviv, Israel
Lollmann H. W.
,
Vary P.
2008
“Estimation of the Reverberation Time in Noisy Environments,”
Proc. Int. Workshop Acoust. Echo and Noise Control (IWAENC)
Washington USA
Ratnam R.
,
Jones D. L.
,
O Brien W. D.
2004
“Fast algorithms for blind estimation of reverberation time,”
IEEE Signal Process. Letters
11
537 
540
DOI : 10.1109/LSP.2004.826667
Wen J.Y.C.
,
Habets E.A.P.
,
Naylor P.A.
2008
“Blind estimation of reverberation time based on the distribution of signal decay rates,”
Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process.
329 
332
Roman Scharrer
,
Vorländer M.
2010
“Blind reverberation time estimation.”
Proceedings of the International Conference on Acoustics
Sydney, Australia
Cox T. J.
,
Li F.
,
Darlington P.
2001
“Extracting room everberation time from speech using artificial neural networks,”
FJ. Audio Engineering Soc.
219 
230
Aliabadi M.
,
Golmohammadi R.
,
Ohadi A.
,
Mansoorizadeh Z.
,
Khotanlou H.
,
Sarrafzadeh M. S.
(2014)
Development of an Empirical Acoustic Model for Predicting Reverberation Time in Typical Industrial Workrooms Using Artificial Neural Networks
Acta Acustica united with Acustica
100
(6)
1090 
1097
DOI : 10.3813/AAA.918788
Petsatodis T.
,
Boukis C.
,
Talantzis F.
,
Tan Z.
,
Prasad R.
2011
“Convex combination of multiple statistical models with application to VAD,”
IEEE Trans. Audio, Speech, and Lang. Process.
2314 
2327
Bean M. A.
(2001)
Probability:The science of uncertainty with application to investments, insurance and engineering [Online]
Available:
Poor V.
1994
“An Introduction to Signal Detection and Estimation,”
SpringerVerlag
New York
Jeub M.
,
Schafer M.
,
Vary P.
2009
“A binaural room impulse response database for the evaluation of dereverberation algorithms,”
Proc. Int. Conf. Digital Signal Process. (DSP)
Santorini, Greece
Jan T.
,
Wang W.
“Blind reverberation time estimation based on laplace distribution,”
20th EUSIPCO 2012