Note: This topic is relevant for Forensics.
A reference population is a collection of voiceprints used as a baseline to avoid false positive matches. This is similar to organizing a line-up of suspects with similar visual characteristics for a victim to identify, except that the reference population is a line-up of different people with similar voice characteristics, such as gender, language, accent, and dialect.
Investigators specify a reference population when comparing a questioned sample with a suspect speaker. The system uses it to compute inter-speaker variation, which is needed for reliable assessments. It is crucial to select a population whose voices are similar to the suspect’s voice.
The quality of the reference population depends on the number of individuals in the population and on how similar their voiceprints are to the suspect’s voiceprints. Additional factors include the number of audio segments, the average net speech in the segments, and the average SNR (signal-to-noise ratio).
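The measurable factors listed above can be summarized with a few lines of Python. This is only an illustrative sketch: the `Segment` record, its field names, and `summarize_population` are hypothetical and are not part of Nuance Forensics. It does not attempt to measure voice similarity to the suspect, which requires the voiceprints themselves.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    """One audio segment in a reference population (hypothetical record)."""
    speaker_id: str
    net_speech_seconds: float  # speech remaining after silence and noise are removed
    snr_db: float              # signal-to-noise ratio in decibels


def summarize_population(segments):
    """Return the countable quality factors for a list of Segment records."""
    if not segments:
        raise ValueError("reference population has no segments")
    return {
        "speakers": len({s.speaker_id for s in segments}),
        "segments": len(segments),
        "avg_net_speech_seconds": mean(s.net_speech_seconds for s in segments),
        "avg_snr_db": mean(s.snr_db for s in segments),
    }
```

A summary like this can help compare candidate populations before use, for example to spot a population with many segments but few distinct speakers.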
The strength of the voice evidence depends on how well the evidence and target voiceprints reflect their respective sources, and this depends on the amount of data available:
more data → more representative voiceprints → more reliable evidence
While reference populations are not normally used in automated speaker identification applications, Forensics applications require focused reference populations. For example, if the suspect is a US male speaking English, selecting a reference population of women speaking Mandarin is not a sensible choice.
With a focused reference population, investigators can present results that non-experts can understand, for example, testimony such as “There is a 10,000-to-one likelihood that this voice matches the target voice as compared to matching any other US-English speaking male voice.”
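A ratio such as 10,000-to-one maps directly to a log likelihood ratio (LLR). The sketch below assumes base-10 logarithms, which this topic does not specify, and the function names are hypothetical rather than product API.

```python
import math


def likelihood_ratio_to_llr(likelihood_ratio):
    """Convert a likelihood ratio to a base-10 LLR (the base is an assumption)."""
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    return math.log10(likelihood_ratio)


def llr_to_likelihood_ratio(llr):
    """Convert a base-10 LLR back to a likelihood ratio."""
    return 10.0 ** llr
```

Under this assumption, the 10,000-to-one ratio in the example testimony corresponds to an LLR of about 4, and a negative LLR corresponds to a ratio below one, which favors the reference population over the target.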
For forensic investigators who do not have the luxury of time to perform lengthy analysis, Nuance Forensics provides high-quality, predefined reference populations. Investigators can use the predefined populations, supplement them with additional voices, and create custom reference populations.
Forensic scientists, who might spend weeks or months analyzing biometric data, can spend even more time collecting new audio recordings from a target reference population. Although this is expensive and time-consuming, it is sometimes necessary when producing testimony for a courtroom.
Forensic examiners usually use more than one tool to confirm their hypotheses. It is rare that voice is the only piece of evidence evaluated during a forensic investigation. Voice is typically used at the start of the investigation (to search a database of target voices and find matches), near the end of an investigation (to validate the hypothesis that a suspect’s voice matches the voice evidence), or during both the beginning and end phases of an investigation. Thus, the investigation’s final LLR is a linear combination of multiple LLRs (from different tools). For LLR details, see Log likelihood ratios.
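The linear combination of LLRs can be sketched as a weighted sum. This topic does not say how the weights are chosen, so the weights here are an assumption, and `combine_llrs` is a hypothetical helper rather than product API. With every weight equal to 1, the sum is equivalent to multiplying the underlying likelihood ratios, which is only justified when the tools provide independent evidence.

```python
def combine_llrs(llrs, weights=None):
    """Return a weighted sum of per-tool LLRs (weights default to 1.0 each)."""
    if not llrs:
        raise ValueError("at least one LLR is required")
    if weights is None:
        weights = [1.0] * len(llrs)
    if len(weights) != len(llrs):
        raise ValueError("llrs and weights must have the same length")
    return sum(w * llr for w, llr in zip(weights, llrs))
```

For example, `combine_llrs([2.0, 1.5])` adds the two scores directly, while `combine_llrs([2.0, 1.5], weights=[0.5, 0.5])` averages them.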