Privacy in Speech Emotion Recognition

Speech-centric machine learning systems are increasingly transforming how we interact with technology. Whether through the latest speech-to-text models, voice assistants like Alexa or Siri, or the speech emotion recognition (SER) systems we develop at Maaind, these technologies rely on voice to let us communicate what we need. For widespread use, however, these systems must be highly reliable. Specific concerns include privacy, biased performance and fairness, and susceptibility to adversarial attacks. To tackle these challenges and mitigate the associated risks, significant effort has gone into making these machine learning systems more trustworthy, with a particular emphasis on privacy, safety, and fairness.

Source: Feng et al. (2023). A review of speech-centric trustworthy machine learning: Privacy, safety, and fairness. APSIPA Transactions on Signal and Information Processing, 12(3).

Why Privacy is Especially Important for Speech

One obvious concern is that raw speech could reveal private, personal, and sensitive information if it falls into the wrong hands, so it is crucial to maintain confidentiality when handling speech data. Trust also becomes critical when products listen to us, because what we say, and how we say it, can be very intimate.

Misuse of speech data could also enable advertisers to infer your preferences or the topics you discuss. Voice assistants such as Alexa have already been at the center of privacy scandals. That's why we have a strict policy of not collaborating with advertisers.

Privacy in speech also matters because speech patterns can serve as a biometric identifier. Everybody has unique speech patterns, including speed, pitch, or alternation, which makes raw speech data a reliable means of identifying individuals. This highlights the importance of keeping recorded speech data secure, especially as voice may become a more common way to unlock devices such as phones and cars.

Different Solutions to Ensure Privacy in SER

Encryption of the data

Audio data is vulnerable to interception while it is being transmitted. An attacker who intercepts it could gain unauthorized access to sensitive information, putting the integrity and confidentiality of the data at risk. To mitigate this risk, the data should be encrypted before it is transmitted, so that even if it is intercepted, it remains unreadable to unauthorized parties.

Data in transit can be encrypted with Transport Layer Security (TLS), the successor to the now-deprecated SSL protocol. Our infrastructure partner, Microsoft, also protects data both at rest and in transit: Azure servers secure data using various encryption methods, protocols, and algorithms, including double encryption.
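
As a rough illustration of encryption in transit, a client might build a TLS configuration like the one below using Python's standard ssl module. This is a minimal sketch, not a description of Maaind's or Azure's actual setup; the function name make_tls_context is hypothetical.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context (hypothetical helper, illustration only)."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Refuse legacy SSL and early TLS versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls_context()
```

A context like this could then be passed to, for example, http.client.HTTPSConnection(host, context=context), where host stands for whichever server receives the data.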

On-device feature processing

Raw audio can be pre-processed on the device to transform it into features or speech representations that are not interpretable by humans. These features can include pitch, speaking rate, or other characteristics of the voice sample. In this scenario, the raw audio never leaves the client's device; only the extracted features are sent to the machine learning model. This is also the approach we follow at Maaind.
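
To make the idea concrete, the sketch below reduces one frame of audio to three numbers: loudness, zero-crossing rate, and a pitch estimate. It is an assumption-laden illustration in pure Python, not Maaind's actual feature pipeline; the function name extract_features, the chosen features, and the synthetic 200 Hz test tone are all hypothetical.

```python
import math


def extract_features(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Reduce a mono audio frame to a few summary numbers (illustrative only)."""
    n = len(samples)
    # Root-mean-square energy: a rough loudness measure.
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    zcr = crossings / (n - 1)
    # Pitch estimate: the lag with the highest autocorrelation
    # inside a plausible range for the human voice.
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return {"rms": rms, "zcr": zcr, "pitch_hz": sample_rate / best_lag}


# Synthetic stand-in for a recorded frame: a 200 Hz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
frame = [0.5 * math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(2048)]

# Only this small dictionary would be sent to the server, not the frame itself.
features = extract_features(frame, SAMPLE_RATE)
```

The point of the design is that the waveform stays in local memory and only the summary values are transmitted. Real systems typically use richer representations, but the data flow is the same.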

Fully Homomorphic Encryption (FHE)

A newer technique called Fully Homomorphic Encryption (FHE) enables computing directly on encrypted data without the need for decryption. This allows the development of private-by-design applications without compromising on features. Concrete ML, from Zama, is an open-source library for applying FHE to machine learning models.
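
The core idea of computing on encrypted data can be shown with a much simpler scheme. The sketch below uses textbook Paillier encryption, which is only additively homomorphic and therefore not FHE, and it does not use Concrete ML. The primes are tiny and hard-coded, so it is insecure and meant purely as an illustration; all names and values are hypothetical.

```python
import math
import random

# Toy key generation with tiny hard-coded primes (insecure, illustration only).
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse (requires Python 3.8+)


def encrypt(m):
    """Encrypt an integer 0 <= m < N under the public key (N, G)."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Decrypt a ciphertext with the private key (LAM, MU)."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ


# A "server" adds two encrypted values without ever seeing them.
enc_a = encrypt(120)
enc_b = encrypt(35)
enc_sum = add_encrypted(enc_a, enc_b)

# Only the key holder can read the result.
result = decrypt(enc_sum)
```

Here the party performing the addition never holds the private key. FHE extends this idea from addition alone to arbitrary computations, which is what makes encrypted model inference possible.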


In conclusion, privacy is of utmost importance when it comes to speech emotion recognition systems. The potential risks associated with handling raw speech data necessitate the implementation of robust privacy measures. Encryption of data, on-device feature processing, and innovative techniques such as Fully Homomorphic Encryption (FHE) are all valuable solutions to ensure the confidentiality and security of speech data.

At Maaind, we are pioneers in bringing SER and neuroadaptive AI from academic research to real-life use cases. Our models are at the forefront of accuracy in the industry, ensuring that our clients can trust our technology to deliver reliable and privacy-conscious solutions. With our commitment to privacy, safety, and fairness, our mission is to create more wellbeing by making everyday products more emotionally intelligent. If you want to learn more, check out our website or get in touch with us.

