Automatic Speech Recognition Jobs
As a Python backend developer, I'm looking for someone who can help me build a personal project involving real-time audio processing and analysis. We need a backend for a web app that will send audio data over a WebSocket connection. The backend should accept the stream, transcribe it on the server, store the text in a PostgreSQL database, then convert the transcribed text back to audio with TTS and stream it to the user over the same WebSocket session/consumer that initiated it. The pipeline should have minimal latency, so we prefer local speech-to-text and text-to-speech over external API calls. We could still use external services via APIs if the quality of the local ones is not good - we n...
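The round trip this posting describes can be sketched in a few lines. This is a minimal, hedged sketch only: the `transcribe` and `synthesize` functions are stubs standing in for local STT/TTS models (e.g. a Whisper or Vosk instance and a local TTS engine), and `TranscriptStore` is an in-memory stand-in for the PostgreSQL table; the actual WebSocket send/receive (e.g. via a FastAPI or websockets handler) is omitted.

```python
import asyncio

def transcribe(audio: bytes) -> str:
    # Stub: a real implementation would run a local STT model on the PCM audio.
    return f"<{len(audio)} bytes transcribed>"

def synthesize(text: str) -> bytes:
    # Stub: a real implementation would run a local TTS engine.
    return text.encode("utf-8")

class TranscriptStore:
    # Stands in for the PostgreSQL table; real code would INSERT
    # via asyncpg or psycopg from the same event loop.
    def __init__(self):
        self.rows = []

    def save(self, text: str):
        self.rows.append(text)

async def handle_chunk(audio: bytes, store: TranscriptStore) -> bytes:
    # One round trip: STT -> persist transcript -> TTS. The returned
    # bytes would be sent back on the same WebSocket session.
    text = transcribe(audio)
    store.save(text)
    return synthesize(text)

store = TranscriptStore()
reply = asyncio.run(handle_chunk(b"\x00" * 320, store))
print(store.rows[0])  # transcript is persisted before the audio reply goes out
```

The key ordering constraint from the posting is preserved: the transcript is stored before the TTS reply is produced, all within one handler so a single session/consumer owns the whole exchange.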
I'm looking for a professional experienced with Twilio and Deepgram, specifically in Voice Activity Detection (VAD). The main goal of the project is to implement a VAD system that works seamlessly with the Twilio and Deepgram APIs: it should detect voice activity, silence the TTS whenever a human interrupts, and resume speaking once the human goes quiet.
Key requirements:
- Proficiency with the Twilio and Deepgram APIs
- Previous experience developing VAD systems
- Ability to integrate these technologies with existing systems
- Understanding of real-time audio processing
If you can do this and complete it quickly, I can award another project (similar but bigger in scope).
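The pause/resume behavior asked for here is essentially a small state machine driven by VAD decisions. The sketch below uses a naive energy threshold in place of Deepgram's speech events or a dedicated VAD library (e.g. py-webrtcvad), and plain amplitude lists in place of Twilio's 20 ms media frames; the names and thresholds are illustrative assumptions, not part of either API.

```python
class BargeInController:
    """Pauses TTS playback while the caller speaks, resumes after sustained silence."""

    def __init__(self, threshold: float = 0.1, hangover_frames: int = 3):
        self.threshold = threshold       # mean energy counted as speech
        self.hangover = hangover_frames  # silent frames required before resuming
        self.silent_run = 0
        self.tts_playing = True

    def feed(self, frame) -> bool:
        # One audio frame in; returns whether TTS should be playing.
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy >= self.threshold:
            self.tts_playing = False     # caller is talking: interrupt TTS
            self.silent_run = 0
        else:
            self.silent_run += 1
            if self.silent_run >= self.hangover:
                self.tts_playing = True  # caller went quiet: resume TTS
        return self.tts_playing

ctl = BargeInController()
states = [ctl.feed(f) for f in (
    [0.0] * 160,  # silence -> keep playing
    [0.5] * 160,  # speech  -> pause (barge-in)
    [0.0] * 160,  # 1 silent frame, still paused
    [0.0] * 160,  # 2 silent frames, still paused
    [0.0] * 160,  # 3 silent frames -> resume
)]
print(states)
```

The hangover counter is the important design choice: resuming only after several consecutive silent frames avoids the TTS flapping on and off during natural pauses mid-sentence.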
I am looking for a Twilio expert to assist with implementing Voice Activity Detection (VAD) in a Twilio call and integrating it with TTS, STT, and an LLM. Everything except VAD is already in place; I just need someone to complete the VAD and write documentation. Looking for a local developer with some AI knowledge. Can hire full time if the work is good.
We currently have a pipeline utilizing Google Cloud Voice, but we aim to replace it with a local solution. The replacement must offer minimal latency, support interruption, and be able to differentiate between speakers. This transition is essential for a product feature in which the embodied AI responds exclusively to recognized or registered speakers. We also require interruption support so that the AI remains attentive, ready to pause and process incoming information whenever a registered user contributes relevant input to the conversation.
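The "respond only to registered speakers" requirement is typically built on speaker embeddings: enroll each registered user's voice as a vector, then compare incoming utterances by cosine similarity. The sketch below is a hedged illustration with toy fixed-length vectors; in a real system the embeddings would come from a local speaker-encoder model (e.g. an x-vector or ECAPA-style network), and the 0.8 threshold is an arbitrary placeholder to be tuned.

```python
import math

def cosine(a, b) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SpeakerGate:
    """Only utterances matching a registered speaker should interrupt the AI."""

    def __init__(self, threshold: float = 0.8):
        self.registered = {}   # name -> enrollment embedding
        self.threshold = threshold

    def enroll(self, name, embedding):
        self.registered[name] = embedding

    def should_attend(self, embedding) -> bool:
        # Pause and listen only if the utterance is close enough
        # to any enrolled voice.
        return any(cosine(embedding, ref) >= self.threshold
                   for ref in self.registered.values())

gate = SpeakerGate()
gate.enroll("alice", [1.0, 0.0, 0.2])     # toy enrollment vector
print(gate.should_attend([0.9, 0.1, 0.25]))  # voice close to alice's
print(gate.should_attend([0.0, 1.0, 0.0]))   # unregistered voice
```

Combined with the interruption requirement, `should_attend` would gate the barge-in logic: an unregistered voice never pauses the AI, while a registered one does.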