Stream ASR (Speech to Text Online)
Stream ASR is a module for real-time speech-to-text built on NVIDIA's Riva SDK.
It runs a Conformer model for recognition and exposes the service over Socket.IO, so clients only need to connect, stream audio, and listen for results.
Workflow
- Step 1: Connect to the host https://agents.monkeyenglish.net with a Socket.IO client.
- Step 2: To start each speech-to-text session, emit the "on_start" event. If it succeeds, the server responds with the message
{'status': 'Connected to server successful.'}
- Step 3: Once "on_start" has succeeded, send audio data to the "audio_stream" event.
- Step 4: Recognition results for that audio are delivered on the "asr_response" event.
- Step 5: To finish the session, emit the "on_end" event. The server then returns the message
{'status': 'Stopped to server successful.'}
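The session lifecycle in Steps 2 to 5 can be sketched as a small client-side guard that refuses to send audio before "on_start" or after "on_end". `AsrSession` is a hypothetical helper, not part of the service; it only assumes an `emit(event, data)` callable such as `socketio.Client.emit` from python-socketio. The documentation above does not say whether "on_start" and "on_end" carry a payload, so this sketch sends none.

```python
from typing import Any, Callable, List, Tuple


class AsrSession:
    """Hypothetical helper enforcing the on_start -> audio_stream -> on_end order."""

    def __init__(self, emit: Callable[..., Any]) -> None:
        # `emit` is any callable shaped like socketio.Client.emit(event, data)
        self._emit = emit
        self.active = False

    def start(self) -> None:
        # Step 2: open a session; refuse to open a second one
        if self.active:
            raise RuntimeError("session already started")
        self._emit("on_start")
        self.active = True

    def send(self, audio_data: Any) -> None:
        # Step 3: audio is only sent inside an open session
        if not self.active:
            raise RuntimeError("call start() before sending audio")
        self._emit("audio_stream", audio_data)

    def end(self) -> None:
        # Step 5: close the session
        if not self.active:
            raise RuntimeError("no active session to end")
        self._emit("on_end")
        self.active = False


# Offline usage: a fake emitter records calls instead of contacting the server
sent: List[Tuple[Any, ...]] = []
session = AsrSession(lambda *args: sent.append(args))
session.start()
session.send("chunk-1")
session.send("chunk-2")
session.end()
```

In a real client you would pass `sio.emit` instead of the recording lambda; the fake emitter only exists so the event order can be checked without a network connection.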
Code example
- For JS
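import { io } from 'socket.io-client';

const socket = io('https://agents.monkeyenglish.net');
// Handle connection
socket.on('connect', () => {
  console.log('Connected to server');
});
// Handle disconnection
socket.on('disconnect', () => {
  console.log('Disconnected from server');
});
// Handle ASR results for the audio sent on 'audio_stream'
socket.on('asr_response', (data) => {
  console.log('Received ASR response:', data);
});
// Start a speech-to-text session (Step 2)
function startSession() {
  socket.emit('on_start');
}
// Send one chunk of audio data (Step 3)
function pushAudioStream(audioData) {
  socket.emit('audio_stream', audioData);
}
// Finish the session when there is no more audio to send (Step 5)
function endSession() {
  socket.emit('on_end');
}
// Example of reading a log file and sending data (implement as needed)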
- For Python
import socketio
import threading
import time

# Create a Socket.IO client
sio = socketio.Client()

# Event handler for connection
@sio.event
def connect():
    print('Connected to server')

# Event handler for disconnection
@sio.event
def disconnect():
    print('Disconnected from server')

# Event handler for the 'asr_response' event
@sio.on('asr_response')
def on_asr_response(data):
    print('Received ASR response:', data)

# Push one chunk of data to 'audio_stream'
def push_audio_stream(audio_data: str):
    sio.emit('audio_stream', audio_data)
    # print(f'Pushed data to audio_stream: {audio_data}')

# Read the log file and push its payloads inside one ASR session
def stream_log_file(file_path: str):
    lines = []
    with open(file_path, 'r') as file:
        for line in file:
            # Keep the second space-separated field; adjust to your log format
            lines.append(line.strip().split(" ")[1])
    sio.emit('on_start')  # Step 2: open the session
    for line in lines:
        push_audio_stream(line)
        time.sleep(0.1)  # Delay between sending lines
    sio.emit('on_end')  # Step 5: close the session

# Stream and listen concurrently
def start_streaming_and_listening():
    # Connect to the Socket.IO server
    sio.connect('https://agents.monkeyenglish.net')
    # Stream the log file from a separate thread
    log_file_path = 'com.earlystart.monkeytalk-latest.log'
    stream_thread = threading.Thread(target=stream_log_file, args=(log_file_path,))
    stream_thread.start()
    # Keep the main thread alive to receive responses
    sio.wait()

# Start the process
if __name__ == "__main__":
    start_streaming_and_listening()
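The log-parsing step in the Python example can be pulled out into a pure generator, which makes it testable without a server and avoids reading the whole file into memory first. `extract_payloads` is a hypothetical helper that keeps the example's assumption (the payload is the second space-separated field of each line) but skips malformed lines instead of raising `IndexError`; the sample lines are made up for illustration.

```python
from typing import Iterable, Iterator


def extract_payloads(lines: Iterable[str]) -> Iterator[str]:
    """Yield the second space-separated field of each log line.

    Uses the same split(" ")[1] rule as the example; lines whose second
    field is missing or empty are skipped rather than raising IndexError.
    """
    for line in lines:
        parts = line.strip().split(" ")
        if len(parts) >= 2 and parts[1]:
            yield parts[1]


# Made-up log lines: a timestamp followed by a payload, plus one blank line
sample = [
    "12:00:01 AAAAAA==\n",
    "\n",
    "12:00:02 BBBBBB==\n",
]
payloads = list(extract_payloads(sample))
```

Because an open file is itself an iterable of lines, `stream_log_file` could call `for payload in extract_payloads(file): push_audio_stream(payload)` directly inside its `with open(...)` block.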