AI: Stream ASR

Profile

Stream ASR (Speech to Text Online)

Stream ASR is a module for real-time speech-to-text, built on the NVIDIA Riva SDK.

It uses the Conformer model for recognition and exposes a Socket.IO interface to make it easy to use.

Workflow

Step 1: Connect to the host https://agents.monkeyenglish.net using Socket.IO.
Step 2: To start a speech-to-text session, emit the event "on_start".
If on_start succeeds, the server responds with the message:

{'status': 'Connected to server successful.'}

Step 3: Once on_start has succeeded, send audio data to the event "audio_stream".
Step 4: Recognition results for audio_stream arrive on the event "asr_response".
Step 5: To finish the session, emit the event "on_end". When the session has ended, the server returns the message:

{'status': 'Stopped to server successful.'}
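
The five steps above impose an order on the events: on_start first, audio_stream only after the server has confirmed, on_end last. A minimal Python sketch of that ordering follows. The AsrSession class, its method names, and its state names are hypothetical and not part of the service. It takes any emit(event, data) callable, such as the emit method of a python-socketio client. How the two status messages are delivered (as an acknowledgement or as a separate event) is not stated here, so the sketch only offers handle_status for you to call with whatever message arrives.

```python
from typing import Any, Callable, Dict

# Status texts copied from the workflow description above.
CONNECTED = "Connected to server successful."
STOPPED = "Stopped to server successful."


class AsrSession:
    """Hypothetical helper that enforces the on_start / audio_stream / on_end order."""

    def __init__(self, emit: Callable[..., Any]) -> None:
        # `emit` has the shape emit(event, data=None), e.g. a Socket.IO client's emit.
        self._emit = emit
        # idle -> starting -> streaming -> ending -> idle
        self.state = "idle"

    def start(self) -> None:
        # Step 2: ask the server to open a session.
        self._emit("on_start")
        self.state = "starting"

    def handle_status(self, message: Dict[str, Any]) -> None:
        # Advance the state when the matching status message arrives.
        status = message.get("status")
        if self.state == "starting" and status == CONNECTED:
            self.state = "streaming"
        elif self.state == "ending" and status == STOPPED:
            self.state = "idle"

    def send(self, audio_data: str) -> None:
        # Step 3: audio is only sent once on_start has been confirmed.
        if self.state != "streaming":
            raise RuntimeError("session is not streaming; wait for on_start to succeed")
        self._emit("audio_stream", audio_data)

    def end(self) -> None:
        # Step 5: ask the server to close the session.
        self._emit("on_end")
        self.state = "ending"
```

With a real client this would be created as AsrSession(sio.emit). Keeping the transport behind a plain callable also lets the ordering be checked without a network connection.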

Code example

For JS

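// Requires the Socket.IO client: load it with a <script> tag, or in a bundler use
// import { io } from 'socket.io-client';
const socket = io('https://agents.monkeyenglish.net');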

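// Handle connection, then start a speech-to-text session (Step 2 of the workflow)
socket.on('connect', () => {
    console.log('Connected to server');
    socket.emit('on_start');  // Assumed to take no payload; none is documented here
});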

// Handle disconnection
socket.on('disconnect', () => {
    console.log('Disconnected from server');
});

// Handle ASR response
socket.on('asr_response', (data) => {
    console.log('Received ASR response:', data);
});

// Function to send audio data
function pushAudioStream(audioData) {
    socket.emit('audio_stream', audioData);
}

// Example of reading log file and sending data (implement as needed)

For Python

import socketio
import threading
import time

# Create a Socket.IO client
sio = socketio.Client()

# Event handler for connection
@sio.event
def connect():
    print('Connected to server')

# Event handler for disconnection
@sio.event
def disconnect():
    print('Disconnected from server')

# Event handler for 'asr_response' event
@sio.on('asr_response')
def on_asr_response(data):
    print('Received ASR response:', data)

# Function to push data to 'audio_stream'
def push_audio_stream(audio_data: str):
    sio.emit('audio_stream', audio_data)
    # print(f'Pushed data to audio_stream: {audio_data}')

# Function to read audio payloads from the log file and push them one by one
def stream_log_file(file_path: str):
    lines = []
    with open(file_path, 'r') as file:
        for line in file:
            # The payload is expected in the second field, separated by two spaces;
            # adjust based on your log format
            parts = line.strip().split("  ")
            if len(parts) > 1:  # Skip lines that have no payload field
                lines.append(parts[1])

    for line in lines:
        push_audio_stream(line)
        time.sleep(0.1)  # Delay between sending lines

# Function to handle the streaming and listening concurrently
def start_streaming_and_listening():
    # Connect to the Socket.IO server and open a session (Step 2 of the workflow)
    sio.connect('https://agents.monkeyenglish.net')
    sio.emit('on_start')  # Assumed to take no payload; none is documented here

    # Start a separate thread to stream the log file
    log_file_path = 'com.earlystart.monkeytalk-latest.log'
    stream_thread = threading.Thread(target=stream_log_file, args=(log_file_path,))
    stream_thread.start()

    # Keep the main thread alive to listen for responses
    sio.wait()

# Start the process
if __name__ == "__main__":
    start_streaming_and_listening()
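
The example above replays payloads that were already captured in a log file. If you start from raw audio instead, the data has to be cut into pieces before each one is passed to push_audio_stream. The sketch below shows one possible way to do that. It rests on assumptions that this page does not confirm: that the server accepts base64 text, that the audio is 16-bit mono PCM at 16 kHz, and that 3200 bytes (about 0.1 seconds at that rate) is a sensible chunk size. The function name iter_base64_chunks is hypothetical.

```python
import base64
from typing import Iterator


def iter_base64_chunks(pcm: bytes, chunk_bytes: int = 3200) -> Iterator[str]:
    """Yield base64 text chunks that cover `pcm` in order.

    Assumption: 3200 bytes is about 0.1 s of 16 kHz, 16-bit mono audio.
    The encoding the server really expects is not documented on this page.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for offset in range(0, len(pcm), chunk_bytes):
        piece = pcm[offset:offset + chunk_bytes]
        yield base64.b64encode(piece).decode("ascii")
```

Each yielded string could then be sent with push_audio_stream(chunk), followed by a short sleep as in stream_log_file.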