AI: Mspeak

Profile

AI - MSPEAK Progress Document

Update: 22.02.2023 - Version 1.0 API Document | Benchmark result

Update: 10.03.2023 - Summarize version 1.0 - Slide

Update: 15.03.2023 - Planning version 2.0

Update: 25.05.2023 - Diagram v2.0 PDF

Update: 25.05.2023 - V2 evaluation plan - V2 usage guide - Doc

Update: 25.05.2023 - Planning version 3.0 - Slide

Update: 11.10.2023 - Explain API V3 - Doc

Offline Plan Workflow

Update: 21.01.2023 - Offline flow M-Speak
AI-S2T.drawio.png

Infra Mspeak
AI System.png
Reference (WER, CER, and MER metrics): https://docs.kolena.io/metrics/wer-cer-mer/

Integration:

API method

Property	Value
URL	https://app.monkeyenglish.net/mspeak/v3/score
Method	POST
Header	Bearer {{JWT Token}}
Body	Form
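A minimal sketch of the request described in the table, using only the Python standard library. The table does not list the form fields or the exact header name, so the Authorization header name, the URL-encoded body, and the field name used in the usage note are assumptions. If the endpoint expects multipart form data (for example, to upload audio), the body encoding would differ.

```python
import urllib.parse
import urllib.request

SCORE_URL = "https://app.monkeyenglish.net/mspeak/v3/score"


def build_score_request(jwt_token: str, form_fields: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the score endpoint."""
    # Assumption: the form is URL-encoded; the real API may expect multipart.
    body = urllib.parse.urlencode(form_fields).encode("utf-8")
    headers = {
        # Assumption: the bearer token travels in the Authorization header.
        "Authorization": f"Bearer {jwt_token}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(SCORE_URL, data=body, headers=headers, method="POST")
```

To send it, pass the result to urllib.request.urlopen, for example urllib.request.urlopen(build_score_request(token, {"text": "hello"})), where "text" is a hypothetical field name.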

WebSocket method
This guide explains how to use a WebSocket client to stream audio data from a file to a WebSocket server and handle server responses.

Overview
This WebSocket client allows you to:

Stream audio data from a file to a WebSocket server.
Receive and process responses from the server.
Send additional data to the server based on the responses.

Domain dev: "wss://ai.monkeyenglish.net/ws/v2/{device_id}"
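The streaming part of the overview can be sketched as follows. This is an illustration only: it builds the dev URL from a device id and cuts an audio stream into fixed-size binary frames. The chunk size and the send callable (standing in for whatever WebSocket library the client uses) are assumptions, not values taken from the server.

```python
import io
from typing import BinaryIO, Callable, Iterator
from urllib.parse import quote

WS_DEV_TEMPLATE = "wss://ai.monkeyenglish.net/ws/v2/{device_id}"
CHUNK_SIZE = 3200  # assumption: bytes per frame; the server's preferred size is not documented


def ws_url(device_id: str) -> str:
    """Fill the device id into the dev WebSocket URL, percent-encoding it."""
    return WS_DEV_TEMPLATE.format(device_id=quote(device_id, safe=""))


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive chunks of at most chunk_size bytes until the stream ends."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_audio(stream: BinaryIO, send: Callable[[bytes], None],
                 chunk_size: int = CHUNK_SIZE) -> int:
    """Send every chunk through `send` and return the total number of bytes sent."""
    sent = 0
    for chunk in iter_audio_chunks(stream, chunk_size):
        send(chunk)
        sent += len(chunk)
    return sent


if __name__ == "__main__":
    # In-memory stand-in for an audio file; a real client would open the file in "rb" mode.
    demo = io.BytesIO(b"\x00" * 7000)
    print(ws_url("device-123"), stream_audio(demo, lambda frame: None))
```

Keeping the transport behind a plain callable lets the chunking be tested without a network connection.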

Biểu đồ không có tiêu đề.drawio.png

Workflow app + server

Biểu đồ không có tiêu đề.drawio.png

Note:

When the app sends audio that the server identifies as silence, and that audio has already been scored but the score is still below 50, the server does not score again while the silence continues. The server rescores only once the silence ends and the user says something more.
When the user actively stops the recording, or the recording time runs out, the normal flow still applies.
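The rule in the note can be illustrated with a small decision function. This is a sketch of the described behavior, not the server's actual code. The function name and its inputs are hypothetical, and the branch for scores of 50 or more is an assumption, since the note does not cover that case.

```python
from typing import Optional

LOW_SCORE_THRESHOLD = 50  # from the note: scores below 50 count as low


def should_score_on_silence(last_score: Optional[float],
                            spoke_since_last_score: bool) -> bool:
    """Decide whether a detected silence should trigger scoring."""
    if last_score is None:
        # Nothing has been scored yet, so this silence triggers the first scoring.
        return True
    if last_score < LOW_SCORE_THRESHOLD:
        # Low score: rescore only if the user spoke again after the last scoring.
        return spoke_since_last_score
    # Assumption: a score of 50 or more needs no rescoring (not stated in the note).
    return False
```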

Event Tracking App

Overview
This document provides instructions on how to write events to a client push stream using Amazon Kinesis. The events are structured with specific fields that include timing metrics, identifiers, version information, error messages, and timestamps.
Event Structure
Each event is represented by the following fields:

time_first_push: The time from opening the microphone until the first bytes are sent.
time_handshake: The time taken for the handshake process, represented as a float32.
time_response: The time taken to receive a response, represented as a float32.
profile_id: The unique identifier for the user profile, represented as a string.
user_id: The unique identifier for the user.
mode: The session mode, an enum of [online | offline] or [0, 1].
device_id: The unique identifier for the device, represented as a string.
app_version: The version of the application, represented as a string.
error: Any error encountered during the process, represented as a string.
request_id: A unique identifier for the request, represented as a string.
created_at: The timestamp when the event was created, represented as a timestamp with second-level precision.
event_name: The event name, mspeak_websocket.
platform: The client platform, represented as a string.
score: The score for the session.
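The field list above could be modeled as a small record type. This is a sketch under assumptions: the list does not give types for time_first_push, user_id, mode, or score, so the types below are guesses, and JSON is assumed as the serialization format.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class MspeakEvent:
    time_first_push: float  # mic open until first bytes sent (type assumed)
    time_handshake: float   # float32 in the field list
    time_response: float    # float32 in the field list
    profile_id: str
    user_id: str            # type assumed
    mode: str               # "online" or "offline" (the list also allows 0 or 1)
    device_id: str
    app_version: str
    request_id: str
    platform: str
    score: float            # type assumed
    error: str = ""         # empty when no error occurred (assumed convention)
    # Second-level precision, as the field list requires.
    created_at: int = field(default_factory=lambda: int(time.time()))
    event_name: str = "mspeak_websocket"


def to_json(event: MspeakEvent) -> str:
    """Serialize the event to a JSON string (serialization format is an assumption)."""
    return json.dumps(asdict(event))
```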

Prerequisites

Kinesis: data_stream = app_ai_log
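A hedged sketch of how a client might prepare a record for this stream. It only builds the arguments for the Kinesis PutRecord call, so it runs without AWS credentials. The JSON encoding and the choice of device_id as partition key are assumptions, and the event passed in is expected to be a plain dictionary with the fields listed under Event Structure.

```python
import json

STREAM_NAME = "app_ai_log"


def build_put_record_kwargs(event: dict) -> dict:
    """Build keyword arguments for a Kinesis PutRecord call."""
    return {
        "StreamName": STREAM_NAME,
        # Assumption: events are sent as UTF-8 encoded JSON.
        "Data": json.dumps(event).encode("utf-8"),
        # Assumption: partition by device so one device's events stay ordered.
        "PartitionKey": str(event["device_id"]),
    }


# With boto3 installed and AWS credentials configured, the record could be sent with:
#   import boto3
#   boto3.client("kinesis").put_record(**build_put_record_kwargs(event))
```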

Message Push Stream Process

Step 1: Connect to the WebSocket: ws/v3/device_id
When the connection succeeds, the server sends the message:
{"type": 3, "msg": "Connect to server successfully"}
Step 2: Push the audio data as an array of bytes.
Step 3: To request a score for the data recorded so far, send:
{"type": 1, "data": {same fields as the v2 payload}}
Step 4: The result returned is the same as before.
Step 5: Send a message to clear the data and prepare for a new session:
{"type": 2}
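The five steps can be sketched as one client routine. This is an illustration, not the official client. The ws argument is any object with send and recv methods (a hypothetical stand-in for a WebSocket library), the contents of score_data are not defined here, and the routine assumes the server replies with JSON text.

```python
import json
from typing import Iterable

TYPE_SCORE = 1      # client to server: score the audio received so far
TYPE_CLEAR = 2      # client to server: clear data for a new session
TYPE_CONNECTED = 3  # server to client: connection established


def run_session(ws, audio_chunks: Iterable[bytes], score_data: dict) -> dict:
    """Run one push-stream session and return the decoded score result."""
    # Step 1: the server confirms the connection with a type 3 message.
    hello = json.loads(ws.recv())
    if hello.get("type") != TYPE_CONNECTED:
        raise RuntimeError(f"unexpected first message: {hello!r}")
    # Step 2: push the raw audio bytes.
    for chunk in audio_chunks:
        ws.send(chunk)
    # Step 3: request a score for everything recorded so far.
    ws.send(json.dumps({"type": TYPE_SCORE, "data": score_data}))
    # Step 4: read the result (assumed to be JSON text).
    result = json.loads(ws.recv())
    # Step 5: clear server-side data so the connection is ready for a new session.
    ws.send(json.dumps({"type": TYPE_CLEAR}))
    return result
```

Because the transport is passed in, the same routine can be exercised against a fake socket in tests before it is wired to a real connection.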