Domain: https://agents.monkeyenglish.net/
APIKEY: a813ec766197294184a938c331b08e7g
Translate Text API
Endpoint:
POST /translate
Description:
This endpoint translates text from a source language to a target language. It supports both basic and advanced translation, with additional options for context, area, and style in the advanced mode.
Request Headers:
Header | Type | Required | Description |
---|---|---|---|
APIKEY | String | Yes | API key for authorization |
Request Body:
Basic Translation:
Field | Type | Required | Description |
---|---|---|---|
source_lang | String | Yes | The language of the source text. |
target_lang | String | Yes | The language to translate to. |
sentence | String | Yes | The text to be translated. |
is_advance | Boolean | No | Set to False for basic translation. |
Note: source_lang and target_lang accept either a name or the corresponding code from the Supported Languages table below.
Advanced Translation (with additional optional fields):
Field | Type | Required | Description |
---|---|---|---|
source_lang | String | Yes | The language of the source text. |
target_lang | String | Yes | The language to translate to. |
sentence | String | Yes | The text to be translated. |
is_advance | Boolean | Yes | Set to True for advanced translation. |
area | String | No | Specify the domain/area for translation (e.g., legal, medical). |
style | String | No | Specify the translation style (e.g., formal, informal). |
context | String | No | Provide additional context for the translation. |
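The advanced fields can be assembled into a request body along the following lines. This is a minimal sketch: the helper name `build_translate_payload` is hypothetical, and leaving unset optional fields out of the body is an assumption rather than documented behaviour.

```python
import json


def build_translate_payload(sentence, source_lang, target_lang,
                            area=None, style=None, context=None):
    """Build the JSON body for POST /translate (hypothetical helper).

    Advanced mode is switched on only when an optional field is supplied.
    """
    payload = {
        "source_lang": source_lang,
        "target_lang": target_lang,
        "sentence": sentence,
        "is_advance": False,
    }
    # Optional advanced fields; unset ones are omitted (an assumption).
    extras = {"area": area, "style": style, "context": context}
    extras = {key: value for key, value in extras.items() if value is not None}
    if extras:
        payload["is_advance"] = True
        payload.update(extras)
    return payload


body = build_translate_payload(
    "The tenant shall vacate the premises.", "eng", "spa",
    area="legal", style="formal",
)
print(json.dumps(body, indent=2))
```

Calling the helper without `area`, `style`, or `context` yields the basic body with `is_advance` set to false.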
Example Request (Basic Translation):
{
"source_lang": "en",
"target_lang": "es",
"sentence": "Hello, how are you?",
"is_advance": false
}
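One way to send this body from Python, using only the standard library, might look as follows. The sketch assumes the body is posted as JSON with a `Content-Type: application/json` header (the document shows a JSON body but does not name the content type); the function names and the `YOUR_API_KEY` placeholder are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://agents.monkeyenglish.net"  # from the Domain line at the top


def make_translate_request(payload, api_key):
    """Prepare (but do not send) a POST /translate request."""
    return urllib.request.Request(
        url=BASE_URL + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "APIKEY": api_key,
            # Assumption: the body is sent as JSON, as the example suggests.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = make_translate_request(
    {"source_lang": "en", "target_lang": "es",
     "sentence": "Hello, how are you?", "is_advance": False},
    api_key="YOUR_API_KEY",  # placeholder, not a real key
)
# result = send(request)  # uncomment to perform the actual call
```

Keeping request construction separate from sending makes it possible to inspect the headers and body before anything goes over the network.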
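Example Response (this sample appears to come from a Vietnamese-to-English request, not from the example request above):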
{
"message": "success",
"target": "Humanity is truly terrifying.",
"audio_target": "",
"data": {
"vie": {
"text": "Nhân loại thực sự đáng sợ.",
"audio": "https://vnmedia2.monkeyuni.net/App/uploads/productivity/TW4TBoru1K0TAneo7qUc.wav"
},
"eng": {
"text": "Humanity is truly terrifying.",
"audio": "https://vnmedia2.monkeyuni.net/App/uploads/productivity/h41XWUypOsAOIWUvWvvW.wav"
}
}
}
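A response of this shape could be unpacked as sketched below. The function name is hypothetical, and treating any `message` other than "success" as a failure is an assumption, since the document does not show a failed reply for this endpoint.

```python
import json

# The sample response from above, reproduced as a string for the demo.
SAMPLE_RESPONSE = """
{
  "message": "success",
  "target": "Humanity is truly terrifying.",
  "audio_target": "",
  "data": {
    "vie": {
      "text": "Nhân loại thực sự đáng sợ.",
      "audio": "https://vnmedia2.monkeyuni.net/App/uploads/productivity/TW4TBoru1K0TAneo7qUc.wav"
    },
    "eng": {
      "text": "Humanity is truly terrifying.",
      "audio": "https://vnmedia2.monkeyuni.net/App/uploads/productivity/h41XWUypOsAOIWUvWvvW.wav"
    }
  }
}
"""


def parse_translate_response(raw):
    """Return (translated_text, {language_code: text}) from a reply body."""
    reply = json.loads(raw)
    if reply.get("message") != "success":
        # Assumption: any other message value signals a failed request.
        raise RuntimeError(f"translation failed: {reply.get('message')!r}")
    texts = {code: entry.get("text", "")
             for code, entry in reply.get("data", {}).items()}
    return reply.get("target", ""), texts


target, texts = parse_translate_response(SAMPLE_RESPONSE)
print(target)
print(sorted(texts))
```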
Supported Languages (Sp = speech, Tx = text, -- = not supported):
code | language | script | Source | Target |
---|---|---|---|---|
afr | Afrikaans | Latn | Sp, Tx | Tx |
amh | Amharic | Ethi | Sp, Tx | Tx |
arb | Modern Standard Arabic | Arab | Sp, Tx | Sp, Tx |
ary | Moroccan Arabic | Arab | Sp, Tx | Tx |
arz | Egyptian Arabic | Arab | Sp, Tx | Tx |
asm | Assamese | Beng | Sp, Tx | Tx |
ast | Asturian | Latn | Sp | -- |
azj | North Azerbaijani | Latn | Sp, Tx | Tx |
bel | Belarusian | Cyrl | Sp, Tx | Tx |
ben | Bengali | Beng | Sp, Tx | Sp, Tx |
bos | Bosnian | Latn | Sp, Tx | Tx |
bul | Bulgarian | Cyrl | Sp, Tx | Tx |
cat | Catalan | Latn | Sp, Tx | Sp, Tx |
ceb | Cebuano | Latn | Sp, Tx | Tx |
ces | Czech | Latn | Sp, Tx | Sp, Tx |
ckb | Central Kurdish | Arab | Sp, Tx | Tx |
cmn | Mandarin Chinese | Hans | Sp, Tx | Sp, Tx |
cmn_Hant | Mandarin Chinese | Hant | Sp, Tx | Sp, Tx |
cym | Welsh | Latn | Sp, Tx | Sp, Tx |
dan | Danish | Latn | Sp, Tx | Sp, Tx |
deu | German | Latn | Sp, Tx | Sp, Tx |
ell | Greek | Grek | Sp, Tx | Tx |
eng | English | Latn | Sp, Tx | Sp, Tx |
est | Estonian | Latn | Sp, Tx | Sp, Tx |
eus | Basque | Latn | Sp, Tx | Tx |
fin | Finnish | Latn | Sp, Tx | Sp, Tx |
fra | French | Latn | Sp, Tx | Sp, Tx |
fuv | Nigerian Fulfulde | Latn | Sp, Tx | Tx |
gaz | West Central Oromo | Latn | Sp, Tx | Tx |
gle | Irish | Latn | Sp, Tx | Tx |
glg | Galician | Latn | Sp, Tx | Tx |
guj | Gujarati | Gujr | Sp, Tx | Tx |
heb | Hebrew | Hebr | Sp, Tx | Tx |
hin | Hindi | Deva | Sp, Tx | Sp, Tx |
hrv | Croatian | Latn | Sp, Tx | Tx |
hun | Hungarian | Latn | Sp, Tx | Tx |
hye | Armenian | Armn | Sp, Tx | Tx |
ibo | Igbo | Latn | Sp, Tx | Tx |
ind | Indonesian | Latn | Sp, Tx | Sp, Tx |
isl | Icelandic | Latn | Sp, Tx | Tx |
ita | Italian | Latn | Sp, Tx | Sp, Tx |
jav | Javanese | Latn | Sp, Tx | Tx |
jpn | Japanese | Jpan | Sp, Tx | Sp, Tx |
kam | Kamba | Latn | Sp | -- |
kan | Kannada | Knda | Sp, Tx | Tx |
kat | Georgian | Geor | Sp, Tx | Tx |
kaz | Kazakh | Cyrl | Sp, Tx | Tx |
kea | Kabuverdianu | Latn | Sp | -- |
khk | Halh Mongolian | Cyrl | Sp, Tx | Tx |
khm | Khmer | Khmr | Sp, Tx | Tx |
kir | Kyrgyz | Cyrl | Sp, Tx | Tx |
kor | Korean | Kore | Sp, Tx | Sp, Tx |
lao | Lao | Laoo | Sp, Tx | Tx |
lit | Lithuanian | Latn | Sp, Tx | Tx |
ltz | Luxembourgish | Latn | Sp | -- |
lug | Ganda | Latn | Sp, Tx | Tx |
luo | Luo | Latn | Sp, Tx | Tx |
lvs | Standard Latvian | Latn | Sp, Tx | Tx |
mai | Maithili | Deva | Sp, Tx | Tx |
mal | Malayalam | Mlym | Sp, Tx | Tx |
mar | Marathi | Deva | Sp, Tx | Tx |
mkd | Macedonian | Cyrl | Sp, Tx | Tx |
mlt | Maltese | Latn | Sp, Tx | Sp, Tx |
mni | Meitei | Beng | Sp, Tx | Tx |
mya | Burmese | Mymr | Sp, Tx | Tx |
nld | Dutch | Latn | Sp, Tx | Sp, Tx |
nno | Norwegian Nynorsk | Latn | Sp, Tx | Tx |
nob | Norwegian Bokmål | Latn | Sp, Tx | Tx |
npi | Nepali | Deva | Sp, Tx | Tx |
nya | Nyanja | Latn | Sp, Tx | Tx |
oci | Occitan | Latn | Sp | -- |
ory | Odia | Orya | Sp, Tx | Tx |
pan | Punjabi | Guru | Sp, Tx | Tx |
pbt | Southern Pashto | Arab | Sp, Tx | Tx |
pes | Western Persian | Arab | Sp, Tx | Sp, Tx |
pol | Polish | Latn | Sp, Tx | Sp, Tx |
por | Portuguese | Latn | Sp, Tx | Sp, Tx |
ron | Romanian | Latn | Sp, Tx | Sp, Tx |
rus | Russian | Cyrl | Sp, Tx | Sp, Tx |
slk | Slovak | Latn | Sp, Tx | Sp, Tx |
slv | Slovenian | Latn | Sp, Tx | Tx |
sna | Shona | Latn | Sp, Tx | Tx |
snd | Sindhi | Arab | Sp, Tx | Tx |
som | Somali | Latn | Sp, Tx | Tx |
spa | Spanish | Latn | Sp, Tx | Sp, Tx |
srp | Serbian | Cyrl | Sp, Tx | Tx |
swe | Swedish | Latn | Sp, Tx | Sp, Tx |
swh | Swahili | Latn | Sp, Tx | Sp, Tx |
tam | Tamil | Taml | Sp, Tx | Tx |
tel | Telugu | Telu | Sp, Tx | Sp, Tx |
tgk | Tajik | Cyrl | Sp, Tx | Tx |
tgl | Tagalog | Latn | Sp, Tx | Sp, Tx |
tha | Thai | Thai | Sp, Tx | Sp, Tx |
tur | Turkish | Latn | Sp, Tx | Sp, Tx |
ukr | Ukrainian | Cyrl | Sp, Tx | Sp, Tx |
urd | Urdu | Arab | Sp, Tx | Sp, Tx |
uzn | Northern Uzbek | Latn | Sp, Tx | Sp, Tx |
vie | Vietnamese | Latn | Sp, Tx | Sp, Tx |
xho | Xhosa | Latn | Sp | -- |
yor | Yoruba | Latn | Sp, Tx | Tx |
yue | Cantonese | Hant | Sp, Tx | Tx |
zlm | Colloquial Malay | Latn | Sp | -- |
zsm | Standard Malay | Latn | Tx | Tx |
zul | Zulu | Latn | Sp, Tx | Tx |
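A client may want to check a language pair against this table before sending a request. The sketch below parses a few rows copied from the table into a lookup, reading Sp as speech and Tx as text; the function names are hypothetical, and whether the server enforces the table in exactly this way is not stated in the document.

```python
# A few rows copied verbatim from the table above.
ROWS = """
eng | English | Latn | Sp, Tx | Sp, Tx |
vie | Vietnamese | Latn | Sp, Tx | Sp, Tx |
yue | Cantonese | Hant | Sp, Tx | Tx |
xho | Xhosa | Latn | Sp | -- |
"""


def parse_language_rows(text):
    """Map each code to its sets of source and target modalities."""
    table = {}
    for line in text.strip().splitlines():
        code, _name, _script, source, target = [
            cell.strip() for cell in line.rstrip("| ").split("|")
        ]
        table[code] = {
            "source": {m.strip() for m in source.split(",")} - {"--"},
            "target": {m.strip() for m in target.split(",")} - {"--"},
        }
    return table


def supports(table, code, direction, modality):
    """True if `code` lists `modality` ("Sp" or "Tx") for `direction`."""
    return modality in table.get(code, {}).get(direction, set())


table = parse_language_rows(ROWS)
print(supports(table, "yue", "target", "Sp"))
print(supports(table, "eng", "target", "Sp"))
```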
Speech Translation API
Endpoint:
POST /speech/translate
Description:
This endpoint translates an uploaded audio file from a source language to one or more target languages. It supports speech-to-text translation (task S2TT) and, as the notes below describe, translation with audio output (task S2ST).
Request Headers:
Header | Type | Required | Description |
---|---|---|---|
APIKEY | String | Yes | API key for authorization |
Request Body (Form-Data):
Field | Type | Required | Description |
---|---|---|---|
audio | File | Yes | The audio file to be translated. |
source | String | Yes | The language of the audio (e.g., eng for English). |
target | String | Yes | The language to translate the audio to. |
task | String | No | Translation task type. Default is S2TT (Speech-to-Text-to-Translation). |
Example Request (Form-Data):
Key | Value |
---|---|
audio | (upload audio file) |
source | eng |
target | fra |
task | S2TT |
Notes:
- Language codes follow the Supported Languages table above.
- task: use S2TT to receive translated text only, or S2ST to receive translated audio together with text.
- target: accepts several comma-separated codes to request multiple outputs, e.g. "vie,eng,spa".
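The form-data request can be built with the Python standard library as sketched here. The helper names, the `audio/wav` part type, and the placeholder key are assumptions made for illustration; only the field names (`audio`, `source`, `target`, `task`) and the `APIKEY` header come from the tables above.

```python
import urllib.request
import uuid

BASE_URL = "https://agents.monkeyenglish.net"  # from the Domain line at the top


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(text_part.encode("utf-8"))
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"  # assumed media type
    )
    chunks.append(file_head.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def make_speech_request(audio_bytes, filename, source, targets, api_key,
                        task="S2TT"):
    """Prepare (but do not send) a POST /speech/translate request."""
    fields = {"source": source, "target": ",".join(targets), "task": task}
    body, content_type = encode_multipart(fields, "audio", filename, audio_bytes)
    return urllib.request.Request(
        BASE_URL + "/speech/translate",
        data=body,
        headers={"APIKEY": api_key, "Content-Type": content_type},
        method="POST",
    )


request = make_speech_request(
    b"\x00\x01",  # placeholder bytes; a real call would pass the file contents
    "sample.wav", "eng", ["vie", "fra"], api_key="YOUR_API_KEY",
)
```

The prepared request can be passed to `urllib.request.urlopen` once real audio bytes and a valid key are supplied.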
Response:
Field | Type | Description |
---|---|---|
status | String | Status of the translation request. |
output | String | The translated text or processed output. |
error | String | Error message if applicable. |
Successful Response (200 OK):
{
"status": "success",
"output": "Bonjour",
"error": ""
}
Error Response (500 Internal Server Error):
{
"status": "failure",
"output": "",
"error": "System encountered an unexpected error. <error message>"
}
Error Handling:
- 401 Unauthorized: Invalid API key.
- 500 Internal Server Error: System encountered an unexpected error.
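On the client side, these cases could be folded into a single check, as in the sketch below. The exception type and function name are hypothetical, and the body of a 401 reply is not documented, so the code does not rely on it.

```python
import json


class SpeechTranslateError(Exception):
    """Raised when /speech/translate reports a failure (hypothetical type)."""


def interpret_speech_response(status_code, body_text):
    """Return the translated output, or raise with the reported error."""
    if status_code == 401:
        raise SpeechTranslateError("401 Unauthorized: invalid API key")
    try:
        reply = json.loads(body_text)
    except json.JSONDecodeError:
        reply = {}
    if not isinstance(reply, dict):
        reply = {}
    if status_code != 200 or reply.get("status") != "success":
        # Fall back to the HTTP status when no error text is present.
        detail = reply.get("error") or f"HTTP {status_code}"
        raise SpeechTranslateError(detail)
    return reply.get("output", "")


ok_body = '{"status": "success", "output": "Bonjour", "error": ""}'
print(interpret_speech_response(200, ok_body))
```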
Audio Streaming Client for Speech-to-Text and Translation (S2TT)
1. Overview
This document provides an overview of how to implement a client for streaming audio data to a WebSocket server that processes the data for speech-to-text-to-translation (S2TT) tasks. The system is designed to handle real-time audio streaming from clients, which can be built using various programming languages.
Key Components:
- WebSocket Server: The server receives audio data from the client, processes it, and returns results (e.g., transcriptions, translations).
- Client: Any client application (mobile, desktop, web) can stream audio to the server over WebSocket.
- Streaming Protocol: Audio data is chunked and transmitted in real-time, with metadata indicating task details such as the source language, target language, and processing task.
2. Communication Flow
2.1 Initial Connection
- Client Connects to Server: The client establishes a WebSocket connection with the server at a predefined URI.
  - WebSocket URI pattern: ws://<server-address>:<port>/ws/translate/<session_id>
  - Example: wss://agents.monkeyenglish.net/ws/translate/123
  - session_id: a random string
- Task Metadata: The client sends an initial message to define the task. This message includes:
  - Source Language: The language of the audio input (e.g., eng for English).
  - Target Language: The language for translation (e.g., vie for Vietnamese).
  - Task Type: The processing task (e.g., S2TT for Speech-to-Text-to-Translation).
  Message Format (JSON):
  { "type": "start", "data": { "source": "eng", "target": "vie", "task": "S2TT" } }
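The connection URI and the initial message could be produced as sketched below. The function names are hypothetical, and using a UUID for the session identifier is just one way to obtain a random string.

```python
import json
import uuid


def session_uri(host="agents.monkeyenglish.net", secure=True, port=None):
    """Build a WebSocket URI with a fresh random session id."""
    scheme = "wss" if secure else "ws"
    address = f"{host}:{port}" if port else host
    session_id = uuid.uuid4().hex  # random string, unique per session
    return f"{scheme}://{address}/ws/translate/{session_id}"


def start_message(source, target, task="S2TT"):
    """Serialize the initial "start" message that defines the task."""
    return json.dumps(
        {"type": "start",
         "data": {"source": source, "target": target, "task": task}}
    )


uri = session_uri()
first_frame = start_message("eng", "vie")
print(uri)
print(first_frame)
```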
2.2 Streaming Audio Data
- Audio Streaming: The client reads and sends audio data in chunks to the server. Each chunk is a segment of the full audio file, mimicking real-time audio streaming.
  - The audio data is converted into a byte stream for transmission.
- Transmission Format:
  - Audio chunks are transmitted in binary format (e.g., byte array).
  - Each chunk is sent over the WebSocket connection, followed by a short delay to simulate real-time audio capture.
- Streaming Example:
  - For every audio chunk, the client sends the binary data over the established WebSocket connection.
  - The client continues sending chunks until the entire audio file has been transmitted.
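The send loop described above can be sketched with `asyncio`. Here `ws` stands for any object with an awaitable `send` method, which is what connection objects from the `websockets` library provide; the demo uses a stand-in class so that it runs without a server. The default chunk size of 32000 bytes assumes one second of 16 kHz mono audio at 16 bits per sample, and the 16-bit width is an assumption.

```python
import asyncio


async def send_chunks(ws, audio_bytes, chunk_size=32000, delay=0.005):
    """Send audio as binary frames and return how many chunks were sent."""
    sent = 0
    for offset in range(0, len(audio_bytes), chunk_size):
        await ws.send(audio_bytes[offset:offset + chunk_size])
        sent += 1
        # Short pause between chunks to mimic real-time capture.
        await asyncio.sleep(delay)
    return sent


class RecordingSocket:
    """Stand-in for a WebSocket connection; records frames instead of sending."""

    def __init__(self):
        self.frames = []

    async def send(self, data):
        self.frames.append(data)


socket = RecordingSocket()
count = asyncio.run(send_chunks(socket, bytes(70000), delay=0))
print(count, [len(frame) for frame in socket.frames])  # 3 [32000, 32000, 6000]
```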
2.3 Task Completion
- End of Transmission: After the client finishes sending all audio chunks, it sends a final message to the server indicating that the streaming is complete and the task can be processed.
  Message Format (JSON):
  { "type": "do_task", "data": { "source": "eng", "target": "vie", "task": "S2TT" } }
- Processing Response: The server processes the received audio, performing the requested task (e.g., transcription and translation). Once complete, the server responds with the result, which may include:
  - Transcribed text.
  - Translated text.
- Response Format: The server sends a JSON message back to the client containing the task's result:
{
"message": "",
"data": {
"vie": {
"text": "thế là sáng hôm sau cái tin tôi về đến cổng còn phải thăm đường đã lan ra khóc sóng",
"audio": "https://vnmedia2.monkeyuni.net/App/uploads/productivity/8tihbdQvbQPHkcqmDntW.wav"
},
"eng": {
"text": "So the next morning, when I got back to the cage, I had to walk down the street to cry.",
"audio": "https://vnmedia2.monkeyuni.net/App/uploads/productivity/5Um6tQ1nzT3BfEqOYzx4.wav"
}
},
"status": "success"
}
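The completion step could be handled as in the following sketch: send the `do_task` message, wait for one reply, and collect the text per language. `ws` is again any object with awaitable `send` and `recv` methods, the demo socket returns a canned reply instead of contacting a server, and treating a `status` other than "success" as a failure is an assumption.

```python
import asyncio
import json


async def finish_and_collect(ws, source, target, task="S2TT", timeout=30):
    """Send "do_task", await the result, and return {language_code: text}."""
    await ws.send(json.dumps(
        {"type": "do_task",
         "data": {"source": source, "target": target, "task": task}}
    ))
    raw = await asyncio.wait_for(ws.recv(), timeout=timeout)
    reply = json.loads(raw)
    if reply.get("status") != "success":
        # Assumption: any other status value signals a failed task.
        raise RuntimeError(f"task failed: {reply.get('message')!r}")
    return {code: entry.get("text", "")
            for code, entry in reply.get("data", {}).items()}


class CannedSocket:
    """Stand-in connection that records sends and returns a fixed reply."""

    def __init__(self, reply):
        self.reply = reply
        self.sent = []

    async def send(self, data):
        self.sent.append(data)

    async def recv(self):
        return self.reply


canned = json.dumps({
    "message": "",
    "data": {"eng": {"text": "So the next morning, when I got back to the cage, "
                             "I had to walk down the street to cry."}},
    "status": "success",
})
socket = CannedSocket(canned)
texts = asyncio.run(finish_and_collect(socket, "eng", "vie"))
print(texts["eng"])
```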
3. Client Implementation Guidelines
3.1 Supported Languages
The client can be developed in any language that supports WebSocket communication, such as:
- JavaScript: Web-based applications.
- Python: Server-side or command-line tools.
- Java/Kotlin: Android applications.
- Swift: iOS applications.
- C#: Desktop or .NET applications.
3.2 WebSocket Library
Ensure that the client uses a WebSocket library suitable for your chosen programming language. Common libraries include:
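- JavaScript: Native WebSocket API in browsers, or the ws package in Node.js (socket.io uses its own protocol on top of WebSocket and will not connect to a plain WebSocket endpoint).
- Python: websockets or websocket-client.
- Java/Kotlin: OkHttp WebSocket implementation.
- Swift: Starscream library for WebSocket communication.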
3.3 Audio File Handling
The client needs to handle reading audio files or capturing audio in real-time. The format of the audio must be compatible with the server's requirements (e.g., 16 kHz, mono, .wav).
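A client can verify the format before streaming. The sketch below uses Python's standard `wave` module and takes the 16 kHz mono figures from the example above as the expected values; the function name is hypothetical, and the actual server requirements may differ.

```python
import io
import wave


def check_wav(source, expected_rate=16000, expected_channels=1):
    """Return the raw PCM frames if the WAV matches the expected format."""
    with wave.open(source, "rb") as wav:
        rate, channels = wav.getframerate(), wav.getnchannels()
        if rate != expected_rate or channels != expected_channels:
            raise ValueError(
                f"expected {expected_rate} Hz / {expected_channels} ch, "
                f"got {rate} Hz / {channels} ch"
            )
        return wav.readframes(wav.getnframes())


def make_silent_wav(rate, channels, n_bytes):
    """Build an in-memory WAV of silence for the demo (16-bit samples)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(bytes(n_bytes))
    buffer.seek(0)
    return buffer


pcm = check_wav(make_silent_wav(16000, 1, 3200))
print(len(pcm))  # 3200
```

`check_wav` also accepts a file path, since `wave.open` takes either a path or a file-like object.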
3.4 Chunking and Streaming
The client should send audio data in small chunks. For real-time applications:
- Chunk Size: Each chunk should be small enough to allow near real-time transmission, typically between 1-3 seconds of audio data per chunk.
- Delay: Introduce a small delay (e.g., 1-10 milliseconds) between sending each chunk to simulate real-time streaming.
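To turn a chunk duration into a byte count, the audio format has to be known. The sketch below assumes uncompressed PCM at 16 kHz, mono, with 16-bit samples (the sample width is an assumption), which gives 32000 bytes per second; the function names are hypothetical.

```python
def chunk_size_bytes(seconds, sample_rate=16000, channels=1, sample_width=2):
    """Bytes needed for `seconds` of PCM audio, aligned to whole frames."""
    frames = int(seconds * sample_rate)
    return frames * channels * sample_width


def split_pcm(pcm, seconds=1.0, **audio_format):
    """Split raw PCM bytes into chunks of roughly `seconds` each."""
    size = chunk_size_bytes(seconds, **audio_format)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


print(chunk_size_bytes(1))  # 32000
print(chunk_size_bytes(3))  # 96000
chunks = split_pcm(bytes(80000), seconds=1.0)  # 2.5 s of silence
print([len(chunk) for chunk in chunks])  # [32000, 32000, 16000]
```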
3.5 Error Handling
The client must handle potential errors during the WebSocket communication:
- Connection Issues: Reconnect if the WebSocket connection is dropped.
- Server Responses: Handle unexpected responses or errors from the server gracefully.
- Timeouts: Implement timeouts to prevent hanging connections if no response is received from the server.
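The reconnect and timeout guidance can be combined in a small retry wrapper, sketched below. `session` is a hypothetical coroutine function that opens a connection and runs one complete streaming exchange; the attempt count, timeout, and backoff values are illustrative, and exceptions specific to a given WebSocket library would need to be added to the `except` clause.

```python
import asyncio


async def run_with_retry(session, attempts=3, timeout=10.0, backoff=0.5):
    """Run `session()` under a timeout, retrying on connection errors."""
    last_error = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(session(), timeout=timeout)
        except (ConnectionError, OSError, asyncio.TimeoutError) as error:
            last_error = error
            # Exponential backoff before the next attempt.
            await asyncio.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


calls = {"count": 0}


async def flaky_session():
    """Demo session that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionResetError("connection dropped")
    return "done"


result = asyncio.run(run_with_retry(flaky_session, backoff=0))
print(result, calls["count"])  # done 3
```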
4. Server Configuration
4.1 Server URI
Clients must connect to the WebSocket server at the following URI:
ws://<server-address>:<port>/ws/translate/<session_id>
- <server-address>: IP address or domain of the WebSocket server.
- <port>: Port on which the server is running (e.g., 5001).
- <session_id>: A unique identifier for the client session, generated for each streaming session.
4.2 Audio Processing
The server is responsible for:
- Receiving and buffering audio chunks.
- Processing the audio (speech recognition, translation).
- Sending results back to the client in the expected format.
5. Example Use Cases
5.1 Mobile Voice Translation App
A mobile app developed in Java or Swift captures the user's voice, streams the audio to the server using WebSocket, and receives the translated text, which is displayed to the user in real-time.
5.2 Web-Based Audio Translator
A JavaScript web application allows users to upload audio files. The app streams the audio to the server, processes it, and shows the translation results to the user.
5.3 Desktop Speech-to-Text Tool
A Python desktop application records audio from the microphone, streams it to the server, and displays real-time transcription and translation.
6. Conclusion
This document provides an overview of the WebSocket-based client-server system for real-time audio streaming and processing. The client can be implemented in any language with WebSocket support, allowing flexible integration across various platforms and applications.