AI: Translation API

Profile

Domain: https://agents.monkeyenglish.net/
APIKEY: a813ec766197294184a938c331b08e7g

Translate Text API

Endpoint:

POST /translate

Description:

This endpoint translates text from a source language to a target language. It supports both basic and advanced translation, with additional options for context, area, and style in the advanced mode.

Request Headers:

Header	Type	Required	Description
`APIKEY`	String	Yes	API key for authorization

Request Body:

Basic Translation:

Field	Type	Required	Description
`source_lang`	String	Yes	The language of the source text.
`target_lang`	String	Yes	The language to translate to.
`sentence`	String	Yes	The text to be translated.
`is_advance`	Boolean	No	Set to `false` for basic translation.

Note: `source_lang` and `target_lang` may be given either as a name or as the corresponding code from the Supported Languages table.

Advanced Translation (with additional optional fields):

Field	Type	Required	Description
`source_lang`	String	Yes	The language of the source text.
`target_lang`	String	Yes	The language to translate to.
`sentence`	String	Yes	The text to be translated.
`is_advance`	Boolean	Yes	Set to `true` for advanced translation.
`area`	String	No	Specify the domain/area for translation (e.g., legal, medical).
`style`	String	No	Specify the translation style (e.g., formal, informal).
`context`	String	No	Provide additional context for the translation.

Example Request (Basic Translation):

{
  "source_lang": "eng",
  "target_lang": "spa",
  "sentence": "Hello, how are you?",
  "is_advance": false
}
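
The request described above could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library: the function name `build_translate_request` is hypothetical, the `Content-Type: application/json` header is an assumption (the document only shows a JSON body), and switching `is_advance` on whenever an advanced field is supplied is a convenience of this sketch, not documented server behaviour.

```python
import json
import urllib.request

BASE_URL = "https://agents.monkeyenglish.net"  # domain from the Profile section


def build_translate_request(api_key, source_lang, target_lang, sentence,
                            area=None, style=None, context=None):
    """Build (but do not send) a POST /translate request."""
    payload = {
        "source_lang": source_lang,
        "target_lang": target_lang,
        "sentence": sentence,
    }
    optional = {"area": area, "style": style, "context": context}
    extras = {key: value for key, value in optional.items() if value is not None}
    # Sketch-level choice: use advanced mode only when an advanced field is given.
    payload["is_advance"] = bool(extras)
    payload.update(extras)
    return urllib.request.Request(
        BASE_URL + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        # Content-Type is an assumption; APIKEY is the documented header.
        headers={"APIKEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; building and sending are kept separate so the payload can be inspected first.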

Example Response (illustrative; the text shown does not correspond to the request above):

{
  "message": "success",
  "target": "Humanity is truly terrifying.",
  "audio_target": "",
  "data": {
    "vie": {
      "text": "Nhân loại thực sự đáng sợ.",
      "audio": "https://vnmedia2.monkeyuni.net/App/uploads/productivity/TW4TBoru1K0TAneo7qUc.wav"
    },
    "eng": {
      "text": "Humanity is truly terrifying.",
      "audio": "https://vnmedia2.monkeyuni.net/App/uploads/productivity/h41XWUypOsAOIWUvWvvW.wav"
    }
  }
}
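
A client might unpack a response of this shape as sketched below. The helper name `extract_translations` is hypothetical, and treating any `message` other than `"success"` as a failure is an assumption, since the document does not list other values.

```python
import json


def extract_translations(response_text):
    """Return {language_code: (text, audio_url)} from a /translate response body."""
    body = json.loads(response_text)
    # Assumption: anything other than "success" signals a failed request.
    if body.get("message") != "success":
        raise ValueError(f"translation failed: {body.get('message')!r}")
    return {
        code: (entry.get("text", ""), entry.get("audio", ""))
        for code, entry in body.get("data", {}).items()
    }
```

The top-level `target` field carries the translated text directly; the per-language `data` map is useful when the audio URLs are also needed.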

Supported Languages:

code	language	script	Source	Target
afr	Afrikaans	Latn	Sp, Tx	Tx
amh	Amharic	Ethi	Sp, Tx	Tx
arb	Modern Standard Arabic	Arab	Sp, Tx	Sp, Tx
ary	Moroccan Arabic	Arab	Sp, Tx	Tx
arz	Egyptian Arabic	Arab	Sp, Tx	Tx
asm	Assamese	Beng	Sp, Tx	Tx
ast	Asturian	Latn	Sp	--
azj	North Azerbaijani	Latn	Sp, Tx	Tx
bel	Belarusian	Cyrl	Sp, Tx	Tx
ben	Bengali	Beng	Sp, Tx	Sp, Tx
bos	Bosnian	Latn	Sp, Tx	Tx
bul	Bulgarian	Cyrl	Sp, Tx	Tx
cat	Catalan	Latn	Sp, Tx	Sp, Tx
ceb	Cebuano	Latn	Sp, Tx	Tx
ces	Czech	Latn	Sp, Tx	Sp, Tx
ckb	Central Kurdish	Arab	Sp, Tx	Tx
cmn	Mandarin Chinese	Hans	Sp, Tx	Sp, Tx
cmn_Hant	Mandarin Chinese	Hant	Sp, Tx	Sp, Tx
cym	Welsh	Latn	Sp, Tx	Sp, Tx
dan	Danish	Latn	Sp, Tx	Sp, Tx
deu	German	Latn	Sp, Tx	Sp, Tx
ell	Greek	Grek	Sp, Tx	Tx
eng	English	Latn	Sp, Tx	Sp, Tx
est	Estonian	Latn	Sp, Tx	Sp, Tx
eus	Basque	Latn	Sp, Tx	Tx
fin	Finnish	Latn	Sp, Tx	Sp, Tx
fra	French	Latn	Sp, Tx	Sp, Tx
fuv	Nigerian Fulfulde	Latn	Sp, Tx	Tx
gaz	West Central Oromo	Latn	Sp, Tx	Tx
gle	Irish	Latn	Sp, Tx	Tx
glg	Galician	Latn	Sp, Tx	Tx
guj	Gujarati	Gujr	Sp, Tx	Tx
heb	Hebrew	Hebr	Sp, Tx	Tx
hin	Hindi	Deva	Sp, Tx	Sp, Tx
hrv	Croatian	Latn	Sp, Tx	Tx
hun	Hungarian	Latn	Sp, Tx	Tx
hye	Armenian	Armn	Sp, Tx	Tx
ibo	Igbo	Latn	Sp, Tx	Tx
ind	Indonesian	Latn	Sp, Tx	Sp, Tx
isl	Icelandic	Latn	Sp, Tx	Tx
ita	Italian	Latn	Sp, Tx	Sp, Tx
jav	Javanese	Latn	Sp, Tx	Tx
jpn	Japanese	Jpan	Sp, Tx	Sp, Tx
kam	Kamba	Latn	Sp	--
kan	Kannada	Knda	Sp, Tx	Tx
kat	Georgian	Geor	Sp, Tx	Tx
kaz	Kazakh	Cyrl	Sp, Tx	Tx
kea	Kabuverdianu	Latn	Sp	--
khk	Halh Mongolian	Cyrl	Sp, Tx	Tx
khm	Khmer	Khmr	Sp, Tx	Tx
kir	Kyrgyz	Cyrl	Sp, Tx	Tx
kor	Korean	Kore	Sp, Tx	Sp, Tx
lao	Lao	Laoo	Sp, Tx	Tx
lit	Lithuanian	Latn	Sp, Tx	Tx
ltz	Luxembourgish	Latn	Sp	--
lug	Ganda	Latn	Sp, Tx	Tx
luo	Luo	Latn	Sp, Tx	Tx
lvs	Standard Latvian	Latn	Sp, Tx	Tx
mai	Maithili	Deva	Sp, Tx	Tx
mal	Malayalam	Mlym	Sp, Tx	Tx
mar	Marathi	Deva	Sp, Tx	Tx
mkd	Macedonian	Cyrl	Sp, Tx	Tx
mlt	Maltese	Latn	Sp, Tx	Sp, Tx
mni	Meitei	Beng	Sp, Tx	Tx
mya	Burmese	Mymr	Sp, Tx	Tx
nld	Dutch	Latn	Sp, Tx	Sp, Tx
nno	Norwegian Nynorsk	Latn	Sp, Tx	Tx
nob	Norwegian Bokmål	Latn	Sp, Tx	Tx
npi	Nepali	Deva	Sp, Tx	Tx
nya	Nyanja	Latn	Sp, Tx	Tx
oci	Occitan	Latn	Sp	--
ory	Odia	Orya	Sp, Tx	Tx
pan	Punjabi	Guru	Sp, Tx	Tx
pbt	Southern Pashto	Arab	Sp, Tx	Tx
pes	Western Persian	Arab	Sp, Tx	Sp, Tx
pol	Polish	Latn	Sp, Tx	Sp, Tx
por	Portuguese	Latn	Sp, Tx	Sp, Tx
ron	Romanian	Latn	Sp, Tx	Sp, Tx
rus	Russian	Cyrl	Sp, Tx	Sp, Tx
slk	Slovak	Latn	Sp, Tx	Sp, Tx
slv	Slovenian	Latn	Sp, Tx	Tx
sna	Shona	Latn	Sp, Tx	Tx
snd	Sindhi	Arab	Sp, Tx	Tx
som	Somali	Latn	Sp, Tx	Tx
spa	Spanish	Latn	Sp, Tx	Sp, Tx
srp	Serbian	Cyrl	Sp, Tx	Tx
swe	Swedish	Latn	Sp, Tx	Sp, Tx
swh	Swahili	Latn	Sp, Tx	Sp, Tx
tam	Tamil	Taml	Sp, Tx	Tx
tel	Telugu	Telu	Sp, Tx	Sp, Tx
tgk	Tajik	Cyrl	Sp, Tx	Tx
tgl	Tagalog	Latn	Sp, Tx	Sp, Tx
tha	Thai	Thai	Sp, Tx	Sp, Tx
tur	Turkish	Latn	Sp, Tx	Sp, Tx
ukr	Ukrainian	Cyrl	Sp, Tx	Sp, Tx
urd	Urdu	Arab	Sp, Tx	Sp, Tx
uzn	Northern Uzbek	Latn	Sp, Tx	Sp, Tx
vie	Vietnamese	Latn	Sp, Tx	Sp, Tx
xho	Xhosa	Latn	Sp	--
yor	Yoruba	Latn	Sp, Tx	Tx
yue	Cantonese	Hant	Sp, Tx	Tx
zlm	Colloquial Malay	Latn	Sp	--
zsm	Standard Malay	Latn	Tx	Tx
zul	Zulu	Latn	Sp, Tx	Tx
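
The table can be used to reject unsupported language pairs before a request is sent. The sketch below copies a few rows from it and assumes that `Sp` means speech and `Tx` means text, which the document does not state explicitly. The names `SUPPORTED` and `can_translate` are hypothetical.

```python
# Excerpt of the table above: code -> (source modalities, target modalities).
SUPPORTED = {
    "eng": ({"Sp", "Tx"}, {"Sp", "Tx"}),
    "vie": ({"Sp", "Tx"}, {"Sp", "Tx"}),
    "ell": ({"Sp", "Tx"}, {"Tx"}),
    "ast": ({"Sp"}, set()),
    "zsm": ({"Tx"}, {"Tx"}),
}


def can_translate(source, target, input_kind, output_kind):
    """Check a language pair against the table; kinds are "Sp" or "Tx"."""
    if source not in SUPPORTED or target not in SUPPORTED:
        return False
    source_kinds, _ = SUPPORTED[source]
    _, target_kinds = SUPPORTED[target]
    return input_kind in source_kinds and output_kind in target_kinds
```

Under this reading, Greek (`ell`) is accepted as a speech source but not as a speech target.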

Speech Translation API

Endpoint:

POST /speech/translate

Description:

This endpoint translates an uploaded audio file from a source language to a target language. It supports speech-to-text translation tasks.

Request Headers:

Header	Type	Required	Description
`APIKEY`	String	Yes	API key for authorization

Request Body (Form-Data):

Field	Type	Required	Description
`audio`	File	Yes	The audio file to be translated.
`source`	String	Yes	The language of the audio (e.g., `eng` for English).
`target`	String	Yes	The language to translate the audio to.
`task`	String	No	Translation task type. Default is `S2TT` (Speech-to-Text-to-Translation).

Example Request (Form-Data):

Key	Value
`audio`	(upload audio file)
`source`	`eng`
`target`	`fra`
`task`	`S2TT`

Notes:
- Language codes follow the Supported Languages table above.
- `task`: use `S2TT` to receive translated text only, or `S2ST` to receive translated audio together with text.
- `target`: accepts multiple comma-separated output languages, for example `vie,eng,spa`.
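
The form-data request above could be built with the standard library as sketched below. The function name `build_speech_request`, the default file name, and the `application/octet-stream` part type are assumptions made for illustration; only the field names, the `APIKEY` header, and the endpoint come from this document.

```python
import urllib.request
import uuid


def build_speech_request(api_key, audio_bytes, source, targets, task="S2TT",
                         filename="audio.wav"):
    """Build (but do not send) a multipart POST /speech/translate request."""
    boundary = uuid.uuid4().hex
    # Multiple target languages are joined with commas, e.g. "vie,eng,spa".
    fields = {"source": source, "target": ",".join(targets), "task": task}
    body = b""
    for name, value in fields.items():
        body += (f'--{boundary}\r\n'
                 f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                 f'{value}\r\n').encode("utf-8")
    # File part; the part Content-Type is an assumption.
    body += (f'--{boundary}\r\n'
             f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
             'Content-Type: application/octet-stream\r\n\r\n').encode("utf-8")
    body += audio_bytes + b"\r\n" + f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        "https://agents.monkeyenglish.net/speech/translate",
        data=body,
        headers={"APIKEY": api_key,
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

A third-party HTTP client would shorten this considerably; the manual version is shown only to make the wire format explicit.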

Response:

Field	Type	Description
`status`	String	Status of the translation request.
`output`	String	The translated text or processed output.
`error`	String	Error message if applicable.

Successful Response (200 OK):

{
  "status": "success",
  "output": "Bonjour",
  "error": ""
}

Error Response (500 Internal Server Error):

{
  "status": "failure",
  "output": "",
  "error": "System encountered an unexpected error. <error message>"
}

Error Handling:

401 Unauthorized: Invalid API key.
500 Internal Server Error: System encountered an unexpected error.
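
A client can map these cases to exceptions so that callers do not have to inspect status codes themselves. The sketch below is one possible arrangement; the function name `interpret_speech_response` and the choice of exception types are assumptions.

```python
def interpret_speech_response(http_status, body):
    """Return the translated output, or raise for the documented error cases.

    `body` is the decoded JSON response as a dict.
    """
    if http_status == 401:
        raise PermissionError("invalid API key")
    if http_status != 200 or body.get("status") != "success":
        # Prefer the server's own error text when it is present.
        raise RuntimeError(body.get("error") or f"request failed with HTTP {http_status}")
    return body["output"]
```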

Audio Streaming Client for Speech-to-Text and Translation (S2TT)

1. Overview

This document provides an overview of how to implement a client for streaming audio data to a WebSocket server that processes the data for speech-to-text-to-translation (S2TT) tasks. The system is designed to handle real-time audio streaming from clients, which can be built using various programming languages.

Key Components:

WebSocket Server: The server receives audio data from the client, processes it, and returns results (e.g., transcriptions, translations).
Client: Any client application (mobile, desktop, web) can stream audio to the server over WebSocket.
Streaming Protocol: Audio data is chunked and transmitted in real time, with metadata indicating task details such as the source language, target language, and processing task.

2. Communication Flow

2.1 Initial Connection

Client Connects to Server: The client establishes a WebSocket connection with the server at a predefined URI.
- URI pattern: ws://<server-address>:<port>/ws/translate/<session_id>
- Example: wss://agents.monkeyenglish.net/ws/translate/123
- session_id: a random string chosen by the client (123 in the example).
Task Metadata: The client sends an initial message to define the task. This message includes:
- Source Language: The language of the audio input (e.g., eng for English).
- Target Language: The language for translation (e.g., vie for Vietnamese).
- Task Type: The processing task (e.g., S2TT for Speech-to-Text-to-Translation).
Message Format (JSON):
```
{
    "type": "start",
    "data": {
        "source": "eng",
        "target": "vie",
        "task": "S2TT"
    }
}
```
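
The connection URI and the JSON control message can be produced by two small helpers, sketched below. The names `session_uri` and `control_message` are hypothetical, and generating the session ID with `uuid4` is just one way to obtain a random string.

```python
import json
import uuid


def session_uri(host, session_id=None, port=None, secure=True):
    """Build ws(s)://<host>[:<port>]/ws/translate/<session_id>."""
    scheme = "wss" if secure else "ws"
    netloc = host if port is None else f"{host}:{port}"
    # A random hex string is used when no session ID is supplied.
    return f"{scheme}://{netloc}/ws/translate/{session_id or uuid.uuid4().hex}"


def control_message(msg_type, source, target, task="S2TT"):
    """Serialize a control message such as the "start" message above."""
    return json.dumps({
        "type": msg_type,
        "data": {"source": source, "target": target, "task": task},
    })
```

For example, `control_message("start", "eng", "vie")` yields the message shown above as a JSON string.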

2.2 Streaming Audio Data

Audio Streaming: The client reads and sends audio data in chunks to the server. Each chunk is a segment of the full audio file, mimicking real-time audio streaming.
- The audio data is converted into a byte stream for transmission.
Transmission Format:
- Audio chunks are transmitted in binary format (e.g., byte array).
- Each chunk is sent over the WebSocket connection, followed by a short delay to simulate real-time audio capture.
Streaming Example:
- For every audio chunk, the client sends the binary data over the established WebSocket connection.
- The client continues sending chunks until the entire audio file has been transmitted.
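
The chunk-and-send loop described above can be sketched as follows. The `send` argument stands for any coroutine function that transmits bytes (for instance the send method of a WebSocket connection object); the names `iter_chunks` and `stream_audio` and the default delay are assumptions of this sketch.

```python
import asyncio


def iter_chunks(audio_bytes, chunk_size):
    """Yield consecutive slices of at most chunk_size bytes."""
    for start in range(0, len(audio_bytes), chunk_size):
        yield audio_bytes[start:start + chunk_size]


async def stream_audio(send, audio_bytes, chunk_size, delay=0.005):
    """Send every chunk through `send`, pausing briefly after each one.

    Returns the number of chunks sent.
    """
    count = 0
    for chunk in iter_chunks(audio_bytes, chunk_size):
        await send(chunk)           # one binary message per chunk
        await asyncio.sleep(delay)  # short pause to mimic live capture
        count += 1
    return count
```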

2.3 Task Completion

End of Transmission: After the client finishes sending all audio chunks, it sends a final message to the server indicating that the streaming is complete and the task can be processed.

Message Format (JSON):
```
{
    "type": "do_task",
    "data": {
        "source": "eng",
        "target": "vie",
        "task": "S2TT"
    }
}
```
Processing Response: The server processes the received audio, performing the requested task (e.g., transcription and translation). Once complete, the server responds with the result, which may include:
- Transcribed text.
- Translated text.
Response Format: The server sends a JSON message back to the client containing the task's result:

```
{
    "message": "",
    "data": {
        "vie": {
            "text": "thế là sáng hôm sau cái tin tôi về đến cổng còn phải thăm đường đã lan ra khóc sóng",
            "audio": "https://vnmedia2.monkeyuni.net/App/uploads/productivity/8tihbdQvbQPHkcqmDntW.wav"
        },
        "eng": {
            "text": "So the next morning, when I got back to the cage, I had to walk down the street to cry.",
            "audio": "https://vnmedia2.monkeyuni.net/App/uploads/productivity/5Um6tQ1nzT3BfEqOYzx4.wav"
        }
    },
    "status": "success"
}
```
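
Putting sections 2.1 to 2.3 together, one session could be driven as in the sketch below. It is written against any object `ws` that offers asynchronous `send()` and `recv()` methods (connections from the Python `websockets` package have this shape), so no particular library is required. The function name `run_session` and the error handling are assumptions, and the sketch expects exactly one JSON reply after `do_task`.

```python
import asyncio
import json


async def run_session(ws, chunks, source="eng", target="vie", task="S2TT"):
    """Send start, audio chunks, and do_task; return {language_code: text}."""
    meta = {"source": source, "target": target, "task": task}
    await ws.send(json.dumps({"type": "start", "data": meta}))
    for chunk in chunks:
        await ws.send(chunk)  # bytes are sent as binary messages
    await ws.send(json.dumps({"type": "do_task", "data": meta}))
    reply = json.loads(await ws.recv())
    # Assumption: a status other than "success" means the task failed.
    if reply.get("status") != "success":
        raise RuntimeError(reply.get("message") or "task failed")
    return {code: entry["text"] for code, entry in reply["data"].items()}
```

Keeping the transport abstract also makes the message ordering easy to check with a stand-in object before connecting to a real server.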

3. Client Implementation Guidelines

3.1 Supported Languages

The client can be developed in any language that supports WebSocket communication, such as:

JavaScript: Web-based applications.
Python: Server-side or command-line tools.
Java/Kotlin: Android applications.
Swift: iOS applications.
C#: Desktop or .NET applications.

3.2 WebSocket Library

Ensure that the client uses a WebSocket library suitable for your chosen programming language. Common libraries include:

JavaScript: Native WebSocket API. Note that socket.io uses its own protocol on top of WebSocket and only works against a Socket.IO server.
Python: websockets or websocket-client.
Java/Kotlin: OkHttp WebSocket implementation.
Swift: Starscream library for WebSocket communication.

3.3 Audio File Handling

The client needs to handle reading audio files or capturing audio in real time. The format of the audio must be compatible with the server's requirements (e.g., 16 kHz, mono, .wav).
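
A client can verify a WAV file against such requirements before streaming it. The sketch below uses Python's standard `wave` module. The function name `check_wav` is hypothetical, and the 16 kHz mono defaults simply mirror the example values above, which the server may or may not enforce.

```python
import io
import wave


def check_wav(data, expected_rate=16000, expected_channels=1):
    """Return a list of format problems found in WAV bytes (empty if none)."""
    problems = []
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getnchannels() != expected_channels:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems
```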

3.4 Chunking and Streaming

The client should send audio data in small chunks. For real-time applications:

Chunk Size: Each chunk should be small enough to allow near real-time transmission, typically between 1-3 seconds of audio data per chunk.
Delay: Introduce a small delay (e.g., 1-10 milliseconds) between sending each chunk to simulate real-time streaming.
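
To turn a chunk duration into a byte count, multiply duration, sample rate, channel count, and bytes per sample. The sketch below assumes 16-bit (2-byte) samples, which the document does not specify; the 16 kHz mono defaults come from section 3.3. The function name `chunk_bytes` is hypothetical.

```python
def chunk_bytes(seconds, sample_rate=16000, channels=1, sample_width=2):
    """Number of raw PCM bytes in `seconds` of audio.

    sample_width is in bytes per sample; 2 (16-bit) is an assumption.
    """
    return int(seconds * sample_rate * channels * sample_width)
```

With these defaults, one second of audio is 32000 bytes and a three-second chunk is 96000 bytes.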

3.5 Error Handling

The client must handle potential errors during the WebSocket communication:

Connection Issues: Reconnect if the WebSocket connection is dropped.
Server Responses: Handle unexpected responses or errors from the server gracefully.
Timeouts: Implement timeouts to prevent hanging connections if no response is received from the server.
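
Timeouts and reconnection can be combined in one small wrapper, sketched below with `asyncio`. The name `with_retries`, the retry count, and the exponential back-off are assumptions of this sketch. A real client would typically add its WebSocket library's own connection-closed exception to the caught tuple.

```python
import asyncio


async def with_retries(operation, attempts=3, timeout=10.0, base_delay=0.5):
    """Run `operation()` under a timeout, retrying on connection errors.

    `operation` is a coroutine function that performs one full attempt,
    for example connecting and running a session. `attempts` must be >= 1.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(operation(), timeout)
        except (OSError, asyncio.TimeoutError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Back off: base_delay, 2 * base_delay, 4 * base_delay, ...
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise last_error
```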

4. Server Configuration

4.1 Server URI

Clients must connect to the WebSocket server at the following URI:

ws://<server-address>:<port>/ws/translate/<session_id>

<server-address>: IP address or domain of the WebSocket server.
<port>: Port on which the server is running (e.g., 5001).
<session_id>: A unique identifier for the client session, generated for each streaming session.

4.2 Audio Processing

The server is responsible for:

Receiving and buffering audio chunks.
Processing the audio (speech recognition, translation).
Sending results back to the client in the expected format.

5. Example Use Cases

5.1 Mobile Voice Translation App

A mobile app developed in Java or Swift captures the user's voice, streams the audio to the server using WebSocket, and receives the translated text, which is displayed to the user in real time.

5.2 Web-Based Audio Translator

A JavaScript web application allows users to upload audio files. The app streams the audio to the server, processes it, and shows the translation results to the user.

5.3 Desktop Speech-to-Text Tool

A Python desktop application records audio from the microphone, streams it to the server, and displays real-time transcription and translation.

6. Conclusion

This document provides an overview of the WebSocket-based client-server system for real-time audio streaming and processing. The client can be implemented in any language with WebSocket support, allowing flexible integration across various platforms and applications.