Start a Real-time STT agent
https://api.agora.io/api/speech-to-text/v1/projects/{appid}/join
Use this method to start subtitle recording and subtitle translation.
Request
Path parameters
- appid
The App ID of the project.
Request body
BODY required
- languages array[string] required
The transcription languages to recognize. You can specify a maximum of 4 languages. Refer to Supported Languages for details.
- uidLanguagesConfig array nullable
Configure the transcription language for the specified user ID. Supports up to 5 configuration items.
- uid string required
The ID of the user to be transcribed.
- languages array[string] required
The transcription languages to recognize. You can specify up to 4 languages. Refer to Supported Languages for details.
- maxIdleTime integer nullable
Default: 30
Possible values: 5 to 2592000
Maximum channel idle time, in seconds. When the specified time is exceeded, the transcription task ends automatically. Idle time means that there is no host in a live broadcast channel, or there is no user in a communication channel.
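The limits on the top-level transcription fields can be checked before sending a request. The following is a minimal sketch, not part of the API: the function name is hypothetical, and treating an empty `languages` array as missing is an assumption.

```python
def check_transcription_settings(languages, uid_languages_config=None, max_idle_time=30):
    """Return a list of problems; an empty list means the documented limits are met."""
    problems = []
    # languages is required; treating an empty list as missing is an assumption.
    if not languages:
        problems.append("languages is required")
    elif len(languages) > 4:
        problems.append("languages allows at most 4 entries")

    configs = uid_languages_config or []
    if len(configs) > 5:
        problems.append("uidLanguagesConfig allows at most 5 items")
    for item in configs:
        if len(item.get("languages", [])) > 4:
            problems.append(f"uid {item.get('uid')}: at most 4 languages")

    # maxIdleTime is documented as 5 to 2592000 seconds, default 30.
    if not 5 <= max_idle_time <= 2592000:
        problems.append("maxIdleTime must be between 5 and 2592000 seconds")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.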
- rtcConfig object required
Real-time subtitle configuration. After a user's voice is converted to text, the text is pushed to the channel as subtitles for real-time display in the UI.
- channelName string required
The name of the channel to transcribe.
- subBotUid string required
The ID of the bot that subscribes to the audio stream.
- subBotToken string nullable
The token used by the subscribing bot for channel authentication. Required only when your project has App Certificate enabled. Generate this token on your token server. For details, see Token authentication.
- pubBotUid string required
The ID of the bot that pushes subtitle information to the channel.
- pubBotToken string nullable
The token used by the subtitle-pushing bot for channel authentication. Required only when your project has App Certificate enabled. Generate this token on your token server. For details, see Token authentication.
- subscribeAudioUids array[string] nullable
The user IDs of the audio streams you want to subscribe to. Set this parameter to subscribe only to the audio streams of specific users. Maximum array length: 32. You can set either subscribeAudioUids or unSubscribeAudioUids, but not both.
- unSubscribeAudioUids array[string] nullable
The user IDs of the audio streams you do not want to subscribe to. Set this parameter to exclude the audio streams of specific users. Maximum array length: 5. You can set either subscribeAudioUids or unSubscribeAudioUids, but not both.
- cryptionMode integer nullable
Possible values: 0 to 8
The encryption and decryption mode. When enabled, this mode is used for both decrypting incoming streams and encrypting outgoing subtitles.
0: No encryption
1: AES_128_XTS, 128-bit AES encryption, XTS mode
2: AES_128_ECB, 128-bit AES encryption, ECB mode
3: AES_256_XTS, 256-bit AES encryption, XTS mode
4: SM4_128_ECB, 128-bit SM4 encryption, ECB mode
5: AES_128_GCM, 128-bit AES encryption, GCM mode
6: AES_256_GCM, 256-bit AES encryption, GCM mode
7: AES_128_GCM2, 128-bit AES encryption, GCM mode. Compared with AES_128_GCM, this mode is more secure and requires setting a key and salt.
8: AES_256_GCM2, 256-bit AES encryption, GCM mode. Compared with AES_256_GCM, this mode is more secure and requires setting a key and salt.
The decryption method must match the encryption method set for the channel.
- secret string nullable
The encryption/decryption key. Required when cryptionMode is not 0.
- salt string nullable
A Base64-encoded, 32-byte encryption/decryption salt. Required only when cryptionMode is 7 or 8.
- enableJsonProtocol boolean nullable
Default: false
Set the encoding format of the subtitle data pushed to the channel.
true: Use JSON to push subtitles and compress data with gzip. Uses less bandwidth, but requires decoding.
false: Use Protobuf to push subtitles (default). The data volume is smaller. Suitable for scenarios with high transmission efficiency requirements.
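Several rtcConfig fields depend on one another: the two subscription lists are alternatives, secret is needed for any cryptionMode other than 0, and salt is needed for modes 7 and 8. The sketch below checks those rules on a plain dictionary. The function name is hypothetical, and treating an absent cryptionMode as 0 is an assumption, since no default is documented here.

```python
import base64
import binascii

SALT_MODES = {7, 8}  # AES_128_GCM2 and AES_256_GCM2


def check_rtc_config(rtc):
    """Return a list of problems found in an rtcConfig dictionary."""
    problems = []
    for key in ("channelName", "subBotUid", "pubBotUid"):
        if not rtc.get(key):
            problems.append(f"{key} is required")

    sub = rtc.get("subscribeAudioUids")
    unsub = rtc.get("unSubscribeAudioUids")
    if sub and unsub:
        problems.append("set either subscribeAudioUids or unSubscribeAudioUids, not both")
    if sub and len(sub) > 32:
        problems.append("subscribeAudioUids allows at most 32 entries")
    if unsub and len(unsub) > 5:
        problems.append("unSubscribeAudioUids allows at most 5 entries")

    # Assumption: a missing cryptionMode is treated as 0 (no encryption).
    mode = rtc.get("cryptionMode") or 0
    if mode not in range(0, 9):
        problems.append("cryptionMode must be between 0 and 8")
    if mode != 0 and not rtc.get("secret"):
        problems.append("secret is required when cryptionMode is not 0")
    if mode in SALT_MODES:
        try:
            raw = base64.b64decode(rtc.get("salt") or "", validate=True)
        except (binascii.Error, ValueError):
            raw = b""
        if len(raw) != 32:
            problems.append("salt must be Base64 that decodes to 32 bytes")
    return problems
```

A valid salt for modes 7 and 8 could be produced with, for example, `base64.b64encode(os.urandom(32)).decode()`.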
- translateConfig object nullable
Subtitle translation configuration.
- languages array nullable
The translation language array. You can specify a maximum of 4 different source languages. The source language and target language must be different; otherwise, an error is reported. Each array item is an object with:
- source string required
The source language for translation. Refer to Supported Languages for details.
- target array[string] required
The target languages for translation. You can specify a maximum of 5 target languages for each source language. Refer to Supported Languages for details.
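The translateConfig rules (at most 4 different source languages, at most 5 targets per source, and no target equal to its source) can be sketched as a small pre-flight check. The function name is hypothetical, and the language codes in the usage lines are placeholders; consult Supported Languages for real values.

```python
def check_translate_config(translate_config):
    """Return a list of problems found in a translateConfig dictionary."""
    problems = []
    pairs = translate_config.get("languages") or []

    # Count distinct source languages; the documented limit is 4.
    sources = {pair["source"] for pair in pairs}
    if len(sources) > 4:
        problems.append("at most 4 different source languages are allowed")

    for pair in pairs:
        targets = pair["target"]
        if len(targets) > 5:
            problems.append(f"{pair['source']}: at most 5 target languages")
        if pair["source"] in targets:
            problems.append(f"{pair['source']}: source and target must differ")
    return problems


# Placeholder language codes, for illustration only.
example = {"languages": [{"source": "en-US", "target": ["es-ES", "fr-FR"]}]}
print(check_translate_config(example))
```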
- captionConfig object nullable
Subtitle recording configuration.
- sliceDuration integer nullable
Default: 60
Possible values: 5 to 28800
The slice size of the recorded subtitle file, in seconds.
- storage object nullable
- accesskey string required
The access key of the third-party cloud storage.
- secretkey string required
The secret key of the third-party cloud storage.
- bucket string required
The bucket name of the third-party cloud storage.
- vendor integer required
Possible values: 1, 5, 6
The third-party cloud storage platform:
1: Amazon S3
5: Microsoft Azure
6: Google Cloud
- region integer required
The region information for the third-party cloud storage. To ensure successful and real-time uploading of recorded files, the cloud storage region must match the region of the application server where you initiate the request. For example, if your App server is in East US, set the cloud storage region to East US as well. See third-party storage regions for details.
- fileNamePrefix array[string] nullable
The storage location of the recorded file in the third-party cloud storage. The prefix length (including slashes) must not exceed 128 characters. The following characters are supported:
- Lowercase English letters (a-z)
- Uppercase English letters (A-Z)
- Numbers (0-9)
Symbols like slashes, underscores, and brackets must not appear in the string.
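The fileNamePrefix constraints can be checked with a regular expression and a length test. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes the array elements are joined with slashes (plus one trailing slash) to form the stored path, which this section implies but does not spell out.

```python
import re

# Only a-z, A-Z, and 0-9 are documented as supported characters.
_SEGMENT = re.compile(r"[A-Za-z0-9]+")


def check_file_name_prefix(segments):
    """Return a list of problems found in a fileNamePrefix array."""
    problems = []
    for segment in segments:
        if not _SEGMENT.fullmatch(segment):
            problems.append(f"{segment!r} contains unsupported characters")

    # Assumption: segments are joined with "/" and followed by a trailing "/".
    joined = "/".join(segments) + "/"
    if len(joined) > 128:
        problems.append("prefix length (including slashes) exceeds 128 characters")
    return problems
```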
- name string required
Unique ID of the agent. Maximum length is 64 characters. You cannot use the same ID repeatedly.
Response
- If the returned status code is 200 (OK), the request was successful. The response body contains the result of the request.
- agent_id string
The ID of the agent.
- create_ts integer
The Unix timestamp (in seconds) when the agent was created.
- status string
The current status of the agent:
IDLE: The agent is not initialized
STARTING: The agent is starting
RUNNING: The agent is running
STOPPING: The agent is exiting
STOPPED: The agent exited successfully
RECOVERING: The agent is recovering
FAILED: The agent failed to exit
- If the returned status code is not 200, the request failed. Refer to the detail and reason fields to understand the possible reasons for failure.
- detail string
Details of the request failure.
- reason string
The reason why the request failed.
Authorization
This endpoint requires Basic Auth.
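HTTP Basic authentication sends a Base64-encoded `username:password` pair in the Authorization header. The sketch below builds that header with the standard library. Which credentials serve as the username and password is not stated in this section, so the argument names are generic placeholders.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header from a credential pair."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```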
Request example
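The following is a minimal Python sketch of a request with only the required body fields, using the standard library. Several details are assumptions rather than documented facts: the POST method and the JSON content type are not stated in this section, the helper name is hypothetical, and every value (App ID, credentials, channel name, bot UIDs, agent name, language code) is a placeholder.

```python
import base64
import json
import urllib.request


def build_join_request(app_id, username, password, body):
    """Prepare (but do not send) a request to start an STT agent."""
    url = f"https://api.agora.io/api/speech-to-text/v1/projects/{app_id}/join"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",  # assumption: JSON request body
        },
        method="POST",  # assumption: the HTTP method is not stated in this section
    )


# Placeholder values; replace them with your own project settings.
body = {
    "name": "stt-agent-demo-001",
    "languages": ["en-US"],
    "maxIdleTime": 30,
    "rtcConfig": {
        "channelName": "demo-channel",
        "subBotUid": "1001",
        "pubBotUid": "1002",
    },
}
request = build_join_request("YOUR_APP_ID", "YOUR_USERNAME", "YOUR_PASSWORD", body)

# To send the request:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx status codes, so the detail and reason fields of a failed request would be read from the exception's body.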
Response example
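The sketch below shows how the documented response fields might be read for both outcomes. The two sample bodies are illustrative placeholders built from the field names above, not captured API output, and the function name is hypothetical.

```python
import json

# Status descriptions, taken from the status field documentation.
STATUS_MEANINGS = {
    "IDLE": "The agent is not initialized",
    "STARTING": "The agent is starting",
    "RUNNING": "The agent is running",
    "STOPPING": "The agent is exiting",
    "STOPPED": "The agent exited successfully",
    "RECOVERING": "The agent is recovering",
    "FAILED": "The agent failed to exit",
}


def summarize_response(status_code, body_text):
    """Turn a response status code and JSON body into a one-line summary."""
    payload = json.loads(body_text)
    if status_code == 200:
        meaning = STATUS_MEANINGS.get(payload["status"], "unknown status")
        return f"agent {payload['agent_id']} created at {payload['create_ts']}: {meaning}"
    return f"request failed ({status_code}): {payload.get('reason')} - {payload.get('detail')}"


# Placeholder bodies for illustration only.
sample_ok = '{"agent_id": "EXAMPLE_AGENT_ID", "create_ts": 1700000000, "status": "RUNNING"}'
sample_error = '{"detail": "example detail text", "reason": "example reason text"}'

print(summarize_response(200, sample_ok))
print(summarize_response(400, sample_error))
```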