Start a conversational AI agent

Use this endpoint to create and start a Conversational AI agent instance.

API endpoint

  • Method: POST
  • Endpoint: https://api.agora.io/api/conversational-ai-agent/v2/projects/{appid}/join
  • Authorization: Basic Auth

Path parameters

  • appid (string, required): The App ID of the project.
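As a minimal sketch, the endpoint URL and the Basic Auth header could be assembled as shown below. The helper names `join_url` and `basic_auth_header` are hypothetical, and the username/password pair stands in for whatever REST credentials your account provides; only the URL pattern and the use of Basic Auth come from this page.

```python
import base64
from urllib.parse import quote

# URL prefix taken from the endpoint listed above.
BASE_URL = "https://api.agora.io/api/conversational-ai-agent/v2/projects"


def join_url(app_id: str) -> str:
    """Return the join endpoint for a project, percent-encoding the App ID."""
    if not app_id:
        raise ValueError("app_id must be a non-empty string")
    return f"{BASE_URL}/{quote(app_id, safe='')}/join"


def basic_auth_header(username: str, password: str) -> dict:
    """Build a standard HTTP Basic Authorization header (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The value after `Basic ` is what the sample request refers to as `<your_base64_encoded_credentials>`.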

Request body parameters

The request body must be a JSON object containing the following parameters:

  • name (string, required): The unique identifier of the agent. The same identifier cannot be reused.
  • properties (object, required): Configuration details of the agent.
    • channel (string, required): The name of the channel to join.
    • token (string, required): The authentication token used by the agent to join the channel.
    • agent_rtc_uid (string, required): The user ID of the agent in the channel. A value of 0 means that a random uid is generated and assigned. Set the token accordingly.
    • remote_rtc_uids (array[string], required): The list of user IDs that the agent subscribes to in the channel. Only subscribed users can interact with the agent. "*" means that the agent subscribes to all users in the channel.
    • enable_string_uid (boolean, optional): Whether to enable string uids:
      • true: Both agent and subscriber user IDs use strings.
      • false: (Default) Both agent and subscriber user IDs must be integers.
    • idle_timeout (integer, optional): Sets the timeout after all the users specified in remote_rtc_uids are detected to have left the channel. When the timeout value is exceeded, the agent automatically stops and exits the channel. A value of 0 means that the agent does not exit until it is stopped manually. Default value: 30.
    • advanced_features (object, optional): Advanced features configuration.
      • enable_aivad (boolean, optional): Whether to enable the intelligent interruption handling function (AIVAD). Default value: false. Note: This feature is currently available only for English.
    • asr (object, optional): Automatic Speech Recognition (ASR) configuration.
      • language (string, optional): The language used by users to interact with the agent. The following languages are supported:
        • en-US: English - US (Default)
        In Beta:
        • es-ES: Spanish - Spain
        • ja-JP: Japanese
        • ko-KR: Korean
        • ar-AE: Arabic - UAE
        • hi-IN: Hindi - India
    • tts (object, required): Text-to-speech (TTS) module configuration.
      • vendor (string, required): TTS provider. Supports the following values:
        • microsoft: Microsoft Azure
        • elevenlabs: ElevenLabs
      • params (object, required): The configuration parameters for the TTS vendor. See TTS vendor configuration for details.
    • llm (object, required): Large language model (LLM) configuration.
      • url (string, required): The LLM callback address.
      • api_key (string, optional): The LLM verification API key. The default value is an empty string. Ensure that you enable the API key in a production environment.
      • system_messages (array[object], optional): A set of predefined information used as input to the LLM, including prompt words and examples.
      • params (object, optional): Additional LLM information transmitted in the message body, such as the model used and the maximum token limit.
      • max_history (integer, optional): The number of short-term memory entries cached in the custom LLM. 0 means no short-term memory is cached. Users and agents log entries separately. Default value: 10.
      • input_modalities (array[string], optional): LLM input modalities. Supports ["text"] and ["text", "image"]. Default: ["text"].
      • output_modalities (array[string], optional): LLM output modalities. Supports ["audio"], ["text"], and ["text", "audio"]. Default: ["text"].
      • greeting_message (string, optional): Agent greeting. If provided, the first user in the channel is automatically greeted with the message upon joining.
      • failure_message (string, optional): Prompt for agent activation failure. If provided, it is returned through TTS when the custom LLM call fails.
      • style (string, optional): The request style for chat completion. Supports:
        • openai (Default, including OpenAI compatible APIs)
        • gemini
    • vad (object, optional): Voice Activity Detection (VAD) configuration.
      • interrupt_duration_ms (number, optional): The amount of time in milliseconds that the user's voice must exceed the VAD threshold before an interruption is triggered. Default value: 160.
      • prefix_padding_ms (integer, optional): The extra forward padding time in milliseconds before the processing system starts to process the speech input. This padding helps capture the beginning of the speech. Default value: 300.
      • silence_duration_ms (integer, optional): The duration of audio silence in milliseconds. If no voice activity is detected during this period, the agent assumes that the user has stopped speaking. Default value: 640.
      • threshold (number, optional): Identification sensitivity, which determines the level of sound in the audio signal that is considered voice activity. The value range is (0.0, 1.0). Lower values make it easier for the agent to detect speech, and higher values ignore weak sounds. Default value: 0.5.
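The following sketch shows one way the required fields could be assembled into a request body before sending. The function name `build_join_body` and its argument layout are hypothetical. It only checks constraints stated in the parameter list: the required `tts` and `llm` fields and the VAD threshold range, which is read here as an open interval.

```python
from typing import Any, Dict, List, Optional


def build_join_body(
    name: str,
    channel: str,
    token: str,
    agent_rtc_uid: str,
    remote_rtc_uids: List[str],
    tts: Dict[str, Any],
    llm: Dict[str, Any],
    vad_threshold: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble a join request body from the required fields and an optional VAD threshold."""
    # tts.vendor, tts.params, and llm.url are marked as required in the parameter list.
    for field in ("vendor", "params"):
        if field not in tts:
            raise ValueError(f"tts.{field} is required")
    if "url" not in llm:
        raise ValueError("llm.url is required")

    properties: Dict[str, Any] = {
        "channel": channel,
        "token": token,
        "agent_rtc_uid": agent_rtc_uid,
        "remote_rtc_uids": list(remote_rtc_uids),
        "tts": tts,
        "llm": llm,
    }
    if vad_threshold is not None:
        # Assumption: the documented range (0.0, 1.0) excludes both endpoints.
        if not 0.0 < vad_threshold < 1.0:
            raise ValueError("vad.threshold must be between 0.0 and 1.0 (exclusive)")
        properties["vad"] = {"threshold": vad_threshold}
    return {"name": name, "properties": properties}
```

Optional fields that are left out are simply omitted from the body, so the service-side defaults listed above would apply.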

Request examples

Use the following request example as a starting point:

Sample request:

curl --request POST \
  --url https://api.agora.io/api/conversational-ai-agent/v2/projects/:appid/join \
  --header 'Authorization: Basic <your_base64_encoded_credentials>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "name": "unique_name",
  "properties": {
    "channel": "channel_name",
    "token": "token",
    "agent_rtc_uid": "friday",
    "remote_rtc_uids": [
      "*"
    ],
    "enable_string_uid": true,
    "idle_timeout": 120,
    "advanced_features": {
      "enable_aivad": true
    },
    "llm": {
      "url": "https://api.openai.com/v1/chat/completions",
      "api_key": "sk-xxx",
      "system_messages": [
        {
          "role": "system",
          "content": "You are a helpful chatbot."
        }
      ],
      "max_history": 10,
      "greeting_message": "Hello, how can I assist you today?",
      "failure_message": "Please hold on a second.",
      "params": {
        "model": "gpt-4o-mini"
      }
    },
    "tts": {
      "vendor": "microsoft",
      "params": {
        "key": "xxxx",
        "region": "eastus",
        "voice_name": "en-US-AndrewMultilingualNeural"
      }
    },
    "asr": {
      "language": "en-US"
    },
    "vad": {
      "silence_duration_ms": 480
    }
  }
}'
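For callers who prefer Python to curl, a roughly equivalent request could be prepared with the standard library as sketched below. The function name `prepare_join_request` is hypothetical, and sending a `Content-Type: application/json` header is an assumption based on the body being JSON. The function only builds the request object, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request


def prepare_join_request(url: str, authorization: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            # Same header value as in the curl sample: "Basic <credentials>".
            "Authorization": authorization,
            # Assumption: the service expects a JSON content type.
            "Content-Type": "application/json",
        },
    )
```

To send it, something like `with urllib.request.urlopen(request) as response: raw = response.read().decode("utf-8")` would work; note that `urlopen` raises `urllib.error.HTTPError` for non-2xx status codes.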

Response

  • If the returned status code is 200, the request was successful. The response body contains the result of the request.

  • If the returned status code is not 200, the request failed. The response body includes the detail and reason for failure. Refer to status codes to understand the possible reasons for failure.
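The two cases above could be handled as in the following sketch. The names `JoinRequestFailed` and `handle_join_response` are hypothetical, and the JSON keys `detail` and `reason` are assumed from the wording above rather than confirmed field names, so adjust them to the actual error payload.

```python
import json
from typing import Any, Dict


class JoinRequestFailed(Exception):
    """Raised when the join call returns a status code other than 200."""

    def __init__(self, status_code: int, detail: Any, reason: Any) -> None:
        super().__init__(f"join failed with HTTP {status_code}: {reason}")
        self.status_code = status_code
        self.detail = detail
        self.reason = reason


def handle_join_response(status_code: int, raw_body: str) -> Dict[str, Any]:
    """Return the parsed body on 200; otherwise raise with the failure details."""
    try:
        payload = json.loads(raw_body) if raw_body else {}
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    if status_code == 200:
        return payload
    # Assumed key names; missing keys simply yield None.
    raise JoinRequestFailed(status_code, payload.get("detail"), payload.get("reason"))
```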

Response body

  • agent_id (string): Unique ID of the agent instance.
  • create_ts (integer): Timestamp of when the agent was created.
  • status (string): Current status of the agent:
    • IDLE (0): The agent is idle.
    • STARTING (1): The agent is being started.
    • RUNNING (2): The agent is running.
    • STOPPING (3): The agent is stopping.
    • STOPPED (4): The agent has exited.
    • RECOVERING (5): The agent is recovering.
    • FAILED (6): The agent failed to execute.

Sample response


{
  "agent_id": "1NT29X10YHxxxxxWJOXLYHNYB",
  "create_ts": 1737111452,
  "status": "RUNNING"
}
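A response like the sample could be parsed into typed values as sketched below. The names `AgentStatus`, `JoinResult`, and `parse_join_result` are hypothetical; the enum members and their numeric codes mirror the status list in the response body description.

```python
import json
from dataclasses import dataclass
from enum import Enum


class AgentStatus(Enum):
    """Status names and numeric codes as listed in the response body description."""
    IDLE = 0
    STARTING = 1
    RUNNING = 2
    STOPPING = 3
    STOPPED = 4
    RECOVERING = 5
    FAILED = 6


@dataclass(frozen=True)
class JoinResult:
    agent_id: str
    create_ts: int
    status: AgentStatus


def parse_join_result(raw_body: str) -> JoinResult:
    """Parse a successful response body; raises KeyError on missing or unknown fields."""
    data = json.loads(raw_body)
    return JoinResult(
        agent_id=data["agent_id"],
        create_ts=int(data["create_ts"]),
        # The response carries the status name, so look the member up by name.
        status=AgentStatus[data["status"]],
    )
```

Keeping `agent_id` from the result is useful because it identifies the agent instance in later calls.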

TTS vendor configuration

Conversational AI Engine supports the following TTS vendors:

  • Microsoft

    • Supported parameters

      • key
      • region
      • voice_name
      • rate: Indicates the speaking rate of the text. Speaking rate can be applied at the word or sentence level. The rate changes should be within 0.5 to 2 times the original audio.
      • volume: Expressed as a number in the range of 0.0 to 100.0, from quietest to loudest, such as 75. The default value is 100.0
      • sample_rate: integer. Default value 24000.

      For further details, see Microsoft TTS.

    • Sample configuration


      {
        "vendor": "microsoft",
        "params": {
          "key": "<your_microsoft_key>",
          "region": "eastus",
          "voice_name": "en-US-AndrewMultilingualNeural",
          "rate": 1.0,
          "volume": 70
        }
      }

  • ElevenLabs

    • Supported parameters

      • key
      • model_id
      • voice_id
      • sample_rate: integer. Default value 24000.
      • stability
      • similarity_boost
      • style
      • user_speaker_boost

      For further details, see ElevenLabs TTS.

    • Sample configuration


      {
        "vendor": "elevenlabs",
        "params": {
          "key": "<your_elevenlabs_key>",
          "model_id": "eleven_flash_v2_5",
          "voice_id": "pNInz6obpgDQGcFmaJgB"
        }
      }
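Before sending a request, a TTS configuration could be checked locally against the ranges listed above. This is a rough sketch: the function name `check_tts_config` is hypothetical, it covers only the vendor names and the Microsoft `rate` and `volume` ranges given in this section, and the service may apply further validation of its own.

```python
from typing import Any, Dict, List

# Vendor identifiers as listed for the tts.vendor parameter.
SUPPORTED_VENDORS = ("microsoft", "elevenlabs")


def check_tts_config(tts: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means nothing was flagged."""
    vendor = tts.get("vendor")
    if vendor not in SUPPORTED_VENDORS:
        return [f"unsupported vendor: {vendor!r}"]

    problems: List[str] = []
    params = tts.get("params")
    if not isinstance(params, dict):
        return ["params must be an object"]

    if vendor == "microsoft":
        rate = params.get("rate")
        # Rate changes should be within 0.5 to 2 times the original audio.
        if rate is not None and not 0.5 <= rate <= 2.0:
            problems.append("rate should be within 0.5 to 2")
        volume = params.get("volume")
        # Volume ranges from 0.0 (quietest) to 100.0 (loudest).
        if volume is not None and not 0.0 <= volume <= 100.0:
            problems.append("volume should be within 0.0 to 100.0")
    return problems
```

For example, the Microsoft sample configuration above passes without any flagged problems, while a `rate` of 3.0 would be reported.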
