Display live subtitles
When interacting with conversational AI in real time, you can enable real-time subtitles to display the conversation content. This page explains how to implement real-time subtitles in your app.
Understand the tech
To simplify subtitle integration, Agora provides an open-source subtitle processing module. By integrating this module into your project and calling its APIs, you can quickly enable real-time subtitles. The following figure illustrates how the subtitle module interacts with your app and Agora SD-RTN™.
Subtitles module workflow
Prerequisites
Before you begin, make sure you have implemented the Conversational AI Engine REST quickstart.
Implementation
This section describes how to receive subtitle content from the subtitle processing module and display it on your app UI.
- Android
- iOS/macOS
- Web
- Integrate the subtitle processing module
Copy the ConversationSubtitleController.kt and MessageParser.kt files to your project and import the module before calling its API.
- Implement subtitle UI rendering logic
Inherit your subtitle UI module from the IConversationSubtitleCallback interface and implement the onSubtitleUpdated method to handle the message rendering logic:
```kotlin
class CovMessageListView @JvmOverloads constructor(
    context: Context,
    attrs: AttributeSet? = null,
    defStyleAttr: Int = 0
) : LinearLayout(context, attrs, defStyleAttr), IConversationSubtitleCallback {

    override fun onSubtitleUpdated(subtitle: SubtitleMessage) {
        // Implement your UI rendering logic here
    }
}
```
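One way to flesh out onSubtitleUpdated is to keep the latest message per (turnId, userId) pair and re-render the list on each update. The following is a minimal sketch; the messages list and the renderMessages() helper are illustrative assumptions, not part of the subtitle module.

```kotlin
// Sketch of one possible implementation inside CovMessageListView.
// The messages list and renderMessages() are illustrative, not part of the module.
private val messages = mutableListOf<SubtitleMessage>()

override fun onSubtitleUpdated(subtitle: SubtitleMessage) {
    // The callback may arrive off the main thread; post the update to the view's UI thread.
    post {
        val index = messages.indexOfFirst {
            it.turnId == subtitle.turnId && it.userId == subtitle.userId
        }
        if (index >= 0) messages[index] = subtitle else messages.add(subtitle)
        renderMessages(messages)
    }
}

private fun renderMessages(list: List<SubtitleMessage>) {
    // Update your RecyclerView adapter or child views here.
}
```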
- Create a subtitle processing module instance
When entering the call page, create a ConversationSubtitleController instance. It monitors the subtitle message callback internally and passes the subtitle information to your UI through the onSubtitleUpdated callback of IConversationSubtitleCallback.

```kotlin
private lateinit var subRenderController: ConversationSubtitleController

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    subRenderController = ConversationSubtitleController(
        SubtitleRenderConfig(
            rtcEngine = rtcEngine,
            renderMode = SubtitleRenderMode.Word,
            callback = mBinding?.messageListView
        )
    )
}
```
- Release resources
Call the reset method at the end of each call to clean up the cache. When leaving the call page, call release to release resources.

```kotlin
subRenderController.reset()
subRenderController.release()
```
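For example, assuming subRenderController is a property of your call activity and onCallEnded() is your own hook, the two calls might be placed as follows (sketch only):

```kotlin
// Sketch: clear the cache when a call ends, release resources when leaving the page.
private fun onCallEnded() {
    subRenderController.reset()
}

override fun onDestroy() {
    subRenderController.release()
    super.onDestroy()
}
```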
- Integrate the subtitle processing module
Copy the ConversationSubtitleController.swift and MessageParser.swift files to your project and import the module before calling its API.
- Implement subtitle UI rendering logic
To render subtitles in your UI, implement the ConversationSubtitleDelegate protocol in your subtitle UI module. Then, define the onSubtitleUpdated method to handle subtitle message rendering:

```swift
extension ChatViewController: ConversationSubtitleDelegate {
    func onSubtitleUpdated(subtitle: SubtitleMessage) {
        // Implement your UI rendering logic here
    }
}
```
- Create a subtitle processing module instance
When entering the call page, create a ConversationSubtitleController instance. This instance monitors subtitle message callbacks internally and passes the subtitle information to your UI using the onSubtitleUpdated callback of ConversationSubtitleDelegate.

```swift
// subRenderController is your ConversationSubtitleController instance
let subRenderConfig = SubtitleRenderConfig(rtcEngine: rtcEngine, renderMode: .words, delegate: self)
subRenderController.setupWithConfig(subRenderConfig)
```
- Release resources
At the end of each call, use the reset method to clean up the cache.

```swift
subRenderController.reset()
```
- Integrate the subtitle processing module
Copy the message.ts file to your project and import the module before calling its API. The required dependencies are available in the lib folder.
- Implement subtitle UI rendering logic
The subtitle UI module you implement processes the MessageEngine subtitle message list. The following simple component displays these messages:

```tsx
import { useEffect, useState } from "react";
// IMessageListItem is exported by the subtitle processing module (message.ts)

const ChatHistory = () => {
  const [chatHistory, setChatHistory] = useState<IMessageListItem[]>([]);

  useEffect(() => {
    const getChatHistoryFromEvent = (event: MessageEvent) => {
      const { data } = event;
      if (data?.type === "message") {
        // Matches the payload shape posted by the MessageEngine callback below
        setChatHistory(data?.chatHistory || []);
      }
    };
    window.addEventListener("message", getChatHistoryFromEvent);
    return () => {
      window.removeEventListener("message", getChatHistoryFromEvent);
    };
  }, []);

  return (
    <>
      {chatHistory.map((message) => (
        <div key={`${message.uid}-${message.turn_id}`}>
          {message.uid}: {message.text}
        </div>
      ))}
    </>
  );
};
```

Info: The sample code uses window.addEventListener("message") to listen for subtitle data sent by MessageEngine using window.postMessage. For complex applications, Agora recommends using Redux or another state management tool to manage these messages more efficiently.
- Create a subtitle processing module instance
Before joining an RTC channel, create a MessageEngine instance and pass in the AgoraRTC client, mode, and callback function.

```typescript
import AgoraRTC, { IAgoraRTCClient } from "agora-rtc-sdk-ng";

class RtcEngine {
  private client: IAgoraRTCClient;
  private messageEngine: MessageEngine | null = null;

  constructor() {
    // Create an AgoraRTC client with RTC mode and VP8 codec
    this.client = AgoraRTC.createClient({ mode: "rtc", codec: "vp8" });
  }

  public joinChannel() {
    // Create a MessageEngine instance, passing in the AgoraRTC client, mode, and callback function
    this.messageEngine = new MessageEngine(
      this.client,
      EMessageEngineMode.AUTO,
      (chatHistory) => {
        // Log chatHistory to the console
        console.log("chatHistory", chatHistory);
        // Send chatHistory to the web page; using Redux or another state management tool is recommended.
        // Here, window.postMessage is used as an example.
        window.postMessage({
          type: "message",
          chatHistory,
        });
      }
    );
    this.client.join("***", "****", "****", "****");
  }
}
```
- Release resources
When leaving the call page or ending the conversation, call the cleanup method to release resources.

```typescript
this.messageEngine?.cleanup();
```
Reference
This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.
API Reference
This section provides API reference documentation for the subtitles module.
- Android
- iOS/macOS
- Web
ConversationSubtitleController
```kotlin
class ConversationSubtitleController(
    private val config: SubtitleRenderConfig
)
```

- config: Subtitle rendering configuration. See SubtitleRenderConfig for details.
SubtitleRenderConfig
```kotlin
data class SubtitleRenderConfig(
    val rtcEngine: RtcEngine,
    val renderMode: SubtitleRenderMode?,
    val callback: IConversationSubtitleCallback?
)
```

- rtcEngine: Agora RtcEngine instance.
- renderMode: Subtitle rendering mode. See SubtitleRenderMode for details.
- callback: Callback interface for receiving subtitle content updates. See IConversationSubtitleCallback for details.
SubtitleRenderMode
```kotlin
enum class SubtitleRenderMode {
    Text,
    Word
}
```

- Text: Sentence-by-sentence rendering mode. The subtitle content received by the callback is fully rendered on the UI.
- Word: Word-by-word rendering mode. The subtitle content received by the callback is rendered word by word on the UI.
Using the word-by-word rendering mode (Word) requires that your chosen TTS vendor supports word-by-word output; otherwise, the module automatically falls back to sentence-by-sentence rendering mode (Text).
IConversationSubtitleCallback
The callback interface for subtitle content update events.
```kotlin
interface IConversationSubtitleCallback {
    fun onSubtitleUpdated(subtitle: SubtitleMessage)
}
```

- onSubtitleUpdated: Subtitle update callback.
  - subtitle: Updated subtitle message. See SubtitleMessage for details.
SubtitleMessage
```kotlin
data class SubtitleMessage(
    val turnId: Long,
    val userId: Int,
    val text: String,
    var status: SubtitleStatus
)
```
- turnId: The identifier of the conversation turn. One conversation turn between the user and the agent corresponds to one turnId, following these rules:
  - turnId = 0: The agent's welcome message; there is no user subtitle for this turn.
  - turnId ≥ 1: The subtitles for the user or the agent in that turn. Use userId to display the user's subtitles before the agent's subtitles, then repeat the process for the next turn.
  Caution: Callbacks are not guaranteed to arrive in strictly increasing turnId order. If you encounter out-of-order callbacks, implement the sorting logic yourself (see the sketch after this list).
- userId: The user ID associated with this subtitle message. In the current version, 0 represents the user; a non-zero value represents the agent ID.
- text: Subtitle text content.
- status: The current status of the subtitle. See SubtitleStatus for details.
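Because turnId ordering is not guaranteed, one option is to collect received messages locally and sort them before rendering. The following is a minimal sketch; the SubtitleStore class is illustrative and not part of the module:

```kotlin
// Sketch: keep the latest message per (turnId, userId) and return them sorted
// by turnId, with the user's subtitle (userId == 0) before the agent's.
class SubtitleStore {
    private val latest = mutableMapOf<Pair<Long, Int>, SubtitleMessage>()

    fun upsert(subtitle: SubtitleMessage): List<SubtitleMessage> {
        latest[subtitle.turnId to subtitle.userId] = subtitle
        return latest.values.sortedWith(
            compareBy({ it.turnId }, { if (it.userId == 0) 0 else 1 })
        )
    }
}
```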
SubtitleStatus
Use SubtitleStatus for special UI processing based on the status, such as displaying an interruption mark at the end of the subtitle.
```kotlin
enum class SubtitleStatus {
    Progress,
    End,
    Interrupted
}
```

- Progress: The subtitle is still being generated; the user or agent has not finished speaking.
- End: Subtitle generation is complete; the user or agent has finished speaking.
- Interrupted: The subtitle was interrupted before completion; the user actively stopped the agent's response.
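For example, you might append a marker to interrupted subtitles before displaying them. A minimal sketch; the marker text and helper function are illustrative, not part of the module:

```kotlin
// Sketch: choose the display text based on the subtitle status.
fun displayText(subtitle: SubtitleMessage): String = when (subtitle.status) {
    SubtitleStatus.Progress -> subtitle.text               // still streaming
    SubtitleStatus.End -> subtitle.text                    // final text
    SubtitleStatus.Interrupted -> subtitle.text + " [interrupted]"
}
```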
ConversationSubtitleController
```swift
class ConversationSubtitleController {
    func setupWithConfig(_ config: SubtitleRenderConfig)
    func reset()
}
```

- setupWithConfig(_ config:): Sets the subtitle rendering configuration.
  - config: Subtitle rendering configuration. See SubtitleRenderConfig for details.
- reset(): Clears the cache.
SubtitleRenderConfig
```swift
struct SubtitleRenderConfig {
    let rtcEngine: AgoraRtcEngineKit
    let renderMode: SubtitleRenderMode
    let delegate: ConversationSubtitleDelegate?
}
```

- rtcEngine: AgoraRtcEngineKit instance.
- renderMode: Subtitle rendering mode. See SubtitleRenderMode for details.
- delegate: Callback protocol for receiving subtitle content update events. See ConversationSubtitleDelegate for details.
SubtitleRenderMode
```swift
enum SubtitleRenderMode {
    case words
    case text
}
```

- words: Word-by-word rendering mode. The subtitle content received by the callback is rendered word by word on the UI.
- text: Sentence-by-sentence rendering mode. The subtitle content received by the callback is fully rendered on the UI.
Using the word-by-word rendering mode (words) requires that your chosen TTS vendor supports word-by-word output; otherwise, the module automatically falls back to sentence-by-sentence rendering mode (text).
ConversationSubtitleDelegate
Callback protocol for subtitle content update events.
```swift
protocol ConversationSubtitleDelegate: AnyObject {
    func onSubtitleUpdated(subtitle: SubtitleMessage)
}
```

- onSubtitleUpdated: Subtitle update callback.
  - subtitle: Updated subtitle message. See SubtitleMessage for details.
SubtitleMessage
```swift
struct SubtitleMessage {
    let turnId: Int
    let userId: UInt
    let text: String
    var status: SubtitleStatus
}
```
- turnId: The identifier of the conversation turn. One conversation turn between the user and the agent corresponds to one turnId, following these rules:
  - turnId = 0: The agent's welcome message; there is no user subtitle for this turn.
  - turnId ≥ 1: The subtitles for the user or the agent in that turn. Use userId to display the user's subtitles before the agent's subtitles, then repeat the process for the next turn.
  Caution: Callbacks are not guaranteed to arrive in strictly increasing turnId order. If you encounter out-of-order callbacks, implement the sorting logic yourself.
- userId: The user ID associated with this subtitle message. In the current version, 0 represents the user; a non-zero value represents the agent ID.
- text: Subtitle text content.
- status: The current status of the subtitle. See SubtitleStatus for details.
SubtitleStatus
```swift
enum SubtitleStatus: Int {
    case inprogress = 0
    case end = 1
    case interrupt = 2
}
```

- inprogress: The subtitle is still being generated; the user or agent has not finished speaking.
- end: Subtitle generation is complete; the user or agent has finished speaking.
- interrupt: The subtitle was interrupted before completion; the user actively stopped the agent's response.
MessageEngine
Subtitle processing engine.
```typescript
class MessageEngine {
  constructor(
    rtcEngine: IAgoraRTCClient,
    renderMode?: EMessageEngineMode,
    callback?: (messageList: IMessageListItem[]) => void
  )
}
```

- rtcEngine: The AgoraRTC client instance.
- renderMode: Subtitle rendering mode. See EMessageEngineMode for details. The default is EMessageEngineMode.AUTO.
- callback: Callback function for receiving subtitle content updates. IMessageListItem[] is the list of messages. See IMessageListItem for details.
EMessageEngineMode
```typescript
enum EMessageEngineMode {
  TEXT = 'text',
  WORD = 'word',
  AUTO = 'auto',
}
```

- TEXT: Sentence-by-sentence rendering mode. The subtitle content received by the callback is fully rendered on the UI.
- WORD: Word-by-word rendering mode. The subtitle content received by the callback is rendered word by word on the UI.
- AUTO: Automatic mode. The rendering mode is selected automatically according to the mode supported by the TTS provider.
Using the word-by-word rendering mode (WORD) requires that your chosen TTS vendor supports word-by-word output; otherwise, the module automatically falls back to sentence-by-sentence rendering mode (TEXT).
IMessageListItem
```typescript
interface IMessageListItem {
  uid: number
  turn_id: number
  text: string
  status: EMessageStatus
}
```
- uid: The user ID associated with this subtitle message. In the current version, 0 represents the user; a non-zero value represents the agent ID.
- turn_id: The identifier of the conversation turn. One conversation turn between the user and the agent corresponds to one turn_id, following these rules:
  - turn_id = 0: The agent's welcome message; there is no user subtitle for this turn.
  - turn_id ≥ 1: The subtitles for the user or the agent in that turn. Use uid to display the user's subtitles before the agent's subtitles, then repeat the process for the next turn.
  Caution: Callbacks are not guaranteed to arrive in strictly increasing turn_id order. If you encounter out-of-order callbacks, implement the sorting logic yourself.
- text: Subtitle text content.
- status: The current status of the subtitle. See EMessageStatus for details.
EMessageStatus
```typescript
enum EMessageStatus {
  IN_PROGRESS = 0,
  END = 1,
  INTERRUPTED = 2,
}
```

- IN_PROGRESS: The subtitle is still being generated; the user or agent has not finished speaking.
- END: Subtitle generation is complete; the user or agent has finished speaking.
- INTERRUPTED: The subtitle was interrupted before completion; the user actively stopped the agent's response.