Siddhartha Gunti

Conversational trees

We had two phases for chatbots. The first one was when Facebook created a buzz around chatbots. It died quickly because of how boring and ineffective those chatbots were.

The second phase is now. With GPT, chatbots are back with renewed vigour.

Between these two phases, when everyone had written chatbots off, we at Adaface created a chatbot to conduct a technical test. We often received reviews calling our bot "innovative" and "human-like". Behind the scenes, there was no AI. There was no NLP. And there was no human. There were decision trees designed to conduct an adaptive conversation.

This chapter is a slice of how we built the chatbot. I simplified the problem statement so that the discussion could be wrapped up in a single chapter.

Before I kick off, a short note: I have written this content to go along with my book ‘Logic, Math and Code’.

Back to business:

Let's design a chatbot that conducts a technical test. Here's the core functionality that we need:

  • Ability to ask MCQ questions to candidates.

  • Candidates should be able to attempt the MCQ question twice.

  • The bot should respond with a contextual hint the first time they make a mistake.

  • Remember and reproduce the state of the chat. State here refers to two portions.

    • Chat history

    • Chat position, i.e., if the candidate is in attempt #2 of question #3, reloading should keep the candidate in the same place.

Here's the second-level breakdown of the chat functionality:

  • Chat needs to support text input from the candidate

  • Chat needs to support MCQ option style input from the candidate

At the atomic level, here's what the bot is doing:

  • It presents information to the user (ex: greeting - "Hi! What is your name?" or "What is 2 + 3?")

  • It expects input from the user (ex: "Siddhartha" or "5" as a response to the above questions)

  • Depending on the user's input, it decides what to do next (i.e., present information or expect input)

One structure that comes to mind if you look at the bot actions at the atomic level is tree-based. Now let's design a tree-like system to support this:

First, let's try to design the simple "nodes" of our tree structure:

// Output node:
class Output {
    constructor({ type, data }) {
        this.type = type; // different type of outputs like 'text', 'image'
        this.data = data; // actual data like 'Hello!', '<url to image>'
    }
}

A simple example of the above node:

new Output({ type: "text", data: "Hello World!" })

Now let's expand this node with a "ui" attribute. This UI is the equivalent of what the user sees: it should be able to present information to the user and accept information from the user. We expect the UI to have a pre-defined function called "emitOutput". Any node with access to the UI can use this function to present information to the user.

class Output {
    constructor({ type, data }) {
        this.type = type;
        this.data = data;
    }

    assign({ ui }) {
        this.ui = ui;
    }

    execute() {
        this.ui.emitOutput({
            type: this.type,
            data: this.data
        });
    }
}

Now the UI can assign itself to this node by calling .assign(). When it wants to "execute" this node, it calls "execute", which in turn calls the UI's own emitOutput function, which shows the output to the user.
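To see the assign/execute handshake in action, here is a minimal sketch. The `consoleUi` object is a hypothetical stand-in for a real UI (it is not part of the original code); it only records and prints whatever it is asked to emit.

```javascript
// Output node, same shape as the class above.
class Output {
    constructor({ type, data }) {
        this.type = type;
        this.data = data;
    }

    assign({ ui }) {
        this.ui = ui;
    }

    execute() {
        this.ui.emitOutput({ type: this.type, data: this.data });
    }
}

// Hypothetical stand-in UI: keeps every emitted output in an array and logs it.
const consoleUi = {
    emitted: [],
    emitOutput({ type, data }) {
        this.emitted.push({ type, data });
        console.log(`[${type}] ${data}`);
    },
};

const hello = new Output({ type: "text", data: "Hello World!" });
hello.assign({ ui: consoleUi }); // the UI hands itself to the node
hello.execute();                 // the node calls back into the UI
```

The node never needs to know how the UI renders anything; it only relies on emitOutput being there.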

So far, this functionality works well for simple outputs like "Hello!". But what if we want to keep it dynamic?

For example, we want to say "Hello <user>" based on the user's name.

Now let's raise the responsibility of the "ui" to accommodate this. One: the UI maintains a "data store". When an output is emitted, individual nodes can draw on this information to create dynamic output. Two: along with type and data, a node can take a function that receives type, data and the data store as inputs and decides the output.

Let’s put it together:

class Output {
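    constructor({ type, data, fn }) {
        this.type = type;
        this.data = data;
        // fall back to passing type and data through unchanged when no fn is given
        this.fn = fn || (({ type, data }) => ({ type, data }));
    }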

    assign({ ui }) {
        this.ui = ui;
    }

    execute() {
        this.ui.emitOutput(this.fn({
            dataStore: this.ui.getDataStore(),
            type: this.type,
            data: this.data,
        }));
    }
}

This makes our outputs more versatile. For example, here's a dynamic output:

new Output({
    type: "text",
    data: "Hello, ",
    fn: ({type, data, dataStore}) => {
        return {
            type,
            data: data + dataStore.name
        }
    }
});
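Here is a small sketch of that dynamic output being executed. The `stubUi` object and its pre-filled data store are assumptions made for illustration; a real UI would fill the data store as the chat progresses.

```javascript
// Output node with the dynamic fn, same shape as the class above.
class Output {
    constructor({ type, data, fn }) {
        this.type = type;
        this.data = data;
        this.fn = fn;
    }

    assign({ ui }) {
        this.ui = ui;
    }

    execute() {
        this.ui.emitOutput(this.fn({
            dataStore: this.ui.getDataStore(),
            type: this.type,
            data: this.data,
        }));
    }
}

// Hypothetical stand-in UI with a data store that already knows the user's name.
const stubUi = {
    dataStore: { name: "Siddhartha" },
    emitted: [],
    getDataStore() {
        return this.dataStore;
    },
    emitOutput(output) {
        this.emitted.push(output);
    },
};

const greeting = new Output({
    type: "text",
    data: "Hello, ",
    fn: ({ type, data, dataStore }) => ({ type, data: data + dataStore.name }),
});

greeting.assign({ ui: stubUi });
greeting.execute();
console.log(stubUi.emitted[0].data); // prints "Hello, Siddhartha"
```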

Now let's create a node that "expects" information from the user. This can be the text input from the user (like their name). Or it can be an MCQ option (one of the options for the MCQ question like "A", "B", "C", etc.)

class Expectation {
    constructor({ type, data, fn }) {
        this.type = type; // "text" or "MCQ"
        this.data = data; // any relevant information that UI needs to create input UI like max 100 characters or one among "A", "B", "C" etc
        // dynamic function using which actual type and data can be modified; defaults to a pass-through
        this.fn = fn || (({ type, data }) => ({ type, data }));
    }

    assign({ ui }) {
        this.ui = ui;
    }

    execute() {
        this.ui.emitExpectation(this.fn({
            dataStore: this.ui.getDataStore(),
            type: this.type,
            data: this.data,
        }));
    }
}
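As a sketch of how an expectation might use the data store, the example below hides an option the candidate already got wrong. The `stubUi` object and the `lastWrongOption` key are hypothetical names chosen for this illustration.

```javascript
// Expectation node, same shape as the class above.
class Expectation {
    constructor({ type, data, fn }) {
        this.type = type;
        this.data = data;
        this.fn = fn;
    }

    assign({ ui }) {
        this.ui = ui;
    }

    execute() {
        this.ui.emitExpectation(this.fn({
            dataStore: this.ui.getDataStore(),
            type: this.type,
            data: this.data,
        }));
    }
}

// Hypothetical stand-in UI that remembers which input it is waiting for.
const stubUi = {
    dataStore: { lastWrongOption: "B" }, // assumed key, set by an earlier node
    pending: null,
    getDataStore() {
        return this.dataStore;
    },
    emitExpectation(expectation) {
        this.pending = expectation;
    },
};

const pickOption = new Expectation({
    type: "MCQ",
    data: { options: ["A", "B", "C", "D"] },
    // drop the option that was already tried and found wrong
    fn: ({ type, data, dataStore }) => ({
        type,
        data: { options: data.options.filter((o) => o !== dataStore.lastWrongOption) },
    }),
});

pickOption.assign({ ui: stubUi });
pickOption.execute();
console.log(stubUi.pending.data.options); // prints [ 'A', 'C', 'D' ]
```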

Now, let's go one step higher: a node that outputs a message to the UI and then expects some input from the user.

class Node {
    constructor({
        output, expectation
    }) {
        this.output = output;
        this.expectation = expectation;
    }

    assign({ ui }) {
        this.ui = ui;
        if (this.output) {
            this.output.assign({ ui });
        }
        if (this.expectation) {
            this.expectation.assign({ ui });
        }
    }

    execute() {        
        if (this.output) {
            this.output.execute();
        }
        if (this.expectation) {
            this.expectation.execute();
        }
    }
}

Now let's add the functionality of receiving user input (the expectation from the user being fulfilled). Based on the user's input, a node does two things. One: it might learn some new information about the user, like the user's name or the user's answer to a question (right or wrong). Two: it decides where to move next.

To facilitate these two core functionalities of our atom, let's create a "decisionMaker" function. This function does the following:

  1. Accepts user input + data store from UI and decides if there is any new data (dubbed "miniDataStore") that it learnt.

  2. Merges this new learning with the UI's data store so that other nodes can use this information (Ex: Node 1 learnt that the user's name is "Siddhartha". Node 100 should be able to say, "Bye! Siddhartha").

  3. Decides the next step:

    1. Move to a new node with some name (based on user input. For example: if the output was "Do you like Java or JavaScript?" and if the user's input was "Java", then the next node would be "Q1 of java").

    2. If there is no new node to move to, inform UI that it can end the session.

Jobs 2 and 3 both depend on the user's input and happen at the same time, so we can handle them both by passing a "decisionMaker" function to the node as an input.
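As an illustration, a decisionMaker for the "Do you like Java or JavaScript?" node could look like the sketch below. The node names "javaQ1" and "javascriptQ1" and the "language" key are made up for this example.

```javascript
// Hypothetical decisionMaker for the "Do you like Java or JavaScript?" node.
// It returns what it learnt (miniDataStore) and where to go next (nextNode).
const languageDecisionMaker = ({ input, dataStore }) => {
    // assumed node names; a real tree would use whatever names its node map defines
    const nextNodeByLanguage = { Java: "javaQ1", JavaScript: "javascriptQ1" };

    return {
        miniDataStore: { language: input },
        // null means "no node to move to", i.e. the session can end
        nextNode: nextNodeByLanguage[input] || null,
    };
};

const decision = languageDecisionMaker({ input: "Java", dataStore: {} });
console.log(decision.nextNode); // prints "javaQ1"
```

The function is pure: it never touches the UI, which keeps it easy to test on its own.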

To serve these jobs, we expect the UI to have

  • A function to merge the miniDataStore into its own data store. (let's call it mergeMiniDataStore)

  • A function to move to the next node by name. (let's call it moveToNextNode)

  • A function to end the session. (let's call it endSession)

Let’s put this together with these assumptions now:

class Node {
    constructor({
        output, expectation, decisionMaker
    }) {
        this.output = output;
        this.expectation = expectation;
        this.decisionMaker = decisionMaker;
    }

    assign({ ui }) {
        this.ui = ui;
        if (this.output) {
            this.output.assign({ ui });
        }
        if (this.expectation) {
            this.expectation.assign({ ui });
        }
    }

    execute() {        
        if (this.output) {
            this.output.execute();
        }
        if (this.expectation) {
            this.expectation.execute();
        }
        // If this is a node which just presents information to user
        // we don't need to wait for user's input. There is nothing to wait for.
        if (!this.expectation) {
            this.onInput({ input: null });
        }
    }

    // process user's input
    onInput({ input }) {
        const { nextNode, miniDataStore } = this.decisionMaker({
            input,
            dataStore: this.ui.getDataStore(),
        });

        if (miniDataStore) {
            this.ui.mergeMiniDataStore({
                miniDataStore,
            });
        }

        if (nextNode) {
            this.ui.moveToNextNode(nextNode);
        } else {
            this.ui.endSession();
        }
    }
}
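To check the flow end to end, here is a sketch that runs a single node against a stand-in UI. `makeLeaf`, `stubUi` and the node name "greet" are hypothetical helpers for this example; `makeLeaf` only mimics the assign/execute shape of Output and Expectation to keep the block short.

```javascript
// Node, condensed from the class above.
class Node {
    constructor({ output, expectation, decisionMaker }) {
        this.output = output;
        this.expectation = expectation;
        this.decisionMaker = decisionMaker;
    }
    assign({ ui }) {
        this.ui = ui;
        if (this.output) this.output.assign({ ui });
        if (this.expectation) this.expectation.assign({ ui });
    }
    execute() {
        if (this.output) this.output.execute();
        if (this.expectation) this.expectation.execute();
        if (!this.expectation) this.onInput({ input: null });
    }
    onInput({ input }) {
        const { nextNode, miniDataStore } = this.decisionMaker({
            input,
            dataStore: this.ui.getDataStore(),
        });
        if (miniDataStore) this.ui.mergeMiniDataStore({ miniDataStore });
        if (nextNode) this.ui.moveToNextNode(nextNode);
        else this.ui.endSession();
    }
}

// Tiny stand-in for Output/Expectation: calls the named UI function with a fixed payload.
const makeLeaf = (uiFunctionName, payload) => ({
    assign({ ui }) { this.ui = ui; },
    execute() { this.ui[uiFunctionName](payload); },
});

// Hypothetical UI that logs every call it receives.
const log = [];
const stubUi = {
    dataStore: {},
    getDataStore() { return this.dataStore; },
    emitOutput(output) { log.push(`output: ${output.data}`); },
    emitExpectation(expectation) { log.push(`expect: ${expectation.type}`); },
    mergeMiniDataStore({ miniDataStore }) { Object.assign(this.dataStore, miniDataStore); },
    moveToNextNode(name) { log.push(`move: ${name}`); },
    endSession() { log.push("end"); },
};

const askName = new Node({
    output: makeLeaf("emitOutput", { type: "text", data: "Hi! What's your name?" }),
    expectation: makeLeaf("emitExpectation", { type: "text", data: { maxLength: 100 } }),
    decisionMaker: ({ input }) => ({ miniDataStore: { name: input }, nextNode: "greet" }),
});

askName.assign({ ui: stubUi });
askName.execute();                        // emits the question, then waits
askName.onInput({ input: "Siddhartha" }); // the UI forwards the user's reply
console.log(log, stubUi.dataStore);
```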

This node becomes a fundamental atom for our system. For example, a flow like:

  • Bot responding, "Hi! what's your name?"

  • Bot awaiting user's text input

is equivalent to a single node with one output and one expectation. To keep the fundamental atom as is, when there is no expectation we don't use a bare Output; we simply use a Node with its expectation set to null.

For the UI to use this atom, it has to maintain a set of nodes by their names, so that when "moveToNextNode" happens, it can switch to the new node and "execute" it. Also, notice that the UI needs to know which node is the first, so that it knows where to start when the session is new.

Since a chat will have hundreds and possibly thousands of nodes, let's create an intermediate "node" that can act as a UI as well as a node. (picture that we have discussed leaves of a tree so far. Now we are going one step higher). Why should this intermediary node be a UI AND a node?

  • Why a node? - so that other nodes can be its parent

  • Why a UI? - so that all its children can "imagine" that it is the UI and call UI's functions on it. But since the intermediate node is itself a node, it will percolate this information one level higher! Neat, right?

Let's call this intermediary node-cum-ui a "manager" and start building:

To keep matters simple, we will keep this manager node with null output and expectation so that the core of its work is to manage its nodes.

class Manager extends Node {
    constructor({
        nodeMap, rootNode, decisionMaker
    }) {
        super({ output: null, expectation: null, decisionMaker });
        this.nodeMap = nodeMap;
        this.rootNode = rootNode;
    }

    emitOutput({ type, data }) {
       // Need to percolate information to its own parent node (which can be another manager or ui)
    }

    assign({ ui }) {
        if (ui) {
            this.ui = ui;
        }
    }


    emitExpectation({ type, data }) {
        // Need to percolate information to its own parent node (which can be another manager or ui)
    }

    execute() {
        // Execute the current active node
    }

    /* responsibilities of being a UI kick in here. */

    mergeMiniDataStore({ miniDataStore }) {
        // Need to merge this new update with the global store
    }

    getDataStore() {
        // Return the global data store
    }

    moveToNextNode(node) {
        // Move to next node from the node map
        // Ensure this manager node assigns itself as "ui" to the next node 
        // Execute this new node to move things forward
    }

    onInput({ input }) {
        // Received user's input from parent. Since there is no expectation here, the node which is waiting for input is the current active node. Pass the input to that node.
    }

    endSession() {
        // The job of this manager has come to an end.
        // Since this itself is a node, this part is equivalent to 
        // ... onInput of a node. 
        // i.e, merge any new learnings to global store
        // decide which next "node" to go to and ask the parent to move to that node
        // if there is no such manager to go to, end the session one level higher 
    }
}

If you go through the logic we need above, you will notice two requirements:

  1. Managers are connected to other managers like a tree (each manager inside itself has a tree of managers :D)

  2. There is something called an “active” or “current” node.

Let’s put these together:

class Manager extends Node {
    constructor({
        nodeMap, rootNode, decisionMaker, currentNode
    }) {
        super({ output: null, expectation: null, decisionMaker });
        this.nodeMap = nodeMap;
        this.rootNode = rootNode;
        this.currentNode = currentNode;
    }

    getNodeFromName(name) {
        return this.nodeMap[name];
    }

    emitOutput({ type, data }) {
       this.ui.emitOutput({
           type,
           data
       })
    }

    assign({ ui }) {
        if (ui) {
            this.ui = ui;
        }
    }

    emitExpectation({ type, data }) {
        this.ui.emitExpectation({
            type,
            data
        })
    }

    execute() {
        this.getNodeFromName(this.currentNode).execute();
    }

    /* responsibilities of being a UI kick in here. */

    mergeMiniDataStore({ miniDataStore }) {
        this.ui.mergeMiniDataStore({ miniDataStore });
    }

    getDataStore() {
        return this.ui.getDataStore();
    }

    moveToNextNode(node) {
        this.currentNode = node;
        this.getNodeFromName(this.currentNode).assign({ ui: this });
        this.execute();
    }

    onInput({ input }) {
        this.getNodeFromName(this.currentNode).onInput({ input });
    }

    endSession() {
        const { nextNode, miniDataStore } = this.decisionMaker({
            dataStore: this.getDataStore(),
        });

        if (miniDataStore) {
            this.mergeMiniDataStore({
                miniDataStore,
            });
        }

        if (nextNode) {
            this.currentNode = this.rootNode;
            this.ui.moveToNextNode(nextNode);
        } else {
            this.currentNode = this.rootNode;
            this.ui.endSession();
        }
    }
}

Go through the logic carefully here. It is nothing fancy, just one step higher than the previous node. But this manager is not yet a full UI node. It works only when another "ui" node has been assigned to it; it cannot be a UI by itself, because someone still needs to provide the "dataStore" that every node and manager hungrily asks for. Some UI node has to finally perform the mergeMiniDataStore that every manager percolates to the top. And someone has to perform "endSession", whatever that endSession is.

Here is an interesting question. We know that every manager is a UI. Since it's a tree, its parent is another manager/UI, and so on until the tree reaches the top, where the actual UI node sits. Given the current manager code, how do you know whether the current manager is the UI at the top or any other manager?

this.ui will be unset (undefined) :D Because there is no UI to assign to it!

Let's use this knowledge to amp up our manager to work as the main root UI.

class Manager extends Node {
    constructor({
        nodeMap, rootNode, decisionMaker,
    }) {
        super({ output: null, expectation: null, decisionMaker });
        this.nodeMap = nodeMap;
        this.rootNode = rootNode;
        this.currentNode = rootNode;
    }

    // new helper
    isRootUI() {
        return !this.ui;
    }

    getNodeFromName(name) {
        return this.nodeMap[name];
    }

    emitOutput({ type, data }) {
        if (this.isRootUI()) {
            // this is where handoff to frontend happens.
            // i.e, actually show it to the user in webpage or iOS app or whatever the frontend is
        } else {
            this.ui.emitOutput({
                type,
                data
            })
        }
    }

    assign({ ui, dataStore }) {
        if (ui) {
            this.ui = ui;
        }
        // giving a way to maintain global datastore at root UI
        if (dataStore) {
            this.dataStore = dataStore;
        }
        this.getNodeFromName(this.currentNode).assign({ ui: this });
    }

    emitExpectation({ type, data }) {
        if (this.isRootUI()) {
            // This is where we need to hand it off to the actual frontend to showcase that input box or buttons to collect user input
        } else {
            this.ui.emitExpectation({
                type,
                data
            })
        }
    }

    execute() {
        this.getNodeFromName(this.currentNode).execute();
    }

    mergeMiniDataStore({ miniDataStore }) {
        if (this.isRootUI()) {
            // update the global datastore
            this.dataStore = Object.assign({}, this.dataStore, miniDataStore);
        } else {
            this.ui.mergeMiniDataStore({ miniDataStore });
        }
    }

    getDataStore() {
        // return the actual object
        if (this.isRootUI()) {
            return this.dataStore;
        }
        return this.ui.getDataStore();
    }

    moveToNextNode(node) {
        this.currentNode = node;
        this.getNodeFromName(this.currentNode).assign({ ui: this });
        this.execute();
    }

    onInput({ input }) {
        this.getNodeFromName(this.currentNode).onInput({ input });
    }

    endSession() {
        const { nextNode, miniDataStore } = this.decisionMaker({
            dataStore: this.getDataStore(),
        });

        if (miniDataStore) {
            this.mergeMiniDataStore({
                miniDataStore,
            });
        }

        if (nextNode) {
            this.currentNode = this.rootNode;
            this.ui.moveToNextNode(nextNode);
        } else if (this.isRootUI()) {
            // We actually have to mark this session as ended here. 
            // i.e, close down the session 
            // This is where we inform the frontend to wrap it up. 
            // Wrap it up might mean different things for different apps
            // Might be an exit screen or same chat session with no more inputs allowed
        } else {
            this.currentNode = this.rootNode;
            this.ui.endSession();
        }
    }
}

Now if you see, the remaining pieces talk about the "frontend". i.e., actually showing information to the user or actually collecting information from the user. This "frontend" can be anything - from an iOS chat application to a simple web or terminal console application. What the "frontend" is doesn't matter.

But we can see what functionality the "frontend" needs to implement:

  1. showOutputToUser

  2. getInputFromUser

  3. endUserSession

  4. startUserSession

To make it easier, we will assume "frontend" is an input to the root node and implements these features. So we can freeze our decision tree code.

class Manager extends Node {
    constructor({
        nodeMap, rootNode, decisionMaker, frontend
    }) {
        super({ output: null, expectation: null, decisionMaker });
        this.nodeMap = nodeMap;
        this.rootNode = rootNode;
        this.currentNode = rootNode;
        this.frontend = frontend;
    }

    // new helper
    isRootUI() {
        return !this.ui;
    }

    getNodeFromName(name) {
        return this.nodeMap[name];
    }

    emitOutput({ type, data }) {
        if (this.isRootUI()) {
            this.frontend.showOutputToUser({
                type,
                data
            })
        } else {
            this.ui.emitOutput({
                type,
                data
            })
        }
    }

    assign({ ui, dataStore }) {
        if (ui) {
            this.ui = ui;
        }
        if (dataStore) {
            this.dataStore = dataStore;
        }
        this.getNodeFromName(this.currentNode).assign({ ui: this });
    }

    emitExpectation({ type, data }) {
        if (this.isRootUI()) {
            this.frontend.getInputFromUser({
                type,
                data
            })
        } else {
            this.ui.emitExpectation({
                type,
                data
            })
        }
    }

    execute() {
        this.getNodeFromName(this.currentNode).execute();
    }

    mergeMiniDataStore({ miniDataStore }) {
        if (this.isRootUI()) {
            this.dataStore = Object.assign({}, this.dataStore, miniDataStore);
        } else {
            this.ui.mergeMiniDataStore({ miniDataStore });
        }
    }

    getDataStore() {
        if (this.isRootUI()) {
            return this.dataStore;
        }
        return this.ui.getDataStore();
    }

    moveToNextNode(node) {
        this.currentNode = node;
        this.getNodeFromName(this.currentNode).assign({ ui: this });
        this.execute();
    }

    onInput({ input }) {
        this.getNodeFromName(this.currentNode).onInput({ input });
    }

    endSession() {
        const { nextNode, miniDataStore } = this.decisionMaker({
            dataStore: this.getDataStore(),
        });

        if (miniDataStore) {
            this.mergeMiniDataStore({
                miniDataStore,
            });
        }

        if (nextNode) {
            this.currentNode = this.rootNode;
            this.ui.moveToNextNode(nextNode);
        } else if (this.isRootUI()) {
            this.frontend.endUserSession();
        } else {
            this.currentNode = this.rootNode;
            this.ui.endSession();
        }
    }
}

That's it. That's our entire decision tree in its full glory. To test this, let's do two things:

  • Create a simple decision tree for a few MCQs

  • Create a vanilla frontend web implementation

The flow we will build out will look like this:

  1. Start session

  2. Respond: "What is your name?"

    1. Accept: text from the user

  3. Respond: "Hello! <user name>"

  4. Respond: "What are you comfortable with?"

    1. Accept: One of "Java" or "JavaScript"

  5. Respond: "Sweet."

  6. Respond: "What is 1 + 2 in Java?" (if the previous response is "Java")

    1. Accept: text

    2. Respond: "Correct!" or "You are higher" or "You are lower" depending on the user's input

    3. Accept: text if the previous attempt is not correct and if it is < 3 attempts.

  7. Respond: "What is 1 + 2 in JavaScript?" (if the previous response is "JavaScript")

    1. The rest of the flow remains the same as in Java

  8. Respond: "That's it from my end. Hope you liked the chat <user name>."

  9. End session

A vanilla web app “frontend” code wrapper looks like this:

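var frontend = {
    showOutputToUser: ({ type, data }) => {
        // render the bot's message in the chat window
    },
    getInputFromUser: ({ type, data }) => {
        // show a text box or option buttons and wait for the user's answer
    },
    endUserSession: () => {
        // lock the chat or show an exit screen
    },
};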
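For quick experiments without a browser, the same three functions can be backed by plain in-memory state. The sketch below is an assumption of mine, not the wrapper from the linked repository; `createMemoryFrontend` and its helper functions (`answer`, `getTranscript`, `isWaiting`, `hasEnded`) are hypothetical names.

```javascript
// Hypothetical in-memory frontend: records what a real page would render.
const createMemoryFrontend = () => {
    const transcript = [];
    let pending = null; // the expectation currently awaiting an answer
    let ended = false;

    return {
        // the three functions the root manager calls
        showOutputToUser({ type, data }) {
            transcript.push({ from: "bot", type, data });
        },
        getInputFromUser({ type, data }) {
            pending = { type, data };
        },
        endUserSession() {
            pending = null;
            ended = true;
        },

        // helpers for this sketch only: simulate the user typing an answer
        answer(input, onInput) {
            if (ended || !pending) {
                throw new Error("No input is expected right now");
            }
            transcript.push({ from: "user", type: pending.type, data: input });
            pending = null;
            onInput({ input }); // in a real setup this would be the root manager's onInput
        },
        getTranscript: () => transcript,
        isWaiting: () => pending !== null,
        hasEnded: () => ended,
    };
};

const memoryFrontend = createMemoryFrontend();
memoryFrontend.showOutputToUser({ type: "text", data: "What is your name?" });
memoryFrontend.getInputFromUser({ type: "text", data: null });

const received = [];
memoryFrontend.answer("Siddhartha", ({ input }) => received.push(input));
memoryFrontend.endUserSession();
console.log(memoryFrontend.getTranscript());
```

Because the transcript is just an array, the same object doubles as a source for the chat history that has to be stored.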

Brace yourselves! Writing out all these nodes by hand gets lengthy. Here's the link to the complete code with the above flow and a sample frontend wrapper - https://github.com/siddug/conversational-trees

The final UI of the same code is hosted here - https://conversational-trees.siddg.com/

I am going to stop here, but here are the immediate next questions to ask:

  • If you reload the page in the middle of the chat, how do we restore the chat? - We can persist the data store, but that only restores the "learnings" so far. It doesn't remember which node under which node under which node is the current active node. How do you solve this? Hint- each node name is different at each manager.

  • What if the front end malfunctions and resends input for the old node? Hint- each expectation can have a unique id generated from stringing together the node names from the root to reach the node that produces the expectation.
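One possible direction for both hints, sketched under assumptions: treat the chain of currentNode names from the root manager down to the active leaf as the chat position. The helpers `getActivePath`, `restoreActivePath`, `expectationId` and `isStaleInput` are hypothetical, and the plain objects below only mimic the nodeMap/currentNode fields of a manager.

```javascript
// Walk down from the root manager, collecting the active node name at each level.
// Anything with a nodeMap is treated as a manager; everything else is a leaf.
const getActivePath = (manager) => {
    const path = [];
    let node = manager;
    while (node && node.nodeMap) {
        path.push(node.currentNode);
        node = node.nodeMap[node.currentNode];
    }
    return path;
};

// Re-apply a saved path to a freshly built tree.
const restoreActivePath = (manager, path) => {
    let node = manager;
    for (const name of path) {
        if (!node || !node.nodeMap || !(name in node.nodeMap)) {
            throw new Error(`Unknown node "${name}" in saved path`);
        }
        node.currentNode = name;
        node = node.nodeMap[name];
    }
};

// An id for the expectation that is currently open (assumes names contain no "/").
const expectationId = (path) => path.join("/");

// Input tagged with an id that no longer matches the open expectation is ignored.
const isStaleInput = (manager, inputId) =>
    inputId !== expectationId(getActivePath(manager));

// Stand-in tree: a root manager holding an intro leaf and a nested quiz manager.
const buildTree = () => ({
    currentNode: "intro",
    nodeMap: {
        intro: {},
        javaQuiz: { currentNode: "q1", nodeMap: { q1: {}, q2: {} } },
    },
});

const live = buildTree();
live.currentNode = "javaQuiz";
live.nodeMap.javaQuiz.currentNode = "q2";

const saved = JSON.stringify(getActivePath(live)); // persist next to the data store

const reloaded = buildTree();
restoreActivePath(reloaded, JSON.parse(saved));
console.log(expectationId(getActivePath(reloaded))); // prints "javaQuiz/q2"
```

With the real classes, restoring would also need each manager to re-assign itself as the "ui" of its current node before executing, which this sketch leaves out.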

There are more such questions to ask in a real system:

  • How can you "time" each question? So that the tree automatically moves to the next node if the user doesn't answer in time.

  • How can you have asynchronous operations inside the tree? For example, an OpenAI GPT call to generate the output?

  • And so on

All good questions to extend the functionality of such a system. But one interesting question is,

How do you make creating such systems realistic? Your content team will likely enter question data in a simple JSON format. How do we club these JSONs to create the nodes automatically?
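As a starting point for that question, here is a rough sketch that turns a list of question records into node configurations with two attempts and a hint. The JSON shape (`id`, `text`, `options`, `answer`, `hint`), the `buildQuestionNodes` helper and the data store keys are all assumptions for illustration. The result uses plain objects; wrapping each entry in the Output, Expectation and Node classes is left out.

```javascript
// Hypothetical content format, as a content team might enter it.
const sampleQuestions = [
    { id: "q1", text: "What is 1 + 2?", options: ["2", "3", "4"], answer: "3", hint: "Count up from 1." },
    { id: "q2", text: "What is 2 * 3?", options: ["5", "6", "8"], answer: "6", hint: "Add 2 three times." },
];

// Build a nodeMap where every question gets a question node and a hint node.
const buildQuestionNodes = (questions) => {
    const nodeMap = {};

    questions.forEach((q, index) => {
        const next = index + 1 < questions.length ? questions[index + 1].id : null;
        const attemptsKey = `${q.id}_attempts`;

        nodeMap[q.id] = {
            output: { type: "text", data: q.text },
            expectation: { type: "MCQ", data: q.options },
            decisionMaker: ({ input, dataStore }) => {
                const attempts = (dataStore[attemptsKey] || 0) + 1;
                const correct = input === q.answer;
                if (correct || attempts >= 2) {
                    // done with this question: record the result and move on
                    return {
                        miniDataStore: { [attemptsKey]: attempts, [`${q.id}_correct`]: correct },
                        nextNode: next,
                    };
                }
                // first mistake: show the hint, then ask again
                return { miniDataStore: { [attemptsKey]: attempts }, nextNode: `${q.id}_hint` };
            },
        };

        nodeMap[`${q.id}_hint`] = {
            output: { type: "text", data: q.hint },
            expectation: null, // nothing to wait for; go straight back to the question
            decisionMaker: () => ({ nextNode: q.id }),
        };
    });

    return { nodeMap, rootNode: questions.length ? questions[0].id : null };
};

const tree = buildQuestionNodes(sampleQuestions);
console.log(Object.keys(tree.nodeMap)); // prints [ 'q1', 'q1_hint', 'q2', 'q2_hint' ]
```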

If you want to see a real-life version of a bot in its full glory - check our bot at www.adaface.com.
