Agentic CRM Support: Multi-Agent Workflows with Self-Correction

CRM support tickets are messy. They reference past interactions, account state, and product details, so a single ticket might need a knowledge base lookup, a JIRA action, and a coherent response, in that order.

Architecture (condensed)

A support ticket comes in via a Gradio UI, hits a FastAPI endpoint, and enters a LangGraph state machine. The graph has three nodes (Plan, Execute, Evaluate) and a conditional edge that loops back to Plan if the response doesn't pass the quality thresholds.

User Query → FastAPI → LangGraph Orchestrator
                              │
                    ┌─────────▼──────────┐
                    │    Plan Node        │
                    │  Reasoning Agent    │
                    │  decides: retrieve  │
                    │  / tool / respond   │
                    └─────────┬──────────┘
                              │
                    ┌─────────▼──────────┐
                    │   Execute Node      │
                    │                     │
                    │  ChromaDB RAG       │
                    │  JIRA Tool          │
                    │  HF Generator       │
                    └─────────┬──────────┘
                              │
                    ┌─────────▼──────────┐
                    │   Evaluate Node     │
                    │   Ragas metrics     │
                    │   faithfulness +    │
                    │   relevance >= 0.7  │
                    └─────────┬──────────┘
                              │
              ┌───────────────┴──────────────┐
              │ pass                          │ fail
              ▼                               ▼
        Final Answer                    back to Plan

Why LangGraph

The self-correction loop requires a cycle — Evaluate needs to route back to Plan on failure. LangGraph handles this with a cyclic state machine where each node receives and returns a shared AgentState.

from typing import TypedDict, List

class AgentState(TypedDict):
    query: str
    plan: List[str]          # steps: retrieve / tool / respond
    retrieved_docs: List[str]
    tool_results: List[str]
    response: str
    eval_scores: dict        # faithfulness, relevance
    iteration: int           # prevent infinite loops

The conditional edge checks eval_scores and iteration < MAX_RETRIES before deciding whether to route back or exit.
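The routing decision described above can be sketched as a plain function over the state. This is a minimal sketch, not the project's actual code: the function name, the "plan"/"end" labels, and the value of MAX_RETRIES are assumptions made for illustration. Only the 0.7 threshold comes from the architecture above.

```python
MAX_RETRIES = 3        # assumed cap; the post does not state the actual value
SCORE_THRESHOLD = 0.7  # pass mark used by the Evaluate node


def route_after_evaluate(state: dict) -> str:
    """Return "end" to exit the graph or "plan" to run another pass.

    `state` is the shared AgentState dict; only `eval_scores` and
    `iteration` are read here.
    """
    scores = state.get("eval_scores", {})
    # Both metrics must clear the threshold; a missing score counts as a fail.
    passed = all(
        scores.get(metric, 0.0) >= SCORE_THRESHOLD
        for metric in ("faithfulness", "relevance")
    )
    if passed:
        return "end"
    # Stop looping once the retry budget is spent, even if scores are still low.
    if state.get("iteration", 0) >= MAX_RETRIES:
        return "end"
    return "plan"
```

With LangGraph, a function like this would typically be registered on the Evaluate node through add_conditional_edges, mapping "plan" back to the Plan node and "end" to the graph's END.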

RAG — Local with ChromaDB and BGE

Retrieval uses BGE embeddings (BAAI/bge-base-en-v1.5) from HuggingFace, which keeps everything local: no external calls during retrieval. I chose BGE because it is open-source and performs strongly on the MTEB leaderboard.

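from typing import List

import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer('BAAI/bge-base-en-v1.5')

# Assumed setup: a local persistent Chroma collection (path and name are placeholders)
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("support_kb")

# BGE query instruction: prepended to queries only, never to passages
QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "

def retrieve(query: str, top_k: int = 5) -> List[str]:
    # Embed the query together with the instruction prefix
    query_embedding = embedder.encode(QUERY_INSTRUCTION + query)
    results = collection.query(
        query_embeddings=[query_embedding.tolist()],
        n_results=top_k
    )
    # Chroma returns one list of documents per query; only one query was sent
    return results['documents'][0]

BGE models are trained with an instruction prefix for queries. For the English v1.5 models the documented instruction is "Represent this sentence for searching relevant passages:". It applies to retrieval queries only, not to passage encoding, and the model card describes it as recommended for short queries rather than strictly required.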
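The indexing side is the mirror image: passages are embedded as-is, with no instruction prefix. The following is a minimal sketch under that assumption. The helper name index_passages, the id scheme, and the injected encode callable are hypothetical and not taken from the project; collection is expected to expose Chroma's add(ids=..., documents=..., embeddings=...) method.

```python
from typing import Callable, List, Sequence


def index_passages(
    collection,
    encode: Callable[[str], Sequence[float]],
    passages: List[str],
    id_prefix: str = "doc",
) -> List[str]:
    """Embed passages without any instruction prefix and add them to the collection."""
    # Simple positional ids; a real knowledge base would likely use stable document ids.
    ids = [f"{id_prefix}-{i}" for i in range(len(passages))]
    # Passages go through the encoder unchanged; only queries get the BGE instruction.
    embeddings = [list(encode(passage)) for passage in passages]
    collection.add(ids=ids, documents=passages, embeddings=embeddings)
    return ids
```

With the embedder above, a call could look like index_passages(collection, lambda text: embedder.encode(text).tolist(), kb_passages), where kb_passages is a placeholder for the knowledge base text chunks.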

Self-Correction with Ragas

Ragas evaluates RAG output without ground truth labels. Two metrics:

Faithfulness — every claim in the response must be traceable to the retrieved documents. Catches hallucination.

Answer Relevance — the response must address the actual query. Catches topic drift.
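For reference, the two scores can be written roughly as follows. This paraphrases how the Ragas documentation describes the metrics, and exact definitions may vary between versions. The symbols are introduced here for illustration: E_q is the embedding of the original query, and E_{g_i} is the embedding of the i-th of N questions that an LLM generates back from the answer.

```latex
\text{faithfulness} =
  \frac{\lvert \text{claims in the answer supported by the retrieved context} \rvert}
       {\lvert \text{claims in the answer} \rvert}

\text{answer relevancy} =
  \frac{1}{N} \sum_{i=1}^{N} \cos\bigl(E_{g_i},\, E_{q}\bigr)
```

As a worked example, an answer that makes 4 claims, 3 of which are supported by the retrieved documents, scores 3/4 = 0.75 on faithfulness and clears the 0.7 threshold. With only 2 supported claims it scores 0.5 and fails.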

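from typing import List

from ragas.metrics import faithfulness, answer_relevancy
from ragas import evaluate
from datasets import Dataset

def evaluate_response(query: str, response: str, contexts: List[str]) -> dict:
    # Ragas scores a dataset, so wrap the single sample as a one-row Dataset.
    # Column names follow the question/answer/contexts schema; other Ragas
    # versions may expect different names.
    data = Dataset.from_dict({
        "question": [query],
        "answer": [response],
        "contexts": [contexts],   # one list of retrieved passages per row
    })
    # Both metrics are LLM-based, so an LLM must be configured for Ragas.
    result = evaluate(data, metrics=[faithfulness, answer_relevancy])
    return {
        "faithfulness": result["faithfulness"],
        "relevance": result["answer_relevancy"]
    }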

If either score is below 0.7, the state routes back to Plan with the scores attached. The reasoning agent uses this as signal to adjust retrieval or regeneration strategy on the next iteration.
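One way the scores could be turned into a signal for the next planning pass is to translate each failing metric into a short textual hint that gets added to the planner prompt. This is a hypothetical sketch: the function name build_feedback and the hint wording are assumptions, and the post does not say how the reasoning agent actually consumes the scores.

```python
from typing import List

SCORE_THRESHOLD = 0.7  # same pass mark as the Evaluate node


def build_feedback(eval_scores: dict, threshold: float = SCORE_THRESHOLD) -> List[str]:
    """Translate failing metrics into hints for the next planning pass."""
    hints = []
    # Low faithfulness: the answer contains claims the retrieved docs do not support.
    if eval_scores.get("faithfulness", 0.0) < threshold:
        hints.append(
            "faithfulness below threshold: retrieve more context or "
            "regenerate using only the retrieved documents"
        )
    # Low relevance: the answer drifted away from what the user asked.
    if eval_scores.get("relevance", 0.0) < threshold:
        hints.append(
            "relevance below threshold: regenerate and address the "
            "original query directly"
        )
    return hints
```

An empty list means both metrics passed, so nothing needs to be fed back to Plan.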

JIRA Tool

When Plan decides a ticket needs to be created or queried, it emits tool as a step. Execute calls this:

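import requests

# JIRA_BASE_URL, JIRA_PROJECT_KEY, JIRA_EMAIL and JIRA_API_TOKEN are configuration
# values loaded elsewhere (for example from environment variables).

def create_jira_ticket(summary: str, description: str, issue_type: str = "Bug") -> dict:
    payload = {
        "fields": {
            "project": {"key": JIRA_PROJECT_KEY},
            "summary": summary,
            "description": description,
            "issuetype": {"name": issue_type}
        }
    }
    response = requests.post(
        f"{JIRA_BASE_URL}/rest/api/2/issue",
        json=payload,
        auth=(JIRA_EMAIL, JIRA_API_TOKEN),  # basic auth: account email + API token
        timeout=10,                         # avoid hanging the graph on a slow JIRA
    )
    # Fail loudly on 4xx/5xx instead of returning an error body as if it were a ticket
    response.raise_for_status()
    return response.json()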

The agent decides when to call this based on ticket content — not a hardcoded trigger.
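To illustrate how Execute might act on the steps Plan emits, here is a minimal dispatch sketch. The run_plan function and the handler registry are hypothetical names, not the project's code. The step names (retrieve, tool, respond) come from the plan field of AgentState, and each handler is assumed to return a partial state update.

```python
from typing import Callable, Dict

# A handler reads the current state and returns a partial update to merge in.
StepHandler = Callable[[dict], dict]


def run_plan(state: dict, handlers: Dict[str, StepHandler]) -> dict:
    """Run each planned step in order, merging every handler's update into the state."""
    for step in state.get("plan", []):
        handler = handlers.get(step)
        if handler is None:
            # Surface planner mistakes instead of silently skipping a step.
            raise ValueError(f"Unknown plan step: {step!r}")
        # Build a new dict so the caller's state object is not mutated.
        state = {**state, **handler(state)}
    return state
```

In this shape, the JIRA call only runs when the plan contains a tool step, so the decision stays with the reasoning agent rather than a fixed rule in Execute.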

Stack

Python, LangGraph, LangChain, ChromaDB, HuggingFace (Transformers + BGE), Ragas, FastAPI, Docker, LangSmith for tracing.

github.com/iam4tart/agentic-crm-support