Unveiling the Magic Behind Our AI Teammate: Inside Our Proprietary RAG Architecture

Get a look under the hood to discover how our AI Teammate’s RAG architecture transforms data into actionable insights

Sharma Podila, VP of Engineering
Jun 17th, 2024

Our AI Teammate’s latest features enable teams to harness their data for immediate answers, alert investigations, and actionable insights. We often see people’s eyes light up when they witness our AI Teammate in action. But few get to truly understand the magic going on under the hood.

Our AI Teammate is designed for flexibility and scalability. By leveraging a Retrieval-Augmented Generation (RAG) architecture, along with a templatized job system and an IPSE (Ingest, Project, Sessionize, Embed) model, we ensure efficient and reliable data ingestion and processing — allowing for high customization, quick integration of new data sources, and robust performance tailored to enterprise needs.

Why Build RAG In-House?

Building a system like ours involves ingesting customer data, which is then projected and indexed to build a comprehensive knowledge base. Enterprise customers need scale, reliability, data security, and data governance. While there are open source libraries available to quickly build similar systems with a RAG architecture, building and managing one for enterprise-grade use cases requires long-term investment. We aimed to ensure a high degree of control and customization throughout the architecture and to fit seamlessly into our internal Java-based development environment.

The RAG architecture, and the LLM application space in general, is rapidly evolving. We didn’t want to restrict ourselves to existing practices. Instead, we focused on fully customizing and efficiently managing every aspect, from data projection and contextual representation to data chunking, embedding creation, retrieval processes, and ensuring the safety of LLM responses.

Our goal was to build a platform capable of quickly adding new data sources while remaining agnostic to any specific source. Addressing source-specific concerns is primarily about securely connecting to customer-specific sources.

We strive to get the architecture right, at least directionally, and to optimize for execution speed by choosing implementations based on our team size and skill sets. The implementations can evolve independently over time, allowing us to adapt and refine the system as needed.

A Deep Dive Into Our Architecture

The diagram below shows both the functional and the managed-service aspects of the system we built.

Users can define data ingestion configurations for various systems in their environment, such as Slack, Confluence, GitHub, Jira, and more. They can also interact with the knowledge base in a ChatGPT-style interactive session that seamlessly blends proprietary data with general insights from the LLM, within their preferred operational tool, such as Slack, or on the web.

To provide a highly reliable managed service for each ingestion, we created the concept of JobClusters. A JobCluster instance acts as the owning entity for each defined ingestion, managing its SLAs and responding to any changes in user intent. Users can stop and restart ingestions if needed, or change their configuration.

As part of the managed service, each JobCluster creates an instance of a templatized job definition for the configured data source (more on this below). JobCluster instances submit a set of Jobs into our homegrown Job Execution Engine. The engine provides a horizontally scalable, highly reliable execution service. It supports the following (a minimal interface sketch follows this list):

  • Exactly-once semantics for job submission per JobCluster
  • A checkpoint store for efficient processing when resuming after a restart
  • The ability for a job to yield when its data source connector has caught up with the latest data in the customer’s data source and needs to pause and resume later, rather than tying up a job execution slot unnecessarily
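
To make this concrete, here is a minimal sketch of how a JobCluster-owned job might interact with the execution engine and a checkpoint store. The interface and method names are illustrative assumptions, not our actual internals.

    // Illustrative sketch only; class and method names are hypothetical,
    // not the actual internals of our Job Execution Engine.
    import java.util.Optional;

    // A connector signals one of these after each unit of work.
    enum JobOutcome { CONTINUE, YIELD, COMPLETE }

    interface CheckpointStore {
        Optional<String> load(String jobId);       // last persisted cursor, if any
        void save(String jobId, String cursor);    // persist progress for restarts
    }

    interface IngestionJob {
        String id();                               // stable id => exactly-once submission per JobCluster
        JobOutcome runOnce(Optional<String> cursor, CheckpointStore checkpoints);
    }

    final class JobExecutionEngine {
        private final CheckpointStore checkpoints;

        JobExecutionEngine(CheckpointStore checkpoints) { this.checkpoints = checkpoints; }

        void execute(IngestionJob job) {
            JobOutcome outcome;
            do {
                outcome = job.runOnce(checkpoints.load(job.id()), checkpoints);
            } while (outcome == JobOutcome.CONTINUE);
            // YIELD releases the execution slot; the engine re-schedules the job later.
        }
    }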

The Agent Orchestration Engine provides a reliable service to run agents triggered either by user conversations or by our system in response to incoming automated alerts. Currently, the agents perform actions such as analyzing the user query or incoming alert to determine intent and the appropriate actions to take via our Automation Platform. We adopt a HITL (human-in-the-loop) approach, allowing users to control tool calls or modify their invocation parameters.
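
As a rough illustration, a human-in-the-loop gate on tool calls could look something like the sketch below. The types and method names are hypothetical stand-ins rather than our actual Automation Platform API.

    // Hypothetical sketch of a human-in-the-loop gate for agent tool calls;
    // names and structure are illustrative, not the actual Automation Platform API.
    import java.util.Map;

    record ToolCall(String toolName, Map<String, String> parameters) { }

    interface ApprovalChannel {
        // Presents the proposed call to the user (e.g. in Slack) and returns the
        // possibly edited call, or null if the user rejects it.
        ToolCall requestApproval(ToolCall proposed);
    }

    final class HitlToolRunner {
        private final ApprovalChannel approvals;

        HitlToolRunner(ApprovalChannel approvals) { this.approvals = approvals; }

        String run(ToolCall proposed) {
            ToolCall approved = approvals.requestApproval(proposed);
            if (approved == null) {
                return "Tool call rejected by user";
            }
            // Dispatch the approved (possibly modified) call to the automation platform here.
            return "Executed " + approved.toolName() + " with " + approved.parameters();
        }
    }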

For user chat sessions, we maintain a history of user queries and system responses, along with the retrieved chunks and their scores from the RAG pipeline. We map each chunk to a clickable deep link into the user’s proprietary data source and surface those links in the chat response.
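
A simplified picture of what we keep per chat turn might look like the following record shapes; the field names are assumptions for illustration only, not our actual schema.

    // Illustrative data shapes only; field names are assumptions, not our actual schema.
    import java.util.List;

    record RetrievedChunk(String sourceSystem,   // e.g. "confluence", "slack"
                          String deepLink,       // clickable link back into the customer's tool
                          double score,          // retrieval score from the RAG pipeline
                          String text) { }

    record ChatTurn(String userQuery,
                    String systemResponse,
                    List<RetrievedChunk> retrievedChunks) { }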

Templatized Jobs: Streamlining Data Ingestion

One of the core innovations behind our AI Teammate is the use of templatized jobs with an IPSE (Ingest, Project, Sessionize, Embed) model, which allows us to efficiently handle the diverse and ever-growing number of data sources our customers rely on.

Given the variety of data source types, creating a unique orchestration module for each one would be tedious and inefficient. To address this, we’ve abstracted the ingestion orchestration into a set of standardized operations: Ingest, Project, Sessionize, and Embed. Our ingestion infrastructure manages the entire process for any data source, while the specific implementation for each new data source type provides the necessary logistics (a connector sketch follows this list), such as:

  • Source Connection Logistics: Establishing and maintaining a secure connection to the data source.
  • Data Projection: Transforming raw data from the source into a format suitable for embedding and storage in vector databases.
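
A minimal connector sketch under the IPSE model could look like this; the interface and record names are assumptions, not our internal APIs.

    // A minimal connector sketch under the IPSE model; interface and method
    // names are illustrative assumptions, not our actual internal APIs.
    import java.util.List;
    import java.util.Map;

    // Raw item as returned by the source system (Slack message, Jira issue, ...).
    record SourceItem(String id, String rawPayload) { }

    // Source-agnostic projection: text ready for chunking plus inferred metadata.
    record ProjectedItem(String text, Map<String, String> metadata) { }

    interface DataSourceConnector {
        void connect(Map<String, String> credentials);   // source connection logistics
        List<SourceItem> ingest(String cursor);           // pull the next batch after `cursor`
        ProjectedItem project(SourceItem item);           // source-specific data projection
    }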

Advanced Data Processing Techniques

  • Chunking Data: We divide the data into manageable chunks suitable for our RAG architecture. We control the overlap across chunks within a session to optimize context and the quality of our vector embeddings (a chunking sketch follows this list).
  • Creating Vector Embeddings: Using OpenAI's API, we generate vector embeddings for each data chunk and store them in our OpenSearch vector database. This enables efficient and accurate data retrieval.
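
For illustration, a minimal overlapping-chunking routine might look like the sketch below; the sizes are placeholders, and the embedding client is a stand-in rather than an actual OpenAI API call.

    // A minimal overlapping-chunking sketch; sizes and the embedding client are
    // illustrative assumptions rather than our production values or actual API calls.
    import java.util.ArrayList;
    import java.util.List;

    final class Chunker {
        // Split `text` into chunks of at most `chunkSize` characters with `overlap`
        // characters shared between consecutive chunks to preserve context.
        static List<String> chunk(String text, int chunkSize, int overlap) {
            List<String> chunks = new ArrayList<>();
            int step = chunkSize - overlap;
            for (int start = 0; start < text.length(); start += step) {
                int end = Math.min(start + chunkSize, text.length());
                chunks.add(text.substring(start, end));
                if (end == text.length()) break;
            }
            return chunks;
        }
    }

    // Each chunk would then be embedded (e.g. via the OpenAI embeddings endpoint)
    // and indexed into an OpenSearch k-NN vector field; this client is a stand-in.
    interface EmbeddingClient {
        float[] embed(String chunk);
    }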

By standardizing these operations and using our managed template system, we can seamlessly integrate new data sources, ensuring efficient and reliable data ingestion and processing for our AI Teammate. This innovative approach allows us to scale our solution to meet the diverse needs of our customers while maintaining high standards of performance and reliability.

There are many design decisions behind these templatized jobs; here are some of the more distinctive aspects:

Dynamic Schema for Flexible Data Projection

Being able to introduce new data sources quickly is critically important for our users, who work with dozens of tools and large datasets. Rather than hard-coding a schema for each source, we introduced a dynamic schema that provides the speed and flexibility we needed.

The metadata schema is dynamically inferred, so adding a new data source only requires annotating the metadata fields to be extracted and writing the source-specific mapper function that populates them. The job template orchestrator ties all of this together while remaining agnostic to individual data sources.
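
To give a flavor of this, the sketch below shows one way an annotated metadata class and a reflection-based schema inferrer could fit together; the annotation and class names are assumptions, not our actual framework.

    // Illustrative sketch of dynamically inferred metadata; the annotation and
    // mapper names are assumptions, not our actual framework.
    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;
    import java.lang.reflect.Field;
    import java.util.HashMap;
    import java.util.Map;

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface MetadataField { String value(); }   // name of the field in the search index

    // Source-specific metadata class: adding a new source means writing one of these
    // plus a mapper that populates it; the orchestrator stays source-agnostic.
    final class JiraTicketMetadata {
        @MetadataField("project")  String project;
        @MetadataField("assignee") String assignee;
        @MetadataField("status")   String status;
    }

    final class SchemaInferrer {
        // Infer the index schema by reflecting over the annotated fields.
        static Map<String, Class<?>> infer(Class<?> metadataClass) {
            Map<String, Class<?>> schema = new HashMap<>();
            for (Field f : metadataClass.getDeclaredFields()) {
                MetadataField ann = f.getAnnotation(MetadataField.class);
                if (ann != null) schema.put(ann.value(), f.getType());
            }
            return schema;
        }
    }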

Maintaining Wire Speed with Asynchronous Boundaries

Our ingestion pipeline processes data at the speed it arrives, ensuring no delay in receiving the data. For tasks that require more processing time, such as sessionization and creating vector embeddings, we introduce an asynchronous buffer. This buffer allows us to maintain wire speed for initial data handling while accommodating the slower processing stages without delaying upstream ingestion.
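
A minimal version of this boundary can be expressed with a bounded queue between the stages, as in the sketch below; the buffer size and names are illustrative assumptions.

    // A minimal sketch of the asynchronous boundary: ingestion writes into a bounded
    // buffer at wire speed while a separate worker drains it for the slower
    // sessionize/embed stages. Sizes and names are illustrative assumptions.
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    final class AsyncBoundary {
        private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(10_000);

        // Called by the ingest stage as data arrives from the source.
        void onIngested(String projectedItem) throws InterruptedException {
            buffer.put(projectedItem);   // backpressure only when the buffer is full
        }

        // Run on a separate thread: drains the buffer for the slower stages.
        void embedLoop() throws InterruptedException {
            while (!Thread.currentThread().isInterrupted()) {
                String item = buffer.take();
                // sessionize(item); embed(item); index(item);  // slower downstream work
            }
        }
    }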

Sessionizing Data Streams for Better Context

Some data sources, like Confluence documents, Jira tickets, and GitHub pull requests, have natural boundaries that define a session, making it easy to understand and embed the data within its context.

Handling unstructured data like Slack conversations, however, is more difficult. How do you make sense of an individual Slack message that might say “Yeah, I’m working on that”? For continuous streams like Slack channels, we introduced sessionization, which groups related messages into sessions based on their context. This process ensures that each message is understood within the broader conversation, improving the quality of vector embeddings.
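
As a simplified illustration, sessionization by time gap within a single channel might look like the following sketch; the 15-minute threshold and record shapes are assumptions, not our production heuristics.

    // A simplified sessionization sketch: group consecutive messages in a channel
    // into one session when the gap between them is small. The 15-minute threshold
    // and record shapes are illustrative assumptions.
    import java.time.Duration;
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    record Message(String channel, Instant timestamp, String text) { }

    final class Sessionizer {
        private static final Duration MAX_GAP = Duration.ofMinutes(15);

        // Assumes `messages` is already sorted by timestamp within a single channel.
        static List<List<Message>> sessionize(List<Message> messages) {
            List<List<Message>> sessions = new ArrayList<>();
            List<Message> current = new ArrayList<>();
            for (Message m : messages) {
                if (!current.isEmpty()) {
                    Instant last = current.get(current.size() - 1).timestamp();
                    if (Duration.between(last, m.timestamp()).compareTo(MAX_GAP) > 0) {
                        sessions.add(current);
                        current = new ArrayList<>();
                    }
                }
                current.add(m);
            }
            if (!current.isEmpty()) sessions.add(current);
            return sessions;
        }
    }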

Optimized Data Retrieval with Hybrid Queries and Time Decay

Once data is stored, retrieving it efficiently with a high degree of accuracy is critical. We enhance retrieval quality using hybrid queries that combine filtering, pattern matching, and vector matching with weighted scores. Time decay further refines this process by weighting search results based on their temporal proximity to a custom time point, ensuring efficient and relevant data retrieval.
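
One simple way to combine weighted hybrid scores with time decay is sketched below; the weights, the exponential decay form, and the half-life are illustrative assumptions rather than our production tuning.

    // Illustrative scoring sketch only; the weights, the exponential decay form,
    // and the half-life are assumptions, not our production tuning.
    import java.time.Duration;
    import java.time.Instant;

    final class HybridScorer {
        private static final double VECTOR_WEIGHT  = 0.7;
        private static final double KEYWORD_WEIGHT = 0.3;
        private static final double HALF_LIFE_DAYS = 30.0;

        // Combine vector and keyword scores, then decay by age relative to a
        // custom reference point (e.g. the time of the alert being investigated).
        static double score(double vectorScore, double keywordScore,
                            Instant docTime, Instant referenceTime) {
            double combined = VECTOR_WEIGHT * vectorScore + KEYWORD_WEIGHT * keywordScore;
            double ageDays = Math.abs(Duration.between(docTime, referenceTime).toHours()) / 24.0;
            double decay = Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
            return combined * decay;
        }
    }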

Ensuring Reliability with Continuous Testing and Validation

Validation is critical to a well-oiled RAG system. To ensure reliability and accuracy, we have implemented comprehensive end-to-end integration testing. These tests cover the entire system, from ingestion through retrieval-augmented generation. They are integrated into our CI/CD pipeline, allowing us to validate changes before deploying them to production. This framework ensures quick and confident iteration without disrupting customers’ workflows.

Leading the Way in Enterprise AI

Building our AI Teammate has provided valuable insights into scaling, data governance, and integration. Moving forward, we aim to refine our system, expand data source support, and enhance our Gen AI capabilities to meet evolving customer needs. Our continuous testing framework ensures we can deliver new features and improvements without disrupting production environments.

Our AI Teammate represents the forefront of enterprise GenAI solutions, leveraging innovative technology to transform how businesses manage and utilize their data. At Transposit, we’re committed to pushing the boundaries of what’s possible, delivering tools that drive efficiency, accuracy, and strategic advantage for our customers. Request a demo to see our AI Teammate in action.
