Building AI Features

This guide shows you how to infuse your applications with AI capabilities using the AI Plugin.

Overview

Developers can build AI features into their applications using Servoy's AI Plugin, a developer-focused toolkit that enables a broad range of functionality with today's latest Large Language Models (LLMs).

How It Works

Developers can use the plugin to programmatically interact with models of their choosing for chat, vector embedding/search and agentic features. This enables them to infuse business applications with AI capabilities for a broad range of use cases.

Use Cases

The potential use cases are virtually limitless, but we'll break them down into a few categories and patterns.

Natural Language Interfaces (NLI)

Business Apps are NOT User-Friendly: In the real world, we deal with systems of record, structured data, and rigid business rules. Despite our best efforts in modern UI design, the end-user has always been forced to reckon with the inherent structure of the underlying system.

Better UX through NLIs: LLMs offer the potential to break this age-old incompatibility by allowing users to interact with systems using natural language (text, but also images and sound), both by understanding user intent as input and by explaining results as output.

For example, a user today may be forced to review a complicated BI report when all she really needs is an answer to a question, a sharper insight, or help with a decision.

Perhaps she could just ask in her own words and be answered in plain language, or even with a picture. Here are some more examples:

  • Let users interact with the application using everyday language

  • Turn plain-language requests into searches, filters, and reports

  • Reduce the need for complex or highly customized UI

  • Support flexible input instead of rigid forms

  • Make advanced features easier to access without training

  • Capture user intent without requiring knowledge of the data model

Knowledge Retrieval

Most organizations rely on an ecosystem of data and information. AI-infused business applications can provide context on demand.

  • Let users find information by meaning, not just keywords

  • Surface relevant documents and records automatically

  • Find similar cases, issues, or documents

  • Answer questions using your own content and data

  • Ground AI responses in known, trusted sources

Unstructured Data

Most organizations sit on large amounts of dark data — emails, documents, notes, attachments, images, PDFs, and free-text fields that are stored but rarely used. AI makes it possible to extract meaning from this unstructured content and turn it into data that applications can search, reason over and act on.

  • Turn document archives into structured, usable data

  • Automatically classify and tag content

  • Extract key information from messy or inconsistent inputs

  • Group and match similar content

  • Make large volumes of text searchable and actionable

Assistants

Assistants can be embedded directly into applications to provide contextual guidance, explanations, and support at the moment it’s needed. They help users understand data, decisions, and processes without leaving the context of their work.

Assistants commonly:

  • Provide in-app guidance and explanations

  • Help users understand records, screens, and decisions

  • Answer questions using application context

  • Support better human decision-making

  • Reduce friction in complex workflows

Agents

Agents go a step further than assistants by acting on behalf of the user. They can plan and execute multi-step tasks, call application services or external tools, and operate with varying levels of autonomy while remaining under application control. An agent would work in collaboration with a user or, using Servoy's Automation Tools, could run completely autonomously to fulfill tasks and pass control to users only as needed.

In general, agents can:

  • Execute multi-step tasks toward a defined goal

  • Invoke application services and external tools

  • Coordinate actions across systems or workflows

  • Operate with user oversight or approval

  • Automate repetitive or complex processes

Model Choice

Servoy’s AI plugin is intentionally model-agnostic. It does not lock you into a specific LLM, embedding model, or vector store. Instead, it provides a consistent integration layer that lets you choose, configure, and evolve the AI components that best fit your application, architecture, and compliance requirements.

What This Means in Practice

  • Choose your own LLMs, embedding models, and vector stores

  • Switch or combine providers without changing application logic

  • Manage your own API keys and credentials

  • Control usage limits, quotas, and rate limiting

  • Monitor and manage token consumption and costs

  • Decide where models run (cloud, private, or local)

  • Apply your own security, compliance, and data-handling policies

Why This Matters

  • Avoid vendor lock-in

  • Adapt quickly as models and providers evolve

  • Align AI usage with your organization’s governance and cost controls

Get Started

TBD

Building Chat Flows

Quick Overview

All language models work in essentially the same way: they take text as input and return text as output. A "chat completion" is simply a structured way to send prompts, instructions, and conversation history to a model and receive a response.

The model itself does not understand your application, data, or workflows. It is up to the developer to provide context, design prompt templates, manage conversation state, and connect the model’s output to application logic and UI.

The examples below show how chat completions can be used as a foundational building block that you can extend with context, retrieval, and application behavior.

You don't have to build a chatbot: While chat completions can be used to build chatbot-style interfaces, they are not limited to chat-based UX. In many applications, chat completions run entirely behind the scenes — generating explanations, interpreting user intent, transforming text, or driving application logic — without the end user ever seeing a “chat” interface.

Basic Chat Completion

This is the Hello World example: starting up a Chat Client, sending in a prompt, and receiving the response. It's just a few lines of code, and every use case builds upon this.

Build a Chat Client instance

First, you will need to create an instance of a ChatClient using a Builder object. There are different builders, depending on the model provider of your choice.

Here you see a couple of example builders for different providers, both taking several parameters. Some are optional, but the following are required:

  • API Key - Obtain this from your model's vendor after you set up your developer account.

  • Model Name - Obtain a list of compatible model names from your vendor of choice.

  • System Message - A guiding instruction for the model that shapes the type of responses it generates. For example, it could establish the model as a domain expert (e.g. US Copyright Law) or require that it always answers in strict JSON output.
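
Here is a minimal sketch of two such builders. The plugins.ai accessor and the factory and setter names (createOpenAiChatClientBuilder, createGeminiChatClientBuilder, apiKey, modelName, systemMessage, build) are illustrative assumptions; consult the plugin's API reference for the exact builders available for your provider.

    // Sketch only: accessor and builder names below are assumptions, not the confirmed plugin API.
    // OpenAI-style builder
    var chatClient = plugins.ai.createOpenAiChatClientBuilder()
        .apiKey(application.getServoyProperty('openai_api_key')) // load the key from servoy.properties
        .modelName('gpt-4o-mini')                                // any compatible model name from your vendor
        .systemMessage('You are a helpful assistant for the Orders application. Answer concisely.')
        .build();

    // Gemini-style builder (same pattern, different provider)
    var geminiClient = plugins.ai.createGeminiChatClientBuilder()
        .apiKey(application.getServoyProperty('gemini_api_key'))
        .modelName('gemini-1.5-flash')
        .systemMessage('Always answer in strict JSON.')
        .build();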

Handling API Keys: As a best practice, do not hardcode API keys; load them from a secure location, such as your servoy.properties file, e.g. application.getServoyProperty('openai_api_key');

Prompt the Model

Once you have created a ChatClient instance, sending your prompt to the model takes a single line of code: call the chat method, passing in the userMessage parameter. This method runs asynchronously and returns a JavaScript Promise object to manage the response. More on that below.
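
For example, assuming the chatClient instance built above:

    // chat() runs asynchronously and immediately returns a JavaScript Promise
    var userMessage = 'Summarize the open orders for customer ACME in two sentences.';
    var promise = chatClient.chat(userMessage);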

Handle the Response

Here you can see that the Promise object resolves to a ChatResponse object, which is passed into the handling function. Then simply call the getResponse method to get the String value of the response.
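
A minimal sketch of that handler, continuing from the promise above:

    promise.then(function (chatResponse) {
        // getResponse() returns the String value of the model's reply
        var answer = chatResponse.getResponse();
        application.output(answer);
    });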

Conversation Memory

Chat Models, contrary to many assumptions, don't actually preserve any state or memory. The illusion of a continuous conversation is created by sending the chat history with every call. Fortunately, Servoy's AI plugin handles this for you. All you have to do is enable memory.

Setting the Max Memory

To enable chat memory, simply set the max number of tokens that is "remembered" when you provision the ChatClient by calling maxMemoryTokens on the builder. After this, you can reuse the client object and every time you call chat(), it will automatically remember the last X tokens from your session to be used for context.
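
A sketch of enabling memory, reusing the assumed builder names from the first example (maxMemoryTokens is the builder method described above):

    var chatClient = plugins.ai.createOpenAiChatClientBuilder()
        .apiKey(application.getServoyProperty('openai_api_key'))
        .modelName('gpt-4o-mini')
        .systemMessage('You are a support assistant.')
        .maxMemoryTokens(2000) // keep roughly the last 2000 tokens of the conversation as context
        .build();

    // Reuse the same client; each chat() call automatically carries the recent history
    chatClient.chat('My printer shows as offline. What should I check first?').then(function (first) {
        application.output(first.getResponse());
        // the follow-up relies on memory to resolve what "that" refers to
        return chatClient.chat('I tried that and it still will not connect. What next?');
    }).then(function (second) {
        application.output(second.getResponse());
    });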

Token Management: Memory is not enabled unless specified, and it is recommended to use memory only for use cases where you need to keep a conversation thread as part of your context. Keep in mind your cost per token and your feature requirements when setting your max value.

Streaming Response

The response payload from a chat completion request is actually delivered in chunks as it is generated. However, the default approach is to call the then() method of the Promise object, which resolves to the entire response payload. This is really all you need if you are doing pure programmatic interactions.

However, if you are displaying the response content to the user, you may want to display the chunks as they become available.

To handle a streaming response, call the chat() method as before, but with slightly different parameters:

  1. String - The prompt input (same as the first example)

  2. Function - Called on each partial completion, receiving a String. You can append this String to a local variable to show the response as it is generated.

  3. Function - Called when the final chunk is generated, also receiving a String. You can append this String one last time to complete the transaction.

Remember that all you need to do is create a Form Variable and append the chunks to it in each callback. You can render the result with a data-bound component (such as a TextArea) as it is generated.
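
Putting that together, a streaming call might look like the following sketch. The three-argument chat() form follows the parameter list above; the form variable responseText and the prompt are hypothetical.

    // Form variable bound to a TextArea, declared elsewhere as: var responseText = '';
    chatClient.chat(
        'Explain the difference between gross margin and net margin.', // 1. the prompt
        function (partial) {                                           // 2. called for each partial chunk
            responseText += partial;                                   //    append to show progress live
        },
        function (lastChunk) {                                         // 3. called with the final chunk
            responseText += lastChunk;
        }
    );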

Vector Embedding and Search

Overview

Vector Embeddings are a way to represent text as numerical values that capture its meaning rather than its exact wording. By converting text into embeddings, applications can compare, search, and match content based on semantic similarity instead of keywords alone.

The Servoy AI plugin supports creating Embedding Models to generate embeddings from text and storing those embeddings in a Vector Store. Once stored, embeddings can be searched to find related or similar content, enabling features such as semantic search, similarity matching, and retrieval of relevant context for AI-driven workflows.

Developers control how embeddings are created, what content is embedded, where embeddings are stored, and how search results are used within the application. This makes vector search a flexible building block that can be applied to many use cases, including document search, case matching, recommendation, and context retrieval for language models.

Basic Text Embedding

While there are many use cases and approaches, all of them follow a general pattern:

  • Configure an Embedding Model to convert text into vector representations

  • Generate embeddings for documents, records, or other content

  • Store those embeddings in a Vector Store, along with relevant metadata

  • Convert the search input into an embedding using the same model

  • Search the vector store for the most semantically similar vectors

  • Use the search results in application logic, UI, or prompt construction

Create an Embedding Model

Embedding Models are a type of LLM that is specifically designed to generate vector embeddings. This is your first step. To create a model instance, you'll need:

  1. Your API key (e.g. for OpenAI or Gemini)

  2. The name of the model that you want to use
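
A minimal sketch, reusing the assumed plugins.ai accessor; the factory name and the text-embedding-3-small model name are illustrative only:

    var embeddingModel = plugins.ai.createOpenAiEmbeddingModelBuilder()
        .apiKey(application.getServoyProperty('openai_api_key')) // 1. your API key
        .modelName('text-embedding-3-small')                     // 2. the embedding model name
        .build();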

Create a Vector Store

While you can directly use the model to create vector embeddings (an array of numbers), it's most common to pass those vectors into a store. You can do this in one step by creating a store from the model, then generating the embeddings and storing them in a single line of code.

In this example, the embed() method takes a single argument:

  1. An Array of Strings to embed and store.
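
A sketch of that one-step pattern. The createInMemoryEmbeddingStore method name is an assumption; the embed() call with an Array of Strings follows the description above.

    // Create an in-memory store from the embedding model (method name assumed)
    var store = embeddingModel.createInMemoryEmbeddingStore();

    // Generate embeddings for each String and store them in a single call
    store.embed(['Amber ale, 5.2% ABV', 'Pilsner glassware set', 'Nitro cold brew coffee']);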

Once you have stored content as vectors in a store, you can do similarity searches.

In this example, we run a similarity search and print the resulting scores.
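
Here is a sketch of such a search. The search(text, maxResults) method and the getText()/getScore() accessors on each result are assumed names, not confirmed by this guide.

    var results = store.search('wheat beer', 5);
    for (var i = 0; i < results.length; i++) {
        // each result pairs the stored text with its similarity score (0-1)
        application.output(results[i].getText() + ' -> ' + results[i].getScore());
    }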

You can see that each item in the Vector Store is returned with a Similarity Score (0-1). This is a simple example, but if you take this to scale, you can embed, classify and search documents and unstructured data.

Document Understanding

Continuing with the vector embedding example, let's take a look at how one could digest and search an entire document.

Chunking and Embedding Documents

This example uses a Vector Store, as before, but this time the call to embed takes the following arguments:

  1. File (JSFile) - The file whose contents will be embedded.

  2. Chunk Size (Number) - The text content is split into smaller chunks, and each chunk is embedded separately. This value defines the maximum size of each chunk, measured in tokens.

  3. Chunk Overlap (Number) - Specifies how much text (in tokens) is shared between adjacent chunks. Using an overlap helps preserve context when content is split at arbitrary boundaries, reducing the risk of losing meaning across chunks.
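
For example, embedding a PDF manual might look like this sketch; the file path and the chunk values are illustrative:

    // Read the document to embed
    var manual = plugins.file.convertToJSFile('/docs/product_manual.pdf');

    // Split into chunks of at most 400 tokens with a 50-token overlap, embed each chunk, and store it
    store.embed(manual, 400, 50);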

What is the ideal chunk strategy? It depends on the document content and use case. For general documents, start with 300-500 tokens and a 10-15% overlap. Very long-form text may need larger chunks to capture context. Highly structured (dense) text, such as code, lists, and tables, may require smaller chunks.

Searching Documents

Once you have chunked and embedded your document(s), you can search for matching chunks using the exact same approach from the previous example:
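
A sketch, using the same assumed search() method and result accessors as before:

    var results = store.search('How do I reset the device to factory settings?', 3);
    for (var i = 0; i < results.length; i++) {
        application.output(results[i].getScore() + ': ' + results[i].getText());
    }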

This will return an array of results, each containing the matching chunk and its similarity score.

Using Documents as a Knowledge Base

Let's build on this example to show how you can use documents as a knowledge base by leveraging vector search and chat together. Imagine that you have digested a repository of product manuals and the end-user can interact with the knowledge base via chat.
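
The sketch below combines the assumed search() call with the chat client from earlier; the prompt wording is illustrative.

    var question = 'How do I replace the filter on the X200?';

    // 1. Retrieve the most relevant chunk from the embedded manuals
    var results = store.search(question, 1);
    var context = results.length > 0 ? results[0].getText() : '';

    // 2. Pass the retrieved chunk to the chat model as context for the answer
    var prompt = 'Using only the following excerpt from our product manuals, answer the question.\n'
        + 'Excerpt: ' + context + '\n'
        + 'Question: ' + question;

    chatClient.chat(prompt).then(function (chatResponse) {
        application.output(chatResponse.getResponse());
    });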

In this example, we use the search result from the document as context for a chat session. (For simplicity, we chose only the top-ranking chunk, but you can imagine a more complex scenario that fuses and ranks multiple results to provide context for the chat input.)

Using Vector Metadata

When you embed text into vectors, the embedding captures meaning — but not where it came from. Metadata solves that by storing structured fields alongside each vector so you can (a) trace results back to the original source and (b) filter or scope searches to the right subset of content.

Practical Uses for Metadata

  • Traceability: show “this result came from Document X (chunk 12)”

  • Filtering: search only within a customer, tenant, project, category, date range, etc.

  • Security scoping: restrict retrieval to what the current user is allowed to access

  • Navigation: open the exact record or document location directly from results

Embedding Relational Data

Vector embeddings are not limited to documents and files. Text derived from relational data can also be embedded to enable semantic search, similarity matching, and intent-based retrieval over application records.

For example, suppose an end-user is searching a database of products and enters the keyword "beer".

In this approach, selected fields from a database record are combined into a textual representation and embedded as a vector, while metadata is used to preserve the record’s identity, type, and access scope. This allows applications to perform semantic search across structured records—such as customers, cases, tickets, or products—while still resolving results back to concrete records that users can view and act on.

Create Embeddings for Record Sets

In this example, Servoy's AI plugin offers a shortcut: By calling embedAll and passing a FoundSet object, you can embed a set of records in one simple call. You'll notice that the method is overloaded, so you can embed data from multiple text columns in a single call. Finally, you don't have to worry about metadata. The plugin will automatically store each record's primary key (PK) column(s).
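
A sketch of that call. The foundset, the column names, and the exact shape of the overload (separate String arguments per column) are assumptions for illustration:

    // Embed every record in the foundset; the PK column(s) are stored automatically as metadata
    var fs = datasources.db.example_data.products.getFoundSet();
    fs.loadAllRecords();

    // Overloaded call: embed the product name and description columns together
    store.embedAll(fs, 'product_name', 'description');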

Semantic Search of Record Embeddings

In this example, we search the vector store for products with a name similar to the input search text. Critically, we use the getMetadata method of the results to link each hit back to the products entity. This method returns a simple JavaScript object, from which you can directly access the PK value(s) by column name.
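
A sketch of that lookup. getMetadata() is the method named above; the search() call, the result accessors, and the productid PK column name are assumptions:

    var results = store.search('beer', 5);
    for (var i = 0; i < results.length; i++) {
        // getMetadata() returns a plain JavaScript object holding the record's PK column(s)
        var meta = results[i].getMetadata();
        application.output('Matched product ' + meta.productid + ' (score ' + results[i].getScore() + ')');
    }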

Types of Vector Stores

In-Memory Vector Store

Until this point, all the examples have used an In-Memory Vector Store. This implementation is great for getting started because it is easy to set up and offers the same functionality. Depending on your use case, it may be perfectly adequate.

When NOT to use an in-memory store:

As the name suggests, this type of vector store caches the vectors and therefore has the following limitations:

  • Not persistent — Vectors stored in memory will not persist beyond the current runtime in which they were embedded. Therefore it is not suitable for use across sessions.

  • Not Scalable — If you need to embed large volumes of data, then you should consider a proper Vector Database. In-memory is ideal for quick, one-off jobs, such as a single document. However, it's not uncommon for an organization to vectorize many documents in a single store. In this case, you would want to avoid the in-memory implementation.

Persistent Vector Store

Many relational databases include extensions to provide Vector Embedding and Search capabilities. Servoy's AI plugin gives you the option to connect to vector-enabled databases and use them as your store.

Since v2025.12, Servoy Developer ships PG Vector, a vector extension for the PostgreSQL database. You can use this out-of-the-box on your PostgreSQL servers.

In this example, we use the createServoyEmbeddingStoreBuilder method of the EmbeddingModel to create a persistent Vector Store, with the following options:

  • serverName — The target server (must have PG Vector or vector extension)

  • tableName — The name of the table that will be created on that server

  • recreate — Boolean, true if you want the persistence cleared (not used across sessions)

  • addText — Boolean, true if you want to add the plain (unembedded) text

  • metadataColumns — sub-builder to capture metadata, for later filtering and retrieval
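
A sketch of the builder call. createServoyEmbeddingStoreBuilder and the option names come from the list above; the fluent setter style is an assumption, and the metadataColumns sub-builder is omitted for brevity:

    var store = embeddingModel.createServoyEmbeddingStoreBuilder()
        .serverName('example_data')        // target server with the vector extension installed
        .tableName('product_embeddings')   // table that will be created on that server
        .recreate(false)                   // keep previously stored vectors across sessions
        .addText(true)                     // also store the plain (unembedded) text
        .build();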

Tool Calling

FAQ

How am I charged for model usage?

Can I use a local or open source model?
