# External Contents

As a Dydu bot manager, you can centralize and organize your external content sources directly in the BMS. The bot can then generate instant responses based on these sources, which improves the quality of the answers given to end users.

## **Create a collection**

Via the BMS navigation menu, you can access the External Content page: **Content > External Content**.

You will then arrive at your collections page, where you will find one collection created by default.

<figure><img src="/files/a0uY7GSbpVevgTcBZrSw" alt=""><figcaption></figcaption></figure>

Click this collection to open its editing page.

<figure><img src="/files/5wQTxx7Mm0wT9a5w0XpI" alt=""><figcaption></figcaption></figure>

## **Feed the collection**

### Importing your documents

<figure><img src="/files/g1h1u9YkO0CIhoa0Pyj0" alt=""><figcaption></figcaption></figure>

You can import one or more documents of the following types: PDF, DOCX, PPTX, TXT. Each document must not exceed 10 MB.

<figure><img src="/files/CNfCPn4ZfIDFmbtFzubP" alt=""><figcaption></figcaption></figure>
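As an illustration only, the import constraints above (accepted types and maximum size) can be checked before uploading with a small script. This is a sketch, not part of the BMS: the function name `checkImportable` is hypothetical, and counting 10 MB as 10 × 1024 × 1024 bytes is an assumption.

```javascript
// Accepted document types, as listed in the import dialog.
const ALLOWED_EXTENSIONS = ["pdf", "docx", "pptx", "txt"];
// Assumption: 10 MB is counted as 10 * 1024 * 1024 bytes.
const MAX_SIZE_BYTES = 10 * 1024 * 1024;

// Hypothetical helper: tells whether a file respects the import constraints.
function checkImportable(fileName, sizeInBytes) {
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(extension)) {
    return { ok: false, reason: "unsupported type" };
  }
  if (sizeInBytes > MAX_SIZE_BYTES) {
    return { ok: false, reason: "file too large" };
  }
  return { ok: true, reason: null };
}
```

For example, `checkImportable("guide.PDF", 2048)` reports the file as importable, while a `.xlsx` file or a file larger than the limit is rejected with a reason.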

### Adding SharePoint sources

<figure><img src="/files/XHu7wwEQClqQ1oLIrLbr" alt=""><figcaption></figcaption></figure>

The SharePoint indexing tool allows you to add your pages and files to your knowledge base.

To authorize this access, a new application with read permissions must be registered in your Microsoft environment. The complete process is explained in [this official tutorial](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application).

{% hint style="info" %}
During configuration, at the API permissions stage, two authorizations are required for the Dydu application. In "Microsoft Graph," then "Application Permissions," you must select the following rights and validate with administrator consent (Grant Admin Consent):

* Files.Read.All
* Sites.Selected
{% endhint %}

To finalize the application link, four technical elements must then be collected and saved:

* The client ID (client identifier)
* The client secret (secret value)
* The tenant ID (environment identifier)
* The SharePoint site ID

<figure><img src="/files/n0icpwSQoKNlRWtSs8Wh" alt="" width="563"><figcaption></figcaption></figure>

#### Details of the steps required to retrieve the necessary values from Azure for the Dydu LLM configuration:

Go to the [Azure portal](https://portal.azure.com/#home):

1. Click on App registrations.

<figure><img src="/files/DCSEYfKvQFQ7RMmAfQkZ" alt="" width="563"><figcaption></figcaption></figure>

2. Click on New registration.

<figure><img src="/files/SJ4yHpQNPkqg9hk7Q0bl" alt="" width="375"><figcaption></figcaption></figure>

3. Provide a name and click "Register".

<figure><img src="/files/Our43wlIxsgPHTEJAnxC" alt="" width="563"><figcaption></figcaption></figure>

4. The Application (client) ID is the client\_id.

<figure><img src="/files/3kBXmg89qtXH0k1d4t4O" alt=""><figcaption></figcaption></figure>

5. Click on Certificates & secrets. Then, under the "Client secrets" tab, click on New client secret.

<figure><img src="/files/KpU3FGNaM807M6E3ZwRs" alt=""><figcaption></figcaption></figure>

6. In the panel that opens, enter a description and an expiration period, then click Add.

<figure><img src="/files/9ZOUa1SyGLl42zmmqiMy" alt="" width="375"><figcaption></figcaption></figure>

7. Copy the generated Secret Value (client\_secret).

<figure><img src="/files/snTtVZPDuiGiQtAK5HuR" alt=""><figcaption></figcaption></figure>

8. Click on API permissions. Then click on Add a permission.

<figure><img src="/files/Dkqae7YTxaWqSAoBhntb" alt=""><figcaption></figcaption></figure>

9. Click on Microsoft Graph.

<figure><img src="/files/aWTM1bXvEe8G4fbunjfx" alt="" width="563"><figcaption></figcaption></figure>

10. Then click on "Application permissions". Add the Sites.Selected and Files.Read.All permissions.

<figure><img src="/files/NVY6phfBrkGKOvDbHMnv" alt="" width="563"><figcaption></figcaption></figure>

11. Click on Grant admin consent for XXXX.

<figure><img src="/files/EWvOPOvRXZJnxO2cYlV8" alt=""><figcaption></figcaption></figure>

12. To find the Tenant ID:

Go to the site: <https://entra.microsoft.com/>

Click on "Overview".

<figure><img src="/files/prS4qZEX3CFqVt8g59qq" alt="" width="563"><figcaption></figcaption></figure>

The **Tenant ID** displayed on this page is the tenant ID to use.

13. To find the SharePoint ID:

Compose the following URL: `https://<tenant>.sharepoint.com/sites/<site-url>/_api/site/id`

The SharePoint ID is found within the result:

```
<d:Id
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"  
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"  
  xmlns:georss="http://www.georss.org/georss"
  xmlns:gml="http://www.opengis.net/gml"  
  m:type="Edm.Guid">
67a90b63-3384-495d-9456-66141cf4ac28
</d:Id>
```
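If you prefer not to read the XML by eye, the identifier can be extracted with a few lines of JavaScript. This is a minimal sketch: the helper names `sharePointSiteIdUrl` and `extractSiteId` are hypothetical, and the code assumes the response has the `<d:Id>` shape shown above.

```javascript
// Hypothetical helper: builds the URL described above from the tenant name and the site path.
function sharePointSiteIdUrl(tenant, sitePath) {
  return "https://" + tenant + ".sharepoint.com/sites/" + sitePath + "/_api/site/id";
}

// Hypothetical helper: extracts the GUID from the XML body returned by that URL.
// Returns null when no <d:Id> element containing a GUID is found.
function extractSiteId(xml) {
  const guid = "[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}";
  const pattern = new RegExp("<d:Id\\b[^>]*>\\s*(" + guid + ")\\s*</d:Id>");
  const match = xml.match(pattern);
  return match ? match[1] : null;
}
```

Paste the response body into a string and pass it to `extractSiteId` to obtain the value to enter as the SharePoint site ID.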

The tool offers the following features:

* Indexing of pages and files from an entire SharePoint site.
* Standard RAG usage, with the source URL of the SharePoint document displayed in the provided response.
* Optional authentication linking (SAML). In this case, the user must log in; the system retrieves their group memberships and filters responses according to their document access rights.

{% hint style="info" %}
Items that are not indexed:

* Files directly embedded within pages.
* Videos and certain specific formats (Excel, WMF, etc.).

Currently, the document retrieval and indexing process takes time (several minutes); the most frequent refresh rate is once per day.
{% endhint %}

### Adding a personalized FAQ

<figure><img src="/files/Xbql6b62Thzh1w8lqmK5" alt=""><figcaption></figcaption></figure>

To set up a personalized FAQ, simply provide the following information:

<figure><img src="/files/bEvMfj7JnL8nahEYhDcn" alt="" width="563"><figcaption></figcaption></figure>

* Name: Corresponds to the API address (URL) to be used.
* API Key.
* API Secret.
* List of IDs for the knowledge bases to be retrieved.

A single combination of API Key and API Secret allows access to multiple knowledge bases simultaneously.

Based on this information, all documents within the specified knowledge bases are automatically retrieved via the FAQ channel.

### Adding a Salesforce configuration

<figure><img src="/files/G8HCExMaNHk0FeFrHGhJ" alt=""><figcaption></figcaption></figure>

To set up a Salesforce configuration, simply provide the following information:

<figure><img src="/files/nYPlHrCN6uBW3jKDvhff" alt="" width="563"><figcaption></figcaption></figure>

* **Name**: Name of the integration
* **Client ID**: Client key from the Salesforce configuration
* **Client Secret**: Secret key from the Salesforce configuration
* **Content Access URL**: URL used to retrieve documents from your configuration

### Adding Website sources

<figure><img src="/files/c6NQcIId6unywOC6ga36" alt=""><figcaption></figcaption></figure>

There are three types of Websites that can be indexed:

#### Domain

When you provide a web address to crawl, the tool first looks for the site's sitemap to identify the pages to index.

If no sitemap is found, the crawl starts directly from the address you entered.

{% hint style="info" %}
If this address corresponds to a specific folder on your site, the search will only take place from that precise location.
{% endhint %}

<figure><img src="/files/CJ158syqTuNyxanwNI2a" alt="" width="563"><figcaption></figcaption></figure>
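To make the folder rule more concrete, here is one plausible way to express it in code. This is only a sketch of the idea, not the crawler's actual implementation: the function name `isInCrawlScope` and the exact matching rule (same origin, path under the starting folder) are assumptions.

```javascript
// Hypothetical check: is candidateUrl located under the folder of startUrl?
// Assumption: "same location" means same origin and a path below the start path.
function isInCrawlScope(startUrl, candidateUrl) {
  const start = new URL(startUrl);
  const candidate = new URL(candidateUrl);
  // Treat the start path as a folder so that "/docs" and "/docs/" behave the same.
  const folder = start.pathname.endsWith("/") ? start.pathname : start.pathname + "/";
  return candidate.origin === start.origin &&
    (candidate.pathname + "/").startsWith(folder);
}
```

With this rule, starting from `https://example.com/docs/` keeps `https://example.com/docs/setup.html` but leaves out `https://example.com/blog/news.html`.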

#### Sitemap

A sitemap acts like a map of a website. This file lists all the important pages of a site. If you select a sitemap, the tool will only crawl the addresses listed within it.

<figure><img src="/files/OJ3hnuxR1hDsktNvcWXZ" alt="" width="563"><figcaption></figcaption></figure>
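To preview which addresses a sitemap would expose to the crawler, you can list its `<loc>` entries yourself. The sketch below assumes a standard sitemap file in which each page address is wrapped in a `<loc>` element; the function name `listSitemapUrls` is hypothetical and the parsing is deliberately simple (regular expression, only `&amp;` decoded).

```javascript
// Hypothetical helper: returns the page addresses listed in a sitemap XML string.
function listSitemapUrls(sitemapXml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  for (const match of sitemapXml.matchAll(locPattern)) {
    // In XML, "&" inside a URL is written "&amp;"; restore it.
    urls.push(match[1].replace(/&amp;/g, "&"));
  }
  return urls;
}
```

An empty result usually means the file is not a sitemap or uses a different structure, in which case it is worth checking the file before adding it as a source.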

#### Specific URLs

By providing a list of web addresses (URLs), you precisely define the exact pages the tool should analyze.

<figure><img src="/files/ear2oGnJM21avsr77C3w" alt="" width="563"><figcaption></figcaption></figure>

### Collection Details

Information regarding the addition of your source to your collection will be displayed as follows:

<figure><img src="/files/ed6rBiZaYTX15CsiFh5z" alt=""><figcaption></figcaption></figure>

* **Name**: The name of your source.
* **Added by**: The bot manager's ID.
* **Creation date**: The date you added your source.
* **Preparation**: Status and actions related to source preparation.
* **Indexing**: Status and actions related to source indexing.
* **Last indexed on**: The date of the most recent indexing.
* **Actions**: Available actions for sources (edit, delete, and view details).

{% hint style="info" %}
Preparation is the individual data retrieval stage, during which the tool downloads and reads the content of each added source.

Indexing is the global stage that gathers and integrates all these sources into the knowledge base to allow the bot to generate responses. Any modification to a source requires restarting this global indexing.
{% endhint %}

Several statuses are available to track the progress of your content:

* **Waiting for action**: No action has been initiated on this source yet.
* **Scheduled**: Preparation or indexing of the source is scheduled and will run soon.
* **Canceled**: The preparation or indexing process was interrupted.
* **Preparing**: Downloading and reading the source data is in progress.
* **Ready**: Data has been successfully retrieved; the source is now awaiting indexing.
* **Preparation failed**: An error prevented data retrieval for this source.
* **Indexing in progress**: Data integration into the knowledge base is being processed.
* **Indexed**: The source is fully integrated into the base, and the bot can use it to generate responses.
* **Partial indexing**: The base requires an update. For example, a new source was added but has not yet been indexed with the rest.
* **LLM config test failed**: The process stopped due to an error in the language model configuration.
* **Configuration file not found**: A technical server error prevented the operation from completing correctly.

### Suggestions and Indexing

<figure><img src="/files/ZhyM0QyTj4VQ1XLNACno" alt=""><figcaption></figcaption></figure>

#### Prepare and index the collection

This main button allows you to simultaneously launch the preparation and indexing of your entire collection (including all configured sources).

By clicking the small adjacent arrow, you can access two specific options:

* Prepare collection only (without launching indexing).
* Index only items that have already been prepared.

It is also possible to act on an individual source: simply click directly on that source's status button to prepare or index it in isolation.

#### Suggest knowledge from the collection

This button allows you to prepare your collection in order to extract an Excel knowledge file, which you can then import directly into your bot.

Important points:

* No indexing: This action does not index the collection.
* No RAG: It does not allow the bot to use these documents to generate responses autonomously.

The sole purpose of this button is the creation of this export file.

### Details for collection items with "Completed with errors" status

Once indexing or suggestion is completed, you may see a "Completed with errors" status.

By clicking on the status, a report is displayed with the error details.

* Details of errors from Websites:

The report details show a percentage of successes and errors. A breakdown of HTTP error codes is provided.

<figure><img src="/files/IaLxFpSxB4CsAX5PsZHy" alt="" width="563"><figcaption></figcaption></figure>

Errors may be classified into different categories, such as server-side issues or others.

<figure><img src="/files/dH0bfeYdA4upU6HnTsTu" alt="" width="563"><figcaption></figcaption></figure>

* Details of errors from SharePoint:

The report details also show a percentage of successes and errors. The report provides full details on all pages that could not be retrieved, as well as the folders involved.

<figure><img src="/files/ZQMQCPKa7gGFbcAVIHcJ" alt=""><figcaption></figcaption></figure>

For each folder, it also specifies the particular files that could not be retrieved, allowing for clear identification of missing items.

<figure><img src="/files/NAVZPou4lSBNbaoZjjeR" alt="" width="563"><figcaption></figcaption></figure>

## **Collection configuration**

### Customizing responses

<figure><img src="/files/tVHuRdooEcLHU5viWRlv" alt=""><figcaption></figcaption></figure>

**Configuring the indexing parameters of a collection** allows you to precisely adapt the bot’s behavior to your business needs and the desired user experience. Each collection has a **dedicated card** where you can adjust several options to optimize the **relevance**, **length**, and **style** of generated answers, as well as the **selection of information sources**.

* **Temperature** defines the style of the bot’s answers: the higher the temperature, the more creative the answers can be; conversely, a low temperature favors strictly factual answers. This setting is especially useful to ensure that the tone and level of creativity of the bot match your usage context.
* **Number of output tokens** refers to the length of generated answers. You can choose between short, medium, or detailed answers depending on the complexity of the topics covered or your users’ preferences. Adjusting this parameter helps deliver more concise or, on the contrary, more in-depth information.
* **Minimum score required for answer sources** lets you filter the documents used by the bot: only sources with a score equal to or higher than the defined value will be considered in generating answers and displaying cited sources. This setting ensures that only sources deemed sufficiently relevant or reliable are used to build the answer.
* **Additional prompt** gives you the possibility to add specific context or an instruction that will always be considered when generating answers for the relevant collection. This free-text field allows you, for example, to impose a tone, specify a business instruction, or guide the bot on a sensitive topic.
* The **flexible management of the additional prompt** feature provides better control over the final prompt sent to the model. It allows users to view the complete final prompt and choose the precise placement of their additional prompt: at the beginning, in the middle, or at the end.

{% hint style="info" %}
You cannot modify the content of the final prompt itself; you can only insert the additional prompt and define its position to optimize the model's response.
{% endhint %}

<figure><img src="/files/92TY3QZflR7iwVfBUA6X" alt=""><figcaption></figcaption></figure>

#### Advanced Mode: Response Customization

By enabling advanced mode, new configuration options appear to control how the system selects information.

**Minimum score required for response sources**

In this block, you will find a new checkbox: **Enable/Disable score filtering before response generation**.

* **Disabled (default)**: All retrieved sources are used to generate the response, regardless of their relevance score.
* **Enabled**: The system applies a strict upstream filter. Only sources with a score greater than or equal to your defined minimum will be kept and used to draft the response.

<figure><img src="/files/GJNBZAOrJwlBNVPxucSz" alt="" width="563"><figcaption></figcaption></figure>
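The effect of this checkbox can be pictured with a short sketch. It is illustrative only: the function name `selectSourcesForGeneration` and the shape of the source objects (`{ title, score }`) are assumptions, not the engine's internal structures.

```javascript
// Hypothetical model of the checkbox: decides which retrieved sources are
// passed on to response generation.
function selectSourcesForGeneration(sources, minimumScore, filterEnabled) {
  if (!filterEnabled) {
    // Disabled (default): every retrieved source is used, whatever its score.
    return sources.slice();
  }
  // Enabled: keep only sources whose score is greater than or equal to the minimum.
  return sources.filter((source) => source.score >= minimumScore);
}
```

With a minimum score of 0.5 and the filter enabled, a source scored 0.5 is kept and a source scored 0.49 is dropped; with the filter disabled, both are used.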

**Response Generation**

This new block allows you to configure the amount of information sent to the model to build its response. You will find the following options:

* **Number of direct sources (Top K)**: This parameter is set on a scale of 1 to 10 and defines the number of text extracts (chunks) sent directly to the LLM. It is strongly advised to keep this value low (between 2 and 4). If set too high, the model may be overwhelmed by less relevant information, increasing the risk of hallucinations.
* **Enable/Disable LLM Rerank to improve response accuracy**: This option activates a second review of the extracts to keep only the most relevant ones. While this slightly slows down response generation, it greatly improves the quality and accuracy of the final result.

<figure><img src="/files/18ezHFONscxYvghm2bau" alt="" width="560"><figcaption></figcaption></figure>

**Specific Rerank Parameters**

When you check the LLM Rerank activation box, the interface adapts and new configuration options appear in the response generation block:

* **Pre-selection range (Top K)**: The initial parameter ("Number of direct sources") changes its name and behavior. The model will first scan these K extracts to identify the most relevant ones before keeping only the best (N) to answer. Unlike the classic mode, you can set a higher value here (between 10 and 30). This parameter is adjustable on a scale of 1 to 50.
* **Number of chunks to use (Top N)**: This new parameter (adjustable from 1 to 10) corresponds to the final number of extracts that will actually be used to write the response. The Rerank chooses these N best items from the pre-selection (K). It is recommended to keep this value low (between 2 and 4).
* **Processing power (Batch size)**: This setting defines the number of extracts processed at once from the Top K. A higher value speeds up the response processing time but requires more system resources. Note: The scale of this parameter automatically adapts to your Top K configuration (for example, if your Pre-selection range is set to 35, the Batch size can be configured from 1 to 35).

<figure><img src="/files/JnIB0fOF9uizn365UTDj" alt="" width="562"><figcaption></figcaption></figure>
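The interplay between Top K, Top N, and Batch size can be summarized by the following sketch. It is a simplified model under stated assumptions, not the engine's code: `rerankChunks` and `scoreBatch` are hypothetical names, the chunks are assumed to arrive already ordered by retrieval score, and `scoreBatch` stands in for the LLM call that rates one batch of extracts.

```javascript
// Hypothetical model of the rerank stage.
// chunks: extracts ordered by retrieval score (best first).
// scoreBatch: stand-in for the LLM call; receives a batch, returns one score per chunk.
function rerankChunks(chunks, scoreBatch, options) {
  const { topK, topN, batchSize } = options;
  if (batchSize < 1 || batchSize > topK) {
    throw new RangeError("batchSize must be between 1 and topK");
  }
  // 1. Pre-selection: only the first K extracts are examined.
  const preselected = chunks.slice(0, topK);
  // 2. Scoring: the pre-selection is processed batchSize extracts at a time.
  const scored = [];
  for (let start = 0; start < preselected.length; start += batchSize) {
    const batch = preselected.slice(start, start + batchSize);
    const scores = scoreBatch(batch);
    batch.forEach((chunk, index) => scored.push({ chunk, score: scores[index] }));
  }
  // 3. Final selection: keep the N best-rated extracts.
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, topN).map((entry) => entry.chunk);
}
```

In this model, a larger batch size means fewer calls to `scoreBatch` for the same Top K, which matches the trade-off described above between speed and resource usage.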

### Dynamic variables

**Dynamic variables** can be used in the additional prompt of each collection. For example, **`${capture.user_name}`** is automatically replaced by the actual value retrieved during the conversation or from a web service.

If a variable is not available, it is ignored or replaced by an empty string.\
This makes it possible to **personalize the instructions** sent to the RAG engine, resulting in answers tailored to each user’s context.

In order for the capture variables to be correctly replaced in the prompt, they need to be added to the parameters of the Web service: Dydu\_RAG.\
Here is an example with the capture variable `user_name`:

<figure><img src="/files/vO54Mri8q07AJvysIYne" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/6QRtQyzkbYpm0gyIpGZp" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
When the prompt is sent to the **RAG engine**, it no longer contains the variable in the form **`${capture.XXXX}`**, but directly its value.

For example, a prompt like:

* "*Give me the value contained in ${capture.city}*"

Will be sent to the engine as:

* "*Give me the value contained in Paris*"

If the variable **`${capture.city}`** contains "Paris".
{% endhint %}
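The substitution described above can be approximated as follows. This is a sketch of the behavior, not the platform's implementation: the function name `resolveCaptureVariables` is hypothetical, and unavailable variables are replaced by an empty string (one of the two behaviors mentioned above).

```javascript
// Hypothetical model of the substitution applied to the additional prompt.
// captures: object mapping capture variable names to their current values.
function resolveCaptureVariables(prompt, captures) {
  return prompt.replace(/\$\{capture\.([A-Za-z0-9_]+)\}/g, (placeholder, name) => {
    const available = Object.prototype.hasOwnProperty.call(captures, name);
    // Assumption: an unavailable variable is replaced by an empty string.
    return available ? String(captures[name]) : "";
  });
}
```

For example, calling it with the prompt from the warning above and `{ city: "Paris" }` yields "Give me the value contained in Paris".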

### Contextualizing RAG with metadata

It is possible to precisely target the documents used by the RAG to generate a response. To do this, you can filter content using the metadata associated with each document (such as a URL or a category). All metadata can be used for this filtering, with the exception of the score.

Example of metadata:

<figure><img src="/files/ZbROKBx3dZz8MKEiQ1r4" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
It is possible to view the metadata of your documents by clicking the button ![](/files/RMGpDMVP7KcQVdEsynno) of an indexed collection.
{% endhint %}

The configuration is done directly in the web service named `Dydu_RAG`. Simply add a new parameter named `metadataFilters`. The value of this parameter must be entered in this format: `[{"key": "key", "operator": "operator", "value": "value"}]`.

<figure><img src="/files/7tjVRDb48FH8BY9dA3Nu" alt="" width="563"><figcaption></figcaption></figure>

Three operators are available to define your filter:

* `EQUALS`: keeps only the content exactly matching the value.
* `NOT_EQUAL`: excludes content exactly matching the value.
* `SUB_STRING`: keeps content that includes all or part of the value.

For example, to limit the bot exclusively to Dydu product pages, use the `SUB_STRING` operator on the URL as follows: `[{"key": "url", "operator": "SUB_STRING", "value": "https://www.dydu.ai/produits/"}]`.
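To avoid quoting mistakes, the parameter value can be generated from a JavaScript object, and the operator semantics can be tried out locally. The sketch below is illustrative: `matchesFilter` is a hypothetical function that mirrors the three operators as described above (with `SUB_STRING` read as "the metadata value contains the filter value"), and it does not reproduce how the engine evaluates several filters together.

```javascript
// Build the value of the metadataFilters parameter from a plain object.
const metadataFilters = [
  { key: "url", operator: "SUB_STRING", value: "https://www.dydu.ai/produits/" }
];
const parameterValue = JSON.stringify(metadataFilters);

// Hypothetical local model of a single filter applied to one document's metadata.
function matchesFilter(metadata, filter) {
  const actual = String(metadata[filter.key] ?? "");
  switch (filter.operator) {
    case "EQUALS":
      return actual === filter.value;
    case "NOT_EQUAL":
      return actual !== filter.value;
    case "SUB_STRING":
      // Assumption: the metadata value must contain the filter value.
      return actual.includes(filter.value);
    default:
      throw new Error("Unknown operator: " + filter.operator);
  }
}
```

`parameterValue` is the string to paste into the parameter field, and `matchesFilter` lets you check in advance which of your document URLs or categories a filter would keep.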

### Displaying the RAG Score in Responses

To display the **metadata score** attributed by the RAG (Retrieval-Augmented Generation) model for each response provided, it is necessary to modify the `Dydu_RAG` **webservice** and integrate the information into the **response format**.

To retrieve the **RAG score**, the variable must be extracted from the webservice's return JSON.

Add the following line in the part of the script that reads the **returned JSON**, to **extract the score value**:

```
var score = returnedJson.metadata[i]["score"];
```

After extracting the value, you must define the **display variable** and format how the score will be presented.

This code checks for the existence of the score and formats it for display, for example, by adding a line break and the label:

```
var scoreDisplay = "";

// Formatting the score for display (e.g.: "Score : [value]")
if (score != undefined && score != null) {
    scoreDisplay = "<br/> Score : <br/>" + score;
}
```

This completes the configuration; the response score will now be displayed with every answer generated by the RAG.

***

**Properly configuring these parameters** allows you to obtain **relevant, reliable, and tailored answers**, while maintaining control over how the bot interacts with your users for each indexed data collection.

## **Content management**

### **Automatic reindexing**

<figure><img src="/files/u7qDn9cWjKRPnwXCveo2" alt="" width="563"><figcaption></figcaption></figure>

You can configure the **reindexing frequency** of collections using four modes: **none**, **daily**, **weekly**, or **monthly**.

* **None**: no reindexing is scheduled, data remains unchanged.
* **Daily**: reindexing is performed automatically every day at midnight.
* **Weekly**: reindexing takes place every Monday at midnight.
* **Monthly**: reindexing is performed on the first Monday of each month at midnight.

The day and time of reindexing are predefined and cannot be changed.\
This configuration allows you to adjust the **data update frequency** to your needs, while keeping the process simple and automatic.
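To anticipate when a collection will next be refreshed, the four modes can be modeled as below. This is a sketch under assumptions: the function name `nextReindexDate` and the mode identifiers are hypothetical, and midnight is computed in UTC because the time zone used by the platform is not stated here.

```javascript
// Hypothetical helper: returns the next scheduled reindexing after `now`,
// or null when no reindexing is scheduled.
// Assumption: "midnight" is interpreted in UTC.
function nextReindexDate(mode, now) {
  if (mode === "none") {
    return null;
  }
  // Next midnight strictly after `now`.
  const next = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1));
  if (mode === "daily") {
    return next;
  }
  if (mode === "weekly") {
    // Advance to the next Monday (getUTCDay() === 1).
    while (next.getUTCDay() !== 1) {
      next.setUTCDate(next.getUTCDate() + 1);
    }
    return next;
  }
  if (mode === "monthly") {
    // The first Monday of a month is the Monday whose day of month is 1 to 7.
    while (next.getUTCDay() !== 1 || next.getUTCDate() > 7) {
      next.setUTCDate(next.getUTCDate() + 1);
    }
    return next;
  }
  throw new Error("Unknown mode: " + mode);
}
```

Calling `nextReindexDate("weekly", new Date())` returns the date of the coming Monday at midnight (UTC in this sketch).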

### Content access optimization

<figure><img src="/files/dg2RRLt5uvmbmIeyBns5" alt="" width="563"><figcaption></figcaption></figure>

This option provides fast responses while maintaining high accuracy. It is essential when processing a large volume of data.

However, it may be less effective if your knowledge base is small. It is therefore recommended to test it first to verify its effectiveness on your content using the "Test the RAG" feature.

{% hint style="warning" %}
This option is not enabled by default.
{% endhint %}

