External Contents

As a Dydu bot manager, you can centralize and organize your external content sources from an intuitive interface in the BMS. The bot then uses these sources to generate instant answers, which improves the quality of responses provided to end users. To open the External Content page, go to Content > External Content in the BMS navigation menu.

Create a collection

In the "Your Collections" modal, enter a name for the collection you want to create, then click "Create."

The page for your collection is then displayed.

Feed the collection

Import files

You can import multiple documents in the following formats: PDF, DOCX, PPTX and TXT.
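Before a bulk import, it can help to check locally which files use one of the supported formats. The sketch below is a hypothetical client-side pre-check (the `split_by_support` function is illustrative and not part of the BMS, which performs its own validation). It compares file extensions case-insensitively against the four formats listed above.

```python
from pathlib import Path

# Formats accepted by the "Import files" feature, as listed on this page.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".txt"}


def split_by_support(filenames):
    """Hypothetical helper: separate importable files from the others.

    The extension check is case-insensitive, so "Slides.PPTX" is accepted.
    """
    accepted, rejected = [], []
    for name in filenames:
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


if __name__ == "__main__":
    ok, skipped = split_by_support(["guide.pdf", "Slides.PPTX", "budget.xlsx", "notes.txt"])
    print("to import:", ok)
    print("skipped:", skipped)
```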

Register SharePoint sites

The Dydu SharePoint reader enables the indexing of pages and files.
Please refer to this documentation.

You need to register a new application with read permissions in your tenant. The process is explained in the following tutorial:
https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application
When you reach the "API permissions" step, add the two required permissions:

Microsoft Graph -> Application Permissions -> Files.Read.All (Grant Admin Consent)
Microsoft Graph -> Application Permissions -> BrowserSiteLists.Read.All (Grant Admin Consent)

Make sure the Files.Read.All and BrowserSiteLists.Read.All permissions are granted to the Dydu application.
The following elements are required for the configuration:

a. Client ID

b. Client secret (the secret value)

c. Tenant ID

d. SharePoint site ID
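To show what the client ID, client secret and tenant ID are used for, the sketch below builds (without sending) a token request for the standard OAuth 2.0 client credentials flow of the Microsoft identity platform. Dydu performs authentication itself; that its SharePoint reader uses exactly this flow is an assumption, and `build_token_request` is a hypothetical name. The SharePoint site ID is not part of the token request: it only identifies which site to read.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client credentials token request.

    Targets the Microsoft identity platform v2.0 token endpoint. The
    ".default" scope asks for the application permissions already granted
    to the app registration (for example Files.Read.All).
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
        "grant_type": "client_credentials",
    }).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values: replace them with the ones copied from Azure.
    req = build_token_request("my-tenant-id", "my-client-id", "my-client-secret")
    print(req.get_method(), req.full_url)
    # Sending it with urllib.request.urlopen(req) returns JSON with an "access_token".
```

If this request fails with a consent error, the usual cause is that "Grant admin consent" was not clicked after adding the permissions.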

The steps below explain how to retrieve these values from Azure for the Dydu LLM configuration:

Go to the Azure portal:

Click on App registrations

Click on New registration

Give a name and click on "Register"

The Application (client) ID shown on the overview page is the client_id.

Click on Certificates & secrets. Then, in the "Client secrets" tab, click on New client secret.


Copy the value of the generated secret (client_secret). It is displayed only once, right after creation.

Click on API permissions. Then click on Add a permission.

Click on Microsoft Graph

Select "Application permissions", then add the Sites.Selected and Files.Read.All permissions.

Click on Grant admin consent for XXXX

To find the tenant ID:

Go to the website: https://entra.microsoft.com/

Click on "Overview".

The tenant ID is displayed in the Tenant ID field of this page.

To find the SharePoint ID:

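Open the following URL in a browser while signed in to SharePoint, replacing <tenant> and <site-url> with your own values: https://<tenant>.sharepoint.com/sites/<site-url>/_api/site/id

The SharePoint site ID is the GUID returned in the response: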

<d:Id
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"  
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"  
  xmlns:georss="http://www.georss.org/georss"
  xmlns:gml="http://www.opengis.net/gml"  
  m:type="Edm.Guid">
67a90b63-3384-495d-9456-66141cf4ac28
</d:Id>
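If you save the response shown above, the GUID can be extracted and validated programmatically. The sketch below assumes the XML format shown above (the endpoint can return other formats depending on the request headers); `parse_site_id` and `SAMPLE_RESPONSE` are hypothetical names used for illustration.

```python
import uuid
import xml.etree.ElementTree as ET

# Namespace of the <d:Id> element returned by /_api/site/id.
DATA_NS = "http://schemas.microsoft.com/ado/2007/08/dataservices"

SAMPLE_RESPONSE = """<d:Id
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
  m:type="Edm.Guid">
67a90b63-3384-495d-9456-66141cf4ac28
</d:Id>"""


def parse_site_id(xml_text):
    """Return the SharePoint site ID GUID from the XML response of /_api/site/id."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{DATA_NS}}}Id":
        raise ValueError(f"unexpected root element: {root.tag}")
    # uuid.UUID rejects malformed values and normalizes the GUID to lowercase.
    return str(uuid.UUID(root.text.strip()))


if __name__ == "__main__":
    print(parse_site_id(SAMPLE_RESPONSE))
```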

Features:

1. Indexing pages and/or files from an entire SharePoint site:

Standard RAG
Displaying the original SharePoint URL in the result provided by the RAG

Possible integration with SAML authentication:

The user must be authenticated via SAML
Their group memberships are retrieved
Documents can then be filtered by permission, so that each user only accesses the subset they are entitled to

Not indexed:

"Embedded" files in pages
Videos and some other file types (Excel, WMF, ...)
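The SAML-based permission filtering described above can be pictured as an intersection between the user's groups and the groups allowed on each document. The sketch below illustrates that principle under an assumption (Dydu's actual implementation is not documented here); the `Document` type, its `allowed_groups` field and `visible_documents` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    # Hypothetical: SharePoint groups allowed to read this document.
    allowed_groups: set = field(default_factory=set)


def visible_documents(documents, user_groups):
    """Keep only the documents that share at least one group with the user.

    user_groups would come from the SAML assertion of the authenticated user.
    A document with no allowed group is never returned.
    """
    user_groups = set(user_groups)
    return [doc for doc in documents if doc.allowed_groups & user_groups]


if __name__ == "__main__":
    docs = [
        Document("https://contoso.sharepoint.com/hr.pdf", {"hr"}),
        Document("https://contoso.sharepoint.com/vpn.docx", {"it", "sales"}),
    ]
    print([d.url for d in visible_documents(docs, ["it"])])
```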

Retrieving and indexing documents currently takes several minutes, and the most frequent possible refresh is once a day.

Register Smart Tribune

To configure a Smart Tribune source, the following information is required:

Name: the URL of the Smart Tribune API to use
API Key
API Secret
Knowledge base ID list: the IDs of the knowledge bases to retrieve (the same API key and API secret pair can grant access to several knowledge bases)

With this information, all documents contained in the specified knowledge bases are retrieved, regardless of their original channel (FAQ).
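Before filling in the form, the four values can be collected and checked together. The sketch below is a hypothetical container (`SmartTribuneSource` and `validate` are illustrative names, not a Dydu or Smart Tribune API). It only checks that each field listed above is present and that the name looks like an HTTP(S) URL.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class SmartTribuneSource:
    """Hypothetical container for the values required by the form."""
    name: str                 # URL of the Smart Tribune API to use
    api_key: str
    api_secret: str
    knowledge_base_ids: list  # one key/secret pair may cover several bases

    def validate(self):
        """Return a list of problems; an empty list means the values look complete."""
        problems = []
        parsed = urlparse(self.name)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("name must be the URL of the Smart Tribune API")
        if not self.api_key:
            problems.append("API key is missing")
        if not self.api_secret:
            problems.append("API secret is missing")
        if not self.knowledge_base_ids:
            problems.append("at least one knowledge base ID is required")
        return problems


if __name__ == "__main__":
    source = SmartTribuneSource("https://api.example.com", "my-key", "my-secret", ["kb-1", "kb-2"])
    print(source.validate() or "ready to register")
```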

Register Websites

Three types of website sources are available:

Domain

Sitemap

Specific URLs
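For the Sitemap type, the pages to index are typically those listed in the site's sitemap file, as defined by the sitemaps.org protocol. How Dydu's crawler consumes the sitemap is not documented here. As an assumption, the sketch below only shows how to read the page URLs from a standard sitemap, so you can check which pages such a source would cover. `urls_from_sitemap` is a hypothetical name, and sitemap index files are not followed.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/faq</loc></url>
</urlset>"""


def urls_from_sitemap(xml_text):
    """Return the page URLs (the <loc> of each <url>) listed in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


if __name__ == "__main__":
    for url in urls_from_sitemap(SAMPLE_SITEMAP):
        print(url)
```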

Once a source is added to your collection, the following information is displayed:

Name: the name of your source that you added
Added by: the bot manager's identifier
Created at: the date when you added your source
Status: the status of your source

There are four possible statuses:

"Waiting for action": status when no action has been taken.
"Completed": status when the operation (indexation or suggestion) is successful.
"Completed with errors": status when the operation (indexation or suggestion) was completed, but there are errors.
"Processing": status when the operation is ongoing.

Action: the actions you can perform on the source (delete, edit).
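The four statuses listed above behave like a small state machine. The sketch below is hypothetical (the `SourceStatus` enum and `derive_status` function are illustrative, not a Dydu API). It encodes the rules as described: no action yet, operation ongoing, finished successfully, or finished with errors.

```python
from enum import Enum


class SourceStatus(Enum):
    WAITING_FOR_ACTION = "Waiting for action"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    COMPLETED_WITH_ERRORS = "Completed with errors"


def derive_status(started, finished, error_count):
    """Hypothetical mapping from an operation's progress to the displayed status."""
    if not started:
        return SourceStatus.WAITING_FOR_ACTION
    if not finished:
        return SourceStatus.PROCESSING
    if error_count > 0:
        return SourceStatus.COMPLETED_WITH_ERRORS
    return SourceStatus.COMPLETED


if __name__ == "__main__":
    print(derive_status(True, True, 2).value)
```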

Suggestion and Indexation

Suggestion: suggest knowledge from the collection
Indexation: index the content of the collection

Details of items in the collection with the status "Completed with errors"

After an indexation or a suggestion, a source may end with the status "Completed with errors".

Click the status to display a report with the error details.

Details of Errors from Websites:

The report shows the success and error percentages.

A breakdown of HTTP error codes is displayed.

Errors are grouped into categories, for example server-side errors and other errors.
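The figures in the website report can be reproduced from a list of HTTP status codes: percentages, a count per code, and a count per category. The sketch below assumes that only 2xx responses count as successes and uses the standard HTTP status classes (4xx client errors, 5xx server errors) as categories. Dydu's exact categories may differ, and `summarize` is a hypothetical name.

```python
from collections import Counter


def http_category(code):
    """Classify an HTTP status code by its standard class."""
    if 500 <= code <= 599:
        return "server error"
    if 400 <= code <= 499:
        return "client error"
    return "other"


def summarize(status_codes):
    """Compute success/error percentages and error breakdowns (2xx = success)."""
    total = len(status_codes)
    errors = [code for code in status_codes if not 200 <= code <= 299]
    return {
        "success_pct": 100 * (total - len(errors)) / total if total else 0.0,
        "error_pct": 100 * len(errors) / total if total else 0.0,
        "by_code": Counter(errors),
        "by_category": Counter(http_category(code) for code in errors),
    }


if __name__ == "__main__":
    print(summarize([200, 200, 404, 503, 200]))
```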

Details of Errors from SharePoint:

The report also shows the success and error percentages.

The report provides comprehensive information about all pages that could not be retrieved, as well as the affected folders.

It also specifies, for each folder, the specific files that could not be retrieved, making it easier to identify missing elements.
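The per-folder view described above amounts to grouping failed file paths by their parent folder. The sketch below is minimal and assumes that failures are available as server-relative paths such as "/Shared Documents/HR/policy.pdf". This input format and the `group_failures_by_folder` function are hypothetical, because the actual report is displayed in the BMS.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_failures_by_folder(failed_paths):
    """Map each folder to the sorted names of files that could not be retrieved."""
    by_folder = defaultdict(list)
    for path in failed_paths:
        p = PurePosixPath(path)
        by_folder[str(p.parent)].append(p.name)
    # Sort folders and file names so the report is stable between runs.
    return {folder: sorted(files) for folder, files in sorted(by_folder.items())}


if __name__ == "__main__":
    report = group_failures_by_folder([
        "/Shared Documents/HR/policy.pdf",
        "/Shared Documents/HR/leave.docx",
        "/Shared Documents/IT/vpn.pptx",
    ])
    for folder, files in report.items():
        print(folder, files)
```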

Collection configuration

Configuring the indexing parameters of a collection allows you to precisely adapt the bot’s behavior to your business needs and the desired user experience. Each collection has a dedicated card where you can adjust several options to optimize the relevance, length, and style of generated answers, as well as the selection of information sources.

Temperature defines the style of the bot’s answers: the higher the temperature, the more creative the answers can be; conversely, a low temperature favors strictly factual answers. This setting is especially useful to ensure that the tone and level of creativity of the bot match your usage context.
Number of output tokens controls the length of generated answers. You can choose between short, medium, or detailed answers depending on the complexity of the topics covered or your users’ preferences. Adjusting this parameter helps deliver more concise or, on the contrary, more in-depth information.
Minimum score required for answer sources lets you filter the documents used by the bot: only sources with a score equal to or higher than the defined value will be considered in generating answers and displaying cited sources. This setting ensures that only sources deemed sufficiently relevant or reliable are used to build the answer.
Additional prompt gives you the possibility to add specific context or an instruction that will always be considered when generating answers for the relevant collection. This free-text field allows you, for example, to impose a tone, specify a business instruction, or guide the bot on a sensitive topic.
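The minimum score rule above works as a simple threshold filter. The sketch below is minimal and assumes each candidate source carries a numeric relevance score. The `Source` type and `filter_sources` function are hypothetical. A source whose score is exactly equal to the threshold is kept, and the original order is preserved.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    score: float  # hypothetical relevance score from the retrieval step


def filter_sources(sources, min_score):
    """Keep sources scoring at or above min_score; they can be used and cited."""
    return [s for s in sources if s.score >= min_score]


if __name__ == "__main__":
    candidates = [Source("Pricing FAQ", 0.82), Source("Old newsletter", 0.40), Source("Terms", 0.60)]
    print([s.title for s in filter_sources(candidates, 0.6)])
```

Raising the threshold reduces the number of cited sources but can also leave the bot with no usable source for borderline questions.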

Dynamic variables can be used in the additional prompt of each collection. For example, ${capture.user_name} is automatically replaced by the actual value retrieved during the conversation or from a web service.

If a variable is not available, it is ignored or replaced by an empty string. This makes it possible to personalize the instructions sent to the RAG engine, resulting in answers tailored to each user’s context.

For the capture variables to be replaced correctly in the prompt, they must be added to the parameters of the Dydu_RAG web service. For example, to use the capture variable user_name, add user_name to the parameters of Dydu_RAG.

When the prompt is sent to the RAG engine, it no longer contains the variable in the form ${capture.XXXX}, but directly its value.

For example, if the variable ${capture.city} contains "Paris", a prompt like:

"Give me the value contained in ${capture.city}"

is sent to the engine as:

"Give me the value contained in Paris"
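The substitution can be sketched as a regular-expression replacement. This sketch illustrates the behavior described above and is not Dydu's code. For unavailable variables it uses the empty-string option. The `captures` dictionary is a hypothetical stand-in for the values passed to the Dydu_RAG web service, and the assumption is that variable names contain only letters, digits and underscores.

```python
import re

# Matches ${capture.NAME}, where NAME uses letters, digits or underscores (assumption).
CAPTURE_PATTERN = re.compile(r"\$\{capture\.(\w+)\}")


def render_prompt(prompt, captures):
    """Replace each ${capture.NAME} with its value, or "" when it is unavailable."""
    return CAPTURE_PATTERN.sub(lambda m: str(captures.get(m.group(1), "")), prompt)


if __name__ == "__main__":
    print(render_prompt("Give me the value contained in ${capture.city}", {"city": "Paris"}))
```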

Properly configuring these parameters allows you to obtain relevant, reliable, and tailored answers, while maintaining control over how the bot interacts with your users for each indexed data collection.

Automatic reindexing

You can configure the reindexing frequency of collections using four modes: none, daily, weekly, or monthly.

None: no reindexing is scheduled, data remains unchanged.
Daily: reindexing is performed automatically every day at midnight.
Weekly: reindexing takes place every Monday at midnight.
Monthly: reindexing is performed on the first Monday of each month at midnight.
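The schedule above can be expressed as a "next run" computation. The sketch below is hypothetical. The mode strings and the `next_reindex` and `first_monday` functions are illustrative names, and the sketch uses naive datetimes because the time zone in which "midnight" is evaluated is not stated on this page. It returns the first scheduled run strictly after `now`.

```python
from datetime import datetime, timedelta

MONDAY = 0  # value returned by datetime.weekday() for Monday


def first_monday(year, month):
    """Midnight on the first Monday of the given month."""
    first = datetime(year, month, 1)
    return first + timedelta(days=(MONDAY - first.weekday()) % 7)


def next_reindex(mode, now):
    """Return the next scheduled reindexing after `now`, or None for mode "none"."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if mode == "none":
        return None
    if mode == "daily":
        return midnight + timedelta(days=1)
    if mode == "weekly":
        # Days until the next Monday; a full week if today is already Monday.
        days_ahead = (MONDAY - now.weekday()) % 7 or 7
        return midnight + timedelta(days=days_ahead)
    if mode == "monthly":
        candidate = first_monday(now.year, now.month)
        if candidate <= now:
            year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
            candidate = first_monday(year, month)
        return candidate
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    now = datetime(2024, 5, 15, 10, 30)
    for mode in ("none", "daily", "weekly", "monthly"):
        print(mode, next_reindex(mode, now))
```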

The day and time of reindexing are predefined and cannot be changed. This configuration allows you to adjust the data update frequency to your needs, while keeping the process simple and automatic.
