
AI Vector Search Technology
As its name suggests, and as numerous announcements on the official website confirm, Oracle Database 23ai has been integrated with artificial intelligence (AI) technology. One of the solutions in this area is the Vector Search functionality, which, as you will soon see, takes modern data processing to a whole new level. This is particularly important given the current trend in data analysis, where traditional index-based search methods are not always sufficient.
What is Vector Search?
Vector Search is a method that relies on data represented as vectors: sequences of numbers describing the most important features of objects. Thanks to this mechanism, the database engine is not limited to literal text comparison; instead, it assesses similarity based on the distances between vectors in multidimensional space. It is a bit like comparing points on a map: the closer two points are to each other, the higher the probability that the objects they represent are similar. Using such technology, we can find data that is semantically similar even though it is not literally identical. In other words, we can automatically measure how closely two seemingly different sentences match in meaning.
The figure below illustrates how similar the word “climate” is to the words “weather” and “heat”, but how far it is from the word “allergy”.
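The figure's intuition can be sketched in a few lines of Python. The tiny 2-D vectors below are invented purely for illustration (real embedding models produce hundreds of dimensions), but the cosine-distance calculation is the same one used for semantic comparison:

```python
import math

# Toy 2-D "embeddings", invented for illustration only.
vectors = {
    "climate": [0.9, 0.4],
    "weather": [0.8, 0.5],
    "heat":    [0.7, 0.6],
    "allergy": [0.1, 0.9],
}

def cosine_distance(a, b):
    """1 - cosine similarity: 0 = same direction, larger = less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

for word in ("weather", "heat", "allergy"):
    print(word, round(cosine_distance(vectors["climate"], vectors[word]), 4))
```

With these toy vectors, "weather" comes out closest to "climate" and "allergy" farthest away, mirroring the figure.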

Let’s choose a model
To use Vector Search technology, we need to download a ready-made, pre-trained model and place it in a specially created directory on the server. If the machine has Internet access, all of these steps can be performed directly from the console:
$ mkdir -p /u01/app/oracle/models
$ cd /u01/app/oracle/models
$ wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip
$ unzip all_MiniLM_L12_v2_augmented.zip
$ rm all_MiniLM_L12_v2_augmented.zip
What is all-MiniLM-L12-v2?
all-MiniLM-L12-v2, available on Hugging Face, is a compact sentence-embedding model that takes a sentence or a short paragraph and maps it to a 384-dimensional vector. These vectors can then be used for a variety of natural language processing (NLP) tasks, such as semantic search and clustering.
Loading the Model into the Database
Now let's connect to the PDB1 database, create a dedicated user "VSEARCHUSER", then create a DIRECTORY object named "VSEARCH_DIR" pointing at the model location, and grant read and write permissions on it to the new user:
$ sql / as sysdba
SQL> ALTER SESSION SET CONTAINER=PDB1;
SQL> CREATE USER VSEARCHUSER IDENTIFIED BY Password1 QUOTA UNLIMITED ON USERS;
SQL> GRANT CREATE SESSION, DB_DEVELOPER_ROLE, CREATE MINING MODEL TO VSEARCHUSER;
SQL> CREATE OR REPLACE DIRECTORY VSEARCH_DIR AS '/u01/app/oracle/models';
SQL> GRANT READ, WRITE ON DIRECTORY VSEARCH_DIR TO VSEARCHUSER;
Let’s load the model into the PDB1 database using the DBMS_VECTOR package:
$ sql VSEARCHUSER@localhost:1521/pdb1
Password? (**********?) *********
SQL> BEGIN
       DBMS_VECTOR.DROP_ONNX_MODEL (
         model_name => 'ALL_MINILM_L12_V2',
         force      => true);
       DBMS_VECTOR.LOAD_ONNX_MODEL (
         directory  => 'vsearch_dir',
         file_name  => 'all_MiniLM_L12_v2.onnx',
         model_name => 'ALL_MINILM_L12_V2');
     END;
     /
Once loaded, we can check the model information by viewing the USER_MINING_MODELS view:
SQL> SELECT MODEL_NAME, ALGORITHM, MINING_FUNCTION FROM USER_MINING_MODELS
WHERE MODEL_NAME = 'ALL_MINILM_L12_V2';
MODEL_NAME           ALGORITHM    MINING_FUNCTION
____________________ ____________ __________________
ALL_MINILM_L12_V2    ONNX         EMBEDDING
Data for analysis
The next step is to obtain suitable data. For this exercise, we need a data set containing descriptive text. Because of the model we will use, the text should be in English. If you already have data that meets these criteria, you can use it. In this exercise, we will use a CSV file containing BBC news from the last few years. However, you do not have to limit yourself to this source. There is a lot of useful data available on the Internet, provided free of charge, usually by public administration bodies but also by some private companies.
Why is this type of data so readily shared? In the case of state institutions, the main goal is not only to increase the transparency of their operations, but also to allow society to conduct its own analyses. Sometimes such analyses can reveal correlations where no one would expect them.
Although analyses of this type go beyond the scope of this book, let me cite a few surprising research results from the news segment. A study by MIT showed that fake news spreads six times faster than real information, probably due to its emotional nature, which makes it more engaging, especially among social media users.
Other studies have shown that bad news gets, on average, 30% more clicks than good news. Meanwhile, a study published in PNAS in 2020 showed that around 60% of readers form their opinion based only on the headline, even when the content of the article contradicts it. As you can see, we would not know any of this if it were not for well-conducted research.
When starting such analyses, we need good data-filtering mechanisms. Standard mechanisms based on searching for phrases in the text work great for high-quality, structured data. When the data is less structured, however, and contains free-form descriptions or other non-uniform content, we need more advanced methods such as the Vector Search technology described here. Thanks to it, we will find records based on semantic similarity, not just literal character matching.
Let’s go back to our exercise. Using the address:
https://www.kaggle.com/datasets/gpreda/bbc-news
let’s download the file “bbc_news.csv” which contains BBC news from the last few years.

If necessary, create a free account on the service.
Next, let's connect to the PDB1 database with SQL Developer, using the credentials of the "VSEARCHUSER" user, and import the downloaded CSV file into a new table using the "Import Data…" functionality available from the context menu of the Tables branch.

In the individual windows, you should enter the parameters shown on the screens below.
In the first stage, let’s indicate the file and use the comma as the data separator (delimiter).

In the second stage, we can optionally enter a limit on the number of data records that will be imported. This will speed up the process of generating vectors that we will perform in the rest of the exercise. Let’s enter “BBC_NEWS” as the table name:

In the third step, let’s leave the default values, i.e. import all available columns:


Then after reviewing the summary, we are ready to import:

As some of the records may pose problems, we can skip them and ignore the errors by clicking the “Yes” button:

After the entire process completes, the data is imported into the table:

The table now holds over 34,000 records.

Looking at the table structure, we can see that it contains five columns, the most interesting of which is "TITLE":
SQL> DESC VSEARCHUSER.BBC_NEWS;
Name           Null?    Type
______________ ________ ________________
TITLE                   VARCHAR2(128)
PUBDATE                 VARCHAR2(128)
GUID                    VARCHAR2(128)
LINK                    VARCHAR2(128)
DESCRIPTION             VARCHAR2(256)
Now let's add a new column, "TITLE_VECTOR", in which we will store the vector representation of the title. We will use the new "VECTOR" data type:
SQL> ALTER TABLE VSEARCHUSER.BBC_NEWS ADD (
TITLE_VECTOR VECTOR
);
SQL> DESC VSEARCHUSER.BBC_NEWS;
Name            Null?    Type
_______________ ________ ________________
TITLE                    VARCHAR2(128)
PUBDATE                  VARCHAR2(128)
GUID                     VARCHAR2(128)
LINK                     VARCHAR2(128)
DESCRIPTION              VARCHAR2(256)
TITLE_VECTOR             VECTOR
Let's fill the new column with vector data using the following statements:
SQL> UPDATE VSEARCHUSER.BBC_NEWS
SET TITLE_VECTOR = VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING TITLE AS DATA);
SQL> COMMIT;
Depending on the number of records, this operation can take from a few minutes to even a few dozen minutes. Once it finishes, our table will have grown by an additional, populated column:

Search using artificial intelligence methods
Now let's perform a search using the VECTOR_DISTANCE function. It accepts two vectors and returns the distance between them (by default, the cosine distance). In the following examples, we create a vector from the search text, then sort the query result by the distance between that vector and the vectors stored in the table. Let's query the database for "Climate news from Europe":
SQL> VARIABLE SEARCH_TEXT VARCHAR2(100);
SQL> EXEC :SEARCH_TEXT := 'Climate news from Europe';
SQL> SELECT VECTOR_DISTANCE(TITLE_VECTOR, VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING :SEARCH_TEXT AS DATA)) AS DISTANCE,
            TITLE
     FROM VSEARCHUSER.BBC_NEWS
     ORDER BY 1
     FETCH APPROXIMATE FIRST 40 ROWS ONLY;
DISTANCE              TITLE
_____________________ __________________________________________________________________
0.35682225227355957 European weather: Winter heat records smashed all over continent
0.3730506896972656 Europe weather: How heatwaves could forever change summer holidays abroad
0.39855557680130005 The weather report which changed the course of history
0.41042137145996094 Climate change: Last year's UK heatwave 'a sign of things to come'
0.412065327167511 Europe and US heatwaves near 'impossible' without climate change
0.4154854416847229 Europe heatwave: Nearly all major Italian cities on red heat alert
0.41764819622039795 Europe heatwave: Outdoor events banned in parts of France
0.4258374571800232 Europe heatwave: Temperatures to soar in Greece as fires still burn
0.4332166314125061 Will European heatwaves make us change our summer holiday plans?
0.43449318408966064 Europe wildfires: Are they linked to climate change?
0.43486589193344116 Watch: How hot will it get in southern Europe heatwave?
It really works!
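The query above can be mimicked in plain Python to make explicit what the database does: compute the distance from the query vector to every stored vector, sort ascending, and keep the top rows. The titles and 3-D vectors below are invented stand-ins for the real 384-dimensional embeddings held in TITLE_VECTOR:

```python
import math

def cosine_distance(a, b):
    """Same idea as Oracle's VECTOR_DISTANCE with the cosine metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Invented stand-ins for rows of BBC_NEWS (TITLE, TITLE_VECTOR).
titles = {
    "European weather: Winter heat records smashed": [0.8, 0.5, 0.1],
    "Europe heatwave: Outdoor events banned in France": [0.7, 0.6, 0.2],
    "Ways to save money even when the budget is tight": [0.1, 0.2, 0.9],
}

# Stand-in for VECTOR_EMBEDDING(... USING :SEARCH_TEXT AS DATA).
query_vector = [0.9, 0.4, 0.1]

# ORDER BY distance, FETCH FIRST 2 ROWS ONLY, done by hand.
results = sorted((cosine_distance(query_vector, v), t) for t, v in titles.items())
for distance, title in results[:2]:
    print(round(distance, 4), title)
```

The weather-related titles rank ahead of the money-saving one, just as the semantically closest headlines ranked first in the SQL output.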
Now let’s search for the phrase: “How to spend the summer months?”:
SQL> EXEC :SEARCH_TEXT := 'How to spend the summer months?';
SQL> SELECT VECTOR_DISTANCE(TITLE_VECTOR, VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING :SEARCH_TEXT AS DATA)) AS DISTANCE,
            TITLE
     FROM VSEARCHUSER.BBC_NEWS
     ORDER BY 1
     FETCH APPROXIMATE FIRST 40 ROWS ONLY;
DISTANCE              TITLE
_____________________ __________________________________________________________________
0.3601928949356079 Where has summer gone?
0.43250101804733276 Cost of living: Saving money on our summer holiday
0.4412391185760498 Where has the UK's summer gone?
0.46068572998046875 Where is summer and why is it so cold?
0.48274606466293335 How to manage your holiday in a heatwave
0.4867743253707886 Is summer finally about to turn up?
0.49696075916290283 Ways to save money even when the budget is tight
0.4971854090690613 Coolest UK summer since 2015
0.4978260397911072 Summer holiday getaway: How to beat the queues
0.4981515407562256 Today's the summer solstice - here's what you need to know
0.5029231905937195 Winter tips: Staying warm and coping with the cold weather
0.5042414665222168 Your pictures on the theme of 'summer walks'
0.5048672556877136 How to save money even when the budget is tight
0.5061560273170471 How to stay warm and cope with the cold weather
0.5071080923080444 Energy saving tips: Five ways to cut costs this winter
0.5082815289497375 Will the UK's warm spring lead to a hot summer?
0.5093449354171753 Tips for staying warm and coping with the cold weather
0.5280313491821289 What are airports doing to avoid summer holiday chaos?
0.532344400882721 Summer holiday: 'We spent 15 hours on a flight to nowhere'
Here too the result looks good!
One might think that, by now, a new database version, and especially its way of searching for information, could no longer surprise us with anything special.
And yet, here we are!