I have been working on implementing a semantic/vector search system for images.
My approach uses gpt-4o-mini to analyze each image and generate metadata with the following prompt:
Your task is to create JSON data based on a given image.
Provide your output in the format below:
{
description: "Brief description of the image, focusing on relevant keywords only.",
text: "Include any text present in the image here, or omit this field if there is none",
keywords: "Keywords describing the content of the image",
artstyle: "The artistic style portrayed in the image",
text_language: "Specify the language of any text within the image, otherwise exclude this field",
design_theme : "Identify any theme present in the image (e.g., hobby, interest, occupation), otherwise remove this field",
}
The data I get back from this step looks accurate enough to me. I then embed the JSON data with the "text-embedding-3-small" model.
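For reference, here is a minimal sketch of the embedding step. It assumes the whole JSON object is serialized to a single string and that empty fields are dropped first; the helper names `build_embedding_input` and `embed` and the sample record are hypothetical, and the real pipeline may differ.

```python
import json


def build_embedding_input(record: dict) -> str:
    """Serialize the generated JSON into one string for embedding.

    Assumption: the whole object is embedded as a single string, and
    fields that are empty or missing are dropped beforehand.
    """
    cleaned = {key: value for key, value in record.items() if value}
    return json.dumps(cleaned, ensure_ascii=False, sort_keys=True)


def embed(text: str) -> list[float]:
    """Request an embedding (needs the openai package and an API key)."""
    # Imported lazily so build_embedding_input works without the package.
    from openai import OpenAI

    client = OpenAI()
    response = client.embeddings.create(
        model="text-embedding-3-small", input=text
    )
    return response.data[0].embedding


# Hypothetical record shaped like the prompt's output format.
record = {
    "description": "bold white text on black background",
    "text": "straight outta valhalla",
    "keywords": "typography, viking, parody",
    "text_language": "English",
    "design_theme": "",
}
embedding_input = build_embedding_input(record)
print(embedding_input)
```

With this layout the text found in the image is only one of several fields inside the embedded string, which is worth keeping in mind when comparing it against a short query.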
However, I have encountered issues with the quality of the search results.
For example: There are two images containing only text. One reads "straight outta knee surgery" while the other says "straight outta valhalla."
Upon searching for "straight outta," I find that I need to lower the similarity threshold to 0.15 in order to retrieve both results.
Below is my PostgreSQL search function:
CREATE OR REPLACE FUNCTION search_design_items (
    query_embedding vector(1536),
    match_threshold FLOAT,
    match_count INT
) RETURNS TABLE (
    id BIGINT
) AS $$
BEGIN
    RETURN QUERY
    -- Qualify id with the table name: a bare "id" is ambiguous with the
    -- output column declared in RETURNS TABLE.
    SELECT design_management_items.id
    FROM public.design_management_items
    -- <=> is pgvector's cosine distance, so 1 - distance is cosine similarity.
    WHERE 1 - (design_management_items.description_vector <=> query_embedding) > match_threshold
    ORDER BY design_management_items.description_vector <=> query_embedding ASC
    LIMIT match_count;
END;
$$ LANGUAGE plpgsql;
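To make the filter explicit, here is a small pure-Python sketch of what the function computes: cosine distance (what pgvector's `<=>` returns), a similarity cutoff of `1 - distance > match_threshold`, ordering by distance, and a row limit. The toy three-dimensional vectors and the `search` helper are hypothetical stand-ins for the real 1536-dimensional embeddings and the table.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(items, query_embedding, match_threshold, match_count):
    """Keep rows whose similarity exceeds the threshold, closest first."""
    scored = [
        (cosine_distance(vector, query_embedding), item_id)
        for item_id, vector in items.items()
    ]
    # Same condition as the WHERE clause: similarity = 1 - distance.
    kept = [(dist, item_id) for dist, item_id in scored if 1.0 - dist > match_threshold]
    kept.sort()  # ascending distance, like ORDER BY ... ASC
    return [item_id for _, item_id in kept[:match_count]]


# Hypothetical toy data: id -> embedding.
items = {
    1: [1.0, 0.0, 0.0],  # similarity 1.0 to the query
    2: [0.6, 0.8, 0.0],  # similarity about 0.6
    3: [0.0, 0.0, 1.0],  # similarity 0.0
}
query = [1.0, 0.0, 0.0]

loose = search(items, query, match_threshold=0.15, match_count=10)
strict = search(items, query, match_threshold=0.7, match_count=10)
print(loose, strict)
```

In this toy data the second item drops out once the threshold rises above its similarity of roughly 0.6, which is the same effect I see when raising `match_threshold` in the real function.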
Raising the threshold to 0.5 returns very few results, if any. That contradicts what most tutorials recommend, which is a threshold of 0.7 or higher.
What should I change to improve the quality of my search results?