What strategies can I use to improve the data I embed and the quality of vector search results?

I have been working on implementing a semantic/vector search system for images.

My approach involves utilizing gpt-4-mini to analyze an image and generate data using the following prompt:

Your task is to create JSON data based on a given image.
          
            Provide your output in the format below:
            {
            description: "Brief description of the image, focusing on relevant keywords only.",
            text: "Include any text present in the image here, or omit this field if there is none",
            keywords: "Keywords describing the content of the image",
            artstyle: "The artistic style portrayed in the image",
            text_language: "Specify the language of any text within the image, otherwise exclude this field",
            design_theme : "Identify any theme present in the image (e.g., hobby, interest, occupation), otherwise remove this field",
            }

The data I receive from this process looks accurate enough to me. I then embed the JSON data with the "text-embedding-3-small" model.

However, I have encountered issues with the quality of the search results.

For example: There are two images containing only text. One reads "straight outta knee surgery" while the other says "straight outta valhalla."

Upon searching for "straight outta," I find that I need to lower the similarity threshold to 0.15 in order to retrieve both results.

Below is my PostgreSQL search function:

CREATE OR REPLACE FUNCTION search_design_items (
  query_embedding vector(1536),
  match_threshold FLOAT,
  match_count INT
) RETURNS TABLE (
  id BIGINT
) AS $$
BEGIN
    RETURN QUERY
    -- Qualify id: a bare "id" is ambiguous with the output column declared in RETURNS TABLE
    SELECT design_management_items.id
    FROM public.design_management_items
    -- <=> is pgvector's cosine distance, so 1 - distance is the cosine similarity
    WHERE 1 - (design_management_items.description_vector <=> query_embedding) > match_threshold
    ORDER BY (design_management_items.description_vector <=> query_embedding) ASC
    LIMIT match_count;
END;
$$ LANGUAGE plpgsql;

Increasing the threshold value (to 0.5) leads to very few, if any, search results. This seems contrary to what is typically recommended in tutorials, where thresholds of 0.7 or higher are suggested.

What adjustments should I make to enhance the precision of my search outcomes?

Answer №1

Consider implementing a hybrid search approach. Hybrid search is available in many vector databases.

According to the official Weaviate blog:

Hybrid search involves combining multiple search algorithms to enhance the accuracy and relevance of search results. It merges the strengths of keyword-based search algorithms with vector search techniques, offering users a more efficient searching experience.

In simple terms, a hybrid search uses both keywords and embedding vectors, and the alpha parameter controls how much weight each one gets. For instance, setting alpha to 0 means keyword search only, while setting it to 1 means embedding vector search only.
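To make the alpha weighting concrete, here is a minimal sketch of how keyword and vector scores could be blended. It is a simplified illustration, not Weaviate's actual implementation: the `ScoredHit` type, the `normalize` and `fuseScores` helpers, the ids, and the sample scores are all hypothetical, and it assumes both score lists use "higher is better".

```typescript
// Hypothetical shape for a search hit; not part of any library.
interface ScoredHit {
  id: string;
  score: number; // higher means more relevant
}

// Min-max normalize scores into [0, 1] so keyword and vector scores are comparable.
function normalize(hits: ScoredHit[]): Record<string, number> {
  const normalized: Record<string, number> = {};
  if (hits.length === 0) return normalized;
  let min = hits[0].score;
  let max = hits[0].score;
  for (const hit of hits) {
    min = Math.min(min, hit.score);
    max = Math.max(max, hit.score);
  }
  for (const hit of hits) {
    // If every score is identical, treat all hits as equally strong matches.
    normalized[hit.id] = max === min ? 1 : (hit.score - min) / (max - min);
  }
  return normalized;
}

// alpha = 0 -> keyword scores only, alpha = 1 -> vector scores only.
function fuseScores(
  keywordHits: ScoredHit[],
  vectorHits: ScoredHit[],
  alpha: number
): ScoredHit[] {
  const keyword = normalize(keywordHits);
  const vector = normalize(vectorHits);
  const ids = Object.keys({ ...keyword, ...vector });
  return ids
    .map((id) => ({
      id,
      // A hit missing from one list contributes 0 for that list.
      score: alpha * (vector[id] ?? 0) + (1 - alpha) * (keyword[id] ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}

// Made-up scores for a "straight outta" query, for illustration only.
const keywordHits: ScoredHit[] = [
  { id: "knee-surgery", score: 2.1 },
  { id: "valhalla", score: 1.9 },
];
const vectorHits: ScoredHit[] = [
  { id: "valhalla", score: 0.31 },
  { id: "knee-surgery", score: 0.28 },
  { id: "street-art", score: 0.15 },
];

console.log(fuseScores(keywordHits, vectorHits, 0.5));
```

With a blend like this, an exact phrase match can lift a result even when its raw cosine similarity is low, which is the situation described in the question.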

I have previously developed a project featuring hybrid search functionality, allowing users to explore insights from Lex Fridman's podcasts without having to watch the entire episodes. Check out the demonstration.

You can access the weaviateHybridSearch.ts file here:

"use server";

import weaviate from "weaviate-client";
import { PodcastType } from "@/app/types/podcast";

/**
 * Queries the Podcast collection based on a search term and alpha value.
 *
 * @param searchTerm - The search term for querying.
 * @param alpha - The alpha value used in hybrid search (0 = keyword only, 1 = vector only).
 * @returns Array of PodcastType objects representing search results.
 */
export async function queryPodcasts(searchTerm: string, alpha: number) {
  // Connect to the local Weaviate instance
  const client = await weaviate.connectToLocal();

  // Get the Podcast collection
  const podcastCollection = await client.collections.get<
    Omit<PodcastType, "distance">
  >("Podcast");

  // Execute hybrid search on the Podcast collection
  const { objects } = await podcastCollection.query.hybrid(searchTerm, {
    limit: 10,
    alpha: alpha,
    returnMetadata: ["score"],
    returnProperties: ["number", "guest", "title", "transcription"],
  });

  // Process the results; note that the "distance" field carries the hybrid score
  const podcasts: PodcastType[] = objects.map((podcast: any) => ({
    ...podcast.properties,
    distance: podcast.metadata?.score!,
  }));

  // Return the podcasts
  return podcasts;
}
