In advanced mode, you can use a Vector Embedding transformation to generate vector embeddings for input text, capturing the semantic meaning of the text in a vector format.
Before using a Vector Embedding transformation, use a Chunking transformation to split the text into chunks. Then, the Vector Embedding transformation can generate vector embeddings for each chunk of text using an embedding model like Word2Vec or BERT. For more information about the Chunking transformation, see Chunking transformation.
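The chunk-then-embed order described above can be illustrated outside the product with a minimal Python sketch. It is not how the Vector Embedding transformation is implemented. The helper names chunk_text and toy_embedding are hypothetical, and the hashed bag-of-words vector is only a stand-in for a real embedding model such as Word2Vec or BERT, so it does not capture semantic meaning.

```python
import hashlib
import math


def chunk_text(text, chunk_size=5):
    """Split text into chunks of at most chunk_size words.

    Hypothetical stand-in for the Chunking transformation.
    """
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def toy_embedding(chunk, dims=8):
    """Return an L2-normalized hashed bag-of-words vector.

    Assumption: a placeholder for a real embedding model; it only
    shows that each chunk maps to one fixed-length vector.
    """
    vec = [0.0] * dims
    for word in chunk.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0  # bucket the word by its hash
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


text = ("Vector embeddings capture the semantic meaning of text "
        "so that similar passages map to nearby vectors")

chunks = chunk_text(text)                        # step 1: chunk
embeddings = [toy_embedding(c) for c in chunks]  # step 2: one vector per chunk
```

Each chunk produces exactly one vector of the same length, which is why the text must be chunked before it reaches the embedding step.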
To create an identifier for each vector, you can use either the UUID_STRING function in an Expression transformation or a Sequence Generator transformation:
• If you use the UUID_STRING function in an Expression transformation, use the function without passing any arguments. The function returns a globally unique ID that can be stored in a string field with a precision of 100.
Note: UUID_STRING is an internal function that you can use only in advanced mode. Using it to create identifiers for other use cases might produce unexpected results.
• If you use a Sequence Generator transformation, create a shared sequence to use across all mappings that load data to the same index in the vector database.
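The two identifier strategies can be compared with a short Python analogy. This is a sketch under assumptions, not the product's behavior: the exact format that UUID_STRING returns is not stated here, so Python's uuid.uuid4() is used only as a comparable source of globally unique strings, and load_mapping is a hypothetical helper that stands in for a mapping that loads data to an index.

```python
import itertools
import uuid


def uuid_id():
    """Analogy for calling UUID_STRING with no arguments.

    Assumption: uuid4 is used for illustration; its 36-character
    string fits in a string field with a precision of 100.
    """
    return str(uuid.uuid4())


# Analogy for a shared sequence: one counter reused by every mapping
# that loads data to the same index, so IDs never collide.
shared_sequence = itertools.count(start=1)


def load_mapping(chunks, sequence):
    """Hypothetical mapping run: pair each chunk with the next ID."""
    return [(next(sequence), chunk) for chunk in chunks]


mapping_a = load_mapping(["chunk a1", "chunk a2"], shared_sequence)
mapping_b = load_mapping(["chunk b1"], shared_sequence)

uuid_ids = [uuid_id() for _ in range(3)]
```

If each mapping used its own sequence starting at 1, mapping_a and mapping_b would both emit the ID 1 for the same index. Sharing the sequence avoids that collision, while the UUID approach avoids it without any shared state.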
A Target transformation can write the vectors to a vector database.
Note: The Vector Embedding transformation can't run in a serverless runtime environment, on an advanced cluster on Google Cloud, or on GPUs. If the transformation runs on a GPU-enabled cluster, GPUs are disabled and the transformation consumes CPUs.