In advanced mode, you can use a Vector Embedding transformation to generate vector embeddings for input text, capturing the semantic meaning of the text in a vector format.
Before you use a Vector Embedding transformation, use a Chunking transformation to split the text into chunks and to process the text so that the data is cleaner and semantically more consistent for vector embedding. The Vector Embedding transformation can then generate a vector embedding for each chunk of text using an embedding model such as Word2Vec or BERT, or your own embedding model. For more information about the Chunking transformation, see Chunking transformation.
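The chunk-then-embed order can be illustrated with a minimal Python sketch. It is not how the Chunking or Vector Embedding transformations are implemented. The function names chunk_text and toy_embedding are hypothetical, the word-count chunking is a naive stand-in, and the hashed bag-of-words vector is only a placeholder for a real embedding model such as Word2Vec or BERT.

```python
import hashlib
import math

def chunk_text(text, max_words=50):
    """Split text into chunks of at most max_words words (naive stand-in for chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def toy_embedding(chunk, dims=8):
    """Hash each token into one of dims buckets, then L2-normalize the counts.

    Placeholder only: a real embedding model would capture semantic meaning.
    """
    vec = [0.0] * dims
    for token in chunk.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

text = ("Vector embeddings capture the semantic meaning of text "
        "so that similar passages map to nearby vectors")

# Step 1: chunk the text. Step 2: generate one vector per chunk.
chunks = chunk_text(text, max_words=5)
embeddings = [toy_embedding(c) for c in chunks]

for chunk, vector in zip(chunks, embeddings):
    print(chunk, "->", len(vector), "dimensions")
```

Each chunk yields exactly one vector, which is why the quality of the chunks directly affects the quality of the embeddings.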
To create an identifier for each vector, you can use either the UUID_STRING function in an Expression transformation or a Sequence Generator transformation:
• If you use the UUID_STRING function in an Expression transformation, use the function without passing any arguments. The function returns a globally unique ID that can be stored in a string field with a precision of 100. For more information, see Function Reference.
• If you use a Sequence Generator transformation, create a shared sequence to use across all mappings that load data to the same index in the vector database.
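The difference between the two identifier strategies can be sketched in Python. This is an analogy only: uuid.uuid4 from the Python standard library stands in for UUID_STRING, and a single itertools.count object stands in for a shared sequence. The helper names uuid_ids and load_batch are hypothetical, and the sketch does not reproduce the behavior of either transformation.

```python
import itertools
import uuid

def uuid_ids(n):
    """Return n random UUID strings; no coordination between loads is needed."""
    return [str(uuid.uuid4()) for _ in range(n)]

def load_batch(chunks, sequence):
    """Pair each chunk with the next value drawn from the given sequence."""
    return [(next(sequence), chunk) for chunk in chunks]

# UUID approach: each ID is unique on its own.
ids = uuid_ids(3)

# Sequence approach: every load into the same index must draw from the
# same sequence, otherwise two loads could both start at 1 and collide.
shared_sequence = itertools.count(start=1)
batch_a = load_batch(["chunk a1", "chunk a2"], shared_sequence)
batch_b = load_batch(["chunk b1"], shared_sequence)

print(ids)
print(batch_a, batch_b)
```

In this sketch, the second batch continues where the first stopped because both draw from the same counter. Two independent counters would each start at 1 and produce duplicate IDs in the same index.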
A Target transformation can write the vectors to a vector database.
Note: The Vector Embedding transformation can't run in a serverless runtime environment on AWS, on an advanced cluster on Google Cloud, or on GPUs. If the transformation runs on a GPU-enabled cluster, GPUs are disabled and the transformation consumes CPUs.