Configure the index host, input file, maximum candidates, maximum results, and the default language, and publish the processes.
1. To publish the Search_Category process, click Actions in the row that contains the process and select Publish.
2. To publish the OpenAI_Embedding process, click Actions in the row that contains the process and select Publish.
3. Open the Mass_Index_Category_Chunk process.
4. On the Start tab of the Start step, select the Secure Agent from the Run On list.
The Secure Agent must contain the file described in the prerequisites.
5. Save and publish the process.
6. Open the Mass_Index_Category process.
7. On the Start tab of the Start step, select the same Secure Agent as the previous process from the Run On list.
8. Publish the process, and then click Run Using to run the process with inputs.
9. Edit the index_host and input_file input parameters.
Here, index_host is the name of the host in the Pinecone index without https:// and input_file is the complete path and name of the file containing the categories. For example, //home/user/data/egressed_categories.csv.
Run the process and monitor it until it finishes. You can then confirm in the Pinecone console that the embeddings were added to the Pinecone index.
10. Open the B360_Enrich_Autoclassification process.
11. On the Temp Fields tab of the Start step, enter values for the following fields:
- index_host: The name of the host from the Pinecone index without https://.
- max_candidates: The number of candidates to retrieve from the vector database. A higher value might produce more accurate results but consumes more OpenAI tokens. Set a value between 25 and 50. Default is 25. Experiment until you get the best results.
- max_results: The maximum number of results to show to the user as suggestions. Default is 5. If you want to auto-assign the most similar category, use 1.
- Default_lang: The default input language of the Business Entity to be classified. If this language is not available, the first value available in the Field Group is used.
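
Before you enter these values, it can help to check them outside the product. The following Python sketch is illustrative only and is not part of the recipe or of any Informatica or Pinecone API. It strips the scheme from a host value, checks the numeric fields against the guidance above, and mimics the described language fallback. The names normalize_index_host, AutoclassificationSettings, and pick_language_value, the "en" default, and the example host are hypothetical. The check that max_results does not exceed max_candidates is an assumption, not a documented rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def normalize_index_host(value: str) -> str:
    """Return the host name without a leading scheme or trailing slash."""
    host = value.strip()
    for scheme in ("https://", "http://"):
        if host.lower().startswith(scheme):
            host = host[len(scheme):]
    return host.rstrip("/")


@dataclass
class AutoclassificationSettings:
    """Hypothetical container for the Temp Fields values described above."""
    index_host: str
    max_candidates: int = 25  # documented default
    max_results: int = 5      # documented default
    default_lang: str = "en"  # hypothetical value, not from the source

    def problems(self) -> List[str]:
        """List the values that do not match the guidance in the steps."""
        issues = []
        if "://" in self.index_host:
            issues.append("index_host must not include https://")
        if not 25 <= self.max_candidates <= 50:
            issues.append("max_candidates is outside the suggested 25 to 50 range")
        if self.max_results < 1:
            issues.append("max_results must be at least 1")
        # Assumption: showing more results than retrieved candidates is not useful.
        if self.max_results > self.max_candidates:
            issues.append("max_results is larger than max_candidates")
        return issues


def pick_language_value(field_group: Dict[str, str], default_lang: str) -> Optional[str]:
    """Return the value for default_lang, or the first available value otherwise."""
    if default_lang in field_group:
        return field_group[default_lang]
    # Dictionaries keep insertion order, so this is the first value that was added.
    return next(iter(field_group.values()), None)


if __name__ == "__main__":
    host = normalize_index_host("https://example-index.svc.pinecone.io/")
    settings = AutoclassificationSettings(index_host=host, max_results=1)
    print(host)
    print(settings.problems())
    print(pick_language_value({"de": "Schuhe"}, settings.default_lang))
```

In this sketch, a max_results value of 1 corresponds to auto-assigning the most similar category, and an empty list from problems() means the values are consistent with the guidance in the steps.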