Term Cleanup
Set Up the Term Cleanup Feature
1. Decide on Provider
- This Feature is powered by either OpenAI or Azure OpenAI.
- Once you've chosen a Provider, you'll need to create an account and get authentication details.
- When setting things up on the Azure side, ensure you choose either the `text-embedding-3-small` or `text-embedding-3-large` model. The Feature will not work with other models.
2. Configure Settings under Tools > ClassifAI > Language Processing > Term Cleanup > Settings
- Select the proper Provider in the provider dropdown.
- Enter your authentication details.
- Configure any other settings as desired.
3. ElasticPress configuration
It is recommended to use ElasticPress with this Feature, especially if processing more than 500 terms, as performance will be significantly better. Once the Term Cleanup Feature is configured, you can then proceed to get ElasticPress set up to index the data.
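To get a feel for why larger term counts are slower, the short sketch below counts how many pairs a naive all-pairs comparison would evaluate. This is an illustrative assumption only: how the Feature compares terms when ElasticPress is not in use is not described here, and `pair_count` is a hypothetical helper, not part of the plugin.

```python
def pair_count(n_terms: int) -> int:
    """Number of unique term pairs in a naive all-pairs comparison (assumed model)."""
    return n_terms * (n_terms - 1) // 2


# The pair count grows quadratically with the number of terms.
for n in (100, 500, 5000):
    print(f"{n} terms -> {pair_count(n)} pairs")
```

Under that assumption, 500 terms already means 124,750 pairs, and ten times as many terms means roughly a hundred times as many pairs.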
If on a standard WordPress installation:
- Install and activate the ElasticPress plugin.
- Set your Elasticsearch URL in the ElasticPress settings (`ElasticPress > Settings`).
- Enable the term index feature.
- Go to the `ElasticPress > Sync` settings page and trigger a sync, ensuring this is set to run a sync from scratch. This will send over the new schema to Elasticsearch and index all content, including creating vector embeddings for each term.
If on a WordPress VIP hosted environment:
- Enable Enterprise Search.
- Enable the term index. Example command: `vip @example-app.develop -- wp vip-search activate-feature terms`.
- Run the VIP-CLI `index` command. This sends the new schema to Elasticsearch and indexes all content, including creating vector embeddings for each term. Note you may need to use the `--setup` flag to ensure the schema is created correctly.
4. Start the Term Cleanup Process
Once configured, the plugin will add a new submenu under the Tools menu called Term Cleanup.
- Go to the Term Cleanup page, click on your desired taxonomy, then click on the "Find similar" button.
- This initializes a background process that will compare each term to find ones that are similar.
- Once done, all the results will be displayed.
- You can then skip or merge the potential duplicate terms from the settings page.
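
The comparison step can be pictured with a minimal sketch. It assumes similarity is measured as cosine similarity between term embeddings, which is a common choice but is not confirmed here as the plugin's method. The function names, the `0.9` threshold, and the toy 3-dimensional vectors are all hypothetical; real embeddings from the `text-embedding-3-*` models have far more dimensions.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_terms(embeddings, threshold=0.9):
    """Return (term_a, term_b, score) for every pair scoring at or above the threshold.

    `threshold` is an assumed illustrative value, not a plugin setting.
    """
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    # Highest-scoring candidates first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy vectors made up for illustration only.
toy = {
    "WordPress": [0.9, 0.1, 0.0],
    "Wordpress CMS": [0.85, 0.15, 0.05],
    "Gardening": [0.0, 0.2, 0.95],
}

for term_a, term_b, score in find_similar_terms(toy):
    print(f"{term_a} ~ {term_b}: {score:.3f}")
```

With these toy vectors only the two WordPress-related terms are flagged as a potential duplicate pair, which is the kind of candidate you would then skip or merge.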