This article covers how to create content entities. To learn what content entities are, see our Content Entities Explained article.
In this article:
- Create content entity
- Create custom content entities
- Regular Expressions (Regex)
- Expression List
- Synonym List
- Create a csv file
- Edit the template
Create content entity
- Go to Settings > Content Entities
- Click Add Preset or + New Entity:
  - Add Preset: Choose one of the common entities we have created for you: email, credit_card, iban, and personal_identity_code for different countries.
  - + New Entity: Add your own set of rules to create a custom content entity.
- Finally, activate the content entity you just added by clicking the three dots in the top right corner and reordering the entities.
Create custom content entities
After clicking + New Entity, fill in the following fields:
- Name: May contain only alphanumeric characters, underscores (_), and hyphens (-). Spaces are not allowed.
- Description: A short description of the entity, with an example.
- Sanitize: Check this box if you want our model to replace the detected custom entity with a placeholder.
- Rule Type:
  - Regular Expressions: Select this for recurring patterns of letters, numbers, and other characters. For example, UAI-94857365.
  - Expression List: Select this when you have a list of words. For example, Berlin, Helsinki, Paris, Rome.
  - Synonym List: Select this when the entity covers a large number of word combinations, such as product descriptions (for example, footwear or colors). This rule type cannot be combined with other rules.
If you add more than one rule, the rules are combined with OR logic: the content entity is detected when any one of them matches.
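To make the field rules and the OR logic concrete, here is a minimal Python sketch. It is not the product's API: the `CustomEntity` class, the assumption that "alphanumeric" means ASCII letters and digits, the `[name]` placeholder format, and the `ORD-` pattern are all hypothetical. Each rule is modeled as a compiled regular expression.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: "alphanumeric" means ASCII letters and digits.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


@dataclass
class CustomEntity:
    """Hypothetical model of a custom content entity, not the product's API."""

    name: str
    rules: list[re.Pattern[str]] = field(default_factory=list)
    sanitize: bool = False

    def __post_init__(self) -> None:
        # Reject empty names and names containing spaces or other characters.
        if not NAME_PATTERN.fullmatch(self.name):
            raise ValueError(f"invalid entity name: {self.name!r}")

    def matches(self, text: str) -> bool:
        # OR logic: the entity is detected if at least one rule matches.
        return any(rule.search(text) for rule in self.rules)

    def apply(self, text: str) -> str:
        # With Sanitize enabled, replace every match of every rule with a
        # placeholder ("[name]" is an assumed format).
        if not self.sanitize:
            return text
        for rule in self.rules:
            text = rule.sub(f"[{self.name}]", text)
        return text


entity = CustomEntity(
    name="order_ref",
    rules=[re.compile(r"\bUAI-\d{8}\b"), re.compile(r"\bORD-\d{6}\b")],
    sanitize=True,
)
print(entity.matches("See ORD-123456"))                  # True
print(entity.apply("Refs UAI-94857365 and ORD-123456"))  # Refs [order_ref] and [order_ref]
```

A name such as "order ref" raises `ValueError` because it contains a space.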
Regular Expressions (Regex)
- In the Regular Expression field, define the regular expression for the content entity.
- Enable Case Sensitive if matching should distinguish uppercase from lowercase letters.
- Enable Whole Words if the expression should match only complete words, not parts of longer words.
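In Python's `re` module, the two checkboxes roughly correspond to the `re.IGNORECASE` flag and `\b` word boundaries. The sketch below assumes that correspondence; the product's own regex engine may behave differently, and `compile_rule` is a hypothetical helper.

```python
import re


def compile_rule(expression: str, case_sensitive: bool, whole_words: bool) -> re.Pattern:
    # Whole Words: require a word boundary (\b) on both sides of the match.
    # \b only behaves as expected when the expression starts and ends with a
    # letter, digit, or underscore.
    pattern = rf"\b(?:{expression})\b" if whole_words else expression
    # Case Sensitive unchecked: ignore letter case when matching.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(pattern, flags)


text = "Tickets uai-94857365 and XUAI-12345678"
print(compile_rule(r"UAI-\d{8}", True, False).findall(text))   # ['UAI-12345678']
print(compile_rule(r"UAI-\d{8}", False, False).findall(text))  # ['uai-94857365', 'UAI-12345678']
print(compile_rule(r"UAI-\d{8}", False, True).findall(text))   # ['uai-94857365']
```

Without Whole Words, the ID embedded in "XUAI-12345678" is still reported; with it, only the standalone ID matches.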
What is Regex?
Regular Expressions, also known as RegEx, are special text strings that describe a search pattern. When creating your own content entity using RegEx, make sure the expression covers every possible pattern, format, and length of the information you want to identify. Our collection of RegEx can be found here.
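As an illustration of covering every format, suppose (hypothetically) that the ID from the UAI-94857365 example can also be written with a space or with no separator. A pattern written for only one form misses the others:

```python
import re

# Hypothetical formats: "UAI-94857365", "UAI 94857365" and "UAI94857365",
# always followed by exactly 8 digits.
narrow = re.compile(r"\bUAI-\d{8}\b")     # covers only the hyphenated form
broad = re.compile(r"\bUAI[- ]?\d{8}\b")  # optional hyphen or space

samples = ["UAI-94857365", "UAI 94857365", "UAI94857365", "UAI-9485736"]
for s in samples:
    print(f"{s}: narrow={bool(narrow.search(s))}, broad={bool(broad.search(s))}")
```

The last sample has only 7 digits, so neither pattern matches it, which is the intended result if the ID always has 8 digits.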
Expression List
- List Implementation: Select the option that suits your needs.
  - Partial words: Use when the expression can be part of a longer word. For example, this would recognize "hat" in "hats", and "play" in "played" and "playing".
  - Whole words: Use when the expression must match a whole word. For example, hats, beanies, caps, berets.
  - Lemmatize: Use when the content entity can be expressed in several word forms. For example, "am", "are", "is", "was", "were", and "been" are all forms of the verb "to be".
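The three options can be sketched in Python as follows. This illustrates the concepts only and is not how the product implements them: the function names are hypothetical, matching is assumed to ignore case, and the small `LEMMAS` dictionary stands in for a real lemmatizer.

```python
from __future__ import annotations

import re

# Hypothetical lemma table covering only "to be"; a real system would use a
# lemmatizer rather than a hand-written dictionary.
LEMMAS = {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be", "been": "be"}


def partial_words(expressions: list[str], text: str) -> list[str]:
    # Match the expression anywhere, even inside a longer word.
    return [e for e in expressions if e.lower() in text.lower()]


def whole_words(expressions: list[str], text: str) -> list[str]:
    # Match only when the expression is bounded by non-word characters.
    return [e for e in expressions
            if re.search(rf"\b{re.escape(e)}\b", text, re.IGNORECASE)]


def lemmatize(expressions: list[str], text: str) -> list[str]:
    # Reduce each word to its base form, then compare base forms.
    text_lemmas = {LEMMAS.get(w, w) for w in re.findall(r"\w+", text.lower())}
    return [e for e in expressions if LEMMAS.get(e.lower(), e.lower()) in text_lemmas]


text = "She played with the hats while they were out"
print(partial_words(["hat", "play"], text))  # ['hat', 'play']
print(whole_words(["hat", "play"], text))    # []
print(whole_words(["hats", "caps"], text))   # ['hats']
print(lemmatize(["is"], text))               # ['is']
```

The last line matches because "is" and "were" share the base form "be".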
Synonym List
Click Save before uploading your CSV file.
Create a csv file
Below is an example of the file format. You can also download it at the bottom of this article:
item1, "item1-synonym1, item1-synonym2, item1-synonym3, item1-synonym4"
item2, "item2-synonym1, item2-synonym2, item2-synonym3, item2-synonym4"
item3, "item3-synonym1, item3-synonym2, item3-synonym3, item3-synonym4"
Wrap all of an item's synonyms in one pair of quotation marks and separate them with commas.
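Before uploading, you may want to check that every row follows this format. The sketch below uses Python's standard `csv` module; the function name is hypothetical, and it checks only the structure shown in the example above, not any further rules the upload may enforce.

```python
import csv
import io


def parse_synonym_csv(f) -> dict:
    """Hypothetical validator for rows of the form: item, "syn1, syn2, ..."."""
    synonyms = {}
    # skipinitialspace=True skips the space after the first comma, so the
    # opening quotation mark is recognized and the commas inside the quoted
    # field are not treated as column separators.
    reader = csv.reader(f, skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {reader.line_num}: expected an item and one quoted "
                f"synonym field, got {len(row)} fields"
            )
        item, quoted = row
        synonyms[item] = [s.strip() for s in quoted.split(",") if s.strip()]
    return synonyms


sample = (
    'item1, "item1-synonym1, item1-synonym2, item1-synonym3, item1-synonym4"\n'
    'item2, "item2-synonym1, item2-synonym2, item2-synonym3, item2-synonym4"\n'
)
result = parse_synonym_csv(io.StringIO(sample))
print(result["item1"])  # ['item1-synonym1', 'item1-synonym2', 'item1-synonym3', 'item1-synonym4']
```

To check a real file, open it with `open("synonymList.csv", newline="", encoding="utf-8")` and pass the file object to the function; UTF-8 is an assumed encoding. A row whose synonyms are missing their quotation marks, such as `item1, a, b`, is reported as having too many fields.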
Edit the template
We recommend Notepad for Windows users and TextEdit for Mac users.
- Notepad: Click Save As, select All Files as the file type, then enter the file name followed by ".csv", for example synonymList.csv.
- TextEdit: Make sure the document is plain text (Format > Make Plain Text). After saving the file, rename it and add ".csv" to the end. When a window asks you to confirm the change of extension, click Use .csv.
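As an alternative to editing the template by hand, a short Python script can write the file directly with a .csv extension. This is a sketch, not an official tool: the helper names are hypothetical, and UTF-8 encoding is an assumption about what the upload accepts.

```python
from pathlib import Path


def format_synonym_csv(synonyms: dict) -> str:
    """Build file content in the template format: item, "syn1, syn2, ..."."""
    lines = []
    for item, words in synonyms.items():
        # A comma or quotation mark inside a value would break the format.
        for value in [item, *words]:
            if "," in value or '"' in value:
                raise ValueError(f"unsupported character in {value!r}")
        joined = ", ".join(words)
        lines.append(f'{item}, "{joined}"')
    return "\n".join(lines) + "\n"


def write_synonym_csv(path: str, synonyms: dict) -> None:
    # Writing the file directly gives it the .csv extension, so no renaming
    # is needed. UTF-8 is an assumed encoding.
    Path(path).write_text(format_synonym_csv(synonyms), encoding="utf-8")
```

For example, `write_synonym_csv("synonymList.csv", {"footwear": ["shoes", "sneakers", "boots"]})` writes the line `footwear, "shoes, sneakers, boots"`.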
If you need support setting up content entities, please submit a request.