This guide shows you how to annotate examples of your fields so that the AI can learn to extract them.
What is Annotation?
During annotation, you manually label or highlight the desired information in the training documents. These annotations provide the ground-truth data for training the OCR model, enabling it to learn to identify the relevant fields in new, unseen documents.
How to Annotate fields
If you are on the Pro plan, annotate only a few key images to give our Annotation team an idea of the expected field data.
Go to your model > Left nav > AI Training > Training Files
Click on a file to open it. You will be redirected to an expanded view of the file.
On the image canvas on the left, click and drag to draw a box around the data of interest.
You will see a popup with the detected text and a dropdown to assign a label.
Click on the dropdown > select the label
Click Save
Repeat these steps for the rest of the labels. Annotate more files until you have at least 10 examples of each label you have added to the model.
How to Annotate table headers
If you need to capture tables from your documents, follow the steps below. Tabular data is extracted from your documents by default. If the table is captured as expected, simply add headers:
Click on the header dropdown above the table on the right side > select the corresponding table header.
Drawing boxes around field data: Define the boundaries or regions of interest for each field in the document. This ensures that the OCR model understands the spatial layout of the information to be extracted.
Assigning field labels: Assign the appropriate label or identifier to each field, such as "Invoice Number," "Date," "Total Amount," etc. This helps the OCR model associate the recognized text with the specific field it represents.
When a field name has one box associated with its corresponding data, it is counted as one annotation.
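The counting rule can be sketched in a few lines of Python. This is only an illustration, not the product's API: the `Annotation` record, its field names, the pixel box layout, and the `MIN_EXAMPLES` constant are all assumptions made for this example. Each labelled box counts as one annotation, and the helper reports which labels are still short of the 10-example target.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

MIN_EXAMPLES = 10  # target from this guide: at least 10 examples per label


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: one box drawn on one file, assigned one label."""
    file_name: str
    label: str
    box: Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max)
    text: str  # the text detected inside the box


def count_by_label(annotations: Iterable[Annotation]) -> Counter:
    """One labelled box counts as one annotation for its label."""
    return Counter(a.label for a in annotations)


def labels_needing_more(
    annotations: Iterable[Annotation],
    labels: List[str],
    minimum: int = MIN_EXAMPLES,
) -> Dict[str, int]:
    """Map each label below the minimum to the number of examples still needed."""
    counts = count_by_label(annotations)
    return {
        label: minimum - counts[label]
        for label in labels
        if counts[label] < minimum
    }
```

Calling `labels_needing_more(annotations, ["Invoice Number", "Date"])` on your own list of records returns only the labels that need more examples, so an empty result means every listed label has reached the target.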
What are Annotation Services on Pro?
Annotation Services is one of the benefits offered on the Pro plan: our team handles the cumbersome process of annotating a large number of sample documents for you.
How this works:
Annotate a few sample documents based on the instructions above. This helps ensure clear communication and consistent annotations for best results.
Ensure that you have uploaded all the sample documents you want us to annotate.
Go to the Prepare model for training screen (Training Files). At the top of the screen, find the line "Skip this step with our Annotation Services". Click on Annotation Services > Send request.
Our annotation team will take over the annotation process for the rest of your uploaded sample documents and get in touch with you if they have any questions.
You can gradually expand the annotation process as your team becomes more familiar with the requirements and guidelines.
I am on the Pro plan. Why do I still need to annotate?
You need to annotate only a few key images to give our Annotation team an idea of the expected field data.
How are table header annotations counted?
Each row in a table is counted as an annotation for that header. For example, if a table has 7 rows, assigning the table header "Description" adds 7 examples for Description.
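As a rough illustration of this arithmetic, the Python sketch below counts examples per table header. The function name and the list-of-headers representation are assumptions for this example, not part of the product; a column whose header is left blank is represented as `None` and is assumed to contribute no examples.

```python
from typing import Dict, List, Optional


def header_example_counts(
    num_rows: int, column_headers: List[Optional[str]]
) -> Dict[str, int]:
    """Each table row counts as one example for every assigned header.

    column_headers holds one entry per detected column; None means the
    header dropdown was left blank for that column (assumed convention).
    """
    return {header: num_rows for header in column_headers if header is not None}


# A 7-row table with "Description" assigned to its first column:
counts = header_example_counts(7, ["Description", None, "Quantity"])
print(counts["Description"])  # 7
```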
What are the table data capture issues that may lead to poor accuracy?
The auto-detected table has 5 columns, but I want only 3. What should I do?
Don't select a table header from the dropdown for any column you don't want captured. Leave the header blank. When the model is trained, it will learn to ignore unlabelled columns.
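The intended effect can be pictured with a small Python sketch. It is a hypothetical illustration only: the trained model learns this behavior from your annotations rather than applying a rule like this, and the function name and data layout are assumptions. Columns with a blank header (`None`) are dropped, and the remaining cells are keyed by their assigned header.

```python
from typing import Dict, List, Optional


def keep_labelled_columns(
    rows: List[List[str]], column_headers: List[Optional[str]]
) -> List[Dict[str, str]]:
    """Keep only the cells whose column has an assigned header."""
    kept = [(i, h) for i, h in enumerate(column_headers) if h is not None]
    return [{header: row[i] for i, header in kept} for row in rows]


# Five detected columns, three of them labelled:
headers = ["Description", None, "Quantity", None, "Amount"]
rows = [["Widget", "x", "2", "y", "10.00"]]
print(keep_labelled_columns(rows, headers))
# [{'Description': 'Widget', 'Quantity': '2', 'Amount': '10.00'}]
```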