Annotate examples

This guide shows you how to annotate examples of your fields so the AI can learn from them.

What is Annotation?

During annotation, you manually label or highlight the desired information in the training documents. The annotations provide the ground truth data for training the OCR model, enabling it to learn and identify the relevant fields in new, unseen documents.

How to Annotate fields

If you are on the Pro plan, annotate only a few key images to give our Annotation team an idea of the expected field data.

Learn more about Annotation services on Pro

  1. Go to your model > Left nav > AI Training > Training Files

  2. Click on a file to open it. You will be redirected to an expanded view of the file.

  3. On the image canvas on the left, click and drag to draw a box around some data of interest.

  4. You will see a popup with the detected text and a dropdown to assign a label.

  5. Click on the dropdown and select the label that matches the data.

  6. Click Save

Do the same for the rest of the labels. Annotate more files until you have at least 10 examples of each label you have added to the model.
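
The 10-example target above can be tracked with a small tally. The following is a minimal Python sketch and not part of the product: the `labels_below_target` helper and its inputs are hypothetical, and it assumes you have noted which label each of your annotations used.

```python
from collections import Counter

MIN_EXAMPLES = 10  # target from this guide: at least 10 examples per label


def labels_below_target(model_labels, annotated_labels, minimum=MIN_EXAMPLES):
    """Return {label: examples still needed} for labels under the target.

    model_labels: every label added to the model (hypothetical input).
    annotated_labels: one entry per annotation made, naming its label.
    """
    counts = Counter(annotated_labels)  # missing labels count as 0
    return {
        label: minimum - counts[label]
        for label in model_labels
        if counts[label] < minimum
    }


if __name__ == "__main__":
    done = ["Invoice Number"] * 10 + ["Date"] * 7
    print(labels_below_target(["Invoice Number", "Date", "Total Amount"], done))
```

Any label that appears in the result still needs more annotated files before it reaches the target.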

How to Annotate table headers

If you need to capture tables from your documents, follow the steps below. Tabular data will be extracted from your documents by default. If the table is captured as expected, simply add headers:

  • Click on the header dropdown above the table on the right side > select the corresponding Table header.

  • Do this for all columns you need.

How to add a new table

If the table is not captured as expected, you may need to add the table from scratch.

Frequently Asked Questions

What is an annotation? How are they counted?

Field labelling or annotation includes:

  • Drawing boxes around field data: Define the boundaries or regions of interest for each field in the document. This ensures that the OCR model understands the spatial layout of the information to be extracted.

  • Assigning Field Labels: Assigning the appropriate labels or identifiers to each field, such as "Invoice Number," "Date," "Total Amount," etc. This helps the OCR model associate the recognized text with the specific field it represents.

When a field name has 1 box associated with its corresponding data, it is counted as 1 annotation.
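
As an illustration of the counting rule above, an annotation can be modelled as one box paired with one label. This is a sketch under assumptions: the `Annotation` class, its pixel-coordinate fields, and the sample values are hypothetical and do not reflect the product's internal format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One box drawn around field data plus the label assigned to it."""
    label: str   # e.g. "Invoice Number"
    x_min: int   # box corners in pixels (hypothetical coordinate system)
    y_min: int
    x_max: int
    y_max: int


def count_annotations(annotations):
    """Each box-and-label pair counts as exactly one annotation."""
    return len(annotations)


if __name__ == "__main__":
    page = [
        Annotation("Invoice Number", 420, 60, 560, 85),
        Annotation("Date", 420, 95, 520, 120),
        Annotation("Total Amount", 430, 700, 540, 730),
    ]
    print(count_annotations(page))  # 3 boxes, so 3 annotations
```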

What are Annotation Services on Pro?

Annotation Services refer to one of the benefits offered on the Pro plan, where our team handles the cumbersome process of annotating a large number of sample documents for you.

How this works:

  1. Annotate a few sample documents based on the instructions above. This helps ensure clear communication and consistent annotations for best results.

  2. Ensure that you have uploaded all the sample documents you want us to annotate.

  3. Go to the Prepare model for training screen (Training files). At the top of the screen, find the line "Skip this step with our Annotation Services". Click Annotation Services > Send request.

  4. Our annotation team will take over the annotation process for the rest of your uploaded sample documents and get in touch with you if they have any questions.

You can gradually expand the annotation process as your team becomes more familiar with the requirements and guidelines.

I am on the Pro plan. Why do I still need to annotate?

You need to annotate only a few key images to give our Annotation team an idea of the expected field data.

How are table header annotations counted?

Each row in a table is counted as an annotation for that header. For example, the table above has 7 rows. This means when I assign the table header "Description", I am adding 7 examples for Description.
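
The arithmetic above can be sketched in a few lines of Python. The `header_annotation_counts` helper and its inputs are hypothetical; it only assumes, as stated above, that each table row adds one example for every header you assign.

```python
def header_annotation_counts(assigned_headers, row_count):
    """Map each assigned table header to the number of examples it gains.

    assigned_headers: one entry per column; None means the header was left blank.
    row_count: number of rows in the table.
    """
    return {h: row_count for h in assigned_headers if h is not None}


if __name__ == "__main__":
    # A 7-row table where two of three columns were given headers.
    print(header_annotation_counts(["Description", None, "Amount"], 7))
```

For the 7-row example, assigning "Description" adds 7 examples for that header, and a column left without a header adds none.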

What are the table data capture issues that may lead to poor accuracy?

  • Headers captured as the first row

  • Two columns' data merged into one

  • Missing columns or rows

Learn more on how to solve these issues here.

The auto-detected table has 5 columns, but I want only 3. What should I do?

Don't select any table header from the dropdown on the column you don't want captured. Leave the header blank. When the model is trained, it will learn to ignore unlabelled columns.
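
Conceptually, leaving a header blank acts like dropping that column from the extracted result. The following is a minimal sketch, assuming a hypothetical representation where a table is a list of rows and a blank header is `None`. It illustrates the effect only and is not how the model is implemented.

```python
def keep_labelled_columns(headers, rows):
    """Drop every column whose header was left blank (None)."""
    kept = [i for i, h in enumerate(headers) if h is not None]
    new_headers = [headers[i] for i in kept]
    new_rows = [[row[i] for i in kept] for row in rows]
    return new_headers, new_rows


if __name__ == "__main__":
    # 5 detected columns, only 3 of them labelled.
    headers = ["Description", None, "Quantity", None, "Amount"]
    rows = [["Widget", "A-1", "2", "ea", "10.00"]]
    print(keep_labelled_columns(headers, rows))
```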
