Custom Model

Do I need a custom model?

When you create a custom model, you can teach the AI to recognize and label any piece of information from any document (e.g. 'numero de factura' from an invoice in Spanish). A custom model is the right choice if:

  • You have a unique document type (e.g. German identity cards, purchase orders, etc.)
  • You need to extract custom fields (that our pre-trained model was not trained on)

What you'll need

To start building your own model, keep the following ready:

  • A minimum of 10 sample files:
    The AI needs examples of the document type you want to extract data from. Collect at least 10 such files (.jpg, .png or .pdf) to train the model on.
  • Fields or label names you need:
    As the AI detects all text on an image, you will need to tell us the key information you need to export. This helps us process your files as structured data (e.g. purchase_date, client_phone, identity_number, phone, height, meter_reading, etc).

Note: The more sample files you provide, the more training data your model has to learn from, and the more accurately it will auto-extract data once it is trained. We recommend 50 sample files for best results.
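
Before uploading, it can help to confirm that a folder actually holds enough usable files. The following is a minimal sketch, not part of the product: it assumes the samples sit in a single local folder, and the function names (collect_samples, readiness) are hypothetical. The accepted extensions and the counts of 10 and 50 come from the list above; matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Accepted file types and counts are taken from the guide above.
ACCEPTED_EXTENSIONS = {".jpg", ".png", ".pdf"}
MIN_FILES = 10
RECOMMENDED_FILES = 50


def collect_samples(folder):
    """Return the files in `folder` with an accepted extension, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        # Assumption: extensions are compared case-insensitively.
        if p.is_file() and p.suffix.lower() in ACCEPTED_EXTENSIONS
    )


def readiness(count):
    """Describe whether `count` sample files is enough to start building."""
    if count < MIN_FILES:
        return f"not ready: {MIN_FILES - count} more file(s) needed"
    if count < RECOMMENDED_FILES:
        return "ready: meets the minimum, below the recommended 50"
    return "ready: meets the recommended 50"
```

For example, readiness(len(collect_samples("samples"))) reports how far a folder named "samples" is from the minimum.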

Steps to build your own model

  • From the 'New model' page > select the Custom Document card.
  • A model will be added to your account and you'll be redirected to the Build section. Follow the steps below once you see the page titled Upload sample files:

Step 1: Upload sample files

  • On the Upload sample files page > click on the empty box or drag and drop the 10 or more sample files you collected.
  • Check the Upload status on the top left (ensure that at least 10 files are uploaded).
  • Click on Next once the files are uploaded.

Important: The uploader will not accept duplicate entries of the same file (the model learns best on unique files). If any file shows an upload error, upload a different file in its place.
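
Since the uploader rejects duplicates, you may want to spot repeated files before you upload. Below is a rough sketch that groups files by a SHA-256 hash of their contents. It assumes "duplicate" means byte-for-byte identical; the uploader's actual criterion is not documented here and may differ. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths by content and return only groups with more than one file."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(Path(p))
    # Assumption: two files count as duplicates only if their bytes match exactly.
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Each returned group lists files with identical contents; keeping one file per group and replacing the rest with new samples should avoid the duplicate rejection under that assumption.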

Step 2: Specify labels/field names

  • On the Manage labels page > click on the empty input field.
  • Type in the name of one of the labels you want to extract (e.g. invoice_ID)
  • You can leave the Type blank for now. Learn more about label types here.
  • To add another label, click on Add new label.
  • Once you've added the labels you need, click on Start Training.

Step 3: Mark examples of your fields/labels

Show the AI examples of your unique fields or labels by drawing boxes around the relevant text on the image. Each label needs a minimum of 10 examples (1 instance per image).

  • You will be redirected to the expanded view of the files you uploaded. 
  • Find the corresponding text for each label on this image (e.g. the invoice number for the invoice_ID label).
  • Click and drag across text to draw a box around the text.
  • You will see a popup with the detected text and a dropdown to assign a label.
  • Click on the 'Label' dropdown.
  • Select the label you want to assign to it.
  • Click Save.
  • Do the same for the rest of the Labels.
  • You will now have 1 of the 10 examples needed for each label. Repeat on more images until you have 10 examples of every label you added to the model.

Tip

Don't have examples of each label on all images? Upload more files and mark the missing labels on those images instead.

Done: Train your model!

When you have labelled enough examples on your images (10 per label), click on Train Model from either the expanded view or the All Files page.

Check the number of examples marked under Model Details on the All Files page.

Check the number of examples marked on the Training progress bar at the top of the expanded view.

Important: Check that each label you have added has a minimum of 10 examples marked. Don't have enough examples for a label in your files? Delete that label and train the model without it.
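
If you keep your own record of what you have marked, the 10-examples-per-label rule is easy to check offline. This is an illustrative sketch only: the `marked` mapping (file name to the labels marked on that file) is a hypothetical structure you would maintain yourself, not something the product exports, and the function names are made up. It counts at most one example per label per file, following the "1 instance per image" guidance.

```python
from collections import Counter

MIN_EXAMPLES = 10  # minimum examples per label, from the guide


def count_examples(marked, labels):
    """Count marked examples per label, at most one per file.

    `marked` maps a file name to the list of labels marked on that file
    (a hypothetical record kept by hand, not a product export).
    """
    wanted = set(labels)
    counts = Counter({label: 0 for label in labels})
    for file_labels in marked.values():
        # A set removes repeats, so each file adds at most 1 per label.
        counts.update(set(file_labels) & wanted)
    return dict(counts)


def labels_needing_more(marked, labels, minimum=MIN_EXAMPLES):
    """Return {label: examples still missing} for labels below the minimum."""
    counts = count_examples(marked, labels)
    return {label: minimum - n for label, n in counts.items() if n < minimum}
```

Any label returned by labels_needing_more either needs more marked examples or should be deleted before training, as described above.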


Troubleshoot building your model:

What happens after I click on Train Model?

Take a break: your model will take around 10 to 45 minutes to train, depending on how many files you labelled. We'll send an email to your registered email address when it's ready to auto-extract your data.

Why do I have to add 10 examples?

AI learns from examples. When you mark data on your images and assign a label, the AI uses this information to learn to recognise that data on a new file.

If you add more examples, the AI will have more data to learn from and you will see higher accuracy in how it is able to auto-extract data after Training.

How can I see which labels need more data?

  • On the expanded view of an image > click on the Manage labels dropdown next to the Learning progress bar.
  • Check if the labels listed have 10 examples under the marked column.

Or

  • Click on All Files on the top left corner of this page > you will be taken to the page with the list of files you added (shown above).
  • Under Model details > check the Examples column under the Labels section.
  • You can also locate a file with fewer labels marked from the Marked labels column on the file list.

How do I test if the model works?

Once you receive an email saying your model has been trained, follow the steps here.