Pre-Trained | Custom | Table

Custom, Pre-Trained and Table models need to be trained before they can extract data accurately.

If you are on the pro plan, you can request model training. Here’s how: Link

If you want to train the model yourself, here’s how:

To train the model, go to AI Training in the left-hand navigation, as shown in the attached image.


Steps to train:

  • Add labels and table headers that you need to extract by clicking on “Add Label” and “Add Table Header”

  • Upload at least 10 files
  • Add/Edit the labels and tables in the files as required.
  • Once every label has at least 10 examples, click the “Train Model” button
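
The two minimums in the steps above (at least 10 files, at least 10 examples per label) can be checked before you click “Train Model”. The following is only a minimal sketch that runs on your own local record of annotations; the function name, the dictionary layout and the label names are all hypothetical and are not part of the product. It also assumes that every annotation of a label counts as one example.

```python
from collections import Counter

MIN_FILES = 10      # minimum number of uploaded files, from the steps above
MIN_EXAMPLES = 10   # minimum annotated examples per label, from the steps above


def training_blockers(annotations, labels):
    """Return the reasons a dataset is not ready for training.

    annotations: dict mapping a file name to the list of label names
                 annotated in that file (hypothetical local record).
    labels:      the label names you have defined.
    An empty list means both minimums are met.
    """
    blockers = []
    if len(annotations) < MIN_FILES:
        blockers.append(f"only {len(annotations)} files, need {MIN_FILES}")

    # Count every annotated occurrence of each label across all files.
    counts = Counter(label for found in annotations.values() for label in found)
    for label in labels:
        if counts[label] < MIN_EXAMPLES:
            blockers.append(
                f"label '{label}' has {counts[label]} examples, need {MIN_EXAMPLES}"
            )
    return blockers
```

Calling `training_blockers(my_annotations, ["invoice_no", "total"])` returns an empty list when the dataset meets both minimums, and otherwise lists what is still missing.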

The model will take around 10 to 45 minutes to train depending on how many files you labelled. We'll send you an email on your registered account when it's ready to extract your data.

How many files are required as part of training?

There is no fixed number of files, but a minimum of 10 files is required. In general, the more files you provide, the better the results.

  • If you have multiple formats for the same document type, ensure you have samples from each format as part of the training dataset.
  • If you plan to have a large dataset (more than 500 images), do not upload more than 5 images of the same structure/template.
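
One way to apply the large-dataset guidance above is to cap each template at 5 files before uploading. This is only a sketch under stated assumptions: it assumes you have already grouped your files by template yourself, the function name and data layout are hypothetical, and it reads "more than 500 images" as the total across all templates before any capping.

```python
LARGE_DATASET = 500    # threshold from the guidance above
MAX_PER_TEMPLATE = 5   # per-template cap for large datasets, from the guidance above


def select_for_upload(files_by_template):
    """Pick the files to upload from a dict of template name -> list of files.

    Small datasets are returned unchanged. For datasets above the
    threshold, only the first MAX_PER_TEMPLATE files of each template
    are kept, so every template stays represented.
    """
    total = sum(len(files) for files in files_by_template.values())
    if total <= LARGE_DATASET:
        return {name: list(files) for name, files in files_by_template.items()}
    return {
        name: list(files)[:MAX_PER_TEMPLATE]
        for name, files in files_by_template.items()
    }
```

Keeping every template in the result also follows the first point above: each format of the document type still has samples in the training dataset.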

Is it mandatory to have samples for all templates?

It is not mandatory, but it is preferred. Accuracy on templates that are not part of the training dataset may not match the accuracy on templates that are.

What to do if some templates have low accuracy?

  • Go to the training section
  • Apply filters to find the files for the relevant template. For example, a filter such as supplier name contains "ABC Corp" will return all files whose supplier name contains ABC Corp
  • Check whether the number of files for the format is high enough (15-20 files is a good starting point). If there are too few files, add more files for the template to the training section and annotate them
  • Check that all annotations for the template are correct. Note that even one missed annotation on a single file can affect model output
  • If any annotations are missing, re-annotate the files and then click “Train Model” again to retrain the model
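
The checks above can also be run against an exported or hand-kept list of training files. The sketch below is illustrative only: the record fields (`name`, `supplier_name`, `labels`) and the function name are hypothetical, the "contains" match is assumed to be case-sensitive, and 15 is taken as the lower end of the 15-20 starting point.

```python
MIN_FILES_PER_TEMPLATE = 15  # lower end of the 15-20 starting point above


def review_template(files, supplier_substring, required_labels):
    """Summarize one template's training files.

    files: list of dicts with the hypothetical keys "name",
           "supplier_name" and "labels" (labels annotated in that file).
    """
    # Mirror the "supplier name contains ..." filter.
    matching = [f for f in files if supplier_substring in f["supplier_name"]]

    # A file is incomplete if any required label was not annotated.
    required = set(required_labels)
    incomplete = [f["name"] for f in matching if not required <= set(f["labels"])]

    return {
        "file_count": len(matching),
        "needs_more_files": len(matching) < MIN_FILES_PER_TEMPLATE,
        "files_missing_annotations": incomplete,
    }
```

Any file listed under `files_missing_annotations` is a candidate for re-annotation before retraining, since a single missed annotation can affect model output.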

How long does it take to train a model?

Training usually takes 2-8 hours, depending on the number of files and the number of models queued for training. If training is taking longer, you can choose to upgrade to a paid plan to be moved to the front of the queue and get more compute resources allocated.