Upload sample documents

Best practices and guidelines on uploading sample documents to train your model

What are sample documents

Nanonets OCR learns from the examples you provide. Collect a variety of examples that cover the different types of documents your model will encounter and make predictions on.

How to upload sample documents

  • If creating a new custom model

    1. Left panel > New Model > Create your own

    2. You will see an Upload sample files screen. Drag and drop files or click on Upload files to select files from your device.

  • If uploading files to an existing model

    1. Left panel > Active model > AI Training > Training Files

    2. You will see a Prepare model for training screen. Drag and drop files or click on Upload files to select files from your device.

  • Upload at least 10 documents. Supported formats are .JPG, .PNG, .PDF, and .TIFF.
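Before uploading, it can help to check a batch locally against the two requirements above. The following is a minimal sketch, not part of Nanonets: the function name `check_samples` and the report keys are hypothetical, and it assumes that only the four extensions listed on this page are accepted (variants such as .jpeg or .tif are treated as unsupported here, which may be stricter than the product).

```python
from pathlib import PurePath

# Formats listed on this page; matching is case-insensitive.
# Assumption: variants such as .jpeg or .tif are not listed, so they are rejected here.
SUPPORTED_EXTENSIONS = {".jpg", ".png", ".pdf", ".tiff"}
MIN_DOCUMENTS = 10  # minimum number of documents stated on this page


def check_samples(filenames):
    """Split filenames into supported and unsupported, and flag whether
    the supported ones meet the minimum document count."""
    supported, unsupported = [], []
    for name in filenames:
        if PurePath(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return {
        "supported": supported,
        "unsupported": unsupported,
        "enough": len(supported) >= MIN_DOCUMENTS,
    }


# Example with made-up filenames: two supported files, one unsupported.
report = check_samples(["invoice_01.PDF", "receipt.jpg", "notes.docx"])
print(report["unsupported"])
print("Minimum met:", report["enough"])
```

The check looks only at file extensions, so it does not verify that a file is actually readable in that format.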

As a starting point, aim for several hundred to a few thousand well-annotated training documents. The quality and diversity of the training data matter as much as the quantity. Learn more about recommended practices.

Best Practices for Sample Documents

  • Quantity: The number of documents needed for training an OCR model can vary depending on several factors, such as the complexity of the documents, the variability of the data, and the desired level of accuracy. While there is no fixed threshold, a larger quantity of diverse and representative training documents generally leads to better model performance.

  • Variation: Include enough documents to cover the scenarios and variations your OCR model is expected to encounter in real-world use, such as the different document layouts, fonts, languages, and styles relevant to your use case.
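One rough way to check variation before uploading is to count how many samples you have of each kind. The sketch below assumes, purely for illustration, that samples are kept in one subfolder per layout or source (for example `samples/vendor_a/`); the function names and the 10% threshold are hypothetical choices, not Nanonets recommendations.

```python
from collections import Counter
from pathlib import PurePath


def tally_by_group(paths):
    """Count sample documents per parent folder name."""
    return Counter(PurePath(p).parent.name for p in paths)


def underrepresented(counts, min_share=0.1):
    """Return the groups whose share of all samples is below min_share.

    min_share is an arbitrary illustrative threshold.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(group for group, n in counts.items() if n / total < min_share)


# Example with made-up paths: 19 documents from one vendor, 1 from another.
paths = [f"samples/vendor_a/{i}.pdf" for i in range(19)] + ["samples/vendor_b/0.pdf"]
counts = tally_by_group(paths)
print(dict(counts))
print(underrepresented(counts))
```

A group flagged here is a prompt to collect more examples of that kind, not a guarantee that the model will perform poorly on it.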

Frequently Asked Questions

Multi-page PDFs are split up into separate images. Will this affect results?

No. Multi-page PDFs are split into separate pages in the Training section only to make annotation easier. Results for the document are still returned as a single PDF.

How many documents are enough?

There is no fixed threshold. The number depends on the complexity of the documents, the variability of the data, and the desired level of accuracy. In general, a larger set of diverse, representative training documents leads to better model performance.

Which file formats can be uploaded?

Supported formats are .JPG, .PNG, .PDF, and .TIFF.

How do I upload more than 1,000 files at once?

Contact our team at info@nanonets or your dedicated account manager. They will help you get access to our Bulk Uploader.
