Upload sample documents
Best practices and guidelines on uploading sample documents to train your model
What are sample documents
Nanonets OCR learns from the examples you provide. Collect a variety of examples that cover the different types of documents your model will encounter and make predictions on in the future.
How to upload sample documents
If creating a new custom model
Left panel > New Model > Create your own
You will see an Upload sample files screen. Drag and drop files or click on Upload files to select files from your device.
If uploading files to an existing model
Left panel > Active model > AI Training > Training Files
You will see a Prepare model for training screen. Drag and drop files or click on Upload files to select files from your device.
Upload at least 10 documents. Supported formats are .JPG, .PNG, .PDF, and .TIFF.
As a starting point, aim for several hundred to a few thousand well-annotated training documents. Keep in mind that the quality and diversity of the training data are as important as the quantity. Learn more about recommended practices.
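The minimum document count and the supported formats mentioned above can be checked locally before you upload. The following is a minimal sketch, not part of Nanonets itself: the function name check_sample_folder and the constants are hypothetical, and the extension list contains only the four formats named on this page (variants such as .jpeg or .tif are not assumed to be accepted).

```python
from pathlib import Path

# Formats listed on this page; other spellings (.jpeg, .tif) are deliberately left out
# because this page does not say whether they are accepted.
SUPPORTED_EXTENSIONS = {".jpg", ".png", ".pdf", ".tiff"}

# Minimum number of documents this page asks for before training.
MIN_DOCUMENTS = 10


def check_sample_folder(folder):
    """Summarize whether a local folder looks ready to upload (hypothetical helper)."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    # Compare extensions case-insensitively so "scan.PDF" and "scan.pdf" both count.
    supported = [p for p in files if p.suffix.lower() in SUPPORTED_EXTENSIONS]
    unsupported = [p for p in files if p.suffix.lower() not in SUPPORTED_EXTENSIONS]
    return {
        "supported": len(supported),
        "unsupported": sorted(p.name for p in unsupported),
        "enough": len(supported) >= MIN_DOCUMENTS,
    }
```

Calling check_sample_folder("samples") on a local folder returns how many files have a supported extension, which file names would be skipped, and whether the folder meets the 10-document minimum. It checks only file names, not file contents or annotation quality.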
Best Practices for Sample Documents
Quantity: The number of documents needed for training an OCR model can vary depending on several factors, such as the complexity of the documents, the variability of the data, and the desired level of accuracy. While there is no fixed threshold, a larger quantity of diverse and representative training documents generally leads to better model performance.
Variation: Include enough documents to cover the scenarios and variations your OCR model is expected to encounter in real-world use. This can include different document layouts, fonts, languages, and styles relevant to your use case.