Upload sample documents
Best practices and guidelines on uploading sample documents to train your model
Nanonets OCR learns from the examples you provide. Collect a varied set of examples that covers the different types of documents your model will encounter and make predictions on.
- If creating a new custom model:
  1. Left panel > New Model > Create your own
  2. You will see an Upload sample files screen. Drag and drop files or click on Upload files to select files from your device.
- If uploading files to an existing model:
  1. Left panel > Active model > AI Training > Training Files
  2. You will see a Prepare model for training screen. Drag and drop files or click on Upload files to select files from your device.
- Ensure that you upload at least 10 documents. (Supported formats are .JPG, .PNG, .PDF, .TIFF)
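Before uploading, it can help to check a batch of files against the two requirements above (at least 10 documents, in a supported format). The following is a minimal local sketch, not part of Nanonets: the function name `check_upload_batch` and the constants are hypothetical, and the script only inspects file names without contacting any service.

```python
from pathlib import Path

# Formats listed on this page. Variants such as .jpeg or .tif are not
# listed here, so this sketch treats them as unsupported (an assumption;
# adjust the set if your account accepts them).
SUPPORTED_EXTENSIONS = {".jpg", ".png", ".pdf", ".tiff"}
MIN_DOCUMENTS = 10  # minimum number of documents stated on this page


def check_upload_batch(filenames):
    """Sort file names into supported/unsupported and check the minimum count."""
    supported, unsupported = [], []
    for name in filenames:
        # Compare extensions case-insensitively (.PDF and .pdf are the same format).
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(name)
        else:
            unsupported.append(name)
    return {
        "supported": supported,
        "unsupported": unsupported,
        "enough": len(supported) >= MIN_DOCUMENTS,
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_batch.py <folder>  (defaults to the current folder)
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    names = [p.name for p in folder.iterdir() if p.is_file()]
    report = check_upload_batch(names)
    print(f"Supported files: {len(report['supported'])}")
    print(f"Unsupported files: {report['unsupported']}")
    print("Ready to upload" if report["enough"] else "Add more documents")
```

Running it on a folder of samples shows which files would need converting and whether the batch reaches the 10-document minimum.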
As a starting point, aim for several hundred to a few thousand well-annotated training documents. The quality and diversity of the training data are as important as the quantity. Learn more about recommended practices.

- Quantity: The number of documents needed to train an OCR model depends on factors such as the complexity of the documents, the variability of the data, and the desired level of accuracy. There is no fixed threshold, but a larger set of diverse, representative training documents generally leads to better model performance.
- Variation: Include enough documents to cover the scenarios and variations your OCR model is expected to encounter in real-world use, such as different document layouts, fonts, languages, and styles relevant to your use case.
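One way to keep an eye on variation is to tally your samples by the properties that differ between them and look for thin groups. The sketch below is illustrative only: the labels (layout and language), the function name `underrepresented`, and the `min_per_group` cutoff are all assumptions for this example. This page states no fixed threshold, so pick a cutoff that suits your use case.

```python
from collections import Counter


def underrepresented(labels, min_per_group=10):
    """Return the groups that have fewer than min_per_group samples.

    labels is one hashable label per document, for example a
    (layout, language) tuple that you assign yourself.
    min_per_group is a hypothetical cutoff, not a Nanonets requirement.
    """
    counts = Counter(labels)
    return {group: n for group, n in counts.items() if n < min_per_group}


if __name__ == "__main__":
    # Hypothetical sample set: mostly one vendor layout in English.
    docs = (
        [("vendor_a", "en")] * 12
        + [("vendor_b", "en")] * 3
        + [("vendor_a", "de")] * 1
    )
    for group, n in underrepresented(docs, min_per_group=5).items():
        print(f"{group}: only {n} sample(s), consider collecting more")
```

Groups flagged here are candidates for collecting more samples before training, so that less common layouts or languages are not missed.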