Document Classification and Routing Model

Overview

This feature lets you automatically classify documents and route each one to the OCR model that handles its type.

For example:

  1. You want to discard all non-invoice documents sent by your users. To do this, create a document classification model and route only the invoice documents to the correct OCR model.
  2. You want to route different document types (e.g. receipts, invoices, and purchase orders) to distinct OCR models, one per document type. To do this, create a document classification model with 3 labels, one for each document type, and then select the OCR model that should process the documents under each label.
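
The routing idea in the two examples above can be pictured as a lookup from label to OCR model. The sketch below is illustrative only: Nanonets performs the routing for you once the labels are configured, and the label names and model IDs here are hypothetical placeholders, not values from the product.

```python
from typing import Dict, Optional

# Hypothetical routing table: label -> OCR model ID.
# None stands for a label whose documents should be discarded.
ROUTES: Dict[str, Optional[str]] = {
    "invoices": "INVOICE_OCR_MODEL_ID",
    "receipts": "RECEIPT_OCR_MODEL_ID",
    "purchase_orders": "PURCHASE_ORDER_OCR_MODEL_ID",
    "other": None,
}


def route(label: str) -> Optional[str]:
    """Return the OCR model ID for a label, or None if the document is dropped."""
    return ROUTES.get(label)


print(route("invoices"))  # model ID for invoices
print(route("other"))     # None: the document is discarded
```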

Creating a Document Router Model

  1. Create a Document Classification Model

    1. Head over to https://app.nanonets.com/#/
    2. Scroll down to the Document Sorting section
    3. Click on the Document Classification Model Card

  2. Create Labels

    1. Create the necessary labels

      1. For example: invoices, receipts
    2. Choose the respective OCR models that you want to route the documents to

      1. For example: the Receipt and AP Workflow pretrained models


  3. Train Model

    1. Upload 10 images to each category
      1. For example: 10 receipt images, 10 images of invoices
    2. Select Train
    3. Once the model is trained, you can upload images to classify them.

Integration

  1. Once you have created and trained your own model, you have a few options to integrate:

    1. Email - You can send documents to a dedicated email inbox. To find its address, go to https://app.nanonets.com/#/ic/extract/MODEL_ID, click the upload file button, and choose the Import via Email option. By default, every attachment sent in an email is processed and then routed to the respective OCR model.
  2. API - You can send files directly to the Document Router API, documented at https://nanonets.com/documentation/#operation/ImageCategorizationLabelFilePost
    As with email, every document sent in this POST API call is routed to the respective OCR model.

    1. Response Format
      1. The OCR model results are returned under the 'data_extraction_result' key.
      2. Their structure is the same as that of the OCR model's own results.

      {
      "message": "Success",
      "result": [{
      "message": "Success",  
      "prediction": [{  
      "label": "receipts",  
      "probability": 0.836599  
      },  
      {  
      "label": "invoices",  
      "probability": 0.16340101  
      }  
      ],  
      "file": "00ba7aad-bd43-4449-b649-add832b325ae.jpeg",  
      "page": 0,  
      "label": "receipts"  
      }],
      "signed_urls": {
      "uploadedfiles/06f748b9-350b-46b2-ac8a-dfae2298b09c/7e096fca-3315-4cf8-83cc-66641252bc5b.jpeg": {
      "original": "https://nnts.imgix.net/uploadedfiles/06f748b9-350b-46b2-ac8a-dfae2298b09c/7e096fca-3315-4cf8-83cc-66641252bc5b.jpeg?expires=1665595218&or=0&s=d8c571407eb57941f84b7e4f8abba1b2",
      "original_compressed": "https://nnts.imgix.net/uploadedfiles/06f748b9-350b-46b2-ac8a-dfae2298b09c/7e096fca-3315-4cf8-83cc-66641252bc5b.jpeg?auto=compress&expires=1665595218&or=0&s=481d4f025c664175cdd23f902efe496c",
      "thumbnail": "https://nnts.imgix.net/uploadedfiles/06f748b9-350b-46b2-ac8a-dfae2298b09c/7e096fca-3315-4cf8-83cc-66641252bc5b.jpeg?auto=compress&expires=1665595218&w=240&s=7f4323f31fa714b42f1757df50ebc50a",
      "acw_rotate_90": "https://nnts.imgix.net/uploadedfiles/06f748b9-350b-46b2-ac8a-dfae2298b09c/7e096fca-3315-4cf8-83cc-66641252bc5b.jpeg?auto=compress&expires=1665595218&or=270&s=f387fb219d5cabdcf55a2a6e4f588e7b",
      "acw_rotate_180": "https://nnts.imgix.net/uploadedfiles/06f748b9-350b-46b2-ac8a-dfae2298b09c/7e096fca-3315-4cf8-83cc-66641252bc5b.jpeg?auto=compress&expires=1665595218&or=180&s=b9f8ec62e6e91cf7c8f80d3aa024b85a",
      "acw_rotate_270": "https://nnts.imgix.net/uploadedfiles/06f748b9-350b-46b2-ac8a-dfae2298b09c/7e096fca-3315-4cf8-83cc-66641252bc5b.jpeg?auto=compress&expires=1665595218&or=90&s=97c6299c60a8456386d5b4a423a1f8ab",
      "original_with_long_expiry": "https://nnts.imgix.net/uploadedfiles/06f748b9-350b-46b2-ac8a-dfae2298b09c/7e096fca-3315-4cf8-83cc-66641252bc5b.jpeg?expires=1681132818&or=0&s=1b380a6e0a65b5edfb9c38d2fbed4d6d"
      }
      },
      "data_extraction_result": {
      "message": "Success",
      "result": [{
      "message": "Success",
      "input": "00ba7aad-bd43-4449-b649-add832b325ae.jpeg",
      "prediction": [],
      "page": 0,  
      "request_file_id": "55272e5a-44b9-4a57-9a55-11e99eee0960",  
      "filepath": "PredictionImages/",  
      "id": "9dfe75f5-4a30-11ed-8bda-96810894b27e",  
      "rotation": 0,  
      "file_url": "uploadedfiles/09d205e6-7283-44be-a360-8428b410233a/RawPredictions/00ba7aad-bd43-4449-b649-add832b325ae-2022-10-12T13-20-18.811.jpeg",  
      "request_metadata": ""  
      }],
      "signed_urls": {
      "PredictionImages/": {
      "original": "https://nnts.imgix.net/PredictionImages/?expires=1665595219&or=0&s=cb21580e3ff217d04eac31421505215b",
      "original_compressed": "https://nnts.imgix.net/PredictionImages/?auto=compress&expires=1665595219&or=0&s=6d8a3328edc6646279b7b579d12354a8",
      "thumbnail": "https://nnts.imgix.net/PredictionImages/?auto=compress&expires=1665595219&w=240&s=d2323f8f0f5f0ed1b6e0e37ef87f5fc3",
      "acw_rotate_90": "https://nnts.imgix.net/PredictionImages/?auto=compress&expires=1665595219&or=270&s=1e75b08dfa73df680eb09f640081c17f",
      "acw_rotate_180": "https://nnts.imgix.net/PredictionImages/?auto=compress&expires=1665595219&or=180&s=931f27a6d764cc548d897e07b013d23d",
      "acw_rotate_270": "https://nnts.imgix.net/PredictionImages/?auto=compress&expires=1665595219&or=90&s=30915233f2b024fb6954d50565e4889b",
      "original_with_long_expiry": "https://nnts.imgix.net/PredictionImages/?expires=1681132819&or=0&s=5b112c38944b793a22cb191943be68b7"
      },
      "uploadedfiles/09d205e6-7283-44be-a360-8428b410233a/RawPredictions/00ba7aad-bd43-4449-b649-add832b325ae-2022-10-12T13-20-18.811.jpeg": {
      "original": "https://nanonets.s3.us-west-2.amazonaws.com/uploadedfiles/09d205e6-7283-44be-a360-8428b410233a/RawPredictions/00ba7aad-bd43-4449-b649-add832b325ae-2022-10-12T13-20-18.811.jpeg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA5F4WPNNTLX3QHN4W%2F20221012%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20221012T132019Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&response-cache-control=no-cache&X-Amz-Signature=639c02ffa9af432ac731ad48c0d68c28f7ad068cd65a0d05282ad45c494fad30",
      "original_compressed": "",  
      "thumbnail": "",  
      "acw_rotate_90": "",  
      "acw_rotate_180": "",  
      "acw_rotate_270": "",  
      "original_with_long_expiry": ""  
      }  
      }  
      }  
      }
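
As a rough illustration of how the response above might be consumed, the sketch below pulls the winning label for each page and the routed OCR model's output from an already-parsed response. The function names are hypothetical, and the field layout is assumed only from the sample response shown here, so treat it as a starting point rather than a reference client.

```python
from typing import Any, Dict, List


def summarize_router_response(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one summary per classified page: file, page, label, confidence."""
    summaries = []
    for page in response.get("result") or []:
        predictions = page.get("prediction") or []
        # Confidence of the winning label, if it is listed among the predictions.
        confidence = next(
            (p["probability"] for p in predictions if p.get("label") == page.get("label")),
            None,
        )
        summaries.append({
            "file": page.get("file"),
            "page": page.get("page"),
            "label": page.get("label"),
            "confidence": confidence,
        })
    return summaries


def extracted_pages(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the routed OCR model's per-page results, or [] if none are present."""
    extraction = response.get("data_extraction_result") or {}
    return extraction.get("result") or []


# Trimmed copy of the sample response (signed URLs and most fields omitted).
sample = {
    "message": "Success",
    "result": [{
        "message": "Success",
        "prediction": [
            {"label": "receipts", "probability": 0.836599},
            {"label": "invoices", "probability": 0.16340101},
        ],
        "file": "00ba7aad-bd43-4449-b649-add832b325ae.jpeg",
        "page": 0,
        "label": "receipts",
    }],
    "data_extraction_result": {
        "message": "Success",
        "result": [{
            "input": "00ba7aad-bd43-4449-b649-add832b325ae.jpeg",
            "prediction": [],
            "page": 0,
        }],
    },
}

for row in summarize_router_response(sample):
    print(row["file"], row["label"], row["confidence"])
print(len(extracted_pages(sample)), "page(s) returned by the OCR model")
```

Reading with `.get(...) or []` keeps the sketch tolerant of responses in which a key is missing or empty, for example when a document was classified but not sent to any OCR model.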
      

Behaviors and Caveats

  1. To discard documents of a given type, use the Do Nothing option, available in Step #1 while creating the model.

  2. When files are uploaded to the Document Classification Model via email, each document is automatically routed to the OCR model. We recommend using the webhook export in the OCR model to retrieve the results.

  3. The Document Classification results are also returned by the OCR model's GET APIs

    1. The classification data is present in the page_classification_results key

    {
    "moderated_images_count": 1,
    "unmoderated_images_count": 1,
    "moderated_images": [{
    "model_id": "05618833-0469-4fd2-a5fc-8e4a61e64486",
    "day_since_epoch": 19200,
    "is_moderated": true,
    "hour_of_day": 13,
    "id": "bfcf1e47-0dad-11ed-8ede-3200f1e279c7",
    "url": "uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/85975318-3bf7-4c0b-aec1-908f616b0192.jpeg",
    
    "predicted_boxes": [],
    "moderated_boxes": [],
    "page_classification_results": {
    "message": "Success",
    "model_id": "00000000-0000-0000-0000-000000000000",
    "result": [{
    "file": "string",
    "message": "Success",
    "page": 0,
    "prediction": [{
    "label": "category1",
    "probability": 0.9
    },
    {
    "label": "category2",
    "probability": 0.1
    }
    ]
    }]
    },
    "size": {
    "width": 2380,
    "height": 3368
    },
    "page": 0,
    "request_file_id": "bfcf1e53-0dad-11ed-8edf-3200f1e279c7",
    "original_file_name": "SAMPLE-Passed.jpg",
    "custom_response": null,
    "assigned_member": "",
    "is_deleted": false,
    "source": "api",
    "no_of_fields": 30,
    "cost": 0.3,
    "payable_cost": 0,
    "status": "success",
    "export_status": "",
    "retries": 0,
    "rotation": 0,
    "updated_at": "88ce6249-0e89-11ed-b146-3a9bc324f25c",
    "verified_at": "88ce6237-0e89-11ed-b145-3a9bc324f25c",
    "verified_by": "[email protected]",
    "current_stage_id": "ffffffff-ffff-ffff-ffff-ffffffffffff",
    "uploaded_by": "[email protected]",
    "upload_channel": "ui",
    "file_url": "uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/RawPredictions/SAMPLE-Passed-2022-07-07T10-46-23.132.jpg",
    "request_metadata": "",
    "raw_ocr": [],
    "delay_post_prediction_tasks": false
    }],
    "unmoderated_images": [{
    "model_id": "05618833-0469-4fd2-a5fc-8e4a61e64486",
    "day_since_epoch": 19200,
    "is_moderated": false,
    "hour_of_day": 13,
    "id": "bfd084fc-0dad-11ed-8ee0-3200f1e279c7",
    "url": "uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/ae861e25-3fe5-4727-9287-7ec166a3522b.jpeg",
    "predicted_boxes": [],
    "moderated_boxes": [],
    "page_classification_results": {
    "message": "Success",
    "model_id": "00000000-0000-0000-0000-000000000000",
    "result": [{
    "file": "string",
    "message": "Success",
    "page": 0,
    "prediction": [{
    "label": "category1",
    "probability": 0.9
    },
    {
    "label": "category2",
    "probability": 0.1
    }
    ]
    }]
    },
    "size": {
    "width": 595,
    "height": 842
    },
    "page": 0,
    "request_file_id": "bfd08506-0dad-11ed-8ee1-3200f1e279c7",
    "original_file_name": "SAMPLE-Flagged.jpg",
    "custom_response": null,
    "assigned_member": "[email protected]",
    "is_deleted": false,
    "source": "api",
    "no_of_fields": 30,
    "cost": 0.3,
    "payable_cost": 0,
    "status": "success",
    "export_status": "",
    "retries": 0,
    "rotation": 0,
    "updated_at": "bfd084fc-0dad-11ed-8ee0-3200f1e279c7",
    "verified_at": "bfd084fc-0dad-11ed-8ee0-3200f1e279c7",
    "verified_by": "",
    "current_stage_id": "f5934a81-d6a6-42fe-a130-246ce1e338d3",
    "uploaded_by": "[email protected]",
    "upload_channel": "ui",
    "file_url": "uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/RawPredictions/SAMPLE-Flagged-2022-07-07T10-46-22.158.jpg",
    "request_metadata": "",
    "raw_ocr": [],
    "delay_post_prediction_tasks": false
    }],
    "signed_urls": {
    "uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/85975318-3bf7-4c0b-aec1-908f616b0192.jpeg": {
    "original": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/85975318-3bf7-4c0b-aec1-908f616b0192.jpeg?expires=1659968049&or=0&s=17e5b56292fbe00cd7277326899a13c8",
    "original_compressed": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/85975318-3bf7-4c0b-aec1-908f616b0192.jpeg?auto=compress&expires=1659968049&or=0&s=dd77542ea92d42714da2f1e2922493f1",
    "thumbnail": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/85975318-3bf7-4c0b-aec1-908f616b0192.jpeg?auto=compress&expires=1659968049&w=240&s=2e6c3bf0fd613cab82f0489dd84193e1",
    "acw_rotate_90": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/85975318-3bf7-4c0b-aec1-908f616b0192.jpeg?auto=compress&expires=1659968049&or=270&s=d656532d0fbf471d7ebf14d259119679",
    "acw_rotate_180": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/85975318-3bf7-4c0b-aec1-908f616b0192.jpeg?auto=compress&expires=1659968049&or=180&s=9b6a07be6961ee768c4c2466d84e555b",
    "acw_rotate_270": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/85975318-3bf7-4c0b-aec1-908f616b0192.jpeg?auto=compress&expires=1659968049&or=90&s=70d6639f21150c690ce6139119a8a05a",
    "original_with_long_expiry": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/85975318-3bf7-4c0b-aec1-908f616b0192.jpeg?expires=1675505649&or=0&s=448c749daaec494a05f6570d0d88361d"
    },
    "uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/ae861e25-3fe5-4727-9287-7ec166a3522b.jpeg": {
    "original": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/ae861e25-3fe5-4727-9287-7ec166a3522b.jpeg?expires=1659968049&or=0&s=fa7d37cb7e1c6a8e9cd9e9db9cdf52ea",
    "original_compressed": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/ae861e25-3fe5-4727-9287-7ec166a3522b.jpeg?auto=compress&expires=1659968049&or=0&s=ccc2ee9d540cffee6f614466cc6c216f",
    "thumbnail": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/ae861e25-3fe5-4727-9287-7ec166a3522b.jpeg?auto=compress&expires=1659968049&w=240&s=c5cfd1e3893ab3379783bdb3970a3ab8",
    "acw_rotate_90": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/ae861e25-3fe5-4727-9287-7ec166a3522b.jpeg?auto=compress&expires=1659968049&or=270&s=423bb22e9aa01ebcf2b2a51ad2ba3895",
    "acw_rotate_180": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/ae861e25-3fe5-4727-9287-7ec166a3522b.jpeg?auto=compress&expires=1659968049&or=180&s=eb614306c30f1f76741ba8a819261bb0",
    "acw_rotate_270": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/ae861e25-3fe5-4727-9287-7ec166a3522b.jpeg?auto=compress&expires=1659968049&or=90&s=385ce16f0c75a4c1fce987310191dfab",
    "original_with_long_expiry": "https://nnts.imgix.net/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/PredictionImages/ae861e25-3fe5-4727-9287-7ec166a3522b.jpeg?expires=1675505649&or=0&s=b71cbea278a03bfe0e27a417d8e49a31"
    },
    "uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/RawPredictions/SAMPLE-Flagged-2022-07-07T10-46-22.158.jpg": {
    "original": "https://nanonets.s3.us-west-2.amazonaws.com/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/RawPredictions/SAMPLE-Flagged-2022-07-07T10-46-22.158.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA5F4WPNNTLX3QHN4W%2F20220808%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20220808T101409Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&response-cache-control=no-cache&X-Amz-Signature=1f0331d94c1e8b87497a19d324211af01565e443f461febeaa5b16f34bd6eb1d",
    "original_compressed": "",
    "thumbnail": "",
    "acw_rotate_90": "",
    "acw_rotate_180": "",
    "acw_rotate_270": "",
    "original_with_long_expiry": ""
    },
    "uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/RawPredictions/SAMPLE-Passed-2022-07-07T10-46-23.132.jpg": {
    "original": "https://nanonets.s3.us-west-2.amazonaws.com/uploadedfiles/415f8096-f114-41f8-a167-cc3c5b7fdd13/RawPredictions/SAMPLE-Passed-2022-07-07T10-46-23.132.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA5F4WPNNTLX3QHN4W%2F20220808%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20220808T101409Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&response-cache-control=no-cache&X-Amz-Signature=6c31d04733e35297cb6c0bac5b9736eb69e316f143b516d1726c1a11dcb2a934",
    "original_compressed": "",
    "thumbnail": "",
    "acw_rotate_90": "",
    "acw_rotate_180": "",
    "acw_rotate_270": "",
    "original_with_long_expiry": ""
    }
    }
    }
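
A minimal sketch of reading the classification data out of such a GET response follows. It assumes the response has already been parsed into a dictionary and that it follows the layout of the sample above; the function name `top_labels` is hypothetical.

```python
from typing import Any, Dict, List


def top_labels(ocr_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the highest-probability label for every classified page."""
    rows = []
    # Classification data appears under both moderated and unmoderated images.
    for bucket in ("moderated_images", "unmoderated_images"):
        for image in ocr_response.get(bucket) or []:
            classification = image.get("page_classification_results") or {}
            for page in classification.get("result") or []:
                predictions = page.get("prediction") or []
                if not predictions:
                    continue  # nothing to report for this page
                best = max(predictions, key=lambda p: p["probability"])
                rows.append({
                    "original_file_name": image.get("original_file_name"),
                    "page": page.get("page"),
                    "label": best["label"],
                    "probability": best["probability"],
                })
    return rows


# Trimmed copy of the sample response (one moderated image, most fields omitted).
sample = {
    "moderated_images": [{
        "original_file_name": "SAMPLE-Passed.jpg",
        "page_classification_results": {
            "message": "Success",
            "result": [{
                "page": 0,
                "prediction": [
                    {"label": "category1", "probability": 0.9},
                    {"label": "category2", "probability": 0.1},
                ],
            }],
        },
    }],
    "unmoderated_images": [],
}

for row in top_labels(sample):
    print(row["original_file_name"], row["label"], row["probability"])
```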