Get Prediction File By Page ID

You can test this API on this page using your API key. First generate an API key, then enter the model_id and page_id in the parameter boxes below. Once all parameters are filled in, click the “Try It!” button in the right-side panel; the response appears in the response box in the same panel. You can get the page_id from the file_url.

Basic Structure:

message: Indicates the success status of the API call.
result: An array of objects, each representing the prediction details for a specific file page.
signed_urls: An object containing signed URLs for different versions of the images. The original URLs are valid for 4 hours, and the original URLs with long expiry are valid for 180 days.

{
    "message": "Success",
    "result": [
        {
            "message": "success",
            "input": "test2_file.pdf",
            "prediction": [
            ],
            "predicted_boxes": [
            ],
            "moderated_boxes": [
            ],
            "raw_ocr": [
            ]
        }
    ],
    "signed_urls": {
    }
}
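
The top-level structure above can be read with a few lines of Python. This is a minimal sketch that assumes the response body has already been fetched and is available as a JSON string. The helper name `iter_pages` and the decision to raise `ValueError` on a non-success message are illustrative assumptions, not part of the API.

```python
import json

# Skeleton response copied from the structure above (arrays left empty).
SAMPLE_RESPONSE = """
{
    "message": "Success",
    "result": [
        {
            "message": "success",
            "input": "test2_file.pdf",
            "prediction": [],
            "predicted_boxes": [],
            "moderated_boxes": [],
            "raw_ocr": []
        }
    ],
    "signed_urls": {}
}
"""


def iter_pages(body: str):
    """Yield each page-level object from the response's "result" array."""
    data = json.loads(body)
    # The sample shows "Success" at the top level and "success" per page,
    # so compare case-insensitively (an assumption about other responses).
    if data.get("message", "").lower() != "success":
        raise ValueError(f"API call failed: {data.get('message')!r}")
    for page in data.get("result", []):
        yield page


for page in iter_pages(SAMPLE_RESPONSE):
    print(page["input"], page["message"], len(page["prediction"]))
```

Checking the top-level message before touching `result` keeps a failed call from being mistaken for a file with no predictions.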

Description of each field in the JSON response:

message: Represents the success status of the page-level prediction.
input: This is the name of the file for which you fetched predictions using the API.
prediction: An array containing initial predictions from the model for different fields and tables in the file.
predicted_boxes: An array of predicted data boxes, reflecting the model’s initial predictions.
moderated_boxes: An array of modified data boxes, representing final processed results after human review or additional processing.
page: The page number in the document where the label is located, with 0 representing the first page and so on.
day_since_epoch: The number of days since January 1, 1970 (GMT), representing the upload date of the file.
request_file_id: The unique identifier of the file you uploaded to the model for prediction. You can find this ID on the extract data page of the model for each file.
id: This is the unique identifier for this specific page’s prediction result.
is_moderated: A boolean value indicating whether the file is approved.
- 'TRUE' means the file is approved.
- 'FALSE' means the file is rejected or not yet approved.
updated_at: Timestamp(UUID) indicating when the page details were last updated.
model_id: Represents the specific model_id used for making predictions.
size: Dimensions of the page used in the prediction (width and height).
original_file_name: The original name of the uploaded file.
no_of_fields: Number of fields detected and processed in the file.
export_status: Indicates if the data was successfully exported or not.
current_stage_id: Unique identifier of the current processing stage (e.g., “ffffffff-ffff-ffff-ffff-ffffffffffff” represents the approved stage).
raw_ocr: Array of raw OCR results, including text data extracted before any modification/correction.
approval_status: The approval status of the file. It is blank if the file has been neither approved nor rejected, “approved” if the file is approved, and “rejected” if the file is rejected.
assigned_members: The list of user emails assigned to the file for review or approval.
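
Two of the page-level fields above usually need a small conversion before they are useful: day_since_epoch and approval_status. The sketch below shows one way to do this in Python. The helper names, the sample values, and the "pending" label for a blank approval_status are assumptions made for illustration and do not come from the API.

```python
from datetime import date, timedelta


def upload_date(day_since_epoch: int) -> date:
    """Convert a day_since_epoch value to a calendar date (GMT)."""
    return date(1970, 1, 1) + timedelta(days=day_since_epoch)


def review_state(page: dict) -> str:
    """Map approval_status to a readable state.

    A blank or missing value means the file has been neither approved
    nor rejected; "pending" is our own label for that case.
    """
    status = (page.get("approval_status") or "").strip().lower()
    return status if status in ("approved", "rejected") else "pending"


# Hypothetical page object: the values are made up for illustration.
page = {"day_since_epoch": 19723, "approval_status": "", "is_moderated": False}
print(upload_date(page["day_since_epoch"]))  # 2024-01-01
print(review_state(page))                    # pending
```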

Inside "prediction", "predicted_boxes", "moderated_boxes", and "raw_ocr" Arrays

Each entry within these arrays includes:

id: Unique identifier of the bounding box.
label: The label name, which corresponds to the field or table header as configured in the model.
xmin, ymin, xmax, ymax: The minimum and maximum x- and y-coordinates of the bounding box used to predict the value for the specified label.
score: The model's confidence in this prediction. It is a numerical value, usually between 0 and 1, indicating the probability that the prediction is correct. A higher score means greater confidence in the prediction's accuracy.
ocr_text: The predicted value associated with the label.
status: The status of the predicted label; it is always correctly_predicted.
validation_status: The status of the label based on the validation rules configured in the workflow section, indicating whether the initial prediction for the field passed validation checks. Possible values are "success" if all checks passed, or "failed" if any validation failed.
validation_message: The specific reason a label failed validation. It is only populated if the validation_status for the label is "failed". For instance, "Content Length is greater than or equal to 2" could be a reason indicating that the label value or ocr_text did not meet the content length criteria set by the validation rules.
type: Indicates whether the label is a field or a table header. Possible values are field and table.
page: The page number in the document where the label is located, with 0 representing the first page and so on.
label_id: A unique identifier associated with each label as defined in the model.
cells: An array of cell predictions within the identified table.
- id: This is the unique identifier for the cell prediction.
- row, col: The row and column numbers where the cell is located.
- row_span, col_span: The number of rows and columns the cell spans.
- label: The label name, which corresponds to the table header as configured in the model.
- xmin, ymin, xmax, ymax: The minimum and maximum x- and y-coordinates of the bounding box used to predict the value for the specified label.
- score: The model's confidence in this prediction. It is a numerical value, usually between 0 and 1, indicating the probability that the prediction is correct. A higher score means greater confidence in the prediction's accuracy.
- text: The predicted value associated with the cell within the table.
- verification_status: Indicates the moderation status of a document or a specific field. It can take the values "moderated" if the item has been manually reviewed and modified as necessary, or "correctly_predicted" if the model's initial prediction was accurate and required no further modifications. If the file is approved, the verification_status inside moderated_boxes for each cell changes to "moderated".
- status: The status of the cell based on the validation rules configured in the workflow section, indicating whether the initial prediction for the cell passed validation checks. Possible values are "success" if all checks passed, or "failed" if any validation failed.
- failed_validation: The specific reason a cell within the table failed validation. It is only populated if the status for the cell is "failed". For instance, "Content Length is greater than or equal to 2" could be a reason indicating that the cell content did not meet the content length criteria set by the validation rules.
- label_id: A unique identifier associated with each label as defined in the model.
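
The entries described above can be flattened into simpler structures for downstream use. The following is a minimal Python sketch, assuming entries shaped as in the field descriptions. The helper names and all sample values (labels, text, scores) are hypothetical, and the sample includes only the keys the helpers read.

```python
from collections import defaultdict


def field_values(entries: list) -> dict:
    """Map each field label to its ocr_text, skipping table entries."""
    return {
        e["label"]: e.get("ocr_text", "")
        for e in entries
        if e.get("type") == "field"
    }


def failed_fields(entries: list) -> list:
    """Return (label, validation_message) for fields that failed validation."""
    return [
        (e["label"], e.get("validation_message", ""))
        for e in entries
        if e.get("type") == "field" and e.get("validation_status") == "failed"
    ]


def table_rows(table: dict) -> list:
    """Rebuild a table entry's cells into rows of {column label: text}."""
    rows = defaultdict(dict)
    for cell in table.get("cells", []):
        rows[cell["row"]][cell["label"]] = cell.get("text", "")
    # Sort by row number so the result does not depend on cell order.
    return [rows[r] for r in sorted(rows)]


# Hypothetical entries: labels and values are made up for illustration.
entries = [
    {"type": "field", "label": "invoice_number", "ocr_text": "INV-001",
     "score": 0.98, "validation_status": "success", "validation_message": ""},
    {"type": "field", "label": "currency", "ocr_text": "U",
     "score": 0.41, "validation_status": "failed",
     "validation_message": "Content Length is greater than or equal to 2"},
    {"type": "table", "label": "", "cells": [
        {"row": 1, "col": 1, "label": "description", "text": "Widget"},
        {"row": 1, "col": 2, "label": "amount", "text": "10.00"},
        {"row": 2, "col": 1, "label": "description", "text": "Gadget"},
        {"row": 2, "col": 2, "label": "amount", "text": "5.50"},
    ]},
]

print(field_values(entries))
print(failed_fields(entries))
print(table_rows(entries[2]))
```

The same helpers can be applied to either predicted_boxes or moderated_boxes, depending on whether you want the model's initial output or the reviewed result.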