Introduction
Nanonets provides an AI-driven Intelligent Document Processing API that transforms unstructured documents into structured data. Our advanced OCR and document data extraction capabilities enable you to:
- Extract structured data from various document types (invoices, receipts, forms, etc.)
- Convert unstructured text into organized, machine-readable formats
- Process and analyze document content with high accuracy
- Automate document workflows and data entry tasks
Key Features
- Advanced OCR & Data Extraction: Extract text, fields, and tables from documents with high accuracy
- Unstructured to Structured Data: Transform raw document content into organized, structured formats
- Workflow Automation: Approve or reject extracted results and assign files for review
- External Integrations: Seamlessly import documents from various sources and export data to business applications
Getting Started
This guide will help you get started with the Nanonets API quickly.
Prerequisites
- A Nanonets account
- An API key (get it from http://app.nanonets.com/#/keys)
- Basic knowledge of REST APIs
- Your preferred programming language (Python, JavaScript, etc.)
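The examples in this guide authenticate with HTTP Basic auth, passing the API key as the username and leaving the password empty. As a rough illustration of what that puts on the wire, the sketch below builds the equivalent header by hand. The helper name `basic_auth_header` is made up for this example; in practice `HTTPBasicAuth` or `curl -u` does this for you.

```python
import base64


def basic_auth_header(api_key: str) -> dict:
    """Build an HTTP Basic Authorization header for an API key.

    The key is the username and the password is empty, so the encoded
    credential string is "<api_key>:" (note the trailing colon).
    """
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# 'YOUR_API_KEY' is a placeholder, as in the rest of this guide.
headers = basic_auth_header("YOUR_API_KEY")
print(headers)
```

The trailing colon is why the cURL examples write `-u "YOUR_API_KEY:"` rather than the bare key.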
Quick Start with the REST API
1. Create Instant Learning Workflow
import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
url = "https://app.nanonets.com/api/v4/workflows"

# Create instant learning workflow (default)
payload = {
    "description": "Extract data from custom documents",
    "workflow_type": ""  # Empty string for instant learning workflow
}

response = requests.post(
    url,
    json=payload,
    auth=HTTPBasicAuth(API_KEY, '')
)

# Keep the parsed response: workflow['id'] is needed in the next steps
workflow = response.json()
print(workflow)
const axios = require('axios');

const API_KEY = 'YOUR_API_KEY';
const url = "https://app.nanonets.com/api/v4/workflows";

// Create instant learning workflow (default)
const payload = {
  description: "Extract data from custom documents",
  workflow_type: "" // Empty string for instant learning workflow
};

axios.post(url, payload, {
  auth: {
    username: API_KEY,
    password: ''
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });
curl -X POST \
  -u "YOUR_API_KEY:" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Extract data from custom documents",
    "workflow_type": ""
  }' \
  https://app.nanonets.com/api/v4/workflows
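The later steps need the ID of the workflow created here. A minimal sketch of pulling it out of the parsed response follows. It assumes the response body carries the ID under an `id` key, which is what the next step's code implies. The helper name and the sample body are hypothetical.

```python
def workflow_id_from_response(body: dict) -> str:
    """Return the workflow ID from a parsed create-workflow response.

    Assumes the ID lives under the "id" key; raises if it is missing so
    a failed request is caught before the next step builds a bad URL.
    """
    workflow_id = body.get("id")
    if not workflow_id:
        raise ValueError(f"no workflow id in response: {body!r}")
    return str(workflow_id)


# Hypothetical response body, for illustration only.
sample_body = {"id": "example-workflow-id", "description": "Extract data from custom documents"}
print(workflow_id_from_response(sample_body))
```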
2. Configure Fields and Tables to Extract
import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = 'YOUR_WORKFLOW_ID'  # The "id" returned when the workflow was created
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/fields"

# Fields and table columns to extract
payload = {
    "fields": [
        {"name": "invoice_number"},
        {"name": "total_amount"},
        {"name": "invoice_date"}
    ],
    "table_headers": [
        {"name": "item_description"},
        {"name": "quantity"},
        {"name": "unit_price"},
        {"name": "total"}
    ]
}

response = requests.put(
    url,
    json=payload,
    auth=HTTPBasicAuth(API_KEY, '')
)
print(response.json())
const axios = require('axios');

const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = 'YOUR_WORKFLOW_ID'; // The "id" returned when the workflow was created
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/fields`;

// Fields and table columns to extract
const payload = {
  fields: [
    { name: "invoice_number" },
    { name: "total_amount" },
    { name: "invoice_date" }
  ],
  table_headers: [
    { name: "item_description" },
    { name: "quantity" },
    { name: "unit_price" },
    { name: "total" }
  ]
};

axios.put(url, payload, {
  auth: {
    username: API_KEY,
    password: ''
  }
})
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.error(error);
  });
curl -X PUT \
  -u "YOUR_API_KEY:" \
  -H "Content-Type: application/json" \
  -d '{
    "fields": [
      { "name": "invoice_number" },
      { "name": "total_amount" },
      { "name": "invoice_date" }
    ],
    "table_headers": [
      { "name": "item_description" },
      { "name": "quantity" },
      { "name": "unit_price" },
      { "name": "total" }
    ]
  }' \
  https://app.nanonets.com/api/v4/workflows/YOUR_WORKFLOW_ID/fields
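When the list of fields comes from configuration rather than being typed inline, the payload above can be generated from plain lists of names. The sketch below does that. `build_fields_payload` is a hypothetical helper, and the blank-name check is a local sanity check rather than a documented API rule.

```python
import json


def build_fields_payload(field_names, table_header_names):
    """Build the {"fields": [...], "table_headers": [...]} request body."""

    def as_entries(names):
        cleaned = [name.strip() for name in names]
        if any(not name for name in cleaned):
            raise ValueError("field and table header names must be non-empty")
        return [{"name": name} for name in cleaned]

    return {
        "fields": as_entries(field_names),
        "table_headers": as_entries(table_header_names),
    }


payload = build_fields_payload(
    ["invoice_number", "total_amount", "invoice_date"],
    ["item_description", "quantity", "unit_price", "total"],
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary has the same shape as the hand-written payload in this step and can be passed as `json=payload` to the same request.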
3. Process Document
import requests
from requests.auth import HTTPBasicAuth

API_KEY = 'YOUR_API_KEY'
WORKFLOW_ID = 'YOUR_WORKFLOW_ID'  # The "id" returned when the workflow was created
url = f"https://app.nanonets.com/api/v4/workflows/{WORKFLOW_ID}/documents"

# Process a document (the with-block closes the file after the upload)
with open('invoice.pdf', 'rb') as f:
    response = requests.post(url, files={'file': f}, auth=HTTPBasicAuth(API_KEY, ''))
result = response.json()

# Get results
if result['status'] == 'completed':
    # Access extracted fields
    invoice_number = result['data']['fields']['invoice_number'][0]['value']
    total_amount = result['data']['fields']['total_amount'][0]['value']
    invoice_date = result['data']['fields']['invoice_date'][0]['value']
    print(f"Invoice Number: {invoice_number}")
    print(f"Total Amount: {total_amount}")
    print(f"Invoice Date: {invoice_date}")

    # Access extracted tables
    for table in result['data']['tables']:
        print(f"\nTable: {table['name']}")
        for cell in table['cells']:
            print(f"Row {cell['row']}, Col {cell['col']}: {cell['text']}")
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const API_KEY = 'YOUR_API_KEY';
const WORKFLOW_ID = 'YOUR_WORKFLOW_ID'; // The "id" returned when the workflow was created
const url = `https://app.nanonets.com/api/v4/workflows/${WORKFLOW_ID}/documents`;

// Process a document
const formData = new FormData();
formData.append('file', fs.createReadStream('invoice.pdf'));

axios.post(url, formData, {
  auth: {
    username: API_KEY,
    password: ''
  },
  headers: {
    ...formData.getHeaders()
  }
})
  .then(response => {
    const result = response.data;
    if (result.status === 'completed') {
      // Access extracted fields
      const invoiceNumber = result.data.fields['invoice_number'][0].value;
      const totalAmount = result.data.fields['total_amount'][0].value;
      const invoiceDate = result.data.fields['invoice_date'][0].value;
      console.log(`Invoice Number: ${invoiceNumber}`);
      console.log(`Total Amount: ${totalAmount}`);
      console.log(`Invoice Date: ${invoiceDate}`);

      // Access extracted tables
      for (const table of result.data.tables) {
        console.log(`\nTable: ${table.name}`);
        for (const cell of table.cells) {
          console.log(`Row ${cell.row}, Col ${cell.col}: ${cell.text}`);
        }
      }
    }
  })
  .catch(error => {
    console.error(error);
  });
curl -X POST \
  "https://app.nanonets.com/api/v4/workflows/YOUR_WORKFLOW_ID/documents" \
  -u "YOUR_API_KEY:" \
  -F "file=@invoice.pdf"
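The table loop in the examples above prints one cell at a time. If rows are more convenient, the flat cell list can be regrouped as sketched below. This assumes each cell has the `row`, `col`, and `text` keys used in those examples. The helper name and the sample cells are made up for illustration.

```python
from collections import defaultdict


def cells_to_rows(cells):
    """Group flat cells into a list of rows, ordered by row then column.

    Missing cells are not padded, so rows can differ in length.
    """
    grid = defaultdict(dict)
    for cell in cells:
        grid[cell["row"]][cell["col"]] = cell["text"]
    return [
        [columns[col] for col in sorted(columns)]
        for _, columns in sorted(grid.items())
    ]


# Made-up cells in the shape used by the examples above.
sample_cells = [
    {"row": 2, "col": 2, "text": "3"},
    {"row": 1, "col": 1, "text": "item_description"},
    {"row": 1, "col": 2, "text": "quantity"},
    {"row": 2, "col": 1, "text": "Widget"},
]

for row in cells_to_rows(sample_cells):
    print(row)
```

Each table in `result['data']['tables']` could be passed through this helper as `cells_to_rows(table['cells'])`.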
Best Practices
- Error Handling
  - Always check response status codes
  - Implement retry logic for rate limits
- Security
  - Store API keys securely
  - Use environment variables
- Performance
  - Use async processing for large files
  - Monitor API usage
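The error-handling and security points above can be sketched in a few lines. This is an illustration under stated assumptions, not the official client behavior: the environment variable name `NANONETS_API_KEY` is hypothetical, and HTTP 429 is assumed to be the rate-limit status because that is the standard code for it. The `send` argument is any zero-argument callable that returns an object with a `status_code` attribute, such as a `requests` response.

```python
import os
import time


def load_api_key(env_var="NANONETS_API_KEY"):
    """Read the API key from the environment instead of hard-coding it.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set the {env_var} environment variable")
    return key


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    (exponential backoff). Returns the last response either way, so the
    caller should still check its status code.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:  # assumed rate-limit status
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

With `requests`, a call might look like `call_with_retry(lambda: requests.get(url, auth=HTTPBasicAuth(load_api_key(), '')))`. Passing the request as a callable keeps the retry logic independent of any one endpoint.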