Public API
API documentation
Integrate SotaOCR into your AI agents and LLM pipelines. The API is asynchronous: upload a document, poll job status, then fetch the final result.
Authentication
All API requests require Bearer token authentication. You can create an API key in your dashboard.
Header
Authorization: Bearer YOUR_API_KEY
Rate limits and polling
Poll job status no more than once per second. If you exceed this limit, the API returns 429 Too Many Requests.
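The polling rule above can be sketched as a small loop that waits at least one second between status checks and slows down after a 429. This is a minimal sketch, not an official client: `fetch_status` is a hypothetical callable you supply that performs the status request and returns the HTTP status code with the parsed JSON body, and the doubling backoff, the 30-second cap, and the handling of statuses other than pending, running, and completed are assumptions.

```python
import time
from typing import Callable, Tuple

MIN_POLL_INTERVAL = 1.0  # seconds; documented limit is one poll per second


def poll_job(
    fetch_status: Callable[[], Tuple[int, dict]],
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 600,
) -> dict:
    """Poll until the job reports status "completed" and return its body.

    fetch_status is a hypothetical callable returning (http_status, json_body)
    for one status request.
    """
    delay = MIN_POLL_INTERVAL
    for _ in range(max_polls):
        http_status, body = fetch_status()
        if http_status == 429:
            # Assumption: back off by doubling the wait, capped at 30 seconds.
            delay = min(delay * 2, 30.0)
        elif http_status == 200:
            delay = MIN_POLL_INTERVAL
            status = body.get("status")
            if status == "completed":
                return body
            if status not in ("pending", "running"):
                # Assumption: any other status is terminal (for example a failure).
                raise RuntimeError(f"job ended with status {status!r}")
        else:
            raise RuntimeError(f"unexpected HTTP status {http_status}")
        sleep(delay)
    raise TimeoutError("job did not complete within max_polls attempts")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting; in normal use the default `time.sleep` applies.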
1. Upload document
POST /v1/extract
Uploads a PDF or image for OCR. You can optionally limit processing to specific pages.
- file: Document file (PDF, PNG, JPG).
- page_ranges: (Optional) JSON string with an array of page ranges. Example: '[{"start":1,"end":3}]'
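Because page_ranges is a JSON string rather than a nested form field, it can help to build it programmatically. The helper below is a minimal sketch; `encode_page_ranges` is a hypothetical name, and the validation assumes pages are numbered from 1 and that `end` is not smaller than `start`, which the documentation above does not state explicitly.

```python
import json
from typing import Iterable, Tuple


def encode_page_ranges(ranges: Iterable[Tuple[int, int]]) -> str:
    """Serialize (start, end) pairs into the JSON string used for page_ranges."""
    items = []
    for start, end in ranges:
        # Assumption: pages are 1-based and end must not precede start.
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: ({start}, {end})")
        items.append({"start": start, "end": end})
    # Compact separators reproduce the spacing of the documented example.
    return json.dumps(items, separators=(",", ":"))


print(encode_page_ranges([(1, 3)]))  # prints [{"start":1,"end":3}]
```

The returned string is what goes into the page_ranges form field alongside file.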
curl -X POST https://api.sotaocr.com/v1/extract \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@document.pdf" \
-F 'page_ranges=[{"start":1,"end":5}]'
{
"id": "job_123456789",
"status": "pending",
"page_count": 0,
"created_at": "2026-03-24T12:00:00Z"
}
2. Check status
GET /v1/jobs/{job_id}
Returns the current processing status. Use page_count and pages_completed to track progress.
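As an illustration of tracking progress with these two fields, the sketch below builds the authenticated status request with the Python standard library and turns a status body into a completion fraction. The function names are hypothetical, and treating a page_count of 0 as "no progress yet" is an assumption based on the pending response shown for the upload step.

```python
import urllib.request

API_BASE = "https://api.sotaocr.com"


def status_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a job's status."""
    return urllib.request.Request(
        f"{API_BASE}/v1/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def progress(job: dict) -> float:
    """Return the completed fraction of a job, between 0.0 and 1.0."""
    total = job.get("page_count", 0)
    done = job.get("pages_completed", 0)
    if total <= 0:
        # Assumption: page_count is 0 until the document has been counted.
        return 0.0
    return min(done / total, 1.0)


print(progress({"status": "running", "page_count": 5, "pages_completed": 2}))  # prints 0.4
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON body before calling `progress`.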
curl -X GET https://api.sotaocr.com/v1/jobs/job_123456789 \
-H "Authorization: Bearer YOUR_API_KEY"
{
"id": "job_123456789",
"status": "running",
"page_count": 5,
"pages_completed": 2,
"created_at": "2026-03-24T12:00:00Z"
}
3. Fetch result
GET /v1/jobs/{job_id}/result?format=markdown
Returns the extracted text. Available only once the job status is completed.
- format: (Optional) Response format: json, markdown, or text. Defaults to json.
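The format parameter can be validated before the request is made, which avoids a wasted round trip on a typo. The following is a minimal sketch using only the standard library; `result_url` and `extract_content` are hypothetical helpers, and `extract_content` assumes the response is a JSON envelope with a "content" field, as in the markdown example in this section.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.sotaocr.com"
ALLOWED_FORMATS = ("json", "markdown", "text")


def result_url(job_id: str, fmt: str = "json") -> str:
    """Build the result URL, rejecting formats the documentation does not list."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    return f"{API_BASE}/v1/jobs/{job_id}/result?{urlencode({'format': fmt})}"


def extract_content(response_body: str) -> str:
    """Pull the extracted text out of a result response.

    Assumption: the body is a JSON object with a "content" key.
    """
    return json.loads(response_body)["content"]


print(result_url("job_123456789", "markdown"))
# prints https://api.sotaocr.com/v1/jobs/job_123456789/result?format=markdown
```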
curl -X GET "https://api.sotaocr.com/v1/jobs/job_123456789/result?format=markdown" \
-H "Authorization: Bearer YOUR_API_KEY"
{
"job_id": "job_123456789",
"format": "markdown",
"page_count": 5,
"content": "# Annual report\n\nDocument text..."
}
Ready to integrate?
Create an API key in the dashboard and get free pages for testing.