Textract
Textract is a machine learning service that automatically extracts text, forms, and tables from scanned documents. It simplifies the process of extracting valuable information from a variety of document types, enabling applications to quickly analyze and understand document content.
LocalStack allows you to mock Textract APIs in your local environment. The supported APIs are listed in the API Coverage section, which details the extent of Textract’s integration with LocalStack.
Getting started
This guide is tailored for users new to Textract and assumes basic knowledge of the AWS CLI and our awslocal wrapper script.
Start your LocalStack container using your preferred method. We will demonstrate how to perform basic Textract operations, such as mocking text detection in a document.
Detect document text
You can use the DetectDocumentText API to identify and extract text from a document.
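The request takes a single document reference that points at an S3 object. If the bucket and object name are held in shell variables, the JSON argument can be assembled as in the following minimal sketch. The helper name `textract_s3_document` and the variable names are hypothetical, and the sketch assumes the values contain no quotes or backslashes.

```shell
# Hypothetical helper: prints the JSON value for --document, given an
# S3 bucket ($1) and object name ($2). No JSON escaping is performed,
# so the values must not contain quotes or backslashes.
textract_s3_document() {
  printf '{"S3Object":{"Bucket":"%s","Name":"%s"}}' "$1" "$2"
}

# Placeholder values; substitute your own bucket and object name.
BUCKET="your-bucket"
DOCUMENT_NAME="your-document"

DOCUMENT_ARG=$(textract_s3_document "$BUCKET" "$DOCUMENT_NAME")
echo "$DOCUMENT_ARG"
```

The resulting string can then be passed to the CLI as `--document "$DOCUMENT_ARG"`.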
Execute the following command:
    awslocal textract detect-document-text \
        --document '{"S3Object":{"Bucket":"your-bucket","Name":"your-document"}}'

Output:

    {
        "DocumentMetadata": {
            "Pages": {
                "Pages": 389
            }
        },
        "Blocks": [],
        "DetectDocumentTextModelVersion": "1.0"
    }

Start document text detection job
You can use the StartDocumentTextDetection API to asynchronously detect text in a document.
Execute the following command:
    awslocal textract start-document-text-detection \
        --document-location '{"S3Object":{"Bucket":"bucket","Name":"document"}}'

Output:

    {
        "JobId": "501d7251-1249-41e0-a0b3-898064bfc506"
    }

Save the JobId value to use in the next command.
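Instead of copying the JobId by hand, you can capture it in a shell variable. The sketch below relies on the AWS CLI's global `--query` and `--output text` options, which `awslocal` forwards to the underlying `aws` command. The function name `start_text_detection` is hypothetical, and the bucket and document names are placeholders.

```shell
# Hypothetical helper: starts an asynchronous text detection job for the
# S3 object in bucket $1 with name $2, and prints only the JobId.
start_text_detection() {
  # Build the --document-location JSON (no escaping; plain names assumed).
  location=$(printf '{"S3Object":{"Bucket":"%s","Name":"%s"}}' "$1" "$2")
  awslocal textract start-document-text-detection \
    --document-location "$location" \
    --query JobId \
    --output text
}

# Usage (requires a running LocalStack container):
#   JOB_ID=$(start_text_detection bucket document)
#   echo "$JOB_ID"
```

With `--output text`, the CLI prints the bare identifier rather than a JSON object, so no further parsing is needed.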
Get document text detection job
You can use the GetDocumentTextDetection API to retrieve the results of a document text detection job.
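Because the job is started asynchronously, a caller typically checks its JobStatus before reading the results. On AWS, the status is one of IN_PROGRESS, SUCCEEDED, FAILED, or PARTIAL_SUCCESS. The following is a minimal polling sketch: the function name `wait_for_text_detection` and its default attempt count and delay are assumptions chosen for illustration, not part of LocalStack or the AWS CLI.

```shell
# Hypothetical helper: polls job $1 until its status is no longer
# IN_PROGRESS, then prints the final status.
# $2 = maximum attempts (default 10), $3 = seconds between attempts (default 2).
wait_for_text_detection() {
  job_id="$1"
  attempts="${2:-10}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Ask the CLI for the JobStatus field only, as plain text.
    status=$(awslocal textract get-document-text-detection \
      --job-id "$job_id" --query JobStatus --output text) || return 1
    if [ "$status" != "IN_PROGRESS" ]; then
      printf '%s\n' "$status"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for job $job_id" >&2
  return 1
}

# Usage (requires a running LocalStack container):
#   wait_for_text_detection "501d7251-1249-41e0-a0b3-898064bfc506"
```

The function returns a non-zero exit code if the CLI call fails or the job is still in progress after the last attempt, so it can be used directly in an `if` condition.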
Execute the following command:
    awslocal textract get-document-text-detection \
        --job-id "501d7251-1249-41e0-a0b3-898064bfc506"

Replace 501d7251-1249-41e0-a0b3-898064bfc506 with the JobId value retrieved from the previous command.
    {
        "DocumentMetadata": {
            "Pages": {
                "Pages": 389
            }
        },
        "JobStatus": "SUCCEEDED",
        "Blocks": [],
        "DetectDocumentTextModelVersion": "1.0"
    }

API Coverage
| Operation | Implemented | Image | Kubernetes Support |
|---|---|---|---|