Textract
Textract is a machine learning service that automatically extracts text, forms, and tables from scanned documents. It simplifies the process of extracting valuable information from a variety of document types, enabling applications to quickly analyze and understand document content.
LocalStack allows you to mock Textract APIs in your local environment. The supported APIs are available on our API Coverage section, providing details on the extent of Textract’s integration with LocalStack.
Getting started
Section titled “Getting started”This guide is tailored for users new to Textract and assumes basic knowledge of the AWS CLI and our awslocal
wrapper script.
Start your LocalStack container using your preferred method. We will demonstrate how to perform basic Textract operations, such as mocking text detection in a document.
Detect document text
Section titled “Detect document text”You can use the DetectDocumentText
API to identify and extract text from a document.
Execute the following command:
awslocal textract detect-document-text \ --document '{"S3Object":{"Bucket":"your-bucket","Name":"your-document"}}'
{ "DocumentMetadata": { "Pages": { "Pages": 389 } }, "Blocks": [], "DetectDocumentTextModelVersion": "1.0"}
Start document text detection job
Section titled “Start document text detection job”You can use the StartDocumentTextDetection
API to asynchronously detect text in a document.
Execute the following command:
awslocal textract start-document-text-detection \ --document-location '{"S3Object":{"Bucket":"bucket","Name":"document"}}'
{ "JobId": "501d7251-1249-41e0-a0b3-898064bfc506"}
Save the JobId
value to use in the next command.
Get document text detection job
Section titled “Get document text detection job”You can use the GetDocumentTextDetection
API to retrieve the results of a document text detection job.
Execute the following command:
awslocal textract get-document-text-detection \ --job-id "501d7251-1249-41e0-a0b3-898064bfc506"
Replace 501d7251-1249-41e0-a0b3-898064bfc506
with the JobId
value retrieved from the previous command.
{ "DocumentMetadata": { "Pages": { "Pages": 389 } }, "JobStatus": "SUCCEEDED", "Blocks": [], "DetectDocumentTextModelVersion": "1.0"}
API Coverage
Section titled “API Coverage”Operation ▲ | Implemented | Image |
---|