OCR FAQ

This article answers common questions about the LifeOmic Platform OCR feature. For OCR documentation, see OCR.

How do we integrate OCR into a LifeOmic Platform project?

OCR is fully integrated into the LifeOmic Platform. Documents you upload with OCR are imported into the project you are currently logged into.

  • To see all OCR files associated with a project, log in to the LifeOmic Platform, select the project, and then click OCR > All Files.
  • To see OCR files assigned to a specific subject, click Subjects and select a subject.

How can we use OCR?

OCR in the LifeOmic Platform Web App - OCR is bundled with our platform for indexing and analyzing data from disparate sources, such as printed documents. This option is often used for drug discovery and other demanding workloads.

OCR API - OCR API endpoints let you integrate OCR into your own applications and use its capabilities programmatically.

CLI - OCR commands are available in the CLI.

SDK for Python - You can also script document uploads to OCR with the Python SDK.

What file types are supported?

OCR supports PDF, JPG, and PNG.
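Because only these three formats are supported, it can help to filter files before scripting an upload. The following is a minimal Python sketch; is_supported_for_ocr and partition_for_upload are hypothetical helpers, not part of the LifeOmic SDK. They check file extensions only, not file contents, and extension variants such as .jpeg are rejected by this sketch.

```python
from __future__ import annotations

from pathlib import Path

# Extensions for the file types this FAQ lists as supported (PDF, JPG, PNG).
# Hypothetical constant; matching is by extension only, case-insensitive.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".png"}


def is_supported_for_ocr(path: str) -> bool:
    """Return True if the file's extension matches a supported OCR file type."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_for_upload(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split paths into (supported, unsupported) lists, preserving input order."""
    supported = [p for p in paths if is_supported_for_ocr(p)]
    unsupported = [p for p in paths if not is_supported_for_ocr(p)]
    return supported, unsupported


if __name__ == "__main__":
    to_upload, skipped = partition_for_upload(["scan.PDF", "note.png", "report.docx"])
    print("upload:", to_upload)
    print("skip:", skipped)
```

The supported list can then be passed to whichever upload method you use (CLI, API, or SDK).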

How do we pull in data with OCR?

We make data available to you in the FHIR format. For more information on why we chose the FHIR format, see Why FHIR.

1 - Manually: You can use the UI to highlight the text you want to export and download it in CSV format. You can also copy the highlighted text or add it to the Picklist. For more information, see Copy or Add to the Picklist.

2 - Automatically: You can use the report extractor template tool in the UI to configure automatic data extraction for the document types you commonly receive. For more information, see Building OCR Report Extractors.

3 - Programmatically: You can use the CLI (command line interface), the API Endpoints, or the Python SDK to extract data. For more information, see Use the CLI for OCR, OCR API, or Running OCR in the Python SDK.

What is the maximum document size that you can handle?

Our system can comfortably handle a single document of up to 1000 pages.
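If you need to process a document longer than 1000 pages, one option is to split it into parts before upload. The sketch below only plans the page ranges; plan_page_ranges and MAX_PAGES_PER_UPLOAD are hypothetical names, the 1000-page threshold comes from the guidance above rather than a documented hard limit, and the actual PDF splitting would require a separate PDF library.

```python
from __future__ import annotations

# Hypothetical threshold based on the "up to 1000 pages" guidance in this FAQ.
MAX_PAGES_PER_UPLOAD = 1000


def plan_page_ranges(total_pages: int, max_pages: int = MAX_PAGES_PER_UPLOAD) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) page ranges, each at most max_pages long."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages must be >= 1")
    ranges = []
    # Step through the document in blocks of max_pages; the last block may be shorter.
    for start in range(1, total_pages + 1, max_pages):
        end = min(start + max_pages - 1, total_pages)
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    print(plan_page_ranges(2500))
```

For example, plan_page_ranges(2500) returns [(1, 1000), (1001, 2000), (2001, 2500)], and a document of 1000 pages or fewer yields a single range.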

How quickly are documents uploaded and automatically extracted?

Processing a couple of pages takes approximately one minute. Because pages are processed in parallel, 10 to 20 pages take roughly the same time as a few pages.

A customer test processed over 600 pages in approximately fifteen minutes.
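These figures support a rough back-of-envelope estimate: about one minute as a floor for small documents, and about 40 pages per minute (600 pages in 15 minutes) for large ones. The sketch below is a hypothetical heuristic built only from those two numbers; actual times depend on factors this FAQ does not quantify. Because the customer test processed over 600 pages, the rate it implies is a lower bound, so the estimate leans conservative.

```python
# Rough figures from this FAQ: a few pages take about one minute, and a
# customer test processed 600+ pages in about 15 minutes.
MIN_MINUTES = 1.0
PAGES_PER_MINUTE = 600 / 15  # 40.0 pages per minute, a lower bound from the test


def estimate_minutes(pages: int) -> float:
    """Hypothetical estimate: a one-minute floor, otherwise linear at the test rate."""
    if pages <= 0:
        return 0.0
    return max(MIN_MINUTES, pages / PAGES_PER_MINUTE)


if __name__ == "__main__":
    for n in (3, 20, 600, 1000):
        print(n, "pages ->", estimate_minutes(n), "minutes")
```

Under these assumptions, estimate_minutes(20) returns 1.0 and estimate_minutes(1000) returns 25.0.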

Which health code types do you support?

OCR supports RxNorm, ICD-10-CM, LOINC, and SNOMED. You can also upload custom ontologies; for more information, see OCR Ontology.
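In FHIR, each code identifies its code system through the system element of a Coding. The sketch below assumes OCR output uses the standard HL7 FHIR system URIs for the four code systems listed above (this FAQ does not confirm that); codes_by_system is a hypothetical helper, and the Condition resource is made-up sample data. It reads code.coding, which Condition, Observation, and Procedure use; medication resources such as MedicationStatement carry their code in a different element (medication[x] in FHIR R4), which this sketch does not handle.

```python
from __future__ import annotations

# Standard FHIR code system URIs for the code systems named in this FAQ.
# Assumption: OCR output uses these canonical URIs in Coding.system.
SYSTEM_NAMES = {
    "http://www.nlm.nih.gov/research/umls/rxnorm": "RxNorm",
    "http://hl7.org/fhir/sid/icd-10-cm": "ICD-10-CM",
    "http://loinc.org": "LOINC",
    "http://snomed.info/sct": "SNOMED CT",
}


def codes_by_system(resource: dict) -> dict[str, list[str]]:
    """Group the codes in a resource's code.coding array by code system name."""
    grouped: dict[str, list[str]] = {}
    for coding in resource.get("code", {}).get("coding", []):
        # Unrecognized systems (for example, a custom ontology) are grouped as "other".
        name = SYSTEM_NAMES.get(coding.get("system"), "other")
        grouped.setdefault(name, []).append(coding.get("code"))
    return grouped


# Made-up sample: type 2 diabetes coded in ICD-10-CM and SNOMED CT.
condition = {
    "resourceType": "Condition",
    "code": {
        "coding": [
            {"system": "http://hl7.org/fhir/sid/icd-10-cm", "code": "E11.9"},
            {"system": "http://snomed.info/sct", "code": "44054006"},
        ]
    },
}

print(codes_by_system(condition))
# {'ICD-10-CM': ['E11.9'], 'SNOMED CT': ['44054006']}
```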

Which resource types do you support?

OCR supports Medications, Conditions, Observations, and Procedures.
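Because results are delivered in FHIR, you can work with the extracted resources using ordinary JSON tooling. The sketch below assumes you already have the resources as a FHIR Bundle in JSON form (obtaining it through the CLI, API, or SDK is not shown); count_resource_types and resources_of_type are hypothetical helpers, and the Bundle is made-up sample data. It does not assume which FHIR resource type represents medications.

```python
from collections import Counter


def count_resource_types(bundle: dict) -> Counter:
    """Count the resources in a FHIR Bundle by their resourceType."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


def resources_of_type(bundle: dict, resource_type: str) -> list:
    """Return every resource in the Bundle whose resourceType matches."""
    return [
        entry["resource"]
        for entry in bundle.get("entry", [])
        if entry.get("resource", {}).get("resourceType") == resource_type
    ]


# Made-up Bundle; real OCR output will contain many more fields per resource.
bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Condition", "id": "c1"}},
        {"resource": {"resourceType": "Observation", "id": "o1"}},
        {"resource": {"resourceType": "Observation", "id": "o2"}},
        {"resource": {"resourceType": "Procedure", "id": "p1"}},
    ],
}

print(count_resource_types(bundle))
print([r["id"] for r in resources_of_type(bundle, "Observation")])
```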

Can you handle poor quality documents?

Yes. A de-noising feature cleans up messy documents and improves extraction results.

Are you able to extract the text on top of images?

Yes, but it depends on how the text is formatted in the image. If the text is separate from the image and runs in a straight line across the top, it is easily extracted. If it is part of the image but close to horizontal and visually distinct from the image, it typically extracts well. Curved text, or text that blends into the image, does not extract well.

What is the difference between LifeOmic OCR and Amazon Textract?

Amazon Textract has basic capabilities to identify and extract data from structured text, such as fields and tables. LifeOmic OCR extracts data from unstructured text, such as paragraphs or doctor's notes. An estimated 80% of clinical data is unstructured text. Based on this estimate, Amazon Textract can extract at most about 20% of the data in clinical documents, while LifeOmic OCR can extract up to 80%.