
OCR FAQ

This article answers common questions about PrecisionOCR (OCR). For OCR documentation, see OCR.

How do we integrate OCR into a LifeOmic Platform project?

LifeOmic Platform + PrecisionOCR fully integrates the LifeOmic Platform and OCR. Documents imported using OCR are imported into the LifeOmic Platform project you are logged into. To see all OCR files associated with a project, log into the LifeOmic Platform, choose the project, and click OCR > All Files. To see OCR files assigned to a specific subject, click Subjects and select the subject.

How can we use PrecisionOCR?

1 - Standalone - A separate UI and app allow you to upload and organize documents and extract data.

2 - Precision Health Cloud with OCR - OCR is bundled with our platform for indexing and analyzing data from disparate sources (like print documents). This option is often used for drug discovery and other demanding needs.

3 - OCR API and CLI - API endpoints and a CLI let you integrate OCR into your own systems and use its capabilities programmatically.

4 - Python SDK - You can also script the upload of documents with the Python SDK.

What file types are supported?

OCR supports PDF, JPG, and PNG.
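If you script uploads (for example, with the Python SDK), it can help to screen out unsupported files first. The following is a minimal sketch in plain Python that partitions file paths by extension; it does not call any LifeOmic API. Treating `.jpeg` as equivalent to JPG, and matching on the file extension rather than the file contents, are assumptions, and `split_by_support` is a hypothetical helper name.

```python
from pathlib import Path

# Formats listed as supported in this FAQ. Accepting ".jpeg" alongside ".jpg"
# and matching on extension (not file contents) are assumptions.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png"}


def split_by_support(paths):
    """Partition file paths into (supported, unsupported) lists by extension."""
    supported, unsupported = [], []
    for p in map(Path, paths):
        # Compare case-insensitively so "scan.PDF" counts as a PDF.
        if p.suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(p)
        else:
            unsupported.append(p)
    return supported, unsupported


ok, skipped = split_by_support(["scan.PDF", "note.jpg", "chart.png", "report.docx"])
print([p.name for p in ok])       # ['scan.PDF', 'note.jpg', 'chart.png']
print([p.name for p in skipped])  # ['report.docx']
```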

How do we pull data out of PrecisionOCR?

We make data available to you in the FHIR format. For more information on why we chose the FHIR format, see Why FHIR.

1 - Manually: In the UI, highlight the text you want to export and download it in CSV format. You can also copy text or add it to the Picklist. For more information, see Copy or Add to the Picklist.

2 - Automatically: You can also use the UI to configure the automatic extraction of data from the document types you commonly receive using the report extractors template tool. For more information, see Building OCR Report Extractors.

3 - Programmatically: You can use the CLI (command line interface), the API Endpoints, or the Python SDK to extract data. For more information, see Use the CLI for OCR, OCR API, or Running PrecisionOCR in the Python SDK.
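Because results are delivered in the FHIR format, standard JSON tooling can read them once you have them locally. The sketch below assumes you already have a FHIR Bundle as JSON text (retrieving it through the UI, CLI, API, or SDK is not shown) and groups its resources by `resourceType`. The helper name `group_bundle_resources` and the sample Bundle are hypothetical, made up for illustration.

```python
import json
from collections import defaultdict


def group_bundle_resources(bundle_json):
    """Group the resources in a FHIR Bundle (JSON text) by resourceType."""
    bundle = json.loads(bundle_json)
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    grouped = defaultdict(list)
    # Each Bundle entry wraps one resource under the "resource" key.
    for entry in bundle.get("entry", []):
        resource = entry.get("resource")
        if resource is not None:
            grouped[resource["resourceType"]].append(resource)
    return dict(grouped)


# Hypothetical sample data, not real PrecisionOCR output.
sample = json.dumps({
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Condition", "id": "c1"}},
        {"resource": {"resourceType": "Observation", "id": "o1"}},
        {"resource": {"resourceType": "Condition", "id": "c2"}},
    ],
})
groups = group_bundle_resources(sample)
print({k: len(v) for k, v in groups.items()})  # {'Condition': 2, 'Observation': 1}
```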

Who do you use for hosting? Can we host in-house?

LifeOmic is a cloud-first company. We use Amazon Web Services (AWS) for hosting. We do not access customers' data, and each customer's data is logically partitioned from other customers' data. LifeOmic has strict measures in place to protect our customers: we are HITRUST certified, SOC 2 audited, and FedRAMP Ready. Being FedRAMP Ready means that we have implemented the strict security and privacy standards required to serve the Federal Government.

For additional security and architecture information, see Architecture.

What is the maximum document size that you can handle?

Our system can comfortably handle a single document of up to 1000 pages. Contact LifeOmic if you have documents larger than 1000 pages.

How quickly are documents uploaded and automatically extracted?

Processing a couple of pages takes approximately one minute. Because pages are processed in parallel, 10 to 20 pages take roughly the same amount of time as a few pages.

A customer test processed over 600 pages in approximately fifteen minutes.
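As a rough check on these figures, the customer test implies the following average throughput. This is simple arithmetic on the numbers above, not a performance guarantee; actual times depend on the documents.

```latex
\frac{600\ \text{pages}}{15\ \text{min}} = 40\ \text{pages/min},
\qquad
\frac{15 \times 60\ \text{s}}{600\ \text{pages}} = 1.5\ \text{s/page}
```

About 1.5 seconds of wall-clock time per page is far less than the time a couple of pages take on their own, which is consistent with pages being processed in parallel.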

Which health code types do you support?

OCR supports RxNorm, ICD-10-CM, LOINC, and SNOMED. You can also upload custom ontologies; see OCR Ontology.
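In FHIR data, each code is normally carried as a `coding` entry whose `system` URI identifies the ontology. The sketch below labels the codings on a resource using the standard FHIR canonical URIs for the four code systems listed above. That PrecisionOCR output uses exactly these URIs is an assumption, and `CODE_SYSTEMS` and `labeled_codings` are hypothetical names. The helper reads only the `code` element, which Condition, Observation, and Procedure resources use; medication resources may carry their code elsewhere (for example, `medicationCodeableConcept` on an R4 MedicationStatement).

```python
# Standard FHIR canonical system URIs for the code systems named in this FAQ.
# Assumption: PrecisionOCR output uses these same URIs.
CODE_SYSTEMS = {
    "http://www.nlm.nih.gov/research/umls/rxnorm": "RxNorm",
    "http://hl7.org/fhir/sid/icd-10-cm": "ICD-10-CM",
    "http://loinc.org": "LOINC",
    "http://snomed.info/sct": "SNOMED CT",
}


def labeled_codings(resource):
    """Return (ontology, code, display) for each coding in resource['code']."""
    codings = resource.get("code", {}).get("coding", [])
    # Unknown systems (e.g. a custom ontology) are labeled "other".
    return [
        (CODE_SYSTEMS.get(c.get("system"), "other"), c.get("code"), c.get("display"))
        for c in codings
    ]


condition = {
    "resourceType": "Condition",
    "code": {"coding": [{
        "system": "http://hl7.org/fhir/sid/icd-10-cm",
        "code": "E11.9",
        "display": "Type 2 diabetes mellitus without complications",
    }]},
}
print(labeled_codings(condition))
# [('ICD-10-CM', 'E11.9', 'Type 2 diabetes mellitus without complications')]
```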

Which resource types do you support?

OCR supports Medications, Conditions, Observations, and Procedures.

Can you handle poor quality documents?

Yes. A de-noising feature cleans up messy documents to some extent so that data can still be extracted from them.

Are you able to extract the text on top of images?

Yes, but it depends on how the text is formatted in the image. If the text is separate from the image and runs in a straight line across the top, it extracts easily. If it is part of the image but close to horizontal and does not blend with the image, it typically extracts well. Curved text, or text that blends with the image, does not extract well.

What is the difference between PrecisionOCR and Amazon Textract?

Amazon Textract has basic capabilities to identify and extract data from structured text, such as fields and tables. PrecisionOCR extracts data from unstructured text, such as paragraphs or doctor's notes. It is estimated that 80% of clinical data is unstructured text. Based on this estimate, Amazon Textract can extract up to 20% of data from documents while PrecisionOCR can extract up to 80%.