Working With Your Data

Tasks

Tasks let you bring your algorithms, packaged as Docker images, to your data. The LifeOmic Platform stands up the compute infrastructure needed to execute them. A task can take files from a project as inputs and generate new files as outputs, which are stored in the same project. You can use the Tasks API or the tasks commands in the CLI to create and run tasks against your data. The Task Service User Guide has more detailed information to help you get started.

Genomic Data

Genomic data can be added to a project in the form of data files, as outlined in the Uploading Your Data section. The file formats covered in this section are VCF, BAM, NantOmics, and Foundation Medicine files.

Once the files have been added to a project, the data needs to be indexed. The indexing process ingests the data from the files and annotates the short variants using several public variant databases like ClinVar and dbSNP. The annotated variants are then stored in the analytics engine.

For VCF and BAM files, you can use the genomics commands in the CLI to initiate the indexing process. For large VCF files, indexing can take up to an hour to complete. The indexing process creates a VariantSet resource whose status property indicates when indexing has completed and the VariantSet is active and ready for use.
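Because indexing can run for a long time, a script that depends on the VariantSet typically has to wait for its status to change. The following is a minimal sketch of that polling pattern, not the platform's own client code. The fetch_status callable is a hypothetical stand-in for whatever call you use to read the VariantSet's status property, and the "ACTIVE" and "FAILED" values are assumptions; check the VariantSet resource documentation for the actual status values.

```python
import time
from typing import Callable


def wait_until_active(
    fetch_status: Callable[[], str],
    ready_status: str = "ACTIVE",    # assumed value, not confirmed by the docs
    failed_status: str = "FAILED",   # assumed value, not confirmed by the docs
    poll_seconds: float = 30.0,
    timeout_seconds: float = 2 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a VariantSet status until it reports ready, fails, or times out.

    fetch_status is a hypothetical callable supplied by the caller; it should
    return the current value of the VariantSet's status property.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status == ready_status:
            return status
        if status == failed_status:
            raise RuntimeError(f"indexing ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_seconds} seconds")
        # Wait before asking again so the service is not queried in a tight loop.
        sleep(poll_seconds)
```

The default timeout of two hours is an arbitrary margin over the "up to an hour" figure given above for large VCF files; adjust it to your data. Passing sleep as a parameter keeps the function easy to exercise in tests without real delays.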

For NantOmics and Foundation Medicine files, the platform provides custom tasks that can be used to index the data. These tasks first pre-process the files to convert them into a standard VCF format, and then start the same indexing process to ingest and annotate the variants. You can start these custom tasks with the Tasks API or the tasks commands in the CLI.

Analytic Insights

The Analytics Insights API or the insights commands in the CLI can be used to query the indexed clinical and genomic data within a project. The Insights API section has more detailed information on how to do this.