Tasks
The Task Service provides direct, secure access to cloud data in a compute environment that runs your own code inside Docker containers.
The LifeOmic CLI (Command Line Interface) offers a familiar interface for developing and running your code.
Before using the CLI to run tasks, you need an account and must be able to log in at https://apps.us.lifeomic.com/phc.
General concept
Every resource in the LifeOmic Platform is identified by a unique ID that looks like this: 4a113171-9f4a-48e2-82be-5682f476cc76. In general, use the ID rather than the resource name with the CLI. When working with the task service and files, it is often necessary to specify the project or dataset ID to work under. "Project" and "dataset" are used interchangeably to mean the same thing.
Listing of files in project
The following command lists the datasets/projects under the account, which in this example has only one, with datasetId = 4a113171-9f4a-48e2-82be-5682f476cc76:
>> lo projects list
items:
-
id: 4a113171-9f4a-48e2-82be-5682f476cc76
name: analytics-testing
description: Analytics Testing Project
lrn: lrn:lo:dev:lifeomic:project:4a113171-9f4a-48e2-82be-5682f476cc76
To list the files in a dataset/project, provide the datasetId with the command. In the example below, the dataset is e447d01a-ae17-48d0-8cd8-86c9a65f779b, and the command returns a list of two files:
>> lo files list e447d01a-ae17-48d0-8cd8-86c9a65f779b
items:
-
id: 89cac4d4-e1a4-4d9b-912b-1bcc4908b9b7
name: mmrf.rgel
datasetId: e447d01a-ae17-48d0-8cd8-86c9a65f779b
size: 1223998865
contentType: application/octet-stream
lastModified: 2018-06-15T19:01:58.365Z
lrn: lrn:lo:dev:lifeomic:file:89cac4d4-e1a4-4d9b-912b-1bcc4908b9b7
-
id: 988d4158-6aac-47a2-99d5-e5de8fa5acb1
name: mmrf.rgel.executor.0.stderr.txt
datasetId: e447d01a-ae17-48d0-8cd8-86c9a65f779b
size: 82
contentType: text/plain
lastModified: 2018-06-15T19:22:12.438Z
lrn: lrn:lo:dev:lifeomic:file:988d4158-6aac-47a2-99d5-e5de8fa5acb1
Upload of file
To upload a single file:
lo files upload ./myfile.txt <datasetId>
To upload a whole directory of files, provide the path to the local directory instead of a file name. Note that the files will be prefixed with the directory name <localDir> when uploaded to the LifeOmic Platform.
lo files upload <localDir> <datasetId>
Task Service with the CLI
To submit a task job with the CLI, you provide a JSON job definition. A task JSON file contains the <name> and <datasetId> declarations, followed by four main sections: inputs, outputs, resources, and executors. The inputs and outputs sections declare the files and directories to copy in before execution and to write out afterward. The resources section specifies the CPU cores needed and the RAM in GB. The executors section defines a list of Docker images to execute serially in the order of declaration.
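Putting these pieces together, a task definition has the following overall shape (a sketch with placeholder values, not a runnable definition):

```json
{
  "name": "<name>",
  "datasetId": "<datasetId>",
  "inputs": [],
  "outputs": [],
  "resources": {
    "cpu_cores": 1,
    "ram_gb": 1
  },
  "executors": []
}
```

Each of these sections is described in more detail below.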
Here is an example of a file specification as input. Provide the file's <url>, using its unique ID, which can be obtained by listing the files in the dataset. The <path> is where the file will be copied and made available to the Docker images upon execution.
{
"path": "/tmp/input.txt",
"url": "https://api.us.lifeomic.com/v1/files/32c94154-1910-4e77-ab98-c6c8f2163060",
"type": "FILE"
}
For a directory as input, provide the <url> with the project/dataset ID. The <prefix> specifies a path prefix; all files in the project with that prefix will be copied over to the <path>.
{
"path": "/tmp/in",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"prefix": "data/",
"type": "DIRECTORY"
}
To save the results from the execution, declare the file or directory to be copied out, with <url> being the project/dataset ID. In the following example, the file "/out/result.txt" and the directory "/outDir" in the container will be copied out upon successful completion of the task.
[
{
"path": "/out/result.txt",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "FILE"
},
{
"path": "/outDir",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "DIRECTORY"
}
]
Scheduling
Tasks can be configured to run at a future time or on a recurring schedule.
To schedule a task to run at a specific date and time, add the following to the task definition:
{
"scheduleDate": "2019-07-26T16:35:31Z"
}
To schedule a task to run on a recurring schedule, use a cron expression (the example below runs once a day at midnight):
{
"scheduleExpression": "0 0 * * *"
}
To stop a recurring task, use the cancel task API.
Email Notifications
Email notifications can be sent when a task completes or fails. To use this feature, add the following to the task definition:
{
"email": {
"sendFailedTo": "user@company.com",
"sendCompletedTo": "user@company.com"
}
}
In the example above, an email will be sent if the task fails or completes.
Using a non-public image
The Task Service can pull images from any public Docker registry, such as Docker Hub. A non-public image can be used by uploading an export of it to the LifeOmic Platform and then specifying it as an input to the task.
Use docker save to create a gzipped TAR file of the image, then use the CLI to upload it to a project.
docker build -t my_image --rm .
docker save my_image | gzip > myimage.tar.gz
lo files upload ./myimage.tar.gz 0ec93203-febb-4c85-9aac-229703b6fa58
Specify the uploaded Docker image as an input to the task. The task service will extract the gzipped TAR file and load the Docker image.
{
"url": "https://api.us.lifeomic.com/v1/files/32c94154-1910-4e77-ab98-c6c8f2163060",
"path": "/tmp/myimage.tar.gz",
"type": "DOCKER_IMAGE"
}
Use the image in an executor within the task. Note: use the image tag name, not the name of the gzipped TAR file.
"executors": [
{
"workdir": "/tmp",
"image": "my_image",
"command": [
"echo",
"hello world"
],
"stderr": "/out/stderr.txt",
"stdout": "/out/stdout.txt"
}
]
Hello world JSON example
To run the following example, save the definition as a JSON file (e.g.
hello.json) and replace <datasetId> and the output <url> with your own datasetId.
This task uses the image "busybox" from https://hub.docker.com/_/busybox/ and
executes the Linux command "echo hello world", whose standard output is saved to
the file "out/stdout.txt". You will find the file "out/stdout.txt", containing
the text "hello world", in the file listing in the UI. Note: there is no input
declared, only an output directory to return.
To submit the job, run:
cat hello.json | lo tasks create
{
"name": "Hello World Task",
"datasetId": "0ec93203-febb-4c85-9aac-229703b6fa58",
"inputs": [
],
"outputs": [
{
"path": "/out",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "DIRECTORY"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 1
},
"executors": [
{
"workdir": "/tmp",
"image": "busybox",
"command": [
"echo",
"hello world"
],
"stderr": "/out/stderr.txt",
"stdout": "/out/stdout.txt"
}
]
}
List files JSON example
This example lists the files in a directory and saves the result in a file.
Assume the project contains files with the prefix "data/" and a bash script file
"run.sh" with file id 32c94154-1910-4e77-ab98-c6c8f2163060. The bash script
"run.sh" has two lines:
#!/bin/bash
ls -al $1 > $2
The result of the listing of directory "/tmp/in" is saved to the file "/out/result.txt".
{
"name": "Task Service Test",
"datasetId": "0ec93203-febb-4c85-9aac-229703b6fa58",
"inputs": [
{
"path": "/tmp/in",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"prefix": "data/",
"type": "DIRECTORY"
},
{
"path": "/tmp/run.sh",
"url": "https://api.us.lifeomic.com/v1/files/32c94154-1910-4e77-ab98-c6c8f2163060",
"type": "FILE"
}
],
"outputs": [
{
"path": "/out",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "DIRECTORY"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 1
},
"executors": [
{
"workdir": "/tmp",
"image": "busybox",
"command": [
"sh",
"-l",
"/tmp/run.sh",
"/tmp/in",
"/out/result.txt"
],
"stderr": "/out/stderr.txt",
"stdout": "/out/stdout.txt"
}
]
}
A complete task JSON example
This is a more comprehensive example that takes a variant VCF file and passes it through a series of processing steps. It uses LifeOmic GNOSIS data resources as reference data inputs; the various GNOSIS resources available for use with the Task Service are an advanced topic to be discussed separately. This example demonstrates the practical use of the Task Service to perform a complete series of tasks.
{
"name": "NantOmics Test",
"datasetId": "0ec93203-febb-4c85-9aac-229703b6fa58",
"inputs": [
{
"path": "/tmp/nantomics.vcf",
"url": "https://api.us.lifeomic.com/v1/files/5a61bfdb-5db0-4264-8573-ee9945383cf7",
"type": "FILE"
},
{
"path": "/tmp/genome",
"name": "GRCh37",
"genome": "GRCh37",
"type": "GNOSIS"
},
{
"path": "/tmp/clinvar",
"name": "ClinVar",
"genome": "GRCh37",
"type": "GNOSIS"
},
{
"path": "/tmp/cosmic",
"name": "COSMIC",
"genome": "GRCh37",
"type": "GNOSIS"
},
{
"path": "/tmp/dbsnp",
"name": "dbSNP",
"genome": "GRCh37",
"type": "GNOSIS"
}
],
"outputs": [
{
"path": "/out",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "DIRECTORY"
},
{
"path": "/log",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "DIRECTORY"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 4
},
"executors": [
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-nant-et",
"command": [
"extract",
"-i",
"/tmp/nantomics.vcf",
"-v",
"/out/nantomics.var.vcf.gz",
"-s",
"/out/nantomics.sv.vcf.gz",
"-c",
"/out/nantomics.cnv.vcf.gz",
"-e",
"/out/nantomics.exp.vcf.gz"
],
"stderr": "/log/stderr1.txt",
"stdout": "/log/stdout1.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-nant-et",
"command": [
"var-transform",
"-i",
"/out/nantomics.var.vcf.gz",
"-o",
"/out/nantomics.var.std.vcf.gz"
],
"stderr": "/log/stderr2.txt",
"stdout": "/log/stdout2.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-nant-et",
"command": [
"exp-transform",
"-i",
"/out/nantomics.exp.vcf.gz",
"-g",
"/out/nantomics.exp.gene.txt.gz",
"-s",
"/out/nantomics.exp.iso.txt.gz"
],
"stderr": "/log/stderr3.txt",
"stdout": "/log/stdout3.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-vtools",
"command": [
"vt-combo",
"-r",
"/tmp/genome/GRCh37.fa.gz",
"-i",
"/out/nantomics.var.std.vcf.gz",
"-o",
"/out/nantomics.var.nrm.vcf.gz"
],
"stderr": "/log/stderr4.txt",
"stdout": "/log/stdout4.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-snpeff-grch37",
"command": [
"snpeff",
"-m",
"Refseq",
"-i",
"/out/nantomics.var.nrm.vcf.gz",
"-o",
"/out/nantomics.var.fnc.vcf.gz"
],
"stderr": "/log/stderr5.txt",
"stdout": "/log/stdout5.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-snpeff-grch37",
"command": [
"snpsift-annotate",
"-p",
"CLN_",
"-n",
"/tmp/clinvar/clinvar-GRCh37.vcf.gz",
"-i",
"/out/nantomics.var.fnc.vcf.gz",
"-o",
"/out/nantomics.var.cln.vcf.gz"
],
"stderr": "/log/stderr6.txt",
"stdout": "/log/stdout6.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-snpeff-grch37",
"command": [
"snpsift-annotate",
"-p",
"CMC_",
"-n",
"/tmp/cosmic/cosmic-GRCh37.vcf.gz",
"-i",
"/out/nantomics.var.cln.vcf.gz",
"-o",
"/out/nantomics.var.cmc.vcf.gz"
],
"stderr": "/log/stderr7.txt",
"stdout": "/log/stdout7.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-snpeff-grch37",
"command": [
"snpsift-annotate",
"-p",
"DBS_",
"-n",
"/tmp/dbsnp/dbsnp-GRCh37.vcf.gz",
"-i",
"/out/nantomics.var.cmc.vcf.gz",
"-o",
"/out/nantomics.var.dbs.vcf.gz"
],
"stderr": "/log/stderr8.txt",
"stdout": "/log/stdout8.txt"
}
]
}
FHIR resource ingest
This example shows how to ingest FHIR resources from a file using a task. In this example there are no executors because the FHIR resources are taken as-is from the file with no transformation.
{
"name": "FHIR ingest",
"datasetId": "643efe57-430f-4b06-b1b0-3e565c62a64c",
"inputs": [
{
"path": "/tmp/fhir.json",
"url": "https://api.us.lifeomic.com/v1/files/146e0679-0e03-4e05-af46-d930cfaec761",
"type": "FILE"
}
],
"outputs": [
{
"path": "/tmp/fhir.json",
"url": "https://api.us.lifeomic.com/v1/projects/643efe57-430f-4b06-b1b0-3e565c62a64c",
"type": "FHIR"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 1
},
"executors": []
}
The file of FHIR resources (fhir.json in this example) should be in JSON Lines format (also known as newline-delimited JSON). For example:
{"resourceType":"Patient","name":[{"family":"Zieme","given":["Mina"]}],"gender":"female","id":"024f2316-265a-46e8-965a-837e308ae678","birthDate":"1977-06-21"}
{"status":"final","code":{"coding":[{"code":"11142-7","system":"http://loinc.org","display":"Glucose"}]},"resourceType":"Observation","id":"62f3ccbf-c51b-48ed-ad1d-0420ea196af6","subject":{"reference":"Patient/024f2316-265a-46e8-965a-837e308ae678"},"effectiveDateTime":"1999-09-09T23:20:53Z","valueQuantity":{"value":10,"unit":"mg/DL","system":"http://unitsofmeasure.org/","code":"mg/DL"}}
{"status":"final","code":{"coding":[{"code":"11142-7","system":"http://loinc.org","display":"Glucose"}]},"resourceType":"Observation","id":"b452acd6-00c9-4fab-847c-31177e14e412","subject":{"reference":"Patient/024f2316-265a-46e8-965a-837e308ae678"},"effectiveDateTime":"1999-05-05T15:11:18Z","valueQuantity":{"value":0,"unit":"mg/DL","system":"http://unitsofmeasure.org/","code":"mg/DL"}}
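If your source data is a single JSON array of resources rather than JSON Lines, a short script can produce the expected format. The following is an illustrative sketch (the resource list here is a made-up example, not from a real project):

```python
import json

def to_json_lines(resources):
    """Serialize a list of FHIR resources as JSON Lines: one compact
    JSON object per line, newline-delimited."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in resources)

# Example: two minimal Patient resources converted to JSON Lines.
patients = [
    {"resourceType": "Patient", "id": "024f2316-265a-46e8-965a-837e308ae678"},
    {"resourceType": "Patient", "id": "b452acd6-00c9-4fab-847c-31177e14e412"},
]
lines = to_json_lines(patients)
print(lines)
```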
FHIR resource listing and cohort creation
This example queries 100,000 FHIR Observations and writes them in JSON Lines format to a file in the task, and then runs a container to compute some statistics and make a cohort out of the outliers.
{
"name": "FHIR Analytics",
"datasetId": "cccdf419-ac83-4b7e-aa2d-70702d43297c",
"inputs": [
{
"resourceType": "Observation",
"limit": 100000,
"path": "/fhir/Observation.json",
"type": "FHIR"
}
],
"outputs": [
{
"path": "/output/",
"url": "https://api.us.lifeomic.com/v1/projects/cccdf419-ac83-4b7e-aa2d-70702d43297c",
"type": "DIRECTORY"
},
{
"path": "/cohorts/cohort.csv",
"url": "https://api.us.lifeomic.com/v1/projects/cccdf419-ac83-4b7e-aa2d-70702d43297c",
"type": "COHORT"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 1
},
"executors": [
{
"image": "aroach/task-sandbox:6",
"command": ["python", "stats.py"],
"stderr": "/output/stderr.txt",
"stdout": "/output/stdout.txt"
}
]
}
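The stats.py referenced above is not shown in this document; the following is a hypothetical sketch of what such a script might compute, flagging observations whose value lies far from the mean. The file paths, the z-score threshold, and the exact cohort CSV format are all assumptions for illustration:

```python
import statistics

def find_outliers(observations, z=2.0):
    """Return the subject references of observations whose valueQuantity.value
    lies more than z standard deviations from the mean."""
    values = [o["valueQuantity"]["value"] for o in observations]
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [
        o["subject"]["reference"]
        for o in observations
        if abs(o["valueQuantity"]["value"] - mean) > z * stdev
    ]

# In the task, the observations would be read line by line from
# /fhir/Observation.json (the declared input path) and the outlier
# references written to /cohorts/cohort.csv (the declared COHORT output).
sample = [
    {"valueQuantity": {"value": 10}, "subject": {"reference": "Patient/a"}},
    {"valueQuantity": {"value": 10}, "subject": {"reference": "Patient/b"}},
    {"valueQuantity": {"value": 10}, "subject": {"reference": "Patient/c"}},
    {"valueQuantity": {"value": 10}, "subject": {"reference": "Patient/d"}},
    {"valueQuantity": {"value": 100}, "subject": {"reference": "Patient/e"}},
]
outliers = find_outliers(sample, z=1.5)
```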
Getting updated FHIR resources with a Scheduled Task
This example creates a Scheduled Task that runs weekly and gets only the FHIR
Patients that have been updated since the last time it ran. It does this by
querying the _lastUpdated field of the FHIR records, using the special variables
startTime and lastSuccessfulStartTime to limit the results to those updated
since the last successful run. It also caps the results at the current start
time to prevent overlap with the next run.
The curly braces are a special placeholder syntax: the braces, and everything inside them, are replaced with the value of the specified variable before the input is downloaded to the container.
{
"name": "Get Updated Patients",
"datasetId": "8913220a-6e22-4747-9f00-8477c475b1ec",
"scheduleExpression": "0 0 * * 0",
"inputs": [
{
"path": "/fhir/patients.json",
"type": "FHIR",
"resourceType": "Patient",
"limit": 100000,
"query": "_lastUpdated=gt{{lastSuccessfulStartTime}}&_lastUpdated=le{{startTime}}"
}
],
"outputs": [
{
"path": "/fhir",
"url": "https://api.dev.lifeomic.com/v1/projects/8913220a-6e22-4747-9f00-8477c475b1ec",
"type": "DIRECTORY"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 1
},
"executors": []
}
NOTE: The first time a Scheduled Task runs, lastSuccessfulStartTime will be set
to the Unix Epoch (January 1, 1970 at midnight, UTC). See:
https://en.wikipedia.org/wiki/Unix_time
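The placeholder substitution described above can be illustrated with a small sketch. This mimics the observable behavior for explanation only; it is not the service's actual implementation:

```python
import re

def render_query(template, variables):
    """Replace each {{name}} placeholder with the corresponding variable value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], template)

# Hypothetical values for the two special variables.
query = render_query(
    "_lastUpdated=gt{{lastSuccessfulStartTime}}&_lastUpdated=le{{startTime}}",
    {
        "lastSuccessfulStartTime": "2019-07-19T00:00:00Z",
        "startTime": "2019-07-26T00:00:00Z",
    },
)
# query == "_lastUpdated=gt2019-07-19T00:00:00Z&_lastUpdated=le2019-07-26T00:00:00Z"
```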
Final Note
The task and its execution status can be seen in the LifeOmic Platform UI. Note that the task listing is retained only for a limited period of time.
Reference
- Docker Overview - https://docs.docker.com/engine/docker-overview/
- Registry of Docker based tools and workflows defined in CWL or WDL for the sciences - https://dockstore.org
- GA4GH TES schemas (The LifeOmic Task Service API is based on this) - https://github.com/ga4gh/task-execution-schemas