Tasks
The Task Service provides direct, secure access to cloud data in a compute environment that runs your own code inside Docker containers.
The LifeOmic CLI (Command Line Interface) offers a familiar interface for developing and running your code.
Before using the CLI to run tasks, you need an account and must be able to log in at https://apps.us.lifeomic.com/phc.
General concept
Every resource in the LifeOmic Platform is identified by a unique ID that looks like this: 4a113171-9f4a-48e2-82be-5682f476cc76. In general, use the ID rather than the resource name with the CLI. When working with the task service and files, it is often necessary to specify the project or dataset ID to work under. "Project" and "dataset" are used interchangeably to mean the same thing.
Listing of files in project
The following command lists the datasets/projects under the account, which in this example has only one, with datasetId = 4a113171-9f4a-48e2-82be-5682f476cc76:
>> lo projects list
items:
-
id: 4a113171-9f4a-48e2-82be-5682f476cc76
name: analytics-testing
description: Analytics Testing Project
lrn: lrn:lo:dev:lifeomic:project:4a113171-9f4a-48e2-82be-5682f476cc76
To list the files in a dataset/project, provide the datasetId with the command. In the example below, the dataset is e447d01a-ae17-48d0-8cd8-86c9a65f779b, and the command returns a list of two files:
>> lo files list e447d01a-ae17-48d0-8cd8-86c9a65f779b
items:
-
id: 89cac4d4-e1a4-4d9b-912b-1bcc4908b9b7
name: mmrf.rgel
datasetId: e447d01a-ae17-48d0-8cd8-86c9a65f779b
size: 1223998865
contentType: application/octet-stream
lastModified: 2018-06-15T19:01:58.365Z
lrn: lrn:lo:dev:lifeomic:file:89cac4d4-e1a4-4d9b-912b-1bcc4908b9b7
-
id: 988d4158-6aac-47a2-99d5-e5de8fa5acb1
name: mmrf.rgel.executor.0.stderr.txt
datasetId: e447d01a-ae17-48d0-8cd8-86c9a65f779b
size: 82
contentType: text/plain
lastModified: 2018-06-15T19:22:12.438Z
lrn: lrn:lo:dev:lifeomic:file:988d4158-6aac-47a2-99d5-e5de8fa5acb1
Upload of file
To upload a single file:
lo files upload ./myfile.txt <datasetId>
To upload a whole directory of files, provide the path to the local directory instead of a file name. Note that the files will be prefixed with the directory name <localDir> when uploaded to the LifeOmic Platform.
lo files upload <localDir> <datasetId>
Task Service with the CLI
To submit a task job with the CLI, you provide a JSON job definition. A task JSON file contains the <name> and <datasetId> declarations, followed by four main sections: inputs, outputs, resources, and executors. The inputs and outputs sections declare the files and directories to copy in before execution and to write out afterward. The resources section specifies the CPU cores needed and the RAM in GB. The executors section defines a list of Docker images to execute serially in the order of declaration.
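Putting these pieces together, a task definition has the following overall shape (a sketch with placeholder values, not a runnable definition):

```json
{
  "name": "<name>",
  "datasetId": "<datasetId>",
  "inputs": [],
  "outputs": [],
  "resources": {
    "cpu_cores": 1,
    "ram_gb": 1
  },
  "executors": []
}
```

Each of these sections is described in more detail below.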
Here is an example of a file specification as input. Provide the file's <url>, using its unique ID, which can be obtained by listing the files in the dataset. The <path> is where the file will be copied and made available to the Docker images upon execution.
{
"path": "/tmp/input.txt",
"url": "https://api.us.lifeomic.com/v1/files/32c94154-1910-4e77-ab98-c6c8f2163060",
"type": "FILE"
}
For a directory as input, provide the <url> with the project/dataset ID. The <prefix> specifies a path prefix; all files in the project with that prefix will be copied over to the <path>.
{
"path": "/tmp/in",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"prefix": "data/",
"type": "DIRECTORY"
}
To save the results from the execution, declare the file or directory to be copied out, with <url> being the project/dataset ID. In the following example, the file "/out/result.txt" and the directory "/outDir" in the container will be copied out upon successful completion of the task.
[
{
"path": "/out/result.txt",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "FILE"
},
{
"path": "/outDir",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "DIRECTORY"
}
]
Scheduling
Tasks can be configured to run at a future time or on a recurring schedule.
To schedule a task to run at a specific date and time, add the following to the task definition:
{
"scheduleDate": "2019-07-26T16:35:31Z"
}
To schedule a task to run on a recurring schedule, use a cron expression (the example below runs once a day at midnight):
{
"scheduleExpression": "0 0 * * *"
}
To stop a recurring task, use the cancel task API.
Email Notifications
Email notifications can be sent when a task completes or fails. To use this feature, add the following to the task definition:
{
"email": {
"sendFailedTo": "user@company.com",
"sendCompletedTo": "user@company.com"
}
}
In the example above, an email will be sent if the task fails or completes.
Using a non-public image
The Task Service can pull images from any public Docker registry, such as Docker Hub. A non-public image can be used by uploading an export of it to the LifeOmic Platform and then specifying it as an input to the task.
Use docker save to create a gzipped TAR file of the image, then use the CLI to upload it to a project.
docker build -t my_image --rm .
docker save my_image | gzip > myimage.tar.gz
lo files upload ./myimage.tar.gz 0ec93203-febb-4c85-9aac-229703b6fa58
Specify the uploaded Docker image as an input to the task. The task service will extract the gzipped TAR file and load the Docker image.
{
"url": "https://api.us.lifeomic.com/v1/files/32c94154-1910-4e77-ab98-c6c8f2163060",
"path": "/tmp/myimage.tar.gz",
"type": "DOCKER_IMAGE"
}
Use the image in an executor within the task. Note: use the image tag name, not the name of the gzipped TAR file.
"executors": [
{
"workdir": "/tmp",
"image": "my_image",
"command": [
"echo",
"hello world"
],
"stderr": "/out/stderr.txt",
"stdout": "/out/stdout.txt"
}
]
Hello world JSON example
To run the following example, save the definition as a JSON file (e.g.
hello.json) and replace <datasetId> and the output <url> with your own datasetId.
This task uses the image "busybox" from https://hub.docker.com/_/busybox/ and
executes the Linux command "echo hello world", whose standard output is saved to
the file "out/stdout.txt". You will find the file "out/stdout.txt", containing
the text "hello world", in the file listing in the UI. Note: there is no input
declared, only an output directory to return.
To submit the job, run:
cat hello.json | lo tasks create
{
"name": "Hello World Task",
"datasetId": "0ec93203-febb-4c85-9aac-229703b6fa58",
"inputs": [
],
"outputs": [
{
"path": "/out",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "DIRECTORY"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 1
},
"executors": [
{
"workdir": "/tmp",
"image": "busybox",
"command": [
"echo",
"hello world"
],
"stderr": "/out/stderr.txt",
"stdout": "/out/stdout.txt"
}
]
}
List files JSON example
This example lists the files in a directory and saves the result in a file.
Assume the project contains files with the prefix "data/" and a bash script file
"run.sh" with file id 32c94154-1910-4e77-ab98-c6c8f2163060. The bash script
"run.sh" has two lines:
#!/bin/bash
ls -al $1 > $2
The result of the listing of directory "/tmp/in" is saved to the file "/out/result.txt".
{
"name": "Task Service Test",
"datasetId": "0ec93203-febb-4c85-9aac-229703b6fa58",
"inputs": [
{
"path": "/tmp/in",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"prefix": "data/",
"type": "DIRECTORY"
},
{
"path": "/tmp/run.sh",
"url": "https://api.us.lifeomic.com/v1/files/32c94154-1910-4e77-ab98-c6c8f2163060",
"type": "FILE"
}
],
"outputs": [
{
"path": "/out",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "DIRECTORY"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 1
},
"executors": [
{
"workdir": "/tmp",
"image": "busybox",
"command": [
"sh",
"-l",
"/tmp/run.sh",
"/tmp/in",
"/out/result.txt"
],
"stderr": "/out/stderr.txt",
"stdout": "/out/stdout.txt"
}
]
}
A complete task JSON example
This is a more comprehensive example that takes a variant VCF file and passes it through a series of processing steps. It uses LifeOmic GNOSIS data resources as reference data inputs; the various GNOSIS resources available for use with the Task Service are an advanced topic to be discussed separately. This example demonstrates the practical use of the Task Service to perform a complete series of tasks.
{
"name": "NantOmics Test",
"datasetId": "0ec93203-febb-4c85-9aac-229703b6fa58",
"inputs": [
{
"path": "/tmp/nantomics.vcf",
"url": "https://api.us.lifeomic.com/v1/files/5a61bfdb-5db0-4264-8573-ee9945383cf7",
"type": "FILE"
},
{
"path": "/tmp/genome",
"name": "GRCh37",
"genome": "GRCh37",
"type": "GNOSIS"
},
{
"path": "/tmp/clinvar",
"name": "ClinVar",
"genome": "GRCh37",
"type": "GNOSIS"
},
{
"path": "/tmp/cosmic",
"name": "COSMIC",
"genome": "GRCh37",
"type": "GNOSIS"
},
{
"path": "/tmp/dbsnp",
"name": "dbSNP",
"genome": "GRCh37",
"type": "GNOSIS"
}
],
"outputs": [
{
"path": "/out",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "DIRECTORY"
},
{
"path": "/log",
"url": "https://api.us.lifeomic.com/v1/projects/0ec93203-febb-4c85-9aac-229703b6fa58",
"type": "DIRECTORY"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 4
},
"executors": [
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-nant-et",
"command": [
"extract",
"-i",
"/tmp/nantomics.vcf",
"-v",
"/out/nantomics.var.vcf.gz",
"-s",
"/out/nantomics.sv.vcf.gz",
"-c",
"/out/nantomics.cnv.vcf.gz",
"-e",
"/out/nantomics.exp.vcf.gz"
],
"stderr": "/log/stderr1.txt",
"stdout": "/log/stdout1.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-nant-et",
"command": [
"var-transform",
"-i",
"/out/nantomics.var.vcf.gz",
"-o",
"/out/nantomics.var.std.vcf.gz"
],
"stderr": "/log/stderr2.txt",
"stdout": "/log/stdout2.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-nant-et",
"command": [
"exp-transform",
"-i",
"/out/nantomics.exp.vcf.gz",
"-g",
"/out/nantomics.exp.gene.txt.gz",
"-s",
"/out/nantomics.exp.iso.txt.gz"
],
"stderr": "/log/stderr3.txt",
"stdout": "/log/stdout3.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-vtools",
"command": [
"vt-combo",
"-r",
"/tmp/genome/GRCh37.fa.gz",
"-i",
"/out/nantomics.var.std.vcf.gz",
"-o",
"/out/nantomics.var.nrm.vcf.gz"
],
"stderr": "/log/stderr4.txt",
"stdout": "/log/stdout4.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-snpeff-grch37",
"command": [
"snpeff",
"-m",
"Refseq",
"-i",
"/out/nantomics.var.nrm.vcf.gz",
"-o",
"/out/nantomics.var.fnc.vcf.gz"
],
"stderr": "/log/stderr5.txt",
"stdout": "/log/stdout5.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-snpeff-grch37",
"command": [
"snpsift-annotate",
"-p",
"CLN_",
"-n",
"/tmp/clinvar/clinvar-GRCh37.vcf.gz",
"-i",
"/out/nantomics.var.fnc.vcf.gz",
"-o",
"/out/nantomics.var.cln.vcf.gz"
],
"stderr": "/log/stderr6.txt",
"stdout": "/log/stdout6.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-snpeff-grch37",
"command": [
"snpsift-annotate",
"-p",
"CMC_",
"-n",
"/tmp/cosmic/cosmic-GRCh37.vcf.gz",
"-i",
"/out/nantomics.var.cln.vcf.gz",
"-o",
"/out/nantomics.var.cmc.vcf.gz"
],
"stderr": "/log/stderr7.txt",
"stdout": "/log/stdout7.txt"
},
{
"workdir": "/tmp",
"image": "lifeomic/kopis-task-snpeff-grch37",
"command": [
"snpsift-annotate",
"-p",
"DBS_",
"-n",
"/tmp/dbsnp/dbsnp-GRCh37.vcf.gz",
"-i",
"/out/nantomics.var.cmc.vcf.gz",
"-o",
"/out/nantomics.var.dbs.vcf.gz"
],
"stderr": "/log/stderr8.txt",
"stdout": "/log/stdout8.txt"
}
]
}
FHIR resource ingest
This example shows how to ingest FHIR resources from a file using a task. In this example there are no executors because the FHIR resources are taken as-is from the file with no transformation.
{
"name": "FHIR ingest",
"datasetId": "643efe57-430f-4b06-b1b0-3e565c62a64c",
"inputs": [
{
"path": "/tmp/fhir.json",
"url": "https://api.us.lifeomic.com/v1/files/146e0679-0e03-4e05-af46-d930cfaec761",
"type": "FILE"
}
],
"outputs": [
{
"path": "/tmp/fhir.json",
"url": "https://api.us.lifeomic.com/v1/projects/643efe57-430f-4b06-b1b0-3e565c62a64c",
"type": "FHIR"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 1
},
"executors": []
}
The file of FHIR resources (fhir.json in this example) should be in JSON Lines format (also known as newline-delimited JSON). For example:
{"resourceType":"Patient","name":[{"family":"Zieme","given":["Mina"]}],"gender":"female","id":"024f2316-265a-46e8-965a-837e308ae678","birthDate":"1977-06-21"}
{"status":"final","code":{"coding":[{"code":"11142-7","system":"http://loinc.org","display":"Glucose"}]},"resourceType":"Observation","id":"62f3ccbf-c51b-48ed-ad1d-0420ea196af6","subject":{"reference":"Patient/024f2316-265a-46e8-965a-837e308ae678"},"effectiveDateTime":"1999-09-09T23:20:53Z","valueQuantity":{"value":10,"unit":"mg/DL","system":"http://unitsofmeasure.org/","code":"mg/DL"}}
{"status":"final","code":{"coding":[{"code":"11142-7","system":"http://loinc.org","display":"Glucose"}]},"resourceType":"Observation","id":"b452acd6-00c9-4fab-847c-31177e14e412","subject":{"reference":"Patient/024f2316-265a-46e8-965a-837e308ae678"},"effectiveDateTime":"1999-05-05T15:11:18Z","valueQuantity":{"value":0,"unit":"mg/DL","system":"http://unitsofmeasure.org/","code":"mg/DL"}}
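If your source data is a single JSON array of resources rather than JSON Lines, a short script can produce the expected format. The following is an illustrative sketch (the resource list here is a made-up example, not from a real project):

```python
import json

def to_json_lines(resources):
    """Serialize a list of FHIR resources as JSON Lines: one compact
    JSON object per line, newline-delimited."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in resources)

# Example: two minimal Patient resources converted to JSON Lines.
patients = [
    {"resourceType": "Patient", "id": "024f2316-265a-46e8-965a-837e308ae678"},
    {"resourceType": "Patient", "id": "b452acd6-00c9-4fab-847c-31177e14e412"},
]
lines = to_json_lines(patients)
print(lines)
```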
FHIR resource listing and cohort creation
This example queries 100,000 FHIR Observations and writes them in JSON Lines format to a file in the task, and then runs a container to compute some statistics and make a cohort out of the outliers.
{
"name": "FHIR Analytics",
"datasetId": "cccdf419-ac83-4b7e-aa2d-70702d43297c",
"inputs": [
{
"resourceType": "Observation",
"limit": 100000,
"path": "/fhir/Observation.json",
"type": "FHIR"
}
],
"outputs": [
{
"path": "/output/",
"url": "https://api.us.lifeomic.com/v1/projects/cccdf419-ac83-4b7e-aa2d-70702d43297c",
"type": "DIRECTORY"
},
{
"path": "/cohorts/cohort.csv",
"url": "https://api.us.lifeomic.com/v1/projects/cccdf419-ac83-4b7e-aa2d-70702d43297c",
"type": "COHORT"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 1
},
"executors": [
{
"image": "aroach/task-sandbox:6",
"command": ["python", "stats.py"],
"stderr": "/output/stderr.txt",
"stdout": "/output/stdout.txt"
}
]
}
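The stats.py referenced above is not shown in this document; the following is a hypothetical sketch of what such a script might compute, flagging observations whose value lies far from the mean. The file paths, the z-score threshold, and the exact cohort CSV format are all assumptions for illustration:

```python
import statistics

def find_outliers(observations, z=2.0):
    """Return the subject references of observations whose valueQuantity.value
    lies more than z standard deviations from the mean."""
    values = [o["valueQuantity"]["value"] for o in observations]
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [
        o["subject"]["reference"]
        for o in observations
        if abs(o["valueQuantity"]["value"] - mean) > z * stdev
    ]

# In the task, the observations would be read line by line from
# /fhir/Observation.json (the declared input path) and the outlier
# references written to /cohorts/cohort.csv (the declared COHORT output).
sample = [
    {"valueQuantity": {"value": 10}, "subject": {"reference": "Patient/a"}},
    {"valueQuantity": {"value": 10}, "subject": {"reference": "Patient/b"}},
    {"valueQuantity": {"value": 10}, "subject": {"reference": "Patient/c"}},
    {"valueQuantity": {"value": 10}, "subject": {"reference": "Patient/d"}},
    {"valueQuantity": {"value": 100}, "subject": {"reference": "Patient/e"}},
]
outliers = find_outliers(sample, z=1.5)
```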
Getting updated FHIR resources with a Scheduled Task
This example creates a Scheduled Task that runs weekly and gets only the FHIR
Patients that have been updated since the last time it ran. It does this by
querying the _lastUpdated field of the FHIR records, using the special variables
startTime and lastSuccessfulStartTime to limit the results to those updated
since the last successful run. It also caps the results at the current start
time to prevent overlap with the next run.
The curly braces are a special placeholder syntax: the braces, and everything inside them, are replaced with the value of the specified variable before the input is downloaded to the container.
{
"name": "Get Updated Patients",
"datasetId": "8913220a-6e22-4747-9f00-8477c475b1ec",
"scheduleExpression": "0 0 * * 0",
"inputs": [
{
"path": "/fhir/patients.json",
"type": "FHIR",
"resourceType": "Patient",
"limit": 100000,
"query": "_lastUpdated=gt{{lastSuccessfulStartTime}}&_lastUpdated=le{{startTime}}"
}
],
"outputs": [
{
"path": "/fhir",
"url": "https://api.dev.lifeomic.com/v1/projects/8913220a-6e22-4747-9f00-8477c475b1ec",
"type": "DIRECTORY"
}
],
"resources": {
"cpu_cores": 1,
"ram_gb": 1
},
"executors": []
}
NOTE: The first time a Scheduled Task runs, lastSuccessfulStartTime will be set
to the Unix Epoch (January 1, 1970 at midnight, UTC). See:
https://en.wikipedia.org/wiki/Unix_time
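The placeholder substitution described above can be illustrated with a small sketch. This mimics the observable behavior for explanation only; it is not the service's actual implementation:

```python
import re

def render_query(template, variables):
    """Replace each {{name}} placeholder with the corresponding variable value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], template)

# Hypothetical values for the two special variables.
query = render_query(
    "_lastUpdated=gt{{lastSuccessfulStartTime}}&_lastUpdated=le{{startTime}}",
    {
        "lastSuccessfulStartTime": "2019-07-19T00:00:00Z",
        "startTime": "2019-07-26T00:00:00Z",
    },
)
# query == "_lastUpdated=gt2019-07-19T00:00:00Z&_lastUpdated=le2019-07-26T00:00:00Z"
```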
Final Note
The task and its execution status can be seen in the LifeOmic Platform UI. Note that the task listing is retained only for a limited period of time.
Reference
- Docker Overview - https://docs.docker.com/engine/docker-overview/
- Registry of Docker based tools and workflows defined in CWL or WDL for the sciences - https://dockstore.org
- GA4GH TES schemas (The LifeOmic Task Service API is based on this) - https://github.com/ga4gh/task-execution-schemas