Insights API
This section explains how to use the insights API. It covers the JSON contract only, not the SQL capabilities. To learn more about the insights DSL, please visit the insights filters page.
The goal of the insights API is to provide advanced searching and analytical capabilities, such as the power to search across patient and genomic data, run a variety of aggregations, and perform statistical operations. Most data that is added to LifeOmic's patient and genomic services will automatically be indexed in analytics, requiring no extra configuration or actions from the user.
Overview
This section explains the contract structure of the insights API. The goal is to provide an overall intuition for how each component of the API works. To access the insights API, you must be authorized to call the endpoint and, once authorized, have permission to access insights data. The insights API can be accessed through the following endpoint: https://api.us.lifeomic.com/v1/analytics/dsl
While the insights API is called over HTTP with a JSON body, the endpoint itself is not RESTful. Instead, all requests require a POST body with varied contracts. The main components of this JSON contract are dataset_id, query, target, domain, options, and where.
The dataset_id, also known as the projectId, is the UUID for your LifeOmic Platform project. This is a parent-level key in the contract. Next, query is the object that stores all of the query information. This includes the target, domain, options, and where clauses.
The target can be one of three values: variant, gene, or patient. Each of these targets is explained in a later section. Domains are specific operations that are available within the scope of a target. For example, within the variant target, we may want to look at the samples associated with a given filter object or, alternatively, an OncoPrint representation. Domains have a discrete set of values, and the list of available domains is given in a later section. The options key holds domain-specific contracts that can be used to customize outputs, aggregations, filters, and more. Finally, where represents the filter body across genomic and patient data.
Example Skeleton
{
  "dataset_id": "UUID for project",
  "query": {
    "target": "variant|patient|gene",
    "domain": "domain specific to target",
    "options": {
      ...
    },
    "where": {
      ...
    }
  }
}
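As a rough illustration of how the skeleton maps onto an HTTP call, the following Python sketch builds (but does not send) a POST request for the endpoint above. The helper name build_insights_request and the token parameter are hypothetical, and the bearer-token Authorization header is an assumption about how authorization is supplied; adjust it to match your account's actual setup.

```python
import json
import urllib.request

INSIGHTS_URL = "https://api.us.lifeomic.com/v1/analytics/dsl"


def build_insights_request(dataset_id, target, domain, options, where, token):
    """Build (but do not send) a POST request carrying the insights contract."""
    body = {
        "dataset_id": dataset_id,
        "query": {
            "target": target,
            "domain": domain,
            "options": options,
            "where": where,
        },
    }
    return urllib.request.Request(
        INSIGHTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; not confirmed by this page.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the domain, options, and where values must come from the contracts described in the rest of this page.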
Where Clause
The where clause in the insights contract allows users to search across genomic and patient data, regardless of the specified target or corresponding domain. For example, this separation could allow a user to look at gene expression statistics for individuals below the age of 40, or a summary of patients that have a specific genetic mutation. To capture the full breadth of search combinations offered by the insights engine, the where object is recursive, providing boolean combinations of disparate datasets. The where data structure is therefore a tree whose nodes are either boolean operators (and, or, and xor) or resource targets (variant, gene, and patient). The variant and patient targets also allow recursive searching within their own resource. For now, we will focus on the composition of the top level of this data structure. To provide context, an example is provided below:
{
  "dataset_id": "UUID for project",
  "query": {
    "target": "variant|patient|gene",
    "domain": "domain specific to target",
    "options": {
      ...
    },
    "where": {
      "and": [{
        "or": [{
          "variant": {
            ...
          }
        }, {
          "variant": {
            ...
          }
        }]
      }, {
        "patient": {
          ...
        }
      }]
    }
  }
}
In the where clause in the JSON above, we are looking for individuals that satisfy either variant clause and also satisfy the patient clause. The simplified hierarchy can be visualized as:
where
  and
    or
      variant clause
      another variant clause
    patient clause
With this recursive data structure, many combinations can be constructed through the DSL. Note that each clause may contain only one of the keys and, or, xor, variant, patient, and gene. In other words, implicit booleans are not supported at the top layer of filtering; each clause requires an explicit operator or resource.
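The one-key-per-clause rule can be checked before a request is sent. The sketch below is a hypothetical client-side helper (check_where is not part of the API) that walks the boolean layers of a where tree and reports clauses that break the rule. It assumes each boolean operator takes a list of clauses, as in the example above, and it does not inspect the bodies of resource targets.

```python
BOOLEAN_KEYS = {"and", "or", "xor"}
RESOURCE_KEYS = {"variant", "gene", "patient"}


def check_where(clause, path="where"):
    """Return a list of problems found in the boolean layers of a where tree."""
    if not isinstance(clause, dict):
        return [f"{path}: expected an object"]
    if len(clause) != 1:
        return [f"{path}: expected exactly one key, found {sorted(clause)}"]
    key, value = next(iter(clause.items()))
    if key in RESOURCE_KEYS:
        # Resource bodies follow their own rules and are not checked here.
        return []
    if key in BOOLEAN_KEYS:
        if not isinstance(value, list):
            return [f"{path}.{key}: expected a list of clauses"]
        problems = []
        for index, child in enumerate(value):
            problems.extend(check_where(child, f"{path}.{key}[{index}]"))
        return problems
    return [f"{path}: unknown key {key!r}"]
```

An empty list means no problems were found at the boolean layers; the service itself remains the authority on what it accepts.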
Filter Targets
While the above where clause example demonstrates how you could combine multiple targets to get a single filter, it has not provided any details on what belongs in those filters. In the following section, we will clarify the different options that can be added for each one of the three targets. This will allow users to better understand the capabilities of the insights engine.
Filter Variants
One of the filterable resource targets available on the insights engine is genetic variants. Currently, the variant target only supports the sub-resources and, or, samples, genes, and gnosis. Gnosis is LifeOmic's genetic knowledge base, which aggregates multiple open- and closed-source resources, including popular knowledge bases such as ClinVar, COSMIC, dbSNP, and more. This section describes the capabilities of the variant filter target.
To provide some context up front, it is best to demonstrate an example and explain the moving pieces.
{
  "variant": {
    "or": [
      {
        "gnosis": {
          "gene": [
            {
              "operator": "eq",
              "value": "KRAS"
            },
            {
              "operator": "eq",
              "value": "PIK3CA"
            }
          ],
          "population_allele_frequency": [
            {
              "operator": "lte",
              "value": 0.1
            }
          ]
        }
      },
      {
        "gnosis": {
          "gene": [
            {
              "operator": "eq",
              "value": "BRCA2"
            }
          ],
          "cosmic_sample_count": [
            {
              "operator": "range",
              "lower": 2,
              "upper": 6
            }
          ]
        }
      }
    ]
  }
}
First, note that apart from the selected genes being associated with cancer research, this query has no particular merit and is mostly arbitrary. In plain English, the above query asks to: return variants and samples that either have a mutation in KRAS or PIK3CA, as labeled by the gnosis annotation, with a population allele frequency of at most 10 percent, OR have a mutation in BRCA2 with a COSMIC sample count between 2 and 6 from the gnosis annotation.
Before we describe how this query is constructed, it may be worth looking at the plain English definition a couple of times and determining how the items map to the JSON example.
Note that the JSON above uses some new capabilities: implicit "and" and "or" relationships. Previous sections demonstrated explicit "and" and "or" operators; at any level above the target resources, implicit booleans are not supported. The following example demonstrates an implicit "or".
{
  "gene": [
    {
      "operator": "eq",
      "value": "KRAS"
    },
    {
      "operator": "eq",
      "value": "PIK3CA"
    }
  ]
}
This query is asking for variants and samples with a KRAS or PIK3CA mutation. Notice that an implicit "or" can only be used within a single resource. For more complex "or" clauses, use the explicit or sub-resource. Next, let's look at an example of an implicit "and".
{
  "gnosis": {
    "gene": [
      {
        "operator": "eq",
        "value": "KRAS"
      },
      {
        "operator": "eq",
        "value": "PIK3CA"
      }
    ],
    "population_allele_frequency": [
      {
        "operator": "lte",
        "value": 0.1
      }
    ]
  }
}
First, notice how both gene and population_allele_frequency are supplied under the gnosis sub-resource. An implicit "and" can only be used within a sub-resource that is neither "and" nor "or". For example, this is an INVALID query:
{
  "gnosis": {...},
  "and": [{...}]
}
But this is a valid query:
{
  "gnosis": {
    "gene": [{...}],
    "cosmic_sample_count": [{...}]
  }
}
Also, when supplying genes or samples, the items in the array have an "or" relationship. For example:
{
  "samples": ["first", "second", "third"]
}
These items are all or'd together.
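To make the implicit semantics concrete, here is a toy in-memory evaluator in Python. It is only a sketch of the described behavior, not how the insights engine is implemented: conditions listed under one field are combined with "or", and the fields inside a gnosis block are combined with "and". The flat record dictionary and the function names are hypothetical, and treating range bounds as inclusive is an assumption.

```python
import operator

COMPARATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "gt": operator.gt,
    "lte": operator.le,
    "gte": operator.ge,
}


def condition_matches(condition, actual):
    """Test one {"operator": ...} object against a single record value."""
    if condition["operator"] == "range":
        # Assumption: both bounds are inclusive.
        return condition["lower"] <= actual <= condition["upper"]
    return COMPARATORS[condition["operator"]](actual, condition["value"])


def gnosis_matches(gnosis_filter, record):
    """Implicit "and" across fields, implicit "or" within each field's list."""
    return all(
        any(condition_matches(c, record[field]) for c in conditions)
        for field, conditions in gnosis_filter.items()
    )
```

With the KRAS or PIK3CA filter shown earlier, a hypothetical record with gene KRAS and a population allele frequency of 0.05 matches, while a BRCA2 record, or a PIK3CA record with a frequency of 0.5, does not.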
One other thing to note is that the variant target accepts "and" and "or" operators within its JSON block. Queries that use this pattern where appropriate can see performance gains. A good rule of thumb: if you are combining filters on the same target with "and" or "or", the operator should live inside the target block. If the filters span other targets, add the "and" or "or" at the parent level of the query.
Variant Filter Options
With all of this information, we can now list all of the options for filtering variants. The variant target differs from other data sources in that only one of its sub-resources, gnosis, accepts actionable parameters. All of the options available within gnosis are shown below.
{
  "gnosis": {
    "chromosome": [
      {
        "operator": "eq|ne",
        "value": "string_value with chr prefix (chr1, chr2, etc)"
      }
    ],
    "position": [
      {
        "operator": "lt|gt|lte|gte|eq|ne|range",
        "value": 100000
      }
    ],
    "population_allele_frequency": [
      {
        "operator": "lt|gt|lte|gte|eq|ne|range",
        "value": 0.1
      }
    ],
    "rs_id": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "id": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "clinvar_allele_id": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "clinical_disease": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "clinical_review": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "clinical_significance": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "cosmic_id": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "mutation_status": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "histology": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "tumor_site": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "ebcanon_class": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "ebcanon_group": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "impact": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "gene": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "transcript_id": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "biotype": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "amino_acid_change": [
      {
        "operator": "eq|ne",
        "value": "string_value"
      }
    ],
    "cosmic_sample_count": [
      {
        "operator": "lt|gt|lte|gte|eq|ne|range",
        "value": 100000
      }
    ]
  }
}
You may have noticed that the numerical fields above list a range operator but only a single value. That shorthand only shows how the other operators take a value; it is not a valid contract for range. To use the range operator, the following contract is necessary:
{
  "gnosis": {
    "position": [
      {
        "operator": "range",
        "lower": 10000,
        "upper": 20000
      }
    ]
  }
}
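The difference between the two contract shapes can be captured in a small client-side helper. The Python sketch below is hypothetical (numeric_condition is not part of the API); it simply builds one condition object for a numeric gnosis field and rejects combinations that mix the single-value shape with the lower and upper shape.

```python
NUMERIC_OPERATORS = {"lt", "gt", "lte", "gte", "eq", "ne"}


def numeric_condition(op, value=None, lower=None, upper=None):
    """Build one condition object for a numeric gnosis field."""
    if op == "range":
        if lower is None or upper is None or value is not None:
            raise ValueError("range takes lower and upper, not value")
        return {"operator": "range", "lower": lower, "upper": upper}
    if op not in NUMERIC_OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    if value is None or lower is not None or upper is not None:
        raise ValueError(f"{op} takes a single value")
    return {"operator": op, "value": value}
```

For example, numeric_condition("range", lower=10000, upper=20000) produces the position condition shown above, and numeric_condition("lte", value=0.1) produces the single-value form.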