Implement SQL on FHIR views #1439

Closed
wants to merge 98 commits into from
e59d062
Implement initial part of query, and test stub
johngrimes May 9, 2023
6ecad1b
Get test #2 working
johngrimes May 17, 2023
f0bf7ce
Get test #3 working
johngrimes May 18, 2023
74d0822
Add test #4
johngrimes May 18, 2023
14b0229
Ensure determinism in ordering of join columns
johngrimes May 18, 2023
c0c059d
Add test purpose comments
johngrimes May 18, 2023
5b33f53
Add tests #5-7
johngrimes May 18, 2023
fd25b92
Add test 8
johngrimes May 18, 2023
be78820
Merge branch 'main' into fhir-views
johngrimes May 23, 2023
d936b10
Merge remote-tracking branch 'origin/main' into fhir-views
johngrimes May 24, 2023
9c3cd02
Update version to 6.3.0-SNAPSHOT
johngrimes May 24, 2023
37c01fd
Replace extract column joining logic with view executor method
johngrimes May 24, 2023
b4f2db0
Add ExtractQueryTest#multipleIndependentUnnestings
johngrimes May 24, 2023
b321ead
Add ExtractQueryTest#toleranceOfColumnOrdering
johngrimes May 25, 2023
414f4be
Enable structured extract results within library
johngrimes May 29, 2023
5493c67
Add string coercion
johngrimes May 30, 2023
69d7c3b
Implement joining with topological sorting
johngrimes Jun 7, 2023
6ada657
Merge branch 'main' into fhir-views
johngrimes Jul 9, 2023
71da037
Fix import within NamedFunction
johngrimes Jul 9, 2023
aebef99
Add additional tests
johngrimes Jul 11, 2023
44f61bc
Remove column joins
johngrimes Jul 16, 2023
108922e
Get ExtractQueryTest passing
johngrimes Jul 17, 2023
f050a58
Add comments and remove unnecessary classes
johngrimes Jul 17, 2023
c9c4e95
Refactor aggregate execution to remove joins
johngrimes Jul 17, 2023
f20e23c
Move to windowing functions for aggregate and where functions, remove…
johngrimes Jul 18, 2023
80c9b2c
Improve comments in WhereFunction and AggregateFunction
johngrimes Jul 18, 2023
98193d1
Change count function to use collect_set and size
johngrimes Jul 18, 2023
efdabb3
Differentiate between distinct and non-distinct count scenarios
johngrimes Jul 18, 2023
067b134
Rewrite count function to take account of counting in unnested datasets
johngrimes Jul 18, 2023
03d4899
Get SearchExecutorTest passing
johngrimes Jul 18, 2023
88e3830
Remove join from offset and limit in search
johngrimes Jul 18, 2023
d9a2635
Clear all nesting upon aggregation or filtering
johngrimes Jul 18, 2023
d5de5d8
Alias where function value column before onward processing
johngrimes Jul 19, 2023
3473f20
Handle disaggregation within binary operation with aggregated left op…
johngrimes Jul 19, 2023
d75f26d
Get AggregateQueryTest#queryWithNestedAggregation working
johngrimes Jul 19, 2023
f029eee
Restore original count function implementation, simplify AggregateFun…
johngrimes Jul 19, 2023
74a627b
Update this context when threading dataset through to right operand o…
johngrimes Jul 19, 2023
c1ed6b2
Disaggregate when threading dataset through multiple expressions
johngrimes Jul 19, 2023
787fce7
Aggregate following aggregation expressions where no aggregation has …
johngrimes Jul 20, 2023
16e9580
Reset root erasure after processing of binary operator
johngrimes Jul 20, 2023
20f5f92
Set dataset caching to false for library
johngrimes Jul 20, 2023
d258027
Revert to SLF4J 1.7.36
johngrimes Jul 20, 2023
2710607
Reimplement combine operator using array and explode
johngrimes Jul 20, 2023
0254042
Compact nulls in combine operator
johngrimes Jul 20, 2023
01f82c1
Disable metrics in Infinispan
johngrimes Jul 21, 2023
3bd86cc
Add additional tests to ExtractQueryTest relating to combine
johngrimes Jul 21, 2023
185eb28
Fix problem with key of name prefix in test data
johngrimes Jul 21, 2023
2bb4f43
Fix expectations and thread dataset through multiple function arguments
johngrimes Jul 21, 2023
e209004
Start again with FhirViewExecutor and a single test
johngrimes Jul 29, 2023
3a0e5eb
Add new test: nested singular elements
johngrimes Jul 29, 2023
9cd1f8e
Add explicit unnesting
johngrimes Jul 29, 2023
e6519ea
Add new test: related unnesting
johngrimes Jul 29, 2023
0c81e81
Add new test: resource-related unnesting
johngrimes Jul 31, 2023
a8b068c
Add new test: forEach within from
johngrimes Jul 31, 2023
80ff55d
Add new test: unnest empty collection
johngrimes Jul 31, 2023
def7022
Add flattened blood pressure test and getId function
johngrimes Aug 1, 2023
e75cf04
Enable use of ofType with choice types
johngrimes Aug 3, 2023
fe798f7
Implement constants
johngrimes Aug 3, 2023
b8d97b8
Change all CSV test expectations to TSV
johngrimes Aug 6, 2023
8e60f62
Refactor parser to return FhirPathTransformation instead of FhirPath
johngrimes Aug 6, 2023
bf11cd8
Update FhirViewExecutor to account for the parsing changes
johngrimes Aug 6, 2023
f4a6dc2
Partial refactoring to implement only view execution
johngrimes Aug 22, 2023
9a25420
Commenting out unimplemented code and marking it with @NotImplemented…
piotrszul Aug 25, 2023
4e98e90
Commenting out unimplemented code and marking it with @NotImplemented…
piotrszul Aug 25, 2023
a86ffb7
Fix Collection object construction
johngrimes Sep 5, 2023
e1f1fb5
Update tests to use definitions from the spec
johngrimes Sep 6, 2023
aecf24e
Update element names in tests to match likely consensus
johngrimes Sep 6, 2023
e66aaec
Fix bug in basic query tests
johngrimes Sep 6, 2023
eb7798e
Update FhirViewTest to use new test format
johngrimes Sep 8, 2023
36028ae
Get tests passing up until implementation of comparison
johngrimes Sep 8, 2023
c81c945
Get all basic tests passing
johngrimes Sep 8, 2023
6361302
Get Python view querying functional
johngrimes Sep 8, 2023
193b9c6
Move SQL on FHIR submodule location
johngrimes Sep 10, 2023
b9ec589
Use expect instead of expected
johngrimes Sep 10, 2023
1274efc
Add formatting to pre-commit hook
johngrimes Sep 10, 2023
f0d79d7
Make column aliases optional
johngrimes Sep 10, 2023
a9fec44
Updating the pointer to sql-on-fhir
piotrszul Sep 11, 2023
afb652b
Delegating singularity tracking for column representation of FHIR ele…
piotrszul Sep 14, 2023
b478a2b
Updating the sql-on-fhir test cases
piotrszul Sep 14, 2023
af744ee
Making the entire code compilable.
piotrszul Sep 14, 2023
9a80a95
Reverting test data for Patient resource to be compatible with Parser…
piotrszul Sep 15, 2023
5174a7f
Implemented first(), empty(), where(), and count() as collection func…
piotrszul Sep 15, 2023
e10e6f0
Added first draft of new binary operators.
piotrszul Sep 15, 2023
31d6310
Implementing extension support and 'typeOf' operation.
piotrszul Sep 21, 2023
76f6bcf
Refactoring of FHIRViews execution.
piotrszul Sep 26, 2023
4e0dd31
Added (temporarily) new 'v1' tests to extended tests.
piotrszul Sep 26, 2023
c1771b8
Implementing optional arguments to FHIR functions.
piotrszul Sep 26, 2023
eb1a4f8
Implemented 'not()', 'empty()' and 'exists()' for non-resource paths.
piotrszul Sep 26, 2023
c7ee10b
Adding support for empty results in FhirView test cases and correctin…
piotrszul Sep 27, 2023
e104a08
Added direct traversal to nested choice element by name (e.g. `valueI…
piotrszul Sep 27, 2023
fe8a1bc
Fixing the fn_rkeys test.
piotrszul Sep 28, 2023
82aea7a
Updating the sql-on-fhir pointer
piotrszul Sep 28, 2023
ae4cbd8
Implementing 'expectError' and 'expectCount' for FhirViewTests.
piotrszul Sep 28, 2023
1f15872
Re-implementing type specifiers as special cases of FhirPath expressi…
piotrszul Sep 29, 2023
764b7ad
Updating the sql-on-fhir pointer after merging changes from the main …
piotrszul Oct 3, 2023
1e1e306
Replacing 'where()' with 'exists()' in FHIRView where clauses.
piotrszul Oct 3, 2023
272b301
Adding support for FHIR operations on singular ResourceCollections, …
piotrszul Oct 3, 2023
d306c69
Updating test cases for 'forEach' removing the null rows from the fin…
piotrszul Oct 3, 2023
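Several of the commits above ("Implement joining with topological sorting", "Ensure determinism in ordering of join columns") concern ordering dataset joins so that every dependency is resolved before the datasets that need it. A minimal sketch of that idea in plain Python, using Kahn's algorithm and hypothetical names (the PR's actual join-planning code is not shown here), might look like:

```python
from collections import deque


def topological_sort(nodes, edges):
    """Order nodes so that every dependency precedes its dependents.

    `edges` maps a node to the set of nodes that depend on it.
    Raises ValueError if the dependency graph contains a cycle.
    """
    indegree = {n: 0 for n in nodes}
    for dsts in edges.values():
        for dst in dsts:
            indegree[dst] += 1
    # Start with nodes that depend on nothing; sort for determinism.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dst in sorted(edges.get(node, ())):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    if len(order) != len(nodes):
        raise ValueError("cycle detected in join dependencies")
    return order
```

Sorting the ready queue is one way to get the deterministic ordering the commit messages mention, since plain set iteration order would otherwise vary.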
2 changes: 1 addition & 1 deletion .editorconfig
Original file line number Diff line number Diff line change
@@ -900,7 +900,7 @@ ij_json_keep_indents_on_empty_lines = false
ij_json_keep_line_breaks = true
ij_json_space_after_colon = true
ij_json_space_after_comma = true
ij_json_space_before_colon = true
ij_json_space_before_colon = false
ij_json_space_before_comma = false
ij_json_spaces_within_braces = true
ij_json_spaces_within_brackets = false
1 change: 1 addition & 0 deletions .gitignore
@@ -29,3 +29,4 @@ _version.py
metastore_db
derby.log
spark-warehouse
.*.crc
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "sql-on-fhir"]
path = sql-on-fhir
url = https://github.com/FHIR/sql-on-fhir-v2.git
Original file line number Diff line number Diff line change
@@ -26,6 +26,7 @@
import java.util.Set;
import javax.validation.constraints.Min;
import javax.validation.constraints.NotNull;
import au.csiro.pathling.encoders.FhirEncoders;
import lombok.Builder;
import lombok.Data;

@@ -59,19 +60,5 @@ public class EncodingConfiguration {
*/
@NotNull
@Builder.Default
private Set<String> openTypes = Set.of(
"boolean",
"code",
"date",
"dateTime",
"decimal",
"integer",
"string",
"Coding",
"CodeableConcept",
"Address",
"Identifier",
"Reference"
);

private Set<String> openTypes = FhirEncoders.STANDARD_OPEN_TYPES;
}
Original file line number Diff line number Diff line change
@@ -41,6 +41,25 @@
*/
public class FhirEncoders {

/**
* The reasonable default set of open types to encode with extension values.
*/
public static final Set<String> STANDARD_OPEN_TYPES = Set.of(
"boolean",
"code",
"date",
"dateTime",
"decimal",
"integer",
"string",
"Coding",
"CodeableConcept",
"Address",
"Identifier",
"Reference"
);


/**
* Cache of Encoders instances.
*/
@@ -308,6 +327,15 @@ public Builder withOpenTypes(final Set<String> openTypes) {
return this;
}

/**
* Sets the reasonable default list of types to be encoded for open types, such as extensions.
*
* @return this builder
*/
public Builder withStandardOpenTypes() {
return withOpenTypes(STANDARD_OPEN_TYPES);
}

/**
* Switches on/off the support for extensions in encoders.
*
Original file line number Diff line number Diff line change
@@ -29,7 +29,9 @@

package au.csiro.pathling.encoders

import org.apache.spark.sql.{Column, functions}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, UnresolvedExtractValue}
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.codegen.Block._
import org.apache.spark.sql.catalyst.expressions.codegen.{CodeGenerator, CodegenContext, ExprCode, FalseLiteral}
@@ -304,3 +306,87 @@ case class AttachExtensions(targetObject: Expression,
AttachExtensions(newChildren.head, newChildren.tail.head)
}
}


case class UnresolvedIfArray(value: Expression, arrayExpressions: Expression => Expression, elseExpression: Expression => Expression)
extends Expression with Unevaluable with NonSQLExpression {

override def mapChildren(f: Expression => Expression): Expression = {

val newValue = f(value)

if (newValue.resolved) {
newValue.dataType match {
case ArrayType(_, _) => f(arrayExpressions(newValue))
case _ => f(elseExpression(newValue))
}
}
else {
copy(value = newValue)
}
}

override def dataType: DataType = throw new UnresolvedException("dataType")

override def nullable: Boolean = throw new UnresolvedException("nullable")

override lazy val resolved = false

override def toString: String = s"$value"

override def children: Seq[Expression] = value :: Nil

override def withNewChildrenInternal(newChildren: IndexedSeq[Expression]): Expression = {
UnresolvedIfArray(newChildren.head, arrayExpressions, elseExpression)
}
}


case class UnresolvedUnnest(value: Expression)
extends Expression with Unevaluable with NonSQLExpression {

override def mapChildren(f: Expression => Expression): Expression = {

val newValue = f(value)

if (newValue.resolved) {
newValue.dataType match {
case ArrayType(ArrayType(_, _), _) => Flatten(newValue)
case _ => newValue
}
}
else {
copy(value = newValue)
}
}

override def dataType: DataType = throw new UnresolvedException("dataType")

override def nullable: Boolean = throw new UnresolvedException("nullable")

override lazy val resolved = false

override def toString: String = s"$value"

override def children: Seq[Expression] = value :: Nil

override def withNewChildrenInternal(newChildren: IndexedSeq[Expression]): Expression = {
UnresolvedUnnest(newChildren.head)
}
}


object ValueFunctions {
def ifArray(value: Column, arrayExpressions: Column => Column, elseExpression: Column => Column): Column = {
val expr = UnresolvedIfArray(value.alias(value.hashCode().toString).expr,
e => arrayExpressions(new Column(e)).expr, e => elseExpression(new Column(e)).expr)
new Column(expr)
}

def unnest(value: Column): Column = {
val expr = UnresolvedUnnest(value.alias(value.hashCode().toString).expr)
new Column(expr)
}

}
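The `ValueFunctions.ifArray` helper above defers its branch choice until Spark has resolved the column's type: the array branch fires for `ArrayType` columns and the else branch for everything else. Stripped of Spark's expression machinery, the same dispatch idea can be sketched in plain Python (the function name and runtime `isinstance` check are illustrative only — the real code dispatches on the resolved `DataType`, not on runtime values):

```python
def if_array(value, array_fn, else_fn):
    # Dispatch on the shape of the value, mirroring how
    # UnresolvedIfArray chooses between its two branch expressions
    # once the underlying column's DataType is known.
    if isinstance(value, list):
        return array_fn(value)
    return else_fn(value)
```

This is why the Scala class is `Unevaluable` and marked unresolved: it exists only so the analyzer can rewrite it into one of the two branches during resolution.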

102 changes: 102 additions & 0 deletions encoders/src/test/java/au/csiro/pathling/encoders/ExpressionsTest.java
@@ -0,0 +1,102 @@
/*
* This is a modified version of the Bunsen library, originally published at
* https://github.com/cerner/bunsen.
*
* Bunsen is copyright 2017 Cerner Innovation, Inc., and is licensed under
* the Apache License, version 2.0 (http://www.apache.org/licenses/LICENSE-2.0).
*
* These modifications are copyright 2023 Commonwealth Scientific and Industrial Research
* Organisation (CSIRO) ABN 41 687 119 230.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package au.csiro.pathling.encoders;

import org.apache.spark.sql.*;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

import static au.csiro.pathling.encoders.ValueFunctions.*;

/**
* Test for FHIR encoders.
*/
public class ExpressionsTest {


private static SparkSession spark;

/**
* Set up Spark.
*/
@BeforeAll
public static void setUp() {
spark = SparkSession.builder()
.master("local[*]")
.appName("testing")
.config("spark.driver.bindAddress", "localhost")
.config("spark.driver.host", "localhost")
.getOrCreate();


}

/**
* Tear down Spark.
*/
@AfterAll
public static void tearDown() {
spark.stop();
}

public static Column size(Column arr) {
return ifArray(arr, functions::size, x -> functions.when(arr.isNotNull(), functions.lit(1)).otherwise(functions.lit(0)));
}

@Test
public void testIfArray() {
Dataset<Row> ds = spark.range(2).toDF();
Dataset<Row> resultDs = ds
.withColumn("id_array", functions.array(functions.col("id"), functions.lit(22)))
.withColumn("test_single", ifArray(ds.col("id"), x -> functions.array(functions.lit(20)), x -> functions.lit(10)))
.withColumn("test_array", ifArray(functions.col("id_array"), x -> functions.array(functions.lit(20)), x -> functions.lit(10)))
.withColumn("test_array_with_unresolved", ifArray(functions.col("id_array"), x -> functions.array(functions.col("id_array")), x -> functions.lit(10)))
.withColumn("test_lit", ifArray(functions.lit("a1"), x -> functions.array(functions.lit(20)), x -> functions.lit(10)))
.withColumn("test_array_lit", ifArray(functions.array(functions.lit("a1")), x -> functions.array(ds.col("id")), x -> ds.col("id")))
.withColumn("test_array_lit_lambda", ifArray(functions.filter(functions.array(functions.lit("a1")), c -> c.equalTo("a1")), x -> functions.array(functions.lit(20)), x -> functions.lit(10)))
.withColumn("test_array_size", size(functions.col("id_array")))
.withColumn("test_array_where", ifArray(functions.col("id_array"), x -> functions.filter(x, c -> size(c).equalTo(functions.col("id"))), x -> functions.lit(0)));
resultDs.show();
resultDs.collectAsList().forEach(System.out::println);
}

@Test
public void testFlatten() {
Dataset<Row> ds = spark.range(2).toDF();
Dataset<Row> resultDs = ds
.withColumn("id_array", functions.array(functions.col("id"), functions.lit(22)))
.withColumn("id_array_of_arrays", functions.array(functions.array(functions.col("id"), functions.lit(22)),
functions.array(functions.col("id"), functions.lit(33))))
.withColumn("test_unnest_single", unnest(ds.col("id")))
.withColumn("test_unnest_array", unnest(functions.col("id_array")))
.withColumn("test_unnest_array_of_arrays", unnest(functions.col("id_array_of_arrays")))
.withColumn("test_unnest_array_of_arrays_if_array", ifArray(functions.col("id_array"), x -> unnest(functions.col("id_array_of_arrays")), x -> x))
.withColumn("test_unnest_array_of_arrays_if_id", ifArray(functions.col("id"), x -> unnest(functions.col("id_array_of_arrays")), x -> x));

resultDs.show();
resultDs.collectAsList().forEach(System.out::println);
}

}
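The `testFlatten` case above exercises `unnest`, which (per `UnresolvedUnnest`) flattens exactly one level of nesting when the resolved type is an array of arrays, and otherwise passes the value through unchanged. A plain-Python sketch of that behaviour, with a runtime check standing in for the `ArrayType(ArrayType(_, _), _)` pattern match:

```python
def unnest(value):
    # Flatten exactly one level when given a non-empty array of arrays,
    # mirroring UnresolvedUnnest's ArrayType(ArrayType(...)) case;
    # any other value is returned unchanged.
    if isinstance(value, list) and value and all(isinstance(v, list) for v in value):
        return [item for sub in value for item in sub]
    return value
```

Note that only one level is removed: a triply nested array would come back doubly nested, which matches Spark's `flatten` function that the Scala case delegates to.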
Original file line number Diff line number Diff line change
@@ -17,21 +17,22 @@

package au.csiro.pathling.aggregate;

import static java.util.stream.Collectors.toList;

import au.csiro.pathling.config.QueryConfiguration;
import au.csiro.pathling.fhirpath.FhirPath;
import au.csiro.pathling.fhirpath.Materializable;
import au.csiro.pathling.fhirpath.collection.Collection;
import au.csiro.pathling.io.Database;
import au.csiro.pathling.io.source.DataSource;
import au.csiro.pathling.terminology.TerminologyServiceFactory;
import ca.uhn.fhir.context.FhirContext;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;
import javax.annotation.Nonnull;
import lombok.extern.slf4j.Slf4j;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.hl7.fhir.r4.model.Type;
@@ -73,6 +74,8 @@ public AggregateExecutor(@Nonnull final QueryConfiguration configuration,
public AggregateResponse execute(@Nonnull final AggregateRequest query) {
final ResultWithExpressions resultWithExpressions = buildQuery(
query);
resultWithExpressions.getDataset().explain();
resultWithExpressions.getDataset().show(1_000, false);

// Translate the result into a response object to be passed back to the user.
return buildResponse(resultWithExpressions);
@@ -95,16 +98,16 @@ private AggregateResponse buildResponse(
.map(mapRowToGrouping(resultWithExpressions.getParsedAggregations(),
resultWithExpressions.getParsedGroupings(),
resultWithExpressions.getParsedFilters()))
.collect(Collectors.toList());
.collect(toList());

return new AggregateResponse(groupings);
}

@Nonnull
@SuppressWarnings("unchecked")
private Function<Row, AggregateResponse.Grouping> mapRowToGrouping(
@Nonnull final List<FhirPath> aggregations, @Nonnull final List<FhirPath> groupings,
@Nonnull final Collection<FhirPath> filters) {
@Nonnull final List<Collection> aggregations, @Nonnull final List<Collection> groupings,
@Nonnull final java.util.Collection<Collection> filters) {
return row -> {
final List<Optional<Type>> labels = new ArrayList<>();
final List<Optional<Type>> results = new ArrayList<>();
@@ -113,7 +116,7 @@ private Function<Row, AggregateResponse.Grouping> mapRowToGrouping(
final Materializable<Type> grouping = (Materializable<Type>) groupings.get(i);
// Delegate to the `getValueFromRow` method within each Materializable path class to extract
// the Type value from the Row in the appropriate way.
final Optional<Type> label = grouping.getValueFromRow(row, i);
final Optional<Type> label = grouping.getFhirValueFromRow(row, i);
labels.add(label);
}

@@ -122,7 +125,7 @@ private Function<Row, AggregateResponse.Grouping> mapRowToGrouping(
final Materializable aggregation = (Materializable<Type>) aggregations.get(i);
// Delegate to the `getValueFromRow` method within each Materializable path class to extract
// the Type value from the Row in the appropriate way.
final Optional<Type> result = aggregation.getValueFromRow(row, i + groupings.size());
final Optional<Type> result = aggregation.getFhirValueFromRow(row, i + groupings.size());
results.add(result);
}

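The `mapRowToGrouping` method in the diff above reads grouping labels from the leading columns of each result row and aggregation results from the columns immediately after them — hence the `i + groupings.size()` index passed to `getFhirValueFromRow`. That indexing convention can be sketched in plain Python (names are illustrative, not from the PR):

```python
def split_row(row, num_groupings, num_aggregations):
    # Grouping labels occupy the leading columns; aggregation results
    # follow immediately after, offset by the number of groupings,
    # as in AggregateExecutor.mapRowToGrouping.
    labels = [row[i] for i in range(num_groupings)]
    results = [row[i + num_groupings] for i in range(num_aggregations)]
    return labels, results
```

Keeping the offset in one place avoids the off-by-one errors that otherwise creep in when groupings and aggregations share a flat row layout.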