Unclear spark/databricks support #2262
Comments
You may need to specify the exact path to the driver in pom.xml.
Thanks @konstjar, but it's already set by default to: <spark.classpath>${basedir}/src/main/extras/spark</spark.classpath>. And as I wrote above in item number 2 on the list, we've extracted and copied the driver to that path. But we are looking for a way to use the WebAPI image, or at least build the image, without needing to fork the repository and change the source code; for example, we saw there's support for Redshift, PostgreSQL and MS SQL Server in the public image.
@guybartal I'm not sure if this path is correct. We specify an exact path in our build. I also see there are some copy instructions after the build; maybe that causes issues. In our case we build a .war file and run the application under a Tomcat server. @leeevans Do you have an idea?
@leeevans did you have time to look at this? We'd appreciate more clarity on this matter.
Hello @konstjar, @leeevans. It seems like some other folks have faced this issue in the past. As you can see here and here, the OHDSI WebAPI repo was forked, changes were made to the Dockerfile and pom file, and the jar file was added. I think what we are looking for, especially since others have tried to tackle it in the past, is a way to build OHDSI WebAPI with Spark "almost" out of the box, meaning passing a build argument to the Dockerfile. What are your thoughts? Are there any plans to make building OHDSI WebAPI with Spark simpler, or is it something you are open for us to contribute?
@anatbal As far as I know there are no plans for making Spark "simpler to build". Feel free to submit a PR.
Bumping this as I'm trying to build WebAPI with support for Databricks with @fdefalco. I ran into the same issues as @anatbal @guybartal. I was hoping it would be as simple as using Maven to compile the project with the webapi-spark profile, but that profile's spark.classpath defaults to ${basedir}/src/main/extras/spark, which is not currently a directory under the ${basedir}. Tagging @TomWhite-MedStar @konstjar hoping they can share any insights for compiling/deploying WebAPI w/ Spark support.
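For illustration, a minimal sketch of creating that directory by hand before building; the jar file name is copied from the reproduction steps at the bottom of this issue, and the source path is only a placeholder:
# Create the directory the webapi-spark profile expects and place the
# Spark JDBC driver jar there (file name/version and source path are examples).
mkdir -p src/main/extras/spark
cp ~/Downloads/spark-2.6.22.1040.jar src/main/extras/spark/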
@anthonysena I do not think we do anything different. We have the Spark drivers in an external folder; that's why we specify the path using the spark.classpath property.
And you can grab the drivers manually here.
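As a hedged sketch of that external-folder setup, assuming the spark.classpath property shown earlier can be overridden on the Maven command line like any other user property (the folder path below is a placeholder):
# Drivers kept outside the checkout; point the build at them by overriding
# the spark.classpath property. Two separate invocations are shown, matching
# the two-command build discussed further down in this thread (assumption:
# the first run makes the external driver jar resolvable by the second).
mvn clean -P webapi-spark -Dspark.classpath=/opt/jdbc-drivers/spark
mvn package -DskipTests -P webapi-spark -Dspark.classpath=/opt/jdbc-drivers/spark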
@konstjar, just a clarification on the Maven lifecycle (maybe this worked once but not anymore?). From pom.xml:
This seems to be calling out that there should be an action during that lifecycle phase.
@chrisknoll Thank you for the comments. We do CI/CD builds internally using pipelines, and every time they create the build environment from scratch; that's why I can confirm it's not a one-time success case. Since we build the WebAPI version with all available profiles, we know this setup works. You can see in the original message that 2 commands are used.
Thanks @konstjar. My question was whether the 2 separate commands are redundant, since the second command should cover the lifecycle phase. It's fine if it must be a two-phase build in Maven when you're using external JDBC drivers; I just don't think our documentation reflects that.
Yes, I can confirm that 2 commands are required when additional JDBC profiles are in use. Though my understanding about lifecycle phases was the same: I thought it would be enough to have just one command.
I'm no expert on the Maven pipeline, but my guess is that when the command starts up it has a cache of all the installed dependencies, so if anything gets 'added' to the .m2 repo during the pipeline, it isn't known about until the process restarts, which is why we need to do it in 2 different invocations. Lame, but it would explain it.
I put together a PR that will both update the driver and remove the maven
You will notice that all I did was add
Update here based on review of #2339: we've decided to add a new profile to WebAPI's pom.xml called 'webapi-databricks' to allow for using the latest DataBricks driver. This is to preserve backwards compatibility for those that are currently using the previous Spark driver and may be using a connection string that starts with the old driver's prefix. Once #2339 is merged, the command to build WebAPI w/ the DataBricks driver would look like this:
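A rough sketch of what that build command might look like, assuming the webapi-databricks profile name mentioned above and the same MAVEN_PROFILE / GIT_BRANCH build arguments used in the reproduction steps below (the branch value is only a placeholder):
# Sketch only: build the WebAPI image with the DataBricks profile enabled.
# The profile name comes from the comment above; GIT_BRANCH is a placeholder.
docker build -t webapi --build-arg MAVEN_PROFILE=webapi-docker,webapi-databricks --build-arg GIT_BRANCH=master .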
Expected behavior
Spark drivers included in WebAPI
Actual behavior
We get an error on WebAPI application startup:
Steps to reproduce behavior
Start WebAPI application with ohdsi/webapi:2.12.1 docker image
Description
The steps to support a Spark source are unclear and undocumented;
what should we do to use a Spark source without changing the Dockerfile?
The steps we took to load the Spark drivers were:
/src/main/extras/spark/spark-2.6.22.1040.jar
docker build -t webapi --build-arg MAVEN_PROFILE=webapi-docker,webapi-spark --build-arg GIT_BRANCH=v2.12.1 .
2023-04-30T08:00:58.824720185Z 2023-04-30 08:00:58.824 INFO main org.ohdsi.webapi.DataAccessConfig - [] - driver loaded: com.simba.spark.jdbc.Driver