Use of tensorflow_decision_forests #480
Comments
Did you get any logs or errors out of the |
Thanks again. I followed the instructions on tensorflow/decision-forests#81 but did not get as far as the commenter there because I get an error immediately on the call to
Following the instructions there I downloaded this python wheel and extracted the
I also tried grabbing a few different versions of shared libraries from |
Hmm. Well that symbol has been in TF for years, so it's not new enough that we're missing it. Maybe it's not exported from our build of the native libraries? @karllessard is that something we can easily control? |
If you look at this patch, I think that is where we can define additional symbols to be exported when we compile TF. But looking at the exported symbols in the TensorFlow binaries we are building, this symbol is present, with the exception that it is missing the "B5cxx11" part:
So maybe we are missing a compiler option instead? |
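To illustrate the symbol check discussed above, here is a minimal Python sketch that sorts mangled C++ symbol names by whether they carry a new-ABI marker. With GCC's libstdc++, the `[abi:cxx11]` tag is mangled as `B5cxx11`, and new-ABI types live in the `std::__cxx11` namespace, mangled as `St7__cxx11`. The sample symbol names and the `demo` namespace are hypothetical stand-ins, not the actual symbol from this issue.

```python
# Markers that GCC's libstdc++ leaves in mangled names built with the new (C++11) ABI.
NEW_ABI_MARKERS = ("B5cxx11", "St7__cxx11")


def uses_new_abi(mangled_symbol: str) -> bool:
    """Return True if the mangled name contains a new-ABI marker."""
    return any(marker in mangled_symbol for marker in NEW_ABI_MARKERS)


def split_by_abi(symbols):
    """Partition symbol names into (new-ABI, everything else)."""
    new_abi = [s for s in symbols if uses_new_abi(s)]
    other = [s for s in symbols if not uses_new_abi(s)]
    return new_abi, other


# Hypothetical examples: the same function name with and without the ABI tag.
samples = [
    "_ZN4demo4nameB5cxx11Ev",  # demo::name[abi:cxx11]()
    "_ZN4demo4nameEv",         # demo::name()
]

new_abi, other = split_by_abi(samples)
print("new ABI:", new_abi)
print("other:  ", other)
```

In practice the symbol list could come from a tool such as `nm -D` run against each shared library. A symbol that appears with the tag in one library and without it in another suggests the two were built with different ABI settings.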
Yeah, if I look at the Python distribution, the "B5cxx11" part is there. I think it is related to set the |
Right, that library was built for the new ABI:
Which isn't compatible with CentOS 7... |
Ok, that's a real problem then... Will that work on other Linux distributions, like Ubuntu and Debian? I would like to avoid building multiple binaries for different distributions, but if only CentOS prevents us from supporting the loading of libraries like TFDF, it is something we might have to consider... and that would also mean that TensorFlow >= 2.9.0 is not supported on CentOS even when using Python? |
glibc and libstdc++ are fully backward compatible, so that's not a problem; that's why using the old ABI works. "manylinux2014" in the case of Python is essentially CentOS 7: https://peps.python.org/pep-0599/ They are migrating away from that by explicitly stating the glibc version instead: https://github.com/pypa/manylinux It's going to be a mess that TF Core is apparently already prepared to embrace. Since Ubuntu is essentially becoming the most widely used distribution, I'm thinking of just moving to that when CentOS 7 reaches EOL (June 2024). Obviously, Oracle is going to disagree, so we might as well start figuring out what we want to do about that here before 2024 arrives. |
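As a rough illustration of the two tagging schemes mentioned above, the following Python sketch maps a wheel platform tag to the minimum glibc version it promises to support. The legacy aliases (`manylinux1`, `manylinux2010`, `manylinux2014`) correspond to glibc 2.5, 2.12, and 2.17, while the newer `manylinux_X_Y` tags state the version directly. The function names are hypothetical helpers for this sketch, and the host version used in the example (2.17, the glibc that CentOS 7 ships) is supplied by hand rather than detected.

```python
import re

# glibc baselines behind the legacy manylinux aliases.
LEGACY_TAGS = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}


def glibc_floor(platform_tag):
    """Return the (major, minor) glibc baseline of a manylinux tag, or None."""
    explicit = re.match(r"manylinux_(\d+)_(\d+)_", platform_tag)
    if explicit:
        return int(explicit.group(1)), int(explicit.group(2))
    for legacy, version in LEGACY_TAGS.items():
        if platform_tag.startswith(legacy + "_"):
            return version
    return None


def tag_allows_host(platform_tag, host_glibc):
    """True if the tag's glibc baseline is no newer than the host's glibc."""
    floor = glibc_floor(platform_tag)
    return floor is not None and floor <= host_glibc


centos7_glibc = (2, 17)
for tag in ("manylinux2014_x86_64", "manylinux_2_28_x86_64"):
    print(tag, glibc_floor(tag), tag_allows_host(tag, centos7_glibc))
```

The tag only describes what a wheel claims. As this thread shows, a library can still be built in a way that does not match the platform the tag implies.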
Current builds fail only for users who want to load TF extensions, like TFDF, that were built on top of TF >= 2.9.0. I don't know what proportion of users that is, but building with the new ABI on Ubuntu will prevent all users on CentOS 7 from running TensorFlow with Java. To summarize the possible solutions again (I'm not a fan of any of them):
The last option is great in the sense that it could allow us to add extensions to TensorFlow Java by just adding a JAR, instead of having users download the Python wheel and extract the libraries themselves. But I don't know how much work that would be. |
So... I think the last option will end up being too much work for what our small team can contribute (unless someone new is willing to give it a try). We've already deprecated usage of Java 8 by only maintaining the 0.4.x branch for these users (no new features, only bug and CVE fixes). We can redirect CentOS 7 users to that branch as well, and switch to Ubuntu on the 0.5.0 branch. Anyway, this branch already supports a version of TensorFlow that is officially not available to CentOS 7 users, regardless of their language bindings. So we can hardly do better than what TensorFlow/Google has decided to do. Thoughts? |
Fine by me, CentOS 7 is 8 years old at this point. The next step would be CentOS 8, but after Red Hat killed it, it hasn't seen a lot of market uptake. It's roughly the same glibc version as Ubuntu 18.04, but that's also coming to the end of its lifespan. Ubuntu 20.04 is probably a reasonable target, but I could see a good argument for 18.04 as well because that would catch anyone still on a RHEL 8 variant (e.g. Rocky Linux). It's worth noting that Google have stopped making patches for TF 2.7, so our 0.4 branch will stop getting those fixes now. |
This is odd, the packages on PyPI are still "manylinux2014": https://pypi.org/project/tensorflow/#files That means they need to be compatible with CentOS 7, so what's going on here |
Hey, I wanted to give you an update. The discussion went towards how to generically solve this problem with new vs old ABI. We took the path described above of compiling TFDF using the old ABI so that it would be compatible with tensorflow-java. This created binaries that I distributed in an internal JAR. I was able to successfully load and use this for inference. Huzzah :) Sadly, while this solves the issue specifically for me, other users are still faced with the same core problem. |
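The thread does not record the exact build steps used. As background, GCC's libstdc++ selects between the old and new ABI through the `_GLIBCXX_USE_CXX11_ABI` macro (0 for old, 1 for new). The Python sketch below shows one way to read that setting out of a list of compiler flags. The function name and the sample flags are hypothetical, and the snippet deliberately avoids importing TensorFlow so that it runs on its own.

```python
ABI_FLAG_PREFIX = "-D_GLIBCXX_USE_CXX11_ABI="


def cxx11_abi_setting(compile_flags):
    """Return 0 (old ABI) or 1 (new ABI) if the flag is present, else None."""
    for flag in compile_flags:
        if flag.startswith(ABI_FLAG_PREFIX):
            return int(flag[len(ABI_FLAG_PREFIX):])
    return None


# Hypothetical flag lists, standing in for what a build system might report.
old_abi_flags = ["-I/opt/example/include", "-D_GLIBCXX_USE_CXX11_ABI=0", "-std=c++17"]
new_abi_flags = ["-I/opt/example/include", "-D_GLIBCXX_USE_CXX11_ABI=1", "-std=c++17"]

print(cxx11_abi_setting(old_abi_flags))
print(cxx11_abi_setting(new_abi_flags))
```

With a Python installation of TensorFlow available, the list returned by `tf.sysconfig.get_compile_flags()` could be passed in instead of the sample lists, which would show which ABI that particular TensorFlow build expects extensions to use.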
Thanks for your update on this @mattomatic, and happy to hear that you are now unblocked. I think the plan to fix that permanently is to start reusing in Java the TF binaries that are built for the Python wheels. Hopefully TF 2.12 will make this easier for us, as it distributes the C++ libraries we depend on separately from the Python wrappers. I'll leave this issue open as a reminder to test it out when/if it is done. |
@mattomatic I've actually been running into this exact issue as well. Thanks for starting this discussion and updating on your solution. Do you have any additional details on the steps you took to compile TFDF using the old ABI? I'm a bit unsure of how to proceed with that. Thanks in advance for your help! |
Hi,
Firstly, thank you for your great work making the tensorflow library accessible within the java ecosystem.
I would like some guidance or info about how to incorporate a third party library, tensorflow_decision_forests (https://www.tensorflow.org/decision_forests), into tensorflow-java. Particularly, I would like to take a trained model and run inference with it from a java app.
I've given this a number of tries on my own. I am using the 0.5.0-SNAPSHOT that targets tf 2.9.1 along with tfdf version 0.2.7, which also targets tf 2.9.1. Without any modifications to tensorflow-java or library loading, the error that arises is this:
This is the obvious error, as I'm aware the library introduces custom tensorflow operations which are probably needed to run inference on the saved model. I attempted to raid the shared library .so files from the library distribution and load them using TensorFlow.loadLibrary from within docker containers, but this did not work for me:
I'm a little out of my depth as to how to get this to work properly, and I would appreciate any helpful info, resources or direction.
I note that the most similar looking issue recently was #468 -- which also involves loading shared libraries for a 3rd party library.