Obtaining inverse transform values from tft.transform #185
Comments
If it's supposed to be saving something in my transform_model.variables or in the variables/ folder, it isn't. |
Ah, it seems my model is not returning this either, which I now realize could have given me those stats. |
These constants are being saved in the graph as constants used for transformations.
{pre|post}_transform_statistics_path is where statistics about your data are stored for your own analysis; their contents are not meant to be used during training. |
Thank you so much for such a timely response Zoyahav. Indeed, I had checked the graph via:
but none of the constants were actually floats that look like a variance or mean. (They were either int64 values or floats equal to 1.0.) If they really should be stored, they're definitely not being saved on my end, either in the graph or in any output folders, though my transformed data does appear to have undergone the transformations. I am testing to see whether the few NaN values (which I imputed in a prior step) might be causing an issue. But overall, having to recompute something Beam is already calculating to perform z-scaling seems redundant and inefficient. I figured the values would be present in the output object or graph, but I can't seem to get them. Regardless, I can proceed as you're suggesting. Perhaps a future update could provide this, or even just an option to reverse a transformation? Say, for example, I scale my numeric target. My prediction will then be scaled too, and I have no idea what it means in my original scale since I can't reverse the transformation without those statistics. That aside, do you find it concerning at all that my process isn't actually outputting {pre|post}_transform_statistics folders? I would like those values. |
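For reference, the relationship this comment depends on can be written out. This is a sketch that assumes `tft.scale_to_z_score` standardizes with the dataset mean and the square root of the dataset variance computed during the analysis pass; the symbols $\mu$ and $\sigma^2$ are notation introduced here, not names from TFT:

$$
z = \frac{x - \mu}{\sqrt{\sigma^2}}
\qquad\Longleftrightarrow\qquad
x = z\,\sqrt{\sigma^2} + \mu
$$

So a scaled prediction $z$ can only be mapped back to the original units if both $\mu$ and $\sigma^2$ of the target column are available at inference time.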
I also included this in my Pipeline following decode. I'll run again and update if I can get something working. |
Side note, I was unable to get this to work. My preprocessing function would return an empty dict:
|
Does your preprocessing_fn return an empty dict? (perhaps look if there's an issue with indentation or something like that) |
|
The mean and var are not in the |
So I have two options:
|
Could you please explain the second option? |
Sure. From the doc, it says Option 1. it wastes spaces to store duplicated data if we want to preprocess the data in a single pass We could be more efficient because https://www.tensorflow.org/tfx/transform/api_docs/python/tft_beam/AnalyzeAndTransformDataset |
I'm still not understanding option 2. I'm assuming "run the preprocessing_fn twice" means a completely different pipeline with a different preprocessing_fn. That is not a good idea: tracking compatibility between the two pipelines is not trivial, and it defeats the purpose of hermetic preprocessing, which is shared by training and serving precisely to avoid training/inference skew. Option 1 wastes some space, yes, but you also don't need to call TransformDataset() (or AnalyzeAndTransformDataset). You could just produce the TFT output in the form of a SavedModel and apply these transformations for training and serving, with access to the additional features. |
Ok, now I understand your point. |
I can't figure it out either. TFT has got such nice functions as But after I run inference and get predictions, also through Beam, how do I apply the inverse transform for the labels to get the actual predictions? I found a way in this thread: by saving Did anyone find a solution for such a problem? |
We can keep this in mind as a feature request, but yes, in the meantime it has to be done manually. |
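Until such a feature exists, the manual step can be sketched in a few lines of plain Python. This is a minimal illustration, not TFT's own API: the helper names `z_score` and `inverse_z_score` are hypothetical, and the mean and variance values are made up. It assumes you have already obtained the label's mean and variance by some other route, such as one of the approaches discussed earlier in this thread.

```python
import math


def z_score(x, mean, var):
    """Forward scaling, assumed here to be (x - mean) / sqrt(var)."""
    return (x - mean) / math.sqrt(var)


def inverse_z_score(z, mean, var):
    """Map a z-scored value back to the original units."""
    return z * math.sqrt(var) + mean


# Hypothetical statistics for a "height" label; in practice these come
# from wherever you saved the mean and variance of the training data.
height_mean = 170.0
height_var = 100.0

# A model trained on the scaled label predicts in scaled units.
scaled_prediction = 1.5
restored = inverse_z_score(scaled_prediction, height_mean, height_var)

# Round trip: scaling and then inverting should return the input value.
round_trip = inverse_z_score(
    z_score(182.0, height_mean, height_var), height_mean, height_var
)
```

The statistics must be the ones computed on the training data during the analysis pass; recomputing them on the inference data would reintroduce the skew that hermetic preprocessing is meant to prevent.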
Could you please confirm whether this issue can be closed? Thanks. |
I'm thoroughly confused by the tft transform output.
I've read dozens and dozens of documentation pages and I've even tried (dumbly) to explore the saved_model.pb graph to see if I could find the constants computed during pre_processing with
tft.scale_to_z_score(outputs[key], name='z_scale_'+key) #for the sake of an example we can pretend key = 'height'
How can I obtain the std and mean computed during the AnalyzeAndTransformDataset step for my numeric column "height"?
I clearly see in my transformed_data files that it's been transformed. In particular, this is important for my target prediction (regression problem). I feel silly. Someone please point me in the right direction?
If it helps, I'm using the census_v2 code as an example where the only major difference in code is our model architecture and loss function (mine is custom)