
About the readdata_Oxford.py question? #3

Open
PMRS-lab opened this issue Nov 26, 2024 · 5 comments

@PMRS-lab

I couldn't find any executable function in the readdata_Oxford.py file; it only contains class methods. How can I accurately crop the stitched satellite image to the area corresponding to the ground image?

PMRS-lab changed the title from "About In readdata_Oxford.py question?" to "About the readdata_Oxford.py question?" on Nov 26, 2024
@ZiminXia
Member

Thank you for your interest in this project.
If you check the training code, you can see that the next_pair_batch function is called:

batch_sat, batch_grd, batch_gt = input_data.next_pair_batch(batch_size)

The cropping is done at this comment in the dataloader:

# crop a satellite patch centered at the location of the ground image, offset by a randomly generated amount
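For illustration, here is a minimal sketch of what such on-the-fly cropping can look like (the function and variable names, e.g., crop_sat_patch, sat_map, and grd_xy, are hypothetical placeholders rather than the repository's actual identifiers, and border handling is omitted):

```python
import numpy as np

def crop_sat_patch(sat_map, grd_xy, patch_size=512, max_offset=100):
    """Crop a satellite patch centered at the ground image's map location,
    shifted by a random offset; the offset defines the ground-truth position."""
    # fresh random shift (in pixels) drawn at every call, i.e., every iteration
    dx, dy = np.random.randint(-max_offset, max_offset + 1, size=2)
    cx, cy = grd_xy[0] + dx, grd_xy[1] + dy   # shifted patch center
    half = patch_size // 2
    patch = sat_map[cy - half:cy + half, cx - half:cx + half]
    # ground truth: pixel location of the ground image inside the cropped patch
    gt = (half - dx, half - dy)
    return patch, gt
```

Because the offsets are re-sampled at every call, each epoch sees differently centered patches, which is why no fixed patch dataset needs to be written to disk first.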

@PMRS-lab
Author


Thank you very much for your answer. Another question: it seems the network does not reduce the viewpoint difference between the ground and satellite images when extracting features. From the network structure you showed, it appears to learn from the ground images directly, without any perspective transformation. Will this affect the similarity computation? After all, without a perspective change, the appearance gap between the two views is quite large.

@PMRS-lab
Author


Can I take it that the satellite images are cropped on the fly during training, rather than the cropped patches first being written out as a separate satellite-patch dataset?

@ZiminXia
Member

Regarding the dataloader, the cropping is done on the fly since the random offsets are generated at each iteration.

Regarding the model architecture, we did not use the common perspective transformations, e.g., the polar transform or a homography, because (1) the polar transform assumes center alignment between the ground and aerial views, an assumption that does not hold for fine-grained localization, and (2) a homography ignores above-ground objects, effectively limiting the model to lane markings and other ground-plane features.

In general, we find that global descriptors are strong enough to pull the two views together. A stronger perspective transformation may further improve performance, but we see clear limitations in the commonly used ones.
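For context, here is a minimal sketch of the polar transform mentioned above (a generic illustration, not this repository's code), which resamples an aerial patch into a pseudo-panorama under the assumption that the patch center coincides with the camera location:

```python
import numpy as np

def polar_transform(aerial, out_h, out_w):
    """Resample a square aerial patch into polar coordinates around its center.
    Rows sweep the radius, columns sweep the azimuth, mimicking a ground panorama."""
    H, W = aerial.shape[:2]
    cy, cx = H / 2.0, W / 2.0                 # assumed camera location: the center
    radii = np.linspace(0, min(cy, cx) - 1, out_h)
    angles = np.linspace(0, 2 * np.pi, out_w, endpoint=False)
    r, a = np.meshgrid(radii, angles, indexing="ij")
    ys = (cy - r * np.cos(a)).astype(int)     # nearest-neighbor sampling
    xs = (cx + r * np.sin(a)).astype(int)
    return aerial[ys, xs]
```

If the true camera position is offset from the patch center, every sampled ray originates from the wrong point, which is why this center-alignment assumption breaks down for fine-grained, metric localization.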

@PMRS-lab
Author


Thank you very much. Yesterday you mentioned the common perspective projection transformations, and I would like to ask an open-ended question: do you think an orthographic projection transformation would give better results than the polar transform or a homography? Best wishes!
