Documenting perspectiveProjection #28

Open
jamesmfriedman opened this issue Oct 17, 2018 · 7 comments

Comments

@jamesmfriedman
Contributor

I will happily add this to the documentation, but I need help figuring out how to use it ;).

Basically I want to extract a quadrilateral region from an image and perspective transform it to be a rectangle. Per my other open issue, I've been pointed to the undocumented gm.perspectiveProjection.

Per this operation

export default (tSrc, tTransform, shape = [10, 10, 4], dtype = tSrc.dtype) => new RegisterOperation('PerspectiveProjection')

I'm doing a lot of guessing here, but I'm assuming you use the perspective transform util to pass the transform into the perspectiveProjection op.

const input = await gm.imageTensorFromURL(...); // 1024 x 1024 input image

// I get my quadrilateral by some method... TL, TR, BL, BR, one coordinate pair per corner.
const pts = [[85.03672790527344, 228.44911193847656], [893.9627685546875, 234.818603515625], [49.670997619628906, 758.93505859375], [982.0530395507812, 724.961669921875]];

// The Rect TL, TR, BR, BL
const rect = new gm.Rect(
	pts[0][0],
	pts[0][1],
	pts[1][0],
	pts[1][1],
	pts[3][0],
	pts[3][1],
	pts[2][0],
	pts[2][1]
);

const transform = gm.generateTransformMatrix(
	rect, // pass the rect
	[1024, 1024], // this is bounds? I am assuming an x and y of maxWidth and maxHeight
	new gm.Tensor('uint8', input.shape) // this I am unsure of. It's called transformMatrix, but it appears to just need an empty Tensor to save the transform data into. Not sure about type or shape
);

const operation = gm.perspectiveProjection(
	input, // pretty sure about this.
	transform, // also pretty sure about this.
	transform.shape // not so sure about this
);

const output = gm.tensorFrom(operation);
sess.init(operation);
sess.runOp(operation, 0, output);
gm.canvasFromTensor(document.getElementById('my-canvas'), output);

The good news is I'm getting some output. The bad news is it is just a giant single color canvas. Any help would be greatly appreciated.

@WorldThirteen
Contributor

@jamesmfriedman, here are some comments; I hope they help.

const input = await gm.imageTensorFromURL(...); // 1024 x 1024 input image

const pts = [[85.03672790527344, 228.44911193847656], [893.9627685546875, 234.818603515625], [49.670997619628906, 758.93505859375], [982.0530395507812, 724.961669921875]];

// The Rect TL, TR, BR, BL (order is correct)
const rect = new gm.Rect(
	pts[0][0],
	pts[0][1],
	pts[1][0],
	pts[1][1],
	pts[3][0],
	pts[3][1],
	pts[2][0],
	pts[2][1]
);

const tTransform = new gm.Tensor('float32', [3, 1, 4]); // placeholder for the transformation matrix
// that `gm.generateTransformMatrix` fills in.
// Layout of this tensor: 3 rows, 1 column, depth 4. It stores a 3x3 [affine transformation](https://en.wikipedia.org/wiki/Affine_transformation) matrix.
// Each tensor row is a matrix row, and each depth component is a matrix column.
// Example (3x3 matrix to tensor):
// |1 0 0|
// |0 1 0| => new gm.Tensor('float32', [3, 1, 4], new Float32Array([1, 0, 0, 0,    0, 1, 0, 0,   0, 0, 1, 0]));
// |0 0 1|

gm.generateTransformMatrix(
	rect, // pass the rect
	[480, 640], // the output shape you want, [height, width];
	// e.g. the quadrilateral is mapped onto a rectangle with
	// width 640 and height 480
	tTransform, // we pass a placeholder instead of returning a new Tensor,
	// because we want to reuse already allocated memory and change the value
	// from call to call without reconstructing the graph
);

const operation = gm.perspectiveProjection(
	input, // input Tensor
	tTransform, // Transformation matrix Tensor
	[480, 640] // the output shape
);

const output = gm.tensorFrom(operation);
sess.init(operation);
sess.runOp(operation, 0, output);
gm.canvasFromTensor(document.getElementById('my-canvas'), output);
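
The tensor layout described in the comments above (each matrix row padded to four depth components) can be sketched in plain JavaScript. `packMatrix3x3` is a hypothetical helper written for illustration; it is not part of GammaCV, and it only builds the flat `Float32Array` that the identity-matrix example passes as the third `gm.Tensor` argument.

```javascript
// Hypothetical helper (not a GammaCV API): packs a 3x3 matrix, given as
// three arrays of three numbers, into the 12-element layout used by a
// [3, 1, 4] float32 tensor. Each matrix row occupies four slots; the
// fourth slot of every row is left at 0, as in the identity example.
function packMatrix3x3(m) {
  const data = new Float32Array(12); // zero-initialized
  for (let row = 0; row < 3; row++) {
    for (let col = 0; col < 3; col++) {
      data[row * 4 + col] = m[row][col];
    }
  }
  return data;
}

const identity = [
  [1, 0, 0],
  [0, 1, 0],
  [0, 0, 1],
];
const packed = packMatrix3x3(identity);
```

Going by the example in this thread, the result could then be handed to `new gm.Tensor('float32', [3, 1, 4], packed)`.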

You may also want to rerun the graph with a different rect, for example when the quadrilateral's position changes in real time:

// For this you must keep the same session and operation
gm.generateTransformMatrix(
	rect, // pass the updated rect
	[480, 640], // output shape
	tTransform, // reuse the already created Tensor
); // this modifies the data of tTransform
// you don't need to re-init the session
sess.runOp(operation, frameNumber, output); // the frame number should be unique for each call, because the output value may be cached by this parameter
gm.canvasFromTensor(document.getElementById('my-canvas'), output);
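
For readers new to the underlying math, here is a minimal sketch of how a 3x3 perspective matrix can be derived from four point correspondences. It is not GammaCV's implementation, and whether `gm.generateTransformMatrix` uses this method or this mapping direction is not confirmed anywhere in this thread. The function names are hypothetical, and the points are assumed to be `[x, y]` pairs. The sketch maps the corners of the output rectangle onto the source quadrilateral, which is the direction a sampler needs when it looks up a source pixel for every output pixel.

```javascript
// Solves the n x n system A x = b with Gauss-Jordan elimination and
// partial pivoting. Throws if the system is (numerically) singular.
function solveLinearSystem(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]); // augmented copy
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
    }
    if (Math.abs(M[pivot][col]) < 1e-12) {
      throw new Error('Degenerate point configuration');
    }
    const tmp = M[col];
    M[col] = M[pivot];
    M[pivot] = tmp;
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  return M.map((row, i) => row[n] / row[i]);
}

// Returns the 3x3 matrix H (with H[2][2] fixed to 1) that maps each
// from[i] = [x, y] onto to[i] = [u, v], for four point pairs.
function homographyFromPoints(from, to) {
  const A = [];
  const b = [];
  for (let i = 0; i < 4; i++) {
    const [x, y] = from[i];
    const [u, v] = to[i];
    // u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h
    A.push([x, y, 1, 0, 0, 0, -u * x, -u * y]);
    b.push(u);
    // v = (h3 x + h4 y + h5) / (h6 x + h7 y + 1)
    A.push([0, 0, 0, x, y, 1, -v * x, -v * y]);
    b.push(v);
  }
  const h = solveLinearSystem(A, b);
  return [
    [h[0], h[1], h[2]],
    [h[3], h[4], h[5]],
    [h[6], h[7], 1],
  ];
}

// Applies H to a point, including the perspective divide.
function applyHomography(H, [x, y]) {
  const w = H[2][0] * x + H[2][1] * y + H[2][2];
  return [
    (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
    (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
  ];
}

// Output rectangle (width 640, height 480) and source quadrilateral,
// both ordered TL, TR, BR, BL; quad values rounded from this thread.
const rectCorners = [[0, 0], [640, 0], [640, 480], [0, 480]];
const quad = [[85.04, 228.45], [893.96, 234.82], [982.05, 724.96], [49.67, 758.94]];
const H = homographyFromPoints(rectCorners, quad);
```

With `H` in hand, `applyHomography(H, [320, 240])` gives the source position that the center of the output rectangle would sample from.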

@jamesmfriedman
Contributor Author

Thank you so much for your quick response! I definitely could not have figured that out on my own. This is my actual code, with your example copied verbatim. Sorry if these are naive questions; I'm still trying to orient myself in the library's concepts, as well as computer vision in general.

image

const rect = new gm.Rect(
  pts[0][0],
  pts[0][1],
  pts[1][0],
  pts[1][1],
  pts[3][0],
  pts[3][1],
  pts[2][0],
  pts[2][1]
);

const tTransform = new gm.Tensor('float32', [3, 1, 4]);

gm.generateTransformMatrix(
  rect, // pass the rect
  [480, 640], // this is an output shape you want to have [height, width],
  // e.g. you want your quadrilateral to be fixed to a rectangle with
  // width 640 and height 480
  tTransform // we use a placeholder instead of returning a new Tensor,
  // because we want to reuse already allocated memory, and change the value
  // from call to call without reconstructing the graph
);

const operation = gm.perspectiveProjection(input, tTransform, [480, 640]);

let output2 = gm.tensorFrom(operation);
sess.init(operation);
sess.runOp(operation, 1, output2);
gm.canvasFromTensor(document.getElementById('contours'), output2);

@WorldThirteen
Contributor

WorldThirteen commented Nov 25, 2018

@jamesmfriedman, sorry for the long delay. There was a small mistake in my code example. Change

const operation = gm.perspectiveProjection(
	input, // input Tensor
	tTransform, // Transformation matrix Tensor
	[480, 640] // the output shape
);

to

const operation = gm.perspectiveProjection(
	input, // input Tensor
	tTransform, // Transformation matrix Tensor
	[480, 640, 4] // the output shape; a shape requires 3 components
);

Here is an example:
pespective_projection_example.zip

@jamesmfriedman
Contributor Author

Thanks for the response :). I had eventually figured that out thankfully. I still need to add this to the docs.

@calebbergman

calebbergman commented Jul 8, 2019

@WorldThirteen
The output of the image transform from your example comes out a bit fuzzy (you can see jagged edges around all the app icons, and really around any UI element). Is there any way to make it come out a bit crisper?

image

@WorldThirteen
Contributor

@calebbergman, hi! Thanks for your interest in GammaCV! For now, our perspective correction algorithm doesn't include an anti-aliasing strategy; we plan to add one in the near future. That is why this example looks the way it does. For a source with less skew, the result will be better.
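
Until then, one possible CPU-side workaround is supersampling: project to twice the target size (for example `[960, 1280, 4]` instead of `[480, 640, 4]`) and average each 2x2 block of pixels down to one. The sketch below is not part of GammaCV. `downsample2x` is a hypothetical helper, and it assumes the pixels are available as a flat RGBA array in row-major order; whether the output tensor exposes its data in exactly that form is an assumption to verify.

```javascript
// Hypothetical helper (not a GammaCV API): halves an RGBA image by
// averaging every 2x2 block of pixels (a simple box filter).
// `src` is a flat array of width * height * 4 channel values.
function downsample2x(src, width, height) {
  if (width % 2 !== 0 || height % 2 !== 0) {
    throw new Error('width and height must be even');
  }
  const outW = width / 2;
  const outH = height / 2;
  const dst = new Uint8ClampedArray(outW * outH * 4);
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      for (let c = 0; c < 4; c++) {
        const i00 = (2 * y * width + 2 * x) * 4 + c; // top-left of the block
        const i01 = i00 + 4;                         // top-right
        const i10 = i00 + width * 4;                 // bottom-left
        const i11 = i10 + 4;                         // bottom-right
        dst[(y * outW + x) * 4 + c] =
          (src[i00] + src[i01] + src[i10] + src[i11]) / 4;
      }
    }
  }
  return dst;
}
```

The averaged buffer could then be drawn with the standard canvas `ImageData` API. This trades extra GPU work and memory for smoother edges, and it does not replace proper filtering inside the projection itself.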

@calebbergman

@WorldThirteen
Ok. Well I'll be on the lookout for the anti-aliasing strategy implementation then ;) Thanks so much!


3 participants