
pgconn write buffer size as config option? #137

Open

quantics16 opened this issue Dec 1, 2023 · 7 comments

quantics16 commented Dec 1, 2023

I'm looking into reducing, or at least better understanding, the memory footprint of a program of mine that uses pgx. The profile points to https://github.com/jackc/pgproto3/blob/master/bind.go#L134 as the main source of memory allocations, and it looks like that code uses the wbuf in a pgconn to do the writing (see https://github.com/jackc/pgconn/blob/master/pgconn.go#L1110).

What I can't figure out is what the wbuf is for. I was looking at e948dc3, where it was added, and I see it being appended into, but I don't see where it gets reset. I would have expected to see a

wbuf = wbuf[:0]

somewhere to 'reuse' the write buffer. What am I missing?

Edit: Turns out I needed a rubber duck. I missed that wbuf was never being assigned the value of buf so it'd always be an empty slice.
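For reference, the reuse pattern I was expecting looks like this in Go (a minimal standalone sketch; `appendMsg` is a made-up stand-in for the protocol writer, not pgconn's actual code):

```go
package main

import "fmt"

// appendMsg simulates writing a protocol message into a reusable buffer.
// Truncating with buf[:0] keeps the backing array, so repeated writes
// that fit within the existing capacity do not allocate.
func appendMsg(buf, msg []byte) []byte {
	buf = buf[:0] // reset length, keep capacity
	buf = append(buf, msg...)
	return buf
}

func main() {
	buf := make([]byte, 0, 1024)
	msg := []byte("Bind message payload")
	for i := 0; i < 3; i++ {
		buf = appendMsg(buf, msg)
	}
	fmt.Println(len(buf), cap(buf)) // capacity stays 1024 across iterations
}
```

The point is that `append` only reallocates when the message exceeds the buffer's capacity, which is why a buffer that is never reset (or never large enough) ends up allocating on every write.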

In this case, my query strings are almost always larger than 1024 bytes, so there is still significant allocation overhead. This seems like something that could be promoted into a configuration variable?

@quantics16 quantics16 changed the title What is the write buffer for in pgconn? pgconn write buffer size as config option? Dec 1, 2023

jackc commented Dec 2, 2023

First, please ensure you are working on the correct repo. This repo is for the pgconn used with pgx v4. It was merged into the pgx repo for v5.

As far as making the write buffer size configurable, I'm not totally opposed to it, but my first inclination would be to avoid low-level configuration like that. It should just work (like how the Go GC has extremely few settings to tune). I would first be interested in seeing benchmarks that change meaningfully with a larger buffer size. My assumption is that the write buffer is usually not a significant factor (though obviously I could be wrong here).

If it can be shown to be an issue, then either automatically adjusting the write buffer size or using a buffer pool like the internal iobufpool package might solve the problem without exposing a new setting.
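For illustration, the buffer-pool idea could look roughly like this (a minimal sketch built on the standard library's sync.Pool; this is not the actual iobufpool implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable write buffers so that large queries do not
// force a fresh allocation on every round trip. Buffers that grew while
// in use keep their larger capacity when returned to the pool.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 0, 1024) },
}

// getBuf retrieves a buffer from the pool, reset to zero length.
func getBuf() []byte { return bufPool.Get().([]byte)[:0] }

// putBuf returns a buffer to the pool for later reuse.
func putBuf(b []byte) { bufPool.Put(b) }

func main() {
	b := getBuf()
	b = append(b, "insert into t (a) values ($1)"...)
	fmt.Println(len(b), cap(b) >= 1024)
	putBuf(b)
}
```

A real implementation would likely bucket buffers by size (as iobufpool does) so one huge query doesn't permanently inflate every pooled buffer.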


quantics16 commented Dec 3, 2023

Thanks, I've confirmed my team is still using v4. I've made a note to switch to v5 later, but the code in question appears to be roughly the same between versions, so I ran my benchmarks against v4.

We're doing bulk inserts using the

insert into table (col, col, ...) values (val, val, ...), ...

pattern, and the overall query can get fairly large. The one I tested with was approximately 16MB. (The typical size would usually be lower than this; I was looking for a worst case rather than an average case.)
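For context, the statement we generate is built roughly like this (a simplified sketch; `buildMultiInsert` and the table/column names are illustrative, not our production code):

```go
package main

import (
	"fmt"
	"strings"
)

// buildMultiInsert builds an "insert ... values ($1,$2), ($3,$4), ..."
// statement for nRows rows of the given columns, using numbered
// placeholders so values are still passed as bind parameters.
func buildMultiInsert(table string, cols []string, nRows int) string {
	var sb strings.Builder
	sb.WriteString("insert into " + table + " (" + strings.Join(cols, ", ") + ") values ")
	p := 1
	for r := 0; r < nRows; r++ {
		if r > 0 {
			sb.WriteString(", ")
		}
		sb.WriteByte('(')
		for c := range cols {
			if c > 0 {
				sb.WriteByte(',')
			}
			fmt.Fprintf(&sb, "$%d", p)
			p++
		}
		sb.WriteByte(')')
	}
	return sb.String()
}

func main() {
	fmt.Println(buildMultiInsert("t", []string{"a", "b"}, 2))
	// → insert into t (a, b) values ($1,$2), ($3,$4)
}
```

With thousands of rows per statement, the SQL text plus the Bind message for all the parameters is what pushes the write past the 1024-byte buffer.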

I documented the benchmarks here: https://github.com/quantics16/wbufbenchmark

Preview:

BenchmarkPgConnWbufDirect1KB-10     	      27	  37558035 ns/op	88242940 B/op	      36 allocs/op
BenchmarkPgConnWbufDirect1MB-10     	      32	  38572945 ns/op	91922612 B/op	      14 allocs/op
BenchmarkPgConnWbufDirect20MB-10    	    1186	    930651 ns/op	       0 B/op	       0 allocs/op
BenchmarkPgConnWBufViaPostgres1KB-10    	       1	1035001708 ns/op	107349512 B/op	    5079 allocs/op
BenchmarkPgConnWBufViaPostgres1MB-10    	       1	1081868042 ns/op	110988280 B/op	    5057 allocs/op
BenchmarkPgConnWBufViaPostgres20MB-10    	       2	 745823666 ns/op	19029052 B/op	    5032 allocs/op

Let me know if you need more details


jackc commented Dec 5, 2023

Have you considered using CopyFrom? Bulk inserts can have significantly higher performance through the copy protocol.

@quantics16

I have considered it, yes. We have loads like the example above, and ones that are much smaller. I figured that with CopyFrom, since we'd have to deal with temporary tables and the like (we have to handle updates, not just inserts), the overhead meant it was best suited to cases with more data than we currently handle. For instance, I presume the bulk insert approach above beats CopyFrom if we only write 20 rows instead of 1000.

We've not yet looked into when to switch to CopyFrom, but we thought this approach would work for the majority of cases for now.


jackc commented Dec 9, 2023

Actually, CopyFrom performs much better than you might think.

From the pgx benchmarks (v5 but I think v4 is similar):

BenchmarkWrite2RowsViaInsert-12                                      	    9584	    113651 ns/op	     857 B/op	      25 allocs/op
BenchmarkWrite2RowsViaMultiInsert-12                                 	   12120	    100913 ns/op	    2693 B/op	      35 allocs/op
BenchmarkWrite2RowsViaBatchInsert-12                                 	   18582	     64046 ns/op	     848 B/op	      23 allocs/op
BenchmarkWrite2RowsViaCopy-12                                        	   18187	     65571 ns/op	    3182 B/op	      50 allocs/op

Even at 2 rows CopyFrom is faster than a multi-insert statement. Obviously, your specific case may be different, but in almost all cases CopyFrom will have the best performance.

@quantics16

Ah interesting - I'll make a note to test this in the new year. Thanks!

It does appear that CopyFrom would still have the allocation issue I have above - https://github.com/jackc/pgx/blob/master/copy_from.go#L168C3-L168C3. Would it still be reasonable to consider a change to that write buffer?


jackc commented Dec 16, 2023

It does appear that CopyFrom would still have the allocation issue I have above - https://github.com/jackc/pgx/blob/master/copy_from.go#L168C3-L168C3. Would it still be reasonable to consider a change to that write buffer?

🤷‍♂️

My current assumption is that the improvement wouldn't be worth the (small) increase in the public pgx interface, but I guess if it could be shown to be a big enough issue we could do it. (Though ideally it would be something where pgx could adaptively resize the buffer rather than expose a setting to the user.)
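To sketch what adaptive resizing could mean (`shrinkAfterWrite` is a hypothetical illustration of the general idea, not anything pgx does today):

```go
package main

import "fmt"

// shrinkAfterWrite returns the buffer to hand to the next write. If the
// buffer's capacity is far larger than what the last write needed, it is
// replaced with a smaller one, so a single huge query does not pin a
// large allocation forever. A sketch of adaptive resizing only.
func shrinkAfterWrite(buf []byte, used int) []byte {
	const minCap = 1024
	if cap(buf) > minCap && used < cap(buf)/4 {
		newCap := cap(buf) / 2
		if newCap < minCap {
			newCap = minCap
		}
		return make([]byte, 0, newCap)
	}
	return buf[:0] // capacity is a reasonable fit; keep reusing it
}

func main() {
	buf := make([]byte, 0, 16<<20) // 16MB buffer left over from a bulk insert
	buf = shrinkAfterWrite(buf, 512)
	fmt.Println(cap(buf)) // → 8388608 (halved, since only 512 bytes were used)
}
```

The appeal of something like this is that occasional large queries grow the buffer transiently, steady workloads keep a right-sized buffer, and no new public setting is needed.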
