diff --git a/app/(blog)/blog/a-tale-of-two-copies/07035c56befc8dc1.png b/app/(blog)/blog/a-tale-of-two-copies/07035c56befc8dc1.png new file mode 100644 index 000000000..6cb83ccd4 Binary files /dev/null and b/app/(blog)/blog/a-tale-of-two-copies/07035c56befc8dc1.png differ diff --git a/app/(blog)/blog/a-tale-of-two-copies/51fb41f548601419.png b/app/(blog)/blog/a-tale-of-two-copies/51fb41f548601419.png new file mode 100644 index 000000000..7989fad7b Binary files /dev/null and b/app/(blog)/blog/a-tale-of-two-copies/51fb41f548601419.png differ diff --git a/app/(blog)/blog/a-tale-of-two-copies/9b285294e5712e47.jpeg b/app/(blog)/blog/a-tale-of-two-copies/9b285294e5712e47.jpeg new file mode 100644 index 000000000..9a7575c2d Binary files /dev/null and b/app/(blog)/blog/a-tale-of-two-copies/9b285294e5712e47.jpeg differ diff --git a/app/(blog)/blog/a-tale-of-two-copies/a5d148aee871d9eb.png b/app/(blog)/blog/a-tale-of-two-copies/a5d148aee871d9eb.png new file mode 100644 index 000000000..0d3269dc0 Binary files /dev/null and b/app/(blog)/blog/a-tale-of-two-copies/a5d148aee871d9eb.png differ diff --git a/app/(blog)/blog/a-tale-of-two-copies/eb4b808e55daa547.png b/app/(blog)/blog/a-tale-of-two-copies/eb4b808e55daa547.png new file mode 100644 index 000000000..e85167d50 Binary files /dev/null and b/app/(blog)/blog/a-tale-of-two-copies/eb4b808e55daa547.png differ diff --git a/app/(blog)/blog/a-tale-of-two-copies/page.md b/app/(blog)/blog/a-tale-of-two-copies/page.md new file mode 100644 index 000000000..91defb521 --- /dev/null +++ b/app/(blog)/blog/a-tale-of-two-copies/page.md @@ -0,0 +1,482 @@ +--- +author: + name: Jeff Wendling +date: '2021-08-10 00:00:00' +heroimage: ./a5d148aee871d9eb.png +layout: blog +metadata: + description: It was the best of times, it was the worst of times. That's when I + hit a performance mystery that sent me down a multi-day rabbit hole of adventure. + I was writing some code to take some entries, append them into a fixed size in-memory + buffer, and then flush that buffer to disk when it was full. + title: A Tale of Two Copies +title: A Tale of Two Copies + +--- + +It was the best of times, it was the worst of times. That's when I hit a performance mystery that sent me down a multi-day rabbit hole of adventure. I was writing some code to take some entries, append them into a fixed size in-memory buffer, and then flush that buffer to disk when it was full. The main bit of code looked a little something like this: + + +```go +type Buffer struct { + fh *os.File + n uint + buf [numEntries]Entry +} + +func (b *Buffer) Append(ent Entry) error { + if b.n < numEntries-1 { + b.buf[b.n] = ent + b.n++ + return nil + } + return b.appendSlow(ent) +} + +``` + + +with the idea being that when there's space in the buffer, we just insert the entry and increment a counter, and when we're full, it falls back to the slower path that writes to disk. Easy, right? Easy... + +## The Benchmark + +I had a question about what size the entries should be. The minimum size I could pack them into was 28 bytes, but that's not a nice power of 2 for alignment and stuff, so I wanted to compare it to 32 bytes. Rather than just relying on my intuition, I decided to write a benchmark. The benchmark would Append a fixed number of entries per iteration (100,000) and the only thing changing would be if the entry size was 28 or 32 bytes. + +Even if I'm not relying on my intuition, I find it fun and useful to try to predict what will happen anyway. 
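The exact fields don't matter for this story, but to make the sizes concrete, you can picture the two variants as something like this (stand-in structs of mine, not the real Entry):

```go
// Stand-ins for the two entry layouts; only the sizes matter here.
// unsafe.Sizeof(Entry28{}) == 28 and unsafe.Sizeof(Entry32{}) == 32.
type Entry28 struct{ v [7]uint32 }
type Entry32 struct{ v [8]uint32 }
```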
And so, I thought to myself:


> Everyone knows that I/O usually dominates over small potential CPU inefficiencies. The 28 byte version writes less data and does fewer flushes to disk than the 32 byte version. Even if it's somehow slower filling the memory buffer, which I doubt, that will be more than made up for by the extra writes the 32 byte version has to do.

Maybe you thought something similar, or maybe something completely different. Or maybe you didn't sign up to do thinking right now and just want me to get on with it. And so, I ran the following benchmark:


```go
func BenchmarkBuffer(b *testing.B) {
	fh := tempFile(b)
	defer fh.Close()

	buf := &Buffer{fh: fh}
	now := time.Now()
	ent := Entry{}

	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		fh.Seek(0, io.SeekStart)

		for i := 0; i < 1e5; i++ {
			_ = buf.Append(ent)
		}
		_ = buf.Flush()
	}

	b.ReportMetric(float64(time.Since(now).Nanoseconds())/float64(b.N)/1e5, "ns/key")
	b.ReportMetric(float64(buf.flushes)/float64(b.N), "flushes")
}

```


## Confusion

And here are the results:


```
BenchmarkBuffer/28 734286 ns/op 171.0 flushes 7.343 ns/key
BenchmarkBuffer/32 436220 ns/op 196.0 flushes 4.362 ns/key

```


That's right, a nearly 2x difference in performance where the benchmark writing to disk MORE is FASTER!

![](./51fb41f548601419.png)

*Me too, Nick Young. Me, too.*

And so began my journey. The following is my best effort at remembering the long, strange trip I took diagnosing what I thought was happening. Spoiler alert: I was wrong a lot, and for a long time.

# The Journey

## CPU Profiles

CPU profiles have a huge power-to-weight ratio. To collect them from a Go benchmark, all you have to do is specify `-cpuprofile=` on the command line and that's it. So of course this is the first thing I reached for.

One thing to keep in mind, though, is that Go benchmarks by default will try to run for a fixed amount of time, and if one benchmark takes longer to do its job vs another, you get fewer iterations of it. Since I wanted to compare the results more directly, I made sure to also pass a fixed number of iterations to the command with `-benchtime=2000x`.

So let's take a look at these profiles. First, the 32 byte version:


```
         .          .     24:func (b *Buffer) Append(ent Entry) error {
      30ms       30ms     25:	if b.n < numEntries-1 {
     110ms      110ms     26:		b.buf[b.n] = ent
      90ms       90ms     27:		b.n++
         .          .     28:		return nil
         .          .     29:	}
      10ms      520ms     30:	return b.appendSlow(ent)
         .          .     31:}

```


The first column shows the amount of time spent on that line just in the context of the shown function, and the second column is the amount of time spent on that line including any functions it may have called.

From that, we can see that, as expected, most of the time is spent flushing to disk in appendSlow compared to writing to the in-memory buffer.

And now here's the 28 byte version:


```
         .          .     24:func (b *Buffer) Append(ent Entry) error {
      20ms       20ms     25:	if b.n < numEntries-1 {
     840ms      840ms     26:		b.buf[b.n] = ent
      20ms       20ms     27:		b.n++
         .          .     28:		return nil
         .          .     29:	}
         .      470ms     30:	return b.appendSlow(ent)
         .          .     31:}

```


A couple of things stand out to me here. First of all, WHAT? Second of all, it spends less time flushing to disk compared to the 32 byte version. That's at least expected because it does that less often (171 vs 196 times). And finally, WHAT?

Maybe the penalty for writing unaligned memory was worse than I thought.
Let's take a look at the assembly to see what instruction it's stuck on.

## The Assembly

Here's the section of code responsible for the 840ms on line 26 in the above profile:


```
         .          .     515129: IMULQ $0x1c, CX, CX       (1)
      90ms       90ms     51512d: LEAQ 0xc0(SP)(CX*1), CX   (2)
         .          .     515135: MOVUPS 0x7c(SP), X0       (3)
     670ms      670ms     51513a: MOVUPS X0, 0(CX)          (4)
      80ms       80ms     51513d: MOVUPS 0x88(SP), X0       (5)
         .          .     515145: MOVUPS X0, 0xc(CX)        (6)

```


If you've never read assembly before, this may be a bit daunting, so I've numbered the lines and will provide a brief explanation. The most important bits to know are that `CX`, `SP` and `X0` are registers, and the syntax `0x18(CX)` means the value at address `CX + 0x18`. Armed with that knowledge, we can understand the lines:

1. Multiply the `CX` register by `0x1c` and store it into `CX`. `0x1c` is the hex encoding of the decimal value 28.
2. This computes the address we'll be storing the entry into. It computes `0xc0 + SP + (CX*1)` and stores it into `CX`. From this, we deduce that the start of the entry array is at `0xc0(SP)`.
3. This loads 16 bytes starting at `0x7c(SP)` and stores it into `X0`.
4. This stores the 16 bytes we just loaded into `0(CX)`.
5. This loads 16 bytes starting at `0x88(SP)` and stores it into `X0`.
6. This stores the 16 bytes we just loaded into `0xc(CX)`.

I don't know about you, but I saw no reason why line 4 should have so much weight compared to the other lines. So, I compared it to the 32 byte version to see if the generated code was different:


```
      40ms       40ms     515129: SHLQ $0x5, CX
      10ms       10ms     51512d: LEAQ 0xc8(SP)(CX*1), CX
         .          .     515135: MOVUPS 0x80(SP), X0
      10ms       10ms     51513d: MOVUPS X0, 0(CX)
      40ms       40ms     515140: MOVUPS 0x90(SP), X0
      10ms       10ms     515148: MOVUPS X0, 0x10(CX)

```


It looks like the only difference, aside from almost no time at all being spent in these instructions, is the SHLQ vs the IMULQ. The former does a "left shift" of 5, which effectively multiplies by 2 to the 5th power, or 32, and the latter, as we previously saw, multiplies by 28. Could this possibly be the performance difference?

## Pipelines and Ports

Modern CPUs are complex beasts. Maybe you have the mental model that your CPU reads instructions in and executes them one at a time as I once did. That couldn't be further from the truth. Instead, they execute multiple instructions at once, possibly out of order, in a [pipeline](https://en.wikipedia.org/wiki/Instruction_pipelining). But it gets even better: they have limits on how many of each kind of instruction can be run simultaneously. This is done by the CPU having multiple "ports", and certain instructions require and can run on different subsets of these ports.

So what does that have to do with IMULQ vs SHLQ? Well, you may have noticed that the LEAQ following the IMULQ/SHLQ has a multiply in it (`CX*1`). But, because there aren't infinite ports, there must be a limited number of ports able to do multiplies.

The LLVM project has lots of tools to help you understand what computers do, and one of them is a tool called [`llvm-mca`](https://www.llvm.org/docs/CommandGuide/llvm-mca.html#how-llvm-mca-works).
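You feed it a snippet of assembly and tell it which CPU to model; roughly something like this (a sketch from memory, so treat the exact flags as approximate):

```
$ cat > snippet.s << EOF
imulq $28, %rcx, %rcx
leaq 192(%rsp,%rcx), %rcx
EOF
$ llvm-mca -mcpu=skylake -iterations=100 snippet.s
```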
Indeed, if we run the first two instructions of the 32 and 28 byte versions through `llvm-mca`, it gives us an idea of what ports will be used when they are executed:


```
Resource pressure by instruction (32 byte version):
[2]    [3]    [7]    [8]    Instructions:
0.50    -      -     0.50   shlq $5, %rcx
 -     0.50   0.50    -     leaq 200(%rsp,%rcx), %rcx

Resource pressure by instruction (28 byte version):
[2]    [3]    [7]    [8]    Instructions:
 -     1.00    -      -     imulq $28, %rcx, %rcx
 -      -     1.00    -     leaq 192(%rsp,%rcx), %rcx

```


The numbers are the percentage of the time each instruction ran on the given port (here, numbered 2, 3, 7 and 8) when executed in a loop.

So that's saying that in the 32 byte version, the SHLQ ran on port 2 half the time and port 8 the other half, and the LEAQ ran on port 3 half the time and port 7 the other half. This implies that the pair can have 2 parallel executions at once. For example, on one iteration, it can use ports 2 and 3, and on the next iteration it can use ports 7 and 8, even if ports 2 and 3 are still being used. However, for the 28 byte version, the IMULQ must happen solely on port 3 due to the way the processor is built, which in turn limits the maximum throughput.

And for a while, this is what I thought was happening. In fact, a first draft of this very blog post had that as the conclusion, but the more I thought about it, the less convincing an explanation it seemed.

![](./eb4b808e55daa547.png)

*My first draft attempt*

## Trouble in Paradise

Here are some thoughts that you may be having:

1. In the worst case, that can only be a 2x speed difference.
2. Aren't there other instructions in the loop? That has to make it much less than 2x in practice.
3. The 32 byte version spends 230ms in the memory section and the 28 byte version spends 880ms.
4. That is much bigger than 2x.
5. Oh no.

Well, maybe that last one was just me. With those doubts firmly in my mind, I tried to figure out how I could test whether it really was because of the IMULQ and SHLQ. Enter `perf`.

## Perf

[`perf`](https://perf.wiki.kernel.org/index.php/Main_Page) is a tool that runs on Linux and allows you to execute programs and expose some detailed counters that CPUs keep about how they executed instructions (and more!). Now, I had no idea if there was a counter that would let me see something like "the pipeline stalled because of insufficient ports or whatever", but I did know that it had counters for, like, everything.

If this were a movie, this would be the part where the main character is shown trudging through a barren desert, sun blazing, heat rising from the earth, with no end in sight. They'd see a mirage oasis and jump in, gulping down water, and suddenly realize it was sand.

A quick estimate shows that perf knows how to read over 700 different counters on my machine, and I feel like I looked at most of them. Take a look at [this huge table](https://perfmon-events.intel.com/skylake.html) if you're interested. I couldn't find any counters that seemed to explain the large difference in speed, and I was starting to get desperate.

![](./9b285294e5712e47.jpeg)

*A picture of me wading through all of the perf counters*

## Binary Editing for Fun and Profit

At this point, I had no idea what the problem was, but it sure seemed like it wasn't port contention like I thought. One of the only other things that I thought it could be was alignment.
CPUs tend to like to have memory accessed at nice multiples of powers of 2, and 28 is not one of those, so I wanted to change the benchmark to write 28 byte entries but at 32 byte offsets.

Unfortunately, this wasn't as easy as I hoped. The code under test is very delicately balanced with respect to the Go compiler's inliner. Basically any change to Append pushes it over the threshold and stops it from being inlined, which really changes what's being executed.

Enter binary patching. It turns out that in our case, the IMULQ instruction encodes to the same number of bytes as the SHLQ. Indeed, the IMULQ encodes as `486bc91c`, and the SHLQ as `48c1e105`. So it's just a simple matter of replacing those bytes and running the benchmark. I'll (for once) spare you the details of how I edited it (OK, I lied: I hackily used `dd`). The results sure did surprise me:


```
BenchmarkBuffer/28@32 813529 ns/op 171.0 flushes 8.135 ns/key

```


I saw the results and felt defeated. It wasn't the IMULQ making the benchmark go slow: that benchmark has no IMULQ in it. It wasn't due to unaligned writes: the slowest instruction was writing to the same alignment as in the 32 byte version, as we can see from the profiled assembly:


```
         .          .     515129: SHLQ $0x5, CX
      60ms       60ms     51512d: LEAQ 0xc0(SP)(CX*1), CX
         .          .     515135: MOVUPS 0x7c(SP), X0
     850ms      850ms     51513a: MOVUPS X0, 0(CX)
     120ms      120ms     51513d: MOVUPS 0x88(SP), X0
         .          .     515145: MOVUPS X0, 0xc(CX)

```


What was left to try?

# A Small Change

Sometimes when I have no idea why something is slow, I try writing the same code but in a different way. That may tickle the compiler just right, causing it to change which optimizations it can or can't apply and giving some clues as to what's going on. So in that spirit, I changed the benchmark to this:


```go
func BenchmarkBuffer(b *testing.B) {
	// ... setup code

	for i := 0; i < b.N; i++ {
		fh.Seek(0, io.SeekStart)

		for i := 0; i < 1e5; i++ {
			_ = buf.Append(Entry{})
		}
		_ = buf.Flush()
	}

	// ... teardown code
}

```


It's hard to spot the difference, but the code changed to passing a fresh entry value every time instead of passing the `ent` variable manually hoisted out of the loop. I ran the benchmarks again.


```
BenchmarkBuffer/28 407500 ns/op 171.0 flushes 4.075 ns/key
BenchmarkBuffer/32 446158 ns/op 196.0 flushes 4.462 ns/key

```


IT DID SOMETHING? How could that change possibly cause that performance difference? It's finally running faster than the 32 byte version! As usual, time to look at the assembly.


```
      50ms       50ms     515109: IMULQ $0x1c, CX, CX
         .          .     51510d: LEAQ 0xa8(SP)(CX*1), CX
         .          .     515115: MOVUPS X0, 0(CX)
     130ms      130ms     515118: MOVUPS X0, 0xc(CX)

```


It's no longer loading the value from the stack to store it into the array; instead, it's storing directly into the array from the already zeroed register. But we know from all the pipeline analysis done earlier that the extra loads should effectively be free, and the 32 byte version confirms that: it didn't get any faster even though it also no longer loads from the stack.

So what's going on?

## Overlapping Writes

In order to explain this idea, it's important to show the assembly of the full inner loop instead of just the code that writes the entry to the in-memory buffer.
Here's a cleaned up and annotated version of the slow 28 byte benchmark inner loop:


```
loop:
	INCQ AX                  (1)
	CMPQ $0x186a0, AX
	JGE exit

	MOVUPS 0x60(SP), X0      (2)
	MOVUPS X0, 0x7c(SP)
	MOVUPS 0x6c(SP), X0
	MOVUPS X0, 0x88(SP)

	MOVQ 0xb8(SP), CX        (3)
	CMPQ $0x248, CX
	JAE slow

	IMULQ $0x1c, CX, CX      (4)
	LEAQ 0xc0(SP)(CX*1), CX
	MOVUPS 0x7c(SP), X0      (5)
	MOVUPS X0, 0(CX)
	MOVUPS 0x88(SP), X0
	MOVUPS X0, 0xc(CX)

	INCQ 0xb8(SP)            (6)
	JMP loop

slow:
	// ... slow path goes here ...

exit:

```


1. Increment `AX` and compare it to 100,000, exiting if it's larger.
2. Copy 28 bytes on the stack from offsets `[0x60, 0x7c]` to offsets `[0x7c, 0x98]`.
3. Load the memory counter and see if we have room in the in-memory buffer.
4. Compute where the entry will be written in the in-memory buffer.
5. Copy the 28 bytes on the stack at offsets `[0x7c, 0x98]` into the in-memory buffer.
6. Increment the memory counter and loop again.

Steps 4 and 5 are what we've been looking at up to now.

If step 2 seems silly and redundant, that's because it is. There's no reason to copy a value on the stack to another location on the stack and then load from that copy into the in-memory buffer. Step 5 could have just used offsets `[0x60, 0x7c]` instead, and step 2 could have been eliminated. The Go compiler could be doing a better job here.

But that shouldn't be why it's slow, right? The 32 byte code does almost the exact same silly thing, and it goes fast, because of pipelines or pixie dust or something. What gives?

There's one crucial difference: the writes in the 28 byte case overlap. The MOVUPS instruction writes 16 bytes at a time, and as everyone knows, 16 + 16 is usually more than 28. So step 2 writes to bytes `[0x7c, 0x8c]` and then writes to bytes `[0x88, 0x98]`. This means the range `[0x88, 0x8c]` was written to twice. Here's a helpful ASCII diagram:


```
0x7c             0x8c
├────────────────┤
│ Write 1 (16b)  │
└───────────┬────┴──────────┐
            │ Write 2 (16b) │
            ├───────────────┤
            0x88            0x98

```


## Store Forwarding

Remember how CPUs are complex beasts? Well, it gets even better. An optimization that some CPUs do is to have something called a ["write buffer"](https://en.wikipedia.org/wiki/Write_buffer). You see, memory access is often the slowest part of what CPUs do. Instead of, you know, actually writing the memory when the instruction executes, CPUs place the writes into a buffer first. I think the idea is to coalesce a bunch of small writes into larger sizes before flushing out to the slower memory subsystem. Sound familiar?

So now it has this write buffer buffering all of the writes. What happens if a read comes in for one of those writes? It would slow everything down if it had to wait for that write to actually happen before reading it back out, so instead the CPU tries to service the read from the write buffer directly if possible, and no one is the wiser. You clever little CPU. This optimization is called [store forwarding](https://easyperf.net/blog/2018/03/09/Store-forwarding).

![](./07035c56befc8dc1.png)

*My CPU buffering and reorganizing all of the writes*

But what if those writes overlap? It turns out that, on my CPU at least, this inhibits that "store forwarding" optimization. There's even a perf counter that keeps track of when this happens: [ld\_blocks.store\_forward](https://perfmon-events.intel.com/index.html?pltfrm=skylake.html&evnt=LD_BLOCKS.STORE_FORWARD).
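Reading it for a Go benchmark looks roughly like this (the binary name and flag values here are just an example, not the exact invocation I used):

```
$ go test -c -o buffer.test
$ perf stat -e ld_blocks.store_forward \
    ./buffer.test -test.bench=Buffer -test.benchtime=2000x
```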
Indeed, the documentation about that counter says:


> Counts the number of times where store forwarding was prevented for a load operation. The most common case is a load blocked due to the address of memory access (partially) overlapping with a preceding uncompleted store.

Here's how often that counter hits for the different benchmarks so far, where "Slow" means that the entry is constructed outside of the loop, and "Fast" means that the entry is constructed inside of the loop on every iteration:


```
BenchmarkBuffer/28-Slow 7.292 ns/key 1,006,025,599 ld_blocks.store_forward
BenchmarkBuffer/32-Slow 4.394 ns/key     1,973,930 ld_blocks.store_forward
BenchmarkBuffer/28-Fast 4.078 ns/key     4,433,624 ld_blocks.store_forward
BenchmarkBuffer/32-Fast 4.369 ns/key     1,974,915 ld_blocks.store_forward

```


Well, a billion is usually bigger than a million. Break out the champagne.

# Conclusion

After all of that, I have a couple of thoughts.

Benchmarking is hard. People often say this, but maybe the only thing harder than benchmarking is adequately conveying how hard benchmarking is. Like, this was closer to the micro-benchmark than macro-benchmark side of things, but it still included performing millions of operations, including disk flushes, and it measured a real effect. At the same time, this would almost never be a problem in practice: it required the compiler to unnecessarily spill a constant value to the stack, very close to the subsequent read, in a tight inner loop. Doing any amount of real work to create the entries would cause this effect to vanish.

A recurring theme as I learn more about how CPUs work is that the closer you get to the "core" of what a CPU does, the leakier and more full of edge cases and hazards it becomes. Store forwarding not working when there's a partially overlapping write is one example. Another is that the caches aren't [fully associative](https://en.wikipedia.org/wiki/CPU_cache#Associativity), so you can only have so many things cached based on their memory address. Like, even if you have 1000 slots available, if all your memory accesses are multiples of some factor, they may not be able to use those slots. [This blog post](https://danluu.com/3c-conflict/) has a great discussion. Totally speculating, but maybe this is because you have less "room" to solve those edge cases when under ever tighter physical constraints.

Before now, I've never been able to concretely observe the CPU slowing down from port exhaustion issues in an actual non-contrived setting. I still haven't. I've heard the adage that you can imagine every CPU instruction taking 0 cycles except for the ones that touch memory. As a first approximation, it seems pretty true.

I've put up the full code sample in [a gist](https://gist.github.com/zeebo/4c9e28ac277c74ae450ad1bff8068f93) for your viewing/downloading/running/inspecting pleasure.

Often, things are more about the journey than the destination, and I think that's true here, too. If you made it this far, thanks for coming along on the journey with me, and I hope you enjoyed it. Until next time.
+ +‍ + diff --git a/app/(blog)/blog/automatically-store-your-tesla-sentry-mode-and-dashcam-videos-on-the-decentralized-cloud/94ca9aa0a874a87b.png b/app/(blog)/blog/automatically-store-your-tesla-sentry-mode-and-dashcam-videos-on-the-decentralized-cloud/94ca9aa0a874a87b.png new file mode 100644 index 000000000..3d8555684 Binary files /dev/null and b/app/(blog)/blog/automatically-store-your-tesla-sentry-mode-and-dashcam-videos-on-the-decentralized-cloud/94ca9aa0a874a87b.png differ diff --git a/app/(blog)/blog/automatically-store-your-tesla-sentry-mode-and-dashcam-videos-on-the-decentralized-cloud/page.md b/app/(blog)/blog/automatically-store-your-tesla-sentry-mode-and-dashcam-videos-on-the-decentralized-cloud/page.md new file mode 100644 index 000000000..adccc5018 --- /dev/null +++ b/app/(blog)/blog/automatically-store-your-tesla-sentry-mode-and-dashcam-videos-on-the-decentralized-cloud/page.md @@ -0,0 +1,100 @@ +--- +author: + name: Krista Spriggs +date: '2021-06-15 00:00:00' +heroimage: ./94ca9aa0a874a87b.png +layout: blog +metadata: + description: You can automatically transfer Sentry Mode and Dashcam video clips + over WiFi to cloud storage and make room for more videos the next day. We used + a Raspberry Pi (a small, low cost, low power computer about the size of an Altoids + tin) plugged into the USB port in the dashboard to store the video files. When + the Tesla pulls into the garage at night, the Raspberry Pi connects via WiFi and + uploads all the videos to Storj DCS cloud storage, then clears off the drive for + use the next day. This will also work for videos recorded in Track Mode if you + have one of the performance models, making it easy to share any of the videos + with your friends. + title: Automatically Store Your Tesla Sentry Mode and Dashcam Videos on the Decentralized + Cloud +title: Automatically Store Your Tesla Sentry Mode and Dashcam Videos on the Decentralized + Cloud + +--- + +I have a 2019 Tesla M3 and I love the built in features to capture Dashcam and Sentry Mode footage for review later. The Sentry Mode feature captures 10 minutes of video when someone or something approaches your parked vehicle. Dashcam captures video when you're driving. It continuously stores the most recent hour of video and will also save the most recent 10 minutes of video when you press the Dashcam button or when you honk your horn. Where do those videos get saved? As it turns out, they get saved to a flash drive, if you have one plugged into one of the USB ports in the front console. + + +When you get home, you have to pull the flash drive out of the car and copy the video files to your computer to watch them or store them long term. It’s a fairly low-tech way to manage your data. If you don’t free up space on the flash drive, Sentry Mode will eventually fill up the storage device and your Dashcam feature will stop saving video. What if there was an easy way to save all your Sentry Mode and Dashcam videos automatically when you pull into your garage, and ensure you always have space for new videos? As it turns out, [this is a solved problem](https://github.com/marcone/teslausb)\* if you just want to copy the videos to a computer on your home network. We’ve taken this open source project maintained by GitHub user marcone and created by Reddit user drfrank, and connected it to store the videos on Storj DCS, a decentralized cloud storage service that is secure, private, and extremely affordable.  
### TL;DR

You can automatically transfer Sentry Mode and Dashcam video clips over WiFi to cloud storage and make room for more videos the next day. We used a Raspberry Pi (a small, low cost, low power computer about the size of an Altoids tin) plugged into the USB port in the dashboard to store the video files. When the Tesla pulls into the garage at night, the Raspberry Pi connects via WiFi and uploads all the videos to Storj DCS cloud storage, then clears off the drive for use the next day. This will also work for videos recorded in Track Mode if you have one of the performance models, making it easy to share any of the videos with your friends.


### So, how hard is it?

OK, most Tesla owners tend to be pretty technical, so if that describes you, this is a piece of cake... sorry, Pi. Here's what you'll need:

* [Raspberry Pi Zero W : ID 3400 : $10.00](https://www.adafruit.com/product/3400) - we used a different model, but this is better.
* [Adafruit Raspberry Pi Zero Case : ID 3252 : $4.75](https://www.adafruit.com/product/3252) - it should look good - you can 3D print your own for extra credit.
* [Video microSDXC Card](https://www.amazon.com/SanDisk-Endurance-microSDXC-Adapter-Monitoring/dp/B07P4HBRMV) $37 - it's very important to have high quality storage with high write endurance. This gives you room for a few days in case you don't connect to WiFi and won't wear out too quickly.
* [USB A to Micro-B - 3 foot long](https://www.adafruit.com/product/592) - a USB cable to plug into the car.
* [Storj DCS cloud storage](https://www.storj.io/) - Storj provides 25 GB for free and it's only $0.004 per GB after that! Secure, private, and decentralized.
* Optional items for easier setup - a [Mini HDMI to HDMI Cable - 5 feet : ID 2775 : $5.95](https://www.adafruit.com/product/2775) will make it easier to set everything up by connecting the Pi to a monitor.

All in, you're looking at right around $60 to get going. I'll wait while you get everything ordered...


### Okay, what's next?

Assuming you have everything you need, we've published the [detailed, step-by-step instructions in this tutorial](docId:XjYoGwaE6ncc3xTICXOOu). In general, there are four main steps, which we share in a brief overview below:


* Create your Storj DCS account
* Set up your Raspberry Pi
* Enable Sentry Mode and Dashcam Mode
* Drive around and honk your horn

After that, you can just go home and park your car. Your Raspberry Pi will connect to your home WiFi and do the rest. You'll be able to view and share your videos from Storj DCS.

### Create your Storj DCS account

Storj DCS is a decentralized cloud object storage service. Storj DCS is like the Airbnb for hard drives—thousands of people with extra hard drive space share that space with Storj, then Storj makes that space available to use for storing things like Tesla videos. Every file stored on Storj DCS is encrypted, encoded for redundancy, and divided into at least 80 pieces—each stored on a different hard drive all over the world, provided by people who share storage space and bandwidth in exchange for STORJ tokens. Of those 80 pieces, you only need any 29 to download your file. That's a bit of an oversimplification, and it's even better than it sounds, but you get the idea: cloud storage for your videos. We just released some new features and pretty amazing pricing on April 20, and now you can try it.


First, go to Storj.io and create an account.
You get 25 GB of storage and bandwidth a month for free with your new account. After that, you'll need to create an Access Grant and generate the S3-compatible gateway credentials for use in the application that uploads your data to Storj DCS. Follow the steps in the tutorial and save your gateway credentials for the next step.


### Set up your Raspberry Pi

A Raspberry Pi is a mini computer. Setting one up is relatively straightforward. Once you assemble the parts, you can connect to it remotely or plug it into a monitor, keyboard, and mouse. To get it up and running, you download an OS like Raspbian, flash it to the SD card, and boot it up. From there, it's a matter of installing a few components, including Rclone, which is the application that will upload your videos to Storj DCS. Configure Rclone with the gateway credentials you created on your Storj DCS account (a rough sketch of that configuration appears at the end of this post). Once you have the Raspberry Pi working, shut it down, unplug everything, and head out to your Tesla with the Raspberry Pi and the USB cable.


### Enable Sentry Mode and Dashcam Mode

Once you're in your car, plug the Raspberry Pi into one of the USB ports in the front console. (The ones in the back don't work for this project.) The Raspberry Pi will store video files through the USB port and is also powered by it, so it needs to stay plugged in. Now, you'll need to enable Sentry Mode and Dashcam Mode. These features are not enabled by default. Follow the steps in the tutorial to enable those two features on your Tesla. Once the Raspberry Pi is plugged in and the features are enabled, you're ready to see it in action.


### Drive around and honk your horn

The easiest way to capture some video clips is to drive around and honk your horn. Of course, if you worked on this until late into the night, your neighbors may or may not be as excited to test it as you are, so honk responsibly. As an alternative, drive around and click the Dashcam button to save a clip. Really, it's up to you, but just get some video footage. All of the videos generated by Sentry Mode and Dashcam Mode will be saved to the SD card in the Raspberry Pi.


Once you've got some video, it's time for the real magic—go home. When you pull into your garage and your Raspberry Pi connects to your home WiFi, it will upload the trip's videos to Storj DCS.


As you drive around, honk your horn, capture Dashcam videos, and accumulate Sentry Mode video, clips pile up on the Raspberry Pi. Upon your return, the videos will be uploaded to your Storj DCS account. Every one of those videos will be encrypted, erasure coded, and stored in pieces distributed across our network of 13,000 Storage Nodes (and growing). You can view, download, or share those videos with your friends. We've shared a sample video from a Tesla belonging to a Storj team member. When you share a file through Storj DCS, the link lets you see all the Storage Nodes storing pieces of your file and stream the file directly from the network. The tutorial also has the steps to share a file (hint: click the share button to create a secure and private link to share).


### That's all there is to it

If you've followed along with the steps in the tutorial, your Tesla will store your Sentry and Dashcam videos in the decentralized cloud for as long as you want or need them.


Overall, this was a really fun project to put together, and it shows off yet another way that you can integrate with Storj DCS easily and quickly!
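For the curious, the Rclone setup mentioned earlier boils down to a remote plus an upload command, roughly like this. The remote name, keys, bucket, and paths below are placeholders, and the gateway endpoint may differ for your account; the tutorial has the exact steps:

```
# Create a remote backed by the Storj DCS S3-compatible gateway
# (remote name, keys, and bucket are placeholders).
rclone config create tesla s3 \
    provider=Other \
    access_key_id=YOUR_ACCESS_KEY \
    secret_access_key=YOUR_SECRET_KEY \
    endpoint=gateway.storjshare.io

# Upload the saved clips and clear space for the next day.
rclone move /mnt/TeslaCam tesla:tesla-videos
```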
+ + + +*\* The code used in this tutorial is open source and uses, among other things,* [*RClone*](https://github.com/rclone/rclone) *which includes native support for Storj DCS. The GitHub Repository for the code is available at:* [*https://github.com/marcone/teslausb*](https://github.com/marcone/teslausb) *and the project was originally described on the* [*/r/teslamotors*](https://www.reddit.com/r/teslamotors/comments/9m9gyk/build_a_smart_usb_drive_for_your_tesla_dash_cam/) *subreddit.* + + diff --git a/app/(blog)/blog/demystifying-technical-debt/09f5df2707586cc9.png b/app/(blog)/blog/demystifying-technical-debt/09f5df2707586cc9.png new file mode 100644 index 000000000..84349cdfc Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/09f5df2707586cc9.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/132d3b12bc22594e.png b/app/(blog)/blog/demystifying-technical-debt/132d3b12bc22594e.png new file mode 100644 index 000000000..6ef6d4267 Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/132d3b12bc22594e.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/4e80d123d00106b7.png b/app/(blog)/blog/demystifying-technical-debt/4e80d123d00106b7.png new file mode 100644 index 000000000..ebcb3dd8e Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/4e80d123d00106b7.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/4fc027a938f4b0a5.png b/app/(blog)/blog/demystifying-technical-debt/4fc027a938f4b0a5.png new file mode 100644 index 000000000..231063903 Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/4fc027a938f4b0a5.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/75c7c37ca0dd4c23.jpeg b/app/(blog)/blog/demystifying-technical-debt/75c7c37ca0dd4c23.jpeg new file mode 100644 index 000000000..64049a6be Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/75c7c37ca0dd4c23.jpeg differ diff --git a/app/(blog)/blog/demystifying-technical-debt/7635f7e95616d794.png b/app/(blog)/blog/demystifying-technical-debt/7635f7e95616d794.png new file mode 100644 index 000000000..9f7502f86 Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/7635f7e95616d794.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/91daf91929515361.png b/app/(blog)/blog/demystifying-technical-debt/91daf91929515361.png new file mode 100644 index 000000000..77b6714af Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/91daf91929515361.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/927230183b2dd4ce.png b/app/(blog)/blog/demystifying-technical-debt/927230183b2dd4ce.png new file mode 100644 index 000000000..b41ea58f5 Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/927230183b2dd4ce.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/928390a574b36827.png b/app/(blog)/blog/demystifying-technical-debt/928390a574b36827.png new file mode 100644 index 000000000..6fdb3f00c Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/928390a574b36827.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/adbf52a20f1b6129.png b/app/(blog)/blog/demystifying-technical-debt/adbf52a20f1b6129.png new file mode 100644 index 000000000..0d1301989 Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/adbf52a20f1b6129.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/afc5ffad22ef3dfa.png 
b/app/(blog)/blog/demystifying-technical-debt/afc5ffad22ef3dfa.png new file mode 100644 index 000000000..3f37aa8a3 Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/afc5ffad22ef3dfa.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/b06a6c08163bb49f.png b/app/(blog)/blog/demystifying-technical-debt/b06a6c08163bb49f.png new file mode 100644 index 000000000..80bc864f1 Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/b06a6c08163bb49f.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/dea5acde0e869ea5.png b/app/(blog)/blog/demystifying-technical-debt/dea5acde0e869ea5.png new file mode 100644 index 000000000..c9c3f4eef Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/dea5acde0e869ea5.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/e2acd2ec17692723.png b/app/(blog)/blog/demystifying-technical-debt/e2acd2ec17692723.png new file mode 100644 index 000000000..5cb3829fc Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/e2acd2ec17692723.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/eafa151edb719f5b.png b/app/(blog)/blog/demystifying-technical-debt/eafa151edb719f5b.png new file mode 100644 index 000000000..1aec23ace Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/eafa151edb719f5b.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/fc01c0ee8ce96871.png b/app/(blog)/blog/demystifying-technical-debt/fc01c0ee8ce96871.png new file mode 100644 index 000000000..1003d0fa3 Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/fc01c0ee8ce96871.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/fed657d8659dbc29.png b/app/(blog)/blog/demystifying-technical-debt/fed657d8659dbc29.png new file mode 100644 index 000000000..a8d7535c3 Binary files /dev/null and b/app/(blog)/blog/demystifying-technical-debt/fed657d8659dbc29.png differ diff --git a/app/(blog)/blog/demystifying-technical-debt/page.md b/app/(blog)/blog/demystifying-technical-debt/page.md new file mode 100644 index 000000000..47b7471ee --- /dev/null +++ b/app/(blog)/blog/demystifying-technical-debt/page.md @@ -0,0 +1,259 @@ +--- +author: + name: Egon Elbre +date: '2021-10-14 00:00:00' +heroimage: ./dea5acde0e869ea5.png +layout: blog +metadata: + description: "\u200D\u201CTechnical debt\u201D has been bothering me for a while.\ + \ It looks like a scary monster in the closet. It seems somehow a catchall for\ + \ different design mistakes, code worsening over time, legacy codebases, and intentional\ + \ design mistakes due to time constraints. You can take a look at the list of\ + \ caus..." + title: Demystifying Technical Debt +title: Demystifying Technical Debt + +--- + +‍ + +![](./91daf91929515361.png) + + +“Technical debt” has been bothering me for a while. It looks like a scary monster in the closet. It seems somehow a catchall for different design mistakes, code worsening over time, legacy codebases, and intentional design mistakes due to time constraints. You can take a look at the list of causes in [Wikipedia](https://en.wikipedia.org/wiki/Technical_debt#Causes) if you don’t believe me. It makes you feel like the code is collecting dust when it’s not being maintained, but clearly, that cannot be correct since the code might be unchanged. + +Let’s take this piece of code from "Software Tools" by Kernighan and Plauger. It has been unchanged since 1976. Has the technical debt risen for this code? 
When we talk about things collecting dust, the book example has a better chance of being dusty than code stored digitally.

![](./75c7c37ca0dd4c23.jpeg)


To push the metaphor to the breaking point, how do you measure technical debt, and how large is the interest? How much code would I need to write to pay off all the debt? If I have a lot of code, can I give a technical loan to other people?

![](./132d3b12bc22594e.png)But I digress; this unclear “technical debt” metaphor has caused bad decisions in codebases that don't need fixing. On the other hand, not understanding it has caused people to overlook actual problems.

Before we tackle ***technical debt***, we need to take a slight detour.




# Quality and Effort

The first problem we need to tackle is ***quality***. When we are talking about code quality, we usually have the following things in mind:

* Fit for purpose - whether and how well the code does what it is supposed to do;
* Reliability - does it break every Tuesday between 1 AM - 2 AM;
* Security - can we access, modify or break information that isn't meant for us;
* Flexibility - how well can the code accommodate new needs;
* Efficiency - how many trees we need to burn to run an operation;
* Maintainability - how many hours and lines of code do we need to modify to add, fix or remove a feature.

![](./fc01c0ee8ce96871.png)When we talk about technical debt, we are usually concerned about maintainability. There are definitely hints of the other aspects in there, but maintainability seems to be dominant.

![](./927230183b2dd4ce.png)One way to summarize ***maintainability*** is to treat it as “***effort needed to make a change***.” We can dissect this effort into several pieces or, in other words, places where we use up our energy.

The most visible part is “***effort in code modification***.” We can modify many different aspects of the code:

* types, structs, variables, methods - the usual language primitives;
* packages, modules - the grouping and organization of code;
* tests - things that verify that the code works;
* frontend, UX - how things look and how you interact with the system;
* documentation - things that describe what the code or program does;
* tooling, databases - changes to programs that the code needs to interact with;
* makefiles, build system - changes in how we run and build the code.

By no means is that list exhaustive. The less obvious part of the effort is “***effort in understanding***.” Understanding here means not only understanding the code itself, but also clarifying and modifying the things that help with understanding. We can dissect it into:

* code structure - how clear it is how things interact and how they are connected;
* mental model - how we think about the problem and how it relates to the product;
* product - how the product should work;
* business value - how the product gives value to its users.

The last major category is about people. You rarely build a product alone. Even if you are the sole coder and owner of the company, you probably still need to communicate with your users. So, there's “***effort in communication***”:

* other developers - asking for help and discussing code design;
* code reviewers - giving and getting feedback on things that can be improved;
* product owners - discussing how the product should work;
* end-users - understanding their needs and where they would get the most value.
We could dive deeper, but the main point is that ***effort*** is not one-dimensional and involves many human factors besides typing code.

# Change in effort

It's an obvious statement that this ***effort*** changes over time. The question is, how?

***Code modification effort*** roughly depends on a few factors: the amount of code, code complexity, and understanding of the code. Based on these, we can estimate that the effort to modify code usually increases because:

* features are usually added -> more code and more things to understand;
* features are rarely removed -> the amount of code doesn't decrease arbitrarily;
* the user interface grows larger -> more things that can interact, hence more complexity;
* features accumulate more cases -> which means more complexity and more code.

![](./4fc027a938f4b0a5.png)***Understanding effort*** roughly depends on the complexity of the mental model, the project, and user needs. It also depends on how well we already know the system. We can similarly estimate that it increases over time:

* a larger number of features interact -> a more complex mental model and more cases to consider;
* more business rules and concerns -> because we want to solve the user problems better;
* knowledge of code that isn't being modified is forgotten -> it's going to be harder to work with a system that you don't know;
* people come and go -> tacit knowledge is lost when a person leaves.

![](./fed657d8659dbc29.png)***Communication effort*** roughly depends on the number of people you need to communicate with and the clarity of the organization's structure. Here it's harder to pinpoint clear tendencies, but we can estimate that:

* communication effort increases when a company grows;
* communication effort decreases when processes and company structure are clarified.

![](./7635f7e95616d794.png)Overall, we can estimate that:

***The effort to maintain a project increases without activities that actively reduce it.***

![](./b06a6c08163bb49f.png)It would be easy to conclude that this “***increase in the effort***” is the “***technical debt***.” However, let's look back at the initial question about the old code.

![](./75c7c37ca0dd4c23.jpeg)This code has been in a book for years without any new additions and no one communicating about it, but some still consider it technical debt.

There must be things that we don't take into account when thinking about technical debt.

# Mistakes everywhere

One of the fundamental laws of software development is that you make mistakes. Technical debt is often associated with bad decisions in the past. Let's get philosophical – how do we recognize mistakes?

![](./e2acd2ec17692723.png)


When we look at this equation, we have two parts in our head:

* The perception of the equation.
* The expectation of the equation and what it should be.

Or in other words, there's something that we ***perceive*** and realize that it's not in its ***ideal*** state. The more significant the difference between our perception and expectation, the larger the mistake seems.

We can apply the same line of thinking to ***effort*** and ***maintainability***.

![](./09f5df2707586cc9.png)Our ***ideal effort to modify*** decreases when we learn how things could be better. So, there's a “potential improvement” that we evaluate. We could simplify this into an equation:

**Technical Debt ~ Perceived Effort - Ideal Effort**

![](./928390a574b36827.png)There are several interesting observations here.
When there's a breakthrough in technology, people realize that there's a much better way to do something. Hence, they feel that their project has technical debt and they should fix it. However, the effort to maintain the project hasn't changed; only the expectation has. In principle, the technical debt is higher because people learned something new. *Note that our “ideal” may have many problems that are being overlooked.*

![](./afc5ffad22ef3dfa.png)Borrowing technical debt is also nicely explained with this way of thinking. Instead of the perceived effort and the ideal effort changing separately, they change together. Or in other words, we increase the perceived effort while knowing that the ideal effort would increase less.

![](./4e80d123d00106b7.png)


This model does seem to explain technical debt quite well and gives us a nice intuition about different situations.

As a side note, it is interesting to consider ***quality debt*** or ***security debt***. However, it's essential to realize that improving ***quality*** can sometimes increase the effort to maintain the software. For example, writing code is much easier if you don't care about security or performance.

# Pobody's Nerfect

It might seem that “***perceived effort***” and “***ideal effort***” are easy to measure, but they have many dimensions. Similarly, different people may come to different conclusions.

The first question is, whose effort? If we measure “hours spent on a change,” then different people have different speeds. We could consider the “average developer in the world,” or “average developer in the company,” or “the average junior developer,” or “average game developer.” Additionally, people have different skills and knowledge in different areas.

The second question is, which changes? Is it about an arbitrary change in the codebase, the most common change, or architectural changes? All of these are different, and some are more likely than others.

Finally, we need to consider the person evaluating, because every person has some biases, especially when dealing with “***perceived effort***” and “***ideal effort***.”

![](./eafa151edb719f5b.png)


For example, if a person is hyped about a language or framework, or works with a system they know well, they can easily underestimate the average effort. This is due to knowing how to solve common problems and knowing how to avoid the issues in the first place.

On the other hand, if a person has a strong preference for other tools, the tools have flaky behavior, or the person doesn't understand the system, they can overestimate the effort needed to maintain it. For example, flaky tests that fail once a month are annoying; however, realistically, they don't affect the maintenance effort much.

We tend to overestimate the effort needed to maintain code written in old languages. There is definitely more effort required to maintain old code, but the difference is not as big as it might seem. Think about how people learn a new JavaScript framework and library every week and keep up with it. If you can learn new code, you can learn old code.

We also tend to overestimate the effort needed to use another language with different features. A C++ programmer starting to use Go would feel overly restricted and hence conclude that they will be significantly slower when writing. Similarly, a Go programmer thinks they would be overwhelmed when starting to use Rust due to the number of available features.
Both are right to some degree, but the main reason for the perceived difference in writing speed is not knowing how to use the language effectively. After a few months of using a language, the unfamiliarity will decrease. There are definitely differences in languages and their usability, but they're not as big as they seem at first sight. Nevertheless, there would still be a bias towards the language and community you like more.

Interestingly, there's no such feeling when picking up a language with an unfamiliar paradigm. In such cases, we accept our ignorance much more quickly.

Beginner programmers seem to overestimate how much newer frameworks lower the “ideal effort” because it might look like they solve all the problems. Veteran programmers are either realistic or pessimistic because they have been burnt before and know that most problems don't lie in the framework but elsewhere.

Overall, we can state that the less familiar you are with a codebase, system, or tool, the higher your bias can be. The bias can be either positive or negative.

# Technical Debt by Ward Cunningham

When Ward Cunningham initially came up with the metaphor, he only had “code mismatching business ideas” in mind. He was more precise in its formulation than people know.

*And that said that if we failed to make our program align with what we then understood to be the proper way to think about our financial objects, then we were gonna continually stumble over that disagreement and that would slow us down which was like paying interest on a loan.*

* *Ward Cunningham (*[*https://www.youtube.com/watch?v=pqeJFYwnkjE*](https://www.youtube.com/watch?v=pqeJFYwnkjE)*,* [*http://wiki.c2.com/?WardExplainsDebtMetaphor*](http://wiki.c2.com/?WardExplainsDebtMetaphor)*)*

In other words, we improve our “ideal mental model of the system,” and there's a difference between our code and that ideal mental model. There was no concept of “borrowing,” and that would've been considered an error while developing.

# What can you do about it?

After all of this discussion, you might wonder how to deal with technical debt.

## Rewrite???

The first inclination people have for getting rid of “technical debt” is to rewrite the system. Rewriting carries considerable risk, and the larger the piece you are rewriting, the larger the chance of failure.

A few factors contribute to a rewrite ending up as a failure:

* People don't notice things that work well in the current system due to desensitization. During a rewrite, it's easy to forget that they should keep working well. These parts are also often more important than the things that currently don't work well.
* *Note that people also get desensitized to things that consistently work poorly. For example, when some process always takes 10 min, it doesn't bother people; however, when it takes 10 min randomly, it does.*
* The size of the refactoring. Each line of code you change can introduce a bug; hence, the more lines change and the more systems the piece of code integrates with, the more likely it is to have faults.
* People focus on fixing the mistakes, sometimes at the cost of the rest of the system. One aspect of this is the “second-system effect,” where you end up overcomplicating the system by including all the missing features.
* An unclear understanding of how the current system exactly works. It's pretty common for people to want to rewrite a system because they don't understand it.
Overall, a [rewrite should be the last resort](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/); if you do rewrite, try to minimize the problems above to ensure that the rewrite ends up a success. Before rewriting, you should also be able to give an estimate (in numbers) of how much the rewrite would help.

Prefer refactoring over rewriting, and see "Working Effectively with Legacy Code" by Michael C. Feathers for more recommendations.

## Continuous Learning

One good way to prevent “technical debt” is to ensure that the developers have scheduled time to learn about coding and the business. The more the developers know how to do things, the fewer surprises they get about their system. There are many ways to implement this in a company - 20% projects, hack weeks, book clubs, or regular presentations by other people.

Try to implement one feature in multiple ways. The more ways you know how to solve a problem, the more informed your decision will be.

Finally, a good strategy is to ask for help and guidance. There are plenty of experienced programmers who can review your code and suggest improvements.

When we learn new things, it'll actually end up increasing “technical debt” because it lowers the “ideal effort”... but it will also mean that programmers are more likely to write code that is nearer to that “ideal effort.”




## Code Reviews

Code reviews significantly help disseminate understanding of the system, and they surface, early on, things that could be better.

First, automate linting and formatting as much as possible. Ideally, when someone starts to review the code, all the style questions have already been solved. Style questions can grab attention quite fast.

One target for every developer is to make the quality of your next pull request better than the last one's. By trying to improve the quality of every PR, people end up defaulting to a better baseline. There are many ways to define “better”; ideally, try to improve in multiple dimensions.

Strive for gradual improvement of the codebase rather than large leaps. As mentioned previously, the larger the change, the more likely it is to contain mistakes.

Ideally, target less than 400 LOC per change. When a change is over 400 LOC, reviewer fatigue kicks in, and reviewers start missing more bugs. Similarly, when commits are small, they get merged faster and are less likely to go stale. [See this SmartBear study for more information](https://smartbear.com/learn/code-review/best-practices-for-peer-code-review/).

While reviewing, always consider: would two more similar commits make the code difficult to maintain? If yes, then the next PR should be preceded by an improvement to the structure.

## Maintenance

It's easy to lose sight of the overall picture when implementing things one at a time. Hence, do regular architecture reviews. Think about whether everything feels right. Note down places where people waste their effort and discuss how you can improve those parts.

As programmers, our first inclination is to fix “maintenance effort” by fixing the code; sometimes there are alternative means. For example, a good video explaining how the existing “bad system” works and why things ended up that way can take much less effort and have more impact.

For maintenance, it's helpful to isolate problem areas. Some third-party packages and libraries are pervasive and can seriously affect the rest of the codebase.
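For example, here's a minimal sketch of such a wrapper in Go. The `fancylog` dependency and its API are made up for illustration; the point is that only one package imports the third-party library, and everything else depends on a narrow interface:

```go
// Package logging wraps a third-party logger so that the rest of the
// codebase never imports it directly; swapping or upgrading the
// dependency then only touches this one package.
package logging

// fancylog is a made-up dependency, used here only for illustration.
import fancylog "github.com/example/fancylog"

// Logger is the narrow interface the rest of the codebase depends on.
type Logger interface {
	Infof(format string, args ...interface{})
	Errorf(format string, args ...interface{})
}

// New returns a Logger backed by the third-party implementation.
func New(name string) Logger {
	return &wrapper{inner: fancylog.New(name)}
}

type wrapper struct {
	inner *fancylog.Logger
}

func (w *wrapper) Infof(format string, args ...interface{})  { w.inner.Infof(format, args...) }
func (w *wrapper) Errorf(format string, args ...interface{}) { w.inner.Errorf(format, args...) }
```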
By creating a nice wrapper around those systems, you can contain their impact on the rest of the codebase.

## Acceptance

The final advice is about acceptance:

*“Not all of a large system will be well designed…”*

* *Eric Evans*

While the inclination is to try to fix all the problems you notice, fixing them might not make a significant difference to the effort needed to maintain the system. You don’t have to rewrite your bash scripts in Haskell for the glory of purity. Your time is limited, so try to figure out where you can make the most impact on the value stream.

# Conclusion

Technical debt is not “dust” that accumulates on your code, but rather an inherent part of code. Over time you learn and notice mistakes in your systems. Saying “technical debt accumulates” is the wrong mentality; instead, think of it as “discovering technical debt.”

**Technical Debt is not a Monster.**

**It’s just the realization that you could do better.**

![](./adbf52a20f1b6129.png)

diff --git a/app/(blog)/blog/finding-and-tracking-resource-leaks-in-go/d2d97753e00bbc80.png b/app/(blog)/blog/finding-and-tracking-resource-leaks-in-go/d2d97753e00bbc80.png
new file mode 100644
index 000000000..bbd0eae58
Binary files /dev/null and b/app/(blog)/blog/finding-and-tracking-resource-leaks-in-go/d2d97753e00bbc80.png differ
diff --git a/app/(blog)/blog/finding-and-tracking-resource-leaks-in-go/page.md b/app/(blog)/blog/finding-and-tracking-resource-leaks-in-go/page.md
new file mode 100644
index 000000000..2983b3f50
--- /dev/null
+++ b/app/(blog)/blog/finding-and-tracking-resource-leaks-in-go/page.md
@@ -0,0 +1,402 @@
---
author:
  name: Egon Elbre
date: '2022-10-13 00:00:00'
heroimage: ./d2d97753e00bbc80.png
layout: blog
metadata:
  description: Forgetting to close a file, a connection, or some other resource is
    a rather common issue in Go. Usually you can spot them with good code review practices,
    but what if you wanted to automate it and you don't have a suitable linter at
    hand?
  title: Finding and Tracking Resource Leaks in Go
title: Finding and Tracking Resource Leaks in Go

---

Forgetting to close a file, a connection, or some other resource is a rather common issue in Go. Usually you can spot them with good code review practices, but what if you wanted to automate it and you don't have a suitable linter at hand?

How do we track and figure out those leaks?

Fortunately, there's an approach to finding common resource leaks that we’ll explore below.

## Problem: Connection Leak

Let's take a simple example that involves a TCP client. Of course, it applies to other protocols, such as GRPC, database, or HTTP. We'll omit the communication implementation because it's irrelevant to the problem.

```go
type Client struct {
	conn net.Conn
}

func Dial(ctx context.Context, address string) (*Client, error) {
	conn, err := (&net.Dialer{}).DialContext(ctx, "tcp", address)
	if err != nil {
		return nil, fmt.Errorf("failed to dial: %w", err)
	}

	return &Client{conn: conn}, nil
}

func (client *Client) Close() error {
	return client.conn.Close()
}
```

It's easy to put the defer in the wrong place or forget to call Close altogether.
```go
func ExampleDial(ctx context.Context) error {
	source, err := Dial(ctx, "127.0.0.1:1000")
	if err != nil {
		return err
	}

	destination, err := Dial(ctx, "127.0.0.1:1001")
	if err != nil {
		return err
	}

	defer source.Close()
	defer destination.Close()

	data, err := source.Recv(ctx)
	if err != nil {
		return fmt.Errorf("recv failed: %w", err)
	}

	err = destination.Send(ctx, data)
	if err != nil {
		return fmt.Errorf("send failed: %w", err)
	}

	return nil
}
```

Notice that if we fail to dial the second client, we never close the source connection: the early return happens before either defer is registered.

## Problem: File Leak

Another common resource management mistake is a file leak.

```go
func ExampleFile(ctx context.Context, fs fs.FS) error {
	file, err := fs.Open("data.csv")
	if err != nil {
		return fmt.Errorf("open failed: %w", err)
	}

	stat, err := file.Stat()
	if err != nil {
		// this early return leaks the open file
		return fmt.Errorf("stat failed: %w", err)
	}

	fmt.Println(stat.Name())

	_ = file.Close()
	return nil
}
```

## Tracking Resources

How do we track and figure out those leaks? One thing we can do is to keep track of every single open file and connection and ensure that everything is closed when the tests finish.

We need to build something that keeps a list of all open things and tracks where we started using a resource.

To figure out where our "leak" comes from, we can use [`runtime.Callers`](https://pkg.go.dev/runtime#Callers). You can look at the [Frames example](https://pkg.go.dev/runtime#example-Frames) to learn how to use it. Let's call the struct we use to hold this information a `Tag`.

```go
// Tag is used to keep track of things we consider open.
type Tag struct {
	owner  *Tracker // we'll explain this below
	caller [5]uintptr
}

// newTag creates a new tracking tag.
func newTag(owner *Tracker, skip int) *Tag {
	tag := &Tag{owner: owner}
	runtime.Callers(skip+1, tag.caller[:])
	return tag
}

// String converts the caller frames to a string.
func (tag *Tag) String() string {
	var s strings.Builder
	frames := runtime.CallersFrames(tag.caller[:])
	for {
		frame, more := frames.Next()
		if strings.Contains(frame.File, "runtime/") {
			break
		}
		fmt.Fprintf(&s, "%s\n", frame.Function)
		fmt.Fprintf(&s, "\t%s:%d\n", frame.File, frame.Line)
		if !more {
			break
		}
	}
	return s.String()
}

// Close marks the tag as being properly deallocated.
func (tag *Tag) Close() {
	tag.owner.Remove(tag)
}
```

Of course, we need something to keep the list of all open tags:

```go
// Tracker keeps track of all open tags.
type Tracker struct {
	mu     sync.Mutex
	closed bool
	open   map[*Tag]struct{}
}

// NewTracker creates an empty tracker.
func NewTracker() *Tracker {
	return &Tracker{open: map[*Tag]struct{}{}}
}

// Create creates a new tag, which needs to be closed.
func (tracker *Tracker) Create() *Tag {
	tag := newTag(tracker, 2)

	tracker.mu.Lock()
	defer tracker.mu.Unlock()

	// We don't want to allow creating a new tag, when we stopped tracking.
	if tracker.closed {
		panic("creating a tag after tracker has been closed")
	}
	tracker.open[tag] = struct{}{}

	return tag
}

// Remove stops tracking tag.
func (tracker *Tracker) Remove(tag *Tag) {
	tracker.mu.Lock()
	defer tracker.mu.Unlock()
	delete(tracker.open, tag)
}

// Close checks that none of the tags are still open.
+func (tracker *Tracker) Close() error { + tracker.mu.Lock() + defer tracker.mu.Unlock() + + tracker.closed = true + if len(tracker.open) > 0 { + return errors.New(tracker.openResources()) + } + return nil +} + +// openResources returns a string describing all the open resources. +func (tracker *Tracker) openResources() string { + var s strings.Builder + fmt.Fprintf(&s, "%d open resources\n", len(tracker.open)) + + for tag := range tracker.open { + fmt.Fprintf(&s, "---\n%s\n", tag) + } + + return s.String() +} +``` + +Let's look at how it works: + +```go +func TestTracker(t *testing.T) { + tracker := NewTracker() + defer func() { + if err := tracker.Close(); err != nil { + t.Fatal(err) + } + }() + + tag := tracker.Create() + // if we forget to call Close, then the test fails. + // tag.Close() +} +``` + +You can test it over at https://go.dev/play/p/8AkKrzYVFH5. + +## Hooking up the tracker to a `fs.FS` + +We need to integrate it into the initially problematic code. We can create a wrapper for `fs.FS` that creates a tag for each opened file. + +```go +type TrackedFS struct { + tracker *Tracker + fs fs.FS +} + +func TrackFS(fs fs.FS) *TrackedFS { + return &TrackedFS{ + tracker: NewTracker(), + fs: fs, + } +} + +func (fs *TrackedFS) Open(name string) (fs.File, error) { + file, err := fs.fs.Open(name) + if err != nil { + return file, err + } + + tag := fs.tracker.Create() + return &trackedFile{ + File: file, + tag: tag, + }, nil +} + +func (fs *TrackedFS) Close() error { return fs.tracker.Close() } + +type trackedFile struct { + fs.File + tag *Tag +} + +func (file *trackedFile) Close() error { + file.tag.Close() + return file.File.Close() +} +``` + +Finally, we can use this wrapper in a test and get some actual issues resolved: + +```go +func TestFS(t *testing.T) { + // We'll use `fstest` package here, but you can also replace this with + // `os.DirFS` or similar. + dir := fstest.MapFS{ + "data.csv": &fstest.MapFile{Data: []byte("hello")}, + } + + fs := TrackFS(dir) + defer func() { + if err := fs.Close(); err != nil { + t.Fatal(err) + } + }() + + file, err := fs.Open("data.csv") + if err != nil { + t.Fatal(err) + } + + stat, err := file.Stat() + if err != nil { + t.Fatal(err) + } + + t.Log(stat.Name()) +} +``` + +You can play around with it here https://go.dev/play/p/VTKZUzWukTe. + + +## Hooking up the tracker via a `Context` + +Passing this `tracker` everywhere would be rather cumbersome. However, we can write some helpers to put the tracker inside a `Context`. 
```go
type trackerKey struct{}

func WithTracker(ctx context.Context) (*Tracker, context.Context) {
	tracker := NewTracker()
	return tracker, context.WithValue(ctx, trackerKey{}, tracker)
}

func TrackerFromContext(ctx context.Context) *Tracker {
	value := ctx.Value(trackerKey{})
	return value.(*Tracker)
}
```

Of course, we need to adjust our `Client` implementation as well:

```go
type Client struct {
	conn net.Conn
	tag  *Tag
}

func Dial(ctx context.Context, address string) (*Client, error) {
	conn, err := (&net.Dialer{}).DialContext(ctx, "tcp", address)
	if err != nil {
		return nil, fmt.Errorf("failed to dial: %w", err)
	}

	tracker := TrackerFromContext(ctx)
	return &Client{conn: conn, tag: tracker.Create()}, nil
}

func (client *Client) Close() error {
	client.tag.Close()
	return client.conn.Close()
}
```

To make our testing code even shorter, we can make a tiny helper:

```go
func TestingTracker(ctx context.Context, tb testing.TB) context.Context {
	tracker, ctx := WithTracker(ctx)
	tb.Cleanup(func() {
		if err := tracker.Close(); err != nil {
			tb.Fatal(err)
		}
	})
	return ctx
}
```

Finally, we can put it all together:

```go
func TestClient(t *testing.T) {
	ctx := TestingTracker(context.Background(), t)

	addr := startTestServer(t)

	client, err := Dial(ctx, addr)
	if err != nil {
		t.Fatal(err)
	}

	// if we forget to close, then the test will fail
	// client.Close()
	_ = client
}
```

You can see it working over here https://go.dev/play/p/B6qI6xgij1m.

## Making it zero cost for production

Now, all of this `runtime.Callers` calling comes with a high cost. However, we can reduce it by conditionally compiling the code. Luckily, we can use build tags to compile it only for testing. I like to use the `race` tag for it because it is added any time you run your tests with `-race`.

```go
//go:build race

package tracker
```

The implementations are left as an exercise for the reader. :)

## Conclusion

This is probably not a final solution for your problem, but hopefully, it is a good starting point. You can add more helpers, maybe track the filename inside a `Tag`, or only print unique caller frames in the test failure. Maybe try implementing this for an SQL driver and track each thing separately -- you can take a peek [at our implementation](https://github.com/storj/private/tree/main/tagsql), if you get stuck.

May all your resource leaks be discovered.

This is a continuation of our series on finding leaks in Go. In case you missed it, in a previous post we covered [finding leaked goroutines](https://www.storj.io/blog/finding-goroutine-leaks-in-tests).
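P.S. If you want a starting point for that exercise, here is one possible shape for the non-race fallback. This is only a sketch (the package name and layout are assumptions): the race-tagged file would contain the full implementation shown earlier, while this file makes everything a no-op in regular builds.

```go
//go:build !race

// Package tracker: no-op fallback used in regular builds, so production
// binaries skip the runtime.Callers bookkeeping entirely.
package tracker

// Tag is an empty placeholder in non-race builds.
type Tag struct{}

// Close does nothing; there is nothing to deallocate.
func (tag *Tag) Close() {}

// Tracker keeps no state in non-race builds.
type Tracker struct{}

// NewTracker creates a tracker whose operations are all no-ops.
func NewTracker() *Tracker { return &Tracker{} }

// Create returns an inert tag.
func (tracker *Tracker) Create() *Tag { return &Tag{} }

// Remove does nothing.
func (tracker *Tracker) Remove(tag *Tag) {}

// Close never reports leaks in non-race builds.
func (tracker *Tracker) Close() error { return nil }
```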
diff --git a/app/(blog)/blog/finding-goroutine-leaks-in-tests/c1245dac8cff160d.jpeg b/app/(blog)/blog/finding-goroutine-leaks-in-tests/c1245dac8cff160d.jpeg new file mode 100644 index 000000000..d78c9d33b Binary files /dev/null and b/app/(blog)/blog/finding-goroutine-leaks-in-tests/c1245dac8cff160d.jpeg differ diff --git a/app/(blog)/blog/finding-goroutine-leaks-in-tests/page.md b/app/(blog)/blog/finding-goroutine-leaks-in-tests/page.md index fdae92417..ba898f61f 100644 --- a/app/(blog)/blog/finding-goroutine-leaks-in-tests/page.md +++ b/app/(blog)/blog/finding-goroutine-leaks-in-tests/page.md @@ -1,403 +1,297 @@ --- -layout: blog -title: Finding Goroutine Leaks in Tests -date: 2022-03-07 author: name: Egon Elbre - title: Software Engineer -hackernews: https://news.ycombinator.com/item?id=30610952 +date: '2022-03-07 00:00:00' +heroimage: ./c1245dac8cff160d.jpeg +layout: blog metadata: - title: Automated Tracking of Goroutine Leaks in Go Testing - description: - A comprehensive guide on finding and tracking resource leaks (e.g., - file or connection leakages) in Go testing by automating the process using runtime - callers for efficient resource management. ---- + description: 'A leaked goroutine at the end of a + test can indicate several problems. Let''s first, take a look at the most common + ones before tackling an approach to finding them.Problem: DeadlockFirst, we can + have a goroutine that is blocked. As an example:func LeakySumSquares(c...' + title: Finding Goroutine Leaks in Tests +title: Finding Goroutine Leaks in Tests -Forgetting to close a file, a connection, or some other resource is a rather common issue in Go. Usually you can spot them with good code review practices, but what if you wanted to automate it and you don't have a suitable linter at hand? +--- -How do we track and figure out those leaks? +A leaked goroutine at the end of a test can indicate several problems. Let's first, take a look at the most common ones before tackling an approach to finding them. -Fortunately, there's an approach to finding common resource leaks that we’ll explore below. +### Problem: Deadlock -## Problem: Connection Leak +First, we can have a goroutine that is blocked. As an example: -Let's take a simple example that involves a TCP client. Of course, it applies to other protocols, such as GRPC, database, or HTTP. We'll omit the communication implementation because it's irrelevant to the problem. ```go -type Client struct { - conn net.Conn -} - -func Dial(ctx context.Context, address string) (*Client, error) { - conn, err := (&net.Dialer{}).DialContext(ctx, "tcp", address) - if err != nil { - return nil, fmt.Errorf("failed to dial: %w", err) - } - - return &Client{conn: conn}, nil -} - -func (client *Client) Close() error { - return client.conn.Close() +func LeakySumSquares(ctx context.Context, data []int) ( + total int, err error) { + + results := make(chan int) + + for _, v := range data { + v := v + go func() { + result := v * v + results <- result + }() + } + + for { + select { + case value := <-results: + total += value + case <-ctx.Done(): + return ctx.Err() + } + } + + return total, nil } -``` -It's easy to put the defer in the wrong place or forget to call Close altogether. 
- -```go -func ExampleDial(ctx context.Context) error { - source, err := Dial(ctx, "127.0.0.1:1000") - if err != nil { - return err - } - - destination, err := Dial(ctx, "127.0.0.1:1001") - if err != nil { - return err - } - - defer source.Close() - defer destination.Close() - - data, err := source.Recv(ctx) - if err != nil { - return fmt.Errorf("recv failed: %w", err) - } - - err = destination.Send(ctx, data) - if err != nil { - return fmt.Errorf("send failed: %w", err) - } - - return nil -} ``` +In this case, when the context is canceled, the goroutines might end up leaking. -Notice if we fail to dial the second client, we have forgotten to close the source connection. +### Problem: Leaked Resource -## Problem: File Leak +Many times different services, connections, or databases have an internal goroutine used for async processing. A leaked goroutine can show such leaks. -Another common resource management mistake is a file leak. ```go -func ExampleFile(ctx context.Context, fs fs.FS) error { - file, err := fs.Open("data.csv") - if err != nil { - return fmt.Errorf("open failed: %w", err) - } - - stat, err := fs.Stat() - if err != nil { - return fmt.Errorf("stat failed: %w", err) - } +type Conn struct { + messages chan Message - fmt.Println(stat.Name()) - - _ = file.Close() - return nil + close context.CancelFunc + done chan struct{} } -``` - -## Tracking Resources - -How do we track and figure out those leaks? One thing we can do is to keep track of every single open file and connection and ensure that everything is closed when the tests finish. -We need to build something that keeps a list of all open things and tracks where we started using a resource. - -To figure out where our "leak" comes from, we can use [`runtime.Callers`](https://pkg.go.dev/runtime#Callers). You can look at the [Frames example](https://pkg.go.dev/runtime#example-Frames) to learn how to use it. Let's call the struct we use to hold this information a `Tag`. - -```go -// Tag is used to keep track of things we consider open. -type Tag struct { - owner *Tracker // we'll explain this below - caller [5]uintptr +func Dial(ctx context.Context) *Conn { + ctx, cancel := context.WithCancel(ctx) + conn := &Conn{ + close: cancel, + messages: make(chan Message) + done: make(chan struct{}), + } + go conn.monitor(ctx) + return conn } -// newTag creates a new tracking tag. -func newTag(owner *Tracker, skip int) *Tag { - tag := &Tag{owner: owner} - // highlight - runtime.Callers(skip+1, tag.caller[:]) - return tag +func (conn *Conn) monitor(ctx context.Context) { + defer close(conn.done) + for { + select { + case msg := <-conn.messages: + conn.handle(msg) + case <-ctx.Done(): + return + } + } } -// String converts a caller frames to a string. -func (tag *Tag) String() string { - var s strings.Builder - // highlight - frames := runtime.CallersFrames(tag.caller[:]) - for { - frame, more := frames.Next() - if strings.Contains(frame.File, "runtime/") { - break - } - fmt.Fprintf(&s, "%s\n", frame.Function) - fmt.Fprintf(&s, "\t%s:%d\n", frame.File, frame.Line) - if !more { - break - } - } - return s.String() +func (conn *Conn) Close() { + conn.close() + <-conn.done } -// Close marks the tag as being properly deallocated. -func (tag *Tag) Close() { - tag.owner.Remove(tag) -} ``` +Even if the main loop is properly handled, the *conn.handle(msg)* could become deadlocked in other ways. -Of course, we need something to keep the list of all open trackers: -```go -// Tracker keeps track of all open tags. 
-type Tracker struct { - mu sync.Mutex - closed bool - open map[*Tag]struct{} -} - -// NewTracker creates an empty tracker. -func NewTracker() *Tracker { - return &Tracker{open: map[*Tag]struct{}{}} -} +### Problem: Lazy Closing Order -// Create creates a new tag, which needs to be closed. -func (tracker *Tracker) Create() *Tag { - tag := newTag(tracker, 2) + +Even if all the goroutines terminate, there can still be order problems with regard to resource usage. For example, you could end up depending on a database, connection, file, or any other resource, that gets closed before the goroutine finishes. - tracker.mu.Lock() - defer tracker.mu.Unlock() - // We don't want to allow creating a new tag, when we stopped tracking. - if tracker.closed { - panic("creating a tag after tracker has been closed") - } - tracker.open[tag] = struct{}{} +Let's take a common case of the problem: - return tag -} -// Remove stops tracking tag. -func (tracker *Tracker) Remove(tag *Tag) { - tracker.mu.Lock() - defer tracker.mu.Unlock() - delete(tracker.open, tag) +```go +type Server struct { + log Logger + db *sql.DB } -// Close checks that none of the tags are still open. -func (tracker *Tracker) Close() error { - tracker.mu.Lock() - defer tracker.mu.Unlock() - - tracker.closed = true - if len(tracker.open) > 0 { - return errors.New(tracker.openResources()) - } - return nil +func NewServer(log Logger, dburi string) (*Server, error) { + db, err := sql.Open("postgres", dburi) + if err != nil { + return nil, fmt.Errorf("opening database failed: %w", err) + } + return &Server{log: log, db: db}, nil } -// openResources returns a string describing all the open resources. -func (tracker *Tracker) openResources() string { - var s strings.Builder - fmt.Fprintf(&s, "%d open resources\n", len(tracker.open)) - for tag := range tracker.open { - fmt.Fprintf(&s, "---\n%s\n", tag) - } +func (server *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) { + tag := r.FormValue("tag") + if tag == "" { + return + } - return s.String() + // update the database in the background + go func() { + err := server.db.Exec("...", tag) + if err != nil { + server.log.Errorf("failed to update tags: %w", err) + } + }() } -``` -Let's look at how it works: -```go -func TestTracker(t *testing.T) { - tracker := NewTracker() - defer func() { - if err := tracker.Close(); err != nil { - t.Fatal(err) - } - }() - - tag := tracker.Create() - // if we forget to call Close, then the test fails. - // tag.Close() +func (server *Server) Close() { + _ = server.db.Close() } + ``` +In this case, when the *Server* is closed, there still could be goroutines updating the database in the background. Similarly, even the *Logger* could be closed before the goroutine finishes, causing some other problems. -You can test it over at . -## Hooking up the tracker to a `fs.FS` +The severity of such close ordering depends on the context. Sometimes it's a simple extra error in the log; in other cases, it can be a data-race or a panic taking the whole process down. -We need to integrate it into the initially problematic code. We can create a wrapper for `fs.FS` that creates a tag for each opened file. +### Rule of Thumb -```go -type TrackedFS struct { - tracker *Tracker - fs fs.FS -} +Hopefully, it's clear that such goroutines can be problematic. 
-func TrackFS(fs fs.FS) *TrackedFS { - return &TrackedFS{ - tracker: NewTracker(), - fs: fs, - } -} +One of the best rules in terms of preventing these issues is: -func (fs *TrackedFS) Open(name string) (fs.File, error) { - file, err := fs.fs.Open(name) - if err != nil { - return file, err - } - - tag := fs.tracker.Create() - return &trackedFile{ - File: file, - tag: tag, - }, nil -} -func (fs *TrackedFS) Close() error { return fs.tracker.Close() } +The location that starts the goroutine must wait for the goroutine to complete even in the presence of context cancellation. Or, it must explicitly transfer that responsibility to some other service. -type trackedFile struct { - fs.File - tag *Tag -} +As long as you close the top-level service responsible for everything, it'll become visible in tests because if there's a leak, then the test cannot finish. -func (file *trackedFile) Close() error { - file.tag.Close() - return file.File.Close() -} -``` +Unfortunately, this rule cannot be applied to third-party libraries and it's easy to forget to add tracking to a goroutine. -Finally, we can use this wrapper in a test and get some actual issues resolved: -```go -func TestFS(t *testing.T) { - // We'll use `fstest` package here, but you can also replace this with - // `os.DirFS` or similar. - dir := fstest.MapFS{ - "data.csv": &fstest.MapFile{Data: []byte("hello")}, - } - - fs := TrackFS(dir) - defer func() { - if err := fs.Close(); err != nil { - t.Fatal(err) - } - }() - - file, err := fs.Open("data.csv") - if err != nil { - t.Fatal(err) - } - - stat, err := file.Stat() - if err != nil { - t.Fatal(err) - } - - t.Log(stat.Name()) -} -``` +### Finding Leaks -You can play around with it here . +We could use the total number of goroutines, to find leaks at the end of a test, however that wouldn't work with parallel tests. -## Hooking up the tracker via a `Context` -Passing this `tracker` everywhere would be rather cumbersome. However, we can write some helpers to put the tracker inside a `Context`. +One helpful feature in Go is [goroutine labels](https://rakyll.org/profiler-labels/), which can make profiling and stack traces more readable. One interesting feature they have is that they are propagated automatically to child goroutines. -```go -type trackerKey struct{} -func WithTracker(ctx context.Context) (*Tracker, context.Context) { - tracker := NewTracker() - return tracker, context.WithValue(ctx, trackerKey{}, tracker) -} +This means if we attach a unique label to a goroutine, we should be able to find all the child goroutines. However, code for finding such goroutines is not trivial. 
-func TrackerFromContext(ctx context.Context) *Tracker { - value := ctx.Value(trackerKey{}) - return value.(*Tracker) -} -``` -Of course, we need to adjust our `Client` implementation as well: +To attach the label: -```go -type Client struct { - conn net.Conn - tag *Tag -} -func Dial(ctx context.Context, address string) (*Client, error) { - conn, err := (&net.Dialer{}).DialContext(ctx, "tcp", address) - if err != nil { - return nil, fmt.Errorf("failed to dial: %w", err) - } - tracker := TrackerFromContext(ctx) - return &Client{conn: conn, tag: tracker.Create()}, nil +```go +func Track(ctx context.Context, t *testing.T, fn func(context.Context)) { + label := t.Name() + pprof.Do(ctx, pprof.Labels("test", label), fn) + if err := CheckNoGoroutines("test", label); err != nil { + t.Fatal("Leaked goroutines\n", err) + } } -func (client *Client) Close() error { - client.tag.Close() - return client.conn.Close() -} ``` +Unfortunately, currently, there's not an easy way to get the goroutines with a given label. But, we can use some of the profiling endpoints to extract the necessary information. Clearly, this is not very efficient. -To make our testing code even shorter, we can make a tiny helper: ```go -func TestingTracker(ctx context.Context, tb testing.TB) context.Context { - tracker, ctx := WithTracker(ctx) - tb.Cleanup(func() { - if err := tracker.Close(); err != nil { - tb.Fatal(err) - } - }) - return ctx +import "github.com/google/pprof/profile" + +func CheckNoGoroutines(key, value string) error { + var pb bytes.Buffer + profiler := pprof.Lookup("goroutine") + if profiler == nil { + return fmt.Errorf("unable to find profile") + } + err := profiler.WriteTo(&pb, 0) + if err != nil { + return fmt.Errorf("unable to read profile: %w", err) + } + + p, err := profile.ParseData(pb.Bytes()) + if err != nil { + return fmt.Errorf("unable to parse profile: %w", err) + } + + return summarizeGoroutines(p, key, value) } -``` - -Finally, we can put it all together: -```go -func TestClient(t *testing.T) { - ctx := TestingTracker(context.Background(), t) +func summarizeGoroutines(p *profile.Profile, key, expectedValue string) ( + err error) { + var b strings.Builder + + for _, sample := range p.Sample { + if !matchesLabel(sample, key, expectedValue) { + continue + } + + fmt.Fprintf(&b, "count %d @", sample.Value[0]) + // format the stack trace for each goroutine + for _, loc := range sample.Location { + for i, ln := range loc.Line { + if i == 0 { + fmt.Fprintf(&b, "# %#8x", loc.Address) + if loc.IsFolded { + fmt.Fprint(&b, " [F]") + } + } else { + fmt.Fprint(&b, "# ") + } + if fn := ln.Function; fn != nil { + fmt.Fprintf(&b, " %-50s %s:%d", fn.Name, fn.Filename, ln.Line) + } else { + fmt.Fprintf(&b, " ???") + } + fmt.Fprintf(&b, "\n") + } + } + fmt.Fprintf(&b, "\n") + } + + if b.Len() == 0 { + return nil + } + + return errors.New(b.String()) +} - addr := startTestServer(t) +func matchesLabel(sample *profile.Sample, key, expectedValue string) bool { + values, hasLabel := sample.Label[key] + if !hasLabel { + return false + } - client, err := Dial(ctx, addr) - if err != nil { - t.Fatal(err) - } + for _, value := range values { + if value == expectedValue { + return true + } + } - // if we forget to close, then the test will fail - // client.Close - _ = client + return false } -``` -You can see it working over here . +``` +And a failing test might look like this: -## Making it zero cost for production -Now, all of this `runtime.Callers` calling comes with a high cost. 
However, we can reduce it by conditionally compiling the code. Luckily we can use tags to only compile it only for testing. I like to use the `race` tag for it because it is added any time you run your tests with `-race`. +```go +func TestLeaking(t *testing.T) { + t.Parallel() + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + + Track(ctx, t, func(ctx context.Context) { + LeakyThing(ctx) + }) +} -``` -//go:build race +func LeakyThing(ctx context.Context) { + done := make(chan struct{}) + go func() { + go func() { + done <- struct{}{} + }() + done <- struct{}{} + }() +} -package tracker ``` +The full example can be found here . -The implementations are left as an exercise for the reader. :) - -## Conclusion - -This is probably not a final solution for your problem, but hopefully, it is a good starting point. You can add more helpers, maybe track the filename inside a `Tag`, or only print unique caller frames in the test failure. Maybe try implementing this for SQL driver and track each thing separately -- you can take a peek [at our implementation](https://github.com/storj/private/tree/main/tagsql), if you get stuck. +Depending on your use case, you may want to adjust to your needs. For example, you may want to skip some goroutines or maybe print some extra information, or have a grace period for transient goroutines to shut down. -May all your resource leaks be discovered. +Such an approach can be hooked into your tests or existing system in a multitude of ways. -This is a continuation of our series of finding leaks in Golang. In case you missed it, in a previous post we covered [finding leaked goroutines](https://www.storj.io/blog/finding-goroutine-leaks-in-tests). diff --git a/app/(blog)/blog/flexible-file-sharing-with-macaroons/1597845bc69c4f76.png b/app/(blog)/blog/flexible-file-sharing-with-macaroons/1597845bc69c4f76.png new file mode 100644 index 000000000..3c44ee859 Binary files /dev/null and b/app/(blog)/blog/flexible-file-sharing-with-macaroons/1597845bc69c4f76.png differ diff --git a/app/(blog)/blog/flexible-file-sharing-with-macaroons/441ae738c61703c8.png b/app/(blog)/blog/flexible-file-sharing-with-macaroons/441ae738c61703c8.png new file mode 100644 index 000000000..02f2fcca7 Binary files /dev/null and b/app/(blog)/blog/flexible-file-sharing-with-macaroons/441ae738c61703c8.png differ diff --git a/app/(blog)/blog/flexible-file-sharing-with-macaroons/71499072b0db295f.jpeg b/app/(blog)/blog/flexible-file-sharing-with-macaroons/71499072b0db295f.jpeg new file mode 100644 index 000000000..0b4a45746 Binary files /dev/null and b/app/(blog)/blog/flexible-file-sharing-with-macaroons/71499072b0db295f.jpeg differ diff --git a/app/(blog)/blog/flexible-file-sharing-with-macaroons/79fa81cab0393039.png b/app/(blog)/blog/flexible-file-sharing-with-macaroons/79fa81cab0393039.png new file mode 100644 index 000000000..5bdf1ec89 Binary files /dev/null and b/app/(blog)/blog/flexible-file-sharing-with-macaroons/79fa81cab0393039.png differ diff --git a/app/(blog)/blog/flexible-file-sharing-with-macaroons/page.md b/app/(blog)/blog/flexible-file-sharing-with-macaroons/page.md new file mode 100644 index 000000000..c5539565a --- /dev/null +++ b/app/(blog)/blog/flexible-file-sharing-with-macaroons/page.md @@ -0,0 +1,41 @@ +--- +author: + name: Paul Cannon +date: '2019-05-03 00:00:00' +heroimage: ./71499072b0db295f.jpeg +layout: blog +metadata: + description: Sharing is a vital piece of any online storage system. 
Or, to be more + precise, access control is a vital piece of such systems. When you store a file, + you need to be able to designate whether other people or automated agents are + allowed to retrieve the data, delete it, or put something else in it... + title: Flexible File Sharing With Macaroons +title: Flexible File Sharing With Macaroons + +--- + +Sharing is a vital piece of any online storage system. Or, to be more precise, access control is a vital piece of such systems. When you store a file, you need to be able to designate whether other people or automated agents are allowed to retrieve the data, delete it, or put something else in its place. On top of that, you also need to be able to designate when and for how long that particular access should be allowed. We can refer to this as “sharing” because that is typically how we make direct use of access control in our everyday online lives. You might have shared a Google Docs spreadsheet with a team of people or allowed a friend to download a particular video file from your Dropbox account. Often, this type of sharing involves sending around a unique URL. + +The Storj platform, of course, requires this type of functionality. Some people will be using Storj as personal backup, and won’t want or need anyone else to have access to their stuff. Some will use the platform to store collections of family photos, and they will want to allow friends and family to view the albums (but not add to them or change them). Some will use Storj for business collaboration, and share a large folder with members of a team as though it were a network share. Finally, some will use Storj as a CDN, and will want to allow anyone and everyone to download their content. Storj intends to support all these use cases and many more. + +The mechanism we are building to accomplish all of this uses a special type of bearer credential called a Macaroon. Macaroons can be thought of as “enhanced [browser] cookies”, and that’s why they got the delicious name. They were an idea that originated from Google research¹ a few years ago. Macaroons have the special quality that the user can derive new, more restricted macaroons from them, without any involvement on the part of the service they are targeting. This is useful to us for two chief reasons. First, because macaroons encode all the necessary information about what they grant, Storj Satellites do not need to maintain their own (potentially enormous) list of all shares ever made, including what resources are included, with whom they are shared, and under what conditions access should be allowed. Instead, the Satellites only² need to keep track of one root token per project. Any derived Macaroons that are sent to the Satellite can be parsed and verified when received. The second reason is that this mechanism allows third-party developers a significant amount of power and flexibility, without requiring the development of a powerful and complicated API, along with all the costs and complexity and potential vulnerabilities that would entail. + +Let’s see how they work! + +Suppose you create a new project instance on the Storj network called “OlioFamilyPhotos”. The Satellite you are using gives you a token that will govern access to that project. We can think of it looking like a certificate like this: + +![Macaroons are tasty!](./441ae738c61703c8.png)Any time you ask your computer to read or write files in OlioFamilyPhotos, it can send that special certificate to the Storj Satellite to prove that it’s allowed to do so. 
The Satellite will verify the digital signature and grant access if appropriate. You could make a copy of the certificate and share it with your spouse, if you trust them with full, unrestricted access to the project. + +But you may want to allow other family members to see these photos, without being able to reorganize or delete anything (they really are busybodies sometimes). Rather than making API calls to the Storj Satellite to ask for a new token, you³ can make a copy of your existing token, cut off the signature, paste it into a bigger certificate, and add on a proviso, like this: + +![But while they are good,](./79fa81cab0393039.png)You can hand out copies of this bigger certificate to family members at the next family reunion. If they want to see your photos, their computers will send this bigger certificate to the Satellite. The Satellite can still verify the digital signature—through the wonders of digital cryptography⁴—and thereby verify that the added proviso is satisfied. If your second cousin-in-law turns out to be nefarious and wants to make changes to your photos, they can’t just cut out the smaller certificate from the big one and use that, because its signature is gone. They could try to make a new bigger certificate with a weaker proviso, but they would not be able to make a valid signature because they don’t know what the original signature looked like. + +Now, imagine Aunt Alice wants to share a specific one of your photos with her friend Beth. Alice values your privacy and does not want to share everything with Beth, and she also does not want Beth to be able to share the photo with anyone else. Just like what we did earlier, Alice can make a copy of her certificate, cut off the signature, paste it into a bigger certificate, and add on some provisos: + +![Macaroons are not as good as macarons!](./1597845bc69c4f76.png) + +Again, the Satellite will be able to verify this digital signature, verify that the signer knew what the signature on the intermediate certificate looked like, verify that the intermediate signature was also valid, and check that all the provisos there are satisfied before granting access. There won’t be any way for Beth to use the original root project token or the intermediate family-sharing token on their own; she will only be able to use this certificate to fetch that one specific file, and nothing else. She also won’t be able to pass on her access to anyone else, because of the “only if the bearer is Beth” proviso that has been indelibly added. + +This chain can be continued for many more steps, allowing Macaroons of significant complexity where necessary. + +We expect to bring you access control by way of Macaroons in one of the next few Alpha releases. Stay tuned for more details! 
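To make the chaining concrete, here is a minimal sketch of the idea in Go. It assumes HMAC-SHA256 as the signing primitive and uses made-up proviso strings; it illustrates the concept rather than Storj's actual implementation:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// sign chains one proviso onto an existing signature:
// the previous signature acts as the key for the next HMAC.
func sign(key []byte, proviso string) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(proviso))
	return mac.Sum(nil)
}

func main() {
	// The Satellite holds the root secret for the project.
	rootKey := []byte("satellite root secret")

	// The root project token's signature.
	projectSig := sign(rootKey, "project: OlioFamilyPhotos")

	// Deriving a restricted token needs only the previous signature,
	// which is cut off and never included in the bigger certificate.
	familySig := sign(projectSig, "only if the action is a read")

	// Alice restricts it further for Beth, with no Satellite involvement.
	bethSig := sign(familySig, "only for one photo; only if the bearer is Beth")

	fmt.Printf("%x\n", bethSig)

	// To verify, the Satellite recomputes the same chain from rootKey and
	// the provisos listed on the certificate, then compares signatures.
}
```

The key property is that each derived signature uses the previous signature as the HMAC key, so a certificate never reveals the signature it was derived from, while the Satellite, which holds the root key, can replay the chain of provisos and verify the result without keeping any per-share state.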
diff --git a/app/(blog)/blog/go-integration-tests-with-postgres/2aaa5a3adcf49612.jpeg b/app/(blog)/blog/go-integration-tests-with-postgres/2aaa5a3adcf49612.jpeg
new file mode 100644
index 000000000..c3fc2ef12
Binary files /dev/null and b/app/(blog)/blog/go-integration-tests-with-postgres/2aaa5a3adcf49612.jpeg differ
diff --git a/app/(blog)/blog/go-integration-tests-with-postgres/page.md b/app/(blog)/blog/go-integration-tests-with-postgres/page.md
new file mode 100644
index 000000000..e8c0bcc96
--- /dev/null
+++ b/app/(blog)/blog/go-integration-tests-with-postgres/page.md
@@ -0,0 +1,400 @@
---
author:
  name: Egon Elbre
date: '2023-03-20 00:00:00'
heroimage: ./2aaa5a3adcf49612.jpeg
layout: blog
metadata:
  description: When writing server side projects in Go, at some point you will also
    need to test against a database. Let's take a look at different ways of using
    Postgres with different performance characteristics. The final approach shows
    how you can set up a clean database in 20ms (there are a few caveats).
  title: Go Integration Tests with Postgres
title: Go Integration Tests with Postgres

---

When writing server side projects in Go, at some point you will also need to test against a database. Let's take a look at different ways of using Postgres with different performance characteristics. The final approach shows how you can set up a clean database in 20ms (there are a few caveats).

We're not going to cover the "how should you use a real database in your tests" debate. At some point you'll need to test your database layer, so we'll cover those cases.

## Using containers

If you have searched a bit on how to set up a clean test environment, you've probably come across the [github.com/ory/dockertest](https://github.com/ory/dockertest) package. There's also [testcontainers](https://golang.testcontainers.org) for setting up containers. Alternatively, you could even invoke docker as a command and use that. Whichever your poison, the approach will look similar. We'll use *dockertest* for our examples.

Usually, the first thing you do is set up something to act as the client. With *dockertest* it means creating a *dockertest.Pool*. And we need to set it up in our *TestMain*:

```go
var dockerPool *dockertest.Pool

func TestMain(m *testing.M) {
	var err error
	dockerPool, err = dockertest.NewPool("")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Set a timeout for our retries. A lower value probably makes more sense.
	dockerPool.MaxWait = 120 * time.Second
	code := m.Run()
	os.Exit(code)
}
```

When writing tests, a dedicated helper is going to be very convenient.

```go
func TestCreateTable(t *testing.T) {
	ctx := context.Background()
	WithDatabase(ctx, t, func(t *testing.T, db *pgx.Conn) {
		_, err := db.Exec(ctx, `
			CREATE TABLE accounts ( user_id serial PRIMARY KEY );
		`)
		if err != nil {
			t.Fatal(err)
		}
	})
}

func WithDatabase[TB testing.TB](ctx context.Context, tb TB,
	test func(t TB, db *pgx.Conn)) {
	// < snip >
}
```

This approach starts a docker container and calls the *test* callback once the database is ready.

The callback-based approach is especially helpful if you need to test with multiple backends, such as Cockroach and Postgres. In your own codebase, you would probably return your data layer interface rather than *\*pgx.Conn* directly.
For example:

```go
func TestCreateTable(t *testing.T) {
	ctx := context.Background()
	db := NewDatabase(ctx, t)
	_, err := db.Exec(ctx, `
		CREATE TABLE accounts ( user_id serial PRIMARY KEY );
	`)
	if err != nil {
		t.Fatal(err)
	}
}

func NewDatabase(ctx context.Context, tb testing.TB) *pgx.Conn {
	// < snip: start the container and connect, producing db >
	tb.Cleanup(func() {
		err := db.Close(ctx)
		if err != nil {
			tb.Logf("failed to close db: %v", err)
		}
	})
	return db
}
```

A single table migration isn't indicative of a proper database layer, but it's sufficient for seeing the best-case scenario. Adding more tables didn't seem to affect things that much.

Let's get back on track and see how you can implement the first approach. It should be trivial to convert one into the other:

```go
func WithDatabase[TB testing.TB](ctx context.Context, tb TB,
	test func(t TB, db *pgx.Conn)) {

	// First we need to specify the image we wish to use.
	resource, err := dockerPool.RunWithOptions(&dockertest.RunOptions{
		Repository: "postgres",
		Tag:        "15",
		Env: []string{
			"POSTGRES_PASSWORD=secret",
			"POSTGRES_USER=user",
			"POSTGRES_DB=main",
			"listen_addresses = '*'",
		},
	}, func(config *docker.HostConfig) {
		// set AutoRemove to true so that the stopped container goes away by itself
		config.AutoRemove = true
		config.RestartPolicy = docker.RestartPolicy{Name: "no"}
	})
	if err != nil {
		tb.Fatalf("Could not start resource: %s", err)
	}
	defer func() {
		if err := dockerPool.Purge(resource); err != nil {
			tb.Logf("failed to stop: %v", err)
		}
	}()

	// Construct our connection string.
	hostAndPort := resource.GetHostPort("5432/tcp")
	databaseConnstr := fmt.Sprintf("postgres://user:secret@%s/main?sslmode=disable", hostAndPort)

	err = resource.Expire(2 * 60) // hard kill the container after 2 minutes, just in case.
	if err != nil {
		tb.Fatalf("Unable to set container expiration: %v", err)
	}

	// Finally, try to connect to the container.
	// We need to retry, because it might take some time until the container becomes available.
	var db *pgx.Conn
	err = dockerPool.Retry(func() error {
		db, err = pgx.Connect(ctx, databaseConnstr)
		if err != nil {
			return err
		}
		return nil
	})
	if err != nil {
		tb.Fatal("unable to connect to Postgres", err)
	}

	defer func() {
		err := db.Close(ctx)
		if err != nil {
			tb.Logf("failed to close db: %v", err)
		}
	}()

	// Finally call our test code.
	test(tb, db)
}
```

Let's look at the performance:

```
Environment                 Test        Time
Windows Threadripper 2950X  Container   2.86s ± 6%
MacOS M1 Pro                Container   1.63s ± 16%
Linux Xeon Gold 6226R       Container   2.24s ± 10%
```

## Using DATABASE

In most cases, creating a new Postgres instance per test isn't necessary. It'll be entirely sufficient to have a separate database per test. If we have SUPERUSER permissions in Postgres, we can create them dynamically.

To contrast with the previous approach, let's use a locally installed Postgres instance. This can also be helpful if you want to run tests against a remote database or want to avoid the container startup time.
```go
var pgaddr = flag.String("database", os.Getenv("DATABASE_URL"), "database address")
```

Let's rewrite the function to create a new database per test:

```go
func WithDatabase[TB testing.TB](ctx context.Context, tb TB, test func(t TB, db *pgx.Conn)) {
	if *pgaddr == "" {
		tb.Skip("-database flag not defined")
	}
	dbaddr := *pgaddr

	// We need to create a unique database name so that our parallel tests don't clash.
	var id [8]byte
	rand.Read(id[:])
	uniqueName := tb.Name() + "/" + hex.EncodeToString(id[:])

	// Create the main connection that we use to create the database.
	maindb, err := pgx.Connect(ctx, dbaddr)
	if err != nil {
		tb.Fatalf("Unable to connect to database: %v", err)
	}

	// Run the database creation query and defer the database cleanup query.
	if err := createDatabase(ctx, maindb, uniqueName); err != nil {
		tb.Fatalf("unable to create database: %v", err)
	}
	defer func() {
		if err := dropDatabase(ctx, maindb, uniqueName); err != nil {
			tb.Fatalf("unable to drop database: %v", err)
		}
	}()

	// Modify the connection string to use a different database.
	connstr, err := connstrWithDatabase(dbaddr, uniqueName)
	if err != nil {
		tb.Fatal(err)
	}

	// Create a new connection to the database.
	db, err := pgx.Connect(ctx, connstr)
	if err != nil {
		tb.Fatalf("Unable to connect to database: %v", err)
	}
	defer func() { _ = db.Close(ctx) }()

	// Run our test code.
	test(tb, db)
}
```

Now for the small utility funcs that we used:

```go
// connstrWithDatabase changes the main database in the connection string.
func connstrWithDatabase(connstr, database string) (string, error) {
	u, err := url.Parse(connstr)
	if err != nil {
		return "", fmt.Errorf("invalid connstr: %q", connstr)
	}
	u.Path = database
	return u.String(), nil
}

// createDatabase creates a new database with the specified name.
func createDatabase(ctx context.Context, db *pgx.Conn, name string) error {
	_, err := db.Exec(ctx, `CREATE DATABASE `+sanitizeDatabaseName(name)+`;`)
	return err
}

// dropDatabase drops the specified database.
func dropDatabase(ctx context.Context, db *pgx.Conn, name string) error {
	_, err := db.Exec(ctx, `DROP DATABASE `+sanitizeDatabaseName(name)+`;`)
	return err
}

// sanitizeDatabaseName ensures that the database name is a valid postgres identifier.
func sanitizeDatabaseName(schema string) string {
	return pgx.Identifier{schema}.Sanitize()
}
```

The performance already looks significantly better:

```
Environment                 Test        Time
Windows Threadripper 2950X  Container   2.86s ± 6%
Windows Threadripper 2950X  Database    136ms ± 12%
MacOS M1 Pro                Container   1.63s ± 16%
MacOS M1 Pro                Database    136ms ± 12%
Linux Xeon Gold 6226R       Container   2.24s ± 10%
Linux Xeon Gold 6226R       Database    135ms ± 10%
```

## Using SCHEMA

But over 100ms is still a lot of time per single test. There's one lesser-known approach we discovered at Storj. It's possible to use a [schema](https://www.postgresql.org/docs/current/ddl-schemas.html) to create an isolated namespace that can be dropped together.

Creating a new schema is as straightforward as executing `CREATE SCHEMA example;`, and dropping it is `DROP SCHEMA example CASCADE;`. When connecting to the database it's possible to add a connection string parameter `?search_path=example` to execute all queries by default in that schema.

Of course, if you use schemas for other purposes in your system, then this approach may complicate the rest of your code.
Similarly, schemas are not as isolated as separate databases.

Now that the disclaimer is out of the way, let's take a look at some code:

```go
func WithSchema[TB testing.TB](ctx context.Context, tb TB, test func(t TB, db *pgx.Conn)) {
	if *pgaddr == "" {
		tb.Skip("-database flag not defined")
	}
	dbaddr := *pgaddr

	// We need to create a unique schema name so that our parallel tests don't clash.
	var id [8]byte
	rand.Read(id[:])
	uniqueName := tb.Name() + "/" + hex.EncodeToString(id[:])

	// Change the connection string to use a specific schema name.
	connstr, err := connstrWithSchema(dbaddr, uniqueName)
	if err != nil {
		tb.Fatal(err)
	}
	db, err := pgx.Connect(ctx, connstr)
	if err != nil {
		tb.Fatalf("Unable to connect to database: %v", err)
	}
	defer func() { _ = db.Close(ctx) }()

	// Surprisingly, it's perfectly fine to connect with a search_path that
	// points to a schema that doesn't exist yet and create it afterwards.
	if err := createSchema(ctx, db, uniqueName); err != nil {
		tb.Fatal(err)
	}
	defer func() {
		if err := dropSchema(ctx, db, uniqueName); err != nil {
			tb.Fatal(err)
		}
	}()

	test(tb, db)
}
```

The smaller utilities that make it work:

```go
// connstrWithSchema adds the search_path argument to the connection string.
func connstrWithSchema(connstr, schema string) (string, error) {
	u, err := url.Parse(connstr)
	if err != nil {
		return "", fmt.Errorf("invalid connstr: %q", connstr)
	}
	q := u.Query()
	q.Set("search_path", sanitizeSchemaName(schema))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// createSchema creates a new schema in the database.
func createSchema(ctx context.Context, db *pgx.Conn, schema string) error {
	_, err := db.Exec(ctx, `CREATE SCHEMA IF NOT EXISTS `+sanitizeSchemaName(schema)+`;`)
	return err
}

// dropSchema drops the specified schema and associated data.
func dropSchema(ctx context.Context, db *pgx.Conn, schema string) error {
	_, err := db.Exec(ctx, `DROP SCHEMA `+sanitizeSchemaName(schema)+` CASCADE;`)
	return err
}

// sanitizeSchemaName ensures that the name is a valid postgres identifier.
func sanitizeSchemaName(schema string) string {
	return pgx.Identifier{schema}.Sanitize()
}
```

After running some benchmarks we can see that we've reached ~20ms:

```
Environment                 Test        Time
Windows Threadripper 2950X  Container   2.86s ± 6%
Windows Threadripper 2950X  Database    136ms ± 12%
Windows Threadripper 2950X  Schema      26.7ms ± 3%
MacOS M1 Pro                Container   1.63s ± 16%
MacOS M1 Pro                Database    136ms ± 12%
MacOS M1 Pro                Schema      19.7ms ± 20%
Linux Xeon Gold 6226R       Container   2.24s ± 10%
Linux Xeon Gold 6226R       Database    135ms ± 10%
Linux Xeon Gold 6226R       Schema      29.2ms ± 16%
```


## Final tweaks

There's one important flag that you can adjust in Postgres to make it run faster... of course, this should only be used for testing. It's disabling [fsync](https://www.postgresql.org/docs/current/runtime-config-wal.html).
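If your test database runs via the official Docker image, one way to switch it off is to pass the setting directly as a server argument, since the image forwards any extra arguments to the postgres binary (adjust the credentials to your own setup):

```
docker run --rm -e POSTGRES_PASSWORD=secret -e POSTGRES_USER=user \
    -e POSTGRES_DB=main -p 5432:5432 postgres:15 -c fsync=off
```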
The final results of the comparison look like:

```
Environment                 Test        fsync  Time
Windows Threadripper 2950X  Container   on     2.86s ± 6%
Windows Threadripper 2950X  Container   off    2.82s ± 4%
Windows Threadripper 2950X  Database    on     136ms ± 12%
Windows Threadripper 2950X  Database    off    105ms ± 30%
Windows Threadripper 2950X  Schema      on     26.7ms ± 3%
Windows Threadripper 2950X  Schema      off    20.5ms ± 5%
MacOS M1 Pro                Container   on     1.63s ± 16%
MacOS M1 Pro                Container   off    1.64s ± 13%
MacOS M1 Pro                Database    on     136ms ± 12%
MacOS M1 Pro                Database    off    105ms ± 30%
MacOS M1 Pro                Schema      on     19.7ms ± 20%
MacOS M1 Pro                Schema      off    18.5ms ± 31%
Linux Xeon Gold 6226R       Container   on     2.24s ± 10%
Linux Xeon Gold 6226R       Container   off    1.97s ± 10%
Linux Xeon Gold 6226R       Database    on     135ms ± 10%
Linux Xeon Gold 6226R       Database    off    74.2ms ± 10%
Linux Xeon Gold 6226R       Schema      on     29.2ms ± 16%
Linux Xeon Gold 6226R       Schema      off    15.3ms ± 15%
```

All the tests were run in a container that didn't have a persistent disk mounted. The fsync=off setting would probably have a bigger impact with an actual disk.

In conclusion, we looked at three different approaches to creating a clean Postgres environment for tests. The approaches aren't completely equivalent, so use the fastest one that works for your situation.

diff --git a/app/(blog)/blog/introducing-drpc-our-replacement-for-grpc/26c69fe77df6e712.png b/app/(blog)/blog/introducing-drpc-our-replacement-for-grpc/26c69fe77df6e712.png
new file mode 100644
index 000000000..28b240d00
Binary files /dev/null and b/app/(blog)/blog/introducing-drpc-our-replacement-for-grpc/26c69fe77df6e712.png differ
diff --git a/app/(blog)/blog/introducing-drpc-our-replacement-for-grpc/e2c929baac38fe20.png b/app/(blog)/blog/introducing-drpc-our-replacement-for-grpc/e2c929baac38fe20.png
new file mode 100644
index 000000000..d9403a45f
Binary files /dev/null and b/app/(blog)/blog/introducing-drpc-our-replacement-for-grpc/e2c929baac38fe20.png differ
diff --git a/app/(blog)/blog/introducing-drpc-our-replacement-for-grpc/page.md b/app/(blog)/blog/introducing-drpc-our-replacement-for-grpc/page.md
new file mode 100644
index 000000000..bfcfbaba7
--- /dev/null
+++ b/app/(blog)/blog/introducing-drpc-our-replacement-for-grpc/page.md
@@ -0,0 +1,148 @@
---
author:
  name: JT Olio and Jeff Wendling
date: '2021-04-27 00:00:00'
heroimage: ./e2c929baac38fe20.png
layout: blog
metadata:
  description: In 2016, Google launched gRPC, which has overall taken the systems
    programming community by storm. gRPC stands for something with a G, Remote Procedure
    Call; it's a mechanism for easily defining interfaces between two different remote
    services. Building a new decentralized storage platform from the ground up in
    Go, obviously, we considered using gRPC to simplify our development process in
    peer-to-peer remote procedure calling. In fact, I'm not even sure we really considered
    anything else. Fast forward to the latter half of 2019, and we had 170k lines
    of Go, a beta network of over 4 PB, real live active users, and it turns out the
    gRPC bed we made for ourselves was not all roses. So we rewrote gRPC and migrated
    our live network. DRPC is an open-source, drop-in replacement that handles everything
    we needed from gRPC (and most likely, everything you need) in under 3000 lines
    of Go. It now powers our full network of tens of thousands of servers and countless
    clients.
+ title: 'Introducing DRPC: Our Replacement for gRPC' +title: 'Introducing DRPC: Our Replacement for gRPC' + +--- + +In 2016, Google launched [gRPC](https://grpc.io/), which has overall taken the systems programming community by storm. gRPC stands for something with a G, Remote Procedure Call; it's a mechanism for easily defining interfaces between two different remote services. It's tightly bundled with [Protocol Buffers](https://developers.google.com/protocol-buffers) version 3 (another highly adopted data interchange specification from Google), and... it seems like everyone is using it. Wikipedia, Square, Netflix, IBM, Docker, Cockroach Labs, Cisco, Spotify, Dropbox, etc., all use gRPC. + + +Here at Storj, we’re pioneers in decentralized cloud storage. By early 2018, we built and scaled a 150 petabyte decentralized storage network. Of course, like every good scaling story, by the time we got to 150 petabytes, we discovered some fundamental architectural issues that needed to be reworked. Staring down the barrel of a few hundred thousand lines of untyped Javascript with... sort of decent test coverage, we made the [risky decision](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/) in March of 2018 to fix these architectural issues with a ground-up reimplementation in Go. We're calling this iteration our V3 network, which had its production launch in March of 2020. You can read all about our architecture in [our whitepaper](https://storj.io/storjv3.pdf), and you can [try out our live service](https://www.storj.io/). + + +*(Aside:* [*We’re hiring engineers!*](https://storj.io/careers/)*)*‍ + +Building a new decentralized storage platform from the ground up in Go, obviously, we considered using gRPC to simplify our development process in peer-to-peer remote procedure calling. In fact, I'm not even sure we really considered anything else. Using gRPC for us was a deliberate decision to [avoid using an innovation token](https://mcfunley.com/choose-boring-technology). How could gRPC be the wrong choice? It has impressive credentials and wide usage. We were always disappointed Google didn't release a standard RPC implementation with proto2. With an otherwise previously strongly positive experience with protocol buffers, we were excited to jump all in to the new protobuf RPC land. + +Fast forward to the latter half of 2019, and we had 170k lines of Go, a beta network of over 4 PB, real live active users, and it turns out the gRPC bed we made for ourselves was not all roses, and we kind of pooped just a little in it. Just a little bit. This much ->||<-. So not a lot, but still. + +So we rewrote gRPC and migrated our live network. [DRPC](https://storj.github.io/drpc/) is an open-source, drop-in replacement that handles everything we needed from gRPC (and most likely, everything you need) in under 3000 lines of Go. It now powers our full network of tens of thousands of servers and countless clients. + + +[Check out DRPC here!](https://storj.github.io/drpc/) + +# Where gRPC needs improvement + +Let’s just get out here and say what not enough people are saying—in a nutshell, gRPC has feature creep, bloat and is trying to solve too many problems. It’s overcomplicated and has become a dumping ground of features, tangled in a standards body-based web of crap.  + +Let’s go over the problems with gRPC one by one! + + +## Feature bloat + +Did you know that there are 40 different dial options? There are 26 server options. You have 13 call options. gRPC is huge. 
One major issue that led us away from gRPC was that it constituted over a fifth of our own (sizeable) binaries. + +Do you use gRPC’s built-in APIs for internal load balancing? You probably don’t—your load balancer probably does something else. Do you use manual name resolution builders? Do you find yourself uncertain what operations are synchronous or not? Wait, should we use WithBlock? gRPC tends to accrue features so that it tries to solve every problem, and with every solution comes more code to maintain, places for bugs to hide, and additional semantics to worry about. + +On the other hand, DRPC’s core is under 3000 lines! It’s a reasonable task to audit and understand it. + + +## Deprecation + + + + +![](./26c69fe77df6e712.png)‍ + +This tweet is from 2019, and as of today, in 2021, WithDefaultServiceConfig is still experimental, and WithBalancerName is still deprecated. + +At the time of this writing, there are 37 deprecation notices in the top-level [gRPC documentation](https://pkg.go.dev/google.golang.org/grpc). This makes it hard to understand what you’re supposed to use, what you’re not supposed to use, what things are available, etc. + + +## High resource usage + +81% of the heap usage of one of our Storage Nodes was in gRPC. You can’t use another protobuf library; there are a large number of allocations, you get its own HTTP/2 server, and the hits keep coming.  + + +A protobuf-based protocol has the fortunate ability to avoid a complicated string parsing and overhead of traditional, older protocols, such as HTTP, unless you use gRPC. Then you have to deal with HTTP/2 and all the legacy edge cases that may arise.  + + +## Opacity + +Pop quiz: imagine you’re trying to debug some issue that’s happening during dialing of connections, and you use . See if you can find where in the gRPC code base the function you provide there is called. + + +gRPC is at least 10 times more lines of code than DRPC, and it is safe to say some of the API has grown organically and is hard to reason about. + + +DRPC, meanwhile, is implemented to be lightweight, straightforward, clear, and easy to debug. + + +# Wait, but a rewrite? + +Yep! Considering we have 170k lines of Go, tightly integrated into both single and streaming request styles of gRPC, in 2019, we narrowed our options down to: + + +* [Thrift](https://thrift.apache.org/) +* [Twirp](https://github.com/twitchtv/twirp) +* Fork gRPC? +* Write our own + +We really wanted to avoid having to change every service already registered with gRPC. Again, the services were fine, and we just needed to change the connection layer. Thrift was a pretty big departure for pretty much all of our code, so we eliminated it. Maybe it would have been good to start with, but we judged by the cover and suspected it wasn’t the best place to start. + + +We could have eliminated Twirp for the same reason, but Twirp had another problem - we needed support for bidirectional streaming, and Twirp didn’t have it. + + +Forking gRPC may have been a good choice, but we would suddenly be responsible for all of it, as we ripped out the parts we didn’t need. Ripping out the overhead of HTTP/2 alone was by itself essentially a rewrite. It seemed like a simpler undertaking to start fresh. + + +So, we decided to time-box an experiment to write our own. The experiment was a smashing success. + +# DRPC + +[DRPC](https://storj.github.io/drpc/) is a code-wise drop-in replacement for the client/server interactions of gRPC. 
If you’re using gRPC today in Go, you should be able to swap your protocol buffer generation pipeline to DRPC and be on your way. If you already have proto3 .proto files, the protoc protobuf compiler can be told to generate DRPC code instead of gRPC code (or both, if you're migrating). + +DRPC supports a wide range of functionality in its spartan few thousand lines. DRPC is blazingly fast and lightweight (the protocol does not require HTTP header parsing, for example), it supports unitary and streaming requests, it has an HTTP/JSON gateway, it supports metadata for per-request side-channel information like tracing, it supports layering and middleware, etc. + +[Check out how easy our Quickstart documentation is!](https://storj.github.io/drpc/docs.html) + +Also be sure to check out the gRPC vs DRPC benchmarks on [our Github README](https://github.com/storj/drpc#readme). I want to specifically call out how much better with memory usage DRPC is. GC pressure sucks! When you have high performance servers, reducing GC pressure is always a good call. + +Also make sure to see more examples at   + +It's worth pointing out that DRPC is *not* the same protocol as gRPC, and DRPC clients cannot speak to gRPC servers and vice versa. + +# Migration + +One major challenge we faced was that we already had gRPC deployed. We needed to support both DRPC and gRPC clients for a transition period until everything understood DRPC. + + +As a result, we wrote (and included in DRPC) migration helpers that allow you to listen for and respond to DRPC and gRPC requests on the same port. Make sure to check out and our gRPC and DRPC example: + + +Here were our transition steps: + + +1. Release and deploy new server code that understands both gRPC and DRPC concurrently. With DRPC, this was a breeze since all of our application code could be used identically with both, and our ListenMux allowed us to do both from the same server port. +2. Once all the servers and Nodes were updated, release and deploy new clients that spoke DRPC instead of gRPC. +3. Once all of the old clients were gone, we removed the gRPC code. + +We immediately eliminated a whole class of inscrutable WAN network errors (our availability metrics went up), improved performance and reduced CPU and resource utilization, reduced our binary sizes significantly, and have overall been much, much happier. + +# We're open source + +DRPC, like almost everything else at Storj, is open source. DRPC is MIT/expat licensed, and we’d love your help! Since we currently only have Go bindings for DRPC, bindings for new languages would be a great place to start. + +Feel free to check out [our Github repo](https://github.com/storj/drpc) and let us know if we can help you dive in! 
+ + diff --git a/app/(blog)/blog/lensm/0254dd9cb3223a7d.png b/app/(blog)/blog/lensm/0254dd9cb3223a7d.png new file mode 100644 index 000000000..a0877387a Binary files /dev/null and b/app/(blog)/blog/lensm/0254dd9cb3223a7d.png differ diff --git a/app/(blog)/blog/lensm/0b52083cb5116034.png b/app/(blog)/blog/lensm/0b52083cb5116034.png new file mode 100644 index 000000000..06e4deb89 Binary files /dev/null and b/app/(blog)/blog/lensm/0b52083cb5116034.png differ diff --git a/app/(blog)/blog/lensm/32e9beff555091f9.png b/app/(blog)/blog/lensm/32e9beff555091f9.png new file mode 100644 index 000000000..23d1ad877 Binary files /dev/null and b/app/(blog)/blog/lensm/32e9beff555091f9.png differ diff --git a/app/(blog)/blog/lensm/41b000780c0ec8d1.png b/app/(blog)/blog/lensm/41b000780c0ec8d1.png new file mode 100644 index 000000000..ceeeddabb Binary files /dev/null and b/app/(blog)/blog/lensm/41b000780c0ec8d1.png differ diff --git a/app/(blog)/blog/lensm/5513fe38f6ea01e0.png b/app/(blog)/blog/lensm/5513fe38f6ea01e0.png new file mode 100644 index 000000000..0f3c10b4b Binary files /dev/null and b/app/(blog)/blog/lensm/5513fe38f6ea01e0.png differ diff --git a/app/(blog)/blog/lensm/5c968a28c01d3c08.png b/app/(blog)/blog/lensm/5c968a28c01d3c08.png new file mode 100644 index 000000000..d9881849e Binary files /dev/null and b/app/(blog)/blog/lensm/5c968a28c01d3c08.png differ diff --git a/app/(blog)/blog/lensm/6246319ebda3c85f.png b/app/(blog)/blog/lensm/6246319ebda3c85f.png new file mode 100644 index 000000000..d1c2f3fe3 Binary files /dev/null and b/app/(blog)/blog/lensm/6246319ebda3c85f.png differ diff --git a/app/(blog)/blog/lensm/71bd197bec7fd5d9.png b/app/(blog)/blog/lensm/71bd197bec7fd5d9.png new file mode 100644 index 000000000..ab4397bc2 Binary files /dev/null and b/app/(blog)/blog/lensm/71bd197bec7fd5d9.png differ diff --git a/app/(blog)/blog/lensm/88a2dddc651c9576.png b/app/(blog)/blog/lensm/88a2dddc651c9576.png new file mode 100644 index 000000000..dc213a25a Binary files /dev/null and b/app/(blog)/blog/lensm/88a2dddc651c9576.png differ diff --git a/app/(blog)/blog/lensm/9cfb54d7e51f9d77.png b/app/(blog)/blog/lensm/9cfb54d7e51f9d77.png new file mode 100644 index 000000000..1501b36e7 Binary files /dev/null and b/app/(blog)/blog/lensm/9cfb54d7e51f9d77.png differ diff --git a/app/(blog)/blog/lensm/a4cb43cd5163de6e.png b/app/(blog)/blog/lensm/a4cb43cd5163de6e.png new file mode 100644 index 000000000..9c000a91b Binary files /dev/null and b/app/(blog)/blog/lensm/a4cb43cd5163de6e.png differ diff --git a/app/(blog)/blog/lensm/b50757548a5c14fe.png b/app/(blog)/blog/lensm/b50757548a5c14fe.png new file mode 100644 index 000000000..6dd5354fd Binary files /dev/null and b/app/(blog)/blog/lensm/b50757548a5c14fe.png differ diff --git a/app/(blog)/blog/lensm/c59c0fe3df775270.png b/app/(blog)/blog/lensm/c59c0fe3df775270.png new file mode 100644 index 000000000..40acf9f9d Binary files /dev/null and b/app/(blog)/blog/lensm/c59c0fe3df775270.png differ diff --git a/app/(blog)/blog/lensm/d20ef641424803bf.png b/app/(blog)/blog/lensm/d20ef641424803bf.png new file mode 100644 index 000000000..9e14ee676 Binary files /dev/null and b/app/(blog)/blog/lensm/d20ef641424803bf.png differ diff --git a/app/(blog)/blog/lensm/db316e6fe6d98305.png b/app/(blog)/blog/lensm/db316e6fe6d98305.png new file mode 100644 index 000000000..04585c5b5 Binary files /dev/null and b/app/(blog)/blog/lensm/db316e6fe6d98305.png differ diff --git a/app/(blog)/blog/lensm/e1ce73a8dfceabfd.jpeg b/app/(blog)/blog/lensm/e1ce73a8dfceabfd.jpeg new file mode 100644 index 
000000000..a5184b0a4 Binary files /dev/null and b/app/(blog)/blog/lensm/e1ce73a8dfceabfd.jpeg differ diff --git a/app/(blog)/blog/lensm/page.md b/app/(blog)/blog/lensm/page.md new file mode 100644 index 000000000..0ad137688 --- /dev/null +++ b/app/(blog)/blog/lensm/page.md @@ -0,0 +1,116 @@ +--- +author: + name: Egon Elbre +date: '2022-07-18 00:00:00' +heroimage: ./e1ce73a8dfceabfd.jpeg +layout: blog +metadata: + description: "I couldn\u2019t find a great tool for viewing disassembly, so I wrote\ + \ it myself over the weekend." + title: Lensm, A Tool for Viewing Disassembly +title: Lensm, A Tool for Viewing Disassembly + +--- + +I couldn’t find a great tool for viewing disassembly, so I [wrote it myself over the weekend](https://github.com/loov/lensm). + +At Storj, we are constantly looking for ways to accelerate our team’s efficiency, and one of those is building the tools we need. + +One of the major things you will rub against when you delve into performance optimization is viewing the assembly that the compiler generates. It's usually not efficient to write assembly yourself, and it's better to try to coerce the compiler to produce the assembly you want. Here's my story of writing a little tool for viewing disassembly. + +# Getting Annoyed + +My story starts on a weekend when I was doing a bunch of tiny optimizations to the [Gio UI](https://gioui.org/) project. There are ways to view the assembly; one is to use **go tool objdump -s funcname** from the command line. However, it's rather difficult to see how the source code and assembly are related. + +![](./d20ef641424803bf.png)There is an excellent online tool for writing code and seeing the output [https://go.godbolt.org](https://go.godbolt.org/). The visuals are much clearer. + +The corresponding lines of code have the same color. When you hover over the specific lines of code, the corresponding assembly is also highlighted. + +![](./32e9beff555091f9.png)Compiler Explorer has many other nice features as well: sharing the result, compiling with different versions, diffing output from different compilers, and description of assembly instructions. The amount of different languages and compilers is staggering. + +Despite how nice Compiler Explorer is, it's still an online tool, and you need to copy-paste your relevant code to the explorer. + +After trying many times, my annoyance finally kicked in: + +*"Someone should've written this tool already–it shouldn't be too difficult."* + +Over the years of developing, I've found that getting annoyed is a rather excellent way to start a new project. + +# Disassembly + +The first step in the project was to have access to the disassembly. It would be wasteful to start a disassembler from scratch. I knew that **go tool objdump** could already do it, so maybe they have some library they are using. + +![](./71bd197bec7fd5d9.png)Indeed, they are using a library, but it's internal to the compiler. The internal library looks pretty nice to use as well. I guess I need to extract it for my own needs. Copying the relevant code and adjusting the import paths was grunt work, but I got it extracted. Luckily the license for the Go code is open-source. + +I needed to expose a little bit more information from the API to access the [necessary details](https://github.com/loov/lensm/commit/5bb596225accd3d6c0b4dbc13c4e6189c558c879#diff-1596bd8ceb74246828aacab827b39a33075c86baa627fbbeb7491bd31eef1169), but I got it working. 
Here's the debug print from the initial output: + +![](./5c968a28c01d3c08.png)Of course, extracting the internals means needing to keep it manually updated. I'm sure there was a tool to rewrite the paths and keep them automatically updated. Alternatively, maybe the Go project would accept a patch that exposes the information in some JSON format so the visualizer can call the appropriate Go compiler. But all of that is a project for another day. + +## Extracting Source Code + +The first important step was to figure out the relevant source code that needed to be loaded. This seems a relatively easy thing in concept. It's mainly "Collect the list of lines per source file". However, the gotcha is how to represent the data, and similarly, you probably don't want just the lines but also some of the surrounding code. + +This is the basic structure for representing source: + +![](./0b52083cb5116034.png)Every assembly function can have multiple associated **Source** files due to inlining. Similarly, the code needed from different files isn't contiguous, and you wouldn't want to show more than is required. + +Most of the data munging is: collect all the source lines, convert them into ranges, expand the ranges (for the surrounding context). We also need to do it in reverse: figure out which lines in disassembly correspond to the source code. Note that each source line can correspond to multiple disassembly lines, and they might not be contiguous. + +Once I got it working, I did a debug print of the relevant source lines: + +![](./db316e6fe6d98305.png)# Drawing Code + +I was trying to optimize the code for [Gio UI](https://gioui.org/), so of course, it was a natural choice for building a tool such as this. It has pretty lovely drawing capabilities that I'll need. + +The question was then, how should it be visualized. Compiler Explorer visualization is a great starting point. However, it's not as clear as I would like it to be. When starting the project, I already had a design in mind. There are many source diffing tools that offer visualizing related lines. For example, here is what Meld tool looks like: + +![](./b50757548a5c14fe.png)There are other tools such as Kompare, CodeCompare, Oxygen Compare that offer similar visualization. I really like how it shows how one side is related to the other. To draw the shape, we can use the following idea: + +![](./88a2dddc651c9576.png)*The purple lines show the final relation shape. The orange arrows show bezier curve handles.* + +Drawing the visuals seemed then straightforward: + +1. figure out the location of each line of the source and assembly; +2. draw the relation shape for each line of source and related assembly lines; +3. draw the text on top of the relation shapes. + +One difficult thing people encounter with such projects is: how to choose a random color such that they are distinct, visually pleasing, and code is easy to write. One nice trick I've picked up over time is this formula: + +*hue: index \* phi \* 2 \* PI, saturation: 60%, lightness: 60%* + +You can adjust the saturation and lightness between 50% to 90% to get different lightness and saturation. If you want a more pastel color look, you would use a lower saturation and higher lightness. For dark mode, you would use lightness below 30%. (The color selection assumes that hue is defined with the range 0 .. 2\*PI). 
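To make the formula concrete, here's a minimal Go sketch of the trick (distinctHue and hslToRGB are illustrative names I made up, not lensm's actual code):

```go
package main

import (
	"fmt"
	"math"
)

// distinctHue returns a hue in [0, 2π) for the i-th color. Because φ is
// irrational, sequential indices land far apart on the hue circle and
// never repeat exactly.
func distinctHue(i int) float64 {
	const phi = 1.618033988749895
	return math.Mod(float64(i)*phi*2*math.Pi, 2*math.Pi)
}

// hslToRGB converts hue (radians) plus saturation and lightness in [0, 1]
// to 8-bit RGB using the standard HSL conversion.
func hslToRGB(h, s, l float64) (r, g, b uint8) {
	c := (1 - math.Abs(2*l-1)) * s
	hp := h / (math.Pi / 3) // which 60° sector of the hue circle we're in
	x := c * (1 - math.Abs(math.Mod(hp, 2)-1))
	var rf, gf, bf float64
	switch {
	case hp < 1:
		rf, gf, bf = c, x, 0
	case hp < 2:
		rf, gf, bf = x, c, 0
	case hp < 3:
		rf, gf, bf = 0, c, x
	case hp < 4:
		rf, gf, bf = 0, x, c
	case hp < 5:
		rf, gf, bf = x, 0, c
	default:
		rf, gf, bf = c, 0, x
	}
	m := l - c/2
	return uint8((rf + m) * 255), uint8((gf + m) * 255), uint8((bf + m) * 255)
}

func main() {
	// 60% saturation and lightness gives the pleasant mid-tone palette.
	for i := 0; i < 5; i++ {
		r, g, b := hslToRGB(distinctHue(i), 0.6, 0.6)
		fmt.Printf("#%02x%02x%02x\n", r, g, b)
	}
}
```
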
There are a few variations of the hue selection: + +![](./9cfb54d7e51f9d77.png)As you can see, the 𝜑 = 1.618033988749… constant allows selecting values on a hue circle such that sequential numbers are different and won't repeat. If you want a smoother transition, then using i × 1/𝜑 works a treat. If you want more contrast, then i × 𝜑 × 2𝜋 is nicer. + +Once you put all these ideas together, you get the first output: + +![](./c59c0fe3df775270.png)I also added a small interaction – when you hover the mouse over a line of code, it highlights the relation shape. + +## Drawing Jumps + +The next thing I wanted to visualize was drawing jumps in the code. They are important from a performance perspective. It's relatively common for disassemblers to draw an arrow from the jump location to the destination. This brings up two problems, detecting the jumps, and figuring out how to draw the lines. + +Unfortunately, the objfile library disassembler doesn't expose the information whether the instruction is a jump and when it jumps, then where to. I didn't want to dig too deep into this, so I reached for the usual tool for this – regular expression matching. It seemed that all the jumps ended with a hex number, such as **JMP 0x123**... of course, that approach broke. On arm processors, they look like **BLS 38(PC)**. I added a special case for it for now, but it'll probably break again on some other platform. + +To draw the jumps initially, I just drew them like a stack. In other words, push the jump line to the sidebar when you encounter one and then pop it when it ends. Of course, that didn't look great due to overlapping lines: + +![](./6246319ebda3c85f.png)In some cases it even caused the lines to be on top of each other. I searched for a nice algorithm for drawing them; however, I came up empty. Finally, I decided to go with the following approach, sort all the jump ranges based on their starting and ending point. If multiple ranges start from the same location, the larger range is sorted first. Then divide the sidebar into lanes; every new range picks the first lane that is free – starting from the left. This ends up minimizing crossings. + +![](./0254dd9cb3223a7d.png)It's by no means ideal. It can still draw the jump line too far from the code. + +![](./5513fe38f6ea01e0.png)Or do this thing here: + +![](./41b000780c0ec8d1.png)But, these are things someone will fix some other day. + +# Summary + +After a few days of work, I have a nice tool for viewing disassembly. + +![](./a4cb43cd5163de6e.png)Choosing a name was also a struggle. I wanted it to be easily recognizable and searchable. I asked in Gophers #performance channel and Jan Mercl suggested "lensm," which is "lens" and "asm" smushed together. + +When you look at the code and think: "For a performance-conscious project, it doesn't look very efficient – allocations and suboptimal algorithms everywhere. Also, the code looks very messy." + +That's all true, but the goal was to get it done quickly. And, if I do need to optimize, then I have an extra tool in my toolbelt to optimize it. + +I'll still have a few things I want to add before I can call it sufficiently complete. Nevertheless, it's already functional, so give it a test run at . If you feel like something is missing, then come along for the ride and submit a patch; there have already been a few contributors. 
+ diff --git a/app/(blog)/blog/open-source-and-open-data-storj-dcs-network-statistics/20487c1035ee937f.jpeg b/app/(blog)/blog/open-source-and-open-data-storj-dcs-network-statistics/20487c1035ee937f.jpeg new file mode 100644 index 000000000..464dd99e0 Binary files /dev/null and b/app/(blog)/blog/open-source-and-open-data-storj-dcs-network-statistics/20487c1035ee937f.jpeg differ diff --git a/app/(blog)/blog/open-source-and-open-data-storj-dcs-network-statistics/page.md b/app/(blog)/blog/open-source-and-open-data-storj-dcs-network-statistics/page.md new file mode 100644 index 000000000..9bff2de92 --- /dev/null +++ b/app/(blog)/blog/open-source-and-open-data-storj-dcs-network-statistics/page.md @@ -0,0 +1,84 @@ +--- +author: + name: Brandon Iglesias +date: '2021-08-24 00:00:00' +heroimage: ./20487c1035ee937f.jpeg +layout: blog +metadata: + description: "We recently began publicly exposing more data about the network in\ + \ a way that could be used on-demand and programmatically. If you missed it, we\ + \ have started publishing what we think is the most important network statistics\ + \ on our new Storj DCS Public Network Statistics page. Now, if you\u2019re a non-technical\ + \ person, this may not be what you expected. Here\u2019s an explanation of why\ + \ we took this approach." + title: 'Open Source and Open Data: Storj DCS Network Statistics' +title: 'Open Source and Open Data: Storj DCS Network Statistics' + +--- + +You might often see or hear us reference our company values. The fact of the matter is that our values—including openness, transparency, and empowering our community—are what drives us as a company and as individuals. Our values are our north star, so when faced with decisions or when we find ourselves at a crossroads, we often reexamine the situation through the lens of our company values.  + + +Our company value of Open means we’re committed to the free and open sharing of software, information, knowledge, and ideas. It’s been shown this kind of openness yields better results in the long run—not just for the company but for the industry and community as well. Open source software has been the cornerstone for innovations such as containers and microservices, private web browsing, and new databases that enable other powerful services. This is why [we are committed to open source software](https://www.storj.io/open-source).   + + +Since the launch of Storj DCS, our community has been asking for more statistics and data on the network. Some folks in our community have even found ways of reverse-engineering the network to derive statistics about it. A great example of this ingenuity is [Storj Net Info](https://storjnet.info/). Providing these statistics has always been a goal of ours, but the task has been lower on our priority roadmap list than delivering some other critical features that Storj DCS customers need.  + + +We [recently](https://forum.storj.io/t/publicly-exposed-network-data-official-statistics-from-storj-dcs-satellites/14103) began publicly exposing more data about the network in a way that could be used on-demand and programmatically. If you missed it, we have started publishing what we think is the most important network statistics on our new [Storj DCS Public Network Statistics](https://stats.storjshare.io/) page. Now, if you’re a non-technical person, this may not be what you expected. Here’s an explanation of why we took this approach.  
+

New members of our community often ask why we don't build a service like Dropbox or Google Drive instead of a cloud object storage service like Storj DCS. This is because we're focused on providing the building blocks (the underlying storage layer) for others to build those kinds of applications. By doing this, we can enable dozens of companies to build Dropbox-like services on Storj DCS (or easily migrate their existing applications to the service).

We decided we wanted to take a similar approach with these statistics, so we're exposing the data in JSON format instead of just providing a dashboard for people to view. On this page, you'll find statistics such as the amount of data stored and transferred across the network and information about the Nodes on the network. The data on this page is automatically updated every hour so you can make time-series charts.

You'll also start seeing these statistics appear on various pages across the site, including our homepage and Node Operator page. These pages will be updated every hour when new data is published on the network statistics page.

The data we are exposing includes the following statistics:

### Statistics about stored and transferred data

* **bandwidth\_bytes\_downloaded** - number of bytes downloaded (egress) from the network for the last 30 days
* **bandwidth\_bytes\_uploaded** - number of bytes uploaded (ingress) to the network for the last 30 days
* **storage\_inline\_bytes** - number of bytes stored in inline segments on the Satellite
* **storage\_inline\_segments** - number of segments stored inline on the Satellite
* **storage\_median\_healthy\_pieces\_count** - median number of healthy pieces per segment stored on Storage Nodes
* **storage\_min\_healthy\_pieces\_count** - minimum number of healthy pieces per segment stored on Storage Nodes
* **storage\_remote\_bytes** - number of bytes stored on Storage Nodes (it does not take into account the expansion factor of erasure encoding)
* **storage\_remote\_segments** - number of segments stored on Storage Nodes
* **storage\_remote\_segments\_lost** - number of irreparable segments lost from Storage Nodes
* **storage\_total\_bytes** - total number of bytes (both inline and remote) stored on the network
* **storage\_total\_objects** - total number of objects stored on the network
* **storage\_total\_pieces** - total number of pieces stored on Storage Nodes
* **storage\_total\_segments** - total number of segments stored on Storage Nodes
* **storage\_free\_capacity\_estimate\_bytes** - a statistical estimate of free Storage Node capacity, with suspicious values removed

### Statistics about Storage Nodes

* **active\_nodes** - number of Storage Nodes that were successfully contacted within the last 4 hours, excludes disqualified and exited Nodes
* **disqualified\_nodes** - number of disqualified Storage Nodes
* **exited\_nodes** - number of Storage Nodes that gracefully exited the Satellite, excludes disqualified Nodes
* **offline\_nodes** - number of Storage Nodes that were not successfully contacted within the last four hours, excludes disqualified and exited Nodes
* **suspended\_nodes** - number of suspended Storage Nodes, excludes disqualified and exited Nodes
* **total\_nodes** - total number of unique Storage Nodes that ever contacted the Satellite
* **vetted\_nodes** - number of vetted Storage Nodes, excludes disqualified and exited Nodes
* **full\_nodes** - number of Storage Nodes without free disk space

### Statistics about user accounts
+ +* **registered\_accounts** - number of registered user accounts + +‍ + + +Since we launched this, one of our community members built this really cool [grafana dashboard](https://storjstats.info/d/storj/storj-network-statistics?orgId=1). Check it out. We’ll be sharing more about this and other community-built dashboards in the coming weeks, but we hope that exposing this data will continue to enable others to build amazing things like this! + +‍ +As we continue to expand on the data points we expose, we’ll be adding more of this data to our [website](http://storj.io) as well. If you have any ideas or suggestions on what else we should be exposing, please open a GitHub [issue](https://github.com/storj/stats/issues) in the [repository](https://github.com/storj/stats) for this project. + + + + + diff --git a/app/(blog)/blog/our-3-step-interview-process-for-engineering-candidates/14f72bac7573e10c.png b/app/(blog)/blog/our-3-step-interview-process-for-engineering-candidates/14f72bac7573e10c.png new file mode 100644 index 000000000..cf3198eaf Binary files /dev/null and b/app/(blog)/blog/our-3-step-interview-process-for-engineering-candidates/14f72bac7573e10c.png differ diff --git a/app/(blog)/blog/our-3-step-interview-process-for-engineering-candidates/page.md b/app/(blog)/blog/our-3-step-interview-process-for-engineering-candidates/page.md new file mode 100644 index 000000000..84c4f6459 --- /dev/null +++ b/app/(blog)/blog/our-3-step-interview-process-for-engineering-candidates/page.md @@ -0,0 +1,151 @@ +--- +author: + name: JT Olio +date: '2019-03-25 00:00:00' +heroimage: ./14f72bac7573e10c.png +layout: blog +metadata: + description: In case you hadn't heard, Storj Labs is building a decentralized cloud + object storage service. Why would we do such a challenging thing? At a basic level, + it's because we believe the internet can be better than it currently is and we + see how to improve it. We believe your data is worse off being ... + title: Our 3-Step Interview Process for Engineering Candidates +title: Our 3-Step Interview Process for Engineering Candidates + +--- + +In case you hadn't heard, Storj Labs is building a decentralized cloud object storage service. Why would we do such a challenging thing? At a basic level, it's because we believe the internet can be better than it currently is and we see how to improve it. We believe your data is worse off being stored in the centralized data centers of five multinational mega-companies. + +Solving this grand problem requires us to solve many difficult sub-problems. Even though we are rigidly focusing on the simplest ways to solve these problems, the simplest solutions that can work in our space are still intensely complicated. To build our decentralized service, we need smart ideas and a capable team that isn’t afraid of solving challenges for the first time. If this sounds like fun to you, then you might be a good fit to join us on this adventure! + +Today I want to introduce you to our engineering interview process. + +### Hiring Values + +Before we talk about the details, it's important to list what we value, and what our goals are. We have two primary and co-equal objectives with recruiting: + +* Build the most diverse, inclusive, and welcoming team we can. +* Build the strongest technical team we can. + +We'll talk about being welcoming first. + +#### Diversity and Inclusion + +First off, pursuing a diverse team is the right thing to do, full stop. 
We recognize that the tech industry can be—and has been—an inaccessible place for people of underrepresented groups, so we value gate-opening (versus gate-keeping) wherever we can.

Fortunately, a welcoming, inclusive, diverse team has a number of benefits that make it the clear choice even if it weren't clearly the right thing to do! Monocultures breed all sorts of wacky pathologies, which is precisely the sort of thing we're fighting against technically with our decentralized storage platform!

Did you know that [workforces with more gender, ethnic, or racial diversity tend to perform better financially](https://www.mckinsey.com/business-functions/organization/our-insights/why-diversity-matters)? According to McKinsey:


> Companies in the top quartile for racial and ethnic diversity are 35 percent more likely to have financial returns above their respective national industry medians.

[According to Catalyst](https://www.catalyst.org/knowledge/bottom-line-corporate-performance-and-womens-representation-boards-20042008):


> Companies with sustained high representation of women board directors (WBD), defined as those with three or more WBD in at least four of five years, significantly outperformed those with sustained low representation by 84 percent on return on sales, by 60 percent on return on invested capital, and by 46 percent on return on equity.

When you look at the hard data, it's clear that more diverse teams are more flexible, efficient, and effective. We want those benefits for our team too!

As an aside, this is specifically an interviewing post, but diversity is not the end-goal of the journey. We said diversity and inclusion. Diversity is having many different backgrounds in your organization. Inclusion is having all of us feel welcome and wanting to be here. Literally centuries' worth of company leadership advice talks about how important team morale and a sense of belonging are to all of us as human beings.

So, it's not enough to just get employees through the door. To build a great team, you have to keep them! In a future blog, we'll share more about what we're doing to support inclusion through our new Diversity & Inclusion Council. We'll also share more about what diversity means to us because it's more than just hiring more underrepresented people, such as women or LGBTQ employees.

#### Technical aptitude

To solve the hardest problems in decentralized storage, we absolutely need a world-class team. Great developers love learning, and therefore enjoy and seek out opportunities where they can learn from and grow with others on their team. This means a world-class team tends to attract more world-class developers. Why do so many engineers want to work at, say, AmaGooSoft? It's not only about the money.

Unfortunately, identifying which new recruits are world-class is very hard, and a lot of companies take lazy shortcuts. One such shortcut is hiring based on time-in-the-industry. Don't get me wrong, having more experience in distributed systems is a good thing for us, but requiring that someone has stuck around through the worst of our sometimes-polluted industry has too much of a filtering effect on our other top goal—a diverse and inclusive workplace. So, we need to find a more effective way to identify which candidates have top-shelf problem solving, communication, and programming skills, whether they have years of experience or are new to the industry. All backgrounds are welcome!
+ +In my experience, the best predictor of how successful someone will be at solving complex problems after six months on the job is their learning rate. We're specifically not looking for proficiency in a certain programming language, or knowledge of a specific skill, precisely because learning a language or skill should be the sort of thing our top candidates eat for breakfast. The best candidates are the sort of people who gravitate toward hard and confusing problems to tackle them and understand them, instead of flinching away from the discomfort of not knowing. We are building a decentralized cloud storage platform, after all. + +So, while we value experience, we also want to find candidates that demonstrate they are eager and adept at throwing themselves at hard problems they don't already know how to solve—and solving them. Great candidates are lifelong learners and this factors into our process. + +### Our process + +To achieve our two major values, we've had to make calls on a number of trade-offs. There are definitely downsides to our recruiting and interviewing process, but we feel that on-balance, the trade-offs are currently worth it. We're open to suggestions though! We’re constantly trying to learn and make improvements, and, considering interviewing ironically seems to have its own [Full Employment Theorem](https://en.wikipedia.org/wiki/Full_employment_theorem), we will probably be on this journey indefinitely. (Again, we believe in lifelong learning.) Given our values, if you can think of a way to help improve our process, please leave a comment or shoot us an [email](mailto:ask@storj.io)! + +#### Recruiting + +The recruiting stage of our pipeline is simply finding people who may want to come work for us. + +In an interview with Christiane Amanpour, Jon Stewart had this to say about diversity in comedy: + + +> It started out as a male-dominated field. It's not a particularly welcoming field—you sort of have to come out there and cut your teeth on it. I'll tell you a story: So we had, on The Daily Show, there was an article about us that said it was a sexist environment, we didn't have women writers. And I got very offended by that. I was very mad. I was like, "Are you saying I'm not a feminist?" I was raised by a single mother. She wore a T-shirt that said, "A woman needs a man like a fish needs a bicycle." And me and my brother were like, "I think we might be men?" So I was mad—how can they say such a thing? And I went back to the writers room, and I was like, "You believe this, Steve? What do you think, Greg? Dave? Tom? Mike?" And then I was like, Oooohhh. And it was right. + +But the reason it was right was not necessarily one that we had seen before. We had put in a system of getting writers where there were no names on it. We thought that's color-blind, gender-blind, et cetera. But what you don't realize is the system itself—the tributaries that feed us those submissions—is polluted itself… + + +> + +But do you see what I mean? It's a systemic issue, and I think what can mostly help change is when you open up new tributaries to bring in talent, and then they grow, and then they help grow their communities and tell their stories. + +Does this sound like the tech industry to you? It's not enough to simply say that you encourage candidates who might add diversity to your team to apply. You must find inputs to your recruiting funnel that represent greater diversity than the diversity in your field. It takes extra energy and effort. 
+ +Like The Daily Show, the main technical evaluation stage of our interview process is name-blind. We are interested in hiring the best candidates based as much as possible on relevant qualifications alone. But Jon Stewart’s observation is a key insight to our recruiting process: if we only have a certain type of candidate, then all hires will be that same type of candidate! So while we don’t have quotas, we must continually analyze our incoming funnel of candidates to determine whether we are finding candidates with diverse backgrounds and experience. We are of course interested in hiring the most qualified candidate regardless of any other trait, but we also believe it is worth doing significant extra work to help build a diverse and inclusive workforce, which we believe will help Storj Labs be the best company it can be. This means that if our incoming pool of candidates does not represent these values, then we need to cast a wider net in our search. We reject the idea that diversity efforts are about lowering the bar--increasing diversity requires that we widen the net and include more people so we can raise the bar. + +We will continue to seek ways to improve our recruiting, and we welcome your thoughts on ways to make sure that everyone has an equal shot with Storj Labs. Again, if you have a suggestion on how we can improve, please let us know. + +#### Screening + +Each stage of our interview process takes work, so we want to quickly eliminate people from the process who aren't going to be the right fit for that position. + +In our screening stage, we want to ensure there won't be some sort of logistical problem. Is the candidate requesting more compensation than we have budgeted for the position? Is the candidate applying for the right job listing? Is the candidate lacking relevant job experience entirely? Where does the candidate expect to work, and will it require a relocation? Does the candidate seem communicative and not a jerk? Can we answer any questions? + +Hopefully most people sail on through this step. + +#### Name-blind homework problem + +This part of our interview process is the biggest, most time consuming, and potentially the most controversial, precisely because of how many trade-off decisions it makes. So, before we jump into what the downsides are, and why we do it anyway, let me just outline what we do. + +First, we invite candidates to an interview-specific Slack channel with a randomly chosen pseudonym. Originally we chose random animal species as part of the pseudonym, but we then switched to names of stars, because you're all stars! Interview candidates join the Slack channel to anonymously talk with the team. + +Second, the hiring manager posts a link to a homework problem in the channel. We are trying to select a problem that: + +1. is clear and concise, but with deep complexity +2. is fairly representative of day-to-day work +3. is considerably challenging, but preferably not due to the requirement of much additional existing knowledge +4. requires design discussion and architectural considerations prior to implementation +5. can be completed by our target candidates in under eight hours + +We don't set a deadline on the assignment because we want to be flexible with candidates' schedules. + +Third, we answer any questions the candidate has, discussing potential solutions and tradeoffs, and let the candidate get to work. + +Finally, we run the homework submission against tests and evaluate their assignment's code against a checklist. 
We pay $500 USD (in STORJ tokens) for problem solution submissions (whether or not they work completely). + +So, why do we do it this way? + +First, we want our interview to hinge on evaluating a candidate’s ability to do work in as real an environment as possible. We want them to use their IDE, their programming language, have access to documentation, relieve time pressure (if possible), and see how they communicate remotely (much of our own team is remote). We do our best to engage our candidates in a discussion about the problem; the Slack conversation is a big part of what we're looking for. So, we want to evaluate the candidate on work that is as much like real work as possible. + +Second, we use pseudonyms to let the candidate’s work stand for itself. At this point, we want this stage of our interview process to simply select the best possible candidates, independent of as many other factors as possible. This is how we use our focus on diversity to raise the bar—by including a wider pool of applicants, we can be even more selective at this stage. + +Third, we use a hard problem that isn’t completely specified. Just like real tickets we unfortunately file sometimes, the problem statement has assumptions that aren’t clear and require some level of additional requirements gathering and clarification. Just like international math tests, we want a problem that most people won’t ace. The greater the fidelity of the test, the more an excellent candidate will stand out. This, combined with our lack of specific experience criteria, is intended to allow inexperienced but sharp candidates the ability to shine. + +Fourth, we pay people. The major downside of our problem is it takes time, potentially time the candidate doesn't have. This is something we're pretty torn about. Unfortunately, a homework problem might eliminate people with busier home lives or who need to work elsewhere until they land the job with us, which is why we don’t set deadlines on the assignments. It's challenging to compress our hard problem into the span of an hour, but we're hopeful that compensating our candidates will help. + +Fifth, we try to grade as evenly and as routinely as possible. We have a checklist and we have a test harness. Even though we're already doing the interview at this stage name-blind, we still want to avoid as much bias as possible that might cause us to prefer anything besides what is explicitly stated as criteria in the homework problem description. + +#### Alternate option: name-blind technical work sample + +So, you’re probably thinking, “Some of the best candidates will most certainly balk at this huge hurdle you’re placing in their way.” To this we can only say, you’re right. Ultimately, we believe two things that make us pursue it anyway: we would rather have a process that occasionally rejects candidates that would have been a good fit than one that occasionally hires candidates that would have been a bad fit, and we believe great candidates are often just as interested in working on great teams (with stringent hiring criteria) as great teams are interested in hiring them. Many of our fantastic team members were more than willing to jump through this hurdle to join us! + +However, we do make a concession. For candidates that simply do not have the time for the assignment and will be forced to pass on a sweet job possibility with Storj Labs otherwise, we are happy to consider some other sample of work. 
If the candidate is able to provide us with some code sample they have created with sufficient complexity to be graded using our homework grading evaluation checklist, we will anonymize it and pass it to our review team. We will ask the candidate to explain the project to us in the anonymous Slack channel, so that we get a feel for how the candidate communicates. Of course, we expect that any candidates using this option will make sure that the sample is something that they have permission to share with us--please do not send us some third party’s intellectual property. Remember, you may not have the right to share a work sample with us even though you created the sample. Your work for another company likely belongs to that Company. We will reject any candidate who sends us third party IP that they do not have permission to share, or another person’s work as if it was their own. + +It’s a bit harder to grade these types of submissions evenly and fairly, so we are more picky with these types of submissions. + +#### Alternate option: white board interview + +If the rest of this post has failed to convince you of the benefits of our system, that’s okay! We are also willing to do a whiteboard interview with difficult algorithmic challenges. Let’s just say these won’t be [Fizz Buzz](https://blog.codinghorror.com/why-cant-programmers-program/)-style questions. These will be hard questions, and if you want to go this route (it is potentially the least time consuming), then we will be looking for you to dazzle us. We only leave this option available to be as flexible as possible, but be warned that we are the most picky for people who choose this option. We will certainly pass on some good candidates who choose this option. + +#### Team interview + +Our last phase is a team interview. This is just as much an opportunity for us as it is for the interview candidate. We want you to get to know the team! We want bidirectional question asking. Interview candidates should grill us on anything and everything (if they want). Otherwise, we’ll be asking performance-based interviewing questions. Sidenote: the U.S. Department of Veterans Affairs of all places has [the best collection of these we’ve found so far](https://www.va.gov/pbi/questions.asp)! + +Assuming no one finds any red flags in the team interview, the formal interview process is complete. We may have a few more follow-up questions, and you might as well. Sometimes, we must be forced to make hard choices between well-qualified candidates. But ideally, for candidates who make it this far, it’s time for onboarding, which is worth a separate blog post. + +### Final thoughts + +We’ve spent a lot of time thinking about and tweaking this process. Aside from the time our homework assignment takes to complete and grade, we feel like this is one of the better interview processes we’ve seen. Please [let us know](mailto:ask@storj.io) how we can improve! 
+

diff --git a/app/(blog)/blog/production-concurrency/df0a3a1834547b61.jpeg b/app/(blog)/blog/production-concurrency/df0a3a1834547b61.jpeg
new file mode 100644
index 000000000..2bff07d05
Binary files /dev/null and b/app/(blog)/blog/production-concurrency/df0a3a1834547b61.jpeg differ
diff --git a/app/(blog)/blog/production-concurrency/page.md b/app/(blog)/blog/production-concurrency/page.md
new file mode 100644
index 000000000..45df3f44d
--- /dev/null
+++ b/app/(blog)/blog/production-concurrency/page.md
@@ -0,0 +1,1090 @@
+---
+author:
+  name: Egon Elbre
+date: '2022-07-29 00:00:00'
+heroimage: ./df0a3a1834547b61.jpeg
+layout: blog
+metadata:
+  description: Concurrency is one of those things that's easy to get wrong, even with Go concurrency features. Let's review things you should consider while writing concurrent production code.
+  title: Production Ready Go Concurrency
+title: Production Ready Go Concurrency

---

Concurrency is one of those things that's easy to get wrong, even with Go concurrency features. Let's review things you should consider while writing concurrent production code.

The guide is split into three parts, each with a different purpose. First, we'll talk about "Rules of Thumb," which are usually the right thing to do. The second part covers what to use for writing concurrent code. And finally, we'll cover how to write your own custom concurrency primitives.

Before we start, I should mention that many of these recommendations have conditions where they are not the best choice. The main exceptions are going to be performance optimization and prototyping.

### Avoid Concurrency

I've seen many times people using concurrency where they should not use it. It should go without saying: don't add concurrency unless you have a good reason.


```go
var wg sync.WaitGroup

wg.Add(1)
go serve(&wg)
wg.Wait()
```
❯


```go
serve()
```
The concurrency here is entirely unnecessary, but I've seen this exact code in a repository. A system without concurrency is much easier to debug, test, and understand.

People also add concurrency because they think it will speed up their program. In a production environment, you are handling many concurrent requests anyway, so making one part concurrent doesn't necessarily make the whole system faster.

### Prefer Synchronous API

A friend of the previous rule is to prefer a synchronous API. As mentioned, non-concurrent code is usually shorter and easier to test and debug.


```go
server.Start(ctx)
server.Stop()
server.Wait()
```
❯


```go
server.Run(ctx)
```
If you need concurrency when using something, it's relatively easy to make a synchronous API concurrent. It's much more difficult to do the reverse.

### Use -race and t.Parallel()

There are two excellent Go features that help you shake concurrency bugs out of your code.

The first is -race, which enables the race detector to flag all the observed data races. It can be used with go test -race ./... or go build -race ./yourproject. See [Data Race Detector](https://go.dev/doc/articles/race_detector) for more details.

Second, mark your tests with t.Parallel():


```go
func TestServer(t *testing.T) {
    t.Parallel()
    // ...
```
This makes your tests run in parallel, which can speed them up, but it also means you are more likely to find hidden shared state that doesn't work correctly in concurrent code. In addition to finding bugs in our codebases, we've also found them in third-party libraries.
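As a minimal illustration (mine, not from the original post), here's the kind of hidden shared state this combination flushes out: two parallel tests writing to the same unsynchronized map. A plain go test run usually passes quietly; go test -race reports the race immediately:

```go
package example

import "testing"

var counters = map[string]int{} // innocent-looking shared state

func TestAlpha(t *testing.T) {
    t.Parallel()
    counters["alpha"]++ // concurrent map write once TestBeta runs in parallel
}

func TestBeta(t *testing.T) {
    t.Parallel()
    counters["beta"]++ // -race reports the write/write race here
}
```
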
+

### No global variables

Avoid global variables such as caches, loggers, and databases.

For example, it's relatively common for people to use log.Println inside their service, and then their testing output ends up in the wrong location.


```go
func TestAlpha(t *testing.T) {
    t.Parallel()
    log.Println("Alpha")
}

func TestBeta(t *testing.T) {
    t.Parallel()
    log.Println("Beta")
}
```
The output from go test -v will look like:


```
=== RUN TestAlpha
=== PAUSE TestAlpha
=== RUN TestBeta
=== PAUSE TestBeta
=== CONT TestAlpha
=== CONT TestBeta
2022/07/24 10:59:06 Alpha
--- PASS: TestAlpha (0.00s)
2022/07/24 10:59:06 Beta
--- PASS: TestBeta (0.00s)
PASS
ok test.test 0.213s
```
Notice how the "Alpha" and "Beta" are out of place. The code under test should call t.Log for any testing needs; then the log lines will appear in the correct location. There's no way to make that work with a global logger.

### Know when things stop

Similarly, it's relatively common for people to start goroutines without waiting for them to finish. The *go* keyword makes starting goroutines very easy; however, it's not apparent that you also must wait for them to stop.


```go
go ListenHTTP(ctx)
go ListenGRPC(ctx)
go ListenDebugServer(ctx)
select{}
```
❯


```go
g, ctx := errgroup.WithContext(ctx)
g.Go(func() error {
    return ListenHTTP(ctx)
})
g.Go(func() error {
    return ListenGRPC(ctx)
})
g.Go(func() error {
    return ListenDebugServer(ctx)
})
err := g.Wait()
```
When you don't know when things stop, you don't know when to close your connections, databases, or log files. For example, some stray goroutine might use a closed database and cause a panic.

Similarly, when you wait for all goroutines to finish, you can detect scenarios where one of the goroutines has become indefinitely blocked.

### Context-aware code

The next common issue is not handling context cancellation. It usually won't be a problem in the production system itself; it's more of an annoyance during testing and development. Let's imagine you have a time.Sleep somewhere in your code:


```go
time.Sleep(time.Minute)
```
❯


```go
tick := time.NewTimer(time.Minute)
defer tick.Stop()

select {
case <-tick.C:
case <-ctx.Done():
    return ctx.Err()
}
```
time.Sleep cannot react to anything, which means that when you press Ctrl-C on your keyboard, it will stay on that line until it finishes. This can increase your test times due to some services shutting down slowly. Or, when doing upgrades on your servers, it can make them much slower to shut down.

*The code for the waiting on the right is much longer, but we can write helpers to simplify it.*

The other scenario where this cancellation comes up is long calculations:


```go
for _, f := range files {
    data, err := os.ReadFile(f)
    // ...
}
```
❯


```go
for _, f := range files {
    if err := ctx.Err(); err != nil {
        return err
    }

    data, err := os.ReadFile(f)
    // ...
}
```
Here we can introduce a ctx.Err() call to check whether the context has been cancelled. Note that the ctx.Err() call is guaranteed to be concurrency safe, and it's not necessary to check ctx.Done() separately.

### No worker pools

People coming from other languages often resort to creating worker pools. It's one of those tools that's necessary when you are working with threads instead of goroutines.

There are many reasons not to use worker pools:

* They make stack traces harder to read.
You'll end up having hundreds of goroutines that are on standby.
* They use resources even if they are not working.
* They can be slower than spawning a new goroutine.

You can replace your worker pools with a goroutine limiter -- something that disallows creating more than N goroutines.


```go
var wg sync.WaitGroup
defer wg.Wait()
queue := make(chan string, 8)
for k := 0; k < 8; k++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for work := range queue {
            process(work)
        }
    }()
}

for _, work := range items {
    queue <- work
}
close(queue)
```
❯


```go
var wg sync.WaitGroup
defer wg.Wait()
limiter := make(chan struct{}, 8)
for _, work := range items {
    work := work
    wg.Add(1)
    limiter <- struct{}{}
    go func() {
        defer wg.Done()
        defer func() { <-limiter }()

        process(work)
    }()
}
```
We'll later show how to make a limiter primitive that's easier to use.

### No polling

Polling another system is rather wasteful of resources. It's usually better to use some channel or signal to message the other side:


```go
lastKnown := 0
for {
    time.Sleep(time.Second)
    t.mu.Lock()
    if lastKnown != t.current {
        process(t.current)
        lastKnown = t.current
    }
    t.mu.Unlock()
}
```
❯


```go
lastKnown := 0
for newState := range t.updates {
    if lastKnown != newState {
        process(newState)
        lastKnown = newState
    }
}
```
Polling wastes resources when the update rates are slow. It also responds to changes more slowly than being notified directly. There are many ways to avoid polling, which could be a separate article altogether.

*Of course, if you are making an external request and the external API is out of your control, you might not have any other choice than to poll.*

### Defer unlocks and waits

It's easy to forget an mu.Unlock, wg.Wait or close(ch). If you always defer them, it will be much easier to see when they are missing.


```go
for _, item := range items {
    service.mu.Lock()
    service.process(item)
    service.mu.Unlock()
}
```
❯


```go
for _, item := range items {
    func() {
        service.mu.Lock()
        defer service.mu.Unlock()

        service.process(item)
    }()
}
```
Even if your initial code is correct, a later code modification can introduce a bug. For example, adding a return inside the loop after the mu.Lock() would leave the mutex locked.

### Don’t expose your locks

The larger the scope where the locks can be used, the easier it is to make a mistake.


```go
type Set[T any] struct {
    sync.Mutex
    Items []T
}
```
❯


```go
type Set[T any] struct {
    mu    sync.Mutex
    items []T
}
```
### Name your goroutines

You can make your debugging and stack traces much nicer by adding names to your goroutines:


```go
labels := pprof.Labels("server", "grpc")
pprof.Do(ctx, labels,
    func(ctx context.Context) {
        // ...
    })
```
There's an excellent article "[Profiler labels in Go](https://rakyll.org/profiler-labels/)", which explains how to use them.

## Concurrency Primitives

When it comes to writing production code, it's a bad idea to use the low-level concurrency primitives directly in your code. They can be error-prone and make code much harder to reason about.

When choosing primitives, prefer them in this order:

1. no-concurrency
2. golang.org/x/sync/errgroup, golang.org/x/sync, sync.Once
3. custom primitive or another library
4. sync.Mutex in certain scenarios
5. select {

However, many others are useful when used for implementing your custom primitives:

5. 
sync.Map, sync.Pool (use a typesafe wrapper)
7. sync.WaitGroup
8. chan, go func() {
9. sync.Mutex, sync.Cond
10. sync/atomic

If you are surprised that chan and go func() { are so low on the list, we'll show how people make tiny mistakes with them.

### Common Mistake #1: go func()


```go
func (server *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	...
	// start an async operation
	go func() {
		res, err := server.db.ExecContext(r.Context(), "INSERT ...")
		...
	}()
	...
}

func main() {
	...

	db, err := openDB(ctx)
	defer db.Close()

	err := server.Run(ctx)
	...
}
```
Notice there's no guarantee that the goroutine finishes before the database is closed. This can introduce weird test failures, where you try to insert into a closed database.

There's a second bug here as well: r.Context() could be cancelled prematurely. Of course, this depends on the problem specifics, but when you start a background operation from your handler, you usually don't want the user to be able to cancel it.

### Primitive: sync.WaitGroup

One of the solutions for starting goroutines is to use sync.WaitGroup. However, it also has quite a few problematic scenarios.

Let's take a look at the first common mistake with sync.WaitGroup:


```go
func processConcurrently(items []*Item) {
	var wg sync.WaitGroup
	defer wg.Wait()
	for _, item := range items {
		item := item
		go func() {
			process(&wg, item)
		}()
	}
}

func process(wg *sync.WaitGroup, item *Item) {
	wg.Add(1)
	defer wg.Done()

	...
}
```
Here the problem is that processConcurrently can return before wg.Add is called. This means that we don't wait for all the goroutines to finish.

The other scenario comes up when people incrementally change code:


```go
func processConcurrently(items []*Item) {
	var wg sync.WaitGroup
	wg.Add(len(items))
	defer wg.Wait()
	for _, item := range items {
		item := item
		if filepath.Ext(item.Path) != ".go" {
			continue
		}
		go func() {
			defer wg.Done()
			process(item)
		}()
	}
}
```
Notice how we moved the calls to wg.Add and wg.Done out of process, making the concurrency easier to track. However, due to the extra if filepath.Ext check, the code is wrong: whenever an item is skipped, wg.Done is never called for it, so wg.Wait blocks forever. That check was probably added by someone else at a later time, and it's one of those cases where tests might easily miss the problem.

To fully fix the code, it should look like this:


```go
func processConcurrently(items []*Item) {
	var wg sync.WaitGroup
	defer wg.Wait()
	for _, item := range items {
		item := item
		if filepath.Ext(item.Path) != ".go" {
			continue
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			process(item)
		}()
	}
}
```
If you don't see the following parts together when someone is using sync.WaitGroup, then the code probably has a subtle error somewhere:


```go
var wg sync.WaitGroup
defer wg.Wait()
...
for ... {
	wg.Add(1)
	go func() {
		defer wg.Done()
```
### Use golang.org/x/sync/errgroup

Instead of sync.WaitGroup there's a better alternative that avoids many of these issues:


```go
func processConcurrently(items []*Item) error {
	var g errgroup.Group
	for _, item := range items {
		item := item
		if filepath.Ext(item.Path) != ".go" {
			continue
		}
		g.Go(func() error {
			return process(item)
		})
	}
	return g.Wait()
}
```
errgroup.Group can be used in two ways:


```go
// on failure, waits for the other goroutines
// to stop on their own
var g errgroup.Group
g.Go(func() error {
	return publicServer.Run(ctx)
})
g.Go(func() error {
	return grpcServer.Run(ctx)
})
err := g.Wait()
```

```go
// on failure, cancels the other goroutines
g, ctx := errgroup.WithContext(ctx)
g.Go(func() error {
	return publicServer.Run(ctx)
})
g.Go(func() error {
	return grpcServer.Run(ctx)
})
err := g.Wait()
```
You can read the [golang.org/x/sync/errgroup documentation](https://pkg.go.dev/golang.org/x/sync/errgroup#Group) for additional information. *Note: errgroup also allows limiting the number of goroutines that can be started concurrently.*

### Primitive: sync.Mutex

Mutex is definitely a useful primitive; however, you should be careful with how you use it. I've quite often seen code that looks like this:


```go
func (cache *Cache) Add(ctx context.Context, key, value string) {
	cache.mu.Lock()
	defer cache.mu.Unlock()

	cache.evictOldItems()
	cache.items[key] = entry{
		expires: time.Now().Add(time.Second),
		value:   value,
	}
}
```
You might wonder what the problem here is; it's appropriately locking and unlocking. The main problem is the call to cache.evictOldItems and the fact that the method doesn't handle context cancellation. This means that requests can pile up behind cache.mu.Lock, and even when they are cancelled, they still need to wait for the mutex to be released before they can return.

Instead, you can use a chan \*state, which allows you to handle context cancellation properly:


```go
type Cache struct {
	state chan *state
}

func NewCache() *Cache {
	content := make(chan *state, 1)
	content <- &state{}
	return &Cache{state: content}
}

func (cache *Cache) Add(ctx context.Context, key, value string) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case state := <-cache.state:
		defer func() { cache.state <- state }()

		state.evictOldItems()
		state.items[key] = entry{
			expires: time.Now().Add(time.Second),
			value:   value,
		}

		return nil
	}
}
```
Even though the evictOldItems call is still there, it no longer prevents other callers of Add from cancelling their requests.

Use sync.Mutex only for cases where you need to hold the lock for a short duration. Roughly, that means the code holding the lock is O(N) or better, with a small N.

#### Primitive: sync.RWMutex

sync.RWMutex has all the same problems as sync.Mutex. However, it can also be significantly slower, and it makes it easy to introduce data races by writing to variables while holding only the read lock.

For your specific scenario, you should have benchmarks demonstrating that sync.RWMutex is actually faster than sync.Mutex.

*Details: When there are a lot of readers and no writers, there's cache contention between the readers, because taking a read lock mutates the mutex, which is not scalable.
A writer attempting to grab the lock blocks future readers from acquiring it, so long-lived readers with infrequent writers can cause long periods where no work gets done.*

Either way, you should be able to demonstrate that your use of sync.RWMutex is helpful.

### Primitive: chan

Channels are valuable things in the Go language, but they are also error-prone. There are many ways to write bugs with them:


```go
const workerCount = 100

var wg sync.WaitGroup
workQueue := make(chan *Item)
defer wg.Wait()

for i := 0; i < workerCount; i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		for item := range workQueue {
			process(item)
		}
	}()
}

// bug: workQueue is never closed, so the workers
// never exit and wg.Wait blocks forever
err := db.IterateItems(ctx, func(item *Item) {
	workQueue <- item
})
```
This is probably one of the most common ones... forgetting to close the channel. Channels also make the code harder to review compared to using higher-level primitives.

Using chan for communicating between different "goroutine processes" in your application is fine; however, ensure that you handle context cancellation and shut down properly. Otherwise, it's easy to introduce a deadlock.

### A few additional rules of thumb

I've come to the conclusion that you should avoid these in your domain logic:

* make(chan X, N)
* go func()
* sync.WaitGroup

They are error-prone, and there are better approaches. It's clearer to write your own higher-level abstraction for your domain logic. Of course, having them isn't an "end-of-the-world" issue either.

I should separately note that using "select" is usually fine.

## Your own artisanal concurrency primitives

I told you to avoid many things in domain code, so what should you do instead?

If you cannot find an appropriate primitive from golang.org/x/sync or other popular libraries... you can write your own.


> Writing a separate concurrency primitive is easier to get right than writing ad hoc concurrency logic in domain code.

There are many ways to write such primitives. The following are merely examples of different ways to write them.

### Sleeping

Let's take a basic thing first: sleeping a bit.


```go
func Sleep(ctx context.Context, duration time.Duration) error {
	t := time.NewTimer(duration)
	defer t.Stop()

	select {
	case <-t.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```
Here we need to ensure that we react appropriately to context cancellation, so that we don't keep waiting long after the context has cancelled the operation. Using this helper is not much more verbose than time.Sleep itself:


```go
if err := Sleep(ctx, time.Second); err != nil {
	return err
}
```
### Limiter

I've found plenty of cases where you must limit the number of goroutines.


```go
type Limiter struct {
	limit   chan struct{}
	working sync.WaitGroup
}

func NewLimiter(n int) *Limiter {
	return &Limiter{limit: make(chan struct{}, n)}
}

func (lim *Limiter) Go(ctx context.Context, fn func()) bool {
	// ensure that we aren't trying to start when the
	// context has been cancelled.
	if ctx.Err() != nil {
		return false
	}

	// wait until we can start a goroutine:
	select {
	case lim.limit <- struct{}{}:
	case <-ctx.Done():
		// maybe the user got tired of waiting?
		return false
	}

	lim.working.Add(1)
	go func() {
		defer func() {
			<-lim.limit
			lim.working.Done()
		}()

		fn()
	}()

	return true
}

func (lim *Limiter) Wait() {
	lim.working.Wait()
}
```
This primitive is used the same way as errgroup.Group:


```go
lim := NewLimiter(8)
defer lim.Wait()
for _, item := range items {
	item := item
	started := lim.Go(ctx, func() {
		process(item)
	})
	if !started {
		return ctx.Err()
	}
}
```
Of course, if your limited goroutines depend on each other, this can introduce a deadlock.

*Also note that there's a potential "bug" with using such a Limiter: you must not call limiter.Go after you have called limiter.Wait; otherwise, a goroutine can be started after limiter.Wait has returned. This can also happen with sync.WaitGroup and errgroup.Group. One way to avoid the problem is to disallow starting goroutines after limiter.Wait has been called. It probably makes sense to rename the method to "limiter.Close" in that case.*

#### Batch processing a slice

Let's say you want to process a slice concurrently. We can use this limiter to start multiple goroutines with a specified batch size:


```go
type Parallel struct {
	Concurrency int
	BatchSize   int
}

func (p Parallel) Process(ctx context.Context,
	n int, process func(low, high int)) error {

	// alternatively, these panics could set a default value
	if p.Concurrency <= 0 {
		panic("concurrency must be larger than zero")
	}
	if p.BatchSize <= 0 {
		panic("batch size must be larger than zero")
	}

	lim := NewLimiter(p.Concurrency)
	defer lim.Wait()

	for low := 0; low < n; low += p.BatchSize {
		low, high := low, low+p.BatchSize
		if high > n {
			high = n
		}

		started := lim.Go(ctx, func() {
			process(low, high)
		})
		if !started {
			return ctx.Err()
		}
	}

	return nil
}
```
This primitive lets us hide the "goroutine management" from our domain code:


```go
var mu sync.Mutex
total := 0

err := Parallel{
	Concurrency: 8,
	BatchSize:   256,
}.Process(ctx, len(items), func(low, high int) {
	price := 0
	for _, item := range items[low:high] {
		price += item.Price
	}

	mu.Lock()
	defer mu.Unlock()
	total += price
})
```
### Running a few things concurrently

Sometimes, for testing, you need to start multiple goroutines and wait for all of them to complete. You can use errgroup for it; however, we can write a utility that makes it shorter:


```go
func Concurrently(fns ...func() error) error {
	var g errgroup.Group
	for _, fn := range fns {
		g.Go(fn)
	}
	return g.Wait()
}
```
A test can use it this way:


```go
err := Concurrently(
	func() error {
		if v := cache.Get(123); v != nil {
			return errors.New("expected value for 123")
		}
		return nil
	},
	func() error {
		if v := cache.Get(256); v != nil {
			return errors.New("expected value for 256")
		}
		return nil
	},
)
if err != nil {
	t.Fatal(err)
}
```
There are many variations of this. Should the function take ctx as an argument and pass it to the child goroutines? Should it cancel all the other functions via context cancellation when one of them fails?

### Waiting for a thing

Sometimes you want different goroutines to wait for one another:


```go
type Fence struct {
	create  sync.Once
	release sync.Once
	wait    chan struct{}
}

// init allows the struct to be used without separate initialization.
func (f *Fence) init() {
	f.create.Do(func() {
		f.wait = make(chan struct{})
	})
}

// Release releases any waiting goroutines.
func (f *Fence) Release() {
	f.init()
	f.release.Do(func() {
		close(f.wait)
	})
}

// Released allows writing a different select than
// `Fence.Wait` provides.
func (f *Fence) Released() chan struct{} {
	f.init()
	return f.wait
}

// Wait waits for the fence to be released, taking
// context cancellation into account.
func (f *Fence) Wait(ctx context.Context) error {
	f.init()
	select {
	case <-f.Released():
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```
When we use it together with Concurrently, we can write code that looks like this:


```go
var loaded Fence
var data map[string]int

err := Concurrently(
	func() error {
		defer loaded.Release()
		data = getData(ctx, url)
		return nil
	},
	func() error {
		if err := loaded.Wait(ctx); err != nil {
			return err
		}
		return saveToCache(data)
	},
	func() error {
		if err := loaded.Wait(ctx); err != nil {
			return err
		}
		return processData(data)
	},
)
```
### Protecting State

Similarly, we quite often need to protect state that is being modified concurrently. We've seen how sync.Mutex is sometimes error-prone and doesn't consider context cancellation. Let's write a helper for such a scenario.


```go
type Locked[T any] struct {
	state chan *T
}

func NewLocked[T any](initial *T) *Locked[T] {
	s := &Locked[T]{}
	s.state = make(chan *T, 1)
	s.state <- initial
	return s
}

func (s *Locked[T]) Modify(ctx context.Context, fn func(*T) error) error {
	if ctx.Err() != nil {
		return ctx.Err()
	}

	select {
	case state := <-s.state:
		defer func() { s.state <- state }()
		return fn(state)
	case <-ctx.Done():
		return ctx.Err()
	}
}
```
Then we can use it like this:


```go
state := NewLocked(&State{Value: 123})
err := state.Modify(ctx, func(state *State) error {
	state.Value = 256
	return nil
})
```
### Async processes in a server

Finally, let's take a scenario where we want to start background goroutines inside a server.

Let's first write out the server code the way we would like to use the primitive:


```go
func (server *Server) Run(ctx context.Context) error {
	server.pending = NewJobs(ctx)
	defer server.pending.Wait()

	return server.listenAndServe(ctx)
}

func (server *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	...

	started := server.pending.Go(r.Context(),
		func(ctx context.Context) {
			err := server.db.ExecContext(ctx, "INSERT ...")
			...
		})
	if !started {
		if r.Context().Err() != nil {
			http.Error(w, "client closed request", 499)
			return
		}
		http.Error(w, "shutting down", http.StatusServiceUnavailable)
		return
	}

	...
}
```
Then let's write the primitive:


```go
type Jobs struct {
	root  context.Context
	group errgroup.Group
}

func NewJobs(root context.Context) *Jobs {
	return &Jobs{root: root}
}

func (jobs *Jobs) Wait() { _ = jobs.group.Wait() }

func (jobs *Jobs) Go(requestCtx context.Context, fn func(ctx context.Context)) bool {
	// did the user cancel?
	if requestCtx.Err() != nil {
		return false
	}
	// let's check whether the server is shutting down
	if jobs.root.Err() != nil {
		return false
	}

	jobs.group.Go(func() error {
		// Note: we use the root context and not the request context.
		fn(jobs.root)
		return nil
	})

	return true
}
```
Of course, we can add a limiter to prevent too many background workers from being started:


```go
type Jobs struct {
	root  context.Context
	limit chan struct{}
	group errgroup.Group
}

func (jobs *Jobs) Go(requestCtx context.Context, fn func(ctx context.Context)) bool {
	if requestCtx.Err() != nil || jobs.root.Err() != nil {
		return false
	}
	select {
	case <-requestCtx.Done():
		return false
	case <-jobs.root.Done():
		return false
	case jobs.limit <- struct{}{}:
	}

	jobs.group.Go(func() error {
		defer func() { <-jobs.limit }()
		fn(jobs.root)
		return nil
	})

	return true
}
```
### Exercise: Retrying with backoff

As a final exercise for the reader, you can try implementing a retry with backoff. The API for such a primitive can look like this:


```go
const (
	maxRetries = 10
	minWait    = time.Second / 10
	maxWait    = time.Second
)

retry := NewRetry(maxRetries, minWait, maxWait)
for retry.Next(ctx) {
	...
}
if retry.Err() != nil {
	return retry.Err()
}
```
Alternatively, it can be callback-based:


```go
err := Retry(ctx, maxRetries, minWait, maxWait,
	func(ctx context.Context) error {
		...
	})
```
Additionally, consider where one would be better than the other. (One possible sketch of the callback-based variant is included at the end of this post.)

## Additional resources

There are many resources that can help you delve deeper.

You can find quite a lot of **our own custom primitives** at [**storj.io/common/sync2**](https://pkg.go.dev/storj.io/common/sync2). This package contains most of our synchronization primitives. It contains things like *Sleep* and *Concurrently*, but also more advanced things like *Cycle*, *ReadCache* and *Throttle*. We also have problem-specific implementations of [**Combiner**](https://github.com/storj/storj/blob/main/satellite/metainfo/piecedeletion/combiner.go#L15) and [**Queue**](https://github.com/storj/storj/blob/6df867bb3d06240da139de145aaf88077572b4b8/satellite/metainfo/piecedeletion/queue.go#L10) that together implement a combiner queue. This primitive dials storage nodes and coalesces multiple deletion requests into a single request.

One of the best talks about Go concurrency is "[**Rethinking Classical Concurrency Patterns**](https://www.youtube.com/watch?v=5zXAHh5tJqQ)" by **Bryan C. Mills**. He discusses problems with worker pools and sync.Cond in depth.

When you struggle with understanding data races, the "[**Little Book of Semaphores**](https://greenteapress.com/wp/semaphores/)" by **Allen B. Downey** is an excellent resource. It contains many classic problems and exercises to get your brain noticing them.

There has also been research on this topic: "[**Real-World Concurrency Bugs in Go**](https://songlh.github.io/paper/go-study.pdf)" by **Tengfei Tu** et al. contains many additional issues not mentioned in this post.
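*Appendix: for the exercise above, here is one minimal sketch of the callback-based Retry. It assumes a doubling backoff capped at maxWait and reuses the context-aware Sleep helper defined earlier; treat it as one possible answer rather than the canonical one.*


```go
// Retry calls fn until it succeeds, the attempts are exhausted,
// or the context is cancelled. The wait time starts at minWait
// and doubles after every failed attempt, capped at maxWait.
func Retry(ctx context.Context, maxRetries int, minWait, maxWait time.Duration,
	fn func(ctx context.Context) error) error {

	wait := minWait
	var err error
	for attempt := 0; attempt < maxRetries; attempt++ {
		if err = fn(ctx); err == nil {
			return nil
		}
		if attempt == maxRetries-1 {
			break
		}
		// Sleep is the context-aware helper from the "Sleeping" section.
		if sleepErr := Sleep(ctx, wait); sleepErr != nil {
			return sleepErr
		}
		wait *= 2
		if wait > maxWait {
			wait = maxWait
		}
	}
	return err
}
```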
‍

diff --git a/app/(blog)/blog/storj-open-development-announcement/10fb6f750e522e1c.png b/app/(blog)/blog/storj-open-development-announcement/10fb6f750e522e1c.png
new file mode 100644
index 000000000..9b6dd8fe6
Binary files /dev/null and b/app/(blog)/blog/storj-open-development-announcement/10fb6f750e522e1c.png differ
diff --git a/app/(blog)/blog/storj-open-development-announcement/page.md b/app/(blog)/blog/storj-open-development-announcement/page.md
new file mode 100644
index 000000000..55232b408
--- /dev/null
+++ b/app/(blog)/blog/storj-open-development-announcement/page.md
@@ -0,0 +1,67 @@
---
author:
  name: Clement Sam
date: '2021-10-01 00:00:00'
heroimage: ./10fb6f750e522e1c.png
layout: blog
metadata:
  description: Storj Open Development Announcement
  title: Storj Open Development Announcement
title: Storj Open Development Announcement

---

As you all know, all our code for Storj V3 is [open source](https://github.com/storj/storj). Our team believes that openness and transparency are critical to everything we do at Storj. We also value feedback from our Storage Node Operators and the community as a whole. Today we’re announcing that we intend to adopt an open development strategy for our Storage Node development to ensure that everyone can get involved in developing the Storage Nodes or contribute to the network.

As part of this, we’ve moved all our Storage Node Operator Jira tickets to GitHub, and new issues will be tracked in the [Storage Node Project Board](https://github.com/orgs/storj/projects/6#card-69201765) on GitHub.

Our goal is to do this in a way that embraces the well-established open source model that’s been working effectively for years: meaningful and positive contributions that align to long-standing, thoughtfully designed architecture and collaborative engineering. Together we seek the best outcome for all people who use Storj.

**Who can contribute?**

Everyone - developers interested in contributing to the Storj Open Source Project are invited to contribute code or open issues with bug reports and feature requests.

Contributions are not restricted to bug reports and feature requests, either. We’re open to pull requests in the form of bug fixes, tests, new features, UI enhancements, or bug reports in the form of a PR with a failing test. Any pull requests that help the network and improve upon our open source software are welcome, reviewed, and accepted into the Storj platform.

Our primary focus for the Storage Node development is the multi-node dashboard. Any improvement is welcome. If you’re not running a Storage Node yourself, we can send you a list of nodeIDs that you can add to your dashboard, so you can start with a multi-node dashboard filled with several months of data.

We also look forward to pull requests with UI tests using [go-rod](https://github.com/go-rod/rod).

**What happens next?**

If you’re a Storage Node Operator, a Storj customer, an open source enthusiast, or a developer interested in contributing to the Storj network, [we invite you to collaborate](https://github.com/storj/storj/contribute) and bring the best of Storj forward to continue to make the internet decentralized and secure for everyone.
**Helpful links:**

* **Getting Started**:
* **Setting up a local instance of the storj network using storj-sim**:

**Storj V3 whitepaper**:

diff --git a/app/(blog)/blog/storj-open-development-part-2-whats-new/69c628262ae22ee5.png b/app/(blog)/blog/storj-open-development-part-2-whats-new/69c628262ae22ee5.png
new file mode 100644
index 000000000..8c830c24b
Binary files /dev/null and b/app/(blog)/blog/storj-open-development-part-2-whats-new/69c628262ae22ee5.png differ
diff --git a/app/(blog)/blog/storj-open-development-part-2-whats-new/75072fcbffc78ed1.png b/app/(blog)/blog/storj-open-development-part-2-whats-new/75072fcbffc78ed1.png
new file mode 100644
index 000000000..91cf61d86
Binary files /dev/null and b/app/(blog)/blog/storj-open-development-part-2-whats-new/75072fcbffc78ed1.png differ
diff --git a/app/(blog)/blog/storj-open-development-part-2-whats-new/page.md b/app/(blog)/blog/storj-open-development-part-2-whats-new/page.md
new file mode 100644
index 000000000..2adc9be94
--- /dev/null
+++ b/app/(blog)/blog/storj-open-development-part-2-whats-new/page.md
@@ -0,0 +1,56 @@
---
author:
  name: Brandon Iglesias
date: '2022-03-31 00:00:00'
heroimage: ./69c628262ae22ee5.png
layout: blog
metadata:
  description: "In October 2021, Storj announced we were going to adopt an open development strategy for the storage node development efforts. The goal was to enable our community\u2014and the wider open source community\u2014to contribute to the development of the network. We started this effort by moving all node issues..."
  title: Storj Open Development - Part 2
title: Storj Open Development - Part 2

---

In October 2021, Storj [announced](https://www.storj.io/blog/storj-open-development-announcement) we were going to adopt an open development strategy for the storage node development efforts. The goal was to enable our community—and the wider open source community—to contribute to the development of the network. We started this effort by moving all node issues to the [storj GitHub repo](https://github.com/storj/storj/issues?q=is%3Aopen+is%3Aissue+label%3ASNO) and creating a public GitHub [project](https://github.com/orgs/storj/projects/6) to track them. This allows anyone in the [community](https://forum.storj.io/) to look at current and past issues, add comments, make code contributions, or open new issues.

‍

Over the last six months, our engineering and product teams have been adopting this open development strategy for all our efforts. Our teams have moved from private Jira tickets to GitHub issues/projects to create more transparency about what we are working on and to allow the community to make contributions.

‍

In addition, we have made the Storj Network product road map public. The [product road map](https://github.com/storj) is a high-level overview of the features and functionality we plan on implementing over the next nine to twelve months. It is constantly evolving as we gain more input and insights from our customers and community, so keep in mind that road map items are subject to change.

‍

As a part of this open development initiative, before starting development, the product team will now make product requirements documents (PRDs) public, describing the problem or challenge we intend to solve for our customers. The engineering team uses the PRD to create a blueprint of how we intend to implement the functionality, and our QA team will develop a test plan.
Through each of these steps, the documents will be published in our GitHub repositories, where reviews and input from our community and customers are welcome and very much appreciated.

‍

![](./75072fcbffc78ed1.png)

‍

Team boards

* Storj Network road map:
* Metainfo team:
* Edge team:
* Documentation:

‍

\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_

TLDR:

We launched our open development strategy in October and have been making significant progress. Six months later, check out some of our initiatives:

* Moved issue tracking to GitHub, where it's open and transparent to everyone
* Created the [Storj Network public road map](https://github.com/orgs/storj/projects/23), also on GitHub
* Continuing our [Bug bounty program](#)
* Looking for more contributions from the community

‍

diff --git a/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/06a4d2d7af413c6a.png b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/06a4d2d7af413c6a.png
new file mode 100644
index 000000000..7d3edceed
Binary files /dev/null and b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/06a4d2d7af413c6a.png differ
diff --git a/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/344a9e476aa2b1ea.png b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/344a9e476aa2b1ea.png
new file mode 100644
index 000000000..d9267dee2
Binary files /dev/null and b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/344a9e476aa2b1ea.png differ
diff --git a/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/4b79f2e770056a31.png b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/4b79f2e770056a31.png
new file mode 100644
index 000000000..a4f4e53b3
Binary files /dev/null and b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/4b79f2e770056a31.png differ
diff --git a/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/6de1c07a519802ed.png b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/6de1c07a519802ed.png
new file mode 100644
index 000000000..5c5663386
Binary files /dev/null and b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/6de1c07a519802ed.png differ
diff --git a/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/943dcd25f919a98b.png b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/943dcd25f919a98b.png
new file mode 100644
index 000000000..db58bd8cf
Binary files /dev/null and b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/943dcd25f919a98b.png differ
diff --git a/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/c911680feffbe653.png b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/c911680feffbe653.png
new file mode 100644
index 000000000..d9dcf7361
Binary files /dev/null and b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/c911680feffbe653.png differ
diff --git a/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/e76c6da3bc2247e7.png
b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/e76c6da3bc2247e7.png
new file mode 100644
index 000000000..ee4454123
Binary files /dev/null and b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/e76c6da3bc2247e7.png differ
diff --git a/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/page.md b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/page.md
new file mode 100644
index 000000000..9e28e0f15
--- /dev/null
+++ b/app/(blog)/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern/page.md
@@ -0,0 +1,213 @@
---
author:
  name: Marton Elek
date: '2022-03-07 00:00:00'
heroimage: ./344a9e476aa2b1ea.png
layout: blog
metadata:
  description: "Data stored in Storj Decentralized Cloud Storage can be accessed in multiple ways:With native \u201Cuplink\u201D protocol, which connects directly to the nodes where the data is storedWith using S3 compatible REST API, using an S3 gateway:Either the hosted S3 gateway, operated by Storj LabsOr with running ..."
  title: Use Storj DCS from Cloud-native Environments Using the Sidecar Pattern
title: Use Storj DCS from Cloud-native Environments Using the Sidecar Pattern
---
Data stored in Storj Decentralized Cloud Storage can be accessed in multiple ways:

1. With the native “uplink” protocol, which connects directly to the nodes where the data is stored
2. With the S3-compatible REST API, using an S3 gateway:
   1. either the hosted S3 gateway, operated by Storj Labs,
   2. or your own self-hosted S3 gateways

The easiest way is using the shared S3 gateway with any S3-compatible tool (2.1), but this approach also has disadvantages:

1. The encryption keys are shared with the gateway
2. All traffic is routed to the gateway before accessing the data on storage nodes

In a powerful server environment (with enough network and CPU bandwidth), it can be more reasonable to use the native protocol and access the storage nodes directly. However, the native protocol is not supported by as many tools as the S3 protocol.

Fortunately, in Kubernetes – thanks to the sidecar pattern – using the native protocol is almost as easy as using the shared gateway.
## Sidecar Pattern
The smallest deployable unit in Kubernetes is a pod. A pod is the definition of one or more containers with their attached volumes, resources, and network settings. Typically only one container is included in a pod, but the sidecar pattern deploys an additional helper container in each pod.

As the network namespace is shared inside the pod, the main container can access the features of the sidecar container.

![](./6de1c07a519802ed.png)

To follow this pattern, we should deploy a sidecar container with each of our application pods.

Instead of using the hosted, multi-tenant version of the S3 gateway:

![](./06a4d2d7af413c6a.png)

We will start a single-tenant S3 gateway with each of the services:

![](./943dcd25f919a98b.png)
## Getting Started
Let’s start with a simple example: we will create a Jupyter notebook that reads data from a Storj bucket for subsequent data science calculations.
A Jupyter notebook can be deployed with the following simplified deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jupyter
  template:
    metadata:
      labels:
        app: jupyter
    spec:
      containers:
        - name: jupyter
          image: jupyter/base-notebook
          ports:
            - containerPort: 8888
              hostPort: 8888
```

(Please note that we use **hostPort** here. In a real cluster, a Service (LoadBalancer or NodePort) or an Ingress definition would be required, depending on the environment.)
‍

After deploying this definition to a Kubernetes cluster, we can access the Jupyter notebook application and write our own notebooks.

To open the Jupyter web application, we need the secret token, which is printed to the standard output of the container:

```
kubectl logs -l app=jupyter -c jupyter --tail=-1

....
[I 14:12:50.361 NotebookApp] Serving notebooks from local directory: /home/jovyan
[I 14:12:50.361 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 14:12:50.362 NotebookApp] http://jupyter-7546dc9f8c-ww4hb:8888/?token=32bc4f4617fcad6001895c966ce8df539f5f71a243197d5d
[I 14:12:50.362 NotebookApp] or http://127.0.0.1:8888/?token=32bc4f4617fcad6001895c966ce8df539f5f71a243197d5d
[I 14:12:50.362 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:12:50.366 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/nbserver-8-open.html
    Or copy and paste one of these URLs:
        http://jupyter-7546dc9f8c-ww4hb:8888/?token=32bc4f4617fcad6001895c966ce8df539f5f71a243197d5d
        or http://127.0.0.1:8888/?token=32bc4f4617fcad6001895c966ce8df539f5f71a243197d5d
```
‍
And now we can create a new notebook where we use the Storj data:

![](./4b79f2e770056a31.png)

To read data via the S3 protocol, we need *boto3*, the Python S3 library, which can be added to the Docker image or installed as a first step in the notebook:

```python
import subprocess
subprocess.run(["pip", "install", "boto3", "pandas"])
```
‍
Next, we can read and use files directly from Storj:

```python
import boto3
import pandas as pd
from io import StringIO

session = boto3.session.Session()
s3_client = session.client(
    's3',
    aws_access_key_id="...",
    aws_secret_access_key="...",
    endpoint_url="https://gateway.eu1.storjshare.io")

response = s3_client.get_object(Bucket="sidecar", Key="data.csv")
csv = response["Body"].read().decode('utf-8')
df = pd.read_csv(StringIO(csv))

df
```
‍
This approach uses the shared S3 gateway and requires access key and secret credentials generated as documented [here](docId:yYCzPT8HHcbEZZMvfoCFa).

## Activating the Sidecar
Let’s improve the previous example by using the sidecar pattern.
First, we need to generate an [*access grant*](docId:b4-QgUOxVHDHSIWpAf3hG) instead of the S3 credentials to access the Storj data, and we should define a pair of S3 credentials (any values will do) for our local, single-tenant S3 gateway:


![](./c911680feffbe653.png)

Let’s create a Kubernetes secret with all of these:

```bash
export ACCESS_GRANT=...generated_by_ui…

kubectl create secret generic storj-gateway \
--from-literal=storj-gateway-key=$(pwgen -n 18) \
--from-literal=storj-gateway-secret=$(pwgen -n 18) \
--from-literal=storj-access-grant=$ACCESS_GRANT
```
‍
Now we can enhance our Kubernetes deployment by adding one more container (put it under spec/template/spec/containers):

```yaml
      - name: storj-sidecar
        image: storjlabs/gateway
        args:
          - run
        env:
          - name: STORJ_MINIO_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                name: storj-gateway
                key: storj-gateway-key
          - name: STORJ_MINIO_SECRET_KEY
            valueFrom:
              secretKeyRef:
                name: storj-gateway
                key: storj-gateway-secret
          - name: STORJ_ACCESS
            valueFrom:
              secretKeyRef:
                name: storj-gateway
                key: storj-access-grant
```
‍
This container is configured to access the Storj API (using the STORJ\_ACCESS environment variable) and secured by STORJ\_MINIO\_ACCESS\_KEY and STORJ\_MINIO\_SECRET\_KEY.

Now we can access any object from our Storj bucket, but we can also make the setup more secure by not hard-coding credentials during the initialization of the Python S3 client. We should add two more environment variables to the existing Jupyter container, next to the storj-sidecar container, to make the credentials available to the client:

```yaml
    spec:
      containers:
        - name: jupyter
          image: jupyter/base-notebook
          ports:
            - containerPort: 8888
              hostPort: 8888
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: storj-gateway
                  key: storj-gateway-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: storj-gateway
                  key: storj-gateway-secret
        - name: storj-sidecar
          # ... the sidecar container as defined above
```

With this approach, we can initialize the S3 client without hard-coding the credentials:

```python
session = boto3.session.Session()
client = session.client(
    's3',
    endpoint_url="http://localhost:7777")
```

Note: http://localhost:7777 is the address of the single-tenant Storj gateway, which is running in the same pod.

## Summary
The sidecar pattern is an easy way to access our data from Storj Decentralized Cloud Storage using the power of the native protocol, even if our application is compatible only with the S3 REST API.
![](./e76c6da3bc2247e7.png)

diff --git a/app/(blog)/blog/using-storj-dcs-with-github-actions/1da95eeb5ef706b6.png b/app/(blog)/blog/using-storj-dcs-with-github-actions/1da95eeb5ef706b6.png
new file mode 100644
index 000000000..fe05553bf
Binary files /dev/null and b/app/(blog)/blog/using-storj-dcs-with-github-actions/1da95eeb5ef706b6.png differ
diff --git a/app/(blog)/blog/using-storj-dcs-with-github-actions/334785fee7e124be.png b/app/(blog)/blog/using-storj-dcs-with-github-actions/334785fee7e124be.png
new file mode 100644
index 000000000..725b0578f
Binary files /dev/null and b/app/(blog)/blog/using-storj-dcs-with-github-actions/334785fee7e124be.png differ
diff --git a/app/(blog)/blog/using-storj-dcs-with-github-actions/87361ca4a0a843f4.png b/app/(blog)/blog/using-storj-dcs-with-github-actions/87361ca4a0a843f4.png
new file mode 100644
index 000000000..29fffaa4f
Binary files /dev/null and b/app/(blog)/blog/using-storj-dcs-with-github-actions/87361ca4a0a843f4.png differ
diff --git a/app/(blog)/blog/using-storj-dcs-with-github-actions/9ccdc75d22b6993e.png b/app/(blog)/blog/using-storj-dcs-with-github-actions/9ccdc75d22b6993e.png
new file mode 100644
index 000000000..7a79cb44f
Binary files /dev/null and b/app/(blog)/blog/using-storj-dcs-with-github-actions/9ccdc75d22b6993e.png differ
diff --git a/app/(blog)/blog/using-storj-dcs-with-github-actions/9fae43e73a66cba0.png b/app/(blog)/blog/using-storj-dcs-with-github-actions/9fae43e73a66cba0.png
new file mode 100644
index 000000000..ca7f05502
Binary files /dev/null and b/app/(blog)/blog/using-storj-dcs-with-github-actions/9fae43e73a66cba0.png differ
diff --git a/app/(blog)/blog/using-storj-dcs-with-github-actions/page.md b/app/(blog)/blog/using-storj-dcs-with-github-actions/page.md
new file mode 100644
index 000000000..1be132734
--- /dev/null
+++ b/app/(blog)/blog/using-storj-dcs-with-github-actions/page.md
@@ -0,0 +1,169 @@
---
author:
  name: Kaloyan Raev
date: '2021-08-31 00:00:00'
heroimage: ./9ccdc75d22b6993e.png
layout: blog
metadata:
  description: GitHub Actions is their system to automate, customize, and execute software development workflows in the GitHub repository. This article will inform you how to upload files to a Storj DCS bucket from a GitHub Actions workflow.The Storj DCS Public Network Stats is one of the projects at Storj wher...
  title: Using Storj DCS with GitHub Actions
title: Using Storj DCS with GitHub Actions

---

[GitHub Actions](https://docs.github.com/en/actions) is GitHub's system to automate, customize, and execute software development workflows in the GitHub repository. This article will show you how to upload files to a Storj DCS bucket from a GitHub Actions workflow.


The [Storj DCS Public Network Stats](https://stats.storjshare.io/) is one of the projects at Storj where we use GitHub Actions. The statistics are hosted as a [static website](docId:GkgE6Egi02wRZtyryFyPz) on Storj DCS, so we have an easy way to redeploy the homepage when we merge any modification in the code repository. We created a GitHub Actions workflow that converts the Markdown file of the homepage to an HTML file and then uploads it to the bucket hosting the website.


GitHub Actions has a marketplace for actions created by the community. Instead of creating our own Storj-specific action to upload files to Storj DCS, we decided to keep it simple and use the [s3-sync-action](https://github.com/jakejarvis/s3-sync-action) that the community has already created.
The s3-sync-action allows uploading files to an S3-compatible storage service, so we took advantage of Storj Gateway-MT - the globally available, multi-region hosted S3-compatible gateway.


Let’s break down the specific GitHub Actions workflow for the Storj DCS Public Network Stats project. [The complete workflow is here](https://github.com/storj/stats/blob/main/.github/workflows/upload-homepage.yml).


Every workflow starts with declaring its name:


```
# This workflow converts homepage.md to index.html
# and uploads it to the static website
name: upload homepage
```


Then follow the rules for triggering the workflow:


```
# Controls when the workflow will run
on:
  # Triggers the workflow only on push events to the main branch,
  # but not for pull requests
  push:
    branches: [ main ]
    # Triggers the workflow only if the homepage.md file
    # has been edited
    paths:
      - 'homepage.md'
```


In this case, the workflow triggers when a commit is merged to the main branch, and that commit modifies the homepage.md file.


Next, we have the definition of the job that will be run when the above event triggers:


```
# A workflow run is made up of one or more jobs that can run
# sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest
    # Steps represent a sequence of tasks that will be executed
    # as part of the job
    steps:
```


The job will run on an Ubuntu VM and will execute the following three steps:

1. Check out the head of the GitHub repository

```
# Checks-out your repository under $GITHUB_WORKSPACE,
# so your job can access it
- uses: actions/checkout@v2
```
2. Convert the homepage.md file to index.html

```
# Converts the homepage.md file to index.html
- uses: ZacJW/markdown-html-action@1.1.0
  with:
    input_files: '[["homepage.md"]]'
    output_files: '["index.html"]'
    extensions: '[]' # Alas, this cannot be skipped even if empty
```
3. Upload the index.html file to the Storj DCS bucket.

```
# Uploads the index.html file to the root of the destination bucket
- uses: jakejarvis/s3-sync-action@v0.5.1
  with:
    # This is a workaround as SOURCE_DIR does not support
    # a single file
    args: --exclude '*' --include 'index.html'
  env:
    AWS_S3_ENDPOINT: ${{ secrets.AWS_S3_ENDPOINT }}
    AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```

The destination bucket and the S3 credentials for s3-sync-action are configured through environment variables. In this case, we use [encrypted secrets](https://docs.github.com/en/actions/reference/encrypted-secrets) from the GitHub repository to keep this information private and safe from public eyes.


Encrypted secrets can be configured in the “Secrets” section of the repository settings.


![](./9fae43e73a66cba0.png)

All these secrets can be created via the Storj Satellite web interface. After logging in to the web interface, we make sure that the target bucket is already created. If not, the easiest way to create it is using the [Object Browser](docId:4oDAezF-FcfPr0WPl7knd). Then we set the name of the bucket as the AWS\_S3\_BUCKET secret in the GitHub repository.


With the bucket created, we next create S3 credentials that grant access to that bucket.
This is done by [creating a new access grant from the web interface](docId:AsyYcUJFbO1JI8-Tu8tW3#generate-s3-compatible-credentials).


In the Permissions dialog, we make sure to limit the access to only the target bucket instead of giving access to the whole project.

![](./87361ca4a0a843f4.png)

In the Access Grant dialog, we click on the Generate S3 Gateway Credentials button.


![](./334785fee7e124be.png)

This generates the S3 credentials for the access grant that can be used with Gateway-MT.


![](./1da95eeb5ef706b6.png)

We use these credentials to set the remaining secrets in the GitHub repository:

* AWS\_ACCESS\_KEY\_ID is set to the Access Key value
* AWS\_SECRET\_ACCESS\_KEY is set to the Secret Key value
* AWS\_S3\_ENDPOINT is set to the End Point value

With this, everything is now in place to run the GitHub Actions workflow successfully.


If you have any questions, please feel free to reach out to us at [support@storj.io](mailto:support@storj.io) or visit .
The Storj Network is currently spread across over 10,000 uncorrelated endpoints, and that number is growing fast.

The global substrate of diverse, uncorrelated endpoints across which the network runs is unmatched by any other cloud provider.

Storage Nodes run across a diverse distribution of operating systems, hardware types, geographic locations, and owners. Node Operator software runs on Linux systems like Ubuntu, CentOS, Debian, and Fedora, as well as macOS and Windows, with a native MSI installer. Storage Nodes run in Docker containers as well as compile to native binaries for ARM and AMD64. Hardware ranges from basic Raspberry Pis to QNAP NAS devices.

Now that we have Storage Nodes across the world, we decided to build a simple visualization tool to showcase just how distributed and resilient the files stored on Tardigrade actually are.

### Distributed Storage, Visualized

Through this tool—which is called the Linksharing Object Map—our team and our community can visualize the geographic distribution of data uploaded to our Tardigrade service. This showcases how resilient the network is, as well as the wide geographic distribution of Nodes holding each object.

We set out to build the Linksharing Object Map Dashboard at the start of the two-day Storj Labs employee hackathon and quickly productized and completed the project.

![](./29e62398803ba277.png)

![](./47f263c59b58c3b7.jpeg)

Try it out yourself by generating access for an object and creating a link share for the URL, [as outlined in our documentation](https://documentation.tardigrade.io/getting-started/uploading-your-first-object/view-distribution-of-an-object). This process will generate a link with a macaroon (embedded, [hash-based logic](https://storj.io/blog/2019/12/secure-access-control-in-the-decentralized-cloud/)) that controls how the object can be accessed.

See an example of the Node map yourself here: [Link share Object Map](https://bit.ly/31qVdyc)

### Uplink Visualizer: The tech and how it works

The Uplink Visualizer is a simple Go application that ingests, transforms, and visualizes client-side data from the uplink client. The tool grabs the IP addresses of the Storage Nodes holding pieces for a given object and displays them on a map.

Essentially, there's an endpoint on the Satellite that will return the IP addresses of all the Storage Nodes holding pieces for an object on the network, provided you have permission to download the object with your API key.

We use [MaxMind](https://github.com/maxmind/) to convert the list of IP addresses to their corresponding latitudes and longitudes. We then use [LeafletJS](https://leafletjs.com/) to display those geolocations on the map.

### Try it yourself, and be the cloud!

We're excited for you to try the Uplink Object Map and check out its [code](https://github.com/storj/linksharing). It's licensed under the Apache-2.0 license, and because it's open source (like most of the code we produce), you can contribute to the tool as well. If you have any feedback on the visualizer or find it useful, please let us know at .

Finally, if you like the look and feel of the tool, please let the world know, and tweet it to us @storjproject—we always reshare and promote our community's content and efforts!

Please note that this is a GENERIC location, so the Storage Node's actual location is not disclosed. A user is ONLY able to get this information for a file if they have permission to download it.
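To make the IP-to-coordinates step described above concrete, here is a minimal sketch of how such a lookup can be done in Go. It assumes the oschwald/geoip2-golang reader and a locally downloaded GeoIP2 City database; the file name and IP list are illustrative stand-ins, not the actual linksharing internals:

```go
package main

import (
	"fmt"
	"log"
	"net"

	"github.com/oschwald/geoip2-golang"
)

func main() {
	// Open a local MaxMind GeoIP2 City database (the path is an assumption).
	db, err := geoip2.Open("GeoIP2-City.mmdb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// nodeIPs stands in for the list returned by the Satellite endpoint.
	nodeIPs := []string{"203.0.113.7", "198.51.100.23"}
	for _, ip := range nodeIPs {
		record, err := db.City(net.ParseIP(ip))
		if err != nil {
			log.Printf("lookup %s: %v", ip, err)
			continue
		}
		// These coordinate pairs are what the map markers are built from.
		fmt.Println(record.Location.Latitude, record.Location.Longitude)
	}
}
```

The resulting coordinate pairs are what a front-end map library such as LeafletJS can then plot as markers.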
+ diff --git a/package-lock.json b/package-lock.json index e54b0f40d..11e22f9c7 100644 --- a/package-lock.json +++ b/package-lock.json @@ -438,9 +438,9 @@ } }, "node_modules/@next/env": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/env/-/env-14.0.2.tgz", - "integrity": "sha512-HAW1sljizEaduEOes/m84oUqeIDAUYBR1CDwu2tobNlNDFP3cSm9d6QsOsGeNlIppU1p/p1+bWbYCbvwjFiceA==" + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/env/-/env-14.2.4.tgz", + "integrity": "sha512-3EtkY5VDkuV2+lNmKlbkibIJxcO4oIHEhBWne6PaAp+76J9KoSsGvNikp6ivzAT8dhhBMYrm6op2pS1ApG0Hzg==" }, "node_modules/@next/eslint-plugin-next": { "version": "14.0.2", @@ -452,9 +452,9 @@ } }, "node_modules/@next/swc-darwin-arm64": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-darwin-arm64/-/swc-darwin-arm64-14.0.2.tgz", - "integrity": "sha512-i+jQY0fOb8L5gvGvojWyZMfQoQtDVB2kYe7fufOEiST6sicvzI2W5/EXo4lX5bLUjapHKe+nFxuVv7BA+Pd7LQ==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-darwin-arm64/-/swc-darwin-arm64-14.2.4.tgz", + "integrity": "sha512-AH3mO4JlFUqsYcwFUHb1wAKlebHU/Hv2u2kb1pAuRanDZ7pD/A/KPD98RHZmwsJpdHQwfEc/06mgpSzwrJYnNg==", "cpu": [ "arm64" ], @@ -467,9 +467,9 @@ } }, "node_modules/@next/swc-darwin-x64": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-darwin-x64/-/swc-darwin-x64-14.0.2.tgz", - "integrity": "sha512-zRCAO0d2hW6gBEa4wJaLn+gY8qtIqD3gYd9NjruuN98OCI6YyelmhWVVLlREjS7RYrm9OUQIp/iVJFeB6kP1hg==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-darwin-x64/-/swc-darwin-x64-14.2.4.tgz", + "integrity": "sha512-QVadW73sWIO6E2VroyUjuAxhWLZWEpiFqHdZdoQ/AMpN9YWGuHV8t2rChr0ahy+irKX5mlDU7OY68k3n4tAZTg==", "cpu": [ "x64" ], @@ -482,9 +482,9 @@ } }, "node_modules/@next/swc-linux-arm64-gnu": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-gnu/-/swc-linux-arm64-gnu-14.0.2.tgz", - "integrity": "sha512-tSJmiaon8YaKsVhi7GgRizZoV0N1Sx5+i+hFTrCKKQN7s3tuqW0Rov+RYdPhAv/pJl4qiG+XfSX4eJXqpNg3dA==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-gnu/-/swc-linux-arm64-gnu-14.2.4.tgz", + "integrity": "sha512-KT6GUrb3oyCfcfJ+WliXuJnD6pCpZiosx2X3k66HLR+DMoilRb76LpWPGb4tZprawTtcnyrv75ElD6VncVamUQ==", "cpu": [ "arm64" ], @@ -497,9 +497,9 @@ } }, "node_modules/@next/swc-linux-arm64-musl": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-musl/-/swc-linux-arm64-musl-14.0.2.tgz", - "integrity": "sha512-dXJLMSEOwqJKcag1BeX1C+ekdPPJ9yXbWIt3nAadhbLx5CjACoB2NQj9Xcqu2tmdr5L6m34fR+fjGPs+ZVPLzA==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-musl/-/swc-linux-arm64-musl-14.2.4.tgz", + "integrity": "sha512-Alv8/XGSs/ytwQcbCHwze1HmiIkIVhDHYLjczSVrf0Wi2MvKn/blt7+S6FJitj3yTlMwMxII1gIJ9WepI4aZ/A==", "cpu": [ "arm64" ], @@ -512,9 +512,9 @@ } }, "node_modules/@next/swc-linux-x64-gnu": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-linux-x64-gnu/-/swc-linux-x64-gnu-14.0.2.tgz", - "integrity": "sha512-WC9KAPSowj6as76P3vf1J3mf2QTm3Wv3FBzQi7UJ+dxWjK3MhHVWsWUo24AnmHx9qDcEtHM58okgZkXVqeLB+Q==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-linux-x64-gnu/-/swc-linux-x64-gnu-14.2.4.tgz", + "integrity": "sha512-ze0ShQDBPCqxLImzw4sCdfnB3lRmN3qGMB2GWDRlq5Wqy4G36pxtNOo2usu/Nm9+V2Rh/QQnrRc2l94kYFXO6Q==", "cpu": [ "x64" ], @@ -527,9 +527,9 @@ } }, "node_modules/@next/swc-linux-x64-musl": { - "version": 
"14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-linux-x64-musl/-/swc-linux-x64-musl-14.0.2.tgz", - "integrity": "sha512-KSSAwvUcjtdZY4zJFa2f5VNJIwuEVnOSlqYqbQIawREJA+gUI6egeiRu290pXioQXnQHYYdXmnVNZ4M+VMB7KQ==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-linux-x64-musl/-/swc-linux-x64-musl-14.2.4.tgz", + "integrity": "sha512-8dwC0UJoc6fC7PX70csdaznVMNr16hQrTDAMPvLPloazlcaWfdPogq+UpZX6Drqb1OBlwowz8iG7WR0Tzk/diQ==", "cpu": [ "x64" ], @@ -542,9 +542,9 @@ } }, "node_modules/@next/swc-win32-arm64-msvc": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-win32-arm64-msvc/-/swc-win32-arm64-msvc-14.0.2.tgz", - "integrity": "sha512-2/O0F1SqJ0bD3zqNuYge0ok7OEWCQwk55RPheDYD0va5ij7kYwrFkq5ycCRN0TLjLfxSF6xI5NM6nC5ux7svEQ==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-win32-arm64-msvc/-/swc-win32-arm64-msvc-14.2.4.tgz", + "integrity": "sha512-jxyg67NbEWkDyvM+O8UDbPAyYRZqGLQDTPwvrBBeOSyVWW/jFQkQKQ70JDqDSYg1ZDdl+E3nkbFbq8xM8E9x8A==", "cpu": [ "arm64" ], @@ -557,9 +557,9 @@ } }, "node_modules/@next/swc-win32-ia32-msvc": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-win32-ia32-msvc/-/swc-win32-ia32-msvc-14.0.2.tgz", - "integrity": "sha512-vJI/x70Id0oN4Bq/R6byBqV1/NS5Dl31zC+lowO8SDu1fHmUxoAdILZR5X/sKbiJpuvKcCrwbYgJU8FF/Gh50Q==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-win32-ia32-msvc/-/swc-win32-ia32-msvc-14.2.4.tgz", + "integrity": "sha512-twrmN753hjXRdcrZmZttb/m5xaCBFa48Dt3FbeEItpJArxriYDunWxJn+QFXdJ3hPkm4u7CKxncVvnmgQMY1ag==", "cpu": [ "ia32" ], @@ -572,9 +572,9 @@ } }, "node_modules/@next/swc-win32-x64-msvc": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-win32-x64-msvc/-/swc-win32-x64-msvc-14.0.2.tgz", - "integrity": "sha512-Ut4LXIUvC5m8pHTe2j0vq/YDnTEyq6RSR9vHYPqnELrDapPhLNz9Od/L5Ow3J8RNDWpEnfCiQXuVdfjlNEJ7ug==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-win32-x64-msvc/-/swc-win32-x64-msvc-14.2.4.tgz", + "integrity": "sha512-tkLrjBzqFTP8DVrAAQmZelEahfR9OxWpFR++vAI9FBhCiIxtwHwBHC23SBHCTURBtwB4kc/x44imVOnkKGNVGg==", "cpu": [ "x64" ], @@ -654,18 +654,24 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/@swc/counter": { + "version": "0.1.3", + "resolved": "https://registry.npmjs.org/@swc/counter/-/counter-0.1.3.tgz", + "integrity": "sha512-e2BR4lsJkkRlKZ/qCHPw9ZaSxc0MVUd7gtbtaB7aMvHeJVYe8sOB8DBZkP2DtISHGSku9sCK6T6cnY0CtXrOCQ==" + }, "node_modules/@swc/helpers": { - "version": "0.5.2", - "resolved": "https://registry.npmjs.org/@swc/helpers/-/helpers-0.5.2.tgz", - "integrity": "sha512-E4KcWTpoLHqwPHLxidpOqQbcrZVgi0rsmmZXUle1jXmJfuIf/UWpczUJ7MZZ5tlxytgJXyp0w4PGkkeLiuIdZw==", + "version": "0.5.5", + "resolved": "https://registry.npmjs.org/@swc/helpers/-/helpers-0.5.5.tgz", + "integrity": "sha512-KGYxvIOXcceOAbEk4bi/dVLEK9z8sZ0uBB3Il5b1rhfClSpcX0yfRO0KmTkqR2cnQDymwLB+25ZyMzICg/cm/A==", "dependencies": { + "@swc/counter": "^0.1.3", "tslib": "^2.4.0" } }, "node_modules/@swc/helpers/node_modules/tslib": { - "version": "2.6.2", - "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.6.2.tgz", - "integrity": "sha512-AEYxH93jGFPn/a2iVAwW87VuUIkR1FVUKB77NwMF7nBTDkDrrT/Hpt/IrCJ0QXhW27jTBDcf5ZY7w6RiqTMw2Q==" + "version": "2.6.3", + "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.6.3.tgz", + "integrity": "sha512-xNvxJEOUiWPGhUuUdQgAJPKOOJfGnIyKySOc09XkKsgdUV/3E2zvwZYdejjmRgPCgcym1juLH3226yA7sEFJKQ==" }, "node_modules/@tailwindcss/typography": { "version": 
"0.5.7", @@ -1136,11 +1142,63 @@ "dequal": "^2.0.3" } }, + "node_modules/b4a": { + "version": "1.6.6", + "resolved": "https://registry.npmjs.org/b4a/-/b4a-1.6.6.tgz", + "integrity": "sha512-5Tk1HLk6b6ctmjIkAcU/Ujv/1WqiDl0F0JdRCR80VsOcUlHcu7pWeWRlOqQLHfDEsVx9YH/aif5AG4ehoCtTmg==", + "dev": true + }, "node_modules/balanced-match": { "version": "1.0.2", "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz", "integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==" }, + "node_modules/bare-events": { + "version": "2.4.2", + "resolved": "https://registry.npmjs.org/bare-events/-/bare-events-2.4.2.tgz", + "integrity": "sha512-qMKFd2qG/36aA4GwvKq8MxnPgCQAmBWmSyLWsJcbn8v03wvIPQ/hG1Ms8bPzndZxMDoHpxez5VOS+gC9Yi24/Q==", + "dev": true, + "optional": true + }, + "node_modules/bare-fs": { + "version": "2.3.1", + "resolved": "https://registry.npmjs.org/bare-fs/-/bare-fs-2.3.1.tgz", + "integrity": "sha512-W/Hfxc/6VehXlsgFtbB5B4xFcsCl+pAh30cYhoFyXErf6oGrwjh8SwiPAdHgpmWonKuYpZgGywN0SXt7dgsADA==", + "dev": true, + "optional": true, + "dependencies": { + "bare-events": "^2.0.0", + "bare-path": "^2.0.0", + "bare-stream": "^2.0.0" + } + }, + "node_modules/bare-os": { + "version": "2.4.0", + "resolved": "https://registry.npmjs.org/bare-os/-/bare-os-2.4.0.tgz", + "integrity": "sha512-v8DTT08AS/G0F9xrhyLtepoo9EJBJ85FRSMbu1pQUlAf6A8T0tEEQGMVObWeqpjhSPXsE0VGlluFBJu2fdoTNg==", + "dev": true, + "optional": true + }, + "node_modules/bare-path": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/bare-path/-/bare-path-2.1.3.tgz", + "integrity": "sha512-lh/eITfU8hrj9Ru5quUp0Io1kJWIk1bTjzo7JH1P5dWmQ2EL4hFUlfI8FonAhSlgIfhn63p84CDY/x+PisgcXA==", + "dev": true, + "optional": true, + "dependencies": { + "bare-os": "^2.1.0" + } + }, + "node_modules/bare-stream": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/bare-stream/-/bare-stream-2.1.3.tgz", + "integrity": "sha512-tiDAH9H/kP+tvNO5sczyn9ZAA7utrSMobyDchsnyyXBuUe2FSQWbxhtuHB8jwpHYYevVo2UJpcmvvjrbHboUUQ==", + "dev": true, + "optional": true, + "dependencies": { + "streamx": "^2.18.0" + } + }, "node_modules/base64-js": { "version": "1.5.1", "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz", @@ -1198,11 +1256,11 @@ } }, "node_modules/braces": { - "version": "3.0.2", - "resolved": "https://registry.npmjs.org/braces/-/braces-3.0.2.tgz", - "integrity": "sha512-b8um+L1RzM3WDSzvhm6gIz1yfTbBt6YTlcEKAvsmqCZZFw46z626lVj9j1yEPW33H5H+lBQpZMP1k8l+78Ha0A==", + "version": "3.0.3", + "resolved": "https://registry.npmjs.org/braces/-/braces-3.0.3.tgz", + "integrity": "sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA==", "dependencies": { - "fill-range": "^7.0.1" + "fill-range": "^7.1.1" }, "engines": { "node": ">=8" @@ -1321,9 +1379,9 @@ } }, "node_modules/caniuse-lite": { - "version": "1.0.30001519", - "resolved": "https://registry.npmjs.org/caniuse-lite/-/caniuse-lite-1.0.30001519.tgz", - "integrity": "sha512-0QHgqR+Jv4bxHMp8kZ1Kn8CH55OikjKJ6JmKkZYP1F3D7w+lnFXF70nG5eNfsZS89jadi5Ywy5UCSKLAglIRkg==", + "version": "1.0.30001640", + "resolved": "https://registry.npmjs.org/caniuse-lite/-/caniuse-lite-1.0.30001640.tgz", + "integrity": "sha512-lA4VMpW0PSUrFnkmVuEKBUovSWKhj7puyCg8StBChgu298N1AtuF1sKWEvfDuimSEDbhlb/KqPKC3fs1HbuQUA==", "funding": [ { "type": "opencollective", @@ -1589,9 +1647,9 @@ } }, "node_modules/detect-libc": { - "version": "2.0.1", - "resolved": 
"https://registry.npmjs.org/detect-libc/-/detect-libc-2.0.1.tgz", - "integrity": "sha512-463v3ZeIrcWtdgIg6vI6XUncguvr2TnGl4SzDXinkt9mSLpBJKXT3mW6xT3VQdDN11+WVs29pgvivTc4Lp8v+w==", + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.0.3.tgz", + "integrity": "sha512-bwy0MGW55bG41VqxxypOsdSdGqLwXPI/focwgTYCFMbdUiBAxLg9CFzG08sz2aqzknwiX7Hkl0bQENjg8iLByw==", "dev": true, "engines": { "node": ">=8" @@ -2351,6 +2409,12 @@ "resolved": "https://registry.npmjs.org/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz", "integrity": "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q==" }, + "node_modules/fast-fifo": { + "version": "1.3.2", + "resolved": "https://registry.npmjs.org/fast-fifo/-/fast-fifo-1.3.2.tgz", + "integrity": "sha512-/d9sfos4yxzpwkDkuN7k2SqFKtYNmCTzgfEpz82x34IM9/zc8KGxQoXg1liNC/izpRM/MBdt44Nmx41ZWqk+FQ==", + "dev": true + }, "node_modules/fast-glob": { "version": "3.3.2", "resolved": "https://registry.npmjs.org/fast-glob/-/fast-glob-3.3.2.tgz", @@ -2410,9 +2474,9 @@ } }, "node_modules/fill-range": { - "version": "7.0.1", - "resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.0.1.tgz", - "integrity": "sha512-qOo9F+dMUmC2Lcb4BbVvnKJxTPjCm+RRpe4gDuGrzkL7mEVl/djYSu2OdQ2Pa302N4oqkSg9ir6jaLWJ2USVpQ==", + "version": "7.1.1", + "resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.1.1.tgz", + "integrity": "sha512-YsGpe3WHLK8ZYi4tWDg2Jy3ebRz2rXowDxnld4bkQB00cc/1Zw9AWnC0i9ztDJitivtQvaI9KaLyKrc+hBW0yg==", "dependencies": { "to-regex-range": "^5.0.1" }, @@ -2637,11 +2701,6 @@ "node": ">=10.13.0" } }, - "node_modules/glob-to-regexp": { - "version": "0.4.1", - "resolved": "https://registry.npmjs.org/glob-to-regexp/-/glob-to-regexp-0.4.1.tgz", - "integrity": "sha512-lkX1HJXwyMcprw/5YUZc2s7DrpAiHB21/V+E1rHUrVNokkvB6bqMzT0VfV6/86ZNabt1k14YOIaT7nDvOX3Iiw==" - }, "node_modules/globals": { "version": "13.17.0", "resolved": "https://registry.npmjs.org/globals/-/globals-13.17.0.tgz", @@ -3589,17 +3648,17 @@ } }, "node_modules/next": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/next/-/next-14.0.2.tgz", - "integrity": "sha512-jsAU2CkYS40GaQYOiLl9m93RTv2DA/tTJ0NRlmZIBIL87YwQ/xR8k796z7IqgM3jydI8G25dXvyYMC9VDIevIg==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/next/-/next-14.2.4.tgz", + "integrity": "sha512-R8/V7vugY+822rsQGQCjoLhMuC9oFj9SOi4Cl4b2wjDrseD0LRZ10W7R6Czo4w9ZznVSshKjuIomsRjvm9EKJQ==", "dependencies": { - "@next/env": "14.0.2", - "@swc/helpers": "0.5.2", + "@next/env": "14.2.4", + "@swc/helpers": "0.5.5", "busboy": "1.6.0", - "caniuse-lite": "^1.0.30001406", + "caniuse-lite": "^1.0.30001579", + "graceful-fs": "^4.2.11", "postcss": "8.4.31", - "styled-jsx": "5.1.1", - "watchpack": "2.4.0" + "styled-jsx": "5.1.1" }, "bin": { "next": "dist/bin/next" @@ -3608,18 +3667,19 @@ "node": ">=18.17.0" }, "optionalDependencies": { - "@next/swc-darwin-arm64": "14.0.2", - "@next/swc-darwin-x64": "14.0.2", - "@next/swc-linux-arm64-gnu": "14.0.2", - "@next/swc-linux-arm64-musl": "14.0.2", - "@next/swc-linux-x64-gnu": "14.0.2", - "@next/swc-linux-x64-musl": "14.0.2", - "@next/swc-win32-arm64-msvc": "14.0.2", - "@next/swc-win32-ia32-msvc": "14.0.2", - "@next/swc-win32-x64-msvc": "14.0.2" + "@next/swc-darwin-arm64": "14.2.4", + "@next/swc-darwin-x64": "14.2.4", + "@next/swc-linux-arm64-gnu": "14.2.4", + "@next/swc-linux-arm64-musl": "14.2.4", + "@next/swc-linux-x64-gnu": "14.2.4", + "@next/swc-linux-x64-musl": "14.2.4", + "@next/swc-win32-arm64-msvc": "14.2.4", + 
"@next/swc-win32-ia32-msvc": "14.2.4", + "@next/swc-win32-x64-msvc": "14.2.4" }, "peerDependencies": { "@opentelemetry/api": "^1.1.0", + "@playwright/test": "^1.41.2", "react": "^18.2.0", "react-dom": "^18.2.0", "sass": "^1.3.0" @@ -3628,6 +3688,9 @@ "@opentelemetry/api": { "optional": true }, + "@playwright/test": { + "optional": true + }, "sass": { "optional": true } @@ -3656,9 +3719,9 @@ } }, "node_modules/node-addon-api": { - "version": "6.0.0", - "resolved": "https://registry.npmjs.org/node-addon-api/-/node-addon-api-6.0.0.tgz", - "integrity": "sha512-GyHvgPvUXBvAkXa0YvYnhilSB1A+FRYMpIVggKzPZqdaZfevZOuzfWzyvgzOwRLHBeo/MMswmJFsrNF4Nw1pmA==", + "version": "6.1.0", + "resolved": "https://registry.npmjs.org/node-addon-api/-/node-addon-api-6.1.0.tgz", + "integrity": "sha512-+eawOlIgy680F0kBzPUNFhMZGtJ1YmqM6l4+Crf4IkImjYrO/mqPwRMh352g23uIaQKFItcQ64I7KMaJxHgAVA==", "dev": true }, "node_modules/node-releases": { @@ -4229,6 +4292,12 @@ } ] }, + "node_modules/queue-tick": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/queue-tick/-/queue-tick-1.0.1.tgz", + "integrity": "sha512-kJt5qhMxoszgU/62PLP1CJytzd2NKetjSRnyuj31fDd3Rlcz3fzlFdFLD1SItunPwyqEOkca6GbV612BWfaBag==", + "dev": true + }, "node_modules/rc": { "version": "1.2.8", "resolved": "https://registry.npmjs.org/rc/-/rc-1.2.8.tgz", @@ -4619,19 +4688,19 @@ } }, "node_modules/sharp": { - "version": "0.32.0", - "resolved": "https://registry.npmjs.org/sharp/-/sharp-0.32.0.tgz", - "integrity": "sha512-yLAypVcqj1toSAqRSwbs86nEzfyZVDYqjuUX8grhFpeij0DDNagKJXELS/auegDBRDg1XBtELdOGfo2X1cCpeA==", + "version": "0.32.6", + "resolved": "https://registry.npmjs.org/sharp/-/sharp-0.32.6.tgz", + "integrity": "sha512-KyLTWwgcR9Oe4d9HwCwNM2l7+J0dUQwn/yf7S0EnTtb0eVS4RxO0eUSvxPtzT4F3SY+C4K6fqdv/DO27sJ/v/w==", "dev": true, "hasInstallScript": true, "dependencies": { "color": "^4.2.3", - "detect-libc": "^2.0.1", - "node-addon-api": "^6.0.0", + "detect-libc": "^2.0.2", + "node-addon-api": "^6.1.0", "prebuild-install": "^7.1.1", - "semver": "^7.3.8", + "semver": "^7.5.4", "simple-get": "^4.0.1", - "tar-fs": "^2.1.1", + "tar-fs": "^3.0.4", "tunnel-agent": "^0.6.0" }, "engines": { @@ -4641,6 +4710,31 @@ "url": "https://opencollective.com/libvips" } }, + "node_modules/sharp/node_modules/tar-fs": { + "version": "3.0.6", + "resolved": "https://registry.npmjs.org/tar-fs/-/tar-fs-3.0.6.tgz", + "integrity": "sha512-iokBDQQkUyeXhgPYaZxmczGPhnhXZ0CmrqI+MOb/WFGS9DW5wnfrLgtjUJBvz50vQ3qfRwJ62QVoCFu8mPVu5w==", + "dev": true, + "dependencies": { + "pump": "^3.0.0", + "tar-stream": "^3.1.5" + }, + "optionalDependencies": { + "bare-fs": "^2.1.1", + "bare-path": "^2.1.0" + } + }, + "node_modules/sharp/node_modules/tar-stream": { + "version": "3.1.7", + "resolved": "https://registry.npmjs.org/tar-stream/-/tar-stream-3.1.7.tgz", + "integrity": "sha512-qJj60CXt7IU1Ffyc3NJMjh6EkuCFej46zUqJ4J7pqYlThyd9bO0XBTmcOIhSzZJVWfsLks0+nle/j538YAW9RQ==", + "dev": true, + "dependencies": { + "b4a": "^1.6.4", + "fast-fifo": "^1.2.0", + "streamx": "^2.15.0" + } + }, "node_modules/shebang-command": { "version": "2.0.0", "resolved": "https://registry.npmjs.org/shebang-command/-/shebang-command-2.0.0.tgz", @@ -4789,6 +4883,20 @@ "node": ">=10.0.0" } }, + "node_modules/streamx": { + "version": "2.18.0", + "resolved": "https://registry.npmjs.org/streamx/-/streamx-2.18.0.tgz", + "integrity": "sha512-LLUC1TWdjVdn1weXGcSxyTR3T4+acB6tVGXT95y0nGbca4t4o/ng1wKAGTljm9VicuCVLvRlqFYXYy5GwgM7sQ==", + "dev": true, + "dependencies": { + "fast-fifo": "^1.3.2", + "queue-tick": "^1.0.1", + "text-decoder": 
"^1.1.0" + }, + "optionalDependencies": { + "bare-events": "^2.2.0" + } + }, "node_modules/string_decoder": { "version": "1.3.0", "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.3.0.tgz", @@ -5100,6 +5208,15 @@ "node": ">=6" } }, + "node_modules/text-decoder": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/text-decoder/-/text-decoder-1.1.0.tgz", + "integrity": "sha512-TmLJNj6UgX8xcUZo4UDStGQtDiTzF7BzWlzn9g7UWrjkpHr5uJTK1ld16wZ3LXb2vb6jH8qU89dW5whuMdXYdw==", + "dev": true, + "dependencies": { + "b4a": "^1.6.4" + } + }, "node_modules/text-table": { "version": "0.2.0", "resolved": "https://registry.npmjs.org/text-table/-/text-table-0.2.0.tgz", @@ -5355,18 +5472,6 @@ "resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz", "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==" }, - "node_modules/watchpack": { - "version": "2.4.0", - "resolved": "https://registry.npmjs.org/watchpack/-/watchpack-2.4.0.tgz", - "integrity": "sha512-Lcvm7MGST/4fup+ifyKi2hjyIAwcdI4HRgtvTpIUxBRhB+RFtUh8XtDOxUfctVCnhVi+QQj49i91OyvzkJl6cg==", - "dependencies": { - "glob-to-regexp": "^0.4.1", - "graceful-fs": "^4.1.2" - }, - "engines": { - "node": ">=10.13.0" - } - }, "node_modules/which": { "version": "2.0.2", "resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz", @@ -5860,9 +5965,9 @@ } }, "@next/env": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/env/-/env-14.0.2.tgz", - "integrity": "sha512-HAW1sljizEaduEOes/m84oUqeIDAUYBR1CDwu2tobNlNDFP3cSm9d6QsOsGeNlIppU1p/p1+bWbYCbvwjFiceA==" + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/env/-/env-14.2.4.tgz", + "integrity": "sha512-3EtkY5VDkuV2+lNmKlbkibIJxcO4oIHEhBWne6PaAp+76J9KoSsGvNikp6ivzAT8dhhBMYrm6op2pS1ApG0Hzg==" }, "@next/eslint-plugin-next": { "version": "14.0.2", @@ -5874,57 +5979,57 @@ } }, "@next/swc-darwin-arm64": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-darwin-arm64/-/swc-darwin-arm64-14.0.2.tgz", - "integrity": "sha512-i+jQY0fOb8L5gvGvojWyZMfQoQtDVB2kYe7fufOEiST6sicvzI2W5/EXo4lX5bLUjapHKe+nFxuVv7BA+Pd7LQ==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-darwin-arm64/-/swc-darwin-arm64-14.2.4.tgz", + "integrity": "sha512-AH3mO4JlFUqsYcwFUHb1wAKlebHU/Hv2u2kb1pAuRanDZ7pD/A/KPD98RHZmwsJpdHQwfEc/06mgpSzwrJYnNg==", "optional": true }, "@next/swc-darwin-x64": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-darwin-x64/-/swc-darwin-x64-14.0.2.tgz", - "integrity": "sha512-zRCAO0d2hW6gBEa4wJaLn+gY8qtIqD3gYd9NjruuN98OCI6YyelmhWVVLlREjS7RYrm9OUQIp/iVJFeB6kP1hg==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-darwin-x64/-/swc-darwin-x64-14.2.4.tgz", + "integrity": "sha512-QVadW73sWIO6E2VroyUjuAxhWLZWEpiFqHdZdoQ/AMpN9YWGuHV8t2rChr0ahy+irKX5mlDU7OY68k3n4tAZTg==", "optional": true }, "@next/swc-linux-arm64-gnu": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-gnu/-/swc-linux-arm64-gnu-14.0.2.tgz", - "integrity": "sha512-tSJmiaon8YaKsVhi7GgRizZoV0N1Sx5+i+hFTrCKKQN7s3tuqW0Rov+RYdPhAv/pJl4qiG+XfSX4eJXqpNg3dA==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-gnu/-/swc-linux-arm64-gnu-14.2.4.tgz", + "integrity": "sha512-KT6GUrb3oyCfcfJ+WliXuJnD6pCpZiosx2X3k66HLR+DMoilRb76LpWPGb4tZprawTtcnyrv75ElD6VncVamUQ==", "optional": true }, "@next/swc-linux-arm64-musl": { - "version": "14.0.2", - "resolved": 
"https://registry.npmjs.org/@next/swc-linux-arm64-musl/-/swc-linux-arm64-musl-14.0.2.tgz", - "integrity": "sha512-dXJLMSEOwqJKcag1BeX1C+ekdPPJ9yXbWIt3nAadhbLx5CjACoB2NQj9Xcqu2tmdr5L6m34fR+fjGPs+ZVPLzA==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-linux-arm64-musl/-/swc-linux-arm64-musl-14.2.4.tgz", + "integrity": "sha512-Alv8/XGSs/ytwQcbCHwze1HmiIkIVhDHYLjczSVrf0Wi2MvKn/blt7+S6FJitj3yTlMwMxII1gIJ9WepI4aZ/A==", "optional": true }, "@next/swc-linux-x64-gnu": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-linux-x64-gnu/-/swc-linux-x64-gnu-14.0.2.tgz", - "integrity": "sha512-WC9KAPSowj6as76P3vf1J3mf2QTm3Wv3FBzQi7UJ+dxWjK3MhHVWsWUo24AnmHx9qDcEtHM58okgZkXVqeLB+Q==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-linux-x64-gnu/-/swc-linux-x64-gnu-14.2.4.tgz", + "integrity": "sha512-ze0ShQDBPCqxLImzw4sCdfnB3lRmN3qGMB2GWDRlq5Wqy4G36pxtNOo2usu/Nm9+V2Rh/QQnrRc2l94kYFXO6Q==", "optional": true }, "@next/swc-linux-x64-musl": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-linux-x64-musl/-/swc-linux-x64-musl-14.0.2.tgz", - "integrity": "sha512-KSSAwvUcjtdZY4zJFa2f5VNJIwuEVnOSlqYqbQIawREJA+gUI6egeiRu290pXioQXnQHYYdXmnVNZ4M+VMB7KQ==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-linux-x64-musl/-/swc-linux-x64-musl-14.2.4.tgz", + "integrity": "sha512-8dwC0UJoc6fC7PX70csdaznVMNr16hQrTDAMPvLPloazlcaWfdPogq+UpZX6Drqb1OBlwowz8iG7WR0Tzk/diQ==", "optional": true }, "@next/swc-win32-arm64-msvc": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-win32-arm64-msvc/-/swc-win32-arm64-msvc-14.0.2.tgz", - "integrity": "sha512-2/O0F1SqJ0bD3zqNuYge0ok7OEWCQwk55RPheDYD0va5ij7kYwrFkq5ycCRN0TLjLfxSF6xI5NM6nC5ux7svEQ==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-win32-arm64-msvc/-/swc-win32-arm64-msvc-14.2.4.tgz", + "integrity": "sha512-jxyg67NbEWkDyvM+O8UDbPAyYRZqGLQDTPwvrBBeOSyVWW/jFQkQKQ70JDqDSYg1ZDdl+E3nkbFbq8xM8E9x8A==", "optional": true }, "@next/swc-win32-ia32-msvc": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-win32-ia32-msvc/-/swc-win32-ia32-msvc-14.0.2.tgz", - "integrity": "sha512-vJI/x70Id0oN4Bq/R6byBqV1/NS5Dl31zC+lowO8SDu1fHmUxoAdILZR5X/sKbiJpuvKcCrwbYgJU8FF/Gh50Q==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-win32-ia32-msvc/-/swc-win32-ia32-msvc-14.2.4.tgz", + "integrity": "sha512-twrmN753hjXRdcrZmZttb/m5xaCBFa48Dt3FbeEItpJArxriYDunWxJn+QFXdJ3hPkm4u7CKxncVvnmgQMY1ag==", "optional": true }, "@next/swc-win32-x64-msvc": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/@next/swc-win32-x64-msvc/-/swc-win32-x64-msvc-14.0.2.tgz", - "integrity": "sha512-Ut4LXIUvC5m8pHTe2j0vq/YDnTEyq6RSR9vHYPqnELrDapPhLNz9Od/L5Ow3J8RNDWpEnfCiQXuVdfjlNEJ7ug==", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/@next/swc-win32-x64-msvc/-/swc-win32-x64-msvc-14.2.4.tgz", + "integrity": "sha512-tkLrjBzqFTP8DVrAAQmZelEahfR9OxWpFR++vAI9FBhCiIxtwHwBHC23SBHCTURBtwB4kc/x44imVOnkKGNVGg==", "optional": true }, "@nodelib/fs.scandir": { @@ -5974,18 +6079,24 @@ "lodash.deburr": "^4.1.0" } }, + "@swc/counter": { + "version": "0.1.3", + "resolved": "https://registry.npmjs.org/@swc/counter/-/counter-0.1.3.tgz", + "integrity": "sha512-e2BR4lsJkkRlKZ/qCHPw9ZaSxc0MVUd7gtbtaB7aMvHeJVYe8sOB8DBZkP2DtISHGSku9sCK6T6cnY0CtXrOCQ==" + }, "@swc/helpers": { - "version": "0.5.2", - "resolved": "https://registry.npmjs.org/@swc/helpers/-/helpers-0.5.2.tgz", - 
"integrity": "sha512-E4KcWTpoLHqwPHLxidpOqQbcrZVgi0rsmmZXUle1jXmJfuIf/UWpczUJ7MZZ5tlxytgJXyp0w4PGkkeLiuIdZw==", + "version": "0.5.5", + "resolved": "https://registry.npmjs.org/@swc/helpers/-/helpers-0.5.5.tgz", + "integrity": "sha512-KGYxvIOXcceOAbEk4bi/dVLEK9z8sZ0uBB3Il5b1rhfClSpcX0yfRO0KmTkqR2cnQDymwLB+25ZyMzICg/cm/A==", "requires": { + "@swc/counter": "^0.1.3", "tslib": "^2.4.0" }, "dependencies": { "tslib": { - "version": "2.6.2", - "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.6.2.tgz", - "integrity": "sha512-AEYxH93jGFPn/a2iVAwW87VuUIkR1FVUKB77NwMF7nBTDkDrrT/Hpt/IrCJ0QXhW27jTBDcf5ZY7w6RiqTMw2Q==" + "version": "2.6.3", + "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.6.3.tgz", + "integrity": "sha512-xNvxJEOUiWPGhUuUdQgAJPKOOJfGnIyKySOc09XkKsgdUV/3E2zvwZYdejjmRgPCgcym1juLH3226yA7sEFJKQ==" } } }, @@ -6319,11 +6430,63 @@ "dequal": "^2.0.3" } }, + "b4a": { + "version": "1.6.6", + "resolved": "https://registry.npmjs.org/b4a/-/b4a-1.6.6.tgz", + "integrity": "sha512-5Tk1HLk6b6ctmjIkAcU/Ujv/1WqiDl0F0JdRCR80VsOcUlHcu7pWeWRlOqQLHfDEsVx9YH/aif5AG4ehoCtTmg==", + "dev": true + }, "balanced-match": { "version": "1.0.2", "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz", "integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==" }, + "bare-events": { + "version": "2.4.2", + "resolved": "https://registry.npmjs.org/bare-events/-/bare-events-2.4.2.tgz", + "integrity": "sha512-qMKFd2qG/36aA4GwvKq8MxnPgCQAmBWmSyLWsJcbn8v03wvIPQ/hG1Ms8bPzndZxMDoHpxez5VOS+gC9Yi24/Q==", + "dev": true, + "optional": true + }, + "bare-fs": { + "version": "2.3.1", + "resolved": "https://registry.npmjs.org/bare-fs/-/bare-fs-2.3.1.tgz", + "integrity": "sha512-W/Hfxc/6VehXlsgFtbB5B4xFcsCl+pAh30cYhoFyXErf6oGrwjh8SwiPAdHgpmWonKuYpZgGywN0SXt7dgsADA==", + "dev": true, + "optional": true, + "requires": { + "bare-events": "^2.0.0", + "bare-path": "^2.0.0", + "bare-stream": "^2.0.0" + } + }, + "bare-os": { + "version": "2.4.0", + "resolved": "https://registry.npmjs.org/bare-os/-/bare-os-2.4.0.tgz", + "integrity": "sha512-v8DTT08AS/G0F9xrhyLtepoo9EJBJ85FRSMbu1pQUlAf6A8T0tEEQGMVObWeqpjhSPXsE0VGlluFBJu2fdoTNg==", + "dev": true, + "optional": true + }, + "bare-path": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/bare-path/-/bare-path-2.1.3.tgz", + "integrity": "sha512-lh/eITfU8hrj9Ru5quUp0Io1kJWIk1bTjzo7JH1P5dWmQ2EL4hFUlfI8FonAhSlgIfhn63p84CDY/x+PisgcXA==", + "dev": true, + "optional": true, + "requires": { + "bare-os": "^2.1.0" + } + }, + "bare-stream": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/bare-stream/-/bare-stream-2.1.3.tgz", + "integrity": "sha512-tiDAH9H/kP+tvNO5sczyn9ZAA7utrSMobyDchsnyyXBuUe2FSQWbxhtuHB8jwpHYYevVo2UJpcmvvjrbHboUUQ==", + "dev": true, + "optional": true, + "requires": { + "streamx": "^2.18.0" + } + }, "base64-js": { "version": "1.5.1", "resolved": "https://registry.npmjs.org/base64-js/-/base64-js-1.5.1.tgz", @@ -6361,11 +6524,11 @@ } }, "braces": { - "version": "3.0.2", - "resolved": "https://registry.npmjs.org/braces/-/braces-3.0.2.tgz", - "integrity": "sha512-b8um+L1RzM3WDSzvhm6gIz1yfTbBt6YTlcEKAvsmqCZZFw46z626lVj9j1yEPW33H5H+lBQpZMP1k8l+78Ha0A==", + "version": "3.0.3", + "resolved": "https://registry.npmjs.org/braces/-/braces-3.0.3.tgz", + "integrity": "sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA==", "requires": { - "fill-range": "^7.0.1" + "fill-range": "^7.1.1" } }, "bright": { @@ -6429,9 +6592,9 @@ "integrity": 
"sha512-QOSvevhslijgYwRx6Rv7zKdMF8lbRmx+uQGx2+vDc+KI/eBnsy9kit5aj23AgGu3pa4t9AgwbnXWqS+iOY+2aA==" }, "caniuse-lite": { - "version": "1.0.30001519", - "resolved": "https://registry.npmjs.org/caniuse-lite/-/caniuse-lite-1.0.30001519.tgz", - "integrity": "sha512-0QHgqR+Jv4bxHMp8kZ1Kn8CH55OikjKJ6JmKkZYP1F3D7w+lnFXF70nG5eNfsZS89jadi5Ywy5UCSKLAglIRkg==" + "version": "1.0.30001640", + "resolved": "https://registry.npmjs.org/caniuse-lite/-/caniuse-lite-1.0.30001640.tgz", + "integrity": "sha512-lA4VMpW0PSUrFnkmVuEKBUovSWKhj7puyCg8StBChgu298N1AtuF1sKWEvfDuimSEDbhlb/KqPKC3fs1HbuQUA==" }, "chalk": { "version": "4.1.2", @@ -6615,9 +6778,9 @@ "dev": true }, "detect-libc": { - "version": "2.0.1", - "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.0.1.tgz", - "integrity": "sha512-463v3ZeIrcWtdgIg6vI6XUncguvr2TnGl4SzDXinkt9mSLpBJKXT3mW6xT3VQdDN11+WVs29pgvivTc4Lp8v+w==", + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.0.3.tgz", + "integrity": "sha512-bwy0MGW55bG41VqxxypOsdSdGqLwXPI/focwgTYCFMbdUiBAxLg9CFzG08sz2aqzknwiX7Hkl0bQENjg8iLByw==", "dev": true }, "didyoumean": { @@ -7195,6 +7358,12 @@ "resolved": "https://registry.npmjs.org/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz", "integrity": "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q==" }, + "fast-fifo": { + "version": "1.3.2", + "resolved": "https://registry.npmjs.org/fast-fifo/-/fast-fifo-1.3.2.tgz", + "integrity": "sha512-/d9sfos4yxzpwkDkuN7k2SqFKtYNmCTzgfEpz82x34IM9/zc8KGxQoXg1liNC/izpRM/MBdt44Nmx41ZWqk+FQ==", + "dev": true + }, "fast-glob": { "version": "3.3.2", "resolved": "https://registry.npmjs.org/fast-glob/-/fast-glob-3.3.2.tgz", @@ -7247,9 +7416,9 @@ } }, "fill-range": { - "version": "7.0.1", - "resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.0.1.tgz", - "integrity": "sha512-qOo9F+dMUmC2Lcb4BbVvnKJxTPjCm+RRpe4gDuGrzkL7mEVl/djYSu2OdQ2Pa302N4oqkSg9ir6jaLWJ2USVpQ==", + "version": "7.1.1", + "resolved": "https://registry.npmjs.org/fill-range/-/fill-range-7.1.1.tgz", + "integrity": "sha512-YsGpe3WHLK8ZYi4tWDg2Jy3ebRz2rXowDxnld4bkQB00cc/1Zw9AWnC0i9ztDJitivtQvaI9KaLyKrc+hBW0yg==", "requires": { "to-regex-range": "^5.0.1" } @@ -7409,11 +7578,6 @@ "is-glob": "^4.0.3" } }, - "glob-to-regexp": { - "version": "0.4.1", - "resolved": "https://registry.npmjs.org/glob-to-regexp/-/glob-to-regexp-0.4.1.tgz", - "integrity": "sha512-lkX1HJXwyMcprw/5YUZc2s7DrpAiHB21/V+E1rHUrVNokkvB6bqMzT0VfV6/86ZNabt1k14YOIaT7nDvOX3Iiw==" - }, "globals": { "version": "13.17.0", "resolved": "https://registry.npmjs.org/globals/-/globals-13.17.0.tgz", @@ -8101,26 +8265,26 @@ } }, "next": { - "version": "14.0.2", - "resolved": "https://registry.npmjs.org/next/-/next-14.0.2.tgz", - "integrity": "sha512-jsAU2CkYS40GaQYOiLl9m93RTv2DA/tTJ0NRlmZIBIL87YwQ/xR8k796z7IqgM3jydI8G25dXvyYMC9VDIevIg==", - "requires": { - "@next/env": "14.0.2", - "@next/swc-darwin-arm64": "14.0.2", - "@next/swc-darwin-x64": "14.0.2", - "@next/swc-linux-arm64-gnu": "14.0.2", - "@next/swc-linux-arm64-musl": "14.0.2", - "@next/swc-linux-x64-gnu": "14.0.2", - "@next/swc-linux-x64-musl": "14.0.2", - "@next/swc-win32-arm64-msvc": "14.0.2", - "@next/swc-win32-ia32-msvc": "14.0.2", - "@next/swc-win32-x64-msvc": "14.0.2", - "@swc/helpers": "0.5.2", + "version": "14.2.4", + "resolved": "https://registry.npmjs.org/next/-/next-14.2.4.tgz", + "integrity": "sha512-R8/V7vugY+822rsQGQCjoLhMuC9oFj9SOi4Cl4b2wjDrseD0LRZ10W7R6Czo4w9ZznVSshKjuIomsRjvm9EKJQ==", + "requires": { + "@next/env": "14.2.4", + 
"@next/swc-darwin-arm64": "14.2.4", + "@next/swc-darwin-x64": "14.2.4", + "@next/swc-linux-arm64-gnu": "14.2.4", + "@next/swc-linux-arm64-musl": "14.2.4", + "@next/swc-linux-x64-gnu": "14.2.4", + "@next/swc-linux-x64-musl": "14.2.4", + "@next/swc-win32-arm64-msvc": "14.2.4", + "@next/swc-win32-ia32-msvc": "14.2.4", + "@next/swc-win32-x64-msvc": "14.2.4", + "@swc/helpers": "0.5.5", "busboy": "1.6.0", - "caniuse-lite": "^1.0.30001406", + "caniuse-lite": "^1.0.30001579", + "graceful-fs": "^4.2.11", "postcss": "8.4.31", - "styled-jsx": "5.1.1", - "watchpack": "2.4.0" + "styled-jsx": "5.1.1" } }, "next-themes": { @@ -8139,9 +8303,9 @@ } }, "node-addon-api": { - "version": "6.0.0", - "resolved": "https://registry.npmjs.org/node-addon-api/-/node-addon-api-6.0.0.tgz", - "integrity": "sha512-GyHvgPvUXBvAkXa0YvYnhilSB1A+FRYMpIVggKzPZqdaZfevZOuzfWzyvgzOwRLHBeo/MMswmJFsrNF4Nw1pmA==", + "version": "6.1.0", + "resolved": "https://registry.npmjs.org/node-addon-api/-/node-addon-api-6.1.0.tgz", + "integrity": "sha512-+eawOlIgy680F0kBzPUNFhMZGtJ1YmqM6l4+Crf4IkImjYrO/mqPwRMh352g23uIaQKFItcQ64I7KMaJxHgAVA==", "dev": true }, "node-releases": { @@ -8477,6 +8641,12 @@ "resolved": "https://registry.npmjs.org/queue-microtask/-/queue-microtask-1.2.3.tgz", "integrity": "sha512-NuaNSa6flKT5JaSYQzJok04JzTL1CA6aGhv5rfLW3PgqA+M2ChpZQnAC8h8i4ZFkBS8X5RqkDBHA7r4hej3K9A==" }, + "queue-tick": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/queue-tick/-/queue-tick-1.0.1.tgz", + "integrity": "sha512-kJt5qhMxoszgU/62PLP1CJytzd2NKetjSRnyuj31fDd3Rlcz3fzlFdFLD1SItunPwyqEOkca6GbV612BWfaBag==", + "dev": true + }, "rc": { "version": "1.2.8", "resolved": "https://registry.npmjs.org/rc/-/rc-1.2.8.tgz", @@ -8747,19 +8917,44 @@ } }, "sharp": { - "version": "0.32.0", - "resolved": "https://registry.npmjs.org/sharp/-/sharp-0.32.0.tgz", - "integrity": "sha512-yLAypVcqj1toSAqRSwbs86nEzfyZVDYqjuUX8grhFpeij0DDNagKJXELS/auegDBRDg1XBtELdOGfo2X1cCpeA==", + "version": "0.32.6", + "resolved": "https://registry.npmjs.org/sharp/-/sharp-0.32.6.tgz", + "integrity": "sha512-KyLTWwgcR9Oe4d9HwCwNM2l7+J0dUQwn/yf7S0EnTtb0eVS4RxO0eUSvxPtzT4F3SY+C4K6fqdv/DO27sJ/v/w==", "dev": true, "requires": { "color": "^4.2.3", - "detect-libc": "^2.0.1", - "node-addon-api": "^6.0.0", + "detect-libc": "^2.0.2", + "node-addon-api": "^6.1.0", "prebuild-install": "^7.1.1", - "semver": "^7.3.8", + "semver": "^7.5.4", "simple-get": "^4.0.1", - "tar-fs": "^2.1.1", + "tar-fs": "^3.0.4", "tunnel-agent": "^0.6.0" + }, + "dependencies": { + "tar-fs": { + "version": "3.0.6", + "resolved": "https://registry.npmjs.org/tar-fs/-/tar-fs-3.0.6.tgz", + "integrity": "sha512-iokBDQQkUyeXhgPYaZxmczGPhnhXZ0CmrqI+MOb/WFGS9DW5wnfrLgtjUJBvz50vQ3qfRwJ62QVoCFu8mPVu5w==", + "dev": true, + "requires": { + "bare-fs": "^2.1.1", + "bare-path": "^2.1.0", + "pump": "^3.0.0", + "tar-stream": "^3.1.5" + } + }, + "tar-stream": { + "version": "3.1.7", + "resolved": "https://registry.npmjs.org/tar-stream/-/tar-stream-3.1.7.tgz", + "integrity": "sha512-qJj60CXt7IU1Ffyc3NJMjh6EkuCFej46zUqJ4J7pqYlThyd9bO0XBTmcOIhSzZJVWfsLks0+nle/j538YAW9RQ==", + "dev": true, + "requires": { + "b4a": "^1.6.4", + "fast-fifo": "^1.2.0", + "streamx": "^2.15.0" + } + } } }, "shebang-command": { @@ -8866,6 +9061,18 @@ "resolved": "https://registry.npmjs.org/streamsearch/-/streamsearch-1.1.0.tgz", "integrity": "sha512-Mcc5wHehp9aXz1ax6bZUyY5afg9u2rv5cqQI3mRrYkGC8rW2hM02jWuwjtL++LS5qinSyhj2QfLyNsuc+VsExg==" }, + "streamx": { + "version": "2.18.0", + "resolved": "https://registry.npmjs.org/streamx/-/streamx-2.18.0.tgz", + 
"integrity": "sha512-LLUC1TWdjVdn1weXGcSxyTR3T4+acB6tVGXT95y0nGbca4t4o/ng1wKAGTljm9VicuCVLvRlqFYXYy5GwgM7sQ==", + "dev": true, + "requires": { + "bare-events": "^2.2.0", + "fast-fifo": "^1.3.2", + "queue-tick": "^1.0.1", + "text-decoder": "^1.1.0" + } + }, "string_decoder": { "version": "1.3.0", "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.3.0.tgz", @@ -9086,6 +9293,15 @@ "readable-stream": "^3.1.1" } }, + "text-decoder": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/text-decoder/-/text-decoder-1.1.0.tgz", + "integrity": "sha512-TmLJNj6UgX8xcUZo4UDStGQtDiTzF7BzWlzn9g7UWrjkpHr5uJTK1ld16wZ3LXb2vb6jH8qU89dW5whuMdXYdw==", + "dev": true, + "requires": { + "b4a": "^1.6.4" + } + }, "text-table": { "version": "0.2.0", "resolved": "https://registry.npmjs.org/text-table/-/text-table-0.2.0.tgz", @@ -9267,15 +9483,6 @@ "resolved": "https://registry.npmjs.org/util-deprecate/-/util-deprecate-1.0.2.tgz", "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==" }, - "watchpack": { - "version": "2.4.0", - "resolved": "https://registry.npmjs.org/watchpack/-/watchpack-2.4.0.tgz", - "integrity": "sha512-Lcvm7MGST/4fup+ifyKi2hjyIAwcdI4HRgtvTpIUxBRhB+RFtUh8XtDOxUfctVCnhVi+QQj49i91OyvzkJl6cg==", - "requires": { - "glob-to-regexp": "^0.4.1", - "graceful-fs": "^4.1.2" - } - }, "which": { "version": "2.0.2", "resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz", diff --git a/src/components/BlogLayout.jsx b/src/components/BlogLayout.jsx index de4db51bd..d29b251ad 100644 --- a/src/components/BlogLayout.jsx +++ b/src/components/BlogLayout.jsx @@ -82,8 +82,8 @@ export default async function BlogLayout({ children, href, frontmatter, ast }) { {title} )} + @@ -99,7 +99,6 @@ export default async function BlogLayout({ children, href, frontmatter, ast }) { indiehackers={frontmatter.indiehackers} /> - diff --git a/src/components/Spaces.jsx b/src/components/Spaces.jsx index 5080f5231..01a35368a 100644 --- a/src/components/Spaces.jsx +++ b/src/components/Spaces.jsx @@ -8,7 +8,7 @@ const spaces = [ { name: 'Learn', href: '/learn' }, { name: 'Node', href: '/node' }, { name: 'Help Center', href: '/support' }, - { name: 'Blog', href: '/blog' }, + { name: 'Eng Blog', href: '/blog' }, ] export function TopLevelLink({ href, className, current, children }) { diff --git a/src/components/TableOfContents.jsx b/src/components/TableOfContents.jsx index 6e71aa193..223621b66 100644 --- a/src/components/TableOfContents.jsx +++ b/src/components/TableOfContents.jsx @@ -115,15 +115,19 @@ export default function TableOfContents({ tableOfContents, routeGroup }) { )} -
-
-        Edit this page on GitHub →
-
+      {routeGroup != "(blog)" ?
+
+ + : ""} ) } diff --git a/utils/import-blogs.py b/utils/import-blogs.py new file mode 100644 index 000000000..9e7330e7a --- /dev/null +++ b/utils/import-blogs.py @@ -0,0 +1,221 @@ +#!/usr/bin/env python3 + +import os +import sys +import yaml +import hashlib +import requests + +from dateutil.parser import parse as dateparse +from bs4 import BeautifulSoup +from markdownify import MarkdownConverter, ATX + +MAX_AUTO_DESCRIPTION_LENGTH = 300 + + +def download(url): + print("downloading", url, file=sys.stderr) + resp = requests.get(url) + if resp.status_code == 404: + print("error 404: %r" % url, file=sys.stderr) + return b"" + resp.raise_for_status() + return resp.content + + +def cached_download(url): + cache = os.path.join("cache", hashlib.sha256(url.encode("utf8")).hexdigest()) + if not os.path.exists(cache): + os.makedirs("cache", exist_ok=True) + data = download(url) + with open(cache, "wb") as fh: + fh.write(data) + with open(cache + ".url", "wb") as fh: + fh.write(url.encode("utf8")) + with open(cache, "rb") as fh: + return fh.read() + + +def filtered_cached_download(url): + if any( + url.lower().startswith(prefix) + for prefix in ( + "https://www.storj.io/blog/", + "https://www.storj.io/category/", + "https://www.storj.io/blog-posts/", + "https://blog.storj.io/")): + return cached_download(url) + return "" + + +def handleblog(url, parsed): + if url not in ( + # definite + "https://www.storj.io/blog/go-integration-tests-with-postgres", + "https://www.storj.io/blog/finding-and-tracking-resource-leaks-in-go", + "https://www.storj.io/blog/production-concurrency", + "https://www.storj.io/blog/finding-goroutine-leaks-in-tests", + "https://www.storj.io/blog/demystifying-technical-debt", + "https://www.storj.io/blog/a-tale-of-two-copies", + "https://www.storj.io/blog/introducing-drpc-our-replacement-for-grpc", + + # maybe + "https://www.storj.io/blog/how-developers-can-easily-connect-storj-to-compute-for-presigned-urls", + "https://www.storj.io/blog/how-to-generate-presigned-urls-for-temporary-object-access", + "https://www.storj.io/blog/storj-open-development-part-2-whats-new", + "https://www.storj.io/blog/use-storj-dcs-from-cloud-native-environments-using-sidecar-pattern", + "https://www.storj.io/blog/cloud-based-mutlimedia-library-transformation", + "https://www.storj.io/blog/february-2022-product-update", + "https://www.storj.io/blog/january-2021-product-update", + "https://www.storj.io/blog/december-2021-storj-product-update", + "https://www.storj.io/blog/the-complexity-of-amazon-s3-and-the-simplicity-of-decentralization", + "https://www.storj.io/blog/november-2021-storj-product-update", + "https://www.storj.io/blog/storj-open-development-announcement", + "https://www.storj.io/blog/the-10-most-common-questions-about-decentralized-cloud-storage", + "https://www.storj.io/blog/september-2021-development-update", + "https://www.storj.io/blog/using-storj-dcs-with-github-actions", + "https://www.storj.io/blog/open-source-and-open-data-storj-dcs-network-statistics", + "https://www.storj.io/blog/august-2021-development-update-from-storj", + "https://www.storj.io/blog/july-2021-development-update-from-storj", + "https://www.storj.io/blog/automatically-store-your-tesla-sentry-mode-and-dashcam-videos-on-the-decentralized-cloud", + "https://www.storj.io/blog/june-2021-development-update", + "https://www.storj.io/blog/may-2021-development-update-from-storj", + "https://www.storj.io/blog/what-is-end-to-end-encryption", + "https://www.storj.io/blog/product-development-update-april-2021", + 
"https://www.storj.io/blog/december-2020-development-update-from-storj-labs", + "https://www.storj.io/blog/november-2020-development-update-from-storj-labs", + "https://www.storj.io/blog/visualizing-decentralized-data-distribution-with-the-linkshare-object-map", + "https://www.storj.io/blog/october-2020-development-update-from-storj-labs", + "https://www.storj.io/blog/integrating-decentralized-cloud-storage-with-duplicati", + "https://www.storj.io/blog/choosing-cockroach-db-for-horizontal-scalability", + "https://www.storj.io/blog/july-2020-development-update-from-storj-labs", + "https://www.storj.io/blog/development-update-37-from-storj-labs", + "https://www.storj.io/blog/development-update-36-from-storj-labs", + "https://www.storj.io/blog/development-update-35-from-storj-labs", + "https://www.storj.io/blog/development-update-34-from-storj-labs", + "https://www.storj.io/blog/development-update-33-from-storj-labs", + "https://www.storj.io/blog/announcing-pioneer-2-and-tardigrade-io-pricing", + "https://www.storj.io/blog/development-update-32-from-storj-labs", + "https://www.storj.io/blog/development-update-31-from-storj-labs", + "https://www.storj.io/blog/development-update-30-from-storj-labs", + "https://www.storj.io/blog/storage-nodes-are-now-supported-on-windows-home", + "https://www.storj.io/blog/development-update-29-from-storj-labs", + "https://www.storj.io/blog/development-update-28-from-storj-labs", + "https://www.storj.io/blog/announcing-beta-pioneer-1-v3-and-tardigrade-are-here", + "https://www.storj.io/blog/development-update-27-from-storj-labs", + "https://www.storj.io/blog/development-update-26-from-storj-labs", + "https://www.storj.io/blog/announcing-beacon-alpha-file-sharing-ip-filtering-and-increased-performance", + "https://www.storj.io/blog/development-update-25-from-storj-labs", + "https://www.storj.io/blog/development-update-24-from-storj-labs", + "https://www.storj.io/blog/coordination-avoidance-on-the-storj-network", + "https://www.storj.io/blog/development-update-23-from-storj-labs", + "https://www.storj.io/blog/flexible-file-sharing-with-macaroons", + "https://www.storj.io/blog/development-update-22-from-storj-labs", + "https://www.storj.io/blog/what-storage-node-operators-need-to-know-about-satellites", + "https://www.storj.io/blog/what-happens-when-you-upload-a-file-to-a-decentralized-network", + "https://www.storj.io/blog/development-update-21-from-storj-labs", + "https://www.storj.io/blog/developers-and-v3-network-make-first-contact-with-vanguard-alpha", + "https://www.storj.io/blog/development-update-20-from-storj-labs", + "https://www.storj.io/blog/our-3-step-interview-process-for-engineering-candidates", + "https://www.storj.io/blog/development-update-19-from-storj-labs", + "https://www.storj.io/blog/development-update-18-from-storj-labs", + "https://www.storj.io/blog/so-youre-ready-for-your-first-payday-as-a-storage-node-operator", + "https://www.storj.io/blog/development-update-17-from-storj-labs", + "https://www.storj.io/blog/product-manager-development-update-16", + "https://www.storj.io/blog/announcing-the-storj-v3-explorer-release", + "https://www.storj.io/blog/product-manager-development-update-15", + "https://www.storj.io/blog/product-manager-development-update-14", + "https://www.storj.io/blog/product-manager-development-update-13", + "https://www.storj.io/blog/decentralized-auditing-and-repair-the-low-key-life-of-data-resurrection", + "https://www.storj.io/blog/product-manager-development-update-12", + 
"https://www.storj.io/blog/product-manager-development-update-11", + "https://www.storj.io/blog/security-and-encryption-on-the-v3-network", + "https://www.storj.io/blog/replication-is-bad-for-decentralized-storage-part-1-erasure-codes-for-fun-and-profit", + "https://www.storj.io/blog/product-manager-development-update-10", + "https://www.storj.io/blog/the-benefits-of-decentralization-go-far-beyond-ideology", + "https://www.storj.io/blog/introducing-the-storj-v3-white-paper", + "https://www.storj.io/blog/product-manager-development-update-9", + "https://www.storj.io/blog/product-manager-development-update-8", + "https://www.storj.io/blog/product-manager-development-update-7", + "https://www.storj.io/blog/product-manager-development-update-6", + "https://www.storj.io/blog/product-manager-development-update-5", + "https://www.storj.io/blog/product-manager-development-update-4", + "https://www.storj.io/blog/product-manager-development-update-3", + "https://www.storj.io/blog/product-manager-development-update-2", + "https://www.storj.io/blog/product-manager-development-update-1", + "https://www.storj.io/blog/a-look-at-storj-labs-decentralized-cloud-storage-architecture-with-jt-olio", + "https://www.storj.io/blog/lensm", + ): + return + + slug = url.removeprefix("https://www.storj.io/blog/") + os.makedirs(os.path.join("output", slug), exist_ok=True) + + def persist_image(url): + if not url.strip(): return "" + image_data = cached_download(url) + filename = (hashlib.sha256(url.encode("utf8")).hexdigest()[:16] + "." + + os.path.basename(url).split(".")[-1]) + with open(os.path.join("output", slug, filename), "wb") as fh: + fh.write(image_data) + return "./" + filename + + date = dateparse(parsed.find_all("div", class_="blog-details")[0].string or "1970-01-01") + author = parsed.find_all("div", class_="blog-author")[0].string or "No Author" + title = parsed.find_all("h1", class_="blog-post-title")[0].string + hero_image = persist_image(parsed.find_all("img", class_="blog-hero-image")[0].get("src")) + description = parsed.find_all("meta", property="og:description")[0].get("content") + blog_copy = parsed.find_all("div", class_="blog-copy")[0] + if not description.strip(): + description = blog_copy.text + if len(description) > MAX_AUTO_DESCRIPTION_LENGTH: + description = description[:MAX_AUTO_DESCRIPTION_LENGTH-3] + "..." 
+ + for image in blog_copy.find_all("img"): + image["src"] = persist_image(image.get("src")) + + content = MarkdownConverter(heading_style=ATX).convert_soup(blog_copy) + + with open(os.path.join("output", slug, "page.md"), "wb") as fh: + fh.write("---\n{frontmatter}\n---\n\n{content}".format( + frontmatter=yaml.safe_dump({ + "layout": "blog", + "title": str(title), + "date": str(date), + "author": {"name": str(author)}, + "heroimage": str(hero_image), + "metadata": { + "title": str(title), + "description": str(description), + }, + }), + content=content).encode("utf8")) + + +def main(): + seen = set() + queue = ["https://www.storj.io/blog/"] + while queue: + url = queue.pop(0) + if url in seen: continue + seen.add(url) + text = filtered_cached_download(url) + parsed = BeautifulSoup(text, "html.parser") + if (text and url.startswith("https://www.storj.io/blog/") and + len(url) > len("https://www.storj.io/blog/")): + handleblog(url, parsed) + links = parsed.find_all("a") + for link in links: + href = link.get("href") + if not href: continue + if href.startswith("/"): + href = "https://www.storj.io" + href + if href.startswith("?"): + href = url.split("?")[0] + href + if not href.startswith("http://") and not href.startswith("https://"): + continue + queue.append(href) + + +if __name__ == "__main__": + main()
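Not part of the diff, but for orientation: a sketch of how `utils/import-blogs.py` is meant to be driven, assuming Python 3.9+ (for `str.removeprefix`) and the third-party packages the script imports (`requests`, `pyyaml`, `python-dateutil`, `beautifulsoup4`, `markdownify`) are installed. The slug below is one of the allow-listed posts; everything else mirrors paths used in the script.

```python
# Hypothetical driver session: run the importer from the repo root, then
# inspect what it produced for one allow-listed post.
import os
import subprocess

subprocess.run(["python3", "utils/import-blogs.py"], check=True)

# Each imported post lands in output/<slug>/ as page.md plus its images,
# which persist_image names <first 16 hex chars of sha256(image url)>.<ext>.
slug = "a-tale-of-two-copies"
for name in sorted(os.listdir(os.path.join("output", slug))):
    print(name)
```

From there the generated directories can be copied under `app/(blog)/blog/`, which is where this diff places the imported posts.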
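Similarly, a minimal sketch of the front matter `handleblog` serializes, with placeholder values: `yaml.safe_dump` sorts keys alphabetically (so the generated files open with `author:` rather than `layout:`), and its output already ends in a newline, which is why the generated `page.md` files have a blank line before the closing `---`.

```python
# Minimal reproduction of the page.md header format, using placeholder values.
import yaml

header = "---\n{frontmatter}\n---\n\n".format(
    frontmatter=yaml.safe_dump({
        "layout": "blog",
        "title": "Example Post",
        "date": "1970-01-01 00:00:00",
        "author": {"name": "Example Author"},
        "heroimage": "./0123456789abcdef.png",
        "metadata": {
            "title": "Example Post",
            "description": "Example description.",
        },
    }))
print(header)
```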