From A Tour of Go, concurrency exercise #10.
```sh
$> go run *.go
```
- In `main.go` we create a `stateManager` struct, which holds 3 things: a. a `fetched` map, so we can check whether we've already fetched a URL in O(1) time, b. a `waitGroup` that will `Add(1)` when we start fetching a URL and `Done()` when we finish fetching it, and c. an `rwMutex` so we can safely read from and write to the `fetched` map from many goroutines concurrently.
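As a sketch, the struct described above might look like the following (the field names here are assumptions drawn from this list, not necessarily the repo's exact identifiers):

```go
package main

import (
	"fmt"
	"sync"
)

// stateManager bundles the crawler's shared state: the fetched set,
// the WaitGroup, and the RWMutex that guards the set.
type stateManager struct {
	fetched   map[string]bool // O(1) lookup: have we fetched this URL?
	waitGroup sync.WaitGroup  // Add(1) when a fetch starts, Done() when it finishes
	rwMutex   sync.RWMutex    // guards concurrent reads/writes of fetched
}

// newStateManager initializes the map so callers can use it immediately.
func newStateManager() *stateManager {
	return &stateManager{fetched: make(map[string]bool)}
}

func main() {
	sm := newStateManager()

	sm.rwMutex.Lock() // write lock while mutating the map
	sm.fetched["https://golang.org/"] = true
	sm.rwMutex.Unlock()

	sm.rwMutex.RLock() // a read lock is enough for lookups
	fmt.Println(sm.fetched["https://golang.org/"]) // true
	sm.rwMutex.RUnlock()
}
```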
- In `crawl.go` we implement an `isAlreadyFetched` helper function that uses our `stateManager.rwMutex` to check whether a URL has already been fetched and return a boolean.
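A plausible implementation of that helper, assuming the `stateManager` fields described above (the names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// Minimal stand-in for the stateManager described above; the real
// struct also carries a WaitGroup.
type stateManager struct {
	fetched map[string]bool
	rwMutex sync.RWMutex
}

// isAlreadyFetched takes only a read lock, so any number of
// goroutines can check the map simultaneously; writers are blocked
// until every reader releases the lock.
func (s *stateManager) isAlreadyFetched(url string) bool {
	s.rwMutex.RLock()
	defer s.rwMutex.RUnlock()
	return s.fetched[url]
}

func main() {
	s := &stateManager{fetched: map[string]bool{"https://golang.org/": true}}
	fmt.Println(s.isAlreadyFetched("https://golang.org/"))     // true
	fmt.Println(s.isAlreadyFetched("https://golang.org/pkg/")) // false
}
```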
- Also in `crawl.go`, we: a. `Add(1)` to our `stateManager.waitGroup` to tell the `WaitGroup` that we are waiting to fetch an item, and b. implement a goroutine using an IIFE (Immediately Invoked Function Expression), `go func() { ... }()`, that does the following:
  - tells the wait group that we're finished whenever we return from the IIFE's scope,
  - returns if we've reached our maximum fetching depth,
  - skips processing the node if we've already fetched the URL,
  - handles fetching failures,
  - safely marks the URL as fetched in `stateManager.fetched` using our `stateManager.rwMutex` (locked for writing this time), and
  - recurses and yields to other goroutines.
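Put together, the goroutine body might look roughly like this. The `Fetcher` interface is the one the Tour of Go exercise provides; `fakeFetcher` and all other names here are stand-ins for illustration, not necessarily the repo's code:

```go
package main

import (
	"fmt"
	"sync"
)

// Fetcher matches the interface the Tour of Go exercise provides.
type Fetcher interface {
	Fetch(url string) (body string, urls []string, err error)
}

type stateManager struct {
	fetched   map[string]bool
	waitGroup sync.WaitGroup
	rwMutex   sync.RWMutex
}

func (s *stateManager) isAlreadyFetched(url string) bool {
	s.rwMutex.RLock()
	defer s.rwMutex.RUnlock()
	return s.fetched[url]
}

// Crawl registers the fetch with the WaitGroup, then does the real
// work inside an IIFE-style goroutine, following the steps above.
func Crawl(url string, depth int, fetcher Fetcher, s *stateManager) {
	s.waitGroup.Add(1) // register before the goroutine starts
	go func() {
		defer s.waitGroup.Done() // finished whenever we leave this scope
		if depth <= 0 {
			return // maximum fetching depth reached
		}
		if s.isAlreadyFetched(url) {
			return // skip nodes we've already processed
		}
		body, urls, err := fetcher.Fetch(url)
		if err != nil {
			fmt.Println(err) // handle fetching failures
			return
		}
		s.rwMutex.Lock() // lock for writing this time
		s.fetched[url] = true
		s.rwMutex.Unlock()
		// Note: between isAlreadyFetched and this write, another
		// goroutine could fetch the same URL — a small window
		// inherent in this check-then-act design.
		fmt.Printf("found: %s %q\n", url, body)
		for _, u := range urls {
			Crawl(u, depth-1, fetcher, s) // recurse; goroutines interleave
		}
	}()
}

// fakeFetcher is an in-memory Fetcher for demonstration only.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	if res, ok := f[url]; ok {
		return res.body, res.urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

func main() {
	s := &stateManager{fetched: make(map[string]bool)}
	fetcher := fakeFetcher{
		"https://golang.org/":     {"The Go Programming Language", []string{"https://golang.org/pkg/"}},
		"https://golang.org/pkg/": {"Packages", nil},
	}
	Crawl("https://golang.org/", 2, fetcher, s)
	s.waitGroup.Wait() // block until every fetch has called Done
	fmt.Println("fetched", len(s.fetched), "urls")
}
```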
- Allow `main.go` to receive the depth as a command-line argument.
- Refactor the `Crawl` function's IIFE. It has too many responsibilities and mixes levels of abstraction: concurrency orchestration, logging, error handling, and recursion.
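For the command-line depth, a minimal sketch of the parsing (the helper name and the fallback default are assumptions, not what the repo necessarily uses):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseDepth reads the crawl depth from the program's arguments,
// falling back to a default when the argument is absent or invalid.
func parseDepth(args []string, fallback int) int {
	if len(args) < 2 {
		return fallback
	}
	depth, err := strconv.Atoi(args[1])
	if err != nil || depth < 0 {
		return fallback
	}
	return depth
}

func main() {
	depth := parseDepth(os.Args, 4)
	fmt.Println("max crawl depth:", depth)
}
```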
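One way to act on that refactoring note is to pull each responsibility into a named helper so the goroutine body reads as pure orchestration. A sketch under the same assumed names (not the repo's actual refactor):

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// Fetcher matches the interface from the Tour of Go exercise.
type Fetcher interface {
	Fetch(url string) (body string, urls []string, err error)
}

type stateManager struct {
	fetched   map[string]bool
	waitGroup sync.WaitGroup
	rwMutex   sync.RWMutex
}

func (s *stateManager) isAlreadyFetched(url string) bool {
	s.rwMutex.RLock()
	defer s.rwMutex.RUnlock()
	return s.fetched[url]
}

// shouldSkip groups the two guard conditions in one place.
func (s *stateManager) shouldSkip(url string, depth int) bool {
	return depth <= 0 || s.isAlreadyFetched(url)
}

// markFetched records a URL under the write lock.
func (s *stateManager) markFetched(url string) {
	s.rwMutex.Lock()
	defer s.rwMutex.Unlock()
	s.fetched[url] = true
}

// Crawl's goroutine body is now pure orchestration: guard, fetch,
// record, report, recurse.
func Crawl(url string, depth int, fetcher Fetcher, s *stateManager) {
	s.waitGroup.Add(1)
	go func() {
		defer s.waitGroup.Done()
		if s.shouldSkip(url, depth) {
			return
		}
		body, urls, err := fetcher.Fetch(url)
		if err != nil {
			log.Println(err) // error handling isolated to one line
			return
		}
		s.markFetched(url)
		fmt.Printf("found: %s %q\n", url, body)
		for _, u := range urls {
			Crawl(u, depth-1, fetcher, s)
		}
	}()
}

// stubFetcher always succeeds and returns no child links.
type stubFetcher struct{}

func (stubFetcher) Fetch(url string) (string, []string, error) {
	return "stub body", nil, nil
}

func main() {
	s := &stateManager{fetched: make(map[string]bool)}
	Crawl("https://example.com/", 1, stubFetcher{}, s)
	s.waitGroup.Wait()
	fmt.Println(len(s.fetched), "url(s) fetched")
}
```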