Whatwg-url is a spec-compliant URL parser written in Go. See the WHATWG website for the specification.
The parser is up to date with the specification as of 24 May 2023 and passes all relevant tests from web-platform-tests.
The API is similar to the one described in Chapter 6 of the WHATWG URL Standard. See the documentation for details.
import "github.com/nlnwa/whatwg-url/url"
// Name the result u so it does not shadow the imported url package.
u, err := url.Parse("http://example.com:80/a?b#c")
if err != nil {
	panic(err)
}
fmt.Println(u.Scheme()) // http
fmt.Println(u.Host()) // example.com (the default port 80 is dropped)
fmt.Println(u.Port()) // ""
fmt.Println(u.Pathname()) // /a
fmt.Println(u.Href(false)) // http://example.com/a?b#c
fmt.Println(u.Href(true)) // http://example.com/a?b (fragment excluded)
fmt.Println(u.Hash()) // #c
fmt.Println(u.Fragment()) // c
fmt.Println(u.Search()) // ?b
fmt.Println(u.Query()) // b
fmt.Println(u) // http://example.com/a?b#c
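Two behaviours show up in the output above: port 80 disappears because it is the default for http, and the search and hash values are the query and fragment with their delimiter prepended. The following is a minimal standard-library sketch of those two rules as the URL Standard describes them. It does not use whatwg-url, and `serializedPort` and `withDelimiter` are hypothetical helper names chosen for illustration only.

```go
package main

import "fmt"

// defaultPorts lists the default ports of the URL Standard's special schemes.
var defaultPorts = map[string]string{
	"ftp":   "21",
	"http":  "80",
	"https": "443",
	"ws":    "80",
	"wss":   "443",
}

// serializedPort (hypothetical helper) returns "" when port is the
// default port for scheme, and port unchanged otherwise.
func serializedPort(scheme, port string) string {
	if defaultPorts[scheme] == port {
		return ""
	}
	return port
}

// withDelimiter (hypothetical helper) returns "" for an empty component,
// otherwise the delimiter followed by the component.
func withDelimiter(delim, component string) string {
	if component == "" {
		return ""
	}
	return delim + component
}

func main() {
	fmt.Printf("%q\n", serializedPort("http", "80"))   // ""
	fmt.Printf("%q\n", serializedPort("http", "8080")) // "8080"
	fmt.Println(withDelimiter("?", "b"))               // ?b
	fmt.Println(withDelimiter("#", "c"))               // #c
}
```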
The default parser instance follows the WHATWG URL Standard. To adapt parsing to other needs, create a new parser instance and configure it with options.
Example:
p := url.NewParser(url.WithAcceptInvalidCodepoints(), url.WithCollapseConsecutiveSlashes())
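The call above appears to follow Go's functional-options pattern, where each `With...` function returns a value that adjusts the parser's configuration. The toy sketch below illustrates that pattern with a single option. It is not the library's implementation: `toyParser`, `toyOption`, `withCollapseSlashes`, and `normalizePath` are hypothetical names, and it assumes that collapsing consecutive slashes means reducing each run of `/` in a path to a single `/`.

```go
package main

import (
	"fmt"
	"strings"
)

// toyParser is a hypothetical stand-in for a configurable parser.
type toyParser struct {
	collapseSlashes bool
}

// toyOption mutates a toyParser during construction.
type toyOption func(*toyParser)

// withCollapseSlashes enables collapsing of consecutive slashes.
func withCollapseSlashes() toyOption {
	return func(p *toyParser) { p.collapseSlashes = true }
}

// newToyParser applies the options in order to a default configuration.
func newToyParser(opts ...toyOption) *toyParser {
	p := &toyParser{}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

// normalizePath reduces each run of "/" to one when the option is enabled
// and returns the path unchanged otherwise.
func (p *toyParser) normalizePath(path string) string {
	if !p.collapseSlashes {
		return path
	}
	var b strings.Builder
	prevSlash := false
	for _, r := range path {
		if r == '/' && prevSlash {
			continue // skip the repeated slash
		}
		prevSlash = r == '/'
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	strict := newToyParser()
	lenient := newToyParser(withCollapseSlashes())
	fmt.Println(strict.normalizePath("/a//b///c"))  // /a//b///c
	fmt.Println(lenient.normalizePath("/a//b///c")) // /a/b/c
}
```

Because the defaults live in the constructor, a caller that passes no options gets the strict behaviour, which mirrors how the default parser instance stays standard-compliant.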
If you want canonicalization beyond what's described in the standard, you can use the Canonicalizer API. You can define your own canonicalization profile:
c := canonicalizer.New(canonicalizer.WithRemoveUserInfo(), canonicalizer.WithRemoveFragment())
u, err := c.Parse("http://user@example.com/a?b#c")
Or use one of the predefined profiles:
u, err := canonicalizer.GoogleSafeBrowsing.Parse("http://user@example.com/a?b#c")
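To make concrete what a custom profile that removes user info and the fragment is expected to produce, here is a minimal sketch built on Go's standard `net/url` package rather than on whatwg-url. The helper name `stripUserInfoAndFragment` and the input URL are illustrative assumptions, and `net/url` does not follow the WHATWG URL Standard, so results may differ from the Canonicalizer API on less ordinary inputs.

```go
package main

import (
	"fmt"
	"net/url"
)

// stripUserInfoAndFragment (hypothetical helper) parses raw with the
// standard library, drops the user info and fragment, and re-serializes.
func stripUserInfoAndFragment(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.User = nil       // remove "user@" or "user:password@"
	u.Fragment = ""    // remove the decoded fragment
	u.RawFragment = "" // remove any encoded fragment hint as well
	return u.String(), nil
}

func main() {
	out, err := stripUserInfoAndFragment("http://user@example.com/a?b#c")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(out) // http://example.com/a?b
}
```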