Maximum-performance example
A complete, runnable program that stacks every v1.1.0 opt-in — PoolRequestBundle, PoolFastParams, Pre, Use, HandleFast, UseFast, and a Mounted net/http/pprof tree — plus an in-process /bench endpoint that measures the live speed-up on your hardware and returns the result as JSON. Treat this page as the operational companion to the Maximum performance guide, which covers the contracts and the audit.
Step 1 — Enable both pool opt-ins
Both pools eliminate the per-request allocation on the routing layer. The lifetime contract — handlers must not retain *http.Request (on Handle) or the Params slice (on HandleFast) past return — is documented in Maximum performance. Every handler in this example is written to satisfy that contract.
mux := mm.New()
// PoolRequestBundle: recycles the fused reqBundle.
// Drops 1-param routes from 105 ns / 384 B / 1 alloc to ~45 ns / 0 B / 0 allocs.
mux.PoolRequestBundle = true
// PoolFastParams: recycles the Params slice handed to FastHandler routes.
// Drops FastParam routes from 50 ns / 32 B / 1 alloc to ~44 ns / 0 B / 0 allocs.
mux.PoolFastParams = true
Step 2 — Pre for cross-cutting policy
Pre runs once per request, before route lookup, so any work it does (request-ID generation, panic recovery, IP rewriting) is paid uniformly across both Handle and HandleFast routes. Use Pre for policy that must wrap every request.
mux.Pre(
mw.RequestID(), // X-Request-Id propagation
mw.RecovererWithLogger(log), // recover from panics in handlers
)
Step 3 — Group + Use for the JSON REST API
Use applies stdlib middleware at route registration time. It wraps only Handle routes — registering a HandleFast route on a Mux that already has Use middleware panics at registration. The panic is deliberate: silently mixing the two would let HandleFast routes bypass authentication, logging, or any other policy you intended to apply globally.
v1 := mux.Group("/v1")
v1.Use(mw.Logger(os.Stdout))
v1.GET("/users/:id", getUser) // 1 param, 0 alloc
v1.GET("/users/:id/orders/:orderID", getUserOrder) // 2 params, 0 alloc
v1.GET("/orgs/:org/repos/:repo/issues/:num", getRepoIssue) // 3 params, 0 alloc
v1.GET("/static/*filepath", listStaticFile) // catch-all, 0 alloc
v1.POST("/users", createUser)
// Regex-constrained route on a different prefix (a regex param and a `:name`
// param cannot share the same parent in the radix tree).
v1.GET("/profiles/{id:[0-9]+}", getUserProfile)
// Background-work pattern.
v1.POST("/events", postEvent)
Step 4 — HandleFast + UseFast for the latency-sensitive hot path
HandleFast bypasses the standard context allocation entirely. Parameters arrive as a third argument (mm.Params) instead of through r.Context(). Stdlib Use middleware is not applied — use UseFast for the FastMiddleware family, or rely on Pre for cross-cutting policy.
mux.UseFast(fastTimer(log))
mux.GETFast("/v1/health", healthFast)
mux.GETFast("/v1/metrics/:metric", metricsFast)
fastTimer is a FastMiddleware — the fast-path equivalent of stdlib middleware:
func fastTimer(log *slog.Logger) mm.FastMiddleware {
return func(next mm.FastHandler) mm.FastHandler {
return func(w http.ResponseWriter, r *http.Request, ps mm.Params) {
start := time.Now()
next(w, r, ps)
log.Debug("fast", "path", r.URL.Path, "elapsed", time.Since(start))
}
}
}
Latency target: ~25 ns for static, ~44 ns for 1-parameter routes with PoolFastParams on.
Step 5 — The background-work handler
postEvent is the canonical example of the body-drain-before-spawn pattern: every value the goroutine needs is captured by value (body, requestID, remoteAddr) before the go statement. The goroutine has no reference to r, so the bundle recycles cleanly the instant postEvent returns.
func postEvent(w http.ResponseWriter, r *http.Request) {
// Snapshot primitives BEFORE spawning anything async.
body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20))
if err != nil {
http.Error(w, "bad body", http.StatusBadRequest)
return
}
requestID := mw.GetRequestID(r.Context())
remoteAddr := r.RemoteAddr
// Now we are safe to fan out — `body`, `requestID`, `remoteAddr` are all
// values; the bundle can be recycled the moment we return.
go func() {
fmt.Fprintf(os.Stderr,
"event accepted req=%s peer=%s bytes=%d\n",
requestID, remoteAddr, len(body))
}()
w.WriteHeader(http.StatusAccepted)
}
If r were captured directly, the goroutine would observe a recycled bundle once postEvent returns — exactly the use-after-free that the audit checklist catches.
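For contrast, a hypothetical unsafe variant (not part of the upstream example) shows the failure mode directly — the only difference from postEvent is that the goroutine closes over r:
func postEventUnsafe(w http.ResponseWriter, r *http.Request) {
	// ANTI-PATTERN: the goroutine holds r past return. With PoolRequestBundle
	// on, r may already belong to another request by the time this runs.
	go func() {
		body, _ := io.ReadAll(r.Body)
		fmt.Fprintf(os.Stderr, "event peer=%s bytes=%d\n", r.RemoteAddr, len(body))
	}()
	w.WriteHeader(http.StatusAccepted)
}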
Step 6 — The /bench endpoint: measure pool wins live
The /bench endpoint builds two Mux instances — one default, one with PoolRequestBundle enabled — registers the same /users/:id route on each, and times 200 000 dispatches against each through httptest. It returns a JSON payload with the per-op latency, the per-op allocations measured via runtime.MemStats, and the speed-up ratio.
func benchHandler(w http.ResponseWriter, r *http.Request) {
const iterations = 200_000
mux := mm.New()
mux.GET("/users/:id", func(w http.ResponseWriter, r *http.Request) {})
muxPool := mm.New()
muxPool.PoolRequestBundle = true
muxPool.GET("/users/:id", func(w http.ResponseWriter, r *http.Request) {})
req := httptest.NewRequest(http.MethodGet, "/users/42", nil)
rec := httptest.NewRecorder()
// Warm-up so first-call costs do not skew the result.
for range 1000 {
mux.ServeHTTP(rec, req)
muxPool.ServeHTTP(rec, req)
}
startDefault := time.Now()
allocsDefault := runMeasured(iterations, func() { mux.ServeHTTP(rec, req) })
nsDefault := time.Since(startDefault).Nanoseconds() / int64(iterations)
startPool := time.Now()
allocsPool := runMeasured(iterations, func() { muxPool.ServeHTTP(rec, req) })
nsPool := time.Since(startPool).Nanoseconds() / int64(iterations)
_ = json.NewEncoder(w).Encode(map[string]any{
"route": "/users/:id",
"iterations": iterations,
"default": benchResult{NsPerOp: nsDefault, AllocsPerOp: allocsDefault},
"pooled": benchResult{NsPerOp: nsPool, AllocsPerOp: allocsPool},
"speedup_ratio": fmt.Sprintf("%.2fx", float64(nsDefault)/float64(nsPool)),
"go": runtime.Version(),
})
}
runMeasured snapshots runtime.MemStats.Mallocs around the loop and divides by the iteration count — a coarser but more portable measurement than testing.BenchmarkResult.AllocsPerOp.
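The handler also relies on a small result struct. The upstream file defines both helpers; the sketch below is one plausible shape, with the JSON tags and exact signatures being assumptions chosen to match the keys and call sites quoted on this page:
// benchResult carries the per-variant measurements serialised by /bench.
type benchResult struct {
	NsPerOp     int64  `json:"ns_per_op"`
	AllocsPerOp uint64 `json:"allocs_per_op"`
}

// runMeasured calls fn the given number of times and returns the average
// number of heap allocations per call, derived from runtime.MemStats.Mallocs
// snapshots taken before and after the loop.
func runMeasured(iterations int, fn func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < iterations; i++ {
		fn()
	}
	runtime.ReadMemStats(&after)
	return (after.Mallocs - before.Mallocs) / uint64(iterations)
}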
Step 7 — Mount net/http/pprof for production profiling
Mount grafts another http.Handler onto the Mux with prefix stripping. Mounting net/http/pprof's default ServeMux exposes the full profiler endpoint suite (/debug/pprof/profile, /heap, /goroutine, /block, etc.) without registering each profiler endpoint as a separate MuxMaster route. The _ "net/http/pprof" blank import is what registers those endpoints on http.DefaultServeMux.
import _ "net/http/pprof" // attaches /debug/pprof/* to http.DefaultServeMux
mux.Mount("/debug/pprof", http.DefaultServeMux)
Capture a CPU profile under load:
curl 'http://localhost:8080/debug/pprof/profile?seconds=10' > cpu.prof
go tool pprof -top -cum cpu.prof
Try it
go run .
# REST routes — zero allocations on the pool path
curl http://localhost:8080/v1/health
curl http://localhost:8080/v1/users/42
curl http://localhost:8080/v1/orgs/acme/repos/api/issues/123
# Live in-process benchmark — returns JSON with speedup_ratio
curl http://localhost:8080/bench
# Live configuration snapshot
curl http://localhost:8080/config
The /bench response on the reference hardware (AMD Ryzen 9 5900HX, Go 1.26.2) reports speedup_ratio: "2.40x" and allocs_per_op: 0 on the pooled mux, matching the headline numbers in Benchmarks.
Frequently asked questions
Why does /bench show ~2.4× and not the 20 % headline?
The 20 % headline (45 ns vs 56 ns) is the competitor showdown measurement — MuxMaster Pooled vs httprouter on a one-parameter route. The 2.4× speed-up reported by /bench is internal: MuxMaster default (~105 ns / 1 alloc) vs MuxMaster Pooled (~45 ns / 0 alloc), same router, same machine, same route. They answer different questions. Both numbers come from the same bench_test.go harness.
Should I leave pprof mounted in production?
Only behind authentication or on a private port. The _ "net/http/pprof" blank import registers /debug/pprof/* on http.DefaultServeMux, which Step 7 mounts at /debug/pprof on the public router. In production, mount it on a separate http.Server listening on a loopback or private interface, or wrap the group with mw.BasicAuth / mw.APIKey so the profiler endpoints are not world-reachable. The endpoints leak Go runtime state and can DoS the process if profiled under load by an attacker.
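A minimal sketch of the private-port variant — the 127.0.0.1:6060 address and the surrounding wiring are assumptions, not part of the upstream example (log is the *slog.Logger from Step 2):
// Serve the profiler on a loopback-only listener instead of mounting it on
// the public Mux. http.DefaultServeMux already carries /debug/pprof/* thanks
// to the blank net/http/pprof import.
go func() {
	if err := http.ListenAndServe("127.0.0.1:6060", http.DefaultServeMux); err != nil {
		log.Error("pprof server stopped", "err", err)
	}
}()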
Can I enable PoolRequestBundle for some routes only?
No — PoolRequestBundle is a *Mux flag, not per-route. You can either keep two *Mux instances (one with the pool, one without) and mount one inside the other via Mount, or split the service so the pool-safe handlers live on a different *Mux from the pool-unsafe ones. The audit checklist in Maximum performance is the better tool: catch unsafe captures, fix them with the drain-before-spawn pattern, and turn the pool on globally.
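If you do want to isolate pool-unsafe handlers today, a minimal sketch of the two-Mux arrangement described above — legacyWebhookHandler and the /api prefix are illustrative, not part of the upstream example:
// Pool-safe handlers live on an inner Mux with the opt-in enabled...
pooled := mm.New()
pooled.PoolRequestBundle = true
pooled.GET("/users/:id", getUser)

// ...while handlers that retain *http.Request stay on the outer, default Mux.
outer := mm.New()
outer.POST("/webhooks", legacyWebhookHandler) // retains the request, so no pool
outer.Mount("/api", pooled)                   // /api/users/:id dispatches through the pooled tree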
Upstream source
Every code excerpt above is lifted verbatim from examples/max-performance/main.go at the v1.1.0 tag. The upstream file also contains the full handler set (getUserOrder, getRepoIssue, createUser, getUserProfile, listStaticFile, metricsFast), the /config endpoint, and the graceful-shutdown wiring — follow the link for the full program.
See also
- Maximum performance guide — the lifetime contract, the failure modes, the audit checklist, and the four recipe patterns that this example operationalises.
- Benchmarks — the v1.1.0 per-route and competitor tables that the /bench endpoint reproduces in miniature.
- Upload-file example — the body-drain-before-spawn pattern shown at full scale on multipart uploads.
- REST API example — the canonical CRUD service, without the pool opt-ins.