TLDR; The author argues that free-form logging is quite useless/expensive to use. They also argue that structured logging is less effective than tracing b/c of mainly the difficulty of inferring timelines and causality.
I find the arguments very plausible.
In fact I very rarely use logs produced by several services b/c most of the times they just confuse me. The only time that I heavily use logs is troubleshooting a single service and looking at its stdout (or kubectl log
.)
However I have very little experience w/ tracing (I’ve used it in my hobby projects but, obviously, they never represent the reality of complex distributed systems.)
Have you got real world experience w/ tracing in larger systems? Care to share your take on the topic?
My own thoughts.
Instead of defining the difference between logging and tracing, the author spams the screen with pages’ worth of examples of why logging is bad, then jumps into tracing by immediately referencing code that uses a specific tracing library (OpenTelemetry Tracer) without at any point explaining what that code is actually doing to someone who is not familiar with it already. To me this smacks of preaching to the choir since if you’re already familiar with this tool, you’re likely already a) familiar with what “tracing” is compared to “logging”, and b) probably a tracing advocate to begin with. If you want to persuade an undecided or unfamiliar audience, confusing them and/or making assumptions about what they know or don’t know is … suboptimal.
If you’re going to screen dump your code in your rant, FUCKING COMMENT IT YOU GIT! I don’t want to have to read through 100 lines of code in an unfamiliar language written to an unfamiliar architecture to find the three (!) lines that are actually on the fucking topic!
If you’re going to show changes in your code, put before/after snapshots side by side so I don’t have to go scrolling back to the uncommented hundred-line blob to see what changed. It’s not that hard. Using his own damned example from “Step 1”:
// BEFORE func PrepareContainer(ctx context.Context, container ContainerContext, locales []string, dryRun bool, allLocalesRequired bool) (*StatusResult, error) { logger.Info(`Filling home page template`)
// AFTER var tr = otel.Tracer("container_api") func PrepareContainer(ctx context.Context, container ContainerContext, locales []string, dryRun bool, allLocalesRequired bool) (*StatusResult, error) { ctx, span := tr.Start(ctx, "prepare_container") defer span.End()
(And while you’re at it, how 'bout explaining the fucking code you wrote? How hard is it to add a line explaining what that
defer span.End()
nonsense is? Remember, you’re trying to sell people on the need for tracing. If they already know what you’re talking about you’re preaching to the choir, son.)Of course in “The Result” he talks about the diff between the two functions … but doesn’t actually provide that diff. Instead he provides another hundred-line blob kept far away from the original so you have to bounce back and forth between them to spot the differences. Side-by-side diffs are a thing and there’s plenty of tools that make supplying them trivial. Maybe the author should think about using them.
I got to admit that your point about the presentation skills of the author are all correct! Perhaps the reason that I was able to relate to the material and ignore those flaws was that it’s a topic that I’ve been actively struggling w/ in the past few years 😅
That said, I’m still happy that this wasn’t a YouTube video or we’d be having this conversation in the comments section (if ever!) 😂
To your point and @krnpnk@feddit.de’s RE embedded systems:
That’s absolutely true that such a mindset is probably not going to work in an embedded environment. The author, w/o explicitly mentioning it anywhere, is explicitly talking about distributed systems where you’ve got plenty of resources, stable network connectivity and a log/trace ingestion solution (like Sumo or Datadog) alongside your setup.
That’s indeed an expensive setup, esp for embedded software.
The narrow scope and the stylistic problem aside, I believe the author’s view is correct, if a bit radical.
One of major pain points of troubleshooting distributed systems is sifting through the logs produced by different services and teams w/ different takes of what are the important bits of information in a log message.
It get extremely hairy when you’ve got a non-linear lifeline for a request (ie branches of execution.) And even worse when you need to keep your logs free of any type of information which could potentially identify a customer.
The article and the conversation here got me thinking that may be a combo of tracing and structured logging can help simplify investigations.
That is the very core of my objection. He hasn’t identified the warrants for his argument, meaning his argument is literally gibberish to people working from a different set of warrants. Dudebro here could learn a thing or two from Toulmin.
This is a problem endemic to techbros writing about tech. They assume, quite incorrectly, that the entire world is just clones of themselves perhaps a little bit behind on the learning curve. (It never occurs, naturally, that others might be ahead of them on the learning curve or *gasp!* that there may be more than one curve! That would be silly!)
So they write without establishing their warrants. (Hell, they often write without bothering to define their terms, because “trace” means the same thing in all forms of computer technology, amirite?!) They write as if they have The Answer instead of merely a possible answer in a limited set of circumstance (which they fail to identify). And they write as if they’re on the top of the learning heap instead of, as is statistically far more likely, somewhere in the middle.
Which makes it funny when he sings the praises of a tracing library that, when I investigated it briefly, made me choke with laughter at just how painfully ineffective it is compared to tools I’ve used in the past; specifically Erlang’s tracing tools. The library he’s text-wanking to is pitifully weak compared to what comes out of the box in an Erlang environment. You have to manually insert tracing calls (error-prone, tedious, obfuscatory) for example. Whatever you don’t decide to trace in advance can’t be traced. Whereas Erlang’s tracing system (and, presumably Ruby-on-BEAM’s, a.k.a. Elixir) lets you make ad hoc tracing calls on live systems as they’re executing. This means you can trace a live system as it’s fucking up without having to be a precognitive psychic when coding, leaving the costs of tracing at 0 until such a time as you genuinely need them.
So he doesn’t identify his warrants, he writes as if he has the One True Answer, he assumes all programming forms use the same jargon in the same way, and he acts as if he’s the guru sharing his wisdom when he’s actually way behind the curve on the very tech he’s pitching.
He is a, in a word, programmer.
I couldn’t agree more 😂
Except that, what the author uses is pretty much standard in the Go ecosystem, which is, yes, a shame.
To my knowledge, the only framework which does it quite seamlessly is Spring Boot which, w/ sane and well thought out defaults, gets the tracing done w/o the programmer writing a single line of code to do tracing-related tasks.
That said, even Spring’s solution is pretty heavy-weight compared to what comes OOTB w/ BEAM.