switch symbols-urls to use tecken
Categories
(Socorro :: Processor, task, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
References
(Blocks 1 open bug)
Details
We have a symbols-urls configuration parameter that holds a list of URLs to check, in order, for the SYM files that minidump-stackwalk needs to symbolicate stacks. Currently, we check the public symbols bucket, then the private bucket, then hit Tecken last. Hitting Tecken last allows Tecken to record the missing symbol file so it can report on what's missing that we should upload.
We want to change that to the private bucket and then Tecken. This bug covers that.
Comment 1 • 6 years ago (Assignee)
I talked with John and Brian about this. It would make it a little easier to migrate Tecken to GCP so that's kind of nice. We were wondering what kind of performance hit this would cause to minidump-stackwalk and whether it'd affect Tecken. So the first order of business would be to approximate that and if it's bad, then maybe not do this.
SYM file is in public bucket
I think this is the most likely scenario that happens.
current:
- minidump-stackwalk checks public s3 bucket (hit)
proposed:
- minidump-stackwalk checks private s3 bucket (miss)
- minidump-stackwalk checks tecken
- tecken checks public s3 bucket (hit) and sends redirect
- minidump-stackwalk downloads from public s3
This ends up being more HTTP requests (1 vs. 4).
Socorro tends to process crashes from recent builds more often than other builds, so more of these are cached.
SYM file isn't in any bucket
I think this is the second most likely scenario.
current:
- minidump-stackwalk checks public s3 bucket (miss)
- minidump-stackwalk checks private s3 bucket (miss)
- minidump-stackwalk checks tecken
- tecken checks public s3 bucket (miss)
proposed:
- minidump-stackwalk checks private s3 bucket (miss)
- minidump-stackwalk checks tecken
- tecken checks public s3 bucket (miss)
This ends up being fewer HTTP requests (4 vs. 3). This is never cached, so we do this entire thing every time.
SYM file is in private bucket
I think this is unlikely--we don't have many symbol files in the private bucket.
current:
- minidump-stackwalk checks public s3 bucket (miss)
- minidump-stackwalk checks private s3 bucket (hit)
proposed:
- minidump-stackwalk checks private s3 bucket (hit)
This scenario is probably rare and getting rarer since we don't have many symbols in the private bucket.
This is off the top of my head. minidump-stackwalk doesn't emit any signal about cache hits/misses, how long it takes to download SYM files, or where they came from. The "SYM file is in public bucket" scenario is concerning, but maybe the time spent on HTTP requests that miss is dominated by the time spent downloading the SYM file, in which case it doesn't matter much? One way we could measure this is to write a simulator that goes through json_dump output for a bunch of consecutively processed crashes and tells us what the differences might be.
That seems like a lot of work. Seems better to just switch stage, see how that goes, and then approximate it based on that.
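For reference, the request counts in the three scenarios above can be tallied with a small model. This is only a sketch of the reasoning in this comment, not the json_dump simulator described above and not how minidump-stackwalk is implemented. The names `count_requests`, `CURRENT_ORDER`, and `PROPOSED_ORDER` are made up for illustration, and it assumes, as the lists above do, that Tecken only checks the public bucket and redirects on a hit.

```python
from typing import Optional, Sequence

# Hypothetical labels for the lookup sources discussed in this bug.
PUBLIC, PRIVATE, TECKEN = "public", "private", "tecken"

CURRENT_ORDER = [PUBLIC, PRIVATE, TECKEN]
PROPOSED_ORDER = [PRIVATE, TECKEN]


def count_requests(order: Sequence[str], location: Optional[str]) -> int:
    """Count HTTP requests for one SYM file lookup.

    order: sources minidump-stackwalk checks, in order
    location: bucket holding the SYM file (PUBLIC or PRIVATE), or None if
        the file isn't in any bucket
    """
    requests = 0
    for source in order:
        requests += 1  # minidump-stackwalk asks this source
        if source == TECKEN:
            requests += 1  # Tecken checks the public bucket (assumption)
            if location == PUBLIC:
                requests += 1  # follow the redirect and download from public S3
                return requests
        elif source == location:
            return requests  # direct bucket hit
    return requests  # every source missed


for name, location in [("public", PUBLIC), ("missing", None), ("private", PRIVATE)]:
    current = count_requests(CURRENT_ORDER, location)
    proposed = count_requests(PROPOSED_ORDER, location)
    print(f"{name}: current={current} proposed={proposed}")
```

The model only counts requests. It says nothing about latency, caching, or the cost of parsing SYM files, so it can't replace measuring on stage.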
Comment 2 • 5 years ago (Assignee)
Dropping this to a P3. We can think about it later when we're closer to Tecken moving to GCP or some other compelling reason comes up.
Comment 3 • 5 years ago (Assignee)
All the moves are done so we're not waiting on that anymore.
We should do this and see how it affects processing times. If it makes processing times worse, then maybe we don't want to do it. Otherwise, I think we should since it makes it easier to move kinds of symbols around in Tecken without having to change it here, too.
Comment 4 • 5 years ago (Assignee)
Brian: Do you want to weigh in here? Is this ok to test out next week in stage and then prod?
Comment 5 • 5 years ago
Are we planning any changes to where Tecken stores symbols? If so, then I agree doing this makes sense, at least temporarily. Otherwise, I don't see any benefit from this change, but I do see risk from making Socorro's processor dependent on Tecken's availability.
Comment 6 • 5 years ago (Assignee)
I'm concerned that when we make changes to symbols locations, we (and all future maintainers) have to remember to update Socorro's configuration. I'd prefer not to have details like that littered across projects. I hear you on
Tecken has been pretty stable for a long time. I'm working on improving quality checks for Tecken so as to reduce the "stability as a fluke". Even so, I recognize that this gives additional impetus to keeping Tecken stable and up. Further, there's nothing in Socorro to indicate that Tecken was down when it was trying to process a crash and thus failed to symbolicate the stack. It'd be nice if it had something like that, but the symbolication code is in minidump-stackwalk and is complicated to work with.
Bug #1603278 is about how we're storing system symbols in with everything else, which expires after 2 years, but system symbols should stick around longer. For example, Ubuntu LTS is supported for 5 years. Some people are using older versions of macOS and Windows and Android and other Linux distributions. I was thinking we probably need to move the system symbols to another path with a different expiration like we do with try symbols.
That's the only change I've got on the books. Having said that, I'm swamped, so I don't know when I'm going to get to it.
Mmm... I think I've argued myself into "it's fine now, let's push it off".
Comment 7 • 4 years ago (Assignee)
Comment #1 doesn't take into account the try location which we treat like a separate bucket.
Also, the task that dominates minidump-stackwalk time is parsing SYM files--not HTTP requests or stackwalking or symbolication.
I want to wait on this, but we should do it before we start GCP things.
Comment 8 • 3 years ago (Assignee)
I had another idea about this... What if we switched it to:
- hit symbols.mozilla.org
- hit private symbols bucket
Most symbols are not private symbols, so hitting symbols.mozilla.org is a single HTTP request and takes advantage of Tecken's symbol-exists cache and also marks a missing symbol. Further, when we start the GCP migration and have symbols in both GCP and AWS, we won't have to change Socorro.
The weird case is when the symbol we want is a private symbol. It'll get marked as a missing symbol as a result of not being available via symbols.mozilla.org. However, most symbols aren't private, and anyone who doesn't have direct access to the private symbols bucket (which is everyone except Socorro) is going to have it marked as a missing symbol, too. So even though it's wrong, I don't think it messes up the bookkeeping in a meaningful way.
The one issue here is that this will increase Tecken usage. I think that'll be fine. The downloads API is pretty fast and minimal.
I'm going to toss this in my queue of things to do in January 2022.
Comment 9 • 3 years ago (Assignee)
I thought about this some more. I want to split out the "mark this as a missing symbol" to a separate endpoint. Then we can do this in Socorro:
SYMBOLS_URLS=https://symbols.mozilla.org/try,PRIVATEBUCKET,https://symbols.mozilla.org/api/missing/
That'll get all the bookkeeping right, be pretty fast (generally), and work as we migrate Tecken.
I'll write up a bug in Tecken for that.
Updated • 2 years ago (Assignee)
Comment 10 • 2 years ago (Assignee)
We're going to nix the missing symbols bookkeeping altogether. That removes the complexity from this bug and will allow us to do:
SYMBOLS_URLS=https://symbols.mozilla.org/try,PRIVATEBUCKET
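To make the lookup order concrete, here is a rough sketch of how a comma-separated SYMBOLS_URLS value could be split into an ordered list and turned into candidate SYM file locations. The helpers `parse_symbols_urls` and `candidate_urls` are hypothetical and not Socorro's actual code. The `debug_file/debug_id/sym_file` path layout is the usual Breakpad symbol server convention and is assumed here, the debug id in the example is invented, and PRIVATEBUCKET stays the placeholder it is in this comment.

```python
from typing import List


def parse_symbols_urls(value: str) -> List[str]:
    """Split a comma-separated SYMBOLS_URLS value into an ordered list."""
    return [part.strip() for part in value.split(",") if part.strip()]


def candidate_urls(
    symbols_urls: List[str], debug_file: str, debug_id: str, sym_file: str
) -> List[str]:
    """Build the locations to try, in order, for one SYM file.

    Assumes the Breakpad-style layout <base>/<debug_file>/<debug_id>/<sym_file>.
    """
    return [
        f"{base.rstrip('/')}/{debug_file}/{debug_id}/{sym_file}"
        for base in symbols_urls
    ]


urls = parse_symbols_urls("https://symbols.mozilla.org/try,PRIVATEBUCKET")
# The debug id below is a made-up example value.
for url in candidate_urls(urls, "xul.pdb", "0123456789ABCDEF0123456789ABCDEF1", "xul.sym"):
    print(url)
```

The point is that order matters: the first source that has the file wins, so putting symbols.mozilla.org first means most lookups never touch the private bucket.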
Comment 11 • 2 years ago (Assignee)
I created a PR in the infra repo.
Comment 12 • 2 years ago (Assignee)
The PR landed. We did a stage deploy.
I checked Grafana (Socorro and Tecken), the Crash Stats stage site, and logs and verified the following things:
- the logs show symbols_urls is set correctly for the MinidumpStackwalkerRule in stage
- crash reports are getting processed correctly with symbols
- there's no noticeable effect on Tecken for download API requests; Socorro stage does roughly 10% of the processing that prod does, but Tecken gets so many requests that it looks like a drop in the bucket
I think we're good!
Comment 13 • 2 years ago (Assignee)
This was pushed to prod a few hours ago in bug #1809927. I checked Tecken and I don't see a worrisome change in download API usage, so I think we're going to be fine. Marking as FIXED.