Closed Bug 1393416 Opened 7 years ago Closed 5 years ago

Stop storing Winsock_LSP in processed crash

Categories

(Socorro :: Processor, task, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: peterbe, Assigned: willkg)

References

Details

Attachments

(2 files)

All the Winsock_LSPRule rule [0] does is that it copies the raw_crash.Winsock_LSP value verbatim into the processed crash. Now it's stored twice. 
Also, it's often a very large string.

Can we get away with *only* storing it in the raw crash and still be able to SuperSearch on it??

What's curious is that when we stored Super Search Fields in ES we had two entries for it. One for processed_crash (exposed and string analyzed) and one for raw_crash (not exposed, enum). 
But in the new super_search_fields.json we only keep the processed_crash one [1].
Surely we can keep it in raw_crash only and still index it in Super Search.

This bug is about trying to find out why it is duplicated and whether we can get away with NOT duplicating it any more and rewire Super Search Fields to point to the raw_crash value instead. 

[0] https://github.com/mozilla-services/socorro/blob/778dbb949049f7e7c18446aa55711377b6b17304/socorro/processor/mozilla_transform_rules.py#L656-L662
[1] https://github.com/mozilla-services/socorro/blob/d6ecf522f4f93352b0302bc149bd599238b6451c/socorro/external/es/data/super_search_fields.json#L3034-L3057
Winsock_LSPRule and friends were added for bug #613874.

Reading through the comments there, seems like the need is for that field to show up in crash reports shown on crash-stats, but there's no need to search it or facet on it. That bug is from 7 years ago, so maybe usage has changed since then.
Out of curiosity I wanted to see how much this winsock_lsp adds up in bytes. 
Out of the 100 first crashes, it's a median of 1,760 bytes per crash. It's the second biggest after telemetry_environment (6,977 bytes).
Meaning it adds up to a lot of storage which we can get away with reducing by half if we don't put it into twice.
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #1)
> Winsock_LSPRule and friends were added for bug #613874.
> 
> Reading through the comments there, seems like the need is for that field to
> show up in crash reports shown on crash-stats, but there's no need to search
> it or facet on it. That bug is from 7 years ago, so maybe usage has changed
> since then.

That bug is just about is being visible on the report index page. We don't use SuperSearch for that page. 
As Lars pointed out in https://bugzilla.mozilla.org/show_bug.cgi?id=613874#c3 "On reviewing a processed crash dump, I see that it already contains the field in question."

I don't think it's wrong to make it available in Super Search but we should be able to get by with doing Super Search on the raw_crash.
Adrian, 
Can you please weigh in here. Do you remember why it's in the processed crash? And what do you think about the idea of killing that Winsock_LSPRule entirely and redirecting the Super Search to use the raw_crash.Winsock_LSP?
Flags: needinfo?(adrian)
I bet it's for historical reasons. Raw and processed crashes used to be quite separated, and that's not the case anymore. It increasingly makes less and less sense to have them separated in modern Socorro. 

Removing it from the processed crash might cause problems for people using that through the API for example, as you would effectively be changing the interface. But I don't think that is a big problem, and we have a very easy work-around (user the raw crash).
Flags: needinfo?(adrian)
(In reply to Adrian Gaudebert [:adrian] from comment #5)
> I bet it's for historical reasons. Raw and processed crashes used to be
> quite separated, and that's not the case anymore. It increasingly makes less
> and less sense to have them separated in modern Socorro. 
> 
> Removing it from the processed crash might cause problems for people using
> that through the API for example, as you would effectively be changing the
> interface. But I don't think that is a big problem, and we have a very easy
> work-around (user the raw crash).

Perhaps I phrased myself poorly. What I meant was that we A) stop store the Winsock_LSP in the processed_crash and B) change the supersearch field's namespace from "processed_crash" to "raw_crash". 

So if someone depends on something like `/api/SuperSearch/?other=parameters&_columns=winsock_lsp&_facets=winsock_lsp nothing would change, right?

The problem, if I understand this correctly, is that just before we insert a big fat dict into Elasticsearch is that that dict contains both `crash['raw_crash']['Winsock_LSP']` and `crash['processed_crash']['Winsock_LSP']`. Right?
(In reply to Peter Bengtsson [:peterbe] from comment #6)
> A) stop store the Winsock_LSP in the processed_crash

That means changing the API of the ProcessedCrash service, which is what I was concerned about. SuperSearch API will indeed not be affected. 

(In reply to Peter Bengtsson [:peterbe] from comment #6)
> The problem, if I understand this correctly, is that just before we insert a
> big fat dict into Elasticsearch is that that dict contains both
> `crash['raw_crash']['Winsock_LSP']` and
> `crash['processed_crash']['Winsock_LSP']`. Right?

I don't understand this.
This seems nice-to-have, but not urgent. Making it a P3.
Priority: -- → P3

Pushed to prod in 365. Marking as FIXED.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
See Also: → 1762000
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: