Closed Bug 1762271 Opened 2 years ago Closed 2 years ago

replace json-schema-reducer

Categories

(Socorro :: Processor, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Attachments

(1 file)

A while back, we started using json-schema-reducer, which Peter wrote for contribute.json and maybe AirMo. It takes a Python nested structure and a JSON schema and returns the reduced nested structure. It only looks at the structure--it doesn't do any type or other validation. The end result is that we have a reduced structure whose values could be of any type.

Further, it looks like that hasn't been updated since 2016, so it's unmaintained.

I think we should either look for another reducer or write one. It should reduce a structure based on a schema and also do type validation. We'll use this for TelemetryBotoS3CrashStorage and possibly other things.

Grabbing this to look into now since I need it for bug #1755095.

Assignee: nobody → willkg
Blocks: 1755095
Status: NEW → ASSIGNED

I looked for other reducers on PyPI and didn't see anything, so I think I'm going to roll my own. I've done something similar recently, so I think it's doable.

It needs to do the following:

  1. traverse a document and prune anything that isn't in the specified JSON schema
  2. check that each value has a type the schema allows
  3. handle JSON references (https://cswr.github.io/JsonSchema/spec/definitions_references/)
  4. accept a predicate that looks at the schema and determines whether this item should be included or not -- we can use this for removing all the protected parts of crash data
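
A rough sketch of what a reducer covering those four points could look like. This is not the code that landed; the names (reduce_document, InvalidDocumentError, the include predicate signature) and the "protected" schema key are made up for illustration. It assumes draft-04-style schemas where "items" is a single schema, only local "#/..." references, and only the standard type names.

```python
class InvalidDocumentError(Exception):
    """Raised when a value has a type the schema does not allow."""


# JSON Schema type names mapped to Python types.
TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
    "null": type(None),
}


def resolve_ref(root_schema, schema):
    """Follow local "$ref" pointers such as "#/definitions/frame"."""
    while "$ref" in schema:
        ref = schema["$ref"]
        if not ref.startswith("#/"):
            raise ValueError(f"only local references are supported: {ref}")
        node = root_schema
        for part in ref[2:].split("/"):
            node = node[part]
        schema = node
    return schema


def matches_type(value, type_name):
    # bool is a subclass of int in Python, so check it separately.
    if isinstance(value, bool):
        return type_name == "boolean"
    return isinstance(value, TYPE_MAP[type_name])


def reduce_document(root_schema, document, include=None):
    """Return a copy of document holding only what the schema describes.

    include(path, schema) is a hypothetical predicate; returning False
    drops that property from the result.
    """
    if include is None:
        include = lambda path, schema: True

    def _reduce(schema, value, path):
        schema = resolve_ref(root_schema, schema)
        types = schema.get("type", list(TYPE_MAP))
        if isinstance(types, str):
            types = [types]
        if not any(matches_type(value, t) for t in types):
            raise InvalidDocumentError(
                f"{path or '<root>'}: expected {types}, got {type(value).__name__}"
            )
        if isinstance(value, dict):
            reduced = {}
            for key, subschema in schema.get("properties", {}).items():
                if key not in value:
                    continue
                subpath = f"{path}.{key}" if path else key
                if not include(subpath, resolve_ref(root_schema, subschema)):
                    continue
                reduced[key] = _reduce(subschema, value[key], subpath)
            return reduced
        if isinstance(value, list):
            items = schema.get("items", {})
            return [_reduce(items, item, f"{path}[]") for item in value]
        return value

    return _reduce(root_schema, document, "")


# Example: "protected" is an invented schema key used only by the predicate.
schema = {
    "definitions": {
        "frame": {"type": "object", "properties": {"function": {"type": "string"}}}
    },
    "type": "object",
    "properties": {
        "uuid": {"type": "string"},
        "safe_mode": {"type": ["boolean", "null"]},
        "frames": {"type": "array", "items": {"$ref": "#/definitions/frame"}},
        "user_comments": {"type": "string", "protected": True},
    },
}
doc = {
    "uuid": "abc",
    "safe_mode": True,
    "frames": [{"function": "main", "offset": 4}],
    "user_comments": "hi",
    "extra": 1,
}
result = reduce_document(
    schema, doc, include=lambda path, s: not s.get("protected", False)
)
```

In this sketch a type mismatch raises instead of silently dropping the value, so bad data shows up during testing rather than ending up in the reduced output.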

I'm pretty sure I have most of that working. I need to write some tests covering edge cases. I should also go back and re-read how JSON Schema is structured. The telemetry socorro crash JSON schema and Java exception schemas are draft-04, so I'm currently writing it to work with those.

I got a working reducer with tests.

When I went to test it on crash data, I hit a slew of issues where the data in the processed crash is the wrong type. Bug #1754035 covers schema problems related to the new stackwalker. I'm hitting problems with a lot of other fields, too.

Examples:

  • available_physical_memory is an int in the processed crash, but defined as a [string, null] in the schema
  • available_virtual_memory is an int in the processed crash, but defined as a [string, null] in the schema
  • safe_mode is a bool in the processed crash, but defined as a [string, null] in the schema and has values "0" and "1" in telemetry.socorro_crash

We need to not break what's in telemetry.socorro_crash, so I'm going to write a converter to fix fields that need fixing as specified in the schema.
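
Something along these lines is what I mean by a converter. It's only a sketch: convert_to_schema_type is a made-up name, and mapping True/False to "1"/"0" is an assumption based on the values that safe_mode has in telemetry.socorro_crash.

```python
# JSON Schema type names mapped to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
    "null": type(None),
}


def is_valid(value, type_names):
    """Return True if value already has one of the allowed schema types."""
    for name in type_names:
        if isinstance(value, bool):
            # bool is a subclass of int, so only "boolean" accepts it.
            if name == "boolean":
                return True
        elif isinstance(value, JSON_TYPES.get(name, ())):
            return True
    return False


def convert_to_schema_type(value, type_names):
    """Coerce value into a type the schema allows, or raise ValueError."""
    if isinstance(type_names, str):
        type_names = [type_names]
    if is_valid(value, type_names):
        return value
    if "string" in type_names:
        if isinstance(value, bool):
            # Assumption: booleans are stored as "1" / "0" strings.
            return "1" if value else "0"
        if isinstance(value, (int, float)):
            return str(value)
    raise ValueError(f"cannot convert {value!r} to any of {type_names}")


# The fields from the examples above are all [string, null] in the schema.
memory = convert_to_schema_type(2147483648, ["string", "null"])
safe_mode = convert_to_schema_type(True, ["string", "null"])
missing = convert_to_schema_type(None, ["string", "null"])
```

Values that already match the schema pass through untouched, so running the converter over fields that are fine is harmless.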

I'll fix the other fields in this bug and fix the issues related to the new stackwalker in bug #1754035.

This went to production in bug #1763234. If there are any issues, we can fix them in new bugs. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED