Closed Bug 1652520 Opened 4 years ago Closed 4 years ago

Enforce file extensions for pdfs, office style documents and media file types

Categories

(Firefox :: File Handling, defect, P3)

78 Branch
defect

Tracking

()

RESOLVED FIXED
84 Branch
Tracking Status
firefox84 --- fixed

People

(Reporter: watkinggg, Assigned: Gijs)

References

Details

(Whiteboard: [pdfjs-c-integration])

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0

Steps to reproduce:

Downloading .pdf files problem seen repeatedly from documents held in a group on https://www.goassemble.com/ i.e. when I am logged in to the group (set up by an organisation I belong too).

Actual results:

Downloaded files will not vopen apparently because they do not to have .pdf file type. Problem does not occur when same files downloaded via another browser i.e. google chrome. Renaming file in Windows explorer to add .pdf to file name makes it openable in pdf exchange.

Nobodly else in group has reported similar problem so I wonered if it its the copy of Filefox I have but an update 78.0.1 64bit to 78.0.2 64bit made no difference.

Expected results:

Expected files to be .pdf ones when downloaded. Note other simple .pdf files from elsewhere on web appear are downloading as .pdf ones.

Not a serious problem for me as I can use another browser or edit file names.

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → PDF Viewer

The severity field is not set for this bug.
:bdahl, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(bdahl)

Gijs, I think I was discussing a similar issue with you. This seems like an example where we should look at the file's magic number to infer content type.

Severity: -- → S3
Flags: needinfo?(bdahl)
Whiteboard: [pdfjs-c-integration]

(In reply to Brendan Dahl [:bdahl] from comment #3)

Gijs, I think I was discussing a similar issue with you. This seems like an example where we should look at the file's magic number to infer content type.

Maybe you're thinking of bug 1645796 ?

The site requires a login etc., so we can't reproduce. It'd be helpful if we had complete header information for this download, so we can investigate what the issue is. Reporter, can you maybe attach a har file or a screenshot of the header information for the PDF request from Firefox devtools or similar?

Flags: needinfo?(watkinggg)

I now have what I hope is a HAR file of a goassemble pdf download that results in a file without file type. (Instructions for doing this I found online at several sites do not match the Firefox 79.0 64bit version on this PC so I had to experiment to get to a save as HAR option).

I have tried to attach it as new file but I get a HAR file invalid response from Bugzilla. How can I attach it?

(In reply to watkinggg@yahoo.com from comment #5)

I now have what I hope is a HAR file of a goassemble pdf download that results in a file without file type. (Instructions for doing this I found online at several sites do not match the Firefox 79.0 64bit version on this PC so I had to experiment to get to a save as HAR option).

I have tried to attach it as new file but I get a HAR file invalid response from Bugzilla. How can I attach it?

I'm a little confused - I didn't think bugzilla did any file validation. If you have a file on disk you should be able to attach it to this bug as an attachment in https://bugzilla.mozilla.org/attachment.cgi?bugid=1652520&action=enter . If that's blocked somehow, you could try creating a zip file of the har file and then attaching that in the same form

Har file of download of .pdf from goassemble. File downloaded had no associated file type when downloaded

As a consequence of the above attachment process I have comments on the attachment and documentation pages of Bugzilla. Having found the feedback link and got to https://github.com/topics/bugzilla I still don't know where and how to add my comments. Any ideas how I could proceed ?

There's a valid content-type header in the har file (ie that says application/pdf), so sniffing as per comment #3 shouldn't be necessary.

However, there's an invalid content-disposition header that we probably misparse - it uses spaces in the filename but does not encode the spaces (cf. bug 1440677). So the file is called "Coastal Access Volunteers.pdf", but I expect Firefox saves it as just "Coastal" ?

I'm a little surprised we don't re-add the extension, esp. on Windows/macOS.

This is effectively a dupe of bug 1440677. See also the Chrome issue at https://bugs.chromium.org/p/chromium/issues/detail?id=1006345 - the spec doesn't allow spaces without quotes, but many servers do it anyway, and because Chrome copied IE's behaviour, and then MS switched to Blink/Chrome, they now have little incentive to change this.

Anyway, fixing bug 1440677 is tricky, and I wonder if we can somehow at least add the file extension back on Windows/macOS if we know the content-type... Marco, thoughts?

Status: UNCONFIRMED → NEW
Component: PDF Viewer → File Handling
Ever confirmed: true
Flags: needinfo?(mak)
Summary: download file type pdf , pdf file type missing → download file type pdf , pdf file extension missing (due to server not quoting filenames containing spaces)

(In reply to watkinggg@yahoo.com from comment #8)

As a consequence of the above attachment process I have comments on the attachment and documentation pages of Bugzilla. Having found the feedback link and got to https://github.com/topics/bugzilla I still don't know where and how to add my comments. Any ideas how I could proceed ?

I'm unsure where you found "the" feedback link, but you can file a bug in this tracker - probably at https://bugzilla.mozilla.org/enter_bug.cgi?format=guided#h=dupes|bugzilla.mozilla.org|Bug%20Creation%2FEditing . There's a Documentation subcomponent, too.

(In reply to :Gijs (he/him) from comment #9)

Anyway, fixing bug 1440677 is tricky, and I wonder if we can somehow at least add the file extension back on Windows/macOS if we know the content-type... Marco, thoughts?

If there's no extension and we can obtain the primary extension from the content-type, I don't see why not. We must ensure this doesn't confuse the download dialog, so the final used extension/type is shown to the user and not an empty one.

Flags: needinfo?(mak)

I'm looking at this code anyway and will see if I can find time to deal with it this week.

Flags: needinfo?(watkinggg) → needinfo?(gijskruitbosch+bugs)

I put up a patch in bug 1440677.

Going to keep the needinfo here because I think if we know the mimetype we should still be doing something here, as Windows/macOS are unlikely to do the right thing without an extension.

Hm, it seems we've not appended file extensions in cases like this for a Very Long Time.

Marco, for when you're back, how dangerous would it be to change that behaviour, perhaps only on macOS+Windows? That is, append the primary extension when no extension at all is specified? I imagine there will be some people who will be upset that a file "README" downloads to "README.txt" or something... perhaps we should only do it for non-text mimetypes? Or we could use a pref with different defaults on macOS and Windows? WDYT?

(the original case in this bug would be fixed anyway by the patch in bug 1440677 but I'd like to resolve what to do about the extension either way)

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mak)

Updating summary and clearing the extra fields. The URL and parity flags apply to bug 1440677 (for which, btw, a fix just landed!), not to this morphed bug.

Priority: -- → P3
Summary: download file type pdf , pdf file extension missing (due to server not quoting filenames containing spaces) → Consider enforcing file extensions for downloads that have none

(In reply to :Gijs (he/him) from comment #14)

Hm, it seems we've not appended file extensions in cases like this for a Very Long Time.

Marco, for when you're back, how dangerous would it be to change that behaviour, perhaps only on macOS+Windows? That is, append the primary extension when no extension at all is specified? I imagine there will be some people who will be upset that a file "README" downloads to "README.txt" or something... perhaps we should only do it for non-text mimetypes? Or we could use a pref with different defaults on macOS and Windows? WDYT?

I may have been a bit optimistic in comment 11, I'm thinking about possible use-cases and a few things came to my mind:

  1. the extension is not necessarily pointless on Linux, it may actually be informative, for example I'd like to know if a file is jpg or png, mp3 or flac.
  2. Mac is still unix-derived, so it may be more common for apps to require extension-less names
  3. It's not given that Windows always wants an extension, for example I may be downloading a mingw script on Windows or anyway a script for the Windows Subsystem for Linux, for which I may not want an extension even if it's served as application/x-sh. Or I may be downloading assets (skins, modified textures, samples) for a videogame or a music mixer app, and they may expects those files to be extension-less.
  4. Introducing different behaviors across OSes seems to add even more complication, in general I'd prefer a unified approach.

The safer path may be to always append only for a subset of mime we can control, but which ones? The only ones for which I can't think of downsides atm are documents (pdf and other "office" formats). It would be nice to do also for any media, because the format may be informative for the user, but I already found the application assets use-case that would break, there may be others I didn't even think about.

Flags: needinfo?(mak)
See Also: → 1662780

I... just realized we may have accidentally done this in bug 1667787? That is, if there is a known mimetype + extension, we'll add the extension. Per comment #16, do we need to restrict that so we don't enforce e.g. txt for text/plain, and/or don't do it at all for unknown application/** mimetypes or something ?

Flags: needinfo?(mak)
See Also: → 1667787

Ah, I thought we were doing that only for media types, but you are right there's no such a check in the patch.
I feel like it may have to be an allowlist rather than a blocklist.
I made a few examples above of assets usually distributed with custom extensions or no extensions, but we may also consider it a server bug in those cases, so they may not be that much compelling.
Though what happens if I serve my.script as application/javascript, it ends up as my.js? Or if I serve "clickme" as application/vnd.microsoft.portable-executable? Those are then immediately executable on the system, while they would have been harmless without the extension.

I'd say a nice group would be media and office files, though it sounds like another list we must manually update :(
Could check if the extension we generate is executable?

(In reply to :Gijs (he/him) from comment #17)

and/or don't do it at all for unknown application/** mimetypes or something ?

If we have an unknown application mime, where do we pick the new extension from?

Flags: needinfo?(mak)

(In reply to Marco Bonardo [:mak] from comment #18)

Ah, I thought we were doing that only for media types, but you are right there's no such a check in the patch.
I feel like it may have to be an allowlist rather than a blocklist.
I made a few examples above of assets usually distributed with custom extensions or no extensions, but we may also consider it a server bug in those cases, so they may not be that much compelling.
Though what happens if I serve my.script as application/javascript, it ends up as my.js? Or if I serve "clickme" as application/vnd.microsoft.portable-executable?

Yes, probably.

Those are then immediately executable on the system, while they would have been harmless without the extension.

Yes, though arguably they were also a bit useless without the extension (certainly on Windows/macOS). I'm not too worried about roadblocks for executables at the moment, given the signing requirements from Windows and the executable warning for any non-.exe executable files (cf. bug 1576762).

I'd say a nice group would be media and office files, though it sounds like another list we must manually update :(

Yeah, I'm not thrilled about another list, either, especially as all the lists we already have are forever out of date (and/or being disputed, see above), too.

But practically speaking, this is probably the most straightforward solution in the short term. I'll see if I can make time to do that this cycle so we don't regress something.

Could check if the extension we generate is executable?

(In reply to :Gijs (he/him) from comment #17)

and/or don't do it at all for unknown application/** mimetypes or something ?

If we have an unknown application mime, where do we pick the new extension from?

Right, sorry, I meant "unknown" in the sense of one of those lists that we both love so much... so that we could have extensions for application/pdf and application/x-ms-word or whatever, but not for application/x-sh as per your example.

Flags: needinfo?(gijskruitbosch+bugs)
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Flags: needinfo?(gijskruitbosch+bugs)

Morphing this bug some more to clarify what the patch is going to do.

Just to reiterate, the issue from comment #0 and the ones linked by Thomas Bertels have been fixed in bug 1440677 which landed in Firefox 81 (Firefox release is, at time of writing, on version 82.0.something).

OS: Unspecified → All
Hardware: Unspecified → All
Summary: Consider enforcing file extensions for downloads that have none → Enforce file extensions for pdfs, office style documents and media file types
Blocks: 1676240
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/58d4317c9427
do not overwrite extensions for all filetypes, r=mak

Backed out changeset 58d4317c9427 (bug 1652520) for nsExternalHelperAppService bustages.

Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&fromchange=58d4317c9427190d33004df2632544cc4a8fedeb&tochange=4fc8763050c90cf1cb738a678e330884a159572a&searchStr=build&selectedTaskRun=Zz4Nex6YSw-g3fjFNKKqtg.0

Backout link: https://hg.mozilla.org/integration/autoland/rev/9287a5caef96c05facaa6e83fb717fd1176bb324

Failure log: https://treeherder.mozilla.org/logviewer?job_id=321300007&repo=autoland&lineNumber=8597

[task 2020-11-10T12:28:49.426Z] 12:28:49     INFO -  make[4]: Entering directory '/builds/worker/workspace/obj-build/dom/ipc'
[task 2020-11-10T12:28:49.432Z] 12:28:49     INFO -  /builds/worker/fetches/sccache/sccache /builds/worker/fetches/clang/bin/clang++ -std=gnu++17 -o ContentChild.o -c  -I/builds/worker/workspace/obj-build/dist/stl_wrappers -I/builds/worker/workspace/obj-build/dist/system_wrappers -include /builds/worker/checkouts/gecko/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -ftrivial-auto-var-init=pattern -DDEBUG=1 -DOS_POSIX=1 -DOS_LINUX=1 '-DBIN_SUFFIX=""' '-DMOZ_APP_NAME="firefox"' -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/builds/worker/checkouts/gecko/dom/ipc -I/builds/worker/workspace/obj-build/dom/ipc -I/builds/worker/workspace/obj-build/ipc/ipdl/_ipdlheaders -I/builds/worker/checkouts/gecko/ipc/chromium/src -I/builds/worker/checkouts/gecko/ipc/glue -I/builds/worker/checkouts/gecko/caps -I/builds/worker/checkouts/gecko/chrome -I/builds/worker/checkouts/gecko/docshell/base -I/builds/worker/checkouts/gecko/dom/base -I/builds/worker/checkouts/gecko/dom/bindings -I/builds/worker/checkouts/gecko/dom/events -I/builds/worker/checkouts/gecko/dom/filesystem -I/builds/worker/checkouts/gecko/dom/geolocation -I/builds/worker/checkouts/gecko/dom/media/webrtc -I/builds/worker/checkouts/gecko/dom/media/webspeech/synth/ipc -I/builds/worker/checkouts/gecko/dom/security -I/builds/worker/checkouts/gecko/dom/storage -I/builds/worker/checkouts/gecko/extensions/spellcheck/src -I/builds/worker/checkouts/gecko/gfx/2d -I/builds/worker/checkouts/gecko/hal/sandbox -I/builds/worker/checkouts/gecko/js/xpconnect/loader -I/builds/worker/checkouts/gecko/js/xpconnect/src -I/builds/worker/checkouts/gecko/layout/base -I/builds/worker/checkouts/gecko/media/webrtc -I/builds/worker/checkouts/gecko/netwerk/base -I/builds/worker/checkouts/gecko/netwerk/protocol/http -I/builds/worker/checkouts/gecko/toolkit/components/printingui/ipc -I/builds/worker/checkouts/gecko/toolkit/crashreporter -I/builds/worker/checkouts/gecko/toolkit/xre -I/builds/worker/checkouts/gecko/uriloader/exthandler -I/builds/worker/checkouts/gecko/widget -I/builds/worker/checkouts/gecko/xpcom/base -I/builds/worker/checkouts/gecko/xpcom/threads -I/builds/worker/checkouts/gecko/modules/libjar -I/builds/worker/workspace/obj-build/dist/include -I/builds/worker/workspace/obj-build/dist/include/nspr -I/builds/worker/workspace/obj-build/dist/include/nss -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/obj-build/mozilla-config.h -Qunused-arguments -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wempty-init-stmt -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wc++2a-compat -Wcomma -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-error=tautological-type-limit-compare -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=backend-plugin -Wno-error=return-std-move -Wno-error=atomic-alignment -Wno-error=deprecated-copy -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-psabi -Wno-unknown-warning-option -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fcrash-diagnostics-dir=/builds/worker/artifacts -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -Xclang -load -Xclang /builds/worker/workspace/obj-build/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -Os -fno-omit-frame-pointer -funwind-tables -Werror -I/builds/worker/checkouts/gecko/widget/gtk/compat-gtk3 -pthread -I/usr/include/gtk-3.0/unix-print -I/usr/include/gtk-3.0 -I/usr/include/at-spi2-atk/2.0 -I/usr/include/at-spi-2.0 -I/usr/include/dbus-1.0 -I/usr/lib/x86_64-linux-gnu/dbus-1.0/include -I/usr/include/gtk-3.0 -I/usr/include/cairo -I/usr/include/pango-1.0 -I/usr/include/harfbuzz -I/usr/include/pango-1.0 -I/usr/include/atk-1.0 -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng12 -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/libpng12 -I/usr/include/gio-unix-2.0/ -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -Wno-error=shadow -fexperimental-new-pass-manager  -MD -MP -MF .deps/ContentChild.o.pp   /builds/worker/checkouts/gecko/dom/ipc/ContentChild.cpp
[task 2020-11-10T12:28:49.433Z] 12:28:49     INFO -  In file included from /builds/worker/checkouts/gecko/dom/ipc/ContentChild.cpp:60:
[task 2020-11-10T12:28:49.433Z] 12:28:49     INFO -  In file included from /builds/worker/workspace/obj-build/dist/include/mozilla/dom/ExternalHelperAppChild.h:11:
[task 2020-11-10T12:28:49.433Z] 12:28:49    ERROR -  /builds/worker/checkouts/gecko/uriloader/exthandler/nsExternalHelperAppService.h:473:22: error: '/*' within block comment [-Werror,-Wcomment]
[task 2020-11-10T12:28:49.434Z] 12:28:49     INFO -     * extension (image/*, video/*, audio/*, and a few specific document types).
[task 2020-11-10T12:28:49.434Z] 12:28:49     INFO -                       ^
[task 2020-11-10T12:28:49.434Z] 12:28:49    ERROR -  /builds/worker/checkouts/gecko/uriloader/exthandler/nsExternalHelperAppService.h:473:31: error: '/*' within block comment [-Werror,-Wcomment]
[task 2020-11-10T12:28:49.435Z] 12:28:49     INFO -     * extension (image/*, video/*, audio/*, and a few specific document types).
[task 2020-11-10T12:28:49.435Z] 12:28:49     INFO -                                ^
[task 2020-11-10T12:28:49.435Z] 12:28:49    ERROR -  /builds/worker/checkouts/gecko/uriloader/exthandler/nsExternalHelperAppService.h:473:40: error: '/*' within block comment [-Werror,-Wcomment]
[task 2020-11-10T12:28:49.436Z] 12:28:49     INFO -     * extension (image/*, video/*, audio/*, and a few specific document types).
[task 2020-11-10T12:28:49.436Z] 12:28:49     INFO -                                         ^
[task 2020-11-10T12:28:49.436Z] 12:28:49     INFO -  3 errors generated.
[task 2020-11-10T12:28:49.437Z] 12:28:49     INFO -  /builds/worker/checkouts/gecko/config/rules.mk:674: recipe for target 'ContentChild.o' failed
[task 2020-11-10T12:28:49.437Z] 12:28:49    ERROR -  make[4]: *** [ContentChild.o] Error 1
[task 2020-11-10T12:28:49.437Z] 12:28:49     INFO -  make[4]: Leaving directory '/builds/worker/workspace/obj-build/dom/ipc'
[task 2020-11-10T12:28:49.437Z] 12:28:49     INFO -  make[4]: *** Waiting for unfinished jobs....
Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Bogdan Tara[:bogdan_tara | bogdant] from comment #23)

[task 2020-11-10T12:28:49.435Z] 12:28:49    ERROR -  /builds/worker/checkouts/gecko/uriloader/exthandler/nsExternalHelperAppService.h:473:40: error: '/*' within block comment [-Werror,-Wcomment]
[task 2020-11-10T12:28:49.436Z] 12:28:49     INFO -     * extension (image/*, video/*, audio/*, and a few specific document types).
[task 2020-11-10T12:28:49.436Z] 12:28:49     INFO -                                         ^

Oh ffs. Didn't think I needed to recompile for a comment-only change, and this is a false positive in terms of code hygiene, but hey-ho.

Flags: needinfo?(gijskruitbosch+bugs)
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/0a93a065053d
do not overwrite extensions for all filetypes, r=mak
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch

Users on SUMO are reporting that the intended file extension (.story and .ts4script) is being changed to .zip when saving downloads. Unfortunately, the name change breaks their workflow: they have to change the file extension before launching the files in the relevant application.

This arises because the server is sending the Content-Type "application/zip" -- like .xpi and .docx files, these are ZIP archives internally.

I don't think there is any way to avoid this in Firefox 84+ (i.e., no preference; nothing that could be added to handlers.json).

Any thoughts on addressing this scenario?

(In reply to jscher2000 from comment #28)

Users on SUMO are reporting that the intended file extension (.story and .ts4script) is being changed to .zip when saving downloads. Unfortunately, the name change breaks their workflow: they have to change the file extension before launching the files in the relevant application.

This arises because the server is sending the Content-Type "application/zip" -- like .xpi and .docx files, these are ZIP archives internally.

I don't think there is any way to avoid this in Firefox 84+ (i.e., no preference; nothing that could be added to handlers.json).

Any thoughts on addressing this scenario?

Yeah, this is really unfortunate. You're right that there is currently no preference. Thanks for your suggestions on those topics to use an extension as a workaround. We'll try to address this for 85. Someone else already filed bug 1684183 and we'll track the fix there.

Depends on: 1686437
Blocks: 1692720
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: