Closed Bug 546726 Opened 14 years ago Closed 8 years ago

formhistory.sqlite explodes to 4TB with profile on network drive

Categories

(Toolkit :: Form Manager, defect)

1.9.1 Branch
x86
macOS
defect
Not set
critical

RESOLVED WORKSFORME

People

(Reporter: stef, Unassigned)

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10
Build Identifier: version 3.5.6

formhistory.sqlite explodes to 4TB. We are using network home directories, so AFP chokes and the server crashes. Chaos ensues. Seriously. This seems to happen on initial launch of the application. The 2 users that I have been able to track down say that they just launch Firefox, the browser window opens blank, and the machine stops.

Reproducible: Sometimes
You have a hard drive large enough for a 4TB file? ;)
So, the profile is stored on the server?
Network homes are stored on an XServe RAID. Nearly 10TB is assigned to user Home folders, so when this happens to 2 people (and it has), the RAID is full. Freakish but true!
You're sure it's not a sparse file and actually taking up 4TB?
Version: unspecified → 3.5 Branch
Attached image Screen snap of 4TB file
I attached a screen snap, FYI. Drive space goes down to 5TB available from 9TB when the file is created, then back up to 9TB when I delete it.
In the spirit of the Olympics, I guess you've now got the gold!

Stef       4TB    (bug 546726)
Vladimir   964GB  (bug 525753)
Paul       4GB    (bug 483823)

and that's just for formhistory.sqlite

Is there a generic bug we can dupe everyone here to for out-of-control sqlite files like this? I see bug 538493, but that's currently listed for SeaMonkey. Someone needs to come up with a game-plan to fix this. This is getting ridiculous.

Stef: What filesystem is this on? If your FS doesn't support sparse files then comment 5 would hold true even if it's a mostly empty file. If you attempt to compress it, what is the end result? (assuming it doesn't explode in the process)
Component: General → Form Manager
Product: Firefox → Toolkit
QA Contact: general → form.manager
Version: 3.5 Branch → 1.9.1 Branch
OSX Leopard Server w/XServe RAID. If (when) it happens again, I will try to compress it. I had to delete the file because I can't risk running out of storage.
I was asking what filesystem, not what operating system. Being OSX I guess HFS+ is a fair bet. It doesn't seem to have sparse file support. (ex: bug 525753 w/ 964GB on a 120GB drive, though it was stated that corruption is another possible cause for that one)
stef (or anyone else experiencing this): Are you seeing this happen repeatedly to the same users, or is it a once-in-a-blue-moon kind of thing?
So far it has happened 7 or 8 times since early November 2009. Different users each time.
Taras: You created bug 572460 to track sqlite vacuuming. CCing you to decide what to do with this bug and the others listed in comment 6.
Curious, how much do these files shrink when you try to vacuum them?
(In reply to comment #12)
> Curious, how much do these files shrink when you try to vacuum them?

It has been suggested to users with the problem but I don't think any have done so in these extreme cases, at least not that we have results from. The few data points I see here are in bug 483823 comment 3 - bug 483823 comment 6. An instance of a 4GB formhistory.sqlite compressed down to only 20kB with bzip2. Another 4GB file compressed down to 7kB. Justin noted that the file had "a gigantic sequence of nulls appended to the end, and didn't seem otherwise corrupt" and suggested a vacuum in the most recent comment.
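The compression figures above are what one would expect if the files are almost entirely null padding. A minimal sketch, assuming nothing about the real files beyond "mostly nulls"; `compressed_size_of_nulls` is a hypothetical helper name:

```python
import bz2


def compressed_size_of_nulls(n_bytes, chunk=1 << 20):
    """Stream n_bytes of NUL bytes through bzip2 and return the output size."""
    comp = bz2.BZ2Compressor()
    zeros = bytes(chunk)  # a reusable buffer of NUL bytes
    out = 0
    remaining = n_bytes
    while remaining > 0:
        step = min(chunk, remaining)
        out += len(comp.compress(zeros[:step]))
        remaining -= step
    out += len(comp.flush())
    return out


n = 16 * 1024 * 1024  # 16 MiB of nulls, small enough to run quickly
print(f"{n} null bytes -> {compressed_size_of_nulls(n)} bytes after bzip2")
```

A file that shrinks by several orders of magnitude like this is a strong hint that it holds very little real data, whatever its reported size.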
Of course, ideally one could track down whatever is filling up these files with nulls in the first place and stop it. This bug implies it happens largely on startup but the others just discovered it after the fact. Might be the same issue just running into different upper limits based on the file system or there might be something else at play. In any case, your suggestion of monitoring db size and vacuuming as-needed could at least keep things in check.
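The "monitor db size and vacuum as needed" idea could be sketched roughly as below. This is only an illustration using Python's standard `sqlite3` module, not anything Firefox does; the `demo` table, the function names, and the threshold are made up, and it shows reclaiming free pages in general rather than reproducing the null-padding case.

```python
import os
import sqlite3
from typing import Tuple


def vacuum_if_large(db_path: str, threshold_bytes: int) -> Tuple[int, int]:
    """VACUUM the database if its file exceeds threshold_bytes.

    Returns (size_before, size_after) in bytes.
    """
    before = os.path.getsize(db_path)
    if before > threshold_bytes:
        # isolation_level=None keeps the connection in autocommit mode;
        # VACUUM cannot run inside an open transaction.
        conn = sqlite3.connect(db_path, isolation_level=None)
        try:
            conn.execute("VACUUM")
        finally:
            conn.close()
    return before, os.path.getsize(db_path)


def make_bloated_db(db_path: str, rows: int = 20000) -> None:
    """Create a throwaway database, fill it, then delete every row."""
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        # Make sure freed pages stay in the file until VACUUM runs.
        conn.execute("PRAGMA auto_vacuum = NONE")
        conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, value TEXT)")
        conn.execute("BEGIN")
        conn.executemany(
            "INSERT INTO demo (value) VALUES (?)",
            (("x" * 100,) for _ in range(rows)),
        )
        conn.execute("COMMIT")
        conn.execute("DELETE FROM demo")
    finally:
        conn.close()
```

Calling `make_bloated_db(path)` followed by `vacuum_if_large(path, 1024)` on a scratch file should return a smaller second value than the first.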
Is this a similar SQLite3 issue to Thunderbird's bug 494706?
The issue in bug 494706 was:
 1. A write-open of the file was interfered with by other software's open of the same file,
    and returned -1 as a 32-bit signed integer.
 2. The requester of the open treated the -1 as a 32-bit unsigned integer,
    so it requested a seek to 4GB-1.
 3. The handler of "seek to 4GB-1" opened the file because the file was not yet open.
 4. That file open succeeded because the other software had already closed the file.
 5. Thus "seek to 4GB-1" was executed.
Note: 4GB in bug 483823 == 4294966272 == 4294967296-1024 == 2**32-1024
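The arithmetic behind steps 1 and 2 and the note above can be checked in a few lines. This is only a sketch of the integer reinterpretation, not the Thunderbird code; the variable names are made up.

```python
# A failed open reports -1 as a signed 32-bit value.
failed_result = -1

# Reinterpreting the same 32 bits as unsigned gives the bogus seek offset.
as_unsigned = failed_result & 0xFFFFFFFF
print(as_unsigned)                # 4294967295
print(as_unsigned == 2**32 - 1)   # True, i.e. "4GB-1"

# The file size reported in bug 483823 is 1024 bytes short of 4GB.
print(2**32 - 1024)               # 4294966272
```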
We have seen this twice now on Linux machines with home folders stored on a Solaris/ZFS NFS share. The user's filesystem has ZFS block level compression turned on so the files don't make much of an impact on the actual disk space used, but our backup server aborts the backup as it takes too long to read the 4TB file!
Just a curiosity, I've had a similar problem with my asl.log file on my MacBook (2.16GHz Intel Core 2 Duo running Tiger 10.4.11). The asl.log file blows up to 3.31GB (notably less than 4TB!), but it happens FAST, like to the tune of hundreds of MB per minute. I've managed to delete the file, but I looked at it first and it was full of "...Firefox.app...CGWindowContextCreate: failed to create software delegate." Hundreds of these log entries per second, making a huge file that eats up my drive space. I don't want to create a new bug if it's really the same stuff as what you guys are dealing with, but I'd be curious to know if it gets fixed because I won't run Firefox until I know it's stable and I LIKE Firefox.
(In reply to comment #17)

This sounds like a totally different (and possibly mac-specific) issue, please file a separate bug for it.
I can confirm the occurrence of this bug on x86 Linux:

We have about 100 Linux workstations (Debian stable, x86 32-bit, some with a 64-bit kernel but 32-bit userland) accessing a Linux NFS fileserver. They run various versions of Firefox, between Debian stable's "iceweasel" and the current Firefox 10 ESR, as chosen by the respective user. The backup tool on that fileserver ran into severe problems trying to back up several 4TB sparse files.

ls -l of those files shows recent accesses and exactly identical size (though block usage differs from file to file). Usernames are changed for privacy reasons:
-rw-r--r-- 1 user0 immdstud 4398046510080 Mär 28 10:46 ./cip/2008/user0/.mozilla/firefox/106aerhg.default/formhistory.sqlite
-rw-r--r-- 1 user1 cipce 4398046510080 Mär 27 12:40 ./cip/ce/user1/.mozilla/firefox/chvm2a4j.default/formhistory.sqlite
-rw-r--r-- 1 user2 cipiuk 4398046510080 Apr  2 13:58 ./cip/iuk/user2/.mozilla/firefox/yogq66dc.default/formhistory.sqlite
-rw-r--r-- 1 user3 immdstud 4398046510080 Mär 26 11:24 ./cip/2007/user3/.mozilla/firefox/e89cch9i.default/formhistory.sqlite
-rw-r--r-- 1 user4 cipce 4398046510080 Apr  3 10:47 ./cip/ce/user4/.mozilla/firefox/nzap0pew.default/formhistory.sqlite
-rw-r--r-- 1 user5 immdstud 4398046510080 Apr  1 17:18 ./cip/2009/user5/.mozilla/firefox/nb5swrtk.default/formhistory.sqlite
-rw-r--r-- 1 user6 cipguest 4398046510080 Apr  2 14:03 ./cip/nf/user6/.mozilla/firefox/4mhdi102.default/formhistory.sqlite
-rw-r--r-- 1 user7 immdstud 4398046510080 Mär 27 09:46 ./cip/2010/user7/.mozilla/firefox/70nw4jdx.default/formhistory.sqlite
-rwx------ 1 user8 immdstud 4398046510080 Mär 27 15:57 ./cip/2010/user8/.mozilla/firefox/r9glr6xm.default/formhistory.sqlite

Unfortunately no sample of those files was saved in a rush to get backups working again. Stat shows the low block usage and the recent accesses to those files:

stat /home.stand/cip/2008/user0/.mozilla/firefox/106aerhg.default/formhistory.sqlite
  File: `/home.stand/cip/2008/user0/.mozilla/firefox/106aerhg.default/formhistory.sqlite'
  Size: 4398046510080   Blocks: 48         IO Block: 4096   regular file
Device: fc00h/64512d    Inode: 82064156    Links: 1
Access: (0644/-rw-r--r--)  Uid: (12345/user0)   Gid: (30001/immdstud)
Access: 2012-04-05 05:04:35.223621272 +0200
Modify: 2012-03-28 10:46:15.176452465 +0200
Change: 2012-03-28 10:46:15.176452465 +0200
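To make the stat numbers concrete: a quick sketch, assuming st_blocks is counted in 512-byte units (as on Linux), comparing the apparent size with the space actually allocated. The helper names are hypothetical. The reported size also works out to exactly 2**42 - 1024, the same "power of two minus 1024" shape as the 4GB case noted earlier in this bug, though that alone does not prove a common cause.

```python
import os

BLOCK_UNIT = 512  # assumption: st_blocks is in 512-byte units, as on Linux


def allocated_bytes(blocks):
    """Bytes actually allocated on disk for a given st_blocks value."""
    return blocks * BLOCK_UNIT


def is_probably_sparse(size, blocks):
    """True if far less space is allocated than the apparent size claims."""
    return allocated_bytes(blocks) < size


def stat_report(path):
    """Return (apparent_size, allocated_bytes) for a file on disk."""
    st = os.stat(path)
    return st.st_size, allocated_bytes(st.st_blocks)


# Values copied from the stat output above.
size, blocks = 4398046510080, 48
print(allocated_bytes(blocks))           # 24576
print(is_probably_sparse(size, blocks))  # True
print(size == 2**42 - 1024)              # True
```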

I'm unsure which versions of Firefox were used when creating those files. How can I find that out from the profile directory?
The fact that the file is sparse indicates it came from an old version of firefox(pre 4.0). We should be using fallocate for formhistory.
(In reply to Taras Glek (:taras) from comment #20)
> The fact that the file is sparse indicates it came from an old version of
> firefox(pre 4.0). We should be using fallocate for formhistory.

nevermind, we don't use SetGrowthIncrement for formhistory :(
Does moving to the flat file in bug 673470 resolve this?
(In reply to Kevin Brosnan [:kbrosnan] from comment #22)
> Does moving to the flat file in bug 673470 resolve this?

That bug is for safe browsing, not form history from what I can tell.
(In reply to Taras Glek (:taras) from comment #21)
> (In reply to Taras Glek (:taras) from comment #20)
> > The fact that the file is sparse indicates it came from an old version of
> > firefox(pre 4.0). We should be using fallocate for formhistory.
> 
> nevermind, we don't use SetGrowthIncrement for formhistory :(

Would that solve this problem?
(In reply to Matthew N. [:MattN] from comment #24)
> (In reply to Taras Glek (:taras) from comment #21)
> > (In reply to Taras Glek (:taras) from comment #20)
> > > The fact that the file is sparse indicates it came from an old version of
> > > firefox(pre 4.0). We should be using fallocate for formhistory.
> > 
> > nevermind, we don't use SetGrowthIncrement for formhistory :(
> 
> Would that solve this problem?

It would make the problem occur sooner, which may make it easier to track down.
prefs.js in all affected profiles contains user_pref("extensions.lastAppVersion", "3.5.16");

This, I guess, points to the last FF version that profile was used with. 3.5.16-13 is the currently stable version of Iceweasel in Debian. I guess filing a bug with Debian would be best. Is there any hint I could give the Debian guys as to what might cause that bug and what might fix it (since it doesn't seem to occur in newer versions so far)?
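For checking many profiles at once, the pref lookup could be scripted along these lines. A rough sketch only: the regular expression handles simple `user_pref("name", value);` lines like the one quoted above and does not cope with escaped quotes; `read_pref` is a hypothetical helper.

```python
import re

# Matches lines such as: user_pref("extensions.lastAppVersion", "3.5.16");
PREF_RE = re.compile(r'user_pref\("([^"]+)",\s*(.+?)\);')


def read_pref(prefs_text, name):
    """Return the value of pref `name` from prefs.js text, or None."""
    for match in PREF_RE.finditer(prefs_text):
        if match.group(1) == name:
            raw = match.group(2).strip()
            # Strip surrounding quotes from string values.
            if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
                return raw[1:-1]
            return raw
    return None


sample = 'user_pref("extensions.lastAppVersion", "3.5.16");\n'
print(read_pref(sample, "extensions.lastAppVersion"))  # 3.5.16
```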
(In reply to snalwuer@cip.informatik.uni-erlangen.de from comment #26)
> This, I guess, points to the last FF version that profile was used with.
> 3.5.16-13 is the currently stable version of Iceweasel in Debian.

Is Debian actually maintaining such an out-of-date branch in some way? Firefox 3.5 is no longer supported by Mozilla and 3.6 is considered a direct update which is the minimum anyone should be using at this point. (and it too will be EOL soon) Just doing a search, I see that Debian has a version of Iceweasel based on Firefox 10 ESR for the not yet released Debian 7.0, but there doesn't seem to be a backport of that to the current stable. That seems strange to me.

> I guess filing a bug with Debian would be best.

If they could actually do something with that old version, maybe, but I wouldn't count on it. I'd suggest trying to install a backport of newer Iceweasel first, or just installing the current stable or ESR version of Firefox from Mozilla.com instead.
(In reply to Dave Garrett from comment #27)
> (In reply to snalwuer@cip.informatik.uni-erlangen.de from comment #26)
> > This, I guess, points to the last FF version that profile was used with.
> > 3.5.16-13 is the currently stable version of Iceweasel in Debian.
> 
> Is Debian actually maintaining such an out-of-date branch in some way?

Yes, there are frequent security updates, but nothing that changes functionality in any way. That's also the reason they are still on 3.5.16 instead of 3.5.19, I guess: http://www.debian.org/doc/manuals/debian-faq/ch-getting.en.html#s-updatestable

> Firefox 3.5 is no longer supported by Mozilla and 3.6 is considered a direct
> update which is the minimum anyone should be using at this point. (and it
> too will be EOL soon)

Debian stable will be stable for at least another year I think. After a new stable release, the old one gets security support for another year. Of course this is very different from the Mozilla support terms, even for its longterm enterprise releases.

> Just doing a search, I see that Debian has a version
> of Iceweasel based on Firefox 10 ESR for the not yet released Debian 7.0,
> but there doesn't seem to be a backport of that to the current stable. That
> seems strange to me.

There is http://mozilla.debian.org which provides various versions of mozilla products as backports.
 
> > I guess filing a bug with Debian would be best.
> 
> If they could actually do something with that old version, maybe, but I
> wouldn't count on it.

Even if they can't or won't fix it, a bug report may at least indicate that there is a known problem and possible workarounds, so I'll go for it.

> I'd suggest trying to install a backport of newer
> Iceweasel first, or just installing the current stable or ESR version of
> Firefox from Mozilla.com instead.

Yes, that will be our workaround, together with a cronjob doing something like
find /home.stand/cip -maxdepth 6 -path \*/.mozilla/firefox/\* -name formhistory.sqlite -size +10M -exec ls -la '{}' \;
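A rough Python equivalent of that find, in case someone wants to fold it into a larger monitoring script. It is only a sketch: it walks the whole tree rather than applying -maxdepth or the -path filter, and `find_oversized` and its parameters are made-up names.

```python
import os


def find_oversized(root, filename="formhistory.sqlite",
                   min_bytes=10 * 1024 * 1024):
    """Return sorted (path, size) pairs for files named `filename` under
    `root` whose apparent size exceeds `min_bytes`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                size = os.lstat(path).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if size > min_bytes:
                hits.append((path, size))
    return sorted(hits)


if __name__ == "__main__":
    for path, size in find_oversized("/home.stand/cip"):
        print(size, path)
```

Note that this compares the apparent size, so sparse files like the ones above are still reported even when they use few blocks.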
Sorry, it's http://mozilla.debian.net/

Also, for future reference, the Debian bug report is http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=668243
Since April 2012 this bug has occurred only once on our systems, on March 3rd 2015. The version that supposedly created that one file seems to have been a rather current release of Iceweasel 31: in prefs.js from that user's profile directory, gecko.mstone is 31.5.0 and gecko.buildID is 20150225052701.

As to our current system setup, we are running x86_64 Linux; userspace is Debian jessie (currently testing, soon stable). Users' homes are mounted via NFS, which is sometimes a contributing factor in database corruptions.

But this was the only occurrence in the last 3 years, with by now 300 machines and around 8k users, and we still have a daily cronjob looking for those files. So the frequency seems to have gone down quite a bit, from the earlier "around 3 per day" to now seemingly "1 per 3 years".
Summary: formhistory.sqlite explodes to 4TB → formhistory.sqlite explodes to 4TB with profile on network drive
Hi, I have been meaning to fix some I/O-related errors in Thunderbird.
To be exact, I am trying to fix the missing error checks in low-level I/O routines and to add proper error recovery. (The latter has to wait until I figure out how best to cope with the errors detected. Right now, Thunderbird fails miserably to detect I/O errors, especially network I/O errors, and this leads to all sorts of strange issues.)

The fact that the issues noticed by previous bug reporters seem to have something to do with their profile being stored on a network server rings a bell.

I have noticed that there are issues in low-level file I/Os not handling network I/O errors very well.

The network code handles EINTR-type error codes and the required retry very well. Otherwise, I think Firefox users would have complained very loudly by now.
Although |Write|, |Read|, |PR_Write|, |PR_Read| (and variants of close and flush) need to handle EINTR-type errors, I have found that many routines lack this processing.
EINTR processing and a proper retry are necessary when the underlying file system is a remote file system.
This angle of processing seems to have escaped early coders.
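The kind of retry being described looks roughly like the loop below. It is an illustration of the pattern only, written in Python rather than taken from the Mozilla or NSPR sources; `write_all` and its injectable `write` parameter are hypothetical. (Python's own os.write has retried EINTR automatically since 3.5, so the loop is here to show the idea, not because Python needs it.)

```python
import os


def write_all(fd, data, write=os.write):
    """Write all of `data` to `fd`, retrying when a call is interrupted
    (EINTR) and continuing after short writes."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        try:
            written = write(fd, view[total:])
        except InterruptedError:
            # EINTR: nothing was written by this call, so just try again.
            continue
        if written == 0:
            # Avoid spinning forever if the target accepts no more data.
            raise OSError("write returned 0 bytes")
        total += written
    return total
```

The point is that both an interrupted call and a short write are normal outcomes that the caller has to loop over, rather than treating the first return value as final.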

Although I thought the SQLite code has rather good checks for low-level I/O errors (it even has a built-in test in the source code to simulate various I/O errors, which the Mozilla development community might want to emulate), from what I read here I suspect that there could have been a failure on the user side of the API, or maybe a single point of failure in SQLite which has subsequently been fixed.

Please post any relevant information to this bug when something like this happens again, even if it is very rare.
When one is hit by such a bug, it is a very bad rainy day, even if it is one in hundreds of thousands of users.

TIA

From someone whose mail folder got eaten by TB when it failed to detect full disk condition during compaction about 7 years ago... (I believe it is fixed now.)
That prompted my quest for "rock solid mail client (tm)" to this day :-)
Flags: needinfo?(spencer.maybee)
Flags: needinfo?(duncan)
> Please post a relevant information to this bug when something like this happens again even if it is very rare.

Apparently the reporters have seen no such occurrence in the past two years.
Flags: needinfo?(spencer.maybee)
Flags: needinfo?(duncan)
I can confirm that on our site the last known occurrence of this bug was 2015-03-03. The cronjob searching for those files is still working and has been silent since 2015-03-03. System is still Debian jessie; the Firefox/Iceweasel version that should be most-used currently is 45.5.1esr-1~deb8u1 as shipped by Debian.
thanks for the update
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME