Closed
Bug 1081577
Opened 10 years ago
Closed 9 years ago
[Performance][Dialer] lmk.js is wrong in Flame base image
Categories
(Firefox OS Graveyard :: Vendcom, defect, P2)
Tracking
(blocking-b2g:2.2+, firefox44 unaffected, b2g-v2.0 unaffected, b2g-v2.1 affected, b2g-v2.2 affected, b2g-master unaffected)
RESOLVED
FIXED
blocking-b2g | 2.2+ |
Tracking | Status | |
---|---|---|
firefox44 | --- | unaffected |
b2g-v2.0 | --- | unaffected |
b2g-v2.1 | --- | affected |
b2g-v2.2 | --- | affected |
b2g-master | --- | unaffected |
People
(Reporter: Marty, Assigned: cyu)
References
()
Details
(Keywords: regression, Whiteboard: [2.1-exploratory-3][POVB])
Attachments
(4 files)
Description:
If the user has several apps open (4-5) when they receive a phone call, the ringtone plays properly, but the Incoming Call UI takes up to 5 seconds to appear and allow the user to accept or deny the call. This issue seems to be more severe when the user is viewing an app in landscape mode at the time of the call.

Repro Steps:
1) Update a Flame device to BuildID: 20141011000201
2) Open several apps (Browser, Gallery, Messaging, Contacts, Settings, Marketplace)
3) View the Browser app in landscape mode.
4) Call the DUT from another phone.

Actual: Call UI appeared 5 seconds after ringtone began playing.
Expected: Call UI appears immediately when the ringtone begins playing.

Environmental Variables:
Device: Flame 2.1 (319MB)
BuildID: 20141011000201 (Full Flash)
Gaia: f5d4ff60ffed8961f7d0380ada9d0facfdfd56b1
Gecko: d813d79d3eae
Gonk: 52c909e821d107d414f851e267dedcd7aae2cebf
Version: 34.0a2 (2.1)
Firmware: V180
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Notes: Repro frequency: 5/5
See attached: video clip (URL), logcat
------------------
This issue DOES occur on Flame 2.2
Call UI takes 5 seconds to appear after the ringtone begins playing when there are multiple apps open.

Environmental Variables:
Device: Flame 2.2 Master (319MB)
BuildID: 20141011040204 (Full Flash)
Gaia: 95f580a1522ffd0f09302372b78200dab9b6f322
Gecko: 3f6a51950eb5
Gonk: 52c909e821d107d414f851e267dedcd7aae2cebf
Version: 35.0a1 (2.2 Master)
Firmware: V180
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0
Reporter | ||
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(dharris)
Updated•10 years ago
|
Whiteboard: [2.1-Daily-Testing] → [2.1-exploratory-3]
Comment 1•10 years ago
|
||
[Blocking Requested - why for this release]: That video is bad. Marty, how bad is the performance when it's in portrait mode? Having an unactionable delay this long on an incoming call is unacceptable to a user. Also, does this reproduce on the raw base image v180 (which is 2.0)? Flagging for blocking and necessary investigation.
blocking-b2g: --- → 2.1?
Keywords: qawanted
Comment 2•10 years ago
|
||
I was able to repro this issue on the reporter's 2.2 build - out of 20 trials the average seemed 2 or 3 seconds but there were several instances of 4 and 5 second delays.

This issue DOES NOT repro on 512 mem
------------------------------------------------------------------------------------------
Following testing done with 319 mem:
This issue DOES reproduce on the raw base image v180, but DOES NOT repro with 2.0 Full Flashed on top

Actual Results - Opening all the apps listed in the STR and calling the DUT resulted in the callscreen appearing in 4-5 seconds on average on Base only.

Device: Flame 2.0 (Base Only) Repro
Build ID: 20140904160718
Gaia: 506da297098326c671523707caae6eaba7e718da
Gecko: 2b27becae85092d46bfadcd4fb5605e82e1e1093
Version: 32.0 (2.0)
Firmware Version: V180
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Device: Flame 2.0 - No Repro
Build ID: 20141012000202
Gaia: 6effca669c5baaf6cd7a63c91b71a02c6bd953b3
Gecko: 54ec9cb26b59
Version: 32.0 (2.0)
Firmware Version: V180
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
status-b2g-v2.0:
--- → unaffected
Flags: needinfo?(dharris)
Comment 3•10 years ago
|
||
Regression window unavailable - This issue occurs in the oldest 2.1 build we have access to

Device: Flame 2.1
Build ID: 20140904062538
Gaia: a47ecb6368c015dd72148acde26413fd90ba3136
Gecko: ffb144a500a4
Version: 34.0a2
Firmware Version: V180
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

and issue does not reproduce in JB

Device: Flame Master
Build ID: 20141003070740
Gaia: a8a6eed2ba9d66239aac789b9ee4900f911c73cb
Gecko: 388e101e75c8
Version: 35.0a1 (Master)
Firmware Version: V123
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0
Flags: needinfo?(pbylenga)
Keywords: regressionwindow-wanted
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(pbylenga)
Comment 4•10 years ago
|
||
Regression in call UI appearance = blocker. This should probably be moved out of the Performance component to get some attention. Gregor, is Systems FE the right component?
blocking-b2g: 2.1? → 2.1+
Flags: needinfo?(anygregor)
Comment 5•10 years ago
|
||
That sounds like window-management to me.
Component: Performance → Gaia::System::Window Mgmt
Flags: needinfo?(anygregor)
Comment 6•10 years ago
|
||
Can we do a regression window check on the 2.0 between raw base image v180 and full flash?
Keywords: regressionwindow-wanted
Comment 7•10 years ago
|
||
This sounds to me like we are dropping the callscreen on memory pressure, but in some Gaia versions (see comment 2) the kill/reload is failing. Etienne, any thoughts?
Updated•10 years ago
|
Flags: needinfo?(etienne)
Comment 8•10 years ago
|
||
(In reply to Alive Kuo [:alive][NEEDINFO!] from comment #7)
> This sounds to me that we are drop the callscreen on memory pressure,
> but in some gaia version (see comment 2) the kill/reload is either failing.
>
> Etienne, any thought?

I don't think it's failing since the callscreen eventually comes up. Looks like this is just bug 999478 working. After a memorypressure we don't reload the callscreen until the next call. And I don't think we get another event once the memory pressure is over, so we don't have a good heuristic to trigger a new preload of the callscreen.
Flags: needinfo?(etienne)
Comment 9•10 years ago
|
||
(In reply to howie [:howie] from comment #6)
> Can we do a regression window check on the 2.0 between raw base image v180
> and full flash?

If I'm understanding your question correctly, the answer is no. You (or WE) cannot get a regression window between a base image and a branch of builds; regression windows have to be found within a branch itself, AFAIK. If there IS a way to do this, we lack documentation / proper pushlog links to use / etc.
Keywords: regressionwindow-wanted
Comment 10•10 years ago
|
||
Hi Alive, so we still need your help to dig in more on this.
Flags: needinfo?(alive)
Comment 11•10 years ago
|
||
Very strange...it seems we are NEVER killing apps while there are 10 more apps running in 319MB 2.1 build. So we are having more than 10 apps running at the same time and I think it's the root cause of this bug. Cervantes, any idea?
Flags: needinfo?(alive) → needinfo?(cyu)
Comment 12•10 years ago
|
||
I wrote this up in bug 1080239 that apps are not being killed and it was resolved as invalid.
Comment 13•10 years ago
|
||
This looks more like a device- or configuration-dependent issue. For now, the LMK and zram configuration is not adjusted at boot according to the device's configuration. HW vendors need to tune LMK/zram for their own HW/SW configuration to get better performance. I don't think it is limited to 319MB; my dogfood Flame with 1G also hits the same problem if I open enough processes. In particular, we currently never close web pages. If you follow links in the FB app, over time you will accumulate a bunch of processes for visited links, and then you will slow down too. Cervantes, please help revise our LMK and OOM settings.
Assignee: nobody → cyu
Comment 14•10 years ago
|
||
(In reply to KTucker [:KTucker] from comment #12)
> I wrote this up in bug 1080239 that apps are not being killed and it was
> resolved as invalid.

I am sorry if I misunderstood something. We have a feature that keeps an OOM-killed app in the background without actually keeping the application alive. From the bug comments we couldn't tell whether you knew about this feature, so we assumed that was what you were seeing. What we want to see is the |adb shell b2g-info| output to confirm that all apps are alive.
Assignee | ||
Comment 15•10 years ago
|
||
I reproduced this problem on the 2014-10-11-00-12-01 build. It requires several rounds of the steps in #c0 to see this problem. b2g-info shows that we are using LMK parameters for a low-memory device. I checked dmesg, and no process is killed after this STR. That is, the kernel works very hard to keep everything alive, which is not what we expect. We can:
1. Increase the LMK parameters, but I am doubtful about it. Even if we double the parameters, free+cache is still larger than the minfree for background apps.
2. Lower swappiness to make the kernel less likely to swap memory.
I guess we need to do both and will need more experiments to verify.
Flags: needinfo?(cyu)
Assignee | ||
Comment 16•10 years ago
|
||
I cross-checked the minfree parameters with Intex CloudFX by
cat /sys/module/lowmemorykiller/parameters/minfree
Flame: 1024,1280,1536,1792,2048,2560
Intex: 1024,1280,1536,1792,2048,4608
So we set the parameters on the 319 MB Flame stricter than on Intex!! That must be totally wrong.
Assignee | ||
Comment 17•10 years ago
|
||
Also /sys/module/lowmemorykiller/parameters/notify_trigger:
Flame: 2304
Intex: 3584
Flame is still stricter than Intex.
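A note on units may help when comparing these sysfs values with the prefs discussed elsewhere in this bug: the values under /sys/module/lowmemorykiller/parameters are page counts, while the Gecko prefs are expressed in KB. The following is a minimal sketch of the conversion, assuming 4 KiB pages (the page size is an assumption, it is not stated in this bug). The helper name `pages_to_kb` is made up for illustration.

```python
# Sketch: convert lowmemorykiller sysfs values (page counts) to KB.
# Assumption: 4 KiB pages; this is not stated anywhere in the bug.
PAGE_SIZE_KB = 4


def pages_to_kb(csv_pages):
    """Convert a comma-separated list of page counts to a list of KB values."""
    return [int(p) * PAGE_SIZE_KB for p in csv_pages.split(",")]


# Values quoted in comment 16 and comment 17.
flame_minfree = pages_to_kb("1024,1280,1536,1792,2048,2560")
intex_minfree = pages_to_kb("1024,1280,1536,1792,2048,4608")
flame_notify = pages_to_kb("2304")[0]
intex_notify = pages_to_kb("3584")[0]

print("Flame minfree (KB):", flame_minfree)
print("Intex minfree (KB):", intex_minfree)
print("Flame notify_trigger (KB):", flame_notify)
print("Intex notify_trigger (KB):", intex_notify)
```

Under that assumption, the last (background) minfree level is 10240 KB on Flame versus 18432 KB on Intex, and notify_trigger is 9216 KB versus 14336 KB.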
Assignee | ||
Comment 18•10 years ago
|
||
Loop Paul and Walter in. I think this bug is relevant to the problem that the device becomes janky after MTBF tests.
Assignee | ||
Comment 19•10 years ago
|
||
minfree for background apps has always been 20480 KB (20 MiB):
http://mxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#740
http://mxr.mozilla.org/mozilla-b2g32_v2_0/source/b2g/app/b2g.js#718
http://mxr.mozilla.org/mozilla-beta/source/b2g/app/b2g.js#737
http://mxr.mozilla.org/mozilla-aurora/source/b2g/app/b2g.js#740
but /system/b2g/defaults/pref/lmk.js changed notify_trigger and minfree for background apps. We need to check why the base image contains such parameters.
Assignee | ||
Comment 20•10 years ago
|
||
ni Wesly Huang for the issue in base image lmk settings. Wesly, we'd like your help to check why /system/b2g/defaults/pref/lmk.js has such restricted values. Thanks.
Flags: needinfo?(wehuang)
Comment 21•10 years ago
|
||
:cyu, would you move this bug to appropriate component? Thanks.
Flags: needinfo?(cyu)
Assignee | ||
Updated•10 years ago
|
Component: Gaia::System::Window Mgmt → GonkIntegration
Flags: needinfo?(cyu)
Comment 22•10 years ago
|
||
Hi Youlong: Please see the discussion above, then comment#20. We are checking the MTBF issue and now realize the low memory killer setting in your image is quite strict (it kills processes too late, only when free RAM is already very low). We would like to know the reason behind it, and it may need to change as well. Thank you.
Flags: needinfo?(wehuang) → needinfo?(youlong.jiang)
Comment 23•10 years ago
|
||
btw seems not in earlier SW before v180, is it possible that, this change is made when you upgrade to QCT CS?
Comment 24•10 years ago
|
||
Any update?
Comment 25•10 years ago
|
||
(In reply to Wesly Huang from comment #22)
> Hi Youlong:
>
> Pls see discussion above then comment#20.
>
> We are checking MTBF issue and now realize the low mem. killer setting in
> your image is quite strict (too late to kill process while free ram is very
> low), would like to know the reason behind and maybe need to change as well.
> Thank you.

Hi Wesly,

Per your previous summary, you suspect this issue is caused by overly strict LMK parameters, and you checked /system/b2g/defaults/pref/lmk.js to see the current status. We haven't modified these settings, so could you help analyze and provide the configuration interface and recommended values? We'll cooperate and release a test base image to you. Thanks.
Flags: needinfo?(youlong.jiang)
Comment 26•10 years ago
|
||
Hi Youlong: Are you saying the value are all default ones from QCT? pls help list the values used in v123, v180, and v188 for our reference. Hi Cervantes: Would you help provide some suggestion about the value? could we reference some other products and select a proper one? Thank you.
Flags: needinfo?(youlong.jiang)
Flags: needinfo?(cyu)
Assignee | ||
Comment 27•10 years ago
|
||
(In reply to Wesly Huang from comment #26)
> Hi Cervantes: Would you help provide some suggestion about the value? could
> we reference some other products and select a proper one? Thank you.

I think the default values in b2g.js should work on most devices:
pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 20480);
pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 14336);
We may just remove lmk.js to make the default values work and see if the issue remains. If the problem still remains, we might need other tweaks such as increasing the values or changing swappiness.
Flags: needinfo?(cyu)
Comment 28•10 years ago
|
||
(In reply to Wesly Huang from comment #26)
> Hi Youlong: Are you saying the value are all default ones from QCT? pls help
> list the values used in v123, v180, and v188 for our reference.
>
> Hi Cervantes: Would you help provide some suggestion about the value? could
> we reference some other products and select a proper one? Thank you.

We've checked v123, v180 and v188; they are all the same:
pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 10240);
pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 9216);
Please take it as a reference. Thanks.
Flags: needinfo?(youlong.jiang)
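To make the difference between the b2g.js defaults (comment 27) and the base image's lmk.js (comment 28) explicit, here is a small sketch that parses both sets of `pref(...)` lines and lists the prefs the override lowers. The pref names and values are copied from the two comments; the helper names `parse_prefs` and `lowered_prefs` are hypothetical, and the regular expression only handles simple integer prefs like the ones shown here.

```python
import re

# Matches integer prefs of the form: pref("name", 12345);
PREF_RE = re.compile(r'pref\("([^"]+)",\s*(\d+)\);')


def parse_prefs(text):
    """Return {pref_name: int_value} for every integer pref() line in text."""
    return {name: int(value) for name, value in PREF_RE.findall(text)}


def lowered_prefs(defaults, overrides):
    """List (name, default, override) for prefs that the override file lowers."""
    return [
        (name, defaults[name], value)
        for name, value in sorted(overrides.items())
        if name in defaults and value < defaults[name]
    ]


# Defaults quoted from b2g.js in comment 27.
B2G_JS_DEFAULTS = parse_prefs('''
pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 20480);
pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 14336);
''')

# Values reported for the base image's lmk.js in comment 28.
BASE_IMAGE_LMK_JS = parse_prefs('''
pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 10240);
pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 9216);
''')

for name, default, override in lowered_prefs(B2G_JS_DEFAULTS, BASE_IMAGE_LMK_JS):
    print(f"{name}: default {default} KB -> base image {override} KB")
```

Both prefs come out lower in the base image than in b2g.js, which is the discrepancy this bug is about.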
Comment 29•10 years ago
|
||
Per comment 18, this is kind of affecting the result of MTBF testing. Setting block to MTBF-B2G meta bug
Blocks: MTBF-B2G
Comment 30•10 years ago
|
||
Hi Cervantes, do we even need different parameters for 319MB? If so, can you help come up with the parameters and communicate with people who might know how to better evaluate them on a 319MB configuration?
Updated•10 years ago
|
Flags: needinfo?(cyu)
Assignee | ||
Comment 31•10 years ago
|
||
As I said in comment #27. We don't need the default values in lmk.js. They are too restrictive and are the root cause of this bug.
Flags: needinfo?(cyu)
Assignee | ||
Comment 32•10 years ago
|
||
I rm /system/b2g/defaults/pref/lmk.js and the problem is not reproduced on my flame. My flame is vanilla v188 image. So I suggest just removing lmk.js and using the default values in gecko.
Comment 33•10 years ago
|
||
Sorry for catch up late. ni T2M @Youlong: as just discussed in phone, pls help arrange a local build (userdebug) with comment#27's suggestion then release to me for more verification here. Thank you.
Flags: needinfo?(youlong.jiang)
Updated•10 years ago
|
Summary: [Performance][Dialer] Call UI can take 5 seconds to appear when multiple other apps are open. → [Performance][Dialer] lmk.js is wrong in Flame base image
Comment 34•10 years ago
|
||
(In reply to Wesly Huang from comment #33)
> Sorry for catch up late. ni T2M
>
> @Youlong: as just discussed in phone, pls help arrange a local build
> (userdebug) with comment#27's suggestion then release to me for more
> verification here. Thank you.

Hi Wesly,

We found that lmk.js is generated per build, so could you please point us to where the correct lmk.js values should be set for your test? Thanks.
Flags: needinfo?(youlong.jiang)
Comment 35•10 years ago
|
||
(In reply to youlong.jiang from comment #34)
> (In reply to Wesly Huang from comment #33)
> > Sorry for catch up late. ni T2M
> >
> > @Youlong: as just discussed in phone, pls help arrange a local build
> > (userdebug) with comment#27's suggestion then release to me for more
> > verification here. Thank you.
>
> hi wesly -
>
> we found lmk.js is generated per build. so, could you pls help to provide
> point that correct lmk.js value for your test.
>
> tks.

Viral, any idea?
Flags: needinfo?(vwang)
Comment 36•10 years ago
|
||
Per comment 32, please just do as Cervantes suggest.
Flags: needinfo?(vwang)
Comment 38•10 years ago
|
||
Actually I do think Cervantes already provide the answer in comment 27. We should keep it as default value.
Flags: needinfo?(vwang)
Comment 39•10 years ago
|
||
Hi Youlong, I assume your question is "how to" change the value as suggested in comment#27, right? If Cervantes and Viral have no suggestion here, I recommend going to QCT for the answer. @Cervantes, Viral: do you know that?
Flags: needinfo?(youlong.jiang)
Flags: needinfo?(vwang)
Flags: needinfo?(cyu)
Comment 40•10 years ago
|
||
Hi Wesly,

It looks like the code comes from Qualcomm:

in "device/qcom/msm8610/msm8610.mk"
out/target/product/$(TARGET_PRODUCT)/system/gecko: gaia/profile/defaults/pref/lmk.js
.PHONY: gaia/profile/defaults/pref/lmk.js
gaia/profile/defaults/pref/lmk.js: gaia/profile.tar.gz
	echo 'pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 10240);' > $@
	echo 'pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 9216);' >> $@

It will overwrite our default setting.
I think we should ask Qualcomm the reason why they modified the low memory killer parameters.
Maybe they can remove it, and then we can use the default values as we expect.
Flags: needinfo?(wehuang)
Flags: needinfo?(vwang)
Flags: needinfo?(cyu)
Comment 41•10 years ago
|
||
Thanks for Viral's help! @Youlong, pls help check with QCT for the reason, see if it's ok to change, and how to change. Thank you.
Flags: needinfo?(wehuang)
Comment 42•10 years ago
|
||
Hi Michael, not sure if you can help with the question in comment 40. It looks like you overwrite the low memory killer parameters in device/qcom/msm8610/msm8610.mk. We are suffering some OOM issues in this case. Is it possible for you to remove the override so we can use the default LMK settings?
Flags: needinfo?(mvines)
(In reply to viral [:viralwang] from comment #40)
> Hi Wesly,
>
> It looks like the code comes from qualcomm:
>
> in "device/qcom/msm8610/msm8610.mk"
> out/target/product/$(TARGET_PRODUCT)/system/gecko:
> gaia/profile/defaults/pref/lmk.js
> .PHONY: gaia/profile/defaults/pref/lmk.js
> gaia/profile/defaults/pref/lmk.js: gaia/profile.tar.gz
> echo 'pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB",
> 10240);' > $@
> echo 'pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB",
> 9216);' >> $@
>
> It will overwrite our default setting.
> I think we should ask qualcomm the season why they modify the low memory
> killer parameters.
> Maybe they can remove it and then we can use default value as our expect.

We changed this to keep background apps from being killed by LMK too frequently. Reducing this parameter ensures that we can have more background apps, and we will be using zram more on a 256MB device to do this.

Instead of changing this value, I would suggest first understanding what is taking the most CPU in the use case from comment 0 ("Incoming Call UI takes up to 5 seconds to appear and allow the user to accept or deny the call."). Running |adb shell top -t -m 10| should tell us the CPU usage during this operation.
Flags: needinfo?(mvines)
Updated•10 years ago
|
Flags: needinfo?(vwang)
NAME              PID   PPID  CPU(s)  NICE   USS   PSS   RSS  SWAP  VSIZE  OOM_ADJ  USER
b2g               206      1   637.9     0  52.1  53.4  60.3  19.6  251.5        0  root
(Nuwa)            400    206     5.7     0   0.0   0.2   1.8   6.8   53.7      -16  root
OperatorVariant   935    400     7.7    18   0.3   0.7   5.1   9.1   61.8       10  u0_a935
Homescreen       1185    400    90.8     1   5.2   6.1  12.1  12.3   83.4        2  u0_a1185
Browser          3435    400    23.8    18   1.5   2.3   8.4  13.4   71.2       10  u0_a3435
Smart Collectio  9856    400     5.8    18   2.8   3.4   8.7   8.7   64.1       10  u0_a9856
Messages        10881    206     3.9    18   4.4   5.0   9.9  11.1   75.3       10  u0_a10881
Communications  10890    400     4.8    18   6.3   7.2  13.4  10.8   71.4       10  u0_a10890
Settings        10962    400     5.3    18   4.7   5.6  11.6  10.5   68.0       10  u0_a10962
Marketplace     11011    400     9.0    18   7.7   8.5  14.3  11.2   76.6       10  u0_a11011
(Preallocated a 11309    400     0.8    18   4.0   4.7   9.8   4.8   60.8        1  u0_a11309

System memory info:
          Total  215.3 MB
      SwapTotal  192.0 MB
   Used - cache  187.8 MB
B2G procs (PSS)   97.2 MB
  Non-B2G procs   90.6 MB
   Free + cache   27.5 MB
           Free    6.6 MB
          Cache   20.9 MB
       SwapFree   80.9 MB

Low-memory killer parameters:
notify_trigger 9216 KB

oom_adj  min_free
      0   4096 KB
     58   5120 KB
    117   6144 KB
    352   7168 KB
    470   8192 KB
    588  10240 KB

SwapFree 80.9 MB: this tells us that almost 192 - 80.9 = 111 MB of data has been pushed to the zram device. We have at least 7 background apps running when this issue happens. But we should also understand the CPU usage before making LMK more aggressive about killing background apps.
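The numbers in this b2g-info output can be checked against the low-memory killer thresholds with a short sketch. It uses a simplified model of how the Android lowmemorykiller driver picks a kill level: a level applies only when both free memory and file cache are below that level's min_free. This is an assumption about the driver; the real accounting subtracts reserved and shared-memory pages and differs between kernels. The function names are hypothetical.

```python
# (oom_adj, min_free_kb) pairs copied from the b2g-info output above.
LMK_LEVELS = [(0, 4096), (58, 5120), (117, 6144),
              (352, 7168), (470, 8192), (588, 10240)]


def mb_to_kb(mb):
    """Convert the MB figures printed by b2g-info to whole KB."""
    return round(mb * 1024)


def lmk_min_adj(free_kb, cache_kb, levels):
    """Simplified model of the LMK level selection (an assumption, see above).

    Returns the oom_adj of the first level whose min_free exceeds BOTH free
    memory and cache, or None when no level applies and nothing is killed.
    """
    for oom_adj, min_free_kb in levels:
        if free_kb < min_free_kb and cache_kb < min_free_kb:
            return oom_adj
    return None


free_kb = mb_to_kb(6.6)     # "Free 6.6 MB"
cache_kb = mb_to_kb(20.9)   # "Cache 20.9 MB"
swap_used_mb = round(192.0 - 80.9, 1)  # SwapTotal - SwapFree

print("swap in use (MB):", swap_used_mb)
print("LMK level reached:", lmk_min_adj(free_kb, cache_kb, LMK_LEVELS))
```

In this model no level is reached, because the cache (about 21 MB) is still above the largest threshold (10240 KB) even though free memory is low. That would be consistent with the observation in comment 15 that dmesg shows no processes being killed.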
Assignee | ||
Comment 45•10 years ago
|
||
Running top on the device shows that the system has high system CPU usage (>50%) and relatively low user space CPU usage (~25%) during the 5 sec period from ringtone playing to call screen showing up. It's very likely that the system tries really hard swapping memory into/out of zram. No sign of the low memory killer taking action. Actually, from dmesg, I also see the OOM killer taking action, like:

<3>[30963.239308] Out of memory: Kill process 205 (b2g) score 575 or sacrifice child
<3>[30963.245539] Killed process 325 ((Nuwa)) total-vm:74100kB, anon-rss:108kB, file-rss:16kB
<4>[32064.980410] b2g-ps invoked oom-killer: gfp_mask=0xd0, order=2, oom_adj=0, oom_score_adj=0
<6>[32064.987646] [<c010bd74>] (unwind_backtrace+0x0/0xf8) from [<c087d9c8>] (dump_header.isra.10+0x74/0x180)

So I strongly suggest increasing the values as previously suggested.
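For anyone repeating this check, kernel OOM killer activity can be pulled out of a saved dmesg log with a few lines of Python. This is only a sketch: the regular expression is written against the "Killed process" line format quoted in this comment, and other kernel versions may print it differently. The function name `oom_killer_victims` is made up for illustration.

```python
import re

# Written against the "Killed process ..." line quoted in comment 45.
KILLED_RE = re.compile(r"Killed process (\d+) \((.+)\) total-vm:(\d+)kB")


def oom_killer_victims(dmesg_text):
    """Return (pid, name, total_vm_kb) for each process the OOM killer killed."""
    victims = []
    for line in dmesg_text.splitlines():
        match = KILLED_RE.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return victims


# The two relevant lines from the dmesg excerpt in this comment.
SAMPLE = """\
<3>[30963.239308] Out of memory: Kill process 205 (b2g) score 575 or sacrifice child
<3>[30963.245539] Killed process 325 ((Nuwa)) total-vm:74100kB, anon-rss:108kB, file-rss:16kB
"""

for pid, name, total_vm_kb in oom_killer_victims(SAMPLE):
    print(f"OOM killer killed pid {pid} {name}, total-vm {total_vm_kb} kB")
```

Note that in the sample the kernel selected b2g (pid 205) but actually killed its child (Nuwa) (pid 325), which is the "or sacrifice child" case.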
(In reply to Cervantes Yu from comment #45)
> Running top on the device shows that the system has high system CPU usage
> (>50%) and relatively low user space CPU usage (~25%) during the 5 sec
> period from ringtone playing to call screen showing up. It's very likely
> that the system tries really hard swapping memory into/out of zram. No sign
> of lowmemory killer taking action.
>

Could you please also post the full output of |adb shell top -t -m 10|? I am curious to see what those processes are :) .
Flags: needinfo?(vwang)
Flags: needinfo?(cyu)
Assignee | ||
Comment 48•10 years ago
|
||
Flags: needinfo?(vwang)
Flags: needinfo?(cyu)
Assignee | ||
Comment 49•10 years ago
|
||
More experiments: if we don't ask the background process to GC, then the performance is much better on incoming call, even with the current LMK settings and many apps open. The call screen shows up about 1 sec after the ringtone plays. My wild guess is that zram and GC'ing the background process just don't work well with each other. With the foreground and background apps running concurrently, zram could repeatedly and alternately swap pages in and out for these 2 processes.

This is not to say that we don't need to change the LMK settings. With the current LMK settings, I could even run out of swap space and let the OOM killer kick in to kill a random process. This is really bad for end users.

Gabriele, what's your comment on not GC'ing when sending the process to background?
Flags: needinfo?(gsvelto)
Comment 50•10 years ago
|
||
(In reply to Cervantes Yu from comment #49)
> Gabriele, what's your comment on not GC when sending the process to
> background?

We already encountered the problem in the past (bug 963477) and disabled GC'ing the background application in the v1.3t branch only. The real solution would be to provide a fix for bug 1082290, which I'm working on, but that won't be ready for some time and I see that this bug is 2.1+, so we need a quick fix here. Also, I'm not sure if the feature bug 1082290 will use is present in all kernels we support. My guess is that it's not, so it wouldn't be enough.

What I would suggest is that we fix bug 975360 instead. The idea is that we would have a pref that establishes whether applications sent to the background are GC'd or not. zram-based devices will then turn this pref off to prevent needless swapping from zram. Bug 963477 delayed the GC, but I don't think that's a good idea in general because it's going to cause a slow-down anyway (just later) and it might have a negative impact on battery life due to the significant swapping.
Flags: needinfo?(gsvelto)
(In reply to Cervantes Yu from comment #48)
> Created attachment 8523640 [details]
> CPU usage when the bug is reproduced

I am seeing the line below in the CPU usage log, which suggests the b2g main process is doing some activity:

14172 0 21% S 69 243864K 46960K root /system/b2g/b2g

For comment 50,

My vote will be to fix gc issues instead of changing LMK. It seems like we are moving in the right direction already :)
Assignee | ||
Comment 52•10 years ago
|
||
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #51)
> (In reply to Cervantes Yu from comment #48)
> > Created attachment 8523640 [details]
> > CPU usage when the bug is reproduced
>
> I am seeing below line in cpu usage log which suggests b2g main process is
> doing some activity :
>
> 14172 0 21% S 69 243864K 46960K root /system/b2g/b2g
>
> For comment 50,
>
> My vote will be to fix gc issues instead of changing LMK.. It seems like we
> are moving in right direction already :)

No, lmk.js also needs to be changed. Otherwise, if the OOM killer kicks in, it will kill a random process. The worst case is the b2g process, which is a system crash.
Updated•10 years ago
|
Flags: needinfo?(bbajaj)
Comment 53•10 years ago
|
||
Hi Gabriele and Tapas, what's your view on comment#52, i.e. fixing not only GC but also LMK? Per comment#27 & comment#28, Flame indeed has smaller values than the working settings in the current b2g.js.
Flags: needinfo?(tkundu)
Flags: needinfo?(gsvelto)
Comment 54•10 years ago
|
||
(In reply to Wesly Huang from comment #53)
> what's your view about comment#52, that fix not only GC but also lmk? Since
> per comment#27 & comment#28 indeed Flame has smaller value then a working
> setting in current b2g.js?

I agree. The values present in the base image don't make any sense. They will prevent the order we set up to kill applications from working correctly, since the KillUnderKB value for background applications is terribly close to all the others. There's also not enough room around the notifyLowMemUnderKB threshold, possibly making low-memory notifications useless.

Note that the default parameters were designed with a 256MiB device in mind. On a device like the Flame with more memory, those could be raised a bit to allow for more wiggle room when lots of apps are open, but definitely not lowered.
Flags: needinfo?(gsvelto)
(In reply to Cervantes Yu from comment #52)
> (In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from
> comment #51)
> > (In reply to Cervantes Yu from comment #48)
> > > Created attachment 8523640 [details]
> > > CPU usage when the bug is reproduced
> >
> > I am seeing below line in cpu usage log which suggests b2g main process is
> > doing some activity :
> >
> > 14172 0 21% S 69 243864K 46960K root /system/b2g/b2g
> >
> > For comment 50,
> >
> > My vote will be to fix gc issues instead of changing LMK.. It seems like we
> > are moving in right direction already :)
>
> No, lmk.js also needs to be changed. Otherwise, if the OOM killer kicks in,
> it will kill a random process. The worst case is the b2g process, which is a
> system crash.

>> Otherwise, if the OOM killer kicks in, it will kill a random process.

Not really. LMK will kill the b2g process only if the system is still under memory pressure even after it kills NUWA, homescreen, the preallocated app and all other FFOS apps. We never saw b2g getting killed randomly if there is no memleak on the system.

IMO, we should modify lmk.js only after solving the gc issues. If we still see the problem then we can go ahead and change the lmk settings.
Flags: needinfo?(tkundu) → needinfo?(wehuang)
Comment 56•10 years ago
|
||
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #55)
> Not really. LMK will kill b2g process only if system is still under memory
> pressure even after it kills NUWA, homescreen, preallocate app and all other
> FFOS apps. We never saw b2g getting killed randomly if there is no memleak
> on system.

Yes, the main process is in a class of its own and *all* other apps will be killed before it. The only scenario in which the main process can be killed is if it has exhausted all memory on its own and all other apps have already been killed.

> IMO, we should modify lmk.js only after solving gc issues. If we still see
> problem then we can go ahead and change lmk settings.

The LMK changes are also needed. The settings you're seeing here are breaking our OOM policy. This page contains more details on the process and describes why the values of those parameters were set that way in b2g.js:

https://developer.mozilla.org/en-US/Firefox_OS/Platform/Out_of_memory_management_on_Firefox_OS
(In reply to Gabriele Svelto [:gsvelto] from comment #56)
> The LMK changes are also needed. The settings you're seeing here are
> breaking our OOM policy. This page contains more details on the process and
> describes why the values of those parameters were set that way in b2g.js
>
> https://developer.mozilla.org/en-US/Firefox_OS/Platform/
> Out_of_memory_management_on_Firefox_OS

OK, please change the lmk settings if you feel it is needed :) Thanks for informing us.
Updated•10 years ago
|
Whiteboard: [2.1-exploratory-3] → [2.1-exploratory-3][mtbf]
Comment 58•10 years ago
|
||
According to comment 40, comment 56, and comment 57, we need to change lmk settings. We need T2M's help to change that. If T2M doesn't know how to do it, they will need to talk with the vendor who provide code to T2M. Thanks.
Component: GonkIntegration → Vendcom
Assignee | ||
Comment 59•10 years ago
|
||
(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #55)
> (In reply to Cervantes Yu from comment #52)
> > (In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from
> > comment #51)
> > > (In reply to Cervantes Yu from comment #48)
> > > > Created attachment 8523640 [details]
> > > > CPU usage when the bug is reproduced
> > >
> > > I am seeing below line in cpu usage log which suggests b2g main process is
> > > doing some activity :
> > >
> > > 14172 0 21% S 69 243864K 46960K root /system/b2g/b2g
> > >
> > > For comment 50,
> > >
> > > My vote will be to fix gc issues instead of changing LMK.. It seems like we
> > > are moving in right direction already :)
> >
> > No, lmk.js also needs to be changed. Otherwise, if the OOM killer kicks in,
> > it will kill a random process. The worst case is the b2g process, which is a
> > system crash.
>
> >> Otherwise, if the OOM killer kicks in, it will kill a random process.
>
> Not really. LMK will kill b2g process only if system is still under memory
> pressure even after it kills NUWA, homescreen, preallocate app and all other
> FFOS apps. We never saw b2g getting killed randomly if there is no memleak
> on system.
>

There are 2 different killers: the low memory killer (LMK) and the OOM killer. The settings in lmk.js affect the low memory killer, but I am talking about the OOM killer. The OOM killer kills a process based on its OOM score. The score is computed using various hints, one of which is how much memory the process consumes. On a running Flame (319M), b2g's oom_score is even higher than the preallocated process's:

b2g 0 root 203 1 214608 58876 ffffffff b6ef4894 S /system/b2g/b2g
(Preallocated a 2 u0_a2069 2069 389 79988 17408 ffffffff b6ef4894 S /system/b2g/b2g
root@flame:/ # cat /proc/203/oom_score
150
root@flame:/ # cat /proc/2069/oom_score
113

That is, when the "OOM killer" kicks in, b2g will be killed before the preallocated process, even though we set the "LMK" to work the other way.

> IMO, we should modify lmk.js only after solving gc issues. If we still see
> problem then we can go ahead and change lmk settings.

For the above reason, we should modify lmk.js whether we solve the gc issue or not. Otherwise we might end up getting the b2g process killed.
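The distinction drawn here can be illustrated with a tiny sketch: the kernel OOM killer prefers the process with the highest oom_score, regardless of the priority order configured for the LMK. The helper names `read_oom_score` and `likely_oom_victim` are hypothetical, and the selection is simplified; as the dmesg excerpt in comment 45 shows ("or sacrifice child"), the kernel may kill a child of the selected process instead.

```python
import os


def read_oom_score(pid, proc_root="/proc"):
    """Read the kernel's current oom_score for a pid (Linux only)."""
    with open(os.path.join(proc_root, str(pid), "oom_score")) as f:
        return int(f.read().strip())


def likely_oom_victim(oom_scores):
    """Return the process name with the highest oom_score.

    Simplified: the real OOM killer may sacrifice a child of the selected
    process rather than the process itself.
    """
    return max(oom_scores, key=oom_scores.get)


# Scores quoted in comment 59 (pid 203 is b2g, pid 2069 the preallocated app).
scores = {"b2g": 150, "(Preallocated a": 113}
print("OOM killer would prefer:", likely_oom_victim(scores))
```

On a device, `read_oom_score(203)` would return the same number as the `cat /proc/203/oom_score` command shown above.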
Comment 60•10 years ago
|
||
Hi Youlong, pls see comment#58 and contact QC for help to change lmk setting. @Tapas: if you know how to do this maybe you can kindly guide T2M here? Thanks.
Flags: needinfo?(wehuang) → needinfo?(tkundu)
Comment 61•10 years ago
|
||
It's just a matter of removing that file. We already have sane defaults for those settings in master gecko.
(In reply to Gabriele Svelto [:gsvelto] from comment #61)
> It's just a matter of removing that file. We already have sane defaults for
> those settings in master gecko.

Yes, agreed. Please let me know if that works for you/T2M :)
Flags: needinfo?(tkundu)
Updated•10 years ago
|
Flags: needinfo?(wehuang)
Updated•10 years ago
|
Flags: needinfo?(bbajaj)
Comment 63•10 years ago
|
||
Thanks Tapas, Gabriele. @Youlong: pls follow the suggestion above and let us know if any question. Thanks.
Flags: needinfo?(wehuang)
Comment 64•10 years ago
|
||
(In reply to Wesly Huang from comment #63)
> Thanks Tapas, Gabriele.
>
> @Youlong: pls follow the suggestion above and let us know if any question.
> Thanks.

hi wesly - we've moved lmk.js from system and would release version with patch taken. tks.
Flags: needinfo?(youlong.jiang)
Comment 65•10 years ago
|
||
Verified with the latest base image v18D: lmk.js is already removed.
Updated•10 years ago
|
Whiteboard: [2.1-exploratory-3][mtbf] → [2.1-exploratory-3][mtbf][POVB]
Comment 66•10 years ago
|
||
Hi Tapas: Now the change is applied in T2M's SW release to us, however for our own full build it still links to code in CAF, do you think you can make same change there?
Flags: needinfo?(tkundu)
Comment 67•10 years ago
|
||
Please just fork the CAF project if you'd like to customize its contents for your Flame build.
Flags: needinfo?(tkundu)
Comment 68•9 years ago
|
||
Hi Wesly, any further action needed for this issue?
Flags: needinfo?(wehuang)
Comment 69•9 years ago
|
||
Any updates? Also, we can fork the CAF project, but the settings on T2M phones will still be incorrect.
Comment 70•9 years ago
|
||
(In reply to Steven Yang [:styang] from comment #68)
> Hi Wesly, any further action needed for this issue?

In comment#65 it's verified that v18D has done the removal suggested in comment#61 & #62, so I see no further action needed for the T2M/Flame base image (we also have no further Flame base image release planned after v18D).

My understanding is that the remaining question is whether we would like to make the corresponding change in Mozilla's Flame build. Comment#66 & #67 cover this topic, so it depends on whether we would like to fork it in our code.
Flags: needinfo?(wehuang)
Comment 72•9 years ago
|
||
I believe that the setting is still wrong in new build/images.
Flags: needinfo?(wachen)
Comment 73•9 years ago
|
||
MTBF Triage: remove from MTBF monitor.
No longer blocks: MTBF-B2G
Whiteboard: [2.1-exploratory-3][mtbf][POVB] → [2.1-exploratory-3][POVB]
Updated•9 years ago
|
blocking-b2g: 2.1+ → 2.5+
Updated•9 years ago
|
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][qa-tracking]
Comment 74•9 years ago
|
||
Per comment 32 and comment 65, lmk.js had been removed in v18D image. This should already be resolved, and not be 2.5 blocker. Mark verifyme to double check.
-----
Build ID 20150804150207
Gaia Revision c5425d9f1f5184731a59ed4bc99295acbde30390
Gaia Date 2015-08-04 16:09:19
Gecko Revision https://hg.mozilla.org/mozilla-central/rev/f3b757156f69
Gecko Version 42.0a1
Device Name flame
Firmware(Release) 4.4.2
Firmware(Incremental) eng.cltbld.20150712.193621
Firmware Date Sun Jul 12 19:36:34 EDT 2015
Bootloader L1TC000118D0
status-b2g-master:
--- → unaffected
Keywords: verifyme
Comment 76•9 years ago
|
||
This issue is still reproducing on Flame 2.2 and 2.1. Following STR, device takes more than 5 seconds to show call UI. Reproduction frequency is 3 out of 3 on each branch.

Device: Flame 2.2 (full flashed 319MB KK)
BuildID: 20150828032506
Gaia: 335cd8e79c20f8d8e93a6efc9b97cc0ec17b5a46
Gecko: 16d864d163de
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 37.0 (2.2)
Firmware Version: v18Dv4
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

Device: Flame 2.1 (full flashed 319MB KK)
BuildID: 20150724001207 (note: we stopped getting newer builds on this branch)
Gaia: 9dba58d18006e921546cec62c76074ce81e16518
Gecko: 41e10c6740be
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 34.0 (2.1)
Firmware Version: v18Dv4
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

-------

This issue does NOT occur on Flame 2.5/master. Following STR, call UI is displayed within 2 seconds. I think the reason why this doesn't repro on master is because of bug 1172167 where apps are getting aggressively killed in the background. Without that bug, master is likely still affected.

Device: Flame 2.5 (full flashed 319MB KK)
BuildID: 20150828030207
Gaia: b69c16798ddd7154207f56d983721a327522f5d1
Gecko: 87e23922be375985d0b1906ed5ba5f095f323a38
Gonk: c4779d6da0f85894b1f78f0351b43f2949e8decd
Version: 43.0a1 (2.5 Master)
Firmware Version: v18Dv4
User Agent: Mozilla/5.0 (Mobile; rv:43.0) Gecko/43.0 Firefox/43.0
Comment 77•9 years ago
|
||
Bobby please see comment 76
QA Whiteboard: [QAnalyst-Triage?][qa-tracking][failed-verification] → [QAnalyst-Triage+][qa-tracking][failed-verification]
Flags: needinfo?(jmercado) → needinfo?(bchien)
Comment 78•9 years ago
|
||
Per comment 76, could you share b2g-info log per your STR? So that we could work with Cervantes for further troubleshooting.
Flags: needinfo?(pcheng)
Flags: needinfo?(cyu)
Flags: needinfo?(bchien)
Comment 79•9 years ago
|
||
Attaching b2g-info after bug reproduced on Flame 2.2 319MB memory.
Flags: needinfo?(pcheng)
Assignee | ||
Comment 80•9 years ago
|
||
oom_adj min_free
0 4096 KB
58 5120 KB
117 6144 KB
352 7168 KB
470 8192 KB
588 10240 KB
^^^^^^^^^^^^

This explains why the problem remains. But it's verified that /system/b2g/defaults/pref/lmk.js is removed, isn't it? We need to find out where the value 10240 KB comes from.
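As a rough aid for reading the table in this comment, the sketch below models how a low memory killer threshold table is usually interpreted: when free memory drops below a row's min_free, processes at or above that row's oom_adj become eligible to be killed. This is a simplified assumption (the real killer also considers other inputs such as file cache), and `min_killable_adj` is a hypothetical helper, not a Gecko or kernel API. The table values are copied from the b2g-info output above.

```python
from typing import Optional

# (oom_adj, min_free in KB) rows from the b2g-info output in this comment.
LMK_TABLE = [
    (0, 4096),
    (58, 5120),
    (117, 6144),
    (352, 7168),
    (470, 8192),
    (588, 10240),
]


def min_killable_adj(free_kb: int, table=LMK_TABLE) -> Optional[int]:
    """Return the lowest oom_adj eligible for killing at this free-memory level.

    Simplified model: scan rows from the smallest min_free upward and stop at
    the first threshold that free memory has fallen below. Returns None when
    free memory is above every threshold (nothing is killed).
    """
    for adj, min_free_kb in sorted(table, key=lambda row: row[1]):
        if free_kb < min_free_kb:
            return adj
    return None


print(min_killable_adj(9000))   # below only the 10240 KB row
print(min_killable_adj(4000))   # below every row
print(min_killable_adj(12000))  # above every row
```

Under this model, processes with oom_adj 588 or higher are not touched until free memory falls below 10240 KB, which is the row flagged in the comment.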
Flags: needinfo?(cyu)
Comment 82•9 years ago
|
||
Per the results in comment 74 and comment 76, this issue has been fixed in v2.5/v18D, so I removed the v2.5 blocker flag. However, as noted in comment 74, lmk.js is already removed from the v18D image, so there is no default lmk.js configuration on builds based on v18D. Presumably there is another configuration in v2.2 that causes the results in the Description and comment 80. Mahe and Josh, should we continue to investigate and fix the issue in v2.1 and v2.2?
blocking-b2g: 2.5+ → ---
Flags: needinfo?(mpotharaju)
Flags: needinfo?(jcheng)
Comment 83•9 years ago
|
||
Hi Bobby, I would prefer consider this for 2.2r as it is the one we have device shipping. However I am not sure the fix from bug 1172167 also apply to 2.2r as it is different UI?
Flags: needinfo?(jcheng) → needinfo?(martijn.martijn)
Comment 84•9 years ago
|
||
(In reply to Josh Cheng [:josh] from comment #83)
> Hi Bobby,
> I would prefer consider this for 2.2r as it is the one we have device
> shipping.
> However I am not sure the fix from bug 1172167 also apply to 2.2r as it is
> different UI?

Josh, I'm not sure what you're asking. I only disabled a test in bug 1172167, because that test is failing because of aggressive lmk (although it could also be fixed in a different way).
Flags: needinfo?(martijn.martijn)
Comment 85•9 years ago
|
||
Martijn, Based on comment 76, 1172167 was referred as the reason this issue is not being reported on Master. Can we get a confirmation if this is the bug that is making issue not reproducible on Master?
Comment 86•9 years ago
|
||
I think Josh meant to mention a different bug in comment 83, right Josh?
Flags: needinfo?(jocheng)
Comment 87•9 years ago
|
||
(In reply to Martijn Wargers [:mwargers] (QA) from comment #86)
> I think Josh meant to mention a different bug in comment 83, right Josh?

The bug I mentioned is base on comment 76 from Pei-Wei: "doesn't repro on master is because of bug 1172167 where apps are getting aggressively killed in the background."
Did Pei-Wei mention wrong bug?
Flags: needinfo?(jocheng)
Comment 88•9 years ago
|
||
(In reply to Josh Cheng [:josh] from comment #87)
> The bug I mentioned is base on comment 76 from Pei-Wei: "doesn't repro on
> master is because of bug 1172167 where apps are getting aggressively killed
> in the background."
> Did Pei-Wei mention wrong bug?

I guess not, but in comment 83, you mentioned:

(In reply to Josh Cheng [:josh] from comment #83)
> However I am not sure the fix from bug 1172167 also apply to 2.2r as it is
> different UI?

I don't see any fix in bug 1172167. There is only a pull request there that disabled one of the Gaia UI tests.

(In reply to Mahendra Potharaju [:mahe] from comment #85)
> Martijn,
>
> Based on comment 76, 1172167 was referred as the reason this issue is not
> being reported on Master. Can we get a confirmation if this is the bug that
> is making issue not reproducible on Master?

There is no clear idea on what caused bug 1172167. Perhaps it was caused by the pull request from bug 1094759. That is certainly something that's not easily backported at all.
I can certainly understand that by eagerly killing apps, this bug doesn't appear anymore.
Comment 89•9 years ago
|
||
Bobby, yes, we need to continue investigating this. This issue is a blocker if it surfaces on 2.5 or 2.2. We are limiting patches on 2.2 as Qualcomm has completed their testing. Unless it is identified and fixed, we cannot confirm that it won't resurface on Master.
Flags: needinfo?(mpotharaju) → needinfo?(bchien)
Keywords: regressionwindow-wanted
Comment 90•9 years ago
|
||
See comment 3 for why we can't find a regression window for this bug. This is a vendor issue.
Flags: needinfo?(jmercado)
Keywords: regressionwindow-wanted
Comment 91•9 years ago
|
||
Sorry, I missed that. Thanks for pointing that out, Pi Wei. But we would still need to continue investigating this on 2.2.
Updated•9 years ago
|
Flags: needinfo?(jmercado)
Comment 93•9 years ago
|
||
Marking as RESOLVED FIXED for v2.5 and later. Leaving 2.2 as wontfix.
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox44:
--- → unaffected
Flags: needinfo?(bchien)
Resolution: --- → FIXED