Bugzilla

Comment 7

•

10 years ago

It sounds to me like we are dropping the callscreen on memory pressure,
but in some Gaia versions (see comment 2) the kill/reload is failing.

Etienne, any thought?

Etienne Segonzac (:etienne)

Updated

•

10 years ago

Flags: needinfo?(etienne)

Comment 8

•

10 years ago

(In reply to Alive Kuo [:alive][NEEDINFO!] from comment #7)
> This sounds to me that we are drop the callscreen on memory pressure,
> but in some gaia version (see comment 2) the kill/reload is either failing.
> 
> Etienne, any thought?

I don't think it's failing since the callscreen eventually comes up.
Looks like this is just bug 999478 working.

After a memory-pressure event we don't reload the callscreen until the next call.
And I don't think we get another event once the memory pressure is over, so we don't have a good heuristic to trigger a new preload of the callscreen.

Flags: needinfo?(etienne)

Joshua Mitchell (Inactive)

Comment 9

•

10 years ago

(In reply to howie [:howie] from comment #6)
> Can we do a regression window check on the 2.0 between raw base image v180
> and full flash?

If I'm understanding your question correctly, the answer is no: you (or we) cannot get a regression window between a base image and a branch of builds. Regression windows have to be found within a branch itself, AFAIK.

If there IS a way to do this, we lack documentation / proper pushlog links to use / etc.

Keywords: regressionwindow-wanted

howie [:howie]

Comment 10

•

10 years ago

Hi Alive, so we still need your help to dig in more on this.

Flags: needinfo?(alive)

KTucker [:KTucker][Inactive 3/4/2016]

Comment 11

•

10 years ago

Very strange... it seems we are NEVER killing apps, even while more than 10 apps are running on the 319MB 2.1 build.
So we have more than 10 apps running at the same time, and I think that is the root cause of this bug.

Cervantes, any idea?

Flags: needinfo?(alive) → needinfo?(cyu)

Comment 12

•

10 years ago

I wrote this up in bug 1080239 that apps are not being killed and it was resolved as invalid.

Thinker Li [:sinker]

Comment 13

•

10 years ago

This looks more like a device- or configuration-dependent issue. For now, the LMK and zram configuration is not adjusted at boot according to the device's configuration. HW vendors need to tune LMK/zram according to their own HW/SW configuration to get better performance. I don't think it is limited to 319MB; my dogfood Flame with 1G also hits the same problem if I open enough processes. In particular, we currently never close web pages. If you follow links in the FB app, over time you will accumulate a bunch of processes for the visited links, and then you will slow down too.

Cervantes, please help to revise our LMK and OOM settings.

Assignee: nobody → cyu

Comment 14

•

10 years ago

(In reply to KTucker [:KTucker] from comment #12)
> I wrote this up in bug 1080239 that apps are not being killed and it was
> resolved as invalid.

I am sorry if I misunderstood something. We have a feature that keeps an OOM-killed app listed in the background without actually keeping the application alive. From the bug comments we couldn't tell whether you knew about this feature, so we assumed that was what you were seeing. What we want to see is the |adb shell b2g-info| result, to confirm that all apps are really alive.

Assignee

Comment 15

•

10 years ago

Attached file b2g-info output after the problem is reproduced. — Details

I reproduced this problem on 2014-10-11-00-12-01 build. It requires several rounds of the steps in #c0 to see this problem. b2g-info shows that we are using LMK parameters on a low-memory device. I checked dmesg, and no process is killed after this STR. That is, the kernel works very hard to keep everything alive, but this is not expected by us.

We can
1. increase the LMK parameters, but I am skeptical about it. Even if we double the parameters, free+cache is still larger than the minfree for background apps.
2. lower swappiness to make the kernel less likely to swap memory.

I guess we need to do both and will need more experiments to verify.

Flags: needinfo?(cyu)

Assignee

Comment 16

•

10 years ago

I cross-checked the minfree parameters with Intex CloudFX by cat /sys/module/lowmemorykiller/parameters/minfree

Flame: 1024,1280,1536,1792,2048,2560
Intex: 1024,1280,1536,1792,2048,4608

So we set the parameters on flame 319 MB stricter than on Intex!! That must be totally wrong.

Assignee

Comment 17

•

10 years ago

Also /sys/module/lowmemorykiller/parameters/notify_trigger:

Flame: 2304
Intex: 3584

Flame is still stricter than Intex.
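For reference, a minimal sketch of how the raw sysfs numbers above map to KB. It assumes the lowmemorykiller values are page counts and that the page size is 4 KB; neither is stated in this bug, and the helper name is made up for illustration.

```python
# Assumption: minfree and notify_trigger are expressed in pages,
# and one page is 4 KB (not stated in this bug).
PAGE_KB = 4


def pages_to_kb(csv_pages):
    """Convert a comma-separated list of page counts to KB."""
    return [int(p) * PAGE_KB for p in csv_pages.split(",")]


# Values copied from comments 16 and 17.
minfree = {
    "Flame": "1024,1280,1536,1792,2048,2560",
    "Intex": "1024,1280,1536,1792,2048,4608",
}
notify_trigger = {"Flame": "2304", "Intex": "3584"}

for device in minfree:
    print(device,
          "minfree KB:", pages_to_kb(minfree[device]),
          "notify_trigger KB:", pages_to_kb(notify_trigger[device])[0])
```

Under that assumption the Flame's top minfree level is 10240 KB with notify_trigger at 9216 KB, while Intex gets 18432 KB and 14336 KB.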

Assignee

Comment 18

•

10 years ago

Loop Paul and Walter in. I think this bug is relevant to the problem that the device becomes janky after MTBF tests.

Assignee

Comment 19

•

10 years ago

minfree for background apps has always been 20480 KB:

http://mxr.mozilla.org/mozilla-central/source/b2g/app/b2g.js#740
http://mxr.mozilla.org/mozilla-b2g32_v2_0/source/b2g/app/b2g.js#718
http://mxr.mozilla.org/mozilla-beta/source/b2g/app/b2g.js#737
http://mxr.mozilla.org/mozilla-aurora/source/b2g/app/b2g.js#740

but /system/b2g/defaults/pref/lmk.js changed notify_trigger and minfree for background apps. We need to check why the base image contains such parameters.

Tim Guan-tin Chien [:timdream] (please needinfo)

Assignee

Comment 20

•

10 years ago

ni Wesly Huang for the issue in base image lmk settings. Wesly, we'd like your help to check why /system/b2g/defaults/pref/lmk.js has such restricted values. Thanks.

Flags: needinfo?(wehuang)

Comment 21

•

10 years ago

:cyu, would you move this bug to appropriate component? Thanks.

Flags: needinfo?(cyu)

Assignee

Updated

•

10 years ago

Component: Gaia::System::Window Mgmt → GonkIntegration

Flags: needinfo?(cyu)

Comment 22

•

10 years ago

Hi Youlong:

Pls see discussion above then comment#20.

We are checking the MTBF issue and now realize the low memory killer setting in your image is quite strict (it kills processes too late, when free RAM is already very low). We would like to know the reason behind it, and it may need to change as well. Thank you.

Flags: needinfo?(wehuang) → needinfo?(youlong.jiang)

Comment 23

•

10 years ago

BTW, this setting does not seem to be present in SW earlier than v180. Is it possible that this change was made when you upgraded to QCT CS?

Comment 24

•

10 years ago

Any update?

Comment 25

•

10 years ago

(In reply to Wesly Huang from comment #22)
> Hi Youlong:
> 
> Pls see discussion above then comment#20.
> 
> We are checking MTBF issue and now realize the low mem. killer setting in
> your image is quite strict (too late to kill process while free ram is very
> low), would like to know the reason behind and maybe need to change as well.
> Thank you.

hi wesly -

Per your previous summary, you suspect this issue may be caused by overly strict LMK parameters, and you checked /system/b2g/defaults/pref/lmk.js to confirm the current state. We haven't modified this, so could you help analyze it and provide the configuration interface and recommended values? We'll cooperate and release a test base image to you.

tks.

Flags: needinfo?(youlong.jiang)

Comment 26

•

10 years ago

Hi Youlong: Are you saying the value are all default ones from QCT? pls help list the values used in v123, v180, and v188 for our reference.

Hi Cervantes: Would you help provide some suggestion about the value? could we reference some other products and select a proper one? Thank you.

Flags: needinfo?(youlong.jiang)

Flags: needinfo?(cyu)

Assignee

Comment 27

•

10 years ago

(In reply to Wesly Huang from comment #26)
> Hi Cervantes: Would you help provide some suggestion about the value? could
> we reference some other products and select a proper one? Thank you.

I think the default values in b2g.js should work on most devices:

pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 20480);
pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 14336);

We may just remove lmk.js to make the default values take effect and see if the issue remains. If the problem still remains, we might need other tweaks such as increasing the values or changing swappiness.

Flags: needinfo?(cyu)

Comment 28

•

10 years ago

(In reply to Wesly Huang from comment #26)
> Hi Youlong: Are you saying the value are all default ones from QCT? pls help
> list the values used in v123, v180, and v188 for our reference.
> 
> Hi Cervantes: Would you help provide some suggestion about the value? could
> we reference some other products and select a proper one? Thank you.

We've checked v123, v180 and v188; they are all the same.

pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 10240);
pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 9216);

Please take these as reference.

tks.

Flags: needinfo?(youlong.jiang)

Comment 29

•

10 years ago

Per comment 18, this is kind of affecting the result of MTBF testing. Setting block to MTBF-B2G meta bug

Blocks: MTBF-B2G

Comment 30

•

10 years ago

Hi, Cervantes,

Do we even need different parameters for 319MB? If so, can you help come up with the parameters and talk to people who might know how to better examine the behavior when we set the device to 319MB?

Updated

•

10 years ago

Flags: needinfo?(cyu)

Assignee

Comment 31

•

10 years ago

As I said in comment #27, we don't need the values in lmk.js. They are too restrictive and are the root cause of this bug.

Flags: needinfo?(cyu)

Assignee

Comment 32

•

10 years ago

I removed /system/b2g/defaults/pref/lmk.js and the problem no longer reproduces on my Flame, which runs a vanilla v188 image. So I suggest just removing lmk.js and using the default values in Gecko.

Updated

•

10 years ago

Blocks: 1092924

Comment 33

•

10 years ago

Sorry for catching up late. ni T2M

@Youlong: as just discussed on the phone, pls help arrange a local build (userdebug) with comment#27's suggestion, then release it to me for more verification here. Thank you.

Flags: needinfo?(youlong.jiang)

Updated

•

10 years ago

Summary: [Performance][Dialer] Call UI can take 5 seconds to appear when multiple other apps are open. → [Performance][Dialer] lmk.js is wrong in Flame base image

Comment 34

•

10 years ago

(In reply to Wesly Huang from comment #33)
> Sorry for catch up late. ni T2M
> 
> @Youlong: as just discussed in phone, pls help arrange a local build
> (userdebug) with comment#27's suggestion then release to me for more
> verification here. Thank you.

hi wesly -

We found that lmk.js is generated per build, so could you please point us to where the correct lmk.js values should be set for your test?

tks.

Flags: needinfo?(youlong.jiang)

Comment 35

•

10 years ago

(In reply to youlong.jiang from comment #34)
> (In reply to Wesly Huang from comment #33)
> > Sorry for catch up late. ni T2M
> > 
> > @Youlong: as just discussed in phone, pls help arrange a local build
> > (userdebug) with comment#27's suggestion then release to me for more
> > verification here. Thank you.
> 
> hi wesly -
> 
> we found lmk.js is generated per build. so, could you pls help to provide
> point that correct lmk.js value for your test.
> 
> tks.

Viral, any idea?

Flags: needinfo?(vwang)

Paul Yang [:pyang] (away)

Comment 36

•

10 years ago

Per comment 32, please just do as Cervantes suggests.

Flags: needinfo?(vwang)

Comment 37

•

10 years ago

re-ni? stakeholder

Flags: needinfo?(vwang)

Comment 38

•

10 years ago

Actually, I do think Cervantes already provided the answer in comment 27.
We should keep it as default value.

Flags: needinfo?(vwang)

Comment 39

•

10 years ago

Hi Youlong, I assume your question is "how to" change the value as suggested in comment#27, right? If Cervantes and Viral have no suggestion here, I recommend going to QCT for the answer.


@Cervantes, Viral: do you know that?

Flags: needinfo?(youlong.jiang)

Flags: needinfo?(vwang)

Flags: needinfo?(cyu)

Comment 40

•

10 years ago

Hi Wesly,

It looks like the code comes from qualcomm:

in "device/qcom/msm8610/msm8610.mk"
out/target/product/$(TARGET_PRODUCT)/system/gecko: gaia/profile/defaults/pref/lmk.js
.PHONY: gaia/profile/defaults/pref/lmk.js
gaia/profile/defaults/pref/lmk.js: gaia/profile.tar.gz
        echo 'pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 10240);' > $@
        echo 'pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 9216);' >> $@

It will overwrite our default setting.
I think we should ask Qualcomm the reason why they modified the low memory killer parameters.
Maybe they can remove it and then we can use the default values as we expect.
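To make the effect of that override concrete, here is a small sketch that parses the two pref lines written by the makefile rule and compares them with the b2g.js defaults quoted in comment 27. The regex and helper names are illustrative only; they are not part of Gecko or the build system.

```python
import re

# Matches lines such as: pref("some.name", 1234);
PREF_RE = re.compile(r'pref\("([^"]+)",\s*(\d+)\);')


def parse_prefs(text):
    """Return {pref name: integer value} for every pref() line in text."""
    return {name: int(value) for name, value in PREF_RE.findall(text)}


def lowered_prefs(defaults, overrides):
    """Return {name: (default, override)} for prefs the override lowers."""
    return {
        name: (defaults[name], value)
        for name, value in overrides.items()
        if name in defaults and value < defaults[name]
    }


# Defaults as quoted in comment 27.
GECKO_DEFAULTS = '''
pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 20480);
pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 14336);
'''

# Contents that the msm8610.mk rule echoes into lmk.js.
LMK_JS = '''
pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 10240);
pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 9216);
'''

defaults = parse_prefs(GECKO_DEFAULTS)
overrides = parse_prefs(LMK_JS)
for name, (default, override) in lowered_prefs(defaults, overrides).items():
    print(f"{name}: default {default} KB -> override {override} KB")
```

Both prefs are reported as lowered: the background kill threshold is halved and the low-memory notification threshold drops by about a third.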

Flags: needinfo?(wehuang)

Flags: needinfo?(vwang)

Flags: needinfo?(cyu)

Comment 41

•

10 years ago

Thanks for Viral's help!

@Youlong, pls help check with QCT for the reason, see if it's ok to change, and how to change. Thank you.

Flags: needinfo?(wehuang)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 42

•

10 years ago

Hi Michael,

Not sure if you can help on this question in comment 40.
It looks like you override the low memory killer parameters in device/qcom/msm8610/msm8610.mk.
We are suffering from some OOM issues in this case.
Is it possible for you to remove the override so that we can use the default LMK settings?

Flags: needinfo?(mvines)

Comment 43

•

10 years ago

(In reply to viral [:viralwang] from comment #40)
> Hi Wesly,
> 
> It looks like the code comes from qualcomm:
> 
> in "device/qcom/msm8610/msm8610.mk"
> out/target/product/$(TARGET_PRODUCT)/system/gecko:
> gaia/profile/defaults/pref/lmk.js
> .PHONY: gaia/profile/defaults/pref/lmk.js
> gaia/profile/defaults/pref/lmk.js: gaia/profile.tar.gz
>         echo 'pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB",
> 10240);' > $@
>         echo 'pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB",
> 9216);' >> $@
> 
> It will overwrite our default setting.
> I think we should ask qualcomm the season why they modify the low memory
> killer parameters.
> Maybe they can remove it and then we can use default value as our expect.


We changed this to keep background apps from being killed by the LMK too frequently. Lowering this parameter lets us keep more background apps alive, and we rely more on zram on 256MB devices to do so.

Instead of changing this value, I would suggest first understanding which process is using the most CPU in the use case from comment 0: "Incoming Call UI takes up to 5 seconds to appear and allow the user to accept or deny the call."

Running |adb shell top -t -m 10| should tell us cpu usage during this operation.

Flags: needinfo?(mvines)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Updated

•

10 years ago

Flags: needinfo?(vwang)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 44

•

10 years ago

           NAME   PID PPID CPU(s) NICE  USS  PSS  RSS SWAP VSIZE OOM_ADJ USER     
            b2g   206    1  637.9    0 52.1 53.4 60.3 19.6 251.5       0 root     
         (Nuwa)   400  206    5.7    0  0.0  0.2  1.8  6.8  53.7     -16 root     
OperatorVariant   935  400    7.7   18  0.3  0.7  5.1  9.1  61.8      10 u0_a935  
     Homescreen  1185  400   90.8    1  5.2  6.1 12.1 12.3  83.4       2 u0_a1185 
        Browser  3435  400   23.8   18  1.5  2.3  8.4 13.4  71.2      10 u0_a3435 
Smart Collectio  9856  400    5.8   18  2.8  3.4  8.7  8.7  64.1      10 u0_a9856 
       Messages 10881  206    3.9   18  4.4  5.0  9.9 11.1  75.3      10 u0_a10881
 Communications 10890  400    4.8   18  6.3  7.2 13.4 10.8  71.4      10 u0_a10890
       Settings 10962  400    5.3   18  4.7  5.6 11.6 10.5  68.0      10 u0_a10962
    Marketplace 11011  400    9.0   18  7.7  8.5 14.3 11.2  76.6      10 u0_a11011
(Preallocated a 11309  400    0.8   18  4.0  4.7  9.8  4.8  60.8       1 u0_a11309


System memory info:

            Total 215.3 MB
        SwapTotal 192.0 MB
     Used - cache 187.8 MB
  B2G procs (PSS)  97.2 MB
    Non-B2G procs  90.6 MB
     Free + cache  27.5 MB
             Free   6.6 MB
            Cache  20.9 MB
         SwapFree  80.9 MB
Low-memory killer parameters:

  notify_trigger 9216 KB

  oom_adj min_free
        0  4096 KB
       58  5120 KB
      117  6144 KB
      352  7168 KB
      470  8192 KB
      588 10240 KB

SwapFree is 80.9 MB, which tells us that about 192 - 80.9 = 111 MB of data has been pushed to the zram device.

We have at least 7 background apps running when this issue happens. But we should also understand the CPU usage before making the LMK more aggressive about killing background apps.
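The arithmetic above can be sketched as follows. This assumes, as comment 15 does, that the low memory killer effectively compares free + cache against min_free; the real kernel logic is more involved, and the function names are invented for this example.

```python
def swap_used_mb(swap_total_mb, swap_free_mb):
    """Amount of data currently held in swap (zram here), in MB."""
    return round(swap_total_mb - swap_free_mb, 1)


def lmk_would_trigger(free_plus_cache_kb, min_free_kb):
    """Simplified model: kill only when free + cache drops below min_free."""
    return free_plus_cache_kb < min_free_kb


# Numbers taken from the b2g-info output above.
free_plus_cache_kb = 27.5 * 1024   # "Free + cache 27.5 MB"
base_image_min_free_kb = 10240     # highest min_free level in the base image
gecko_default_min_free_kb = 20480  # b2g.js default from comment 27

print("swap used (MB):", swap_used_mb(192.0, 80.9))
print("LMK triggers with base image value:",
      lmk_would_trigger(free_plus_cache_kb, base_image_min_free_kb))
print("LMK triggers with b2g.js default:",
      lmk_would_trigger(free_plus_cache_kb, gecko_default_min_free_kb))
```

In this simplified model neither threshold is crossed at 27.5 MB of free + cache, which is consistent with the observation in comment 15 that doubling the parameters alone may not be enough.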

Assignee

Comment 45

•

10 years ago

Running top on the device shows that the system has high system CPU usage (>50%) and relatively low user space CPU usage (~25%) during the 5 sec period from ringtone playing to call screen showing up. It's very likely that the system tries really hard swapping memory into/out of zram. No sign of lowmemory killer taking action.

Actually from dmesg, I also see OOM killer taking action like:

<3>[30963.239308] Out of memory: Kill process 205 (b2g) score 575 or sacrifice child
<3>[30963.245539] Killed process 325 ((Nuwa)) total-vm:74100kB, anon-rss:108kB, file-rss:16kB
<4>[32064.980410] b2g-ps invoked oom-killer: gfp_mask=0xd0, order=2, oom_adj=0, oom_score_adj=0
<6>[32064.987646] [<c010bd74>] (unwind_backtrace+0x0/0xf8) from [<c087d9c8>] (dump_header.isra.10+0x74/0x180)

So I strongly suggest increasing the values as previously suggested.

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 46

•

10 years ago

Removing ni since Cervantes has already given feedback.

Flags: needinfo?(vwang)

Comment 47

•

10 years ago

(In reply to Cervantes Yu from comment #45)
> Running top on the device shows that the system has high system CPU usage
> (>50%) and relatively low user space CPU usage (~25%) during the 5 sec
> period from ringtone playing to call screen showing up. It's very likely
> that the system tries really hard swapping memory into/out of zram. No sign
> of lowmemory killer taking action.
> 


Could you please also post full output of |adb shell top -t -m 10| . I am curious to see what are those processes :) .

Flags: needinfo?(vwang)

Flags: needinfo?(cyu)

Assignee

Comment 48

•

10 years ago

Attached file CPU usage when the bug is reproduced — Details

Flags: needinfo?(vwang)

Flags: needinfo?(cyu)

Assignee

Comment 49

•

10 years ago

More experiment: If we don't ask the background process to GC then the performance is much better on incoming call, even with the current lmk setting and many apps open. Call screen shows up in about 1 sec after ringtone plays.

My wild guess is that zram and GC'ing the background process just don't work well with each other. With the foreground and background apps running concurrently, zram could end up repeatedly and alternately swapping pages in and out for these 2 processes.

This is not to say that we don't need to change the lmk settings. With the current lmk settings, I could even run out of swap space and let the OOM killer kick in to kill a random process. This is really bad for end users.

Gabriele, what's your comment on not GC when sending the process to background?

Flags: needinfo?(gsvelto)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 50

•

10 years ago

(In reply to Cervantes Yu from comment #49)
> Gabriele, what's your comment on not GC when sending the process to
> background?

We already encountered the problem in the past (bug 963477) and disabled GC'ing the background application in the v1.3t branch only. The real solution would be to provide a fix for bug 1082290 which I'm working on, but that won't be ready for some time and I see that this bug is 2.1+ so we need a quick fix here. Also I'm not sure if the feature bug 1082290 will use is present in all kernels we support. My guess is that it's not so it wouldn't be enough.

What I would suggest is that we fix bug 975360 instead. The idea is that we would have a pref that establishes if applications sent in the background are GC'd or not. zram-based devices will then turn this pref off to prevent needlessly swapping from zram.

Bug 963477 delayed the GC but I don't think it's a good idea in general because that's going to cause a slow-down anyway (just later) and it might have a negative impact on battery life due to the significant swapping.

Flags: needinfo?(gsvelto)

Comment 51

•

10 years ago

(In reply to Cervantes Yu from comment #48)
> Created attachment 8523640 [details]
> CPU usage when the bug is reproduced

I am seeing the line below in the CPU usage log, which suggests the b2g main process is doing some activity:

14172  0  21% S    69 243864K  46960K     root     /system/b2g/b2g

For comment 50,

My vote would be to fix the GC issues instead of changing the LMK. It seems like we are already moving in the right direction :)

Assignee

Comment 52

•

10 years ago

(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #51)
> (In reply to Cervantes Yu from comment #48)
> > Created attachment 8523640 [details]
> > CPU usage when the bug is reproduced
> 
> I am seeing below line in cpu usage log which suggests b2g main process is
> doing some activity :
> 
> 14172  0  21% S    69 243864K  46960K     root     /system/b2g/b2g
> 
> For comment 50,
> 
> My vote will be to fix gc issues instead of changing LMK.. It seems like we
> are moving in right direction already :)

No, lmk.js also needs to be changed. Otherwise, if the OOM killer kicks in, it will kill a random process. The worst case is the b2g process, which is a system crash.

Updated

•

10 years ago

Flags: needinfo?(bbajaj)

Comment 53

•

10 years ago

Hi Gabriele and Tapas, 

What's your view on comment#52, i.e. fixing not only GC but also lmk? Per comment#27 & comment#28, the Flame indeed has smaller values than the working settings in the current b2g.js.

Flags: needinfo?(tkundu)

Flags: needinfo?(gsvelto)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 54

•

10 years ago

(In reply to Wesly Huang from comment #53)
> what's your view about comment#52, that fix not only GC but also lmk? Since
> per comment#27 & comment#28 indeed Flame has smaller value then a working
> setting in current b2g.js?

I agree. The values present in the base image don't make any sense. They will prevent the order we set up to kill applications from working correctly since the KillUnderKB value for background applications is terribly close to all others. There's also not enough room around the notifyLowMemUnderKB threshold possibly making low-memory notifications useless.

Note that the default parameters were designed with a 256MiB device in mind. On a device like the flame with more memory those could be raised a bit to allow for more wiggle room when lots of apps are open but definitely not lowered.

Flags: needinfo?(gsvelto)

Comment 55

•

10 years ago

(In reply to Cervantes Yu from comment #52)
> (In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from
> comment #51)
> > (In reply to Cervantes Yu from comment #48)
> > > Created attachment 8523640 [details]
> > > CPU usage when the bug is reproduced
> > 
> > I am seeing below line in cpu usage log which suggests b2g main process is
> > doing some activity :
> > 
> > 14172  0  21% S    69 243864K  46960K     root     /system/b2g/b2g
> > 
> > For comment 50,
> > 
> > My vote will be to fix gc issues instead of changing LMK.. It seems like we
> > are moving in right direction already :)
> 
> No, lmk.js also needs to be changed. Otherwise, if the OOM killer kicks in,
> it will kill a random process. The worst case is the b2g process, which is a
> system crash.

>> Otherwise, if the OOM killer kicks in, it will kill a random process.

Not really. LMK will kill b2g process only if system is still under memory pressure even after it kills NUWA, homescreen, preallocate app and all other FFOS apps. We never saw b2g getting killed randomly if there is no memleak on system. 


IMO, we should modify lmk.js only after solving gc issues. If we still see problem then we can go ahead and change lmk settings.

Flags: needinfo?(tkundu) → needinfo?(wehuang)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 56

•

10 years ago

(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #55)
> Not really. LMK will kill b2g process only if system is still under memory
> pressure even after it kills NUWA, homescreen, preallocate app and all other
> FFOS apps. We never saw b2g getting killed randomly if there is no memleak
> on system.

Yes, the main process is in a class of its own and *all* other apps will be killed before it. The only scenario in which the main process can be killed is if it has exhausted all memory on its own and all other apps have already been killed.

> IMO, we should modify lmk.js only after solving gc issues. If we still see
> problem then we can go ahead and change lmk settings.

The LMK changes are also needed. The settings you're seeing here are breaking our OOM policy. This page contains more details on the process and describes why the values of those parameters were set that way in b2g.js

https://developer.mozilla.org/en-US/Firefox_OS/Platform/Out_of_memory_management_on_Firefox_OS

Comment 57

•

10 years ago

(In reply to Gabriele Svelto [:gsvelto] from comment #56)
> The LMK changes are also needed. The settings you're seeing here are
> breaking our OOM policy. This page contains more details on the process and
> describes why the values of those parameters were set that way in b2g.js
> 
> https://developer.mozilla.org/en-US/Firefox_OS/Platform/
> Out_of_memory_management_on_Firefox_OS

ok please change lmk settings if you feel it is needed :) thanks for informing us.

Paul Yang [:pyang] (away)

Updated

•

10 years ago

Whiteboard: [2.1-exploratory-3] → [2.1-exploratory-3][mtbf]

Comment 58

•

10 years ago

According to comment 40, comment 56, and comment 57, we need to change the lmk settings. We need T2M's help to change that. If T2M doesn't know how to do it, they will need to talk with the vendor who provides the code to T2M. Thanks.

Component: GonkIntegration → Vendcom

Assignee

Comment 59

•

10 years ago

(In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from comment #55)
> (In reply to Cervantes Yu from comment #52)
> > (In reply to Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) from
> > comment #51)
> > > (In reply to Cervantes Yu from comment #48)
> > > > Created attachment 8523640 [details]
> > > > CPU usage when the bug is reproduced
> > > 
> > > I am seeing below line in cpu usage log which suggests b2g main process is
> > > doing some activity :
> > > 
> > > 14172  0  21% S    69 243864K  46960K     root     /system/b2g/b2g
> > > 
> > > For comment 50,
> > > 
> > > My vote will be to fix gc issues instead of changing LMK.. It seems like we
> > > are moving in right direction already :)
> > 
> > No, lmk.js also needs to be changed. Otherwise, if the OOM killer kicks in,
> > it will kill a random process. The worst case is the b2g process, which is a
> > system crash.
> 
> >> Otherwise, if the OOM killer kicks in, it will kill a random process.
> 
> Not really. LMK will kill b2g process only if system is still under memory
> pressure even after it kills NUWA, homescreen, preallocate app and all other
> FFOS apps. We never saw b2g getting killed randomly if there is no memleak
> on system. 
> 
There are 2 different killers: the low memory killer (LMK) and the OOM killer. The settings in lmk.js affect the low memory killer, but I am talking about the OOM killer. The OOM killer kills a process based on its OOM score. The score is computed using various hints, one of which is how much memory the process consumes.

On a running flame (319M), b2g's oom_score is even higher than the preallocated process:
b2g              0 root      203   1     214608 58876 ffffffff b6ef4894 S /system/b2g/b2g
(Preallocated a  2 u0_a2069  2069  389   79988  17408 ffffffff b6ef4894 S /system/b2g/b2g

root@flame:/ # cat /proc/203/oom_score                                         
150
root@flame:/ # cat /proc/2069/oom_score                                        
113

That is, when the "OOM killer" kicks in, b2g will be killed before the preallocated process, even though we set the "LMK" to work the other way.
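As a rough illustration of that ordering, the sketch below ranks processes by oom_score, highest first. It is a simplification that treats the highest score as the first victim; `read_oom_score` reads the same /proc file used above and only works on a Linux device, and both helper names are made up for this example.

```python
from pathlib import Path


def read_oom_score(pid):
    """Read the kernel's current OOM score for a process (Linux only)."""
    return int(Path(f"/proc/{pid}/oom_score").read_text().strip())


def oom_kill_order(scores):
    """Sort process names from most to least likely OOM-killer victim."""
    return sorted(scores, key=scores.get, reverse=True)


# Scores as measured above on the 319M Flame (pids 203 and 2069).
scores = {"b2g": 150, "(Preallocated a": 113}
print(oom_kill_order(scores))
```

With the measured scores, b2g sorts ahead of the preallocated process, which is the opposite of the priority order the LMK settings are meant to enforce.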

> IMO, we should modify lmk.js only after solving gc issues. If we still see
> problem then we can go ahead and change lmk settings.

For the above reason, we should modify lmk.js whether we solve the gc issue or not. Otherwise we might run into getting the b2g process killed.

Comment 60

•

10 years ago

Hi Youlong, pls see comment#58 and contact QC for help to change lmk setting.

@Tapas: if you know how to do this maybe you can kindly guide T2M here? Thanks.

Flags: needinfo?(wehuang) → needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Comment 61

•

10 years ago

It's just a matter of removing that file. We already have sane defaults for those settings in master gecko.

Comment 62

•

10 years ago

(In reply to Gabriele Svelto [:gsvelto] from comment #61)
> It's just a matter of removing that file. We already have sane defaults for
> those settings in master gecko.

Yes . Agreed. Please let me know if that works for you/T2M :) .

Flags: needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Updated

•

10 years ago

Flags: needinfo?(wehuang)

bhavana bajaj [:bajaj]

Updated

•

10 years ago

Flags: needinfo?(bbajaj)

Comment 63

•

10 years ago

Thanks Tapas, Gabriele.

@Youlong: pls follow the suggestion above and let us know if any question. Thanks.

Flags: needinfo?(wehuang)

Comment 64

•

10 years ago

(In reply to Wesly Huang from comment #63)
> Thanks Tapas, Gabriele.
> 
> @Youlong: pls follow the suggestion above and let us know if any question.
> Thanks.

hi wesly -

We've removed lmk.js from the system image and will release a version that includes the patch.

tks.

Flags: needinfo?(youlong.jiang)

Mike Lien[:mlien]

Comment 65

•

10 years ago

Verified with the latest base image v18D: lmk.js has already been removed.

Mike Lien[:mlien]

Updated

•

10 years ago

Whiteboard: [2.1-exploratory-3][mtbf] → [2.1-exploratory-3][mtbf][POVB]

Michael Vines [:m1] [:evilmachines]

Comment 66

•

10 years ago

Hi Tapas:

Now the change is applied in T2M's SW release to us, however for our own full build it still links to code in CAF, do you think you can make same change there?

Flags: needinfo?(tkundu)

Comment 67

•

10 years ago

Please just fork the CAF project if you'd like to customize its contents for your Flame build.

Flags: needinfo?(tkundu)

Steven Yang [:styang]

Comment 68

•

9 years ago

Hi Wesly, any further action needed for this issue?

Flags: needinfo?(wehuang)

Comment 69

•

9 years ago

Any updates?

Also, we can fork the CAF project, but the settings on T2M phones will still be incorrect.

Comment 70

•

9 years ago

(In reply to Steven Yang [:styang] from comment #68)
> Hi Wesly, any further action needed for this issue?

In comment#65 it's verified v18D has done the removal as suggested in comment#61 & #62, so I see no further action needed for T2M/Flame base image. (also we don't have further Flame base image release plan after v18D)

My understanding is that the remaining question is whether we would like to make the corresponding change in Mozilla's Flame build. Comment#66 & #67 cover this topic, so it depends on whether we would like to fork it in our code.

Flags: needinfo?(wehuang)

Comment 71

•

9 years ago

Does this bug still block 2.2 MTBF?

Flags: needinfo?(wachen)

Comment 72

•

9 years ago

I believe that the setting is still wrong in new build/images.

Flags: needinfo?(wachen)

Assignee

Updated

•

9 years ago

Blocks: 1180853

Peter Bylenga [:PBylenga]

Comment 73

•

9 years ago

MTBF Triage: remove from MTBF monitor.

No longer blocks: MTBF-B2G

Whiteboard: [2.1-exploratory-3][mtbf][POVB] → [2.1-exploratory-3][POVB]

Josh Cheng [:josh]

Updated

•

9 years ago

blocking-b2g: 2.1+ → 2.5+

Updated

•

9 years ago

QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][qa-tracking]

Comment 74

•

9 years ago

Per comment 32 and comment 65, lmk.js has been removed in the v18D image. This should already be resolved and should not be a 2.5 blocker.

Mark verifyme to double check.

-----
Build ID               20150804150207
Gaia Revision          c5425d9f1f5184731a59ed4bc99295acbde30390
Gaia Date              2015-08-04 16:09:19
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/f3b757156f69
Gecko Version          42.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150712.193621
Firmware Date          Sun Jul 12 19:36:34 EDT 2015
Bootloader             L1TC000118D0

status-b2g-master: --- → unaffected

Keywords: verifyme

Pi Wei Cheng [:piwei] (inactive)

Comment 75

•

9 years ago

Please help verify this bug. Thanks.

Keywords: qawanted

Comment 76

•

9 years ago

This issue is still reproducing on Flame 2.2 and 2.1. Following STR, device takes more than 5 seconds to show call UI. Reproduction frequency is 3 out of 3 on each branch.

Device: Flame 2.2 (full flashed 319MB KK)
BuildID: 20150828032506
Gaia: 335cd8e79c20f8d8e93a6efc9b97cc0ec17b5a46
Gecko: 16d864d163de
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 37.0 (2.2) 
Firmware Version: v18Dv4
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

Device: Flame 2.1 (full flashed 319MB KK)
BuildID: 20150724001207 (note: we stopped getting newer builds on this branch)
Gaia: 9dba58d18006e921546cec62c76074ce81e16518
Gecko: 41e10c6740be
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 34.0 (2.1) 
Firmware Version: v18Dv4
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

-------

This issue does NOT occur on Flame 2.5/master. Following STR, call UI is displayed within 2 seconds.

I think the reason why this doesn't repro on master is because of bug 1172167 where apps are getting aggressively killed in the background. Without that bug, master is likely still affected.

Device: Flame 2.5 (full flashed 319MB KK)
BuildID: 20150828030207
Gaia: b69c16798ddd7154207f56d983721a327522f5d1
Gecko: 87e23922be375985d0b1906ed5ba5f095f323a38
Gonk: c4779d6da0f85894b1f78f0351b43f2949e8decd
Version: 43.0a1 (2.5 Master) 
Firmware Version: v18Dv4
User Agent: Mozilla/5.0 (Mobile; rv:43.0) Gecko/43.0 Firefox/43.0

QA Whiteboard: [QAnalyst-Triage+][qa-tracking] → [QAnalyst-Triage?][qa-tracking][failed-verification]

Flags: needinfo?(jmercado)

Keywords: qawanted, verifyme

Jayme Mercado [:JMercado] (Inactive 3/4/2016)

Comment 77

•

9 years ago

Bobby, please see comment 76.

QA Whiteboard: [QAnalyst-Triage?][qa-tracking][failed-verification] → [QAnalyst-Triage+][qa-tracking][failed-verification]

Flags: needinfo?(jmercado) → needinfo?(bchien)

Pi Wei Cheng [:piwei] (inactive)

Comment 78

•

9 years ago

Per comment 76, could you share the b2g-info log from your STR, so that we can work with Cervantes on further troubleshooting?

Flags: needinfo?(pcheng)

Flags: needinfo?(cyu)

Flags: needinfo?(bchien)

Comment 79

•

9 years ago

Attached file bug1081577_b2g-info — Details

Attaching b2g-info output captured after the bug reproduced on Flame 2.2 with 319MB memory.

Flags: needinfo?(pcheng)

Assignee

Comment 80

•

9 years ago

  oom_adj min_free
        0  4096 KB
       58  5120 KB
      117  6144 KB
      352  7168 KB
      470  8192 KB
      588 10240 KB
      ^^^^^^^^^^^^
This explains why the problem remains. But it's verified that /system/b2g/defaults/pref/lmk.js is removed, isn't it? We need to find out where the value 10240 KB comes from.
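To make the table above easier to reason about, here is a minimal Python sketch that parses the pasted rows and reports, for a given amount of free memory, the lowest oom_adj level that becomes eligible for killing. It assumes the usual low-memory killer rule (a level is eligible once free memory drops below its min_free) and 4 KB pages; it ignores the file-cache check the kernel also applies. The helper names are made up for illustration.

```python
import re

# Table text as pasted in the comment above.
LMK_TABLE = """\
  oom_adj min_free
        0  4096 KB
       58  5120 KB
      117  6144 KB
      352  7168 KB
      470  8192 KB
      588 10240 KB
"""

PAGE_KB = 4  # assumption: 4 KB kernel pages


def parse_lmk_table(text):
    """Return a list of (oom_adj, min_free_kb) tuples from the pasted table."""
    rows = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(\d+)\s*KB\s*$", line)
        if match:  # the header line does not match and is skipped
            rows.append((int(match.group(1)), int(match.group(2))))
    return rows


def first_killable_adj(rows, free_kb):
    """Lowest oom_adj whose min_free threshold exceeds free_kb, or None."""
    hits = [adj for adj, min_free_kb in rows if free_kb < min_free_kb]
    return min(hits) if hits else None


rows = parse_lmk_table(LMK_TABLE)
for adj, kb in rows:
    print(f"oom_adj {adj:>4}: min_free {kb:>5} KB = {kb // PAGE_KB} pages")
```

Under these assumptions, processes at the highest level (588) only become candidates once free memory falls below 10240 KB, which matches the observation that background apps are rarely killed.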

Flags: needinfo?(cyu)

Comment 81

•

9 years ago

Marking as P2 for 2.5

Priority: -- → P2

Comment 82

•

9 years ago

Per the results in comment 74 and comment 76, this issue has been fixed in v2.5/v18D. Removing the v2.5 blocker.

However, as noted in comment 74, lmk.js has already been removed from the v18D image, so there is no default lmk.js configuration on builds based on v18D. I suppose there is another configuration in v2.2 that causes the result in the Description and comment 80.

Mahe and Josh, should we continue to investigate and fix this issue in v2.1 and v2.2?
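One way to look for the other configuration is to scan the pref files pulled from a 2.2 image for any integer pref set to 10240. The Python sketch below is only an illustration under that assumption: it matches lines in the standard `pref("name", value);` form, and the file paths and pref names in the sample are hypothetical placeholders, not names taken from the actual image.

```python
import re

# Matches lines such as: pref("some.name", 10240);
# Commented-out lines (starting with //) do not match.
PREF_RE = re.compile(r'^\s*(?:user_)?pref\(\s*"([^"]+)"\s*,\s*(-?\d+)\s*\)\s*;')


def find_prefs_with_value(files, wanted):
    """Return (path, pref_name) pairs whose integer value equals `wanted`.

    `files` maps a file path to that file's text content.
    """
    found = []
    for path, text in files.items():
        for line in text.splitlines():
            match = PREF_RE.match(line)
            if match and int(match.group(2)) == wanted:
                found.append((path, match.group(1)))
    return found


# Hypothetical sample content; real input would be pref files copied off the device.
sample = {
    "prefs/a.js": (
        'pref("example.lmk.background.killUnderKB", 10240);\n'
        'pref("example.other", 1);\n'
    ),
    "prefs/b.js": (
        '// pref("example.commented.out", 10240);\n'
        'pref("example.lmk.foreground.killUnderKB", 4096);\n'
    ),
}

matches = find_prefs_with_value(sample, 10240)
print(matches)
```

If no pref file carries the value, the threshold is presumably being set elsewhere (for example by a compiled-in default), which this scan would not catch.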

blocking-b2g: 2.5+ → ---

Flags: needinfo?(mpotharaju)

Flags: needinfo?(jcheng)

Josh Cheng [:josh]

Comment 83

•

9 years ago

Hi Bobby,
I would prefer consider this for 2.2r as it is the one we have device shipping. 
However I am not sure the fix from bug 1172167 also apply to 2.2r as it is different UI?

Flags: needinfo?(jcheng) → needinfo?(martijn.martijn)

Martijn Wargers (dead)

Comment 84

•

9 years ago

(In reply to Josh Cheng [:josh] from comment #83)
> Hi Bobby,
> I would prefer consider this for 2.2r as it is the one we have device
> shipping. 
> However I am not sure the fix from bug 1172167 also apply to 2.2r as it is
> different UI?

Josh, I'm not sure what you're asking. I only disabled a test in bug 1172167, since that test was failing due to aggressive LMK (although it could also be fixed in a different way).

Flags: needinfo?(martijn.martijn)

Comment 85

•

9 years ago

Martijn, 

Based on comment 76, 1172167 was referred as the reason this issue is not being reported on Master. Can we get a confirmation if this is the bug that is making issue not reproducible on Master?

Martijn Wargers (dead)

Comment 86

•

9 years ago

I think Josh meant to mention a different bug in comment 83, right Josh?

Flags: needinfo?(jocheng)

Josh Cheng [:josh]

Comment 87

•

9 years ago

(In reply to Martijn Wargers [:mwargers] (QA) from comment #86)
> I think Josh meant to mention a different bug in comment 83, right Josh?

The bug I mentioned is base on comment 76 from Pei-Wei: "doesn't repro on master is because of bug 1172167 where apps are getting aggressively killed in the background."
Did Pei-Wei mention wrong bug?

Flags: needinfo?(jocheng)

Martijn Wargers (dead)

Comment 88

•

9 years ago

(In reply to Josh Cheng [:josh] from comment #87)
> The bug I mentioned is base on comment 76 from Pei-Wei: "doesn't repro on
> master is because of bug 1172167 where apps are getting aggressively killed
> in the background."
> Did Pei-Wei mention wrong bug?

I guess not, but in comment 83, you mentioned:

(In reply to Josh Cheng [:josh] from comment #83)
> However I am not sure the fix from bug 1172167 also apply to 2.2r as it is
> different UI?

I don't see any fix in bug 1172167. There is only a pull request there that disabled one of the Gaia UI tests.

(In reply to Mahendra Potharaju [:mahe] from comment #85)
> Martijn, 
> 
> Based on comment 76, 1172167 was referred as the reason this issue is not
> being reported on Master. Can we get a confirmation if this is the bug that
> is making issue not reproducible on Master?

There is no clear idea on what caused bug 1172167. Perhaps it was caused by the pull request from bug 1094759. That is certainly something that's not easily backported at all.
I can certainly understand that by eagerly killing apps, this bug doesn't appear anymore.

Pi Wei Cheng [:piwei] (inactive)

Comment 89

•

9 years ago

Bobby, yes, we need to continue investigating this. This issue is a blocker if it surfaces on 2.5 or 2.2. We are limiting patches on 2.2 as Qualcomm has completed their testing. Unless the cause is identified and fixed, we cannot confirm it won't resurface on master.

Flags: needinfo?(mpotharaju) → needinfo?(bchien)

Keywords: regressionwindow-wanted

Comment 90

•

9 years ago

See comment 3 for why we can't find a regression window for this bug. This is a vendor issue.

Flags: needinfo?(jmercado)

Keywords: regressionwindow-wanted

Jayme Mercado [:JMercado] (Inactive 3/4/2016)

Comment 91

•

9 years ago

Sorry, I missed that. Thanks for pointing that out, Pi Wei. But we would still need to continue investigating this on 2.2.

Updated

•

9 years ago

Flags: needinfo?(jmercado)

Comment 92

•

9 years ago

Marking 2.2+ for tracking.

blocking-b2g: --- → 2.2+