Closed Bug 1428659 Opened 6 years ago Closed 6 years ago

Cloudtrail log retrieval broken

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: riman)

Details

Attachments

(1 file)

Found in papertrail, been there since at least Dec 31:

> Jan 07 14:35:03 aws-manager2.srv.releng.scl3.mozilla.com aws_get_cloudtrail_logs.py: Traceback (most recent call last):
> Jan 07 14:35:03 aws-manager2.srv.releng.scl3.mozilla.com aws_get_cloudtrail_logs.py:   File "aws_get_cloudtrail_logs.py", line 114, in <module>
> Jan 07 14:35:03 aws-manager2.srv.releng.scl3.mozilla.com aws_get_cloudtrail_logs.py:     main()
> Jan 07 14:35:03 aws-manager2.srv.releng.scl3.mozilla.com aws_get_cloudtrail_logs.py:   File "aws_get_cloudtrail_logs.py", line 94, in main
> Jan 07 14:35:03 aws-manager2.srv.releng.scl3.mozilla.com aws_get_cloudtrail_logs.py:     bucket = conn.get_bucket(args.s3_bucket)
> Jan 07 14:35:03 aws-manager2.srv.releng.scl3.mozilla.com aws_get_cloudtrail_logs.py:   File "/builds/aws_manager/lib/python2.7/site-packages/boto/s3/connection.py", line 471, in get_bucket
> Jan 07 14:35:03 aws-manager2.srv.releng.scl3.mozilla.com aws_get_cloudtrail_logs.py:     return self.head_bucket(bucket_name, headers=headers)
> Jan 07 14:35:03 aws-manager2.srv.releng.scl3.mozilla.com aws_get_cloudtrail_logs.py:   File "/builds/aws_manager/lib/python2.7/site-packages/boto/s3/connection.py", line 504, in head_bucket
> Jan 07 14:35:03 aws-manager2.srv.releng.scl3.mozilla.com aws_get_cloudtrail_logs.py:     raise err
> Jan 07 14:35:03 aws-manager2.srv.releng.scl3.mozilla.com aws_get_cloudtrail_logs.py: boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
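
If it helps for debugging, the same 403 should be reproducible by hand from aws-manager2 with something like the following one-liner (the virtualenv python path is an assumption based on the site-packages path in the traceback):

/builds/aws_manager/bin/python -c "import boto; boto.connect_s3().get_bucket('mozilla-releng-aws-logs')"

which suggests the credentials the script runs with no longer have access to that bucket (or the bucket / its policy changed).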

catlee, you've been looking at buckets recently. Do you have context on mozilla-releng-aws-logs, and whether pulling the logs down is still useful?
Flags: needinfo?(catlee)
I'm pretty sure this isn't needed any more. The sanity script used to use these to check if we had instances stopped for a long time, but it doesn't look like that is enabled any more.
Flags: needinfo?(catlee)
Bulk change of QA Contact to :jlund, per https://bugzilla.mozilla.org/show_bug.cgi?id=1428483
QA Contact: catlee → jlund
jlund, perhaps buildduty could work on this. There's a cron job on aws-manager2 that we can remove from puppet, plus assorted other cleanup, see
  https://dxr.mozilla.org/build-central/search?q=cloudtrail+path%3Apuppet&redirect=false

Plus manual removal of files/dirs on aws-manager[12].
Depending on how much time we want to put into this there are a few options (example commands for the manual cleanup steps are sketched after the list).

1, Just remove the broken cron job, preventing an error in papertrail
a, write a puppet patch that removes https://dxr.mozilla.org/build-central/source/puppet/modules/aws_manager/manifests/cron.pp#165-171
b, get review, land
c, remove aws-manager[12]:/etc/cron.d/aws_manager-aws_get_cloudtrail_logs.py.cron, since puppet won't clean up by itself

2, Remove all of the cloudtrail cron jobs
a, Write a puppet patch which removes the three crons which download, process, and tidy up cloudtrail logs, see https://dxr.mozilla.org/build-central/source/puppet/modules/aws_manager/manifests/cron.pp#165-185
b, Use https://dxr.mozilla.org/build-central/ to see if cloudtrail_s3_bucket, cloudtrail_s3_base_prefix, cloudtrail_logs_dir, and events_dir are used by anything else. If not, clean up those definitions too
c, get review, land
d, remove aws-manager[12]:/etc/cron.d/aws_manager-aws_get_cloudtrail_logs.py.cron, aws_manager-aws_process_cloudtrail_logs.py.cron, aws_manager-aws_clean_log_dir.py.cron, since puppet won't clean up by itself
e, leave the events dir alone on aws-manager[12], even though it's an empty husk, because removing it will break aws_sanity_checker.py

3, Do option 2, plus remove --events-dir support from aws_sanity_checker.py, update the cron definition, and remove the events_dir from disk on aws-manager[12].
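
For the manual cleanup steps (1c / 2d above), a sketch of the commands to run on each of aws-manager1 and aws-manager2 (assumes sudo; filenames taken from the cron definitions above):

sudo rm -f /etc/cron.d/aws_manager-aws_get_cloudtrail_logs.py.cron
sudo rm -f /etc/cron.d/aws_manager-aws_process_cloudtrail_logs.py.cron
sudo rm -f /etc/cron.d/aws_manager-aws_clean_log_dir.py.cron

For option 3 the events dir would also need removing; the exact path should be confirmed from the events_dir definition in puppet first.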
Assignee: nobody → riman
Attachment #8960420 - Flags: review?(mtabara)
Comment on attachment 8960420 [details] [diff] [review]
patch_1428659.txt

Review of attachment 8960420 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, I've been out on PTO and forgot to update my bugzilla handle.
Looks good to me!
Attachment #8960420 - Flags: review?(mtabara) → review+
I have tried to push the patch, but I wasn't able to finish the process because I don't have the required access to do it. 

:jlund can you please help me with this?
Flags: needinfo?(jlund)
(In reply to Radu Iman[:raduiman] from comment #7)
> I have tried to push the patch, but I wasn't able to finish the process
> because I don't have the required access to do it. 
> 
> :jlund can you please help me with this?

Hi Radu,

You should have the ability to push with your ssh key on login.m.c. Could you ensure that you are using ssh: to `push` rather than http:? When you cloned, you may have done so over http: so your default push would be the same. We don't allow pushing over http.

`hg paths` (or look at [paths] in $puppet_dir/.hg/hgrc) will show you the named remote destinations you can pull/push with.

e.g. 
`default = https://hg.mozilla.org/build/puppet`

You can add another line to $puppet_dir/.hg/hgrc like:
`push = ssh://hg.mozilla.org/build/puppet`

Then you can do:
`hg push -r . push`  # this will push your current revision and head to ssh://hg.mozilla.org/build/puppet

Alternatively you could not use a named path and do:
`hg push -r . ssh://hg.mozilla.org/build/puppet`

Of course, when you use ssh you will have to either make sure hg knows which ssh key on your machine to use, or else ensure your ssh key is added to your ssh-agent.
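
A minimal example of the ssh-agent route (the key filename here is just a placeholder, use whichever key is registered on login.m.c):

eval $(ssh-agent -s)
ssh-add ~/.ssh/your_mozilla_key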
Flags: needinfo?(jlund) → needinfo?(riman)
Pushed the changes.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
I've merged this to the production branch:
 https://hg.mozilla.org/build/puppet/rev/96d40fcf5aa9213b6845a3eae97305d83d81e884

Once that has had time to be picked up (~1 hour) we can go ahead with removing the related cron files in aws-manager[12]:/etc/cron.d/, e.g. aws_manager-aws_get_cloudtrail_logs.py.cron. Puppet won't clean them up by itself.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Thanks for pushing to prod and outlining remaining work that puppet won't automatically do.

@arny, @radu:

normally we don't let patches sit on default, we just merge to production once we have done some testing:
    * testing: https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/HowTo/Hack_on_PuppetAgain#Test_your_code_on_the_local_machine
    * merging: https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/HowTo/Push_changes_to_Production

It's a good idea to subscribe and keep an eye on puppet shared mail when you deploy, as it is a good early indicator of problems with the rollout
    * shared mail: https://groups.google.com/a/mozilla.com/forum/?hl=en#!forum/releng-puppet-mail
For restoring purposes, I have moved the cron files aws_manager-aws_get_cloudtrail_logs.py.cron and aws_manager-aws_process_cloudtrail_logs.py.cron to my home directory on aws-manager2. On aws-manager1 there are no such files.
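
If they ever need to be restored, copying them back should be enough (assuming the unix account name matches my nick):

sudo cp ~arny/aws_manager-aws_get_cloudtrail_logs.py.cron /etc/cron.d/
sudo cp ~arny/aws_manager-aws_process_cloudtrail_logs.py.cron /etc/cron.d/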


Also, I have installed docker and ran the test, however it returned the output below. Is this normal? What should the output look like when the test is OK and we can merge to production?

root@vbox:/home/arny/puppet# docker run --rm --name puppet-linter -v /home/arny/puppet/:/puppet mozillarelops/puppet-linter
Executing: puppet parser validate `find manifests modules -name '*.pp' -not -name 'site.pp'`
/exec_travis_scripts.rb:8: undefined method `exitcode' for #<Process::Status: pid=5,exited(0)> (NoMethodError)
	from /exec_travis_scripts.rb:5:in `each'
	from /exec_travis_scripts.rb:5
(In reply to Attila Craciun [:arny] from comment #12)
> For restoring purposes, I have moved the cron files
> aws_manager-aws_get_cloudtrail_logs.py.cron and
> aws_manager-aws_process_cloudtrail_logs.py.cron to my ~account on
> aws_manager2. On aws_manager1 there are no files.
> 

Thanks arny!

> 
> Also, I have installed docker and made the test, however it returned the
> bellow. It is normal? What should we get when the test is OK and we can do
> the merge to production?
> 
> root@vbox:/home/arny/puppet# docker run --rm --name puppet-linter -v
> /home/arny/puppet/:/puppet mozillarelops/puppet-linter
> Executing: puppet parser validate `find manifests modules -name '*.pp' -not
> -name 'site.pp'`
> /exec_travis_scripts.rb:8: undefined method `exitcode' for
> #<Process::Status: pid=5,exited(0)> (NoMethodError)
> 	from /exec_travis_scripts.rb:5:in `each'
> 	from /exec_travis_scripts.rb:5

looking at https://wiki.mozilla.org/index.php?title=ReleaseEngineering/PuppetAgain/HowTo/Hack_on_PuppetAgain&action=history it suggests this is a question for dcrisan.
Flags: needinfo?(dcrisan)
Hello Jordan,

I had tried to push using http:, and that was the problem.

I configured $puppet/.hg/hgrc to push using ssh:.
My ssh key is added to the ssh-agent.

It should work from now on. 

Thank you.
Flags: needinfo?(riman)
:jwatkins, can you please have a look at this error?
Flags: needinfo?(dcrisan) → needinfo?(jwatkins)
(In reply to Attila Craciun [:arny] PTO 4/2-4/11 from comment #12)

> Also, I have installed docker and made the test, however it returned the
> bellow. It is normal? What should we get when the test is OK and we can do
> the merge to production?
> 
> root@vbox:/home/arny/puppet# docker run --rm --name puppet-linter -v
> /home/arny/puppet/:/puppet mozillarelops/puppet-linter
> Executing: puppet parser validate `find manifests modules -name '*.pp' -not
> -name 'site.pp'`
> /exec_travis_scripts.rb:8: undefined method `exitcode' for
> #<Process::Status: pid=5,exited(0)> (NoMethodError)
> 	from /exec_travis_scripts.rb:5:in `each'
> 	from /exec_travis_scripts.rb:5



This was a bug introduced by my rubber stamp approval.  I've fixed it here:
https://github.com/mozilla-platform-ops/puppet-linter/commit/9f933e56e919098439a34be1c48b719bf862b42c

Please pull the latest docker image to update:
docker pull mozillarelops/puppet-linter
Flags: needinfo?(jwatkins)
I have run the docker command again and everything worked as expected, so I'm going to close this bug. Please reopen if anything comes up.
Status: REOPENED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard