Closed
Bug 909011
Opened 11 years ago
Closed 10 years ago
[traceback] handle occasional amqplib connection errors
Categories
(Input Graveyard :: Code Quality, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: willkg, Assigned: willkg)
Details
(Whiteboard: u=user c=general p=1 s=input.2015q1)
When someone leaves feedback, we save the feedback in the database and then kick off a celery task to index that response.
Very infrequently, amqplib raises a connection error. I'm pretty sure that means the response isn't indexed in ES, and it might also mean that the user gets back an HTTP 500 error.
This bug covers writing a test to check whether those two theories are true, and then alleviating the issue in some way.
Traceback from production:
Traceback (most recent call last):
  File "/data/www/input.mozilla.org/input/vendor/lib/python/django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib64/python2.6/site-packages/newrelic-1.11.0.55/newrelic/api/object_wrapper.py", line 216, in __call__
    self._nr_instance, args, kwargs)
  File "/usr/lib64/python2.6/site-packages/newrelic-1.11.0.55/newrelic/hooks/framework_django.py", line 475, in wrapper
    return wrapped(*args, **kwargs)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/django/views/decorators/csrf.py", line 77, in wrapped_view
    return view_func(*args, **kwargs)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/django/views/decorators/cache.py", line 89, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/data/www/input.mozilla.org/input/fjord/feedback/views.py", line 265, in feedback_router
    return view(request, *args, **kwargs)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/django/views/decorators/csrf.py", line 77, in wrapped_view
    return view_func(*args, **kwargs)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/django/views/decorators/http.py", line 41, in inner
    return func(request, *args, **kwargs)
  File "/data/www/input.mozilla.org/input/fjord/feedback/views.py", line 213, in android_about_feedback
    response, form = _handle_feedback_post(request)
  File "/data/www/input.mozilla.org/input/fjord/feedback/views.py", line 110, in _handle_feedback_post
    opinion.save()
  File "/data/www/input.mozilla.org/input/vendor/lib/python/django/db/models/base.py", line 463, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/django/db/models/base.py", line 565, in save_base
    created=(not record_exists), raw=raw, using=using)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/django/dispatch/dispatcher.py", line 172, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/data/www/input.mozilla.org/input/fjord/search/tasks.py", line 122, in _live_index_handler
    index_item_task.delay(instance.get_mapping_type(), instance.id)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/celery/app/task/__init__.py", line 353, in delay
    return self.apply_async(args, kwargs)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/celery/app/task/__init__.py", line 449, in apply_async
    publish = publisher or self.app.amqp.publisher_pool.acquire(block=True)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/kombu/connection.py", line 657, in acquire
    R = self.prepare(R)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/kombu/pools.py", line 57, in prepare
    p.revive(connection.default_channel)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/kombu/connection.py", line 583, in default_channel
    self._default_channel = self.channel()
  File "/data/www/input.mozilla.org/input/vendor/lib/python/kombu/connection.py", line 151, in channel
    chan = self.transport.create_channel(self.connection)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/kombu/transport/amqplib.py", line 259, in create_channel
    return connection.channel()
  File "/data/www/input.mozilla.org/input/vendor/lib/python/kombu/transport/amqplib.py", line 183, in channel
    return Channel(self, channel_id)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/kombu/transport/amqplib.py", line 207, in __init__
    super(Channel, self).__init__(*args, **kwargs)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/amqplib/client_0_8/channel.py", line 82, in __init__
    self._x_open()
  File "/data/www/input.mozilla.org/input/vendor/lib/python/amqplib/client_0_8/channel.py", line 469, in _x_open
    self._send_method((20, 10), args)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/amqplib/client_0_8/abstract_channel.py", line 76, in _send_method
    method_sig, args, content)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/amqplib/client_0_8/method_framing.py", line 252, in write_method
    self.dest.write_frame(1, channel, payload)
  File "/data/www/input.mozilla.org/input/vendor/lib/python/amqplib/client_0_8/transport.py", line 165, in write_frame
    frame_type, channel, size, payload, 0xce))
  File "<string>", line 1, in sendall
error: [Errno 32] Broken pipe
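The traceback shows the error surfacing from inside opinion.save(), via the post_save receiver that queues the indexing task. The following is a minimal simulation of that call chain, not the fjord code: FakeOpinion, handle_feedback_post, and simulate_broken_broker are hypothetical stand-ins, and the mapping of an uncaught exception to an HTTP 500 is an assumption about how the view behaves. It only illustrates why both theories are plausible; it does not prove anything about production.

```python
import errno


class FakeOpinion(object):
    """Hypothetical stand-in for the saved model."""

    def __init__(self, receivers):
        self.receivers = receivers
        self.saved = False

    def save(self):
        # The row is written first ...
        self.saved = True
        # ... then post_save-style receivers run synchronously, so any
        # exception they raise propagates out of save().
        for receiver in self.receivers:
            receiver(self)


def handle_feedback_post(opinion):
    """Hypothetical stand-in for the view; returns an HTTP status code."""
    try:
        opinion.save()
    except Exception:
        # Assumption: an uncaught exception in the view becomes an HTTP 500.
        return 500
    return 200


def simulate_broken_broker():
    """Run one feedback post while the simulated broker connection is broken."""
    indexed = []

    def broken_publish(instance):
        # Simulates the socket write failing, as in the traceback.
        raise OSError(errno.EPIPE, "Broken pipe")

    def live_index_receiver(instance):
        broken_publish(instance)
        indexed.append(instance)  # never reached when publishing fails

    opinion = FakeOpinion([live_index_receiver])
    status = handle_feedback_post(opinion)
    return status, opinion.saved, indexed
```

In this simulation the feedback is saved, nothing is indexed, and the caller gets a 500, which matches both theories in the description.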
Comment 1 (Assignee) • 11 years ago
These could be causing some responses not to get added to ES.
Fixing the whiteboard data.
Priority: -- → P3
Whiteboard: u=user c=general p= se=input.2013q4 → u=user c=general p= s=input.2013q4
Comment 2 (Assignee) • 11 years ago
Pushing this to 2014q1 because I can't get to it this quarter.
Whiteboard: u=user c=general p= s=input.2013q4 → u=user c=general p= s=input.2014q1
Comment 3 (Assignee) • 11 years ago
Moving this to 2014q2.
Whiteboard: u=user c=general p= s=input.2014q1 → u=user c=general p= s=input.2014q2
Comment 4 (Assignee) • 11 years ago
When we add other post_save -> celery task jobs, this will affect those, too. Ricky mentioned maybe tossing those tasks in a db-based queue when amqplib fails.
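A rough sketch of what a db-based fallback queue could look like, using an in-memory SQLite table as a stand-in for the real database. Everything here is hypothetical (the pending_tasks table, enqueue, drain, and the publish callable are illustration names, not fjord or celery APIs), and it assumes the broker failure surfaces as a socket-level OSError like the broken pipe in the traceback.

```python
import json
import sqlite3


def make_fallback_store(conn):
    """Create the hypothetical table that holds tasks the broker rejected."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_tasks ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, args TEXT NOT NULL)"
    )


def enqueue(conn, publish, name, args):
    """Try the broker first; park the task in the database if that fails."""
    try:
        publish(name, args)
        return "broker"
    except OSError:
        # Assumption: connection problems show up as OSError (socket.error).
        conn.execute(
            "INSERT INTO pending_tasks (name, args) VALUES (?, ?)",
            (name, json.dumps(args)),
        )
        conn.commit()
        return "database"


def drain(conn, publish):
    """Re-publish parked tasks in order; stops at the first failure."""
    rows = conn.execute(
        "SELECT id, name, args FROM pending_tasks ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, name, args in rows:
        publish(name, json.loads(args))  # raises if the broker is still down
        conn.execute("DELETE FROM pending_tasks WHERE id = ?", (row_id,))
        conn.commit()
        sent += 1
    return sent
```

Something like drain would have to run periodically (a cron job, for example) once the broker is reachable again. The trade-off is extra moving parts in exchange for not losing the indexing work.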
Comment 5 (Assignee) • 11 years ago
I haven't seen one of these in a while, so I'm bumping this to 2014q3. If I still haven't seen one by then, I'll nix it.
Whiteboard: u=user c=general p= s=input.2014q2 → u=user c=general p= s=input.2014q3
Updated (Assignee) • 10 years ago
Whiteboard: u=user c=general p= s=input.2014q3 → u=user c=general p= s=input.2014q4
Comment 6 (Assignee) • 10 years ago
These still happen periodically, but not often enough to warrant spending time on this. Pushing it to the backlog.
Whiteboard: u=user c=general p= s=input.2014q4 → u=user c=general p= s=
Comment 7 (Assignee) • 10 years ago
They're doing a PtoV for rabbitmq the first week in March. I'm going to grab this bug and make sure that Input is resilient to rabbitmq being unavailable.
The outcome of this work should be that users leaving feedback *never* see an HTTP 500 if rabbitmq is down or the amqp connection fails in some way. To make sure it isn't failing silently, I'll have it send me emails.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: P3 → P1
Whiteboard: u=user c=general p= s= → u=user c=general p= s=input.2015q1
Comment 8 (Assignee) • 10 years ago
I wrapped the indexing task creation in code that checks whether the problem is amqp-related. If it is, the code sends an error email to the admin but otherwise does nothing, so it doesn't kick up an HTTP 500 for the user. This should adequately handle most/all rabbitmq outages, including the upcoming PtoV work.
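As a rough illustration of that approach (not the code that landed in the commit), a wrapper along these lines would swallow broker connection errors and report them instead. The names safe_delay, is_broker_error, BROKER_ERRNOS, and notify_admins are hypothetical, and treating only these errno values as "amqp related" is an assumption; amqplib can raise its own exception types that this sketch does not cover.

```python
import errno

# Assumption: these errno values indicate a broker connection problem.
BROKER_ERRNOS = frozenset([errno.EPIPE, errno.ECONNRESET, errno.ECONNREFUSED])


def is_broker_error(exc):
    """Return True if exc looks like a socket-level connection failure."""
    return isinstance(exc, EnvironmentError) and exc.errno in BROKER_ERRNOS


def safe_delay(task_delay, notify_admins, *args, **kwargs):
    """Queue a task; on a broker connection error, notify instead of raising.

    task_delay is the callable that queues the task (for example a task's
    delay method); notify_admins is any callable that takes a message string.
    Returns True if the task was queued, False if the error was swallowed.
    """
    try:
        task_delay(*args, **kwargs)
        return True
    except Exception as exc:
        if not is_broker_error(exc):
            # Anything that isn't a connection problem is a real bug.
            raise
        notify_admins("Indexing task not queued: %r" % (exc,))
        return False
```

The receiver would then call something like safe_delay(index_item_task.delay, mail_function, mapping_type, instance_id), where mail_function is whatever sends the admin email. Note that with this approach the response still isn't indexed when the broker is down; the change is that the user no longer sees the failure and someone gets told about it.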
In a PR: https://github.com/mozilla/fjord/pull/497#issuecomment-75563467
Landed in https://github.com/mozilla/fjord/commit/1f3bde06298007396117b27545d29aefb5f9fd30
Pushed it out just now. I'll keep an eye on it, but so far it looks good.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: u=user c=general p= s=input.2015q1 → u=user c=general p=1 s=input.2015q1
Updated • 8 years ago
Product: Input → Input Graveyard