Investigating a Promise bug
As most people do nowadays, I write code using modern JavaScript… then serve the
transpiled version, also adding the polyfills where needed. It mostly works fine,
except when it doesn’t 🙃.
In fact, recently we got this bug report:
When the user clicks the menu button, the menu doesn’t appear.
The problem was easily reproducible on IE11 - and only on IE11.
First guesses
One interesting thing about this menu is that its content is fetched asynchronously, only after the user clicks on a particular button.
So, my first guess was there had to be something wrong with this request; except the Network panel showed the request was completed perfectly fine, and still the menu was not rendered.
Then I looked at the Console panel, because I imagined there had to be an error happening
somewhere before the menu could be rendered. But again, no luck: the console was
almost perfectly clean, except for one exception happening in a completely unrelated
third party script.
This particular error was also happening in other browsers, and didn’t seem to affect
the functionality of the third party script. I thought I could safely ignore it for now.
It didn’t take long to understand that the menu was not the only component that had problems; in fact, at this point almost nothing was working properly. In the end, all the problems could be traced back to the fact that even a simple snippet like the one below didn’t behave as expected.
Promise.resolve()
  .then(
    function () {
      // Never runs; weird, isn't it?!
      console.log("Promise fulfilled");
    }
  );
Well, that’s quite surprising, isn’t it?!
At this point I was hooked - I really enjoy investigating this kind of issue.
Also - since IE11 doesn’t support Promise natively, … and I learned a thing or two about Promise polyfills when I wrote my own Promise polyfill a couple of years ago - I felt confident I could understand the problem, and eventually fix the bug.
Debugging core-js Promise polyfill
At this point I thought the only way to figure out why the Promise’s handlers had not been executed was to dive deep into the polyfill implementation.
Using IE11 devtools is frustrating to say the least - and doing it through the great BrowserStack doesn’t really help… so I forced Chrome to use the Promise polyfill implementation.
// Add this just before you load the polyfill.
window.Promise = void 0;
I’ve used this approach a couple of times in the past, with great satisfaction.
My hope was that this trick would let me reproduce the issue in Chrome too. Sadly,
this was not the case.
Even if this attempt didn’t work the way I hoped, it taught me another thing about the problem I was investigating: counterintuitively, even though the problem manifests itself as a Promise issue, it doesn’t depend on the Promise implementation.
I got further confirmation of this thesis by replacing the polyfill with a different one …
The problem was still there.
This sure is interesting, but it doesn’t help much in moving the investigation forward; if anything, it makes debugging a specific implementation pointless.
So, at this point I was a bit lost.
I had already spent a couple of hours investigating the issue, but hadn’t made much progress in understanding the problem.
That was when I thought back to the error caused by the third party script, the one I had noticed at the very beginning. It was a pretty weak lead to follow, but I didn’t have many other ideas back then; so it was worth at least taking a closer look.
Reproducing the IE11 bug
This is how I discovered the error was happening in the handler passed to a MutationObserver registered by this third party script.
This isn’t something bad per se; but this little detail immediately rang a bell.
Most Promise polyfills use MutationObserver under the hood to emulate the scheduling of a microtask.
Could this be a simple coincidence?
Well, there was a quick way to find out.
This third party script is not fundamental for the page - I removed it, and at this point
everything was working perfectly fine.
At this point my job was pretty much done … contact the third party, file a bug, provide a repro … the usual stuff. Except I still wanted to understand how it was possible for this third party script to break our website so badly.
By now I already knew a few things. The problem - for mysterious reasons - is related to the Promise polyfill and MutationObserver.
So, I came up with a new strategy: in the Promise polyfill, replace the microtask implementation based on MutationObserver with another one that doesn’t rely on MutationObserver.
It looks like a lot of work, but in the end it is very simple:
// node_modules/core-js/internals/microtask.js
module.exports = function (fn) {
  setTimeout(fn, 0);
};
Now the implementation above is not the best one, but it’s Good Enough™; in fact, it is usually the last fallback in environments which do not provide any other means to schedule microtasks.
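To see why setTimeout is only a fallback, here is a small sketch, assuming an environment with native promises (a modern browser or Node.js, not IE11). The order array is a name made up for this illustration. A callback passed to setTimeout runs as a separate task, after all pending microtasks, so it fires later than a real microtask would.

```javascript
// Records the order in which the three pieces of code actually run.
var order = [];

// Scheduled first, but runs last: it is a task, not a microtask.
setTimeout(function () {
  order.push("setTimeout");
}, 0);

// Scheduled second, but runs right after the current script finishes.
Promise.resolve().then(function () {
  order.push("microtask");
});

// Plain synchronous code always runs first.
order.push("sync");

// Print the result once everything above has had time to run.
setTimeout(function () {
  console.log(order.join(" -> ")); // sync -> microtask -> setTimeout
}, 10);
```

Jobs scheduled this way still run, just later than a microtask would, which is fine for an experiment like this one.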
With this change in place, the menu - and everything else - was working again, despite the error in the third party script.
Well, I couldn’t ship this fix of course; but this time I learned something very important, which in the end let me understand the root of the issue: the problem is to be found in MutationObserver.
That’s how I came to understand that Internet Explorer 11 has a bug: when an unhandled exception occurs in a MutationObserver handler, it ignores all the other handlers that should have been executed during that same microtask.
Lovely. Here’s the simple repro:
var obs1 = new MutationObserver(
  function handler1 () {
    console.log("handler1 running");
  }
);
var obs2 = new MutationObserver(
  function handler2 () {
    console.log("handler2 running");
  }
);
obs1.observe(document.body, { childList: true });
obs2.observe(document.body, { childList: true });
function notify () {
  document.body.appendChild(document.createElement("div"));
}
When notify is executed, we expect to read two messages in the console.
Let’s say now that handler1, for some reason, throws an exception:
var obs1 = new MutationObserver(
  function handler1 () {
    throw new Error("Ops.");
  }
);
I still expect to see the message from the second observer.
At least this is what I experience when I run this code in Chrome, Firefox, Safari, etc.
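The difference can be illustrated with a toy model in plain JavaScript. It does not use a real MutationObserver: the deliver function, its stopAtFirstError flag and makeHandlers are hypothetical names, and the second mode only mimics the IE11 behaviour described above, not how IE11 is actually implemented.

```javascript
// Toy model: run every callback in order, the way a browser would when
// delivering mutation records to several observers.
// stopAtFirstError = true mimics the IE11 behaviour described in the post.
function deliver(callbacks, stopAtFirstError) {
  var errors = [];
  for (var i = 0; i < callbacks.length; i++) {
    try {
      callbacks[i]();
    } catch (error) {
      errors.push(error); // a real browser would report this in the console
      if (stopAtFirstError) break; // the remaining callbacks are skipped
    }
  }
  return errors;
}

// Builds the two handlers from the repro; messages are collected in `log`.
function makeHandlers(log) {
  return [
    function handler1() { throw new Error("Ops."); },
    function handler2() { log.push("handler2 running"); }
  ];
}

var otherBrowsersLog = [];
deliver(makeHandlers(otherBrowsersLog), false);

var ie11Log = [];
deliver(makeHandlers(ie11Log), true);

console.log(otherBrowsersLog.length); // 1: handler2 still ran
console.log(ie11Log.length);          // 0: handler2 was never called
```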
Ok, so IE11 has a bug … who could have ever imagined!?
Jokes aside, I still didn’t understand how this IE11 bug was affecting the Promise polyfill.
Finally, the light
At this point, it was clear to me that, to understand the cause of the problem, I had to look at the microtask implementation in the core-js polyfill.
To make things easier to follow, I’m writing here the important bits. The full version can be found on GitHub.
var flush, head, last, notify, toggle, node;
// Bruno: I've removed the large `if` here,
// and kept only the relevant block.
flush = function () {
  var parent, fn;
  if (IS_NODE && (parent = process.domain)) parent.exit();
  while (head) {
    fn = head.fn;
    head = head.next;
    try {
      fn();
    } catch (error) {
      if (head) notify();
      else last = undefined;
      throw error;
    }
  } last = undefined;
  if (parent) parent.enter();
};
toggle = true;
node = document.createTextNode('');
new MutationObserver(flush).observe(node, { characterData: true });
notify = function () {
  node.data = toggle = !toggle;
};
module.exports = function (fn) {
  var task = { fn: fn, next: undefined };
  if (last) last.next = task;
  if (!head) {
    head = task;
    notify();
  } last = task;
};
The head variable points to the first node of a singly linked list; it references the next unit of work to be processed.
The flush function serves as the MutationObserver handler … it walks the linked list, processing the work, until it becomes empty.
The notify function encapsulates the ability to trigger the MutationObserver - it is not too different from my example above.
Finally, the module exports a function that wraps notify and takes care of updating the job list. Here, there’s a very important thing to notice:
module.exports = function (fn) {
  var task = { fn: fn, next: undefined };
  if (last) last.next = task;
  if (!head) {
    head = task;
    notify();
  } last = task;
};
The notify function is executed only when no job is currently waiting to be processed, that is, when head points to nothing. This usually works fine because, as we said, the flush function takes care of traversing the job list and executing the jobs one by one … so that when everything is done head points back to nothing, or undefined.
Except we discovered that flush doesn’t get executed at all if an error happens in another MutationObserver handler scheduled for execution in that same microtask.
That’s very unfortunate! Now head points to the last job planned - a job that won’t ever be executed, leaving head pointing to it for ever, so that further calls to the microtask scheduler will never reach the line where notify is executed.
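The stuck queue can be reproduced without a browser. The sketch below is a simplified model of the excerpt above, not the core-js code itself: createQueue, trigger and pending are names made up for this illustration, the try/catch inside flush is left out, and trigger stands in for the MutationObserver by storing flush so that the test code decides whether it is ever called.

```javascript
// Simplified model of the job queue. `trigger(flush)` plays the role of
// notify(): it asks "someone" to call flush later.
function createQueue(trigger) {
  var head, last;

  function flush() {
    while (head) {
      var fn = head.fn;
      head = head.next;
      fn();
    }
    last = undefined;
  }

  return function schedule(fn) {
    var task = { fn: fn, next: undefined };
    if (last) last.next = task;
    if (!head) {
      head = task;
      trigger(flush); // only reached while the queue is empty
    }
    last = task;
  };
}

// Healthy case: every requested flush is eventually delivered.
var healthyPending = [];
var healthyLog = [];
var healthy = createQueue(function (flush) { healthyPending.push(flush); });
healthy(function () { healthyLog.push("job 1"); });
healthyPending.shift()(); // the "observer" fires, job 1 runs
healthy(function () { healthyLog.push("job 2"); });
healthyPending.shift()(); // fires again, job 2 runs

// Broken case: the first flush is dropped, as in the IE11 scenario.
var pending = [];
var log = [];
var schedule = createQueue(function (flush) { pending.push(flush); });
schedule(function () { log.push("job 1"); });
pending.pop(); // simulate the lost observer callback
schedule(function () { log.push("job 2"); });

console.log(healthyLog.length); // 2: both jobs ran
console.log(pending.length);    // 0: no new flush was ever requested
console.log(log.length);        // 0: neither job ran, and none ever will
```

Once a single flush is lost, head never goes back to undefined, so every later job is appended to a list that nobody walks.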
This finally explains why even the simplest Promise seems to never be fulfilled.
Promise.resolve()
  .then(
    function () {
      // Never runs; weird, isn't it?!
      console.log("Promise fulfilled");
    }
  );
One mystery remains
Going through all this was interesting, and being able to understand the cause of the problem was extremely rewarding.
But even after all of this, one mystery remains - one destined to remain covered in mist for ever - why did the cookie law script (our third party script) need to register a MutationObserver in the first place?!