From 8e0651374aebbee063846058c978c6b6dd50989a Mon Sep 17 00:00:00 2001
From: Alex Crichton
Date: Thu, 6 Feb 2020 11:41:44 -0600
Subject: [PATCH] Deregister JIT frames on Linux in reverse order (#910)

Investigating a surprisingly slow-compiling module recently, it turns out
that if you create a wasm module with 40k empty functions (e.g.
`(module (func) (func) (func) ...)`) then it takes **3 seconds** to compile
and drop via the CLI locally on a Linux system. This seems like an
extraordinary amount of time for "doing nothing", and after some
profiling I found that basically all of the time was spent in
`__deregister_frame` calls.

Poking around in the source it looks like libgcc is managing some form
of linked list, and by deregistering in the LIFO order instead of FIFO
order it avoids a quadratic search of all registered functions. Now that
being said it's still pretty bad to do a linear search all the time, and
nothing will be fixed if there are *two* instances both with 40k
functions. For now though I hope that this will patch over the
performance issue and we can figure out better ways to manage this in
the future.
---
 crates/jit/src/function_table.rs | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/crates/jit/src/function_table.rs b/crates/jit/src/function_table.rs
index ffa37ce953..625ae9d7d9 100644
--- a/crates/jit/src/function_table.rs
+++ b/crates/jit/src/function_table.rs
@@ -199,9 +199,23 @@ impl Drop for FunctionTable {
                 fn __deregister_frame(fde: *const u8);
             }
 
-            if self.published.is_some() {
+            if let Some(published) = &self.published {
                 unsafe {
-                    for fde in self.published.as_ref().unwrap() {
+                    // I'm not really sure why, but it appears to be way faster to
+                    // unregister frames in reverse order rather than in-order. This
+                    // way we're deregistering in LIFO order, and maybe there's some
+                    // vec shifting or something like that in libgcc?
+                    //
+                    // Locally on Ubuntu 18.04 a wasm module with 40k empty
+                    // functions takes 0.1s to compile and drop with reverse
+                    // iteration. With forward iteration it takes 3s to compile and
+                    // drop!
+                    //
+                    // Poking around libgcc sources seems to indicate that some sort
+                    // of linked list is being traversed... We may need to figure
+                    // out something else for backtraces in the future since this
+                    // API may not be long-lived to keep calling.
+                    for fde in published.iter().rev() {
                         __deregister_frame(*fde as *const _);
                     }
                 }
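
The quadratic-versus-linear behavior described in the commit message can be reproduced with a toy model. This is only a sketch under an assumption: that the registry behaves like a singly linked list which prepends on registration and scans from the head on deregistration, as the message guesses from reading the libgcc sources. `Node`, `Registry`, and `total_visits` are hypothetical names made up for illustration; they are not libgcc or wasmtime APIs.

```rust
/// One registered frame in the toy list (hypothetical, for illustration).
struct Node {
    fde: usize,
    next: Option<Box<Node>>,
}

/// Toy registry: prepend on register, linear scan from the head on deregister.
#[derive(Default)]
struct Registry {
    head: Option<Box<Node>>,
    /// Total number of nodes inspected across all deregister calls.
    visited: u64,
}

impl Registry {
    /// Push the new entry at the head, so the newest entry is found first.
    fn register(&mut self, fde: usize) {
        let next = self.head.take();
        self.head = Some(Box::new(Node { fde, next }));
    }

    /// Walk from the head until `fde` is found, then unlink it.
    fn deregister(&mut self, fde: usize) -> bool {
        let mut steps = 0u64;
        let mut cursor = &mut self.head;
        while cursor.as_ref().map_or(false, |n| n.fde != fde) {
            steps += 1;
            cursor = &mut cursor.as_mut().unwrap().next;
        }
        let found = match cursor.take() {
            Some(node) => {
                steps += 1; // count the matching node itself
                *cursor = node.next;
                true
            }
            None => false,
        };
        self.visited += steps;
        found
    }
}

/// Register `n` entries in order, deregister them all, and report how many
/// nodes were inspected in total.
fn total_visits(n: usize, reverse: bool) -> u64 {
    let mut reg = Registry::default();
    for fde in 0..n {
        reg.register(fde);
    }
    if reverse {
        for fde in (0..n).rev() {
            reg.deregister(fde);
        }
    } else {
        for fde in 0..n {
            reg.deregister(fde);
        }
    }
    reg.visited
}

fn main() {
    let n = 1_000;
    println!("forward (FIFO): {} nodes visited", total_visits(n, false));
    println!("reverse (LIFO): {} nodes visited", total_visits(n, true));
}
```

In this model, reverse-order removal always finds its target at the head, so the total work is n node visits. Forward-order removal always targets the tail and costs n(n+1)/2 visits. As the commit message notes, this only helps while a single table's entries sit at the front of the list; a second large instance would still pay for the linear scan.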