Clarify undefined behavior and notrap. (#170)

* Clarify undefined behavior and notrap.

Remove the "No undefined behavior" paragraph from the README. The other
paragraphs, specifically "Portable semantics" and
"Fast sandbox verification", describe Cretonne's goals in this area.

Define *addressable* and *accessible* memory, so that trapping remains a fully defined part of the semantics, and we have a clear boundary around undefined behavior, and use these terms to describe related constructs.
This commit is contained in:
Dan Gohman
2017-10-20 09:38:52 -07:00
committed by GitHub
parent e5c0e06fa8
commit 2569ef4c42
2 changed files with 73 additions and 35 deletions

View File

@@ -18,9 +18,6 @@ target-independent intermediate language into executable machine code.
Cretonne is designed to be a code generator for WebAssembly with these design Cretonne is designed to be a code generator for WebAssembly with these design
goals: goals:
No undefined behavior
Cretonne does not have a `nasal demons clause <http://www.catb.org/jargon/html/N/nasal-demons.html>`_, and it won't generate code
with unexpected behavior if invariants are broken.
Portable semantics Portable semantics
As far as possible, Cretonne's input language has well-defined semantics As far as possible, Cretonne's input language has well-defined semantics
that are the same on all target architectures. The semantics are usually that are the same on all target architectures. The semantics are usually
@@ -28,8 +25,8 @@ Portable semantics
Fast sandbox verification Fast sandbox verification
Cretonne's input language has a safe subset for sandboxed code. No advanced Cretonne's input language has a safe subset for sandboxed code. No advanced
analysis is required to verify memory safety as long as only the safe analysis is required to verify memory safety as long as only the safe
instructions are used. The safe instruction set is expressive enough to subset is used. The safe subset is expressive enough to implement
implement WebAssembly. WebAssembly.
Scalable performance Scalable performance
Cretonne can be configured to generate code as quickly as possible, or it Cretonne can be configured to generate code as quickly as possible, or it
can generate very good code at the cost of slower compile times. can generate very good code at the cost of slower compile times.

View File

@@ -429,41 +429,45 @@ Indirect function calls use a signature declared in the preamble.
.. autoinst:: call_indirect .. autoinst:: call_indirect
.. autoinst:: func_addr .. autoinst:: func_addr
.. _memory:
Memory Memory
====== ======
Cretonne provides fully general :inst:`load` and :inst:`store` instructions for Cretonne provides fully general :inst:`load` and :inst:`store` instructions for
accessing memory, as well as :ref:`extending loads and truncating stores accessing memory, as well as :ref:`extending loads and truncating stores
<extload-truncstore>`. There are also more restricted operations for accessing <extload-truncstore>`.
specific types of memory objects.
If the memory at the given addresss is not :term:`addressable`, the behavior of
these instructions is undefined. If it is addressable but not
:term:`accessible`, they :term:`trap`.
.. autoinst:: load .. autoinst:: load
.. autoinst:: store .. autoinst:: store
There are also more restricted operations for accessing specific types of memory
objects.
Memory operation flags Memory operation flags
---------------------- ----------------------
Loads and stores can have flags that loosen their semantics in order to enable Loads and stores can have flags that loosen their semantics in order to enable
optimizations. optimizations.
======= ========================================= ======= ===========================================
Flag Description Flag Description
======= ========================================= ======= ===========================================
notrap Trapping is not required. notrap Memory is assumed to be :term:`accessible`.
aligned Trapping allowed for misaligned accesses. aligned Trapping allowed for misaligned accesses.
======= ========================================= ======= ===========================================
Trapping is part of the semantics of memory accesses. The operating system may When the ``accessible`` flag is set, the behavior is undefined if the memory
have configured parts of the address space to cause a trap when read and/or is not :term:`accessible`.
written, and Cretonne's memory instructions respect that. When the ``notrap``
flat is set, the trapping behavior is optional. This allows the optimizer to
delete loads whose results are not used.
Loads and stores are *misaligned* if the resultant address is not a multiple of Loads and stores are *misaligned* if the resultant address is not a multiple of
the expected alignment. By default, misaligned loads and stores are allowed, the expected alignment. By default, misaligned loads and stores are allowed,
but when the ``aligned`` flag is set, a misaligned memory access is allowed to but when the ``aligned`` flag is set, a misaligned memory access is allowed to
trap. :term:`trap`.
Local variables Local variables
--------------- ---------------
@@ -471,7 +475,8 @@ Local variables
One set of restricted memory operations access the current function's stack One set of restricted memory operations access the current function's stack
frame. The stack frame is divided into fixed-size stack slots that are frame. The stack frame is divided into fixed-size stack slots that are
allocated in the :term:`function preamble`. Stack slots are not typed, they allocated in the :term:`function preamble`. Stack slots are not typed, they
simply represent a contiguous sequence of bytes in the stack frame. simply represent a contiguous sequence of :term:`accessible` bytes in the stack
frame.
.. inst:: SS = local Bytes, Flags... .. inst:: SS = local Bytes, Flags...
@@ -492,8 +497,8 @@ about because stack slots and offsets are fixed at compile time. For example,
the alignment of these stack memory accesses can be inferred from the offsets the alignment of these stack memory accesses can be inferred from the offsets
and stack slot alignments. and stack slot alignments.
It can be necessary to escape from the safety of the restricted instructions by It can be necessary to escape from the :term:`safety` of the restricted
taking the address of a stack slot. instructions by taking the address of a stack slot.
.. autoinst:: stack_addr .. autoinst:: stack_addr
@@ -508,12 +513,12 @@ instructions before instruction selection::
Global variables Global variables
---------------- ----------------
A *global variable* is an object in memory whose address is not known at A *global variable* is an :term:`accessible` object in memory whose address is
compile time. The address is computed at runtime by :inst:`global_addr`, not known at compile time. The address is computed at runtime by
possibly using information provided by the linker via relocations. There are :inst:`global_addr`, possibly using information provided by the linker via
multiple kinds of global variables using different methods for determining relocations. There are multiple kinds of global variables using different
their address. Cretonne does not track the type or even the size of global methods for determining their address. Cretonne does not track the type or even
variables, they are just pointers to non-stack memory. the size of global variables, they are just pointers to non-stack memory.
When Cretonne is generating code for a virtual machine environment, globals can When Cretonne is generating code for a virtual machine environment, globals can
be used to access data structures in the VM's runtime. This requires functions be used to access data structures in the VM's runtime. This requires functions
@@ -570,9 +575,9 @@ in, and all accesses are bounds checked. Cretonne models this through the
concept of *heaps*. concept of *heaps*.
A heap is declared in the function preamble and can be accessed with the A heap is declared in the function preamble and can be accessed with the
:inst:`heap_addr` instruction that traps on out-of-bounds accesses or returns a :inst:`heap_addr` instruction that :term:`traps` on out-of-bounds accesses or
pointer that is guaranteed to trap. Heap addresses can be smaller than the returns a pointer that is guaranteed to trap. Heap addresses can be smaller than
native pointer size, for example unsigned :type:`i32` offsets on a 64-bit the native pointer size, for example unsigned :type:`i32` offsets on a 64-bit
architecture. architecture.
.. digraph:: static .. digraph:: static
@@ -588,18 +593,21 @@ architecture.
A heap appears as three consecutive ranges of address space: A heap appears as three consecutive ranges of address space:
1. The *mapped pages* are the usable memory range in the heap. Loads and stores 1. The *mapped pages* are the :term:`accessible` memory range in the heap. A
to this range won't trap. A heap may have a minimum guaranteed size which heap may have a minimum guaranteed size which means that some mapped pages
means that some mapped pages are always present. are always present.
2. The *unmapped pages* is a possibly empty range of address space that may be 2. The *unmapped pages* is a possibly empty range of address space that may be
mapped in the future when the heap is grown. mapped in the future when the heap is grown. They are :term:`addressable` but
not :term:`accessible`.
3. The *guard pages* is a range of address space that is guaranteed to cause a 3. The *guard pages* is a range of address space that is guaranteed to cause a
trap when accessed. It is used to optimize bounds checking for heap accesses trap when accessed. It is used to optimize bounds checking for heap accesses
with a shared base pointer. with a shared base pointer. They are :term:`addressable` but
not :term:`accessible`.
The *heap bound* is the total size of the mapped and unmapped pages. This is The *heap bound* is the total size of the mapped and unmapped pages. This is
the bound that :inst:`heap_addr` checks against. Memory accesses inside the the bound that :inst:`heap_addr` checks against. Memory accesses inside the
heap bounds can trap if they hit an unmapped page. heap bounds can trap if they hit an unmapped page (which is not
:term:`accessible`).
.. autoinst:: heap_addr .. autoinst:: heap_addr
@@ -920,6 +928,9 @@ In addition to the normal :inst:`load` and :inst:`store` instructions, Cretonne
provides extending loads and truncation stores for 8, 16, and 32-bit memory provides extending loads and truncation stores for 8, 16, and 32-bit memory
accesses. accesses.
These instructions succeed, trap, or have undefined behavior, under the same
conditions as :ref:`normal loads and stores <memory>`.
.. autoinst:: uload8 .. autoinst:: uload8
.. autoinst:: sload8 .. autoinst:: sload8
.. autoinst:: istore8 .. autoinst:: istore8
@@ -1009,6 +1020,19 @@ Glossary
.. glossary:: .. glossary::
addressable
Memory in which loads and stores have defined behavior. They either
succeed or :term:`trap`, depending on whether the memory is
:term:`accessible`.
accessible
:term:`Addressable` memory in which loads and stores always succeed
without :term:`trapping`, except where specified otherwise (eg. with the
`aligned` flag). Heaps, globals, and the stack may contain accessible,
merely addressable, and outright unaddressable regions. There may also
be additional regions of addressable and/or accessible memory not
explicitly declared.
basic block basic block
A maximal sequence of instructions that can only be entered from the A maximal sequence of instructions that can only be entered from the
top, and that contains no branch or terminator instructions except for top, and that contains no branch or terminator instructions except for
@@ -1089,6 +1113,15 @@ Glossary
intermediate representation. Cretonne's IR can be converted to text intermediate representation. Cretonne's IR can be converted to text
losslessly. losslessly.
safe
safety
Execution of exclusively defined behavior. Safe programs cannot
read, write, or execute memory outside of heaps, globals, stack
regions, and functions that have been explicitly provided to them. In
some instances, defined behavior can be nondeterministic, where the
specific behavior may vary among a bounded set of possibilities.
Execution of undefined behavior is unsafe.
stack slot stack slot
A fixed size memory allocation in the current function's activation A fixed size memory allocation in the current function's activation
frame. Also called a local variable. frame. Also called a local variable.
@@ -1101,3 +1134,11 @@ Glossary
The basic terminator instructions are :inst:`br`, :inst:`return`, and The basic terminator instructions are :inst:`br`, :inst:`return`, and
:inst:`trap`. Conditional branches and instructions that trap :inst:`trap`. Conditional branches and instructions that trap
conditionally are not terminator instructions. conditionally are not terminator instructions.
trap
traps
trapping
Terminates execution of the current thread. The specific behavior after
a trap depends on the underlying OS. For example, a common behavior is
delivery of a signal, with the specific signal depending on the event
that triggered it.