Suppose I want to sum a bunch of floating-point numbers. In C, that would probably be:
```c
float summation(float *x, size_t len) {
    float sum = 0.0f;
    for (size_t i = 0; i < len; i++) {
        sum += x[i];
    }
    return sum;
}
```
Except this code does not just sum the numbers; it sums them *in order*. Floats do not behave the same way as real numbers. When I typed the code for `summation`, the compiler assumed I was aware that floating-point addition is not associative and that I indeed wanted the summation done sequentially, from the first element to the last.
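To make the non-associativity concrete, here is a small stand-alone check in Rust (`grouped_sums` is just a throwaway name for this post):

```rust
/// Floating-point addition is not associative: the two groupings of
/// 0.1 + 0.2 + 0.3 produce different f64 results.
fn grouped_sums() -> (f64, f64) {
    let (a, b, c) = (0.1_f64, 0.2_f64, 0.3_f64);
    ((a + b) + c, a + (b + c))
}
```

The first grouping yields `0.6000000000000001`, the second the f64 nearest `0.6`, so a compiler may not reorder these additions unless told the difference is acceptable.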
The `-ffast-math` flag allows the compiler to relax the rules a bit, which can enable more aggressive optimizations. In the summation example here, by allowing the compiler to assume that float addition is associative, instead of summing the numbers one by one, the compiler might decide that it's faster to:

- Sum the elements of `x` four at a time with SIMD
- Sum the four values of the resulting vector
- Then add in the 1–3 elements that remain (if any).^{1}
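Written out by hand in plain stable Rust, that reassociated scheme looks roughly like the following; the scalar `lanes` array stands in for the SIMD register, and `summation_reassociated` is a name I made up for this sketch:

```rust
/// Sums a slice the way a fast-math build might: four independent
/// partial sums, a horizontal combine, then the leftover elements.
fn summation_reassociated(x: &[f32]) -> f32 {
    let mut lanes = [0.0_f32; 4];
    let chunks = x.chunks_exact(4);
    let tail = chunks.remainder(); // the 1-3 leftover elements, if any
    for chunk in chunks {
        for i in 0..4 {
            lanes[i] += chunk[i]; // one "SIMD" add per group of four
        }
    }
    // Horizontal sum of the four lanes, then fold in the tail.
    let mut sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for &v in tail {
        sum += v;
    }
    sum
}
```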
And we want this in Rust. The mechanism is already in the LLVM backend; we just need to take advantage of it.
## State of things
Some might note that we already have fast-math available in Rust nightly, in the form of functions in `std::intrinsics`:
```rust
// These should only be called on f32/f64 even though the type bound does not
// reflect this. Thankfully, rustc would throw an error if you try to call
// these with integer types.
pub fn fadd_fast<T: Copy>(a: T, b: T) -> T;
pub fn fsub_fast<T: Copy>(a: T, b: T) -> T;
pub fn fmul_fast<T: Copy>(a: T, b: T) -> T;
pub fn fdiv_fast<T: Copy>(a: T, b: T) -> T;
pub fn frem_fast<T: Copy>(a: T, b: T) -> T;
```
I can think of three ways of having fast-math in Rust:

1. Create a `FastFloat` type that has fast-math instructions internally.
2. Imitate what Clang does and add it via a flag.
3. Add a `#[fast_math]` attribute to apply it locally to functions/statements.
These gravitate towards the first option. And, yes, one could express a good deal of fast-math ops with just these; the code in the intro can be done with `fadd_fast`. But I would like to point out that these cover only half of the fast-math-enabled operations in LLVM. The language reference lists the floating-point ops that may have fast-math flags as `fneg`, `fadd`, `fsub`, `fmul`, `fdiv`, `frem`, `fcmp`, `phi`, `select` and `call`.
About adding the missing ones in the same way:

- `fneg` seems straightforward to do.
- `fcmp` should be fine to expose. The comparison type parameter could be an enum.
- `call` itself is not what is needed, but rather the intrinsics that have approximate variants (`log`, `sqrt`, `exp`, …). Code repetitiveness aside, these intrinsics could each have `_fast` functions added without much trouble.
- `phi` and `select` appear after the compiler is done parsing the code, so these instructions are only available during LLVM IR codegen. Exposing these two via a library function does not seem feasible.^{2}
A minor nitpick: fast-math is a combination of seven different flags, and I think `fadd_fast`, et al. should take a parameter for the flags we want to enable. This can help the compiler do delicious optimizations without doing reduced-precision divides and `sqrt`s, and/or without assuming something potentially unsafe like NaNs not existing.
```rust
// defined somewhere
bitflags! {
    #[derive(Default, Encodable, Decodable)]
    pub struct FastMathFlags: u8 {
        ...
    }
}

pub fn fadd_fast<T: Copy>(a: T, b: T, flags: FastMathFlags) -> T;
```
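For reference, the seven flags in the LLVM language reference are `nnan`, `ninf`, `nsz`, `arcp`, `contract`, `afn` and `reassoc`. A dependency-free sketch of what such a flags type could look like (the Rust names here are my own invention):

```rust
/// Minimal stand-in for the proposed `FastMathFlags` bitflags.
/// The seven constants mirror LLVM IR's fast-math flag keywords.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
pub struct FastMathFlags(u8);

impl FastMathFlags {
    pub const NNAN: Self = Self(1 << 0);     // assume no NaNs
    pub const NINF: Self = Self(1 << 1);     // assume no infinities
    pub const NSZ: Self = Self(1 << 2);      // treat -0.0 and +0.0 alike
    pub const ARCP: Self = Self(1 << 3);     // allow x/y -> x * (1/y)
    pub const CONTRACT: Self = Self(1 << 4); // allow FMA contraction
    pub const AFN: Self = Self(1 << 5);      // allow approximate functions
    pub const REASSOC: Self = Self(1 << 6);  // allow reassociation

    pub fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }
}

impl std::ops::BitOr for FastMathFlags {
    type Output = Self;
    fn bitor(self, rhs: Self) -> Self {
        Self(self.0 | rhs.0)
    }
}
```

With something like this, a call such as `fadd_fast(a, b, FastMathFlags::REASSOC | FastMathFlags::CONTRACT)` could reassociate and contract without assuming NaNs out of existence.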
## Explosion of functions
Something more problematic with the current approach is that it won't scale well. We probably also want fast-math for other types besides just `f32` and `f64`; if we look at what LLVM does, we see that fast-math `call`s also apply to float vectors:
```cpp
// excerpt from llvm/IR/Operator.h
case Instruction::Call: {
  Type *Ty = V->getType();
  while (ArrayType *ArrTy = dyn_cast<ArrayType>(Ty)) {
    Ty = ArrTy->getElementType();
  }
  return Ty->isFPOrFPVectorTy();
}
```
Vector types are what LLVM uses to represent SIMD types in the IR. While researching this topic, I found an example on Stack Overflow where `-ffast-math` modifies the result of SSE intrinsics.^{3} If we care about raw performance, we might want to support fast-math on SIMD types as well.
Consider `_mm_add_ps`, an SSE intrinsic that adds four 32-bit floats contained in a `__m128`. In Rust, this is internally a call to `simd_add`, which is just `fadd`, meaning it could be modified by a fast-math flag. An `_mm_add_ps_fast` is one thing, but we have multitudes of intrinsics for multitudes of architectures; a `_fast` version for each one of them is certainly doable, but it doesn't smell like good design to me.
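For context, here is roughly how `_mm_add_ps` is reached from stable Rust today (x86_64 only; the wrapper name `add4` is mine):

```rust
/// Adds two groups of four f32s with the SSE intrinsic `_mm_add_ps`.
/// In Rust this bottoms out in `simd_add`, i.e. a plain (non-fast)
/// `fadd <4 x float>` in the LLVM IR.
#[cfg(target_arch = "x86_64")]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0_f32; 4];
    // SSE is part of the x86_64 baseline, so no runtime feature
    // detection is needed; the unsafety is the raw pointer loads/stores.
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
    }
    out
}
```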
## Conclusion
Let me just say that I am by no means against a `FastFloat` type. I just think a proper one would need more assistance from the compiler and should not rely on the `_fast` functions. That probably needs some changes in the Rust HIR and/or MIR, and cannot be done with just the information available to the LLVM IR backend.

Since implementing a fast-math type right now would not fully capture what LLVM is capable of, I'm going to explore the other two options in this series of posts: fast-math via a flag and via an attribute. They look easier to implement. The next part will be about adding a `-Z` flag.

1. And that is actually what Clang produces: add four at a time with `fadd fast <4 x float>`, then horizontally sum the SIMD vector with `@llvm.experimental.vector.reduce.v2.fadd.f32.v4f32`. ↩︎

2. Might not matter much. LLVM is smart enough to derive `phi` and `select` from just `br`s. And while there are a couple of usages of `select` in the LLVM backend code, I only found a total of two places where `phi` was used. ↩︎

3. This is actually `fcmp`, with Clang translating `_mm_cmpord_ps` directly to LLVM IR. Rust calls the `llvm.x86.sse.cmp.ps` intrinsic instead. ↩︎