Animatable trait for interpolation and blending #4482

Merged
merged 22 commits on Feb 2, 2024
Changes from 1 commit
221 changes: 221 additions & 0 deletions crates/bevy_animation/src/animatable.rs
@@ -0,0 +1,221 @@
use crate::util;
use bevy_asset::{Asset, Assets, Handle, HandleId};
use bevy_core::FloatOrd;
use bevy_ecs::world::World;
use bevy_math::*;
use bevy_reflect::Reflect;
use bevy_transform::prelude::Transform;

/// An individual input for [`Animatable::blend`].
pub struct BlendInput<T> {
Contributor:

I've thought a bit more about this abstraction. I think it's useful to look at what our goals are, especially in terms of how things will be used, and how this is trying to achieve those goals.

As I recall, the base idea of BlendInput<T> is to "preprocess" all the various blend sources into a single common data structure per type (one for f32, one for Vec4, etc.) and then run a tight loop on blending all sources for that data type, which is then easier because you iterate only on BlendInput<T> and not a collection of heterogeneous sources.

Now, this is clearly a step in the right direction, in the sense that we're thinking about the data usage pattern to optimize it and leverage e.g. SIMD. Where I think this abstraction falls short is in two ways:

  1. It iterates over the blend sources in the inner loop, as seen in the signature of Animatable::blend() which takes a set of sources to blend together. How many different sources do we expect will blend that way? Even for an advanced scenario with say 4 base animations blending together (walk, run, jump, and crouch), and let's say 4 more additive ones on top (look left/right, up/down, and whatever else), that makes 8 sources to blend into a final vertex value. Compare that with how many vertices we need to blend for a single 3D character in a modern game, which is in the order of thousands. And those will be iterated over in an outer loop. We've got the loops reversed from a performance point of view.

  2. The data structure is not optimal for packing and memory accesses. First the packing: depending on T, and let's take any VecN as an example, the compiler will most likely put the VecN first, aligned on 4 bytes (or 16 bytes for SIMD), followed by weight (4B size/align), followed by additive (1B). This means we have 3 bytes of padding for every 12B (f32) to 24B (Vec4) of total size, which is a lot. The access pattern, which is to load the value and weight into separate (SIMD) registers to perform some lerp() (the most common case; let's ignore rotations for a minute), is not helped here either, because we can't load more than one source at once (a SIMD register load only accepts a contiguous block of memory). If, on the other hand, we were transforming this "array of structs" into a "struct of arrays", we could load more data sources at once and blend multiple of them via SIMD. Better yet, we can even pack all lerp() sources together (that is, including packing f32 with Vec3 and Vec4) since the SIMD instructions to lerp are the same and operate component-wise. This means a very tight loop over a long array of data without any branching etc., which is the absolute optimum for SIMD.

So the way I'd see a better abstraction for the blending step is:

  • explicitly support lerp / nlerp / slerp with separate optimized codepaths, because they require different SIMD instructions;
  • possibly add a fallback where we invoke something like the current Animatable::interpolate(), but only for "exotic" interpolations, and I'm not even sure we want that (I don't have an example);
  • pack all inputs and do lerp(lhs[], rhs[], weights[]) instead of lerp(lhs,rhs,weight)[].
pub enum Interpolation<T> {
    Linear,
    NormalizedLinear,
    SphericalLinear,
    // See how this immediately becomes messy due to T; maybe we can skip this entirely
    Custom(Box<dyn Fn(&T, &T, f32) -> T>),
}

impl Interpolation<f32> {
    // Blend two arrays of values with given weights
    fn blend(&self, src: &mut [f32], rhs: &[f32], weights: &[f32], additive: bool) {
        match *self {
            Interpolation::<f32>::Linear => {
                // Branch _before_ we start the hot loop
                if additive {
                    // This loop can get partially unrolled, use SIMD, etc.
                    for i in 0..src.len() {
                        src[i] += rhs[i] * weights[i];
                    }
                } else {
                    // Same here
                    for i in 0..src.len() {
                        src[i] = (1.0 - weights[i]) * src[i] + rhs[i] * weights[i];
                    }
                }
            }
            _ => todo!(), // etc. (other interpolation modes elided in this sketch)
        }
    }
}

// Now, we need to know how to pack e.g. Vec3 and f32 and Vec4 together:
pub trait Animatable: [...] {
    type Unit;
    fn prepare(&self) -> &[Self::Unit];
}

impl Animatable for Vec3 {
    type Unit = f32;
    fn prepare(&self) -> &[f32] { self.as_ref() }
}

// etc. for other types...

fn animation_system(anims: &[...]) {
    // Read and prepare values from component (here, Transform)
    let mut src = vec![];
    for tr in &transforms {
        // Assuming here the case where both translation and scale use Interpolation::Linear
        // to keep the example short:
        src.extend_from_slice(tr.translation.prepare());
        src.extend_from_slice(tr.scale.prepare());
    }

    // Blend all anims into src directly
    let mut rhs = vec![];
    for anim in &anims {
        // Grab one animation. Ideally we can even pre-process this in the anim format,
        // and completely remove this block.
        rhs.clear();
        for tr in &anim.transforms {
            rhs.extend_from_slice(tr.translation.prepare());
            rhs.extend_from_slice(tr.scale.prepare());
        }

        // Do the actual blend for every transform at once
        Interpolation::<f32>::Linear.blend(&mut src, &rhs, &anim.weights, false);
    }

    // Store back final result
    let mut offset = 0;
    for tr in &mut transforms {
        tr.translation = Vec3::from_slice(&src[offset..offset + 3]);
        offset += 3;
        tr.scale = Vec3::from_slice(&src[offset..offset + 3]);
        offset += 3;
    }
}

/// The individual item's weight. This may not be bound to the range `[0.0, 1.0]`.
pub weight: f32,
/// The input value to be blended.
pub value: T,
/// Whether or not to additively blend this input into the final result.
pub additive: bool,
Contributor:

Do you have an example where parts of an animation are additive and parts are not? That looks overly complex. I think the additive state should be at the animation level, not at the individual input one.

Member Author:

You're probably right that we should make this decision at the animation level. However, the intent is to do all of the computation for blending in Animatable::blend and then apply the final result to the animated components, so BlendInput should be the sum total of everything required to compute the final blend result.

This is ultimately required for handling full-Reflect based property animation, where the only way to handle it in a performant way is to lift the dynamic dispatch to the track level and rely on monomorphization of generics: https://github.com/HouraiTeahouse/bevy_prototype_animation/blob/main/src/graph/track.rs#L212 and https://github.com/HouraiTeahouse/bevy_prototype_animation/blob/main/src/graph/track.rs#L256. This otherwise might get very hairy to handle.

Tentatively, treating the additive flags as a read-only per-input property like this, instead of as a post-process, may also allow us to parallelize on a per-property or per-Entity level, since the blend is driven by a precomputed list rather than by a singular synchronization point in the animation process.

Contributor:

Fair enough. Let's close for now.

}
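
For concreteness, a call site for this kind of blend might look roughly like the following sketch (the weights and values are invented; `Animatable` and `BlendInput` are the items added in this file), with two inputs blended normally and one additive layer carried by its own flag:

// Sketch only: weights and values are made up for illustration.
let result = f32::blend(
    [
        BlendInput { weight: 0.75, value: 1.0, additive: false },
        BlendInput { weight: 0.25, value: 3.0, additive: false },
        BlendInput { weight: 0.50, value: 0.2, additive: true },
    ]
    .into_iter(),
);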

/// An animatable value type.
pub trait Animatable: Reflect + Sized + Send + Sync + 'static {
/// Interpolates between two values.
///
/// The `time` parameter here may not be clamped to the range `[0.0, 1.0]`.
Contributor:

What does this comment mean? How is an implementor supposed to understand interpolating with t=1.5 (which sounds like an extrapolation rather)? Are we sure this is even possible in the general case? Because this seems to strongly imply lerp here / a linear transform.
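
To ground the question: for the f32 implementation added further down in this file, an unclamped `time` simply extrapolates along the same line (the values below are illustrative):

// interpolate(a, b, t) = a * (1.0 - t) + b * t, as implemented by the macro below
assert_eq!(f32::interpolate(&1.0, &2.0, 0.5), 1.5); // inside [0, 1]: interpolation
assert_eq!(f32::interpolate(&1.0, &2.0, 1.5), 2.5); // t > 1.0: extrapolation past b
assert_eq!(f32::interpolate(&1.0, &2.0, -0.5), 0.5); // t < 0.0: extrapolation before a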

fn interpolate(a: &Self, b: &Self, time: f32) -> Self;
Contributor:

This is not a generic interpolate, this is lerp (linear interpolation).

Member:

Tackled in james7132#7

Member Author:

I'm not 100% convinced we should change the name here. Yes, most of the implementations right now are linear interpolations, but it's not so clear when you have distinct quantized values (e.g. bool, any of the integers, asset handles, etc.). Preferably, we leave it up to the implementor to decide what it's supposed to do. I'm open to changing the documentation to reflect this, but directly putting "linear" in the name overly constrains what implementors of the trait can do.

Member:

Hmm, that's a fair point. Let's amend the docs then.

Contributor:

Can we then explicitly document that the interpolation can be anything, including nonlinear ones?
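
To make the non-linear case concrete: a discrete type has nothing to interpolate linearly; it can only snap between values, which is what the bool implementation further down in this file does. A sketch of the same idea:

// A "step" interpolation for a discrete type; there is no linearity involved.
fn step_interpolate(a: bool, b: bool, t: f32) -> bool {
    if t < 1.0 { a } else { b }
}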


/// Blends one or more values together.
///
/// Implementors should return a default value when no inputs are provided here.
fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self;

/// Post-processes the value using resources in the [`World`].
/// Most animatable types do not need to implement this.
///
/// # Safety
/// All concrete implementations of this function may only read from
/// the resources stored in the [`World`]. Mutating any state,
/// or reading any component or [`NonSend`] resource data, may cause
/// undefined behavior or data races.
///
/// [`NonSend`]: bevy_ecs::system::NonSend
unsafe fn post_process(&mut self, _world: &World) {}
}

macro_rules! impl_float_animatable {
($ty: ty, $base: ty) => {
impl Animatable for $ty {
#[inline(always)]
alice-i-cecile marked this conversation as resolved.
james7132 marked this conversation as resolved.
fn interpolate(a: &Self, b: &Self, t: f32) -> Self {
let t = <$base>::from(t);
(*a) * (1.0 - t) + (*b) * t
Contributor:

Use built-in lerp?

Member:

This doesn't work: the $base is not always f32.

Member Author:

We could lift this out as a parameter to the macro (see the sketch after the macro below) and keep this current implementation as a fallback when no other option exists. I'm a bit concerned about whether the results we get are deterministic or not, depending on glam's implementations for SIMD lerping, but their stance on value mismatches seems to suggest that anything like that should be a bug.

Contributor:

If we're concerned about SIMD determinism then we're having an entirely different, and a lot broader and more complex, discussion. Does it make sense to be concerned by that here, for example, if next in the pipeline the rendering code is non-deterministic? What's the use case for animation to be deterministic (and which determinism? per run on the same machine? per run across identical machines? across different architectures?)? I think it's more reasonable to focus on animation performance and try to use SIMD wherever we can. Unless we want to adopt a deterministic stance for all of Bevy, you can't be deterministic if not all crates are.

Practically, I'm fine closing this if there's a plan, or at least a possible fallback change, to recover SIMD usage if performance becomes an issue. I just wanted to make sure we're not locking ourselves into a no-SIMD corner; that would be unfortunate for the performance of animation, I think.

}

#[inline(always)]
james7132 marked this conversation as resolved.
fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self {
let mut value = Default::default();
for input in inputs {
if input.additive {
value += <$base>::from(input.weight) * input.value;
} else {
value = Self::interpolate(&value, &input.value, input.weight);
}
}
value
}
}
};
}
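
A parameterized variant of the macro, as floated in the thread above, might look like the following sketch (the macro name and the shape of the `$lerp` argument are hypothetical, not what this PR settled on):

macro_rules! impl_float_animatable_with {
    ($ty: ty, $base: ty, $lerp: expr) => {
        impl Animatable for $ty {
            #[inline(always)]
            fn interpolate(a: &Self, b: &Self, t: f32) -> Self {
                // Delegate to the passed-in lerp (e.g. glam's built-in `lerp`).
                ($lerp)(*a, *b, t)
            }

            #[inline(always)]
            fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self {
                let mut value = Default::default();
                for input in inputs {
                    if input.additive {
                        value += <$base>::from(input.weight) * input.value;
                    } else {
                        value = Self::interpolate(&value, &input.value, input.weight);
                    }
                }
                value
            }
        }
    };
}

// e.g.: impl_float_animatable_with!(Vec4, f32, |a: Vec4, b: Vec4, t: f32| a.lerp(b, t));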

impl_float_animatable!(f32, f32);
impl_float_animatable!(Vec2, f32);
impl_float_animatable!(Vec3A, f32);
impl_float_animatable!(Vec4, f32);

impl_float_animatable!(f64, f64);
impl_float_animatable!(DVec2, f64);
impl_float_animatable!(DVec3, f64);
impl_float_animatable!(DVec4, f64);

// Vec3 is special cased to use Vec3A internally for blending
Contributor:

Why?! Why not use impl Animatable for Vec3A when you need it, and leave the Vec3 one as using only Vec3?

Member Author:

This is primarily a speedup: use SIMD where we can. Though we should probably test whether it's actually a speedup or not, since these will be using unaligned reads.

Contributor:

I think if we care about SIMD (and we should), we should make sure to convert animation data ASAP to Vec3A, much before we invoke this. Because converting back and forth at several points will negate the performance boost. Probably should be profiled as you said.
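
One way to act on that "convert ASAP" suggestion would be to do the conversion once at clip load time rather than inside the blend loop (a hypothetical helper, not part of this PR):

use bevy_math::{Vec3, Vec3A};

// Convert a track's keyframe translations to Vec3A up front, so the per-frame
// blend loop only ever touches the SIMD-friendly type.
fn prepare_translation_track(keyframes: &[Vec3]) -> Vec<Vec3A> {
    keyframes.iter().copied().map(Vec3A::from).collect()
}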

impl Animatable for Vec3 {
#[inline(always)]
fn interpolate(a: &Self, b: &Self, t: f32) -> Self {
(*a) * (1.0 - t) + (*b) * t
}

#[inline(always)]
fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self {
let mut value = Vec3A::ZERO;
for input in inputs {
if input.additive {
value += input.weight * Vec3A::from(input.value);
} else {
value = Vec3A::interpolate(&value, &Vec3A::from(input.value), input.weight);
}
}
Self::from(value)
}
}

impl Animatable for bool {
#[inline]
fn interpolate(a: &Self, b: &Self, t: f32) -> Self {
util::step_unclamped(*a, *b, t)
}

#[inline]
fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self {
inputs
.max_by(|a, b| FloatOrd(a.weight).cmp(&FloatOrd(b.weight)))
.map(|input| input.value)
.unwrap_or(false)
}
}

impl Animatable for HandleId {
#[inline]
fn interpolate(a: &Self, b: &Self, t: f32) -> Self {
util::step_unclamped(*a, *b, t)
}

#[inline]
fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self {
inputs
.max_by(|a, b| FloatOrd(a.weight).cmp(&FloatOrd(b.weight)))
.map(|input| input.value)
.expect("Attempted to blend HandleId with zero inputs.")
james7132 marked this conversation as resolved.
}
}

impl<T: Asset> Animatable for Handle<T> {
#[inline]
fn interpolate(a: &Self, b: &Self, t: f32) -> Self {
util::step_unclamped(a.clone_weak(), b.clone_weak(), t)
}

#[inline]
fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self {
inputs
.max_by(|a, b| FloatOrd(a.weight).cmp(&FloatOrd(b.weight)))
.map(|input| input.value)
.expect("Attempted to blend Handle with zero inputs.")
}

// SAFE: This implementation only reads resources from the provided
// World.
unsafe fn post_process(&mut self, world: &World) {
// Upgrade weak handles into strong ones.
if self.is_strong() {
return;
}
*self = world
.get_resource::<Assets<T>>()
.expect(
Contributor:

Same: this should not panic during the actual animation run; it should have been caught much earlier and/or be silently skipped.

Member:

Removed in james7132#6.

"Attempted to animate a Handle<T> without the corresponding Assets<T> resource.",
)
.get_handle(self.id);
}
}

impl Animatable for Transform {
fn interpolate(a: &Self, b: &Self, t: f32) -> Self {
Self {
translation: Vec3::interpolate(&a.translation, &b.translation, t),
rotation: Quat::interpolate(&a.rotation, &b.rotation, t),
Contributor (@atlv24, Jan 29, 2024):

nit: can be more explicit about the kind of interpolation
this definitely needs to explicitly be slerp

Suggested change
rotation: Quat::interpolate(&a.rotation, &b.rotation, t),
rotation: a.rotation.slerp(b.rotation, t),

Contributor:

Why?

Contributor:

Because interpolate should be nlerp by default and transforms can be very different rotations and thus need slerp.

Contributor:

Ah you mean that for rotations the interpolation should be fixed to always be slerp because others will not provide decent results in general, right? I agree that we should probably limit the number of different interpolations we support, explicitly. See my other long comment about the design.

scale: Vec3::interpolate(&a.scale, &b.scale, t),
}
}

fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self {
let mut translation = Vec3A::ZERO;
let mut scale = Vec3A::ZERO;
let mut rotation = Quat::IDENTITY;

for input in inputs {
if input.additive {
translation += input.weight * Vec3A::from(input.value.translation);
scale += input.weight * Vec3A::from(input.value.scale);
rotation = (input.value.rotation * input.weight) * rotation;
Contributor:

I don't think this is correct. Quat * f32 scales the components individually, which is very weird:

https://github.com/bitshifter/glam-rs/blob/5fd61e965f777dc0e3c3d353e9007e5a7c957304/src/f32/scalar/quat.rs#L714

I'm pretty sure the result after applying the second rotation has nothing to do with an interpolation. The quaternion sign is also ignored. This should use Quat::lerp() at the very least, and probably even Quat::slerp() to produce some smooth-looking rotation animation.

See bevy_tweening for example:

https://github.com/djeedai/bevy_tweening/blob/29e8c3389f109f1beec15639137527020ddd7e90/src/lens.rs#L151

Member:

Tackled in james7132#8
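
For reference, one plausible shape for a correct additive rotation blend (an assumption about the intent, not necessarily the fix that landed in james7132#8) is to scale the additive rotation by slerping from identity and then compose it:

use bevy_math::Quat;

// Scale the additive rotation by `weight`, then compose it with the accumulated one.
fn blend_additive_rotation(accumulated: Quat, additive: Quat, weight: f32) -> Quat {
    Quat::IDENTITY.slerp(additive, weight) * accumulated
}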

} else {
translation = Vec3A::interpolate(
Contributor:

All of this causes unnecessary conversions, which are likely to be an overhead greater than what you gain by using Vec3A.

Member Author:

Vec3A vs Vec3 is a difference of just padding and alignment. AFAICT, reading a Vec3 as if it were a Vec3A is primarily an unaligned read and a mask, which should be pretty cheap compared to doing the same interpolation operations on each component. As @alice-i-cecile commented, we should probably benchmark this.

Contributor:

This requires a load from a CPU register to a SIMD one, then a SIMD operation, then a store back. This extra load/store is almost surely still a win if you do multiple operations while the values are in the SIMD registers, but for a single one it's unclear whether it's worth it (and it probably depends on the platform). In general, when working with SIMD you should maximize grouped operations while values are in registers, and try to load only at the start and store at the end. So overall we should try to entirely avoid Vec3 inside the animation pipeline and use Vec3A everywhere, so we keep the values loaded in registers. We should even consider interpolating first all translations, then all rotations, etc., so that we can batch the calculations.
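
A sketch of that "batch per channel" idea (a hypothetical helper, not part of this PR): keep every translation in Vec3A and lerp them all in one tight pass.

use bevy_math::Vec3A;

// Lerp each destination translation towards its source with a per-element weight.
fn lerp_translations(dst: &mut [Vec3A], src: &[Vec3A], weights: &[f32]) {
    for ((d, s), w) in dst.iter_mut().zip(src).zip(weights) {
        *d = d.lerp(*s, *w);
    }
}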

&translation,
&Vec3A::from(input.value.translation),
input.weight,
);
scale = Vec3A::interpolate(&scale, &Vec3A::from(input.value.scale), input.weight);
rotation = Quat::interpolate(&rotation, &input.value.rotation, input.weight);
alice-i-cecile marked this conversation as resolved.
Contributor (@atlv24, Jan 29, 2024):

nit: can be more explicit about the kind of interpolation

rotation = rotation.slerp(input.value.rotation, input.weight);

Contributor:

nlerp is wanted here for commutativity on the blend stack

Contributor:

What does that mean? What's supposed to be commutative? What's the blend stack? Blending operations are not commutative in general, you need to apply them in order. And even if they are, what do we gain by reordering?

Contributor:

The blend stack is the sequence of animations being blended. Commutativity makes reasoning about the operations simpler in the general case, as you don't need logic to ensure they are always ordered the same way for the result to be consistent. Consider two animations A and B, which can each start and stop at any moment. If you start A, then B, you will have A applied first and then B applied on top. If you start B, then A, they will be on the stack the other way around and will look different. This gets more complicated if you can cancel, restart, or pause animations. You start needing an animation priority system to determine which gets applied first, and that can be confusing. Besides all that, slerp is slower for not much gain quality-wise, as far as I understand. Slerp is not needed here; refer to the document linked in the comments above interpolate.

Contributor:

if you start A, then B, you will have A applied first then B applied on top.

Only if B is additive, because otherwise you should stop A first; you can't apply two non-additive animations to the same target. Also, pause etc. is about animation transitions and creating the stack, not about blending (how to consume the stack). And if B is additive, then I argue that the operations are not commutative. It's not a question of quality; it's a question of producing the correct result.

}
}

Self {
translation: Vec3::from(translation),
rotation,
scale: Vec3::from(scale),
}
}
}

impl Animatable for Quat {
/// Performs an nlerp, because it's cheaper and easier to combine with other animations,
james7132 marked this conversation as resolved.
Contributor:

These docs are dissonant. The code is not doing an nlerp.

Contributor:

This should be doing an nlerp

Contributor:

Your other comment says it should be doing a slerp() no? Because nlerp() for very different rotations might be way off.

/// reference: <http://number-none.com/product/Understanding%20Slerp,%20Then%20Not%20Using%20It/>
#[inline]
fn interpolate(a: &Self, b: &Self, t: f32) -> Self {
// Make sure we always take the short path; see: https://github.com/mgeier/quaternion-nursery
james7132 marked this conversation as resolved.
let b = if a.dot(*b) < 0.0 { -*b } else { *b };

let a: Vec4 = (*a).into();
let b: Vec4 = b.into();
let rot = Vec4::interpolate(&a, &b, t);
let inv_mag = util::approx_rsqrt(rot.dot(rot));
Quat::from_vec4(rot * inv_mag)
james7132 marked this conversation as resolved.
}

#[inline]
fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self {
let mut value = Self::IDENTITY;
for input in inputs {
value = Self::interpolate(&value, &input.value, input.weight);
}
value
}
}
6 changes: 5 additions & 1 deletion crates/bevy_animation/src/lib.rs
@@ -2,6 +2,9 @@

#![warn(missing_docs)]

mod animatable;
mod util;

use std::ops::Deref;

use bevy_app::{App, CoreStage, Plugin};
@@ -25,7 +28,8 @@ use bevy_utils::{tracing::warn, HashMap};
pub mod prelude {
#[doc(hidden)]
pub use crate::{
AnimationClip, AnimationPlayer, AnimationPlugin, EntityPath, Keyframes, VariableCurve,
animatable::*, AnimationClip, AnimationPlayer, AnimationPlugin, EntityPath, Keyframes,
VariableCurve,
};
}

28 changes: 28 additions & 0 deletions crates/bevy_animation/src/util.rs
@@ -0,0 +1,28 @@
/// Fast approximated reciprocal square root.
#[inline]
pub(crate) fn approx_rsqrt(x: f32) -> f32 {
james7132 marked this conversation as resolved.
// Fall back to the Quake 3 fast inverse sqrt; it has a higher error
// but still good enough and faster than `.sqrt().recip()`,
// implementation borrowed from Piston under the MIT License:
// [https://github.com/PistonDevelopers/skeletal_animation]
let x2: f32 = x * 0.5;
let mut y: f32 = x;

let mut i: i32 = y.to_bits() as i32;
i = 0x5f3759df - (i >> 1);
y = f32::from_bits(i as u32);

y = y * (1.5 - (x2 * y * y));
james7132 marked this conversation as resolved.
y
}

/// Steps between two different discrete values of any clonable type.
alice-i-cecile marked this conversation as resolved.
/// Returns a copy of `a` if `t < 1.0`, otherwise returns a copy of `b`.
james7132 marked this conversation as resolved.
alice-i-cecile marked this conversation as resolved.
#[inline]
pub(crate) fn step_unclamped<T>(a: T, b: T, t: f32) -> T {
    if t < 1.0 {
        a
    } else {
        b
    }
}