Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JIT] Asm difference for F# and C# methods #58941

Closed
En3Tho opened this issue Sep 10, 2021 · 8 comments
Closed

[JIT] Asm difference for F# and C# methods #58941

En3Tho opened this issue Sep 10, 2021 · 8 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@En3Tho
Copy link
Contributor

En3Tho commented Sep 10, 2021

I'm not sure whether this might be JIt imporvement or F#.. Basically I will duplicate this in fsharp repo (dotnet/fsharp#12138) too.

Consider these 2 methods:

[<MethodImpl(MethodImplOptions.AggressiveInlining)>]
let fold initial folder (enumerator: #IEnumerator<'i>) =
    let folder = OptimizedClosures.FSharpFunc<_,_,_>.Adapt folder
    let mutable enumerator = enumerator
    let mutable result = initial
    while enumerator.MoveNext() do
        result <- folder.Invoke(result, enumerator.Current)
    result
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static TResult Fold<TResult, TItem, TEnumerator>(TResult result, FSharpFunc<TResult, FSharpFunc<TItem, TResult>> folder, TEnumerator enumerator)
            where TEnumerator : IEnumerator<TItem>
{
    var fSharpFunc = OptimizedClosures.FSharpFunc<TResult, TItem, TResult>.Adapt(folder);
    var enumerator2 = enumerator;
    var result2 = result;
    while (enumerator2.MoveNext())
        result2 = fSharpFunc.Invoke(result2, enumerator2.Current);

    return result2;
}

They look very similar but there is an importnat il emit difference:

C# method is compiled to this basically:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static TResult FoldRoslynVersion<TResult, TItem, TEnumerator>(TResult result, FSharpFunc<TResult, FSharpFunc<TItem, TResult>> folder, TEnumerator enumerator)
            where TEnumerator : IEnumerator<TItem>
{
    var fSharpFunc = OptimizedClosures.FSharpFunc<TResult, TItem, TResult>.Adapt(folder);
    var enumerator2 = enumerator;
    var result2 = result;
    goto movenext;

    logic:
    result2 = fSharpFunc.Invoke(result2, enumerator2.Current);

    movenext:
    if (!enumerator2.MoveNext())
        return result2;

    goto logic;
}

While F# is compiled to this basically:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static TResult FoldFSharpVersion<TResult, TItem, TEnumerator>(TResult result, FSharpFunc<TResult, FSharpFunc<TItem, TResult>> folder, TEnumerator enumerator)
            where TEnumerator : IEnumerator<TItem>
{
    var fSharpFunc = OptimizedClosures.FSharpFunc<TResult, TItem, TResult>.Adapt(folder);
    var enumerator2 = enumerator;
    var result2 = result;

    movenext:
    if (!enumerator2.MoveNext())
        goto exit;

    result2 = fSharpFunc.Invoke(result2, enumerator2.Current);
    goto movenext;

    exit:
    return result2;
}

While difference might be non obvious, C# version with condition at the end of the method results in 10-15% perf imporvement while having the same assembly size.

image

Can JIT compiler regonize these patterns better and ideally emit the same code for both variants?

category:cq
theme:loop-opt
skill-level:expert
cost:large
impact:large

@En3Tho En3Tho added the tenet-performance Performance related issue label Sep 10, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Sep 10, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@EgorBo
Copy link
Member

EgorBo commented Sep 10, 2021

@BruceForstall a loop optimization for you I guess:

static void Test()
{
start:
    if (Cond())
    {
        DoWork();
        goto start;
    }
}

should be transformed into the same as:

static void Caller2()
{
    goto condition;
start:
    DoWork();
condition:
    if (Cond())
        goto start;
}

it might make code a bit bigger but loop body will be more efficient.

PS: this transformation is made in Roslyn for while loops for us, but F# seems doesn't do it.

@EgorBo
Copy link
Member

EgorBo commented Sep 10, 2021

Proof that Roslyn does it:
image

@EgorBo
Copy link
Member

EgorBo commented Sep 10, 2021

image

@jeffschwMSFT jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 11, 2021
@ghost
Copy link

ghost commented Sep 11, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

I'm not sure whether this might be JIt imporvement or F#.. Basically I will duplicate this in fsharp repo (dotnet/fsharp#12138) too.

Consider these 2 methods:

[<MethodImpl(MethodImplOptions.AggressiveInlining)>]
let fold initial folder (enumerator: #IEnumerator<'i>) =
    let folder = OptimizedClosures.FSharpFunc<_,_,_>.Adapt folder
    let mutable enumerator = enumerator
    let mutable result = initial
    while enumerator.MoveNext() do
        result <- folder.Invoke(result, enumerator.Current)
    result
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static TResult Fold<TResult, TItem, TEnumerator>(TResult result, FSharpFunc<TResult, FSharpFunc<TItem, TResult>> folder, TEnumerator enumerator)
            where TEnumerator : IEnumerator<TItem>
{
    var fSharpFunc = OptimizedClosures.FSharpFunc<TResult, TItem, TResult>.Adapt(folder);
    var enumerator2 = enumerator;
    var result2 = result;
    while (enumerator2.MoveNext())
        result2 = fSharpFunc.Invoke(result2, enumerator2.Current);

    return result2;
}

They look very similar but there is an importnat il emit difference:

C# method is compiled to this basically:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static TResult FoldRoslynVersion<TResult, TItem, TEnumerator>(TResult result, FSharpFunc<TResult, FSharpFunc<TItem, TResult>> folder, TEnumerator enumerator)
            where TEnumerator : IEnumerator<TItem>
{
    var fSharpFunc = OptimizedClosures.FSharpFunc<TResult, TItem, TResult>.Adapt(folder);
    var enumerator2 = enumerator;
    var result2 = result;
    goto movenext;

    logic:
    result2 = fSharpFunc.Invoke(result2, enumerator2.Current);

    movenext:
    if (!enumerator2.MoveNext())
        return result2;

    goto logic;
}

While F# is compiled to this basically:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static TResult FoldFSharpVersion<TResult, TItem, TEnumerator>(TResult result, FSharpFunc<TResult, FSharpFunc<TItem, TResult>> folder, TEnumerator enumerator)
            where TEnumerator : IEnumerator<TItem>
{
    var fSharpFunc = OptimizedClosures.FSharpFunc<TResult, TItem, TResult>.Adapt(folder);
    var enumerator2 = enumerator;
    var result2 = result;

    movenext:
    if (!enumerator2.MoveNext())
        goto exit;

    result2 = fSharpFunc.Invoke(result2, enumerator2.Current);
    goto movenext;

    exit:
    return result2;
}

While difference might be non obvious, C# version with condition at the end of the method results in 10-15% perf imporvement while having the same assembly size.

image

Can JIT compiler regonize these patterns better and ideally emit the same code for both variants?

Author: En3Tho
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@JulieLeeMSFT JulieLeeMSFT added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Sep 11, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the Future milestone Sep 11, 2021
@BruceForstall BruceForstall modified the milestones: Future, 7.0.0 Sep 17, 2021
@BruceForstall
Copy link
Member

The F# generated IL doesn't match the patterns the JIT looks for to do loop inversion or loop recognition, so the loop doesn't get added to our loop table for consideration for future optimization. We do mark the blocks with loop weights because we use a different logic for determining that.

Fixing this is part of a more general task to improve RyuJIT loop recognition.

@jakobbotsch
Copy link
Member

Fixed by PRs listed in #93144 (comment)

@En3Tho
Copy link
Contributor Author

En3Tho commented Jan 12, 2024

@jakobbotsch Thank you!

@github-actions github-actions bot locked and limited conversation to collaborators Feb 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

6 participants