Speed up identifier detection #89

jserv · 2024-11-16T07:25:10Z

Both 'is_ident1' and 'is_ident2' are now macros instead of function calls. They check for ASCII characters first and fall back to a table lookup for non-ASCII characters.
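A minimal sketch of what such macros could look like, not the code from this patch. The names IS_IDENT1_ASCII, IS_IDENT2_ASCII, and ident_table_lookup are hypothetical, and the lookup function is a tiny stand-in (Latin-1 letters only) for the real Unicode range tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the Unicode range-table lookup. The real
 * tables are much larger; only Latin-1 letters are accepted here. */
static bool ident_table_lookup(uint32_t c) {
    return (c >= 0xC0 && c <= 0xFF) && c != 0xD7 && c != 0xF7;
}

/* ASCII fast paths: plain comparisons, no function call. */
#define IS_IDENT1_ASCII(c) \
    (((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z') || (c) == '_')
#define IS_IDENT2_ASCII(c) \
    (IS_IDENT1_ASCII(c) || ((c) >= '0' && (c) <= '9'))

/* Full checks: try ASCII first, fall back to the table otherwise.
 * The argument is evaluated several times, so it must have no side effects. */
#define IS_IDENT1(c) ((c) < 0x80 ? IS_IDENT1_ASCII(c) : ident_table_lookup(c))
#define IS_IDENT2(c) ((c) < 0x80 ? IS_IDENT2_ASCII(c) : ident_table_lookup(c))
```

Because the macros expand their argument more than once, a call such as IS_IDENT2(decode(p)) would run the decoder repeatedly; passing an already decoded code point avoids that.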

The original code in function 'read_punct' relies on heavy string-specific function calls, resulting in slower execution. Using straightforward control flow instead makes this function faster.
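As an illustration of the control-flow approach, here is a sketch that dispatches on the first character instead of comparing against a list of strings. The name read_punct_sketch is hypothetical, only a subset of C punctuators is handled, and the input is assumed to be NUL-terminated.

```c
#include <stddef.h>

/* Returns the length of the punctuator at p, or 0 if there is none.
 * Only a subset of C punctuators is handled in this sketch. */
static size_t read_punct_sketch(const char *p) {
    switch (p[0]) {
    case '<':
    case '>':
        if (p[1] == p[0])
            return p[2] == '=' ? 3 : 2;   /* <<= >>= << >> */
        return p[1] == '=' ? 2 : 1;       /* <= >= < > */
    case '-':
        if (p[1] == '>' || p[1] == '-' || p[1] == '=')
            return 2;                     /* -> -- -= */
        return 1;
    case '+': case '&': case '|':
        if (p[1] == p[0] || p[1] == '=')
            return 2;                     /* ++ += && &= || |= */
        return 1;
    case '=': case '!': case '*': case '/': case '%': case '^':
        return p[1] == '=' ? 2 : 1;       /* == != *= /= %= ^= */
    case '.':
        return (p[1] == '.' && p[2] == '.') ? 3 : 1;   /* ... or . */
    case '#':
        return p[1] == '#' ? 2 : 1;       /* ## or # */
    case '(': case ')': case '[': case ']': case '{': case '}':
    case ',': case ';': case ':': case '?': case '~':
        return 1;
    default:
        return 0;
    }
}
```

Each branch reads at most two characters past the first and only after confirming the previous one is not NUL, so no string-library call or length computation is needed.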

fuhsnn · 2024-11-16T15:25:13Z

Thank you for the patches. Honestly, performance hasn't been a priority; I consider the lexer less significant in the total pipeline than cache misses from leaking and the overhead of the external assembler. I do have some ideas as well, so a branch for performance experiments should make sense.

In terms of performance I want to ensure a common basis to work on: which projects to benchmark and what kind of environment, is it self-hosted, another simple C compiler or full-on gcc/clang optimizations with custom allocator etc. Do you have a preference?

fuhsnn · 2024-11-16T15:33:33Z

On the pull request: I think the short-circuiting of ASCII characters can be done earlier, skipping decode_utf8() altogether.
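One way to read this suggestion, as a sketch only: test the raw byte first and call the decoder only when the high bit is set. The helpers below (decode_utf8_sketch, is_ident_non_ascii, ident_len) are hypothetical stand-ins, the decoder handles only 2- and 3-byte sequences without validation, and the call counter exists purely to make the effect observable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static int decode_calls;  /* counts decoder invocations for demonstration */

/* Simplified stand-in for decode_utf8(): 2- and 3-byte sequences only,
 * no validation. */
static uint32_t decode_utf8_sketch(const char **next, const char *p) {
    const unsigned char *s = (const unsigned char *)p;
    decode_calls++;
    if (s[0] >= 0xE0 && s[1] && s[2]) {
        *next = p + 3;
        return ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
    }
    if (s[0] >= 0xC0 && s[1]) {
        *next = p + 2;
        return ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
    }
    *next = p + 1;
    return s[0];
}

/* Hypothetical non-ASCII check; the real one would consult range tables. */
static bool is_ident_non_ascii(uint32_t c) { return c >= 0xC0; }

static bool is_ident2_ascii(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

/* Returns the length in bytes of the identifier at start, or 0. */
static size_t ident_len(const char *start) {
    const char *p = start;
    for (;;) {
        unsigned char c = (unsigned char)*p;
        if (c < 0x80) {
            /* ASCII: no decoding. A digit may not start an identifier,
             * which "p == start" detects without a separate flag. */
            if (!is_ident2_ascii(c) || (p == start && c >= '0' && c <= '9'))
                break;
            p++;
            continue;
        }
        const char *next;
        if (!is_ident_non_ascii(decode_utf8_sketch(&next, p)))
            break;
        p = next;
    }
    return (size_t)(p - start);
}
```

For source files that are mostly ASCII, the decoder is then never entered on the common path.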

jserv · 2024-11-16T15:47:06Z

> In terms of performance I want to ensure a common basis to work on: which projects to benchmark and what kind of environment, is it self-hosted, another simple C compiler or full-on gcc/clang optimizations with custom allocator etc.

Using uftrace, I identified the performance bottlenecks in the compilation process.

Here's what I found:
First, apply the following changes:

--- a/GNUmakefile
+++ b/GNUmakefile
@@ -1,4 +1,4 @@
-CFLAGS=-std=c99 -g -fno-common -Wall -pedantic -Wno-switch
+CFLAGS=-std=c99 -g -pg -fno-common -Wall -pedantic -Wno-switch
 
 SRCS=$(wildcard *.c)
 OBJS=$(SRCS:.c=.o)

After running uftrace ./slimcc -c -o stage2/parse.o parse.c, the trace showed the following call stack:

[6] is_ident2
[5] read_ident
[4] tokenize
[3] tokenize_file
[2] must_tokenize_file
[1] cc1
[0] main

Based on this analysis, I tweaked the is_ident1 and is_ident2 functions.

jserv · 2024-11-16T16:09:58Z

By the way, I am working with my students to enhance the shecc project, a self-compiling C compiler with some optimizations. Since shecc only implements a subset of C, we are planning to modify slimcc to output compatible C code that shecc can process, rather than having slimcc generate x86-64 assembly directly. This way, slimcc would serve as a full C language frontend for shecc, while allowing shecc to focus on optimization and backend support. To achieve this, we plan to contribute improvements to slimcc's parsing, tokenization, and preprocessing components.

fuhsnn · 2024-11-16T16:59:46Z

I'm thrilled to know that the project has some real-world value and am happy to target different backends as long as the IR is stable, but I imagine memory usage will be a major pain point on the embedded targets that shecc currently self-hosts on. I thought about re-architecting, but then the code wouldn't be as accessible and one might as well work on other, more optimized C frontends like tcc, cproc, tyfkda/xcc, libfirm/cparser etc.

If C++ is allowed in your classroom, I do have an unfinished C++ port sitting around that uses RAII for nearly identical code with much better memory consumption. A minimal C++ front-end bootstrappable with C99 is also something I've always wanted to work on.

fuhsnn · 2024-11-16T22:14:26Z

The C++ version is uploaded in the partial_c++_port branch. It's less C++ than I remembered and can still be built as C with just a few macros and typedefs.

If that particular style is okay with you, I'd prefer basing performance optimizations on that version, since it feels a bit like bikeshedding to optimize parsing while compiling larger projects hits OOM on an embedded device.

jserv · 2024-11-22T08:11:02Z

I defer this pull request to @ChAoSUnItY, who is assigned to proceed with the above research prototype under my supervision.

fuhsnn · 2024-11-23T01:52:51Z

I believe identifier detection is of decent speed after efcbb85, feel free to open new issues regarding the C backend.

Btw thanks for mentioning uftrace, logging exact call counts is incredible. Can't believe it's not more well known.

ChAoSUnItY · 2024-11-26T17:42:34Z

I've adopted the identifier reading logic from efcbb85, further refined it with the macros introduced earlier in this PR, and added a safeguard to the function. This also improves readability.

jserv · 2024-11-28T00:49:14Z

Let's rebase onto the latest main branch and squash the commits.

Both 'is_ident1' and 'is_ident2' are now macros instead of function calls. They check for ASCII characters first and fall back to a table lookup for non-ASCII characters.

Also discard an unnecessary variable and add a safeguard:
- Variable is_first can be replaced with the boolean expression "p == start"
- Add a safeguard to the ASCII character check to ensure that the starting identifier character is not numeric

Additionally, replace the first ASCII character check with the macro is_ident2_ascii to keep readability. The later is_ident2 function call is replaced with is_ident2_non_ascii because the expanded macro would result in multiple decode_utf8 function calls, and checking again whether the character is ASCII is redundant.

Co-authored-by: Jim Huang <[email protected]>

Speed up recognize punctuation

5d2f341

The original code in function 'read_punct' relies on heavy string specific function calls, resulting in slower execution. Instead, this function can be faster using straightforward control flow.

fuhsnn added 9 commits November 18, 2024 16:16

Rework path handling of #include <>

c7ae74f

Disable universal char names by default

aa15496

See fuhsnn#92

Replace most <ctypes.h> usage with macro

bfe1420

Use Inrange() macro more

4932df2

Micro-optimize tokenizer space skipping

c97957f

Merge pull request fuhsnn#90 from jserv/speedup-read-punct

845a058

Speed up recognize punctuation

Tweak previous commit

004c4c9

Remove unnessasary strndup()'s

e5cca77

Micro-optimize tokenizing ASCII identifiers

efcbb85

fuhsnn and others added 7 commits November 22, 2024 16:51

Reduce strlen/strncmp calls

f65db8e

Rework integer literal suffix algorithm

97fbcea

Rework newline canonicalization

3b70322

Fixup f65db8e

4a79cd9

Avoid expensive division in HashMap

4d34438

Previously, HashMap relied on division operations, which are expensive. This update replaces divisions with logical AND operations using an additional mask, improving performance.
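The idea can be sketched as follows. This is an illustration rather than the committed code, and it assumes the table capacity is always a power of two, which is the condition under which `hash & (capacity - 1)` equals `hash % capacity`. The function names are hypothetical, and FNV-1a is used only as an example hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a, used here only as an example hash function. */
static uint64_t fnv1a(const char *s, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Bucket selection with a division (modulo). */
static size_t slot_by_division(uint64_t hash, size_t capacity) {
    return (size_t)(hash % capacity);
}

/* Bucket selection with a mask. Valid only for power-of-two capacities. */
static size_t slot_by_mask(uint64_t hash, size_t capacity) {
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    return (size_t)(hash & (capacity - 1));
}
```

In a real table the mask (capacity minus one) could be stored next to the capacity and updated on resize, so the probe loop needs only a single AND per step.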

Make Token struct packed

240f854

Merge pull request fuhsnn#93 from jserv/tweak-hashmap

d67ae4b

Avoid expensive division in HashMap

jserv and others added 2 commits November 23, 2024 14:26

Expand macro only when preprocessor token is identifier

d80d590

In 'preprocess2' function, 'expand_macro' was called multiple times, some of which were unnecessary. This change ensures that 'expand_macro' is called only when the preprocessor token is an identifier.
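A rough sketch of the guard, using hypothetical types and names (Token, TK_IDENT, expand_macro_sketch, preprocess_sketch) rather than the project's actual definitions. The stand-in expansion function treats only "FOO" as a macro and counts how often it is invoked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { TK_IDENT, TK_PUNCT, TK_NUM, TK_EOF } TokenKind;

typedef struct Token {
    TokenKind kind;
    const char *text;
    struct Token *next;
} Token;

static int lookup_calls;  /* counts expansion attempts for demonstration */

/* Hypothetical stand-in for expand_macro(): only "FOO" is a macro. */
static bool expand_macro_sketch(const Token *tok) {
    lookup_calls++;
    return strcmp(tok->text, "FOO") == 0;
}

/* Walks the token list and returns how many tokens were expanded.
 * The kind check comes first, so punctuators and numbers never reach
 * the macro lookup. */
static int preprocess_sketch(const Token *tok) {
    int expanded = 0;
    for (; tok && tok->kind != TK_EOF; tok = tok->next) {
        if (tok->kind == TK_IDENT && expand_macro_sketch(tok))
            expanded++;
    }
    return expanded;
}
```

Since only identifiers can name macros, the cheap kind comparison filters out most tokens before any lookup happens.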

Merge pull request fuhsnn#95 from jserv/refine-expand_macro

6966567

Expand macro only when preprocessor token is identifier

ChAoSUnItY mentioned this pull request Nov 23, 2024

Support of C Backend #102

Open

fuhsnn added 2 commits November 24, 2024 10:48

Reimplement #include_next

9ad13db

Fixes fuhsnn#98

Fix test driver

ed5f470

fuhsnn added 3 commits November 26, 2024 21:58

Tweak CI

fab69f6

Recognize .h files with -E

486f55a

Tweak CI

76d26f1

Fix string buffer over-read

7b08c7b

fuhsnn force-pushed the main branch 4 times, most recently from 892c6b6 to 7c97de8 Compare November 27, 2024 09:19

More CI tests

ed6cfec

fuhsnn force-pushed the main branch from 7c97de8 to ed6cfec Compare November 27, 2024 09:35

Fix wrongly omitted unsigned bitfield cast

89b25b4

Rework enum underlying type determination

9c18a36

Resolves fuhsnn#97

ChAoSUnItY force-pushed the speedup-is-ident branch from d4f5463 to 644be33 Compare November 28, 2024 05:48

ChAoSUnItY force-pushed the speedup-is-ident branch from 644be33 to 4b34032 Compare November 28, 2024 06:00

fuhsnn force-pushed the main branch 13 times, most recently from d4283bc to c10050f Compare November 29, 2024 08:02
