Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minify state names #3728

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

benedikt-bartscher
Copy link
Contributor

@benedikt-bartscher benedikt-bartscher commented Jul 31, 2024

Alternative to #3701

@benedikt-bartscher
Copy link
Contributor Author

benedikt-bartscher commented Aug 4, 2024

Actually, it's not a problem with the env vars not getting passed. Tests are failing due to _var_set_state which is called during init_subclass hooks and writes the state names into the vars/vardata.

@benedikt-bartscher benedikt-bartscher marked this pull request as ready for review August 4, 2024 21:07
@benedikt-bartscher
Copy link
Contributor Author

benedikt-bartscher commented Aug 5, 2024

Actually, it's not a problem with the env vars not getting passed. Tests are failing due to _var_set_state which is called during init_subclass hooks and writes the state names into the vars/vardata.

@masenf do you have any idea how to solve this?
I have added a skip marker to those tests. They are working fine if you set REFLEX_MINIFY_STATES or prod env in your shell.

Copy link
Collaborator

@masenf masenf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as for the _var_set_state, we probably could just save a reference to the state class and defer formatting of the state name until it's actually needed (_var_full_name_needs_state_prefix)

@benedikt-bartscher
Copy link
Contributor Author

as for the _var_set_state, we probably could just save a reference to the state class and defer formatting of the state name until it's actually needed (_var_full_name_needs_state_prefix)

Thanks, i was thinking about that. Do you want me to migrate to state refs in this PR or in a followup?

@masenf
Copy link
Collaborator

masenf commented Aug 5, 2024

i think you can do the refs in this PR, as long as it's not too invasive. We have a big Var refactor coming in the next week or so, that might conflict, but we can fix that up

@benedikt-bartscher benedikt-bartscher marked this pull request as draft September 4, 2024 19:38
@masenf
Copy link
Collaborator

masenf commented Oct 8, 2024

Merged origin/main and made a few fixes on top to get the tests working for me locally.

@benedikt-bartscher benedikt-bartscher marked this pull request as ready for review October 10, 2024 21:12
@benedikt-bartscher
Copy link
Contributor Author

Merged origin/main and made a few fixes on top to get the tests working for me locally.

Thank you very much @masenf!

Regarding your commit msg "test_minified_states: remove skip -- things seem to be working as-is" - that might be due to the var refactor from @adhami3310 especially ComputedVar.__get__.

I think this should be ready for review ✔️

@benedikt-bartscher
Copy link
Contributor Author

Do we still want to get this in? I will rebase it if there is intention to review it

@masenf
Copy link
Collaborator

masenf commented Oct 24, 2024

yes, we want to get it 🤞 thanks @benedikt-bartscher

@Lendemor
Copy link
Collaborator

Marking this as Draft until the conflict with main are solved.
(Yes we want that PR 👍 )

@Lendemor Lendemor marked this pull request as draft October 25, 2024 18:16
@benedikt-bartscher benedikt-bartscher marked this pull request as ready for review October 29, 2024 00:07
@benedikt-bartscher
Copy link
Contributor Author

Hi @Lendemor and @masenf good to hear that you want this 🚀
I resolved all the conflicts, we just need to coordinate the env vars with #4248 otherwise this PR is ready for review

@Lendemor
Copy link
Collaborator

Thanks for your work on this.
I believe we should work on getting #4248 merged first, then adapt this one to work with it.

@benedikt-bartscher
Copy link
Contributor Author

Thanks for your work on this. I believe we should work on getting #4248 merged first, then adapt this one to work with it.

You are welcome. Sounds good, I will wait for a review of #4248 then

@benedikt-bartscher benedikt-bartscher force-pushed the minify-state-names-v2 branch 2 times, most recently from 5e7c38c to 7652fa1 Compare November 1, 2024 21:55
@benedikt-bartscher
Copy link
Contributor Author

@Lendemor I based this onto #4248 and migrated the minify states env to the new env system to speed up the process

@benedikt-bartscher
Copy link
Contributor Author

@masenf thanks for testing it again
It seems like the prod mode env is not set early enough when using the cli (reflex run --env prod)

Traceback (most recent call last):
  File "/.cache/pypoetry/virtualenvs/minified-state-names-IlpRchCp-py3.13/bin/reflex", line 5, in <module>
    from reflex.reflex import cli
  File "/reflex/reflex/reflex.py", line 18, in <module>
    from reflex.state import reset_disk_state_manager
  File "/reflex/reflex/state.py", line 2228, in <module>
    class State(BaseState):
    ...<3 lines>...
        is_hydrated: bool = False
  File "/.cache/pypoetry/virtualenvs/minified-state-names-IlpRchCp-py3.13/lib/python3.13/site-packages/pydantic/v1/main.py", line 282, in __new__
    cls = super().__new__(mcs, name, bases, new_namespace, **kwargs)
  File "<frozen abc>", line 106, in __new__
  File "/reflex/reflex/state.py", line 576, in __init_subclass__
    f.name: get_var_for_field(cls, f)
            ~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/reflex/reflex/state.py", line 277, in get_var_for_field
    field_name = format.format_state_name(cls.get_full_name()) + "." + f.name
                                          ~~~~~~~~~~~~~~~~~^^
  File "/reflex/reflex/state.py", line 951, in get_full_name
    name = cls.get_name()
  File "/reflex/reflex/state.py", line 935, in get_name
    raise Exception(
        f"{EnvironmentVariables.REFLEX_ENV_MODE.get()=}\n{EnvironmentVariables.REFLEX_MINIFY_STATES.get()=}"
    )
Exception: EnvironmentVariables.REFLEX_ENV_MODE.get()=<Env.DEV: 'dev'>
EnvironmentVariables.REFLEX_MINIFY_STATES.get()=False

@benedikt-bartscher
Copy link
Contributor Author

benedikt-bartscher commented Nov 6, 2024

@masenf I do not really like it but with 6ce9471 it works
init_subclass magic hitting again :/

@benedikt-bartscher benedikt-bartscher marked this pull request as draft November 12, 2024 23:13
adhami3310 and others added 5 commits November 14, 2024 21:24
* add typing to function vars

* import ParamSpec from typing_extensions

* remove ellipsis as they are not supported in 3.9

* try importing everything from extensions

* special case 3.9

* don't use Any from extensions

* get typevar from extensions
* Add template name to reflex init success msg

* fix pyright message
@benedikt-bartscher
Copy link
Contributor Author

@masenf I fixed the ci
https://github.com/reflex-dev/reflex/actions/runs/11944696408/job/33296085097?pr=3728 seems like a temporary failure

@benedikt-bartscher benedikt-bartscher marked this pull request as ready for review November 21, 2024 01:27
@benedikt-bartscher
Copy link
Contributor Author

Fixed conflicts and ci failure due to recent reflex-web changes as well
Should be ready to merge, let me know if anything else is needed or issues arise

@benedikt-bartscher
Copy link
Contributor Author

Note to myself: consider porting EnvVar default_factory support to a separate PR for #4430

@benedikt-bartscher benedikt-bartscher changed the title Minify state names v2 Minify state names Nov 25, 2024
@benedikt-bartscher
Copy link
Contributor Author

If there is smth missing here please let me know

@benedikt-bartscher
Copy link
Contributor Author

Just a friendly remainder, i am still interested in getting this one in. Please let me know if anything is missing or not working properly. I could also split it up into multiple PRs if wanted

@masenf
Copy link
Collaborator

masenf commented Dec 11, 2024

Please let me know if anything is missing or not working properly.

I'm testing this with reflex-web workflow like follows

reflex export --frontend-only --no-zip

run a caddy webserver over .web/_static

reflex run --backend-only --env prod

The first thing I noticed is that reflex export does not set prod mode, so the resulting files are using unminified names against the backend which is expecting minified names and doesn't work.

If I set REFLEX_ENV_MODE=prod in the environment before the export, then it produces a static bundle that is able to talk to the minified state backend.

So I tried to update the code to set REFLEX_ENV_MODE=prod anytime an export is performed, this partially worked, but it seems that not all of the states got minified correctly (maybe set the var too late?). I'm doing some investigation on where things go wrong and have a small patch i'm planning to push to your branch when I get things working (wip 34a5592)

@masenf
Copy link
Collaborator

masenf commented Dec 11, 2024

okay, i probed a little further... and it turns out that rxconfig.py in reflex-web imports stuff in the app, which eventually imports from reflex.state, which causes the in-built states to get defined and get non-shorted named before the REFLEX_ENV_MODE is set by the export command (config = get_config() is evaluated early in reflex/reflex.py). that in itself shouldn't be a huge issue, just sub-optimal... but the main_state_name seems to get inserted into context.js incorrectly

@masenf
Copy link
Collaborator

masenf commented Dec 11, 2024

i tried out this patch

diff --git a/reflex/config.py b/reflex/config.py
index a882ce9d2..29ab353d9 100644
--- a/reflex/config.py
+++ b/reflex/config.py
@@ -3,10 +3,13 @@
 from __future__ import annotations
 
 import enum
+from functools import lru_cache
 import importlib
 import inspect
 import os
+import subprocess
 import sys
+import tempfile
 import threading
 import urllib.parse
 from importlib.util import find_spec
@@ -900,3 +903,28 @@ def get_config(reload: bool = False) -> Config:
             # Restore the original sys.path.
             sys.path.clear()
             sys.path.extend(sys_path)
+
+
+@lru_cache
+def get_config_safe() -> Config:
+    """Get the app config without introducing import side-effects.
+
+    Returns:
+        The app config.
+    """
+    with (
+        tempfile.NamedTemporaryFile(mode="w", encoding="utf-8") as script,
+        tempfile.NamedTemporaryFile() as config_json,
+    ):
+        script.write(f"""
+from pathlib import Path
+from reflex.config import get_config
+
+Path({config_json.name!r}).write_text(get_config().json())
+""")
+        script.flush()
+
+        subprocess.run(
+            [sys.executable, script.name],
+        )
+        return Config.parse_file(config_json.name)
diff --git a/reflex/custom_components/custom_components.py b/reflex/custom_components/custom_components.py
index 1406e37bc..92ee1a576 100644
--- a/reflex/custom_components/custom_components.py
+++ b/reflex/custom_components/custom_components.py
@@ -17,11 +17,11 @@ import typer
 from tomlkit.exceptions import TOMLKitError
 
 from reflex import constants
-from reflex.config import environment, get_config
+from reflex.config import environment, get_config_safe
 from reflex.constants import CustomComponents
 from reflex.utils import console
 
-config = get_config()
+config = get_config_safe()
 custom_components_cli = typer.Typer()
 
 POST_CUSTOM_COMPONENTS_GALLERY_ENDPOINT = (
diff --git a/reflex/reflex.py b/reflex/reflex.py
index 7b74071f3..3b5c561a5 100644
--- a/reflex/reflex.py
+++ b/reflex/reflex.py
@@ -15,7 +15,7 @@ from reflex_cli.utils import dependency
 from reflex_cli.v2.deployments import check_version, hosting_cli
 
 from reflex import constants
-from reflex.config import environment, get_config
+from reflex.config import environment, get_config, get_config_safe
 from reflex.custom_components.custom_components import custom_components_cli
 from reflex.utils import console, telemetry
 
@@ -29,8 +29,8 @@ except TypeError:
     # Fallback for older typer versions.
     cli = typer.Typer(add_completion=False)
 
-# Get the config.
-config = get_config()
+# Get the config via subprocess without triggering import side-effects.
+config = get_config_safe()
 
 
 def version(value: bool):

And while it does work for avoiding import side-effects before we have a chance to set the REFLEX_ENV_MODE appropriately, it has the unfortunate effect of slowing down reflex cli commands by over 0.5s on my M2 mac (and I assume much worse on windows and lower spec machines).

I don't think we can take that much of a slowdown on startup, there must be another way to implement this feature. Particularly, we need something that is more deterministic and doesn't depend on the import order, because I'm sure some innocuous looking change somewhere will regress this current approach in the future, and it doesn't seem like our current tests are able to effectively catch that.

We need an approach where the minified name is not tied so strongly to the initialization of the state class but can be turned on before compilation and still work. I think it would also be nice if either regular or minified name could be used against the same backend.

@masenf
Copy link
Collaborator

masenf commented Dec 12, 2024

i tried adding _reset_name_cache and set_env_mode methods to facilitate re-naming and minifying already imported states when changing the REFLEX_ENV_MODE... but this also didn't seem to work because of how the backend runs in a separate process, it ends up getting mismatched between what is exported and the app doesn't work.

code i tried

diff --git a/reflex/reflex.py b/reflex/reflex.py
index 7b74071f3..1e4d57f28 100644
--- a/reflex/reflex.py
+++ b/reflex/reflex.py
@@ -18,6 +18,7 @@ from reflex import constants
 from reflex.config import environment, get_config
 from reflex.custom_components.custom_components import custom_components_cli
 from reflex.utils import console, telemetry
+from reflex.utils.exec import set_env_mode
 
 # Disable typer+rich integration for help panels
 typer.core.rich = None  # type: ignore
@@ -136,7 +137,7 @@ def _run(
     """Run the app in the given directory."""
     # Set env mode in the environment
     # This must be set before importing modules that contain rx.State subclasses
-    environment.REFLEX_ENV_MODE.set(env)
+    set_env_mode(env)
 
     from reflex.state import reset_disk_state_manager
     from reflex.utils import build, exec, prerequisites, processes
@@ -318,7 +319,7 @@ def export(
     """Export the app to a zip file."""
     # Set env mode in the environment
     # This must be set before importing modules that contain rx.State subclasses
-    environment.REFLEX_ENV_MODE.set(constants.Env.PROD)
+    set_env_mode(constants.Env.PROD)
 
     from reflex.utils import export as export_utils
     from reflex.utils import prerequisites
@@ -668,7 +669,7 @@ def deployv2(
     """Deploy the app to the Reflex hosting service."""
     # Set env mode in the environment
     # This must be set before importing modules that contain rx.State subclasses
-    environment.REFLEX_ENV_MODE.set(constants.Env.PROD)
+    set_env_mode(constants.Env.PROD)
 
     from reflex_cli.v2 import cli as hosting_cli
     from reflex_cli.v2.utils import dependency
diff --git a/reflex/state.py b/reflex/state.py
index c1e423f9e..c25ff745f 100644
--- a/reflex/state.py
+++ b/reflex/state.py
@@ -554,18 +554,23 @@ class BaseState(Base, ABC, extra=pydantic.Extra.allow):
         if parent_state is not None:
             cls.inherited_vars = parent_state.vars
             cls.inherited_backend_vars = parent_state.backend_vars
+        else:
+            # Track class_subclasses of the BaseState
+            parent_state = BaseState
 
-            # Check if another substate class with the same name has already been defined.
-            if cls.get_name() in set(
-                c.get_name() for c in parent_state.class_subclasses
-            ):
-                # This should not happen, since we have added module prefix to state names in #3214
-                raise StateValueError(
-                    f"The substate class '{cls.get_name()}' has been defined multiple times. "
-                    "Shadowing substate classes is not allowed."
-                )
-            # Track this new subclass in the parent state's subclasses set.
-            parent_state.class_subclasses.add(cls)
+        # Check if another substate class with the same name has already been defined.
+        #if not _reflex_internal_reinit and cls.get_name() in set(
+        if cls.get_name() in set(
+            c.get_name() for c in parent_state.class_subclasses
+        ):
+            breakpoint()
+            # This should not happen, since we have added module prefix to state names in #3214
+            raise StateValueError(
+                f"The substate class '{cls.get_name()}' has been defined multiple times. "
+                "Shadowing substate classes is not allowed."
+            )
+        # Track this new subclass in the parent state's subclasses set.
+        parent_state.class_subclasses.add(cls)
 
         # Get computed vars.
         computed_vars = cls._get_computed_vars()
@@ -970,6 +975,21 @@ class BaseState(Base, ABC, extra=pydantic.Extra.allow):
             name = ".".join((parent_state.get_full_name(), name))
         return name
 
+    @classmethod
+    def _reset_name_cache(cls):
+        """Reset `get_name` and `get_full_name` cache to use minified state names."""
+        subclasses = sorted(cls.class_subclasses, key=lambda x: x.__name__)
+        cls.get_name.cache_clear()
+        cls.get_full_name.cache_clear()
+        if cls is BaseState:
+            cls.class_subclasses.clear()
+        else:
+            cls.__init_subclass__()
+            print(f"REINIT {cls.__name__} ({cls.get_full_name()})")
+            print(minified_state_names)
+        for subcls in subclasses:
+            subcls._reset_name_cache()
+
     @classmethod
     @functools.lru_cache()
     def get_class_substate(cls, path: Sequence[str] | str) -> Type[BaseState]:
@@ -1658,6 +1678,8 @@ class BaseState(Base, ABC, extra=pydantic.Extra.allow):
             raise ValueError(
                 "The value of state cannot be None when processing an event."
             )
+        if name not in substate.event_handlers:
+            breakpoint()
         handler = substate.event_handlers[name]
 
         # For background tasks, proxy the state
diff --git a/reflex/utils/exec.py b/reflex/utils/exec.py
index 5291de095..56c16f712 100644
--- a/reflex/utils/exec.py
+++ b/reflex/utils/exec.py
@@ -497,6 +497,19 @@ def is_prod_mode() -> bool:
     return current_mode == constants.Env.PROD
 
 
+def set_env_mode(mode: constants.Env):
+    """Mark the app as running in dev or prod mode.
+
+    Args:
+        mode: The mode to set the app to.
+    """
+    environment.REFLEX_ENV_MODE.set(mode)
+    from reflex.state import BaseState
+
+    # Ensure that minified names are or are not used based on the mode.
+    BaseState._reset_name_cache()
+
+
 def is_frontend_only() -> bool:
     """Check if the app is running in frontend-only mode.

I think a proper solution for minifying the state names needs a more deterministic approach.

Either

  • hashing the state name into something smaller, or
  • writing out the minified mapping as a compilation artifact, so that the frontend and backend can always agree on what the minified names are.

Basically the following needs to work for reflex-web (and any app, but reflex-web seems especially finicky becausee its rxconfig.py ends up importing rx.State).

reflex export --frontend-only --no-zip
python3.12 -m http.server -d .web/_static 3000 &
reflex run --env prod --backend-only

So unfortunately this PR won't make it into our last release of 2024, but we do still want it, and think it's helpful for reducing network payload.

@benedikt-bartscher
Copy link
Contributor Author

Thanks for all the details and ideas @masenf! I will work through your comments in the next days and try to find a better approach. Will keep you posted.

So unfortunately this PR won't make it into our last release of 2024, but we do still want it, and think it's helpful for reducing network payload.

Absolutely no problem, we obviously need to properly address all issues first.
Btw, it also reduces the redis key size a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants