From 4d72aa3811e487c57e90be19a044afabdac8b30c Mon Sep 17 00:00:00 2001 From: Joel Lamy-Poirier Date: Wed, 18 Dec 2024 21:14:16 -0500 Subject: [PATCH] Markdownlint pre-commit --- .github/ISSUE_TEMPLATE/bug_report.md | 24 ++--- .github/ISSUE_TEMPLATE/feature_request.md | 10 +- .github/PULL_REQUEST_TEMPLATE.md | 54 +++++----- docs/.markdownlint.yaml => .markdownlint.yaml | 20 +++- .pre-commit-config.yaml | 15 ++- CODE_OF_CONDUCT.md | 40 +++---- README.md | 102 +++++++++--------- SECURITY.md | 16 +-- docs/about-us.md | 14 +-- docs/developers/contributing.md | 46 ++++---- docs/developers/style-guide.md | 97 +++++++++++------ docs/index.md | 20 ++-- docs/join-us.md | 12 +-- docs/quick-start.md | 6 +- docs/recipes/data-preparation.md | 6 +- fast_llm/models/custom/readme.md | 27 +++-- 16 files changed, 281 insertions(+), 228 deletions(-) rename docs/.markdownlint.yaml => .markdownlint.yaml (69%) diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index 72e68d69..f5879ce4 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -7,16 +7,16 @@ assignees: jlamypoirier --- -# 🐞 Describe the Bug +## 🐞 Describe the Bug Provide a clear and concise description of the bug. -# πŸ”„ Steps to Reproduce +## πŸ”„ Steps to Reproduce Steps to reproduce the behavior: -1. **Get the relevant Fast-LLM version** (e.g., git commit hash or Docker image tag) that you encountered the issue with. -2. **Run the following command** (modify or redact as needed): +1. **Get the relevant Fast-LLM version** (e.g., git commit hash or Docker image tag) that you encountered the issue with. +2. **Run the following command** (modify or redact as needed): ```bash torchrun --rdzv_backend=static \ @@ -31,14 +31,14 @@ Steps to reproduce the behavior: --config /path/to/your/config.yaml ``` -3. **Include relevant log excerpts** to help us diagnose the issue, with `NCCL_DEBUG=INFO` (or higher) enabled. Make sure the logs contain the full configuration of the run. -4. **Provide the configuration YAML** used for the Fast-LLM setup if logs are unavailable. +3. **Include relevant log excerpts** to help us diagnose the issue, with `NCCL_DEBUG=INFO` (or higher) enabled. Make sure the logs contain the full configuration of the run. +4. **Provide the configuration YAML** used for the Fast-LLM setup if logs are unavailable. -# 🎯 Expected Behavior +## 🎯 Expected Behavior Describe what you expected to happen. -# πŸ“œ Environment Information +## πŸ“œ Environment Information Run the following script in your environment and paste its output here: @@ -105,10 +105,10 @@ fi echo "=== END OF ENVIRONMENT INFORMATION ===" ``` -# πŸ“ Additional Context +## πŸ“ Additional Context Include any other information that may help us understand the issue, such as: -- Recent changes to the configuration or code. -- Whether the issue occurs consistently or intermittently. -- Any troubleshooting steps you have already tried. +- Recent changes to the configuration or code. +- Whether the issue occurs consistently or intermittently. +- Any troubleshooting steps you have already tried. diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index 1b434b9b..d258dedd 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -7,27 +7,27 @@ assignees: '' --- -# 🧐 Problem Description +## 🧐 Problem Description Is your feature request related to a specific problem? Please describe it clearly. 
For example: "I'm always frustrated when [...]" -# πŸ’‘ Proposed Solution +## πŸ’‘ Proposed Solution Describe the solution you would like to see. Be as specific as possible about how it would work or be implemented. -# πŸ”„ Alternatives Considered +## πŸ”„ Alternatives Considered Have you considered any alternative solutions or approaches? If so, please describe them and explain why they might not be ideal. -# πŸ“ˆ Potential Benefits +## πŸ“ˆ Potential Benefits Explain how this feature could benefit Fast-LLM users. Consider how it might improve performance, usability, scalability, etc. -# πŸ“ Additional Context +## πŸ“ Additional Context Add any other context or information that could help us understand the feature request better. If applicable, provide links to relevant references or examples. diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 8b595407..6330048c 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -9,21 +9,21 @@ Closes # Select all that apply: -- [ ] πŸ› **Bug fix** (non-breaking change that addresses a specific issue) -- [ ] πŸš€ **New feature** (non-breaking change that adds functionality) -- [ ] ⚠️ **Breaking change** (a change that could affect existing functionality) -- [ ] πŸ“ˆ **Performance improvement/optimization** (improves speed, memory usage, or efficiency) -- [ ] πŸ› οΈ **Code refactor** (non-functional changes that improve code readability, structure, etc.) -- [ ] πŸ“¦ **Dependency bump** (updates dependencies, including Dockerfile or package changes) -- [ ] πŸ“ **Documentation change** (updates documentation, including new content or typo fixes) -- [ ] πŸ”§ **Infrastructure/Build change** (affects build process, CI/CD, or dependencies) +- [ ] πŸ› **Bug fix** (non-breaking change that addresses a specific issue) +- [ ] πŸš€ **New feature** (non-breaking change that adds functionality) +- [ ] ⚠️ **Breaking change** (a change that could affect existing functionality) +- [ ] πŸ“ˆ **Performance improvement/optimization** (improves speed, memory usage, or efficiency) +- [ ] πŸ› οΈ **Code refactor** (non-functional changes that improve code readability, structure, etc.) +- [ ] πŸ“¦ **Dependency bump** (updates dependencies, including Dockerfile or package changes) +- [ ] πŸ“ **Documentation change** (updates documentation, including new content or typo fixes) +- [ ] πŸ”§ **Infrastructure/Build change** (affects build process, CI/CD, or dependencies) ## πŸ“ Changes List the key changes introduced in this PR: -1. Change A -2. Change B +1. Change A +2. Change B ## βœ… Checklist @@ -31,32 +31,32 @@ Make sure the following tasks are completed before submitting the PR: ### General -- [ ] πŸ“œ I have read and followed the [contributing guidelines](https://servicenow.github.io/Fast-LLM/developers/contributing). -- [ ] 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced. -- [ ] πŸŽ‰ The functionality is complete, and I have tested the changes. -- [ ] πŸ“ I have updated the documentation if needed. -- [ ] ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases). -- [ ] 🧩 I have commented my code, especially in hard-to-understand areas. +- [ ] πŸ“œ I have read and followed the [contributing guidelines](https://servicenow.github.io/Fast-LLM/developers/contributing). +- [ ] 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced. 
+- [ ] πŸŽ‰ The functionality is complete, and I have tested the changes. +- [ ] πŸ“ I have updated the documentation if needed. +- [ ] ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases). +- [ ] 🧩 I have commented my code, especially in hard-to-understand areas. ### Dependencies and Configuration -- [ ] πŸ‹ I have updated the Docker configuration or dependencies, if applicable. -- [ ] πŸ”„ I have ensured compatibility with the existing setup after dependency changes. +- [ ] πŸ‹ I have updated the Docker configuration or dependencies, if applicable. +- [ ] πŸ”„ I have ensured compatibility with the existing setup after dependency changes. ### Testing -- [ ] πŸ§ͺ I have added or updated tests to cover my changes. -- [ ] βœ”οΈ New and existing tests pass locally with my changes. -- [ ] 🚦 I have tested these changes on GPUs and verified training stability. -- [ ] πŸ‹οΈ I have tested the changes on realistic training workloads, if applicable. +- [ ] πŸ§ͺ I have added or updated tests to cover my changes. +- [ ] βœ”οΈ New and existing tests pass locally with my changes. +- [ ] 🚦 I have tested these changes on GPUs and verified training stability. +- [ ] πŸ‹οΈ I have tested the changes on realistic training workloads, if applicable. ### Performance Impact -- [ ] πŸ“Š I have run benchmarks where applicable to evaluate the performance impact. -- [ ] βœ… The benchmarks show no performance regression. -- [ ] πŸš€ The benchmarks indicate a potential performance improvement. -- [ ] ⚠️ The benchmarks indicate a potential performance degradation. -- [ ] πŸ“ˆ I have provided benchmark results and detailed any performance impact below, if applicable. +- [ ] πŸ“Š I have run benchmarks where applicable to evaluate the performance impact. +- [ ] βœ… The benchmarks show no performance regression. +- [ ] πŸš€ The benchmarks indicate a potential performance improvement. +- [ ] ⚠️ The benchmarks indicate a potential performance degradation. +- [ ] πŸ“ˆ I have provided benchmark results and detailed any performance impact below, if applicable. ## πŸ“Š Performance Impact Details diff --git a/docs/.markdownlint.yaml b/.markdownlint.yaml similarity index 69% rename from docs/.markdownlint.yaml rename to .markdownlint.yaml index 44d5cf91..bdd8af70 100644 --- a/docs/.markdownlint.yaml +++ b/.markdownlint.yaml @@ -20,13 +20,23 @@ MD010: # MD013/line-length : Line length : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md013.md MD013: false +# MD024/no-duplicate-heading Multiple headings with the same content (disabled because we do it). +MD024: false + +# Temporarily disabled because not automatically fixed. 
# MD030/list-marker-space : Spaces after list markers : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md030.md -MD030: +MD030: false # Spaces for single-line unordered list items - ul_single: 3 + # ul_single: 3 # Spaces for single-line ordered list items - ol_single: 2 + # ol_single: 2 # Spaces for multi-line unordered list items - ul_multi: 3 + # ul_multi: 3 # Spaces for multi-line ordered list items - ol_multi: 2 + # ol_multi: 2 + +# Code block style (disable because of interactions with mkdocs note blocks) +MD046: false + +# Link and image reference definitions (disable because of interactions with mkdocs footnotes) +MD053: false diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index f8465c52..c6d2671a 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -2,7 +2,7 @@ # See https://pre-commit.com/hooks.html for more hooks repos: - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.6.0 + rev: v5.0.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer @@ -11,7 +11,7 @@ repos: - --unsafe - id: check-added-large-files - repo: https://github.com/asottile/pyupgrade - rev: v3.17.0 + rev: v3.19.1 hooks: - id: pyupgrade args: @@ -42,9 +42,18 @@ repos: name: isort (pyi) types: [pyi] - repo: https://github.com/psf/black - rev: 24.8.0 + rev: 24.10.0 hooks: - id: black args: - "--config" - "./pyproject.toml" +- repo: https://github.com/DavidAnson/markdownlint-cli2 + rev: v0.16.0 + hooks: + - id: markdownlint-cli2 + name: markdownlint + entry: markdownlint-cli2 + args: ["--fix"] + language: node + types: [markdown] diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 4e623f9f..3a639c3f 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -6,15 +6,15 @@ This code of conduct provides guidelines for participation in ServiceNow-managed Communities thrive when members support each other and provide useful feedback. -- Be polite and courteous. Respect and treat others as you would expect to be treated yourself. -- Respect your audience. Posts should not upset, annoy, threaten, harass, abuse or embarrass other members. -- User Contributions must not include material that is defamatory, obscene, indecent, abusive, offensive, harassing, violent, hateful, inflammatory or otherwise objectionable. -- Lively and collegial discussions are always encouraged in a healthy community. It is okay to argue facts but not okay to argue personalities or personal beliefs. -- Do not use text formats such as all caps or bold that may be read as annoying, rude or send a strong message. -- Do not publish anyone's private personal information without their explicit consent. -- Avoid using abbreviations or terminology that others may not understand. An abbreviation may mean something to you but in another context or country, it may have another meaning. -- Be accountable for your actions by correcting your mistakes and indicating where you have changed a previous post of yours. -- Mark content as correct and helpful, and provide feedback. If you read a discussion post that you find helpful, we encourage you to leave a positive vote and comment in the replies. If you find a post that is unhelpful, please provide more information in the issue comments. +- Be polite and courteous. Respect and treat others as you would expect to be treated yourself. +- Respect your audience. Posts should not upset, annoy, threaten, harass, abuse or embarrass other members. 
+- User Contributions must not include material that is defamatory, obscene, indecent, abusive, offensive, harassing, violent, hateful, inflammatory or otherwise objectionable. +- Lively and collegial discussions are always encouraged in a healthy community. It is okay to argue facts but not okay to argue personalities or personal beliefs. +- Do not use text formats such as all caps or bold that may be read as annoying, rude or send a strong message. +- Do not publish anyone's private personal information without their explicit consent. +- Avoid using abbreviations or terminology that others may not understand. An abbreviation may mean something to you but in another context or country, it may have another meaning. +- Be accountable for your actions by correcting your mistakes and indicating where you have changed a previous post of yours. +- Mark content as correct and helpful, and provide feedback. If you read a discussion post that you find helpful, we encourage you to leave a positive vote and comment in the replies. If you find a post that is unhelpful, please provide more information in the issue comments. ## Issue board guidelines @@ -22,20 +22,20 @@ Many open-source projects provide an Issues board, with similar functionality to ServiceNow suggests the following technical support pathways for open-source projects: -1. Clearly identify and document the issue or question you have. -2. View the Documentation. -3. Search the Discussions. -4. Search the project documentation for known errors, useful solutions, and troubleshooting tips. -5. Check the project contribution guidelines if you would like details on how you can submit a change. Community contributions are valued and appreciated! -6. Log an Issue if it hasn't already been logged. If the issue has already been logged by another user, vote it up, and add a comment with additional or missing information. Do your best to choose the correct category when logging a new issue. This will make it easier to differentiate bugs from new feature requests or ideas. If after logging an issue you find the solution, please close your issue and provide a comment with the solution. This will help the project owners and other users. -7. Contact the project team contributors of the project to see if they can help as a last resort only. +1. Clearly identify and document the issue or question you have. +2. View the Documentation. +3. Search the Discussions. +4. Search the project documentation for known errors, useful solutions, and troubleshooting tips. +5. Check the project contribution guidelines if you would like details on how you can submit a change. Community contributions are valued and appreciated! +6. Log an Issue if it hasn't already been logged. If the issue has already been logged by another user, vote it up, and add a comment with additional or missing information. Do your best to choose the correct category when logging a new issue. This will make it easier to differentiate bugs from new feature requests or ideas. If after logging an issue you find the solution, please close your issue and provide a comment with the solution. This will help the project owners and other users. +7. Contact the project team contributors of the project to see if they can help as a last resort only. ## Repositories -- Read and follow the license instructions -- Remember to include citations if you use someone else's work in your own project. Use the [`CITATION.cff`](CITATION.cff) to find the correct project citation reference. 
-- β€˜Star' project repos to save for future reference. -- β€˜Watch' project repos to get notifications of changes – this can get noisy for some projects, so only watch the ones you really need to track closely. +- Read and follow the license instructions +- Remember to include citations if you use someone else's work in your own project. Use the [`CITATION.cff`](CITATION.cff) to find the correct project citation reference. +- β€˜Star' project repos to save for future reference. +- β€˜Watch' project repos to get notifications of changes – this can get noisy for some projects, so only watch the ones you really need to track closely. ## Enforcement and reporting diff --git a/README.md b/README.md index d02e7f95..91b04f38 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,15 @@ + +
+ Fast-LLM [![Docker][ci-badge]][ci-workflow] [![Documentation][docs-badge]][docs-workflow] [![License][license-badge]][license] -*Accelerating your LLM training to full speed* +# Accelerating your LLM training to full speed Made with ❀️ by [ServiceNow Research][servicenow-research] @@ -25,36 +28,36 @@ As a truly open-source project, Fast-LLM allows full customization and extension ## Why Fast-LLM? -1. πŸš€ **Fast-LLM is Blazingly Fast**: - - ⚑️ Optimized kernel efficiency and reduced overheads. - - πŸ”‹ Optimized memory usage for best performance. - - ⏳ Minimizes training time and cost. - -2. πŸ“ˆ **Fast-LLM is Highly Scalable**: - - πŸ“‘ Distributed training across multiple GPUs and nodes using 3D parallelism (Data, Tensor, and Pipeline). - - πŸ”— Supports sequence length parallelism to handle longer sequences effectively. - - 🧠 ZeRO-1, ZeRO-2, and ZeRO-3 implementations for improved memory efficiency. - - πŸŽ›οΈ Mixed precision training support for better performance. - - πŸ‹οΈβ€β™‚οΈ Large batch training and gradient accumulation support. - - πŸ”„ Reproducible training with deterministic behavior. - -3. 🎨 **Fast-LLM is Incredibly Flexible**: - - πŸ€– Compatible with all common language model architectures in a unified class. - - ⚑ Efficient dropless Mixture-of-Experts (MoE) implementation with SoTA performance. - - 🧩 Customizable language model architectures, data loaders, loss functions, and optimizers (in progress). - - πŸ€— Seamless integration with [Hugging Face Transformers][transformers]. - -4. 🎯 **Fast-LLM is Super Easy to Use**: - - πŸ“¦ [Pre-built Docker images](https://github.com/ServiceNow/Fast-LLM/pkgs/container/fast-llm) for quick deployment. - - πŸ“ Simple YAML configuration for hassle-free setup. - - πŸ’» Command-line interface for easy launches. - - πŸ“Š Detailed logging and real-time monitoring features. - - πŸ“š Extensive [documentation][docs] and practical tutorials (in progress). - -5. 🌐 **Fast-LLM is Truly Open Source**: - - βš–οΈ Licensed under [Apache 2.0][license] for maximum freedom to use Fast-LLM at work, in your projects, or for research. - - πŸ’» Transparently developed on GitHub with public [roadmap][roadmap] and [issue tracking][issues]. - - 🀝 Contributions and collaboration are always welcome! +1. πŸš€ **Fast-LLM is Blazingly Fast**: + - ⚑️ Optimized kernel efficiency and reduced overheads. + - πŸ”‹ Optimized memory usage for best performance. + - ⏳ Minimizes training time and cost. + +2. πŸ“ˆ **Fast-LLM is Highly Scalable**: + - πŸ“‘ Distributed training across multiple GPUs and nodes using 3D parallelism (Data, Tensor, and Pipeline). + - πŸ”— Supports sequence length parallelism to handle longer sequences effectively. + - 🧠 ZeRO-1, ZeRO-2, and ZeRO-3 implementations for improved memory efficiency. + - πŸŽ›οΈ Mixed precision training support for better performance. + - πŸ‹οΈβ€β™‚οΈ Large batch training and gradient accumulation support. + - πŸ”„ Reproducible training with deterministic behavior. + +3. 🎨 **Fast-LLM is Incredibly Flexible**: + - πŸ€– Compatible with all common language model architectures in a unified class. + - ⚑ Efficient dropless Mixture-of-Experts (MoE) implementation with SoTA performance. + - 🧩 Customizable language model architectures, data loaders, loss functions, and optimizers (in progress). + - πŸ€— Seamless integration with [Hugging Face Transformers][transformers]. + +4. 
🎯 **Fast-LLM is Super Easy to Use**: + - πŸ“¦ [Pre-built Docker images](https://github.com/ServiceNow/Fast-LLM/pkgs/container/fast-llm) for quick deployment. + - πŸ“ Simple YAML configuration for hassle-free setup. + - πŸ’» Command-line interface for easy launches. + - πŸ“Š Detailed logging and real-time monitoring features. + - πŸ“š Extensive [documentation][docs] and practical tutorials (in progress). + +5. 🌐 **Fast-LLM is Truly Open Source**: + - βš–οΈ Licensed under [Apache 2.0][license] for maximum freedom to use Fast-LLM at work, in your projects, or for research. + - πŸ’» Transparently developed on GitHub with public [roadmap][roadmap] and [issue tracking][issues]. + - 🀝 Contributions and collaboration are always welcome! ## Usage @@ -71,14 +74,14 @@ Expect to see a significant speedup in training time compared to other libraries #### Prerequisites -- A [Slurm](https://slurm.schedmd.com/) cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each. -- CUDA 12.1 or higher. -- Dependencies: [PyTorch][pytorch], [Triton][triton], and [Apex](https://github.com/NVIDIA/apex) installed on all nodes. +- A [Slurm](https://slurm.schedmd.com/) cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each. +- CUDA 12.1 or higher. +- Dependencies: [PyTorch][pytorch], [Triton][triton], and [Apex](https://github.com/NVIDIA/apex) installed on all nodes. #### Steps -1. Deploy the [nvcr.io/nvidia/pytorch:24.07-py3](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) Docker image to all nodes (recommended), because it contains all the necessary dependencies. -2. Install Fast-LLM on all nodes: +1. Deploy the [nvcr.io/nvidia/pytorch:24.07-py3](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) Docker image to all nodes (recommended), because it contains all the necessary dependencies. +2. Install Fast-LLM on all nodes: ```bash sbatch < loss`). This gives other users a better chance to understand the code. +* Use meaningful, self-descriptive identifier names (ex. `x -> loss`). Abstract variable names such as `x` are however OK for generic methods where more descriptive names aren't appropriate (ex. `add(x, y)`). -* Avoid abbreviations, especially domain-specific ones. Ex. `bs -> batch_size`. -This gives everyone a chance to understand the code, regardless of their prior knowledge. -* Avoid redundancies especially for configuration parameters, ex. `data.data_type` -> `data.type`. -* Avoid name parts that refer to the data type, ex. `num`. Use type hints instead. - -Note that these conventions are enforced more strictly on user-facing names since they are more difficult to change, +* Please avoid abbreviations, especially domain-specific ones. +This gives everyone a chance to understand the code, regardless of their prior knowledge. Ex. `bs -> batch_size`. +* Try to keep names concise, for example by eliminating redundancies +and avoiding data type qualifiers such as `num` (covered by the type hint). +This is especially important for configuration parameters as the fully qualified names can get very long. +For example, `transformer.num_transformers_heads` can be simplified to `transformer.heads` without sacrificing clarity. + +Note that these conventions are especially important on user-facing names which are more difficult to change, for example configuration parameters and the public interface of core classes and modules. +!!! 
note "Why this matters" + Using explicit, self-explanatory names gives other users a better chance to understand the code, + regardless of their prior knowledge, which facilitates collaboration and maintenance. + Our conventions follow this principle, while attempting to avoid excessively long names. + ## πŸ›¬ Imports We use the following conventions for imports (other than those enforced by isort): -* Import standard library and third party modules by module (ex. `import package.module`, not `from package.module import method`). -In addition to keeping the code consistent, this keeps identifier's origin explicit so anyone can tell where it came from with just a quick glance at the code. This is especially useful for identifiers that with otherwise ambiguous source (ex. `float32` may come from torch, numpy, triton, etc.; Fast-LLM's configuration scheme has many identifiers in common with `dataclasses`, `omegaconf` and `pydantic`) -* Avoid renaming with `as`, except for some (arbitrarily chosen) common ones: `numpy as np`, `triton.language as tl`. -* Import first-party modules through specific identifiers (ex. `from fast_llm.module import method`, not `import fast_llm.module`). This keeps Fast-LLM identifiers to a manageable length and makes it easier to track what is used in a given file. -* Always use absolute imports (ex. no `from .module import method`) -* Include all explicitly-imported third-party module to `setup.cfg`. +* Import standard library and third party modules by module (ex. `import package.module`, not `from package.module import method`). +In addition to keeping the code consistent, this keeps identifier's origin explicit so anyone can tell where it came from with just a quick glance at the code. +* Avoid renaming with `as`, except for some (arbitrarily chosen) common ones: `numpy as np`, `triton.language as tl`. +* Import first-party modules through specific identifiers (ex. `from fast_llm.module import method`, not `import fast_llm.module`). This keeps Fast-LLM identifiers to a manageable length and makes it easier to track what is used in a given file. +* Always use absolute imports (ex. no `from .module import method`) +* Include all explicitly-imported third-party module to `setup.cfg`. Only add new requirements if they provide a substantial benefit, as we try to keep the requirements to a minimum. -* Prefer file-level imports over imports inside methods, unless they significantly slow down the import process +* Prefer file-level imports over imports inside methods, unless they significantly slow down the import process or concern an optional dependency that should not be absolutely required to import the module (ex. `transformers`). If an offending import is only required for a type hint, include it in a `if typing.TYPE_CHECKING:` block. -!!! warning "Configuration modules" +!!! note "Why this matters" + Most python conventions make no clear recommendation concerning imports, + which can easily lead to inconsistent import formats across a repo, and can make it harder to understand. + Our conventions aim to avoid these arbitrary choices by providing an explicit prescription, + which should be good enough nearly everywhere. Our choice is justified as follows: + + * For third-party and standard library packages, fully qualified identifiers are typically relatively short, + so it makes sense to keep them. + This also keeps identifier's origin explicit so anyone can tell where it came from with just a quick glance at the code. 

+!!! warning "Configuration modules"
     Fast-LLM supports instantiation and validation of configurations with a barebone installation.
     Because of this, modules that contain configuration classes (usually named `config.py`)
     should not include any top-level third-party import
     (except for those installed in the [barebone install](https://github.com/ServiceNow/Fast-LLM/blob/main/setup.cfg)),
@@ -71,29 +92,43 @@ If an offending import is only required for a type hint, include it in a `if typ

 ## πŸ”“ Public and Private Interface

-Although good practices of object-oriented programming are generally ignored in python,
-Fast-LLM attempts to follow them to an extent, while avoiding unnecessary bloat:
+We use the following conventions for class and module interfaces:

-* Mark private and protected variables with an underscore `_` prefix.
+* Mark private and protected variables with an underscore `_` prefix.
 As is customary in python, we make no distinction between the two and avoid the double-underscore `__` notation.
-* Keep public interfaces (methods and variables without underscore prefix) as lean as possible,
+* Keep public interfaces (methods and variables without underscore prefix) as lean as possible,
 i.e. mark everything as private/protected unless there is a clear need to make it public.
 We can always add to the public interface later, but removing from it is difficult.
-* Use accessors sparingly through the `@property` decorator or equivalent,
+* Use accessors sparingly through the `@property` decorator or equivalent,
 usually to define read-only public variables.

+!!! note "Why this matters"
+    Although good practices of object-oriented programming are generally ignored in python,
+    Fast-LLM attempts to follow them to an extent, while avoiding unnecessary bloat.
+    Public interfaces are expected to be stable,
+    which makes further modifications difficult as they could break external code.
+    On the other hand, private interfaces are freely modifiable,
+    which provides more freedom for fixes, improvements, refactoring, etc.
+    Therefore, having lean public interfaces is critical for us to keep maintaining and improving Fast-LLM.
+
 ## πŸ’‘ Type Hints

 Fast-LLM uses type hints for several reasons, including code readability, type checking in IDEs, and type validation for configurations:

-* Always use type hints for the public interface of a classes and modules.
-Type hints for method outputs may be omitted if they can be easily inferred.
-* Prefer using type hints in private interfaces, especially if it improves readability and/or static type checking.
-* Use newer type hint formats when possible, ex. `typing.List -> list`, `typing.Union(A,B) -> A | B`.
+* Always use type hints for the public interface of classes and modules.
+Type hints for method outputs may be omitted if they can be trivially inferred,
+ex. if they return the input, an explicitly typed variable, or nothing.
+* Prefer using type hints in private interfaces, especially if it improves readability and/or static type checking.
+* Prefer newer type hint formats over older ones, ex. `typing.List -> list`, `typing.Union[A, B] -> A | B`.
+
+!!! note "Why this matters"
+    We use type hints for various reasons. In addition to making the code more understandable,
+    they are used by IDEs such as VS Code or PyCharm to perform static type checking,
+    which speeds up development and is essential to keeping the code bug-free.
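+
+As a quick illustration of the interface and type hint conventions above (a hypothetical example, not taken from the Fast-LLM codebase):
+
+```python
+class RunningMean:
+    """Track a running mean with a lean, typed public interface."""
+
+    def __init__(self, values: list[float] | None = None):
+        # Private state: underscore prefix, not part of the public interface.
+        self._values: list[float] = [] if values is None else list(values)
+
+    def update(self, value: float) -> None:
+        self._values.append(value)
+
+    @property
+    def mean(self) -> float:
+        # Read-only accessor exposed through `@property`.
+        return sum(self._values) / max(len(self._values), 1)
+```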

 ## πŸ—‘οΈ Misc

-* Please add descriptions and comments as needed, especially for parts that would otherwise be difficult to understand.
-* Use `pathlib` rather than `os.path`.
-* We encourage the use of modern python features when beneficial, up to the minimum python version (3.12).
+* Please add descriptions and comments as needed, especially for parts that would otherwise be difficult to understand.
+* Use `pathlib` rather than `os.path`.
+* We encourage the use of modern python features when beneficial, up to the minimum python version (3.12).
diff --git a/docs/index.md b/docs/index.md
index d60b405b..9a543d6f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -50,9 +50,9 @@ Fast-LLM offers all the capabilities you need to accelerate your LLM training an

 Fast-LLM powers the world's most advanced AI projects:

-- **NLP Research and Development:** Train state-of-the-art language models for natural language understanding, summarization, and conversational AI.
-- **Enterprise AI Solutions:** Accelerate time-to-market for AI products by reducing training costs and enabling faster iteration.
-- **Academic Collaborations:** Drive AI innovation with high-performance training capabilities that support cutting-edge research in machine learning.
+- **NLP Research and Development:** Train state-of-the-art language models for natural language understanding, summarization, and conversational AI.
+- **Enterprise AI Solutions:** Accelerate time-to-market for AI products by reducing training costs and enabling faster iteration.
+- **Academic Collaborations:** Drive AI innovation with high-performance training capabilities that support cutting-edge research in machine learning.

 See how Fast-LLM has helped early adopters achieve faster results. [Explore use cases and success stories](success-stories/starcoder-2.md).

@@ -60,18 +60,18 @@ See how Fast-LLM has helped early adopters achieve faster results. [Explore use

 Fast-LLM is designed to be the **go-to solution** for those training the most sophisticated language models. Our objectives include:

-- **Accelerating Training Workflows:** Deliver the fastest LLM training experience with optimized kernel efficiency, parallelism, and memory management.
-- **Supporting a Broad Range of Architectures:** Offer built-in support for all major language model architectures, with an architecture-agnostic approach that allows users to easily adapt the framework to emerging models.
-- **Enabling Seamless Integration and Deployment:** Integrate effortlessly into existing ML pipelines, including [HuggingFace Transformers](https://huggingface.co/transformers) and [Kubernetes](https://kubernetes.io)-based clusters.
-- **Advancing LLM Research and Production-Readiness:** Be suitable for both cutting-edge research and mission-critical production workloads.
+- **Accelerating Training Workflows:** Deliver the fastest LLM training experience with optimized kernel efficiency, parallelism, and memory management.
+- **Supporting a Broad Range of Architectures:** Offer built-in support for all major language model architectures, with an architecture-agnostic approach that allows users to easily adapt the framework to emerging models. +- **Enabling Seamless Integration and Deployment:** Integrate effortlessly into existing ML pipelines, including [HuggingFace Transformers](https://huggingface.co/transformers) and [Kubernetes](https://kubernetes.io)-based clusters. +- **Advancing LLM Research and Production-Readiness:** Be suitable for both cutting-edge research and mission-critical production workloads. ## Collaboration and Contribution As Fast-LLM evolves, we invite the community to contribute and help shape its future. We welcome: -- **Testing and Bug Fixes:** Help us identify issues and improve stability. -- **Feature Development:** Contribute new models, new training features, and new optimizations. -- **Documentation and Tutorials:** Make Fast-LLM more accessible by improving our documentation and writing practical guides. +- **Testing and Bug Fixes:** Help us identify issues and improve stability. +- **Feature Development:** Contribute new models, new training features, and new optimizations. +- **Documentation and Tutorials:** Make Fast-LLM more accessible by improving our documentation and writing practical guides. Fast-LLM is more than just software, it's a community. Get involved by exploring our [contribution guidelines](developers/contributing.md) and engaging with us on [GitHub Discussions](https://github.com/ServiceNow/Fast-LLM/discussions). diff --git a/docs/join-us.md b/docs/join-us.md index 31ff49ab..26154314 100644 --- a/docs/join-us.md +++ b/docs/join-us.md @@ -16,15 +16,15 @@ Want to keep up with the latest Fast-LLM updates and new opportunities to get in Fast-LLM thrives on collaboration, and we're excited to welcome new contributors! From fixing bugs to adding new features, every code contribution makes a difference. If you're just getting started, our [Good First Issues](https://github.com/ServiceNow/Fast-LLM/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) on GitHub are labeled to help newcomers find approachable tasks. To set up your development environment and get oriented with Fast-LLM, check out our **Developer's Corner** for everything you need: -- [**Contributing**](developers/contributing.md) – for setup instructions and contributing guidelines -- [**Best Practices**](developers/dev-practices.md) – for tips on writing clean, maintainable code +- [**Contributing**](developers/contributing.md) – for setup instructions and contributing guidelines +- [**Best Practices**](developers/dev-practices.md) – for tips on writing clean, maintainable code Here's a quick overview of the process: -1. **Fork & Clone**: Start by forking the repo and cloning it to your machine. -2. **Set Up Your Dev Environment**: The Developer's Corner guides you through configuring your environment for maximum productivity. -3. **Write Awesome Code**: Make your changes, document them, and follow our best practices. -4. **Open a Pull Request**: Submit a PR to showcase your work and get feedback from our team and the community. +1. **Fork & Clone**: Start by forking the repo and cloning it to your machine. +2. **Set Up Your Dev Environment**: The Developer's Corner guides you through configuring your environment for maximum productivity. +3. **Write Awesome Code**: Make your changes, document them, and follow our best practices. +4. 
**Open a Pull Request**: Submit a PR to showcase your work and get feedback from our team and the community. Explore our [Developer's Corner](developers/contributing.md) for everything you need to get started! diff --git a/docs/quick-start.md b/docs/quick-start.md index 56189d0c..b4c208f4 100644 --- a/docs/quick-start.md +++ b/docs/quick-start.md @@ -10,9 +10,9 @@ To follow this guide, you'll need: - **Hardware**: At least one NVIDIA GPU, preferably with Ampere architecture or newer. Note that this tutorial is designed for 80 GB A100s or H100 GPUs, and some adjustments are needed to run it with less memory or an earlier architecture. - **Software**: Depending on your setup, you'll need one of the following: - - **Docker**: If you're using the prebuilt Docker image on your local machine. - - **Python 3.10**: If you're setting up a custom environment (virtual environment, bare-metal, etc.) on your local machine. - - **Cluster Setup**: Access to a Docker-enabled Slurm cluster or to a Kubernetes cluster with Kubeflow if you're using those environments. + - **Docker**: If you're using the prebuilt Docker image on your local machine. + - **Python 3.10**: If you're setting up a custom environment (virtual environment, bare-metal, etc.) on your local machine. + - **Cluster Setup**: Access to a Docker-enabled Slurm cluster or to a Kubernetes cluster with Kubeflow if you're using those environments. ## πŸ— Step 1: Initial Setup diff --git a/docs/recipes/data-preparation.md b/docs/recipes/data-preparation.md index be0f8ef0..412cafb2 100644 --- a/docs/recipes/data-preparation.md +++ b/docs/recipes/data-preparation.md @@ -11,9 +11,9 @@ For this guide, you would need: - **Hardware**: Just a machine with CPUs will do. But having a large numbers of CPUs and nodes helps distribute the data preparation job and significantly speed things up. - **Software**: Depending on your setup, you'll need one of the following: - - **Docker**: If you're using the prebuilt Docker image on your local machine. - - **Python 3.10**: If you're setting up a custom environment (virtual environment, bare-metal, etc.) on your local machine. - - **Cluster Setup**: Access to a Docker-enabled Slurm cluster or to a Kubernetes cluster with Kubeflow if you're using those environments. + - **Docker**: If you're using the prebuilt Docker image on your local machine. + - **Python 3.10**: If you're setting up a custom environment (virtual environment, bare-metal, etc.) on your local machine. + - **Cluster Setup**: Access to a Docker-enabled Slurm cluster or to a Kubernetes cluster with Kubeflow if you're using those environments. ## πŸ“š Step 1: Download the dataset from Huggingface diff --git a/fast_llm/models/custom/readme.md b/fast_llm/models/custom/readme.md index bb3330a3..ca005908 100644 --- a/fast_llm/models/custom/readme.md +++ b/fast_llm/models/custom/readme.md @@ -4,18 +4,17 @@ The "custom" model is a template for customized training of a GPT-style model, for example to fine-tune it for a particular class. This is typically done as follows: -1. Create a copy of the `custom` model, and rename it appropriately, ex. `my_model`, `MyModelTrainer`, etc. -2. If necessary, adjust the base classes to inherit from more abstract classes or another model. +1. Create a copy of the `custom` model, and rename it appropriately, ex. `my_model`, `MyModelTrainer`, etc. +2. If necessary, adjust the base classes to inherit from more abstract classes or another model. ex. `MyModelData(AbstractData)` to re-implement data processing from scratch. -3. 
Add custom configuration fields in `config.py`. -4. Adapt or re-implement the data loading scheme in `MyModelData`. -5. Adapt or re-implement the preprocessing scheme in `MyModelBaseModel`. -6. Adapt or re-implement the model head, ex. change the task and/or add a custom loss. -7. If needed, adapt the huggingface interface to return outputs for the desired task. -8. Apply other changes as needed. -9. Add the new model to the registry (`models.auto.py`) so it can be used through the cli. -10. Run training with the new model, ex. `fast-llm train my_model [...]`. - +3. Add custom configuration fields in `config.py`. +4. Adapt or re-implement the data loading scheme in `MyModelData`. +5. Adapt or re-implement the preprocessing scheme in `MyModelBaseModel`. +6. Adapt or re-implement the model head, ex. change the task and/or add a custom loss. +7. If needed, adapt the huggingface interface to return outputs for the desired task. +8. Apply other changes as needed. +9. Add the new model to the registry (`models.auto.py`) so it can be used through the cli. +10. Run training with the new model, ex. `fast-llm train my_model [...]`. ## Preprocessing variables and kwargs @@ -26,10 +25,10 @@ Those kwargs will be passed directly to the `forward` method of each layer and c In some cases, it may be desirable to modify the `kwargs` inside a layer, for example to pass additional data to other layers or to the backward pass. This possible with certain caveats: -* There is no direct support for autograd. Detaching tensors is recommended to prevent memory losses. -* Such modifications may be incompatible with pipeline parallelism, -as the data will not be transferred to pipeline-parallel devices. +* There is no direct support for autograd. Detaching tensors is recommended to prevent memory losses. +* Such modifications may be incompatible with pipeline parallelism, +as the data will not be transferred to pipeline-parallel devices. ## Disclaimer