[Proposal] Native DOE Integration for FAST-OAD #576

Open
enricostragiotti opened this issue Nov 8, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@enricostragiotti
Contributor

Design of Experiments (DOE) is a valuable tool in aeronautical engineering, enabling rapid exploration of an aircraft's design space. DOE allows engineers to compare design choices, assess the impact of new technologies, and evaluate sensitivity to various parameters. For these reasons, I believe it would be highly beneficial for FAST-OAD to support native DOE execution, both from configuration files and command-line interfaces, similarly to the current capabilities for MDA and MDO.

Main Features:

  • Seamless DOE Execution: Integrate DOE launch directly from the configuration file, enabling a user experience consistent with MDA and MDO processes.
  • Customizable Sampling: Allow users to define the sampling method for the design space (e.g., Latin Hypercube Sampling, full factorial, etc.).
  • Parallelized Execution: Support parallel execution of multiple FAST-OAD instances locally and on HPC resources to facilitate fast evaluation of hundreds of configurations.
  • Automatic File Organization: When initiating DOE, automatically generate subfolders for each run, containing all necessary files to locally relaunch MDA (i.e., configuration file, modified data file for the specific calculation ID, and mission files). A CSV file in the root directory will also record the input variables modified by the DOE for each run ID.
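
The automatic file organization could look roughly like the sketch below. It is only an illustration: the function name `write_doe_layout`, the `run_0000` folder pattern, and the `doe_inputs.csv` file name are assumptions, not existing FAST-OAD conventions.

```python
import csv
import tempfile
from pathlib import Path


def write_doe_layout(root, doe_name, samples):
    """Create one subfolder per run and a CSV summarizing the sampled inputs.

    `samples` is a list of dicts mapping variable names to values.
    Folder and file names are hypothetical.
    """
    doe_dir = Path(root) / doe_name
    doe_dir.mkdir(parents=True, exist_ok=True)
    names = sorted(samples[0])

    with open(doe_dir / "doe_inputs.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["run_id"] + names)
        for run_id, sample in enumerate(samples):
            # A real implementation would also copy the configuration,
            # data, and mission files into each run folder.
            (doe_dir / f"run_{run_id:04d}").mkdir(exist_ok=True)
            writer.writerow([run_id] + [sample[name] for name in names])
    return doe_dir


samples = [
    {"data:geometry:wing:aspect_ratio": 9.0},
    {"data:geometry:wing:aspect_ratio": 13.5},
]
with tempfile.TemporaryDirectory() as tmp:
    doe_dir = write_doe_layout(tmp, "exploration_1", samples)
    run_folders = sorted(p.name for p in doe_dir.iterdir() if p.is_dir())
    csv_lines = (doe_dir / "doe_inputs.csv").read_text().splitlines()
```

Keeping the run ID as the first CSV column makes it easy to map any row back to its subfolder when a single case has to be relaunched.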

Nice-to-Have Features:

  • Nested DOE Support: Enable nested DOEs for constructing multifidelity surrogate models.
  • Support for Discrete and Categorical Variables: Although FAST-OAD currently may not support discrete variables, it would be useful to allow DOE configurations that include such variables—for instance, to evaluate design shifts from a T-tail to a conventional tail, or to test different numbers of engines.
  • Linked Variable Variation: Allow specific variables to be modified in tandem (e.g., linking the fuel efficiency parameter k for both cruise and takeoff stages), ensuring variables change together rather than independently.
  • Basic Post-Processing Tools: Provide users with basic post-processing options, including a CSV file of requested output variables, along with a scatterplot matrix and basic response surfaces.

Future Developments:

  • DOE-Driven Optimization: Use DOE results to identify the set of possible starting design points for subsequent optimization processes.
  • GUI for Result Analysis: A graphical user interface for result analysis could add significant value, ideally allowing users to build surrogate models, visualize contour plots, filter out configurations based on results, and conduct advanced sensitivity analyses with Sobol indices.

Expected User Inputs:

  • DOE variable names
  • Numerical bounds for each DOE variable, provided as absolute values or percentage variations relative to values in the data file
  • Sampling algorithm selection
    - Number of samples
  • Random seed value for reproducibility
  • Maximum processors allowed for evaluation
  • Output directory containing subfolders for each experiment run, along with a CSV summarizing modified input variables and a unique ID.
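
To make the role of the bounds, sample count, and seed concrete, here is a minimal Latin Hypercube Sampling sketch using only the standard library. The function `latin_hypercube` is hypothetical; an actual implementation would more likely rely on an existing sampling library.

```python
import random


def latin_hypercube(bounds, n_samples, seed):
    """Return n_samples points with one point per stratum along each variable.

    `bounds` maps each variable name to a (lower, upper) tuple.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (lower, upper) in bounds.items():
        # One random point inside each of the n_samples equal-width strata.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across variables.
        rng.shuffle(unit)
        columns[name] = [lower + u * (upper - lower) for u in unit]
    return [{name: columns[name][i] for name in bounds} for i in range(n_samples)]


bounds = {
    "data:geometry:wing:aspect_ratio": (9.0, 18.0),
    "data:geometry:wing:sweep_25": (20.0, 26.0),
}
doe = latin_hypercube(bounds, n_samples=5, seed=12)
```

Calling the function twice with the same seed returns the same points, which is the reproducibility that the random seed input is meant to provide.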
@enricostragiotti enricostragiotti added the enhancement New feature or request label Nov 8, 2024
@enricostragiotti
Contributor Author

Here is a sample of what the configuration file could look like:


# Input and output files
input_file: ./problem_inputs.xml
output_file: ./problem_outputs.xml

# Definition of problem driver assuming the OpenMDAO convention "import openmdao.api as om"
driver: om.ScipyOptimizeDriver(tol=1e-2, optimizer='COBYLA')

model:
  nonlinear_solver: om.NonlinearBlockGS(maxiter=100, atol=1e-2)
  linear_solver: om.DirectSolver()
  geometry:
    # An OpenMDAO component is identified by its "id"
    id: fastoad.geometry.legacy
    # ....
  performance:
    id: fastoad.performances.mission
    propulsion_id: fastoad.wrapper.propulsion.rubber_engine
    mission_file_path: ::sizing_mission
    out_file: ./flight_points.csv
    adjust_fuel: true
    is_sizing: true
  
design_of_experiments:
  - name: exploration_1
  # Multiple DOEs can be defined in the same configuration file. The name will 
  # be used to generate the subfolders of the calculations
    variables:
      # Description of the variables and the bounds used to create the DOE
      - name: data:geometry:wing:aspect_ratio
        lower: 9.0
        upper: 18.0
      - name: data:geometry:wing:MAC:at25percent:x
        # The bounds can be absolute or relative to a given value or to the value 
        # of the MDA (requires model definition in the configuration file)
        lower_perc: 20.0
        upper_perc: 20.0
        ref_value: evaluate
      - name: data:geometry:wing:sweep_25
        # We can mix percentage and absolute bounds
        lower_perc: 20.0
        upper: 26.0
        ref_value: 25.0
  - name: exploration_2
    variables:
      - name: tuning:propulsion:rubber_engine:SFC:k_cr
        lower: 0.8
        upper: 1.0
      - name: tuning:propulsion:rubber_engine:SFC:k_sl
        bounds_binding_to: tuning:propulsion:rubber_engine:SFC:k_cr
        # Variables can be bound together and share the same value in the DOE
    sampling:
    # Overwrite here the common settings (written later in the file)
      algorithm: FullFactorial
      n_samples: 20
  # Common settings (for all the DOEs in the file)
  sampling:
    algorithm: NestedLHS
    n_samples: 150
    seed_val: 12
    options:
      - nlevel=2
  computing:
    max_CPUs: 8
  output_folder: ./DOE
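
As an illustration of how a setting such as `max_CPUs` could cap the number of concurrent evaluations, here is a small sketch. `run_case` is a hypothetical placeholder for one FAST-OAD run, and a thread pool is used only for brevity; real runs would more likely use separate processes or an HPC scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def run_case(sample):
    """Placeholder for one FAST-OAD evaluation (hypothetical).

    A real worker would write the sample into a data file and run the MDA.
    """
    return {"inputs": sample, "output": sum(sample.values())}


def run_doe(samples, max_cpus):
    """Evaluate all samples with at most `max_cpus` concurrent workers."""
    with ThreadPoolExecutor(max_workers=max_cpus) as pool:
        # map() preserves input order, so results line up with run IDs.
        return list(pool.map(run_case, samples))


samples = [{"x": float(i), "y": 2.0 * i} for i in range(6)]
results = run_doe(samples, max_cpus=8)
```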

@rparello

rparello commented Nov 14, 2024

A few things that would be nice to have:

  • a way to define inputs that are arrays in the DOE
  • a way to have a model option as a DOE input
  • in line with the previous point, some inputs could be integers, or even strings chosen from a list in the case of options, rather than floats
  • the possibility to use a different point selection method for different variables; for example, one could use LMS and a second full factorial

@christophe-david
Contributor

Thanks for this detailed proposal.

I broadly agree with the proposal, with a few details to rework IMHO (and that's why it's good to
have a prototype of the configuration file).

design_of_experiments:
  - name: exploration_1
  # Multiple DOEs can be defined in the same configuration file. The name will 
  # be used to generate the subfolders of the calculations

Is it really needed to allow multiple DoE definitions in the same configuration file? As you know, we only allow one definition for optimization, and I think it should be the same for DoE. I don't feel like it's that bad to ask users to have a configuration file for each DoE they want, even with the same model assembly.
Put differently, is this feature worth the additional complexity in development and usage? (That's another thing we could think about ahead of time: what will the API/CLI to run the DoE look like?)

      - name: data:geometry:wing:MAC:at25percent:x
        # The bounds can be absolute or relative to a given value or to the value 
        # of the MDA (requires model definition in the configuration file)
        lower_perc: 20.0
        upper_perc: 20.0
        ref_value: evaluate

I'm not sure I understand (requires model definition in the configuration file) and the evaluate thing. Given the variable you use as an example, it looks like you want to use an output of the MDA as a DoE variable, but I can't see how you could do that, so I guess I misunderstood. What is your point here?

      - name: data:geometry:wing:sweep_25
        # We can mix percentage and absolute bounds
        lower_perc: 20.0
        upper: 26.0
        ref_value: 25.0

To me, usage of percentage is justified only if ref_value cannot be "hard coded" (this example would work as well without ref_value using lower: 20.). So, if evaluate is not usable, I suggest getting rid of percentages.
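
To illustrate the point, here is a sketch of how the proposed keys could be resolved into absolute bounds. `resolve_bounds` is a hypothetical helper, and it assumes percentages are taken relative to `ref_value`; the `ref_value: evaluate` case is deliberately left out.

```python
def resolve_bounds(variable):
    """Turn a variable entry into absolute (lower, upper) bounds.

    Percentage keys are interpreted relative to a numeric `ref_value`.
    """
    ref = variable.get("ref_value")
    needs_ref = "lower" not in variable or "upper" not in variable
    if needs_ref and not isinstance(ref, (int, float)):
        raise ValueError(f"{variable['name']}: a numeric ref_value is required")

    if "lower" in variable:
        lower = variable["lower"]
    else:
        lower = ref - abs(ref) * variable["lower_perc"] / 100.0
    if "upper" in variable:
        upper = variable["upper"]
    else:
        upper = ref + abs(ref) * variable["upper_perc"] / 100.0
    return lower, upper


sweep = {
    "name": "data:geometry:wing:sweep_25",
    "lower_perc": 20.0,
    "upper": 26.0,
    "ref_value": 25.0,
}
sweep_bounds = resolve_bounds(sweep)
```

With a hard-coded `ref_value` of 25.0, a 20% lower variation resolves to 20.0, so the entry is equivalent to writing the absolute bound directly.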

  - name: exploration_2
    variables:
      - name: tuning:propulsion:rubber_engine:SFC:k_cr
        lower: 0.8
        upper: 1.0
      - name: tuning:propulsion:rubber_engine:SFC:k_sl
        bounds_binding_to: tuning:propulsion:rubber_engine:SFC:k_cr
        # Variables can be bound together and share the same value in the DOE

Just a detail: OK for the feature, but the name should be changed. As a joke, I would say that the correct spelling would probably be bounds_bound_to:. But anyway, you don't want to bind just the limits. You want to bind the values, so same_values_as: would probably suit better.
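
A sketch of what binding values could mean in practice: only the source variable is sampled, and the linked variable receives a copy of its value in every run. `expand_linked` and the `links` mapping are hypothetical names.

```python
def expand_linked(samples, links):
    """Copy sampled values onto linked variables.

    `links` maps a dependent variable to the variable whose value it shares,
    mirroring the suggested `same_values_as` key.
    """
    expanded = []
    for sample in samples:
        full = dict(sample)
        for dependent, source in links.items():
            full[dependent] = sample[source]
        expanded.append(full)
    return expanded


k_cr = "tuning:propulsion:rubber_engine:SFC:k_cr"
k_sl = "tuning:propulsion:rubber_engine:SFC:k_sl"
samples = [{k_cr: 0.8}, {k_cr: 0.9}, {k_cr: 1.0}]
linked = expand_linked(samples, {k_sl: k_cr})
```

The linked variable does not add a dimension to the sampled space, which is why it needs no bounds of its own.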

  # Common settings (for all the DOEs in the file)
  sampling:
    algorithm: NestedLHS
    n_samples: 150
    seed_val: 12
    options:
      - nlevel=2
  computing:
    max_CPUs: 8
  output_folder: ./DOE

Here you do not respect YAML syntax, because sampling and others are at the same level as the
- name: ... elements. You are mixing list elements with dict elements (and also, nlevel=2 ??)

Luckily, if we give up the idea of having several definitions of DoE, we will only have variables,
sampling, etc… sections and all will be fine.

Finally, a word on naming: I guess you did not focus on this aspect for now, but I would not want
the current names to be considered final. I would recommend generally avoiding abbreviations
(seed_value would be just fine), and to be consistent, e.g. n_samples would be more consistent
with n_levels (and considering the abbreviation question here, maybe sample_count and
level_count would do).

@christophe-david
Contributor

To sum up my remarks, here is the configuration file as I would currently see it:

# Input and output files
input_file: ./problem_inputs.xml
output_file: ./problem_outputs.xml

# Definition of problem driver assuming the OpenMDAO convention "import openmdao.api as om"
driver: om.ScipyOptimizeDriver(tol=1e-2, optimizer='COBYLA')

model:
  nonlinear_solver: om.NonlinearBlockGS(maxiter=100, atol=1e-2)
  linear_solver: om.DirectSolver()
  geometry:
    # An OpenMDAO component is identified by its "id"
    id: fastoad.geometry.legacy
    # ....
  performance:
    id: fastoad.performances.mission
    propulsion_id: fastoad.wrapper.propulsion.rubber_engine
    mission_file_path: ::sizing_mission
    out_file: ./flight_points.csv
    adjust_fuel: true
    is_sizing: true

design_of_experiments:
  # Description of the variables and the bounds used to create the DOE
  variables:
    - name: data:geometry:wing:aspect_ratio
      lower: 9.0
      upper: 18.0
    - name: data:geometry:wing:sweep_25
      lower: 20.0
      upper: 26.0
  # Sampling, computing and output settings
  sampling:
    algorithm: NestedLHS
    n_samples: 150
    seed_val: 12
    options:
      - nlevel=2
  computing:
    max_CPUs: 8
  output_folder: ./DOE
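
With a single DoE per file, the section reduces to a list of variables plus sampling, computing, and output settings. One possible in-memory representation is sketched below. The class names, `parse_doe`, and the key names `sample_count`, `seed_value`, and `max_cpus` are assumptions that follow the naming remarks above; none of this is an existing FAST-OAD API.

```python
from dataclasses import dataclass


@dataclass
class DoeVariable:
    name: str
    lower: float
    upper: float

    def __post_init__(self):
        if self.lower > self.upper:
            raise ValueError(f"{self.name}: lower bound exceeds upper bound")


@dataclass
class DoeDefinition:
    variables: list
    algorithm: str
    sample_count: int
    seed_value: int
    max_cpus: int = 1
    output_folder: str = "./DOE"


def parse_doe(section):
    """Build a DoeDefinition from an already-loaded mapping."""
    sampling = section["sampling"]
    return DoeDefinition(
        variables=[DoeVariable(**entry) for entry in section["variables"]],
        algorithm=sampling["algorithm"],
        sample_count=sampling["sample_count"],
        seed_value=sampling["seed_value"],
        max_cpus=section.get("computing", {}).get("max_cpus", 1),
        output_folder=section.get("output_folder", "./DOE"),
    )


section = {
    "variables": [
        {"name": "data:geometry:wing:aspect_ratio", "lower": 9.0, "upper": 18.0},
        {"name": "data:geometry:wing:sweep_25", "lower": 20.0, "upper": 26.0},
    ],
    "sampling": {"algorithm": "NestedLHS", "sample_count": 150, "seed_value": 12},
    "computing": {"max_cpus": 8},
}
doe = parse_doe(section)
```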

@christophe-david
Contributor

A few things that would be nice to have:

* a way to define inputs that are arrays in the DOE

For that, I guess you would want to provide all the input values of the DoE yourself, rather than letting these values be computed by LHS or something else (unless all the values in your array are expected to be independent, but then they would just be a bunch of scalars).
Anyway, I second the idea of allowing DoE inputs to be read from a file instead of computed.
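
As a sketch of that idea, DoE points could be read from a CSV file whose header row holds the variable names. The function `read_doe_inputs` and the file layout are assumptions for illustration only.

```python
import csv
import io


def read_doe_inputs(handle):
    """Read one DoE point per CSV row; the header holds the variable names."""
    return [
        {name: float(value) for name, value in row.items()}
        for row in csv.DictReader(handle)
    ]


csv_text = """data:geometry:wing:aspect_ratio,data:geometry:wing:sweep_25
9.0,20.0
12.0,24.5
"""
points = read_doe_inputs(io.StringIO(csv_text))
```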

* a way to have a model option as a DOE input

* in line with the previous point, some inputs could be integers, or even strings chosen from a list in the case of options, rather than floats

What you really want is discrete variables, even for model options. If you want an option as a DoE input, it could be implemented as a discrete variable instead of a model option. We should probably add this feature to FAST-OAD before allowing it to be controlled from a DoE.
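
For discrete variables, a full factorial design is the natural fit, since each variable only has a finite list of levels. A minimal sketch follows; the variable names `tail_type` and `engine_count` are hypothetical and are not existing FAST-OAD variables.

```python
import itertools


def full_factorial(levels):
    """Return every combination of the given levels, one dict per DoE point."""
    names = list(levels)
    combos = itertools.product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


levels = {
    "tail_type": ["conventional", "T-tail"],  # categorical (hypothetical name)
    "engine_count": [2, 4],  # integer (hypothetical name)
    "data:geometry:wing:aspect_ratio": [9.0, 13.5, 18.0],  # discretized float
}
design = full_factorial(levels)
```

The number of points is the product of the level counts (2 × 2 × 3 here), so this approach only scales to a handful of variables.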

* the possibility to use a different point selection method for different variables; for example, one could use LMS and a second full factorial

I have no opinion about that. It would fit easily in the configuration file, but what is the actual need: is it a nice-to-have, or really useful?

3 participants