Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: handling files from multipart/form-data request #415

Merged
merged 15 commits into from
Jun 3, 2024

Conversation

lsdch
Copy link
Contributor

@lsdch lsdch commented Apr 26, 2024

This PR aims to improve the handling of multipart/form-data requests by:

  • 3570bf1 allowing users to document the request body at the level of operations, which is currently not possible since it is overwritten by huma when RawBody is multipart.FormData. The generated schema is more of an example and does not reflect the actual form data structure, which should be specified by the user.
  • d6eb219 implementing MultipartFormFiles, which can be used to type RawBody in request inputs and provides logic for decoding and validating files from a multipart/form-data request as multipart.File values, as well as generating the associated schema in the spec. Although this could be extended to support data types other than files, it would require significant refactoring to reuse some of the logic from body/query/path/header parameters decoding, and I was not confident enough to engage into it.
  • 01ca953 implementing mimetype validation for submitted files

There is currently no automated tests for this PR, I would like to first get some feedback before spending more time on this (and still have to get familiar with humatest).

Example usage:

type FileData struct {
        // This is an example, any number of `multipart.File` fields can be defined.
        // Nested structs are not supported.
	SomeFile multipart.File   `form-data:"some_file" content-type:"image/png" required:"true"`
	SeveralFiles []multipart.File `form-data:"several_files" content-type:"image/png,image/jpeg" required:"true"`
}

type FileHandlerInput struct {
	RawBody huma.MultipartFormFiles[FileData]
}

func FileHandler(ctx context.Context, input *FileHandlerInput) (*struct{}, error) {
  fileData := input.RawBody.Data()
  DoSomeThingWith(fileData.SomeFile)
  OrSomethingElseWith(fileData.SeveralFiles)
}

huma.Register(api,
  huma.Operation{
	  Path:             "/handle-files",
	  Method:        http.MethodPost,
	  OperationID:     "Handle files",
  }, FileHandler)

When using MultipartFormFiles[T], only the fields in T that implement multipart.File or []multipart.File are considered when generating the spec or resolving a request. The raw form data can be accessed at MultipartFormFiles.Form which is a multipart.Form.

form-data tag specifies the lookup key to retrieve the encoded file in the multipart.Form. In its absence, the name of the field is used instead.
content-type specifies the acceptable mime types for a file. It is used to document the endpoint in the spec, and validate file format using http.DetectContentType
required makes the field required. In the case of []multipart.File, validation checks that at least one file was submitted.

Generated operation spec :

{
  "/handle-files": {
    "post": {
      "operationId": "HandleFiles",
      "requestBody": {
        "content": {
          "multipart/form-data": {
            "encoding": {
              "some_file": {
                "contentType": "image/png"
              },
              "several_files": {
                "contentType": "image/png,image/jpeg"
              }
            },
            "schema": {
              "properties": {
                "some_file": {
                  "format": "binary",
                  "type": "string"
                },
                "several_files": {
                  "items": {
                    "format": "binary",
                    "type": "string"
                  },
                  "type": "array"
                }
              },
              "required": [
                "some_file",
                "several_files"
              ],
              "type": "object"
            }
          }
        }
      }
    }
  }
}

Summary by CodeRabbit

  • New Features

    • Introduced functionality for handling multipart form data in HTTP requests.
    • Added support for validating MIME types in uploaded files.
    • Enhanced the ability to read and decode form data, including single and multiple file uploads.
  • Tests

    • Added multiple test cases to ensure robust handling of multipart form data, including validation, error handling, and content type detection.

Copy link

codecov bot commented Apr 26, 2024

Codecov Report

Attention: Patch coverage is 93.93939% with 12 lines in your changes are missing coverage. Please review.

Project coverage is 92.80%. Comparing base (e1b7179) to head (f1a58f1).

Files Patch % Lines
formdata.go 94.87% 4 Missing and 4 partials ⚠️
huma.go 90.47% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #415      +/-   ##
==========================================
+ Coverage   92.76%   92.80%   +0.03%     
==========================================
  Files          21       22       +1     
  Lines        3567     3750     +183     
==========================================
+ Hits         3309     3480     +171     
- Misses        220      226       +6     
- Partials       38       44       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lsdch lsdch marked this pull request as draft April 26, 2024 14:20
@danielgtaylor
Copy link
Owner

@lsdch this looks amazing! I will dig in and give it a thorough review soon, just wanted to say thanks for all the work on this and I like the idea of better file handling & documentation.

formdata.go Outdated
var contentTypeSplitRe = regexp.MustCompile(`,\s*`)

func NewMimeTypeValidator(encoding *Encoding) MimeTypeValidator {
return MimeTypeValidator{accept: contentTypeSplitRe.Split(encoding.ContentType, -1)}
Copy link
Contributor

@x-user x-user Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the point of using regex? Why not strings.Split or strings.Fileds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

466d277 Changed it to use strings.Split, I do agree it's better to avoid regex unless necessary.

@lsdch lsdch force-pushed the multipart-form-files branch from 4188450 to f1a58f1 Compare May 18, 2024 12:03
Copy link

coderabbitai bot commented May 18, 2024

Walkthrough

The changes introduce extensive support for handling multipart form data in HTTP requests within the huma package. This includes defining structures and methods for reading, validating, and decoding form files, as well as adding comprehensive test cases to ensure robust functionality.

Changes

Files Change Summary
formdata.go Added structures and methods for handling multipart form data, including file reading, validation, and decoding.
huma.go Renamed rawBodyMultipart to rawBodyDecodedMultipart in the Register function.
huma_test.go Added multiple test cases for multipart file handling, including validation, optional/required files, and content type checks.

🐰
In bytes and streams, we trust,
To handle forms, precise and just.
With MIME types checked and files read right,
Our code now gleams, a coder's delight.
So cheers to tests, both new and bold,
Our multipart forms, a tale well told.
🌟


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@lsdch
Copy link
Contributor Author

lsdch commented May 18, 2024

@danielgtaylor hey, finally took some time to write tests for this feature. Coverage is almost 100%, except for 3 error checks on file I/O that are hard to trigger.

API has changed a little, I introduced small convenience features to help working with optional files and content-type.
Also changed tag names to be more consistent with the naming conventions in Huma -- I don't mind changing them again to whatever you like.

I expect that some of the code could be better optimized, especially the parts using reflect, which I do not know in depth yet.

New usage example :

type FileData struct {
        // Now using huma.FormFile instead of multipart.File
        // FormFile is a simple wrapper around multipart.File which provides some convenience features
	SomeFile huma.FormFile   `form:"some_file" contentType:"image/png" required:"true"`
	SomeImage huma.FormFile   `form:"some_image" contentType:"image/*"`
	SeveralFiles []huma.FormFile `form:"several_files" contentType:"image/png,image/jpeg" required:"true"`
}

type FileHandlerInput struct {
	RawBody huma.MultipartFormFiles[FileData]
}

func FileHandler(ctx context.Context, input *FileHandlerInput) (*struct{}, error) {
  fileData := input.RawBody.Data()
  DoSomeThingWith(fileData.SomeFile)
  OrSomethingElseWith(fileData.SeveralFiles)

  if fileData.SomeImage.IsSet { // We can now check the presence of optional files
    fmt.Printf("Content type: %s", fileData.SomeImage.ContentType) // Actual content type (declared in the form or detected as fallback)
  } 
}

huma.Register(api,
  huma.Operation{
	  Path:             "/handle-files",
	  Method:        http.MethodPost,
	  OperationID:     "Handle files",
  }, FileHandler)

@lsdch lsdch marked this pull request as ready for review May 18, 2024 12:21
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between e1b7179 and f1a58f1.
Files selected for processing (3)
  • formdata.go (1 hunks)
  • huma.go (4 hunks)
  • huma_test.go (2 hunks)
Additional Context Used
GitHub Check Runs (1)
codecov/patch success (6)

formdata.go: [warning] 34-34: formdata.go#L34
Added line #L34 was not covered by tests


formdata.go: [warning] 44-44: formdata.go#L44
Added line #L44 was not covered by tests


formdata.go: [warning] 51-51: formdata.go#L51
Added line #L51 was not covered by tests


formdata.go: [warning] 91-91: formdata.go#L91
Added line #L91 was not covered by tests


huma.go: [warning] 641-641: huma.go#L641
Added line #L641 was not covered by tests


huma.go: [warning] 664-664: huma.go#L664
Added line #L664 was not covered by tests

Additional comments not posted (14)
formdata.go (1)

34-34: Increase test coverage for critical error handling paths.

The lines handling default MIME types and file opening errors are not covered by tests. Ensuring these are tested will improve the robustness of the feature.

Also applies to: 44-44, 51-51, 91-91

huma.go (3)

629-631: Ensure that the type check for MultipartFormFiles is robust.


641-641: This line was not covered by tests.


664-664: This line was not covered by tests.

huma_test.go (10)

14-14: Added import statement for os package.

This import is necessary for file operations in the test cases, particularly for handling multipart file uploads.


739-787: Comprehensive test for multipart file upload handling.

This test case effectively validates the handling of multipart form data, including required and optional fields, and checks the OpenAPI spec generation. The use of assert and require ensures that the test will halt on the first error, which is appropriate for unit tests.


813-852: Validates error handling for missing required multipart fields.

The test correctly checks for HTTP 422 Unprocessable Entity responses when required fields are missing, and it properly uses JSON unmarshalling to verify the error details.


855-867: Properly tests the handling of optional multipart fields.

The test ensures that the absence of an optional field does not set the IsSet flag, which is a crucial check for optional multipart form data handling.


875-909: Checks handling of incorrect cardinality in multipart file uploads.

This test case effectively checks for errors when multiple files are uploaded where only one is expected, validating the framework's ability to enforce file upload constraints.


912-943: Validates MIME type enforcement in multipart file uploads.

The test case correctly identifies and handles cases where the uploaded file's MIME type does not match the expected type, ensuring robust input validation.


946-962: Ensures default content type handling for multipart file uploads.

The test verifies that the default content type "application/octet-stream" is used when no specific content type is provided, which aligns with HTTP standards for binary data.


975-989: Tests wildcard content type handling in multipart file uploads.

This test case effectively handles wildcard content types, allowing for flexibility in accepted file types while still enforcing the specified major type.


1003-1015: Properly tests image file upload handling with specific content types.

The test ensures that image files are correctly handled and that the content type is validated against the expected types, which is crucial for handling binary data securely.


1035-1047: Tests the system's ability to detect and validate the content type of uploaded images.

This test case is crucial for ensuring that the system can correctly identify and validate the content type of uploaded files, which enhances security and data integrity.

Comment on lines +28 to +36
func NewMimeTypeValidator(encoding *Encoding) MimeTypeValidator {
var mimeTypes = strings.Split(encoding.ContentType, ",")
for i := range mimeTypes {
mimeTypes[i] = strings.Trim(mimeTypes[i], " ")
}
if len(mimeTypes) == 0 {
mimeTypes = []string{"application/octet-stream"}
}
return MimeTypeValidator{accept: mimeTypes}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider handling empty ContentType more robustly.

The current implementation defaults to "application/octet-stream" only when no content types are specified. It might be beneficial to handle cases where the content type is provided but empty, ensuring robustness.

- if len(mimeTypes) == 0 {
+ if len(mimeTypes) == 0 || (len(mimeTypes) == 1 && mimeTypes[0] == "") {
    mimeTypes = []string{"application/octet-stream"}
  }

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
func NewMimeTypeValidator(encoding *Encoding) MimeTypeValidator {
var mimeTypes = strings.Split(encoding.ContentType, ",")
for i := range mimeTypes {
mimeTypes[i] = strings.Trim(mimeTypes[i], " ")
}
if len(mimeTypes) == 0 {
mimeTypes = []string{"application/octet-stream"}
}
return MimeTypeValidator{accept: mimeTypes}
func NewMimeTypeValidator(encoding *Encoding) MimeTypeValidator {
var mimeTypes = strings.Split(encoding.ContentType, ",")
for i := range mimeTypes {
mimeTypes[i] = strings.Trim(mimeTypes[i], " ")
}
if len(mimeTypes) == 0 || (len(mimeTypes) == 1 && mimeTypes[0] == "") {
mimeTypes = []string{"application/octet-stream"}
}
return MimeTypeValidator{accept: mimeTypes}

formdata.go Show resolved Hide resolved
formdata.go Show resolved Hide resolved
formdata.go Show resolved Hide resolved
huma.go Show resolved Hide resolved
Comment on lines +1165 to +1179
if rawBodyMultipart {
f.Set(reflect.ValueOf(*form))
} else {
f.FieldByName("Form").Set(reflect.ValueOf(form))
r := f.Addr().
MethodByName("Decode").
Call([]reflect.Value{
reflect.ValueOf(op.RequestBody.Content["multipart/form-data"]),
})
errs := r[0].Interface().([]error)
if errs != nil {
WriteErr(api, ctx, http.StatusUnprocessableEntity, "validation failed", errs...)
return
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider refactoring this block to reduce complexity and improve readability.

- if rawBodyMultipart {
-     f.Set(reflect.ValueOf(*form))
- } else {
-     f.FieldByName("Form").Set(reflect.ValueOf(form))
-     r := f.Addr().
-         MethodByName("Decode").
-         Call([]reflect.Value{
-             reflect.ValueOf(op.RequestBody.Content["multipart/form-data"]),
-         })
-     errs := r[0].Interface().([]error)
-     if errs != nil {
-         WriteErr(api, ctx, http.StatusUnprocessableEntity, "validation failed", errs...)
-         return
-     }
- }
+ handleMultipartForm(f, form, rawBodyMultipart, op, api, ctx)

Consider extracting this logic into a separate function handleMultipartForm to handle the multipart form based on the rawBodyMultipart flag.


Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
if rawBodyMultipart {
f.Set(reflect.ValueOf(*form))
} else {
f.FieldByName("Form").Set(reflect.ValueOf(form))
r := f.Addr().
MethodByName("Decode").
Call([]reflect.Value{
reflect.ValueOf(op.RequestBody.Content["multipart/form-data"]),
})
errs := r[0].Interface().([]error)
if errs != nil {
WriteErr(api, ctx, http.StatusUnprocessableEntity, "validation failed", errs...)
return
}
}
if rawBodyMultipart {
f.Set(reflect.ValueOf(*form))
} else {
f.FieldByName("Form").Set(reflect.ValueOf(form))
r := f.Addr().
MethodByName("Decode").
Call([]reflect.Value{
reflect.ValueOf(op.RequestBody.Content["multipart/form-data"]),
})
errs := r[0].Interface().([]error)
if errs != nil {
WriteErr(api, ctx, http.StatusUnprocessableEntity, "validation failed", errs...)
return
}
}
```
</details>
<!-- suggestion_end -->
<!-- This is an auto-generated comment by CodeRabbit -->

@danielgtaylor
Copy link
Owner

@lsdch awesome work! I will dig in and do a thorough review ASAP! Thank you.

Copy link
Owner

@danielgtaylor danielgtaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work!

@danielgtaylor danielgtaylor merged commit fa7a38d into danielgtaylor:main Jun 3, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants