Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add -b output binary option #97

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jonahbeckford
Copy link

@jonahbeckford jonahbeckford commented Nov 9, 2024

Problem

Because cppo writes with open_out, CRLF are added to output files on Windows even when the input files are LF.

Context

  • .cmi CRC checksums are computed on the marshalled signatures of .mli files. These CRC checksums are the source of the The files xxx.cmi and yyy.cmi makes inconsistent assumptions on interface Zzz ocamlc/ocamlopt errors.
  • The marshalled signatures include the byte locations inside the .mli files (because the marshalled Types.signature indirectly references Lexer.position). Easy to see in the utop-full toplevel on any .cmi file:
    utop> require "compiler-libs" ;;
    utop> Cmi_format.read_cmi filename ;;
     (Types.Sig_modtype (<abstr>,
       {Types.mtd_type =
         Some
           (Types.Mty_signature
             [Types.Sig_value (<abstr>,
               {Types.val_type = <abstr>; val_kind = Types.Val_reg;
               val_loc =
                 {Location.loc_start = {Lexing.pos_fname = "endianString.cppo.mli"; pos_lnum = 21; pos_bol = 1326; pos_cnum = 1328}; 
                 loc_end = {Lexing.pos_fname = "endianString.cppo.mli"; pos_lnum = 21; pos_bol = 1326; pos_cnum = 1364}; 
                 loc_ghost = false};
               val_attributes = []; val_uid = Types.Uid.Item {Types.Uid.comp_unit = "EndianString"; id = 0}},
               Types.Exported);
    
  • CRLF inserted in .mli will therefore have different CRC checksums than original LF-ending .mli files

The net effect is that I can't share .cmi/.cma between Unix and Windows machines if CPPO is used as a preprocessor. The files xxx.cmi and yyy.cmi makes inconsistent assumptions on interface Zzz is the result.

Solution

Use open_out_bin. This PR hides it behind a command line option, but I actually think binary mode should always have been used.

Copy link
Member

@liyishuai liyishuai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an example in the test directory.

@jonahbeckford
Copy link
Author

jonahbeckford commented Nov 15, 2024

Sorry, but I have no clue what would be added there!

Also ...

but I actually think binary mode should always have been used.

It seems like the code was incorrect from the beginning, and this PR adding a CLI option is overkill. Can we start with that first ... why would cppo ever need to add carriage returns if the source has no carriage returns?

@liyishuai
Copy link
Member

It seems like the code was incorrect from the beginning, and this PR adding a CLI option is overkill. Can we start with that first ... why would cppo ever need to add carriage returns if the source has no carriage returns?

Yes this design makes more sense to me. We can change the default behavior to use open_out_bin, and provide some -text flag for backward compatibility.

@liyishuai
Copy link
Member

liyishuai commented Nov 16, 2024

Sorry, but I have no clue what would be added there!

  1. Input files *.cppo.bin in LF and in CRLF;
  2. Expected output files *.ref.bin when processed in binary mode and in text mode, on Windows and on Linux;
  3. .gitattribute configuration that tells Git to treat *.bin files as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants