Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Wide Strings in SOCI for Enhanced Unicode Handling #1133

Closed
wants to merge 71 commits into from

Conversation

bold84
Copy link

@bold84 bold84 commented Mar 22, 2024

This pull request adds comprehensive support for wide strings (wchar_t, std::wstring) to the SOCI database library, significantly improving its support for Unicode string types such as SQL Server's NVARCHAR and NTEXT. This enhancement is crucial for applications that require robust handling of international and multi-language data.

Key Changes:

  1. Introduced exchange_type_traits and exchange_traits Specializations:

    • These specializations facilitate the handling of wide strings during type exchange, ensuring proper conversion and management within the SOCI library.
  2. Updated ODBC Backend:

    • Added support for wide strings, specifically for wchar_t and std::wstring.
    • Adjusted the parameter binding and data retrieval mechanisms to correctly process wide characters.
  3. Enhanced Buffer Management:

    • Modified buffer allocation and management to accommodate the larger size of wide characters, which are essential for proper Unicode support.
    • Implemented logic to handle buffer size overflow, ensuring safety and stability when processing large text data.
  4. Improved Unicode Support:

    • Incorporated routines to convert between different Unicode encodings (UTF-16 and UTF-32 on Unix-like systems, native UTF-16 on Windows) to handle wide strings properly across various platforms.
  5. Extended Test Coverage:

    • Added comprehensive tests focusing on wide string handling, especially ensuring compatibility with SQL Server.
    • Included edge cases for large strings to test buffer management and overflow handling.

Notes:

  • The focus of this pull request is on the ODBC backend.
  • While the original pull request mentioned a focus on C++17 standards, this has been removed to maintain compatibility with earlier C++ versions, ensuring broader usability of these enhancements.
  • This work sets a foundational framework to extend wide string support to other database backends in SOCI in future updates.

This update significantly bolsters SOCI's capabilities in handling Unicode data, making it a more versatile and powerful tool for database interactions in multi-language applications.

Example usage

Here are a few examples showing how the new wide string features can be used with the ODBC backend.

Example 1: Handling std::wstring in SQL Queries

Inserting and Selecting std::wstring Data

#include <soci.h>
#include <soci-odbc.h>
#include <iostream>

int main()
{
    try
    {
        soci::session sql(soci::odbc, "DSN=my_datasource;UID=user;PWD=password");

        // Create table with NVARCHAR column
        sql << "CREATE TABLE soci_test (id INT IDENTITY PRIMARY KEY, wide_text NVARCHAR(40) NULL)";

        // Define a wstring to insert
        std::wstring wide_str_in = L"Hello, 世界!";
        
        // Insert the wstring
        sql << "INSERT INTO soci_test(wide_text) VALUES (:wide_text)", soci::use(wide_str_in);

        // Retrieve the wstring
        std::wstring wide_str_out;
        sql << "SELECT wide_text FROM soci_test WHERE id = 1", soci::into(wide_str_out);

        // Output the retrieved wstring
        std::wcout << L"Retrieved wide string: " << wide_str_out << std::endl;
    }
    catch (const soci::soci_error& e)
    {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}

Example 2: Working with wchar_t Vectors

Inserting and Selecting Wide Characters

#include <soci.h>
#include <soci-odbc.h>
#include <iostream>
#include <vector>

int main()
{
    try
    {
        soci::session sql(soci::odbc, "DSN=my_datasource;UID=user;PWD=password");

        // Create table with NCHAR column
        sql << "CREATE TABLE soci_test (id INT IDENTITY PRIMARY KEY, wide_char NCHAR(2) NULL)";

        // Define a vector of wide characters to insert
        std::vector<wchar_t> wide_chars_in = {L'A', L'B', L'C', L'D'};
        
        // Insert the wide characters
        sql << "INSERT INTO soci_test(wide_char) VALUES (:wide_char)", soci::use(wide_chars_in);

        // Retrieve the wide characters
        std::vector<wchar_t> wide_chars_out(4);
        sql << "SELECT wide_char FROM soci_test WHERE id IN (1, 2, 3, 4)", soci::into(wide_chars_out);

        // Output the retrieved wide characters
        for (wchar_t ch : wide_chars_out)
        {
            std::wcout << L"Retrieved wide char: " << ch << std::endl;
        }
    }
    catch (const soci::soci_error& e)
    {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}

Example 3: Using std::wstring with the sql Stream Operator

Inserting and Selecting std::wstring Data with Stream Operator

#include <soci.h>
#include <soci-odbc.h>
#include <iostream>

int main()
{
    try
    {
        soci::session sql(soci::odbc, "DSN=my_datasource;UID=user;PWD=password");

        // Create table with NVARCHAR column
        sql << "CREATE TABLE soci_test (id INT IDENTITY PRIMARY KEY, wide_text NVARCHAR(40) NULL)";

        // Define a wstring to insert
        std::wstring wide_str_in = L"Hello, 世界!";
        
        // Use stream operator to insert the wstring
        sql << "INSERT INTO soci_test(wide_text) VALUES (N'" << wide_str_in << "')";
        
        // Retrieve the wstring using stream operator
        std::wstring wide_str_out;
        sql << "SELECT wide_text FROM soci_test WHERE id = 1", soci::into(wide_str_out);

        // Output the retrieved wstring
        std::wcout << L"Retrieved wide string: " << wide_str_out << std::endl;
    }
    catch (const soci::soci_error& e)
    {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}

In this example:

  1. A soci::session object is created to connect to the database.
  2. A table is created with an NVARCHAR column.
  3. A std::wstring is defined for insertion.
  4. The sql stream operator is used to insert the std::wstring into the database. Note the use of N' to indicate a Unicode string in SQL Server.
  5. The std::wstring is retrieved from the database using the sql stream operator and the soci::into function.
  6. The retrieved wide string is printed to the console using std::wcout.

These examples demonstrate how to insert and retrieve wide strings and wide characters using SOCI's newly added features for handling wide strings (wchar_t, std::wstring).

Limitation: The current implementation does not handle combining characters correctly. Combining characters, such as accents or diacritical marks, are treated separately instead of being combined with their base characters. This limitation may result in incorrect conversions for strings containing combining characters. A potential solution would be to incorporate Unicode normalization before the conversion process to ensure that combining characters are properly combined with their base characters.
Unicode defines several normalization forms (e.g., NFC, NFD, NFKC, NFKD), each with its own set of rules and behaviors. Choosing the appropriate normalization form is crucial, as different forms may produce different results.

To have full Unicode support, linking against a library like ICU or iconv is necessary. It can be made optional.

Disclaimer: This text is partially AI generated.

@bold84 bold84 marked this pull request as draft March 22, 2024 11:25
@bold84
Copy link
Author

bold84 commented Mar 22, 2024

Converting from UTF-16 to UTF-8 is no problem when retrieving data, because the column data type is known.
When inserting/updating though, it is not so straightforward, as we don't have programmatic knowledge of the column data type in advance.

I'm thinking of adding another argument to "soci::use()" that lets the developer override the data type that's used for the underlying ODBC call.

Another issue is the currently non-existing N'' enclosure for unicode strings for MSSQL in case of soci::use().

Another issue is the stream interface. Currently std::wstring isn't supported and as far as I understand, supporting it would require widening the query to UTF-16 before sending it to the DB.

@bold84 bold84 marked this pull request as ready for review March 22, 2024 13:09
Copy link
Member

@vadz vadz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This globally looks good but there are globally 2 issues:

  1. The new functionality needs to be documented, notably it should be clearly stated that wstring and wchar_t are only supported in the ODBC backend (and only when using SQL Server?).
  2. The use of/checks for C++17 are confusing as it's not clear if it is required for wide char support or if it's just some kind of optimization (in the latter case I'd drop it, it's not worth the extra code complexity).

include/private/soci-vector-helpers.h Outdated Show resolved Hide resolved
include/soci/soci-backend.h Show resolved Hide resolved
src/core/use-type.cpp Outdated Show resolved Hide resolved
include/soci/odbc/soci-odbc.h Outdated Show resolved Hide resolved
@bold84 bold84 force-pushed the wstring_support branch from d1e7608 to a9f7996 Compare April 1, 2024 00:11
@bold84 bold84 marked this pull request as draft April 1, 2024 00:14
bold84 added 9 commits April 1, 2024 13:33
This commit updates the Unicode conversion functions to handle UTF-16 on Windows and UTF-32 on other platforms.

The changes include:

1. Updating the `utf8_to_wide` and `wide_to_utf8` functions to handle UTF-32 on Unix/Linux platforms.
2. Updating the `copy_from_string` function to handle UTF-16 on Windows and convert UTF-32 to UTF-16 on other platforms.
3. Updating the `bind_by_pos` function to handle UTF-16 on Windows and convert UTF-32 to UTF-16 on other platforms.
4. Adding a test case for wide strings in the ODBC MSSQL tests.
@vadz
Copy link
Member

vadz commented Nov 14, 2024

Thanks, I'm doing some final checks and I realize I have some more questions:

  1. Do we really need to have x_wchar? I realize that it's symmetric to x_char, but I just don't see what would it ever be useful for. I've never had to work with individual Unicode characters, only Unicode strings and a wchar_t is not even capable of representing individual characters (if they require surrogates in UTF-16 under Windows). So maybe it would be simpler to just avoid introducing it?
  2. It's unfortunate to have to define a special accumulate() overload in ref_counted_statement_base for wide strings. I wonder if we could find some way to avoid doing it, e.g. maybe adding to_string() to exchange_traits?
  3. I'm not sure if we really want to have all the functions inline in soci-unicode.h. Was there some particular reason to define them there or should we move the definitions for (some of) them to src/core/unicode.cpp?
  4. Last but the most important: I think we need to provide a fallback implementation of std::wstring for all the other backends, as I don't see how would you use it currently if you ever plan to use anything but SQL Server. I.e. I believe we should provide the default support for wstring, converting it to UTF-8, and then override it with native handling in the ODBC backend, rather than not supporting it in all the other backends at all.

What do you think?

@bold84
Copy link
Author

bold84 commented Nov 15, 2024

Thanks, I'm doing some final checks and I realize I have some more questions:

  1. Do we really need to have x_wchar? I realize that it's symmetric to x_char, but I just don't see what would it ever be useful for. I've never had to work with individual Unicode characters, only Unicode strings and a wchar_t is not even capable of representing individual characters (if they require surrogates in UTF-16 under Windows). So maybe it would be simpler to just avoid introducing it?

Not sure if it's really needed. There was a char, so I added a wchar_t. 🤷🏻‍♂️
But I agree with you, the value appears to be little.

  1. It's unfortunate to have to define a special accumulate() overload in ref_counted_statement_base for wide strings. I wonder if we could find some way to avoid doing it, e.g. maybe adding to_string() to exchange_traits?

I would have to look into that. But I can't promise when I can get to that. I have a full plate at the moment. 😕

  1. I'm not sure if we really want to have all the functions inline in soci-unicode.h. Was there some particular reason to define them there or should we move the definitions for (some of) them to src/core/unicode.cpp?

Good point. I will do that today, as it doesn't take long.

  1. Last but the most important: I think we need to provide a fallback implementation of std::wstring for all the other backends, as I don't see how would you use it currently if you ever plan to use anything but SQL Server. I.e. I believe we should provide the default support for wstring, converting it to UTF-8, and then override it with native handling in the ODBC backend, rather than not supporting it in all the other backends at all.

What do you think?

I fully agree. But at the time, I didn't want to make this a larger PR than it already is (and I expected it to be). It's quite a lot of fun to get it all right on all 3 platforms, as you know.
When do you plan to release 4.1?

@vadz
Copy link
Member

vadz commented Nov 15, 2024

Thanks, I can try to do (2) and do (3) myself, I just wanted to check if there was any reason not to.

As for 4.1, I wanted to make it a.s.a.p. just because I feel like I've already delayed it way too much...

@bold84
Copy link
Author

bold84 commented Nov 15, 2024

Thanks, I can try to do (2) and do (3) myself, I just wanted to check if there was any reason not to.

As for 4.1, I wanted to make it a.s.a.p. just because I feel like I've already delayed it way too much...

Okay, then I will try to see if I can get 4 done using an LLM. I fed the relevant sources and the diff to 2 LLMs and one of the responses look very good for mysql. I will look into this later tonight.

@vadz
Copy link
Member

vadz commented Nov 15, 2024

Concerning (2): after looking at it again, I think this accumulate() overload should be just removed. AFAICS it's only used in order to allow using session << wide_string, but it doesn't really make sense to implicitly convert wide strings to UTF-8 here, this is inconsistent with the standard streams and hides an important conversion by making it implicit.

We want using wstring, i.e. passing them to soci::use(), to work, but I don't think we want to be able to use them directly in the query at all. Why would we want this?

@bold84
Copy link
Author

bold84 commented Nov 15, 2024

Concerning (2): after looking at it again, I think this accumulate() overload should be just removed. AFAICS it's only used in order to allow using session << wide_string, but it doesn't really make sense to implicitly convert wide strings to UTF-8 here, this is inconsistent with the standard streams and hides an important conversion by making it implicit.

We want using wstring, i.e. passing them to soci::use(), to work, but I don't think we want to be able to use them directly in the query at all. Why would we want this?

So we don't have to convert them before using them. Naturally, there are always pros and cons.
Consider all the code as an offer, I tried to make it as complete (for one backend) as possible.

@vadz
Copy link
Member

vadz commented Nov 15, 2024

Sorry, I still don't understand: under which circumstances would you have wide strings as part of the query instead of using them as parameters?

Anyhow, I've realized there is a worse problem here:

  std::string str_out_utf8;
  sql << "select wide_text from soci_test", into(str_out_utf8);

doesn't work correctly, the existing "MS SQL wide string" test only passes because it uses ASCII strings. Changing them to this:

  std::wstring const str_in = L"Привет, SOCI!";
  std::string const str_in_utf8 = "\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82, SOCI!";

makes it fail, at least when using unixODBC SQL Server driver under Linux.

@vadz
Copy link
Member

vadz commented Nov 15, 2024

But actually I don't see anything that would even attempt to make this work, so I think it's just the test which is wrong and needs to be removed. I.e. there is no provision for retrieving SQL_WVARCHAR columns as UTF-8-encoded strings at all there currently, is there?

@vadz
Copy link
Member

vadz commented Nov 15, 2024

Also, while doing (3) I've realized that this code looks like something extracted from another project as it's not written in SOCI style. What is its copyright/licence and can we include it in SOCI at all?

@vadz
Copy link
Member

vadz commented Nov 15, 2024

I've pushed my changes to #1179, but I still have important questions about the origin of the Unicode-related functions code.

@bold84
Copy link
Author

bold84 commented Nov 15, 2024

Also, while doing (3) I've realized that this code looks like something extracted from another project as it's not written in SOCI style. What is its copyright/licence and can we include it in SOCI at all?

That is likely because of the influence that OpenAI's, Anthropic's and Alibaba's LLMs had on the code. 😉
I have not extracted any code from another project / source.

@bold84
Copy link
Author

bold84 commented Nov 15, 2024

I ran the code through copyleaks.

CleanShot 2024-11-16 at 01 31 44@2x

soci-unicode.h - report.pdf

@bold84
Copy link
Author

bold84 commented Nov 15, 2024

Sorry, I still don't understand: under which circumstances would you have wide strings as part of the query instead of using them as parameters?

Anyhow, I've realized there is a worse problem here:

  std::string str_out_utf8;
  sql << "select wide_text from soci_test", into(str_out_utf8);

doesn't work correctly, the existing "MS SQL wide string" test only passes because it uses ASCII strings. Changing them to this:

  std::wstring const str_in = L"Привет, SOCI!";
  std::string const str_in_utf8 = "\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82, SOCI!";

makes it fail, at least when using unixODBC SQL Server driver under Linux.

Good catch. I think this was the first test I added and at the time it did not contain unicode strings yet, as I focussed only on wstring at first. So, I think this is an issue that should be fixed. I will look into this tomorrow (it's 1:38 here now).

@vadz
Copy link
Member

vadz commented Nov 15, 2024

OK, thanks, I think we could merge #1179 then and enhance it further in other PRs because I really believe that the other problem is just due to support for interoperability between SQL_WVARCHAR and std::string being not implemented at all, not a bug per se.

FYI, I did change/fix a couple of things in the Unicode functions (calling a void function is_xxx is really confusing, and so is claiming that it returns something in its doc comments) and I'd like to change a few more things there too, but I think that we're perhaps going to switch to using ICU anyhow in the future, so I didn't want to spend time on this now.

@bold84
Copy link
Author

bold84 commented Nov 15, 2024

#1179

Thank you for the fixes.
I think using ICU is definitely the better solution!
Okay, I will add support for the other backends in new PRs.

@vadz
Copy link
Member

vadz commented Nov 16, 2024

Sorry, it's worse than I thought: it's broken under Windows when using native ODBC driver, see e.g. this build job. If even this doesn't work, we really can't merge it.

@bold84
Copy link
Author

bold84 commented Nov 16, 2024

Sorry, it's worse than I thought: it's broken under Windows when using native ODBC driver, see e.g. this build job. If even this doesn't work, we really can't merge it.

Hmm... the tests did pass before. I am not sure what the cause is.
I will have a look.

@vadz
Copy link
Member

vadz commented Nov 16, 2024

They passed before because they used 7 bit ASCII strings. I've modified them to use non-ASCII ones and checked that this still worked under Linux, but not under Windows -- where they fail.

@bold84
Copy link
Author

bold84 commented Nov 16, 2024

#1179

I see.
I'm going to fix it now.

@bold84
Copy link
Author

bold84 commented Nov 16, 2024

It's actually a limitation of the Microsoft compiler. A better explanation follows within the next 30 minutes.

@bold84
Copy link
Author

bold84 commented Nov 16, 2024

When using wide string literals (L"…") in C++ on Windows, particularly with non-ASCII characters like Cyrillic, it's important to understand how these are interpreted:

  1. Wide String and wchar_t: On Windows, wchar_t typically represents UTF-16 encoded values. However, direct use of wide string literals with non-ASCII characters could lead to misinterpretation if the source file's encoding and the compiler's expected encoding don't match.

  2. Source File Encoding: The source file containing the code must be saved with an encoding that matches the expected encoding when compiled. Commonly, UTF-8 is used, but Windows by default might read it differently if not specified with BOM (Byte Order Mark) or compiler options.

  3. Compiler Behavior: The Microsoft compiler (MSVC), for instance, may interpret characters in wide string literals based on the system's locale and code page if the encoding isn't explicitly defined, leading to misinterpretation of characters.

  4. Usage of Unicode Escape Sequences: To ensure accurate representation irrespective of encoding issues, you can use Unicode escape sequences (e.g., L"\u041F") which define characters by their Unicode code point. This guarantees that characters are correctly represented as UTF-16, the intended wide character format for databases like MS SQL.

  5. Database Consideration: Databases expecting UTF-16 encoding, like MS SQL for NVARCHAR columns, require that the application code correctly interprets and transmits the string in this format.

std::wstring const str_in = L"Привет, SOCI!";

leads to:

CleanShot 2024-11-16 at 23 09 17@2x

std::wstring const str_in = L"\u041F\u0440\u0438\u0432\u0435\u0442, SOCI!";

leads to:

CleanShot 2024-11-16 at 23 10 34@2x

This is also the reason I used this method in the unicode tests.

Here are the updated versions of the test methods:

TEST_CASE("MS SQL wide string", "[odbc][mssql][wstring]")
{
  soci::session sql(backEnd, connectString);

  wide_text_table_creator create_wide_text_table(sql);
  
  // std::wstring const str_in = L"Привет, SOCI!";
  std::wstring const str_in = L"\u041F\u0440\u0438\u0432\u0435\u0442, SOCI!";
  
  sql << "insert into soci_test(wide_text) values(:str)", use(str_in);

  std::wstring str_out;
  sql << "select wide_text from soci_test", into(str_out);

  CHECK(str_out == str_in);
}

TEST_CASE("MS SQL wide string vector", "[odbc][mssql][vector][wstring]")
{
  soci::session sql(backEnd, connectString);

  wide_text_table_creator create_wide_text_table(sql);

  std::vector<std::wstring> const str_in = {
      L"\u041F\u0440\u0438\u0432\u0435\u0442, SOCI!", // Привет, SOCI!
      L"\u041F\u0440\u0438\u0432\u0435\u0442, World!", // Привет, World!
      L"\u041F\u0440\u0438\u0432\u0435\u0442, Universe!", // Привет, Universe!
      L"\u041F\u0440\u0438\u0432\u0435\u0442, Galaxy!"}; // Привет, Galaxy!

  sql << "insert into soci_test(wide_text) values(:str)", use(str_in);

  std::vector<std::wstring> str_out(4);

  sql << "select wide_text from soci_test", into(str_out);

  CHECK(str_out.size() == str_in.size());
  for (std::size_t i = 0; i != str_in.size(); ++i)
  {
    CHECK(str_out[i] == str_in[i]);
  }
}

Result:

CleanShot 2024-11-16 at 23 13 53@2x

@bold84
Copy link
Author

bold84 commented Nov 16, 2024

c00baa4

@vadz
Copy link
Member

vadz commented Nov 16, 2024

Strange, I thought we already used /utf-8 because I saw that the tests using L"..." in test-unicode.cpp passed, how comes it works there but not in this file?

But we can use \u, of course, even if it's not nice...

@bold84
Copy link
Author

bold84 commented Nov 16, 2024

I remember that I have had lots of trouble before on Windows because of this issue.
MSVC doesn't seems to be compliant. GCC and Clang are. But that's probably not a surprise.

@bold84
Copy link
Author

bold84 commented Nov 16, 2024

We're using L"..." and \u.

Generally, the following should work as well:

std::u16string const utf16 = u"Привет, SOCI!";
std::wstring str_in(utf16.begin(), utf16.end());

but it doesn't with MSVC.

I created another test (scroll right for comments):

TEST_CASE("MS SQL utf8-to-utf16 string", "[odbc][mssql][wstring]")
{
 soci::session sql(backEnd, connectString);

 wide_text_table_creator create_wide_text_table(sql);
 
 std::string const utf8 = "Привет, SOCI!";                                                                   // works
 //std::string const utf8 = u8"Привет, SOCI!";                                                               // doesn't work
 // std::string const utf8 = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82\x2C\x20\x53\x4F\x43\x49\x21"; // works
 std::u16string utf16 = soci::details::utf8_to_utf16(utf8);
 std::wstring str_in(utf16.begin(), utf16.end());
 

 sql << "insert into soci_test(wide_text) values(:str)", use(str_in);

 std::wstring str_out;
 sql << "select wide_text from soci_test", into(str_out);

 CHECK(str_out == str_in);
}

@vadz
Copy link
Member

vadz commented Nov 16, 2024

OK, I figured it out: the tests in test-unicode.cpp are broken too, it's just that they still pass because the LHS and RHS of comparisons are broken in the same way, so the tests are still true.

Also, the fact that scalar ODBC wstring test and the first 2 checks in the vector test pass is due to the fact that the column size is 40 and when interpreting the file as CP1252 and not UTF-8 (as MSVC did), the last 2 strings became longer than 40 characters, while the first 2 ones still fit.

Finally, while debugging this, I've reread the code more carefully and I see many redundant comments and other things which don't make sense (why have utf16_to_wide() only to never use it and duplicate what it does in the code which needs it?). I realize now that it must have been generated by an LLM and while I'm not categorically against using LLMs, like some other people, I am strongly against not reviewing their results carefully. I'll fix them myself for this PR but I really hope you can do it if you keep using them.

@vadz
Copy link
Member

vadz commented Nov 16, 2024

It's really not great that the more I look at this code, the more problems I find in it :-(

Dividing the column size by sizeof(SQLWCHAR) is wrong, the column size is in characters, not in bytes.

@bold84
Copy link
Author

bold84 commented Nov 16, 2024

Great that you were able to figure this out. I wasn't able to back then. And due to posts on numerous websites (StackOverflow and others) suggesting an issue with MSVC while the builds with GCC and Clang worked, I ended up believing that it's really MSVC. I took a wrong turn when I should have dug deeper. Clearly a mistake by me. 🤷🏻‍♂️

The utf16_to_wide was used before but the consuming code was removed.
Similar to how utf8_to_wide usage was removed when you removed the accumulate overload.

I'm much more aware of (and experienced with) the issues that come with LLM generated code than I was when most of this code was created.
It's a large change/addition, created over a period of 1.5 years with several long interruptions. I would be surprised if you didn't find any issue in it. Without LLM usage, you wouldn't have seen a "division by sizeof(SQLWCHAR)" though. 😅

@vadz
Copy link
Member

vadz commented Nov 17, 2024

I've removed most of unnecessary LLM-generated stuff and made conversions more efficient to avoid copying potentially huge strings multiple times.

There is still a lot of completely unused code remaining, notably conversions from UTF-8 and also one of UTF-16/32 conversions to it. I'm not sure if it should be removed or preserved to be used later for wstring support in the other backends, for now I'm leaving it here just because I already spent vastly more time on this than I ever planned to.

I'm also closing this PR and will merge #1179 if nobody sees any other problems with it.

@vadz vadz closed this Nov 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants