Fails to break long multibyte strings. #273
Reference: tslocum/tinyib#273
When TINYIB_WORDBREAK is set, posting a long multibyte string (Japanese text, for example) causes problems such as the post content being cut off partway through.
Long sentences without spaces, like the one below, are common on the Japanese board Futaba. However, the WORDBREAK processing does not seem to handle multibyte encodings correctly, so the text is cut off in the middle.
(An example of a famous Futaba meme, KOUSHIROU-bot. The encoding is UTF-8.)
Sending this sentence results in something like the attached image.
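To illustrate the kind of failure described here, the sketch below compares a byte-oriented regex break with a UTF-8-aware one. The patterns, the limit of 4, and the variable names are assumptions chosen for illustration; they are not taken from TinyIB's source.

```php
<?php
// Illustrative only: these patterns are assumptions, not TinyIB's actual code.
$text  = str_repeat('あ', 6); // 6 characters, 18 bytes in UTF-8
$limit = 4;

// Without the /u modifier PCRE counts bytes, so a break lands inside a character.
$byteWise = preg_replace('/(\S{' . $limit . '})(?=\S)/', '$1' . "\n", $text);

// With /u the same pattern counts whole code points.
$charWise = preg_replace('/(\S{' . $limit . '})(?=\S)/u', '$1' . "\n", $text);

// preg_match('//u', ...) returns 1 only when the subject is valid UTF-8.
var_dump(preg_match('//u', $byteWise) === 1); // bool(false)
var_dump(preg_match('//u', $charWise) === 1); // bool(true)
```

The byte-wise result contains split three-byte sequences, which browsers render as replacement characters or truncated text.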
We found that the TINYIB_WORDBREAK handling does not treat multibyte strings correctly, so we applied the following temporary fix. It assumes the encoding is UTF-8.
This modified code probably has the following problems.
This code will not work correctly when sending multibyte strings without the mbstring module enabled. However, most people who need to post UTF-8 multibyte strings on TinyIB will have the mbstring module enabled.
This code cannot handle multibyte encodings other than UTF-8, because the PCRE functions do not work properly with them (sjis, for example). If you run into this problem, you will have no choice but to force UTF-8 encoding.
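For reference, a UTF-8-aware word break can be sketched as below. This is not the fix posted in this issue and not TinyIB's implementation; the function name `wordbreak_utf8` and its parameters are hypothetical. It relies only on the PCRE `/u` modifier, so it does not need mbstring, and it returns the input unchanged when the text is not valid UTF-8, which matches the UTF-8-only limitation noted above.

```php
<?php
// Hypothetical helper, not TinyIB's actual code. Assumes $message is UTF-8.
function wordbreak_utf8($message, $limit, $identifier) {
    // The /u modifier fails on invalid UTF-8 (for example sjis input),
    // so leave such text untouched instead of corrupting it.
    if (preg_match('//u', $message) !== 1) {
        return $message;
    }

    // Match runs of $limit non-space code points that are followed by
    // another non-space code point, and append the break marker.
    $pattern = '/(\S{' . (int) $limit . '})(?=\S)/u';
    $result = preg_replace_callback(
        $pattern,
        function ($m) use ($identifier) {
            // A callback avoids "$" or "\" in $identifier being parsed
            // as a backreference.
            return $m[1] . $identifier;
        },
        $message
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result === null ? $message : $result;
}

echo wordbreak_utf8('abcdefgh', 4, '-'), "\n"; // abcd-efgh
```

Text in another encoding would have to be converted to UTF-8 before calling a helper like this.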
Thanks for reporting this. There are several areas of TinyIB which do not handle UTF-8 properly. I will look into this when I have the time.
This will probably require raising the minimum supported PHP version. PHP 7 was released in late 2015, about eight years ago now.
gettext probably does not work on PHP 5.
Using gettext for translation probably requires PHP 7.4 or later, because the gettext library uses typed class properties.
It did not work with PHP 5.6.40 because of the following error in gettext.
Thanks. I had upgraded the gettext library to a version that was incompatible with versions before PHP 7. I've resolved this just now so TinyIB can continue to work on PHP versions before 7.
I checked the latest sources and confirmed that TinyIB works with PHP 5.6.40. In my environment, the modified part of TINYIB_WORDBREAK_MULTIBYTE also works, but this may be due to the behavior of mbstring.
In Japan, mbstring is used in almost all cases, so it is difficult to verify operation without mbstring.