

#1 March 18, 2008 04:52:32

Stephen B.

[PHP-DEV] [PATCH] Bug 43477 - Unicode error mode ignored


Attached is a simple proposed patch that fixes Bug 43477. Basically, the code
that set the error mode of the ICU converter was giving it an instruction
(the context parameter) to only skip or substitute if the code point was not
represented in the new encoding. However, it still was returning an error for
illegal sequences.
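The effect of the context parameter can be sketched with a small toy model in C. It is not ICU's source: reason_t, err_t, skip_callback, continues_after and the STOP_ON_ILLEGAL value are hypothetical names. The logic only mirrors the behaviour described above, where a "stop on illegal" context makes the skip callback clear the error for unassigned code points alone, while a NULL context (as in the patch) skips illegal sequences too.

```c
/* Toy model, not ICU source: how a skip-style error callback might use its
 * context argument. All names here are hypothetical. */
#include <assert.h>
#include <stddef.h>

typedef enum { REASON_UNASSIGNED, REASON_ILLEGAL } reason_t;
typedef enum { ERR_NONE, ERR_UNASSIGNED, ERR_ILLEGAL } err_t;

/* Hypothetical stand-in for a "stop on illegal" context value. */
static const char STOP_ON_ILLEGAL[] = "i";

/* Invoked when conversion fails; *err already holds the failure code.
 * Clearing *err means "skip this input and keep converting". */
void skip_callback(const void *context, reason_t reason, err_t *err)
{
    assert(err != NULL);
    const char *ctx = context;
    if (ctx == NULL ||
        (ctx[0] == STOP_ON_ILLEGAL[0] && reason == REASON_UNASSIGNED))
        *err = ERR_NONE;   /* skip and continue */
    /* otherwise *err stays set and the conversion stops with an error */
}

/* Returns 1 if conversion would continue after a failure of this kind. */
int continues_after(const void *context, reason_t reason)
{
    err_t err = (reason == REASON_ILLEGAL) ? ERR_ILLEGAL : ERR_UNASSIGNED;
    skip_callback(context, reason, &err);
    return err == ERR_NONE;
}
```

In this model, continues_after(STOP_ON_ILLEGAL, REASON_ILLEGAL) returns 0, so an illegal byte such as 0xF8 still aborts the conversion even in "skip" mode; with a NULL context it returns 1.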

The test suite returns the same results with or without the patch. Test also
attached.

-Stephen Bach

Index: Zend/zend_unicode.c
===================================================================
RCS file: /repository/ZendEngine2/zend_unicode.c,v
retrieving revision 1.37
diff -u -r1.37 zend_unicode.c
--- Zend/zend_unicode.c 31 Dec 2007 07:12:07 -0000 1.37
+++ Zend/zend_unicode.c 15 Mar 2008 23:37:36 -0000
@@ -47,16 +47,16 @@

case ZEND_CONV_ERROR_SKIP:
if (direction == ZEND_FROM_UNICODE)
- ucnv_setFromUCallBack(conv, UCNV_FROM_U_CALLBACK_SKIP, UCNV_SKIP_STOP_ON_ILLEGAL, NULL, NULL, &status);
+ ucnv_setFromUCallBack(conv, UCNV_FROM_U_CALLBACK_SKIP, NULL, NULL, NULL, &status);
else
- ucnv_setToUCallBack(conv, UCNV_TO_U_CALLBACK_SKIP, UCNV_SKIP_STOP_ON_ILLEGAL, NULL, NULL, &status);
+ ucnv_setToUCallBack(conv, UCNV_TO_U_CALLBACK_SKIP, NULL, NULL, NULL, &status);
break;

case ZEND_CONV_ERROR_SUBST:
if (direction == ZEND_FROM_UNICODE)
- ucnv_setFromUCallBack(conv, UCNV_FROM_U_CALLBACK_SUBSTITUTE, UCNV_SUB_STOP_ON_ILLEGAL, NULL, NULL, &status);
+ ucnv_setFromUCallBack(conv, UCNV_FROM_U_CALLBACK_SUBSTITUTE, NULL, NULL, NULL, &status);
else
- ucnv_setToUCallBack(conv, UCNV_TO_U_CALLBACK_SUBSTITUTE, UCNV_SUB_STOP_ON_ILLEGAL, NULL, NULL, &status);
+ ucnv_setToUCallBack(conv, UCNV_TO_U_CALLBACK_SUBSTITUTE, NULL, NULL, NULL, &status);
break;

case ZEND_CONV_ERROR_ESCAPE_UNICODE:

--TEST--
Bug #43477 (Unicode error mode)
--FILE--
<?php
var_dump(unicode_decode(b"\xF8", 'UTF-8', U_CONV_ERROR_SKIP));
?>
--EXPECT--
unicode(0) ""

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit:http://www.php.net/unsub.php


#2 March 18, 2008 18:43:54

Andrei Z.



Why would we not want to stop on illegal sequences?

-Andrei

Stephen Bach wrote:
> Attached is a simple proposed patch that fixes Bug 43477. Basically, the code
> that set the error mode of the ICU converter was giving it an instruction
> (the context parameter) to only skip or substitute if the code point was not
> represented in the new encoding. However, it still was returning an error for
> illegal sequences.
>
> The test suite returns the same results with or without the patch. Test also
> attached.
>
> -Stephen Bach


#3 March 18, 2008 19:26:17

Geoffrey S.



On 18 Mar 2008, at 17:43, Andrei Zmievski wrote:
> Why would we not want to stop on illegal sequences?

Stuff relies on things _not_ stopping. No web browser stops on an
illegal sequence: they all use some replacement character (U+FFFD
REPLACEMENT CHARACTER in most). Sure, idealists will say that stopping
on errors will get the errors fixed, but that manifestly isn't true.
There's a huge amount of ill-formed XML out there. Shipping something
that required strings to be valid is just asking to not be used. It
breaks too much.

We already claim to have error modes that don't stop on error. This is
broken.

--
Geoffrey Sneddon
<http://gsnedders.com/>




#4 March 18, 2008 20:37:53

Stephen B.



I'm just suggesting that other error modes should do what they claim to do.
Stopping on an illegal sequence is fine, unless the user had called a
function telling the converter to do something else.

U_CONV_ERROR_STOP: stops on illegal character (the default)
U_CONV_ERROR_ESCAPE_*: 5 different modes that escape the illegal sequence in
various ways

Shouldn't U_CONV_ERROR_SKIP and U_CONV_ERROR_SUBST work the same way?
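The behaviour Stephen asks for can be illustrated with a toy UTF-8 decoder in C in which STOP, SKIP and SUBST differ only in what happens at an illegal byte. This is a hypothetical sketch, not PHP's or ICU's implementation: error_mode, decode_one and decode_utf8 are made-up names, the ESCAPE modes are left out, and each offending byte is treated as one illegal unit, which is simpler than the "maximal subpart" rule real converters may follow, so substitution counts can differ from ICU's.

```c
/* Toy UTF-8 decoder (not PHP's or ICU's code) showing how STOP, SKIP and
 * SUBST differ only at an illegal byte. Names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MODE_STOP, MODE_SKIP, MODE_SUBST } error_mode;

/* Decodes one UTF-8 sequence from s[0..len). Returns the number of bytes
 * consumed and stores the code point in *cp, or returns 0 if illegal. */
static size_t decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
    unsigned char b = s[0];
    size_t need;
    uint32_t min, c;

    if (b < 0x80) { *cp = b; return 1; }
    if (b >= 0xC2 && b <= 0xDF)      { need = 2; min = 0x80;    c = b & 0x1F; }
    else if (b >= 0xE0 && b <= 0xEF) { need = 3; min = 0x800;   c = b & 0x0F; }
    else if (b >= 0xF0 && b <= 0xF4) { need = 4; min = 0x10000; c = b & 0x07; }
    else return 0;  /* continuation byte, overlong lead, or invalid lead like 0xF8 */

    if (len < need) return 0;                  /* truncated sequence */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* reject overlong forms, surrogates and values above U+10FFFF */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return need;
}

/* Decodes len bytes into out. Each output code point consumes at least one
 * input byte, so cap >= len is enough. Returns the number of code points
 * written, or -1 if mode is MODE_STOP and an illegal byte is found. */
long decode_utf8(const unsigned char *s, size_t len, error_mode mode,
                 uint32_t *out, size_t cap)
{
    size_t i = 0, n = 0;
    assert(cap >= len);
    while (i < len) {
        uint32_t cp;
        size_t used = decode_one(s + i, len - i, &cp);
        if (used == 0) {
            if (mode == MODE_STOP) return -1;
            used = 1;                 /* simplification: one byte per error */
            if (mode == MODE_SKIP) { i += used; continue; }
            cp = 0xFFFD;              /* MODE_SUBST: U+FFFD REPLACEMENT CHARACTER */
        }
        out[n++] = cp;
        i += used;
    }
    return (long)n;
}
```

For the byte 0xF8 used in the bug's test case, MODE_STOP returns -1, MODE_SKIP returns 0 (an empty result, as the .phpt expects), and MODE_SUBST returns a single U+FFFD, the browser-style replacement Geoffrey describes.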

-Stephen

On Tuesday 18 March 2008 01:43:13 pm Andrei wrote:
> Why would we not want to stop on illegal sequences?
>
> -Andrei
>
> Stephen Bach wrote:
> > Attached is a simple proposed patch that fixes Bug 43477. Basically, the
> > code that set the error mode of the ICU converter was giving it an
> > instruction (the context parameter) to only skip or substitute if the
> > code point was not represented in the new encoding. However, it still was
> > returning an error for illegal sequences.
> >
> > The test suite returns the same results with or without the patch. Test
> > also attached.
> >
> > -Stephen Bach





#5 March 18, 2008 23:03:17

Geoffrey S.



On 18 Mar 2008, at 19:37, Stephen Bach wrote:
> Shouldn't U_CONV_ERROR_SKIP and U_CONV_ERROR_SUBST work the same way?

I guess: U_CONV_ERROR_SKIP is just U_CONV_ERROR_SUBST with the
substitution string as nothing, though I expect slight speed gains
could be made by keeping them separate (due to no attempt to even add
anything after coming across an invalid sequence, though the speed
gains will be very slight).

--
Geoffrey Sneddon
<http://gsnedders.com/>
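Geoffrey's observation that SKIP is SUBST with an empty substitution string can be written down directly. The sketch below is a toy C model, not ICU's API: it assumes a hypothetical earlier decoding step has already marked each illegal sequence with the ILLEGAL sentinel, and substitute and skip are made-up names.

```c
/* Toy model (not ICU's API): SKIP as SUBST with an empty substitution
 * string. Input units equal to ILLEGAL mark sequences that a hypothetical
 * earlier decoding step could not decode. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ILLEGAL (-1)  /* hypothetical marker for an undecodable sequence */

/* Copies in[] to out[], replacing each ILLEGAL marker with the sub_len code
 * points in sub[]. Returns the number of code points written. */
size_t substitute(const int32_t *in, size_t len,
                  const int32_t *sub, size_t sub_len,
                  int32_t *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == ILLEGAL) {
            for (size_t k = 0; k < sub_len; k++) {
                assert(n < cap);
                out[n++] = sub[k];
            }
        } else {
            assert(n < cap);
            out[n++] = in[i];
        }
    }
    return n;
}

/* SKIP: substitution with nothing, so illegal units simply disappear. */
size_t skip(const int32_t *in, size_t len, int32_t *out, size_t cap)
{
    return substitute(in, len, NULL, 0, out, cap);
}
```

A dedicated skip path would only save the empty inner loop per illegal unit, which is consistent with his expectation that any speed gain from keeping the modes separate is slight.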




#6 March 18, 2008 23:26:05

Stephen B.



Sorry for the ambiguity. Allow me to clarify: I meant that U_CONV_ERROR_SKIP
and U_CONV_ERROR_SUBST should work the same as the other error modes.
Otherwise, what's the point of having them?

-Stephen

On Tuesday 18 March 2008 06:02:36 pm Geoffrey wrote:
> On 18 Mar 2008, at 19:37, Stephen Bach wrote:
> > Shouldn't U_CONV_ERROR_SKIP and U_CONV_ERROR_SUBST work the same way?
>
> I guess: U_CONV_ERROR_SKIP is just U_CONV_ERROR_SUBST with the
> substitution string as nothing, though I expect slight speed gains
> could be made by keeping them separate (due to no attempt to even add
> anything after coming across an invalid sequence — though the speed
> gains will be very slight).
>
>
> --
> Geoffrey Sneddon
> <http://gsnedders.com/>





#7 March 20, 2008 21:33:28

Andrei Z.



Okay. I'm fine with the patch.

Stephen Bach wrote:
> Sorry for the ambiguity. Allow me to clarify: I meant that U_CONV_ERROR_SKIP
> and U_CONV_ERROR_SUBST should work the same as the other error modes.
> Otherwise, what's the point of having them?
>
> -Stephen
>
> On Tuesday 18 March 2008 06:02:36 pm Geoffrey wrote:
>> On 18 Mar 2008, at 19:37, Stephen Bach wrote:
>>> Shouldn't U_CONV_ERROR_SKIP and U_CONV_ERROR_SUBST work the same way?
>>
>> I guess: U_CONV_ERROR_SKIP is just U_CONV_ERROR_SUBST with the
>> substitution string as nothing, though I expect slight speed gains
>> could be made by keeping them separate (due to no attempt to even add
>> anything after coming across an invalid sequence, though the speed
>> gains will be very slight).
>>
>> --
>> Geoffrey Sneddon
>> <http://gsnedders.com/>


#8 March 21, 2008 13:10:20

Antony D.



On 03/18/2008 06:51 AM, Stephen Bach wrote:
> Attached is a simple proposed patch that fixes Bug 43477. Basically, the code
> that set the error mode of the ICU converter was giving it an instruction
> (the context parameter) to only skip or substitute if the code point was not
> represented in the new encoding. However, it still was returning an error for
> illegal sequences.
>
> The test suite returns the same results with or without the patch. Test also
> attached.

Patch committed, thanks.

--
Wbr,
Antony Dovgal



#9 March 21, 2008 16:29:30

Geoffrey S.



On 21 Mar 2008, at 12:09, Antony Dovgal wrote:
> On 03/18/2008 06:51 AM, Stephen Bach wrote:
>> Attached is a simple proposed patch that fixes Bug 43477. Basically, the code
>> that set the error mode of the ICU converter was giving it an instruction
>> (the context parameter) to only skip or substitute if the code point was not
>> represented in the new encoding. However, it still was returning an error for
>> illegal sequences.
>>
>> The test suite returns the same results with or without the patch. Test also
>> attached.
>
> Patch committed, thanks.

Can we test U_CONV_ERROR_SUBST too? See attached patch. Also, the bug
should be closed.

--
Geoffrey Sneddon
<http://gsnedders.com/>


#10 March 21, 2008 17:07:10

Antony D.



On 03/21/2008 06:28 PM, Geoffrey Sneddon wrote:
>> Patch committed, thanks.
>
> Can we test U_CONV_ERROR_SUBST too? See attached patch. Also, the bug
> should be closed.

The patch breaks the test.

Can you guys decide what should work and how? I'll commit the patch
afterwards, OK?

--
Wbr,
Antony Dovgal


