
#1 March 2, 2008 22:22:39

Marcus B.

[PHP-DEV] [RFC] Replace the flex-based scanner with an re2c [1] based lexer


RFC: REPLACE THE FLEX-BASED SCANNER WITH AN RE2C BASED LEXER

Situation:
The current flex-based lexer depends on an outdated and unsupported flex
version. Alternatives include either updating to a newer version of flex or
using re2c, which we already use for a variety of things (serializing, pdo sql
scanning, date/time parsing). While moving towards a newer flex version would
be much easier, switching to re2c promises a much faster lexer. Actually,
without any specific re2c optimizations we already get around a 20% scanner
performance increase. Running the tests gets an overall speedup of 2%. It is
arguable whether this is enough, but re2c has more advantages. First of all,
re2c allows one to scan any type of input (ASCII, UTF-8, UTF-16, UTF-32).
Secondly, it allows for better integration with Lemon [2], which would be the
next step. And thirdly we can switch to a reentrant scanner.
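
As a rough sanity check on the two figures above (a sketch, not a measurement): if both numbers are read as reductions in run time, a 20% gain in the scanner that yields a 2% gain overall suggests the scanner accounts for roughly a tenth of the test-suite run time. The function name and this reading of the percentages are assumptions made for illustration only.

```python
def implied_time_share(component_saving, overall_saving):
    """Fraction of total run time spent in a component, given that cutting
    the component's time by `component_saving` cut total time by
    `overall_saving` (both expressed as fractions, e.g. 0.20 for 20%)."""
    if not 0 < component_saving <= 1:
        raise ValueError("component_saving must be in (0, 1]")
    return overall_saving / component_saving


# Figures quoted in the RFC, read as run-time reductions (an assumption).
share = implied_time_share(0.20, 0.02)
print(f"scanner share of run time: about {share:.0%}")
```

The same relation explains why even a much faster scanner can only move the overall number by a few percent.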

Current state:
Flex has been fully replaced by re2c in Zend. We have also switched to an
mmap-based lexer approach for now. However, we had to drop multibyte support
as well as the encoding declare. The current state can be checked out from
Scott's subversion repository [3] and you can follow the development on his
Trac setup [4]. When you want to build php with re2c, then you need to grab
re2c from its sourceforge subversion repository [5]. You can also check out
the changes in a patch created Sunday 2nd March against a PHP checkout from
14th February [6].
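
The mmap-based approach mentioned above can be pictured with the following sketch. It is not taken from the patch: `load_source` is a hypothetical helper, written in Python for brevity, that maps a file read-only and falls back to an ordinary read() when the input cannot be mapped (an empty file or a pipe, for instance). How the re2c-based scanner itself deals with such inputs is not stated in this message.

```python
import mmap


def load_source(path):
    """Return (data, method) for the file at `path`.

    `method` is "mmap" when the contents came from a read-only memory
    mapping and "read" when the code fell back to a plain read().
    Hypothetical helper for illustration, not part of PHP or re2c.
    """
    with open(path, "rb") as f:
        try:
            # Length 0 maps the whole file; this raises for empty files
            # and for descriptors that do not support mapping.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                # Copy out of the mapping so the result outlives it.
                # A real scanner would scan the mapping in place.
                return m[:], "mmap"
        except (ValueError, OSError):
            return f.read(), "read"
```

A scanner built this way sees the whole input as one contiguous buffer, which is what makes the mapping attractive, and the fallback path has to reproduce that buffer by other means.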

Further steps:
Commit this to PHP 5.3. Synch to HEAD. Add pecl/intl to 5.3. Discuss/recreate
multibyte support with libintl.

Future steps:
Replace bison with lemon in PHP 5.4 or HEAD.

Time Frame:
Commit to 5.3 between the 5th and the 15th of March. Synch to HEAD a couple
of days later. Moving pecl/libintl to ext (depends on the 5.3 RMs decision).
After that is done, decide about multibyte support. Along with the commit to
the 5.3 branch there will be a new re2c version available.


Marcus Boerger
Nuno Lopes
Scott MacVicar


[1] http://re2c.org/
[2] http://www.hwaci.com/sw/lemon/
[3] svn://whisky.macvicar.net/php-re2c
[4] http://trac.macvicar.net/php-re2c/
[5] https://re2c.svn.sourceforge.net/svnroot/re2c/trunk/re2c
[6] http://php.net/~helly/php-re2c-20080302.diff.txt

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php


#2 March 2, 2008 22:48:49

Stanislav M.


Hi!

> be much easier, switching to re2c promises a much faster lexer. Actually,
> without any specific re2c optimizations we already get around a 20% scanner

I think 20% faster is very cool.
However, as I understand re2c is not a standard tool found everywhere.
So what happens if you wanted to use it on some exotic system where re2c
is not readily available as maintainer-supported software? Also, flex
is available on Windows for example as part of cygwin, while I don't see
re2c there.
I understand this can be of low importance since we keep generated files
in our repositories, but I think we still have to keep it in mind.
I understand also current patch requires non-release version of re2c -
maybe we should have some release version at least until we make PHP
depend on it?

> Current state:
> Flex has been fully replaced by re2c in Zend. We have also switched to an
> mmap-based lexer approach for now. However, we had to drop multibyte support

Were the stream support issues solved?

> as well as the encoding declare. The current state can be checked out from
> Scott's subversion repository [3] and you can follow the development on his
> Trac setup [4]. When you want to build php with re2c, then you need to grab
> re2c from its sourceforge subversion repository [5]. You can also check out
> the changes in a patch created Sunday 2nd March against a PHP checkout from
> 14th February [6].
>
> Further steps:
> Commit this to PHP 5.3. Synch to HEAD. Add pecl/intl to 5.3. Discuss/recreate
> multibyte support with libintl.

Note - pecl/intl does nothing towards multibyte support etc., at least
for now. If there are volunteers to change that, it can be discussed, but
so far it is for doing entirely other things (locale-dependent
functionality mostly).

So, I think before re2c parser can be merged the issue with multibyte
compatibility must be solved - otherwise it will make the users that
rely on it unable to use newer PHP. As cool as 20% faster is, I think we
can't drop support for such feature, especially not in 5.3.

> Commit to 5.3 between the 5th and the 15th of March. Synch to HEAD a couple
> of days later. Moving pecl/libintl to ext (depends on the 5.3 RMs decision).
> After that is done, decide about multibyte support. Along with the commit to
> the 5.3 branch there will be a new re2c version available.

I think we first need to figure out what happens to multibyte support,
and not commit anything before we have it figured out. Multibyte support
is important piece of functionality for some PHP users, and it works
now. Breaking it without providing any alternative - especially that we
have now 5.3 mostly ready for the release cycle, and solving multibyte
problems with re2c may take undefined amount of time, as far as I
understand. I do not think it would be acceptable to release 5.3 without
multibyte support, so the option here either merge it now and have 5.3
waiting until MB is figured out, or try to figure it out before commit
and if we can't in a reasonable term, go forward with 5.3 and defer the
parser change for 5.4.

Again, while I think the speedup is great and congratulate Marcus, Nuno
and Scott on great work, I think we should keep in mind we have working
parser right now and changing it in an incompatible way is very
high-risk and should not be taken hastily.

--
Stanislav Malyshev, Zend Software Architect
http://www.zend.com/ (408)253-8829 MSN:


#3 March 2, 2008 23:26:37

Rasmus L.


Stanislav Malyshev wrote:
> Hi!
>
>> be much easier, switching to re2c promises a much faster lexer.
>> Actually,
>> without any specific re2c optimizations we already get around a 20%
>> scanner
>
> I think 20% faster is very cool.
> However, as I understand re2c is not a standard tool found everywhere.
> So what happens if you wanted to use it on some exotic system where
> re2c is not readily available as maintainer-supported software? Also,
> flex is available on Windows for example as part of cygwin, while I
> don't see re2c there.

I don't think this part is a concern since we have required re2c for
quite a while now to build many critical parts of PHP. People who
actually need to regenerate the parser files are the same people for
whom it is trivial to figure out how to install re2c. And yes, it would
of course be good to use a released version of re2c, but I think by the
time 5.3 is ready to go the version of re2c we need will be out there.
Since it is Marcus' baby, he can just push it out, I don't think this is
a stumbling block either. Some of the new stuff in re2c was
specifically added to make it easier to write a PHP parser, so I don't
think backporting to an older version is really an option.

-Rasmus


#4 March 2, 2008 23:28:03

Marcus B.


Hello Stanislav,

Sunday, March 2, 2008, 11:47:57 PM, you wrote:

> Hi!

>> be much easier, switching to re2c promises a much faster lexer. Actually,
>> without any specific re2c optimizations we already get around a 20% scanner

> I think 20% faster is very cool.
> However, as I understand re2c is not a standard tool found everywhere.
> So what happens if you wanted to use it on some exotic system where re2c
> is not readily available as maintainer-supported software? Also, flex
> is available on Windows for example as part of cygwin, while I don't see
> re2c there.
> I understand this can be of low importance since we keep generated files
> in our repositories, but I think we still have to keep it in mind.
> I understand also current patch requires non-release version of re2c -
> maybe we should have some release version at least until we make PHP
> depend on it?

Well, re2c works on a very large number of systems, can easily be built,
and comes with a ready-to-download Windows executable. Furthermore, all major
distributions have re2c packages. Along with storing the generated files in
CVS, I see no issue at all in these regards.

>> Current state:
>> Flex has been fully replaced by re2c in Zend. We have also switched to an
>> mmap-based lexer approach for now. However, we had to drop multibyte support

> Were the stream support issues solved?

We completely dropped multibyte support. The reason is that the way we were
doing it, we constantly switched between the full original and a
recoded duplicate that simply ignores multibyte (or any encoding at all).
Once we have finished the move to re2c, we can support all of those
correctly. The multibyte support also duplicated the encoding tables
otherwise available in ext/mbstring or ext/iconv or pecl/intl.

>> as well as the encoding declare. The current state can be checked out from
>> Scott's subversion repository [3] and you can follow the development on his
>> Trac setup [4]. When you want to build php with re2c, then you need to grab
>> re2c from its sourceforge subversion repository [5]. You can also check out
>> the changes in a patch created Sunday 2nd March against a PHP checkout from
>> 14th February [6].
>>
>> Further steps:
>> Commit this to PHP 5.3. Synch to HEAD. Add pecl/intl to 5.3. Discuss/recreate
>> multibyte support with libintl.

> Note - pecl/intl does nothing towards multibyte support etc., at least
> for now. If there are volunteers to change that, it can be discussed, but
> so far it is for doing entirely other things (locale-dependent
> functionality mostly).

Yes I know. However pecl/intl brings in a php/icu bridge which we can build
on.

> So, I think before re2c parser can be merged the issue with multibyte
> compatibility must be solved - otherwise it will make the users that
> rely on it unable to use newer PHP. As cool as 20% faster is, I think we
> can't drop support for such feature, especially not in 5.3.

Rely on an unsupported, undocumented feature? I am rather able to build php
and rewrite that support.

>> Commit to 5.3 between the 5th and the 15th of March. Synch to HEAD a couple
>> of days later. Moving pecl/libintl to ext (depends on the 5.3 RMs decision).
>> After that is done, decide about multibyte support. Along with the commit to
>> the 5.3 branch there will be a new re2c version available.

> I think we first need to figure out what happens to multibyte support,
> and not commit anything before we have it figured out. Multibyte support
> is important piece of functionality for some PHP users, and it works
> now. Breaking it without providing any alternative - especially that we
> have now 5.3 mostly ready for the release cycle, and solving multibyte
> problems with re2c may take undefined amount of time, as far as I
> understand. I do not think it would be acceptable to release 5.3 without
> multibyte support, so the option here either merge it now and have 5.3
> waiting until MB is figured out, or try to figure it out before commit
> and if we can't in a reasonable term, go forward with 5.3 and defer the
> parser change for 5.4.

> Again, while I think the speedup is great and congratulate Marcus, Nuno
> and Scott on great work, I think we should keep in mind we have working
> parser right now and changing it in an incompatible way is very
> high-risk and should not be taken hastily.

You are free to contribute and make MB support working upfront.

Best regards,
Marcus



#5 March 2, 2008 23:28:47

Pierre J.


Hi Stan,

On Sun, Mar 2, 2008 at 11:47 PM, Stanislav Malyshev wrote:
> Hi!
>
>
> > be much easier, switching to re2c promises a much faster lexer. Actually,
> > without any specific re2c optimizations we already get around a 20% scanner
>
> I think 20% faster is very cool.
> However, as I understand re2c is not a standard tool found everywhere.
> So what happens if you wanted to use it on some exotic system where re2c
> is not readily available as maintainer-supported software? Also, flex
> is available on Windows for example as part of cygwin, while I don't see
> re2c there.

A quick note about this non-problem. re2c works pretty well on Windows
and they provide a .exe as far as I remember (much easier than flex,
which requires cygwin or gnuwin32, even if both work :). Besides the
portability of re2c, we already use it in some extensions (if I
remember correctly) and nobody complained.

Cheers,
--
Pierre
http://blog.thepimp.net | http://www.libgd.org

#6 March 2, 2008 23:44:13

Marcus B.


Hello Rasmus,

Monday, March 3, 2008, 12:25:52 AM, you wrote:

> Stanislav Malyshev wrote:
>> Hi!
>>
>>> be much easier, switching to re2c promises a much faster lexer.
>>> Actually,
>>> without any specific re2c optimizations we already get around a 20%
>>> scanner
>>
>> I think 20% faster is very cool.
>> However, as I understand re2c is not a standard tool found everywhere.
>> So what happens if you wanted to use it on some exotic system where
>> re2c is not readily available as maintainer-supported software? Also,
>> flex is available on Windows for example as part of cygwin, while I
>> don't see re2c there.
> I don't think this part is a concern since we have required re2c for
> quite a while now to build many critical parts of PHP. People who
> actually need to regenerate the parser files are the same people for
> whom it is trivial to figure out how to install re2c. And yes, it would
> of course be good to use a released version of re2c, but I think by the
> time 5.3 is ready to go the version of re2c we need will be out there.
> Since it is Marcus' baby, he can just push it out, I don't think this is
> a stumbling block either. Some of the new stuff in re2c was
> specifically added to make it easier to write a PHP parser, so I don't
> think backporting to an older version is really an option.

Right. The current re2c development cycle is solely dedicated to making it
possible to rewrite the PHP scanners. I will update re2c whenever necessary
during the remaining development cycle and publish a new stable release
before we release PHP 5.3.

Best regards,
Marcus



#7 March 2, 2008 23:49:32

Alan K.


Can you clarify the Multibyte issues:
- I presume this means that it can handle ASCII/UTF8/16 etc. but will
not handle things like BIG5/GB encoding in source code - this may be a
bit of an issue around here..

Regards
Alan



#8 March 3, 2008 04:40:16

Stanislav M.


Hi!

>> Were the stream support issues solved?
>
> We completely dropped multibyte support. The reason is that the way we were

I wasn't asking about multibyte (that we discuss below), but about other
streams - I think I mentioned it on IRC last time re2c parser was
discussed. I remember re2c used mmap, and not all files PHP can run can
be mmap'ed. Was it fixed?

> Once we have finished the move to re2c, we can support all of those
> correctly. The multibyte support also duplicated the encoding tables
> otherwise available in ext/mbstring or ext/iconv or pecl/intl.

pecl/intl per se doesn't have any encoding tables. ICU does, but that
would mean you have to have ICU to run PHP. That might not be a big
problem since ICU is supported by IBM (read: good chance more "exotic"
systems would have support), but it is still a dependency on a non-bundled
3rd party library in PHP 5 core. Of course, PHP 6 has this dependency, but
we might want to not have such things in 5.x so that you won't have to
change your system too much while staying on 5.x.

> Rely on an unsupported, undocumented feature? I am rather able to build php
> and rewrite that support.

Being undocumented is nothing to be proud of, however as poorly
documented as it is, it is used. I'm all for implementing it in a better
way - and having new parser is a good time to do it. That's exactly the
reason we shouldn't rush with it but do it right this time. There's no
burning need to have a new parser right now, so we can have some moment
to think - ok, how we want multibyte support there to work? And if we
might need some modifications, we'd have time and flexibility to do it,
not having the code in 5.3 which was supposed to go in RC in Q1 (ending
1 month from now).

> You are free to contribute and make MB support working upfront.

I know I'm free :) However, as much as I understand the eagerness of
having it in the source tree, I repeat that I do not think dropping
multibyte support in 5.3 is acceptable. Thus, if it is committed right
now, 5.3 would have to be deferred until this is resolved. If this is
resolved timely for 5.3 - great. If not, we better get it in 5.4 right
than in 5.3 wrong.

--
Stanislav Malyshev, Zend Software Architect
http://www.zend.com/ (408)253-8829 MSN:


#9 March 3, 2008 04:41:13

Stanislav M.


> I don't think this part is a concern since we have required re2c for
> quite a while now to build many critical parts of PHP. People who

Ok, great then - only issue remaining is the multibyte support.

--
Stanislav Malyshev, Zend Software Architect
http://www.zend.com/ (408)253-8829 MSN:


#10 March 3, 2008 08:28:39

Derick R.


On Sun, 2 Mar 2008, Marcus Boerger wrote:

> However, we had to drop multibyte support as well as the encoding
> declare.

Just wondering, why did you have to drop the "declare(encoding=...)" ?
It's just ignored in PHP 5.x - and it is useful to have for migrating
PHP 5.3 apps to 6. So can you at least make the new parser just ignore
this statement?

regards,
Derick

--
Derick Rethans
http://derickrethans.nl | http://ezcomponents.org | http://xdebug.org
