Electronics & Programming

develissimo

Open Source electronics development and programming

#1 Dec. 31, 2010 12:02:32

Enrico W.
Registered: 2009-11-02

[PHP-DEV] Re: EBNF


* Rune Kaagaard <rumi***@*mail.com> wrote:
> Dear internals
>
> After enviously looking at Python's grammar
> (http://docs.python.org/dev/reference/grammar.html) I keep feeling
> that PHP is missing out on a lot of interesting meta projects by not
> having an official EBNF.

ACK. PHP also lacks a lot of other fundamental specifications
(at least I'm not aware of any). That's probably one of the reasons
for the many problems experienced on the user and enterprise operator
side: sudden semantic changes.

> Building your own PHP parser is _very_ hard and is PhD (Paul Biggar:)
> level stuff if you want to get all the edge cases right. Having _the_
> official EBNF would make this easier.

Hmm, perhaps it really would make a good PhD project to actually
create a clear specification, a full language report (at least for
the language itself and the core library) and write a tiny reference
implementation. Once that specification is finished, it should become
the official one that official PHP is tested against.


cu
--
----------------------------------------------------------------------
Enrico Weigelt, metux IT service -- http://www.metux.de/
phone: +49 36207 519931  email: weig***@*etux.de
mobile: +49 151 27565287  icq: 210169427  skype: nekrad666
----------------------------------------------------------------------
Embedded Linux / Porting / Open-source QM / Distributed Systems
----------------------------------------------------------------------

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

#2 Dec. 31, 2010 21:23:43

Stas M.
Registered: 2009-11-02

[PHP-DEV] Re: EBNF


Hi!

> But still I have to ask if I'm the only one thinking about this or is
> there something I'm being completely ignorant about?

You're not the only one thinking about it. But so far nobody moved from
thinking about it to actually doing it :)

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227


#3 Jan. 1, 2011 09:01:56

Gwynne R.
Registered: 2009-11-02

[PHP-DEV] Re: EBNF


On Dec 31, 2010, at 6:54 AM, Enrico Weigelt wrote:
>> After enviously looking at Python's grammar
>> (http://docs.python.org/dev/reference/grammar.html) I keep feeling
>> that PHP is missing out on a lot of interesting meta projects by not
>> having an official EBNF.
> ACK. PHP also lacks a lot of other fundamental specifications
> (at least I'm not aware of any). That's probably one of the reasons
> for the many problems experienced on the user and enterprise operator
> side: sudden semantic changes.
>> Building your own PHP parser is _very_ hard and is PhD (Paul Biggar:)
>> level stuff if you want to get all the edge cases right. Having _the_
>> official EBNF would make this easier.
> Hmm, perhaps it really would make a good PhD project to actually
> create a clear specification, a full language report (at least for
> the language itself and the core library) and write a tiny reference
> implementation. Once that specification is finished, it should become
> the official one that official PHP is tested against.


If anyone's curious why this hasn't been done...

There has never been a language grammar, so there's been nothing to refer to at
all. As for why no one's made one more recently, for fun I snagged the .l and
.y files from trunk and W3C's version of EBNF from XML. In two hours of hacking
away, I managed to come up with this sort-of beginning to a grammar, which I'm
certain contains several errors, and only hints at a syntax:

/* http://www.w3.org/TR/REC-xml/#sec-notation */

ws ::= +
string ::= *

namespace-name ::= '\\'? string ( '\\' string )*

use-declaration ::= 'use' ws+ namespace-name ( ws+ 'as' ws+ string )?
                    ( ws* ',' ws* namespace-name ( ws+ 'as' ws+ string )? )+ ws* ';'

constant-declaration ::= 'const' ws+ string ws* '=' ws* static-scalar
                         ( ws* ',' ws* string ws* '=' ws* static-scalar )* ws* ';'

inner-statement ::= statement | function-declaration-statement |
                    class-declaration-statement

statement ::= unticked-statement | string ':'

unticked-statement ::= '{' ws* inner-statement* ws* '}' |
                       'if' ws* '(' ws* expr ws* ')' ws* statement ws* elseif*
                           ws* else-single? |
                       'if' ws* '(' ws* expr ws* ')' ws* ':' inner-statement*
                           elseif-2* ws* else-single-2?

halt-compiler ::= '__halt_compiler' ws* '(' ws* ')' ws* ';'

top-statement ::= inner-statement |
                  halt-compiler |
                  'namespace' ws+ namespace-name ws* ';' |
                  'namespace' ( ws+ namespace-name )? ws* '{' ws*
                      top-statement-list ws* '}' |
                  use-declaration |
                  constant-declaration

script ::= top-statement*

Considering what it takes JUST to define namespaces, halt_compiler, basic
blocks, and the idea of a conditional statement... well, suffice to say the
"expr" production alone would be triple the size of this. It doesn't help that
there's no way I'm immediately aware of to check whether a grammar like this is
accurate.
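
One rough way to sanity-check individual productions is to translate them into regular expressions and run them against snippets whose validity is already known. The sketch below does this in Python for namespace-name and use-declaration. It is only an illustration: the definitions of ws and string did not survive in the grammar above, so WS and STRING here are assumptions (plain whitespace and a simple ASCII identifier), and accepts, use_declaration, USE_AS_POSTED and USE_RELAXED are names made up for this example.

```python
import re

# Assumptions: the post's own `ws` and `string` definitions are not recoverable,
# so these two stand in for them.
WS = r"[ \t\r\n]"
STRING = r"[A-Za-z_][A-Za-z0-9_]*"

# namespace-name ::= '\\'? string ( '\\' string )*
NAMESPACE_NAME = rf"\\?{STRING}(?:\\{STRING})*"


def use_declaration(repeat: str) -> str:
    """Build a regex for use-declaration; `repeat` is the quantifier
    applied to the trailing ", name" group ('+' as posted, or '*')."""
    item = rf"{NAMESPACE_NAME}(?:{WS}+as{WS}+{STRING})?"
    return rf"use{WS}+{item}(?:{WS}*,{WS}*{item}){repeat}{WS}*;"


USE_AS_POSTED = use_declaration("+")  # the rule exactly as written above
USE_RELAXED = use_declaration("*")    # hypothetical variant: comma group optional


def accepts(pattern: str, source: str) -> bool:
    """True if the whole of `source` matches `pattern`."""
    return re.fullmatch(pattern, source) is not None


if __name__ == "__main__":
    samples = ["use Foo\\Bar;", "use Foo\\Bar as Baz, \\Qux;", "use ;"]
    for sample in samples:
        print(repr(sample),
              "posted:", accepts(USE_AS_POSTED, sample),
              "relaxed:", accepts(USE_RELAXED, sample))
```

Run against "use Foo\Bar;", which PHP accepts, the rule as posted fails because its trailing group is marked + and so demands at least one comma; the variant with * accepts it. Regular expressions cannot express recursive productions such as expr, so this kind of check only helps with the flat parts of the grammar.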

Obviously there's room for optimization. An EBNF doesn't have to jump through
some of the hoops that a bison parser backed by a re2c lexer does; it could be
simplified once all the parser rules were considered. Or it could be written
without referring to the parser at all. Whether that would result in a better
or worse grammar, I don't know.

Nonetheless, it's a significant undertaking to deal with the complexity of the
language. There are dozens of tiny little edge cases in PHP's parsing that
require bunches of extra parser rules. An example from above is the difference
between using "statement" and "inner-statement" for the two different forms of
"if". Because "statement" includes basic blocks and labels, the rule disallows
writing "if: { xyz; } endif;", since apparently Zend doesn't support arbitrary
basic blocks. All those cases wreak havoc on the grammar. In its present form,
it will never reduce down to something nearly as small as Python's.

-- Gwynne



#4 Jan. 1, 2011 16:21:37

g.
Registered: 2009-11-02

[PHP-DEV] Re: EBNF


Hi all,

PHP's grammar is far from complex. It is possible to describe most
of the syntax with a simple explanation.
Example:

* We can separate a program into several statements.
* A couple of items (namespace, use) can only be declared in
specific places, so consider them top-statements.
* Also, a namespace declaration may contain multiple statements if you
enclose them in braces.
* A UseStatement can only appear inside a namespace or in the global scope.
* Finally, we support classes.

Now we can describe a good portion of PHP grammar:

/* Terminals */
identifier
char
string
integer
float
boolean

/* Grammar Rules */
Literal ::= string | char | integer | float | boolean

Qualifier ::= ("private" | "public" | "protected")

/* Identifiers */
NamespaceIdentifier ::= identifier {"\" identifier}
ClassIdentifier ::= identifier
MethodIdentifier ::= identifier
FullyQualifiedClassIdentifier ::= ClassIdentifier

/* Root grammar */
Program ::= {TopStatement} {Statement}

TopStatement ::= NamespaceDeclaration | UseStatement | CommentStatement
Statement ::= ClassDeclaration | FunctionDeclaration | ...

/* Namespace Declaration */
NamespaceDeclaration ::= InlineNamespaceDeclaration | ScopeNamespaceDeclaration
InlineNamespaceDeclaration ::= SimpleNamespaceDeclaration ";"
    {UseStatement} {Statement}
ScopeNamespaceDeclaration ::= SimpleNamespaceDeclaration "{"
    {UseStatement} {Statement} "}"
SimpleNamespaceDeclaration ::= "namespace" NamespaceIdentifier

/* Use Statement */
UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
SimpleUseStatement ::= SimpleNamespaceUseStatement | SimpleClassUseStatement
SimpleNamespaceUseStatement ::= NamespaceIdentifier
SimpleClassUseStatement ::= FullyQualifiedClassIdentifier

/* Comment Declaration */
CommentStatement ::= InlineCommentStatement | MultilineCommentStatement
InlineCommentStatement ::= ("//" | "#") string
MultilineCommentStatement ::= SimpleMultilineCommentStatement |
    DocBlockStatement
SimpleMultilineCommentStatement ::= "/*" {"*" string} "*/"
DocBlockStatement ::= "/**" {"*" string} "*/"

/* Class Declaration */
ClassDeclaration ::= SimpleClassDeclaration "{" {ClassMemberDeclaration} "}"
SimpleClassDeclaration ::= "class" ClassIdentifier


ClassMemberDeclaration ::= ConstDeclaration | PropertyDeclaration |
    MethodDeclaration
ConstDeclaration ::= "const" identifier "=" Literal ";"
PropertyDeclaration ::= Qualifier Variable
MethodDeclaration ::= (PrototypeMethodDeclaration
    | ComplexMethodDeclaration)

PrototypeMethodDeclaration ::= "abstract" Qualifier "function"
    MethodIdentifier "(" {ArgumentDeclaration} ");"
ComplexMethodDeclaration ::= Qualifier "function"
    MethodIdentifier "(" {ArgumentDeclaration} ")" "{" {Statement} "}"
ArgumentDeclaration ::= SimpleArgumentDeclaration {","
    SimpleArgumentDeclaration}
SimpleArgumentDeclaration ::= Variable
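
To get a feel for how rules written in this style map onto a parser, here is a minimal hand-written recursive-descent sketch in Python covering only NamespaceIdentifier and UseStatement from the grammar above. Everything else is an assumption made for the example: the identifier terminal is taken to be a simple ASCII name, whitespace is skipped between tokens, keywords are not reserved, and tokenize, expect_identifier, parse_namespace_identifier and parse_use_statement are invented names.

```python
import re

# Assumed terminals: an ASCII identifier, plus the punctuation the two rules use.
TOKEN = re.compile(
    r"\s*(?:(?P<identifier>[A-Za-z_][A-Za-z0-9_]*)|(?P<punct>[\\,;]))"
)


def tokenize(source):
    """Split source into identifier and punctuation tokens, skipping whitespace."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group("identifier") or match.group("punct"))
        pos = match.end()
    return tokens


def expect_identifier(tokens, i):
    """Return tokens[i] if it is an identifier, otherwise raise."""
    if i >= len(tokens) or not (tokens[i][0].isalpha() or tokens[i][0] == "_"):
        raise SyntaxError(f"expected identifier at token {i}")
    return tokens[i]


def parse_namespace_identifier(tokens, i):
    # NamespaceIdentifier ::= identifier {"\" identifier}
    parts = [expect_identifier(tokens, i)]
    i += 1
    while i < len(tokens) and tokens[i] == "\\":
        parts.append(expect_identifier(tokens, i + 1))
        i += 2
    return "\\".join(parts), i


def parse_use_statement(source):
    # UseStatement ::= "use" SimpleUseStatement {"," SimpleUseStatement} ";"
    tokens = tokenize(source)
    if not tokens or tokens[0] != "use":
        raise SyntaxError('expected "use"')
    name, i = parse_namespace_identifier(tokens, 1)
    names = [name]
    while i < len(tokens) and tokens[i] == ",":
        name, i = parse_namespace_identifier(tokens, i + 1)
        names.append(name)
    if i != len(tokens) - 1 or tokens[i] != ";":
        raise SyntaxError('expected ";" at end of statement')
    return names


if __name__ == "__main__":
    print(parse_use_statement("use Foo\\Bar, Baz;"))
```

Because NamespaceIdentifier as written starts with an identifier, this sketch rejects a leading backslash such as "use \Foo;", and it has no notion of the "as" alias. Both gaps come straight from the rules above rather than from the parsing technique.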


#5 Jan. 1, 2011 16:24:58

g.
Registered: 2009-11-02

[PHP-DEV] Re: EBNF


As a final note, I'd like to mention that although PHP's grammar is quite
simple, it is light-years more complex (due to the lack of
standardization) than that of other languages.

You can compare this initial description I wrote to the Java
Specification and draw your own conclusions:
http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html

Cheers,


#6 Jan. 2, 2011 21:32:56

Rune K.
Registered: 2009-11-02

[PHP-DEV] Re: EBNF


Hi Guilherme

You wrote that Java spec? Cool! Also a very nice example of a PHP
EBNF! I think PHP needs a canonical one of those and that the parser
should be rewritten to match said EBNF. That's what I'm dreaming of,
at least :)

Cheers
Rune


#7 Jan. 2, 2011 22:22:51

Jon D.
Registered: 2009-11-02

[PHP-DEV] Re: EBNF


>> Nonetheless, it's a significant undertaking to deal with the complexity of
>> the language. There are dozens of tiny little edge cases in PHP's parsing
>> that require bunches of extra parser rules. An example from above is the
>> difference between using "statement" and "inner-statement" for the two
>> different forms of "if". Because "statement" includes basic blocks and
>> labels, the rule disallows writing "if: { xyz; } endif;", since apparently
>> Zend doesn't support arbitrary basic blocks. All those cases wreak havoc on
>> the grammar. In its present form, it will never reduce down to something
>> nearly as small as Python's.
>
> Just to have a solid, complete maintained EBNF would be a _major_ leap
> forward!
>

Having an EBNF would be useful in cases where we want to write
something like CoffeeScript (which compiles to JavaScript). I looked at
PHP's grammar file, and it's about 1,000 lines long. Since this is used
to generate the parser, isn't it possible to strip out the C macros to
create an EBNF that catches all edge cases?

Jon


#8 Jan. 4, 2011 08:57:38

Rune K.
Registered: 2009-11-02

[PHP-DEV] Re: EBNF


> Having an EBNF would be useful in cases where we want to write
> something like CoffeeScript (which compiles to JavaScript). I looked at
> PHP's grammar file, and it's about 1,000 lines long. Since this is used
> to generate the parser, isn't it possible to strip out the C macros to
> create an EBNF that catches all edge cases?

I'm not sure at all, but I reckon a lot of those edge cases are
handled in the C macros and not in the plain "yacc"-style grammar
definition.
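
For what it's worth, the mechanical part of the idea (dropping the embedded C action blocks from a yacc-style rules section) could look roughly like the Python sketch below. It is a sketch under assumptions, not a tool tested against the real grammar file: strip_actions is a made-up name, the sample rules and the C calls inside them are invented for illustration, and the function expects only the rules section as input. It tracks brace depth and quoted literals so that a token such as '{' in a rule is kept, but it does not understand C comments, so an apostrophe inside a comment in an action would confuse it.

```python
def strip_actions(rules: str) -> str:
    """Remove balanced { ... } action blocks from yacc-style rule text,
    leaving quoted literals such as '{' in the grammar untouched."""
    out = []
    depth = 0     # brace nesting inside an action block; 0 means grammar text
    quote = None  # quote character of the literal we are inside, if any
    i = 0
    while i < len(rules):
        ch = rules[i]
        if quote:
            if depth == 0:
                out.append(ch)
            if ch == "\\" and i + 1 < len(rules):
                # Keep an escaped character together with its backslash.
                if depth == 0:
                    out.append(rules[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            if depth == 0:
                out.append(ch)
        elif ch == "{":
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


# Invented fragment in the style of a yacc rules section.
SAMPLE = """
statement:
        unticked_statement { do_ticks(); }
    |   T_STRING ':' { do_label("}", &$1); }
;
block:
        '{' inner_statement_list '}'
;
"""

if __name__ == "__main__":
    print(strip_actions(SAMPLE))
```

What comes out is the bare rule structure. Any check that lives only inside the removed C code is lost along with it, which is exactly the concern raised above.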
