From:  pear-qa@lists.php.net ("lexx918@gmail.com")
Date:  26 Aug 2015 00:59:29 Hong Kong Time
Newsgroup:  news.php.net/php.pear.bugs
Subject:  [PEAR-BUG] Bug #20943 [Com]: Comment tokenizer does not understand national symbols in UTF-8.

Edit report at https://pear.php.net/bugs/bug.php?id=20943&edit=1

 ID:               20943
 Comment by:       lexx918@gmail.com
 Reported By:      lexx918 at gmail dot com
 Summary:          Comment tokenizer does not understand national
                    symbols in UTF-8.
 Status:           Open
 Type:             Bug
 Package:          PHP_CodeSniffer
 Operating System: Ubuntu
 Package Version:  2.3.3
 PHP Version:      5.5.9
 Roadmap Versions: 
 New Comment:

My attempt to correct the error:
https://github.com/squizlabs/PHP_CodeSniffer/pull/692


Previous Comments:
------------------------------------------------------------------------

[2015-08-25 15:23:25] lexx918

Description:
------------
Compare this tokenizer output (comment containing a Russian letter) ...

---------- source:
  /**\r\n·*·?\r\n·*/
    *** START COMMENT TOKENIZING ***
    Create comment token: T_DOC_COMMENT_OPEN_TAG => /**
    Create comment token: T_DOC_COMMENT_WHITESPACE => \r\n
    Create comment token: T_DOC_COMMENT_WHITESPACE => ·
    Create comment token: T_DOC_COMMENT_STAR => *
    Create comment token: T_DOC_COMMENT_WHITESPACE => ·
    Create comment token: T_DOC_COMMENT_STRING => ?
    Create comment token: T_DOC_COMMENT_WHITESPACE => \r\n
    Create comment token: T_DOC_COMMENT_STRING => ·*
    Create comment token: T_DOC_COMMENT_CLOSE_TAG => /
    *** END COMMENT TOKENIZING ***
Process token [2]: T_WHITESPACE => \r\n
----------

... with this one (comment containing the English letter "A").

---------- source:
  /**\r\n·*·A\r\n·*/
    *** START COMMENT TOKENIZING ***
    Create comment token: T_DOC_COMMENT_OPEN_TAG => /**
    Create comment token: T_DOC_COMMENT_WHITESPACE => \r\n
    Create comment token: T_DOC_COMMENT_WHITESPACE => ·
    Create comment token: T_DOC_COMMENT_STAR => *
    Create comment token: T_DOC_COMMENT_WHITESPACE => ·
    Create comment token: T_DOC_COMMENT_STRING => A
    Create comment token: T_DOC_COMMENT_WHITESPACE => \r\n
    Create comment token: T_DOC_COMMENT_WHITESPACE => ·
    Create comment token: T_DOC_COMMENT_CLOSE_TAG => */
    *** END COMMENT TOKENIZING ***
Process token [2]: T_WHITESPACE => \r\n
----------

Note the difference in the close tag:
"T_DOC_COMMENT_CLOSE_TAG => /" (with the Russian letter "?")
versus
"T_DOC_COMMENT_CLOSE_TAG => */" (with the English letter "A")

This happens because the comment tokenizer
https://github.com/squizlabs/PHP_CodeSniffer/blob/master/CodeSniffer/Tokenizers/Comment.php
does not know about the selected encoding (--encoding=utf-8).

As a consequence, the sniffer
http://pear.php.net/package/PHP_CodeSniffer/docs/latest/PHP_CodeSniffer/Generic_Sniffs_Commenting_DocCommentSniff.html
always reports the error "The close comment tag must be the only content
on the line".
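
A minimal sketch of how such a split can arise, assuming (this is not taken
from Comment.php) that an offset counted in bytes is applied to a string that
is indexed in characters. It is written in Python purely for illustration; the
helper split_close_tag and the letter U+0416 are hypothetical stand-ins, since
the original Russian letter did not survive in this message.

```python
from typing import Tuple


def split_close_tag(comment: str) -> Tuple[str, str]:
    """Hypothetical helper: split the last line into (prefix, close tag).

    The close-tag position is measured in UTF-8 bytes but then used to
    slice the character string, which is the unit mismatch illustrated here.
    """
    raw = comment.encode("utf-8")
    byte_offset = raw.rindex(b"*/")              # counted in bytes
    last_line_start = comment.rindex("\n") + 1   # counted in characters
    # Deliberate mistake: a byte offset is used as a character index.
    return comment[last_line_start:byte_offset], comment[byte_offset:]


ascii_comment = "/**\r\n * A\r\n */"
cyrillic_comment = "/**\r\n * \u0416\r\n */"     # U+0416, two bytes in UTF-8

print(split_close_tag(ascii_comment))     # (' ', '*/')
print(split_close_tag(cyrillic_comment))  # (' *', '/')
```

With the ASCII letter the two units agree and the close tag comes out whole.
With the two-byte letter the byte offset is one too large, so only "/" is left
for the close tag, which matches the token dumps above.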

------------------------------------------------------------------------

