String Literals, take 2

Joseph F. Ryan
Dec 2, 2002 at 11:58 am
I've integrated most of the proposed suggestions, as well as a section
on vstrings and a winged section on hash interpolation. So that leaves
these known issues:

- Reference stringification
- Default Object Stringification
(.AS_STRING needs to be added to the doc as well, but I figure it
is still getting hammered out)
- Does <<>> mess up here-docs?
(I'm inclined to say that <<>> is more trouble than it is worth,
and to ditch <<>>, simply sticking with qw())

Also, would any sort of diff be helpful with these document revisions?
There's Text::ParagraphDiff, but that doesn't work too well with pod,
since pod is line-oriented rather than paragraph oriented. Regular
diffs aren't that helpful on text either. However, either one is
better than nothing, so if you'd like one, let me know.


Joseph F. Ryan
ryan.311@osu.edu



=pod

=head1 Strings

A string is formed when text is enclosed by a quoting operator.
There are two types of quoting operators: interpolating and
non-interpolating. In interpolating constructs, the value of a
variable is substituted for the variable name within the string
and certain characters have special meaning when preceded by a
backslash (C<\>). In non-interpolating constructs, a variable
name that appears within the string is used as-is. The simplest
examples of these two types of quoting operators are strings
delimited by double (interpolating) and single quotes
(non-interpolating). For example:

'The quick brown $animal'
"The quick brown $animal"

In the first string, perl will take each character literally and
perform no special processing. In the second string, the value
of the variable $animal is inserted within the string at that
location. If $animal had had the value "fox", then the second
string would have become "The quick brown fox".

More on the various quoting operators below.

=head2 Non-Interpolating Constructs

Non-interpolating constructs are strings in which expressions do not
interpolate, or expand. The one exception to this is that the
backslash character, \, will always escape the character that
immediately follows it.

The base form for a non-interpolating string is the single-quoted
string: 'string'. However, non-interpolating strings can also be formed
with the q() operator. The q() operator allows strings to be made with
any non-space, non-letter, non-digit character as the delimiter instead
of '. In addition, if the starting delimiter is part of a paired
set, such as (, [, <, or {, then the closing delimiter may be the
matching member of the set. The reverse also holds true:
delimiters which are the tail end of a pair may use the starting item
as the closing delimiter.

=over 3
Examples:

$string = 'string' # $string = 'string'
$string = q|string| # $string = 'string'
$string = q(string) # $string = 'string'
$string = q]string[ # $string = 'string'
=back
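
The pairing rule above can be modelled with a short Python sketch. This is not Perl 6 code: the names `PAIRS`, `REVERSED`, and `closing_delimiter` are hypothetical and only illustrate the rule as this draft states it, assuming a single-character delimiter.

```python
# Hypothetical model of the draft's delimiter rule; not taken from any Perl 6 implementation.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}
# The draft says the reverse also holds: a closing bracket may start the string.
REVERSED = {closer: opener for opener, closer in PAIRS.items()}


def closing_delimiter(opening):
    """Return the character that ends a q() string started with `opening`."""
    if opening.isspace() or opening.isalnum():
        raise ValueError("delimiter must be a non-space, non-letter, non-digit character")
    if opening == ":":
        raise ValueError("':' is reserved for properties on quoting operators")
    # Paired delimiters close with their partner; anything else closes with itself.
    return PAIRS.get(opening) or REVERSED.get(opening) or opening
```

Under these assumptions `q(string)` ends at `)`, `q]string[` ends at `[`, and `q|string|` ends at `|`.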

There are a few special cases for delimiters; specifically : and #.
: is not allowed because it might be used by custom-defined quoting
operators to apply a property; # is allowed, but there cannot be a
space between the operator and the #. In addition, comments are not
allowed within # delimited expressions (for obvious reasons).

=head3 Embedding Interpolated Strings

It is also possible to embed an interpolating string within a
non-interpolating string by the use of the \qq{} construct. A string
inside a \qq{} construct acts exactly as if it were an interpolated
string. Note that any end-brackets, "}", must be escaped within
the \qq{} construct so that the parser can read it correctly.

=over 3
Examples ( assuming C<< $var="two" >> ):

$string = 'one \qq{$var} three' # $string = 'one two three'
$string = 'one \qq{{$var\}} three' # $string = 'one {two} three'
=back

=head3 <<>>; expanding a string as a list.

A set of braces is a special op that evaluates into the list of words
contained, using whitespace as the delimiter. It is similar to qw()
from perl5, and can be thought of as roughly equivalent to:
C<< "STRING".split(' ') >>

=over 3
Examples:

@array = <one two three>; # @array = ('one', 'two', 'three');
@array = <one <\> three>; # @array = ('one', '<>', 'three');
=back

=head2 Interpolating Constructs

Interpolating constructs are another form of string in which variables
that are embedded into the string are expanded into their value at
runtime. Interpolated strings are formed using the double quote:
"string". In addition, qq() is a synonym for "", which is similar to
q() being a synonym for ''. The rules for interpolation are as
follows:

=head3 Interpolation Rules

=over 3

=item Scalars: C<"$scalar">, C<"$(expression)">
Non-Reference scalars will simply interpolate as their value. $()
forces its expression into scalar context, which is then handled as
either a scalar or a reference, depending on how expression evaluates.

=item Lists: C<"@list">, C<"@(expression)">
Arrays and lists are interpolated by joining their list elements by the
list's separator property, which is by default a space. Therefore, the
following two expressions are equivalent:

=over 3
print "@list";
print "" ~ @list.join(@list.separator) ~ "";
=back

=item Hashes: C<"%hash">, C<"%(expression)">
Hashes interpolate by joining their pairs on the hash's .separator property,
which by default is a newline. Pairs stringify by joining the key and
value with the hash's .pairsep property, which by default is a space.
Note that hashes are unordered, and so the output will be unordered.
Therefore, the following two expressions are equivalent:

=over 3
print "%hash";
print "" ~
    join ( %hash.separator,
        map { $_ ~ %hash.pairsep ~ %hash{$_} } %hash.keys
    ) ~ "";
=back

=item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
Subroutines and Methods will interpolate their return value into the
string, which will be handled in whichever type the return value is.
Same for object methods. Note that parens B<are> required during
interpolation so that the parser can disambiguate between object
methods and object members.

=item References C<"$ref">
# Behavior not defined

=item Default Object Stringification C<"$obj">
# Behavior not defined

=item Escaped Characters
# Basically the same as Perl5; also, how are locale semantics handled?

\t tab
\n newline
\r return
\f form feed
\b backspace
\a alarm (bell)
\e escape
\b10 binary char
\o33 octal char
\x1b hex char
\x{263a} wide hex char
\c[ control char
\N{name} named Unicode character

=item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>

Modifiers apply a modification to text which they enclose; they can be
embedded within interpolated strings.

\L{} Lowercase all characters within brackets
\U{} Uppercase all characters within brackets
\Q{} Escape all characters that need escaping
within brackets (except "}")

=back
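
The list and hash rules above amount to two joins. A minimal Python sketch, assuming the defaults given in this draft (space for lists, newline and space for hashes); the function names and keyword arguments are hypothetical stand-ins for the .separator and .pairsep properties:

```python
def interpolate_list(items, separator=" "):
    """Model "@list": join the elements on the list's separator."""
    return separator.join(str(item) for item in items)


def interpolate_hash(mapping, separator="\n", pairsep=" "):
    """Model "%hash": join key/value pairs on separator, each pair on pairsep."""
    # Python dicts keep insertion order; the draft makes no ordering promise.
    return separator.join(f"{key}{pairsep}{value}" for key, value in mapping.items())
```

With these defaults, a list of 1 and 2 interpolates as `1 2`, and a two-entry hash interpolates as two lines of `key value`.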

=head3 Stopping Interpolation (\Q)

Within an interpolated string, interpolation of expressions can be
stopped by \Q.

=over 3
Example:
@list = (1,2);
print "@list\Q[0]"; # prints '1 2[0]'
=back

=head3 Embedding non-interpolated constructs: C<\q{}>

Similar to embedding an interpolated string within a non-interpolated
string, it is possible to embed a non-interpolated string within an
interpolated string with \q{}. Any characters within a \q{} construct
are treated as if they were in a non-interpolated string.

=over 3
Example:
"string \q{$variable}" # $variable will not be interpolated
=back

=head3 C<qx()>, backticks (C<``>)

A string which is (possibly) interpolated and then executed as a system
command with /bin/sh or its equivalent. Shell wildcards, pipes, and
redirections will be honored. The collected standard output of the
command is returned; standard error is unaffected. In scalar context,
it comes back as a single (potentially multi-line) string, or undef if
the command failed. In list context, returns a list of lines split
on the standard input separator, or an empty list if the command
failed.

=head2 Special Quoting

=head3 Here-Docs

A line-oriented form of quoting is based on the shell "here-document"
syntax. Following a << you specify a string to terminate the quoted
material, and all lines following the current line down to the
terminating string are the value of the item. The terminating string
may be either an identifier (a word), or some quoted text. If quoted,
the type of quotes you use determines the treatment of the text, just
as in regular quoting. An unquoted identifier works like double quotes.
The terminating string must appear by itself, and any preceding or
following whitespace on the terminating line is discarded.

=over 3
Examples:

print << EOF;
The price is $Price.
EOF

print << "EOF"; # same as above
The price is $Price.
EOF

print << `EOC`; # execute commands
echo hi there
echo lo there
EOC

print <<"foo", <<"bar"; # you can stack them
I said foo.
foo
I said bar.
bar

myfunc(<< "THIS", 23, <<'THAT');
Here's a line
or two.
THIS
and here's another.
THAT

=back

Don't forget that you have to put a semicolon on the end to finish the
statement, as Perl doesn't know you're not going to try to do this:

=over 3
print <<ABC
179231
ABC
+ 20;
=back

If you want your here-docs to be indented with the rest of the code,
you'll need to remove leading whitespace from each line manually:

=over 3
($quote = <<'FINIS') =~ s/^\s+//gm;
The Road goes ever on and on,
down from the door where it began.
FINIS
=back
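
The same clean-up can be sketched in Python for comparison. This is an illustration, not part of the draft: `strip_leading_whitespace` is a hypothetical helper, and it matches only spaces and tabs, whereas the `\s+` in the substitution above also matches newlines.

```python
import re


def strip_leading_whitespace(text):
    """Remove spaces and tabs at the start of every line of `text`."""
    # re.MULTILINE makes ^ match at the start of each line, like Perl's /m.
    return re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)


quote = strip_leading_whitespace(
    "    The Road goes ever on and on,\n"
    "    down from the door where it began.\n"
)
```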

If you use a here-doc within a delimited construct, such as in s///eg,
the quoted material must come on the lines following the final
delimiter. So instead of:

=over 3
s/this/<<E . 'that'
the other
E
. 'more '/eg;
=back

you have to write

=over 3
s/this/<<E . 'that'
. 'more '/eg;
the other
E
=back

Also note that with single quoted here-docs, backslashes are not
special, and are taken for a literal backslash, a behavior that is
different from normal single-quoted strings.

=head3 V-Strings

V-Strings are formed when 3 or more integers are joined by decimal points,
with a possible leading v. The resulting item is then treated like
a string, rather than a number.

=over 3
Examples:
$var = v5.8.0; # $var = "5.8.0";
$var = 192.168.0.1; # $var = "192.168.0.1";
=back

=head2 Gory Details of parsing quoted constructs

No string section would be complete without a "Gory details of parsing
quoted constructs"; however, since the current implementation in P6C
doesn't have support for \Q, \Q{}, \L{}, \U{}, \N{name}, or \x{}, the
implementation may have to change. If you really need your blood and
guts, please see P6C/Tree/String.pm for the current string-parsing
semantics.

=cut

  • James Mastros at Dec 2, 2002 at 5:14 pm
    Just a few more nits to pick...
    On 12/02/2002 6:58 AM, Joseph F. Ryan wrote:
    The q() operator allows strings to be made with
    any non-space, non-letter, non-digit character as the delimiter instead
    of '. In addition, if the starting delimiter is a part of a paired
    set, such as (, [, <, or {, then the closing delimiter may be the
    matching member of the set. In addition, the reverse holds true;
    delimiters which are the tail end of a pair may use the starting item
    as the closing delimiter.
    We need to decide if this is a user doc or a developer doc/language
    specification. If it's the latter, we need a rigorous definition of what
    a pair is.
    There are a few special cases for delimiters; specifically : and #.
    : is not allowed because it might be used by custom-defined quoting
    operators to apply a property; # is allowed, but there cannot be a
    space between the operator and the #. In addition, comments are not
    allowed within # delimited expressions (for obvious reasons).
    Are comments ever allowed within q() constructs? If not, ditch the
    statement about comments not being allowed in q## constructs.
    =head3 <<>>; expanding a string as a list.

    A set of braces is a special op that evaluates into the list of word
    A doubled set of angle brackets (<<text here>>) or a set of double-angle
    quotation marks (guillemets, «text here»).
    contained, using whitespace as the delimiter. It is similar to qw()
    from perl5, and can be thought of as roughly equivalent to:
    Are we getting rid of qw()? I assumed that we were keeping it as a
    longhand form of <<>>/guillemets, just like qq() is the longhand form of "".
    C<< "STRING".split(' ') >>
    I'd be more explicit here, and say C<<"STRING".split(/\s+/)>>. (The two
    are equivalent, but only because of special-casing; the second is more
    explicit.)
    =head2 Interpolating Constructs

    Interpolating constructs are another form of string in which variables
    that are embedded into the string are expanded into their value at
    runtime. Interpolated strings are formed using the double quote:
    ...using double quotes, as in "string".
    "string". In addition, qq() is a synonym for "", which is similar to
    q() being a synonym for ''.
    ...similarly to...
    =item Hashes: C<"%hash">, C<"%(expression)">
    Hashes interpolate by joining its pairs on its .separator property,
    which by default is a newline. Pairs stringify by joining the key and
    value with the hash's .pairsep property, which by default is a space.
    Have these defaults been defined somewhere? I'd rather see them be ', '
    and '=>' by default...
    Note that hashes are unordered, and so the output will be unordered.
    Therefore, the following two expressions are equivalent:
    Get rid of the therefore; it seems to refer to the preceding sentence,
    which has nothing to do with the example.
    =item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
    Subroutines and Methods will interpolate their return value into the
    string, which will be handled in whichever type the return value is.
    Same for object methods. Note that parens B<are> required during
    interpolation so that the parser can disambiguate between object
    methods and object members.
    Has this been vetted? $(...)/etc seem to cover this case, and & being a
    qq() metachar makes using qq() strings to print HTML/XML difficult.
    =item Escaped Characters
    # Basically the same as Perl5; also, how are locale semantics handled?

    \t tab
    \n newline
    \r return
    \f form feed
    \b backspace
    \a alarm (bell)
    \e escape
    Can we get some rigor here? Also, is \n the same everywhere, or do we
    play the same tricks we did with it in p5? (I think it should be the
    same everywhere, a CR char, "\cM". Disciplines, or encodings, or
    whatever we're calling them, can take care of it on IO.) Oh, and it
    might be nice for \0 to be NUL. (This used to be implicit with \0 as
    octal, but since \0 isn't octal anymore...)
    \b10 binary char
    \o33 octal char
    Numeric Literals, take 3
    (http://archive.develooper.com/perl6-documentation@perl.org/msg00462.html),
    in the "*** Bin/Hex/Oct shorthands" section, gives 0c123 as the
    shorthand form of octal numbers, so it doesn't make much sense for octal
    character constants to be \o123. Do we want to change shorthand octal
    literal numbers to 0o123 (I don't like this, it's hard to read), change
    octal chars to \c123 (can't do this without getting rid of, or changing,
    \c for control-character), get rid of octal chars entirely, or
    something else? (Barring a good "something else", I vote for killing octal
    chars.)
    \x1b hex char
    Exactly two digits after the \x? Perl5 attempts to do the right thing
    either way, but this can be confusing too -- "\xA" eq chr(0xA), "\xABar"
    eq chr(0xAB)."ar", "\xAQux" eq chr(0xA)."Qux".
    \x{263a} wide hex char
    \c[ control char
    Rigor? What is \c~? perl5 thinks it's >, should perl6 agree? How
    about \c\x{1000} (that's invalid, but you get the point), is that equiv
    to \x{ff9c}? What about \cé, (e+acute accent), does that capitalize,
    then subtract 64, or just subtract?
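
For the ASCII cases, the Perl 5 rule can be sketched in Python as "uppercase, then flip bit 6". `control_char` is a hypothetical name, and non-ASCII input is rejected because its behaviour is exactly the open question raised above:

```python
def control_char(ch):
    """Model Perl 5's \\cX for a single ASCII character."""
    if len(ch) != 1 or ord(ch) > 127:
        raise ValueError("only single ASCII characters are modelled here")
    # Uppercase first so that \cm and \cM agree, then XOR with 64.
    return chr(ord(ch.upper()) ^ 64)
```

Under this rule `\c[` is ESC (0x1B) and `\c~` is `>`, since 0x7E XOR 0x40 is 0x3E.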
    \N{name} named Unicode character
    Reference to charnames pragmata, or however we end up defining the exact
    semantics of \N. (Since we don't know yet, just put in a FIXME, I suppose.)

    Is there any way to give the ordinal in decimal, like "\d192"? (I'm not
    sure how useful this would be, but it would be nice parallelism.
    OTOH, you can use chr() easily enough.)
    =item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>

    Modifiers apply a modification to text which they enclose; they can be
    embedded within interpolated strings.

    \L{} Lowercase all characters within brackets
    \U{} Uppercase all characters within brackets
    \Q{} Escape all characters that need escaping
    within brackets (except "}")
    Rigor: escape all non-alphanumerics.
    Do we still have the other modifiers that p5 supports, \l and \u? Do we
    want a new titlecase modifier, \T{james mastros} eq "James Mastros",
    doing the Right Thing for other languages, where it isn't so simple
    (there are complicated cases for this, but IIRC Unicode defines a robust
    algo to do this). I'll check on the Unicode stuff if anybody thinks
    it's a good idea... I'm uncertain, myself; I never liked the qq()
    case-modifiers, so I don't use them.

    A string which is (possibly) interpolated and then executed as a system
    command with /bin/sh or its equivalent. Shell wildcards, pipes, and
    redirections will be honored. The collected standard output of the
    command is returned; standard error is unaffected. In scalar context,
    it comes back as a single (potentially multi-line) string, or undef if
    the command failed. In list context, returns a list of lines split
    on the standard input separator, or an empty list if the command
    failed.
    This whole section is very Unix-centric, but I'm not certain what to do
    about that -- the functionality is very system-specific. Also, I suspect
    we're going to want to rewrite it anyway when we hammer out iterators,
    files, and context.
    A line-oriented form of quoting is based on the shell "here-document"
    s/shell/Unix Bourne shell/
    syntax. Following a << you specify a string to terminate the quoted
    material, and all lines following the current line down to the
    terminating string are the value of the item. The terminating string
    may be either an identifier (a word), or some quoted text. If quoted,
    the type of quotes you use determines the treatment of the text, just
    as in regular quoting. An unquoted identifier works like double quotes.
    The terminating string must appear by itself, and any preceding or
    following whitespace on the terminating line is discarded.
    I could have sworn that Larry recently put something out about the edge
    cases between << heredoc and << beginning-of-qw. I /think/ he said that
    qw("Foo" bar) must be written as << "Foo" bar>>, because otherwise it
    would be interpreted as a here-doc ending with Foo with double-quote
    interpolation. Can anybody find this, or is Larry watching?
    Also note that with single quoted here-docs, backslashes are not
    special, and are taken for a literal backslash, a behavior that is
    different from normal single-quoted strings.
    Are \qq()s still special, even in <<'noninterpolating's? Either way, it
    should be explicitly noted.
    V-Strings are formed when 3 or more integers are joined by decimal points,
    with a possible leading v. The resulting item is then treated like
    a string, rather than a number.

    =over 3
    Examples:
    $var = v5.8.0; # $var = "5.8.0";
    $var = 192.168.0.1; # $var = "192.168.0.1";
    =back
    Note that the v is non-optional for two-character v-strings.

    I'd say something like:
    V-strings are actually strings that just happen to look like numbers.
    Each dot-separated number is transformed into the character with that
    Unicode ordinal, and the characters are concatenated together.

    (The transformation from normal string to v-string looks like
    C<<$vstring='v' ~ join '.', map {ord} split //, $instring>>; the
    transformation from v-string to normal string looks like
    C<<print join '', map {chr} split /\./, $vstring>>;
    (Where vstring cannot begin with a leading 'v', for purposes of
    illustration.))

    Thus, C<<80.101.114.108.32.54.33 eq 'Perl 6!'>>

    Also, your examples are misleading at best. v5.8.0 eq "\x05\x08\x00".
    192.168.0.1 eq chr(192)~chr(168)~chr(0)~chr(1).
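
The v-string semantics described here can be checked with a small Python sketch. `vstring` is a hypothetical helper that takes the literal as text; it is a model of the rule, not Perl code:

```python
def vstring(literal):
    """Turn a v-string literal such as "v5.8.0" into the string it denotes."""
    # Drop the optional leading "v", then map each integer to its character.
    numbers = literal[1:] if literal.startswith("v") else literal
    return "".join(chr(int(part)) for part in numbers.split("."))
```

So `v5.8.0` denotes three control characters rather than the five-character text "5.8.0".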

    -=- James Mastros
  • Joseph F. Ryan at Dec 2, 2002 at 9:42 pm

    James Mastros wrote:

    Just a few more nits to pick...
    On 12/02/2002 6:58 AM, Joseph F. Ryan wrote:

    The q() operator allows strings to be made with
    any non-space, non-letter, non-digit character as the delimiter instead
    of '. In addition, if the starting delimiter is a part of a paired
    set, such as (, [, <, or {, then the closing delimiter may be the
    matching member of the set. In addition, the reverse holds true;
    delimiters which are the tail end of a pair may use the starting item
    as the closing delimiter.
    We need to decide if this is a user doc or a developer doc/language
    specification. If it's the latter, we need a rigorous definition of
    what a pair is.

    I'm more inclined towards a user doc; a rigorous definition of pairs in
    the tests should be good enough for the developers.
    There are a few special cases for delimiters; specifically : and #.
    : is not allowed because it might be used by custom-defined quoting
    operators to apply a property; # is allowed, but there cannot be a
    space between the operator and the #. In addition, comments are not
    allowed within # delimited expressions (for obvious reasons).
    Are comments ever allowed within q() constructs? If not, ditch the
    statement about comments not being allowed in q## constructs.

    You're right, they're not. Whoops.
    =head3 <<>>; expanding a string as a list.

    A set of braces is a special op that evaluates into the list of word
    A doubled set of angle brackets (<<text here>>) or a set of
    double-angle quotation marks (guillemets, «text here»).
    contained, using whitespace as the delimiter. It is similar to qw()
    from perl5, and can be thought of as roughly equivalent to:
    Are we getting rid of qw()? I assumed that we were keeping it as a
    longhand form of <<>>/guillemets, just like qq() is the longhand form
    of "".
    C<< "STRING".split(' ') >>
    I'd be more explicit here, and say C<<"STRING".split(/\s+/)>>. (The
    two are equivalent, but only because of special-casing; the second is
    more explicit.)

    Nope, split(' ', $string) is special; it eats up all leading
    whitespace before splitting on the space, while with /\s+/ there
    will be an initial empty element. The example is straight from
    perl5's perlop anyways :)
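
The difference shows up only when the string starts with whitespace. Python has the same two behaviours, so it works as an analogue: `str.split()` with no argument stands in for `split(' ', $string)`, and `re.split` stands in for splitting on `/\s+/`. The variable names are illustrative.

```python
import re

text = "  one two three"

# Like split(' ', $string): leading whitespace is skipped entirely.
awk_style = text.split()

# Like split(/\s+/, $string): the leading whitespace yields an empty first field.
regex_style = re.split(r"\s+", text)
```

Here `awk_style` has three elements and `regex_style` has four, the first of which is the empty string.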
    Have these defaults been defined somewhere? I'd rather see them be ',
    ' and '=>' by default...

    Well, that's what the RFC suggested, and there didn't seem
    to be many complaints about the defaults in the Apoc
    (besides the variable names). Like I said, I just winged it :)
    Note that hashes are unordered, and so the output will be unordered.
    Therefore, the following two expressions are equivalent:
    Get rid of the therefore; it seems to refer to the preceding sentence,
    which has nothing to do with the example.
    =item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
    Subroutines and Methods will interpolate their return value into the
    string, which will be handled in whichever type the return value is.
    Same for object methods. Note that parens B<are> required during
    interpolation so that the parser can disambiguate between object
    methods and object members.
    Has this been vetted? $(...)/etc seem to cover this case, and & being
    a qq() metachar makes using qq() strings to print HTML/XML difficult.

    Well, it was in Apoc 2:
    http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 252: interpolation of
    subroutines
    http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 222: interpolation of
    object method calls
    =item Escaped Characters
    # Basically the same as Perl5; also, how are locale semantics handled?

    \t tab
    \n newline
    \r return
    \f form feed
    \b backspace
    \a alarm (bell)
    \e escape
    Can we get some rigor here? Also, is \n the same everywhere, or do we
    play the same tricks we did with it in p5? (I think it should be the
    same everywhere, a CR char, "\cM". Disciplines, or encodings, or
    whatever we're calling them, can take care of it on IO.) Oh, and it
    might be nice for \0 to be NUL. (This used to be implicit with \0 as
    octal, but since \0 isn't octal anymore...)

    As someone who has had to use NT, Mac OS 9, and Solaris with much
    frequency, I can say I very much appreciated the special tricks
    that \n did (does).
    \b10 binary char
    \o33 octal char
    Numeric Literals, take 3
    (http://archive.develooper.com/perl6-documentation@perl.org/msg00462.html),
    in the "*** Bin/Hex/Oct shorthands" section, gives 0c123 as the
    shorthand form of octal numbers, so it doesn't make much sense for
    octal character constants to be \o123. Do we want to change shorthand
    octal literal numbers to 0o123 (I don't like this, it's hard to read),
    change octal chars to \c123 (can't do this without getting rid of, or
    changing, \c for control-character), get rid of octal chars entirely,
    or something else? (Barring a good "something else", I vote for killing
    octal chars.)

    This seems to be going back and forth:

    $octal_format = ($octal_format_still_exists) ?
    sprintf("\\%s%d",$octals_current_letter_of_the_week,
    $number) :
    undef;

    That should clear things up.
    \x1b hex char
    Exactly two digits after the \x? Perl5 attempts to do the right thing
    either way, but this can be confusing too -- "\xA" eq chr(0xA),
    "\xABar" eq chr(0xAB)."ar", "\xAQux" eq chr(0xA)."Qux".

    That was in perl5's perldoc, so I assume it is encouraged.

    You brought this up before:
    http://archive.develooper.com/perl6-documentation@perl.org/msg00485.html

    I still say to stick with perl5's behavior.
    \x{263a} wide hex char
    \c[ control char
    Rigor? What is \c~? perl5 thinks it's >, should perl6 agree?

    I don't see why it shouldn't.
    How about \c\x{1000} (that's invalid, but you get the point), is that
    equiv to \x{ff9c}?

    No, it's "\c\" ~ "x{1000}"
    What about \cé, (e+acute accent), does that capitalize, then subtract
    64, or just subtract?
    \N{name} named Unicode character
    Reference to charnames pragmata, or however we end up defining the
    exact semantics of \N. (Since we don't know yet, just put in a FIXME,
    I suppose.)

    Just recycle perl5's, I suppose. Not *everything* needs to be redone
    from scratch.
    Is there any way to give the ordinal in decimal, like "\d192"? (I'm
    not sure how useful this would be, but it would be nice parallelism.
    OTOH, you can use chr() easily enough.)

    That is a good point; if there is a 0dxxxxx, then there should be a
    "\dxxxxx".
    =item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>

    Modifiers apply a modification to text which they enclose; they can be
    embedded within interpolated strings.

    \L{} Lowercase all characters within brackets
    \U{} Uppercase all characters within brackets
    \Q{} Escape all characters that need escaping
    within brackets (except "}")
    Rigor: escape all non-alphanumerics.
    Do we still have the other modifiers that p5 supports, \l and \u?

    That's a good question. There was no reference to them in Apoc,
    however, that doesn't mean that they are gone. I haven't a clue,
    really.
    Do we want a new titlecase modifier, \T{james mastros} eq "James
    Mastros", doing the Right Thing for other languages, where it isn't so
    simple (there are complicated cases for this, but IIRC Unicode defines
    a robust algo to do this). I'll check on the Unicode stuff if anybody
    thinks it's a good idea... I'm uncertain, myself; I never liked the
    qq() case-modifiers, so I don't use them.

    There is ucfirst(), which I'm sure could be updated to handle Unicode;
    however, I don't know if it is important enough to deserve \T{}. You
    might want to ask Larry :)
    A string which is (possibly) interpolated and then executed as a system
    command with /bin/sh or its equivalent. Shell wildcards, pipes, and
    redirections will be honored. The collected standard output of the
    command is returned; standard error is unaffected. In scalar context,
    it comes back as a single (potentially multi-line) string, or undef if
    the command failed. In list context, returns a list of lines split
    on the standard input separator, or an empty list if the command
    failed.
    This whole section is very Unix-centric, but I'm not certain what to
    do about that -- the functionality is very system-specific. Also, I
    suspect we're going to want to rewrite it anyway when we hammer out
    iterators, files, and context.

    Why?
    A line-oriented form of quoting is based on the shell "here-document"
    s/shell/Unix Bourne shell/
    syntax. Following a << you specify a string to terminate the quoted
    material, and all lines following the current line down to the
    terminating string are the value of the item. The terminating string
    may be either an identifier (a word), or some quoted text. If quoted,
    the type of quotes you use determines the treatment of the text, just
    as in regular quoting. An unquoted identifier works like double quotes.
    The terminating string must appear by itself, and any preceding or
    following whitespace on the terminating line is discarded.
    I could have sworn that Larry recently put something out about the edge
    cases between << heredoc and << beginning-of-qw. I /think/ he said
    that qw("Foo" bar) must be written as << "Foo" bar>>, because
    otherwise it would be interpreted as a here-doc ending with Foo with
    double-quote interpolation. Can anybody find this, or is Larry watching?
    Also note that with single quoted here-docs, backslashes are not
    special, and are taken for a literal backslash, a behavior that is
    different from normal single-quoted strings.
    Are \qq()s still special, even in <<'noninterpolating's? Either way,
    it should be explicitly noted.

    As far as I know, *nothing* is special in a single quoted heredoc.
    V-Strings are formed when 3 or more integers are joined by decimal points,
    with a possible leading v. The resulting item is then treated like
    a string, rather than a number.

    =over 3
    Examples:
    $var = v5.8.0; # $var = "5.8.0";
    $var = 192.168.0.1; # $var = "192.168.0.1";
    =back
    Note that the v is non-optional for two-character v-strings.

    Good point, because otherwise it's a number. Definitely
    needs to be added to the test suite.
    I'd say something like:
    V-strings are actually strings that just happen to look like numbers.
    Each dot-separated number is transformed into the character with that
    Unicode ordinal, and the characters are concatenated together.

    (The transformation from normal string to v-string looks like
    C<<$vstring='v' ~ join '.', map {ord} split //, $instring>>; the
    transformation from v-string to normal string looks like
    C<<print join '', map {chr} split /\./, $vstring>>;
    (Where vstring cannot begin with a leading 'v', for purposes of
    illustration.))

    Thus, C<<80.101.114.108.32.54.33 eq 'Perl 6!'>>

    Also, your examples are misleading at best. v5.8.0 eq "\x05\x08\x00".
    192.168.0.1 eq chr(192)~chr(168)~chr(0)~chr(1).

    You're right, the vstring section should be totally redone.

    Thanks for the feedback.

    Joseph F. Ryan
    ryan.311@osu.edu
  • Michael Lazzaro at Dec 3, 2002 at 7:25 pm

    On Monday, December 2, 2002, at 01:42 PM, Joseph F. Ryan wrote:
    James Mastros wrote:
    We need to decide if this is a user doc or a developer doc/language
    specification. If it's the latter, we need a rigorous definition of
    what a pair is.
    I'm more inclined towards a user doc; a rigorous definition of pairs in
    the tests should be good enough for the developers.
    I think we've been gravitating to a "language reference", geared
    primarily towards intermediate/advanced users. Something much more
    rigorous than beginners would be comfortable with (since it defines
    things in much greater detail than beginners would need) and written to
    assume *no* prior knowledge of Perl5. It will be useful to the
    developers -- in that it will describe required P6 behaviors in much
    greater detail than the Apocalypses and Exegeses -- but it will be
    written for users.

    The document should be taken to mean "we aren't describing how Perl6 is
    implemented or what the guts look like, but the language behaviors
    described herein should always be true."

    Do we want to change shorthand octal literal numbers to 0o123 (I
    don't like this, it's hard to read), change octal chars to \c123
    (can't do this without getting rid of, or changing, \c for
    control-character), get rid of octal chars entirely, or something
    else? (Barring a good "something else", I vote for killing octal
    chars.)
    As of Larry's last writings, there will definitely be an octal (it
    still has good uses), and its syntax will definitely be 0o777 -- with
    an 'o', not a 'c'. The 'o' is a little hard to read, but the best
    anyone can come up with. It has to be lowercase 'o', not uppercase
    'O', which helps *enormously*. :-)

    (But since I assume you can use \d, \b, \h anywhere you use \o, you
    won't have to use octal at all if you don't want to.)

    MikeL
  • James Mastros at Dec 3, 2002 at 11:39 pm

    On 12/03/2002 2:27 PM, Michael Lazzaro wrote:
    I think we've been gravitating to a "language reference", geared
    primarily towards intermediate/advanced users. Something much more
    rigorous than beginners would be comfortable with (since it defines
    things in much greater detail than beginners would need) and written to
    assume *no* prior knowledge of Perl5. It will be useful to the
    developers -- in that it will describe required P6 behaviors in much
    greater detail than the Apocalypses and Exegeses -- but it will be
    written for users.
    I quite agree... which still means we need more rigor than this document
    has. The definition of a pair and the semantics of \c[ and friends are
    important so that users know exactly what "\c~" means ('>',
    C<<chr(ord('~')-64)>> ), and if C<<qq◄some words here►>> will work (no,
    those aren't a matched Pi/Pf or Ps/Pe pair, they're just Misc. Shapes
    that have no direction information, and we can't do them reasonably
    without looking at every character in Unicode visually -- if somebody
    wants to, be my guest!).
    Do we want to change shorthand octal literal numbers to 0o123 (I
    don't like this, it's hard to read), change octal chars to \c123
    (can't do this without getting rid of, or changing, \c for
    control-character), get rid of octal chars entirely, or something
    else? (Barring a good "something else", I vote for killing octal chars.)
    As of Larry's last writings, there will definitely be an octal (it still
    has good uses), and its syntax will definitely be 0o777 -- with an 'o',
    not a 'c'. The 'o' is a little hard to read, but the best anyone can
    come up with. It has to be lowercase 'o', not uppercase 'O', which
    helps *enormously*. :-)
    Huh? In that case, somebody should tell Angel Faus; "Numeric literals,
    take 3" says 0c777, and nobody dissented. IIRC, in fact, nobody's
    dissented from 0c777 since it was first suggested.
    (But since I assume you can use \d, \b, \h anywhere you use \o, you
    won't have to use octal at all if you don't want to.)
    \d is pure speculation on my part. (As is \0 == chr(0).)

    In fact, for this, and \o777 vs. whatever, I'm cc-ing perl6-language on
    this.


    p6l guys and the Design Team, if you haven't been following the
    conversation, here's how it goes:
    In perl5, octal numbers are specified as 0101 -- with a leading zero,
    and octal characters in strings are specified as "\101". In perl6, our
    current documentation lists 0c101 as being the new way to write octal
    numbers, because it lets people use leading zeros in numbers in an
    intuitive way, and 0o101 was decided to be too difficult to read. The
    last writing of Larry to address this, as far as I (or anybody else who
    I've noticed) knows, says 0o101.
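
    For reference, a small Python sketch of what either spelling is meant
    to denote. The helper name parse_octal and the idea of accepting both
    prefixes are assumptions made for illustration only; nothing here
    reflects an actual Perl 6 parser.

```python
# The two prefix spellings debated in this thread.
OCTAL_PREFIXES = ("0o", "0c")


def parse_octal(literal: str) -> int:
    """Return the integer value of an octal literal such as '0o101'."""
    for prefix in OCTAL_PREFIXES:
        if literal.startswith(prefix):
            # Interpret the remaining digits in base 8.
            return int(literal[len(prefix):], 8)
    raise ValueError(f"not an octal literal: {literal!r}")


if __name__ == "__main__":
    value = parse_octal("0o101")
    print(value, repr(chr(value)))
```

    With an explicit prefix, a plain 0101 can be read as decimal 101 with
    a leading zero, which is the "intuitive" behavior mentioned above.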

    It's generally been agreed on, I think, that 0c101 is the way to go.

    Now, we're working on string literals, and the question is how we write
    octal character literals. The current writer of the string literal spec
    wants "\o101" to be the new way to write what is "\101" in perl5 (and
    C). I'd prefer this to be "\c101", to match up with how the current doc
    says octal numerics are written. Unfortunately, \c is taken for
    control-characters (ie "\c[" eq chr(ord '[' - 64) eq ESC), which is a
    more important use of \c.
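
    A sketch of the control-character arithmetic, in Python. It assumes
    the Perl 5 rule (uppercase the character, then flip the 64 bit), which
    agrees with the subtraction written above for "[" and "~". The function
    name is hypothetical, and the sketch deliberately refuses non-ASCII
    input because that case is exactly what is still undecided.

```python
def control_char(ch: str) -> str:
    """Map the character after \\c to a control character, Perl 5 style."""
    if len(ch) != 1 or ord(ch) > 127:
        raise ValueError("sketch only covers single ASCII characters")
    # Uppercase first so that \ch and \cH agree, then flip the 64 bit.
    return chr(ord(ch.upper()) ^ 64)


if __name__ == "__main__":
    for ch in "[~hH?":
        print(ch, repr(control_char(ch)))
```

    Flipping the bit rather than subtracting matters for "?": subtraction
    would give a negative ordinal, while the flip yields DEL.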

    What do we do, oh great and wonderful design team?

    Numeric  String       Upside            Downside
    -------  ------       ------            --------
    0101     \101         p5/C compatible   Unintuitive
    0o101    \o101        Consistent        Hard to read
    0c101    \o101        Keeps \c for      Inconsistent
                          control-char
    0c101    unsupported  Consistent        Octal string chars
                                            unsupported
    0t101    \t101        Consistent        What's tab?
    Or something else?
    All choices are bad, which one is best?

    -=- James Mastros
  • Luke Palmer at Dec 4, 2002 at 4:21 pm

    Date: Tue, 03 Dec 2002 18:39:27 -0500
    From: James Mastros <james@mastros.biz>

    Huh? In that case, somebody should tell Angel Faus; "Numeric literals,
    take 3" says 0c777, and nobody dissented. IIRC, in fact, nobody's
    dissented from 0c777 since it was first suggested.
    Well, except Larry. I remember him saying initially that it should be
    0o777, not just in the most recent one. I'm not much of a thread
    scavenger, so I can't point you to the message.
    (But since I assume you can use \d, \b, \h anywhere you use \o, you
    won't have to use octal at all if you don't want to.)
    \d is pure speculation on my part. (As is \0 == chr(0).)

    p6l guys and the Design Team, if you haven't been following the
    conversation, here's how it goes:
    In perl5, octal numbers are specified as 0101 -- with a leading zero,
    and octal characters in strings are specified as "\101". In perl6, our
    current documentation lists 0c101 as being the new way to write octal
    numbers, because it lets people use leading zeros in numbers in an
    intuitive way, and 0o101 was decided to be too difficult to read. The
    last writing of Larry to address this, as far as I (or anybody else who
    I've noticed) knows, says 0o101.

    It's generally been agreed on, I think, that 0c101 is the way to go.
    I get a different impression. I think it's generally a
    non-controversial topic, and nobody really cares either way... aside
    from you, perhaps.
    Now, we're working on string literals, and the question is how we write
    octal character literals. The current writer of the string literal spec
    wants "\o101" to be the new way to write what is "\101" in perl5 (and
    C). I'd prefer this to be "\c101", to match up with how the current doc
    says octal numerics are written. Unfortunately, \c is taken for
    control-characters (ie "\c[" eq chr(ord '[' - 64) eq ESC), which is a
    more important use of \c.

    What do we do, oh great and wonderful design team?

    Numeric  String       Upside            Downside
    -------  ------       ------            --------
    0101     \101         p5/C compatible   Unintuitive
    0o101    \o101        Consistent        Hard to read
    Not that I'm "great and wonderful design team," but this one is my
    favorite. I don't think 0o101 is terribly hard to read, and "o"
    stands for "octal" a lot better than "c" does.

    That comes back in reading, too. Once people figure out that's the
    letter "o", and not a miniature zero, it will be perfectly clear what
    is meant. That's not true of "c".

    Luke
  • Larry Wall at Dec 4, 2002 at 6:37 pm
    It's o, not c.

    Larry
  • Larry Wall at Dec 4, 2002 at 7:47 pm
    On Mon, Dec 02, 2002 at 04:42:52PM -0500, Joseph F. Ryan wrote:
    : >Has this been vetted? $(...)/etc seem to cover this case, and & being
    : >a qq() metachar makes using qq() strings to print HTML/XML difficult.
    :
    :
    : Well, it was in Apoc 2:
    : http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 252: interpolation of
    : subroutines
    : http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 222: interpolation of
    : object method calls

    This is why the parens are required on sub interpolations.
    HTML/XML entities don't have parens. The parens are required on
    method interpolations because it's too easy to get an accidental
    "." after a variable.

    : >>=item Escaped Characters
    : >># Basically the same as Perl5; also, how are locale semantics handled?
    : >>
    : >> \t tab
    : >> \n newline
    : >> \r return
    : >> \f form feed
    : >> \b backspace
    : >> \a alarm (bell)
    : >> \e escape
    : >
    : >Can we get some rigor here? Also, is \n the same everywhere, or do we
    : >play the same tricks we did with it in p5? (I think it should be the
    : >same everywhere, a CR char, "\cM". Disciplines, or encodings, or
    : >whatever we're calling them, can take care of it on IO.) Oh, and it
    : >might be nice for \0 to be NUL. (This used to be implicit with \0 as
    : >octal, but since \0 isn't octal anymore...)
    :
    :
    : As someone who has had to use NT, Mac OS 9, and Solaris with much
    : frequency, I can say I very much appreciated the special tricks
    : that \n did (does).

    In regexen, \n matches any known newline sequence. In a string, it interpolates
    whatever is the native newline.

    : >> \b10 binary char

    Can't easily have this and backspace \b. But \b is already a mess from
    meaning word boundary in regexen. I'm inclined to throw out \b meaning
    backspace. It doesn't really work well in a Unicode world anyway. If
    you really mean it you can always specify a control-H.

    : >> \o33 octal char
    : >
    : >Numeric Literals, take 3
    : >(http://archive.develooper.com/perl6-documentation@perl.org/msg00462.html),
    : >in the "*** Bin/Hex/Oct shorthands" section, gives 0c123 as the shorthand
    : >form of octal numbers, so it doesn't make much sense for octal character
    : >constants to be \o123. Do we want to change shorthand octal literal
    : >numbers to 0o123 (I don't like this, it's hard to read), change octal
    : >chars to \c123 (can't do this without getting rid of, or changing, \c for
    : >control-character), get rid of octal chars entirely, or something else?
    : >(Barring a good "something else", I vote for killing octal chars.)
    :
    :
    : This seems to be going back and forth:
    :
    : $octal_format = ($octal_format_still_exists) ?
    : sprintf("\\%s%d",$octals_current_letter_of_the_week,
    : $number) :
    : undef;
    :
    : That should clear things up.
    :
    : >> \x1b hex char
    : >
    : >Exactly two digits after the \x? Perl5 attempts to do the right thing
    : >either way, but this can be confusing too -- "\xA" eq chr(0xA),
    : >"\xABar" eq chr(0xAB)."ar", "\xAQux" eq chr(0xA)."Qux".
    :
    :
    : That was in perl5's perldoc, so I assume it is encouraged.
    :
    : You brought this up before:
    : http://archive.develooper.com/perl6-documentation@perl.org/msg00485.html
    :
    : I still say to stick with perl5's behavior.
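
    The Perl 5 behavior quoted above (take up to two hex digits after \x)
    can be sketched in Python as follows. The function name and the regular
    expression are illustrative assumptions, not the actual Perl tokenizer,
    and the bracketed wide form is not handled.

```python
import re

# A backslash, an 'x', then one or two hex digits (greedy, so two if present).
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})")


def expand_hex_escapes(text: str) -> str:
    """Replace each \\xH or \\xHH escape with the character it names."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    for sample in (r"\xA", r"\xABar", r"\xAQux"):
        print(sample, "->", repr(expand_hex_escapes(sample)))
```

    The second sample shows the confusing case: the "B" of "Bar" is
    consumed as a hex digit, leaving only "ar" as literal text.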
    :
    : >> \x{263a} wide hex char

    May switch all of these to use square brackets instead of curlies:

    \x[263a] wide hex char

    : >> \c[ control char

    \c is no longer control char. \c means what \N used to mean.
    (\N now means "not a newline".)

    To specify a control-H, say \c[^H].

    : >Rigor? What is \c~? perl5 thinks it's >, should perl6 agree?
    :
    :
    : I don't see why it shouldn't.
    :
    : >How about \c\x{1000} (that's invalid, but you get the point), is that
    : >equiv to \x{ff9c}?
    :
    :
    : No, it's "\c\" ~ "x{1000}"
    :
    : >What about \cé, (e+acute accent), does that capitalize, then subtract
    : >64, or just subtract?

    \c[^é] would be é with its 64 bit flipped.

    : >> \N{name} named Unicode character

    No, that's now \c[name]. \N means "not a newline". Note that
    \C[name] means "not a \c[name]".

    : Just recycle perl5's, I suppose. Not *everything* needs to be redone
    : from scratch.

    True, but everything is being reevaluated from scratch. Nothing gets a
    free ride just because it's in Perl 5.

    : >Is there any way to give the ordinal in decimal, like "\d192"? (I'm
    : >not sure how useful this would be, but it would be nice parallelism.
    : >OTOH, you can use chr() easily enough.)
    :
    :
    : That is a good point; if there is a 0dxxxxx, then there should be a
    : "\dxxxxx".

    Can't, if \d still means digit. But maybe \x[1234] is shorthand for
    \c[0x1234]. In which case, you can always say \c[0d4321].

    : >>=item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>
    : >>
    : >>Modifiers apply a modification to text which they enclose; they can be
    : >>embedded within interpolated strings.
    : >>
    : >> \L{} Lowercase all characters within brackets
    : >> \U{} Uppercase all characters within brackets
    : >> \Q{} Escape all characters that need escaping
    : >> within brackets (except "}")

    Square brackets preferred these days--looks less like a closure.

    : >Rigor: escape all non-alphanumerics.
    : >Do we still have the other modifiers that p5 supports, \l and \u?

    Yes, unless we want to roll over and allow \uXXXX for unicode, just to
    be compatible with the rest of the world.

    : >Do we want a new titlecase modifier, \T{james mastros} eq "James
    : >Mastros", doing the Right Thing for other languages, where it isn't so
    : >simple (there are complicated cases for this, but IIRC Unicode defines
    : >a robust algo to do this). I'll check on the Unicode stuff if anybody
    : >thinks it's a good idea... I'm uncertain, myself, I never liked the
    : >qq() case-modifiers, so don't use them.
    :
    :
    : There is ucfirst(), which I'm sure could be updated to handle Unicode;
    : however, I don't know if it is important enough to deserve \T{}. You
    : might want to ask Larry :)

    \u does title-case already in Perl 5. \U[] will do uppercase.
    So \u\U[$foo] would titlecase the first letter and uppercase the rest.

    : >>A line-oriented form of quoting is based on the shell "here-document"
    : >
    : >s/shell/Unix Bourne shell/
    : >
    : >>syntax. Following a << you specify a string to terminate the quoted
    : >>material, and all lines following the current line down to the
    : >>terminating string are the value of the item. The terminating string
    : >>may be either an identifier (a word), or some quoted text. If quoted,
    : >>the type of quotes you use determines the treatment of the text, just
    : >>as in regular quoting. An unquoted identifier works like double quotes.
    : >>The terminating string must appear by itself, and any preceding or
    : >>following whitespace on the terminating line is discarded.
    : >
    : >I could have sworn that Larry recently put something out about the edge
    : >cases between << heredoc and << beginning-of-qw. I /think/ he said
    : >that qw("Foo" bar) must be written as << "Foo" bar>>, because
    : >otherwise it would be interpreted as a here-doc ending with Foo with
    : >double-quote interpolation. Can anybody find this, or is Larry watching?

    Here docs require quotes, so <<EOF is the beginning of a qw//. (This week.)

    : >>Also note that with single quoted here-docs, backslashes are not
    : >>special, and are taken for a literal backslash, a behavior that is
    : >>different from normal single-quoted strings.
    : >
    : >Are \qq()s still special, even in <<'noninterpolating's? Either way,
    : >it should be explicitly noted.
    :
    :
    : As far as I know, *nothing* is special in a single quoted heredoc.

    Here docs are where you *most* want the \qq[] ability. It is assumed that
    the sequence "\qq[" will not occur by accident very often in the typical
    single-quoted string.

    Larry
  • Michael Lazzaro at Dec 4, 2002 at 8:57 pm
    On Wednesday, December 4, 2002, at 11:47 AM, Larry Wall wrote:
    <stuff>

    This is great stuff, and I think it solves everything we were talking
    about. Joseph, can you edit your doc to match all this? (If not, just
    lemme know and I can help.)

    If anyone can think of any more issues w/ strings and heredocs, plz
    speak up.

    MikeL
  • Brad Hughes at Dec 4, 2002 at 10:51 pm

    Larry Wall wrote:
    On Mon, Dec 02, 2002 at 04:42:52PM -0500, Joseph F. Ryan wrote: [...]
    : As far as I know, *nothing* is special in a single quoted heredoc.

    Here docs are where you *most* want the \qq[] ability. It is assumed that
    the sequence "\qq[" will not occur by accident very often in the typical
    single-quoted string.
    For this we VMS Perlers offer many thanks...

    brad
  • Luke Palmer at Dec 2, 2002 at 9:36 pm

    Date: Mon, 02 Dec 2002 06:58:12 -0500
    From: "Joseph F. Ryan" <ryan.311@osu.edu>

    =pod

    =head1 Strings

    'The quick brown $animal'
    "The quick brown $animal"
    This will not format correctly in POD. Either indent or put it in a
    list.
    =head2 Non-Interpolating Constructs

    Non-Interpolating constructs are strings in which expressions do not
    interpolate, or expand. The one exception to this is that the
    ^
    s/,//
    backslash character, \, will always escape the character that
    immediately follows the it.
    ^^^^
    s/the //

    Except in single-quoted heredocs. Something about that doesn't seem
    right. I, personally, want single quotes and q[] to never use \
    specially.
    The base form for a non-interpolating string is the single-quoted
    string: 'string'. However, non-interpolating strings can also be formed
    with the q() operator. The q() operator allows strings to be made with
    any non-space, non-letter, non-digit character as the delimeter instead
    of '. In addition, if the starting delimeter is a part of a paired
    set, such as (, [, <, or {, then the closing delimeter may be the
    matching member of the set. In addition, the reverse holds true;
    delimeters which are the tail end of a pair may use the starting item
    as the closing delimeter.
    Perhaps it's best not to use q(), since () are not valid delimiters
    anymore (see A4, I think).
    =over 3
    Examples:

    $string = 'string' # $string = 'string'
    $string = q|string| # $string = 'string'
    $string = q(string) # $string = 'string'
    ^ ^
    Yoink.
    $string = q]string[ # $string = 'string'
    =back

    There are a few special cases for delimeters; specifically : and #.
    : is not allowed because it might be used by custom-defined quoting
    s/is/are/; s/it/they/
    operators to apply a property; # is allowed, but there cannot be a
    space between the operator and the #. In addition, comments are not
    allowed within # delimeted expressions (for obvious reasons).
    Yep. That's why () are not allowed, as they could mean an argument to
    a modifier.
    =head3 Embedding Interpolated Strings

    It is also possible to embed an interpolating string within a non-
    interpolating string by the use of the \qq{} construct. A string
    inside a \qq{} construct acts exactly as if it were an interpolated
    string. Note that any end-brackets, "}", must be escaped within the
    \qq{} construct so that the parser can read it correctly.
    I don't remember this from anywhere. Where was this discussed?
    =over 3
    Examples:

    @array = <one two three>; # @array = ('one', 'two', 'three');
    @array = <one <\> three>; # @array = ('one', '<>', 'three');
    Naturally, you mean:
    @array = <<one two three>>; # ...
    ...
    =head3 Interpolation Rules

    =over 3

    =item Scalars: C<"$scalar">, C<"$(expression)">
    Non-Reference scalars will simply interpolate as their value. $()
    forces its expression into scalar context, which is then handled as
    either a scalar or a reference, depending on how expression evaluates.

    =item Lists: C<"@list">, C<"@(expression)">
    Arrays and lists are interpolated by joining their list elements by the
    list's separator property, which is by default a space. Therefore, the
    following two expressions are equivalent:
    s/separator <sp> property/.separator attribute/
    =over 3
    print "@list";
    print "" ~ @list.join(@list.separator) ~ "";
    =back

    =item Hashes: C<"%hash">, C<"%(expression)">
    Hashes interpolate by joining their pairs on their .separator property,
    which by default is a newline. Pairs stringify by joining the key and
    value with the hash's .pairsep property, which by default is a space.
    Note that hashes are unordered, and so the output will be unordered.
    Therefore, the following two expressions are equivalent:
    Again, s:e/property/attribute/
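
    The list and hash rules in the draft above can be sketched in Python.
    The function names are hypothetical; the defaults (space for lists,
    newline between pairs, space inside a pair) are the ones the draft
    states. Python dicts keep insertion order, so unlike the hashes
    described in the draft the output order here is predictable.

```python
def interpolate_list(items, separator=" "):
    """Join list elements on the list's separator (default: one space)."""
    return separator.join(str(item) for item in items)


def interpolate_hash(mapping, separator="\n", pairsep=" "):
    """Join key/value pairs: pairsep inside a pair, separator between pairs."""
    pairs = (f"{key}{pairsep}{value}" for key, value in mapping.items())
    return separator.join(pairs)


if __name__ == "__main__":
    print(interpolate_list(["one", "two", "three"]))
    print(interpolate_hash({"apple": 1, "pear": 2}))
```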
    =item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
    Subroutines and Methods will interpolate their return value into the
    string, which will be handled in whichever type the return value is.
    Same for object methods. Note that parens B<are> required during
    interpolation so that the parser can disambiguate between object
    methods and object members.
    Object methods I<are> object members. All attributes are private, but
    accessors are auto-generated for you (if you don't say otherwise). So
    I don't think parens should be required.
    =head3 Embedding non-interpolated constructs: C<\q{}>

    Similar to embedding an interpolated string within a non-interpolated
    string, it is possible to embed a non-interpolated string within a
    an interpolated string with \q{}. Any characters within a \q{} construct
    are treated as if they were in a non-interpolated string.
    And this is waaay down here away from \qq{}... why?
    =head2 Special Quoting

    =head3 Here-Docs

    A line-oriented form of quoting is based on the shell "here-document"
    syntax. Following a << you specify a string to terminate the quoted
    material, and all lines following the current line down to the
    terminating string are the value of the item. The terminating string
    may be either an identifier (a word), or some quoted text. If quoted,
    the type of quotes you use determines the treatment of the text, just
    as in regular quoting. An unquoted identifier works like double quotes.
    The terminating string must appear by itself, and any preceding or
    following whitespace on the terminating line is discarded.

    =over 3
    Examples:

    print << EOF;
    The price is $Price.
    EOF

    print << "EOF"; # same as above
    The price is $Price.
    EOF

    print << "EOF"; # same as above
    The price is $Price.
    EOF

    print << `EOC`; # execute commands
    echo hi there
    echo lo there
    EOC

    print <<"foo", <<"bar"; # you can stack them
    I said foo.
    foo
    I said bar.
    bar

    myfunc(<< "THIS", 23, <<'THAT');
    Here's a line
    or two.
    THIS
    and here's another.
    THAT
    You didn't mention that <<'THAT' doesn't interpolate.
    If you use a here-doc within a delimited construct, such as in s///eg,
    Ummm, s:e//$()/

    And that's interesting, as the rule might not still hold.
    the quoted material must come on the lines following the final
    delimiter. So instead of:

    =over 3
    s/this/<<E . 'that'
    the other
    E
    . 'more '/eg;
    =back

    you have to write

    =over 3
    s/this/<<E . 'that'
    . 'more '/eg;
    the other
    E
    =back

    Also note that with single quoted here-docs, backslashes are not
    special, and are taken for a literal backslash, a behavior that is
    different from normal single-quoted strings.
    Yes. Shoulda mentioned that a long time ago, IMO.

    Nice work.

    Luke
  • Joseph F. Ryan at Dec 2, 2002 at 10:11 pm

    Luke Palmer wrote:

    =head3 Embedding Interpolated Strings

    It is also possible to embed an interpolating string within a non-
    interpolating string by the use of the \qq{} construct. A string
    inside a \qq{} construct acts exactly as if it were an interpolated
    string. Note that any end-brackets, "}", must be escaped within the
    \qq{} construct so that the parser can read it correctly.
    I don't remember this from anywhere. Where was this discussed?
    http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 226: selective
    interpolation in single quotish context.
    Object methods I<are> object members. All attributes are private, but
    accessors are auto-generated for you (if you don't say otherwise). So
    I don't think parens should be required.
    Larry seems to disagree:
    http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 252: interpolation of
    subroutines
    =head3 Embedding non-interpolated constructs: C<\q{}>

    Similar to embedding an interpolated string within a non-interpolated
    string, it is possible to embed a non-interpolated string within a
    an interpolated string with \q{}. Any characters within a \q{} construct
    are treated as if they were in a non-interpolated string.
    And this is waaay down here away from \qq{}... why?
    A sort of segregationist measure; I tried to keep single quote
    behaviors confined to a single quote section, and double quote
    behaviors to a double quote section. Perhaps some sort of
    reorganization is in order?
    =head2 Special Quoting

    =head3 Here-Docs

    A line-oriented form of quoting is based on the shell "here-document"
    syntax. Following a << you specify a string to terminate the quoted
    material, and all lines following the current line down to the
    terminating string are the value of the item. The terminating string
    may be either an identifier (a word), or some quoted text. If quoted,
    the type of quotes you use determines the treatment of the text, just
    as in regular quoting. An unquoted identifier works like double quotes.
    The terminating string must appear by itself, and any preceding or
    following whitespace on the terminating line is discarded.

    =over 3
    Examples:

    print << EOF;
    The price is $Price.
    EOF

    print << "EOF"; # same as above
    The price is $Price.
    EOF

    print << "EOF"; # same as above
    The price is $Price.
    EOF

    print << `EOC`; # execute commands
    echo hi there
    echo lo there
    EOC

    print <<"foo", <<"bar"; # you can stack them
    I said foo.
    foo
    I said bar.
    bar

    myfunc(<< "THIS", 23, <<'THAT');
    Here's a line
    or two.
    THIS
    and here's another.
    THAT
    You didn't mention that <<'THAT' doesn't interpolate.


    If you use a here-doc within a delimited construct, such as in s///eg,
    Ummm, s:e//$()/
    Silly me :)
    And that's interesting, as the rule might not still hold.
    Very true.
    the quoted material must come on the lines following the final
    delimiter. So instead of:

    =over 3
    s/this/<<E . 'that'
    the other
    E
    . 'more '/eg;
    =back

    you have to write

    =over 3
    s/this/<<E . 'that'
    . 'more '/eg;
    the other
    E
    =back

    Also note that with single quoted here-docs, backslashes are not
    special, and are taken for a literal backslash, a behavior that is
    different from normal single-quoted strings.
    Yes. Shoulda mentioned that a long time ago, IMO.
    Yeah, yeah, yeah.



    Thanks for responding,

    Joseph F. Ryan
    ryan.311@osu.edu
  • Andrew Wilson at Dec 4, 2002 at 12:26 am

    On Mon, Dec 02, 2002 at 02:36:52PM -0700, Luke Palmer wrote:
    There are a few special cases for delimeters; specifically : and #.
    : is not allowed because it might be used by custom-defined quoting
    s/is/are/; s/it/they/
    operators to apply a property; # is allowed, but there cannot be a
    space between the operator and the #. In addition, comments are not
    allowed within # delimeted expressions (for obvious reasons).
    No, that should definitely be is and it. Although
    s/delimeters/delimiters/

    andrew
    --
    Libra: (Sept. 23 - Oct. 23)
    You have always rejected the doctrine of reincarnation as superstitious
    nonsense, which comes as a great relief to Hindu couples expecting
    children early next month.
  • Andrew Wilson at Dec 4, 2002 at 2:55 am

    On Mon, Dec 02, 2002 at 06:58:12AM -0500, Joseph F. Ryan wrote:
    A string is formed when text is enclosed by a quoting operator.
    There are two types of quoting operators: interpolating and
    non-interpolating. In interpolating constructs, the value of a
    variable is substituted for the variable name within the string
    and certain characters have special meaning when preceded by a
    backslash (C<\>). In non-interpolating constructs, a variable
    name that appears within the string is used as-is. The simplest
    examples of these two types of quoting operators are strings
    delimited by double (interpolating) and single quotes
    (non-interpolating). For example:
    This was true of perl 5 which only interpolated variables. However,
    perl 6 will interpolate any expression if you put it in $() for scalar
    context or @() for list context (you mention this later). It's
    misleading to talk about interpolation in terms of variables; lots of
    things interpolate. I've reworked that paragraph above, I think
    something like this is more appropriate.

    A literal string is formed when text is enclosed by a quoting
    operator; there are two types: interpolating and non-interpolating.
    Interpolating constructs insert (interpolate) the value of an
    expression into the string in place of themselves. The simplest
    examples of the two types of quoting operators are strings delimited
    by double (interpolating) and single (non-interpolating) quotes.

    Certain characters, known as meta characters, have special
    meaning within a literal string. The most basic of these is the
    backslash (C<\>); it is special in both interpolated and
    non-interpolated strings. The backslash makes ordinary characters
    special and special characters ordinary. Non-interpolated strings
    only have two meta characters, the backslash itself and the character
    that is being used as the delimiter. Interpolated strings have many
    more meta characters, see the section on Escaped characters below.

    The most basic expression that may be interpolated is a scalar
    variable. In non-interpolating constructs, a variable name that
    appears within the string is used as-is. For example:
    'The quick brown $animal'
    "The quick brown $animal"

    In the first string, perl will take each character literally and
    perform no special processing. In the second string, the value
    of the variable $animal is inserted within the string at that
    location. If $animal had had the value "fox", then the second
    string would have become "The quick brown fox".
    perl will take each character in the first string literally and
    perform no special processing. However, the value of the variable
    $animal is inserted into the second string in place of the text
    $animal. If $animal had had the value "fox", then the second string
    would have become "The quick brown fox".
    More on the various quoting operators below.

    =head2 Non-Interpolating Constructs

    Non-Interpolating constructs are strings in which expressions do not
    interpolate, or expand. The one exception to this is that the
    backslash character, \, will always escape the character that
    immediately follows it.
    Are you sure about this? In perl 5 a \ is a literal \ unless it precedes
    the string delimiter or another \. Larry said (Apoc 2) this wasn't
    changing with the exception of adding \qq{} to allow inserting
    interpolating constructs into non-interpolating constructs.
    The base form for a non-interpolating string is the single-quoted
    string: 'string'. However, non-interpolating strings can also be formed
    with the q() operator. The q() operator allows strings to be made with
    any non-space, non-letter, non-digit character as the delimiter instead
    of '. In addition, if the starting delimiter is a part of a paired
    s/instead of '//
    set, such as (, [, <, or {, then the closing delimiter may be the
    matching member of the set. In addition, the reverse holds true;
    delimiters which are the tail end of a pair may use the starting item
    as the closing delimiter.

    =over 3
    Examples:

    $string = 'string' # $string = 'string'
    $string = q|string| # $string = 'string'
    $string = q(string) # $string = 'string'
    $string = q]string[ # $string = 'string'
    =back

    There are a few special cases for delimiters; specifically : and #.
    : is not allowed because it might be used by custom-defined quoting
    operators to apply a property; # is allowed, but there cannot be a
    space between the operator and the #. In addition, comments are not
    allowed within #-delimited expressions (for obvious reasons).

    =head3 Embedding Interpolated Strings

    It is also possible to embed an interpolating string within a
    non-interpolating string by the use of the \qq{} construct. A string
    inside a \qq{} construct acts exactly as if it were an interpolated
    string. Note that any end-brackets, "}", must be escaped within the
    \qq{} construct so that the parser can read it correctly.
    Do these nest arbitrarily?

    q{my string \qq{interpolate $this \q{but not $this} or am $I} Just asking for trouble?}
    =over 3
    Examples ( assuming C<< $var="two" >> ):

    $string = 'one \qq{$var} three' # $string = 'one two three'
    $string = 'one\qq{ {$var\} }three' # $string = 'one {two} three'
    =back

    =head3 <<>>; expanding a string as a list.

    A set of angle brackets is a special op that evaluates into the list of
    words contained, using whitespace as the delimiter. It is similar to qw()
    from perl5, and can be thought of as roughly equivalent to:
    C<< "STRING".split(' ') >>

    =over 3
    Examples:

    @array = <one two three>; # @array = ('one', 'two', 'three');
    @array = <one <\> three>; # @array = ('one', '<>', 'three');
    =back
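
    The word-list behaviour above can be modelled in a few lines of
    Python. This is only a sketch under assumptions: word_list is a
    made-up name, it splits on runs of whitespace rather than a single
    space, and it handles only the escaped closing bracket shown in the
    example.

```python
def word_list(text):
    """Model (assumption): split on whitespace, then turn an escaped
    closing bracket (backslash followed by '>') back into a plain '>'."""
    return [word.replace("\\>", ">") for word in text.split()]

print(word_list("one two three"))
print(word_list("one <\\> three"))
```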

    =head2 Interpolating Constructs

    Interpolating constructs are another form of string in which variables
    that are embedded into the string are expanded into their value at
    runtime. Interpolated strings are formed using the double quote:
    "string". In addition, qq() is a synonym for "", which is similar to
    q() being a synonym for ''. The rules for interpolation are as
    follows:
    Again this shouldn't say variables, it should say expressions.
    =head3 Interpolation Rules

    =over 3

    =item Scalars: C<"$scalar">, C<"$(expression)">
    Non-Reference scalars will simply interpolate as their value. $()
    forces its expression into scalar context, which is then handled as
    either a scalar or a reference, depending on how expression evaluates.

    =item Lists: C<"@list">, C<"@(expression)">
    Arrays and lists are interpolated by joining their list elements by the
    list's separator property, which is by default a space. Therefore, the
    following two expressions are equivalent:

    =over 3
    print "@list";
    print "" ~ @list.join(@list.separator) ~ "";
    =back

    =item Hashes: C<"%hash">, C<"%(expression)">
    A hash interpolates by joining its pairs on its .separator property,
    which by default is a newline. Pairs stringify by joining the key and
    value with the hash's .pairsep property, which by default is a space.
    Note that hashes are unordered, and so the output will be unordered.
    Therefore, the following two expressions are equivalent:

    =over 3
    print "%hash";
    print "" ~
    join ( %hash.separator,
    map { $_ ~ %hash.pairsep ~ %hash{$_} } %hash.keys )
    ~ "";
    =back

    =item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
    Subroutines and Methods will interpolate their return value into the
    string, which will be handled in whichever type the return value is.
    Same for object methods. Note that parens B<are> required during
    interpolation so that the parser can disambiguate between object
    methods and object members.

    =item References C<"$ref">
    # Behavior not defined

    =item Default Object Stringification C<"$obj">
    # Behavior not defined

    =item Escaped Characters
    # Basically the same as Perl5; also, how are locale semantics handled?

    \t          tab
    \n          newline
    \r          return
    \f          form feed
    \b          backspace
    \a          alarm (bell)
    \e          escape
    \b10        binary char
    \o33        octal char
    \x1b        hex char
    \x{263a}    wide hex char
    \c[         control char
    \N{name}    named Unicode character

    =item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>

    Modifiers apply a modification to text which they enclose; they can be
    embedded within interpolated strings.

    \L{}    Lowercase all characters within brackets
    \U{}    Uppercase all characters within brackets
    \Q{}    Escape all characters that need escaping
            within brackets (except "}")

    =back
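
    To make the list and hash rules above concrete, here is a small
    Python model of them. It is a sketch, not Perl 6: the function names
    are invented for illustration, and the defaults (space, newline,
    space) are taken from the draft text.

```python
def interpolate_list(items, separator=" "):
    """Join list elements on the list's separator (default: a space)."""
    return separator.join(str(item) for item in items)

def interpolate_hash(mapping, separator="\n", pairsep=" "):
    """Join each key/value pair with pairsep, then join pairs on separator.
    Note: a Python dict keeps insertion order, whereas the draft says a
    hash is unordered, so real output order would not be guaranteed."""
    return separator.join(f"{key}{pairsep}{value}" for key, value in mapping.items())

print(interpolate_list([1, 2, 3]))
print(interpolate_hash({"a": 1, "b": 2}))
```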

    =head3 Stopping Interpolation (\Q)

    Within an interpolated string, interpolation of expressions can be
    stopped by \Q.
    This needs more explanation.

    Within an interpolated string, perl will always try to take the
    longest possible expression to interpolate. For instance this:
    C<"@list[0]"> will interpolate element C<0> of the array C<@list>. If
    you want perl to include the array C<@list> followed by the string
    C<"[0]">, then you need to use the null string (specified by C<\Q>):
    =over 3
    Example:
    @list = (1,2);
    print "@list\Q[0]"; # prints '1 2[0]'
    =back

    =head3 Embedding non-interpolated constructs: C<\q{}>

    Similar to embedding an interpolated string within a non-interpolated
    string, it is possible to embed a non-interpolated string within an
    interpolated string with \q{}. Any characters within a \q{} construct
    are treated as if they were in a non-interpolated string.
    I would lose everything up to the first comma and change the "with" to
    "using".

    It is possible to embed a non-interpolated string within an
    interpolated string using \q{}. Any characters within the \q{}
    construct are treated as if they were in a non-interpolated string.
    =over 3
    Example:
    "string \q{$variable}" # $variable will not be interpolated
    =back
    =head3 C<qx()>, backticks (C<``>)

    A string which is (possibly) interpolated and then executed as a system
    command with /bin/sh or its equivalent. Shell wildcards, pipes, and
    redirections will be honored. The collected standard output of the
    command is returned; standard error is unaffected. In scalar context,
    it comes back as a single (potentially multi-line) string, or undef if
    the command failed. In list context, returns a list of lines split
    on the standard input separator, or an empty list if the command
    failed.

    =head2 Special Quoting

    =head3 Here-Docs

    A line-oriented form of quoting is based on the shell "here-document"
    syntax. Following a << you specify a string to terminate the quoted
    material, and all lines following the current line down to the
    terminating string are the value of the item. The terminating string
    may be either an identifier (a word), or some quoted text. If quoted,
    the type of quotes you use determines the treatment of the text, just
    as in regular quoting. An unquoted identifier works like double quotes.
    The terminating string must appear by itself, and any preceding or
    following whitespace on the terminating line is discarded.

    =over 3
    Examples:

    print << EOF;
    The price is $Price.
    EOF

    print << "EOF"; # same as above
    The price is $Price.
    EOF

    print << "EOF"; # same as above
    The price is $Price.
    EOF

    print << `EOC`; # execute commands
    echo hi there
    echo lo there
    EOC

    print <<"foo", <<"bar"; # you can stack them
    I said foo.
    foo
    I said bar.
    bar

    myfunc(<< "THIS", 23, <<'THAT');
    Here's a line
    or two.
    THIS
    and here's another.
    THAT

    =back

    Don't forget that you have to put a semicolon on the end to finish the
    statement, as Perl doesn't know you're not going to try to do this:

    =over 3
    print <<ABC
    179231
    ABC
    + 20;
    =back

    If you want your here-docs to be indented with the rest of the code,
    you'll need to remove leading whitespace from each line manually:

    =over 3
    ($quote = <<'FINIS') =~ s/^\s+//gm;
    The Road goes ever on and on,
    down from the door where it began.
    FINIS
    =back
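
    For comparison, the same whitespace-stripping step can be sketched
    in Python with an equivalent multi-line regular expression. The
    function name is made up for this illustration.

```python
import re

def strip_leading_whitespace(text):
    """Remove leading whitespace from every line, mirroring s/^\\s+//gm."""
    return re.sub(r"^\s+", "", text, flags=re.MULTILINE)

quote = strip_leading_whitespace(
    "    The Road goes ever on and on,\n"
    "    down from the door where it began.\n"
)
print(quote, end="")
```

    One caveat applies to both versions: \s also matches newlines, so
    any blank lines inside the quoted text are swallowed along with the
    indentation.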

    If you use a here-doc within a delimited construct, such as in s///eg,
    the quoted material must come on the lines following the final
    delimiter. So instead of:

    =over 3
    s/this/<<E . 'that'
    the other
    E
    . 'more '/eg;
    =back

    you have to write

    =over 3
    s/this/<<E . 'that'
    . 'more '/eg;
    the other
    E
    =back
    Are these examples using . for string concatenation? If they are, that
    should be a ~.

    andrew
    --
    Scorpio: (Oct. 24 - Nov. 21)
    You've always thought of Death as a journey into the infinite, but it
    turns out to be a lot more like Harry Dean Stanton.
  • Joseph F. Ryan at Dec 4, 2002 at 5:56 am

    Andrew Wilson wrote:
    Do these nest arbitrarily?

    q{my string \qq{interpolate $this \q{but not $this} or am $I} Just asking for trouble?}
    As far as I know, yes. The current behavior already allows this,
    unless the design team vetoes it for some reason.

    Thanks for all of the great suggestions; I'll try to get another
    revision that incorporates them sometime tomorrow.

    Joseph F. Ryan
    ryan.311@osu.edu
