I've integrated most of the proposed suggestions, as well as a section
on vstrings and a winged section on hash interpolation. So that leaves
these known issues:
- Reference stringification
- Default Object Stringification
(.AS_STRING needs to be added to the doc as well, but I figure it
is still getting hammered out)
- Does <<>> mess up here-docs?
(I'm inclined to say that <<>> is more trouble than it is worth,
and to ditch <<>>, simply sticking with qw())
Also, would any sort of diff be helpful with these document revisions?
There's Text::ParagraphDiff, but that doesn't work too well with pod,
since pod is line-oriented rather than paragraph oriented. Regular
diffs aren't that helpful on text either. However, either one is
better than nothing, so if you'd like one, let me know.
Joseph F. Ryan
ryan.311@osu.edu
=pod
=head1 Strings
A string is formed when text is enclosed by a quoting operator.
There are two types of quoting operators: interpolating and
non-interpolating. In interpolating constructs, the value of a
variable is substituted for the variable name within the string
and certain characters have special meaning when preceded by a
backslash (C<\>). In non-interpolating constructs, a variable
name that appears within the string is used as-is. The simplest
examples of these two types of quoting operators are strings
delimited by double (interpolating) and single quotes
(non-interpolating). For example:
'The quick brown $animal'
"The quick brown $animal"
In the first string, perl will take each character literally and
perform no special processing. In the second string, the value
of the variable $animal is inserted within the string at that
location. If $animal had had the value "fox", then the second
string would have become "The quick brown fox".
More on the various quoting operators below.
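As a rough illustration of the two kinds of quoting, here is a minimal sketch in Perl 5, where single and double quotes already behave this way. The variable names are only for illustration, and Perl 6 syntax may differ in detail.

```perl
use strict;
use warnings;

my $animal = 'fox';

# Single quotes: the text is taken literally, so $animal is not expanded.
my $literal = 'The quick brown $animal';

# Double quotes: the value of $animal replaces the variable name.
my $interpolated = "The quick brown $animal";

print "$literal\n";       # The quick brown $animal
print "$interpolated\n";  # The quick brown fox
```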
=head2 Non-Interpolating Constructs
Non-Interpolating constructs are strings in which expressions do not
interpolate, or expand. The one exception to this is that the
backslash character, \, will always escape the character that
immediately follows it.
The base form for a non-interpolating string is the single-quoted
string: 'string'. However, non-interpolating strings can also be formed
with the q() operator. The q() operator allows strings to be made with
any non-space, non-letter, non-digit character as the delimiter instead
of '. In addition, if the starting delimiter is a part of a paired
set, such as (, [, <, or {, then the closing delimiter may be the
matching member of the set. In addition, the reverse holds true;
delimiters which are the tail end of a pair may use the starting item
as the closing delimiter.
=over 3
Examples:
$string = 'string' # $string = 'string'
$string = q|string| # $string = 'string'
$string = q(string) # $string = 'string'
$string = q]string[ # $string = 'string'
=back
There are a few special cases for delimiters; specifically : and #.
: is not allowed because it might be used by custom-defined quoting
operators to apply a property; # is allowed, but there cannot be a
space between the operator and the #. In addition, comments are not
allowed within # delimited expressions (for obvious reasons).
=head3 Embedding Interpolated Strings
It is also possible to embed an interpolating string within a
non-interpolating string by the use of the \qq{} construct. A string
inside a \qq{} construct acts exactly as if it were an interpolated
string. Note that any end-brackets, "}", must be escaped within
the \qq{} construct so that the parser can read it correctly.
=over 3
Examples ( assuming C<< $var="two" >> ):
$string = 'one \qq{$var} three' # $string = 'one two three'
$string = 'one\qq{ {$var\} }three' # $string = 'one {two} three'
=back
=head3 <<>>; expanding a string as a list.
A set of angle brackets is a special op that evaluates into the list of words
contained, using whitespace as the delimiter. It is similar to qw()
from perl5, and can be thought of as roughly equivalent to:
C<< "STRING".split(' ') >>
=over 3
Examples:
@array = <one two three>; # @array = ('one', 'two', 'three');
@array = <one <\> three>; # @array = ('one', '<>', 'three');
=back
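The comparison with qw() and split(' ') can be checked in Perl 5. This is a small sketch, not a statement about how the Perl 6 operator will be implemented: it shows that split on a single space skips leading whitespace, whereas splitting on the pattern /\s+/ keeps an empty leading field. The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = '  one two   three';

# split ' ' skips leading whitespace, then splits on runs of whitespace.
my @awk_style = split ' ', $string;

# split /\s+/ keeps an empty first field when the string starts with whitespace.
my @regex_style = split /\s+/, $string;

# qw() yields the same word list as the split ' ' form.
my @words = qw(one two three);

print scalar(@awk_style), "\n";    # 3
print scalar(@regex_style), "\n";  # 4
```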
=head2 Interpolating Constructs
Interpolating constructs are another form of string in which variables
that are embedded into the string are expanded into their value at
runtime. Interpolated strings are formed using the double quote:
"string". In addition, qq() is a synonym for "", which is similar to
q() being a synonym for ''. The rules for interpolation are as
follows:
=head3 Interpolation Rules
=over 3
=item Scalars: C<"$scalar">, C<"$(expression)">
Non-Reference scalars will simply interpolate as their value. $()
forces its expression into scalar context, which is then handled as
either a scalar or a reference, depending on how expression evaluates.
=item Lists: C<"@list">, C<"@(expression)">
Arrays and lists are interpolated by joining their list elements by the
list's separator property, which is by default a space. Therefore, the
following two expressions are equivalent:
=over 3
print "@list";
print "" ~ @list.join(@list.separator) ~ "";
=back
=item Hashes: C<"%hash">, C<"%(expression)">
A hash interpolates by joining its pairs on its .separator property,
which by default is a newline. Pairs stringify by joining the key and
value with the hash's .pairsep property, which by default is a space.
Note that hashes are unordered, and so the output will be unordered.
Therefore, the following two expressions are equivalent:
=over 3
print "%hash";
print "" ~
join ( %hash.separator,
map { $_ ~ %hash.pairsep ~ %hash{$_} } %hash.keys )
~ "";
=back
=item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
Subroutines and methods interpolate their return value into the
string; the value is then handled according to its own type.
The same holds for object methods. Note that parens B<are> required during
interpolation so that the parser can disambiguate between object
methods and object members.
=item References C<"$ref">
# Behavior not defined
=item Default Object Stringification C<"$obj">
# Behavior not defined
=item Escaped Characters
# Basically the same as Perl5; also, how are locale semantics handled?
\t tab
\n newline
\r return
\f form feed
\b backspace
\a alarm (bell)
\e escape
\b10 binary char
\o33 octal char
\x1b hex char
\x{263a} wide hex char
\c[ control char
\N{name} named Unicode character
=item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>
Modifiers apply a modification to text which they enclose; they can be
embedded within interpolated strings.
\L{} Lowercase all characters within brackets
\U{} Uppercase all characters within brackets
\Q{} Escape all characters that need escaping
within brackets (except "}")
=back
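The list and hash rules above amount to two nested joins. The following Perl 5 sketch models them under stated assumptions: the three separator variables are hypothetical stand-ins for the draft's .separator and .pairsep properties, and the keys are sorted only so that the output is repeatable (the draft leaves the order undefined).

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the draft's .separator and .pairsep properties.
my $list_separator = ' ';
my $hash_separator = "\n";
my $pair_separator = ' ';

my @list = (1, 2, 3);
my %hash = (apple => 'red', pear => 'green');

# A list interpolates as its elements joined by the list separator.
my $list_text = join $list_separator, @list;

# A hash interpolates as its pairs: key and value joined by the pair
# separator, then the pairs joined by the hash separator.
my $hash_text = join $hash_separator,
    map { $_ . $pair_separator . $hash{$_} } sort keys %hash;

print "$list_text\n";
print "$hash_text\n";
```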
=head3 Stopping Interpolation (\Q)
Within an interpolated string, interpolation of expressions can be
stopped by \Q.
=over 3
Example:
@list = (1,2);
print "@list\Q[0]"; # prints '1 2[0]'
=back
=head3 Embedding non-interpolated constructs: C<\q{}>
Similar to embedding an interpolated string within a non-interpolated
string, it is possible to embed a non-interpolated string within an
interpolated string with \q{}. Any characters within a \q{} construct
are treated as if they were in a non-interpolated string.
=over 3
Example:
"string \q{$variable}" # $variable will not be interpolated
=back
=head3 C<qx()>, backticks (C<``>)
A string which is (possibly) interpolated and then executed as a system
command with /bin/sh or its equivalent. Shell wildcards, pipes, and
redirections will be honored. The collected standard output of the
command is returned; standard error is unaffected. In scalar context,
it comes back as a single (potentially multi-line) string, or undef if
the command failed. In list context, it returns a list of lines split
on the standard input separator, or an empty list if the command
failed.
=head2 Special Quoting
=head3 Here-Docs
A line-oriented form of quoting is based on the shell "here-document"
syntax. Following a << you specify a string to terminate the quoted
material, and all lines following the current line down to the
terminating string are the value of the item. The terminating string
may be either an identifier (a word), or some quoted text. If quoted,
the type of quotes you use determines the treatment of the text, just
as in regular quoting. An unquoted identifier works like double quotes.
The terminating string must appear by itself, and any preceding or
following whitespace on the terminating line is discarded.
=over 3
Examples:
print << EOF;
The price is $Price.
EOF
print << "EOF"; # same as above
The price is $Price.
EOF
print << `EOC`; # execute commands
echo hi there
echo lo there
EOC
print <<"foo", <<"bar"; # you can stack them
I said foo.
foo
I said bar.
bar
myfunc(<< "THIS", 23, <<'THAT');
Here's a line
or two.
THIS
and here's another.
THAT
=back
Don't forget that you have to put a semicolon on the end to finish the
statement, as Perl doesn't know you're not going to try to do this:
=over 3
print <<ABC
179231
ABC
+ 20;
=back
If you want your here-docs to be indented with the rest of the code,
you'll need to remove leading whitespace from each line manually:
=over 3
($quote = <<'FINIS') =~ s/^\s+//gm;
The Road goes ever on and on,
down from the door where it began.
FINIS
=back
If you use a here-doc within a delimited construct, such as in s///eg,
the quoted material must come on the lines following the final
delimiter. So instead of:
=over 3
s/this/<<E . 'that'
the other
E
. 'more '/eg;
=back
you have to write
=over 3
s/this/<<E . 'that'
. 'more '/eg;
the other
E
=back
Also note that with single quoted here-docs, backslashes are not
special, and are taken for a literal backslash, a behavior that is
different from normal single-quoted strings.
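The difference can be seen in Perl 5, which already treats the two forms this way. This is a small sketch with illustrative names; whether Perl 6 keeps exactly this behavior is still open.

```perl
use strict;
use warnings;

# In an ordinary single-quoted string, \\ collapses to one backslash.
my $quoted = 'C:\\temp';

# In a single-quoted here-doc, both backslashes are kept as typed.
my $heredoc = <<'EOT';
C:\\temp
EOT
chomp $heredoc;

print length($quoted), "\n";   # 7
print length($heredoc), "\n";  # 8
```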
=head3 V-Strings
V-Strings are formed when 3 or more digits are joined by decimal points,
with a possible leading v. The resulting item is then treated like
a string, rather than a number.
=over 3
Examples:
$var = v5.8.0; # $var = "5.8.0";
$var = 192.168.0.1; # $var = "192.168.0.1";
=back
=head2 Gory Details of parsing quoted constructs
No string section would be complete without a "Gory details of parsing
quoted constructs"; however, since the current implementation in P6C
doesn't have support for \Q, \Q{}, \L{}, \U{}, \N{name}, or \x{}, the
implementation may have to change. If you really need your blood and
guts, please see P6C/Tree/String.pm for the current string-parsing
semantics.
=cut
String Literals, take 2
-
James Mastros at Dec 2, 2002 at 5:14 pm
Just a few more nits to pick...

On 12/02/2002 6:58 AM, Joseph F. Ryan wrote:
> The q() operator allows strings to be made with
> any non-space, non-letter, non-digit character as the delimiter instead
> of '. In addition, if the starting delimiter is a part of a paired
> set, such as (, [, <, or {, then the closing delimiter may be the
> matching member of the set. In addition, the reverse holds true;
> delimiters which are the tail end of a pair may use the starting item
> as the closing delimiter.

We need to decide if this is a user doc or a developer doc/language
specification. If it's the latter, we need a rigorous definition of what
a pair is.

> There are a few special cases for delimiters; specifically : and #.
> : is not allowed because it might be used by custom-defined quoting
> operators to apply a property; # is allowed, but there cannot be a
> space between the operator and the #. In addition, comments are not
> allowed within # delimited expressions (for obvious reasons).

Are comments ever allowed within q() constructs? If not, ditch the
statement about comments not being allowed in q## constructs.

> =head3 <<>>; expanding a string as a list.
> A set of braces is a special op that evaluates into the list of words

A doubled set of angle brackets (<<text here>>) or a set of double-angle
quotation marks (guillemets, «text here»).

> contained, using whitespace as the delimiter. It is similar to qw()
> from perl5, and can be thought of as roughly equivalent to:

Are we getting rid of qw()? I assumed that we were keeping it as a
longhand form of <<>>/guillemets, just like qq() is the longhand form of "".

> C<< "STRING".split(' ') >>

I'd be more explicit here, and say C<<"STRING".split(/\s+/)>>. (The two
are equivalent, but only because of special-casing; the second is more
explicit.)

> =head2 Interpolating Constructs
> Interpolating constructs are another form of string in which variables
> that are embedded into the string are expanded into their value at
> runtime. Interpolated strings are formed using the double quote:

...using double quotes, as in "string".

> "string". In addition, qq() is a synonym for "", which is similar to
> q() being a synonym for ''.

...similarly to...

> =item Hashes: C<"%hash">, C<"%(expression)">
> Hashes interpolate by joining its pairs on its .separator property,
> which by default is a newline. Pairs stringify by joining the key and
> value with the hash's .pairsep property, which by default is a space.

Have these defaults been defined somewhere? I'd rather see them be ', '
and '=>' by default...

> Note that hashes are unordered, and so the output will be unordered.
> Therefore, the following two expressions are equivalent:

Get rid of the therefore; it seems to refer to the preceding sentence,
which has nothing to do with the example.

> =item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
> Subroutines and Methods will interpolate their return value into the
> string, which will be handled in whichever type the return value is.
> Same for object methods. Note that parens B<are> required during
> interpolation so that the parser can disambiguate between object
> methods and object members.

Has this been vetted? $(...)/etc seem to cover this case, and & being a
qq() metachar makes using qq() strings to print HTML/XML difficult.

> =item Escaped Characters
> # Basically the same as Perl5; also, how are locale semantics handled?
> \t tab
> \n newline
> \r return
> \f form feed
> \b backspace
> \a alarm (bell)
> \e escape

Can we get some rigor here? Also, is \n the same everywhere, or do we
play the same tricks we did with it in p5? (I think it should be the
same everywhere, a CR char, "\cM". Disciplines, or encodings, or
whatever we're calling them, can take care of it on IO.) Oh, and it
might be nice for \0 to be NUL. (This used to be implicit with \0 as
octal, but since \0 isn't octal anymore...)

> \b10 binary char
> \o33 octal char

Numeric Literals, take 3
(http://archive.develooper.com/perl6-documentation@perl.org/msg00462.html),
in the "*** Bin/Hex/Oct shorthands" section, gives 0c123 as the
shorthand form of octal numbers, so it doesn't make much sense for octal
character constants to be \o123. Do we want to change shorthand octal
literal numbers to 0o123 (I don't like this, it's hard to read), change
octal chars to \c123 (can't do this without getting rid of, or changing,
\c for control-character), get rid of octal chars entirely, or
something else? (Barring a good "something else", I vote for killing octal
chars.)

> \x1b hex char

Exactly two digits after the \x? Perl5 attempts to do the right thing
either way, but this can be confusing too -- "\xA" eq chr(0xA), "\xABar"
eq chr(0xAB)."ar", "\xAQux" eq chr(0xA)."Qux".

> \x{263a} wide hex char
> \c[ control char

Rigor? What is \c~? perl5 thinks it's >, should perl6 agree? How
about \c\x{1000} (that's invalid, but you get the point), is that equiv
to \x{ff9c}? What about \cé, (e+acute accent), does that capitalize,
then subtract 64, or just subtract?

> \N{name} named Unicode character

Reference to charnames pragmata, or however we end up defining the exact
semantics of \N. (Since we don't know yet, just put in a FIXME, I suppose.)
Is there any way to give the ordinal in decimal, like "\d192"? (I'm not
sure how useful this would be, but it would be nice parallelism.
OTOH, you can use chr() easily enough.)

> =item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>
> Modifiers apply a modification to text which they enclose; they can be
> embedded within interpolated strings.
> \L{} Lowercase all characters within brackets
> \U{} Uppercase all characters within brackets
> \Q{} Escape all characters that need escaping
> within brackets (except "}")

Rigor: escape all non-alphanumerics.

Do we still have the other modifiers that p5 supports, \l and \u? Do we
want a new titlecase modifier, \T{james mastros} eq "James Mastros",
doing the Right Thing for other languages, where it isn't so simple
(there are complicated cases for this, but IIRC Unicode defines a robust
algo to do this). I'll check on the Unicode stuff if anybody thinks
it's a good idea... I'm uncertain, myself, I never liked the qq()
case-modifiers, so don't use them.

> A string which is (possibly) interpolated and then executed as a system
> command with /bin/sh or its equivalent. Shell wildcards, pipes, and
> redirections will be honored. The collected standard output of the
> command is returned; standard error is unaffected. In scalar context,
> it comes back as a single (potentially multi-line) string, or undef if
> the command failed. In list context, returns a of list of lines split
> on the standard input separator, or an empty list if the command
> failed.

This whole section is very unix-centric, but I'm not certain what to do
about that -- the functionality is very system-specific. Also, I suspect
we're going to want to rewrite it anyway when we hammer out iterators,
files, and context.

> A line-oriented form of quoting is based on the shell "here-document"

s/shell/Unix Bourne shell/

> syntax. Following a << you specify a string to terminate the quoted
> material, and all lines following the current line down to the
> terminating string are the value of the item. The terminating string
> may be either an identifier (a word), or some quoted text. If quoted,
> the type of quotes you use determines the treatment of the text, just
> as in regular quoting. An unquoted identifier works like double quotes.
> The terminating string must appear by itself, and any preceding or
> following whitespace on the terminating line is discarded.

I could have sworn that Larry recently put something out about the edge
cases between << heredoc and << beginning-of-qw. I /think/ he said that
qw("Foo" bar) must be written as << "Foo" bar>>, because otherwise it
would be interpreted as a here-doc ending with Foo with double-quote
interpolation. Can anybody find this, or is Larry watching?

> Also note that with single quoted here-docs, backslashes are not
> special, and are taken for a literal backslash, a behavior that is
> different from normal single-quoted strings.

Are \qq()s still special, even in <<'noninterpolating's? Either way, it
should be explicitly noted.

> V-Strings are formed when 3 or digits are joined by decimal points,
> with a possible leading v. The resulting item is then treated like
> a string, rather than a number.
> =over 3
> Examples:
> $var = v5.8.0; # $var = "5.8.0";
> $var = 192.168.0.1; # $var = "192.168.0.1";
> =back

Note that the v is non-optional for two-character v-strings.
I'd say something like:
V-strings are actually strings that just happen to look like numbers.
Each dot-separated number is transformed into the character with that
Unicode ordinal, and the characters are concatenated together.
(The transformation from normal string to v-string looks like
C<<$vstring='v' ~ join '.', map {ord} split //, $instring>>; the
transformation from v-string to normal string looks like
C<<print join '', map {chr} split /\./, $vstring>>;
(Where vstring cannot begin with a leading 'v', for purposes of
illustration.))
Thus, C<<80.101.114.108.32.54.33 eq 'Perl 6!'>>
Also, your examples are misleading at best. v5.8.0 eq "\x05\x08\x00".
192.168.0.1 eq chr(192)~chr(168)~chr(0)~chr(1).
-=- James Mastros
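The ordinal-based reading of v-strings described in this message matches how Perl 5 already treats them, which the following sketch checks. It illustrates the Perl 5 behavior only and does not say what Perl 6 will finally specify. The variable names are illustrative.

```perl
use strict;
use warnings;

# Each dot-separated number becomes the character with that ordinal.
my $perl6 = v80.101.114.108.32.54.33;

# With two or more dots, Perl 5 allows the leading v to be omitted.
my $address = 192.168.0.1;

print "$perl6\n";                     # Perl 6!

# The %vd format prints the ordinals of a string joined by dots.
print sprintf('%vd', $address), "\n"; # 192.168.0.1
```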
-
Joseph F. Ryan at Dec 2, 2002 at 9:42 pm
James Mastros wrote:
> Just a few more nits to pick...
>
> On 12/02/2002 6:58 AM, Joseph F. Ryan wrote:
>> The q() operator allows strings to be made with
>> any non-space, non-letter, non-digit character as the delimiter instead
>> of '. In addition, if the starting delimiter is a part of a paired
>> set, such as (, [, <, or {, then the closing delimiter may be the
>> matching member of the set. In addition, the reverse holds true;
>> delimiters which are the tail end of a pair may use the starting item
>> as the closing delimiter.
>
> We need to decide if this is a user doc or a developer doc/language
> specification. If it's the latter, we need a rigorous definition of
> what a pair is.

I'm more inclined towards a user doc; a rigorous definition of pairs in
the tests should be good enough for the developers.

>> There are a few special cases for delimiters; specifically : and #.
>> : is not allowed because it might be used by custom-defined quoting
>> operators to apply a property; # is allowed, but there cannot be a
>> space between the operator and the #. In addition, comments are not
>> allowed within # delimited expressions (for obvious reasons).
>
> Are comments ever allowed within q() constructs? If not, ditch the
> statement about comments not being allowed in q## constructs.

You're right, they're not. Whoops.

>> =head3 <<>>; expanding a string as a list.
>> A set of braces is a special op that evaluates into the list of words
>
> A doubled set of angle brackets (<<text here>>) or a set of
> double-angle quotation marks (guillemets, «text here»).
>
>> contained, using whitespace as the delimiter. It is similar to qw()
>> from perl5, and can be thought of as roughly equivalent to:
>
> Are we getting rid of qw()? I assumed that we were keeping it as a
> longhand form of <<>>/guillemets, just like qq() is the longhand form
> of "".
>
>> C<< "STRING".split(' ') >>
>
> I'd be more explicit here, and say C<<"STRING".split(/\s+/)>>. (The
> two are equivalent, but only because of special-casing; the second is
> more explicit.)

Nope, split (' ', $string) is special; it eats up all preceding
whitespace before splitting on the space, while with /\s+/ there
will be an initial empty element. The example is straight from
perl5's perlop anyways :)

> Have these defaults been defined somewhere? I'd rather see them be
> ', ' and '=>' by default...

Well, that's what the RFC suggested, and there didn't seem
to be many complaints about the defaults in the Apoc
(besides the variable names). Like I said, I just winged it :)

>> Note that hashes are unordered, and so the output will be unordered.
>> Therefore, the following two expressions are equivalent:
>
> Get rid of the therefore; it seems to refer to the preceding sentence,
> which has nothing to do with the example.
>
>> =item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
>> Subroutines and Methods will interpolate their return value into the
>> string, which will be handled in whichever type the return value is.
>> Same for object methods. Note that parens B<are> required during
>> interpolation so that the parser can disambiguate between object
>> methods and object members.
>
> Has this been vetted? $(...)/etc seem to cover this case, and & being
> a qq() metachar makes using qq() strings to print HTML/XML difficult.
Well, it was in Apoc 2:
http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 252: interpolation of
subroutines
http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 222: interpolation of
object method calls

>> =item Escaped Characters
>> # Basically the same as Perl5; also, how are locale semantics handled?
>> \t tab
>> \n newline
>> \r return
>> \f form feed
>> \b backspace
>> \a alarm (bell)
>> \e escape
>
> Can we get some rigor here? Also, is \n the same everywhere, or do we
> play the same tricks we did with it in p5? (I think it should be the
> same everywhere, a CR char, "\cM". Disciplines, or encodings, or
> whatever we're calling them, can take care of it on IO.) Oh, and it
> might be nice for \0 to be NUL. (This used to be implicit with \0 as
> octal, but since \0 isn't octal anymore...)

As someone who has had to use NT, Mac OS 9, and Solaris with much
frequency, I can say I very much appreciated the special tricks
that \n did (does).

>> \b10 binary char
>> \o33 octal char
>
> Numeric Literals, take 3
> (http://archive.develooper.com/perl6-documentation@perl.org/msg00462.html),
> in the "*** Bin/Hex/Oct shorthands" section, gives 0c123 as the
> shorthand form of octal numbers, so it doesn't make much sense for
> octal character constants to be \o123. Do we want to change shorthand
> octal literal numbers to 0o123 (I don't like this, it's hard to read),
> change octal chars to \c123 (can't do this without getting rid of, or
> changing, \c for control-character), get rid of octal chars entirely,
> or something else? (Barring a good "something else", I vote for killing
> octal chars.)

This seems to be going back and forth:

    $octal_format = ($octal_format_still_exists) ?
        sprintf("\\%s%d",$octals_current_letter_of_the_week,
                $number) :
        undef;

That should clear things up.

>> \x1b hex char
>
> Exactly two digits after the \x? Perl5 attempts to do the right thing
> either way, but this can be confusing too -- "\xA" eq chr(0xA),
> "\xABar" eq chr(0xAB)."ar", "\xAQux" eq chr(0xA)."Qux".

That was in perl5's perldoc, so I assume it is encouraged.
You brought this up before:
http://archive.develooper.com/perl6-documentation@perl.org/msg00485.html
I still say to stick with perl5's behavior.

>> \x{263a} wide hex char
>> \c[ control char
>
> Rigor? What is \c~? perl5 thinks it's >, should perl6 agree?

I don't see why it shouldn't.

> How about \c\x{1000} (that's invalid, but you get the point), is that
> equiv to \x{ff9c}?

No, it's "\c\" ~ "x{1000}"

> What about \cé, (e+acute accent), does that capitalize, then subtract
> 64, or just subtract?
>
>> \N{name} named Unicode character
>
> Reference to charnames pragmata, or however we end up defining the
> exact semantics of \N. (Since we don't know yet, just put in a FIXME,
> I suppose.)
Just recycle perl5's, I suppose. Not *everything* needs to be redone
from scratch.

> Is there any way to give the ordinal in decimal, like "\d192"? (I'm
> not sure how useful this would be, but it would be nice parallelism.
> OTOH, you can use chr() easily enough.)

That is a good point; if there is a 0dxxxxx, then there should be a
"\dxxxxx".

>> =item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>
>> Modifiers apply a modification to text which they enclose; they can be
>> embedded within interpolated strings.
>> \L{} Lowercase all characters within brackets
>> \U{} Uppercase all characters within brackets
>> \Q{} Escape all characters that need escaping
>> within brackets (except "}")
>
> Rigor: escape all non-alphanumerics.
>
> Do we still have the other modifiers that p5 supports, \l and \u?

That's a good question. There was no reference to them in Apoc;
however, that doesn't mean that they are gone. I haven't a clue,
really.

> Do we want a new titlecase modifier, \T{james mastros} eq "James
> Mastros", doing the Right Thing for other languages, where it isn't so
> simple (there are complicated cases for this, but IIRC Unicode defines
> a robust algo to do this). I'll check on the Unicode stuff if anybody
> thinks it's a good idea... I'm uncertain, myself, I never liked the
> qq() case-modifiers, so don't use them.

There is ucfirst(), which I'm sure could be updated to handle Unicode;
however, I don't know if it is important enough to deserve \T{}. You
might want to ask Larry :)

>> A string which is (possibly) interpolated and then executed as a system
>> command with /bin/sh or its equivalent. Shell wildcards, pipes, and
>> redirections will be honored. The collected standard output of the
>> command is returned; standard error is unaffected. In scalar context,
>> it comes back as a single (potentially multi-line) string, or undef if
>> the command failed. In list context, returns a of list of lines split
>> on the standard input separator, or an empty list if the command
>> failed.
>
> This whole section is very unix-centric, but I'm not certain what to
> do about that -- the functionality is very system-specific. Also, I
> suspect we're going to want to rewrite it anyway when we hammer out
> iterators, files, and context.
Why?

>> A line-oriented form of quoting is based on the shell "here-document"
>
> s/shell/Unix Bourne shell/
>
>> syntax. Following a << you specify a string to terminate the quoted
>> material, and all lines following the current line down to the
>> terminating string are the value of the item. The terminating string
>> may be either an identifier (a word), or some quoted text. If quoted,
>> the type of quotes you use determines the treatment of the text, just
>> as in regular quoting. An unquoted identifier works like double quotes.
>> The terminating string must appear by itself, and any preceding or
>> following whitespace on the terminating line is discarded.
>
> I could have sworn that Larry recently put something out about the edge
> cases between << heredoc and << beginning-of-qw. I /think/ he said
> that qw("Foo" bar) must be written as << "Foo" bar>>, because
> otherwise it would be interpreted as a here-doc ending with Foo with
> double-quote interpolation. Can anybody find this, or is Larry watching?
>
>> Also note that with single quoted here-docs, backslashes are not
>> special, and are taken for a literal backslash, a behavior that is
>> different from normal single-quoted strings.
>
> Are \qq()s still special, even in <<'noninterpolating's? Either way,
> it should be explicitly noted.

As far as I know, *nothing* is special in a single quoted heredoc.

>> V-Strings are formed when 3 or digits are joined by decimal points,
>> with a possible leading v. The resulting item is then treated like
>> a string, rather than a number.
>> =over 3
>> Examples:
>> $var = v5.8.0; # $var = "5.8.0";
>> $var = 192.168.0.1; # $var = "192.168.0.1";
>> =back
>
> Note that the v is non-optional for two-character v-strings.

Good point, because otherwise it's a number. Definitely
needs to be added to the test suite.

> I'd say something like:
> V-strings are actually strings that just happen to look like numbers.
> Each dot-separated number is transformed into the character with that
> Unicode ordinal, and the characters are concatenated together.
> (The transformation from normal string to v-string looks like
> C<<$vstring='v' ~ join '.', map {ord} split //, $instring>>; the
> transformation from v-string to normal string looks like
> C<<print join '', map {chr} split /\./, $vstring>>;
> (Where vstring cannot begin with a leading 'v', for purposes of
> illustration.))
> Thus, C<<80.101.114.108.32.54.33 eq 'Perl 6!'>>
> Also, your examples are misleading at best. v5.8.0 eq "\x05\x08\x00".
> 192.168.0.1 eq chr(192)~chr(168)~chr(0)~chr(1).

You're right, the vstring section should be totally redone.

Thanks for the feedback,
Joseph F. Ryan
ryan.311@osu.edu
-
Michael Lazzaro at Dec 3, 2002 at 7:25 pm
On Monday, December 2, 2002, at 01:42 PM, Joseph F. Ryan wrote:
> James Mastros wrote:
>> We need to decide if this is a user doc or a developer doc/language
>> specification. If it's the latter, we need a rigorous definition of
>> what a pair is.
>
> I'm more inclined towards a user doc; a rigorous definition of pairs in
> the tests should be good enough for the developers.

I think we've been gravitating to a "language reference", geared
primarily towards intermediate/advanced users. Something much more
rigorous than beginners would be comfortable with (since it defines
things in much greater detail than beginners would need) and written to
assume *no* prior knowledge of Perl5. It will be useful to the
developers -- in that it will describe required P6 behaviors in much
greater detail than the Apocalypses and Exegesis -- but it will be
written for users.

The document should be taken to mean "we aren't describing how Perl6 is
implemented or what the guts look like, but the language behaviors
described herein should always be true."

>> Do we want to change shorthand octal literal numbers to 0o123 (I
>> don't like this, it's hard to read), change octal chars to \c123
>> (can't do this without getting rid of, or changing, \c for
>> control-character), get rid of octal chars entirely, or something
>> else? (Barring a good "something else", I vote for killing octal
>> chars.)

As of Larry's last writings, there will definitely be an octal (it
still has good uses), and its syntax will definitely be 0o777 -- with
an 'o', not a 'c'. The 'o' is a little hard to read, but the best
anyone can come up with. It has to be lowercase 'o', not uppercase
'O', which helps *enormously*. :-)

(But since I assume you can use \d, \b, \h anywhere you use \o, you
won't have to use octal at all if you don't want to.)

MikeL
-
James Mastros at Dec 3, 2002 at 11:39 pm ⇧
On 12/03/2002 2:27 PM, Michael Lazzaro wrote:
> I think we've been gravitating to a "language reference", geared
> primarily towards intermediate/advanced users. Something much more
> rigorous than beginners would be comfortable with (since it defines
> things in much greater detail than beginners would need) and written to
> assume *no* prior knowledge of Perl5. It will be useful to the
> developers -- in that it will describe required P6 behaviors in much
> greater detail than the Apocalypses and Exegeses -- but it will be
> written for users.

I quite agree... which still means we need more rigor than this document
has. The definition of a pair and the semantics of \c[ and friends is
important so that users know exactly what "\c~" means ('>',
C<<chr(ord('['-64))>> ), and if C<<qq◄some words here►>> will work (no,
those aren't a matched Pi/Pf or Ps/Pe pair, they're just Misc. Shapes
that have no direction information, and we can't do them reasonably
without looking at every character in Unicode visually -- if somebody
wants to, be my guest!).

>> Do we want to change shorthand octal literal numbers to 0o123 (I
>> don't like this, it's hard to read), change octal chars to \c123
>> (can't do this without getting rid of, or changing, \c for
>> control-character), get rid of octal chars entirely, or something
>> else? (Barring a good "something else", I vote for killing octal chars.)
>
> As of Larry's last writings, there will definitely be an octal (it still
> has good uses), and its syntax will definitely be 0o777 -- with an 'o',
> not a 'c'. The 'o' is a little hard to read, but the best anyone can
> come up with. It has to be lowercase 'o', not uppercase 'O', which
> helps *enormously*. :-)

Huh? In that case, somebody should tell Angel Faus; "Numeric literals,
take 3" says 0c777, and nobody dissented. IIRC, in fact, nobody's
dissented to 0c777 since it was first suggested.

> (But since I assume you can use \d, \b, \h anywhere you use \o, you
> won't have to use octal at all if you don't want to.)

\d is pure speculation on my part. (As is \0 == chr(0).)

In fact, for this, and \o777 vs. whatever, I'm cc-ing perl6-language on
this.
p6l guys and the Design Team, if you haven't been following the
conversation, here's how it goes:
In perl5, octal numbers are specified as 0101 -- with a leading zero,
and octal characters in strings are specified as "\101". In perl6, our
current documentation lists 0c101 as being the new way to write octal
numbers, because it lets people use leading zeros in numbers in an
intuitive way, and 0o101 was decided to be too difficult to read. The
last writing of Larry to address this, as far as I (or anybody else who
I've noticed) knows, says 0o101.
It's generally been agreed on, I think, that 0c101 is the way to go.
Now, we're working on string literals, and the question is how we write
octal character literals. The current writer of the string literal spec
wants "\o101" to be the new way to write what is "\101" in perl5 (and
C). I'd prefer this to be "\c101", to match up with how the current doc
says octal numerics are written. Unfortunately, \c is taken for
control-characters (ie "\c[" eq chr(ord '[' - 64) eq ESC), which is a
more important use of \c.
What do we do, oh great and wonderful design team?
Numeric  String       Upside            Downside
-------  ------       ------            --------
0101     \101         p5/C compatible   Unintuitive
0o101    \o101        Consistent        Hard to read
0c101    \o101        Keeps \c for      Inconsistent
                      control-char
0c101    unsupported  Consistent        Octal string chars
                                        unsupported
0t101    \t101        Consistent        What's tab?
Or something else?
All choices are bad, which one is best?
-=- James Mastros
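
As a point of comparison for the options in the table, the sketch below shows the one pairing that happens to exist in another language: Python 3 writes octal numbers with a 0o prefix and octal characters with a bare backslash. This is only an illustration of the arithmetic (octal 101 is 65, the letter A); it says nothing about what Perl 6 will settle on.

```python
# Python 3 uses the 0o prefix for octal literals
# and \NNN for octal character escapes.
number = 0o101          # octal literal
char = "\101"           # octal character escape

assert number == 65
assert number == int("101", 8)
assert char == "A" == chr(number)
```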
-
Luke Palmer at Dec 4, 2002 at 4:21 pm ⇧
> Date: Tue, 03 Dec 2002 18:39:27 -0500
> From: James Mastros <james@mastros.biz>
>
> Huh? In that case, somebody should tell Angel Faus; "Numeric literals,
> take 3" says 0c777, and nobody dissented. IIRC, in fact, nobody's
> dissented to 0c777 since it was first suggested.

Well, except Larry. I remember him saying initially that it should be
0o777, not just in the most recent one. I'm not much of a thread
scavenger, so I can't point you to the message.

>> (But since I assume you can use \d, \b, \h anywhere you use \o, you
>> won't have to use octal at all if you don't want to.)
>
> \d is pure speculation on my part. (As is \0 == chr(0).)
>
> p6l guys and the Design Team, if you haven't been following the
> conversation, here's how it goes:
> In perl5, octal numbers are specified as 0101 -- with a leading zero,
> and octal characters in strings are specified as "\101". In perl6, our
> current documentation lists 0c101 as being the new way to write octal
> numbers, because it lets people use leading zeros in numbers in an
> intuitive way, and 0o101 was decided to be too difficult to read. The
> last writing of Larry to address this, as far as I (or anybody else who
> I've noticed) knows, says 0o101.
> It's generally been agreed on, I think, that 0c101 is the way to go.

I get a different impression. I think it's generally a
non-controversial topic, and nobody really cares either way... aside
from you, perhaps.

> Now, we're working on string literals, and the question is how we write
> octal character literals. The current writer of the string literal spec
> wants "\o101" to be the new way to write what is "\101" in perl5 (and
> C). I'd prefer this to be "\c101", to match up with how the current doc
> says octal numerics are written. Unfortunately, \c is taken for
> control-characters (ie "\c[" eq chr(ord '[' - 64) eq ESC), which is a
> more important use of \c.
> What do we do, oh great and wonderful design team?
>
> Numeric  String       Upside            Downside
> -------  ------       ------            --------
> 0101     \101         p5/C compatible   Unintuitive
> 0o101    \o101        Consistent        Hard to read

Not that I'm "great and wonderful design team," but this one is my
favorite. I don't think 0o101 is terribly hard to read, and "o"
stands for "octal" a lot better than "c" does.

That comes back in reading, too. Once people figure out that's the
letter "o", and not a miniature zero, it will be perfectly clear what
is meant. That's not true of "c".
Luke
-
Larry Wall at Dec 4, 2002 at 7:47 pm ⇧
On Mon, Dec 02, 2002 at 04:42:52PM -0500, Joseph F. Ryan wrote:
: >Has this been vetted? $(...)/etc seem to cover this case, and & being
: >a qq() metachar makes using qq() strings to print HTML/XML difficult.
:
:
: Well, it was in Apoc 2:
: http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 252: interpolation of
: subroutines
: http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 222: interpolation of
: object method calls
This is why the parens are required on sub interpolations.
HTML/XML entities don't have parens. The parens are required on
method interpolations because it's too easy to get an accidental
"." after a variable.
: >>=item Escaped Characters
: >># Basically the same as Perl5; also, how are locale semantics handled?
: >>
: >> \t tab
: >> \n newline
: >> \r return
: >> \f form feed
: >> \b backspace
: >> \a alarm (bell)
: >> \e escape
: >
: >Can we get some rigor here? Also, is \n the same everywhere, or do we
: >play the same tricks we did with it in p5? (I think it should be the
: >same everywhere, a CR char, "\cM". Disciplines, or encodings, or
: >whatever we're calling them, can take care of it on IO.) Oh, and it
: >might be nice for \0 to be NUL. (This used to be implicit with \0 as
: >octal, but since \0 isn't octal anymore...)
:
:
: As someone who has had to use NT, Mac OS 9, and Solaris with much
: frequency, I can say I very much appreciated the special tricks
: that \n did (does).
In regexen, \n matches any known newline sequence. In a string, it interpolates
whatever is the native newline.
: >> \b10 binary char
Can't easily have this and backspace \b. But \b is already a mess from
meaning word boundary in regexen. I'm inclined to throw out \b meaning
backspace. It doesn't really work well in a Unicode world anyway. If
you really mean it you can always specify a control-H.
: >> \o33 octal char
: >
: >Numeric Literals, take 3
: >(http://archive.develooper.com/perl6-documentation@perl.org/msg00462.html),
: >in the "*** Bin/Hex/Oct shorthands" section, gives 0c123 as the shorthand
: >form of octal numbers, so it doesn't make much sense for octal character
: >constants to be \o123. Do we want to change shorthand octal literal
: >numbers to 0o123 (I don't like this, it's hard to read), change octal
: >chars to \c123 (can't do this without getting rid of, or changing, \c for
: >control-character), get rid of octal chars entirely, or something else?
: >(Barring a good "something else", I vote for killing octal chars.)
:
:
: This seems to be going back and forth:
:
: $octal_format = ($octal_format_still_exists) ?
: sprintf("\\%s%d",$octals_current_letter_of_the_week,
: $number) :
: undef;
:
: That should clear things up.
:
: >> \x1b hex char
: >
: >Exactly two digits after the \x? Perl5 attempts to do the right thing
: >either way, but this can be confusing too -- "\xA" eq chr(0xA),
: >"\xABar" eq chr(0xAB)."ar", "\xAQux" eq chr(0xA)."Qux".
:
:
: That was in perl5's perldoc, so I assume it is encouraged.
:
: You brought this up before:
: http://archive.develooper.com/perl6-documentation@perl.org/msg00485.html
:
: I still say to stick with perl5's behavior.
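
The Perl 5 behaviour being debated here (take one or two hex digits after \x, as many as are available) can be sketched as follows. This is a minimal illustration in Python, not Perl; the regular expression and the helper name `expand_hex_escapes` are assumptions made for the example, and it ignores the braced wide form entirely.

```python
import re

# Assumption: \x is followed by one or two hex digits, matched greedily,
# which is the Perl 5 behaviour described in the quoted examples.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})")

def expand_hex_escapes(text: str) -> str:
    """Replace each backslash-x sequence with the character it names."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# The three cases from the quoted message.
assert expand_hex_escapes(r"\xA") == chr(0xA)
assert expand_hex_escapes(r"\xABar") == chr(0xAB) + "ar"
assert expand_hex_escapes(r"\xAQux") == chr(0xA) + "Qux"
```

The second case is the confusing one: "B" is a valid hex digit, so it is swallowed by the escape rather than starting the word "Bar".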
:
: >> \x{263a} wide hex char
May switch all of these to use square brackets instead of curlies:
\x[263a] wide hex char
: >> \c[ control char
\c is no longer control char. \c means what \N used to mean.
(\N now means "not a newline".)
To specify a control-H, say \c[^H].
: >Rigor? What is \c~? perl5 thinks it's >, should perl6 agree?
:
:
: I don't see why it shouldn't.
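
The arithmetic behind the control-character cases mentioned so far can be sketched like this, in Python for neutrality. The helper name `control_char` is hypothetical, and the sketch deliberately skips the uppercasing step that Perl 5 applies first; it only shows the "flip the 64 bit" part.

```python
def control_char(ch: str) -> str:
    """Flip the 64 bit (0x40) of a character's code point."""
    return chr(ord(ch) ^ 64)

assert control_char("H") == "\x08"      # control-H, i.e. backspace
assert control_char("[") == "\x1b"      # ESC
assert control_char("~") == ">"         # the "\c~" case asked about above
```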
:
: >How about \c\x{1000} (that's invalid, but you get the point), is that
: >equiv to \x{ff9c}?
:
:
: No, its "\c\" ~ "x{1000}"
:
: >What about \cé, (e+acute accent), does that capitalize, then subtract
: >64, or just subtract?
\c[^é] would be é with its 64-bit flipped.
: >> \N{name} named Unicode character
No, that's now \c[name]. \N means "not a newline". Note that
\C[name] means "not a \c[name]".
: Just recycle perl5's, I suppose. Not *everything* needs to be redone
: from scratch.
True, but everything is being reevaluated from scratch. Nothing gets a
free ride just because it's in Perl 5.
: >Is there any way to give the ordinal in decimal, like "\d192"? (I'm
: >not sure how useful this would be, but it would be nice parallelism.
: >OTOH, you can use chr() easily enough.)
:
:
: That is a good point; if there is a 0dxxxxx, then there should be a
: "\dxxxxx".
Can't, if \d still means digit. But maybe \x[1234] is shorthand for
\c[0x1234]. In which case, you can always say \c[0d4321].
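
To make the "same character, different radix" idea concrete, here is a small sketch in Python. The helper `char_from_literal` and its prefix table are hypothetical; the 0x/0d/0o/0b prefixes are taken from the thread, not from any finalized Perl 6 syntax.

```python
# Hypothetical helper: the radix prefixes follow the thread's proposals.
RADIX = {"0x": 16, "0d": 10, "0o": 8, "0b": 2}

def char_from_literal(literal: str) -> str:
    """Return the character whose code point is written with a radix prefix."""
    prefix, digits = literal[:2].lower(), literal[2:]
    return chr(int(digits, RADIX[prefix]))

assert char_from_literal("0x263a") == "\u263a"
assert char_from_literal("0d9786") == "\u263a"   # same code point in decimal
assert char_from_literal("0o101") == "A"
```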
: >>=item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>
: >>
: >>Modifiers apply a modification to text which they enclose; they can be
: >>embedded within interpolated strings.
: >>
: >> \L{} Lowercase all characters within brackets
: >> \U{} Uppercase all characters within brackets
: >> \Q{} Escape all characters that need escaping
: >> within brackets (except "}")
Square brackets preferred these days--looks less like a closure.
: >Rigor: escape all non-alphanumerics.
: >Do we still have the other modifiers that p5 supports, \l and \u?
Yes, unless we want to roll over and allow \uXXXX for unicode, just to
be compatible with the rest of the world.
: >Do we want a new titlecase modifier, \T{james mastros} eq "James
: >Mastros", doing the Right Thing for other languages, where it isn't so
: >simple (there are complicated cases for this, but IIRC Unicode defines
: >a robust algo to do this). I'll check on the Unicode stuff if anybody
: >thinks it's a good idea... I'm uncertain, myself, I never liked the
: >qq() case-modifiers, so don't use them.
:
:
: There is ucfirst(), which I'm sure could be updated to handle Unicode;
: however, I don't know if it is important enough to deserve \T{}. You
: might want to ask Larry :)
\u does title-case already in Perl 5. \U[] will do uppercase.
So \u\U[$foo] would titlecase the first letter and uppercase the rest.
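
The distinction between titlecase and uppercase only shows up for a few characters, so a sketch may help. This is Python, not Perl, and the function name `title_first_upper_rest` is made up for the example; it mirrors the "titlecase the first letter, uppercase the rest" combination described above.

```python
def title_first_upper_rest(text: str) -> str:
    """Titlecase the first character and uppercase the remainder."""
    return text[:1].title() + text[1:].upper()

# For plain ASCII the result is the same as uppercasing everything.
assert title_first_upper_rest("foo") == "FOO"

# U+01C6 (the dz-with-caron digraph) has distinct title and upper forms.
assert title_first_upper_rest("\u01c6abc") == "\u01c5ABC"
assert "\u01c6".upper() == "\u01c4"
```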
: >>A line-oriented form of quoting is based on the shell "here-document"
: >
: >s/shell/Unix Bourne shell/
: >
: >>syntax. Following a << you specify a string to terminate the quoted
: >>material, and all lines following the current line down to the
: >>terminating string are the value of the item. The terminating string
: >>may be either an identifier (a word), or some quoted text. If quoted,
: >>the type of quotes you use determines the treatment of the text, just
: >>as in regular quoting. An unquoted identifier works like double quotes.
: >>The terminating string must appear by itself, and any preceding or
: >>following whitespace on the terminating line is discarded.
: >
: >I could have sworn that Larry recently put somthing out about the edge
: >cases between << heredoc and << beginning-of-qw. I /think/ he said
: >that qw("Foo" bar) must be written as << "Foo" bar>>, because
: >otherwise it would be interpreted as a here-doc ending with Foo with
: >double-quote interpolation. Can anybody find this, or is Larry watching?
Here docs require quotes, so <<EOF is the beginning of a qw//. (This week.)
: >>Also note that with single quoted here-docs, backslashes are not
: >>special, and are taken for a literal backslash, a behaivor that is
: >>different from normal single-quoted strings.
: >
: >Are \qq()s still special, even in <<'noninterpolating's? Either way,
: >it should be explicitly noted.
:
:
: As far as I know, *nothing* is special in a single quoted heredoc.
Here docs is where you *most* want the \qq[] ability. It is assumed that
the sequence "\qq[" will not occur by accident very often in the typical
single-quoted string.
Larry
-
Michael Lazzaro at Dec 4, 2002 at 8:57 pm ⇧
On Wednesday, December 4, 2002, at 11:47 AM, Larry Wall wrote:
<stuff>
This is great stuff, and I think it solves everything we were talking
about. Joseph, can you edit your doc to match all this? (If not, just
lemme know and I can help.)
If anyone can think of any more issues w/ strings and heredocs, plz
speak up.
MikeL
-
Brad Hughes at Dec 4, 2002 at 10:51 pm ⇧
Larry Wall wrote:
> On Mon, Dec 02, 2002 at 04:42:52PM -0500, Joseph F. Ryan wrote: [...]
> : As far as I know, *nothing* is special in a single quoted heredoc.
>
> Here docs is where you *most* want the \qq[] ability. It is assumed that
> the sequence "\qq[" will not occur by accident very often in the typical
> single-quoted string.

For this we VMS Perlers offer many thanks...

brad
-
Luke Palmer at Dec 2, 2002 at 9:36 pm ⇧
> Date: Mon, 02 Dec 2002 06:58:12 -0500
> From: "Joseph F. Ryan" <ryan.311@osu.edu>
>
> =pod
> =head1 Strings
> 'The quick brown $animal'
> "The quick brown $animal"

This will not format correctly in POD. Either indent or put it in a
list.

> =head2 Non-Interpolating Constructs
> Non-Interpolating constructs are strings in which expressions do not
> interpolate, or expand. The one exception to this is that the
             ^
s/,//

> backslash character, \, will always escape the character that
> immediately follows the it.
                      ^^^^
s/the //

Except in single-quoted heredocs. Something about that doesn't seem
right. I, personally, want single quotes and q[] to never use \
specially.

> The base form for a non-interpolating string is the single-quoted
> string: 'string'. However, non-interpolating strings can also be formed
> with the q() operator. The q() operator allows strings to be made with
> any non-space, non-letter, non-digit character as the delimeter instead
> of '. In addition, if the starting delimeter is a part of a paired
> set, such as (, [, <, or {, then the closing delimeter may be the
> matching member of the set. In addition, the reverse holds true;
> delimeters which are the tail end of a pair may use the starting item
> as the closing delimeter.

Perhaps it's best not to use q(), since () are not valid delimiters
anymore (see A4, I think).

> =over 3
> Examples:
> $string = 'string' # $string = 'string'
> $string = q|string| # $string = 'string'
> $string = q(string) # $string = 'string'
             ^      ^
Yoink.

> $string = q]string[ # $string = 'string'
> =back
> There are a few special cases for delimeters; specifically : and #.
> : is not allowed because it might be used by custom-defined quoting

s/is/are/; s/it/they/

> operators to apply a property; # is allowed, but there cannot be a
> space between the operator and the #. In addition, comments are not
> allowed within # delimeted expressions (for obvious reasons).

Yep. That's why () are not allowed, as they could mean an argument to
a modifier.

> =head3 Embedding Interpolated Strings
> It is also possible to embed an interpolating string within a non-
> interpolating string by the use of the \qq{} construct. A string
> inside a \qq{} constructs acts exactly as if it were an interpolated
> string. Note that any end-brackets, "}", must be escaped within the
> the \qq{} construct so that the parser can read it correctly.

I don't remember this from anywhere. Where was this discussed?

> =over 3
> Examples:
> @array = <one two three>; # @array = ('one', 'two', 'three');
> @array = <one <\> three>; # @array = ('one', '<>', 'three');

Naturally, you mean:

@array = <<one two three>>; # ...
...

> =head3 Interpolation Rules
> =over 3
> =item Scalars: C<"$scalar">, C<"$(expression)">
> Non-Reference scalars will simply interpolate as their value. $()
> forces its expression into scalar context, which is then handled as
> either a scalar or a reference, depending on how expression evaluates.
> =item Lists: C<"@list">, C<"@(expression)">
> Arrays and lists are interpolated by joining their list elements by the
> list's separator property, which is by default a space. Therefore, the
> following two expressions are equivalent:

s/separator <sp> property/.separator attribute/

> =over 3
> print "@list";
> print "" ~ @list.join(@list.separator) ~ "";
> =back
> =item Hashes: C<"%hash">, C<"%(expression)">
> Hashes interpolate by joining its pairs on its .separator property,
> which by default is a newline. Pairs stringify by joining the key and
> value with the hash's .pairsep property, which by default is a space.
> Note that hashes are unordered, and so the output will be unordered.
> Therefore, the following two expressions are equivalant:

Again, s:e/property/attribute/

> =item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
> Subroutines and Methods will interpolate their return value into the
> string, which will be handled in whichever type the return value is.
> Same for object methods. Note that parens B<are> required during
> interpolation so that the parser can disambiguate between object
> methods and object members.

Object methods I<are> object members. All attributes are private, but
accessors are auto-generated for you (if you don't say otherwise). So
I don't think parens should be required.

> =head3 Embedding non-interpolated constructs: C<\q{}>
> Similar to embedding an interpolated string within a non-interpolated
> string, it is possible to embed a non-interpolated string within a
> interpolated string with \q{}. Any characters within a \q{} construct
> are treated as if they were in an non-interpolated string.

And this is waaay down here away from \qq{}... why?

> =head2 Special Quoting
> =head3 Here-Docs
> A line-oriented form of quoting is based on the shell "here-document"
> syntax. Following a << you specify a string to terminate the quoted
> material, and all lines following the current line down to the
> terminating string are the value of the item. The terminating string
> may be either an identifier (a word), or some quoted text. If quoted,
> the type of quotes you use determines the treatment of the text, just
> as in regular quoting. An unquoted identifier works like double quotes.
> The terminating string must appear by itself, and any preceding or
> following whitespace on the terminating line is discarded.
> =over 3
> Examples:
> print << EOF;
> The price is $Price.
> EOF
> print << "EOF"; # same as above
> The price is $Price.
> EOF
> print << "EOF"; # same as above
> The price is $Price.
> EOF
> print << `EOC`; # execute commands
> echo hi there
> echo lo there
> EOC
> print <<"foo", <<"bar"; # you can stack them
> I said foo.
> foo
> I said bar.
> bar
> myfunc(<< "THIS", 23, <<'THAT');
> Here's a line
> or two.
> THIS
> and here's another.
> THAT

You didn't mention that <<'THAT' doesn't interpolate.

> If you use a here-doc within a delimited construct, such as in s///eg,

Ummm, s:e//$()/

And that's interesting, as the rule might not still hold.

> the quoted material must come on the lines following the final
> delimiter. So instead of:
> =over 3
> s/this/<<E . 'that'
> the other
> E
> . 'more '/eg;
> =back
> you have to write
> =over 3
> s/this/<<E . 'that'
> . 'more '/eg;
> the other
> E
> =back
> Also note that with single quoted here-docs, backslashes are not
> special, and are taken for a literal backslash, a behaivor that is
> different from normal single-quoted strings.

Yes. Shoulda mentioned that a long time ago, IMO.

Nice work.
Luke
-
Joseph F. Ryan at Dec 2, 2002 at 10:11 pm ⇧
Luke Palmer wrote:
>> =head3 Embedding Interpolated Strings
>> It is also possible to embed an interpolating string within a non-
>> interpolating string by the use of the \qq{} construct. A string
>> inside a \qq{} constructs acts exactly as if it were an interpolated
>> string. Note that any end-brackets, "}", must be escaped within the
>> the \qq{} construct so that the parser can read it correctly.
>
> I don't remember this from anywhere. Where was this discussed?

http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 226: selective
interpolation in single quotish context.

> Object methods I<are> object members. All attributes are private, but
> accessors are auto-generated for you (if you don't say otherwise). So
> I don't think parens should be required.

Larry seems to disagree:
http://www.perl.com/pub/a/2001/05/03/wall.html#rfc 252: interpolation of
subroutines

>> =head3 Embedding non-interpolated constructs: C<\q{}>
>> Similar to embedding an interpolated string within a non-interpolated
>> string, it is possible to embed a non-interpolated string within a
>> interpolated string with \q{}. Any characters within a \q{} construct
>> are treated as if they were in an non-interpolated string.
>
> And this is waaay down here away from \qq{}... why?

A sort of segregationist measure; I tried to keep single quote
behaviors confined to a single quote section, and double quote
behaviors to a double quote section. Perhaps some sort of
reorganization is in order?

>> =head2 Special Quoting
>> =head3 Here-Docs
>> A line-oriented form of quoting is based on the shell "here-document"
>> syntax. Following a << you specify a string to terminate the quoted
>> material, and all lines following the current line down to the
>> terminating string are the value of the item. The terminating string
>> may be either an identifier (a word), or some quoted text. If quoted,
>> the type of quotes you use determines the treatment of the text, just
>> as in regular quoting. An unquoted identifier works like double quotes.
>> The terminating string must appear by itself, and any preceding or
>> following whitespace on the terminating line is discarded.
>> =over 3
>> Examples:
>> print << EOF;
>> The price is $Price.
>> EOF
>> print << "EOF"; # same as above
>> The price is $Price.
>> EOF
>> print << "EOF"; # same as above
>> The price is $Price.
>> EOF
>> print << `EOC`; # execute commands
>> echo hi there
>> echo lo there
>> EOC
>> print <<"foo", <<"bar"; # you can stack them
>> I said foo.
>> foo
>> I said bar.
>> bar
>> myfunc(<< "THIS", 23, <<'THAT');
>> Here's a line
>> or two.
>> THIS
>> and here's another.
>> THAT
>
> You didn't mention that <<'THAT' doesn't interpolate.

Very true.

>> If you use a here-doc within a delimited construct, such as in s///eg,
>
> Ummm, s:e//$()/

Silly me :)

> And that's interesting, as the rule might not still hold.

Yeah, yeah, yeah.

>> the quoted material must come on the lines following the final
>> delimiter. So instead of:
>> =over 3
>> s/this/<<E . 'that'
>> the other
>> E
>> . 'more '/eg;
>> =back
>> you have to write
>> =over 3
>> s/this/<<E . 'that'
>> . 'more '/eg;
>> the other
>> E
>> =back
>> Also note that with single quoted here-docs, backslashes are not
>> special, and are taken for a literal backslash, a behaivor that is
>> different from normal single-quoted strings.
>
> Yes. Shoulda mentioned that a long time ago, IMO.

Thanks for responding,
Joseph F. Ryan
ryan.311@osu.edu
-
Andrew Wilson at Dec 4, 2002 at 12:26 am ⇧
On Mon, Dec 02, 2002 at 02:36:52PM -0700, Luke Palmer wrote:
>> There are a few special cases for delimeters; specifically : and #.
>> : is not allowed because it might be used by custom-defined quoting
>
> s/is/are/; s/it/they/
>
>> operators to apply a property; # is allowed, but there cannot be a
>> space between the operator and the #. In addition, comments are not
>> allowed within # delimeted expressions (for obvious reasons).

No, that should definitely be is and it. Although
s/delimeters/delimiters/

andrew
--
Libra: (Sept. 23 - Oct. 23)
You have always rejected the doctrine of reincarnation as superstitious
nonsense, which comes as a great relief to Hindu couples expecting
children early next month.
-
Andrew Wilson at Dec 4, 2002 at 2:55 am ⇧
This was true of perl 5 which only interpolated variables. However,On Mon, Dec 02, 2002 at 06:58:12AM -0500, Joseph F. Ryan wrote:
A string is formed when text is enclosed by a quoting operator.
There are two types of quoting operators: interpolating and
non-interpolating. In interpolating constructs, the value of a
variable is substituted for the variable name within the string
and certain characters have special meaning when preceded by a
backslash (C<\>). In non-interpolating constructs, a variable
name that appears within the string is used as-is. The simplest
examples of these two types of quoting operators are strings
delimited by double (interpolating) and single quotes
(non-interpolating). For example:
perl 6 will interpolate any expression if you put it in $() for scalar
context or @() for list context (you mention this later). It's
misleading to talk about interpolation in terms of variables, lots of
things interpolate. I've reworked that paragraph above, I think
something like this is more appropriate.
A literal string is formed when text is enclosed by a quoting
operator, there are two types: interpolating and non-interpolating.
Interpolating constructs insert (interpolate) the value of an
expression into the string in place of themselves. The simplest
examples of the two types of quoting operators are strings delimited
by double (interpolating) and single (non-interpolating) quotes.
Certain characters, known as meta characters, have special
meaning within a literal string. The most basic of these is the
backslash (C<\>), it is special in both interpolated and
non-interpolated strings. The backslash makes ordinary characters
special and special characters ordinary. Non-interpolated strings
only have two meta characters, the backslash itself and the character
that is being used as the delimiter. Interpolated strings have many
more meta characters, see the section on Escaped characters below.
The most basic expression that may be interpolated is a scalar
variable. In non-interpolating constructs, a variable name that
appears within the string is used as-is. For example:'The quick brown $animal'perl will take each character in the first string literally and
"The quick brown $animal"
In the first string, perl will take each character literally and
perform no special processing. In the second string, the value
of the variable $animal is inserted within the string at that
location. If $animal had had the value "fox", then the second
string would have become "The quick brown fox".
perform no special processing. However, the value of the variable
$animal is inserted into the second string string in place of the text
$animal. If $animal had had the value "fox", then the second string
would have become "The quick brown fox".More on the various quoting operators below.Are you sure about this? In perl 5 a \ is a literal \ unless it precedes
=head2 Non-Interpolating Constructs
Non-Interpolating constructs are strings in which expressions do not
interpolate, or expand. The one exception to this is that the
backslash character, \, will always escape the character that
immediately follows the it.
the string delimiter or another \. Larry said (Apoc 2) this wasn't
changing with the exception of adding \qq{} to allow inserting
interpolating constructs into non-interpolating constructs.The base form for a non-interpolating string is the single-quotedDo these nest arbitrarily?
string: 'string'. However, non-interpolating strings can also be formed
with the q() operator. The q() operator allows strings to be made with
any non-space, non-letter, non-digit character as the delimeter instead
of '. In addition, if the starting delimeter is a part of a paired
s/instead of '//
set, such as (, [, <, or {, then the closing delimeter may be the
matching member of the set. In addition, the reverse holds true;
delimeters which are the tail end of a pair may use the starting item
as the closing delimeter.
=over 3
Examples:
$string = 'string' # $string = 'string'
$string = q|string| # $string = 'string'
$string = q(string) # $string = 'string'
$string = q]string[ # $string = 'string'
=back
There are a few special cases for delimeters; specifically : and #.
: is not allowed because it might be used by custom-defined quoting
operators to apply a property; # is allowed, but there cannot be a
space between the operator and the #. In addition, comments are not
allowed within # delimeted expressions (for obvious reasons).
=head3 Embedding Interpolated Strings
It is also possible to embed an interpolating string within a non-
interpolating string by the use of the \qq{} construct. A string
inside a \qq{} constructs acts exactly as if it were an interpolated
string. Note that any end-brackets, "}", must be escaped within the
the \qq{} construct so that the parser can read it correctly.
q{my string \qq{interpolate $this \q{but not $this} or am $I} Just asking for trouble?}=over 3This needs more explanation.
Examples ( assuming C<< $var="two" >> ):
$string = 'one \qq{$var} two' # $string = 'one two three'
$string = 'one\qq{ {$var\} }two' # $string = 'one {two} three'
=back
=head3 <<>>; expanding a string as a list.
A set of braces is a special op that evaluates into the list of words
contained, using whitespace as the delimeter. It is similar to qw()
from perl5, and can be thought of as roughly equivalent to:
C<< "STRING".split(' ') >>
=over 3
Examples:
@array = <one two three>; # @array = ('one', 'two', 'three');
@array = <one <\> three>; # @array = ('one', '<>', 'three');
=back
=head2 Interpolating Constructs
Interpolating constructs are another form of string in which variables
that are embedded into the string are expanded into their value at
runtime. Interpolated strings are formed using the double quote:
"string". In addition, qq() is a synonym for "", which is similar to
q() being a synonym for ''. The rules for interpolation are as
follows:
Again this shouldn't say variables, it should say expressions.
=head3 Interpolation Rules
=over 3
=item Scalars: C<"$scalar">, C<"$(expression)">
Non-Reference scalars will simply interpolate as their value. $()
forces its expression into scalar context, which is then handled as
either a scalar or a reference, depending on how expression evaluates.
=item Lists: C<"@list">, C<"@(expression)">
Arrays and lists are interpolated by joining their list elements by the
list's separator property, which is by default a space. Therefore, the
following two expressions are equivalent:
=over 3
print "@list";
print "" ~ @list.join(@list.separator) ~ "";
=back
=item Hashes: C<"%hash">, C<"%(expression)">
Hashes interpolate by joining its pairs on its .separator property,
which by default is a newline. Pairs stringify by joining the key and
value with the hash's .pairsep property, which by default is a space.
Note that hashes are unordered, and so the output will be unordered.
Therefore, the following two expressions are equivalent:
=over 3
print "%hash";
print "" ~
join ( %hash.separator,
map { $_ ~ %hash.pairsep ~ %hash{$_} } %hash.keys
~ "";
=back
=item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
Subroutines and methods interpolate their return value into the
string; the value is then handled according to its type, following the
rules above. Note that parens B<are> required during interpolation so
that the parser can disambiguate between object methods and object
members.
=item References: C<"$ref">
# Behavior not defined
=item Default Object Stringification: C<"$obj">
# Behavior not defined
=item Escaped Characters
# Basically the same as Perl5; also, how are locale semantics handled?
\t tab
\n newline
\r return
\f form feed
\b backspace
\a alarm (bell)
\e escape
\b10 binary char
\o33 octal char
\x1b hex char
\x{263a} wide hex char
\c[ control char
\N{name} named Unicode character
=item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>
Modifiers apply a modification to text which they enclose; they can be
embedded within interpolated strings.
\L{} Lowercase all characters within the braces
\U{} Uppercase all characters within the braces
\Q{} Escape all characters that need escaping
within the braces (except "}")
=back
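The list and hash rules above can be modelled outside Perl. The following is only a sketch in Python, not Perl 6 code: the function names interpolate_list and interpolate_hash are hypothetical, and the default arguments simply mirror the .separator and .pairsep defaults described above.

```python
def interpolate_list(items, separator=" "):
    """Join list elements on the separator, as "@list" is described to do."""
    return separator.join(str(item) for item in items)


def interpolate_hash(pairs, separator="\n", pairsep=" "):
    """Join each key and value on pairsep, then join the pairs on separator.

    Python dicts keep insertion order; the draft says hash output order
    is not defined, so do not rely on the order shown here.
    """
    return separator.join(
        f"{key}{pairsep}{value}" for key, value in pairs.items()
    )


if __name__ == "__main__":
    print(interpolate_list(["one", "two", "three"]))
    print(interpolate_hash({"a": 1, "b": 2}))
```

Passing a different separator or pairsep to these functions corresponds to changing the properties on the list or hash before interpolating it.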
=head3 Stopping Interpolation (\Q)
Within an interpolated string, interpolation of expressions can be
stopped by \Q.
Within an interpolated string, perl will always try to take the
longest possible expression to interpolate. For instance,
C<"@list[0]"> will interpolate element C<0> of the array C<@list>. If
you want perl to include the array C<@list> followed by the string
C<"[0]">, then you need to use the null string (specified by C<\Q>):
=over 3
Example:
@list = (1,2);
print "@list\Q[0]"; # prints '1 2[0]'
=back
=head3 Embedding non-interpolated constructs: C<\q{}>
Similar to embedding an interpolated string within a non-interpolated
string, it is possible to embed a non-interpolated string within an
interpolated string with \q{}. Any characters within a \q{} construct
are treated as if they were in a non-interpolated string.
I would lose everything up to the first comma and change the "with" to
"using":
It is possible to embed a non-interpolated string within an
interpolated string using \q{}. Any characters within the \q{}
construct are treated as if they were in a non-interpolated string.
Are these examples using . for string concatenation? If they are that
=over 3
Example:
"string \q{$variable}" # $variable will not be interpolated
=back
=head3 C<qx()>, backticks (C<``>)
A string which is (possibly) interpolated and then executed as a system
command with /bin/sh or its equivalent. Shell wildcards, pipes, and
redirections will be honored. The collected standard output of the
command is returned; standard error is unaffected. In scalar context,
it comes back as a single (potentially multi-line) string, or undef if
the command failed. In list context, it returns a list of lines split
on the standard input separator, or an empty list if the command
failed.
=head2 Special Quoting
=head3 Here-Docs
A line-oriented form of quoting is based on the shell "here-document"
syntax. Following a << you specify a string to terminate the quoted
material, and all lines following the current line down to the
terminating string are the value of the item. The terminating string
may be either an identifier (a word), or some quoted text. If quoted,
the type of quotes you use determines the treatment of the text, just
as in regular quoting. An unquoted identifier works like double quotes.
The terminating string must appear by itself, and any preceding or
following whitespace on the terminating line is discarded.
=over 3
Examples:
print << EOF;
The price is $Price.
EOF
print << "EOF"; # same as above
The price is $Price.
EOF
print << "EOF"; # same as above
The price is $Price.
EOF
print << `EOC`; # execute commands
echo hi there
echo lo there
EOC
print <<"foo", <<"bar"; # you can stack them
I said foo.
foo
I said bar.
bar
myfunc(<< "THIS", 23, <<'THAT');
Here's a line
or two.
THIS
and here's another.
THAT
=back
Don't forget that you have to put a semicolon on the end to finish the
statement, as Perl doesn't know you're not going to try to do this:
=over 3
print <<ABC
179231
ABC
+ 20;
=back
If you want your here-docs to be indented with the rest of the code,
you'll need to remove leading whitespace from each line manually:
=over 3
($quote = <<'FINIS') =~ s/^\s+//gm;
The Road goes ever on and on,
down from the door where it began.
FINIS
=back
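To make the effect of that substitution concrete, here is a rough Python equivalent of s/^\s+//gm. It is a sketch, not Perl: the helper name strip_leading_whitespace is hypothetical, and re.MULTILINE stands in for the /m modifier.

```python
import re


def strip_leading_whitespace(text):
    """Remove the whitespace at the start of every line in text."""
    # re.MULTILINE lets ^ match at the start of each line, like Perl's /m;
    # re.sub replaces every match, like Perl's /g.
    return re.sub(r"^\s+", "", text, flags=re.MULTILINE)


if __name__ == "__main__":
    quote = (
        "    The Road goes ever on and on,\n"
        "    down from the door where it began.\n"
    )
    print(strip_leading_whitespace(quote), end="")
```

Because \s also matches newlines, a blank line inside the quoted text is swallowed along with the indentation that follows it. This holds for the Perl substitution as well as for the sketch.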
If you use a here-doc within a delimited construct, such as in s///eg,
the quoted material must come on the lines following the final
delimiter. So instead of:
=over 3
s/this/<<E . 'that'
the other
E
. 'more '/eg;
=back
you have to write
=over 3
s/this/<<E . 'that'
. 'more '/eg;
the other
E
=back
Are these examples using . for string concatenation? If they are, that
should be a ~.
andrew
--
Scorpio: (Oct. 24 - Nov. 21)
You've always thought of Death as a journey into the infinite, but it
turns out to be a lot more like Harry Dean Stanton. -
Joseph F. Ryan at Dec 4, 2002 at 5:56 am
Andrew Wilson wrote:
Do these nest arbitrarily?
q{my string \qq{interpolate $this \q{but not $this} or am $I} Just asking for trouble?}
As far as I know, yes. The current behavior already allows this,
unless the design team vetoes it for some reason.
Thanks for all of the great suggestions; I'll try to get another
revision that incorporates them sometime tomorrow.
Joseph F. Ryan
ryan.311@osu.edu
Discussion Overview
| group | perl6-documentation |
| categories | perl |
| posted | Dec 2, '02 at 11:58a |
| active | Dec 4, '02 at 10:51p |
| posts | 15 |
| users | 7 |
| website | perl6.org |
