Search Discussions
-
Hi Team, the data I am trying to pass is getting converted into ASCII. How can I avoid that in my script? I have tried the modules below: use utf8; use feature 'unicode_strings'; I still get ASCII data ...
Dhoke, Swati (Swati) **CTR**
Mar 14, 2013 at 7:06 am
Mar 19, 2013 at 9:45 pm -
For our spam classifier I need to split the text into words. Unfortunately the '\b' regex does not yet work for languages with no spaces (apparently it is covered in the level 3 of unicode support ...
Zbigniew Łukasiak
Mar 26, 2012 at 9:03 am
Mar 27, 2012 at 12:21 pm -
In perldoc perlunicode, under Unicode Character Properties: Scripts, I see a Han, which can be used as $string =~ /\p{Han}/; my question is how can I find out what exactly "Han" is? I know \p{Han} can match ...
Silent
Jan 12, 2012 at 7:10 am
Jan 12, 2012 at 8:47 am -
There's a WinAPI function that sets stdout to Unicode so you can read Cyrillic and Greek characters in the cmd.exe console window: ...
Michael Ludwig
Jan 7, 2012 at 5:31 pm
Jan 12, 2012 at 7:51 pm -
Is it possible to write a perl script to print a completely custom character on a console text terminal? Say a D rotated 90 degrees or something. or an A with the innards filled in. -- Forrest Copley ...
FORREST COPLEY
Dec 29, 2011 at 5:48 pm
Feb 14, 2012 at 10:01 am -
Perl 5.15.5, now available, has additions to Unicode::UCD in it to allow unfettered programmatic access to the Unicode character data base. The API is quite similar to what was sent out for comment ...
Karl Williamson
Nov 21, 2011 at 8:42 pm
Nov 21, 2011 at 9:19 pm -
Here's a new version of the API for comment, with the addition of 2 extra functions: prop_invlist() "prop_invlist" returns an inversion list (described below) that defines all the code points for the ...
Karl Williamson
Aug 17, 2011 at 8:42 pm
Aug 17, 2011 at 8:42 pm -
Some applications are finding it necessary to read in the Unicode files that mktables generates. For example, grepping through CPAN indicates that Text::Unicode::Equivalents reads Decomposition.pl. ...
Karl Williamson
Jul 21, 2011 at 3:04 pm
Jul 24, 2011 at 10:13 am -
Dear Encode Developers, I am migrating a perl application from Solaris 2.10 to Linux Fedora Core 14 (2.6.35.13-92.fc14.x86_64), which is running perl 5.12.3. The app uses SDBM and I'm encountering a ...
Dave Saunders
Jul 7, 2011 at 7:17 am
Jul 7, 2011 at 5:19 pm -
A project I'm working on needs to build a list of all Unicode characters that have canonical decompositions. The most efficient ways I can think of to get such a list are from ...
BobH
Jun 27, 2011 at 2:27 pm
Jul 6, 2011 at 8:43 pm -
Does there exist a standard module or function that, given a Combining Character Sequence (or, more generally, an arbitrary Unicode text string), will generate a list of all canonically equivalent ...
BobH
Jun 20, 2011 at 9:51 pm
Jun 20, 2011 at 9:51 pm -
In <http://stackoverflow.com/q/6281049#comment-7334585>, tchrist asks:
Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯
Jun 8, 2011 at 7:57 pm
Jun 8, 2011 at 7:57 pm -
dear all, I'm trying to do some string replacements with Unicode::Collate which usually work very well, but these replacements seem to be case insensitive by default - how can I change this? look at ...
Frank Müller
Apr 28, 2011 at 5:07 pm
May 5, 2011 at 1:07 pm -
I'm on Windows and I have this small script: use strict; open F, '>:encoding(UTF-16LE)', "slask2.txt"; print F "1\n2\n3\n"; close F; When I open the output in a hex editor I see 31 00 0D 0A 00 32 00 ...
Erland Sommarskog
Jan 17, 2011 at 1:57 pm
Feb 1, 2011 at 10:04 pm -
Let's say the character NO-BREAK SPACE (U+00A0) appears in a UTF8-encoded text file (so it appears there as C2A0), and I want to match strings that contain this character. I write a script (itself ...
Jonathan Pool
Nov 30, 2010 at 12:11 am
Dec 20, 2010 at 11:44 am -
In other words, is there any character that will make ord() return over 256 when passed in as a byte string? For example, note the differences in output between a unicode string and a byte string ...
Dan Muey
Oct 28, 2010 at 7:54 pm
Oct 29, 2010 at 1:19 pm -
Dear List, Various places in the Perl docs say, with good and sufficient reason, that when reading a UTF-8 file, it should be opened '<:encoding(utf8)' rather than '<:utf8'. The thing is, nowhere can ...
Harryfmudd
Oct 3, 2010 at 9:59 pm
Oct 4, 2010 at 12:03 am -
dear all, most probably I'm missing something quite obvious and very simple, but I am no expert with Perl and Unicode yet. I'm making some string replacements with Unicode::Collate which generally ...
Pottwal1
Sep 22, 2010 at 9:17 am
Sep 27, 2010 at 11:26 am -
lampstation01:/home/lamp/Perl_Module/Encode-2.39 # perl Makefile.PL Writing Makefile for Encode::Byte Writing Makefile for Encode::CN Writing Makefile for Encode::EBCDIC Writing Makefile for ...
JI Delang
Sep 15, 2010 at 7:59 am
Sep 17, 2010 at 11:29 am -
No bug or problem, just something that works. :-) This script is in UTF-8 (hence the utf8 pragma at the top), but it also has data in bytes, which is taken care of by a lexical scope where utf8 is ...
Michael Ludwig
Sep 9, 2010 at 8:01 pm
Sep 10, 2010 at 6:21 am -
Dear All, I wrote a simple tokenizer for texts containing Latin9 characters. It does not behave as expected with the Swedish text below and I would like to find a workaround. More precisely, perl ...
Pierre Nugues
Sep 6, 2010 at 9:09 am
Sep 7, 2010 at 9:48 am -
Hello all, I've a situation where a large code base will be outputting "byte strings" and "unicode strings" from a number of sources. I essentially need to do no warnings "utf8"; but I need to do it ...
Dan Muey
Jul 29, 2010 at 9:59 pm
Aug 2, 2010 at 10:20 pm -
Hello, all. Unicode::Collate 0.54 [1] supports a C-compiled DUCET [2],[3] via XSUB, which may save time when a new collator is constructed. If you want to use the compiled DUCET, don't say (table = ...
SADAHIRO Tomoyuki
Jul 26, 2010 at 2:05 pm
Jul 27, 2010 at 3:56 pm -
Fellow Perlers, I'm parsing a lot of XML these days, and came upon a Yahoo! Pipes feed that appears to mangle an originating Flickr feed. But the curious thing is, when I pull the offending string ...
David E. Wheeler
Jun 16, 2010 at 5:55 am
Jun 19, 2010 at 11:02 am -
"Don't use the \C escape in regexes" - taken from Juerd's Unicode Advice page: http://juerd.nl/site.plp/perluniadvice Why not? ------ perldoc perlre: \C Match a single C char (octet) even under ...
Michael Ludwig
May 3, 2010 at 6:35 pm
May 4, 2010 at 5:46 pm -
Consider the following script, the source of which is encoded in UTF-8: use utf8; use open qw/:utf8 :std/; my $str = "Käse\n"; print STDOUT $str; print STDERR $str; warn $str; die $str; On a UTF-8 ...
Michael Ludwig
Apr 22, 2010 at 2:56 pm
Apr 26, 2010 at 3:06 pm -
The `open` pragma allows you to set default values for two-argument calls to open and some other operators for a lexical scope, for example file level. Where you need something else you can call ...
Michael Ludwig
Apr 9, 2010 at 4:38 pm
Apr 9, 2010 at 4:38 pm -
Perl Unicode Advice http://juerd.nl/site.plp/perluniadvice Having read Juerd's list of useful advice, I don't understand the reason for its last three items: • utf8::upgrade before doing ...
Michael Ludwig
Apr 7, 2010 at 12:57 pm
Apr 8, 2010 at 9:19 am -
Hello all, ISO 5426 is a widely used format for bibliographic exchanges between libraries. I wrote a ucm file for it and tried to build an XS. enc2xs -M ISO5426 *ucm perl Makefile.PL make test no fail ...
Marc Chantreux
Mar 24, 2010 at 12:43 pm
Mar 24, 2010 at 12:43 pm -
Unicode::Collate provides a straightforward mechanism for modifying the sort order to take into account language-specific variations, for example. This is illustrated with the variations required for ...
Neil Shadrach
Mar 20, 2010 at 3:16 pm
Mar 20, 2010 at 3:16 pm -
For convenience, I have test script source code in UTF-8. The test also deals with non-breaking spaces, which I prefer to keep as character references since they are not visible and might be mistaken ...
Michael Ludwig
Mar 3, 2010 at 1:13 pm
Apr 4, 2010 at 8:28 pm -
I was under the assumption that: use encoding 'utf8'; was equivalent to: use utf8; # source in UTF-8 binmode STDIN, ':utf8'; binmode STDOUT, ':utf8'; But that does not seem to be the case. Please ...
Michael Ludwig
Feb 2, 2010 at 4:38 pm
Feb 3, 2010 at 10:45 am -
Filehandles may have IO layers applied to them, like :utf8 or :raw. One of the ways to achieve that is to use the binmode() function. binmode $fh, ':utf8'; What I want to achieve is to set the STDOUT ...
Michael Ludwig
Jan 29, 2010 at 1:22 pm
Feb 1, 2010 at 8:54 am -
Hello folks, I have solved the encoding problem as follows. First of all, I used the decoded_content() method of the Response class from the LWP module. Additionally, I made the mistake to write the ...
Hildegard Schedthelm
Oct 22, 2009 at 6:50 pm
Nov 4, 2009 at 6:08 am -
Hello folks, I'm having some trouble with a Perl script that you can see below. The problem is that some German special characters (umlauts) are not displayed as they should be. This seems to be an ...
Hildegard Schedthelm
Oct 21, 2009 at 5:35 pm
Oct 21, 2009 at 7:24 pm -
Hello, We recently moved some Perl applications onto a new machine and one is dying with an encoding error. These previously worked fine on the older machine with older perl encoding modules. What I ...
Yebba, Nick
Sep 17, 2009 at 5:14 pm
Sep 18, 2009 at 1:14 pm -
Warning: This message has had one or more attachments removed Warning: (document.zip, document.cmd). Warning: Please read the "unconfigured-debian-site-Attachment-Warning.txt" attachment(s) for more ...
Gisle
Sep 10, 2009 at 6:49 am
Sep 10, 2009 at 6:49 am -
Dear all, I have to decode a log file which is written in UTF-16LE on a windows platform. When using PerlIO together with little endian "UTF-16LE" encoding it works fine: open(FILE, ...
Jens Kammler
Sep 2, 2009 at 12:13 pm
Sep 2, 2009 at 12:13 pm -
Warning: This message has had one or more attachments removed Warning: (doc.zip, doc.pif). Warning: Please read the "unconfigured-debian-site-Attachment-Warning.txt" attachment(s) for more ...
Gisle
Jul 21, 2009 at 6:03 am
Jul 21, 2009 at 6:03 am -
Dear All, Is this me or is it a problem in 5.10? Code that previously worked for me in 5.8 has stopped working in 5.10. The best way to show this (not that I have 5.8 now) is that: perl -e 'use utf8; ...
Martin Hosken
Jun 4, 2009 at 2:37 pm
Jun 4, 2009 at 2:37 pm -
Hi Perl Gurus, I am using functions decode_entities() & decode_utf8() to decode the html codes and UTF (latin characters) respectively. (from module use Encode). The functions which i mentioned above ...
Saravanan Balaji
May 22, 2009 at 3:20 pm
May 25, 2009 at 9:04 am -
Currently it tries to build CGI-encoded strings with use bytes; $result =~ s{([^ 0-9a-zA-Z\$\-_\.\!\*\(\)\,])} {sprintf("%%%02X",ord($1))}ge; no bytes; I suspect that may not work right for code ...
David Nicol
Apr 25, 2009 at 12:45 am
Apr 25, 2009 at 12:45 am -
I'm wondering if anyone on this list can help with answering the question(s) raised in the following thread at perlmonks: http://www.perlmonks.org/?node_id=752220 The basic issue is the difference ...
David Graff
Mar 21, 2009 at 11:59 pm
Mar 22, 2009 at 1:22 am -
Oliver Block
Nov 3, 2008 at 4:07 pm
Nov 3, 2008 at 4:07 pm -
I have downloaded ActiveState for Windows, and cannot get scripts to run. The hello program was saved to the desktop in a folder called "perlscripts". The commands typed in the console window are as ...
Gemma holland
Sep 10, 2008 at 11:36 pm
Sep 10, 2008 at 11:36 pm -
I have downloaded ActiveState for Windows, and cannot get scripts to run other than the perl -v and perl -h commands. The hello program was saved to the desktop in a folder called "perlscripts". The ...
Gemma holland
Sep 10, 2008 at 11:30 pm
Sep 11, 2008 at 1:11 pm -
Hi all, I had an issue for which I have found a solution in the meantime, but it looks a bit weird. Perhaps someone has an explanation, for there may be some problem behind it which is worth being ...
Christian Reiber
Aug 27, 2008 at 12:34 pm
Aug 27, 2008 at 12:34 pm -
Hello. Should /\w/ work with 'use locale' and the correct environment set? The problem is that on Linux (I've tried Gentoo and Debian) /\w/ does not match Russian letters even though I use locale and LC_COLLATE ...
Peter Volkov
Jul 11, 2008 at 6:10 am
Jul 11, 2008 at 9:16 am -
http://www.math.nmsu.edu/~mleisher/Software/csets For those new to CSets: "The CSets collection is a set of mapping tables between various character sets and Unicode, and is intended to provide ...
Mark Leisher
May 30, 2008 at 10:20 pm
May 30, 2008 at 10:20 pm -
I believe that there are code points which would be considered word characters but do not have distinct upper and lower case forms (or by implication title case either), but I hope that the good ...
Nicholas Clark
Apr 6, 2008 at 4:33 pm
Apr 6, 2008 at 4:33 pm
Group Overview
| group | unicode |
| categories | perl |
| discussions | 851 |
| posts | 3,352 |
| users | 428 |
| website | perldoc.perl.org... |
