Grokbase Groups Perl unicode

859 discussions - 3,372 posts

  • Hi! I had some encoding (actually decoding) issues when trying to read from a pipe, and I would like to check with some more experienced users if I'm doing something wrong, if it's intended behavior, ...
    Anders Andersson
    May 26, 2016 at 6:45 am
    May 26, 2016 at 6:45 am
  • Hello, I tried to make my Perl5 code unicode compliant after reading a post on stackoverflow[1]. As suggested in the post: “always run incoming stuff through NFD and outbound stuff from NFC.” I got a ...
    Daniel Dehennin
    May 9, 2016 at 2:53 pm
    May 12, 2016 at 12:51 am
  • Hi! I thought that I understood UTF-8 encoding/decoding done in perl until I looked into source code of Encode package... (exactly sub encode_utf8) Before... I only read description of Encode package ...
    Pali Rohár
    May 5, 2016 at 2:37 pm
    May 15, 2016 at 3:05 am
  • Dear Perl: I run an HP-UX server on Release 11.31. I checked the dependencies (Perl 5.7.3 - I have 5.8.8.F). I then executed the simple install steps. The 'make' process displayed a lot of warnings ...
    Ole C. Olsen
    Mar 4, 2016 at 8:24 pm
    Mar 4, 2016 at 8:24 pm
  • The Unicode Consortium is seeking feedback on options for fixing anomalies involving a few characters with the Sc and Scx properties. For details and to comment, see ...
    Karl Williamson
    Jul 3, 2014 at 5:54 pm
    Jul 3, 2014 at 5:54 pm
  • I'm the maintainer of Audio::Taglib. Summary of my perl5 (revision 5 version 16 subversion 3) configuration: osname=linux, osvers=3.10.9-200.fc19.x86_64, archname=x86_64-linux-thread-multi $utf16 = ...
    Geoffrey Leach
    Feb 9, 2014 at 11:59 pm
    Feb 10, 2014 at 7:51 am
  • "Cannot decode string with wide characters" I see this error when calling decode_utf8() on a character string (with wide characters) that was already decoded. This error started with Encode 2.53, it ...
    Bill Moseley
    Dec 5, 2013 at 5:10 am
    Dec 5, 2013 at 5:10 am
  • Hi Team, We have a legacy Perl application which was developed using Perl 5.6.1. I would like to Install which is a pre requisite for Cache::Memcached. When i tried to install for ...
    Jul 3, 2013 at 4:20 am
    Jul 5, 2013 at 8:09 pm
  • Hi Team, the data I am trying to pass is getting converted into ASCII; how can I avoid that in my script? I have tried the following: use utf8; use feature 'unicode_strings'; I still get ASCII data ...
    Dhoke, Swati (Swati) **CTR**
    Mar 14, 2013 at 7:06 am
    Mar 19, 2013 at 9:45 pm
  • For our spam classifier I need to split the text into words. Unfortunately the '\b' regex does not yet work for languages with no spaces (apparently it is covered in the level 3 of unicode support ...
    Zbigniew Łukasiak
    Mar 26, 2012 at 9:03 am
    Mar 27, 2012 at 12:21 pm
  • in perldoc perlunicode : Unicode Character Properties : Scripts I see a Han, which can be used as $string =~/\p{Han}/; my question is how can I find out what exactly "Han" is ? I know \p{Han} can match ...
    Jan 12, 2012 at 7:10 am
    Jan 12, 2012 at 8:47 am
  • There's a WinAPI function that sets stdout to Unicode so you can read Cyrillic and Greek characters in the cmd.exe console window: ...
    Michael Ludwig
    Jan 7, 2012 at 5:31 pm
    Jan 12, 2012 at 7:51 pm
  • Is it possible to write a perl script to print a completely custom character on a console text terminal? Say a D rotated 90 degrees or something. or an A with the innards filled in. -- Forrest Copley ...
    Dec 29, 2011 at 5:48 pm
    Feb 14, 2012 at 10:01 am
  • Perl 5.15.5, now available, has additions to Unicode::UCD in it to allow unfettered programmatic access to the Unicode character data base. The API is quite similar to what was sent out for comment ...
    Karl Williamson
    Nov 21, 2011 at 8:42 pm
    Nov 21, 2011 at 9:19 pm
  • Here's a new version of the API for comment, with the addition of 2 extra functions: prop_invlist() "prop_invlist" returns an inversion list (described below) that defines all the code points for the ...
    Karl Williamson
    Aug 17, 2011 at 8:42 pm
    Aug 17, 2011 at 8:42 pm
  • Some applications are finding it necessary to read in the Unicode files that mktables generates. For example, grepping through CPAN indicates that Text::Unicode::Equivalents reads ...
    Karl Williamson
    Jul 21, 2011 at 3:04 pm
    Jul 24, 2011 at 10:13 am
  • Dear Encode Developers, I am migrating a perl application from Solaris 2.10 to Linux Fedora Core 14 (, which is running perl 5.12.3. The app uses SDBM and I'm encountering a ...
    Dave Saunders
    Jul 7, 2011 at 7:17 am
    Jul 7, 2011 at 5:19 pm
  • A project I'm working on needs to build a list of all Unicode characters that have canonical decompositions. The most efficient ways I can think of to get such a list are from ...
    Jun 27, 2011 at 2:27 pm
    Jul 6, 2011 at 8:43 pm
  • Does there exist a standard module or function that, given a Combining Character Sequence (or, more generally, an arbitrary Unicode text string), will generate a list of all canonically equivalent ...
    Jun 20, 2011 at 9:51 pm
    Jun 20, 2011 at 9:51 pm
  • In < , tchrist asks:
    Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯
    Jun 8, 2011 at 7:57 pm
    Jun 8, 2011 at 7:57 pm
  • dear all, I'm trying to do some string replacements with Unicode::Collate which usually work very well, but these replacements seem to be case insensitive by default - how can I change this? look at ...
    Frank Müller
    Apr 28, 2011 at 5:07 pm
    May 5, 2011 at 1:07 pm
  • I'm on Windows and I have this small script: use strict; open F, '>:encoding(UTF-16LE)', "slask2.txt"; print F "1\n2\n3\n"; close F; When I open the output in a hex editor I see 31 00 0D 0A 00 32 00 ...
    Erland Sommarskog
    Jan 17, 2011 at 1:57 pm
    Feb 1, 2011 at 10:04 pm
  • Let's say the character NO-BREAK SPACE (U+00A0) appears in a UTF8-encoded text file (so it appears there as C2A0), and I want to match strings that contain this character. I write a script (itself ...
    Jonathan Pool
    Nov 30, 2010 at 12:11 am
    Dec 20, 2010 at 11:44 am
  • In other words, is there any character that will make ord() return over 256 when passed in as a byte string? For example, note the differences in output between a unicode string and a byte string ...
    Dan Muey
    Oct 28, 2010 at 7:54 pm
    Oct 29, 2010 at 1:19 pm
  • Dear List, Various places in the Perl docs say, with good and sufficient reason, that when reading a UTF-8 file, it should be opened '<:encoding(utf8)' rather than '<:utf8'. The thing is, nowhere can ...
    Oct 3, 2010 at 9:59 pm
    Oct 4, 2010 at 12:03 am
  • dear all, most probably I'm missing something quite obvious and very simple, but I am no expert with Perl and Unicode yet. I'm making some string replacements with Unicode::Collate which generally ...
    Sep 22, 2010 at 9:17 am
    Sep 27, 2010 at 11:26 am
  • lampstation01:/home/lamp/Perl_Module/Encode-2.39 # perl Makefile.PL Writing Makefile for Encode::Byte Writing Makefile for Encode::CN Writing Makefile for Encode::EBCDIC Writing Makefile for ...
    JI Delang
    Sep 15, 2010 at 7:59 am
    Sep 17, 2010 at 11:29 am
  • No bug or problem, just something that works. :-) This script is in UTF-8 (hence the utf8 pragma at the top), but it also has data in bytes, which is taken care of by a lexical scope where utf8 is ...
    Michael Ludwig
    Sep 9, 2010 at 8:01 pm
    Sep 10, 2010 at 6:21 am
  • Dear All, I wrote a simple tokenizer for texts containing Latin9 characters. It does not behave as expected with the Swedish text below and I would like to find a workaround. More precisely, perl ...
    Pierre Nugues
    Sep 6, 2010 at 9:09 am
    Sep 7, 2010 at 9:48 am
  • Hello all, I've a situation where a large code base will be outputting "byte strings" and "unicode strings" from a number of sources. I essentially need to do no warnings "utf8"; but I need to do it ...
    Dan Muey
    Jul 29, 2010 at 9:59 pm
    Aug 2, 2010 at 10:20 pm
  • Hello, all. Unicode::Collate 0.54 [1] supports a C-compiled DUCET [2],[3] via XSUB, which may save time when a new collator is constructed. If you want to use the compiled DUCET, don't say (table = ...
    SADAHIRO Tomoyuki
    Jul 26, 2010 at 2:05 pm
    Jul 27, 2010 at 3:56 pm
  • Fellow Perlers, I'm parsing a lot of XML these days, and came upon a Yahoo! Pipes feed that appears to mangle an originating Flickr feed. But the curious thing is, when I pull the offending string ...
    David E. Wheeler
    Jun 16, 2010 at 5:55 am
    Jun 19, 2010 at 11:02 am
  • "Don't use the \C escape in regexes" - taken from Juerd's Unicode Advice page: Why not? ------ perldoc perlre: \C Match a single C char (octet) even under ...
    Michael Ludwig
    May 3, 2010 at 6:35 pm
    May 4, 2010 at 5:46 pm
  • Consider the following script, the source of which is encoded in UTF-8: use utf8; use open qw/:utf8 :std/; my $str = "Käse\n"; print STDOUT $str; print STDERR $str; warn $str; die $str; On a UTF-8 ...
    Michael Ludwig
    Apr 22, 2010 at 2:56 pm
    Apr 26, 2010 at 3:06 pm
  • The `open` pragma allows you to set default values for two-argument calls to open and some other operators for a lexical scope, for example file level. Where you need something else you can call ...
    Michael Ludwig
    Apr 9, 2010 at 4:38 pm
    Apr 9, 2010 at 4:38 pm
  • Perl Unicode Advice Having read Juerd's list of useful advice, I don't understand the reason for its last three items: • utf8::upgrade before doing ...
    Michael Ludwig
    Apr 7, 2010 at 12:57 pm
    Apr 8, 2010 at 9:19 am
  • hello all, ISO 5426 is a widely used format for bibliographic exchanges between libraries. I wrote a ucm file for it and tried to build an XS. enc2xs -M ISO5426 *ucm perl Makefile.PL make test no fail ...
    Marc Chantreux
    Mar 24, 2010 at 12:43 pm
    Mar 24, 2010 at 12:43 pm
  • Unicode::Collate provides a straightforward mechanism for modifying the sort order to take into account language-specific variations, for example. This is illustrated with the variations required for ...
    Neil Shadrach
    Mar 20, 2010 at 3:16 pm
    Mar 20, 2010 at 3:16 pm
  • For convenience, I have test script source code in UTF-8. The test also deals with non-breaking spaces, which I prefer to keep as character references since they are not visible and might be mistaken ...
    Michael Ludwig
    Mar 3, 2010 at 1:13 pm
    Apr 4, 2010 at 8:28 pm
  • I was under the assumption that: use encoding 'utf8'; was equivalent to: use utf8; # source in UTF-8 binmode STDIN, ':utf8'; binmode STDOUT, ':utf8'; But that does not seem to be the case. Please ...
    Michael Ludwig
    Feb 2, 2010 at 4:38 pm
    Feb 3, 2010 at 10:45 am
  • Filehandles may have IO layers applied to them, like :utf8 or :raw. One of the ways to achieve that is to use the binmode() function. binmode $fh, ':utf8'; What I want to achieve is to set the STDOUT ...
    Michael Ludwig
    Jan 29, 2010 at 1:22 pm
    Feb 1, 2010 at 8:54 am
  • Hello folks, I have solved the encoding problem as follows. First of all, I used the method decoded_content() of the Response class from the LWP module. Additionally, I made the mistake to write the ...
    Hildegard Schedthelm
    Oct 22, 2009 at 6:50 pm
    Nov 4, 2009 at 6:08 am
  • Hello folks, I have some trouble with a Perl script that you can see below. The problem is that some German special characters (umlauts) are not displayed as they should be. This seems to be an ...
    Hildegard Schedthelm
    Oct 21, 2009 at 5:35 pm
    Oct 21, 2009 at 7:24 pm
  • Hello, We recently moved some Perl applications onto a new machine and one is dying with an encoding error. These previously worked fine on the older machine with older perl encoding modules. What I ...
    Yebba, Nick
    Sep 17, 2009 at 5:14 pm
    Sep 18, 2009 at 1:14 pm
  • Warning: This message has had one or more attachments removed Warning: (, document.cmd). Warning: Please read the "unconfigured-debian-site-Attachment-Warning.txt" attachment(s) for more ...
    Sep 10, 2009 at 6:49 am
    Sep 10, 2009 at 6:49 am
  • Dear all, I have to decode a log file which is written in UTF-16LE on a windows platform. When using PerlIO together with little endian "UTF-16LE" encoding it works fine: open(FILE, ...
    Jens Kammler
    Sep 2, 2009 at 12:13 pm
    Sep 2, 2009 at 12:13 pm
  • Warning: This message has had one or more attachments removed Warning: (, doc.pif). Warning: Please read the "unconfigured-debian-site-Attachment-Warning.txt" attachment(s) for more ...
    Jul 21, 2009 at 6:03 am
    Jul 21, 2009 at 6:03 am
  • Dear All, Is this me or is it a problem in 5.10? Code that previously worked for me in 5.8 has stopped working in 5.10. The best way to show this (not that I have 5.8 now) is that: perl -e 'use utf8; ...
    Martin Hosken
    Jun 4, 2009 at 2:37 pm
    Jun 4, 2009 at 2:37 pm
  • Hi Perl Gurus, I am using functions decode_entities() & decode_utf8() to decode the html codes and UTF (latin characters) respectively. (from module use Encode). The functions which i mentioned above ...
    Saravanan Balaji
    May 22, 2009 at 3:20 pm
    May 25, 2009 at 9:04 am
  • Currently it tries to build CGI-encoded strings with use bytes; $result =~ s{([^ 0-9a-zA-Z\$\-_\.\!\*\(\)\,])} {sprintf("%%%02X",ord($1))}ge; no bytes; I suspect that may not work right for code ...
    David Nicol
    Apr 25, 2009 at 12:45 am
    Apr 25, 2009 at 12:45 am
Group Overview
group: unicode @

Top users

  • Dankogai: 404 posts
  • Jhi: 384 posts
  • SADAHIRO Tomoyuki: 157 posts
  • Nick: 134 posts
  • Autrijus Tang: 119 posts
  • Nick Ing-Simmons: 101 posts
  • Gisle: 75 posts
  • Jshin: 58 posts
  • Markus Kuhn: 54 posts
  • Anton Tagunov: 50 posts
  • Mark Leisher: 48 posts
  • Andreas Koenig: 42 posts
  • Nicholas Clark: 39 posts
  • Philip Newton: 39 posts
  • John Delacour: 37 posts
  • Rajarshi das: 35 posts
  • Larry: 34 posts
  • Juerd Waalboer: 28 posts
  • Misha Wolf: 28 posts
  • Martin_hosken: 24 posts