Hello,

I have a simple upload module but I can't get a pure binary upload
to work every time. This is what I sometimes get:
- Original is 73856 bytes
- Uploaded is 33398 bytes or 27939 bytes

The header also changes from:
0x00000000: FFD8FFE0 00104A46 49460001 0101012C ......JFIF.....,
To:
0x00000000: EFBFBDEF BFBD0010 4A464946 00010101 ........JFIF....
0x00000010: 012C .,
Or even:
0x00000000: EFBFBDEF BFBDEFBF BDEFBFBD 00104A46 ..............JF
0x00000010: 49460001 0101012C IF.....,

So it's not just a truncated file. What is doing this? Why does it work sometimes
and not at other times?

I have tried setting escmode=0, binmode FILE, and EmbperlBlocks, with no change. What is
causing this behaviour? How can I guarantee a binary upload?

Here is my code:


[$ syntax EmbperlBlocks $]
[-
$req = shift;
$escmode = 0;
$PHOTOPATH = "$ENV{DOCUMENT_ROOT}/data/img/artistes/big";
while (($k, $v) = each(%fdat)) {
    if ($k =~ /^upl(\d+)$/ and $v) {
        my $filename = $1;
        open(FILE, ">$PHOTOPATH/$filename.jpg") or print OUT $!;
        binmode FILE;
        my $buffer;
        while (read($fdat{$k}, $buffer, 32768)) {
            # should I do something with $buffer here ?
            print FILE $buffer;
        }
        close(FILE);
    }
}
-]


<form method="post" ENCTYPE="multipart/form-data">

And inside a loop:
<input type="FILE" id="upl[+ $p->[0] +]" name="upl[+ $p->[0] +]" />

<input type="SUBMIT" name="Bsave" value="Enregistrer" />
</form>


Thanks for your help,

--
Jean-Christophe Boggio -o)
embperl@thefreecat.org /\\
Independant Consultant and Developer _\_V

---------------------------------------------------------------------
To unsubscribe, e-mail: embperl-unsubscribe@perl.apache.org
For additional commands, e-mail: embperl-help@perl.apache.org


  • Chris Allen at Apr 4, 2012 at 4:39 pm
    Don't know why or where, but you've got some UTF-8 encoding going on.
    EF BF BD is the UTF-8 encoding of U+FFFD, the "replacement character"
    substituted for a byte that cannot be decoded (probably the initial FF).

    Suggest you sniff your data stream to see if it's happening before it
    reaches Embperl.
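
A minimal sketch of that diagnosis, using the core Encode module (this is an illustration, not code from the thread). It decodes the first bytes of the JPEG as if they were UTF-8 text and writes the result back out. Each undecodable byte becomes U+FFFD, which is serialized as EF BF BD. How many replacement characters appear can depend on the decoder, so the sketch only prints the two byte strings for comparison.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# First bytes of the original JPEG, taken from the hex dump in the question.
my $original = "\xFF\xD8\xFF\xE0\x00\x10JFIF";

# Treat the bytes as UTF-8 text. With no CHECK argument, Encode substitutes
# U+FFFD for every sequence it cannot decode instead of dying.
my $as_text = decode('UTF-8', $original);

# Writing that text back out as UTF-8 turns each U+FFFD into EF BF BD.
my $mangled = encode('UTF-8', $as_text);

printf "original: %s\n", uc(unpack('H*', $original));
printf "mangled:  %s\n", uc(unpack('H*', $mangled));
```

The mangled output no longer contains any FF byte, while the ASCII part (00 10 JFIF) passes through unchanged, which matches the corrupted headers shown in the question.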

  • Jean-Christophe Boggio at Apr 4, 2012 at 11:36 pm

    On 04/04/2012 18:38, Chris Allen wrote:
    > Don't know why or where, but you've got some UTF-8 encoding going on.
    > EF BF BD is the UTF-8 "replacement character" used for an unknown character (probably the initial FF).
    >
    > Suggest you sniff your data stream to see if it's happening before it reaches Embperl.

    tcpdump prints this:

    0x0500: 7570 6c33 3831 223b 2066 696c 656e 616d upl381";.filenam
    0x0510: 653d 2243 686f 7269 7374 6573 4844 312e e="ChoristesHD1.
    0x0520: 6a70 6722 0d0a 436f 6e74 656e 742d 5479 jpg"..Content-Ty
    0x0530: 7065 3a20 696d 6167 652f 6a70 6567 0d0a pe:.image/jpeg..
    0x0540: 0d0a ffd8 ffe0 0010 4a46 4946 0001 0101 ........JFIF....
    0x0550: 012c 012c 0000 ffe1 33ff 4578 6966 0000 .,.,....3.Exif..

    So it has to be embperl-related.

    The site's config seems clean to me:

    <VirtualHost *>
        ServerName something.fr
        DocumentRoot /var/www/sites/semi
        DirectoryIndex index.html
        EMBPERL_DEBUG 0
        EMBPERL_APPNAME semiv2
        EMBPERL_OBJECT_BASE base.epl
        <FilesMatch "\.html">
            SetHandler perl-script
            PerlHandler Embperl::Object
            Options ExecCGI
        </FilesMatch>
        Options -Indexes
    </VirtualHost>

    <Directory "/var/www/sites/semi/admin">
        AuthUserFile /var/www/sites/semi/admin/.htpasswd
        AuthName "Administration SEMI"
        AuthType Basic
        require valid-user

        <Files *.cgi>
            AddHandler cgi-script .cgi .pl .htm
            Options ExecCGI
        </files>
    </Directory>

    The /etc/apache2/conf.d/charset file is empty (everything is commented out).

    There is nothing charset-related in apache2.conf (standard Debian squeeze file).

    The "base.epl" file contains this (I have UTF-8 accented characters in my pages):
    use utf8;
    use encoding "utf8";
    $http_headers_out{'Content-Type'}="text/html; charset=utf-8";

    and:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    I tried creating a "test" subdir containing only the upload module and a simplistic base.epl:
    [-
    Execute("$ENV{DOCUMENT_ROOT}/basedb.epl");
    Execute('*');
    -]
    (basedb.epl only creates the $req->{dbh} handle to the DB)
    Still no change.

    Where can this encoding come from? And why are the files smaller than the originals (UTF-8 encoding
    should only enlarge the file when it encounters unknown characters, shouldn't it)? And why don't they
    always have the same size?

    Last clue: it *seems* that after I restart Apache I can reliably do ONE upload.

    I have run out of ideas, so if anyone has one, I'll take it. Thanks for your help.
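
One place to look next, offered as an unconfirmed lead rather than a diagnosis: the documentation of Perl's `encoding` pragma says it also sets the PerlIO layers of STDIN and STDOUT, and base.epl above contains `use encoding "utf8";`. Whether that is what happens here is not established in the thread, and under mod_perl the request body may not pass through STDIN's PerlIO layers at all. The sketch below only shows how to list the layers on a handle with the core `PerlIO::get_layers` function; `has_text_layer` is a hypothetical helper, and the in-memory handles are stand-ins for the real request input.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $fh carries a layer that decodes text
# (:utf8 or :encoding(...)), which would be wrong for binary uploads.
sub has_text_layer {
    my ($fh) = @_;
    my @hits = grep { /^(?:utf8|encoding)/ } PerlIO::get_layers($fh);
    return scalar @hits;
}

# Stand-ins for the request input: the same bytes opened raw and decoded.
my $bytes = "\xFF\xD8\xFF\xE0";
open(my $raw_fh,  '<:raw',             \$bytes) or die "open failed: $!";
open(my $text_fh, '<:encoding(UTF-8)', \$bytes) or die "open failed: $!";

printf "raw handle:  %s\n", join(' ', PerlIO::get_layers($raw_fh));
printf "text handle: %s\n", join(' ', PerlIO::get_layers($text_fh));
printf "STDIN:       %s\n", join(' ', PerlIO::get_layers(*STDIN));
```

Running the same check on the upload handle before and after the first request could show whether a decoding layer appears once base.epl has been compiled, which would fit the "one good upload after a restart" observation.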

    --
    Jean-Christophe Boggio -o)
    embperl@thefreecat.org /\\
    Independant Consultant and Developer _\_V

  • Jean-Christophe Boggio at Apr 5, 2012 at 4:43 pm
    Thanks for taking the time to help me.

    On 05/04/2012 08:48, Chris Allen wrote:
    > Can you include all of the headers here please?

    I have attached the beginning of the dump (tcpdump addresses are changed to aa.aa.aaa.aa
    and bb.bbb.bb.bb but it's easy to find the real ones). I hope the list accepts attachments.
    The whole dump is 2.5 MB so I won't post it to the list, but I have it handy if you need it.

    > It's possible you have more than one issue here. Firstly, what happens if you
    > upload several textfiles (ASCII data only)? Do they upload correctly? Or perhaps
    > they upload correctly but truncated?

    Uploaded the full tcpdump (2670592 bytes). It's pure 7-bit ASCII: same size, same md5sum.
    Uploaded a linux-header Makefile (53 KB). Probably 7-bit ASCII: same size, same md5sum.

    Uploaded a big text file, mostly ASCII with a few accents:
    1395336 original
    1395118 copy
    The results are... insane. Here is the diff:

    diff -u 0410959v-phase2.txt 14.jpg
    --- original 2011-09-05 15:18:49.000000000 +0200
    +++ copy 2012-04-05 16:17:22.091080638 +0200
    @@ -38,18 +38,18 @@
    Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    -Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    +Use of uninitialized value in numeric eq (==) at ext-bin/do_5_gense2.pl line 1126.
    +Use of uninitialized value in numeric et-bin/do_5_genfichiers_phase2.pl line 1126.
    +Use ofed value in n=) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    +Use ozed value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    +Use of uninitialized valu et-bin/do_5_genfichiers_phase2.pl line 1126.
    +Use of uninite in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    +Use of uninitialized value in num a_5_genfichiers_phase2.pl line 1126.
    +Use of uninitialized eric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    +Use of uninitialized value in numeric eq (inhiers_phase2.pl line 1126.
    +Use of uninitialized value in ==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    +Use of uninitialized value in numeric eq (==) at exense2.pl line 1126.
    +Use of uninitialized value in numeric et-bichiers_phase2.pl line 1126.
    Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
    @@ -258,7 +258,7 @@
    Warning: Permanently added '[10.141.0.61]:2222' (RSA) to the list of known hosts.
    Ubuntu 10.04.3 LTS
    Warning: Permanently added '192.168.122.130' (RSA) to the list of known hosts.
    -Arret du LDAP (patienter 10 secondes)
    +Arret LDAP (patienter 10 secondes)
    Stopping daemon monitor: monit.
    Stopping OpenLDAP: slapd.
    tar: Removing leading `/' from member names

    The differences are on lines 41-52 and 261, though the file is 23818 lines long. I guess that
    is because only one 32768-byte buffer gets "corrupted"?
    Accents appear only on lines 2-191 (and not on every line).
    The accents are still there, untouched. In the original file they are UTF-8 encoded:
    iconv -f utf8 -t latin1 original >/dev/null
    -> no error

    Also, the files are not "truncated": bits are randomly missing in the middle.


    So as I understand it, the problems (UTF-8 encoding + missing bits) arise only when
    non-UTF-8 characters are encountered.

    If you have ideas of where/what I can look at next...

    Thanks for your patience,

    --
    Jean-Christophe Boggio -o)
    embperl@thefreecat.org /\\
    Independant Consultant and Developer _\_V
  • Gerald Richter at Apr 9, 2012 at 4:25 pm
    Hi,

    the file upload is handled by CGI.pm and not by Embperl itself. It looks like CGI.pm is doing some UTF-8 conversion (or it is done when you write the file).

    Perl's UTF-8 handling is something of a mystery (at least to me). Every time I thought I had understood what was going on, I got a new surprise.

    In the past, the only way I got around it was by trial and error :-(

    You might specify a binary encoding in your open statement (binmode only sets the crlf <-> lf conversion, but it doesn't change charset conversion).

    Gerald
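
A minimal sketch of that suggestion, not a confirmed fix for this thread: open the output with an explicit `:raw` layer and put the input handle in binary mode as well. Note that `perldoc -f binmode` describes a bare `binmode FH` as equivalent to `:raw`, so the explicit layer mainly makes the intent visible and moves it into the `open` call. `save_upload_raw` is a hypothetical helper, and the in-memory handle stands in for the CGI.pm upload handle that the original code reads from `$fdat{$k}`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy an upload handle to $path byte for byte.
sub save_upload_raw {
    my ($in, $path) = @_;
    binmode($in) or die "binmode on input failed: $!";
    open(my $out, '>:raw', $path) or die "Cannot open $path: $!";
    my $total = 0;
    while (my $n = read($in, my $buffer, 32768)) {
        print {$out} $buffer or die "Write to $path failed: $!";
        $total += $n;
    }
    close($out) or die "Cannot close $path: $!";
    return $total;
}

# Stand-in for $fdat{$k}: an in-memory handle holding the JPEG header
# bytes from the hex dump in the question.
my $jpeg_head = "\xFF\xD8\xFF\xE0\x00\x10JFIF\x00\x01\x01\x01\x01\x2C";
open(my $in, '<', \$jpeg_head) or die "Cannot open in-memory handle: $!";

my (undef, $tmp_path) = tempfile(UNLINK => 1);
my $written = save_upload_raw($in, $tmp_path);
print "wrote $written bytes to $tmp_path\n";
```

Unlike the original `open(...) or print OUT $!`, this version stops when the file cannot be opened instead of going on to write to a closed handle. If the bytes are already decoded before they reach the handle, a raw output layer alone will not repair them.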

  • Jean-Christophe Boggio at Apr 5, 2012 at 11:10 pm
    Hi Ed,

    On 05/04/2012 15:16, Ed Grimm wrote:
    > If my guess is right, then I think doing a
    > binmode OUT, ':encoding(UTF-8)';

    Tried that: the file is even more encoded (its size grows rather than
    shrinks). Here's the "jpeg" header:

    0x00000000: C3BFC398 C3BFC3A0 00104A46 49460001 ..........JFIF..
    0x00000010: 01010048 00480000 C3BFC39B 00430001 ...H.H.......C..

    Thanks for your suggestion.


    I wonder if I'm the only one on this list who uploads non-7-bit files over HTTP with Embperl?
    If so, it *has* to come from my config.
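
The C3BF C398 C3BF C3A0 header reported in this message is what you get when each original byte is treated as a Latin-1 character and then encoded as UTF-8. The following sketch reproduces it with the core Encode module; it illustrates the byte pattern only and says nothing about where in the request path the encoding is applied.

```perl
use strict;
use warnings;
use Encode qw(encode);

# First four bytes of the original JPEG. Perl treats each byte as a
# Latin-1 character (U+00FF, U+00D8, U+00FF, U+00E0) when asked to encode.
my $jpeg_start = "\xFF\xD8\xFF\xE0";

# Encoding those characters as UTF-8 doubles every byte above 0x7F.
my $encoded = encode('UTF-8', $jpeg_start);

printf "%s\n", uc(unpack('H*', $encoded));   # prints C3BFC398C3BFC3A0
```

This explains why the file grows with the `:encoding(UTF-8)` layer: every byte from 0x80 to 0xFF is written as two bytes.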

    --
    Jean-Christophe Boggio -o)
    embperl@thefreecat.org /\\
    Independant Consultant and Developer _\_V

  • Dirk Jagdmann at Apr 5, 2012 at 11:37 pm

    > I wonder if I'm the only one on this list to upload non-7 bit files in HTTP with
    > embperl ?
    > If so, it *has* to come from my config.

    I have an application with upload support (for any type of file). However, I have
    not dealt with UTF-8 encoding; in other words, my system simply uses 8-bit
    characters in LATIN1.

    --
    ---> Dirk Jagdmann
    ----> http://cubic.org/~doj
    -----> http://llg.cubic.org


Posted Apr 4, 2012 at 4:21 pm; last active Apr 9, 2012 at 4:25 pm