In my ongoing quest to create a PDF parser in Perl 6, I have some
Rakudo/PGE/Parrot questions. These are low-urgency, and some of the
features I ask about may not be implemented yet.

1) byte orientation

PDF's syntax is inherently an 8-bit superset of ASCII. Some subsections
may be interpreted as a multi-byte encoding or even as binary, but
low-level parsers can safely work solely in the string-as-byte-array
model.

How do I make a grammar work on bytes instead of chars? Is that a
property of the $.target string?
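
To make the byte-oriented idea concrete, here is a minimal sketch in Python rather than Perl 6 (it says nothing about how PGE handles this). It assumes a tiny, made-up PDF fragment and two illustrative patterns, OBJ_HEADER and STREAM, which are hypothetical names. The point is only that the patterns match byte-for-byte, so the binary payload is never decoded as characters.

```python
import re

# A tiny, made-up PDF fragment held as raw bytes (no text decoding at all).
data = (b"%PDF-1.4\n1 0 obj\n<< /Length 4 >>\nstream\n"
        b"\xff\xfe\x00\x80\nendstream\nendobj\n")

# Bytes patterns match byte-for-byte; \d and \s are ASCII-only in this mode.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# Illustration only: a real parser would honor /Length instead of
# scanning for the "endstream" keyword.
STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

header = OBJ_HEADER.search(data)
obj_num, gen_num = int(header.group(1)), int(header.group(2))

# The payload comes back as bytes, including values that are not valid UTF-8.
payload = STREAM.search(data).group(1)
```

Because both the subject and the patterns are bytes, offsets reported by the match objects are byte offsets, which is what PDF's cross-reference table expects.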

2) file as lazy string

PDF files are largely random access, but individual segments have
arbitrary lengths. Rather than slurping in the whole file or
guessing at segment lengths, I'd like to emulate a string via a
wrapper around a seekable file, and then apply my grammar to that
fake string. I think I can accomplish this by subclassing PGE::Match
and overriding new(), text() and item() appropriately. text() would
seek to the appropriate locations in the file and buffer a chunk at a
time. From there, I could substr the desired passages.

Does anyone know any implementation details that would make this lazy-
string approach work or not work? Has someone tried this?

It seems like the runtime/parrot/library/Stream classes parallel what
I want to accomplish.
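
As a rough illustration of the wrapper described above, here is a sketch in Python rather than Perl 6. It does not touch PGE::Match at all. LazyFileString, its chunk_size parameter and its substr() method are hypothetical names invented for this example. The sketch assumes a read-only, seekable binary file and caches every chunk it reads without ever evicting one.

```python
import io


class LazyFileString:
    """Read-only, string-like view over a seekable binary file (hypothetical)."""

    def __init__(self, fh, chunk_size=4096):
        self.fh = fh
        self.chunk_size = chunk_size
        self.chunks = {}          # chunk index -> bytes already read
        fh.seek(0, io.SEEK_END)
        self.length = fh.tell()   # total size, found without reading the data

    def __len__(self):
        return self.length

    def _chunk(self, index):
        # Seek and read a chunk only the first time it is needed.
        if index not in self.chunks:
            self.fh.seek(index * self.chunk_size)
            self.chunks[index] = self.fh.read(self.chunk_size)
        return self.chunks[index]

    def substr(self, start, length):
        """Return up to `length` bytes beginning at byte offset `start`."""
        start = max(0, min(start, self.length))
        end = min(start + max(length, 0), self.length)
        if start >= end:
            return b""
        first = start // self.chunk_size
        last = (end - 1) // self.chunk_size
        buf = b"".join(self._chunk(i) for i in range(first, last + 1))
        offset = start - first * self.chunk_size
        return buf[offset:offset + (end - start)]


# Usage: an in-memory stand-in for a PDF file; only the tail chunk is read.
fake_file = io.BytesIO(b"%PDF-1.4\n" + b"x" * 10000 + b"\nstartxref\n1234\n%%EOF\n")
doc = LazyFileString(fake_file, chunk_size=1024)
tail = doc.substr(len(doc) - 22, 22)
```

Reading the tail first mirrors how a PDF reader locates startxref before anything else. Whether a grammar engine can be pointed at such an object instead of a real string is exactly the open question here.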

3) gzip

Has anyone worked on a zlib interface?


Posted to perl6-users by Chris Dolan, Nov 22, '08 at 3:25a


