"gbp" <gpepice1 at nycap.rr.com> wrote in message
news:398248BC.32DE3CC at nycap.rr.com...
I have experience with Perl but non with Python.
I need to write a script to read a large text file into a structured
Which is a list of 'records', one per line?
You can get the list of the lines by calling the readlines() method
on the file object. (If your file is too large to fit in memory at
once, you will have to loop line by line, but that's another issue).
If you have a lineToRecord function that takes a line and outputs
the record you desire for it, you can build your list of records
very easily (if it can all fit in memory at once, transiently):
theresult = map(lineToRecord, thefile.readlines())
while if you have to loop because it can't all fit in memory it
becomes something like:
theresult = 
line = thefile.readline()
if !line: break
or equivalent forms (often discussed because many don't like the
while 1: ... break idiom, but that's quite another issue).
In Perl one can create a list of records using something like pointers.
So really you have a list of pointers-- each one pointing to an
anonymous record. Each record is some data from a line in the file.
The pointers are, if you will, totally implicit in Python. Just
think of the list-of-records. So, the lineToRecord function is
really all you truly care about.
Since I learned data structures in languages with pointers (C, pascal)
I'm stuck. How does one go about constructing a list of records in
python? My python is still pretty weak. I understand how the lists
work (a lot like Perl). I understand that you can use objects as
records. I don't really get the full OOP though. I guess that you
can't make record anonymous in python? Is that right? Can you have
You cannot FAIL to have pointers, but the language handles them
for you, so you never really have to think about them.
If your records are just a collection of fields, and it's OK for
the fields to be identified by sequence, you will probably want
to use a tuple for each record (a list would also do, but a tuple
is clearer and more efficient if you don't need it to be
mutable). E.g., say that for each line you need to extract
its length, the 1st character, and the 4th character (and you
know all lines are at least 4 characters long). Then:
return len(line), line, line
If some lines are shorter you may want more discrimination:
l = len(line)
return l, line, line
return l, line
or if you prefer (a matter of style):
return len(line), line, line
return len(line), line
and that is also OK -- there is no need for all entries in the
list to have the same length (not even for them to be in any
way homogeneous, e.g. all tuples -- it only depends on what
proves most convenient to YOU, in terms of first preparing
the list, and later using it in some ways).
If you need more structure to your record, than just a sequence
of fields (where each field can be a scalar, None, another
nested sequence, ...), you will often choose to return a dictionary
from your lineToRecord function.
Tuples and dictionaries are anonymous.
Rarely, and only if your structuring needs are heavy, will you
NEED to return 'full-blown objects', i.e. instances of some
class of yours (which cannot be anonymous), although you may
_choose_ to do it for convenience even when a tuple or dict
would suffice (a class-instance might just be a handier way
to pack and access a dictionary, in some cases).
When I was learning Perl I found a manual page on this. Couldn't find a
example for Python. Hope that doesn't mean it can't be done.
I haven't yet found a task that can't be done in Python -- and
what's more, most can be done very elegantly & conveniently.