Hello,

I'm writing a simple FTP log parser that sums file sizes as it runs. I
have a yearTotals dictionary with year keys and the monthTotals
dictionary as its values. The monthTotals dictionary has month keys
and file size values. The script works except the results are written
for all years, rather than just one year. I'm thinking there's an
error in the way I set my dictionaries up or reference them...

    import glob, traceback

    years = ["2005", "2006", "2007"]
    months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
    # Create months dictionary to convert log values
    logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
                 "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
    # Create monthTotals dictionary with default 0 value
    monthTotals = dict.fromkeys(months, 0)
    # Nest monthTotals dictionary in yearTotals dictionary
    yearTotals = {}
    for year in years:
        yearTotals.setdefault(year, monthTotals)

    currentLogs = glob.glob("/logs/ftp/*")

    try:
        for currentLog in currentLogs:
            readLog = open(currentLog,"r")
            for line in readLog.readlines():
                if not line: continue
                if len(line) < 50: continue
                logLine = line.split()

                # The 2nd element is month, 5th is year, 8th is filesize
                # (counting from zero: indices 1, 4 and 7)

                # Lookup year/month pair value
                logMonth = logMonths[logLine[1]]
                currentYearMonth = yearTotals[logLine[4]][logMonth]

                # Update year/month value
                currentYearMonth += int(logLine[7])
                yearTotals[logLine[4]][logMonth] = currentYearMonth
    except:
        print "Failed on: " + currentLog
        traceback.print_exc()

    # Print dictionaries
    for x in yearTotals.keys():
        print "KEY",'\t',"VALUE"
        print x,'\t',yearTotals[x]
        #print " key",'\t',"value"
        for y in yearTotals[x].keys():
            print " ",y,'\t',yearTotals[x][y]


Thank you,
Ian


  • Gabriel Genellina at Apr 11, 2007 at 7:23 pm

    On Wed, 11 Apr 2007 15:57:56 -0300, IamIan <iansan at gmail.com> wrote:

    > I'm writing a simple FTP log parser that sums file sizes as it runs. I
    > have a yearTotals dictionary with year keys and the monthTotals
    > dictionary as its values. The monthTotals dictionary has month keys
    > and file size values. The script works except the results are written
    > for all years, rather than just one year. I'm thinking there's an
    > error in the way I set my dictionaries up or reference them...
    >
    > monthTotals = dict.fromkeys(months, 0)
    > # Nest monthTotals dictionary in yearTotals dictionary
    > yearTotals = {}
    > for year in years:
    >     yearTotals.setdefault(year, monthTotals)

    All your years share the *same* monthTotals object.
    This is similar to this FAQ entry:
    <http://effbot.org/pyfaq/how-do-i-create-a-multidimensional-list.htm>
    You have to create a new dict for each year; replace the above code with:

        yearTotals = {}
        for year in years:
            yearTotals[year] = dict.fromkeys(months, 0)
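A minimal sketch of the difference, written for Python 3 (the thread itself uses Python 2) and trimmed to three months for brevity: with the original loop every year maps to the very same dict object, so one update shows up under every year; with a fresh `dict.fromkeys(...)` per year the totals stay separate.

```python
# Three months only, to keep the example short.
months = ["01", "02", "03"]
years = ["2005", "2006", "2007"]

# Original setup: every year is bound to the SAME dict object.
monthTotals = dict.fromkeys(months, 0)
shared = {}
for year in years:
    shared.setdefault(year, monthTotals)

shared["2005"]["01"] += 100
print(shared["2007"]["01"])                  # 100: the "2007" entry changed too
print(shared["2005"] is shared["2007"])      # True: one object, three keys

# Suggested fix: build a new dict for each year.
separate = {}
for year in years:
    separate[year] = dict.fromkeys(months, 0)

separate["2005"]["01"] += 100
print(separate["2007"]["01"])                # 0
print(separate["2005"] is separate["2007"])  # False
```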

    --
    Gabriel Genellina
  • Terry Reedy at Apr 11, 2007 at 7:42 pm
    "IamIan" <iansan at gmail.com> wrote in message
    news:1176317876.917874.58040 at n59g2000hsh.googlegroups.com...
    > # Create monthTotals dictionary with default 0 value
    > monthTotals = dict.fromkeys(months, 0)
    > # Nest monthTotals dictionary in yearTotals dictionary
    > yearTotals = {}
    > for year in years:
    >     yearTotals.setdefault(year, monthTotals)

    Try:

        yearTotals.setdefault(year, dict.fromkeys(months, 0))

    so that you start with a separate subdict for each year instead of one
    shared by all.
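A small Python 3 sketch of why this variant works, with three months for brevity: the argument `dict.fromkeys(months, 0)` is evaluated anew on every call, so each year receives its own dict, and `setdefault` only inserts when the key is missing; otherwise it returns whatever is already stored (the freshly built default is simply discarded).

```python
months = ["01", "02", "03"]
years = ["2005", "2006", "2007"]

yearTotals = {}
for year in years:
    # The argument is evaluated on every call, so each year gets a new dict.
    yearTotals.setdefault(year, dict.fromkeys(months, 0))

yearTotals["2006"]["02"] += 5

# For a key that already exists, setdefault ignores the new default
# and returns the dict that is already stored.
existing = yearTotals.setdefault("2006", dict.fromkeys(months, 0))
print(existing is yearTotals["2006"])   # True
print(existing["02"])                   # 5
print(yearTotals["2005"]["02"])         # 0
```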

    tjr
  • 7stud at Apr 11, 2007 at 8:56 pm
    1) You have this setup:

        logMonths = {"Jan":"01", "Feb":"02", ...}
        yearTotals = {
            "2005": {"01":0, "02":0, ...},
            "2006": ...,
            "2007": ...,
        }

    Then when you get a value such as "Jan", you look up "Jan" in the
    logMonths dictionary to get "01". Then you use "01" and the year, say
    "2005", to look up the value in the yearTotals dictionary. Why do
    that? What is the point of even having the logMonths dictionary? Why
    not make "Jan" the key in the "2005" dictionary and look it up
    directly:

        yearTotals = {
            "2005": {"Jan":0, "Feb":0, ...},
            "2006": ...,
            "2007": ...,
        }

    That way you could completely eliminate the lookup in the logMonths
    dict.

    2) In this part:

        logMonth = logMonths[logLine[1]]
        currentYearMonth = yearTotals[logLine[4]][logMonth]
        # Update year/month value
        currentYearMonth += int(logLine[7])
        yearTotals[logLine[4]][logMonth] = currentYearMonth

    I'm not sure why you are using all those intermediate steps. How
    about:

        yearTotals[logLine[4]][logLine[1]] += int(logLine[7])

    To me that is a lot clearer. Or, you could do this:

        year, month, val = logLine[4], logLine[1], int(logLine[7])
        yearTotals[year][month] += val
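A Python 3 sketch combining points 1 and 2: the inner dicts are keyed by month abbreviation, so the log's month field is used directly, and each line updates its total with a single `+=`. The log format is an assumption, since the thread never shows a real line: the sample imitates the xferlog format of wu-ftpd and similar servers, where splitting on whitespace puts the month at index 1, the year at index 4 and the byte count at index 7, matching the indices used above. The host, path, size and the `addLine` helper are hypothetical. `calendar.month_abbr` follows the current locale; in the default C locale its names are the English ones the log uses.

```python
import calendar

# Hypothetical xferlog-style line (host, size and path are made up).
sample = ("Wed Apr 11 15:57:56 2007 1 ftp.example.com 52310 "
          "/pub/file.zip b _ o r user ftp 0 * c")

# month_abbr[0] is an empty string, hence the [1:].
monthNames = list(calendar.month_abbr)[1:]
yearTotals = {year: dict.fromkeys(monthNames, 0) for year in ("2005", "2006", "2007")}

def addLine(line, totals):
    """Hypothetical helper: add one line's file size to totals[year][month]."""
    fields = line.split()
    # Counting from zero: index 1 is the month, 4 the year, 7 the size.
    year, month, val = fields[4], fields[1], int(fields[7])
    totals[year][month] += val

addLine(sample, yearTotals)
addLine(sample, yearTotals)
print(yearTotals["2007"]["Apr"])   # 104620
print(yearTotals["2006"]["Apr"])   # 0
```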

    3)
    > I'm thinking there's an error in the way
    > I set my dictionaries up or reference them

    Yep. It's right here:

        for year in years:
            yearTotals.setdefault(year, monthTotals)

    Every year refers to the same monthTotals dict. You can use a dict's
    copy() method to make a copy:

        monthTotals.copy()
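A short Python 3 sketch of what `copy()` gives you: a new outer dict whose values are the same objects as before. That is enough here because the month totals are integers, which `+=` rebinds in the copy rather than mutating in place; if the values were themselves dicts or lists, `copy.deepcopy` from the standard `copy` module would be needed.

```python
import copy

monthTotals = {"Jan": 0, "Feb": 0}

# Shallow copies are distinct dicts, so integer updates stay independent.
a = monthTotals.copy()
b = monthTotals.copy()
a["Jan"] += 10
print(a["Jan"], b["Jan"], monthTotals["Jan"])   # 10 0 0

# With a nested mutable value, a shallow copy still shares the inner dict.
template = {"2005": {"Jan": 0}}
shallow = template.copy()
deep = copy.deepcopy(template)
shallow["2005"]["Jan"] += 1
print(template["2005"]["Jan"])   # 1 (shared with the shallow copy)
print(deep["2005"]["Jan"])       # 0 (deepcopy built its own inner dict)
```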

    Here is a reworking of your code that also eliminates a lot of typing:

        import calendar, pprint

        years = ["200%s" % x for x in range(5, 8)]
        print years

        months = list(calendar.month_abbr)
        print months

        monthTotals = dict.fromkeys(months[1:], 0)
        print monthTotals

        yearTotals = {}
        for year in years:
            yearTotals.setdefault(year, monthTotals.copy())
        pprint.pprint(yearTotals)

        logs = [
            ["", "Feb", "", "", "2007", "", "", "12"],
            ["", "Jan", "", "", "2005", "", "", "3"],
            ["", "Jan", "", "", "2005", "", "", "7"],
        ]

        for logLine in logs:
            year, month, val = logLine[4], logLine[1], int(logLine[7])
            yearTotals[year][month] += val

        for x in yearTotals.keys():
            print "KEY", "\t", "VALUE"
            print x, "\t", yearTotals[x]
            for y in yearTotals[x].keys():
                print " ", y, "\t", yearTotals[x][y]
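In the Python 2 of this thread, dicts keep no useful key order, so the printing loop above may list years and months in an arbitrary sequence (Python 3.7+ dicts keep insertion order, which is still not necessarily calendar order). A possible variation, sketched in Python 3 with made-up totals and a hypothetical `report` helper, sorts the years and walks the months in calendar order:

```python
import calendar

monthNames = list(calendar.month_abbr)[1:]   # "Jan".."Dec" in the default C locale

# Made-up totals, inserted out of order on purpose.
yearTotals = {y: dict.fromkeys(monthNames, 0) for y in ("2007", "2005", "2006")}
yearTotals["2005"]["Jan"] = 10
yearTotals["2007"]["Feb"] = 12

def report(totals):
    """Hypothetical helper: report lines, years ascending, months in calendar order."""
    lines = []
    for year in sorted(totals):
        lines.append(year)
        for month in monthNames:
            lines.append("  %s\t%d" % (month, totals[year][month]))
    return lines

for line in report(yearTotals):
    print(line)
```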
  • Bruno Desthuilliers at Apr 11, 2007 at 8:57 pm

    IamIan wrote:
    > Hello,
    >
    > I'm writing a simple FTP log parser that sums file sizes as it runs. I
    > have a yearTotals dictionary with year keys and the monthTotals
    > dictionary as its values. The monthTotals dictionary has month keys
    > and file size values. The script works except the results are written
    > for all years, rather than just one year. I'm thinking there's an
    > error in the way I set my dictionaries up or reference them...
    >
    > import glob, traceback
    >
    > years = ["2005", "2006", "2007"]
    > months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
    > # Create months dictionary to convert log values
    > logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
    >              "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}

    DRY (Don't Repeat Yourself) violation alert!

        logMonths = {
            "Jan":"01",
            "Feb":"02",
            "Mar":"03",
            "Apr":"04",
            "May":"05",
            #etc
            }

        months = sorted(logMonths.values())
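Deriving `months` with `sorted(logMonths.values())` only yields calendar order because the values are zero-padded: strings compare character by character, so without the padding "10" would sort before "2". A quick Python 3 check:

```python
padded = ["01", "02", "10", "11", "12", "09"]
unpadded = ["1", "2", "10", "11", "12", "9"]

print(sorted(padded))             # ['01', '02', '09', '10', '11', '12']
print(sorted(unpadded))           # ['1', '10', '11', '12', '2', '9']
# Sorting by numeric value works whether or not the strings are padded.
print(sorted(unpadded, key=int))  # ['1', '2', '9', '10', '11', '12']
```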

    > # Create monthTotals dictionary with default 0 value
    > monthTotals = dict.fromkeys(months, 0)
    > # Nest monthTotals dictionary in yearTotals dictionary
    > yearTotals = {}
    > for year in years:
    >     yearTotals.setdefault(year, monthTotals)

    A complicated way to write:

        yearTotals = dict((year, monthTotals) for year in years)

    And without even reading further, I can tell you have a problem here:
    every 'year' entry in yearTotals points to *the same* monthTotals dict
    instance. So when updating yearTotals['2007'], you see the change
    reflected for all years. The cure is simple: forget the monthTotals
    object, and define your yearTotals dict this way:

        yearTotals = dict((year, dict.fromkeys(months, 0)) for year in years)

    NB: for Python versions before 2.4, you need a list comprehension
    instead of a generator expression, i.e.:

        yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])

    HTH
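A short Python 3 check of both spellings above, alongside the dict comprehension that Python 2.7 and 3.x later added (not available in the 2.3/2.4 versions discussed here). All three build a separate inner dict for each year; months are trimmed to three for brevity.

```python
months = ["01", "02", "03"]
years = ["2005", "2006", "2007"]

viaGenerator = dict((year, dict.fromkeys(months, 0)) for year in years)   # Python 2.4+
viaListComp = dict([(year, dict.fromkeys(months, 0)) for year in years])  # older versions too
viaDictComp = {year: dict.fromkeys(months, 0) for year in years}         # Python 2.7+ / 3

print(viaGenerator == viaListComp == viaDictComp)          # True
# Each year owns its own inner dict.
print(len({id(inner) for inner in viaDictComp.values()}))  # 3
```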
  • IamIan at Apr 11, 2007 at 11:20 pm
    Thank you everyone for the helpful replies. Some of the solutions were
    new to me, but the script now runs successfully. I'm still learning to
    ride the snake but I love this language!

    Ian
  • 7stud at Apr 12, 2007 at 1:01 am

    On Apr 11, 2:57 pm, Bruno Desthuilliers wrote:
    > IamIan wrote:
    >
    > yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])
    >
    > HTH

    List comprehensions without a list? What? Where? How?
  • 7stud at Apr 12, 2007 at 1:28 am

    On Apr 11, 7:01 pm, "7stud" wrote:
    > List comprehensions without a list? What? Where? How?

    Ooops. I copied the wrong one. I was looking at this one:

        yearTotals = dict((year, monthTotals) for year in years)
  • 7stud at Apr 12, 2007 at 2:00 am

    On Apr 11, 7:28 pm, "7stud" wrote:
    > Ooops. I copied the wrong one. I was looking at this one:
    >
    >     yearTotals = dict((year, monthTotals) for year in years)

    Never mind. I found this PEP:

    http://www.python.org/dev/peps/pep-0289/
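PEP 289 resolves the "list comprehension without a list" puzzle: a generator expression that is the only argument of a call needs no extra parentheses, and `dict()` accepts any iterable of key/value pairs. A minimal Python 3 illustration:

```python
years = ["2005", "2006", "2007"]

# A generator expression that is the sole argument needs no extra parentheses.
total = sum(len(y) for y in years)
offsets = dict((y, int(y) - 2000) for y in years)

# With another argument present, PEP 289 requires the parentheses.
newestFirst = sorted((int(y) for y in years), reverse=True)

print(total)         # 12
print(offsets)       # {'2005': 5, '2006': 6, '2007': 7}
print(newestFirst)   # [2007, 2006, 2005]
```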
  • 7stud at Apr 11, 2007 at 9:08 pm

    1) You have this setup:

        logMonths = {"Jan":"01", "Feb":"02", ...}
        yearTotals = {
            "2005": {"01":0, "02":0, ...},
            "2006": ...,
            "2007": ...,
        }

    Then when you get a result such as "Jan", you look up "Jan" in the
    logMonths dictionary to get "01". Then you use "01" and the year, say
    "2005", to look up the value in the yearTotals dictionary. What is
    the point of even having the logMonths dictionary? Why not make "Jan"
    the key in the "2005" dictionary and look it up directly:

        yearTotals = {
            "2005": {"Jan":0, "Feb":0, ...},
            "2006": ...,
            "2007": ...,
        }

    That way you could completely eliminate the lookup in the logMonths
    dict.

    2) In this part:

        logMonth = logMonths[logLine[1]]
        currentYearMonth = yearTotals[logLine[4]][logMonth]
        # Update year/month value
        currentYearMonth += int(logLine[7])
        yearTotals[logLine[4]][logMonth] = currentYearMonth

    I'm not sure why you are using all those intermediate steps. How
    about:

        yearTotals[logLine[4]][logLine[1]] += int(logLine[7])

    To me that is a lot clearer. Or, you could do this:

        year, month, val = logLine[4], logLine[1], int(logLine[7])
        yearTotals[year][month] += val

    3)
    > I'm thinking there's an error in the way
    > I set my dictionaries up or reference them

    Yep. It's right here:

        for year in years:
            yearTotals.setdefault(year, monthTotals)

    Every year refers to the same monthTotals dict. You can use a dict's
    copy() method to make a copy:

        monthTotals.copy()

    Here is a reworking of your code that also eliminates a lot of typing:

        import calendar, pprint

        years = ["200%s" % x for x in range(5, 8)]
        print years

        months = list(calendar.month_abbr)
        print months

        monthTotals = dict.fromkeys(months[1:], 0)
        print monthTotals

        yearTotals = {}
        for year in years:
            yearTotals.setdefault(year, monthTotals.copy())
        pprint.pprint(yearTotals)

        logs = [
            ["", "Feb", "", "", "2007", "", "", "12"],
            ["", "Jan", "", "", "2005", "", "", "3"],
            ["", "Jan", "", "", "2005", "", "", "7"],
        ]

        for logLine in logs:
            year, month, val = logLine[4], logLine[1], int(logLine[7])
            yearTotals[year][month] += val

        for x in yearTotals.keys():
            print "KEY", "\t", "VALUE"
            print x, "\t", yearTotals[x]
            for y in yearTotals[x].keys():
                print " ", y, "\t", yearTotals[x][y]
  • IamIan at Apr 18, 2007 at 7:16 pm
    I am using the suggested approach to make a years list:

        years = ["199%s" % x for x in range(0,10)]
        years += ["200%s" % x for x in range(0,10)]

    I haven't had any luck doing this in one line though. Is it possible?

    Thanks.
  • Steven W. Orr at Apr 18, 2007 at 7:34 pm
    On Wednesday, Apr 18th 2007 at 12:16 -0700, quoth IamIan:

    =>I am using the suggested approach to make a years list:
    =>
    =>years = ["199%s" % x for x in range(0,10)]
    =>years += ["200%s" % x for x in range(0,10)]
    =>
    =>I haven't had any luck doing this in one line though. Is it possible?

    I'm so green that I almost get a chubby at being able to answer something.
    ;-)

        years = [str(1990+x) for x in range(0,20)]

    Yes?

    --
    Time flies like the wind. Fruit flies like a banana. Stranger things have .0.
    happened but none stranger than this. Does your driver's license say Organ ..0
    Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
    individuals! What if this weren't a hypothetical question?
    steveo at syslang.net
  • Marc 'BlackJack' Rintsch at Apr 18, 2007 at 7:37 pm

    In <1176923772.011976.116190 at p77g2000hsh.googlegroups.com>, IamIan wrote:

    > years = ["199%s" % x for x in range(0,10)]
    > years += ["200%s" % x for x in range(0,10)]
    >
    > I haven't had any luck doing this in one line though. Is it possible?

        In [48]: years = map(str, xrange(1999, 2011))

        In [49]: years
        Out[49]:
        ['1999',
         '2000',
         '2001',
         '2002',
         '2003',
         '2004',
         '2005',
         '2006',
         '2007',
         '2008',
         '2009',
         '2010']

    Ciao,
    Marc 'BlackJack' Rintsch
  • Steven D'Aprano at Apr 19, 2007 at 6:57 am

    On Wed, 18 Apr 2007 12:16:12 -0700, IamIan wrote:

    > I am using the suggested approach to make a years list:
    >
    > years = ["199%s" % x for x in range(0,10)]
    > years += ["200%s" % x for x in range(0,10)]
    >
    > I haven't had any luck doing this in one line though. Is it possible?

        years = ["199%s" % x for x in range(0,10)] + \
                ["200%s" % x for x in range(0,10)]

    Sorry for the line continuation; my news reader insists on breaking the
    line. In your editor, just delete the "\" and line break to make it a
    single line.

    If you don't like that solution, here's a better one:

        years = [str(1990 + n) for n in range(20)]

    Or there's this:

        years = [str(n) for n in range(1990, 2010)]

    Or this one:

        years = map(str, range(1990, 2010))
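A quick Python 3 check that the one-liners suggested in this thread build the same list as the two-line version. In Python 3, `map` returns a lazy iterator, so it is wrapped in `list()` here; the `xrange` spelling used elsewhere in the thread exists only in Python 2.

```python
# The original two-step version.
twoLines = ["199%s" % x for x in range(0, 10)]
twoLines += ["200%s" % x for x in range(0, 10)]

candidates = [
    ["199%s" % x for x in range(0, 10)] + ["200%s" % x for x in range(0, 10)],
    [str(1990 + n) for n in range(20)],
    [str(n) for n in range(1990, 2010)],
    list(map(str, range(1990, 2010))),   # map() is lazy in Python 3
]

for years in candidates:
    assert years == twoLines

print(twoLines[0], twoLines[-1], len(twoLines))   # 1990 2009 20
```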


    --
    Steven.
  • IamIan at Apr 19, 2007 at 11:35 pm
    Thank you again for the great suggestions. I have one final question
    about creating an httpMonths dictionary like {'Jan':'01', 'Feb':'02',
    etc} with a minimal amount of typing. My code follows (using Python
    2.3.4):

        import calendar

        # Create years list, formatting as strings
        years = map(str, xrange(1990,2051))

        # Create months list with three letter abbreviations
        months = list(calendar.month_abbr)

        # Create monthTotals dictionary with default value of zero
        monthTotals = dict.fromkeys(months[1:],0)

        # Create yearTotals dictionary with years for keys
        # and copies of the monthTotals dictionary for values
        yearTotals = dict([(year, monthTotals.copy()) for year in years])

        # Create httpMonths dictionary to map month abbreviations
        # to Apache numeric month representations
        httpMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
                      "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}

    It is this last step I'm referring to. I got close with:

        httpMonths = {}
        for month in months[1:]:
            httpMonths[month] = str(len(httpMonths)+1)

    but the month numbers are missing the leading zero for 01-09. Thanks!

    Ian
  • Rzed at Apr 21, 2007 at 1:31 pm
    IamIan <iansan at gmail.com> wrote in
    news:1177025742.252000.231430 at e65g2000hsc.googlegroups.com:
    > # Create httpMonths dictionary to map month abbreviations
    > # to Apache numeric month representations
    > httpMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
    >               "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
    >
    > It is this last step I'm referring to. I got close with:
    > httpMonths = {}
    > for month in months[1:]:
    >     httpMonths[month] = str(len(httpMonths)+1)
    >
    > but the month numbers are missing the leading zero for 01-09.
    > Thanks!

    Maybe something like:

        httpMonths = dict((k,"%02d" % (x+1))
                          for x,k in enumerate(months[1:]) )
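A Python 3 sketch of the zero-padding idea: the `%02d` format pads an integer to two digits with leading zeros. The `start` argument of `enumerate` needs Python 2.6 or later, and the generator expression form needs 2.4 or later (PEP 289); on the 2.3.4 mentioned in the question, the list-comprehension spelling inside `dict()` would be the one to use.

```python
import calendar

monthNames = list(calendar.month_abbr)[1:]   # "Jan".."Dec" in the default C locale

# "%02d" pads to two digits: 1 -> "01", 12 -> "12".
httpMonths = dict((name, "%02d" % number)
                  for number, name in enumerate(monthNames, 1))

# List-comprehension spelling, for versions without generator expressions.
httpMonthsOld = dict([(name, "%02d" % (i + 1)) for i, name in enumerate(monthNames)])

print(httpMonths["Jan"], httpMonths["Sep"], httpMonths["Dec"])   # 01 09 12
print(httpMonths == httpMonthsOld)                                # True
```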

    --
    rzed
  • Bruno Desthuilliers at Apr 21, 2007 at 7:49 pm

    IamIan wrote:
    > I am using the suggested approach to make a years list:
    >
    > years = ["199%s" % x for x in range(0,10)]
    > years += ["200%s" % x for x in range(0,10)]
    >
    > I haven't had any luck doing this in one line though. Is it possible?

        # Quick, dirty and pretty obvious
        years = ["199%s" % x for x in range(0,10)] + ["200%s" % x for x in range(0,10)]

        # hardly more involved, and quite a bit more generic
        years = ["%s%s" % (c, y) for c in ("199", "200") for y in range(10)]
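In a comprehension with two `for` clauses, the leftmost clause is the outer loop, so the decade prefix changes slowest and the result comes out in chronological order. For the 1990 to 2009 range asked about, the prefixes are "199" and "200". A Python 3 check against explicit loops and the plain numeric version:

```python
# Outer loop: the first "for" clause; inner loop: the second.
years = ["%s%s" % (c, y) for c in ("199", "200") for y in range(10)]

# The same nesting written out as explicit loops.
expanded = []
for c in ("199", "200"):
    for y in range(10):
        expanded.append("%s%s" % (c, y))

print(years == expanded)                              # True
print(years == [str(n) for n in range(1990, 2010)])   # True
print(years[:3])                                      # ['1990', '1991', '1992']
```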

Discussion Overview
group: python-list @
categories: python
posted: Apr 11, '07 at 6:57p
active: Apr 21, '07 at 7:49p
posts: 17
users: 9
website: python.org
