Hi

I've been trying to decode a series of observations from multiple files
(each file is a different time) and put each type of observation into its
own separate file. The script runs successfully for one file, but whenever I
try it with more than one, the outputs just overwrite each other. I'm new to
Python and I'm not sure how to run the process once per input file and
append the results to the output files. Has anyone done something similar
before?



If it helps, I'll also attach a sample of one of the input files


#!/usr/bin/python

import sys
import os
import re
import fileinput

#load in file list
#obs = os.system('ls s[i,m,n]uk[0,2,4][1,2,3]d_??00P.DATA')
obs = ['siuk21d_0300P.DATA', 'siuk21d_0900P.DATA']
print obs
#code for file type "datalist"
#fname = "datalist_201081813.txt"


#output files
foutname1 = 'prestest.txt'
foutname2 = 'temptest.txt'
foutname3 = 'tempdtest.txt'
foutname4 = 'wspeedtest.txt'
foutname5 = 'winddtest.txt'


#prepare times
time=[]
year="2009"
month="09"
day="18"
hour=[]

#outputs
pres_out = ''
temp_out = ''
dtemp_out = ''
dir_out = ''
speed_out = ''
x =''


#load in station file with lat/lons
file2 = open("uk_stations.txt","r")
stations = file2.readlines()
ids=[]
names=[]
lats=[]
lons=[]
for item in stations:
item_list = item.strip().split(',')
ids.append(item_list[0])
names.append(item_list[1])
lats.append(item_list[2])
lons.append(item_list[3])

#create loop over file list
time= [item.split('_')[1].split('.')[0] for item in obs]
print time
for x in time:
hour= x[:2]
print hour
newtime = year+month+day+'_'+hour+'00'
print newtime
for file in fileinput.input(obs):
data=file[:file.find(' 333 ')]
#data=st[split:]
print data
elements=data.split(' ')
print elements
station_id = elements[0]
try:
index = ids.index(station_id)
lat = lats[index]
lon = lons[index]
message_type = 'ADPSFC'
except:
print 'Station ID',station_id,'not in list!'
lat = lon = 'NaN'
message_type = 'Bad_station_id'
try:
temp = [item for item in elements if item.startswith('1')][0]
temperature = float(temp[2:])/10
sign = temp[1]
if sign == '1':
temperature=-temperature
except:
temperature='NaN'

try:
dtemp = [item for item in elements if item.startswith('2')][0]
dtemperature = float(dtemp[2:])/10
sign = dtemp[1]
if sign == '1':
dtemperature=-dtemperature
except:
dtemperature='NaN'
try:
press = [item for item in elements[2:] if item.startswith('4')][0]
if press[1]=='9':
pressure = float(press[1:])/10
else:
pressure = float(press[1:])/10+1000
except:
pressure = 'NaN'

try:
wind = elements[elements.index(temp)-1]
direction = float(wind[1:3])*10
speed = float(wind[3:])*0.514444444
except:
direction=speed='NaN'



newline =
message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+ "002"
+c+"-9999"+c+"-9999"+c+str(pressure)+c
pres_out+=newline+'\n'


newline2 =
message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+ "011"
+c+"-9999"+c+"-9999"+c+str(temperature)+c
print newline2
temp_out+=newline2+'\n'
fout = open(foutname2,'w')
fout.writelines(temp_out)
fout.close()




newline3 =
message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+ "017"
+c+"-9999"+c+"-9999"+c+str(dtemperature)+c
print newline3
dtemp_out+=newline3+'\n'
fout = open(foutname3,'w')
fout.writelines(dtemp_out)
fout.close()


newline4 =
message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+ "031"
+c+"-9999"+c+"-9999"+c+str(direction)+c
print newline4
dir_out+=newline4+'\n'
fout = open(foutname4,'w')
fout.writelines(dir_out)
fout.close()


newline5 =
message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+
"032"+c+"-9999"+c+"-9999"+c+str(speed)+c
print newline5
speed_out+=newline5+'\n'


fout = open(foutname1,'w')
fout.writelines(pres_out)
fout.close()
fout = open(foutname2,'w')
fout.writelines(temp_out)
fout.close()
fout = open(foutname3,'w')
fout.writelines(dtemp_out)
fout.close()
fout = open(foutname4,'w')
fout.writelines(dir_out)
fout.close()
fout = open(foutname5,'w')
fout.writelines(speed_out)
fout.close()










cheers

Chris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: siuk21d_0300P.DATA
Type: application/octet-stream
Size: 3298 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20101014/e955f299/attachment.obj>


  • John Posner at Oct 14, 2010 at 2:16 pm

    On 10/14/2010 6:08 AM, Christopher Steele wrote:
    Hi

    I've been trying to decode a series of observations from multiple files
    (each file is a different time) and put each type of observation into
    their own separate file. The script runs successfully for one file but
    whenever I try it for more they just overwrite each other.

    fileinput.input() iterates over *lines*, not entire *files*. So take a
    look at this location in the code:

    for file in fileinput.input(obs):
        data=file[:file.find(' 333 ')]

    Did you mean your iteration variable to be "file", implying that it will
    hold an entire file of input data?

    If you meant the iteration variable to be named "textline" instead of
    "file", is it guaranteed that string ' 333 ' will occur in every such
    text line?


    -John
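
To make the point about fileinput concrete, here is a minimal sketch (written for Python 3, unlike the Python 2 scripts in this thread). It shows that the loop variable holds one line at a time and that the reader's filename() method reports which file that line came from. The helper name lines_with_source and the sample file contents are made up for illustration.

```python
import fileinput
import os
import tempfile


def lines_with_source(paths):
    """Return (file name, line) pairs for every line in every file.

    Hypothetical helper: fileinput.input() yields lines, so the file a
    line belongs to has to be asked for with filename().
    """
    pairs = []
    with fileinput.input(files=paths) as reader:
        for line in reader:
            name = os.path.basename(reader.filename())
            pairs.append((name, line.rstrip("\n")))
    return pairs


def demo():
    """Write two tiny made-up input files and read them back."""
    samples = [("siuk21d_0300P.DATA", "03002 10123\n03005 10087\n"),
               ("siuk21d_0900P.DATA", "03002 10140\n")]
    with tempfile.TemporaryDirectory() as folder:
        paths = []
        for name, text in samples:
            path = os.path.join(folder, name)
            with open(path, "w") as handle:
                handle.write(text)
            paths.append(path)
        return lines_with_source(paths)


if __name__ == "__main__":
    for name, line in demo():
        print(name, line)
```

Knowing the file name for each line means the time can be taken from that name as the line is decoded, instead of being worked out in a separate loop beforehand.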
  • Christopher Steele at Oct 14, 2010 at 2:44 pm
    The issue is that I need to be able to both split the names of the files so
    that I can extract the relevant times, and open each individual file and
    process each line individually. Once I have achieved this I need to append
    the sorted files onto one another in one long file so that I can pass them
    into a verification package. I've tried changing the name to textline and I
    get the same result - the sorted files overwrite one another.
    The data are actually meteorological observations and I need to manipulate
    them in order to test the performance of a model. The 333 denotes that cloud
    observations are going to follow - something that is not always reported at
    stations.

    I hope this has helped

    Chris

  • John Posner at Oct 14, 2010 at 7:15 pm

    On 10/14/2010 10:44 AM, Christopher Steele wrote:
    The issue is that I need to be able to both, split the names of the
    files so that I can extract the relevant times, and open each
    individual file and process each line individually. Once I have
    achieved this I need to append the sorted files onto one another in
    one long file so that I can pass them into a verification package.
    I've tried changing the name to textline and I get the same result

    I'm very happy to hear that changing the name of a variable did not
    affect the way the program works! Anything else would be worrisome.

    - the sorted files overwrite one another.

    Variable *time* names a list, with one member for each input file. But
    variable *newtime* names a scalar value, not a list. That looks like a
    problem to me. Either of the following changes might help:

    Original:

    for x in time:
        hour= x[:2]
        print hour
        newtime = year+month+day+'_'+hour+'00'

    Alternative #1:

    newtime = []
    for x in time:
        hour= x[:2]
        print hour
        newtime.append(year+month+day+'_'+hour+'00')

    Alternative #2:

    newtime = [year + month + day + '_' + x[:2] + '00' for x in time]


    HTH,
    John
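
One way to avoid keeping a list of times next to a list of files is to compute the timestamp from each file name at the moment that file is processed. Below is a small Python 3 sketch, assuming file names shaped like siuk21d_0300P.DATA and a date that is supplied separately, as in the original script. The name timestamp_for is hypothetical.

```python
def timestamp_for(fname, year, month, day):
    """Build 'YYYYMMDD_HH00' from a name like 'siuk21d_0300P.DATA'."""
    time_part = fname.split('_')[1].split('.')[0]   # e.g. '0300P'
    hour = time_part[:2]                            # e.g. '03'
    return year + month + day + '_' + hour + '00'


obs = ['siuk21d_0300P.DATA', 'siuk21d_0900P.DATA']
# One timestamp string per file, looked up by name when that file is decoded.
newtime_by_file = {name: timestamp_for(name, '2009', '09', '18') for name in obs}
```

Each lookup returns a plain string, so it can be concatenated into an output record directly.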
  • Christopher Steele at Oct 15, 2010 at 10:59 am
    Thanks,

    The issue with the times is now sorted, however I'm running into a problem
    towards the end of the script:

    File "sortoutsynop2.py", line 131, in <module>
    newline =
    message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+ "002"
    +c+"-9999"+c+"-9999"+c+str(pressure)+c
    TypeError: cannot concatenate 'str' and 'list' objects


    I think I can see the issue here, but I'm not entirely sure how to get
    around it. Several of my variables change either from one file to the next
    or from one line to the next. Time and pressure would be examples of these
    two types. Yet others, such as message_type, are constant. As a result I
    have a mixture of lists and strings. Should I then create a list of the
    constant values? I'm a little confused. I'll send you the script that works
    for a single file and I'll see if I can come up with a more logical way
    around it.

    #!/usr/bin/python

    import sys
    import os
    import re

    #foutname = 'test.txt'
    #filelist = os.system('ls
    fname = "datalist_201081813.txt"
    foutname1 = 'prestest.txt'
    foutname2 = 'temptest.txt'
    foutname3 = 'tempdtest.txt'
    foutname4 = 'wspeedtest.txt'
    foutname5 = 'winddtest.txt'

    time = fname.split('_')[1].split('.')[0]
    year = time[:4]
    month = time[4:6]
    day = time[6:8]
    hour = time[-2:]

    newtime = year+month+day+'_'+hour+'0000'
    c = ','
    file1 = open(fname,"r")


    file2 = open("uk_stations.txt","r")
    stations = file2.readlines()
    ids=[]
    names=[]
    lats=[]
    lons=[]
    for item in stations:
    item_list = item.strip().split(',')
    ids.append(item_list[0])
    names.append(item_list[1])
    lats.append(item_list[2])
    lons.append(item_list[3])


    st = file1.readlines()
    print st
    data=[item[:item.find(' 333 ')] for item in st]
    #data=st[split:]
    print data

    pres_out = ''
    temp_out = ''
    dtemp_out = ''
    dir_out = ''
    speed_out = ''

    for line in data:
    elements=line.split(' ')
    station_id = elements[0]
    try:
    index = ids.index(station_id)
    lat = lats[index]
    lon = lons[index]
    message_type = 'blah'
    except:
    print 'Station ID',station_id,'not in list!'
    lat = lon = 'NaN'
    message_type = 'Bad_station_id'

    try:
    temp = [item for item in elements if item.startswith('1')][0]
    temperature = float(temp[2:])/10
    sign = temp[1]
    if sign == '1':
    temperature=-temperature
    except:
    temperature='NaN'

    try:
    dtemp = [item for item in elements if item.startswith('2')][0]
    dtemperature = float(dtemp[2:])/10
    sign = dtemp[1]
    if sign == '1':
    dtemperature=-dtemperature
    except:
    dtemperature='NaN'
    try:
    press = [item for item in elements[2:] if item.startswith('4')][0]
    if press[1]=='9':
    pressure = float(press[1:])/10
    else:
    pressure = float(press[1:])/10+1000
    except:
    pressure = 'NaN'

    try:
    wind = elements[elements.index(temp)-1]
    direction = float(wind[1:3])*10
    speed = float(wind[3:])*0.514444444
    except:
    direction=speed='NaN'



    newline =
    message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+'-9999'+c+'002'+c+'-9999'+c+'-9999'+c+str(pressure)+c
    print newline
    pres_out+=newline+'\n'


    newline2 =
    message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+ "011"
    +c+"-9999"+c+"-9999"+c+str(temperature)+c
    print newline2
    temp_out+=newline2+'\n'
    fout = open(foutname2,'w')
    fout.writelines(temp_out)
    fout.close()




    newline3 =
    message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+ "017"
    +c+"-9999"+c+"-9999"+c+str(dtemperature)+c
    print newline3
    dtemp_out+=newline3+'\n'
    fout = open(foutname3,'w')
    fout.writelines(dtemp_out)
    fout.close()


    newline4 =
    message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+ "031"
    +c+"-9999"+c+"-9999"+c+str(direction)+c
    print newline4
    dir_out+=newline4+'\n'
    fout = open(foutname4,'w')
    fout.writelines(dir_out)
    fout.close()


    newline5 =
    message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+
    "032"+c+"-9999"+c+"-9999"+c+str(speed)+c
    print newline5
    speed_out+=newline5+'\n'


    fout = open(foutname1,'w')
    fout.writelines(pres_out)
    fout.close()
    fout = open(foutname2,'w')
    fout.writelines(temp_out)
    fout.close()
    fout = open(foutname3,'w')
    fout.writelines(dtemp_out)
    fout.close()
    fout = open(foutname4,'w')
    fout.writelines(dir_out)
    fout.close()
    fout = open(foutname5,'w')
    fout.writelines(speed_out)
    fout.close()


    cheers

    Chris











  • John Posner at Oct 15, 2010 at 3:26 pm

    On 10/15/2010 6:59 AM, Christopher Steele wrote:
    Thanks,

    The issue with the times is now sorted, however I'm running into a
    problem towards the end of the script:

    File "sortoutsynop2.py", line 131, in <module>
    newline =
    message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+c+"-9999"+c+
    "002" +c+"-9999"+c+"-9999"+c+str(pressure)+c
    TypeError: cannot concatenate 'str' and 'list' objects


    I think I can see the issue here, but I'm not entirely sure how to get
    around it. Several of my variables change either from one file to the
    next or from each line. Time and pressure would be examples of both of
    these types. Yet others, such as message_type, are constant. As a
    result I have a mixture of both lists and strings. Should I then
    create a list of the constant values?

    I suggest maintaining a list for each such variable, in order to keep
    your code simpler. It won't matter that some lists contain the same
    value over and over and over.

    (There's a slight possibility it would matter if you're dealing with
    massive amounts of data. But that's the kind of problem that you don't
    need to solve until you encounter it.)

    Some more notes below, interspersed with your code ...

    I'm a little confused, I'll send you the script that works for a
    single file

    Yes! That's a much better approach: figure out how to handle one file,
    place the code inside a function that takes the filename as an argument,
    and call the function on each file in turn.

    and I'll see if I can come up with a more logical way around it.
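
The one-function-per-file approach also addresses the original overwriting problem if every output file is opened once, before the loop over inputs, rather than reopened in 'w' mode for each input. A rough Python 3 sketch follows. Here decode_line is a deliberately trivial stand-in for the real decoding, and all function names and the record layout are assumptions made for illustration.

```python
def decode_line(line, timestamp):
    """Stand-in decoder: returns {observation type: record} for one line."""
    fields = line.split()
    station_id = fields[0]
    return {'pres': ','.join([station_id, timestamp, '002']),
            'temp': ','.join([station_id, timestamp, '011'])}


def process_file(path, timestamp, outputs):
    """Decode one input file and write its records to the open outputs."""
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue                      # skip blank lines
            for kind, record in decode_line(line, timestamp).items():
                outputs[kind].write(record + '\n')


def process_all(inputs, out_names):
    """inputs: list of (path, timestamp); out_names: {kind: output path}."""
    # Each output is opened exactly once, so later inputs add to it
    # instead of replacing what earlier inputs wrote.
    outputs = {kind: open(name, 'w') for kind, name in out_names.items()}
    try:
        for path, timestamp in inputs:
            process_file(path, timestamp, outputs)
    finally:
        for handle in outputs.values():
            handle.close()
```

Opening the outputs in 'a' (append) mode inside process_file would also work, at the cost of having to delete old output files before each run.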

    #!/usr/bin/python

    import sys
    import os
    import re

    #foutname = 'test.txt'
    #filelist = os.system('ls
    fname = "datalist_201081813.txt"

    There's a digit missing from the above filename.

    foutname1 = 'prestest.txt'
    foutname2 = 'temptest.txt'
    foutname3 = 'tempdtest.txt'
    foutname4 = 'wspeedtest.txt'
    foutname5 = 'winddtest.txt'

    time = fname.split('_')[1].split('.')[0]
    year = time[:4]
    month = time[4:6]
    day = time[6:8]
    hour = time[-2:]

    newtime = year+month+day+'_'+hour+'0000'
    c = ','
    file1 = open(fname,"r")


    file2 = open("uk_stations.txt","r")
    stations = file2.readlines()
    ids=[]
    names=[]
    lats=[]
    lons=[]
    for item in stations:
    item_list = item.strip().split(',')
    ids.append(item_list[0])
    names.append(item_list[1])
    lats.append(item_list[2])
    lons.append(item_list[3])


    st = file1.readlines()
    print st
    data=[item[:item.find(' 333 ')] for item in st]

    I still think there's a problem in the above statement. In the data file
    you provided in a previous message, some lines lack the ' 333 '
    substring. In such lines, the find() method will return -1, which (I
    think) is not what you want. Ex:

    >>> item = '11111 22222 333 44444'
    >>> item[:item.find(' 333 ')]
    '11111 22222'
    >>> item = '11111 22222 44444'
    >>> item[:item.find(' 333 ')]
    '11111 22222 4444'

    Note that the last digit, "4", gets dropped. I *think* you want
    something like this:

    data = []
    for item in st:
        posn = item.find(' 333 ')
        if posn != -1:
            data.append(item[:posn])
        else:
            data.append(...some other value...)
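
If the intended "other value" is simply the whole line, str.partition gives the same result without the -1 pitfall: it returns the text before the marker, or the entire string when the marker is absent. That choice of fallback is an assumption, since the thread does not say what should happen to lines without ' 333 '. A Python 3 sketch, with before_marker as a hypothetical name:

```python
def before_marker(line, marker=' 333 '):
    """Return the part of line before marker, or all of line if absent."""
    head, _sep, _tail = line.partition(marker)
    return head


data = [before_marker(item) for item in ['11111 22222 333 44444',
                                         '11111 22222 44444']]
```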

    #data=st[split:]
    print data

    pres_out = ''
    temp_out = ''
    dtemp_out = ''
    dir_out = ''
    speed_out = ''

    for line in data:
    elements=line.split(' ')

    Do you really want to specify a SPACE character argument to split()?

    >>> 'aaa bbb    ccc'.split(' ')
    ['aaa', 'bbb', '', '', '', 'ccc']
    >>> 'aaa bbb    ccc'.split()
    ['aaa', 'bbb', 'ccc']

    station_id = elements[0]
    try:
    index = ids.index(station_id)
    lat = lats[index]
    lon = lons[index]
    message_type = 'blah'
    except:

    It's bad form to use a "bare except", which defines a code block to be
    executed if *anything* goes wrong. You should specify what you're
    expecting to go wrong. Here, list.index() raises ValueError when the
    station ID is not in the list:

    except ValueError:
        print 'Station ID',station_id,'not in list!'
        lat = lon = 'NaN'
        message_type = 'Bad_station_id'
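
As a small illustration of catching only the expected error: list.index() raises ValueError when the value is missing, so that is the exception to name for this lookup. The Python 3 sketch below wraps the lookup in a hypothetical function, find_station. The station IDs and coordinates are invented.

```python
def find_station(station_id, ids, lats, lons):
    """Return (lat, lon, message_type) for a station ID."""
    try:
        index = ids.index(station_id)
    except ValueError:                 # raised when station_id is not in ids
        return 'NaN', 'NaN', 'Bad_station_id'
    return lats[index], lons[index], 'ADPSFC'


# Invented sample table standing in for uk_stations.txt.
ids = ['03002', '03005']
lats = ['60.75', '60.14']
lons = ['-0.85', '-1.18']
known = find_station('03005', ids, lats, lons)
unknown = find_station('99999', ids, lats, lons)
```

Any other exception, such as a typo in a variable name, now surfaces as a traceback instead of being silently reported as a bad station ID.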

    try:
    temp = [item for item in elements if item.startswith('1')][0]
    temperature = float(temp[2:])/10
    sign = temp[1]
    if sign == '1':
    temperature=-temperature
    except:
    temperature='NaN'

    What are you expecting to go wrong (i.e. what exception might occur) in
    the above try/except code?

    try:
    dtemp = [item for item in elements if item.startswith('2')][0]
    dtemperature = float(dtemp[2:])/10
    sign = dtemp[1]
    if sign == '1':
    dtemperature=-dtemperature
    except:
    dtemperature='NaN'
    try:
    press = [item for item in elements[2:] if item.startswith('4')][0]
    if press[1]=='9':
    pressure = float(press[1:])/10
    else:
    pressure = float(press[1:])/10+1000
    except:
    pressure = 'NaN'

    try:
    wind = elements[elements.index(temp)-1]
    direction = float(wind[1:3])*10
    speed = float(wind[3:])*0.514444444
    except:
    direction=speed='NaN'
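
For the temperature and dew-point blocks quoted above, two failures seem likely: the group is missing from the line (IndexError from [0] on an empty list), or its digits are not numeric (ValueError from float()). Here is a Python 3 sketch that catches just those two. The name decode_signed_tenths is hypothetical, the sample groups are invented, and the sign digit is compared with the string '1' because slicing a string yields strings, never integers.

```python
def decode_signed_tenths(elements, prefix):
    """Decode the first group starting with prefix as a signed value in tenths.

    Returns the string 'NaN' (as the scripts in this thread do) when the
    group is missing or malformed.
    """
    try:
        group = [item for item in elements if item.startswith(prefix)][0]
        value = float(group[2:]) / 10
    except (IndexError, ValueError):
        return 'NaN'
    if group[1] == '1':        # second character '1' marks a negative value
        value = -value
    return value


elements = ['03002', '10123', '21015']        # invented sample line, split
temperature = decode_signed_tenths(elements, '1')
dew_point = decode_signed_tenths(elements, '2')
```

Like the original code, this searches every element, so a station ID that happens to start with the same digit would be picked up first. Slicing off the leading elements before the search avoids that.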



    newline =
    message_type+c+str(station_id)+c+newtime+c+lat+c+lon+c+'-9999'+c+'002'+c+'-9999'+c+'-9999'+c+str(pressure)+c

    Try this:

    newline = c.join([message_type, str(station_id), newtime,
                      lat, lon, '-9999', '002',
                      '-9999', '-9999', str(pressure)]) + c

    You can split a square-bracketed list onto multiple lines.
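
The same join can be wrapped in a small function so that all five record types share one code path. A Python 3 sketch follows; make_record is a hypothetical name and the sample values are invented. The codes 002, 011, 017, 031 and 032 are the ones used in the scripts in this thread.

```python
def make_record(message_type, station_id, newtime, lat, lon, code, value, sep=','):
    """Join the fields of one output record, with a trailing separator."""
    fields = [message_type, str(station_id), newtime, lat, lon,
              '-9999', code, '-9999', '-9999', str(value)]
    return sep.join(fields) + sep


pressure_record = make_record('ADPSFC', '03002', '20090918_0300',
                              '60.75', '-0.85', '002', 1012.3)
```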

    -John
