I'm using Python to parse out metrics from logfiles, and ship them off to a database called InfluxDB, using their Python driver (https://github.com/influxdb/influxdb-python).

With InfluxDB, it's more efficient to pack more points into each message.

Hence, I'm using the grouper() recipe from the itertools documentation (https://docs.python.org/3.6/library/itertools.html), to process the data in chunks, and then shipping off the points at the end of each chunk:

   from itertools import zip_longest

   def grouper(iterable, n, fillvalue=None):
       "Collect data into fixed-length chunks or blocks"
       # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
       args = [iter(iterable)] * n
       return zip_longest(*args, fillvalue=fillvalue)

   for chunk in grouper(parse_iostat(f), 500):
       json_points = []
       for block in chunk:
           if block:  # skips the None padding in the final chunk
               try:
                   for i, line in enumerate(block):
                       # DO SOME STUFF
                       pass
               except ValueError as e:
                   print("Bad output seen - skipping")
       print("Wrote in {} points to InfluxDB".format(len(json_points)))

However, for some parsers, not every line will yield a datapoint, so a chunk of 500 input blocks can end up producing far fewer than 500 points.
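
To make the mismatch concrete, here is a small sketch. The sample lines and the rule that only lines containing 'cat' produce a point are made up for illustration, and points_per_chunk() is a hypothetical helper, not part of any parser mentioned above. It chunks the input with grouper() and counts how many points each chunk would actually send:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks (itertools recipe)."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def points_per_chunk(lines, chunk_size):
    """Return the number of points each input chunk would yield.

    Hypothetical rule for illustration: only lines containing 'cat'
    count as a datapoint.
    """
    counts = []
    for chunk in grouper(lines, chunk_size):
        # Drop the None padding, then keep only the lines that match
        points = [line for line in chunk if line is not None and 'cat' in line]
        counts.append(len(points))
    return counts


sample_lines = ['cat 1', 'dog', 'dog', 'cat 2', 'cat 3', 'dog', 'dog']
batch_sizes = points_per_chunk(sample_lines, 3)
print(batch_sizes)
```

With a chunk size of 3, none of the chunks in this sample fills a batch of 3 points, and the last one is empty.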

I'm wondering whether, rather than trying to chunk the input, it might be better to just check len() on the points list after each line, and send the batch off once it's full. E.g.:

     #!/usr/bin/env python3

     json_points = []
     _BATCH_SIZE = 2

     with open('blah.txt', 'r') as f:
         for line_number, line in enumerate(f, start=1):
             if 'cat' in line:
                 print('Found cat on line {}'.format(line_number))
                 # Stand-in for building a real datapoint
                 json_points.append(line_number)
                 print("json_points contains {} points".format(len(json_points)))
             if len(json_points) >= _BATCH_SIZE:
                 print('Sending off points!')
                 json_points = []

     print("Loop finished. json_points contains {} points".format(len(json_points)))
     if json_points:
         # Flush the final, partially filled batch
         print('Sending off points!')

Does the above seem reasonable? Any issues you see? Or are there any other more efficient approaches to doing this?
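
One possible alternative, sketched under assumptions: keep the parser as a generator that yields only real points, and chunk that stream of points instead of the input lines. Here parse_points() and batches() are hypothetical helpers, the 'cat' rule is the same stand-in as above, and the send step is left out. itertools.islice() is the only library call used:

```python
from itertools import islice


def parse_points(lines):
    """Yield one point per matching line (hypothetical parser)."""
    for line_number, line in enumerate(lines, start=1):
        if 'cat' in line:
            yield {'line': line_number, 'text': line.strip()}


def batches(iterable, size):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


sample_lines = ['cat 1\n', 'dog\n', 'cat 2\n', 'cat 3\n', 'dog\n']
sent = []
for batch in batches(parse_points(sample_lines), 2):
    # A real script would write the batch to the database here
    sent.append(batch)

print([len(batch) for batch in sent])
```

Every batch except possibly the last is full, there is no None padding to filter out, and the final partial batch is sent by the same loop rather than by a separate flush after it.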

Posted to python-list by Victor Hooi, Sep 4, '15 at 10:09p


