Hi,


I'm using Python to parse out metrics from logfiles, and ship them off to a database called InfluxDB, using their Python driver (https://github.com/influxdb/influxdb-python).


With InfluxDB, it's more efficient to pack more points into each message.


Hence, I'm using the grouper() recipe from the itertools documentation (https://docs.python.org/3.6/library/itertools.html) to process the data in chunks, and then ship off the points at the end of each chunk:


   from itertools import zip_longest

   def grouper(iterable, n, fillvalue=None):
       "Collect data into fixed-length chunks or blocks"
       # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
       args = [iter(iterable)] * n
       return zip_longest(*args, fillvalue=fillvalue)
   ....
   for chunk in grouper(parse_iostat(f), 500):
       json_points = []
       for block in chunk:
           if block:
               try:
                   for i, line in enumerate(block):
                       # DO SOME STUFF
                       pass  # placeholder so the elided loop body is valid Python
               except ValueError as e:
                   print("Bad output seen - skipping")
       client.write_points(json_points)
       print("Wrote in {} points to InfluxDB".format(len(json_points)))




However, for some parsers, not every line will yield a datapoint.
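
To make the concern concrete, here is a minimal sketch of what happens when the input (rather than the output) is chunked. The sample `lines` list and the "only lines containing 'cat' yield a point" rule are made up for illustration; they stand in for a real parser.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks (itertools recipe)."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Hypothetical input: only lines containing 'cat' yield a point.
lines = ['cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'dog']

batch_sizes = []
for chunk in grouper(lines, 3):
    # Skip the None padding in the final chunk, and lines without a point.
    points = [line for line in chunk if line is not None and 'cat' in line]
    batch_sizes.append(len(points))

print(batch_sizes)
```

Each chunk covers three input lines, but the number of points per write varies from chunk to chunk and can even be zero, so the batches sent to the database are not the size the chunk length suggests.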


I'm wondering whether, rather than trying to chunk the input, it might be better to just call len() on the points list each time and send it off once it reaches the batch size. E.g.:


     #!/usr/bin/env python3


     json_points = []
     _BATCH_SIZE = 2


     # Use a context manager so the file is closed once the loop finishes.
     with open('blah.txt', 'r') as f:
         for line_number, line in enumerate(f):
             if 'cat' in line:
                 print('Found cat on line {}'.format(line_number + 1))
                 json_points.append(line_number)
                 print("json_points contains {} points".format(len(json_points)))
             if len(json_points) >= _BATCH_SIZE:
                 print('Sending off points!')
                 json_points = []


     print("Loop finished. json_points contains {} points".format(len(json_points)))
     # Only send the final, partial batch if anything is left over.
     if json_points:
         print('Sending off points!')


Does the above seem reasonable? Any issues you see? Or are there any other more efficient approaches to doing this?
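
One possible alternative, as a sketch only: chunk the stream of parsed points instead of the input lines, using itertools.islice. The names `batches` and `parse_points` are hypothetical helpers invented for this example, and the 'cat' rule again stands in for a real parser.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of up to `size` items; only the last list may be shorter."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def parse_points(lines):
    """Hypothetical parser: yield a point only for lines containing 'cat'."""
    for line_number, line in enumerate(lines, start=1):
        if 'cat' in line:
            yield line_number


lines = ['cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'dog']
sent = [batch for batch in batches(parse_points(lines), 2)]
print(sent)
```

Because the parser is a generator, every batch except possibly the last is full, there is no fill value to filter out, and no separate "send the leftovers" step is needed after the loop. In the real script, each batch would presumably be passed to client.write_points() where this sketch collects it into `sent`.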

Discussion Overview
Group: python-list
Category: python
Posted: Sep 4, '15 at 10:09p
Active: Sep 4, '15 at 10:09p
Posts: 1
Users: 1
Website: python.org

1 user in discussion

Victor Hooi: 1 post
