# I was just at a job interview and was asked to write a function that did
a frequency of various words from a string.
# I wrote something out on the board similar to my freq() method below.
When I got home I coded this example up as
# an exercise
#
# I was then asked how could I improve the efficiency of the function. My
guess is it would need to involve multi processing ?
# multi threading would not help I don't think. Offhand I wasn't sure how
to answer the question. Multi processing via fork is
# not easy because the freq has to be combined at the end. That seems to
have been the best answer but how to do it in
# my simple function I don't know ..
#
# they also asked what is the complexity which I forget that stuff as once
in a while I review it, but then within a few months I forget it again and
# I have many areas I am trying to study ..
#
#

require 'benchmark'

class FreqTest
   attr_reader :data_str

   def initialize
     gen_data
   end

     def freq
         map = {}
         tokens = @data_str.scan(/\w+/)
         tokens.each do |tok|
             map[tok] ||= 0
             map[tok] += 1
         end
         map
   end

   private

     def gen_data
         parts = %w(ze pu ke no zi al loo kuna zetch pe def tuna kaga)
         @data_str = ""
         loop_cnt = rand(100000) + 100
         loop_cnt.times do
             str = ""
             (rand(4) + 1).times{str << parts[rand(parts.length)]}
             # puts "<" + str + ">"
             @data_str << str + " "
         end
     end

end

ftest = FreqTest.new


puts ftest.data_str

puts Benchmark.measure{
     ftest.freq
}

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-talk+unsubscribe@googlegroups.com.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/fa8f97a6-900a-4246-b178-d1ed90b03e3f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Search Discussions

  • Frederick Cheung at Sep 12, 2015 at 5:58 pm

    On Friday, September 11, 2015 at 7:32:01 PM UTC+1, wbsurfver@yahoo.com wrote:

    # I was just at a job interview and was asked to write a function that did
    a frequency of various words from a string.
    # I wrote something out on the board similar to my freq() method below.
    When I got home I coded this example up as
    # an exercise
    #
    # I was then asked how could I improve the efficiency of the function. My
    guess is it would need to involve multi processing ?
    # multi threading would not help I don't think. Offhand I wasn't sure how
    to answer the question. Multi processing via fork is
    # not easy because the freq has to be combined at the end. That seems to
    have been the best answer but how to do it in
    # my simple function I don't know ..
    #
    At the moment your inner loop body looks like this:

    map[tok] ||= 0
    map[tok] += 1

    which reads from the hash twice and writes to it once or twice depending on
    whether the value is already set.

    You could do
    existing = map[tok] || 0
    map[tok] = existing + 1

    which always does one hash read and one hash write. You could also use a
    hash with default value

    map = Hash.new(0)
    And then in your loop, just map[tok]+=1

    You could also argue that building up an array of words is wasteful -
    you're allocating a great big array of words but you don't actually need
    the array - just one work at a time.. You could instead do

    @data_str.scan(/\w+/) do |token|
       map[token] += 1
    end

    In a quick benchmark this is about 15% faster than the original on my
    machine

    Personally I wouldn't consider parallelising this to be something that
    makes it more efficient - just something that makes it faster. Forking
    should work fine. As you say you will have to combine the results, but
    that's not a problem. You might want to look at the map-reduce approach for
    more examples of this

    Fred

    --
    You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-talk+unsubscribe@googlegroups.com.
    To post to this group, send email to rubyonrails-talk@googlegroups.com.
    To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/13ee10a3-f422-4512-b744-f7ef23d5bba7%40googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouprubyonrails-talk @
categoriesrubyonrails
postedSep 11, '15 at 6:32p
activeSep 12, '15 at 5:58p
posts2
users2
websiterubyonrails.org
irc#RubyOnRails

2 users in discussion

Frederick Cheung: 1 post Wbsurfver: 1 post

People

Translate

site design / logo © 2021 Grokbase