AI Sales Projections?
Hi all,

This is a bit of a long shot question, but I've not had much luck searching for a better solution.
I'll cry myself to sleep if I don't get an answer, but I won't be surprised :)

We are trying to develop an application that determines, for a given movie shown at a particular
theater, how much the surrounding video stores are likely to rent out.

For the theater, we have its location and gross sales. For the video stores, we have location,
demographics from the US Census, school schedules, weather reports, etc. In other words, we are
flooded with information, and all attempts at going from movie sales to rental projections have
been a simple ad hoc affair of choosing sets of data and throwing them randomly at the problem to
see if the results are close to reality. The result has been a large, hard-coded set of
properties that is very inflexible and of questionable utility.

We're doing this in Perl, but I suspect we might be able to switch languages if that's what
it takes. Does anyone know of any similar problems and could you point me in an appropriate
direction?

Cheers,
Ovid

=====
Silence is Evil http://users.easystreet.com/ovid/philosophy/indexdecency.htm
Ovid http://www.perlmonks.org/index.pl?node_id=17000
Web Programming with Perl http://users.easystreet.com/ovid/cgi_course/


  • Kevin at Sep 12, 2003 at 3:59 pm
    Well, you could set up a genetic algorithm to try to guess based on
    proximity. So you'd need an easy function to get all movie theatres
    within 5 (or X) miles of a store. You'd also need video rental reports
    to train the algorithm on - that way you can practice until it's perfect
    with the history, then set it to work on the future...

    That said, GAs are hard and I haven't done any in a while :) But Perl is
    definitely a good language for it, and you can find tons of GA-in-Perl
    tutorials out there...
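
    For the proximity function, a small great-circle helper is usually
    enough. A minimal sketch (the record layout and the mileage radius are
    assumptions for illustration, not anything from this thread):

    use strict;

    use constant EARTH_RADIUS_MILES => 3959;
    use constant PI => 4 * atan2(1, 1);

    # Haversine distance between two lat/long points, in miles.
    sub distance_miles {
        my ($lat1, $lon1, $lat2, $lon2) = map { $_ * PI / 180 } @_;
        my $dlat = $lat2 - $lat1;
        my $dlon = $lon2 - $lon1;
        my $a = sin($dlat / 2) ** 2
              + cos($lat1) * cos($lat2) * sin($dlon / 2) ** 2;
        return EARTH_RADIUS_MILES * 2 * atan2(sqrt($a), sqrt(1 - $a));
    }

    # All theatres within $radius miles of a store. Records are hashrefs
    # with lat/long keys (a hypothetical layout).
    sub theatres_near {
        my ($store, $theatres, $radius) = @_;
        return grep {
            distance_miles($store->{lat}, $store->{long},
                           $_->{lat},     $_->{long}) <= $radius
        } @$theatres;
    }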

    Ciao,
    Kevin Watt
    Community Manager, Allpoetry.com
    What happened to the cow who went for a drive? He got a Moo-ving
    violation
    What do you call someone who is crazy about hot chocolate? A cocoa nut
    What do bees use to cut wood? Buzz saws
    Who eats at underwater restaurants? Scuba diners
    How do really small people call each other? On microphones
    How do you fix a broken chimp? With a monkey wrench
    -----Original Message-----
    From: Ovid
    Sent: Friday, September 12, 2003 8:44 AM
    To: perl-ai@perl.org
    Subject: AI Sales Projections?

    [original message quoted in full; snipped]
  • Mark Kvale at Sep 12, 2003 at 4:46 pm

    On Fri, 12 Sep 2003, Ovid wrote:

    We are trying to develop an application that determines, for a given movie shown at a particular
    theater, how much the surrounding video stores are likely to rent out. [snip]

    Hi Ovid,

    You don't say what algorithms you use to transform data into movie
    sales, so I'll give some general suggestions.

    Your question is really two questions:
    1) How do I transform various factors into an output?
    2) How do I verify the quality of my fit?

    1) For your input, it seems that you have a combination of categorical
    variables, e.g. school in session, and numerical variables,
    e.g. distance from the theatres. For your output you have a single
    numerical variable, sales. So your goal is to fit a multidimensional
    scalar function, a common task.

    There are at least two ways to go about it.

    The first is a heuristic method. Identify relevant variables, assign
    sales scores to each value of a variable (by comparing sales across the
    values of a categorical variable, and perhaps using linear fits for the
    numerical values), and combine these into a total sales figure. This may
    work. If it does, you have not only a sales prediction, but an
    understanding of why you predict what you do. But as you say, it might
    be brittle, and it does not take into account interactions between
    variables.

    The second way is to use a machine learning approach to fit your
    multidimensional function. There are myriad approaches to machine
    learning. One of the simplest is to train a neural net. You create a
    database of training examples with entries consisting of input
    variables and known output. Then you plug this into one of the Perl ML
    packages, e.g.,
    a) AI::NeuralNet::Mesh - trains up multi-layer perceptrons, a type of
    feedforward neural net. It has good documentation. For your problem, I
    would recommend a 3-layer net, with one input, one hidden and one
    output layer, with tanh activation functions.
    b) Algorithm::SVM - trains up a support vector machine, which is a
    specialized type of neural net. I haven't used this one.

    Training will take a while, but once trained, running data through a
    neural net is quite fast.


    2) OK, so you have trained an NN -- how good is it? The standard
    measure of goodness is generalizability. That is, how well does it
    perform on data it has never seen? A simple method for testing
    generalizability is a technique called cross-validation. In
    cross-validation, you partition the data into training data and test
    data. Use the training data to train your network. Use the test data
    to see how well your network does on data it has never seen.
    Performance on the test data will tell you how well the network will
    perform on, say, next quarter's sales figures. You'll use the test
    performance to see which neural architecture is right for you. In a
    three-layer net, the number of hidden units is variable and you will
    want to optimize this. The basic algorithm for doing so is:

    for num_hidden = 1 to N {
        train network with num_hidden units
        evaluate network against test data
        if best network, store it
    }

    What would be a suitable performance measure for testing? Well, a
    common measure is least squares: the error of the network output is the
    square of the difference of the predicted sales and the actual sales.
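
    To make that concrete, here is a minimal Perl sketch of the
    partition-and-score step, using the Mesh calls that appear later in
    this thread. The toy dataset, the 80/20 split, and the network sizes
    are assumptions/placeholders:

    use strict;
    use AI::NeuralNet::Mesh;

    # Toy dataset: 3-bit binary -> decimal, as in the programs later in
    # this thread; swap in your own [\@inputs, $target] pairs.
    my @examples = map {
        my $dec = int rand 8;
        [ [ split //, sprintf('%03b', $dec) ], $dec ];
    } 1 .. 50;

    my $cut   = int(0.8 * @examples);            # 80/20 split (an assumption)
    my @train = @examples[0 .. $cut - 1];
    my @test  = @examples[$cut .. $#examples];

    my $net = AI::NeuralNet::Mesh->new(3, 3, 1); # 3 inputs, 3 hidden, 1 output
    $net->learn_set([ map { ($_->[0], [$_->[1]]) } @train ]);

    # Least-squares error on held-out data.
    my $sse = 0;
    for my $ex (@test) {
        my ($inputs, $want) = @$ex;
        my $got = $net->run($inputs)->[0];
        $sse += ($want - $got) ** 2;
    }
    printf "RMS error on the test set: %.3f\n", sqrt($sse / @test);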

    This has been an extremely terse intro to how you might solve your
    problem :) A book I like on this subject is

    Neural Networks for Pattern Recognition, by Chris Bishop

    --
    Mark Kvale, neurobiophysicist
    http://www.keck.ucsf.edu/~kvale/
  • Ala Qumsieh at Sep 12, 2003 at 5:09 pm

    On Fri, 12 Sep 2003, Mark Kvale wrote:

    a) AI::NeuralNet::Mesh - trains up multi-layer perceptrons, a type of
    feedforward neural net. It has good documentation. For your problem, I
    would recommend a 3-layer net, with one input, one hidden and one
    output layer, with tanh activation functions.
    Where can I find AI::NeuralNet::Mesh? It doesn't show up in my CPAN
    mirror.

    I'm also tempted to suggest AI::FuzzyInference as a way to predict the
    sales. The fuzzy rules need to be defined somehow, and one way to do it is
    through a neural net as described by Mark. Those rules are then used by
    the inference engine to get a final sales figure.

    Just my 2 cents,
    --Ala
  • Mark Kvale at Sep 12, 2003 at 5:36 pm

    On Fri, 12 Sep 2003, Ala Qumsieh wrote:
    On Fri, 12 Sep 2003, Mark Kvale wrote:

    a) AI::NeuralNet::Mesh - trains up multi-layer perceptrons, a type of
    feedforward neural net. It has good documentation. For your problem, I
    would recommend a 3-layer net, with one input, one hidden and one
    output layer, with tanh activation functions.
    Where can I find AI::NeuralNet::Mesh? It doesn't show up in my CPAN
    mirror.
    I'm not sure. Here is what my cpan says:

    Distribution J/JB/JBRYAN/AI-NeuralNet-Mesh-0.44.zip
    Module AI::NeuralNet::Mesh (J/JB/JBRYAN/AI-NeuralNet-Mesh-0.44.zip)
    I'm also tempted to suggest AI::FuzzyInference as a way to predict the
    sales. The fuzzy rules need to be defined somehow, and one way to do it is
    through a neural net as described by Mark. Those rules are then used by
    the inference engine to get a final sales figure.
    I've never been a fan of fuzzy inference because it is a crippled
    version of ordinary probability laws, which are not much more
    complicated than fuzzy inference rules. But some people have used
    fuzzy AI to good effect.

    Mark Kvale, neurobiophysicist
    http://www.keck.ucsf.edu/~kvale/
  • Richard Rankin at Sep 12, 2003 at 5:44 pm
    He's correct about gathering the right (and the right amount of) data.
    If you need help, I can do it. Will do. I've got an SMP machine with
    hardware RAID, lots of disk and memory.

    Other tools suggestions:

    Oracle - Database, data mining tools.

    SAS - A data mining/statistical package - the gold standard

    Fuzzy logic - Creates sets for various data along with probabilities "It is
    highly likely that if a popular Sci-Fi DVD hits the stores, Sci-Fi theatre
    sales will decline", etc.

    Perl is my favorite language for parsing, editing, reporting, etc. - and
    pretty much everything else. I use it to administer databases all the
    time. For data mining? I'm not sure it's the tool of choice.



    -----Original Message-----
    From: Mark Kvale
    Sent: Friday, September 12, 2003 9:46 AM
    To: Ovid
    Cc: perl-ai@perl.org
    Subject: Re: AI Sales Projections?

    [quoted message snipped]
  • Ala Qumsieh at Sep 12, 2003 at 6:11 pm

    On Fri, 12 Sep 2003, Mark Kvale wrote:

    a) AI::NeuralNet::Mesh - trains up multi-layer perceptrons, a type of
    feedforward neural net. It has good documentation. For your problem, I
    would recommend a 3-layer net, with one input, one hidden and one
    output layer, with tanh activation functions.
    Perhaps a stupid question, but since we're on the subject of ANNs:

    What is a good criterion for choosing the number of nodes per layer? I
    haven't been up-to-date with ANN literature lately, but I recall reading
    that a 3-layer network should suffice for most applications. Is that true?

    As for the nodes per layer, I would assume the input layer would have as
    many nodes as input variables, and the output layer will have as many
    nodes as output variables. What about the hidden layer(s)?

    --Ala
  • Mark Kvale at Sep 12, 2003 at 7:12 pm

    On Fri, 12 Sep 2003, Ala Qumsieh wrote:
    On Fri, 12 Sep 2003, Mark Kvale wrote:

    a) AI::NeuralNet::Mesh - trains up multi-layer perceptrons, a type of
    feedforward neural net. It has good documentation. For your problem, I
    would recommend a 3-layer net, with one input, one hidden and one
    output layer, with tanh activation functions.
    Perhaps a stupid question, but since we're on the subject of ANNs:

    What is a good criterion for choosing the number of nodes per layer? I
    haven't been up-to-date with ANN literature lately, but I recall reading
    that a 3-layer network should suffice for most applications. Is that true?

    As for the nodes per layer, I would assume the input layer would have as
    many nodes as input variables, and the output layer will have as many
    nodes as output variables. What about the hidden layer(s)?
    None of these are stupid questions.

    Regarding multi-layer perceptrons, the original and most common form
    of feedforward network, there are a few heuristic rules that people
    use to pare down the space of possible neural architectures.

    For the input layer, a continuous scalar variable is assigned to one
    input. For a categorical variable with C possible values, people
    typically use a 1-of-(C-1) encoding scheme. If I wanted to encode
    weather as good, bad, ugly, and duck!, I would use three inputs:
    good 0 0 0
    bad 1 0 0
    ugly 0 1 0
    duck! 0 0 1
    If you have enough categories and they all seem to lie along the same
    axis of measurement, you might try converting it to a numeric variable:
    good 0
    bad 1
    ugly 5
    duck! 10

    The output layer is similar: one output for each numeric variable,
    but a 1-of-C encoding scheme for a categorical variable:
    good 1 0 0 0
    bad 0 1 0 0
    ugly 0 0 1 0
    duck! 0 0 0 1
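
    As a sketch, the input-side 1-of-(C-1) encoding above can be
    table-driven; the helper below is made up for illustration:

    # 1-of-(C-1) input encoding: the first category is the all-zeros
    # baseline and each remaining category gets its own indicator input.
    my %weather_code = (
        'good'  => [0, 0, 0],
        'bad'   => [1, 0, 0],
        'ugly'  => [0, 1, 0],
        'duck!' => [0, 0, 1],
    );

    # Hypothetical helper: prepend the encoded category to the other
    # (numeric) input variables.
    sub encode_inputs {
        my ($weather, @numeric) = @_;
        return [ @{ $weather_code{$weather} }, @numeric ];
    }

    # encode_inputs('ugly', $distance) yields [0, 1, 0, $distance]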

    For the hidden layer, typically people start with a single hidden
    layer. As you say, it is sufficient for many purposes. There is a
    theorem (the universal approximation theorem) that says you can
    approximate any function with a sufficient number of hidden units in a
    single layer, but that may be a lot of hidden units! If it works, a
    single layer is nice because one can look at the pattern of weights and
    deduce which factors you threw at the problem might be most important.

    Sometimes, if the function to be fit is sufficiently complex, people
    might try two hidden layers, as this may reduce the total number of
    hidden units used. Reducing the number of hidden units is good because
    it reduces the number of parameters that must be learned, and thus the
    amount of data needed to do a good job. As Einstein said, 'A theory
    should be as simple as possible, but no simpler'. That works for
    computer programs and neural nets, too.

    OK, so one tries a single hidden layer first, and then maybe two
    layers. But how many units per layer should be used? Some people have
    created heuristics like "there should be one hidden unit per M input
    lines", etc., but these are all crap. No such prescription is
    universally good over all possible problems. There is too much
    variety.

    The only reliable method for optimizing your architecture is to try it
    out! That is, use the method of cross validation I mentioned in my
    first email. By testing the NN on data that it has not been trained
    on, you'll get a good idea of how it works on real world data. There
    are three regimes of behavior you will encounter:
    1) too few nodes - there isn't enough computing capacity in the NN to
    model the complexity of the data, resulting in a high error rate in
    the test set.
    2) too many nodes - this NN captures all the complexity of the
    underlying process, but also has enough capacity to fit all the noise
    and random artifacts of your particular training set. Fitting noise
    will produce answers that are off base on your test set, because the
    NN is in effect taking into account spurious causes of the output.
    3) Just the right number of nodes, not too complex, not too simple.

    As one progresses from too few nodes, to just right, to too many,
    the error function will typically start out high, decrease fast to a
    minimum, and then rise slowly. Because the error function itself may
    be noisy (too few test samples, it's just a noisy system, etc.) I find
    it best to plot the error as a function of nodes and eyeball the
    minimum.

    --
    Mark Kvale, neurobiophysicist
    http://www.keck.ucsf.edu/~kvale/
  • Dan Von Kohorn at Sep 16, 2003 at 10:18 pm
    Can anyone shed some light on the differences between:

    AI::NeuralNet::Mesh
    (http://www.openbsd.org/2.9_packages/m68k/p5-AI-NeuralNet-Mesh-0.44.tgz-long.html)
    and
    AI::NeuralNet::BackProp
    (http://search.cpan.org/author/JBRYAN/AI-NeuralNet-BackProp-0.77/BackProp.pm)

    It looks like they both come from Josiah Bryan and have similar but not
    matching PODs, different contributor lists... It also looks like they share
    a mailing list, but PODs from both haven't been updated since 2000 (July for
    BackProp, Sept. for Mesh). It also appears that the constructor for
    BackProp does not take the arguments as described in the POD, but rather the
    same as Mesh. And why is Mesh not on CPAN?

    I've been scrounging for a good Perl implementation of neural network
    components, and am still dying to see that holy grail: flexible architecture
    specifications and fast training. Maybe a wrapper around the Neural Network
    Utility C++ library? (http://sourceforge.net/projects/nn-utility/)

    I'm not strong enough to tackle it myself, but if I can assist anyone on
    this let me know.

    Thanks everyone, this is a wonderful resource!

    Dan
  • Ovid at Sep 12, 2003 at 8:56 pm

    --- Mark Kvale wrote:
    a) AI::NeuralNet::Mesh - trains up multi-layer perceptrons, a type of
    feedforward neural net. It has good documentation. For your problem, I
    would recommend a 3-layer net, with one input, one hidden and one
    output layer, with tanh activation functions.
    Hi all,

    Thanks to everyone for input. I decided to start first with AI::NeuralNet::Mesh because it looks
    easy, but so far, I can't seem to train my computer to learn binary. Below my signoff is the full
    program that I wrote. Here are the results of my test run:

    Enter a four digit binary number (<ENTER> to quit): 0001
    0 0 0 1: 9
    Enter a four digit binary number (<ENTER> to quit): 0011
    0 0 1 1: 12
    Enter a four digit binary number (<ENTER> to quit): 1100
    1 1 0 0: 12
    Enter a four digit binary number (<ENTER> to quit): 1001
    1 0 0 1: 12
    Enter a four digit binary number (<ENTER> to quit): 1111
    1 1 1 1: 16
    Enter a four digit binary number (<ENTER> to quit): 0000
    0 0 0 0: 7

    Is my problem that I have too small of a dataset from which the net can extrapolate results, or is
    it my failure to understand how the module functions? So I have layers, nodes per layer, and
    output nodes. I believe "output nodes" corresponds to how many potential outputs can result from
    one combination of inputs (each series of zeroes and ones represents one decimal number), but I
    don't necessarily know what the other values do.

    Can you possibly give a brief explanation? I know I won't get exact results from a neural net,
    but I would like results that are close.

    Cheers,
    Ovid

    #!/usr/bin/perl
    use strict;

    use AI::NeuralNet::Mesh;
    my $net = AI::NeuralNet::Mesh->new(3,7,1);

    if (!$net->load('binary.mesh')) {
        $net->learn_set([
            [qw/1 0 1 1/], [11],
            [qw/0 0 1 0/], [2 ],
            [qw/0 1 0 1/], [5 ],
            [qw/1 1 0 0/], [12],
            [qw/0 0 1 1/], [3 ],
            [qw/1 1 0 1/], [13],
            [qw/0 1 0 0/], [4 ],
            [qw/0 1 1 0/], [6 ],
            [qw/0 1 1 1/], [7 ],
            [qw/1 0 1 0/], [10],
            [qw/1 1 1 0/], [14],
            [qw/0 0 0 1/], [1 ],
            [qw/0 0 0 0/], [0 ],
            [qw/1 0 0 1/], [9 ],
        ]);
        $net->save('binary.mesh');
    }

    while (my $number = prompt()) {
        printf "@$number: %d\n", $net->run($number)->[0];
    }

    sub prompt {
        my $number;
        do {
            print "Enter a four digit binary number (<ENTER> to quit): ";
            $number = <>;
            chomp $number;
            exit unless $number;
        } until $number =~ /^[01]{4}$/;
        return [split //, $number];
    }


  • Mark Kvale at Sep 13, 2003 at 12:01 am

    On Friday 12 September 2003 01:56 pm, Ovid wrote:
    --- Mark Kvale wrote:
    a) AI::NeuralNet::Mesh - trains up multi-layer perceptrons, a type of
    feedforward neural net. It has good documentation. For your problem, I
    would recommend a 3-layer net, with one input, one hidden and one
    output layer, with tanh activation functions.
    Hi all,

    Thanks to everyone for input. I decided to start first with
    AI::NeuralNet::Mesh because it looks easy, but so far, I can't seem to
    train my computer to learn binary. Below my signoff is the full program
    that I wrote. Here are the results of my test run:
    snip!

    Right off the bat, you have 4 binary inputs, but only 3 input nodes:

    my $net = AI::NeuralNet::Mesh->new(3,7,1);

    Your neural net is telling you that the most significant digit is relevant :)
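
    The fix is to declare four input nodes, keeping the rest of the
    architecture as-is:

    my $net = AI::NeuralNet::Mesh->new(4,7,1);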

    Second, you will have 4 inputs and 7 hidden units, resulting in 28
    connections. The second layer of weights has seven connections, giving a
    total of 35 parameters. Whether your training set is big enough to fit
    that many parameters depends on the details of the learning algorithm.
    See below.

    Below is a program that parametrizes the number of examples and hidden
    units, and uses a cross-validation type of method to test network
    efficacy.

    The results are

    Number of examples used for training and testing: 10
    hidden: 1 RMS error per trial: 5.47722557505166
    hidden: 2 RMS error per trial: 4.7116875957559
    hidden: 3 RMS error per trial: 1.44913767461894
    hidden: 4 RMS error per trial: 2.77488738510232
    hidden: 5 RMS error per trial: 1.54919333848297
    hidden: 6 RMS error per trial: 1.92353840616713
    hidden: 7 RMS error per trial: 2.38746727726266
    hidden: 8 RMS error per trial: 2.72029410174709
    hidden: 9 RMS error per trial: 2.0976176963403
    hidden: 10 RMS error per trial: 2.91547594742265

    Number of examples used for training and testing: 50
    hidden: 1 RMS error per trial: 3.54118624192515
    hidden: 2 RMS error per trial: 1.75499287747842
    hidden: 3 RMS error per trial: 1.05830052442584
    hidden: 4 RMS error per trial: 1.90787840283389
    hidden: 5 RMS error per trial: 1.78885438199983
    hidden: 6 RMS error per trial: 1.80554700852678
    hidden: 7 RMS error per trial: 2.57681974534503
    hidden: 8 RMS error per trial: 2.36220236220354
    hidden: 9 RMS error per trial: 3.48998567332303
    hidden: 10 RMS error per trial: 2.19544984001001

    There are a couple of things to notice. First, errors decrease and then
    increase, as explained in a previous email. The optimum number of hidden
    units seems to be around 3 for both 10 and 50 examples. Second,
    AI::NeuralNet::Mesh doesn't seem to bootstrap the given examples, so it
    pays to present the ones you have multiple times. This is evidenced by
    the second set of results, which show lower error and less variance in
    the errors.

    At its best, a single layer with 3 hidden units gets on average to within
    1.06 of the correct answer over all examples. Which is OK, but not perfect.
    To do better, you may want to
    1) try more examples,
    2) add a second layer,
    3) alter activation functions for the hidden and output nodes,
    4) since backprop is a gradient method, it can get stuck in local minima,
    so it pays to randomize initial weights and run it multiple times to find
    a better network.
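
    Point 4 might look like this (a sketch; it assumes the module draws its
    initial weights from Perl's built-in rand, so re-seeding gives a fresh
    starting point, and test_error() is a hypothetical helper returning the
    test-set RMS error):

    my ($best_net, $best_err);
    for my $seed (1 .. 10) {
        srand($seed);                       # fresh random initial weights
        my $net = AI::NeuralNet::Mesh->new(4, 3, 1);
        $net->learn_set($examples);
        my $err = test_error($net);         # hypothetical helper
        if (!defined $best_err or $err < $best_err) {
            ($best_net, $best_err) = ($net, $err);
        }
    }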

    Because of this and other reasons, the backpropagation learning method used
    in this module is one of the weaker ones in use for NN learning. Quickprop
    or Rprop (not implemented here) may do a better job. A C/C++ program that
    implements a wide variety of NNs and learning methods is SNNS:

    http://www-ra.informatik.uni-tuebingen.de/SNNS/

    and there are probably others that are good, too.

    Neural nets are one of the simpler machine learning paradigms, but they
    are not turnkey algorithms. It is an art to pick relevant input
    variables, and some playing around is needed to achieve the best results.

    -Mark


    #!/usr/bin/perl

    use strict;
    use AI::NeuralNet::Mesh;

    my $num_ex = shift;
    my @bin = qw(000 001 010 011 100 101 110 111);

    print "Number of examples used for training and testing: $num_ex\n";
    foreach my $hidden (1..10) {

        # create new network
        my $net = AI::NeuralNet::Mesh->new(3, $hidden, 1);

        # create $num_ex random examples
        my $examples = [];
        foreach (1..$num_ex) {
            my $dec = int rand 8;
            my @digits = split //, $bin[$dec];
            push @$examples, [@digits], [$dec];
        }

        # train the NN
        $net->learn_set($examples);

        # test the NN
        my $avg = 0;
        foreach (1..$num_ex) {
            my $dec = int rand 8;
            my @digits = split //, $bin[$dec];
            my $pred = $net->run(\@digits)->[0];
            $avg += ($dec - $pred) * ($dec - $pred);
        }
        $avg /= $num_ex;

        # print output
        print "hidden: $hidden\tRMS error per trial: ", sqrt $avg, "\n";
    }
  • Ovid at Sep 13, 2003 at 12:16 am

    --- Mark Kvale wrote:
    Below is a program that parametrizes the number of examples and hidden
    units, and uses a cross-validation type of method to test network efficacy.
    Thank you very much. I understand quite a bit more now, though obviously I still have a lot of
    work to do. The SNNS package that you pointed me to has docs that have been clearing the fog
    for me.

    Cheers,
    Ovid

  • Ovid at Sep 14, 2003 at 2:03 am
    Hi all,

    I'm working with AI::NeuralNet::Mesh and I've seen a few areas where it can be improved slightly.
    Mainly, I can make it run clean under warnings and, according to some initial benchmarks, I can
    give it a nice performance boost with a few tweaks. However, I backed out my changes so that I
    can build a more comprehensive test suite to ensure that I don't break anything. This raises a
    question for me.

    After training the neural network, assuming that I am using the same training data every time (in
    the same order), are the results deterministic across operating systems, CPUs, Perl versions, etc?
    From reading through the code, I don't see anything that would cause problems here, but I'm not
    sure.

    If the results *are* deterministic then I can go ahead and build the test suite and send this back
    to the author. Otherwise, I can only build the tests for myself, but I'd prefer to be able to let
    others take advantage of my work.

    Cheers,
    Ovid

  • Mark Kvale at Sep 14, 2003 at 3:34 am

    On Sat, 13 Sep 2003, Ovid wrote:

    [quoted message snipped]
    Some speedups would certainly be welcome. I don't know the module
    code, but some general considerations on perfect repeatability across
    OSes and CPUs: some OS/CPU combinations compute with doubles and some
    with long doubles, and the special function libraries (for the logistic
    and tanh activation functions) will vary across C libraries.

    Theoretically, the error landscape for an NN optimization is full of
    local minima, which implies the existence of separatrices that can
    magnify even small numerical discrepancies. I have no idea, however,
    if this is a practical problem with NN testing; if one tested just a
    single batch learning step, I can't see how the divergence would grow
    large.
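
    One way a test suite can sidestep both issues is to compare outputs
    with a tolerance rather than exact equality. A minimal sketch with
    Test::More (the helper name and the epsilon are made up):

    use strict;
    use Test::More;

    # Pass if |got - want| <= eps, instead of demanding exact equality.
    sub is_close {
        my ($got, $want, $eps, $name) = @_;
        $eps = 1e-6 unless defined $eps;
        ok(abs($got - $want) <= $eps, $name);
    }

    is_close(0.1 + 0.2, 0.3, 1e-9, 'floating point within epsilon');
    # e.g. with a trained net: is_close($net->run([0,0,1,1])->[0], 3, 0.5);
    done_testing();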

    --
    Mark Kvale, neurobiophysicist
    http://www.keck.ucsf.edu/~kvale/
  • Dan Von Kohorn at Sep 15, 2003 at 10:41 pm

    After training the neural network, assuming that I am using the same
    training data every time (in the same order), are the results
    deterministic across operating systems, CPUs, Perl versions, etc?
    [snip]
    Training neural networks is not generally deterministic, but not for the
    reasons you mention. Neural networks are trained on data, but start with
    a random initialization. This random initialization can lead to different
    trained networks even when two networks are trained with the same data on
    the same machine.

    A test suite should be robust enough to handle the variations in trained
    nets (linear functions may be more repeatable, and adequate for architecture
    and basic testing purposes).
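
    If the goal is simply repeatable tests on one machine, pinning the seed
    before constructing the net may be enough (a sketch; it assumes the
    module's weight initialization goes through Perl's built-in rand, which
    should be checked against the source):

    srand(12345);   # fixed seed => the same "random" initial weights
    my $net = AI::NeuralNet::Mesh->new(4, 7, 1);
    $net->learn_set($examples);   # $examples as in the earlier programs
    # Results are now repeatable on this machine/perl build, though not
    # necessarily across OSes or C libraries (see Mark's caveats above).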

    DanVK
