Grokbase Groups Perl scripts May 2003
I uploaded a script, ngram.pl (now at version 1.45), for which
I couldn't think of any better category than 'Scientific'. (It's
kind of like statistics, but applied to very specific data,
so I opted for a more generic category.)

--
Jarkko Hietaniemi <jhi@iki.fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen


  • Vlado Keselj at May 21, 2003 at 10:50 am
    I think the Text category would be better. If you look at the Perl
    modules, the most appropriate prefix would be Text::. "Scientific" is too
    general and not related to text processing.

    Vlado
    On Tue, 20 May 2003, Jarkko Hietaniemi wrote:

    > I uploaded a script, ngram.pl (now at version 1.45), for which
    > I couldn't think of any better category than 'Scientific'. (It's
    > kind of like statistics, but applied to very specific data,
    > so I opted for a more generic category.)
  • Jarkko Hietaniemi at May 21, 2003 at 8:46 pm
    Okay, how about Text::Statistics?

    (I don't like going for Lingua:: because my script isn't about any
    particular language or languages, just characters and words, and
    the point of Lingua:: is to be specific to one or more
    languages.)

  • David Wheeler at May 21, 2003 at 10:07 pm

    On Wednesday, May 21, 2003, at 01:45 PM, Jarkko Hietaniemi wrote:

    > (I don't like going for Lingua:: because my script isn't about any
    > particular language or languages, just characters and words, and
    > the point of Lingua:: is to be specific to one or more
    > languages.)

    Getting a bit OT here...

    Well, in that case, my Lingua::Strfname module is probably mis-named.
    :-( It provides a strftime-like formatting language for formatting
    people's names. Suggested namespace improvements, anyone?

    Regards,

    David

    --
    David Wheeler AIM: dwTheory
    david@kineticode.com ICQ: 15726394
    http://kineticode.com/ Yahoo!: dew7e
    Jabber: Theory@jabber.org
    Kineticode. Setting knowledge in motion.[sm]
  • Jarkko Hietaniemi at May 25, 2003 at 8:31 pm

    > Getting a bit OT here...
    >
    > Well, in that case, my Lingua::Strfname module is probably mis-named.
    > :-( It provides a strftime-like formatting language for formatting
    > people's names. Suggested namespace improvements, anyone?

    Errr, no, why would it be in the wrong top category? Though you might
    want to migrate it to Lingua::EN:: since it's rather English-specific.

  • David Wheeler at May 25, 2003 at 11:16 pm

    On Sunday, May 25, 2003, at 01:31 PM, Jarkko Hietaniemi wrote:

    > Errr, no, why would it be in the wrong top category? Though you might
    > want to migrate it to Lingua::EN:: since it's rather English-specific.

    Well, I tried not to make it English-specific. It doesn't really care
    what language is used, or what order the names are displayed in. And if
    there are more parts than the default (first, middle, last, prefix,
    suffix), they can be passed in as extra arguments, up to 5 of them,
    IIRC.

    If it's perceived as English-specific in some way, I'd rather try to
    eliminate the things that make it so, or eliminate the perception,
    rather than move it.

    Regards,

    David

  • Jarkko Hietaniemi at May 26, 2003 at 5:57 am

    >> Errr, no, why would it be in the wrong top category? Though you might
    >> want to migrate it to Lingua::EN:: since it's rather English-specific.
    > Well, I tried not to make it English-specific. It doesn't really care
    > what language is used, or what order the names are displayed in. And if
    > there are more parts than the default (first, middle, last, prefix,
    > suffix), they can be passed in as extra arguments, up to 5 of them,
    > IIRC.
    >
    > If it's perceived as English-specific in some way, I'd rather try to
    > eliminate the things that make it so, or eliminate the perception,
    > rather than move it.

    I have now taken a closer look, and I must apologize: your module is
    not that English-specific.

    But the 'first, middle, last, extras' scheme is a bit difficult.
    It is somehow... errr, "culturally Western European", probably.[1]
    The good/bad news is that I have never seen a comprehensive solution
    or terminology that would solve this. Some examples that come to mind:

    family_name given_name [Chinese, Japanese, Hungarian]

    given_name patronymic (male) [Icelandic, some Indic] [2]
    given_name matronymic (female)

    several levels of patronymics [Arabic]

    given_name [Tamil]

    [1] At least you didn't go by "ChristianName Surname".
    [2] 'Gurusamy Sarathy' being one example.

  • David Wheeler at May 27, 2003 at 4:49 pm

    On Sunday, May 25, 2003, at 10:57 PM, Jarkko Hietaniemi wrote:

    > I have now taken a closer look, and I must apologize: your module is
    > not that English-specific.

    Oh, whew!

    > But the 'first, middle, last, extras' scheme is a bit difficult.
    > It is somehow... errr, "culturally Western European", probably.[1]
    > The good/bad news is that I have never seen a comprehensive solution
    > or terminology that would solve this. Some examples that come to mind:
    >
    > family_name given_name [Chinese, Japanese, Hungarian]
    >
    > given_name patronymic (male) [Icelandic, some Indic] [2]
    > given_name matronymic (female)
    >
    > several levels of patronymics [Arabic]
    >
    > given_name [Tamil]
    >
    > [1] At least you didn't go by "ChristianName Surname".
    > [2] 'Gurusamy Sarathy' being one example.

    Well, I admit that the terms are Amero-centric, and they suggest an
    order, as well. But the format placeholders are simple strings (%f, %m,
    %l, etc.), and to a certain extent, they can be considered arbitrary. I
    wouldn't want to change those, but would have no objection to changing
    what they're called in the docs, provided some more or less universally
    neutral terms could be put forward.

    Maybe I should just go through my anthropology texts and look for
    kinship terminology used in that discipline...

    At any rate, even if the docs change, I don't think that the interface
    would change in the slightest.

    Regards,

    David

  • William R Ward at May 27, 2003 at 8:25 pm

    David Wheeler writes:
    > Well, I admit that the terms are Amero-centric, and they suggest an
    > order, as well. But the format placeholders are simple strings (%f,
    > %m, %l, etc.), and to a certain extent, they can be considered
    > arbitrary. I wouldn't want to change those, but would have no
    > objection to changing what they're called in the docs, provided some
    > more or less universally neutral terms could be put forward.
    >
    > Maybe I should just go through my anthropology texts and look for
    > kinship terminology used in that discipline...
    >
    > At any rate, even if the docs change, I don't think that the interface
    > would change in the slightest.

    Even within the Amero-centric paradigm there are still some scenarios
    your module may not handle (correct me if I'm wrong):

    * Two or more middle names (e.g. George Herbert Walker Bush, British
    nobility/royalty, etc.)
    * No middle name
    * Only a middle initial (e.g. Harry S. Truman, Bullwinkle J. Moose)
    * First initial, goes by middle name (e.g. my mom, D. Colleen Ward)
    * Last name is two words (e.g. a friend of mine, Joydeep Roy
    Chowdhury; his last name is "Roy Chowdhury")

    --Bill.

    --
    William R Ward bill@wards.net http://www.wards.net/~bill/
    -----------------------------------------------------------------------------
    "A foolish consistency is the hobgoblin of little minds, adored by
    little statesmen and philosophers and divines." - Emerson
  • David Wheeler at May 27, 2003 at 8:35 pm

    On Tuesday, May 27, 2003, at 01:25 PM, William R Ward wrote:

    > Even within the Amero-centric paradigm there are still some scenarios
    > your module may not handle (correct me if I'm wrong):
    >
    > * Two or more middle names (e.g. George Herbert Walker Bush, British
    > nobility/royalty, etc.)

    The second middle name can be handled as an extra argument, and added
    to the format (or not) using %1.

    > * No middle name

    If there is no middle name (it's undef or ''), then even if the format
    has %m in it, it will not be displayed. This is the whole point of the
    module, really: to handle those situations where the available names
    vary a great deal.

    > * Only a middle initial (e.g. Harry S. Truman, Bullwinkle J. Moose)

    It will be included as-is (an initial) when using %m for the middle
    name, as the middle initial with a period using %M, and as the bare
    middle initial using %I. The only caveat is that if you use %M with a
    name like "Ulysses S Grant" (note the lack of a period), the middle
    initial will be output as "S.". This is a rare case, however.

    > * First initial, goes by middle name (e.g. my mom, D. Colleen Ward)

    Make the first name argument "D.", the middle name argument "Colleen",
    and the last name argument "Ward".

    > * Last name is two words (e.g. a friend of mine, Joydeep Roy
    > Chowdhury; his last name is "Roy Chowdhury")

    It's still a single scalar argument, "Roy Chowdhury", to strfname(). Of
    course, the initial using %L would be "R.".

    The upshot is that you pass in the various name parts you want
    formatted into a string. Lingua::Strfname doesn't parse a string into
    name parts. For that, someone would have to write Lingua::Strpname. ;-)
    And that would more likely have to be localized, e.g.,
    Lingua::EN::Strpname.

    Regards,

    david

  • William R Ward at May 27, 2003 at 8:41 pm

    David Wheeler writes:
    > The upshot is that you pass in the various name parts you want
    > formatted into a string. Lingua::Strfname doesn't parse a string into
    > name parts. For that, someone would have to write Lingua::Strpname. ;-)
    > And that would more likely have to be localized, e.g.,
    > Lingua::EN::Strpname.

    I see. I was thinking that it parsed the name as well...

    --Bill.

  • David Wheeler at May 27, 2003 at 8:43 pm

    On Tuesday, May 27, 2003, at 01:40 PM, William R Ward wrote:

    >> The upshot is that you pass in the various name parts you want
    >> formatted into a string. Lingua::Strfname doesn't parse a string into
    >> name parts. For that, someone would have to write Lingua::Strpname. ;-)
    >> And that would more likely have to be localized, e.g.,
    >> Lingua::EN::Strpname.
    > I see. I was thinking that it parsed the name as well...

    Do I _look_ that insane? Look what writing Pod::Simple did to Sean
    Burke! No, that way lies madness. ;-)

    David

  • Vlado Keselj at May 22, 2003 at 8:16 pm
    Text::Statistics sounds good to me. (I would shorten it to
    Text::Stat, but it is not important.)

    There are some apparently similar scripts written by Ted Pedersen
    (http://www.d.umn.edu/~tpederse/nsp.html, the N-gram Statistics Package
    (NSP)), but he does not seem to have uploaded them to CPAN.

    I will probably be adding some scripts/modules to this category.

    Vlado
    On Wed, 21 May 2003, Jarkko Hietaniemi wrote:

    > Okay, how about Text::Statistics?
    >
    > (I don't like going for Lingua:: because my script isn't about any
    > particular language or languages, just characters and words, and
    > the point of Lingua:: is to be specific to one or more
    > languages.)
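
David Wheeler's description of Lingua::Strfname in this thread (a strftime-like formatter where the caller supplies the name parts, placeholders such as %f, %m, %l, %M and %L are filled in, and a missing part is simply not displayed) can be illustrated with a short sketch. This is a hypothetical Python illustration of the idea only, not the module's real Perl API or implementation: the function signature, the exact placeholder set, mapping extra parts to %1 through %5, and hiding missing parts by collapsing leftover whitespace are all assumptions.

```python
import re


# Hypothetical sketch of a strftime-like name formatter. This is NOT the
# Lingua::Strfname API; the placeholder subset and signature are assumed.
def strfname(fmt, first=None, middle=None, last=None, *extras):
    def initial(name):
        # First character plus a period, e.g. "Roy Chowdhury" -> "R.",
        # "S" -> "S." (an empty or missing name yields "").
        return name[0] + "." if name else ""

    values = {
        "f": first or "",    # first name as given
        "m": middle or "",   # middle name as given
        "l": last or "",     # last name as given
        "M": initial(middle),
        "L": initial(last),
    }
    # Assumption: up to five extra name parts are exposed as %1 .. %5.
    for i, extra in enumerate(extras[:5], start=1):
        values[str(i)] = extra or ""

    def substitute(match):
        key = match.group(1)
        if key == "%":
            return "%"  # "%%" produces a literal percent sign
        return values.get(key, "")  # unsupplied extras become empty

    out = re.sub(r"%([%fmlML1-5])", substitute, fmt)
    # Missing parts leave doubled spaces behind; collapse and trim them.
    return re.sub(r"\s+", " ", out).strip()


if __name__ == "__main__":
    print(strfname("%f %m %l", "David", None, "Wheeler"))
    print(strfname("%f %m %1 %l", "George", "Herbert", "Bush", "Walker"))
```

Passing None (the analogue of Perl's undef) for the middle name drops %m from the output, matching the behavior described in the thread. Collapsing whitespace is the simplest way to hide an empty slot, but it would also squash deliberate runs of spaces in a format, so a real implementation might handle separators differently.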

Discussion Overview
group: scripts @
categories: perl
posted: May 20, '03 at 11:23a
active: May 27, '03 at 8:43p
posts: 13
users: 5
website: perl.org
