FAQ
Hello,

I am acessing an Excel file by means of Win 32 COM technology.
For a given cell, I am able to read its formula. I want to make a map
of how cells reference one another, how different sheets reference one
another, how workbooks reference one another, etc.

Hence, I need to parse Excel formulas. Can I do it by means only of re
(regular expressions)?

I know that for simple formulas such as "=3*A7+5" it is indeed
possible. What about complex for formulas that include functions,
sheet names and possibly other *.xls files?

For example "=Book1!A5+8" should be parsed into ["=","Book1", "!",
"A5","+","8"]

Can anybody help? Any suggestions?

Vicente Soler

Search Discussions

  • MRAB at Jan 5, 2010 at 6:35 pm

    vsoler wrote:
    Hello,

    I am acessing an Excel file by means of Win 32 COM technology.
    For a given cell, I am able to read its formula. I want to make a map
    of how cells reference one another, how different sheets reference one
    another, how workbooks reference one another, etc.

    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?

    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?

    For example "=Book1!A5+8" should be parsed into ["=","Book1", "!",
    "A5","+","8"]

    Can anybody help? Any suggestions?
    Do you mean "how" or do you really mean "whether", ie, get a list of the
    other cells that are referred to by a certain cell, for example,
    "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5"]?
  • Vsoler at Jan 5, 2010 at 7:00 pm

    On 5 ene, 19:35, MRAB wrote:
    vsoler wrote:
    Hello,
    I am acessing an Excel file by means of Win 32 COM technology.
    For a given cell, I am able to read its formula. I want to make a map
    of how cells reference one another, how different sheets reference one
    another, how workbooks reference one another, etc.
    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?
    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    For example ? ?"=Book1!A5+8" should be parsed into ["=","Book1", "!",
    "A5","+","8"]
    Can anybody help? Any suggestions?
    Do you mean "how" or do you really mean "whether", ie, get a list of the
    other cells that are referred to by a certain cell, for example,
    "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5"]?
    I'd like to know how to do it, should it be possible.

    Vicente
  • MRAB at Jan 5, 2010 at 7:20 pm

    vsoler wrote:
    On 5 ene, 19:35, MRAB wrote:
    vsoler wrote:
    Hello,
    I am acessing an Excel file by means of Win 32 COM technology.
    For a given cell, I am able to read its formula. I want to make a map
    of how cells reference one another, how different sheets reference one
    another, how workbooks reference one another, etc.
    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?
    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    For example "=Book1!A5+8" should be parsed into ["=","Book1", "!",
    "A5","+","8"]
    Can anybody help? Any suggestions?
    Do you mean "how" or do you really mean "whether", ie, get a list of the
    other cells that are referred to by a certain cell, for example,
    "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5"]?
    I'd like to know how to do it, should it be possible.
    Something like this should work:

    references = re.findall(r"\b((?:\w+!)?[A-Za-z]+\d+)\b", formula)
  • John Posner at Jan 5, 2010 at 7:04 pm

    On Tue, 05 Jan 2010 13:12:00 -0500, vsoler wrote:

    Hello,

    I am acessing an Excel file by means of Win 32 COM technology.
    For a given cell, I am able to read its formula. I want to make a map
    of how cells reference one another, how different sheets reference one
    another, how workbooks reference one another, etc.

    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?

    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?

    For example "=Book1!A5+8" should be parsed into ["=","Book1", "!",
    "A5","+","8"]

    Can anybody help? Any suggestions?
    It seems like you want to recreate data structures that Excel, itself,
    must maintain in order to recalculate cells in the correct order. As long
    as you're using COM, you might be able to tap into those data structures.
    My 15-year-old (!) "Using Excel Visual Basic for Applications" book wasn't
    any help. :-( After a short Google session, I came up with one possible
    lead: http://www.decisionmodels.com/

    Good luck!
    John
  • Mensanator at Jan 5, 2010 at 7:05 pm

    On Jan 5, 12:35?pm, MRAB wrote:
    vsoler wrote:
    Hello,
    I am acessing an Excel file by means of Win 32 COM technology.
    For a given cell, I am able to read its formula. I want to make a map
    of how cells reference one another, how different sheets reference one
    another, how workbooks reference one another, etc.
    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?
    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    For example ? ?"=Book1!A5+8" should be parsed into ["=","Book1", "!",
    "A5","+","8"]
    Can anybody help? Any suggestions?
    Do you mean "how" or do you really mean "whether", ie, get a list of the
    other cells that are referred to by a certain cell, for example,
    "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5]
    Ok, although "Book1" would be the default name of a workbook, with
    default
    worksheets labeled "Sheet1". "Sheet2", etc.

    If I had a worksheet named "Sheety" that wanted to reference a cell on
    "Sheetx"
    OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to
    a completely
    different workbook (say Book1 with worksheets labeled "Sheet1",
    "Sheet2") then
    the cell might have =[Book1]Sheet1!A7.

    And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.
  • Vsoler at Jan 5, 2010 at 7:21 pm

    On 5 ene, 20:05, Mensanator wrote:
    On Jan 5, 12:35?pm, MRAB wrote:


    vsoler wrote:
    Hello,
    I am acessing an Excel file by means of Win 32 COM technology.
    For a given cell, I am able to read its formula. I want to make a map
    of how cells reference one another, how different sheets reference one
    another, how workbooks reference one another, etc.
    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?
    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    For example ? ?"=Book1!A5+8" should be parsed into ["=","Book1", "!",
    "A5","+","8"]
    Can anybody help? Any suggestions?
    Do you mean "how" or do you really mean "whether", ie, get a list of the
    other cells that are referred to by a certain cell, for example,
    "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5]
    Ok, although "Book1" would be the default name of a workbook, with
    default
    worksheets labeled "Sheet1". "Sheet2", etc.

    If I had a worksheet named "Sheety" that wanted to reference a cell on
    "Sheetx"
    OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to
    a completely
    different workbook (say Book1 with worksheets labeled "Sheet1",
    "Sheet2") then
    the cell might have =[Book1]Sheet1!A7.

    And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.
    Yes, Mensanator, but... what re should I use? I'm looking for the re
    statement. No doubt you can help!

    Thank you.
  • Vsoler at Jan 5, 2010 at 7:54 pm

    On 5 ene, 20:21, vsoler wrote:
    On 5 ene, 20:05, Mensanator wrote:


    On Jan 5, 12:35?pm, MRAB wrote:

    vsoler wrote:
    Hello,
    I am acessing an Excel file by means of Win 32 COM technology.
    For a given cell, I am able to read its formula. I want to make a map
    of how cells reference one another, how different sheets reference one
    another, how workbooks reference one another, etc.
    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?
    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    For example ? ?"=Book1!A5+8" should be parsed into ["=","Book1", "!",
    "A5","+","8"]
    Can anybody help? Any suggestions?
    Do you mean "how" or do you really mean "whether", ie, get a list of the
    other cells that are referred to by a certain cell, for example,
    "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5]
    Ok, although "Book1" would be the default name of a workbook, with
    default
    worksheets labeled "Sheet1". "Sheet2", etc.
    If I had a worksheet named "Sheety" that wanted to reference a cell on
    "Sheetx"
    OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to
    a completely
    different workbook (say Book1 with worksheets labeled "Sheet1",
    "Sheet2") then
    the cell might have =[Book1]Sheet1!A7.
    And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.
    Yes, Mensanator, but... ?what re should I use? I'm looking for the re
    statement. No doubt you can help!

    Thank you.

    Let me give you an example:
    import re
    re.split("([^0-9])", "123+456*/")
    [?123?, ?+?, ?456?, ?*?, ??, ?/?, ??]

    I find it excellent that one single statement is able to do a lexical
    analysis of an expression!

    If the expression contains variables, such as A12 or B9, I can try
    another re expression. Which one should I use?

    And if my expression contains parenthesis? And the sin() function?

    Vicente Soler
  • John Machin at Jan 7, 2010 at 12:53 am

    On Jan 6, 6:54?am, vsoler wrote:
    On 5 ene, 20:21, vsoler wrote:


    On 5 ene, 20:05, Mensanator wrote:
    On Jan 5, 12:35?pm, MRAB wrote:

    vsoler wrote:
    Hello,
    I am acessing an Excel file by means of Win 32 COM technology.
    For a given cell, I am able to read its formula. I want to make a map
    of how cells reference one another, how different sheets reference one
    another, how workbooks reference one another, etc.
    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?
    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    For example ? ?"=Book1!A5+8" should be parsed into ["=","Book1", "!",
    "A5","+","8"]
    Can anybody help? Any suggestions?
    Do you mean "how" or do you really mean "whether", ie, get a list of the
    other cells that are referred to by a certain cell, for example,
    "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5]
    Ok, although "Book1" would be the default name of a workbook, with
    default
    worksheets labeled "Sheet1". "Sheet2", etc.
    If I had a worksheet named "Sheety" that wanted to reference a cell on
    "Sheetx"
    OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to
    a completely
    different workbook (say Book1 with worksheets labeled "Sheet1",
    "Sheet2") then
    the cell might have =[Book1]Sheet1!A7.
    And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.
    Yes, Mensanator, but... ?what re should I use? I'm looking for the re
    statement. No doubt you can help!
    Thank you.
    Let me give you an example:
    import re
    re.split("([^0-9])", "123+456*/")
    [?123?, ?+?, ?456?, ?*?, ??, ?/?, ??]

    I find it excellent that one single statement is able to do a lexical
    analysis of an expression!
    That is NOT lexical analysis.
    If the expression contains variables, such as A12 or B9, I can try
    another re expression. Which one should I use?

    And if my expression contains parenthesis? ? And the sin() function?
    You need a proper lexical analysis, followed by a parser. What you
    are trying to do can NOT be accomplished in any generality with a
    single regex. The Excel formula syntax has several tricky bits. E.g.
    IIRC whether TAX09 is a (macro) name or a cell reference depends on
    what version of Excel you are targetting but if it appears like TAX09!
    A1:B2 then it's a sheet name.

    The xlwt package (of which I am the maintainer) has a lexer and parser
    for a largish subset of the syntax ... see http://pypi.python.org/pypi/xlwt
  • Chris Withers at Jan 12, 2010 at 7:26 am

    John Machin wrote:
    The xlwt package (of which I am the maintainer) has a lexer and parser
    for a largish subset of the syntax ... see http://pypi.python.org/pypi/xlwt
    xlrd, no?

    Also worth pointing out that the topic of Python and Excel has its own
    web site:

    http://www.python-excel.org

    ...which links to a more specialist group.

    Chris

    --
    Simplistix - Content Management, Batch Processing & Python Consulting
    - http://www.simplistix.co.uk
  • John Machin at Jan 12, 2010 at 12:35 pm

    On 12/01/2010 6:26 PM, Chris Withers wrote:
    John Machin wrote:
    The xlwt package (of which I am the maintainer) has a lexer and parser
    for a largish subset of the syntax ... see
    http://pypi.python.org/pypi/xlwt
    xlrd, no?
    A facility in xlrd to decompile Excel formula bytecode into a text
    formula is currently *under discussion*.

    The OP was planning to dig the formula text out using COM then parse the
    formula text looking for cell references and appeared to have a rather
    simplistic view of the ease of parsing Excel formula text -- that's why
    I pointed him at those facilities (existing, released, proven in the
    field) in xlwt.
  • Chris Withers at Jan 13, 2010 at 9:44 am

    John Machin wrote:
    The OP was planning to dig the formula text out using COM then parse the
    formula text looking for cell references and appeared to have a rather
    simplistic view of the ease of parsing Excel formula text -- that's why
    I pointed him at those facilities (existing, released, proven in the
    field) in xlwt.
    Which bits of xlwt are you referring to? (at a guess, the stuff that
    turns a piece of text into the correct formulae gubbinz when you write a
    formula to a cell with xlwt?)

    cheers,

    Chris

    --
    Simplistix - Content Management, Batch Processing & Python Consulting
    - http://www.simplistix.co.uk
  • MRAB at Jan 5, 2010 at 10:37 pm

    Mensanator wrote:
    On Jan 5, 12:35 pm, MRAB wrote:
    vsoler wrote:
    Hello,
    I am acessing an Excel file by means of Win 32 COM technology.
    For a given cell, I am able to read its formula. I want to make a map
    of how cells reference one another, how different sheets reference one
    another, how workbooks reference one another, etc.
    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?
    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    For example "=Book1!A5+8" should be parsed into ["=","Book1", "!",
    "A5","+","8"]
    Can anybody help? Any suggestions?
    Do you mean "how" or do you really mean "whether", ie, get a list of the
    other cells that are referred to by a certain cell, for example,
    "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5]
    Ok, although "Book1" would be the default name of a workbook, with
    default
    worksheets labeled "Sheet1". "Sheet2", etc.

    If I had a worksheet named "Sheety" that wanted to reference a cell on
    "Sheetx"
    OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to
    a completely
    different workbook (say Book1 with worksheets labeled "Sheet1",
    "Sheet2") then
    the cell might have =[Book1]Sheet1!A7.

    And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.
    I forgot about the dollars! In that case, the regex is:

    references = re.findall(r"\b((?:\w+!)?\$?[A-Za-z]+\$?\d+)\b", formula)
  • Tim Chase at Jan 5, 2010 at 7:49 pm

    vsoler wrote:
    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?

    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    Where things start getting ugly is when you have nested function
    calls, such as

    =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14,
    (Min(C1:C25)+3)*18,Max(B1:B25)))

    Regular expressions don't do well with nested parens (especially
    arbitrarily-nesting-depth such as are possible), so I'd suggest
    going for a full-blown parsing solution like pyparsing.

    If you have fair control over what can be contained in the
    formulas and you know they won't contain nested parens/functions,
    you might be able to formulate some sort of "kinda, sorta, maybe
    parses some forms of formulas" regexp.

    -tkc
  • Steve Holden at Jan 6, 2010 at 2:06 am

    Tim Chase wrote:
    vsoler wrote:
    Hence, I need to parse Excel formulas. Can I do it by means only of re
    (regular expressions)?

    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    Where things start getting ugly is when you have nested function calls,
    such as

    =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14,
    (Min(C1:C25)+3)*18,Max(B1:B25)))

    Regular expressions don't do well with nested parens (especially
    arbitrarily-nesting-depth such as are possible), so I'd suggest going
    for a full-blown parsing solution like pyparsing.

    If you have fair control over what can be contained in the formulas and
    you know they won't contain nested parens/functions, you might be able
    to formulate some sort of "kinda, sorta, maybe parses some forms of
    formulas" regexp.
    And don't forget about named ranges, which can reference cells without
    using anything but a plain identifier ...

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/
    Holden Web LLC http://www.holdenweb.com/
    UPCOMING EVENTS: http://holdenweb.eventbrite.com/
  • Paul McGuire at Jan 13, 2010 at 8:15 am

    On Jan 5, 1:49?pm, Tim Chase wrote:
    vsoler wrote:
    Hence, I need toparseExcel formulas. Can I do it by means only of re
    (regular expressions)?
    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    Where things start getting ugly is when you have nested function
    calls, such as

    ? ?=if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14,
    (Min(C1:C25)+3)*18,Max(B1:B25)))

    Regular expressions don't do well with nested parens (especially
    arbitrarily-nesting-depth such as are possible), so I'd suggest
    going for a full-blown parsing solution like pyparsing.

    If you have fair control over what can be contained in the
    formulas and you know they won't contain nested parens/functions,
    you might be able to formulate some sort of "kinda, sorta, maybe
    parses some forms of formulas" regexp.

    -tkc
    This might give the OP a running start:

    from pyparsing import (CaselessKeyword, Suppress, Word, alphas,
    alphanums, nums, Optional, Group, oneOf, Forward, Regex,
    operatorPrecedence, opAssoc, dblQuotedString)

    test1 = "=3*A7+5"
    test2 = "=3*Sheet1!$A$7+5"
    test3 = "=if(Sum(A1:A25)>42,Min(B1:B25), " \
    "if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25)))"

    EQ,EXCL,LPAR,RPAR,COLON,COMMA,DOLLAR = map(Suppress, '=!():,$')
    sheetRef = Word(alphas, alphanums)
    colRef = Optional(DOLLAR) + Word(alphas,max=2)
    rowRef = Optional(DOLLAR) + Word(nums)
    cellRef = Group(Optional(sheetRef + EXCL)("sheet") + colRef("col") +
    rowRef("row"))

    cellRange = (Group(cellRef("start") + COLON + cellRef("end"))
    ("range")
    cellRef )
    expr = Forward()

    COMPARISON_OP = oneOf("< = > >= <= != <>")
    condExpr = expr + COMPARISON_OP + expr

    ifFunc = (CaselessKeyword("if") +
    LPAR +
    Group(condExpr)("condition") +
    COMMA + expr("if_true") +
    COMMA + expr("if_false") + RPAR)
    statFunc = lambda name : CaselessKeyword(name) + LPAR + cellRange +
    RPAR
    sumFunc = statFunc("sum")
    minFunc = statFunc("min")
    maxFunc = statFunc("max")
    aveFunc = statFunc("ave")
    funcCall = ifFunc | sumFunc | minFunc | maxFunc | aveFunc

    multOp = oneOf("* /")
    addOp = oneOf("+ -")
    numericLiteral = Regex(r"\-?\d+(\.\d+)?")
    operand = numericLiteral | funcCall | cellRange | cellRef
    arithExpr = operatorPrecedence(operand,
    [
    (multOp, 2, opAssoc.LEFT),
    (addOp, 2, opAssoc.LEFT),
    ])

    textOperand = dblQuotedString | cellRef
    textExpr = operatorPrecedence(textOperand,
    [
    ('&', 2, opAssoc.LEFT),
    ])
    expr << (arithExpr | textExpr)

    import pprint
    for test in (test1,test2, test3):
    print test
    pprint.pprint( (EQ + expr).parseString(test).asList() )
    print


    Prints:

    =3*A7+5
    [[['3', '*', ['A', '7']], '+', '5']]

    =3*Sheet1!$A$7+5
    [[['3', '*', ['Sheet1', 'A', '7']], '+', '5']]

    =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)
    *18,Max(B1:B25)))
    ['if',
    ['sum', [['A', '1'], ['A', '25']], '>', '42'],
    'min',
    [['B', '1'], ['B', '25']],
    'if',
    ['sum', [['C', '1'], ['C', '25']], '>', '3.14'],
    [['min', [['C', '1'], ['C', '25']], '+', '3'], '*', '18'],
    'max',
    [['B', '1'], ['B', '25']]]


    -- Paul
  • Gabriel Genellina at Jan 14, 2010 at 3:05 am
    En Wed, 13 Jan 2010 05:15:52 -0300, Paul McGuire <ptmcg at austin.rr.com>
    escribi?:
    vsoler wrote:
    Hence, I need toparseExcel formulas. Can I do it by means only of re
    (regular expressions)?
    This might give the OP a running start:

    from pyparsing import (CaselessKeyword, Suppress, ...
    Did you build those parsing rules just by common sense, or following some
    actual specification?

    --
    Gabriel Genellina
  • John Machin at Jan 15, 2010 at 12:16 am

    On Jan 13, 7:15?pm, Paul McGuire wrote:
    On Jan 5, 1:49?pm, Tim Chase wrote:


    vsoler wrote:
    Hence, I need toparseExcel formulas. Can I do it by means only of re
    (regular expressions)?
    I know that for simple formulas such as "=3*A7+5" it is indeed
    possible. What about complex for formulas that include functions,
    sheet names and possibly other *.xls files?
    Where things start getting ugly is when you have nested function
    calls, such as
    ? ?=if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14,
    (Min(C1:C25)+3)*18,Max(B1:B25)))
    Regular expressions don't do well with nested parens (especially
    arbitrarily-nesting-depth such as are possible), so I'd suggest
    going for a full-blown parsing solution like pyparsing.
    If you have fair control over what can be contained in the
    formulas and you know they won't contain nested parens/functions,
    you might be able to formulate some sort of "kinda, sorta, maybe
    parses some forms of formulas" regexp.
    -tkc
    This might give the OP a running start:
    Unfortunately "this" will blow up after only a few paces; see
    below ...
    from pyparsing import (CaselessKeyword, Suppress, Word, alphas,
    ? ? alphanums, nums, Optional, Group, oneOf, Forward, Regex,
    ? ? operatorPrecedence, opAssoc, dblQuotedString)

    test1 = "=3*A7+5"
    test2 = "=3*Sheet1!$A$7+5"
    test2a ="=3*'Sheet 1'!$A$7+5"
    test2b ="=3*'O''Reilly''s sheet'!$A$7+5"

    test3 = "=if(Sum(A1:A25)>42,Min(B1:B25), " \
    ? ? ?"if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25)))"
    Many functions can take a variable number of args and they are not
    restricted to cell references e.g.

    test3a = "=sum(a1:a25,10,min(b1,c2,d3))"

    The arg separator is comma or semicolon depending on the locale ... a
    parser should accept either.

    EQ,EXCL,LPAR,RPAR,COLON,COMMA,DOLLAR = map(Suppress, '=!():,$')
    sheetRef = Word(alphas, alphanums)
    colRef = Optional(DOLLAR) + Word(alphas,max=2)
    rowRef = Optional(DOLLAR) + Word(nums)
    cellRef = Group(Optional(sheetRef + EXCL)("sheet") + colRef("col") +
    ? ? ? ? ? ? ? ? ? ? rowRef("row"))

    cellRange = (Group(cellRef("start") + COLON + cellRef("end"))
    ("range")
    ? ? ? ? ? ? ? ? | cellRef )

    expr = Forward()

    COMPARISON_OP = oneOf("< = > >= <= != <>")
    condExpr = expr + COMPARISON_OP + expr

    ifFunc = (CaselessKeyword("if") +
    ? ? ? ? ? LPAR +
    ? ? ? ? ? Group(condExpr)("condition") +
    that should be any expression; at run-time it expects a boolean (TRUE
    or FALSE) or a number (0 means false, non-0 means true). Text causes a
    #VALUE! error. Trying to subdivide expressions into conditional /
    numeric /text just won't work.

    ? ? ? ? ? COMMA + expr("if_true") +
    ? ? ? ? ? COMMA + expr("if_false") + RPAR)
    statFunc = lambda name : CaselessKeyword(name) + LPAR + cellRange +
    RPAR
    sumFunc = statFunc("sum")
    minFunc = statFunc("min")
    maxFunc = statFunc("max")
    aveFunc = statFunc("ave")
    funcCall = ifFunc | sumFunc | minFunc | maxFunc | aveFunc

    multOp = oneOf("* /")
    addOp = oneOf("+ -")
    needs power op "^"
    numericLiteral = Regex(r"\-?\d+(\.\d+)?")
    Sorry, that "-" in there is a unary minus operator. What about 1e23 ?
    operand = numericLiteral | funcCall | cellRange | cellRef
    arithExpr = operatorPrecedence(operand,
    ? ? [
    ? ? (multOp, 2, opAssoc.LEFT),
    ? ? (addOp, 2, opAssoc.LEFT),
    ? ? ])

    textOperand = dblQuotedString | cellRef
    textExpr = operatorPrecedence(textOperand,
    ? ? [
    ? ? ('&', 2, opAssoc.LEFT),
    ? ? ])
    Excel evaluates excessively permissively, and the punters are
    definitely not known for self-restraint. The above just won't work:
    2.3 & 4.5 produces text "2.34.5", while "2.3" + "4.5" produces number
    6.8.
  • Paul McGuire at Jan 15, 2010 at 4:41 am
    I never represented that this parser would handle any and all Excel
    formulas! But I should hope the basic structure of a pyparsing
    solution might help the OP add some of the other features you cited,
    if necessary. It's actually pretty common to take an incremental
    approach in making such a parser, and so here are some of the changes
    that you would need to make based on the deficiencies you pointed out:

    functions can have a variable number of arguments, of any kind of
    expression
    - statFunc = lambda name : CaselessKeyword(name) + LPAR + delimitedList
    (expr) + RPAR

    sheet name could also be a quoted string
    - sheetRef = Word(alphas, alphanums) | QuotedString("'",escQuote="''")

    add boolean literal support
    - boolLiteral = oneOf("TRUE FALSE")
    - operand = numericLiteral | funcCall | boolLiteral | cellRange |
    cellRef

    These small changes are enough to extend the parser to successfully
    handle the test2a, 2b, and 3a cases. (I'll add this to the pyparsing
    wiki examples, as it looks like it is a good start on a familiar but
    complex expression.)

    -- Paul
  • John Machin at Jan 15, 2010 at 7:23 am

    On Jan 15, 3:41?pm, Paul McGuire wrote:
    I never represented that this parser would handle any and all Excel
    formulas!
    ?But I should hope the basic structure of a pyparsing
    solution might help the OP add some of the other features you cited,
    if necessary. It's actually pretty common to take an incremental
    approach in making such a parser, and so here are some of the changes
    that you would need to make based on the deficiencies you pointed out:

    functions can have a variable number of arguments, of any kind of
    expression
    - statFunc = lambda name : CaselessKeyword(name) + LPAR + delimitedList
    (expr) + RPAR

    sheet name could also be a quoted string
    - sheetRef = Word(alphas, alphanums) | QuotedString("'",escQuote="''")

    add boolean literal support
    - boolLiteral = oneOf("TRUE FALSE")
    - operand = numericLiteral | funcCall | boolLiteral | cellRange |
    cellRef
    or a string literal ... you seem to have ignored the significant point
    that the binary operators don't have narrow type requirements of their
    args ("""2.3 & 4.5 produces text "2.34.5", while "2.3" + "4.5"
    produces number 6.8"""); your attempt to enforce particular types for
    args at compile-time is erroneous OVER-engineering.
  • John Machin at Jan 15, 2010 at 12:23 am

    On Jan 14, 2:05?pm, "Gabriel Genellina" wrote:
    En Wed, 13 Jan 2010 05:15:52 -0300, Paul McGuire <pt... at austin.rr.com> ?
    escribi?:
    vsoler wrote:
    Hence, I need toparseExcel formulas. Can I do it by means only of re
    (regular expressions)?
    This might give the OP a running start:
    from pyparsing import (CaselessKeyword, Suppress, ...
    Did you build those parsing rules just by common sense, or following some ?
    actual specification?
    Leave your common sense with the barkeep when you enter the Excel
    saloon; it is likely to be a hindrance. The specification is what
    Excel does.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
categoriespython
postedJan 5, '10 at 6:12p
activeJan 15, '10 at 7:23a
posts21
users10
websitepython.org

People

Translate

site design / logo © 2022 Grokbase