Grokbase Groups Camel users July 2012
Hello!

I am fetching files from an FTP server (several GB over the next years). The files are produced daily in directories named after the date, like

- 20120501
- 20120502
- ...

I have read-only access and I am not the only consumer. This means that they keep the last month or so on the server and I fetch on a daily basis. To avoid fetching files twice I want to use an IdempotentRepository implementation. I don't want to save each file in a database or in a text file because the service will run for years and this is just unnecessary data.

What I want to store is the last processed date only. This handles just the directories and would mean that I need some other strategy for the files. I could combine this approach with the default in-memory store. But let's just stick to the directories:

I read the directory sorted by file name. The IdempotentRepository is called by the FtpConsumer with

- start()
- contains() for every directory and file
- add() for files only

and that's it. No stop(), no confirm(). When I have errors, remove() is sometimes called. Since the repository is called only with a String (the full path), I have no information about whether I am dealing with a directory or a file. I know it from the structure, but I am not able to implement a generic solution.

Anyway the idea is:

- Store the LastProcessedDate inside the repository
- contains(): if the path contains an already processed date (<LastProcessedDate) then I skip it (return true), otherwise I return false.
- add(): if add() jumps to the next directory, I set the LastProcessedDate to the previous directory

The only problem is the last processed directory: even if it is finished I do not get the chance to mark it as processed (set LastProcessedDate to its value).
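
The idea above could be sketched roughly as follows. This is only an illustration: the class name `LastProcessedDateRepository` and the helper `dateOf` are hypothetical, and the class deliberately does not implement Camel's `IdempotentRepository<String>` so that it compiles without Camel on the classpath. It only borrows the `contains()` and `add()` method names. It also assumes that a directory equal to the stored date counts as finished (`<=`), and that date directories are exactly eight digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not a Camel class: same method names as
// IdempotentRepository<String>, but without the interface.
class LastProcessedDateRepository {
    // Assumption: a date directory is a path segment of exactly eight digits (yyyyMMdd).
    private static final Pattern DATE_DIR = Pattern.compile("(?:^|/)(\\d{8})(?:/|$)");

    private String lastProcessedDate = ""; // empty string sorts before every date
    private String currentDate;            // directory of the most recent add()

    // Returns the yyyyMMdd segment of a path, or null if there is none.
    static String dateOf(String path) {
        Matcher m = DATE_DIR.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    // true means "already processed, skip it".
    public boolean contains(String path) {
        String date = dateOf(path);
        // Assumption: a directory equal to the stored date is complete (<=).
        return date != null && date.compareTo(lastProcessedDate) <= 0;
    }

    // Called for files only; a change of directory closes the previous one.
    public boolean add(String path) {
        String date = dateOf(path);
        if (date == null) {
            return true;
        }
        if (currentDate != null && date.compareTo(currentDate) > 0) {
            lastProcessedDate = currentDate;
        }
        currentDate = date;
        return true;
    }

    public String getLastProcessedDate() {
        return lastProcessedDate;
    }
}
```

The sketch also shows the gap: the final directory of a poll stays in `currentDate` and is never promoted to `lastProcessedDate`, because no later add() arrives to close it.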

So finally my questions: do you think this approach makes sense and if yes: how would you deal with the last processed directory?
If no, how would you solve it?

Thanks and kind regards, Christian

  • Sam (Stephen Samuel) at Jul 6, 2012 at 9:18 am
    "This operation is used if the option eager has been enabled."

    Take a look at that, that might be why confirm is not being called.

    --
    -Sam
  • Christian Lipp at Jul 6, 2012 at 9:54 am
    Do you mean "eagerMaxMessagesPerPoll"?
    I set it to false, but nothing changed.

    In the Camel code I could only find:

    - contains() in GenericFileConsumer
    - add() and remove() in GenericFileOnCompletion

    Regards, CL

  • Marco Westermann at Jul 6, 2012 at 10:10 am
    Hi,

    I had a very similar problem. I wrote a little bean which writes the
    last number processed to a file. Another method gives me the last number
    processed.

    regards, Marco
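
A bean of that kind might look like the sketch below. The class name `LastProcessedStore` and its two method names are hypothetical; the thread does not show Marco's code. It keeps a single marker value in a small text file and overwrites it on every write, so the file never grows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical bean: stores one marker (for example "20120502") in a text file.
class LastProcessedStore {
    private final Path file;

    LastProcessedStore(Path file) {
        this.file = file;
    }

    // Overwrites the file with the latest marker.
    public void write(String marker) {
        try {
            Files.write(file, marker.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the stored marker, or null when nothing has been written yet.
    public String read() {
        if (!Files.exists(file)) {
            return null;
        }
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```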

  • Christian Lipp at Jul 6, 2012 at 11:27 am
    My problem is that the number (date) is in the directory, not in the file.
    So I only know that I am finished with a directory when the consumer moves on to a new one, which doesn't work for the last directory.

    Regards, Christian

  • Sam (Stephen Samuel) at Jul 6, 2012 at 11:14 am
    eagerMaxMessagesPerPoll needs to be set to true.

    --
    -Sam
  • Christian Lipp at Jul 6, 2012 at 11:31 am
    eagerMaxMessagesPerPoll is true by default, so setting it doesn't change anything.
    There was no difference between true and false.

    Regards, Christian

  • Claus Ibsen at Jul 6, 2012 at 1:04 pm
    Hi

    You can use a filter with some custom logic (e.g. a POJO) that
    returns true or false to accept the file.
    Then you don't need the idempotent pattern.

    And from Camel 2.10 onwards the filter is also invoked for
    directories, so you can skip traversing into directories you do not
    want.
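
The decision logic of such a filter could be sketched as below. `DateDirectoryFilter` is a hypothetical name, and the class is written against plain values so that it compiles without Camel. In a real route the same check would presumably sit inside a `GenericFileFilter` implementation and read values such as `file.isDirectory()` and `file.getFileNameOnly()` from the `GenericFile` it is given. The eight-digit directory naming is taken from the original post.

```java
// Hypothetical filter logic, kept free of Camel types for illustration.
class DateDirectoryFilter {
    private final String lastProcessedDate;

    DateDirectoryFilter(String lastProcessedDate) {
        this.lastProcessedDate = lastProcessedDate;
    }

    // name: the bare file or directory name; isDirectory: what the consumer reports.
    public boolean accept(String name, boolean isDirectory) {
        if (!isDirectory) {
            return true; // files inside an accepted directory pass
        }
        // Assumption: date directories are named yyyyMMdd; anything else is traversed.
        if (!name.matches("\\d{8}")) {
            return true;
        }
        // Only descend into directories newer than the stored date.
        return name.compareTo(lastProcessedDate) > 0;
    }
}
```

Because yyyyMMdd names sort chronologically as plain strings, no date parsing is needed for the comparison.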

    --
    Claus Ibsen
    -----------------
    FuseSource
    Email: cibsen@fusesource.com
    Web: http://fusesource.com
    Twitter: davsclaus, fusenews
    Blog: http://davsclaus.com
    Author of Camel in Action: http://www.manning.com/ibsen
  • Christian Lipp at Jul 8, 2012 at 8:27 am
    For the repository, contains() is called for directories and files, while add() is called only for files.
    But it is much easier to use a filter instead of a repository, as you recommended, so I switched to a filter.

    However, the original problem is still there. I mark directories as processed, since they carry the date information, and I do this when I switch to a new directory: because the input is sorted, I know that the old directory has been handled and I mark it as processed.

    But when I read the last directory I don't receive an "it is over" message inside the filter, so the next time the route is executed it copies the last directory again. I would like to solve this.

    Regards, Christian

  • Christian Lipp at Jul 9, 2012 at 8:43 pm
    I think using a filter is not possible since I cannot detect the end of the transmission. An implementation of IdempotentRepository should work because:

    - contains() works like filter.accept(): directories and files are both handed over, but since they are all strings I have to know the structure to distinguish between them. Either way, I can count (or even remember) the files.
    - add() only receives the files, not the directories. I know that I am finished when I have received all the files I accepted in contains().
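
The counting idea in these two points could be sketched like this. `PendingFileTracker` and `looksLikeFile` are hypothetical names, the class again only borrows the repository's method names rather than implementing the Camel interface, and telling files from directories by the presence of an extension is purely an assumption for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: remember every file seen in contains(), tick it off in add().
class PendingFileTracker {
    private final Set<String> pending = new HashSet<String>();
    private boolean sawFile;

    // Mirrors contains(): records file paths and never asks for a skip.
    public boolean contains(String path) {
        if (looksLikeFile(path)) {
            pending.add(path);
            sawFile = true;
        }
        return false;
    }

    // Mirrors add(): the file has been processed.
    public boolean add(String path) {
        pending.remove(path);
        return true;
    }

    // True once every file accepted in contains() has come back through add().
    public boolean isFinished() {
        return sawFile && pending.isEmpty();
    }

    // Assumption: files have an extension, date directories do not.
    static boolean looksLikeFile(String path) {
        String last = path.substring(path.lastIndexOf('/') + 1);
        return last.contains(".");
    }
}
```

One caveat: `isFinished()` is only meaningful if all contains() calls of a poll arrive before the add() calls. If the two interleave, the set can be empty for a moment while more files are still to come.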

    Anyway, I am still astonished that there is no better solution, or that it is not possible to be notified of the end of the FTP polling.
    Kind regards, Christian


  • Claus Ibsen at Jul 10, 2012 at 6:00 am

    The last exchange has a property that marks the end of the FTP
    polling. You can use that with onCompletion, an event notifier, etc. to
    know when the FTP poll has completed.

    Or use a custom poll strategy
    http://camel.apache.org/maven/current/camel-core/apidocs/org/apache/camel/spi/PollingConsumerPollStrategy.html
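
A sketch of the first suggestion follows. The reply does not name the property. To my knowledge Camel's batch consumers set `CamelBatchComplete` (the constant `Exchange.BATCH_COMPLETE`) to true on the last exchange of a poll, so that name is used here as an assumption. The class `BatchEndCheck` and its methods are hypothetical, and a plain `Map` stands in for the exchange properties so the example runs without Camel.

```java
import java.util.Map;

// Hypothetical helper: decides when the last directory may be checkpointed.
class BatchEndCheck {
    // Assumed property name, matching Camel's Exchange.BATCH_COMPLETE constant.
    static final String BATCH_COMPLETE = "CamelBatchComplete";

    // Stand-in for exchange.getProperty(Exchange.BATCH_COMPLETE, Boolean.class).
    static boolean isLastOfPoll(Map<String, Object> exchangeProperties) {
        return Boolean.TRUE.equals(exchangeProperties.get(BATCH_COMPLETE));
    }

    // Returns the directory to persist as processed, or null while the poll is still running.
    static String checkpointFor(Map<String, Object> exchangeProperties, String currentDirectory) {
        return isLastOfPoll(exchangeProperties) ? currentDirectory : null;
    }
}
```

Called from an onCompletion step, `checkpointFor` would return the date of the final directory exactly once per poll, which closes the gap left by the "mark on directory change" approach.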




