Hi,

I have a bookmark file from Opera 6.11 that contains a lot of duplicate
entries.

I would like to be able to remove all duplicate entries without
destroying the structure of the file.

I have tried this with a set of scripts that converted the file into a
format that could be sent through 'uniq', but somewhere the structure got
mangled and all my folder settings were lost.

An Opera bookmark file has the following syntax:
--begin-of-file--
Opera Hotlist version 2.0
Options: encoding = utf8, version=3

#FOLDER
NAME=software
CREATED=1025797561
ORDER=0

#URL
NAME=Arachnophilia Home Page
URL=http://www.arachnoid.com/arachnophilia/
CREATED=976878001
VISITED=1025962454
ORDER=0
-

#FOLDER
...
-

--end-of-file--
The lines at the top can easily be copied back should they get lost;
they're not much of a concern to me. But all '#FOLDER' blocks, all empty
lines and all lines containing a single '-' should be preserved.

The values of 'CREATED' and 'VISITED' can be ignored for the comparison,
and the value of 'ORDER' should be reset to 'ORDER='. (This way Opera
will regenerate the value of ORDER when the file is loaded.)
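What I mean is that, for comparison, each entry would be reduced to a
form like this (just a rough Perl sketch of the normalisation, using the
entry from the sample above):

#!/usr/bin/perl
use strict;
use warnings;

# One #URL block, taken verbatim from the sample file above.
my $block = join "",
    "#URL\n",
    "NAME=Arachnophilia Home Page\n",
    "URL=http://www.arachnoid.com/arachnophilia/\n",
    "CREATED=976878001\n",
    "VISITED=1025962454\n",
    "ORDER=0\n";

(my $key = $block) =~ s/^(?:CREATED|VISITED)=.*\n//mg;  # drop ignored fields
$key =~ s/^(ORDER=).*/$1/m;   # leave ORDER= empty; Opera refills it on load
print $key;                   # the form two entries would be compared on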

An additional problem I discovered yesterday is that Murphy's law
applies even to Opera's bookmark file... The bookmarks are sorted
alphabetically, but only by name, so I found some blocks like:
NAME=tripod
URL=http://www.tripod.com
...
NAME=tripod
URL=http://www.tripod.lycos.com
...
NAME=tripod
URL=http://www.tripod.com
...
NAME=tripod
URL=http://www.tripod.lycos.com
...

So the script would have to look back 2 blocks...

The script should therefore be able to read several lines into a few
variables, then modify and compare those variables to determine which
lines can be deleted...
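Something along these lines is what I picture (an untested sketch,
assuming Unix line endings and that every block is separated by a blank
line as in the sample above):

#!/usr/bin/perl
use strict;
use warnings;

local $/ = "";    # paragraph mode: read one blank-line-separated block
my %seen;

while (my $block = <>) {
    # Header lines, #FOLDER blocks and lone '-' lines pass through.
    unless ($block =~ /^#URL\b/m) {
        print $block;
        next;
    }
    # The comparison key ignores CREATED, VISITED and ORDER.
    (my $key = $block) =~ s/^(?:CREATED|VISITED|ORDER)=.*\n?//mg;
    # NB: if a duplicate carries the folder-closing '-' on its last
    # line (as in the sample above), this simple skip would lose it.
    next if $seen{$key}++;
    $block =~ s/^(ORDER=).*/$1/m;     # reset ORDER; Opera refills it
    print $block;
}

I would run it as something like "perl dedup.pl bookmarks.adr >
bookmarks.new" (names made up) and inspect the result before letting
Opera near it.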

Any suggestions?

TIA
--
# Mertens Bram "M8ram" <m8ram.list@wanadoo.be> Linux User #249103 #
# Red Hat Linux release 7.3 (Valhalla) kernel 2.4.18-3 i686 128MB RAM #
# 11:24pm up 9 days, 3:38, 1 user, load average: 0.75, 0.89, 0.73 #

  • Wagner, David --- Senior Programmer Analyst --- WGO at Jan 19, 2003 at 11:31 pm

    Mertens Bram wrote:
    > [snip]
    > I would like to be able to remove all duplicate entries without
    > destroying the structure of the file.
    > [snip]

    It appears that it should not be too hard to do this. I would like to
    see a file with a few more entries for folders and URLs.

    Are there other fields that you have not included in the definitions
    of either folders or URLs?

    Do you take the first occurrence of a URL or the last?

    How does a NAME get set? Could one name be 'atripod' and another be
    'ztripod' for www.tripod.com?

    With more info I believe the list should be able to assist you in the
    cleanup.

    Wags ;)


  • Mertens Bram at Jan 20, 2003 at 11:06 am

    On Sun, 2003-01-19 at 23:31, Wagner, David --- Senior Programmer Analyst --- WGO wrote:
    > Mertens Bram wrote:
    > > [snip]
    > > I would like to be able to remove all duplicate entries without
    > > destroying the structure of the file.
    > > [snip]
    > It appears that it should not be too hard to do this. I would like to
    > see a file with a few more entries for folders and URLs.

    Do you mean you would like to see a larger portion of the file? I can
    send it off-list to you if you want...

    > Are there other fields that you have not included in the definitions
    > of either folders or URLs?

    No, folders always have the NAME, CREATED and ORDER fields. URLs
    always have the NAME, URL, CREATED, VISITED and ORDER fields.

    The URLs should be compared based on the values of NAME and URL. The
    values of CREATED, VISITED and ORDER should be ignored for the
    comparison. The values of CREATED and VISITED can be retained from the
    URLs that are not duplicates, but the ORDER field should be reset.
    Otherwise Opera might consider the file corrupted and overwrite it
    with its backup.

    > Do you take the first occurrence of a URL or the last?

    Whichever is easier; if they are duplicates I don't mind, as the
    CREATED, VISITED and ORDER fields are of little importance to me.

    > How does a NAME get set? Could one name be 'atripod' and another be
    > 'ztripod' for www.tripod.com?

    If I somehow assigned two names to the same URL I don't mind deleting
    those manually afterwards. I still have to go through the file
    manually later anyhow to put some of the URLs into other folders.

    Right now I would like to remove the duplicates per folder. Rob's
    suggestion works fine but it doesn't preserve the syntax of the
    bookmark file.
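    To make "per folder" concrete, something like this is what I have in
    mind: forget the seen entries at every #FOLDER block and treat NAME
    plus URL as the key (an untested sketch, assuming every block is
    separated by a blank line):

    #!/usr/bin/perl
    use strict;
    use warnings;

    local $/ = "";    # paragraph mode: one blank-line-separated block
    my %seen;

    while (my $block = <>) {
        %seen = () if $block =~ /^#FOLDER\b/m;   # new folder: start afresh
        unless ($block =~ /^#URL\b/m) {
            print $block;                        # folders, headers, lone '-'
            next;
        }
        my ($name) = $block =~ /^NAME=(.*)$/m;
        my ($url)  = $block =~ /^URL=(.*)$/m;
        next if $seen{"$name\0$url"}++;          # NAME+URL marks a duplicate
        $block =~ s/^(ORDER=).*/$1/m;            # reset ORDER for Opera
        print $block;
    }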

    TIA
    --
    # Mertens Bram "M8ram" <m8ram.list@wanadoo.be> Linux User #249103 #
    # Red Hat Linux release 7.3 (Valhalla) kernel 2.4.18-3 i686 128MB RAM #
    # 11:50am up 9 days, 16:03, 1 user, load average: 0.25, 0.16, 0.16 #
  • Rob Dixon at Jan 20, 2003 at 11:55 am

    Mertens Bram wrote:
    > On Sun, 2003-01-19 at 23:31, Wagner, David --- Senior Programmer
    > Analyst --- WGO wrote:
    > > Are there other fields that you have not included in the
    > > definitions of either folders or URLs?
    > No, folders always have the NAME, CREATED and ORDER fields. URLs
    > always have the NAME, URL, CREATED, VISITED and ORDER fields.
    > The URLs should be compared based on the values of NAME and URL.

    Are you sure? The naming is arbitrary (although it defaults to the
    HTML title) and identical URLs could be named differently.

    > The values of CREATED, VISITED and ORDER should be ignored for the
    > comparison. The values of CREATED and VISITED can be retained from
    > the URLs that are not duplicates, but the ORDER field should be
    > reset. Otherwise Opera might consider the file corrupted and
    > overwrite it with its backup.
    > > How does a NAME get set? Could one name be 'atripod' and another
    > > be 'ztripod' for www.tripod.com?
    > If I somehow assigned two names to the same URL I don't mind
    > deleting those manually afterwards. I still have to go through the
    > file manually later anyhow to put some of the URLs into other
    > folders.

    It would be easier to use Opera's Manage bookmarks facility to drag
    and drop them into place.

    > Right now I would like to remove the duplicates per folder.

    Per folder? That means you don't mind duplicate URLs across folders?

    > Rob's suggestion works fine but it doesn't preserve the syntax of
    > the bookmark file.

    It preserves it OK; its problem is that it doesn't touch the file at
    all!

    A solution which edits the file for you may be a few hours' work. Are
    there so many duplicates that you don't want to edit them by hand, or
    will you want to do this again many times in the future? If not, then
    I suggest that you stick to manual editing.

    Cheers,

    Rob
  • Rob Dixon at Jan 20, 2003 at 11:38 am

    Wagner, David --- Senior Programmer Analyst --- WGO wrote:
    > It appears that it should not be too hard to do this. I would like
    > to see a file with a few more entries for folders and URLs.

    The syntax is simply a series of #FOLDER blocks, each followed by zero
    or more #URL blocks. Blocks of each type always contain the same
    fields, those that Mertens has described. Every block of either type
    is terminated by a blank line. A #FOLDER block with all of its #URL
    blocks is followed additionally by a line containing just a single
    hyphen. All lines are terminated by "\r\n".
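    To make that layout concrete, a reader for it might look something
    like this (just a sketch: the filename is an assumption, and the
    :crlf layer turns the "\r\n" endings into plain "\n" while reading):

    use strict;
    use warnings;

    open my $fh, '<:crlf', 'Opera6.adr' or die "Cannot open bookmarks: $!";

    my @folders;    # each element: { name => ..., urls => [ {...}, ... ] }
    my %block;      # fields of the block currently being read

    while (my $line = <$fh>) {
        chomp $line;
        if ($line eq '') {                     # a blank line ends a block
            if (($block{type} || '') eq 'FOLDER') {
                push @folders, { name => $block{NAME}, urls => [] };
            }
            elsif (($block{type} || '') eq 'URL' and @folders) {
                push @{ $folders[-1]{urls} }, { %block };
            }
            %block = ();
        }
        elsif ($line =~ /^#(FOLDER|URL)$/) { $block{type} = $1 }
        elsif ($line =~ /^(\w+)=(.*)$/)    { $block{$1}   = $2 }
        # a lone '-' closes a folder; this sketch can simply ignore it
        # (a final flush would be needed if the file lacks a trailing
        # blank line)
    }
    close $fh;

    printf "%s: %d bookmarks\n", $_->{name}, scalar @{ $_->{urls} }
        for @folders;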

    HTH,

    Rob
  • Rob Dixon at Jan 20, 2003 at 12:03 am

    Mertens Bram wrote:
    > Hi,
    >
    > I have a bookmark file from Opera 6.11 that contains a lot of
    > duplicate entries.
    >
    > I would like to be able to remove all duplicate entries without
    > destroying the structure of the file.

    This is something that you could do to any level of complexity. First of
    all, hashes are good for finding duplicates. I would scan the file for
    all 'URL=' lines and increment the value of a hash keyed on that URL.
    For instance:

    my %urls;
    open BMK, "< Opera6.adr" or die "Unable to open bookmarks: $!";
    while (<BMK>)
    {
        chomp;
        next unless /URL=(.+)/;   # only the URL= lines matter here
        $urls{$1}++;              # count how often each URL occurs
    }
    close BMK;

    # Print each URL that occurs more than once.
    foreach (sort keys %urls) { print "$_\n" if $urls{$_} > 1 }

    This will list all of the URLs that occur more than once. You can use
    this list to edit the file manually, or you may want to go on to
    improve the script to the point where it digests the entire file and
    generates a new one.

    Get it working this far first!
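    When you do get to that stage, the rewriting pass might take a shape
    like this (only a sketch, untested; the output filename is my own
    invention, and the :crlf layers are there because the file's lines
    end in "\r\n"):

    my %printed;
    local $/ = "";    # paragraph mode: one blank-line-separated block

    open BMK, '<:crlf', 'Opera6.adr'     or die "Unable to open bookmarks: $!";
    open NEW, '>:crlf', 'Opera6.adr.new' or die "Unable to create output: $!";

    while (my $block = <BMK>) {
        if (my ($url) = $block =~ /^URL=(.*)$/m) {
            next if $printed{$url}++;       # keep only the first occurrence
            $block =~ s/^(ORDER=).*/$1/m;   # reset ORDER for Opera to refill
        }
        # NB: if a duplicate is the last entry of a folder, the '-'
        # terminator glued to its block would need rescuing first.
        print NEW $block;
    }
    close NEW;
    close BMK;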

    Cheers,

    Rob
