
[Neo4j] Performance problem with BatchInsert

Tcb
Oct 19, 2012 at 4:02 pm
Hi,

I have run into some issues with the BatchInserter, and given the recent
attention on the list I hope you can shed some light.

I am importing a large dataset using a simple variation on Michael
Hunger's importer (github.com/jexp/batch-import). I have a smallish number
of nodes (~10^7), which pretty much fit in memory, and lots of relationships
(~10^10). The batch insertion runs along at a good pace and then dramatically
slows down (roughly 10 times slower); I can see that CPU usage has dropped
but neo4j is doing a lot of disk I/O.

The question is why the insertion slows down. I have set
"cache_type=none", so once the nodes are loaded (which happens really fast)
there is no need to cache any relationships- all I am doing is inserting
them into the database. I obviously don't have enough memory to hold the
full graph, but it's not obvious why more memory is needed, beyond just
holding the nodes themselves.

In the neo directory the stores look like this (still importing at a
snail's pace...):

-rw-r--r-- 1 user user 5 Oct 19 14:28
neo.db/neostore.relationshiptypestore.db
-rw-r--r-- 1 user user 4.7G Oct 19 16:58
neo.db/neostore.relationshipstore.db
-rw-r--r-- 1 user user 36G Oct 19 16:58 neo.db/neostore.propertystore.db
-rw-r--r-- 1 user user 103M Oct 19 16:58 neo.db/neostore.nodestore.db



My current configuration parameters are below. I am using the 1.9-SNAPSHOT
on OS X 10.8.2.

stringMap("dump_configuration", "true"//
, "use_memory_mapped_buffers", "true"//
, "cache_type", "none" //
, "neostore.nodestore.db.mapped_memory", "128M"//
, "neostore.relationshipstore.db.mapped_memory", "2048M"//
, "neostore.propertystore.db.mapped_memory", "1024M"//
, "neostore.propertystore.db.strings.mapped_memory", "10M"//
, "neostore.propertystore.db.arrays.mapped_memory", "10M"//
, "neostore.propertystore.db.index.keys.mapped_memory", "10M"//
, "neostore.propertystore.db.index.mapped_memory", "10M"//
);


I suspect the problem is that I do not fully understand the configuration
parameters, and perhaps tuning them better would improve the situation for
me.

thanks

-


4 responses

  • Tcb at Oct 20, 2012 at 9:48 am
    Hi

    Just looking at a profile, it seems that a good chunk of the time is spent
    in sun.nio.ch.FileChannelImpl.read(). This ultimately gets called once for
    every relationship that is created. The corresponding nio write of the
    relationship is actually very quick
    ("sun.nio.ch.FileChannelImpl.write()","0.23733096","1446795","14"). I don't
    know why the read() takes so long- I would expect nio to scale very well to
    files this size and larger. Perhaps I'll try importing on a Linux box and
    see if there is any difference in performance. Any other suggestions?

    thanks

    -


    "Call Tree - Method","Time [%]","Time","Time (CPU)","Invocations",
    " All threads","100.0","609610719","1"
    " main","100.0","609610719","1"
    " org.neo.user.NeoImport.main()","100.0","609610719","2"
    " org.neo.user.NeoImport.doMainNew()","100.0","609610719","2"
    " org.neo.user.NeoImport.run()","100.0","609610719","2"
    " org.neo.user.NeoImport.importRels()","100.0","609610719","2"
    "
    org.neo4j.unsafe.batchinsert.BatchInserterImpl.createRelationship()","94.842926","578172623","293"
    "
    org.neo4j.unsafe.batchinsert.BatchInserterImpl.connectRelationship()","65.519844","399415993","863"
    "
    org.neo4j.unsafe.batchinsert.BatchInserterImpl.connect()","65.519844","399415993","1641"
    "
    org.neo4j.kernel.impl.nioneo.store.RelationshipStore.getRecord()","64.98595","396161289","1635"
    "
    org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.acquireWindow()","44.379433","270541769","1605"
    "
    org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.acquire()","44.379433","270541769","1605"
    "
    org.neo4j.kernel.impl.nioneo.store.PersistenceRow.lock()","44.13959","269079690","1597"
    "
    org.neo4j.kernel.impl.nioneo.store.PersistenceRow$State$1.transition()","44.13959","269079690","1597"
    "
    org.neo4j.kernel.impl.nioneo.store.PersistenceRow.readFullWindow()","44.13959","269079690","1597"
    "
    sun.nio.ch.FileChannelImpl.read()","44.13959","269079690","1597"
    " Self time","0.0","0","1597"
    " Self time","0.0","0","1597"
    " Self time","0.0","0","1597"
    " Self time","0.16926737","1031872","1605"
    "
    java.util.concurrent.ConcurrentHashMap.putIfAbsent()","0.070570774","430207","5"
    " Self time","0.0","0","1605"
    " Self time","20.593021","125537264","1635"
    "
  • Michael Hunger at Oct 20, 2012 at 10:13 pm
    How much RAM do you have in total?

    How many properties do you currently store on the relationships, and what
    is their content?

    The problem is that it has to read a lot of different relationship-store
    windows (segments of the relationship file) to update all the relationship
    linked lists, and if those windows are swapped frequently, their initial
    loading/memory-mapping takes a while, which adds up.
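
    For a rough sense of scale, here is a back-of-envelope sketch (the
    ~33-byte relationship record size of the 1.x store format is an
    assumption on my part):

    // Relationship-store size estimate; the record size is assumed.
    long numRels = 10000000000L;            // ~10^10 relationships
    long recordSize = 33;                   // bytes per relationship record
    long storeBytes = numRels * recordSize; // ~307 GiB of relationship store
    System.out.println(storeBytes / (1L << 30) + " GiB to map it all");
    // Far more than any mmio setting can cover on a 16GB machine, so
    // windows are constantly evicted and re-read.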

    Would it be possible for you to:

    #1 increase the mmio setting for relationships?
    #2 pre-order the relationships by from- and to-node pairs in your input
    data? (a sketch follows below)
    #3 do you run your Mac on SSDs?
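
    A minimal sketch of #2 (Edge and the surrounding names are illustrative,
    not Neo4j API):

    import java.util.*;

    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.unsafe.batchinsert.BatchInserter;

    // Assumed helper type holding one input relationship.
    class Edge {
        final long from, to;
        Edge(long from, long to) { this.from = from; this.to = to; }
    }

    class SortedInsert {
        static void insertSorted(BatchInserter inserter, List<Edge> edges,
                Map<String, Object> properties) {
            // Sort by (from, to) so consecutive inserts touch nearby
            // relationship-store windows instead of random ones.
            Collections.sort(edges, new Comparator<Edge>() {
                public int compare(Edge a, Edge b) {
                    if (a.from != b.from) return Long.compare(a.from, b.from);
                    return Long.compare(a.to, b.to);
                }
            });
            for (Edge e : edges) {
                inserter.createRelationship(e.from, e.to,
                        DynamicRelationshipType.withName("contacts"), properties);
            }
        }
    }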

    Looks like an interesting use case. Do you have a data generator for your
    data?

    Michael

  • Tcb at Oct 21, 2012 at 8:06 am

    On Sat, Oct 20, 2012 at 11:13 PM, Michael Hunger wrote:

    > How much RAM do you have in total?

    I'm currently running on a machine with 16GB of RAM. I need to be able to
    run the import under a memory constraint, since I will need to import a
    much larger dataset in the future.

    > How many properties do you currently store on the relationships and
    > what is their content?
    > The problem is that it has to read a lot of different relationship-store
    > windows (segments of the relationship file) to update all the
    > relationship linked lists, and if those windows are swapped frequently,
    > their initial loading/memory-mapping takes a while, which adds up.

    I am storing 6 ints per relationship. I don't know what these relationship
    linked lists are- do you mean neo keeps a list of relationships for each
    start and end node? I don't really need the gory details, but perhaps
    there is a better way to structure the input data so that as little
    loading/memory-swapping as possible is done?
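
    As a sanity check on my own numbers (assuming ~41-byte property records,
    one per property- an assumption, but it matches the file sizes in my
    first mail):

    long relStoreBytes = (long) (4.7 * (1L << 30)); // 4.7G relationship store
    long rels = relStoreBytes / 33;                 // ~153M relationships so far
    long propBytes = rels * 6 * 41;                 // 6 properties x ~41 bytes
    // ~37.6e9 bytes, close to the observed 36G property store.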

    > Would it be possible for you to:
    >
    > #1 increase the mmio setting for relationships?

    Yes, a little- but I don't have nearly enough memory to cover the number
    of relationships I need to import.

    > #2 pre-order the relationships by from- and to-node pairs in your input
    > data?

    Yes- it was somewhat tricky, but I have the list of relationships ordered
    by from-to node pairs. I'm not sure it makes much difference, since the
    number of nodes is small enough to be cached.

    > #3 do you run your Mac on SSDs?

    No- there is the cost and the size of the data... but is changing to SSDs
    likely to help at all? SSDs will be faster for sure, but I don't mind that
    it takes some time to import; it's that the importing seems to
    progressively slow down beyond some size.

    My nodestore is only 103M, so all the nodes should be easily cached. The
    data is a communication network- a list of a->contacts->b, and the
    relationships carry properties like contact time and duration, etc. I
    don't know what neo is doing when I call createRelationship(a, b, type,
    properties)- is it possible that it has to load all the relationships from
    a->b in order to create a new one? That should be easy enough, since I
    have the relationships ordered by a->b. But if it also has to load all the
    relationships from b too, then there could be a problem.
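
    To make the question concrete, here is a toy model of the store layout as
    I understand it (an assumption on my part, not actual Neo4j code), showing
    why creating a->b would touch the current chain head of both endpoints:

    import java.util.HashMap;
    import java.util.Map;

    // Toy model: each node points at the head of a linked chain of its
    // relationship records, and a new record is spliced onto the chains
    // of *both* endpoints, so the current head record of a AND of b must
    // be read and rewritten. Input sorted by a still reads randomly on b.
    public class ChainToy {
        static class RelRecord {
            long id, start, end;
            long startPrev = -1, startNext = -1; // chain pointers at start node
            long endPrev = -1, endNext = -1;     // chain pointers at end node
        }

        static Map<Long, Long> chainHead = new HashMap<Long, Long>();
        static Map<Long, RelRecord> store = new HashMap<Long, RelRecord>();
        static long nextId = 0;

        static void connect(long node, RelRecord rel) {
            Long headId = chainHead.get(node);
            if (headId != null) {
                RelRecord head = store.get(headId); // random read of the old head
                if (head.start == node) head.startPrev = rel.id;
                else head.endPrev = rel.id;         // rewrite old head with new prev
                if (rel.start == node) rel.startNext = headId;
                else rel.endNext = headId;
            }
            chainHead.put(node, rel.id);            // new record becomes the head
        }

        static void createRelationship(long a, long b) {
            RelRecord rel = new RelRecord();
            rel.id = nextId++;
            rel.start = a;
            rel.end = b;
            connect(a, rel); // sequential if input is sorted by a
            connect(b, rel); // effectively a random access on b's side
            store.put(rel.id, rel);
        }

        public static void main(String[] args) {
            createRelationship(1, 2);
            createRelationship(1, 3);
            createRelationship(2, 3);
            System.out.println(store.size() + " relationships in the toy store");
        }
    }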

    My data is not too different from a random (Erdős-Rényi) graph, where you
    have N nodes and M edges and each node is connected to some random
    fraction z of the other nodes. I would expect others to have similarly
    large networks in neo- if so, how did you manage to import the data?

    > Looks like an interesting use case. Do you have a data generator for
    > your data?

    I have real data which I can't share (apart from the size of it)- but I'll
    see if I can make a simple generator which reproduces something like it.


    thanks

    -

  • Tcb at Oct 22, 2012 at 10:49 am

    Hi,

    It seems that my mmio setting for the relationship store was too small-
    adjusting it to cover the required number of relationships improved import
    performance a lot. It still slows down after a bit, but I think it will do
    for now. If I can just get the data into neo I can get on with other stuff.
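
    The arithmetic behind that, for the test run below (the ~33-byte
    relationship record size is my assumption about the 1.x format):

    long rels = 300000000L;       // relationships in the generator run below
    long storeBytes = rels * 33L; // ~9.9e9 bytes, about 9.2 GiB
    // The "7G" relationship mmio setting in the generator covers most,
    // but not all, of the store- consistent with the import still slowing
    // down a bit near the end.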

    Here is a simple model generator for my data. It uses the batch inserter
    to create relationships a->b between nodes, ordered by a and then b, with
    some integer properties on each relationship. It's not a perfect model,
    but it's close enough, and it may serve as a good test of import
    performance.

    ///
    /// I am running it like this: java -jar importer.jar neo.db 12000000 300000000
    package ie.ucd.neo;

    import static org.neo4j.helpers.collection.MapUtil.map;
    import static org.neo4j.helpers.collection.MapUtil.stringMap;

    import java.io.File;
    import java.io.IOException;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;
    import java.util.logging.Logger;

    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.kernel.impl.util.FileUtils;
    import org.neo4j.unsafe.batchinsert.BatchInserter;
    import org.neo4j.unsafe.batchinsert.BatchInserters;

    public class Importer {
        private static Logger logger = Logger.getLogger("ie.ucd.neo");
        private BatchInserter graphDb = null;
        private static Report report = null;
        static Importer importer = null;
        private int numberNodes, numberRels;
        static String storeDir = null;
        private static Random random = new Random();

        // Minimal progress-report interface, matching how it is used below.
        interface Report {
            void reset();
            void finish();
            void dots();
            void finishImport(String type);
        }

        static class StdOutReport implements Report {
            private long batch, dots, count;
            private long total = System.currentTimeMillis(), time, batchTime;

            public StdOutReport(long batch, int dots) {
                this.batch = batch;
                this.dots = batch / dots;
            }

            @Override
            public void reset() {
                count = 0;
                batchTime = time = System.currentTimeMillis();
            }

            @Override
            public void finish() {
                logger.info("\nTotal import time: "
                        + (System.currentTimeMillis() - total) / 1000 + " seconds ");
            }

            @Override
            public void dots() {
                // Print a dot every (batch / dots) inserts, and a full
                // progress line with the store file sizes every batch.
                if ((++count % dots) != 0)
                    return;
                System.out.print(".");
                if ((count % batch) != 0)
                    return;
                long now = System.currentTimeMillis();
                long propertystore_l = new File(storeDir
                        + "/neostore.propertystore.db").length();
                long relationshipstore_l = new File(storeDir
                        + "/neostore.relationshipstore.db").length();
                long nodestore_l = new File(storeDir
                        + "/neostore.nodestore.db").length();
                System.out.println("\nImport: " + batch + " " + count + " "
                        + (now - batchTime) + " "
                        + Runtime.getRuntime().freeMemory() + " "
                        + nodestore_l + " " + relationshipstore_l + " "
                        + propertystore_l);
                batchTime = now;
            }

            @Override
            public void finishImport(String type) {
                System.out.println("\nImporting " + count + " " + type + " took "
                        + (System.currentTimeMillis() - time) / 1000 + " seconds ");
            }
        }

        public Importer(File dbPath, Integer numberNodes, Integer numberRels) {
            this.graphDb = BatchInserters.inserter(dbPath.getAbsolutePath(),
                    getConfigBatch());
            storeDir = graphDb.getStoreDir();
            report = this.createReport();
            this.numberNodes = numberNodes;
            this.numberRels = numberRels;
        }

        public void importNodes() {
            report.reset();
            for (int i = 1; i < numberNodes + 1; i++) {
                graphDb.createNode((long) i, map("key", i));
            }
            report.finishImport("importNodes()");
        }

        public void importRels() {
            int edgeCount = numberRels / numberNodes;
            RelationshipType relType = DynamicRelationshipType.withName("contacts");
            Map<String, Object> properties = new HashMap<String, Object>();
            report.reset();
            int[] contacts = new int[edgeCount];
            for (int startNode = 1; startNode < numberNodes + 1; startNode++) {
                // Add one to exclude the reference node- this chooses
                // edgeCount other nodes, possibly with duplicates.
                for (int k = 0; k < edgeCount; ++k) {
                    contacts[k] = 1 + random.nextInt(numberNodes);
                }
                Arrays.sort(contacts);
                for (int endNode : contacts) {
                    properties.put("prop_a", random.nextLong());
                    properties.put("prop_b", random.nextLong());
                    properties.put("prop_c", random.nextLong());
                    properties.put("prop_d", random.nextLong());
                    properties.put("prop_e", random.nextLong());
                    properties.put("prop_f", random.nextLong());
                    graphDb.createRelationship((long) startNode, (long) endNode,
                            relType, properties);
                    report.dots();
                }
            }
            report.finishImport("importRels");
        }

        private Map<String, String> getConfigBatch() {
            return stringMap(//
                    "dump_configuration", "true"//
                    , "use_memory_mapped_buffers", "true"//
                    , "cache_type", "none" //
                    , "neostore.nodestore.db.mapped_memory", "128M"//
                    , "neostore.relationshipstore.db.mapped_memory", "7G"//
                    , "neostore.propertystore.db.mapped_memory", "1024M"//
                    , "neostore.propertystore.db.strings.mapped_memory", "10M"//
                    , "neostore.propertystore.db.arrays.mapped_memory", "10M"//
                    , "neostore.propertystore.db.index.keys.mapped_memory", "10M"//
                    , "neostore.propertystore.db.index.mapped_memory", "10M"//
            );
        }

        protected StdOutReport createReport() {
            return new StdOutReport(1000 * 1000, 10);
        }

        private void shutdown() {
            System.out.println("shutting down...");
            graphDb.shutdown();
        }

        public static void main(String[] args) throws IOException {
            if (args.length < 3) {
                System.err.println("Usage: java -jar importer.jar <storeDir> <numberNodes> <numberRels>");
                System.exit(1);
            }
            File dbPath = new File(args[0]);
            Integer numberNodes = Integer.parseInt(args[1]);
            Integer numberRels = Integer.parseInt(args[2]);
            if (dbPath.exists()) {
                FileUtils.deleteRecursively(dbPath);
            }

            registerShutdownHook();
            importer = new Importer(dbPath, numberNodes, numberRels);
            importer.importNodes();
            importer.importRels();
        }

        private static void registerShutdownHook() {
            // Registers a shutdown hook so the store shuts down nicely when
            // the VM exits (even if you "Ctrl-C" the running import before
            // it's completed).
            Runtime.getRuntime().addShutdownHook(new Thread() {
                @Override
                public void run() {
                    if (importer != null) {
                        logger.info("shutdownHook: shutting down");
                        importer.shutdown();
                    }
                }
            });
        }
    }
    ///
