[
https://issues.apache.org/jira/browse/HADOOP-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated HADOOP-1687:
----------------------------------------
Description:
I've done some estimates on how much space our data structures take on the name-node per block, file and directory.
Brief overview of the data structures:
Directory tree (FSDirectory) is built of inodes. Each INode points either to an array of blocks
if it corresponds to a file or to a TreeMap<String, INode> of children INodes if it is a directory.
[Note: this estimates were made before Dhruba replaced the children TreeMap by ArrayList.]
Each block participates also in at least 2 more data structures.
BlocksMap contains a HashMap<Block, BlockInfo> of all blocks mapping a Block into a BlockInfo.
DatanodeDescriptor contains a TreeMap<Block, Block> of all blocks belonging to this data-node.
A block may or may not be contained also in other data-structures, like
{code}
UnderReplicatedBlocks
PendingReplicationBlocks
recentInvalidateSets
excessReplicateMap
{code}
Presence of a block in any of these structures is temporary and therefore I do not count them in my estimates.
The estimates can be viewed as lower bounds.
These are some classes that we are looking at here
{code}
class INode {
String name;
INode parent;
TreeMap<String, INode> children;
Block blocks[];
short blockReplication;
long modificationTime;
}
class Block {
long blkid;
long len;
}
class BlockInfo {
FSDirectory.INode inode;
DatanodeDescriptor[] nodes;
Block block;
}
{code}
The calculations are made for a 64-bit java vm based on that
Reference size = 8 bytes
Object header size = 16 bytes
Array header size = 24 bytes
Commonly used objects:
TreeMap.Entry = 64 bytes. It has 5 reference fields
HashMap.Entry = 48 bytes. It has 3 reference fields
String header = 64 bytes.
The size of a file includes:
# Size of an empty file INode: INode.children = null, INode.blocks is a 0-length array, and file name is empty. (152 bytes)
# A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
# file name length times 2, because String represents each name character by 2 bytes.
# Reference to the outer FSDirectory class (8 bytes)
The total: 224 + 2 * fileName.length
The size of a directory includes:
# Size of an empty directory INode: INode.children is an empty TreeMap, INode.blocks = null, and file name is empty. (192 bytes)
# A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
# file name length times 2
# Reference to the outer FSDirectory class (8 bytes)
The total: 264 + 2 * fileName.length
The size of a block includes:
# Size of Block. (32 bytes)
# Size of BlockInfo. (64 + 8*replication bytes)
# Reference to the block from INode.blocks (8 bytes)
# HashMap.Entry referencing the block from BlocksMap. (48 bytes)
# References to the block from all DatanodeDescriptors it belongs to.
This is a TreeMap.Entry size times block replication. (64 * replication)
The total: 152 + 72 * replication
Typical object sizes:
Taking into account that typical file name is 10-15 bytes and our default replication is 3 we can say that typical sizes are
File size: 250
Directory size: 290
Block size: 368
Object||size estimate (bytes)||typical size (bytes)||
File|224 + 2 * fileName.length|250|
Directory|264 + 2 * fileName.length|290|
Block|152 + 72 * replication|368|
One of our clusters has
# Files: 10 600 000
# Dirs: 310 000
# Blocks: 13 300 000
Total Size (estimate): 7,63 GB
Memory used on the name-node (actual reported by jconsole after gc): 9 GB
This means that other data structures like NetworkTopology, heartbeats, datanodeMap, Host2NodesMap,
leases, sortedLeases, and multiple block replication maps occupy ~1.4 GB, which seems to be pretty high
and need to be investigated as well.
Based on the above estimates blocks should be the main focus of the name-node memory reduction effort.
Space used by a block is 50% larger compared to a file, and there is more blocks than files or directories.
Some ideas on how we can reduce the name-node size without substantially changing the data structures.
# INode.children should be an ArrayList instead of a TreeMap. Already done HADOOP-1565. (-48 bytes)
# Factor out the INode class into a separate class (-8 bytes)
# Create base INode class and derive file inode and directory inode classes from the base.
Directory inodes do not need to contain blocks and replication fields (-16 bytes)
File inodes do not need to contain children field (-8 bytes)
# String name should be replaced by a mere byte[]. (-(40 + fileName.length) ~ -50 bytes)
# Eliminate the Block object.
We should move Block fields into into BlockInfo and completely get rid of the Block object. (-16 bytes)
# Block object is referenced at least 5 times in our structures for each physical block.
The number of references should be reduced to just 2. (-24)
# Remove name field from INode. File or directory name is stored in the corresponding directory
entry and does need to be duplicated in the INode (-8 bytes)
# Eliminate INode.parent field. INodes are accessed through the directory tree, and the parent can
be remembered in a local variable while browsing the tree. There is no need to persistently store
the parent reference for each object. (-8 bytes)
# Need to optimize data-node to block map. Currently each DatanodeDescriptor holds a TreeMap of
blocks contained in the node, and we have an overhead of one TreeMap.Entry per block replica.
I expect we can reorganize datanodeMap in a way that it stores only 1 or 2 references per replica
instead of an entire TreeMap.Entry. (-48 * replication)
Note: In general TreeMaps turned out to be very expensive, we should avoid using them if possible.
Or implement a custom map structure, which would avoid using objects for each map entry.
This is what we will have after all the optimizations
Object||size estimate (bytes)||typical size (bytes)||current typical size (bytes)||
File|112 + fileName.length|125|250|
Directory|144 + fileName.length|155|290|
Block|112 + 24 * replication|184|368|
was:
I've done some estimates on how much space our data structures take on the name-node per block, file and directory.
Brief overview of the data structures:
Directory tree (FSDirectory) is built of inodes. Each INode points either to an array of blocks
if it corresponds to a file or to a TreeMap<String, INode> of children INodes if it is a directory.
[Note: this estimates were made before Dhruba replaced the children TreeMap by ArrayList.]
Each block participates also in at least 2 more data structures.
BlocksMap contains a HashMap<Block, BlockInfo> of all blocks mapping a Block into a BlockInfo.
DatanodeDescriptor contains a TreeMap<Block, Block> of all blocks belonging to this data-node.
A block may or may not be contained also in other data-structures, like
{code}
UnderReplicatedBlocks
PendingReplicationBlocks
recentInvalidateSets
excessReplicateMap
{code}
Presence of a block in any of these structures is temporary and therefore I do not count them in my estimates.
The estimates can be viewed as lower bounds.
These are some classes that we are looking at here
{code}
class INode {
String name;
INode parent;
TreeMap<String, INode> children;
Block blocks[];
short blockReplication;
long modificationTime;
}
class Block {
long blkid;
long len;
}
class BlockInfo {
FSDirectory.INode inode;
DatanodeDescriptor[] nodes;
Block block;
}
{code}
The calculations are made for a 64-bit java vm based on that
Reference size = 8 bytes
Object header size = 16 bytes
Array header size = 24 bytes
Commonly used objects:
TreeMap.Entry = 64 bytes. It has 5 reference fields
HashMap.Entry = 48 bytes. It has 3 reference fields
String header = 64 bytes.
The size of a file includes:
# Size of an empty file INode: INode.children = null, INode.blocks is a 0-length array, and file name is empty. (152 bytes)
# A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
# file name length times 2, because String represents each name character by 2 bytes.
# Reference to the outer FSDirectory class (8 bytes)
The total: 224 + 2 * fileName.length
The size of a directory includes:
# Size of an empty directory INode: INode.children is an empty TreeMap, INode.blocks = null, and file name is empty. (192 bytes)
# A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
# file name length times 2
# Reference to the outer FSDirectory class (8 bytes)
The total: 264 + 2 * fileName.length
The size of a block includes:
# Size of Block. (32 bytes)
# Size of BlockInfo. (64 + 8*replication bytes)
# Reference to the block from INode.blocks (8 bytes)
# HashMap.Entry referencing the block from BlocksMap. (48 bytes)
# References to the block from all DatanodeDescriptors it belongs to.
This is a TreeMap.Entry size times block replication. (64 * replication)
The total: 152 + 72 * replication
Typical object sizes:
Taking into account that typical file name is 10-15 bytes and our default replication is 3 we can say that typical sizes are
File size: 250
Directory size: 290
Block size: 368
Object||size estimate (bytes)||typical size (bytes)||
File|224 + 2 * fileName.length|250|
Directory|264 + 2 * fileName.length|290|
Block|152 + 72 * replication|368|
One of our clusters has
# Files: 10 600 000
# Dirs: 310 000
# Blocks: 13 300 000
Total Size (estimate): 7,63 GB
Memory used on the name-node (actual reported by jconsole after gc): 9 GB
This means that other data structures like NetworkTopology, heartbeats, datanodeMap, Host2NodesMap,
leases, sortedLeases, and multiple block replication maps occupy ~1.4 GB, which seems to be pretty high
and need to be investigated as well.
Based on the above estimates blocks should be the main focus of the name-node memory reduction effort.
Space used by a block is 50% larger compared to a file, and there is more blocks than files or directories.
Some ideas on how we can reduce the name-node size without substantially changing the data structures.
# INode.children should be an ArrayList instead of a TreeMap. Already done HADOOP-1565. (-48 bytes)
# Factor out the INode class into a separate class (-8 bytes)
# Create base INode class and derive file inode and directory inode classes from the base.
Directory inodes do not need to contain blocks and replication fields (-16 bytes)
File inodes do not need to contain children field (-8 bytes)
# String name should be replaced by a mere byte[]. (-(40 + fileName.length) ~ -50 bytes)
# Eliminate the Block object.
We should move Block fields into into BlockInfo and completely get rid of the Block object. (-16 bytes)
# Block object is referenced at least 5 times in our structures for each physical block.
The number of references should be reduced to just 2. (-24)
# Remove name field from INode. File or directory name is stored in the corresponding directory
entry and does need to be duplicated in the INode (-8 bytes)
# Eliminate INode.parent field. INodes are accessed through the directory tree, and the parent can
be remembered in a local variable while browsing the tree. There is no need to persistently store
the parent reference for each object. (-8 bytes)
# Need to optimize data-node to block map. Currently each DatanodeDescriptor holds a TreeMap of
blocks contained in the node, and we have an overhead of one TreeMap.Entry per block replica.
I expect we can reorganize datanodeMap in a way that it stores only 1 or 2 references per replica
instead of an entire TreeMap.Entry. (-48 * replication)
Note: In general TreeMaps turned out to be very expensive, we should avoid using them if possible.
Or implement a custom map structure, which would avoid using objects for each map entry.
This is what we will have after all the optimizations
Object||size estimate (bytes)||typical size (bytes)||current typical size (bytes)||
File|112 + fileName.length|125|250|
Directory|144 + fileName.length|155|290|
Block|112 + 24 * replication|184|368|
Name-node memory size estimates and optimization proposal.
----------------------------------------------------------
Key: HADOOP-1687
URL:
https://issues.apache.org/jira/browse/HADOOP-1687Project: Hadoop
Issue Type: Improvement
Components: dfs
Affects Versions: 0.13.1
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
Fix For: 0.15.0
I've done some estimates on how much space our data structures take on the name-node per block, file and directory.
Brief overview of the data structures:
Directory tree (FSDirectory) is built of inodes. Each INode points either to an array of blocks
if it corresponds to a file or to a TreeMap<String, INode> of children INodes if it is a directory.
[Note: this estimates were made before Dhruba replaced the children TreeMap by ArrayList.]
Each block participates also in at least 2 more data structures.
BlocksMap contains a HashMap<Block, BlockInfo> of all blocks mapping a Block into a BlockInfo.
DatanodeDescriptor contains a TreeMap<Block, Block> of all blocks belonging to this data-node.
A block may or may not be contained also in other data-structures, like
{code}
UnderReplicatedBlocks
PendingReplicationBlocks
recentInvalidateSets
excessReplicateMap
{code}
Presence of a block in any of these structures is temporary and therefore I do not count them in my estimates.
The estimates can be viewed as lower bounds.
These are some classes that we are looking at here
{code}
class INode {
String name;
INode parent;
TreeMap<String, INode> children;
Block blocks[];
short blockReplication;
long modificationTime;
}
class Block {
long blkid;
long len;
}
class BlockInfo {
FSDirectory.INode inode;
DatanodeDescriptor[] nodes;
Block block;
}
{code}
The calculations are made for a 64-bit java vm based on that
Reference size = 8 bytes
Object header size = 16 bytes
Array header size = 24 bytes
Commonly used objects:
TreeMap.Entry = 64 bytes. It has 5 reference fields
HashMap.Entry = 48 bytes. It has 3 reference fields
String header = 64 bytes.
The size of a file includes:
# Size of an empty file INode: INode.children = null, INode.blocks is a 0-length array, and file name is empty. (152 bytes)
# A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
# file name length times 2, because String represents each name character by 2 bytes.
# Reference to the outer FSDirectory class (8 bytes)
The total: 224 + 2 * fileName.length
The size of a directory includes:
# Size of an empty directory INode: INode.children is an empty TreeMap, INode.blocks = null, and file name is empty. (192 bytes)
# A directory entry of the parent INode that points to this file, which is a TreeMap.Entry. (64 bytes)
# file name length times 2
# Reference to the outer FSDirectory class (8 bytes)
The total: 264 + 2 * fileName.length
The size of a block includes:
# Size of Block. (32 bytes)
# Size of BlockInfo. (64 + 8*replication bytes)
# Reference to the block from INode.blocks (8 bytes)
# HashMap.Entry referencing the block from BlocksMap. (48 bytes)
# References to the block from all DatanodeDescriptors it belongs to.
This is a TreeMap.Entry size times block replication. (64 * replication)
The total: 152 + 72 * replication
Typical object sizes:
Taking into account that typical file name is 10-15 bytes and our default replication is 3 we can say that typical sizes are
File size: 250
Directory size: 290
Block size: 368
Object||size estimate (bytes)||typical size (bytes)||
File|224 + 2 * fileName.length|250|
Directory|264 + 2 * fileName.length|290|
Block|152 + 72 * replication|368|
One of our clusters has
# Files: 10 600 000
# Dirs: 310 000
# Blocks: 13 300 000
Total Size (estimate): 7,63 GB
Memory used on the name-node (actual reported by jconsole after gc): 9 GB
This means that other data structures like NetworkTopology, heartbeats, datanodeMap, Host2NodesMap,
leases, sortedLeases, and multiple block replication maps occupy ~1.4 GB, which seems to be pretty high
and need to be investigated as well.
Based on the above estimates blocks should be the main focus of the name-node memory reduction effort.
Space used by a block is 50% larger compared to a file, and there is more blocks than files or directories.
Some ideas on how we can reduce the name-node size without substantially changing the data structures.
# INode.children should be an ArrayList instead of a TreeMap. Already done HADOOP-1565. (-48 bytes)
# Factor out the INode class into a separate class (-8 bytes)
# Create base INode class and derive file inode and directory inode classes from the base.
Directory inodes do not need to contain blocks and replication fields (-16 bytes)
File inodes do not need to contain children field (-8 bytes)
# String name should be replaced by a mere byte[]. (-(40 + fileName.length) ~ -50 bytes)
# Eliminate the Block object.
We should move Block fields into into BlockInfo and completely get rid of the Block object. (-16 bytes)
# Block object is referenced at least 5 times in our structures for each physical block.
The number of references should be reduced to just 2. (-24)
# Remove name field from INode. File or directory name is stored in the corresponding directory
entry and does need to be duplicated in the INode (-8 bytes)
# Eliminate INode.parent field. INodes are accessed through the directory tree, and the parent can
be remembered in a local variable while browsing the tree. There is no need to persistently store
the parent reference for each object. (-8 bytes)
# Need to optimize data-node to block map. Currently each DatanodeDescriptor holds a TreeMap of
blocks contained in the node, and we have an overhead of one TreeMap.Entry per block replica.
I expect we can reorganize datanodeMap in a way that it stores only 1 or 2 references per replica
instead of an entire TreeMap.Entry. (-48 * replication)
Note: In general TreeMaps turned out to be very expensive, we should avoid using them if possible.
Or implement a custom map structure, which would avoid using objects for each map entry.
This is what we will have after all the optimizations
Object||size estimate (bytes)||typical size (bytes)||current typical size (bytes)||
File|112 + fileName.length|125|250|
Directory|144 + fileName.length|155|290|
Block|112 + 24 * replication|184|368|
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.