Grokbase Groups: HBase user, May 2016
Subject: Major compaction cannot remove deleted rows until the region is split. Strange!

Hi,

We saw a very strange case in one of our production clusters. A couple of
regions cannot get their deleted rows or delete markers removed even after
a major compaction. However, when the region triggered a split (we set 100G
for auto split), the deletion worked: the 100G region became two 10G daughter
regions, and all the delete markers were gone.

Also, the same region in the slave cluster (populated through replication) has
a normal size of about 20+G.

BTW, the delete markers in the regions are mostly DeleteFamily, if it
matters.

This is really weird. Does anyone have a clue about this strange behavior?

Thanks
Tian-Ying

A snippet of the HFile generated by the major compaction:

K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
V:
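
A dump like the one above can be produced with the HFile pretty-printer bundled
with HBase; the path below is a placeholder and depends on the cluster layout:

  hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f /hbase/TABLE_NAME/ENCODED_REGION_NAME/d/HFILE_NAME

The -p flag prints each key/value; the V: lines are empty because the
DeleteFamily cells have vlen=0.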


  • Frank Luo at May 27, 2016 at 9:55 pm
    What if you manually trigger a major compaction on that particular region? Does it run, and are the delete markers removed?
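
    For example, from the hbase shell (the names below are placeholders);
    major_compact accepts either a table name or a full region name:

      hbase> major_compact 'TABLE_NAME'
      hbase> major_compact 'REGION_NAME'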

  • Tianying Chang at May 27, 2016 at 10:13 pm
    Yes, I tried manually triggering a major compaction on the problematic region,
    and the result is the same: the region did not get any deleted rows removed,
    so its size stays at 90+G. And I verified that after a split, the
    major compaction works as expected, i.e. it produces two 10+G regions.

    Thanks
    Tian-Ying
  • Michael Stack at May 27, 2016 at 10:13 pm

    Is this 0.94, Tian-Ying?
    It looks like only the DeleteFamily markers are retained; do you see instances
    where versions older than the DeleteFamily are also
    retained post-major-compaction?
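
    One way to check, as a sketch (table and row are placeholders; RAW => true
    makes the shell return delete markers and masked cells instead of filtering
    them out):

      hbase> scan 'TABLE_NAME', {RAW => true, VERSIONS => 10, STARTROW => 'ROW_KEY', LIMIT => 20}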

    St.Ack


  • Tianying Chang at May 27, 2016 at 10:18 pm
    Yes, it is 94.26. At a quick glance, I didn't see any put that is older
    than the delete markers' TS, which go back as far as a couple of weeks,
    since it seems no major compaction has worked on the region for a long time.

    Also it is really strange that if the region is split, then everything
    seems to work as expected. We also noticed that the same region
    replicated on the slave side is totally normal, i.e. at 20+G...
  • Michael Stack at May 29, 2016 at 11:35 pm

    If you move the region to another server, does that work?
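
    For example, a shell sketch (the encoded region name and target server are
    placeholders):

      hbase> move 'ENCODED_REGION_NAME', 'HOSTNAME,PORT,STARTCODE'
      hbase> major_compact 'REGION_NAME'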

    Looking in the 0.94 codebase, I see this in Compactor#compact:


       // For major compactions calculate the earliest put timestamp
       // of all involved storefiles. This is used to remove
       // family delete marker during the compaction.
       if (majorCompaction) {
         tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
         if (tmp == null) {
           // There's a file with no information, must be an old one
           // assume we have very old puts
           earliestPutTs = HConstants.OLDEST_TIMESTAMP;
         } else {
           earliestPutTs = Math.min(earliestPutTs, Bytes.toLong(tmp));
         }
       }


    The above is followed by this log line:


       if (LOG.isDebugEnabled()) {
         LOG.debug("Compacting " + file +
           ", keycount=" + keyCount +
           ", bloomtype=" + r.getBloomFilterType().toString() +
           ", size=" + StringUtils.humanReadableInt(r.length()) +
           ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
           (majorCompaction? ", earliestPutTs=" + earliestPutTs: ""));
       }

    This prints out earliestPutTs. Do you see that in the logs? Are you running
    with DEBUG? Does the earliest put ts preclude our dropping the delete family markers?
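
    If DEBUG is enabled for the region server, that line should show up in its
    log; something like this (the log path is a guess and varies per install)
    can confirm which earliestPutTs was used:

      grep 'earliestPutTs=' /var/log/hbase/*regionserver*.log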


    Looking more at the code, we retain deletes in the following circumstances:


         this.retainDeletesInOutput = scanType == ScanType.MINOR_COMPACT || scan.isRaw();


    So, are we sure we are running a major compaction?

    Otherwise, we have to dig in a bit more here... this stuff is a little involved.
    St.Ack



  • Tianying Chang at May 31, 2016 at 10:43 pm
    Hi, Stack

    Based on the logs, the major compaction did run, and it took 5+ hours. And
    I also manually ran major_compact from the hbase shell explicitly to verify.

    I just moved the region to a different RS and issued a major_compact on
    that region again; let me see if the major compaction succeeds, and I will
    report back.

    Thanks
    Tian-Ying
  • Tianying Chang at Jun 1, 2016 at 5:56 pm
    Hi, Stack

    After moving the region and issuing a major compaction on it, its size
    shrank from 99G down to 24G. So it looks like the region was in a bad state
    that it could not recover from, and closing/reopening it fixed the issue. From
    the region-size metric graph, we can see that major compaction stopped working
    on March 31, so some bug caused the region to enter the bad state...
    Unfortunately, we don't have DEBUG enabled and that was the last region with
    the issue, so it is hard to figure out which bug caused the bad state...

    Thanks
    Tian-Ying
  • Michael Stack at Jun 1, 2016 at 6:50 pm

    Interesting. So moving it to another RS makes it major-compactable? That
    would seem to indicate that some state kept in RS memory is preventing the
    major compaction from running. Is moving the region a workaround for you until
    we figure out what it is, Tian-Ying?

    St.


  • Hao zhang at Jun 3, 2016 at 2:04 am
    Hi, Stack

    We found this in our production cluster, too. I took a look at the
    code and found one case which will make major compaction not work:
       1. Put some rows.
       2. Scan.
       3. Delete.
       4. The scanner isn't closed properly, so its read point is kept in the
          region's scannerReadPoints.
       5. Major compact. It can't clean up, because there is a scanner whose read
          point is smaller than the delete's. (A rough reproduction sketch follows.)
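
    A rough reproduction sketch of those steps against a 0.94-era client API.
    The table, family, and row names are hypothetical, and a flush is added so
    the DeleteFamily marker actually lands in a store file before compacting;
    whether the marker really survives depends on timing and on the
    lease-expiry path discussed below.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.Delete;
      import org.apache.hadoop.hbase.client.HBaseAdmin;
      import org.apache.hadoop.hbase.client.HTable;
      import org.apache.hadoop.hbase.client.Put;
      import org.apache.hadoop.hbase.client.ResultScanner;
      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.util.Bytes;

      public class LeakedScannerRepro {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HTable table = new HTable(conf, "repro_table");  // hypothetical table with family 'd'

          // 1. put some rows
          Put put = new Put(Bytes.toBytes("row1"));
          put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("v"));
          table.put(put);

          // 2. + 4. open a scanner and never close it; its read point stays
          // registered in the region's scannerReadPoints
          ResultScanner leaked = table.getScanner(new Scan());
          leaked.next();

          // 3. delete the row; this writes a DeleteFamily marker that is newer
          // than the leaked scanner's read point
          table.delete(new Delete(Bytes.toBytes("row1")));

          // 5. flush and major compact; per the scenario above, the marker cannot
          // be collected while a scanner with an older read point is still registered
          HBaseAdmin admin = new HBaseAdmin(conf);
          admin.flush("repro_table");
          admin.majorCompact("repro_table");
        }
      }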

    But if we move the region to a new RS, the scannerReadPoints map is rebuilt
    from the biggest memstoreTS found in the store files, so major compaction
    will work. I think the try...catch block in the Scanner.leaseExpired() method
    needs a finally block to close the region scanner properly.

         public void leaseExpired() {
           RegionScannerHolder rsh = scanners.remove(this.scannerName);
           if (rsh != null) {
             RegionScanner s = rsh.s;
             LOG.info("Scanner " + this.scannerName + " lease expired on region "
               + s.getRegionInfo().getRegionNameAsString());
             try {
               Region region =
    regionServer.getRegion(s.getRegionInfo().getRegionName());
               if (region != null && region.getCoprocessorHost() != null) {
                 region.getCoprocessorHost().preScannerClose(s);
               }
               s.close();
               if (region != null && region.getCoprocessorHost() != null) {
                 region.getCoprocessorHost().postScannerClose(s);
               }
             } catch (IOException e) {
               LOG.error("Closing scanner for "
                 + s.getRegionInfo().getRegionNameAsString(), e);
             }
           } else {
             LOG.warn("Scanner " + this.scannerName + " lease expired, but no
    related" +
               " scanner found, hence no chance to close that related scanner!");
           }
         }
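
    A sketch of that suggested change (not an actual patch; the postScannerClose
    hook is left out for brevity): run s.close() in a finally so the region
    scanner, and with it the read point it holds, is always released even when
    a coprocessor hook throws.

          try {
            Region region = regionServer.getRegion(s.getRegionInfo().getRegionName());
            if (region != null && region.getCoprocessorHost() != null) {
              region.getCoprocessorHost().preScannerClose(s);
            }
          } catch (IOException e) {
            LOG.error("Closing scanner for "
              + s.getRegionInfo().getRegionNameAsString(), e);
          } finally {
            try {
              // always close the region scanner so its read point is removed
              s.close();
            } catch (IOException e) {
              LOG.error("Closing scanner for "
                + s.getRegionInfo().getRegionNameAsString(), e);
            }
          }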


  • Anoop John at Jun 3, 2016 at 5:06 am
    So you feel the call to s.close() could have thrown some exception, and so
    the removal of the entry from scannerReadPoints would not have happened.
    Hmm.. sounds possible.

      public synchronized void close() {
        if (storeHeap != null) {
          storeHeap.close();
          storeHeap = null;
        }
        if (joinedHeap != null) {
          joinedHeap.close();
          joinedHeap = null;
        }
        // no need to synchronize here.
        scannerReadPoints.remove(this);
        this.filterClosed = true;
      }

    We may need to change this close() implementation and make sure the removal
    from scannerReadPoints happens in a finally block.
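
    Something along these lines, as a sketch of that suggestion (not the
    committed fix):

      public synchronized void close() {
        try {
          if (storeHeap != null) {
            storeHeap.close();
            storeHeap = null;
          }
          if (joinedHeap != null) {
            joinedHeap.close();
            joinedHeap = null;
          }
        } finally {
          // always drop this scanner's read point, even if closing a heap throws
          scannerReadPoints.remove(this);
          this.filterClosed = true;
        }
      }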

    -Anoop-

  • Michael Stack at Jun 13, 2016 at 5:36 pm
    Thanks for filing HBASE-16012 Guanghao Zhang...
    St.Ack
    On Thu, Jun 2, 2016 at 10:06 PM, Anoop John wrote:

    So u feel the call s.close() would have created some Exception and
    the removal of entry from scannerReadPoints would not have happened.
    hmm.. sounds possible

    public synchronized void close() {
    if (storeHeap != null) {
    storeHeap.close();
    storeHeap = null;
    }
    if (joinedHeap != null) {
    joinedHeap.close();
    joinedHeap = null;
    }
    // no need to synchronize here.
    scannerReadPoints.remove(this);
    this.filterClosed = true;
    }

    We may need to change this close impl and make sure the removal from
    scannerReadPoints happen in a finally block.

    -Anoop-

    On Fri, Jun 3, 2016 at 7:33 AM, hao zhang wrote:
    Hi, Stack

    We found this in our production cluster, too. I take a look about the
    code and found one case which will make the major compact not work.

    1. put some rows
    2. scan
    3. delete
    4. scanner didn't close rightly, it will keep the the read point in
    region's scannerReadPoints.
    5. major compact. It can't work because we have a scanner which have
    small read point than the delete.

    But if move the region to new rs, the scannerReadPoints will update to
    the biggest memstoreTs which form sotrefile. So major compact will work.
    I thought the try...catch module in Scanner.leaseExpired() method needs a
    finally module to close the region scanner rightly.

    public void leaseExpired() {
    RegionScannerHolder rsh = scanners.remove(this.scannerName);
    if (rsh != null) {
    RegionScanner s = rsh.s;
    LOG.info("Scanner " + this.scannerName + " lease expired on region "
    + s.getRegionInfo().getRegionNameAsString());
    try {
    Region region =
    regionServer.getRegion(s.getRegionInfo().getRegionName());
    if (region != null && region.getCoprocessorHost() != null) {
    region.getCoprocessorHost().preScannerClose(s);
    }
    s.close();
    if (region != null && region.getCoprocessorHost() != null) {
    region.getCoprocessorHost().postScannerClose(s);
    }
    } catch (IOException e) {
    LOG.error("Closing scanner for "
    + s.getRegionInfo().getRegionNameAsString(), e);
    }
    } else {
    LOG.warn("Scanner " + this.scannerName + " lease expired, but no
    related" +
    " scanner found, hence no chance to close that related
    scanner!");
    }
    }


    2016-06-02 2:50 GMT+08:00 Stack <stack@duboce.net>:
    On Wed, Jun 1, 2016 at 10:56 AM, Tianying Chang wrote:

    Hi, Stack

After moving the region and issuing a major compact on that region, its size
shrank from 99G down to 24G. So it looks like the region was in a bad state it
could not recover from, and closing/opening it fixed the issue. From the
region-size metric graph we can see major compaction stopped working on
March 31, so some bug caused the region to enter the bad state...
Unfortunately, we don't have DEBUG enabled and that was the last region with
the issue, so it is hard to figure out which bug caused the bad state...
Interesting. So moving it to another RS makes it major-compactable? That
would seem to indicate some state kept in RS memory is preventing the major
compaction from running. Is moving the region a workaround for you until we
figure out what it is, Tian-Ying?

    St.


    Thanks
    Tian-Ying

    On Tue, May 31, 2016 at 3:43 PM, Tianying Chang <tychang@gmail.com>
    wrote:
    Hi, Stack

Based on the log, the major compaction was run, and it took 5+ hours. I also
manually ran major_compact from the hbase shell explicitly to verify. I just
moved the region to a different RS and issued a major_compact on that region
again; let me see if the major compaction succeeds and I will report back.

    Thanks
    Tian-Ying
    On Sun, May 29, 2016 at 4:35 PM, Stack wrote:

    On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <tychang@gmail.com
    wrote:
Yes, it is 0.94.26. At a quick glance, I didn't see any put that is older
than the delete markers' TS, which could go back as far as a couple of weeks,
since major compaction has apparently not worked on this region for a long
time. Also it is really strange that if the region is split, then everything
works as expected. And we noticed the same region replicated on the slave
side is totally normal, i.e. at 20+G...
    If you move the region to another server, does that work?

    Looking in 0.94 codebase, I see this in Compactor#compact


// For major compactions calculate the earliest put timestamp
// of all involved storefiles. This is used to remove
// family delete marker during the compaction.
if (majorCompaction) {
  tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
  if (tmp == null) {
    // There's a file with no information, must be an old one
    // assume we have very old puts
    earliestPutTs = HConstants.OLDEST_TIMESTAMP;
  } else {
    earliestPutTs = Math.min(earliestPutTs, Bytes.toLong(tmp));
  }
}


    The above is followed by this log line:


if (LOG.isDebugEnabled()) {
  LOG.debug("Compacting " + file +
      ", keycount=" + keyCount +
      ", bloomtype=" + r.getBloomFilterType().toString() +
      ", size=" + StringUtils.humanReadableInt(r.length()) +
      ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
      (majorCompaction ? ", earliestPutTs=" + earliestPutTs : ""));
}

This prints out earliestPutTs. You see that in the logs? You running with
DEBUG? Does the earliest put ts preclude our dropping delete family?

Looking more in the code, we retain deletes in the following circumstances:

this.retainDeletesInOutput = scanType == ScanType.MINOR_COMPACT || scan.isRaw();
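
Reading the two snippets together, here is a small illustrative helper (an
interpretation of the quoted comments, not the actual 0.94 ScanQueryMatcher
logic) expressing when a DeleteFamily marker could be dropped:

// Illustration only, based on the comments quoted above.
static boolean canDropDeleteFamilyMarker(boolean majorCompaction, boolean rawScan,
    long markerTs, long earliestPutTs) {
  // Minor compactions and raw scans keep all delete markers in the output.
  if (!majorCompaction || rawScan) {
    return false;
  }
  // On a major compaction the marker is still needed only if some put at or
  // before its timestamp could exist; earliestPutTs bounds that from below.
  return earliestPutTs > markerTs;
}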


    So, for sure we are running major compaction?

    Otherwise, have to dig in a bit more here.. This stuff is a little
    involved.
    St.Ack



    On Fri, May 27, 2016 at 3:13 PM, Stack wrote:

    On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <
    tychang@gmail.com>
    wrote:
    Hi,

    We saw a very strange case in one of our production cluster.
    A
    couple
    regions cannot get their deleted rows or delete marker
    removed
    even
    after
    major compaction. However when the region triggered split (we
    set
    100G
    for
    auto split), the deletion worked. The 100G region becomes two
    10G
    daughter
    regions, and all the delete marker are gone.

    Also, the same region in the slave cluster (through
    replication)
    have
    normal size at about 20+G.

    BTW, the delete marker in the regions are mostly
    deleteFamily if
    it
    matters.

    This is really weird. Anyone has any clue for this strange
    behavior?
    Thanks
    Tian-Ying

    These 0.94 Tian-Ying?
    It looks like the DeleteFamily is retained only; do you see
    incidence
    where
    there may have been versions older than the DeleteFamily that
    are
    also
    retained post-major-compaction?

    St.Ack


    A snippet of the HFile generated by the major compaction:

: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
V:
  • Michael Stack at Jun 9, 2016 at 7:14 pm
    Tian-ying:

    Can you try the suggestion below to see if it helps?

    Do you see the lease expired logs?

    St.Ack

    On Thu, Jun 2, 2016 at 7:03 PM, hao zhang wrote:

    Hi, Stack

We found this in our production cluster, too. I took a look at the code and
found one case that will make major compaction not work:

1. put some rows
2. scan
3. delete
4. the scanner didn't close properly, so its read point stays in the
region's scannerReadPoints.
5. major compact. It can't work, because we have a scanner whose read point
is smaller than the delete's.

But if the region is moved to a new RS, the read point is re-initialized to
the biggest memstoreTS from the store files, so major compaction will work again.
I think the try...catch block in the Scanner.leaseExpired() method needs a
finally block to close the region scanner properly.

  • Guanghao Zhang at Jun 13, 2016 at 10:45 am
We added some logging in our production cluster:

2016-06-06,21:37:28,443 INFO org.apache.hadoop.hbase.regionserver.HRegion: scannerReadPoints size is 0 and smallest read point is 4037995
2016-06-06,21:37:51,429 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner java.io.IOException: Could not seek StoreFileScanner
2016-06-06,21:38:02,103 INFO org.apache.hadoop.hbase.regionserver.HRegion: scannerReadPoints size is 1 and smallest read point is 4038037
2016-06-13,17:17:10,516 INFO org.apache.hadoop.hbase.regionserver.HRegion: scannerReadPoints size is 1 and smallest read point is 4038037

When a new RegionScanner is created, it adds a scanner read point to
scannerReadPoints. But if we get an exception after the read point is added,
the read point stays in the region server, and any delete after this MVCC
number will never be compacted away.
I created an issue about this: https://issues.apache.org/jira/browse/HBASE-16012
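
To make the failure mode concrete, here is a small, self-contained sketch
(hypothetical class and field names, not HBase source) of how a read point
registered before a failed scanner open can leak and pin the region's
smallest read point, which is what limits how far a compaction may purge
deletes:

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the leak described above; not HBase code.
class ReadPointLeakSketch {
  private final ConcurrentHashMap<Object, Long> scannerReadPoints = new ConcurrentHashMap<>();
  private long mvcc = 4037995L;

  Object openScanner(boolean failAfterRegistering) throws IOException {
    Object scanner = new Object();
    // The read point is registered first...
    scannerReadPoints.put(scanner, mvcc);
    if (failAfterRegistering) {
      // ...but if opening the store scanners then fails (e.g. "Could not seek
      // StoreFileScanner") and nothing removes the entry, it leaks.
      throw new IOException("Could not seek StoreFileScanner");
    }
    return scanner;
  }

  // A compaction may only purge deletes below the smallest outstanding read point.
  long smallestReadPoint() {
    long smallest = mvcc;
    for (long rp : scannerReadPoints.values()) {
      smallest = Math.min(smallest, rp);
    }
    return smallest;
  }

  public static void main(String[] args) {
    ReadPointLeakSketch region = new ReadPointLeakSketch();
    try {
      region.openScanner(true);
    } catch (IOException e) {
      // The client sees a failed openScanner, but the stale read point remains.
    }
    region.mvcc += 100; // later writes and deletes get larger MVCC numbers
    // Prints 4037995: pinned forever, so newer deletes are never purged.
    System.out.println("smallest read point = " + region.smallestReadPoint());
  }
}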

