Commands failing silently?

1) Dan Bongert
Hello all:

I have a couple CentOS 4 servers (all up-to-date) that are having strange
command failures. I first noticed this with a perl script that uses lots of
system calls.

Basically, sometimes a command just won't run:

thoth(52) /tmp> ls

thoth(53) /tmp> ls

thoth(54) /tmp> ls

thoth(55) /tmp> ls
learner  lost+found/

thoth(56) /tmp> ls
learner  lost+found/

thoth(57) /tmp> ls
learner  lost+found/

thoth(58) /tmp> ls
learner  lost+found/

thoth(59) /tmp> ls
learner  lost+found/

thoth(60) /tmp> ls
learner  lost+found/

thoth(61) /tmp> ls
learner  lost+found/

thoth(62) /tmp> ls

thoth(63) /tmp> ls

thoth(64) /tmp> ls

thoth(65) /tmp> ls

thoth(66) /tmp> uname -a
Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT
2008 i686 i686 i386 GNU/Linux

Nothing in either dmesg or /var/log/messages seems to indicate any problems.
It also doesn't seem to matter what the command is -- ls is the quickest
test, but sshd will sometimes fail to spawn children, etc. There aren't a
large number of processes on the machine either -- only 122 at the moment.

Has anyone seen this behavior before? Have I been hit with some sort of
cunning rootkit? This machine shouldn't be publicly accessible; it's behind
our firewall.

Thanks.
--
Dan Bongert                     [email protected: dbo...@wisc.edu]

_______________________________________________
CentOS mailing list
[email protected: C...@centos.org]
http://lists.centos.org/mailman/listinfo/centos
2) Bill Campbell
On Mon, Mar 24, 2008, Dan Bongert wrote:
>Hello all:
>
>I have a couple CentOS 4 servers (all up-to-date) that are having strange
>command failures. I first noticed this with a perl script that uses lots of
>system calls.
>
>Basically, sometimes a command just won't run:
>
>thoth(52) /tmp> ls
>
...
>
>thoth(66) /tmp> uname -a
>Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT
>2008 i686 i686 i386 GNU/Linux
>
>Nothing in either dmesg or /var/log/messages seems to indicate any
>problems. It also doesn't seem to matter what the command is -- ls is the
>quickest test, but sshd will sometimes fail to spawn children, etc.
>There aren't a large number of processes on the machine either -- only 122
>at the moment.

There is a very good chance that the machine has been cracked,
and the system's /bin/ls routine replaced by one hacked to hide
the cracker's programs.  ``rpm -V coreutils procps util-linux''
may well show several critical programs changed.

You can also try running ``strace /bin/ls'' to see what is going on.

Bill
--
INTERNET: [email protected: b...@celestial.com] Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
FAX:            (206) 232-9186  Mercer Island, WA 98040-0820; (206) 236-1676

When I hear a man applauded by the mob I always feel a pang of pity
for him.  All he has to do to be hissed is to live long enough.
    -- H.L. Mencken, Minority Report
3) mouss
Dan Bongert wrote:
> Hello all:
>
> I have a couple CentOS 4 servers (all up-to-date) that are having
> strange command failures. I first noticed this with a perl script that
> uses lots of system calls.
>
> Basically, sometimes a command just won't run:
>
> thoth(52) /tmp> ls
>
> thoth(53) /tmp> ls
>
> thoth(54) /tmp> ls
>
> thoth(55) /tmp> ls
> learner  lost+found/
>
> thoth(56) /tmp> ls
> learner  lost+found/
>
> thoth(57) /tmp> ls
> learner  lost+found/
>
> thoth(58) /tmp> ls
> learner  lost+found/
>
> thoth(59) /tmp> ls
> learner  lost+found/
>
> thoth(60) /tmp> ls
> learner  lost+found/
>
> thoth(61) /tmp> ls
> learner  lost+found/
>
> thoth(62) /tmp> ls
>
> thoth(63) /tmp> ls
>
> thoth(64) /tmp> ls
>
> thoth(65) /tmp> ls
>
> thoth(66) /tmp> uname -a
> Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55
> EDT 2008 i686 i686 i386 GNU/Linux
>
> Nothing in either dmesg or /var/log/messages seems to indicate any
> problems. It also doesn't seem to matter what the command is -- ls is
> the quickest test, but sshd will sometimes fail to spawn children,
> etc. There aren't a large number of processes on the machine either --
> only 122 at the moment.
>
> Has anyone seen this behavior before? Have I been hit with some sort
> of cunning rootkit? This machine shouldn't be publicly accessible;
> it's behind our firewall.

where is /tmp mounted? is this an external disk (usb, ...)? is it an nfs
mount?

4) Dan Bongert
Bill Campbell wrote:
> On Mon, Mar 24, 2008, Dan Bongert wrote:
>> Hello all:
>>
>> I have a couple CentOS 4 servers (all up-to-date) that are having strange
>> command failures. I first noticed this with a perl script that uses lots of
>> system calls.
>>
>> Basically, sometimes a command just won't run:
>>
>> thoth(52) /tmp> ls
>>
> ...
>> thoth(66) /tmp> uname -a
>> Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55 EDT
>> 2008 i686 i686 i386 GNU/Linux
>>
>> Nothing in either dmesg or /var/log/messages seems to indicate any
>> problems. It also doesn't seem to matter what the command is -- ls is the
>> quickest test, but sshd will sometimes fail to spawn children, etc.
>> There aren't a large number of processes on the machine either -- only 122
>> at the moment.
>
> There is a very good chance that the machine has been cracked,
> and the system's /bin/ls routine replaced by one hacked to hide
> the cracker's programs. ``rpm -V coreutils procps util-linux''
> may well show several critical programs changed.

Everything seems OK there:


thoth(96) /tmp> sudo rpm -V coreutils procps util-linux

> You can also try running ``strace /bin/ls'' to see what is going on.

Funnily enough, running strace will work just fine. Though, as I said, just
about any command will fail -- 'ls' was just for testing purposes.


--
Dan Bongert                     [email protected: dbo...@wisc.edu]

5) Dan Bongert
mouss wrote:
> Dan Bongert wrote:
>> Hello all:
>>
>> I have a couple CentOS 4 servers (all up-to-date) that are having
>> strange command failures. I first noticed this with a perl script that
>> uses lots of system calls.
>>
>> thoth(66) /tmp> uname -a
>> Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15 06:54:55
>> EDT 2008 i686 i686 i386 GNU/Linux
>>
>> Nothing in either dmesg or /var/log/messages seems to indicate any
>> problems. It also doesn't seem to matter what the command is -- ls is
>> the quickest test, but sshd will sometimes fail to spawn children,
>> etc. There aren't a large number of processes on the machine either --
>> only 122 at the moment.
>>
>> Has anyone seen this behavior before? Have I been hit with some sort
>> of cunning rootkit? This machine shouldn't be publicly accessible;
>> it's behind our firewall.
>
> where is /tmp mounted? is this an external disk (usb, ...)? is it an nfs
> mount?

It's a local disk:

thoth(97) /tmp> df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/md4               16G   77M   15G   1% /tmp

Though 'ls' was just an example -- just about any program will fail. The 'w'
command will fail too:

thoth(118) /tmp> w
   16:06:51 up  5:34,  1 user,  load average: 0.94, 1.46, 2.04
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
dbongert pts/0    copland.ssc.wisc 14:16    0.00s  0.22s  0.05s w

thoth(119) /tmp> w
   16:06:52 up  5:34,  1 user,  load average: 0.94, 1.46, 2.04
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
dbongert pts/0    copland.ssc.wisc 14:16    0.00s  0.22s  0.05s w

thoth(120) /tmp> w

thoth(121) /tmp> w

--
Dan Bongert                     [email protected: dbo...@wisc.edu]

6) Peter l Jakobi
On Mon, Mar 24, 2008 at 04:18:49PM -0500, Dan Bongert wrote:
>> You can also try running ``strace /bin/ls'' to see what is going on.
> Funnily enough, running strace will work just fine. Though, as I said, just
> about any command will fail -- 'ls' was just for testing purposes.

That's funny. Or it's due to the output of strace changing timing and stress.

Try redirecting the strace output to a separate file (on a local filesystem
or ramdisk), possibly restricted to file operations. Also check top: do you
have swap or RAM problems?

--
cu
Peter l Jakobi
[email protected: l...@kefk.oa.shuttle.de]
7) William L. Maltby
On Mon, 2008-03-24 at 16:19 -0500, Dan Bongert wrote:
> mouss wrote:
> > Dan Bongert wrote:
> >> Hello all:
> >>
> >><snip>


> Though 'ls' was just an example -- just about any program will fail. The 'w'
> command will fail too:
>
> thoth(118) /tmp> w
> 16:06:51 up 5:34, 1 user, load average: 0.94, 1.46, 2.04
> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
> dbongert pts/0 copland.ssc.wisc 14:16 0.00s 0.22s 0.05s w
>
> thoth(119) /tmp> w
> 16:06:52 up 5:34, 1 user, load average: 0.94, 1.46, 2.04
> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
> dbongert pts/0 copland.ssc.wisc 14:16 0.00s 0.22s 0.05s w
>
> thoth(120) /tmp> w
>
> thoth(121) /tmp> w
>

Hmmm... Sure it's failing? Maybe just the output is going somewhere
else? After the command runs, what does "echo $?" show? Does it even
work? Echo is a bash internal command, so I would expect it to never
fail.
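
A minimal sketch of that exit-status check, assuming a POSIX-style shell such
as sh or bash. The helper name count_failures is made up for illustration and
does not come from any package. It runs a command several times and counts
how many runs returned a non-zero status:

```shell
# Hypothetical helper: run a command N times and count non-zero exit statuses.
count_failures() {
    runs=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Run the command as given; its output still goes to the terminal.
        "$@" || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures of $runs runs returned a non-zero status"
}
```

For example, "count_failures 20 ls" would run ls twenty times and report how
many of those runs failed.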

What is your output device? A serial terminal? If so, could be simple
flow control issues. In fact, any serial connection (even a PC emulating
a terminal) could suffer from flow control problems. And they would tend
to be erratic in nature.

If you are on a normal console, try running the commands similar to
this (trying to determine if *something* else is receiving output or
not)

    <your command> &> /dev/tty

if this works reliably, maybe that's a starting point.

There's a couple kernel guys who frequent this list. Maybe one of them
will have a clue as to what could go wrong. Corrupted libraries and
whatnot.

You might try that rpm -V command earlier against all packages (add a
"a" IIRC). Maybe some library accessed by the coreutils, but which is
not itself part of coreutils, is corrupt.


HTH
--
Bill

8) Dan Bongert
William L. Maltby wrote:
> On Mon, 2008-03-24 at 16:19 -0500, Dan Bongert wrote:
>> mouss wrote:
>>> Dan Bongert wrote:
>>>> Hello all:
>>>>
>>>> <snip>
>
>
>> Though 'ls' was just an example -- just about any program will fail. The 'w'
>> command will fail too:
>>
>> thoth(118) /tmp> w
>> 16:06:51 up 5:34, 1 user, load average: 0.94, 1.46, 2.04
>> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
>> dbongert pts/0 copland.ssc.wisc 14:16 0.00s 0.22s 0.05s w
>>
>> thoth(119) /tmp> w
>> 16:06:52 up 5:34, 1 user, load average: 0.94, 1.46, 2.04
>> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
>> dbongert pts/0 copland.ssc.wisc 14:16 0.00s 0.22s 0.05s w
>>
>> thoth(120) /tmp> w
>>
>> thoth(121) /tmp> w
>>
>
> Hmmm... Sure it's failing? Maybe just the output is going somewhere
> else? After the command runs, what does "echo $?" show? Does it even
> work? Echo is a bash internal command, so I would expect it to never
> fail.

Ok, it's definitely getting an error from somewhere:

thoth(3) /tmp> ls

thoth(4) /tmp> echo $?
141

Although:

thoth(31) ~> top


thoth(32) ~> echo $?
0

> What is your output device? A serial terminal? If so, could be simple
> flow control issues. In fact, any serial connection (even a PC emulating
> a terminal) could suffer from flow control problems. And they would tend
> to be erratic in nature.

I'm usually sshing into the machine, but I've also experienced the problem
on the console.

> If you are on a normal console, try running the commands similar to
> this (trying to determine if *something* else is receiving output or
> not)
>
>     <your command> &> /dev/tty
>
> if this works reliably, maybe that's a starting point.

Nope, that fails intermittently as well.

> There's a couple kernel guys who frequent this list. Maybe one of them
> will have a clue as to what could go wrong. Corrupted libraries and
> whatnot.
>
> You might try that rpm -V command earlier against all packages (add a
> "a" IIRC). Maybe some library accessed by the coreutils, but which is
> not itself part of coreutils, is corrupt.

Hmm....when I do a 'rpm -Va', I get lots of "at least one of file's
dependencies has changed since prelinking" errors. Even if I run prelink
manually, and then do a 'rpm -Va' immediately afterwards.
--
Dan Bongert                     [email protected: dbo...@wisc.edu]

9) William L. Maltby
On Tue, 2008-03-25 at 13:21 -0500, Dan Bongert wrote:
> William L. Maltby wrote:
> > On Mon, 2008-03-24 at 16:19 -0500, Dan Bongert wrote:
> >> mouss wrote:
> >>> Dan Bongert wrote:
> >>>> Hello all:
> >>>>
> >>>> <snip>
> >
> >
> >> Though 'ls' was just an example -- just about any program will fail. The 'w'
> >> command will fail too:
> >>
> >> <snip>

> >
> > Hmmm... Sure it's failing? Maybe just the output is going somewhere
> > else? After the command runs, what does "echo $?" show? Does it even
> > work? Echo is a bash internal command, so I would expect it to never
> > fail.
>
> Ok, it's definitely getting an error from somewhere:
>
> thoth(3) /tmp> ls
>
> thoth(4) /tmp> echo $?
> 141
>
> Although:
>
> thoth(31) ~> top

"~>" ? Got me on that one.

>
>
> thoth(32) ~> echo $?
> 0

Ditto. Although I should mention that unless you "man bash" and find the
magic incantation I can't remember that gets return codes from a
pipeline (if that's what "~>" is supposed to be), the return from the
last command in the pipeline is what's returned. If echo is from bash,
as I expected, it should not fail and should return a 0 code regardless
of what happened ahead of it.

Your best tack is simplicity: one command, no pipes, just redirect
output with "&>" like so

   cat <your file> &>/tmp/test.out

Then you can see if the output file has greater than zero length, use
vim on it (if that works), etc.
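
A rough sketch of that test, assuming a POSIX-style shell. The helper name
check_output is invented for this example. It uses ">file 2>&1", which is the
portable spelling of bash's "&>", and then checks whether the file ended up
with a non-zero length:

```shell
# Hypothetical helper: send stdout and stderr of a command to a file,
# then report the exit status and whether anything was written.
check_output() {
    outfile=$1; shift
    "$@" >"$outfile" 2>&1
    status=$?
    # "test -s" is true only if the file exists and is larger than zero bytes.
    if [ -s "$outfile" ]; then
        echo "status $status, output captured"
    else
        echo "status $status, output file is empty"
    fi
}
```

Something like "check_output /tmp/test.out ls /tmp" could then be repeated a
few times to see whether empty output and a non-zero status go together.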

> <snip possibility of serial connection>

> I'm usually sshing into the machine, but I've also experienced the problem
> on the console.

Ssh via e'net or serial? On the console, is the failure as reliable or
less frequent?

> > If you are on a normal console, try running the commands similar to
> > this (trying to determine if *something* else is receiving output or
> > not)
> >
> >     <your command> &> /dev/tty
> >
> > if this works reliably, maybe that's a starting point.
>
> Nope, that fails intermittently as well.

I would surmise that means that basic kernel operations are good and
there is some common library routine involved.

>
> > There's a couple kernel guys who frequent this list. Maybe one of them
> > will have a clue as to what could go wrong. Corrupted libraries and
> > whatnot.
> >
> > You might try that rpm -V command earlier against all packages (add a
> > "a" IIRC). Maybe some library accessed by the coreutils, but which is
> > not itself part of coreutils, is corrupt.
>
> Hmm....when I do a 'rpm -Va', I get lots of "at least one of file's
> dependencies has changed since prelinking" errors. Even if I run prelink
> manually, and then do a 'rpm -Va' immediately afterwards.

Well, I'd "man rpm" (no, I don't hate you, but I don't do rpm stuff
enough to remember it all and *I* am not going to "man rpm" unless I
suddenly become quite masochistic :-), select some promising looking
options and run it again, redirecting output to a file you can examine
(possibly have to get it to a machine that works reliably - "man nc"
someone mentioned in another thread looks like a useful tool).

You want to get the diagnostic output from rpm and see what files it
complains about. The ones tagged with a "c" are config files and will
often show up there. If your system hasn't been compromised, it's safe
to ignore these.

Examine all the ones that were unexpectedly tagged and see if there is a
pattern.
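
One way to narrow the list down, sketched under the assumption that the
"rpm -Va" output was saved to a file and that config files carry a lone "c"
in the second column, as described above. The helper name non_config_changes
is hypothetical, and the sample lines in the comment are made up rather than
taken from this machine:

```shell
# Hypothetical helper: read saved "rpm -Va" output on stdin and drop the
# lines for config files, i.e. those whose second field is a lone "c".
#
# Made-up sample input:
#   S.5....T  c /etc/example.conf     (config file, dropped)
#   ..5....T    /usr/bin/example      (kept)
non_config_changes() {
    awk '$2 != "c"'
}
```

Usage would be along the lines of "rpm -Va > /tmp/rpm-va.txt 2>&1" followed
by "non_config_changes < /tmp/rpm-va.txt".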

If your HDs are "smart", maybe a "smartctl -l <more params>" will
identify some sectors gone bad in a critical area of your HD.

I don't have a clue why rpm would claim they had been changed right after
prelink is run, unless the rpm database has not yet been updated. I don't
know how it all works together. Maybe the rpm update runs at night or
something?

WHERE'S THE KNOWLEDGEABLE FOLKS WHEN NEEDED? It's the blind leading the
blind ATM.  8-O

HTH
--
Bill

10) mouss
Dan Bongert wrote:
> mouss wrote:
>> Dan Bongert wrote:
>>> Hello all:
>>>
>>> I have a couple CentOS 4 servers (all up-to-date) that are having
>>> strange command failures. I first noticed this with a perl script
>>> that uses lots of system calls.
>>>
>>> thoth(66) /tmp> uname -a
>>> Linux thoth.ssc.wisc.edu 2.6.9-67.0.7.ELsmp #1 SMP Sat Mar 15
>>> 06:54:55 EDT 2008 i686 i686 i386 GNU/Linux
>>>
>>> Nothing in either dmesg or /var/log/messages seems to indicate any
>>> problems. It also doesn't seem to matter what the command is -- ls
>>> is the quickest test, but sshd will sometimes fail to spawn
>>> children, etc. There aren't a large number of processes on the
>>> machine either -- only 122 at the moment.
>>>
>>> Has anyone seen this behavior before? Have I been hit with some sort
>>> of cunning rootkit? This machine shouldn't be publicly accessible;
>>> it's behind our firewall.
>>
>> where is /tmp mounted? is this an external disk (usb, ...)? is it an
>> nfs mount?
>
> It's a local disk:
>
> thoth(97) /tmp> df -h .
> Filesystem Size Used Avail Use% Mounted on
> /dev/md4               16G   77M   15G   1% /tmp
>
> Though 'ls' was just an example -- just about any program will fail.
> The 'w'
> command will fail too:


maybe check your PATH. try
$ /bin/ls


>
> thoth(118) /tmp> w
> 16:06:51 up 5:34, 1 user, load average: 0.94, 1.46, 2.04
> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
> dbongert pts/0 copland.ssc.wisc 14:16 0.00s 0.22s 0.05s w
>
> thoth(119) /tmp> w
> 16:06:52 up 5:34, 1 user, load average: 0.94, 1.46, 2.04
> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
> dbongert pts/0 copland.ssc.wisc 14:16 0.00s 0.22s 0.05s w
>
> thoth(120) /tmp> w
>
> thoth(121) /tmp> w
>


11) Filipe Brandenburger
Hi,

On Tue, Mar 25, 2008 at 2:21 PM, Dan Bongert <dbongert@wisc.edu> wrote:
>  thoth(3) /tmp> ls
>
>  thoth(4) /tmp> echo $?
>  141

141 is SIGPIPE. If the process is killed by a signal, the return code
will be 128+signal number. 141-128=13, and kill -l says: 13) SIGPIPE.
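
The arithmetic above as a small sketch, assuming a POSIX-style shell. The
helper name signal_from_status is made up for illustration. Note that a
program can also exit with a status above 128 on its own, so this is a hint
rather than proof:

```shell
# Hypothetical helper: turn a shell exit status into a signal number,
# following the 128 + signal number convention described above.
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        echo "not a signal exit"
    fi
}
```

With this, "signal_from_status 141" prints 13, and passing that number to
"kill -l" should name the signal.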

SIGPIPE means that something that ls is writing to is being closed.
That's really strange, and I couldn't find why.

I still think strace would be the best way to trace it. Please try:

# rm -f /tmp/ls-strace.txt; strace -o /tmp/ls-strace.txt -tt -s 1024 -f ls --color=tty

Repeat it until ls doesn't print anything. Then less your
/tmp/ls-strace.txt file, you'll probably have something like +++
killed by SIGPIPE +++ as the last line of it. Then try to figure out
what happened before it got the SIGPIPE. Probably a "write" to
something, try to figure out to which file descriptor. If you can't do
it, try to post the last few lines of the file here.
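
The "repeat it until it fails" step could be automated with a sketch like the
following, assuming a POSIX-style shell. The helper name repeat_until_fail is
invented here. It also assumes that strace hands the traced command's exit
status (or terminating signal) back to the shell, which is how its man page
describes it:

```shell
# Hypothetical helper: run a command up to MAX times and stop at the first
# run that returns a non-zero status.
repeat_until_fail() {
    max=$1; shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        "$@"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "attempt $attempt failed with status $status"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "no failure in $max attempts"
}
```

For example: "repeat_until_fail 100 strace -o /tmp/ls-strace.txt -tt -s 1024
-f ls --color=tty". Since each run should rewrite the trace file, the file
left behind when the loop stops would belong to the failing run.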

Also, can you post the output of this command?
# ls -la /proc/$$/fd/

Filipe