New Server

VileReynard
Hero
Posts: 12,616
Thanks: 582
Fixes: 20
Registered: ‎01-09-2007

Re: New Server

It says "Network File system SMB/FTP" - you really want NFS on a Linux file system such as ext3.
(Unless it's for Windows).

"In The Beginning Was The Word, And The Word Was Aardvark."

SteveA
Pro
Posts: 1,850
Thanks: 106
Fixes: 3
Registered: ‎17-06-2007

Re: New Server

I'm on Linux (or Vista when I have to) on my laptop. My wife is on XP, so most of the time she is accessing our stuff via SMB or using iTunes to listen from the DAAP share.
I spent a happy hour or so pulling the existing box apart. I've removed and reseated everything apart from the CPU, and I moved the hard drives round so there is more space round them.
If I can get the old server stable again then I might just go for network storage, but I don't want to buy any IDE drives to up the capacity on the existing box if it's going to keep failing on me.
I hate these intermittent problems.
Peter_Vaughan
Grafter
Posts: 14,469
Registered: ‎30-07-2007

Re: New Server

Have you run any memory tests? I use memtest86+ booted from CD or floppy, but you need to leave it running overnight.
Also check your temps (CPU, mobo etc.) as it may be a cooling problem. I'm not sure what apps are available for Linux, but you need to check while it's running normally. Try running with the case off for a short while (not recommended for long-term use) to see if that makes a difference; if it does, the cause is high temps.
SteveA
Pro
Posts: 1,850
Thanks: 106
Fixes: 3
Registered: ‎17-06-2007

Re: New Server

Mem tests show it's clear. I did wonder about the temp thing, but usually that takes the CPU and thus the whole system down. This leaves the display running but frozen, which is very odd.
chillypenguin
Grafter
Posts: 4,729
Registered: ‎04-04-2007

Re: New Server

It's also worth grep'ing /var/log/messages for error messages, and checking the last entries before the freeze. But it's a long shot on a freezing system.
But I have found bad memory this way in the past.
Also I have seen system freezes caused by faulty power supplies.
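Something along these lines would do the grep. It is only a rough sketch: `recent_errors` is a made-up helper name and the keyword list is a guess, so adjust both to whatever your logs actually contain.

```shell
# Sketch: show the most recent suspicious lines from a syslog-style file.
# The keyword list is an assumption, not a definitive set of failure markers.
recent_errors() {
    logfile="$1"          # e.g. /var/log/messages or /var/log/syslog
    count="${2:-20}"      # how many matching lines to keep (default 20)
    grep -iE 'error|fail|panic|oops' "$logfile" | tail -n "$count"
}

# Example: recent_errors /var/log/messages 30
```

Running it right after a reboot shows the last matches written before the freeze, if any made it to disk.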
VileReynard
Hero
Posts: 12,616
Thanks: 582
Fixes: 20
Registered: ‎01-09-2007

Re: New Server

Does your PC do a filesystem check on restart?
How about opening a console window and running something like:
top > top.log

This will at least tell you what was running (maybe) at the time of freeze.
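One caveat: plain top redirected to a file writes screen-control codes, and the last snapshot may never reach the disk if the box locks up. A possible variation is to take timestamped batch-mode snapshots and sync after each one. This is a sketch only; `snapshot_loop` and `SNAPSHOT_CMD` are names invented here, and the interval is arbitrary.

```shell
# Sketch: append a timestamped process snapshot to a log at a fixed interval.
# SNAPSHOT_CMD and snapshot_loop are illustrative names, not standard tools.
SNAPSHOT_CMD="${SNAPSHOT_CMD:-top -b -n 1}"   # -b: batch mode, -n 1: one iteration

snapshot_loop() {
    logfile="$1"        # file to append snapshots to
    interval="$2"       # seconds between snapshots
    rounds="$3"         # number of snapshots to take
    i=0
    while [ "$i" -lt "$rounds" ]; do
        {
            echo "=== $(date) ==="
            $SNAPSHOT_CMD
        } >> "$logfile"
        sync            # flush to disk so the snapshot survives a hard freeze
        i=$((i + 1))
        if [ "$i" -lt "$rounds" ]; then
            sleep "$interval"
        fi
    done
}

# Example: snapshot_loop /var/tmp/top.log 30 100000 &
```

After the next freeze, the tail of the log shows what was running in the last interval before it stopped.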

"In The Beginning Was The Word, And The Word Was Aardvark."

SteveA
Pro
Posts: 1,850
Thanks: 106
Fixes: 3
Registered: ‎17-06-2007

Re: New Server

There is never anything in the logs. They all just stop at pretty much the same time, and there is never anything useful in any of them.
It does an fsck from time to time, as you'd expect, and sometimes it decides to remount the primary HDD as read-only for no apparent reason. I've got smartmontools installed, but for some reason it refuses to run diagnostics on the primary drive (a SAMSUNG SP1203N), although SMART sees the drive as sound.

Ben_Brown
Grafter
Posts: 2,839
Registered: ‎13-06-2007

Re: New Server

Quote from: SteveA
...sometimes it decides to remount the primary HDD as read only for no apparent reason.

Debian remounts / as read only if it sees any errors on the drive, I'm assuming that you are using Debian or a Debian based distro (e.g. Ubuntu).
To make sure, look in /etc/fstab. It will say something like:
/dev/sda1       /               ext3    defaults,errors=remount-ro 0       1
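
If you'd rather not eyeball the file, a quick check could look like this. It is a sketch that assumes the usual six-column fstab layout; `remount_ro_mounts` is just a name made up for the example.

```shell
# Sketch: list fstab entries whose options include errors=remount-ro.
# fstab columns: device, mount point, type, options, dump, pass.
remount_ro_mounts() {
    fstab="${1:-/etc/fstab}"
    awk '$1 !~ /^#/ && NF >= 4 && $4 ~ /(^|,)errors=remount-ro(,|$)/ {
        print $2 " (" $1 ")"
    }' "$fstab"
}

# Example: remount_ro_mounts
```

For the line above it would print the mount point followed by the device in parentheses.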

Is there anything in /var/log/kern.log at the time it goes wrong? That's where I'd look if this was happening to me.
HTH
SteveA
Pro
Posts: 1,850
Thanks: 106
Fixes: 3
Registered: ‎17-06-2007

Re: New Server

I'm running Ubuntu 6.02LTS on it. It's odd: yes, it mounts the drive RO, which it should do if it sees errors, but if I force a reboot with an fsck the disk comes back clean, and SMART is enabled and thinks there is nothing wrong with it (I'm wondering if I should turn it off because it's obviously serving no purpose).
The kern.log file never seems to have anything of any significance in it from the crash time.
For example, when it crashed on Jan 2nd in the early evening, this is what was in the kern.log file:
Jan  1 10:18:42 bantock kernel: [42949400.670000] ACPI: Power Button (FF) [PWRF]
Jan  1 10:18:42 bantock kernel: [42949400.670000] ACPI: Power Button (CM) [PWRB]
Jan  1 10:18:42 bantock kernel: [42949400.780000] ibm_acpi: ec object not found
Jan  1 10:18:42 bantock kernel: [42949400.840000] pcc_acpi: loading...
Jan  1 10:18:45 bantock kernel: [42949405.180000] eth1: no IPv6 routers present
Jan  2 21:33:37 bantock kernel: Inspecting /boot/System.map-2.6.15-53-server
Jan  2 21:33:37 bantock kernel: Loaded 23278 symbols from /boot/System.map-2.6.15-53-server.
Jan  2 21:33:37 bantock kernel: Symbols match kernel version 2.6.15.
Jan  2 21:33:37 bantock kernel: No module symbols loaded - kernel modules not enabled.
Jan  2 21:33:37 bantock kernel: [42949372.960000] Linux version 2.6.15-53-server (buildd@palmer) (gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)) #1 SMP Mon Nov 24 19:00:01 UTC 2008
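One way to pin down exactly where the log stops is to look for the bracketed kernel timestamp going backwards, which marks a reboot. A rough sketch, assuming kern.log lines carry the bracketed value as in the excerpt above; `find_reboots` is a hypothetical helper name.

```shell
# Sketch: print the last kernel line before each reboot and the first one after,
# detected by the bracketed timestamp decreasing. Lines without one are skipped.
find_reboots() {
    awk '
    match($0, /\[[0-9]+\.[0-9]+\]/) {
        ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
        if (seen && ts < prev_ts) {
            print "last before reboot: " prev_line
            print "first after reboot: " $0
        }
        prev_ts = ts
        prev_line = $0
        seen = 1
    }' "$@"
}

# Example: find_reboots /var/log/kern.log
```

On the excerpt above this pairs the eth1 line from Jan 1 with the "Linux version" line from Jan 2, so nothing at all was logged by the kernel in between.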
VileReynard
Hero
Posts: 12,616
Thanks: 582
Fixes: 20
Registered: ‎01-09-2007

Re: New Server

Isn't the [42949400 etc] the number of seconds since the last reboot - about 18 months?
What does the uptime command give?
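A quick check of that arithmetic: 42949400 seconds works out at roughly 497 days, so nearer 16 months than 18. The value also sits very close to 2^32/100, which would fit a kernel tick counter that starts just below the 32-bit wrap point instead of at zero. That explanation is an assumption (a 100 Hz tick and a start 300 seconds before the wrap) and is not confirmed anywhere in this thread; the helper names are made up for the sketch.

```shell
# Sketch: sanity-check the bracketed kern.log value.
# Convert a number of seconds to days.
secs_to_days() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 86400 }'
}

# Assumption: a 32-bit tick counter at 100 ticks per second that starts
# 300 seconds before it wraps. Prints that starting point in seconds.
jiffies_start_secs() {
    awk 'BEGIN { printf "%.2f\n", 2^32 / 100 - 300 }'
}

secs_to_days 42949400     # days represented by the bracketed value
jiffies_start_secs        # compare with the value on the "Linux version" boot line
```

If the second number matches the first bracketed value after a boot, the figure is an offset counter and not a real uptime.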
There isn't a Ubuntu 6.02LTS - there is an 8.04LTS though :D

"In The Beginning Was The Word, And The Word Was Aardvark."

SteveA
Pro
Posts: 1,850
Thanks: 106
Fixes: 3
Registered: ‎17-06-2007

Re: New Server

Distributor ID: Ubuntu
Description: Ubuntu 6.06.2 LTS
Release: 6.06
Codename: dapper
So I forgot the extra 6!
That can't be the uptime as it had only been up for about 3 days that time.
Currently uptime is standing at 8 days,  3:21.
I have had 160 days uptime.
SteveA
Pro
Posts: 1,850
Thanks: 106
Fixes: 3
Registered: ‎17-06-2007

Re: New Server

Still hasn't crashed...
13:55:53 up 17 days, 22:36
Typical!
It even got through some severe power wobbles last weekend which surprised me
MickKi
Grafter
Posts: 543
Registered: ‎30-09-2007

Re: New Server

Have a look at the smartmon tests explained here.
Read the capabilities (smartctl -a) to see what tests your particular drives can perform and then run them as explained in the link above.  Check the results for errors (although most errors do not mean that your drive will die anytime soon).
I would second the overheating problem as a probable cause.  Unless you've done so recently, fit an earthing strap on your wrist and very carefully clean the CPU fan and heat sink and, more importantly, the video card fan and heat sink.  You may need to detach the video card from the MoBo to gain access to it.
If the above does not solve this problem, keep sshd running and, when a crash occurs, log in remotely and check what the logs say (messages or syslog, and Xorg.0.log).
Finally, a quick word about RAID.  I would strongly advise against using any BIOS RAID setup because they are all proprietary software CRAP, often buggy, and can leave you unable to access your data when you upgrade your OS later on or, worse, when the MoBo dies.  Switch these off in the BIOS and use the Linux kernel RAID drivers, which do a much better job (both in terms of performance and stability). A LiveCD is all you need to recover your data, whether your basic OS goes bad or the MoBo packs up one day.  Of course, if you can afford real hardware RAID with SCSI drives then by all means go for it.  It will outperform most/any software RAID, but at the same time you'll need to budget for 4-7 times the money you had in mind.  ;)
SteveA
Pro
Posts: 1,850
Thanks: 106
Fixes: 3
Registered: ‎17-06-2007

Re: New Server

Smartmon will not run tests on the primary drive, or if they run they don't log. I spent a happy two days trying every single option.
smartctl -t short /dev/hda
smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Sun Jan 25 21:00:00 2009
root@bantock:/var/log/apache2# date
Sun Jan 25 21:04:34 GMT 2009
root@bantock:/var/log/apache2# smartctl -l selftest /dev/hda
smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 256
Warning: ATA Specification requires self-test log structure revision number = 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
The odd thing is that if I start a test and then do a smartctl -a I get:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 245) Self-test routine in progress...
50% of test remaining.

but when it completes I get:
Self-test execution status:      (  0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
So it's not logging but says there is nothing wrong.
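For what it's worth, the number in parentheses can be pulled apart by hand. A small sketch, assuming the usual layout of that status byte (top four bits are the state, bottom four bits are tenths of the test still to run); `decode_selftest_status` is a name invented for the example.

```shell
# Sketch: split the self-test execution status value that smartctl prints
# in parentheses. Assumed layout: high nibble = state, low nibble = tenths remaining.
decode_selftest_status() {
    value="$1"
    state=$(( value / 16 ))
    remaining=$(( (value % 16) * 10 ))
    echo "state=$state remaining=${remaining}%"
}

decode_selftest_status 245   # the reading taken while the test was running
decode_selftest_status 0     # the reading after it finished
```

Under that assumption 245 is state 15 with 50% left, which agrees with the "in progress" text. State 0 is the ambiguous "completed without error or never run" case, so the status value alone cannot prove the test ran to the end; only an entry in the self-test log would.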
There's no fan on the video card; it's an ancient card and doesn't actually get very hot at all. I'm not running X on there, and all there usually is on the console is the login prompt.
VileReynard
Hero
Posts: 12,616
Thanks: 582
Fixes: 20
Registered: ‎01-09-2007

Re: New Server

I think smartmon will run.
The results look a bit chancy though:-
Quote
jeremy@HECTOR:~$ sudo smartctl -a /dev/sda
******
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: SEAGATE  ST173404LC      Version: 0004
Serial number: 3CE0VVWM000071456HHD
Device type: disk
Local Time is: Mon Jan 26 12:22:09 2009 GMT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature:    44 C
Drive Trip Temperature:        65 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 1244801
  Blocks received from initiator = 1713452
  Blocks read from cache and sent to initiator = 160336
  Number of read and write commands whose size <= segment size = 117809
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 11.98
  number of minutes until next internal SMART test = 76
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          4        0         0         4          4          3.103           0
write:         0        0         0         0          0          0.268           0
verify:        3        0         0         3          3          5.275           0
Non-medium error count:        0
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                segment  LifeTime  LBA_first_err [SK ASC ASQ]
    Description                              number  (hours)
# 1  Background short  Completed                  -      1                - [-  -    -]
# 2  Background short  Completed                  -      1                - [-  -    -]
# 3  Background short  Completed                  -      1                - [-  -    -]
# 4  Background short  Completed                  -      0                - [-  -    -]
# 5  Background short  Completed                  -      0                - [-  -    -]
# 6  Background short  Completed                  -      0                - [-  -    -]
Long (extended) Self Test duration: 2480 seconds [41.3 minutes]

It only took a couple of seconds to run - and I'm sure I've transferred a lot more than 5GB!

"In The Beginning Was The Word, And The Word Was Aardvark."