New Server
Re: New Server
05-01-2009 4:14 PM
(Unless it's for Windows).
"In The Beginning Was The Word, And The Word Was Aardvark."
Re: New Server
05-01-2009 4:46 PM
I spent a happy hour or so pulling the existing box apart. I've removed and reseated everything apart from the CPU, and I moved the hard drives round so there is more space round them.
If I can get the old server stable again then I might just go for network storage, but I don't want to buy any IDE drives to up the capacity on the existing box if it's going to keep failing on me.
I hate these intermittent problems.
Re: New Server
08-01-2009 9:15 PM
Also check your temps (CPU, mobo etc.) as it may be a cooling problem. I'm not sure what apps are available for Linux, but you need to check the temperatures while it's running normally. Try running with the case off for a short while (not recommended for long-term use) to see if that makes a difference; if it does, high temperatures are the likely cause.
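As a rough sketch of how the temperatures could be read from a shell on Linux: newer kernels expose thermal zones under /sys/class/thermal, with readings in thousandths of a degree. That path is an assumption here and may well not exist on an older kernel, in which case the `sensors` command from lm-sensors would be the usual alternative. The function names are made up for illustration.

```shell
#!/bin/sh
# Convert a millidegree reading (the unit used under /sys/class/thermal)
# to whole degrees Celsius.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Print every thermal zone the kernel exposes; prints nothing if there are none.
print_zone_temps() {
    for zone in /sys/class/thermal/thermal_zone*/temp; do
        [ -r "$zone" ] || continue
        raw=$(cat "$zone" 2>/dev/null) || continue
        # Skip anything that is not a plain (possibly negative) integer.
        case $raw in
            ''|*[!0-9-]*) continue ;;
        esac
        printf '%s: %s C\n' "$zone" "$(millideg_to_c "$raw")"
    done
}

print_zone_temps
```

Running it once while the box is idle and again under load gives a baseline to compare against after a freeze.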
Re: New Server
10-01-2009 11:05 PM
Re: New Server
11-01-2009 10:30 AM
But I have found bad memory this way in the past.
Also I have seen system freezes caused by faulty power supplies.
Re: New Server
11-01-2009 2:01 PM
How about opening a console window and running something like:
top -b > top.log
The -b (batch mode) flag makes top write plain text rather than screen-control codes. This will at least tell you what was running (maybe) at the time of the freeze.
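A slightly more defensive variant of the same idea, offered only as a sketch: take a timestamped snapshot at intervals and flush it to disk each time, so the last entry before a hard freeze has a better chance of surviving. The function name `snapshot_to_log` and the one-minute interval are assumptions; `top -b -n 1` is top's batch mode with a single iteration.

```shell
#!/bin/sh
# Append a timestamped run of a command to a log file, then flush to disk.
# Usage: snapshot_to_log LOGFILE COMMAND [ARGS...]
snapshot_to_log() {
    log=$1
    shift
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        "$@"
    } >> "$log" 2>&1
    sync   # push buffered data to disk so it survives a hard freeze
}

# Example loop (runs until interrupted): one batch-mode snapshot a minute.
# while true; do snapshot_to_log top.log top -b -n 1; sleep 60; done
```

If the root filesystem is the one being remounted read-only, the log would need to live on a different disk for this to help.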
"In The Beginning Was The Word, And The Word Was Aardvark."
Re: New Server
11-01-2009 4:13 PM
It does an fsck from time to time as you'd expect, and sometimes it decides to remount the primary HDD as read-only for no apparent reason. I've got smartmontools installed, but for some reason it refuses to run diagnostics on the primary drive (a SAMSUNG SP1203N), although SMART sees the drive as sound.
Re: New Server
12-01-2009 3:53 PM
Quote from: SteveA ...sometimes it decides to remount the primary HDD as read only for no apparent reason.
Debian remounts / as read-only if it detects errors on the filesystem. I'm assuming that you are using Debian or a Debian-based distro (e.g. Ubuntu).
To make sure, look in /etc/fstab. It will say something like:
/dev/sda1 / ext3 defaults,errors=remount-ro 0 1
Is there anything in /var/log/kern.log at the time it goes wrong? That's where I'd look if this was happening to me.
HTH
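Both checks can be scripted. The sketch below pulls the mount options for / out of an fstab file and tests for errors=remount-ro; the function names are hypothetical, and the grep pattern in the final comment is only a guess at what the kernel might log, not an exhaustive list.

```shell
#!/bin/sh
# Print the mount options (4th field) of the root filesystem entry.
# Usage: root_mount_options [FSTAB]   (defaults to /etc/fstab)
root_mount_options() {
    awk '$1 !~ /^#/ && $2 == "/" { print $4 }' "${1:-/etc/fstab}"
}

# Succeed if the root entry carries errors=remount-ro.
# Usage: has_remount_ro [FSTAB]
has_remount_ro() {
    root_mount_options "$1" | tr ',' '\n' | grep -qx 'errors=remount-ro'
}

# Example, to be run on the affected machine:
# has_remount_ro && echo "root is set to remount read-only on errors"
# grep -i -E 'I/O error|remount' /var/log/kern.log
```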
Re: New Server
12-01-2009 6:06 PM
The kern.log file never seems to have anything of any significance from the time of the crash.
For example, when it crashed on Jan 2nd in the early evening, this is what was in the kern.log file:
Jan 1 10:18:42 bantock kernel: [42949400.670000] ACPI: Power Button (FF) [PWRF]
Jan 1 10:18:42 bantock kernel: [42949400.670000] ACPI: Power Button (CM) [PWRB]
Jan 1 10:18:42 bantock kernel: [42949400.780000] ibm_acpi: ec object not found
Jan 1 10:18:42 bantock kernel: [42949400.840000] pcc_acpi: loading...
Jan 1 10:18:45 bantock kernel: [42949405.180000] eth1: no IPv6 routers present
Jan 2 21:33:37 bantock kernel: Inspecting /boot/System.map-2.6.15-53-server
Jan 2 21:33:37 bantock kernel: Loaded 23278 symbols from /boot/System.map-2.6.15-53-server.
Jan 2 21:33:37 bantock kernel: Symbols match kernel version 2.6.15.
Jan 2 21:33:37 bantock kernel: No module symbols loaded - kernel modules not enabled.
Jan 2 21:33:37 bantock kernel: [42949372.960000] Linux version 2.6.15-53-server (buildd@palmer) (gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)) #1 SMP Mon Nov 24 19:00:01 UTC 2008
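One possible reading of the large bracketed numbers, offered as an assumption rather than anything confirmed in the thread: kernels of this era start the 32-bit jiffies counter 300 seconds before it wraps, and a timestamp derived from jiffies therefore begins near 2^32/HZ - 300 seconds instead of zero. With a tick rate of HZ=100 (assumed for this server kernel) the arithmetic lands on the first boot line above. The function name is made up for illustration.

```shell
#!/bin/sh
# Seconds (two decimals) shown by a clock that starts 300 s before a 32-bit
# tick counter wraps, for a given tick rate HZ. HZ=100 is an assumption.
# Needs a shell with 64-bit arithmetic.
initial_timestamp() {
    hz=$1
    jiffies=$(( 4294967296 - 300 * hz ))   # 2^32 minus five minutes of ticks
    printf '%d.%02d\n' $(( jiffies / hz )) $(( (jiffies % hz) * 100 / hz ))
}

initial_timestamp 100   # prints 42949372.96
```

On that reading the bracketed value is an offset tick count, not the uptime, and seconds since boot would be the bracketed value minus 42949372.96.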
Re: New Server
12-01-2009 6:34 PM
What does the uptime command give?
There isn't an Ubuntu 6.02LTS - there is an 8.04LTS though
"In The Beginning Was The Word, And The Word Was Aardvark."
Re: New Server
12-01-2009 6:42 PM
Description: Ubuntu 6.06.2 LTS
Release: 6.06
Codename: dapper
So I forgot the extra 6!
That can't be the uptime as it had only been up for about 3 days that time.
Current uptime is standing at 8 days, 3:21.
I have had 160 days' uptime before.
Re: New Server
22-01-2009 1:56 PM
13:55:53 up 17 days, 22:36
Typical!
It even got through some severe power wobbles last weekend, which surprised me.
Re: New Server
24-01-2009 4:45 PM
Read the capabilities (smartctl -a) to see what tests your particular drives can perform and then run them as explained in the link above. Check the results for errors (although most errors do not mean that your drive will die anytime soon).
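A minimal sketch of that sequence with smartctl, assuming the drive is /dev/hda as elsewhere in this thread. The helper names are hypothetical, and the wait is taken from the "Please wait N minutes" line that smartctl prints when a test starts.

```shell
#!/bin/sh
# Extract the estimated wait in minutes from `smartctl -t` output on stdin.
selftest_wait_minutes() {
    awk '/^Please wait [0-9]+ minute/ { print $3 }'
}

# Start a self-test, wait for the estimated time plus a minute of slack,
# then show the self-test log.
# Usage: run_selftest short|long DEVICE
run_selftest() {
    mins=$(smartctl -t "$1" "$2" | selftest_wait_minutes)
    sleep $(( (${mins:-1} + 1) * 60 ))
    smartctl -l selftest "$2"
}

# Typical manual sequence (run as root):
# smartctl -c /dev/hda            # capabilities, including supported self-tests
# smartctl -t long /dev/hda       # start the extended test
# smartctl -l selftest /dev/hda   # read the self-test log afterwards
# smartctl -l error /dev/hda      # read the drive's own error log
```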
I would second overheating as a probable cause. Unless you've done so recently, fit an earthing strap on your wrist and very carefully clean the CPU fan and heat sink and, more importantly, the video card fan and heat sink. You may need to detach the video card from the MoBo to gain access to it.
If the above does not solve the problem, keep sshd running and, when a crash occurs, log in remotely and check what the logs say (messages or syslog, and Xorg.0.log).
Finally, a quick word about RAID. I would strongly advise against using any BIOS RAID setup because they are all proprietary software CRAP, which is often buggy and can leave you unable to access your data when you upgrade your OS later on or, worse, when the MoBo dies. Switch these off in the BIOS and use the Linux kernel RAID drivers, which do a much better job (both in terms of performance and stability). A LiveCD is all you need to recover your data, whether your basic OS goes bad or the MoBo packs up one day. Of course, if you can afford real hardware RAID with SCSI drives then by all means go for it. It will outperform most/any software RAID, but at the same time you'll need to budget for 4-7 times the money you had in mind.
Re: New Server
25-01-2009 9:09 PM
smartctl -t short /dev/hda
smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Sun Jan 25 21:00:00 2009
root@bantock:/var/log/apache2# date
Sun Jan 25 21:04:34 GMT 2009
root@bantock:/var/log/apache2# smartctl -l selftest /dev/hda
smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 256
Warning: ATA Specification requires self-test log structure revision number = 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
The odd thing is that if I start a test and then do a smartctl -a, I get:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 245) Self-test routine in progress...
50% of test remaining.
but when it completes then I get
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
So it's not logging, but it says there is nothing wrong.
No fan on the video card; it's an ancient old card and actually doesn't get really hot at all. I'm not running X on there, and all that there is on the console usually is the login prompt.
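For what it's worth, the two "Self-test execution status" values in the smartctl -a output above (245 and 0) can be decoded by hand. As I understand the ATA layout, the top four bits of that byte are a status code and the bottom four are the percentage remaining in tens, so 245 (0xF5) is code 15 with 5 x 10% left. A small sketch, with a made-up function name:

```shell
#!/bin/sh
# Decode a SMART self-test execution status byte:
# high nibble = status code, low nibble = tenths of the test remaining.
decode_selftest_status() {
    code=$(( $1 >> 4 ))
    remaining=$(( ($1 & 15) * 10 ))
    case $code in
        0)  echo "completed without error, or never run" ;;
        15) echo "in progress, ${remaining}% remaining" ;;
        *)  echo "status code $code" ;;
    esac
}

decode_selftest_status 245   # in progress, 50% remaining
decode_selftest_status 0     # completed without error, or never run
```

So the drive does appear to run the test and finish it cleanly; what is missing is only the entry in the self-test log.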
Re: New Server
26-01-2009 12:33 PM
The results look a bit chancy though:-
Quote jeremy@HECTOR:~$ sudo smartctl -a /dev/sda
******
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: SEAGATE ST173404LC Version: 0004
Serial number: 3CE0VVWM000071456HHD
Device type: disk
Local Time is: Mon Jan 26 12:22:09 2009 GMT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 44 C
Drive Trip Temperature: 65 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 1244801
Blocks received from initiator = 1713452
Blocks read from cache and sent to initiator = 160336
Number of read and write commands whose size <= segment size = 117809
Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 11.98
number of minutes until next internal SMART test = 76
Error counter log:
          Errors Corrected by          Total    Correction    Gigabytes     Total
              ECC          rereads/    errors   algorithm     processed     uncorrected
          fast | delayed   rewrites  corrected  invocations  [10^9 bytes]   errors
read:        4        0        0         4          4           3.103          0
write:       0        0        0         0          0           0.268          0
verify:      3        0        0         3          3           5.275          0
Non-medium error count: 0
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status      segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                   number   (hours)
# 1  Background short  Completed      -        1            -       [-   -   -]
# 2  Background short  Completed      -        1            -       [-   -   -]
# 3  Background short  Completed      -        1            -       [-   -   -]
# 4  Background short  Completed      -        0            -       [-   -   -]
# 5  Background short  Completed      -        0            -       [-   -   -]
# 6  Background short  Completed      -        0            -       [-   -   -]
Long (extended) Self Test duration: 2480 seconds [41.3 minutes]
It only took a couple of seconds to run - and I'm sure I've transferred a lot more than 5GB!
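On the gigabytes figure: the 5.275 is only the verify row, and the error counter log keeps separate totals for read, write and verify. A quick sketch that adds up the "Gigabytes processed" column from smartctl -a output in this SCSI format (the function name is hypothetical, and the column position is assumed from the listing above):

```shell
#!/bin/sh
# Sum the "Gigabytes processed" column (7th field) of the read:, write: and
# verify: rows in a SCSI error counter log read from stdin.
total_gb_processed() {
    awk '$1 == "read:" || $1 == "write:" || $1 == "verify:" { sum += $7 }
         END { printf "%.3f\n", sum }'
}

# Example: smartctl -a /dev/sda | total_gb_processed
```

For the listing above that gives 8.646 GB. The same output reports 11.98 powered-up hours, so one guess is that the counters only cover that period, though nothing in the output confirms it.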
"In The Beginning Was The Word, And The Word Was Aardvark."