New Server

MrC
Grafter
Posts: 525
Thanks: 4
Registered: ‎17-07-2008

Re: New Server

Quote from: SteveA
Smartmon will not run tests on the primary drive

Smartmon doesn't care what drive it talks to :)
Even if the machine has died because of a disk failure, been left for a time and seems totally comatose, it should still respond to the alt-sysrq key combinations. It could be worth checking that sysrq has been enabled on your system and leaving a copy of something like http://en.wikipedia.org/wiki/Magic_SysRq_key somewhere handy.
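For checking whether sysrq is enabled, here is a rough Python sketch. It assumes the setting is exposed at /proc/sys/kernel/sysrq and that the value follows the bitmask meanings listed in the kernel's sysrq documentation; on older kernels the value may simply be 0 or 1. The function names are made up for illustration.

```python
from pathlib import Path

# Bit meanings as listed in the kernel's sysrq documentation.
# Treat these as an assumption for your particular kernel version.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value):
    """Turn the numeric sysrq setting into a list of readable descriptions."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    # Any other value is treated as a bitmask of allowed functions.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_sysrq()
    if current is None:
        print("Could not read the sysrq setting on this system")
    else:
        print(f"sysrq = {current}: " + ", ".join(describe_sysrq(current)))
```

If it reports "disabled", writing 1 to the same file as root should turn everything on until the next reboot.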
When leaving the system unattended it's worth leaving it at a logged-in console screen rather than an X display, as you'll sometimes get messages printed there. You may also be able to use the dmesg command to check the kernel log buffer, and it makes alt-sysrq much easier to use (i.e. you can see the results on the screen!).
One thing to check if it does seem to be a 'dead' disk and you have SATA drives is that the SATA connectors aren't loose or poorly fitting. SATA connectors can be a bit of a weak point, especially if they're of the non-latching type and the connection is under pressure from a badly-routed cable (have a look on Google for this).
If the system doesn't respond to any of the alt-sysrq combinations I think I'd be more suspicious of one of: CPU, graphics card or northbridge overheating, power supply problems, bad connections, card seating or bad memory. To be honest, drive failures don't normally lead to a total system freeze - a logged-in command prompt on a console will normally still respond in some way.
Re memory errors - how long have you run Memtest for? Memtest may need to be run for quite some time (e.g. overnight, and maybe up to 24 hours) to be reasonably sure there are no memory problems.
Another option is to find a local PC repair place that has memory and PSU testers that you could make use of. It is possible to get fairly cheap ATX PSU testers off eBay - they're damn handy things to have in the toolbox.
Mike
SteveA
Pro
Posts: 1,849
Thanks: 106
Fixes: 3
Registered: ‎17-06-2007

Re: New Server

Well it rebooted itself today and the kern.log had something interesting in it.
Jan 28 12:03:57 bantock kernel: [45011264.360000] Unable to handle kernel NULL pointer dereference at virtual address 00000404
Jan 28 12:03:57 bantock kernel: [45011264.360000]  printing eip:
Jan 28 12:03:57 bantock kernel: [45011264.360000] c0309365
Jan 28 12:03:57 bantock kernel: [45011264.360000] *pde = 0d91c001
Jan 28 12:03:57 bantock kernel: [45011264.360000] Oops: 0002 [#1]
Jan 28 12:03:57 bantock kernel: [45011264.360000] SMP
Jan 28 12:03:57 bantock kernel: [45011264.360000] Modules linked in: ppdev bluetooth video tc1100_wmi sony_acpi pcc_acpi hotkey dev_acpi container button acpi_sbs battery ac i2c_acpi_ec ipv6 sr_mod sbp2 scsi_mod ieee1394 psmouse parport_pc lp parport tsdev usbhid tulip pcspkr i2c_sis96x i2c_core sis_agp agpgart shpchp pci_hotplug evdev ext3 jbd ide_generic ehci_hcd ohci_hcd usbcore ide_cd cdrom ide_disk sis5513 generic thermal processor fan capability commoncap vga16fb vgastate fbcon tileblit font bitblit softcursor
Jan 28 12:03:57 bantock kernel: [45011264.360000] CPU:    0
Jan 28 12:03:57 bantock kernel: [45011264.360000] EIP:    0060:[_spin_lock_irqsave+5/32]    Not tainted VLI
Jan 28 12:03:57 bantock kernel: [45011264.360000] EFLAGS: 00010086   (2.6.15-53-server)
Jan 28 12:03:57 bantock kernel: [45011264.360000] EIP is at _spin_lock_irqsave+0x5/0x20
Jan 28 12:03:57 bantock kernel: [45011264.360000] eax: 00000286   ebx: 00000400   ecx: c1405f00   edx: 00000404
Jan 28 12:03:57 bantock kernel: [45011264.360000] esi: 00000404   edi: dfc07580   ebp: d5b79f38   esp: d5b79f14
Jan 28 12:03:57 bantock kernel: [45011264.360000] ds: 007b   es: 007b   ss: 0068
Jan 28 12:03:57 bantock kernel: [45011264.360000] Process sshd (pid: 21786, threadinfo=d5b78000 task=dfc07580)
Jan 28 12:03:57 bantock kernel: [45011264.360000] Stack: c011fa0b c1216d8c 0a582065 00000001 00000000 c01624e1 00000000 dfc07580
Jan 28 12:03:57 bantock kernel: [45011264.360000]        dfc07580 00000001 c0122816 00000286 c012ea54 ce5ce7e4 ffffffff 00000286
Jan 28 12:03:57 bantock kernel: [45011264.360000]        c012eca2 d14afe00 dfc07580 c0127122 dfc07580 d14afe00 d5b78000 dfc07580
Jan 28 12:03:57 bantock kernel: [45011264.360000] Call Trace:
Jan 28 12:03:57 bantock kernel: [45011264.360000]  [complete+27/96] complete+0x1b/0x60
Jan 28 12:03:57 bantock kernel: [45011264.360000]  [__handle_mm_fault+881/928] __handle_mm_fault+0x371/0x3a0
Jan 28 12:03:57 bantock kernel: [45011264.360000]  [mm_release+38/128] mm_release+0x26/0x80
Jan 28 12:03:57 bantock kernel: [45011264.360000]  [lock_timer_base+36/80] lock_timer_base+0x24/0x50
Jan 28 12:03:57 bantock kernel: [45011264.360000]  [try_to_del_timer_sync+82/96] try_to_del_timer_sync+0x52/0x60
Jan 28 12:03:57 bantock kernel: [45011264.360000]  [exit_mm+34/320] exit_mm+0x22/0x140
Jan 28 12:03:57 bantock kernel: [45011264.360000]  [do_exit+223/992] do_exit+0xdf/0x3e0
Jan 28 12:03:57 bantock kernel: [45011264.360000]  [do_group_exit+60/176] do_group_exit+0x3c/0xb0
Jan 28 12:03:57 bantock kernel: [45011264.360000]  [sysenter_past_esp+84/117] sysenter_past_esp+0x54/0x75
Jan 28 12:03:57 bantock kernel: [45011264.360000] Code: 00 01 31 c9 89 c8 c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 f0 83 28 01 79 05 e8 2d e1 ff ff c3 8d 74 26 00 89 c2 9c 58 fa <f0> fe 0a 79 12 a9 00 02 00 00 74 01 fb f3 90 80 3a 00 7e f9 fa
Jan 28 12:03:57 bantock kernel: [45011264.360000]  <1>Fixing recursive fault but reboot is needed!
Jan 29 15:23:57 bantock kernel: Inspecting /boot/System.map-2.6.15-53-server
Jan 29 15:23:57 bantock kernel: Loaded 23278 symbols from /boot/System.map-2.6.15-53-server.
Jan 29 15:23:57 bantock kernel: Symbols match kernel version 2.6.15.
Jan 29 15:23:57 bantock kernel: No module symbols loaded - kernel modules not enabled.
Jan 29 15:23:57 bantock kernel: [42949372.960000] Linux version 2.6.15-53-server (buildd@palmer) (gcc version 4.0.3 (Ubuntu 4.0.3-1ubuntu5)) #1 SMP Mon Nov 24 19:00:01 UTC 2008

I think that maybe I need to burn the alternative install CD for the latest release and do an upgrade, because I'm running old stuff on this box now.
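For anyone wanting to pull the key facts out of an oops like the one above without reading every line, here is a minimal Python sketch. It assumes the log lines look like the ones pasted in this post (the 2.6.15-era format); the function name and the dictionary keys are made up for illustration, and other kernel versions may format things differently.

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE", as seen on the "EIP is at" and call trace lines.
SYMBOL_RE = re.compile(r"(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")
PROCESS_RE = re.compile(r"Process (\S+) \(pid: (\d+)")


def summarise_oops(lines):
    """Collect the fault reason, faulting function, process and call trace."""
    summary = {"reason": None, "eip": None, "process": None, "trace": []}
    in_trace = False
    for line in lines:
        if in_trace:
            match = SYMBOL_RE.search(line)
            if match:
                summary["trace"].append(match.group(1))
                continue
            in_trace = False  # first non-frame line ends the call trace
        if "Unable to handle kernel" in line:
            # Keep the text after the "[timestamp] " prefix.
            summary["reason"] = line.split("] ", 1)[-1].strip()
        elif "EIP is at" in line:
            match = SYMBOL_RE.search(line)
            if match:
                summary["eip"] = match.group(1)
        elif "Call Trace:" in line:
            in_trace = True
        else:
            match = PROCESS_RE.search(line)
            if match:
                summary["process"] = (match.group(1), int(match.group(2)))
    return summary


# A few lines copied from the kern.log excerpt in this post.
SAMPLE = [
    "Jan 28 12:03:57 bantock kernel: [45011264.360000] Unable to handle kernel NULL pointer dereference at virtual address 00000404",
    "Jan 28 12:03:57 bantock kernel: [45011264.360000] EIP is at _spin_lock_irqsave+0x5/0x20",
    "Jan 28 12:03:57 bantock kernel: [45011264.360000] Process sshd (pid: 21786, threadinfo=d5b78000 task=dfc07580)",
    "Jan 28 12:03:57 bantock kernel: [45011264.360000] Call Trace:",
    "Jan 28 12:03:57 bantock kernel: [45011264.360000]  [complete+27/96] complete+0x1b/0x60",
    "Jan 28 12:03:57 bantock kernel: [45011264.360000]  [exit_mm+34/320] exit_mm+0x22/0x140",
    "Jan 28 12:03:57 bantock kernel: [45011264.360000] Code: 00 01 31 c9 89 c8 c3",
]

if __name__ == "__main__":
    for key, value in summarise_oops(SAMPLE).items():
        print(f"{key}: {value}")
```

To run it against a real log you could pass in the lines of the file instead of SAMPLE. This only summarises what the kernel printed; it does not say whether the cause is hardware or software.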
SteveA
Pro
Posts: 1,849
Thanks: 106
Fixes: 3
Registered: ‎17-06-2007

Re: New Server

Well I bit the bullet and spent a little bit of money.
1.5TB of disk space (500GB for the OS and stuff, 1TB for music files etc.) on two SATA disks (with space for 2 more), twin-channel IDE, integrated graphics and network, 6 USB slots, 1GB RAM, a 4200+ dual-core AMD64, a 700W power supply (with twin fans) and a case with twin front-mounted fans too.
Server Uptime: 59 day(s) 17 hour(s) 34 minute(s).
I did something sneaky too. I installed it as a server with a GUI front end and then turned off the GUI, which means the console is purely command line. BUT I also installed NX, and when that connects to the server it fires up an X session for the duration. This means I can use some of the nice GUI tools without having X running all the time.
VileReynard
Hero
Posts: 12,616
Thanks: 582
Fixes: 20
Registered: ‎01-09-2007

Re: New Server

Well, how much did it cost to bite the bullet?

"In The Beginning Was The Word, And The Word Was Aardvark."

SteveA
Pro
Posts: 1,849
Thanks: 106
Fixes: 3
Registered: ‎17-06-2007

Re: New Server

About £300 all in