NFS mount hangs whole system

static_sa

Senior Member
Joined
Oct 3, 2011
Messages
852
#1
Please tell me there are some greybeards on this forum...

When my node loses its connection to an NFS mount and I run the df command, the whole system hangs until the mount is unmounted.

Is there an option in fstab or exports that can safeguard me from this?

fstab:

defaults,_netdev,soft,intr,nolock,noacl,noatime,sync,proto=tcp,mountproto=udp,port=(removed) 0 0

NFS server exports:

(rw,async,no_root_squash,no_subtree_check,no_wdelay,fsid=16)

I was thinking of changing retrans to a very low value?

Any help is appreciated!
 

DWPTA

Expert Member
Joined
Jul 28, 2006
Messages
3,763
#4
Well, the first question is why you lose the connection to the mount point. I suspect fixing that will resolve the issue.
 

SauRoNZA

Honorary Master
Joined
Jul 6, 2010
Messages
30,723
#6
What exactly is hosted on this mount?

Maybe that’s what is breaking the system as it requires access to that?
 

koffiejunkie

Executive Member
Joined
Aug 23, 2004
Messages
9,000
#8
Is it hanging the whole system (i.e. you can no longer ssh to it)? Or just the df command, and by extension your session? I'll assume the latter.

This is not an NFS problem. df by default includes local and remote filesystems. You want to add -l to df. Note the difference:

Code:
root@u16lab:~# df -h
Filesystem        Size  Used Avail Use% Mounted on
udev              232M     0  232M   0% /dev
tmpfs              49M  5.5M   43M  12% /run
/dev/xvda1         20G  1.6G   18G   8% /
tmpfs             242M     0  242M   0% /dev/shm
tmpfs             5.0M     0  5.0M   0% /run/lock
tmpfs             242M     0  242M   0% /sys/fs/cgroup
tmpfs              49M     0   49M   0% /run/user/0
192.168.3.2:/srv   20G  2.1G   17G  12% /mnt/data

root@u16lab:~# df -lh
Filesystem      Size  Used Avail Use% Mounted on
udev            232M     0  232M   0% /dev
tmpfs            49M  5.5M   43M  12% /run
/dev/xvda1       20G  1.6G   18G   8% /
tmpfs           242M     0  242M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           242M     0  242M   0% /sys/fs/cgroup
tmpfs            49M     0   49M   0% /run/user/0
root@u16lab:~# man df
-l, --local
limit listing to local file systems
With soft, it should time out eventually. Check what the timeout is set to:

Code:
root@u16lab:~# grep nfs /proc/mounts
none /proc/xen xenfs rw,relatime 0 0
192.168.3.2:/srv /mnt/data nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.3.2,mountvers=3,mountport=892,mountproto=udp,local_lock=none,addr=192.168.3.2 0 0
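Note that the plain grep above also matches the xenfs line. As a rough sketch (the function name nfs_timeouts is made up for illustration, and it assumes the usual /proc/mounts field order of device, mount point, type, options), something like this filters on the filesystem type field instead and prints only the soft/hard mode, timeo and retrans for each NFS mount:

```shell
# Hypothetical helper: summarise the timeout-related options of NFS mounts.
# $1 is an optional path to a mounts table; defaults to /proc/mounts.
nfs_timeouts() {
    awk '$3 ~ /^nfs4?$/ {
        # Field 3 is the filesystem type, field 4 the comma-separated options.
        timeo = "?"; retrans = "?"; mode = "?"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^timeo=/)   timeo = substr(opts[i], 7)
            if (opts[i] ~ /^retrans=/) retrans = substr(opts[i], 9)
            if (opts[i] == "soft" || opts[i] == "hard") mode = opts[i]
        }
        print $2, mode, "timeo=" timeo, "retrans=" retrans
    }' "${1:-/proc/mounts}"
}
```

For the mount shown above, calling nfs_timeouts with no arguments should print `/mnt/data soft timeo=600 retrans=2`.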
Now mount with a shorter timeout to check (timeo is in tenths of a second, so timeo=5 is 0.5 seconds):

Code:
root@u16lab:~# mount -t nfs -o vers=3,soft,timeo=5 192.168.3.2:/srv /mnt/data
Firewall off the NFS server and test this:

Code:
root@u16lab:~# time ls /mnt/data
ls: cannot open directory '/mnt/data': Stale file handle

real	0m7.018s
user	0m0.000s
sys	0m0.000s
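If you want scripts or monitoring to avoid blocking on a dead mount, one option is to put a time limit around the check itself. This is only a sketch: check_mount is a made-up name, it assumes GNU coreutils timeout is installed, and I have not verified how it behaves against a hard mount with the server gone, where the process may be stuck in uninterruptible sleep.

```shell
# Hypothetical helper: report whether df on a mount point returns in time.
# $1 is the mount point, $2 the number of seconds to wait (default 5).
check_mount() {
    # timeout sends SIGKILL to df if it has not finished within the limit.
    if timeout -s KILL "${2:-5}" df -P "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        # df failed or was killed: path missing, or the mount did not answer.
        echo "unresponsive or missing: $1"
        return 1
    fi
}
```

For example, `check_mount /mnt/data 3` gives the NFS mount three seconds to answer before reporting it as unresponsive.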
 

static_sa

Senior Member
Joined
Oct 3, 2011
Messages
852
#9
Thanks for the replies. I changed it to soft and adjusted some of the timeout values. The mount is to a backup server, and the actual host as well as all the VMs hung when it lost the mount (because it was hard), but now it fails gracefully.
 

Fransh

Senior Member
Joined
Jul 30, 2011
Messages
837
#11
no_root_squash... Lovely privilege escalation ;)
 