How does GFS2 work?

If you have a larger file system and can afford the space, using larger journals might improve performance. In general, 4K blocks are the preferred block size, because 4K is the default memory page size on Linux. If your block size is 4K, the kernel has to do less work to manipulate the buffers. When a GFS2 file system is created with mkfs.gfs2, it attempts to estimate an optimal resource group size, ranging from 32MB to 2GB. You can override the default with the -r option of mkfs.gfs2.
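
For instance, here is a sketch of a mkfs.gfs2 invocation that sets these knobs explicitly; the cluster name, journal count, sizes, and device path are placeholders, not recommendations:

    # -p lock_dlm   cluster-aware locking through the DLM
    # -t name       lock table, in the form clustername:fsname
    # -j 3          one journal per node that will mount the file system
    # -b 4096       4K blocks, matching the Linux page size
    # -J 128        journal size in megabytes
    # -r 256        resource group size in megabytes, overriding the estimate
    mkfs.gfs2 -p lock_dlm -t mycluster:vhosts -j 3 -b 4096 -J 128 -r 256 /dev/vg_shared/lv_vhosts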

Which is better? How can I performance-tune GFS or make it any faster? How can I convert a file system from gfs1 to gfs2? Why is access to a GFS file system slower right after it's mounted? After a node is fenced, GFS hangs for X seconds on my other nodes; can I reduce that time? Will my application run properly on a GFS file system? What does it mean when a GFS file system is withdrawn? After a withdraw, can you simply remount your GFS file system?

What is GFS and why do I need it? GFS is the file system that runs on each of the nodes in the cluster. Like all file systems, it is basically a kernel module that runs on top of the VFS (virtual file system) layer of the kernel.

It controls how and where the data is stored on a block device or logical volume. In order to make a cluster of computers ("nodes") cooperatively share the data on a SAN, you need GFS's ability to coordinate with a cluster locking protocol.

One such cluster locking protocol is DLM, the distributed lock manager, which is also a kernel module. Its job is to ensure that the nodes in the cluster that share the data on the SAN don't corrupt each other's data. Many other file systems, such as ext3, are not cluster-aware, so data kept on a volume shared between multiple computers would quickly become corrupt without it.

Also, you need two or more computers and a network connection between them. Off-the-shelf PCs don't have shared storage.

As for maximum file system size with GFS 6, we have field reports of 45 and 50 TB file systems. Testing these configurations is difficult due to our lack of access to very large array systems. If you know of a bigger one, I'd love to hear from you.

Currently, GFS and GFS2 do not use millisecond resolution for file timestamps; they use seconds. This is to maintain compatibility with the underlying VFS layer of the kernel.

If the kernel changes to milliseconds, we will also change. People don't normally care about milliseconds; they only become important to computers when doing things like NFS file serving.

For example, to see whether another computer has changed the data on disk since the time of the last known request. For GFS2, we're planning to implement inode generation numbers to keep track of these things more accurately than a timestamp can. It basically sets an attribute on an in-memory inode for the directory.

No, it's not true. What it prevents is data corruption as a result of the node waking up and erroneously issuing writes to the disk when it shouldn't.

The simple fact is that no one can guarantee against loss of data when a computer goes down. If a client goes down in the middle of a write, its cached data will be lost. If a server goes down in the middle of a write, cached data will be lost unless the file system is mounted with the "sync" option. Unfortunately, the "sync" option has a performance penalty.

The NFS client should get a timeout on its write request, and that will cause it to retry the request, which should go to the server that has taken over the responsibilities of the failed NFS server. And GFS will ensure that the original server, the one having the problem, does not corrupt the data.

You probably mistyped the cluster name when you ran mkfs. Use the 'dmesg' command to see what GFS is complaining about. Even if this is not your problem, whenever you have trouble mounting, always use dmesg to view complaints from the kernel. Unlike ext3, GFS will dynamically allocate inodes as it needs them.
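
For example, assuming a hypothetical device and mount point:

    mount -t gfs2 /dev/vg_shared/lv_vhosts /mnt/vhosts
    dmesg | tail -n 20    # look for the gfs/gfs2 lines explaining the failure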

Therefore, it's not a problem. It depends on the file size and the file system block size. Assuming the file system block size is a standard 4K, let's do the math: a GFS inode is 0xe8 (232) bytes in length. If the file is small enough to fit in the rest of that 4K block, it is stored alongside the disk inode and takes up only one block; in this case we say the file "height" is 0. Slightly bigger, and the file needs to use a single level of indirection, also known as height 1. The rest of the inode's block will be used to hold a group of block pointers. These pointers are 64 bits (8 bytes) each, so you can fit exactly 483 of them in the block after the disk inode.

If you have 483 pointers to 4K blocks, the file can be at most 483 × 4096 bytes, roughly 1.9MB. If your file gets over that, you need a second level of indirection (height 2). That means your inode will still have at most 483 pointers, but now they point to 4K blocks which each hold block pointers of their own. If your file is bigger still, you'll need a third level of indirection (height 3), which multiplies the number of pointers, and therefore the maximum file size, yet again.

At height 3 the file can grow to a size measured in gigabytes; still bigger, at height 4, the maximum file size reaches into the terabytes. If your file is bigger than that, egads! Also, extended attributes such as ACLs, if used, take up more blocks.
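
For heights 0 and 1, the arithmetic can be spelled out using nothing beyond the figures above (a 0xe8-byte, i.e. 232-byte, disk inode; 8-byte pointers; 4K blocks). A quick shell sketch:

    echo $((4096 - 232))    # 3864 bytes are left in the inode's block (the height-0 limit)
    echo $((3864 / 8))      # 483 eight-byte block pointers fit in that space
    echo $((483 * 4096))    # 1978368 bytes, roughly 1.9MB, the height-1 limit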

Yes you can. At the time of this writing, software RAID is not cluster-aware. Since software RAID can only be running on one node in the cluster, the other nodes will not be able to see the data properly and will likely destroy each other's data.

Sometime after 2.6.x, the relevant kernel interfaces changed. If your Linux distribution is running an older kernel, you may not be able to compile GFS. Your choices are: upgrade your kernel to a newer one, downgrade your GFS, or change the source code so that it uses semaphores as before.

Older versions are available from CVS. Because this is an open-source project, it is constantly evolving, as is the Linux kernel. Compile problems are to be expected, and are usually easily overcome, unless you are compiling against the exact same kernel the developers happened to be using at the time.

Surprisingly, yes. Please consider Mozart or Chopin instead. (Yes, that's a joke, ha ha.) Yes and no.

Yes, it's possible, and one application will not block the other. No: since only one node can cache the contents of the inode in question at any given time, the performance may be poor. The application should use some kind of locking, for example byte-range locking.

However, GFS does not excuse the application from locking to protect the data. Two processes trying to write data to the same file can still clobber each other's data unless proper locking is in place to prevent it. Here's a good way to think about it: GFS makes two or more processes on two or more different nodes behave the same as two or more processes on a single node. So if two processes can share data harmoniously on a single machine, then GFS will ensure they share data harmoniously on two nodes.

But if two processes would collide on a single machine, then GFS can't protect you against their lack of locking. So how is that going to help us achieve a shared file system among all of our web servers, if it operates on block devices? There are many answers to that question, only one of which I have explored so far: you can attach block storage to remote systems in a number of different ways, and iSCSI is the one I will be using here.

Let's explore how that is done, as without shared block storage, GFS2 isn't going to do much for us. Firstly, we need to designate our iSCSI target, or in other words, the storage target that the application servers will be accessing.

In this example I will be using just one iSCSI target, which I intend to later have serving my "vhosts" directory of web data. Let's call our iSCSI target system 'storage1'. It is going to need software for creating iSCSI targets, so let's go ahead and install the package "scsi-target-utils". This package contains some useful commands that we will use to create the iSCSI target, namely "tgtadm". Before we can use any of them, though, we need to start up the targeting service, as sketched below.
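
Something along these lines should do it on a RHEL/CentOS-style system; the package manager, service name, and chkconfig step are assumptions, so adjust for your distribution:

    yum install scsi-target-utils    # provides tgtadm and the tgtd daemon
    service tgtd start               # start the iSCSI target service
    chkconfig tgtd on                # and have it start at boot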

Now let's attempt to create a new target. For simplicity's sake, I will be naming this particular target "vhosts". In other examples that I have read through, a rather long string was used as the target name. Using a short name like I am here may have its downsides, but other than being easily identifiable I am not sure at this point what the significance is. Fire away, and then display tgtd's current configuration, along the lines of the commands below.
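
With tgtadm, that would look something like the following sketch; the target id of 1 is an assumption:

    # Create target id 1 with the short name "vhosts"
    tgtadm --lld iscsi --op new --mode target --tid 1 -T vhosts

    # Display tgtd's current configuration
    tgtadm --lld iscsi --op show --mode target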

So there we have our target. Now, what is it going to use for storage? If you want to use GFS2, format it with GFS2. A little explanation here: GFS2 needs one journal for each node that will mount the file system, so it is good to determine the number of machines you will place into this cluster. Contact support if you need assistance regarding this. After the tag, add the following text, as sketched below. If you do not make this change, the servers will not be able to establish a quorum and will refuse to cluster, by design.
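
The following is only a sketch of what these steps usually look like; the device paths, cluster name, journal count, and the <cman> tag used for the quorum change are assumptions, not text from the original:

    # Attach a backing store as LUN 1 of target 1, and let initiators connect
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/vg_storage/lv_vhosts
    tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

    # From one cluster node, format the shared volume as GFS2, with one journal
    # per node that will mount it (two here)
    mkfs.gfs2 -p lock_dlm -t mycluster:vhosts -j 2 /dev/sdX

    # For a two-node cluster, /etc/cluster/cluster.conf typically needs
    #   <cman two_node="1" expected_votes="1"/>
    # so that two servers on their own can establish a quorum.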

You can check whether it works using commands along the lines of the sketch below.
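
On a Red Hat cluster of this vintage, something like the following would show whether the nodes can see each other and have established a quorum:

    cman_tool status    # quorum state and vote counts
    cman_tool nodes     # cluster membership
    clustat             # overall cluster status, if rgmanager is installed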


