DiskSuite uses metadb DB replicas to hold all the critical information needed to access your data stored on metadevices. These replicas are essential in many cases to maintain your data integrity, and no discussion of DiskSuite would be complete without some discussion of the care and maintenance of these DB replicas.
Disksuite requires space on the disk to store its metadb database replicas. Because this database contains the critical information needed for you to access the disks, it must be replicated as widely as possible. You should in general spread the replicas out evenly over as many disks as are available, and where possible they should be evenly distributed over multiple controllers as well (although this is often not feasible). On a system with only 2 internal disks, the replicas will most likely be limited to those 2 disks, and should be divided equally between them (this way the system can stay up if either disk fails). We also have some systems with 4 internal drives, and in these cases we replicate the databases across all 4 drives, even if only two of them are to be mirrored.
The database replicas take up room on the disk which cannot be used for filesystems, etc. The typical partition scheme for a Glued Solaris box is as follows:
/                   (root slice)
/usr
/usr/vice/cache     (AFS cache)
/var
/usr/afs            (on AFS servers, maybe DB replicas)
As can be seen, the standard Glue setup uses most of the slices available. Slice 2 might be usable, but I would recommend against it, especially on a system disk. That leaves slices 6 and 7 free. Physics generally puts the DB replicas on one of these 2 slices. The replicas are not that large: 8192 blocks or about 4MB on the recent versions (Solaris 9), and much smaller (about 1/2MB) on earlier (Solaris 7, 8) versions, and we usually put 2-3 copies on each disk. (NOTE: it is important to spread the copies out over multiple disks, and to have the same number of replicas on each disk.) Since I dislike making slices smaller than 50 or so MB, we usually waste a fair amount of space anyway. The other slice may have additional local space available if the disk is big enough that I cannot justify expending the entire disk on system slices.
However, if you hope to make the system an AFS server (thus using slice 6), and possibly put data on slice 7, you have a problem, as there are no more partitions free on which to put the DB replicas. Fortunately, there is a way around that, at least if you do the mirroring before making the system an AFS server: Disksuite can share a slice between the DB replicas and a filesystem in some cases (see also the -l option to metadb).
Because it is unwise to have disksuite manage a /vicep partition on an AFS server, and since you would want the AFS server software on an AFS server mirrored as well, the best bet is, if you can, to mirror the system before the AFS server software is installed. Put the DB replicas on slice 6; mirror root (/), /usr, /var, swap, and the AFS cache as normal; then create an empty metadevice on slice 6, newfs it, and mount it on /usr/afs.
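As a rough sketch only (the metadevice names d60, d61, and d62 and the disks c0t0d0 and c0t1d0 are made-up examples; substitute names that are free on your system), the sequence might look something like:

metadb -a -f -c 3 /dev/dsk/c0t0d0s6
metadb -a -c 3 /dev/dsk/c0t1d0s6
metainit d61 1 1 c0t0d0s6
metainit d62 1 1 c0t1d0s6
metainit d60 -m d61
metattach d60 d62
newfs /dev/md/rdsk/d60
mount /dev/md/dsk/d60 /usr/afs

The replicas must be created before the slice is used in a metadevice; Disksuite then reserves the replica space at the start of the slice, so the filesystem on d60 comes out slightly smaller than the raw slice.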
Some example configurations from Physics:

- /usr/afs on slice 6, and three DB replicas on slice 7. Slices 0, 1, 3, 4, 5, and 6 are all mirrored.
- Slice 6 holds /usr/afs, and slice 7 contains the extra space. Slices 0, 1, 3, 4, 5, and 6 are mirrored. Slice 7 may or may not be mirrored (definitely not if used as a vice partition).
- Slice 6 holds /usr/afs. Slices 0, 1, 3, 4, 5, and 6 on the system disks are mirrored. Slices 0-6 on the other two disks are available, and may or may not be mirrored (definitely not if used as a vice partition).
Regardless of whether you want to do logging, mirroring, striping, or RAID, you need to create the metadb DB replicas for Disksuite. Because this step is so universal, it is covered in its own section.
Before creating the DB replicas, you should have partitioned the disks so that a slice is set aside for the replicas on each disk; the format command can be used to do this. If you are mirroring the disks, you want them to have the same partition structure anyway, so once the first disk is set up, you can use the command

prtvtoc /dev/rdsk/DISK1 | fmthard -s - /dev/rdsk/DISK2

to copy the partition table from DISK1 to DISK2.
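For example, if the two internal disks are c0t0d0 and c0t1d0 (your device names will likely differ), copying the label from the first disk to the second looks like:

prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2

Slice 2 is used here because it conventionally covers the whole disk.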
We are now ready to create the state meta-databases. First, make sure no one configured disksuite without your knowledge by checking for the existence of DB replicas with the command metadb. Solaris 2.7 users may have to give a full path to the metadb command, e.g. /usr/opt/SUNWmd/sbin/metadb; on Solaris 8 and 9, it is in /usr/sbin, which should be in your path. This should return an error complaining that "there are no existing databases". It might also just return nothing (usually indicating that DB replicas were set up once and then all were deleted).
If you get a list of replicas, STOP. Someone set up or tried to set up disksuite before you; figure out what the status is before proceeding further. Using the command below to try to create another initial database set will hopefully yield an error, but if not it could be disastrous, wiping out the previous DB and making the previously mirrored, striped, etc. disks inaccessible.
For a two disk system, Sun advises a minimum of 2 replicas per disk; physics uses 3. To create the initial replicas, issue the command (as root):

metadb -a -f -c 3 slice

where slice is the slice that will hold the replicas, e.g. /dev/dsk/c0t0d0s7 to put them on slice 7 of the 1st disk.
The -c 3 in the above command instructs it to put three copies of the DB there. The -a says we are adding replicas, and the -f forces the addition. NOTE: the -f option should only be used for the initial DB replica, when it is REQUIRED to avoid errors due to the lack of any previously existing replicas.
NOTE: if you are replacing a replica set on a partition that has a file system on it, be aware of the change in the default replica size between Solaris 7/8 and Solaris 9. You may need to use the -l option to metadb to limit the size of the new replicas so as not to overwrite the beginning of the filesystem, or do some nasty recreation of the filesystem at a smaller size.
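For instance, assuming the old replicas were created at the pre-Solaris 9 default length of 1034 blocks (the roughly 1/2MB size mentioned above), something like the following would recreate three replicas at that size rather than the new 8192-block default (the slice name is only an example):

metadb -a -c 3 -l 1034 /dev/dsk/c0t0d0s6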
You can check the databases were created successfully by issuing the metadb command without arguments, or with just the -i argument to get a legend for the flags. You should now see 3 (or whatever value you gave to the -c argument) DB replicas in successive blocks on the slice you specified. At this point, only the a (active) and u (up-to-date) flags should be set.
Now add the replicas on the second (and any other) drives. This is done with a command like:
metadb -a -c 3 /dev/dsk/c0t1d0s7
If you make a mistake, you can use the -d option to metadb to delete all replicas on the named partition, and then re-add the correct number.
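For example, to wipe out and recreate the replicas on slice 7 of the second disk (again, the device name is only an example):

metadb -d /dev/dsk/c0t1d0s7
metadb -a -c 3 /dev/dsk/c0t1d0s7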
You can use the plain metadb command (or give it the -i option for the flags legend) to verify the databases are functioning properly. This should be done right after creation to ensure they were created successfully, and is also useful later to verify things are OK.
You should again see a line for each replica on each disk, along with some flags indicating the status of each replica. In general, lower case flags are good, upper case flags are bad. The following flags seem to be set on a functioning system (flags should appear for every replica unless otherwise stated):
a : the replica is active. This should always be set.
m : flags the master replica (only one replica should have this set, usually the first).
p : the replica's location has been patched into the kernel, so the replica can be found at boot. This should get set after the first reboot.
l : the locator for the replica was read successfully. This also seems to get set after the first reboot.
u : the replica is up-to-date. This should always be set.
o : the replica was active prior to the last database change. This should get set after the first reboot.
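For reference, on a healthy two-disk system after a reboot, metadb output looks roughly like the following; treat this purely as an illustration, since the exact spacing, block offsets, and flag columns will vary:

        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c0t0d0s7
     a    p  luo        16              8192            /dev/dsk/c0t1d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t1d0s7
     a    p  luo        16400           8192            /dev/dsk/c0t1d0s7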