Recently, I have found a great deal of need for the ability to flexibly and non-service-interruptingly (yeah, it’s a fake word) migrate an existing PV in an LVM2 volume group to a new PV presented to the server. In a recent effort, I was tasked with migrating a 2.0TB SATA LUN to a 2.5TB FC LUN. This is an easy enough request to fulfill, so why the importance, you might ask? Well, the 2.0TB SATA LUN was hosting the data files for a production Oracle database that requires 99.999% uptime (actually, this is a by-law requirement for financial institutions in some states, and, as my luck would have it, it was the requirement for me as well). So, to the drawing board I went…

The scenario is this:

  • less than five minutes of available downtime;
  • 2.0 actual-used terabytes of data;
  • and different SAN shelves on different filers on the backend, so disk-mirroring tricks were out of the question (and there was good reason to not do that, which I will address in a future post)…

I started to become nervous. This is good hardware that I’m working with, so a reboot could probably fall within the five minutes of allotted downtime, but what am I going to accomplish with that? And, do I really want to burn my five minutes watching the box boot up? And, what if there’s a new problem that arises during reboot and we go outside of the five minutes, possibly incurring a state-issued fine as a result? I ruled it out… Rebooting was far too risky for this situation, and quite frankly not an option I’m a huge fan of anyway.

I went over the list of other possibilities in my head:

  • RAID1 software mirroring — nope, already have the existing disk structure;
  • LVM mirroring — this was actually a suitable solution, but I didn’t want the 2.5TB FC disk to be part of a mirrored array when it was over; I wanted it to be a standalone PV that thought it was the original 2.0TB SATA disk, only bigger and faster. Due to the sensitive nature of the data, I was hesitant to perform any logical volume conversions on the existing data drive, so I completely ruled out the mirrored logical volume idea;
  • dd? — Yeah, I did some googling around to see if anyone else on this planet had done anything like this, and the only things I found were posts from people still living in the pre-LVM old school (no offense) who wanted me to do a block-level copy of one drive to the other. Again, too risky: by my calculations this was going to take somewhere in the range of 16 hours to complete, and god only knows what kind of disk-level Oracle problems we would have run into when it was all said and done… No, this was most certainly not the solution.
  • Copy the files from one drive to the other? Yeah, right…
  • Why not just create a new logical volume on the 2.5TB FC LUN and have Oracle move all of the data files over? I suppose that this could have been a possibility, but it’s not really the solution that we’re looking for! We don’t need layer-7 interaction to simply change one disk to another in a logical volume, that’s stupid… There’s gotta be an easier way to do this.

I was starting to get frustrated… But, I fell back on my years of extensive experience and training… and headed straight to google. I have a lot of experience working with LVM and I know its capabilities, so I knew that one of the developers had already thought of a solution for this issue, and that a tool certainly existed somewhere that would give me what I needed. I, of course, was right, and as it turns out the solution was extremely simple. The “pvmove” command became my newest friend… Though, unfortunately, I couldn’t really find a great deal of practical documentation and real-world testimonials, so I had to rely heavily on what the manpage was telling me.

What the manpage told me, and as it turns out also became my own personal experience, is that pvmove is really sick and was designed for exactly what I was trying to do. I’m going to try to outline in the highest-level terms the way that pvmove allows us to perform massive online data migrations without involving any downtime or end-user service interruptions… Let me begin by recapping the scenario in more detail:

  • Existing Physical Volume: 2.0TB, /dev/mapper/mpath0p1
  • Existing Volume Group: oracleprod
  • Existing Logical Volume: data (/dev/mapper/oracleprod-data)

Goal: We need to move to the following configuration, with at most 5 minutes of downtime and/or service interruption (I’m shooting for zero minutes) —

  • Physical Volume: 2.5TB, /dev/mapper/mpath1
  • Volume Group: oracleprod
  • Logical Volume: data (/dev/mapper/oracleprod-data)
  • Note: we’ll need to use the existing Volume Group and the existing Logical Volume configuration to ensure that the server-side configuration does not cause any referencing problems for Oracle (mount points, etc…).

During the migration from the 2.0TB SATA disk to the 2.5TB FC disk, pvmove will analyze the structure of the 2.0TB SATA disk and break it into block-level segments that are migrated individually from the source to the destination disk. Since LVM is abstracting the logical volume, if any program modifies data in a physical extent that is under migration, LVM will mirror that modification to both the source and destination disks. Additionally, as pvmove performs the migration it sets checkpoints for itself, recording the parts that it has already completed. This is great because, if for whatever reason I needed to cancel the migration before it was completed (pvmove --abort), I could go back and resume it from where it left off without having to start over completely, a benefit I would not have been afforded with dd. The final point and major benefit to doing things this way is that pvmove allows you to run it in the background, meaning that you don’t need to sit there and hand-hold it while it’s doing its thing; and then, when it is done with the migration, it will no longer write new data to the source disk, allowing it to be safely removed from the Volume Group.
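Because pvmove reports through the normal LVM tools, the checkpointing described above is easy to script against. A minimal sketch, with a hardcoded sample line of `lvs -a -o+devices` output standing in for the real thing (an actual run needs root and a live volume group; the device names are the ones from this article):

```shell
# Parse the Copy% column for the temporary [pvmove0] volume out of
# `lvs -a -o+devices` output. The sample line below is hardcoded so the
# sketch runs anywhere; in real life you would pipe `lvs` itself in.
lvs_sample='  [pvmove0] oracleprod p-C-ao 2.00t 41.25 /dev/mapper/mpath0p1(0),/dev/mapper/mpath1(0)'

progress=$(printf '%s\n' "$lvs_sample" | awk '/\[pvmove/ {print $(NF-1)}')
echo "pvmove progress: ${progress}%"
```

The same one-liner dropped into a cron job or a `watch` loop gives you hands-off monitoring while pvmove chugs along in the background.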

This is more detail than I usually go into, but I think that it is important to understand these things so that you can have peace of mind during your migration. With LVM abstracting the Logical Volume layer from the application, Oracle was completely unaware that anything was going on in the background, and didn’t need to be involved in the disk swap at all. Simply put, I was able to perform this entire operation online, without any downtime required, and with only one final piece of the puzzle after the migration was complete (and really the ultimate reason why I had to do this in the first place): resize the Logical Volume from the existing 2.0TB configuration to the new 2.5TB configuration to take advantage of the additional extents. Since the /dev/mapper/oracleprod-data Logical Volume is not the OS Logical Volume, and since I was adding extents to the LV rather than removing them, I was able to perform the filesystem resize online, while the FS was mounted, without service impact or interruption. When it was all said and done, the Oracle instance ended up with a 2.5TB ext3 logical volume, and neither user nor application was any the wiser.
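Those “additional extents” are easy to quantify. A quick sketch of the arithmetic, assuming LVM2’s default 4 MiB physical extent size (that default is an assumption on my part; check `vgdisplay` for your VG’s actual PE size):

```shell
# How many new physical extents does the extra 0.5 TB buy us,
# assuming the LVM2 default extent size of 4 MiB?
awk 'BEGIN {
  extra_mib  = 0.5 * 1024 * 1024   # 0.5 TiB expressed in MiB
  extent_mib = 4                   # assumed default PE size
  printf "%d new extents\n", extra_mib / extent_mib
}'
```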

The bare metal:

    # Begin by pvcreate'ing the new (2.5TB FC) disk
        - pvcreate /dev/mapper/mpath1
    # Now, add this disk into the oracleprod Volume Group
        - vgextend oracleprod /dev/mapper/mpath1
    # Fight the urge to lvextend -- we need the physical extents from the new disk to be "available physical extents" in order to perform the migration.
    # Begin the migration from the /dev/mapper/mpath0p1 (2.0TB SATA disk) PV to the new /dev/mapper/mpath1 (2.5TB FC disk) PV, and fork it into the background so that we can walk away from it.
        - pvmove -b /dev/mapper/mpath0p1 /dev/mapper/mpath1
    # For 2.0TB SATA to 2.5TB FC, on different filers and different shelves, this took approximately 7 hours to complete. Amazingly, during this time the load on the box never exceeded a 5-min average of 1.00, and I suspect that if normal business operations weren't occurring during this migration that it would not have broken a sweat at all. Of course, your mileage may vary depending on your hardware and common load averages.
    # Periodically, I check the progress of the migration... Make sure you're looking in the "Copy%" column for the LV and VG that you're working with.
        - lvs -a -o+devices
    # When the migration is completely finished, you will know because the "Copy%" column will no longer register a value for the LV and VG that you're working in.
    # Now, you can safely remove the original PV (2.0TB SATA) from the VG. pvmove has already seamlessly performed all of the necessary backend changes required to ensure that all of the data is going to the new disk.
        - vgreduce oracleprod /dev/mapper/mpath0p1
        - pvremove /dev/mapper/mpath0p1
    # And, finally, extend the Logical Volume over the freed-up extents and perform the filesystem resize as we already know how to. Without specifying a new size, resize2fs instructs the filesystem to consume the entire length of the Logical Volume.
        - lvextend -l +100%FREE /dev/mapper/oracleprod-data
        - resize2fs /dev/mapper/oracleprod-data

And, that’s all there is to it. Full-fledged data migration from one Physical Volume to another, 500GB of additional space, and we’re now on Fiber Channel — mission accomplished!
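For the curious, the timing figures in this post are easy to sanity-check; a quick sketch of the arithmetic (my rounding, based on the ~7-hour pvmove above and the ~16-hour dd estimate earlier):

```shell
# Effective throughput of the 2.0 TiB pvmove that took ~7 hours,
# versus the rate implied by the ~16-hour dd estimate.
awk 'BEGIN {
  mib = 2.0 * 1024 * 1024                      # 2.0 TiB expressed in MiB
  printf "pvmove: ~%.0f MiB/s\n", mib / (7  * 3600)
  printf "dd:     ~%.0f MiB/s\n", mib / (16 * 3600)
}'
```

So the online migration not only avoided downtime, it actually moved data at better than twice the rate I had penciled in for a raw block copy.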

There are a few caveats for me to cover:

      ** I absolutely encourage you to read the pvmove manpage thoroughly before you do anything!!
      1. While I hope that this isn’t the case, I realize that in some instances this may have happened… If a single physical volume is servicing physical extents for multiple logical volumes and you only want to move one of them, then during your pvmove command, you will additionally need to supply the “-n <logical_volume>” parameter.
      2. You may have noticed that in this article I was moving from a source physical volume that was built on a (type 8e) partition table to a destination physical volume that was pvcreate’d without a partition table. This can be a little bit confusing, and honestly goes against ALL OF OUR RED HAT TRAINING, but I have recently learned that actual best practice for LVM2 is to pvcreate the entire disk, with no partition table in place, when the physical volume will not need to host any non-LVM partitions. Simply put, if you’re going to have a mix of regular ext3 and LVM on the same drive, you need a partition table (for example, check out an “fdisk -l” on your OS disk; grub doesn’t support LVM, so we need /boot to be ext3). I intend to cover LVM2 best practices in a subsequent posting, so stay tuned for that.
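One practical consequence of the whole-disk-vs-partition distinction is that you can often spot it right in the PV device names. A toy sketch using the names from this article (a pure string check on a hardcoded list; for a real audit you would feed it `pvs --noheadings -o pv_name` output instead):

```shell
# Classify PV device names: multipath names ending in pN were built on a
# partition table; bare names were pvcreate'd on the whole disk.
pv_list='/dev/mapper/mpath0p1
/dev/mapper/mpath1'

printf '%s\n' "$pv_list" | while read -r pv; do
  case "$pv" in
    *p[0-9]) echo "$pv: partition-based PV" ;;
    *)       echo "$pv: whole-disk PV" ;;
  esac
done
```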

Good luck, enjoy, and as always, feel free to post or email me with questions —


5 Responses to “Migrating Physical Volumes in a LVM2 Volume Group”

  1. Nice article! This is what i’ve been looking for! thanks for this! The only challenge for me now is how to migrate ASM devices from DMX to VMAX. Any idea?

  2. Nice job on the sequence of things. The thing that I still can’t get any verification on from RH is the whole alignment offset particularly when using CLARiiON arrays and intel hardware. Some still insist you need the 128 offset done by using fdisk, and others say no. The LVM offset feature seems to only let you specify something larger than 256 which then probably defeats the purpose unless you know exactly what the next offset should be. anyhow, nice article.

    • The joys of storage administration. Depending on your filesystem’s configuration, you will have different tuning preferences. I have never found any one place that says, “THIS IS THE BEST WAY TO DO IT!”, so typically what I do is try many different configurations, both from the storage backend (in a SAN situation) and from the server, and run some real cut-and-dry dd tests to check performance. Whichever yields the best results, I go with.

  3. nice write up.. Moving my Satellite box and a bunch of other servers from DMX to VMAX and this confirmed what I was already thinking. I’m actually doing a pvmove on my satellite repo as I type this :)

    Thanks again!

© 2013 Dan's Blog