Tuesday, October 7, 2014

kvm_callbacks

As I promised in the previous chapter, let's have a closer look at kvm_callbacks. This is
initialised in

static int __init vmx_init(void) {
     r = kvm_init(&vmx_x86_ops, sizeof(struct vcpu_vmx), THIS_MODULE);
}

We will dig deeper into vmx_x86_ops in the vmx.c chapter. In this chapter we will look at
when & why & who is calling these functions.

kvm_callbacks functions

1.  /// For 8bit IO reads from the guest (Usually when executing 'inb')
    int (*inb)(void *opaque, uint16_t addr, uint8_t *data);

    In the previous chapter, you would have seen a switch-case statement on run->exit_reason. Upon vm_enter, the cpu
    starts executing guest code directly. If the guest tries to execute any privileged instruction, or in
    other words, any instruction that cannot be executed in cpu vm mode, the cpu will exit and the switch-case
    statement below takes care of that case.

    /* A vm_exit happens when the guest tries to do an IO operation. The io could be to
     * a simulated device or a real one. Both need the vm to exit out of vm context.
     */
    switch (run->exit_reason) {
        case KVM_EXIT_IO:
               /*
                * Based on the size of the io, this will invoke kvm->callbacks->inb(),
                * kvm->callbacks->inw() or kvm->callbacks->inl(), and based on the
                * direction it may call kvm->callbacks->outb(), kvm->callbacks->outw()
                * or kvm->callbacks->outl().
                */
    }
           
              
2. /// generic memory reads to unmapped memory (For MMIO devices)
    

    int (*mmio_read)(void *opaque, uint64_t addr, uint8_t *data,
                                            int len);
                                           
    Let us explain mmio_read by taking an example. We can emulate many devices with
    qemu. One of them is the e1000 nic. You have seen in chapter #1 how pc_init1() is
    invoked.


    pc_init1() {
        pci_nic_init(pci_bus, nd, -1) {
                if (strcmp(nd->model, "e1000") == 0)
                     pci_dev = pci_e1000_init(bus, nd, devfn) {
                
                     pci_register_device(bus, "e1000");
                   
                    /* e1000_mmio_read & e1000_mmio_write are arrays of functions. In each
                     * array, the byte handler sits at offset 0: e1000_mmio_readb, then
                     * e1000_mmio_readw, e1000_mmio_readl.
                     */
                    d->mmio_index = cpu_register_io_memory(0, e1000_mmio_read,
                                                              e1000_mmio_write, d);
  
                     pci_register_io_region((PCIDevice *)d, 0, PNPMMIO_SIZE,
                                   PCI_ADDRESS_SPACE_MEM, e1000_mmio_map);
  
                     pci_register_io_region((PCIDevice *)d, 1, IOPORT_SIZE,
                                          PCI_ADDRESS_SPACE_IO, ioport_map);
               }
        }       
   }
  

   cpu_register_io_memory() registers the read and write functions. For example,
   cpu_register_io_memory(e1000_mmio_read, e1000_mmio_write, d) registers the e1000 mmio
   read and write functions as shown above.

   Suppose a vm exit happened due to a memory read/write:

   case  KVM_EXIT_MMIO:
   
      kvm_callbacks->mmio_read() {
             kvm_mmio_read() {
                   cpu_physical_memory_rw(addr, data, len, 0) {
                 
                    /* if it is a byte read; the byte handler is at offset 0 */
                    io_mem_read[io_index][0](io_mem_opaque[io_index], addr, val);
                 
                     }
            }
   
      }
   

interrupts

Before vm_enter, we check for any interrupts available to process and queue them.
Ultimately this sets the interrupt.pending flag to true.


   kvm_run() {
       run->request_interrupt_window = try_push_interrupts(kvm);
   }


The above try_push_interrupts(kvm) function invokes a series of function calls:
try_push_interrupts(kvm) {

     kvm_callbacks->try_push_interrupts() {
            kvm_arch_try_push_interrupts() {
                     kvm_update_interrupt_request() {
                           /* Get irq number */
                           irq = cpu_get_pic_interrupt(env);

                           /* queue an interrupt */
                           r = kvm_inject_irq(kvm_context, env->cpu_index, irq) {
                                   ioctl(kvm->vcpu_fd[vcpu], KVM_INTERRUPT, irq) {
                                          kvm_queue_interrupt() {
                                                   vcpu->arch.interrupt.pending = true;
                                          }
                                   }
                           }
                     }
            }
     }
}


processing of interrupt

inject_pending_irq() will call a kvm_x86_ops function and go deep into vmx.c. I will cover them in the vmx.c chapter.

__vcpu_run() {
       r = vcpu_enter_guest(vcpu, kvm_run) {
               inject_pending_irq(vcpu, kvm_run) {
                     kvm_x86_ops->set_irq(vcpu) {
                             vmx_inject_irq() 
                    }
               }
        } 

 }
  

Monday, September 1, 2014

vCPU creation

We created the VM thru an ioctl call. Now it is time for us to trace the path of vCPU creation. In the last chapter you would have noticed the .init() function of the
QEMU machine. The .init() function creates & initializes the cpu and creates a thread to run the VM. Creation of the cpu is done by kvm_create_vcpu(). The kvm_main_loop() function loops forever.

 machine->init(ram_size, vga_ram_size, boot_devices, ds,
                  kernel_filename, kernel_cmdline, initrd_filename, cpu_model) {


    pc_init_pci() {

 
          pc_init1() {



              for(i = 0; i < smp_cpus; i++) {
                       

                   env = pc_new_cpu(i, cpu_model, pci_enabled) {

                         /* The cpu_init macro points to cpu_x86_init; it is initialised
                          * differently per arch via cpu_arch_init()
                          */
                         cpu_init(cpu_model) {
                                  
                                kvm_init_new_ap(env->cpu_index, env) {


                                     /* Create a new thread using pthread; its main
                                      * function is ap_main_loop
                                      */
                                      ap_main_loop();

                                            /* This creates the vcpu */
                                            kvm_create_vcpu();
                                          
                                           
                                            kvm_main_loop_cpu(env); 

---------------------------------------------------------------------------------------

Let's study the functions kvm_create_vcpu() & kvm_main_loop_cpu() in the next sections.


 void kvm_create_vcpu() {

        /* allocate memory for the run structure */
        kvm->run[cpu_num] = mmap();

        r = ioctl(kvm->vm_fd, KVM_CREATE_VCPU, slot) {

              vcpu = kvm_arch_vcpu_create(kvm, n) {

                   kvm_x86_ops->vcpu_create(kvm, id) {
                          // .vcpu_create = vmx_create_vcpu is in
                          // vmx.c, so it will call vmx_create_vcpu()
                          vmx_create_vcpu() {
                                // we need a bigger discussion on vmx_create_vcpu ...but
                                // will do it in the "vmx.c" chapter
                          }
                   }
              }
        }
 }
        


-----------------------------------------------------------------------------------------
All resources needed for the vCPU were allocated in the previous step; now it is time to
start running.


void kvm_main_loop_cpu() {

   kvm_load_registers() {
        kvm_arch_load_regs()
   }

   
   
   while(1) {
                      
       kvm_cpu_exec() {

                 kvm_run() {
                        int fd = kvm->vcpu_fd[vcpu];
                        struct kvm_run *run = kvm->run[vcpu];

                        /* Start the VM using the KVM_RUN ioctl. This moves
                         * control to vmx.c
                         */
                        ioctl(fd, KVM_RUN, 0);
                 }
                     

                 /* When the virtual machine exits, control comes back here. Now it is
                  * time for us to analyse the exit reason
                  */


                  switch (run->exit_reason) {
                        case KVM_EXIT_IO:
                              /* handle_io calls kvm->callbacks->inb() etc. to emulate
                               * the behaviour
                               */
                               r = handle_io(kvm, run, vcpu);

                                               
                        case KVM_EXIT_MMIO:
                               r = handle_mmio(kvm, run) {
                               /* This will call the specific function for each emulated
                                * hardware device. E.g. for e1000, it is e1000_mmio_read &
                                * e1000_mmio_write. These functions got registered during
                                * pci_init() thru kvm_callbacks:
                                *   d->mmio_index =
                                *       cpu_register_io_memory(e1000_mmio_read, e1000_mmio_write, d);
                                * We will cover more in the peripherals chapter.
                                */

                       case KVM_EXIT_HLT:
                                r = handle_halt(kvm, vcpu);

                                break;
 
                   }
             



          }
          /* wait for SIG_IPI signal with a timeout */       
          kvm_main_loop_wait() //end of while loop

     }                        

-----------------------------------------------------------------------------------------------------

Let's talk about vm_enter and vm_exit. Once QEMU decides that it is time to run
a VM, it can set up registers and call a vm_enter. vm_enter is implemented in assembly language and calls processor specific instructions to enter VM mode. In this mode, the processor can execute VM code directly; no need of emulation. This is possible only when we execute a VM of the same arch, for example, if you run an x86 vm on an x86 linux os.

                          
                                            


Thursday, August 28, 2014

This is about QEMU

                         
Qemu is a VMM (virtual machine monitor). That means QEMU helps to run your virtual machine. This can be full virtualization in software or a hardware assisted one (VT). KVM (Kernel-based Virtual Machine) is a virtualization infrastructure for the Linux kernel that turns linux into a hypervisor. KVM comes in the form of a kernel module. QEMU can use this kernel module to avail the virtualization functionality of the processor. QEMU avails them thru ioctl calls to the kernel.

I referred to the kvm-76 source code to make this doc. I picked an older version to understand KVM better.


I have copy/pasted source code from kvm-76 here and used C style syntax for better code flow understanding. Those who are familiar with the C language can understand it better. I have colored functions to show the nesting better
 - BLACK, RED, ORANGE, GREEN, BLUE

main() {

     layer_0 () {
             layer_1 () {
                     layer_2 () {
                             layer_3() {
                                   layer_4()
                             }
                     }
             }
     }
}




/* main() function is in file qemu/vl.c */ 


main() {    
     

   /* If you have seen manual page of qemu, you might be aware of various 
    * types of machines that qemu can emulate. This can be selected using a
    * command line option to qemu executable. All these different machine types 
    * are stored as QEMUMachine in the qemu source code. All these machines needs
    * to be stored in a linked list. register_machine() routine does that job.

    */                
    register_machines();



   /* if we have KVM defined & kvm is supported; Let us initialise KVM */
   #if USE_KVM


   kvm_qemu_init() {



           /* kvm_init() mallocs memory for the kvm_context_t structure. This
            * structure holds important data like below:
            *          struct kvm_context {
            *             // File descriptor to /dev/kvm
            *             int fd;
            *
            *             // File descriptor to the virtual machine
            *             int vm_fd;
            *
            *             // File descriptors, one per vcpu
            *             int vcpu_fd[MAX_VCPUS];
            *
            *             struct kvm_run *run[MAX_VCPUS];
            *
            *             // Callbacks that KVM uses to emulate various
            *             // unvirtualizable functionality
            *             struct kvm_callbacks *callbacks;
            *          }
            *
            *  The /dev/kvm device file is opened in read-write mode.
            *  The /dev/kvm device node is created upon insmod of the kvm kernel module.
            *  The file handle (of /dev/kvm) opened in the last step is
            *  assigned to the fd member of kvm_context_t. kvm_context_t has got an
            *  interesting member called "callbacks" (which is of type
            *  struct kvm_callbacks).
            *  This structure holds pointers to various functions that KVM will
            *  call when it encounters something that cannot be virtualised,
            *  such as accessing hardware devices via MMIO or regular IO.
            *  This structure contains routines like (*inb)(), (*outb)(),
            *  (*mmio_read)(), (*halt)(), (*io_window)(),
            *  (*try_push_interrupts)(), (*post_kvm_run)(), (*pre_kvm_run)().
            *  This structure is statically initialized and passed to kvm_init(),
            *  and the same is assigned to the kvm->callbacks member.
            */

            kvm_init( ) {
               kvm_context_t kvm;
               fd = open("/dev/kvm", O_RDWR);
               kvm->fd = fd;
               kvm->vm_fd = -1;

               kvm->callbacks = callbacks;
               kvm->opaque = opaque;
            }



            /* chapter 3 is dedicated to talk about kvm_callbacks. please refer 
             * that chapter for more info.
             */



             /* We got handle to /dev/kvm in previous code snippet. we can 

              * create  a VM thru ioctl call to this file handle.  There will be 
              * a file handle for each vm created. The same is stored in vm_fd 
              * member of kvm_context
              */


              kvm_qemu_create_context() {
                 kvm_create() {
                      kvm_create_vm() {
                        /* we will explain more on VM creation and running 

                        * in next chapter. As you can see VM creation is thru
                        * an ioctl call to /dev/kvm.
                        */
                        fd = ioctl(kvm_context->fd, KVM_CREATE_VM, 0);
                        kvm_context->vm_fd = fd;
                      }
               

              }
     }                
                          
    
     /* We can specify "how much memory a VM machine should use" as
      * a parameter to the qemu executable. The same is allocated and stored in
      * phys_ram_base. I did not plan to explore more on the ram
      * allocation. Is anybody out there to help me? I will add it then.
      */
      phys_ram_base = qemu_alloc_physram(phys_ram_size);



     /* QEMUMachine structure has a function called .init() . 
      * This is different for each machine type.
      *
      *  Lets take the case of pc_machine.
      *  QEMUMachine pc_machine = {
      *                        .name = "pc",
      *                        .desc = "Standard PC",
      *                        .init = pc_init_pci,
      *                         .ram_require = VGA_RAM_SIZE + PC_MAX_BIOS_SIZE,
      *   };
      *  .init() routine is the one who actually loads os , bootloader etc. 
      *   
      */  


      machine->init(ram_size, vga_ram_size, boot_devices, ds,
                  kernel_filename, kernel_cmdline, initrd_filename, cpu_model)
{


            pc_init_pci() {
                    pc_init1() {
                       
                        /* allocate memory and register */
                        ram_addr = qemu_ram_alloc(0xa0000);
                        cpu_register_physical_memory(0, 0xa0000, ram_addr)
{

                              kvm_register_phys_mem() {
                                    /* KVM_SET_USER_MEMORY_REGION is covered
                                     * in a little more detail
                                     * in chapter 4
                                     */
                                  r = ioctl(kvm->vm_fd, 

                                            KVM_SET_USER_MEMORY_REGION, &memory);
                                }


                       
                        }

                    }

            }

            /* Load a linux kernel image if we have specified qemu to load
             * a linux image as a command line parameter to qemu
             */

            load_linux(kernel_filename, initrd_filename, kernel_cmdline); 

      }  

      /* sleep for events */               
    main_loop() {
        

        kvm_main_loop() {
              while (1) {
                    main_loop_wait(1000);
              }   
           
            }

     }   


}

                      

            KVM - The Linux Kernel Virtual Machine

                                                          

Introduction

It is always a pleasure to find the stuff in a Google search when you are looking for something. Yes, of course, Google searching is a skill that you develop over years of experience. We were assigned some task on virtualization by our manager and could not find good info on the web to get a good understanding of KVM. Or we can blame the Google search engine for not helping me. I could not find good books on KVM. So as a last resort, we started looking at the code.

Suddenly it popped up in my mind to create a blog on KVM. Haha...it will be done by somebody if not me. I admit that I am a beginner. I request all the experts and readers to correct my mistakes and send me suggestions to improve this blog. You can reach us at ratheesh.ksz@gmail.com or my colleague John Thomas jothoma4@cisco.com

I have divided the KVM documentation into parts. I only have chapter headings as of today. No contents. Hope that I can complete all of them with all your support and suggestions. I may publish chapters in any order. Please bear with me as I am employed by a software company and have to do a lot of work there to earn my salary.


 Chapter 1: This is about QEMU
 Chapter 2: vCPU  creation
 Chapter 3: kvm_callbacks
 Chapter 4: vmx.c
 Chapter 5: MMU.
 Chapter 6: emulation of a device.
 Chapter 7: Tuning ?
 Chapter 8: Hacks.

 Chapter 9: KVM migration

I will concentrate more on KVM than QEMU. Please write to me if you need info there as well. I will try (*conditions apply) to include them.