How to write a (toy) JVM

The article about KVM turned out to be interesting for readers, so today we publish a new translation of Serge Zaitsev's article: how the Java Virtual Machine works under the hood.


Whether we like it or not, Java is one of the most common and widely used programming languages. However, not every Java developer is curious enough to look under the hood and see how the JVM works.



I'll try to write a toy (and incomplete) JVM to show the basic principles of how it works. I hope this article piqued your interest and inspires you to explore the JVM further.



Our humble goal



Let's start simple:



public class Add {
  public static int add(int a, int b) {
    return a + b;
  }
}

      
      





We compile our class with javac Add.java and the result is Add.class . This class file is a binary file that the JVM can execute. All we have to do is create a JVM that will execute it correctly.



If we look inside Add.class with a hex dump, we probably won't be too impressed: Although we don't see a clear structure here yet, we need to find a way to parse it: what does () V and (II) I , < init> and why does the file start with "cafe babe"?



00000000 ca fe ba be 00 00 00 34 00 0f 0a 00 03 00 0c 07 |.......4........|

00000010 00 0d 07 00 0e 01 00 06 3c 69 6e 69 74 3e 01 00 |........<init>..|

00000020 03 28 29 56 01 00 04 43 6f 64 65 01 00 0f 4c 69 |.()V...Code...Li|

00000030 6e 65 4e 75 6d 62 65 72 54 61 62 6c 65 01 00 03 |neNumberTable...|

00000040 61 64 64 01 00 05 28 49 49 29 49 01 00 0a 53 6f |add...(II)I...So|

00000050 75 72 63 65 46 69 6c 65 01 00 08 41 64 64 2e 6a |urceFile...Add.j|

00000060 61 76 61 0c 00 04 00 05 01 00 03 41 64 64 01 00 |ava........Add..|

00000070 10 6a 61 76 61 2f 6c 61 6e 67 2f 4f 62 6a 65 63 |.java/lang/Objec|

00000080 74 00 21 00 02 00 03 00 00 00 00 00 02 00 01 00 |t.!.............|

00000090 04 00 05 00 01 00 06 00 00 00 1d 00 01 00 01 00 |................|

000000a0 00 00 05 2a b7 00 01 b1 00 00 00 01 00 07 00 00 |...*............|

000000b0 00 06 00 01 00 00 00 01 00 09 00 08 00 09 00 01 |................|

000000c0 00 06 00 00 00 1c 00 02 00 02 00 00 00 04 1a 1b |................|

000000d0 60 ac 00 00 00 01 00 07 00 00 00 06 00 01 00 00 |`...............|

000000e0 00 03 00 01 00 0a 00 00 00 02 00 0b |............|












You are probably familiar with another way to unload class files, often it turns out to be more useful: Now we see our class, its constructor and method. Both the constructor and the method contain multiple instructions. It becomes more or less clear what our add () method does: it loads two arguments ( iload_0 and iload_1 ), adds them up and returns the result. The JVM is a stack machine, so there are no registers in it, all instruction arguments are stored on the internal stack, and the results are pushed onto the stack as well.



$ javap -c Add

Compiled from "Add.java"

public class Add {

public Add();

Code:

0: aload_0

1: invokespecial #1 // Method java/lang/Object."<init>":()V

4: return



public static int add(int, int);

Code:

0: iload_0

1: iload_1

2: iadd

3: ireturn

}












Class loader



How do we achieve the same result that the javap program showed? How to parse a class file?



If we look at the JVM specification , we learn about the structure of class files . It always starts with a 4 byte signature (CAFEBABE) followed by 2 + 2 bytes for the version. Sounds simple!



Since we would have to read bytes, short, int and byte sequences from a binary file, let's start our loader implementation as follows:



type loader struct {
	r   io.Reader
	err error
}

func (l *loader) bytes(n int) []byte {
	b := make([]byte, n, n)
        //        ,
        //        
        //    ,    
	if l.err == nil {
		_, l.err = io.ReadFull(l.r, b)
	}
	return b
}
func (l *loader) u1() uint8  { return l.bytes(1)[0] }
func (l *loader) u2() uint16 { return binary.BigEndian.Uint16(l.bytes(2)) }
func (l *loader) u4() uint32 { return binary.BigEndian.Uint32(l.bytes(4)) }
func (l *loader) u8() uint64 { return binary.BigEndian.Uint64(l.bytes(8)) }

// Usage:
f, _ := os.Open("Add.class")
loader := &loader{r: f}
cafebabe := loader.u4()
major := loader.u2()
minor := loader.u2()
      
      





The spec then tells us to parse the constant pool. What is it? This is the name of the special part of the class file that contains the constants required to run the class. All strings, numeric constants and references are stored there, and each has a unique uint16 index (so a class can have up to 64K constants).



There are several types of constants in the pool, each of which contains its own set of values. We may be interested in:



  • UTF8: simple string literal
  • Class: index of the string containing the class name (indirect reference)
  • Name and type: index of the type name and descriptor used for fields and methods
  • Field and method references: indexes related to classes and constants of type name and type.


As you can see, the constants in the pool often refer to each other. Since we are writing a JVM in Go and there is no union types, let's create a Const type that will contain various possible constant fields:



type Const struct {
	Tag              byte
	NameIndex        uint16
	ClassIndex       uint16
	NameAndTypeIndex uint16
	StringIndex      uint16
	DescIndex        uint16
	String           string
}
type ConstPool []Const
      
      





Then, following the JVM spec, we could retrieve the constant pool data like this:



func (l *loader) cpinfo() (constPool ConstPool) {
	constPoolCount := l.u2()
	//       1
	for i := uint16(1); i < constPoolCount; i++ {
		c := Const{Tag: l.u1()}
		switch c.Tag {
		case 0x01: // UTF8 string literal, 2 bytes length + data
			c.String = string(l.bytes(int(l.u2())))
		case 0x07: // Class index
			c.NameIndex = l.u2()
		case 0x08: // String reference index
			c.StringIndex = l.u2()
		case 0x09, 0x0a: // Field and method: class index + NaT index
			c.ClassIndex = l.u2()
			c.NameAndTypeIndex = l.u2()
		case 0x0c: // Name-and-type
			c.NameIndex, c.DescIndex = l.u2(), l.u2()
		default:
			l.err = fmt.Errorf("unsupported tag: %d", c.Tag)
		}
		constPool = append(constPool, c)
	}
	return constPool
}
      
      





Now we are greatly simplifying the task for ourselves, but in a real JVM we would have to treat constant types long and double in the same way, inserting an additional unused constant element, as the JVM specification tells us (since constant elements are considered 32-bit).



To make it easier to get string literals by indices, let's implement the Resolve (index uint16) string method :



func (cp ConstPool) Resolve(index uint16) string {
	if cp[index-1].Tag == 0x01 {
		return cp[index-1].String
	}
	return ""
}
      
      





Now we need to add similar helpers to parse the list of interfaces, fields and methods of classes and their attributes:



func (l *loader) interfaces(cp ConstPool) (interfaces []string) {
	interfaceCount := l.u2()
	for i := uint16(0); i < interfaceCount; i++ {
		interfaces = append(interfaces, cp.Resolve(l.u2()))
	}
	return interfaces
}

//  field      
type Field struct {
	Flags      uint16
	Name       string
	Descriptor string 
	Attributes []Attribute 
}

//        
//   -   Code,     
type Attribute struct {
	Name string
	Data []byte
}

func (l *loader) fields(cp ConstPool) (fields []Field) {
	fieldsCount := l.u2()
	for i := uint16(0); i < fieldsCount; i++ {
		fields = append(fields, Field{
			Flags:      l.u2(),
			Name:       cp.Resolve(l.u2()),
			Descriptor: cp.Resolve(l.u2()),
			Attributes: l.attrs(cp),
		})
	}
	return fields
}

func (l *loader) attrs(cp ConstPool) (attrs []Attribute) {
	attributesCount := l.u2()
	for i := uint16(0); i < attributesCount; i++ {
		attrs = append(attrs, Attribute{
			Name: cp.Resolve(l.u2()),
			Data: l.bytes(int(l.u4())),
		})
	}
	return attrs
}
      
      





Both fields and methods are represented as Fields, which is very convenient and saves time. Finally, we can put it all together and parse our class fully:



type Class struct {
	ConstPool  ConstPool
	Name       string
	Super      string
	Flags      uint16
	Interfaces []string
	Fields     []Field
	Methods    []Field
	Attributes []Attribute
}

func Load(r io.Reader) (Class, error) {
	loader := &loader{r: r}
	c := Class{}
	loader.u8()           // magic (u32), minor (u16), major (u16)
	cp := loader.cpinfo() // const pool info
	c.ConstPool = cp
	c.Flags = loader.u2()             // access flags
	c.Name = cp.Resolve(loader.u2())  // this class
	c.Super = cp.Resolve(loader.u2()) // super class
	c.Interfaces = loader.interfaces(cp)
	c.Fields = loader.fields(cp)    // fields
	c.Methods = loader.fields(cp)   // methods
	c.Attributes = loader.attrs(cp) // methods
	return c, loader.err
}
      
      





Now, if we look at the information about the class, we will see that he does not have fields, two methods - <the init> :() the V and the add: (II of) I of . What are roman numbers with brackets? These are descriptors. They determine what types of arguments a method takes and what it returns. In this case, <init> (the synthetic method used to initialize objects when they are created) takes no arguments and returns nothing (V = void), while the "add" method accepts two data types int (I = int32) and returns an integer.



Bytecode



If we look closely, we realize that each method in our parsed class has one attribute named “Code”. This attribute contains a fraction of the bytes as payload. Bytes: In the spec, this time in the bytecode section , we'll read that the "Code" attribute starts with a maxstack value (2 bytes), then maxlocals (2 bytes), the length of the code (4 bytes) and then the actual code. So, our attributes can be read like this: Yes, we only have 4 and 5 bytes of code in each method. What do these bytes mean?



<init>:

[0 1 0 1 0 0 0 5 42 183 0 1 177 0 0 0 1 0 7 0 0 0 6 0 1 0 0 0 1]

add:

[0 2 0 2 0 0 0 4 26 27 96 172 0 0 0 1 0 7 0 0 0 6 0 1 0 0 0 3]












<init>: maxstack: 1, maxlocals: 1, code: [42 183 0 1 177]

add: maxstack: 2, maxlocals: 2, code: [26 27 96 172]












As I said, the JVM is a stacking machine. Each instruction is encoded as a single byte, which may be followed by additional arguments. Looking at the spec, we can see that the "add" method has the following instructions: Exactly as we saw in the javap output at the beginning! But how to do that?



26 = iload_0

27 = iload_1

96 = iadd

172 = ireturn












JVM frames



When a method is executed inside the JVM, it has its own stack for temporary operands, its own local variables, and a piece of code to execute. All of these parameters are stored in a single execution frame. In addition, frames contain a pointer to the current instruction (how far we have advanced in executing the bytecode) and a pointer to the class that contains the method. The latter is needed to access the pool of class constants, as well as other details.



Let's create a method that creates a frame for a specific method called with the given arguments. I'll use interface {} as the Value type here, although using the correct union types would of course be safer.



type Frame struct {
	Class  Class
	IP     uint32
	Code   []byte
	Locals []interface{}
	Stack  []interface{}
}

func (c Class) Frame(method string, args ...interface{}) Frame {
	for _, m := range c.Methods {
		if m.Name == method {
			for _, a := range m.Attributes {
				if a.Name == "Code" && len(a.Data) > 8 {
					maxLocals := binary.BigEndian.Uint16(a.Data[2:4])
					frame := Frame{
						Class:  c,
						Code:   a.Data[8:],
						Locals: make(
									[]interface{}, 
									maxLocals, 
									maxLocals
								),
					}
					for i := 0; i < len(args); i++ {
						frame.Locals[i] = args[i]
					}
					return frame
				}
			}
		}
	}
	panic("method not found")
}
      
      





So we got a Frame with initialized local variables, empty stack and preloaded bytecode. It's time to execute the bytecode:



func Exec(f Frame) interface{} {
	for {
		op := f.Code[f.IP]
		log.Printf("OP:%02x STACK:%v", op, f.Stack)
		n := len(f.Stack)
		switch op {
		case 26: // iload_0
			f.Stack = append(f.Stack, f.Locals[0])
		case 27: // iload_1
			f.Stack = append(f.Stack, f.Locals[1])
		case 96:
			a := f.Stack[n-1].(int32)
			b := f.Stack[n-2].(int32)
			f.Stack[n-2] = a + b
			f.Stack = f.Stack[:n-1]
		case 172: // ireturn
			v := f.Stack[n-1]
			f.Stack = f.Stack[:n-1]
			return v
		}
		f.IP++
	}
}

      
      





Finally, we can put everything together and run it by calling our add () method:



f, _ := os.Open("Add.class")
class, _ := Load(f)
frame := class.Frame("add", int32(2), int32(3))
result := Exec(frame)
log.Println(result)

// OUTPUT:
OP:1a STACK:[]
OP:1b STACK:[2]
OP:60 STACK:[2 3]
OP:ac STACK:[5]
5
      
      





So everything works. Yes, this is a very weak JVM on minimal, but still it does what the JVM is supposed to do: download the bytecode and interpret it (of course, the real JVM does much more).



What is missing?



The remaining 200 instructions, runtimes, OOP type systems, and a few more things.



There are 11 groups of instructions, most of which are commonplace:



  • Constants (pushes null, small number, or values ​​from the constant pool onto the stack).
  • Loads (pushes local variables onto the stack). Similar instructions 32.
  • Stores ( ). 32 .
  • Stack (pop/dup/swap), .
  • Math (add/sub/div/mul/rem/shift/logic). , 36 .
  • Conversions (int short, int float ..).
  • Comparisons (eq/ne/le/…). , if/else.
  • Control (goto/return). .
  • References. , , .
  • Extended. . , , .
  • Reserved. 0xca.


Most instructions are easy to implement: they take one or two arguments from the stack, perform some operation on them, and send the result. The only thing to keep in mind here is that the instructions for long and double types assume that each value occupies two slots on the stack, so you may need additional push () and pop (), which makes grouping instructions difficult.



To implement References, you need to think about the object model: how you want to store objects and their classes, how to represent inheritance, where to store instance fields and class fields. Also, you have to be careful with method dispatching here - there are several “invoke” statements and they behave differently:



  • invokestatic: call a static method on a class, no surprises.
  • invokespecial: , , <init> .
  • invokevirtual: .
  • invokeinterface: , invokevirtual, .
  • invokedynamic: call site, Java 7, MethodHandles.


If you are building a JVM in a language without garbage collection, then you should consider how to do it: reference counting, mark-and-sweep algorithm, and so on. Handling exceptions by implementing athrow , propagating them through frames, and handling using exception tables is another interesting topic.



Finally, your JVM will be useless if there are no runtime classes. Without java / lang / Object, you can hardly even see how new works instruction when creating new objects. Your runtime may provide some generic JRE classes from the java.lang, java.io and java.util packages, or it could be something more domain specific. Most likely, some methods in classes should be implemented natively, and not in Java. This will raise the question of how to find and execute such methods, and will be another edge case for your JVM.



In other words, building the right JVM isn't easy, but figuring out exactly how it works isn't hard.



For example, it took me only one summer weekend. My JVM still has a long way to go, but the structure looks more or less clear: https://github.com/zserge/tojvm (comments are always welcome!)



The actual code snippets from this article are even smaller and are available as a gist here .



If you want to learn more, you might consider using small JVMs:





I hope this article hasn't discouraged your interest in Java. Virtual machines are fun, and the Java virtual machine really deserves a close look.



I hope you enjoyed my article. You can follow my work on Github or Twitter , and subscribe via rss .



All Articles