
bytecode

This should be an in-depth but terse guide to Java, with examples to really get a grip on the internals.

hello

HelloWorld.java :

hello world
public class HelloWorld
{
    public static void main(String []s) {
        int x = 11;
        System.out.println("Jo!");
        System.out.println(x);
    }
}

To run this program, the main method has to be static. The class still compiles without static, but then there is no entry point and java refuses to run it :

Error: Main method is not static in class HelloWorld, please define the main method as: public static void main(String[] args)

This is logical: no HelloWorld object has been instantiated to call main on, and a static method can be called without one.

Let's take HelloWorld apart.

javap

Running it prints "Jo!" and 11, of course, but there is a way to make things more transparent : javap, the Java class file disassembler. Without options it only prints the class declaration and the signatures of its members :

javap
javap HelloWorld.class :
output
Compiled from "HelloWorld.java"
public class HelloWorld {
    public HelloWorld();
    public static void main(java.lang.String[]);
}

The interesting part is the implicit no-argument constructor that the compiler added.
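The same thing can be observed from inside Java with reflection. This is only a small sketch: the names ImplicitCtorDemo and Empty are made up for the illustration, and Empty is a nested class that declares no constructor at all.

```java
import java.lang.reflect.Constructor;

class ImplicitCtorDemo {
    // A class that declares no constructor in its source code.
    static class Empty { }

    // Number of constructors present in the compiled class.
    static int constructorCount() {
        return Empty.class.getDeclaredConstructors().length;
    }

    public static void main(String[] args) {
        // Print every constructor the compiler put into Empty.
        for (Constructor<?> c : Empty.class.getDeclaredConstructors()) {
            System.out.println(c);
        }
        System.out.println("constructors: " + constructorCount());
    }
}
```

The count is 1 even though the source declares none, which is the same constructor javap lists.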

But you can go further and print the bytecode with -c :

decompile class file
javap -c HelloWorld.class
output
Compiled from "HelloWorld.java"
public class HelloWorld {
  public HelloWorld();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: bipush        11
       2: istore_1
       3: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       6: ldc           #3                  // String Jo!
       8: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      11: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
      14: iload_1
      15: invokevirtual #5                  // Method java/io/PrintStream.println:(I)V
      18: return
}

For more info : visit

And visit

Each method header is followed by its code. The number in front of an instruction is its index in the bytecode array that makes up the method, in other words the byte offset of that opcode. It rises faster after instructions that carry operands, such as invocations.
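The offsets in the listing of main can be recomputed from the instruction sizes: every opcode is one byte, followed by its operand bytes. The sketch below hard-codes the operand sizes of the instructions used here (one byte for bipush and ldc, a two-byte constant pool index for getstatic and invokevirtual, none for the rest). The class name OffsetDemo is made up.

```java
class OffsetDemo {
    // Instructions of main() in listing order.
    static final String[] NAMES = {
        "bipush", "istore_1", "getstatic", "ldc", "invokevirtual",
        "getstatic", "iload_1", "invokevirtual", "return"
    };
    // Number of operand bytes that follow each opcode byte.
    static final int[] OPERAND_BYTES = { 1, 0, 2, 1, 2, 2, 0, 2, 0 };

    // Byte offset of every instruction within the method's code array.
    static int[] offsets() {
        int[] result = new int[NAMES.length];
        int offset = 0;
        for (int i = 0; i < NAMES.length; i++) {
            result[i] = offset;
            offset += 1 + OPERAND_BYTES[i]; // opcode byte plus operands
        }
        return result;
    }

    public static void main(String[] args) {
        int[] offsets = offsets();
        for (int i = 0; i < NAMES.length; i++) {
            System.out.println(offsets[i] + ": " + NAMES[i]);
        }
    }
}
```

The printed offsets are 0, 2, 3, 6, 8, 11, 14, 15 and 18, the same numbers javap shows.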

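  • aload_0 pushes local variable 0 onto the stack; in a constructor or instance method that is "this".
  • invokespecial invokes instance initialization methods (constructors), as well as private and superclass methods. The comment shows which one: here it is the constructor of java.lang.Object, the mother of all objects.
  • return returns from the method; this form returns void.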

The main method does not start with aload_0, since it is a static method and does not have a this.

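  • getstatic reads a static field, here the field out of the class java.lang.System (java.lang is imported by default), and pushes its value, a reference to a PrintStream, onto the stack.
  • ldc pushes a constant from the constant pool onto the stack, such as an int, a float or a String.
  • invokevirtual invokes an instance method with virtual dispatch: #4 is PrintStream.println(String), called on the PrintStream reference that getstatic pushed.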
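To see how the operand stack and the local variables work together, the instructions of main can be replayed by hand. This is a rough sketch and not how the JVM is implemented: the class StackDemo is made up, a StringBuilder stands in for System.out, and each invokevirtual is imitated by popping the argument and then the receiver.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackDemo {
    // Replays main() step by step and returns what it would have printed.
    static String run() {
        Deque<Object> stack = new ArrayDeque<>();  // operand stack
        Object[] locals = new Object[2];           // slot 0: args, slot 1: x
        StringBuilder out = new StringBuilder();   // stand-in for System.out

        stack.push(11);                            // bipush 11
        locals[1] = stack.pop();                   // istore_1

        stack.push(out);                           // getstatic #2
        stack.push("Jo!");                         // ldc #3
        println(stack);                            // invokevirtual #4

        stack.push(out);                           // getstatic #2
        stack.push(locals[1]);                     // iload_1
        println(stack);                            // invokevirtual #5

        if (!stack.isEmpty()) {                    // return: nothing may be left over
            throw new IllegalStateException("stack not empty");
        }
        return out.toString();
    }

    // Imitates invokevirtual println: pops the argument, then the receiver.
    private static void println(Deque<Object> stack) {
        Object argument = stack.pop();
        StringBuilder receiver = (StringBuilder) stack.pop();
        receiver.append(argument).append('\n');
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```

Note that the receiver is pushed before the argument, which is why getstatic comes first in the listing.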

byte level

Even lower down, you can look at the bytes of the .class file itself.

As a quick start, you could represent the whole file with this C-like struct (the arrays have variable length, so it is not real C) :

c version
struct Class_File_Format {
    u4 magic_number;

    u2 minor_version;   
    u2 major_version;

    u2 constant_pool_count;   

    cp_info constant_pool[constant_pool_count - 1];

    u2 access_flags;

    u2 this_class;
    u2 super_class;

    u2 interfaces_count;   

    u2 interfaces[interfaces_count];

    u2 fields_count;   
    field_info fields[fields_count];

    u2 methods_count;
    method_info methods[methods_count];

    u2 attributes_count;   
    attribute_info attributes[attributes_count];
}

Here is a complete breakdown of HelloWorld.class :

helloworld.class binary
00000000: cafe babe 0000 0034 0020 0a00 0700 1009  .......4. ......

This is the first line, which always starts with the magic number 0xcafebabe.

offset  size  description
0       4     the magic number 0xcafebabe
4       2     minor version of the class file format
6       2     major version of the class file format
Bytes 4-7 hold the class file version: minor version 0 and major version 0x34. The major version is a sequential number tied to a specific Java release.

See visit, or more detailed : visit

In this case it is 0x34 (52), which is Java SE 8; with minor version 0 that makes it Java 8.0 code, which matches the compiler used :
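The header fields can be read back with a few lines of Java. The sketch below does not open a file: it uses the first eight bytes of the hex dump as a literal array, and the class name HeaderDemo is made up. The "major minus 44" rule in javaRelease is a convention that holds for class files from Java 5 on; it is an addition here, not something taken from the dump.

```java
import java.nio.ByteBuffer;

class HeaderDemo {
    // First eight bytes of HelloWorld.class, copied from the hex dump.
    static final byte[] HEADER = {
        (byte) 0xca, (byte) 0xfe, (byte) 0xba, (byte) 0xbe,
        0x00, 0x00, 0x00, 0x34
    };

    // Returns { magic, minor_version, major_version }.
    static int[] parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);   // big-endian by default
        int magic = buf.getInt();                  // u4
        int minor = buf.getShort() & 0xFFFF;       // u2, read as unsigned
        int major = buf.getShort() & 0xFFFF;       // u2, read as unsigned
        return new int[] { magic, minor, major };
    }

    // Assumed mapping for Java 5 and later: 49 -> 5, 52 -> 8, 55 -> 11, ...
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        int[] h = parse(HEADER);
        System.out.printf("magic=0x%08x minor=%d major=%d (Java %d)%n",
                h[0], h[1], h[2], javaRelease(h[2]));
    }
}
```

To run it against a real class file, the array could be replaced by the bytes read with java.nio.file.Files.readAllBytes.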

version
javac -version
javac 1.8.0_212

Next comes the constant pool table :

offset  size    description
8       2       number of entries in the constant pool (sort of)
10      cpsize  the constant pool itself, which is of variable size

So 10 is the last fixed offset: we have to read through the constant pool to find out how big it is. The count field says 0x20, so 32. But there are not actually 32 entries, since the entries are numbered from 1 (not 0), which leaves slots 1 to 31, and long and double constants take up two slots each. In general the number of slots is the count minus 1.

Each entry now has this format:

entry
cp_info {
    u1 tag;
    u1 info[];
}

Since info contains different information for each tag type, it is here represented as a byte array.

In table form :

offset  size  description
0       1     tag indicating the entry type
1       var   tag-specific information
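Because the size of info depends on the tag, the only way to find the end of the constant pool is to walk it entry by entry. The sketch below shows the idea, using the tag values and sizes from the JVM specification. The class PoolWalkDemo and the tiny SAMPLE_POOL are made up for the illustration; they are not the pool of HelloWorld.class.

```java
import java.nio.ByteBuffer;

class PoolWalkDemo {
    // Hypothetical pool for a constant_pool_count of 4.
    static final byte[] SAMPLE_POOL = {
        5, 0, 0, 0, 0, 0, 0, 0, 42,   // #1 Long 42, also occupies slot #2
        1, 0, 3, 'J', 'o', '!'        // #3 Utf8 "Jo!"
    };

    // Size in bytes of info[] for the fixed-size tags.
    static int infoSize(int tag) {
        switch (tag) {
            case 7: case 8: case 16: case 19: case 20:
                return 2;   // Class, String, MethodType, Module, Package
            case 15:
                return 3;   // MethodHandle
            case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                return 4;   // Integer, Float, the refs, NameAndType, (Invoke)Dynamic
            case 5: case 6:
                return 8;   // Long, Double
            default:
                throw new IllegalArgumentException("unknown tag " + tag);
        }
    }

    // Walks the pool from buf's current position and counts the real entries.
    static int countEntries(ByteBuffer buf, int constantPoolCount) {
        int entries = 0;
        int slot = 1;                                  // slot 0 is never stored
        while (slot < constantPoolCount) {
            int tag = buf.get() & 0xFF;
            if (tag == 1) {                            // Utf8: u2 length, then bytes
                int length = buf.getShort() & 0xFFFF;
                buf.position(buf.position() + length);
            } else {
                buf.position(buf.position() + infoSize(tag));
            }
            slot += (tag == 5 || tag == 6) ? 2 : 1;    // long/double use two slots
            entries++;
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(countEntries(ByteBuffer.wrap(SAMPLE_POOL), 4));
    }
}
```

With a count of 4 the sample has three slots but only two real entries, which is the "sort of" in the table above.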

In our example the first tag is 0x0a, which is 10 : CONSTANT_Methodref.

CONSTANT
CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}

So filled with the data it will be :

offset  size  description
0       1     tag indicating the entry type: 0a
1       2     class index: 0007
3       2     name/type index: 0010
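The same values fall out when the first line of the hex dump is decoded in code. A minimal sketch, with the 16 bytes of that line as a literal array and a made-up class name FirstEntryDemo:

```java
import java.nio.ByteBuffer;

class FirstEntryDemo {
    // The 16 bytes of the first hex dump line.
    static final byte[] FIRST_LINE = {
        (byte) 0xca, (byte) 0xfe, (byte) 0xba, (byte) 0xbe,
        0x00, 0x00, 0x00, 0x34,
        0x00, 0x20, 0x0a, 0x00, 0x07, 0x00, 0x10, 0x09
    };

    // Returns { constant_pool_count, tag, class_index, name_and_type_index }.
    static int[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);   // big-endian by default
        buf.position(8);                           // skip magic and version
        int count = buf.getShort() & 0xFFFF;       // offset 8
        int tag = buf.get() & 0xFF;                // offset 10: first entry
        int classIndex = buf.getShort() & 0xFFFF;
        int nameAndTypeIndex = buf.getShort() & 0xFFFF;
        return new int[] { count, tag, classIndex, nameAndTypeIndex };
    }

    public static void main(String[] args) {
        int[] e = decode(FIRST_LINE);
        System.out.printf("count=%d tag=%d class=#%d name_and_type=#%d%n",
                e[0], e[1], e[2], e[3]);
    }
}
```

So entry #1 is a Methodref pointing at class entry #7 and name/type entry #16. The last byte of the line, 09, is already the tag of entry #2.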

The rest of the entries would be :

under construction, i just got this far