Why not just use XML?
Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:
* are simpler
* are 3 to 10 times smaller
* are 20 to 100 times faster
* are less ambiguous
* generate data access classes that are easier to use programmatically
For example, let's say you want to model a person with a name and an email. In XML, you would write:
<person>
  <name>John Doe</name>
  <email>jdoe@example.com</email>
</person>
while the corresponding protocol buffer message (in protocol buffer text format) is:
# Textual representation of a protocol buffer.
# This is *not* the binary format used on the wire.
person {
  name: "John Doe"
  email: "jdoe@example.com"
}
When this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.
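The byte counts above can be checked by hand. The following is a minimal sketch, not the protocol buffer library itself: it assumes the field numbers name = 1 and email = 2 (hypothetical, since the message definition is not shown here), assumes the email is the 16-character address jdoe@example.com, and handles only short string fields whose tag and length each fit in one byte. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends one length-delimited (wire type 2) string field to `out`.
// Simplification: field numbers below 16 and payloads shorter than 128 bytes,
// so the tag and the length each fit in a single varint byte.
void AppendStringField(std::string& out, std::uint32_t field_number,
                       const std::string& value) {
  assert(field_number < 16 && value.size() < 128);
  out.push_back(static_cast<char>((field_number << 3) | 2));  // tag byte
  out.push_back(static_cast<char>(value.size()));             // length byte
  out += value;                                               // raw payload
}

// Assumed (hypothetical) field numbers: name = 1, email = 2.
std::string EncodePerson(const std::string& name, const std::string& email) {
  std::string out;
  AppendStringField(out, 1, name);
  AppendStringField(out, 2, email);
  return out;
}

// The XML form with all optional whitespace removed.
std::string PersonAsXml(const std::string& name, const std::string& email) {
  return "<person><name>" + name + "</name><email>" + email +
         "</email></person>";
}
```

Under these assumptions, each field costs two bytes of overhead plus its payload, so the encoded person is (2 + 8) + (2 + 16) = 28 bytes, while the whitespace-free XML string is 69 bytes. The XML markup accounts for most of that difference, because every element name is spelled out twice.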
Also, manipulating a protocol buffer is much easier:
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;
Whereas with XML you would have to do something like:
cout << "Name: "
     << person.getElementsByTagName("name")->item(0)->innerText()
     << endl;
cout << "E-mail: "
     << person.getElementsByTagName("email")->item(0)->innerText()
     << endl;
However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).
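To illustrate the last point, here is a rough sketch of what a reader without the .proto file can recover from the binary form. It is a deliberately limited decoder, not the library's parser: the function name is hypothetical, and it only understands length-delimited fields whose tag and length fit in one byte each.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Walks a buffer of short length-delimited fields and reports what can be
// known without a message definition: field numbers and raw payloads only.
// Returns "malformed" or "unsupported" for input outside this simplified scope.
std::string DescribeWithoutSchema(const std::string& bytes) {
  std::ostringstream out;
  std::size_t pos = 0;
  while (pos < bytes.size()) {
    if (pos + 2 > bytes.size()) return "malformed";
    const auto tag = static_cast<std::uint8_t>(bytes[pos]);
    const auto length = static_cast<std::uint8_t>(bytes[pos + 1]);
    // Only single-byte tags and lengths with wire type 2 are handled here.
    if (tag >= 0x80 || length >= 0x80 || (tag & 7) != 2) return "unsupported";
    pos += 2;
    if (pos + length > bytes.size()) return "malformed";
    out << "field " << (tag >> 3) << ": \"" << bytes.substr(pos, length)
        << "\"\n";
    pos += length;
  }
  return out.str();
}
```

Given the bytes of a person message with string fields numbered 1 and 2, this reports only "field 1" and "field 2" with their payloads. The names "name" and "email" never appear on the wire, and even treating the payloads as text is a guess, since the same wire type also carries raw bytes and nested messages.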