GSoC'20 final report | Chapel
Work submission for GSoC'20 with Chapel Programming Language
Student | Aniket Mathur |
Project | Protocol Buffers Integration |
Mentors | Audrey Pratt, Lydia Duncan, Michael Ferguson |
Link to proposal | ProtocolBuffersIntegration |
What is Chapel?
Chapel is a modern programming language designed for productive parallel computing at scale. Chapel’s design and implementation have been undertaken with portability in mind, permitting Chapel to run on multicore desktops and laptops, commodity clusters, and the cloud, in addition to the high-end supercomputers for which it was originally undertaken.
What are Protocol Buffers?
Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data. It is useful in developing programs to communicate with each other over a wire or for storing data. The method involves an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data. Please see the official documentation of protobuf to learn more.
The Project
The aim of the project was to develop a Chapel plugin for protoc, the protocol buffer compiler, and a user support runtime module.
This project made substantial progress in developing the user-facing features and documenting them. Please see the user guide for the tool to understand it from a user's point of view. This page will discuss how these features were developed over the summer.
Development took place in a personal repository (Aniket21mathur/Chapel-protobuf), and the work was integrated into the main Chapel organisation repository (chapel-lang/chapel) through pull requests. The user-facing API names were reviewed through issues in the main Chapel repository.
Link to pull requests to the main repository -
- Protocol Buffers Integration
- Generated code guide & Make Check
- Add support for Any message type, Maps and Oneofs
The implementation details are discussed below.
Link to issues for API review -
Tasks and related merged commits -
The progress in the project is monitored through a task-checklist.
Write basic plugin modules in C++ for generating Chapel code.
The aim of this task was to write a basic working plugin using the APIs of the protobuf compiler. Once the plugin binary is built, the protoc compiler should accept the --chpl_out language flag and compile a proto file successfully.
$ protoc --chpl_out=. a.proto
Write scripts that set up the project.
The aim of this task was to write a shell script that generates and installs the protoc-gen-chpl binary (the conventional protobuf plugin binary name) on the system, and that also provides the option to generate a Chapel file from a given proto file using the installed plugin.
NOTE
This script is utilised by the development repo.
After integrating protobuf into the main repo as a tool, the binary is installed with the help of a Makefile. Simply run make protoc-gen-chpl in CHPL_HOME.
Pull Request
Merge Commit
Set up a testing architecture for the project.
Two types of tests were used:
- End to end: These test the correctness of the serialized stream generated by the Chapel module by reading it in Python. Deserialization is tested by reading a stream generated from Python.
- General method tests: These ensure that the individual methods of the protobuf support module work correctly.
This task involved writing a shell script to automate the above tests.
Pull Request
Merge Commit
Add support for serialization of int32 and int64 type fields packed in key-value message stream.
This task started with simple integer fields, serializing them to a key-value message byte stream in which each key is derived from the wire type and the field number of the field.
The available wire types for proto3 are as follows:
Type | Meaning | Used For |
---|---|---|
0 | Varint | int32, int64, uint32, uint64, sint32, sint64, bool, enum |
1 | 64-bit | fixed64, sfixed64, double |
2 | Length-delimited | string, bytes, embedded messages, packed repeated fields |
5 | 32-bit | fixed32, sfixed32, float |
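As a rough illustration of the key derivation described above, the following Python sketch encodes an integer field the way the protobuf wire format specifies: the key is `(field_number << 3) | wire_type`, and both key and value are written as base-128 varints. The function names are hypothetical and are not the API of the Chapel module.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("varint encoding expects a non-negative integer")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow, set continuation bit
        else:
            out.append(low7)         # last byte, continuation bit clear
            return bytes(out)


def make_key(field_number: int, wire_type: int) -> int:
    """Combine field number and wire type into the key written before a value."""
    return (field_number << 3) | wire_type


def encode_int_field(field_number: int, value: int) -> bytes:
    """Serialize an int32/int64 field (wire type 0, varint)."""
    # Negative int32/int64 values are written as 64-bit two's complement.
    value &= 0xFFFFFFFFFFFFFFFF
    return encode_varint(make_key(field_number, 0)) + encode_varint(value)


print(encode_int_field(1, 150).hex())  # 089601
```

Note that a negative value always occupies ten bytes with this encoding, which is why the sint32 and sint64 types exist.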
Add channel implementation for serialization methods and support for other proto field types.
In this task, the intermediate bytes values were replaced by binary channels, so the serialized stream is now written directly to the channel. This task also involved adding support for other types: DOUBLE, FLOAT, UINT64, FIXED64, FIXED32, BOOL, STRING, BYTES, UINT32, SFIXED32, SFIXED64, SINT32 and SINT64.
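The types listed above differ mainly in how the value part is laid out on the wire. A minimal Python sketch, assuming only the rules of the protobuf wire format (the helper names are hypothetical and unrelated to the Chapel module's own functions): SINT32 and SINT64 use ZigZag encoding before the varint step, while the fixed-width and floating-point types are written as little-endian bytes.

```python
import struct


def zigzag_encode(n: int, bits: int = 64) -> int:
    """Map signed integers to unsigned ones so small magnitudes stay small."""
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def encode_fixed32(value: int) -> bytes:
    """FIXED32: four little-endian bytes (wire type 5)."""
    return struct.pack("<I", value)


def encode_sfixed64(value: int) -> bytes:
    """SFIXED64: eight little-endian bytes, signed (wire type 1)."""
    return struct.pack("<q", value)


def encode_double(value: float) -> bytes:
    """DOUBLE: IEEE 754 binary64, little-endian (wire type 1)."""
    return struct.pack("<d", value)


print([zigzag_encode(n) for n in (0, -1, 1, -2)])  # [0, 1, 2, 3]
```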
Pull Request
Merge Commit
Work on how to handle the package specifier in Chapel.
An optional package specifier can be added to a .proto file to prevent name clashes between protocol message types. A proto package name, if specified, is used as the module/file name for the generated Chapel file; otherwise the proto file name is used as a default.
Pull Request
Merge Commit
Handle unknown fields (backward and forward compatibility).
When an old binary parses data sent by a new binary with new fields, those new fields become unknown fields in the old binary. In proto3 versions earlier than 3.5 the unknown fields are discarded, but in 3.5 and later unknown fields are retained during parsing and included in the serialized output.
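One way to picture the retention behaviour is the sketch below, written in Python rather than Chapel and not taken from the module itself. It walks a serialized stream, uses the wire type in each key to find where the field ends, and keeps the raw bytes of every field whose number is not in a hypothetical `known_fields` set, so they can be appended unchanged when the message is serialized again.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_known_unknown(buf: bytes, known_fields):
    """Return ({field number: [raw field bytes]}, raw bytes of unknown fields)."""
    known = {}
    unknown = bytearray()
    pos = 0
    while pos < len(buf):
        start = pos
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:            # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:          # 64-bit
            pos += 8
        elif wire_type == 2:          # length-delimited
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:          # 32-bit
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        raw = buf[start:pos]
        if field_number in known_fields:
            known.setdefault(field_number, []).append(raw)
        else:
            unknown += raw            # retained verbatim for re-serialization
    return known, bytes(unknown)


# Field 1 (varint 150) is known to the old binary; field 2 (string "hi") is not.
stream = b"\x08\x96\x01" + b"\x12\x02hi"
known, unknown = split_known_unknown(stream, {1})
print(unknown)  # b'\x12\x02hi'
```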
Pull Request
Merge Commit
Add support for repeated fields.
This task involved implementing repeated fields. These fields can be repeated any number of times (including zero), and the order of the repeated values is preserved. In proto3, repeated fields of scalar numeric types use packed encoding by default. Empty repeated fields are not serialized to the wire.
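To make the packed layout concrete, here is a small Python sketch based on the protobuf wire format, with hypothetical function names. A packed repeated field is written as a single length-delimited record (wire type 2) whose payload is the concatenation of the encoded elements, and an empty list produces no bytes at all.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_packed_int32(field_number: int, values) -> bytes:
    """Serialize a repeated int32 field using packed encoding."""
    if not values:
        return b""  # empty repeated fields are not written to the wire
    # Negative values are written as 64-bit two's complement varints.
    payload = b"".join(encode_varint(v & 0xFFFFFFFFFFFFFFFF) for v in values)
    key = (field_number << 3) | 2  # wire type 2: length-delimited
    return encode_varint(key) + encode_varint(len(payload)) + payload


print(encode_packed_int32(4, [3, 270, 86942]).hex())  # 2206038e029ea705
```

The element order in the payload is the order of the list, which is how the ordering guarantee mentioned above is preserved.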
Pull Request
Merge Commit
Add support for enums.
This task involved generating Chapel code corresponding to proto enums and simple/repeated enum fields.
Pull Request
Merge Commit
Add support for using message types as fields.
Other message types can also be used as fields. For example:
message SearchResponse {
repeated Result results = 1;
}
message Result {
string url = 1;
string title = 2;
repeated string snippets = 3;
}
This task aimed at adding support for this type of field.
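On the wire, a message used as a field is first serialized on its own and then embedded as a length-delimited value. The Python sketch below illustrates this for the SearchResponse and Result messages above; it follows the protobuf wire format, uses hypothetical helper names, and omits the snippets field for brevity.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Write key (wire type 2), payload length, then the payload itself."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


def encode_result(url: str, title: str) -> bytes:
    """Serialize a Result message (url = 1, title = 2)."""
    return (encode_length_delimited(1, url.encode("utf-8"))
            + encode_length_delimited(2, title.encode("utf-8")))


def encode_search_response(results) -> bytes:
    """Serialize a SearchResponse: each Result is embedded under field 1."""
    return b"".join(encode_length_delimited(1, r) for r in results)


result = encode_result("a", "b")
print(encode_search_response([result]).hex())  # 0a060a01611201 62
```

Because message fields cannot be packed, each repeated Result gets its own key and length prefix.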
Pull Request
Merge Commit
Add support for nested types.
You can define and use message types and enum types inside other message types. The PR linked to this task adds support for that.
Pull Request
Merge Commit
Add support for the Any message type.
The Any message type lets you use messages as embedded types without having their .proto definition. An Any contains an arbitrary serialized message as bytes, along with a URL that acts as a globally unique identifier for, and resolves to, that message's type.
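In the standard definition, Any is itself an ordinary message with two fields: type_url (string, field 1) and value (bytes, field 2). The Python sketch below packs an already serialized message into that layout. The function names and the demo.Result type name are hypothetical, and the type.googleapis.com prefix is the conventional default rather than something specific to the Chapel module.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Write key (wire type 2), payload length, then the payload itself."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


def pack_any(full_type_name: str, serialized_message: bytes) -> bytes:
    """Serialize an Any: type_url in field 1, the message bytes in field 2."""
    type_url = "type.googleapis.com/" + full_type_name
    return (encode_length_delimited(1, type_url.encode("utf-8"))
            + encode_length_delimited(2, serialized_message))


# Hypothetical inner message: a Result with url = "a", already serialized.
inner = b"\x0a\x01a"
packed = pack_any("demo.Result", inner)
```

A receiver reads the type_url first and uses it to decide which message type should parse the bytes in the value field.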
Pull Request
Merge Commit
Add support for Oneof.
If a message has many fields of which at most one will be set at any given time, the oneof feature can be used to save memory.
Pull Request
Merge Commit
Add support for Map.
This task added support for map type fields.
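In the protobuf wire format, a map field is equivalent to a repeated field of small entry messages, each holding the key in field 1 and the value in field 2. A minimal Python sketch for a map<string, int32> field, with hypothetical function names that are not part of the Chapel module, and assuming non-negative values for brevity:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Write key (wire type 2), payload length, then the payload itself."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


def encode_map_string_int32(field_number: int, mapping: dict) -> bytes:
    """Serialize map<string, int32> as one entry message per key/value pair."""
    out = bytearray()
    for k, v in mapping.items():
        entry = (encode_length_delimited(1, k.encode("utf-8"))   # key = 1
                 + encode_varint((2 << 3) | 0) + encode_varint(v))  # value = 2
        out += encode_length_delimited(field_number, entry)
    return bytes(out)


print(encode_map_string_int32(7, {"a": 1}).hex())  # 3a050a01611001
```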
Pull Request
Merge Commit
Add documentation for the project.
The documentation mainly comprises two guides:
Apart from the mentioned PRs, updates are made through commits when new features are added.
Miscellaneous fixes -
Remove mason support from the project
https://github.com/Aniket21mathur/Chapel-protobuf/commit/a5ebcd1887acfe115b9768f1e7fb6f8f0d693fef
Remove explicit declaration of get/set functions
https://github.com/Aniket21mathur/Chapel-protobuf/commit/c6c4387460d51afa2a4828bb32b9c9d66ad6f809
Avoid serialization of scalar types with default values
https://github.com/Aniket21mathur/Chapel-protobuf/commit/b64db2170f66ecf3c26849faa5f54d01bf822544
Possible future improvements -
- Add reflection support to the module. (Related issue)
- Support JSON encoding.