26 Aug 2020

GSoC'20 final report | Chapel

Work submission for GSoC'20 with Chapel Programming Language

   
Student: Aniket Mathur
Project: Protocol Buffers Integration
Mentors: Audrey Pratt, Lydia Duncan, Michael Ferguson
Link to proposal: ProtocolBuffersIntegration

What is Chapel?

Chapel is a modern programming language designed for productive parallel computing at scale. Chapel’s design and implementation have been undertaken with portability in mind, permitting Chapel to run on multicore desktops and laptops, commodity clusters, and the cloud, in addition to the high-end supercomputers for which it was originally undertaken.

What are Protocol Buffers?

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data. They are useful for developing programs that communicate with each other over a wire, or for storing data. The approach combines an interface description language, which describes the structure of some data, with a program that generates source code from that description for producing or parsing a stream of bytes that represents the structured data. Please see the official protobuf documentation to learn more.

The Project

The aim of the project was to develop a Chapel plugin for protoc, the Protocol Buffers compiler, along with a runtime support module for users.

This project made substantial progress in developing the user-facing features and documenting them. Please see the user guide for the tool to understand it from a user's point of view. This page discusses how these features were developed over the summer.

Development is done in a personal repository (Aniket21mathur/Chapel-protobuf), and the work is integrated into the main Chapel organisation repository (chapel-lang/chapel) through pull requests. User-facing API names are reviewed through issues in the main Chapel repository.

The implementation details are discussed below.

Progress in the project is tracked through a task checklist.

Write basic plugin modules in C++ for generating Chapel code.

The aim of this task was to write a basic working plugin using the APIs of the protobuf compiler. The resulting plugin binary should be invoked successfully by protoc when the --chpl_out flag is passed:

$ protoc --chpl_out=. a.proto

Pull Request
Merge Commit

Write scripts that set up the project.

The aim of this task was to write a shell script that builds and installs the protoc-gen-chpl binary on the system (protoc plugins are conventionally named protoc-gen-<language>), and that also provides an option to generate a Chapel file from a given proto file using the installed plugin.

NOTE: This script is used by the development repository. Now that protobuf support has been integrated into the main repository as a tool, the binary is installed with a Makefile instead: simply run make protoc-gen-chpl in CHPL_HOME.
Pull Request
Merge Commit

Set up a testing architecture for the project.

Two types of tests were used:

  • End-to-end tests: these check the correctness of the serialized stream generated by the Chapel module by reading it in Python. Deserialization is tested by reading a stream generated from Python.

  • General method tests: these ensure that the individual methods of the protobuf support module work correctly.

This task involved writing a shell script to automate the above tests.
Pull Request
Merge Commit

Add support for serializing int32 and int64 fields into a key-value message stream.

This task started with simple integer fields, serializing them into a key-value message byte stream in which each key is derived from the field number and wire type of the field (key = (field_number << 3) | wire_type).

The available wire types for proto3 are as follows:

Type  Meaning           Used for
0     Varint            int32, int64, uint32, uint64, sint32, sint64, bool, enum
1     64-bit            fixed64, sfixed64, double
2     Length-delimited  string, bytes, embedded messages, packed repeated fields
5     32-bit            fixed32, sfixed32, float

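As an illustration of the key derivation above, here is a minimal Python sketch of how an int32/int64 field could be laid out on the wire. It is not the Chapel module's code. The helper names encode_varint and encode_int_field are hypothetical, and the example reproduces the well-known encoding of field 1 set to 150.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Hypothetical helper: serialize an int32/int64 field as key + varint."""
    key = (field_number << 3) | 0  # wire type 0 (varint)
    if value < 0:
        # Negative int32/int64 values are sign-extended to 64 bits,
        # so they always take 10 bytes on the wire.
        value += 1 << 64
    return encode_varint(key) + encode_varint(value)


print(encode_int_field(1, 150).hex())  # 089601
```

The first byte, 0x08, is the key (field 1, wire type 0). The remaining two bytes hold 150 as a varint.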
Pull Request
Merge Commit

Add channel implementation for serialization methods and support for other proto field types.

In this task, raw bytes are replaced by binary channels, so the serialized stream is now written directly to a channel. This task also added support for the remaining scalar types: DOUBLE, FLOAT, UINT64, FIXED64, FIXED32, BOOL, STRING, BYTES, UINT32, SFIXED32, SFIXED64, SINT32 and SINT64.
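The newly supported types differ mainly in how their values are laid out after the key. sint32/sint64 use ZigZag before the varint step. double/fixed64 are 8 little-endian bytes, and float/fixed32 are 4. string/bytes are length-prefixed. The Python sketch below illustrates this. Its helper names are hypothetical and are not taken from the Chapel module.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def zigzag64(n: int) -> int:
    """ZigZag mapping used by sint32/sint64: 0->0, -1->1, 1->2, -2->3, ..."""
    # With Python's arbitrary-precision ints this also gives the sint32 mapping.
    return (n << 1) ^ (n >> 63)


def encode_sint64(field_number: int, value: int) -> bytes:
    return key(field_number, 0) + encode_varint(zigzag64(value))


def encode_double(field_number: int, value: float) -> bytes:
    return key(field_number, 1) + struct.pack("<d", value)  # 8 bytes, little-endian


def encode_float(field_number: int, value: float) -> bytes:
    return key(field_number, 5) + struct.pack("<f", value)  # 4 bytes, little-endian


def encode_string(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    return key(field_number, 2) + encode_varint(len(data)) + data  # length-delimited
```

ZigZag keeps small negative numbers short. sint64 -1 encodes to the single value byte 0x01, whereas int64 -1 needs ten bytes.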
Pull Request
Merge Commit

Work on how to handle the package specifier in Chapel.

An optional package specifier can be added to a .proto file to prevent name clashes between protocol message types. If a package name is specified, it is used as the module/file name of the generated Chapel file; otherwise the proto file name is used as the default.
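The naming rule can be summarised in a few lines. The sketch below is hypothetical Python, not the plugin's C++ code. It assumes the generated file simply takes the module name plus the .chpl extension. The report does not say how dotted package names such as foo.bar are handled, so the sketch leaves that case out.

```python
from pathlib import Path
from typing import Optional, Tuple


def generated_chapel_names(proto_file: str, package: Optional[str]) -> Tuple[str, str]:
    """Hypothetical sketch of the naming rule: return (module name, output file name)."""
    # Prefer the proto package name; otherwise fall back to the proto file's stem.
    module = package if package else Path(proto_file).stem
    return module, module + ".chpl"


print(generated_chapel_names("addressbook.proto", None))   # ('addressbook', 'addressbook.chpl')
print(generated_chapel_names("a.proto", "tutorial"))       # ('tutorial', 'tutorial.chpl')
```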
Pull Request
Merge Commit

Handle unknown fields (backward and forward compatibility).

When an old binary parses data sent by a new binary with new fields, those new fields become unknown fields in the old binary. In protobuf versions before 3.5, proto3 messages discard unknown fields; from 3.5 onward, unknown fields are retained during parsing and included in the serialized output.
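Retaining unknown fields works because every field on the wire is self-describing: the wire type in the key tells a parser how many bytes to skip. The following Python sketch illustrates the idea. It is not the Chapel module's implementation, and all function names are hypothetical. The parser copies the key and value of any field number it does not recognise verbatim, so that the fields can be written back out on re-serialization.

```python
def decode_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def skip_value(buf: bytes, pos: int, wire_type: int) -> int:
    """Return the position just past a value of the given wire type."""
    if wire_type == 0:
        _, pos = decode_varint(buf, pos)
    elif wire_type == 1:
        pos += 8
    elif wire_type == 2:
        length, pos = decode_varint(buf, pos)
        pos += length
    elif wire_type == 5:
        pos += 4
    else:
        raise ValueError(f"unsupported wire type {wire_type}")
    return pos


def parse_keeping_unknowns(buf: bytes, known_fields: set):
    """Split a message into known-field records and verbatim unknown-field bytes."""
    known = {}
    unknown = bytearray()
    pos = 0
    while pos < len(buf):
        start = pos
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        pos = skip_value(buf, pos, wire_type)
        if field_number in known_fields:
            # Raw key+value bytes; a real parser would decode them by field type.
            known[field_number] = buf[start:pos]
        else:
            # Keep the key and value untouched so they can be written back out.
            unknown += buf[start:pos]
    return known, bytes(unknown)


# An "old" reader that only knows field 1 receives a stream that also has field 2.
stream = b"\x08\x96\x01" + b"\x12\x07testing"
known, unknown = parse_keeping_unknowns(stream, {1})
reserialized = b"".join(known.values()) + unknown  # unknown fields are preserved
```

Appending the preserved bytes after the known fields is one simple choice. The original field order is not guaranteed to be kept.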
Pull Request
Merge Commit

Add support for repeated fields.

This task involved implementing repeated fields. These fields can be repeated any number of times (including zero), and the order of the repeated values is preserved. In proto3, repeated fields of scalar numeric types use packed encoding by default. Empty repeated fields are not serialized to the wire.
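Packed encoding writes a single key with wire type 2, followed by the total payload length and then every element's varint back to back. The Python sketch below uses a hypothetical helper, not the Chapel module's API. It reproduces the standard example of field 4 holding [3, 270, 86942].

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_packed_int32(field_number: int, values: list) -> bytes:
    """Hypothetical helper: packed encoding of a repeated int32 field."""
    if not values:
        return b""  # empty repeated fields are not written to the wire
    # Concatenate every element's varint (negatives sign-extended to 64 bits).
    payload = b"".join(encode_varint(v + (1 << 64) if v < 0 else v) for v in values)
    key = (field_number << 3) | 2  # packed fields use wire type 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


print(encode_packed_int32(4, [3, 270, 86942]).hex())  # 2206038e029ea705
```

Compared with writing one key per element, packing saves a key byte for every value after the first.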
Pull Request
Merge Commit

Add support for enums.

This task involved generating Chapel code corresponding to proto enums and simple/repeated enum fields.
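On the wire, an enum field is just a varint holding the numeric value. Proto3 requires the first enumerator to be zero, and that default is not serialized. The Python sketch below shows both points. Corpus is a hypothetical enum, and the helpers are illustrative rather than the generated Chapel code. Because proto3 enums are open, the decoder in this sketch keeps an unrecognised value as a plain integer.

```python
from enum import IntEnum


def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


class Corpus(IntEnum):
    # Hypothetical enum; in proto3 the first value must be 0 (the default).
    UNIVERSAL = 0
    WEB = 1
    IMAGES = 2


def encode_enum_field(field_number: int, value: int) -> bytes:
    if value == 0:
        return b""  # default value is not serialized in proto3
    n = int(value)
    if n < 0:
        n += 1 << 64  # negative enum values are sign-extended like int32
    return encode_varint((field_number << 3) | 0) + encode_varint(n)


def decode_enum(enum_type, raw_value: int):
    try:
        return enum_type(raw_value)
    except ValueError:
        return raw_value  # unknown value: kept as a plain integer in this sketch
```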
Pull Request
Merge Commit

Add support for using message types as fields.

Other message types can also be used as fields. For example:

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

This task added support for fields of this kind.
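An embedded message is serialized on its own and then written as a length-delimited field (wire type 2) of its parent. The Python sketch below encodes the SearchResponse/Result example above that way. The helper names are hypothetical. Each Result is passed as a (url, title, snippets) tuple purely for brevity.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def length_delimited(field_number: int, data: bytes) -> bytes:
    """Key with wire type 2, then the byte length, then the bytes themselves."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def encode_result(url: str, title: str, snippets: list) -> bytes:
    out = b""
    if url:  # empty strings are defaults and are skipped
        out += length_delimited(1, url.encode("utf-8"))
    if title:
        out += length_delimited(2, title.encode("utf-8"))
    for snippet in snippets:  # repeated string: one key/value pair per element
        out += length_delimited(3, snippet.encode("utf-8"))
    return out


def encode_search_response(results: list) -> bytes:
    # Each Result is encoded first, then embedded as length-delimited field 1.
    return b"".join(length_delimited(1, encode_result(*r)) for r in results)
```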
Pull Request
Merge Commit

Add support for nested types.

You can define and use message types and enum types inside other message types. The pull request for this task adds support for such nested definitions.
Pull Request
Merge Commit

Add support for the Any message type.

The Any message type lets you use messages as embedded types. An Any contains an arbitrary serialized message as bytes, along with a URL that acts as a globally unique identifier for and resolves to that message’s type.
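Any is itself an ordinary message with two fields: string type_url = 1 and bytes value = 2. Packing a message into it therefore amounts to writing its type URL and its serialized bytes. The Python sketch below assumes the conventional "type.googleapis.com/" prefix and a hypothetical message name tutorial.Person. The helpers are illustrative and do not reflect the Chapel module's API.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def length_delimited(field_number: int, data: bytes) -> bytes:
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def pack_any(full_name: str, serialized: bytes, prefix: str = "type.googleapis.com/") -> bytes:
    """Serialize an Any wrapping an already-serialized message."""
    type_url = (prefix + full_name).encode("utf-8")
    out = length_delimited(1, type_url)  # Any.type_url (field 1, string)
    if serialized:
        out += length_delimited(2, serialized)  # Any.value (field 2, bytes)
    return out


def type_name_from_url(type_url: str) -> str:
    # The message's full name is the part after the last '/'.
    return type_url.rsplit("/", 1)[-1]
```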
Pull Request
Merge Commit

Add support for the Oneof.

If a message has many fields and at most one of them will be set at the same time, the oneof feature can enforce this behaviour and save memory.
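The memory saving comes from all members of a oneof sharing a single storage slot, so setting one member clears the others. The Python class below sketches those semantics for a hypothetical oneof containing the members name and sub_message. The method names set, get and which_oneof are illustrative, and unlike real protobuf accessors this sketch returns None for an unset member.

```python
class SampleMessage:
    """Hypothetical sketch of a oneof with members 'name' and 'sub_message'."""

    _ONEOF_FIELDS = ("name", "sub_message")

    def __init__(self):
        self._case = None   # which member is currently set, if any
        self._value = None  # one storage slot shared by all members

    def set(self, field: str, value) -> None:
        if field not in self._ONEOF_FIELDS:
            raise AttributeError(field)
        # Overwriting the shared slot implicitly clears the previous member.
        self._case, self._value = field, value

    def get(self, field: str):
        if field not in self._ONEOF_FIELDS:
            raise AttributeError(field)
        return self._value if self._case == field else None

    def which_oneof(self):
        return self._case


m = SampleMessage()
m.set("name", "x")
m.set("sub_message", b"\x08\x01")  # clears "name"
```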
Pull Request
Merge Commit

Add support for Map.

This task adds support for map fields.
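On the wire, a map field is equivalent to a repeated field of entry messages, each with the key as field 1 and the value as field 2. The Python sketch below uses that layout for a map<string, int32>. The helper is hypothetical, and this sketch always writes both the key and the value of each entry.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def length_delimited(field_number: int, data: bytes) -> bytes:
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def encode_map_string_int32(field_number: int, mapping: dict) -> bytes:
    """Hypothetical helper: encode map<string, int32> as repeated entry messages."""
    out = b""
    for k, v in mapping.items():
        entry = length_delimited(1, k.encode("utf-8"))  # entry key: field 1
        entry += encode_varint((2 << 3) | 0)             # entry value: field 2, varint
        entry += encode_varint(v + (1 << 64) if v < 0 else v)
        out += length_delimited(field_number, entry)     # one entry per map item
    return out
```

Map ordering on the wire is not guaranteed, so a reader should not depend on the entry order.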
Pull Request
Merge Commit

Add documentation for the project.

The documentation mainly comprises two guides:

Apart from the PRs mentioned above, the documentation is updated through commits whenever new features are added.

Miscellaneous fixes -

Remove mason support from the project
https://github.com/Aniket21mathur/Chapel-protobuf/commit/a5ebcd1887acfe115b9768f1e7fb6f8f0d693fef

Remove explicit declaration of get/set functions
https://github.com/Aniket21mathur/Chapel-protobuf/commit/c6c4387460d51afa2a4828bb32b9c9d66ad6f809

Avoid serialization of scalar types with default values
https://github.com/Aniket21mathur/Chapel-protobuf/commit/b64db2170f66ecf3c26849faa5f54d01bf822544

Possible future improvements -

  • Add reflection support to the module (related issue).
  • Support JSON encoding.