Here at droxIT our vision is to provide efficient and universal data processing for everyone. On our path to this vision we chose to develop an analytics software stack - an ecosystem of modules that can be combined flexibly to build custom analytics pipelines with ease.
After initial attempts in that direction we decided to solve the infrastructural challenges of this endeavor first. We are currently developing a microservice framework that aims to cover all our architectural and infrastructural needs. Creating an analytics module with it should be a matter of implementing its logic - nothing more. In this first part of our series we focus on this framework.
Let's start with a short excursion into the benefits of microservices in the context of our goals. Microservices have several advantages over classical monolithic architectures. Because they are small, independent units, each can easily be developed by a single small team; scaling development means adding new services, which can be developed by additional teams. Since microservices should never rely on implementation details of other services but only on their interfaces, development teams as well as individuals are decoupled from each other, reducing the need for coordination. Performance can also be scaled more easily: if a service is under heavy load, more instances can be started. Additionally, microservices are easily testable and work very well with cloud deployment.
There are also advantages that are more specific to our use case: implementing analytics algorithms as services makes it easy to compose custom processing pipelines. Different algorithms and implementations can be benchmarked against each other. An ever-growing portfolio of services provides a construction kit that solves many use cases with little need for custom coding, and instead of buying an overpowered one-size-fits-all solution, customers only pay for the functionality they really need. As a result, the resource demand of the architecture is tuned to the actual tasks that need to be performed.
To enjoy these benefits, though, it is imperative to have solid protocols for inter-service communication and well-defined data serialization that provide generic interfaces between services. This was our motivation for writing our own framework: providing an out-of-the-box, painless microservice experience.
Let's get to the actual topic of this piece: ROXcomposer and ROXconnector.
The ROXcomposer is a development framework and infrastructure solution in one. At its core is a communication protocol that facilitates flexible routing between services and, in principle, supports any programming language. We created a reference implementation in Python - a base class that allows for easy implementation of services. In the simplest case it suffices to implement an on_message method containing the application logic - all ancillary tasks like message transport, logging, configuration interfaces and message tracing are already included.
As an illustration, we provide the traditional "Hello, World!" for the ROXcomposer:
import json
import sys

from roxcomposer import base_service


class MyService(base_service.BaseService):
    def __init__(self, args):
        super().__init__(args)

    def on_message(self, msg):
        # usually we would do something useful with msg and dispatch that
        new_msg = "Hello, World!"
        self.dispatch(new_msg)


# the service parameters are passed as a JSON string on the command line
args = json.loads(sys.argv[1])
ms = MyService(args)
ms.listen()
This code snippet shows how to write a service, but it does not show you how to define pipelines and post messages to them. With the ROXcomposer, all information on the pipeline and the payload is contained in the message, so it suffices to create a message and send it to the first service in the pipeline. To make this task easier we added an API gateway, which provides this and many more features via a REST API. Which leads us to the second product we wanted to discuss:
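To illustrate the idea of a self-routing message - note that this is only a hypothetical sketch, not the actual ROXcomposer wire format - a message could carry its remaining pipeline alongside the payload, with each service popping its own entry before forwarding to the next hop:

```python
# Hypothetical illustration of a self-routing message. Field names and
# structure are assumptions for this sketch, not the ROXcomposer protocol.

def make_message(pipeline, payload):
    # the pipeline is an ordered list of service addresses
    return {"pipeline": list(pipeline), "payload": payload}

def next_hop(msg):
    # each service removes itself from the route and forwards the rest
    if not msg["pipeline"]:
        return None, msg
    hop = msg["pipeline"].pop(0)
    return hop, msg

msg = make_message(["tokenizer", "sentiment", "aggregator"], "some text")
hop, msg = next_hop(msg)
# hop is now "tokenizer"; the remaining route travels with the message
```

Because the route lives in the message itself, no central broker has to know the pipeline - posting to the first service is enough.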
The ROXconnector is a lean and flexible API server written in Node.js. Its endpoints are defined via its configuration file, and it is extensible via plugins.
We've written a control plugin for the ROXcomposer that allows us to start and stop services, define pipelines and send data to these pipelines. We can query information about the system, such as running services, defined pipelines or message traces. In addition, the current setup of the system can be saved to a file and restored in another session.
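As a rough sketch of how such a control API could be driven from a client - the endpoint names and payload fields below are invented for illustration and are not the actual ROXconnector interface - one might assemble JSON request bodies like these:

```python
import json

# Hypothetical request bodies for a gateway control API. The real
# ROXconnector endpoints and field names may differ.
start_service = {
    "name": "my_service",
    "classpath": "my_module.MyService",  # assumed parameter
    "params": {"port": 5001},
}

set_pipeline = {
    "name": "hello_pipeline",
    "services": ["my_service"],  # ordered list of pipeline stages
}

post_to_pipeline = {
    "name": "hello_pipeline",
    "data": "Hello, World!",
}

# each body would be serialized and sent to the gateway via REST
body = json.dumps(post_to_pipeline)
```

The point is that the whole lifecycle - starting services, wiring pipelines, posting data - reduces to a handful of small JSON documents sent over HTTP.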
Finally we included a command line application that wraps the REST calls into a convenient interface.
Here we want to shed some light on the choices we made for this framework.
Python has a strong position in the data science community and provides many successful libraries in this field, e.g. scikit-learn, NLTK, Theano and Pandas. Besides that, Python is an elegant, easy-to-learn language that allows the use of C modules via Cython for performance-critical operations.
We want to reiterate that services can be written in most languages. Whether droxIT will provide reference implementations for other languages is not decided yet.
We did not mention it so far but currently ROXcomposer services are started as simple processes on the host system - they're not containerized. The reason is that this is the simplest scenario that can be used without additional infrastructure. You can simply grab the ROXcomposer package and instantly start developing and deploying locally. Advanced deployment schemes like cloud deployment or dockerization are the next step on our agenda.
Ancillary functionality like logging and message tracing can be injected, which allows you to change it via configuration. It is our goal to provide a complete out-of-the-box solution, but there will always be requirements we don't fulfill or pre-existing infrastructure we don't support.
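A minimal sketch of what such configuration-driven injection can look like - the config layout and names here are invented for illustration and are not the ROXcomposer API:

```python
import importlib
import logging

# Hypothetical: the service configuration names the log handler class
# to inject, so it can be swapped without touching the service code.
config = {
    "logging": {
        "class": "logging.StreamHandler",  # replaceable via configuration
        "level": "INFO",
    }
}

def build_logger(cfg):
    # resolve the handler class from its dotted path at runtime
    module_name, _, class_name = cfg["class"].rpartition(".")
    handler_cls = getattr(importlib.import_module(module_name), class_name)
    logger = logging.getLogger("my_service")
    logger.setLevel(cfg["level"])
    logger.addHandler(handler_cls())
    return logger

logger = build_logger(config["logging"])
```

Swapping the handler for one that ships logs to pre-existing infrastructure then becomes a configuration change rather than a code change.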
We always strive to make the ROXcomposer customizable to your needs.
The ROXcomposer is a work in progress. It is already usable, but you are limited to a single host. The next versions will bring major usability improvements and support for distributed systems: we want to provide an Elastic Stack integration for monitoring purposes, and the CLI will be reworked from the ground up. At the same time we want to begin building our service portfolio.
To give a better impression of the framework we have produced a hands-on video (currently only in German).