Chinese version (中文版)

Sogou C++ Workflow

license MIT C++ platform

As Sogou's C++ server engine, Workflow supports almost all of Sogou's back-end C++ online services, including all search services, the cloud input method, online advertising and more, handling more than 10 billion requests every day. It is a lightweight, elegantly designed, enterprise-level programming engine that can satisfy most C++ back-end development requirements.

You can use it:

  • To quickly build an HTTP server:
#include <stdio.h>
#include "workflow/WFHttpServer.h"

int main()
{
    WFHttpServer server([](WFHttpTask *task) {
        task->get_resp()->append_output_body("<html>Hello World!</html>");
    });

    if (server.start(8888) == 0) {  // start server on port 8888
        getchar(); // press "Enter" to end.
        server.stop();
    }

    return 0;
}
  • As a powerful asynchronous client. It currently supports the HTTP, Redis, MySQL and Kafka protocols.
  • To implement user-defined protocol clients/servers and build your own RPC system.
    • Sogou RPC is built on it and open-sourced as an independent project. It supports the srpc, brpc and thrift protocols (benchmark).
  • To build asynchronous task flows, with support for common series and parallel structures as well as more complex DAG structures.
  • As a parallel programming tool. In addition to network tasks, we also include the scheduling of computing tasks. All types of tasks can be put into the same task flow.
  • As an asynchronous file IO tool on Linux, with performance exceeding that of any system call. Disk IO is also a task.
  • To realize any high-performance and high-concurrency back-end service with a very complex relationship between computing and communication.
  • To build a service mesh system.
    • The project has built-in service governance and load balancing features.

Compile and run environment

  • This project supports Linux, macOS, Windows and other operating systems.
    • The Windows version is temporarily released as an independent branch and uses IOCP to implement asynchronous networking. All user interfaces are consistent with the Linux version.
  • Supports all CPU platforms, including 32-bit or 64-bit x86 processors and big-endian or little-endian ARM processors.
  • Depends on OpenSSL. OpenSSL 1.1 or later is recommended.
  • Uses the C++11 standard and therefore needs to be compiled with a compiler that supports C++11. Does not rely on boost or asio.
  • No other dependencies. However, it contains the unmodified source code of several compression libraries such as lz4, zstd and snappy (required by the Kafka protocol).

Try it!

System design features

We believe that a typical back-end program consists of the following three parts, which should be developed completely independently of one another.

  • Protocol
    • In most cases, users use built-in common network protocols, such as http, redis or various rpc.
    • Users can also easily define their own network protocol. To do so, they only need to provide serialization and deserialization functions to define their own client/server.
  • Algorithm
    • In our design, an algorithm is the symmetrical counterpart of a protocol.
      • If a protocol call is an RPC, then an algorithm call is an APC (Async Procedure Call).
    • We have provided some general algorithms, such as sort, merge, psort, reduce, which can be used directly.
    • User-defined algorithms are much more common than user-defined protocols. Any complex calculation with clear boundaries should be packaged into an algorithm.
  • Task flow
    • The task flow is the actual business logic: it puts the protocols and algorithms to use by placing them in a flow graph.
    • The typical task flow is a closed series-parallel graph. Complex business logic may be a non-closed DAG.
    • The task flow graph can be constructed directly or dynamically generated based on the results of each step. All tasks are executed asynchronously.
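
To make the series-parallel idea concrete, here is a small standard-library model of such a flow, offered as a sketch rather than as Workflow's API. The names `Loop`, `Step`, `Done`, `series`, `parallel` and `leaf` are all hypothetical. The single-threaded `Loop` only stands in for a real scheduler, and the sketch uses `std::shared_ptr` purely for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A step is started with a continuation and calls it when it has finished.
using Done = std::function<void()>;
using Step = std::function<void(Done)>;

// Hypothetical single-threaded run loop standing in for a scheduler.
class Loop {
public:
    void post(std::function<void()> fn) {
        assert(fn);
        queue_.push_back(std::move(fn));
    }
    void run() {
        while (!queue_.empty()) {
            std::function<void()> fn = std::move(queue_.front());
            queue_.pop_front();
            fn();  // may post further work
        }
    }
private:
    std::deque<std::function<void()>> queue_;
};

// Start step i; when it completes, start step i + 1; after the last, call done.
void run_series(std::shared_ptr<const std::vector<Step>> steps, std::size_t i, Done done) {
    if (i == steps->size()) {
        done();
        return;
    }
    (*steps)[i]([steps, i, done]() { run_series(steps, i + 1, done); });
}

// Series: steps run strictly one after another.
Step series(std::vector<Step> steps) {
    auto shared = std::make_shared<const std::vector<Step>>(std::move(steps));
    return [shared](Done done) { run_series(shared, 0, done); };
}

// Parallel: all branches are started at once; a counter joins them.
Step parallel(std::vector<Step> branches) {
    auto shared = std::make_shared<const std::vector<Step>>(std::move(branches));
    return [shared](Done done) {
        if (shared->empty()) {
            done();
            return;
        }
        auto remaining = std::make_shared<std::size_t>(shared->size());
        for (const Step& branch : *shared)
            branch([remaining, done]() {
                if (--*remaining == 0)
                    done();
            });
    };
}

// Leaf step: defers its work to the loop and records its name when it runs.
// `loop` and `log` must outlive the step.
Step leaf(Loop& loop, std::vector<std::string>& log, std::string name) {
    return [&loop, &log, name](Done done) {
        loop.post([&log, name, done]() {
            log.push_back(name);
            done();
        });
    };
}
```

A closed flow such as `series({fetch, parallel({parse, index}), reply})` has one entry and one exit. Calling the resulting step only schedules work, and nothing runs until `Loop::run()` is called.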

Basic task, task factory and complex task

  • Our system contains six basic tasks: communication, file IO, CPU, GPU, timer, and counter.
  • All tasks are generated by the task factory and automatically recycled after callback.
    • A server task is a special kind of communication task. It is generated by the framework, which calls the task factory, and is handed over to the user through the process function.
    • In most cases, a task that the user creates through the task factory is a complex task, although the user does not need to be aware of this.
    • For example, an HTTP request may include many asynchronous processes (DNS, redirection), but for the user, it is just a communication task.
    • File sorting seems to be an algorithm, but it actually includes many complex interaction processes between file IO and CPU calculation.
    • If you think of business logic as building circuits with well-designed electronic components, then each electronic component may be a complex circuit.
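
The "complex task behind a simple interface" idea can be illustrated with a toy factory function. This is a sketch under stated assumptions, not Workflow's task factory: `FakeWeb`, `FetchResult`, `Task` and `create_fetch_task` are hypothetical names, the "network" is two in-memory tables, and the task runs synchronously to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the network.
struct FakeWeb {
    std::map<std::string, std::string> redirects;  // url -> redirect target
    std::map<std::string, std::string> pages;      // url -> body
};

struct FetchResult {
    bool ok;           // true if a page was found within the redirect budget
    int hops;          // number of redirects followed
    std::string body;  // page body when ok is true
};

using FetchCallback = std::function<void(const FetchResult&)>;
using Task = std::function<void()>;

// The factory hands back ONE task. Following redirects is an internal
// detail that the caller never sees. `web` must outlive the returned task.
Task create_fetch_task(const FakeWeb& web, std::string url, int redirect_max,
                       FetchCallback callback) {
    assert(redirect_max >= 0);
    return [&web, url, redirect_max, callback]() {
        std::string current = url;
        FetchResult result{false, 0, ""};
        for (;;) {
            auto next = web.redirects.find(current);
            if (next == web.redirects.end())
                break;  // no further redirect
            if (result.hops == redirect_max) {
                callback(result);  // redirect budget exhausted
                return;
            }
            current = next->second;
            ++result.hops;
        }
        auto page = web.pages.find(current);
        if (page != web.pages.end()) {
            result.ok = true;
            result.body = page->second;
        }
        callback(result);  // exactly one callback per task
    };
}
```

From the caller's side there is a single task and a single callback, however many internal steps were needed to produce the result.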

Asynchrony and encapsulation based on C++11 std::function

  • Not based on user mode coroutines. Users need to know that they are writing asynchronous programs.
  • All calls are executed asynchronously, and almost no operation blocks a thread while waiting.
    • Although we also provide some convenient semi-synchronous interfaces, they are not core features.
  • Please avoid derivation. Try to encapsulate user behavior with std::function instead, including:
    • The callback of any task.
    • Any server process. This conforms to the FaaS (Function as a Service) idea.
    • The implementation of an algorithm, which is simply a std::function. An algorithm can, however, also be implemented by derivation.
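
As a rough illustration of why std::function can replace derivation, the sketch below stores the user's behavior in a single `std::function` slot. The `Request`, `Response` and `Server` types here are hypothetical stand-ins, not Workflow classes. A free function, a bound member function and a capturing lambda all fit the same slot without any base class.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

struct Request { std::string path; };
struct Response { std::string body; };

// The server keeps user behavior as a std::function instead of asking
// users to derive from a handler base class.
class Server {
public:
    using Process = std::function<void(const Request&, Response&)>;

    explicit Server(Process process) : process_(std::move(process)) {
        assert(process_ && "a process function is required");
    }

    // Runs the user's process function for one request.
    Response handle(const Request& req) const {
        Response resp;
        process_(req, resp);
        return resp;
    }

private:
    Process process_;
};

// 1) A plain free function.
void hello(const Request&, Response& resp) {
    resp.body = "hello";
}

// 2) An object with its own state; its member function is bound into the slot.
struct Counter {
    int hits = 0;
    void process(const Request& req, Response& resp) {
        ++hits;
        resp.body = req.path + " #" + std::to_string(hits);
    }
};
```

A capturing lambda works the same way, for example `Server s([prefix](const Request& r, Response& resp) { resp.body = prefix + r.path; });`.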

Memory reclamation mechanism

  • Every task is automatically reclaimed after its callback. If a task is created but will not be run, the user must release it with the dismiss method.
  • Any data in the task, such as the response of a network request, is recycled together with the task. The user can use std::move() to move out any data that is still needed.
  • SeriesWork and ParallelWork are two kinds of framework objects, which are also recycled after their callback.
  • This project doesn’t use std::shared_ptr to manage memory.
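
The ownership rules above can be modeled with a tiny self-deleting task. This is a sketch only: `EchoTask`, `live_count` and the synchronous `start` are assumptions made for illustration, whereas a real task would run asynchronously and invoke its callback later.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical task that owns its response and reclaims itself.
class EchoTask {
public:
    using Callback = std::function<void(EchoTask*)>;

    // Tasks are only created through this factory function, always on the heap.
    static EchoTask* create(std::string input, Callback callback) {
        return new EchoTask(std::move(input), std::move(callback));
    }

    // Runs the task, invokes the callback, then reclaims the task.
    // The pointer must not be used after start() returns.
    void start() {
        started_ = true;
        response_ = "echo: " + input_;
        if (callback_)
            callback_(this);
        delete this;  // reclaimed right after the callback
    }

    // Releases a task that was created but will never be started.
    void dismiss() {
        assert(!started_ && "dismiss() is only for tasks that never ran");
        delete this;
    }

    // Only valid inside the callback; move the data out to keep it.
    std::string& response() { return response_; }

    // Number of tasks currently alive, used here to observe reclamation.
    static int live_count() { return live_; }

private:
    EchoTask(std::string input, Callback callback)
        : input_(std::move(input)), callback_(std::move(callback)) {
        ++live_;
    }
    ~EchoTask() { --live_; }

    std::string input_;
    std::string response_;
    Callback callback_;
    bool started_ = false;
    static int live_;
};

int EchoTask::live_ = 0;
```

Inside the callback, `saved = std::move(task->response());` keeps the response alive after the task is gone. A task that is never started is released with `dismiss()`.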

More design documents

To be continued...

Authors