implementing Kafka with Golang

set of implemented features:
  • commit log
  • networking with gRPC
  • encrypted connections, mutual TLS authentication, ACL-based authorization using Casbin, and peer-to-peer gRPC connections
  • Observability using zap, ctxtags and OpenCensus for tracing, all in gRPC interceptors ⚭
  • Server-to-Server Service Discovery using Serf
  • Coordination and Consensus with the Raft algorithm, binding the cluster with Serf for Discovery Integration between servers (trying to implement KRaft)
  • Client-side Load Balancing with gRPC and End-to-End Discovery Integration
implementation:
  • commit log:
    • hash-table index approach for the in-memory data structure, using a write-ahead log and an LSM-tree-style engine by splitting the log into index, store and segment files
    • storing log segments and stores as raw bytes for ease of serializing and deserializing over the network and gRPC protocols
    • using a store file and an index file for each log segment, backed by an in-memory data structure for faster seeks; a store sketch follows after this list (already thinking about merging the old segments using a merge-sort algorithm)
    • using the go-mmap library to memory-map the index file for performance
    • tests for each segment and its store and index files
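
A minimal sketch of the store file described above, assuming a hypothetical `store` type (the `lenWidth` constant and method names are illustrative, not necessarily this repo's exact API). Each record is length-prefixed, so it can be read back from any byte position the index points at:

```go
package log

import (
	"bufio"
	"encoding/binary"
	"os"
	"sync"
)

// lenWidth is the number of bytes used for each record's length prefix.
const lenWidth = 8

var enc = binary.BigEndian

// store wraps a file and appends length-prefixed records to it.
type store struct {
	mu   sync.Mutex
	file *os.File
	buf  *bufio.Writer
	size uint64
}

func newStore(f *os.File) (*store, error) {
	fi, err := os.Stat(f.Name())
	if err != nil {
		return nil, err
	}
	return &store{
		file: f,
		buf:  bufio.NewWriter(f),
		size: uint64(fi.Size()),
	}, nil
}

// Append writes the record's length, then its bytes, and returns the
// position where the record starts so the index file can point at it.
func (s *store) Append(p []byte) (n uint64, pos uint64, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	pos = s.size
	if err := binary.Write(s.buf, enc, uint64(len(p))); err != nil {
		return 0, 0, err
	}
	w, err := s.buf.Write(p)
	if err != nil {
		return 0, 0, err
	}
	w += lenWidth
	s.size += uint64(w)
	return uint64(w), pos, nil
}

// Read flushes buffered writes, then reads the record stored at pos.
func (s *store) Read(pos uint64) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if err := s.buf.Flush(); err != nil {
		return nil, err
	}
	size := make([]byte, lenWidth)
	if _, err := s.file.ReadAt(size, int64(pos)); err != nil {
		return nil, err
	}
	b := make([]byte, enc.Uint64(size))
	if _, err := s.file.ReadAt(b, int64(pos+lenWidth)); err != nil {
		return nil, err
	}
	return b, nil
}
```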
  • gRPC Services: v2.0.0
    • using bidirectional streaming APIs on the client and server side to stream content between them (see the sketch after this list)
    • using the status, codes and errdetails packages to customize error messages between client and server
    • Dependency Inversion using Interfaces (the DIP principle) --> wanna know more?
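
A hedged sketch of what the server side of such a bidirectional stream can look like; `api` stands in for the gRPC-generated package and `grpcServer.Produce` for the unary handler, both assumptions about this repo's layout:

```go
package server

import (
	// hypothetical import path for the generated protobuf/gRPC package
	api "github.com/example/proglog/api/v1"
)

// ProduceStream is a bidirectional stream: the client keeps sending records,
// and the server answers each one with the offset it was appended at.
// grpcServer and its unary Produce method are assumed to exist elsewhere.
func (s *grpcServer) ProduceStream(stream api.Log_ProduceStreamServer) error {
	for {
		req, err := stream.Recv() // blocks until the client sends the next record
		if err != nil {
			return err // io.EOF when the client closes its send side
		}
		res, err := s.Produce(stream.Context(), req)
		if err != nil {
			return err
		}
		if err = stream.Send(res); err != nil {
			return err
		}
	}
}
```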
  • Security: v3.0.0
    • my approach to securing the system is based on three steps:
      • encrypt data in-flight to protect against man-in-the-middle attacks
      • authenticate to identify clients
      • authorize to determine the permissions of the identified clients
    • given how widely mutual authentication is adopted in distributed services, I'm gonna try my best to adopt it too 🤠. interested? learn how Cloudflare adopted it.
    • building access-control-list-based authorization & differentiating between authentication and authorization in case of varying levels of access and permissions
    • using Cloudflare's open-source Certificate Authority (CA) called CFSSL for signing, verifying and bundling TLS certificates
    • v3.0.2 (Authentication) completed mutual TLS communication between client and server + tests
    • Authorization: using Casbin. Casbin supports enforcing authorization based on various access control models, including ACLs. Plus, Casbin is extendable and it's exciting to explore it.
    • v4.0.0 --> encrypted connections, mutual TLS authentication, ACL-based authorization using Casbin (a TLS wiring sketch follows below)
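
A sketch of how the mutual TLS pieces can be wired into gRPC, assuming CFSSL-issued certificate, key and CA files; the function name and parameters are placeholders, not this repo's exact API:

```go
package config

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// SetupTLSConfig builds a server-side TLS config that both presents the
// server's certificate and verifies client certificates against our CA.
// Requiring and verifying client certs is what makes the TLS mutual.
func SetupTLSConfig(certFile, keyFile, caFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	ca, err := os.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(ca) {
		return nil, fmt.Errorf("failed to parse CA certificate %q", caFile)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    pool,
		ClientAuth:   tls.RequireAndVerifyClientCert, // reject clients without a valid cert
	}, nil
}

// NewServer turns the TLS config into gRPC transport credentials.
func NewServer(tlsCfg *tls.Config) *grpc.Server {
	return grpc.NewServer(grpc.Creds(credentials.NewTLS(tlsCfg)))
}
```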
  • Observability: v4.0.0
    • using zap for structured logs
    • using request context tags to set values for request tags in the context; this adds a Tag object to the context that other middleware can use to add context about a request
    • using OpenCensus for tracing (an interceptor wiring sketch follows below)
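
A sketch of that interceptor wiring, assuming the grpc-ecosystem middleware packages (grpc_ctxtags, grpc_zap) and OpenCensus's ocgrpc stats handler:

```go
package server

import (
	grpc_middleware "github.com/grpc-ecosystem/go-grpc-middleware"
	grpc_zap "github.com/grpc-ecosystem/go-grpc-middleware/logging/zap"
	grpc_ctxtags "github.com/grpc-ecosystem/go-grpc-middleware/tags"
	"go.opencensus.io/plugin/ocgrpc"
	"go.uber.org/zap"
	"google.golang.org/grpc"
)

// newObservableServer chains ctxtags (request-scoped tags) and zap
// (structured logs) interceptors, and attaches OpenCensus for tracing/stats.
func newObservableServer(logger *zap.Logger) *grpc.Server {
	return grpc.NewServer(
		grpc.StreamInterceptor(grpc_middleware.ChainStreamServer(
			grpc_ctxtags.StreamServerInterceptor(),
			grpc_zap.StreamServerInterceptor(logger),
		)),
		grpc.UnaryInterceptor(grpc_middleware.ChainUnaryServer(
			grpc_ctxtags.UnaryServerInterceptor(),
			grpc_zap.UnaryServerInterceptor(logger),
		)),
		grpc.StatsHandler(&ocgrpc.ServerHandler{}),
	)
}
```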
  • Service-to-Service Discovery:
    • implementing Membership using Serf on each service instance to discover other service instances (see the sketch after this list)
    • implementing Replication to duplicate each server's data - already thinking about Consensus -_-
    • after implementing our replicator, membership, log and server components, we implement an Agent type to run and sync these components on each instance, just like HashiCorp Consul
    • updated in v6.0.0
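
A condensed sketch of the membership idea, assuming a hypothetical `Handler` interface that the replicator (and later the Raft layer) satisfies:

```go
package discovery

import (
	"net"

	"github.com/hashicorp/serf/serf"
)

// Handler is notified whenever a server joins or leaves the cluster.
type Handler interface {
	Join(name, addr string) error
	Leave(name string) error
}

// Membership wraps Serf: each instance gossips its RPC address via tags
// and forwards Serf's join/leave events to the handler.
type Membership struct {
	handler Handler
	serf    *serf.Serf
	events  chan serf.Event
}

func New(handler Handler, nodeName, bindAddr, rpcAddr string, startJoinAddrs []string) (*Membership, error) {
	addr, err := net.ResolveTCPAddr("tcp", bindAddr)
	if err != nil {
		return nil, err
	}
	m := &Membership{handler: handler, events: make(chan serf.Event)}
	cfg := serf.DefaultConfig()
	cfg.NodeName = nodeName
	cfg.MemberlistConfig.BindAddr = addr.IP.String()
	cfg.MemberlistConfig.BindPort = addr.Port
	cfg.EventCh = m.events
	cfg.Tags = map[string]string{"rpc_addr": rpcAddr}
	if m.serf, err = serf.Create(cfg); err != nil {
		return nil, err
	}
	go m.eventHandler()
	if startJoinAddrs != nil {
		// join an existing cluster by contacting any known member
		if _, err = m.serf.Join(startJoinAddrs, true); err != nil {
			return nil, err
		}
	}
	return m, nil
}

// eventHandler turns Serf's gossip events into handler calls.
func (m *Membership) eventHandler() {
	for e := range m.events {
		switch e.EventType() {
		case serf.EventMemberJoin:
			for _, member := range e.(serf.MemberEvent).Members {
				m.handler.Join(member.Name, member.Tags["rpc_addr"])
			}
		case serf.EventMemberLeave, serf.EventMemberFailed:
			for _, member := range e.(serf.MemberEvent).Members {
				m.handler.Leave(member.Name)
			}
		}
	}
}
```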
  • Coordinating the Service using the Raft Consensus Algorithm:
    • using my own log library as Raft's log store, satisfying the LogStore interface that Raft requires
    • using Bolt, an embedded, persistent key-value database for Go, as my stable store in Raft to hold the server's current term and important metadata like the candidate the server voted for
    • implemented Raft snapshots to recover and restore data efficiently when necessary, e.g. if our server's EC2 instance failed and an autoscaling group (Terraform terminology 🥸) brought up another instance for the Raft server. rather than streaming all the data from the Raft leader, the new server would restore from the snapshot (which you could store in S3 or a similar storage service) and then get the latest changes from the leader.
    • again, using my own log library as Raft's finite-state machine (FSM) to replicate the logs across the servers in the cluster
    • Discovery integration: binding Serf and Raft to implement service discovery on the Raft cluster by implementing Join and Leave methods to satisfy Serf's interface, hence having a membership in the cluster to be discovered. a membership service determines which nodes are currently active and live members of the cluster.
    • due to unbounded network delays, it's not possible to reliably detect whether another node has failed, so I coupled my consensus algorithm with Serf's membership package to come to an agreement about which nodes should be considered alive (coupling Serf membership and the Raft consensus algorithm for failure detection)
    • Multiplexing on our service to run multiple services on one port (see the sketch below)
      • we tell Raft connections apart from gRPC connections by making the Raft transport write an identifying byte first, multiplex the connections out to different listeners to handle, and configure our agents to manage both Raft cluster connections and gRPC connections on a single port
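
A sketch of that multiplexing, assuming the soheilhy/cmux package; the `RaftRPC` identifying byte is illustrative:

```go
package agent

import (
	"bytes"
	"io"
	"net"

	"github.com/soheilhy/cmux"
)

// RaftRPC is the byte our Raft transport writes first on every connection,
// letting the multiplexer tell Raft traffic apart from gRPC traffic.
const RaftRPC = 1

// setupMux accepts everything on one port and splits the connections:
// those whose first byte is RaftRPC go to the Raft listener, the rest
// (gRPC over HTTP/2) go to the gRPC listener.
func setupMux(addr string) (raftLn, grpcLn net.Listener, err error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, nil, err
	}
	mux := cmux.New(ln)
	raftLn = mux.Match(func(r io.Reader) bool {
		b := make([]byte, 1)
		if _, err := r.Read(b); err != nil {
			return false
		}
		return bytes.Equal(b, []byte{RaftRPC})
	})
	grpcLn = mux.Match(cmux.Any())
	go mux.Serve() // route accepted connections to the matched listeners
	return raftLn, grpcLn, nil
}
```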
  • Client-Side LoadBalancing: v7.0.0
    • three major features our client needs at this point:
      • discover servers in the cluster
      • direct append calls to leaders and consume calls to followers
      • balance consume calls across followers
    • we start by writing our own resolver and picker using gRPC's builder pattern. gRPC separates server discovery, load balancing, and client request and response handling. in our gRPC setup:
      • the resolver discovers the servers and whether each server is the leader or not
      • the picker directs produce calls to the leader and balances consume calls across the followers (see the picker sketch below)
    • implementing End-to-End Discovery and Balancing (client-side) with our agent
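
A sketch of what such a picker can look like, assuming gRPC's balancer package; matching on the "Produce" method name is an illustrative heuristic:

```go
package loadbalance

import (
	"strings"
	"sync"
	"sync/atomic"

	"google.golang.org/grpc/balancer"
)

// Picker routes produce (append) RPCs to the leader and spreads
// consume RPCs across the followers round-robin.
type Picker struct {
	mu        sync.RWMutex
	leader    balancer.SubConn
	followers []balancer.SubConn
	current   uint64
}

func (p *Picker) Pick(info balancer.PickInfo) (balancer.PickResult, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	var result balancer.PickResult
	if strings.Contains(info.FullMethodName, "Produce") || len(p.followers) == 0 {
		// writes must go through the Raft leader
		result.SubConn = p.leader
	} else {
		// reads rotate across the followers
		result.SubConn = p.nextFollower()
	}
	if result.SubConn == nil {
		// tell gRPC to block the RPC until the resolver reports servers
		return result, balancer.ErrNoSubConnAvailable
	}
	return result, nil
}

func (p *Picker) nextFollower() balancer.SubConn {
	cur := atomic.AddUint64(&p.current, 1)
	return p.followers[cur%uint64(len(p.followers))]
}
```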
Upcoming features - updating...
  • orchestration and deployment with Kubernetes + configuring with Helm and tuning k8s controllers to handle our cluster as we desire
  • using Kafka to eliminate all the write endpoints of the API and just append those requests to the log (append-only log), letting the system consume that log to escape the dual-write issue. in this worldview, the log is the authoritative source of what has happened, and consumers of the log (other services that need to know what happened to the source of truth) consume that information in various ways
  • provisioning resources on AWS by Infrastructure as Code principles using Terraform
  • CI/CD using Jenkins pipeline cluster-wide + github webhooks to automate deployment
  • machine learning models as the core functionality and agent to embrace and wield every aspect of the project
Notes
  • merging segments together for key-value stores to keep the latest value for each key and truncate outdated segments (compaction and merging, page 73 of Designing Data-Intensive Applications)
  • since our logs are fixed-length records, we could use binary search to find the right offset on disk. with this approach we don't need an index file to store our log's offsets, and we gain performance (see the sketch below)
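
A sketch of that note's idea: with fixed-length records a record's byte position is computable directly, and sort.Search gives binary search over the sorted segment base offsets (all names are illustrative):

```go
package log

import "sort"

// recordWidth is the fixed on-disk size of every record, in bytes
// (an assumed value for illustration).
const recordWidth = 128

// segment knows the offset of the first record it holds; segments are
// kept sorted by baseOffset, so sort.Search is a binary search.
type segment struct {
	baseOffset uint64
}

// findSegment binary-searches the sorted segment list for the segment
// containing the record at the given offset.
func findSegment(segments []segment, off uint64) *segment {
	i := sort.Search(len(segments), func(i int) bool {
		return segments[i].baseOffset > off
	})
	if i == 0 {
		return nil // offset predates our oldest segment
	}
	return &segments[i-1]
}

// position computes a record's byte position on disk directly:
// with fixed-length records, no index file is needed.
func (s *segment) position(off uint64) int64 {
	return int64(off-s.baseOffset) * recordWidth
}
```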