graph_plan.txt

# Plan for a C++20 Graph Class Optimized for Chip Design (Netlist)

## Introduction

This plan outlines the design of a C++20 class for handling graphs optimized for chip design or netlists. The graph represents logic cells (nodes) and their connections (edges) at the pin level. The goal is to create an efficient and mutable graph structure that handles frequent node deletions, supports topological traversals, and optimizes memory usage for both small and large numbers of edges.

By focusing on efficient storage, mutability, and optimization techniques like delta encoding and overflow management, the design aims to handle the specific needs of hardware logic representation. The separation of meta-information and careful management of IDs and edges ensure that the graph remains efficient even as it mutates and scales.

## Data Structures

### Nodes

- **Node ID (Nid)**: A unique 32-bit integer that indexes into a `std::vector`.
- **Pins**: Each node contains at least one pin. To optimize storage, the first pin (Pin 0) ID corresponds to the Node ID.
- **Type**: A small integer representing the node's logic type (e.g., XOR, AND).
- **Edges**: Stored within the node if they originate from Pin 0. Overflow to a hashmap if too many edges exist.

### Pins

- **Pin ID**: Unique identifiers for each pin, also indexing into a `std::vector`.
- **Port ID**: Represents the specific port (`Port_ID`) of the node (e.g., input A, output Y).
- **Edges**: Connections to other pins, optimized for a small number of bidirectional edges. Overflow to a hashmap if necessary.

### Edges

- **Bidirectional**: Edges are bidirectional connections between pins.
- **Edge Storage**:
  - **Local Edges**: Stored directly within the pin or node using delta encoding.
  - **Overflow Edges**: If edges exceed local storage capacity, they overflow into a separate data structure (e.g., a `hash_set8`).

## Graph Representation

- **Vectors Instead of Pointers**: All nodes and pins are stored in `std::vector`s, indexed by their IDs to avoid pointer overhead.
- **Entry Types**: Each entry in the vectors can be of type Free, Node, Pin, or Overflow.
- **Overflow Handling**: For nodes or pins with many edges (e.g., reset/clk pins), edges overflow into a hashmap or a larger continuous array.

## Main Characteristics

1. **Mutability**: The graph supports adding and removing nodes and edges dynamically.
2. **Fixed IDs**: Once an ID is assigned to a node or pin, it is never reused, even after deletion (tombstone deletion).
3. **Frequent Node Deletion**: Nodes can be deleted frequently, which is common in optimization phases.
4. **Traversal Methods**:
   - **Any Order Traversal**: For operations that don't require a specific order.
   - **Topological Sort Traversal**: Essential for tasks that depend on processing order.
5. **Edge Addition Frequency**: Adding edges is most frequent during the initial graph creation phase.
6. **Edge Optimization**:
   - **Delta Encoding**: Since most edges are "short" (connect nearby nodes), delta encoding is used to store edge differences as smaller integers.
   - **Local Edge Storage**: Most nodes have under 8 edges, so edges are stored locally when possible.
7. **Meta Information**: Attributes and additional data are stored separately to keep the core graph structure lightweight.
8. **Concurrency**: The graph does not require thread-safe updates. Parallelism is achieved through parallel read-only operations.

## Main Methods

### Node Management

- **Add Node**: Creates a new node with a unique ID and initializes its pins.
- **Delete Node**: Marks a node as deleted without reusing its ID (tombstone). All the pin edges are deleted.
- **Get Node**: Retrieves a node by its ID. Used by iterators.
- **Set Node Type**: Sets the logic type of a node.
- **Get Node Type**: Returns the logic type of a node.

### Pin Management

- **Add Pin**: Adds a pin to a node, assigning it a unique ID.
- **Delete Pin**: Marks the pin as deleted without reusing the ID (tombstone). All the edges are deleted.
- **Get Pin**: Retrieves a pin by its ID. Used by iterators.
- **Set Port ID**: Sets the port ID of a pin.
- **Get Port ID**: Retrieves the port ID of a pin.
- **Check if Pin is Sink/Driver**: Determines whether a pin is an input (sink) or output (driver).

### Edge Management

- **Add Edge**:
  - Adds a bidirectional edge between two pins.
  - If the number of edges is small, stores it locally using delta encoding.
  - If the edge count exceeds local storage, moves edges to an overflow structure.
- **Delete Edge**: Removes an edge between two pins.
- **Overflow Handling**: Manages the transition of edges from local storage to overflow structures and vice versa.
- **Get Connected Pins**: Iterates over all pins connected to a given pin.

### Traversal Methods

- **Any Order Traversal**: Iterates over nodes and pins without a specific order.
- **Topological Sort Traversal**: Processes nodes in an order respecting dependencies, essential for certain analyses.

## Structure

### Classes and Components from the `hhds` Namespace

1. **Graph**: The main class managing the entire graph structure.
   - Contains vectors for nodes, pins, and overflow edges.
   - Manages ID assignments and tombstone tracking.
   - Provides methods for node, pin, and edge management.

2. **Graph_node**: Represents a logic cell (Node).
   - Contains at least one pin (Pin 0).
   - Stores type information and local edges.
   - Manages overflow if too many edges exist.

3. **Graph_pin**: Represents a connection point on a node (Pin).
   - Stores port ID and local edges.
   - Knows whether it is an input (sink) or output (driver) pin.
   - Manages overflow if too many edges exist.

4. **Edge Overflow Structures**:
   - Used when a pin's edges exceed local storage capacity.
   - Implemented as hashmaps or larger continuous arrays.

### Data Organization

- **Entry Types**: Each entry in the node and pin vectors has an `Entry_type` to indicate its status.
- **Delta Encoding**: Edge differences are stored as small integers when possible to save space.
- **Overflow Indicators**: Flags like `overflow_link` and `set_link` indicate whether to look for edges in overflow structures.

## Optimizations

### Delta Encoding for Edges

- Stores the difference between connected pin IDs instead of full 32-bit IDs.
- Effective when most edges connect nearby pins.
- Reduces memory footprint and improves cache performance.

### Handling Overflow Edges

- For pins with many edges, local storage is insufficient.
- Edges overflow into a hashmap (`emhash8::HashSet`) or a larger array.
- Overflow structures are managed carefully to balance performance and memory usage.

### Tombstone Deletion

- Nodes and pins are marked as deleted without reusing their IDs.
- Simplifies ID management and avoids potential conflicts.
- Deleted entries can be skipped during traversal.

## Data Types

```cpp
namespace hhds {

using Port_ID                  = uint16_t;  // 16 bits max value
constexpr int     Port_bits    = std::numeric_limits<Port_ID>::digits - 1;
constexpr Port_ID Port_invalid = ((1 << Port_bits) - 1);

using Bits_t               = int32_t;  // 17 bits max size
constexpr int    Bits_bits = 17;
constexpr Bits_t Bits_max  = ((1ULL << Bits_bits) - 1);

};  // namespace hhds
```

## Class Definitions with Methods to Implement

Below are the class definitions for `Graph_node` and `Graph_pin` with the methods that need to be implemented.

### `Graph_node.hpp`

```cpp
#pragma once

#include <array>
#include <cstdint>
#include <vector>

#include "graph_base.hpp"
#include "iassert.hpp"

namespace hhds {

class __attribute__((packed)) Graph_node {
public:
  Graph_node();

  // Node Type Methods
  [[nodiscard]] uint16_t get_type() const;
  void set_type(uint16_t t);

  // Instance Methods
  void set_instance(uint32_t gid);
  [[nodiscard]] bool has_instance() const;
  [[nodiscard]] uint32_t get_instance() const;

  // Edge Management
  bool add_edge(uint32_t self_id, uint32_t other_id);
  bool del_edge(uint32_t self_id, uint32_t other_id);

  // Edge Iteration
  class iterator {
  public:
    using value_type = uint32_t;
    using pointer    = const int16_t*;

    iterator(uint32_t sid, uint32_t x, pointer p, pointer e);

    [[nodiscard]] value_type operator*() const;
    iterator& operator++();
    [[nodiscard]] const iterator operator++(int);

    [[nodiscard]] bool operator==(const iterator& rhs) const;
    [[nodiscard]] bool operator!=(const iterator& rhs) const;

    [[nodiscard]] bool in_xtra() const;
    [[nodiscard]] pointer get_ptr();

  private:
    value_type self_id;
    value_type xtra;
    pointer    ptr;
    pointer    end;
  };

  iterator begin(uint32_t self_id);
  iterator end();
  [[nodiscard]] iterator begin(uint32_t self_id) const;
  [[nodiscard]] iterator end() const;

  // Edge Utilities
  void erase(iterator it);
  [[nodiscard]] size_t get_num_local_edges() const;
  [[nodiscard]] bool has_edges() const;

  // Pin Management
  uint32_t get_next_pin_ptr() const;
  void set_next_pin_ptr(uint32_t pin_id);

  // Node Deletion (deletes all Pins (not only Pin 0) edges too)
  void clear();

  // Debugging
  void dump(uint32_t self_id) const;

private:
  void switch_to_overflow(uint32_t overflow_id);

  // Node (16 bytes)
  // Byte 0:1
  Entry_type entry_type : 2;     // Free, Node, Pin, Overflow
  uint8_t    set_link : 1;       // Indicates if set link is used
  uint8_t    overflow_link : 1;  // When set, ledge points to overflow
  uint8_t    n_sedges : 2;       // Number of short edges
  uint16_t   type : 10;          // Node type
  // SEDGE: 2:7
  std::array<int16_t, 3> sedge;  // Short edges
  // Next Pin Pointer: 8:11
  uint32_t next_pin_ptr;         // Pointer to the next pin
  // Byte 12:15
  uint32_t ledge_or_overflow_or_set;  // Ledgers or overflow/set pointer
};

};  // namespace hhds
```

### `Graph_pin.hpp`

```cpp
#pragma once

#include <array>
#include <cstdint>
#include <vector>

#include "graph_base.hpp"
#include "iassert.hpp"

namespace hhds {

class __attribute__((packed)) Graph_pin {
public:
  Graph_pin(Port_ID pid, bool sink);

  // Port ID Methods
  [[nodiscard]] Port_ID get_port_id() const;

  // Sink/Driver Methods
  [[nodiscard]] bool is_sink() const;

  // Edge Management
  bool add_edge(uint32_t self_id, uint32_t other_id);
  bool del_edge(uint32_t self_id, uint32_t other_id);

  // Edge Iteration
  class iterator {
  public:
    using value_type = uint32_t;

    iterator(uint32_t x1, uint32_t x2);

    [[nodiscard]] value_type operator*() const;
    iterator& operator++();
    [[nodiscard]] const iterator operator++(int);

    [[nodiscard]] bool operator==(const iterator& rhs) const;
    [[nodiscard]] bool operator!=(const iterator& rhs) const;

    [[nodiscard]] bool in_data1() const;
    [[nodiscard]] bool in_data2() const;

  private:
    value_type data1;
    value_type data2;
  };

  iterator begin(uint32_t self_id);
  iterator end();
  [[nodiscard]] iterator begin(uint32_t self_id) const;
  [[nodiscard]] iterator end() const;

  // Edge Utilities
  void erase(iterator it);
  [[nodiscard]] size_t get_num_local_edges() const;
  [[nodiscard]] bool has_edges() const;

  // Node Association
  uint32_t get_node_id() const;
  void set_node_id(uint32_t nid);

  // Next Pin Pointer (iterate over Pins in a Node)
  uint32_t get_next_id() const;
  void set_next_id(uint32_t pin_id);

  // Pin Deletion (clears all edges too)
  void clear();

  // Debugging
  void dump(uint32_t self_id) const;

private:
  // Pin (16 bytes)
  // Byte 0:1
  Entry_type entry_type : 2;     // Free, Node, Pin, Overflow
  uint8_t    set_link : 1;       // Indicates if set link is used
  uint8_t    overflow_link : 1;  // When set, ledge points to overflow
  uint8_t    sink : 1;           // Sink (!driver)
  int16_t    sedge_0 : 11;       // Short edge (11 bits 2-complement)
  int16_t    portid;             // Port ID
  // Node ID: 4:7
  uint32_t node_id;              // Associated node ID
  // Next Pin Pointer: 8:11
  uint32_t next_id;              // Pointer to the next pin
  // Byte 12:15
  uint32_t ledge_or_overflow_or_set;  // Ledgers or overflow/set pointer
};

};  // namespace hhds
```

## Additional Notes

- **Graph Class**: In addition to `Graph_node` and `Graph_pin`, a `Graph` class should be implemented to manage the overall graph structure, including methods for node and pin creation, deletion, traversal, and edge management.
- **Overflow Structures**: The implementation of overflow structures (`emhash8::HashSet` or equivalent) should be defined to handle cases where local edge storage is insufficient.
- **Iterators**: Custom iterators are provided in both `Graph_node` and `Graph_pin` to facilitate traversal over edges.
- **Entry Types**: The `Entry_type` enumeration should be defined in `graph_base.hpp` to represent Free, Node, Pin, and Overflow entries.
- **Assertions**: Use of `IASSERT` or equivalent for debugging and ensuring data integrity.