include io libraries to simplify API; flesh out documentation
digizeph committed Oct 25, 2021
1 parent be11e80 commit 28e7841
Showing 12 changed files with 368 additions and 610 deletions.
8 changes: 5 additions & 3 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "bgpkit-parser"
version = "0.3.0"
version = "0.4.0"
authors = ["Mingwei Zhang <[email protected]>"]
edition = "2021"
readme = "README.md"
@@ -25,7 +25,9 @@ log="0.4"
env_logger="0.9"
itertools = "0.10.1"

bzip2="0.4.3"
flate2="1.0.22"
reqwest = { version = "0.11", features = ["json", "blocking", "stream"]}

[dev-dependencies]
bzip2="0.4"
reqwest = { version = "0.11", features = ["json", "blocking", "stream"] }
bgpkit-broker = "0.3.0"
189 changes: 100 additions & 89 deletions README.md
@@ -1,17 +1,104 @@
# BGPKIT Parser

BGPKIT Parser provides MRT/BGP message parsing functionalities written in Rust.

## Features
BGPKIT Parser aims to provide the most ergonomic MRT/BGP message parsing Rust API.

BGPKIT Parser has the following features:
- performance comparable to C-based implementations like `bgpdump` or `bgpreader`.
- supporting most of the relevant BGP RFCs.
- simple API serves as building block for more complex workflows.
- **performant**: comparable to C-based implementations like `bgpdump` or `bgpreader`.
- **actively maintained**: we consistently introduce feature updates and bug fixes, and support most of the relevant BGP RFCs.
- **ergonomic API**: a three-line `for` loop is enough to get started.
- **batteries-included**: ready to handle remote or local, bzip2 or gzip data files out of the box.
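
As a rough illustration of what handling remote or local, bzip2 or gz files could involve, the sketch below classifies an input path by its URL scheme and file extension. This is only an assumption about one plausible approach, not the crate's actual implementation; the `Location` and `Compression` enums and the `classify` function are hypothetical names introduced for this example.

```rust
/// Hypothetical: where the data file lives.
#[derive(Debug, PartialEq)]
enum Location {
    Remote,
    Local,
}

/// Hypothetical: how the data file is compressed.
#[derive(Debug, PartialEq)]
enum Compression {
    Bzip2,
    Gzip,
    Plain,
}

/// Classify a path by URL scheme and file extension.
/// Assumption: this mirrors one plausible dispatch strategy, not the crate's real logic.
fn classify(path: &str) -> (Location, Compression) {
    let location = if path.starts_with("http://") || path.starts_with("https://") {
        Location::Remote
    } else {
        Location::Local
    };
    let compression = if path.ends_with(".bz2") {
        Compression::Bzip2
    } else if path.ends_with(".gz") {
        Compression::Gzip
    } else {
        Compression::Plain
    };
    (location, compression)
}

fn main() {
    let paths = [
        "http://archive.routeviews.org/bgpdata/2021.10/UPDATES/updates.20211001.0000.bz2",
        "/tmp/updates.gz",
    ];
    for path in paths {
        println!("{} -> {:?}", path, classify(path));
    }
}
```

A reader type (HTTP stream or local file, wrapped in the matching decoder) could then be selected from the returned pair.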

## Examples

For complete examples, check out the [examples folder](examples).

### Parsing single MRT file

Let's say we want to print out all the BGP announcements/withdrawals from a single MRT file, located either remotely or locally.
Here is an example that does so.

```rust
use bgpkit_parser::BgpkitParser;

fn main() {
    let parser = BgpkitParser::new("http://archive.routeviews.org/bgpdata/2021.10/UPDATES/updates.20211001.0000.bz2");
    for elem in parser {
        println!("{}", elem);
    }
}
```

Yes, it is this simple!

You can even use iterator operations that are even shorter.
For example, counting the number of announcements/withdrawals in that file:
```rust
use bgpkit_parser::BgpkitParser;

fn main() {
    let url = "http://archive.routeviews.org/bgpdata/2021.10/UPDATES/updates.20211001.0000.bz2";
    let count = BgpkitParser::new(url).into_iter().count();
    println!("total: {}", count);
}
```

and it prints out
```
total: 255849
```

### Parsing multiple MRT files with BGPKIT Broker

The [BGPKIT Broker][broker-repo] library provides a search API for all RouteViews and RIPE RIS MRT data files. Using the
broker's Rust API ([`bgpkit-broker`][broker-crates-io]), we can easily compile a list of the MRT files that we are interested
in for any time period and any data type (`update` or `rib`). This allows users to gather information without needing to
know the locations of specific data files.

[broker-repo]: https://github.com/bgpkit/bgpkit-broker
[broker-crates-io]: https://crates.io/crates/bgpkit-broker

The example below is somewhat more involved. It does the following:
- find all BGP archive data created at Unix timestamp 1634693400
- filter to only BGP updates files
- find all announcements originated from AS13335
- print out the total count of the announcements

```rust
use bgpkit_parser::{BgpElem, BgpkitParser};

fn main() {
    // set broker query parameters
    let mut params = bgpkit_broker::QueryParams::new();
    params = params.start_ts(1634693400);
    params = params.end_ts(1634693400);
    params = params.data_type("update");
    let mut broker = bgpkit_broker::BgpkitBroker::new("https://api.broker.bgpkit.com/v1");
    broker.set_params(&params);

    // loop through data files found by broker
    for item in broker {
        // create a parser that takes a URL, automatically determines
        // the file location and file type, and handles data download and
        // decompression streaming
        let parser = BgpkitParser::new(item.url.as_str());

        // iterate through the parser; the iterator returns one `BgpElem` at a time
        let elems = parser.into_elem_iter().map(|elem| {
            if let Some(origins) = &elem.origin_asns {
                if origins.contains(&13335) {
                    Some(elem)
                } else {
                    None
                }
            } else {
                None
            }
        }).filter_map(|x| x).collect::<Vec<BgpElem>>();
        log::info!("{} elems match", elems.len());
    }
}
```
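
The `map` followed by `filter_map(|x| x)` in the example above can also be written as a single `filter` call. The sketch below shows that form on a hypothetical stand-in `Elem` type (not the crate's `BgpElem`), assuming only an `origin_asns: Option<Vec<u32>>` field so that it runs without network access or extra dependencies. The sample data is made up for illustration.

```rust
/// Hypothetical stand-in for `BgpElem`; only the fields used here are modeled.
#[derive(Debug, Clone, PartialEq)]
struct Elem {
    prefix: String,
    origin_asns: Option<Vec<u32>>,
}

/// Keep only the elements whose origin ASNs include `asn`.
/// Elements without origin information (e.g. withdrawals) are dropped.
fn originated_by(elems: Vec<Elem>, asn: u32) -> Vec<Elem> {
    elems
        .into_iter()
        .filter(|elem| {
            elem.origin_asns
                .as_ref()
                .map_or(false, |origins| origins.contains(&asn))
        })
        .collect()
}

/// Made-up sample data: two elems originated by AS13335, one by another AS,
/// and one withdrawal without an AS path.
fn sample_elems() -> Vec<Elem> {
    vec![
        Elem { prefix: "192.0.2.0/24".to_string(), origin_asns: Some(vec![13335]) },
        Elem { prefix: "198.51.100.0/24".to_string(), origin_asns: Some(vec![64496]) },
        Elem { prefix: "203.0.113.0/24".to_string(), origin_asns: Some(vec![64497, 13335]) },
        Elem { prefix: "192.0.2.128/25".to_string(), origin_asns: None },
    ]
}

fn main() {
    let matches = originated_by(sample_elems(), 13335);
    for elem in &matches {
        println!("{}", elem.prefix);
    }
    println!("{} elems match", matches.len());
}
```

With a real parser, the same closure could be passed to `.filter(...)` on the element iterator, provided the field has the shape assumed here.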

## Key Data Structures
## Data Representation

There are two key data structures to understand the parsing results: `MrtRecord` and `BgpElem`.
There are two key data structures to understand for the parsing results: `MrtRecord` and `BgpElem`.

### `MrtRecord`: unmodified MRT information representation

@@ -32,16 +119,16 @@ pub enum MrtMessage {
```

The `MrtRecord` representation is concise and storage-efficient, but often less convenient to use. For example, when
trying to find specific BGP announcements for a certain IP prefix, we often need to go through nested layers of
internal data structures (NLRI, announced, prefix, or even looking up the peer index table for the Table Dump V2 format), which
could be irrelevant to what users really want to do.

### `BgpElem`: per-prefix BGP information, MRT-format-agnostic

To facilitate simpler analysis of BGP data, we defined a new data structure called `BgpElem` in this crate. Each
`BgpElem` contains a self-contained piece of BGP information about a single IP prefix.
For example, when a bundled announcement of three prefixes P1, P2, P3 that share the same AS path is processed, we break
the single record into three different `BgpElem` objects, each representing one prefix.

```rust
pub struct BgpElem {
@@ -63,88 +150,12 @@ pub struct BgpElem {
}
```

The main benefit of using `BgpElem` is that the analysis can be executed on a per-prefix basis, independent of the
backend MRT data format (bgp4mp, tabledumpv1, tabledumpv2, etc.). The obvious drawback is that we have to duplicate
information in each elem, which consumes more memory.
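
To make the per-prefix breakdown and its memory cost concrete, here is a minimal sketch that splits one bundled announcement into per-prefix elements. The `BundledAnnouncement` and `PrefixElem` types and the `explode` function are hypothetical simplifications introduced for this example; the crate's real `MrtRecord` and `BgpElem` carry many more fields.

```rust
/// Hypothetical, simplified bundled announcement: several prefixes sharing one AS path.
#[derive(Debug, Clone)]
struct BundledAnnouncement {
    as_path: Vec<u32>,
    prefixes: Vec<String>,
}

/// Hypothetical, simplified per-prefix element.
#[derive(Debug, Clone, PartialEq)]
struct PrefixElem {
    prefix: String,
    as_path: Vec<u32>,
}

/// Break one bundled announcement into one element per prefix.
/// The shared AS path is cloned into every element, which is the
/// duplication (and extra memory) described in the text above.
fn explode(update: &BundledAnnouncement) -> Vec<PrefixElem> {
    update
        .prefixes
        .iter()
        .map(|prefix| PrefixElem {
            prefix: prefix.clone(),
            as_path: update.as_path.clone(),
        })
        .collect()
}

/// Made-up sample: three documentation prefixes with the same AS path.
fn sample_update() -> BundledAnnouncement {
    BundledAnnouncement {
        as_path: vec![64496, 64497, 13335],
        prefixes: vec![
            "192.0.2.0/24".to_string(),
            "198.51.100.0/24".to_string(),
            "203.0.113.0/24".to_string(),
        ],
    }
}

fn main() {
    for elem in explode(&sample_update()) {
        println!("{} via {:?}", elem.prefix, elem.as_path);
    }
}
```

Each resulting element can be filtered or counted on its own, at the cost of storing the AS path three times instead of once.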

[mrt-record-doc]: https://docs.rs/bgp-models/0.3.4/bgp_models/mrt/struct.MrtRecord.html

## Examples

For complete examples, check out the [examples folder](examples)

### Parsing single MRT file

If the file location is already known, a user can directly read the data into memory and parse the bytes to create
a parser object. BGPKIT Parser provides a convenient iterator over either `BgpElem` (the default iterator, or `.into_iter()`)
or `MrtRecord` (use `.into_record_iter()`). The example below iterates through all the elements in a single MRT updates file
and logs all the announcements that were originated from a specific ASN.

```rust
// read updates data into bytes
let data_bytes = reqwest::blocking::get("http://archive.routeviews.org/bgpdata/2021.10/UPDATES/updates.20211001.0000.bz2")
.unwrap().bytes().unwrap().to_vec();
// create a buffered reader that wraps around a bzip2 decoder
let reader = BufReader::new(BzDecoder::new(&*data_bytes));
// create a parser that takes the buffered reader
let parser = BgpkitParser::new(reader);

// iterating through the parser. the iterator returns `BgpElem` one at a time.
for elem in parser {
// each BGP announcement contains one AS path, which depending on the path segment's type
// there could be multiple origin ASNs (e.g. AS-Set as the origin)
if let Some(origins) = &elem.origin_asns {
if origins.contains(&13335) {
log::info!("{}", &elem);
}
}
}
```

### Parsing multiple MRT files with BGPKIT Broker

The [BGPKIT Broker][broker-repo] library provides a search API for all RouteViews and RIPE RIS MRT data files. Using the
broker's Rust API ([`bgpkit-broker`][broker-crates-io]), we can easily compile a list of the MRT files that we are interested
in for any time period and any data type (`update` or `rib`). This allows users to gather information without needing to
know the locations of specific data files.

[broker-repo]: https://github.com/bgpkit/bgpkit-broker
[broker-crates-io]: https://crates.io/crates/bgpkit-broker

```rust
let mut params = bgpkit_broker::QueryParams::new();
params = params.start_ts(1634693400);
params = params.end_ts(1634693400);
params = params.data_type("update");
let mut broker = bgpkit_broker::BgpkitBroker::new("https://api.broker.bgpkit.com/v1");
broker.set_params(&params);

for item in broker {
log::info!("downloading updates file: {}", &item.url);
// read updates data into bytes
let data_bytes = reqwest::blocking::get(item.url)
.unwrap().bytes().unwrap().to_vec();
// create a buffered reader that wraps around a bzip2 decoder
let reader = BufReader::new(BzDecoder::new(&*data_bytes));
// create a parser that takes the buffered reader
let parser = BgpkitParser::new(reader);

log::info!("parsing updates file");
// iterating through the parser. the iterator returns `BgpElem` one at a time.
let elems = parser.into_elem_iter().map(|elem|{
if let Some(origins) = &elem.origin_asns {
if origins.contains(&13335) {
Some(elem)
} else {
None
}
} else {
None
}
}).filter_map(|x|x).collect::<Vec<BgpElem>>();
log::info!("{} elems matches", elems.len());
}
```

## Contribution

8 changes: 8 additions & 0 deletions examples/count_elems.rs
@@ -0,0 +1,8 @@
use bgpkit_parser::BgpkitParser;

/// A very simple example that reads a remote BGP data file and prints out the message count.
fn main() {
    let url = "http://archive.routeviews.org/bgpdata/2021.10/UPDATES/updates.20211001.0000.bz2";
    let count = BgpkitParser::new(url).into_iter().count();
    println!("{}", count);
}
17 changes: 1 addition & 16 deletions examples/parse-files-from-broker.rs
@@ -1,16 +1,7 @@
use std::io::BufReader;
use bzip2::read::BzDecoder;
use bgpkit_parser::{BgpElem, BgpkitParser};

/// This example shows how to use BGPKIT Broker to retrieve a number of data file pointers that match
/// the time range criteria, and then parse each of the data files.
///
/// The dependencies needed for this example are:
/// ```
/// bzip2="0.4"
/// reqwest = { version = "0.11", features = ["json", "blocking", "stream"] }
/// bgpkit-broker = "0.3.0"
/// ```
fn main() {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();

@@ -23,13 +14,7 @@ fn main() {

for item in broker {
log::info!("downloading updates file: {}", &item.url);
// read updates data into bytes
let data_bytes = reqwest::blocking::get(item.url)
.unwrap().bytes().unwrap().to_vec();
// create a buffered reader that wraps around a bzip2 decoder
let reader = BufReader::new(BzDecoder::new(&*data_bytes));
// create a parser that takes the buffered reader
let parser = BgpkitParser::new(reader);
let parser = BgpkitParser::new(item.url.as_str());

log::info!("parsing updates file");
// iterating through the parser. the iterator returns `BgpElem` one at a time.
24 changes: 3 additions & 21 deletions examples/parse-single-file.rs
@@ -1,36 +1,18 @@
use std::io::BufReader;
use bzip2::read::BzDecoder;
use bgpkit_parser::BgpkitParser;

/// This example shows how to download and process a single BGP archive file with BGPKIT Parser.
///
/// The dependencies needed for this example are:
/// ```
/// bzip2="0.4"
/// reqwest = { version = "0.11", features = ["json", "blocking", "stream"] }
/// ```
fn main() {
env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();

log::info!("downloading updates file");
// read updates data into bytes
let data_bytes = reqwest::blocking::get("http://archive.routeviews.org/bgpdata/2021.10/UPDATES/updates.20211001.0000.bz2")
.unwrap().bytes().unwrap().to_vec();
// create a buffered reader that wraps around a bzip2 decoder
let reader = BufReader::new(BzDecoder::new(&*data_bytes));

// create a parser that takes the buffered reader
let parser = BgpkitParser::new(reader);
let parser = BgpkitParser::new("http://archive.routeviews.org/bgpdata/2021.10/UPDATES/updates.20211001.0000.bz2");

log::info!("parsing updates file");
// iterating through the parser. the iterator returns `BgpElem` one at a time.
for elem in parser {
// each BGP announcement contains one AS path, which depending on the path segment's type
// there could be multiple origin ASNs (e.g. AS-Set as the origin)
if let Some(origins) = &elem.origin_asns {
if origins.contains(&13335) {
log::info!("{}", &elem);
}
}
log::info!("{}", &elem);
}
log::info!("done");
}