Releases: openziti/ziti
v0.17.6
Release 0.17.6
What's New
- Ziti executables that use JSON logging now emit timestamps that include fractional seconds.
  Timestamps remain in the RFC3339 format.
- Authentication mechanisms now allow appId and appVersion in sdkInfo
- Improved query performance by caching antlr lexers and parsers. Testing showed a 2x-10x
  performance improvement
- Improved service list time by using indexes to get related posture data
- Improved service polling
- Improved service policy enforcement: instead of polling, this is now event based, which should
  result in lower CPU utilization on the controller
- Fixed a bug in service policy PATCH which would trigger when the policy type wasn't sent
- Support agent utilities (ziti ps) in ziti-tunnel
- Clean up ack handler goroutines when links shut down
- The check-data-integrity operation may now only run a single instance at a time
  - To start the check: ziti edge db start-check-integrity
  - To check the status of a run: ziti edge db check-integrity-status
- The build date in version info spelling has been fixed from builDate to buildDate
- A new metric has been added for timing service list requests: services.list
- A bug was fixed in the tunneler which may have led to leaked connections
- Default hosting precedence and cost can now be configured for identities
- Health checks can now be configured for the go based tunneler (ziti-tunnel) using server configs
- ziti#177 ziti-tunnel has a new host mode, if you are only hosting services
- edge session events now contain a timestamp
- Improve log output for invalid API Session Tokens used to connect to Edge Routers
- Logs default to no color output
- API Session Certificate Support Added
Improved Service Polling
There's a new REST endpoint /current-api-session/service-updates, which will return the last time
services were changed. If there have been no service updates since the api session was established,
the api session create date/time will be returned. This endpoint can be polled to see if services
need to be refreshed. This will save network and cpu utilization on the client and controller.
Setting precedence and cost for tunneler hosted services
When the tunneler hosts services there was previously no way to specify the precedence and cost
associated with those services.
See Ziti XT documentation
for an overview of how precedence and cost relate to HA and load balancing.
There are now two new fields on identity:
- defaultHostingPrecedence - valid values are default, required and failed. Defaults to default.
- defaultHostingCost - valid values are between 0 and 65535. Defaults to 0.
When hosting a service via the tunneler, the terminator for the SDK hosted service will be created
with the precedence and cost of the identity used by the tunneler.
NOTE: This means all services hosted by an identity will have the same precedence and cost.
We'll likely add support for service specific overrides in the future if/when use cases arise which
call for it. In the meantime, a work-around is to use multiple identities if you need different
values for different services.
CLI Support
The ziti CLI supports setting the default hosting precedence and cost when creating identities
SDK API Change
The Go SDK has a new API method GetCurrentIdentity() (*edge.CurrentIdentity, error) which lets SDK
users retrieve the currently logged in identity, including the default hosting precedence and cost.
This could be used by other SDK applications which may want to use the fields for the same reason
when hosting services.
Tunneler Health Checks
The go tunneler now supports health checks. Support for health checks may be added to other
tunnelers (such as ziti-edge-tunnel) in the future, but that is not guaranteed.
Health checks can be configured in the service configuration using the ziti-tunneler-server.v1
config type. Support in the host.v1 config type will be added when support for that config type is
added to the go tunneler.
Check Types
The tunneler supports two types of health check.
Port Checks
Port checks look to see if a host/port can be dialed. This is a simple check which just ensures that
something is listening on a given host/port.
Port checks have the following properties:
- interval - how often the check is performed
- timeout - how long to wait before declaring the check failed
- address - the address to dial. Should be of the form host:port. Example: localhost:5432
- actions - an array of actions to perform based on health check results. Actions will be discussed
in more detail below
HTTP Checks
HTTP checks probe a specific URL. They support the following properties:
- interval - how often the check is performed
- timeout - how long to wait before declaring the check failed
- url - the url to connect to
- method - the HTTP method to use. May be one of GET, POST, PUT or PATCH. Defaults to GET
- body - the body of the HTTP request. Defaults to an empty string
- expectStatus - the HTTP status to expect in the response. Defaults to 200
- expectInBody - an optional string to look for in the response body.
- actions - an array of actions to perform based on health check results. Actions will be discussed
in more detail below
Health Check Actions
Each health check may specify actions to execute when a health check runs.
Each action may specify:
- trigger - valid values are pass or fail. Specifies if the action should run when the check is
  passing or failing
- consecutiveEvents - specifies if the action should only run after N consecutive passes or fails
- duration - specifies if the action should only run after the check has been passing or failing for
  some period of time
- action - specifies what to do when the action is run. Valid values are:
  - mark healthy - the terminator precedence will be set to the default hosting precedence of
    the hosting identity
  - mark unhealthy - the terminator precedence will be set to failed
  - increase cost N - the terminator cost will be increased by N. This will only happen while
    the terminator precedence is not failed. Once the terminator has failed we don't keep
    increasing cost, otherwise it will likely reach max cost and take a long time to recover after
    it goes back to healthy.
  - decrease cost N - the terminator cost will be decreased by N, down to a minimum. The terminator
    cost will not go below the hosting identity's default hosting cost
Examples
The following config defines a TCP service which can be reached at port 8171 on localhost. It has a
port check defined which runs every 5 seconds, with a timeout of 500 milliseconds. The following
actions are defined on the health check:
- The terminator will be marked failed after the health check has failed 10 times in a row.
- The terminator cost will be increased by 100 each time the health check fails while the
  terminator is not in failed state
- The terminator will be returned to a non-failed state if the health check is healthy for 10
  seconds
- Every time the health check passes the cost will be reduced by 25, until it hits the baseline
  cost defined by the hosting identity
{
"protocol" : "tcp",
"hostname" : "localhost",
"port" : 8171,
"portChecks" : [
{
"interval" : "5s",
"timeout" : "500ms",
"address" : "localhost:8171",
"actions": [
{
"action": "mark unhealthy",
"consecutiveEvents": 10,
"trigger": "fail"
},
{
"action": "increase cost 100",
"trigger": "fail"
},
{
"action": "mark healthy",
"duration": "10s",
"trigger": "pass"
},
{
"action": "decrease cost 25",
"trigger": "pass"
}
]
}
]
}
ziti-tunnel host command
The ziti-tunnel can now be run in a mode where it will only host services and will not intercept any
services.
Ex: ziti-tunnel host -i /path/to/identity.json
Schema Reference
For reference, here is the full, updated ziti-tunneler-server.v1
schema:
{
"$id": "http://edge.openziti.org/schemas/ziti-tunneler-server.v1.config.json",
"additionalProperties": false,
"definitions": {
"action": {
"additionalProperties": false,
"properties": {
"action": {
"pattern": "(mark (un)?healthy|increase cost [0-9]+|decrease cost [0-9]+)",
"type": "string"
},
"consecutiveEvents": {
"maximum": 65535,
"minimum": 0,
"type": "integer"
},
"duration": {
"$ref": "#/definitions/duration"
},
"trigger": {
"enum": [
"fail",
"pass"
],
"type": "string"
}
},
"required": [
"trigger",
"action"
],
"type": "object"
},
"actionList": {
"items": {
"$ref": "#/definitions/action"
},
"maxItems": 20,
"minItems": 1,
"type": "array"
},
"duration": {
"pattern": "[0-9]+(h|m|s|ms)",
"type": "str...
v0.18.2
Release 0.18.2
What's New
- Default hosting precedence and cost can now be configured for identities
- Health checks can now be configured for the go based tunneler (ziti-tunnel) using server configs
- ziti#177 ziti-tunnel has a new host mode, if you are only hosting services
- Changes to terminators (add/update/delete/router online/router offline) will now generate events
  that can be emitted
- fabric and edge session events now contain a timestamp
Setting precedence and cost for tunneler hosted services
When the tunneler hosts services there was previously no way to specify the precedence and cost
associated with those services.
See Ziti XT documentation
for an overview of how precedence and cost relate to HA and load balancing.
There are now two new fields on identity:
- defaultHostingPrecedence - valid values are default, required and failed. Defaults to default.
- defaultHostingCost - valid values are between 0 and 65535. Defaults to 0.
When hosting a service via the tunneler, the terminator for the SDK hosted service will be created
with the precedence and cost of the identity used by the tunneler.
NOTE: This means all services hosted by an identity will have the same precedence and cost.
We'll likely add support for service specific overrides in the future if/when use cases arise which
call for it. In the meantime, a work-around is to use multiple identities if you need different
values for different services.
CLI Support
The ziti CLI supports setting the default hosting precedence and cost when creating identities
SDK API Change
The Go SDK has a new API method GetCurrentIdentity() (*edge.CurrentIdentity, error) which lets SDK
users retrieve the currently logged in identity, including the default hosting precedence and cost.
This could be used by other SDK applications which may want to use the fields for the same reason
when hosting services.
Tunneler Health Checks
The go tunneler now supports health checks. Support for health checks may be added to other
tunnelers (such as ziti-edge-tunnel) in the future, but that is not guaranteed.
Health checks can be configured in the service configuration using the ziti-tunneler-server.v1
config type. Support in the host.v1 config type will be added when support for that config type is
added to the go tunneler.
Check Types
The tunneler supports two types of health check.
Port Checks
Port checks look to see if a host/port can be dialed. This is a simple check which just ensures that
something is listening on a given host/port.
Port checks have the following properties:
- interval - how often the check is performed
- timeout - how long to wait before declaring the check failed
- address - the address to dial. Should be of the form host:port. Example: localhost:5432
- actions - an array of actions to perform based on health check results. Actions will be discussed
in more detail below
HTTP Checks
HTTP checks probe a specific URL. They support the following properties:
- interval - how often the check is performed
- timeout - how long to wait before declaring the check failed
- url - the url to connect to
- method - the HTTP method to use. May be one of GET, POST, PUT or PATCH. Defaults to GET
- body - the body of the HTTP request. Defaults to an empty string
- expectStatus - the HTTP status to expect in the response. Defaults to 200
- expectInBody - an optional string to look for in the response body.
- actions - an array of actions to perform based on health check results. Actions will be discussed
in more detail below
Health Check Actions
Each health check may specify actions to execute when a health check runs.
Each action may specify:
- trigger - valid values are pass or fail. Specifies if the action should run when the check is
  passing or failing
- consecutiveEvents - specifies if the action should only run after N consecutive passes or fails
- duration - specifies if the action should only run after the check has been passing or failing for
  some period of time
- action - specifies what to do when the action is run. Valid values are:
  - mark healthy - the terminator precedence will be set to the default hosting precedence of
    the hosting identity
  - mark unhealthy - the terminator precedence will be set to failed
  - increase cost N - the terminator cost will be increased by N. This will only happen while
    the terminator precedence is not failed. Once the terminator has failed we don't keep
    increasing cost, otherwise it will likely reach max cost and take a long time to recover after
    it goes back to healthy.
  - decrease cost N - the terminator cost will be decreased by N, down to a minimum. The terminator
    cost will not go below the hosting identity's default hosting cost
Examples
The following config defines a TCP service which can be reached at port 8171 on localhost. It has a
port check defined which runs every 5 seconds, with a timeout of 500 milliseconds. The following
actions are defined on the health check:
- The terminator will be marked failed after the health check has failed 10 times in a row.
- The terminator cost will be increased by 100 each time the health check fails while the
  terminator is not in failed state
- The terminator will be returned to a non-failed state if the health check is healthy for 10
  seconds
- Every time the health check passes the cost will be reduced by 25, until it hits the baseline
  cost defined by the hosting identity
{
"protocol" : "tcp",
"hostname" : "localhost",
"port" : 8171,
"portChecks" : [
{
"interval" : "5s",
"timeout" : "500ms",
"address" : "localhost:8171",
"actions": [
{
"action": "mark unhealthy",
"consecutiveEvents": 10,
"trigger": "fail"
},
{
"action": "increase cost 100",
"trigger": "fail"
},
{
"action": "mark healthy",
"duration": "10s",
"trigger": "pass"
},
{
"action": "decrease cost 25",
"trigger": "pass"
}
]
}
]
}
ziti-tunnel host command
The ziti-tunnel can now be run in a mode where it will only host services and will not intercept any
services.
Ex: ziti-tunnel host -i /path/to/identity.json
Schema Reference
For reference, here is the full, updated ziti-tunneler-server.v1
schema:
{
"$id": "http://edge.openziti.org/schemas/ziti-tunneler-server.v1.config.json",
"additionalProperties": false,
"definitions": {
"action": {
"additionalProperties": false,
"properties": {
"action": {
"pattern": "(mark (un)?healthy|increase cost [0-9]+|decrease cost [0-9]+)",
"type": "string"
},
"consecutiveEvents": {
"maximum": 65535,
"minimum": 0,
"type": "integer"
},
"duration": {
"$ref": "#/definitions/duration"
},
"trigger": {
"enum": [
"fail",
"pass"
],
"type": "string"
}
},
"required": [
"trigger",
"action"
],
"type": "object"
},
"actionList": {
"items": {
"$ref": "#/definitions/action"
},
"maxItems": 20,
"minItems": 1,
"type": "array"
},
"duration": {
"pattern": "[0-9]+(h|m|s|ms)",
"type": "string"
},
"httpCheck": {
"additionalProperties": false,
"properties": {
"actions": {
"$ref": "#/definitions/actionList"
},
"body": {
"type": "string"
},
"expectInBody": {
"type": "string"
},
"expectStatus": {
"maximum": 599,
"minimum": 100,
"type": "integer"
},
"interval": {
"$ref": "#/definitions/duration"
},
"method": {
"$ref": "#/definitions/method"
},
"timeout": {
"$ref": "#/definitions/duration"
},
"url": {
"type": "string"
}
},
"required": [
"interval",
"timeout",
"url"
],
"type": "object"
},
"httpCheckList": {
"items": {
"$ref": "#/definitions/httpCheck"
},
"type": "array"
},
"method": {
"enum": [
"GET",
"POST",
"PUT",
"PATCH"
],
"type": "string"
},
"portCheck": {
"additionalProperties": false,
"properties": {
"actions": {
"$ref": "#/definitions/actionList"
},
"address": {
"type": "string"
},
"interval": {
"$ref": "#/definitions/duration"
...
v0.18.1
Release 0.18.1
- Improve log output for invalid API Session Tokens used to connect to Edge Routers
- Logs default to no color output
- API Session Certificate Support Added
Logs default to no color output
Logs generated by Ziti components written in Go (Controller, Router, SDK) will
no longer output ANSI color control characters by default. Color logs can be
enabled by setting the environment variable PFXLOG_USE_COLOR to any
truthy value: 1, t, T, TRUE, true, True. (0, f, F, FALSE, false, False are the corresponding falsy
values.)
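The listed values are exactly the strings accepted by Go's strconv.ParseBool, so the following sketch assumes the variable is interpreted that way (the source does not say how the logging library parses it). The helper name colorEnabled is hypothetical; it shows which values would count as truthy.

```go
package main

import (
	"fmt"
	"strconv"
)

// colorEnabled reports whether a PFXLOG_USE_COLOR value counts as truthy
// under strconv.ParseBool. Empty or unparseable values are treated as false.
// Hypothetical helper: the actual parsing in the logging library may differ.
func colorEnabled(value string) bool {
	enabled, err := strconv.ParseBool(value)
	return err == nil && enabled
}

func main() {
	values := []string{"1", "t", "T", "TRUE", "true", "True",
		"0", "f", "F", "FALSE", "false", "False", ""}
	for _, v := range values {
		fmt.Printf("%q -> %v\n", v, colorEnabled(v))
	}
}
```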
API Session Certificate Support Added
All authentication mechanisms can now bootstrap key pairs via an authenticated session
using API Session Certificates. Bootstrapping a key pair involves authenticating, preparing an
X509 Certificate Signing Request (CSR), and then submitting the CSR for processing.
The output is an ephemeral certificate tied to that session that can be used to
connect to Edge Routers for session dial/binds.
New Endpoints:
- current-api-session/certificates
  - GET - lists current API Session Certificates
  - POST - create a new API Session Certificate (accepts a JSON payload with a csr field)
- current-api-session/certificates/
  - GET - retrieves a specific API Session Certificate
  - DELETE - removes a specific API Session Certificate
API Session Certificates have a 12hr life span. New certificates can be created
before previous ones expire and be used for reconnection.
v0.18.0
Release 0.18.0
What's New
- ziti#253 ziti-tunnel enroll should set non-zero exit status if an error occurs
- Rewrite of Xgress with the following goals
  - Fix deadlocks at high throughput
  - Fix stalls when some endpoints are slower than others
  - Improve windowing/retransmission by pulling forward some concepts from Michael Quigley's
    transwarp work
- Split xgress links into two separate connections, one for data and one for acks
- Allow hosting applications to mark incoming connections as failed. Update go tunneler so when a
  dial fails for hosted services, the failure gets propagated back to controller
- Streamline edge hosting protocol by allowing router to assign connection ids
- Edge REST query failures should now result in 4xx errors instead of 500 internal server errors
- Fixed bug where listing terminators via ziti edge would fail when terminators referenced pure
  fabric services
Xgress Rewrite
Overview
This rewrite fixed several deadlocks observed at high throughput. It also tries to ensure that slow
clients attached to a router can't block traffic/processing for faster clients. It does this by
dropping data for a client if the client isn't handling incoming traffic quickly enough. Dropped
payloads will be retransmitted. The new xgress implementation uses similar windowing and
retransmission strategies to the upcoming transwarp work.
Backwards Compatibility
0.18+ routers will probably work with older router versions, but probably not well. 0.18+ xgress
instances expect to get round trip times and receive buffer sizes on ack messages. If they don't get
them then retransmission will likely be either too aggressive or not aggressive enough.
Mixing 0.18+ routers with older router versions is not recommended without doing more testing first.
Xgress Options Changes
Added
- txQueueSize - Number of payloads that can be queued for processing per client. Default value: 1
- txPortalStartSize - Initial size of send window. Default value: 16KB
- txPortalMinSize - Smallest allowed send window size. Default value: 16KB
- txPortalMaxSize - Largest allowed send window size. Default value: 4MB
- txPortalIncreaseThresh - Number of successful acks after which to increase send portal size.
  Default value: 224
- txPortalIncreaseScale - Send portal will be increased by amount of data sent since last
  retransmission. This controls how much to scale that amount by. Default value: 1.0
- txPortalRetxThresh - Number of retransmits after which to scale the send window. Default value: 64
- txPortalRetxScale - Amount by which to scale the send window after the retransmission threshold is
  hit. Default value: 0.75
- txPortalDupAckThresh - Number of duplicate acks after which to scale the send window. Default
  value: 64
- txPortalDupAckScale - Amount by which to scale the send window after the duplicate ack threshold
  is hit. Default value: 0.9
- rxBufferSize - Receive buffer size. Default value: 4MB
- retxStartMs - Time after which, if no ack has been received, a payload should be queued for
  retransmission. Default value: 200ms
- retxScale - Amount by which to scale the retransmission timeout, which is calculated from the
  round trip time. Default value: 2.0
- retxAddMs - Amount to add to the retransmission timeout after it has been scaled. Default value: 0
- maxCloseWaitMs - Maximum amount of time to wait for queued payloads to be
  acknowledged/retransmitted after an xgress session has been closed. If queued payloads are all
  acknowledged before this timeout is hit, the xgress session will be closed sooner. Default value:
  30s
REMOVED: The retransmission option is no longer available. Retransmission can't be toggled off
anymore as that would lead to lossy connections.
Xgress Metrics Changes
New metrics were introduced as part of the rewrite.
NOTE: Some of these metrics were introduced to try and find places where tuning was required.
They may not be interesting or useful in the long term and may be removed in a future release.
The new metrics include:
New Meters
- xgress.dropped_payloads
  - The count and rates of payloads being dropped
- xgress.retransmissions
  - The count and rates of payloads being retransmitted
- xgress.retransmission_failures
  - The count and rates of payloads being retransmitted where the send fails
- xgress.rx.acks
  - The count and rates of acks being received
- xgress.tx.acks
  - The count and rates of acks being sent
- xgress.ack_failures
  - The count and rates of acks being sent where the send fails
- xgress.ack_duplicates
  - The count and rates of duplicate acks received
New Histograms
- xgress.rtt
  - Round trip time statistics aggregated across all xgress instances
- xgress.tx_window_size
  - Local window size statistics aggregated across all xgress instances
- xgress.tx_buffer_size
  - Local send buffer size statistics aggregated across all xgress instances
- xgress.local.rx_buffer_bytes_size
  - Receive buffer size statistics in bytes aggregated across all xgress instances
- xgress.local.rx_buffer_msgs_size
  - Receive buffer size statistics in number of messages aggregated across all xgress instances
- xgress.remote.rx_buffer_size
  - Receive buffer size from remote systems statistics aggregated across all xgress instances
- xgress.tx_buffer_size
  - Receive buffer size from remote systems statistics aggregated across all xgress instances
New Timers
- xgress.tx_write_time
  - Times how long it takes to write xgress payloads from xgress to the endpoint
- xgress.tx_write_time
  - Times how long it takes to write acks to the link
- xgress.payload_buffer_time
  - Times how long it takes to process xgress payloads coming off the link (mostly getting them
    into the receive buffer)
- xgress.payload_relay_time
  - Times how long it takes to get xgress payloads out of the receive buffer and queued to be sent
New Gauges
- xgress.blocked_by_local_window
  - Count of how many xgress instances are blocked because the local transmit buffer size equals or
    exceeds the window size
- xgress.blocked_by_local_window
  - Count of how many xgress instances are blocked because the remote receive buffer size equals
    or exceeds the window size
- xgress.tx_unacked_payloads
  - Count of payloads in the transmit buffer
- xgress.tx_unacked_payload_bytes
  - Size in bytes of the transmit buffer
Split Links
The fabric will now create two channels for each link, one for data and the other for acks. When
establishing links the dialing side will attach headers indicating the channel type and a shared
link ID. If the receiving side doesn't support split links then it will treat both channels as
regular links and send both data and acks over both.
If an older router dials a router expecting split links it won't have the link type and will be
treated as a regular, non-split link.
Allow SDK Hosting Applications to propagate Dial Failures
The service terminator strategies use dial failures to adjust terminator weights and/or mark
terminators as failed. Previously SDK applications didn't have a way to mark a dial as failed. If
the SDK was hosting an application, this was generally not a problem. If the application could be
reached, it wouldn't want to mark an incoming connection as failed. However, the tunneler is just
proxying connections. It wants to be able to reach out to another application when the service is
dialed and proxy data. If the dial fails, it wants to be able to notify the controller that the
application wasn't reachable. The golang SDK now has this capability.
There is a new API on edge.Listener.
AcceptEdge() (Conn, error)
The Conn returned here is an edge.Conn (which extends net.Conn). edge.Conn has two new APIs.
CompleteAcceptSuccess() error
CompleteAcceptFailed(err error)
If ListenWithOptions is called with ManualStart: true in the provided options, the
connection won't be established until CompleteAcceptSuccess is called. Writing to or reading from
the connection before calling that method will have undefined results.
The ziti-tunnel has been updated to use this API, and so should now work correctly with the various
terminator strategies.
Edge Hosting Dial Protocol Enhancement
When establishing a new virtual connection to a hosted SDK application the router had to execute the
following steps:
- Send a Dial message to the sdk application
- Receive the dial response, which included the sdk generated connection id.
- Create the router side virtual connection with the new id and register it
- Create the xgress instance tied to the new connection
- Now that the xgress is created, send a message to the sdk application letting it know that it can
  start sending traffic
If the connection id could be established on the router, we could simplify things as follows
- Create the router side virtual connection with the new id and register it
- Create the xgress instance tied to the new connection
- Send the dial message to the sdk with the connection id
- Receive the response and return the result to the controller
We didn't do this previously because the sdk controls ids for outbound connection. To enable this we
have split the 32 bit id range in half. The top half is now reserved for hosted connection ids. This
behavior is controlled by the SDK, which requests it when it binds using a boolean flag. The new
flag is:
RouterProvidedConnId = 1012
If the bind result from the router has the same flag set to true, then the sdk will expect Dial
messages from the router to have a connection id provided in the header keyed with the same 1012.
This means that this feature should be both backwards and forward compatible.
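The id split can be illustrated with a small Go sketch. The constant and function names are hypothetical; the only facts taken from the text are that ids are 32 bits wide and that the top half of the range is reserved for hosted (router-assigned) connection ids.

```go
package main

import "fmt"

// routerConnIDBase is the first id in the top half of the 32-bit range.
const routerConnIDBase uint32 = 1 << 31

// isRouterAssigned reports whether a connection id falls in the half of the
// id range reserved for hosted connections, whose ids the router assigns.
func isRouterAssigned(id uint32) bool {
	return id >= routerConnIDBase
}

func main() {
	for _, id := range []uint32{1, routerConnIDBase - 1, routerConnIDBase, ^uint32(0)} {
		fmt.Printf("%d -> router assigned: %v\n", id, isRouterAssigned(id))
	}
}
```

Because the two halves never overlap, ids chosen by the sdk for outbound connections cannot collide with ids chosen by the router for hosted connections.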
v0.17.5
Release 0.17.5
What's New
- Builds have been moved from travis.org to GitHub Actions
- IDs generated for entities in the Edge no longer use underscores and instead use periods to avoid issues when used as a common name in CSRs
- edge#424 Authenticated, non-admin, clients can query service terminators
- sdk-golang#112 Process checks for Windows are case-insensitive
- The CLI agent now runs over unix sockets and is enabled by default. See doc/ops-agent.md for details in the ziti repository.
- ziti#245 Make timeout used by CLI's internal REST client configurable via cmd line arg
  All ziti edge controller subcommands now support the --timeout=n flag, which controls the internal REST-client timeout used when communicating with the Controller. The timeout resolution is in seconds. If this flag is not specified, the default is 5. Prior to this release, the REST-client timeout was always 2. You now have the opportunity to increase the timeout if necessary (e.g. if large amounts of data are being queried).
  All ziti edge controller subcommands now support the --verbose flag, which will cause the internal REST-client to emit debugging information concerning HTTP headers, status, raw json response data, and more. You now have the opportunity to see much more information, which could be valuable during troubleshooting.