"""
Example add-on showing how to decompress traffic for 'graph.facebook.com/graphql', which arrives ZSTD-compressed
with a CUSTOM DICTIONARY TRAINED BY FACEBOOK.

The purpose of this add-on is not only to demonstrate how ZSTD decompression with custom dictionaries can be achieved,
but also to explain why this is a bad idea.

Reasons not to decompress ZSTD with custom dicts:

The custom dicts are context-specific (target service, target endpoint, dict version, etc.) and would have
to be available for every possible use case. This is almost impossible from the perspective of a central
interception proxy, which has only limited awareness of the setup of the requesting clients. In fact, the
proxy cannot know the proper decompression dictionary unless it is transmitted along with the response; doing so
would defeat the purpose of dictionary-based ZSTD compression and is unlikely to happen in the wild.

This demo should also show that there is not enough "wire information" to safely determine the proper dictionary
to use. This add-on applies decompression (with a hardcoded dictionary) only if all of the following criteria are met:
- the request endpoint is 'https://graph.facebook.com/graphql'
- the response includes a header indicating ZSTD compression ('content-encoding: x-fb-dz')
- the response includes a header indicating that ZSTD dictionary number '1' was used for this endpoint ('x-fb-dz-dict: 1')

Even with all these criteria applied, it cannot be reliably determined which dictionary to use, because
the dictionary is never transmitted "over the wire". Instead, it is hardcoded into the client (from where I dumped it)
and thus depends on the client version (Facebook updates those dictionaries, as they are trained on API traffic).
This means that, without knowing the exact client version of the Facebook app in use, no safe conclusion can be drawn
about the correct dictionary to use.
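
A partial sanity check could be sketched as follows. It assumes the response body is a standard zstd frame
(RFC 8878), whose header may declare a Dictionary_ID. Whether Facebook's frames and dictionaries actually carry
a non-zero ID is not verified here; an ID of 0 means the frame declares none and nothing can be concluded.
The function name is hypothetical.

```python
ZSTD_MAGIC = bytes.fromhex("28b52ffd")  # magic number 0xFD2FB528, little-endian


def zstd_frame_dict_id(data: bytes):
    # Returns the Dictionary_ID declared in the frame header, 0 if the frame
    # declares none, or None if 'data' does not start with a zstd frame header.
    if len(data) < 5 or data[:4] != ZSTD_MAGIC:
        return None
    descriptor = data[4]
    single_segment = (descriptor >> 5) & 1
    # Dictionary_ID_flag (lowest two bits) selects the field size in bytes.
    id_size = (0, 1, 2, 4)[descriptor & 0b11]
    # The one-byte Window_Descriptor is only present without Single_Segment_flag.
    offset = 5 + (0 if single_segment else 1)
    field = data[offset:offset + id_size]
    if len(field) != id_size:
        return None  # truncated header
    return int.from_bytes(field, "little")
```

The result could be compared against 'ZstdCompressionDict.dict_id()' of the loaded dictionary before attempting
decompression. A match would not prove the dictionary content is identical, only that the IDs agree.
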

So this example is more like "doing it the hard way", which you shouldn't do. The "easy way" is to alter the
'accept-encoding' header so that responses are not compressed with 'zstd' at all.
I left a comment on how to do this (for Facebook traffic) in the following mitmproxy GitHub issue:
https://github.com/mitmproxy/mitmproxy/issues/4394#issuecomment-957459382

In short, it is easier to replace request headers like
    'accept-encoding: x-fb-dz;d=1, zstd, gzip, deflate'
which prefer zstd compression (with unknown dictionary #1), with
    'accept-encoding: gzip, deflate'
which prefers 'gzip' compression (automatically handled by mitmproxy).
Easy as that.

To run this add-on, use (bypassing cert pinning is up to you):
    # mitmproxy -s /path/to/fb-zstd.py
"""
from mitmproxy import flowfilter
from mitmproxy import ctx, http
from base64 import b64decode
# while 'mitmproxy.net.encoding' has ZSTD support, it does not support custom (trained) dictionaries and cannot be used here
import zstandard
# ZSTD dictionary #1, dumped from libcoldstart.so of com.facebook.katana v342.0.0.37.119 (arm32)
FB_ZSTD_DICT1 = "" # noqa: E501


class Filter:
    def __init__(self):
        # Only apply to traffic that fulfills all of the following conditions, because the ZSTD compression
        # dict is only valid for it:
        # - request URL 'graph.facebook.com/graphql'
        # - response header 'content-encoding: x-fb-dz' exists (indicates ZSTD compression)
        # - response header 'x-fb-dz-dict: 1' exists (indicates that ZSTD dict #1 was used to create the response)
        #
        # Warning: No filter criterion restricts traffic to a specific client version, although the dict in use
        # was extracted from the Facebook Android app v342.0.0.37.119. Training dictionaries for ZSTD compression
        # is an ongoing process, so Facebook will very likely ship newer versions of the dictionary with newer
        # clients.
        self.filter: flowfilter.TFilter = flowfilter.parse('~u graph.facebook.com/graphql & ~hs "x-fb-dz-dict:\\\\s*1" & ~hs "content-encoding:\\\\s*x-fb-dz"')
        d_dict = zstandard.ZstdCompressionDict(data=b64decode(FB_ZSTD_DICT1))
        self.decompressor = zstandard.ZstdDecompressor(dict_data=d_dict)

    def load(self, loader):
        pass

    def response(self, flow: http.HTTPFlow) -> None:
        if not flowfilter.match(self.filter, flow):
            return
        ctx.log.info("Flow matches filter:")
        if flow.response is None or flow.response.raw_content is None:
            return
        compressed = flow.response.raw_content
        try:
            # decompress the body
            decompressed = self.decompressor.decompress(compressed)
        except zstandard.ZstdError as e:
            # if it fails, it fails (e.g. wrong dictionary version); leave the flow untouched
            ctx.log.info(f"ZSTD decompression failed: {e}")
            return
        # remove the 'content-encoding' and 'x-fb-dz-dict' headers before replacing the content, so that
        # assigning 'content' does not try to re-encode the body with the unknown 'x-fb-dz' encoding
        flow.response.headers.pop("content-encoding", None)
        flow.response.headers.pop("x-fb-dz-dict", None)
        # replace content
        flow.response.content = decompressed
        ctx.log.info(decompressed)


addons = [Filter()]