-
Notifications
You must be signed in to change notification settings - Fork 2
/
CHANGES.txt
83 lines (62 loc) · 3.65 KB
/
CHANGES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
Version 0.3.13
- fs-c parse allows now to specify multiple handler types in a single run.
- New command fs-c gpimport to import FS-C trace data into the Greenplum database.
- fs-c trace has new option --salt to add a salt to the chunk fingerprint calculation
- fs-c trace has new option --privacy-mode to allow a directory-and privacy preserving filename handling.
If --privacy-mode=flat-sha1 is used, the full filepath is fingerprinted using sha-1. If
--privacy-mode=directory-sha1 is used, each filename of the full filepath is fingerprinted individually.
This preserves the general directory structure, while still providing sufficient privacy in most situations.
- Input files are now provided as parameters instead of -f options.
The -f options are still supported, but are deprecated.
- Add standard parse handler to extract file metadata details and statistics from a trace file
- Multiple bug fixes
- Fix a Java 7 resource leak
- Fix handling of named pipes (no only directories and regular files are processed at all)
- Fix an output error if sample size is not large enough for Harnik's method
- Single threaded mode is now really single threaded
Version 0.3.12
- Implement Harnik's estimation method (-t harniks).
It differs from the original approach by using reservoir sampling because "n" is not known before the first pass.
Provides an good (error usually < 1%) estimation of the deduplication ratio without the need to do
a full (often Hadoop-based) analysis.
- Updated pig scripts.
The Hadoop-based analysis is now faster and intermediate results are stored to avoid duplicate
calculations
- fs-c/import is now multi-threaded for higher import throughput
- Update to Protobuf 2.4.1 from 2.3.0, gives performance improvement from up to 20% during the parsing.
Version 0.3.10
- Fix a file descriptor leak under Linux
- Option to import a full file fingerprint during fs-c/import runs
- Uses Java 7 directory walker (if available, Java 5/6 is still supported as target platform)
Version 0.3.8
- Fix a bug in fs-c import
Version 0.3.7
- Fix a bug that causes a trace file corruptions
- Add command fs-c validate, which checks if a given trace file is valid
Version 0.3.6
- Import xml.minidom in fs-c only if actually needed. Not all Python installations have xml support available
- Use a fs-c directory in the installation package as most software has it.
- Fix the github download
Version 0.3.5
- Overall performance improvements
- Improvement for very large directories (> 10.000 files)
- Updated libraries and build system (now using Scala 2.9 and sbt)
- Automatic FS-C cluster: Multiple machines cooperate crawling a (parallel) file system
Version 0.3.2
- Fix that cdc8 is not used as default chunker
- Add partial file trace data (to better handle large files)
- Add the --progress-file parameter to which all fully processed files are written
Version 0.2.3
- Add Visual Basic Script (VBS) for make the usage on windows easier
Version 0.2.0
- Digest type (default: SHA-1) can be exchanged by every digest method supported by the Java Security
framework
- Digest length (default: 20 Byte for SHA-1) can be reduced to decrease the size of trace files
- Switch from Actor-based Concurrency during Tracing to Executor-based Concurrency due to the better
conjection control
- Update to Google Protocol Buffer 2.1.0
- Import to Hadoop DFS
- Contrib: PIG Scripts to analyze large datasets using Hadoop MapReduce
- Different trace format support: Currently legacy (old binary format), protobuf
Version 0.1.0
- Initial Version