sys-file-indexer indexes the directory specified as the last argument, or the current directory by default. It always writes the result to stdout.
sys-file-indexer has the following modes of operation:
- Normal mode: outputs a special CSV file that combines both datasets (sys_file and sys_file_metadata) and does not contain unique IDs. This output must be processed further by split mode to be useful. No options are necessary. Normal mode can reuse the results of a previous run if that data is supplied with the -delta option. In this case, sys-file-indexer reuses the previously generated data for every file whose modification time has not changed.
- Split mode: takes the output of normal mode as input and generates the CSV for either the sys_file dataset or the sys_file_metadata dataset. See options -ofile and -ometa.
- SQL mode: outputs ready-to-use SQL INSERT statements that can be piped directly to the database.
- Single mode: outputs one single CSV dataset. Useful for testing only.
Generate the normal mode CSV output:
$ sys-file-indexer >../normal.csv
Update a previously generated normal mode CSV:
$ sys-file-indexer -delta=../normal.csv >../new-normal.csv
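The choice between a first run and a delta run can be wrapped in a small shell function. This is a minimal sketch, not part of the tool: `update_index` and the `INDEXER` variable are hypothetical names introduced here, and the sketch assumes only the `-delta=FILE` option and the trailing directory argument described above.

```shell
# Hypothetical hook (not part of sys-file-indexer): INDEXER lets a wrapper
# or a test substitute another command for the real binary.
INDEXER="${INDEXER:-sys-file-indexer}"

# update_index PREVIOUS NEW [DIR]
# Runs normal mode on DIR (default: current directory). If PREVIOUS exists
# and is non-empty, it is passed via -delta so unchanged files are reused.
update_index() {
    prev=$1
    new=$2
    dir=${3:-.}
    # Write to a temporary file first so a failed run does not leave a
    # truncated NEW file behind.
    if [ -s "$prev" ]; then
        "$INDEXER" -delta="$prev" "$dir" > "$new.tmp"
    else
        "$INDEXER" "$dir" > "$new.tmp"
    fi && mv "$new.tmp" "$new"
}
```

With this in place, `update_index ../normal.csv ../new-normal.csv` would behave like the plain normal mode run when `../normal.csv` does not exist yet, and like the `-delta` run otherwise.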
Split normal mode CSV to generate two datasets:
$ sys-file-indexer -ofile=normal.csv >sys_file.csv
$ sys-file-indexer -ometa=normal.csv >sys_file_metadata.csv
Generate metadata directly into the database (cannot use -delta):
$ sys-file-indexer -sql | mysql ...
sys-file-indexer can be run on multiple machines if that increases I/O throughput. For example, with three hosts:
host1$ sys-file-indexer -w 1 -wg 3 ... > result1.csv
host2$ sys-file-indexer -w 2 -wg 3 ... > result2.csv
host3$ sys-file-indexer -w 3 -wg 3 ... > result3.csv
host1$ cat result1.csv result2.csv result3.csv > result.csv
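Before concatenating, it may be worth checking that every host actually delivered its part. The following is a minimal sketch; `merge_results` is a hypothetical helper, and treating an empty part as an error is an assumption made here, not documented behavior of the tool.

```shell
# merge_results OUTPUT PART...
# Concatenates the per-host result files into OUTPUT, but only after
# verifying that every part exists and is non-empty.
# Assumption: a missing or empty part means a worker failed.
merge_results() {
    out=$1
    shift
    for part in "$@"; do
        if [ ! -s "$part" ]; then
            echo "missing or empty part: $part" >&2
            return 1
        fi
    done
    # Same plain concatenation as in the cat command above.
    cat "$@" > "$out"
}
```

For the three-host run above this would be called as `merge_results result.csv result1.csv result2.csv result3.csv`.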
- Can scan multiple directories