---
layout: doc
title: "MapR Integration"
permalink: /docs/mapr-integration.html
---
Available since Apache Eagle 0.4.0-incubating. Apache Eagle is referred to as Eagle below.
To get maprFSAuditLog monitoring started, we need to:

- Enable audit logs on MapR from MapR's terminal
- Create a Logstash conf file to send audit logs to Kafka[^1]
- Initialize metadata for maprFSAuditLog and enable the application
Here are the steps to follow:
First we need to enable data auditing at all three levels: cluster level, volume level, and directory, file, or table level.

```shell
$ maprcli audit data -cluster <cluster name> -enabled true
    [ -maxsize <size in GB; default 32. When the audit logs exceed this size, an alarm is sent to the dashboard in the MapR Control System> ]
    [ -retention <number of days> ]
```
Example:

```shell
$ maprcli audit data -cluster mapr.cluster.com -enabled true -maxsize 30 -retention 30
```
```shell
$ maprcli volume audit -cluster <cluster name> -enabled true
    -name <volume name>
    [ -coalesce <interval in minutes during which READ, WRITE, or GETATTR operations on one file from one client IP address are logged only once, if auditing is enabled> ]
```

Example:

```shell
$ maprcli volume audit -cluster mapr.cluster.com -name mapr.tmp -enabled true
```
To verify that auditing is enabled for a particular volume, use this command:

```shell
$ maprcli volume info -name <volume name> -json | grep -i 'audited\|coalesce'
```

and you should see something like this:

```
"audited":1,
"coalesceInterval":60
```

If `"audited"` is `1`, auditing is enabled for this volume.
Enable auditing at the directory, file, or table level with:

```shell
$ hadoop mfs -setaudit on <directory|file|table>
```

To check whether auditing is enabled for a directory, file, or MapR-DB table, use `$ hadoop mfs -ls`.
Example: before enabling the audit log on file `/tmp/dir`, try `$ hadoop mfs -ls /tmp/dir`; you should see something like this:

```
drwxr-xr-x Z U U - root root 0 2016-03-02 15:02 268435456 /tmp/dir
             p 2050.32.131328 mapr2.da.dg:5660 mapr1.da.dg:5660
```

The second `U` means auditing on this file is not enabled.
Enable auditing with this command:

```shell
$ hadoop mfs -setaudit on /tmp/dir
```

Then check the audit bit with:

```shell
$ hadoop mfs -ls /tmp/dir
```

You should see something like this:

```
drwxr-xr-x Z U A - root root 0 2016-03-02 15:02 268435456 /tmp/dir
             p 2050.32.131328 mapr2.da.dg:5660 mapr1.da.dg:5660
```

The previous `U` has changed to `A`, which indicates that auditing on this file is now enabled.
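If you script this check, the audit flag can be pulled out of the `hadoop mfs -ls` status line. This is only a sketch that assumes the column layout shown in the example output above, where the audit flag is the fourth whitespace-separated field (`U` = auditing off, `A` = auditing on):

```shell
#!/bin/sh
# Sketch: extract the audit flag from a `hadoop mfs -ls` status line.
# On a live cluster you would capture the line with something like
# `hadoop mfs -ls /tmp/dir | head -1` instead of hard-coding it.
line='drwxr-xr-x Z U A - root root 0 2016-03-02 15:02 268435456 /tmp/dir'

# Field 4 is the audit flag in the layout shown above
audit_flag=$(echo "$line" | awk '{print $4}')

if [ "$audit_flag" = "A" ]; then
  echo "auditing enabled"      # prints "auditing enabled" for the line above
else
  echo "auditing not enabled"
fi
```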
**Important:** when auditing is enabled on a directory, the files and subdirectories already in it do not inherit auditing, but any file or directory created in it after auditing was enabled will.
Because MapR has no NameNode and uses the CLDB service instead, we have to use Logstash to stream the audit log data into Kafka.
- First, find the nodes that run the CLDB service
- Then find the location of the audit log files, e.g. `/mapr/mapr.cluster.com/var/mapr/local/mapr1.da.dg/audit/`; file names are in the format `FSAudit.log-2016-05-04-001.json`
- Create a Logstash conf file and run it, following the doc Logstash-kafka
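As an illustration (not part of the original steps), a minimal Logstash config along these lines tails the FSAudit JSON files and forwards each line to a Kafka topic. The file path matches the example location above, while the broker address and topic name are placeholders you would adapt to your Kafka setup:

```conf
input {
  file {
    # Audit log location found in the previous step (adjust to your CLDB node)
    path => "/mapr/mapr.cluster.com/var/mapr/local/mapr1.da.dg/audit/FSAudit.log-*.json"
    start_position => "beginning"
  }
}
output {
  kafka {
    # Assumed broker address and topic name; change to match your environment
    bootstrap_servers => "localhost:9092"
    topic_id => "maprFSAuditLog"
    # Forward each raw JSON line unchanged
    codec => plain { format => "%{message}" }
  }
}
```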
After the Eagle service has started, create the maprFSAuditLog application using:

```shell
$ ./maprFSAuditLog-init.sh
```

By default it will create maprFSAuditLog in site "sandbox"; you may need to change this to your own site.
After these steps you are good to go.
Have fun!!! :)
[^1]: All mentions of "kafka" on this page represent Apache Kafka.