filemon

Monitor what is modifying your files



50821d3

Author: SM

Date: 2025-09-04

Subject: impl inotify, fanotify & readme

Diff

commit 50821d35bb725e942cf9ff127f6c977e4d27c8bc
Author: SM <seb.michalk@gmail.com>
Date:   Thu Sep 4 11:43:30 2025 +0200

    impl inotify, fanotify & readme

diff --git a/Makefile b/Makefile
index 11e02af..d0f3a1e 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 CC = gcc
 CFLAGS = -std=c99 -Wall -Wextra -O2 -D_POSIX_C_SOURCE=200809L
-TARGET = who
+TARGET = filemon
 SOURCES = main.c proc.c uid.c
 
 .PHONY: all clean install
diff --git a/README.md b/README.md
index c3fe281..59206f0 100644
--- a/README.md
+++ b/README.md
@@ -1,63 +1,31 @@
-# who
+# README
 
-who is a simple file monitor that tracks which processes modify files
+"filemon" tracks which processes modify files on Linux. It gives you the PID, user, command, and working directory without pulling in a web of dependencies or frameworks. Why should finding out which process wrote to a file require complex setups, containers, or megabytes of libraries? 
 
-## usage
+No configuration files, no plugins, no threads. Just ~650 lines of C with a single, focused purpose: tell you who touched your files; accurately when possible, gracefully when not. It uses built-in Linux interfaces like fanotify or falls back to simple heuristics when permissions or kernel limitations demand it. If any part of the system fails, it degrades, it doesn't crash. 
 
-	who [-o output] directory
+## How It Works
 
-monitor `directory` recursively and log file changes to `output` (default: who.log)
+The program uses `inotify` to monitor file operations, enhanced by two methods for associating events with processes:
 
-## build
+ When running as root, it uses `fanotify` to obtain exact PIDs for write operations. Events from inotify and fanotify are correlated by matching paths and timestamps. 
+ 
+ Without root privileges, a heuristic approach is used instead: the process table is scanned for processes that started before the event and are likely responsible. While this fallback is less precise, it works for most scenarios.
 
-	make
+Some events, such as metadata updates (`chmod`) and deletes (`unlink`), do not generate fanotify notifications, even with root. These always rely on the heuristic. The process table is maintained dynamically using netlink to track active processes, so that it is up to date during correlation.
 
-requires linux with netlink connector support
+## Design and Implementation
 
-## how it works
+The system is built to be small, efficient, and reliable. Events are processed sequentially, avoiding threads or dependencies. Fanotify events are cached in a lightweight ring buffer indexed by path. On an inotify event, this cache is checked first. If no match exists, filemon falls back to consulting the process table. 
 
-- uses inotify for file events
-- uses netlink connector for process events  
-- correlates file changes to processes via /proc filesystem
-- filters kernel threads and long-running daemons
-- prefers recently started user processes
+## Limitations
 
-## output format
+Accurate PID detection requires root. Without it, the heuristic may miss operations that occur quickly or involve multiple writers. Metadata operations and deletes always use the heuristic, regardless of privileges, due to kernel limitations. Network filesystems may not emit events depending on their mount options and protocols. The program is designed only for local filesystems. Modern distributions with restrictions on kernel interfaces such as `eBPF` may further constrain the heuristic's effectiveness.
 
-	timestamp action path pid=N uid=N gid=N comm=name cwd=dir
+## Compatibility
 
-## requirements
+The program requires Linux 2.6.37 or newer to use fanotify. For older kernels, only heuristic correlation is available. It has been tested on kernel versions from 3.10 to 6.x and works in containers when fanotify is permitted. It compiles on any POSIX system, but the monitoring is Linux-specific. There are no runtime dependencies, special kernel modules, or configuration files. A single binary can run anywhere. 
 
-- linux 2.6.14+
-- root privileges (for netlink connector)
-- gcc with c99 support
+## Testing
 
-## architecture
-
-uses kernel apis directly instead of heavyweight frameworks:
-
-- netlink connector catches all process lifecycle events (fork/exec/exit)
-- inotify provides efficient file change notifications
-- /proc filesystem gives process context (cwd, uid, comm)
-- epoll multiplexes events in single thread
-
-correlation heuristic: prefer recently started processes with directory access
-
-who tracks process lifecycle events via netlink. processes started before
-it's initialization are not tracked and may be incorrectly attributed to
-long-running parent processes (shells, multiplexers). start who before
-launching monitored applications for accurate correlation.
-
-## limitations
-
-- linux specific (netlink, /proc, inotify)
-- requires root for netlink connector
-- correlation is heuristic, not guaranteed accurate
-- no support for containers/namespaces
-- limited to MAX_PROCS (1024) tracked processes
-- long pathnames may be truncated
-- processes started before who cannot be correlated accurately
-
-## license
-
-MIT
+Testing is manual to preserve simplicity. Automated process correlation testing would introduce unnecessary complexity that goes against the program's minimalistic design.
\ No newline at end of file
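Note on the README's "ring buffer indexed by path": a minimal sketch of what such a cache could look like, assuming the 32 slots and the 5-second window that appear in main.c (`fevs[32]`, `ts - fevs[i].ts > 5`). The names `cache_put`, `cache_find`, `CACHE_SLOTS` and `CACHE_WINDOW` are hypothetical and are not part of this commit. In the hunks shown here, `readfan()` stops storing events once `nfevs` reaches 32, whereas this sketch overwrites the oldest slot.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

#define CACHE_SLOTS  32  /* assumption: mirrors fevs[32] in main.c */
#define CACHE_WINDOW 5   /* seconds; mirrors the 5 s check in corfan() */

struct cache_ev {
	char path[256];
	pid_t pid;
	time_t ts;
};

struct cache {
	struct cache_ev ev[CACHE_SLOTS];
	unsigned next;  /* total number of events stored so far */
};

/* Store an event, overwriting the oldest slot once the buffer is full. */
static void
cache_put(struct cache *c, const char *path, pid_t pid, time_t ts)
{
	struct cache_ev *e;

	assert(c && path);
	e = &c->ev[c->next % CACHE_SLOTS];
	strncpy(e->path, path, sizeof(e->path) - 1);
	e->path[sizeof(e->path) - 1] = '\0';
	e->pid = pid;
	e->ts = ts;
	c->next++;
}

/* Return the PID of the newest event for path within the window, or -1. */
static pid_t
cache_find(const struct cache *c, const char *path, time_t ts)
{
	const struct cache_ev *e;
	unsigned n, i;

	assert(c && path);
	n = c->next < CACHE_SLOTS ? c->next : CACHE_SLOTS;
	for (i = 1; i <= n; i++) {
		/* walk backwards from the most recently written slot */
		e = &c->ev[(c->next - i) % CACHE_SLOTS];
		if (ts - e->ts <= CACHE_WINDOW && !strcmp(e->path, path))
			return e->pid;
	}
	return -1;
}
```

Searching newest-first means that when two processes write the same path within the window, the most recent writer is reported. The sketch compares paths linearly, which seems adequate for 32 slots.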
diff --git a/main.c b/main.c
index 73c73da..5347686 100644
--- a/main.c
+++ b/main.c
@@ -1,9 +1,9 @@
 /* See LICENSE file for copyright and license details.
- * who - file change process tracker
- * Monitors directory recursively and tracks which processes modify files
+ * filemon - monitors directory recursively and tracks which processes modify files
  */
 
 #define _POSIX_C_SOURCE 200809L
+#define _GNU_SOURCE
 
 #include <errno.h>
 #include <fcntl.h>
@@ -17,6 +17,7 @@
 #include <sys/socket.h>
 #include <sys/stat.h>
 #include <sys/types.h>
+#include <sys/select.h>
 #include <time.h>
 #include <unistd.h>
 
@@ -27,6 +28,7 @@
 #include <linux/cn_proc.h>
 #include <linux/connector.h>
 #include <linux/netlink.h>
+#include <sys/fanotify.h>
 
 #include "proc.h"
 #include "uid.h"
@@ -57,6 +59,17 @@ static Process procs[MAX_PROCS];
 static int nprocs = 0;
 static char dir[MAX_PATH];
 
+static int usefan = -1, fanfd = -1;
+
+struct fev {
+	char path[256];
+	pid_t pid;
+	time_t ts;
+};
+
+static struct fev fevs[32];
+static int nfevs = 0;
+
 /* function declarations */
 static void die(const char *fmt, ...);
 static void usage(void);
@@ -73,6 +86,9 @@ static void logchange(const char *path, Process *proc, uint32_t mask);
 static void cleanup(void);
 static void sighandler(int sig);
 static void scanprocs(void);
+static int initfan(void);
+static Process *corfan(const char *path, time_t ts);
+static Process *corheur(const char *path, time_t ts);
 
 static void
 die(const char *fmt, ...)
@@ -96,7 +112,7 @@ die(const char *fmt, ...)
 static void
 usage(void)
 {
-	die("usage: who directory\n");
+	die("usage: filemon directory\n");
 }
 
 static void
@@ -157,6 +173,41 @@ initnetlink(void)
 	return sock;
 }
 
+static void
+readfan(void)
+{
+    char buf[4096], path[256], fdpath[64];
+    struct fanotify_event_metadata *meta;
+    ssize_t len, plen;
+
+    if ((len = read(fanfd, buf, sizeof(buf))) < (ssize_t)sizeof(*meta))
+        return;
+
+    for (meta = (struct fanotify_event_metadata *)buf; 
+         FAN_EVENT_OK(meta, len); 
+         meta = FAN_EVENT_NEXT(meta, len)) 
+    {
+        if (meta->fd < 0)
+            continue;
+
+        /* Resolve the path before close(): /proc/self/fd/N vanishes with the fd. */
+        snprintf(fdpath, sizeof(fdpath), "/proc/self/fd/%d", meta->fd);
+        plen = readlink(fdpath, path, sizeof(path) - 1);
+        close(meta->fd);
+
+        if (meta->vers != FANOTIFY_METADATA_VERSION || plen <= 0 || nfevs >= 32)
+            continue;
+
+        path[plen] = 0;
+
+        strncpy(fevs[nfevs].path, path, sizeof(fevs[nfevs].path) - 1);
+        fevs[nfevs].path[sizeof(fevs[nfevs].path) - 1] = 0;
+        fevs[nfevs].pid = meta->pid;
+        fevs[nfevs].ts = time(NULL);
+        ++nfevs;
+    }
+}
+
 static void
 handleproc(void)
 {
@@ -353,7 +404,7 @@ static int
 hasaccess(Process *proc, const char *filepath)
 {
 	char *fdir, *dpath;
-	int cwdlen, dpathlen;
+	int cwdlen, dpathlen, result;
 
 	if (!(fdir = strdup(filepath)))
 		return 0;
@@ -367,16 +418,16 @@ hasaccess(Process *proc, const char *filepath)
 	cwdlen = strlen(proc->cwd);
 	dpathlen = strlen(dpath);
 
-	int result = !strcmp(dpath, proc->cwd) ||
-	            (!strncmp(dpath, proc->cwd, cwdlen) && dpath[cwdlen] == '/') ||
-	            (!strncmp(proc->cwd, dpath, dpathlen) && proc->cwd[dpathlen] == '/');
+	result = !strcmp(dpath, proc->cwd) ||
+	         (!strncmp(dpath, proc->cwd, cwdlen) && dpath[cwdlen] == '/') ||
+	         (!strncmp(proc->cwd, dpath, dpathlen) && proc->cwd[dpathlen] == '/');
 
 	free(fdir);
 	return result;
 }
 
 static Process *
-correlate(const char *path, time_t ts)
+corheur(const char *path, time_t ts)
 {
 	Process *best, *proc;
 	time_t bestdiff, diff;
@@ -391,7 +442,6 @@ correlate(const char *path, time_t ts)
 		    strstr(proc->comm, "migration") || proc->comm[0] == '[')
 			continue;
 
-		/* Only update process info if it's stale */
 		if (ts - proc->start > 60)
 			updateproc(proc);
 
@@ -408,6 +458,63 @@ correlate(const char *path, time_t ts)
 	return best;
 }
 
+static Process *
+corfan(const char *path, time_t ts)
+{
+	int i;
+
+    for (i = 0; i < nfevs; ++i) {
+        if (ts - fevs[i].ts > 5 || strcmp(fevs[i].path, path)) {
+            continue;
+        }
+        Process *proc = findproc(fevs[i].pid);
+        if (proc) {
+            updateproc(proc);
+            return proc;
+        }
+        addproc(fevs[i].pid, 0, "fanotify");
+        if ((proc = findproc(fevs[i].pid))) {
+            updateproc(proc);
+            return proc;
+        }
+        return NULL;
+    }
+    return NULL;
+}
+
+static Process *
+correlate(const char *path, time_t ts)
+{
+    Process *proc;
+    fd_set rfds;
+	int result;
+    struct timeval tv = {0, 0};
+
+    if (usefan < 0) {
+        initfan();
+    }
+
+    if (usefan && fanfd >= 0) {
+        FD_ZERO(&rfds);
+        FD_SET(fanfd, &rfds);
+
+        result = select(fanfd + 1, &rfds, NULL, NULL, &tv);
+
+        if (result > 0) {
+            readfan();
+        }
+    }
+
+    if (usefan) {
+        proc = corfan(path, ts);
+        if (proc) {
+            return proc;
+        }
+    }
+
+    return corheur(path, ts);
+}
+
 static const char *
 maskstr(uint32_t mask)
 {
@@ -481,6 +588,22 @@ scanprocs(void)
 	closedir(proc_dir);
 }
 
+static int
+initfan(void)
+{
+	struct epoll_event ev;
+	
+	return usefan != -1 ? usefan :
+	       (usefan = 0,
+	        !getuid() &&
+	        (fanfd = fanotify_init(FAN_CLASS_NOTIF, O_RDONLY | O_LARGEFILE)) >= 0 &&
+	        !fanotify_mark(fanfd, FAN_MARK_ADD | FAN_MARK_MOUNT, 
+	                      FAN_MODIFY | FAN_CLOSE_WRITE | FAN_OPEN | FAN_ACCESS, AT_FDCWD, dir) &&
+	        (ev.events = EPOLLIN, ev.data.fd = fanfd,
+	         !epoll_ctl(efd, EPOLL_CTL_ADD, fanfd, &ev)) ?
+	        (usefan = 1) : (fanfd >= 0 && close(fanfd), fanfd = -1, 0));
+}
+
 int
 main(int argc, char **argv)
 {
@@ -506,21 +629,26 @@ main(int argc, char **argv)
 
 	nlfd = initnetlink();
 	ifd = initinotify(dir);
+	
 
 	uidinit();
 	scanprocs();
 
-	printf("who - started monitoring %s\n", dir);
+	printf("filemon - started monitoring %s\n", dir);
 	printf("active processes: %d\n", nprocs);
 	fflush(stdout);
 
-  for (;;) {
+  for (; state != STATE_SHUTDOWN;) {
 		nfds = epoll_wait(efd, events, MAX_EVENTS, -1);
-		if (nfds < 0)
-			errno == EINTR ? (void)0 : die("epoll_wait:");
+		if (nfds < 0) {
+			if (errno == EINTR)
+				continue;
+			die("epoll_wait:");
+		}
 
 		for (struct epoll_event *ev = events; ev < events + nfds; ++ev)
-			(ev->data.fd == nlfd) ? handleproc() : handlefile();
+			ev->data.fd == nlfd ? handleproc() :
+			ev->data.fd == fanfd ? readfan() : handlefile();
 	}
 
 	cleanup();
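
Note on `readfan()`: the event path is recovered by reading the `/proc/self/fd/<fd>` symlink, which only exists while the descriptor is open. The lookup therefore has to happen before `close()`. A minimal sketch of that ordering, assuming a Linux system with /proc mounted; `fd_to_path` and `resolves_dev_null` are hypothetical helpers and not part of this commit.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve the path behind an open descriptor, then close the descriptor.
 * Returns the path length, or -1 on failure. The fd is closed either way.
 */
static ssize_t
fd_to_path(int fd, char *out, size_t outsz)
{
	char link[64];
	ssize_t n;

	assert(out && outsz > 1);
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	n = readlink(link, out, outsz - 1);  /* must run while fd is open */
	close(fd);                           /* the path is already copied */
	if (n <= 0)
		return -1;
	out[n] = '\0';                       /* readlink does not terminate */
	return n;
}

/* Usage sketch: returns 1 if /dev/null resolves back to its own path. */
static int
resolves_dev_null(void)
{
	char path[256];
	int fd = open("/dev/null", O_RDONLY);

	if (fd < 0)
		return 0;
	return fd_to_path(fd, path, sizeof(path)) > 0 &&
	       !strcmp(path, "/dev/null");
}
```

Closing inside the helper keeps the one-descriptor-per-event cleanup in a single place, so an early `continue` in the caller cannot leak the fd.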