Wednesday, April 06, 2011

Finding unused files in a SCons project with Process Monitor

New technologies are born and die, but one thing remains in your project: their files. Quite often you have no idea where they are used, and an attempt to remove them may lead to serious consequences.

Fortunately, if your project is managed by a fine-grained build system such as SCons, and your build scripts do not glob too much, there is a good chance you can find the files that do not participate in the build.

Here is how to do this on Windows using Process Monitor, a tool that records system activity, including file access.

While build systems are most common in C/C++ and Java projects, it is possible to add fine-grained file usage control to any project. For example, SCons itself is written entirely in Python, so it could run directly from the source checkout or build its distributions straight from the checkout. Instead, it uses a build procedure that copies all necessary files from the checkout into a separate directory and works from there.

Thanks to that, it is possible to see which files are no longer relevant. While it is possible to compare the checkout source tree with the copied directory tree, I'll go through the hell of monitoring file access in the source tree during the build with Process Monitor (formerly FileMon). Linux should have similar tools too; let me know what they are called.
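For reference, the simpler tree comparison could look roughly like the sketch below. It assumes the build keeps the same relative layout and file names in the copied directory, which I have not verified for the SCons bootstrap; the function names are made up for illustration.

```python
import os

def relative_files(top, skip_dirs=('.svn',)):
    """Return the set of file paths under `top`, relative to `top`."""
    found = set()
    for root, dirs, files in os.walk(top):
        # Prune version-control metadata in place so os.walk skips it
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), top))
    return found

def not_copied(source_dir, build_dir):
    """List files present in source_dir but absent from build_dir."""
    return sorted(relative_files(source_dir) - relative_files(build_dir))
```

Calling `not_copied(source_dir, build_dir)` with the checkout's src/ directory and the directory the build copies into returns the relative paths that never made it across. It cannot tell whether a file was read without being copied, which is one reason to monitor file access instead.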

The process is as follows:
  1. Start Process Monitor.
  2. Stop the incoming flood of events by toggling the Capture (Ctrl-E) button.
  3. Open the Filter (Ctrl-L) dialog to add some filters.
  4. The SCons build is started by the bootstrap.py script from the root of the SCons source checkout. The script is run by the Python executable, so I add the python.exe process name to the filter. I also know that bootstrap.py copies files from the src/ subdirectory, so that is the directory to monitor, and I add it to the filters too. With the filters set, resume capture (Ctrl-E) and run the build.

  5. Go to Tools -> File Summary...
  6. This shows the list of paths captured by Process Monitor while listening to system calls. They are already filtered, but additional filters can be applied with the bottom-left button to make the information even more useful.

  7. Export to CSV using Save...

The exported CSV is not very useful without some post-processing. I used the following script to compare the list of paths in the CSV with the actual contents of the src/ directory. This gives me the names of files that were not touched during the build at all.

SRCDIR = "C:\\p\\python\\scons\\src"
CSVLIST = 'accessed_bootstrap_files.CSV'

import csv
import os

# Read the "Path" column from the Process Monitor File Summary export
with open(CSVLIST, newline='') as csvfile:
  reader = csv.reader(csvfile)
  header = next(reader)
  pathidx = header.index("Path")
  pathset = set(row[pathidx] for row in reader)

#for row in pathset:
#  print(row)

# Collect every file under SRCDIR, skipping Subversion metadata
fileset = set()
for root, dirs, files in os.walk(SRCDIR):
  fileset.update(os.path.join(root, f) for f in files)
  if '.svn' in dirs:
    dirs.remove('.svn')  # don't visit .svn directories

if not (pathset & fileset):
  print('Error: File sets do not intersect at all')

# Paths seen by Process Monitor that are not files in the source tree
print("Files not found in source directory tree:")
for f in (pathset - fileset):
  if not os.path.isdir(f):
    print(f)

# Files in the source tree that the build never accessed
print()
print("Untouched files in source directory tree:")
for f in sorted(fileset - pathset):
  if not os.path.isdir(f):
    print(f)

I've found a few interesting things about SCons. Core tests are mixed with source files in the repository checkout, and they are not copied during the bootstrap build. There are also a few setup.py files, a post-install script, and an announcement that don't participate in the build.

Here is the output of the above script:

Files not found in source directory tree:
<Total>

Untouched files in source directory tree:
C:\p\python\scons\src\.aeignore
C:\p\python\scons\src\Announce.txt
C:\p\python\scons\src\engine\.aeignore
C:\p\python\scons\src\engine\SCons\.aeignore
C:\p\python\scons\src\engine\SCons\ActionTests.py
C:\p\python\scons\src\engine\SCons\BuilderTests.py
C:\p\python\scons\src\engine\SCons\CacheDirTests.py
C:\p\python\scons\src\engine\SCons\DefaultsTests.py
C:\p\python\scons\src\engine\SCons\EnvironmentTests.py
C:\p\python\scons\src\engine\SCons\ErrorsTests.py
C:\p\python\scons\src\engine\SCons\ExecutorTests.py
C:\p\python\scons\src\engine\SCons\JobTests.py
C:\p\python\scons\src\engine\SCons\MemoizeTests.py
C:\p\python\scons\src\engine\SCons\Node\.aeignore
C:\p\python\scons\src\engine\SCons\Node\AliasTests.py
C:\p\python\scons\src\engine\SCons\Node\FSTests.py
C:\p\python\scons\src\engine\SCons\Node\NodeTests.py
C:\p\python\scons\src\engine\SCons\Node\PythonTests.py
C:\p\python\scons\src\engine\SCons\Optik\.aeignore
C:\p\python\scons\src\engine\SCons\PathListTests.py
C:\p\python\scons\src\engine\SCons\Platform\.aeignore
C:\p\python\scons\src\engine\SCons\Platform\PlatformTests.py
C:\p\python\scons\src\engine\SCons\SConfTests.py
C:\p\python\scons\src\engine\SCons\SConsignTests.py
C:\p\python\scons\src\engine\SCons\Scanner\.aeignore
C:\p\python\scons\src\engine\SCons\Scanner\CTests.py
C:\p\python\scons\src\engine\SCons\Scanner\DirTests.py
C:\p\python\scons\src\engine\SCons\Scanner\FortranTests.py
C:\p\python\scons\src\engine\SCons\Scanner\IDLTests.py
C:\p\python\scons\src\engine\SCons\Scanner\LaTeXTests.py
C:\p\python\scons\src\engine\SCons\Scanner\ProgTests.py
C:\p\python\scons\src\engine\SCons\Scanner\RCTests.py
C:\p\python\scons\src\engine\SCons\Scanner\ScannerTests.py
C:\p\python\scons\src\engine\SCons\Script\.aeignore
C:\p\python\scons\src\engine\SCons\Script\MainTests.py
C:\p\python\scons\src\engine\SCons\Script\SConscriptTests.py
C:\p\python\scons\src\engine\SCons\SubstTests.py
C:\p\python\scons\src\engine\SCons\TaskmasterTests.py
C:\p\python\scons\src\engine\SCons\Tool\.aeignore
C:\p\python\scons\src\engine\SCons\Tool\JavaCommonTests.py
C:\p\python\scons\src\engine\SCons\Tool\PharLapCommonTests.py
C:\p\python\scons\src\engine\SCons\Tool\ToolTests.py
C:\p\python\scons\src\engine\SCons\Tool\f03.xml
C:\p\python\scons\src\engine\SCons\Tool\msvsTests.py
C:\p\python\scons\src\engine\SCons\UtilTests.py
C:\p\python\scons\src\engine\SCons\Variables\BoolVariableTests.py
C:\p\python\scons\src\engine\SCons\Variables\EnumVariableTests.py
C:\p\python\scons\src\engine\SCons\Variables\ListVariableTests.py
C:\p\python\scons\src\engine\SCons\Variables\PackageVariableTests.py
C:\p\python\scons\src\engine\SCons\Variables\PathVariableTests.py
C:\p\python\scons\src\engine\SCons\Variables\VariablesTests.py
C:\p\python\scons\src\engine\SCons\WarningsTests.py
C:\p\python\scons\src\engine\SCons\cppTests.py
C:\p\python\scons\src\engine\setup.cfg
C:\p\python\scons\src\engine\setup.py
C:\p\python\scons\src\script\.aeignore
C:\p\python\scons\src\script\scons-post-install.py
C:\p\python\scons\src\script\setup.cfg
C:\p\python\scons\src\script\setup.py
C:\p\python\scons\src\test_aegistests.py
C:\p\python\scons\src\test_files.py
C:\p\python\scons\src\test_interrupts.py
C:\p\python\scons\src\test_pychecker.py
C:\p\python\scons\src\test_setup.py
C:\p\python\scons\src\test_strings.py
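Two caveats with this output. The `<Total>` entry is just the summary row of the export, and the comparison matches path strings exactly, while Windows paths are case-insensitive. I have not checked whether Process Monitor always reports paths in the same case as os.walk, so the sketch below is a precaution under that assumption rather than a confirmed fix. The function names are hypothetical; `ntpath.normcase` is the standard library helper that lowercases a path and converts forward slashes to backslashes.

```python
import csv
import io
import ntpath

def accessed_paths(csv_text, column="Path"):
    """Return normalized paths from the export, skipping the <Total> row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    idx = header.index(column)
    paths = set()
    for row in reader:
        # Skip blank lines and the summary row seen in the output above
        if not row or row[idx] == "<Total>":
            continue
        paths.add(ntpath.normcase(row[idx]))
    return paths

def untouched(all_files, accessed):
    """Files from all_files whose normalized form was never accessed."""
    return sorted(f for f in all_files if ntpath.normcase(f) not in accessed)
```

Here `csv_text` is the content of the exported file and `all_files` is the list collected with os.walk. The original spelling of each path is kept in the result, so the report still reads like the listing above.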


Hope this helps clean up your projects too.

P.S. I wish there were a Python script replacement for Process Monitor, or at least that it could be controlled from the command line.

2 comments:

  1. procmon can be automated from the command-line like so:


    set PM=C:\path\to\procmon.exe
    start %PM% /quiet /minimized /backingfile C:\path\to\pytest.pml
    %PM% /waitforidle
    start /wait C:\path\to\python.exe myscript.py
    %PM% /terminate
    start %PM% /quiet /minimized /openlog C:\path\to\pytest.pml /SaveAs C:\path\to\mydata.csv

    The downside is that the dumps can be huge, so this may not work so well with multi-hour builds of native code.

  2. Thanks. That's a good start at least. Unfortunately, the PM configuration is stored in a binary format, so there is still no way to set up filters from Python.
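Building on the first comment, the same sequence can be driven from Python with subprocess. This is only a sketch: the switches are copied from the comment above, while the function names and any paths passed in are placeholders of mine, and I have not checked what exit codes Process Monitor returns, so they are ignored here.

```python
import subprocess

def procmon_commands(procmon, pml, csv_path):
    """Build the Process Monitor command line for each phase."""
    return {
        "start": [procmon, "/quiet", "/minimized", "/backingfile", pml],
        "wait": [procmon, "/waitforidle"],
        "stop": [procmon, "/terminate"],
        "export": [procmon, "/quiet", "/minimized",
                   "/openlog", pml, "/SaveAs", csv_path],
    }

def trace_build(procmon, build_cmd, pml, csv_path):
    """Capture system activity while build_cmd runs, then export it as CSV."""
    cmds = procmon_commands(procmon, pml, csv_path)
    capture = subprocess.Popen(cmds["start"])  # keeps running during the build
    subprocess.call(cmds["wait"])              # wait until capture is ready
    try:
        subprocess.check_call(build_cmd)       # raises if the build itself fails
    finally:
        subprocess.call(cmds["stop"])          # ask the capturing instance to exit
        capture.wait()
    subprocess.call(cmds["export"])            # convert the .pml log to CSV
```

A call would pass the procmon.exe path, the build command as a list (for example the Python executable followed by bootstrap.py), and the two output paths. Filters still have to be configured in Process Monitor itself, so the exported CSV will need filtering by process name and directory afterwards.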