
This is Part 2 of a series on mutation testing in Clojure. Part 1 introduced the concept and why Clojure needed a purpose-built tool.
The previous post made a claim: mutation testing can be fast if you know which tests to run. This post shows how Heretic makes that happen.
We'll walk through the three core phases: collecting expression-level coverage with ClojureStorm, transforming source code with rewrite-clj, and the optimization techniques that keep mutation counts manageable.
Traditional coverage tools track lines. Heretic tracks expressions.
The difference matters. Consider:
(defn process-order [order]
  (if (> (:quantity order) 10)
    (* (:price order) 0.9)  ;; <- Line 3: bulk discount
    (:price order)))
Line-level coverage would show line 3 as "covered" if any test enters the bulk discount branch. But expression-level coverage distinguishes between tests that evaluate *, (:price order), and 0.9. When we later mutate 0.9 to 1.1, we can run only the tests that actually touched that specific literal - not every test that happened to call process-order.
ClojureStorm is a fork of the Clojure compiler that instruments every expression during compilation. Created by Juan Monetta for the FlowStorm debugger, it provides exactly the hooks Heretic needs. (Thanks to Juan for building such a solid foundation - Heretic would not exist without ClojureStorm.)
The integration is surprisingly minimal:
(ns heretic.tracer
  (:import [clojure.storm Emitter Tracer]))

(def ^:private current-coverage
  "Atom of {form-id #{coords}} for the currently running test."
  (atom {}))

(defn record-hit! [form-id coord]
  (swap! current-coverage
         update form-id
         (fnil conj #{})
         coord))

(defn reset-current-coverage! []
  (reset! current-coverage {}))

(defn get-current-coverage []
  @current-coverage)

(defn init! []
  ;; Configure what gets instrumented
  (Emitter/setInstrumentationEnable true)
  (Emitter/setFnReturnInstrumentationEnable true)
  (Emitter/setExprInstrumentationEnable true)
  ;; Set up callbacks
  (Tracer/setTraceFnsCallbacks
   {:trace-expr-fn (fn [_ _ coord form-id]
                     (record-hit! form-id coord))
    :trace-fn-return-fn (fn [_ _ coord form-id]
                          (record-hit! form-id coord))}))
When any instrumented expression evaluates, ClojureStorm calls our callback with two pieces of information:
- form-id - a unique identifier for the enclosing top-level form (the whole defn)
- coord - a coordinate string like "3,2,1" meaning "third child, second child, first child"

Together, [form-id coord] pinpoints exactly which subexpression executed. This is the key that unlocks targeted test selection.
To connect a mutation in the source code to the coverage data, we need a way to uniquely address any subexpression. Think of it as a postal address for code - we need to say "the a inside the + call inside the function body" in a format that both the coverage tracer and mutation engine can agree on.
ClojureStorm addresses this with a path-based coordinate system. Consider this function as a tree:
(defn foo [a b] (+ a b))
│
├─[0] defn
├─[1] foo
├─[2] [a b]
└─[3] (+ a b)
        │
        ├─[3,0] +
        ├─[3,1] a
        └─[3,2] b
Each number represents which child to pick at each level. The coordinate "3,2" means "go to child 3 (the function body), then child 2 (the second argument to +)". That gives us the b symbol.
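Because coordinates are just child indices, the walk is easy to reproduce at the REPL with plain nth calls (a quick illustration, not Heretic code):

```clojure
(def form '(defn foo [a b] (+ a b)))

;; Coordinate "3" selects the function body...
(nth form 3)          ;; => (+ a b)

;; ...and "3,2" selects that form's third child, the symbol b.
(nth (nth form 3) 2)  ;; => b
```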
This works cleanly for ordered structures like lists and vectors, where children have stable positions. But maps are unordered - {:name "Alice" :age 30} and {:age 30 :name "Alice"} are the same value, so numeric indices would be unstable.
ClojureStorm solves this by hashing the printed representation of map keys. Instead of "0" for the first entry, a key like :name gets addressed as "K-1925180523":
{:name "Alice" :age 30}
│
├─[K-1925180523] :name
├─[V-1925180523] "Alice"
├─[K-1524292809] :age
└─[V-1524292809] 30
The hash ensures stable addressing regardless of iteration order.
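The exact hash function is ClojureStorm's; purely as an illustration of the idea (not the real implementation), a stable key coordinate could be derived from the printed key like this:

```clojure
(defn key-coord
  "Illustrative only: a stable coordinate for a map key, derived from
   its printed representation. ClojureStorm's actual hashing may differ."
  [k]
  (str "K-" (hash (pr-str k))))
```

The point is that the coordinate depends only on the key's printed form, never on its position in the map.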
With this addressing scheme, we can say "test X touched coordinate 3,1 in form 12345" and later ask "which tests touched the expression we're about to mutate?"
Here's a problem we discovered during implementation: how do we connect the mutation engine to the coverage data?
The mutation engine uses rewrite-clj to parse and transform source files. It finds a mutation site at, say, line 42 of src/my/app.clj. But the coverage data is indexed by ClojureStorm's form-id - an opaque identifier assigned during compilation. We need to translate "file + line" into "form-id".
Fortunately, ClojureStorm's FormRegistry stores the source file and starting line for each compiled form. We build a lookup index:
(defn build-form-location-index [forms source-paths]
  (into {}
        (for [[form-id {:keys [form/file form/line]}] forms
              :when (and file line)
              :let [abs-path (resolve-path source-paths file)]
              :when abs-path]
          [[abs-path line] form-id])))
When the mutation engine finds a site at line 42, it searches for the form whose start line is the largest value less than or equal to 42 - that is, the innermost containing form. This gives us the ClojureStorm form-id, which we use to look up which tests touched that form.
This bridging layer is what allows Heretic to connect source transformations to runtime coverage, enabling targeted test execution.
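A sketch of that innermost-form lookup against the index built above. form-id-at is a hypothetical helper name, not Heretic's actual API:

```clojure
(defn form-id-at
  "Return the form-id of the innermost top-level form containing `line`:
   the form in this file whose start line is the largest value <= line."
  [location-index abs-path line]
  (some->> location-index
           (filter (fn [[[path start] _]]
                     (and (= path abs-path) (<= start line))))
           (sort-by (fn [[[_ start] _]] start))
           last
           second))
```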
Coverage collection runs each test individually and captures what it touches:
(defn run-test-with-coverage [test-var]
  (tracer/reset-current-coverage!)
  (try
    (test-var)
    (catch Throwable t
      (println "Test threw exception:" (.getMessage t))))
  {(symbol test-var) (tracer/get-current-coverage)})
The result is a map from test symbol to coverage data:
{my.app-test/test-addition
 {12345 #{"3" "3,1" "3,2"}  ;; form-id -> coords touched
  12346 #{"1" "2,1"}}

 my.app-test/test-subtraction
 {12345 #{"3" "4"}
  12347 #{"1"}}}
This gets persisted to .heretic/coverage/ with one file per test namespace, enabling incremental updates. Change a test file? Only that namespace gets recollected.
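A minimal sketch of what that persistence step could look like. The .heretic/coverage/ layout comes from the text; save-namespace-coverage! and the EDN encoding are assumptions:

```clojure
(ns heretic.coverage.store
  (:require [clojure.java.io :as io]))

(defn save-namespace-coverage!
  "Write the coverage map for one test namespace to
   .heretic/coverage/<ns>.edn, creating directories as needed."
  [test-ns coverage]
  (let [f (io/file ".heretic" "coverage" (str test-ns ".edn"))]
    (io/make-parents f)
    (spit f (pr-str coverage))))
```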
At this point we have a complete map: for every test, we know exactly which [form-id coord] pairs it touched. Now we need to generate mutations and look up which tests are relevant for each one.
With coverage data in hand, we need to actually mutate the code: parse the source, locate mutation sites, and apply transformations without destroying formatting.
rewrite-clj gives us a zipper over Clojure source that preserves whitespace and comments - essential for producing readable diffs:
(defn parse-file [path]
  (z/of-file path {:track-position? true}))

(defn find-mutation-sites [zloc]
  (->> (walk-form zloc)
       (remove in-quoted-form?)  ;; Skip '(...) and `(...)
       (mapcat (fn [z]
                 (let [applicable (ops/applicable-operators z)]
                   (map #(make-mutation-site z %) applicable))))))
The walk-form function traverses the zipper depth-first. At each node, we check which operators match. An operator is a data map with a matcher predicate:
(def swap-plus-minus
  {:id :swap-plus-minus
   :original '+
   :replacement '-
   :description "Replace + with -"
   :matcher (fn [zloc]
              (and (= :token (z/tag zloc))
                   (symbol? (z/sexpr zloc))
                   (= '+ (z/sexpr zloc))))})
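walk-form itself isn't defined in the snippets; a plausible implementation in terms of rewrite-clj's depth-first z/next (a sketch, not necessarily Heretic's exact code):

```clojure
(ns heretic.walk
  (:require [rewrite-clj.zip :as z]))

(defn walk-form
  "Depth-first sequence of every zipper location in the form.
   The z API skips whitespace and comment nodes for us."
  [zloc]
  (take-while (complement z/end?)
              (iterate z/next zloc)))
```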
Each mutation site captures the file, line, column, operator, and - critically - the coordinate path within the form. This coordinate is what connects a mutation to the coverage data from Phase 1.
The tricky part is converting between rewrite-clj's zipper positions and ClojureStorm's coordinate strings. We need bidirectional conversion for the round-trip:
(defn coord->zloc [zloc coord]
  (let [parts (parse-coord coord)]  ;; "3,2,1" -> [3 2 1]
    (reduce
     (fn [z part]
       (when z
         (if (string? part)       ;; Hash-based for maps/sets
           (find-by-hash z part)
           (nth-child z part))))  ;; Integer index for lists/vectors
     zloc
     parts)))

(defn zloc->coord [zloc]
  (loop [z zloc
         coord []]
    (cond
      (root-form? z) (vec coord)

      (z/up z)
      (let [part (if (is-unordered-collection? z)
                   (compute-hash-coord z)
                   (child-index z))]
        (recur (z/up z) (cons part coord)))

      :else (vec coord))))
The validation requirement is that these are inverses - navigating to a coordinate and reading it back must produce the same coordinate, compared in parsed form:

(= (parse-coord coord) (zloc->coord (coord->zloc zloc coord)))
With correct coordinate mapping, we can take a mutation at a known location and ask "which tests touched this exact spot?" That query is what makes targeted test execution possible.
Once we find a mutation site and can navigate to it, the actual transformation is straightforward:
(defn apply-mutation! [mutation]
  (let [{:keys [file form-id coord operator]} mutation
        operator-def (get ops/operators-by-id operator)
        original-content (slurp file)
        zloc (z/of-string original-content {:track-position? true})
        form-zloc (find-form-by-id zloc form-id)
        target-zloc (coord/coord->zloc form-zloc coord)
        replacement-str (ops/apply-operator operator-def target-zloc)
        modified-zloc (z/replace target-zloc
                                 (n/token-node (symbol replacement-str)))
        modified-content (z/root-string modified-zloc)]
    (spit file modified-content)
    (assoc mutation :backup original-content)))
After modifying the source file, we need the JVM to see the change. clj-reload handles this correctly:
(ns heretic.reloader
  (:require [clj-reload.core :as reload]))

(defn init! [source-paths]
  (reload/init {:dirs source-paths}))

(defn reload-after-mutation! []
  (reload/reload {:throw false}))
Why clj-reload specifically? It solves problems that plain require with :reload doesn't - most importantly, it calls remove-ns before reloading, preventing protocol and multimethod accumulation across reloads. The mutation workflow becomes:
(with-mutation [m mutation]
  (reloader/reload-after-mutation!)
  (run-relevant-tests m))
;; Mutation automatically reverted in finally block
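The with-mutation macro itself isn't shown; given that apply-mutation! returns the mutation with its :backup attached, a minimal sketch could be (the macro body is an assumption, the names match the snippets above):

```clojure
(declare apply-mutation!)  ;; from the mutation engine above

(defmacro with-mutation
  "Apply a mutation, run body, and always restore the original source."
  [[sym mutation] & body]
  `(let [~sym (apply-mutation! ~mutation)]
     (try
       ~@body
       (finally
         ;; Revert: write the pre-mutation content back to the file.
         (spit (:file ~sym) (:backup ~sym))))))
```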
At this point we have the full pipeline: parse source, find mutation sites, apply a mutation, hot-reload, run targeted tests, restore. But running this once per mutation is still slow for large codebases. Phase 3 addresses that.
The operator library is where Heretic's Clojure focus shows. Beyond the standard arithmetic and comparison swaps, we have:
Threading operators - catch ->/->> confusion:
(-> data (get :users) first) ;; Original
(->> data (get :users) first) ;; Mutant: wrong arg position
Nil-handling operators - expose nil punning mistakes:
(when (seq users) ...) ;; Original: handles empty list
(when users ...) ;; Mutant: breaks on empty list (truthy)
Lazy/eager operators - catch chunking and realization bugs:
(map process items) ;; Original: lazy
(mapv process items) ;; Mutant: eager, different memory profile
Destructuring operators - expose JSON interop issues:
{:keys [user-id]} ;; Original: kebab-case
{:keys [userId]} ;; Mutant: camelCase from JSON
The full set includes first/last, rest/next, filter/remove, conj/disj, some->/->, and qualified keyword mutations. These are the mistakes Clojure developers actually make.
With 80+ operators and a real codebase, mutation counts get large fast. A 1000-line project might generate 5000 mutations. Running the full test suite 5000 times is not practical.
Heretic uses several techniques to make this manageable.
This is the big one, enabled by Phase 1. Instead of running all tests for every mutation, we query the coverage index:
(defn tests-for-mutation [coverage-map mutation]
  (let [form-id (resolve-form-id (:form-location-index coverage-map) mutation)
        coord (:coord mutation)]
    (get-in coverage-map [:coord-to-tests [form-id coord]] #{})))
A mutation at (+ a b) might only be covered by 2 tests out of 200. We run those 2 tests in milliseconds instead of the full suite in seconds.
This is where the Phase 1 coverage investment pays off. But we can go further by reducing the number of mutations we generate in the first place.
Some mutations produce semantically identical code. Detecting these upfront avoids wasted test runs:
;; (* x 0) -> (/ x 0) is NOT equivalent (divide by zero)
;; (* x 1) -> (/ x 1) IS equivalent (both return x)

(def equivalent-patterns
  [{:operator :swap-mult-div
    :context (fn [zloc]
               (some #(= 1 %) (rest (z/child-sexprs (z/up zloc)))))
    :reason "Multiplying or dividing by one has no effect"}

   {:operator :swap-lt-lte
    :context (fn [zloc]
               (let [[_ left right] (z/child-sexprs (z/up zloc))]
                 (and (= 0 right)
                      (non-negative-fn? (first left)))))
    :reason "(< (count x) 0) is always false"}])
The patterns cover boundary comparisons ((>= (count x) 0) is always true), function contracts ((nil? (str x)) is always false), and lazy/eager equivalences ((vec (map f xs)) equals (vec (mapv f xs))).
Filtering equivalent mutations prevents false "survived" reports. But we can also skip mutations that would be redundant to test.
Subsumption identifies when killing one mutation implies another would also be killed. If swapping < to <= is caught by a test, then swapping < to > would likely be caught too.
Based on the RORG (Relational Operator Replacement with Guard) research, we define subsumption relationships:
(def relational-operator-subsumption
  {'<  [:swap-lt-lte :swap-lt-neq :replace-comparison-false]
   '>  [:swap-gt-gte :swap-gt-neq :replace-comparison-false]
   '<= [:swap-lte-lt :swap-lte-eq :replace-comparison-true]
   ;; ...
   })
For each comparison operator, we only need to test the minimal set. The research shows this achieves roughly the same fault detection with 40% fewer mutations.
The subsumption graph also enables intelligent mutation selection:
(defn minimal-operator-set [operators]
  (set/difference
   operators
   ;; Remove any operator dominated by another in the set
   (reduce
    (fn [dominated op]
      (into dominated
            (set/intersection (dominated-operators op) operators)))
    #{}
    operators)))
These techniques reduce mutation count. The final optimization reduces the cost of each mutation.
The most sophisticated optimization is mutant schemata. Instead of an apply, reload, test, revert, reload cycle for every single mutation, we embed multiple mutations into one compilation:
;; Original
(defn calculate [x] (+ x 1))

;; Schematized (with 3 mutations)
(defn calculate [x]
  (case heretic.schemata/*active-mutant*
    :mut-42-5-plus-minus (- x 1)
    :mut-42-5-1-to-0     (+ x 0)
    :mut-42-5-1-to-2     (+ x 2)
    (+ x 1)))  ;; original (default)
We reload once, then switch between mutations by binding a dynamic var:
(def ^:dynamic *active-mutant* nil)

(defmacro with-mutant [mutation-id & body]
  `(binding [*active-mutant* ~mutation-id]
     ~@body))
The workflow becomes:
(defn run-mutation-batch [file mutations test-fn]
  (let [schemata-info (schematize-file! file mutations)]
    (try
      (reload!)  ;; Once!
      (doseq [[id mutation] (:mutation-map schemata-info)]
        (with-mutant id
          (test-fn id mutation)))
      (finally
        (restore-file! schemata-info)
        (reload!)))))  ;; Once!
For a file with 50 mutations, this means 2 reloads instead of 100. The overhead of case dispatch at runtime is negligible compared to compilation cost.
Finally, we offer presets that trade thoroughness for speed:
(def presets
  {:fast #{:swap-plus-minus :swap-minus-plus
           :swap-lt-gt :swap-gt-lt
           :swap-and-or :swap-or-and
           :swap-nil-some :swap-some-nil}
   :minimal minimal-preset-operators  ;; Subsumption-aware
   :standard #{;; :fast plus...
               :swap-first-last :swap-rest-next
               :swap-thread-first-last}
   :comprehensive (set (map :id all-operators))})
The :fast preset uses ~15 operators that research shows catch roughly 99% of bugs. The :minimal preset uses subsumption analysis to eliminate redundant mutations. Both run much faster than :comprehensive while maintaining detection power.
A mutation testing run with Heretic looks like: collect per-test expression coverage with ClojureStorm, generate mutation sites with rewrite-clj, filter out equivalent and subsumed mutations, schematize each file and reload once, then for each mutation bind *active-mutant* and run its targeted tests. The result is mutation testing that runs in seconds for typical projects instead of hours.
This covers the core implementation. A future post will explore Phase 4: AI-powered semantic mutations and hybrid equivalent detection - using LLMs to generate the subtle, domain-aware mutations that traditional operators miss.
Previously: Part 1 - Heretic: Mutation Testing in Clojure
Published: 2025-12-30
Tagged: mutation-testing testing clojure clojurestorm