Frak

Transform collections of strings into regular expressions.

frak

frak transforms collections of strings into regular expressions for matching those strings. The primary goal of this library is to generate regular expressions from a known set of inputs which avoid backtracking as much as possible. It is available as a command line utility and for the browser as a JavaScript library.

Installation

Add frak as a dependency to your project.clj file.

Clojure(Script) usage

user> (require 'frak)
nil
user> (frak/pattern ["foo" "bar" "baz" "quux"])
#"(?:ba[rz]|foo|quux)"
user> (frak/pattern ["Clojure" "Clojars" "ClojureScript"])
#"Cloj(?:ure(?:Script)?|ars)"
user> (frak/pattern ["skill" "skills" "skull" "skulls"])
#"sk(?:[ui]lls?)"

Options

frak's pattern function accepts an options map as its second argument. The available options are:

  • :capture? - boolean (default false), whether the rendered regex should create capture groups for each match.
  • :escape-chars - vector (default: see frak/metacharacters), characters to escape when rendering a regular expression.
  • :exact? - boolean (default false), whether the rendered regex should only produce matches when the entire input string matches.
  • :whole-words? - boolean (default false), whether the rendered regex should match only whole words (word boundary at both ends of the match) in the input string.
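
To make the difference between the default, :exact?, and :whole-words? renderings concrete, here is a small sketch that uses hand-written patterns rather than calling frak. The anchored form is the one printed by the command line examples in this README. The \b-wrapped form is only an assumption about what :whole-words? produces, based on the description above.

```clojure
;; Hand-written patterns for illustration; nothing here calls frak.
(def unanchored #"(?:ba[rz]|foo|quux)")     ; default rendering from the REPL example
(def exact      #"^(?:ba[rz]|foo|quux)$")   ; anchored rendering, as printed by the CLI examples
;; Assumption: :whole-words? behaves like wrapping the pattern in \b ... \b.
(def whole-word #"\b(?:ba[rz]|foo|quux)\b")

(re-find unanchored "foobar")   ; finds "foo" inside a longer string
(re-find exact "foobar")        ; nil, the whole input is not one of the strings
(re-find exact "foo")           ; "foo"
(re-find whole-word "a foo b")  ; "foo", delimited by word boundaries
(re-find whole-word "foobar")   ; nil, "foo" is not a whole word here
```

The unanchored form is convenient for searching inside larger text, while the anchored form is the safer choice when validating a single value.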

Command line usage

frak can be used from the command line with either Leiningen or Node.js.

With Leiningen

Use the lein run command:

$ lein run -e foo bar baz quux
^(?:ba[rz]|foo|quux)$

With Node.js

Compile the Node.js version, make the script executable, and run it:

$ lein do cljx once, cljsbuild once node
$ chmod +x bin/frak
$ bin/frak -e foo bar baz quux
^(?:ba[rz]|foo|quux)$

Browser usage

To use frak as a standalone JavaScript library in the browser, compile the browser version:

$ lein do cljx once, cljsbuild once browser
$ mv ./target/js/frak.min.js <destination>

Try it using this HTML:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
  <pre>Input: <span id="input"></span></pre>
  <pre>Output: <span id="output"></span></pre>
  <script src="http://code.jquery.com/jquery-2.0.3.min.js"></script>
  <script src="frak.min.js"></script>
  <script>
    var strings = ["foo", "bar", "baz", "quux"];
    // It's a good idea to use the `"exact?"` option.
    var pattern = frak.pattern(strings, {"exact?": true});
    jQuery("#input").text(strings.join(" "));
    jQuery("#output").text(pattern);
  </script>
</body>
</html>

For even more fun try it with AngularJS!

How?

A frak pattern is constructed from a trie of characters and a renderer which processes it. As characters are added to the trie, data such as which characters are terminal is stored in its branches.

During the rendering process, frak analyzes each branch and attempts to emit the most concise regular expression possible. Additional post-processing operations are applied after rendering to improve the expression where possible.
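
The trie-then-render idea can be sketched in a few lines of plain Clojure. This is a simplified illustration, not frak's actual implementation: the names trie-add, render, escape-char, and sketch-pattern are hypothetical, the trie is a nested map keyed by character with a :terminal? flag, and the renderer skips the optimizations frak performs (for example, it emits (?:r|z) where frak emits [rz]).

```clojure
(require '[clojure.string :as str])

(defn trie-add
  "Insert word into a nested-map trie, flagging the node where it ends."
  [trie word]
  (assoc-in trie (conj (vec word) :terminal?) true))

(defn escape-char
  "Backslash-escape anything that is not a letter or digit."
  [ch]
  (if (Character/isLetterOrDigit (char ch))
    (str ch)
    (str "\\" ch)))

(defn render
  "Render a trie node as a regex fragment: one alternative per child,
   made optional when a word may also end at this node."
  [node]
  (let [children (sort-by key (dissoc node :terminal?))
        alts     (for [[ch child] children]
                   (str (escape-char ch) (render child)))]
    (cond
      (empty? alts)      ""
      (:terminal? node)  (str "(?:" (str/join "|" alts) ")?")
      (= 1 (count alts)) (first alts)
      :else              (str "(?:" (str/join "|" alts) ")"))))

(defn sketch-pattern
  "Build a trie from words and render it as a compiled pattern."
  [words]
  (re-pattern (render (reduce trie-add {} words))))

(def sample ["foo" "bar" "baz" "quux" "skill" "skills"])
(def sample-re (sketch-pattern sample))

(every? #(re-matches sample-re %) sample) ; every input string matches
(re-matches sample-re "skil")             ; nil, not one of the inputs
```

Because sibling branches in a trie always start with different characters, the matcher can usually pick the right alternative from the next character alone, which is how sharing prefixes reduces backtracking.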

Why?

Here's why. Also because.

And now for something completely different

Let's build a regular expression for matching any word in /usr/share/dict/words.

user> (require '[clojure.java.io :as io])
nil
user> (def words
           (-> (io/file "/usr/share/dict/words")
               io/reader
               line-seq))
#'user/words
user> (def word-re (frak/pattern words))
#'user/word-re
user> (every? #(re-matches word-re %) words)
true

The last two operations will take a moment since there are over 235,000 words to consider.

You can view the full expression here (it's approximately 1.5M!).

Benchmarks

(require '[clojure.java.io :as io]
         'clojure.string
         'frak)
(use 'criterium.core)

(def words
  (-> (io/file "/usr/share/dict/words")
      io/reader
      line-seq))

(defn naive-pattern
  "Create a naive regular expression pattern for matching every string
   in strs."
  [strs]
  (->> strs
       (clojure.string/join "|")
       (format "(?:%s)")
       re-pattern))

;; Take the first 10000 words, shuffle them, and build a naive pattern
;; and a frak pattern from them.
(def ws (shuffle (take 10000 words)))

(def n-pat (naive-pattern ws))
(def f-pat (frak/pattern ws))

;; Verify the naive pattern matches everything it was constructed from.
(every? #(re-matches n-pat %) ws)
;; => true

;; Shuffle the words again since the naive pattern is built in the
;; same order as its inputs.
(def ws' (shuffle ws))

;;;; Benchmarks

;; Naive pattern

(bench (doseq [w ws'] (re-matches n-pat w)))
;;             Execution time mean : 1.499489 sec
;;    Execution time std-deviation : 181.365166 ms
;;   Execution time lower quantile : 1.337817 sec ( 2.5%)
;;   Execution time upper quantile : 1.828733 sec (97.5%)

;; frak pattern

(bench (doseq [w ws'] (re-matches f-pat w)))
;;             Execution time mean : 155.515855 ms
;;    Execution time std-deviation : 5.663346 ms
;;   Execution time lower quantile : 148.168855 ms ( 2.5%)
;;   Execution time upper quantile : 164.164294 ms (97.5%)