About — Technical Rhymer

Why search by sound, not spelling

Rhyme lives in pronunciation, and English spelling is a famously unreliable witness to it: colonel rhymes with kernel, through with threw, and cough, though, and tough agree on almost nothing. Letter-based rhyme tools inherit all of that noise. Technical Rhymer instead searches over phonemes — the actual sounds — written in ARPABET, the notation used by the CMU Pronouncing Dictionary: orange is AO1 R AH0 N JH.

Digits on vowels mark stress: 1 primary, 2 secondary, 0 unstressed. A perfect rhyme is "same sounds from the last stressed vowel to the end", which is exactly a suffix match on the phoneme string — that one observation is the whole design. Look a word up, take the tail you care about, and search it.
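As a rough sketch of that observation in Python (the language of the build scripts), the hypothetical helper `rhyme_tail` below cuts a pronunciation at its last stressed vowel, and two words count as perfect rhymes when one ends with the other's tail. Treating both primary (1) and secondary (2) stress as "stressed" is an assumption, and the small `PRON` table is illustrative rather than the site's real data.

```python
# Toy pronunciations in CMUdict-style ARPABET (illustrative, not the real data files).
PRON = {
    "orange":   ["AO1", "R", "AH0", "N", "JH"],
    "treasure": ["T", "R", "EH1", "ZH", "ER0"],
    "measure":  ["M", "EH1", "ZH", "ER0"],
    "kernel":   ["K", "ER1", "N", "AH0", "L"],
    "colonel":  ["K", "ER1", "N", "AH0", "L"],
}

def is_stressed_vowel(phone):
    # Assumption: stress digits 1 and 2 both count as "stressed".
    return phone[-1] in "12"

def rhyme_tail(phones):
    """Return the phonemes from the last stressed vowel to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if is_stressed_vowel(phones[i]):
            return phones[i:]
    return list(phones)  # no stressed vowel: fall back to the whole word

def perfect_rhymes(word):
    """Other words whose phoneme string ends with this word's rhyme tail."""
    tail = rhyme_tail(PRON[word])
    return sorted(w for w, p in PRON.items()
                  if w != word and p[-len(tail):] == tail)

print(perfect_rhymes("treasure"))
print(perfect_rhymes("orange"))
```

In this toy table, treasure finds measure, while orange (whose tail is the whole word) finds nothing.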

How matching works

Every entry is stored with its phoneme string, and a search simply scans all of them: a linear pass over ~145k pronunciations takes tens of milliseconds, so no clever index is needed. The match modes anchor the fragment: Ends with is the rhyme case, Starts with finds alliterative twins, and Anywhere and Exact do what they say. Two operators build richer patterns: * requires its parts in order with anything between them (AO R * JH), and | requires its parts in any order. Listing a part twice (B | B) demands that it occur twice.
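The modes and operators can be sketched in a few lines of Python over lists of stress-stripped phonemes. Every function name here is hypothetical, and the `|` check is a simplification: it counts each part independently, so it does not stop two different parts from overlapping. The site's actual matcher may differ.

```python
from collections import Counter

def find_part(seq, part, start=0):
    """Index of the first occurrence of `part` in `seq` at or after `start`, or -1."""
    for i in range(start, len(seq) - len(part) + 1):
        if seq[i:i + len(part)] == part:
            return i
    return -1

def count_part(seq, part):
    """Number of non-overlapping occurrences of `part` in `seq`."""
    count, pos = 0, find_part(seq, part)
    while pos != -1:
        count += 1
        pos = find_part(seq, part, pos + len(part))
    return count

def match_fragment(seq, frag, mode):
    """The four anchor modes for a single, non-empty fragment."""
    if mode == "ends":
        return seq[-len(frag):] == frag
    if mode == "starts":
        return seq[:len(frag)] == frag
    if mode == "anywhere":
        return find_part(seq, frag) != -1
    if mode == "exact":
        return seq == frag
    raise ValueError(f"unknown mode: {mode}")

def match_in_order(seq, parts):
    """`*`: every part must occur, left to right, with anything between."""
    pos = 0
    for part in parts:
        hit = find_part(seq, part, pos)
        if hit == -1:
            return False
        pos = hit + len(part)
    return True

def match_any_order(seq, parts):
    """`|`: every part must occur somewhere; a part listed twice must occur twice."""
    needed = Counter(tuple(p) for p in parts)
    return all(count_part(seq, list(p)) >= n for p, n in needed.items())
```

With orange as `["AO", "R", "AH", "N", "JH"]`, the pattern AO R * JH corresponds to `match_in_order(orange, [["AO", "R"], ["JH"]])`, which succeeds, while the reversed order only succeeds under `match_any_order`.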

Stress is ignored by default: requiring stress digits to match eliminates most of the slant rhymes people actually want, so exact-stress matching is opt-in. Double rhymes (words where the searched segment occurs two or more times, like rat-a-tat for AE T) get their own section with a ×N badge, since a repeated rhyme is usually the better find.
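A minimal sketch of both ideas, assuming stress is dropped by stripping the trailing digit and that repeats are counted without overlap. The function names and the rat-a-tat pronunciation are illustrative guesses, not taken from the site's data.

```python
def normalize(phones, exact_stress=False):
    """Drop stress digits unless exact-stress matching was requested."""
    return list(phones) if exact_stress else [p.rstrip("012") for p in phones]

def count_segment(phones, segment, exact_stress=False):
    """Count non-overlapping occurrences of `segment` in `phones`."""
    seq = normalize(phones, exact_stress)
    seg = normalize(segment, exact_stress)
    if not seg:
        return 0
    count, i = 0, 0
    while i + len(seg) <= len(seq):
        if seq[i:i + len(seg)] == seg:
            count += 1
            i += len(seg)
        else:
            i += 1
    return count

def split_results(entries, segment):
    """Partition (word, phones) pairs into double rhymes (with their xN count) and the rest."""
    doubles, singles = [], []
    for word, phones in entries:
        n = count_segment(phones, segment)
        if n >= 2:
            doubles.append((word, n))
        elif n == 1:
            singles.append(word)
    return doubles, singles

ENTRIES = [
    ("rat-a-tat", ["R", "AE1", "T", "AH0", "T", "AE1", "T"]),  # assumed pronunciation
    ("cat", ["K", "AE1", "T"]),
]
print(split_results(ENTRIES, ["AE", "T"]))
```

Here rat-a-tat lands in the doubles list with a count of 2 and cat stays in the ordinary results.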

Fuzzy matching (opt-in) treats like-sounding consonants as interchangeable for slant rhymes. The classes are grouped by manner and voicing of articulation — so T↔K match but T↔D (a voicing change) do not:

Voiceless stops	`P T K`
Voiced stops	`B D G`
Voiceless sibilants	`S SH CH`
Voiced sibilants	`Z ZH JH`
Voiceless fricatives	`F TH`
Voiced fricatives	`V DH`
Nasals	`M N NG`
Liquids	`L R`
Glides	`W Y`

Vowels are never fuzzed: vowel identity is most of what makes a rhyme feel like one.

Those are the defaults, not dogma: the gear next to the Fuzzy toggle opens an editor where you can regroup the consonants however your ear likes (merge voicings, split the sibilants, or set a sound loose so it only matches itself). Custom groups persist in your browser, and Reset to defaults brings back the table above.

You never have to type ARPABET cold: looked-up pronunciations drop into the search box with a click, results highlight exactly which sounds matched, and a tap-to-build phoneme keyboard under the search box shows every sound with an example word.
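One way to picture the default consonant classes from the table above is as a lookup from phoneme to group number, with two sounds matching when they are identical or share a group. This is a sketch under that assumption: `build_class_map`, `phones_match`, and `ends_with_fuzzy` are hypothetical names, and inputs are taken to be stress-stripped already.

```python
DEFAULT_CLASSES = [
    ["P", "T", "K"],     # voiceless stops
    ["B", "D", "G"],     # voiced stops
    ["S", "SH", "CH"],   # voiceless sibilants
    ["Z", "ZH", "JH"],   # voiced sibilants
    ["F", "TH"],         # voiceless fricatives
    ["V", "DH"],         # voiced fricatives
    ["M", "N", "NG"],    # nasals
    ["L", "R"],          # liquids
    ["W", "Y"],          # glides
]

def build_class_map(classes):
    """Map each consonant to the index of the group it belongs to."""
    return {phone: idx for idx, group in enumerate(classes) for phone in group}

def phones_match(a, b, class_map, fuzzy=True):
    """Identical sounds always match; with fuzzy on, so do sounds in one group."""
    if a == b:
        return True
    if not fuzzy:
        return False
    group_a, group_b = class_map.get(a), class_map.get(b)
    # Vowels are absent from the map, so they only ever match themselves.
    return group_a is not None and group_a == group_b

def ends_with_fuzzy(seq, frag, class_map):
    """'Ends with' matching where each position is compared fuzzily."""
    if len(frag) > len(seq):
        return False
    tail = seq[len(seq) - len(frag):]
    return all(phones_match(a, b, class_map) for a, b in zip(tail, frag))

class_map = build_class_map(DEFAULT_CLASSES)
print(phones_match("T", "K", class_map), phones_match("T", "D", class_map))
```

Regrouping is then just a different list of lists: passing a custom table that puts `T` and `D` in one group to `build_class_map` makes the voicing pair match, and a sound left out of every group only matches itself.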

Ranking: most common first

A rhyme search for a short tail can return thousands of words, most of which nobody would use. Results are therefore ranked by commonality using the wordfreq project's Zipf scale, the base-10 logarithm of how often a word occurs per billion words: ~7 is "the" and ~1 is deep obscurity. The thin bar under each result is that score. Usable rhymes surface first; the exotic tail is still there at the bottom. Sorting by syllable count or alphabetically is one click away.
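The ranking itself is a plain sort. In the sketch below the Zipf scores are made-up placeholders (in wordfreq itself they would come from `zipf_frequency(word, "en")`), and both function names are hypothetical. `zipf_from_proportion` only illustrates the scale: a word making up one in a thousand tokens scores about 6.

```python
import math

def zipf_from_proportion(p):
    """Zipf score: log10 of occurrences per billion words, for a proportion p > 0."""
    return math.log10(p) + 9

def rank_by_commonality(words, zipf):
    """Most common first; ties and unknown words fall back to alphabetical order."""
    return sorted(words, key=lambda w: (-zipf.get(w, 0.0), w))

# Placeholder scores for illustration only, not real wordfreq values.
SCORES = {"measure": 5.0, "treasure": 4.6, "embrasure": 1.5}

print(rank_by_commonality(["embrasure", "treasure", "zzz", "measure"], SCORES))
```

Unknown words are given a score of 0 here so that they sink to the bottom instead of disappearing, which matches the idea that the exotic tail stays available.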

The word-sense filter

Rhyming is usually rhyming about something. The optional filter keeps only results related to a word you name — searching EH ZH ER while writing about pirates keeps treasure. It has two engines:

AI mode — the top candidates are sent to a small relay we run, which asks Claude which of them are related / synonyms / opposites. No key, no account, nothing to configure — the API key lives on the relay, never in your browser. Requests are rate-limited and budget-capped (it's a free site). This is the primary engine because real-world association ("pirates" → "treasure") is exactly what language models are good at and word vectors are mediocre at.
Offline mode — if the AI is busy or over budget, "related" falls back to GloVe word vectors (50-dimensional, int8-quantized to keep the file small) with a cosine-similarity cutoff, and "synonym" / "opposite" use WordNet. The split exists because embeddings famously place antonyms close together (hot and cold appear in identical contexts), so cosine similarity can't tell likeness from opposition — a curated lexicon can.
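For the offline "related" check, a sketch of int8 quantization plus a cosine cutoff might look like this. The 4-dimensional vectors, the per-vector scaling scheme, the 0.5 cutoff, and all names are assumptions for illustration; the real bundle holds 50-dimensional GloVe vectors and may quantize them differently.

```python
import math

def quantize_int8(vec):
    """Scale a float vector so its largest component maps to +/-127, then round."""
    peak = max(abs(x) for x in vec) or 1.0
    return [round(x / peak * 127) for x in vec]

def cosine(a, b):
    """Cosine similarity; a per-vector scale factor cancels out of this ratio."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def related(word, candidates, vectors, cutoff=0.5):
    """Keep candidates whose vector is at least `cutoff` cosine-similar to `word`."""
    target = vectors[word]
    return [c for c in candidates
            if c in vectors and cosine(target, vectors[c]) >= cutoff]

# Made-up toy vectors, quantized once as a build step would do.
FLOAT_VECTORS = {
    "pirate":   [0.9, 0.1, -0.3, 0.2],
    "treasure": [0.8, 0.2, -0.2, 0.1],
    "leisure":  [-0.1, 0.9, 0.4, -0.5],
}
INT8_VECTORS = {w: quantize_int8(v) for w, v in FLOAT_VECTORS.items()}

print(related("pirate", ["treasure", "leisure"], INT8_VECTORS))
```

Because cosine similarity ignores vector length, the scale used during quantization never has to be stored; only rounding error remains, which is small next to any sensible cutoff.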

The dictionary

Three layers, each honestly labeled in the UI:

CMU Pronouncing Dictionary — ~135k standard-English words with curated pronunciations. The backbone.
Urban Dictionary top 10,000 (gold UD badge) — the highest-net-voted slang terms. Pronunciations for terms CMUdict lacks are auto-generated with a grapheme-to-phoneme model, so they're approximate. UD content is user-submitted and can be crude or NSFW; it's included as-is, with vote scores and links back to the source for attribution.
New words of the 2020s (teal ’2x badge) — the 2,000 top-rated terms whose first meaningful Urban Dictionary definition appeared in 2020 or later and that don't exist in CMUdict. "First meaningful" matters: a raw first-definition-date rule would exclude rizz because someone defined it as a nickname in 2015 (scored −50, against 626 for the real 2023 definition), so a term qualifies as long as any pre-2020 definitions are marginal next to its 2020s peak. Pronunciations are approximate: grapheme-to-phoneme for pronounceable terms, letter names for vowel-less acronyms (tds → T IY1 D IY1 EH1 S). The source dataset ends in November 2023, so the newest of the new isn't here yet.
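The two rules in the 2020s layer can be sketched as follows. The 10% "marginal" threshold is a guess (the text above does not give a number), the letter-name table is a partial, hand-written assumption, and the function names are hypothetical.

```python
MARGINAL_RATIO = 0.1  # assumed: older definitions must score at most 10% of the 2020s peak

def qualifies_as_2020s(definitions, in_cmudict):
    """definitions: (year, net_votes) pairs for one term."""
    if in_cmudict:
        return False
    recent = [votes for year, votes in definitions if year >= 2020]
    older = [votes for year, votes in definitions if year < 2020]
    if not recent or max(recent) <= 0:
        return False
    peak = max(recent)
    return all(votes <= MARGINAL_RATIO * peak for votes in older)

# Partial table of consonant letter names in ARPABET (illustrative).
LETTER_NAMES = {
    "b": ["B", "IY1"], "c": ["S", "IY1"], "d": ["D", "IY1"], "f": ["EH1", "F"],
    "g": ["JH", "IY1"], "k": ["K", "EY1"], "l": ["EH1", "L"], "m": ["EH1", "M"],
    "n": ["EH1", "N"], "p": ["P", "IY1"], "r": ["AA1", "R"], "s": ["EH1", "S"],
    "t": ["T", "IY1"], "v": ["V", "IY1"], "z": ["Z", "IY1"],
}

def spell_out(acronym):
    """Pronounce a vowel-less acronym letter by letter."""
    phones = []
    for letter in acronym.lower():
        phones.extend(LETTER_NAMES[letter])
    return phones

print(qualifies_as_2020s([(2015, -50), (2023, 626)], in_cmudict=False))
print(" ".join(spell_out("tds")))
```

With the rizz figures quoted above, the 2015 definition at −50 is marginal next to the 2023 peak of 626, so the term qualifies; a term whose pre-2020 definition scored 500 would not.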

Engineering: aggressively static

The site is static files — no database, no framework, no build step at runtime. All dictionary data (~15 MB across six bundles) is baked into plain JavaScript files loaded with <script> tags, which means search works offline and even opened directly from disk over file://. The heavy word-vector and WordNet bundles are lazy-loaded only if the offline sense filter is actually used. The single exception to "no server" is the word-sense relay described above — a fixed-function endpoint that exists purely so the AI key never ships to your browser.
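Baking data into a script-loadable file can be as simple as wrapping JSON in an assignment. The sketch below shows one way a build script might do it; the global name `RHYME_DATA` and the function `bake_bundle` are hypothetical, not the site's actual names.

```python
import json

def bake_bundle(global_name, data):
    """Render `data` as a JavaScript file that assigns it to a global variable."""
    # Compact separators keep the bundle small; default ASCII escaping keeps it
    # safe to embed regardless of the page's character encoding.
    payload = json.dumps(data, separators=(",", ":"))
    return f"window.{global_name} = {payload};\n"

js_source = bake_bundle("RHYME_DATA", {"orange": "AO1 R AH0 N JH"})
print(js_source, end="")
```

A file like this loads through an ordinary <script> tag, which is what lets the page work from file://, where browsers typically refuse to fetch local JSON.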

Data files are regenerated by small Python scripts in build/ (CMUdict parsing, wordfreq scores, GloVe quantization, WordNet extraction, Urban Dictionary ranking and grapheme-to-phoneme). The runtime never depends on them. Pinned result tabs and your preferences (fuzzy matching, stress handling, sort, match mode, panels) persist in localStorage only — no cookies — and stick across visits. Searches also sync into the address bar (?q=AH+N+JH), so any query is shareable as a plain link; opening someone else's link never overwrites your own preferences.
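The address-bar form is standard query-string encoding, where spaces between phonemes become `+`. The site does this in the browser; the sketch below reproduces the same round trip with Python's `urllib.parse`, using a placeholder base URL and hypothetical function names.

```python
from urllib.parse import parse_qs, urlencode

def query_to_link(base_url, phonemes):
    """Build a shareable link whose `q` parameter holds the phoneme query."""
    return f"{base_url}?{urlencode({'q': ' '.join(phonemes)})}"

def link_to_query(query_string):
    """Recover the phoneme list from the part of a link after the `?`."""
    values = parse_qs(query_string).get("q", [""])
    return values[0].split()

print(query_to_link("https://example.invalid/", ["AH", "N", "JH"]))
print(link_to_query("q=AH+N+JH"))
```

Keeping only the query in the link, and preferences in localStorage, is what would let a shared link reproduce a search without touching the recipient's settings.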

Privacy

No accounts, no analytics, no cookies, no tracking. Nothing you type leaves your machine, with one explicit exception: when you use the word-sense filter's AI mode, the filter word and the candidate rhyme list (just words — never your searches or anything else) are sent to our relay and on to Anthropic to be judged. They aren't tied to any identity, and there's nothing else to send — the site has no accounts to associate them with.

Credits

CMU Pronouncing Dictionary — pronunciations (ARPABET).
wordfreq — Zipf commonality scores.
GloVe — offline word vectors (glove-wiki-gigaword-50).
WordNet — synonyms and antonyms.
g2p_en — grapheme-to-phoneme for words CMUdict doesn't know.
Lucide — icons.