About overbuilt. — 21 Search Algorithms

// 21 algorithms for one search bar. you're welcome.

Most search engines are simple. I decided to make this one ridiculously complex. It only has to filter 1,000 static JSON entries, yet it runs 21 parallel algorithms in real time in your browser. It combines exact matching, fuzzy logic, semantics, and personalization. Yes, it is beautifully over-engineered.

Core Matching

Exact Match

Core

The simplest and strongest signal. Fires when the user types the exact full name of the target.

"github" → GitHub

Prefix Match

Core

Matches the start of a name. Rewards matches that cover more of the total word length.
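
A minimal sketch of the coverage idea, assuming the score is simply query length divided by name length. The page does not publish the real formula, and prefixScore() is a hypothetical name.

```typescript
// Hypothetical scorer: assumes score = query length / name length
// whenever the name starts with the query, and 0 otherwise.
function prefixScore(query: string, name: string): number {
  const q = query.trim().toLowerCase();
  const n = name.trim().toLowerCase();
  if (q.length === 0 || n.indexOf(q) !== 0) return 0;
  return q.length / n.length;
}
```

Under this assumption "face" covers half of "Facebook", while a full-name query covers all of it, so longer prefixes rank higher.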

"face" → Facebook

Word Prefix

Core

Matches the prefix of any individual word inside a multi-word target name.
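
One way this could look, assuming words are separated by whitespace. wordPrefixMatch() is a hypothetical helper, not the engine's actual function.

```typescript
// Returns true when any whitespace-separated word of the name
// starts with the query (case-insensitive).
function wordPrefixMatch(query: string, name: string): boolean {
  const q = query.trim().toLowerCase();
  if (q.length === 0) return false;
  return name
    .toLowerCase()
    .split(/\s+/)
    .some((word) => word.indexOf(q) === 0);
}
```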

"dri" → Google Drive

Substring (LCS)

Core

Uses Longest Common Substring logic to find continuous character blocks anywhere in the target.
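
A sketch of the textbook dynamic-programming approach to longest common substring, keeping only two rows of the table. How the engine turns the block length into a score is not stated, so this only returns the shared block.

```typescript
// Classic longest-common-substring DP with a rolling row.
// curr[j] holds the length of the common block ending at s[i-1] and t[j-1].
function longestCommonSubstring(a: string, b: string): string {
  const s = a.toLowerCase();
  const t = b.toLowerCase();
  let prev: number[] = new Array<number>(t.length + 1).fill(0);
  let best = 0;
  let bestEnd = 0; // exclusive end index of the best block inside s
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = new Array<number>(t.length + 1).fill(0);
    for (let j = 1; j <= t.length; j++) {
      if (s.charAt(i - 1) === t.charAt(j - 1)) {
        const len = (prev[j - 1] ?? 0) + 1;
        curr[j] = len;
        if (len > best) {
          best = len;
          bestEnd = i;
        }
      }
    }
    prev = curr;
  }
  return s.slice(bestEnd - best, bestEnd);
}
```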

"tube" → YouTube

Abbreviation Expansion

Core

A curated map of highly specific, manually verified abbreviations distinct from auto-generated acronyms.

"yt" → YouTube

Fuzzy & Edge-Cases

Keyboard Typo

Fuzzy

Calculates spatial distance on a QWERTY keyboard. A slip of the finger (like pressing 's' instead of 'a') gets penalized less than a completely wrong key.
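
A rough sketch of the idea, under loud assumptions: the key geometry is approximated (each row shifted half a key to the right), the 0.5 and 1 penalties are made up for illustration, and only same-length substitutions are handled.

```typescript
// Approximate QWERTY geometry: each row is shifted half a key to the right
// of the row above it. Real keyboards stagger a little differently.
const KEY_POSITIONS = new Map<string, { x: number; y: number }>();
["qwertyuiop", "asdfghjkl", "zxcvbnm"].forEach((row, y) => {
  row.split("").forEach((ch, x) => {
    KEY_POSITIONS.set(ch, { x: x + y * 0.5, y: y });
  });
});

// Euclidean distance between two keys; Infinity for keys not on the map.
function keyDistance(a: string, b: string): number {
  const pa = KEY_POSITIONS.get(a);
  const pb = KEY_POSITIONS.get(b);
  if (pa === undefined || pb === undefined) return Infinity;
  const dx = pa.x - pb.x;
  const dy = pa.y - pb.y;
  return Math.sqrt(dx * dx + dy * dy);
}

// Hypothetical costs: 0 for the same key, 0.5 for a neighbouring key, 1 otherwise.
function substitutionCost(typed: string, expected: string): number {
  if (typed === expected) return 0;
  return keyDistance(typed, expected) <= 1.5 ? 0.5 : 1;
}

// Sums substitution costs for equal-length strings only (no inserts or deletes).
function typoCost(query: string, target: string): number {
  const q = query.toLowerCase();
  const t = target.toLowerCase();
  if (q.length !== t.length) return Infinity;
  let cost = 0;
  for (let i = 0; i < q.length; i++) {
    cost += substitutionCost(q.charAt(i), t.charAt(i));
  }
  return cost;
}
```

With these numbers a neighbouring-key slip costs half as much as a far-away key, which is the behaviour the card describes.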

"amaxon" (x is near z) → Amazon

N-Gram Similarity

Fuzzy

Breaks text into trigrams (3-letter chunks) and compares overlap. Brilliant at catching jumbled letters or missed spaces.
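
A small sketch of trigram overlap. The page does not say which overlap measure is used, so this assumes the Dice coefficient over sets of unique trigrams, with whitespace removed first so missed spaces do not matter.

```typescript
// Collects the unique 3-character chunks of a string, ignoring case and spaces.
// Strings shorter than three characters are kept as a single chunk.
function trigrams(text: string): Set<string> {
  const s = text.toLowerCase().replace(/\s+/g, "");
  const grams = new Set<string>();
  if (s.length > 0 && s.length < 3) grams.add(s);
  for (let i = 0; i + 3 <= s.length; i++) {
    grams.add(s.slice(i, i + 3));
  }
  return grams;
}

// Dice coefficient: 2 * shared / (sizeA + sizeB), in the range 0..1.
function trigramSimilarity(a: string, b: string): number {
  const ga = trigrams(a);
  const gb = trigrams(b);
  if (ga.size === 0 || gb.size === 0) return 0;
  let shared = 0;
  ga.forEach((g) => {
    if (gb.has(g)) shared++;
  });
  return (2 * shared) / (ga.size + gb.size);
}
```

Here "linkeidn" and "LinkedIn" share three of their six trigrams each (lin, ink, nke), giving 0.5.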

"linkeidn" → LinkedIn

Phonetic (Soundex)

Fuzzy

Encodes words based on how they sound when spoken aloud in English. Catches massive spelling errors as long as they sound similar.
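
For reference, a compact sketch of standard American Soundex: keep the first letter, map the remaining consonants to digits, collapse repeats, and pad to four characters. Whether the engine uses exactly this variant is an assumption.

```typescript
// Standard Soundex digit groups. Vowels, y, h and w have no digit.
const SOUNDEX_CODES: { [letter: string]: string } = {
  b: "1", f: "1", p: "1", v: "1",
  c: "2", g: "2", j: "2", k: "2", q: "2", s: "2", x: "2", z: "2",
  d: "3", t: "3",
  l: "4",
  m: "5", n: "5",
  r: "6",
};

function soundex(word: string): string {
  const letters = word.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return "";
  let code = letters.charAt(0).toUpperCase();
  // Digit of the previous coded letter; "" after a vowel.
  let last = SOUNDEX_CODES[letters.charAt(0)] || "";
  for (let i = 1; i < letters.length && code.length < 4; i++) {
    const ch = letters.charAt(i);
    if (ch === "h" || ch === "w") continue; // h and w do not separate equal digits
    const digit = SOUNDEX_CODES[ch] || "";
    if (digit !== "" && digit !== last) code += digit;
    last = digit;
  }
  while (code.length < 4) code += "0";
  return code;
}
```

Both "fasebuk" and "Facebook" encode to F212, so the two compare as equal despite sharing few letters.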

"fasebuk" → Facebook

Acronym Match

Fuzzy

Extracts the first letter of each word in a multi-word target to match against user initials.
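
A minimal sketch, assuming words are split on spaces and hyphens and that single-word names never count as acronyms. Both function names are hypothetical.

```typescript
// Builds the lowercase initials of a multi-word name.
function acronymOf(name: string): string {
  return name
    .split(/[\s-]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toLowerCase())
    .join("");
}

// Matches only when the name has at least two words and the
// initials equal the query exactly.
function acronymMatch(query: string, name: string): boolean {
  const initials = acronymOf(name);
  return initials.length >= 2 && initials === query.trim().toLowerCase();
}
```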

"aws" → Amazon Web Services

Character Transposition

Fuzzy

Implements strict Damerau-Levenshtein transposition checks to perfectly identify when two adjacent characters are swapped.
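
A sketch of just the transposition case, not a full Damerau-Levenshtein distance: it answers whether the query is the target with exactly one pair of adjacent characters swapped. isAdjacentSwap() is a hypothetical name.

```typescript
// True when query and target differ by exactly one swap of two
// adjacent characters (case-insensitive), and are otherwise identical.
function isAdjacentSwap(query: string, target: string): boolean {
  const q = query.toLowerCase();
  const t = target.toLowerCase();
  if (q.length !== t.length) return false;
  let i = 0;
  while (i < q.length && q.charAt(i) === t.charAt(i)) i++;
  // Identical strings, or a difference in the last character only.
  if (i >= q.length - 1) return false;
  const swapped =
    q.charAt(i) === t.charAt(i + 1) && q.charAt(i + 1) === t.charAt(i);
  return swapped && q.slice(i + 2) === t.slice(i + 2);
}
```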

"googel" → Google

Negative Matching

Fuzzy

Penalizes results if the user types multiple words (e.g., 'google docs') and the target only contains one of them, effectively burying mismatched pairs.
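
One possible shape for the penalty, assuming it is a multiplier equal to the fraction of query words found in the target name. The real weighting is not documented, and negativePenalty() is a hypothetical name.

```typescript
// Returns a multiplier in 0..1. Single-word queries are never penalized.
// Assumption: the penalty is the share of query words present in the name.
function negativePenalty(query: string, name: string): number {
  const queryWords = query
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0);
  if (queryWords.length < 2) return 1;
  const nameWords = name.toLowerCase().split(/\s+/);
  const matched = queryWords.filter((word) => nameWords.indexOf(word) >= 0).length;
  return matched / queryWords.length;
}
```

Multiplying a result's score by this value halves Google Maps for the query 'google docs' while leaving Google Docs untouched.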

"google docs" ⊘ Google Maps

Semantic Context

Synonym Match

Semantic

Uses an internal graph of related concepts. If you search for an intent, it finds the tool.

"email" → Gmail, Outlook

Keyword/Description

Semantic

Runs standard match algorithms against the target's underlying description text instead of just its name.

"music" → Spotify (music streaming)

Multi-word Analysis

Semantic

Splits the user query into individual words and ensures the target satisfies multiple concepts simultaneously.

"buy stocks" → Robinhood

Tag Match

Semantic

Direct matching against hidden categorical tags assigned to the target data.

"chat" → Slack [tag: chat]

Domain Match

Semantic

Strips out the URL scheme (https://) and the www. prefix, then explicitly scores against the root domain.
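
A sketch of the normalization step using plain regular expressions. It removes the scheme, a leading www., and anything after the host. It does not reduce deeper subdomains to a root domain, which would need a public suffix list. normalizeDomain() and domainMatch() are hypothetical names.

```typescript
// Lowercases the input and strips scheme, leading "www.", port, path,
// query string and fragment, leaving only the host.
function normalizeDomain(input: string): string {
  return input
    .trim()
    .toLowerCase()
    .replace(/^[a-z][a-z0-9+.-]*:\/\//, "") // scheme such as https://
    .replace(/^www\./, "")                  // leading www.
    .replace(/[\/?#:].*$/, "");             // port, path, query, fragment
}

function domainMatch(query: string, targetDomain: string): boolean {
  const q = normalizeDomain(query);
  return q.length > 0 && q === normalizeDomain(targetDomain);
}
```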

"figma.com" → Figma

Personalization

Popularity

Personal

Tracks global frequency. Queries that are searched often get a slight organic boost.

+ up to 15% bonus

Click-through

Personal

Associates specific queries with specific clicks. If you always click 'GitHub' when searching 'g', GitHub learns to win.

+ up to 20% bonus
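
A sketch of how such a bonus might be tracked, assuming it scales linearly with the share of clicks a target has received for a given query, capped at the 20% mentioned above. The class name and the linear scaling are assumptions.

```typescript
// Counts clicks per (query, target) pair and converts the click share
// into a bonus between 0 and maxBonus.
class ClickThroughTracker {
  private counts = new Map<string, Map<string, number>>();
  private maxBonus: number;

  constructor(maxBonus: number = 0.2) {
    this.maxBonus = maxBonus;
  }

  record(query: string, target: string): void {
    const key = query.trim().toLowerCase();
    let perTarget = this.counts.get(key);
    if (perTarget === undefined) {
      perTarget = new Map<string, number>();
      this.counts.set(key, perTarget);
    }
    perTarget.set(target, (perTarget.get(target) ?? 0) + 1);
  }

  bonus(query: string, target: string): number {
    const perTarget = this.counts.get(query.trim().toLowerCase());
    if (perTarget === undefined) return 0;
    let total = 0;
    perTarget.forEach((n) => {
      total += n;
    });
    const share = (perTarget.get(target) ?? 0) / total;
    return this.maxBonus * share;
  }
}
```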

Recency

Personal

Maintains an LRU stack of your last 20 visited targets. Recently used items are strongly preferred.

+ up to 8% bonus
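
A sketch of a most-recent-first list capped at 20 entries. The linear decay from the full 8% down toward zero is an assumption, as is the class name.

```typescript
// Keeps the most recently visited targets, newest first.
class RecencyTracker {
  private visited: string[] = [];
  private capacity: number;
  private maxBonus: number;

  constructor(capacity: number = 20, maxBonus: number = 0.08) {
    this.capacity = capacity;
    this.maxBonus = maxBonus;
  }

  // Moves the target to the front and evicts the oldest entry when full.
  visit(id: string): void {
    const existing = this.visited.indexOf(id);
    if (existing >= 0) this.visited.splice(existing, 1);
    this.visited.unshift(id);
    if (this.visited.length > this.capacity) this.visited.pop();
  }

  // Assumed decay: full bonus for the newest entry, shrinking with rank.
  bonus(id: string): number {
    const rank = this.visited.indexOf(id);
    if (rank < 0) return 0;
    return this.maxBonus * (1 - rank / this.capacity);
  }
}
```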

Time-Aware Routing

Personal

Dynamically shifts rankings based on the user's local clock. Work apps peak 9-5, media peaks in the evening, news peaks in the morning.

"docs" at 2PM → Google Docs

Session Context

Personal

Tracks short-term query sequences. If you search 'google' and immediately search 'drive', the engine cross-references them to boost 'Google Drive'.

Contextual Memory

Performance Engine

Adaptive Fast-Path

System

Bypasses the expensive O(N²) fuzzy matching algorithms when scanning massive text fields (like 300+ character descriptions) in large datasets, and falls back to a blazing-fast O(N) word-boundary regex matcher so the UI never blocks.

0.0ms Execution Time

Dictionary Memoization

System

Prevents allocation churn and garbage collection stutters by hoisting the heavy Soundex phonetic dictionaries out of the hot loop, so the browser no longer re-allocates them 4,000 times per keystroke.

0.0 MB Memory Leaks

Global Precomputation

System

N-Gram sets and Phonetic tokens are pre-computed at application load, guaranteeing zero allocation overhead during live search.

0.0ms Overhead

Input Debouncing

System

Avoids firing the search engine on every keystroke. Instead, it runs once, when the user pauses typing.

150ms Buffer
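
A generic debounce sketch with the 150ms default. The timer functions are injectable purely so the behaviour can be exercised without real waiting. That parameter shape is an assumption, not the engine's API.

```typescript
type ScheduleFn = (fn: () => void, ms: number) => unknown;
type CancelFn = (handle: unknown) => void;

// Returns a wrapper that delays fn until waitMs have passed without a new call.
// Each new call cancels the pending one, so only the last arguments are used.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number = 150,
  schedule: ScheduleFn = (f, ms) => setTimeout(f, ms),
  cancel: CancelFn = (h) => clearTimeout(h as ReturnType<typeof setTimeout>)
): (...args: A) => void {
  let handle: unknown = null;
  return (...args: A): void => {
    if (handle !== null) cancel(handle);
    handle = schedule(() => {
      handle = null;
      fn(...args);
    }, waitMs);
  };
}
```

In a page this would wrap the search callback, for example debounce(runSearch) attached to the input event, where runSearch stands in for whatever function executes the query.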

Search Query Caching

System

Memorizes previous exact search queries and instantly serves the cached result set when a user hits backspace or re-types a query.

Instant Retrieval
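
A sketch of a query cache keyed on the normalized query string. The size cap and oldest-first eviction are assumptions added to keep the sketch bounded. The page does not describe an eviction policy.

```typescript
// Caches result sets by normalized query. When full, the oldest entry is
// dropped (a Map iterates its keys in insertion order).
class QueryCache<R> {
  private entries = new Map<string, R>();
  private maxEntries: number;

  constructor(maxEntries: number = 100) {
    this.maxEntries = maxEntries;
  }

  lookup(query: string, compute: (normalized: string) => R): R {
    const key = query.trim().toLowerCase();
    if (this.entries.has(key)) return this.entries.get(key) as R;
    const result = compute(key);
    if (this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, result);
    return result;
  }
}
```

Typing 'git', then 'gith', then backspacing to 'git' would run the full search twice rather than three times.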

Zero-Allocation Arrays

System

Pre-allocates global Float32Arrays for Keyboard Distance and LCS algorithms, eliminating expensive garbage collection during typing.

O(1) Memory Setup

Lazy Deduplication

System

Stops searching for fuzzy duplicates the microsecond it fills the UI render limit, preventing O(N²) browser freezes.

O(N) Lazy Evaluator
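
A sketch of the early-exit idea, assuming results arrive already ranked and that a "fuzzy duplicate" can be approximated by comparing names with case, spaces and punctuation removed. The engine's real duplicate test is not described, and SearchHit is a hypothetical type.

```typescript
interface SearchHit {
  name: string;
  score: number;
}

// Walks the ranked list once, keeps the first hit for each normalized name,
// and stops as soon as the render limit is reached.
function takeUnique(ranked: SearchHit[], limit: number): SearchHit[] {
  const seenKeys = new Set<string>();
  const kept: SearchHit[] = [];
  for (const hit of ranked) {
    if (kept.length >= limit) break; // nothing past the limit is inspected
    const key = hit.name.toLowerCase().replace(/[^a-z0-9]/g, "");
    if (seenKeys.has(key)) continue;
    seenKeys.add(key);
    kept.push(hit);
  }
  return kept;
}
```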

Contiguous Substrings

System

Dynamically constrains the Substring matching algorithm based on word counts to prevent random syllables from inflating scores.

Strict Boundaries

Algorithm Diagnostics

UX / UI

Click the (i) button on any search result to visually reverse-engineer its exact mathematical breakdown and highlighted substring extractions.

Radical Transparency

Result Diversification

System

A Stage-3 Re-ranking pipeline that prevents generic prefix queries (like 'goo') from flooding the top 5 with products from a single company.

Exploratory Routing
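
A sketch of one simple re-ranking rule, assuming each result carries a company field and that at most two results per company may sit in the top five. Both the field and the cap of two are assumptions made for illustration.

```typescript
interface RankedHit {
  name: string;
  company: string;
  score: number;
}

// Fills the top slots in rank order while capping each company's share.
// Results that exceed the cap are deferred, in their original order,
// to just below the top slots.
function diversify(
  ranked: RankedHit[],
  topN: number = 5,
  maxPerCompany: number = 2
): RankedHit[] {
  const head: RankedHit[] = [];
  const deferred: RankedHit[] = [];
  const perCompany = new Map<string, number>();
  for (const hit of ranked) {
    const used = perCompany.get(hit.company) ?? 0;
    if (head.length < topN && used < maxPerCompany) {
      head.push(hit);
      perCompany.set(hit.company, used + 1);
    } else {
      deferred.push(hit);
    }
  }
  return head.concat(deferred);
}
```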