Hybrid search in Phoenix: fusing Meilisearch and pgvector with reciprocal-rank fusion

Keyword search and vector search each miss things the other catches. Here's how I combine them in a Phoenix app with reciprocal-rank fusion — and why it beats either one alone.

Most "AI search" ships as a single vector similarity query and calls it done. In production that quietly fails: vectors are great at meaning but weak at exact terms — names, IDs, statute numbers, error codes. Keyword search is the mirror image. The fix isn't picking one; it's fusing both.

This is the approach I use in Vidhi to return grounded, source-cited answers over a very large corpus, and it generalizes to almost any Phoenix app that needs search that actually works.

Why one retriever isn't enough

Run the same query two ways and you get two different result sets, each incomplete:

Query type	Keyword (lexical, e.g. BM25)	Vector (embeddings)
Exact term / code / name	Strong	Weak
Paraphrase / synonym	Weak	Strong
Rare token	Strong	Often missed
"Similar in spirit"	Misses	Strong

Neither column is good enough on its own. You want the union of their strengths — without letting one drown out the other.

The shape of the solution

Three steps, all inside a normal Phoenix request:

Fire both retrievers concurrently — Meilisearch for lexical, pgvector for semantic.
Fuse the two ranked lists into one, using reciprocal-rank fusion (RRF).
Return the top results (up to the requested limit) to the LLM (or the UI) with their source metadata intact.

Reciprocal-rank fusion needs no score calibration and no tuning per query. It only cares about an item's rank in each list — which makes it robust across retrievers whose raw scores aren't comparable.

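RRF scores each document by summing 1 / (k + rank) over every list it appears in, where rank is the document's 1-based position in that list and k is a smoothing constant:

# rank is the 1-based position in each result list; k damps the gap between top ranks
defp rrf(result_lists, k \\ 60) do
  result_lists
  |> Enum.flat_map(fn list ->
    list
    |> Enum.with_index(1)
    |> Enum.map(fn {doc, rank} -> {doc.id, 1.0 / (k + rank), doc} end)
  end)
  # Grouping by id merges hits for the same document from different retrievers.
  |> Enum.group_by(fn {id, _score, _doc} -> id end)
  |> Enum.map(fn {_id, entries} ->
    score = entries |> Enum.map(fn {_id, s, _doc} -> s end) |> Enum.sum()
    # Keep the first copy seen; assumes each hit already has a :score key to overwrite.
    {_id, _score, doc} = hd(entries)
    %{doc | score: score}
  end)
  |> Enum.sort_by(& &1.score, :desc)
end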
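
To make the arithmetic concrete, here is a small worked sketch that fuses two toy lists of bare ids. The module name RrfWorked and the sample ids are made up for illustration, and the sketch counts ranks from 1 with k = 60; it is a compact restatement of the scoring rule, not the production code.

```elixir
defmodule RrfWorked do
  # Returns a map of id => summed reciprocal-rank score.
  # Ranks are 1-based here (an assumption of this sketch).
  def scores(result_lists, k \\ 60) do
    for list <- result_lists,
        {id, rank} <- Enum.with_index(list, 1),
        reduce: %{} do
      acc ->
        contribution = 1.0 / (k + rank)
        Map.update(acc, id, contribution, &(&1 + contribution))
    end
  end
end

lexical = ["a", "b", "c"]
semantic = ["b", "d", "a"]

ranked =
  [lexical, semantic]
  |> RrfWorked.scores()
  |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  |> Enum.map(fn {id, _score} -> id end)

# "b" (1/62 + 1/61) edges out "a" (1/61 + 1/63); "d" (1/62) beats "c" (1/63).
# ranked == ["b", "a", "d", "c"]
```

Documents that both retrievers agree on rise to the top even when neither list ranked them first, which is the behaviour you want from a fusion step.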

Running the two retrievers concurrently keeps latency close to the slower of the two, not their sum:

def hybrid(query, opts \\ []) do
  limit = Keyword.get(opts, :limit, 20)

  # Start both retrievers at once; each returns its own ranked list.
  [lexical, semantic] =
    [
      Task.async(fn -> Meili.search(query, limit: limit) end),
      Task.async(fn -> Vector.search(embed(query), limit: limit) end)
    ]
    # Exits the calling process if either task takes longer than 5 seconds.
    |> Task.await_many(5_000)

  rrf([lexical, semantic]) |> Enum.take(limit)
end
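
One possible variation, if you would rather degrade to a single retriever than fail the whole request on a timeout: collect results with Task.yield_many/2 and shut down whatever has not replied. This is only a sketch. HybridSketch and gather/2 are hypothetical names, and the retrievers are passed in as zero-arity functions so the example stands alone.

```elixir
defmodule HybridSketch do
  # `retrievers` is a list of zero-arity functions, each returning a ranked
  # list of hits. A retriever that misses the timeout contributes [].
  # Tasks are still linked to the caller, so a retriever that crashes will
  # take the caller down with it; this sketch only handles slowness.
  def gather(retrievers, timeout \\ 5_000) do
    retrievers
    |> Enum.map(&Task.async/1)
    |> Task.yield_many(timeout)
    |> Enum.map(fn {task, result} ->
      # result is nil when the task has not replied yet; shut it down and
      # use a late reply if one arrived in the meantime.
      case result || Task.shutdown(task, :brutal_kill) do
        {:ok, hits} -> hits
        _ -> []
      end
    end)
  end
end

fast = fn -> [%{id: "a"}] end

slow = fn ->
  Process.sleep(500)
  [%{id: "b"}]
end

partial = HybridSketch.gather([fast, slow], 50)
```

The returned lists keep the order of the retrievers, so they can be handed to the fusion step unchanged; an empty list simply contributes no score.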

Things that bite you in production

A few lessons that only show up once real traffic hits:

Cap both lists before fusing. Fetch ~20 from each retriever, not 200. RRF over huge lists wastes CPU, and hits deep in the tail contribute so little score (about 1/260 at rank 200 with k = 60) that they rarely change the top results.
Deduplicate on a stable id, not on text. The same source can arrive from both retrievers with slightly different snippets.
Keep k boring. The literature's default of k = 60 is fine; resist the urge to "tune" it before you have an evaluation harness.
Measure it. This is the real point, and the subject of the next section.

Don't ship retrieval you can't measure

The hard part isn't wiring the two retrievers together — it's knowing whether the fused result is actually better. Before RRF is allowed anywhere near users, it runs against a scored set of known query → expected-source pairs, and a regression in that score blocks the change. Retrieval quality is a testable property, not a vibe. That harness is a topic on its own — and the next post.
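
As a rough illustration of what "a scored set" can mean, the sketch below computes a hit rate at k: the fraction of labeled queries whose expected source appears in the top k results. Everything here is an assumption for illustration, including the RetrievalEval module, the {query, expected_id} case shape, and the stubbed search function; the real harness is not shown in this post.

```elixir
defmodule RetrievalEval do
  # `search_fun` takes a query string and returns a ranked list of maps
  # that each carry an :id. `cases` is a non-empty list of
  # {query, expected_id} pairs (a hypothetical labeled set).
  def hit_rate(search_fun, cases, k \\ 10) when cases != [] do
    hits =
      Enum.count(cases, fn {query, expected_id} ->
        query
        |> search_fun.()
        |> Enum.take(k)
        |> Enum.any?(&(&1.id == expected_id))
      end)

    hits / length(cases)
  end
end

# Stub retriever standing in for the real hybrid search.
index = %{
  "q1" => [%{id: "a"}, %{id: "b"}],
  "q2" => [%{id: "c"}]
}

search = fn query -> Map.get(index, query, []) end

rate = RetrievalEval.hit_rate(search, [{"q1", "b"}, {"q2", "x"}], 10)
# rate == 0.5: "q1" finds its expected source, "q2" does not.
```

In a test suite, a check like this could compare the rate against a stored baseline and fail the build when it drops.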

Building something where search or RAG quality matters? I write about this because I do it — see what I build or get in touch.