Lab 6 · Retrieval signals, side by side

Hybrid search and reranking

Run the same query through BM25, a deterministic pseudo-vector index, the reciprocal-rank-fusion hybrid, and a toy reranker. The point is structural, not numeric: each signal misses a different class of query, and the ranked lists rarely line up.

Keyword (BM25)

Classical lexical scorer. Strong on exact terms, weak on paraphrases.

  1. [P-001] Account holds and blocked accounts · score 1.739 · matchedTerms: 1
  2. [P-008] Auto-approval safe path · score 1.302 · matchedTerms: 1
  3. [P-004] Watchlist accounts · score 1.004 · matchedTerms: 1
  4. [P-003] Overdue invoice escalation · score 0.948 · matchedTerms: 1
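
To make the lexical behaviour concrete, here is a minimal sketch of Okapi BM25 in Python. It is not the lab's scorer: the tokenizer, the parameters `k1=1.5` and `b=0.75`, the three-title mini corpus, and the query string are all assumptions chosen for illustration, so the numbers will not match the scores listed above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` (id -> text) against `query` with Okapi BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # BM25 only rewards exact term matches
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


# Hypothetical mini corpus built from three of the policy titles.
titles = {
    "P-001": "Account holds and blocked accounts",
    "P-004": "Watchlist accounts",
    "P-005": "Large-order approval threshold",
}
scores = bm25_scores("blocked account", titles)
```

With this tokenizer, "Watchlist accounts" scores zero for the query "blocked account" because "accounts" and "account" are different tokens. That is the exact-term strength and paraphrase weakness in miniature.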

Vector (pseudo-embedding)

Deterministic bag of character trigrams, hashed into a 64-d unit vector and compared by cosine similarity. A stand-in for a real embedding so the lesson is reproducible.

  1. [P-001] Account holds and blocked accounts · score 0.651
  2. [P-004] Watchlist accounts · score 0.602
  3. [P-006] Credit review ticket creation · score 0.563
  4. [P-003] Overdue invoice escalation · score 0.56
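
One way to build such a pseudo-embedding is sketched below. The hash function (`zlib.crc32`), the whitespace normalisation, and the names `pseudo_embed` and `cosine` are assumptions, not the lab's code. The sketch only shares the stated design: character trigrams, 64 dimensions, unit length, cosine similarity.

```python
import math
import zlib

DIM = 64  # dimensionality stated in the lab description


def pseudo_embed(text, dim=DIM):
    """Hash character trigrams into `dim` buckets and L2-normalise the counts."""
    text = " ".join(text.lower().split())  # assumed normalisation
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        trigram = text[i:i + 3]
        # crc32 is stable across runs, unlike Python's salted built-in hash()
        bucket = zlib.crc32(trigram.encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity; for unit vectors this is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


query_vec = pseudo_embed("blocked account")            # hypothetical query
near = cosine(query_vec, pseudo_embed("blocked accounts"))
far = cosine(query_vec, pseudo_embed("large order threshold"))
```

Because the vector is built from character overlap, it tolerates small surface variations such as "account" versus "accounts", but it carries no semantic knowledge the way a learned embedding would.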

Hybrid (RRF)

Reciprocal Rank Fusion of the keyword and vector lists. It combines ranks rather than raw scores, so no score normalisation is needed.

  1. [P-001] Account holds and blocked accounts · score 0.033 · keywordRank: 1 · vectorRank: 1
  2. [P-004] Watchlist accounts · score 0.032 · keywordRank: 3 · vectorRank: 2
  3. [P-003] Overdue invoice escalation · score 0.031 · keywordRank: 4 · vectorRank: 4
  4. [P-006] Credit review ticket creation · score 0.031 · keywordRank: 5 · vectorRank: 3
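
The fusion step can be sketched in a few lines. The constant `k = 60` is an assumption (it is the value commonly used for RRF; the lab does not state its own), and `rrf_fuse` is a hypothetical helper name. The input lists are the top four keyword and vector results shown above. The lab evidently fuses longer lists, so only documents present in both truncated lists get a comparable score here.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_top4 = ["P-001", "P-008", "P-004", "P-003"]
vector_top4 = ["P-001", "P-004", "P-006", "P-003"]

fused = rrf_fuse([keyword_top4, vector_top4])
fused_scores = dict(fused)
```

Under that assumption, P-001 gets 1/61 + 1/61, P-004 gets 1/63 + 1/62, and P-003 gets 1/64 + 1/64, which round to 0.033, 0.032, and 0.031. Only ranks enter the formula, which is why the BM25 and cosine scores never need to be put on a common scale.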

Hybrid + rerank

A toy cross-encoder reranks the hybrid top-k by query/title overlap, tie-broken by hybrid score. The score shown is titleOverlap plus hybridScore.

  1. [P-006] Credit review ticket creation · score 1.031 · titleOverlap: 1 · hybridScore: 0.031
  2. [P-001] Account holds and blocked accounts · score 0.033 · titleOverlap: 0 · hybridScore: 0.033
  3. [P-004] Watchlist accounts · score 0.032 · titleOverlap: 0 · hybridScore: 0.032
  4. [P-003] Overdue invoice escalation · score 0.031 · titleOverlap: 0 · hybridScore: 0.031
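
A rerank step of this shape might look like the sketch below. It assumes the final score is the title overlap count plus the hybrid score, which is consistent with the numbers listed above (1 + 0.031 = 1.031). The query string "ticket" is hypothetical, since the lab's query is not shown here, and `rerank` and `tokenize` are illustrative names.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rerank(query, hybrid_hits):
    """Re-order (doc_id, title, hybrid_score) hits by title overlap + hybrid score."""
    query_terms = set(tokenize(query))
    rescored = []
    for doc_id, title, hybrid_score in hybrid_hits:
        overlap = len(query_terms & set(tokenize(title)))
        # Overlap is an integer and hybrid scores are far below 1, so the
        # hybrid score only decides the order among equal overlaps.
        rescored.append((doc_id, overlap, overlap + hybrid_score))
    return sorted(rescored, key=lambda hit: hit[2], reverse=True)


# Hybrid top four and their rounded scores, as listed in the Hybrid (RRF) section.
hybrid_hits = [
    ("P-001", "Account holds and blocked accounts", 0.033),
    ("P-004", "Watchlist accounts", 0.032),
    ("P-003", "Overdue invoice escalation", 0.031),
    ("P-006", "Credit review ticket creation", 0.031),
]

reranked = rerank("ticket", hybrid_hits)  # hypothetical query
```

A single overlapping title term is enough to lift P-006 from fourth to first place, because an overlap of 1 outweighs every RRF score. A real cross-encoder would score the query and the passage text jointly with a trained model instead of counting shared title tokens.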

Source policy

The same eight sections used across the Agent Lab RAG demo. No external corpus, no hidden state.

  • [P-001] Account holds and blocked accounts

    When a customer account is in a blocked status, no new orders may be released regardless of available credit. The order must be held until the account hold is removed by the credit team. Sales must not promise a release date.

  • [P-002] Projected exposure and the credit limit

    Projected exposure equals current exposure plus the proposed order amount. If projected exposure exceeds the credit limit by any amount, the order is not auto-approvable. The over-limit amount must be reported in the recommendation reasons.

  • [P-003] Overdue invoice escalation

    Any open invoice that is more than 30 days past due triggers a credit review. The credit team must inspect the oldest overdue invoice. Orders should not be auto-approved while overdue invoices exist on the account.

  • [P-004] Watchlist accounts

    Accounts on the watchlist require a credit review on every new order, even when projected exposure stays within the credit limit and no invoices are overdue. Watchlist status is set by the credit team.

  • [P-005] Large-order approval threshold

    Any order whose amount exceeds one million United States dollars requires human approval before the eligibility decision is finalized. This rule applies even to read-only eligibility checks because it implies a large business decision.

  • [P-006] Credit review ticket creation

    Creating a credit review ticket is a write action and always requires explicit human approval at the gate. The ticket reason must be at least eight characters and should reference the specific failing policy rule. Tickets are persistent business records.

  • [P-007] Risk level and tightening factors

    A high risk level alone does not block an order, but it must be cited in the recommendation reasons. A combination of high risk plus any of: blocked status, watchlist, overdue invoices, or over-limit projection should escalate to a credit review.

  • [P-008] Auto-approval safe path

    An order may be auto-approved only when all of the following hold: the account is active, projected exposure stays within the credit limit, there are no overdue invoices, and the order amount is at or below the large-order threshold.