Add player name normalization helper #437
Reference
ldraney/basketball-api#437
Type
Feature
Lineage
Standalone — spawned from the westside-sheet-sync project scaffold on 2026-04-10. Blocker for the sheet_sync service module because the DB and sheet use different name formats; without normalization the sync would insert duplicates.
Repo
forgejo_admin/basketball-api
User Story
As the sheet_sync service
I want a function that normalizes player names across different formats (DB "Firstname Lastname" vs Sheet "LASTNAME, Firstname")
So that I can reliably decide whether a DB player already exists in the sheet without inserting duplicates
Ties to story:sheet-sync.
Context
The basketball-api DB stores player names in a single VARCHAR column, players.name, formatted "First Last" or "First Middle Last" (e.g., "Daniel Bryan Niyitanga", "Mateus Rigitano de Paula", "Sarah Lédio da Silva"). Marcus's Google Sheet uses "LASTNAME, Firstname" formatting (e.g., "NIYITANGA, Daniel Bryan", "RIGITANO DE PAULA, Mateus", "DA SILVA, Sarah Lédio").
For the sync to be idempotent (running it twice in a row should be a no-op if nothing has changed), we need a function that takes two strings and returns True if they refer to the same player. Challenges: the two formats differ in word order and casing, and the names include multi-word surnames, accented characters, single-name players, and suffixes such as "Jr." (the cases exercised in the Acceptance Criteria).
The right abstraction: a normalize_name(name: str) -> str function that returns a canonical form regardless of input format. Two names are "the same" if their canonical forms match.
File Targets
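As an illustration of what name_normalize.py could contain, here is a minimal sketch of the normalize_name / names_match contract described in Context. It is not a settled design. It assumes that a comma always marks the sheet's "LASTNAME, Firstname" format, that the canonical form is a lowercase, accent-free "first middle last" string, and that non-str input raises TypeError (one of the two options this issue leaves open).

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Return a canonical lowercase "first [middle] last" form (sketch)."""
    # Assumption: None and other non-str input raise TypeError.
    if not isinstance(name, str):
        raise TypeError(f"name must be str, got {type(name).__name__}")
    # Assumption: a comma means sheet format "LASTNAME, Firstname".
    # Move everything before the first comma to the end.
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    # NFKD splits accented letters into base letter + combining mark;
    # dropping the combining marks turns "é" into "e".
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Compare case-insensitively and drop punctuation such as the "." in "Jr.".
    cleaned = re.sub(r"[^\w\s]", "", stripped.casefold())
    # Collapse runs of whitespace into single spaces.
    return " ".join(cleaned.split())


def names_match(a: str, b: str) -> bool:
    """Return True if both names share the same non-empty canonical form."""
    canon_a = normalize_name(a)
    canon_b = normalize_name(b)
    # An empty canonical form never counts as a match.
    return bool(canon_a) and canon_a == canon_b
```

With this sketch, both "Daniel Bryan Niyitanga" and "NIYITANGA, Daniel Bryan" normalize to "daniel bryan niyitanga". One known limitation of the comma rule: a DB-format name written with a comma before its suffix (for example "Terrail Brown, Jr.") would be misread as sheet format.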
Files to create:
- src/basketball_api/services/name_normalize.py: contains normalize_name(name: str) -> str and names_match(a: str, b: str) -> bool.
Files to create (tests):
- tests/test_name_normalize.py: table-driven tests covering every edge case listed in Context.
Files NOT to touch:
- src/basketball_api/models.py: no schema changes.
Acceptance Criteria
- When normalize_name("Daniel Bryan Niyitanga") and normalize_name("NIYITANGA, Daniel Bryan") are called, then both return the same canonical form.
- When names_match("Mateus Rigitano de Paula", "RIGITANO DE PAULA, Mateus") is called, then it returns True.
- When names_match("Sarah Lédio da Silva", "DA SILVA, Sarah Lédio") is called, then it returns True (unicode handled).
- When names_match("Elson", "ELSON") is called, then it returns True (single-name player).
- When names_match("Terrail Brown Jr.", "BROWN JR., Terrail") is called, then it returns True (suffix handled).
- When names_match("Jace Bronson", "Jacelyn Bronson") is called, then it returns False (similar but different first names).
- When names_match("", "") is called, then it returns False (empty strings are not a match).
Test Expectations
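One way to lay out the table-driven tests is a list of (a, b, expected) rows checked against whatever names_match implementation is under test. The sketch below is illustrative only: CASES and failing_rows are hypothetical names, and the rows are adapted from the Acceptance Criteria (the first row restates the normalize_name criterion as a match).

```python
from typing import Callable, List, Tuple

Row = Tuple[str, str, bool]

# (db_name, sheet_name, expected) rows adapted from the Acceptance Criteria.
CASES: List[Row] = [
    ("Daniel Bryan Niyitanga", "NIYITANGA, Daniel Bryan", True),
    ("Mateus Rigitano de Paula", "RIGITANO DE PAULA, Mateus", True),
    ("Sarah Lédio da Silva", "DA SILVA, Sarah Lédio", True),  # unicode
    ("Elson", "ELSON", True),  # single-name player
    ("Terrail Brown Jr.", "BROWN JR., Terrail", True),  # suffix
    ("Jace Bronson", "Jacelyn Bronson", False),  # similar first names
    ("", "", False),  # empty strings are not a match
]


def failing_rows(match: Callable[[str, str], bool]) -> List[Row]:
    """Return the rows of CASES where `match` disagrees with `expected`."""
    return [(a, b, expected) for a, b, expected in CASES if match(a, b) != expected]
```

In tests/test_name_normalize.py the same table could instead be fed to pytest.mark.parametrize, so that each row reports as its own test case.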
- tests/test_name_normalize.py with at least 15 pairs covering the edge cases above. About half are expected True and about half False.
- Handle None input by raising TypeError (or returning False from names_match; pick one, document it).
- Run command: pytest tests/test_name_normalize.py -v
Constraints
- Standard library only (unicodedata, re): no new third-party dependencies.
- Use unicodedata.normalize("NFKD", s) for Unicode handling.
Checklist
Related
- westside-sheet-sync: project this affects
- story-westside-jersey-sheet-sync: user story