Normalize email on parent creation (Gmail dot + case) #418

Open
opened 2026-04-10 18:23:27 +00:00 by forgejo_admin · 0 comments

Type

Bug

Lineage

Standalone — discovered during 2026-04-10 Westside Ops Streamlit session while investigating Marcus's jersey/contract list. Surfaced three concrete duplicate-email clusters in the parents table, all caused by the same root bug.

Repo

forgejo_admin/basketball-api

What Broke

The parents table accumulates duplicate rows for the same logical person because the uniqueness check on email is a literal string comparison, with no normalization for case or Gmail's dot-insensitive local part.

Three concrete dupe clusters confirmed in prod today:

  1. aliceuwamahoro13a.@gmail.com and aliceuwamahoro13.a@gmail.com — ONE Alice Uwamahoro, TWO parent rows (ids 175 and 176) + TWO player rows (ids 201 and 202), registered 6 minutes apart on 2026-04-08. Blocked contract creation during tonight's Marcus sync.
  2. marcusdraney23@gmail.com has THREE rows with different capitalizations: marcusdraney23@gmail.com, Marcusdraney23@gmail.com, marcusdraney23@gmail.com. That's Coach Marcus Draney's own email — single person, three rows.
  3. draneylucas@gmail.com has two rows (Lucas's own, likely test data, same mechanism).

Gmail treats the local part as case-insensitive AND ignores dots — but the app's uniqueness check uses a literal string compare, so any capitalization difference or dot placement creates a new parent row. Every time this happens, downstream player / contract / order rows get orphaned or duplicated.

Repro Steps

  1. On the registration form (or POST to the registration endpoint), register with parent email alice.uwamahoro@gmail.com
  2. Register again with parent email aliceuwamahoro@gmail.com (same address, dots removed)
  3. Query SELECT COUNT(*) FROM parents WHERE lower(email) LIKE '%aliceuwamahoro%'
  4. Observe: 2 rows instead of 1

Alternative repro using capitalization:

  1. Register with test@gmail.com
  2. Register with Test@gmail.com
  3. Observe: 2 rows

Expected Behavior

Registration with any email that Gmail would deliver to the same inbox should collapse to a single parents row. Specifically:

  • Case should be normalized to lowercase for all providers
  • For @gmail.com and @googlemail.com addresses, dots (.) in the local part should be stripped before the uniqueness check
  • +suffix tags should also be stripped for Gmail (alice+tag@gmail.comalice@gmail.com)
  • Non-Gmail addresses preserve dots (they may be significant)

The normalized form is what gets stored in the database going forward. Existing non-normalized rows should still resolve on lookup (lookup normalizes the input before querying).

Environment

  • Cluster/namespace: prod (basketball-api)
  • Service version/commit: current main (unknown SHA — grep latest Woodpecker build tag)
  • Affects: parents table, registration POST handler, parent lookup functions
  • Related alerts: none (silent data corruption)
  • Discovery: Westside Ops Streamlit dashboard query joining players → parents for Marcus's 5-player list

Acceptance Criteria

  • Bug no longer reproduces — alice.@gmail.com and alice@gmail.com collapse to one row
  • Capitalization differences collapse to one row (Test@gmail.com = test@gmail.com)
  • +suffix tags collapse for Gmail (alice+tag@gmail.com = alice@gmail.com)
  • Non-Gmail addresses preserve dots (Yahoo, Outlook, etc.)
  • Existing parent rows with non-normalized emails still resolve on lookup during the transition period
  • Unit tests cover: Gmail dot stripping, Gmail lowercase, Gmail +tag stripping, non-Gmail preservation
  • Integration test: simulate the Alice dupe scenario end-to-end, assert one row after two registrations
  • No regression in existing registration flow
  • Symptom-cleanup tickets for Alice (#TBD) and Marcus (#TBD) parent dupes are now unblocked
  • westside-basketball — project this affects
  • Symptom cleanup (pending): Alice Uwamahoro player+parent dedupe (ticket A, not yet filed)
  • Symptom cleanup (pending): marcusdraney23@gmail.com parent dedupe (ticket B, not yet filed)
  • Symptom cleanup (pending): draneylucas@gmail.com parent dedupe (low priority, test data)
### Type Bug ### Lineage Standalone — discovered during 2026-04-10 Westside Ops Streamlit session while investigating Marcus's jersey/contract list. Surfaced three concrete duplicate-email clusters in the parents table, all caused by the same root bug. ### Repo `forgejo_admin/basketball-api` ### What Broke The `parents` table accumulates duplicate rows for the same logical person because the uniqueness check on email is a literal string comparison, with no normalization for case or Gmail's dot-insensitive local part. Three concrete dupe clusters confirmed in prod today: 1. `aliceuwamahoro13a.@gmail.com` and `aliceuwamahoro13.a@gmail.com` — ONE Alice Uwamahoro, TWO parent rows (ids 175 and 176) + TWO player rows (ids 201 and 202), registered 6 minutes apart on 2026-04-08. Blocked contract creation during tonight's Marcus sync. 2. `marcusdraney23@gmail.com` has THREE rows with different capitalizations: `marcusdraney23@gmail.com`, `Marcusdraney23@gmail.com`, `marcusdraney23@gmail.com`. That's Coach Marcus Draney's own email — single person, three rows. 3. `draneylucas@gmail.com` has two rows (Lucas's own, likely test data, same mechanism). Gmail treats the local part as case-insensitive AND ignores dots — but the app's uniqueness check uses a literal string compare, so any capitalization difference or dot placement creates a new parent row. Every time this happens, downstream player / contract / order rows get orphaned or duplicated. ### Repro Steps 1. On the registration form (or POST to the registration endpoint), register with parent email `alice.uwamahoro@gmail.com` 2. Register again with parent email `aliceuwamahoro@gmail.com` (same address, dots removed) 3. Query `SELECT COUNT(*) FROM parents WHERE lower(email) LIKE '%aliceuwamahoro%'` 4. Observe: 2 rows instead of 1 Alternative repro using capitalization: 1. Register with `test@gmail.com` 2. Register with `Test@gmail.com` 3. Observe: 2 rows ### Expected Behavior Registration with any email that Gmail would deliver to the same inbox should collapse to a single `parents` row. Specifically: - Case should be normalized to lowercase for all providers - For `@gmail.com` and `@googlemail.com` addresses, dots (`.`) in the local part should be stripped before the uniqueness check - `+suffix` tags should also be stripped for Gmail (`alice+tag@gmail.com` → `alice@gmail.com`) - Non-Gmail addresses preserve dots (they may be significant) The normalized form is what gets stored in the database going forward. Existing non-normalized rows should still resolve on lookup (lookup normalizes the input before querying). ### Environment - Cluster/namespace: prod (basketball-api) - Service version/commit: current main (unknown SHA — grep latest Woodpecker build tag) - Affects: `parents` table, registration POST handler, parent lookup functions - Related alerts: none (silent data corruption) - Discovery: Westside Ops Streamlit dashboard query joining players → parents for Marcus's 5-player list ### Acceptance Criteria - [ ] Bug no longer reproduces — `alice.@gmail.com` and `alice@gmail.com` collapse to one row - [ ] Capitalization differences collapse to one row (`Test@gmail.com` = `test@gmail.com`) - [ ] `+suffix` tags collapse for Gmail (`alice+tag@gmail.com` = `alice@gmail.com`) - [ ] Non-Gmail addresses preserve dots (Yahoo, Outlook, etc.) - [ ] Existing parent rows with non-normalized emails still resolve on lookup during the transition period - [ ] Unit tests cover: Gmail dot stripping, Gmail lowercase, Gmail +tag stripping, non-Gmail preservation - [ ] Integration test: simulate the Alice dupe scenario end-to-end, assert one row after two registrations - [ ] No regression in existing registration flow - [ ] Symptom-cleanup tickets for Alice (#TBD) and Marcus (#TBD) parent dupes are now unblocked ### Related - `westside-basketball` — project this affects - Symptom cleanup (pending): Alice Uwamahoro player+parent dedupe (ticket A, not yet filed) - Symptom cleanup (pending): `marcusdraney23@gmail.com` parent dedupe (ticket B, not yet filed) - Symptom cleanup (pending): `draneylucas@gmail.com` parent dedupe (low priority, test data)
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
forgejo_admin/basketball-api#418
No description provided.