Bug Description
The CodeChef leaderboard relies entirely on HTML scraping (BeautifulSoup), which breaks whenever CodeChef changes its page layout. There is no fallback mechanism for when parsing fails.
Affected Files
api/leaderboard/views.py:332-386 (CodechefLeaderboard.get_codechef_data)
api/leaderboard/management/commands/update_db.py:64-122 (codechef_user_update)
Problem
# views.py:340-348
rating_div = soup.find("div", class_="rating-number")
if not rating_div:
    if "user not found" in response.text.lower():
        return "NOT_FOUND"
    logger.error(f"CodeChef parsing error for {username}: rating_div not found")
    return "TRANSIENT_ERROR"

# views.py:350-363
instance["highest_rating"] = (
    container_highest_rating.find_next("small")
    .text.split()[-1]
    .rstrip(")")
)
container_ranks = soup.find("div", class_="rating-ranks")
ranks = container_ranks.find_all("a")
instance["global_rank"] = ranks[0].strong.text
This scraper depends on:
- div.rating-number: the CSS class can change at any time
- div.rating-header plus the next small tag: fragile DOM structure
- div.rating-ranks with ranks[0].strong and ranks[1].strong: assumes an exact structure
- img.profileImage[-1]["src"]: assumes profile images exist in the DOM
Any CodeChef frontend update that renames or moves one of these elements makes the scraper silently return "TRANSIENT_ERROR" for ALL users, leaving stale or missing data on the leaderboard.
Steps to Reproduce
- CodeChef ships a frontend update that changes any of the class names listed above
- All CodeChef data fails to refresh
- Users see stale or missing data with no indication why
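The failure mode can be reproduced locally without waiting for a CodeChef release. The sketch below runs the same soup.find lookup quoted from views.py against two hand-written HTML fragments. The markup, the renamed class user-rating, and the helper parse_rating are illustrative assumptions, not CodeChef's real page or the project's actual function.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup; the real profile page is much larger.
CURRENT_HTML = '<div class="rating-number">1842</div>'
# The same element after an imagined frontend update that renames the class.
RENAMED_HTML = '<div class="user-rating">1842</div>'


def parse_rating(html):
    """Mimic the lookup from views.py: find div.rating-number or report failure."""
    soup = BeautifulSoup(html, "html.parser")
    rating_div = soup.find("div", class_="rating-number")
    if not rating_div:
        # The rating is still on the page, but the selector no longer matches.
        return "TRANSIENT_ERROR"
    return int(rating_div.text)


print(parse_rating(CURRENT_HTML))  # 1842
print(parse_rating(RENAMED_HTML))  # TRANSIENT_ERROR
```

Because the second call fails for every profile at once, the error is not transient in practice: it persists until the selector is updated.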
Proposed Fix
- Short term: Add more robust error handling with partial data returns:
try:
    rating_div = soup.find("div", class_="rating-number")
    if rating_div:
        instance["rating"] = int(rating_div.text.strip())
    else:
        # Try alternative selectors
        rating_span = soup.select_one("span.rating, .rating-value")
        if rating_span:
            instance["rating"] = int(rating_span.text.strip())
        else:
            return "TRANSIENT_ERROR"
except Exception:
    # logger.exception logs at ERROR level and includes the traceback
    logger.exception(f"Failed to parse rating for {username}")
    return "TRANSIENT_ERROR"
- Long term: Use CodeChef's official API if available: https://www.codechef.com/api/rankings/rating or check if they've released a proper API since this scraper was written.
- Monitoring: Add a health check that alerts when multiple consecutive users fail to parse, so maintainers know scraping is broken.
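The short-term and monitoring items could be combined roughly as follows. This is a minimal sketch, not the project's code: extract_fields, ConsecutiveFailureMonitor, PARSE_ERRORS, and the threshold value are all hypothetical names and choices, and a plain dict stands in for the parsed page so the example runs without network access.

```python
from typing import Any, Callable, Dict, Tuple

# Exceptions that typically mean "the expected element is missing or malformed".
PARSE_ERRORS = (AttributeError, IndexError, KeyError, TypeError, ValueError)


def extract_fields(
    page: Any, extractors: Dict[str, Callable[[Any], Any]]
) -> Tuple[dict, list]:
    """Run each extractor on its own so one broken selector cannot discard the rest.

    Returns (fields that parsed, names of fields that failed).
    """
    parsed, failed = {}, []
    for name, extract in extractors.items():
        try:
            parsed[name] = extract(page)
        except PARSE_ERRORS:
            failed.append(name)
    return parsed, failed


class ConsecutiveFailureMonitor:
    """Count back-to-back parse failures and flag when a threshold is reached."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.streak = 0

    def record(self, success: bool) -> bool:
        """Record one user's outcome; return True when maintainers should be alerted."""
        self.streak = 0 if success else self.streak + 1
        return self.streak >= self.threshold


# Stand-in for a parsed page: a plain dict instead of a BeautifulSoup object.
page = {"rating": "1842"}
extractors = {
    "rating": lambda p: int(p["rating"]),
    "global_rank": lambda p: int(p["global_rank"]),  # key is missing -> KeyError
}
parsed, failed = extract_fields(page, extractors)
print(parsed, failed)  # {'rating': 1842} ['global_rank']

monitor = ConsecutiveFailureMonitor(threshold=3)
alerts = [monitor.record(success=False) for _ in range(3)]
print(alerts)  # [False, False, True]
```

With BeautifulSoup, an extractor could be something like lambda soup: int(soup.find("div", class_="rating-number").text). When soup.find returns None, the attribute access raises AttributeError, which is recorded as a failed field instead of aborting the whole profile. A "NOT_FOUND" result should probably not count as a failure for the monitor, since a missing user says nothing about the health of the scraper.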
Severity
MEDIUM — Silent failures with no user notification. Affects all CodeChef leaderboard users.