Add pool data from cardano node #170
LauraAntunes1 left a comment
Just a couple of comments but otherwise looks fine
        return None

    homepage_lower = homepage.lower()
Suggestion: We can also check whether homepage_lower has a space in it.
That would filter out most of the invalid values included in INVALID_EXACT.
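A minimal sketch of how such a whitespace check could sit in front of the INVALID_EXACT lookup. The function name clean_homepage and the contents of INVALID_EXACT below are hypothetical placeholders, since the actual code in the PR is only partly visible here:

```python
# Hypothetical placeholder values; the real INVALID_EXACT in the PR is not shown here.
INVALID_EXACT = {"coming soon", "in process", "n/a"}


def clean_homepage(homepage):
    """Return a stripped, lowercased homepage string, or None if it is clearly invalid."""
    if not homepage:
        return None
    homepage_lower = homepage.strip().lower()
    if not homepage_lower:
        return None
    # A domain name cannot contain whitespace, so reject such entries early.
    if any(ch.isspace() for ch in homepage_lower):
        return None
    # Remaining placeholder values without spaces still need the exact match.
    if homepage_lower in INVALID_EXACT:
        return None
    return homepage_lower
```

With this ordering, multi-word placeholders are rejected by the whitespace check, and the exact-match set is only needed for single-token values.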
Suggestion: Or maybe we can add another function to filter out invalid domain names before we match against INVALID_EXACT.
Something along these lines:
from urllib.parse import urlparse

def is_valid_url(url):
    parsed = urlparse(url)
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc) and ' ' not in parsed.netloc
Do you mean a trailing space or like a space in the middle? Because trailing spaces are handled on line 272 via strip(), and I'm not sure if spaces in the middle are a thing.
The URL validity check is a good idea, but I was thinking we might want to allow cases where the prefix is skipped but the page is still valid, e.g. where the entry is mypage.com instead of https://mypage.com
I meant spaces in the middle. For example, "coming soon" and "in process" would automatically become invalid, since a domain name cannot contain a space.
Yes, we can extend the URL validity check to look for a TLD when the http/https protocol is missing. However, it may have to remain a manual check in the beginning.
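One possible way to combine the points in this thread (reject spaces, accept entries without a scheme, require something that looks like a TLD) is sketched below. The name is_plausible_homepage is hypothetical, assuming https for scheme-less entries is an assumption, and the dot-based TLD check is deliberately crude rather than a lookup against a real TLD list:

```python
from urllib.parse import urlparse


def is_plausible_homepage(url):
    """Heuristic check that an entry looks like a web address, with or without a scheme."""
    candidate = url.strip()
    # Whitespace inside the entry rules out a domain name.
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    # Assumption: treat scheme-less entries such as "mypage.com" as https,
    # so that urlparse places the host in the netloc rather than the path.
    if "://" not in candidate:
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname or ""
    # Crude TLD check: at least two non-empty, dot-separated labels.
    labels = host.split(".")
    return len(labels) >= 2 and all(labels)
```

Under these assumptions, both mypage.com and https://mypage.com pass, while single-token placeholders such as tbd fail the label check and would otherwise need to be caught by INVALID_EXACT or a manual review.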
ZeeshanJan left a comment
Looks good to me.
Please see a few suggestions I have left.
All Submissions:
Description
Added pool metadata for Cardano, fetched from a full node and combined with metadata from BigQuery, so that the clustering is as complete as possible. Some clusters were observed to be missing from the previous data; this should now be fixed.
Checklist
Update Mapping Support Information Submissions: