Skip to content

Invalid links returned with some chinese characters as delimiters #15

@tommedema

Description

@tommedema

Steps to reproduce:

  1. linkify the following text

【视频奇志大兵《发烧友》 在线观看 - 酷6视频】奇志大兵《发烧友》 在线观看,奇志 大兵 搞笑双簧 _ 发烧友 (追星族) http://t.cn/RZwjG7U(分享自 @酷6网)

  1. the output link is

http://t.cn/RZwjG7U(分享自

whereas the output link should be

http://t.cn/RZwjG7U

The reason is that ( is not recognized as a separating delimiter, yet it is quite common in Chinese.

Out of 500 posts I gathered, about 20 to 30 of them had links like this, resulting in invalid links reported by linkify.

Note that I realize that these users are technically posting invalid URLs, but 20-30 out of 500 is very common and therefore there should be a way to deal with this. Any suggestion?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions