Commit df0c5e7

typo fix for legal section
1 parent 75d3062 commit df0c5e7

4 files changed

Lines changed: 3 additions & 3 deletions

docs/search/search_index.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

docs/section-5-legal-and-ethical-considerations/index.html

Lines changed: 1 addition & 1 deletion
@@ -537,7 +537,7 @@ <h3 id="web-scraping-code-of-conduct">Web scraping code of conduct<a class="head
 <p><strong>Publish your own data in a reusable way.</strong> Don’t force others to write their own scrapers to get at your data. Use open and software-agnostic formats (e.g. JSON, XML), provide metadata (data about your data: where it came from, what it represents, how to use it, etc.) and make sure it can be indexed by search engines so that people can find it.</p>
 </li>
 <li>
-<p><strong>View <code>robots.txt</code> file</strong>. Robots.txt is a file used by websites to let bots know if or how the site should be crawled and indexed. When you are trying to extract data from the web, it is critical to understand what robots.txt is and how to respect it to avoid legal ramifications. This file can be accessed for any domain by accessing <domain_url>/robots.txt. For eg: <a href="https://www.monash.edu/robots.txt"><code>monash.edu/robots.txt</code></a>, <a href="https://www.facebook.com/robots.txt"><code>facebook.com/robots.txt</code></a>, <a href="https://www.linkedin.com/robots.txt"><code>linkedin.com/robots.txt</code></a>.</p>
+<p><strong>View <code>robots.txt</code> file</strong>. Robots.txt is a file used by websites to let 'bots' know if or how the site should be crawled and indexed. When you are trying to extract data from the web, it is critical to understand what robots.txt is and how to respect it to avoid legal ramifications. This file can be accessed for any domain by accessing <code>&lt;domain_url&gt;/robots.txt</code>. For eg: <a href="https://www.monash.edu/robots.txt"><code>monash.edu/robots.txt</code></a>, <a href="https://www.facebook.com/robots.txt"><code>facebook.com/robots.txt</code></a>, <a href="https://www.linkedin.com/robots.txt"><code>linkedin.com/robots.txt</code></a>.</p>
 </li>
 </ol>
 <p>Happy scraping!</p>

docs/sitemap.xml.gz

0 Bytes
Binary file not shown.

markdowns/section-5-legal-and-ethical-considerations.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ This all being said, if you adhere to the following simple rules, you will proba

 7. __Publish your own data in a reusable way.__ Don’t force others to write their own scrapers to get at your data. Use open and software-agnostic formats (e.g. JSON, XML), provide metadata (data about your data: where it came from, what it represents, how to use it, etc.) and make sure it can be indexed by search engines so that people can find it.

-8. __View `robots.txt` file__. Robots.txt is a file used by websites to let bots know if or how the site should be crawled and indexed. When you are trying to extract data from the web, it is critical to understand what robots.txt is and how to respect it to avoid legal ramifications. This file can be accessed for any domain by accessing <domain_url>/robots.txt. For eg: [`monash.edu/robots.txt`](https://www.monash.edu/robots.txt), [`facebook.com/robots.txt`](https://www.facebook.com/robots.txt), [`linkedin.com/robots.txt`](https://www.linkedin.com/robots.txt).
+8. __View `robots.txt` file__. Robots.txt is a file used by websites to let 'bots' know if or how the site should be crawled and indexed. When you are trying to extract data from the web, it is critical to understand what robots.txt is and how to respect it to avoid legal ramifications. This file can be accessed for any domain by accessing `<domain_url>/robots.txt`. For eg: [`monash.edu/robots.txt`](https://www.monash.edu/robots.txt), [`facebook.com/robots.txt`](https://www.facebook.com/robots.txt), [`linkedin.com/robots.txt`](https://www.linkedin.com/robots.txt).

 Happy scraping!

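Note on rule 8: the changed paragraph says a scraper should understand and respect `robots.txt`. A minimal sketch of one way to check this, using Python's standard-library `urllib.robotparser`, is shown here. It is not part of this commit. The helper name `build_parser`, the `SAMPLE` rules, the `my-scraper` user agent and the `example.com` URLs are all hypothetical, and the rules are parsed from a string so that the example runs without network access.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Made-up rules for illustration; a real file lives at <domain_url>/robots.txt.
SAMPLE = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = build_parser(SAMPLE)

# Ask whether a given user agent may fetch a given URL.
blocked = parser.can_fetch("my-scraper", "https://example.com/private/data.html")
allowed = parser.can_fetch("my-scraper", "https://example.com/public/index.html")

# Seconds to wait between requests, if the site asks for a delay.
delay = parser.crawl_delay("my-scraper")

print(blocked, allowed, delay)  # False True 5
```

To check a live site instead, `RobotFileParser` also offers `set_url(...)` followed by `read()`, which downloads the file before `can_fetch` is called.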
