Commit ec62d7a: Docker Setup Working

1 parent e53f4c3

26 files changed

Lines changed: 3328 additions & 0 deletions
@@ -0,0 +1,197 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# spaCy API Demonstration\n",
8+
"\n",
9+
"This notebook demonstrates the spaCy and Selenium APIs used in the Bitcoin sentiment analysis project: spaCy for natural language processing and Selenium for web scraping. It is a companion to the main pipeline notebook, `spacy_example.ipynb`; the scraping example uses the `BitcoinSentimentAnalyzer` class from `spacy_selenium_utils.py`."
10+
]
11+
},
12+
{
13+
"cell_type": "markdown",
14+
"metadata": {},
15+
"source": [
16+
"## Step 1: spaCy Demonstration\n",
17+
"\n",
18+
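"We first clean the tweet text with regular expressions (removing URLs, @mentions, #hashtags, and non-ASCII characters), then use spaCy for tokenization, lemmatization, part-of-speech (POS) tagging, dependency parsing, and named entity recognition (NER). Note that the small `en_core_web_sm` model can mislabel domain terms: in the output below, `Bitcoin` is tagged as `PERSON`."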
19+
]
20+
},
21+
{
22+
"cell_type": "code",
23+
"execution_count": 2,
24+
"metadata": {},
25+
"outputs": [
26+
{
27+
"name": "stdout",
28+
"output_type": "stream",
29+
"text": [
30+
"Cleaned Text: I just bought some Bitcoin at $50,000!\n",
31+
"\n",
32+
"Tokens:\n",
33+
"I (Lemma: I, POS: PRON)\n",
34+
"just (Lemma: just, POS: ADV)\n",
35+
"bought (Lemma: buy, POS: VERB)\n",
36+
"some (Lemma: some, POS: DET)\n",
37+
"Bitcoin (Lemma: Bitcoin, POS: PROPN)\n",
38+
"at (Lemma: at, POS: ADP)\n",
39+
"$ (Lemma: $, POS: SYM)\n",
40+
"50,000 (Lemma: 50,000, POS: NUM)\n",
41+
"! (Lemma: !, POS: PUNCT)\n",
42+
"\n",
43+
"Entities:\n",
44+
"Bitcoin (PERSON)\n",
45+
"50,000 (MONEY)\n",
46+
"\n",
47+
"Dependency Parsing:\n",
48+
"I --> nsubj (Head: bought)\n",
49+
"just --> advmod (Head: bought)\n",
50+
"bought --> ROOT (Head: bought)\n",
51+
"some --> det (Head: Bitcoin)\n",
52+
"Bitcoin --> dobj (Head: bought)\n",
53+
"at --> prep (Head: bought)\n",
54+
"$ --> nmod (Head: 50,000)\n",
55+
"50,000 --> pobj (Head: at)\n",
56+
"! --> punct (Head: bought)\n",
57+
"\n",
58+
"POS Tags:\n",
59+
"I: PRON (pronoun)\n",
60+
"just: ADV (adverb)\n",
61+
"bought: VERB (verb)\n",
62+
"some: DET (determiner)\n",
63+
"Bitcoin: PROPN (proper noun)\n",
64+
"at: ADP (adposition)\n",
65+
"$: SYM (symbol)\n",
66+
"50,000: NUM (numeral)\n",
67+
"!: PUNCT (punctuation)\n"
68+
]
69+
}
70+
],
71+
"source": [
72+
"import spacy\n",
73+
"import re\n",
74+
"\n",
75+
"# Load the spaCy model\n",
76+
"nlp = spacy.load(\"en_core_web_sm\")\n",
77+
"\n",
78+
"# Example tweet text\n",
79+
"text = \"I just bought some Bitcoin #BTC at $50,000!\"\n",
80+
"\n",
81+
"# Clean the text\n",
82+
"cleaned_text = re.sub(r\"http\\S+|www\\S+\", \"\", text)  # Remove URLs (http\\S+ also matches https links)\n",
83+
"cleaned_text = re.sub(r\"@\\w+|#\\w+\", \"\", cleaned_text)  # Remove @mentions and #hashtags\n",
84+
"cleaned_text = cleaned_text.encode(\"ascii\", \"ignore\").decode()  # Drop all non-ASCII characters, including emojis\n",
85+
"cleaned_text = re.sub(r\"\\s+\", \" \", cleaned_text).strip()  # Collapse runs of whitespace\n",
86+
"print(f\"Cleaned Text: {cleaned_text}\\n\")\n",
87+
"\n",
88+
"# Process the text with spaCy\n",
89+
"doc = nlp(cleaned_text)\n",
90+
"\n",
91+
"# Tokenization\n",
92+
"print(\"Tokens:\")\n",
93+
"for token in doc:\n",
94+
" print(f\"{token.text} (Lemma: {token.lemma_}, POS: {token.pos_})\")\n",
95+
"\n",
96+
"# Named Entity Recognition (NER)\n",
97+
"print(\"\\nEntities:\")\n",
98+
"for ent in doc.ents:\n",
99+
" print(f\"{ent.text} ({ent.label_})\")\n",
100+
"\n",
101+
"# Dependency Parsing\n",
102+
"print(\"\\nDependency Parsing:\")\n",
103+
"for token in doc:\n",
104+
" print(f\"{token.text} --> {token.dep_} (Head: {token.head.text})\")\n",
105+
"\n",
106+
"# Part-of-Speech Tagging\n",
107+
"print(\"\\nPOS Tags:\")\n",
108+
"for token in doc:\n",
109+
" print(f\"{token.text}: {token.pos_} ({spacy.explain(token.pos_)})\")\n"
110+
]
111+
},
112+
{
113+
"cell_type": "markdown",
114+
"metadata": {},
115+
"source": [
116+
"## Step 2: Selenium Demonstration\n",
117+
"\n",
118+
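"We use Selenium to scrape tweets from X (formerly Twitter). The example below uses the `BitcoinSentimentAnalyzer` class from `spacy_selenium_utils.py`: the constructor takes the X account credentials, and `scrape_tweets` collects up to `max_tweets` tweets matching `keywords`, returning each as a dictionary with a `text` field."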
119+
]
120+
},
121+
{
122+
"cell_type": "code",
123+
"execution_count": 3,
124+
"metadata": {},
125+
"outputs": [
126+
{
127+
"name": "stdout",
128+
"output_type": "stream",
129+
"text": [
130+
"Sample Tweets:\n",
131+
"- Bitcoin just touched $74K.\n",
132+
"That same HR manager who flagged my crypto side hustle now runs “on-chain payroll workshops.”\n",
133+
"Yeah Jessica, glad compliance caught up with capitalism.\\n\n",
134+
"- After longtime I am in profit in bitcoin investment, waiting good news for Solana. \n",
135+
"As a bitcoin price increases this week.\n",
136+
"What is the next target for Bitcoin?\n",
137+
"\n",
138+
"#Memes #CryptoMarket #cryptotrader #Bitcoin\n",
139+
"#BitcoinPizzaDay\\n\n",
140+
"- Absolutely! It’s all about understanding value and the real cost of inflation. Bitcoin shines a light on the hidden truths of our financial system!\\n\n"
141+
]
142+
}
143+
],
144+
"source": [
145+
"import os\n",
146+
"from spacy_selenium_utils import BitcoinSentimentAnalyzer\n",
147+
"\n",
148+
"# Initialize the analyzer (credentials read from the assumed X_USERNAME / X_PASSWORD environment variables, not hardcoded)\n",
149+
"analyzer = BitcoinSentimentAnalyzer(\n",
150+
"    x_username=os.environ[\"X_USERNAME\"],\n",
151+
"    x_password=os.environ[\"X_PASSWORD\"],\n",
152+
")\n",
153+
"\n",
154+
"# Scrape tweets\n",
155+
"tweets = analyzer.scrape_tweets(keywords=[\"Bitcoin\"], max_tweets=3) # Limited to 3 tweets for demo\n",
156+
"\n",
157+
"# Display the scraped tweets\n",
158+
"print(\"Sample Tweets:\")\n",
159+
"for tweet in tweets:\n",
160+
" print(f\"- {tweet['text']}\\n\")\n",
161+
"\n",
162+
"# Clean up\n",
163+
"del analyzer"
164+
]
165+
},
166+
{
167+
"cell_type": "markdown",
168+
"metadata": {},
169+
"source": [
170+
"## Step 3: Integration with Main Pipeline\n",
171+
"\n",
172+
"The Selenium scraping demonstrated above is integrated into the main pipeline in `spacy_utils.py`. For a full end-to-end run of the pipeline, see `spacy_example.ipynb`."
173+
]
174+
}
175+
],
176+
"metadata": {
177+
"kernelspec": {
178+
"display_name": "Python 3 (ipykernel)",
179+
"language": "python",
180+
"name": "python3"
181+
},
182+
"language_info": {
183+
"codemirror_mode": {
184+
"name": "ipython",
185+
"version": 3
186+
},
187+
"file_extension": ".py",
188+
"mimetype": "text/x-python",
189+
"name": "python",
190+
"nbconvert_exporter": "python",
191+
"pygments_lexer": "ipython3",
192+
"version": "3.8.10"
193+
}
194+
},
195+
"nbformat": 4,
196+
"nbformat_minor": 4
197+
}
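
The cleaning steps in the Step 1 cell can be wrapped in a reusable function so that every scraped tweet gets identical preprocessing before it reaches spaCy. The following is a minimal sketch that mirrors the notebook's regular expressions; the helper name `clean_tweet` is hypothetical and is not taken from `spacy_utils.py`.

```python
import re

# Precompiled patterns mirroring the notebook's cleaning cell.
# http\S+ already covers https:// links, so no separate https alternative is needed.
URL_RE = re.compile(r"http\S+|www\S+")
MENTION_HASHTAG_RE = re.compile(r"@\w+|#\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Hypothetical helper: strip URLs, @mentions, #hashtags and non-ASCII characters."""
    text = URL_RE.sub("", text)
    text = MENTION_HASHTAG_RE.sub("", text)
    # Encoding to ASCII with "ignore" drops every non-ASCII character,
    # not only emojis (curly quotes and accented letters are removed too).
    text = text.encode("ascii", "ignore").decode()
    # Collapse the gaps left by removed tokens and trim the ends.
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("I just bought some Bitcoin #BTC at $50,000!"))
    # → I just bought some Bitcoin at $50,000!
```

Because the ASCII round-trip discards every non-ASCII character, the curly quotes in a tweet such as “on-chain payroll workshops.” are removed along with any emojis; a narrower emoji filter would be needed to keep them.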

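In the Step 1 output, `en_core_web_sm` labels `Bitcoin` as `PERSON`. One way to correct such domain terms is spaCy's rule-based `entity_ruler` component, and `nlp.pipe` can then process a batch of scraped tweets of the shape returned by `scrape_tweets`. This sketch assumes spaCy v3 and uses a blank English pipeline so that it runs without downloading a model; the `CRYPTO` label, the term list, and the helper names `build_nlp` and `extract_entities` are illustrative assumptions, not part of the project's code.

```python
from typing import Dict, Iterable, List, Tuple

import spacy
from spacy.language import Language

# Illustrative crypto vocabulary. "CRYPTO" is a custom label chosen for this
# sketch; it is not one of the labels predicted by en_core_web_sm.
CRYPTO_PATTERNS = [
    {"label": "CRYPTO", "pattern": [{"LOWER": "bitcoin"}]},
    {"label": "CRYPTO", "pattern": [{"LOWER": "btc"}]},
    {"label": "CRYPTO", "pattern": [{"LOWER": "solana"}]},
]


def build_nlp() -> Language:
    """Hypothetical helper: a tokenizer-only English pipeline plus an entity ruler."""
    nlp = spacy.blank("en")
    # With spacy.load("en_core_web_sm"), pass before="ner" so the statistical
    # NER respects these rule-based spans.
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(CRYPTO_PATTERNS)
    return nlp


def extract_entities(
    nlp: Language, tweets: Iterable[Dict[str, str]]
) -> List[List[Tuple[str, str]]]:
    """Run tweet dicts (each with a 'text' key) through nlp.pipe in one batch."""
    texts = (tweet["text"] for tweet in tweets)
    return [[(ent.text, ent.label_) for ent in doc.ents] for doc in nlp.pipe(texts)]


if __name__ == "__main__":
    nlp = build_nlp()
    sample = [
        {"text": "I just bought some Bitcoin at $50,000!"},
        {"text": "Waiting for good news on Solana and BTC"},
    ]
    for entities in extract_entities(nlp, sample):
        print(entities)
```

With a trained pipeline, the ruler would be added via `nlp.add_pipe("entity_ruler", before="ner")`; spaCy's statistical NER then keeps the rule-matched spans and predicts other entities around them.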