causify-ai
diff --git a/DATA605/Spring2025/projects/TutorTask204_Spring2025_RealTime_Bitcoin_Sentiment_Analysis_spaCy_Selenium/docker_data605_style/.ipynb_checkpoints/spacy_selenium_API-checkpoint.ipynb b/DATA605/Spring2025/projects/TutorTask204_Spring2025_RealTime_Bitcoin_Sentiment_Analysis_spaCy_Selenium/docker_data605_style/.ipynb_checkpoints/spacy_selenium_API-checkpoint.ipynb
@@ -0,0 +1,197 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# spaCy and Selenium API Demonstration\n",
+    "\n",
+    "This notebook demonstrates the native APIs used in the Bitcoin sentiment analysis project: spaCy for natural language processing and Selenium for web scraping. It is a companion to the main pipeline notebook, `spacy_example.ipynb`, and uses the `BitcoinSentimentAnalyzer` class from `spacy_selenium_utils.py`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 1: spaCy Demonstration\n",
+    "\n",
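+    "We use spaCy for tokenization, lemmatization, part-of-speech (POS) tagging, named entity recognition (NER), and dependency parsing. The cell below runs each of these on a sample tweet. Note that `en_core_web_sm` tags `Bitcoin` as `PERSON` in the recorded output, so entity labels for crypto terms should not be taken at face value."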
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Cleaned Text: I just bought some Bitcoin at $50,000!\n",
+      "\n",
+      "Tokens:\n",
+      "I (Lemma: I, POS: PRON)\n",
+      "just (Lemma: just, POS: ADV)\n",
+      "bought (Lemma: buy, POS: VERB)\n",
+      "some (Lemma: some, POS: DET)\n",
+      "Bitcoin (Lemma: Bitcoin, POS: PROPN)\n",
+      "at (Lemma: at, POS: ADP)\n",
+      "$ (Lemma: $, POS: SYM)\n",
+      "50,000 (Lemma: 50,000, POS: NUM)\n",
+      "! (Lemma: !, POS: PUNCT)\n",
+      "\n",
+      "Entities:\n",
+      "Bitcoin (PERSON)\n",
+      "50,000 (MONEY)\n",
+      "\n",
+      "Dependency Parsing:\n",
+      "I --> nsubj (Head: bought)\n",
+      "just --> advmod (Head: bought)\n",
+      "bought --> ROOT (Head: bought)\n",
+      "some --> det (Head: Bitcoin)\n",
+      "Bitcoin --> dobj (Head: bought)\n",
+      "at --> prep (Head: bought)\n",
+      "$ --> nmod (Head: 50,000)\n",
+      "50,000 --> pobj (Head: at)\n",
+      "! --> punct (Head: bought)\n",
+      "\n",
+      "POS Tags:\n",
+      "I: PRON (pronoun)\n",
+      "just: ADV (adverb)\n",
+      "bought: VERB (verb)\n",
+      "some: DET (determiner)\n",
+      "Bitcoin: PROPN (proper noun)\n",
+      "at: ADP (adposition)\n",
+      "$: SYM (symbol)\n",
+      "50,000: NUM (numeral)\n",
+      "!: PUNCT (punctuation)\n"
+     ]
+    }
+   ],
+   "source": [
+    "import spacy\n",
+    "import re\n",
+    "\n",
+    "# Load the spaCy model\n",
+    "nlp = spacy.load(\"en_core_web_sm\")\n",
+    "\n",
+    "# Example tweet text\n",
+    "text = \"I just bought some Bitcoin #BTC at $50,000!\"\n",
+    "\n",
+    "# Clean the text\n",
+    "cleaned_text = re.sub(r\"http\\S+|www\\S+\", \"\", text)  # Remove URLs\n",
+    "cleaned_text = re.sub(r\"@\\w+|#\\w+\", \"\", cleaned_text)  # Remove mentions and hashtags\n",
+    "cleaned_text = cleaned_text.encode(\"ascii\", \"ignore\").decode()  # Drop all non-ASCII characters, including emojis\n",
+    "cleaned_text = re.sub(r\"\\s+\", \" \", cleaned_text).strip()  # Collapse repeated whitespace\n",
+    "print(f\"Cleaned Text: {cleaned_text}\\n\")\n",
+    "\n",
+    "# Process the text with spaCy\n",
+    "doc = nlp(cleaned_text)\n",
+    "\n",
+    "# Tokenization, with each token's lemma and coarse POS tag\n",
+    "print(\"Tokens:\")\n",
+    "for token in doc:\n",
+    "    print(f\"{token.text} (Lemma: {token.lemma_}, POS: {token.pos_})\")\n",
+    "\n",
+    "# Named Entity Recognition (NER)\n",
+    "print(\"\\nEntities:\")\n",
+    "for ent in doc.ents:\n",
+    "    print(f\"{ent.text} ({ent.label_})\")\n",
+    "\n",
+    "# Dependency Parsing\n",
+    "print(\"\\nDependency Parsing:\")\n",
+    "for token in doc:\n",
+    "    print(f\"{token.text} --> {token.dep_} (Head: {token.head.text})\")\n",
+    "\n",
+    "# Part-of-Speech Tagging\n",
+    "print(\"\\nPOS Tags:\")\n",
+    "for token in doc:\n",
+    "    print(f\"{token.text}: {token.pos_} ({spacy.explain(token.pos_)})\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 2: Selenium Demonstration\n",
+    "\n",
+    "We use Selenium to scrape tweets from X (Twitter). Below is an example using the `BitcoinSentimentAnalyzer` class from `spacy_selenium_utils.py`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Sample Tweets:\n",
+      "- Bitcoin just touched $74K.\n",
+      "That same HR manager who flagged my crypto side hustle now runs “on-chain payroll workshops.”\n",
+      "Yeah Jessica, glad compliance caught up with capitalism.\\n\n",
+      "- After longtime I am in profit in bitcoin investment, waiting good news for Solana. \n",
+      "As a bitcoin price increases this week.\n",
+      "What is the next target for Bitcoin?\n",
+      "\n",
+      "#Memes #CryptoMarket #cryptotrader #Bitcoin\n",
+      "#BitcoinPizzaDay\\n\n",
+      "- Absolutely! It’s all about understanding value and the real cost of inflation. Bitcoin shines a light on the hidden truths of our financial system!\\n\n"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "from spacy_selenium_utils import BitcoinSentimentAnalyzer\n",
+    "\n",
+    "# Initialize the analyzer; X_USERNAME and X_PASSWORD are assumed environment variable names\n",
+    "analyzer = BitcoinSentimentAnalyzer(\n",
+    "    x_username=os.environ[\"X_USERNAME\"],\n",
+    "    x_password=os.environ[\"X_PASSWORD\"],\n",
+    ")\n",
+    "\n",
+    "# Scrape tweets\n",
+    "tweets = analyzer.scrape_tweets(keywords=[\"Bitcoin\"], max_tweets=3)  # Limited to 3 tweets for demo\n",
+    "\n",
+    "# Display the scraped tweets\n",
+    "print(\"Sample Tweets:\")\n",
+    "for tweet in tweets:\n",
+    "    print(f\"- {tweet['text']}\\n\")\n",
+    "\n",
+    "# Clean up\n",
+    "del analyzer"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 3: Integration with Main Pipeline\n",
+    "\n",
+    "The Selenium functionality demonstrated above is integrated into the main pipeline in `spacy_selenium_utils.py`. For the full pipeline execution, see `spacy_example.ipynb`."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
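
The text-cleaning steps in the Step 1 code cell could be factored into a small reusable helper, which would make them easier to test outside the notebook. The sketch below is only an illustration under that assumption: the name `clean_tweet` and the module-level pattern names are hypothetical and do not come from `spacy_selenium_utils.py`. It applies the same four steps as the cell (strip URLs, strip mentions and hashtags, drop non-ASCII characters, collapse whitespace).

```python
import re

# Hypothetical names for illustration; the patterns mirror the notebook cell.
URL_RE = re.compile(r"http\S+|www\S+")
MENTION_HASHTAG_RE = re.compile(r"@\w+|#\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Return the tweet text with URLs, mentions, hashtags, and non-ASCII characters removed."""
    text = URL_RE.sub("", text)
    text = MENTION_HASHTAG_RE.sub("", text)
    # Dropping everything outside ASCII removes emojis, but also accented letters.
    text = text.encode("ascii", "ignore").decode()
    # Collapse the gaps left behind by the removals above.
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("I just bought some Bitcoin #BTC at $50,000!"))
```

Run on the sample tweet from the notebook, this should print the same cleaned text that the recorded cell output shows. The ASCII round trip is a blunt filter, so a pipeline that needs to keep non-English text would have to replace that step.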