bofus10/MlHTMLParserThread

MlHTMLParserThread

ML Arg web parser: scrapes all the vendors listed in the vendors table of your DB.

Prerequisites 📋

  • You will need to compile the code and generate a jar file for execution
  • Setup a DB
  • Add sellers

Setup 🔧

For this to work you need to install and set up a DB (in my case I use MariaDB) and then load the sellers you want to scrape.

The DB structure can be found under the samples directory.

Once the DB is up and running, fill in the sellers and launch the program:

java -jar compiled_jar

Adding vendors 🛠️

You need to fill the vendors table first. This table has 3 columns: ID, name, display.

ID: Any ID you want to give the seller for your internal use. ID is not an autoincrement column because the program cannot scrape a seller's whole listing at once, due to ML's AuthToken-based authorization. To work around this, we filter the seller's listing by price and work in batches of up to 1000 products.

name: The seller_id. You can get it from any of the seller's product pages by inspecting the page and looking for seller_id.

display: The display name you want the app to show you, e.g. if seller_id = 12345678, display = PCWorld.

Example

seller_id = 12345678

If the seller has more than 1000 products, we can split the listing by price:

12345678&price=*-5000 #from 0 to 5000$

12345678&price=5000-* #from 5000 to MAX

Both records need to have the SAME ID.
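As a rough sketch of the split above, the helper below builds the name values for one seller from a list of price cut points. The class and method names are made up for illustration, and the closed middle band (e.g. 5000-20000) is an assumption, since the example above only shows open-ended bands.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the `name` values for a seller split into price bands. */
class VendorBands {

    /**
     * @param sellerId   the seller_id taken from a product page
     * @param boundaries ascending price cut points, e.g. 5000 or 5000, 20000
     * @return one `name` value per band; all of them go in rows with the SAME ID
     */
    static List<String> bandNames(String sellerId, int... boundaries) {
        List<String> names = new ArrayList<>();
        if (boundaries.length == 0) {
            names.add(sellerId);            // 1000 products or fewer: no split needed
            return names;
        }
        String low = "*";                   // "*" marks an open-ended bound
        for (int b : boundaries) {
            names.add(sellerId + "&price=" + low + "-" + b);
            low = String.valueOf(b);
        }
        names.add(sellerId + "&price=" + low + "-*");
        return names;
    }
}
```

Calling VendorBands.bandNames("12345678", 5000) gives the two name values from the example. Add more cut points if a single band still holds more than 1000 products.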

Setup Telegram Bot ⚙️

You will need to set up a Telegram bot so it can send you updates and messages. You can follow https://core.telegram.org/bots

Once your bot is set up, save its API_KEY. You will also need the chat_id of the chat you want messages sent to. The easiest way to get it is to open a chat with your new bot and send it a nice message. After that you can find your chat_id by checking what the bot received at:

https://api.telegram.org/botBOT_API_KEY/getUpdates

That will return a JSON document with all the info you need, including the chat_id.
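If you prefer not to read the JSON by eye, here is a minimal sketch that fetches getUpdates and pulls out the chat ids. It needs Java 11+ for java.net.http. The class name is hypothetical, and the regex assumes "id" is the first key inside each "chat" object, which is how the response usually looks. A real JSON parser would be more robust.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of this project. */
class ChatIdFinder {

    // Matches "chat":{"id":<number>; assumes "id" comes first in the chat object.
    private static final Pattern CHAT_ID =
            Pattern.compile("\"chat\"\\s*:\\s*\\{\\s*\"id\"\\s*:\\s*(-?\\d+)");

    /** Calls getUpdates for the given bot token and returns the raw response body. */
    static String fetchUpdates(String botToken) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://api.telegram.org/bot" + botToken + "/getUpdates"))
                .GET()
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    /** Returns every distinct chat id found in a getUpdates response, in order of appearance. */
    static Set<Long> chatIds(String getUpdatesJson) {
        Set<Long> ids = new LinkedHashSet<>();
        Matcher m = CHAT_ID.matcher(getUpdatesJson);
        while (m.find()) {
            ids.add(Long.parseLong(m.group(1)));
        }
        return ids;
    }
}
```

Typical use is ChatIdFinder.chatIds(ChatIdFinder.fetchUpdates(yourToken)). If the result is empty, send the bot a message first and try again.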

Once you have your BOT_TOKEN and CHAT_ID, fill them in on config/config.properties. This file requires 3 Bots and 2 Chats:

  • One Bot and Chat for regular messages
  • One Bot and Chat for special price messages
  • One Bot for custom searches
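To check that a Bot and Chat pair works before launching the program, you can send a test message through the Bot API's sendMessage method. The sketch below only builds the request URI. The class and method names are hypothetical, and it does not show how this project sends its own messages.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for testing a bot token and chat_id pair. */
class TelegramTest {

    /** Builds the sendMessage URI for the given bot token, chat id and text. */
    static URI sendMessageUri(String botToken, long chatId, String text) {
        // URLEncoder emits "+" for spaces; switch to %20 so the text is safe in any URL context.
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create("https://api.telegram.org/bot" + botToken
                + "/sendMessage?chat_id=" + chatId + "&text=" + encoded);
    }
}
```

Open the resulting URI in a browser or pass it to any HTTP client. If the token and chat_id are right, the message shows up in the chat.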

Content config.properties ⚙️

  • sql_queue_size = SQL Queue Size

  • thread_num = Number of threads to launch -> More threads = more data flow, so make sure your sql_queue_size is properly sized and your DB can handle the flow.

  • perc_regular = Base percentage to trigger a regular msg alert

  • perc_special = Base percentage to trigger a special msg alert
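For reference, a minimal sketch of reading these four keys with java.util.Properties. The key names come from the list above. The class itself, the numeric types, and the alertFor logic are assumptions: it treats the percentages as price drops and expects perc_special to be at least perc_regular. The program's real rules may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical holder for the settings described above. */
class ParserConfig {
    final int sqlQueueSize;
    final int threadNum;
    final double percRegular;
    final double percSpecial;

    ParserConfig(Properties p) {
        sqlQueueSize = Integer.parseInt(require(p, "sql_queue_size"));
        threadNum = Integer.parseInt(require(p, "thread_num"));
        percRegular = Double.parseDouble(require(p, "perc_regular"));
        percSpecial = Double.parseDouble(require(p, "perc_special"));
    }

    /** Loads the settings from any Reader, e.g. a FileReader on config/config.properties. */
    static ParserConfig load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        return new ParserConfig(p);
    }

    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing config key: " + key);
        }
        return value.trim();
    }

    /** Assumed rule: compare the price drop, in percent, against both thresholds. */
    String alertFor(double oldPrice, double newPrice) {
        double dropPct = (oldPrice - newPrice) / oldPrice * 100.0;
        if (dropPct >= percSpecial) return "special";
        if (dropPct >= percRegular) return "regular";
        return "none";
    }
}
```

With perc_regular = 20 and perc_special = 40, a price going from 100 to 75 would count as a regular alert and 100 to 50 as a special one under this assumed rule.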

Usage 🚀

Usage is mainly via the Telegram Bots; all messages will reach you automatically.

The Search Bot lets you search for products within your database. The number of results returned can be changed in config.properties. Open a chat with your search bot and type whatever you want to find; results are sorted by lowest price.

Logging 🔩

You can check the logs folder to see whether data is being inserted into the DB.

License 📄

Project License type: GPL-3.0 License - Check LICENSE file for more details.
