Cluster, Sphider, Musotik

Filed under geek, music, projects

So I’ve decided that I’m going to write (or find) a web crawler to find MP3s, OGGs, and other audio files, then rewrite Musotik from scratch. It’ll also crawl torrent sites. I’m working with Sphider right now, and it may become the base of my crawler.
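Here is a minimal sketch of the "find mp3s/ogg" part, written in Python with only the standard library. It is not Sphider code and not the final crawler: the names `AudioLinkParser`, `find_audio_links`, and `AUDIO_EXTENSIONS` are made up for illustration, and the extension list is an assumption based on the formats mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed list of extensions to look for; extend as needed.
AUDIO_EXTENSIONS = (".mp3", ".ogg")


class AudioLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links that point at audio files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.audio_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Compare only the path, so query strings don't hide the extension.
                path = urlparse(absolute).path.lower()
                if path.endswith(AUDIO_EXTENSIONS):
                    self.audio_links.append(absolute)


def find_audio_links(html, base_url):
    """Return audio file URLs found in one page of HTML (hypothetical helper)."""
    parser = AudioLinkParser(base_url)
    parser.feed(html)
    return parser.audio_links
```

Feeding it a page's HTML plus that page's URL gives back a list of absolute audio links, which is the piece a general-purpose indexer like Sphider does more work than necessary to get.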

After I get the spider written, I’ll run it on 3 of the cluster boxes, all updating 1 MySQL DB. The other box will be the main web server and database server.
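One open question with 3 boxes writing to 1 database is how to keep them from crawling the same sites. A rough sketch of one option, assuming the work is split by hostname (this is an assumption for illustration, not how Sphider does it, and `node_for_url` is a hypothetical helper):

```python
import hashlib
from urllib.parse import urlparse

NODE_COUNT = 3  # the three crawler boxes


def node_for_url(url, node_count=NODE_COUNT):
    """Map a URL to a node index (0..node_count-1) based on its hostname."""
    host = urlparse(url).hostname or ""
    # Use a stable hash: Python's built-in hash() is randomized per process,
    # so it would give different answers on different boxes.
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % node_count
```

Each box would only crawl URLs whose index matches its own number, so every page on a given site ends up on the same box and no two boxes fetch the same page.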

  • 12/13/10 2am – Started ripping apart the Sphider script.
  • 12/13/10 2:30am – 3 nodes are indexing to the same DB. Testing with Digg, PirateBay, and Drawgasmic
  • 12/13/10 3:45am – Going to let it index. Here’s what I have so far:
      Currently in database: 16 sites, 4964 links, 0 categories and 103853 keywords.
  • 12/13/10 4pm – Let it run all night/day.
      Currently in database: 16 sites, 32603 links, 0 categories and 278085 keywords.

So it’s pretty slow with Sphider. I also don’t need everything that Sphider does. I’m trying to decide whether I should write something from scratch or modify Sphider.
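To put a rough number on "pretty slow", the two database snapshots from the log can be turned into a crawl rate. This assumes the counts were taken at exactly 3:45am and 4:00pm on 12/13/10, which the log only approximates; `crawl_rate_per_hour` is a name invented for this sketch.

```python
from datetime import datetime


def crawl_rate_per_hour(links_start, links_end, time_start, time_end):
    """Average number of new links indexed per hour between two snapshots."""
    hours = (time_end - time_start).total_seconds() / 3600
    return (links_end - links_start) / hours


# Snapshot values taken from the log; the exact times are an assumption.
first_snapshot = datetime(2010, 12, 13, 3, 45)
second_snapshot = datetime(2010, 12, 13, 16, 0)

rate = crawl_rate_per_hour(4964, 32603, first_snapshot, second_snapshot)
print(f"{rate:.0f} links/hour, {rate / 3600:.2f} links/second across 3 nodes")
```

That works out to roughly 2,256 links per hour, or well under one link per second for all three nodes combined.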

My next post will be about that, and probably heavy with PHP code.
