Home Network Project
  • Getting notifications of Craigslist postings with BASH and stuff

    Posted on April 11th, 2009 by lance

    I want a puppy!!

    My daughter wants a puppy real bad.  She’s been quite responsible this past year with her chores, so it’s time to surprise her with a puppy.  We don’t want just any ol’ dog that’s in a cardboard box outside the grocery store.  We have a particular one in mind.  A yellow lab.

    I figure craigslist is a great place to look.  I will look around at other places, but I have designed a way to be lazy about looking for new postings on craigslist.  BASH scripting with SED and GREP. Now my understanding of these powerful tools borders on minimal, but if what I have learned can help you to be lazy at watching for items on craigslist as well, then my job here is complete.

    First off, I don’t have enough time, space or the expert knowledge to teach BASH scripting, SED or GREP here.  However, you can google them and find tons of resources, or use these links for some helpful information to get you started.

    Getting Started With BASH
    SED command
    How to use the GREP command

    The Script

    First off, I went to craigslist, selected my geographical area, then under the Community section, chose Pets. Here I can search for Yellow Labs.  I decided to search just for Labs and then search for Yellow using GREP.  Why that way?  To gain experience.  Even though I want my search and notification of such to make things easy, I just may need more skills for future scriptings.  So, after doing this search it yields the following URL.

    http://anchorage.craigslist.org/search/pet?query=lab

    If you notice, it has my area, but then at the end it has the search query.  Referencing this URL will always search the Alaska area in the Pet category with a search for “lab”.  You can search for anything in your script and it will always be the same search.  You could even get more detailed in your script and have it do several types of searches and be more dynamic if you want.  Today we’ll keep it simple.

    The following is a rundown of my script.

    #!/usr/bin/bash
    wget "http://anchorage.craigslist.org/search/pet?query=lab"

    First we of course start off with the shebang (hash-bang) line, then we download our URL from craigslist.  wget saves it under a weird file name, “pet?query=lab”.  To make things easier, I right away rename the file.

    mv pet\?query\=lab lab.html

    If you notice, there are a couple of backslashes in there.  These are escape characters so the shell treats the ? and = as literal characters in the file name; otherwise bash could interpret the ? as a wildcard that matches any single character.  (The = is not special in this position, so escaping it is harmless.)  The easiest thing to do to get those in there was to highlight the file name after I list the directory with ls and then paste it into the script and it took care of escaping the proper characters for me.  Again, easy!
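
    The backslashes are one way to protect those characters; quoting the name works too.  Here is a minimal sketch, assuming a throwaway directory and an empty stand-in file in place of the real download (both are just for illustration):

```shell
# Work in a scratch directory so nothing real gets touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Empty stand-in for the file wget leaves behind.
touch 'pet?query=lab'

# Single quotes keep the shell from treating ? as a wildcard,
# so no backslashes are needed.
mv 'pet?query=lab' lab.html

ls
```

    wget also has a -O option that lets you pick the output file name up front, which would skip the rename altogether.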

    After renaming to something easier to deal with I took a look at the file, and when I did a grep for yellow, it showed me the whole file (or most of it anyway).  The reason is that the text in the file had few newlines in it, if any, so nearly the whole thing was one line, and when grep found yellow in that line, it output the whole line.  So I had to break it down by finding something unique to each posting that I could split on while keeping the description and the URL to the item’s posting intact.

    Now we introduce SED.

    sed -f  labsed lab.html > lab.fmt

    Among other powerful things SED can do, it will replace strings of text with something else that you specify. That’s easy and simple enough for a newbie.  The command above references a script file called LABSED and applies it to lab.html and outputs that to lab.fmt.  I know my naming conventions may be totally insane, but it’s a fit for me.  Just so long as I can keep it straight in my head.

    The LABSED script is as follows:

    s/<p>/\
    /g

    Pretty simple, eh?  Well it took some time because I wanted to replace the <p> code with a newline character and it took a while to find how to do that.  So to break this down, sed uses the s/string to replace/replace with this/ command here.  I want to search for <p> and replace it with a newline character, so I had to type a backslash \ and then press Enter to put an actual line break in the script.  After that last slash there’s a g.  By default sed replaces only the first occurrence on each line; with the g, I tell it to keep going and replace ALL occurrences.
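
    To see the effect on something small, here is a rough sketch that runs the same two-line sed script against a made-up one-line page.  The sample text and the file names sample.html and sample.fmt are my own stand-ins, not the real craigslist output:

```shell
# Work in a scratch directory so nothing real gets touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A made-up page with everything on one line, like the downloaded file.
printf '%s\n' '<p>Mar 19 Yellow Lab<p>Mar 15 Black Lab<p>Mar 12 yellow lab' > sample.html

# Write the sed script: s/<p>/ then a backslash, a real newline, then /g.
printf '%s\n' 's/<p>/\' '/g' > labsed

# Every <p> becomes a line break, so each posting lands on its own line.
sed -f labsed sample.html > sample.fmt

line_count=$(wc -l < sample.fmt)
cat sample.fmt
```

    The first output line is empty because the sample starts with a <p>; that is harmless here since the later grep only keeps lines that match.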

    Now that each description with its link is broken out onto a single line, it’s time to keep just the lines with yellow on them.

    grep -i yellow lab.fmt > lab.match

    Here I use grep to search for yellow.  Of course the -i option is specified to ignore case.  Without that I was missing a few postings.  All lines that match are output to another file.  I do believe it’s possible to use SED to find the lines that don’t match the string you specify and remove them, but this works just fine.
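
    Here is a small sketch of why -i matters, plus one way the SED approach might look.  The three sample lines are made up, and the sed pattern only covers “Yellow” and “yellow”, so it is not a full replacement for grep -i:

```shell
# Work in a scratch directory so nothing real gets touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up postings, one per line.
printf '%s\n' 'Mar 19 Large Male Yellow Lab' 'Mar 15 Black Lab' 'Mar 12 free yellow lab' > sample.fmt

# Case-sensitive count: misses the capitalized "Yellow".
strict=$(grep -c yellow sample.fmt)

# Case-insensitive count: catches both spellings.
loose=$(grep -ci yellow sample.fmt)

# sed alternative: delete (d) every line that does NOT (!) match.
sed_lines=$(sed '/[Yy]ellow/!d' sample.fmt | wc -l)

echo "strict=$strict loose=$loose sed=$sed_lines"
```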

    We’re going to use the if/else commands to see if our information is new or not and act accordingly.

    Now to compare the new with the old.

    DIFF is a command that will compare the difference between two files.  The command below will compare the two and if they match, then do nothing, or as in this case just echo something, anything, I don’t care.  I tried it once without the echo command and it had a problem (I think because there was nothing to do, so it was like “what?”, and I was like “yeah do nothing” and it was like “no!”).  So I’ll figure that problem out some other day, but until then, echo works.

    if diff lab.match lab.new >/dev/null; then
    echo nothing new
    else
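
    About that echo: bash refuses an if whose then part is empty, which is the “no!” I ran into.  Two ways around it are sketched below, using made-up file contents: the built-in no-op command, a single colon, or flipping the test with ! so no empty branch is needed.

```shell
# Work in a scratch directory so nothing real gets touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two identical stand-in files.
printf 'same\n' > lab.match
printf 'same\n' > lab.new

# Option 1: ":" is a built-in that does nothing and succeeds.
result="unchanged"
if diff lab.match lab.new >/dev/null; then
    :
else
    result="changed"
fi

# Option 2: negate the test and drop the empty branch entirely.
printf 'different\n' > lab.match
result2="unchanged"
if ! diff lab.match lab.new >/dev/null; then
    result2="changed"
fi

echo "$result $result2"
```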

    Do it or ELSE.

    So if the files DO NOT match, we’ve got something new posted on craigslist.  Yay!  So I don’t need the lab.new file anymore. (This is the previous file that we compared the new one to.  I know it may not make sense, maybe my naming it lab.new could be confusing, but like I said, I am insane.  8-O I didn’t say that?  Are you sure?  Okay.  Let’s carry on.)   Then I rename the lab.match file to lab.new so that IT will be the file we compare the next one to.  Make sense now? :-D

    rm lab.new
    mv lab.match lab.new
    sed 's/"\//"http:\/\/anchorage.craigslist.org\//g' lab.new > lab.url

    Again we use SED.  Since the links were originally on a page at craigslist, they don’t need to specify the domain name, just the directory and html file.  But since our new page is going to be somewhere else, in this case my Linux Box, I need to put the domain back into the URL.  So I search for every occurrence of "/ (that’s a double quote followed by a slash) and replace it with "http://anchorage.craigslist.org/ (keeping the leading double quote).  Again a new file, lab.url, is created.  Now this file will have ALL the HTML coding for the description of the posting along with the hyperlink to the page on craigslist.
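
    All those escaped slashes are hard to read.  sed lets you pick a different delimiter for the s command, so here is a sketch of the same substitution written with | instead of /.  The sample link is a made-up one, not a real posting:

```shell
# Hypothetical relative link of the kind found in the search results.
sample='<a href="/pet/1234567890.html">Large Male Yellow Lab</a>'

# With | as the delimiter, the slashes in the URL need no escaping.
fixed=$(printf '%s\n' "$sample" | sed 's|"/|"http://anchorage.craigslist.org/|g')

echo "$fixed"
```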

    Now it’s time to make our INDEX.HTML file (or whatever you want to call yours).  We’ll delete the old one, then build the new.  I have some simple HTML code for the header and footer and then insert the text “Puppy listing”, the lab.url file and the date the page was created in between the head.html and foot.html, creating one good fully functional piece of art.  Well it’s not pretty, but it does the trick.

    rm index.html

    cp head.html index.html
    echo Puppy listing >> index.html
    cat lab.url >> index.html
    echo This page created >> index.html
    date >> index.html
    cat foot.html >> index.html
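
    Another way to write this, sketched below, is to group the commands in braces and redirect the whole group once.  Because > replaces the file, the separate rm is not needed in this version.  The head.html, foot.html and lab.url contents here are tiny stand-ins made up for the example:

```shell
# Work in a scratch directory so nothing real gets touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Minimal stand-ins for the real header, footer and listing files.
printf '%s\n' '<html><body>' > head.html
printf '%s\n' '</body></html>' > foot.html
printf '%s\n' '<p>Mar 19 Large Male Yellow Lab' > lab.url

# Everything inside the braces is written to index.html in one go.
{
    cat head.html
    echo "Puppy listing"
    cat lab.url
    echo "This page created"
    date
    cat foot.html
} > index.html

cat index.html
```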

    Okay, the last thing in my script is to send myself a text message whenever there’s a new listing.

    echo "YELLOW LAB PUPPY" | mail -s Craigslist 907nxxxxxx@msg.cellprovider.com
    fi

    Here I send the text “Yellow Lab Puppy” to the mail program which specifies a subject of “Craigslist” and then it e-mails it to the e-mail address my phone provider has specified will forward as a text message to my cell phone.  Check with your provider, most have this option available.

    As long as your sendmail program works, this should be fine.  You can opt to just have it sent to a regular e-mail address if you wish.  I like the thought that I can be out and about and get a notification without having to check my e-mail.

    I then use CRONTAB to check for new puppy listings every hour between 8am and Midnight.  For more info on CRONTAB see this page.
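
    For reference, a crontab line for that schedule could look roughly like the one this snippet prints.  The script path is a hypothetical placeholder, and this is a guess at a suitable entry rather than a copy of my actual crontab:

```shell
# Hypothetical location of the script; change it to wherever yours lives.
script_path="$HOME/bin/puppywatch.sh"

# Fields: minute, hour, day of month, month, day of week, command.
# Minute 0 of hour 0 (midnight) and of hours 8 through 23, every day.
cron_entry="0 0,8-23 * * * $script_path"

# Quoted so the shell does not expand the asterisks.
echo "$cron_entry"
```

    Run crontab -e and paste the printed line in to schedule it.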

    *One thing I noticed that I’ll have to work on: if there are no new postings that match my criteria for some time and an older posting slips out of the search, comparing the two files will make the whole process think there’s something new, only because it checks for an exact match between the two.  This is something else to attend to.  I’ll keep you posted should I fix this soon.
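
    One possible way to attend to it, which I have not run against real listings, is to look only for lines that are in the latest results but were not in the previous ones, instead of testing for an exact match.  The sketch below uses made-up postings; lab.shrunk is a hypothetical extra file just to show the drop-off case:

```shell
# Work in a scratch directory so nothing real gets touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# lab.new is the previous run, lab.match the latest one.
# The Mar 12 posting has dropped off and a Mar 19 posting has appeared.
printf '%s\n' 'Mar 15 LOST Yellow LAB' 'Mar 12 Free Female Yellow Lab' > lab.new
printf '%s\n' 'Mar 19 Large Male Yellow Lab' 'Mar 15 LOST Yellow LAB' > lab.match

# -F fixed strings, -x whole-line match, -v invert, -f patterns from a file:
# prints the lines of lab.match that do not appear in lab.new.
fresh=$(grep -Fxv -f lab.new lab.match)

# Same check when a posting only drops off and nothing is added.
printf '%s\n' 'Mar 15 LOST Yellow LAB' > lab.shrunk
dropped_only=$(grep -Fxv -f lab.new lab.shrunk)

if [ -n "$fresh" ]; then
    echo "new posting: $fresh"
fi
if [ -z "$dropped_only" ]; then
    echo "a posting dropping off triggers nothing"
fi
```

    In the real script the test on $fresh would take the place of the diff, so the text message only goes out when something was actually added.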

    And the resulting HTML file posts as follows:

    Puppy listing Mar 19 – Large Male Yellow Lab - (Anchorage) pic
    Mar 15 – LOST Yellow LAB! - (Anchorage-Spenard) pic
    Mar 12 – Free Female Yellow Lab - (Palmer)
    This page created
    Thu Mar 31 13:46:47 AKDT 2009

    I get texted about new listings and I can check this page on the internet (since my box is accessible through DDNS, another posting I’ll get to).