Adium Sort By Log Size

January 22nd, 2012

Back when I was using Pidgin, I loved its “sort by log size” feature. It organized your contact list for you, floating your most-talked-to contacts to the top. It was also fun to see which of your friends “wins” by being on top.

Adium had a plugin that did this, unsurprisingly called the Sort By Log Size Plugin. Unfortunately, it was partly incompatible with Adium 1.4, occasionally preventing the contact list from loading at all. Even worse, it appeared that the plugin had been abandoned, with no updates since June 2009.

For the last several years I have been relying on manually sorting my contacts into ordered groups on Adium, but I finally got my lazy ass off my couch yesterday and fixed the bug myself. I’m trying to get into contact with the developer, but until then it can be downloaded from my Github (direct download).

Update: The developer has accepted the patch and is providing an update that will be available pending approval! Hooray!

Best Seemingly Useless VIM Trick Ever

November 2nd, 2010

At some point this summer, I discovered the vim commands Ctrl+A and Ctrl+X. In normal mode, they find the first number on the current line and increment and decrement it, respectively.

What’s your first thought when you read that? Mine was “Why the heck is that useful enough to be included as default functionality? How often can you need that? That is a complete waste of a keybinding and is just confusing.”

Well, I stand partially corrected. It is not completely useless. I have found myself using it several times in the last month or so. My favorite use case is for making lists:

First, type

 1. Some list element.

Then, in normal mode, record the following macro:

qa
Y
p
Ctrl-A
q

Now, 10@a will perform the macro 10 times, copying the line and auto-incrementing the number each time.

1. Some list element.
2. Some list element.
3. Some list element.
4. Some list element.
 ... etc.

Neat! I’m still not entirely convinced it’s actually useful, but its cute nonetheless.

Amazon Prime is Ruining My Budget

October 27th, 2010

Amazon Prime for students is brilliant. In the last two weeks, just because if I have the slightest impulse to buy something, I can load up Amazon and get it in 2 days, I have bought:

Impulsive purchases are just so easy now. Thanks to my good friends at Amazon and their free 2-day shipping, I am now a much poorer college student, but am enriched by instant mashed potatoes and children’s books. (Good god, I just went through that list and it costs $130. Those little things really do add up.)

Calvin and Hobbes + NASA Astronomy Picture of the Day

September 6th, 2010

ihaveideas on reddit had an excellent idea.

I have been playing with ImageMagick recently so I figured implementing this would be good practice.

I think it works so far :-). Here are some results:

Pretty sweet, huh?

Code is available here. It’s kinda hacky right now, but it seems to work.
The bash script takes one argument, the URL of the APOD post. If no argument is given it defaults to the most recent post.

Possible future improvements: Write a program that runs this daily and sets it as your desktop background. That can wait until at least tomorrow.

Auto-Uploading Screenshots in OS X

August 8th, 2010

99% of time I take a screenshot, its to show someone. The workflow there is:

  1. Take a screenshot of a selected area with Cmd+Shift+4. OS X saves it to the Desktop.
  2. Open a terminal.
  3. SCP the file to my web directory.
  4. Open the web directory in a browser to make sure the upload worked and so I can copy the URL.
  5. Paste the URL to whoever I want to share it with.

This is slow and boring. The other day my fellow intern Larry showed me his way of auto-uploading these screenshots, so I shamelessly stole it, modified it a little and am now claiming it as my own:

First, I made an Automator application that runs the following shell script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
USER="user"
HOST="host.com"
UPLOAD_DIR="/var/www/screenshots"
WEB_DIR="screenshots"
LOCAL_DIR="$HOME/Desktop"
 
FILE="Screenshot.`date +%m%d%H%M%S`.png"
URL="http://$HOST/$WEB_DIR/$FILE"
 
screencapture -xi $LOCAL_DIR/$FILE
ping -t 1 -c 1 $HOST
if [ $? -eq 0 ]; then
        scp $LOCAL_DIR/$FILE $USER@$HOST:$UPLOAD_DIR/
        rm $LOCAL_DIR/$FILE
        echo $URL | pbcopy
        growlnotify -m "Screenshot uploaded to $URL"
else
        echo $LOCAL_DIR/$FILE | pbcopy
        growlnotify -m "Screenshot copied to Desktop"
fi

Then, I used Quicksilver to bind Cmd+Shift+4 to call this Application. I have an SSH key set up so the SCP works without any interaction from me. The script takes the screenshot, checks to see if the server is up, uploads the image, and copies the URL to my clipboard. If the server is down, it reverts to the default functionality of leaving the image on the Desktop.

The workflow is now:

  1. Take a screenshot of a selected area with Cmd+Shift+4.
  2. Magic!
  3. Paste the URL to whoever I want to share it with.

Larry just has Quicksilver run the code as a shell script but I like the automator application since it adds a little busy spinner in the taskbar while the script runs since uploading can be a bit slow if the connection sucks.

Dependencies: Quicksilver, Automator, growlnotify, I suppose. Automator, growlnotify, and possibly Quicksilver are all optional.

Fibonacci functions and Dynamic Programming

February 20th, 2010

Bhargav and I had a short discussion regarding Fibonacci number generating functions over winter break and I recently answered an interview question along the same vein, so I guess its a question worth looking at:

The recursive formula for the generation of the nth Fibonacci number Fn is Fn = Fn-1 + Fn-2, where F1 = F2 = 1. How then, do we write a function that generates the nth Fibonacci number? The simplest approach is to convert this formula directly into code (all sample code given in python):

def F(n):
    if n == 1 or n == 2:
         return 1
    else:
         return F(n-1) + F(n-2)

This works, but there’s a minor problem. When we run it for small numbers, it seems fine, but it slows down rapidly for large values of n. On my computer (which, admittedly, isn’t very fast, but it’s still a reasonably nice computer), F30 takes about half a second to run, F35 takes about 6 seconds, and I don’t even want to know when F50 will finish (according to my rough calculations, in about 6 days). This is clearly unacceptable. I could calculate F50 by hand in less than 6 days (or 20 minutes). So what’s wrong?

If we look at the function, we’ll see that when you call F(n), it makes two recursive calls to F(n-1) and F(n-2), each of those makes two more recursive calls, which each make two more, and so on. Furthermore, each additional call only reduces the problem size by a constant factor, one or two. This leads to exponential growth, where a call to F(n) takes almost twice as long as a call to F(n-1). This is bad! The culprit here is we’re doing extra work. Here is the tree of recursive calls for F(6):
Recursive tree of function calls for F(6)

As you can see, F(4) is being calculated in both subtrees of F(6), and F(3) is calculated in three different subtrees. As n gets larger, more and more work is being redone in multiple subtrees.

Bhargav’s solution was to use a hash table to save a solution once it’s been calculated, so that future calls can check if the solution has already been found for some n before calculating it again:

fib_results = {1:1, 2:1}
def F(n):
    if n in fib_results:
         return fib_results[n]
    else:
         fib_results[n] = F(n-1) + F(n-2)
         return fib_results[n]

This is in fact much faster and runs in linear time instead of exponential time. Unfortunately, it takes linear space in the form of the hash table, and it is unclear how or whether we can avoid using that space, though it seems like it might be possible since we didn’t use linear space in the naive solution. This approach is called memoization. Memoization is an example of a top-down alternative to dynamic programming. It’s advantage over the bottom-up approach, called dynamic programming, is that it is much easier to convert a naive recursive algorithm to a memoized algorithm. The disadvantage is that dynamic programming is generally somewhat faster in implementation and more powerful, making it clearer how to make optimizations such as getting rid of that linear space hash table.

The idea behind dynamic programming is, once we’ve defined a problem in terms of its subproblems, to solve it by starting from the base cases and solving subsequently larger problems, usually in tabular form. To find F(n), we create an array of n cells, filling cells 1 and 2 with the value 1 (those are the base cases). To fill each cell, we look at the two previous cells and fill it with their sum. Conveniently, this means that the ith cell contains the value of F(i). Hence, by the time we fill the nth cell we will have found the value of F(n).

Sequence for filling DP table to solve F(n).

This algorithm would look like this:

def F(n):
    # Create an array of n cells
    fib_table = [None]*n

    # Initialize base cases
    fib_table[0] = 1
    fib_table[1] = 1
    
    # Fill table
    for i in range(2, n):
        fib_table[i] = fib_table[i-1] + fib_table[i-2]
    return fib_table[n-1]

Unfortunately, this still takes linear space! It will probably be a little faster than the memoized algorithm, since arrays are a little faster than dictionaries, but we still haven’t got rid of the need for the lookup table. However, if we look at this algorithm, it becomes clear that to fill each cell, we only need to look at the two cells before it. In fact, we can forget the values in every cell before those two, since we’ll never need them again. To save using that space, we can just reuse the same three cells over and over again. In fact, since we only have three cells, we can just use variables instead of creating a table:

def F(n):
    a = 1
    b = 1
    c = 1
    for i in range(2, n):
        c = a + b
        a = b
        b = c
    return c

Mission accomplished. We have a Fibonacci number function that runs in linear time and constant space. Dynamic programming is clearly awesome. A Fibonacci number function is one of the simplest applications of dynamic programming. One could easily imagine coming up with this solution without going through this thought process. However, there are many more complex problems that dynamic programming can solve, without which the best solutions are incredibly difficult to find. Some applications of dynamic programming include RNA folding and seam carving (crazy content aware image resizing)

Poetry Spam?*

July 9th, 2009

Some time in mid-February, I got tired of my spam filter’s many false positives and decided to turn it off. I preferred having to sift through 10-20 spam emails by hand than risk missing an important email falsely flagged and blocked. An additional bonus was the entertainment value of new and novel spam. Over the last month or two, I noticed a large amount of weird spam (as in not like your average spam, since all spam is pretty weird). This phenomenon peaked around two weeks ago, when I was receiving upwards of 5-10 of them a day.

Each email consisted of a subject, and when appeared to be a couplet of 19th-century-poetry sounding text. Some examples included:

Subject: and said, as her tear drops back she forced,
and, though i should say it never,
the trees bring forth sweet ecstasy only for the sake of present ease or gratification?

Subject: launched on yon shining bay,–
the latest spawn the press hath cast,–
father, father, where are you going look on the rising sun: there god does live

Subject: thy summer’s play younger and younger every day;
darkness passes away then cherish pity; lest you drive an angel from
answer’d the lovely maid and said; i am a watry weed, this time last evening. right there! all aboard!”

And that was it! No link to porn, no methods to please my man in bed, no pills to grow myself a 4 foot penis.

I think Scott put it best.

Subject: Re: the sky-lark and thrush,
From: Scott Smith
…the fuck?

On Sat, Jun 27, 2009 at 1:15 AM, Nathan Harrison
wrote:

> he doth give his joy to all. sits and smiles on the night.
> become a garden mild. the miner pauses in his rugged labor,

Incidentally, that one is my favorite, for some reason. It almost makes sense.

In the name of science, I did a little snooping regarding the emails. First, I checked the originating IP addresses from the email headers. I looked at maybe a dozen emails and traced the originating countries to see if they came from any one place. They came from a variety of countries, including Uruguay, Kuwait, Germany, Italy, and even Urbana, Ohio. Was this a semi-sentient global botnet determined to demonstrate its Vogon-esque poetry skills to the world?

The next question then is, were these couplets originally composed or plagarized from some famous poems? I pulled up Google and searched for some of the more coherent sounding phrases, and indeed several of them were hits. It appeared that this botnet was channeling the works of William Blake and Bret Harte to create its poetic genius.

So what’s going on? My guess is probably some variant on standard Bayesian poisoning, to try to confuse spam checkers. According to the wikipedia article on email spam, a common technique for Bayesian poisoning involves taking text from Project Gutenberg, and suspiciously, both Blake and Harte are in there.

However, Bayesian poisoning is used for text that goes along with spam trying to sell things, not random text by itself. Meh, whatever. Over the course of the last week, these emails have slowed, and I haven’t seen one in a couple days.  All I get now are my normal “How to Have rGeat sex – Smash alll Records www. bu15. net. Bewitcheed, Bedazzled, uBsted” emails.  Perhaps this will forever become a mystery.

Hey, maybe this is Conficker finally doing something interesting

* Note: Not to be confused with spam poetry or “spoetry”, another interesting phenomenon where people make poems out of spam subject lines. Seriously.

Testing Google’s PHP optimization tips

June 27th, 2009

Recently, Google posted a series of articles titled “Let’s make the web faster“, including one by Google webmaster Eric Higgins about optimizing PHP.

Some of Google’s tips are obvious, like using caching or avoiding unnecessary SQL queries. A couple, however, seemed pretty weird to me, so I ran some benchmarks to check them out for myself. (See bottom of post for specs, and other info on how I ran the benchmarks)

  • Use echo instead of print: Higgins doesn’t explain why he thinks echo is faster than print. The PHP documentation links to an faq that explains the difference, and mentions that echo is marginally faster because it doesn’t set a return value, but there is no real difference in speed between the two.

    To test this, I ran two tests, using echo and then print to print a short string (5 characters) and then a long string (5000 characters) 1,000 times each.

    The Benchmark

    Using echo to print a short string 1,000 times: 1170 microseconds
    Using print to print a short string 1,000 times: 1180 microseconds

    Using echo to print a long string 1,000 times: 1163 microseconds
    Using print to print a long string 1,000 times: 1182 microseconds

    Verdict: For short strings, echo was slightly faster, and for long strings, it looks like print was slightly faster. For both cases, this was a difference of only about a couple hundred-millionths of a second per call, and likely well within normal variation between calls. I’d say use whichever one you feel like using, personally.

  • Use switch/case instead of if/else for loosely-typed comparisons:Apparently this has better performance, readability, and maintainability. I’ll bit on the readability and maintainability, but this article is about performance.

    I’m testing with code very similar to the example that Higgins provided for this one. There are four branches, each of which calls a function that does nothing but return, and each branch is equally likely.

    The Benchmark

    Using if/else, repeated 1,000 times: 422 microseconds
    Using switch/case, repeated 1,000 times: 425 microseconds

    Verdict: Once again, the difference is barely noticeable, and it looks like if/else was actually slightly faster on this test. I would say not to worry too much about it, though the switch/case definitely looks better from a code readability point of view.

  • Provide echo with many strings as arguments instead of using concatenation: In the video, Eric replaced each of the periods in an echo call with commas, so that the strings were passed as many arguments to echo instead of concatenating them before passing them in. This seems definitely weird to me, but it might work.

    For this one, I’m testing echo "Hello "."World!"."\n"; vs. echo "Hello ","World!","\n";

    The Benchmark

    Using concatenation, repeated 1,000 times: 1225 microseconds
    Using arguments, repeated 1,000 times: 3255 microseconds

    Ouch! No way is it faster to pass in multiple arguments to echo than concatenation them first! I notice that I have three arguments to echo and that took almost 3 times as long to run… let’s try it with 6 arguments. This time I’m testing echo "Hello "."World!"."\n"."Hello "."World!"."\n"; vs. echo "Hello ","World!","\n"." ","World!","\n";

    The Benchmark (2)

    Using concatenation, repeated 1,000 times: 1445 microseconds
    Using arguments, repeated 1,000 times: 5354 microseconds

    It’s not quite linear, but it’s definitely correlated.

    Verdict: There’s no question about it. Use concatenation! It’s better! Google doesn’t know everything!

The moral of this story… Google is smart, but they don’t know everything. Test things for yourself to make sure their results are the same as yours. It’s possible that Google is using a different version of PHP, running on different hardware, or something like that. Each computer is different and if these small optimizations matter to you, you have to try it for yourself to make sure.

Edit: The PHP devs and at least one other blogger have reached about the same conclusions.

Note: I ran these benchmarks on my Linode VPS, with 360MB RAM and 4×2.50GHz processors. I was using PHP 5.2.6-3ubuntu4.1 with Suhosin-Patch 0.9.6.2 from the command line, and the PHP Benchmark library. Each reported time was an average over 100 trials. Source code for my benchmarks is available here. You have to pipe stdout to /dev/null, since I’m using print and echo. Results are in stderr.

Writing a keygen: Hacker challenge

April 1st, 2009

I stumbled upon a hacker challenge posed by blogger Andy Sloane last night.  He explained how, as a hobby, likes to try to crack or create key generators for commercial programs.  Naturally, he never distributes these cracks, but is only doing it for the intellectual challenge.  Writing a key generator is usually more useful than cracking, since a good keygen typically allows you to receive future updates.  Often, it is also more interesting from an intellectual point of view, as writing a keygen usually requires some knowledge in reverse engineering and number theory.  Andy tries to demonstrate what he means by providing a simple key checker of his own for us to break.  Since I had a lighter workload last night, I figured I’d try it.

For his challenge to the world, he offers the source code of the key checker and three working keys.  The source is available on his blog.  (While a normal keygen writer would not have access to the source code of the key checker, it is usually simple, though tedious, to decompile and interpret the compiled binaries that would be released.)

After downloading and skimming the source, I spent a while trying to understand the custom written (and minimally commented) library he uses for large (125 bit, in this case) integers.  Rather than try to interpret the code itself, I used gdb to inspect the way the library worked, and had soon figured out the inner workings of the key checker:

Each key that it received, it would convert into an integer.  Each key had a unique integer k (with some exceptions: O, Q, and 0; I and 1; and Z and 2 were considered to be interchangeable to lower the change of an misread key) to which it was decoded.

Two magic numbers n = 21967608700894359473408781171922272451 and d=4444901153489723031681183441752517327 were hard-coded into the program.  A new number mkd mod n was created.  The key was accepted if m ≡ 12345678 mod c, where c = n/232⌋ (rounded down to the nearest integer).

This was clearly a problem for number theory! This algorithm looked suspiciously similar to the RSA encryption algorithm, which similarly employs two large numbers with special properties.

A quick summary of the steps behind RSA encryption:

  1. Pick two large primes p and q.
  2. Compute n = pq.
  3. Compute the totient of n, φ(n) = (p-1)(q-1).
  4. Choose an integer e such that e is relatively prime to and less than φ(n).
  5. Compute the integer d such that ed ≡ 1 mod φ(n).
  6. For your message m, encrypt it by generating ciphertext c ≡ me mod n. e is known as the public key and is used to encrypt the message.
  7. The person who receives the ciphertext by using d and applying the formula m = cd mod n.

This works because of Euler’s theorem, which states that aφ(n) ≡ a mod n for any integers a and n.

From that, cd =(me)d=med= m1+kφ(n) m mod n. (Check out the explanation of RSA on wikipedia for a more in depth explanation.

I quickly used Mathematica to factor n to see if it was the product of two primes. It was!  I checked that d was relatively prime to φ(n) to be sure.  It also was. The only part where this differed from RSA was the final step where it checked to see if m ≡ 12345678 mod c before accepting the key.

I was a little thrown off by c being related to n, but I quickly realized that it didn’t matter what c was.  It was just there to make sure that more than one m (and therefore more than one key) was accepted.  Effectively, what this key checker did was to take RSA generated ciphertext as a key and check to see if it decryped to a message that left a remainder of 12345678 when divided by c.

Therefore, to generate a key, I needed to create a correct message and encrypt it, then convert the ciphertext back to the “key” format.  Creating a message is easy. m is an accepted message if m ≡ 12345678 mod c, so m = ac + 12345678, where a is any natural number.  To encrypt m, I need to find the private key e.  Since n is small enough to factor and ed ≡ 1 mod φ(n), it is easy to find e through modular arithmetic.

Voila! Now I have everything I need to write a keygen for Andy’s example key checker. In fact, here it is.  Now, clearly, I should go do my homework, having wasted several hours on this.

Good password habits?

March 5th, 2009

Cardinary Sins of Passwords

  • Using short passwords with few numbers or special characters.
  • Using the same password for many things (or everything).
  • Rarely (or never) changing passwords.
  • Using a name or common phrase in a password.

If you’re guilty of one or more of these bad password habits, raise your hand!

*Raises hand*

Despite being (I think) very tech savvy, I have some very bad password habits, and I would not be at all surprised if many of my similarly computer-smart friends do too. Mostly, I use the same password for many things, and I rarely change my passwords.  The password I use most often, for almost all of my accounts on various websites, I first used with my first email account so long ago in the mid 90s. It is a relatively short and simple password, and someone could definitely wreak some havoc if they got their hands on it, though they wouldn’t have access to anything important. Of my other passwords, the newest one is about 3 years old and I came up with the oldest one almost 7 years ago. That’s plenty of time for someone to get their hands on the keys to my personal information, bank accounts, etc.

One of my new years resolutions this year was to improve on my password habits, and I definitely haven’t gotten anywhere with that. (Another one was to blog more, heh, fail there too.) So I guess the question is, why is it so hard to keep passwords up to date and secure?

For me, the number one barrier to changing passwords regularly is the hassle of having to memorize a new password, especially a complex one with numbers and special characters. Similarly, I reuse the 3 or 4 passwords I use most often to minimize the risk of forgetting a password.  With so many different services which require passwords, and more every week as I sign up for a new web site, it is sometimes difficult to remember whether I have an account to a certain site, much less remember a unique password for each one. Furthermore, I’ve never had any problems with security, so I don’t have any real motivation to change them.

Possible solutions

Using a password manager: I got a free license for 1password during MacHeist’s Giving Tree this year, which I’m going to start using. This allows me to use more passwords of greater complexity, and not have to worry about forgetting them. Of course, this is only as secure as the master password I use for 1password, and I have to figure something out for when I am on a public computer. If I don’t actually remember my passwords, I won’t be able to get to my accounts away from my computer.
Using phrase passwords: Instead of the traditional 8-12 character random passwords, use a 20-30 character phrase that is easy to memorize but difficult to guess. To help mitigate the risk of getting locked out of my accounts if I lose access to my password manager somehow, I think I will start using phrases from songs or quotes or something as passwords, instead. Unfortunately, the longer a phrase is, the greater a chance for a typo while typing it in.
Using a password generation system: Another option which might be safer than phrase passwords is using a system to generate secure passwords that are unique for every site. One such system is this one suggested by Lifehacker.

I’m starting to migrate towards these practices right now. Hopefully, I’ll be able to keep it up and maintain my good luck with regard to password security for a while longer.