Sunday, 25 April 2010

Using a blog as a Logbook

Last productivity post for a while I promise, then back to proper physics.

I'm trying out using a private blog (secured and unlisted etc) as a logbook. Logbooks are so important, and the first time you realise you need one it's already too late. My reasoning for going online goes:
  1. I can access it from anywhere, including emailing in posts from my phone. For example I could take a photo of a whiteboard discussion and send it to the blog so I won't forget about it.
  2. It's safely backed up on the servers of whoever's hosting it.
  3. I'm more likely to actually make entries because it's easy.
  4. A blog is much like a logbook anyway so it's naturally suited.
I can see some downsides but they're all pretty minor. Security could be an issue, but how sensitive is the information you put in your logbook? Mine isn't very sensitive - I don't want to accidentally broadcast my latest idea, but a leak wouldn't cause a new Climategate or anything. Besides, I think it's pretty secure.

I've chosen to use WordPress for my logbook for one simple reason: the LaTeX integration is fantastic. It's so good I'd consider moving this blog if it weren't such a pain (for the record I otherwise like Blogger). In WordPress you do this:

I will now insert an equation here, $latex E=mc^2$, inline with the text.

which would look like the same sentence with the rendered equation sitting inline with the text, although the superscript does appear to mess up the alignment slightly. Otherwise it does a brilliant job of interpreting the TeX and inserting the image. If you need a lot of LaTeX then there are programmes that convert between regular .tex files and the WordPress format.

There are similar things available for Blogger, but I think you lose your source code in a more drastic way. Anyway, I'm going to see how it goes.

Wednesday, 21 April 2010

Pipes and Python

I spent ages writing a post about some tricks I use for quick analysis of data, but it got incredibly bloated and started waffling about workflows and so on. Anyway, I woke up from that nightmare, so I thought I'd just bash out a couple of my top tips.

This is a pretty nerdy post, you may want to back away slowly.

Pipes
Pipes are, in my opinion, why the command line will reign for many years to come. Using the pipe I can quickly process my data by passing it between different programmes, gradually refining it as it goes. Here's an example that makes a histogram (from a Bash terminal):

> cat myfile.data | awk 'NR>100 {print $5}' | histogram | xmgrace -pipe

The first command prints the data file. The | is the pipe; it redirects the output to the next programme, awk, which here simply picks out the 5th column of every row after the first 100 and prints the result. Our pruned data is piped down the line to a programme I wrote called histogram, which bins it up and sends the final result to my favourite plotting programme, xmgrace, so I can have a look at it.
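Incidentally, the only thing that makes a programme fit into a chain like this is that it reads plain text on standard input and writes plain text on standard output. My histogram programme is written in C (more on that below), but purely as a sketch of the interface, a stand-in might look something like the following - none of this is my actual code, and the 50 bins are an arbitrary choice:

#! /usr/bin/env python
# Sketch of a pipe-friendly histogram filter: numbers in on stdin,
# "bin centre   count" pairs out on stdout, ready for a plotter.
import sys
import numpy

# Read one value per line from standard input, skipping blank lines
values = [float(line.split()[0]) for line in sys.stdin if line.strip()]

counts, edges = numpy.histogram(values, bins=50)
centres = 0.5 * (edges[:-1] + edges[1:])

for centre, count in zip(centres, counts):
    print("%g %d" % (centre, count))

Anything that sticks to that text-in, text-out convention can be dropped into the middle of a one-liner.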

So we've strung four programmes together with a single "one-liner" (some of my one-liners become ginormous). Once you start getting the hang of this sort of daisy chaining it can speed things up incredibly. One bit that took me a while the first time was the histogram programme. This took an annoying amount of time to set up because I used C.

This is where Python now comes in.

Python

I won't even try to give a Python tutorial. I'm a decade late to the party and have barely scratched the surface. However, I've found that for relatively little effort you can get access to thousands of functions, libraries and even graphics. Most importantly you can quickly write a programme, pipe in some data, and do sophisticated analysis on it.

With the scipy and numpy libraries I've done root-finding and integration. The pylab module seems to provide many of the functions you'd get in MATLAB. Python is a bit of a missing link for me: it's much lighter than huge programmes like Mathematica or MATLAB, and I just get things done quickly. Here's that histogram programme, Python style.


#! /usr/bin/env python
import sys
import pylab
import numpy

# Check the inputs from the command line
if len(sys.argv)!=3:
   print "Must provide file name and number of bins"
   sys.exit(1)

# Read in the data file
f = open(sys.argv[1],'r')
histo=[]
for line in f.readlines():
   histo.append(map(float, line.split()))

dimension = len(histo[0])

if dimension == 1:
   pylab.hist(histo, bins=int(sys.argv[2]))

   pylab.xlabel("x")
   pylab.ylabel("N(x)")
   pylab.show()

elif dimension == 2:
   # Need to chop up the histo list into two 1D lists
   x=[]
   y=[]
   for val in histo:
      x.append(val[0])
      y.append(val[1])

   # This function is apparently straight out of MatLab
   # I killed most of the options
   pylab.hexbin(x, y, gridsize = int(sys.argv[2]))

   pylab.show()


This conveniently detects how many dimensions we're histogramming in, so you don't need two programmes (it takes the data file and the number of bins as its two command-line arguments). It's pretty short for a programme that does what it does.
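As for the root-finding and integration I mentioned, scipy keeps those nearly as short. Here's a throwaway sketch - the function and the interval are made up purely for illustration, they're nothing to do with my work:

#! /usr/bin/env python
# Toy example of root-finding and integration with scipy
import numpy
from scipy import optimize, integrate

def f(x):
   # An arbitrary function with a root between 0 and 2
   return numpy.cos(x) - x

# brentq needs an interval over which the function changes sign
root = optimize.brentq(f, 0.0, 2.0)

# quad returns the integral and an estimate of its error
value, error = integrate.quad(f, 0.0, 2.0)

print("root     = %g" % root)
print("integral = %g +/- %g" % (value, error))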

I hate wasting my time trying to do something that my brain imagined hours ago. I wouldn't say that these techniques are super easy, but once you've learned the tools they are quick to reuse. I'd say they're as important to my work now as knowing C. Got any good tricks? Leave a comment.

Something less nerdy next week I promise.

Wednesday, 7 April 2010

Bootstrapping: errors for dummies

The trouble with science is that you need to do things properly. I'm working on a paper at the moment where we measured some phase diagrams. We've known what the results are for ages now, but because we have to do it properly we have to quantify how certain we are. Yes, that's right. ERRORS!

I've come on a long way with statistics - I've learned to love them - but I defy anyone to truly love errors. However, I took a step closer this month after discovering bootstrapping. It's a name that has long confused me; I seem to see it everywhere. It comes from the phrase "to pull yourself up by your boot straps". My old friend says it's "a self-sustaining process that proceeds without external help". We'll see why that's relevant in a moment.

Doing errors "properly"
Calculating errors properly is often a daunting task. You can spend thousands on the software, and many people make careers out of it. It will often involve creating a statistical model and all sorts of clever stuff. I really don't have much of a clue about this and, to be honest, I just want a reasonable error bar that doesn't undersell, or oversell, my data. Also, in my case, I have to do quite a bit of arithmetic gymnastics to convert my raw data into a final number, so knowing where to start with models is beyond me.

Bootstrapping
I think this is best introduced with an example. Suppose we have measured the heights of ten women and we want to make an estimate of the average height of the population. For the sake of argument our numbers are:

135.8 145.0 160.2 160.9 145.6
156.3 170.5 192.7 174.3 138.2
(all in cm)

The mean is 157.95cm, the standard deviation is 16.88cm. Suppose we don't have anything except these numbers. We don't necessarily want to assume a particular model (a normal distribution in this case); we just want to do the best with what we have.
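For the record, those two numbers are just the mean and standard deviation of the ten values (the standard deviation here divides by N rather than N-1). This little check is my addition rather than something from the original post:

import numpy

heights = numpy.array([135.8, 145.0, 160.2, 160.9, 145.6,
                       156.3, 170.5, 192.7, 174.3, 138.2])

print("mean = %.2f cm" % heights.mean())  # gives 157.95
print("std  = %.2f cm" % heights.std())   # gives 16.88; numpy's default divides by N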

The key step in bootstrapping is to make a new "fake" data set by randomly selecting values from the original, allowing duplicates. If the measurements are independent and identically distributed, then the fake data set can be thought of as an alternative version of the data. It is a data set that you could have taken the first time if you'd happened to get a different sample of people. Each fake set is considered equally likely. So let's make a fake set:

156.3 192.7 160.9 135.8 135.8
156.3 156.3 170.5 156.3 192.7
Mean = 161.36cm, standard deviation = 18.59cm

As you can see, there's quite a bit of replication of data. For larger sets it doesn't look quite so weird: on average about 63% of the original values (roughly a fraction 1 - 1/e of them) make it into a given fake set, and the rest are duplicates. Now let's do this again lots and lots of times (say 10000), using a different fake data set each time and so generating different means and standard deviations. We can make a histogram of the bootstrapped means:


From this distribution we can estimate the error on the mean at whatever confidence level we like. If we take 68% (+/- one sigma) then we can say that the error on the mean is +/-5.2cm. Incidentally, that's nearly what we'd get if we'd assumed a normal distribution and done 16.88/sqrt(10). Strangely the mean of the means is not 157.95, as it was for the input data, but 160.2. This is interesting because I drew the example data from a normal distribution centred at 160cm.

We can also plot the bootstrapped standard deviation.
What's interesting about this is that the average is std=15.2 whereas the actual standard deviation that I used for the data was 19.5. I guess this is an artefact of the tiny data set. That said 19.5 looks within "error".

So, without making any assumptions about the model we've got a way of getting an uncertainty in measurements where all we have is the raw data. This is where the term bootstrap comes in; the error calculation was a completely internal process. If it all seems a bit too good to be true then you're not alone. It took statisticians a while to accept bootstrapping and I'm sure it's not always appropriate. For me it's all I've got and it's relatively easy.
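The guts of the calculation really are only a few lines. Here's a minimal sketch of the resampling - this is not the exact script linked below (that one also makes the plots), and here the 68% interval is pulled out with percentiles:

#! /usr/bin/env python
import numpy

heights = numpy.array([135.8, 145.0, 160.2, 160.9, 145.6,
                       156.3, 170.5, 192.7, 174.3, 138.2])
n = len(heights)
n_boot = 10000

boot_means = numpy.zeros(n_boot)
boot_stds = numpy.zeros(n_boot)
for i in range(n_boot):
   # Resample with replacement: n random indices, duplicates allowed
   fake = heights[numpy.random.randint(0, n, n)]
   boot_means[i] = fake.mean()
   boot_stds[i] = fake.std()

# Central 68% interval on the mean (the +/- one sigma level quoted above)
lower, upper = numpy.percentile(boot_means, [16.0, 84.0])
print("mean = %.2f cm, 68%% interval = (%.2f, %.2f)" % (heights.mean(), lower, upper))
print("typical bootstrapped standard deviation = %.2f cm" % boot_stds.mean())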

To make these figures I used a Python code that you can get here. Data here.

Update: It's been pointed out to me that working out the error on the standard deviation is a bit dodgy. I think that the distribution is interesting - "what standard deviations could I have measured in a sample of 10?" - but perhaps one should be a little careful extrapolating to the population values. Like I said, I'm not a statistician!

Wednesday, 24 March 2010

Even colder still

In a previous post I was talking about how you can use a laser to cool atoms. By tuning the laser to just below the energy of an atomic transition you can selectively kick atoms that are moving towards the laser. If you fire in six lasers (one from each face of an imaginary cube surrounding the gas) then you can kick any atom that is trying to leave the centre. So we've made a trap!

There is a hitch, unfortunately: there is a minimum temperature to which one can cool the atoms this way. Once the atoms have an energy comparable to that of the photons coming from the laser, that's about as low as they can go - after all, there's only so much you can cool something by kicking it. We're already pretty cold - around 100 micro Kelvin - but we'd like to go a bit colder if we can. Now we're into magnetic traps.

Magnetic Traps

Up to now we've been acting quite aggressively towards the atoms - kicking anything that's moving too quickly. To do better we're going to try to round them up somewhere we can control things more carefully. Fortunately there's a neat way to do this: we can make use of an inhomogeneous magnetic field and the Zeeman effect.

If you apply a magnetic field to our gas of atoms then the magnetic dipoles of the atoms tend to line up with the field. This being quantum physics, they can only do so in a discrete number of ways. What happens is that the transition that used to be a single line splits and shifts into a number of different lines.
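To put a rough number on it (this formula is my addition - it's the standard weak-field Zeeman result rather than anything from the original post): each magnetic sub-level of the atom is shifted by an energy proportional to the field,

$\Delta E = g_F \, m_F \, \mu_B B$

where m_F labels the allowed orientations of the atom's magnetic moment, g_F is a dimensionless factor of order one, and \mu_B is the Bohr magneton. Each spectral line then moves by the difference between the shifts of the two levels involved.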


If we use a stronger field then the shift is larger, so we can finely tune the energy at which our laser will interact with the atoms. So now we do this: if we apply a magnetic field that is zero in the middle of the trap and gets bigger as you move away from the centre (which you can do), then we can control how hard we kick the atoms depending on where they are. If we do it right then inside the trap we hardly kick them at all, and outside the trap we kick them back in.

Evaporation

We've managed to confine the atoms in our trap; the final step is to switch off the lasers (to stop all that noisy kicking and recoiling) and to use evaporation to get rid of as much energy as possible. It is understandably quite complicated to stop them all flying out once you've switched off the lasers, and unfortunately it's at this point that I start getting lost! The actual cooling mechanism, though, is nothing more complicated than the reason your cup of tea goes cold.

After all this we're down at the micro Kelvin level - a millionth of a degree above absolute zero! At these sorts of temperatures the atoms can undergo a quantum phase transition and become a Bose-Einstein Condensate (BEC). This is a new state of matter, predicted by theory and finally observed in the nineties. As far as I know this is as cold as it gets anywhere in the universe.

Well I think I'm done with cooling things now. It starts off beautifully simple and then gets a bit harder! Needless to say I salute anyone that can actually do this - it's back to simulations for me.

EDIT: I over-link to Wikipedia, but this is a good page on magneto-optical traps

Wednesday, 17 March 2010

Ghost Jams

via Lester, a nice video showing ghost jams in action



See New Scientist for more.

The drivers were asked to drive around at a constant speed. For a while this works OK, but eventually a ghost jam develops and propagates at about the same speed as the jams observed in real traffic. I don't know if they tried to apply any external stimulus to see if they could guide it better.

Monday, 22 February 2010

Simulating a molecule with a quantum computer

Simulating a molecule

There's a fairly nifty paper out in PRL on simulating a molecule with a quantum computer. In principle, doing calculations on quantum systems will be much faster with quantum computers (when they become a reality) thanks to being able to hold the computer in a superposition of states. These guys have had a bash using an NMR-based "computer" - it's pretty fun.

Tuesday, 16 February 2010

Help with twitter name

What do you think this blog's twitter feed should be called?

KineticallyConstrained is a bit long (will hurt the retweets)
KineticCon?
KConstrained?
Kinetically?
KinCon (taken)
TwittersPointlessDontBother?

So many important decisions...