Crossmatching Catalogs using KD-Trees


A lot of sky catalogs are available online such as .
There are detailed descriptions available with these catalogs too, explaining which column represents which variable.
Here I will attempt to explain only one of them:
If you check the "Byte-by-byte Description of file", look at the different labels. They tell you what information the catalog holds and, more importantly, in what format. For example, we can see that the RA coordinate is in hours, minutes, seconds format, with the values in three separate columns. The DEC coordinate is similarly in degrees, arcminutes, arcseconds format, with an extra column for the sign. Several other columns carry information about luminosity, polarization, etc.
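To make the point about the separate sign column concrete, here is a minimal sketch of reading one fixed-width record. The byte ranges, the `parse_record` name, and the sample lines are all made up for illustration; the real positions have to come from the catalog's own byte-by-byte description.

```python
def parse_record(line):
    # Hypothetical byte ranges, for illustration only. Take the real ones
    # from the catalog's byte-by-byte description.
    ra_h = float(line[0:2])     # RA hours
    ra_m = float(line[3:5])     # RA minutes
    ra_s = float(line[6:11])    # RA seconds
    # The sign is read from its own column, so "-00" keeps its sign.
    sign = -1 if line[12] == "-" else 1
    dec_d = float(line[13:15])  # DEC degrees (unsigned)
    dec_m = float(line[16:18])  # DEC arcminutes
    dec_s = float(line[19:23])  # DEC arcseconds
    return (ra_h, ra_m, ra_s), sign, (dec_d, dec_m, dec_s)


# Made-up sample records in the hypothetical layout above.
sample = "00 04 35.65 -47 36 19.1"
near_equator = "12 00 00.00 -00 30 00.0"

print(parse_record(sample))
print(parse_record(near_equator))
```

The second record shows why the sign column matters: the degrees field is zero, so the sign cannot be recovered from the number alone.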
Let's focus on RA and DEC for now. Another catalog may give the same coordinates in a different format, so it is important to bring them into a common form before matching.
We can convert these values to degrees, radians, or any other format, depending on the other catalog. In my case, the other catalog I used already had its values in degrees, so we convert the data from the first catalog to degrees using simple formulas. This means pre-processing the data once, which is O(n).
I used numpy's loadtxt function to load the relevant columns and convert the RA and DEC columns to degrees:
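import numpy as np

def hms2dec(h, m, s):
    # Hours, minutes, seconds of RA to decimal degrees (1 hour = 15 degrees).
    return 15 * (h + m / 60 + s / 3600)

def dms2dec(d, am, asec):
    # np.signbit also catches -0.0, so a declination like -00 30 00 keeps its sign.
    sign = -1 if np.signbit(d) else 1
    return sign * (abs(d) + am / 60 + asec / 3600)

def import_catalog():
    # Load the six columns holding RA (h, m, s) and DEC (d, m, s).
    cat = np.loadtxt('bss3.dat', usecols=range(1, 7))
    final = []
    for r in cat:
        final.append([hms2dec(r[0], r[1], r[2]), dms2dec(r[3], r[4], r[5])])
    return final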
Simple enough. Similarly, if you find a catalog in a different format, you can pre-process it to bring both catalogs into a common format.
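As a rough sketch of that kind of pre-processing, the helper below brings a column of angles into degrees. The `to_degrees` name and the unit labels `"deg"`, `"rad"` and `"hour"` are my own choices for illustration, not something taken from any catalog.

```python
import numpy as np


def to_degrees(values, unit):
    """Convert an array of angles to decimal degrees.

    `unit` is a hypothetical label: "deg", "rad" or "hour".
    """
    values = np.asarray(values, dtype=float)
    if unit == "deg":
        return values
    if unit == "rad":
        return np.degrees(values)
    if unit == "hour":
        # 24 hours of RA cover 360 degrees, so 1 hour = 15 degrees.
        return values * 15.0
    raise ValueError(f"unknown unit: {unit!r}")


ra_from_radians = to_degrees([np.pi, np.pi / 2], "rad")
ra_from_hours = to_degrees([12.0, 6.0], "hour")
print(ra_from_radians, ra_from_hours)
```

Both calls should describe the same two positions, which is exactly the property you want before matching.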
A second point to note is that, since these coordinates are angles (in degrees or radians), any distance function used for matching should compute the angular distance on the sky, not a plain Euclidean distance between the coordinate values.
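One common way to get an angular distance is the haversine formula. The sketch below is one possible version, assuming both positions are given in decimal degrees; the `angular_dist` name is mine.

```python
import numpy as np


def angular_dist(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions in degrees."""
    ra1, dec1, ra2, dec2 = np.radians([ra1, dec1, ra2, dec2])
    a = np.sin((dec1 - dec2) / 2) ** 2
    b = np.cos(dec1) * np.cos(dec2) * np.sin((ra1 - ra2) / 2) ** 2
    # Clip guards against rounding pushing the value slightly outside [0, 1].
    return np.degrees(2 * np.arcsin(np.sqrt(np.clip(a + b, 0.0, 1.0))))


# Two points on the celestial equator, a quarter turn apart in RA.
print(angular_dist(0.0, 0.0, 90.0, 0.0))
# Near a pole, a large difference in RA is a tiny distance on the sky.
print(angular_dist(0.0, 89.9, 180.0, 89.9))
```

The second call is the reason a Euclidean distance on (RA, DEC) is misleading: the RA values differ by 180 degrees, yet the two points are close together on the sky.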
In conclusion, reading the catalog description to note which data is available, and in what format, is an equally important step before jumping into crossmatching.