Radio Galaxy Zoo Talk

Radio Galaxy Zoo: Machine learning for radio source host galaxy cross-identification

  • JeanTate by JeanTate

    by M. J. Alger, J. K. Banfield, C. S. Ong, L. Rudnick, O. I. Wong, C. Wolf, H. Andernach, R. P. Norris, and S. S. Shabala. Up on astro-ph today (link to abstract):

    We consider the problem of determining the host galaxies of radio sources by cross-identification. This has traditionally been done manually, which will be intractable for wide-area radio surveys like the Evolutionary Map of the Universe (EMU). Automated cross-identification will be critical for these future surveys, and machine learning may provide the tools to develop such methods. We apply a standard approach from computer vision to cross-identification, introducing one possible way of automating this problem, and explore the pros and cons of this approach. We apply our method to the 1.4 GHz Australian Telescope Large Area Survey (ATLAS) observations of the Chandra Deep Field South (CDFS) and the ESO Large Area ISO Survey South 1 (ELAIS-S1) fields by cross-identifying them with the Spitzer Wide-area Infrared Extragalactic (SWIRE) survey. We train our method with two sets of data: expert cross-identifications of CDFS from the initial ATLAS data release and crowdsourced cross-identifications of CDFS from Radio Galaxy Zoo. We found that a simple strategy of cross-identifying a radio component with the nearest galaxy performs comparably to our more complex methods, though our estimated best-case performance is near 100 per cent. ATLAS contains 87 complex radio sources that have been cross-identified by experts, so there are not enough complex examples to learn how to cross-identify them accurately. Much larger datasets are therefore required for training methods like ours. We also show that training our method on Radio Galaxy Zoo cross-identifications gives comparable results to training on expert cross-identifications, demonstrating the value of crowdsourced training data.
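
    The "nearest galaxy" baseline mentioned in the abstract can be pictured roughly as below. This is a minimal sketch, not the paper's code: the function names and the coordinates are made up for illustration, and it assumes positions are given as (RA, Dec) in degrees and that the galaxy list is not empty.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def nearest_host(component, galaxies):
    """Return (index, separation_deg) of the galaxy closest to the radio component."""
    seps = [
        angular_separation_deg(component[0], component[1], ra, dec)
        for ra, dec in galaxies
    ]
    best = min(range(len(seps)), key=seps.__getitem__)
    return best, seps[best]


# Made-up coordinates in degrees, for illustration only
radio_component = (52.80, -28.00)
infrared_galaxies = [(52.90, -28.10), (52.801, -28.001), (52.70, -27.90)]

index, sep = nearest_host(radio_component, infrared_galaxies)
print(f"nearest galaxy: #{index}, separation {sep * 3600.0:.1f} arcsec")
```

    A loop like this is fine for a handful of objects; for full survey catalogues one would presumably want a spatial index rather than comparing every pair.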

    Cool, eh!

    I particularly like Figure 7 (and the accompanying text), "Cumulative number of radio components (N) in the expert (Norris) and Radio Galaxy Zoo (RGZ) training sets with different signal-to-noise ratios (SNR)." 😃

  • sisifolibre by sisifolibre

    "We also show that training our method on Radio Galaxy Zoo cross-identifications gives comparable results to training on expert cross-identifications, demonstrating the value of crowdsourced training data."

    This encourages me a lot 😃

  • JeanTate by JeanTate in response to sisifolibre's comment.

    It's nice to have external validation of the (collective) accuracy of our work, isn't it? 😃

    Figure 7 may have an even nicer message: the real number of radio components vs SNR may be somewhere between the blue (Norris) and dotted red (us, collectively) lines; if so, we were too optimistic and Norris too cautious.

    Why do I think that? This is not a plot of N vs S (radio flux density, i.e. apparent brightness), but SNR may be a close proxy for S. If so, it ties into an old result: the distribution of radio sources by apparent brightness (if it were optical, we'd say magnitude) is one of the things radio astronomers used, decades ago, to show that radio sources are predominantly extragalactic and ~what you'd expect for AGNs distributed uniformly throughout the universe, out to quite a large redshift. Maybe ATLAS was deep/sensitive enough that the curve should flatten at the faint end?
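
    For reference, the textbook version of that argument, under assumptions that are not from the paper: a static Euclidean universe, sources that all have the same luminosity L, and a uniform number density n. A source at distance r has flux density S, so everything brighter than S sits inside a sphere of radius r(S):

```latex
S = \frac{L}{4\pi r^{2}}
\quad\Longrightarrow\quad
r(S) = \left(\frac{L}{4\pi S}\right)^{1/2},
\qquad
N(>S) = \frac{4}{3}\pi\, n\, r(S)^{3}
      = \frac{4\pi n}{3}\left(\frac{L}{4\pi S}\right)^{3/2}
      \propto S^{-3/2}.
```

    Counts that depart from the -3/2 slope are then a sign that one of those assumptions fails, which is why the shape at the faint end is interesting. All of this only carries over to Figure 7 if SNR really does track S, i.e. if the noise is roughly the same across the fields.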

  • sisifolibre by sisifolibre

    Thanks Jean, I always learn from your comments.

    This is one of my favorite topics in this project: the evolution and distribution of AGN are "clues" about cosmological evolution.
