Radio Galaxy Zoo Talk

Radio Galaxy Zoo: host galaxies and radio morphologies derived from visual inspection

  • JeanTate by JeanTate

    Radio Galaxy Zoo: host galaxies and radio morphologies derived from visual inspection, a.k.a. Banfield+ (2015), is now up on arXiv, as arXiv:1507.07272 (link is to the abstract):

    We present results from the first twelve months of operation of Radio Galaxy Zoo, which upon completion will enable visual inspection of over 170,000 radio sources to determine the host galaxy of the radio emission and the radio morphology. Radio Galaxy Zoo uses 1.4 GHz radio images from both the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) and the Australia Telescope Large Area Survey (ATLAS) in combination with mid-infrared images at 3.4μm from the Wide-field Infrared Survey Explorer (WISE) and at 3.6μm from the Spitzer Space Telescope. We present the early analysis of the WISE mid-infrared colours of the host galaxies. For images in which there is >75% consensus among the Radio Galaxy Zoo cross-identifications, the project participants are as effective as the science experts at identifying the host galaxies. The majority of the identified host galaxies reside in the mid-infrared colour space dominated by elliptical galaxies, quasi-stellar objects (QSOs), and luminous infrared radio galaxies (LIRGs). We also find a distinct population of Radio Galaxy Zoo host galaxies residing in a redder mid-infrared colour space consisting of star-forming galaxies and/or dust-enhanced non star-forming galaxies consistent with a scenario of merger-driven active galactic nuclei (AGN) formation. The completion of the full Radio Galaxy Zoo project will measure the relative populations of these hosts as a function of radio morphology and power while providing an avenue for the identification of rare and extreme radio structures. Currently, we are investigating candidates for radio galaxies with extreme morphologies, such as giant radio galaxies, late-type host galaxies with extended radio emission, and hybrid morphology radio sources.

    This is the subject of a 2 March 2015 GZ blog post by Ivy, "First Radio Galaxy Zoo paper has been submitted!"*

    Now it's on arXiv - "Accepted to MNRAS" - we can all read it! 😃

    *oddly, it's not tagged RGZ

    Posted

  • 42jkb by 42jkb scientist, admin

    The pdf is now available on astro-ph as of Tuesday. Enjoy and thanks for all your hard work!

    Posted

  • KWillett by KWillett scientist, admin, translator

    If anyone is reading the paper (or any parts of it), please ask us questions! The science team would be happy to respond.

    Posted

  • DZM by DZM Zooniverse Team

    We need to get this on the Publications page, too, yes?

    @KWillett -- just saw you commenting about the publications page -- are you on this? 😃 Thanks!

    Posted

  • KWillett by KWillett scientist, admin, translator

    Yep - I added it, Darren. I don't know if you have access to that page as well as Grant; if so, it'd be great if you could push it live.

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    OK, looks like I'll be the first to do so ...

    [S3] Although initially designed as a pilot study in preparation for the 7 million complex radio sources from the upcoming EMU survey, we are currently exploring the inclusion of other radio surveys for subsequent phases of this project

    Can you give us some idea of what such surveys you are currently considering? And when might such subsequent phases kick off?

    Further, are you open to suggestions from ordinary zooites as to the former?

    [S3.1] We extracted the radio sources for this project from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST; White et al. 1997; Becker et al. 1995)

    There are quite a few FIRST releases, the latest of which was surely after RGZ was 'locked down' (if not actually launched). Which release(s) is (are) used in RGZ? And are you adding new FIRST sources (i.e. new sources that are in releases later than that used by RGZ), or is the database limited to just those FIRST sources in the "177,218"?

    ETA (rather than start a new post):

    [S3.2] The Radio Galaxy Zoo interface is shown in Fig. 3.

    Something I've wondered about since Day One: how did you choose the color schemes?

    These days, I'm sure there are lots of solidly researched recommendations, which surely come with pointers as to when and where to use various schemes (depending on things like the goals and target participant population). For example, how to avoid creating unnecessary difficulties for the more common kinds of color blindness. Which - if any - of these did the team consult/consider in its development phase?

    ETA2:

    [S3.2] Each Radio Galaxy Zoo subject is only offered once to each participant

    Hmm ... there are plenty of 'overlap' ARG fields, in each of those a "Radio Galaxy Zoo subject" will appear once, but taken over all fields, many will appear twice (and more?). For example, ARG0002356 and ARG0002359. I haven't read far enough yet, but surely these pose particularly challenging analysis questions, don't they?

    Posted

  • JeanTate by JeanTate

    This is worth a post of its own ...

    [S3.2] Each Radio Galaxy Zoo subject is only offered once to each participant

    That's certainly the design goal. However, as many a zooite can attest, during the period covered by the paper it was not met (see the many threads on "repeats"). This does not seem to be covered in the paper; why not?

    ETA: This seems quite, um, incorrect:

    [S4.2] The underlying data reduction relies on independent classifications by distinct users, and no individual subject is inspected by the same user more than once

    Apart from 'repeats', there are the individual subjects (= cataloged radio sources) which appear in more than one ARG field ... (see 'overlaps' above)

    Posted

  • 42jkb by 42jkb scientist, admin

    We are looking into LOFAR, ASKAP, MWA, MeerKAT (probably forgetting some here) surveys and are in the beginning discussion phases so we don't have a timeline yet for launch. Yes we are open to suggestions from anyone in regards to both the surveys/data and how to improve the project.

    We are using the 14 March 2004 FIRST catalogue and have had no discussions as to adding to our FIRST RGZ sources. We are 40% complete right now so we do have some time before we need to consider this.

    Ah yes the colour schemes on the interface. The red heatmap for the infrared is common for radio astronomy and we stuck with what we know. The contours were originally green and like you said colour blindness needs to be taken into consideration so they were changed to the blue/teal colour. Do you have any colour suggestions for the interface - maybe for RGZ2?

    Posted

  • Dolorous_Edd by Dolorous_Edd

    I have a dumb question

    Why was SDSS J123458.46+531851.3 (aka ARG0000fa1) chosen as an example to demonstrate zooites' ability to find sources with large angular size? Especially since it seems to be a known source?

    And this

    The overall radio size of 4.6 Mpc makes it the third-largest radio galaxy known

    Aaa ... the first one is J1420-0545 (published size 4.69 Mpc, not decrowned so far), OK, and the second-largest RG is 3C 236 (the runner-up! I found something about ~4.5 Mpc)

    Am I missing something?

    Posted

  • 42jkb by 42jkb scientist, admin in response to Dolorous Edd's comment.

    Do you mean figure 12? We chose this as an example because it is known and a good example of the ability of RGZ. As for the third largest, I recall this coming down to which redshift was used in the estimate. The conservative side places this source as the third largest I believe.

    Posted

  • Dolorous_Edd by Dolorous_Edd in response to 42jkb's comment.

    Do you mean figure 12?

    Yes

    the third largest

    I imagined the following order: J1420-0545 (published size 4.69 Mpc) -> 3C 236 (I saw mention of ~4.5 Mpc but Machalski et al. claim 4.38 Mpc) -> SDSS J1234+5318 (~4.6 Mpc; my brain malfunctioned here)

    And of course there is also 2012sngi.confP...1A

    2 examples of 3 GRG > 4 ; one 4.2 other 5.8

    Posted

  • JeanTate by JeanTate

    [S4.1] Table 1 compares the classification distributions between the experts and the volunteers

    [Table 1] Classification distributions of experts vs. volunteers for the control sample of 100 subjects in Radio Galaxy Zoo. Experts and volunteers agreed on the plurality classification for 74 out of 100 galaxies; most disagreements were for cases where the experts are in better agreement than the volunteers or where the image has a complicated, Class C morphology. The plurality classification is the classification with the most classifications

    [S4.2] Classifications are separated into three categories corresponding to the vote fraction and consensus level of the classifiers: [...] Class C: 3 or more experts did not agree on the fundamental radio/IR morphology

    I'm having difficulty understanding what the "Volunteers A B C" refers to. Part of this is the use of "C" as both a 'consensus level' and "the consensus for a subject" (I think this creates unnecessary confusion). Perhaps two examples would help? Say, what is the (A,C) cell with the value 9 ("Agreed" half) and the same one in "Disagreed" (value 2)?
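    To make the terms in the Table 1 caption concrete, here is a minimal sketch of a "plurality classification" and a consensus level for one subject. It assumes the consensus level is simply the fraction of all votes that went to the plurality classification; the function name and the vote labels are hypothetical, and this is not necessarily how the paper's pipeline computes it.

```python
from collections import Counter

def plurality_and_consensus(votes):
    """Return (plurality label, fraction of votes for that label).

    `votes` is a list with one label per classification of a single subject.
    Ties are broken by whichever label was seen first (Counter behaviour).
    """
    counts = Counter(votes)
    label, n_top = counts.most_common(1)[0]
    return label, n_top / len(votes)

# Hypothetical subject with 20 classifications
votes = ["1 source, host X"] * 16 + ["2 sources, hosts X and Y"] * 4
label, consensus = plurality_and_consensus(votes)
print(label, consensus)
```

    Under that assumption, a subject like the one above would clear the ">75% consensus" level mentioned in the abstract.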

    Posted

  • JeanTate by JeanTate in response to 42jkb's comment.

    We are looking into LOFAR, ASKAP, MWA, MeerKAT (probably forgetting some here) surveys and are in the beginning discussion phases so we don't have a timeline yet for launch. Yes we are open to suggestions from anyone in regards to both the surveys/data and how to improve the project.

    Thanks! My suggestion would be to prioritize surveys which have comparable (or better!) resolution to FIRST, and which are also within the SDSS footprint (or SkyMapper or Pan-STARRS), assuming its results are public. Then ones which operate at different frequencies. Why? Because:

    • we zooites can make more independent discoveries if we can check/follow-up using an optical survey like SDSS
    • a resolution worse than FIRST/WISE complicates matching radio sources to IR ones
    • a different frequency permits crude analyses of spectral slope (e.g. it can help in distinguishing synchrotron radiation-dominated sources from others)
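
    The "crude analyses of spectral slope" in the last bullet can be sketched as a two-point spectral index. This assumes the convention S ∝ ν^α (some authors use the opposite sign), and the flux densities below are made up for illustration; the function name is hypothetical.

```python
import math

def spectral_index(flux_1, freq_1, flux_2, freq_2):
    """Two-point spectral index alpha, assuming S is proportional to nu**alpha.

    Fluxes must share one unit and frequencies must share one unit.
    """
    return math.log(flux_1 / flux_2) / math.log(freq_1 / freq_2)

# Hypothetical source: 100 mJy at 1.4 GHz and 270 mJy at 0.15 GHz
alpha = spectral_index(100.0, 1.4, 270.0, 0.15)
print(f"alpha = {alpha:.2f}")
```

    With this convention, a clearly negative alpha (brighter at lower frequency) is what one would expect for a synchrotron-dominated source.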

    We are using the 14 March 2004 FIRST catalogue

    Yeah, I see that the paper actually says this: [S3.1.1] "The majority of the data in Radio Galaxy Zoo comes from the 1.4GHz FIRST survey (catalogue version 14 March 2004)" My big Oops!

    Ah yes the colour schemes on the interface. The red heatmap for the infrared is common for radio astronomy and we stuck with what we know. The contours were originally green and like you said colour blindness needs to be taken into consideration so they were changed to the blue/teal colour. Do you have any colour suggestions for the interface - maybe for RGZ2?

    Personally, I like 'heat' and the general idea of black->white reflecting 'noise-floor luminosity'->'saturation' (a rather arbitrary threshold), as RGZ uses for the radio; but I think one or two of the colormaps used in SAO DS9 might be better. However, I'm pretty sure a great deal of work has gone into what schemes are 'best' for this particular application. I see that Kelly Borden and Laura Whyte are co-authors; given their deep experience in education, perhaps at least one of them can quickly access such research?

    Posted

  • JeanTate by JeanTate in response to 42jkb's comment.

    [S3.2] We record the nearest IR source to the participants’ clicks as the host galaxy. Such an identification requires fewer independent classifications, and so these images are retired from the interface after 5 classifications.

    Doesn't this assume that it's not a coincidence? For example, perhaps the solo radio component is a distant lobe, which, by coincidence, happens to be close (within, say, 10") to a completely unrelated IR source? The 5 classifications also increases the chance that none of the super-zooites (the 'top 100', say) - those who are much more likely to check for such 'coincidence' possibilities - will get such sources to classify.

    [S3.3] Participants who choose not to log into the system still have their classifications recorded; in the absence of other information, we use their IP addresses as substitute IDs. Anonymous users have generated 26.8% of the total classifications to date.

    I'm sure I am not alone in sometimes having classified without logging in, but mostly doing so while logged in! What analyses have you done to see how many of the 'anonymous users' have IP addresses which correspond to registered/logged-on users?

    Posted

  • JeanTate by JeanTate in response to 42jkb's comment.

    [S3.3] On May 1, 2015, Radio Galaxy Zoo had over 6900 registered volunteers and 1,155,000 classifications.

    [S4.2] The validity of the single classification assumption is straightforward to verify for the 883,494 classifications (73.2 per cent) that come from volunteers who are logged in to the Radio Galaxy Zoo interface.

    The latter seems to suggest that the total number of classifications used is ~1,207,000 (883,494/0.732). Which is correct?

    [S5.1] From the 53,229 images with completed classifications to date

    Is 'to date' = 'May 1, 2015'? What is the breakdown of this 53,229 by 'retired after 5 classifications' and 'retired after 20'?

    [S5.1] We matched 41,568 (78 per cent) of our radio sources to a WISE source within a radius of 6′′.

    As 78% ~ 41,568/53,229, it seems that this implies very close to one 'radio source' per 'image' ... but many ARG fields ('images') contain more than one radio source! Can a Science Team member clarify please?

    Posted

  • JeanTate by JeanTate in response to 42jkb's comment.

    Re Figure 9: I found this particularly interesting, because the WISE image, at ~the position of the host, contains a 'blended source', which is likely at least two separate galaxies (SDSS is down, so I can't check this right now). And the KDE distribution of the '181/217' zooites' consensus classification seems to reflect some uncertainty as to which part of the blend is the host; however, the experts' one seems to have no such uncertainty (it's the W-most part of the blend).

    Also, as the blend is >6" in size, the WISE-matching should have given an ambiguous result (for the zooites' consensus classification), shouldn't it? If so, how did the analysis in S5.1 ("WISE colours") deal with this?

    Posted

  • JeanTate by JeanTate in response to 42jkb's comment.

    [S5.1] the mid-infrared colour-colour plot appears to be a reasonable discriminator for many types of AGN [...] corresponding to infrared colours typically associated with QSOs and Seyfert galaxies

    From my reading of the literature, "QSOs and Seyfert galaxies" are all AGNs (or contain AGNs), and that there are many - sometimes conflicting - definitions of "QSO" ("Seyfert" seems somewhat more narrowly, and consistently, defined). What definition(s) are being used in this part of the paper?

    Posted

  • JeanTate by JeanTate in response to 42jkb's comment.

    [S5.1] To date, four examples of spiral galaxies hosting a double-lobed radio source have been discovered (Morganti et al. 2011; Hota et al. 2011; Bagchi et al. 2014; Mao et al. 2015)

    Hmm ... per Table 1 in Morganti et al. 2011, seven "powerful radio sources hosted by disk galaxies" had (in 2011) "been studied in detail"! True, at least one of these (NGC 612) is a lenticular (i.e. not a spiral galaxy) and/or not hosting double-lobed radio sources; however, at least one that is not among the four cited certainly is ... 0313-192 (Ledlow et al. 2001; Keel et al. 2006); it's an edge-on Sb with very prominent double lobes.

    [S6] Currently, the projects being facilitated by RadioTalk include: [...] (2) the search for double-lobed radio sources associated with spiral host galaxies (led by Mao)

    Really?

    Posted

  • JeanTate by JeanTate

    I've no particular part of the paper to refer to re this post ...

    RGZ Talk's (flawed) Search returns 2,151 hits on "#overedge". At the 0-th level, that's likely to be a pointer to how many subjects/images/fields/radio sources have been flagged as having/being over edge. The giant identified in Figure 12 would certainly be one of these (actually possibly three or more), as would many of the other giants that HAndernach is leading the search for.

    Many, perhaps most, of these can be identified only after the classification phase is over; they'll show up only in RGZ Talk comments and/or Discussion posts. Given that it's likely only a minority of zooites-who-classify write RGZ Talk comments/posts*, and that many of those who do do not, in effect, search for possible overedge sources/components much less write about them, ~2,000 may be a serious underestimate.

    I couldn't find anything specific on this in the paper (perhaps I missed it?); in any case, how is this being taken account of in the analyses? For example, how does it impact the findings reported in Section 5.1?

    *none of the "Anonymous users" can do this, for example, and there are hundreds? thousands?? of these

    Posted

  • 42jkb by 42jkb scientist, admin in response to JeanTate's comment.

    Still taking time to digest all the comments here. I'm still working on it and will try to answer these soon.

    Posted

  • KWillett by KWillett scientist, admin, translator in response to JeanTate's comment.

    It's not a classification of the volunteers; this is of the science team, who did classifications on a smaller set of subjects that are also in the general pool.

    I know that the different variable names can be confusing; I might try to come up with a suggestion for a different variable for the consensus. Any Greek letters which aren't taken yet?

    Posted

  • KWillett by KWillett scientist, admin, translator in response to JeanTate's comment.

    Doesn't this assume that it's not a coincidence?

    We assume every click isn't a coincidence (otherwise, automated matching to the nearest IR source would be the approach to use). But we're taking the nearest IR source (as identified in WISE) to where the users clicked on 'IR source', not the location of the radio lobe(s). If there's no IR source nearby, users should be clicking 'No Source' rather than on a position in a blank field.

    What analyses have you done to see how many of the 'anonymous users' have IP addresses which correspond to registered/logged-on users?

    None so far, but that's driven by the fact that the weighting algorithms for specific users are in an early stage. Even as those improve, I have reservations about doing that; there are lots of ways for a particular user to have several different IP addresses, or vice versa: we have hundreds of different people coming in from a single address in classroom environments, for example. There are also some privacy concerns in matching personal info to IP addresses, which I don't really have the authority or comfort level to deal with.

    Posted

  • KWillett by KWillett scientist, admin, translator in response to JeanTate's comment.

    The latter seems to suggest that the total number of classifications used is ~1,207,000 (883,494/0.732). Which is correct?

    Total classifications in that timeframe from midnight to midnight were 1,204,848, so the second number is correct (within rounding errors). The previous number might have removed tutorial subject classifications, or come from slightly different settings on the time of day.
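
    The arithmetic behind the two totals can be checked in a few lines. All three input numbers are the ones quoted in this thread; only the variable names are mine.

```python
logged_in = 883_494         # classifications from logged-in volunteers (S4.2)
fraction_quoted = 0.732     # "73.2 per cent" as quoted from S4.2
total_reported = 1_204_848  # midnight-to-midnight total given above

# Total implied by dividing the logged-in count by the quoted fraction
implied_total = logged_in / fraction_quoted
# Fraction implied by the reported total
actual_fraction = logged_in / total_reported

print(f"implied total   ~ {implied_total:,.0f}")
print(f"actual fraction ~ {actual_fraction:.1%}")
```

    The implied total rounds to about 1,207,000, and the reported total gives a logged-in fraction within about 0.1 percentage points of the quoted figure, consistent with "within rounding errors".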

    What is the breakdown of this 53,229 by 'retired after 5 classifications' and 'retired after 20'?

    As of today (30 Jul 2015), 24,640 retired subjects are single-component with 5 classifications, and 37,778 had 20 classifications. An additional 8,191 subjects were "caught in the middle" when we switched the retirement threshold; they were single-component images with at least 5 classifications already (but fewer than 20).

    As 78% ~ 41,568/53,229, it seems that this implies very close to one 'radio source' per 'image' ... but many ARG fields ('images') contain more than one radio source!

    I think you're right, Jean - a more accurate statement would be "Out of all the consensus classifications of radio sources in the completed RGZ images, 41,568 were matched to a WISE counterpart within 6 arcsec." It's not obvious what the denominator should be in this case - the total number of "radio sources" isn't well-defined, since there are many images where the RGZ users didn't come to a consensus about how many sources there were per image. We probably should not have expressed it as a percentage, and just stated the total number of matched sources.

    Posted

  • KWillett by KWillett scientist, admin, translator in response to JeanTate's comment.

    If so, how did the analysis in S5.1 ("WISE colours") deal with this?

    The matching technique includes all possible WISE sources within the 6" radius. In the case of multiple sources, we choose the closest match to the reported center. In the case of Figure 9, that is the WISE object on the right.
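
    For readers who want to see the idea spelled out, a minimal sketch of "closest match within a 6 arcsec radius" follows. It is not the team's actual code: the function names, the catalogue layout (name, RA, Dec in degrees) and the example coordinates are all hypothetical.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0

def closest_match(center, catalogue, radius_arcsec=6.0):
    """Return (name, separation) of the nearest entry within the radius, else None."""
    best = None
    for name, ra, dec in catalogue:
        sep = angular_separation_arcsec(center[0], center[1], ra, dec)
        if sep <= radius_arcsec and (best is None or sep < best[1]):
            best = (name, sep)
    return best

# Hypothetical reported center and three hypothetical IR sources
center = (180.0, 30.0)
catalogue = [
    ("src_a", 180.0, 30.0 + 2.0 / 3600.0),   # 2 arcsec away
    ("src_b", 180.0, 30.0 - 5.0 / 3600.0),   # 5 arcsec away
    ("src_c", 180.0, 30.0 + 10.0 / 3600.0),  # 10 arcsec away, outside the radius
]
print(closest_match(center, catalogue))
```

    Note that a rule like this always prefers the nearer candidate, so it cannot by itself tell a foreground object from the true host.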

    Posted

  • KWillett by KWillett scientist, admin, translator in response to JeanTate's comment.

    ... definitions of "QSO" ("Seyfert" seems somewhat more narrowly, and consistently, defined). What definition(s) are being used in this part of the paper?

    We don't adopt a strict definition; the colors (which come from the non-thermal part of the SED, plus heated dust) can apply to either category. One really needs spectroscopy to distinguish between the various types of AGN, particularly with regard to the broad lines. Diagnostics like WISE colors are very useful for broad categorization (AGN vs galaxy), but a single color isn't sufficient to distinguish between the sub-types.

    The full catalog is matching against SDSS optical spectroscopy; for objects that are matched, we'll have the data to split these into the various AGN categories you mention.

    Posted

  • KWillett by KWillett scientist, admin, translator in response to JeanTate's comment.

    how is [overedge] being taken account of in the analyses?

    Good question. I don't think it'll be part of the main catalog, which relies on the clicks and automated aggregation, rather than the tags. @HAndernach is leading the collation of these objects, which are part of his particular science interests. I suspect it will be a separate paper than the full catalog release; we'll try to estimate our level of completeness, but it will probably be low given the number of users. One possibility, if we think it'll be scientifically useful, might be a new Panoptes project where we add "overedge" as a specific response.

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Thanks Kyle.

    My original question is still open, I'm afraid; can you - or someone else - please walk through the cases I mentioned?

    Here they are again (bold added):


    [S4.1] Table 1 compares the classification distributions between the experts and the volunteers

    [Table 1] Classification distributions of experts vs. volunteers for the control sample of 100 subjects in Radio Galaxy Zoo. Experts and volunteers agreed on the plurality classification for 74 out of 100 galaxies; most disagreements were for cases where the experts are in better agreement than the volunteers or where the image has a complicated, Class C morphology. The plurality classification is the classification with the most classifications

    [S4.2] Classifications are separated into three categories corresponding to the vote fraction and consensus level of the classifiers: [...] Class C: 3 or more experts did not agree on the fundamental radio/IR morphology

    I'm having difficulty understanding what the "Volunteers A B C" refers to. Part of this is the use of "C" as both a 'consensus level' and "the consensus for a subject" (I think this creates unnecessary confusion). Perhaps two examples would help? Say, what is the (A,C) cell with the value 9 ("Agreed" half) and the same one in "Disagreed" (value 2)?

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Two parts; this is the first:

    [Kyle] We assume every click isn't a coincidence (otherwise, automated matching to the nearest IR source would be the approach to use). But we're taking the nearest IR source (as identified in WISE) to where the users clicked on 'IR source', not the location of the radio lobe(s). If there's no IR source nearby, users should be clicking 'No Source' rather than on a position in a blank field.

    Thanks! And thanks for the other responses too (I'm looking forward to responses to my as-yet-unanswered questions too).

    My question was not clear, sorry (original post below).

    What I'm asking is likely for a later paper; I'm asking about 'false positives' (i.e. zooite identifications of compact radio sources associated with IR sources, which are physically unreal/coincidences). These will certainly happen/are guaranteed to happen; I guess I'm asking about how you'll go about estimating how common this is, and what you might consider doing to identify these.

    For example, position offsets (i.e. marked IR position vs centroid of the FIRST contours): assuming no physically real offset (an assumption which can be tested), the distribution of offsets should/may be different, for those due to coincidences vs the real associations. A complication: the apparent size of a compact owes much to its flux ('brighter' sources appear 'bigger'), so perhaps the distribution of offsets reflects this too?

    I'm also asking about whether having fewer classifications for compacts (5) than for the rest (20) introduces a 'false positive' bias for the compacts. Especially as - so far - no analyses have been done, incorporating zooites' comments (I'm surely not alone in discovering that, sometimes, my classification is wrong, once I have examined the FIRST and SDSS cutouts; I'll usually write this up, and so will many other zooites, in the form of a comment).
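
    One simple way to get a feel for how common coincidences might be (a different and cruder approach than the offset-distribution idea above) is the standard Poisson estimate: if unrelated IR sources are scattered randomly with surface density ρ, the chance of at least one falling within radius r of a given position is 1 - exp(-π r² ρ). The sketch below assumes exactly that; the density value is purely hypothetical and is not a measured WISE number.

```python
import math

def chance_coincidence_probability(radius_arcsec, density_per_sq_deg):
    """Probability of >= 1 unrelated source within the radius.

    Assumes unrelated sources are Poisson-distributed on the sky with the
    given surface density (sources per square degree).
    """
    area_sq_deg = math.pi * (radius_arcsec / 3600.0) ** 2
    expected_count = area_sq_deg * density_per_sq_deg
    return -math.expm1(-expected_count)  # equals 1 - exp(-expected_count)

# Hypothetical density of 10,000 sources per square degree, 6 arcsec radius
p = chance_coincidence_probability(6.0, 10_000.0)
print(f"chance coincidence probability ~ {p:.3f}")
```

    Real sources cluster, so an estimate like this would only be a starting point; comparing against matches at randomly shifted positions would be one way to test it.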


    [S3.2] We record the nearest IR source to the participants’ clicks as the host galaxy. Such an identification requires fewer independent classifications, and so these images are retired from the interface after 5 classifications.

    Doesn't this assume that it's not a coincidence? For example, perhaps the solo radio component is a distant lobe, which, by coincidence, happens to be close (within, say, 10") to a completely unrelated IR source? The 5 classifications also increases the chance that none of the super-zooites (the 'top 100', say) - those who are much more likely to check for such 'coincidence' possibilities - will get such sources to classify.

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Thanks.

    My follow-up question is similar to my last, how do you (plan to) deal with false positives?

    I'm sure my experience is far from unique: in checking SDSS after classifying, I've found many cases where the closer WISE object is obviously not the host (e.g. it's a foreground star, and a more distant WISE object is an early-type galaxy). But of course you can't tell the difference in the WISE 'heat' image.

    Posted

  • JeanTate by JeanTate in response to KWillett's comment.

    Thanks.

    @HAndernach is leading the collation of these objects, which are part of his particular science interests.

    And from what I can see, he's doing a sterling job!

    However, as I understand it, his interest is focused on giants ... but not all 'overedges' are giants, nor even separate (detached) lobes. Many, for example, are surely simply large lobes (or plumes or ...), only parts of which are within the relevant ARG field.

    One possibility, if we think it'll be scientifically useful, might be a new Panoptes project where we add "overedge" as a specific response.

    And another may be a new project focused explicitly on 'overedge' classifications (in the Comments and Discussions, and Collections), using much bigger fields perhaps ... 😃

    Posted

  • zutopian by zutopian

    Another Talk discussion related to the paper, which was officially published on 7th Sept.:

    Press release: Volunteer black hole hunters as good as experts
    http://radiotalk.galaxyzoo.org/#/boards/BRG0000008/discussions/DRG0000ctf

    Posted

  • zutopian by zutopian

    There are the following two blog posts:

    First Radio Galaxy Zoo paper has been submitted! dated 2 March 2015
    http://blog.galaxyzoo.org/2015/03/02/first-radio-galaxy-zoo-paper-has-been-submitted/

    First Radio Galaxy Zoo paper has been accepted! dated 28 July 2015
    http://blog.galaxyzoo.org/2015/07/28/first-radio-galaxy-zoo-paper-has-been-accepted/

    Posted