Radio Galaxy Zoo Talk

Anyone getting any repeat images?

  • JeanTate by JeanTate

    Project Star Date, which began not long after RGZ, is reporting instances of repeat images: a zooite getting the same image to classify, twice or even more often (see Repeat images).

    Two days ago, zooite TED91 wrote (in that SD Talk thread), "It happened also in other projects as " The Andromeda project " or "Spacewarps".", and earlier today MODERATOR jules confirmed at least the AP part (same thread):

    As far as I remember the Andromeda Project had a brief glitch concerning how the site requested images. I think images were sent to classifiers in batches of 5 - another batch being sent when the 5th one was classified. However, for a brief spell the system sent a new batch after each classification including those in the previous batch not yet classified resulting in duplicates. That may not be the case here, however, but I thought I'd mention it just in case it helps.
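To make the mechanism jules describes concrete, here is a minimal sketch of how it could produce duplicates. Only the batch size of 5 comes from the quote; the names `serve`, `new_batch` and `count_repeats`, the client queue that appends without de-duplicating, and the integer subject IDs are all assumptions for illustration, not the Andromeda Project's actual code.

```python
from itertools import count

BATCH_SIZE = 5  # batch size quoted above; everything else is hypothetical


def serve(n_classifications, refill_after_each):
    """Return the subject IDs one classifier is shown, in order."""
    ids = count()          # endless supply of fresh subject IDs
    classified = set()     # what the server knows has been classified
    last_batch = []        # most recent batch sent to the client
    queue = []             # client-side queue (assumed: no de-duplication)
    shown = []

    def new_batch(carry):
        # Start from carried-over subjects, then top up with fresh ones.
        batch = list(carry)
        while len(batch) < BATCH_SIZE:
            batch.append(next(ids))
        return batch

    for _ in range(n_classifications):
        if not queue:
            # Intended behaviour: fetch a batch only when the queue is empty.
            last_batch = new_batch([])
            queue.extend(last_batch)
        subject = queue.pop(0)
        shown.append(subject)
        classified.add(subject)
        if refill_after_each:
            # Glitch as described: a new batch after every classification,
            # including subjects from the previous batch not yet classified.
            carry = [s for s in last_batch if s not in classified]
            last_batch = new_batch(carry)
            queue.extend(last_batch)
    return shown


def count_repeats(shown):
    """Number of times a classifier was shown a subject they had already seen."""
    return len(shown) - len(set(shown))
```

Under these assumptions `count_repeats(serve(20, False))` is zero, while `serve(20, True)` starts showing earlier subjects again as soon as the first batch has been worked through.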

    I think it'd be more difficult for us here in RGZ to notice if there are repeat images, unless one of the repeats is a particularly striking image.

    Needless to say, repeats are not supposed to happen in today's Zooniverse projects (things were different in the original Galaxy Zoo), as Arfon makes clear in this Zooniverse blog post: How the Zooniverse Works: Keeping It Personal

    Posted

  • WizardHowl by WizardHowl

    I've just had a repeat and only noticed because I had made a comment on it two images previously. The image was:
    http://radiotalk.galaxyzoo.org/#/subjects/ARG0002ycm

    In RGZ we get images based on radio sources, so for objects like double lobes and triples, my understanding is that we might get these multiple times, each time centred on a different radio source. In this case it was more obviously a repeat because the image was centred on the same compact source, which is the only strong source in the field of view, rather than on the very faint diffuse emission at the top-right.

    Rather than repeat a classification, I shall log out and back in again later.

    Posted

  • DocR by DocR scientist

    I will make sure the software gurus are aware.

    Posted

  • bumishness by bumishness admin

    Dumb question, but were you logged in under the same account on the site both times you received the same subject? As Jean states, you shouldn't be getting duplicates. If you are, something is definitely wrong somewhere!

    Posted

  • JeanTate by JeanTate in response to bumishness's comment.

    I've not knowingly had repeats (except for what may have been beta images mistakenly added to the main server), but over in StarDate (M83), they are rather frequent. Just a while ago, in their Talk, SCIENTIST whitmore wrote:

    We are able to look at data dumps and see what is happening. Only question is whether we can easily fix it or whether this would change the overall performance of the tool balanced against how many people are getting lots of repeats. The main reason you are seeing so many is that you are doing so many classifications, which is great !

    As far as I know, every Zooniverse project has had the same User-Subject-Classifications architecture for quite a while now, so repeats turning up in one project almost certainly means they can happen in all Zooniverse projects. Might have some serious implications for analysis of the classifications in the clicks databases ...

    Posted

  • bumishness by bumishness admin in response to JeanTate's comment.

    As far as I know, every Zooniverse project has had the same
    User-Subject-Classifications architecture for quite a while now, so
    repeats turning up in one project almost certainly means they can
    happen in all Zooniverse projects. Might have some serious
    implications for analysis of the classifications in the clicks
    databases ...

    Correct, which is why we take this kind of problem very seriously. Our entire model depends upon each user only seeing each subject once. There was a very brief moment two weeks ago where this was possible due to how we structured our systems in response to the huge spike in traffic we got. But under normal operating procedures, this definitely should not happen.

    We are investigating further and I'll update this with anything we find.

    Posted

  • bumishness by bumishness admin

    After some investigating, we are pretty sure this is an isolated incident to the m83 project. m83 definitely is having problems with repeats, but we've been unable to find any other systemic problem with any other project.

    There is always the potential for an isolated incident, but the rate should be very low, in the hundredths-of-a-percent range.

    WizardHowl, I honestly can't explain why you received a dupe. If it's a consolation to the community, we've so far received nearly zero repeats from over 400k classifications.

    Posted

  • JeanTate by JeanTate in response to bumishness's comment.

    That is very welcome news; thanks bumishness! 😃

    we've so far received nearly zero repeats from over 400k classifications.

    This refers to RGZ, right? If so, does that include the 'special' repeats already reported?

    Also, can you confirm what jules wrote, over in SD_M83 Talk ("As far as I remember the Andromeda Project had a brief glitch concerning how the site requested images. I think images were sent to classifiers in batches of 5 - another batch being sent when the 5th one was classified. However, for a brief spell the system sent a new batch after each classification including those in the previous batch not yet classified resulting in duplicates.")?

    And did something similar happen SpaceWarps (as TED91 said)?

    Posted

  • JeanTate by JeanTate

    I just got one, ARG0001oo1

    It was back-to-back; I'd just written a comment, clicked Next, and up came the same field/image!

    Posted

  • WizardHowl by WizardHowl

    I've had two repeat images (that I've noticed) in the last two days: yesterday it was http://radiotalk.galaxyzoo.org/#/subjects/ARG00019n4 and today I just got http://radiotalk.galaxyzoo.org/#/subjects/ARG0000r4u

    It is worrying to have multiple repeats in such a short space of time - has anything changed this week?

    If nothing has changed, I have a couple of ideas how this might happen. I'm going to use a little pseudocode to explain:

    I expect somewhere in Galaxy Zoo there's a piece of code that does something like this:

    repeat
    {
        objID2analyse = GenerateRandomObjID()
    }
    while (objID2analyse has a match in [arrayofobjIDsClassified])

    A seemingly straightforward loop like this can turn up a duplicate ID in a couple of ways:

    Firstly, if the [arrayofobjIDsClassified] is too small, then new additions will either overflow it, causing memory corruption or a crash, or overwrite an existing entry. For the number of objects I've classified, the number storing the size of the array will need at least 14 bits, so if this number is an unsigned 16-bit integer this will be ok, for now (but maybe some people could exceed this eventually).

    Secondly, the loop becomes infinite in the case where a user has classified all objects; therefore a programmer would typically keep a count of the number of times the program passes through the loop, perhaps like this:

    loopcounter++

    if(loopcounter greater than safetymargin) then exitloop

    I'm not sure how many objects are in the RGZ database but if a user has classified 10% of them and the value of safetymargin is just 3, then there's a 0.1% chance for each image being a duplicate. I would expect this number to be larger and ideally to generate a warning of some kind upon being triggered so hopefully if something like this is happening it would be traceable.
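The retry-capped loop and the 0.1% figure above can be sketched as follows. This is an illustration only, not Galaxy Zoo's actual code: `pick_subject` and `repeat_probability` are hypothetical names, and it assumes `safetymargin` is the maximum number of draws (at least 1) and that the last draw is served once the cap is hit.

```python
import random


def pick_subject(all_ids, classified, max_tries, rng):
    """Draw random IDs, rejecting already-classified ones, for at most max_tries draws."""
    for _ in range(max_tries):
        candidate = rng.choice(all_ids)
        if candidate not in classified:
            return candidate
    # Cap reached: every draw was a repeat, and the last one is served anyway.
    return candidate


def repeat_probability(fraction_classified, max_tries):
    """Chance that all max_tries independent draws land on classified subjects."""
    return fraction_classified ** max_tries


if __name__ == "__main__":
    # 10% classified and a cap of 3 draws gives 0.1 ** 3, i.e. about 0.1%.
    print(f"analytic: {repeat_probability(0.1, 3):.4%}")

    # Quick simulation with made-up numbers: 1000 subjects, 100 already classified.
    rng = random.Random(0)
    ids = list(range(1000))
    done = set(range(100))
    trials = 100_000
    repeats = sum(pick_subject(ids, done, 3, rng) in done for _ in range(trials))
    print(f"simulated: {repeats / trials:.4%}")
```

If the loop instead exits only when the counter exceeds `safetymargin`, one extra draw is made and the chance drops by a further factor of ten in this example.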

    I hope this is helpful but I don't really know how many duplicates I've encountered - I only noticed these two because I had either commented on them or added them to my favourites when classifying. I have passed the 1000 images in my favourites and anything more being added doesn't change that number although I don't expect that to have an impact on the process of selecting a random image to classify. I realise debugging with large datasets can be both difficult and time-consuming but both the cases I outline above arise as a result of a user having classified a larger number of images than anticipated; whether this is really the case here I don't know but at least it's a starting point.

    Posted

  • JeanTate by JeanTate

    I just got another one, ARG0001wxp. This time it was just over two months between the two instances. And there's no possibility whatsoever that it's two different radio sources close to each other in the same field.

    As I now comment on every RGZ object I classify (well, I aim to do that, but sometimes - for small, faint compact sources, for example - I don't), I can tell when I've just finished a repeat. So the rate of repeats - for me, recently - is extremely high, perhaps even ~1%.

    Houston, we have a problem. 😮

    Posted

  • WizardHowl by WizardHowl in response to JeanTate's comment.

    Another repeat, again I noticed it only because it was listed in my favourites without my having clicked on it this time: http://radiotalk.galaxyzoo.org/#/subjects/ARG00029to

    Posted

  • WizardHowl by WizardHowl in response to WizardHowl's comment.

    the same image repeated again: http://radiotalk.galaxyzoo.org/#/subjects/ARG00029to

    Posted

  • JeanTate by JeanTate

    I've now posted some comments, in a Quench project thread, that readers of this RGZ thread may find interesting. It's on page 7 of the "Quench project: a proposal aimed at reviving and completing it." thread, here (link should take you directly to the relevant post).

    Posted

  • JeanTate by JeanTate

    I have Grant Miller's* permission to quote from an email he sent me ("this" refers to the on-going existence of 'repeats', particularly in RGZ):

    I just got this response from our project manager:

    "We made a slight change to the zoo library over the weekend which should completely eliminate any chance of this happening. Still need to push it to sites though."

    He seems confident there will be no more repeats once this code is implemented, but please do keep an eye out and let me know if you see anything suspicious.

    *Grant is the Zooniverse's Community Manager

    Posted

  • WizardHowl by WizardHowl

    another repeat, http://radiotalk.galaxyzoo.org/#/subjects/ARG0000p1g came up twice with just one image in between the two occurrences

    Posted

  • WizardHowl by WizardHowl

    http://radiotalk.galaxyzoo.org/#/subjects/ARG0003nns just appeared consecutively

    Posted

  • JeanTate by JeanTate in response to WizardHowl's comment.

    Have you had a look at the Radio-Galaxy-Zoo part of GitHub (direct link)?

    You can sign up, browse the issues (8 currently open, none of which have to do with repeats), even check out the contributors (and their contributions).

    Posted

  • JeanTate by JeanTate

    Zooite 1001G may have had a repeat, within the last few minutes: ARG000309f

    Posted

  • ivywong by ivywong scientist, admin

    Hi all,

    Just a quick note to thank you all for your patience. We have been and still are investigating the root of this problem. We have implemented a few updates which should reduce the number of duplicates, but these updates do not solve all the duplicates. We now think that the problem lies with the caching engine not syncing reliably all the time, but we are still trying to understand this properly in order to fix it. So just as a heads-up, we are still doing our best to solve this. Thanks again for your patience and all your help.

    sincerely,
    Ivy

    Posted

  • JeanTate by JeanTate in response to ivywong's comment.

    Thanks Ivy, this sort of feedback/regular updating is much appreciated. 😃

    Posted

  • WizardHowl by WizardHowl

    Agreed, thanks for the explanation.

    Ironically I now get a repeat, ARG0001xml appeared successively 😦

    Posted

  • ivywong by ivywong scientist, admin

    @WizardHowl: when you experience the repeats, are you seeing all of them on a single day, or are they spread across several days? I am just thinking out loud, but the next time you start to see a repeat, do you mind logging out of RGZ and waiting a few hours before logging back on? If this helps, it could be a temporary solution until the new developer patches come in. I am very sorry again. This does seem to be a very difficult bug.

    Posted

  • WizardHowl by WizardHowl in response to ivywong's comment.

    The repeats I have experienced have been rare and not clustered; however, I have mostly already been doing as you suggest and logging out for at least an hour when I encounter one.

    I suspect there may be two separate bugs, as there are two categories of repeats: those where a very recently-encountered image repeats (immediately previous or just a couple of images separating them) and those where there is a repeat of an image classified a long time ago, weeks or months apart. I have not encountered the second category for some time (~2 months, as evidenced by the comments in this thread), so I am hoping this has already been fixed, although this is also the hardest to notice as I need to have previously tagged it as a favourite or commented after classifying.

    Posted

  • ivywong by ivywong scientist, admin

    Thanks heaps @WizardHowl for the update. Relieved to know that repeats that you've encountered do not dominate your work here on RGZ. I suspect the first type of repeats you describe are what we are trying to investigate here. If you do notice the second type again, it'd be good to let us know because I hope that this has been fixed by the recent patches applied a month ago too.

    cheers,
    Ivy

    Posted

  • JeanTate by JeanTate

    Although the number of objects I have classified is quite small, and the rate at which I classify is modest (shall we say), I too have good news: I have not had a 'repeat' for quite a while (I reported the last one on March 10 2014). Fingers crossed.

    If I do get one again, I'll certainly be reporting it!

    Posted

  • ivywong by ivywong scientist, admin

    That is good news Jean! Thanks for that 😃

    Posted

  • JeanTate by JeanTate

    No, I haven't come across any repeats today ... 😃

    I just wanted to note that it seems at least one other zooite - Tobend - also got repeats (hopefully just history now) ... check out Collection Deja vu ("Possible repeated images"), started some time around February 10 2014. For every zooite who noticed, how many actually commented, in one way or another? For every zooite who noticed, how many didn't?

    Do we know, from a detailed analysis, just how common repeats were, back in the 'bad old days'?

    Posted

  • ivywong by ivywong scientist, admin

    Pre-April fixes, we saw repeat spikes for certain individuals on "busy days" (days when we had some media coverage or publicity etc., when the online traffic also spiked). So it could be very bad (>10) for a few individuals, while for most others it was business as usual. Post-April fixes, we no longer see this (phew) and the number of repeats is now very low. That is why we are still digging into the problem.

    Posted

  • JeanTate by JeanTate in response to ivywong's comment.

    Repeat images (again)? is a new thread, started by KWillett, as a continuation of this one (I guess).

    Two repeats have been reported, so far:

    • xDocR got 'the tutorial image' ARG0003r15 twice
    • antikodon got ARG00026zh to classify a second time

    Posted

  • enno.middelberg by enno.middelberg scientist, translator

    Another thought: not getting the same image repeatedly also implies that the object is not listed twice in the input catalog. Are we sure that FIRST is 100.000% unique? However, I admit that when one gets the same image twice in direct succession a software bug is more likely, but I still wanted to raise this point.

    Posted

  • ivywong by ivywong scientist, admin

    Hi Enno,

    Yeah, we have checked, and apart from the legitimate cases of repeats from the FIRST catalogue, we could indeed get repeats off the queue server. Things have improved but we are not 100% repeat-proof yet.

    Ivy

    Posted

  • JeanTate by JeanTate

    antikodon reports getting ARG00026zh again, second time in two days! 😮 This is also the ARG repeat from June 3 ...

    Posted

  • WizardHowl by WizardHowl

    What happens if someone has classified all the active images? This happened in Disk Detectives and they have recently increased the number of active images there to deal with it.

    Posted

  • 42jkb by 42jkb scientist, admin

    We have over 170,000 images in RGZ so I'm not too sure how long it will take for one to classify them all. Maybe by then we will have some more data from other radio surveys.

    JeanTate - we are working on the repeat issue and hope to have this solved soon. Thanks for keeping us posted!

    Posted

  • 42jkb by 42jkb scientist, admin in response to WizardHowl's comment.

    Ah I was wrong here. There are only 20,000 images active at one time. The team is working on increasing this number.

    Posted

  • JeanTate by JeanTate

    They're back! 😦

    WizardHowl says "REPEAT 2nd in just the last few days, has anything changed?", ARG0001tye

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    Now it's my turn: ARG0001wbi 😦

    August 27 2014 11:39 AM and September 14 2014 1:44 PM

    Posted

  • ivywong by ivywong scientist, admin

    Thanks. Noted. Will keep an eye out for these.

    Posted

  • JeanTate by JeanTate in response to ivywong's comment.

    I think I may be able to make an estimate of how common repeats are, at least for me, at least since June (when 42jkb wrote "we are working on the repeat issue and hope to have this solved soon.") Would that be of interest, or help?

    Posted

  • 42jkb by 42jkb scientist, admin in response to JeanTate's comment.

    Yes it would be helpful to have some stats on this as we thought we had this fixed. Thanks!

    Posted

  • JeanTate by JeanTate in response to 42jkb's comment.

    First go: June to September is ~90 days. Assume I classified RGZ objects at a rate of ~seven per day* in that period ... ~630. One repeat, so repeats happen at a rate of ~1 per 1,000 RGZ classifications.

    Here's something odd: within the last few days - possibly Saturday or Sunday - I got the Tutorial instead of a new image (I think someone else said this had happened to them too). I can't say for certain that I had not inadvertently clicked a button other than Next (but I don't really know how to get to the Tutorial anyway).

    *my aim was to do 10 per day, and to comment on all of them (so I would know, for certain, when I had a repeat). Didn't happen; my average was more like 10 per day, two days out of three (I could do a bit of work to constrain this better, but I doubt it'll make much difference to my estimate). Also, I did not always comment (various reasons), but I estimate I did for 95+% of completed classifications.
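The back-of-envelope estimate above can be reproduced with a few lines. This is only a sketch of the arithmetic in the post: the function name `repeats_per_thousand` and the `commented_fraction` parameter are illustrative, and with a single observed repeat the result is an order-of-magnitude figure at best.

```python
def repeats_per_thousand(repeats, days, per_day, commented_fraction=1.0):
    """Observed repeats per 1,000 classifications in which a repeat could be detected."""
    detectable = days * per_day * commented_fraction
    return 1000 * repeats / detectable


if __name__ == "__main__":
    # ~90 days at ~7 per day is ~630 classifications, with one repeat seen.
    print(round(repeats_per_thousand(1, 90, 7), 2))
    # Footnote figures: 10 per day on two days out of three, ~95% commented on.
    print(round(repeats_per_thousand(1, 90, 10 * 2 / 3, 0.95), 2))
```

Both sets of figures give between one and two repeats per 1,000 classifications, consistent with the rough "~1 per 1,000" quoted above.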

    Posted

  • ivywong by ivywong scientist, admin

    I think we fixed the majority of the cases, but there is a quirk with one of the pieces of software that was not made by our developers, and the bug fix for that is not obvious. Perhaps @bumishness can comment on that?

    Posted

  • JeanTate by JeanTate

    Repeat reported by WizardHowl: ARG0000510

    Posted