Are STAP Stem Cell Nature Papers Compromised?

Are the Nature STAP stem cell papers compromised? Could Nature’s investigation conclude with something as serious as retraction, or with a mild slap on the wrist? Or somewhere in between?

Why is this question even being raised?

A large number of potentially serious issues with these papers have been identified and discussed on PubPeer here and here, as well as elsewhere. These have included confirmed data duplication in the Nature letter.

A potential resolution to these purported problems could be achieved if the authors and Nature publicly release unmodified, original versions of the images and data in these two papers.

I’m not holding my breath on that one, but it seems like a path to clarity.

Nature’s own policies on images are relevant here. At the very least, legitimate questions can be raised as to whether these papers violated aspects of those policies in quite a few ways.

Post-publication reviewers have found numerous problems with just Figure 1 of the Obokata et al. Nature article. Let’s use this as an example and go through that one figure step by step. I’ve posted the figure legend at the bottom of this post for reference.

  • Figure 1a is their experimental model.
  • Figure 1b reports that an Oct4-GFP reporter comes on specifically after low-pH treatment in spheres of cells in suspension. Concerns have been raised that the green signal may have been enhanced and/or could be autofluorescence, and the 3 panels may not have been captured and/or manipulated in an equivalent manner to each other.
  • Figure 1c is FACS analysis of the Oct4-GFP reporter turning on at day 7. The upper 2 panels are low-pH treated cells. By day 7 this data would suggest that basically 100% of cells became GFP+, a stunning reported finding.
  • Figure 1d is a quantification of viable cells by relative GFP status reporting a conversion to a GFP+ state in about 50% of cells by d7. This would seem to conflict with Figure 1c.
  • Figure 1e shows cells growing on a plate, some of which are GFP+. The figure panels are of such low magnification that there’s not much to be sure about here. As with the rest of the data in this figure, there are no positive controls.
  [Image: brightness-adjusted figure panels showing rectangles; original caption “Fig. 1g rectangles”]
  • Figure 1f, after simply increasing the brightness, looks very unusual (see above). First, it is of very poor quality and has extreme pixelation. Is that simply due to live cell imaging capture? Panels d2 and d3 look entirely different in nature from the d0 and d1 panels. There are strange flat gray rectangles, especially in d2, and the rest of the d2 and d3 panels are divided up into relatively large squares and rectangles. Are some of these simply clusters of low-signal pixels produced by JPEG compression artifacts? That is possible, but if so, why are these artifacts not evident in panels d0 and d1, which have only an even sea of small, normal-appearing, color-variegated (rather than flat gray) pixels? The green signal in the d2 and d3 panels seems washed out. What is the pink signal that shows up? Update: To be clear, I do not personally believe that Figure 1f involves “cutting and pasting” as some have suggested, but the low quality and strange, relatively inconsistent appearance of the panels make one wonder whether the 4 panels were not processed in an equivalent manner.
  • Figure 1g shows EM of cells of 3 different types. Since only 1 cell of each type is shown, not much can be concluded here.
  • Figure 1h shows that the GFP+ cells are small. Why does the left edge of the green histogram peak abruptly end at 4 microns? Was a gate applied to remove anything smaller? One kind of gets the feeling that there could have been many GFP+ cells/cellular fragments/dead cells <4 microns. Why are there no X axis values smaller than 4 or larger than 10?
  • Figure 1i has some odd things such as straight lines surrounding lane 3 that are visible when the brightness is boosted. This has been a big issue on PubPeer. In addition, many people have been asking why the STAP cells made from T cells in lanes 4 and 5 would contain a prominent unrecombined (upper) band? Were the “T cells” used not very pure?

Again this is just one figure in one of the two papers. Numerous other concerns have been raised about other figures in both papers. 

What do I think at this point?

On one level I really dislike all this microscopic analysis of data problems in the STAP papers. On the other hand, this is such a huge finding, the reputation of the stem cell field is on the line at some level, and trust in these papers has been so shaken that this kind of post-publication analysis seems on the whole needed and appropriate, even if very uncomfortable.

The more I look at these two STAP papers, the more concerned I get. Also, many of the figures would have benefited from having murine iPS cell induction images and data shown in parallel as controls. That would have provided crucial context.

The bottom line for me now is that a part of me still clings to a tiny and receding hope that this has all been overblown due to simple misunderstandings, but that seems increasingly unlikely. The numerous definite problems with the 2011 Obokata/Vacanti paper have also reduced confidence.

What will Nature’s investigation conclude? No one can be sure, but I predict that Nature is taking this extremely seriously and, if appropriate, will not minimize important issues when it makes its findings public. Nature’s own integrity as a journal is at stake here too, and they know that.

For reference, Figure 1 legend verbatim:

a, Schematic of low-pH treatment. b, Oct4-GFP+ cell clusters appeared in culture of low-pH-treated CD45+ cells (middle; high magnification, right) on day 7 (d7) but not in culture of control CD45+ cells (left). Top: bright-field view; bottom, GFP signals. Scale bar, 100 μm. c, FACS analysis. The x axis shows CD45 epifluorescence level; y axis shows Oct4-GFP level. Non-treated, cultured in the same medium but not treated with low pH. d, GFP+ (green) and GFP− (yellow) cell populations (average cell numbers per visual field; ×10 objective lens). n = 25; error bars show average ± s.d. e, Snapshots of live imaging of culture of low-pH-treated CD45+ cells (Oct4-gfp). Arrows indicate cells that started expressing Oct4-GFP. Scale bar, 50 μm. f, Cell size reduction in low-pH-treated CD45+ cells on day 1 before turning on Oct4-GFP without cell division on day 2. In this live imaging, cells were plated at a half density for easier viewing of individual cells. Scale bar, 10 μm. g, Electron microscope analysis. Scale bar, 1 μm. h, Forward scattering analysis of Oct4-GFP−CD45+ cells (red) and Oct4-GFP+CD45− cells (green) on day 7. Blue line, ES cells. i, Genomic PCR analysis of (D)J recombination at the Tcrb gene. GL is the size of the non-rearranged germline type, whereas the smaller ladders correspond to the alternative rearrangements of J exons. Negative controls, lanes 1, 2; positive controls, lane 3; FACS-sorted Oct4-GFP+ cells (two independent preparations on day 7), lanes 4, 5.

19 thoughts on “Are STAP Stem Cell Nature Papers Compromised?”


  1. I was so disappointed. Especially by what appears to be mosaic composition in Fig. 1f. Changing the order of electrophoresis lanes and mistakenly picking up a wrong picture out of hundreds of similar ones could have been acceptable as human error. However, these image manipulations are not, and they make the whole paper look just like a piece of fabrication. For instance, look at the “Bidirectional” letter to Nature, Extended Data Figure 5g, the FACS plot of Oct4-GFP vs Integrin a7 or control IgG. I do not know if someone has already mentioned this, but the intensities of Oct4-GFP in the top and bottom panels are very different, whereas they should be the same because they represent the same cells. The oblique pattern of Oct4-GFP positive cells in the top panel might reflect either auto-fluorescence (large, irregular or dying cells) or carry-over of strong PE signals into the FITC channel due to poor compensation of the machine. However, the difference in the Oct4-GFP signals seems too large to be explained in such a way. Besides, if the machine were not properly compensated, one could make the figure in the top panel using the same sample as in the bottom panel simply by increasing the gain of the PE channel such that the strong GFP signals would cause false positives in the PE channel, which is why negative controls (single stains in this case) would be essential.


    • Depressingly, *every* Oct4-GFP fluorescent image I’ve looked at in the two Nature papers has possible processing anomalies, i.e. this is not a phenomenon limited to one figure. Basically, after bringing up the levels and contrast in Photoshop, it appears as if clumps of GFP+ cells have been superimposed on a noiseless black background. The suspicion (certainly not proven) is that this manipulation was meant to make it look as if the cells are selectively GFP+, rather than showing a more general autofluorescence as the cells slowly die (say).


      • Giving the authors the benefit of the doubt, it still sure seems like many of the images in this paper are of surprisingly low resolution. Again, perhaps the primary data in unmodified form could resolve many of these concerns?


  2. If the Figure 1f panels are captured images from a video, which video is it? There are two videos of low-pH-treated CD45+ cells in the supplementary information. Are there any anomalies in these videos?


    • I don’t know. I do not have a lot of live cell imaging experience, so I wondered whether someone who does might have observed that capturing such “video” tends to produce such artifacts…or not?


  3. Fig 1i lanes 4 and 5 have unrecombined bands because the rearrangement does not always occur in both TCR alleles due to allelic exclusion. The oddest thing in Fig 1i is that an unrecombined band is not visible in lane 3. CD45+ lymphocytes include B cells, NK cells, and T cells with unrearranged TCR. Lane 3 should have an unrecombined band.


    • That’s exactly the issue, yes – the starting population is mixed (see extended figure 2g, which is very similar to 1i) and the resulting STAP cells equally so. So why go through the effort of running a sample in lane 3 (and splicing said lane into the gel image) that clearly cannot be the starting cell population from which lanes 4 and 5 were derived?


  4. I think this paper is going to turn out to be complete bunk, but Fig. 1f looks like it’s just a standard JPEG compression artifact, which can be avoided if one knows how to go about compressing figures before and after layout. But most scientists don’t know jack about image processing or image preparation for publication. And the automated PDF builders that the journals use often defeat even the most diligent scientists…


    • You may be right, Spiny, that the Fig. 1f strangeness is just compression artifacts, but if so, can you help us out by explaining why, in such a scenario, the d0 and d1 panels would have no artifacts? Is it just a matter of the relative complexity of the background image being low in certain regions of d2 and d3, so that pixels get lumped into larger gray rectangles? If these images were all taken, as I think is indicated in the paper, on the same cell culture dishes with the same cells with the same scope, is it unreasonable to think they should have similar backgrounds?
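
One way to think about the question above: JPEG compresses each 8×8 block of pixels separately, and coarse quantization can wipe out all the fine detail in a low-contrast block while leaving a high-contrast block textured. The following is a toy sketch of that effect, not a real JPEG codec. It uses a single uniform quantization step (`step`, an assumption; real encoders use per-frequency quantization tables), and the two test blocks are invented for illustration rather than taken from Figure 1f.

```python
import math

N = 8  # JPEG transforms images in 8x8 pixel blocks


def _c(k):
    """Normalization factor for an orthonormal DCT-II."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _cos(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """2D DCT-II of an 8x8 block; result[u][v] is one frequency coefficient."""
    return [[_c(u) * _c(v) * sum(block[x][y] * _cos(x, u) * _cos(y, v)
                                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coef):
    """Inverse of dct2; result[x][y] is one reconstructed pixel."""
    return [[sum(_c(u) * _c(v) * coef[u][v] * _cos(x, u) * _cos(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def compress_block(block, step):
    """Toy lossy round trip: transform, quantize uniformly, transform back."""
    quantized = [[round(c / step) * step for c in row] for row in dct2(block)]
    return [[round(p) for p in row] for row in idct2(quantized)]


def is_flat(block):
    """True when every pixel in the block has the same value."""
    return len({p for row in block for p in row}) == 1


# Hypothetical dark background with faint noise (values 19 to 21).
low_contrast = [[20 + (x + 2 * y) % 3 - 1 for y in range(N)] for x in range(N)]
# Hypothetical strongly varying block (a steep brightness ramp).
high_contrast = [[20 + 25 * x for y in range(N)] for x in range(N)]

print("low-contrast block flat after compression:",
      is_flat(compress_block(low_contrast, step=40)))
print("high-contrast block flat after compression:",
      is_flat(compress_block(high_contrast, step=40)))
```

If the d2 and d3 backgrounds were smoother or darker than those in d0 and d1, this mechanism could in principle produce flat rectangles in one pair of panels and not the other, though the sketch cannot tell us whether that is what happened here.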


  5. In the paper “Bidirectional developmental potential…”, besides the duplicated image in figures 1 and 2, there are things that are really hard to believe about the data shown in figure 1. First, they should have done the analysis in chimaeric embryos prior to the chorio-allantoic fusion (embryos before 6 somites, around E8) to conclude that only STAP cells and not ES cells contribute to the placenta. To develop my point: after the allantois (derived from the embryonic mesoderm) fuses with the chorion (extraembryonic), the formation of the labyrinthine placenta occurs. In this, the trophoblast (extraembryonic) associates with the fetal blood vessels (embryonic), which have extensive villous branching. So in an E12.5 placenta from a chimaera formed from a wildtype blastocyst injected with wildtype reporter Rosa26-GFP or CAG-GFP ES cells, you should always find GFP-positive cells in the placenta due to the contribution of the fetal blood vessels. It is therefore not possible that in the control chimaeras in figure 1a they didn’t find Rosa26-GFP (as stated in the text) / CAG-GFP (as stated in the figure) cells. The same applies to the fetal membranes: the yolk sac mesoderm is derived from the embryo, so in a chimaera you always have ES-derived cells (mesoderm) in the yolk sac. There is no way that control is real. A milder point: the long-exposure panel of the control embryo was not overexposed, since the little piece of embryo at the far right of the panel is not overexposed.


    • p.s. A nice example of chimaeric embryos analysing the contribution of ES cell derived tissue to the placenta in E12.5 embryos can be found in PNAS USA 100(26):15637-15642 (2003); see figure 4.


  6. I just read something else in a Japanese blog (http://stapcells.blogspot.jp/2014/02/blog-post_7834.html). It feels a little bit like quibbling, but it has been noticed that the methods text on karyotyping is a copy-paste from a 2005 paper (Guo J, 2005), with very minor modifications and some mistakes and typos added.
    Compare the following:
    STAP cell paper: “Karyotype analysis was performed by Multicolor FISH analysis (M-FISH). Subconfluent STAP stem cells were arrested in metaphase by colcemid (final concentration 0.270 µg ml−1) to the culture medium for 2.5 h at 37 °C in 5% CO2. Cells were washed with PBS, treated with trypsin and EDTA (EDTA), re-suspended into cell medium and centrifuged for 5 min at 1,200 r.p.m. To the cell pellet in 3 ml of PBS, 7 ml of a pre-warmed hypotonic 0.0375 M KC1 solution was added. Cells were incubated for 20 min at 37 °C. Cells were centrifuged for 5 min at 1,200 r.p.m. and the pellet was re-suspended in 3–5 ml of 0.0375 M KC1 solution. The cells were fixed with methanol/acetic acid (3:1; vol/vol) by gently pipetting. Fixation was performed four times before spreading the cells on glass slides. ”

    Guo J’s paper (2005): “Metaphase spreads of the ES cells were performed as follows. Subconfluent ES cells were arrested in metaphase by adding colcemid (final concentration 0.270 μg/ml) to the culture medium for 2.5 h at 37° C in 5% CO2. Cells were washed with PBS, treated with trypsin-ethylenediaminetetraacetic acid (EDTA), resuspended into cell medium and centrifuged for 5 min at 1200 rpm. To the cell pellet in 3 ml of PBS, 7 ml of a prewarmed hypotonic 0.0375 M KCl solution was added. Cells were incubated for 20 min at 37° C. Cells were centrifuged for 5 min at 1200 rpm and the pellet was resuspended in 3–5 ml of 0.0375 M KCl solution. The cells were fixed with methanol/acetic acid (3:1, vol:vol) by gently pipetting. Fixation was performed four times prior to spreading the cells on glass slides.”
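
For anyone who wants to put a rough number on how close two passages like these are, here is a minimal sketch using Python’s standard `difflib` module. It compares word tokens after lowercasing and stripping punctuation. The helper names `normalize` and `similarity` are my own, the two strings are just one sentence excerpted from each passage quoted above, and a high ratio only says the wording overlaps; it says nothing about why.

```python
import difflib
import re


def normalize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(a, b):
    """Ratio in [0, 1] of matching word tokens between two passages."""
    matcher = difflib.SequenceMatcher(None, normalize(a), normalize(b),
                                      autojunk=False)
    return matcher.ratio()


# One sentence from each passage quoted in the comment above.
stap_excerpt = ("Cells were washed with PBS, treated with trypsin and EDTA "
                "(EDTA), re-suspended into cell medium and centrifuged for "
                "5 min at 1,200 r.p.m.")
guo_excerpt = ("Cells were washed with PBS, treated with "
               "trypsin-ethylenediaminetetraacetic acid (EDTA), resuspended "
               "into cell medium and centrifuged for 5 min at 1200 rpm.")

print(f"word-level similarity: {similarity(stap_excerpt, guo_excerpt):.2f}")
```

Working on word tokens rather than raw characters keeps small formatting differences (such as “1,200 r.p.m.” versus “1200 rpm”) from dominating the score, although they still count as mismatches.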


  7. I have not been able to reproduce the appearance of the adjusted figure 1f presented here. Clearly, whatever was done was more than increasing brightness. My own increase in brightness produced something without any irregularity: http://i.imgur.com/RSaLEsS.jpg

    I also do not know how much this indicates without expert analysis. Clearly, the figures were not in high resolution. The fault for this lies mostly with Nature, as many researchers lack the skill and patience for formatting; at least I do. My problem is that this type of post-publication image manipulation has found a problem with almost every gel and microscope image in the entire paper. It’s hard to believe anyone manipulated images that much.

    The fact that image 1f comes from a video monitoring the cells throughout the conversion process actually places the veracity of this data well above normal standards. Perhaps this shows how unreliable poorly defined concerns are. I have a feeling the image adjustment done on PubPeer in Fig. 1f and elsewhere is meaningless. Such manipulation on PubPeer shows irregularity within single images, but showing this irregularity does not mean that the images were captured in different ways.

    I believe the video is the source of the poor resolution on Fig. 1f. In other cases, perhaps the contrast settings were too high, combined with other issues, like compression.
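
Part of the disagreement in this thread may come down to what “increasing the brightness” means. A plain brightness boost adds a constant to every pixel, which leaves the differences between neighboring pixels unchanged, whereas a levels or contrast stretch multiplies them, so faint structure in a dark background becomes far more visible. The sketch below illustrates that difference on a made-up row of dark pixel values; the function names and the numbers are assumptions for illustration, not a reconstruction of what anyone did to Figure 1f.

```python
def clamp(value):
    """Keep a pixel value inside the 8-bit range 0 to 255."""
    return max(0, min(255, int(round(value))))


def add_brightness(pixels, offset):
    """Additive brightness: shift every pixel by the same amount."""
    return [clamp(p + offset) for p in pixels]


def stretch_levels(pixels, black, white):
    """Levels stretch: map the range [black, white] onto [0, 255]."""
    scale = 255 / (white - black)
    return [clamp((p - black) * scale) for p in pixels]


def spread(pixels):
    """Difference between the brightest and darkest pixel."""
    return max(pixels) - min(pixels)


# Hypothetical near-black background with faint variation.
dark_row = [2, 3, 2, 4, 3, 2]

brighter = add_brightness(dark_row, 100)
stretched = stretch_levels(dark_row, black=0, white=17)

print("original spread: ", spread(dark_row))
print("brightness spread:", spread(brighter))
print("levels spread:    ", spread(stretched))
```

This may be one reason two people adjusting the same image can end up with very different-looking results: the first operation lifts the background uniformly, while the second exaggerates whatever variation is already there, including compression artifacts.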
