Unreplicable state-dependent effects on start-box emergence latency in wild-origin sticklebacks

Animals are predicted to adjust their behaviour in relation to their bodily energetic state. Adjustment can be driven by either positive feedbacks (e


| INTRODUCTION
Increased boldness and activity can allow individuals to gain more resources but can also come with an increase in mortality risk, and trade-offs between resource acquisition and risk-taking are expected (Gotthard, 2000; Milinski, 1986; Werner & Anholt, 1993).
These trade-offs are likely to depend on the bodily state (i.e. energy reserves, body size, hormone levels, etc.) of a given individual animal (Sih et al., 2015). Three different feedback loops are typically discussed in relation to behavioural state-dependence: (i) starvation avoidance, (ii) asset protection and (iii) state-dependent safety (e.g. Luttbeg & Sih, 2010; Sih et al., 2015). Starvation avoidance refers to behavioural modifications to increase foraging activity when an individual's energetic reserves are either low or rapidly decreasing.
For instance, an individual with depleted energetic reserves needs to restore its reserves and should therefore increase its foraging activity and boldness (e.g. Killen et al., 2011). Hence, when there is a risk of predation or conflict with competitors, this negative feedback loop leads to increased risk-taking (i.e. search activity and boldness) as the risk of starvation mortality increases (Biro & Booth, 2009; Gotceitas & Godin, 1991; Werner & Anholt, 1993). Asset protection can be viewed as the opposite aspect of the same negative feedback loop: when body condition is high and future fitness prospects are good, risk-taking in general should be avoided, to save the previously obtained "assets" (i.e. a positive residual reproductive value). Here, an individual with higher energetic reserves is risk-averse because it can afford to be so, in contrast to a similarly sized individual with lower energy reserves. State-dependent safety is a positive behavioural feedback loop, where animals in good energetic condition or with large body size can afford to be, for example, more active, as their state protects them from many of the risks associated with foraging. A high energetic state could, for instance, increase escape performance (Stankowich, 2009; but see Kullberg et al., 1996).
In this study, the focus is on state in terms of realized growth performance in relation to the maximal predicted growth performance under continuous ad libitum feeding. Animals which have experienced an extended period of food limitation have repeatedly been observed to exhibit a compensatory growth response (i.e. a faster-than-normal growth rate) when food becomes available again (Ali et al., 2003; Arndt, 1997; Dmitriew, 2011). In fishes, this growth compensation can be induced by increased food conversion efficiency but is often mainly a result of hyperphagia (Ali et al., 2003). When associated with hyperphagia, growth compensation should reasonably be associated with increased foraging activity and, when foraging is associated with mortality risk, decreased risk-aversion. Hypothetically, catching up in size is beneficial for increasing immediate survivability and reproductive capacity in the short term, but it also comes with a delayed cost, explaining why normal growth rates are submaximal (Johnsson & Bohlin, 2006; Metcalfe & Monaghan, 2001).
To investigate whether refeeding after a period of food restriction alters general risk-aversiveness in sticklebacks (Gasterosteidae), a laboratory experiment was set up allowing for controlled food rations and continuous growth measurements. Three food treatments were applied: (i) continuously high rations (HH), (ii) high ration followed by low ration (HL), and (iii) low ration followed by high ration (LH). Fish in the LH-group were predicted to show compensatory growth rates during refeeding (i.e. higher growth rates than HH) and, if starvation avoidance/asset protection is the main feedback affecting behaviour, they should also express a more risk-taking behaviour in a standardized start-box emergence test (i.e. shorter emergence latency from shelter into a novel area than HH), as they are predicted to be more eager to start foraging in an unfamiliar and potentially risky situation.
Alternatively, if state-dependent safety is the main feedback acting in this situation, then HH should have a shorter emergence latency than LH. The HL-group was mainly included to get insight into recently food-restricted individuals. Under starvation avoidance, HL would also be expected to have shorter emergence latency, but these fish may also have a reduced routine metabolism and activity due to their reduced food intake, leading to long emergence latency.
The experiment was run twice, in 2 consecutive years. The first experiment (2012) was developed as a part of an undergraduate course, where students investigated whether manipulation of growth rates had effects on behaviour. The results supported the hypothesis that restricted and refed fish show more bold-like behaviours (see the results for the 2012 experiment in Section 3 below). Given these interesting effects, the experiment was run again in the same course the following year with the aim of an exact replication.

| Subject animals
Two species of sticklebacks were included in the experiment: three-spined stickleback Gasterosteus aculeatus L. and nine-spined stickleback Pungitius pungitius (L.). Both species are used as model organisms in ecology and evolution (Huntingford & Ruiz-Gomez, 2009; Merilä, 2013). All subject fish were collected from the same location in a coastal stream (Stora ån; 57°38.381′N, 11°55.209′E), using minnow-traps constructed from 1.5-L transparent plastic soda bottles (Figure S1; see electronic supplement). In both years, a few more G. aculeatus were caught than needed for the experiment, and the individuals to exclude were determined at random; all captured P. pungitius were used. Capture dates were February 6 in 2012, and February 7-8 in 2013. As determined by visual inspection, and based on previous experience (Landeira-Dabarca et al., 2019; Näslund et al., 2016a), the individuals from this population do not show any signs of being parasitized by either Glugea or Schistocephalus, two common stickleback parasites which could influence behaviour (Milinski, 1985); other parasites have not been screened for. The fish were acclimated to laboratory temperatures by passive warming of the stream water up to room temperature (12°C). Thereafter, they were group-housed in a large tank and fed to satiation with bloodworms each day until the experiment started. Throughout the experiment, average water temperature was kept at 12°C (range: 11-13°C) and the light regime was programmed to follow the natural cycle. Mean size (standard length and wet mass) for each treatment at the start of the experiment is presented in Table S2.

| Experimental design and procedures
Experimental treatments were initiated 8 days (2012) and 5-6 days (2013) after capture (denoted as 'day 1' of the experiment; Figure S2). Fish were allocated to the experimental tanks (see Table 1), using a stratified random assignment so that treatments and species were evenly distributed in the experimental room. The first treatment group ("HH") received high food ration throughout the experimental period; the second group ("HL") started on high ration but transitioned to low ration on day 19; the third group ("LH") started on low ration but transitioned to high ration on day 19 (Figure S2). Food delivered to the fish consisted of thawed bloodworms; high ration was 15% of body mass until day 19 (i.e. the day on which rations were switched for the HL and LH groups) and 25% of body mass after day 19; low ration was 2% of body mass throughout the study. High rations were based on Ali and Wootton (2000), who estimated normal average ad libitum consumption of three-spined sticklebacks to be 11.7% of body mass per day and hyperphagic phase consumption to be maximally 22.3% of body mass per day. Low rations were based on Ali et al. (1998), who estimated maintenance ration to be around 2% of body mass per day. Rations were calculated for each tank separately and were updated after each round of body size measurements (see Section 2.4 below). In 2012, one "HL" (G. aculeatus) tank suffered two mortalities prior to the time-point for the switch in ration; hence, one individual (fish ID: "7b/12bM") was moved to this tank, from a random tank that had also started on high food ration, at the time of the ration switch.
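As an illustration of the ration scheme described above, the daily food mass for a tank can be computed from the summed body mass of its fish. The following is a minimal Python sketch, not the procedure used in the study: the function name `daily_ration_g`, the use of total tank mass as input, and the choice to apply the new ration from day 19 onward are assumptions made for the example.

```python
# Hypothetical helper; names and the day-19 boundary handling are assumptions.
LOW_RATION = 0.02          # 2% of body mass per day, throughout the study
HIGH_RATION_EARLY = 0.15   # 15% of body mass per day before the switch
HIGH_RATION_LATE = 0.25    # 25% of body mass per day after the switch
SWITCH_DAY = 19


def daily_ration_g(tank_mass_g: float, treatment: str, day: int) -> float:
    """Return the daily bloodworm ration (g) for one tank."""
    if treatment not in ("HH", "HL", "LH"):
        raise ValueError(f"unknown treatment: {treatment!r}")
    if tank_mass_g <= 0:
        raise ValueError("tank mass must be positive")
    # First letter = ration level before the switch, second letter = after.
    late = day >= SWITCH_DAY  # assumption: the new ration applies from day 19
    level = treatment[1] if late else treatment[0]
    if level == "L":
        fraction = LOW_RATION
    else:
        fraction = HIGH_RATION_LATE if late else HIGH_RATION_EARLY
    return tank_mass_g * fraction
```

For a tank holding 10 g of fish, this gives 0.2 g per day on low ration and 1.5 g or 2.5 g per day on high ration before and after the switch, respectively.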
Behavioural trials were run on day 29 of the experiment (Figure S2), using a start-box emergence test (see dimensions of the arena in Figure 1). Fish were trialled individually, using a total of 10 parallel arenas. Trial order of the tanks matched the tank numbering, which in turn represented the tanks' position in the room.
This trial order was primarily chosen to avoid unnecessary disturbance in the room when netting fish (i.e. not having to walk past tanks containing untrialled fish). The stratified random assignment of treatment groups and species into the tanks led to a mixture of treatments and species being trialled simultaneously. After being gently netted from their home tank and transferred in a cup of water to the test arena, the fish were put in a closed start-box at the end of the arena, which was covered by a lid to create a dark environment.
After 5 min acclimation to the start-box, a guillotine door was lifted, allowing the fish access to the rest of the arena through a gate. Each trial was recorded for 30 min; all fish emerged from the start-box

TABLE 1 Summary of number of tanks and individuals used in the experiments
P. pungitius, LH: n = 6; n = 18; n = 16 (nF = 9; nM = 7)
Note: Final sample size for 2013 is supplemented with sample size for females (nF) and males (nM) (unknown in 2012). For P. pungitius in the HL treatment, one rearing tank contained only 2 individuals from the start of the experiment (*).
Abbreviations: HH, high food ration first and second treatment period, high ration third period; HL, high ration first and second period, low ration third period; LH, low ration first and second period, high ration third period (refeeding groups).

to the fish during the trials. The start-box emergence test is widely applied in animal behaviour experiments and is commonly referred to as measuring "boldness-like" or "risk-taking" traits (e.g. Hansen et al., 2020; Näslund et al., 2015). For instance, it is one of the more common tests applied to score boldness in experiments on consistent individual differences and behavioural syndromes in fish (e.g. Brown et al., 2005; Burns, 2008; MacGregor et al., 2021; Toms et al., 2010).

| Student participation
Five students were involved in the experiments each year (see Acknowledgment). While students conducted the experiments, the author (JN) was closely involved at all stages of both experiments and always present during behavioural scoring, ensuring that protocols were followed strictly.

| Data collection
Body mass was measured on days 1, 10 (2012) or 11 (2013), 18 and 28 (see Figure S2) by weighing the animals on a digital scale (precision: 0.01 g; Kern EW 3000-2 M; Kern & Sohn GmbH), after blotting away excess water using a moist dishcloth ("blotted wet mass"). Specific growth rate (G) was calculated for mass using the formula

$$G = 100 \times \frac{\ln m_1 - \ln m_0}{t_1 - t_0}$$

where $m_0$ and $m_1$ are, respectively, the wet mass at the first ($t_0$) and last ($t_1$) day of a given period of the experiment.
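The growth-rate calculation can be sketched in a few lines of Python. This is a minimal example under stated assumptions: it uses the conventional form of the specific growth rate, G = 100 × (ln m1 − ln m0) / (t1 − t0), expressed in % per day; the scaling by 100 and the function name `specific_growth_rate` are assumptions for illustration.

```python
import math


def specific_growth_rate(m0: float, m1: float, t0: float, t1: float) -> float:
    """Specific growth rate in % per day (assumed conventional definition).

    m0, m1: wet mass (g) on the first and last day of the period.
    t0, t1: the corresponding experimental days.
    """
    if m0 <= 0 or m1 <= 0:
        raise ValueError("masses must be positive")
    if t1 <= t0:
        raise ValueError("t1 must be later than t0")
    return 100.0 * (math.log(m1) - math.log(m0)) / (t1 - t0)
```

A fish that keeps its mass over a period has G = 0, and negative values indicate mass loss, as could occur on a maintenance-level ration.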
In the 2013 experiment, all experimental fish were euthanized 4 days after the behavioural test (maintaining the experimental food rations). The fish were dissected, and the blotted wet masses of the intact body, eviscerated body ("carcass"), and liver were measured for each individual (precision: 0.1 mg; AB54-S; Mettler Toledo). The eviscerated body and liver were dried at 70°C for 48 h, whereafter their dry mass was weighed. Water content of the eviscerated body (%) was derived from the ratio of dry mass to blotted wet mass. Fish from the 2012 experiment were prepared for molecular bioassays but were lost due to freezer failure.
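The relation between the dry-to-wet mass ratio and water content can be made explicit with a short Python sketch. It assumes the usual definition, water content (%) = 100 × (1 − dry mass / wet mass); the function names are hypothetical. Note that a higher dry-to-wet ratio corresponds to a lower water content.

```python
def dry_to_wet_ratio(dry_mass_mg: float, wet_mass_mg: float) -> float:
    """Ratio of dry mass to blotted wet mass of the eviscerated body."""
    if wet_mass_mg <= 0:
        raise ValueError("wet mass must be positive")
    if not 0 <= dry_mass_mg <= wet_mass_mg:
        raise ValueError("dry mass must lie between 0 and the wet mass")
    return dry_mass_mg / wet_mass_mg


def water_content_pct(dry_mass_mg: float, wet_mass_mg: float) -> float:
    """Water content (%), assumed to be the complement of the dry fraction."""
    return 100.0 * (1.0 - dry_to_wet_ratio(dry_mass_mg, wet_mass_mg))
```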
For G. aculeatus, the lateral bony armour plates were found to be reduced in some individuals. Some individuals only had a few anterior plates (low-plated morph) while some had only partial reduction (a gap in plates between the anterior-most and the caudal-most plates; partial-plated morph). Similar plate polymorphism is found in many G. aculeatus populations (e.g. Ziuganov, 1983). Plate morphs were scored from photographs of the experimental individuals, after the experiments were conducted (2012: n full = 34, n partial = 6, n low = 5; 2013: n full = 44, n partial = 3, n low = 1).

| Analyses
All analyses were run using Bayesian linear mixed models in brms (Bürkner, 2017), a package for R (R Core Team, 2020) providing an interface for Stan (Carpenter et al., 2017).
Specific growth rate (G) was analysed in a model using treatment ("TR"; 3 levels), species ("SP"; 2 levels) and experimental period ("PD"; 3 levels) as fixed factors, including all their interactions, and tank ("TANK"; see Table 1) and fish identity ("ID"; see Table 1) as random factors; standard body length ("BL") was initially included as a covariate but excluded if it did not improve the model fit, assessed based on information criteria from leave-one-out cross-validation (LOOIC; Vehtari et al., 2017). Body mass (ln-transformed) and body length were analysed using models with the same structure, except for PD being replaced by day of measurement ("DAY"; fixed factor, 4 levels).
These models were run separately for each of the 2 years. Only individuals surviving until the start of the behavioural test were included in the above specified models, to reduce influence from individuals affected by other factors than the treatment.
Emergence latency (ln-transformed) was analysed in a model using year ("YR"; 2 levels), TR and SP as fixed factors, including all their interactions, with TANK as a random factor. Differences among factor combinations were assessed based on distributions of posterior contrasts. As with the growth analyses, body length (BL) was initially included as a covariate but excluded if it did not improve the model fit, assessed based on LOOIC. All models assumed Gaussian error distribution (identity link functions) and were run with 4 chains, 5000 iterations per chain, a burn-in of 2500, and weakly informative normal priors (μ = 0, σ = 10) for the population-level effects; all other modelling parameters were left as default.
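To illustrate how posterior contrasts on the log scale can be summarized, the following Python sketch computes the percentage of a contrast's posterior distribution above zero and a back-transformed geometric mean with a 95% interval. It is a simplified stand-in, not the brms workflow used in the study: the function names are hypothetical, the draws are assumed to be paired by posterior iteration, and the use of the posterior mean with equal-tailed percentile limits is an assumption. It requires Python 3.8 or later for `statistics.quantiles`.

```python
import math
from statistics import mean, quantiles


def contrast(draws_a, draws_b):
    """Draw-wise difference a - b for two groups (paired posterior draws)."""
    if len(draws_a) != len(draws_b):
        raise ValueError("draws must have equal length")
    return [a - b for a, b in zip(draws_a, draws_b)]


def pct_posterior_above_zero(contrast_draws):
    """Percentage of the posterior distribution of a contrast that is > 0."""
    draws = list(contrast_draws)
    if not draws:
        raise ValueError("no draws supplied")
    return 100.0 * sum(1 for d in draws if d > 0) / len(draws)


def back_transformed_summary(log_draws):
    """Geometric mean and equal-tailed 95% limits from ln-scale draws."""
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%.
    cuts = quantiles(log_draws, n=40, method="inclusive")
    return math.exp(mean(log_draws)), math.exp(cuts[0]), math.exp(cuts[-1])
```

With real model output, `draws_a` and `draws_b` would hold the posterior draws of the expected ln-latency for two factor combinations (e.g. HH and LH within one species and year).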

| Ethical statement
All applicable international, national and institutional guidelines for the care and use of animals were followed. The experimental procedures were approved by the Ethical Committee on Animal Experiments in Gothenburg, Sweden (ethical licence number 8-2011).

| Food treatment effects on growth
Treatments resulted in the expected effects for both species in both years: LH-groups grew slower than HH- and HL-groups during food restriction (periods 1 and 2) and accelerated their growth rates above those of the HH-groups during their refeeding period (period 3), while HL-groups decreased their growth rate during period 3 (their restriction period) (Figure 2; see supplementary material). See Table 1 for initial and final sample size.

| Emergence latency
In the first experiment (2012), the HH-groups for both species had longer emergence latency than LH-groups (Figure 3A-B; Table 2). The HL-groups were intermediate for both species, but more similar to HH-groups (Figure 3A-B; Table 2). In the second experiment (2013), the treatment effects were qualitatively reversed. The HH-groups for both species had shorter emergence latency than LH-groups (Figure 3A;

| Exploring emergence latency in relation to potential confounding factors
After the 2013 experiment, the subject fish were dissected, which allowed for exploring effects of sex on behaviour.
Running the model on 2013 data only, removing year and adding sex as a factor, revealed no effect of sex with credibility limits

FIGURE 3 Emergence latency (geometric means with 95% credibility intervals) in start-box emergence tests for sticklebacks (Gasterosteus aculeatus, green; Pungitius pungitius, brown). HH: high food ration first and second treatment period, high ration third period; HL: high ration first and second period, low ration third period; LH: low ration first and second period, high ration third period (refeeding groups).

TABLE 2 Estimated emergence latencies in seconds (geometric means and 95% credibility interval limits; back-transformed from the log-scale)
Abbreviations: HH, high food ration first and second treatment period, high ration third period; HL, high ration first and second period, low ration third period; LH, low ration first and second period, high ration third period (refeeding groups).

With respect to lateral bony armour plates in Gasterosteus aculeatus, no strong indication of differences across years was found, although the frequencies were not identical (χ²-test: χ² = 4.86, p = .09). To see if plate morph explained behaviour in this species, the model was run on G. aculeatus data only, removing species as a factor and adding plate morph (three levels: full-, partial- and low-plated morph). Both partial- and low-plated morphs showed similar behavioural characteristics as the full-plated morph (
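The χ²-test on plate morph frequencies across years can be reproduced from the morph counts given in the data collection section (2012: 34 full, 6 partial, 5 low; 2013: 44 full, 3 partial, 1 low). The Python sketch below is an illustration with hypothetical function names; it assumes a standard Pearson test of independence without continuity correction, and it uses the closed-form upper tail exp(−χ²/2), which is valid only for 2 degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_df2(chi2):
    """Upper-tail probability of a chi-square variate with exactly 2 df."""
    return math.exp(-chi2 / 2.0)


# Rows: 2012, 2013; columns: full-, partial-, low-plated morph.
plate_counts = [[34, 6, 5], [44, 3, 1]]
chi2, df = chi_square_independence(plate_counts)
p = p_value_df2(chi2)  # chi2 is about 4.86 and p about 0.09, with df = 2
```

Several expected counts in this table are below 5, so the χ² approximation is rough and the p-value should be read as indicative only.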

| Body water content and relative liver mass in 2013
The ratio between dry and wet carcass mass was higher for the HH-groups as compared to the LH-groups for both species (G. aculeatus: 99.9% p.d. > 0; P. pungitius: 97.0% p.d. > 0; Figure 4A-B; Table S3). HL-groups were intermediate, but for G. aculeatus, the ratio was still clearly lower than in the HH-group (98.7% p.d. > 0). Gasterosteus aculeatus had generally lower water content than P. pungitius (Figure 4C).
Relative liver mass was similar between HH- and LH-groups, but lower in the recently food-restricted HL-groups (<1% overlap with 0 for all p.d. relating to contrasts involving HL) (Figure 4D-E; Table S3).

| DISCUSSION
Experimental replication is important in science, to increase or decrease the confidence in previously found effects (Ioannidis, 2005; Kelly, 2006, 2019; Nakagawa & Parker, 2015; Nosek & Errington, 2020). This is particularly true for studies in animal behaviour, where sample sizes are often relatively low (Jennions & Møller, 2003), which can increase the risk of spurious effects in the statistical analyses (Anderson et al., 2001). While many results are indeed robust, it is not uncommon that results fail to replicate in subsequent experiments (e.g. Clark et al., 2020; Jones et al., 2019; Roche et al., 2020; Wang et al., 2018). Publication bias and erroneous analyses, favouring positive results in the a priori hypothesized direction, may further promote spurious effects in the literature (Baltzley & Nabity, 2018; Jennions & Møller, 2002). This report provides a case of a non-reproducible result in food-ration-manipulated sticklebacks, tested for risk-taking behaviour in a commonly used standardized test.

| Failure to replicate results across years
The first experiment showed results supporting a behavioural feedback involving asset protection and starvation avoidance, in line with the general pattern found in a meta-analysis on state-dependent risk-taking behaviour (Moran et al., 2021). Previous studies on sticklebacks have also detected positive associations between hunger and risk-taking (Croy & Hughes, 1991; Fraser & Huntingford, 1986).
Continuously well-fed fish showed longer latency to emerge from the sheltered start-box, while fish undergoing refeeding (and possibly compensatory growth) showed shorter emergence latency.
Recently food-restricted fish showed intermediate emergence latency, statistically not clearly distinguishable from either of the other groups. A general difference between the two species was also found, with P. pungitius being faster to emerge, suggesting that this smaller-bodied species was more motivated to leave the refuge. This result contrasts with results from Webster et al. (2009), in which P. pungitius were more prone to spend time in cover than G. aculeatus.
The second experiment in 2013 was run to replicate the first experiment, aiming for an exact replication (Kelly, 2006). This experiment resulted in the opposite pattern, as compared to the 2012 experiment, with continuously well-fed fish having the shortest emergence latencies, which would support a state-dependent safety feedback.
It is perhaps possible that different state-dependent feedback loops acted within the fish in the different years. If so, this study could be an indication of instability in the mechanisms affecting state-dependent behaviour. From the current experiment, it is not possible to determine which factor would be the one affecting the outcome of behavioural state-dependency. The species-dependent difference in emergence latency found in 2012 was not supported in 2013, further indicating that the results from this experimental design were unstable.

| Hypothetical causes for different effects between experiments
While it is possible that minor differences in procedures might have occurred unknowingly, none were identified during or after the experimental procedures. The protocols applied (i.e. the growth treatment or the behavioural test) could be unreliable in their effects on the fish, with minor differences having large impacts on their behaviour (see e.g. Hansen et al., 2020). However, if the experiment were replicated in a different laboratory, then even more subtle differences in experimental design would be expected by necessity (different populations, holding tanks, food source, experimental facilities, etc.). Hence, the presented experiments should reasonably be as close to an exact replication as practically achievable. Average size of the subject fish differed between years, with fish being on average smaller in 2013. However, there were no detectable effects of body size on behavioural expression, which is in line with similar emergence test experiments on G. aculeatus by MacGregor et al. (2021). King et al. (2013) found sex effects on risk-taking in G. aculeatus, with males spending more time out of cover. Sex was only recorded in 2013, but no effect of sex was seen in this year. Different plate morphs in G. aculeatus have previously been found to exhibit different risk-taking behaviours (Grand, 2000), but no indications of such effects were detected in the present experiments. Since the fish were housed in small groups during feeding treatments, food rations for individual fish may have varied across days.
Other possible causes for the different results are intergenerational differences in selection pressure on behavioural expression, and environmental differences during winter affecting the tested cohorts differently. Given the usage of wild-captured fish in this experimental design, these potential problems would be uncontrollable for the experimenter. Given the similar patterns across treatment groups between the species within each year, it seems likely that there could have been a common environmental factor affecting the fish. The differences in body size across years suggest that the cohorts differed in some respects (e.g. growth or age-structure, or selection pressure) across years, despite being collected from the same location at the same time of year.
The chance factor in capture and assignment is, of course, a possible explanation for the results in general. The remedy for this would be larger sample sizes, which is a reasonable recommendation based on the present results.
Trapping individuals may create a biased sample due to certain behavioural types being more prone to enter traps (or escaping when trapped), which has been demonstrated in G. aculeatus in 2-hour trapping trials (Kressler et al., 2021). However, while the captured fish may constitute a biased sample in general, the traps and capture site were the same across years. Hence, any bias would also likely be similar across years if the overall population has the same distribution of behavioural types.
Pooling of the data from the 2 years led to effects from each year cancelling each other out, which then indicates that there is no state-dependency in the scored behaviour in this experimental setup, which would be in line with experiments on young brown trout Salmo trutta L. (Näslund et al., 2016b).

| Body water content and relative liver mass
Water content of the carcass (i.e. the body without body-cavity organs) was slightly but consistently higher in the restricted-refed (LH) fish, as compared to the continuously fed (HH) fish. Tissue hydration in starving fish is well known from several fish species and amphibians and hypothesized to be a way of limiting body mass loss during periods of low food abundance (Ali et al., 2003; McCue, 2010).
The effects were strongest in the restricted-refed groups (LH), which were the groups experiencing the longest restriction period (tissue hydration can be a slow process; Mendez & Wieser, 1993).
Notably, the refeeding period was not long enough to regain a normal water content; instead, it appears that the fish were aiming to compensate for the lost growth opportunity by adding lower-quality (i.e. higher water content) tissue and thereby gaining body size more rapidly. Some studies indicate that tissue hydration could be a response specifically linked to compensatory growth responses, as increased water content is sometimes seen in the refeeding phase, but not in the restriction phase, of restriction-refeeding experiments (Johansen et al., 2001; Türkmen et al., 2012). This pattern is not general across species (Liu et al., 2011; Mendez & Wieser, 1993) but cannot be excluded for sticklebacks.
Relative liver mass decreased with recent food restriction (i.e. in HL-groups) but was similar between refed (LH) and continuously fed (HH) fish. This indicates that the energy in the liver is mobilized during food shortage, but quickly compensated when high food consumption levels are recovered. This finding is consistent with previous studies on fish in general (e.g. Ali et al., 2003; van Dijk et al., 2005; Liu et al., 2011; Metcalfe et al., 2002).

| CONCLUSIONS
This study reveals that repeated experiments using the start-box emergence test can result in contrasting effects when testing individuals in different energetic states, without clear indications of potential confounding factors. Unobserved cohort differences in the wild-caught subjects are hypothesized to contribute to the unstable results, as laboratory conditions were close to identical across the two experiments. Alternatively, the emergence test may be sensitive to minor, unperceived, alterations in the trial procedures. This latter possibility should be further investigated, given that the test is commonly applied to score boldness in fish behaviour experiments.
It should be noted that both experiments reported here could have been presented as support for a state-dependent feedback, if reported on their own. This highlights the importance of replicating experimental findings.

CONFLICT OF INTEREST
The author declares that he has no conflict of interest.