Frontiers in Behavioral Neuroscience (Front. Behav. Neurosci.), 1662-5153, Frontiers Media S.A. doi: 10.3389/fnbeh.2022.869511

Perspective

Ten Points to Improve Reproducibility and Translation of Animal Research

Rainer Spanagel*, Institute of Psychopharmacology, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany

Edited by: Christian P. Müller, University of Erlangen Nuremberg, Germany

Reviewed by: Roberto Ciccocioppo, University of Camerino, Italy; Liubov Kalinichenko, University of Erlangen Nuremberg, Germany

*Correspondence: Rainer Spanagel, rainer.spanagel@zi-mannheim.de

This article was submitted to Pathological Conditions, a section of the journal Frontiers in Behavioral Neuroscience

Received: 08 February 2022; Accepted: 22 March 2022; Published: 21 April 2022. Front. Behav. Neurosci. 16:869511. Copyright © 2022 Spanagel.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Findings from animal experiments are often difficult to transfer to humans. In this perspective article I discuss two questions. First, why are the results of animal experiments often so difficult to transfer to humans? And second, what can be done to improve translation from animal experiments to humans? Translation failures are often the result of poor methodology. It is not merely that the low statistical power of basic and preclinical studies undermines the detection of a “real effect”; the accuracy with which data from animal studies are collected and described, and the resulting robustness of the data, is generally very low and often does not allow translation to a much more heterogeneous human condition. Equally important is the fact that the vast majority of publications in the biomedical field in the last few decades have reported positive findings and have thus generated a knowledge bias. Further contributions to reproducibility and translation failures are discussed in this paper, and 10 points of recommendation to improve reproducibility and translation are outlined. These recommendations are: (i) prior to planning an actual study, a systematic review or potential preclinical meta-analysis should be considered. (ii) An a priori power calculation should be carried out. (iii) The experimental study protocol should be pre-registered. (iv) The execution of the study should be in accordance with the most recent ARRIVE guidelines. (v) When planning the study, the generalizability of the data to be collected should also be considered (e.g., sex or age differences). (vi) “Method-hopping” should be avoided, meaning that it is not necessary to use the most advanced technology but rather to have the applied methodology under control. (vii) National or international networks should be considered for carrying out multicenter preclinical studies or obtaining convergent evidence. (viii) Animal models that capture DSM-5 or ICD-11 criteria should be considered in the context of research on psychiatric disorders. (ix) Raw data underlying publications should be made publicly available in accordance with the FAIR Guiding Principles for scientific data management. (x) Finally, negative findings should be published to counteract publication bias. The application of these 10 points of recommendation, especially for preclinical confirmatory studies but also to some degree for exploratory studies, will ultimately improve the reproducibility and translation of animal research.

open science, p-hacking, HARKing, confirmatory animal study, exploratory animal study, DSM-based animal model, 3R, 6R



      Introduction

Research in the field of behavioral neuroscience, and science in general, has a problem: many published scientific findings cannot be replicated. Most researchers are familiar with the situation in which a result from an experiment carried out years ago can no longer be reproduced. After a few sleepless nights and a lot of troubleshooting, however, a small methodological problem is typically identified and the results from former experiments can again be reproduced. Not only can the replication of one’s own results cause difficulties; often other working groups cannot reproduce the results of their colleagues. Given this dilemma, for many researchers it is the greatest scientific satisfaction if another research group can replicate their published results. Sir Karl Popper, an Austrian-British philosopher who founded the philosophy of critical rationalism, put this phenomenon in a nutshell: “We don’t even take our own observations seriously or accept them as scientific observations until we have repeated and tested them. It is only through such repetitions that we can convince ourselves that it is not just an isolated coincidence that is involved, but rather events that are fundamentally inter-subjectively verifiable due to their regularity and reproducibility” (Popper, 1935). Popper’s critical rationalism is more relevant today than ever before. His philosophical approach represents a balanced middle ground between a belief in science and the relativism of truth. The replication of a scientific finding supports a belief in science, but if a finding is not replicated, it does not necessarily mean that the original finding is wrong, only that it needs to be relativized and subjected to critical discourse. This critical discourse usually leads to an explanation of the discrepancy, thereby sometimes placing opposing findings in relation to one another and thus leading to better knowledge gain.
The discourse, by way of critical rationalism, is not only the compass for maneuvering through the corona crisis, but also a guide through the replication and translation crisis, and it runs like a common thread through the following paragraphs of this perspective paper.

      The Replication Crisis is a Multidisciplinary Phenomenon and Contributes Largely to the Translation Failures from Animal to Human Research

How can one measure the replication of scientific findings? There is no generally accepted standard method for measuring and evaluating a replication. However, reproducibility can be assessed using significance and P-values, effect sizes, subjective ratings, and meta-analyses of effect sizes. These indicators correlate with each other and together provide a meaningful statement about the comparison between the replication and the original finding (Open Science Collaboration, 2015). However, in animal studies, statistical analyses and the resulting P-values are subject to wide sample-to-sample variability and therefore should not be used as the sole guiding principle to assess reproducibility (Halsey et al., 2015).

Replication failures are an inherent problem of doing science, but are we in a replication crisis? Indeed, meta-research that itself uses scientific methodology to study the quality of a scientific approach has identified widespread difficulty in replicating results in all experimental scientific fields, including behavioral neuroscience. This problem is termed “the replication crisis.” Although the term “crisis” has a subjective negative meaning, a recent survey of 1,500 researchers supports this general perception (Baker, 2016). Although only the subjective assessment of replication was used as an indicator, more than half of the respondents answered “Yes—there is a significant crisis” in research, and almost 40% of respondents indicated that there is at least a slight crisis. The term “replication crisis” is also often associated with the accusation of unethical behavior, ranging from data manipulation to the outright fabrication of results. The question of the extent of scientific misconduct inevitably arises. Fanelli (2009) conducted a systematic review and meta-analysis on this question. In anonymous surveys, scientists were explicitly asked whether they had ever fabricated or falsified research data or whether they had changed or modified results in order to improve the outcome. The meta-analysis of these studies showed that around 2% of those questioned admitted to scientific misconduct; the number of unreported cases is likely to be somewhat higher. It remains to be seen whether this 2% of black sheep reflects systemic misconduct resulting from the “publish or perish” incentive system, or whether this represents the normal range of unethical human behavior. If it is not scientific misconduct that leads to the replication crisis, how does the problem of the lack of reproducibility of published results, which greatly affects the reliability of animal experimentation, come about?

      A Summary of Reasons that Contribute to Replication Failures in Animal Experiments

      Many factors contribute to the replication failures of findings derived from animal research.

Most animal studies in the field of behavioral neuroscience are exploratory in nature and have a very low statistical power. A study with low statistical power has a reduced chance of detecting a true effect. Low power also reduces the likelihood that a statistically significant result reflects a true effect (Button et al., 2013). This results in a lack of repeatability, with a wide sample-to-sample variability in the P-value. Thus, unless statistical power is very high (and much higher than in most experiments), the fickle P-value should be interpreted cautiously (Halsey et al., 2015). This fundamental problem of small sample sizes and the resulting wide sample-to-sample variability in the fickle P-value contributes largely to replication failures.
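The scale of the problem can be illustrated with a back-of-the-envelope power calculation. The sketch below uses a normal approximation for a two-sided, two-group comparison (an exact t-based calculation adds roughly one animal per group); the function name and the numbers fed to it are illustrative only:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-group
    comparison (normal approximation to the t-test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # 0.84 for power = 0.80
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

print(n_per_group(0.8))  # 25 -> even a large effect needs ~25 animals per group
print(n_per_group(0.5))  # 63 -> a medium effect needs far more than typical n = 5-12
```

With the typical n = 5–12 per group, only very large effects can be detected reliably, which is exactly the setting in which significant P-values are least trustworthy.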

Related to this inherent statistical challenge is the phenomenon of “p-hacking,” the selective reporting of significant data. P-hacking can be conducted in many ways: for example, by performing many statistical tests on the data and reporting only those that come back with significant results, by deciding to include or drop outliers, by combining or splitting treatment groups, or by otherwise manipulating the data analysis. Head et al. (2015) demonstrated how one can test for p-hacking when performing a meta-analysis and that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured.

      HARKing, or Hypothesizing After the Results are Known, is also a source of uncertainty (Kerr, 1998). Although a study is usually planned using a working hypothesis, oftentimes the results neither support nor reject the working hypothesis, but may point to a new working hypothesis—the article is then written in a way that makes the most sense of the results.

Although p-hacking and HARKing are common practices in science, both likely have little impact on the replication crisis as compared to another problem: poor methodology. Thus, the accuracy with which animal research data are collected and described, and the resulting robustness of the data, is generally very low (Perrin, 2014; Bespalov et al., 2016). Kilkenny et al. (2009) examined this systematically in 271 published animal studies. More than half of the studies did not provide information on the age of the animals, 24% of the studies did not provide information on the sex, 35% of the studies did not provide information on the number of animals used, 89% of the studies did not perform any randomization, and 83% of the studies did not provide information on blinding of the experiments (Kilkenny et al., 2009).

A problem that also relates to poor methodology but has yet to receive attention is a phenomenon referred to here as “method hopping.” Method hopping is the race for the newest technology. There is an extremely rapid development and turnover rate in technology, especially in the neuroscience field. A technology that is currently considered state of the art can already be outdated a year later. This problem is escalated by the policies of top journals and many grant institutions, which always favor the use of the newest technology to answer a research question. An indication that method hopping contributes to replication failures is the fact that publications in high-profile journals such as Nature and Science that could not be replicated are cited 153 times more often than those studies that could be successfully replicated (Serra-Garcia and Gneezy, 2021). It is suggested that more-cited papers have more “interesting” results driven by new technologies, which leads to a subjectively more lax peer-review process and subsequently to a negative correlation between replicability and citation count (Serra-Garcia and Gneezy, 2021). Hence, this race for the newest technology comes at a high price, as oftentimes researchers do not have their newest technology under control, and published measurements may in fact reflect simple technical failures.

      In addition to poor methodology, another overlooked problem is that published findings can ultimately only be generalized to a limited extent. The problem of generalization of findings from animal experiments is evident to all animal experimenters. In rodent studies, for example, strain, age, sex, and microbiome composition are critical biological factors that largely contribute to study differences. Numerous environmental factors such as light, temperature, cage size, numbers of animals per cage, enrichment, and food composition—to name just the most important ones—also contribute to large data variability. However, even if biological and environmental factors are standardized, findings can still differ widely. This is best exemplified by a multi-site study by John Crabbe from Portland together with colleagues from labs in Edmonton, Canada, and Albany, New York. They studied eight inbred strains of mice in all three labs using various behavioral tests (Crabbe et al., 1999). All laboratory conditions—from the light-dark cycle to the manufacturer of the mouse food, etc.—were harmonized between the three laboratories. However, the results were strikingly different across laboratories. For example, in a standard locomotor activity test using an identical open field apparatus, all strains in the different laboratories showed different activity measurements. This comparative study of locomotor activity in mice leads to the following conclusion: there are significant differences in locomotor activity between different inbred strains and these differences do not translate from one laboratory to another. The same conclusions can be drawn for alterations in body weight. All inbred strains differed significantly in weight gain; however, these differences could not be extrapolated from one laboratory to another. In summary, this suggests that results from one laboratory are not transferrable to another. 
Mice from an inbred strain, which are by definition genetically identical, exhibit different locomotor activity and weight gain in an Albany lab than in a Portland or Edmonton lab. However, sex differences were comparable across sites. In all eight inbred strains and in all three laboratories, female animals showed higher activity and lower weight gain than male animals. It is therefore possible to generalize that regardless of the inbred strain and the laboratory in which locomotor activity is measured, female animals are generally more active and show lower weight gain than male animals.

The study by Crabbe et al. (1999) published in Science caused a shock wave in the global scientific community because it indicated that even simple parameters such as motor activity and weight gain are site dependent (with the exception of sex differences) and thus not replicable. However, in the last 20 years, many new scientific discoveries have been made that contribute to a better understanding of the replication problems from one laboratory to another. Epigenetic events and microbiotic differences often contribute to significant trait and behavioral changes, even in genetically identical inbred strains (Blewitt and Whitelaw, 2013; Vuong et al., 2017; Chu et al., 2019). Furthermore, natural variability in stress resilience may also contribute to this problem (Kalinichenko et al., 2019). This is no different with humans. Monozygotic twins who are raised in different families and places can differ significantly in what they do, how they do it, and how they cope with stressful situations. The environment shapes our behavior and stress-coping strategies via epigenetic mechanisms, the immune state (Kalinichenko et al., 2019), and the environment-dependent settlement of microorganisms, especially in our intestines, which also have a considerable influence on our behavior and our resilience or vulnerability to various diseases (Blewitt and Whitelaw, 2013; Vuong et al., 2017; Chu et al., 2019). These new findings provide a framework to better explain the variability that arises from measurements in different laboratories.

      In summary, there are major statistical problems, poor methodology—the accuracy with which animal research is collected and described—and “method hopping,” as well as generalization challenges that contribute to the replication crisis in animal experiments. In addition to replication problems, several other phenomena contribute to translation failures from animal to human research.

      A Summary of Phenomena that Contribute to Translation Failures in Animal to Human Research

A publication bias for positive results contributes significantly to the lack of translation from animal to human research. This publication bias has been impressively demonstrated by Fanelli (2012). Fanelli compared original publications reporting positive results over a 20-year span and demonstrated a consistent positive publication bias in the field of neuroscience: regardless of the research area and the country of origin of the research, positive results made up almost all (>90%) of the published studies over the preceding decades. Very similar pronounced publication biases for positive results were also found for other disciplines, such as pharmacology and psychiatry, and this appears to be a global phenomenon in science (Fanelli, 2012).

Another source of translation failures is the incorrect choice of the model organism. Sydney Brenner—best known for his brilliant scientific quotes—concluded in his Nobel Prize speech: “Choosing the right organism for one’s research is as important as finding the right problems to work on.” Convergent brain anatomy, a complex behavioral repertoire, advanced cognitive capacities, and close genetic homology with humans make non-human primates by far the best model organism for behavioral neuroscience. However, the use of non-human primates for scientific purposes has become an ethical issue in Western industrialized countries, and today we are facing a situation where most non-human primate research is done in China. In particular, advances in the cloning of macaque monkeys now allow the generation of monkeys with uniform genetic backgrounds, which are useful for the development of non-human primate models of human diseases (Liu et al., 2019).

For most studies in the field of behavioral neuroscience, a rodent model organism is the first choice, but a notable shift has occurred over the last two decades, with mice taking a more and more prominent role in biomedical science compared to rats (Ellenbroek and Youn, 2016; Yartsev, 2017). This shift is primarily driven by the availability of a huge number of transgenic mouse lines. However, for modeling complex behavior (e.g., social behavior) or pathophysiology (e.g., addictive behavior), the rat is the model organism that typically produces better translational information (Vengeliene et al., 2014; Ellenbroek and Youn, 2016; Spanagel, 2017). For example, whereas the majority of rats readily engage in social behavior, mice spend significantly less time interacting with a conspecific and many even find such interactions aversive. As a result, interactions between males are much rarer and, when they do occur, are more aggressive and territorial in nature. Furthermore, their social play-fighting involves only a small subset of what is exhibited by rats (Pellis and Pasztor, 1999). Clearly, social behavior in mice shows less face validity than that in rats. Nevertheless, a simple Medline search from 2002 to 2022 for the keywords (social interaction) and (mice) yields about 2,400 hits, compared to about 1,700 hits for (social interaction) and (rat). In conclusion, despite the fact that rats are more social than mice and thus may better model human social behavior, researchers prefer to use mice because more transgenic lines are available and housing costs are lower.

      In addition to the choice of the model organism, the employed animal test also has a great impact on translatability to the human situation. This is particularly evident when modeling aspects of complex pathological behavior as seen in different psychiatric conditions. The current DSM-5 and ICD-11 psychiatric diagnostic classification systems are based on clinical observations and patient symptom reports, and are inherently built on anthropomorphic terms. As diagnoses are made according to these classification systems worldwide, logic dictates that animal models of psychiatric disorders should also be based on DSM-5/ICD-11 criteria. Is this possible? Modeling the full spectrum of a mental disorder in humans is not possible in animals due to the high level of complexity. However, we can transfer anthropomorphic terminology to animal models with empirical, translatable and measurable parameters, and thus reliably examine at least some key criteria of a disease of interest in animal models—excellent examples for DSM-based animal models are provided for addictive behavior (Deroche-Gamonet and Piazza, 2014; Spanagel, 2017).

Further translation problems relate to pharmacotherapy development. Many preclinical studies aim to generate evidence that a new drug improves pathological behavior in a given animal model. However, the phenomenon of tolerance development—defined as a loss of efficacy with repeated drug exposure—is quite common but rarely assessed. Indeed, a recent analysis indicates that many published preclinical efficacy studies in the field of behavioral neuroscience are conducted with acute drug administration only (Bespalov et al., 2016). Another limitation in the drug development process is that most preclinical drug testing in animals is conducted by intraperitoneal or subcutaneous administration, while in humans the same treatment is planned to be oral. However, the pharmacokinetics of a given drug depends very much on the route of administration. If oral administration is already considered at the preclinical level, appropriate dosing in humans can be better achieved, and it can be predicted early on whether an adequate therapeutic window exists.

Another problem that has yet to be discussed in the literature is the lack of a placebo effect in animal studies. Conditioned placebo effects, in particular placebo-induced analgesia, have been described in laboratory animals (Herrnstein, 1962; Keller et al., 2018). But when testing pharmacological interventions in an animal model of a given psychiatric condition, a placebo effect cannot occur in laboratory animals unless they are first conditioned to the drug, a possibility usually precluded by the experimental design. It can be assumed that the missing placebo effect in preclinical intervention studies leads to a considerable overestimation of the effect size when translating to human clinical trials.

In summary, there are numerous phenomena that contribute to translation failures. It is not only the non-reproducibility of preclinical findings, but also a publication bias for positive results, the incorrect choice of the model organism, and the use of non-relevant disease models (for example, models that do not conform to DSM-5/ICD-11 criteria) that result in translation failures in the clinical condition. With respect to preclinical psychopharmacotherapy development, tolerance phenomena are often not assessed, and effect sizes estimated in animal studies, which lack a placebo control, can easily shrink in human trials because of the large placebo effects common in neuropsychiatric trials (Scherrer et al., 2021).

      In the second part of this perspective I will present a list of recommendations on how to counteract replication and translation failures.

What Can Be Done to Improve Reproducibility in Animal Experimentation? Effect Size Estimation and Bayes Factors Are Alternatives to P-values

Is there an alternative to the inherent problem of a fickle P-value (Halsey et al., 2015) in exploratory animal studies that use a small sample size, typically in the range of n = 5–12? Not really, as exploratory animal research is performed under the limitation of lab resources and time, and must also be conducted within the 3R framework. Furthermore, the use of laboratory animals in biomedical research is a matter of intense public debate. Recent statistics indicate that about half of the Western population is in favor of it while the other half is opposed (Petetta and Ciccocioppo, 2021). Therefore, the authorities that evaluate and approve animal protocols are under pressure from this public and political debate and thus try to limit the number of animals to be employed.

What would help, however, is to report an effect size that gives quantitative information about the magnitude of the effect and its 95% confidence interval (CI), which indicates the uncertainty of that measure by presenting the range within which the true effect size is likely to lie (Michel et al., 2020). Indeed, it has become more and more common practice to report effect sizes along with P-values. Reporting the effect size provides researchers with an estimate of how big the effect is, i.e., the magnitude of the difference between two groups. There are many ways to calculate the effect size—for a simple group comparison, Cohen’s d is usually calculated. There is general agreement that d = 0.2 is a small, d = 0.5 a medium, and d = 0.8 a large effect size. The calculation of the effect size with its 95% CI can help the researcher in deciding whether to continue with a study or abandon it because the findings are not reliable. If an exploratory experiment yields a highly significant effect (P < 0.01) with a small effect size (d = 0.2), one should be cautious about conducting further elaborate experiments. If an exploratory experiment yields a significant effect (P < 0.05) with a large effect size (d = 0.8), a better basis for further experiments is provided. Although these two scenarios of statistical analysis may help researchers decide to continue or rather stop a study, the real advantage of describing effect sizes is that findings from several experiments can be combined in a meta-analysis to obtain more accurate effect size estimates.
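To make this concrete, here is a minimal sketch of a Cohen's d calculation with an approximate 95% CI. The group data are hypothetical, and the normal-approximation standard error is used for simplicity; exact CIs rely on the noncentral t distribution:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d_with_ci(a, b):
    """Cohen's d for two independent groups, with an approximate 95% CI."""
    n1, n2 = len(a), len(b)
    # Pooled standard deviation across both groups
    sp = sqrt(((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2))
    d = (mean(b) - mean(a)) / sp
    # Approximate standard error of d (normal approximation)
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, (d - 1.96 * se, d + 1.96 * se)

# Hypothetical activity scores from a control and a treated group (n = 6 each)
control = [12.1, 10.4, 11.8, 9.9, 10.8, 11.2]
treated = [13.5, 12.9, 14.2, 12.4, 13.8, 13.1]
d, (lo, hi) = cohens_d_with_ci(control, treated)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")  # d = 3.05, 95% CI [1.39, 4.72]
```

Note how wide the CI is even for this very large effect: with n = 6 per group, the effect size itself is estimated only roughly, which is one reason why combining several such estimates in a meta-analysis is so valuable.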

Furthermore, one can augment or substitute a fickle P-value with the Bayes factor to inform on the relative levels of evidence for the null and alternative hypotheses; this approach is particularly appropriate for studies in which one wishes to continue collecting data until clear evidence for or against a working hypothesis has accrued (Halsey, 2019). The use of the Bayes factor is also starting to gain momentum and is easily calculable for a range of standard study designs (van de Schoot et al., 2021). Although in recent years Bayesian thinking and statistics have been increasingly applied in many scientific fields, there is still a lack of knowledge and guidance on how to use this statistical approach, and most scientific journals have yet to adapt their policies toward Bayesian statistics. However, even if we can augment or substitute a fickle P-value with a Bayes factor, it will not solve the inherent problem of using small sample sizes in exploratory research.
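As a rough illustration (a sketch, not the approach of the papers cited above), the BIC approximation described by Wagenmakers (2007) turns a simple two-group model comparison into an approximate Bayes factor without specialized software; the data below are invented:

```python
from math import exp, log
from statistics import mean

def bf10_bic(a, b):
    """Approximate Bayes factor BF10 via the BIC approximation:
    BF10 ~ exp((BIC_null - BIC_alt) / 2).
    BF10 > 1 favors a group difference; BF10 < 1 favors the null."""
    data = a + b
    n = len(data)
    grand = mean(data)
    # Null model: one grand mean (1 free parameter)
    sse0 = sum((x - grand) ** 2 for x in data)
    # Alternative model: separate group means (2 free parameters)
    sse1 = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    bic0 = n * log(sse0 / n) + 1 * log(n)
    bic1 = n * log(sse1 / n) + 2 * log(n)
    return exp((bic0 - bic1) / 2)

print(bf10_bic([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))  # >> 3: strong evidence for a difference
print(bf10_bic([1, 2, 3], [1.5, 2.5, 2.0]))         # < 1: the null is favored
```

Because BF10 quantifies relative evidence, one can keep collecting data until it crosses a pre-set threshold (e.g., 10 or 1/10), which matches the sequential use case described above.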

Convergent Evidence From Cross-Species Studies Provides More Solid Ground for Replication

As an alternative to these different statistical approaches, a better assessment of reproducibility is obtained with convergent evidence, in which measurements obtained by different methods point to a similar finding. Convergent evidence requires more experiments with different methodological approaches that try to prove a working hypothesis. Although this clearly means more work, more experimental animals, and more time to answer a research question, it provides more solid ground for replication. Convergent evidence can also be obtained by cross-species comparisons. Notably, convergent evidence from cross-species neuroimaging and genetic studies is particularly solid. Multi-modal neuroimaging for structural and functional measures is a standard methodology to study human psychopathology, and the same measures can also be used in high-field scanners for experimental laboratory animals; if similar structural or functional brain signatures are obtained in humans and experimental animals, convergent cross-species evidence is provided. For example, converging evidence from diffusion tensor imaging measures in alcohol-dependent rats and humans shows comparable persistent white matter alterations that evolve soon after the cessation of alcohol use (De Santis et al., 2019). This convergence now makes it possible to study in alcohol-dependent rats the underlying molecular cause of the persistent white matter alterations, a methodological step that could not be taken in humans. In terms of cross-species genetics, another example is a genome-wide association meta-analysis (GWAS) of alcohol intake that demonstrated an association with a genetic variant in the ras-specific guanine-nucleotide releasing factor 2 (RASGRF2) gene, which encodes a protein that mediates activation of the ERK pathway.
This association only occurred in males and it was further shown that male individuals showed a blunted mesolimbic dopamine response to alcohol-related stimuli (Stacey et al., 2012). Accordingly, alcohol intake was decreased in male but not female Rasgrf2 knockout mice and alcohol-induced dopamine release in the mesolimbic system was also blunted in male Rasgrf2 knockouts (Stacey et al., 2012). This cross-species genetic study shows convergent evidence for a sex-specific effect of a specific gene variant in alcohol intake and provides a plausible molecular mechanism.

      Preregistration Avoids P-Hacking and HARKing

As indicated above, p-hacking and HARKing are common practices in science. A very simple way to avoid these phenomena is pre-registration of the study design. Thus, animal studies should be registered, much like clinical trials. There are already various platforms for this, e.g., www.animalstudyregistry.org or www.preclinicaltrials.eu (Pulverer, 2020). In the case of an exploratory study of Type I, in which new methodological ground is broken, however, registration of the study design makes little sense, since experimental parameters are likely to be modified again and again in the course of the study until the new method ultimately measures what is intended. Registration only makes sense if a method is already established in a laboratory and is used to carry out new measurements (exploratory study) (Figure 1), or if previously published findings are to be confirmed (confirmatory study) (Kimmelman et al., 2014; Figure 2).

      Figure 1. The workflow of exploratory animal studies Type I and II. In the case of an exploratory study Type I, in which methodologically new ground is broken, the choice of model organism is of critical importance, as is the choice of behavioral test (which could also be a DSM-5-based behavioral model). The study design should be in accordance with the 3R principles; registration of the study design, however, makes no sense, since experimental parameters are likely to be modified again and again in the course of the study until the new approach ultimately measures what is intended and can then be published. For a Type II study, the method is already established in a laboratory and is used to carry out new measurements. The study design can involve a power calculation (based on a systematic review) and should be in accordance with the ARRIVE 2.0 guidelines and the 3R/6R principles. Pre-registration should be carried out, the publication, irrespective of whether it reports negative or positive results, should be open access, and all digital data should be handled in accordance with the FAIR principles.

      Figure 2. The workflow of a confirmatory animal study. If possible, the hypothesis should be based on a systematic review or, even better for a quantitative statement, on a meta-analysis. However, for most hypotheses it is not possible to perform a meta-analysis. The study design should include a power calculation, should adhere to the ARRIVE 2.0 guidelines and the 3R/6R principles, and should involve both sexes for generalization. For drug testing, a multi-center study provides the best translation and should consider tolerance development and a correction for the placebo effect. The study has to be pre-registered, possibly in the form of a Registered Report. The publication should be open access, and all digital data should be handled in accordance with the FAIR principles.

      Confirmatory study designs can also be published in the form of a Registered Report. This publication format emphasizes the importance of the research question and the quality of the methodology through peer review prior to data collection. High-quality protocols are then provisionally accepted for publication, provided the authors adhere to the registered methodology. The Center for Open Science in Charlottesville, United States, already lists 295 scientific journals that offer the Registered Report format.

      The ARRIVE Guidelines Lead to a Comprehensible Methodology

      The ARRIVE guidelines (Animal Research: Reporting of In Vivo Experiments)1 are a well-established instrument for the transparent presentation of the methodology used. Precise reporting on study design, blinding, age and sex of the animals, and many other experimental factors is critical for future replication studies. Many journals have already implemented ARRIVE 2.0 in their guidelines for authors (Percie du Sert et al., 2020). These guidelines, which are attached to a submitted manuscript in the form of a checklist, lead to improved reporting of animal experiments and create transparency for both the editors of scientific journals and reviewers. For complex studies, especially confirmatory preclinical animal studies, professional help should also be sought and may be required in later FDA or EMA approval procedures. Thus, questions about study design, statistical procedures, and patent law implications should be clarified by professionals.2 The ARRIVE guidelines and professional advice lead to a comprehensible methodology that enables other laboratories to use the method and to replicate the resulting data. Unfortunately, the ARRIVE guidelines are often applied retrospectively to an already performed experiment. To counteract this counterproductive practice, ARRIVE 2.0-compliant study designs should be pre-registered.

      Method hopping, the race for the newest technology, is of major concern when it comes to replication. It is a problem inherent to the biomedical research system and will only improve through changes in the policies of high-impact journals and in the evaluation procedures for high-profile grant applications. However, each individual scientist is also responsible for his/her contribution to this systemic problem, as each scientist is free to decide whether to be part of this race or to rely on a well-established technology. A skeptical view of method hopping should not hinder technological developments in the biomedical field, but a critical attitude toward the utility of new technologies results in a balance between the use of well-established technology and new cutting-edge technology.

      Systematic Reviews and Meta-Analyses Lead to Generalized Conclusion

      As previously stated, findings from animal studies, even when replicated, can ultimately be generalized only to a limited extent to the heterogeneous human situation. For example, sex-specific differences are common in psychiatric diseases. How can those biological differences be captured by animal experiments? The National Institutes of Health (NIH) released a policy in 2015 on sex as a biological variable, calling for researchers to factor sex into research designs, analyses, and reporting in animal studies. If NIH-funded researchers propose studying only one sex, they must provide “strong justification from the scientific literature, preliminary data, or other relevant considerations” to justify this (Clayton and Collins, 2014). This NIH-driven policy increasingly applies to animal experimentation worldwide, and comparative studies of both sexes should be implemented in all exploratory Type II and confirmatory studies. For exploratory Type I studies, in which methodologically new ground is broken, a focus on one sex is usually sufficient.

      The generalization of variable and sometimes opposing results arising from measurements in different laboratories can be achieved through systematic reviews and even quantified by meta-analyses. There is a long tradition in clinical research of conducting systematic reviews and meta-analyses, which have become an evidence-based foundation of clinical research (Gurevitch et al., 2018). In 1976, the statistician Gene Glass coined the term “meta-analysis” to refer to “the statistical analysis of a large collection of analytical results from individual studies to integrate the results” (Glass, 1976; O’Rourke, 2007). In medicine, meta-analysis quickly established itself as an evidence-based instrument. Together with the more than 12,000 systematic reviews of the international research network Cochrane, essential building blocks for improved replication of clinical studies have been established. However, systematic reviews and meta-analyses in animal research have largely gone unnoticed by many researchers and have only recently gained momentum. An example concerning sex differences illustrates the power of a meta-analysis of animal data. It is assumed that the behavioral repertoires of males and females may arise from differences in brain function. This may particularly apply to the neurochemistry of the mesolimbic dopamine system in reward processing. Thus, it is generally assumed that basal dopamine levels in the nucleus accumbens and reward-induced dopamine-releasing properties are sex-specific (Müller, 2020). However, a recent meta-analysis that included 676 female and 1,523 male rats found no sex differences in basal dopamine levels or in the dopaminergic response to rewards (Egenrieder et al., 2020). In conclusion, a research question and the corresponding working hypothesis should be based, if possible, on a systematic review or even a meta-analysis.
For confirmatory animal studies especially, but also to some extent for exploratory Type II studies, a systematic review or even meta-analysis should be considered (Figures 1, 2).
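To illustrate the quantitative core of such a meta-analysis, the inverse-variance pooling of per-study effect sizes can be sketched as follows (a minimal fixed-effect sketch; the study names, effect sizes, and standard errors are invented for illustration and are not the actual data analyzed by Egenrieder et al., 2020):

```python
import math

# Hypothetical per-study standardized mean differences (Cohen's d)
# and their standard errors; all values are invented for illustration.
studies = [
    ("Study A", 0.30, 0.15),
    ("Study B", -0.05, 0.20),
    ("Study C", 0.10, 0.12),
]

# Fixed-effect inverse-variance pooling: weight each study by 1/SE^2,
# so precise studies contribute more to the pooled estimate.
weights = [1.0 / se ** 2 for _, _, se in studies]
pooled = sum(w * d for (_, d, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

# 95% confidence interval for the pooled effect.
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
print(f"pooled d = {pooled:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

With these invented inputs the confidence interval spans zero, which is exactly the kind of quantitative statement (no detectable overall effect) a meta-analysis can deliver where individual studies disagree.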

      Multi-Site Preclinical Confirmatory Trials as a New Module for Translating Animal Findings to a Heterogeneous Human Population

      Furthermore, the variability that arises from measurements in different laboratories can be used for improved translation. Clinical trials are usually multi-center; therefore, the logical conclusion is that multi-center preclinical studies provide a better prediction of the clinical situation. Although the aforementioned study by Crabbe et al. (1999) provides an excellent example of a multi-site animal study, this methodology is still very rarely used, mainly because there is so far little guidance on how to conduct a preclinical confirmatory multi-site trial. As mentioned in Box 1, confirmatory studies are of particular interest for testing a drug’s clinical potential and restricting the advance of ineffective interventions into clinical testing (Kimmelman et al., 2014). This new preclinical module in the drug development process is currently being used for the first time as part of a European research program to study the effects of psychedelic drugs in an animal model of alcohol addiction.3 However, even if multi-site preclinical testing is very well-powered and probably the best approach for the translation of animal studies to humans, the challenge of the heterogeneity seen in a human population remains. A preclinical study is usually performed in a specific outbred or inbred rodent strain. Although there is individual variability in behavioral measurements even in inbred strains, the limited genetic and behavioral diversity of rodents fails to capture the phenotypic and genetic variability seen in human studies. One way to overcome this challenge is the use of heterogeneous stock (HS) rats. These are highly recombinant animals, established by crossbreeding eight genetically diverse founder strains (Hansen and Spuhler, 1984; Solberg Woods and Palmer, 2019), resulting in a diversity that mimics that found in the human population.
Together with behavioral characterization, this allows genetic analysis of any measurable behavioral or psychopathological quantitative trait by whole-genome sequencing of each individual. Combining the genetic data of hundreds of animals can thus feed a subsequent GWAS to identify gene variants contributing to a phenotype of interest. Moreover, a new multidimensional clustering strategy for large-scale data sets across different study sites can now be used to identify specific behavioral domains for disease vulnerability or resilience in rats (Allen et al., 2021). It is therefore recommended to use HS rats in confirmatory preclinical multi-site studies to mimic the human situation and to provide additional genetic information under well-controlled environmental conditions.

      Box 1. Glossary

      ARRIVE 2.0 The ARRIVE guidelines (Animal Research: Reporting of In Vivo Experiments) are a checklist of recommendations to improve the reporting of research involving animals, with the aim of increasing the quality and reliability of published research and thereby the reproducibility of animal research (www.arriveguidelines.org). See also the PREPARE guidelines (Planning Research and Experimental Procedures on Animals: Recommendations for Excellence; www.norecopa.no/prepare).

      Confirmatory animal study Confirmatory animal studies resemble clinical trials to a large degree. In this type of study, an a priori hypothesis built on previously published results is stated, an adequate power calculation is made, a pre-specified experimental design is pre-registered, and all ARRIVE 2.0 and FAIR guidelines are strictly followed. Confirmatory studies aim less at elaborating theories or mechanisms of a drug’s action than at rigorously testing a drug’s clinical potential and restricting the advance of ineffective interventions into clinical testing (Kimmelman et al., 2014). In the best-case scenario, confirmatory animal studies are performed in two or more laboratories, resembling a multi-site clinical trial design. Hence, multi-site confirmatory preclinical studies are a new module in the drug development chain and have great potential to improve the translation of preclinical animal research to humans (Figure 2).

      Exploratory animal study Type I In this type of study, scientifically and often methodologically novel ideas are developed, and experimental parameters must be modified again and again in the course of the study until the new method ultimately measures what is intended. Exploratory Type I studies are often driven by a loosely articulated working hypothesis or a hypothesis that evolves over the course of sequential experiments (see also HARKing). The sequence of individual experiments in exploratory studies and the details of their design (including sample size, since effect sizes are unknown) are typically not established at the beginning of the investigation. Exploratory animal studies Type I aim primarily at developing pathophysiological theories (Kimmelman et al., 2014; Figure 1).

      Exploratory study Type II In this type of study, a method already established in a laboratory is used to carry out new measurements and thereby break new scientific ground. In this case, a working hypothesis is clearly stated, a power calculation is made because effect sizes are usually known from previous experiments, pre-registration is recommended, and the ARRIVE 2.0 and FAIR guidelines should be followed (Figure 1).

      FAIR In 2016, the FAIR Guiding Principles for scientific data were published (Wilkinson et al., 2016). FAIR improves the Findability, Accessibility, Interoperability, and Reuse of digital datasets (www.go-fair.org). Initially FAIR focused on datasets from human research and trials; very recently the guidelines were also adapted to preclinical research (Briggs et al., 2021).

      HARKing Hypothesizing After the Results are Known.

      Method hopping Method hopping refers to the race for the newest technology and has the potential to produce unreliable measurements.

      P-hacking The selective reporting of significant data. Data analysis is performed in such a way as to find patterns in data that can be presented as statistically significant, thus dramatically increasing the risk of false positives.

      3R Russell and Burch introduced the 3R principles more than 60 years ago (Tannenbaum and Bennett, 2015). The 3Rs (Replacement, Reduction, and Refinement) have become the guiding principles for the ethical use of animals in research (www.nc3rs.org.uk).

      6R Although the 3Rs provide the ethical principle of animal research, animal welfare alone does not suffice to make animal research ethical if the research does not have sufficient scientific value. Therefore, Strech and Dirnagl (2019) introduced Robustness, Registration and Reporting, in addition to the 3Rs, all of which aim to safeguard and increase the scientific value and reproducibility of animal research.

      Multi-site preclinical confirmatory trials aim to rigorously test a drug’s clinical potential and to restrict the advance of ineffective interventions into clinical testing (Kimmelman et al., 2014). As noted, the phenomenon of tolerance development is quite common but rarely assessed (Bespalov et al., 2016). It is therefore essential to include chronic intermittent treatment schedules in the study design, similar to the drug application in clinical trials, in order to study the occurrence of tolerance. In addition to studying tolerance, the pronounced placebo effect usually seen in clinical psychiatric trials must also be considered in multi-site preclinical confirmatory trials. Since a placebo effect does not occur in such animal studies, one has to correct for it; otherwise the effect size will be overestimated. In less severely affected study populations in particular, a medium effect size for placebo is often reported (Scherrer et al., 2021). Therefore, if the effect size in a multi-site preclinical confirmatory trial for drug testing is not at least large (d ≥ 0.8), it is unlikely that a later clinical trial building on those preclinical findings will yield significant results.
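The sample-size consequence of demanding a large effect can be made concrete with a standard a priori power calculation (a minimal normal-approximation sketch; the d ≥ 0.8 threshold is from the text, while alpha = 0.05 and power = 0.80 are conventional defaults, not values prescribed by the article):

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sample, two-sided test
    detecting a standardized effect d, using the normal approximation
    n = 2 * ((z_{1-alpha/2} + z_power) / d) ** 2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

# A large effect (d = 0.8) is detectable with far fewer animals per
# group than a medium one (d = 0.5):
print(n_per_group(0.8), n_per_group(0.5))  # 25 63
```

The comparison illustrates why the large-effect threshold matters in practice: detecting a merely medium effect would require roughly two and a half times as many animals per group.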

      Multi-site preclinical confirmatory trials are cost-intensive and require a well-harmonized collaborative effort. Moreover, only very limited funding schemes would support such an approach. An alternative is to study a large animal cohort by splitting an experiment into several “mini-experiments” spread over different time points a few weeks apart. Indeed, compared with a conventionally standardized design in which all animals are tested at one specific time point, such a “mini-experiment” design improved the reproducibility and accurate detection of exemplary treatment effects (von Kortzfleisch et al., 2020).
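A balanced allocation for such a mini-experiment design can be sketched as follows (a hypothetical illustration, not the procedure of von Kortzfleisch et al.; the batch count, cohort size, and treatment labels are invented):

```python
import random

def mini_experiment_batches(animal_ids, treatments, n_batches, seed=42):
    """Split a cohort into n_batches mini-experiments run weeks apart,
    assigning treatments within each batch so every arm is represented."""
    rng = random.Random(seed)
    ids = list(animal_ids)
    rng.shuffle(ids)
    # Deal animals round-robin into batches of (near-)equal size.
    batches = [ids[i::n_batches] for i in range(n_batches)]
    plan = []
    for batch_no, batch in enumerate(batches):
        # Build a balanced block of treatment arms and shuffle its order.
        arms = (treatments * (len(batch) // len(treatments) + 1))[:len(batch)]
        rng.shuffle(arms)
        plan.extend((animal, arm, batch_no) for animal, arm in zip(batch, arms))
    return plan

# Hypothetical example: 24 animals, two arms, four mini-experiments.
plan = mini_experiment_batches(range(24), ["vehicle", "drug"], n_batches=4)
```

Each of the four batches then contains three “vehicle” and three “drug” animals, so the treatment contrast is estimated within every time point rather than confounded with a single testing occasion.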

      Publishing Negative Results to Counteract the Publication Bias for Positive Results

      A large proportion of animal experiments yield negative results that do not support the working hypothesis. These negative results are usually not published, resulting in a publication bias. However, publishing negative results is easier said than done, as it is difficult to convince reviewers of the validity of negative results. In the peer-review process, reviewers often require more control experiments and converging evidence to support the validity of negative findings than they do for positive findings. Moreover, scientific journals have little interest in publishing negative findings (Bespalov et al., 2019). In response, Springer founded the Journal of Negative Results in BioMedicine back in 2002. The journal was not able to assert itself in the “market,” did not receive an impact factor, and was discontinued in 2017. The problem has been recognized, as major journals such as Nature trumpet “Highlight negative results to improve science” (Mehta, 2019), but in fact there is little movement in the publication landscape. Instead of HARKing and p-hacking and thus turning a negative result into a false positive, negative results should be published as they are.

      Open Science for Scientific Data and Publications

      There is now a strong open science movement to publish results, whether positive or negative, as open access, and scientific data should be handled according to the FAIR Guiding Principles, which act within the open science framework. Although the open science movement enjoys strong advocacy from many stakeholders, some criticism is also of note. Every transformation process, such as the open science movement, is complex and requires time and financial resources. The pendulum often swings too far during a transformation process, and good intentions are quickly misguided. This process is best illustrated by the implementation of Open Access publications. Free access to the scientific literature and the consequent accessibility of primary data and metadata is certainly a noble goal. As a result, everyone has access to the scientific literature, and the journal crisis, the annual increase in prices for scientific journals against stagnant or even declining budgets, could be eliminated. There is a cost to that goal, but who foots the bill? In Germany, for example, an alliance of various scientific organizations initiated the DEAL project a few years ago. New contract models were negotiated nationwide with the three major scientific publishers: Elsevier, Springer Nature, and Wiley. DEAL was completed in 2020, and one would think that open access publishing in Germany would be a foregone conclusion. Not even close. At almost all universities, it is largely unclear who is required to bear the costs, which are increasingly being passed on to the individual working groups. Incidentally, prices are rising at the same rate, and the publishers are earning even more than before. At the beginning of 2022, Nature Neuroscience announced its Article Processing Charge (APC) for gold open access: US$11,390. Publishing scientific findings is and will remain big business, and whether the Open Access movement will counteract this or even promote further economization can only be assessed over the next 10 years.
Basically, scientific publishing is a perverted world anyway: imagine an author painstakingly writing a novel, several editors proofreading it free of charge, and the publisher then charging the author a high fee for publication so that the book becomes freely accessible. Unthinkable in the free economy, everyday business in science.

      Summary of Points for Recommendation to Improve Reproducibility and Translation

      In Box 2, ten points of recommendation are summarized that will help to improve the reproducibility and translation of animal experimentation. Most of these recommendations are also part of the new 6R framework, which enlarges the scope of the widely established 3R framework for the ethical use of animals in research by three additional guiding principles, Robustness, Registration and Reporting, all of which aim to safeguard and increase the scientific value of animal research (Strech and Dirnagl, 2019).

      Box 2. Ten points of recommendation to improve reproducibility and translation

      Prior to planning an actual study, a systematic review or potential preclinical meta-analysis should be considered.

      An a priori power calculation should be carried out.

      The experimental study protocol should be pre-registered.

      The execution of the study should be in accordance with the most recent ARRIVE guidelines. In addition to conventional statistics, Bayes factors and effect size estimation should be considered.

      When planning the study, the generalizability of the data to be collected should also be considered (e.g., sex or age differences).

      “Method-hopping” should be avoided, meaning that it is not necessary to use the most advanced technology but rather to have the applied methodology under control.

      National or international networks should be considered to obtain convergent evidence or to carry out a multicenter preclinical confirmatory study. The latter, when aimed at drug testing, should also consider tolerance development and correction for placebo effects.

      Animal models that capture DSM-5 or ICD-11 criteria should be considered in the context of research on psychiatric disorders.

      Raw data underlying publications should be made publicly available, and digital datasets should be handled in accordance with the FAIR Guiding Principles for scientific data management.

      Negative findings should be published to counteract publication bias.

      Conclusion

      Science has long been regarded as “self-correcting,” given that it is founded on the replication of earlier work. However, in recent decades the checks and balances that once ensured scientific fidelity have been challenged across the entire biomedical field. This has compromised the ability of today’s researchers to reproduce others’ findings (Collins and Tabak, 2014) and has also resulted in many translation failures. This perspective article outlines a list of reasons responsible, to a large extent, for the lack of reproducibility in animal studies and the difficulties in translation from animals to humans, and it provides recommendations to improve reproducibility and translation. However, even if the 10 points of recommendation listed in Box 2 are implemented, a translation error can still occur. The decisive question then is: why did the translation error occur? Here, a prominent example of a translation error, the development of corticotropin-releasing factor (CRF) receptor 1 antagonists for alcohol addiction and other stress-related psychiatric conditions, is provided.

      Convincing evidence from animal studies demonstrated up-regulated CRF and CRF-R1 expression within the amygdala that underlies the increased behavioral sensitivity to stress following the development of alcohol dependence (Heilig and Koob, 2007). Animal studies also suggested that CRF activity plays a major role in mediating relapse provoked by stressors, as well as the escalation of drinking in alcohol-dependent rats (Heilig and Koob, 2007). These findings led to the expectation that brain-penetrant CRF-R1 antagonists would block stress-induced craving and relapse in people with alcohol addiction. Big Pharma invested hundreds of millions in drug development programs for CRF-R1 antagonists and subsequent clinical trials. However, clinical trials in alcohol-dependent patients have not supported these expectations (Kwako et al., 2015; Schwandt et al., 2016). Other clinical trials with CRF-R1 antagonists for stress-related psychiatric conditions also failed, including negative trials for depression (Binneman et al., 2008), anxiety (Coric et al., 2010), and post-traumatic stress disorder (Dunlop et al., 2017). This translation failure, however, was the result of a publication bias for positive results and the disregard of earlier animal studies indicating that CRF-R1 antagonists would not work in clinical trials. Already in 2002 it was shown that CRF-R1 knockout mice show an escalation in alcohol drinking (Sillaber et al., 2002), and later that CRF-R1 antagonists do not reduce drinking in a relapse model (Molander et al., 2012). Moreover, CRF-R1 can produce bidirectional effects on motivational behavior, depending on the neuronal population in which this receptor is expressed (Refojo et al., 2011). These studies with negative or even opposing results explain the observed null effects of the antagonists on stress-induced disorders, and one wonders why those results were not considered at the time the cost-intensive clinical trials were initiated.

      One can learn from translation errors. For this purpose, however, a critical dialogue between basic researchers, preclinical scientists, and clinicians is of essential importance. There is only a crisis if we cannot find an explanation for a lack of replication and/or a translation failure.

      Data Availability Statement

      The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

      Author Contributions

      The author confirms being the sole contributor of this work and has approved it for publication.

      Conflict of Interest

      The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

      Publisher’s Note

      All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

      Funding

      This work was supported by the Ministry of Education and Research Baden-Württemberg for the 3R-Center Rhein-Neckar (www.3r-rn.de), the Bundesministerium für Bildung und Forschung (BMBF) funded SysMedSUDs consortium (01ZX1909A), AhEAD consortium (01KC2004A), Target-Oxy consortium (031L0190A), the ERA-NET consortium Psi-Alc (01EW1908; www.psialc.org), the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), TRR 265 consortium (A05; Heinz et al., 2020), SFB1158 (B04; www.sfb1158.de), and GRK2350 (www.grk2350.de).

      Acknowledgments

      I thank Rick Bernardi for English editing and Marcus Meinhardt and Wolfgang Sommer for critical comments on the manuscript.

      References

Allen C., Kuhn B. N., Cannella N., Crow A. D., Roberts A. T., Lunerti V. (2021). Network-based discovery of opioid use vulnerability in rats using the Bayesian stochastic block model. Front. Psychiatry 12:745468. doi: 10.3389/fpsyt.2021.745468

Baker M. (2016). 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454. doi: 10.1038/533452a

Blewitt M., Whitelaw E. (2013). The use of mouse models to study epigenetics. Cold Spring Harb. Perspect. Biol. 5:a017939. doi: 10.1101/cshperspect.a017939

Bespalov A., Steckler T., Altevogt B., Koustova E., Skolnick P., Deaver D. (2016). Failed trials for central nervous system disorders do not necessarily invalidate preclinical models and drug targets. Nat. Rev. Drug Discov. 15:516. doi: 10.1038/nrd.2016.88

Bespalov A., Steckler T., Skolnick P. (2019). Be positive about negatives-recommendations for the publication of negative (or null) results. Eur. Neuropsychopharmacol. 29, 1312–1320. doi: 10.1016/j.euroneuro.2019.10.007

Binneman B., Feltner D., Kolluri S., Shi Y., Qiu R., Stiger T. (2008). A 6-week randomized, placebo-controlled trial of CP-316,311 (a selective CRH1 antagonist) in the treatment of major depression. Am. J. Psychiatry 165, 617–620. doi: 10.1176/appi.ajp.2008.07071199

Briggs K., Bosc N., Camara T., Diaz C., Drew P., Drewe W. C. (2021). Guidelines for FAIR sharing of preclinical safety and off-target pharmacology data. ALTEX 38, 187–197. doi: 10.14573/altex.2011181

Button K. S., Ioannidis J. P., Mokrysz C., Nosek B. A., Flint J., Robinson E. S. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376. doi: 10.1038/nrn3475

Chu C., Murdock M. H., Jing D., Won T. H., Chung H., Kressel A. M. (2019). The microbiota regulate neuronal function and fear extinction learning. Nature 574, 543–548. doi: 10.1038/s41586-019-1644-y

Clayton J. A., Collins F. S. (2014). Policy: NIH to balance sex in cell and animal studies. Nature 509, 282–283. doi: 10.1038/509282a

Collins F. S., Tabak L. A. (2014). Policy: NIH plans to enhance reproducibility. Nature 505, 612–613. doi: 10.1038/505612a

Coric V., Feldman H. H., Oren D. A., Shekhar A., Pultz J., Dockens R. C. (2010). Multicenter, randomized, double-blind, active comparator and placebo-controlled trial of a corticotropin-releasing factor receptor-1 antagonist in generalized anxiety disorder. Depress. Anxiety 27, 417–425. doi: 10.1002/da.20695

Crabbe J. C., Wahlsten D., Dudek B. C. (1999). Genetics of mouse behavior: interactions with laboratory environment. Science 284, 1670–1672. doi: 10.1126/science.284.5420.1670

De Santis S., Bach P., Pérez-Cervera L., Cosa-Linan A., Weil G., Vollstädt-Klein S. (2019). Microstructural white matter alterations in men with alcohol use disorder and rats with excessive alcohol consumption during early abstinence. JAMA Psychiatry 76, 749–758. doi: 10.1001/jamapsychiatry.2019.0318

Deroche-Gamonet V., Piazza P. V. (2014). Psychobiology of cocaine addiction: contribution of a multi-symptomatic animal model of loss of control. Neuropharmacology 76(Pt B), 437–449. doi: 10.1016/j.neuropharm.2013.07.014

Dunlop B. W., Binder E. B., Iosifescu D., Mathew S. J., Neylan T. C., Pape J. C. (2017). Corticotropin-releasing factor receptor 1 antagonism is ineffective for women with posttraumatic stress disorder. Biol. Psychiatry 82, 866–874. doi: 10.1016/j.biopsych.2017.06.024

Egenrieder L., Mitricheva E., Spanagel R., Noori H. R. (2020). No basal or drug-induced sex differences in striatal dopaminergic levels: a cluster and meta-analysis of rat microdialysis studies. J. Neurochem. 152, 482–492. doi: 10.1111/jnc.14911

Ellenbroek B., Youn J. (2016). Rodent models in neuroscience research: is it a rat race? Dis. Model Mech. 9, 1079–1087. doi: 10.1242/dmm.026120

Fanelli D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One 4:e5738. doi: 10.1371/journal.pone.0005738

Fanelli D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics 90, 891–904.

Glass G. V. (1976). Primary, secondary and meta-analysis of research. Educ. Res. 10, 3–8.

Gurevitch J., Koricheva J., Nakagawa S., Stewart G. (2018). Meta-analysis and the science of research synthesis. Nature 555, 175–182. doi: 10.1038/nature25753

Halsey L. G. (2019). The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum? Biol. Lett. 15:20190174. doi: 10.1098/rsbl.2019.0174

Halsey L. G., Curran-Everett D., Vowler S. L., Drummond G. B. (2015). The fickle P value generates irreproducible results. Nat. Methods 12, 179–185. doi: 10.1038/nmeth.3288

Hansen C., Spuhler K. (1984). Development of the National Institutes of Health genetically heterogeneous rat stock. Alcohol. Clin. Exp. Res. 8, 477–479. doi: 10.1111/j.1530-0277.1984.tb05706.x

Head M. L., Holman L., Lanfear R., Kahn A. T., Jennions M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biol. 13:e1002106. doi: 10.1371/journal.pbio.1002106

Heilig M., Koob G. F. (2007). A key role for corticotropin-releasing factor in alcohol dependence. Trends Neurosci. 30, 399–406. doi: 10.1016/j.tins.2007.06.006

Heinz A., Kiefer F., Smolka M. N., Endrass T., Beste C., Beck A. (2020). Addiction research consortium: losing and regaining control over drug intake (ReCoDe)-From trajectories to mechanisms and interventions. Addict. Biol. 25:e12866. doi: 10.1111/adb.12866

Herrenstein R. J. (1962). Placebo effect in the rat. Science 138, 677–678. doi: 10.1126/science.138.3541.677

Kalinichenko L. S., Kornhuber J., Müller C. P. (2019). Individual differences in inflammatory and oxidative mechanisms of stress-related mood disorders. Front. Neuroendocrinol. 55:100783. doi: 10.1016/j.yfrne.2019.100783

Keller A., Akintola T., Colloca L. (2018). Placebo analgesia in rodents: current and future research. Int. Rev. Neurobiol. 138, 1–15.

Kerr N. L. (1998). HARKing: hypothesizing after the results are known. Pers. Soc. Psychol. Rev. 2, 196–217. doi: 10.1207/s15327957pspr0203_4

Kilkenny C., Parsons N., Kadyszewski E., Festing M. F., Cuthill I. C., Fry D. (2009). Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS One 4:e7824. doi: 10.1371/journal.pone.0007824

Kimmelman J., Mogil J. S., Dirnagl U. (2014). Distinguishing between exploratory and confirmatory preclinical research will improve translation. PLoS Biol. 12:e1001863. doi: 10.1371/journal.pbio.1001863

Kwako L. E., Spagnolo P. A., Schwandt M. L., Thorsell A., George D. T., Momenan R. (2015). The corticotropin releasing hormone-1 (CRH1) receptor antagonist pexacerfont in alcohol dependence: a randomized controlled experimental medicine study. Neuropsychopharmacology 40, 1053–1063. doi: 10.1038/npp.2014.306

Liu Z., Cai Y., Liao Z., Xu Y., Wang Y., Wang Z. (2019). Cloning of a gene-edited macaque monkey by somatic cell nuclear transfer. Natl. Sci. Rev. 6, 101–108. doi: 10.1093/nsr/nwz003

Mehta D. (2019). Highlight negative results to improve science. Nature. doi: 10.1038/d41586-019-02960-3 [Online ahead of print]

Michel M. C., Murphy T. J., Motulsky H. J. (2020). New author guidelines for displaying data and reporting data analysis and statistical methods in experimental biology. J. Pharmacol. Exp. Ther. 372, 136–147. doi: 10.1124/jpet.119.264143

Molander A., Vengeliene V., Heilig M., Wurst W., Deussing J. M., Spanagel R. (2012). Brain-specific inactivation of the Crhr1 gene inhibits post-dependent and stress-induced alcohol intake, but does not affect relapse-like drinking. Neuropsychopharmacology 37, 1047–1056. doi: 10.1038/npp.2011.297

Müller C. P. (2020). Everything you always wanted to know about sex and dopamine, but were afraid to ask: an Editorial for ‘No basal or drug-induced sex differences in striatal dopaminergic levels: a cluster and meta-analysis of rat microdialysis studies’ on page 482. J. Neurochem. 152, 422–424. doi: 10.1111/jnc.14916

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science 349:aac4716. doi: 10.1126/science.aac4716

O’Rourke K. (2007). An historical perspective on meta-analysis: dealing quantitatively with varying study results. J. R. Soc. Med. 100, 579–582. doi: 10.1177/0141076807100012020

Pellis S. M., Pasztor T. J. (1999). The developmental onset of a rudimentary form of play fighting in C57 mice. Dev. Psychobiol. 34, 175–182.

Percie du Sert N., Hurst V., Ahluwalia A., Alam S., Avey M. T., Baker M. (2020). The ARRIVE guidelines 2.0: updated guidelines for reporting animal research. PLoS Biol. 18:e3000410. doi: 10.1371/journal.pbio.3000410

Perrin S. (2014). Preclinical research: make mouse studies work. Nature 507, 423–425. doi: 10.1038/507423a

Petetta F., Ciccocioppo R. (2021). Public perception of laboratory animal testing: historical, philosophical, and ethical view. Addict. Biol. 26:e12991. doi: 10.1111/adb.12991

Popper K. (1935). Logik der Forschung. Zur Erkenntnistheorie der modernen Naturwissenschaft. Vienna: Springer.

Pulverer B. (2020). Registered animal studies and null data. EMBO Rep. 21:e49868. doi: 10.15252/embr.201949868

Refojo D., Schweizer M., Kuehne C., Ehrenberg S., Thoeringer C., Vogl A. M. (2011). Glutamatergic and dopaminergic neurons mediate anxiogenic and anxiolytic effects of CRHR1. Science 333, 1903–1907. doi: 10.1126/science.1202107

Scherrer B., Guiraud J., Addolorato G., Aubin H. J., de Bejczy A., Benyamina A. (2021). Baseline severity and the prediction of placebo response in clinical trials for alcohol dependence: a meta-regression analysis to develop an enrichment strategy.
Alcohol. Clin. Exp. Res. 45 17221734. 10.1111/acer.14670 34418121 Schwandt M. L. Cortes C. R. Kwako L. E. George D. T. Momenan R. Sinha R. (2016). The CRF1 antagonist verucerfont in anxious alcohol-dependent women: translation of neuroendocrine, but not of anti-craving effects. Neuropsychopharmacology 41 28182829. 10.1038/npp.2016.61 27109623 Serra-Garcia M. Gneezy U. (2021). Nonreplicable publications are cited more than replicable ones. Sci. Adv. 7:eabd1705. 10.1126/sciadv.abd1705 34020944 Sillaber I. Rammes G. Zimmermann S. Mahal B. Zieglgänsberger W. Wurst W. (2002). Enhanced and delayed stress-induced alcohol drinking in mice lacking functional CRH1 receptors. Science 296 931933. 10.1126/science.1069836 11988580 Solberg Woods L. C. Palmer A. A. (2019). Using heterogeneous stocks for fine-mapping genetically complex traits. Methods Mol. Biol. 2018 233247. 10.1007/978-1-4939-9581-3_11 Spanagel R. (2017). Animal models of addiction. Dialogues Clin. Neurosci. 19 247258. 10.31887/DCNS.2017.19.3 Stacey D. Bilbao A. Maroteaux M. Jia T. Easton A. C. Longueville S. (2012). RASGRF2 regulates alcohol-induced reinforcement by influencing mesolimbic dopamine neuron activity and dopamine release. Proc. Natl. Acad. Sci. U.S.A. 109 2112821133. 10.1073/pnas.1211844110 23223532 Strech D. Dirnagl U. (2019). 3Rs missing: animal research without scientific value is unethical. BMJ Open Sci. 3:e000035. 10.1136/bmjos-2018-000048 35047678 Tannenbaum J. Bennett T. (2015). Russell and Burch’s 3Rs then and now: the need for clarity in definition and purpose. J. Am. Assoc. Lab Anim. Sci. 54 120132. van de Schoot R. Depaoli S. King R. Kramer B. Märtens K. Mahlet G. (2021). Bayesian statistics and modelling. Nat. Rev. Methods Primers 1:16. 10.1038/s43586-020-00001-2 Vengeliene V. Bilbao A. Spanagel R. (2014). The alcohol deprivation effect model for studying relapse behavior: a comparison between rats and mice. Alcohol 48 313320. 10.1016/j.alcohol.2014.03.002 24811155 von Kortzfleisch V. T. 
Karp N. A. Palme R. Kaiser S. Sachser N. Richter S. H. (2020). Improving reproducibility in animal research by splitting the study population into several ‘mini-experiments’. Sci. Rep. 10:16579. 10.1038/s41598-020-73503-4 33024165 Vuong H. E. Yano J. M. Fung T. C. Hsiao E. Y. (2017). The microbiome and host behavior. Annu. Rev. Neurosci. 40 2149. 10.1146/annurev-neuro-072116-031347 28301775 Wilkinson M. D. Dumontier M. Aalbersberg I. J. Appleton G. Axton M. Baak A. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018. 10.1038/sdata.2016.18 26978244 Yartsev M. M. (2017). The emperor’s new wardrobe: rebalancing diversity of animal models in neuroscience research. Science 358 466469. 10.1126/science.aan8865 29074765

www.arriveguidelines.org

www.paasp.net

www.psialc.org
