The Scientific Method: An Extended Summary

Extended summary of philosophy of science concepts situated within the scientific discoveries and disciplinary developments that motivated them.

This document expands on the core philosophical concepts in the philosophy of science by situating each within the scientific discoveries, experiments, and disciplinary developments that motivated them. Where the original summary presents the conceptual architecture, this companion document traces the empirical roots.


1. Ancient & Early Modern Foundations

Aristotle's Organon: Biology as the Template for Scientific Knowledge

Aristotle's conception of scientific knowledge as "knowledge of causes" emerged directly from his biological investigations. His Historia Animalium and De Partibus Animalium represent perhaps the first systematic empirical research programme in natural history. Aristotle described some 500 animal species, dissecting dozens of them, and documented their anatomy, reproduction, and behaviour. His four-cause framework (material, formal, efficient, final) was not abstract metaphysics but a schema that made sense of biological organisation: understanding an eye requires knowing what it is made of (material), its structure (formal), how it develops (efficient), and what it does (final/functional).

The dual method of induction followed by deduction reflects observational practice. Aristotle first accumulated particulars—the internal structure of cephalopods, the gestation periods of viviparous fish, the brooding habits of birds—before abstracting general principles and then deriving further consequences. His observation that cetaceans are viviparous and possess lungs, leading him to set them apart from fish and group them with the other live-bearing blooded animals, exemplifies induction from anatomical particulars toward taxonomic generalisation.

Ibn al-Haytham's Optics: The Camera Obscura and Controlled Experimentation

Ibn al-Haytham's insistence on reproducible experimentation arose from his systematic study of light. His Kitāb al-Manāẓir (Book of Optics) established the intromission theory of vision against the Platonic-Euclidean extramission theory through experimental demonstration. Using the camera obscura, he showed that light travels in straight lines by observing that an image inverts when passing through a small aperture—a result incompatible with visual rays emanating from the eye.

His experiments on reflection and refraction were quantitative and reproducible. He measured angles of incidence and reflection, demonstrated the law of reflection experimentally, and studied refraction at interfaces. Crucially, he articulated that results must be independently verifiable: future investigators should be able to replicate the apparatus and confirm the findings. This methodological commitment—born from optical experimentation—became foundational to experimental science.

Francis Bacon's Novum Organum: Eliminating False Causes in Natural Philosophy

Bacon's eliminative induction was a reaction against scholastic natural philosophy, which he viewed as sterile verbal disputation disconnected from nature. His method of systematic tabulation emerged from reflection on how to establish genuine causal connections amid confounding factors.

The famous example is heat. Bacon constructed tables of presence (instances where heat is present), absence (similar instances where heat is absent), and degrees (instances varying in intensity). By comparing fire, the sun, fermenting dung, and friction-heated metals, he sought what is common to all and absent when heat is absent. The method systematically excludes candidate causes: if a property is absent when the effect is present, it cannot be the cause. This was motivated by the practical goal of understanding and controlling natural processes—Bacon's programme was explicitly oriented toward "the relief of man's estate" through technological mastery.

Descartes' Method: Analytic Geometry and Mechanical Philosophy

Descartes' methodological scepticism and deductive ideal emerged from his work in mathematics and physics. His development of analytic geometry—the identification of geometric curves with algebraic equations—demonstrated how complex problems could be decomposed into simpler elements and reconstructed through clear steps. The method of doubt was the epistemological analogue: strip away all uncertain beliefs to find an indubitable foundation, then rebuild knowledge deductively.

His mechanical philosophy sought to explain all natural phenomena through matter in motion, eliminating Aristotelian forms and final causes. Descartes' vortex theory of planetary motion, his corpuscular theory of light, and his account of animal physiology as hydraulic machinery all exemplified the programme: reduce qualitative diversity to quantitative mechanism. Though much of this physics was later superseded, the methodological ideal—clear and distinct ideas, mathematical formulation, mechanical explanation—shaped the aspirations of the Scientific Revolution.

Newton's Principia: From Phenomena to Mathematical Laws

Newton's hypotheses non fingo ("I feign no hypotheses") was a methodological stance born from the unprecedented success of mathematical physics. The Principia demonstrated that Kepler's laws of planetary motion, Galileo's laws of falling bodies, the tides, and cometary orbits all follow from three laws of motion plus universal gravitation. Newton showed this through rigorous geometrical proof.

The method was to infer force laws from observed motions (the inverse-square law from Kepler's third law), then generalise by induction to universal gravitation, and finally deduce consequences for phenomena not yet observed. When pressed on the cause of gravitational attraction, Newton refused to speculate: the mathematical law sufficed for prediction; metaphysical hypotheses about underlying mechanisms were unnecessary and potentially misleading. This stance—empirical adequacy over speculative mechanism—marked a decisive shift in what counts as scientific explanation.


2. The Hypothetico-Deductive Model

Origin in Mathematical Physics

The H-D model became explicit as philosophers reflected on mature physics. The structure—hypothesise a law, derive observational consequences, test against experiment—described what physicists from Newton onward actually did. Maxwell's derivation of the speed of electromagnetic waves from his equations, and the subsequent experimental confirmation by Hertz, exemplifies the pattern: a theoretical hypothesis yields a precise quantitative prediction, which observation then confirms.
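
The prediction at issue can be stated in one line: the vacuum Maxwell equations imply waves propagating at

```latex
c = \frac{1}{\sqrt{\mu_0 \varepsilon_0}} \approx 3.0 \times 10^{8}\ \text{m/s}
```

a value that matched the independently measured speed of light, identifying light as an electromagnetic wave.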

The Quine-Duhem Problem: Auxiliary Hypotheses in Practice

Through detailed study of the history of physics, Pierre Duhem, a physicist and historian of science, articulated the problem that hypotheses cannot be tested in isolation. His example was optics. When Foucault's experiment showed that light travels slower in water than in air, this seemed to refute the corpuscular theory (which predicted faster propagation in denser media). But Duhem noted that the prediction depended not on the corpuscular hypothesis alone but on auxiliary assumptions about how corpuscles interact with matter. A sufficiently inventive theorist might modify auxiliaries rather than abandon the core theory.

The Michelson-Morley experiment (1887) provides another illustration. The null result for ether drift could be (and was) accommodated by hypothesising length contraction (Lorentz-FitzGerald) rather than abandoning the ether. Only in retrospect, after Einstein's special relativity provided a comprehensive alternative, did the experiment become "crucial." Duhem's point was that no single experiment forces a unique theoretical response.

Underdetermination: Historical Examples

The wave-particle debate in optics exhibited underdetermination for over a century. Both theories could accommodate known phenomena (reflection, refraction) with appropriate auxiliary hypotheses. Even interference and diffraction, which strongly favoured wave theory, did not strictly exclude sophisticated corpuscular accounts. The history of mechanical ether models shows multiple incompatible theories making identical predictions for observable phenomena—a proliferation that eventually motivated the shift toward instrumentalist or structural interpretations of theoretical claims.


3. Popper's Falsificationism

The Demarcation Problem: Einstein versus Freud

Popper developed falsificationism in 1920s Vienna, where Marxism, psychoanalysis, and individual psychology (Adler) were intellectually prominent. His central observation was that these theories could accommodate any evidence: a Freudian could explain both altruistic and selfish behaviour through the same theoretical apparatus by adjusting which defence mechanisms or drives were operative. The theories were "confirmed" by everything and hence, Popper argued, confirmed by nothing.

Einstein's general relativity provided the contrast. The 1919 Eddington expedition measured stellar positions during a solar eclipse to test the prediction that light bends in a gravitational field. The theory predicted a specific deflection angle (1.75 arcseconds for grazing rays) distinct from both Newtonian gravity (0.87 arcseconds) and no deflection. A negative result would have falsified the theory. This asymmetry—psychoanalysis is compatible with all observations, general relativity is incompatible with many—grounded the demarcation criterion.
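
Concretely, the competing predictions follow from standard calculations: for a ray grazing the solar limb,

```latex
\delta\theta_{\mathrm{GR}} = \frac{4GM_\odot}{c^2 R_\odot} \approx 1.75'' ,
\qquad
\delta\theta_{\mathrm{Newt}} = \frac{2GM_\odot}{c^2 R_\odot} \approx 0.87'' .
```

The factor of two is the contribution of spatial curvature beyond the equivalence-principle (Newtonian corpuscle) deflection, which is what made the test discriminating.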

Corroboration in Practice

For Popper, the repeated confirmation of Mercury's anomalous perihelion precession (43 arcseconds per century beyond Newtonian predictions), which general relativity explained precisely, did not "verify" the theory but "corroborated" it. The theory had survived a severe test—a quantitative prediction with no adjustable parameters. Similarly, the discovery of gravitational lensing, gravitational redshift, and (later) gravitational waves corroborated without verifying: each new test was an opportunity for falsification that the theory passed.
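
The prediction is a closed-form consequence of the field equations: the relativistic perihelion advance per orbit is

```latex
\Delta\phi = \frac{6\pi G M_\odot}{a(1 - e^2)c^2}
\approx 5.0 \times 10^{-7}\ \text{rad per orbit}
\approx 43''\ \text{per century}
```

for Mercury's semi-major axis a and eccentricity e, which is why the test had no adjustable parameters.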

Lakatos's Critique: The Practice of Theory Protection

Imre Lakatos noted that scientists routinely protect theories from falsification through auxiliary hypothesis adjustment—just as Duhem observed. His point was not that this is illegitimate but that naive falsificationism misrepresents practice. Newton's gravitational theory faced anomalies (Mercury's perihelion, the moon's motion) that were not taken as refutations but as puzzles to be solved by better calculations or additional bodies. The methodological question is not whether a single anomaly should kill a theory but how to evaluate programmes over time.


4. Kuhn's Paradigm Theory

Historical Case Studies

Thomas Kuhn was a physicist turned historian of science. His Structure of Scientific Revolutions (1962) drew on detailed historical analysis, particularly the Copernican revolution and the development of chemistry.

The transition from Ptolemaic to Copernican astronomy exemplifies paradigm shift. For over a millennium, geocentric astronomy successfully predicted planetary positions to observational accuracy. Anomalies (retrograde motion, variations in brightness) were accommodated through epicycles and equants. The Copernican system initially offered no predictive advantage—Copernicus still used circular orbits and required epicycles. The shift involved a gestalt change: what had been treated as explananda (planetary stations and retrogradations) became consequences of heliocentric geometry rather than ad hoc additions.

The chemical revolution provides another example. Phlogiston theory explained combustion, calcination, and respiration within a coherent framework: phlogiston was released in burning, absorbed in reduction. Lavoisier's oxygen theory required reconceptualising these processes—but more fundamentally, it changed what questions were asked. Weight relations, which phlogiston theory struggled with (phlogiston sometimes had to have negative weight), became central to the new paradigm.

Incommensurability in Practice

Kuhn's incommensurability thesis was grounded in semantic analysis of theoretical terms. "Mass" in Newtonian mechanics and "mass" in special relativity are not synonymous: the former is an intrinsic property conserved absolutely, the latter is frame-dependent and interconvertible with energy. "Planet" meant something different before and after Copernicus (Earth was not classified as a planet in Ptolemaic astronomy).

Perceptual incommensurability draws on gestalt psychology: scientists trained in different paradigms may literally see different things when viewing the same apparatus. The air-pump controversy analysed by Shapin and Schaffer illustrates the point: Boyle and his followers saw a vacuum being produced; critics such as Hobbes saw the pump's imperfect operation as demonstrating the impossibility of creating a void.


5. Lakatos's Research Programmes

The Newtonian Programme

Lakatos illustrated his framework using Newtonian mechanics. The hard core—the three laws of motion and universal gravitation—was held irrefutable by methodological decision. When observations conflicted, the protective belt of auxiliary hypotheses was modified: the assumption of isolated two-body systems was relaxed, atmospheric refraction was accounted for, additional gravitating bodies were postulated.

The discovery of Neptune (1846) is the canonical example of a progressive problem-shift. Uranus's observed orbit deviated from predictions. Rather than abandoning Newtonian theory, Le Verrier and Adams independently postulated an unseen planet and calculated its position from the gravitational perturbations required to explain the anomaly. Neptune was found within one degree of Le Verrier's prediction. This was not an ad hoc save but a novel prediction: the perturbation hypothesis predicted a previously unknown fact subsequently confirmed.

The Bohr Programme

Bohr's atomic model provides another illustration. The hard core—quantised electron orbits, the frequency condition relating energy transitions to emitted radiation—accommodated the hydrogen spectrum precisely. Anomalies in multi-electron atoms were addressed through auxiliary modifications: elliptical orbits, relativistic corrections, spin. The programme was progressive as long as modifications led to new confirmed predictions (the Stark effect, fine structure). It became degenerating when modifications were purely accommodative without novel empirical content—a state that motivated the transition to quantum mechanics.


6. Feyerabend's Methodological Anarchism

Galileo's Telescope

Feyerabend's central historical example is Galileo's use of the telescope to argue for Copernican astronomy. By strict methodological standards of the time, telescopic observations were problematic: the instrument's theory was not understood, terrestrial and celestial applications might differ, and trained observers reported different things. Aristotelian philosophers were not being irrational to question whether apparent moons of Jupiter were optical artefacts.

Galileo succeeded not by adhering to established method but by propaganda, persuasion, and gradual establishment of telescope reliability through terrestrial applications. He violated the consistency condition (his mechanics was incompatible with the tower argument against Earth's motion, which he answered with the ship analogy—a thought experiment, not an experimental result). He proliferated theories and arguments, some valid, some fallacious, in a rhetorical campaign. Feyerabend's point: strict methodological rules would have prevented one of the most important scientific advances.

The Historical Record

Feyerabend examined multiple episodes: the survival of atomism despite its apparent conflict with phenomenological thermodynamics (a conflict later resolved in atomism's favour by the theory of Brownian motion), the persistence of Aristotelian physics despite anomalies, the rehabilitation of acupuncture and herbal medicine in China. His conclusion was empirical: there is no methodological rule that was not violated in some successful scientific episode, and such violations were often essential to progress. "Anything goes" is not a recommendation but a description of what the historical record shows.


7. Bayesian Confirmation Theory

Historical Roots: Laplace and Inverse Probability

Bayesian reasoning has roots in Laplace's celestial mechanics. Determining the orbit of a newly discovered comet from a few observations required estimating parameters (orbital elements) from data—an inverse problem. Laplace developed methods for combining observations with prior information, treating the probability of hypotheses given evidence. His argument for the stability of the solar system was implicitly Bayesian: given the observed near-circularity and coplanarity of planetary orbits, certain initial conditions are more probable than others.
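
The underlying rule, in modern notation, is Bayes' theorem:

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```

Laplace's "inverse probability" is the posterior P(H | E): the probability of a hypothesis about, say, orbital elements given the observations, obtained by weighting the likelihood of the data by prior information.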

Modern Applications: Particle Physics

The discovery of the Higgs boson (2012) illustrates Bayesian thinking in contemporary practice. The standard model predicted a particle with specific properties; experiments accumulated data on decay channels. The analysis combined multiple channels, each with different background rates and signal expectations. The "5-sigma" threshold for discovery is a frequentist criterion, but the interpretation—that the signal is overwhelmingly more probable under the Higgs hypothesis than under background-only—is naturally Bayesian.
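
As a numerical gloss on the threshold (a minimal sketch, not the collaborations' analysis code), 5 sigma corresponds to a one-sided Gaussian tail probability of roughly 3 × 10⁻⁷:

```python
from scipy.stats import norm

# One-sided tail probability beyond 5 standard deviations
# under a background-only (Gaussian) null.
p = norm.sf(5)
print(p)  # ~2.87e-7, roughly 1 in 3.5 million
```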

Bayes factors allow quantification of relative evidence. In searches for gravitational waves, the LIGO-Virgo collaboration uses Bayesian model comparison to assess whether a signal is more consistent with a compact binary merger than with instrumental noise. Prior distributions for source parameters (masses, spins, distances) are informed by astrophysical population models.
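
A toy Bayes-factor computation makes the logic concrete. The sketch below is illustrative only: the data are synthetic, the "signal" model is a one-parameter Gaussian mean with a unit-normal prior, and the evidence integral is done on a grid rather than with the nested-sampling machinery a real pipeline would use.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, size=20)   # synthetic measurements (hypothetical)
sigma = 1.0                         # known measurement noise

def loglik(mu):
    """Gaussian log-likelihood of the data for mean mu."""
    return (-0.5 * np.sum((y - mu) ** 2) / sigma**2
            - len(y) * np.log(sigma * np.sqrt(2 * np.pi)))

# M0: noise only (mu = 0).  M1: signal present, prior mu ~ N(0, 1).
grid = np.linspace(-5, 5, 2001)
prior = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)
evidence_m1 = np.trapz(np.exp([loglik(m) for m in grid]) * prior, grid)
evidence_m0 = np.exp(loglik(0.0))

print("Bayes factor M1/M0:", evidence_m1 / evidence_m0)
```

A Bayes factor well above 1 favours the signal model; priors on source parameters enter the computation exactly where astrophysical population models enter the LIGO-Virgo analyses.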

Cosmological Parameter Estimation

Modern cosmology is fundamentally Bayesian. The Planck satellite's measurements of the cosmic microwave background are analysed by sampling the posterior distribution of cosmological parameters (Hubble constant, matter density, dark energy equation of state) given the power spectrum data. The tension between Planck-derived and local measurements of H₀ is expressed in terms of posterior distributions with minimal overlap.
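
A minimal sketch of posterior sampling, using a toy linear Hubble-law fit rather than Planck's actual likelihood (all numbers below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
d = np.linspace(20, 400, 30)                 # galaxy distances, Mpc (synthetic)
v = 70.0 * d + rng.normal(0, 500, size=30)   # recession velocities, km/s

def log_post(h0):
    """Log-posterior: flat prior on [40, 100] times Gaussian likelihood."""
    if not 40 < h0 < 100:
        return -np.inf
    return -0.5 * np.sum((v - h0 * d) ** 2) / 500**2

# Random-walk Metropolis sampling of the posterior for H0.
samples, h0 = [], 60.0
for _ in range(20000):
    prop = h0 + rng.normal(0, 0.5)
    if np.log(rng.uniform()) < log_post(prop) - log_post(h0):
        h0 = prop
    samples.append(h0)

post = np.array(samples[5000:])              # discard burn-in
print(f"H0 = {post.mean():.1f} ± {post.std():.1f} km/s/Mpc")
```

The H₀ tension is then literally a statement about two such posterior distributions having almost no overlap.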


8. Frequentist Statistical Frameworks

Fisher at Rothamsted: Agricultural Experiments

Ronald Fisher developed the foundations of frequentist statistics while working at the Rothamsted Experimental Station (1919–1933). The practical problem was comparing crop yields under different treatments (fertilisers, cultivars, soil preparations) in the presence of spatial variation in soil fertility. Fisher invented randomisation, blocking, and Latin square designs to control confounding.

The analysis of variance (ANOVA) emerged from partitioning yield variation into components attributable to treatment effects versus random variation. The F-test, the concept of degrees of freedom, and maximum likelihood estimation all developed in this agricultural context. Fisher's Statistical Methods for Research Workers (1925) codified methods designed for the practical constraints of field experiments: limited replication, uncontrolled environmental variation, and the need for actionable conclusions.
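
A minimal sketch of the Fisherian analysis, with hypothetical yields standing in for Rothamsted field data (in practice, treatments would first be randomised to plots within blocks):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
# Hypothetical plot yields (t/ha) under three treatments, 8 plots each.
control = rng.normal(5.0, 0.6, size=8)
nitrogen = rng.normal(5.8, 0.6, size=8)
compost = rng.normal(5.4, 0.6, size=8)

# One-way ANOVA: partition variation into between-treatment vs
# within-treatment components and compare via the F statistic.
f_stat, p_value = f_oneway(control, nitrogen, compost)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```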

Neyman-Pearson: Industrial Quality Control

Jerzy Neyman and Egon Pearson developed their framework partly in response to industrial quality control problems. The question was not "what does this specific sample tell us?" (Fisher's emphasis) but "what decision rule minimises long-run error rates?" In manufacturing, accepting or rejecting a batch based on sampling inspection requires balancing the cost of rejecting good batches (Type I error) against the cost of accepting bad ones (Type II error).

The Neyman-Pearson lemma (1933) showed that, for testing one simple hypothesis against another, the likelihood ratio test is optimal—no other test has higher power at the same significance level. This result was abstract mathematics, but the framework was motivated by industrial and bureaucratic contexts where repeated decisions under uncertainty required systematic error control.
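
Schematically, for simple hypotheses H₀: θ = θ₀ and H₁: θ = θ₁, the most powerful level-α test rejects exactly when the likelihood ratio clears a threshold:

```latex
\Lambda(x) = \frac{L(x;\,\theta_1)}{L(x;\,\theta_0)} > k,
\qquad k \ \text{chosen so that}\ P_{\theta_0}\!\big(\Lambda(X) > k\big) = \alpha .
```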


9–11. The Replication Crisis and Institutional Response

The Psychology Reproducibility Project

The 2015 Open Science Collaboration attempted to replicate 100 published psychology experiments. The headline result—only 36% replicated by the original criterion of statistical significance—prompted reflection on methodological practices. But the scientific content was equally important. Effect sizes in replications were on average half those in original studies, suggesting systematic inflation in the published literature.

Specific examples illuminate the problem. Social priming effects, such as the claim that exposure to words related to old age causes slower walking, failed to replicate in well-powered preregistered studies. Ego depletion—the idea that self-control is a limited resource diminished by exertion—showed drastically reduced effect sizes in multi-lab replications. These were not obscure findings but textbook results informing theory in social and cognitive psychology.

Preclinical Cancer Biology

The Begley and Ellis (2012) report that only 11% of landmark preclinical cancer studies could be reproduced was particularly consequential because failed replications represented millions of dollars of wasted drug development. The scientific content involved cell biology: drug targets identified in cancer cell lines often did not translate to animal models or clinical trials. Reasons included genetic instability of cell lines (many "cancer cell lines" were contaminated or misidentified), failure to account for tumour microenvironment, and irreproducible antibody reagents.

Institutional Responses

The replication crisis motivated concrete institutional changes. Registered reports, where journals commit to publishing based on the methodology regardless of results, address publication bias at source. The requirement for data sharing (NIH 2023 policy) enables independent verification and meta-analysis. Reporting guidelines (CONSORT, ARRIVE) standardise methods documentation. These are not merely bureaucratic requirements but responses to specific scientific failures with specific scientific remedies.


12. Meta-Analysis Methodology

Origins in Medicine: The Cochrane Collaboration

Meta-analysis developed to address a practical problem in evidence-based medicine: combining results from multiple clinical trials with different sample sizes, designs, and patient populations. The Cochrane Collaboration (founded 1993) systematised procedures for identifying studies, assessing bias, extracting effect sizes, and combining estimates.

The scientific motivation was the recognition that individual trials, especially in medicine, are often underpowered. A single trial might show a non-significant trend toward benefit; combining multiple such trials might reveal a consistent effect obscured by sampling variation. Conversely, apparent effects might vanish when publication bias is corrected—small positive studies were published while small null studies languished in file drawers.

Heterogeneity and Moderator Analysis

The I² statistic quantifies the proportion of variability due to genuine between-study differences rather than sampling error. High heterogeneity motivates moderator analysis: why do studies differ? Perhaps the treatment works better in certain populations, at certain doses, or with certain outcome measures. This returns meta-analysis to scientific content: heterogeneity is not merely a statistical nuisance but an indicator of theoretically important moderating factors.
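
A minimal sketch of the computation, with hypothetical effect sizes and variances (real meta-analyses would add random-effects models and bias diagnostics):

```python
import numpy as np

# Hypothetical study effects (log odds ratios) and their variances.
y = np.array([0.30, 0.10, 0.45, 0.22, -0.05])
v = np.array([0.04, 0.02, 0.09, 0.03, 0.05])

w = 1.0 / v                                  # inverse-variance weights
pooled = np.sum(w * y) / np.sum(w)           # fixed-effect pooled estimate
q = np.sum(w * (y - pooled) ** 2)            # Cochran's Q
df = len(y) - 1
i2 = max(0.0, (q - df) / q) * 100            # % variation beyond sampling error

print(f"pooled = {pooled:.3f}, Q = {q:.2f}, I² = {i2:.0f}%")
```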


13. Formal Sciences: Mathematical Foundations

Gödel's Incompleteness Theorems

Kurt Gödel's 1931 incompleteness theorems arose from the foundational crisis in mathematics—the attempt to place mathematics on rigorous axiomatic foundations after paradoxes (Russell's paradox, the Burali-Forti paradox) threatened naive set theory. Hilbert's programme sought to prove the consistency of arithmetic by finitary methods.

Gödel showed this was impossible. Any consistent, effectively axiomatised formal system powerful enough to express elementary arithmetic contains true statements that cannot be proved within the system, and no such system can prove its own consistency. The scientific consequence is that mathematical certainty is conditional: we cannot prove that our axioms are consistent, only explore their consequences.
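
Stated schematically in a standard modern formulation (not Gödel's original notation): for any consistent, effectively axiomatised theory T extending elementary arithmetic,

```latex
\exists\, G_T :\; T \nvdash G_T \ \text{ and } \ T \nvdash \neg G_T
\quad \text{(first theorem)}, \qquad
T \nvdash \mathrm{Con}(T) \quad \text{(second theorem)}.
```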

Turing and Computability

Alan Turing's analysis of computability (1936) was motivated by the Entscheidungsproblem—the question of whether there exists a mechanical procedure to decide whether an arbitrary statement of first-order logic is provable. Turing showed there is no such procedure by defining computation precisely (Turing machines) and proving the undecidability of the halting problem.
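
The core of the argument can be sketched in Python dress; this is an illustration of the diagonal construction, not runnable mathematics, since the whole point is that `halts` cannot exist:

```python
def halts(prog, data):
    """Hypothetical oracle: True iff prog(data) eventually halts."""
    raise NotImplementedError  # Turing proved no such total procedure exists

def diagonal(prog):
    # Do the opposite of whatever the oracle predicts about prog
    # when run on its own source.
    if halts(prog, prog):
        while True:   # loop forever
            pass
    else:
        return        # halt immediately

# diagonal(diagonal) halts if and only if it does not halt,
# a contradiction: therefore halts() is not computable.
```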

This had scientific consequences for computer science. Program correctness cannot be verified algorithmically in general (Rice's theorem). Testing can find bugs but cannot prove their absence (Dijkstra's observation). Verification must proceed through formal proof with human guidance, or through testing that provides statistical confidence but not certainty.


14. Mathematics-Physics Relationship

Wigner's Puzzle in Concrete Terms

Eugene Wigner's 1960 essay "The Unreasonable Effectiveness of Mathematics" was prompted by specific examples from physics. Group theory, developed as pure mathematics (Galois, Abel, Lie), turned out to describe symmetries fundamental to particle physics. Complex numbers, introduced to solve polynomials, became essential for quantum mechanics. Riemannian geometry, developed abstractly by Riemann in 1854, provided the mathematical framework for general relativity sixty years later.

The puzzle is genuine: mathematical structures developed for purely intellectual reasons, without empirical motivation, repeatedly find precise application to physical reality. Non-Euclidean geometries were logical curiosities before they described spacetime. Hilbert spaces were abstract generalisations before they became the state spaces of quantum mechanics.

QED Precision: The Anomalous Magnetic Moment

Quantum electrodynamics achieves extraordinary precision. The anomalous magnetic moment of the electron (the g-2 factor) has been calculated to twelve significant figures and measured to comparable precision—agreement to parts per trillion. This is the most precisely tested prediction in the history of science.
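
The calculation is a perturbation series in the fine-structure constant; the leading term, computed by Schwinger in 1948, already lands close to the measured value, and successive orders (summing thousands of Feynman diagrams) supply the remaining digits:

```latex
a_e \equiv \frac{g-2}{2} = \frac{\alpha}{2\pi} + O(\alpha^2) \approx 0.00116
```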

Yet QED is formulated using mathematical objects (continuous fields, point particles, renormalisation procedures) whose physical interpretation is problematic. How can structures that seem idealisations describe reality so precisely? Structural realism offers a response: what the theory gets right is relational structure—how quantities relate to each other—not the intrinsic nature of the entities involved.


15. Methodology Across Disciplines

Physics: The Paradigm of Mathematical Law

Physics has served as the model of natural science since Newton. The H-D method fits physics well because theories make precise quantitative predictions derivable from mathematical laws. The solar system is sufficiently simple (few bodies, known force law) that predictions can be computed and compared with observation.

But even within physics, methodological diversity exists. Cosmology cannot perform controlled experiments; it relies on observation of a single historical sequence (the universe's evolution). Solid-state physics deals with emergent phenomena (superconductivity, magnetism) requiring approximation methods rather than exact solutions. Complexity increases from particle physics to condensed matter to biophysics.

Biology: Historical Contingency and Functional Explanation

Biology resists the physics model. Evolutionary explanations are historical: why do mammals have three middle-ear bones? Because of contingent events in the evolution of the mammalian jaw. The answer is not a timeless law but a historical narrative. Selection pressures vary with environment; there is no universal law of ear-bone number.

Functional explanation pervades biology in ways foreign to physics. We explain the heart by what it does (pump blood), not merely by its physical composition. This teleological language is not metaphysical commitment to purposes but recognition that biological structures are selected for their effects. Methodologically, this means biological explanation requires understanding function, not just mechanism.

Earth Sciences: Limited Experimental Control

Geology and climate science face distinctive methodological challenges. Experiments are possible (laboratory simulations, numerical models) but cannot replicate the scale of planetary processes. The geological record is a natural archive, but incomplete and requiring interpretation through uniformitarian assumptions (present processes explain past formations).

Climate science combines physics (radiation balance, fluid dynamics) with historical data (ice cores, tree rings, sediment records) and numerical simulation. Predictions concern ensemble averages and probability distributions, not deterministic forecasts. Attribution studies—did climate change cause a specific extreme event?—require counterfactual reasoning: what would have happened without anthropogenic forcing?


16. Computational Simulation

Molecular Dynamics and Climate Models

Molecular dynamics simulations integrate Newton's equations for systems of thousands to billions of particles using empirical force fields. The equations are known; the interest lies in emergent behaviour (phase transitions, protein folding, material properties) that cannot be derived analytically. Verification asks whether the code correctly solves the equations; validation asks whether the force field accurately represents interatomic interactions.
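
A minimal sketch of the core integration step, reduced to a single particle in a harmonic well so that verification is trivial (the code should conserve the known energy; a production MD code differs mainly in the force field and bookkeeping):

```python
# Velocity-Verlet integration of one particle in a harmonic potential.
k, m, dt = 1.0, 1.0, 0.01        # spring constant, mass, timestep
x, v = 1.0, 0.0                  # initial condition: E = 0.5 * k * x**2 = 0.5

def force(x):
    return -k * x

for _ in range(10_000):
    a = force(x) / m
    x += v * dt + 0.5 * a * dt**2            # position update
    v += 0.5 * (a + force(x) / m) * dt       # velocity update, averaged force

energy = 0.5 * m * v**2 + 0.5 * k * x**2
print(energy)   # verification check: should remain ~0.5
```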

Climate models discretise fluid dynamics equations on a global grid, parameterising sub-grid processes (cloud formation, turbulence) that cannot be resolved. Different models make different parameterisation choices, leading to ensemble uncertainty. Model intercomparison projects (CMIP) assess spread across models as a measure of structural uncertainty beyond parameter uncertainty within any single model.
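
The discretisation-plus-parameterisation pattern can be shown in miniature. The toy below diffuses a one-dimensional temperature profile on a grid, with a single tunable constant standing in for unresolved sub-grid mixing; it is a cartoon of the structure, not of any real climate model:

```python
import numpy as np

nx, dx, dt = 100, 1.0, 0.1
kappa = 1.0        # resolved diffusivity
kappa_sgs = 0.2    # sub-grid "parameterisation": a tunable modelling choice
T = np.sin(np.linspace(0, np.pi, nx))   # initial temperature profile

for _ in range(1000):
    lap = np.zeros(nx)
    lap[1:-1] = (T[2:] - 2 * T[1:-1] + T[:-2]) / dx**2   # finite difference
    T += dt * (kappa + kappa_sgs) * lap                  # explicit time step
```

Different choices of kappa_sgs play the role that different cloud or turbulence parameterisations play across a CMIP ensemble.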

The Epistemic Status Question

Simulations produce data requiring interpretation, like experiments. But unlike experiments, simulations derive from theoretical assumptions—the model is theory instantiated in code. This creates a unique epistemology. When a simulation exhibits unexpected behaviour, is this a discovery about the physical system or an artefact of numerical error, discretisation, or flawed assumptions?

Validation against experimental data is essential but incomplete: a model fitting known data might fail for novel conditions. Climate projections for 2100 cannot be validated against observation. The epistemic status of such predictions depends on physical reasoning about model components, sensitivity analyses, and expert judgement—a more complex evidential relationship than simple H-D confirmation.


17. Contemporary Philosophy of Science

Methodological Pluralism

The search for the scientific method has been abandoned not because of philosophical fashion but because of accumulated evidence from the history and sociology of science. Detailed case studies show that successful science employs different methods depending on domain, historical context, and practical constraints. Physics, biology, psychology, and geology each have distinctive evidential structures, experimental possibilities, and explanatory forms.

This pluralism is not relativism. Within each domain, there are better and worse methods, more and less reliable results, genuine progress and genuine error. But the criteria are domain-specific. What counts as strong evidence in particle physics (5-sigma discovery threshold) differs from epidemiology (meta-analyses, Bradford Hill criteria) or paleontology (stratigraphy, phylogenetic bracketing).

Values in Science

The traditional view that non-epistemic values should not influence scientific conclusions has been challenged. The choice of significance threshold (α = 0.05 vs. 0.01) is a methodological decision with value implications: how bad is a false positive relative to a false negative? In drug approval, this is explicitly an ethical question trading off type I and type II errors. In basic research, default thresholds encode implicit judgements.
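
The trade-off is easy to exhibit numerically. In the hypothetical two-group z-test below, tightening α from 0.05 to 0.01 roughly halves the power, so the Type II error rate rises correspondingly:

```python
from scipy.stats import norm

d, n = 0.4, 50                    # hypothetical effect size and group size
se = (2 / n) ** 0.5               # standard error of the mean difference (sd = 1)

for alpha in (0.05, 0.01):
    z_crit = norm.ppf(1 - alpha / 2)      # two-sided critical value
    power = norm.sf(z_crit - d / se)      # P(reject | effect real), upper tail
    print(f"alpha = {alpha}: power = {power:.2f}, beta = {1 - power:.2f}")
```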

Inductive risk arguments generalise the point. Any decision about when evidence suffices involves risk of error; the acceptable risk depends on stakes, which are value-laden. This does not mean science is arbitrary but that methodological choices cannot be purely value-free. The response is transparency: make value judgements explicit rather than pretending to value-neutrality.


18. Synthesis

Unifying Commitments Across Sciences

Despite methodological diversity, scientific inquiry shares commitments that distinguish it from other knowledge practices:

Empirical testability (for natural sciences): Claims must connect to observable consequences. Even where direct experimentation is impossible (cosmology, paleontology), theories must make contact with evidence.

Logical rigor: Arguments should be valid; conclusions should follow from premises; formal systems should be consistent. The formal sciences exemplify this directly; natural sciences aspire to it within the constraints of empirical content.

Systematic doubt: Established beliefs are subject to revision. Peer review, replication, and criticism institutionalise scepticism. The replication crisis shows what happens when these practices weaken.

Public scrutiny: Methods and data should be available for evaluation. The transparency movement addresses failures of openness that enabled the replication crisis.

Willingness to revise: Scientific knowledge is provisional. Even well-established theories (Newtonian mechanics, classical genetics) have been revised or superseded. This fallibilism is not weakness but strength—science learns from error.

Integration of Formal and Empirical

The relationship between mathematics and physics remains philosophically contested but practically productive. Mathematical physics proceeds by formulating empirical content in mathematical language, deriving consequences, and testing predictions. Structural realism provides one account of why this works: physical systems instantiate abstract structures that mathematics describes.

Computational science occupies a distinctive position: it applies theoretical equations, yet it produces results that cannot be derived analytically and that function more like experimental data. The verification/validation distinction acknowledges this dual character. Simulation is neither pure theory nor pure experiment but a third mode of scientific investigation with its own epistemology.

Lessons from the Replication Crisis

The replication crisis has been, paradoxically, a success story for self-correcting science. The problems—publication bias, p-hacking, undisclosed flexibility—were identified through empirical investigation of scientific practices. The remedies—preregistration, data sharing, reporting guidelines—emerged from the scientific community. The episode illustrates both the vulnerability of science to methodological error and its capacity for reform.

The deeper lesson concerns the relationship between methodology and institutional structure. Individual scientists face incentives (publication pressure, career advancement) that can conflict with epistemic goals. Good methodology requires not just individual virtue but institutional design: journals that accept registered reports, funders that require data sharing, metrics that value replication over novelty.


Conclusion

The philosophy of science is not a body of a priori doctrine but a reflective enterprise shaped by scientific practice. Each major philosophical position emerged from engagement with specific scientific developments: Popper's falsificationism from contrasting Einstein with Freud, Kuhn's paradigm theory from historical case studies, Lakatos's research programmes from analysing Newton and Bohr, contemporary Bayesianism from parameter estimation in physics and cosmology.

This extended summary has traced these connections, showing how abstract epistemological concepts—falsifiability, incommensurability, confirmation, research programmes—arise from concrete scientific work. The goal is not to reduce philosophy to history but to show their mutual dependence. Scientific practice is philosophically informed (if often implicitly); philosophical reflection is scientifically grounded (when done well).

For mathematical physics and computational science specifically, the lessons are: mathematical certainty is conditional on unprovable assumptions, the applicability of mathematics to physics is a genuine puzzle that structural realism may help address, and simulation occupies a distinctive epistemic niche requiring both verification and validation. These are not mere philosophical curiosities but methodological considerations affecting how we should interpret and trust our results.


References (For Further Reading)

The original summary provides the primary sources. This extended summary has drawn on secondary historical literature including:

  • Crombie, A.C. Styles of Scientific Thinking in the European Tradition
  • Lindberg, D.C. The Beginnings of Western Science
  • Galison, P. How Experiments End
  • Shapin, S. & Schaffer, S. Leviathan and the Air-Pump
  • Hacking, I. Representing and Intervening
  • Winsberg, E. Science in the Age of Computer Simulation
  • Oreskes, N. & Conway, E. Merchants of Doubt
  • Gigerenzer, G. et al. The Empire of Chance